Anirban Ghoshal
Senior Writer

Google Cloud adds 3 new Apache Airflow operators to Vertex AI

Aug 12, 2024

The new operators will help enterprises integrate Vertex AI’s generative AI models into data pipelines orchestrated by Apache Airflow and by Cloud Composer, Google’s managed Airflow service.


Google Cloud has introduced three new Apache Airflow operators for Vertex AI, its AI service.

Apache Airflow, an open-source workflow orchestration platform that can be thought of as a more capable cron job scheduler, lets developers define pipelines in Python and helps enterprises connect data systems so that data flows between them.

Because each pipeline is defined as a directed acyclic graph (DAG) of tasks, Airflow also gives developers a clear view of how data flows between data systems inside an enterprise.
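To make the scheduling idea concrete: Airflow runs each task in a pipeline only after the tasks it depends on have finished. The sketch below illustrates just that dependency ordering using Python's standard graphlib module. It is not Airflow's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_usage": set(),
    "clean_usage": {"extract_usage"},
    "summarize_with_model": {"clean_usage"},
    "notify_team": {"summarize_with_model"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(run_order(pipeline), start=1):
        print(step, task)
```

The acyclic requirement is what makes an order like this possible at all: a cycle would mean two tasks each waiting on the other.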

The new Airflow operators are TextGenerationModelPredictOperator, TextEmbeddingModelGetEmbeddingsOperator, and GenerativeModelGenerateContentOperator, which can be used for text prediction, text embedding, and broader content generation, respectively, the company said in a blog post.

These integrations will open up new ways for enterprises to perform data analytics using pipelines and will enable use cases such as automated insights, data enrichment, advanced anomaly detection, content and text embedding generation, and translation, Google said.

Automated insights use cases could include generating summaries, reports, and other insights from raw data, Christian Yarros, strategic cloud engineer at Google, explained in the blog post.

Data enrichment, according to Yarros, could include augmenting datasets with synthetic data produced by generative AI models.

The operators’ text embedding functionality can take large volumes of unstructured text and turn them into a structured, numerical format that enterprises can analyze to derive insights, Yarros wrote. The content generation functionality, Yarros added, can be used to produce DAG metadata such as descriptions, tags, and document values.
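An embedding operator returns vectors; what a pipeline does with them afterward is up to the enterprise. One common downstream step, assumed here rather than taken from Google's post, is comparing vectors with cosine similarity to search or group documents. The three-dimensional vectors below are toy values standing in for real model output, and the document labels are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], corpus: dict[str, list[float]]) -> str:
    """Return the corpus key whose vector is closest to the query vector."""
    return max(corpus, key=lambda key: cosine_similarity(query, corpus[key]))


# Toy vectors; real embedding models return many more dimensions.
docs = {
    "billing complaint": [0.9, 0.1, 0.0],
    "shipping delay": [0.1, 0.9, 0.1],
    "product praise": [0.0, 0.2, 0.95],
}

if __name__ == "__main__":
    print(most_similar([0.85, 0.15, 0.05], docs))
```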

Real-world applications of combining Apache Airflow and Vertex AI could include targeted marketing, data cleansing, and report consolidation, according to the company.

Enterprises can use Airflow to schedule and orchestrate an email campaign optimization process, Yarros wrote. Once customer data is stored in Google Cloud Storage, developers can use a generative model Airflow operator to analyze it and create multiple personalized subject lines and content options for each customer segment.
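Before a generative model can write per-segment subject lines, a pipeline task has to turn raw customer records into prompts. A minimal sketch, assuming the records have already been loaded into Python dicts with hypothetical `segment` and `last_purchase` fields, builds one prompt per segment. Reading from Cloud Storage and calling the model are deliberately left out.

```python
from collections import defaultdict

# Hypothetical customer records; in practice these would be read from
# Google Cloud Storage, which this sketch does not do.
customers = [
    {"id": 1, "segment": "new", "last_purchase": "headphones"},
    {"id": 2, "segment": "loyal", "last_purchase": "laptop"},
    {"id": 3, "segment": "new", "last_purchase": "phone case"},
    {"id": 4, "segment": "loyal", "last_purchase": "monitor"},
]


def prompts_by_segment(records: list[dict], variants: int = 3) -> dict[str, str]:
    """Build one prompt per customer segment asking for subject-line variants."""
    products = defaultdict(set)
    for record in records:
        products[record["segment"]].add(record["last_purchase"])

    prompts = {}
    for segment in sorted(products):  # sorted for stable, reproducible output
        items = ", ".join(sorted(products[segment]))
        prompts[segment] = (
            f"Write {variants} personalized email subject lines for "
            f"'{segment}' customers who recently bought: {items}."
        )
    return prompts


if __name__ == "__main__":
    for segment, prompt in prompts_by_segment(customers).items():
        print(segment, "->", prompt)
```

Grouping before prompting keeps the number of model calls proportional to the number of segments rather than the number of customers.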

Another use of the operators would be to represent visual content in new ways. According to Yarros, this can be done by creating an Airflow DAG that is triggered when image or video files are uploaded to Google Cloud Storage.
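How the DAG actually gets triggered is outside the scope of this sketch. It covers only a filtering step such a DAG might need: deciding which newly uploaded objects are images or videos, using Python's standard mimetypes module. Classification here is by file extension only, which is an assumption, and the object names are hypothetical.

```python
import mimetypes


def is_visual_media(object_name: str) -> bool:
    """Return True if the name's extension maps to an image/* or video/* MIME type."""
    mime_type, _encoding = mimetypes.guess_type(object_name)
    if mime_type is None:
        return False
    return mime_type.split("/", 1)[0] in ("image", "video")


def objects_to_process(object_names: list[str]) -> list[str]:
    """Keep only uploads that look like images or videos, preserving their order."""
    return [name for name in object_names if is_visual_media(name)]


if __name__ == "__main__":
    uploads = ["uploads/cat.jpg", "uploads/clip.mp4", "reports/q2.csv"]
    print(objects_to_process(uploads))
```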

The operators can also be used for cost optimization, the company said: enterprises can use an Airflow DAG to collect cloud resource usage data from monitoring APIs on a daily or hourly schedule.

“Deploy a Google Generative Model trained on historical usage patterns and reference the model in your Google Generative Model Airflow Operators to analyze the data and identify unusual spikes in CPU usage, network traffic, or storage consumption,” Yarros wrote, adding that if significant anomalies are detected, alerts can be sent to the infrastructure team for investigation and corrective action.
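The approach quoted above relies on a generative model to spot unusual spikes. For comparison, and explicitly not the method Google describes, a conventional statistical baseline flags new readings that sit several standard deviations above historical usage. A check like this could serve as a cheap pre-filter before model analysis. The CPU figures below are made up.

```python
from statistics import mean, stdev


def flag_spikes(history: list[float], recent: list[float], threshold: float = 3.0) -> list[int]:
    """Return indexes in `recent` whose value exceeds mean(history) + threshold * stdev(history)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation of the historical window
    limit = baseline + threshold * spread
    return [i for i, value in enumerate(recent) if value > limit]


# Made-up hourly CPU utilization percentages.
history = [40, 42, 38, 41, 39, 40, 43, 37]
recent = [41, 45, 88, 39]

if __name__ == "__main__":
    print(flag_spikes(history, recent))  # prints [2]
```

Computing the baseline from history rather than from the recent window keeps a single large spike from inflating the standard deviation it is measured against.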