GCP Series: How to Run Workflows with Google Cloud Composer

Overview: Google Cloud Composer

A fully managed workflow orchestration service built on Apache Airflow.

  • Author, schedule, and monitor pipelines that span across hybrid and multi-cloud environments

  • Built on the Apache Airflow open source project and operated using Python

  • Frees you from lock-in and is easy to use

How to Use?

https://cloud.google.com/composer/docs/how-to

Benefits

Fully managed workflow orchestration

Cloud Composer’s managed nature and Apache Airflow compatibility allow you to focus on authoring, scheduling, and monitoring your workflows rather than on provisioning resources.

Integrates with other Google Cloud products

End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform gives users the freedom to fully orchestrate their pipeline.

Supports hybrid and multi-cloud

Author, schedule, and monitor your workflows through a single orchestration tool—whether your pipeline lives on-premises, in multiple clouds, or fully within Google Cloud.

Key features

Hybrid and multi-cloud

Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud. Create workflows that connect data, processing, and services across clouds to give you a unified data environment.

Open source

Cloud Composer is built on Apache Airflow, which gives users portability and freedom from lock-in. Google contributes back to this open source project, and it integrates with a broad range of platforms, a range that is expected to expand as the Airflow community grows.

Easy orchestration

Cloud Composer pipelines are configured as directed acyclic graphs (DAGs) using Python, so anyone comfortable with Python can author them. One-click deployment yields instant access to a rich library of connectors and multiple graphical representations of your workflow in action, making troubleshooting easy. Automatic synchronization of your directed acyclic graphs ensures your jobs stay on schedule.

Why use Cloud Composer?

Cloud Composer is a fully managed workflow orchestration service, enabling you to create workflows that span across clouds and on-premises data centers. Built on the popular Apache Airflow open source project and operated using the Python programming language, Cloud Composer is free from lock-in and easy to use. By using Cloud Composer instead of a local instance of Apache Airflow, users can benefit from the best of Airflow with no installation or management overhead.

https://cloud.google.com/composer/docs/concepts/overview#why_use

In data analytics, a workflow represents a series of tasks for ingesting, transforming, analyzing, or utilizing data. In Airflow, workflows are created using DAGs, or “Directed Acyclic Graphs”.

A DAG is a collection of tasks that you want to schedule and run, organized in a way that reflects their relationships and dependencies. DAGs are created in Python scripts, which define the DAG structure (tasks and their dependencies) using code.

Each task in a DAG can represent almost anything—for example, one task might perform any of the following functions:

  • Preparing data for ingestion
  • Monitoring an API
  • Sending an email
  • Running a pipeline

A DAG shouldn’t be concerned with the function of each constituent task; its purpose is to ensure that each task is executed at the right time, in the right order, and with the right issue handling.
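The separation between what a task does and when it runs can be illustrated without Airflow at all. The sketch below is not how Airflow is implemented; it is a toy runner, using only the Python standard library (`graphlib`, Python 3.9+), that derives an execution order from declared dependencies and retries a failing task. The task names, the `run_in_order` helper, and the `max_retries` parameter are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each name maps to the set of tasks it depends on.
dependencies = {
    "prepare_data": set(),
    "run_pipeline": {"prepare_data"},
    "send_email": {"run_pipeline"},
}


def run_in_order(dependencies, actions, max_retries=1):
    """Run every task after its upstream tasks, retrying failed attempts.

    The runner never inspects what an action does; it only decides
    the order of execution and how failures are handled.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(max_retries + 1):
            try:
                actions[name]()
                completed.append(name)
                break
            except Exception:
                # Give up only after the final allowed attempt.
                if attempt == max_retries:
                    raise
    return completed


# Placeholder actions that do nothing; real tasks could do anything.
actions = {name: (lambda: None) for name in dependencies}
order = run_in_order(dependencies, actions)
print(order)
```

Airflow applies the same principle at a much larger scale, adding scheduling, distributed execution, and monitoring on top of the dependency graph.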

For more information on DAGs and tasks, see the Apache Airflow documentation.

Environments

To run workflows, you first need to create an environment. Airflow depends on many microservices to run, so Cloud Composer provisions Google Cloud components to run your workflows. These components are collectively known as a Cloud Composer environment.

Environments are self-contained Airflow deployments based on Google Kubernetes Engine, and they work with other Google Cloud services using connectors built into Airflow. You can create one or more environments in a single Google Cloud project. You can create Cloud Composer environments in any supported region.

For an in-depth look at the components of an environment, see Cloud Composer environment architecture.