Google Cloud VertexAI Orchestrator

How to orchestrate pipelines with Vertex AI

This is an older version of the ZenML documentation. To read and view the latest version please visit this up-to-date URL.

The Vertex orchestrator is an orchestrator flavor provided with the ZenML gcp integration that uses Vertex AI to run your pipelines.

This component is only meant to be used within the context of remote ZenML deployment scenario. Usage with a local ZenML deployment may lead to unexpected behavior!

When to use it

You should use the Vertex orchestrator if:

you're already using GCP.
you're looking for a proven production-grade orchestrator.
you're looking for a UI in which you can track your pipeline runs.
you're looking for a managed solution for running your pipelines.
you're looking for a serverless solution for running your pipelines.

How to deploy it

In order to use a Vertex AI orchestrator, you need to first deploy ZenML to the cloud. It would be recommended to deploy ZenML in the same Google Cloud project as where the Vertex infrastructure is deployed, but it is not necessary to do so. You must ensure that you are connected to the remote ZenML server before using this stack component.

The only other thing necessary to use the ZenML Vertex orchestrator is enabling Vertex relevant APIs on the Google Cloud project.

In order to quickly enable APIs, and create other resources necessary for to use this integration, you can also consider using the Vertex AI stack recipe, which helps you set up the infrastructure with one click.

How to use it

To use the Vertex orchestrator, we need:

The ZenML gcp integration installed. If you haven't done so, run
```
zenml integration install gcp
```
Docker installed and running.
A remote artifact store as part of your stack.
A remote container registry as part of your stack.
The GCP project ID and location in which you want to run your Vertex AI pipelines.
The pipeline client environment needs permissions to create a job in Vertex Pipelines, e.g. the Vertex AI User role: https://cloud.google.com/vertex-ai/docs/general/access-control#aiplatform.user
To run on a schedule, the client environment also needs permissions to create a Google Cloud Function (e.g. with the cloudfunctions.serviceAgent Role) and to create a Google Cloud Scheduler (e.g. with the Cloud Scheduler Job Runner Role). Additionally, it needs the Storage Object Creator Role to be able to write the pipeline JSON file to the artifact store directly.

We can then register the orchestrator and use it in our active stack:

zenml orchestrator register <ORCHESTRATOR_NAME> \
    --flavor=vertex \
    --project=<PROJECT_ID> \
    --location=<GCP_LOCATION>

# Register and activate a stack with the new orchestrator
zenml stack register <STACK_NAME> -o <ORCHESTRATOR_NAME> ... --set

ZenML will build a Docker image called <CONTAINER_REGISTRY_URI>/zenml:<PIPELINE_NAME> which includes your code and use it to run your pipeline steps in Vertex AI. Check out this page if you want to learn more about how ZenML builds these images and how you can customize them.

You can now run any ZenML pipeline using the Vertex orchestrator:

python file_that_runs_a_zenml_pipeline.py

Vertex UI

Vertex comes with its own UI that you can use to find further details about your pipeline runs, such as the logs of your steps. For any runs executed on Vertex, you can get the URL to the Vertex UI in Python using the following code snippet:

from zenml.post_execution import get_run

pipeline_run = get_run("<PIPELINE_RUN_NAME>")
orchestrator_url = deployer_step.metadata["orchestrator_url"].value

Run pipelines on a schedule

The Vertex Pipelines orchestrator supports running pipelines on a schedule, using logic resembling the official approach recommended by GCP.

ZenML utilizes the Cloud Scheduler and Cloud Functions services to enable scheduling on Vertex Pipelines. The following is the sequence of events that happen when running a pipeline on Vertex with a schedule:

Docker image is created and pushed (see above containerization).
The Vertex AI pipeline JSON file is copied to the Artifact Store specified in your Stack
Cloud Function is created that creates the Vertex Pipeline job when triggered.
Cloud Scheduler job is created that triggers the Cloud Function on the defined schedule.

Therefore, to run on a schedule, the client environment needs permissions to create a Google Cloud Function (e.g. with the cloudfunctions.serviceAgent Role) and to create a Google Cloud Scheduler (e.g. with the Cloud Scheduler Job Runner Role). Additionally, it needs the Storage Object Creator Role to be able to write the pipeline JSON file to the artifact store directly.

Once your have these permissions set in your local GCP CLI, here is how to create a scheduled Vertex pipeline in ZenML:

from zenml.config.schedule import Schedule

# Run a pipeline every 5th minute
pipeline_instance.run(
    schedule=Schedule(
        cron_expression="*/5 * * * *"
    )
)

The Vertex orchestrator only supports the cron_expression parameter in the Schedule object, and will ignore all other parameters supplied to define the schedule.

How to delete a scheduled pipeline

Note that ZenML only gets involved to schedule a run, but maintaining the lifecycle of the schedule is the responsibility of the user.

In order to cancel a scheduled Vertex pipeline, you need to manually delete the generated Google Cloud Function, along with the Cloud Scheduler job that schedules it (via the UI or the CLI).

Additional configuration

For additional configuration of the Vertex orchestrator, you can pass VertexOrchestratorSettings which allows you to configure (among others) the following attributes:

pod_settings: Node selectors, affinity and tolerations to apply to the Kubernetes Pods running your pipline. These can be either specified using the Kubernetes model objects or as dictionaries.

from zenml.integrations.gcp.flavors.vertex_orchestrator_flavor import VertexOrchestratorSettings
from kubernetes.client.models import V1Toleration


vertex_settings = VertexOrchestratorSettings(
    pod_settings={
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "node.kubernetes.io/name",
                                    "operator": "In",
                                    "values": ["my_powerful_node_group"],
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "tolerations": [
            V1Toleration(
                key="node.kubernetes.io/name",
                operator="Equal",
                value="",
                effect="NoSchedule"
            )
        ]
    }
)

@pipeline(
    settings={
        "orchestrator.vertex": vertex_settings
    }
)
  ...

Check out the API docs for a full list of available attributes and this docs page for more information on how to specify settings.

A concrete example of using the Vertex orchestrator can be found here.

For more information and a full list of configurable attributes of the Vertex orchestrator, check out the API Docs.

Enabling CUDA for GPU-backed hardware

Note that if you wish to use this orchestrator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. It requires adding some extra settings customization and is essential to enable CUDA for the GPU to give its full acceleration.

PreviousKubernetes Orchestrator NextAWS Sagemaker Orchestrator

Last updated 6 months ago