Module core.backends.processing.processing_dataflow_backend

Definition of the DataFlow Processing Backend


`ProcessingDataFlowBackend(project: str, region: str = 'europe-west1', job_name: str = 'zen_1611332481', image: str = '', machine_type: str = 'n1-standard-4', num_workers: int = 4, max_num_workers: int = 10, disk_size_gb: int = 50, autoscaling_algorithm: str = 'THROUGHPUT_BASED', **kwargs)`
:   Use this to run a ZenML pipeline on Google Dataflow.

This backend uses the Beam v2 runner to run a custom Docker image as
the Dataflow job.

Adding this backend causes all Beam-compatible steps in the
pipeline to run on Google Dataflow.

    project: GCP project in which to launch the Dataflow job.
    region: GCP region in which to launch the Dataflow job.
    job_name: Name of the Dataflow job.
    image: Docker image to use. Must inherit from the Beam base image.
    machine_type: Machine type used for each worker.
    num_workers: Initial number of workers.
    max_num_workers: Maximum number of workers when autoscaling.
    disk_size_gb: Disk size per worker, in GB.
    autoscaling_algorithm: Autoscaling algorithm to use.
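To illustrate the configuration surface, the sketch below mirrors the constructor's parameters and defaults as a plain dataclass. This is a hypothetical stand-in for illustration, not the real `ProcessingDataFlowBackend` class; only `project` is required, everything else falls back to the documented defaults.

```python
from dataclasses import dataclass


@dataclass
class DataflowBackendConfig:
    """Illustrative mirror of ProcessingDataFlowBackend's parameters."""

    project: str                                      # required: GCP project
    region: str = "europe-west1"                      # GCP region for the job
    job_name: str = "zen_1611332481"                  # generated default name
    image: str = ""                                   # must inherit from the Beam base image
    machine_type: str = "n1-standard-4"               # worker machine type
    num_workers: int = 4                              # initial worker count
    max_num_workers: int = 10                         # autoscaling upper bound
    disk_size_gb: int = 50                            # disk size per worker
    autoscaling_algorithm: str = "THROUGHPUT_BASED"   # Dataflow autoscaling mode


# Typically only the project (and perhaps region/image) needs overriding.
cfg = DataflowBackendConfig(project="my-gcp-project")
```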

### Ancestors (in MRO)

* zenml.core.backends.processing.processing_local_backend.ProcessingLocalBackend
* zenml.core.backends.base_backend.BaseBackend

### Methods

`get_beam_args(self, pipeline_name: str = None, pipeline_root: str = None) -> Union[List[str], NoneType]`
:   Returns a list of Beam args for the pipeline.

        pipeline_name: Name of the pipeline.
        pipeline_root: Root dir of the pipeline.
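To make the shape of the returned list concrete, here is a minimal sketch of how the backend's parameters might be translated into Beam pipeline-option flags. The `build_beam_args` helper and the exact flag selection are assumptions for illustration (the flag names themselves are standard Apache Beam Dataflow options), not the actual implementation of `get_beam_args`.

```python
from typing import List


def build_beam_args(
    project: str,
    region: str,
    job_name: str,
    machine_type: str,
    num_workers: int,
    max_num_workers: int,
    disk_size_gb: int,
    autoscaling_algorithm: str,
    pipeline_root: str,
) -> List[str]:
    """Hypothetical helper: assemble Dataflow-runner flags from backend config."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--job_name={job_name}",
        f"--machine_type={machine_type}",
        f"--num_workers={num_workers}",
        f"--max_num_workers={max_num_workers}",
        f"--disk_size_gb={disk_size_gb}",
        f"--autoscaling_algorithm={autoscaling_algorithm}",
        f"--temp_location={pipeline_root}/tmp",
    ]


args = build_beam_args(
    project="my-gcp-project",
    region="europe-west1",
    job_name="zen_1611332481",
    machine_type="n1-standard-4",
    num_workers=4,
    max_num_workers=10,
    disk_size_gb=50,
    autoscaling_algorithm="THROUGHPUT_BASED",
    pipeline_root="gs://my-bucket/pipelines/demo",
)
```

Each entry is a standard `key=value` command-line option that Beam's `PipelineOptions` can parse directly.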