Training an ML classifier on 59M samples

In this tutorial, we’ll go through the step-by-step process of building a simple feedforward classifier trained on a public BigQuery datasource.
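To make "simple feedforward classifier" concrete, here is a minimal sketch of a forward pass in plain Python: one hidden layer with ReLU, followed by a sigmoid output for binary classification. It is not the model built in this tutorial and does not use ZenML. Every name in it (`dense`, `predict_proba`, `classify`) is hypothetical, and the weights are hand-picked for illustration rather than trained.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dense(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values: Vector) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def sigmoid(value: float) -> float:
    """Squash a logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-value))


def predict_proba(features: Vector, hidden_w: Matrix, hidden_b: Vector,
                  out_w: Vector, out_b: float) -> float:
    """Forward pass: input -> hidden layer (ReLU) -> single sigmoid output."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    logit = dense(hidden, [out_w], [out_b])[0]
    return sigmoid(logit)


def classify(features: Vector, hidden_w: Matrix, hidden_b: Vector,
             out_w: Vector, out_b: float, threshold: float = 0.5) -> int:
    """Return class 1 if the predicted probability reaches the threshold."""
    proba = predict_proba(features, hidden_w, hidden_b, out_w, out_b)
    return 1 if proba >= threshold else 0


# Hand-picked illustrative weights (an assumption, not trained values).
HIDDEN_W = [[1.0, -1.0], [-1.0, 1.0]]
HIDDEN_B = [0.0, 0.0]
OUT_W = [1.0, -1.0]
OUT_B = 0.0

if __name__ == "__main__":
    for sample in ([2.0, 0.5], [0.5, 2.0]):
        label = classify(sample, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
        print(sample, "->", label)
```

In a real pipeline the weights are learned during training; the point of the sketch is only that data flows in one direction, layer by layer, with no loops back.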


This tutorial is adapted from the blog post: Deep Learning on 33,000,000 data points using a few lines of YAML

TL;DR: use the Dataflow processing backend.

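from zenml.core.backends.processing.processing_dataflow_backend import \
    ProcessingDataFlowBackend
# Assumed import path, following the zenml.core layout used above
from zenml.core.pipelines.training_pipeline import TrainingPipeline

training_pipeline = TrainingPipeline(name='distributed_dataflow')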

# add steps

# configure steps
processing_backend = ProcessingDataFlowBackend(project='GCP_PROJECT')

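# Run the pipeline with the Dataflow backend
# (assumes run() accepts a list of backends in your ZenML version)
training_pipeline.run(backends=[processing_backend])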
