Training an ML classifier on 59M samples

In this tutorial, we’ll walk step by step through building a simple feedforward classifier trained on a public BigQuery datasource.
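For readers new to the term, here is a minimal sketch of what a "simple feedforward classifier" computes at prediction time: one hidden layer with a ReLU activation, followed by a single sigmoid output. It uses only the Python standard library and is not the ZenML trainer step. The layer sizes, the random weights, and the `sample` features are all hypothetical, chosen purely for illustration.

```python
import math
import random


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Squash a logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Hypothetical initialization: small uniform weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def predict_proba(features, hidden_layer, output_layer):
    """Forward pass: features -> hidden (ReLU) -> single logit -> sigmoid."""
    hidden = relu(dense(features, *hidden_layer))
    logit = dense(hidden, *output_layer)[0]
    return sigmoid(logit)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden_layer = init_layer(n_in=4, n_out=8, rng=rng)
output_layer = init_layer(n_in=8, n_out=1, rng=rng)

sample = [0.2, -1.3, 0.7, 0.05]  # hypothetical, already-normalized features
probability = predict_proba(sample, hidden_layer, output_layer)
label = int(probability >= 0.5)
print(f"p(positive) = {probability:.3f}, predicted label = {label}")
```

The weights here are untrained, so the prediction is meaningless. Training adjusts the weights to fit labeled data, which is where the size of the dataset starts to matter.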

Note

This tutorial is adapted from the blog post: Deep Learning on 33,000,000 data points using a few lines of YAML

TL;DR: Use the Dataflow Processing Backend, as follows:

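from zenml.core.backends.processing.processing_dataflow_backend import \
    ProcessingDataFlowBackend
from zenml.core.pipelines.training_pipeline import TrainingPipeline

training_pipeline = TrainingPipeline(name='distributed_dataflow')

# Add steps
...

# Configure the processing backend
# ('GCP_PROJECT' is a placeholder for your Google Cloud project ID)
processing_backend = ProcessingDataFlowBackend(project='GCP_PROJECT')

# Run the pipeline on the Dataflow backend
training_pipeline.run(
    backends=[processing_backend],
)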

The full code example can be found here.

A detailed tutorial will follow. Check the GitHub repo for updates!