Training an ML classifier on 59M samples
In this tutorial, we’ll go through the step-by-step process of building a simple feedforward classifier trained on a public BigQuery datasource.
Note
This tutorial is adapted from the blog post: Deep Learning on 33,000,000 data points using a few lines of YAML
TL;DR: Use the Dataflow processing backend to run the pipeline's data processing on Google Cloud Dataflow.
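from zenml.core.backends.processing.processing_dataflow_backend import \
    ProcessingDataFlowBackend
# Assumed import path for TrainingPipeline, mirroring the zenml.core layout above
from zenml.core.pipelines.training_pipeline import TrainingPipeline

training_pipeline = TrainingPipeline(name='distributed_dataflow')

# add steps
...

# configure the Dataflow processing backend
# (replace 'GCP_PROJECT' with your own GCP project ID)
processing_backend = ProcessingDataFlowBackend(project='GCP_PROJECT')

# Run the pipeline
training_pipeline.run(
    backends=[processing_backend],
)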
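The 'GCP_PROJECT' string above is a placeholder for a real project ID. One possible way to avoid hard-coding it is to read the ID from the environment. The sketch below is only an illustration: `resolve_gcp_project` is a hypothetical helper, not part of ZenML, and it assumes the project ID is exported in the `GOOGLE_CLOUD_PROJECT` environment variable.

```python
import os


def resolve_gcp_project(default=None):
    """Return a GCP project ID from the environment, else `default`.

    Hypothetical helper for illustration only; not part of ZenML.
    Assumes the project ID is exported as GOOGLE_CLOUD_PROJECT.
    """
    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if project:
        return project
    if default:
        return default
    # Fail early instead of submitting a job with an empty project ID.
    raise ValueError(
        "No GCP project configured: set GOOGLE_CLOUD_PROJECT or pass a default."
    )


# Possible usage with the backend shown above (left commented out because it
# needs ZenML installed and valid GCP credentials):
# processing_backend = ProcessingDataFlowBackend(project=resolve_gcp_project())
```

Failing early with a clear error keeps a missing project ID from surfacing later as a harder-to-read failure when the pipeline is run.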
The full code example can be found here.
Detailed tutorial to follow! Check out the GitHub repo to get updates!