# Quickstart
Get up and running in (almost) three steps. This quickstart walks you through a simple pipeline built from built-in steps and a very simple model. Once you are done, please make sure to also check out the advanced concepts.
> **Hint:** If you are here just to see the code, you can find it on GitHub.
## For visual learners
If you don’t feel like reading right now, please watch this video for a visual explanation of the quickstart:
## Step 0: Installation
ZenML is available for easy installation into your environment via PyPI:
```bash
pip install zenml
```
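To quickly verify the installation, you can check that the package imports cleanly. Printing `zenml.__version__` follows a common Python packaging convention, but that attribute name is an assumption here; a clean import alone already confirms the install worked:

```bash
# Sanity check: a clean import means the install worked.
# The __version__ attribute is an assumption (common convention, not guaranteed).
python -c "import zenml; print(zenml.__version__)"
```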
Alternatively, if you're feeling brave, feel free to install the bleeding edge from GitHub. NOTE: Do so at your own risk; no guarantees given!
```bash
pip install git+https://github.com/maiot-io/zenml.git@main --upgrade
```
## Step 1: Initialize a ZenML repo from within a git repo
```bash
zenml init
```
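As the heading implies, `zenml init` must be run from inside an existing git repository. If you are starting from scratch, a minimal sequence might look like this (the directory name is just an example):

```bash
# Create and enter a fresh git repository (zenml init requires one)
git init zenml_quickstart && cd zenml_quickstart

# Initialize ZenML; this sets up the repo config ZenML uses to track pipelines
zenml init
```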
## Step 2: Assemble, run and evaluate your pipeline locally
```python
from zenml.core.datasources.csv_datasource import CSVDatasource
from zenml.core.pipelines.training_pipeline import TrainingPipeline
from zenml.core.steps.evaluator.tfma_evaluator import TFMAEvaluator
from zenml.core.steps.split.random_split import RandomSplit
from zenml.core.steps.preprocesser.standard_preprocesser.standard_preprocesser import StandardPreprocesser
from zenml.core.steps.trainer.tensorflow_trainers.tf_ff_trainer import FeedForwardTrainer

training_pipeline = TrainingPipeline(name='Quickstart')

# Add a datasource. This will automatically track and version it.
ds = CSVDatasource(name='Pima Indians Diabetes Dataset',
                   path='gs://zenml_quickstart/diabetes.csv')
training_pipeline.add_datasource(ds)

# Add a random 70/30 train-eval split
training_pipeline.add_split(RandomSplit(split_map={'train': 0.7, 'eval': 0.3}))

# StandardPreprocesser() has sane defaults for normal preprocessing methods
training_pipeline.add_preprocesser(
    StandardPreprocesser(
        features=['times_pregnant', 'pgc', 'dbp', 'tst', 'insulin', 'bmi',
                  'pedigree', 'age'],
        labels=['has_diabetes'],
        overwrite={'has_diabetes': {
            'transform': [{'method': 'no_transform', 'parameters': {}}]}}
    ))

# Add a trainer
training_pipeline.add_trainer(FeedForwardTrainer(
    loss='binary_crossentropy',
    last_activation='sigmoid',
    output_units=1,
    metrics=['accuracy'],
    epochs=20))

# Add an evaluator
training_pipeline.add_evaluator(
    TFMAEvaluator(slices=[['has_diabetes']],
                  metrics={'has_diabetes': ['binary_crossentropy',
                                            'binary_accuracy']}))

# Run the pipeline locally
training_pipeline.run()
```
## Step 3: Leverage powerful integrations
```python
# See schema of data
training_pipeline.view_schema()

# See statistics of train and eval
training_pipeline.view_statistics()

# Creates a notebook for evaluation
training_pipeline.evaluate()
```
Of course, each of these steps can be extended quite easily to accommodate more complex scenarios and use-cases. There is a steadily-growing number of integrations available, for example, Google Dataflow for distributed preprocessing or Google Cloud AI Platform as a [training backend](../backends/training-backends.md).
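As a rough illustration of how such an integration might be swapped in, the sketch below passes a distributed processing backend to the pipeline run. This is a hypothetical example: the import path, class name, and arguments are assumptions, so please consult the backends documentation for the actual API.

```python
# Hypothetical sketch -- import path, class name, and arguments are assumptions.
# The idea: the pipeline definition stays unchanged; only the backend that
# executes preprocessing is swapped for a distributed one (Google Dataflow).
from zenml.core.backends.processing.processing_dataflow_backend import \
    ProcessingDataFlowBackend

training_pipeline.run(
    backends=[ProcessingDataFlowBackend(project='my-gcp-project')])
```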
## What to do next?
- Read about the core concepts of ZenML.
- Convert your legacy code-base to ZenML pipelines.
- Gain a deeper understanding of what makes a ZenML Repository.
- See what ZenML has to offer with standard, powerful abstractions like Pipelines, Steps, Datasources and Backends. If the standard ones don't fit your needs, you can also create custom logic with ZenML.