Module core.backends.training.training_gcaip_backend¶
Definition of the GCAIP Training Backend
Classes¶
SingleGPUTrainingGCAIPBackend(project: str, job_dir: str, gpu_type: str = 'K80', machine_type: str = 'n1-standard-4', image: str = 'eu.gcr.io/maiot-zenml/zenml:cuda-0.2.0', job_name: str = 'train_1611569655', region: str = 'europe-west1', python_version: str = '3.7', max_running_time: int = 7200, **kwargs)
: Runs a TrainerStep on Google Cloud AI Platform.
A training backend can be used to efficiently train a machine learning
model on large amounts of data. This triggers a Training job on the Google
Cloud AI Platform service: https://cloud.google.com/ai-platform.
This backend is meant for training jobs that use a single GPU only. The user
can choose from three GPU types, defined in the GCPGPUTypes enum.
An opinionated wrapper around a GCAIP training job.
Args:
project: The GCP project in which to run the job.
job_dir: A GCS bucket path where job metadata is stored during training.
gpu_type: The type of GPU to use.
machine_type: The type of machine to use. This must conform to
the GCP compatibility matrix for the chosen gpu_type. See details
here: https://cloud.google.com/ai-platform/training/docs/using-gpus#compute-engine-machine-types-with-gpu
image: The Docker image with which to run the job.
job_name: The name of the job.
region: The GCP region to run the job in.
python_version: The Python version for the job.
max_running_time: The maximum running time of the job in seconds.
**kwargs: Additional keyword arguments.
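A minimal usage sketch, assuming the legacy ZenML pipeline API; the project ID, bucket path, and GPU/machine choices below are placeholders, not values from this module:

```python
from zenml.core.backends.training.training_gcaip_backend import \
    SingleGPUTrainingGCAIPBackend

# All argument values below are placeholders for illustration.
backend = SingleGPUTrainingGCAIPBackend(
    project='my-gcp-project',           # GCP project that runs and bills the job
    job_dir='gs://my-bucket/staging',   # bucket for job metadata
    gpu_type='V100',                    # one of the GCPGPUTypes values
    machine_type='n1-standard-8',       # must be compatible with the chosen GPU
    region='europe-west1',
)

# The backend is then attached to a TrainerStep when assembling a training
# pipeline, e.g. trainer_step.with_backend(backend) in the legacy ZenML API
# (step and pipeline objects are assumed, not shown here).
```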
### Ancestors (in MRO)
* zenml.core.backends.training.training_local_backend.TrainingLocalBackend
* zenml.core.backends.base_backend.BaseBackend
### Class variables
`BACKEND_TYPE`
:
### Methods
`get_custom_config(self)`
: Return a dict to be passed as a custom_config to the Trainer.
`get_executor_spec(self)`
: Return a TFX Executor spec for the Trainer Component.
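For context, a sketch of how these two hooks are typically consumed; the comments describe the general shape of the return values rather than their exact contents:

```python
backend = SingleGPUTrainingGCAIPBackend(
    project='my-gcp-project',           # placeholder project ID
    job_dir='gs://my-bucket/staging',   # placeholder staging bucket
)

# Dict of GCAIP job settings handed to the TFX Trainer as `custom_config`.
custom_config = backend.get_custom_config()

# TFX executor spec that makes the Trainer component run on GCAIP.
executor_spec = backend.get_executor_spec()
```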