Module core.pipelines.data_pipeline

Pipeline to create data sources

Classes

DataPipeline(name: str, enable_cache: Union[bool, NoneType] = True, steps_dict: Dict[str, zenml.core.steps.base_step.BaseStep] = None, backends_dict: Dict[str, zenml.core.backends.base_backend.BaseBackend] = None, metadata_store: Union[zenml.core.metadata.metadata_wrapper.ZenMLMetadataStore, NoneType] = None, artifact_store: Union[zenml.core.repo.artifact_store.ArtifactStore, NoneType] = None, datasource: Union[zenml.core.datasources.base_datasource.BaseDatasource, NoneType] = None, pipeline_name: Union[str, NoneType] = None, *args, **kwargs) : DataPipeline definition to create datasources.

A DataPipeline is used to create datasources in ZenML. Each data pipeline
creates a snapshot of the datasource at a point in time. Datasources are
then consumed by other ZenML pipelines, such as the TrainingPipeline.

Construct a base pipeline. This is a base interface that is meant
to be overridden by more specific pipelines.

Args:
    name: Outward-facing name of the pipeline.
    pipeline_name: A unique name that identifies the pipeline after
     it is run.
    enable_cache: Boolean, indicates whether or not caching
     should be used.
    steps_dict: Optional dict of steps.
    backends_dict: Optional dict of backends.
    metadata_store: Configured metadata store. If None,
     the default metadata store is used.
    artifact_store: Configured artifact store. If None,
     the default artifact store is used.
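
The fallback behaviour described for `metadata_store` and `artifact_store` can be sketched as follows. This is a minimal, self-contained illustration that does not import ZenML: `SketchPipeline`, `DEFAULT_METADATA_STORE`, and `DEFAULT_ARTIFACT_STORE` are hypothetical stand-ins, and the real constructor may resolve its defaults differently.

```python
from typing import Any, Dict, Optional

# Hypothetical placeholders standing in for whatever defaults ZenML resolves.
DEFAULT_METADATA_STORE = "default-metadata-store"
DEFAULT_ARTIFACT_STORE = "default-artifact-store"


class SketchPipeline:
    """Stand-in that mirrors the documented argument handling only."""

    def __init__(
        self,
        name: str,
        enable_cache: Optional[bool] = True,
        steps_dict: Optional[Dict[str, Any]] = None,
        backends_dict: Optional[Dict[str, Any]] = None,
        metadata_store: Optional[Any] = None,
        artifact_store: Optional[Any] = None,
    ) -> None:
        self.name = name
        self.enable_cache = enable_cache
        # Copy into fresh dicts so instances never share mutable state.
        self.steps_dict = dict(steps_dict or {})
        self.backends_dict = dict(backends_dict or {})
        # None means "fall back to the default store".
        self.metadata_store = (
            DEFAULT_METADATA_STORE if metadata_store is None else metadata_store
        )
        self.artifact_store = (
            DEFAULT_ARTIFACT_STORE if artifact_store is None else artifact_store
        )


# Only the artifact store is overridden; the metadata store falls back.
pipeline = SketchPipeline(name="my_data_pipeline", artifact_store="custom-store")
```

Using `None` as the sentinel, rather than a concrete default object in the signature, keeps the default lookup inside the constructor so it can be resolved at call time.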

### Ancestors (in MRO)

* zenml.core.pipelines.base_pipeline.BasePipeline

### Class variables

`PIPELINE_TYPE`
:

### Methods

`get_default_backends(self) ‑> Dict`
:   Gets the default backends for this pipeline as a dict.

`get_tfx_component_list(self, config: Dict[str, Any]) ‑> List`
:   Creates a data pipeline out of TFX components.
    
    A data pipeline ingests data from a configured source, e.g. local
    files or cloud storage. A schema and statistics are then computed
    for the processed data points.
    
    Args:
        config: Dict. Contains a ZenML configuration used to build the
         data pipeline.
    
    Returns:
        A list of TFX components making up the data pipeline.
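
The shape of the returned list can be sketched without TFX installed. In the sketch below, `SketchComponent`, `sketch_component_list`, and the `"source"` config key are all hypothetical; deriving the schema from the statistics is an assumption that mirrors a common TFX arrangement, not something this page confirms.

```python
from typing import Any, Dict, List, NamedTuple, Tuple


class SketchComponent(NamedTuple):
    """Hypothetical stand-in for a TFX component."""

    name: str
    inputs: Tuple[str, ...]


def sketch_component_list(config: Dict[str, Any]) -> List[SketchComponent]:
    """Build an ordered component list: ingest, then statistics, then schema."""
    source = config["source"]  # hypothetical config key
    ingest = SketchComponent("ingest", (source,))
    # Statistics are computed over the ingested data points.
    statistics = SketchComponent("statistics", (ingest.name,))
    # Assumption: the schema is inferred from the computed statistics.
    schema = SketchComponent("schema", (statistics.name,))
    return [ingest, statistics, schema]


components = sketch_component_list({"source": "local_files"})
print([component.name for component in components])
```

Each component names its upstream input, so the list order matches the dependency order.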

`steps_completed(self) ‑> bool`
:   Returns True if all steps are complete; otherwise raises an exception.
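
The return-or-raise contract of `steps_completed` can be sketched as below. This assumes that "complete" means every required step has been registered in the steps dict. `SketchStepsPipeline`, `required_steps`, and the choice of `ValueError` are hypothetical; the page does not state which exception type is raised.

```python
from typing import Any, Dict, List


class SketchStepsPipeline:
    """Stand-in illustrating a check that returns True or raises."""

    def __init__(self, required_steps: List[str], steps_dict: Dict[str, Any]) -> None:
        self.required_steps = required_steps
        self.steps_dict = steps_dict

    def steps_completed(self) -> bool:
        # Collect every required step that has not been registered.
        missing = [key for key in self.required_steps if key not in self.steps_dict]
        if missing:
            # Assumption: the exception type here is illustrative only.
            raise ValueError(f"Missing required steps: {missing}")
        return True


complete = SketchStepsPipeline(["data"], {"data": object()})
incomplete = SketchStepsPipeline(["data"], {})
```

Because the method never returns False, callers can use it as a guard before running the pipeline instead of branching on its result.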

`view_schema(self)`
:   View schema of data flowing in pipeline.

`view_statistics(self, magic: bool = False)`
:   View statistics for data pipeline in HTML.
    
    Args:
        magic (bool): If False, creates an HTML page; if True,
         creates a notebook cell.