Module core.steps.data.csv_data_step

Base interface for CSV Data Step

Functions

read_files_from_disk(pipeline: apache_beam.pipeline.Pipeline, base_path: str) ‑> apache_beam.pvalue.PCollection : The Beam PTransform used to read data from a collection of CSV files on a local file system.

Args:
    pipeline: Input beam.Pipeline object coming from a TFX Executor.
    base_path: Base path pointing either to the directory containing the
     CSV files, or to a (single) CSV file.

Returns:
    A beam.PCollection of data points. Each row in the collection of
     CSV files represents a single data point.
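
The `base_path` contract described above (either a directory of CSV files or a single CSV file, with one data point per row) can be illustrated with a minimal standard-library sketch. This is not the Beam-based implementation of `read_files_from_disk`. The helper names `resolve_csv_files` and `iter_data_points` are hypothetical. The sketch also assumes that each file has a header row and that only files ending in `.csv` directly inside the directory are read.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterator, List


def resolve_csv_files(base_path: str) -> List[Path]:
    """Return the CSV files that base_path refers to (hypothetical helper)."""
    path = Path(base_path)
    if path.is_dir():
        # Directory case: pick up every .csv file directly inside it.
        return sorted(p for p in path.iterdir() if p.suffix == ".csv")
    # Single-file case: the path itself is treated as the CSV file.
    return [path]


def iter_data_points(base_path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keyed by the header of its file."""
    for csv_file in resolve_csv_files(base_path):
        with csv_file.open(newline="") as handle:
            # Assumption: the first line of every file is a header row.
            yield from csv.DictReader(handle)


# Usage: build two small CSV files, then read them both ways.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("id,value\n1,x\n2,y\n")
    Path(tmp, "b.csv").write_text("id,value\n3,z\n")

    rows = list(iter_data_points(tmp))                        # directory
    single = list(iter_data_points(str(Path(tmp, "b.csv"))))  # single file

print(len(rows), len(single))
```

In the actual function the rows end up in a `beam.PCollection` rather than a Python list, so this sketch only mirrors the path handling and the row-per-data-point shape.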

Classes

CSVDataStep(path, schema: Dict = None) : CSV data step to load tabular data from local disk or a remote cloud storage bucket.

CSV data step constructor.

Args:
    path: Base path pointing either to the directory containing the
     CSV files, or to a (single) CSV file.
    schema: Optional schema providing data type information about the
     data source.

### Ancestors (in MRO)

* zenml.core.steps.data.base_data_step.BaseDataStep
* zenml.core.steps.base_step.BaseStep

### Methods

`read_from_source(self)`
: