Module core.steps.data.csv_data_step
Base interface for the CSV data step.
Functions
read_files_from_disk(pipeline: apache_beam.pipeline.Pipeline, base_path: str) ‑> apache_beam.pvalue.PCollection
: The Beam PTransform used to read data from a collection of CSV files
on a local file system.
Args:
pipeline: Input beam.Pipeline object coming from a TFX Executor.
base_path: Base path pointing either to the directory containing the
CSV files, or to a (single) CSV file.
Returns:
A beam.PCollection of data points. Each row in the collection of
CSV files represents a single data point.
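
The `base_path` contract above (a directory of CSV files, or a single CSV file) can be illustrated with a minimal standard-library sketch. This is not the Beam transform itself: `resolve_csv_paths` and `read_rows` are hypothetical helper names, and matching only `*.csv` files directly inside the directory is an assumption made for this example.

```python
import csv
import glob
import os
from typing import Dict, Iterator, List


def resolve_csv_paths(base_path: str) -> List[str]:
    """Return CSV file paths for a directory or a single-file base path.

    Hypothetical helper: the '*.csv' pattern is an assumption of this sketch.
    """
    if os.path.isdir(base_path):
        # Directory: collect the *.csv files directly inside it, sorted for a stable order.
        return sorted(glob.glob(os.path.join(base_path, "*.csv")))
    # Otherwise treat the base path as a single CSV file.
    return [base_path]


def read_rows(base_path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row (one data point), keyed by that file's header."""
    for path in resolve_csv_paths(base_path):
        with open(path, newline="") as handle:
            yield from csv.DictReader(handle)
```

Each yielded dict corresponds to one data point in the sense of the `Returns` description; all values are still strings at this stage.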
Classes
CSVDataStep(path, schema: Dict = None)
: CSV data step that loads tabular data from local disk or from a remote
cloud storage bucket.
CSV data step constructor.
Args:
path: Base path pointing either to the directory containing the
CSV files, or to a (single) CSV file.
schema: Optional schema providing data type information about the
data source.
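
The exact layout of `schema` is not specified here. As a rough illustration of what "data type information" could mean for CSV rows, the sketch below assumes a hypothetical mapping from column name to a Python callable that converts the raw string. `apply_schema` is an illustrative helper, not part of this module.

```python
from typing import Any, Callable, Dict, Optional


def apply_schema(
    row: Dict[str, str],
    schema: Optional[Dict[str, Callable[[str], Any]]] = None,
) -> Dict[str, Any]:
    """Cast the string values of one CSV row using a column -> type mapping.

    Assumption: the schema maps column names to converters such as int or float.
    Columns missing from the schema keep their original string value.
    """
    if not schema:
        # No schema given: return an unmodified copy of the row.
        return dict(row)
    return {
        column: schema[column](value) if column in schema else value
        for column, value in row.items()
    }
```

With no schema, every value stays a string, which mirrors `schema` being optional in the constructor.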
### Ancestors (in MRO)
* zenml.core.steps.data.base_data_step.BaseDataStep
* zenml.core.steps.base_step.BaseStep
### Methods
`read_from_source(self)`
: