Module core.steps.data.image_data_step
Base interface for Image Data Step
Functions
ReadImagesFromDisk(pipeline: apache_beam.pipeline.Pipeline, base_path: str) ‑> apache_beam.pvalue.PCollection
: The Beam PTransform used to load a collection of images and metadata
from a local file system or a remote cloud storage bucket.
Args:
pipeline (beam.Pipeline): Input beam.Pipeline object coming
from a TFX Executor.
base_path (Text): Base directory containing images and labels.
SplitByFileName(element: Dict[str, Any], num_partitions: int) ‑> int
: Partition function passed to beam.Partition to identify the label file
in the PCollection of input files.
Args:
element: Dict with image features.
num_partitions (int): Number of partitions, unused.
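A partition function of this shape can be sketched in plain Python. This is an illustrative sketch, not the module's implementation: the `filename` key and the `.json` suffix check are assumptions, as is the name `split_by_file_name`.

```python
from typing import Any, Dict


def split_by_file_name(element: Dict[str, Any], num_partitions: int) -> int:
    """Return 1 for the label file and 0 for every other (image) file.

    `num_partitions` is part of the callback signature that beam.Partition
    expects, but it is not used here.
    """
    # ASSUMPTION: the file name is stored under the key "filename" and the
    # label file is the only file ending in ".json".
    file_name = str(element.get("filename", ""))
    return 1 if file_name.endswith(".json") else 0
```

With Apache Beam installed, such a function would typically be applied as `images, labels = files | beam.Partition(split_by_file_name, 2)`, where `files` is a hypothetical PCollection of file dicts.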
add_label_and_metadata(image_dict: Dict[str, Any], label_dict: Dict[str, Any])
: Add label and metadata information to an image.
Args:
image_dict: Dict with image features.
label_dict (Text): JSON-readable string with label information.
Returns:
image_dict: Updated image feature dict with label and metadata
information.
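The merge described here might look roughly like the following. It is a sketch under assumptions: the signature annotates `label_dict` as a dict while the argument description calls it a JSON-readable string, so the sketch accepts either; the `filename`, `label`, and `metadata` keys on the image dict are hypothetical.

```python
import json
from typing import Any, Dict, Union


def add_label_and_metadata_sketch(
    image_dict: Dict[str, Any],
    label_dict: Union[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of `image_dict` extended with its label and metadata."""
    # Accept either a JSON string or an already parsed dict.
    records = json.loads(label_dict) if isinstance(label_dict, str) else label_dict

    # ASSUMPTION: the image file name is stored under the key "filename".
    record = records[image_dict["filename"]]

    updated = dict(image_dict)  # leave the input dict untouched
    updated["label"] = record["label"]
    updated["metadata"] = record.get("metadata", {})
    return updated
```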
get_matching_label(label_data: str, img_filename: str)
: Get a label matching an image file name from a JSON-readable label file.
Args:
label_data (Text): Label string, needs to be JSON-readable.
img_filename (Text): File name of the image.
Returns:
label: Label key of the image.
metadata: Dict, additional metadata information.
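The lookup can be illustrated with the standard `json` module. This is a minimal sketch that assumes the label file has the structure documented for `ImageDataStep` (one record per image file name, each with a `label` and a `metadata` key); the function name is hypothetical and the real implementation may differ.

```python
import json
from typing import Any, Dict, Tuple


def get_matching_label_sketch(
    label_data: str, img_filename: str
) -> Tuple[Any, Dict[str, Any]]:
    """Look up the label and metadata recorded for one image file name."""
    records = json.loads(label_data)
    # Raises KeyError if the image has no entry in the label file.
    record = records[img_filename]
    # ASSUMPTION: a missing "metadata" key is treated as empty metadata.
    return record["label"], record.get("metadata", {})


label_data = '{"img123.jpg": {"label": 0, "metadata": {"height": 256, "width": 256}}}'
label, metadata = get_matching_label_sketch(label_data, "img123.jpg")
```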
read_file_content(file: apache_beam.io.fileio.ReadableFile)
: Read contents from a file handle in binary and return it along with some
file metadata as a dict.
Args:
file (beam.io.fileio.ReadableFile): Beam ReadableFile object,
corresponds to an image file read from disk.
Returns:
data_dict: Dict with binary data and file metadata.
Classes
ImageDataStep(base_path, schema: Dict = None)
: Image data step used to load and process a collection of images along with
additional labels and metadata.
Image data step constructor. Use this data step in image
classification tasks with a single, scalar label.
This data step expects a directory containing the images as input,
along with a single, JSON-readable file containing the
two keys `label` and `metadata` for each image file in the directory.
In these keys, you can store additional label and metadata information
like date, copyright or GPS tags.
The entries of the JSON file should look like this:
# Single JSON record of an image in the base directory called img123.jpg ::
{"img123.jpg": {
    "label": 0,
    "metadata": {
        "height": 256,
        "width": 256,
        "num_channels": 3
        }
    }
}
Note that the label and metadata have to be present in the same
single file for all the images in the folder.
Args:
base_path: Base directory containing the images and the label file.
schema: Optional schema providing data type information about the
data source.
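A label file in the format described above could be produced with the standard library, for example as follows. This is only a sketch: the file name `labels.json` and the helper `write_label_file` are assumptions, since the source does not state what the label file must be called.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_label_file(
    base_path: str, records: Dict[str, Dict[str, Any]], file_name: str = "labels.json"
) -> str:
    """Write all image records into one JSON file inside `base_path`."""
    # ASSUMPTION: "labels.json" is a placeholder name, not a documented one.
    path = os.path.join(base_path, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    return path


# One record per image file, each with a `label` and a `metadata` key.
records = {
    "img123.jpg": {
        "label": 0,
        "metadata": {"height": 256, "width": 256, "num_channels": 3},
    },
    "img124.jpg": {
        "label": 1,
        "metadata": {"height": 256, "width": 256, "num_channels": 3},
    },
}

with tempfile.TemporaryDirectory() as base_path:
    label_path = write_label_file(base_path, records)
    # Read the file back to confirm it is valid JSON.
    with open(label_path, encoding="utf-8") as handle:
        loaded = json.load(handle)
```

The resulting directory (images plus the single label file) is what would be passed as `base_path` to `ImageDataStep`.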
### Ancestors (in MRO)
* zenml.core.steps.data.base_data_step.BaseDataStep
* zenml.core.steps.base_step.BaseStep
### Methods
`read_from_source(self)`
: