Module core.steps.data.image_data_step

Base interface for Image Data Step

Functions

ReadImagesFromDisk(pipeline: apache_beam.pipeline.Pipeline, base_path: str) ‑> apache_beam.pvalue.PCollection : The Beam PTransform used to load a collection of images and metadata from a local file system or a remote cloud storage bucket.

Args:
    pipeline (beam.Pipeline): Input beam.Pipeline object coming
     from a TFX Executor.
    base_path (Text): Base directory containing images and labels.

SplitByFileName(element: Dict[str, Any], num_partitions: int) ‑> int : Helper function used with beam.Partition to identify the label file in the PCollection of input files.

Args:
    element: Dict with image features.
    num_partitions (int): Number of partitions, unused.
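
A partition function of this shape could look roughly like the sketch below. It is not the module's implementation: the function name `split_by_file_name`, the `filename` key and the `.json` suffix check are all assumptions made for illustration.

```python
from typing import Any, Dict


def split_by_file_name(element: Dict[str, Any], num_partitions: int) -> int:
    """Return 1 for the label file and 0 for image files.

    Hypothetical stand-in for SplitByFileName. The "filename" key and the
    ".json" suffix test are assumptions, not taken from the module.
    """
    # num_partitions is part of the partition-function signature,
    # but it is not used to make the decision.
    filename = str(element.get("filename", ""))
    return 1 if filename.lower().endswith(".json") else 0


files = [
    {"filename": "img123.jpg"},
    {"filename": "labels.json"},
]
partition_indices = [split_by_file_name(f, 2) for f in files]
```

With two partitions, index 0 would collect the images and index 1 the label file.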

add_label_and_metadata(image_dict: Dict[str, Any], label_dict: Dict[str, Any]) : Add label and metadata information to an image.

Args:
    image_dict: Dict with image features.
    label_dict: Dict with JSON-readable label information.

Returns:
    image_dict: Updated image feature dict with label and metadata
     information.
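
The merge described above might be sketched as follows. This is an illustration under stated assumptions, not the module's code: the function name, the `filename` key on the image dict and the `data` key on the label dict are hypothetical.

```python
import json
from typing import Any, Dict


def add_label_and_metadata_sketch(image_dict: Dict[str, Any],
                                  label_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical illustration: attach label and metadata to an image dict.

    Assumes image_dict["filename"] names the image and label_dict["data"]
    holds the JSON-readable label string. Both keys are assumptions.
    """
    records = json.loads(label_dict["data"])
    record = records[image_dict["filename"]]
    # Work on a copy so the caller's dict is not mutated.
    updated = dict(image_dict)
    updated["label"] = record["label"]
    updated["metadata"] = record["metadata"]
    return updated


image = {"filename": "img123.jpg", "image": b"\x89PNG"}
labels = {"data": '{"img123.jpg": {"label": 0, "metadata": {"height": 256}}}'}
result = add_label_and_metadata_sketch(image, labels)
```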

get_matching_label(label_data: str, img_filename: str) : Get a label matching an image file name from a JSON-readable label file.

Args:
    label_data (Text): Label string, needs to be JSON-readable.
    img_filename (Text): File name of the image.

Returns:
    label: Label key of the image.
    metadata: Dict, additional metadata information.
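
A lookup like this can be sketched with the standard `json` module. The function name below is hypothetical, and the record layout (a top-level key per image file name, each holding `label` and `metadata`) follows the JSON example in the `ImageDataStep` description rather than the module's source.

```python
import json
from typing import Any, Dict, Tuple


def get_matching_label_sketch(label_data: str,
                              img_filename: str) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical illustration: find the record for one image file name."""
    records = json.loads(label_data)
    # Raises KeyError if the image has no entry in the label file.
    record = records[img_filename]
    return record["label"], record.get("metadata", {})


label_data = json.dumps({
    "img123.jpg": {
        "label": 0,
        "metadata": {"height": 256, "width": 256, "num_channels": 3},
    }
})
label, metadata = get_matching_label_sketch(label_data, "img123.jpg")
```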

read_file_content(file: apache_beam.io.fileio.ReadableFile) : Read contents from a file handle in binary and return it along with some file metadata as a dict.

Args:
    file (beam.io.fileio.ReadableFile): Beam ReadableFile object
     corresponding to an image file read from disk.

Returns:
    data_dict: Dict with binary data and file metadata.
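
As a rough sketch of what such a reader might do: the version below relies only on `file.read()` and `file.metadata.path`, which Beam's `ReadableFile` provides. The function name and the output keys `image` and `filename` are assumptions, and a stand-in object replaces the real file handle so the example runs without Apache Beam installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def read_file_content_sketch(file: Any) -> Dict[str, Any]:
    """Hypothetical illustration of reading a file handle into a dict.

    The output keys are assumptions, not taken from the module.
    """
    return {
        "image": file.read(),            # raw bytes of the file
        "filename": file.metadata.path,  # path reported by the file metadata
    }


# Stand-in for a beam.io.fileio.ReadableFile (illustration only).
fake_file = SimpleNamespace(
    read=lambda: b"\xff\xd8\xff",
    metadata=SimpleNamespace(path="/data/images/img123.jpg"),
)
data_dict = read_file_content_sketch(fake_file)
```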

Classes

ImageDataStep(base_path, schema: Dict = None) : Image data step used to load and process a collection of images along with additional labels and metadata.

Image data step constructor. Use this data step in image
classification tasks with a single, scalar label.

This data step expects as input a directory containing the images,
along with a single JSON-readable file that holds the two keys `label`
and `metadata` for each image file in the directory. These keys can
hold additional label and metadata information such as date, copyright
or GPS tags.

The entries of the JSON file should look like this:

Single JSON record for an image called img123.jpg in the base directory:

     {"img123.jpg": {
        "label": 0,
        "metadata": {
            "height": 256,
            "width": 256,
            "num_channels": 3
        }
     }}

Note that the label and metadata have to be present in the same
single file for all the images in the folder.
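
One way to produce such a file is sketched below. The file name `labels.json`, the image names and the label values are made up for illustration; the source does not state which file name the step expects.

```python
import json
import os
import tempfile

# Hypothetical image names and labels, for illustration only.
image_names = ["img123.jpg", "img124.jpg"]
records = {
    name: {
        "label": index % 2,
        "metadata": {"height": 256, "width": 256, "num_channels": 3},
    }
    for index, name in enumerate(image_names)
}

# A temporary directory stands in for the base directory of the data step.
base_path = tempfile.mkdtemp()
label_path = os.path.join(base_path, "labels.json")  # assumed file name

# All records go into one file, one top-level key per image.
with open(label_path, "w") as handle:
    json.dump(records, handle, indent=2)

# Read the file back to confirm it is valid JSON.
with open(label_path) as handle:
    loaded = json.load(handle)
```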

Args:
    base_path: Base directory containing the images and the label file.
    schema: Optional schema providing data type information about the
     data source.

### Ancestors (in MRO)

* zenml.core.steps.data.base_data_step.BaseDataStep
* zenml.core.steps.base_step.BaseStep

### Methods

`read_from_source(self)`
: