BaseSequencerStep(statistics: tensorflow_metadata.proto.v0.statistics_pb2.DatasetFeatureStatisticsList = None, schema: tensorflow_metadata.proto.v0.schema_pb2.Schema = None, **kwargs)
: Base class for all sequencer steps. These steps are used to
specify transformation and filling operations on timeseries datasets
that occur before the data preprocessing takes place.
Base Sequencer constructor. This steps uses a beam pipeline to handle the data processing and the pipeline consists of a few steps, where the main logic for each datapoint can be summarized as follows: 1 - Add a timestamp to the datapoint 2 - Add a category key to the datapoint (optional) 3 - Split your data into sessions based on a windowing strategy 4 - Process the sessions to create sequences With the `abstractmethod`s listed below, you have the option to modify anyone of these steps. Args: statistics: Parsed statistics output of a preceding StatisticsGen. schema: Parsed schema output of a preceding SchemaGen. ### Ancestors (in MRO) * zenml.core.steps.base_step.BaseStep ### Class variables `STEP_TYPE` : ### Methods `get_category_do_fn(self)` : In ZenML, you have the option to split your data based on a categorical feature before the actual sequencing happens. This is especially helpful if you are dealing with a joint dataset (i.e dataset featuring multiple assets in the field, but you want to sequence on an asset-level) Similar to get_timestamp_do_fn, you need to implement a method, which returns an instance of a beam.DoFn class. This beam.DoFn should be responsible for extracting the category of a datapoint and add it to the datapoint and return it. For a practical example, you can check our StandardSequencer. `get_combine_fn(self)` : Once the data is split into sessions (and possibly categories too), it needs to be processed in order to extract sequences from the sessions. This method needs to return an instance of beam.CombineFn class, which processes the accumulated datapoints and extracts desired sequences. You can check out our StandardSequencer for a practical example. `get_timestamp_do_fn(self)` : The process of sequencing is highly dependent on the format of your data. For instance, the timestamp of a single datapoint can be infused within the datapoint in various shapes or forms. It is impossible to find THE one solution which would be able to parse all the different variants of timestamps and that is why we exposed this method. Through this method, you have access to all of the instance variables of your step and all you have to do is to return an instance of a beam.DoFn which returns a TimestampedValue. You can check our StandardSequencer for a practical example. `get_window(self)` : This method needs to return the desired windowing strategy for the beam pipeline.