Inspect Pipeline Runs
How to inspect a finished pipeline run
Inspecting Pipeline Runs
Each pipeline can have multiple runs associated with it, and for each run there might be several outputs for each step. Thus, to inspect a specific output, we first need to access the respective pipeline, then fetch the respective run, and then choose the step output of that specific run.
The overall hierarchy is repository, pipelines, runs, steps, outputs, where each level can contain many items of the next.
Let us investigate how to traverse this hierarchy level by level:
Repository
The highest-level Repository object is where to start from.
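A minimal sketch of obtaining that object. The import path zenml.repository is an assumption about the ZenML release this page describes, and open_repository is a hypothetical helper name used only for illustration. The import is deferred so the function can be defined even where ZenML is not installed.

```python
def open_repository():
    """Return the Repository object to start inspecting from.

    Assumption: the ZenML release described on this page exposes the class
    as zenml.repository.Repository. Adjust the import if your version differs.
    """
    # Deferred import: ZenML is only required once the function is called.
    from zenml.repository import Repository

    return Repository()
```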
Pipelines
The repository contains a collection of all created pipelines with at least one run, sorted by the time of their first run from oldest to newest.
You can either access this collection via the get_pipelines() method or query a specific pipeline by name using get_pipeline(pipeline=...).
Be careful when accessing pipelines by index. Even if you just ran a pipeline, it might not be at index -1, because pipelines are sorted by the time of their first run. Instead, access the pipeline with get_pipeline(pipeline=...), passing the pipeline class, an instance of the class, or the name of the pipeline as a string.
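The ordering pitfall can be illustrated without ZenML at all. The sketch below uses a hypothetical PipelineRecord stand-in (not a ZenML class) with made-up timestamps to show why the most recently executed pipeline need not sit at index -1, followed by a thin helper, lookup_by_name (also a hypothetical name), that uses the name-based lookup instead.

```python
from dataclasses import dataclass


@dataclass
class PipelineRecord:
    """Hypothetical stand-in for a pipeline entry; not a ZenML class."""

    name: str
    first_run: int  # time of the first run, in arbitrary ticks
    last_run: int   # time of the most recent run, in arbitrary ticks


records = [
    PipelineRecord("training", first_run=1, last_run=9),
    PipelineRecord("inference", first_run=5, last_run=6),
]

# The collection is ordered by time of first run, oldest to newest.
ordered = sorted(records, key=lambda record: record.first_run)

# "training" was executed most recently, yet ordered[-1] is "inference",
# because "inference" had the later *first* run.
most_recent = max(records, key=lambda record: record.last_run)


def lookup_by_name(repo, name):
    """Fetch a pipeline by name rather than by position in the list."""
    return repo.get_pipeline(pipeline=name)
```

Here repo is expected to be the Repository object from the previous section.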
Runs
Each pipeline can be executed many times. You can get a list of all runs using the runs attribute of a pipeline, or query a specific run by name using the get_run(run_name=...) method.
Calling pipeline.runs can currently be very slow when using remote metadata stores, as all run data needs to be transferred from the cloud to the local machine.
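A small sketch of both access paths. The helper name select_run is hypothetical, and the fallback to runs[-1] assumes that runs are ordered from oldest to newest, which this page does not state explicitly. The runs attribute is read only once, since each access may be slow with a remote metadata store.

```python
def select_run(pipeline, run_name=None):
    """Return the run called run_name, or the last run in the list.

    Assumption: pipeline.runs is ordered oldest to newest, so the last
    element is the most recent run.
    """
    if run_name is not None:
        # A direct lookup avoids listing every run.
        return pipeline.get_run(run_name=run_name)

    runs = pipeline.runs  # read once and reuse; this can be slow remotely
    if not runs:
        raise LookupError("pipeline has no runs")
    return runs[-1]
```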
Alternatively, you can also access the runs from the pipeline class/instance itself.
Each run has a collection of useful metadata which you can access:
git_sha: The Git commit SHA that the pipeline run was performed on. This is only set if the pipeline code is in a Git repository and there are no uncommitted files when running the pipeline.
status: The status of the pipeline run. There are four possible states: failed, completed, running, and cached.
runtime_configuration: Currently contains information about the schedule that was used for the run, the run_name, and the path to the file containing the pipeline.
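The three attributes can be collected into a plain dictionary for logging or comparison. This is a sketch: describe_run and ran_on_clean_commit are hypothetical helper names, and treating an unset git_sha as None is an assumption about how ZenML represents the missing value.

```python
def describe_run(run):
    """Gather the metadata attributes of a run into a dictionary."""
    return {
        "git_sha": run.git_sha,
        "status": run.status,
        "runtime_configuration": run.runtime_configuration,
    }


def ran_on_clean_commit(run):
    """Report whether a commit SHA was recorded for the run.

    Assumption: git_sha is None when the code was not in a Git repository
    or had uncommitted files at run time.
    """
    return run.git_sha is not None
```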
Steps
Within a given pipeline run, you can zoom in further on individual steps using the steps attribute or by querying a specific step using the get_step(step=...) method.
The step name refers to the pipeline attribute, not the name of the step class that implements that step for a pipeline instance.
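A sketch of fetching several steps by name. The helper fetch_steps is a hypothetical name, and the step names in the comment are illustrative rather than taken from a real pipeline.

```python
def fetch_steps(run, step_names):
    """Map each requested step name to the step returned by get_step().

    The names are the pipeline's attribute names. For a hypothetical
    pipeline declared with the parameters (step_1, step_2), ask for
    "step_1" and "step_2", not for the names of the step classes
    that were passed in.
    """
    return {name: run.get_step(step=name) for name in step_names}
```

To iterate over every step of a run instead, loop over run.steps.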
Outputs
Finally, this is how you can inspect the output of a step:
If there is only a single output, use the output attribute.
If there are multiple outputs, use the outputs attribute, which is a dictionary that can be indexed using the name of an output.
The names of the outputs can be found in the Output typing of your steps.
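Both cases can be wrapped in one small function. This is a sketch with a hypothetical helper name, select_output. It only chooses between the two attributes described above and does not load the underlying data.

```python
def select_output(step, name=None):
    """Return the single output of a step, or the output called name.

    Pass name only for steps with multiple outputs; it must match one of
    the names declared in the step's Output typing.
    """
    if name is None:
        return step.output
    return step.outputs[name]
```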
Code Example
Putting it all together, this is how we can access the output of the last step of our example pipeline from the previous sections:
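A minimal sketch of the full traversal by position. The function name last_step_output is hypothetical, and repo is expected to be the Repository object. Note the caveat from the Pipelines section: index -1 selects the pipeline with the latest first run, which is only the example pipeline if no other pipeline was first run after it. The code also assumes runs and steps are ordered so that the last element is the most recent.

```python
def last_step_output(repo):
    """Walk repository -> pipeline -> run -> step -> output by position."""
    pipeline = repo.get_pipelines()[-1]  # pipeline with the latest first run
    run = pipeline.runs[-1]              # assumed to be the most recent run
    step = run.steps[-1]                 # assumed to be the last step of the run
    return step.output                   # valid for single-output steps
```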
or alternatively:
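One alternative, sketched here with the name-based lookups described in the sections above, avoids positional indexing entirely. The function name step_output_by_name and its parameter names are hypothetical.

```python
def step_output_by_name(repo, pipeline_name, run_name, step_name,
                        output_name=None):
    """Walk the hierarchy using names at every level.

    output_name is only needed for steps with multiple outputs.
    """
    pipeline = repo.get_pipeline(pipeline=pipeline_name)
    run = pipeline.get_run(run_name=run_name)
    step = run.get_step(step=step_name)
    if output_name is None:
        return step.output
    return step.outputs[output_name]
```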