Artifact Log Store
Storing logs in your artifact store.
The Artifact Log Store is the default log store flavor that comes built-in with ZenML. It writes logs directly to your artifact store, providing a zero-configuration logging solution that works out of the box.
The Artifact Log Store is ZenML's implicit default: you don't need to register it as a flavor or add it to your stack. Whenever no log store is explicitly configured, ZenML automatically falls back to an Artifact Log Store to handle logs.
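For example, both Python logging output and plain print calls inside a step are captured automatically. A minimal sketch, assuming nothing beyond a default ZenML installation (the step and pipeline names are illustrative):

```python
# Minimal sketch: with no log store configured, ZenML's implicit
# Artifact Log Store captures this output automatically.
import logging

from zenml import pipeline, step

logger = logging.getLogger(__name__)


@step
def train() -> None:
    logger.info("Training started")      # Python logging is captured
    print("stdout is captured as well")  # so are stdout and stderr


@pipeline
def my_pipeline() -> None:
    train()


if __name__ == "__main__":
    my_pipeline()
```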
When to use it
The Artifact Log Store is ideal when:
You want logging to work without any additional configuration
You prefer to keep all your pipeline data (artifacts and logs) in one place
You don't need advanced log querying capabilities
You're getting started with ZenML and want a simple setup
How it works
The Artifact Log Store leverages OpenTelemetry's batching infrastructure while using a custom exporter that writes logs to your artifact store. Here's what happens during pipeline execution:
Log capture: All stdout, stderr, and Python logging output is captured and routed to the log store.
Batching: Logs are collected in batches using OpenTelemetry's BatchLogRecordProcessor for efficient processing.
Export: The ArtifactLogExporter writes batched logs to your artifact store as JSON-formatted log files.
Finalization: When a step completes, logs are finalized (merged if necessary) to ensure they're ready for retrieval.
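To make the batching-and-export pattern concrete, here is a heavily simplified sketch of a custom OpenTelemetry log exporter driven by a BatchLogRecordProcessor. This is not ZenML's actual ArtifactLogExporter; the NDJSONFileExporter class and the fields it serializes are illustrative assumptions:

```python
# Illustrative sketch (not ZenML internals): a custom OpenTelemetry
# exporter that appends batched log records as NDJSON lines to a file.
import json
import logging
from typing import Sequence

from opentelemetry.sdk._logs import LogData, LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import (
    BatchLogRecordProcessor,
    LogExporter,
    LogExportResult,
)


class NDJSONFileExporter(LogExporter):
    """Writes each batch of log records as newline-delimited JSON."""

    def __init__(self, path: str) -> None:
        self._path = path

    def export(self, batch: Sequence[LogData]) -> LogExportResult:
        with open(self._path, "a") as f:
            for data in batch:
                record = data.log_record
                f.write(json.dumps({
                    "message": record.body,
                    "level": record.severity_text,
                    "timestamp": record.timestamp,
                }) + "\n")
        return LogExportResult.SUCCESS

    def shutdown(self) -> None:
        pass


# Wire the exporter into a provider and route Python logging through it.
provider = LoggerProvider()
provider.add_log_record_processor(
    BatchLogRecordProcessor(NDJSONFileExporter("step.log"))
)
logging.getLogger().addHandler(LoggingHandler(logger_provider=provider))
```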
Handling different filesystem types
The Artifact Log Store handles different artifact store backends intelligently:
Mutable filesystems (local, S3, Azure): Logs are appended to a single file per step.
Immutable filesystems (GCS): Logs are written as timestamped files in a directory, then merged on finalization.
This ensures consistent behavior across all supported artifact store types.
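The two strategies can be illustrated with a short, purely hypothetical sketch (these helpers are not ZenML code; a local pathlib filesystem stands in for the artifact store backend):

```python
# Hypothetical illustration of the two write strategies described above.
import time
from pathlib import Path


def append_logs(log_file: Path, lines: list[str]) -> None:
    """Mutable backends (local, S3, Azure): append to one file per step."""
    with log_file.open("a") as f:
        f.write("\n".join(lines) + "\n")


def write_log_chunk(log_dir: Path, lines: list[str]) -> None:
    """Immutable backends (GCS): write a new timestamped file per batch."""
    chunk = log_dir / f"{time.time_ns()}.log"
    chunk.write_text("\n".join(lines) + "\n")


def finalize_logs(log_dir: Path, merged_file: Path) -> None:
    """On step completion, merge the timestamped chunks in write order."""
    parts = sorted(log_dir.glob("*.log"))  # nanosecond names sort chronologically
    merged_file.write_text("".join(p.read_text() for p in parts))
```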
Environment variables
The Artifact Log Store uses OpenTelemetry's batch processing under the hood. You can tune the batching behavior using these environment variables:
ZENML_LOGS_OTEL_MAX_QUEUE_SIZE (default: 100000): Maximum queue size for the batch log processor.
ZENML_LOGS_OTEL_SCHEDULE_DELAY_MILLIS (default: 5000): Delay between batch exports, in milliseconds.
ZENML_LOGS_OTEL_MAX_EXPORT_BATCH_SIZE (default: 5000): Maximum batch size for exports.
ZENML_LOGS_OTEL_EXPORT_TIMEOUT_MILLIS (default: 15000): Timeout for each export batch, in milliseconds.
These defaults are optimized for most use cases. You typically only need to adjust them for high-volume logging scenarios.
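For example, a high-volume pipeline might use larger batches and a shorter export delay. The values below are illustrative only and must be set in the process environment before ZenML initializes logging:

```python
# Illustrative tuning for a high-volume logging scenario.
import os

os.environ["ZENML_LOGS_OTEL_MAX_EXPORT_BATCH_SIZE"] = "10000"  # larger batches
os.environ["ZENML_LOGS_OTEL_SCHEDULE_DELAY_MILLIS"] = "2000"   # export sooner
```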
Log format
Logs are stored as newline-delimited JSON (NDJSON) files. Each log entry contains the following fields:
message: The log message content.
level: Log level (DEBUG, INFO, WARN, ERROR, CRITICAL).
timestamp: When the log entry was created.
name: The name of the logger.
filename: The source file that generated the log.
lineno: The line number in the source file.
module: The module that generated the log.
chunk_index: Index of this chunk (0 for non-chunked messages).
total_chunks: Total number of chunks (1 for non-chunked messages).
id: Unique identifier for the log entry, used to reassemble chunked messages.
For large messages (>5KB), logs are automatically split into multiple chunks with sequential chunk_index values and a shared id for reassembly.
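Given these fields, chunked messages can be stitched back together in a few lines. The reader below is a hypothetical sketch, not a ZenML utility; it assumes each NDJSON line carries the fields listed above:

```python
# Hypothetical reader that reassembles chunked NDJSON log entries.
import json
from collections import defaultdict


def read_log_entries(path: str) -> list[dict]:
    entries: list[dict] = []
    chunked: dict[str, list[dict]] = defaultdict(list)
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            if entry["total_chunks"] == 1:
                entries.append(entry)
            else:
                chunked[entry["id"]].append(entry)
    # Stitch multi-chunk messages back together in chunk_index order.
    for parts in chunked.values():
        parts.sort(key=lambda e: e["chunk_index"])
        merged = dict(parts[0])
        merged["message"] = "".join(p["message"] for p in parts)
        entries.append(merged)
    return entries
```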
Storage location
Logs are stored in the logs directory within your artifact store:
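The exact layout depends on your ZenML version and artifact store backend; a purely hypothetical illustration:

```
<artifact-store-root>/
└── logs/
    └── <pipeline-run-id>/
        └── <step-name>.log   # NDJSON log entries
```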
Best practices
Use the default: For most use cases, the automatic artifact log store is sufficient. Don't add complexity unless you need it.
Monitor storage: Logs can accumulate over time. Consider implementing log retention policies for your artifact store; a sketch follows this list.
Large log volumes: If you're generating very large log volumes, consider using a dedicated log store like Datadog for better scalability and querying.
Sensitive data: Be mindful of what you log. Avoid logging sensitive information like credentials or PII.
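As a sketch of the retention idea mentioned above: on a local artifact store you could periodically sweep old log files. The logs root and the 30-day window are assumptions; on cloud object stores, prefer native lifecycle rules (e.g. S3 or GCS object lifecycle policies) over a script:

```python
# Hypothetical retention sweep for a local artifact store.
import time
from pathlib import Path

LOGS_ROOT = Path("/path/to/artifact-store/logs")  # assumed location
MAX_AGE_SECONDS = 30 * 24 * 3600  # assumed 30-day retention window

cutoff = time.time() - MAX_AGE_SECONDS
for log_file in LOGS_ROOT.rglob("*.log"):
    if log_file.stat().st_mtime < cutoff:
        log_file.unlink()
```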
For more information and a full list of configurable attributes, check out the SDK Docs.