How to set up storage for secrets
Secrets managers provide a secure way of storing confidential information that is needed to run your ML pipelines. Most production pipelines will run on cloud infrastructure and therefore need credentials to authenticate with those services. Instead of storing these credentials in code or files, ZenML secrets managers can be used to store and retrieve these values in a secure manner.
We are deprecating secrets managers in favor of the centralized ZenML secrets store. Going forward, we recommend using the secrets store instead of secrets managers to configure and store secrets.
If you already use secrets managers to manage your secrets, please use the provided zenml secrets-manager secrets migrate CLI command to migrate your secrets to the centralized secrets store.
Managing secrets through a secrets manager stack component suffers from a number of limitations, some of which are:
- you need to configure a Secrets Manager stack component and add it to your active stack before you can register and access secrets. With centralized secrets management, you don't need to configure anything; your ZenML local deployment or ZenML server replaces the secrets manager role.
- even with a secrets manager configured in your active stack, if you are using a secrets manager flavor with a cloud back-end (e.g. AWS, GCP or Azure), you still need to configure all your ZenML clients with the authentication credentials required to access the back-end directly. This is not only inconvenient but also a security risk, because every client that holds these credentials adds to the attack surface. With centralized secrets management, you only need to configure the ZenML server to access the cloud back-end.
ZenML currently supports configuring the ZenML server to use the following back-ends as centralized secrets store replacements for secrets managers:
- the SQL database that the ZenML server is using to store other managed objects such as pipelines, stacks, etc. This is the default option and replaces the local secrets manager flavor.
- AWS Secrets Manager - replaces the aws secrets manager flavor.
- GCP Secret Manager - replaces the gcp secrets manager flavor.
- Azure Key Vault - replaces the azure secrets manager flavor.
- HashiCorp Vault - replaces the vault secrets manager flavor.
The centralized secrets store also supports using a custom back-end implementation.
There is no direct migration path planned for the GitHub secrets manager flavor, given that it can only be used inside a GitHub Actions workflow and thus is not a service that can be used as a centralized secrets store. If you are using the GitHub secrets manager flavor, you have the option of manually transferring your secrets to one of the other supported secrets store back-ends.
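The replacement mapping listed above can be summarized as a small lookup table. The following is only an illustrative sketch: the dictionary, the function name, and the exact github flavor string are assumptions made for this example and are not part of ZenML's API.

```python
from typing import Dict

# Secrets manager flavor -> centralized secrets store back-end that replaces it.
# The pairs come from the list above; the names used here are illustrative.
REPLACEMENTS: Dict[str, str] = {
    "local": "SQL database of the ZenML server (default)",
    "aws": "AWS Secrets Manager",
    "gcp": "GCP Secret Manager",
    "azure": "Azure Key Vault",
    "vault": "HashiCorp Vault",
}


def replacement_backend(flavor: str) -> str:
    """Return the secrets store back-end that replaces a secrets manager flavor."""
    if flavor == "github":  # assumed flavor string for the GitHub secrets manager
        raise ValueError(
            "No direct migration path: transfer GitHub secrets manually "
            "to one of the supported back-ends."
        )
    try:
        return REPLACEMENTS[flavor]
    except KeyError:
        raise ValueError(f"Unknown secrets manager flavor: {flavor!r}") from None


print(replacement_backend("aws"))
```

A lookup like this can help when auditing existing stacks before running the migration command, since it flags the one flavor that has to be handled by hand.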
You should include a secrets manager in your ZenML stack if any other component of your stack requires confidential information (such as authentication credentials) or you want to access secret values inside your pipeline steps.
Here is an architecture diagram that shows how remote secrets managers fit into the overall story of a remote stack. As you can see, the secrets manager is accessed from the client side as well as from the orchestrator or step operator. On the client side, the secrets manager can be used to resolve credentials for the orchestrator and container registry. Orchestrators and step operators can also query the secrets manager to get credentials for other stack components, data sources, or other environments.
Out of the box, ZenML comes with a local secrets manager that stores secrets in local files. Additional cloud secrets managers are provided by integrations:
- Local: uses local files to store secrets
- AWS: uses AWS to store secrets
- GCP: uses GCP to store secrets
- Azure: uses Azure Key Vault to store secrets
- GitHub: uses GitHub to store secrets
- HashiCorp Vault: uses HashiCorp Vault to store secrets
If you would like to see the available flavors of secrets managers, you can use the command:
zenml secrets-manager flavor list
A full guide on using the CLI interface to register, access, update and delete secrets is available here.
A ZenML secret is a grouping of key-value pairs which are defined by a schema. An AWS SecretSchema, for example, has key-value pairs for AWS_SECRET_ACCESS_KEY as well as an optional AWS_SESSION_TOKEN. If you don't specify a schema when registering a secret, ZenML will use the ArbitrarySecretSchema, a schema where arbitrary keys are allowed.
Note that there are two ways you can register or update your secrets. If you wish to do so interactively, pass the secret name as an argument together with the -i flag (as in the following example) to start an interactive prompt:
zenml secrets-manager secret register SECRET_NAME -i
If you wish to specify key-value pairs using command line arguments, you can do so instead:
zenml secrets-manager secret register SECRET_NAME --key1=value1 --key2=value2
For secret values that are too big to pass as a command line argument, or that contain special characters, you can also use the special @ syntax to indicate to ZenML that the value needs to be read from a file:
zenml secrets-manager secret register SECRET_NAME --attr_from_literal=value \
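To make the @ convention concrete, here is a minimal sketch of how such arguments could be resolved into secret values. It is not ZenML's implementation: the function name parse_secret_args and its exact behavior are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List


def parse_secret_args(args: List[str]) -> Dict[str, str]:
    """Turn ['--key=value', '--other=@file'] into a dict of secret values.

    Hypothetical helper: a value starting with '@' is treated as a file path
    and replaced by the contents of that file.
    """
    secret: Dict[str, str] = {}
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"Expected --key=value, got {arg!r}")
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = arg[2:].split("=", 1)
        if value.startswith("@"):
            value = Path(value[1:]).read_text()
        secret[key] = value
    return secret


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        key_file = Path(tmp) / "key.pem"
        key_file.write_text("line one\nline two\n")
        print(parse_secret_args(["--user=admin", f"--private_key=@{key_file}"]))
```

Reading the value from a file keeps multi-line content and shell-sensitive characters out of the command line, where they would otherwise need careful quoting.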
You can access the secrets manager directly from within your steps through the StepContext. This allows you to use your secrets for querying APIs from within your step without hard-coding your access keys. Don't forget to make the appropriate decision regarding caching, as it will be disabled by default when the StepContext is passed into the step.
from zenml.steps import step, StepContext

@step
def secret_loader(  # illustrative step name, not from the source
    context: StepContext,
) -> None:
    """Load the example secret from the secret manager."""
    # Load Secret from active secret manager. This will fail if no secret
    # manager is active or if that secret does not exist.
    # Replace "<SECRET_NAME>" with the name of your secret.
    retrieved_secret = context.stack.secrets_manager.get_secret("<SECRET_NAME>")
    # retrieved_secret.content will contain a dictionary with all Key-Value
    # pairs within your secret.
This will only work if the environment that your orchestrator uses to execute steps has access to the secrets manager. For example, a local secrets manager will not work in combination with a remote orchestrator.
The concept of secret schemas exists to support strongly typed secrets that validate which keys can be configured for a given secret and which values are allowed for those keys.
Secret schemas are available as built-in schemas, or loaded when an integration is installed. Custom schemas can also be defined by sub-classing the zenml.secret.BaseSecretSchema class. For example, the following is the built-in schema defined for a MySQL secret:
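from typing import ClassVar, Optional
from zenml.secret.base_secret import BaseSecretSchema

MYSQL_METADATA_STORE_SCHEMA_TYPE = "mysql"

# The class line and most attributes are missing from the source; the class
# name below is an assumption.
class MYSQLSecretSchema(BaseSecretSchema):
    TYPE: ClassVar[str] = MYSQL_METADATA_STORE_SCHEMA_TYPE
    ssl_verify_server_cert: Optional[bool] = False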
To register a secret regulated by a schema, the --schema argument must be passed to the zenml secrets-manager secret register command:
zenml secrets-manager secret register mysql_secret --schema=mysql --user=user --password=password
The keys and values passed to the CLI are validated using regular Pydantic rules:
- optional attributes don't need to be passed to the CLI and will be set to their default value if omitted
- required attributes must be passed to the CLI or an error will be raised
- all values must be a valid string representation of the data type indicated in the schema (i.e. that can be converted to the type indicated) or an error will be raised
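The three rules above can be illustrated with a small standard-library stand-in. This is not Pydantic and not ZenML code: the SCHEMA table, the validate function, and the choice of user and password as required attributes are assumptions made purely to show the behavior described.

```python
from typing import Any, Callable, Dict, Tuple

_REQUIRED = object()  # sentinel marking attributes without a default


def _to_bool(text: str) -> bool:
    """Convert a CLI string to a bool, rejecting anything ambiguous."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean: {text!r}")


# attribute name -> (converter, default); hypothetical schema for illustration
SCHEMA: Dict[str, Tuple[Callable[[str], Any], Any]] = {
    "user": (str, _REQUIRED),
    "password": (str, _REQUIRED),
    "ssl_verify_server_cert": (_to_bool, False),
}


def validate(raw: Dict[str, str]) -> Dict[str, Any]:
    """Apply the three rules to string values coming from a command line."""
    unknown = set(raw) - set(SCHEMA)
    if unknown:
        # A schema also restricts which keys may be configured at all.
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    result: Dict[str, Any] = {}
    for name, (convert, default) in SCHEMA.items():
        if name in raw:
            # Values must be convertible to the declared type.
            result[name] = convert(raw[name])
        elif default is _REQUIRED:
            # Required attributes must be passed.
            raise ValueError(f"missing required attribute: {name}")
        else:
            # Optional attributes fall back to their default.
            result[name] = default
    return result


print(validate({"user": "user", "password": "password"}))
```

Omitting password or passing ssl_verify_server_cert=maybe raises a ValueError in this sketch, which mirrors the kind of error the CLI is described as raising.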
Secret references work with any secrets manager and allow you to securely specify sensitive configurations for your stack components.
Examples of situations in which Secrets Manager scoping can be useful:
- you want to control whether a secret configured in a Secrets Manager stack component is visible in another Secrets Manager stack component. This is useful when you want to share secrets without necessarily sharing stack components.
- you want to be able to configure two or more secrets with the same name but with different values in different Secrets Manager stack components.
- you want to emulate multiple virtual Secrets Manager instances on top of a single infrastructure secret management service
The scope determines how secrets are shared across different Secrets Manager instances that use the same backend domain (e.g. the same AWS region, GCP project or Azure Key Vault). To understand if and how that is important for you, we first need to define what these terms mean:
- a Secrets Manager instance is created by running zenml secrets-manager register. An instance is uniquely identified by its UUID (not by its name).
- a Secrets Manager backend domain can generally be thought of as the bucket where a Secrets Manager instance stores its secrets. Every Secrets Manager flavor uses a different implementation-specific backend domain (e.g. an AWS region, a GCP project or an Azure Key Vault). This is usually reflected in the attributes that need to be configured for the Secrets Manager stack component.
All secrets in a backend domain share one global namespace, meaning that all Secrets Manager instances configured to use the same backend domain have to compete over the names of secrets that they store there. Secrets Manager scoping basically controls how the ZenML secret namespace is mapped to the underlying backend namespace.
The following diagram depicts the available secret scopes that you can configure for your Secrets Manager instance, if the flavor supports secret scoping. Note how the different secret namespaces are isolated from each other:
All Secrets Managers have two configuration attributes that determine how and if a Secrets Manager instance shares secrets with other Secrets Manager instances connected to the same back-end domain:
- scope determines the secret scope and can be set to one of the following values:
  - none: no secret scoping is used when this scope value is configured. This essentially means that all secrets use the same global namespace that is shared not only with other ZenML Secrets Manager instances using a none scope, but also with other applications and users that configure secrets directly in the backend. This mode of operation is only used to preserve backwards compatibility with Secrets Manager instances that were already in use prior to the ZenML release 0.12.0 that introduced the concept of scoping. It is not recommended to use this scope with Secrets Manager instances that support scoping, as it will be deprecated and phased out in future ZenML versions.
  - global: secrets are shared across all Secrets Manager instances that connect to the same backend and have a global scope. You should use this scope if you want to share your secrets with everyone using ZenML in your team or organization and are not interested in micro-managing the access to these secrets.
  - component: secrets are not visible outside a Secrets Manager instance. This is the default for new instances of Secrets Manager flavors that support scoping. Use this scope if you don't intend to share your secrets with other projects or stacks. The component scope means that only stacks whose Secrets Manager has the same UUID as the one in your stack can access your secrets. The global and namespace scopes are more suitable for sharing access to secrets.
  - namespace: secrets in a namespace scope are shared only by Secrets Manager instances that connect to the same backend and have the same namespace attribute value configured (see below). Use a namespace scope when you want to fine-tune the visibility of secrets across stacks and projects.
- namespace is a scope namespace value to use with the namespace scope.
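One way to picture how a scope maps the ZenML secret namespace onto the shared backend namespace is a name-prefixing function. The sketch below is hypothetical: the zenml prefix, the path layout, and the function name backend_secret_name are assumptions for illustration, and real Secrets Manager flavors may encode scopes differently.

```python
from typing import Optional
from uuid import UUID

PREFIX = "zenml"  # assumed prefix; real flavors may use a different layout


def backend_secret_name(
    secret_name: str,
    scope: str,
    component_id: Optional[UUID] = None,
    namespace: Optional[str] = None,
) -> str:
    """Map a ZenML secret name to a name in the shared backend namespace."""
    if scope == "none":
        # Unscoped: the name is used as-is and can clash with anything else
        # stored in the same backend domain.
        return secret_name
    if scope == "global":
        # Shared by every instance that uses the global scope.
        return f"{PREFIX}/global/{secret_name}"
    if scope == "component":
        if component_id is None:
            raise ValueError("component scope requires the instance UUID")
        # Isolated per Secrets Manager instance via its UUID.
        return f"{PREFIX}/component/{component_id}/{secret_name}"
    if scope == "namespace":
        if not namespace:
            raise ValueError("namespace scope requires a namespace value")
        # Shared by instances configured with the same namespace value.
        return f"{PREFIX}/namespace/{namespace}/{secret_name}"
    raise ValueError(f"unknown scope: {scope!r}")


print(backend_secret_name("db_credentials", "namespace", namespace="team-a"))
```

Under this layout, two instances with a component scope can both store a secret called db_credentials without colliding, while two instances sharing the namespace team-a resolve the name to the same backend entry.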