Pigeon

Annotating data using Pigeon.

Pigeon is a lightweight, open-source annotation tool designed for quick and easy labeling of data directly within Jupyter notebooks. It provides a simple and intuitive interface for annotating various types of data, including:

  • Text Classification

  • Image Classification

  • Text Captioning

When would you want to use it?

If you need to label a small to medium-sized dataset as part of your ML workflow and prefer the convenience of doing it directly within your Jupyter notebook, Pigeon is a great choice. It is particularly useful for:

  • Quick labeling tasks that don't require a full-fledged annotation platform

  • Iterative labeling during the exploratory phase of your ML project

  • Collaborative labeling within a Jupyter notebook environment

How to deploy it?

To use the Pigeon annotator, you first need to install the ZenML Pigeon integration:

zenml integration install pigeon

Next, register the Pigeon annotator with ZenML, specifying the output directory where the annotation files will be stored:

zenml annotator register pigeon --flavor pigeon --output_dir="path/to/dir"

Note that the output_dir is relative to the repository or notebook root.

Finally, add the Pigeon annotator to your stack (if this stack is not already active, also run zenml stack set <YOUR_STACK_NAME>):

zenml stack update <YOUR_STACK_NAME> --annotator pigeon

Now you're ready to use the Pigeon annotator in your ML workflow!

How do you use it?

With the Pigeon annotator registered and added to your active stack, you can easily access it using the ZenML client within your Jupyter notebook.

For text classification tasks, you can launch the Pigeon annotator as follows:

from zenml.client import Client

annotator = Client().active_stack.annotator

annotations = annotator.annotate(
    data=[
        'I love this movie',
        'I was really disappointed by the book'
    ],
    options=[
        'positive',
        'negative'
    ]
)

For image classification tasks, you can provide a custom display function to render the images:

from zenml.client import Client
from IPython.display import display, Image

annotator = Client().active_stack.annotator

annotations = annotator.annotate(
    data=[
        '/path/to/image1.png',
        '/path/to/image2.png'
    ],
    options=[
        'cat',
        'dog'
    ],
    display_fn=lambda filename: display(Image(filename))
)
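
The display function is simply a callable that receives one data item and renders it. As a minimal sketch, assuming the annotator calls display_fn once per item as in the image example above, a text variant could shorten long documents before showing them. The helper names truncate and show_truncated and the 60-character width are illustrative choices, not part of ZenML or Pigeon:

```python
import textwrap


def truncate(text: str, width: int = 60) -> str:
    """Collapse whitespace and shorten text to at most `width` characters."""
    return textwrap.shorten(text, width=width, placeholder="...")


def show_truncated(text: str) -> None:
    """Hypothetical display function: print a short preview of one item."""
    print(truncate(text))


# Preview a long review the way the annotator would see it.
show_truncated(
    "I was really disappointed by the book   and would not recommend it "
    "to anyone who enjoyed the film adaptation."
)
```

Under that assumption, it would be passed as display_fn=show_truncated in the annotate call.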

The annotate method returns the annotations as a list of tuples, where each tuple contains a data item and its assigned label.
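
Assuming the result has the (item, label) tuple shape described above, it can be turned into parallel lists for training or summarized by label. The sketch below uses a hand-written stand-in list rather than output from a real annotation session, and split_annotations is a hypothetical helper:

```python
from collections import Counter

# Stand-in for the value returned by annotator.annotate(...); hand-written.
annotations = [
    ("I love this movie", "positive"),
    ("I was really disappointed by the book", "negative"),
    ("Great soundtrack", "positive"),
]


def split_annotations(pairs):
    """Separate (item, label) tuples into parallel lists of items and labels."""
    items = [item for item, _ in pairs]
    labels = [label for _, label in pairs]
    return items, labels


texts, labels = split_annotations(annotations)

# Count how many items received each label, e.g. to spot class imbalance.
label_counts = Counter(labels)
print(label_counts)
```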

You can also use the zenml annotator dataset commands to manage your datasets:

  • zenml annotator dataset list - List all available datasets

  • zenml annotator dataset delete <dataset_name> - Delete a specific dataset

  • zenml annotator dataset stats <dataset_name> - Get statistics for a specific dataset

Annotation files are saved as JSON files in the specified output directory. Each annotation file represents a dataset, with the filename serving as the dataset name.
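
Because each dataset is a single JSON file named after the dataset, the output directory can also be inspected with the standard library. This is a minimal sketch: list_dataset_names and load_dataset are hypothetical helpers, not ZenML functions, and since the internal layout of an annotation file is not specified here, the file is parsed and returned as-is:

```python
import json
from pathlib import Path


def list_dataset_names(output_dir):
    """Return dataset names, taken from the stems of the JSON files found."""
    return sorted(p.stem for p in Path(output_dir).glob("*.json"))


def load_dataset(output_dir, dataset_name):
    """Parse one annotation file; its internal structure is not assumed."""
    path = Path(output_dir) / f"{dataset_name}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, list_dataset_names("path/to/dir") would list the datasets stored in the directory registered with the annotator.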

Acknowledgements

Pigeon was created by Anastasis Germanidis and released as a Python package and GitHub repository. It is licensed under the Apache License. It has been updated to work with more recent ipywidgets versions, and some small UI improvements were added. We are grateful to Anastasis for creating this tool and making it available to the community.