Pigeon
Annotating data using Pigeon.
Pigeon is a lightweight, open-source annotation tool designed for quick and easy labeling of data directly within Jupyter notebooks. It provides a simple and intuitive interface for annotating various types of data, including:
Text Classification
Image Classification
Text Captioning
If you need to label a small to medium-sized dataset as part of your ML workflow and prefer the convenience of doing it directly within your Jupyter notebook, Pigeon is a great choice. It is particularly useful for:
Quick labeling tasks that don't require a full-fledged annotation platform
Iterative labeling during the exploratory phase of your ML project
Collaborative labeling within a Jupyter notebook environment
To use the Pigeon annotator, first install the ZenML Pigeon integration by running zenml integration install pigeon.
Next, register the Pigeon annotator with ZenML, specifying the output directory where the annotation files will be stored:
Note that output_dir is relative to the repository or notebook root.
Finally, add the Pigeon annotator to your stack and set it as the active stack:
Now you're ready to use the Pigeon annotator in your ML workflow!
With the Pigeon annotator registered and added to your active stack, you can easily access it using the ZenML client within your Jupyter notebook.
For text classification tasks, you can launch the Pigeon annotator as follows:
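As a minimal sketch of the text classification case, the helper below wraps the call to launch so it can be reused across notebooks. The keyword names data and options are assumptions modeled on the upstream Pigeon interface, and label_sentiment is a hypothetical helper name; check the signature in your installed ZenML version before relying on it.

```python
from typing import Any, List


def label_sentiment(annotator: Any, texts: List[str]) -> Any:
    """Launch the annotator on a list of strings with two sentiment labels.

    ASSUMPTION: `launch` accepts `data` and `options` keyword arguments.
    """
    return annotator.launch(
        data=texts,
        options=["positive", "negative"],
    )


# In a Jupyter notebook with the Pigeon annotator in the active stack:
#
#   from zenml.client import Client
#
#   annotator = Client().active_stack.annotator
#   annotations = label_sentiment(
#       annotator,
#       ["I love this movie", "I was really disappointed by the book"],
#   )
```

Passing the annotator in as an argument keeps the helper independent of the active stack, which makes it easy to exercise with a stand-in object outside a notebook.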
For image classification tasks, you can provide a custom display function to render the images:
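One possible shape for such a display function is sketched below. The display_fn keyword name is an assumption borrowed from the upstream Pigeon package, and show_image and label_images are hypothetical helper names; the image is rendered with IPython's Image and display utilities.

```python
from typing import Any, Callable, List


def show_image(path: str) -> None:
    """Render an image file inline in the notebook output cell."""
    # Imported lazily so this module can still be imported outside Jupyter.
    from IPython.display import Image, display

    display(Image(filename=path))


def label_images(
    annotator: Any,
    image_paths: List[str],
    labels: List[str],
    display_fn: Callable[[str], None] = show_image,
) -> Any:
    """Launch the annotator on image paths, rendering each with display_fn.

    ASSUMPTION: `launch` accepts `data`, `options` and `display_fn` keywords.
    """
    return annotator.launch(
        data=image_paths,
        options=labels,
        display_fn=display_fn,
    )


# In a notebook (the paths are placeholders):
#
#   from zenml.client import Client
#
#   annotator = Client().active_stack.annotator
#   annotations = label_images(
#       annotator,
#       ["/path/to/image1.png", "/path/to/image2.png"],
#       ["cat", "dog"],
#   )
```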
The launch method returns the annotations as a list of tuples, where each tuple contains the data item and its corresponding label.
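Because the result is a plain list of (item, label) tuples, it can be summarized with standard Python. The sketch below counts labels and groups items by label; summarize_annotations is a hypothetical helper and the sample data is purely illustrative.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_annotations(
    annotations: List[Tuple[Any, str]],
) -> Tuple[Counter, Dict[str, List[Any]]]:
    """Return per-label counts and the items grouped under each label."""
    counts = Counter(label for _, label in annotations)
    by_label: Dict[str, List[Any]] = {}
    for item, label in annotations:
        by_label.setdefault(label, []).append(item)
    return counts, by_label


# Illustrative data in the (item, label) shape described above.
example = [
    ("I love this movie", "positive"),
    ("I was really disappointed by the book", "negative"),
    ("Great soundtrack", "positive"),
]

counts, by_label = summarize_annotations(example)
print(counts["positive"])    # 2
print(by_label["negative"])  # ['I was really disappointed by the book']
```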
You can also use the zenml annotator dataset commands to manage your datasets:
zenml annotator dataset list - List all available datasets
zenml annotator dataset delete <dataset_name> - Delete a specific dataset
zenml annotator dataset stats <dataset_name> - Get statistics for a specific dataset
Annotation files are saved as JSON files in the specified output directory. Each annotation file represents a dataset, with the filename serving as the dataset name.
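Since each dataset is a JSON file named after the dataset, the files can also be read back directly. The sketch below maps dataset names to their parsed contents; load_annotation_files is a hypothetical helper, and because the internal layout of each file is not described here, the parsed JSON is returned as-is without assuming a schema.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_annotation_files(output_dir: str) -> Dict[str, Any]:
    """Map each dataset name (the file stem) to its parsed JSON content."""
    datasets: Dict[str, Any] = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open("r", encoding="utf-8") as handle:
            datasets[path.stem] = json.load(handle)
    return datasets


# Usage, where the argument is the output_dir configured for the annotator:
#
#   datasets = load_annotation_files("path/to/dir")
#   print(list(datasets))  # dataset names
```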
Pigeon was created by Anastasis Germanidis and released as a Python package and GitHub repository. It is licensed under the Apache License. It has since been updated to work with more recent ipywidgets versions, and some small UI improvements have been added. We are grateful to Anastasis for creating this tool and making it available to the community.