Module core.datasources.bq_datasource

BigQuery Datasource definition

Classes

BigQueryDatasource(name: str, query_project: str, query_dataset: str, query_table: str, gcs_location: str, query_limit: Union[int, NoneType] = None, dest_project: str = None, schema: Dict = None, **unused_kwargs) : ZenML BigQuery datasource definition.

Use this for BigQuery training pipelines.

Initializes a BigQuery source. This creates a DataPipeline that
effectively runs the following query using Apache Beam:

`SELECT * FROM query_project.query_dataset.query_table`

A Google Cloud Storage (GCS) location must be provided for this to work.
The GCS location is used to write temporary dumps of the query results
while the Beam pipeline executes. The location must exist within the GCP
project specified through dest_project.

Args:
    name: name of the datasource. Must be globally unique in the repo.
    query_project: name of the GCP project to query.
    query_dataset: name of the dataset.
    query_table: name of the table in the dataset.
    query_limit: how many rows, from the top, to query. If None, no
        limit is applied.
    gcs_location: Google Cloud Storage (bucket) location in which to
        store the temporary query dumps.
    dest_project: name of the destination project. If None is specified,
        dest_project is set to the same value as query_project.
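
As a rough illustration of how the arguments above fit together, the sketch below assembles the query text and resolves the destination project. It is not ZenML's implementation: `build_query` and `resolve_dest_project` are hypothetical helpers introduced here, and both the backtick quoting of the table path and the mapping of `query_limit` to a SQL `LIMIT` clause are assumptions.

```python
from typing import Optional


def build_query(query_project: str,
                query_dataset: str,
                query_table: str,
                query_limit: Optional[int] = None) -> str:
    """Hypothetical helper: assemble the SELECT statement described above."""
    # Assumption: the table path is backtick-quoted so that project IDs
    # containing hyphens remain valid in BigQuery standard SQL.
    query = f"SELECT * FROM `{query_project}.{query_dataset}.{query_table}`"
    # Assumption: query_limit is expressed as a LIMIT clause.
    if query_limit is not None:
        query += f" LIMIT {query_limit}"
    return query


def resolve_dest_project(query_project: str,
                         dest_project: Optional[str] = None) -> str:
    """Hypothetical helper: fall back to query_project when dest_project is None."""
    return query_project if dest_project is None else dest_project


# Example with placeholder names (not real projects or tables).
example_query = build_query("my-project", "my_dataset", "my_table", query_limit=100)
example_dest = resolve_dest_project("my-project")
```

With these placeholder values, `example_dest` resolves to the querying project because no destination project was given, which mirrors the documented default for `dest_project`.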

### Ancestors (in MRO)

* zenml.core.datasources.base_datasource.BaseDatasource

### Class variables

`DATA_STEP`
:   A step that reads in data from a Google BigQuery table supplied on
    construction.