Service Foundry
Young Gyu Kim <credemol@gmail.com>

Apache Airflow - Using Azure Blob Storage


Introduction

This is the third article in a series on Airflow on Kubernetes.

In this article, we’ll dive into using Azure Blob Storage with Apache Airflow.

We’ll create two DAGs to showcase different approaches:

  • pandas-abfs-example.py: Demonstrates reading from and writing to Azure Blob Storage using Pandas DataFrames.

  • duckdb-abfs-example.py: Highlights reading data from Azure Blob Storage with DuckDB.

Limitation of this article

This article does not cover tasks that run a custom Docker image through the KubernetesPodOperator. For those cases, it is recommended to use the Azure SDK libraries for your programming language.

Set up Azure Blob Storage Connection

To use Azure Blob Storage in Apache Airflow, we need to set up the connection to Azure Blob Storage.

WASB is deprecated

Some documents online refer to the Windows Azure Storage Blob driver (WASB) for Azure Blob Storage. However, the wasb:// protocol is deprecated.

The following protocols are available instead:

  • abfs

  • abfss

  • adl

Microsoft has deprecated the Windows Azure Storage Blob driver (WASB) for Azure Blob Storage in favor of the Azure Blob Filesystem driver (ABFS).
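As a small illustration of the scheme names above, the sketch below classifies a storage URL by its scheme. The helper name check_storage_url is hypothetical, and treating wasbs (the TLS variant of wasb) as deprecated alongside wasb is an assumption made for this example. Airflow does not provide this function.

```python
from urllib.parse import urlsplit

# Schemes listed in this article; adjust the sets for your own setup.
SUPPORTED_SCHEMES = {"abfs", "abfss", "adl"}
# Assumption: wasbs is handled the same way as wasb.
DEPRECATED_SCHEMES = {"wasb", "wasbs"}


def check_storage_url(url: str) -> str:
    """Return 'ok', 'deprecated', or 'unknown' based on the URL scheme."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in SUPPORTED_SCHEMES:
        return "ok"
    if scheme in DEPRECATED_SCHEMES:
        return "deprecated"
    return "unknown"


print(check_storage_url("abfs://my-storage/pandas/data.json"))  # ok
print(check_storage_url("wasb://my-storage/pandas/data.json"))  # deprecated
```

A check like this could run in a unit test for your DAG folder to catch old wasb:// paths before they reach a scheduler.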

For more information on the deprecation of WASB, refer to the link:

Create a Connection

To create a connection to Azure Blob Storage, open Admin → Connections from the main menu and click Create.

Figure 1. Admin/Connection menu

Fill in the following fields:

  • Connection ID: az_blob_storage (or any name you want)

  • Connection Type: Azure Blob Storage

  • Blob Storage Connection String (optional): Connection string for Azure Blob Storage

Figure 2. Azure Blob Storage Connection - Form Fields 1/2
Figure 3. Azure Blob Storage Connection - Form Fields 2/2
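The same connection can probably also be supplied without the UI, through an AIRFLOW_CONN_<CONN_ID> environment variable in JSON format. The sketch below only builds the variable name and value. It assumes that the provider's connection type is wasb and that the connection string is read from an extra field named connection_string, so verify both against your provider version. The function name build_connection_env and the placeholder connection string are made up for illustration.

```python
import json


def build_connection_env(conn_id: str, connection_string: str) -> tuple[str, str]:
    """Return the (environment variable name, JSON value) for an Airflow connection."""
    env_name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps(
        {
            # Assumption: "wasb" is the conn_type behind "Azure Blob Storage".
            "conn_type": "wasb",
            # Assumption: the provider reads this key from the extras.
            "extra": {"connection_string": connection_string},
        }
    )
    return env_name, value


name, value = build_connection_env("az_blob_storage", "<your-connection-string>")
print(name)
print(value)
```

On Kubernetes, the resulting value would typically be stored in a Secret and injected as an environment variable rather than written into a values file in plain text.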

Connection ID

In most cases, the Connection ID has nothing to do with the actual connection. It is just a name for the connection. In the DuckDB example, however, the Connection ID is also used to resolve the container name in Azure Blob Storage. For that reason, we will later create a second Azure Blob Storage connection whose Connection ID is my-storage.

Connection String

For some reason, among the security options, only the connection string worked for the Azure Blob Storage connection. There is no need to set Blob Storage Login (account name) because the connection string already includes the account name.

You can generate the connection string from the Azure Portal.

Figure 4. Azure Portal - Generate Connection String

Please note that the connection string has an expiration date, so you need to regenerate it when it expires.
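To keep an eye on the expiration date, the expiry can be read from the connection string itself. The sketch below assumes the connection string was generated with a shared access signature, so that it carries a SharedAccessSignature=... part whose se parameter holds the expiry in the form YYYY-MM-DDTHH:MM:SSZ. The function name sas_expiry and the sample string are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def sas_expiry(connection_string: str) -> Optional[datetime]:
    """Return the SAS expiry (UTC) found in a connection string, or None."""
    # Split "Key=Value" pairs on the first "=" only, because the
    # SharedAccessSignature value itself contains "=" characters.
    parts = dict(
        item.split("=", 1) for item in connection_string.split(";") if "=" in item
    )
    sas = parts.get("SharedAccessSignature")
    if not sas:
        return None
    expiry = parse_qs(sas.lstrip("?")).get("se")
    if not expiry:
        return None
    # Assumption: the expiry uses the second-precision UTC format.
    parsed = datetime.strptime(expiry[0], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Made-up sample value; not a real account or signature.
sample = (
    "BlobEndpoint=https://example.blob.core.windows.net/;"
    "SharedAccessSignature=sv=2022-11-02&ss=b&srt=sco&sp=rwl"
    "&se=2025-01-31T00:00:00Z&sig=placeholder"
)
print(sas_expiry(sample))  # 2025-01-31 00:00:00+00:00
```

Comparing the returned value with datetime.now(timezone.utc) gives a simple way to warn a few days before the connection stops working.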

pandas-abfs-example.py

We are going to create a simple DAG with two tasks:

  • task1 (save_file) - build a DataFrame from hard-coded JSON data and write it to Azure Blob Storage using Pandas.

  • task2 (read_file) - read the data back from Azure Blob Storage and write it to the log.

Even though the two tasks run in different pods, we can share the path of the file using XCom.

dags/pandas-abfs-example.py
import io

import pendulum
from airflow.decorators import dag, task
from airflow.io.path import ObjectStoragePath


@dag(
    schedule=None,
    start_date=pendulum.datetime(2024, 11, 1, tz="UTC"),
    catchup=False,
    tags=["pandas", "abfs", "azure_blob_storage"],
)
def pandas_abfs_example():

    @task
    def save_file() -> ObjectStoragePath:
        import pandas as pd

        # "my-storage" is the container; "az_blob_storage" is the connection ID.
        base = ObjectStoragePath("abfs://my-storage", conn_id="az_blob_storage")

        print("Saving file to Azure Blob Storage")
        print("base: ", base)

        data_string = '{"Courses":{"r1":"Spark"},"Fee":{"r1":"25000"},"Duration":{"r1":"50 Days"}}'

        # Wrap the literal in StringIO; pandas 2.1+ deprecates passing raw JSON strings.
        df = pd.read_json(io.StringIO(data_string))

        print("df: ", df)

        path = base / "pandas/data.json"

        # Write JSON so that the content matches the .json file name.
        with path.open("w") as file:
            df.to_json(file)

        # Only the path, not the data, is passed to the next task through XCom.
        return path

    @task
    def read_file(path: ObjectStoragePath):
        import pandas as pd

        print("Reading file from Azure Blob Storage")
        print("path: ", path)

        with path.open("r") as file:
            df = pd.read_json(file)

        print("df: ", df)

    path = save_file()
    read_file(path)


pandas_abfs_example()

save_file function

In the DAG, we use the ObjectStoragePath class to build the path in Azure Blob Storage.

        base = ObjectStoragePath("abfs://my-storage", conn_id="az_blob_storage")

'my-storage' is the container name of Azure Blob Storage. You can get the container name from the Azure Portal.

The file path is then created by appending the relative file name to the container path with the / operator.

path = base / "pandas/data.json"

The save_file function writes the file to Azure Blob Storage, and we can then see the data.json file there.

Figure 5. Azure Blob Storage - data.json file
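The / operator used to build the path in save_file follows the pathlib convention. As a rough analogy that runs without Airflow or Azure, the standard library's PurePosixPath joins segments the same way. This is only an illustration of the joining behavior, not of how ObjectStoragePath talks to the storage account.

```python
from pathlib import PurePosixPath

# Stand-in for the container root; in the DAG this is the ObjectStoragePath base.
container = PurePosixPath("my-storage")

# A multi-segment string and separate segments give the same result.
path_a = container / "pandas/data.json"
path_b = container / "pandas" / "data.json"

print(path_a)            # my-storage/pandas/data.json
print(path_a == path_b)  # True
print(path_a.name)       # data.json
```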

read_file function

The read_file function takes as its parameter the path of the file saved by the save_file function. The path is shared using XCom, which lets tasks running in different pods exchange data. The size of the data shared using XCom is limited, so it is not recommended for large data such as a Pandas DataFrame or a Spark DataFrame.
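The difference between passing a path and passing the data itself can be made concrete with a rough size check. The sketch below is only an illustration: the 48 KB threshold is a hypothetical value chosen for the example, since the real limit depends on the metadata database and the XCom backend in use, and the helper names are made up.

```python
import json

# Hypothetical threshold for illustration only; not an Airflow constant.
MAX_XCOM_BYTES = 48 * 1024


def xcom_payload_size(value) -> int:
    """Size in bytes of a value once serialized to JSON."""
    return len(json.dumps(value).encode("utf-8"))


def fits_in_xcom(value, limit: int = MAX_XCOM_BYTES) -> bool:
    """Return True if the serialized value stays within the given limit."""
    return xcom_payload_size(value) <= limit


# A path reference is tiny; the rows it points to are not.
path_reference = "abfs://my-storage/pandas/data.json"
rows = [{"Courses": "Spark", "Fee": "25000", "Duration": "50 Days"}] * 10_000

print(fits_in_xcom(path_reference))  # True
print(fits_in_xcom(rows))            # False
```

This is why the DAG returns the storage path from save_file and lets read_file load the data from Azure Blob Storage itself.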

XComs

We can see the data shared using XCom in the Airflow UI.

Figure 6. Airflow UI - XCom

For more information on XComs, refer to the link:

Since the 'pandas' package is included in the default Docker image of Apache Airflow, we can use the 'pandas' package in the Airflow task without any additional installation. However, if you need to use other packages, you need to create a custom Docker image.
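Before deciding whether a custom image is needed, it can help to check which packages are importable in the environment a task runs in. The following is a minimal sketch using only the standard library. The function name missing_packages is made up, and the result depends entirely on the image you run it in.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


# In the default Airflow image described in this article, one would expect
# pandas to be present and duckdb to be reported as missing.
print(missing_packages(["pandas", "duckdb"]))
```

Running this inside a task, or in a shell in the worker image, shows quickly whether the packages a DAG imports are available.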

duckdb-abfs-example.py

The duckdb package is not included in the default Docker image of Apache Airflow. So, we need to create a custom Docker image.

Customized Base Docker Image for DuckDB

This Docker image is the base image that is used for all Airflow tasks.

For more information on the Python libraries included in the Docker image of Apache Airflow, refer to the appendix.

docker/custom/Dockerfile
FROM apache/airflow:2.10.3

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir duckdb

This Dockerfile installs the 'duckdb' package. You can add any other packages you need in the same way.

In order to use this Docker image as the base image for the Airflow task, we need to push this Docker image to the container registry.

$ az acr build --image airflow-custom:2.10.3 --registry {your-acr-name} ./docker/custom

Then we need to update the values file (custom-values.yaml in this example) to use this Docker image.

custom-values.yaml
images:
  airflow:
    repository: {your-acr-name}.azurecr.io/airflow-custom
    tag: 2.10.3

Now we can use DuckDB in the Airflow task.

duckdb-abfs-example.py

We are going to create a simple DAG with a single task:

  • task1 (analyze_data) - read Parquet data from Azure Blob Storage and analyze the data using DuckDB.

The Parquet files used in this example are the same as those used in the previous article, where a Sling ETL task migrated data from the source database to Azure Blob Storage in Parquet format. In the previous article, we used a separate Docker image and a Sling configuration file to access Azure Blob Storage. In this article, we use the Azure Blob Storage connection defined in Apache Airflow.

dags/duckdb-abfs-example.py
import pendulum
from airflow.decorators import dag, task
from airflow.io.path import ObjectStoragePath


@dag(
    schedule=None,
    start_date=pendulum.datetime(2024, 11, 1, tz="UTC"),
    catchup=False,
    tags=["abfs", "duckdb", "azure_blob_storage"],
)
def duckdb_abfs_example():

    @task
    def analyze_data():
        """ Analyze
        This task will analyze the data from Azure Blob Storage
        """
        print("Analyzing data from Azure Blob Storage")

        import duckdb

        # For DuckDB, the connection ID must match the container name.
        base = ObjectStoragePath("abfs://my-storage", conn_id="my-storage")

        path = base / "sling/2024-11-29/division/*.parquet"
        conn = duckdb.connect(database=":memory:")
        # Register the fsspec filesystem so that DuckDB can resolve abfs:// paths.
        conn.register_filesystem(path.fs)
        conn.execute(f"CREATE OR REPLACE VIEW division AS SELECT * FROM read_parquet('{path}')")

        df = conn.execute("SELECT COUNT(*) AS COUNT FROM division").fetchdf()
        print(df)

        message = "===> The number of records in the division table is: " + str(df["COUNT"][0])
        print(message)

    analyze_data()


duckdb_abfs_example()

In the analyze_data function, we read Parquet files from Azure Blob Storage and analyze the data using DuckDB.

DuckDB, however, treats ObjectStoragePath differently. While Pandas gets the container name from the ObjectStoragePath, DuckDB gets the container name from the Connection ID.

So we need to create a new Azure Blob Storage connection with the Connection ID of my-storage.

I used the same settings as in the Pandas example, only with a different Connection ID.

After executing the DAG, we can see the result of the DuckDB query in the log.

Figure 7. duckdb-abfs-example.py - result
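The DAG hard-codes the date folder 2024-11-29 in the Parquet path. If the files keep arriving in one folder per day, the glob could be built from a date instead. The sketch below assumes the layout sling/<date>/<table>/*.parquet, which is inferred from the single path used in this article. The function name parquet_glob is hypothetical.

```python
from datetime import date


def parquet_glob(table: str, run_date: date, prefix: str = "sling") -> str:
    """Build a relative glob such as 'sling/2024-11-29/division/*.parquet'."""
    return f"{prefix}/{run_date.isoformat()}/{table}/*.parquet"


print(parquet_glob("division", date(2024, 11, 29)))
# sling/2024-11-29/division/*.parquet
```

Inside the task, the result would be appended to the base path in the same way as the hard-coded string, for example base / parquet_glob("division", some_date).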

Conclusion

In this article, we explored how to work with Azure Blob Storage in Apache Airflow. We created two DAGs to demonstrate the use of Pandas DataFrames and DuckDB with Azure Blob Storage. Additionally, we learned how to share data between tasks using XCom and how to build a custom Docker image to include the DuckDB package for use in Airflow tasks.

All my LinkedIn articles are available at My LinkedIn Article Library.

Appendix

Python libraries included in the Docker image of Apache Airflow 2.9.3

$ pip list

Package                                  Version
---------------------------------------- ---------------
adal                                     1.2.7
adlfs                                    2024.4.1
aiobotocore                              2.13.1
aiofiles                                 23.2.1
aiohttp                                  3.9.5
aioitertools                             0.11.0
aiosignal                                1.3.1
alembic                                  1.13.2
amqp                                     5.2.0
annotated-types                          0.7.0
anyio                                    4.4.0
apache-airflow                           2.9.3
apache-airflow-providers-amazon          8.25.0
apache-airflow-providers-celery          3.7.2
apache-airflow-providers-cncf-kubernetes 8.3.3
apache-airflow-providers-common-io       1.3.2
apache-airflow-providers-common-sql      1.14.2
apache-airflow-providers-docker          3.12.2
apache-airflow-providers-elasticsearch   5.4.1
apache-airflow-providers-fab             1.2.2
apache-airflow-providers-ftp             3.10.0
apache-airflow-providers-google          10.21.0
apache-airflow-providers-grpc            3.5.2
apache-airflow-providers-hashicorp       3.7.1
apache-airflow-providers-http            4.12.0
apache-airflow-providers-imap            3.6.1
apache-airflow-providers-microsoft-azure 10.2.0
apache-airflow-providers-mysql           5.6.2
apache-airflow-providers-odbc            4.6.2
apache-airflow-providers-openlineage     1.9.1
apache-airflow-providers-postgres        5.11.2
apache-airflow-providers-redis           3.7.1
apache-airflow-providers-sendgrid        3.5.1
apache-airflow-providers-sftp            4.10.2
apache-airflow-providers-slack           8.7.1
apache-airflow-providers-smtp            1.7.1
apache-airflow-providers-snowflake       5.6.0
apache-airflow-providers-sqlite          3.8.1
apache-airflow-providers-ssh             3.11.2
apispec                                  6.6.1
argcomplete                              3.4.0
asgiref                                  3.8.1
asn1crypto                               1.5.1
asyncssh                                 2.15.0
attrs                                    23.2.0
Authlib                                  1.3.1
azure-batch                              14.2.0
azure-common                             1.1.28
azure-core                               1.30.2
azure-cosmos                             4.7.0
azure-datalake-store                     0.0.53
azure-identity                           1.17.1
azure-keyvault-secrets                   4.8.0
azure-kusto-data                         4.5.1
azure-mgmt-containerinstance             10.1.0
azure-mgmt-containerregistry             10.3.0
azure-mgmt-core                          1.4.0
azure-mgmt-cosmosdb                      9.5.1
azure-mgmt-datafactory                   8.0.0
azure-mgmt-datalake-nspkg                3.0.1
azure-mgmt-datalake-store                0.5.0
azure-mgmt-nspkg                         3.0.2
azure-mgmt-resource                      23.1.1
azure-mgmt-storage                       21.2.1
azure-nspkg                              3.0.2
azure-servicebus                         7.12.2
azure-storage-blob                       12.20.0
azure-storage-file-datalake              12.15.0
azure-storage-file-share                 12.16.0
azure-synapse-artifacts                  0.19.0
azure-synapse-spark                      0.7.0
Babel                                    2.15.0
backoff                                  2.2.1
bcrypt                                   4.1.3
beautifulsoup4                           4.12.3
billiard                                 4.2.0
blinker                                  1.8.2
boto3                                    1.34.131
botocore                                 1.34.131
cachelib                                 0.9.0
cachetools                               5.3.3
cattrs                                   23.2.3
celery                                   5.4.0
certifi                                  2024.7.4
cffi                                     1.16.0
chardet                                  5.2.0
charset-normalizer                       3.3.2
click                                    8.1.7
click-didyoumean                         0.3.1
click-plugins                            1.1.1
click-repl                               0.3.0
clickclick                               20.10.2
colorama                                 0.4.6
colorlog                                 4.8.0
ConfigUpdater                            3.2
connexion                                2.14.2
cron-descriptor                          1.4.3
croniter                                 2.0.5
cryptography                             41.0.7
db-dtypes                                1.2.0
decorator                                5.1.1
Deprecated                               1.2.14
dill                                     0.3.8
distlib                                  0.3.8
dnspython                                2.6.1
docker                                   7.1.0
docstring_parser                         0.16
docutils                                 0.16
elastic-transport                        8.13.1
elasticsearch                            8.14.0
email_validator                          2.2.0
eventlet                                 0.36.1
filelock                                 3.15.4
Flask                                    2.2.5
Flask-AppBuilder                         4.5.0
Flask-Babel                              2.0.0
Flask-Caching                            2.3.0
Flask-JWT-Extended                       4.6.0
Flask-Limiter                            3.7.0
Flask-Login                              0.6.3
Flask-Session                            0.5.0
Flask-SQLAlchemy                         2.5.1
Flask-WTF                                1.2.1
flower                                   2.0.1
frozenlist                               1.4.1
fsspec                                   2023.12.2
gcloud-aio-auth                          4.2.3
gcloud-aio-bigquery                      7.1.0
gcloud-aio-storage                       9.2.0
gcsfs                                    2023.12.2.post1
gevent                                   24.2.1
google-ads                               24.1.0
google-analytics-admin                   0.22.8
google-api-core                          2.19.1
google-api-python-client                 2.137.0
google-auth                              2.32.0
google-auth-httplib2                     0.2.0
google-auth-oauthlib                     1.2.1
google-cloud-aiplatform                  1.59.0
google-cloud-appengine-logging           1.4.4
google-cloud-audit-log                   0.2.5
google-cloud-automl                      2.13.4
google-cloud-batch                       0.17.22
google-cloud-bigquery                    3.20.1
google-cloud-bigquery-datatransfer       3.15.4
google-cloud-bigtable                    2.24.0
google-cloud-build                       3.24.1
google-cloud-compute                     1.19.1
google-cloud-container                   2.49.0
google-cloud-core                        2.4.1
google-cloud-datacatalog                 3.19.1
google-cloud-dataflow-client             0.8.11
google-cloud-dataform                    0.5.10
google-cloud-dataplex                    2.2.1
google-cloud-dataproc                    5.10.1
google-cloud-dataproc-metastore          1.15.4
google-cloud-dlp                         3.18.1
google-cloud-kms                         2.24.1
google-cloud-language                    2.13.4
google-cloud-logging                     3.10.0
google-cloud-memcache                    1.9.4
google-cloud-monitoring                  2.22.1
google-cloud-orchestration-airflow       1.13.0
google-cloud-os-login                    2.14.5
google-cloud-pubsub                      2.22.0
google-cloud-redis                       2.15.4
google-cloud-resource-manager            1.12.4
google-cloud-run                         0.10.7
google-cloud-secret-manager              2.20.1
google-cloud-spanner                     3.47.0
google-cloud-speech                      2.26.1
google-cloud-storage                     2.17.0
google-cloud-storage-transfer            1.11.4
google-cloud-tasks                       2.16.4
google-cloud-texttospeech                2.16.4
google-cloud-translate                   3.15.4
google-cloud-videointelligence           2.13.4
google-cloud-vision                      3.7.3
google-cloud-workflows                   1.14.4
google-crc32c                            1.5.0
google-re2                               1.1.20240702
google-resumable-media                   2.7.1
googleapis-common-protos                 1.63.2
graphviz                                 0.20.3
greenlet                                 3.0.3
grpc-google-iam-v1                       0.13.1
grpc-interceptor                         0.15.4
grpcio                                   1.64.1
grpcio-gcp                               0.2.2
grpcio-status                            1.62.2
gunicorn                                 22.0.0
h11                                      0.14.0
h2                                       4.1.0
hpack                                    4.0.0
httpcore                                 1.0.5
httplib2                                 0.22.0
httpx                                    0.27.0
humanize                                 4.10.0
hvac                                     2.3.0
hyperframe                               6.0.1
idna                                     3.7
ijson                                    3.3.0
importlib-metadata                       6.11.0
importlib_resources                      6.4.0
inflection                               0.5.1
isodate                                  0.6.1
itsdangerous                             2.2.0
Jinja2                                   3.1.4
jmespath                                 0.10.0
json-merge-patch                         0.2
jsonpath-ng                              1.6.1
jsonschema                               4.23.0
jsonschema-specifications                2023.12.1
kombu                                    5.3.7
kubernetes                               29.0.0
kubernetes_asyncio                       29.0.0
lazy-object-proxy                        1.10.0
ldap3                                    2.9.1
limits                                   3.13.0
linkify-it-py                            2.0.3
lockfile                                 0.12.2
looker-sdk                               24.10.0
lxml                                     5.2.2
Mako                                     1.3.5
markdown-it-py                           3.0.0
MarkupSafe                               2.1.5
marshmallow                              3.21.3
marshmallow-oneofschema                  3.1.1
marshmallow-sqlalchemy                   0.28.2
mdit-py-plugins                          0.4.1
mdurl                                    0.1.2
methodtools                              0.4.7
microsoft-kiota-abstractions             1.3.3
microsoft-kiota-authentication-azure     1.0.0
microsoft-kiota-http                     1.3.2
more-itertools                           10.3.0
msal                                     1.29.0
msal-extensions                          1.2.0
msgraph-core                             1.1.1
msrest                                   0.7.1
msrestazure                              0.6.4.post1
multidict                                6.0.5
mysql-connector-python                   9.0.0
mysqlclient                              2.2.4
numpy                                    1.26.4
oauthlib                                 3.2.2
openlineage-integration-common           1.18.0
openlineage-python                       1.18.0
openlineage_sql                          1.18.0
opentelemetry-api                        1.25.0
opentelemetry-exporter-otlp              1.25.0
opentelemetry-exporter-otlp-proto-common 1.25.0
opentelemetry-exporter-otlp-proto-grpc   1.25.0
opentelemetry-exporter-otlp-proto-http   1.25.0
opentelemetry-proto                      1.25.0
opentelemetry-sdk                        1.25.0
opentelemetry-semantic-conventions       0.46b0
ordered-set                              4.1.0
packaging                                24.1
pandas                                   2.1.4
pandas-gbq                               0.23.1
paramiko                                 3.4.0
pathspec                                 0.12.1
pendulum                                 3.0.0
pip                                      24.3.1
platformdirs                             4.2.2
pluggy                                   1.5.0
ply                                      3.11
portalocker                              2.10.0
prison                                   0.2.1
prometheus_client                        0.20.0
prompt_toolkit                           3.0.47
proto-plus                               1.24.0
protobuf                                 4.25.3
psutil                                   6.0.0
psycopg2-binary                          2.9.9
pyarrow                                  16.1.0
pyasn1                                   0.5.1
pyasn1-modules                           0.3.0
PyAthena                                 3.8.3
pycparser                                2.22
pydantic                                 2.8.2
pydantic_core                            2.20.1
pydata-google-auth                       1.8.2
Pygments                                 2.18.0
PyJWT                                    2.8.0
PyNaCl                                   1.5.0
pyodbc                                   5.1.0
pyOpenSSL                                24.1.0
pyparsing                                3.1.2
python-daemon                            3.0.1
python-dateutil                          2.9.0.post0
python-dotenv                            1.0.1
python-http-client                       3.3.7
python-ldap                              3.4.4
python-nvd3                              0.16.0
python-slugify                           8.0.4
pytz                                     2024.1
PyYAML                                   6.0.1
redis                                    5.0.7
redshift-connector                       2.1.2
referencing                              0.35.1
requests                                 2.32.3
requests-oauthlib                        1.3.1
requests-toolbelt                        1.0.0
rfc3339-validator                        0.1.4
rich                                     13.7.1
rich-argparse                            1.5.2
rpds-py                                  0.19.0
rsa                                      4.9
s3transfer                               0.10.2
scramp                                   1.4.5
sendgrid                                 6.11.0
setproctitle                             1.3.3
setuptools                               66.1.1
shapely                                  2.0.4
six                                      1.16.0
slack_sdk                                3.31.0
sniffio                                  1.3.1
snowflake-connector-python               3.11.0
snowflake-sqlalchemy                     1.6.1
sortedcontainers                         2.4.0
soupsieve                                2.5
SQLAlchemy                               1.4.52
sqlalchemy-bigquery                      1.11.0
SQLAlchemy-JSONField                     1.0.2
sqlalchemy-redshift                      0.8.14
sqlalchemy-spanner                       1.7.0
SQLAlchemy-Utils                         0.41.2
sqlparse                                 0.5.0
sshtunnel                                0.4.0
starkbank-ecdsa                          2.2.0
statsd                                   4.0.1
std-uritemplate                          1.0.3
tabulate                                 0.9.0
tenacity                                 8.5.0
termcolor                                2.4.0
text-unidecode                           1.3
time-machine                             2.14.2
tomlkit                                  0.13.0
tornado                                  6.4.1
typing_extensions                        4.12.2
tzdata                                   2024.1
uc-micro-py                              1.0.3
unicodecsv                               0.14.1
universal_pathlib                        0.2.2
uritemplate                              4.1.1
urllib3                                  2.0.7
uv                                       0.2.31
vine                                     5.1.0
virtualenv                               20.26.3
watchtower                               3.2.0
wcwidth                                  0.2.13
websocket-client                         1.8.0
Werkzeug                                 2.2.3
wirerope                                 0.4.7
wrapt                                    1.16.0
WTForms                                  3.1.2
yarl                                     1.9.4
zipp                                     3.19.2
zope.event                               5.0
zope.interface                           6.4.post2
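A listing like the one above can also be checked programmatically, for example to confirm that pandas is present and duckdb is not. The following is a minimal sketch that parses pip list text into a dictionary. The function name parse_pip_list is made up, and the sample rows are copied from the 2.9.3 listing above.

```python
def parse_pip_list(text: str) -> dict[str, str]:
    """Map lower-cased package names to versions from `pip list` output."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


sample = """\
Package                                  Version
---------------------------------------- ---------------
adlfs                                    2024.4.1
pandas                                   2.1.4
pyarrow                                  16.1.0
"""

installed = parse_pip_list(sample)
print(installed.get("pandas"))  # 2.1.4
print("duckdb" in installed)    # False
```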

Python libraries included in the Docker image of Apache Airflow 2.10.3

$ pip list

Package                                  Version
---------------------------------------- ------------
adal                                     1.2.7
adlfs                                    2024.7.0
aiobotocore                              2.15.2
aiofiles                                 23.2.1
aiohappyeyeballs                         2.4.3
aiohttp                                  3.10.10
aioitertools                             0.12.0
aiosignal                                1.3.1
alembic                                  1.14.0
amqp                                     5.2.0
annotated-types                          0.7.0
anyio                                    4.6.2.post1
apache-airflow                           2.10.3
apache-airflow-providers-amazon          9.0.0
apache-airflow-providers-celery          3.8.3
apache-airflow-providers-cncf-kubernetes 9.0.1
apache-airflow-providers-common-compat   1.2.1
apache-airflow-providers-common-io       1.4.2
apache-airflow-providers-common-sql      1.19.0
apache-airflow-providers-docker          3.14.0
apache-airflow-providers-elasticsearch   5.5.2
apache-airflow-providers-fab             1.5.0
apache-airflow-providers-ftp             3.11.1
apache-airflow-providers-google          10.25.0
apache-airflow-providers-grpc            3.6.0
apache-airflow-providers-hashicorp       3.8.0
apache-airflow-providers-http            4.13.2
apache-airflow-providers-imap            3.7.0
apache-airflow-providers-microsoft-azure 11.0.0
apache-airflow-providers-mysql           5.7.3
apache-airflow-providers-odbc            4.8.0
apache-airflow-providers-openlineage     1.13.0
apache-airflow-providers-postgres        5.13.1
apache-airflow-providers-redis           3.8.0
apache-airflow-providers-sendgrid        3.6.0
apache-airflow-providers-sftp            4.11.1
apache-airflow-providers-slack           8.9.1
apache-airflow-providers-smtp            1.8.0
apache-airflow-providers-snowflake       5.8.0
apache-airflow-providers-sqlite          3.9.0
apache-airflow-providers-ssh             3.14.0
apispec                                  6.7.1
argcomplete                              3.5.1
asgiref                                  3.8.1
asn1crypto                               1.5.1
asyncssh                                 2.18.0
attrs                                    24.2.0
Authlib                                  1.3.2
azure-batch                              14.2.0
azure-common                             1.1.28
azure-core                               1.32.0
azure-cosmos                             4.7.0
azure-datalake-store                     0.0.53
azure-identity                           1.19.0
azure-keyvault-secrets                   4.9.0
azure-kusto-data                         4.6.1
azure-mgmt-containerinstance             10.1.0
azure-mgmt-containerregistry             10.3.0
azure-mgmt-core                          1.5.0
azure-mgmt-cosmosdb                      9.6.0
azure-mgmt-datafactory                   9.0.0
azure-mgmt-datalake-nspkg                3.0.1
azure-mgmt-datalake-store                0.5.0
azure-mgmt-nspkg                         3.0.2
azure-mgmt-resource                      23.2.0
azure-mgmt-storage                       21.2.1
azure-nspkg                              3.0.2
azure-servicebus                         7.12.3
azure-storage-blob                       12.23.1
azure-storage-file-datalake              12.17.0
azure-storage-file-share                 12.19.0
azure-synapse-artifacts                  0.19.0
azure-synapse-spark                      0.7.0
babel                                    2.16.0
backoff                                  2.2.1
bcrypt                                   4.2.0
beautifulsoup4                           4.12.3
billiard                                 4.2.1
blinker                                  1.8.2
boto3                                    1.35.36
botocore                                 1.35.36
cachelib                                 0.9.0
cachetools                               5.5.0
cattrs                                   24.1.2
celery                                   5.4.0
certifi                                  2024.8.30
cffi                                     1.17.1
chardet                                  5.2.0
charset-normalizer                       3.4.0
click                                    8.1.7
click-didyoumean                         0.3.1
click-plugins                            1.1.1
click-repl                               0.3.0
clickclick                               20.10.2
colorama                                 0.4.6
colorlog                                 6.9.0
ConfigUpdater                            3.2
connexion                                2.14.2
cron-descriptor                          1.4.5
croniter                                 5.0.1
cryptography                             42.0.8
db-dtypes                                1.3.0
decorator                                5.1.1
Deprecated                               1.2.14
dill                                     0.3.9
distlib                                  0.3.9
dnspython                                2.7.0
docker                                   7.1.0
docstring_parser                         0.16
elastic-transport                        8.15.1
elasticsearch                            8.15.1
email_validator                          2.2.0
eventlet                                 0.37.0
filelock                                 3.16.1
Flask                                    2.2.5
Flask-AppBuilder                         4.5.2
Flask-Babel                              2.0.0
Flask-Caching                            2.3.0
Flask-JWT-Extended                       4.6.0
Flask-Limiter                            3.8.0
Flask-Login                              0.6.3
Flask-Session                            0.5.0
Flask-SQLAlchemy                         2.5.1
Flask-WTF                                1.2.2
flower                                   2.0.1
frozenlist                               1.5.0
fsspec                                   2024.10.0
gcloud-aio-auth                          5.3.2
gcloud-aio-bigquery                      7.1.0
gcloud-aio-storage                       9.3.0
gcsfs                                    2024.10.0
gevent                                   24.10.3
google-ads                               25.1.0
google-analytics-admin                   0.23.2
google-api-core                          2.22.0
google-api-python-client                 2.151.0
google-auth                              2.35.0
google-auth-httplib2                     0.2.0
google-auth-oauthlib                     1.2.1
google-cloud-aiplatform                  1.71.1
google-cloud-appengine-logging           1.5.0
google-cloud-audit-log                   0.3.0
google-cloud-automl                      2.14.1
google-cloud-batch                       0.17.31
google-cloud-bigquery                    3.20.1
google-cloud-bigquery-datatransfer       3.17.1
google-cloud-bigtable                    2.26.0
google-cloud-build                       3.27.0
google-cloud-compute                     1.20.1
google-cloud-container                   2.53.0
google-cloud-core                        2.4.1
google-cloud-datacatalog                 3.21.1
google-cloud-dataflow-client             0.8.13
google-cloud-dataform                    0.5.13
google-cloud-dataplex                    2.3.1
google-cloud-dataproc                    5.15.1
google-cloud-dataproc-metastore          1.16.0
google-cloud-dlp                         3.25.0
google-cloud-kms                         3.1.0
google-cloud-language                    2.15.0
google-cloud-logging                     3.11.3
google-cloud-memcache                    1.10.0
google-cloud-monitoring                  2.23.0
google-cloud-orchestration-airflow       1.15.0
google-cloud-os-login                    2.15.0
google-cloud-pubsub                      2.26.1
google-cloud-redis                       2.16.0
google-cloud-resource-manager            1.13.0
google-cloud-run                         0.10.10
google-cloud-secret-manager              2.21.0
google-cloud-spanner                     3.49.1
google-cloud-speech                      2.28.0
google-cloud-storage                     2.18.2
google-cloud-storage-transfer            1.13.0
google-cloud-tasks                       2.17.0
google-cloud-texttospeech                2.21.0
google-cloud-translate                   3.17.0
google-cloud-videointelligence           2.14.0
google-cloud-vision                      3.8.0
google-cloud-workflows                   1.15.0
google-crc32c                            1.6.0
google-re2                               1.1.20240702
google-resumable-media                   2.7.2
googleapis-common-protos                 1.65.0
graphviz                                 0.20.3
greenlet                                 3.1.1
grpc-google-iam-v1                       0.13.1
grpc-interceptor                         0.15.4
grpcio                                   1.67.1
grpcio-gcp                               0.2.2
grpcio-status                            1.62.3
gunicorn                                 23.0.0
h11                                      0.14.0
h2                                       4.1.0
hpack                                    4.0.0
httpcore                                 1.0.6
httplib2                                 0.22.0
httpx                                    0.27.0
humanize                                 4.11.0
hvac                                     2.3.0
hyperframe                               6.0.1
idna                                     3.10
ijson                                    3.3.0
immutabledict                            4.2.0
importlib-metadata                       6.11.0
importlib_resources                      6.4.5
inflection                               0.5.1
isodate                                  0.7.2
itsdangerous                             2.2.0
Jinja2                                   3.1.4
jmespath                                 0.10.0
json-merge-patch                         0.2
jsonpath-ng                              1.7.0
jsonschema                               4.23.0
jsonschema-specifications                2023.12.1
kombu                                    5.4.2
kubernetes                               30.1.0
kubernetes_asyncio                       30.1.0
lazy-object-proxy                        1.10.0
ldap3                                    2.9.1
limits                                   3.13.0
linkify-it-py                            2.0.3
lockfile                                 0.12.2
looker-sdk                               24.18.1
lxml                                     5.3.0
Mako                                     1.3.6
markdown-it-py                           3.0.0
MarkupSafe                               3.0.2
marshmallow                              3.23.1
marshmallow-oneofschema                  3.1.1
marshmallow-sqlalchemy                   0.28.2
mdit-py-plugins                          0.4.2
mdurl                                    0.1.2
methodtools                              0.4.7
microsoft-kiota-abstractions             1.3.3
microsoft-kiota-authentication-azure     1.1.0
microsoft-kiota-http                     1.3.3
microsoft-kiota-serialization-json       1.0.0
microsoft-kiota-serialization-text       1.0.0
more-itertools                           10.5.0
msal                                     1.31.0
msal-extensions                          1.2.0
msgraph-core                             1.1.6
msrest                                   0.7.1
msrestazure                              0.6.4.post1
multidict                                6.1.0
mysql-connector-python                   9.1.0
mysqlclient                              2.2.5
numpy                                    1.26.4
oauthlib                                 3.2.2
openlineage-integration-common           1.23.0
openlineage-python                       1.23.0
openlineage_sql                          1.23.0
opentelemetry-api                        1.27.0
opentelemetry-exporter-otlp              1.27.0
opentelemetry-exporter-otlp-proto-common 1.27.0
opentelemetry-exporter-otlp-proto-grpc   1.27.0
opentelemetry-exporter-otlp-proto-http   1.27.0
opentelemetry-proto                      1.27.0
opentelemetry-sdk                        1.27.0
opentelemetry-semantic-conventions       0.48b0
ordered-set                              4.1.0
packaging                                24.1
pandas                                   2.1.4
pandas-gbq                               0.24.0
paramiko                                 3.5.0
pathspec                                 0.12.1
pendulum                                 3.0.0
pip                                      24.2
platformdirs                             4.3.6
pluggy                                   1.5.0
ply                                      3.11
portalocker                              2.10.1
prison                                   0.2.1
prometheus_client                        0.21.0
prompt_toolkit                           3.0.48
propcache                                0.2.0
proto-plus                               1.25.0
protobuf                                 4.25.5
psutil                                   6.1.0
psycopg2-binary                          2.9.10
pyarrow                                  18.0.0
pyasn1                                   0.6.1
pyasn1_modules                           0.4.0
PyAthena                                 3.9.0
pycparser                                2.22
pydantic                                 2.9.2
pydantic_core                            2.23.4
pydata-google-auth                       1.8.2
Pygments                                 2.18.0
PyJWT                                    2.9.0
PyNaCl                                   1.5.0
pyodbc                                   5.2.0
pyOpenSSL                                24.2.1
pyparsing                                3.2.0
python-daemon                            3.1.0
python-dateutil                          2.9.0.post0
python-dotenv                            1.0.1
python-http-client                       3.3.7
python-ldap                              3.4.4
python-nvd3                              0.16.0
python-slugify                           8.0.4
python3-saml                             1.16.0
pytz                                     2024.2
PyYAML                                   6.0.2
redis                                    5.2.0
redshift-connector                       2.1.3
referencing                              0.35.1
requests                                 2.32.3
requests-oauthlib                        1.3.1
requests-toolbelt                        1.0.0
rfc3339-validator                        0.1.4
rich                                     13.9.4
rich-argparse                            1.6.0
rpds-py                                  0.20.1
rsa                                      4.9
s3transfer                               0.10.3
scramp                                   1.4.5
sendgrid                                 6.11.0
setproctitle                             1.3.3
setuptools                               75.3.0
shapely                                  2.0.6
six                                      1.16.0
slack_sdk                                3.33.3
sniffio                                  1.3.1
snowflake-connector-python               3.12.3
snowflake-sqlalchemy                     1.6.1
sortedcontainers                         2.4.0
soupsieve                                2.6
SQLAlchemy                               1.4.54
sqlalchemy-bigquery                      1.12.0
SQLAlchemy-JSONField                     1.0.2
sqlalchemy-redshift                      0.8.14
sqlalchemy-spanner                       1.7.0
SQLAlchemy-Utils                         0.41.2
sqlparse                                 0.5.1
sshtunnel                                0.4.0
starkbank-ecdsa                          2.2.0
statsd                                   4.0.1
std-uritemplate                          2.0.0
tabulate                                 0.9.0
tenacity                                 8.5.0
termcolor                                2.5.0
text-unidecode                           1.3
time-machine                             2.16.0
tomlkit                                  0.13.2
tornado                                  6.4.1
typing_extensions                        4.12.2
tzdata                                   2024.2
uc-micro-py                              1.0.3
universal_pathlib                        0.2.5
uritemplate                              4.1.1
urllib3                                  2.2.3
uv                                       0.4.1
vine                                     5.1.0
virtualenv                               20.27.1
watchtower                               3.3.1
wcwidth                                  0.2.13
websocket-client                         1.8.0
Werkzeug                                 2.2.3
wirerope                                 0.4.7
wrapt                                    1.16.0
WTForms                                  3.2.1
xmlsec                                   1.3.14
yarl                                     1.17.1
zipp                                     3.20.2
zope.event                               5.0
zope.interface                           7.1.1
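
A listing this long is hard to scan by eye. The sketch below checks a handful of packages programmatically using the standard library's `importlib.metadata`. The helper name `installed_versions` and the selection of packages in `wanted` are assumptions made for illustration, not part of the original setup; adjust the list to whatever your DAGs import. For instance, the listing above has no `duckdb` entry between `docstring_parser` and `elastic-transport`, so a check like this would report it as missing in that environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Hypothetical selection: packages the example DAGs in this article may need.
    wanted = ["pandas", "pyarrow", "fsspec", "adlfs", "duckdb"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name:<12} {ver or 'NOT INSTALLED'}")
```

Run it with the same Python interpreter that Airflow uses (for example, inside the worker or scheduler container), since the result reflects only the environment it runs in.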