CeleryExecutor is one of the ways you can scale out the number of workers. For this to work, you need to set up a Celery backend (RabbitMQ, Redis, …) and change your airflow.cfg to point the executor parameter to CeleryExecutor and provide the related Celery settings.
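For example, the relevant parts of airflow.cfg might look roughly like the sketch below, which assumes a Redis broker and a Redis result backend running locally; the exact section and option names (for instance result_backend versus celery_result_backend) differ between Airflow versions, so check the configuration reference for your release.

    [core]
    executor = CeleryExecutor

    [celery]
    broker_url = redis://localhost:6379/0
    result_backend = redis://localhost:6379/1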
For more information about setting up a Celery broker, refer to the exhaustive Celery documentation on the topic.
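If you just need a broker to experiment with, one quick (non-production) way to get one is to run Redis locally, for example with Docker; RabbitMQ works just as well.

    # throwaway Redis broker listening on the default port 6379
    docker run --name airflow-broker -d -p 6379:6379 redis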
Here are a few imperative requirements for your workers:
- airflow needs to be installed, and the CLI needs to be in the path
- the operators you run on a worker need to have their dependencies met in that context: if you use the HiveOperator, the hive CLI needs to be installed on that box, or if you use the MySqlOperator, the required Python library needs to be available in the PYTHONPATH somehow
- the worker needs access to its DAGS_FOLDER, and you need to synchronize the filesystems by your own means. A common setup is to store your DAGS_FOLDER in a Git repository and sync it across machines using Chef, Puppet, Ansible, or whatever you use to configure machines in your environment (a simple Git-based sketch follows this list). If all your boxes have a common mount point, having your pipeline files shared there should work as well
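As one illustration of the Git-based sync mentioned above, a cron entry on each worker can simply pull the repository into the DAGS_FOLDER. The repository location and path below are hypothetical; adapt them to your own environment and deployment tooling.

    # hypothetical crontab entry: refresh the DAGS_FOLDER from Git every 5 minutes
    # (assumes the repository was already cloned into /srv/airflow/dags)
    */5 * * * * cd /srv/airflow/dags && git pull --ff-only origin master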
To kick off a worker, set up Airflow on that machine and run the worker subcommand:

    airflow worker

Your worker should start picking up tasks as soon as they get fired in its direction.
Note that you can also run “Celery Flower”, a web UI built on top of Celery,
to monitor your workers. You can use the shortcut command airflow flower
to start a Flower web server.
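For example, assuming the defaults (Flower usually serves its UI on port 5555, though this can be configured):

    # start the Flower monitoring UI, then browse to http://localhost:5555
    airflow flower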
Some caveats: