How to install Apache Airflow on Ubuntu using terminal

To install Apache Airflow on Ubuntu, follow these steps:

  1. Update system packages: Open a terminal window and run the following commands to update the system packages:

Bash

sudo apt update
sudo apt upgrade

  2. Install required dependencies: Airflow is installed with pip, and the development headers below are needed when pip builds optional extras such as MySQL and Kerberos support. Run the following commands to install them:

Bash

sudo apt install software-properties-common
sudo apt-add-repository universe
sudo apt install python3-pip
sudo apt install libmysqlclient-dev libssl-dev libkrb5-dev

  3. Create a virtual environment: The Airflow project does not publish an apt package or PPA; on Ubuntu it is installed with pip from PyPI. Installing it into a virtual environment isolates its dependencies from the system Python environment. To create one, run the following commands:

Bash

sudo apt install python3-venv
python3 -m venv airflow-env

  4. Activate the virtual environment: Activate it in every shell where you run airflow commands, so that pip and airflow resolve to the copies inside airflow-env:

Bash

source airflow-env/bin/activate

  5. Choose an Airflow version and constraints file: The Airflow project publishes a constraints file for each release and Python version, which pins the dependencies to a tested set. The version below is only an example from the 2.x series, which the commands in this guide target; substitute the release you want:

Bash

AIRFLOW_VERSION=2.10.5
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

  6. Install Airflow: With the virtual environment active, install Airflow from PyPI using the constraints file:

Bash

pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
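
The airflow commands in the remaining steps only work in a shell where the virtual environment is active. As a minimal sketch, the check below relies on the VIRTUAL_ENV variable that a venv's activate script exports. The function name require_venv is hypothetical and not part of Airflow.

```shell
# Hypothetical helper: succeed only when a virtual environment is active.
# A venv's activate script exports VIRTUAL_ENV; it is unset otherwise.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "No virtual environment is active; run: source airflow-env/bin/activate" >&2
        return 1
    fi
    echo "Using virtual environment: ${VIRTUAL_ENV}"
}
```

For example, `require_venv && airflow version` prints the installed Airflow version only if the environment is active.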

  7. Initialize the Airflow database: Airflow keeps its metadata (DAG runs, task states, users, connections) in a database, which must be initialized before any user can be created. By default this is a SQLite file under ~/airflow. Run the following command (on Airflow releases older than 2.7, use airflow db init instead):

Bash

airflow db migrate

  8. Create an admin user: The web UI requires a login account. This is an Airflow account stored in the metadata database, not an operating-system user. Replace the example password and email address with your own; if you omit --password, the command prompts for one:

Bash

airflow users create --role Admin --username admin --password admin --email admin@example.com --firstname admin --lastname admin
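
Database initialization also creates Airflow's working directory, which defaults to ~/airflow unless the AIRFLOW_HOME environment variable points elsewhere. The sketch below only resolves that location so you can inspect it. The function name airflow_home is hypothetical.

```shell
# Hypothetical helper: print the directory Airflow uses for its config file,
# default SQLite database, and logs. Airflow reads AIRFLOW_HOME and falls
# back to ~/airflow when it is unset.
airflow_home() {
    printf '%s\n' "${AIRFLOW_HOME:-$HOME/airflow}"
}
```

After initialization, `ls "$(airflow_home)"` should show airflow.cfg and, with the default SQLite setup, airflow.db.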

  9. Start the Airflow scheduler: The Airflow scheduler is responsible for triggering DAGs according to their schedules. To start the scheduler in the background, run the following command:

Bash

nohup airflow scheduler &

  10. Start the Airflow webserver: The Airflow webserver provides a user interface for managing DAGs, tasks, and other Airflow configurations. To start the webserver in the background, run the following command:

Bash

nohup airflow webserver &

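
With plain nohup, output from both services is appended to the same nohup.out file, and you have to look up the process IDs yourself to stop them. A minimal sketch of a wrapper that gives each service its own log and PID file follows. The names start_bg and stop_bg and the file names are hypothetical, not Airflow conventions.

```shell
# Hypothetical helper: run a command in the background, immune to hangup,
# writing its output to NAME.log and its process ID to NAME.pid.
start_bg() {
    name=$1
    shift
    nohup "$@" > "${name}.log" 2>&1 &
    echo "$!" > "${name}.pid"
}

# Hypothetical helper: stop a process that was started with start_bg.
stop_bg() {
    name=$1
    kill "$(cat "${name}.pid")" && rm -f "${name}.pid"
}
```

Under these assumptions the two services would be started with `start_bg scheduler airflow scheduler` and `start_bg webserver airflow webserver`, and stopped later with `stop_bg scheduler` and `stop_bg webserver`.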

  11. Access the Airflow web UI: Once the webserver is running, open a web browser and navigate to http://localhost:8080. Log in with the username admin and the password you chose when creating the admin user.
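
The webserver can take a little while before it accepts connections. As a rough sketch, the loop below polls a URL with curl until it answers or the attempts run out. The name wait_for_http is hypothetical; the /health path in the usage note is the Airflow 2.x webserver health endpoint.

```shell
# Hypothetical helper: request URL up to ATTEMPTS times (default 60),
# sleeping one second between tries. Returns 0 on the first successful
# HTTP response and 1 if every attempt fails.
wait_for_http() {
    url=$1
    attempts=${2:-60}
    tried=0
    while [ "$tried" -lt "$attempts" ]; do
        # -f: treat HTTP error statuses as failure; -s: silent; -o: discard the body
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        sleep 1
        tried=$((tried + 1))
    done
    return 1
}
```

For example, `wait_for_http http://localhost:8080/health && echo "webserver is up"`.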

You should now have a functional Apache Airflow installation on your Ubuntu server. You can start creating and managing DAGs (Directed Acyclic Graphs), which define the tasks in a workflow and the order in which they run.
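
As a first DAG to try, the sketch below writes a one-task example into a dags folder. It assumes Airflow 2.4 or later (for the schedule argument) and the default dags folder under ~/airflow. The function name write_example_dag, the file name, and the dag_id hello_airflow are all hypothetical.

```shell
# Hypothetical helper: write a minimal one-task DAG file into DAGS_DIR
# (default: the dags folder under AIRFLOW_HOME, or ~/airflow/dags).
write_example_dag() {
    dags_dir=${1:-"${AIRFLOW_HOME:-$HOME/airflow}/dags"}
    mkdir -p "$dags_dir"
    # The quoted 'EOF' keeps the shell from expanding anything in the Python code.
    cat > "$dags_dir/hello_airflow.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# schedule=None means the DAG only runs when it is triggered manually.
with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo 'Hello from Airflow'")
EOF
}
```

After running `write_example_dag`, the scheduler may need a few minutes to pick up the new file; the DAG should then appear in the web UI as hello_airflow, where it can be triggered manually.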