What is Citus?
Citus is an open source extension to Postgres that distributes data and queries across multiple nodes in a cluster. Because Citus is an extension (not a fork) of Postgres, when you use Citus, you are also using Postgres, and you can leverage the latest Postgres features, tooling, and ecosystem.
Citus transforms Postgres into a distributed database with features like sharding, a distributed SQL engine, reference tables, and distributed tables. The Citus combination of parallelism, keeping more data in memory, and higher I/O bandwidth can lead to significant performance improvements for multi-tenant SaaS applications, customer-facing real-time analytics dashboards, and time series workloads.
Multi-Node Citus
Steps to Perform on all Nodes:
1. Add repository
# Add Citus repository for package manager
curl https://install.citusdata.com/community/rpm.sh | sudo bash
2. Install PostgreSQL + Citus and initialize a database
# install PostgreSQL with Citus extension
sudo yum install -y citus101_13
# initialize system database (using RHEL 6 vs 7 method as necessary)
sudo service postgresql-13 initdb || sudo /usr/pgsql-13/bin/postgresql-13-setup initdb
# preload citus extension
echo "shared_preload_libraries = 'citus'" | sudo tee -a /var/lib/pgsql/13/data/postgresql.conf
3. Configure connection and authentication
Before starting the database let’s change its access permissions. By default the database server listens only to clients on localhost. As a part of this step, we instruct it to listen on all IP interfaces, and then configure the client authentication file to allow all incoming connections from the local network.
sudo vi /var/lib/pgsql/13/data/postgresql.conf
# Uncomment listen_addresses for the changes to take effect
listen_addresses = '*'
sudo vi /var/lib/pgsql/13/data/pg_hba.conf
# Allow unrestricted access to nodes in the local network. The range below
# covers the 10.0.0.0/8 private IPv4 address space; adjust it (for example to
# 172.16.0.0/12 or 192.168.0.0/16) to match your own network.
host all all 10.0.0.0/8 trust
# Also allow the host unrestricted access to connect to itself
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
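Instead of editing the file by hand with vi, the listen_addresses change can also be applied non-interactively with sed. The snippet below demonstrates the edit on a scratch copy of the file; in practice you would point it at /var/lib/pgsql/13/data/postgresql.conf (the default data directory for this setup) and run it with sudo.

```shell
# Demonstrate the listen_addresses edit on a scratch file; in practice,
# set PGCONF=/var/lib/pgsql/13/data/postgresql.conf and run with sudo.
PGCONF=$(mktemp)
echo "#listen_addresses = 'localhost'" > "$PGCONF"

# Uncomment the setting and listen on all IP interfaces
sed -i "s/^#\?listen_addresses.*/listen_addresses = '*'/" "$PGCONF"

grep listen_addresses "$PGCONF"   # prints: listen_addresses = '*'
rm -f "$PGCONF"
```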
4. Start database servers, create Citus extension
# start the db server
sudo service postgresql-13 restart
# and make it start automatically when computer does
sudo chkconfig postgresql-13 on
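The service and chkconfig commands above target the older SysV init tooling; on systemd-based systems (RHEL/CentOS 7 and later) the equivalent commands are:

```shell
# systemd equivalents of the service/chkconfig commands above
sudo systemctl restart postgresql-13
sudo systemctl enable postgresql-13
```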
You must add the Citus extension to every database you would like to use in a cluster. The following example adds the extension to the default database which is named postgres.
sudo -i -u postgres psql -c "CREATE EXTENSION citus;"
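Since the extension must exist in every database that will participate in the cluster, a loop like the following can cover them all in one pass. This is a sketch: it assumes the server is running and that the postgres OS user can connect via peer authentication.

```shell
# Create the citus extension in every non-template database (sketch)
for db in $(sudo -i -u postgres psql -At -c \
    "SELECT datname FROM pg_database WHERE NOT datistemplate;"); do
  sudo -i -u postgres psql -d "$db" -c "CREATE EXTENSION IF NOT EXISTS citus;"
done
```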
Steps to Perform on the Coordinator Node:
Run the following steps only on the coordinator node, after the steps above have been completed on all nodes.
1. Add worker node information
We need to inform the coordinator about its workers. To add this information, we call a UDF which adds the node information to the pg_dist_node catalog table, which the coordinator uses to get the list of worker nodes. For our example, we assume that there are two workers (named worker-101, worker-102). Add the workers’ DNS names (or IP addresses) and server ports to the table.
sudo -i -u postgres psql -c "SELECT * from citus_add_node('worker-101', 5432);"
sudo -i -u postgres psql -c "SELECT * from citus_add_node('worker-102', 5432);"
2. Verify that installation has succeeded
To verify that the installation has succeeded, we check that the coordinator node has picked up the desired worker configuration.
When run via psql, this command should output the worker nodes we added to the pg_dist_node table above.
sudo -i -u postgres psql -c "SELECT * FROM citus_get_active_worker_nodes();"
Ready to use Citus
At this step, you have completed the installation process and are ready to use your Citus cluster.
The new Citus database is accessible in psql through the postgres user:
sudo -i -u postgres psql
Time to create some tables and data using pgbench:
pgbench -i -s 10 citus
At this point these are standard Postgres tables. We can turn them into distributed tables with the create_distributed_table function, providing the column we want to shard on:
SELECT create_distributed_table('pgbench_accounts', 'aid');
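pgbench also creates pgbench_branches, pgbench_tellers, and pgbench_history; a sketch for distributing those as well. The column choices (bid, tid, aid) come from pgbench's standard schema; adjust them to your own sharding strategy.

```sql
-- Sketch: distribute the remaining pgbench tables (column names taken
-- from pgbench's default schema)
SELECT create_distributed_table('pgbench_branches', 'bid');
SELECT create_distributed_table('pgbench_tellers', 'tid');
SELECT create_distributed_table('pgbench_history', 'aid');
```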
This creates a distributed table, and you can query the citus_tables view to get more information:
postgres@ip-10-0-1-23:/home/postgres/ [citus] psql -c "select * from citus_tables"
table_name | citus_table_type | distribution_column | colocation_id | table_size | shard_count | table_owner | access_method
------------------+------------------+---------------------+---------------+------------+-------------+-------------+---------------
pgbench_accounts | distributed | aid | 1 | 151 MB | 32 | postgres | heap
(1 row)
Register the coordinator's own host in the cluster metadata:
SELECT citus_set_coordinator_host('10.0.1.23');
Now we can add the workers:
SELECT * from citus_add_node('10.0.1.16', 5432);
Distributing the existing data across the nodes is something you need to kick off manually:
SELECT rebalance_table_shards();
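To see where the shards ended up after rebalancing, you can query the citus_shards view (available since Citus 10); a sketch:

```sql
-- Count shards per node to confirm the rebalance spread them out
SELECT nodename, count(*)
FROM citus_shards
GROUP BY nodename
ORDER BY nodename;
```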
Finally, remove the data from the coordinator node by draining it:
SELECT citus_drain_node('10.0.1.23',5432);