PostgreSQL (`postgres` or `pgsql`) is a powerful open source relational database known for reliability, extensibility, and standards compliance. It features:
- Advanced SQL support (window functions, CTEs, JSON, full-text search)
- ACID compliance and strong transactional integrity
- Rich indexing (B-tree, GIN, GiST, BRIN, hash, SP-GiST)
- Extensible with custom types, operators, and functions
- MVCC for high concurrency and performance
- Robust security, authentication, and role management
- Active community, frequent releases, and excellent documentation
Ideal for everything from small apps to large-scale, mission-critical systems.
## Core Concepts

### Configuration

Location depends on the distro; generally somewhere like `/etc/postgresql/11/main`.

- `postgresql.conf`: the main config file. By default the server listens on port 5432, and the data directory lives somewhere like `/var/lib/postgresql/11/main`
- `pg_hba.conf`: host based authentication
- `pg_ident.conf`: ident authentication config
### Roles and Security

Postgres simplifies users and groups into a single concept: the role (i.e. users are roles, groups are roles). A number of authentication methods are available, with the default being `ident`.
- `ident` and `peer`: a local *nix account with the same name as a Postgres role can sign in as that role. Optional user-to-role mappings can go in `pg_ident.conf`. `ident` applies to TCP/IP connections, `peer` to local connections.
- `md5`: takes an MD5 hashed password
- `password`: clear text
- `trust`: no password needed, based on an IP/user/DB match
- `reject`: deny immediately
- `GSSAPI`: standard secure auth protocol defined in RFC 2743, often combined with Kerberos
- `SSPI`: for Microsoft environments; uses Kerberos when possible, will fall back to NTLM
- And lots more: `kerberos`, `LDAP`, `RADIUS`, certificate and `PAM`
By default, only the `postgres` user has read/write access to the configuration files; to edit them, make sure you are running as this identity. To allow a non-default login, open up the `pg_hba.conf` (host based auth) config file.

```shell
$ sudo -u postgres vim pg_hba.conf
```
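A sketch of what `pg_hba.conf` entries look like (columns are connection type, database, user, address and auth method; the addresses and names here are purely illustrative):

```
# TYPE  DATABASE  USER      ADDRESS         METHOD
local   all       postgres                  peer
host    all       all       127.0.0.1/32    md5
host    foodb     ben       192.168.1.0/24  md5
```

Entries are matched top to bottom; the first matching line decides the auth method used.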
## Basics

### Connectivity
Starting off with `ident` authentication: by default on installation, a new OS user and PG role called `postgres` is created. Running the `psql` client as this user is one way to gain initial access, i.e. by `sudo -i -u postgres`, or by running `psql` under the user: `sudo -u postgres psql`.
```
postgres=# SELECT version();
                                                              version
-----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 (Ubuntu 11.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit
(1 row)
```
Enabling remote access is generally a first desirable goal.

- Edit `postgresql.conf`: uncomment `listen_addresses = 'localhost'`, and change `localhost` to `*`
- Edit `pg_hba.conf`: add `host all all 192.168.1.0/24 trust` (substituting the network and CIDR for what's appropriate)
- Bounce the daemon: `sudo systemctl restart postgresql`
- From a remote (client) host, verify the port is open with a port scan: `nmap -sT 192.168.1.238`
```
$ nmap -sT 192.168.1.238
Nmap scan report for 192.168.1.238
PORT     STATE SERVICE
22/tcp   open  ssh
5432/tcp open  postgresql
```
### Client Utilities

PostgreSQL ships with several handy client applications. You can see the symlinks to them in `/usr/bin` by running `ls -l /usr/bin | grep postgres`.
- `clusterdb`: cluster a database
- `createdb`: create a new database
- `createlang`: install a procedural language
- `createuser`: define a new user account
- `dropdb`: remove a database
- `droplang`: remove a procedural language
- `dropuser`: remove a user account
- `pg_basebackup`: take a base backup of a cluster
- `pgbench`: run a benchmark test
- `pg_dump`: extract a database into a script file or other archive file
- `pg_dumpall`: extract a database cluster into a script file
- `pg_isready`: check the connection status of a server
- `pg_receivewal`: stream write-ahead logs from a server
- `pg_receivexlog`: stream transaction logs from a cluster
- `pg_recvlogical`: control logical decoding streams
- `pg_restore`: restore a database from an archive file created by `pg_dump`
- `psql`: interactive terminal
- `reindexdb`: reindex a database
- `vacuumdb`: garbage-collect and analyze a database
For example, to create a user and database, one could run:

```shell
$ sudo -u postgres createuser --login ben
```

And:

```shell
$ sudo -u postgres createdb --owner=ben foodb
```
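These shell utilities are thin wrappers over plain SQL; the equivalent statements, run from within `psql`, would be:

```sql
-- same effect as: createuser --login ben
CREATE ROLE ben LOGIN;

-- same effect as: createdb --owner=ben foodb
CREATE DATABASE foodb OWNER ben;
```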
To make the `ben` role a superuser, the `-s` switch could have been passed to `createuser`; to promote the existing low-powered role instead, leverage the `ALTER USER` SQL command. First, review existing role permissions with a quick `\du`:

```shell
$ sudo -u postgres psql
```

```
postgres=# \du
                                      List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 ben       |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
```
Then the SQL:

```
postgres=# ALTER USER ben WITH SUPERUSER CREATEDB CREATEROLE REPLICATION LOGIN;
ALTER ROLE
```
Verification:

```
postgres=# \du
                                      List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 ben       | Superuser, Create role, Create DB, Replication             | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
```
Several privileges are available and well documented:

- `SUPERUSER`: can override any access controls within the database. Dangerous.
- `CREATEDB`: can spin up new DBs
- `CREATEROLE`: can create/drop roles (users)
- `INHERIT`: automatically applies the powers of roles that the role is a member of (without an explicit `SET ROLE`)
- `LOGIN`: can be used for initial session authorisation (i.e. a user)
- `REPLICATION`: needed for connectivity to servers in replication mode
- `BYPASSRLS`: bypass row-level security (RLS) policies
Each of the above has an explicit deny version of the privilege, prefixed with `NO`, e.g. `NOSUPERUSER`.
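For example, the `NO` variants can be used to demote a role again; a sketch, using the `ben` role from earlier:

```sql
-- strip the elevated privileges, leaving an ordinary login role
ALTER USER ben WITH NOSUPERUSER NOCREATEDB NOCREATEROLE NOREPLICATION;
```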
## psql Primer

- `\?`: show all psql commands
- `\h`: show SQL syntax help
- `\c dbname [username]`: connect to database, with an optional username (or `\connect`)

Display commands (append `+` for extra details):
- `\l`: list all databases (or `\list`)
- `\d`: display all tables, indexes, views, and sequences
- `\dt`: display all tables
- `\di`: display all indexes
- `\dv`: display all views
- `\ds`: display all sequences
- `\dT`: display all types
- `\dS`: display all system tables
- `\du`: display all users
- `\x auto|on|off`: set expanded output mode to auto, on or off
To change database, use `\c`:

```
\c enceladus
enceladus=#
```
To get help for a specific SQL statement, try the handy psql command `\h`:

```
enceladus=# \h CREATE TABLE
Command:     CREATE TABLE
Description: define a new table
Syntax:
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
  { column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
    | table_constraint
....
```
Go for it:

```
enceladus=# drop table if exists master_plan;
enceladus=# create table master_plan(
enceladus(#   id serial primary key,
enceladus(#   the_date date,
enceladus(#   title varchar(100),
enceladus(#   description text
enceladus(# );
```
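A quick smoke test of the new table (the row values here are made up for illustration):

```sql
INSERT INTO master_plan (the_date, title, description)
VALUES ('2004-07-01', 'Saturn orbit insertion', 'Cassini enters orbit around Saturn');

SELECT id, the_date, title FROM master_plan;
```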
## SQL script files

Great for reusable SQL, and for putting into Git.

```shell
psql foodb -h 10.1.2.3 < build.sql
```

Or:

```shell
psql foodb -h 10.1.2.3 -f build.sql
```
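A sketch of what an idempotent `build.sql` might contain (the table and column names are illustrative):

```sql
-- build.sql: safe to run repeatedly, always ends in the same state
DROP TABLE IF EXISTS import_staging;

CREATE TABLE import_staging (
  id serial PRIMARY KEY,
  raw_line text
);
```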
## Importing Data (ETL)

Generally, ETL software is bloat. KISS with:

- Idempotent shell scripts and `make` files. Idempotency is important, so that scripts can be reliably run numerous times and result in the same outcome. All data should be typed as `TEXT`; typing and cleanup can be dealt with later on.
- If needed, elevate to Python using Pandas or PETL
- If needed, elevate by considering a dedicated piece of middleware such as Kafka or Logstash
Digging through the help (`\h COPY`), `COPY FROM` ("copy data between a file and a table") will be useful for loading the CSV on disk. For example:

```sql
COPY master_plan
FROM '<put_path_here>/master_plan.csv'
WITH DELIMITER ',' HEADER CSV;
```
Note, the SQL based approach is server side, i.e. the CSV path will need to be accessible by the server itself. For a client side approach, leverage the `\copy` command, like so:

```
\copy master_plan FROM '/home/ben/code/psql/cassini_data/curious_data/data/master_plan.csv' WITH DELIMITER ',' HEADER CSV;
```
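Following the all-`TEXT` import advice above, typing and cleanup can then be dealt with in SQL; a sketch, assuming a staging table `import_staging` with an `event_date` column loaded as `TEXT`:

```sql
-- convert the TEXT column to a proper date in place
ALTER TABLE import_staging
  ALTER COLUMN event_date TYPE date
  USING event_date::date;
```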
## JSON and JSONB

### Key Benefits for JSONB

```sql
-- Fast containment queries - find events containing specific data:
SELECT * FROM events WHERE event_data @> '{"user_id": 123}';
SELECT * FROM events WHERE event_data @> '{"type": "OrderCreated"}';

-- Existence checks - check if a JSON key exists:
SELECT * FROM events WHERE event_data ? 'correlation_id';
SELECT * FROM events WHERE event_data ?& array['user_id', 'amount'];

-- JSON path queries - query nested JSON structures:
SELECT * FROM events WHERE event_data @@ '$.metadata.version > 2';
```
Event sourcing use cases (one of my favourite architectural patterns):

```sql
-- Aggregate filtering:
WHERE event_data @> '{"aggregate_id": "order-123"}'

-- Event type filtering - all payment events:
WHERE event_data @> '{"event_type": "PaymentProcessed"}'

-- Complex event queries - events with specific metadata:
WHERE event_data @> '{"metadata": {"source": "api"}}'
```
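The queries above assume an event store table along these lines (a sketch; the table and column names are illustrative):

```sql
CREATE TABLE events (
  id bigserial PRIMARY KEY,
  event_data jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);
```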
## Indexing

### GIN (Generalized Inverted Index)

GIN (Generalized Inverted Index) provides fast querying capabilities for `JSONB` data in PostgreSQL. GIN essentially makes `JSONB` queries perform like traditional indexed columns, which is crucial for use cases such as event stores with millions of fine grained events.
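A sketch of creating such an index, assuming the `events`/`event_data` names used in the examples above:

```sql
-- index the whole JSONB document; supports @>, ?, ?| and ?& operators
CREATE INDEX idx_events_event_data ON events USING GIN (event_data);

-- or, a smaller and faster index supporting only @> containment
CREATE INDEX idx_events_event_data_path ON events USING GIN (event_data jsonb_path_ops);
```

The `jsonb_path_ops` operator class trades operator coverage for a more compact index, which can matter at event-store scale.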
## Uber handy

Number of active connections:

```sql
select count(*) from pg_stat_activity where datname = 'svc';
```
Kill connections:

```sql
select pg_terminate_backend(pid) from pg_stat_activity where pid <> pg_backend_pid() and datname = 'database_name';
```