PostgreSQL
PostgreSQL (postgres or pg
) is an amazing open source relational database that provides the SQL DSL for interacting with data. Installation is a breeze with any package manager, packages to grab:
postgresql
andpostgresql-common
: core serverpostgresql-client-common
andpostgresql-client
: client libs and binariespostgresql-contrib
: useful bolt on modules
Once installed, is managed as a daemon by systemd.
$ sudo systemctl start postgresql
$ sudo systemctl stop postgresql
$ sudo systemctl restart postgresql
$ sudo systemctl reload postgresql
$ sudo systemctl status postgresql
Core Concepts⌗
Configuration⌗
Depends on distro, generally somewhere like /etc/postgresql/11/main
.
postgresql.conf
the main config file, by default psql runs on port 5432, and the data directory lives invar
somewhere like/var/lib/postgresql/11/main
pg_hba.conf
host based authenticationpg_ident.conf
ident authentication config
Roles and Security⌗
Postgres simplifies users and groups, into the role (i.e. they are users, they are roles).
A number of authentication methods are available, with the default being ident.
ident
andpeer
: if a local nix account that has the same name as a psql role, can sign in as that role. Optional user to role mappings can go inpg_ident.conf
.ident
applies to TCP/IP, andpeer
for local connections.md5
: takes an MD5 hashed passwordpassword
: clear texttrust
: no password needed, based on an IP/user/DB matchreject
: deny immediatelyGSSAPI
: standard secure auth protocol defined in RFC2743, and often combined with KerberosSSPI
: for Microsoft environments, uses Kerberos when possible, will fall back to NTLM- and lots more
kerberos
,LDAP
,RADIUS
,Certificate
andPAM
By default, only the postgres
user has read/write access to the configuration files, to edit make sure you are running as this identity.
To allow a non-default login, open up the the pg_hba.conf
(host based auth) config file.
$ sudo -u postgres vim pg_hba.conf
Basics⌗
Connectivity⌗
Starting off with ident
authentication, by default on installation a new OS user and PG role called postgres
is created. Running the psql
client as this user is one way to gain initial access, i.e. yb sudo -i -u postgres
or by running psql
under the user sudo -u postgres psql
.
postgres=# SELECT version();
version
----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11.1 (Ubuntu 11.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit
(1 row)
Enabling remote access, is generally a first desirable goal.
-
Open
postgresql.conf
(from above), and uncommentlisten_addresses = 'localhost'
, changelocalhost
to*
(bind to all addresses) -
Open
pg_hba.conf
, and addhost all all 192.168.1.0/24 trust
(substituting the network and CIDR for whats appropriate) -
Bounce the PostgreSQL server process
sudo systemctl restart postgresql
-
From a remote (client) host verify the port is open with a port scan
nmap -sT 192.168.1.238
(thats my psql server).$ nmap -sT 192.168.1.238
Nmap scan report for 192.168.1.238 PORT STATE SERVICE 22/tcp open ssh 5432/tcp open postgresql
Client Utilities⌗
PostgreSQL ships with several handy client applications. You can see the symlinks to them in /usr/bin
by running ls -l /usr/bin | grep postgres
.
clusterdb
cluster a databasecreatedb
create a new databasecreatelang
install a procedural languagecreateuser
define a new user accountdropdb
remove a databasedroplang
remove a procedural languagedropuser
remove a user accountpg_basebackup
take a base backup of a clusterpgbench
run a benchmark testpg_dump
extract a database into a script file or other archive filepg_dumpall
extract a database cluster into a script filepg_isready
check the connection status of a serverpg_receivewal
stream write-ahead logs from a serverpg_receivexlog
streams transaction logs from a clusterpg_recvlogical
control logical decoding streamspg_restore
restore a database from an archive file created bypg_dump
psql
interactive terminalreindexdb
reindex a databasevacuumdb
garbage-collect and analyze a database
For example to create a user and database, could run:
$ sudo -u postgres createuser --login ben
And:
$ sudo -u postgres createdb --owner=ben foodb
To make the ben
role a superuser could have passed the -s
switch to createuser
, to promote the existing low powered role, can leverage the ALTER USER
SQL command. First can review existing role permissions with a quick \du
:
$ sudo -u postgres psql
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
ben | | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Then the SQL:
postgres=# ALTER USER ben WITH SUPERUSER CREATEDB CREATEROLE REPLICATION LOGIN;
ALTER ROLE
Verification:
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
ben | Superuser, Create role, Create DB, Replication | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Several privileges are available and well documented:
SUPERUSER
can override any access controls within the database. Dangerous.CREATEDB
can spin up new dbsCREATEROLE
can create/drop roles (users)INHERIT
automatically applies powers of roles that the role is a member of (without explicitSET ROLE
)LOGIN
can be used for initial session authorisation (i.e. a user)REPLICATION
needed for connectivity to servers in replication mode.BYPASSRLS
bypass row-level security (RLS) policy.
Each of the below has an explicit deny version of the privilege, prefixed with NO, e.g. NOSUPERUSER
.
psql Primer⌗
\?
: show all psql commands\h
show SQL syntax help\c
dbname [username]: Connect to database, with an optional username (or\connect
)
Display commands (append +
for extra details):
\l
: List all database (or\list
)\d
: Display all tables, indexes, views, and sequences\dt
: Display all tables\di
: Display all indexes\dv
: Display all views\ds
: Display all sequences\dT
: Display all types\dS
: Display all system tables\du
: Display all users\x
auto|on|off: Toggle|On|Off expanded output mode
To change database use \c
:
\c enceladus
enceladus=#
To get specific SQL statement help, try handy psql command \h
:
enceladus=# \h CREATE TABLE
Command: CREATE TABLE
Description: define a new table
Syntax:
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
{ column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
| table_constraint
....
Go for it:
enceladus=# drop table if exists master_plan;
enceladus=# create table master_plan(
enceladus(# id serial primary key,
enceladus(# the_date date,
enceladus(# title varchar(100),
enceladus(# description text
enceladus(# );
CREATE TABLE
SQL script files⌗
Great for reusable SQL, and putting into Git.
psql foodb -h 10.1.2.3 < build.sql
Or:
psql foodb -h 10.1.2.3 -f build.sql
Importing Data (ETL)⌗
Generally ETL software is bloat. KISS with:
- Idempotent shell scripts and
make
files. Idempotency is important, so that scripts can be reliably run numerous times and result in the same outcome. All data should be typed asTEXT
, typing and cleanup can be dealt with later on. - If needed, elevate to Python using Pandas or PETL
- If needed, elevate by considering a dedicating piece of middleware such as Kafka or Logstash
Digging through the help \h COPY
found COPY FROM
will be useful for extracting the CSV on disk:
copy data between a file and a table
For example:
COPY master_plan
FROM '<put_path_here>/master_plan.csv'
WITH DELIMITER ',' HEADER CSV;
Note, the SQL based approach is server based, i.e. the csv path will need to be accessible by the server itself.
For a client side approach can leverage the \copy
commands, like so:
\copy master_plan FROM '/home/ben/code/psql/cassini_data/curious_data/data/master_plan.csv' WITH DELIMITER ',' HEADER CSV;
Uber handy⌗
Number of active connections⌗
select pid from pg_stat_activity where datname = 'svc';
Kill connections⌗
select pg_terminate_backend(pid) from pg_stat_activity where pid <> pg_backend_pid() and datname = 'database_name';