+ - 0:00:00
Notes for current slide
Notes for next slide

Postgres 9.5
for Big Data

Josh Berkus

Red Hat OSAS

PDXPUG 3/16

1 / 43

Me

  • PostgreSQL Core Team
  • DBA since 1995
  • Helped create Greenplum
  • Involved with PipelineDB and CitusData
2 / 43

elephant with car

3 / 43

Big data features

  • Sophisticated SQL parser/planner
  • Works with most DW tools
  • Many large DB features (partitioning, etc.)
  • FDWs (more later)
  • Stability & Reliability
4 / 43

Postgres Data Sizing

Comfort Zone: 1 GB to 10 TB

Possible: 10 TB to 100 TB

With Extensions: ???

5 / 43

9.5 Status

Beta 2 Released Nov. 12

Final in a few weeks

Test now!

6 / 43

9.5 Big Data Features

  • UPSERT
  • BRIN indexes
  • GROUPING SETS / CUBE
  • Foreign Schema
  • Faster Sorts
  • Misc
  • Beyond 9.5
7 / 43

UPSERT

8 / 43

UPSERT

Actual Syntax:

INSERT ... ON CONFLICT
DO { UPDATE | IGNORE }

Similar to MERGE, only better

9 / 43

Why you want this

  • works for large sets of rows
  • supports complex logic
  • supports cascading
  • rewrite procedural ETL logic as single SQL statements
  • faster (sometimes)
10 / 43

BRIN

11 / 43

BRIN

  • Stands for Block-Range INdex
  • New index type
  • Also knowns as "skiplist" index
  • Similar to Column Store
  • Works on naturally ordered data
12 / 43

Why you want this

Very small indexes for very large data

13 / 43

BTree

btree index

14 / 43

brin index

15 / 43

Index Loading

table load time graph

16 / 43

BRIN sizing

relname | pg_size_pretty
----------------+----------------
brin_example | 3457 MB
brin_index | 104 kB
brin_index_256 | 64 kB
brin_index_512 | 40 kB
brin_index_64 | 192 kB
btree_index | 2142 MB
17 / 43

GROUPING SETS,
ROLLUP and CUBE

18 / 43

ROLLUP

Get subtotals and totals with detail in one query.

SELECT country, city, count(*)
FROM lwn_subscribers
GROUP BY ROLLUP ( country, city );
19 / 43

CUBE

Get and explosion of all possible summaries for use with OLAP tools.

SELECT country, level, count(*)
FROM lwn_subscribers
GROUP BY CUBE ( country, level );
20 / 43

GROUPING SETS

Superset/programmable version of both ROLLUP and CUBE

SELECT city, level, count(*)
FROM lwn_subscribers
GROUP BY GROUPING SETS
((city, level),(level),());
21 / 43

Why you want this

Complex summary reports in one trip to the database.

Do OLAP in the database instead of outside.

22 / 43

Faster Sorts

23 / 43

Abbreviated Key Sorting

  • Sort Varlena data by key prefix instead of by value
  • Works with TEXT, VARCHAR, NUMERIC
  • 2X to 20X faster
  • Much faster index builds
24 / 43

Why you want this

Duh.

25 / 43

How to use faster sorts

It just works.

26 / 43

Foreign Schema

27 / 43

What's an FDW?

Foreign Data Wrapper

Lets PostgreSQL access external data like it was a local table.

CREATE FOREIGN TABLE myredishash
(key text, val text[])
SERVER redis_server
OPTIONS (database '0', tabletype 'hash',
tablekeyprefix 'mytable:');
INSERT INTO myredishash (key, val)
VALUES ('mytable:r1,
'{prop1,val1,prop2,val2}');
28 / 43

FDWs Rule the Universe

fdw diagram

29 / 43

Many FDWs available

  • Postgres
  • MySQL, Oracle
  • Cstore
  • Hadoop/Hive
  • Cassandra, Redis
  • CSV
  • Twitter
30 / 43

Import Foreign Schema

Copy an entire target database in one command.

IMPORT FOREIGN SCHEMA public
EXCEPT (reports, audit)
FROM SERVER acctng_server INTO acctng;
31 / 43

Foriegn Table Inheritance

Data federation using PostgreSQL partitioning.

create foreign table users_shard_1 ()
INHERITS (users) server shard_1
options ( table_name 'users' );
create foreign table users_shard_2 ()
INHERITS (users) server shard_2
options ( table_name 'users' );
create foreign table users_shard_3 ()
INHERITS (users) server shard_3
options ( table_name 'users' );
32 / 43

Why you want this

Ad-hoc Sharding

(but also check out pg_shard)

33 / 43

More Features

34 / 43

Set LOGGED/UNLOGGED

Use unlogged tables for ELT, then persist them.

ALTER TABLE july_data SET LOGGED;
35 / 43

GiST Index-Only Scan

Index-only scan for PostGIS, Exclusion

-> Index Only Scan using meeting_exclusion on meetings
-> (cost=0.41..1113.98 rows=36038 width=4)
Index Cond: (reservation && '
["2014-05-01 00:00:00+01","2014-05-31 00:00:00+01"]'::tstzrange)
36 / 43

WAL Compression

Reduce PostgreSQL log writes during data loading

Less writing == faster load

37 / 43

TABLESAMPLE

Get a "quick look" at data in a big table.

SELECT * FROM user_profiles
TABLESAMPLE BERNOULLI ( 0.001 );
38 / 43

Beyond 9.5

39 / 43

9.6: Parallel Seq Scan

Finally, parallel query for Postgres

set max_parallel_degree = 4;
select * from pgbench_accounts
where filler like '%a%';
QUERY PLAN
---------------------------------------------
Gather
Number of Workers: 4
-> Parallel Seq Scan on pgbench_accounts
Filter: (filler ~~ '%a%'::text)
40 / 43

cstore & pg_shard

extensions from CitusData

cstore: column-store for Postgres

pg_shard: automated DW sharding

41 / 43

Big Data Forks Go OSS

PipelineDB: August 2015

Greenplum: November 2015

CitusDB: March? 2016

42 / 43

¿questions?

more
jberkus:

more
events:

www.projectatomic.io
www.databasesoup.com
jberkus.github.io

pgCon:
May 21, Ottawa

pgConfSV
Nov. 16, San Francisco

43 / 43

Me

  • PostgreSQL Core Team
  • DBA since 1995
  • Helped create Greenplum
  • Involved with PipelineDB and CitusData
2 / 43
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow