Full Auto Database

Josh Berkus

Red Hat Project Atomic

KubeCon.EU 2016

redhat logo

1 / 57

auto battle game shot

2 / 57

google car

3 / 57

WIP: waiting for Kubernetes 1.2/1.3

under construction

4 / 57

yak shaving

5 / 57

Demo

6 / 57

Single Master DBs: Problem

low availability
unidirectional replication
very manual HA solutions

7 / 57

Why not multi-master DBs?

just moving the problem around

"eventual" consistency
network lag
maturity issues
feature poverty
app compatibility

8 / 57

But PG Replication is Awesome!

Easy to set up
Guaranteed
Corruption-free
Anti-footgun
Combines with DR

9 / 57

y u no guy

Y U No Failover?

10 / 57

"Automated failover is too complicated. You don't want it."

11 / 57

NO!

12 / 57

Hard != Impossible

google car

13 / 57

Hard != Impossible

general autofailover is prohibitive

but ... we can implement common use cases

14 / 57

The 80% Solution

Pool of async replicas
Cheap/replaceable nodes

Containers
Watchdog service
Auto-promote one replica
Other nodes remaster
Update routing
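
The steps on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not code from Handyrep or Patroni: the `Node` fields, the integer `replayed_lsn`, and the `routing` dict are all hypothetical stand-ins, and "promote the replica that has replayed the most WAL" is an assumed selection rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    role: str                      # "master", "replica", or "failed"
    healthy: bool
    replayed_lsn: int              # hypothetical: how much WAL has been applied
    follows: Optional[str] = None  # name of the node this replica streams from


def fail_over(nodes, routing):
    """If the master is unhealthy, promote one replica and repoint the rest."""
    master = next(n for n in nodes if n.role == "master")
    if master.healthy:
        return master.name  # watchdog found nothing to do

    candidates = [n for n in nodes if n.role == "replica" and n.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")

    # Auto-promote one replica: here, the one that has replayed the most WAL.
    new_master = max(candidates, key=lambda n: n.replayed_lsn)
    master.role = "failed"
    new_master.role = "master"
    new_master.follows = None

    # Other nodes remaster: they now follow the promoted node.
    for node in candidates:
        if node is not new_master:
            node.follows = new_master.name

    # Update routing so clients reach the new master.
    routing["master"] = new_master.name
    return new_master.name
```

With a failed master `pg0` and replicas `pg1` (LSN 95) and `pg2` (LSN 98), calling `fail_over` promotes `pg2`, points `pg1` at it, and rewrites `routing["master"]`.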
15 / 57

Now, a little history ...

handyrep logo

16 / 57

Handyrep

master-controller architecture
based on Python Fabric + SSH
worked in production
worked with any Postgres config
pluggable

www.handyrep.org

17 / 57

Handyrep: too general

Difficult to install
Difficult to debug
Over 100 configuration options
Scaled poorly
HR server was SPoF

18 / 57

Zalando

No. 1 European online fashion retailer
15m customers
150 databases
24/7/365 operation

... needed automated, decentralized HA

19 / 57

Failover Failure

False failover
Misfires
Race conditions

20 / 57

split brain

21 / 57

Split Brain and S-M DBs

worst possible outcome
automated recovery impossible
manual recovery painful

22 / 57

St. Francis feeding the flying elephants

Patroni

23 / 57

compose.io announcement

24 / 57

Postgres is a poor store of its own replication state
Smart agents > top-down controllers
25 / 57

Compose Governor

Containers
Etcd-based consensus
Simple PostgreSQL controller

... so we forked it.

26 / 57

How it works

27 / 57

failover in three parts

failover est omnis divisa in partes tres

28 / 57

failover in three parts 2

failover est omnis divisa in partes tres

29 / 57

The Patroni Controller

patroni controller

30 / 57

Patroni controller

Python daemon
Runs in each container as PID 1
Controls Postgres startup/shutdown/config
Provides external REST API
Enforces opinionated config
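
To make the "external REST API" point concrete, here is a rough sketch of a per-node agent that reports its role over HTTP using only the Python standard library. It is not Patroni's implementation: the `/master` and `/replica` paths, the `STATE` dict, and port 8008 are assumptions chosen for illustration, and a real agent would ask Postgres for its role rather than read a global.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process state; a real agent would query Postgres.
STATE = {"role": "replica"}


def status_for(path, role):
    """HTTP status a health probe should get for this path and node role."""
    wanted = {"/master": "master", "/replica": "replica"}.get(path)
    if wanted is None:
        return 404
    return 200 if role == wanted else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = status_for(self.path, STATE["role"])
        body = json.dumps({"role": STATE["role"]}).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8008):
    """Run the health endpoint until interrupted (port is arbitrary here)."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

A load balancer or proxy can then probe `/master` on every node and send writes only to the one that answers 200.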
31 / 57

Patroni Failover

how patroni works animation leader

32 / 57

What about split-brain?

split brain

47 / 57

Etcd

distributed consensus HTTP data store
Raft algorithm
implements CP (consistency + partition tolerance)
great for config + metadata
not for data data
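
One way a consensus store helps against split-brain is a leader key with a time-to-live: only one node can create it, and it expires if the holder stops refreshing it. The sketch below imitates that idea with a single-process, in-memory class. `TTLStore` and its method names are hypothetical and are not etcd's client API; a real store provides the same atomicity across machines, and time is passed in explicitly to keep the example deterministic.

```python
class TTLStore:
    """In-memory stand-in for a consensus store with expiring keys."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key, now):
        """Return the (value, expires_at) entry if it has not expired."""
        entry = self._data.get(key)
        if entry and entry[1] > now:
            return entry
        self._data.pop(key, None)
        return None

    def get(self, key, now):
        entry = self._live(key, now)
        return entry[0] if entry else None

    def create_if_absent(self, key, value, ttl, now):
        """Take the key only if nobody currently holds it."""
        if self._live(key, now):
            return False
        self._data[key] = (value, now + ttl)
        return True

    def refresh_if_equal(self, key, value, ttl, now):
        """Extend the TTL only if the caller still owns the key."""
        entry = self._live(key, now)
        if not entry or entry[0] != value:
            return False
        self._data[key] = (value, now + ttl)
        return True
```

A master that keeps calling `refresh_if_equal("leader", "pg0", ...)` stays leader; once it stops, the key expires and exactly one replica wins `create_if_absent`. A demoted node whose refresh returns `False` knows it must stop accepting writes.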

48 / 57

Etcd Alternatives

Zookeeper
larger scale
supported

Consul
integrates discovery
not (yet) supported

49 / 57

What's AtomicDB?

WIP project

PostgreSQL
Patroni
Atomic Host
Kubernetes
Dynamic proxy (dev)
Cockpit UI (dev)

50 / 57

Let's see that again

51 / 57

The Proxy Problem

differentiate master and read-only connections
master service needs to follow failover
failover logic too complex for Kubernetes (1.1)
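
The two routing requirements above can be sketched as a small router that reads shared cluster state on every connection, so the master endpoint follows a failover without the router being reconfigured. This is an illustrative assumption about how such a proxy could behave, not the AtomicDB or pgbouncer implementation; the `cluster` dict layout and the round-robin choice of replica are hypothetical.

```python
import itertools


class Router:
    """Pick a backend per connection from shared cluster state."""

    def __init__(self, cluster):
        # Hypothetical layout: {"leader": "pg0", "members": ["pg0", "pg1"]}
        self.cluster = cluster
        self._counter = itertools.count()

    def backend_for(self, read_only):
        """Writes go to the current leader; reads rotate over replicas."""
        leader = self.cluster["leader"]
        if not read_only:
            return leader
        replicas = [m for m in self.cluster["members"] if m != leader]
        if not replicas:
            return leader  # fall back to the master when no replica exists
        return replicas[next(self._counter) % len(replicas)]
```

Because `backend_for` looks up `cluster["leader"]` each time, updating that one value after a failover is enough to redirect new read-write connections.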
52 / 57

pgbouncer?

current implementation in pgbouncer
master, read slaves separate services/ports
depends on flannel LB

Not good enough. Waiting for Kubernetes 1.2/1.3!

53 / 57

More features

pg_rewind support (9.4+)
configurable node imaging: WAL-E, PITR
synchronous replication
non-failover replicas

54 / 57

More Stuff Under development

cascading replication
integrated proxy
BDR support?

fork us on Github!

55 / 57

Resources

This Presentation: jberkus.github.io/full_auto_db
Patroni Project: github.com/zalando/patroni
AtomicDB Project: github.com/jberkus/atomicdb

56 / 57

¿questions?

more jberkus: @fuzzychef, www.databasesoup.com
more project atomic: www.projectatomic.io, RedHat booth for Cockpit Kube demo

rh logo

cc by sa

57 / 57
