Database Engines Crash

Learn ACID, Indexing, Partitioning, Sharding, Concurrency control, Replication, DB Engines, Best Practices and More!

Last updated 2022-01-10 | 4.8

- Learn and understand ACID Properties
- Database Indexing
- Database Partitioning

What you'll learn

Learn and understand ACID Properties
Database Indexing
Database Partitioning
Database Replication
Database Sharding
Database Cursors
Concurrency Control (Optimistic
Pessimistic)
B-Trees in Production Database Systems
Database System Designs
Difference between Database Management System
Database Engine and Embedded database
Database Engines such as MyISAM
InnoDB
RocksDB
LevelDB and More
Benefits of Using one database engine over the other
Switching Database Engines with MySQL
Database Security
Homomorphic Encryption

* Requirements

* Have worked with databases before but wish to get deeper understanding
* Basic SQL knowledge

Description

  • Learn and understand ACID Properties
  • Database Indexing
  • Database Partitioning
  • Database Replication
  • Database Sharding
  • Database Cursors
  • Concurrency Control (Optimistic, Pessimistic)
  • B-Trees in Production Database Systems
  • Database System Designs
  • Difference between Database Management System, Database Engine and Embedded database
  • Database Engines such as MyISAM, InnoDB, RocksDB, LevelDB and More
  • Benefits of Using one database engine over the other
  • Switching Database Engines with MySQL
  • Database Security
  • Homomorphic Encryption

Course content

17 sections • 132 lectures

Welcome to the Course Preview 06:36

Course Update Oct 2020 Preview 00:37

Course Update April 2021 Preview 07:36

Course Update December 2021 Preview 09:24

Note about Docker Preview 00:44

Introduction to ACID Preview 03:54

ACID which stands for Atomicity, consistency, isolation, and durability are four critical properties of relational database. I think any engineer working with a relational database like postgres, mysql, sqlserver oracle, should understand these properties.


In this course, we will go through the four properties and explain why each is critical to build and use a relational database successfully.

What is a Transaction? Preview 13:34

Atomicity Preview 10:04

Isolation Preview 31:53

Consistency Preview 12:41

Durability Preview 12:12

Eventual Consistency Preview 14:33

ACID by Practical Examples Preview 21:21

In this video we will demonstrate Atomicity, Isolation, Consistency and Durability on Postgres, fully practical example.



Phantom Reads Preview 07:35

Serializable vs Repeatable Read Preview 08:17

ACID Quiz

Answer the following questions about ACID properties in databases

How tables and indexes are stored on disk (MUST WATCH before continue) Preview 21:03

This lecture details the inner working of database systems with regards to storage. It is a must watch to understand the difference between tables, pages, IO, rows, indexes and data files.

Row-Based vs Column-Based Databases Preview 34:15

Primary Key vs Secondary Key - What you probably didn't know Preview 10:56

In this lecture I will discuss the difference between Primary Key and a Secondary Key and how it can affect your performance.


Create Postgres Table with a million Rows (from scratch) Preview 04:22

Lots of you asked me how to create a table with millions of rows in postgres, here are the details


Getting Started with Indexing Preview 18:18

Understanding The SQL Query Planner and Optimizer with Explain Preview 10:16

Index Scan vs Index Only Scan Preview 08:08

Key vs Non-Key Column Database Indexing Preview 17:02

Combining Database Indexes for Better Performance Preview 14:07

How Database Optimizers Decide to Use Indexes Preview 11:19

Bitmap Index Scan vs Index Scan vs Table Scan Preview 11:24

In this video, I explain the benefits of Bitmap Index Scan and how it differs from Index scan and table sequential scan. 

Create Index Concurrently - Avoid Blocking Production Database Writes Preview 03:10

If you create an index on a large production table in postgres, the operations blocks writes in order to make sure to pull all the field entries to the index. However most of the time you can't afford to block writes on an active production database table. Postgres new feature which allows create index concurrently allows writes and reads to go in the expense of cpu/memory, time and chance for the index to be invalid. A small price to pay for fast production writes! https://www.postgresql.org/docs/9.1/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

Bloom Filters Preview 09:18

Working with Billion-Row Table Preview 13:40

Article - The Cost of Long running Transactions Preview 01:40

Article - Microsoft SQL Server Clustered Index Design Preview 03:18

B-Tree Section's Introduction & Agenda Preview 03:34

B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. However, most contents explain this data structure from a theoretical point of view, in this lecture I’d like to shed some light on the practical considerations of B-Tree and B+Trees in real production database systems such as Postgres and MySQL.


Link to the original paper https://infolab.usc.edu/csci585/Spring2010/den_ar/indexing.pdf

Full Table Scans Preview 03:27

Original B-Tree Preview 04:46

How the Original B-Tree Helps Performance Preview 07:56

Original B-Tree Limitations Preview 06:31

B+Tree Preview 06:17

B+Tree DBMS Considerations Preview 04:27

B+Tree Storage Cost in MySQL vs Postgres Preview 04:45

B-Tree Section's Summary Preview 01:07

Introduction to Database Partitioning Preview 01:26

What is Partitioning? Preview 03:28

Vertical vs Horizontal Partitioning Preview 02:07

Partitioning Types Preview 01:32

The Difference Between Partitioning and Sharding Preview 02:43

Preparing: Postgres, Database, Table, Indexes Preview 05:22

Execute Multiple Queries on the Table Preview 01:18

Create and Attach Partitioned Tables Preview 03:46

Populate the Partitions and Create Indexes Preview 03:59

Class Project - Querying and Checking the Size of Partitions Preview 05:32

The Advantages of Partitioning Preview 04:41

The Disadvantages of Partitioning Preview 02:48

Section Summary - Partitioning Preview 01:10

How to Automate Partitioning in Postgres Preview 10:06


Assume you have a table that is partitioned on the customer_id field serial 32bit, and you want to partition by range, how do you create all the necessary partitions? this is what I discuss in this video


Source Code

https://github.com/hnasr/javascript_playground/tree/master/automate_partitions


Introduction to Database Sharding Preview 01:06

What is Database Sharding? Preview 03:31

Consistent Hashing Preview 02:41

Horizontal partitioning vs Sharding Preview 01:08

Sharding with Postgres Preview 01:05

Spin up Docker Postgres Shards Preview 07:23

Writing to a Shard Preview 21:58

Reading from a Shard Preview 11:19

Advantages of Database Sharding Preview 01:59

Disadvantages of Database Sharding Preview 03:33

Database Sharding Section Summary Preview 00:45

When Should you consider Sharding your Database? Preview 20:06

Shared vs Exclusive Locks Preview 11:04

In this lecture we explain the difference between exclusive (write locks) and shared locks (read locks)

Dead Locks Preview 05:31

Two-phase Locking Preview 07:08

Solving the Double Booking Problem (Code Example) Preview 13:42

In this video, I demonstrate how is it possible to get double booking in database-backed web applications and how to prevent double booking and race conditions with row-level locks.


Source Code https://github.com/hnasr/javascript_playground/tree/master/booking-system

Double Booking Problem Part 2 ( Alternative Solution and explination) Preview 17:49

SQL Pagination With Offset is Very Slow Preview 09:09

In this video I’ll explain why you should avoid using SQL offset when implementing any kind of paging. I’ll explain what offset does, why is it slow and what is the alternative for better performance This video is inspired by Use the index luke, i’ll have a link to the blog and slides to learn more. Let say you have a web application with an API that supports paging, you user want to request 10 news articles in page 10, this is performed via a simple GET request as shown here The API server receives the GET request and builds the SQL in order to send it to the database hopefully a pool of connections exist here. Page 10 translates to offset 100 assuming each page has 10 records and now the database is ready to execute the query against the table. Offset by design means fetch and drop the first x number of rows, so in this case the database will fetch the first 110 rows and physically drop the first 100 leaving the limit of 10 which the user will get. As the offset increase, the database is doing more work which makes this operation extremely expensive. Furthermore, the problem with offset is you might accidentally read duplicate records. consider the user now want to read page 11 and meanwhile someone inserted a new row in the table, row 111 will be read twice Let us jump and test this against postgres



Use the Index Luke Blog https://use-the-index-luke.com/no-offset


Slides in this video https://payhip.com/b/B6o1


Database Connection Pooling Preview 10:50

Connection pooling is a pattern of creating a pool of available connections (usually TCP) and allow multiple clients to share the same pool of connections. This pattern is usually used when connection establishment and tearing down is costly, and the server has a limited number of connections. In this video we will learn how to use connection pooling in NodeJs when working with a Postgres Database, we will learn how to spin up a pool of database connections and use stateless pool queries and transactional queries begin/end, and finally, we will


  • Node JS Source Code used in this lecture here https://github.com/hnasr/javascript_playground/tree/master/postgresnode-pool


Introduction to Database Replication Preview 01:19

Scripts and commands

docker run --name pgmaster -v /Users/HusseinNasser/postgres/v/master_data:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres


docker run --name pgstandby -v /Users/HusseinNasser/postgres/v/standby_data:/var/lib/postgresql/data -p 5433:5432 -e POSTGRES_PASSWORD=postgres -d postgres

In standby node update postgresql.conf

primary_conninfo = 'application_name=standby host=husseinmac port=5432 user=postgres password=postgres’

add file standby.signal

touch standby.signal

In master update postgresql.conf

first 1 (standby1)

select * from pg_stat_replication

Master/Standby Replication Preview 03:45

Multi-master Replication Preview 01:23

Synchronous vs Asynchronous Replication Preview 03:06

Replication Demo with Postgres 13 Preview 19:20

Pros and Cons of Replication Preview 03:28

Twitter System Design Database Design Preview 01:11:56

We got through a practical system design exercises, this lecture is two parts. Part 1 is all about backend engineering and scaling and Part 2 focuses on database design.

Building a Short URL System Database Backend Preview 37:50

Introduction Preview 02:44

Database engines or storage engines or sometimes even called embedded databases is software library that a database management software uses to store data on disk and do CRUD (create update delete)


Resources
https://youtu.be/V_C-T5S-w8g

https://mariadb.com/kb/en/library/changes-improvements-in-mariadb-102/

https://mariadb.com/kb/en/library/why-does-mariadb-102-use-innodb-instead-of-xtradb/

https://github.com/facebook/rocksdb/wiki/Features-Not-in-LevelDB

https://mariadb.com/kb/en/library/aria-storage-engine/

https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html https://eng.uber.com/mysql-migration/

What is a Database Engine? Preview 06:41

MyISAM Preview 06:49

InnoDB Preview 06:45

XtraDB Preview 02:05

SQLite Preview 03:44

Aria Preview 02:27

BerkeleyDB Preview 02:40

LevelDB Preview 06:20

RocksDB Preview 04:05

Popular Database Engines Preview 02:31

Switching Database Engines with mySQL Preview 17:08

Using docker we will spin up a mySQL database instance and then create two tables with myISAM and one InnoDB and demonstrate the difference.


Code here.


https://github.com/hnasr/javascript_playground/tree/master/mysql_transaction_myisam_innodb

What are Database Cursors? Preview 09:31

We usually advise against returning a large result set, however sometimes it is a requirement so how do we use the features of the database to perform this task? Database cursors can help.

We will discuss

  1. Problems with Large result set

  2. Database Cursors Explained

  3. Pros and Cons of Cursors


Learn more about Postgres cursors here https://www.postgresql.org/docs/current/plpgsql-cursors.html

Server Side vs Client Side Database Cursors Preview 01:44

We will learn the difference between a server-side cursor and a client side cursor. We will see the performance of both cursors by inserting 1 million rows, then reading the rows with a server side cursor vs a client side cursor. We are using psycopg2 in this tutorial with python.

Inserting Million Rows with Python in Postgres using Client Side Cursor Preview 03:29

Querying with Client Side Cursor Preview 04:42

Querying with Server Side Cursor Preview 01:24

Pros and Cons of Server vs Client Side Cursors Preview 04:38

How to Secure Your Postgres Database by Enabling TLS/SSL Preview 12:01

What is the Largest SQL Statement that You can Send to Your Database Preview 09:43

Deep Look into Postgres Wire Protocol with Wireshark Preview 14:07

Deep Look Into MongoDB Wire Protocol with Wireshark Preview 24:04

Best Practices Working with REST & Databases Preview 06:53

Database Permissions and Best Practices for Building REST API Preview 23:40

Introduction to Homomorphic Encryption Preview 01:52

What is Encryption? Preview 01:29

Why Can't we always Encrypt? Preview 04:32

What is Homomorphic Encryption Preview 05:00

Homomorphic Encryption Demo Preview 00:35

Clone and Build the Code Preview 02:35

Going Through the Code and the Database Preview 01:05

Searching The Encrypted Database Preview 03:10

Is Homomorphic Encryption Ready? Preview 02:30

Q&A - October 2021 Preview 19:37

In this Q&A I answer some of your database questions in detail. Enjoy and let me know if you want more of these.

Q&A - November 2021 Preview 25:45

Q&A - December 2021 Preview 23:49

SELECT COUNT (*) can impact your Backend Application performance, here is why Preview 10:36

aggregating large entries in the database (to perform a count for example) is a lot of work. The database has to sort through large number of records whether in an index or in the table itself. Doing this too often can impact the performance of both your database and your application, let us discuss why count can be slow and an alternative if you want an estimate of the count. 

How does the Database Store Data On Disk? Preview 18:55

Is QUIC a Good Protocol for Databases? Preview 12:15

What is a Distributed Transaction? Preview 21:28

Why Uber Moved from Postgres to MySQL (Discussion) Preview 47:13

Can NULLs Improve your Database Queries Performance? Preview 21:12

In this episode, we will discuss NULLs in database systems. I’ll go through the following: What is Null? NULLs persistence - Whether you store a 0 or 2 billion value in the field 32bit integer field it costs 32 bit - when you store a NULL in 32 bit integer field we save 32 bit but add overheads When NULLs are naughty - Semantics and inconsistent result - Select count(*). Includes nulls - count(column) ignores nulls - T is NULL returns the null rows - T is NOT NULL returns, not null rows - T In (NULL) returns nothing - T not in NULL returns nothing - Some database don’t index nulls When NULLs are useful - I don’t have value, I don’t wish to provide a birthday - not applicable field for certain use cases but not others fat tables (denormalization) - Fat tables with many columns makes your rows longer which means fewer rows fit on your page. NULLs help here .. that is NULL, it yields shorter rows, instead of storing a default 0 value


Write Amplification Explained in Backend Apps, Database Systems and SSDs Preview 21:15

Write Amplification Is a phenomenon where the actual writes that physically happen are multiples of the actual writes desired. In this episode, I'll discuss 3 types of write amplifications and their effects on performance and lifetime of storage mediums.


Resources

https://en.wikipedia.org/wiki/Write_amplification https://www.cybertec-postgresql.com/en/hot-updates-in-postgresql-for-better-performance/


Optimistic vs Pessmistic Concurrency Control Preview 17:36

Introduction to ACID (Archived) Preview 01:39

What is a Transaction? (Archived) Preview 02:31

Atomicity (Archived) Preview 02:39

Isolation (Archived) Preview 20:35

Consistency (Archived) Preview 13:06

Durability (Archived) Preview 01:44