Thank you Carnegie Mellon Database Group for putting this online! These are my notes from watching on YouTube.
Carnegie Mellon Databases Storage II, Lecture 4
Prof. Andy Pavlo [Watch on YouTube]
The Problem and Solution
How should the DBMS represent the database in storage files on disk? Solve it by choosing the right storage model for your target workload. The right strategy varies if you are reading data, writing data and with how many joins you are performing.
OLTP (Online Transaction Processing): “Simple queries with lots of writes.”
OLAP (Online Analytical Processing): “Read only queries. Lots of joins. Doing a lot of reads, but they’re more complex.”
HTAP (Hybrid Transactional Analytical Processing): “is trying to do both of them. You still want to ingest new data, but analyze it as it comes in. It’s used for companies making decisions on the fly as people are browsing websites, like internet advertising companies.”
screenshots from the lecture
N-ary used to be the dominant model until the ’80s.
Additional Reading: All Things Distributed
Column Store Vs. Row Store RDBMS
Row-oriented DBMS (Row Store)
- PostgreSQL, MySQL
- Row Store = use OLTP
Column-oriented DBMS (Column Store)
- Red Shift, BigQuery
- Column Store = use OLAP
If types are consistent, you can compress data into single column store.