andy_pavlo - Network

https://discuss.systems/@andy_pa...

Do you hate SQL and wish it would die & burn in hell? Or do you love SQL and wish it ran faster? If you answered 'yes' to either question then join our Spring 2025 @CMUDB Seminar Series: SQL or Death?
Mondays @ 4:30pm via Zoom.
Videos posted to YouTube: db.cs.cmu.edu/seminar2025/

Seminar Schedule:
Feb 10: Convex
Feb 17: The Germans (TUM)
Feb 24: Apache Pinot
Mar 03: Malloy
Mar 10: Google SQL Pipes
Mar 24: PRQL
Mar 31: StarRocks
Apr 07: Oxide OxQL
Apr 14: MariaDB
Apr 21: EdgeDB

29.1.2025 14:36

https://discuss.systems/@andy_pa...

New @CMUDB Course: Database Query Optimization
This is a special topics course on how to build a SQL optimizer from scratch, covering foundational and state-of-the-art implementations. All lectures are available on YouTube: 15799.courses.cs.cmu.edu/sprin

One topic we will discuss is the Cascades optimizer architecture. There has never been a good description of how to implement it but the Microsoft Research Database Group just published a book that describes SQL Server's implementation. Microsoft has made the entire book available for free: microsoft.com/en-us/research/p

Query optimization is the hardest topic in databases and this is the first time I am offering this course. I am going to make mistakes and say incorrect things in my lectures. Send corrections to db-mistakes@cs.cmu.edu

16.1.2025 17:27

https://discuss.systems/@andy_pa...

We're banging into the new year with my annual retrospective of the last year in databases! Highlights include license change blowbacks, Databricks vs. Snowflake gangwar, DuckDB's shotgun weddings, and how to buy a quarterback with database money to show that special somebody in your life that you're thinking of them! cs.cmu.edu/~pavlo/blog/2025/01

1.1.2025 14:03

https://discuss.systems/@andy_pa...

Alexey Milovidov's recent ClickHouse talk at CWI is hilariously off-the-chain. I highly recommend watching it: youtube.com/watch?v=jmVxfGEN0Q

You can hear me yelling at him from the audience at 1:09, 2:40, and 7:33.

3.12.2024 14:06

https://discuss.systems/@andy_pa...

The video for my "What Goes Around Comes Around... And Around" talk at CWI is now available: youtube.com/watch?v=8Woy5I511L

📊 Slides: cs.cmu.edu/~pavlo/slides/whatg
📄 Paper: db.cs.cmu.edu/papers/2024/what

2.12.2024 15:04

https://discuss.systems/@andy_pa...

The first two videos for @CMUDB's latest seminar series on Database Building Blocks are now posted:

(1) Start with Andrew Lamb's fantastic introductory overview to Apache DataFusion: youtube.com/watch?v=iJhRbDFJjb

(2) Then watch Andy Grove's presentation on how Apple integrated DataFusion into Apache Spark via Comet: youtube.com/watch?v=o59s0d3HE1

Full Schedule: db.cs.cmu.edu/seminar2024/

4.10.2024 12:47

https://discuss.systems/@andy_pa...

SFO friends: I am coming to town on Wed Oct 16th to speak at the SF Systems Meetup. The topic will be why the relational model (and SQL!) will still be here after you're dead and your corpse is rotting. Thanks to Berkeley PhD students (Connor Power & Shadaj Laddad) for organizing and Accel VC for hosting.

➡RSVP here: lu.ma/8gb8s8wo

2.10.2024 21:29

https://discuss.systems/@andy_pa...

VLDB'24 Paper #2: This is our next generation ML tuning algorithm for databases. Instead of tuning a single part of the DBMS (knobs, indexes) one-at-a-time, Proto-X tunes *everything* all at the same time! Tuning takes a little longer but achieves beyond human performance.
• Code: github.com/17zhangw/protox
• Paper: vldb.org/pvldb/vol17/p3373-zha

Proto-X learns similarities between tuning options and exploits them. For example, INDEX (A,B,C) will have similar pros/cons as INDEX (A,C,B). Proto-X uses a transformer to encode a DBMS's config into an embedding and finds similar embeddings when exploring tuning choices.
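
To make the index example concrete, here is a toy sketch. It is not Proto-X's encoder, which is a learned transformer; it only hand-builds a vector per index, giving earlier key columns larger weights, and compares two indexes with cosine similarity. The function names and the 1/(position+1) weighting are assumptions made up for illustration.

```python
import math


def encode_index(columns):
    """Toy embedding (assumption): weight each column by 1/(position+1)."""
    return {col: 1.0 / (pos + 1) for pos, col in enumerate(columns)}


def index_similarity(cols_a, cols_b):
    """Cosine similarity between two toy index embeddings."""
    vec_a, vec_b = encode_index(cols_a), encode_index(cols_b)
    dot = sum(w * vec_b.get(col, 0.0) for col, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Same leading column, trailing columns swapped: scores close to 1.
    print(index_similarity(["A", "B", "C"], ["A", "C", "B"]))
    # No shared columns: scores 0.
    print(index_similarity(["A", "B", "C"], ["D", "E", "F"]))
```

In this toy version, INDEX (A,B,C) and INDEX (A,C,B) land close together, while an index on unrelated columns lands far away. A tuner could use that kind of closeness to guess that two configs will behave alike.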

Proto-X encodes the config and maps it to high-dim latent space. Then the actor/critic tuner algo selects the next config to try out, learns whether it helps, and refines the selection of the next config in the latent space.

We support tuning nearly everything in PostgreSQL:
• System/table/index/query knobs
• Indexes with types (btree, hash, brin) + INCLUDE (CREATE only, via HypoPG)
• Query hints (via pg_hint_plan)
We don't support destructive actions yet (DROP index, table partitioning).

5.9.2024 14:58

https://discuss.systems/@andy_pa...

VLDB'24 Paper #1: Collecting training data for ML models with DBs is $$$/slow. @capybara's Boot framework uses PostgreSQL extensions to cut off redundant queries.
• Code: github.com/lmwnshn/boot
• Paper: vldb.org/pvldb/vol17/p3680-lim

Macro-Accelerator: Skip entire queries and send back cached result if plan is similar to past queries. It identifies similar queries based on parameterized query plans. Happens automatically in Postgres without changing application code.
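
A rough sketch of the macro-level idea, not Boot's actual code: strip constants out of a plan description, hash what is left, and reuse a cached result whenever the hash has been seen before. Everything here is an assumption for illustration: the plan is treated as a plain string, the regex-based parameterization is simplistic, and an exact fingerprint match stands in for whatever similarity test Boot really uses.

```python
import hashlib
import re


def parameterize(plan_text):
    """Replace string and numeric literals with '?' placeholders."""
    without_strings = re.sub(r"'[^']*'", "?", plan_text)
    return re.sub(r"\b\d+(\.\d+)?\b", "?", without_strings)


def fingerprint(plan_text):
    """Hash the parameterized plan so similar plans share one key."""
    return hashlib.sha256(parameterize(plan_text).encode("utf-8")).hexdigest()


class MacroCache:
    """Hypothetical cache: skip execution when a similar plan was seen."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def run(self, plan_text, execute):
        key = fingerprint(plan_text)
        if key in self._results:
            self.hits += 1
            return self._results[key]  # cached result, query is skipped
        result = execute(plan_text)
        self._results[key] = result
        return result


if __name__ == "__main__":
    cache = MacroCache()
    run_query = lambda plan: "rows for: " + plan  # stand-in for the DBMS
    cache.run("Seq Scan on orders Filter: (price > 10)", run_query)
    cache.run("Seq Scan on orders Filter: (price > 250)", run_query)
    print(cache.hits)
```

The second call returns the first call's result even though the constant differs. That is acceptable for collecting training data, which is the use case here, but it would be wrong for an application that needs exact answers.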

Micro-Accelerator: Watch tuples moving between plan operators to identify redundant behavior. It hijacks Postgres' query cancellation feature to cut off portions of the query plan without killing the entire query. Also performs operator-level tuple sampling.

The results are stunning! Running DSB (scalefactor=10) on PostgreSQL v15 goes from 57hrs to 15min! Other workloads go from weeks to hours! Experiments show that using the two accelerators together achieves the best results. Model accuracy degradation is negligible (~10%).

5.9.2024 14:09

https://discuss.systems/@andy_pa...

New semester of @CMUDB's Intro to Database Systems! We're back with a vengeance with updated lectures on vector/full-text indexes + distributed databases! We're also featuring 10min flash talks each Wednesday from leading DB companies 15445.courses.cs.cmu.edu/fall2

Everything is available to non-CMU students to follow along at home:
• Lectures on YouTube: youtube.com/playlist?list=PLSE
• Slides + Notes on course website.
• Project source code on GitHub: github.com/cmu-db/bustub
• Project grading with Gradescope (see FAQ Q7 ➡️ 15445.courses.cs.cmu.edu/fall2

27.8.2024 14:48

https://discuss.systems/@andy_pa...

We are pleased to announce the @CMUDB Fall 2024 schedule for the "Database Building Blocks" seminar series! It will feature speakers from leading DBMSs built from open-source components: db.cs.cmu.edu/seminar2024/

Mondays @ 4:30pm ET via Zoom (open to public).
Videos posted to YouTube

Sep 23: Apache DataFusion
Sep 30: Apache DataFusion Comet
Oct 07: ParadeDB
Oct 21: VoltronData's Theseus
Oct 28: WhereTrueTech's Exon
Nov 04: Synnada
Nov 11: InfluxDB
Nov 18: GlareDB
Nov 25: GreptimeDB
Dec 02: Databend's OpenDAL

8.8.2024 20:44

https://discuss.systems/@andy_pa...

After 10 years, we have posted our 1000th database on @CMUDB's Database of Databases encyclopedia!

VelarixDB is a #rustlang embedded LSM DBMS created by Adewumi Sunkanmi. He is a student in Nigeria who built the DBMS as part of his undergraduate studies. He reached out to me to share his work: dbdb.io/db/velarixdb

5.8.2024 11:58

https://discuss.systems/@andy_pa...

After three years of writing, our follow-up to the classic 2006 paper is finally out! In "What Goes Around Comes Around… And Around…", Stonebraker and I examine the last 20 years in databases and talk about why relational DBs are going to reign supreme.

db.cs.cmu.edu/papers/2024/what

1.7.2024 15:24

https://discuss.systems/@andy_pa...

I'm sad to announce that @OtterTune is officially dead. Our service is shut down and we let everyone go today (1mo notice). I can't go into details of what happened but we got screwed over by a PE-backed Postgres company on an acquisition offer.

We saw huge improvements for customers through ML-based tuning. OtterTune worked better in the real world than in the lab. But we struggled with on-boarding and making the product sticky. There were also rumblings in the last year about whether using LLMs would be better for tuning...

On behalf of my co-founders Dana + Bohan, we thank our hardworking team over the last 4yrs. We also appreciate the guidance of our investors IntelCapital (Nick Washburn + Assaf Araki) & RaceCapital (Alfred Chuang). I look forward to working with them again.

ottertune.com

14.6.2024 20:26

https://discuss.systems/@andy_pa...

Columnar file formats are ubiquitous (Parquet, ORC, CarbonData). Along with Xinyu Zeng + Huanchen Zhang + Wes McKinney, we published a comprehensive study that analyzes the internal components of these formats.

TLDR: Parquet/ORC are old and not optimized for modern hardware. Something new is needed.

Paper: vldb.org/pvldb/vol17/p148-zeng
Source: github.com/XinyuZeng/Evaluatio

One problem is the balkanization of Parquet/ORC libraries. Using the Java libs for this eval was a non-starter: they support each format's latest features, but the JVM would get in the way of low-level profiling. We ended up using Arrow's C++ libs for most experiments, except when we used the Rust libs to evaluate zone maps.

14.5.2024 23:40

https://discuss.systems/@andy_pa...

Attention @CMUDB alumni:

We finally removed the infamous stained futon from the 9th floor lab in the Gates-Hillman Center. The health inspector made us get rid of it or they were going to shut down the lab. I know that many of you were fond of the futon but it had to go.

The other CS faculty (e.g., @dave_andersen) don't know how old it was. They think it was in the old CS building (Wean Hall) and then it was just brought into Gates by the movers.

Thanks to @capybara for helping me move it. We forgot to wear safety gloves.

10.5.2024 18:04

https://discuss.systems/@andy_pa...

There is a post on reddit using @CMUDB's Advanced Database Systems course as evidence for their stock shorting scheme on Snowflake.

This person doesn't know what they are talking about. But I'm now concerned that my lecture wasn't clear for students:

youtu.be/NhWp1bTG0Cw

3.5.2024 15:36

https://discuss.systems/@andy_pa...

I was recently interviewed by CMU's School of Computer Science media people about the one year anniversary of the Jeopardy scandal: instagram.com/reel/C6Y5SCPrFHu

1.5.2024 01:58

https://discuss.systems/@andy_pa...

Somebody outside of CMU sent an email notifying me of a 2022 paper out of Saudi Arabia that blatantly stole our entire 2019 ICDE Bulletin survey paper on using ML to automatically optimize databases.

+ 2022 Plagiarism: eajournals.org/ejcsit/vol10-is
+ 2019 Original: db.cs.cmu.edu/papers/2019/pavl

It's one thing to copy text, since it's unlikely that anyone will check. They even stole our images but then converted them to low-quality JPGs with compression artifacts. But to straight up copy the entire title is a bold move. It shows up in search results near the top.

11.4.2024 21:53

https://discuss.systems/@andy_pa...

My #1 PhD student Matt Butrovich successfully completed his PhD defense. His thesis is on accelerating databases with eBPF.

+ Metric Collection: db.cs.cmu.edu/papers/2022/modd

+ Database Proxies: vldb.org/pvldb/vol16/p3335-but

+ Key-Value Stores: TBA

You have 60 days to hire him. Expect fierce competition.

10.4.2024 00:58