Did you know that DataFusion can be used from Python as a SQL query planner and optimizer?
The logical plan can then be translated to DataFrame libraries.
This weekend's experiment is running trivial SQL on Pandas and Polars.
https://github.com/apache/arrow-datafusion-python/pull/190
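To illustrate the idea of translating a logical plan to a DataFrame library, here is a minimal sketch using only the standard library. The node classes (`TableScan`, `Filter`, `Projection`) and the `to_pandas_expr` helper are hypothetical names made up for this illustration; they are not DataFusion's API and not the code from the linked pull request. The sketch walks a toy plan tree and emits the equivalent Pandas expression as a string.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TableScan:
    """Leaf node: read a named table (hypothetical node, not DataFusion's class)."""
    table: str


@dataclass
class Filter:
    """Keep rows where `column op literal` holds."""
    input: "Plan"
    column: str
    op: str
    literal: Union[int, float, str]


@dataclass
class Projection:
    """Keep only the listed columns."""
    input: "Plan"
    columns: List[str]


Plan = Union[TableScan, Filter, Projection]


def to_pandas_expr(plan: Plan) -> str:
    """Recursively translate a plan tree into Pandas source code (as a string)."""
    if isinstance(plan, TableScan):
        return plan.table
    if isinstance(plan, Filter):
        child = to_pandas_expr(plan.input)
        # Boolean-mask indexing: df[df["col"] > value]
        return f"{child}[{child}[{plan.column!r}] {plan.op} {plan.literal!r}]"
    if isinstance(plan, Projection):
        child = to_pandas_expr(plan.input)
        # Column selection: df[["a", "b"]]
        return f"{child}[{plan.columns!r}]"
    raise TypeError(f"unsupported plan node: {plan!r}")


# Toy plan for: SELECT a, b FROM t WHERE a > 1
plan = Projection(Filter(TableScan("t"), "a", ">", 1), ["a", "b"])
print(to_pandas_expr(plan))  # t[t['a'] > 1][['a', 'b']]
```

In a real translation the plan would come from the SQL planner rather than being built by hand, and the translator would call the DataFrame library directly instead of producing a string.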
18.2.2023 16:14

I built a toy distributed SQL query engine in Python this weekend, using Ray and DataFusion.
It can run about half of the queries from the TPC-H benchmark.
Would this make for an interesting blog post or conference talk?
https://github.com/andygrove/ray-sql
29.1.2023 18:18

DataFusion now has preliminary support for Substrait, and it is available from the Python bindings (as well as from Rust).
https://github.com/apache/arrow-datafusion-python#substrait-support
20.1.2023 14:45

Blog post covering the DataFusion 16.0.0 release and the last few months of progress.
https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0/
19.1.2023 14:58

I self-published an introductory guide to query engines in 2020, called "How Query Engines Work".
The book walks through all the steps involved in building a query engine, covering topics such as logical plans, physical plans, query planning and optimization, SQL support, and parallel / distributed query execution.
I'd like the book to reach a wider audience, so I recently made all of the content available for free.
Hopefully, this is useful for some people in my network!
https://howqueryengineswork.com/
9.1.2023 18:42

It's great to see DataFusion mentioned in Andy Pavlo's database retrospective 😍
https://ottertune.com/blog/2022-databases-retrospective/
2.1.2023 20:49

What I want from DataFusion in 2023.
https://andygrove.io/2023/01/what-i-want-from-datafusion-2023/
2.1.2023 00:14

I've been spending a lot of time comparing query plans lately, both to see the effects of optimizations and to compare plans between different query engines, so I built some simple tooling to help with this.
By saving plans from each engine out to a simple markup language, I can have one set of tools for generating diagrams. This has helped a lot with documentation and presentations.
Announcing QPML: Query Plan Markup Language
https://github.com/andygrove/qpml
20.12.2022 15:23

An ode to TPC-DS
6.12.2022 01:23

Here's a simple example of using DataFusion from Python.
This is quickly becoming my favorite way to use DataFusion.
30.11.2022 15:11

DataFusion Python bindings 0.7.0 have just been published on PyPI.
https://pypi.org/project/datafusion/0.7.0/
This is the first release in ~6 months 😅
We will now be publishing a new version every four weeks, to coincide with new versions of the underlying Rust crates.
29.11.2022 17:04

I've been tinkering with a simple CLI tool for viewing & querying data files (CSV, Parquet, etc.). The most recent feature is the ability to view Parquet row group metadata.
There are more capable tools out there for sure, but this is really coming in handy during my day-to-day work.
Also, writing simple CLI tools is oddly satisfying.
https://github.com/andygrove/bdt
9.11.2022 15:32

It looks like quite a few tech folks are trying out Mastodon, so I figured I'd kick the tires too.
31.10.2022 14:57