I found this post about the new Pandas API on Spark very intriguing, specifically the performance improvements, so I wrote a few simple tests to highlight them.
I will walk you through my thoughts on owning the new M1 MacBook Pro and what it took for me to get my development environment up and running on it.
This post explains what integer range partitioning is and how to leverage it, then walks through a few scenarios demonstrating its benefits.
In this post, I will walk through how to use BigQuery’s new capability for querying Hive-partitioned Parquet files in GCS. It is a really cool feature.
If you are looking for an easy way to query a public dataset, you should definitely check out BigQuery’s publicly available datasets.