John's Data and Analytics Weekly
Archive
EDA limits, Tinned Fish, SQL Style
September 23, 2023
VHS lets you create terminal GIFs as code, which is great for demoing CLI tools. Canned Fish. What does that have to do with data analytics? Rainbow Tomatoes Garden...
GraphQL+Analytics, MLOps, Code Review 101
October 28, 2022
I’m back after a long hiatus! Let’s get right to it. Code Reviews 101: The Golden Rules of Code Reviews. Tim Ferriss posted it, so it got wide distribution,...
monorepos, hiring cheatsheet, getting started in data engineering
April 27, 2022
We have been arguing about monorepos for a long time. Ask HN: I’m looking for a modern Python book, the latest one I have on my shelf is from before the 2/3...
Will Airflow Win, SQL vs Everything, 20 years of programming
April 3, 2022
My favorite post of the week is Alex Ewerlöf’s guiding principles after 20 years of programming. Some of the most resonant: Write code not for the machines...
Data Project Management, Code Review Pyramid, Apache Arrow
March 20, 2022
Good Data Product Managers: increase data accessibility; provide faster ROI on data; save time for the data team and data consumers; provide more precise...
Grist, Onboarding Engineers, a Career-Ending Mistake
February 26, 2022
This is not a meeting. Do you want to know why you’re fatigued at the end of a long day of video conferences? It’s because your brain has been straining to...
Differential Privacy, pipeline mistakes, team inefficiencies
February 13, 2022
This is very exciting: Google has released PipelineDP, an open-source Python library that allows developers to aggregate data in a privacy-preserving way,...
Data Privacy Day, data2vec, Fake Data
January 27, 2022
This Friday (01/28) is Data Privacy Day. Duke's Kenan Institute for Ethics is hosting a discussion with Neil Richards on Why Privacy Matters. fly.io is now...
An improved git flow, probabilistic programming, better status updates
January 21, 2022
We use the common git flow described in the article below, and we suffer from precisely the drawbacks discussed (multiple commits per MR/PR, tedious code...
SQL is Code, Code Review Manners, High-Impact Habits
January 15, 2022
Why Google Treats SQL Like Code (and You Should Too). I Think I Know Why You Can’t Hire Engineers Right Now sums up what’s important, right out of the gate....
Data stacks, writing tests for data science, PCA vs SVD
December 30, 2021
An interview with former Snowflake CEO Bob Muglia on how the data stack will evolve in the next 5 years. Of particular interest to our team are his...
Agile in analytics, time series forecasting, visualizing dataframe transforms
December 22, 2021
I enjoy Taylor Brownlow's writing on data science in industry. How to make agile actually work for analytics discusses what we all probably struggle with:...
Kalman filters explained, idempotency, down-sampling in plotly
December 15, 2021
Idempotency is a very simple algebraic concept that is important in building resilient systems. Love this article! Very approachable explanation of Kalman...
Driving your Career, Better Decision Tree Visualizations, DBT
December 8, 2021
Where do you see yourself in 5 years? 25 tips for driving your own career “The best version of your career is finding jobs that are in the Venn diagram...
VSCode Notebooks, Academic Research DevOps, JetBrains newest IDE
December 3, 2021
Hello friend! Here is a short list of the most interesting and relevant data science and data engineering content I've found this week. VSCode now supports...