5 minutes of Data Science with Pedro Madruga

Archive

ETL pipelines, what makes a data science project successful, Data Science for beginners by Microsoft

🗯 This week

  • I’ve been wrapping up the series on “Building an ETL pipeline from scratch.” It’s a great opportunity to get started if you’re not used to building pipelines. The bonus is that it uses the newest version of Airflow. I hope to finish the blog post in a week or two.
  • After a week of tweeting stats about successful data science projects, I figured it's best to compile them into a (future) blog post. After seeing so many failed projects, I found it interesting to understand how to tackle common data science problems. If you’re curious, I started tweeting about it around here. Here are some example stats: stats
  • Remember to check the most popular Reddit posts this week on data-related boards. 👇

🔮 Data Science

  • Data Science for beginners by Microsoft
#9
November 1, 2021

It was a slow week but here's what happened in Reddit's data community

🗯 This week

  • This was a slow week. I took the opportunity to focus more on work and less on reading.

#8
October 18, 2021

Free ebook on Introduction to Probability for Data Science, how to train BERT for Q&A

🗯 This week

  • As I mentioned last week, I’ve been working on extracting stats from my job feed at Upwork.com. The goal is to understand which data science and data engineering skills are the most sought-after. I’m building an ETL pipeline for this. If this is something you’d be interested in, whether it’s for Data Science or not, ping me on Twitter.

  • The reason why math and programming go hand-in-hand: math_and_coding

  • Remember to check the most popular Reddit posts this week on data-related boards. 👇

🔮 Data Science

  • Free ebook! “Introduction to Probability for Data Science” by Stanley Chan. Download here.

  • NLP: How to train BERT for Q&A. link

  • FooDI-ML: a new large-scale multi-language dataset that contains over 1.5M unique images and over 9.5M store names, product names, descriptions, and collection sections. link

#7
October 11, 2021

Data Salaries in 2021, Freelancing data, Activation functions

🗯 Quick update

  • Last month, O'Reilly released the 2021 Data/AI Salary Survey. It offers good insight into the best-paying programming languages, tools, cloud providers, etc. Download it here.

  • To understand the Data Science/Data Engineering freelance market, I've been extracting data from Upwork, a website where clients find freelancers and freelancers find work. The goal is to understand a few things, such as the most sought-after skills and the highest- and lowest-paying jobs.

  • Remember to check the most popular Reddit posts this week on data-related boards. 👇

🔮 Data Science

  • All about activation functions
  • How to calculate time-weighted averages
#6
October 4, 2021

Getting started on Upwork as a platform for freelancing in Data Science and Data Engineering, most sought-after skills

🗯 Week retrospective

The pace of the blog posts has slowed a bit because of the volume of work I'm (happily) facing, but the newsletter hasn't slowed down, so here's a new issue! I've also created an account on Upwork to keep broadening my data science and data engineering skills while doing actual client work. When you're new to the platform, you face some challenges, and that's something I'll write about later.

Until then, here's what's going on out there.

🔮 Data Science

#10
September 14, 2021

Interactive dashboard, datasets for data science practice, top Reddit posts

🗯 This week

After a short vacation break, the newsletter is back. This time, I'm experimenting with including the top Reddit posts from data-related subs. I still want to keep the newsletter small, so it's an experiment :-).

🔮 Data Science

  • Interactive dashboards with Holoviz
  • Datasets to practice Data Science skills
  • How to build an analytics data team
  • A minimal Python library to draw customized maps from OpenStreetMap data
#4
September 5, 2021

New blog post about taskgroups in Airflow 2, a free DBT course, machine and deep learning compendium and more

🗯 Featured post

The ETL series is now taking shape with the third article: task groups using the newest TaskFlow API from Airflow 2.0. In the blog post, we’re building a simple pipeline with two groups of tasks, using the @task_group decorator of the TaskFlow API from Airflow 2.

#3
August 23, 2021

Download a popular Data Science book, a tool to visualize GitHub repos, an SQL cheat sheet, ...

🗯 Featured post

This week’s blog post showcases how Airflow 2.0 is a game-changer. The goal is to start with a simple ETL pipeline and build it up gradually.

#2
August 9, 2021

Probability distributions simply explained, data visualization library, ML cheat sheet and more

Hi everyone! In this issue of the newsletter, there’s a lot of focus on great libraries.

The 🗯 Featured post was not ready in time for this newsletter, but in the next edition, I’ll share how to write a DAG using Airflow 2.0’s new TaskFlow API. Stay tuned!

#5
August 2, 2021

Installing Airflow, ML conferences in 2021 and 2022, comparing dashboards, new Kaggle beginner-friendly competition

Hi folks, I hope you had a great week.

🗯 Featured post

This week’s featured post is the first of the series, where I’ll build an entire Data Engineering pipeline using Raspberry Pis.

#1
July 25, 2021