5 minutes of Data Science

Archive

Week 4 of 2023

5 Minutes of Data Science - week 4

Highlights from January 23 to January 29

Foreword

Have a great week everyone!

#31
January 30, 2023
Read more

Week 3 of 2023

5 Minutes of Data Science - week 3

Highlights from January 16 to January 22

Foreword

Hello world! Come say hi on Mastodon. See you next week!

#30
January 23, 2023
Read more

Week 2 of 2023

5 Minutes of Data Science - week 2

Highlights from January 09 to January 15

Foreword

Hello world! Come say hi on Mastodon. See you next week!

#29
January 16, 2023
Read more

Week 1 of 2023

5 Minutes of Data Science - week 1

Highlights from January 02 to January 08

Foreword

A few newsletter feeds have been added, enjoy!

#28
January 9, 2023
Read more

Week 52

5 Minutes of Data Science - week 52

Highlights from December 26 to January 01

Foreword

Hello world! Come say hi on Mastodon. See you next week!

#27
January 2, 2023
Read more

Week 51 - now with feeds from data science newsletters!

5 Minutes of Data Science - week 51

Highlights from December 19 to December 25

Foreword

Happy holidays everyone!

#26
December 26, 2022
Read more

Week 49

5 Minutes of Data Science - week 49

Highlights from December 05 to December 11

Foreword

ChatGPT has been all the rage lately. And of course, someone reverse-engineered it and open-sourced it. I don’t know how long it will last.

#25
December 12, 2022
Read more

Week 48

5 Minutes of Data Science - week 48

Highlights from November 28 to December 04

Foreword

Hi friends!

#24
December 5, 2022
Read more

Week 47

5 Minutes of Data Science - week 47

Highlights from November 21 to November 27

Foreword

Last week we saw Stable Diffusion 2.0 come out. There was also quite a big discussion on Galactica, a large language model by Meta.

#23
November 28, 2022
Read more

Week 46

5 Minutes of Data Science - week 46

Highlights from November 14 to November 20

Foreword

Hi folks!

#22
November 21, 2022
Read more

Week 45

5 Minutes of Data Science - week 45

Highlights from November 07 to November 13

Foreword

Hello world! Come say hi on Twitter.

#21
November 14, 2022
Read more

Week 44

5 Minutes of Data Science - week 44

Highlights from October 31 to November 06

Foreword

Hi folks 👋🏻

#20
November 9, 2022
Read more

Week 43

5 Minutes of Data Science - week 43

Highlights from October 24 to October 30

Foreword

Hi everyone - especially to the new subscribers! 👋 So great to have you here!

#19
October 31, 2022
Read more

Week 42

5 Minutes of Data Science - week 42

Highlights from October 17 to October 23

Foreword

The last episode of the Towards Data Science podcast is worth the time, since Jeremie (the host) talks about the future of AI. Simply put, we’re moving faster in AI than we initially predicted.

#18
October 24, 2022
Read more

Week 41

5 Minutes of Data Science - week 41

Highlights from October 10 to October 16

Foreword

This was a busy week, so I didn’t have time to pick out the best of the items below. Enjoy!

#17
October 17, 2022
Read more

Week 40

5 Minutes of Data Science - week 40

Highlights from October 03 to October 09

Foreword

My favourites from the previous week:

#16
October 10, 2022
Read more

Week 39

5 Minutes of Data Science - week 39

Highlights from September 26 to October 02

Foreword

Last week, Google AI and Amazon Science focused on environmentally friendly machine learning. A text-to-video model was also released. More info below 👇

#14
October 3, 2022
Read more

Week 38

5 Minutes of Data Science - week 38

Highlights from September 19 to September 25

Foreword

A new podcast feed has been added to the content of this newsletter (Data Science at Home). A reminder that this newsletter is an ETL pipeline and open source. You can suggest any feeds in the issues section.

#13
September 26, 2022
Read more

Week 37

5 Minutes of Data Science - week 37

Highlights from September 12 to September 18

Foreword

Hi everybody! Last week’s podcasts included many interesting topics, such as Stable Diffusion, Transformers on Tabular Data and whether studying AI in academia is a waste of time. The Reddit community is also focusing on Stable Diffusion, as are a few GitHub repositories.

#15
September 19, 2022
Read more

Week 36

5 Minutes of Data Science - week 36

Highlights from September 05 to September 11

Foreword

Hi folks!

#12
September 12, 2022
Read more

Week 35 review

5 Minutes of Data Science - week 35

Highlights from August 29 to September 04

Foreword

Welcome to the new format of this newsletter. It now includes blog posts from the research teams at companies like OpenAI, DeepMind, Google and Amazon, the latest podcast and YouTube episodes, the trending GitHub repositories related to Data Science and Machine Learning, and the latest from the communities on Reddit!

#11
September 5, 2022
Read more

ETL pipelines, what makes a data science project successful, Data Science for beginners by Microsoft

🗯 This week

  • I’ve been wrapping up the series on “Building an ETL pipeline from scratch.” It’s a great opportunity to get started if you’re not used to building pipelines. The bonus is that it uses the newest version of Airflow. I hope to finish the blog post in a week or two.
  • After a week of posting tweets with stats on successful data science projects, I figured it would be best to compile them in a (future) blog post. After seeing so many failed projects, I found it interesting to understand how to tackle common data science problems. If you’re curious, I started tweeting about it around here. Here are some example stats: stats
  • Remember to check the most popular Reddit posts this week on data-related boards. 👇

🔮 Data Science

  • Data Science for beginners by Microsoft
#10
November 1, 2021
Read more

It was a slow week but here's what happened in Reddit's data community

🗯 This week

  • This was a slow week. I took the opportunity to focus more on work and less on reading.

#9
October 18, 2021
Read more

Free ebook on Introduction to Probability for Data Science, how to train BERT for Q&A

🗯 This week

  • As I mentioned last week, I’ve been working on extracting stats from my jobs feed at Upwork.com. The goal is to understand which data science and data engineering skills are the most sought-after. I’m building an ETL pipeline for this. If this is something that you’d be interested in, whether it’s for Data Science or not, ping me on Twitter.

  • The reason why math and programming go hand-in-hand: math_and_coding

  • Remember to check the most popular Reddit posts this week on data-related boards. 👇

🔮 Data Science

  • Free ebook! “Introduction to Probability for Data Science” by Stanley Chan. Download here.

  • NLP: How to train BERT for Q&A. link

  • FooDI-ML: a new large-scale multi-language dataset that contains over 1.5M unique images and over 9.5M store names, product names, descriptions, and collection sections. link

#8
October 11, 2021
Read more

Data Salaries in 2021, Freelancing data, Activation functions

🗯 Quick update

  • Last month, O'Reilly released the 2021 Data/AI Salary Survey. It's a good insight into the most well-paid programming languages, tools, cloud providers, etc. Download it here.

  • To understand the Data Science/Data Engineering freelance market, I've been extracting data from Upwork, a website where clients find freelancers and freelancers find work. The goal is to understand a few things, such as the most sought-after skills and the highest- and lowest-paying jobs.

  • Remember to check the most popular Reddit posts this week on data-related boards. 👇

🔮 Data Science

  • All about activation functions
  • How to calculate time-weighted averages
#7
October 4, 2021
Read more

Getting started on Upwork as a platform for freelancing in Data Science and Data Engineering, most sought-after skills

🗯 Week retrospective

The pace of the blog posts has slowed a bit because of the volume of work I'm (happily) facing, but the newsletter keeps going, so here's a new issue! I've also created an account on Upwork to keep broadening my data science and data engineering skills while doing actual client work. When you're new to the platform, you face some challenges, and that's something I'll write about later.

Until then, here's what's going on out there.

🔮 Data Science

#6
September 14, 2021
Read more

Interactive dashboards, datasets for data science practice, top Reddit posts

🗯 This week

After a short vacation break, the newsletter is back. This time, I'm experimenting with including the top Reddit posts from data-related subreddits. I still want to keep the newsletter small, so it's an experiment :-).

🔮 Data Science

  • Interactive dashboards with Holoviz
  • Datasets to practice Data Science skills
  • How to build an analytics data team
  • A minimal Python library to draw customized maps from OpenStreetMap data
#5
September 5, 2021
Read more

New blog post about taskgroups in Airflow 2, a free DBT course, machine and deep learning compendium and more

🗯 Featured post

The ETL series is now taking shape with the third article: task groups using the newest TaskFlow API from Airflow 2.0. In the blog post, we build a simple pipeline with two groups of tasks, using the @task_group decorator of the TaskFlow API.

#4
August 23, 2021
Read more

Download a popular Data Science book, a tool to visualize Github repos, an SQL cheat sheet, ...

🗯 Featured post

This week’s blog post is a showcase of how Airflow 2.0 is a game-changer. The goal is to build an ETL pipeline and slowly build up.

#3
August 9, 2021
Read more

Probability distributions simply explained, data visualization library, ML cheat sheet and more

Hi everyone! In this issue of the newsletter, there’s a lot of focus on great libraries.

The 🗯 Featured post was not ready in time for this newsletter, but in the next edition, I’ll share how to write a DAG using Airflow 2.0’s new TaskFlow API. Stay tuned!

#2
August 2, 2021
Read more

Installing Airflow, ML conferences in 2021 and 2022, comparing dashboards, new Kaggle beginner-friendly competition

Hi folks, I hope you had a great week.

🗯 Featured post

This week’s featured post is the first of the series, where I’ll build an entire Data Engineering pipeline using Raspberry Pis.

#1
July 25, 2021
Read more
Find 5 minutes of Data Science elsewhere: GitHub Twitter Linkedin Mastodon