Oct. 16, 2022, 9:02 p.m.

šŸµ The tea hardest to swallow is reali-tea

Late To The Party

If you go back and analyse how many newsletters were late and which ones mention that I’m stressed, I think there’s some overlap. But regardless, there has been some amazing machine learning going on, so let’s check it out for a break!

Also, welcome to the hundreds of new subscribers! We’re over 850 now!

The Latest Fashion

  • The State of AI report for 2022 just dropped.
  • Selecting features in data science is important. LOFO-importance for feature selection automates this selection by leaving a feature out and retraining.
  • Kaggle published their data from the 2022 ML and data science survey.
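
To make the leave-one-feature-out idea from the LOFO item above concrete, here is a minimal hand-rolled sketch in plain Python. It is not the API of the lofo-importance package. The k-nearest-neighbour model, the synthetic data and the feature names "signal" and "noise" are all assumptions made for illustration. The idea: score the model with cross-validation on all features, then drop one feature at a time, retrain, and see how much the error changes.

```python
import random
from statistics import mean


def knn_predict(train_X, train_y, x, k=5):
    """Predict by averaging the targets of the k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    return mean(target for _, target in dists[:k])


def cv_mse(X, y, n_folds=5, k=5):
    """Mean squared error of the k-NN model under simple k-fold cross-validation."""
    errors = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [row for i, row in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        for i in test_idx:
            errors.append((knn_predict(train_X, train_y, X[i], k) - y[i]) ** 2)
    return mean(errors)


def lofo_importance(X, y, feature_names):
    """Importance = CV error without the feature minus CV error with all features."""
    baseline = cv_mse(X, y)
    scores = {}
    for j, name in enumerate(feature_names):
        # Leave feature j out of every row, then retrain and re-score.
        X_without = [row[:j] + row[j + 1:] for row in X]
        scores[name] = cv_mse(X_without, y) - baseline
    return scores


# Synthetic data (an assumption): the target depends on "signal" only.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

scores = lofo_importance(X, y, ["signal", "noise"])
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:+.3f}")
```

A large positive score means the model gets clearly worse without that feature, so it is worth keeping. A score near zero or below suggests the feature adds little or even hurts, which makes it a candidate for removal.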

Got this from a friend? Subscribe here!

My Current Obsession

I’m on the road again. This time I’ll be in Singapore and Greece for work. I’m giving a big talk here about the work ECMWF does in machine learning for numerical weather prediction. Unsurprisingly, I have some serious imposter syndrome going on.

This goes hand-in-hand with my problem that I get too excited about too many cool things and end up saying yes to all of them. Within the coming month, I have multiple major deadlines for different projects and I have no idea how I’m supposed to meet them all. So the next few weeks will be fairly intense.

Thing I Like

After flying all day today, I have to give another shoutout to my Sony noise-cancelling headphones for keeping me sane during such a noisy endeavour. The only drawback: when I was watching movies and the PA came on, my head almost exploded from the max volume they force on you.

Hot off the Press

I had two posts on LinkedIn go pretty viral. One was about ar5iv, the alternative HTML5 rendering of arXiv papers. The other was about the NN-SVG web tool for drawing neural network architectures. I shared both of these with you 8 months ago, but it looks like the ar5iv one just cracked 600,000 views, which is pretty neat.

In case you missed it, I wrote an article about my favourite VSCode Extensions and it’s still quite popular.

Machine Learning Insights

Last week I asked, “Where do you normally obtain data for your analysis?”, and here’s the gist of it:

It’s not uncommon that we have to collect data ourselves for scientific analysis. The dirty secret about “getting labels” for data is that someone has to sit down to label the data. High-quality data is usually extremely important for better models, so we end up labelling data ourselves.

But we would be remiss not to check whether datasets already exist and are publicly available. In weather, for example, we have WeatherBench as a fantastic benchmark dataset.

I will usually start my search on Kaggle. The simple reason is that datasets on Kaggle often come with notebooks, which is a double win for me. Then I often use Google Dataset Search to find other supplemental data.

There is also OpenML, the Amazon AWS Registry and this Awesome List of Public Datasets.

Data Stories

The Roman Empire fell and elephants basically disappeared from Western Europe.

So we had to do scientific sketches from stories and other sketches, which ended up in a hilarious case of the Telephone Game. I cropped a small part of it, so definitely check out the full chart below. Each elephant is also clickable, although many links are too old and don’t function anymore, I’m afraid.

I think there are some really adorable specimens in here!

elephants.png

Source: Uli Westphal

Question of the Week

  • What is the exploding gradient problem?

Post your answers on Twitter and tag me. I’d love to see what you come up with. Then I can include them in the next issue!

Tidbits from the Web

  • I enjoyed playing with these interactive music tools online!
  • I’ll admit it, weird GoPro angles are peak comedy to me.
  • Duncan is very cute and ultra-focused.

You just read issue #99 of Late To The Party. You can also browse the full archives of this newsletter.
