Aug. 19, 2022, 12:34 p.m.

🐋 Whale, whale, whale, what do we have here.

Late To The Party

Finally, I’m coping with the heat. I’m back to writing regularly and contributing at work and in my projects. Lovely! So let’s look at some machine learning!

The Latest Fashion

  • This collection of over 200 machine learning flashcards is really neat.
  • Never Trust A Number – was a fascinating piece of insight.
  • Chip Huyen wrote an intro to streaming data for data scientists.

Got this from a friend? Subscribe here!

My Current Obsession

I’m getting ready for Euroscipy and my big vacation. Lots of vaccinations, visa stuff, and general travel preparations.

I’m so curious who will go to Euroscipy this year. I understand that many might not come due to the ongoing pandemic. I myself stay away from most in-person events, but felt like this small conference might be a good one to go to. I’m still going to mask, and I’ll go ahead and take a few tests just to be sure. If you have some tips on how to attend in-person events safely, I’d be curious to hear them!

Notion, which I use to run my life, just introduced a status property and a progress bar. I have been updating all my systems to incorporate the neat new tools. Marie Poulin has a nice little video about these!

I couldn’t really work out during the heat wave. (I could barely exist, to be honest.) I felt like pushing myself, so I deadlifted 200kg. Pretty proud of that milestone.

Thing I Like

I decided to get my own scuba diving computer. Apparently, the Cressi Donatello is really good, so I’ll go and try that one out next week. Very excited!

Hot off the Press

I wrote a blog post about getting started on Kaggle easily.

I’m trying atomic essays again for a bit of inspiration and challenge.

I started out writing about why I still write about ML online, including this newsletter. Here are 3 topics in ML I’m particularly interested in right now.

This is how I got interested in machine learning, and this is how I suggest others can get a start in ML (in 5 ways).

Machine Learning Insights

Last week I asked, “What is an example of Bayes’ theorem?” and here’s the gist of it:

Geologists have to work in a highly biased decision space. Without knowledge of some regional tectonics, diagenesis, possible volcanic activity even, work in an outcrop gets very hard. It could be anything, so every detail has to be analyzed and weighed.

When we are in the field, we collect evidence. We find a healed fracture that could contain Quartz or Calcite. We have tested healed fractures in this area before, and we found that almost 95% of limestones contain Calcite seams. In sandstones, however, only 25% contain Calcite seams. It is easiest for us to test the seams. A simple scratch test is enough. The surrounding rock is very weathered, and we try not to hammer all the rocks to preserve the nice geosite. The scratch test reveals it is Calcite after all. Now we can use Bayes’ theorem to calculate the probability that we are looking at a limestone.

We set the probabilities of the rock being sandstone or limestone to be 50/50, as we don’t know better. In statistical terms, we set the prior (a priori) probabilities to be equal:

P(Limestone) = P(Sandstone) = 0.5

We also know that calcite has a probability of 95% in limestones. In statistical terms, the conditional probability is

P(Calcite|Limestone) = .95

The same goes for sandstone:

P(Calcite|Sandstone) = .25

In fact, this is all we need to perform the Bayes trick. One intermediate step helps us understand Bayes even better:

P(Calcite) = P(Limestone) • P(Calcite|Limestone) + P(Sandstone) • P(Calcite|Sandstone)

This gives the total probability of testing positive for Calcite in the outcrop, whichever rock surrounds the seam. Now to the juicy juicy Bayes itself. We want to find the conditional probability of having a limestone rock surrounding our Calcite seam. In statistics, this is: P(Limestone | Calcite). You may notice that it’s now turned around. It’s “How likely is it limestone, given that we found Calcite?” instead of “How likely is it to find Calcite in limestone?”.

P(Limestone|Calcite) = P(Limestone) • P(Calcite|Limestone) / P(Calcite)

We have all the numbers to do this:

P(Limestone|Calcite) = 0.5 • 0.95 / (0.5 • 0.95 + 0.5 • 0.25) = 0.5 • 0.95 / 0.6 = 0.79

We get a probability of 79% of this being limestone surrounding a Calcite seam. Proudly, we go to our professor and report the number. She finds it interesting but, based on the history of the outcrop, suggests you might adjust your calculation a little bit. She tells the group that there were huge coral reefs in this area and even shows some fossils in another outcrop. Now that you understand Bayes, you can easily go back and adjust your numbers. The reefs made up 65% of the area, and with this expert knowledge, you adjust P(Limestone) to 65% and P(Sandstone) to 35%.

P(Limestone|Calcite) = 0.65 • 0.95 / (0.65 • 0.95 + 0.35 • 0.25) = 0.65 • 0.95 / 0.705 = 0.876

Your adjusted probability goes up to 87.6%. We can see that expert knowledge can be used in a Bayesian approach, which is why many people like it these days. Expert knowledge, or bias, skews the results in a certain direction, something we can use but need to use with care.
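
If you want to play with the numbers yourself, here is a minimal Python sketch of the calculation above. The function name `posterior_limestone` and its parameter names are my own for illustration, and it assumes the outcrop can only be limestone or sandstone, just like the example.

```python
def posterior_limestone(prior_limestone,
                        p_calcite_given_limestone=0.95,
                        p_calcite_given_sandstone=0.25):
    """Return P(Limestone | Calcite) for an outcrop that is either limestone or sandstone."""
    # With only two candidate rocks, the sandstone prior is the complement.
    prior_sandstone = 1.0 - prior_limestone

    # Total probability of finding Calcite in a seam, whatever the host rock.
    p_calcite = (prior_limestone * p_calcite_given_limestone
                 + prior_sandstone * p_calcite_given_sandstone)

    # Bayes' theorem: flip the conditional around.
    return prior_limestone * p_calcite_given_limestone / p_calcite


uninformed = posterior_limestone(0.5)   # 50/50 prior, we don't know better
informed = posterior_limestone(0.65)    # prior adjusted with the professor's reef knowledge

print(f"50/50 prior: {uninformed:.2f}")
print(f"65/35 prior: {informed:.2f}")
```

Changing only the prior moves the result from roughly 0.79 to roughly 0.88, which is exactly the "expert knowledge skews the result" effect described above. Try a prior of 0.1 to see how strongly a skeptical starting point pulls the answer down.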

Originally published here.

Data Stories

GPS, we rely on it every day.

It is the most frustrating thing when your watch doesn’t find the location during a run, or your phone is a street off while driving to the hotel in a foreign city.

But GPS does a bit more than that in modern society.

Planes, ships, rockets, and even timekeeping. They use GPS.

But we can interfere with it. We can intercept the GPS signal and pretend we’re someone else. Somewhere else. Lead someone somewhere else. Or have them forget where they are and leave them stranded.

GPS jamming and spoofing are huge security and economic risks.

I was extremely surprised that someone could make a global interactive GPS jammer map, but here we are in the 2020s.

gpsjam.gif

Here’s the [Source]

Question of the Week

  • What are common techniques for Exploratory Data Analysis?

Post them on Twitter and tag me. I’d love to see what you come up with. Then I can include them in the next issue!

Tidbits from the Web

  • You probably didn’t know you needed a live dashboard of the top emoji on Twitter.
  • Do you ever feel like life is changing too fast? This video tackles better dealing with change but might leave you in a mini-existential crisis.
  • We write text in addition to code to make people understand. How do we write good, short, easy-to-read text?

You just read issue #92 of Late To The Party. You can also browse the full archives of this newsletter.
