The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I’ve been sending this newsletter since 2012 as a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is, and will always be (well, for as long as I can keep going!), free, but you’re welcome to become a friend via the links below.
This week, the newsletter is slightly longer than usual, as I skipped last week’s.
I travelled to Iceland last week, seeing recent volcanic eruptions and feeling the heat still coming out of the lava, walking inside a glacier, and bathing in natural hot springs. But one of my favourite trips was to see the famous DC-3 plane wreck, something that’s been on my list for years and that I wanted to see before it inevitably crumbles – it’s literally turning to dust as the years pass.
But… the wreckage also prompts some reflections, as in my tweet below. There were 7 people on board that plane when it crash-landed, and they all survived. However, at least 2 people have died in connection with it: tourists trying to visit the wreckage while ill-equipped or not fully informed. Risk assessment is highly contextual. My picture below captures what is – by Icelandic October standards – a glorious day. But the weather in Iceland is highly changeable, even more so than in Britain. A snowstorm is always a possibility, and high winds can make walking hard for anyone who is not at the peak of fitness. Would you believe that a flat 2.8km walk could kill you, while crash-landing in the middle of a desert might not?
Every week I include a six-question interview with an inspiring data person. This week, I speak with Prukalpa Sankar of Atlan. Some of you might know her as the data infrastructures and platforms guru whose enlightening blog posts I’ve often featured in this newsletter.
David Kane and others, including the NCVO – the organisation that groups together charities and other voluntary organisations – have launched a new UK Charity Classification, with a view to reducing the far too many “catch-all” categories. They say: “we took a sample of over 4,000 registered charities and manually classified each one, creating new tags as we went along and encountered different types of charities. This sample could then be used to generate and test keyword-based rules for automatic classification of charities, as well as training machine-learning models.”
All the code and methods are openly available.
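To give a flavour of what “keyword-based rules for automatic classification” can look like, here is a minimal Python sketch. The tags and keywords are hypothetical, invented for illustration rather than taken from the UK Charity Classification, and the precision/recall helper is just one simple way of testing rules against a hand-tagged sample; the project’s own code may work quite differently.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical tags and keywords, for illustration only:
# NOT taken from the UK Charity Classification.
RULES: Dict[str, List[str]] = {
    "animal-welfare": ["animal", "wildlife", "rescue"],
    "education": ["school", "education", "students"],
    "health": ["hospital", "health", "patients"],
}


def classify(description: str, rules: Dict[str, List[str]] = RULES) -> Set[str]:
    """Return every tag whose keywords appear as whole words in the description."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return {tag for tag, keywords in rules.items() if words & set(keywords)}


def precision_recall(predicted: List[Set[str]], actual: List[Set[str]]) -> Tuple[float, float]:
    """Score rule output against a manually tagged sample (one tag set per charity)."""
    tp = sum(len(p & a) for p, a in zip(predicted, actual))  # tags correctly assigned
    fp = sum(len(p - a) for p, a in zip(predicted, actual))  # tags wrongly assigned
    fn = sum(len(a - p) for p, a in zip(predicted, actual))  # tags missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(sorted(classify("A rescue centre for wildlife and local school visits")))
# ['animal-welfare', 'education']
```

Whole-word matching keeps the rules transparent, but it also means “student” would not match “students”: exactly the kind of gap a hand-tagged sample helps to expose.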
As you might or might not know, yours truly has been advising a group of academics – some of whom are former colleagues at St George’s, University of London – on an interesting research project that is trying to use knowledge graphs to detect “hubris” in leaders.
The project is now looking for its first PhD students, with a fully funded scholarship. If you are interested or know someone who could be, please direct them to the PhD scholarship advert and the Knowledge4Hubris project page.
’til next week,
You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
The Financial Times’ voice of rationality, data journalist John Burn-Murdoch, has published two brilliant Twitter threads about COVID. The first explains population statistics when things look odd (e.g. what “>100% of the population is vaccinated” actually means) and is ultimately about the perils of using data without fully appreciating the methodology that created/processed that data in the first place. The second covers the often misinformed debate about cases, hospitalisations and deaths.
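A toy calculation shows one common way a figure above 100% can arise: vaccinations are counted from records, but the denominator is a population estimate that may be out of date. The numbers below are hypothetical and are not taken from the thread, which may focus on other causes too.

```python
def coverage_pct(vaccinated: int, population_estimate: int) -> float:
    """Share vaccinated, as a percentage of the *estimated* population."""
    return 100 * vaccinated / population_estimate


# Hypothetical local area whose official estimate undercounts residents.
estimated_population = 10_000
true_population = 10_500
vaccinated = 10_200  # about 97% of the people who actually live there

print(f"{coverage_pct(vaccinated, estimated_population):.1f}%")  # 102.0%
print(f"{coverage_pct(vaccinated, true_population):.1f}%")       # 97.1%
```

The vaccination count is the same in both lines; only the denominator changes, which is why methodology matters as much as the headline number.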
“This dashboard was created to visualize, explore and analyze all of the published IATI (International Aid Transparency Initiative) data that is related to the coronavirus pandemic. It was developed by the OCHA Centre for Humanitarian Data.”
The Economist’s data journalist Sondre Ulvund Solstad explains the thinking behind their visualization of Russian election voting patterns, which, according to the author, shows that the election produced some fraudulent results, even though the chart cannot show exactly where.
“What are the measures & policy responses connected to data about the pandemic?
Since the start of the pandemic about 641 days ago, we are confronted with charts about new cases or even deaths. What are the events behind the numbers?”
A catchy and interactive data visualization by researchers at the University of Applied Sciences Potsdam.
Data analyst Lisa Hornung shares her favourite (mostly data-related) tools in a Twitter thread every Thursday. Keep an eye on Lisa: she’s one of data Twitter’s must-follows.
“Six Questions” graduate and census dataviz guru Ahmad Barclay has created a brilliant prototype of a Census Area Explorer for the Office for National Statistics.
Ah, those trolls at The Economist ;-) After convincing everyone that R was the best coding language for data journalism in their previous newsletter, they have now done the same with Python. Write-up by Dolly Setton, a member of their data journalism team.
Interesting set of features I wasn’t really aware of. The CSS Overview tool is pretty.
We’ve seen plenty of map projection tools before, but this Observable notebook by geography lecturer Florian Ledermann is particularly easy to use. It’s for his course “Cartographic and Geodetic Foundations for Planners” at TU Wien.
“This package allows the use of Observable notebooks (or parts of them) as htmlwidgets in R.”
“In this post, we will cover:
What is web scraping?
What are the main programming frameworks for web scraping?
What are some of the main enterprise-level paid web scraping frameworks?
A Python web scraping example where we extract some information from a site with Beautiful Soup
The Do’s and Don’ts of Web Scraping”
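The post’s example uses Beautiful Soup. As a dependency-free sketch of the same idea, the Python below swaps in the standard library’s html.parser to pull every link out of an HTML snippet; with Beautiful Soup the rough equivalent is BeautifulSoup(html, "html.parser").find_all("a"). The class and function names are my own, and a real scraper would also need to fetch the page (for instance with urllib.request) and check the site’s robots.txt first.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> we are inside, if any
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<ul><li><a href="/a">First</a></li><li><a href="/b"> Second </a></li></ul>'
print(extract_links(sample))  # [('/a', 'First'), ('/b', 'Second')]
```

Parsing a saved string like this also makes the extraction logic easy to test without hitting the live site, which is kinder to the site’s servers too.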
“WeatherSpark.com offers detailed reports of the typical weather for 145,449 locations worldwide.”
“Every year I share a collection of useful tools for data journalism and data storytelling with my students. Sharing it with a wider world here.”
VLOOKUP is probably the single most useful function in spreadsheet applications. Here Lisa Charlotte Muth of Datawrapper shows how to use it.
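For those who think in code rather than spreadsheets, here is a rough Python model of VLOOKUP in exact-match mode (last argument FALSE). It is an illustrative sketch, not a full reimplementation: approximate matching on sorted data is not covered, text is compared case-insensitively as Excel does, and the price table is made up.

```python
from typing import Any, Sequence

NA = "#N/A"  # what the spreadsheet shows when nothing matches


def vlookup(key: Any, table: Sequence[Sequence[Any]], col_index: int) -> Any:
    """Exact-match VLOOKUP: value in column col_index (1-based) of the first row whose first cell equals key."""
    if col_index < 1 or any(len(row) < col_index for row in table):
        # A spreadsheet would show #REF! here; raising is the Pythonic equivalent.
        raise ValueError("col_index is outside the table")

    def norm(value: Any) -> Any:
        # Case-insensitive comparison for text, plain equality otherwise.
        return value.casefold() if isinstance(value, str) else value

    for row in table:  # scan top to bottom; the first match wins
        if norm(row[0]) == norm(key):
            return row[col_index - 1]
    return NA


prices = [["apple", 0.5], ["banana", 0.25], ["cherry", 3.0]]  # hypothetical table
print(vlookup("Banana", prices, 2))  # 0.25
print(vlookup("kiwi", prices, 2))    # #N/A
```

The top-to-bottom scan is also why duplicates in the lookup column are a classic VLOOKUP trap: only the first matching row is ever returned.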
“Statistical tests need to be paired with proper data and study design to yield valid results. A recent review paper on Long Covid in children provides a useful example of how researchers can get this wrong. We use causal diagrams to decompose the problem and illustrate where errors were made.” Interesting read by fast.ai.
“Built upon the ubiquitous Fourier transform, the mathematical tools known as wavelets allow unprecedented analysis and understanding of continuous signals.”
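The simplest wavelet, Haar, fits in a few lines of Python. Where a Fourier basis uses sinusoids spanning the whole signal, each Haar detail coefficient only looks at a pair of neighbouring samples, so it tells you where a change happens. This is a minimal one-level sketch with made-up input, not code from the linked article.

```python
import math
from typing import List, Tuple

SCALE = 1 / math.sqrt(2)  # keeps the transform orthonormal (energy-preserving)


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the Haar transform: scaled pairwise sums (approximation) and differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in pairs]
    return approx, detail


def inverse_haar_step(approx: List[float], detail: List[float]) -> List[float]:
    """Undo haar_step exactly: each (a, d) pair gives back two samples."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out


signal = [4, 4, 4, 4, 4, 4, 9, 4]  # hypothetical: flat signal with one spike
approx, detail = haar_step(signal)
print([round(d, 2) for d in detail])  # [0.0, 0.0, 0.0, 3.54]
```

Only the last detail coefficient is non-zero, which pinpoints the spike; applying haar_step repeatedly to the approximation gives the usual multi-level decomposition.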
A take on whether machine learning is statistics or not: the author strongly argues that the two are essentially different. I’m not sure the case is entirely well argued, but it’s a good starting point for this side of the debate.
Anna Thieme at Datawrapper shows how to make a map of the volcanic eruption on La Palma.
A very good piece of student work that was longlisted in the Information is Beautiful Awards. “The visualizations presented here are designed in an abstract form of a networked mesh of the places of worship of different dominant faiths in the country.”
No image here, as this piece comes with a warning: it’s about suicide. If you feel impacted or have suicidal thoughts, please contact the Samaritans.
This article analyses suicide in the Netherlands in 2017.
“I recognize the system Werner devised isn’t as useful as it used to be when it was devised so many years ago but I enjoy breathing new life into classic works of art so I chose to recreate it online.
The result is something that’s hopefully interesting for those just discovering Werner’s guide and those that may already be familiar with it and want to discover it in a new light.”
This is just too pretty not to link to: it’s fully interactive, and it might prompt great questions about where colours come from.
“A Netflix user will browse the app for 90 seconds and leave if they find nothing. Thumbnail artwork is actually NFLX’s most effective lever to influence a viewer’s choice. A user will look at one for only 1.8 seconds, so NFLX spends huge to optimize them.”
A very interesting Twitter thread.
Terence Eden has written this enjoyable (but grim) article about the dangers of Google’s automatic summarisation in its search info-box, which in some cases poses potential risks to life.
quantum of sollazzo is supported by ProofRed’s excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.