The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I've been sending this newsletter since 2012 to be a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you're welcome to become a friend via the links below.
The most clicked link last week was Wired's story on the use of data science in welfare decisions in the US. This has been picking up a lot of attention, especially in my AI circles, where there is a lot of concern around both bias and lack of evidence. On this, I never tire of recommending this presentation on AI snake oil by Prof Narayanan of Princeton University.
I asked ChatGPT what it knows about me. Which is the 2023 equivalent of Googling your name 15 years ago. Well, it's interesting. It knows a lot about me and my career, and it definitely gets my interest in data and the recent healthcare context.
Except... it's all wrong. Very wrong. And for now, we see this almost as a novelty, as something to laugh about. But it might have serious consequences, unless we learn to use it as a tool for fuzzy, abstract stuff. In this respect, this really chimes with the brilliant article by Ted Chiang in the New Yorker, which likens ChatGPT to a blurry compressed image.
[*] their pun, not mine.
Finally, a reminder that you can directly buy advertisement space in Quantum of Sollazzo. Just head to the cal.com scheduler and select an issue.
I'll keep banging on about this for a bit, as the readership is growing and so are the costs involved in sending the newsletter. Most of these are covered by Friends of the newsletter (link below) and occasional sponsors for now, but it would be good to have a pipeline to make sure I don't have to suddenly pause if things get out of control (or maybe this is the dream...).
Till next week,
You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
"The data used to populate this chart is sourced from the ProPublica Congress API, which itself is sourced from the congress-legislators github project. Generational breakdown uses age brackets defined by the Pew Research Center, except for "Missionary Generation", which is defined by the Strauss–Howe generational theory."
Very, very good. Not (easily) replicable in the UK.
"Early spring can be good for trees and blossoms but miserable for allergy sufferers."
"New bans will have outsized impacts on who can get an abortion, how far they have to drive for it and how long they have to wait for an appointment. A new analysis by Caitlin Myers, an economics professor at Middlebury College who studies abortion, illustrates how abortion access could continue to dwindle this year if key states like Florida and North Carolina pass additional restrictions."
FiveThirtyEight takes a look at the state of abortion rights in the US, with the help of a brilliant interactive map.
"Fewer and fewer young people, fewer and fewer graduates. Thirteen of Italy's twenty-one regions are not only in demographic decline, they are also struggling to get the remaining young people through third-level education. And it is not just the South that is struggling."
Lack of opportunities, lack of education, and migration, in this article for the European Data Journalism Network that looks at Eurostat data.
"See how events unfolded through maps and Grid’s in-depth reporting and analysis."
More like this, please!
"The City Controller is the elected paymaster, auditor and chief accounting officer for the City of Los Angeles. Along with the Mayor and City Attorney, the Controller is one of three offices elected every four years by citywide popular vote". He's decided to share data and data analysis.
This website is a brilliant resource for Los Angeles, but also an inspiration to local authorities around the world.
The past year has seen a large reduction in the amount of available investment in tech, alongside the well-known layoffs at big tech companies. This article by magazine Rest of World tries to analyse the situation.
"Language Models such as GPT-2 can be used for Music Generation. The idea is to represent pieces of music as texts, effectively reducing the task to Language Generation.
This model is a rather small instance of GPT-2 trained the Lakhclean dataset. The model generates 4 bars at a time at a 16th note resolution with 4/4 meter."
A model on Hugging Face.
"Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs).
Guardrails: ✅ does pydantic-style validation of LLM outputs, ✅ takes corrective actions (e.g. reasking LLM) when validation fails, ✅ enforces structure and type guarantees (e.g. JSON)."
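To make the "validate, then re-ask" idea concrete, here is a minimal sketch in plain Python. It does not use the Guardrails API at all: the schema, the function names and the fake LLM are all hypothetical and only illustrate the general pattern of checking structure and types, then asking again when the check fails.

```python
import json
from typing import Callable

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}


def validate(raw: str, schema: dict) -> dict:
    """Parse raw text as JSON and check that each field exists with the right type."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if type(data[field]) is not expected:
            raise ValueError(f"field {field!r} should be {expected.__name__}")
    return data


def ask_with_reask(llm: Callable[[str], str], prompt: str,
                   schema: dict, max_attempts: int = 3) -> dict:
    """Call the model, validate the reply, and re-ask with the error on failure."""
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            return validate(raw, schema)
        except ValueError as err:
            # Corrective action: feed the validation error back into the prompt.
            prompt = f"{prompt}\nYour last answer was invalid ({err}). Reply with JSON only."
    raise RuntimeError("no valid output after retries")


def make_fake_llm(replies):
    """Stand-in for a real model (an assumption): returns canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


if __name__ == "__main__":
    fake = make_fake_llm([
        "not json",                        # fails parsing
        '{"name": "Ada", "age": "36"}',    # wrong type for age
        '{"name": "Ada", "age": 36}',      # valid
    ])
    print(ask_with_reask(fake, "Describe Ada as JSON", SCHEMA))
```

The real package presumably does much more than this; the sketch only shows why a validation loop gives you stronger guarantees than trusting the first reply.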
"System font stack CSS organized by typeface classification for every modern OS"
What's interesting about this project by Daniele Bottillo (ex Deliveroo) is not just that it's a useful template, but also that it is showcased by a sample app that creates a home screen with two tabs and a button to show the train & bus departures from a specific train station & bus stop in London (UK) using TransportAPI.
Speaking of buses, Ryan Lamb has written an extensive and well documented tutorial on his approach to ingesting and analysing the real time data coming from the UK Department for Transport's Bus Open Data Service. All source code is available under the GPL 3 licence.
Something to keep an eye on. Observable is really evolving into a complete data analytics suite.
"Studying conflicts in the post-Soviet space through structured analysis of textual contents available online". A new project to study text as data & data in the text, by Giorgio Comai, who will develop and share datasets and tools alongside its findings.
DataCamp helps individual learners make better use of data. Build data skills online while learning from the world’s top data scientists. Help close the talent gap with DataCamp.
"A recent blog post proclaims “Big Data is Dead.” Not coincidentally, this proclamation is made by the folks supporting DuckDB, a database system optimized for local in-process deployments. (P.S. I am a fan of DuckDB and I’m planning to swap out PostgreSQL with DuckDB in my data engineering class the next time I offer it. Also watch out for more from Ponder on DuckDB!)
All that said, I’m here to declare “not so fast!” I draw on lessons from the database research community to argue that any organization has big data, medium data, and small data needs."
The debate goes on.
"This post presents the argument for open sourcing analytical work using the real-world example of Splink, a Python record linkage library. The hope is that it helps others who want to make the case for open sourcing their work." This is by Robin Linacre, whose work at the Ministry of Justice on Splink we covered in QoS 486.
MuckRack's CEO Gregory Galant: "This survey sheds light on the deep responsibility journalists have to deliver news and information to the public and how they’re managing it with limited resources. Our aim in releasing this data is to help PR teams be successful when working with the journalism community, approaching relationships with empathy, patience and real insight into how journalism gets made."
The full report is behind a registration wall, but it's worth it.
There are quite a few resources on the website of the Digital Humanities Awards from last year. A wonderful set of dataviz, datasets, tools, training materials, "for fun" stuff, and publications. Basically, a bit like an issue of "Quantum", but for Digital Humanities.
(via Massimo Conte)
"One way to measure the magnitude of a bank's failure is by the amount of assets the bank held. By that measure, SVB's collapse is the second largest American bank failure of all time, and the largest since 2008."
Data visualization by Pranshu Maheshwari.
Projections, projections, projections.
"The first known calculation of the Earth’s circumference was made 2300 years ago by a man called Eratosthenes. I remember in school, how impressed I was by how accurately the Earth’s circumference was measured such long time ago. Today we’re going to take a closer look on how his calculation was made."
"Your AI-free Content Deserves a Badge".
Parody or not? :-)
"Machine learning techniques can be successfully deployed to better identify food insecurity outbreaks across the world long before they take place, according to a new study."
Quantum of Sollazzo is supported by ProofRed's excellent proofreading. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.
[*] this is for all $5+/month GitHub sponsors. If you are one of those and don't appear here, please e-mail me.