#457: quantum of sollazzo – 15 February 2022
The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I’ve been sending this newsletter since 2012 to be a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you’re welcome to become a friend via the links below.
Every week I include a six-question interview with an inspiring data person. This week, I speak with David Kane, a data-driven researcher who I had the pleasure to work with, and who’s the absolute guru of third sector data.
’Til next week,
This week’s edition is sponsored by OpenCage
OpenCage operates a highly available, simple to use, worldwide geocoding API based on open data like OpenStreetMap. With libraries for Python, R, MATLAB, Stata, and over 30 other programming languages, it's easy to dive in. Whether you just need to geocode one dataset, or you have an ongoing need, we offer cost-effective, flat-fee packages, and all the benefits of Open Data.
Try the API now on the OpenCage demo page.
Six questions to...
David is a freelance researcher.
What is your daily data work like and what tools do you use?
A lot of my work is looking at data about charities and other non-profits, so my typical day would start with firing up a Jupyter notebook and running some queries about a particular set of charities, or merging some datasets together to produce analysis.
More often than not I’ll be working with the Charity Commission’s register of charities. I run a site called findthatcharity.uk which has a big database full of lots of data about charities - that database is often where I start with a piece of work.
I have VSCode open most of the day, it’s where I run notebooks and write Python scripts to process data. I’m not sure what I’d do without the Pandas data analysis library in Python.
Tell me about a data project that you're proud of...
I work a lot with 360Giving, an amazing charity that supports organisations to openly publish grants data, and helps people to use it to improve charitable giving.
This became even more important during the pandemic. To help people understand more about how funders responded to the crisis, in December we released a piece of research looking at grants made since March 2020, and how funders made sure that the charities they supported could continue to provide vital services.
It was a real team effort - the team at 360Giving worked with a wide variety of funders to help them publish high quality data quickly - sometimes for the first time.
...and a data project that someone else did and you're jealous of.
If I say "dataset", you think of...
Let’s get stuck in! What do I already know that I can link this to, to add extra context or make it more useful?
Give someone new to data a tip or lesson you wish you'd learned earlier.
Always look at your data. It’s amazing how easy it is, especially when you use something like Python or R, to go straight to analysis or attempt to clean the data without seeing what’s actually in it first. Excel has lots of flaws, but the best thing about it is that the data is always there right in front of you - so you can see patterns & oddities. So even if I’m not using Excel I try to make sure I’m looking at the actual data as much as I can.
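As a rough illustration of that tip, here is a minimal Python sketch of "looking first": print a few raw rows, count blank cells, and list the distinct values in a column before doing any analysis. The sample data, the column names and the `peek` helper are all made up for this example; in pandas, `DataFrame.head()` and `Series.value_counts()` do a similar job.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, invented purely for illustration.
SAMPLE_CSV = """name,income,region
Charity A,12000,London
Charity B,,Wales
Charity C,n/a,london
Charity D,450000,London
"""


def peek(csv_text, n_rows=3):
    """Return the first rows, blank-cell counts and value counts per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    # Count cells that are empty or whitespace-only in each column.
    blanks = {
        col: sum(1 for r in rows if not (r[col] or "").strip())
        for col in columns
    }
    # Tally every distinct raw value, so oddities like "n/a" or
    # inconsistent capitalisation are visible.
    values = {col: Counter(r[col] for r in rows) for col in columns}
    return rows[:n_rows], blanks, values


head, blanks, values = peek(SAMPLE_CSV)
for row in head:
    print(row)
print("blank cells per column:", blanks)
print("region values:", dict(values["region"]))
```

Even on four rows, this surfaces the kind of thing that is obvious in a spreadsheet but easy to miss in code: a blank income, an "n/a" typed into a numeric column, and "London" spelled with two different capitalisations.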
Data is or data are...
Data is. It's a mass noun - like water. There's not really such a thing as a "datum", a single point of data that exists in isolation - it's the context and integration with other parts that turns a number or a piece of text into "data". So you can only ever refer to the whole thing. Or something.
Beijing Winter Olympics Medal Count
There are many trackers of Olympic Medals, but I particularly like this one from Bloomberg this week.
You’re doing Italian food all wrong, say Italians
According to this YouGov poll, my spaghetti with cheese, ham, and mustard is probably not acceptable.
Interactive: How much of your country’s electricity is renewable?
“About 30 percent of the world’s electricity comes from renewables, including hydropower, solar and wind among others.”
Interactive quiz aside, the article gives a broad picture of renewable use.
Tools & Tutorials
Mother Jones’s Storytelling Tools
A lot of them.
“Simutrans is a freeware and open-source transportation simulator. Your goal is to establish a successful transport company. Transport passengers, mail and goods by rail, road, ship, and even air. Interconnect districts, cities, public buildings, industries and tourist attractions by building a transport network you always dreamed of.”
Basically, SimCity for transport.
Population around a point
“Human population within a distance, from any point in the world.”, using the Global Human Settlement Layer data.
(via Chris Weston)
“For bureaucratic reasons, a colleague of mine had to print, sign, scan and send by email a high number of pages. To save trees, ink, time, and to stick it to the bureaucrats, I wrote this script.”
The utter idiocy of requests like these triggers creative solutions, it would appear.
Free KMZ to SHP to GeoJSON Online Converter
Although it can only do files up to 10MB, this tool might be useful.
Cheatsheet: expansion for continuous scales
For ggplot2. Yes, there’s a typo in the title.
How to: make your first interactive map in R
Here’s the tutorial and here’s a map of London house prices made with it.
(via Steve Parks)
How I made the viral map
“Six Questions” graduate Erin Davies explains how she made the viral map (featured in last week’s quantum of sollazzo) of where Americans were born (see below) and how to adapt it to a different dataset.
How we communicate uncertainty in polls and election forecasts
The Economist data team explains well how they approach the topic of uncertainty in polls, using a technique that allows them “to translate the polls into probabilities, which […] should give readers a better understanding of the uncertainty in the election in general, and the polls in particular.”
Dataviz, Data Analysis, & Interactive
Can You Gerrymander Your Party to Power?
“To help you understand [gerrymandering] better, we created an imaginary state called Hexapolis, where your only mission is to gerrymander your party to power.”
(via Soph Warnes’ Fair Warning)
Words Known Better by Males Than Females, and Vice Versa
“The chart below visualizes words disproportionately known by one sex and not the other.”
On Observable, so it’s editable, using data coming from a paper. Not a lot of it, it must be said, but enough to give you a sense of cultural norms. Oddly, I knew the word taffeta because of Young Frankenstein.
A Taxonomy for AI / Data for Good
“This piece is the culmination of a series of explorations seeking to map and make sense of the landscape of efforts to apply data for good.”
By DataKind co-founder Jake Porway. Nice chart, too.
UK to pilot world-leading approach to improve ethical adoption of AI in healthcare
Work thing klaxon. Led by my colleague Brhmie Balaram, in the NHS AI Lab we’re starting to work on piloting Algorithmic Impact Assessments (AIAs) in healthcare, as a way to become aware of, and ultimately eradicate, biases in algorithms.
The essence of the web, every morning in your inbox
Tens of thousands of busy people start their day with their personalized digest by Refind. Sign up for free and pick your favorite topics and thought leaders. Subscribe here.
quantum of sollazzo is supported by ProofRed’s excellent proofreading. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.
casperdcl and iterative.ai
[*] This is for all $5+/month GitHub sponsors. If you are one of those and don’t appear here, please e-mail me.