The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I’ve been sending this newsletter since 2012 as a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you’re welcome to become a friend via the links below.
The most clicked link last week was LearnGPT. I think it’s the first time for Quantum that an AI-related link is the most viewed.
We have some great sponsored content this week: Ed Freyfogle, organiser of location-based services meetup Geomob, co-host of the Geomob podcast, and co-founder of OpenCage, has offered to introduce a set of points around the topic of geodata. His first entry, on the importance of geodata and the difference between open and closed data, starts a few paragraphs below.
’Til next week,
Become a Friend of Quantum of Sollazzo from $1/month →
If you enjoy this newsletter, you can support it by becoming a GitHub Sponsor. Or you can Buy Me a Coffee. I'll send you an Open Data Rottweiler sticker.
You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
“Although most of this money must technically be disclosed to the public, the way that information is reported, stored and displayed almost guarantees the records will not be widely scrutinised.“
As others have noted, it’s great that the data is searchable, but… if the problem was that the data wasn’t accessible, it’s somewhat disappointing that this project hasn’t made it so.
(Disclaimer: just choosing the highest ranking parliamentarian, no politics, etc etc.)
“Areas that were largely spared in 2020 are now among those with the largest numbers of job cuts.“
The Wall Street Journal takes a look at Silicon Valley redundancies, which worryingly are higher than during COVID, a bad sign for the state of the economy.
Analysis by the Pew Research Center shows growing representation.
Speaking of gender ratios, here’s a look at the percentage of women in Lula’s cabinet in Brazil.
The Washington Post looks at the House Speaker election.
Infographic guru Maarten Lambrechts’s own list.
In the first episode of the Uncharted Territory podcast by the European Data Journalism Network, we learn about the state of prison life during the pandemic.
“Data collected by 12 newsrooms in the European Data Journalism Network, coordinated by Deutsche Welle, shows that the effort to keep the infection under control in detention institutions came at a high cost. Prisoners found themselves more isolated than ever: visits and education activities were suspended, vaccination campaigns were delayed, while overcrowding put the most vulnerable at risk.“
Quantum Six Questions graduate Lisa Hornung is an open source and open data star. She has now created a directory of no-code data and design tools.
“Graphic Walker is a different type of open-source alternative to Tableau. It allows data scientists to analyze data and visualize patterns with simple drag-and-drop operations.“
You can try it here with a very basic UI, or implement your own.
A tutorial on how to use ClickHouse to run SQL queries directly on plain CSV or Parquet files. Although there is a commercial offering, the software itself is open source.
Why is open geodata important? What’s the difference between open and closed data?
Proprietary geodata from private services like Google are widely used, but come with licenses that severely restrict how you can use the data. Restrictions include:
Open data, like that returned by the OpenCage geocoding API, means:
As a final bonus, because the data is free (no cost), our service is also much more affordable.
Have a project that will need geocoding? See our geocoding buyer’s guide for an overview of all the factors to consider when choosing between geocoding services.
“This week I heard about the Microsoft Road Detections dataset, published as open data under the ODbL licence, and thought I’d take a look.
Turns out it’s a slightly unusual format: a tsv file containing a column for country codes and a column containing GeoJSON objects. Also, each file is pretty huge and contains data for multiple countries.
So… it’s not an easy drag & drop into QGIS.
But it’s the holidays, I’ve got some time on my hands, so I spent a bit of tinkering time figuring out how to pull out just the data for one country and convert it to a friendlier format.
I wrote a short 3 line bash script that downloads the Oceania data, extracts 4.6M records for Australia, and converts it to a GeoPackage so it works nicely in QGIS.
Please feel free to use the script, which should be easy to repurpose for any country covered by this data. Source Microsoft data.“
From this LinkedIn status update.
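For readers who would rather stay in Python than bash, here is a minimal sketch of the same filtering idea. It is not the script from the post: it assumes each row is a country code, a tab, and then a single GeoJSON Feature, and it builds a plain GeoJSON FeatureCollection rather than a GeoPackage. The function names and the sample rows are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def features_for_country(lines: Iterable[str], country_code: str) -> Iterator[dict]:
    """Yield the parsed GeoJSON object from every row matching country_code.

    Assumed row layout (not verified against the real files):
    <country code> TAB <GeoJSON Feature as text>
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only: column 1 is the code, the rest is GeoJSON.
        code, _, geojson_text = line.partition("\t")
        if code == country_code and geojson_text:
            yield json.loads(geojson_text)


def to_feature_collection(features: Iterable[dict]) -> dict:
    """Wrap individual features in a single GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Hypothetical sample rows standing in for the (much larger) real file.
sample_rows = [
    'AUS\t{"type": "Feature", "geometry": {"type": "LineString", '
    '"coordinates": [[151.2, -33.8], [151.3, -33.9]]}, "properties": {}}',
    'NZL\t{"type": "Feature", "geometry": {"type": "LineString", '
    '"coordinates": [[174.7, -36.8], [174.8, -36.9]]}, "properties": {}}',
    'AUS\t{"type": "Feature", "geometry": {"type": "LineString", '
    '"coordinates": [[144.9, -37.8], [145.0, -37.9]]}, "properties": {}}',
]

collection = to_feature_collection(features_for_country(sample_rows, "AUS"))
print(len(collection["features"]), "features kept for AUS")
```

With a real download, the list of sample rows would be replaced by an open file handle, which is read line by line so the whole file never has to sit in memory, and the resulting dictionary could be written out with `json.dump`. QGIS can open a GeoJSON file directly, although for millions of records a GeoPackage, as in the original post, is likely to be more comfortable to work with.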
“Our goal is to provide a visual aid for students, professionals, and anyone preparing for a technical interview to better understand the underlying concepts of Machine Learning.“
A quick recipe.
The MOJ’s Robin Linacre explains why “bulk open data is best served as statically-hosted parquet files, with csv equivalents. It’s faster, easier to use and cheaper to host than alternatives such as custom APIs.“
A “Semantic Search Engine for ArXiv.“
“A Free, Open, and Documented Forecast API. An unprocessed weather forecast API, built to be fully Dark Sky compatible.“
It’s based on NOAA models and is run by a researcher, who will hopefully receive enough donations to keep it running.
This is a great tutorial on how to represent 3D in pure CSS, and the rest of the website is equally amazing at introducing different CSS concepts.
As in the most favourited code snippets from CodePen.
Another personal dataviz by Quantum graduate Erin Davis, generated using Kindle data.
“Starting in 2014, I logged every slice of pizza I ate in New York City on the Instagram account NYC Slice. The results shown below are collected from 464 slices. Over an eight-year period the average price of a plain slice increased from $2.52 to $3.00. This calculation excludes dollar slices.“
One of those nice, simple, powerful interactive charts that Nathan Yau is famous for.
“Edward Tian, a 22-year-old senior at Princeton University, has built an app to detect whether text is written by ChatGPT, the viral chatbot that’s sparked fears over its potential for unethical uses in academia.“
“At this point, knowing what AI can’t do is more useful than knowing what it can”.
Quantum of Sollazzo is supported by ProofRed’s excellent proofreading. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.
Supporters* casperdcl and iterative.ai Jeff Wilson Fay Simcock Naomi Penfold
[*] This is for all $5+/month GitHub sponsors. If you are one of those and don’t appear here, please e-mail me.