#416: quantum of sollazzo – 6 April 2021
The data newsletter by @puntofisso.
This photo is going around. It shows what Google Translate does when asked to translate a text from Hungarian, a language with no grammatical gender. What’s happening is pretty remarkable: it translates each sentence into English, which has gendered pronouns, and assigns each one a gender according to what looks like bias. As I’m writing this newsletter, Google Translate still does it.
There are several levels of discussion here. “Is Google Translate biased?” is just the tip of the iceberg: the translation algorithm is trained on a large number of example translations. So the question actually becomes two questions:
- Were those examples selected in a biased way?
- Were the examples themselves biased to begin with?
The difference is subtle. In the first case, the responsibility lies with the data scientists who selected the examples. In the second, the bias is already in the source material: I can easily imagine centuries of texts that told the stories of men more than those of women having an impact on how we build translation algorithms today.
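For the curious, here is a deliberately naive toy sketch of the second case. It is not how Google Translate works: it simply assumes a "translator" that renders the Hungarian genderless pronoun "ő" by picking whichever English pronoun it has seen most often with the rest of the sentence. The training sentences are invented for illustration, and no selection bias is involved, yet skewed examples alone are enough to produce a skewed output.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Invented, illustrative "training examples" (not real data).
TRAINING_SENTENCES: List[str] = [
    "he is a doctor", "he is a doctor", "she is a doctor",
    "she is a nurse", "she is a nurse", "she is a nurse", "he is a nurse",
]


def learn_pronoun_counts(sentences: List[str]) -> Dict[str, Counter]:
    """Count how often each English pronoun appears with each phrase."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        pronoun, rest = sentence.split(" ", 1)
        counts[rest][pronoun] += 1
    return counts


def translate_genderless(phrase: str, counts: Dict[str, Counter]) -> str:
    """Render a genderless subject ("ő") with the most frequent pronoun.

    Toy assumption: the phrase was seen in training; unseen phrases
    are not handled. Whatever imbalance the examples contain is
    reproduced verbatim in the output.
    """
    pronoun, _ = counts[phrase].most_common(1)[0]
    return f"{pronoun} {phrase}"


counts = learn_pronoun_counts(TRAINING_SENTENCES)
print(translate_genderless("is a doctor", counts))  # he is a doctor
print(translate_genderless("is a nurse", counts))   # she is a nurse
```

Real systems are vastly more sophisticated, but the basic point stands: a model that optimises for "most likely given the examples" inherits whatever imbalance those examples carry.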
It doesn’t end here. If you read the thread, there is also an intriguing suggestion that this might be a piece of poetry, created precisely to elicit a reflection on gender discrimination in speakers of a genderless language.
I don’t know whether that’s true, and my knowledge of Hungarian is less than zero, but this possibility raises a question about how we expect an AI-driven translator to deal with subtle cases like this, where context is so important that sometimes not even humans can grasp it (and for a bit of Internet subculture, read about Poe’s Law if you’ve never heard of the concept).
Speaking of AI, take some time to delve into this long interactive read by The New York Times on facial recognition and its legal and ethical limits.
You IT folks might want to take a look at this job advert: news verification outlet Bellingcat is hiring an IT engineer.
Let me also welcome EDJNet, our sponsor for the next 4 issues. I’m really happy to have their sponsorship, as EDJNet is an organisation I deeply respect.
Your links are below the sponsored box.
‘till next week,
sponsored content from EDJNet
We believe in the potential of data journalism in fostering innovation in newsrooms and presenting new perspectives on issues that affect citizens and institutions.
The European Data Journalism Network was born precisely to sustain the growth of data journalism in the European media landscape. We do so by producing data-driven stories on a weekly basis, and putting them at the disposal of any media organisation by using an open license.
We also promote data literacy and mutual learning through tutorials, webinars, and explanations of methodologies and techniques. A series of original tools makes it easier for journalists to work with data, regardless of how familiar they are with it.
We also have a weekly newsletter.
Long-term death trends
Colin Angus, of the Sheffield Alcohol Research Group (part of ScHARR at the University of Sheffield), tweets a few speculative ideas on ONS death data, which show that deaths are now actually lower than the long-term average. There are some interesting points in the thread that deserve deeper investigation, such as the impact of gender, or the alleged lower rates of flu deaths.
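If you want to play with this kind of comparison yourself, the basic arithmetic is simple: compare the deaths registered in a given week with the average for the same week in previous years. The sketch below assumes a five-year baseline (a common choice, but the thread may use a different one), and the numbers are made up purely to show the calculation.

```python
from statistics import mean
from typing import Sequence, Tuple


def excess_deaths(current_week: int,
                  same_week_previous_years: Sequence[int]) -> Tuple[float, float]:
    """Return (excess deaths, excess as % of baseline) for one week.

    The baseline is the plain mean of the same week in earlier years.
    Negative values mean fewer deaths than the long-term average.
    """
    baseline = mean(same_week_previous_years)
    excess = current_week - baseline
    return excess, 100 * excess / baseline


# Hypothetical figures, for illustration only.
excess, pct = excess_deaths(10000, [10500, 10200, 10800, 10300, 10700])
print(excess, round(pct, 2))  # -500 -4.76
```

A negative excess, as in the thread, is exactly the "lower than the long-term average" situation; whether it reflects fewer flu deaths, deaths brought forward earlier in the pandemic, or something else is the part that needs the deeper investigation.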
Steer through the Suez Canal
“Navigating the Suez Canal is a high-stress, complicated feat that requires master piloting skills. To demonstrate, we worked with Master Mariner Andy Winbow and Captain Yash Gupta to produce this simulated passage. Try your hand at traversing one of the most highly trafficked nautical thoroughfares in the world.”
As much as it’s non-scientific (and openly so), this piece of interactive journalism by CNN is good fun (and kinda mind-opening).
If you’re into containers, you should probably also take a look at this article by Datawrapper.
1 big thing: Streamers chase current events
“Documentaries were the fastest-growing genre on streaming last year”, according to Axios. Unsurprisingly, horror and drama experienced a sizeable reduction in the COVID-19 year, but the dip in access to comedy is puzzling.
COVID-19 Vaccination Tracker
Excellent dataviz work by Reuters Graphics.
Movable walls, flexible living spaces, air filtration systems, curved walls, and more – quite a few interesting ideas (which, to be honest, might be good even in a non-pandemic world) in this article by the South China Morning Post.
Data Documentation Woes? Here’s a Framework
“Introducing a framework to help data teams build a documentation-first culture”.
Another interesting piece by Prukalpa Sankar of Atlan.
Satellite Imagery is Not Becoming a Commodity
A problem I’ve been vaguely aware of, but unable to really explain, is the odd variability of satellite imagery in terms of coverage and quality. This issue of Joe Morrison’s “On closer look” newsletter explains quite a bit of it.
Tools & Tutorials
Counting points in polygons in QGIS
“QGIS has a nice points in polygon tool” and it’s pretty easy to use.
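If you’ve ever wondered what a tool like this does under the hood, the textbook approach is the ray-casting (even-odd) test: draw a ray from the point and count how many polygon edges it crosses; an odd count means the point is inside. This is a minimal sketch of that classic algorithm, not QGIS’s actual implementation (real GIS tools also use spatial indexes, handle multipolygons and holes, and define what happens to points lying exactly on a boundary, which this sketch leaves ambiguous). The polygon names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: cast a ray to the right of pt, count crossings."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_polygons(points: List[Point],
                             polygons: Dict[str, List[Point]]) -> Dict[str, int]:
    """For each named polygon, count how many points fall inside it."""
    return {name: sum(point_in_polygon(p, ring) for p in points)
            for name, ring in polygons.items()}


# Hypothetical example: two adjacent 2x2 squares and four points.
POLYGONS = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
POINTS = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]
print(count_points_in_polygons(POINTS, POLYGONS))  # {'A': 2, 'B': 1}
```

Checking every point against every polygon is fine for a handful of features but scales poorly, which is one reason dedicated tools are worth using on real datasets.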
Transport & Environment
More and more trains crossing European borders
As someone who moved from Italy to the UK by train (long story…), this article by Gianluca De Feo and Lorenzo Ferrari for OBC Transeuropa is really close to home.
“A dense network of cross-border rail connections cuts through the continent, and it’s set to expand even further in the coming years thanks to new infrastructure and the birth of the European single rail market. However, there are still profound differences between the central and peripheral regions of Europe.”
Grant funding versus the climate crisis
Charlotte Ravenscroft and David Kane take a data-driven look at climate-related grant funding, using data sources including 360 Giving’s GrantNav. They address three questions:
- Which funders are already working on climate?
- Which organisations are being funded?
- Where could new grant funding make the biggest difference?
The Naked Truth
The Pudding shows “how the names of 6,816 complexion products can reveal bias in beauty”.
The Bob Ross Virtual Art Gallery
I must admit I had no idea who Bob Ross was and had never heard of his TV show “The Joy of Painting”, but apparently someone loved him enough to collect loads of data about his paintings and build an interactive dataviz with it.
“Clickclickclick.click reveals the browser events used to monitor our online behaviour.”
Support this newsletter & spread the word
Become a GitHub Sponsor. It costs about the price of a coffee per month, and you’ll get an Open Data Rottweiler sticker (and other stuff).
If you’re already a supporter of this newsletter, thanks a lot! Share this e-mail with a friend, or via social media.
quantum of sollazzo is supported by my GitHub Sponsors and Buy Me A Coffee supporters, and by ProofRed’s excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.