Data Privacy Day, data2vec, Fake Data
-
This Friday (01/28) is Data Privacy Day. Duke's Kenan Institute for Ethics is hosting a discussion with Neil Richards on Why Privacy Matters.
-
fly.io is now offering a 3GB instance of Postgres for free.
-
As of January, there is a faster way to read a CSV in pandas.
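The item above doesn't say which feature it means. My assumption is that it refers to the `engine="pyarrow"` option that `read_csv` gained in pandas 1.4, so treat this as a minimal sketch under that assumption. The helper name `read_csv_fast` and the demo file are made up for illustration, and the helper falls back to the default parser when pyarrow or a recent enough pandas is missing.

```python
import os
import tempfile

import pandas as pd


def read_csv_fast(path):
    """Hypothetical helper: try the pyarrow engine, else use pandas' default parser."""
    try:
        # Assumption: the "faster way" is engine="pyarrow" (pandas >= 1.4, pyarrow installed).
        return pd.read_csv(path, engine="pyarrow")
    except (ImportError, ValueError):
        # ImportError: pyarrow is not installed.
        # ValueError: this pandas version does not know the "pyarrow" engine.
        return pd.read_csv(path)


# Write a tiny demo file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("id,name\n1,ada\n2,grace\n")
    demo_path = f.name

df = read_csv_fast(demo_path)
os.remove(demo_path)
print(df)
```

Any speedup depends on the file and the machine, so it is worth timing both engines on your own data before switching.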
-
Lu Pan recalls Ted Codd's seminal 1970 paper on relational databases, and the technology's staying power 50 years later.
On the other hand, how often do you write SQL queries and think, there has to be a better way? I found this proposal for Pipelined Relational Query Language, pitched as a better SQL, to be compelling. And if you really want to get into it, you can read the lengthier discussion on Hacker News.
-
If it hasn't happened already, you will be hearing a lot about data2vec, a single high-performance self-supervised algorithm that works for multiple modalities, including speech, vision, and text. The claim: "It outperformed the previous best single-purpose algorithms for computer vision and speech and it is competitive on NLP tasks."
-
We've been talking a lot lately about fake data. SDV is a collection of libraries that can learn from input tables and then generate synthetic data with the same properties.
-
Who among us has not at some point stored secrets in env vars? Don't do it!
-
Here's a video on creating maps in Python based on OpenStreetMap data using the prettymaps library.
-
Great questions to ask future managers (and maybe more important to think about as a manager).
-
The value of equity can be hard to quantify when joining a startup, and knowing what questions to ask is almost as hard. I know that's not directly related to data engineering, but it's an issue many of us come across at some point, and I thought this Twitter thread outlined the questions and motivations as well as anything I've read.
-
SciPy 2022 is coming up.