5 Minutes of Data Science - week 35
Highlights from August 29 to September 04
Foreword
Welcome to the new format of this newsletter. It now includes blog posts from research teams at companies like OpenAI, DeepMind, Google, and Amazon; the latest podcast and YouTube episodes; trending GitHub repositories related to data science and machine learning; and the latest from the communities on Reddit!
This is highly experimental, but I hope you enjoy it. See you next week!
Pedro.
Blogs
- From motor control to embodied intelligence, by DeepMind
- Announcing the Patent Phrase Similarity Dataset, by Google AI
- DALL·E: Introducing Outpainting, by OpenAI
- Model assesses the validity of tips offered in product reviews, by Amazon Science
- Janus framework lifts continual learning to the next level, by Amazon Science
- “I always knew that my main interest was in supply chain optimization”, by Amazon Science
Podcasts
- Fraudulent Amazon Reviewers, by Data Skeptic
- Privacy in the age of AI, by Practical AI
- Multimodal, Multi-Lingual NLP at Hugging Face with John Bohannon and Douwe Kiela - #589, by The TWIML AI Podcast
- Announcing Data Literacy Month, by DataFramed
YouTube
- Three more lessons from my Pop!!!, by StatQuest
Reddit
- Based on my nightmares, at r/Data Science (💬161)
- What was the most inspiring/interesting use of data science in a company you have worked at? It doesn’t have to save lives or generate billions (it’s certainly a plus if it does) but its mere existence made you say “HOT DAMN!” And could you maybe describe briefly its model?, at r/Data Science (💬157)
- WhatsApp chat analysis between me and a friend, at r/Data Science (💬75)
- [P] Apple pencil with the power of Local Stable Diffusion using Gradio Web UI running off a 3090, at r/Machine Learning (💬39)
- US Gov imposes export requirements on NVIDIA A100s and future H100s to China and Russia, at r/Machine Learning (💬191)
- [D] Senior research scientist at GoogleAI, Negar Rostamzadeh: “Can’t believe Stable Diffusion is out there for public use and that’s considered as ‘ok’!!!”, at r/Machine Learning (💬376)
- When to use linear regression?, at r/Ask Statistics (💬25)
- Teaching myself some statistics, shouldn’t question b be 97.5% to account for results more than 3 standard deviations above the mean?, at r/Ask Statistics (💬19)
- What to expect from a PhD in Statistics?, at r/Ask Statistics (💬4)
- Panoptic scene graph generation (PSG) Explained - A New Challenging Task for AI, at r/Latest in ML (💬1)
- Personalizing Text-to-Image Generation using Textual Inversion, at r/Latest in ML (💬2)
- A list of research papers and open source tools in Data centric AI, at r/Latest in ML (💬0)
GitHub Jupyter Notebook trends
GitHub Python trends
- hlky/stable-diffusion-webui: Stable Diffusion web UI (1,155 stars this week)
- huggingface/diffusers: (1,002 stars this week)
- xinntao/Real-ESRGAN: Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration. (387 stars this week)
- python-poetry/poetry: Python dependency management and packaging made easy. (420 stars this week)
- crowsonkb/k-diffusion: Karras et al. (2022) diffusion models for PyTorch (64 stars this week)
- tiangolo/sqlmodel: SQL databases in Python, designed for simplicity, compatibility, and robustness. (489 stars this week)
- microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities (153 stars this week)
- iperov/DeepFaceLab: DeepFaceLab is the leading software for creating deepfakes. (387 stars this week)
- commaai/openpilot: openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 200 supported car makes and models. (97 stars this week)
- gradio-app/gradio: Create UIs for your machine learning model in Python in 3 minutes (302 stars this week)
- pytorch/torchdynamo: A Python-level JIT compiler designed to make unmodified PyTorch programs faster. (38 stars this week)
- 521xueweihan/HelloGitHub: (309 stars this week)