New machine learning datasets in 2021
New machine learning datasets on datasetlist.com
The list has been updated to include many significant releases from the first half of the year.
Among these are:
IBM CodeNet - A dataset for teaching AI to code, with 500M lines of code.
Facebook Casual Conversations - A new dataset for evaluating the fairness of computer vision and audio models.
Spotify Podcasts Dataset - A dataset of 100,000 episodes from different podcast shows on Spotify.
Mapillary Vistas 2.0 - An updated version with substantially increased granularity of semantic labels.
Facebook TextOCR - 1M high-quality word annotations on TextVQA images.
And many others.
Annotation tools
The annotation tool list has received an overhaul.
The list now contains only open-source tools. Annotation companies, each with its own proprietary tool, have proliferated, so it made sense to focus on the many free, open-source annotation tools being developed right now.
The list has also grown significantly thanks to all of your suggestions.
Thanks!
Thanks for all the feedback, and for letting me know about the new datasets and annotation tools out there!