Dataset Resources for Machine Learning
By Justin

In our Machine Learning (ML) course, we discuss the importance of "good data in, good data out." This post is meant to be a long-term reference for finding quality datasets.
The list is certainly not exhaustive, so if you know of datasets that we are missing, please comment below.
Dataset Repos
- Kaggle (http://www.kaggle.com) has a lot of datasets as well as competitions for various ML/AI projects.
- UCI ML Repository (https://archive.ics.uci.edu/ml/datasets/) is published by the University of California, Irvine and offers a wide variety of datasets for data scientists to use.
Building Datasets
In the long run, we'll likely build our own datasets for problems that have yet to be solved or attempted. In that case, it's a really good idea to learn web scraping and how to work with NumPy (docs) and pandas (docs).
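
To make the scraping idea concrete, here is a minimal sketch that turns an HTML table into a list of records using only Python's standard library. The `TableParser` class, the `table_to_records` helper, and the sample page content are all made up for illustration; a real scraper would download the page first and would need to handle messier markup.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first row as column names and the rest as records."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page content; a real scraper would download this instead.
SAMPLE_HTML = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
print(records)
```

Every value comes back as a string, so the numeric columns still need converting. From here, `pandas.DataFrame(records)` would load the list into a table, which is usually the point where cleaning and type conversion begin.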