Artificial intelligence isn’t just for scary algorithms poised to take over our lives — it can also be a fun thing to play with, as we learned when we trained a computer to generate Lifehacker headlines. But you can’t play until you have some good data sets to start with.
Photo: Chris McGrath (Getty)
Fortunately, Gengo AI has rounded up some datasets for your next project, whether that’s a silly Twitter bot or the next self-driving car. (For real — check out this database of video clips of street signs, for example.)
Could your next project use 20,000 images of dogs? Tweets to airlines, already categorised as positive, negative or neutral? (Surprise: they’re mostly negative.) Maybe you can do something fun with five million Yelp reviews.
Or perhaps you could feed these Jeopardy questions and answers into a neural net and ask it to put together a whole new Jeopardy game for you.
The roundup also includes several data repositories that are each a gold mine unto themselves:
- Kaggle has a huge variety of data sets, including superheroes, cryptocurrency markets and chest X-rays.
- Data.gov gathers data sets from US government agencies, including food recalls, complaints about banks and what hospitals charge for the most common procedures.
- The UK Data Service is a UK-centric repository of social, economic and population datasets.
These are all free data sets, available for whatever projects you can dream of. Perhaps you’d like to use some of this data to analyse social problems and make the world a better place. Or just goof around with bots; that’s fine too.