Google has teamed up with GitHub to add a new open dataset to Google BigQuery, its low-cost cloud analytics data warehouse service, giving users a snapshot of more than 2.8 million open source GitHub repositories that they can query directly. Here's what you need to know.
Google makes a number of public datasets available for BigQuery users to analyse with SQL queries and draw out trends on a particular topic. The BigQuery service, which works with Google Cloud Storage as part of the Google Cloud Platform, offers up to one terabyte of data processing each month for free.
With the new GitHub repository dataset, users can look at trends in open source software directly in BigQuery. The dataset will be updated regularly. According to Google developer advocate Felipe Hoffa:
"Thanks to our new collaboration with GitHub, you'll have access to analyse the source code of almost 2 billion files with a simple (or complex) SQL query. This will open the doors to all kinds of new insights and advances that we're just beginning to envision."
For a list of sample queries you can run on this new dataset in BigQuery, head over to this handy help page that Google has put together.
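To give a flavour of what such a query might look like, here is a minimal Python sketch that counts repositories per licence. It is not taken from Google's help page. The table name `bigquery-public-data.github_repos.licenses` and its `license` column are assumptions about how the public dataset is laid out, and the helper names `build_license_query` and `run_license_query` are hypothetical. Running the query also assumes the `google-cloud-bigquery` client library is installed and that Google Cloud credentials are configured.

```python
# Sketch only: the table and column names are assumptions about the
# public GitHub dataset, not details confirmed by the article.
LICENSES_TABLE = "bigquery-public-data.github_repos.licenses"


def build_license_query(limit: int = 10) -> str:
    """Build a SQL query listing the most common licences."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT license, COUNT(*) AS repo_count\n"
        f"FROM `{LICENSES_TABLE}`\n"
        "GROUP BY license\n"
        "ORDER BY repo_count DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_license_query(limit: int = 10):
    """Run the query and return (license, repo_count) pairs."""
    # Imported here so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default project and credentials
    rows = client.query(build_license_query(limit)).result()
    return [(row["license"], row["repo_count"]) for row in rows]


if __name__ == "__main__":
    # Print the SQL only; call run_license_query() to execute it.
    print(build_license_query(5))
```

Printing the SQL first makes it easy to paste into the BigQuery web console and check how much data the query will process before it counts against the free monthly allowance.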
To read more on the announcement, head over to the Google Cloud Platform blog.