Google’s Dataset Search Is Out of Beta With Nearly 25 Million Public Datasets
Google’s Dataset Search, a feature announced in September 2018, is now out of beta. It lets researchers search nearly 25 million publicly available datasets and narrow the results with a range of filters.
Dataset Search is a Google initiative through which the company wants to “enable easy access” to thousands of online data repositories housed on publisher sites, digital libraries, and even personal pages.
Dataset Search is a specialized search engine tailored for combing through millions of datasets. Underpinning it is a format that providers can use to mark up their information so that it can be queried more easily.
According to the announcement, Dataset Search aims to create a data-sharing ecosystem that encourages data publishers to follow best practices for data storage and publication, and that gives scientists a way to show the impact of their work through citations of the datasets they have produced.
During the nearly year and a half the tool spent in beta, Google says it worked hard and collected plenty of feedback from early adopters, which helped shape some of the new features in Dataset Search.
Dataset Search now offers several new features: it is available on mobile, and you can filter search results by the type of data you want to work with. Results can be filtered further by criteria such as images, text, tables, and price.
These features are housed in a toolbar underneath the search field, with results appearing in a sidebar.
For example, researchers can search datasets, which range from pretty small ones that tell you how many cats there were in the Netherlands from 2010 to 2018 to large annotated audio and image sets, to check their hypotheses or train and test their machine learning models. The tool currently indexes 25 million datasets.
Geographic information can now be charted on a map, and the company notes a significant improvement in the quality of the descriptions of datasets and of the organizations that publish them.
Anybody who publishes data can make their datasets discoverable in Dataset Search by using an open standard (schema.org) to describe the properties of their dataset on their own web page.
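As a rough illustration of what such a description can look like, the sketch below builds a schema.org `Dataset` record as JSON-LD and wraps it in a script tag for embedding in a web page. The function names, the example dataset, and the example.org URLs are hypothetical, and the choice of properties (`name`, `description`, `url`, `license`, `keywords`) is an assumption about a reasonable minimum, not a statement of what Google requires.

```python
import json


def build_dataset_jsonld(name, description, url, license_url=None, keywords=None):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper: the property selection here is illustrative only.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Optional properties are added only when the publisher supplies them.
    if license_url:
        record["license"] = license_url
    if keywords:
        record["keywords"] = list(keywords)
    return json.dumps(record, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap a JSON-LD string in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for illustration.
    markup = build_dataset_jsonld(
        name="Cats in the Netherlands, 2010-2018",
        description="Hypothetical yearly counts of cats in the Netherlands.",
        url="https://example.org/datasets/cats-nl",
        license_url="https://creativecommons.org/publicdomain/zero/1.0/",
        keywords=["cats", "Netherlands"],
    )
    print(wrap_in_script_tag(markup))
```

The printed snippet would be pasted into the HTML of the page that hosts the dataset.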
Dataset Search also gives us a snapshot of the data out there on the Web.
Natasha Noy, a research scientist at Google AI who helped create Google’s Dataset Search, said that most data repositories have been very responsive and that their data comes in great quality. She added that the engine’s launch meant older scientific institutions are now taking the publishing of metadata more seriously.
Noy highlighted that it is possible and encouraged for those holding on to a particular dataset to make the information discoverable through Google’s tool by using an open standard, called schema.org, to describe the properties of their dataset on their web page.
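A publisher who wants to check that a page actually carries this kind of markup could do something like the following. It is a minimal sketch using only the Python standard library: it collects JSON-LD script blocks from an HTML string and returns those typed as `Dataset`. The class and function names are hypothetical, and the check is deliberately simple (it ignores `@graph` containers and list-valued `@type`), so it says nothing about how Google itself processes pages.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def find_datasets(html):
    """Return the JSON-LD objects in `html` whose @type is exactly "Dataset"."""
    collector = JsonLdCollector()
    collector.feed(html)
    datasets = []
    for raw in collector.blocks:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        items = parsed if isinstance(parsed, list) else [parsed]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Dataset":
                datasets.append(item)
    return datasets
```

Calling `find_datasets(page_html)` on a page's HTML returns an empty list when no dataset description is present, which is a quick sign that the markup is missing or malformed.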
Google says the corpus covered by the search engine, almost 25 million datasets, is only a fraction of the datasets on the web, but a significant one all the same. The largest topics indexed are geosciences, biology, and agriculture, and the most common queries include education, weather, cancer, crime, soccer, and dogs.
The United States leads in the number of open government datasets available, with more than 2 million. And the most popular data format? Tables: you can find more than 6 million of them on Dataset Search.