Ilya Zaslavsky and three co-authors

The EarthCube Data Discovery Studio (DDStudio) integrates several technical components into an end-to-end data discovery and exploration system. Beyond supporting dataset search across multiple data sources, it lets geoscientists do several other things. They can explore the data using Jupyter notebooks, organize discovered datasets into thematic collections that can be shared with other users, and edit metadata records or contribute metadata describing additional datasets. They can also examine provenance and validate automated metadata enhancements. DDStudio provides access to 1.67 million metadata records from more than 40 geoscience repositories. These records are automatically enhanced and exposed via standard interfaces in both ISO 19115 and schema.org markup, and commercial search engines (Google, Bing) can use the schema.org markup to index DDStudio content. For geoscience end users, DDStudio provides a custom Geoportal-based user interface that enables spatio-temporal, faceted, and full-text search and gives access to the additional functions listed above.

Key project accomplishments over the last year include:

- User interface improvements, based on design advice from a Science Gateways Community Institute (SGCI) usability team. The team conducted user interviews, performed usability testing, and analyzed a dozen other search portals to identify the most useful features. This work resulted in a streamlined user interface, particularly in the presentation of search results and the management of thematic collections.
- A significant increase in usage, resulting from the earlier effort to publish DDStudio content using schema.org markup. With over 900K records indexed by Google, nearly half of the roughly 1,000 unique users per month now reach DDStudio through Google referrals.
- The ability to harvest and process JSON-LD metadata, which makes it possible to integrate EarthCube GeoCodes content into DDStudio and to work with this content through DDStudio's user interface.
- Work in new application domains, including collaboration with the library community and interoperation with DataMed, a similar system that indexes 2.3 million biomedical datasets.
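Two of the accomplishments above concern schema.org markup: publishing records as JSON-LD so that search engines can index them, and harvesting JSON-LD from other sources such as GeoCodes. The sketch below illustrates both directions. It is not DDStudio's actual pipeline, which the abstract does not describe. It maps a simplified metadata record to a schema.org `Dataset`, embeds it in a `<script type="application/ld+json">` element, and harvests it back with Python's standard `html.parser`. The input field names (`title`, `abstract`, `keywords`, `bbox`, `time`) and the helper functions are hypothetical, chosen only for illustration. The spatial box follows the schema.org `GeoShape` convention of "south west north east" coordinates.

```python
import json
from html.parser import HTMLParser


def to_schema_org(record):
    """Map a simplified metadata record (hypothetical field names) to a schema.org Dataset."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["abstract"],
        "keywords": record.get("keywords", []),
    }
    if "bbox" in record:
        # Input bbox is (west, south, east, north); a GeoShape box lists the
        # lower corner then the upper corner as "lat lon lat lon".
        west, south, east, north = record["bbox"]
        doc["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        }
    if "time" in record:
        # ISO 8601 interval "start/end".
        start, end = record["time"]
        doc["temporalCoverage"] = f"{start}/{end}"
    return doc


def embed(doc):
    """Wrap a JSON-LD document in a minimal HTML landing page."""
    # Escaping "</" as "<\/" (valid JSON) keeps a stray "</script>" in the
    # text from closing the element early.
    payload = json.dumps(doc).replace("</", "<\\/")
    return ('<html><head><script type="application/ld+json">'
            + payload + "</script></head><body></body></html>")


class JsonLdExtractor(HTMLParser):
    """Collect schema.org Dataset objects from JSON-LD script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            doc = json.loads("".join(self._buf))
            # A page may carry one object or a list of objects, and "@type"
            # may itself be a string or a list of strings.
            for item in doc if isinstance(doc, list) else [doc]:
                types = item.get("@type")
                if "Dataset" in (types if isinstance(types, list) else [types]):
                    self.records.append(item)


def harvest(html):
    """Return every schema.org Dataset found in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    record = {
        "title": "Example sediment cores",
        "abstract": "Hypothetical record for illustration.",
        "keywords": ["sediment", "cores"],
        "bbox": (-125.0, 32.0, -114.0, 42.0),
        "time": ("2001-01-01", "2005-12-31"),
    }
    found = harvest(embed(to_schema_org(record)))
    print(found[0]["name"])
    print(found[0]["spatialCoverage"]["geo"]["box"])
```

Because the harvester only looks for `application/ld+json` script elements and the `Dataset` type, it works equally on pages produced by the `embed` helper above and on third-party landing pages that follow the same convention. A production harvester would also need to handle `@graph` containers, invalid JSON, and network fetching, which this sketch omits.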