
Data Repositories: Intro to Data Repositories

What is a data repository?

Data repositories are specialized digital platforms designed to store, preserve, organize, and share research data. They serve as centralized locations where researchers can deposit their datasets, making them discoverable and accessible to the wider scientific community. These repositories employ standardized metadata schemas, persistent identifiers, and preservation protocols to ensure that data remains findable and usable over time.

In the medical and health sciences, data repositories play a crucial role in advancing knowledge by enabling:

  • Secondary analysis of existing datasets
  • Verification and reproduction of research findings
  • Combination of multiple datasets for meta-analyses
  • Reduction of duplicate data collection efforts
  • Preservation of valuable data for historical reference

FAIR Data Principles

The FAIR principles, introduced in 2016, provide a framework for optimizing the reuse of research data. These principles have been widely adopted by funding agencies, publishers, and research institutions:

  • Findable: Data should be easy to locate through robust metadata, clear identification, and indexing in searchable resources.
  • Accessible: Once found, data should be retrievable using standardized protocols, with clear conditions for access.
  • Interoperable: Data should use a formal, accessible, shared vocabulary and include qualified references to other data.
  • Reusable: Data should be richly described with accurate attributes, clear usage licenses, and detailed provenance information.
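As a rough illustration of how the four principles above translate into a dataset's metadata, the following minimal Python sketch checks a record for fields associated with each principle. The field names, the mapping to principles, and the example DOI are all assumptions made for illustration; they do not represent an official FAIR checklist or any particular repository's schema.

```python
# Hypothetical mapping of metadata fields to FAIR principles.
# These field names are illustrative only, not a standard schema.
REQUIRED_FIELDS = {
    "identifier": "Findable",          # persistent identifier, such as a DOI
    "title": "Findable",
    "access_protocol": "Accessible",   # how the data is retrieved
    "access_conditions": "Accessible", # who may access it and under what terms
    "vocabulary": "Interoperable",     # shared vocabulary used to describe the data
    "license": "Reusable",
    "provenance": "Reusable",          # how and by whom the data was produced
}


def missing_fair_fields(record):
    """Map each FAIR principle to the fields that are absent or empty."""
    gaps = {}
    for field, principle in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or str(value).strip() == "":
            gaps.setdefault(principle, []).append(field)
    return gaps


# Made-up example record; the DOI below is not a real identifier.
example_record = {
    "identifier": "doi:10.1234/example",
    "title": "Example cohort dataset",
    "access_protocol": "https",
    "access_conditions": "",
    "vocabulary": "MeSH",
    "license": "CC-BY-4.0",
}

print(missing_fair_fields(example_record))
# → {'Accessible': ['access_conditions'], 'Reusable': ['provenance']}
```

In practice, most repositories collect this kind of information through their deposit forms, so a check like this is mainly useful as a pre-deposit reminder of what to document.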

 

Types of Data Repositories in Medical Research

Medical researchers have several options for data sharing, each serving different needs:

  • Domain-specific repositories: Specialized for particular data types or research areas (e.g., GenBank for genetic sequences, ClinicalTrials.gov for clinical trial registrations and summary results)
  • Institutional repositories: Hosted by universities or research centers to preserve their research outputs
  • Generalist repositories: Accept data from various disciplines (e.g., Zenodo, Figshare, Harvard Dataverse)
  • Clinical data repositories: Specialized for sharing patient-level data with appropriate privacy protections (e.g., Vivli)

Why Data Sharing Matters in Medical Research

Data sharing has become increasingly important in medical research for several reasons:

Benefits to Science and Society
  • Accelerates discovery by allowing researchers to build upon existing work
  • Increases research transparency and accountability
  • Reduces research waste and unnecessary duplication
  • Enables new insights through data reanalysis and combination
  • Provides training resources for students and early-career researchers
Benefits to Individual Researchers
  • Increases citations and research impact
  • Creates opportunities for collaboration
  • Fulfills funder and journal requirements
  • Provides long-term preservation of research outputs
  • Demonstrates research integrity and transparency

Current Data Sharing Requirements

Many stakeholders in the research ecosystem now require or strongly encourage data sharing:
Funding Agencies
  • NIH: Effective January 25, 2023, the NIH Data Management and Sharing (DMS) Policy applies to all grant applications and renewals for research that generates scientific data, not only those that met the $500,000 threshold of the previous data sharing policy. This significantly expands data sharing requirements: researchers must now submit a Data Management and Sharing Plan with their applications.
  • NSF: Has required Data Management Plans (DMPs) for all grant proposals since January 18, 2011. Every NSF proposal must include a supplementary document of no more than two pages labeled "Data Management Plan"; proposals lacking this document cannot be submitted.
  • Many private foundations (e.g., Gates Foundation, Wellcome Trust) also have stringent data sharing requirements
Journals
  • Leading medical journals increasingly require data availability statements
  • Some journals require deposit of data in trusted repositories before publication
  • The International Committee of Medical Journal Editors (ICMJE) requires a data sharing statement in manuscripts that report clinical trial results
Institutions
  • Many universities have adopted open data policies
  • Institutional review boards increasingly consider data sharing plans during ethics reviews

Data Sharing Challenges in Medical Research

Despite its benefits, data sharing in medicine faces unique challenges:
  • Privacy concerns related to patient data
  • Intellectual property and commercialization considerations
  • Resource constraints for data preparation and documentation
  • Concerns about misinterpretation of complex datasets
  • Need for specialized knowledge to properly reuse certain data types

Contact the Library

Need help finding or using Downstate Library resources?
We're here to help!

Email us: reference@downstate.edu or use our online form.

Librarians are available Mon-Fri from 9am to 5pm.