
Data Repositories: Choosing a data repository

Understanding your repository needs

Selecting the right repository for your data is a critical decision that impacts data discoverability, accessibility, and longevity. Before choosing a repository, consider:

Key Questions to Ask When Selecting a Repository
  • Data type and format: What kind of data do you have (clinical, genomic, imaging, survey, etc.)? Are there specialized repositories for this data type?
  • Size and complexity: How large are your datasets? Do they include multiple file types or complex relationships?
  • Discipline focus: Is your research within a specific medical specialty that has dedicated repositories?
  • Audience: Who needs to access this data? Fellow specialists, interdisciplinary researchers, or the general public?
  • Funder/journal requirements: Does your funder or target journal require or recommend specific repositories?
  • Access controls: Do you need to restrict access to sensitive data or implement a data use agreement?
  • Long-term preservation: How long does the data need to remain available?
  • Costs: Are there costs associated with depositing data, and if so, how will these be covered?

Repository Types

Domain-Specific Repositories for Medical Research

Domain-specific repositories are designed for particular types of data and often offer specialized features, controlled vocabularies, and community standards. Consider these when your data fits within their scope:

Data Type | Example Repositories | Special Features
Genomic/Molecular | dbGaP, GenBank, GEO, PDB | Standardized formats, integration with analysis tools
Imaging | The Cancer Imaging Archive (TCIA), NeuroVault | Visualization tools, specialized metadata
Clinical Trials | ClinicalTrials.gov, Vivli | Trial registration and summary results (ClinicalTrials.gov); patient-level data sharing with managed access controls (Vivli)
Public Health | CDC WONDER, HealthData.gov | Population-level statistics, geographic mapping
Biospecimens | BBMRI-ERIC | Sample tracking, connection to physical specimens

Generalist Repositories

When no specialized repository exists for your data type, or when your data span several types, generalist repositories offer flexible options:

Repository | Storage Limits | Costs | Special Features | Best For
Zenodo | 50 GB per dataset | Free | GitHub integration, concept DOIs | Mixed file types, code + data
Figshare | 20 GB (free accounts) | Free (basic); Figshare+ for larger datasets | Private link sharing, metrics | Publishable figures and datasets
Harvard Dataverse | 2.5 GB per file, 1 TB total | Free | Version comparison | Complex datasets with multiple versions
Dryad | 300 GB per dataset | $150 one-time fee (unless covered by membership) | Curation service | Datasets related to publications
OSF | 50 GB for public projects | Free | Project management features | Entire research projects with multiple components


Evaluation Criteria

Factors to consider when choosing a data repository:

Technical Considerations
  • Accepts your file formats and sizes
  • Provides sufficient storage space
  • Offers necessary access controls for sensitive data
  • Has version control capabilities if needed
  • Provides data preview/visualization features if applicable
Trust and Sustainability
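  • Holds a certification such as CoreTrustSeal and follows the TRUST principles (Transparency, Responsibility, User focus, Sustainability, Technology)
  • Publishes clear preservation policies and succession plans
  • Has a stable funding model and institutional backing
  • Has a transparent governance structure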
Discoverability and Access
  • Assigns persistent identifiers such as DOIs
  • Uses a rich metadata schema to describe your data
  • Is indexed by major search engines and data catalogs
  • Supports appropriate licensing options
  • Provides usage statistics and metrics
Compliance and Integration
  • Meets funder and journal requirements
  • Supports relevant data standards
  • Allows linking to related publications and other research outputs
  • Provides machine-readable metadata and APIs

No viable repository?

If you cannot find a suitable repository for your data:

  1. Consult with our library team for personalized guidance
  2. Consider a generalist repository with custom metadata
  3. Explore emerging repositories in your research area
  4. Investigate discipline-specific options by consulting with colleagues
  5. Document your repository selection process for transparency

Remember that repository selection should ideally be part of your data management planning at the beginning of your research project, not an afterthought when publication is imminent.

Contact the Library

Need help finding or using Downstate Library resources? We're here to help!

Email us: reference@downstate.edu or use our online form.