Data Repositories: Data Discovery

Finding Research Data

Discovering existing datasets is a critical skill for both students and researchers. This section outlines the primary tools and strategies for locating relevant data for analysis or reuse.

Major Data Discovery Tools

General Data Search Engines
  • Google Dataset Search: A specialized search engine for datasets. Provides access to millions of datasets across thousands of repositories with filtering options by format, license, and topic.
  • DataCite Commons: Allows search across more than 27 million datasets with DOIs, providing detailed metadata and citation information.
  • Dimensions.ai: Integrated search platform that connects datasets with publications, grants, and clinical trials.
Health Sciences Data Catalogs
Repository Directories
  • re3data.org: Registry of Research Data Repositories, listing over 2,000 data repositories
  • FAIRsharing.org: Curated registry of data repositories, standards, and policies with domain filtering options
  • Open Access Directory: Lists disciplinary repositories including many in health sciences

Domain-Specific Dataset Sources

Clinical Research Data
Genomic & Molecular Data
Public Health & Epidemiology Data
  • CDC WONDER: Access to public health datasets from CDC
  • Global Health Data Exchange (GHDx): Catalog of health and demographic datasets
  • IHME Data: Institute for Health Metrics and Evaluation datasets on global health metrics
  • Data.CMS.gov: Official repository for Medicare and Medicaid data providing healthcare utilization, payment, quality metrics, and provider information for researchers and the public.

New York State and City Health Data 

Imaging Data

Strategies for Effective Data Discovery

Boolean Search Techniques
  • Combine search terms using operators (AND, OR, NOT)
  • Use quotation marks for exact phrases
  • Employ field-specific searches when available
  • Start broad, then refine with filters
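The operators above can be combined programmatically when the same search is run in several tools. The following is a minimal sketch, not a definitive implementation: the function names and example terms are hypothetical, and operator syntax varies between search tools, so check each tool's help page before relying on the output.

```python
def quote_phrase(term):
    """Wrap multi-word terms in quotation marks so they match as exact phrases."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_query(all_of=(), any_of=(), none_of=()):
    """Combine terms with AND, OR, and NOT (hypothetical helper).

    all_of  -- terms that must all appear (joined with AND)
    any_of  -- alternatives, at least one must appear (joined with OR)
    none_of -- terms to exclude (each prefixed with NOT)
    """
    parts = []
    if all_of:
        parts.append(" AND ".join(quote_phrase(t) for t in all_of))
    if any_of:
        # Parentheses keep the OR group together when combined with AND.
        parts.append("(" + " OR ".join(quote_phrase(t) for t in any_of) + ")")
    query = " AND ".join(parts)
    for term in none_of:
        query += f" NOT {quote_phrase(term)}"
    return query.strip()


# Start broad, then refine by adding terms.
broad = build_query(all_of=["diabetes"])
refined = build_query(
    all_of=["diabetes"],
    any_of=["New York", "Brooklyn"],
    none_of=["pediatric"],
)
print(broad)
print(refined)
```

Building the query from lists makes it easy to start with one term and add alternatives or exclusions as results come back too broad.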
From Literature to Data
  • Follow data availability statements in publications
  • Check supplementary materials sections
  • Use article DOIs to find linked datasets
  • Contact authors directly when the data is not readily available
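As one illustration of following a data availability statement, the sketch below pulls DOIs out of a block of text and turns each into a doi.org resolver link. It is an assumption-laden example: the function name is hypothetical, the DOIs in the sample statement are made up, and the pattern is a common approximation of modern DOI syntax that may miss older or unusual identifiers.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_doi_links(text):
    """Return de-duplicated https://doi.org/ links for DOIs found in text."""
    links = []
    for match in DOI_PATTERN.findall(text):
        # Drop sentence punctuation that the pattern picked up at the end.
        # Caveat: this also strips a closing parenthesis that belongs to the DOI.
        doi = match.rstrip(".,;:)")
        url = "https://doi.org/" + doi
        if url not in links:
            links.append(url)
    return links


# Made-up data availability statement for illustration only.
statement = (
    "Data are available at https://doi.org/10.1234/example.dataset.v1. "
    "Analysis code: doi:10.1234/example.code."
)
for link in extract_doi_links(statement):
    print(link)
```

Each resulting link can be opened in a browser; a DOI that belongs to a dataset will resolve to its repository landing page.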

Evaluating Dataset Quality and Relevance

Essential Assessment Criteria
  • Provenance: Who created the data and for what purpose?
  • Collection methodology: Were appropriate methods used?
  • Temporal relevance: When was the data collected and is it current enough?
  • Completeness: Are there significant gaps or missing variables?
  • Documentation quality: Are methods, variables, and limitations clearly described?
  • Ethical considerations: Was the data ethically collected with appropriate permissions?
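The completeness criterion can be checked quickly before committing to a dataset. The sketch below, using only the Python standard library, reports the share of missing values in each column of a CSV file. The function name, the sample data, and the list of tokens treated as missing are all assumptions; the dataset's codebook should say how missing values are actually coded.

```python
import csv
import io


def missing_share(csv_text, missing_tokens=("", "NA", "N/A", "null")):
    """Return {column name: fraction of rows with a missing value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            value = row.get(name)
            # Short rows yield None; otherwise compare against known missing codes.
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: counts[name] / total for name in counts}


# Tiny made-up sample for illustration only.
sample = "id,age,zip\n1,34,11203\n2,,11226\n3,NA,\n4,51,11203\n"
for column, share in missing_share(sample).items():
    print(f"{column}: {share:.0%} missing")
```

A column with a high share of missing values is not automatically disqualifying, but it is a prompt to read the documentation for an explanation.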
Documentation and Metadata Review
  • Examine codebooks and data dictionaries
  • Review variables for consistency and clear definitions
  • Check for standardized terminology and classification systems
  • Look for protocol documents describing data collection methods
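Part of this review can be automated by comparing the dataset's column names against its data dictionary. The following is a rough sketch with hypothetical names and made-up sample values; it assumes the codebook has already been loaded into a dictionary that maps each variable name to its definition.

```python
def compare_to_codebook(columns, codebook):
    """Compare dataset columns with a codebook (variable name -> definition)."""
    return {
        # Columns present in the data but absent from the codebook.
        "undocumented": [c for c in columns if c not in codebook],
        # Codebook entries with no matching column in the data.
        "not_in_data": [name for name in codebook if name not in columns],
        # Columns listed in the codebook but with an empty definition.
        "blank_definitions": [
            c for c in columns if c in codebook and not codebook[c].strip()
        ],
    }


# Made-up example values.
columns = ["id", "age", "zip"]
codebook = {"id": "Participant identifier", "age": "", "sex": "Self-reported sex"}
report = compare_to_codebook(columns, codebook)
for issue, names in report.items():
    print(issue, names)
```

Any name that shows up in the report is a question to resolve with the documentation or the data provider before analysis.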
Example Data Quality Checklist

Use this simple checklist to evaluate potential datasets:

  1. Is the dataset from a reputable source?
  2. Is there complete documentation including methods and limitations?
  3. Are there clear definitions for all variables?
  4. Is the sample size adequate for your analysis?
  5. Is the timeframe appropriate for your research question?
  6. Are there any concerning gaps or anomalies in the data?
  7. Is the format compatible with your analysis tools?
  8. Are the license terms suitable for your intended use?
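For anyone screening many candidate datasets, the checklist can be recorded in a small script. This is one possible sketch with hypothetical names. It pairs each question with the answer that indicates no concern, since for question 6 a "yes" is the warning sign while for the others a "no" is.

```python
# Each entry: (question, answer that indicates no concern).
CHECKLIST = [
    ("Is the dataset from a reputable source?", True),
    ("Is there complete documentation including methods and limitations?", True),
    ("Are there clear definitions for all variables?", True),
    ("Is the sample size adequate for your analysis?", True),
    ("Is the timeframe appropriate for your research question?", True),
    ("Are there any concerning gaps or anomalies in the data?", False),
    ("Is the format compatible with your analysis tools?", True),
    ("Are the license terms suitable for your intended use?", True),
]


def items_needing_attention(answers):
    """Return the questions whose yes/no answer signals a concern.

    answers -- list of booleans (True = yes), in checklist order.
    """
    if len(answers) != len(CHECKLIST):
        raise ValueError(f"expected {len(CHECKLIST)} answers, got {len(answers)}")
    return [
        question
        for (question, desired), answer in zip(CHECKLIST, answers)
        if answer != desired
    ]


# Made-up review: gaps were found (item 6) and the license is unsuitable (item 8).
answers = [True, True, True, True, True, True, True, False]
for question in items_needing_attention(answers):
    print("Follow up:", question)
```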

Properly Citing Datasets

Dataset Citation Anatomy
  • Creator(s)
  • Publication year
  • Title of dataset
  • Publisher/repository
  • Version
  • Persistent identifier (preferably DOI)
  • Access date
Citation Styles for Datasets

Templates in two standard formats:

APA Style (7th edition)

Author, A. A., & Author, B. B. (Year). Title of data set (Version number) [Data set]. Publisher/Repository. DOI or URL
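The APA template above can be filled in from the citation elements with a short helper. This is a minimal sketch: the function names and the sample record are hypothetical, authors are expected to be pre-formatted as "Surname, I. I.", and APA's special rules for very long author lists are not handled.

```python
def format_authors_apa(authors):
    """Join pre-formatted author names, with '&' before the last one."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def apa_dataset_citation(authors, year, title, repository, identifier, version=None):
    """Build a citation following the APA template shown above."""
    version_part = f" (Version {version})" if version else ""
    return (
        f"{format_authors_apa(authors)} ({year}). "
        f"{title}{version_part} [Data set]. {repository}. {identifier}"
    )


# Made-up record for illustration only.
print(
    apa_dataset_citation(
        authors=["Doe, J. A.", "Roe, R. B."],
        year=2023,
        title="Example clinic visits",
        repository="Example Repository",
        identifier="https://doi.org/10.1234/example",
        version="2.0",
    )
)
```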

NLM/ICMJE Style

Author AA, Author BB. Title of dataset [dataset]. Version XX. Repository Name. Year. DOI
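The NLM/ICMJE template can be filled in the same way. The sketch below follows the element order of the template above and nothing more; the function name and sample record are hypothetical, and authors are expected to be pre-formatted as "Surname II".

```python
def nlm_dataset_citation(authors, title, version, repository, year, doi):
    """Build a citation following the NLM/ICMJE template shown above."""
    if not authors:
        raise ValueError("at least one author is required")
    author_part = ", ".join(authors)
    return (
        f"{author_part}. {title} [dataset]. Version {version}. "
        f"{repository}. {year}. {doi}"
    )


# Made-up record for illustration only.
print(
    nlm_dataset_citation(
        authors=["Doe JA", "Roe RB"],
        title="Example clinic visits",
        version="2.0",
        repository="Example Repository",
        year=2023,
        doi="https://doi.org/10.1234/example",
    )
)
```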

Contact the Library

Need help finding or using Downstate Library resources? We're here to help!

Email us: reference@downstate.edu or use our online form.