CORR-Variables

Streamlining clinical research with real-world data

🚀 What is CORR-Vars?

CORR-Variables is a Python package for extracting and analyzing data from the Charité Outcomes Research Repository (CORR).

It functions as a high-level connector on top of the Hadoop-based Health Data Lake (HDL), preprocessing raw clinical data into clinically meaningful, quality-checked variables to streamline research with real-world data.

🏥 Clinical Focus

Pre-defined clinical variables validated by medical experts

⚡ High Performance

Built on Polars for fast processing of large datasets

🔗 Easy Integration

Simple API that works with existing analysis workflows

Quick Start

The CORR-Vars package is pre-installed and regularly updated on the IMI server:

# Connect to IMI server: s-c01-imi-app01.charite.de
# Activate the CORR-Vars environment
conda activate /data02/projects/icurepo/.pkg/env10

🔐 Access Required

If the conda environment doesn’t work, ask Patrick Heeren to add you to the miniconda-users group.

For local development (requires GitHub access):

pip install git+https://github.com/cub-corr/corr-vars.git

⚠️ Access Required

This only works if you have access to the private GitHub repository.
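
Before running the examples, it can help to confirm that the active environment actually exposes the package. The following is a minimal sketch using only the Python standard library; the helper name `is_importable` is hypothetical and not part of CORR-Vars.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in the active environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `corr_vars` is the import name used in the examples on this page.
    if is_importable("corr_vars"):
        print("corr_vars is available")
    else:
        print("corr_vars not found: activate the conda environment or install from GitHub")
```

If the check fails on the IMI server, the conda environment is most likely not activated in the current shell.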

Your First CORR Cohort

Get started in under 5 minutes:

# Import the main class
from corr_vars import Cohort

# Create your first cohort
cohort = Cohort(
    obs_level="icu_stay",
    sources={"cub_hdp": {"database": "db_hypercapnia_prepared", "password_file": True}}
)

# Add clinical variables
cohort.add_variable("age_on_admission")
cohort.add_variable("blood_sodium")

# View your data
print(f"Cohort: {len(cohort.obs)} ICU stays")
print(cohort.obs.head())
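
When a cohort needs several variables, the `add_variable` calls above can be wrapped in a small loop. The sketch below is a convenience helper and not part of the package: `add_variables` and `DemoCohort` are hypothetical names, and the helper relies only on the `add_variable` method shown above. `DemoCohort` is a stand-in so the snippet runs without database access.

```python
from typing import Iterable, List


def add_variables(cohort, names: Iterable[str]) -> List[str]:
    """Call cohort.add_variable once per unique name, preserving order."""
    added: List[str] = []
    for name in names:
        if name in added:
            continue  # skip duplicates so each variable is requested once
        cohort.add_variable(name)
        added.append(name)
    return added


class DemoCohort:
    """Stand-in used only to run this sketch without database access."""

    def __init__(self) -> None:
        self.requested: List[str] = []

    def add_variable(self, name: str) -> None:
        self.requested.append(name)


demo = DemoCohort()
added = add_variables(demo, ["age_on_admission", "blood_sodium", "age_on_admission"])
print(added)
```

With a real `Cohort` object, the same helper would be called as `add_variables(cohort, [...])`.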

🎯 Next Steps

Documentation Structure

📚 Learning Resources
  • Getting Started Tutorial - Your first analysis in 30 minutes

  • Custom Variables Guide - Create your own clinical variables

  • Contributing Guide - Add variables to the community catalog

  • Troubleshooting - Solutions for common issues

🔧 API Documentation
  • Cohort Class - Main interface for building cohorts

  • Variable Types - Native, derived, and aggregation variables

  • Data Sources - CUB-HDP, ReprodicU, and more

  • Legacy Interface - Pandas compatibility layer

📖 Complete Table of Contents

Learning Resources

Core Architecture

🏥 Observation Levels

Choose your analysis unit:

  • ICU Stay - Individual intensive care episodes

  • Hospital Stay - Complete hospitalization periods

  • Procedure - Specific surgical/medical procedures

[Figure: observation levels (cv_obs_levels.png)]

📊 Variable Types

Rich clinical data hierarchy:

  • Native - Direct database extractions

  • Derived - Computed from existing variables

  • Static - Single values per observation

  • Dynamic - Time-series measurements

[Figure: variable type hierarchy (cv_var_hierarchy.png)]
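
To make the static/dynamic distinction concrete, here is a toy illustration in plain Python. It does not reflect how CORR-Vars stores data internally; the record layout, the stay identifiers, the values, and the `first_value` helper are assumptions made for illustration only. A static variable holds one value per observation, while a dynamic variable holds timestamped measurements that have to be reduced (here, to the earliest recorded value) before they fit into one row per observation.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Static: one value per observation (keyed by a made-up ICU stay id).
age_on_admission: Dict[str, int] = {"stay_1": 64, "stay_2": 71}

# Dynamic: a time series of (timestamp, value) pairs per observation.
Measurement = Tuple[datetime, float]
blood_sodium: Dict[str, List[Measurement]] = {
    "stay_1": [
        (datetime(2024, 1, 1, 8, 0), 138.0),
        (datetime(2024, 1, 1, 6, 0), 141.0),
    ],
    "stay_2": [],  # no measurements recorded for this stay
}


def first_value(series: List[Measurement]) -> Optional[float]:
    """Reduce a time series to its earliest value, or None if it is empty."""
    if not series:
        return None
    return min(series, key=lambda m: m[0])[1]


first_sodium = {stay: first_value(s) for stay, s in blood_sodium.items()}
print(first_sodium)  # {'stay_1': 141.0, 'stay_2': None}
```

Reducing a time series in this way is one example of deriving a static value from a dynamic variable; other reductions (minimum, maximum, mean) follow the same pattern.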

🔍 Explore Available Variables

Browse our 300+ pre-defined clinical variables in the interactive Variable Explorer.

Real-World Example

Here’s how researchers use CORR-Vars for clinical studies:

from corr_vars import Cohort

# Build an ICU sepsis cohort
cohort = Cohort(obs_level="icu_stay", sources={"cub_hdp": {"database": "db_hypercapnia_prepared"}})

# Add a static variable
cohort.add_variable("sofa_score_imputed")

# Add time-series biomarkers
cohort.add_variable("blood_lactate")
cohort.add_variable("blood_creatinine")

# Apply inclusion criteria
cohort.include_list([
    {"variable": "age_on_admission", "operation": ">= 18", "label": "Adults"},
    {"variable": "sofa_score_imputed", "operation": ">= 2", "label": "Organ dysfunction"}
])

# Generate publication-ready summary
table1 = cohort.tableone(groupby="inhospital_death")
print(f"Study cohort: {len(cohort.obs)} ICU stays")
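
The inclusion list above pairs a variable name with an operation string such as `>= 18`. As a rough illustration of what such sequential filtering amounts to, the sketch below applies criteria of the same shape to plain dictionaries and records how many rows survive each step. It is not the package's implementation: `parse_operation`, `apply_criteria`, the operator parsing, and the sample rows are all assumptions.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Longer symbols are listed first so ">=" is matched before ">".
_OPS: List[Tuple[str, Callable[[Any, Any], bool]]] = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_operation(text: str) -> Tuple[Callable[[Any, Any], bool], float]:
    """Split a string like '>= 18' into a comparison function and a number."""
    text = text.strip()
    for symbol, func in _OPS:
        if text.startswith(symbol):
            return func, float(text[len(symbol):])
    raise ValueError(f"Unsupported operation: {text!r}")


def apply_criteria(rows: List[Dict[str, Any]], criteria: List[Dict[str, str]]):
    """Apply criteria in order; return surviving rows and a per-step count."""
    flow = [("Start", len(rows))]
    for crit in criteria:
        func, threshold = parse_operation(crit["operation"])
        # Rows with a missing value for the variable are excluded in this sketch.
        rows = [
            r for r in rows
            if r.get(crit["variable"]) is not None
            and func(r[crit["variable"]], threshold)
        ]
        flow.append((crit["label"], len(rows)))
    return rows, flow


rows = [
    {"age_on_admission": 17, "sofa_score_imputed": 5},
    {"age_on_admission": 45, "sofa_score_imputed": 1},
    {"age_on_admission": 70, "sofa_score_imputed": 6},
]
criteria = [
    {"variable": "age_on_admission", "operation": ">= 18", "label": "Adults"},
    {"variable": "sofa_score_imputed", "operation": ">= 2", "label": "Organ dysfunction"},
]
kept, flow = apply_criteria(rows, criteria)
print(flow)  # [('Start', 3), ('Adults', 2), ('Organ dysfunction', 1)]
```

Keeping the per-step counts makes it straightforward to report a flow diagram of exclusions alongside the final cohort size.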

📈 Publication Ready

CORR-Vars concepts are quality-checked by attending physicians at Charité Berlin before being used for:

  • Critical care outcomes research

  • Machine learning model development

  • Health services research

  • Quality improvement studies

Community & Support

🐛 Found a Bug?

Report issues or request features on GitHub

https://github.com/cub-corr/corr-vars/issues
💬 Need Help?

Check our Troubleshooting Guide or contact the team

🀝 Want to Contribute?

Add new variables to help the research community

Contributing New Variables

🏥 Developed at Charité Berlin

Advancing clinical research through innovative data science tools

Development Team:

…and the entire CORR team πŸ™

πŸ“Š Project Stats

  • πŸ₯ Version: 0.5.0

  • πŸ“… Active Development: Since September 2024

  • πŸ“ˆ Publications: 10+ active projects pending publication

  • πŸ‘₯ Users: Research teams across CharitΓ© departments with a focus on critical care outcomes research

  • πŸ”— GitHub: CORR-Vars Repository