CORR-Variables

Streamlining clinical research with real-world data

🚀 What is CORR-Vars?

CORR-Variables is a Python package for extracting and analyzing data from the Charité Outcomes Research Repository (CORR).

It functions as a high-level connector on top of the Hadoop-based Health Data Lake (HDL), preprocessing raw clinical data into clinically meaningful, quality-checked variables to streamline research with real-world data.

🏥 Clinical Focus

Pre-defined clinical variables validated by medical experts

⚡ High Performance

Built on Polars for fast processing of large datasets

🔗 Easy Integration

Simple API that works with existing analysis workflows

Quick Start

🖥️ IMI Server (Recommended)

The CORR-Vars package is pre-installed and regularly updated on the IMI server:

# Connect to IMI server: s-c01-imi-app01.charite.de
# Activate the CORR-Vars environment
conda activate /data02/projects/icurepo/.pkg/env10

🔐 Access Required

If the conda environment doesn’t work, ask Patrick Heeren to add you to the miniconda-users group.

💻 Local Installation

For local development (requires GitHub access):

pip install git+https://github.com/cub-corr/corr-vars.git

⚠️ Access Required

This only works if you have access to the private GitHub repository.
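
Whichever installation route you use, it can help to confirm that the package is visible to the active Python interpreter before building a cohort. The sketch below uses only the standard library; the helper name `is_installed` is made up for this example and is not part of CORR-Vars.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "corr_vars" is the import name used in the examples on this page.
    if is_installed("corr_vars"):
        print("corr_vars is available in this environment")
    else:
        print("corr_vars not found: activate the conda environment or install the package")
```

If the check fails on the IMI server, the conda environment is most likely not activated in the current shell.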

Your First CORR Cohort

Get started in under 5 minutes:

# Import the main class
from corr_vars import Cohort

# Create your first cohort
cohort = Cohort(
    obs_level="icu_stay",
    sources={"cub_hdp": {"database": "db_hypercapnia_prepared", "password_file": True}}
)

# Add clinical variables
cohort.add_variable("age_on_admission")
cohort.add_variable("blood_sodium")

# View your data
print(f"Cohort: {len(cohort.obs)} ICU stays")
print(cohort.obs.head())

🎯 Next Steps

New to CORR-Vars? → Start with our Tutorials and Getting Started
Need a specific variable? → Browse the Variable Explorer
Want to contribute? → Read our Contributing New Variables guide

Documentation Structure

📚 Learning Resources

Getting Started Tutorial - Your first analysis in 30 minutes
Custom Variables Guide - Create your own clinical variables
Contributing Guide - Add variables to the community catalog
Troubleshooting - Solutions for common issues

🔧 API Documentation

Cohort Class - Main interface for building cohorts
Variable Types - Native, derived, and aggregation variables
Data Sources - CUB-HDP, ReprodicU, and more
Legacy Interface - Pandas compatibility layer

Core Architecture

🏥 Observation Levels

Choose your analysis unit:

ICU Stay - Individual intensive care episodes
Hospital Stay - Complete hospitalization periods
Procedure - Specific surgical/medical procedures
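
The observation level decides what one row of the cohort represents, so the same patients can yield different row counts. The following is a minimal illustration with invented records and field names (`patient_id`, `hospital_stay_id`, `icu_stay_id` are assumptions for this sketch, not CORR-Vars column names); the `patient` level is included only for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IcuStay:
    patient_id: str
    hospital_stay_id: str
    icu_stay_id: str


# Hypothetical data: patient P1 has two ICU stays within one hospital stay.
stays = [
    IcuStay("P1", "H1", "I1"),
    IcuStay("P1", "H1", "I2"),
    IcuStay("P2", "H2", "I3"),
]

# Maps an analysis unit to the field that identifies it in this sketch.
LEVEL_KEYS = {
    "icu_stay": "icu_stay_id",
    "hospital_stay": "hospital_stay_id",
    "patient": "patient_id",
}


def count_units(records, level):
    """Count the distinct analysis units at the given level."""
    key = LEVEL_KEYS[level]
    return len({getattr(record, key) for record in records})


for level in LEVEL_KEYS:
    print(level, count_units(stays, level))
```

Here three ICU stays collapse to two hospital stays and two patients, which is why counts reported at the ICU-stay level should not be read as patient counts.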

📊 Variable Types

Rich clinical data hierarchy:

Native - Direct database extractions
Derived - Computed from existing variables
Static - Single values per observation
Dynamic - Time-series measurements
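
To make the static/dynamic distinction concrete, the sketch below models a static variable as one value per observation and a dynamic variable as a list of timestamped measurements, then reduces the dynamic series to a single value per observation. The data, IDs, and the `aggregate` helper are invented for illustration and do not reflect how CORR-Vars stores or aggregates variables internally.

```python
from datetime import datetime
from statistics import mean

# Static: exactly one value per observation (hypothetical ICU stay IDs).
age_on_admission = {"I1": 64, "I2": 71}

# Dynamic: a time series of (timestamp, value) pairs per observation.
blood_sodium = {
    "I1": [
        (datetime(2024, 1, 1, 8), 138.0),
        (datetime(2024, 1, 1, 20), 141.0),
    ],
    "I2": [(datetime(2024, 1, 2, 9), 133.0)],
}


def aggregate(series, func):
    """Reduce each observation's time series to a single value using func."""
    return {
        obs_id: func([value for _, value in points])
        for obs_id, points in series.items()
    }


# Aggregating a dynamic variable yields a static, one-value-per-observation result.
mean_sodium = aggregate(blood_sodium, mean)
max_sodium = aggregate(blood_sodium, max)
print(mean_sodium)
print(max_sodium)
```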

🔍 Explore Available Variables

Browse our 300+ pre-defined clinical variables in the interactive Variable Explorer

Real-World Example

Here’s how researchers use CORR-Vars for clinical studies:

# Build an ICU sepsis cohort
cohort = Cohort(
    obs_level="icu_stay",
    sources={"cub_hdp": {"database": "db_hypercapnia_prepared"}},
)

# Add a static variable
cohort.add_variable("sofa_score_imputed")

# Add time-series biomarkers
cohort.add_variable("blood_lactate")
cohort.add_variable("blood_creatinine")

# Apply inclusion criteria
cohort.include_list([
    {"variable": "age_on_admission", "operation": ">= 18", "label": "Adults"},
    {"variable": "sofa_score_imputed", "operation": ">= 2", "label": "Organ dysfunction"}
])

# Generate publication-ready summary
table1 = cohort.tableone(groupby="inhospital_death")
print(f"Study cohort: {len(cohort.obs)} ICU stays")
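
The inclusion list above pairs a variable with an operation string such as ">= 18". As a rough mental model only, the sketch below applies such criteria one after another to plain dictionaries and records how many rows survive each step. It is not the CORR-Vars implementation: `parse_operation`, `apply_inclusions`, the sample rows, and the choice to drop rows with missing values are all assumptions of this example.

```python
import operator

# Comparison symbols supported by this sketch.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}


def parse_operation(text):
    """Split a string such as '>= 18' into a comparison function and a number."""
    symbol, value = text.split()
    return _OPS[symbol], float(value)


def apply_inclusions(rows, criteria):
    """Apply criteria in order; return surviving rows and per-step counts."""
    flow = [("Initial", len(rows))]
    for criterion in criteria:
        compare, threshold = parse_operation(criterion["operation"])
        name = criterion["variable"]
        # Assumption: rows with a missing value do not meet the criterion.
        rows = [
            row for row in rows
            if row.get(name) is not None and compare(row[name], threshold)
        ]
        flow.append((criterion["label"], len(rows)))
    return rows, flow


# Hypothetical observations, one dict per ICU stay.
rows = [
    {"age_on_admission": 45, "sofa_score_imputed": 6},
    {"age_on_admission": 16, "sofa_score_imputed": 4},
    {"age_on_admission": 70, "sofa_score_imputed": 1},
]
criteria = [
    {"variable": "age_on_admission", "operation": ">= 18", "label": "Adults"},
    {"variable": "sofa_score_imputed", "operation": ">= 2", "label": "Organ dysfunction"},
]

kept, flow = apply_inclusions(rows, criteria)
for label, count in flow:
    print(f"{label}: {count}")
```

Tracking the count after each step gives the numbers typically reported in a study flow diagram.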

📈 Publication Ready

CORR-Vars concepts are quality-checked by attending physicians at Charité Berlin before being used for:

Critical care outcomes research
Machine learning model development
Health services research
Quality improvement studies

Community & Support

🐛 Found a Bug?

Report issues or request features on GitHub

https://github.com/cub-corr/corr-vars/issues

💬 Need Help?

Check our Troubleshooting Guide or contact the team

🤝 Want to Contribute?

Add new variables to help the research community. See the Contributing New Variables guide.

🏥 Developed at Charité Berlin

Advancing clinical research through innovative data science tools

Development Team:

…and the entire CORR team 🙏

📊 Project Stats

🏥 Version: 0.5.0
📅 Active Development: Since September 2024
📈 Publications: 10+ active projects pending publication
👥 Users: Research teams across Charité departments with a focus on critical care outcomes research
🔗 GitHub: CORR-Vars Repository