
CORR-Variables

Streamlining clinical research with real-world data

🚀 What is CORR-Vars?

CORR-Variables is a Python package for extracting and analyzing data from the Charité Outcomes Research Repository (CORR).

It functions as a high-level connector on top of the Hadoop-based Health Data Lake (HDL), preprocessing raw clinical data into clinically meaningful, quality-checked variables to streamline research with real-world data.
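The following is a minimal, standalone sketch of the kind of cleaning step that turns raw rows into a quality-checked variable: raw source labels are mapped to one variable, values are parsed, and implausible values are discarded. It does not use CORR-Vars. The label names (`LABEL_MAP`), the plausibility bounds (`PLAUSIBLE`), and the `RawMeasurement`/`clean` names are hypothetical assumptions chosen for illustration, not the package's actual rules.

```python
# Hypothetical sketch: raw lab rows -> one quality-checked variable.
# Labels and plausibility bounds are illustrative assumptions, not CORR-Vars rules.
from dataclasses import dataclass


@dataclass
class RawMeasurement:
    icu_stay_id: str
    label: str   # raw source label (hypothetical examples below)
    value: str   # raw values often arrive as text


# Hypothetical mapping from raw source labels to a single clinical variable name
LABEL_MAP = {"Na+": "blood_sodium", "Natrium (BGA)": "blood_sodium"}
# Assumed plausibility bounds in mmol/L (illustrative only)
PLAUSIBLE = {"blood_sodium": (100.0, 200.0)}


def clean(rows):
    """Map labels, parse numbers, and drop unparseable or implausible values."""
    out = []
    for r in rows:
        var = LABEL_MAP.get(r.label)
        if var is None:
            continue  # label does not belong to any known variable
        try:
            v = float(r.value.replace(",", "."))  # accept decimal commas
        except ValueError:
            continue  # non-numeric entry such as "n.a."
        lo, hi = PLAUSIBLE[var]
        if lo <= v <= hi:
            out.append((r.icu_stay_id, var, v))
    return out


raw = [
    RawMeasurement("s1", "Na+", "139"),
    RawMeasurement("s1", "Natrium (BGA)", "141,5"),
    RawMeasurement("s2", "Na+", "n.a."),
    RawMeasurement("s2", "Na+", "1390"),  # outside the assumed bounds
]
print(clean(raw))  # [('s1', 'blood_sodium', 139.0), ('s1', 'blood_sodium', 141.5)]
```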

🏥 Clinical Focus

Pre-defined clinical variables validated by medical experts

⚡ High Performance

Built on Polars for fast processing of large datasets

🔗 Easy Integration

Simple API that works with existing analysis workflows

Quick Start

The CORR-Vars package is pre-installed and regularly updated on the IMI server:

# Connect to IMI server: s-c01-imi-app01.charite.de
# Activate the CORR-Vars environment
conda activate /data02/projects/icurepo/.pkg/env10

🔐 Access Required

If the conda environment doesn't work, ask Patrick Heeren to be added to the miniconda-users group.

For local development (requires GitHub access):

pip install git+https://github.com/cub-corr/corr-vars.git

⚠️ Access Required

This only works if you have access to the private GitHub repository.

Your First CORR Cohort

Get started in under 5 minutes:

# Import the main class
from corr_vars import Cohort

# Create your first cohort
cohort = Cohort(
    obs_level="icu_stay",
    sources={"cub_hdp": {"database": "db_hypercapnia_prepared", "conn_args": {"password_file": True}}}
)

# Add clinical variables
cohort.add_variable("age_on_admission")
cohort.add_variable("blood_sodium")

# View your data (cohort.obs has one row per ICU stay because obs_level="icu_stay")
print(f"Cohort: {len(cohort.obs)} ICU stays")
print(cohort.obs.head())

🎯 Next Steps

Documentation Structure

📚 Learning Resources
  • Getting Started Tutorial - Your first analysis in 30 minutes

  • Custom Variables Guide - Create your own clinical variables

  • Contributing Guide - Add variables to the community catalog

  • Troubleshooting - Solutions for common issues

🔧 API Documentation
  • Cohort Class - Main interface for building cohorts

  • Variable Types - Native, derived, and aggregation variables

  • Data Sources - CUB-HDP, ReprodicU, and more

  • Legacy Interface - Pandas compatibility layer

📖 Complete Table of Contents

Learning Resources

Core Architecture

🏥 Observation Levels

Choose your analysis unit:

  • Patient - One row per unique patient (patient_id)

  • Hospital Stay - Complete hospitalization periods (case_id)

  • ICU Stay - Individual intensive care episodes (icu_stay_id)

  • Procedure - Specific surgical/medical procedures (procedure_id)

Figure: Observation levels
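The observation level decides what one row in `cohort.obs` represents, so the same underlying data can yield different row counts. The standalone sketch below (plain Python, not CORR-Vars) counts made-up records at three levels using the ID columns listed above. The level names in `LEVEL_KEYS` other than `icu_stay` and the helper `n_observations` are hypothetical, and the procedure level is left out for brevity.

```python
# Hypothetical sketch: the same records counted at different observation levels.
# ID column names follow the list above; the records themselves are made up.
records = [
    {"patient_id": "P1", "case_id": "C1", "icu_stay_id": "I1"},
    {"patient_id": "P1", "case_id": "C1", "icu_stay_id": "I2"},  # ICU readmission, same hospital stay
    {"patient_id": "P1", "case_id": "C2", "icu_stay_id": "I3"},  # second hospital stay
    {"patient_id": "P2", "case_id": "C3", "icu_stay_id": "I4"},
]

# Hypothetical level names mapped to the ID column that defines one row
LEVEL_KEYS = {"patient": "patient_id", "hospital_stay": "case_id", "icu_stay": "icu_stay_id"}


def n_observations(rows, level):
    """Number of rows a cohort would have at the given observation level."""
    key = LEVEL_KEYS[level]
    return len({r[key] for r in rows})


for level in LEVEL_KEYS:
    print(level, n_observations(records, level))
# patient 2
# hospital_stay 3
# icu_stay 4
```

This is why a cohort built with `obs_level="icu_stay"` reports ICU stays rather than unique patients.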
📊 Variable Types

Rich clinical data hierarchy:

  • Native - Direct database extractions

  • Derived - Computed from existing variables

  • Static - Single values per observation

  • Dynamic - Time-series measurements

Figure: Variable type hierarchy
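To illustrate how the types relate, the following standalone sketch treats `blood_lactate` as a dynamic variable (timestamped values), reduces it to static per-observation values over an assumed 24-hour window after ICU admission, and computes a derived flag from the result. The values, the window, the 2.0 mmol/L threshold, and names such as `values_within`, `lactate_max_24h` and `hyperlactatemia_24h` are hypothetical; CORR-Vars defines its own aggregation rules.

```python
# Hypothetical sketch: dynamic -> static aggregation, then a derived variable.
from datetime import datetime, timedelta

icu_admission = datetime(2024, 9, 1, 8, 0)

# Dynamic variable: several timestamped measurements for one ICU stay (made-up values, mmol/L)
blood_lactate = [
    (icu_admission + timedelta(hours=1), 3.1),
    (icu_admission + timedelta(hours=6), 2.4),
    (icu_admission + timedelta(hours=30), 1.2),
]


def values_within(series, start, hours):
    """Values measured in [start, start + hours], in chronological order."""
    end = start + timedelta(hours=hours)
    return [v for t, v in sorted(series) if start <= t <= end]


# Static variables: one value per observation, aggregated over the assumed window
window = values_within(blood_lactate, icu_admission, 24)
lactate_first_24h = window[0] if window else None
lactate_max_24h = max(window) if window else None

# Derived variable: computed from an existing variable (illustrative threshold)
hyperlactatemia_24h = lactate_max_24h is not None and lactate_max_24h > 2.0

print(window, lactate_first_24h, lactate_max_24h, hyperlactatemia_24h)
# [3.1, 2.4] 3.1 3.1 True
```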

🔍 Explore Available Variables

Browse our 300+ pre-defined clinical variables in the interactive Variable Explorer

Real-World Example

Here's how researchers use CORR-Vars for clinical studies:

from corr_vars import Cohort

# Build an ICU sepsis cohort
cohort = Cohort(
    obs_level="icu_stay",
    sources={"cub_hdp": {"database": "db_hypercapnia_prepared", "conn_args": {"password_file": True}}},
)

# Add static variables (also used below for inclusion criteria and grouping)
cohort.add_variable("age_on_admission")
cohort.add_variable("sofa_score_imputed")
cohort.add_variable("inhospital_death")

# Add time-series biomarkers
cohort.add_variable("blood_lactate")
cohort.add_variable("blood_creatinine")

# Apply inclusion criteria
cohort.include_list([
    {"variable": "age_on_admission", "operation": ">= 18", "label": "Adults"},
    {"variable": "sofa_score_imputed", "operation": ">= 2", "label": "Organ dysfunction"}
])

# Generate publication-ready summary
table1 = cohort.tableone(groupby="inhospital_death")
print(f"Study cohort: {len(cohort.obs)} ICU stays")
print(table1)
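Each rule passed to `include_list` pairs a variable with a comparison string and a label. The sketch below is a plain-Python illustration, not the CORR-Vars implementation, of one way such rules could be applied in order while recording how many observations remain after each step (the counts a cohort flow diagram needs). The helpers `parse_operation` and `apply_inclusion`, and the choice to exclude rows with missing values, are assumptions.

```python
# Hypothetical sketch of applying include_list-style rules; NOT the CORR-Vars implementation.
import operator

OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt, "==": operator.eq}


def parse_operation(op_string):
    """Split an operation string such as '>= 18' into (comparator, threshold)."""
    symbol, threshold = op_string.split()
    return OPS[symbol], float(threshold)


def apply_inclusion(rows, criteria):
    """Filter rows criterion by criterion and record how many remain after each step."""
    flow = [("Initial", len(rows))]
    for c in criteria:
        compare, threshold = parse_operation(c["operation"])
        var = c["variable"]
        # Assumption: rows with a missing value for the variable are excluded
        rows = [r for r in rows if r.get(var) is not None and compare(r[var], threshold)]
        flow.append((c["label"], len(rows)))
    return rows, flow


obs = [
    {"icu_stay_id": "I1", "age_on_admission": 67, "sofa_score_imputed": 5},
    {"icu_stay_id": "I2", "age_on_admission": 16, "sofa_score_imputed": 3},
    {"icu_stay_id": "I3", "age_on_admission": 54, "sofa_score_imputed": 1},
    {"icu_stay_id": "I4", "age_on_admission": 80, "sofa_score_imputed": None},
]
criteria = [
    {"variable": "age_on_admission", "operation": ">= 18", "label": "Adults"},
    {"variable": "sofa_score_imputed", "operation": ">= 2", "label": "Organ dysfunction"},
]
kept, flow = apply_inclusion(obs, criteria)
print([r["icu_stay_id"] for r in kept])  # ['I1']
print(flow)  # [('Initial', 4), ('Adults', 3), ('Organ dysfunction', 1)]
```

Applying criteria one after another, rather than all at once, makes the per-step exclusion counts fall out directly.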

📈 Publication Ready

CORR-Vars concepts are quality-checked by attending physicians at Charité Berlin before being used for:

  • Critical care outcomes research

  • Machine learning model development

  • Health services research

  • Quality improvement studies

Community & Support

🐛 Found a Bug?

Report issues or request features on GitHub

https://github.com/cub-corr/corr-vars/issues
💬 Need Help?

Check our Troubleshooting Guide or contact the team

🤝 Want to Contribute?

Add new variables to help the research community

Contributing New Variables


🏥 Developed at Charité Berlin

Advancing clinical research through innovative data science tools

Development Team:

…and the entire CORR team 🙏

📊 Project Stats

  • 🏥 Version: 0.5.0

  • 📅 Active Development: Since September 2024

  • 📈 Publications: 10+ active projects pending publication

  • 👥 Users: Research teams across Charité departments with a focus on critical care outcomes research

  • 🔗 GitHub: CORR-Vars Repository