Aggregated Variables

Aggregation variables derive new variables from existing dynamic (time-series) variables by computing statistics or applying transformations. They turn raw measurement data into clinically meaningful indicators.

Overview

The aggregation module provides several variable types:

  • NativeStatic: Simple aggregations of dynamic variables (e.g., first, last, mean, max)

  • DerivedStatic: Computed static variables based on expressions or custom functions

  • DerivedDynamic: Computed time-series variables from other variables

Variable Types

NativeStatic Variables

NativeStatic variables reduce time-series data to a single value using an aggregation function.

Available Aggregation Functions:

  • !first [columns]: First recorded value

  • !last [columns]: Last recorded value

  • !mean [column]: Mean value

  • !median [column]: Median value

  • !max [column]: Maximum value

  • !min [column]: Minimum value

  • !count [column]: Count of non-null values

  • !any: True if any value exists

  • !closest(reference, offset, tolerance) [columns]: Value recorded closest to the reference time plus offset, within the given tolerance
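As a rough illustration of what these functions compute, the following plain-Python sketch applies them to a small set of made-up heart-rate records. It does not use corr_vars; the record layout and the decision to drop nulls before taking first and last are assumptions made for the example, not documented library behavior.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical heart-rate records for one ICU stay: (recordtime, value).
# None stands for a missing measurement.
records = [
    (datetime(2024, 1, 1, 8, 0), 88.0),
    (datetime(2024, 1, 1, 9, 0), None),
    (datetime(2024, 1, 1, 10, 0), 104.0),
    (datetime(2024, 1, 1, 11, 0), 96.0),
]

# Sort by time so "first" and "last" are well defined, then drop nulls.
ordered = sorted(records, key=lambda r: r[0])
values = [v for _, v in ordered if v is not None]

aggregates = {
    "first": values[0],
    "last": values[-1],
    "mean": mean(values),
    "median": median(values),
    "max": max(values),
    "min": min(values),
    "count": len(values),      # non-null values only
    "any": len(values) > 0,    # True if at least one value exists
}
print(aggregates)
```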

Examples:

from corr_vars.sources.aggregation import NativeStatic
from corr_vars import Cohort

cohort = Cohort(obs_level="icu_stay", load_default_vars=False)

# First blood pressure measurement
first_bp = NativeStatic(
    var_name="first_blood_pressure",
    select="!first value",
    base_var="blood_pressure_sys"
)
cohort.add_variable(first_bp)

# Maximum heart rate during ICU stay
max_hr = NativeStatic(
    var_name="max_heart_rate",
    select="!max value",
    base_var="heart_rate"
)
cohort.add_variable(max_hr)

# Blood pressure closest to admission
admission_bp = NativeStatic(
    var_name="admission_blood_pressure",
    select="!closest(icu_admission, 0, 2h) value",
    base_var="blood_pressure_sys"
)
cohort.add_variable(admission_bp)
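To make the idea behind !closest concrete, here is a minimal plain-Python sketch. The helper closest_value is hypothetical and not part of corr_vars, and it assumes that the tolerance applies symmetrically around the reference time plus offset; the library may treat these parameters differently.

```python
from datetime import datetime, timedelta


def closest_value(records, reference, offset, tolerance):
    """Return the value recorded nearest to reference + offset, or None
    if no record lies within +/- tolerance of that target time."""
    target = reference + offset
    candidates = [
        (abs(t - target), value)
        for t, value in records
        if abs(t - target) <= tolerance
    ]
    if not candidates:
        return None
    # Pick the candidate with the smallest time distance to the target.
    return min(candidates, key=lambda c: c[0])[1]


icu_admission = datetime(2024, 1, 1, 12, 0)
bp_records = [
    (datetime(2024, 1, 1, 11, 30), 118.0),  # 30 min before admission
    (datetime(2024, 1, 1, 12, 45), 125.0),  # 45 min after admission
    (datetime(2024, 1, 1, 15, 0), 110.0),   # outside the 2 h tolerance
]

admission_bp = closest_value(
    bp_records, icu_admission, timedelta(0), timedelta(hours=2)
)
print(admission_bp)
```

With a tolerance of only ten minutes, no record qualifies and the helper returns None.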

DerivedStatic Variables

DerivedStatic variables compute new values using expressions or custom functions.

Expression-Based Variables:

from corr_vars.sources.aggregation import DerivedStatic

# Body Mass Index calculation
bmi_var = DerivedStatic(
    var_name="bmi",
    requires=["weight_on_admission", "height"],
    expression="weight_on_admission / (height / 100) ** 2"
)
cohort.add_variable(bmi_var)

# Hospital mortality (death at or before hospital discharge)
mortality_var = DerivedStatic(
    var_name="hospital_mortality",
    requires=["hospital_discharge", "death_timestamp"],
    expression="hospital_discharge >= death_timestamp"
)
cohort.add_variable(mortality_var)
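The two expressions above can be checked by hand for a single patient. The sketch below uses made-up values and plain Python rather than corr_vars; it assumes weight in kilograms and height in centimeters, which is what the division by 100 in the BMI expression implies.

```python
from datetime import datetime

# Hypothetical values for one patient.
weight_on_admission = 80.0  # kg (assumed unit)
height = 180.0              # cm (assumed unit)

# Same arithmetic as the "bmi" expression.
bmi = weight_on_admission / (height / 100) ** 2
print(round(bmi, 1))

# Same comparison as the "hospital_mortality" expression:
# True when death occurred at or before hospital discharge.
hospital_discharge = datetime(2024, 1, 10, 14, 0)
death_timestamp = datetime(2024, 1, 10, 9, 30)
hospital_mortality = hospital_discharge >= death_timestamp
print(hospital_mortality)
```

This sketch covers only a patient with a recorded death timestamp; how a missing timestamp is handled for survivors is not shown here.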

Custom Function Variables:

For complex calculations, you can define custom functions in the variable mapping:

# Custom SOFA score calculation
sofa_var = DerivedStatic(
    var_name="sofa_score_admission",
    requires=[
        "first_creatinine", "first_bilirubin",
        "first_pao2_fio2", "first_platelets",
        "first_gcs", "first_map"
    ]
)
# Custom function would be defined in variables.py

DerivedDynamic Variables

DerivedDynamic variables create new time-series from existing ones.

from corr_vars.sources.aggregation import DerivedDynamic

# PaO2/FiO2 ratio calculation
pf_ratio = DerivedDynamic(
    var_name="pf_ratio",
    requires=["blood_pao2_arterial", "vent_fio2"],
    cleaning={"value": {"low": 50, "high": 800}}
)
# Custom calculation function defined in variables.py
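The custom function itself is not shown here. As a hedged sketch of the kind of calculation it would perform, the plain-Python example below pairs PaO2 and FiO2 measurements by timestamp, computes their ratio, and keeps only ratios inside the cleaning bounds. It assumes FiO2 is stored as a fraction, that only exactly matching timestamps are paired, and that the bounds are inclusive; none of this is confirmed library behavior.

```python
from datetime import datetime

t1 = datetime(2024, 1, 1, 8, 0)
t2 = datetime(2024, 1, 1, 12, 0)
t3 = datetime(2024, 1, 1, 16, 0)

# Hypothetical measurements keyed by timestamp.
pao2 = {t1: 90.0, t2: 60.0, t3: 75.0}  # mmHg
fio2 = {t1: 0.50, t2: 0.25, t3: 0.05}  # fraction; 0.05 is implausible

LOW, HIGH = 50, 800  # same bounds as the cleaning rule above

pf_ratio = {}
for t in sorted(pao2.keys() & fio2.keys()):
    ratio = pao2[t] / fio2[t]
    # Keep only physiologically plausible ratios.
    if LOW <= ratio <= HIGH:
        pf_ratio[t] = ratio

print(pf_ratio)
```

The third measurement yields a ratio of about 1500 and is dropped by the bounds check.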

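Time Constraints and Filtering

All aggregation variables support time constraints through tmin and tmax. As the examples show, each bound is given either as the name of a time variable or as a (variable, offset) tuple such as ("icu_admission", "+24h"):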

# Values only from first 24 hours
early_lactate = NativeStatic(
    var_name="max_lactate_24h",
    select="!max value",
    base_var="blood_lactate",
    tmin="icu_admission",
    tmax=("icu_admission", "+24h")
)

# Values before a specific event
pre_intubation_spo2 = NativeStatic(
    var_name="last_spo2_before_intubation",
    select="!last value",
    base_var="spo2",
    tmax="first_intubation_dtime"
)
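The effect of a time window can be sketched in plain Python. The helpers parse_offset and resolve_bound below are hypothetical, handle only hour offsets such as "+24h", and treat both bounds as inclusive; the library's own parsing and boundary rules may differ.

```python
from datetime import datetime, timedelta


def parse_offset(offset):
    """Turn an hour offset such as '+24h' or '-6h' into a timedelta."""
    if not offset.endswith("h"):
        raise ValueError(f"only hour offsets are handled here: {offset!r}")
    return timedelta(hours=float(offset[:-1]))


def resolve_bound(bound, anchors):
    """Resolve 'name' or ('name', '+24h') to a concrete datetime."""
    if isinstance(bound, tuple):
        name, offset = bound
        return anchors[name] + parse_offset(offset)
    return anchors[bound]


anchors = {"icu_admission": datetime(2024, 1, 1, 8, 0)}
lactate = [
    (datetime(2024, 1, 1, 7, 0), 5.0),   # before admission: excluded
    (datetime(2024, 1, 1, 10, 0), 2.1),
    (datetime(2024, 1, 1, 20, 0), 3.4),
    (datetime(2024, 1, 2, 12, 0), 4.8),  # after 24 h: excluded
]

tmin = resolve_bound("icu_admission", anchors)
tmax = resolve_bound(("icu_admission", "+24h"), anchors)

# Keep only values recorded inside the window, then aggregate.
in_window = [v for t, v in lactate if tmin <= t <= tmax]
max_lactate_24h = max(in_window) if in_window else None
print(max_lactate_24h)
```

Without the window, the maximum would be 5.0, a value recorded before ICU admission.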

Filtering with WHERE Clauses

A where clause filters the source data before the aggregation is applied:

# Only febrile values (above 38.0 °C)
high_temp = NativeStatic(
    var_name="max_fever_temperature",
    select="!max value",
    base_var="body_temperature",
    where="value > 38.0"
)

# Specific medication doses
max_norepinephrine = NativeStatic(
    var_name="max_norepinephrine_dose",
    select="!max value",
    base_var="med_norepinephrine",
    where="!isin(description, ['Norepinephrine', 'Noradrenaline'])"
)
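Conceptually, a where clause is a row filter that runs before the aggregation. The plain-Python sketch below mirrors the two examples with made-up records; the dictionary layout is an assumption for illustration and does not reflect how corr_vars stores its data.

```python
# Hypothetical temperature values in degrees Celsius.
temperatures = [36.8, 38.4, 39.1, 37.2]

# Equivalent of where="value > 38.0" followed by "!max value".
fever_values = [v for v in temperatures if v > 38.0]
max_fever_temperature = max(fever_values) if fever_values else None

# Hypothetical medication records with a description column.
med_records = [
    {"description": "Norepinephrine", "value": 0.10},
    {"description": "Noradrenaline", "value": 0.25},
    {"description": "Saline flush", "value": 5.00},
]

# Equivalent of the "!isin(description, [...])" filter, then "!max value".
allowed = {"Norepinephrine", "Noradrenaline"}
doses = [r["value"] for r in med_records if r["description"] in allowed]
max_norepinephrine_dose = max(doses) if doses else None

print(max_fever_temperature, max_norepinephrine_dose)
```

Filtering first matters: without the description filter, the unrelated 5.00 entry would be reported as the maximum dose.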

Advanced Examples

Complex Clinical Indicators

# Shock index calculation
shock_index = DerivedStatic(
    var_name="shock_index_admission",
    requires=["first_heart_rate", "first_blood_pressure_sys"],
    expression="first_heart_rate / first_blood_pressure_sys"
)

# APACHE II acute physiology score components
apache_temp = NativeStatic(
    var_name="apache_temperature",
    select="!closest(icu_admission, 0, 24h) value",
    base_var="body_temperature"
)

apache_map = NativeStatic(
    var_name="apache_mean_arterial_pressure",
    select="!closest(icu_admission, 0, 24h) value",
    base_var="blood_pressure_mean"
)

Outcome Variables

# ICU length of stay
icu_los = DerivedStatic(
    var_name="icu_length_of_stay_days",
    requires=["icu_admission", "icu_discharge"],
    expression="(icu_discharge - icu_admission).dt.total_seconds() / 86400"
)

# Ventilator-free days
vent_free_days = DerivedStatic(
    var_name="ventilator_free_days_28",
    requires=["icu_admission", "last_extubation_dtime", "icu_discharge"],
    # Custom function required for complex logic
)

Quality Indicators

# Total number of blood pressure measurements during the ICU stay
# (divide by the length of stay in days to get a per-day rate)
bp_count = NativeStatic(
    var_name="bp_measurement_count",
    select="!count value",
    base_var="blood_pressure_sys",
    tmin="icu_admission",
    tmax="icu_discharge"
)
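A per-day measurement rate is the count over the stay divided by the length of stay in days. The plain-Python sketch below shows that arithmetic for one hypothetical stay; the count of 84 is a made-up stand-in for the result of the !count aggregation.

```python
from datetime import datetime

icu_admission = datetime(2024, 1, 1, 8, 0)
icu_discharge = datetime(2024, 1, 4, 20, 0)
bp_count = 84  # hypothetical result of the "!count value" aggregation

# Length of stay in days: 3 days and 12 hours.
los_days = (icu_discharge - icu_admission).total_seconds() / 86400

# Guard against a zero-length stay before dividing.
bp_per_day = bp_count / los_days if los_days > 0 else None
print(los_days, bp_per_day)
```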

# Timestamp of the first antibiotic administration
# (subtract icu_admission to obtain the time to first antibiotic)
first_abx_time = NativeStatic(
    var_name="first_antibiotic_dtime",
    select="!first recordtime",
    base_var="any_antibiotic_icu"
)
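Selecting recordtime yields a timestamp rather than a duration, so a time-to-event value in hours needs one more subtraction. The sketch below shows that step in plain Python with made-up timestamps; in corr_vars this would presumably be a DerivedStatic expression, which is an assumption rather than something shown on this page.

```python
from datetime import datetime

icu_admission = datetime(2024, 1, 1, 8, 0)
# Hypothetical result of "!first recordtime" on the antibiotic variable.
first_antibiotic_time = datetime(2024, 1, 1, 11, 30)

# Elapsed time from admission to first antibiotic, in hours.
elapsed = first_antibiotic_time - icu_admission
time_to_first_antibiotic_hours = elapsed.total_seconds() / 3600
print(time_to_first_antibiotic_hours)
```

A negative result would indicate that the first antibiotic was recorded before ICU admission.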

Best Practices

  1. Use Appropriate Aggregation Functions: Match the function to the clinical question, for example !max for a peak value or !closest for a value at a reference time

  2. Set Time Constraints: Always specify appropriate tmin/tmax to avoid temporal biases

  3. Apply Cleaning Rules: Use cleaning parameters to filter out physiologically impossible values

  4. Document Clinical Rationale: Use descriptive variable names and document why each variable is defined the way it is

  5. Validate Results: Always check aggregated values for clinical plausibility

Class Reference