Aggregated Variables#
Aggregated variables let you create new variables by computing statistics on, or transformations of, existing dynamic (time-series) variables. They turn raw measurement data into clinically meaningful indicators.
Overview#
The aggregation module provides several variable types:
NativeStatic: Simple aggregations of dynamic variables (e.g., first, last, mean, max)
DerivedStatic: Computed static variables based on expressions or custom functions
DerivedDynamic: Computed time-series variables from other variables
Variable Types#
NativeStatic Variables#
NativeStatic variables create single values from time-series data using aggregation functions.
Available Aggregation Functions:
!first [columns]: First recorded value
!last [columns]: Last recorded value
!mean [column]: Mean value
!median [column]: Median value
!max [column]: Maximum value
!min [column]: Minimum value
!count [column]: Count of non-null values
!any: True if any value exists
!closest(reference, offset, tolerance) [columns]: Value closest to the reference time
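To make these functions concrete, the following standalone sketch reproduces what several of them compute on a small time series. It uses plain Python rather than corr_vars; the `records` list and its `(recordtime, value)` layout are assumptions made for illustration, and whether the library skips missing values for `!first` and `!last` is not stated here.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical time series for one ICU stay: (recordtime, value) pairs.
# None marks a missing measurement.
records = [
    (datetime(2024, 1, 1, 8, 0), 120.0),
    (datetime(2024, 1, 1, 9, 0), None),
    (datetime(2024, 1, 1, 10, 0), 135.0),
    (datetime(2024, 1, 1, 11, 0), 110.0),
]

# Sort by time so that "first" and "last" are well defined.
records.sort(key=lambda r: r[0])

# Assumption: missing values are ignored by every aggregation.
values = [v for _, v in records if v is not None]

aggregates = {
    "first": values[0],
    "last": values[-1],
    "mean": mean(values),
    "median": median(values),
    "max": max(values),
    "min": min(values),
    "count": len(values),   # non-null values only
    "any": len(values) > 0,
}
print(aggregates)
```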
Examples:
from corr_vars.sources.aggregation import NativeStatic
from corr_vars import Cohort

cohort = Cohort(obs_level="icu_stay", load_default_vars=False)

# First blood pressure measurement
first_bp = NativeStatic(
    var_name="first_blood_pressure",
    select="!first value",
    base_var="blood_pressure_sys"
)
cohort.add_variable(first_bp)

# Maximum heart rate during ICU stay
max_hr = NativeStatic(
    var_name="max_heart_rate",
    select="!max value",
    base_var="heart_rate"
)
cohort.add_variable(max_hr)

# Blood pressure closest to admission
admission_bp = NativeStatic(
    var_name="admission_blood_pressure",
    select="!closest(icu_admission, 0, 2h) value",
    base_var="blood_pressure_sys"
)
cohort.add_variable(admission_bp)
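The `!closest(reference, offset, tolerance)` selector is the least obvious of these, so here is a plain-Python sketch of one plausible reading. The semantics are an assumption: the target time is `reference + offset`, the record with the smallest absolute distance to the target wins, and nothing is returned when no record lies within `tolerance`. The helper `closest_value` is hypothetical and not part of corr_vars.

```python
from datetime import datetime, timedelta

def closest_value(records, reference, offset, tolerance):
    """Return the value recorded nearest to reference + offset.

    Assumed semantics: None is returned when no record lies within
    `tolerance` of the target time.
    """
    target = reference + offset
    candidates = [
        (abs(t - target), v)
        for t, v in records
        if abs(t - target) <= tolerance
    ]
    if not candidates:
        return None
    # Pick the candidate with the smallest time distance.
    return min(candidates, key=lambda c: c[0])[1]

icu_admission = datetime(2024, 1, 1, 8, 0)
bp_records = [
    (datetime(2024, 1, 1, 7, 15), 118.0),  # 45 min before admission
    (datetime(2024, 1, 1, 8, 30), 125.0),  # 30 min after admission
    (datetime(2024, 1, 1, 12, 0), 140.0),  # 4 h after, outside tolerance
]

admission_bp_value = closest_value(
    bp_records, icu_admission, timedelta(0), timedelta(hours=2)
)
print(admission_bp_value)
```

Under these assumptions the 08:30 reading is selected, because it is closer to admission than the 07:15 reading and both fall within the two-hour tolerance.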
DerivedStatic Variables#
DerivedStatic variables compute new values using expressions or custom functions.
Expression-Based Variables:
from corr_vars.sources.aggregation import DerivedStatic

# Body Mass Index calculation
bmi_var = DerivedStatic(
    var_name="bmi",
    requires=["weight_on_admission", "height"],
    expression="weight_on_admission / (height / 100) ** 2"
)
cohort.add_variable(bmi_var)

# Hospital mortality
mortality_var = DerivedStatic(
    var_name="hospital_mortality",
    requires=["hospital_discharge", "death_timestamp"],
    expression="hospital_discharge >= death_timestamp"
)
cohort.add_variable(mortality_var)
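As a worked check of the two expressions, the sketch below evaluates the same arithmetic for a single hypothetical patient in plain Python. The units (weight in kg, height in cm) are inferred from the division by 100 and are an assumption; the `patient` dictionary is purely illustrative and says nothing about how corr_vars evaluates expressions internally.

```python
from datetime import datetime

# Hypothetical static values for one patient (weight in kg, height in cm).
patient = {
    "weight_on_admission": 80.0,
    "height": 180.0,
    "hospital_discharge": datetime(2024, 1, 10, 14, 0),
    "death_timestamp": datetime(2024, 1, 9, 3, 30),
}

# Same arithmetic as the "bmi" expression: height is converted from cm to m.
bmi = patient["weight_on_admission"] / (patient["height"] / 100) ** 2

# Same comparison as the "hospital_mortality" expression:
# True when death occurred at or before hospital discharge.
hospital_mortality = patient["hospital_discharge"] >= patient["death_timestamp"]

print(round(bmi, 1), hospital_mortality)
```

How a missing `death_timestamp` (a surviving patient) is handled by the library is not stated in this section, so it is worth checking the resulting column for nulls rather than assuming they become False.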
Custom Function Variables:
For complex calculations, you can define custom functions in the variable mapping:
# Custom SOFA score calculation
sofa_var = DerivedStatic(
    var_name="sofa_score_admission",
    requires=[
        "first_creatinine", "first_bilirubin",
        "first_pao2_fio2", "first_platelets",
        "first_gcs", "first_map"
    ]
)
# Custom function would be defined in variables.py
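To give a sense of the logic such a custom function might contain, here is a sketch of two SOFA sub-scores as plain Python functions. The function names, their signatures, and the way they would be registered in variables.py are all hypothetical, since the hook is not described here. The cut-offs follow the commonly published SOFA tables (creatinine in mg/dL, platelets in 10^3/µL) and should be verified against your reference definition; the renal score below ignores urine output.

```python
def sofa_renal_from_creatinine(creatinine_mg_dl):
    """Renal SOFA points from serum creatinine in mg/dL (urine output ignored)."""
    if creatinine_mg_dl >= 5.0:
        return 4
    if creatinine_mg_dl >= 3.5:
        return 3
    if creatinine_mg_dl >= 2.0:
        return 2
    if creatinine_mg_dl >= 1.2:
        return 1
    return 0

def sofa_coagulation_from_platelets(platelets_k_per_ul):
    """Coagulation SOFA points from platelet count in 10^3/µL."""
    if platelets_k_per_ul < 20:
        return 4
    if platelets_k_per_ul < 50:
        return 3
    if platelets_k_per_ul < 100:
        return 2
    if platelets_k_per_ul < 150:
        return 1
    return 0

# Partial score for one hypothetical patient.
partial_sofa = sofa_renal_from_creatinine(2.5) + sofa_coagulation_from_platelets(75)
print(partial_sofa)
```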
DerivedDynamic Variables#
DerivedDynamic variables create new time-series from existing ones.
from corr_vars.sources.aggregation import DerivedDynamic

# PaO2/FiO2 ratio calculation
pf_ratio = DerivedDynamic(
    var_name="pf_ratio",
    requires=["blood_pao2_arterial", "vent_fio2"],
    cleaning={"value": {"low": 50, "high": 800}}
)
# Custom calculation function defined in variables.py
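The calculation function itself is not shown, so the following is only a sketch of one way a PaO2/FiO2 series could be derived. It assumes each PaO2 measurement is paired with the most recent FiO2 recorded at or before it, that FiO2 is stored as a fraction (0.21 to 1.0), and that the cleaning bounds are inclusive and applied to the resulting ratio. The helper `pf_ratio_series` is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

def pf_ratio_series(pao2, fio2, low=50, high=800):
    """Pair each PaO2 with the latest FiO2 at or before it and divide.

    pao2, fio2: lists of (recordtime, value), each sorted by time.
    Ratios outside [low, high] are dropped, mirroring the cleaning bounds.
    """
    fio2_times = [t for t, _ in fio2]
    result = []
    for t, p in pao2:
        idx = bisect_right(fio2_times, t) - 1
        if idx < 0:
            continue  # no FiO2 recorded yet at this time
        ratio = p / fio2[idx][1]
        if low <= ratio <= high:
            result.append((t, ratio))
    return result

pao2 = [
    (datetime(2024, 1, 1, 8, 0), 90.0),
    (datetime(2024, 1, 1, 12, 0), 75.0),
    (datetime(2024, 1, 1, 16, 0), 20.0),  # yields a ratio of 40, below `low`
]
fio2 = [
    (datetime(2024, 1, 1, 7, 30), 0.4),
    (datetime(2024, 1, 1, 11, 0), 0.5),
]

series = pf_ratio_series(pao2, fio2)
for t, ratio in series:
    print(t, round(ratio, 1))
```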
Time Constraints and Filtering#
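All aggregation variables support time constraints through the tmin and tmax parameters. Each accepts either the name of a timestamp column or a (column, offset) tuple such as ("icu_admission", "+24h"):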
# Values only from first 24 hours
early_lactate = NativeStatic(
    var_name="max_lactate_24h",
    select="!max value",
    base_var="blood_lactate",
    tmin="icu_admission",
    tmax=("icu_admission", "+24h")
)

# Values before a specific event
pre_intubation_spo2 = NativeStatic(
    var_name="last_spo2_before_intubation",
    select="!last value",
    base_var="spo2",
    tmax="first_intubation_dtime"
)
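The effect of a time window can be illustrated without the library. The sketch below keeps only records between `tmin` and `tmax` and then takes the maximum; treating both bounds as inclusive is an assumption, and the helper `in_window` is hypothetical.

```python
from datetime import datetime, timedelta

def in_window(records, tmin=None, tmax=None):
    """Keep (recordtime, value) pairs with tmin <= recordtime <= tmax.

    A bound of None leaves that side of the window open.
    """
    return [
        (t, v)
        for t, v in records
        if (tmin is None or t >= tmin) and (tmax is None or t <= tmax)
    ]

icu_admission = datetime(2024, 1, 1, 8, 0)
lactate = [
    (datetime(2024, 1, 1, 6, 0), 5.1),   # before admission, excluded
    (datetime(2024, 1, 1, 10, 0), 3.2),
    (datetime(2024, 1, 2, 7, 0), 2.4),
    (datetime(2024, 1, 2, 9, 0), 6.0),   # 25 h after admission, excluded
]

first_24h = in_window(
    lactate,
    tmin=icu_admission,
    tmax=icu_admission + timedelta(hours=24),
)
max_lactate_24h = max(v for _, v in first_24h)
print(max_lactate_24h)
```

Without the window the maximum would be 6.0, a value recorded after the period of interest, which is the kind of temporal leakage that tmin and tmax are meant to prevent.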
Filtering with WHERE Clauses#
Use the where parameter to filter source rows before the aggregation is applied:
# Only abnormal values
high_temp = NativeStatic(
    var_name="max_fever_temperature",
    select="!max value",
    base_var="body_temperature",
    where="value > 38.0"
)

# Specific medication doses
max_norepinephrine = NativeStatic(
    var_name="max_norepinephrine_dose",
    select="!max value",
    base_var="med_norepinephrine",
    where="!isin(description, ['Norepinephrine', 'Noradrenaline'])"
)
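Filtering before aggregation changes the result in one important way: a stay with no matching rows yields a missing value rather than an aggregate of non-matching rows. The plain-Python sketch below mimics both filters; the row layout and the helper `max_where` are hypothetical, and returning None for an empty selection is an assumption about the library's behavior.

```python
def max_where(rows, predicate):
    """Maximum 'value' among rows that satisfy predicate, or None if none do."""
    return max((r["value"] for r in rows if predicate(r)), default=None)

febrile_stay = [{"value": 36.8}, {"value": 38.4}, {"value": 39.1}]
afebrile_stay = [{"value": 36.5}, {"value": 37.2}]
doses = [
    {"description": "Norepinephrine", "value": 0.10},
    {"description": "Noradrenaline", "value": 0.25},
    {"description": "Vasopressin", "value": 0.80},
]

# Equivalent of where="value > 38.0"
max_fever = max_where(febrile_stay, lambda r: r["value"] > 38.0)
no_fever = max_where(afebrile_stay, lambda r: r["value"] > 38.0)

# Equivalent of where="!isin(description, [...])"
allowed = {"Norepinephrine", "Noradrenaline"}
max_norepi = max_where(doses, lambda r: r["description"] in allowed)

print(max_fever, no_fever, max_norepi)
```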
Advanced Examples#
Complex Clinical Indicators#
# Shock index calculation
shock_index = DerivedStatic(
    var_name="shock_index_admission",
    requires=["first_heart_rate", "first_blood_pressure_sys"],
    expression="first_heart_rate / first_blood_pressure_sys"
)

# APACHE II acute physiology score components
apache_temp = NativeStatic(
    var_name="apache_temperature",
    select="!closest(icu_admission, 0, 24h) value",
    base_var="body_temperature"
)
apache_map = NativeStatic(
    var_name="apache_mean_arterial_pressure",
    select="!closest(icu_admission, 0, 24h) value",
    base_var="blood_pressure_mean"
)
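For reference, the shock index expression reduces to a single division. The sketch below evaluates it in plain Python and adds guards for missing or zero inputs; those guards are an illustrative assumption, since how the library's expression engine treats such cases is not described here. The function `shock_index_value` is hypothetical.

```python
def shock_index_value(heart_rate, systolic_bp):
    """Heart rate divided by systolic blood pressure.

    Returns None when an input is missing or the systolic pressure is zero.
    """
    if heart_rate is None or not systolic_bp:
        return None
    return heart_rate / systolic_bp

si = shock_index_value(110, 90)
print(round(si, 2))
```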
Outcome Variables#
# ICU length of stay
icu_los = DerivedStatic(
    var_name="icu_length_of_stay_days",
    requires=["icu_admission", "icu_discharge"],
    expression="(icu_discharge - icu_admission).dt.total_seconds() / 86400"
)

# Ventilator-free days
vent_free_days = DerivedStatic(
    var_name="ventilator_free_days_28",
    requires=["icu_admission", "last_extubation_dtime", "icu_discharge"],
    # Custom function required for complex logic
)
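The custom logic for ventilator-free days is not shown, and published definitions differ, so the sketch below is only one possible version. It assumes ventilation starts at ICU admission, counts ventilator days up to the last extubation, and assigns zero to patients who died within 28 days or were never extubated. The `died_within_28_days` flag is a hypothetical input that is not in the `requires` list above.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400

def length_of_stay_days(admission, discharge):
    """Elapsed time between two timestamps, in fractional days."""
    return (discharge - admission).total_seconds() / SECONDS_PER_DAY

def ventilator_free_days_28(admission, last_extubation, died_within_28_days=False):
    """Assumed definition: 28 minus days on the ventilator, floored at 0.

    Returns 0 for patients who died within 28 days or were never extubated.
    """
    if died_within_28_days or last_extubation is None:
        return 0.0
    vent_days = length_of_stay_days(admission, last_extubation)
    return max(0.0, 28.0 - vent_days)

icu_admission = datetime(2024, 1, 1, 8, 0)
last_extubation = datetime(2024, 1, 5, 8, 0)
icu_discharge = datetime(2024, 1, 8, 20, 0)

los = length_of_stay_days(icu_admission, icu_discharge)
vfd = ventilator_free_days_28(icu_admission, last_extubation)
print(los, vfd)
```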
Quality Indicators#
# Total number of blood pressure measurements during the ICU stay
# (divide by the length of stay in days to obtain a per-day rate)
bp_count = NativeStatic(
    var_name="bp_measurement_count",
    select="!count value",
    base_var="blood_pressure_sys",
    tmin="icu_admission",
    tmax="icu_discharge"
)

# Timestamp of the first antibiotic administration
# (subtract icu_admission in a DerivedStatic to obtain the delay in hours)
first_abx_time = NativeStatic(
    var_name="first_antibiotic_time",
    select="!first recordtime",
    base_var="any_antibiotic_icu"
)
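A raw count and a raw timestamp usually need one more step before they become a rate per day or a delay in hours. The sketch below shows that arithmetic in plain Python for one hypothetical stay; all values are invented for illustration.

```python
from datetime import datetime

icu_admission = datetime(2024, 1, 1, 8, 0)
icu_discharge = datetime(2024, 1, 4, 8, 0)          # 3 days later
bp_measurement_count = 72                           # result of a "!count value" aggregation
first_antibiotic_time = datetime(2024, 1, 1, 10, 30)  # result of "!first recordtime"

# Measurements per day: total count divided by length of stay in days.
los_days = (icu_discharge - icu_admission).total_seconds() / 86400
bp_per_day = bp_measurement_count / los_days

# Time to first antibiotic: delay from admission, converted to hours.
time_to_abx_hours = (first_antibiotic_time - icu_admission).total_seconds() / 3600

print(bp_per_day, time_to_abx_hours)
```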
Best Practices#
Use Appropriate Aggregation Functions: Choose the right function for your clinical question
Set Time Constraints: Always specify appropriate tmin/tmax to avoid temporal biases
Apply Cleaning Rules: Use cleaning parameters to filter out physiologically impossible values
Document Clinical Rationale: Include clear variable names and documentation
Validate Results: Always check aggregated values for clinical plausibility