Contributing New Variables

AI-generated content

This page was generated by AI and has not been fully reviewed by a human. Content may be inaccurate or incomplete. If you find any issues, please create an issue on the GitHub repository.

🚀 Contribute to CORR-Vars

Help expand the CORR-Vars variable catalog! Follow this step-by-step guide to add new clinical variables that benefit the entire research community.

Overview

Contributing new variables to CORR-Vars involves a structured development workflow that ensures quality, reproducibility, and proper integration. This tutorial walks you through the complete process from idea to merged pull request.

📋 Prerequisites

  • Access to the CORR-Vars GitHub repository

  • A working development environment (see the development environment setup guide)

  • Basic familiarity with Git and GitHub workflows

  • Understanding of the variable types (see Custom Variables Guide)

Development Workflow Overview

Step 1: Create a GitHub Issue

🎯 Why Start with an Issue?

GitHub issues help track variable requests, avoid duplicate work, and provide a place for community discussion about the clinical relevance and implementation approach.

Check for Existing Issues

Before creating a new issue, search existing issues to avoid duplicates:

  1. Visit the CORR-Vars GitHub repository

  2. Click on the Issues tab

  3. Search for keywords related to your variable (e.g., "lactate", "SOFA", "mechanical ventilation")

Create a New Issue

If no existing issue covers your variable, create a new one:

Title: Add [Variable Name] variable
Example: "Add blood lactate clearance variable"

Issue Template:

## Variable Request: [Variable Name]

### Clinical Context
- **Purpose**: Brief description of clinical use case
- **Population**: Target patient population (ICU, hospital, specific conditions)
- **Evidence**: Reference to literature or clinical guidelines if available

### Technical Requirements
- **Variable Type**: Static/Dynamic, Derived/Native
- **Data Sources**: Which database tables/sources contain the data
- **Dependencies**: Any variables this depends on
- **Time Constraints**: Relevant time windows (admission, 24h, etc.)

### Expected Output
- **Data Type**: Numeric/Boolean/Categorical
- **Units**: Expected units of measurement
- **Range**: Expected value ranges
- **Example**: Sample output for a few patients

### Additional Information
- **Priority**: High/Medium/Low
- **Complexity**: Simple aggregation/Complex calculation/Database extraction
- **Timeline**: When this variable is needed

Example Issue:

Real Example: Blood Lactate Clearance
## Variable Request: Blood Lactate Clearance

### Clinical Context
- **Purpose**: Calculate lactate clearance as a marker of shock resolution
- **Population**: ICU patients with elevated lactate levels
- **Evidence**: Strong evidence for prognostic value in septic shock (Nguyen et al., Crit Care Med 2004)

### Technical Requirements
- **Variable Type**: Derived Static
- **Data Sources**: Existing blood_lactate variable
- **Dependencies**: blood_lactate (dynamic variable)
- **Time Constraints**: First 6-24 hours of ICU stay

### Expected Output
- **Data Type**: Numeric (percentage)
- **Units**: Percentage improvement
- **Range**: Up to +100% for complete clearance; negative values indicate worsening and can fall below -100%
- **Example**: Patient with initial lactate 4.0 mmol/L, 6h lactate 2.0 mmol/L → 50% clearance

### Additional Information
- **Priority**: High (frequently requested by researchers)
- **Complexity**: Medium (requires time-based aggregation)
- **Timeline**: Needed for upcoming sepsis study
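
The expected-output example in the issue can be checked with a few lines of Python before any pipeline code exists. This is a minimal sketch of the formula only; the function name `lactate_clearance_pct` is hypothetical and not part of CORR-Vars.

```python
def lactate_clearance_pct(initial: float, final: float) -> float:
    """Percentage clearance: (initial - final) / initial * 100."""
    if initial <= 0:
        raise ValueError("initial lactate must be positive")
    return (initial - final) / initial * 100.0


# Worked example from the issue: 4.0 mmol/L falling to 2.0 mmol/L
improving = lactate_clearance_pct(4.0, 2.0)
# A rising lactate gives a negative clearance
worsening = lactate_clearance_pct(2.0, 3.0)
print(improving, worsening)
```

Stating the sign convention this explicitly in the issue helps reviewers agree on the definition before implementation starts.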

Step 2: Set Up Development Environment and Create Feature Branch

📚 Development Environment Setup

If you haven't set up your development environment yet, follow the detailed setup guide:

Clone and Setup (if first time)

# Clone the repository
git clone https://github.com/CUB-CORR/corr-vars.git
cd corr-vars

# Install in development mode
pip install -e .

# Install development dependencies
pip install pytest jupyter black ruff

Create Feature Branch

# Ensure you're on main and up to date
git checkout main
git pull origin main

# Create feature branch (use descriptive name referencing issue)
git checkout -b feature/add-lactate-clearance-variable

# Alternative naming patterns:
# git checkout -b feature/issue-123-lactate-clearance
# git checkout -b variable/blood-lactate-clearance

🌿 Branch Naming Conventions

Use descriptive branch names that reference the GitHub issue:

  • feature/add-[variable-name]

  • variable/[clinical-concept]

  • feature/issue-[number]-[description]
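
If you create branches often, the third pattern can be generated from an issue number and a short description. The sketch below is only an illustration; `branch_name` is a hypothetical helper and not part of the repository tooling.

```python
import re


def branch_name(issue_number: int, description: str) -> str:
    """Build a branch name of the form feature/issue-[number]-[description]."""
    # Lowercase, then collapse every run of non-alphanumeric characters to one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"feature/issue-{issue_number}-{slug}"


print(branch_name(123, "Lactate clearance"))
```

The result can be passed directly to `git checkout -b`.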

Step 3: Explore and Prototype in Jupyter Notebook

🔬 Exploration Phase Goals

  • Understand the data structure and quality

  • Test different calculation approaches

  • Validate results against clinical expectations

  • Document any edge cases or limitations

Start Jupyter for Exploration

# Navigate to your development directory
cd /path/to/corr-vars

# Start Jupyter notebook
jupyter lab

Create Exploration Notebook

Create a new notebook: exploration/[variable-name]-development.ipynb

Exploration Template:

# Variable Development: Blood Lactate Clearance
# GitHub Issue: #123
# Developer: [Your Name]
# Date: [Today's Date]

import polars as pl
import pandas as pd
from corr_vars import Cohort
from corr_vars.sources.aggregation import DerivedStatic
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Load a test cohort
cohort = Cohort(
    obs_level="icu_stay",
    load_default_vars=False,
    sources={
        "cub_hdp": {
            "database": "db_hypercapnia_prepared",
            "password_file": True,
            "filters": "_d1"  # Small dataset for testing
        }
    }
)

print(f"Test cohort size: {len(cohort.obs)} patients")

Data Exploration:

# 2. Add required base variables
cohort.add_variable("blood_lactate")

# 3. Explore the data structure
lactate_data = cohort.obsm["blood_lactate"]
print(f"Lactate measurements: {len(lactate_data)} records")
print(f"Patients with lactate: {lactate_data['icu_stay_id'].n_unique()}")
print(f"Value range: {lactate_data['value'].min()} - {lactate_data['value'].max()}")

# 4. Check data quality
print(f"Missing values: {lactate_data['value'].null_count()}")
print(f"Negative values: {len(lactate_data.filter(pl.col('value') < 0))}")

# 5. Sample data inspection
sample_patient = lactate_data['icu_stay_id'].unique()[0]
patient_data = lactate_data.filter(pl.col('icu_stay_id') == sample_patient)
print(f"Sample patient {sample_patient}:")
print(patient_data.to_pandas())

Prototype the Calculation:

# 6. Prototype lactate clearance calculation
def calculate_lactate_clearance_prototype(lactate_df):
    """
    Prototype function to calculate lactate clearance.

    Clearance = (Initial - Final) / Initial * 100
    """

    # Keep measurements from the first 24 hours after each stay's first record,
    # then sort by time so that first()/last() are chronological
    result = lactate_df.filter(
        pl.col("recordtime") <= pl.col("recordtime").min().over("icu_stay_id") + pl.duration(hours=24)
    ).sort("recordtime").group_by("icu_stay_id").agg([
        pl.col("value").first().alias("initial_lactate"),
        pl.col("value").last().alias("final_lactate"),
        pl.col("recordtime").first().alias("initial_time"),
        pl.col("recordtime").last().alias("final_time"),
        pl.len().alias("n_measurements")  # use pl.count() on older Polars releases
    ]).with_columns([
        # Calculate clearance percentage
        ((pl.col("initial_lactate") - pl.col("final_lactate")) / pl.col("initial_lactate") * 100)
        .alias("lactate_clearance_24h")
    ])

    return result

# Test the prototype
clearance_result = calculate_lactate_clearance_prototype(lactate_data)
print("Lactate clearance results:")
print(clearance_result.to_pandas().head())
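
Before trusting the Polars prototype, it can help to cross-check it against a plain-Python reference on a handful of hand-made rows. The sketch below assumes records are `(icu_stay_id, recordtime, value)` tuples, mirroring the column names used above; `reference_clearance` is a hypothetical helper, not a CORR-Vars function.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def reference_clearance(records, window_hours=24):
    """Return {icu_stay_id: clearance_pct} using first/last values in the window.

    The window starts at each stay's earliest measurement, as in the prototype.
    """
    by_stay = defaultdict(list)
    for stay_id, recordtime, value in records:
        by_stay[stay_id].append((recordtime, value))

    result = {}
    for stay_id, rows in by_stay.items():
        rows.sort(key=lambda row: row[0])  # chronological order
        start = rows[0][0]
        in_window = [r for r in rows if r[0] <= start + timedelta(hours=window_hours)]
        initial, final = in_window[0][1], in_window[-1][1]
        result[stay_id] = (initial - final) / initial * 100.0
    return result


t0 = datetime(2024, 1, 1, 8, 0)
records = [
    # Deliberately unsorted: the 6h value is listed before the first value
    (1, t0 + timedelta(hours=6), 2.0),
    (1, t0, 4.0),
    (1, t0 + timedelta(hours=30), 1.0),  # outside the 24h window, ignored
    (2, t0, 2.0),
    (2, t0 + timedelta(hours=12), 3.0),
]
expected = reference_clearance(records)
print(expected)
```

Feeding the same rows to the prototype and comparing the two outputs is a quick way to catch ordering or windowing mistakes.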

Validate Results:

# 7. Validate results
# Check for reasonable ranges
clearance_stats = clearance_result["lactate_clearance_24h"].describe()
print("Clearance statistics:")
print(clearance_stats)

# Plot distribution
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
clearance_result.to_pandas()["lactate_clearance_24h"].hist(bins=30)
plt.title("Lactate Clearance Distribution")
plt.xlabel("Clearance (%)")

plt.subplot(1, 2, 2)
plt.scatter(clearance_result.to_pandas()["initial_lactate"],
            clearance_result.to_pandas()["lactate_clearance_24h"])
plt.xlabel("Initial Lactate (mmol/L)")
plt.ylabel("Clearance (%)")
plt.title("Clearance vs Initial Lactate")
plt.tight_layout()
plt.show()

# Identify potential edge cases
extreme_cases = clearance_result.filter(
    (pl.col("lactate_clearance_24h") < -50) | (pl.col("lactate_clearance_24h") > 100)
)
print(f"Extreme clearance values: {len(extreme_cases)} cases")
if len(extreme_cases) > 0:
    print(extreme_cases.to_pandas())

Refine the Implementation:

# 8. Refine based on exploration findings
def calculate_lactate_clearance_v2(var, cohort):
    """
    Refined lactate clearance calculation for CORR-Vars.
    """

    lactate_var = var.required_vars["blood_lactate"]
    lactate_data = lactate_var.data

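    # Calculate clearance with improved logic.
    # Sort by time so that first()/last() are chronological. No time window is
    # applied here; the final implementation in Step 4 adds the 24h filter.
    result = lactate_data.sort("recordtime").group_by("icu_stay_id").agg([
        # First and last measurements per ICU stay
        pl.col("value").first().alias("initial_lactate"),
        pl.col("value").last().alias("final_lactate"),
        pl.len().alias("n_measurements")  # use pl.count() on older Polars releases
    ]).filter(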
        # Only include patients with at least 2 measurements
        pl.col("n_measurements") >= 2
    ).with_columns([
        # Calculate clearance with bounds checking
        pl.when(pl.col("initial_lactate") > 0)
        .then(
            ((pl.col("initial_lactate") - pl.col("final_lactate")) / pl.col("initial_lactate") * 100)
            .clip(-200, 200)  # Reasonable bounds
        )
        .otherwise(None)
        .alias("lactate_clearance_24h")
    ]).select(["icu_stay_id", "lactate_clearance_24h"])

    return result

# Test refined version
print("Testing refined calculation...")
# [Include test code here]
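
The guards added in the refined version (at least two measurements, a positive initial value, clipping to ±200%) can also be expressed per patient in plain Python, which makes them easy to check in isolation. This is a sketch under those assumptions; `guarded_clearance` is a hypothetical name, and unlike the Polars version it returns `None` for patients with too few measurements instead of dropping the row.

```python
from typing import Optional, Sequence


def guarded_clearance(values: Sequence[float],
                      lower: float = -200.0,
                      upper: float = 200.0) -> Optional[float]:
    """Clearance from chronologically ordered values, or None if not computable."""
    if len(values) < 2:
        return None  # need an initial and a final measurement
    initial, final = values[0], values[-1]
    if initial <= 0:
        return None  # avoids division by zero and meaningless ratios
    pct = (initial - final) / initial * 100.0
    return max(lower, min(upper, pct))  # clip to the configured bounds


ordinary = guarded_clearance([4.0, 3.0, 2.0])
clipped = guarded_clearance([0.5, 2.5])       # raw value is -400%
zero_initial = guarded_clearance([0.0, 1.0])
single = guarded_clearance([3.0])
print(ordinary, clipped, zero_initial, single)
```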

Step 4: Implement Clean Code in Configuration Files

🧹 Clean Implementation Goals

  • Minimal, production-ready code

  • Proper error handling

  • Clear documentation

  • Follows project conventions

Based on your exploration, implement the clean version in the appropriate configuration files.

For Simple Variables (No Custom Function Needed)

Most variables can be implemented using only vars.json without custom Python code:

File: src/corr_vars/sources/cub_hdp/mapping/vars.json

{
  "variables": {
    "first_lactate_value": {
      "type": "aggregation",
      "base_var": "blood_lactate",
      "select": "!first value",
      "dynamic": false,
      "description": "First blood lactate measurement during ICU stay"
    },
    "max_lactate_24h": {
      "type": "aggregation",
      "base_var": "blood_lactate",
      "select": "!max value",
      "tmin": "icu_admission",
      "tmax": ["icu_admission", "+24h"],
      "dynamic": false,
      "description": "Maximum blood lactate in first 24 hours of ICU"
    }
  }
}
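
The `tmin`/`tmax` pair in `max_lactate_24h` describes a window from ICU admission to 24 hours later. How CORR-Vars parses these fields internally is not shown in this guide, so the sketch below only illustrates one plausible reading of an `[anchor, offset]` pair. `parse_offset` and `window_max` are hypothetical helpers, and offsets are assumed to be whole hours.

```python
import re
from datetime import datetime, timedelta


def parse_offset(offset: str) -> timedelta:
    """Parse offsets such as '+24h' or '-6h' (hours only, by assumption)."""
    match = re.fullmatch(r"([+-])(\d+)h", offset)
    if match is None:
        raise ValueError(f"unsupported offset: {offset!r}")
    sign = 1 if match.group(1) == "+" else -1
    return timedelta(hours=sign * int(match.group(2)))


def window_max(measurements, anchor: datetime, tmax_offset: str):
    """Maximum value with anchor <= time <= anchor + offset, or None if empty."""
    end = anchor + parse_offset(tmax_offset)
    inside = [value for time, value in measurements if anchor <= time <= end]
    return max(inside) if inside else None


icu_admission = datetime(2024, 1, 1, 8, 0)
measurements = [
    (icu_admission + timedelta(hours=1), 3.2),
    (icu_admission + timedelta(hours=10), 4.1),
    (icu_admission + timedelta(hours=30), 6.0),  # after the 24h window
]
max_lactate_24h = window_max(measurements, icu_admission, "+24h")
print(max_lactate_24h)
```

Working through a window by hand like this gives you a reference value to compare against what the configured variable returns for the same patient.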

File: src/corr_vars/sources/aggregation/vars.json

{
  "variables": {
    "shock_index_admission": {
      "type": "derived_static",
      "requires": ["admission_heart_rate", "admission_sbp"],
      "expression": "admission_heart_rate / admission_sbp",
      "cleaning": {"value": {"low": 0.1, "high": 5.0}},
      "dynamic": false,
      "description": "Shock index calculated from admission vital signs"
    }
  }
}
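
The `shock_index_admission` entry combines an arithmetic expression with a `cleaning` range. As a worked illustration, the sketch below divides heart rate by systolic blood pressure and treats results outside 0.1 to 5.0 as missing. Whether CORR-Vars nulls, drops, or clips out-of-range values is an assumption here, and `shock_index` is a hypothetical helper.

```python
from typing import Optional


def shock_index(heart_rate: float, sbp: float,
                low: float = 0.1, high: float = 5.0) -> Optional[float]:
    """Heart rate / systolic BP, or None when undefined or outside [low, high]."""
    if sbp <= 0:
        return None  # division would be undefined or meaningless
    value = heart_rate / sbp
    return value if low <= value <= high else None


typical = shock_index(90, 120)       # typical admission vitals
elevated = shock_index(120, 80)      # tachycardia with hypotension
implausible = shock_index(150, 20)   # ratio of 7.5, treated as missing
print(typical, elevated, implausible)
```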

For Complex Variables (Custom Function Required)

For complex calculations that cannot be expressed as simple aggregations or expressions:

Step 4a: Add to vars.json

{
  "variables": {
    "lactate_clearance_24h": {
      "type": "complex",
      "requires": ["blood_lactate"],
      "dynamic": false,
      "py_ready_polars": true,
      "description": "Blood lactate clearance percentage over first 24 hours of ICU stay"
    }
  }
}

Step 4b: Implement Function in variables.py

File: src/corr_vars/sources/cub_hdp/mapping/variables.py

def lactate_clearance_24h(var, cohort):
    """
    Calculate blood lactate clearance over the first 24 hours of ICU stay.

    Clearance is calculated as: (Initial - Final) / Initial * 100

    Args:
        var: Variable object containing metadata and required variables
        cohort: Cohort object with access to patient data

    Returns:
        polars.DataFrame: DataFrame with icu_stay_id and lactate_clearance_24h columns

    Notes:
        - Requires at least 2 lactate measurements within 24h of ICU admission
        - Initial lactate must be > 0 to calculate meaningful clearance
        - Results are bounded between -200% and +200% to handle outliers
        - Missing or insufficient data results in null values
    """

    try:
        # Get lactate data
        lactate_var = var.required_vars["blood_lactate"]
        if lactate_var.data is None:
            raise ValueError("No lactate data available")

        lactate_data = lactate_var.data

        # Calculate 24h window from ICU admission for each patient
        cohort_times = cohort.obs.select(["icu_stay_id", "icu_admission"])

        # Join with lactate data, filter to the 24h window, and sort by time
        # so that first()/last() below are chronological
        windowed_data = lactate_data.join(
            cohort_times, on="icu_stay_id", how="inner"
        ).filter(
            (pl.col("recordtime") >= pl.col("icu_admission")) &
            (pl.col("recordtime") <= pl.col("icu_admission") + pl.duration(hours=24))
        ).sort("recordtime")

        # Calculate clearance
        result = windowed_data.group_by("icu_stay_id").agg([
            pl.col("value").first().alias("initial_lactate"),
            pl.col("value").last().alias("final_lactate"),
            pl.len().alias("n_measurements")  # use pl.count() on older Polars releases
        ]).filter(
            # Require at least 2 measurements
            pl.col("n_measurements") >= 2
        ).with_columns([
            # Calculate clearance with proper bounds
            pl.when(pl.col("initial_lactate") > 0)
            .then(
                ((pl.col("initial_lactate") - pl.col("final_lactate")) /
                 pl.col("initial_lactate") * 100).clip(-200, 200)
            )
            .otherwise(None)
            .alias("lactate_clearance_24h")
        ])

        # Return only required columns
        return result.select(["icu_stay_id", "lactate_clearance_24h"])

    except Exception as e:
        # Log error and return empty result with correct schema
        print(f"Error calculating lactate clearance: {e}")
        return pl.DataFrame({
            "icu_stay_id": [],
            "lactate_clearance_24h": []
        })

🔧 When to Use variables.py vs vars.json Only

Use vars.json only when:

  • Simple aggregations (first, last, max, min, mean, count)

  • Basic expressions involving arithmetic operations

  • Standard time window filtering

Use variables.py when:

  • Complex multi-step calculations

  • Custom business logic or clinical rules

  • Advanced data transformations

  • Error handling for edge cases

  • Integration of multiple data sources

Step 5: Verify Everything Works

✅ Testing Checklist

Thoroughly test your implementation before creating a pull request.

Create a Test Script

Create test_[variable_name].py in your development directory:

"""
Test script for lactate clearance variable
Run this before submitting PR to ensure everything works
"""

from corr_vars import Cohort
import polars as pl

def test_lactate_clearance():
    """Test the new lactate clearance variable"""

    print("🧪 Testing lactate clearance variable...")

    # 1. Create test cohort
    cohort = Cohort(
        obs_level="icu_stay",
        load_default_vars=False,
        sources={
            "cub_hdp": {
                "database": "db_hypercapnia_prepared",
                "password_file": True,
                "filters": "_d1"  # Small test dataset
            }
        }
    )

    print(f"✓ Test cohort created: {len(cohort.obs)} patients")

    # 2. Add the new variable
    try:
        cohort.add_variable("lactate_clearance_24h")
        print("✓ Variable added successfully")
    except Exception as e:
        print(f"✗ Error adding variable: {e}")
        return False

    # 3. Check results
    if "lactate_clearance_24h" not in cohort.obs.columns:
        print("✗ Variable not found in cohort.obs")
        return False

    clearance_data = cohort.obs["lactate_clearance_24h"]
    n_valid = clearance_data.drop_nulls().len()
    n_total = len(clearance_data)

    print(f"✓ Results: {n_valid}/{n_total} patients have clearance values")

    # 4. Validate data quality
    if n_valid > 0:
        valid = clearance_data.drop_nulls()
        print(f"✓ Value range: {valid.min():.1f}% to {valid.max():.1f}%")

        # Check for reasonable values. A Series is filtered with a boolean
        # mask, not with a pl.col() expression, so count the mask directly.
        extreme_count = ((valid < -200) | (valid > 200)).sum()

        if extreme_count > 0:
            print(f"⚠️  Warning: {extreme_count} extreme values found")
        else:
            print("✓ All values within reasonable range")

    print("🎉 Testing completed successfully!")
    return True

if __name__ == "__main__":
    success = test_lactate_clearance()
    if success:
        print("\n✅ Ready to create pull request!")
    else:
        print("\n❌ Fix issues before creating pull request")

Run the Test

python test_lactate_clearance.py

Manual Verification

# Quick manual check in Jupyter or Python shell
from corr_vars import Cohort

cohort = Cohort(obs_level="icu_stay", sources={"cub_hdp": {"filters": "_d1"}})
cohort.add_variable("lactate_clearance_24h")

# Check a few patients manually
print(cohort.obs.select(["icu_stay_id", "lactate_clearance_24h"]).head())

Step 6: Create Pull Request

🔄 Pull Request Best Practices

A well-structured pull request makes review faster and increases the likelihood of acceptance.

Commit Your Changes

# Add your changes
git add src/corr_vars/sources/cub_hdp/mapping/vars.json
git add src/corr_vars/sources/cub_hdp/mapping/variables.py  # if needed

# Commit with descriptive message
git commit -m "Add lactate clearance variable (closes #123)

- Implements 24-hour lactate clearance calculation
- Requires minimum 2 measurements for reliability
- Handles edge cases with proper bounds (-200% to +200%)
- Tested on sample cohort with good data quality"

# Push to your feature branch
git push origin feature/add-lactate-clearance-variable

Create Pull Request on GitHub

  1. Visit the CORR-Vars repository

  2. Click "Compare & pull request" (should appear after pushing)

  3. Fill out the pull request template:

PR Template:

## Add [Variable Name] Variable

Closes #[issue-number]

### Summary
Brief description of what this variable calculates and its clinical relevance.

### Changes Made
- [ ] Added variable definition to `vars.json`
- [ ] Implemented calculation function in `variables.py` (if needed)
- [ ] Tested on sample cohort
- [ ] Verified data quality and reasonable value ranges

### Variable Details
- **Type**: Static/Dynamic, Derived/Native
- **Dependencies**: List of required variables
- **Output**: Data type and expected range
- **Clinical Use**: Brief clinical context

### Testing
- [ ] Manual testing completed
- [ ] Edge cases handled
- [ ] Performance acceptable on large cohorts
- [ ] Documentation/comments added

### Review Checklist
- [ ] Code follows project style guidelines
- [ ] Variable name is descriptive and follows naming conventions
- [ ] Function includes proper docstring
- [ ] Error handling implemented
- [ ] No breaking changes to existing functionality

Example Pull Request:

Complete PR Example
## Add Blood Lactate Clearance Variable

Closes #123

### Summary
Implements blood lactate clearance calculation as a prognostic marker for septic shock patients. Calculates the percentage change in lactate levels over the first 24 hours of ICU stay.

### Changes Made
- [x] Added `lactate_clearance_24h` to `vars.json`
- [x] Implemented calculation function in `variables.py`
- [x] Tested on sample cohort (n=150 patients)
- [x] Verified 89% of patients with lactate data have valid clearance values

### Variable Details
- **Type**: Derived Static
- **Dependencies**: `blood_lactate` (dynamic variable)
- **Output**: Numeric percentage (-200% to +200%)
- **Clinical Use**: Prognostic marker in sepsis, shock resolution monitoring

### Testing
- [x] Manual testing completed on _d1 filter cohort
- [x] Edge cases: handles single measurements, zero/negative lactate
- [x] Performance: <2 seconds on 10k patient cohort
- [x] Comprehensive docstring and error handling added

### Additional Notes
Formula: `(Initial_Lactate - Final_Lactate) / Initial_Lactate * 100`
- Requires minimum 2 lactate measurements within 24h
- Uses first and last measurements in the time window
- Returns null for insufficient data

Step 7: Trigger Unit Tests with Interactive Auth

🔗 Automated Testing System

CORR-Vars uses automated unit tests to ensure new variables don't break existing functionality and work correctly across different scenarios.

Wait for Bot Comment

After creating your pull request, an automated bot will post a comment with an interactive authentication link. This typically appears within 1-2 minutes:

🤖 **CORR-Vars Test Bot**

Thanks for your contribution! To run the automated tests, please click the link below to authenticate:

🔗 **[Click here to start unit tests](https://auth.corr-vars.charite.de/pr/123/auth)**

This will run the full test suite including:
✅ Unit tests for all existing variables
✅ Integration tests with your new variable
✅ Performance benchmarks
✅ Data quality checks

Click the Authentication Link

  1. Click the authentication link in the bot comment

  2. Log in with your Charité credentials

  3. Authorize the test run for your pull request

  4. Tests will start automatically (typically take 10-15 minutes)

Monitor Test Progress

The bot will update the PR with test progress:

🏃‍♂️ **Tests Running...**

Current Status:
✅ Code style checks (passed)
✅ Unit tests (passed)
🔄 Integration tests (running...)
⏳ Performance benchmarks (queued)

Test Results

Tests will complete with one of these outcomes:

🎉 **All Tests Passed!**

✅ Code style: All checks passed
✅ Unit tests: 847/847 passed
✅ Integration tests: All variables work correctly
✅ Performance: New variable adds <1s overhead
✅ Data quality: No issues detected

Your PR is ready for review! 🚀

❌ **Some Tests Failed**

✅ Code style: All checks passed
❌ Unit tests: 2/847 failed
✅ Integration tests: All variables work correctly
⚠️  Performance: New variable adds 15s overhead (threshold: 10s)

Please review the failed tests and update your code.

**Failed Tests:**
- test_lactate_clearance_edge_cases: Division by zero error
- test_lactate_clearance_performance: Timeout on large cohort

Fix Test Failures (if needed)

If tests fail, examine the error messages and update your code:

# Make fixes based on test feedback
git add -u
git commit -m "Fix edge case handling for zero lactate values"
git push origin feature/add-lactate-clearance-variable

# Tests will automatically re-run on the updated PR

Step 8: Request Review and Merge

👥 Code Review Process

Code review ensures quality, shares knowledge, and catches issues before merge. Be patient and responsive to feedback!

Tag Reviewers

Once tests pass, tag appropriate reviewers in a comment:

@mthiele @nkronenberg Tests are passing! This lactate clearance variable is ready for review.

Key points for review:
- Clinical validation: Formula matches literature standard
- Edge case handling: Tested with zero/negative lactate values
- Performance: <2s overhead on 10k patient cohorts
- Documentation: Full docstring with clinical context

Respond to Review Feedback

Reviewers may request changes or ask questions:

Example Review Feedback

Reviewer Comment:

"Thanks for this contribution! The implementation looks solid. A few suggestions:

1. Could you add validation for extremely high initial lactate values (>20 mmol/L)? These might be measurement errors.

2. Consider using 6-hour clearance as an additional option, as some studies show this is more predictive.

3. Minor: The variable name could be more specific - maybe `lactate_clearance_24h` instead of just `lactate_clearance`?"

Address Feedback:

# Make requested changes
# Edit variables.py to add validation and rename variable

git add -u
git commit -m "Address review feedback:

- Add validation for lactate >20 mmol/L
- Rename to lactate_clearance_24h for clarity
- Add TODO for 6-hour clearance variant"

git push origin feature/add-lactate-clearance-variable

Final Approval and Merge

Once reviewers approve:

  1. Maintainer merges your pull request

  2. Variable becomes available in the next release

  3. GitHub issue closes automatically

  4. Your contribution is live! 🎉

🎊 Congratulations!

Your variable is now part of CORR-Vars and available to researchers worldwide! You've contributed to advancing clinical research with real-world data.

Post-Merge Follow-up

Clean Up Your Local Environment

# Switch back to main and update
git checkout main
git pull origin main

# Delete your feature branch (optional)
git branch -d feature/add-lactate-clearance-variable

Monitor Usage and Feedback

  • Watch for any issues reported with your variable

  • Consider contributing documentation or examples

  • Think about related variables that could be added

Share Your Success

  • Add your contribution to your CV/portfolio

  • Share with your research team

  • Consider presenting at department meetings

Common Pitfalls and Tips

⚠️ Common Mistakes to Avoid

Learn from others' experiences to avoid these common issues:

🐛 Data Quality Issues
  • Problem: Not handling missing/invalid data

  • Solution: Always validate inputs and handle edge cases

  • Example: Check for null values, negative measurements, extreme outliers

⏱️ Performance Problems
  • Problem: Slow calculations on large cohorts

  • Solution: Use efficient Polars operations, avoid loops

  • Example: Use .group_by() instead of patient-by-patient processing

📝 Poor Documentation
  • Problem: Unclear variable purpose or calculation

  • Solution: Write comprehensive docstrings with clinical context

  • Example: Include formula, units, expected ranges, clinical use

🧪 Insufficient Testing
  • Problem: Edge cases not discovered until production

  • Solution: Test with diverse patient populations and data scenarios

  • Example: Test with single measurements, extreme values, missing data
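
Several of the pitfalls above come down to counting problem values before computing anything. A minimal plain-Python sketch follows; the 20 mmol/L threshold echoes the reviewer comment in Step 8 and is only an illustrative default, and `quality_report` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def quality_report(values: Iterable[Optional[float]],
                   extreme_above: float = 20.0) -> Dict[str, int]:
    """Count missing, negative, and implausibly high measurements."""
    report = {"total": 0, "missing": 0, "negative": 0, "extreme": 0}
    for value in values:
        report["total"] += 1
        if value is None:
            report["missing"] += 1
        elif value < 0:
            report["negative"] += 1
        elif value > extreme_above:
            report["extreme"] += 1
    return report


report = quality_report([1.8, None, -0.4, 35.0, 2.2])
print(report)
```

On a real cohort the same counts are better computed with vectorized Polars operations; the loop here only makes the categories explicit.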

Pro Tips for Success:

💡 Expert Tips

  1. Start Simple: Begin with basic aggregations before complex calculations

  2. Clinical Validation: Verify results match clinical expectations

  3. Performance First: Optimize for speed from the beginning

  4. Document Everything: Future users (including yourself) will thank you

  5. Ask for Help: Engage with the community early and often

  6. Iterative Development: Get feedback on design before full implementation

Additional Resources

📚 Helpful Links

Development Tools:

  • Code Style: Follow PEP 8, use black for formatting

  • Testing: Write unit tests for complex functions

  • Documentation: Use clear docstrings and type hints

  • Version Control: Make atomic commits with descriptive messages

🤝 Join the Community

Become an active contributor to the CORR-Vars ecosystem:

  • Contribute Variables: Start with this tutorial

  • Improve Documentation: Help other researchers learn

  • Report Issues: Identify bugs and suggest improvements

  • Share Experience: Present your work at conferences

  • Mentor Others: Help new contributors get started


Happy contributing! Your clinical expertise combined with this development workflow will help advance medical research worldwide. 🚀