Legacy Interface (Pandas-Based)
Migration Recommended
For new projects, use the Polars-native interface (from corr_vars import Cohort), which is 2-10x faster and more memory-efficient.
This legacy interface is provided for backward compatibility only.
Overview
The legacy interface provides a Pandas-based wrapper around the new Polars-native CORR-Vars interface. It was created to maintain backward compatibility with existing code that uses Pandas DataFrame methods, allowing researchers to continue using their existing analysis pipelines without immediate rewrites.
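| Aspect | Legacy Interface 🐼 | Polars-Native Interface ⚡ |
|---|---|---|
| Import | from corr_vars.legacy_v1 import Cohort | from corr_vars import Cohort |
| Data Access | cohort.obs and cohort.obsm return Pandas DataFrame wrappers | cohort.obs and cohort.obsm return Polars DataFrames |
| Performance | Slower (conversion overhead) | 2-10x faster |
| Memory Usage | Higher (dual storage) | Lower (single storage) |
| Syntax | Familiar Pandas syntax | Modern Polars expressions |
| Recommendation | Existing code migration | New projects |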
What is the Legacy Interface?
Note
Architecture Overview
The legacy interface is a compatibility layer that bridges old and new in three stages:
Pandas-style calls: .loc[], .groupby(), .head()
Legacy wrapper: automatic conversion
Polars backend: fast and efficient
Key Features:
Seamlessly converts between Polars (internal) and Pandas (user-facing) representations
Preserves familiar Pandas methods like .loc[], .iloc[], .groupby()
Uses the new Polars backend internally for improved performance and stability
Maintains compatibility with existing analysis scripts and workflows
Using the Legacy Interface
# Legacy interface (Pandas access)
from corr_vars.legacy_v1 import Cohort

cohort = Cohort(
    obs_level="icu_stay",
    database="db_hypercapnia_prepared",
    password_file=True,
)
# Pandas DataFrame access
print(type(cohort.obs))   # pandas.DataFrame wrapper
print(type(cohort.obsm))  # dict of pandas.DataFrame wrappers

# Polars-native interface (Recommended)
from corr_vars import Cohort

cohort = Cohort(
    obs_level="icu_stay",
    sources={"cub_hdp": {"database": "db_hypercapnia_prepared", "password_file": True}},
)
# Polars DataFrame access
print(type(cohort.obs))   # polars.DataFrame
print(type(cohort.obsm))  # dict of polars.DataFrame
💡 Quick Start Guide
Step 1: Import the legacy interface
Step 2: Create your cohort with familiar parameters
Step 3: Use standard Pandas syntax for analysis
Pandas-Style Data Access:
# Static data access (exactly like Pandas)
print(cohort.obs.head()) # First 5 rows
print(cohort.obs.shape) # (n_rows, n_cols)
print(cohort.obs.columns.tolist()) # Column names
# Familiar Pandas indexing and filtering
adults = cohort.obs[cohort.obs["age_on_admission"] >= 18]
males = cohort.obs.loc[cohort.obs["sex"] == "M"]
specific_patient = cohort.obs.iloc[0]
# Pandas aggregation methods
summary = cohort.obs.groupby("sex").agg({
    "age_on_admission": ["mean", "std"],
    "inhospital_death": "sum",
})
Time-Series Data Access:
# Add dynamic variable
cohort.add_variable("blood_sodium")
# Access time-series data (Pandas DataFrame)
sodium_data = cohort.obsm["blood_sodium"]
print(type(sodium_data)) # <class 'LegacyObsmDataframe'> (behaves like pd.DataFrame)
# Familiar Pandas time-series operations
patient_data = sodium_data[sodium_data["icu_stay_id"] == "12345"]
daily_avg = sodium_data.groupby(sodium_data["recordtime"].dt.date)["value"].mean()
# Standard Pandas methods work
print(sodium_data.describe())
print(sodium_data.value_counts())
Column Assignment (Limited):
# Direct column assignment works for obs
# BMI = weight (kg) / height (m) squared; height is assumed to be stored in cm
cohort.obs["bmi"] = cohort.obs["weight"] / (cohort.obs["height"] / 100) ** 2
cohort.obs["is_elderly"] = cohort.obs["age_on_admission"] > 65
# Note: obsm DataFrames are read-only to prevent data corruption
# sodium_data["new_col"] = 1 # This will raise NotImplementedError
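The read-only behaviour described above can be imitated with a small Pandas subclass. This is only a sketch under assumptions: ReadOnlyFrame is a hypothetical name, and the actual LegacyObsmDataframe may be implemented differently. It also shows one way to get a writable frame when you need derived columns: take an explicit copy.

```python
import pandas as pd


class ReadOnlyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that rejects column assignment."""

    def __setitem__(self, key, value):
        raise NotImplementedError(
            "This frame is read-only; copy it into a plain DataFrame first."
        )


sodium_data = ReadOnlyFrame(
    {"icu_stay_id": ["12345", "67890"], "value": [138.0, 141.5]}
)

rejected = False
try:
    sodium_data["new_col"] = 1
except NotImplementedError as exc:
    rejected = True
    print(f"Rejected: {exc}")

# An explicit copy is a plain, writable DataFrame; the original is untouched.
editable = pd.DataFrame(sodium_data, copy=True)
editable["new_col"] = 1
print(list(editable.columns))
```

Working on a copy keeps the stored time-series data intact, which is the purpose of the read-only restriction.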
Pandas Method Compatibility
Full Pandas Compatibility
The legacy interface supports most common Pandas DataFrame methods out of the box!
.head(), .tail(), .info(), .describe(), .shape, .columns, .nunique(), .value_counts(), .isnull(), .dtypes
.loc[], .iloc[], .at[], .iat[], .query(), boolean indexing, .filter(), .select_dtypes()
.groupby(), .pivot_table(), .merge(), .join(), .sort_values(), .drop(), .drop_duplicates()
.mean(), .median(), .std(), .corr(), .agg(), .apply(), .transform()
.resample(), .rolling(), .expanding(), DateTime indexing
Direct plotting with matplotlib
Seaborn compatibility
Works with existing viz code
Limitations of the Legacy Interface
Warning
Important Limitations to Consider
While the legacy interface maintains compatibility, it has several important limitations that may affect performance and functionality.
Detailed Limitation Analysis
Understanding these limitations will help you decide when to migrate to the Polars-native interface.
Performance Limitations
# Legacy interface: Data conversion overhead
from corr_vars.legacy_v1 import Cohort
large_cohort = Cohort(obs_level="icu_stay", load_default_vars=True)  # Slower
# Polars-native: Direct access, no conversion
from corr_vars import Cohort as PolarsCohort
fast_cohort = PolarsCohort(obs_level="icu_stay", load_default_vars=True)  # Faster
Memory Overhead: Data is stored in Polars but converted to Pandas for access, requiring additional memory
Conversion Costs: Each access to .obs or .obsm triggers a Polars → Pandas conversion
Large Dataset Issues: Very large cohorts may hit memory limits during conversion
Slower Operations: Pandas operations are generally slower than equivalent Polars operations
Functional Limitations
# 1. Limited obsm modification
cohort.obsm["blood_sodium"]["new_column"] = 1 # NotImplementedError
# 2. No direct polars access
# cohort._obs.filter(pl.col("age") > 18) # Not recommended, internal API
# 3. Some advanced Polars features unavailable
# No lazy evaluation, no expression API
Read-Only obsm: Time-series DataFrames (obsm) are read-only to prevent data corruption
No Polars Expression API: Cannot use Polars’ powerful expression syntax
No Lazy Evaluation: Cannot benefit from Polars’ lazy evaluation optimizations
Limited Parallel Processing: Pandas operations are less optimized for parallel execution
Data Type Limitations
# Some Polars data types don't translate perfectly to Pandas
# May lose precision or type information in edge cases
print(cohort.obs.dtypes) # May show different types than native Polars
Type Conversion Issues: Some Polars types may not translate perfectly to Pandas
Precision Loss: Potential precision loss in numeric conversions
Missing Value Handling: Different null/missing value semantics between libraries
Migration Guide: Legacy → Polars-Native
Step 1: Update Imports
# Before (Legacy)
from corr_vars.legacy_v1 import Cohort
# After (Polars-native)
from corr_vars import Cohort
Step 2: Update Data Access Patterns
# Legacy Pandas syntax
adults = cohort.obs[cohort.obs["age_on_admission"] >= 18]
male_patients = cohort.obs.loc[cohort.obs["sex"] == "M"]
# Polars-native equivalent
import polars as pl
adults = cohort.obs.filter(pl.col("age_on_admission") >= 18)
male_patients = cohort.obs.filter(pl.col("sex") == "M")
Step 3: Update Aggregations
# Legacy Pandas groupby
summary = cohort.obs.groupby("sex").agg({
    "age_on_admission": ["mean", "std"],
    "inhospital_death": "sum",
})
# Polars-native equivalent
summary = cohort.obs.group_by("sex").agg([
    pl.col("age_on_admission").mean().alias("age_mean"),
    pl.col("age_on_admission").std().alias("age_std"),
    pl.col("inhospital_death").sum().alias("deaths"),
])
Step 4: Update Time-Series Operations
# Legacy Pandas time-series
patient_data = cohort.obsm["blood_sodium"][
    cohort.obsm["blood_sodium"]["icu_stay_id"] == "12345"
]
# Polars-native equivalent
patient_data = cohort.obsm["blood_sodium"].filter(
    pl.col("icu_stay_id") == "12345"
)
Benefits of Migration:
2-10x Performance Improvement for most operations
Lower Memory Usage (no conversion overhead)
Better Type Safety and error handling
Access to Modern Features like lazy evaluation and expression API
Future-Proof Code, as the legacy interface may be deprecated
When to Use Legacy vs. Polars-Native
Use the legacy interface when:
- Migrating Existing Code: You have extensive Pandas-based analysis pipelines
- Team Training Time: Your team needs time to learn Polars syntax
- External Dependencies: Your code integrates with Pandas-only libraries
- Proof of Concepts: Quick prototyping with familiar syntax
- Temporary Migration Step: Use as a stepping stone to Polars-native
Use the Polars-native interface when:
- New Projects: Starting fresh analysis projects
- Performance Critical: Working with large datasets or complex operations
- Memory Constrained: Limited memory environments
- Production Code: Building robust, long-term analysis pipelines
- Modern Features: Want to leverage advanced Polars capabilities
- Future-Proof Choice: Recommended for all new development
🎯 Decision Matrix
| Your Situation | Legacy Interface 🐼 | Polars-Native ⚡ |
|---|---|---|
| New research project | Not recommended | Recommended |
| Existing Pandas codebase | Good transition option | Migrate gradually |
| Large datasets (>1GB) | Performance issues | Optimal performance |
| Team learning curve | Familiar syntax | Investment in learning |
| Production deployment | Legacy, may be deprecated | Future-proof |
Example: Side-by-Side Comparison
Real-World Performance Example
Both examples below compute the same summary statistics, but with different performance characteristics and differently shaped results.
# Legacy interface (Pandas)
from corr_vars.legacy_v1 import Cohort

# Create cohort (slower initialization)
cohort = Cohort(obs_level="icu_stay", database="db_hypercapnia_prepared")
# Pandas-style analysis (familiar syntax)
adults = cohort.obs[cohort.obs["age_on_admission"] >= 18]
summary = adults.groupby("sex").agg({
    "age_on_admission": "mean",
    "inhospital_death": "sum",
})
# Time-series analysis (add the dynamic variable before reading it from obsm)
cohort.add_variable("blood_sodium")
sodium = cohort.obsm["blood_sodium"]
patient_trends = sodium.groupby("icu_stay_id")["value"].agg(["first", "last", "mean"])
# Polars-native interface
from corr_vars import Cohort
import polars as pl

# Create cohort (faster initialization)
cohort = Cohort(
    obs_level="icu_stay",
    sources={"cub_hdp": {"database": "db_hypercapnia_prepared"}},
)
# Polars-style analysis (faster execution)
summary = cohort.obs.filter(pl.col("age_on_admission") >= 18).group_by("sex").agg([
    pl.col("age_on_admission").mean().alias("mean_age"),
    pl.col("inhospital_death").sum().alias("deaths"),
])
# Time-series analysis (add the dynamic variable before reading it from obsm)
cohort.add_variable("blood_sodium")
patient_trends = cohort.obsm["blood_sodium"].group_by("icu_stay_id").agg([
    pl.col("value").first().alias("first_sodium"),
    pl.col("value").last().alias("last_sodium"),
    pl.col("value").mean().alias("mean_sodium"),
])
Ready to Migrate?
Start your migration with Tutorials and Getting Started, then explore the Custom Variables Guide to learn modern Polars patterns.
Related Documentation
Tutorials and Getting Started - Learn the Polars-native interface
Troubleshooting Guide - Common migration issues and solutions
Custom Variables Guide - Advanced variable creation patterns
Cohort - Full Polars-native Cohort documentation