
NIMHANS — Catatonia Outcome Prediction

Professor Assistant — Data Scientist · NIMHANS, Bengaluru

Role

Professor Assistant — Data Scientist

Timeline

Aug 2023 – Jun 2024

Context

NIMHANS, Bengaluru

Type

ML / Healthcare Research

Problem

Catatonia is a complex neuropsychiatric condition where treatment decisions depend heavily on clinician judgment built from experience. With 450+ patient records available at NIMHANS, there was an opportunity to apply ML to surface patterns that could support — not replace — clinical decision-making around diagnosis and treatment planning.

Context

NIMHANS is India's premier mental health research and clinical institution. The project involved applying ML to clinical patient records to explore whether outcomes could be predicted with sufficient accuracy and interpretability to be clinically useful. The interpretability requirement was non-negotiable: a black-box model in a clinical context creates accountability problems that no accuracy metric can offset.

Why It Mattered

Catatonia is underdiagnosed and undertreated, partly because the clinical features are varied and the condition presents differently across patients. A model that could surface the features most predictive of outcomes — and make those patterns visible to clinicians — had potential to improve diagnosis consistency and treatment planning. The catch: it had to be explainable enough for clinicians to trust it.

My Role

I owned the modeling pipeline end-to-end — data preprocessing, feature engineering, model development, benchmarking, and interpretability analysis — and delivered the research documentation.

What I Did

Cleaned and preprocessed 450+ patient records. Clinical data missingness isn't random — it often reflects something about the patient or the clinical workflow — so I handled it thoughtfully rather than just imputing means.
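One way to treat missingness as signal rather than noise is to add an explicit indicator column before imputing. A minimal sketch on synthetic data (the column names and the median-fill choice are illustrative assumptions, not the actual NIMHANS preprocessing):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for clinical records; real column names differ.
df = pd.DataFrame({
    "symptom_duration_days": rng.integers(1, 90, size=8).astype(float),
    "bfcrs_score": rng.integers(0, 60, size=8).astype(float),  # hypothetical rating-scale column
})
df.loc[[2, 5], "bfcrs_score"] = np.nan  # simulate clinically meaningful gaps

for col in ["bfcrs_score"]:
    # Flag missingness first so the model can learn from the pattern itself,
    # then impute with a robust statistic instead of the mean.
    df[f"{col}_missing"] = df[col].isna().astype(int)
    df[col] = df[col].fillna(df[col].median())

print(df[["bfcrs_score", "bfcrs_score_missing"]])
```

The indicator column survives into modeling, so "this value was never recorded" remains a feature the model can weight.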

Conducted exploratory analysis before touching models: variable distributions, inter-variable correlations, and class imbalance. Built and benchmarked three classification models — Random Forest, Logistic Regression, and SVC — evaluating on precision, recall, and AUC-ROC. Accuracy alone is a misleading metric on imbalanced clinical datasets, so I didn't rely on it.
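The benchmarking step might look like the following sketch, using a synthetic imbalanced dataset as a stand-in for the patient records (the class ratio, feature count, and hyperparameters are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ~450 records with a 3:1 class imbalance, mirroring the scale described above.
X, y = make_classification(n_samples=450, n_features=20,
                           weights=[0.75, 0.25], random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "svc": make_pipeline(StandardScaler(), SVC(probability=True, random_state=42)),
}

# Score on precision, recall, and AUC-ROC rather than accuracy alone.
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=5,
                            scoring=["precision", "recall", "roc_auc"])
    results[name] = {m: scores[f"test_{m}"].mean()
                     for m in ("precision", "recall", "roc_auc")}
    print(name, results[name])
```

Cross-validated precision/recall/AUC makes the comparison honest on an imbalanced dataset, where a classifier that always predicts the majority class would score well on accuracy alone.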

Achieved 75% accuracy with the best-performing model. Identified 15 key clinical features with the strongest influence on outcomes.
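Extracting a ranked top-15 feature list from a tree ensemble can be sketched as follows (synthetic data and generic feature names; the real clinical feature names and the importance method used in the project are not specified here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 30 candidate features, 15 of them informative.
X, y = make_classification(n_samples=450, n_features=30,
                           n_informative=15, random_state=7)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=300, random_state=7).fit(X, y)

# Rank features by impurity-based importance, keep the top 15.
order = np.argsort(rf.feature_importances_)[::-1]
top_15 = [feature_names[i] for i in order[:15]]
print(top_15)
```

Impurity-based importances can be biased toward high-cardinality features, so in a clinical setting it is worth cross-checking the ranking with permutation importance before presenting it to clinicians.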

Used visual reports — ROC curves and confusion matrices — to present findings to non-technical stakeholders including clinicians and researchers. The goal was interpretable outputs that enabled informed clinical judgment, not just a technical summary for data scientists.
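The underlying numbers for those visual reports can be computed as below, again on synthetic data as a stand-in for the held-out patient records:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=450, n_features=20,
                           weights=[0.75, 0.25], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

clf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, clf.predict(X_te))
fpr, tpr, _ = roc_curve(y_te, proba)
roc_auc = auc(fpr, tpr)
print("confusion matrix:\n", cm)
print(f"AUC = {roc_auc:.2f}")
```

For stakeholder-facing plots, scikit-learn's `RocCurveDisplay` and `ConfusionMatrixDisplay` render these directly; the confusion matrix in particular lets clinicians see false negatives and false positives as counts of patients rather than abstract metrics.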

Documented deployment prerequisites explicitly: clinical validation, prospective testing, and institutional review. The model was not clinically ready, and I said so in the documentation.

Key Decisions & Tradeoffs

Decision 1

Benchmarked three model types rather than picking one upfront. Random Forest, Logistic Regression, and SVC each have different interpretability profiles and clinical suitability characteristics. The comparison was the product, not just an engineering step.

Decision 2

Presented via ROC curves and confusion matrices rather than technical metrics alone. Non-technical stakeholders — the clinicians who would ultimately use or reject the model's outputs — needed to understand what the model got right and wrong in terms they could reason about.

Decision 3

Flagged deployment prerequisites explicitly and in writing. A 75% accurate model is technically interesting. Whether it should be used in clinical practice is a different question — one that requires clinical validation, prospective testing, and IRB review. I documented this clearly rather than leaving it implicit.

Outcome

ML pipeline built across 450+ patient records. 75% accuracy achieved; 15 key clinical features identified and documented. Findings presented via visual reports enabling non-technical stakeholders to engage with the results. Research summary delivered with full methodology and explicit deployment prerequisites.

Reflection

Working in healthcare data forced a discipline around 'what should we build?' that purely technical environments don't require. A 75% accurate model sounds good until you ask: what happens in the 25% of cases where it's wrong, and who bears that cost? That question — what happens when the model fails — is one every PM building AI products should have a specific answer to before shipping anything.