Michael W. Kattan, PhD

Department Chair

The Dr. Keyhan and Dr. Jafar Mobasseri Endowed Chair for Innovations in Cancer Research (joint appointment with the Department of Urology in the GUKI)

Lerner Research Institute, 9500 Euclid Avenue, Cleveland, Ohio 44195


My general research interest lies in medical decision making. More specifically, my research focuses on the development, validation, and use of prediction models. Most of these models are available online at http://riskcalc.org/ and are designed for physician use. I am also interested in quality-of-life assessment to support medical decision making (such as utility assessment), decision analysis, cost-effectiveness analysis, and comparative effectiveness.

Here are some pages you might want to check out:

  1. My official Cleveland Clinic page
  2. My list of publications
  3. My Google Scholar page
  4. My ResearchGate page
  5. Researchers with the highest h-indices (#1072 through 3/2022)
  6. Most-cited scientists (#2825 through 2021)
  7. The most cited authors in urologic surgery (#1)

Lay Summary

Ever since my dissertation, “A Comparison of Machine Learning with Traditional Statistical Techniques,” I’ve had a long-standing interest in machine learning (ML) and artificial intelligence (AI). At first, I compared AI with human experts [1] to better understand when one should outperform the other [2]. I then compared ML with traditional statistical techniques, similarly trying to understand when one would prove superior. I first developed a theoretical framework describing the factors that should drive performance in favor of or against ML in any given situation [3]. With the framework in place, I simulated data to illustrate its validity [4], and I later published a condensed illustration of it [5]. When I wanted to apply ML more broadly, I noticed that these methods were not well suited to time-until-event data, so I built extensions to handle it [6,7]. With those extensions in place, I was able to compare a variety of ML techniques with Cox proportional hazards regression, the standard statistical approach for time-until-event data [8]. What matters most is how well ML and AI techniques fare on real-world data; to this end, I’ve studied their performance when predicting prostate cancer recurrence [9], clinical deterioration on the wards [10], and pelvic floor disorders after delivery [11]. In a recent comparative effectiveness study of bariatric surgery, we found that random forests were best for 2 of the models, but regression was superior for the remaining 6 models [12]. Disappointed by random forests, we pursued an enhancement: adding multi-objective particle swarm optimization (MOPSO). The combination outperformed random forests alone [13], though random forests on their own did well for us when predicting progression of diabetic kidney disease [14]. More recently, when predicting good postoperative depth of focus after cataract surgery, extreme gradient boosting (XGBoost) outperformed logistic regression [15].
All of these complex issues regarding machine learning vs. statistical methods are discussed in our book [16].      

1.         Kattan, M.W., Inductive expert systems vs. human experts. AI Expert, 1994: p. 32-38.

2.         Kattan, M.W., D.A. Adams, and M.S. Parks, A Comparison of Machine Learning with Human Judgment. J Management Inf Sys, 1993. 9(4): p. 37-57.

3.         Kattan, M.W. and R.B. Cooper, The predictive accuracy of computer-based classification decision techniques. A review and research directions. Omega Int J Mgmt Sci, 1998. 26(4): p. 467-482.

4.         Kattan, M.W. and R.B. Cooper, A simulation of factors affecting machine learning techniques: an examination of partitioning and class proportions. Omega Int J Mgmt Sci, 2000. 28: p. 501-512.

5.         Kattan, M.W., Statistical prediction models, artificial neural networks, and the sophism "I am a patient, not a statistic". J Clin Oncol, 2002. 20(4): p. 885-887.

6.         Zupan, B., (Kattan, M.W.) et al., Machine learning for survival analysis: a case study on recurrence of prostate cancer. Artif Intell Med, 2000. 20(1): p. 59-75.

7.         Kattan, M.W., K.R. Hess, and J.R. Beck, Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Comput Biomed Res, 1998. 31: p. 363-373.

8.         Kattan, M.W., Comparison of Cox regression with other methods for determining prediction models and nomograms. J Urol, 2003. 170(Supplement): p. S6-S10.

9.         Cordon-Cardo, C., (Kattan, M.W.) et al., Improved prediction of prostate cancer recurrence through systems pathology. J Clin Invest, 2007. 117(7): p. 1876-1883.

10.       Churpek, M.M., (Kattan, M.W.) et al., Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards. Crit Care Med, 2016. 44(2): p. 368-74.

11.       Jelovsek, J., (Kattan, M.W.) et al., Predicting risk of pelvic floor disorders 12 and 20 years after delivery. Am J Obstet Gynecol, 2018. 218(2): p. 222.e1-222.e19.

12.       Aminian A, Zajichek A, Arterburn DE, Wolski KE, Brethauer SA, Schauer PR, Nissen SE, Kattan MW. Predicting 10-Year Risk of End-Organ Complications of Type 2 Diabetes With and Without Metabolic Surgery: A Machine Learning Approach. Diabetes Care. 2020 Apr;43(4):852-859. doi: 10.2337/dc19-2057. Epub 2020 Feb 6. Erratum in: Diabetes Care. 2020 Jun;43(6):1367. PMID: 32029638; PMCID: PMC7646205.

13.       Asadi S, Roshan S, Kattan MW. Random forest swarm optimization-based for heart diseases diagnosis. J Biomed Inform. 2021 Feb 1;115:103690. doi: 10.1016/j.jbi.2021.103690. Epub ahead of print. PMID: 33540075.

14.       Chan L, Nadkarni GN, Fleming F, McCullough JR, Connolly P, Mosoyan G, El Salem F, Kattan MW, Vassalotti JA, Murphy B, Donovan MJ, Coca SG, Damrauer SM. Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict progression of diabetic kidney disease. Diabetologia. 2021 Apr 2. doi: 10.1007/s00125-021-05444-0. Epub ahead of print. PMID: 33797560.

15.       Liu Y, Wei D, Bai T, Luo J, Wood J, Vashisht A, Zhang S, Xuan J, Kattan M, Coplan P. Using machine learning to predict post-operative depth of focus after cataract surgery with implantation of Tecnis Symfony. Eur J Ophthalmol. 2021 Feb 2:1120672121991777. doi: 10.1177/1120672121991777. Epub ahead of print. PMID: 33530727.

16.       Gerds TA and Kattan MW. (2021).  Medical Risk Prediction Models With Ties to Machine Learning.  CRC Press.


BREAKING NEWS!  Our book is out.  Get it here

Before getting into specific research interests, here are some useful statistical reporting guidelines, and here are ideas for nice figures and tables.   


Inspired by personal frustrations with medical uncertainty, I am particularly interested in statistical prediction models and medical decision making:

A. Prediction Model Development

  1. Here are the requirements for having a statistical prediction model endorsed by the American Joint Committee on Cancer.
  2. Here is how we process data from Epic to make it research-ready.
  3. In the TRIPOD group, we came up with a checklist of what should be reported in a paper presenting a prediction model.
  4. Making a prediction model when there is a time-varying covariate.
  5. Propensity scores do not improve the accuracy of statistical prediction models.
  6. Here's the code to make binary, ordinal, and survival outcome nomograms.
  7. Here's how to make a competing risks regression nomogram. Detailed R code is here.
  8. Machine learning approaches usually lose to traditional regression.

B. Prediction Model Assessment

  1. Here's a decent way to compare two rival prediction tools that both predict on an ordinal scale.
  2. How to make a calibration plot for a prediction model in the presence of competing risks.
  3. How to estimate a time-dependent concordance index.
  4. This is why you can't compare two prediction models tested on separate datasets. The figure is updated here.
  5. A guide to the many metrics for assessing prediction models.
  6. How to determine the area under the ROC curve for a binary diagnostic test.
  7. The concordance index is not a proper scoring rule. Use the Index of Predictive Accuracy (IPA) instead.
  8. This is a framework for reviewers when evaluating statistical prediction modeling manuscripts.
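
Items 6 and 7 above can be made concrete with a small sketch. Assuming a binary outcome with no censoring (the time-dependent versions in items 3 and 7 need more machinery), the concordance (C) index equals the area under the ROC curve, and IPA rescales the Brier score against a null model that predicts the overall event rate for everyone. The function names and the six-patient data set are hypothetical, for illustration only.

```python
"""Minimal sketch: C index and Index of Predictive Accuracy (IPA) for a
binary outcome, assuming no censoring. Names and data are hypothetical."""

from itertools import product


def brier(y, p):
    """Mean squared difference between observed 0/1 outcomes and predicted risks."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def ipa(y, p):
    """IPA = 1 - Brier(model) / Brier(null model).

    Assumption: the null model predicts the observed event rate for everyone.
    """
    prevalence = sum(y) / len(y)
    null = [prevalence] * len(y)
    return 1 - brier(y, p) / brier(y, null)


def c_index(y, p):
    """Share of (event, non-event) pairs where the event got the higher risk.

    Ties in predicted risk count as one half; for binary outcomes this is the
    ROC AUC.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe, pn in product(events, nonevents):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5
    return score / (len(events) * len(nonevents))


outcomes = [0, 0, 1, 1, 0, 1]
risks = [0.1, 0.3, 0.7, 0.8, 0.4, 0.6]
halved = [r / 2 for r in risks]  # same ranking, badly calibrated

print(c_index(outcomes, risks), c_index(outcomes, halved))
print(round(ipa(outcomes, risks), 3), round(ipa(outcomes, halved), 3))
```

Halving every prediction keeps the ranking, so the C index stays at 1.0 for both, while IPA drops from about 0.633 to about 0.108 because the halved risks are poorly calibrated. That blindness to calibration is one way to see why the C index alone can reward a model that gives patients the wrong numbers.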

C. Prediction Communication and Interpretation

  1. As cancer survivors, we like to think we both needed the treatment we received and were cured by it, but that is hard to prove.
  2. Here's an example of how patients should be counseled: a table of tailored predictions of benefits and harms crossed by treatment options.
  3. It is useless and confusing to put confidence intervals around a predicted probability.
  4. You must apply a statistical prediction model to achieve informed consent.
  5. What is a real nomogram anyway?
  6. My definition of comparative effectiveness.
  7. Too often we diagnose patients based on some arbitrary cutoff.  Let's stop doing that and recognize risk is on a continuum.
  8. Don't just look at the p-value when judging a new marker.
  9. Cancer staging systems need to go away.
  10. "I'm a patient, not a statistic" is bogus.
  11. Here is how we make our online risk calculators.
  12. Patients found our risk calculator decision aid easy to use and useful.
  13. Simplifying a regression model with friendly integers loses accuracy.

D. Predictions Doctors Make

  1. The wisdom of crowds of doctors: averaging their individual predictions improves accuracy over the individuals themselves.
  2. Probably due to cognitive biases, predicted probabilities coming from doctors are
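
The "wisdom of crowds" idea in item 1 can be illustrated with a short sketch. It assumes accuracy is measured by the Brier score and that every doctor predicts for the same patients; the outcomes and predictions are made up, not study data.

```python
# Hypothetical sketch: averaging doctors' predicted probabilities patient by
# patient and comparing Brier scores. All numbers are invented for illustration.


def brier(y, p):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def crowd_average(predictions):
    """Average each patient's predicted probability across doctors."""
    n_doctors = len(predictions)
    return [sum(col) / n_doctors for col in zip(*predictions)]


outcomes = [1, 0, 1, 0]            # four hypothetical patients
doctors = [
    [0.9, 0.6, 0.5, 0.1],          # doctor A
    [0.5, 0.1, 0.9, 0.5],          # doctor B
    [0.7, 0.2, 0.4, 0.3],          # doctor C
]

for name, preds in zip("ABC", doctors):
    print(name, round(brier(outcomes, preds), 4))
print("crowd", round(brier(outcomes, crowd_average(doctors)), 4))
```

Here the averaged predictions score 0.1075, lower than any single doctor (0.13 to 0.1575). Because squared error is convex, the Brier score of the averaged predictions can never exceed the average of the doctors' individual Brier scores; beating the best single doctor, as in this toy example, is not guaranteed.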

E. Decision Analysis and Utility Assessment

  1. The method used to measure utilities affects the decision analytic recommendation.
  2. Unfortunately, you probably have to measure individual patient utilities to run a decision analysis on someone.
  3. Stop multiplying health state utilities to get the utility of the combined health state.
  4. Why utilities are more helpful than traditional health-related quality of life measures with respect to medical decision making.
  5. The layout of the time trade-off is problematic.
  6. How to administer the standard gamble on paper.

F. Novel Uses of Prediction Models

  1. Here's an example of how to make a synthetic control arm for a single arm study, using a prediction model.  Here's how to calculate the p-value.
  2. Rather than running a decision analysis at the bedside, apply a nomogram instead -- much easier and same answer.
  3. Here are some thoughts about using prediction models for clinical trial design.
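
One possible way to carry out item 1 is sketched below, under assumptions that are not taken from the linked page: a validated prediction model supplies each treated patient's probability of the event under standard care, patients are independent, and the question is whether the single arm had fewer events than the model predicts. The exact one-sided p-value then comes from the Poisson-binomial distribution of the event count. The function names are hypothetical.

```python
"""Hedged sketch of a synthetic-control comparison for a single-arm study.

Assumptions: model-predicted event probabilities under standard care,
independent patients, one-sided test for fewer events than predicted."""


def poisson_binomial_pmf(probs):
    """pmf[k] = P(exactly k events) when patient i has event probability probs[i]."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this patient has no event
            nxt[k + 1] += mass * p     # this patient has the event
        pmf = nxt
    return pmf


def synthetic_control_p_value(predicted, observed_events):
    """P(event count <= observed) if the treated patients behaved as predicted."""
    pmf = poisson_binomial_pmf(predicted)
    return sum(pmf[: observed_events + 1])


predicted = [0.5] * 10   # hypothetical model risks for 10 treated patients
observed = 1             # hypothetical number of events actually seen
print(sum(predicted))    # events expected under standard care
print(synthetic_control_p_value(predicted, observed))
```

With ten patients each predicted at 50% risk, about 5 events are expected; observing only 1 gives a one-sided p-value of 11/1024, roughly 0.011. Computing the distribution exactly avoids a normal approximation, which can be poor for small single-arm studies.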

02/16/2022

Cleveland Clinic Model Predicts the Risk of Hospital Readmissions

Dr. Misra-Hebert and colleagues assessed a Cleveland Clinic model’s ability to predict the likelihood that a patient would need to be readmitted to the hospital within 30 days of discharge.

12/22/2020

Latest COVID-19-Related Prediction Model from Cleveland Clinic Forecasts Risk for ICU Admission, Death

This latest prediction model from Drs. Kattan and Jehi helps identify which COVID-19-positive patients are at greatest risk for severe COVID-19, which may help physicians prioritize who should receive COVID-19 vaccines first.

11/09/2020

COVID-19 Risk Model Developed by Cleveland Clinic Now Available to Health Systems Around the World Through Epic

Healthcare organizations can present the clinically validated model—first published by Drs. Kattan and Jehi in a June edition of CHEST—to patients in MyChart to assess their risk of having COVID-19.

08/11/2020

New Prediction Model Can Forecast Personalized Risk for COVID-19-Related Hospitalization

Drs. Jehi and Kattan have developed their second COVID-19 nomogram—this latest one helping physicians to anticipate which COVID-19 patients are most likely to be admitted to the hospital for related symptoms and complications.

08/03/2020

New Analysis Shows Surgery for Drug-Resistant Temporal Lobe Epilepsy is Cost-Effective

The first U.S. cost-effectiveness analysis in decades supports more surgical evaluations, suggesting the up-front cost of evaluation is significantly smaller than the price paid by patients, society and healthcare systems when medications alone are used.

06/15/2020

Researchers Develop First Model to Predict Likelihood of Testing Positive for COVID-19 and Disease-Related Outcomes

The first-in-class individual prediction model, developed by Drs. Jehi and Kattan, reveals new characteristics that affect a person's risk for testing positive, including taking certain medications and vaccination history.

03/26/2020

Researchers Develop COVID-19 Case & Mortality Dashboard

Staff in the Department of Quantitative Health Sciences have created a dashboard to help users stay current on U.S. COVID-19 case and mortality data.