## Michael W. Kattan, PhD

### Department Chair

#### The Dr. Keyhan and Dr. Jafar Mobasseri Endowed Chair for Innovations in Cancer Research (joint appointment with the Department of Urology in the Glickman Urological and Kidney Institute)

Lerner Research Institute,
9500 Euclid Avenue, Cleveland, Ohio 44195

Location: JJN3-230

Phone: (216) 444-0584

Fax: (216) 445-7659

My general research interest lies in medical decision making. More specifically, my research is focused on the development, validation, and use of prediction models. Most of these models are designed for physician use and are available online at http://riskcalc.org/. I am also interested in quality of life assessment to support medical decision making (such as utility assessment), decision analysis, cost-effectiveness analysis, and comparative effectiveness.

Here are some pages you might want to check out:

- My official Cleveland Clinic page
- My list of publications
- My Google Scholar page
- My ResearchGate page
- Researchers with the highest h-indices (#989 through 3/2021)
- Most-cited scientists (#2773 through 2019)
- The most cited authors in urologic surgery (#1)

#### Lay Summary

Ever since my dissertation, “A Comparison of Machine Learning with Traditional Statistical Techniques,” I’ve had a long-standing interest in machine learning (ML) and artificial intelligence (AI). At first, I compared AI with human experts [1] to better understand when one should outperform the other [2]. I then compared ML with traditional statistical techniques, again trying to understand when one would prove superior. I first developed a theoretical framework describing the factors that should tip performance for or against ML in any given situation [3]. With the framework in place, I simulated data to illustrate its validity [4], and I later published a condensed illustration of it [5].

When I wanted to apply ML more broadly, I noticed that these methods were not well suited to time-until-event data, so I built extensions to handle it [6,7]. That allowed me to compare a variety of ML techniques with the standard statistical approach for time-until-event data, Cox proportional hazards regression [8].

What matters most is how well ML and AI techniques fare on real-world data. To this end, I’ve studied their performance when predicting prostate cancer recurrence [9], clinical deterioration on the wards [10], and pelvic floor disorders after delivery [11]. In a recent comparative effectiveness study of bariatric surgery, we found that random forests were best for 2 of the models, but regression was superior for the remaining 6 models [12]. This disappointing result for random forests led us to pursue an enhancement: adding multi-objective particle swarm optimization (MOPSO). We found the combination outperformed random forests alone [13], though random forests on their own did well for us in predicting progression of diabetic kidney disease [14]. More recently, when predicting good postoperative depth of focus after cataract surgery, extreme gradient boosting (XGBoost) outperformed logistic regression [15].

All of these complex issues regarding machine learning vs. statistical methods are discussed in our book [16].

1. Kattan, M.W., Inductive expert systems vs. human experts. *AI Expert*, 1994: p. 32-38.

2. Kattan, M.W., D.A. Adams, and M.S. Parks, A Comparison of Machine Learning with Human Judgment. *J Management Inf Sys*, 1993. 9(4): p. 37-57.

3. Kattan, M.W. and R.B. Cooper, The predictive accuracy of computer-based classification decision techniques. A review and research directions. *Omega Int J Mgmt Sci*, 1998. 26(4): p. 467-482.

4. Kattan, M.W. and R.B. Cooper, A simulation of factors affecting machine learning techniques: an examination of partitioning and class proportions. *Omega Int J Mgmt Sci*, 2000. 28: p. 501-512.

5. Kattan, M.W., Statistical prediction models, artificial neural networks, and the sophism "I am a patient, not a statistic". *J Clin Oncol*, 2002. 20(4): p. 885-887.

6. Zupan, B., (Kattan, M.W.) et al., Machine learning for survival analysis: a case study on recurrence of prostate cancer. *Artif Intell Med*, 2000. 20(1): p. 59-75.

7. Kattan, M.W., K.R. Hess, and J.R. Beck, Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. *Comput Biomed Res*, 1998. 31: p. 363-373.

8. Kattan, M.W., Comparison of Cox regression with other methods for determining prediction models and nomograms. *J Urol*, 2003. 170(Supplement): p. S6-S10.

9. Cordon-Cardo, C., (Kattan, M.W.) et al., Improved prediction of prostate cancer recurrence through systems pathology. *J Clin Invest*, 2007. 117(7): p. 1876-1883.

10. Churpek, M.M., (Kattan, M.W.) et al., Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards. *Crit Care Med*, 2016. 44(2): p. 368-374.

11. Jelovsek, J., (Kattan, M.W.) et al., Predicting risk of pelvic floor disorders 12 and 20 years after delivery. *Am J Obstet Gynecol*, 2018. 218(2): p. 222.e1-222.e19.

12. Aminian, A., A. Zajichek, D.E. Arterburn, K.E. Wolski, S.A. Brethauer, P.R. Schauer, S.E. Nissen, and M.W. Kattan, Predicting 10-Year Risk of End-Organ Complications of Type 2 Diabetes With and Without Metabolic Surgery: A Machine Learning Approach. *Diabetes Care*, 2020. 43(4): p. 852-859. doi: 10.2337/dc19-2057. Erratum in: *Diabetes Care*, 2020. 43(6): p. 1367.

13. Asadi, S., S. Roshan, and M.W. Kattan, Random forest swarm optimization-based for heart diseases diagnosis. *J Biomed Inform*, 2021. 115: p. 103690. doi: 10.1016/j.jbi.2021.103690.

14. Chan, L., (Kattan, M.W.) et al., Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict progression of diabetic kidney disease. *Diabetologia*, 2021. Epub ahead of print. doi: 10.1007/s00125-021-05444-0.

15. Liu, Y., (Kattan, M.W.) et al., Using machine learning to predict post-operative depth of focus after cataract surgery with implantation of Tecnis Symfony. *Eur J Ophthalmol*, 2021. Epub ahead of print. doi: 10.1177/1120672121991777.

16. Gerds, T.A. and M.W. Kattan, *Medical Risk Prediction Models With Ties to Machine Learning*. 2021, CRC Press.

**BREAKING NEWS!** Our book is out. Get it here:

Before getting into specific research interests, here are some useful statistical reporting guidelines, and here are ideas for nice figures and tables.

**RESEARCH INTERESTS**

Inspired by personal frustrations with medical uncertainty, I am particularly interested in statistical prediction models and medical decision making:

**A. Prediction Model Development**

- Here are the requirements for having a statistical prediction model endorsed by the American Joint Committee on Cancer.
- Here is how we process data from Epic to make it research ready.
- In the TRIPOD group, we came up with a checklist of what should be reported in a paper presenting a prediction model.
- Making a prediction model when there is a time-varying covariate.
- Propensity scores do not improve the accuracy of statistical prediction models.
- Here's the code to make binary, ordinal, and survival outcome nomograms.
- Here's how to make a competing risks regression nomogram. Detailed R code is here.
- Machine learning approaches usually lose to traditional regression.
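A nomogram is a graphical rendering of a regression model. The minimal sketch below shows the usual points arithmetic for a binary (logistic) outcome, assuming a hypothetical model whose coefficients for `age`, `psa`, and `grade` are invented for illustration; it is not the nomogram code referred to above. The predictor with the widest range of effect is scaled to span 0-100 points, the others are scaled proportionally, and total points map back to the linear predictor and then to a probability.

```python
import math

# Hypothetical logistic model: logit(p) = INTERCEPT + sum(beta * x).
# Coefficients and predictor ranges are made up for illustration only.
INTERCEPT = -6.0
PREDICTORS = {
    # name: (coefficient, observed minimum, observed maximum)
    "age": (0.05, 40.0, 80.0),
    "psa": (0.10, 0.0, 30.0),
    "grade": (0.60, 1.0, 5.0),
}


def _reference(beta, lo, hi):
    # The predictor value that earns 0 points on its axis.
    return lo if beta >= 0 else hi


# The predictor with the widest effect range defines the 0-100 points scale.
MAX_EFFECT = max(abs(b) * (hi - lo) for b, lo, hi in PREDICTORS.values())


def points(name, value):
    """Points for one predictor value, scaled so the widest axis spans 0-100."""
    beta, lo, hi = PREDICTORS[name]
    return 100.0 * beta * (value - _reference(beta, lo, hi)) / MAX_EFFECT


def total_points_to_probability(total):
    """Convert total points back to the linear predictor, then to a probability."""
    baseline = INTERCEPT + sum(b * _reference(b, lo, hi) for b, lo, hi in PREDICTORS.values())
    lp = baseline + total * MAX_EFFECT / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


def model_probability(patient):
    """Direct model prediction, used to check the nomogram arithmetic."""
    lp = INTERCEPT + sum(PREDICTORS[k][0] * v for k, v in patient.items())
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"age": 65, "psa": 8.0, "grade": 3}
total = sum(points(k, v) for k, v in patient.items())
print(round(total, 1), round(total_points_to_probability(total), 3))
```

Because the points are a linear rescaling of the linear predictor, reading the nomogram gives exactly the model's prediction; the graphical form adds convenience, not approximation.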

**B. Prediction Model Assessment**

- Here's a decent way to compare two rival prediction tools that both predict on an ordinal scale.
- How to make a calibration plot for a prediction model in the presence of competing risks.
- How to estimate a time-dependent concordance index.
- This is why you can't compare two prediction models tested on separate datasets. The figure is updated here.
- A guide to the many metrics for assessing prediction models.
- How to determine the area under the ROC curve for a binary diagnostic test.
- The concordance index is not a proper scoring rule. Use the Index of Prediction Accuracy (IPA) instead.
- This is a framework for reviewers when evaluating statistical prediction modeling manuscripts.
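Two of the metrics listed above are simple enough to compute by hand. The sketch below assumes a binary outcome without censoring: it computes the IPA as one minus the ratio of the model's Brier score to that of a null model that predicts the overall event rate for everyone, and the ROC area of a single binary test, which reduces to the average of sensitivity and specificity because the ROC "curve" has only one interior point. Function names and data are hypothetical; time-to-event outcomes would need a censoring-weighted Brier score at a fixed horizon instead.

```python
def brier(y, p):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def ipa(y, p):
    """Index of Prediction Accuracy: 1 - Brier(model) / Brier(null model).

    The null model predicts the observed event rate for every patient.
    """
    prevalence = sum(y) / len(y)
    null_brier = brier(y, [prevalence] * len(y))
    return 1.0 - brier(y, p) / null_brier


def auc_binary_test(y, test_positive):
    """ROC area for a binary test: (sensitivity + specificity) / 2.

    The trapezoid through (0, 0), (1 - spec, sens), (1, 1) has exactly this area.
    """
    tp = sum(1 for yi, t in zip(y, test_positive) if yi == 1 and t)
    tn = sum(1 for yi, t in zip(y, test_positive) if yi == 0 and not t)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return (tp / n_pos + tn / n_neg) / 2.0


# Hypothetical example data.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [0.8, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
test_result = [True, True, False, True, False, False, False, False]
print(ipa(outcomes, predicted), auc_binary_test(outcomes, test_result))
```

An IPA of 0 means the model is no better than quoting everyone the overall event rate; negative values mean it is worse.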

**C. Prediction Communication and Interpretation**

- As cancer survivors, we like to think we both needed the treatment we received and were cured by it, but that is hard to prove.
- Here's an example of how patients should be counseled: a table of tailored predictions of benefits and harms crossed by treatment options.
- It is useless and confusing to put confidence intervals around a predicted probability.
- You must apply a statistical prediction model to achieve informed consent.
- What is a real nomogram anyway?
- My definition of comparative effectiveness.
- Too often we diagnose patients based on some arbitrary cutoff. Let's stop doing that and recognize risk is on a continuum.
- Don't just look at the p-value when judging a new marker.
- Cancer staging systems need to go away.
- "I'm a patient, not a statistic" is bogus.
- Here is how we make our online risk calculators.
- Patients found our risk calculator decision aid easy to use and useful.
- Simplifying a regression model with friendly integers loses accuracy.
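To illustrate the point about friendly integers, the sketch below compares a hypothetical logistic model (coefficients invented for illustration) with a hypothetical integer score of 1 point per completed decade of age plus 1 point for smoking. Rounding discards information in two ways: patients with different risks collapse onto the same score, and the ordering of risk can even reverse between scores.

```python
import math

# Hypothetical fitted logistic model; coefficients are made up for illustration.
INTERCEPT = -7.0
BETA_AGE = 0.07      # per year of age
BETA_SMOKER = 0.9    # smoker vs non-smoker


def exact_risk(age, smoker):
    """Predicted probability from the full regression model."""
    lp = INTERCEPT + BETA_AGE * age + BETA_SMOKER * smoker
    return 1.0 / (1.0 + math.exp(-lp))


def integer_score(age, smoker):
    """Hypothetical 'friendly' score: 1 point per completed decade of age, 1 for smoking."""
    return age // 10 + (1 if smoker else 0)


# Same score, different risk: ages 60 and 69, both non-smokers.
print(integer_score(60, 0), integer_score(69, 0), exact_risk(60, 0), exact_risk(69, 0))

# Reversed ordering: the higher score (age 80, non-smoker) has the lower model risk
# than the lower score (age 69, smoker).
print(integer_score(80, 0), integer_score(69, 1), exact_risk(80, 0), exact_risk(69, 1))
```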

**D. Predictions Doctors Make**

- The wisdom of crowds of doctors: averaging their individual predictions improves accuracy over the individuals themselves.
- Probably due to cognitive biases, predicted probabilities coming from doctors are
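The wisdom-of-crowds result has a simple mathematical core when accuracy is measured by the Brier score. Squared error is convex, so the Brier score of the averaged prediction can never exceed the doctors' mean Brier score. Beating the best single doctor is common but not guaranteed. The sketch below uses hypothetical predictions from three doctors for five patients.

```python
def brier(y, p):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def crowd_average(predictions):
    """Average each patient's predicted probability across doctors."""
    n_doctors = len(predictions)
    return [sum(column) / n_doctors for column in zip(*predictions)]


# Hypothetical data: rows are doctors, columns are patients.
outcomes = [1, 0, 1, 0, 0]
doctors = [
    [0.9, 0.4, 0.3, 0.2, 0.1],
    [0.5, 0.1, 0.8, 0.6, 0.3],
    [0.7, 0.5, 0.6, 0.1, 0.5],
]

individual = [brier(outcomes, d) for d in doctors]
crowd = brier(outcomes, crowd_average(doctors))
print(individual, crowd)
```

Here the averaged prediction also beats every individual doctor, but only the comparison with the average doctor is guaranteed.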

**E. Decision Analysis and Utility Assessment**

- The method used to measure utilities affects the decision analytic recommendation.
- Unfortunately, you probably have to measure individual patient utilities to run a decision analysis on someone.
- Stop multiplying health state utilities to get the utility of the combined health state.
- Why utilities are more helpful than traditional health-related quality of life measures with respect to medical decision making.
- The layout of the time trade-off is problematic.
- How to measure standard gamble on paper.
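For readers new to these utility elicitation methods, the sketch below shows the arithmetic, assuming a bisection-style titration for the standard gamble. This is one common procedure, not necessarily the paper-based method described above. The respondent chooses between living in the health state for certain and a gamble offering full health with probability p or death with probability 1 - p; the utility is the p at which they are indifferent. The respondent callback is hypothetical and stands in for asking a real patient. The time trade-off utility is x / t when the patient is indifferent between x years in full health and t years in the health state.

```python
def standard_gamble_utility(prefers_gamble, lo=0.0, hi=1.0, rounds=20):
    """Titrate p by bisection until the respondent is (nearly) indifferent.

    prefers_gamble(p) is a hypothetical respondent callback: True if the
    respondent would take the gamble when full health has probability p.
    """
    for _ in range(rounds):
        p = (lo + hi) / 2.0
        if prefers_gamble(p):
            hi = p  # gamble still attractive: offer a lower p next
        else:
            lo = p  # prefers certainty: offer a higher p next
    return (lo + hi) / 2.0


def time_tradeoff_utility(years_full_health, years_in_state):
    """TTO: indifferent between x years in full health and t years in the state, u = x / t."""
    return years_full_health / years_in_state


# Simulated respondent whose true utility for the health state is 0.7.
respondent = lambda p: p > 0.7
print(standard_gamble_utility(respondent), time_tradeoff_utility(7, 10))
```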

**F. Novel Uses of Prediction Models**

- Here's an example of how to make a synthetic control arm for a single arm study, using a prediction model. Here's how to calculate the p-value.
- Rather than running a decision analysis at the bedside, apply a nomogram instead: it is much easier and gives the same answer.
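One way to test a single-arm study against a synthetic control arm is sketched below; it is an assumption, not necessarily the method linked above. A prediction model supplies each treated patient's probability of the event had they received the control treatment. Under the null hypothesis that treatment has no effect, the number of events follows a Poisson-binomial distribution with those probabilities, and the one-sided p-value is the probability of seeing as few events as were observed. The function names and data are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """Exact distribution of the event count when patient i has event probability probs[i]."""
    pmf = [1.0]  # with no patients, P(0 events) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this patient has no event
            nxt[k + 1] += mass * p      # this patient has the event
        pmf = nxt
    return pmf


def synthetic_control_p_value(predicted_control_risk, observed_events):
    """One-sided p-value: P(events <= observed) if treatment had no effect."""
    pmf = poisson_binomial_pmf(predicted_control_risk)
    return sum(pmf[: observed_events + 1])


# Hypothetical study: 10 treated patients, each with a predicted control-arm risk of 0.5.
risks = [0.5] * 10
print(sum(risks), synthetic_control_p_value(risks, observed_events=1))
```

The exact distribution avoids normal approximations, which can be poor when the study is small or the predicted risks are near 0 or 1.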

### Latest COVID-19-Related Prediction Model from Cleveland Clinic Forecasts Risk for ICU Admission, Death

Cleveland Clinic researchers have developed an algorithm to predict which COVID-19 patients are at highest risk for becoming seriously ill or dying from the disease. This prediction model will help physicians and healthcare systems efficiently allocate resources, including COVID-19 vaccines.

### COVID-19 Risk Model Developed by Cleveland Clinic Now Available to Health Systems Around the World Through Epic

A COVID-19 risk prediction model designed by Cleveland Clinic researchers—including Michael Kattan, PhD, Chair of the Department of Quantitative Health Sciences, and Lara Jehi, MD, Cleveland Clinic’s Chief Research Information Officer—is now available to health systems around the world through Epic.

### New Prediction Model Can Forecast Personalized Risk for COVID-19-Related Hospitalization

Cleveland Clinic researchers have developed and validated a risk prediction model (called a nomogram) that can help physicians predict which patients who have recently tested positive for SARS-CoV-2, the virus that causes COVID-19, are at greatest risk for hospitalization.

### New Analysis Shows Surgery for Drug-Resistant Temporal Lobe Epilepsy is Cost-Effective

U.S. patients with drug-resistant temporal lobe epilepsy (DR-TLE) should be referred for evaluation for epilepsy surgery “without hesitation,” concludes a new model-based analysis of surgery cost-effectiveness from Cleveland Clinic researchers.

### Researchers Develop First Model to Predict Likelihood of Testing Positive for COVID-19 and Disease-Related Outcomes

Cleveland Clinic researchers have developed the world’s first risk prediction model for healthcare providers to forecast an individual patient’s likelihood of testing positive for COVID-19 as well as their outcomes from the disease.

### Researchers Develop COVID-19 Case & Mortality Dashboard

Led by Michael Kattan, PhD, chair of the Department of Quantitative Health Sciences (QHS), Lerner Research Institute investigators have created a dashboard to track COVID-19 case and mortality data in the U.S.

### Risk Calculator Predicts Diabetes Complications from Weight Loss Surgery

Patients struggling with type 2 diabetes and obesity are faced with the decision of whether to receive usual medical care or undergo weight-loss surgery. Now, a new risk calculator developed by a team of Cleveland Clinic researchers can show these patients their risks of developing major health complications over the next 10 years depending on which course of treatment they choose.

### New Statistical Guidelines in Urology Research

A panel of urology experts from eleven universities and medical centers across the United States and United Kingdom, including Cleveland Clinic, recently published a new set of guidelines for reporting statistics in urology research. Guideline recommendations are based on the consensus of the statistical consultants to four leading urology medical journals: *Urology*, *European Urology*, *The Journal of Urology* and *BJUI*, and will be published in each of the four journals.

### Kattan Recognized with National Award

Michael Kattan, PhD, MBA, Chair of Lerner Research Institute's Department of Quantitative Health Sciences and a joint appointee in the Cleveland Clinic Glickman Urological and Kidney Institute, has been elected a Fellow of the American Statistical Association (ASA). The formal induction ceremony will be in Vancouver this coming July. Dr. Kattan was nominated by an ASA-member peer for his excellent reputation and outstanding contributions to statistical science.