Improvements in cancer care that help patients live longer have been linked to increased risk for cardiac dysfunction and cardiovascular disease, highlighting the clinical need for tools that can help assess risk of cancer therapy-related cardiac dysfunction (CTRCD).
Machine learning-based approaches to risk assessment can be highly effective in predicting various types of cardiac dysfunction among cancer survivors who have received cardiotoxic cancer therapies, according to a new retrospective longitudinal study by researchers from Cleveland Clinic’s Lerner Research Institute; Heart, Vascular & Thoracic Institute; and Taussig Cancer Institute.
Published in the Journal of the American Heart Association, the study represents the first reported large-scale use of a machine learning-based approach for evaluating complications from cancer therapies that can contribute to cardiovascular disease.
Developing machine learning models
The research team, led by Feixiong Cheng, PhD, assistant staff in the Genomic Medicine Institute, and Patrick Collier, MD, PhD, co-director of Cleveland Clinic’s Cardio-Oncology Center, developed and evaluated risk assessment machine learning models for six forms of CTRCD: heart failure, atrial fibrillation, coronary artery disease, myocardial infarction, stroke and de novo CTRCD (CTRCD developed after cancer therapy).
They built models for each of the six outcomes using clinical data collected from 1997 to 2018 on 4,309 cancer patients who had laboratory test and echocardiographic results in Cleveland Clinic’s electronic health record database. The models were then evaluated for predictive performance and generalizability and inspected to identify clinically relevant variables that were associated with CTRCDs.
Results demonstrate model accuracy and generalizability
Of the 4,309 cancer patients studied, 93 percent were treated with chemotherapy and 46 percent with radiation. Overall, 36 percent of the cohort were diagnosed with at least one of the six CTRCDs: 17 percent had cardiac disease that predated cancer therapy, while 19 percent developed de novo CTRCD.
Based on 100 model iterations, the models demonstrated moderate to high predictive performance as well as real-world generalizability for prediction of CTRCD in new patients. Interrogation of the models revealed that several clinically relevant variables were significantly associated with CTRCDs, including but not limited to age, hypertension, glucose level and left ventricular ejection fraction. The researchers also found that combining laboratory test and echocardiographic variables yielded the highest predictive performance.
A clear upside of machine learning approaches is that, by nature, they improve over time. “As additional longitudinal clinical data are accumulated for cancer survivors, machine learning can use these data to build and refine predictive models to guide clinical decision-making,” said Dr. Cheng.
In fact, the study authors note that they will continue to improve the models they developed for the present study as more data are gathered. “We also are now incorporating imaging data directly into convolutional neural networks to further enhance the performance of our machine learning models,” added Dr. Cheng. “As a next step, we are working to develop new risk calculators that integrate our models into Cleveland Clinic’s electronic health record system to help provide cardiovascular care for cancer patients.”
“The findings from this study underscore the promise that machine learning methods hold for cardiac risk assessment for individuals before, during and after cancer treatment,” concluded Dr. Collier.
Yadi Zhou, PhD, a data scientist in Dr. Cheng’s lab, was first author on the study, which was supported by the National Heart, Lung, and Blood Institute, part of the National Institutes of Health.
Image: Overview of the study design. Cardiovascular echocardiographic and laboratory testing variables were integrated from more than 4,300 cancer patients followed longitudinally for the prediction of six outcomes: heart failure (HF), atrial fibrillation (AF), coronary artery disease (CAD), myocardial infarction (MI), stroke and de novo cancer therapy-related cardiac dysfunction (CTRCD). Five classification methods were systematically tested: k-nearest neighbors (k-NN), logistic regression (LR), support vector machine (SVM), random forest (RF) and gradient tree boosting (GB). Feature sets were tested as follows: laboratory test variables only, echocardiographic variables only, and lab test and echocardiographic variables combined. Reprinted from Zhou et al., J Am Heart Assoc. 2020;9:e019628. ©2020 The Authors. Reprinted under Creative Commons Attribution-NonCommercial License.
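For readers curious how a comparison of feature sets like the one in the study design might look in practice, the following is a minimal sketch and not the authors' code. It assumes a small synthetic cohort with two hypothetical "lab" columns and two hypothetical "echo" columns, uses a simple k-nearest neighbors scorer (one of the five methods named in the caption), and scores each feature set by mean area under the ROC curve over repeated random train/test splits. The metric, the split fraction, the value of k, the number of repeats and all function names are illustrative assumptions; the article does not specify them.

```python
import random
import statistics


def auroc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUROC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_x, train_y, x, k=5):
    """Fraction of positive labels among the k nearest training rows."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_x, train_y)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def make_synthetic_cohort(n, rng):
    """Hypothetical data: columns 0-1 stand in for lab tests, 2-3 for echo."""
    rows, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.4 else 0  # illustrative outcome rate
        lab = [rng.gauss(0.6 * y, 1.0), rng.gauss(0.0, 1.0)]
        echo = [rng.gauss(0.6 * y, 1.0), rng.gauss(0.0, 1.0)]
        rows.append(lab + echo)
        labels.append(y)
    return rows, labels


# The three feature sets from the caption, mapped to hypothetical columns.
FEATURE_SETS = {
    "lab_only": [0, 1],
    "echo_only": [2, 3],
    "combined": [0, 1, 2, 3],
}


def evaluate(rows, labels, columns, repeats, rng, test_fraction=0.3, k=5):
    """Mean AUROC over repeated random train/test splits for one feature set."""
    idx = list(range(len(rows)))
    n_test = int(len(rows) * test_fraction)
    aucs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        train_x = [[rows[i][c] for c in columns] for i in train]
        train_y = [labels[i] for i in train]
        scores = [
            knn_score(train_x, train_y, [rows[i][c] for c in columns], k)
            for i in test
        ]
        aucs.append(auroc(scores, [labels[i] for i in test]))
    return statistics.mean(aucs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows, labels = make_synthetic_cohort(200, rng)
results = {
    name: evaluate(rows, labels, cols, repeats=20, rng=rng)
    for name, cols in FEATURE_SETS.items()
}
for name, auc in results.items():
    print(f"{name}: mean AUROC = {auc:.3f}")
```

Repeating the split many times, as the study did with 100 iterations, reduces the chance that a single lucky or unlucky partition drives the comparison. The sketch uses 20 repeats only to keep the run short.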