Department of Quantitative Health Sciences
Multilevel data analysis: What? Why? How?
In many clinical outcomes studies investigators naturally encounter multilevel data. Multilevel data can be of a “nested” type where, for example, data are obtained from patients within different nursing units across different hospitals. The most frequently encountered multilevel data type is longitudinal data, where, for example, echo measurements are collected over time for a group of patients. To produce valid statistical and clinical inferences, statistical analysis of such data should take the multilevel structure into account. Rajeswaran and Blackstone’s scientific commentary in the Journal of Thoracic and Cardiovascular Surgery briefly discusses the need for such analyses.
Expanding clarity or confusion? Volatility of the 5-tier ratings assessing quality of transplant centers in the United States
The Scientific Registry of Transplant Recipients (SRTR) issues biannual reports evaluating the quality of solid organ transplant programs in the United States based on one-year post-transplant survival. Recently, the SRTR introduced a new 5-tier rating system to evaluate center quality. In a recent issue of the American Journal of Transplantation, Schold and colleagues evaluated variation in the rating system over time using a national cohort of kidney transplant centers in the United States. The primary findings demonstrated that center ratings fluctuate rapidly over time, with more than half of center ratings changing between consecutive reports and virtually no correlation between center ratings from a baseline period and ratings after three years. Given that most kidney transplant patients must wait at least four years for an available deceased donor organ offer, the study questions the utility of the rating system to effectively inform prospective patients about the quality of transplant programs.
Identifying risk factors: Challenges of separating signal from noise
Identifying risk factors from a possibly large number of candidate variables—variable selection—is key in clinical outcomes studies. However, different persons analyzing the same dataset often identify different risk factors. Making variable selection more reproducible is a challenging task. Combining parametric and machine-learning nonparametric methods using resampling techniques may provide the most reproducible risk factor models. Rajeswaran and Blackstone’s editorial in the Journal of Thoracic and Cardiovascular Surgery provides a brief but general road map for clinical researchers to tackle the problem of variable selection.
Intraoperative excursions in blood pressure (BP) outside a targeted range have been associated with 30-day mortality in cardiac surgery patients. However, such hypotensive and hypertensive indices measure average BP rather than reading-to-reading variability. The relationship between BP variability per se (distinct from mean BP) and mortality remains unclear. Mascha, Yang and colleagues studied whether within-patient variability in mean arterial pressure (MAP), independent of time-weighted average MAP and other confounders, is associated with 30-day postoperative mortality in noncardiac surgery patients. Average MAP and MAP variability were nonlinearly related to outcome. MAP variability as measured by an improved formula was independently associated with 30-day mortality, but the association was not deemed clinically important. Anesthesiologists might thus pay more attention to overall trends in MAP than minute-to-minute variations.
With comparative effectiveness of interventions increasingly coming into focus, and with the ongoing establishment of large, robust observational clinical data registries, methods for better understanding patient heterogeneity and variation in treatment response are continuing to be developed. In January’s issue of Medical Decision Making, Jarrod Dalton and colleagues present a technique for empirical modeling of treatment effectiveness for binary outcomes, demonstrating its use with observational data on percutaneous coronary intervention or coronary artery bypass grafting for coronary revascularization.
In the field of population pharmacokinetics/pharmacodynamics (PK/PD), inter-individual variability is represented by model parameter distributions. In this paper, Radivoyevitch et al. compare stochastic-process PD models that capture the probability of complete eradication of colony-forming units (CFU) with standard deterministic PD models that track only average CFU numbers. For neonatal intravenous gentamicin dosing regimens directed against Escherichia coli, stochastic calculations predict that the first dose is crucial. For example, a single 6 mg/kg dose is predicted to have a higher eradication probability than four daily 4 mg/kg doses. The conclusion: regimens with larger first doses but smaller total doses deserve further investigation.
Statistics courses that focus on data analysis in isolation, discounting the scientific inquiry process, may not motivate students to learn the subject. By involving students in other steps of the inquiry process, such as generating hypotheses and data, students may become more interested and vested in the analysis step. Additionally, such an approach might better prepare students to tackle real research questions outside of the statistics classroom. Dr. Nowacki emphasizes that statistical problem solving is an investigative cycle and should be taught within that context. As an illustration of such an approach, she developed a classroom activity utilizing the popular Hasbro board game Operation, which requires student involvement in identifying the research question, designing the study and database, data collection and analysis. Intended to mimic a real-world research scenario, this fun activity provides a guided yet flexible research experience from start to finish.
Numerous studies have examined clinical outcomes associated with immunosuppression regimens among kidney transplant recipients. Re-transplant recipients have a unique risk profile associated with their immunological condition and experience with a failed graft. Schold et al. used a propensity-score analysis to attenuate potential selection bias in the allocation of immunosuppression and evaluated the association of induction therapy with outcomes for re-transplant recipients using a national registry cohort. Findings illustrated a variable set of complications associated with different induction therapies but similar overall patient survival. The results may inform clinical decision-making and potentially illustrate mechanisms of the effect of induction therapy in this population.
Measurement error problems have attracted a great deal of interest in the past two decades. A variety of models and methods for these problems have been applied across scientific fields such as medicine, economics, and astronomy. This paper (Wang and Ye) is motivated by a wide range of background correction problems in gene array data analysis, which refer to adjustments intended to remove measurement error from the measured signal. Estimating the conditional density function from contaminated gene expression data therefore plays a key role in statistical inference and visualization here. The authors propose re-weighted deconvolution kernel methods to estimate the conditional density function in a general additive error model, both when the error distribution is known and when it is unknown. Theoretical and numerical properties of the proposed estimators are comprehensively investigated. The new methodology could serve as an informative graphical and inferential tool for studying the relationship between contaminated gene intensities and the unobserved true signals.
Comparative studies typically assess whether an exposure affects an outcome. However, it is just as important to understand how. Mediation analysis attempts to quantify how much, if any, of the effect of an exposure on an outcome goes through a pre-specified mediator, or “mechanism,” which sits on the causal pathway between exposure and outcome. Mediation is suggested when two conditions are true: the exposure affects the mediator, and the mediator (adjusting for exposure) affects the outcome. A mediation analysis can validate or refute one’s original hypothesis and stimulate further research to modify mediators to improve patient outcomes. In this work, Mascha et al. discuss the design and analysis of studies investigating mediation, including distinguishing mediation from confounding, identifying potential mediators when the exposure is chronic versus acute, and requirements for claiming mediation. Besides the simplest design, with a single continuous mediator and outcome, they consider binary mediators and outcomes, multiple mediators, multiple outcomes, and mixed data types. Methods are illustrated with NSQIP data assessing the effects of pre-operative anemic status on post-operative outcomes through a set of intraoperative mediators.
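As a concrete illustration of the two mediation conditions above, the classic product-of-coefficients approach fits one regression of the mediator on the exposure and another of the outcome on both exposure and mediator; the product of the two path estimates approximates the mediated effect. The sketch below uses simulated data with assumed path coefficients (`a`, `b`, `c` are illustrative, not values from the paper) and is a simplified version of the general idea, not the authors' exact methodology.

```python
import numpy as np

# Simulated data with assumed (illustrative) true path coefficients:
# exposure -> mediator (a), mediator -> outcome (b), direct effect (c).
rng = np.random.default_rng(0)
n = 5000
a, b, c = 0.5, 0.7, 0.3

x = rng.normal(size=n)                              # exposure
m = a * x + rng.normal(scale=0.5, size=n)           # mediator
y = b * m + c * x + rng.normal(scale=0.5, size=n)   # outcome

# Condition 1: exposure affects the mediator (slope of m on x).
X1 = np.column_stack([np.ones(n), x])
a_hat = np.linalg.lstsq(X1, m, rcond=None)[0][1]

# Condition 2: mediator affects the outcome, adjusting for exposure.
X2 = np.column_stack([np.ones(n), x, m])
coefs = np.linalg.lstsq(X2, y, rcond=None)[0]
c_hat, b_hat = coefs[1], coefs[2]

# Product-of-coefficients estimate of the indirect (mediated) effect.
indirect = a_hat * b_hat
```

With a large sample, `indirect` lands close to the true mediated effect a*b = 0.35 and `c_hat` estimates the direct effect; in practice one would add bootstrap or Sobel-type standard errors for inference.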
There are numerous factors that explain processes of healthcare delivery and outcomes beyond traditional clinical characteristics. Emerging research suggests that factors such as behavioral characteristics, environmental hazards, mental health, and socioeconomic status may have a marked impact on patient outcomes independent of clinically defined medical condition. Using data from numerous national registries and surveys, the authors sought to quantify the effect of the prevalence of underlying risks in patients' communities on outcomes for kidney transplant candidates in the United States. Findings suggest that, independent of known clinical characteristics, there is a dose-response relationship between the level of community risk and outcomes for transplant candidates in the United States, including mortality and the likelihood of receiving a transplant. These findings (summarized in the September 2013 issue of the American Journal of Transplantation here) may have important implications for developing interventions as well as for measuring the quality of care of providers who treat a high proportion of patients from high-risk communities.
In evaluating the accuracy of diagnostic imaging tests, multi-reader studies are often performed. These studies characterize the performance of a sample of readers and allow unbiased comparisons between competing diagnostic tests. To date, most multi-reader imaging studies have used a fully crossed design in which each reader interprets each image in the study; however, these studies require that each reader interpret many images, and the number of required interpretations often becomes a limiting factor. Split-plot designs have recently been proposed for multi-reader studies and can greatly reduce the number of interpretations required of each reader. Obuchowski et al (Acad Radiol, 2012; 19: 1508-1517) present and compare three statistical methods for analyzing data from a split-plot multi-reader imaging study. They illustrate the new methods with a 36-reader, 200-patient split-plot study of breast cancer detection. Read more here.
Binary prediction models – that is, statistical models intended to measure the absolute risk of a dichotomous disease status or outcome variable – are widely used in clinical care and biomedical research. These risks are commonly expressed as a predicted probability. A fundamental issue with predicted probabilities is whether patients with, say, a 20% predicted probability of the event actually experience a 20% incidence of the event (and whether the same is true across all predicted probabilities). This is known in the statistical literature as model calibration. Jarrod Dalton recently developed a flexible recalibration methodology that corrects predicted probabilities for miscalibration based on a user-specified logistic regression model structure. Read more here.
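As a simplified illustration of the recalibration idea, standard "logistic recalibration" (a simpler cousin of Dalton's flexible method, not the method itself) refits an intercept and slope on the logit of the original predictions; a fitted slope below 1 indicates overconfident predictions that get shrunk toward the mean. The data below are simulated, and the 0.8 miscalibration slope is an assumption for illustration.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def recalibrate(p_pred, y, iters=25):
    """Fit observed outcomes y on logit(p_pred) with a two-parameter
    logistic model (intercept + slope) via Newton-Raphson, then return
    corrected probabilities."""
    X = np.column_stack([np.ones_like(p_pred), logit(p_pred)])
    beta = np.zeros(2)
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ beta))        # fitted probabilities
        W = mu * (1 - mu)                       # IRLS weights
        grad = X.T @ (y - mu)
        hess = X.T @ (X * W[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-X @ beta))

# Miscalibrated setting: true risk has logit 0.8*z, the model reports z.
rng = np.random.default_rng(1)
z = rng.normal(size=4000)
y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * z)))
p_raw = 1 / (1 + np.exp(-z))    # overconfident predictions
p_cal = recalibrate(p_raw, y)   # shrunk toward the observed event rate
```

After recalibration the mean predicted probability matches the observed event rate (a property of the logistic MLE with an intercept), and the spread of the predictions contracts because the fitted slope is below 1.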
Joint hypothesis testing and gatekeeping procedures can improve the efficiency and interpretation of studies with multiple outcomes of interest. When a claim of superiority of one intervention over another depends on several outcome variables, making conclusions about individual outcomes in isolation can be problematic, especially if effects differ. We thus advocate joint hypothesis testing, in which the decision rule to claim success of an intervention over another with regard to multiple outcomes is specified a priori, and type I error is protected. Success might require significant improvement in all outcomes, or at least one. We focus on requiring superiority on at least one outcome and noninferiority on the rest. We further advocate "gatekeeping" procedures in which primary and secondary hypotheses are a priori organized into ordered sets, and testing proceeds to the next set only if significance criteria for previous sets are satisfied. Methods are demonstrated with data from a trial assessing effects of transdermal nicotine on pain and opioids after pelvic gynecological surgery. Click here for more.
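The ordered-set logic of a serial gatekeeping procedure can be sketched in a few lines. This toy version (a deliberate simplification, not the decision rules used in the nicotine trial) requires every hypothesis in a set to be significant before the next set is tested, which is how type I error protection is maintained across sets.

```python
def serial_gatekeeping(p_values_by_set, alpha=0.05):
    """Serial gatekeeping sketch: hypothesis sets are tested in their
    a priori order, and testing proceeds to the next set only if every
    hypothesis in the current set is significant at level alpha.
    Returns the list of sets that passed their gate."""
    passed = []
    for hypothesis_set in p_values_by_set:
        if all(p < alpha for p in hypothesis_set):
            passed.append(hypothesis_set)
        else:
            break          # gate closed: later sets are never tested
    return passed
```

For example, `serial_gatekeeping([[0.01, 0.03], [0.04]])` passes both sets, while `serial_gatekeeping([[0.01, 0.08], [0.001]])` stops at the primary set, so the secondary p-value is never evaluated even though it is small.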
We propose a nonparametric procedure to describe the progression of longitudinal cohorts over time, leading to multi-state probability curves with the states defined jointly by survival and longitudinal outcomes measured with error. To account for the challenges of informative dropout and nonlinear shapes of the longitudinal trajectories, a bias corrected penalized spline regression is applied to estimate the unobserved longitudinal trajectory for each subject. The multi-state probability curves are then estimated based on the survival data and the estimated longitudinal trajectories. Simulation Extrapolation (SIMEX) is further used to reduce the estimation bias caused by the randomness of the estimated trajectories. We present theoretical justification of the estimation procedure along with a simulation study to demonstrate finite sample performance. The procedure is illustrated by data from the African American Study of Kidney Disease and Hypertension, and it can be widely applied in longitudinal studies. Click here for more.
In many neuroscience studies, multidimensional outcomes of different natures are obtained simultaneously from multiple modalities. A joint modeling approach is presented to model the multidimensional outcomes together, which allows not only estimation of covariate effects but also evaluation of the strength of association among the multiple responses from different modalities. A simulation study is conducted to quantify the possible benefits of the new approach in finite-sample situations. The proposed method is illustrated with an analysis of neurophysiology data. Click here for more.
Adrian Hernandez, MD, PhD, was recently named a Fellow of the American College of Cardiology (FACC). Beyond being nominated, a candidate must have a substantial number of publications in cardiology, including many as lead or senior author. See his publications here.
Many medical programs seek active learning instructional techniques to involve students directly and dynamically in the learning process itself. Implementing such techniques has proven challenging in some disciplines, such as biostatistics. In this paper, Nowacki applies the 4MAT framework to educational planning and transforms a biostatistics course into an experience focused on active learning. Using this four-question approach, she describes the specific learning activities and materials used during each class session and throughout the course block. Click here for more.
Determining a reasonable sample size for a multi-reader diagnostic accuracy study can be challenging. Computer-aided detection (CAD) studies are particularly challenging because the reader can detect more than one suspicious finding per patient. To determine sample size, one must take into consideration the correlation between findings from the same patient. In this paper, Obuchowski and Hillis extend previous work on sample size calculations to studies with potentially multiple findings per patient. They present tables which provide ball-park estimates of sample size for multiple reader studies with multiple findings per patient. Click here for more.
There is a well-documented increased risk of diminished outcomes following kidney transplantation among African American recipients relative to other racial/ethnic groups. Studies suggest that this association is based on multiple factors, including immunological responses and socioeconomic conditions. In this study, using a national cohort of recipients in the United States, Schold et al. determined that this relative risk is marked among younger age groups but highly attenuated, and even eliminated, among older recipients. Findings have important implications for organ allocation policy, transplant candidacy decisions, and for improved understanding of the etiology of diminished outcomes among African American transplant recipients. Click here for more.
Wang et al. (2012) proposed a generalized negative-binomial model with structured dispersion for a functional Magnetic Resonance Imaging (fMRI) study to investigate the motor recovery mechanisms in chronic stroke patients. The effects of inappropriate statistical models that ignore the nature of data were addressed through Monte Carlo simulations. Based on the proposed model, significant activation differences were observed in a number of cortical regions for stroke versus control and as a result of treatment; notably, these differences were not detected when the data were analyzed using a conventional linear regression model. The findings provided an improved fMRI data analysis protocol, specifically for pixel/voxel counts. Click here for more.
Computer-aided detection (CAD) algorithms have been developed to help physicians find early disease, often missed on standard imaging tests. CAD algorithms have been developed and tested for breast cancer, colon polyps, vertebral fracture, and lung nodules. Before CAD algorithms can be used to diagnose patients, they must undergo extensive evaluation to demonstrate their value. With each incremental improvement in the CAD algorithm, similar evaluations are required. In this paper a statistical modeling approach is presented to predict the performance of an improved CAD algorithm based on small technical studies. The new method is illustrated for a CAD study of lung cancer detection on chest x-rays. Click here for more.
Alex Z. Fu, PhD et al. quantified, at the US national level, the marginal differences in a list of health-related quality of life (HRQoL) measures for diabetic patients with and without macrovascular comorbid conditions. The list of HRQoL measures included EQ-5D index, EQ-VAS, SF-12 PCS, and SF-12 MCS. Results of this study are valuable for future comparative-effectiveness and cost-effectiveness analyses in diabetes. More details of the study and the results can be found here.
Patient Participation in Research Among Solid Organ Transplant Recipients in the United States. The principal findings from this study indicate that in the transplant recipient population, patients who participate in research are systematically different from non-participants. Participants differ with respect to demographic characteristics, socioeconomic status, comorbid conditions, and distance to centers. Participants also have superior outcomes relative to non-participants, and rates of participation vary widely by center. These results suggest that the external validity of research findings may be questionable and that future study is needed to understand the mechanisms of these differences. Read more.
Xiao-Feng Wang, PhD, et al. proposed new fast Fourier transform (FFT) estimation algorithms for two measurement error problems: density/distribution estimation in an additive error model and nonparametric regression with errors-in-variables. A new software package, "decon" for R, was also developed, containing a collection of functions for handling measurement error problems using deconvolution kernel methods. The paper was published in the Journal of Statistical Software, 2011; 39(10): 1-24 (ISI Journal Citation Reports ranking, 2009-2010: 8/100 in Statistics & Probability). Full details of the study and the software can be found here.
Mascha EJ and Imrey PB: Factors affecting power of tests for multiple binary outcomes. Statistics in Medicine 2010; 29: 2890-2904.
In perioperative clinical research the primary outcome is frequently a composite endpoint consisting of several binary events, such as post-operative complications. Mascha and Imrey show how the relative efficiencies of tests comparing groups on such a composite depend on the magnitudes and variabilities of the component incidences and treatment effects as well as component correlations. Multivariate GEE tests which estimate either a common effect or distinct effects across components are more flexible and often more powerful than the traditional collapsed composite (any-vs-none) or count methods. Particularly useful is the proposed average relative effect test, unique in not being ‘driven’ by the highest frequency components. When effects are expected to differ across components and relative effects are at least as important as absolute effects, this test may well be preferred.
Mascha EJ, Sessler DI. Statistical grand rounds: Equivalence and noninferiority testing for regression models and repeated measures designs. Anesth Analg 2011, 112: 678–687.
Equivalence and noninferiority designs are useful when superiority of one intervention over another is neither expected nor required. Mascha and Sessler give a tutorial on design and analysis of such studies, with a focus on regression models to facilitate covariable adjustment, interaction assessment, and repeated-measures designs. They test equivalence in the regression setting by assessing whether the (100 – 2 × alpha)% confidence interval for the covariable-adjusted treatment effect beta estimate falls within the a priori specified equivalence region, and/or with two one-sided tests; noninferiority is basically a 1-sided equivalence test of whether a new treatment is ‘not worse’ than standard. Given group-time interactions, equivalence and noninferiority tests are conducted at specific times. Although they focus on continuous outcomes, extensions to other data types and optimal sample size formulae are discussed.
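The confidence-interval formulation of the two one-sided tests described above can be sketched as follows, using a normal approximation for a single adjusted effect estimate; the equivalence margin `delta` and the numbers in the example are arbitrary illustrations, not values from the tutorial.

```python
from statistics import NormalDist

def tost_equivalence(diff, se, delta, alpha=0.05):
    """Claim equivalence if the (100 - 2*alpha)% CI for the effect
    (e.g., a covariable-adjusted treatment-effect beta) lies entirely
    inside the equivalence region (-delta, +delta).  This is equivalent
    to rejecting both one-sided hypotheses at level alpha."""
    z = NormalDist().inv_cdf(1 - alpha)     # one-sided critical value
    lower, upper = diff - z * se, diff + z * se
    return -delta < lower and upper < delta
```

With `diff=0.2`, `se=0.1`, and `delta=0.5`, the 90% CI of roughly (0.04, 0.36) sits inside the ±0.5 margin, so equivalence is claimed; shifting the estimate to 0.45 pushes the upper limit past the margin and the claim fails. A noninferiority test would check only the one clinically relevant bound.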
Hu and Fu propose a general framework for predicting utility for joint health states. This framework includes the three standard nonparametric estimators (multiplicative, minimum, and additive) as special cases. A new, simple nonparametric estimator, the adjusted decrement estimator, U_ij = U_min - U_min(1 - U_i)(1 - U_j), is introduced under the proposed framework. When applied to two independent data sources, the new nonparametric estimator not only generated unbiased predictions of utilities for joint health states but also had the lowest root mean squared error and highest concordance compared with other nonparametric and parametric estimators.
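The adjusted decrement estimator above is simple enough to compute directly; the sketch below implements the stated formula, with the example utilities chosen arbitrarily for illustration.

```python
def adjusted_decrement(u_i, u_j):
    """Adjusted decrement estimator for the utility of joint state (i, j):
    U_ij = U_min - U_min * (1 - U_i) * (1 - U_j), U_min = min(U_i, U_j).
    The decrement below U_min shrinks as either single-state utility
    approaches 1, so a mild second condition barely reduces utility."""
    u_min = min(u_i, u_j)
    return u_min - u_min * (1 - u_i) * (1 - u_j)

u_joint = adjusted_decrement(0.8, 0.6)
```

For U_i = 0.8 and U_j = 0.6 this gives 0.6 - 0.6(0.2)(0.4) = 0.552, which falls between the minimum estimator (0.6) and the multiplicative estimator (0.48).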
Kidney transplantation offers a significant survival advantage to End-Stage Renal Disease patients. Current policies used to allocate deceased donor kidneys attempt to balance utility and equity for patients. In this manuscript, Schold et al. propose that even with the limited availability of organs, there are marked efficiencies that could be implemented to improve allocation policy and ultimately extend patients' lives. These improvements stem from (a) identifying the right organ for the right patient based on the expected benefit associated with a given transplant and (b) reducing variability in centers' implementation of policy, which contributes to inefficiencies. They suggest that these changes can improve allocation and utilization of donor organs without deleteriously impacting equity for patients.
Hu et al study R2 statistics of explained variance for linear mixed-effect models when applied to longitudinal data, where the R2 statistics are computed at different levels to measure respectively within- and between-subject variabilities explained by the covariates. By deriving the probability limits, the interpretation of explained variance for the existing R2 statistics is found to be clear only in the case where the covariance matrix of the outcome vector is compound symmetric. Two new R2 statistics are proposed to address the effect of time-dependent covariate means. In the general case where the outcome covariance matrix is not compound symmetric, they introduce the concept of compound symmetry projection and use it to define new R2 statistics.
Numerous studies have reported a strong inverse association between waiting time on dialysis and renal transplant outcomes. In a recent study, Jesse Schold, PhD, et al. demonstrated that the majority of the effect is explained by the time to patients' placement on the waiting list rather than cumulative dialysis time. These findings suggest that waiting time may be a proxy for socioeconomic status, access to care, and pre-existing comorbidities rather than dialysis exposure alone. This study may have important implications for organ allocation policy, risk adjustment, and prospective patient interventions. The study can be found in the American Journal of Transplantation.
Michael Kattan, PhD, and John Barnard, PhD, recently received $1.2 million from the NIH to greatly expand and enhance their risk calculator constructor, found here (www.risk-calculator-constructor.org). Several interface improvements are planned, as well as functionality enhancements including missing predictor imputation.
Alex Z. Fu, PhD discusses results of a study finding that Statins Are Underused in Type 2 Diabetics in Renal and Urology News.
Xiaofeng Wang, Ph.D. et al proposed testing methods to compare nonparametric surfaces. They considered a test statistic by means of an L2-distance under a completely heteroscedastic multivariate nonparametric model. They also extended the test statistic for use in the case of spatial correlated errors. Two bootstrap procedures were described in order to approximate the critical values of the test depending on the nature of random errors. Both theoretical properties and numerical properties of the proposed methods were investigated. The resulting algorithms and analyses were illustrated with a real medical image study. Full details of the study can be found here.