## Department of Quantitative Health Sciences

### The Hazard Package

#### Hazard Function Technology

Some of the most relevant outcomes of medical procedures, or of the life-history of machines, are time-related events. The "raw data" for such an event is the time interval between some defined "time zero" (t=0) and the occurrence of the event. The distribution of a collection of these time intervals can be viewed as a cumulative distribution table or graph, although more commonly the complement of the cumulative distribution is displayed as a so-called survivorship function. Another way to visualize the intervals is as a histogram or probability density function. However, because the fundamental questions about these intervals relate to some biologic or natural phenomenon across time, the more natural domain for study is the rate of occurrence.

The rate of occurrence of a time-related event is known as the hazard function. John Graunt brought this word from dicing into the arena of time-related events during the 17th century. It is sometimes called the "force of mortality." In financial circles, it is the reciprocal of the Mills ratio.
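The relationships among these views of the same distribution can be written compactly. The notation below is an assumption introduced for illustration, not taken from the package documentation: $F(t)$ is the cumulative distribution of the intervals, $S(t)$ the survivorship function, $f(t)$ the probability density, $h(t)$ the hazard function, and $H(t)$ the cumulative hazard.

$$
S(t) = 1 - F(t), \qquad
h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\,\ln S(t), \qquad
H(t) = \int_0^t h(u)\,du, \qquad
S(t) = \exp\bigl[-H(t)\bigr]
$$

Any one of these functions determines the others, so studying the rate of occurrence loses no information relative to the survivorship or density views.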

Fundamentally, one is dealing only with the distribution of a positive variable, so the methodology embodied in hazard function analysis is applicable to any positive-valued variable.

The nature of living things and real machines is such that lifetimes (or other time-related events) often follow rather simple, low-order distributions. For this reason, we believe that a low-order, parametric characterization of the distribution can be accomplished.

The parametric approach taken in the hazard procedures developed in the early 1980s at the University of Alabama at Birmingham was a decompositional approach. The distribution of intervals is viewed as consisting of one or more overlapping "phases" (herein called early, constant, and late) additive in hazard (competing risks). A generic functional form is utilized for the phases that can be simplified into a large number of hierarchically nested forms.
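The additive structure can be illustrated with a minimal Python sketch. The three phase shapes, the function names, and all parameter values below are hypothetical stand-ins chosen for simplicity; they are not the generic functional form used by the hazard procedures. The point is only that hazards (and cumulative hazards) add across phases, so overall survival is the product of the per-phase survival factors.

```python
import math

# Hypothetical phase shapes for illustration only; the package's own
# generic functional form is not reproduced here.

def early_phase(t, mu=0.2, tau=0.5):
    """Return (hazard, cumulative hazard) for a rapidly decaying early phase."""
    decay = math.exp(-t / tau)
    return mu / tau * decay, mu * (1.0 - decay)

def constant_phase(t, mu=0.02):
    """Return (hazard, cumulative hazard) for a constant-rate phase."""
    return mu, mu * t

def late_phase(t, mu=0.001, power=3.0):
    """Return (hazard, cumulative hazard) for a rising late phase."""
    return mu * power * t ** (power - 1.0), mu * t ** power

PHASES = (early_phase, constant_phase, late_phase)

def total_hazard(t):
    """Phases are additive in hazard."""
    return sum(phase(t)[0] for phase in PHASES)

def total_cumulative_hazard(t):
    """Cumulative hazards therefore add as well."""
    return sum(phase(t)[1] for phase in PHASES)

def survival(t):
    """Survivorship implied by the summed cumulative hazard."""
    return math.exp(-total_cumulative_hazard(t))

if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0, 10.0):
        print(f"t={t:5.1f}  hazard={total_hazard(t):.4f}  survival={survival(t):.4f}")
```

With these stand-in shapes the early phase dominates near time zero, the constant phase sets the floor in between, and the late phase takes over at long follow-up.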

Each phase is scaled by its own log-linear function of concomitant information. This allows the model to be non-proportional in hazards. Proportional hazards is an assumption often made, but often unrealistic.
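A small Python sketch shows why scaling each phase separately produces non-proportional hazards. Everything here is an assumption made for illustration: a two-phase model, a single binary covariate `x`, the helper names, and the coefficient values. The covariate raises the early phase but leaves the constant phase unchanged, so the hazard ratio between the two groups changes over time.

```python
import math

def phase_scale(intercept, coefs, x):
    """Log-linear scaling: exp(intercept + sum of coefficient * covariate)."""
    return math.exp(intercept + sum(b * xi for b, xi in zip(coefs, x)))

def hazard(t, x):
    """Two-phase hazard; each phase has its own (hypothetical) coefficients."""
    tau = 0.5  # assumed time constant of the early phase
    early = phase_scale(-1.0, [1.2], x) * math.exp(-t / tau) / tau
    constant = phase_scale(-4.0, [0.0], x)  # covariate has no effect here
    return early + constant

def hazard_ratio(t):
    """Hazard for x = 1 relative to x = 0 at time t."""
    return hazard(t, [1.0]) / hazard(t, [0.0])

if __name__ == "__main__":
    for t in (0.0, 1.0, 3.0, 10.0):
        print(f"t={t:5.1f}  hazard ratio={hazard_ratio(t):.3f}")
```

In this sketch the ratio is above 3 at time zero and decays toward 1 as the early phase fades, which a single proportional-hazards coefficient could not represent.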

Finally, the hazard model has been enriched in three ways. First, because the intervals may not be known completely (incomplete, censored data), right censoring, left censoring, and interval censoring have been incorporated into the procedure. Second, the events considered may be repeating. This automatically accommodates a wide class of time-varying co-variables, namely those that can be considered to change at specific intervals. Third, the event may be weighted on a positive scale (such as cost). Thus, the procedure, at its most complex, can accommodate time-related repeating cost data, with time-varying co-variables and a non-proportional hazard structure.
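To make the three censoring types concrete, the sketch below computes standard log-likelihood contributions for exact, right-censored, left-censored, and interval-censored observations. It is a textbook formulation under stated assumptions, not the package's implementation: the record layout, the function names, and the constant-hazard model used as a stand-in are all hypothetical, and repeated events and weighting are not covered.

```python
import math

def log_lik_contribution(obs, hazard, survival):
    """Log-likelihood contribution of one observation (hypothetical record layout)."""
    kind = obs["kind"]
    if kind == "event":      # event observed exactly at obs["time"]
        t = obs["time"]
        return math.log(hazard(t)) + math.log(survival(t))
    if kind == "right":      # event known only to occur after obs["time"]
        return math.log(survival(obs["time"]))
    if kind == "left":       # event known only to occur before obs["time"]
        return math.log(1.0 - survival(obs["time"]))
    if kind == "interval":   # event known to occur between lower and upper
        return math.log(survival(obs["lower"]) - survival(obs["upper"]))
    raise ValueError(f"unknown observation kind: {kind!r}")

def log_likelihood(data, hazard, survival):
    """Sum of the contributions over all observations."""
    return sum(log_lik_contribution(obs, hazard, survival) for obs in data)

# Stand-in model: constant hazard with an assumed rate of 0.1 per unit time.
RATE = 0.1

def example_hazard(t):
    return RATE

def example_survival(t):
    return math.exp(-RATE * t)

EXAMPLE_DATA = [
    {"kind": "event", "time": 2.0},
    {"kind": "right", "time": 5.0},
    {"kind": "left", "time": 3.0},
    {"kind": "interval", "lower": 1.0, "upper": 4.0},
]

if __name__ == "__main__":
    print(log_likelihood(EXAMPLE_DATA, example_hazard, example_survival))
```

Each censoring type uses only the part of the distribution that the observation actually constrains, which is why incomplete intervals can still contribute to estimation.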

For questions or comments, please contact us at hazard@bio.ri.ccf.org