The Statistical Analysis of Doubly Truncated Data - Prof Jacobo de Uña-Álvarez, Prof Carla Moreira

1.3 Double Truncation

A variable of interest $X^*$ is said to be doubly truncated by a couple of random variables $(U^*, V^*)$ if the observation of $X^*$ is possible only when $U^* \le X^* \le V^*$ occurs. In such a case, $U^*$ and $V^*$ are called left‐ and right‐truncation variables respectively. Double truncation reduces to left‐truncation when $V^*$ degenerates at $+\infty$, while it corresponds to right‐truncation when $U^* = -\infty$. This book is focused on the problem of estimating the distribution of $X^*$, and other related curves, from a set of iid triplets $(U_i, X_i, V_i)$, $1 \le i \le n$, with the distribution of $(U^*, X^*, V^*)$ given $U^* \le X^* \le V^*$.
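The sampling mechanism just described can be mimicked with a small simulation. The sketch below writes x for the variable of interest and (u, v) for the truncation couple; the uniform population model and the function names `draw_triplet` and `sample_doubly_truncated` are illustrative assumptions, not taken from the book. It only shows that a triplet enters the sample when u <= x <= v holds, and is discarded otherwise.

```python
import random


def draw_triplet(rng):
    """Draw one (u, x, v) from a purely illustrative population model."""
    x = rng.uniform(0.0, 1.0)     # variable of interest (assumed uniform)
    u = rng.uniform(-0.25, 0.5)   # left-truncation variable (assumed)
    v = rng.uniform(0.5, 1.25)    # right-truncation variable (assumed)
    return u, x, v


def sample_doubly_truncated(n, rng):
    """Return n triplets retained under the rule u <= x <= v."""
    kept = []
    while len(kept) < n:
        u, x, v = draw_triplet(rng)
        if u <= x <= v:           # the triplet is observable
            kept.append((u, x, v))
    return kept


rng = random.Random(0)
sample = sample_doubly_truncated(500, rng)
```

Under this assumed model, values of x close to 0 or 1 are retained less often than central values, so the observed x values do not follow the uniform population distribution.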

There are several scenarios where double truncation appears in practice. One setting leading to double truncation is that of interval sampling, where the sample is restricted to the individuals with event between two specific dates $d_0$ and $d_1$ (Zhu and Wang, 2012). Then, the right‐truncation time is $V^* = d_1 - T_0$, where $T_0$ denotes the date of onset for the time‐to‐event, and the left‐truncation time is $U^* = V^* - \varphi$, where $\varphi = d_1 - d_0$ is the interval width. The Childhood Cancer Data in Section 1.4.1 is an example of data obtained through interval sampling.
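The truncation limits under interval sampling follow directly from the two sampling dates and the date of onset. A minimal sketch, assuming dates are expressed as decimal years and using hypothetical helper names (`interval_truncation_times`, `is_sampled`) and hypothetical dates:

```python
def interval_truncation_times(onset, d0, d1):
    """Return (u, v) for an individual whose time-to-event starts at `onset`.

    The event date is onset + x, so requiring d0 <= onset + x <= d1
    is the same as requiring u <= x <= v with the limits below.
    """
    if d1 < d0:
        raise ValueError("d1 must not precede d0")
    v = d1 - onset   # right-truncation time
    u = d0 - onset   # left-truncation time, equal to v - (d1 - d0)
    return u, v


def is_sampled(x, onset, d0, d1):
    """True when the event date onset + x falls inside [d0, d1]."""
    u, v = interval_truncation_times(onset, d0, d1)
    return u <= x <= v


# Hypothetical individual with onset at 1995.0 and a window from 1999.0 to 2004.0
u, v = interval_truncation_times(1995.0, 1999.0, 2004.0)
```

Whatever the onset date, the difference v - u equals the interval width d1 - d0, which is why the two truncation limits are linearly linked in this design.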

With interval sampling the variable $V^* - U^*$ is degenerate at $\varphi$. This occurs in other sampling schemes too, in which $U^*$ and $V^*$ are certain subject‐specific event dates. An illustrative example is given by the Parkinson's Disease Data, see Section 1.4.5, where $V^*$ is the individual age at blood sampling. When $V^* - U^*$ is constant, the couple $(U^*, V^*)$ falls on a line, and its joint density does not exist, even when the truncating variables may be continuous.

In other situations, the truncating variables $U^*$ and $V^*$ are not linked through the linear equation $V^* = U^* + \varphi$. For example, $U^*$ and $V^*$ could represent some random observation limits beyond which the variable of interest cannot be sampled or detected. Situations like this occur, for example, in Astronomy, as illustrated in Section 1.4.4.

With random double truncation, both large and small values of $X^*$ are observed in principle with a relatively small probability. However, the real observational bias for $X^*$ varies from application to application, depending on the joint distribution of $(U^*, V^*)$. We will see, for example, that the probability of sampling a value $X^* = x$, namely $P(U^* \le x \le V^*)$, may be roughly constant, inducing no observational bias; or that it may be roughly decreasing, indicating the dominance of the right‐truncation bias relative to the left‐truncation bias.
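Both shapes of the sampling probability can be reproduced in a toy setting. The sketch below assumes, purely for illustration, that the left-truncation variable is uniform on [a, b] and that the right-truncation variable equals the left one plus a fixed width phi; the function name `sampling_probability` and all numerical values are hypothetical. In that case the probability that the truncation interval covers x is the length of the overlap between [x - phi, x] and [a, b], divided by b - a.

```python
def sampling_probability(x, a, b, phi):
    """P(U <= x <= U + phi) for U uniform on [a, b] (illustrative model)."""
    lo = max(a, x - phi)   # smallest value of U that still covers x
    hi = min(b, x)         # largest value of U that still covers x
    return max(0.0, hi - lo) / (b - a)


grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# Wide support for U: every x in [0, 1] is covered with the same probability
flat = [sampling_probability(x, -0.5, 1.0, 0.5) for x in grid]

# Narrow support for U: large x values are covered less and less often
decreasing = [sampling_probability(x, -0.5, 0.25, 0.5) for x in grid]
```

In the first configuration the probability is constant over [0, 1], so the truncation induces no observational bias on that range; in the second it is non-increasing in x, so right-truncation dominates.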

Another issue of relevance is that of the identifiability of the distribution of $X^*$. Intuitively it is clear that with doubly truncated data it is only possible to estimate the distribution of $X^*$ conditional on $a_U \le X^* \le b_V$, where $a_U$ and $b_V$ denote respectively the lower and upper endpoints of the supports of $U^*$ and $V^*$ (see Chapter 2 for details). This may have important practical consequences, as we will see. On the other hand, in applications with doubly truncated survival data the estimates correspond to the susceptible population for which the terminal event of interest is sure. This is in contrast to the standard analysis of survival times where a portion of the individuals may belong to the so‐called cured fraction, or immunes. This should be taken into account when interpreting the results from the analysis.

An important difference of double truncation when compared to one‐sided truncation is that, with doubly truncated data, the NPMLE of the probability distribution has no explicit form. In fact, the NPMLE may be non‐unique and even non‐existing (Xiao and Hudgens, 2019); see Chapter 2. Several iterative algorithms that have been proposed to compute the NPMLE in practice (Efron and Petrosian, 1999; Shen, 2010) will be reviewed in this book, and simulated and real data examples will be analysed with existing libraries of the software R. Semiparametric and parametric alternatives to the NPMLE will be introduced too; these approaches avoid some of the aforementioned potential issues of non‐uniqueness or non‐existence of the NPMLE, also reducing the variance at the price of introducing some bias in estimation. Also, resampling procedures, testing problems, smoothing methods, regression models and multi‐state data analysis under double truncation will be presented.
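To give a flavour of what an iterative computation of the NPMLE can look like, the sketch below implements a simple fixed-point (self-consistency) iteration on the conditional likelihood, in the spirit of the algorithms cited above. It is an assumption-laden illustration, not the book's algorithm nor the implementation found in any R library: the function name `npmle_fixed_point`, the uniform starting value, and the stopping rule are all choices made here. With J[i][j] = 1 when x[j] lies in [u[i], v[i]], and F[i] the mass the current estimate assigns to the i-th truncation interval, each step sets the mass at x[j] proportional to 1 / sum_i (J[i][j] / F[i]).

```python
def npmle_fixed_point(u, x, v, max_iter=1000, tol=1e-10):
    """Iterate the score equation of the conditional likelihood (sketch)."""
    n = len(x)
    if any(not (u[i] <= x[i] <= v[i]) for i in range(n)):
        raise ValueError("each observation must satisfy u <= x <= v")
    # J[i][j] = 1.0 when x[j] falls inside the i-th truncation interval
    J = [[1.0 if u[i] <= x[j] <= v[i] else 0.0 for j in range(n)]
         for i in range(n)]
    f = [1.0 / n] * n                      # start from equal masses
    for _ in range(max_iter):
        # Mass currently assigned to each truncation interval
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        # Updated masses, then renormalised to sum to one
        new = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [w / total for w in new]
        change = max(abs(a - b) for a, b in zip(new, f))
        f = new
        if change < tol:
            break
    return f


# No effective truncation: every x lies in every interval
equal = npmle_fixed_point([0, 0, 0], [1, 2, 3], [5, 5, 5])

# Overlapping intervals: the masses are no longer equal
weights = npmle_fixed_point([0, 1, 2], [1, 2, 3], [2, 3, 4])
```

When every observation lies in every truncation interval, the iteration returns equal masses 1/n, as for the ordinary empirical distribution. In configurations where the NPMLE does not exist, an iteration of this kind may keep shrinking some masses towards zero without settling, which is one practical symptom of the issues mentioned above.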
