The Statistical Analysis of Doubly Truncated Data - Prof Jacobo de Uña-Álvarez, Prof Carla Moreira

1.4.1 Childhood Cancer Data

The Childhood Cancer Data were gathered from the IPO (Instituto Português de Oncologia) of Porto, Portugal, by the RORENO (Registro Oncológico do Norte) service. The information corresponds to all children diagnosed with cancer between 1 January 1999 and 31 December 2003 in the region of North Portugal, which includes five districts: Porto, Braga, Bragança, Vila Real and Viana do Castelo. The variable of main interest, $X$, is the age at diagnosis which, by definition of childhood cancer, is supported on the interval from 0 to 15 (time in years). The number of cases was 409. However, for three cases the value of $X$ was not available, so we only consider the $n = 406$ children with complete information.

Because of the interval sampling, the age at diagnosis $X$ is doubly truncated by the pair $(U, V)$, where the right‐truncation variable $V$ is the time in years from birth (date of onset) to 31 December 2003, and the left‐truncation variable is $U = V - 5$. The triplets $(U_i, X_i, V_i)$, $i = 1, \ldots, n$, with the values observed for $(U, X, V)$ were reported in Moreira and de Uña‐Álvarez (2010), while de Uña‐Álvarez (2020) included the cancer group in the statistical analysis. Ordinary descriptive statistics can be applied to the information gathered over this five‐year window to compute, for instance, the average age at cancer diagnosis. However, if the goal is to describe the population of children eventually developing cancer, the double truncation must be acknowledged and properly corrected so that potential biases are avoided.
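The interval sampling described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the 365.25-day year used to convert days to years, and the example birth and diagnosis dates are all assumptions. The left-truncation limit is computed here as the age at the start of the window, which matches the right-truncation limit minus five years up to rounding.

```python
from datetime import date
from typing import Tuple

DAYS_PER_YEAR = 365.25  # assumption: average year length for day-to-year conversion

WINDOW_START = date(1999, 1, 1)   # start of the sampling window (from the text)
WINDOW_END = date(2003, 12, 31)   # end of the sampling window (from the text)


def years_between(start: date, end: date) -> float:
    """Signed time from start to end, in years."""
    return (end - start).days / DAYS_PER_YEAR


def truncation_limits(birth: date) -> Tuple[float, float]:
    """Return (u, v): the child's age at the start and at the end of the window.

    u is negative for children born after the window opens.
    """
    u = years_between(birth, WINDOW_START)
    v = years_between(birth, WINDOW_END)
    return u, v


def is_sampled(birth: date, diagnosis: date) -> bool:
    """A case enters the sample only if u <= x <= v, with x the age at diagnosis."""
    x = years_between(birth, diagnosis)
    u, v = truncation_limits(birth)
    return u <= x <= v


# Hypothetical child, born mid-1995 and diagnosed inside the window.
example_birth = date(1995, 6, 15)
example_u, example_v = truncation_limits(example_birth)
example_in_sample = is_sampled(example_birth, date(2001, 3, 10))
```

A diagnosis dated before 1 January 1999 or after 31 December 2003 makes `is_sampled` return `False`, which is exactly how cases are lost to double truncation.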

Interestingly, the observed values for the left‐truncation variable $U$ range between $-4.5$ and 14.5 (years); equivalently, the observed values for the right‐truncation variable $V$ range between 0.5 and 19.5. This means that the lower endpoint of the support of $U$ does not exceed that of $X$, and the upper endpoint of the support of $V$ is not below that of $X$. Thus, in this case, the target variable is observable on its whole support, and there are no identification issues for $F$, the cdf of $X$. Information on $X$ is summarized in Table 1.1.
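The coverage argument can be checked numerically with a small sketch. The helper name is hypothetical, and two inputs are assumptions rather than values quoted from the text: the smallest left-truncation value is taken as 0.5 - 5 = -4.5 (the smallest right-truncation value minus the five-year window), and the support of the age at diagnosis is taken to run from 0 to 15 years.

```python
def support_is_covered(u_min_obs: float, v_max_obs: float,
                       x_lower: float, x_upper: float) -> bool:
    """Check that the truncation limits leave the whole support observable.

    The smallest observed left limit bounds the lower endpoint of the left
    truncation variable from above, and the largest observed right limit
    bounds the upper endpoint of the right truncation variable from below,
    so the observed extremes are enough to verify both conditions.
    """
    return u_min_obs <= x_lower and v_max_obs >= x_upper


# Ranges for the Childhood Cancer Data (the -4.5 is derived, see lead-in).
covered = support_is_covered(u_min_obs=-4.5, v_max_obs=19.5,
                             x_lower=0.0, x_upper=15.0)
```

If the smallest left limit were positive, say 1.0, ages below one year could never be observed and the check would fail.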

Table 1.1 Descriptive statistics for Childhood Cancer Data: sample size and mean (and standard deviation, SD) for the age at diagnosis (years).

Group                             n     Mean (SD)
All                               406   6.47 (4.50)
By gender      Female             178   6.43 (4.51)
               Male               228   6.51 (4.51)
By ICCC Group  Leukemia           107   6.30 (4.15)
               Lymphoma           57    8.66 (4.39)
               N. System Tumour   94    6.38 (4.29)
               Neuroblastoma      38    3.16 (3.47)
               Other              105   6.87 (4.70)
               Missing            5     3.92 (5.18)
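As a quick consistency check on Table 1.1, the group sizes should add up to 406 and the size-weighted group means should reproduce the overall mean of 6.47 up to rounding. The sketch below uses only the figures printed in the table; the function and variable names are illustrative.

```python
from typing import List, Tuple

# (sample size, mean age at diagnosis) pairs copied from Table 1.1
BY_GENDER: List[Tuple[int, float]] = [(178, 6.43), (228, 6.51)]
BY_ICCC: List[Tuple[int, float]] = [
    (107, 6.30),  # Leukemia
    (57, 8.66),   # Lymphoma
    (94, 6.38),   # N. System Tumour
    (38, 3.16),   # Neuroblastoma
    (105, 6.87),  # Other
    (5, 3.92),    # Missing
]


def pooled_mean(groups: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the total sample size and the size-weighted mean of the group means."""
    total_n = sum(n for n, _ in groups)
    weighted_sum = sum(n * mean for n, mean in groups)
    return total_n, weighted_sum / total_n


n_gender, mean_gender = pooled_mean(BY_GENDER)
n_iccc, mean_iccc = pooled_mean(BY_ICCC)
```

Both breakdowns recover the full sample of 406 children and a pooled mean that rounds to 6.47. These are naive sample means, so they describe the observed cases only and carry no correction for double truncation.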

This dataset is used in Chapters 2, 3 and 5 and is available in the DTDA package as ChildCancer.
