Using mortality to predict incidence for rare and lethal cancers in very small areas

Incidence and mortality figures are needed to get a comprehensive overview of cancer burden. In many countries, cancer mortality figures are routinely recorded by statistical offices, whereas incidence depends on regional cancer registries. However, due to the complexity of updating cancer registries, incidence numbers become available 3 or 4 years later than mortality figures. It is, therefore, necessary to develop reliable procedures to predict cancer incidence at least until the period when mortality data are available. Most of the methods proposed in the literature are designed to predict total cancer (except nonmelanoma skin cancer) or major cancer sites. However, less frequent lethal cancers, such as brain cancer, are generally excluded from predictions because the scarce number of cases makes it difficult to use univariate models. Our proposal comes to fill this gap and consists of modeling jointly incidence and mortality data using spatio‐temporal models with spatial and age shared components. This approach allows for predicting lethal cancers improving the performance of individual models when data are scarce by taking advantage of the high correlation between incidence and mortality. A fully Bayesian approach based on integrated nested Laplace approximations is considered for model fitting and inference. A validation process is also conducted to assess the performance of alternative models. We use the new proposals to predict brain cancer incidence rates by gender and age groups in the health units of Navarre and Basque Country (Spain) during the period 2005–2008.


INTRODUCTION
Cancer incidence predictions play an important role in epidemiology, allowing cancer monitoring in a population even in the absence of specific control plans. For administrative purposes, predictions are also useful to support public health decision-making processes related to interventions, screening, cancer control programs, treatments, and rehabilitation. Cancer predictions serve different purposes. In countries without a national cancer registry, the interest lies in estimating or predicting cancer incidence at the national level. For that aim, different prediction techniques have been developed by Galceran et al. (2017), Uhry et al. (2007), and Ferlay et al. (2018). In some countries, such as Spain, France, or Italy, cancer incidence figures are monitored by provincial cancer registries covering only a part of the population (see Figure A.1 in the Appendix); nevertheless, mortality numbers are provided by National Statistical Offices, making them available at different levels (municipality, province, autonomous region, or national level). In this context, different approaches and statistical models have been developed to estimate national incidence using national mortality and the pooled incidence data provided by the local registries (Møller et al., 2003). Due to the complexity of updating cancer registries, incidence numbers become available 3 or 4 years later than mortality figures. The completeness of a cancer registry database is a very important quality requirement. Therefore, cancer registries, researchers, and local health decision-makers are highly interested in developing reliable procedures to complete the registry databases at least up to the period when the mortality data are available. Most of the methods proposed in the literature stem from the recommendations of the International Agency for Research on Cancer (IARC) on how realistic predictions should be made.
According to this agency, predictions of cancer incidence should fulfill a list of requirements. First, predictions should be smooth over time. Abrupt changes in time trends may lead to the appearance of unexpected or implausible incidence trends within a registry's dataset. Second, they must be comparable across populations or regions. This makes it possible to identify high or low incidence patterns in specific regions. Third, age-specific incidence curves should be provided, including childhood cancer rates. Incidence rates of cancer in children tend to be lower than the rates in adults, although there are some well-documented geographical and ethnic differences for certain pediatric cancers, such as Brain and Central Nervous System cancer (hereafter BCNS) or leukemia (Steliarova-Foucher et al., 2017). Unexpected drops in age-specific trends may indicate problems with source files, for example, with the size of the populations at risk in the age groups.
Finally, the mortality-to-incidence (M/I) ratio should be taken into account. This ratio compares the number of deaths due to a specific type of cancer over a specific period of time (usually obtained from a source that is independent of the registry such as National Statistical Offices) with the number of new cases of that type of cancer registered during the same period by the cancer registry. This ratio is also an important indicator of completeness as long as the quality of the mortality data is good. Usually, the observed M/I ratios for a specific registry are compared to the values obtained for a similar cancer registry or region. M/I ratios higher than expected raise suspicions of incompleteness.
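As a minimal sketch of the M/I ratio described above, the following Python function divides the deaths by the new cases recorded for the same cancer, population, and period. The function name and the input counts are hypothetical and serve only as an illustration; they are not taken from any registry.

```python
def mortality_to_incidence_ratio(deaths: float, new_cases: float) -> float:
    """Return deaths / new cases for the same cancer, population and period."""
    if new_cases <= 0:
        raise ValueError("new_cases must be positive")
    return deaths / new_cases


# Hypothetical registry counts, used only to illustrate the call.
ratio = mortality_to_incidence_ratio(deaths=90, new_cases=100)
print(f"M/I ratio: {ratio:.2f}")
```

In practice the ratio would be compared with the value observed in a similar registry or region, as the text indicates, rather than judged in isolation.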
Based on all these recommendations, different methods have been proposed in the literature. The very first procedures come from the Finnish Cancer Registry (Hakulinen et al., 1986; Teppo et al., 1974), and they are based on the linear extrapolation of cancer incidence trends. However, age-period-cohort (APC) models (Holford, 1983; Osmond, 1985) have historically been the most popular tools (Dyba & Hakulinen, 2000; Møller et al., 2003). Different versions of the APC models were developed by Møller et al. (2003). In particular, different link functions between rates and covariates were employed (the log link and the power link), and shorter and longer observed time trends were used. At a local or national level, research has been conducted to predict cancer incidence rates based on the previous methodologies. Most of the literature provides incidence and mortality estimates and predictions for total cancer and/or for the most common cancer types such as breast, prostate, or colorectal cancer (Bezerra-de Souza et al., 2012; Sánchez et al., 2010). Less frequent cancer sites such as brain, pancreatic, or ovarian cancer are generally excluded from predictions. The main reason is that the aforementioned APC models require a disaggregation of the number of cases by age group and calendar year. Data scarcity then leads to imprecise incidence forecasts when these methods are used and, therefore, predicting rare or less frequent cancers becomes a challenge from a methodological point of view. As far as we know, there is no specific methodology to solve this problem, and we therefore propose a joint modeling method with spatial and age shared components that exploits the correlation between cancer incidence and mortality to improve incidence forecasts of rare cancer types.
Here we illustrate the methodology by predicting BCNS incidence rates in subregions of Navarre and the Basque Country, two northern regions of Spain that have historically presented very high BCNS incidence rates compared to other regions in Europe. This cancer is very lethal, with a high correlation between incidence and mortality, hence the need for careful monitoring over time. Our approach takes into consideration the previous recommendations of the IARC, as different age, time, gender, and spatial-specific terms are considered in the models. Moreover, our proposal is an interesting strategy to predict incidence for rare and lethal cancers because the multivariate modeling overcomes sparsity by putting together two sources of information, mortality and incidence. The rest of the paper is laid out as follows. In Section 2, an exploratory data analysis is provided to set the problem. Section 3 describes a set of joint models for predicting cancer incidence, and how computation, model parameter estimation, and prediction are conducted. A validation process is presented in Section 4. Results are shown in Section 5. Finally, the paper ends with a discussion.

FIGURE 1 Navarre (regions 8-11 on the right) and the Basque Country (regions 1-7 on the left), Spain. Region labels: 1-Gran Bilbao, 2-North Biscay, 3-South Biscay, 4-West Gipuzkoa, 5-East Gipuzkoa, 6-Donostia-Bajo Bidasoa, 7-Alava, 8-Mid Navarra, 9-Navarra South, 10-Navarra North, 11-Pamplona

BCNS INCIDENCE AND MORTALITY DATA FROM NORTHERN SPAIN
Navarre and the Basque Country are two regions located in northern Spain ranked among the European regions with the highest rates of BCNS. More precisely, Navarre and the Basque Country are in the ninth and 19th positions out of 119 in the ranking of regions with the highest rates (both genders) in Europe. Previous geographical analyses in Spain also showed a cluster of high risk in these regions. Some of these investigations were motivated by the possible association between BCNS and the types of soil cover and/or crop and plant protection treatments used in rural areas, but no evidence was found. Despite the efforts to identify BCNS risk factors, very little progress has been made. Besides exposure to ionizing radiation, no other definitive risk factor is known (Connelly & Malkin, 2007; Ugarte et al., 2015a). Our study is based on incident cases and deaths of brain and central nervous system tumors (C70-C72, International Classification of Diseases-10) reported by the regional population-based cancer registries of Navarre and the Basque Country. Data are organized by age group, gender, period, and region. More precisely, data are split by 18 age groups, gender, regions, and calendar year (1989-2008 for mortality and 1989-2004 for incidence). Figure 1 displays the regions of Navarre and the Basque Country considered in this paper. The regions are numbered from 1 to 11. Regions 1-7 belong to the Basque Country (regions 1-3 to the province of Vizcaya, regions 4-6 to the province of Gipuzkoa, and region 7 represents the province of Alava). Finally, regions 8-11 belong to the province of Navarre.
A total of 3615 cases of malignant brain tumors between 1989 and 2004 (55.29% males and 44.71% females) and 3296 deaths between 1989 and 2008 (55.10% males and 44.90% females) were reported by the two cancer registries, representing on average 225 incident cases and 165 deaths per year. Crude incidence and mortality rates of brain cancer per 100,000 inhabitants were calculated using 18 age groups, the two genders, and all the regions. Similar overall crude incidence and mortality rates were observed (6.8 and 6.20 cases per 100,000 inhabitants, respectively). Figure 2 shows age-specific incidence (continuous line) and mortality (dashed line) rates for males (blue) and females (red) during the study period. Although the distribution is similar in shape in both sexes, differences can be observed, with males having higher incidence and mortality rates in all age groups. Both incidence and mortality rates peak in the 65-80 age group, decreasing for 80+. There is also a small peak in incidence rates in early childhood (0-4 and 5-9 age groups) in both sexes. This is not very common in other cancer sites. This exploratory analysis shows that gender and age at diagnosis are particularly important in characterizing brain cancer. Similar to other research on rare cancer types (Etxeberria et al., 2017), and to ensure a sufficient number of cases to allow model fitting and prediction, the 18 groups are now reorganized into the following age groups: <40, 40-49, 50-59, 60-69, 70-79, and 80+, and time is handled in 2-year periods, 1989-1990, 1991-1992, . . . , 2007-2008. Data scarcity and the consequent huge variability preclude the analysis if yearly data are considered.

FIGURE 3 Crude incidence and mortality rate trends by gender (rates per 100,000 by 2-year period; series: incidence males, mortality males, incidence females, mortality females)
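The aggregation step described above can be sketched in Python as follows. This is only an illustration under a loudly stated assumption: the 18 original groups are taken to be 5-year bands (0-4, ..., 80-84, 85+), which the text does not spell out. The names `AGE_MAP` and `regroup` are hypothetical.

```python
import numpy as np

# Assumption: the 18 original groups are 5-year bands 0-4, ..., 80-84, 85+.
N_FINE_AGE = 18
COARSE_LABELS = ["<40", "40-49", "50-59", "60-69", "70-79", "80+"]
# Coarse-group index of each fine band: <40 takes 8 bands, the rest 2 each.
AGE_MAP = np.array([0] * 8 + [1] * 2 + [2] * 2 + [3] * 2 + [4] * 2 + [5] * 2)


def regroup(counts: np.ndarray) -> np.ndarray:
    """Collapse counts of shape (18 bands, n_years) to (6 groups, n_years // 2)."""
    n_age, n_years = counts.shape
    if n_age != N_FINE_AGE or n_years % 2 != 0:
        raise ValueError("expected 18 age bands and an even number of years")
    coarse = np.zeros((len(COARSE_LABELS), n_years), dtype=counts.dtype)
    np.add.at(coarse, AGE_MAP, counts)  # sum fine bands into coarse groups
    # Sum consecutive pairs of years: 1989-1990, 1991-1992, ...
    return coarse.reshape(len(COARSE_LABELS), n_years // 2, 2).sum(axis=2)
```

Applied to a 20-year mortality series this yields the 6 age groups by 10 periods used in the rest of the paper; the same function would be applied separately for each region and gender.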
Male (blue) and female (red) global trends of crude incidence and mortality rates are depicted in Figure 3 from 1989 to 2008. Note that incidence is only considered up to 2004. In males, the crude incidence rates increase up to 1994, they decrease up to 2000 and experience a V-shaped trend up to 2004. Incidence rates for females present an increasing trend up to 2002 and a slight decrease in the past 2 years. Crude mortality rates show an upward trend throughout the entire period for both genders.
Top panels in Figure 4 display crude incidence (left) and mortality rates (right) per 100,000 inhabitants by region. In this figure, regions located in the north and mid-Navarre are the ones presenting the highest incidence and mortality rates. Overall, the geographical patterns of incidence and mortality are not very different, suggesting a high correlation between them. This is confirmed by the scatter plot of incidence and mortality rates by region at the bottom panel of Figure 4.
The exploratory data analysis provides a preliminary idea of how brain cancer incidence and mortality behave by age group, gender, region, and time. This information is very useful to define suitable models that can appropriately fit the data.
FIGURE 4 Crude incidence and mortality rates by region for both genders (top panels) and scatter plot of incidence and mortality rates by region (bottom panel). Panel titles: crude incidence rates and crude mortality rates for the whole period in Navarre and the Basque Country.

MODELS TO PREDICT CANCER INCIDENCE USING MORTALITY DATA
In this section, different age and gender-specific shared component models are proposed to predict cancer incidence. These shared component models constitute a simple way of modeling several diseases, and they can be embedded within the general multivariate framework (MacNab, 2010). For a general review of these types of models in disease mapping, the reader is referred to some recent work by MacNab (2016a, 2016b). One difference between shared component models (SCMs) and more general multivariate models is that in SCMs, dependence between diseases is assumed a priori, whereas multivariate models are more appropriate if the relationship among diseases is unknown. Here we exploit the correlation between incidence and mortality in BCNS, and hence we propose shared component models. The context of our study is the following. Let $O_{1igjt}$ and $O_{2igjt}$ denote the number of incidence and mortality cases, respectively, by health area $i = 1, \dots, n$ (with $n = 11$), gender $g$ (male or female), age group $j$ (<40, 40-49, 50-59, 60-69, 70-79, and 80+), and time period $t = 1, \dots, 10$, where $t = 1$ corresponds to 1989-1990, $t = 2$ to 1991-1992, …, and $t = 10$ to 2007-2008. Incidence data, $O_{1igjt}$, are only available for $t = 1, \dots, 8$. In the first level of the Bayesian hierarchical structure, the likelihood, we assume that, conditional on the rates, $O_{1igjt}$ and $O_{2igjt}$ follow the Poisson distributions

$$O_{1igjt} \mid r_{1igjt} \sim \text{Poisson}(\mu_{1igjt} = N_{igjt}\, r_{1igjt}), \qquad \log \mu_{1igjt} = \log N_{igjt} + \log r_{1igjt},$$

$$O_{2igjt} \mid r_{2igjt} \sim \text{Poisson}(\mu_{2igjt} = N_{igjt}\, r_{2igjt}), \qquad \log \mu_{2igjt} = \log N_{igjt} + \log r_{2igjt}.$$

In these expressions, $N_{igjt}$ is the population at risk (the same for incidence and mortality) and $r_{1igjt}$ and $r_{2igjt}$ are the incidence and mortality rates in region $i$, gender $g$, age group $j$, and period $t$. Recall that for $t = 9, 10$ and all $i$, $g$, $j$, the observed incidence rates are unavailable.
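To make the first level of the hierarchy concrete, here is a minimal Python sketch of a Poisson log-likelihood in which the mean of each cell is the population at risk times the rate, and unobserved cells (such as incidence in the last two periods) are skipped. The function name and the use of NaN to mark unobserved counts are illustrative assumptions, not the authors' implementation.

```python
import math

import numpy as np


def poisson_loglik(counts, population, log_rate):
    """Sum of Poisson log-densities with mean population * exp(log_rate).

    Cells whose count is NaN (unobserved) do not contribute.
    """
    counts = np.asarray(counts, dtype=float)
    mu = np.asarray(population, dtype=float) * np.exp(
        np.asarray(log_rate, dtype=float)
    )
    counts, mu = np.broadcast_arrays(counts, mu)
    observed = ~np.isnan(counts)
    y, m = counts[observed], mu[observed]
    log_factorial = np.array([math.lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(m) - m - log_factorial))
```

Skipping the unobserved cells is what allows mortality to carry the information for the periods in which incidence has not yet been released.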
The interest here lies in modeling the log incidence rates and the log mortality rates jointly and, therefore, in obtaining an appropriate posterior predictive distribution for the nonobserved incidence cases. For this aim, a set of models is proposed. For ease of reading, only some of them are described in this paper. Due to the important role that gender and age groups play in describing brain cancer incidence and mortality patterns, we consider models incorporating space, time, age group, gender, and interactions between them. Let us first start with model 1 (M1), which includes a gender-specific shared component spatial term, and age and time effects common to both incidence and mortality.
Writing $r_{1igjt}$ and $r_{2igjt}$ for the incidence and mortality rates, M1 is

$$\text{M1}: \quad \log r_{1igjt} = \delta_g \phi_{ig} + \gamma_t + \alpha_j, \qquad \log r_{2igjt} = \frac{1}{\delta_g} \phi_{ig} + \gamma_t + \alpha_j.$$

In these expressions, $\boldsymbol{\phi}_{males} = (\phi_{1m}, \dots, \phi_{nm})'$ and $\boldsymbol{\phi}_{females} = (\phi_{1f}, \dots, \phi_{nf})'$ are assumed to follow multivariate normal distributions, namely $\boldsymbol{\phi}_{males} \sim N(\mathbf{0}, \sigma^2_{males} \mathbf{Q}_{\phi}^{-})$ and $\boldsymbol{\phi}_{females} \sim N(\mathbf{0}, \sigma^2_{females} \mathbf{Q}_{\phi}^{-})$, respectively, where $\mathbf{Q}_{\phi}$ is the spatial neighborhood matrix defined by Besag et al. (1991). The symbol $^{-}$ denotes the Moore-Penrose generalized inverse. Here two areas are considered neighbors if they share a common border. Note that the simplest shared component model, defined by Knorr-Held and Best (2001), includes an additional parameter $\delta$, where $\delta^2$ can be interpreted as the ratio between log-incidence and log-mortality gradients. Prediction models including shared component terms are appropriate as they make it possible to monitor how much of the spatial pattern is common to both mortality and incidence, how much is specific to each one, and to interpret $1/\delta^2$ as a kind of mortality-to-incidence ratio, something recommended by the IARC. Moreover, in this work models including gender-specific parameters $\boldsymbol{\delta} = (\delta_{males}, \delta_{females})$ are considered. This idea comes from the work by Etxeberria et al. (2018), in which different spatial shared component models are examined. In particular, they compare gender-specific shared spatial components in which the same parameter or gender-specific parameters are considered. Introducing gender-specific parameters makes the model more flexible and provides better results, as the spatial component is allowed to differ between genders, with the same or different precision parameters controlling the degree of smoothing. Additionally, a common temporal random effect and a common age effect for incidence and mortality are considered in model 1 (M1), assuming the following distributions for the vectors $\boldsymbol{\gamma}$ and $\boldsymbol{\alpha}$:

$$\boldsymbol{\gamma} \sim N(\mathbf{0}, \sigma^2_{\gamma} \mathbf{Q}_{\gamma}^{-}), \qquad \boldsymbol{\alpha} \sim N(\mathbf{0}, \sigma^2_{\alpha} \mathbf{Q}_{\alpha}^{-}).$$

Here, $\mathbf{Q}_{\gamma}$ is determined by the temporal structure and $\mathbf{Q}_{\alpha}$ is the structure matrix for the age effect.
For both terms, time and age, we assume a first-order random walk prior (RW1), as we expect that the effects of contiguous age groups and the effects of contiguous time points tend to be similar. The temporal effect is supposed to be completely structured (its covariance matrix does not contain an unstructured term) because temporal trends are typically strong for most diseases (Knorr-Held, 2000).
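The structure matrices mentioned above can be sketched in Python. The snippet builds the usual intrinsic CAR structure matrix from a border-sharing adjacency matrix (number of neighbors on the diagonal, -1 for each neighboring pair) and the structure matrix of a first-order random walk. The 4-area chain map is hypothetical, and the function names are illustrative; the actual neighborhood graph of the 11 health areas is not reproduced here.

```python
import numpy as np


def icar_structure(adjacency) -> np.ndarray:
    """Spatial structure matrix: diag(number of neighbors) minus adjacency."""
    w = np.asarray(adjacency, dtype=float)
    if w.shape[0] != w.shape[1] or not np.array_equal(w, w.T):
        raise ValueError("adjacency must be square and symmetric")
    return np.diag(w.sum(axis=1)) - w


def rw1_structure(n: int) -> np.ndarray:
    """Structure matrix of a first-order random walk over n ordered levels."""
    d = np.diff(np.eye(n), axis=0)  # (n - 1) x n first-difference matrix
    return d.T @ d


# Hypothetical map: four areas in a chain, each bordering the next one.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
Q_space = icar_structure(W)
Q_space_ginv = np.linalg.pinv(Q_space)  # Moore-Penrose generalized inverse
Q_time = rw1_structure(10)  # ten 2-year periods
Q_age = rw1_structure(6)    # six age groups
```

Both matrices are rank deficient (their rows sum to zero), which is why a generalized inverse appears in the prior covariance and why sum-to-zero constraints are typically needed.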
To gain flexibility, models including different interactions are also considered. Throughout this investigation, a wide variety of models including all possible interactions were defined and fitted. For simplicity, only the models providing the best results with this dataset are presented. We would like to emphasize that the goal here is not to propose a model for all situations, but a battery of models based on shared components that exploit the relationship between incidence and mortality. Consequently, with another dataset a different model could be chosen. We now extend M1 by including a gender-specific time trend (model M2). Model 3 (M3) expands M1 with an outcome-specific age term and, finally, models 4 (M4) and 5 (M5) also broaden M1 by incorporating gender-specific temporal and age random effects, and gender-specific temporal terms and outcome-specific age effects, respectively. Models including outcome-specific linear trends were also studied, but they did not provide good results.
Looking at Figures 2 and 3, some kind of proportionality is observed between incidence and mortality, both in the distribution of crude rates by age group and in the crude temporal trends. It therefore seems sensible to assume shared component models for the age and time effects. Based on this, model 6 (M6) and model 7 (M7) include shared component terms for age and time, respectively. In these cases, additional scaling parameters are considered for the age and time shared component terms. A detailed description of the models is provided below.
Finally, a model including spatially unstructured random effects for incidence (asymmetric formulation) is also considered. This term could explain incidence-specific variability due to region-specific factors occurring before diagnosis, such as screening, improvements in diagnostic techniques (tomography and magnetic resonance imaging in the diagnosis of brain tumors), or improvements in the completeness of the cancer registry (Ellis et al., 2014). Model 8 (M8) introduces this unstructured term as follows:

$$\text{M8}: \quad \log r_{1igjt} = \delta_g \phi_{ig} + \gamma_{gt} + \varrho\, \alpha_j + u_{1i}, \qquad \log r_{2igjt} = \frac{1}{\delta_g} \phi_{ig} + \gamma_{gt} + \frac{1}{\varrho} \alpha_j,$$

where $\varrho$ is the scaling parameter of the age shared component.
In this expression, $u_{1i}$ represents spatially unstructured random effects for incidence. Denoting $\mathbf{u}_1 = (u_{11}, \dots, u_{1n})'$, these random effects are assumed to follow a multivariate normal distribution, $\mathbf{u}_1 \sim N(\mathbf{0}, \sigma^2_u \mathbf{I}_n)$. It is noteworthy that models including other interactions (such as space-time interactions) were also considered in this work, but they did not improve results.
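As an illustration of how the terms of M8 combine, the following Python sketch assembles the log incidence and log mortality rates from their components by broadcasting. It reflects one reading of the model as described in the text (shared spatial term scaled by a gender-specific parameter, gender-specific time trend common to both outcomes, scaled shared age term, and an unstructured area effect for incidence only). All argument names (`phi`, `gamma`, `alpha`, `u`, `delta`, `rho`) are assumptions made for this example.

```python
import numpy as np


def m8_log_rates(phi, gamma, alpha, u, delta, rho):
    """Return (log incidence, log mortality) arrays of shape
    (n_area, n_gender, n_age, n_time).

    phi:   (n_area, n_gender) shared spatial effects
    gamma: (n_gender, n_time) gender-specific time trends
    alpha: (n_age,)           shared age effects
    u:     (n_area,)          unstructured effects, incidence only
    delta: (n_gender,)        spatial scaling parameters
    rho:   float              age scaling parameter
    """
    phi = np.asarray(phi, dtype=float)[:, :, None, None]
    gamma = np.asarray(gamma, dtype=float)[None, :, None, :]
    alpha = np.asarray(alpha, dtype=float)[None, None, :, None]
    u = np.asarray(u, dtype=float)[:, None, None, None]
    delta = np.asarray(delta, dtype=float)[None, :, None, None]
    log_incidence = delta * phi + gamma + rho * alpha + u
    log_mortality = phi / delta + gamma + alpha / rho
    return log_incidence, log_mortality
```

With both scaling parameters equal to 1 and no unstructured effect, the two outcomes share exactly the same log rate, which makes the role of each extra term easy to see.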

Computation, parameter estimation, and prediction
Model fitting, inference, and prediction were carried out using Bayesian methodology, specifically integrated nested Laplace approximations (INLA) (Rue et al., 2009), on a computer equipped with a 3.5″ SATA3 500GB hard drive. The fitting time for each model varied between 1 and 9 min approximately. In this paper, we were interested in obtaining predictions using INLA. The reader is referred to Etxeberria et al. (2014) and Ugarte et al. (2012) to see how predicted values are obtained when the models are presented under the umbrella of generalized linear mixed models from an empirical Bayes approach. In INLA, predictions are obtained as a part of the model fitting itself. As prediction is the same as fitting a model with some missing data, we can simply set y[i] = NA for those unobserved values we want to predict. In our case, we were interested in getting predictions using the same likelihood already used to fit the data. A detailed description of how predictions were obtained can be found in Appendix A.1. The full code to fit the models will be available on the GitHub of our research group (https://github.com/spatialstatisticsupna). Prior distributions on the precision parameters (inverses of the variance components) are required to fully specify the models. In this case, PC-priors (Simpson et al., 2017) were used for the precision parameters of all the random effects, that is, $\tau_{males} = 1/\sigma^2_{males}$ and $\tau_{females} = 1/\sigma^2_{females}$ for the spatial terms, and the analogous inverse variances of the temporal, age, and spatially unstructured terms. The reader is referred to Etxeberria et al. (2018) for a thorough insight into the sensitivity analysis conducted to assess the impact of different sets of hyperpriors (PC-priors, log gamma priors, and improper uniform priors on the standard deviations) on the final estimates of shared component models. In this study, sensitivity issues were not found. Besides, log gamma priors (the priors provided by default in INLA) were used for the additional scaling parameters of the shared components, namely $\delta_{males}$, $\delta_{females}$, and those of the age and time shared terms.
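The idea of treating prediction as a missing-data problem can be illustrated with a small Python sketch of the data layout. This is not R-INLA code: it only shows, with NumPy and NaN standing in for NA, how incidence and mortality counts might be stacked into one response array with the incidence cells of the last two periods left unobserved. The constants follow the dimensions given in the text; the function name is hypothetical.

```python
import numpy as np

N_AREAS, N_GENDERS, N_AGES, N_PERIODS = 11, 2, 6, 10
LAST_OBSERVED_INCIDENCE_PERIOD = 8  # incidence is available for periods 1-8


def stack_outcomes(incidence, mortality) -> np.ndarray:
    """Stack incidence (index 0) and mortality (index 1) counts.

    Incidence cells after the last observed period are set to NaN so that
    a model fitted to the stacked response predicts them.
    """
    y = np.stack([incidence, mortality]).astype(float)
    y[0, ..., LAST_OBSERVED_INCIDENCE_PERIOD:] = np.nan
    return y
```

Any fitting routine that ignores the NaN cells in the likelihood while keeping them in the linear predictor would then return posterior summaries for those cells as a by-product of the fit.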
Finally, as the models do not include an intercept, sum-to-zero constraints were imposed on all the terms except the shared spatial effect to ensure model identifiability.
The Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002) and the Watanabe-Akaike Information Criterion (WAIC) (Watanabe, 2010) were used as model selection criteria. The logarithmic score (LS) (Gneiting & Raftery, 2007) was used as an indicator of the predictive ability of the models. As suggested by one reviewer, only DIC values are displayed in Table 1, as the WAIC and LS measures ranked the models in a similar way (see Table A.1 in the Appendix). Note that using this dataset M5 and M8 exhibit the lowest values of DIC (and also WAIC and LS). To go into greater depth on the predictive performance of these models, a validation procedure is carried out in the next section.
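For readers less familiar with these criteria, here is a minimal Python sketch of the sample-based WAIC formula, computed from a matrix of pointwise log-likelihood values evaluated at posterior draws. This is a generic illustration under the assumption that such draws are available; INLA obtains its criteria through its own approximations, which are not reproduced here.

```python
import numpy as np


def waic(loglik) -> float:
    """WAIC on the deviance scale from pointwise log-likelihoods.

    loglik: array of shape (n_samples, n_obs), one row per posterior draw.
    """
    loglik = np.asarray(loglik, dtype=float)
    n_samples = loglik.shape[0]
    # Log pointwise predictive density, computed with a stable log-mean-exp.
    mx = loglik.max(axis=0)
    lppd = mx + np.log(np.exp(loglik - mx).sum(axis=0)) - np.log(n_samples)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = loglik.var(axis=0, ddof=1)
    return float(-2.0 * (lppd.sum() - p_waic.sum()))
```

As with the DIC, lower values indicate a better trade-off between fit and complexity.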

VALIDATING CANCER INCIDENCE PREDICTIONS
To assess the predictive ability of all the models, one-step-ahead predictions were computed based on different fitting periods; as the time unit is a 2-year period, each step corresponds to one such period. More precisely, the following process was used to generate predictions. Incidence predictions for the period 1997-1998 were based on models fitted in the period 1989-1996 (the minimum data we used to fit the model are four 2-year time periods). Predictions for 1999-2000 were based on data from 1989 to 1998, and so on. A total of six rounds of cross-validation were carried out to assess the predictive ability of the models using the global absolute relative bias (GARB), which compares the observed incidence cases, $O_{1igjt}$, with the predicted incidence cases, $\hat{O}_{1igjt}$, for each area $i$, age group $j$, and time period $t$. To look into more detailed results, gender-specific absolute relative biases were also computed. Results are shown in Table 2.
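The rolling validation scheme can be sketched in Python as below. The fold generator follows the description in the text (at least four 2-year periods for fitting, then predict the next one). The bias function implements one plausible form of an absolute relative bias, |sum(observed - predicted)| / sum(observed); this exact form is an assumption made for illustration, since the defining expression is not reproduced here, and both function names are hypothetical.

```python
import numpy as np


def one_step_ahead_folds(n_periods: int, min_fit: int = 4):
    """Yield (fit_periods, predict_period) pairs, periods numbered from 1."""
    for last_fit in range(min_fit, n_periods):
        yield list(range(1, last_fit + 1)), last_fit + 1


def absolute_relative_bias(observed, predicted) -> float:
    """Assumed form: |sum(observed - predicted)| / sum(observed)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    total = observed.sum()
    if total <= 0:
        raise ValueError("observed counts must sum to a positive value")
    return float(abs((observed - predicted).sum()) / total)


# First fold: fit on periods 1-4 (1989-1996), predict period 5 (1997-1998).
first_fit, first_target = next(one_step_ahead_folds(n_periods=8))
```

In each fold the incidence counts of the target period would be masked before fitting and compared with the posterior predictions afterwards.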
Figures in Table 2 clearly indicate that M8 provides the best results in terms of GARB, with the overall bias in this model (0.011) being about 7 times lower than that of the second-best model (M5). By gender, M8 is also the best one. For males, the GARB is 0.0203 (about 3 times lower than in M3 and M5, two competitive models), and for females the GARB is 0.0481, the lowest value among all models. At this point, it is important to emphasize that what differentiates M8 from the rest of the models is the age-specific shared term plus the spatially unstructured random effects for incidence. It appears that including these last terms in the model substantially improves predictive ability. More specifically, we have observed that models without spatially unstructured random effects for incidence underestimate the number of brain cancer incidence cases.

FIGURE 5 Age-specific relative biases in one-step-ahead predictions
Finally, as incidence varies by age group and region, it is important to assess how models predict over these groups. For health researchers, it is relevant to know if the models provide similar bias by age group and region or if there are subgroups that are better predicted than others. To gain understanding of this, age-and region-specific relative biases are computed in the next subsection.

Validation by age groups and regions
Here, age-specific and region-specific relative biases are computed for each model. Figure 5 shows how the best two models, M5 and M8, perform by age group. In general, model M8 seems to perform best, as it provides biases below 10% in all the age groups. This model provides reasonable bias results even in the more difficult age groups, <40 and 80+. For the rest of the models, U-shaped biases are observed, indicating poor performance for the oldest age groups. We should not be overly concerned about providing poor predictions for the 80+ age group, as BCNS estimates in the elderly are less important than in other age groups. Brain cancer in elderly people presents some particularities. In most cases, they are not treated, as they are usually asymptomatic and brain tumors in this age group have a slow growth rate. Some of the elderly patients also present multiple comorbidities, low tolerance to chemotherapy, high risk for radiation-induced neurotoxicity, and very limited life expectancies (Nayak & Iwamoto, 2010). This is the reason why brain tumors are just followed up among elderly patients rather than treated. In contrast, the age group <40 is important, as brain cancer is the second most frequent cancer in children and young people after leukemia. Hence, providing good predictions is key to better organizing resources for treatment and thus to avoiding premature deaths (Ugarte et al., 2015b). In this age group, model M8 performs best. Figure 6 gives region-specific relative biases for the best two models, M5 and M8. By region, model M8 is again clearly the best in terms of bias. Using this model, Southern Biscay and Western Gipuzkoa are the regions with the highest bias, followed by Northern Biscay and Northern Navarre. In summary, model M8 would be the most suitable model for providing incidence predictions, as it shows more accurate results both globally and by age group and region.
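A group-specific bias of this kind could be computed along the lines of the following Python sketch, which sums observed and predicted counts over every dimension except the one of interest (age group or region) and returns a signed relative bias per level. The sign convention (predicted minus observed) and the function name are assumptions made for this example.

```python
import numpy as np


def relative_bias_by_group(observed, predicted, axis: int) -> np.ndarray:
    """Signed relative bias (predicted - observed) / observed per level of `axis`.

    Counts are first summed over all the other axes.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    other = tuple(k for k in range(observed.ndim) if k != axis)
    obs_total = observed.sum(axis=other)
    pred_total = predicted.sum(axis=other)
    return (pred_total - obs_total) / obs_total
```

With counts arranged as (area, gender, age, period), `axis=2` would give one value per age group and `axis=0` one value per region; negative values would then indicate underestimation.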

REAL DATA ANALYSIS
In this section, model M8 is considered to provide BCNS incidence predictions in Navarre and the Basque Country by region, age group, gender, and period. This choice is based on model selection criteria together with the good performance in the validation process. Using M8, we focus on predicting incidence cases in periods when mortality figures are already available (2005-2006 and 2007-2008). First of all, the observed and the fitted number of incidence cases and their corresponding 95% credible intervals by period and gender are shown in Table 3. Predicted incidence cases (posterior means) for periods 2005-2006 and 2007-2008 in both genders and 95% credible intervals are also provided. Among males, 592 cases are predicted (290 in 2005-2006 and 302 in 2007-2008), while among females 509 are predicted (254 in 2005-2006 and 255 in 2007-2008). It can be observed that for females the fitted values are all above the observed ones. One reason may be a kind of shrinkage effect. Incidence rates for females are in general lower than for males (see Figure 3), but the difference is getting smaller with time. Hence it seems that the model tends to push female incidence towards male incidence, which would explain why the fitted incidence rates for females lie above the observed ones. Figure 7 displays temporal incidence trends and predicted values with their 95% credible bands for 2005-2008 for both genders. This figure shows an increasing trend for both genders during the study period, and this trend could continue in the forthcoming periods for males. On the other hand, the trend seems to stabilize for women from 2005 onwards. Note that long-term forecasts usually present more uncertainty than forecasts for the near future.
In our case, this is not relevant for two main reasons. First, under M8 both incidence and mortality have the same gender-specific trends, $\gamma_{gt}$, so the estimated mortality trend is used to forecast incidence and, therefore, the uncertainty does not widen. Second, the incidence forecast is anchored around the observed mortality, which reduces uncertainty. This is a very important advantage of this modeling approach over univariate incidence modeling approaches. Figures 8 and 9 display the posterior means of predicted incidence rates for each region in the last time period (2007-2008) by age groups (rows) and gender (columns). Each region is colored according to the predicted rates per 100,000 inhabitants, so that it is easy to see its ranking within the different age groups and genders. To indicate the variability …

(Figure: Gender-specific incidence rate trends and predictions; x-axis: period, 1989-1990 to 2007-2008.)

One of the most important findings of this study is that regions and age groups are not equally affected. These maps provide valuable results, as the region of Pamplona (main city of Navarre, region number 11 in Figure 1) seems to be the area with the highest rate in almost all age groups in both genders. Hence, in a hypothetical brain cancer prevention plan, this area should be considered of high priority. In contrast, Southern Navarre (region number 9 in Figure 1) is the region with the lowest rate for most age groups and both genders. Little variation between regions is observed for age groups <40, 40-49, and 50-59, where rates remain below 20 cases per 100,000 (below 5 cases per 100,000 for <40). For the 60-69 age group, males living in Pamplona and Donostia-Bajo Bidasoa, the capital city of the province of Gipuzkoa, are the most affected.

Brain cancer rates reach their maximum in the 70-79 age group, in which some geographical differences are found. Regions located on the coast of the Bay of Biscay, Alava, Pamplona, and Mid-Navarre are the ones with the highest rates in males. In females, West Gipuzkoa, Navarra North, and Pamplona are the areas with the highest rates. In both genders, Southern Navarre is the one with the lowest rates. Finally, rates decrease slightly for the 80+ age group with maps more similar to those for the 60-69 age group.
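The rates discussed above are expressed per $10^5$ inhabitants, and regions are compared through their ranking within each age group and gender. As a minimal sketch of that arithmetic, the following Python snippet converts predicted case counts into rates per $10^5$ and ranks the regions. The region labels, case counts, and population sizes are hypothetical illustration values, not data from this study, and the helper `rate_per_100k` is an assumed name.

```python
def rate_per_100k(cases, population):
    """Return the rate per 10^5 inhabitants for a given number of cases."""
    return 1e5 * cases / population


# Hypothetical predicted cases and populations for one age group and gender
# (illustration only; these are not values from the study).
predicted = {
    "Region A": (12.0, 60_000),
    "Region B": (3.0, 25_000),
    "Region C": (8.0, 80_000),
}

# Rate per 10^5 inhabitants for each region.
rates = {name: rate_per_100k(cases, pop) for name, (cases, pop) in predicted.items()}

# Regions ordered from the highest to the lowest rate.
ranking = sorted(rates, key=rates.get, reverse=True)

for name in ranking:
    print(f"{name}: {rates[name]:.1f} per 100,000")
```

In the paper the mapped quantities are posterior means of the predicted rates, so a ranking like this one would in practice be computed from model output rather than from raw counts.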

DISCUSSION
High-quality and preferably long-term population-based data on cancer incidence and mortality are crucial for cancer control and prevention. Compared with mortality figures, incidence figures usually become available approximately 3 years later due to administrative and procedural delays. Consequently, health policymakers consider alternative information, usually relying on predictions based on statistical models. Approaches based on age-period-cohort models are usually employed in the literature to provide predictions of cancer mortality or incidence counts, but these methods are not useful for rare and lethal cancers such as BCNS or pancreatic cancer due to data scarcity. Our proposal fills this gap. In this paper, gender- and age-specific shared component models are proposed to predict incidence when mortality is already available. The high correlation between incidence and mortality in brain cancer supports the joint modeling of both processes, which increases the effective sample size. The major advantage of our method is that it exploits the correlation between incidence and mortality, allowing disaggregated predictions by region, age group, and gender, variables that play an important role in BCNS epidemiology (Miranda-Filho et al., 2016). This would be impossible with a univariate prediction model for incidence, due to the scarce number of cases in certain regions and age groups.
Although model-based predictions should be interpreted in light of data limitations and modeling assumptions, we found that our proposed model provides accurate results (with posterior coefficients of variation under 15%) in general, and in particular in the sensitive age group < 40. Brain tumors are an important type of cancer in children and young adults, and understanding their epidemiology is essential for clinicians and for those involved in the care of patients or in investigating the causes of primary brain tumors in these age groups (McNeill, 2016). It should be noted that only a small proportion of brain tumors can be explained by established risk factors (exposure to ionizing radiation, rare mutations of penetrant genes, and familial history) (Fisher et al., 2007).
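The accuracy criterion quoted above, a posterior coefficient of variation under 15%, is the posterior standard deviation divided by the posterior mean, expressed as a percentage. The sketch below shows how such a check could be computed from posterior draws of predicted rates. It is an illustration under stated assumptions: the draws are simulated from gamma distributions with arbitrary parameters rather than taken from the fitted model, and `posterior_cv` is a hypothetical helper name.

```python
import numpy as np


def posterior_cv(samples):
    """Coefficient of variation (%) per column: 100 * posterior sd / posterior mean."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    sd = samples.std(axis=0, ddof=1)
    return 100.0 * sd / mean


# Simulated posterior draws of rates for three hypothetical
# region-age-gender cells (assumption: gamma-distributed draws whose
# theoretical CVs are about 10%, 14%, and 32%).
rng = np.random.default_rng(0)
draws = rng.gamma(shape=[100.0, 50.0, 10.0], scale=[0.1, 0.2, 0.5], size=(4000, 3))

cv = posterior_cv(draws)

# Flag the cells whose prediction meets the 15% criterion.
accurate = cv < 15.0

for j, (value, ok) in enumerate(zip(cv, accurate)):
    print(f"cell {j}: CV = {value:.1f}% -> {'accurate' if ok else 'imprecise'}")
```

Cells with few expected cases tend to have larger coefficients of variation, which is consistent with the remark that data scarcity limits prediction accuracy in some age groups.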
We expect that predictions at a very disaggregated level will help complete the cancer data series, improving health system planning and management of lethal cancers. The results presented in this study also indicate important regional variations in BCNS incidence predictions across Navarre and the Basque Country. Projected gender-specific trends indicate that males will have higher incidence rates of BCNS than females. This is consistent with the results obtained in other regions, in which the male-to-female ratio ranges from 1.0 to 2.7 (Miranda-Filho et al., 2016). It has been suggested that gender differences could be due to sex hormones and genetic features (McKinley et al., 2000). Like any forecasting method, our proposal also has some limitations. First, not all regions and age groups are predicted equally well. Data scarcity in some age groups is a real obstacle to providing accurate predictions. Second, the predicted trends are based on the observed ones, which do not capture the effects of future events. For example, the implementation of new screening programs, improvements in data registration, or any change in the definition of a particular malignancy could affect the number of incident cases to a large extent. Finally, we are aware that our data are not very recent, but unfortunately we do not yet have access to more recent incidence data. Despite these limitations, the methodology presented in this article is a promising alternative to existing techniques when predicting rare and lethal cancer types by age, gender, and region. This paper will provide regional cancer registries with a valuable predictive tool.

ACKNOWLEDGMENTS
We would like to thank Eva Ardanaz from the Navarre Cancer Registry (Public Health and Labor Institute of Navarre), Nerea Larrañaga from the Basque Cancer Registry, and Covadonga Audícana from the Basque Mortality Registry for providing the data. The work has been supported by Project PID2020-113125RB-I00, MCIN/AEI/10.13039/501100011033, and European Union NextGenerationEU/PRTR and Proyecto Jóvenes Investigadores PJUPNA2018-11.

CONFLICT OF INTEREST
The authors declare no potential conflict of interest.

DATA AVAILABILITY STATEMENT
Synthetic data comparable to the original data in size and structure have been included.

OPEN RESEARCH BADGES
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section. This article has earned an open data badge "Reproducible Research" for making publicly available the code necessary to reproduce the reported results. The results reported in this article could be reproduced only partially because of data confidentiality issues.

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.