Spanish Journal of Statistics

Issue 7. 2025

  • Full publication
  • Presentation of Volume 7, Issue 1, 2025 José María Sarabia
  • Copula based inference for certain types of actuarial datasets: A brief survey Indranil Ghosh, Carman Greenway, Hannah Powers, and Brandon Kelly Kortan
    • Doc.
      DOI
      https://doi.org/10.37830/SJS.2025.1.02
      Abstract

      Copulas are useful tools for modeling bivariate and multivariate dependence structures, among other applications. In this paper, we study the various types of dependence indicated by well-known measures of dependence, such as Spearman's ρ, Kendall's τ, and Blomqvist's β, for certain types of actuarial datasets obtained from the CASdatasets package in R. Although our primary focus is on insurance claim datasets, the adopted copula-based procedure can be replicated for other types of actuarial datasets and in other domains as well. Using the CDVine package in R, we find the best-fitting bivariate copula for a given dataset and subsequently study various structural properties of the selected copula. The adopted strategy can be extended to identify and explore multivariate dependence via vine copulas, which will be discussed in a separate article.

      Keywords
      Bivariate copula, Measures of Association, Kendall's tau, Copula fitting
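      The rank-based measures named in this abstract can be computed directly from data. The Python sketch below is illustrative only and is not the authors' R-based copula-selection workflow: it computes sample versions of Kendall's τ, Spearman's ρ, and Blomqvist's β, assuming continuous data without ties, and then inverts Kendall's τ for the Clayton and Gumbel families using the closed forms θ = 2τ/(1 - τ) and θ = 1/(1 - τ). Tau inversion is a moment-type shortcut swapped in here for brevity; it is not a likelihood-based copula selection. All function names are hypothetical.

```python
import numpy as np

def pseudo_observations(x):
    """Ranks scaled to (0, 1): rank / (n + 1). Assumes no ties."""
    x = np.asarray(x, float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1)

def kendall_tau(x, y):
    """Sample Kendall's tau: (concordant - discordant) / (n choose 2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (sx * sy).sum() / (n * (n - 1))  # each unordered pair is counted twice

def spearman_rho(x, y):
    """Sample Spearman's rho: Pearson correlation of the ranks."""
    return np.corrcoef(pseudo_observations(x), pseudo_observations(y))[0, 1]

def blomqvist_beta(x, y):
    """Sample Blomqvist's beta: agreement of signs around the two medians."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.mean(np.sign(x - np.median(x)) * np.sign(y - np.median(y)))

def clayton_theta_from_tau(tau):
    """Clayton copula: tau = theta / (theta + 2), inverted."""
    return 2 * tau / (1 - tau)

def gumbel_theta_from_tau(tau):
    """Gumbel copula: tau = 1 - 1 / theta, inverted."""
    return 1 / (1 - tau)

def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = ((w ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)
    return u, v
```

      For example, simulating 1,000 pairs from a Clayton copula with θ = 2 (where τ = θ/(θ + 2) = 0.5) and applying `kendall_tau` should give an estimate close to 0.5, which `clayton_theta_from_tau` maps back to roughly 2.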
  • Gender gap and spatial disparities in the evolution of literacy in Spain, 1860-1910 José Manuel Gutiérrez, Gloria Quiroga
    • Doc.
      DOI
      https://doi.org/10.37830/SJS.2025.1.03
      Abstract

      This article considers the dynamics of Spanish literacy in the period 1860-1910, when local councils were responsible for public elementary education. To this end, a harmonized series of literacy rates for the population aged ten or over is constructed, disaggregated by sex and province. Marked spatial differences and a very large gender gap can be observed. Five clusters are determined according to the male literacy rates of the provinces in 1860; these clusters prove to have explanatory power throughout the period and for both sexes. A parsimonious statistical model of the evolution of male literacy during the period, which introduces linguistic variables, shows considerable temporal stability in the spatial distribution of male literacy. The model of the evolution of female literacy is similar to that of male literacy, although the initial state (in 1860) is described not by female literacy but by male literacy. All in all, the evolution of literacy in Spain between 1860 and 1910 did not follow the spatial pattern of the economic modernization process. Moreover, there was no correlation between birth rates and children's literacy rates, for either sex, nor between urbanization and literacy. In the West European context, the Spanish literacy process during the period 1860-1910 was a failure, except in the geographical area of the top cluster.

      Keywords
      Historical censuses, literacy, nineteenth century, Spain, Official Statistics
  • Feasibility of Implementing Accelerometers in the Spanish Health Survey Borja del Pozo Cruz, Rosa M. Alfonso Rosa, and Jesús del Pozo-Cruz
    • Doc.
      DOI
      https://doi.org/10.37830/SJS.2025.1.04
      Abstract

      Accurately measuring physical activity, sedentary behavior, and sleep is vital for public health monitoring, but self-reported data are often biased. Accelerometers offer objective data, yet their feasibility within the Spanish Health Survey (ESdE) has not been assessed. This study evaluated the integration of thigh-worn accelerometers into the ESdE by analyzing participant compliance, device usability, data return, and comparisons with self-reported measures. A total of 100 adults aged 30-90 were recruited through five provincial delegations of the National Statistics Institute (INE), with each delegation enrolling 20 participants split equally between home-based collection and prepaid return groups. All participants wore a thigh-mounted SENS accelerometer continuously for 7 to 10 days using a water-resistant patch, with two patches provided in case a replacement was needed. INE staff administered the ESdE questionnaire and coordinated device logistics. Valid accelerometry data were obtained from 98 participants, with excellent compliance. Device return rates were 100% for home-based collection and 85% for prepaid return. Comparison with self-reported data was only possible for sedentary behavior, where participants consistently underestimated sitting time. Agreement between self-reports and accelerometry was low (ICC = -0.05 to 0.43), and Bland-Altman plots revealed a clear negative bias. These findings demonstrate the feasibility of incorporating accelerometry into national surveys such as the ESdE, with high participant adherence and minimal operational issues. The objective data provided by accelerometers can complement self-reported measures and capture domains such as sleep and incidental activity, which are often missed. Their inclusion in future surveys may enhance the accuracy and utility of lifestyle surveillance in Spain.

      Keywords
      accelerometry, physical activity, sedentary behavior, sleep, health survey, feasibility, self-report, objective measurement, public health surveillance
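      The agreement statistics reported in this abstract can be sketched as follows. This is a minimal Python illustration, not the authors' analysis code: the abstract does not state which ICC form was used, so ICC(2,1) (two-way random effects, absolute agreement, single measurement, in Shrout-Fleiss notation) is assumed here, and the Bland-Altman limits use the conventional bias ± 1.96 SD. Differences are taken as self-report minus device, so systematic underestimation of sitting time appears as a negative bias. Function names are hypothetical.

```python
import numpy as np

def bland_altman(self_report, device):
    """Bias (mean of self_report - device) and 95% limits of agreement."""
    d = np.asarray(self_report, float) - np.asarray(device, float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def icc_2_1(a, b):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    y = np.column_stack([a, b]).astype(float)  # n subjects x k = 2 methods
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()  # between methods
    ss_err = ((y - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

      With an absolute-agreement ICC, a constant offset between methods lowers the coefficient even when the two measures are perfectly correlated: pairing 1, 2, 3, 4, 5 with 11, 12, 13, 14, 15 gives an ICC(2,1) of about 0.048.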
  • Semiparametric von Mises kernel circular density estimator Yasmina Ziane, Nabil Zougab, Kahina Bedouhene, and Smail Adjabi
    • Doc.
      DOI
      https://doi.org/10.37830/SJS.2025.1.05
      Abstract

      In this paper, we propose to estimate the circular density function by the semiparametric bias-corrected circular kernel method using the von Mises kernel. This method applies a multiplicative bias correction to an initial parametric model in order to reduce the bias and improve the quality of the estimator. Two semiparametric estimators for probability density estimation, those of Hjort and Glad (1995) (HG) and of Jones, Signorini, and Hjort (1999) (JSH), are applied to circular data with support [0, 2π). Their properties, namely the bias, the variance, and the mean integrated squared error (MISE), are reported. A comparative study is performed to evaluate the performance of the semiparametric estimators (HG and JSH). The popular cross-validation technique is adapted for bandwidth selection. A simulation study and a real-data application to circular data show, in terms of integrated squared bias (ISB) and integrated squared error (ISE), that the semiparametric estimators JSH and JLN with the von Mises kernel perform better than the classical and HG estimators.

      Keywords
      Bandwidth selection, Circular data, Cross-validation, Multiplicative bias correction (MBC), von Mises kernel
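      The Hjort and Glad (HG) construction multiplies a fitted parametric start g by a kernel-smoothed correction factor, f(θ) = g(θ) · (1/n) Σᵢ K_ν(θ - Θᵢ) / g(Θᵢ), where K_ν is the von Mises kernel with concentration ν (larger ν means less smoothing). The Python sketch below assumes a von Mises parametric start fitted by the circular mean and the Best-Fisher approximation to the maximum-likelihood κ, and uses leave-one-out likelihood cross-validation for ν; the abstract says only that cross-validation is adapted, so that particular criterion is an assumption. The JSH estimator is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def vm_density(theta, mu, kappa):
    """von Mises density with mean direction mu and concentration kappa."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

def vm_fit(data):
    """Parametric start: circular mean plus Best-Fisher approximation to the ML kappa."""
    c, s = np.cos(data).mean(), np.sin(data).mean()
    mu = np.arctan2(s, c) % (2 * np.pi)
    r = np.hypot(c, s)  # mean resultant length
    if r < 0.53:
        kappa = 2 * r + r**3 + 5 * r**5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1 / (r**3 - 4 * r**2 + 3 * r)
    return mu, kappa

def classical_kde(x, data, nu):
    """Classical circular KDE: average of von Mises kernels centred at the data."""
    x = np.atleast_1d(np.asarray(x, float))
    return vm_density(x[:, None], data[None, :], nu).mean(axis=1)

def hg_estimator(x, data, nu):
    """Hjort-Glad estimator: g(x) * mean_i K_nu(x - X_i) / g(X_i)."""
    mu, kappa = vm_fit(data)
    x = np.atleast_1d(np.asarray(x, float))
    g_x = vm_density(x, mu, kappa)
    g_data = vm_density(data, mu, kappa)
    kern = vm_density(x[:, None], data[None, :], nu)  # K_nu(x - X_i)
    return g_x * (kern / g_data[None, :]).mean(axis=1)

def lcv_bandwidth(data, nu_grid):
    """Leave-one-out likelihood cross-validation for nu (classical estimator)."""
    n = len(data)
    scores = []
    for nu in nu_grid:
        k = vm_density(data[:, None], data[None, :], nu)
        np.fill_diagonal(k, 0.0)  # drop each point's own kernel
        scores.append(np.log(k.sum(axis=1) / (n - 1)).sum())
    return float(nu_grid[int(np.argmax(scores))])

# Usage on simulated von Mises data (illustrative parameters).
rng = np.random.default_rng(1)
data = rng.vonmises(np.pi / 2, 2.0, size=300) % (2 * np.pi)
nu = lcv_bandwidth(data, np.arange(1.0, 101.0))
grid = np.linspace(0, 2 * np.pi, 720, endpoint=False)
f_hg = hg_estimator(grid, data, nu)
```

      On a grid over [0, 2π), the classical estimate integrates to one, while the HG estimate integrates only approximately to one, because the multiplicative correction is not renormalized.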
  • Objective Bayesian goodness-of-fit tests for the alpha-skew-normal distribution José Rodolfo Olmos-Zepeda, Sergio Pérez-Elizalde
    • Doc.
      DOI
      https://doi.org/10.37830/SJS.2025.1.06
      Abstract

      The family of alpha-skew-normal (ASN) distributions is a flexible class of three-parameter probability models characterized by location, scale, and shape parameters. The shape parameter governs both asymmetry and unimodality or bimodality, allowing the distribution to model unimodal or bimodal data with varying degrees of skewness. This paper proposes objective Bayesian goodness-of-fit tests to determine whether a random sample follows an ASN distribution when its parameters are unknown. The test statistics are based on the empirical distribution function, and their sampling distributions depend solely on the shape parameter. Their prior predictive distributions, which serve as null distributions, are obtained by integrating out the shape parameter with respect to a proper approximation of the Jeffreys prior, specifically a Cauchy prior chosen for its analytical tractability. Critical values are estimated via Monte Carlo simulation. A comprehensive simulation study demonstrates that the proposed tests maintain the nominal significance level across various scenarios and exhibit strong power against a range of alternative distributions. Finally, the methodology is illustrated with real-data examples, showcasing its practical applicability.

      Keywords
      alpha-skew-normal distribution, empirical distribution function, goodness-of-fit test, Jeffreys prior, Monte Carlo simulation, prior predictive distribution
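      EDF-based statistics such as the Cramér-von Mises statistic require the ASN distribution function. A minimal Python sketch follows, assuming the commonly used parameterization of the standard ASN density, f(z; α) = [(1 - αz)² + 1] φ(z) / (2 + α²), whose CDF has the closed form F(z; α) = Φ(z) + α φ(z) (2 - αz) / (2 + α²); the paper's parameterization may differ. It also includes an inverse-CDF sampler (by bisection) of the kind a Monte Carlo study needs. Parameter estimation and the prior-predictive step, which averages over α drawn from the Cauchy prior, are omitted, and all function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def _phi(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)

def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + _erf(np.asarray(z, float) / math.sqrt(2)))

def asn_pdf(x, alpha, loc=0.0, scale=1.0):
    z = (np.asarray(x, float) - loc) / scale
    return ((1 - alpha * z) ** 2 + 1) / (2 + alpha**2) * _phi(z) / scale

def asn_cdf(x, alpha, loc=0.0, scale=1.0):
    z = (np.asarray(x, float) - loc) / scale
    return _Phi(z) + alpha * _phi(z) * (2 - alpha * z) / (2 + alpha**2)

def asn_sample(n, alpha, rng, iters=60):
    """Standard ASN draws by inverting the CDF with vectorized bisection on [-12, 12]."""
    u = rng.uniform(size=n)
    lo, hi = np.full(n, -12.0), np.full(n, 12.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = asn_cdf(mid, alpha) < u
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

def cramer_von_mises(sample, cdf):
    """W^2 = 1/(12n) + sum_i (F(x_(i)) - (2i - 1)/(2n))^2."""
    u = np.sort(cdf(np.asarray(sample, float)))
    n = len(u)
    i = np.arange(1, n + 1)
    return 1 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
```

      Here the ASN parameters are treated as known; in the test described in the abstract they are unknown, so the statistic would be computed with fitted parameters and compared against Monte Carlo critical values.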
  • Address at the 2024 Spanish National Award in Statistics Concha Bielza
    • Doc.
      DOI
      https://doi.org/10.37830/SJS.2025.1.07
      Abstract

      This article is based on the address given upon receiving the 2024 Spanish national award in statistics, a ceremony made especially meaningful by the attendance of His Majesty the King of Spain. It offers a personal overview of my research trajectory, shaped by the long-standing interplay between statistics and artificial intelligence, and by their applications in neuroscience and industry. After beginning my career in statistical decision theory and probabilistic graphical models, Bayesian networks soon became the central framework of my work, enabling rigorous reasoning under uncertainty across domains.
      In neuroscience, my contributions span neuronal classification, spatial analysis of synapses, modeling of dendritic arborizations, biomarker discovery for neurological disorders, and the decoding of brain activity, among others. These efforts, supported by landmark programs such as the Cajal Blue Brain Project and the Human Brain Project, were driven by the need for models capable of capturing complex, high-dimensional, and often unconventional data.
      Around 2018, my research turned increasingly toward Industry 4.0, where real-time data streams, dynamic systems, and predictive and prescriptive maintenance posed demanding methodological challenges. This led to advances in dynamic Bayesian networks, latent-variable models, and probabilistic evolutionary algorithms for high-dimensional optimization.
      Alongside scientific discovery, I have remained committed to teaching, mentoring, and knowledge transfer through initiatives such as the Machine Learning and Advanced Statistics Summer School at the Universidad Politécnica de Madrid. I conclude with reflections on interdisciplinary work, the convergence of statistics and machine learning, the ethical use of data and algorithms, and the importance of inspiring new generations of statisticians.

      Keywords
      Bayesian networks, probabilistic machine learning, Bayesian decision theory, reasoning under uncertainty, heuristic optimization, temporal data, interpretable models, artificial intelligence, neuroscience, industry 4.0