Joint Statistical Seminars

The Department of Mathematics and Mathematical Statistics and the Department of Statistics jointly organize this statistical seminar series. The seminars are open to employees and students of Umeå University.

3 March 2026, 13.00 Stockholm

Cluster-based generalized additive models informed by random Fourier features

Speaker: Xin Huang, Umeå University

Abstract: Modern regression problems often require balancing predictive accuracy with model interpretability. In this talk, I present a regression approach that combines random Fourier feature representations with generalized additive models (GAMs). Random Fourier features are used to extract latent structure in the covariate space, which then guides the construction of a mixture of cluster-specific GAMs. Each component models nonlinear marginal effects in an additive and interpretable form, while the mixture structure allows the model to adapt across heterogeneous data regimes. Empirical studies on several benchmark datasets show that the proposed method improves upon global additive models and achieves performance comparable to commonly used machine learning methods. In spatial settings, the learned representation aligns with meaningful geographic patterns, illustrating how representation learning and interpretable modeling can be combined in practice.
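The abstract does not say how the random Fourier features are constructed. As a rough illustration only, the sketch below shows the standard random Fourier feature map for a Gaussian (RBF) kernel (the Rahimi and Recht construction), in which inner products of the features approximate kernel values. The bandwidth `sigma`, the number of features and the function names are assumptions for illustration, not details of the speaker's method.

```python
import math
import random


def rff_map(X, num_features=500, sigma=1.0, seed=0):
    """Map each row of X to random Fourier features whose inner products
    approximate the Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = random.Random(seed)
    dim = len(X[0])
    # Frequencies w_j ~ N(0, sigma^-2 I) and phases b_j ~ Uniform[0, 2*pi]
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    features = []
    for x in X:
        # z_j(x) = sqrt(2 / D) * cos(w_j . x + b_j)
        z = [scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + bj)
             for w, bj in zip(W, b)]
        features.append(z)
    return features


def rbf_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel value, used here only to check the approximation."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


X = [[0.0, 0.0], [0.5, 0.2], [3.0, 3.0]]
Z = rff_map(X, num_features=5000, sigma=1.0)
approx = sum(zi * zj for zi, zj in zip(Z[0], Z[1]))
exact = rbf_kernel(X[0], X[1])
print(round(approx, 3), round(exact, 3))
```

One could then cluster the rows of `Z` (for example with k-means) and fit a separate GAM within each cluster; the abstract does not state whether the talk's method proceeds exactly this way.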

Venue: MIT.A.346

26 February 2026, 13.00 Stockholm

Causal Inference for Multilevel Longitudinal Data with Country-Level Attrition: A Bayesian G-Formula Approach and Application to Cognitive Aging

Speaker: Huixia Wang, Umeå University

Abstract: Causal inference in longitudinal studies with multilevel structures and attrition poses major methodological challenges, particularly in population-based health research. We extend the g-formula to accommodate clustered longitudinal data with country-level attrition, developing a Bayesian framework for flexible estimation and sensitivity analyses. Simulation studies demonstrate that the proposed method achieves accurate parameter estimation under both complete-case and attrition scenarios, with small biases across linear and nonlinear settings. We apply the method to large-scale European cohort data to estimate the causal effect of depressive symptoms on memory decline. The results suggest that persistent depressive symptoms have a strong negative causal effect on memory, while recovery, especially earlier recovery, is associated with preserved cognitive function. This work provides both a methodological advance for multilevel causal inference and empirical evidence for the role of depression in cognitive aging.

Venue: MIT.A.346

10 February 2026, 13.00 Stockholm

Directional Statistics: inference of intrinsic statistics on the sphere

Speaker: Aron Persson, Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: Directional statistics has applications in geology, medicine, radiology and meteorology. I will provide an introduction to these applications and explain why directional statistics is a useful tool. I will end the talk by going through some recent results on inference for the projected normal distribution on the sphere.

Venue: MIT.A.346

27 January 2026, 14.15 Stockholm

Predictive performance at different time horizons in the presence of competing risks: machine learning versus statistical time-to-event models

Speaker: Josline Adhiambo Otieno, Department of Statistics, Umeå University

Abstract: Few studies have evaluated competing risk models at multiple time points, even though the cumulative incidence of the outcome of interest changes throughout the observation period. This study evaluated the performance of traditional statistical models (the cause-specific Cox and Fine-Gray (FG) models), a tree-based model (random survival forest (RSF) for competing risks), a deep-learning-based model (DeepHit) and pseudo-observation-based models (linear regression and random forest (RF) models) for competing risk prediction across multiple evaluation time points. Three evaluation measures are reported. The analyses were based on two datasets of different sizes: the Primary Biliary Cirrhosis dataset, for benchmarking, and a large dataset from the Swedish stroke register (Riksstroke). All models showed a similar trend in performance across horizons in both datasets. Model discrimination improved from short term to midterm and then declined, while calibration and overall prediction accuracy consistently worsened. The RSF model achieved better overall performance for short-term prediction in the smaller dataset, which had a high proportion of the outcome of interest, while the RF-pseudo-based model performed well in short-term prediction on the large dataset with rare events. The FG model, on the other hand, showed better predictive performance for long-term prediction in both samples. The study concluded that the performance of competing risk models varies substantially across evaluation time points and datasets, highlighting the need for horizon-specific evaluation to guide model selection.

Venue: SAM.A.233

13 January 2026, 14.15 Stockholm

A multi-season epidemic model with random genetic drift and transmissibility

Speaker: Tom Britton, Stockholm University

Abstract: Seasonal epidemics, such as seasonal influenza, vary between seasons in both size and timing of the peak. To better understand this phenomenon, we study a multi-season epidemic model in which each new seasonal strain of the virus has a new random genetic drift (affecting prior immunity) and a new random transmissibility. Given these quantities and the immunity status of the population entering the new season, we can use epidemic model theory to determine how many individuals will be infected during the season and the immunity state of the population after the season. The resulting season-to-season process is a Markov chain; we analyse its properties to deduce its stationary distribution. We also derive the distribution of the final epidemic size during the season, given the immunity status of the population before the season and the speed of its initial growth. Joint work with Andrea Pugliese.

Venue: SAM.A.233

Past seminars

2025

The modern statistician's toolbox: What does it take to make a difference outside academia, 9 December

Speakers:
Sara Sjöstedt de Luna, Department of Mathematics and Mathematical Statistics, Umeå University

Maria Karlsson, Department of Statistics, Umeå University

Anders Lundquist, Department of Statistics, Umeå University

Abstract: We will give a summary of, and some additional reflections on, the Cramér Society autumn meeting organized at KTH on 22–23 October 2025. The meeting had the same title as our talk, and invited researchers and representatives from various sectors, many outside academia, gave their perspectives.

The Cramér Society described the background of the meeting as follows: In recent years, the rapid development of data-driven methods has continued to challenge and shape statistics education in Sweden. At last year’s autumn meeting, we gathered around the theme “What does a modern statistician actually do?”, a question that sparked lively discussions about the statistician’s professional identity and role in a changing era marked by AI and machine learning. It became clear that while the statistician’s starting point, evaluating uncertainty, remains central, the tools and skills required to make a real impact have become broader and more diverse.

Therefore, this year we follow up with a related and more practically oriented theme: “The modern statistician’s toolbox: What does it take to make a difference outside academia?” Together with invited researchers and representatives from various sectors, we will explore which knowledge, skills, and perspectives are crucial in today’s data-driven work environment, from programming and communication to theoretical understanding and domain expertise.

A Framework for Detecting Structural Heterogeneity in Latent Variable Models, 4 November

Speaker: Gabriel Wallin, School of Mathematical Sciences, Lancaster University

Abstract: Latent variable models are widely used in the social, behavioural, and health sciences to learn the latent structure underlying multivariate data. These models typically assume that the relationship between the set of latent variables and observed variables is identical for all measurement units. In practice, subpopulations may exist where the conditional distribution of a subset of observed variables given the latent variables differs systematically. Detecting such heterogeneity is challenging when both the subpopulations and the affected variables are unknown a priori. To address this problem, this talk presents a hybrid model that probabilistically assigns observations to discrete latent classes, where within each class, a continuous latent variable governs the observed variables. For each class, we estimate class-specific intercept and slope parameters that may deviate from a common baseline. We propose a regularised marginal likelihood estimator that enforces sparsity of these deviations, enabling simultaneous identification of latent classes and selection of heterogeneous variables via a proximal-gradient-based EM algorithm. The approach is illustrated using data from both a personality assessment and a large-scale educational test, where we identify groups that differ on specific variables beyond what is explained by the latent variable. Such patterns have important implications for the validity of these instruments. Connections to recent work on change-point analysis for latent variable models highlight a broader framework for detecting structural breaks in latent processes. This is joint work with Qi Huang (Purdue University).

Can municipalities mitigate the effect of parental job loss on children’s mental health? Valid test when using machine learning methods, 30 September

Speaker: Natalia Andreeva, Department of Statistics, Umeå University

Abstract: Parental job loss has a well-documented detrimental impact on children’s mental health, yet little is known about how local authorities may shield children’s health from adverse events in their parents’ working lives. We hypothesized that higher municipal funding of elementary schools and after-school care, and hiring more qualified teachers, can mitigate the adverse consequences of parental job loss. Using data from the intergenerationally linked Swedish register, we constructed an analytical sample of children aged 7–10 years whose parents lost their jobs during 2006–2013, together with their matched controls. We identified parental job loss through workplace closures and measured children’s mental health outcomes through prescriptions for anxiety and depressive disorders. We present a framework using state-of-the-art machine learning methods and develop a design that takes advantage of the rich and complex data available, allowing us to study and test for heterogeneity in causal effects on children’s health across municipalities. Our results demonstrated that increasing municipal spending on elementary schools and after-school centers by 7,500 SEK per student significantly alleviated the negative impacts of parental job loss: by at least half in the case of paternal job loss and by about 90% in the case of maternal job loss. Increasing the percentage of highly qualified teachers to 90% also alleviated the effect of maternal job loss; however, we did not find statistically significant moderation in the case of paternal job loss. Our findings highlight the importance of the compensatory role municipal authorities play in reducing inequality in children’s mental health.

Tuning derivatives for fairness in machine learning, 3 June

Speaker: Filip Edström, Department of Statistics, Umeå University

Abstract: AI systems and automated decision making are becoming ubiquitous in society, and the fairness of these systems and decisions is becoming increasingly important; as a result, the literature on fairness in machine learning is expanding quickly. A main purpose in this field is to obtain decision algorithms (predictions) that do not make use of certain features (protected attributes) for given fairness reasons (e.g. discrimination), even though the data (observations of the world) point out such attributes as relevant predictors for the decision. A promising research direction builds on formal frameworks for causal reasoning in order to disentangle path-specific effects of protected attributes. Important concepts here include Statistical Parity (where effects through non-allowed paths should be eliminated) and Predictive Parity (where effects through paths allowed because of business necessity should be kept). Fair algorithms/predictors should fulfill such parity conditions or, when that is not possible, find an acceptable balance between them. In this paper, we fill a gap in the field by defining Statistical and Predictive Parity in terms of partial derivatives, which allows for the handling of mixed continuous and discrete protected attributes; existing fairness methods are typically not suited to handle continuous features. We provide conditions under which such a predictor exists. Moreover, we introduce a method called Fair Tuning that produces a fair predictor when statistical and predictive parity are compatible and otherwise tunes these conditions to achieve a compromise. We study the proposed theory and methods through simulation experiments, in particular how prediction performance is affected by imposing statistical parity and by tuning statistical and predictive parity to a compromise. Finally, we illustrate the use of Fair Tuning on the COMPAS dataset.

Sample size estimation for functional data analysis, 6 May

Speaker: Reza Seydi, Department of Statistics, Umeå University

Abstract: Recent studies in biomechanics and human movement sciences have shown increasing interest in inferential methods for curve data analysis. Despite the development of new functional methods, power analysis for sample size estimation during the data collection phase remains less explored. We have developed an interactive R Shiny application that aids researchers in performing a priori power analysis by allowing them to explore how parameter changes affect statistical power when using inferential methods appropriate for curve data. In addition, we performed a simulation study, examining how changes in the standard deviation and smoothness of noise functions influence the sample size required to achieve a statistical power of 0.80. We compared the estimated sample sizes for six widely used inferential methods, including statistical parametric mapping (SPM), F-max, interval-wise testing (IWT), threshold-wise testing (TWT), and two envelope tests, extreme rank length (ERL) and iterative adaptive two-stage envelope (IATSE). Our simulation study revealed that when substantial differences between mean curves cover a wide area of the domain, smoother noise functions demand larger sample sizes with only minor variations between methods. In this case, ERL, SPM, and F-max require slightly lower sample sizes than IATSE and TWT, while IWT needs slightly more than the other methods. Conversely, when differences are restricted to a narrow domain segment, most methods require a lower sample size or maintain constant sample sizes as noise smoothness increases, except for IWT and TWT, which demand considerably larger samples. In this scenario, ERL, SPM, and F-max again require slightly lower sample sizes than IATSE. These findings emphasise the importance of appropriate sample size planning and method selection for valid inference in functional data analysis.

Addressing measurement error in social science data, 6 February

Speaker: Patrícia Martinková, Charles University, Prague

Abstract: Measurement error is omnipresent in social science data. Assessing the sources and impact of the error is important for designing policies to increase measurement reliability, and for developing high-quality ratings. In this work, we discuss several statistical aspects addressing the measurement error in social science data. We introduce a flexible method for assessing heterogeneity in measurement error and reliability with variance component models. We also discuss the relationship between the reliability and the false positive rate and address the issue of zero estimates. Methods are demonstrated with real-data examples from teacher hiring and grant proposal peer-review.

References:

- Martinková, P., & Hladká, A. (2023). Computational Aspects of Psychometric Methods: With R. Chapman and Hall/CRC. https://doi.org/10.1201/9781003054313

- Martinková, P., Bartoš, F., & Brabec, M. (2023). Assessing inter-rater reliability with heterogeneous variance components models: Flexible approach accounting for contextual variables. Journal of Educational and Behavioral Statistics, 48(3), 349–383. https://doi.org/10.3102/10769986221150517

- Bartoš, F., & Martinková, P. (2024). Assessing quality of selection procedures: Lower bound of false positive rate as a function of inter-rater reliability. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12343

- Erosheva, E., Martinková, P., & Lee, C. J. (2021). When zero may not be zero: A cautionary note on the use of inter-rater reliability in evaluating grant peer review. Journal of the Royal Statistical Society: Series A, 184(3), 904–919. https://doi.org/10.1111/rssa.12681

A comprehensive simulation study evaluating the predictive performance of Cox proportional hazards model and machine learning algorithms for time-to-event data, 28 January

Speaker: Josline Otieno, Department of Statistics, USBE, Umeå University, Sweden

Abstract: Many data-driven risk prediction models have been developed for analyzing time-to-event data. However, choosing the most suitable model for accurate predictions in a specific medical application remains a challenge. Simulation enables effective comparison based on equal-sized datasets. This study provided a comprehensive and fair comparison of the survival prediction performance of machine learning (ML) models and the traditional Cox proportional hazards (PH) model, using both simulated and real datasets. The ML models included random survival forests, eXtreme Gradient Boosting, and deep neural networks (DeepSurv). We assessed model performance using the C-index and the Integrated Brier Score. The evaluation was performed under different data-generating mechanisms, such as varying sample sizes, censoring proportions, addition of noise variables, and different types of model misspecification. All models showed improved predictive performance as sample size increased. However, their performance declined as the censoring proportion increased. An increase in noise variables reduced prediction accuracy across all models, regardless of dataset size. Tree-based models demonstrated promising predictive performance compared with the Cox PH model and DeepSurv in the presence of misspecification and a large number of noise variables. The Cox PH model performed well with larger sample sizes and fewer noise variables. It also performed well when the model was correctly specified or had only minor misspecification.

2024

IV-learner: learning conditional average treatment effects using instrumental variables, October 24
Speaker: Karla Diaz Ordaz, Department of Statistical Science, University College London

Abstract: Instrumental variable methods are very popular in econometrics and biostatistics for inferring average causal effects of an exposure on an outcome when there is unmeasured confounding. However, their use in combination with machine learning to learn heterogeneous treatment effects, such as conditional average treatment effects (CATE), is somewhat limited.

A generic approach that allows the use of arbitrary machine learning algorithms can be based on the popular two-stage principle. In this two-stage approach, we can learn causal treatment effects by regressing the outcome on the predicted exposure, based on a first-stage regression of exposure on instrumental variables (and pre-exposure covariates). This gives rise to the IV-double machine learning (IV-DML) approach of Foster and Syrgkanis (2023).

Unfortunately, the slow convergence rates of the data-adaptive estimators used for the first-stage predictions propagate into the resulting CATE estimates. In view of this, we make an alternative proposal, the IV-learner, which is inspired by the infinite-dimensional targeted learning procedure (Vansteelandt 2023; van der Laan et al. 2024) and strategically tailors the first-stage predictions to perform well in their ultimate task: CATE estimation. The resulting targeted Neyman-orthogonal learner is easy to construct from arbitrary, off-the-shelf learners. We study the finite-sample performance of our proposal using simulations and compare it with existing methods. We also illustrate it using a real data example.

This is joint work with Stijn Vansteelandt, Stephen O’Neill and Richard Grieve.

Optimal ownership and capital structure with agency conflicts and debt renegotiation, August 19
Speaker: Zhaojun Yang, Department of Finance, Southern University of Science and Technology, Shenzhen, China

Abstract: We develop a continuous-time growth investment model with debt renegotiation to examine agency conflicts among controlling shareholders, minority shareholders, and creditors. We show that controlling shareholders' private control benefits accelerate investment, while their skin-in-the-game and debt overhang delay it. Exploiting these two opposing conflicts can achieve first-best investment. Increasing debt or controlling shareholders' equity alleviates agency conflicts. We reveal how entrenchment and alignment effects shape optimal ownership structure and capital structure. Debt renegotiation has two opposite effects on investment: exacerbating debt overhang for low-risk projects and alleviating it for high-risk ones. We present model implications for corporate security design.

Introducing nonparametric monotone multiple choice item response theory models and bit scales, May 28
Speaker: Joakim Wallmark, Department of Statistics, USBE, Umeå University, Sweden

Abstract: Item Response Theory (IRT) is a powerful statistical approach for evaluating test items and determining test taker abilities through response analysis. An IRT model that better fits the data leads to more accurate latent trait estimates. In this study, we present a new nonparametric model for multiple choice data, the monotone multiple choice (MMC) model, which we fit using autoencoders. Using both simulated scenarios and real data from the Swedish Scholastic Aptitude Test, we demonstrate empirically that the MMC model outperforms the traditional nominal response IRT model in terms of fit. Furthermore, we illustrate how the latent trait scale from any fitted IRT model can be transformed into a ratio scale, aiding in score interpretation and making it easier to compare different types of IRT models. We refer to these new scales as bit scales. Bit scales are especially useful for models for which minimal or no assumptions are made for the latent trait scale distributions, such as for the autoencoder fitted models in this study.

Causal inference targeting a concentration index for studies of health inequalities, May 23
Speaker: Mohammad Ghasempour, Umeå School of Business, Economics and Statistics (USBE), Umeå University

Abstract: A concentration index, a standardized covariance between a health outcome and relative income ranks, is often used to quantify income-related health inequalities. There is, however, a lack of formal approaches for studying the effect of an exposure/intervention, e.g., education, on such measures of inequality. In this paper we help fill this gap by developing theory and methods in this field. We define a counterfactual concentration index of interest for different levels of an exposure. We then derive the efficient influence function of this target estimand, which allows us to propose estimators that are regular and asymptotically linear under certain conditions. Nuisance functions, possibly high-dimensional, need to be fitted to implement these estimators. The estimators have robustness properties allowing for convergence rates slower than the √n-rate for some of the nuisance function fits. The relevance of the asymptotic results for finite samples is studied with simulation experiments. We also present a case study of the effect of education on income-related health inequality for a Swedish cohort born in 1950.
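The abstract's first sentence defines the (descriptive, non-counterfactual) concentration index. A minimal sketch of that sample quantity, assuming the widely used covariance form C = 2 cov(h, r) / mean(h) with fractional income ranks r_i = (i - 0.5) / n, is shown below. It does not implement the counterfactual estimand or the influence-function-based estimators from the talk, and the function name is hypothetical.

```python
from statistics import fmean


def concentration_index(health, income):
    """Sample concentration index C = 2 * cov(h, r) / mean(h),
    where r is each individual's fractional rank in the income distribution."""
    n = len(health)
    # Order individuals from poorest to richest (ties broken by input order, an assumption)
    order = sorted(range(n), key=lambda i: income[i])
    rank = [0.0] * n
    for position, i in enumerate(order):
        rank[i] = (position + 0.5) / n  # fractional rank in (0, 1)
    mu_h = fmean(health)
    mu_r = fmean(rank)  # always 0.5 for fractional ranks
    # Population covariance (divide by n), as in the covariance form of the index
    cov = sum((h - mu_h) * (r - mu_r) for h, r in zip(health, rank)) / n
    return 2.0 * cov / mu_h


print(concentration_index([0.0, 0.0, 0.0, 1.0], [10, 20, 30, 40]))
```

Positive values indicate that the health variable is concentrated among higher-income individuals and negative values among lower-income individuals; when all of it is held by the single richest of n individuals, this formula gives 1 - 1/n.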

Extended generalized additive modelling with shape constraints, April 9
Speaker: Natalya Pya Arnqvist, Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: Regression models that incorporate smooth functions of predictor variables to explain the relationships with a response variable have gained widespread usage and proved successful in various applications. By incorporating smooth functions of predictor variables, these models can capture complex relationships between the response and predictors while still allowing for interpretation of the results. In situations where the relationships between a response variable and predictors are explored, it is not uncommon to assume that these relationships adhere to certain shape constraints. Examples of such constraints include monotonicity and convexity. Shape-constrained additive models (SCAM) offer a general framework for fitting exponential family generalized additive models with shape restrictions on smooths. The main objective of this talk is to provide extensions of the existing framework for SCAM with a mixture of unconstrained terms and various shape-restricted terms to accommodate smooth interaction of covariates, varying coefficient terms, linear functionals with or without shape constraints as model components, and data with short-term temporal or spatial autocorrelation. The practical usage of the suggested extensions will be illustrated in several examples.

DANSE - Data-Driven Non-linear State Estimation of Model-free Process in Unsupervised Learning Setup, February 28
Speaker: Saikat Chatterjee, School of Electrical Engineering and Computer Science, KTH-Royal Institute of Technology, Sweden

Abstract: This seminar will address a standard Bayesian state estimation problem, such as the one solved by the Kalman filter. The main novelty is that, whereas almost all existing methods, from the Kalman filter to the particle filter, assume that the underlying state space model or process model (also known as the process dynamics) is known, our new method DANSE does not. DANSE learns from noisy measurements without access to clean data or state space models; that is, it learns in an unsupervised and fully model-free manner. It combines deep learning with Bayesian learning and then performs Bayesian estimation.

Causal inference for semi-competing risks data, February 1
Speaker: Associate Professor Daniel Nevo, Department of Statistics and Operations Research, Tel Aviv University

Abstract: An emerging challenge for time-to-event data is studying semi-competing risks, namely when two event times are of interest: a non-terminal event (e.g. Alzheimer's disease diagnosis) time, and a terminal event (e.g. death) time. The non-terminal event is observed only if it precedes the terminal event, which may occur before or after the non-terminal event. Studying treatment or intervention effects on the dual event times is complicated because for some units, the non-terminal event may occur under one treatment value but not under the other. Until recently, existing approaches generally disregarded the time-to-event nature of both outcomes. More recent research focused on principal strata effects within time-varying populations coupled with Bayesian estimation. In this talk, we will present alternative estimands, based on a single stratification of the population, corresponding to the scientific questions of interest. We present a novel assumption utilizing the time-to-event nature of the data that is generally more flexible than the often-invoked monotonicity assumption. Our new assumption enables partial identifiability of causal effects of interest. We further present a frailty-based sensitivity analysis approach and give conditions under which full identification is possible. We present non-parametric and semi-parametric estimation methods under right censoring. We illustrate the utility of our approach in a study of the causal effects of carrying the APOE e4 allele on late-onset Alzheimer's disease and death.

2023

Double robust estimation of functional outcomes with data missing at random, November 28
Speaker: Kreske Ecker, Department of Statistics, Umeå University

Abstract: We present and study semi-parametric estimators for the mean of functional outcomes in situations where some of these outcomes are missing and covariate information is available on all units. Assuming that the missingness mechanism depends only on the covariates (missing at random assumption), we present two estimators for the functional mean parameter, using working models for the functional outcome given the covariates, and the probability of missingness given the covariates. We contribute by establishing that both these estimators have Gaussian processes as limiting distributions and explicitly give their covariance functions. One of the estimators is double robust in the sense that the limiting distribution holds whenever at least one of the nuisance models is correctly specified. These results allow us to present simultaneous confidence bands for the mean function with asymptotically guaranteed coverage. A Monte Carlo study shows the finite sample properties of the proposed functional estimators and their associated simultaneous inference.
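The double robustness described above can be illustrated, under simplifying assumptions, with the standard augmented inverse-probability-weighted (AIPW) estimator applied pointwise on a common grid. The sketch assumes all curves share one grid and that the working-model predictions (`pi_hat`, `m_hat`, hypothetical names) have already been fitted; it does not reproduce the talk's estimators or their simultaneous confidence bands.

```python
def aipw_mean_curve(Y, R, pi_hat, m_hat):
    """Pointwise AIPW estimate of a mean curve when some curves are missing at random.

    Y      -- list of curves on a common grid (None where the curve is missing)
    R      -- list of 0/1 indicators, 1 if unit i's curve is observed
    pi_hat -- fitted P(R = 1 | X_i) from a working missingness model (hypothetical input)
    m_hat  -- fitted E[Y_i(t) | X_i] on the same grid from a working outcome model (hypothetical input)
    """
    n = len(R)
    grid_size = len(m_hat[0])
    estimate = []
    for t in range(grid_size):
        total = 0.0
        for i in range(n):
            # Inverse-probability-weighted term, only defined for observed curves
            ipw = Y[i][t] / pi_hat[i] if R[i] == 1 else 0.0
            # Augmentation term uses the outcome model for every unit
            augmentation = (R[i] - pi_hat[i]) / pi_hat[i] * m_hat[i][t]
            total += ipw - augmentation
        estimate.append(total / n)
    return estimate


Y = [[1.0, 2.0], None, [3.0, 4.0], None]
R = [1, 0, 1, 0]
m_hat = [[1.0, 2.0], [5.0, 6.0], [3.0, 4.0], [7.0, 8.0]]
print(aipw_mean_curve(Y, R, [0.5, 0.4, 0.8, 0.3], m_hat))
```

If the outcome model is exact, the weighted and augmentation terms cancel so that any positive `pi_hat` returns the mean of `m_hat`; if instead `pi_hat` is correct, the augmentation term has mean zero. This is the pointwise intuition behind "at least one of the nuisance models is correctly specified".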

Confidence sets for the intraclass correlation applied to curve data from test-retest studies, November 21
Speaker: Mohammad Reza Seydi, Department of Statistics, Umeå University

Abstract: The evaluation of test-retest reliability is crucial in biomechanical research, as it ensures the validity of experimental outcomes. A common measure of reliability is the intraclass correlation coefficient (ICC). This study aims to explore and compare methods for constructing confidence sets for the ICC of curve data. The ICC for curve data can be expressed as a so-called ICC curve, or as an integrated ICC. There are currently no guidelines for how to report the uncertainty in either case. We consider both confidence bands for the ICC curve, and confidence intervals for the integrated version of ICC. We conducted a simulation study covering different major shapes of ICC curves with different sets of parameters. These methods were also applied to assess test–retest reliability of knee kinematics during one-leg hop for distance landings. Our findings indicate that for the integrated ICC, two of the compared methods demonstrate distinct merits. Meanwhile, we recommend a rank-based bootstrap confidence band for the ICC curve.

Machine Learning and the Location Component, October 19
Speaker: Assoc. Prof. Jose Francisco Ramos, University Jaume I of Castellon, Spain

Abstract: In this talk, I will present the research group GEOTEC, which focuses on GIS, and explain in detail some of its projects that are based on the location component and how we used that component in different machine learning developments. Finally, I will present some interesting aspects of data visualization, a subject in our Erasmus Mundus Master of Science in Geospatial Technologies [1].

We will focus on two of the projects. First, I will explain a solution for navigating highly polluted cities [2], which uses sensors, interpolates surfaces, and solves the typical routing problem of going from point A to point B, but in this case while minimizing exposure to highly polluted areas. After that, I will try to answer the question: does the weather influence the results of soccer matches? [3] Here, we apply several machine learning methods to a dataset that combines historical weather conditions in different cities of Spain with soccer results.

Mixture of Linear Models Co-supervised by Deep Neural Networks, October 10
Speaker: Prof. Jia Li, Department of Statistics, Penn State University, USA

Abstract: Deep neural networks (DNNs) often achieve state-of-the-art prediction accuracy in many applications. However, in some areas the use of DNNs is resisted because DNN models are extremely hard to explain. On the other hand, a linear model, e.g., logistic regression, is usually considered highly interpretable, but its accuracy tends to be low. Our goal is to develop mechanisms for balancing interpretability and accuracy so as to bridge the gap between explainable linear models and black-box models. Specifically, we propose a new mixture of linear models (MLM) for regression or classification, whose estimation is guided by a pre-trained DNN acting as a proxy for the optimal prediction function. We have developed visualization methods and quantitative approaches for interpretation. Experiments show that the new method can trade off interpretability and accuracy. For some examples, MLM achieves accuracy comparable to that of the DNN while significantly enhancing interpretability. I will also briefly discuss our more recent work on an EM-type algorithm for estimating MLM and its potential to improve logistic regression for small datasets.

On latent and selection nodes in systems of binary variables, September 19
Speaker: Prof. Elena Stanghellini, Perugia University, Italy

Abstract: I will discuss the distortions induced by marginalization and conditioning on some parameters of interest in systems of binary random variables. Marginalization accounts for unobserved confounders while conditioning accounts for some kind of non-random sampling, such as case-control or self-selection. I shall discuss point identification as well as sensitivity analysis. Links to nonparametric modelling will also be made. An instance where the introduction of a latent variable is beneficial will also be presented.

Robot causal discovery aided by human interaction, June 13
Speaker: Filip Edström, Department of Statistics, Umeå University

Abstract: Causality is relatively unexplored in robotics, even though it is highly relevant in several respects. In this paper, we study how a robot's causal understanding can be improved by allowing the robot to ask humans causal questions. We propose a general algorithm for selecting which direct causal effects to ask about, given a partial causal representation (a partially directed acyclic graph, PDAG) obtained from observational data. We propose three versions of the algorithm, inspired by different causal discovery techniques: constraint-based methods, score-based methods, and interventions. We evaluate the versions in a simulation study, and our results show that asking causal questions improves the causal representation in all simulated scenarios. Further, asking causal questions based on PDAGs discovered from data provides a significant improvement over asking questions at random, and the version inspired by score-based techniques performs particularly well across all simulated experiments.

Change-point detection: sliding-window and hierarchical-based approaches, May 30
Speaker: Mehdi Moradi, Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: This talk focuses on two problems: i) detecting change-points near the tails of time series in a univariate setting, and ii) domination between marginals when performing multivariate analysis. An adaptive sliding-window-based approach is proposed for the former, and a hierarchical approach is developed for the latter. Both approaches are studied via comprehensive simulation studies, which show that they outperform state-of-the-art methods in several respects. Applications to NDVI (Normalized Difference Vegetation Index), LST (land surface temperature), and point processes are considered.

This talk is based on the following publications:

[1] Moradi, M., Montesino-SanMartin, M., Ugarte, M. D., & Militino, A. F. (2022). Locally adaptive change-point detection (LACPD) with applications to environmental changes. Stochastic Environmental Research and Risk Assessment, 36(1), 251-269.

[2] Moradi, M., Cronie, O., Pérez-Goya, U., & Mateu, J. (2023). Hierarchical spatio-temporal change-point detection. The American Statistician, 1-11.

On Federated Learning with Decision Trees, May 23
Speaker: Saloni Kwatra, Department of Computing Science, Umeå University

Abstract: Federated Learning (FL) allows a shared model to be trained across multiple distributed devices or organizations without centralized data collection. In FL, the data remain on the local devices or within the organizations, and only model updates are shared with a central server. The server aggregates the updates received from the different devices and sends the aggregated model updates back to them. The process is repeated until the model converges or a maximum number of iterations is reached. Although only model parameters are shared across the devices, sharing model updates still leads to substantial privacy leakage. Our work therefore focuses on privacy-preserving FL. We proposed an FL framework based on decision trees (DTs), in which each device first protects its data using Mondrian k-anonymity and then trains a DT classifier. The distributed devices share their trees' nodes from the root to the leaves, and the aggregation server recursively merges the DTs into a single merged tree, which is then shared with the distributed devices.
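As a rough picture of the Mondrian k-anonymity step, the sketch below follows the commonly described strict Mondrian partitioning: recursively cut the records at the median of the quasi-identifier with the widest range, but only if both halves keep at least k records, and then generalize each final partition to value ranges. It is a simplified numeric-only version without range normalization; the data and function names are hypothetical, and it is not the speaker's implementation.

```python
import numpy as np


def mondrian_partition(records, k):
    """Strict Mondrian partitioning of numeric quasi-identifiers.

    Returns a list of index arrays, each of size >= k (assuming
    len(records) >= k).
    """
    def split(idx):
        data = records[idx]
        spans = data.max(axis=0) - data.min(axis=0)
        # Try attributes from widest to narrowest range (no normalization here)
        for dim in np.argsort(spans)[::-1]:
            median = np.median(data[:, dim])
            left = idx[data[:, dim] <= median]
            right = idx[data[:, dim] > median]
            if len(left) >= k and len(right) >= k:
                return split(left) + split(right)
        return [idx]  # no allowable cut: idx becomes one equivalence class

    return split(np.arange(len(records)))


def generalize(records, groups):
    """Replace every record by the [min, max] range of its group, per attribute."""
    out = [None] * len(records)
    for idx in groups:
        ranges = list(zip(records[idx].min(axis=0).tolist(),
                          records[idx].max(axis=0).tolist()))
        for i in idx:
            out[i] = ranges
    return out


rng = np.random.default_rng(1)
# Hypothetical quasi-identifiers: age and a numeric postal code
records = np.column_stack([rng.integers(18, 90, 40), rng.integers(90000, 99999, 40)])
groups = mondrian_partition(records, k=5)
anonymized = generalize(records, groups)
print([len(g) for g in groups])  # every group has at least 5 records
```

Every record in a group receives the same generalized ranges, which is what makes records within an equivalence class indistinguishable on the quasi-identifiers.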

Each device participating in the FL process aims to learn a better machine learning model than it could have learned alone. We studied an FL framework called SimFL, which leverages information from similar samples held by the distributed parties. SimFL uses Locality Sensitive Hashing (LSH) to identify similar samples across distributed devices. The idea of LSH is that a data sample and its nearest neighbors should be hashed into the same bucket with high probability, while dissimilar samples should be hashed into the same bucket with low probability. The SimFL framework assumes that each distributed device knows the hash values (computed using LSH functions) of every device's records. We show that this assumption is a significant vulnerability in SimFL that puts individuals' privacy at risk. We implemented two data reconstruction attacks that estimate a user's original data from the hash values computed using LSH. We then proposed a variant of SimFL in which Mondrian anonymization is applied before the locality-sensitive hash values are computed. Applying Mondrian k-anonymity before LSH improves the privacy of FL participants because Mondrian k-anonymity creates equivalence classes (anonymized sets) of at least k records, within which all quasi-identifiers are generalized to the same values. Because k-anonymity is enforced, dissimilar samples are placed in the same equivalence class, particularly when k is high (what counts as high depends on the size and distribution of the dataset). A high k can, however, also worsen the predictive capability of the FL model, so there is a trade-off between utility and privacy.

This talk is based on joint work with Vicenç Torra.
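To make the LSH idea from this talk concrete, the sketch below uses random-hyperplane hashing (SimHash), one common LSH family for cosine similarity: each bit records which side of a random hyperplane a sample lies on, and two vectors at angle θ agree on a bit with probability 1 − θ/π, so similar samples tend to share hash values. The abstract does not say which LSH family SimFL uses, so the family, the number of bits, and the function names are assumptions.

```python
import numpy as np


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> np.ndarray:
    """Random Gaussian hyperplanes shared by all parties (an assumption)."""
    return np.random.default_rng(seed).normal(size=(n_bits, dim))


def simhash(x: np.ndarray, planes: np.ndarray) -> tuple:
    """One bit per hyperplane: 1 if x lies on its non-negative side."""
    return tuple((planes @ x >= 0).astype(int).tolist())


def hamming(a: tuple, b: tuple) -> int:
    """Number of differing bits between two hash codes."""
    return sum(i != j for i, j in zip(a, b))


planes = make_hyperplanes(dim=5, n_bits=16)
x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
near = x + 0.05 * np.random.default_rng(1).normal(size=5)
print(hamming(simhash(x, planes), simhash(near, planes)))
print(hamming(simhash(x, planes), simhash(-x, planes)))  # 16: every bit flips
```

The same property that makes the hash useful (it preserves geometric relations between samples) is also what leaks information: the sign pattern of a record against known hyperplanes constrains where the original record lies.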

Ticket to GAMLSS: case studies, May 16
Speaker: Bertold Mariën, Icelab, Umeå University

Abstract: The dynamics of ecological processes are often highly complex. In addition, ecological experiments are usually hampered by practical limitations. As a result, data derived from these experiments are often riddled with statistical issues such as missing data, outliers, and zero observations. Generalized additive models (GAMs) and their extension, generalized additive models for location, scale and shape (GAMLSS), are supervised machine learning methods that can account for many of these statistical issues. They are especially suitable for capturing the non-linear relationships that often characterize ecological systems. After a quick introduction to GAMs and GAMLSS, this seminar will discuss the advantages and limitations of GAMs and GAMLSS based on a few ongoing ecological case studies.

The Penalized Instrumental Variables Methods for many invalid instruments with an application to Mendelian Randomization, April 25
Speaker: Muhammad Qasim, Jönköping International Business School, Jönköping University

Abstract: Valid instrumental variables (IVs) must have no direct effect on the outcome variable and must not be correlated with unmeasured variables. In practice, however, IVs are likely to be invalid. With many weak and invalid instruments, existing methods can produce large biases relative to their standard errors. In this paper, we derive a LASSO procedure for the k-class IV estimation methods in the linear IV model. In addition, we propose a jackknife IV method that uses the LASSO to overcome the problem of many weak and invalid instruments in heteroscedastic data. The proposed methods are robust for estimating the causal effect in the presence of many invalid and valid instruments, with theoretical guarantees on their performance. We also develop two-step numerical algorithms for estimating the causal effects. The performance of the proposed estimators is demonstrated through Monte Carlo simulations and a Mendelian randomization study estimating the causal effect of body mass index on health-related quality of life.

This talk is based on joint work with Kristofer Månsson (Jönköping University) and Narayanaswamy Balakrishnan (McMaster University).

Federated Frank-Wolfe Algorithm, April 18
Speaker: Ali Dadras, Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: Federated learning (FL) has gained much attention in recent years for building privacy-preserving collaborative learning systems. However, FL algorithms for constrained machine learning problems are still very limited, particularly when the projection step is costly. To this end, we propose a Federated Frank-Wolfe Algorithm (FedFW). If the objective function is convex, FedFW provably finds an ε-suboptimal solution of the constrained empirical risk minimization problem after O(1/ε²) iterations; if the objective is non-convex, the rate becomes O(1/ε³). The method offers data privacy, low per-iteration cost, and communication of sparse signals. We demonstrate the empirical performance of FedFW on several machine learning tasks.

This talk is based on joint work with Karthik Prakhya and Alp Yurtsever.
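For readers unfamiliar with Frank-Wolfe, the sketch below shows the classic single-machine iteration that a federated variant builds on: instead of projecting onto the constraint set, each step calls a linear minimization oracle (over the probability simplex this simply returns the vertex with the smallest gradient entry) and moves toward that vertex with step size 2/(t + 2). The simplex constraint, the quadratic objective, and the function name are illustrative assumptions; FedFW's client-server aggregation and sparse communication are not shown.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, n_iter=2000):
    """Projection-free minimization over the probability simplex."""
    x = x0.copy()
    for t in range(n_iter):
        g = grad(x)
        # Linear minimization oracle: argmin over the simplex of <g, s> is a vertex
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x


# Minimize ||x - c||^2; c lies inside the simplex, so the minimizer is c itself
c = np.array([0.2, 0.3, 0.5])
x_fw = frank_wolfe_simplex(lambda z: 2 * (z - c), x0=np.array([1.0, 0.0, 0.0]))
print(x_fw)
```

Each update direction is a single vertex, i.e. a one-hot vector, and feasibility is maintained without any projection; this is why Frank-Wolfe is attractive when projections are expensive.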

Combinatorics and statistics – some points of contact, April 11
Speaker: Lars-Daniel Öhman, Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: Historically, design theory has been studied by both statisticians and combinatorialists, although from slightly different angles. While statisticians have focused more on optimality and practical applicability, combinatorialists have considered questions of existence and complete enumeration. One aim of this talk is to bring these two groups closer together by discussing how the two perspectives relate. In the main part of the talk, I will present some variations on classic designs, notably ordered sets and non-constant intersection sizes, and present results from our recent work on such designs.

This talk is based on joint work with Gerold Jäger, Klas Markström, Tomas Nilson and Denys Shcherbak.

Estimating Multiregional Input-Output Tables for Swedish Regions - Trade Modelling Comparisons, April 4
Speaker: Jonas Westin, Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: The purpose of the paper is to discuss experiences from the ongoing project for estimating and validating interregional trade in the new multiregional input-output (MRIO) tables at Statistics Sweden. In the paper, we investigate a novel method for estimating interregional trade flows in sectors where little or no survey data on trade patterns are available.

The project is part of a quality assurance initiative to develop interregional input-output tables for research, policy assessment and planning. For this effort to be successful, it is necessary to add further resources for the collection of interregional trade statistics and the consistent modelling of interregional trade using a combination of survey and non-survey tools.

There are many techniques for updating or estimating multiregional input-output tables using non-survey methods. A common method for estimating trade matrices is the gravity-RAS approach, in which unobserved trade flows are estimated using a gravity model and the RAS algorithm is then used to fit the estimated matrix to total production and consumption in each region. This method was used in both previous Swedish MRIO projects, as well as in the process for estimating Production-Consumption matrices for the Swedish National Freight Transport model SAMGODS.

A drawback of this method is that it requires survey data on regional commodity flows, which can be expensive and difficult to collect. The estimation procedure uses survey data to estimate the parameters of a gravity model. This model is then used to generate a priori matrices that are fitted to data on regional production and consumption using RAS balancing.
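The traditional gravity-RAS step described above can be sketched as follows: a gravity model produces a positive a priori trade matrix, and RAS (biproportional scaling) alternately rescales rows and columns until they match each region's total production (row sums) and consumption (column sums). The exponential distance-decay form, the parameter value, the regional figures, and the function names are assumptions for illustration, not the Statistics Sweden specification; the alternative parameter-estimation method studied in the paper is not shown.

```python
import numpy as np


def gravity_prior(production, consumption, distance, beta):
    """A priori flows T_ij = P_i * C_j * exp(-beta * d_ij) (assumed functional form)."""
    return np.outer(production, consumption) * np.exp(-beta * distance)


def ras(prior, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Biproportional (RAS) fitting of a positive matrix to given margins.

    Row and column targets must have the same grand total.
    """
    m = prior.astype(float)
    for _ in range(max_iter):
        m *= (row_targets / m.sum(axis=1))[:, None]   # R step: match row sums
        m *= (col_targets / m.sum(axis=0))[None, :]   # S step: match column sums
        if np.allclose(m.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return m


production = np.array([100.0, 60.0, 40.0])    # hypothetical regional output
consumption = np.array([80.0, 70.0, 50.0])    # same grand total (200)
distance = np.array([[0.0, 2.0, 5.0],
                     [2.0, 0.0, 3.0],
                     [5.0, 3.0, 0.0]])
flows = ras(gravity_prior(production, consumption, distance, beta=0.5),
            production, consumption)
print(flows.round(2))
```

RAS preserves the relative pattern of the gravity prior (its cross-product ratios) while forcing the margins to match, which is why the quality of the gravity parameters matters so much for the final trade matrix.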

In this paper, we investigate an alternative method for estimating the parameters of the gravity model using an error function that penalizes errors in the marginal constraints. In this way, we can use regional data that are already available to find the most likely gravity-model trade patterns that fit the data.

Comparisons of estimated gravity models using historical trade flows for Sweden have shown that the method, in many situations, produces results similar to those of more traditional survey-based estimation techniques. In this paper, we investigate the properties of this new method further in a simulation study in which different ways of estimating MRIO matrices are compared using Monte Carlo simulations.

Joint clustering of temporally-dependent misaligned functional data - an application to annually laminated lake sediments, March 27
Speaker: Michele di Sabato, intern at the Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: In order to properly quantify the effects of current global climate changes, it is important to understand how climate evolved in the past. Tree-rings, ice cores, corals, lake and sea sediments are among the natural archives that may carry such information.

In this seminar, we present an application of Functional Data Analysis and Spatial Statistics to a dataset of varved sediments obtained from Swedish and Finnish lakes, in an attempt to reconstruct the main climatic changes that affected the Fennoscandian Peninsula thousands of years ago.

Impact of Non-Informative Censoring on Propensity Score Based Estimates of Marginal Hazard Ratios, March 14
Speaker: Guilherme Barros, Department of Statistics, Umeå University

Abstract: In medical and epidemiological studies, one of the most common settings is studying the effect of a treatment on a time-to-event outcome, where the time-to-event might be censored before the end of the study. A common parameter of interest in such a setting is the marginal hazard ratio (MHR). When a study is based on observational data, propensity score (PS) based methods are often used in an attempt to make the treatment groups comparable despite non-randomized treatment assignment. Previous studies have shown censoring to be a factor that induces bias when using PS-based estimators. In this paper, we study the magnitude of the bias under different rates of non-informative censoring when estimating the MHR using PS weighting or PS matching. A bias correction involving the probability of the event is suggested and compared with conventional PS-based methods.

Robust and Efficient Estimation under Nonignorable Missing Response, March 7
Speaker: Yanyuan Ma, Department of Statistics, Eberly College of Science, Penn State

Abstract: We consider the estimation problem in a regression setting where the outcome variable is subject to nonignorable missingness and identifiability is ensured by the shadow variable approach. We propose a versatile estimation procedure in which modeling of the missingness mechanism is completely bypassed. We show that our estimator is easy to implement, and we derive its asymptotic theory. We also investigate some alternative estimators under different scenarios. Comprehensive simulation studies demonstrate the finite-sample performance of the method. We apply the estimator to a children's mental health study to illustrate its usefulness.

Revisiting High-Resolution ODEs for Faster Convergence Rates, February 21
Speaker: Hoomaan Maskan, Department of Mathematics and Mathematical Statistics, Umeå University

Abstract: We consider the unconstrained minimization of strongly convex smooth functions using first-order accelerated methods. High-resolution ordinary differential equations (ODEs) are studied to understand the behaviour of Nesterov's accelerated gradient algorithm. We propose a new general framework and introduce a new general Lyapunov function for the convergence proofs. Using integral quadratic constraints from robust control theory, we justify our choice of Lyapunov function. Through an alternative ODE representation, Nesterov's algorithm is explained exactly. Our continuous-time analysis leads to improved convergence results for the ODE of the triple momentum method. After discretization, several methods are recovered as special cases of our framework, and a better convergence rate is achieved for the quasi-hyperbolic momentum algorithm. To the best of our knowledge, this is the first work to consider the continuous-time ODE representation as an effective factor in first-order accelerated algorithms.

This is joint work with Armin Eftekhari (Umeå University) and Konstantinos Zygalakis (Edinburgh University).
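For context, the discrete algorithm whose high-resolution ODE is studied in this talk is Nesterov's accelerated gradient method. The sketch below shows its constant-momentum variant for an L-smooth, μ-strongly convex function, with step size 1/L and momentum (√κ − 1)/(√κ + 1), where κ = L/μ. The quadratic test problem and the function name are assumptions; the ODE analysis and the new Lyapunov framework are not reproduced.

```python
import numpy as np


def nesterov_strongly_convex(grad, x0, L, mu, n_iter=200):
    """Nesterov's accelerated gradient, constant-momentum variant."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(n_iter):
        y = x + beta * (x - x_prev)       # momentum (extrapolation) step
        x_prev = x
        x = y - grad(y) / L               # gradient step at the look-ahead point
    return x


# f(x) = 0.5 x^T A x - b^T x with mu = 1, L = 10; the minimizer is A^{-1} b
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = nesterov_strongly_convex(lambda x: A @ x - b, np.zeros(2), L=10.0, mu=1.0)
print(x_star)  # approximately [1.0, 0.1]
```

The error contracts roughly like (1 − 1/√κ)^k instead of the (1 − 1/κ)^k of plain gradient descent, and it is this gap that continuous-time ODE models of the method aim to explain.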

Developing Machine Learning Models to Predict Multi-Class Functional Outcomes and Death 3 months after Stroke in Sweden, February 14
Speaker: Josline Otieno, Department of Statistics, Umeå University

Abstract:

Importance: Globally, stroke is the third-leading cause of mortality and disability combined. Because of the personal suffering caused by post-stroke disability and the economic burden on the community, accurate prediction of outcomes could guide continued care and rehabilitation planning. Machine learning algorithms have recently shown potential in predicting outcomes after stroke.

Objective: To develop three supervised machine learning algorithms and compare their performance with that of the traditional logistic regression model in predicting disability and death 3 months after stroke, based on the modified Rankin Scale (mRS), using routinely collected data. A secondary aim was to explore the explainability of these algorithms by identifying the most important variables and how they contribute to the prediction.

Design, Setting, Participants: This prognostic study includes data from the Swedish Stroke Register (Riksstroke), which contains information on the entire chain of acute care for patients admitted to all 72 hospitals caring for stroke patients in Sweden. Patient-reported outcomes, including functional status, are collected by a questionnaire 3 months after stroke. Data on 102,135 adult patients recorded between January 2015 and December 2020 were included in the analyses.

Exposures: Prognostic factors (features) included, among others, age, sex, cardiovascular risk factors, medications, mRS prior to stroke, National Institutes of Health Stroke Scale (NIHSS) score, and type of stroke. Missing NIHSS values were imputed using the MICE (multiple imputation by chained equations) technique, and a separate category was created for missing values in other features. To improve the models' predictive performance, numerical features were scaled using Min-Max scaling and categorical features were encoded using one-hot encoding.

Main Outcomes and Measures: The main outcome for prediction was the mRS measured 3 months after stroke, categorized into 3 levels (0-2 independent, 3-5 dependent, and 6 dead). The classifiers were support vector machines (SVM), artificial neural networks (ANN), eXtreme Gradient Boosting (XGBoost), and logistic regression (LR). They were trained on 75% and tested on 25% of the dataset, and their predictive performance was assessed and compared using accuracy, the Matthews correlation coefficient (MCC), Cohen's kappa, F1 scores, and the area under the receiver operating characteristic curve (AUC-ROC). Lastly, the predictions were explained using SHapley Additive exPlanations (SHAP) values.
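Since accuracy alone can flatter a classifier when outcome classes are imbalanced, the study also reports multi-class MCC and Cohen's kappa. The sketch below computes both from a confusion matrix using their standard multi-class definitions; the toy labels and function names are hypothetical, it is not the study's code, and in practice library functions such as scikit-learn's `matthews_corrcoef` and `cohen_kappa_score` would typically be used.

```python
import numpy as np


def confusion(y_true, y_pred, n_classes):
    """Confusion matrix with rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def multiclass_mcc(cm):
    """Multi-class Matthews correlation coefficient."""
    s = cm.sum()                       # number of samples
    c = np.trace(cm)                   # correctly classified samples
    t = cm.sum(axis=1)                 # true counts per class
    p = cm.sum(axis=0)                 # predicted counts per class
    num = c * s - p @ t
    den = np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    return num / den if den > 0 else 0.0


def cohens_kappa(cm):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    s = cm.sum()
    p_observed = np.trace(cm) / s
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / s**2
    return (p_observed - p_expected) / (1 - p_expected)


# Three mRS levels: 0 = independent, 1 = dependent, 2 = dead (toy labels)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion(y_true, y_pred, 3)
print(round(cohens_kappa(cm), 3))    # 0.5
print(round(multiclass_mcc(cm), 3))  # 0.522
```

Here accuracy is 4/6 ≈ 0.667, but half of that agreement would be expected by chance given the class frequencies, which kappa and MCC both discount.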

Results: In total, 85.8% of patients had ischemic stroke and 53.3% were male. The mean [SD] age at admission was 75.8 [12.0] years, and the median [Q1-Q3] NIHSS score was 3 [1-8]. The ANN and XGBoost classifiers performed significantly better than the traditional LR in classifying the correct mRS levels, with accuracies of 0.698 (95% CI 0.693-0.704) and 0.694 (95% CI 0.688-0.699), respectively, compared with 0.681 (95% CI 0.675-0.686) for the LR model. The results also showed that death after stroke was most strongly associated with the NIHSS score, higher age, hemorrhagic stroke, pre-stroke mRS, and being an inpatient at the time of stroke. In contrast, an independent functional outcome was associated with male sex, stroke alerts, and lipid-lowering drugs.

Conclusions and Relevance: The study demonstrated that both the ANN and XGBoost classifiers perform significantly better than the traditional LR in predicting functional outcome and death. On average, for every 10,000 stroke patients, an additional 170 patients would be correctly classified into their mRS category by using machine learning algorithms instead of LR. This could be clinically important in acute stroke care and rehabilitation planning. Existing methods (e.g., SHAP) can be used to interpret these advanced algorithms. The models showed promising results; however, they need to be externally validated to establish generalizability.
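The figure of 170 additional correct classifications per 10,000 patients appears to follow directly from the accuracy difference between the ANN and LR models reported in the results; the same calculation for XGBoost gives 130:

```latex
\[
(0.698 - 0.681) \times 10\,000 = 0.017 \times 10\,000 = 170,
\qquad
(0.694 - 0.681) \times 10\,000 = 0.013 \times 10\,000 = 130 .
\]
```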

Bounding the selection bias, January 17
Speaker: Stina Zetterström, Department of Statistics, Uppsala University

Abstract: Selection bias is a systematic error that can occur when subjects are included in or excluded from the analysis based on selection criteria for the study population. This type of bias can threaten the validity of a study, and methods for estimating the effect of selection bias are therefore needed. One way of estimating the effect of selection bias is through sensitivity analysis, and one such type of sensitivity analysis is bounding the bias. In this work, we investigate a previously proposed bound for average causal effects in the total population and in the selected subpopulation, referred to as the SV bound (Smith and VanderWeele, 2019). The bound is based on assumed values of sensitivity parameters selected by the researcher. Furthermore, we derive feasible regions for the sensitivity parameters, as well as conditions for the SV bound to be sharp, where sharp means that the bias can attain the value of the bound. As an alternative, we propose a second bound that is based solely on the observed data and is therefore referred to as the assumption-free (AF) bound. We provide an R package for calculating the SV and AF bounds. The bounds and the R package are illustrated with a simulated dataset that emulates a study on the effect of Zika virus on microcephaly in Brazil.

This is joint work with Prof. Ingeborg Waernbaum, Department of Statistics, Uppsala University.

2022

Comparing non-parametric and parametric item response theory models, November 22
Speaker: Joakim Wallmark, Department of Statistics, Umeå University.

Item response theory (IRT) models are used to model the relationship between the possible scores on a test item and a test taker's level of the latent trait that the item is meant to measure. In the present study, nonparametric IRT is compared with parametric IRT for tests containing polytomous items. Specifically, optimal scores are compared with the generalized partial credit (GPC) model using both simulations and real data examples. In the real data examples, model fit with optimal scores is shown to be superior to that of the GPC model for all the datasets analyzed. In the simulation study, optimal scores outperform the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Additionally, we illustrate how surprisal arc length, an IRT scale-invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can serve as an alternative to sum scores for measuring the amount of a test's information that a test taker has achieved.
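For reference, the GPC model gives the probability of scoring in category k = 0, ..., m on an item as proportional to exp(Σ_{v=1}^{k} a(θ − b_v)), where the empty sum for k = 0 is taken as zero (some formulations also include a scaling constant). The sketch below evaluates these category probabilities; the parameter values and function name are illustrative, and neither model estimation nor the optimal-scoring alternative is shown.

```python
import numpy as np


def gpc_probabilities(theta: float, a: float, b: np.ndarray) -> np.ndarray:
    """Category probabilities P(X = k | theta), k = 0..m, under the GPC model.

    a is the item discrimination and b holds the m step (threshold) parameters.
    """
    # Cumulative sums of a * (theta - b_v); category 0 gets the empty sum 0
    z = np.concatenate([[0.0], np.cumsum(a * (theta - b))])
    z -= z.max()                      # subtract the max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()


probs = gpc_probabilities(theta=0.5, a=1.2, b=np.array([-1.0, 0.0, 1.5]))
print(probs.round(3), probs.sum())
```

With a single threshold (m = 1), the model reduces to the two-parameter logistic model, which is a convenient sanity check.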

Valid Causal Inference in High-Dimensional and Complex Settings (PhD Thesis Defense), October 7
Speaker: Niloofar Moosavi, Department of Statistics, Umeå University

Optimal estimation of heterogeneous causal effects, October 6
Speaker: Edward Kennedy, Statistics and Data Science, Carnegie Mellon University, USA

Estimation of heterogeneous causal effects – i.e., how effects of policies and treatments vary across units – is fundamental to medical, social, and other sciences, and plays a crucial role in optimal treatment allocation, generalizability, subgroup effects, and more. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but there have remained important theoretical gaps in understanding if and when such methods make optimally efficient use of the data at hand. This is especially true when the CATE has nontrivial structure (e.g., smoothness or sparsity). This talk surveys work across two recent papers in this context. First, we study a two-stage doubly robust estimator and give a generic model-free error bound, which, despite its generality, yields sharper results than those in the current literature. The second contribution is aimed at understanding the fundamental statistical limits of CATE estimation. We resolve this long-standing problem by deriving a minimax lower bound, with matching upper bound obtained via a new estimator based on higher order influence functions. Applications in medicine and political science are considered.
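A common form of two-stage doubly robust CATE estimation (often called the DR-learner) first estimates the propensity score π(x) and outcome regressions μ₀(x), μ₁(x), then regresses the pseudo-outcome (A − π(X))/(π(X)(1 − π(X))) · (Y − μ_A(X)) + μ₁(X) − μ₀(X) on the covariates. The sketch below does this on simulated data, plugging in the true nuisance functions and a straight-line second stage for brevity. The data-generating process, the linear CATE, and the function name are assumptions; in practice the nuisance functions are estimated on a separate sample split, and the estimator analyzed in the talk may differ in its details.

```python
import numpy as np


def dr_pseudo_outcome(y, a, pi, mu0, mu1):
    """Doubly robust pseudo-outcome whose conditional mean given X is the CATE."""
    mu_a = np.where(a == 1, mu1, mu0)
    return (a - pi) / (pi * (1 - pi)) * (y - mu_a) + mu1 - mu0


rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1, 1, n)
pi = 1 / (1 + np.exp(-x))                 # propensity score P(A = 1 | X)
a = rng.binomial(1, pi)
tau = 1 + 2 * x                           # true CATE (assumed linear)
mu0, mu1 = x, x + tau
y = np.where(a == 1, mu1, mu0) + rng.normal(size=n)

# Second stage: regress the pseudo-outcome on x (oracle nuisances plugged in)
phi = dr_pseudo_outcome(y, a, pi, mu0, mu1)
slope, intercept = np.polyfit(x, phi, deg=1)
print(round(intercept, 2), round(slope, 2))  # should be near 1 and 2
```

The pseudo-outcome stays unbiased for the CATE if either the propensity model or the outcome models are correct, which is the "doubly robust" property the error bounds build on.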

In Search of Projectively Equivariant Neural Networks, September 28
Speaker: Axel Flinth, Umeå University

A key concept within the field of Geometric Deep Learning is equivariance. Put simply, a network is equivariant with respect to a group of transformations if transforming the input causes the output to transform in a corresponding way. A prominent example is convolutional neural networks: a translation of the input causes the output to translate with it. In recent years, networks for a number of other transformation groups have been successfully constructed and applied. In this talk, we investigate equivariance in a projective sense, and in particular its connection to equivariance in the standard sense. Our main motivation for studying projective equivariance is the pinhole camera model in computer vision, but other applications may be possible. As in many other works, we concentrate on equivariant multilayer perceptrons, and in particular on linear layers. Our main theoretical finding is that, in several important special cases, the problem of finding projectively equivariant linear layers is actually equivalent to the standard equivariance problem. We also present some small proof-of-concept numerical experiments. This talk is based on joint work with Georg Bökman and Fredrik Kahl.
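The convolution example of standard equivariance can be checked numerically: writing S for a cyclic shift, a layer f is equivariant if f(Sx) = S f(x). The sketch below verifies this for a 1-D circular convolution (strictly a cross-correlation, as in most deep learning libraries); circular padding is an assumption made so that the identity holds exactly at the borders. It illustrates only the standard notion, not the projective one studied in the talk.

```python
import numpy as np


def circular_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1-D circular cross-correlation: y[i] = sum_j w[j] * x[(i + j) mod n]."""
    return sum(w[j] * np.roll(x, -j) for j in range(len(w)))


def shift(v: np.ndarray) -> np.ndarray:
    """Cyclic translation of a signal by 3 samples."""
    return np.roll(v, 3)


rng = np.random.default_rng(0)
x = rng.normal(size=16)
w = np.array([0.25, 0.5, 0.25])
# Translating then convolving equals convolving then translating
print(np.allclose(circular_conv(shift(x), w), shift(circular_conv(x, w))))  # True
```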

New Statistical and Machine Learning based Control Charts with Variable Parameters for Monitoring Generalized Linear Model Profiles, September 27
Speaker: Hamed Sabahno, Department of Statistics, Umeå University

In many practical cases, the quality of a process is characterized by the relationship between a response variable and some independent variables in the form of a regression model (called a profile), rather than by individual quality variables or attributes; that is, the quality characteristic of the process is a regression model. Monitoring whether this relationship (the profile's parameters) remains unchanged over time is called ‘profile monitoring’. It is usually assumed that the response variable in a profile follows a normal distribution. However, in many real applications the response variable is non-normal and follows another distribution from the exponential family. Such models, in the case of a linear relationship (which is the most common one), are called generalized linear models (GLMs). The most common GLM-type distributions in profile monitoring are the binomial, Poisson, and gamma distributions.

In this research, we develop three statistical control charts, namely Hotelling's T², MEWMA (multivariate exponentially weighted moving average), and LRT (likelihood ratio test), as well as three machine learning (ML) based control charts, namely ANN (artificial neural network), SVR (support vector regression), and RFR (random forest regression), for monitoring GLM profiles. We train these ML techniques to produce a linear (regression) output and then apply our own classification technique to decide, at each sampling time, whether the process is in or out of control. In addition to the FP (fixed parameters) schemes, we design an adaptive VP (variable parameters) scheme for each control chart to increase the charts' sensitivity in detecting shifts, and we develop algorithms for obtaining the values of the control chart parameters in both the FP and VP schemes. We then develop two Monte Carlo-based algorithms to measure the charts' performance in both FP and VP formats, using the run length and time to signal as performance measures.

After designing the control charts and performance measures, which can also be used for other types of distribution-based or distribution-free control charts, we perform extensive simulation studies to evaluate and compare all our control charts under different shift sizes and scenarios in three different simulation environments. Finally, we present a numerical example from a drug dose-response study to show how the proposed control charts can be implemented in practice.

Statistical analysis of point patterns on linear networks, September 15
Speaker: Jesper Møller, Aalborg University

A point process is a mathematical model for randomly distributed point patterns in a given space. While the mathematical and statistical theory for point processes on one-, two-, or higher-dimensional Euclidean space is fairly well developed, with accompanying user-friendly software for statistical analysis (notably the R package spatstat), research on point processes defined on more general spaces such as spheres and linear networks is still in its infancy. This talk will provide a state-of-the-art review of statistical models, simulation procedures, and methods for estimation and model checking when analyzing point patterns observed on spheres.

Statistical analysis of point patterns on spheres, September 13
Speaker: Jesper Møller, Aalborg University

Additive Gaussian Process Models for Spatial and Spatio-temporal Data, September 8
Speaker: Sahoko Ishida, London School of Economics

Regression with a Gaussian process (GP) prior is a powerful statistical tool for modelling a wide variety of data with both Gaussian and non-Gaussian likelihoods. In the spatial statistics community, GP regression, also known as Kriging, has a long history. It has proven useful since its introduction because of its ability to model the autocorrelation of spatial and spatio-temporal data.

Besides space and time, real-life applications often contain additional information with different characteristics. In applied research, interest often lies in exploring whether a space-time interaction exists, or in investigating relationships between covariates and the outcome while controlling for space and time effects.

Additive GP regression makes it possible to model such flexible relationships by exploiting the structure of the GP covariance function (kernel), adding and multiplying different kernels for different types of covariates. This approach has only partially been adapted to spatial and spatio-temporal analysis.

In this study, we use an ANOVA decomposition of kernels and introduce a unified approach to modelling spatio-temporal data that uses the full flexibility of additive GP models. This permits modelling not only the main effects and interactions of space and time, but also including covariates and letting their effects vary with time and space. We consider various types of outcomes, including continuous, categorical and count data. By exploiting kernels for graphs and networks, we show that areal data can be modelled in the same manner as data that are geo-coded using coordinates. For model estimation, we have implemented both an MCMC algorithm and analytical approximations, including the Laplace approximation and variational inference. In this presentation, we demonstrate the proposed methods using empirical data.
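The idea of building a space-time covariance by adding and multiplying kernels can be sketched in a few lines. The sketch assumes squared-exponential kernels, inputs whose first two columns are coordinates and whose third column is time, and hypothetical weights and lengthscales. It uses a simplified ANOVA-style sum (space main effect + time main effect + space x time interaction) without the centring constraints of a proper ANOVA decomposition of kernels. It is not the speaker's implementation.

```python
import numpy as np

def sq_exp(A, B, lengthscale):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def space_time_kernel(X1, X2, ls_space=0.3, ls_time=0.3, w_space=1.0, w_time=1.0, w_inter=0.5):
    """Space main effect + time main effect + space x time interaction.
    Columns 0-1 of X are assumed to be coordinates, column 2 is time."""
    Ks = sq_exp(X1[:, :2], X2[:, :2], ls_space)
    Kt = sq_exp(X1[:, 2:3], X2[:, 2:3], ls_time)
    return w_space * Ks + w_time * Kt + w_inter * Ks * Kt   # sums and products of kernels are kernels

def gp_posterior_mean(X, y, X_new, noise_sd=0.1, **kernel_args):
    """Posterior mean of zero-mean GP regression with Gaussian noise."""
    K = space_time_kernel(X, X, **kernel_args) + noise_sd ** 2 * np.eye(len(X))
    K_new = space_time_kernel(X_new, X, **kernel_args)
    return K_new @ np.linalg.solve(K, y)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 3))   # (x, y, t) for 40 hypothetical observations
y = np.sin(4 * X[:, 0]) + 0.5 * X[:, 2] + 0.1 * rng.normal(size=40)
K = space_time_kernel(X, X)
pred = gp_posterior_mean(X, y, X[:5])
```

Covariates would enter the same way, as further kernels added to or multiplied with the space and time terms. Non-Gaussian outcomes need the MCMC, Laplace or variational machinery mentioned above instead of the closed-form posterior mean.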

Spectral methods for clustering signed and directed networks, September 6
Speaker: Associate Prof. Mihai Cucuringu, Department of Statistics, Oxford University, UK.

We consider the problem of clustering in two important families of networks, signed and directed, both of which are less well explored than their unsigned and undirected counterparts. Both problems share an essential common feature: they can be solved by exploiting the spectrum of certain graph Laplacian matrices or variants thereof. In signed networks, the edge weights between nodes may take either positive or negative values, encoding a measure of similarity or dissimilarity. We consider a generalized eigenvalue problem involving graph Laplacians, with performance guarantees under a signed stochastic block model, along with regularized versions that handle very sparse graphs. The second problem concerns directed graphs. Imagine a (social) network in which you spot two subsets of accounts, X and Y, for which the overwhelming majority of messages (or friend requests, endorsements, etc.) flow from X to Y, and very few flow from Y to X; would you get suspicious? To this end, we also discuss a spectral clustering algorithm for directed graphs based on a complex-valued representation of the adjacency matrix. This algorithm can capture underlying cluster structures for which the information encoded in the direction of the edges is crucial. We evaluate the proposed algorithm in terms of a cut-flow-imbalance-based objective function, which, for a given pair of clusters, captures the propensity of the edges to flow in a given direction. Experiments on a directed stochastic block model and on real-world networks showcase the robustness and accuracy of the method compared with other state-of-the-art methods. Time permitting, we briefly discuss connections to ranking from pairwise comparison data and to the group synchronization problem, and give an overview of alternative approaches beyond spectral methods to all of the above.
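A complex-valued representation of a directed graph can be sketched as follows. The sketch assumes the Hermitian matrix H = i(A - A^T), so that H[u, v] = i for an edge u -> v and -i for v -> u. It embeds nodes with the projection onto the top eigenvector(s) and runs a plain k-means. The number of eigenvectors (max(1, k // 2)), the helper names and the toy graph are assumptions for illustration, not the speaker's implementation.

```python
import numpy as np

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def hermitian_spectral_clustering(A, k):
    """Cluster a directed graph with 0/1 adjacency A (A[u, v] = 1 for an edge u -> v)."""
    A = np.asarray(A, dtype=float)
    H = 1j * (A - A.T)                   # Hermitian: i for u -> v, -i for v -> u, 0 otherwise
    evals, evecs = np.linalg.eigh(H)     # real eigenvalues, which come in +/- pairs
    top = np.argsort(evals)[::-1][:max(1, k // 2)]
    V = evecs[:, top]
    P = V @ V.conj().T                   # projection removes the arbitrary phase of eigenvectors
    Z = np.hstack([P.real, P.imag])      # real-valued embedding of each node's row
    return kmeans(Z, k)

rng = np.random.default_rng(0)
n = 30                                   # nodes per cluster: X = first 30, Y = last 30
A = np.zeros((2 * n, 2 * n))
A[:n, n:] = rng.random((n, n)) < 0.6     # most edges flow X -> Y
A[n:, :n] = rng.random((n, n)) < 0.05    # a few flow Y -> X
labels = hermitian_spectral_clustering(A, k=2)
```

In this toy graph, nodes of X and Y have the same undirected connectivity pattern, so the separation comes entirely from the edge directions encoded in the imaginary entries of H.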

Model-based inference for abundance estimation using presence/absence data from large-area forest inventories with covariate data from remote sensing, May 17
Speaker: Benoît Gozé, doctoral student at the Department of Forest Resource Management, SLU

In this paper, we investigate methods to estimate plant population size and intensity (also known as density) from presence/absence data. Presence/absence sampling is a useful and relatively simple method for monitoring the state of, and changes in, plant species communities. Moreover, it has advantages over traditional plant cover assessment, which is more prone to surveyor judgement error.
We use inhomogeneous Poisson point process models for plant locations, and generalised linear models (GLMs) with a complementary log-log link function to link presence/absence data to plant intensity. In these models, auxiliary covariate information from remote sensing (i.e. wall-to-wall data) is used. We propose an estimator of plant intensity, derive the variance of this estimator and show how this variance can be estimated. To evaluate these estimators, we use both Monte Carlo simulations, in which we create artificial plant populations, and empirical data from the Swedish National Forest Inventory (NFI). We also develop a test that checks the underlying Poisson point process assumption of our models and protects inference against model misspecification. The proposed hypothesis test is evaluated through Monte Carlo simulations. For a selection of forest plant species, models that passed the Poisson test could be fitted, and plant density and its variance could be estimated for these species.
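The reason a complementary log-log link arises is a short calculation. If plants in a plot of area a follow a Poisson process whose intensity λ is roughly constant within the plot, then P(presence) = 1 - exp(-aλ). So log(-log(1 - p)) = log a + log λ, a binomial GLM with a cloglog link and offset log a. The sketch below fits such a model by maximum likelihood on simulated data. The plot area, the single covariate z and the coefficient values are hypothetical, and the code is not the estimator or variance formula from the talk.

```python
import numpy as np
from scipy.optimize import minimize

def presence_prob(beta, z, area):
    """P(at least one plant in a plot) when intensity exp(b0 + b1*z) is constant in the plot."""
    intensity = np.exp(beta[0] + beta[1] * z)
    return -np.expm1(-area * intensity)          # 1 - exp(-area * intensity)

def neg_log_lik(beta, y, z, area):
    # cloglog(p) = log(-log(1 - p)) = log(area) + b0 + b1*z: binomial GLM with offset log(area)
    mu = area * np.exp(beta[0] + beta[1] * z)    # expected number of plants in the plot
    log_p = np.log(-np.expm1(-mu))               # log P(presence)
    log_q = -mu                                  # log P(absence)
    return -np.sum(y * log_p + (1 - y) * log_q)

rng = np.random.default_rng(42)
n, area = 5000, 0.01                             # hypothetical: 5000 plots, area in arbitrary units
z = rng.normal(size=n)                           # hypothetical remote-sensing covariate
beta_true = np.array([4.0, 0.8])
y = (rng.random(n) < presence_prob(beta_true, z, area)).astype(float)

# start from the intercept-only solution, which has a closed form
b0_start = np.log(-np.log1p(-y.mean()) / area)
fit = minimize(neg_log_lik, x0=np.array([b0_start, 0.0]), args=(y, z, area), method="BFGS")
beta_hat = fit.x
p_hat = presence_prob(beta_hat, z, area)
```

Once beta_hat is available, the estimated intensity at any location with covariate value z is exp(b0 + b1*z), which is what links presence/absence plots to plant density.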

Two examples of statistical applications: Stroke quality of care improvement, and SARS-CoV-2 disease severity, May 10
Speaker: Dr Wenjuan Wang, Research Fellow/Senior Data Scientist in the School of Population Health & Environmental Sciences, King’s College London. Dr Wang works on applying machine learning to improve the quality of stroke care, as well as to COVID-19 and flu patient management and severity scores.

Part I: Applying machine learning for stroke quality care improvement
Machine learning was used to predict the risk of 30-day mortality after stroke, using data from the Sentinel Stroke National Audit Programme (SSNAP), the national registry of stroke care in England, Wales and Northern Ireland. The ML model predicted 30-day mortality more accurately (AUC 0.896) than the model previously used in SSNAP (AUC 0.854) and was reasonably well calibrated. It could therefore potentially be used as a benchmarking model for quality improvement in stroke care in SSNAP.
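For readers less familiar with the AUC values quoted above, the AUC can be computed as a Mann-Whitney probability. It is the chance that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The minimal sketch below uses made-up scores and is not the SSNAP model or its evaluation code.

```python
import numpy as np

def auc(scores, died):
    """AUC as P(score of a random positive case > score of a random negative case); ties count 1/2."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(died, dtype=bool)
    diff = s[y][:, None] - s[~y][None, :]   # every positive-negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# toy example: 3 of the 4 (death, survivor) pairs are ordered correctly
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

On this reading, an AUC of 0.896 means that about 90% of such pairs are ranked correctly by the model. This is a statement about discrimination only; calibration has to be checked separately.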

Part II: Multivariable analysis of the association of the Alpha variant (B.1.1.7 lineage) of SARS-CoV-2 with disease severity in inner London

Through a descriptive comparison of admission characteristics between pandemic waves and a multivariable analysis of the association of the Alpha variant (B.1.1.7 lineage) of SARS-CoV-2 with disease severity in inner London, we found that the Alpha variant was associated with increased disease severity, and that the number of nosocomial cases was similar in both waves despite the introduction of many infection control interventions before wave 2.

Data subsampling, active learning, and optimal design, April 26
Speaker: Henrik Imberg, Doctoral Student, Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg

Data subsampling is increasingly used in the statistics and machine learning communities to overcome practical, economic and computational bottlenecks in modern large-scale inference problems. Examples include leverage sampling for big data linear regression, optimal subdata selection for generalised linear models, and active machine learning in measurement-constrained supervised learning problems.

So far, the contributions to the field have been largely focused on computationally efficient algorithmic developments. Consequently, most sampling schemes proposed in the literature are either based on heuristic arguments or use optimality criteria with known deficiencies, e.g. being dependent on the scaling of the data and parametrisation of the model. We develop a general theory of optimal design for data subsampling methods and derive a class of asymptotically linear optimality criteria that i) can easily be tailored to the problem at hand, ii) are invariant to the parametrisation of the model, and iii) enable fast and efficient computation for both Poisson and multinomial sampling designs.
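Leverage sampling, one of the examples mentioned above, can be written as a Poisson sampling design. Each observation is included independently with probability proportional to its leverage (capped at 1), and the subsample is analysed by inverse-probability-weighted least squares. This is a minimal sketch of that baseline under the assumption of a linear model. The function names and sizes are hypothetical, and it is not the optimal design criteria developed in the talk.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix X (X'X)^{-1} X', computed via a thin QR decomposition."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

def poisson_leverage_subsample(X, y, n_sub, rng):
    """Poisson subsampling with probabilities proportional to leverage, then weighted least squares."""
    h = leverage_scores(X)
    pi = np.minimum(1.0, n_sub * h / h.sum())   # inclusion probabilities, expected size at most n_sub
    keep = rng.random(len(y)) < pi
    sw = np.sqrt(1.0 / pi[keep])                # square roots of inverse-probability weights
    beta, *_ = np.linalg.lstsq(X[keep] * sw[:, None], y[keep] * sw, rcond=None)
    return beta, int(keep.sum())

rng = np.random.default_rng(5)
N = 100_000
X = np.column_stack([np.ones(N), rng.standard_t(df=3, size=(N, 2))])   # heavy-tailed covariates
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=N)
beta_hat, n_used = poisson_leverage_subsample(X, y, n_sub=1000, rng=rng)
```

Leverage scores depend only on the column space of X, so they are unchanged by rescaling the covariates. They are still a heuristic rather than the solution of an optimality criterion tailored to the inference goal.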

The methodology is illustrated on binary classification problems in active machine learning, and on density estimation in computationally demanding virtual simulations for safety assessment of automated vehicles.

An adaptive max-type multivariate control chart by considering the measurement errors and autocorrelation, March 22
Speaker: Hamed Sabahno, Postdoctoral fellow, Department of Statistics, Umeå University

The effects on control charts of two phenomena that occur in real-world processes, measurement errors and autocorrelation between observations, have attracted researchers' attention in recent years. However, their combined effect has rarely been investigated, with only one study for multivariate control charts. In this paper, their combined effects are investigated for the first time in univariate and multivariate 'adaptive' and/or 'simultaneous process parameters monitoring' control charts. They are also investigated for the first time in multivariate control charts using linearly covariate measurement errors, VARMA (vector autoregressive moving average) autocorrelation models and Markov chain-based performance measures. To do so, we add the above-mentioned measurement error and autocorrelation models to a recently developed adaptive VP (variable parameters) max-type control chart that is capable of monitoring the process parameters simultaneously. We then develop a Markov chain model to compute the average and standard deviation of the time to signal. After developing the control scheme and the performance measures in the presence of both measurement errors and autocorrelation, we perform extensive simulation studies. These investigate the combined effects of measurement errors and autocorrelation, as well as some methods to alleviate their negative effects. In addition, this paper is the first to use the skip-sampling strategy in an ARMA/VARMA autocorrelation model to alleviate the autocorrelation effect. Finally, an illustrative example involving a real industrial case is presented.
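Why skip-sampling alleviates autocorrelation can be seen in the simplest case, a stationary AR(1) process. There the lag-1 autocorrelation of every s-th observation is φ^s instead of φ. The sketch below checks this by simulation. It is univariate and assumes an AR(1) with φ = 0.8. The talk itself uses ARMA/VARMA models inside an adaptive max-type chart, which is not reproduced here.

```python
import numpy as np

def simulate_ar1(phi, n, rng):
    """Stationary AR(1): x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1)."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - phi ** 2)   # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    xc = x - x.mean()
    return np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)

rng = np.random.default_rng(3)
x = simulate_ar1(0.8, 200_000, rng)
for skip in (1, 2, 4):
    kept = x[::skip]                      # inspect only every `skip`-th observation
    print(skip, round(lag1_autocorr(kept), 3), "theory:", round(0.8 ** skip, 3))
```

The price of skipping is that fewer observations are charted per unit of time, which is why skip-sampling has to be evaluated together with the chart's time-to-signal measures.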

Microstructures and mass transport in porous materials - combining physics, spatial statistics, machine learning, and data science, March 1
Speaker: Magnus Röding, Adjunct Associate Professor, Chalmers University of Technology and University of Gothenburg, Sweden

Understanding the microstructure of a porous material and how it relates to its mass transport properties (diffusion, fluid flow) is crucial for designing better materials. One example is coating layers on pharmaceutical pellets for controlled release of compounds; another is liquid transport through fibrous media in hygiene products. We combine, e.g., image analysis, spatial statistics, stochastic geometry, numerical simulation techniques and machine learning to characterize materials and to predict and understand their properties. We will discuss a number of cases involving semantic segmentation of 3D image data, deep learning regression for parameter estimation in different experimental techniques, development of realistic virtual material models, machine learning-based prediction of properties, and the design of materials with desired properties.

Scalable ML: Communication-Efficiency, Security, and Architecture Design, February 22
Speaker: Ali Ramazani-Kebrya, Senior Postdoctoral Associate, EPFL, Switzerland

To fully realize the benefits of deep learning, we need to design highly scalable, robust and privacy-preserving learning algorithms, and to understand the fundamental limits of the underlying architecture, e.g., the neural network to which the learning algorithm is applied. The key algorithm underlying the deep learning revolution is stochastic gradient descent (SGD). SGD needs to be distributed to handle enormous and possibly sensitive data spread among multiple owners, such as hospitals and cellphones, without sharing local data. When SGD is implemented on large-scale distributed systems, the communication time required to share stochastic gradients is the main performance bottleneck. In addition to communication efficiency, robustness is highly desirable in real-world settings. We present efficient gradient compression and robust aggregation schemes that reduce communication costs and enhance security while preserving privacy. Our algorithms currently offer the highest communication compression while still converging under regular (uncompressed) hyperparameter values. Considering the underlying architecture, one fundamental question is "How much should we overparameterize a neural network?" We present the current best scaling of the number of parameters for fully trained shallow neural networks under standard initialization schemes.
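One common family of gradient compression schemes is unbiased stochastic quantization, in the style of QSGD. Only the gradient norm, the signs and small integers per coordinate are communicated, and random rounding keeps the compressed gradient unbiased. The sketch below shows that generic idea with a hypothetical number of levels. It is not necessarily the compression scheme presented in the talk.

```python
import numpy as np

def stochastic_quantize(v, levels, rng):
    """Unbiased stochastic quantization of a gradient vector to `levels` levels per coordinate."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels            # each entry lies in [0, levels]
    lower = np.floor(scaled)
    # round up with probability equal to the fractional part, so that E[q] = scaled
    q = lower + (rng.random(v.shape) < scaled - lower)
    return norm * np.sign(v) * q / levels

rng = np.random.default_rng(2)
g = rng.normal(size=1000)                         # hypothetical stochastic gradient
avg = np.mean([stochastic_quantize(g, 4, rng) for _ in range(2000)], axis=0)
```

Averaging many independent quantizations recovers the original gradient, which is the unbiasedness property that convergence analyses of compressed SGD rely on. The added variance shrinks as the number of levels grows, at the cost of more bits per coordinate.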

Causal inference with a functional outcome, February 15
Speaker: Kreske Ecker, Department of Statistics, Umeå University

In this work, we present methods to study the causal effect of a scalar treatment on a functional outcome based on observational data. We develop a semi-parametric estimator of a Functional Average Treatment Effect (FATE), based on outcome regression. Using recent results from functional data analysis, we show how to obtain exact and valid inference on the FATE under certain conditions: we give simultaneous confidence bands that cover the parameter of interest with a given probability over the entire domain. Using simulation experiments, we compare the performance of the simultaneous confidence bands with that of pointwise bands that do not take the multiple comparison problem into account. The former achieve the desired coverage rates, whereas the latter do not. In addition, we use the presented methods to estimate the effect of early adult location on subsequent income development for one Swedish birth cohort. Overall, we find a positive effect of living in an urban, as opposed to a rural, area at the age of 20 on cumulative lifetime income, but there are differences by gender. For women, the effect is stronger and positive over the entire study period, whereas for men there is a negative effect during the first years.
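The coverage difference between pointwise and simultaneous bands can be illustrated with Gaussian error curves on a grid. The simultaneous critical value is the 95% quantile of the supremum of the standardized error, whereas the pointwise bands use 1.96 at every grid point. The grid, the exponential correlation and its scale 0.1 are hypothetical, and the covariance is treated as known. This is a generic illustration, not the FATE estimator or the band construction from the talk.

```python
import numpy as np

rng = np.random.default_rng(7)
grid = np.linspace(0.0, 1.0, 50)             # hypothetical evaluation grid of the functional effect
# assumed covariance of the estimation error curve: exponential correlation, unit variance
Sigma = np.exp(-np.abs(grid[:, None] - grid[None, :]) / 0.1)
L = np.linalg.cholesky(Sigma)
sd = np.sqrt(np.diag(Sigma))

def error_curves(n):
    """Draw n Gaussian error curves (estimate minus true effect) on the grid."""
    return rng.standard_normal((n, grid.size)) @ L.T

# simultaneous critical value: 95% quantile of the supremum of the standardized error
c_sim = np.quantile(np.max(np.abs(error_curves(5000)) / sd, axis=1), 0.95)
c_point = 1.959963984540054                  # standard normal 97.5% quantile (pointwise bands)

fresh = np.abs(error_curves(5000)) / sd      # independent curves for checking coverage
cover_sim = np.mean(np.all(fresh <= c_sim, axis=1))     # whole curve inside the band
cover_point = np.mean(np.all(fresh <= c_point, axis=1))
```

Because the pointwise band ignores the multiple comparisons across the grid, the probability that the whole curve stays inside it falls well below 95%. The wider simultaneous band keeps it close to the nominal level.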

Kernel equating with mixed-format test forms, February 1
Speaker: Joakim Walmark, Department of Statistics, Umeå University

The purpose of equating is to ensure that test scores from different test forms can be used interchangeably. Test forms that include items of different formats, such as dichotomously and polytomously scored items, are typically referred to as mixed-format tests. In this study, the kernel equating method was evaluated under different scenarios for equating mixed-format tests. In kernel equating, the test score distributions are typically presmoothed to remove irregularities due to sampling before the actual equating is conducted. The use of log-linear and item response theory (IRT) models for presmoothing was compared through simulations and real data applications. Data were simulated both with and without IRT models, to avoid exclusively favouring IRT presmoothing, and both equivalent and non-equivalent groups designs were considered. The simulation results and the real data applications suggest that using IRT models for presmoothing yields smaller equating standard errors than using log-linear models. Additionally, IRT presmoothing resulted in lower bias than log-linear presmoothing when IRT models were used to simulate test data. However, when test data were simulated without IRT models, the bias was lower with log-linear presmoothing. In a practical setting, where bias cannot be computed, IRT presmoothing should be preferred in most situations because of its lower standard errors.
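The continuization and equating steps of kernel equating can be sketched for an equivalent-groups design. The sketch uses Gaussian-kernel continuization that preserves the mean and variance of the discrete score distribution, followed by the equipercentile mapping e_Y(x) = G^{-1}(F(x)). It assumes the score probabilities have already been presmoothed (binomial probabilities stand in for them here) and uses a fixed, hypothetical bandwidth of 0.6 instead of a data-driven choice. The function names are illustrative.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom, norm

def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution with bandwidth h.
    The factor a keeps the mean and variance of the discrete distribution."""
    mu = np.sum(scores * probs)
    var = np.sum((scores - mu) ** 2 * probs)
    a = np.sqrt(var / (var + h ** 2))
    return np.sum(probs * norm.cdf((x - a * scores - (1 - a) * mu) / (a * h)))

def kernel_equate(x, scores_x, r, scores_y, s, h_x=0.6, h_y=0.6):
    """Map score x on form X to the scale of form Y: e_Y(x) = G^{-1}(F(x))."""
    p = continuized_cdf(x, scores_x, r, h_x)
    lo, hi = scores_y.min() - 10.0, scores_y.max() + 10.0   # bracket for the root search
    return brentq(lambda y: continuized_cdf(y, scores_y, s, h_y) - p, lo, hi)

scores = np.arange(21.0)                    # hypothetical 20-point test
r = binom.pmf(scores, 20, 0.55)             # stands in for presmoothed form-X score probabilities
s = binom.pmf(scores, 20, 0.60)             # stands in for presmoothed form-Y score probabilities
equated = np.array([kernel_equate(x, scores, r, scores, s) for x in scores])
```

In the study above, r and s would instead come from log-linear or IRT presmoothing, and the equating standard errors would be derived from the uncertainty in those presmoothed probabilities.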

Valid causal inference: model selection and unobserved confounding in high-dimensional settings, January 25
Speaker: Niloofar Moosavi, Department of Statistics, Umeå University

In recent years, a great deal of work has been done on constructing confidence intervals for average causal effect parameters that are uniformly valid over a set of data-generating processes, even when high-dimensional nuisance models are estimated by post-model-selection or machine learning estimators. These developments assume that all confounders are observed, to ensure point identification. We contribute by showing that valid inference can be obtained in the presence of unobserved confounders and high-dimensional nuisance models. We thus propose uncertainty intervals, which allow for nonzero confounding bias. This bias is specified and estimated, and is a function of the amount of unobserved confounding allowed for. We show that valid inference can ignore the finite-sample bias and randomness in the estimated value of the confounding bias by assuming that the amount of unobserved confounding is small relative to the sample size; the latter is formalized in terms of convergence rates. One interpretation is that more confounders are collected as the sample size grows. Simulation results are presented to illustrate finite-sample properties and to explore a double selection procedure and a correction of the residual variance estimator, which improve performance even for larger correlations.

2021

Prehospital resource optimization, February 2
Speaker: Patrik Rydén, Department of Mathematics and Mathematical Statistics, Umeå University

Prehospital care in Sweden has about 660 ambulances, responds to about 1.2 million emergency calls per year, and costs more than 4 billion SEK per year. An aging population, urbanization and medical progress demand flexible prehospital care. The goal of this project is to develop processes and tools that make it possible to organize ambulance units and operations optimally. Based on big and complex alarm data, advanced statistical modelling and large-scale data-driven simulations, we have developed tools to compare allocations (how the ambulances are placed and scheduled) under user-defined future scenarios. The solution makes it easy to highlight the implications for specific regions and patient groups. The next step is to find the allocation that optimizes a user-defined loss function. Here, an allocation can be thought of as a design point in a high-dimensional space, for which the loss can be estimated using time-consuming simulations and where design points can be selected iteratively. I will give an overview of the project and highlight some interesting problems and results.

162 years of temperatures in Umeå, 1859-2020, February 16
Speaker: Per Arnqvist, Department of Mathematics and Mathematical Statistics, Umeå University

Maximum likelihood estimation in stochastic channel models, February 23
Speaker: Christian Hirsch, University of Groningen

We propose Monte Carlo maximum likelihood estimation as a novel approach in the context of calibration and selection of stochastic channel models. First, considering a Turin channel model with inhomogeneous arrival rate as a prototypical example, we explain how the general statistical methodology is adapted and refined for the specific requirements and challenges of stochastic multipath channel models. Then, we illustrate the advantages and pitfalls of the method on the basis of simulated data. Finally, we apply our calibration method to wideband signal data from indoor channels.
Based on joint work with Ayush Bharti, Troels Pedersen and Rasmus Waagepetersen.

Global envelopes with applications to functional data analysis and general linear model, March 2
Speaker: Mari Myllymäki, Natural Resources Institute Finland (Luke), Helsinki, Finland.

Abstract: Global envelopes are now often used to test null models for spatial processes by means of different summary functions. They provide a formal test and, through graphical interpretation of the test results, suggest alternative models. Global envelopes are, however, a rather general tool that can be applied in various applications. In particular, they can be employed for central regions of functional or multivariate data, for graphical Monte Carlo and permutation tests where the test statistic is multivariate or functional, and for global confidence and prediction bands. In this talk, I describe global envelopes, illustrate the methodology on different applications including the functional general linear model, and show examples of the usage of the R package GET (Myllymäki and Mrkvička, 2020), which implements global envelopes. Further, I discuss the multiple testing correction in global envelope tests for functional test statistics, which are discretized into m highly correlated hypotheses. While global envelopes were first developed to control the family-wise error rate, control of the false discovery rate can also be introduced.

Myllymäki and Mrkvička (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME] https://arxiv.org/abs/1911.06583
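The basic mechanics of a global envelope test can be sketched with a maximum-absolute-deviation (MAD) measure. One global deviation is computed per simulated function, the critical deviation is taken from their upper quantile, and the Monte Carlo p-value counts the observed function among the s + 1 functions. This is a simplified Python illustration with toy random-walk functions. It is not the GET implementation: GET offers several envelope types, including rank-based ones, and its exact construction differs.

```python
import numpy as np

def global_mad_envelope(obs, sims, alpha=0.05):
    """Simplified global envelope based on the maximum absolute deviation.
    obs: (m,) observed summary function on a grid; sims: (s, m) functions simulated under the null."""
    center = sims.mean(axis=0)                          # estimated null expectation of the function
    dev_sims = np.max(np.abs(sims - center), axis=1)    # one global deviation per simulation
    dev_obs = np.max(np.abs(obs - center))
    u = np.quantile(dev_sims, 1 - alpha)                # global critical deviation
    p_value = (1 + np.sum(dev_sims >= dev_obs)) / (len(sims) + 1)
    return center - u, center + u, p_value

rng = np.random.default_rng(11)
sims = rng.standard_normal((999, 100)).cumsum(axis=1) / 10   # toy null functions on a 100-point grid
obs_null = rng.standard_normal(100).cumsum() / 10             # observed function drawn from the null
lo, hi, p_null = global_mad_envelope(obs_null, sims)
_, _, p_alt = global_mad_envelope(obs_null + 10.0, sims)      # grossly shifted observed function
```

Because the band has the same width at every point of the grid, a function leaves it anywhere exactly when its global deviation exceeds the critical value. This is what makes the graphical envelope and the formal test agree.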

Summary statistics for point processes on linear networks, March 30
Speaker: Mohammad Mehdi Moradi, Department of Statistics, Computer Science and Mathematics, Public University of Navarra, Pamplona, Spain.

Abstract: The last decade witnessed an extraordinary increase in scientific interest in the analysis of network-related data. This pervasive interest is partly caused by the greatly expanded availability of such datasets. In the spatial statistics field, there are numerous real examples, such as the locations of traffic accidents or street crimes, in which the support of the underlying process must be restricted to the corresponding network structure to obtain a more realistic setting. However, the analysis of point processes on linear networks has been extremely challenging because of the geometrical complexity of the network. In this talk, we go through summary statistics of different orders, and their estimators, for studying the correlation between events occurring on a linear network. We highlight the importance of the change of support and also discuss the mathematical challenges and the use of different distance metrics. Finally, we demonstrate applications to traffic accidents and criminology.

Stochastic analysis and modelling of eye movements, April 20
Speaker: Aila Särkkä, Professor at the Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Sweden.

Abstract: Eye movements are outcomes of cognitive processes in the human brain, and can be recorded with a high spatial and temporal resolution by computerized eye trackers. Here, the question of interest is how people look at art. The data come from a cognitive art research experiment, where the eye movements of twenty test subjects were recorded while they were looking at six paintings, each painting for three minutes. We will concentrate on studying the eye movements on one of the six paintings, namely Koli landscape by Eero Järnefelt.

Eye movements can be represented as an alternating sequence of fixations (periods in which the gaze is staying relatively still around a location of the target space) and saccades (rapid movements between the fixations). We regard the process of fixations as a spatio-temporal point process and introduce methods to analyse the point pattern data and models for the spatio-temporal eye movement process including fixation locations, fixation durations, and saccade durations and lengths. I will mainly discuss joint work with Anna-Kaisa Ylitalo and Peter Guttorp [1] but will also briefly mention the work by Antti Penttinen and Anna-Kaisa Ylitalo [2].

References:

[1] Ylitalo, A.-K., Särkkä, A., and Guttorp, P. (2016). Stochastic analysis and modeling of eye movements in viewing paintings. Annals of Applied Statistics, 10(2), 549-574.

[2] Penttinen, A., and Ylitalo, A.-K. (2016). Deducing self-interaction in eye movement data using sequential spatial point processes. Spatial Statistics, 17, 1-21.

Train performance analysis using heterogeneous statistical models, June 8
Speaker: Jianfeng Wang, First research engineer, Department of Mathematics and Mathematical Statistics, Umeå University

On the rate of convergence of deep neural network regression estimates, September 20
Speaker: Dr. Sophie Langer, TU Darmstadt, Germany

Abstract: Recent results in nonparametric regression show that deep learning, i.e., neural network estimates with many hidden layers, can circumvent the so-called curse of dimensionality provided that suitable restrictions on the structure of the regression function hold. Under a general composition assumption on the regression function, one key feature of the neural networks used in these results is a further constraint on their architecture, namely network sparsity. In this talk we show that similar results also hold for least squares estimates based on simple fully connected neural networks with ReLU activation functions. Here, one of two regimes applies. Either the number of neurons per hidden layer is fixed and the number of hidden layers tends to infinity suitably fast as the sample size tends to infinity. Or the number of hidden layers is bounded by a logarithmic factor in the sample size and the number of neurons per hidden layer tends to infinity suitably fast as the sample size tends to infinity. In a second result we show that deep neural networks (DNNs) achieve a dimensionality reduction when the regression function has locally low dimensionality. Consequently, the rate of convergence of the estimate does not depend on the input dimension d but on the local dimension d*, and DNNs are able to circumvent the curse of dimensionality when d* is much smaller than d.

Resample-smoothing and statistical learning for point processes, October 28
Speaker: Mehdi Moradi, Public University of Navarre, Spain

Abstract: The analysis of a point pattern almost always begins with estimating the intensity function, because it governs the distributional behaviour of the underlying point process assumed to have generated the observed pattern. The literature contains many intensity estimation techniques based on different points of view. In this talk, by employing independent random thinning, we show how i) a resample-smoothing approach can significantly improve the performance of Voronoi intensity estimators, and ii) a statistical-learning-based approach enhances kernel-based intensity estimators. We discuss technical details and show through simulation studies how our proposals improve on the state of the art. Applications to some real data will also be presented.

References:

Cronie, O., Moradi, M., and Biscio, C. A. (2021). Statistical learning and cross-validation for point processes. arXiv preprint arXiv:2103.01356.

Moradi, M., Cronie, O., Rubak, E., Lachièze-Rey, R., Mateu, J., and Baddeley, A. (2019). Resample-smoothing of Voronoi intensity estimators. Statistics and Computing, 29(5), 995-1010.
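The resample-smoothing idea can be illustrated in one dimension, where Voronoi cells are simply intervals between midpoints. The pattern is thinned independently with retention probability p, the Voronoi estimate of each thinned pattern is rescaled by 1/p, and m such estimates are averaged. This is a toy sketch on an interval with hypothetical p = 0.2 and m = 200. The method in the talk and the reference above concerns planar and network patterns.

```python
import numpy as np

def voronoi_intensity_1d(points, u, window=(0.0, 1.0)):
    """Voronoi intensity estimate at locations u for distinct points on an interval:
    1 / (length of the Voronoi cell containing u)."""
    x = np.sort(np.asarray(points, dtype=float))
    u = np.asarray(u, dtype=float)
    if x.size == 0:
        return np.zeros_like(u)
    edges = np.concatenate(([window[0]], (x[:-1] + x[1:]) / 2, [window[1]]))
    lengths = np.diff(edges)
    cell = np.clip(np.searchsorted(edges, u, side="right") - 1, 0, x.size - 1)
    return 1.0 / lengths[cell]

def resample_smoothed_voronoi_1d(points, u, p=0.2, m=200, rng=None, window=(0.0, 1.0)):
    """Average of m Voronoi estimates, each from an independent p-thinning rescaled by 1/p."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    est = np.zeros(np.shape(u))
    for _ in range(m):
        kept = points[rng.random(points.size) < p]   # keep each point independently with prob. p
        est += voronoi_intensity_1d(kept, u, window) / p
    return est / m

rng = np.random.default_rng(8)
pts = rng.uniform(0, 1, size=rng.poisson(500))      # homogeneous Poisson pattern, intensity 500
grid = np.linspace(0, 1, 20001)
plain = voronoi_intensity_1d(pts, grid)
smooth = resample_smoothed_voronoi_1d(pts, grid, rng=rng)
```

Each Voronoi estimate integrates to the number of points it uses, so the rescaled thinned estimates integrate to n/p on average. Averaging over thinnings mainly reduces the very high variability of the plain estimator at locations inside tiny cells.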

2018

19 September, 13:15, Hörsal E Humanisthuset
Angel G. Angelov, Department of Statistics, Umeå University
Methods for interval-censored data and testing for stochastic dominance

9 October, 13:00 - 14:00, UB334
Giovanni Forchini, School of Business, Economics and Statistics, Umeå University
Ill-Conditioned Problems and Fisher Information

16 October, 13:00 - 14:00, UB333
Raoul Theler, School of Business, Economics and Statistics, Umeå University
On the Evaluation of Endogenous Treatment Effects Correlated with Natural Instruments

13 November, 13:00 - 14:00, UB334
Xuan-Son Vu, Department of Computer Science
Privacy in the world of AI and Big data

20 November, 13:00 - 14:00, UB336
Mohammad Ghorbani, Department of Mathematics and Mathematical Statistics
Statistical analysis of functional marked point processes

27 November, 13:00 - 14:00, UB336
Maria Josefsson, Centre for Demographic and Aging Research, Umeå University
Bayesian semiparametric G-computation for causal inference in a cohort study with non-ignorable dropout and death

4 December, 11:00 - 12:00, UB337
Juha Karvanen, University of Jyväskylä, Finland
Combining experiments and observations in causal inference

19 December, 13:15 - 14:00, UB337
Tetiana Corbach, Department of Statistics, Umeå University
Bayesian mixture modeling of fMRI connectivity in cross-sectional and longitudinal studies

Latest update: 2026-03-02