The missions of the position

Institution: Université Paris-Saclay, GS Informatique et sciences du numérique
Doctoral school: Sciences et Technologies de l'Information et de la Communication
Research laboratory: Laboratoire Interdisciplinaire des Sciences du Numérique
Thesis supervision: Lionel MATHELIN (ORCID 0000-0003-1433-4987)
Thesis start date: 2026-10-01
Application deadline: 2026-05-12, 23:59

"Governing is forecasting." This proverb applies to many situations of engineering interest, where decisions must be made on the basis of predictions, or where designing a suitable sequence of actions to achieve a given goal requires a good knowledge of the effect of those actions on the system under consideration. In this thesis project, we aim to develop a theoretical approach to predict quantities of interest or (approximately) the state of a system. We build on the Mori-Zwanzig framework developed in the statistical physics community in the late 1960s. We will systematically address the challenges involved in building a predictive model for the quantities of interest of a complex system. The way the history of the observables is taken into account is particularly important. A Mori-Zwanzig-based dynamics of the observables will be considered, relying on well-chosen, well-founded memory structures. We will also describe the dynamics of affinities of the observables, rather than of the observables themselves. These methodological developments will first be illustrated on low-dimensional dynamical systems before being applied to large-scale real data from geophysics and/or cardiology.

Describing and predicting the dynamics of complex systems remains a fundamental challenge across many scientific domains. These systems are commonly described by dynamical systems in the form of differential equations.
While this formulation is principled, it assumes that the model is known and tractable. In practice, however, the dynamics are often partially unknown, computationally expensive, or only valid within limited regimes. This limitation has led to the development of data-driven approaches that infer system dynamics directly from observations.
A key difficulty arises from partial observability. In many applications, only a subset of the system variables is accessible, and observations are often noisy, sparse, or irregular. As a result, the system cannot be accurately described as a Markovian process depending solely on the current observation. Instead, its evolution depends on past states, leading naturally to a non-Markovian formulation.
Several modeling strategies explicitly incorporate memory effects. Autoregressive models such as ARMAX [5] regress on past values, while recurrent neural networks (RNNs), including LSTMs [9, 17, 7], introduce latent memory variables. Reservoir computing and echo state networks [8, 11] offer computationally efficient alternatives capable of capturing long-term dependencies [19]. More recent developments include Latent ODEs [16], which combine Neural ODEs with RNN encoders, augmented Neural ODEs [3], and Transformer architectures [18]. Despite their empirical success, these approaches inherently involve a trade-off between expressivity and interpretability, or tend to operate as black boxes. A natural first approach to incorporating non-Markovian effects is to explicitly include past states, leading to delay differential equations (DDEs). Neural State-Dependent Delayed Differential Equations [8] introduced a flexible framework allowing multiple delays that depend on both time and state.
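To make the latent-memory idea behind reservoir computing concrete, here is a minimal echo state network sketch (illustrative code with arbitrary hyperparameters and a toy signal, not an implementation from the cited works): a fixed random recurrent reservoir carries the history of the input, and only a linear readout is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task (illustrative): one-step-ahead prediction of a smooth scalar signal.
t = np.linspace(0, 60, 3000)
u = np.sin(t) * np.cos(0.3 * t)

# Fixed random reservoir; the spectral radius is scaled below 1 so that the
# reservoir state forgets its initial condition (echo state property).
n_res = 200
W_in = rng.uniform(-0.5, 0.5, size=(n_res, 1))
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs):
    """Drive the reservoir and collect its states (the latent memory)."""
    x = np.zeros(n_res)
    states = np.empty((len(inputs), n_res))
    for k, u_k in enumerate(inputs):
        x = np.tanh(W_in[:, 0] * u_k + W @ x)
        states[k] = x
    return states

states = run_reservoir(u[:-1])
washout = 200  # discard the initial transient

# Only the linear readout is trained, by ridge regression.
X, y = states[washout:], u[1 + washout:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)

pred = X @ W_out
print("one-step RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```

The computational appeal is that training reduces to one linear solve; all the non-linearity and memory sit in the untrained reservoir.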
"Governing is forecasting." This proverbial saying is relevant to many situations of engineering interest where decisions must be made based on predictions, or when devising a suitable sequence of actions to achieve some goal requires a good knowledge of the effect of these actions on the system under consideration. Such predictions usually rely on a simulation of a model of the system at hand and/or on observations collected over time. A reliable model may, however, not be available, or may be too computationally costly to be useful. Observations, on the other hand, are often scarce and do not provide a complete picture of the state of the system. In this thesis, we aim at deriving a principled approach to predict the time-evolution of quantities of interest associated with a system observed only via a few noisy sensors active at unpredictable times. To this end, we leverage the history of the information one can collect. This paradigm of predicting the future from whatever knowledge is available over a past horizon is rigorously justified by the Mori-Zwanzig framework developed in the statistical physics community in the late 1960s.
A particular focus will be on developing scalable approaches suited to large-scale systems, such as those encountered in haemodynamics. While the data-driven approaches reviewed above are flexible, they do not explicitly exploit the physical structure of the underlying system. We instead aim at leveraging a theoretically grounded approach to efficiently predict quantities of interest or (an approximation of) the state of a system. We rely on the Mori-Zwanzig framework developed in the statistical physics community in the late 1960s [13, 20]. In a nutshell, it formalizes the time-evolution of a set of variables x(t) related to the system as a function of their history, without requiring knowledge of the other variables describing the system.
Accounting for the past essentially allows one to isolate the dynamics of these observables. This framework is general and widely applicable. For instance, when the whole state of the system is not accessible, the dynamics of the observables can be described with a non-Markovian model via this framework. It similarly provides a principled closure for coarse models, which can be effectively complemented with a history-based term [14, 12, 6].
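To make the history-based closure concrete, consider a hypothetical 2x2 discrete-time linear system in which only the first component is observed (a toy example, all numbers illustrative). Eliminating the unobserved variable yields an exact non-Markovian model for the observed one, the discrete analogue of the Mori-Zwanzig memory term:

```python
import numpy as np

# Toy system x[n+1] = M @ x[n]; only the first component is observed.
M = np.array([[0.9, 0.2],
              [-0.3, 0.7]])
M11, M12, M21, M22 = M[0, 0], M[0, 1], M[1, 0], M[1, 1]

# Full simulation: ground truth for the observed component.
N = 50
x = np.array([1.0, 0.0])  # unobserved part starts at 0, so no "noise" term
x1_full = [x[0]]
for _ in range(N):
    x = M @ x
    x1_full.append(x[0])

# Reduced, non-Markovian model for x1 alone. Substituting the solution of the
# unobserved variable gives exactly
#   x1[n+1] = M11 * x1[n] + sum_{m<n} M12 * M22**(n-1-m) * M21 * x1[m],
# i.e. a Markovian part plus a convolution of the history with a memory kernel.
x1 = [1.0]
for n in range(N):
    memory = sum(M12 * M22 ** (n - 1 - m) * M21 * x1[m] for m in range(n))
    x1.append(M11 * x1[n] + memory)

# The closure is exact here: the two trajectories agree to machine precision.
print(np.max(np.abs(np.array(x1) - np.array(x1_full))))
```

For linear systems this elimination is exact; for non-linear systems the memory kernel is not available in closed form, which is precisely where learned, history-based surrogates come in.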

In this thesis, we will explore the potential of Signatures to efficiently approximate the history of the observations [2, 4, 15]. The Signature transform, introduced in [1, 10], has recently been used in several areas, including rough path theory, finance, stochastic control, and machine learning. It has proven to be an effective tool for summarizing the information of paths and dependencies across different dimensions, with high computational efficiency. The Signature consists of iterated integrals of the history of its inputs and enjoys interpretability. Signatures provide a way to linearize all possible functions of their input and exhibit nice theoretical properties. In particular, owing to their tensor-algebra structure, they can be efficiently updated when new observations become available, without recomputing the whole object.
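A minimal sketch of this incremental-update property (depth-2 truncation over a piecewise-linear path; illustrative code, not a project deliverable): Chen's identity combines the Signature accumulated so far with that of each new segment, so the per-observation update cost does not grow with the history length.

```python
import numpy as np

def seg_sig(delta):
    """Depth-2 signature of a single linear segment with increment `delta`."""
    return delta.copy(), np.outer(delta, delta) / 2.0

def chen(sig_a, sig_b):
    """Chen's identity: depth-2 signature of the concatenated path a * b."""
    s1a, s2a = sig_a
    s1b, s2b = sig_b
    return s1a + s1b, s2a + s2b + np.outer(s1a, s1b)

# Stream the points of a piecewise-linear path in R^2 and update the
# signature online, with no recomputation over the whole history.
rng = np.random.default_rng(1)
points = rng.normal(size=(100, 2))

s1 = np.zeros(2)
s2 = np.zeros((2, 2))
for prev, new in zip(points[:-1], points[1:]):
    s1, s2 = chen((s1, s2), seg_sig(new - prev))

# Level 1 is just the total increment of the path...
assert np.allclose(s1, points[-1] - points[0])
# ...and the shuffle identity pins down the symmetric part of level 2,
# so the genuinely new depth-2 information is the antisymmetric (Lévy area) part.
assert np.allclose(s2 + s2.T, np.outer(s1, s1))
print("Lévy area:", (s2 - s2.T)[0, 1] / 2)
```

In practice one would use a dedicated library (e.g. iisignature) and higher truncation depths, but the algebraic mechanism is the same.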

Many open questions, however, remain and will be the focus of this thesis. In particular: how are the different time scales of the physical system preserved across the Signature of its observations? What properties of the time series should be retained to allow for a reliable and efficient prediction based on Signatures? How large should the truncation order be for a given performance? How frugal can the Signature-based term in the Mori-Zwanzig framework be in terms of training data, a critical point in many situations? Does the Mori-Zwanzig solution have a structure that can be exploited, such as low-rankness, sparsity, or multi-way dependence that can be captured with tensor formats?
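On the truncation-order question, one hard constraint is combinatorial: the depth-K truncated Signature of a d-dimensional path has sum_{k=1..K} d^k coefficients, so memory and data requirements grow geometrically with the order. A two-line check (the dimension d = 5 is purely illustrative):

```python
def sig_size(d, K):
    """Number of coefficients in the depth-K truncated signature in R^d."""
    return sum(d ** k for k in range(1, K + 1))

# Geometric growth of the truncated signature with the depth K:
for K in (2, 4, 6):
    print(f"d=5, depth {K}: {sig_size(5, K)} coefficients")
```

This tension between expressivity (higher depth) and frugality (fewer coefficients to learn) is exactly why structured, e.g. low-rank or sparse, representations of the Signature term are attractive.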
These methodological developments will first be illustrated on low-dimensional dynamical systems before, if time allows, being demonstrated on large-scale real data from geophysics.

The profile sought

The candidate should have a solid background in machine learning, applied mathematics, and/or statistics. Familiarity with a machine-learning framework (e.g. PyTorch, JAX, or Julia) is a plus.
