Job description

Institution: Université Paris-Saclay GS Informatique et sciences du numérique
Doctoral school: Sciences et Technologies de l'Information et de la Communication
Research laboratory: Laboratoire Interdisciplinaire des Sciences du Numérique
Thesis supervisor: Sergio CHIBBARO (ORCID 0000-0002-8203-5711)
Thesis start date: 2026-10-01
Application deadline: 2026-05-12T23:59:59

This project aims to develop new machine learning techniques suited to the modeling and inference of complex systems, taking turbulence as the prototype of a complex, high-dimensional problem described by partial differential equations (PDEs). The goal is to develop new architectures that satisfy physical symmetries, in particular scale invariance, in order to improve current generative models. The hope is to address the multi-scale nature of this kind of problem and to give a reasonable explanation of the extreme events that occur in these complex environments. Accurate, high-quality data from well-resolved numerical simulations of turbulence in different configurations will be used to evaluate the models.

In the last decade, machine learning (ML), and more specifically deep neural networks (DNNs), has thoroughly renewed the research perspectives in many fields, such as Natural Language Processing and Computer Vision. Despite indisputable successes, introducing ML approaches in complex physical systems remains a challenge. Systems exhibiting self-similarity with respect to scale transformations (monofractals) are so far, in our view, not properly handled by ML models. Complex systems, in which multiple fractal components coexist (multifractals), raise even more serious issues and lead to solutions that may lack confidence, acceptability, guarantees and explainability. In addition, DNNs are very good at interpolating data but struggle to extrapolate, whereas extreme events play an important role in many complex systems. For avalanche processes, for instance, one would like to extrapolate inference to scales unseen during training, e.g. train on small earthquakes to also predict larger ones.
Turbulent flows constitute a transversal domain of interest for the project. Understanding their statistical properties is a challenging problem of utmost importance for modeling, predicting, and controlling many applications. The main difficulties arise from two features. The first is the vast range of time and space scales involved, spanning from the largest, governed by the stirring mechanism, to the smallest, the Kolmogorov scales, associated with viscous dissipation. The second is the presence of strong non-Gaussian fluctuations (intermittency), which result in rare but intense velocity and acceleration fluctuations of up to 50-60 standard deviations that can be easily measured even in table-top laboratory flows. Because of these issues, the problem currently remains out of reach from both the theoretical and the modeling perspectives. Markovian and non-Markovian models based on multifractal and/or multiplicative processes have been used previously to reproduce certain observed Lagrangian and Eulerian multi-scale turbulent features. However, we still lack a systematic way to generate synthetic trajectories with the correct multiscale statistics over the full range of dynamics encountered in a real turbulent environment, from the large forcing scales, through the intermittent inertial range, to the coupled regime between inertial and dissipative scales.

We plan to build a new methodology tailored to scale-invariant systems, which differs from the usual ML setup where training data are massively available. Informed modelling is still emerging in ML. Among recent efforts, physics-informed neural networks (PINNs) ensure that the output of a deep neural network predicting the future state of a flow field locally satisfies the PDE (e.g. the Navier-Stokes equations); stability conditions, ordinary symmetries and Galilean invariance have been enforced in various works and also combined with PINNs to forecast geophysical dynamics.
Yet all these techniques struggle with complex multiscale problems where singularities are present. Multi-scale models have been proposed in many ways, either through data augmentation or using U-net architectures with weight-sharing mechanisms. Unfortunately, these brute-force approaches rely on very large datasets and lead to non-interpretable models. This project targets both the generation and the forecasting of complex multiscale dynamics. On the generative side, it aims to develop an ML-based approach capable of generating a surrogate signal of complex nonlinear systems that reproduces all the relevant statistical features. On the forecasting side, it aims to extrapolate predictions to scales unseen during training.
To achieve this, in addition to enforcing physical conservation laws, we will need to introduce learning mechanisms able to share common information patterns between scales and to couple them properly. Schematically, three related and complementary aspects of the problem will be investigated:
(i) introducing scale symmetries on a multi-scale representation of the data, in both the supervised and unsupervised contexts;
(ii) coupling local scale symmetries with attention mechanisms for the multifractal context;
(iii) adapting these multiscaling approaches to the conditional generative setting.
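
To make the monofractal/multifractal distinction concrete, here is a minimal sketch of a deterministic binomial multiplicative cascade, a textbook toy model and not the method proposed in this project. The function names (`binomial_cascade`, `coarse_grain`, `mass_exponent`) and the parameter `m0` are illustrative assumptions. For this toy, the partition function obeys Z_q(k) = (m0^q + (1 - m0)^q)^k, so the mass exponent tau(q) = -log2(m0^q + (1 - m0)^q) is known exactly and can be compared with the fitted value.

```python
import numpy as np

def binomial_cascade(levels, m0=0.7):
    """Masses of the 2**levels cells of a deterministic binomial cascade."""
    mass = np.array([1.0])
    for _ in range(levels):
        # Each cell gives a fraction m0 to its left child, 1 - m0 to its right child.
        mass = np.column_stack((m0 * mass, (1.0 - m0) * mass)).ravel()
    return mass

def coarse_grain(mass, level):
    """Sum fine-grid masses into 2**level equal, contiguous cells."""
    return mass.reshape(2**level, -1).sum(axis=1)

def mass_exponent(mass, q, levels):
    """Fit tau(q) from Z_q(k) = sum_i mu_i(k)**q ~ (2**-k)**tau(q)."""
    ks = np.arange(1, levels + 1)
    log_z = [np.log2(np.sum(coarse_grain(mass, k) ** q)) for k in ks]
    # log2 Z_q(k) = tau(q) * log2(2**-k) = tau(q) * (-k)
    slope, _ = np.polyfit(-ks, log_z, 1)
    return slope

levels = 10
m0 = 0.7  # illustrative choice; m0 = 0.5 gives the monofractal (uniform) case
mass = binomial_cascade(levels, m0)
qs = np.array([0.5, 2.0, 3.0])
tau_fit = np.array([mass_exponent(mass, q, levels) for q in qs])
tau_exact = -np.log2(m0**qs + (1.0 - m0) ** qs)
```

With `m0 = 0.5` the exponent reduces to the linear law tau(q) = q - 1; any other value of `m0` makes tau(q) nonlinear in q, which is the signature of multifractality in this toy.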

We will focus on two types of systems: paradigm models for which many things are known, and applications of direct interest for some of the partners. For paradigm models, the testbeds will be systems with self-similar or nearly self-similar properties (wave turbulence, shell models and exactly solvable avalanche processes) and a tunable level of complexity, so that we can study how the ML models adapt to deviations from strict self-similarity. Both prediction (supervised learning) and data generation (unsupervised learning) will be carried out, checking all relevant known statistics (far into the large-deviation domain) or critical exponents, which will allow precise assessment of the methods developed. The application side of these approaches will then be devoted to the study of fully developed turbulence and geophysical dynamics.
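
As an illustration of what checking known statistics or scaling exponents can look like, the sketch below estimates structure-function exponents zeta_p, defined through S_p(tau) = <|x(t + tau) - x(t)|^p> ~ tau^zeta_p, on a discrete Brownian path, a monofractal signal for which zeta_p = p/2 is known. The helper name `structure_function_exponents`, the random seed, the signal length and the chosen lags are assumptions made for this example, not part of the project description.

```python
import numpy as np

def structure_function_exponents(signal, orders, lags):
    """Estimate zeta_p from S_p(tau) = <|x(t+tau) - x(t)|**p> ~ tau**zeta_p."""
    log_lags = np.log(lags)
    exponents = []
    for p in orders:
        # Empirical p-th order structure function at each lag.
        log_sp = [
            np.log(np.mean(np.abs(signal[lag:] - signal[:-lag]) ** p))
            for lag in lags
        ]
        # The slope of log S_p against log tau is the scaling exponent.
        slope, _ = np.polyfit(log_lags, log_sp, 1)
        exponents.append(slope)
    return np.array(exponents)

rng = np.random.default_rng(0)
# Discrete Brownian motion: cumulative sum of independent Gaussian steps.
brownian = np.cumsum(rng.standard_normal(2**18))

orders = np.array([1, 2, 3, 4])
lags = np.array([1, 2, 4, 8, 16, 32, 64])
zeta = structure_function_exponents(brownian, orders, lags)
zeta_theory = orders / 2  # monofractal prediction for Brownian motion
```

A generated signal can be passed through the same estimator: exponents that grow linearly with p indicate monofractal scaling, while a concave zeta_p indicates intermittency.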

Candidate profile

We are looking for a student with a background in mechanics, physics or computer science. Experience with ML tools and/or complex systems is a plus. The project is interdisciplinary, and the student should show strong motivation for this kind of path.
