Job description

Institution: Université Paris-Saclay GS Economie & Management
Doctoral school: Droit, Economie, Management
Research laboratory: LITEM - Laboratoire en Innovation, Technologies, Economie et Management
Thesis supervisor: Vincent LEFRERE, ORCID 0000000335659692
Thesis start date: 2026-10-01
Application deadline: 2026-06-30T23:59:59

Free and open source software development, a pillar of collective innovation, is being profoundly reshaped by generative AI (genAI). At the same time, firms play a growing role through salaried contributors, "open core" models, and service-oriented activities. This corporate participation brings resources and visibility, but it can also alter community dynamics by raising questions about project governance and direction. This thesis aims to assess how genAI affects the open source ecosystem, focusing on its capacity to recalibrate or disrupt the delicate balance between independent and corporate participation. Existing work documents productivity and quality gains from AI-assisted coding, as well as transformations in the dynamics of online communities, but how these benefits are distributed between firm-linked and independent contributors remains an open question, as do the implications of this distribution for the sustainability of the open source ecosystem. We draw on large-scale GitHub data covering a random sample of one million developers to track how their contributions, interactions, and project characteristics evolve over time.
The objective is to draw a precise picture of how genAI has transformed the activity, collaboration, and code quality of salaried and independent developers, and to derive concrete recommendations for maintainers, platforms, and policymakers to preserve the sustainability of open source ecosystems.

Open source software (OSS) has become a core element of contemporary digital production. It provides essential infrastructure for firms and users, constitutes a significant part of the current software stack, and offers a lens to understand the dynamics of collective innovation (Lerner & Tirole, 2000; Lakhani & von Hippel, 2003). Open source relies on licenses that ensure access to source code, redistribution, and the right to modify and share derivative works. In practice, these features lower coordination and verification costs, create a stock of reusable code, and facilitate the diffusion of knowledge across projects. Duplication is almost costless and improvements can spread quickly. Governance is typically delegated to communities and maintainers, often mediated by platforms such as GitHub. These properties make OSS a central digital common and a key input to innovation.
The structure of the OSS ecosystem has, however, changed profoundly over the past decade. While early narratives emphasized volunteer contributions, recent evidence points to increasing professionalization and firm involvement. Major digital players now sponsor key maintainers, employ dedicated open source teams, and design open core business models that combine public repositories with proprietary extensions and value-added services (Li et al., 2024). This corporate turn brings investment capacity and long-term maintenance, but it also raises concerns about governance and value capture. Increased firm control can concentrate decision rights, redefine project priorities, and alter community norms (Bitzer et al., 2007; Meissonier et al., 2010; Zhang et al., 2024).
In parallel, the rise of large language models (LLMs) and generative AI (genAI) coding assistants has introduced a new shock to software development. Tools such as GitHub Copilot and conversational models like ChatGPT can generate code and documentation from natural language descriptions, suggest fixes for error messages, refactor existing code, and help write tests or translate between programming languages. By lowering the cost of routine tasks and changing how developers search for information, genAI may substitute for some community interactions and become embedded in standard workflows. The emerging empirical literature shows that AI-assisted coding speeds up development and can improve code quality, especially for certain types of tasks (Peng et al., 2023; Hoffmann et al., 2024; Yeverechyahu et al., 2024; Cui et al., 2025). Previous work also suggests that genAI does not affect all contributors in the same way. Some studies emphasize larger gains for more experienced users, who can better integrate AI into existing routines, while others highlight its role in helping less experienced participants compensate for missing skills. Developers using AI tools tend to shift effort toward refinement and integration tasks such as debugging, testing, or writing documentation rather than pure exploration, suggesting that genAI may better complement already formalized practices. Other papers study the impact of genAI on online communities and collaborative platforms, documenting changes in participation and in the quality of content (Burtch et al., 2024; Quinn & Gutt, 2025). From an economic perspective, the corporate turn in OSS and the diffusion of genAI raise a central set of questions about incentives, value capture, and ecosystem dynamics.
In particular, genAI may interact differently with intrinsic motivations and extrinsic rewards, shaping who captures productivity gains and how genAI affects the volume and content of contributions, collaboration patterns, the direction of innovation, and code quality. OSS is a digital common and a key input to the software economy, and its vitality depends on decentralized contributions and on governance and coordination mechanisms that sustain long-term maintenance. As a result, shifts in contributors' incentives and in the organization of work can have broad consequences for innovation and ecosystem sustainability. This thesis therefore seeks to document how genAI reshapes OSS activity and collaboration, with a specific focus on whether firm-linked and independent contributors adjust differently. It also examines whether genAI changes the direction of OSS innovation by tracking which project types and functionalities expand or contract over time, and whether these reallocations translate into changes in robustness and code quality. The core contribution is to treat genAI as a structural change that may shift both production and control in OSS by reallocating effort across projects and altering the distribution of maintenance and review responsibilities. The project sits at the intersection of economics, information systems, and software engineering research on genAI and digital labor, and it contributes by placing OSS ecosystem dynamics and the distinction between corporate and independent developers at the center of the analysis. Overall, the thesis aims to produce a rigorous empirical analysis of the impact of generative AI on the open source ecosystem, distinguishing the dynamics specific to independent contributors from those of firm-linked contributors.
The scientific objectives are organized around three axes: (1) measuring the effects of genAI on productivity and the distribution of work across contributor types; (2) analyzing transformations in collaboration networks and governance structures; (3) assessing the extent to which the diffusion of genAI redirects open source innovation and affects code quality. Ultimately, the thesis aims to produce at least three publishable papers in leading academic journals in economics and information systems, as well as concrete recommendations for practitioners and policymakers.
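Axis (2) concerns collaboration networks. A common starting point, sketched here under stated assumptions rather than as the thesis's method, is to project the repository-contributor relationship onto developers: two developers are linked once for every repository they both contribute to. The input format (a dict from a repository name to its contributor logins) and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_contribution_edges(repo_contributors):
    """Project a repository -> contributors mapping onto a developer network.

    Hypothetical input: {repo_name: [login, ...]}.
    Returns a Counter mapping an unordered developer pair (sorted tuple)
    to the number of repositories the two developers share.
    """
    edges = Counter()
    for logins in repo_contributors.values():
        # Deduplicate and sort so each unordered pair has exactly one key.
        for a, b in combinations(sorted(set(logins)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct collaborators of each developer."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


repos = {
    "r1": ["alice", "bob", "carol"],
    "r2": ["alice", "bob"],
    "r3": ["dave"],  # single-contributor repository: adds no edge
}
edges = co_contribution_edges(repos)
deg = degree(edges)
```

Computing such a network separately for periods before and after the diffusion of genAI tools is one way to compare collaboration structures over time. NetworkX, which is listed among the tools for this position, offers the same projection through `bipartite.weighted_projected_graph`, together with standard network statistics.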
The core dataset combines two complementary sources. The first is a panel dataset constructed from the GitHub REST API. We start from a random sample of one million developer accounts and collect information about all their public repositories. For each repository, we collect the creation date, the date of the last update, the owner type, the main programming languages, license indicators, the number of stars and forks, and the list of collaborators with their respective contribution volumes. We also have data on each developer's social activity on the platform, such as their followers, the accounts they follow, and the stars they have given and received. The second source complements this panel with data from the GitHub Archive (GH Archive) project, which aims to record the public activity of GitHub users over time. Combining the two, we build a monthly task portfolio for each developer and analyze how this portfolio changes following the arrival of generative AI.
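The monthly task portfolio can be illustrated with a minimal sketch. It assumes GH Archive's public event format, where each event is a JSON object with a `type`, an `actor.login`, and an ISO 8601 `created_at`, distributed as hourly gzip-compressed JSON-lines files. The mapping from event types to task categories (`TASK_OF_EVENT`) is purely hypothetical; the actual task classification used in the thesis is not specified here.

```python
import gzip
import json
from collections import Counter, defaultdict

# Hypothetical mapping from GH Archive event types to task categories.
# Event types not listed here are grouped under "other".
TASK_OF_EVENT = {
    "PushEvent": "coding",
    "PullRequestEvent": "coding",
    "PullRequestReviewEvent": "review",
    "PullRequestReviewCommentEvent": "review",
    "IssuesEvent": "issues",
    "IssueCommentEvent": "issues",
}


def iter_events(path):
    """Yield events from one GH Archive hourly file (gzip-compressed JSON lines)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def monthly_portfolios(events):
    """Return {(login, 'YYYY-MM'): {task: share_of_events}}."""
    counts = defaultdict(Counter)
    for ev in events:
        login = ev["actor"]["login"]
        month = ev["created_at"][:7]  # '2023-01-05T10:00:00Z' -> '2023-01'
        task = TASK_OF_EVENT.get(ev["type"], "other")
        counts[(login, month)][task] += 1
    portfolios = {}
    for key, c in counts.items():
        total = sum(c.values())
        portfolios[key] = {task: n / total for task, n in c.items()}
    return portfolios


sample = [
    {"type": "PushEvent", "actor": {"login": "alice"}, "created_at": "2023-01-05T10:00:00Z"},
    {"type": "IssueCommentEvent", "actor": {"login": "alice"}, "created_at": "2023-01-20T08:30:00Z"},
    {"type": "PushEvent", "actor": {"login": "alice"}, "created_at": "2023-01-21T17:45:00Z"},
    {"type": "PullRequestReviewEvent", "actor": {"login": "alice"}, "created_at": "2023-02-02T12:00:00Z"},
    {"type": "WatchEvent", "actor": {"login": "bob"}, "created_at": "2023-01-10T09:15:00Z"},
]
portfolios = monthly_portfolios(sample)
```

Expressing the portfolio as shares rather than raw counts separates the composition of a developer's work from its volume; both can then be followed month by month around the arrival of genAI tools.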

Candidate profile

Quantitative & econometric methods

Causal inference (difference-in-differences, event studies, instrumental variables)
Panel data econometrics
Natural language processing (topic modeling, embeddings)
Machine learning for classification and detection
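To make the difference-in-differences logic concrete, here is a minimal sketch of the canonical 2x2 estimator computed from group means. The split into a "treated" and a "control" group and the outcome values are hypothetical placeholders; which developers or projects would count as treated is not specified in this posting.

```python
from statistics import mean


def did_2x2(rows):
    """Canonical 2x2 difference-in-differences on group means.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    Returns (mean[T, post] - mean[T, pre]) - (mean[C, post] - mean[C, pre]).
    Raises statistics.StatisticsError if any of the four cells is empty.
    """
    cells = {(t, p): [] for t in (True, False) for p in (True, False)}
    for treated, post, y in rows:
        cells[(bool(treated), bool(post))].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Hypothetical monthly outcomes (e.g., contributions per developer).
rows = [
    (True, False, 10), (True, False, 12),   # treated, before
    (True, True, 15), (True, True, 17),     # treated, after
    (False, False, 8), (False, False, 10),  # control, before
    (False, True, 10), (False, True, 12),   # control, after
]
effect = did_2x2(rows)  # (16 - 11) - (11 - 9) = 3
```

The estimate is only interpretable as a causal effect under the parallel-trends assumption. In practice the same quantity is usually estimated as the coefficient on a treated × post interaction in a regression with unit and period fixed effects (for instance with fixest in R), which also provides standard errors; event studies generalize this by estimating one coefficient per period relative to treatment.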

Programming & data engineering

Python (pandas, NumPy, scikit-learn, NetworkX, transformers)
GitHub REST API & data collection pipelines
Git, reproducible workflows (e.g., DVC, Snakemake)
R (econometric modeling: fixest, lfe, did)
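For the GitHub REST API collection described in the data section, a minimal standard-library sketch might look as follows. It uses the public `GET /users/{username}/repos` and `GET /repos/{owner}/{repo}/contributors` endpoints and flattens each repository object into the repository-level fields mentioned there. The function names and the selection of fields are assumptions for illustration; a production pipeline would also need to handle rate limits (unauthenticated requests are heavily limited), retries, and caching.

```python
import json
import urllib.request

API = "https://api.github.com"


def fetch_json(url, token=None):
    """GET a GitHub REST API URL and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def user_repos(username, token=None):
    """Page through GET /users/{username}/repos, 100 repositories per page."""
    page, repos = 1, []
    while True:
        batch = fetch_json(f"{API}/users/{username}/repos?per_page=100&page={page}", token)
        if not batch:  # an empty page means there are no more results
            return repos
        repos.extend(batch)
        page += 1


def repo_row(repo):
    """Flatten one repository object into a panel row (hypothetical field selection)."""
    lic = repo.get("license") or {}  # "license" is null when no license is detected
    return {
        "full_name": repo["full_name"],
        "created_at": repo["created_at"],
        "updated_at": repo["updated_at"],
        "owner_type": repo["owner"]["type"],  # "User" or "Organization"
        "language": repo.get("language"),
        "license": lic.get("spdx_id"),
        "stars": repo.get("stargazers_count", 0),
        "forks": repo.get("forks_count", 0),
    }


def contribution_volumes(contributors):
    """Map login -> contributions from GET /repos/{owner}/{repo}/contributors.

    Anonymous entries carry no "login" and are skipped.
    """
    return {c["login"]: c["contributions"] for c in contributors if "login" in c}
```

Only `repo_row` and `contribution_volumes` are pure transformations; keeping network calls in separate functions makes the parsing step easy to test on saved API responses.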

Disciplinary knowledge

Economics of innovation and digital markets
Open source software ecosystems
Industrial organization
Platform economics

Required skills

  • Python
  • Software development
  • Git
  • REST API