A Large Scale Analysis of Semantic Versioning in NPM

Donald Pinckney, Federico Cassano, Arjun Guha, and Jonathan Bell
Mining Software Repositories (MSR), 2023

The NPM package repository contains over two million packages and serves tens of billions of downloads per week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a “semantic versioning” (‘semver’) scheme to maintain a healthy ecosystem, where bug fixes are reliably delivered to downstream packages as quickly as possible, while breaking changes require manual intervention by downstream package maintainers. In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem. We build a time-traveling dependency resolver for NPM, which allows us to determine precisely which versions of each dependency would have been resolved at different times. We segment our analysis to allow for a direct analysis of security-relevant updates (those that introduce or patch vulnerabilities) in comparison to the rest of the ecosystem. We find that when developers use semver correctly, critical updates such as security patches can flow quite rapidly to downstream dependencies in the majority of cases (90.09%), but this does not always occur, due to developers’ imperfect use of both semver version constraints and semver version number increments. Our findings have implications for developers and researchers alike. We make our infrastructure and dataset publicly available under an open-source license.
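
To illustrate what "time-traveling" resolution means, here is a minimal Python sketch. It is not the authors' resolver: it assumes a single direct dependency, only caret (`^`) constraints on plain `major.minor.patch` versions (no pre-release tags), and a made-up publication history. The names `parse_version`, `satisfies_caret`, `resolve_at`, and `HISTORY` are all hypothetical.

```python
from datetime import datetime
from typing import Optional

Version = tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string (pre-release tags unsupported)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies_caret(version: Version, base: Version) -> bool:
    """Caret range: allow changes that keep the left-most non-zero component."""
    if version < base:
        return False
    if base[0] > 0:
        return version[0] == base[0]      # ^1.2.3 means >=1.2.3 <2.0.0
    if base[1] > 0:
        return version[:2] == base[:2]    # ^0.2.3 means >=0.2.3 <0.3.0
    return version == base                # ^0.0.3 means exactly 0.0.3


def resolve_at(published: dict[str, datetime], constraint: str,
               when: datetime) -> Optional[str]:
    """Highest version matching `constraint` that was already published at `when`."""
    base = parse_version(constraint.lstrip("^"))
    candidates = [
        parse_version(version)
        for version, released in published.items()
        if released <= when and satisfies_caret(parse_version(version), base)
    ]
    if not candidates:
        return None
    return ".".join(str(part) for part in max(candidates))


# Hypothetical publication history for one package (illustrative data only).
HISTORY = {
    "1.2.0": datetime(2020, 1, 1),
    "1.2.1": datetime(2020, 3, 1),
    "1.3.0": datetime(2020, 6, 1),
    "2.0.0": datetime(2020, 9, 1),
}

print(resolve_at(HISTORY, "^1.2.0", datetime(2020, 2, 1)))  # 1.2.0
print(resolve_at(HISTORY, "^1.2.0", datetime(2020, 7, 1)))  # 1.3.0
print(resolve_at(HISTORY, "^1.2.0", datetime(2021, 1, 1)))  # 1.3.0
```

In this sketch, the downstream package picks up `1.2.1` and `1.3.0` automatically as they are published, while the breaking `2.0.0` release is never selected under `^1.2.0`, which mirrors the intent of semver described above.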

PDF available on arXiv

@inproceedings{pinckney:npm-mining,
  title = {A {{Large Scale Analysis}} of {{Semantic Versioning}} in {{NPM}}},
  booktitle = {Mining {{Software Repositories}} ({{MSR}})},
  author = {Pinckney, Donald and Cassano, Federico and Guha, Arjun and Bell, Jonathan},
  year = {2023}
}