
Npm Slop & Wonky Software Supply Chains

Simon Ramstedt, · last updated

The problem with the two most widely used open-source package ecosystems, npmjs.com and pypi.org, is that they are not actually source-based. Instead, they are based on unreproducible developer-uploaded bundles and binaries. In general, published packages can't be reproduced from their original source, and for packages without provenance attestation you can't even know whether the package was built from the linked source repo in the first place. This has led to supply chain incidents on both registries (e.g. 1, 2).1

Npm offers no reliable path to install from source. Even packages with no dependencies often require patches or other tricks to be usable without going through npmjs.com.2 This isn't easy to fix either. Many popular npm packages pull in dozens or even hundreds of transitive dependencies, which makes it impractical to switch to better dependency handling or vendoring. I think of these as npm slop.

Pip, unlike npm, does offer an option to build the entire dependency graph from source, but it requires every dependency in the graph to publish a source distribution to pypi.org. Many packages containing native binaries (e.g. PyTorch or JAX) don't do that.3
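As a rough sketch of what "publishing source" means in practice, the snippet below sorts a release's uploaded files into source distributions and pre-built wheels by file extension. The function names and the example file names are hypothetical, and real tooling would query the registry's file listing rather than a hard-coded list.

```python
def split_distributions(filenames):
    """Separate source distributions from pre-built wheels by file extension."""
    # Source distributions are normally .tar.gz (older releases sometimes use .zip).
    sdists = [f for f in filenames if f.endswith((".tar.gz", ".zip"))]
    # Wheels are pre-built artefacts and always end in .whl.
    wheels = [f for f in filenames if f.endswith(".whl")]
    return sdists, wheels


def can_build_from_source(filenames):
    """A release can only be built from source if it ships at least one sdist."""
    sdists, _ = split_distributions(filenames)
    return bool(sdists)


# Hypothetical release file lists, for illustration only.
pure_python_release = [
    "examplepkg-1.0.tar.gz",
    "examplepkg-1.0-py3-none-any.whl",
]
binary_only_release = [
    "nativepkg-2.0-cp312-cp312-manylinux_2_28_x86_64.whl",
    "nativepkg-2.0-cp312-cp312-win_amd64.whl",
]

print(can_build_from_source(pure_python_release))
print(can_build_from_source(binary_only_release))
```

A single wheel-only package anywhere in the dependency graph is enough to make a source-only install fail.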

Both npmjs.com and pypi.org have adopted attestation (see SLSA and PEP 740). Attestation certifies that a package was built by some trusted provider from a specific source commit. Attestation is a real improvement, but it doesn't make it easier to rebuild from source. Also, the attestation protocol itself has gaps: the source commit and the workflow file (e.g. publish.yml) are pinned, but the runner image and anything the workflow downloads at build time are not.

Any supply chain where the source of its dependencies isn't pinned via hash is wonky. While npm and pip do pin dependencies via lock files, the hashes in those lock files cover the built artefacts and not the sources.
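To make the difference between hashing an artefact and hashing its source concrete, here is a toy model in which the "build" just packs one source file into a zip archive with a build timestamp. It is only an illustration under that assumption, not how npm tarballs or Python wheels are actually produced, and the function names are made up for this sketch.

```python
import hashlib
import io
import zipfile


def build_archive(source: bytes, timestamp: tuple) -> bytes:
    """Pack `source` into an in-memory zip, stamped with a build time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("pkg/__init__.py", date_time=timestamp)
        zf.writestr(info, source)
    return buf.getvalue()


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest, the kind of hash a lock file records."""
    return hashlib.sha256(data).hexdigest()


source = b"def hello():\n    return 'hi'\n"

# Same source, two builds on different days.
build_a = build_archive(source, (2024, 1, 1, 0, 0, 0))
build_b = build_archive(source, (2024, 1, 2, 0, 0, 0))

print("source :", sha256_hex(source))
print("build a:", sha256_hex(build_a))
print("build b:", sha256_hex(build_b))
```

The source hash is identical for both builds, while the two artefact hashes differ. A lock file that pins either artefact hash says which bytes were downloaded, but nothing about which source produced them.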

There are many ways to do better here, for example by using git submodules for dependencies, or by using custom workflows like the one in Versatile Npm-Free Web Stack. With pip, unlike with npm, simply specifying git repos pinned to commit hashes as dependencies works well for pure Python packages. For the really ambitious, systems like Nix and Guix let you go even further today and source-pin entire build environments and native runtime dependencies.
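For the pip case, a commit-pinned git dependency can be written as a direct reference of the form `name @ git+<repo-url>@<commit>`. The helper below is a minimal sketch that builds such a line and rejects anything that is not a full commit hash, since branch names and tags can move. The helper name, package name, repo URL and commit are all hypothetical, and the 40-character check assumes a SHA-1 git repository.

```python
import re

# Full SHA-1 commit hash: 40 lowercase hex characters (assumption: SHA-1 repo).
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def git_pinned_requirement(name: str, repo_url: str, commit: str) -> str:
    """Return a requirement line pinned to an exact commit of a git repo."""
    if not FULL_SHA.fullmatch(commit):
        # Branches, tags and abbreviated hashes are not stable pins.
        raise ValueError("expected a full 40-character lowercase hex commit hash")
    return f"{name} @ git+{repo_url}@{commit}"


# Hypothetical package, repo URL and commit, for illustration only.
line = git_pinned_requirement(
    "examplepkg",
    "https://example.com/someone/examplepkg.git",
    "0123456789abcdef0123456789abcdef01234567",
)
print(line)
```

The resulting line can go into a requirements file or a project's dependency list, so the pin refers to the source itself rather than to an uploaded artefact.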

In practice, source pinning will bottom out somewhere. Sometimes you have to rely on some closed CUDA library, a macOS tool, or a compiler that is hard to bootstrap.4 But in so many other cases we give this ability up even though the sources are available, robbing ourselves of flexibility and control in our software supply chains.

  1. Attestation does not prevent all supply chain incidents, but it would force the attack to happen at the source level, i.e. a malicious commit in the actual repo. Source-level attacks are more amenable to automatic detection, e.g. by AI code review.
  2. While npm supports specifying git sources as dependencies, most packages either can't be installed that way or require a large variety of native build tools to be installed. Versatile Npm-Free Web Stack presents an elegant workaround.
  3. pip install --no-binary :all: <pkg> forces pip to use source distributions instead of pre-built wheels.
  4. Amazingly, people have achieved full-source bootstrapped builds that require almost nothing pre-built: essentially an x86 processor plus a tiny, human-auditable binary seed. Guix completed a full-source bootstrap in 2023, and Nix seems to be getting there too.