NPM: Cutting Onions with a Chainsaw

After a few years of using NPM day-to-day, I feel entitled to finally write a blog post outlining why NPM is an immensely powerful tool that is, unfortunately, not designed to handle dependency management well in large projects.

NPM, the Node package manager, is both the package registry for all things Node and the name of the accompanying command-line utility used to manage Node-based projects locally. In combination they make it very easy to create Node modules, publish them and add dependencies to them. That’s awesome, and it’s really a very straightforward and simple thing to do.

This is because the ecosystem, and consequently the design of NPM, places a very high value on a low barrier to publishing a package and on the ease of adding packages to an existing project. It also means that there’s no vetting of package quality, and the community has widely endorsed and accepted the practice of depending on very small packages (often just a few lines of code) without best practices regarding their maintenance, quality or reliability. The number of downloads, the software world’s equivalent of “who screams the loudest”, is seen as a sign of quality rather than a mere measure of distribution. I can’t think of any other ecosystem that would simply accept the existence of an is-even package, which consists of only one exported function, which in turn depends on the is-odd* package: it returns the negated value of the is-odd module’s only exported function. I’m not kidding you, this package exists and has had a solid 88k downloads in the last week. Also, it hasn’t been updated since 2014, and chances are pretty good that you have a copy of it on your local hard drive.
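For scale, here is roughly what the combined functionality of is-even and is-odd boils down to. This is a hypothetical reimplementation for illustration, not the packages’ actual source:

```javascript
// A sketch of the entire functionality: is-odd is one expression,
// and is-even simply negates it.
const isOdd = n => Math.abs(n % 2) === 1;
const isEven = n => !isOdd(n);

console.log(isEven(42)); // true
console.log(isEven(7));  // false
```

Two lines of code, published as two packages, downloaded millions of times.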

Now if this were about pointing out the brokenness of a package manager with a single simple example, I’d be done, but I actually want to take you along on a walkthrough highlighting a number of reasons why NPM is not up to the task. It may have all the functionality you’d expect from a package manager, but using it is like cutting onions with a chainsaw: it’s cumbersome, you’ll likely get it wrong, and there’s just too much potential for harming yourself to make it worth it.

Normalising the Edge Case

In the world of dependency management there are two approaches to handling transitive dependencies – that is, the dependencies of your dependencies.

The first is a white-box approach: the dependencies of your dependencies are, to some extent, your problem, and you’re responsible for making sure that for each package your project depends on (directly or indirectly), there’s one final version number that is compatible with all the modules you have. That actually makes a ton of sense, since otherwise you’d end up with multiple versions of the same package in the same dependency tree. Or, as NPM puts it: normality.

The other approach is the one NPM chooses, and it comes down to this: you don’t have to care about transitive dependencies at all. NPM will automatically, if required, install multiple versions of the same dependency to satisfy each module’s requirements. This is a bad thing. Why?

First, it can (and sometimes does) lead to incompatibilities at runtime, since there’s more than one version of the same dependency around. This makes for fun-to-debug scenarios in which multiple major versions of the same dependency are loaded at once.
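A minimal sketch of the kind of bug this produces: when two copies of the “same” module end up in nested node_modules folders, each copy defines its own class object, and instanceof checks fail across the boundary. The module and class names here are hypothetical:

```javascript
// Simulate what npm's nested node_modules can produce: two separate
// copies of the "same" module, each with its own class definition.
function loadCopy() {
  class Point {
    constructor(x, y) { this.x = x; this.y = y; }
  }
  return { Point };
}

const copyA = loadCopy(); // e.g. node_modules/geo
const copyB = loadCopy(); // e.g. node_modules/renderer/node_modules/geo

const p = new copyA.Point(1, 2);
console.log(p instanceof copyA.Point); // true
console.log(p instanceof copyB.Point); // false -- same code, different identity
```

An object created by one copy silently fails type checks in the other, even though both copies contain identical code.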

Secondly, it endorses using and integrating packages that are not kept up to date. If you maintain an up-to-date project in the Ruby world that depends on, made-up example, Redis 5.0 and try to add a dependency that requires Redis 4.0, you’ll get an error – and maybe a hunch that the dependency you’re trying to add is not exactly bleeding edge. That’s an important data point, independent of download numbers, about the maintenance state of a module, and it might make you switch to something more current, which is overall beneficial to the quality of the software you’ll eventually ship.

Thirdly, it makes it very easy to indirectly integrate modules that are lacking important security patches or updates. Notably, npm now ships an audit command that checks all the packages in your tree against known vulnerabilities – but that fixes the symptom, not the cause.

What would NPM ideally be doing? I think allowing more than one version of a module in a dependency tree has some edge cases where it makes sense, but NPM should highlight, on every npm install, that not all modules could be resolved to a single version, and list the modules for which multiple versions were installed. That makes this undesired behaviour visible, and it also makes it more likely to get resolved.

The weight of a package

That’s not the only problem with the black-box approach to transitive dependencies (they really are the root of all evil). The second problem is the invisible weight of a package. Let’s look at jest, a popular test framework in the Node world. When adding it with npm install --save jest or yarn add jest, you’re actually going to add around 34.9 megabytes of stuff to your node_modules. That’s right, 34.9 MB. That’s not a lot in absolute terms, but it’s just for a test framework. Want to use AWS? Fine, drop in around 45 MB of dependencies. And the list goes on. The general issue is this: well-maintained dependencies try to keep their footprint low, and succeed in doing so (hello, express!), but there’s no easy way for you as a developer to find out which dependencies those are. There’s an excellent tool called Package Phobia that gives you the stats you need to make informed decisions on the matter, but it’s not integrated.

The idea of showing the weight of a package isn’t exactly brand new. If you’ve installed anything on a Debian system lately, you’ll be well familiar with apt-get prompting you, before doing anything, with the install size and every dependency by name. This is what npm should do, but doesn’t. It’s not about making things impossible, it’s about guiding users towards the best possible decision by surfacing relevant information in the process – and NPM just doesn’t do it.

Now, one could say: a few hundred MB in your node_modules directory is no big deal in times of fast internet and abundant disk space. Sure, and I’d be inclined to agree, if it weren’t for the brute-force way NPM does local caching:

Absence of by-default Local Caching

Now, I don’t want to claim that Maven or RubyGems or CocoaPods are superior to NPM in every dimension, but when it comes to local caching, they unfortunately are.

What do I mean by local caching? If your project depends on version 2.0.0 of module A, NPM will download it and put it in the project’s node_modules folder – all good. It will, however, not keep a copy somewhere else locally, so that the next time you need it, it could be copied or loaded directly from that location. That behaviour can be turned on, but it’s rarely used, for the simple reason that it’s not the default. NPM would go easier on resources (and a clean npm install would be faster) with it, but for some reason it doesn’t.

Also, for packages that require native extensions to be built, a local cache would drastically decrease the time it takes to reinstall a dependency, since currently all built artefacts are stored only in node_modules. Of course there’s a tool that does just that, node-gyp-cache, but once again it’s not a default; it’s something people need to install on top to make it work. And that is, by definition, the opposite of a sensible default.

No Standard Library

The next pain point stems from the fact that Node comes without a comprehensive standard library containing the key functionality needed to work around the shortcomings of JS as a language. This is not really NPM’s fault – the language is clearly a tad older – but it turns into a problem that NPM amplifies: that of tiny dependencies.

In most ecosystems there’s a certain baseline for what is considered worthy of being published as a module. Given the ease of publishing packages, the absence of quality control and the fetish for depending on half the internet in Node projects, tiny modules are a common sight and contribute to fantastically deep dependency trees. This increases the time it takes to resolve and install all dependencies, and it makes it impossible to clearly understand where all the included packages come from. This has already led to quite a few incidents in the past, and it’ll probably happen again. Honourable mention: yarn comes with a handy command, yarn why, that explains why a package is included.

Back to the original point though: if Node were bundled with a comprehensive, well-maintained standard library covering the functionality found in the most commonly used Node modules (is-even, hi there!), the average number of dependencies would certainly come down, and it would encourage the move to bigger-than-a-function modules. Something like lodash would certainly be a good start.

Local Dependencies are a Pain

Now, this point might or might not be relevant to your workflow; for me it unfortunately is. Say you have a repository that contains not one but two Node modules, and one of them depends on the other. Wouldn’t it be fantastic to be able to say: yeah, I simply want to use the local version of that module? OK, to be transparent: you can. By changing the entry in the package.json from "moduleA": "1.0.0" to "moduleA": "file:../moduleA", it will use the local module.
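Concretely, the dependencies section ends up looking like this (module name taken from the example above, everything else illustrative):

```json
{
  "name": "moduleB",
  "version": "1.0.0",
  "dependencies": {
    "moduleA": "file:../moduleA"
  }
}
```

The `file:` protocol accepts a path relative to the package.json that declares it.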

But, NPM being NPM, it does so in the least intuitive and most time- and space-consuming manner possible: it actually copies the contents of said folder into the node_modules folder of the including module when running npm install. There’s probably a lengthy discussion somewhere on the internet about why that is the absolute right thing to do, but it just doesn’t feel like the right way.

To ease the pain created by this unfortunate default, npm comes with npm link and npm unlink to establish symbolic links between packages. Once again: fixing the symptom, not the root cause.

Also, on every npm install, anything linked in via npm link will of course be blown away again – but I bet there’s more than sane reasoning behind that behaviour, too.

Of course I’m fully aware of the limitations of module loading in the node world and that it probably contributes to this issue – but it’s something that anyhow needs to be addressed.

Oh my this turned into a rant.

I’m a firm believer that the first step towards effective change is admitting or recognising that something is indeed broken – and not just lightly, but fundamentally. I think the reasoning behind every single decision about why NPM behaves the way it does is probably sound. I also think that the overall result is a mess, and I hope the points above give some food for thought. It’s not OK that you can easily end up with a node_modules directory bigger than an Ubuntu live CD. That’s not how things should be.

NPM should, in my opinion, become more opinionated, work with sensible defaults and take a more guided approach to managing projects and dependencies. This should include adding helpful information to all add/install operations, local caching by default and making it easier to work in a local development flow.

This would, at least for me, make working in this ecosystem a bit more enjoyable.

*To add to the drama, is-odd depends on is-number, and unfortunately I’m not making this up.