Managing Node.js Dependencies and Deployments at Yammer
The node community has seen a lot of discussion recently about managing dependencies with npm. In particular, people are debating the merits of using npm to deploy production applications. Mikeal Rogers’s post about checking module dependencies into version control sparked much of the discussion. Everyone has opinions, and certainly everyone’s deployment concerns are different, so I thought it might help folks to share some of Yammer’s experience.
There are several problems you may need to solve when dealing with dependencies in a non-trivial node app, like handling multiple versions of the same module installed at once, or deciding what to do with native modules that need to build a binary component. I’d like to talk about some real problems we hit that led us to check all of the modules for our apps into the project repo.
npm is a great tool, and as a long-time member of the node community, I know it’s been an uphill battle to get it to mostly meet everyone’s needs. But it’s still just a tool, and not appropriate for every job. We still use npm extensively during development, but we found it less appropriate in a deployment plan. More specifically, we found that depending on the public npm repository server was problematic. If npm is for managing dependencies, then by the time you’re ready to deploy, most of those concerns should be over. Things should be pretty well set in stone. So why depend on a remote npm server and a fluid deployment? Here’s how we arrived at that conclusion.
We started down the “recommended” route with a package.json for our project that listed all of our dependencies. The top-level dependencies were pegged to specific versions so they didn’t update out from under us. This worked well. But the second-level dependencies still used version ranges, and we had several instances where they were updated without it being at all obvious. For instance, express has a dependency on connect, but connect can be anything from version 1.5 up to (but not including) 2.0. That’s a big potential gap. So when you run npm install one day you may get 1.7.1, and the next week you may get 1.7.5. It can even happen between the time you cut your release and the time you actually deploy, because npm always fetches the latest matching version from the server. We had to mitigate this issue.
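To make the gap concrete, here is an illustrative sketch; the exact version numbers are hypothetical. Your app’s package.json can pin express exactly:

```json
{
  "dependencies": {
    "express": "2.5.8"
  }
}
```

But express’s own package.json still declares a range for connect, something like:

```json
{
  "dependencies": {
    "connect": ">= 1.5.0 < 2.0.0"
  }
}
```

So npm is free to resolve connect to any version in that range at install time, no matter how carefully you pinned the top level.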
The next concern was that we had modules in various states. Some were taken straight from npm. Some were forked and modified by us. Some were developed internally, and we didn’t want those on the public npm repo at all. Our first attempt at getting more control over this situation was maintaining our own internal npm repository. It would replicate with the public repo to stay up to date, and it could also host our internal modules. This seemed like a good idea, and it solved a few problems: we could now keep our Yammer-specific modules in our own repo, available only to our deployments. But it also caused some other headaches. Here’s a brief rundown:
- We still had instances where second-level modules changed with an npm install. It didn’t always break the build, but that’s not something you can take on faith. Not everyone has the same idea about what constitutes a patch release, so we ended up combing through the updates, trying to determine whether there were any breaking changes.
- For the modules we forked, we had to figure out how to keep our version alongside the original. We could rename our fork, but then we’d have to change every other module that depended on the original name so it would pick up our version. Now you’ve got even more forks.
- The replication between our repo and the public repo was really slow and often failed. I believe this was a CouchDB problem, but it still wasn’t something we wanted to deal with. Replication is key, because a full update from scratch means moving 4+ GB.
- There was also some confusion every time a new person set up the project. We had to point them at our internal repo and make sure they understood how to manage our odd set of dependencies, rather than just using the tried-and-true npm workflow.
- Deployments involved a lot of activity, which is not necessarily a good thing. npm had to pull down several packages, and connecting to our npm repo was slow (it was hosted on EC2). It started to feel silly to make dozens of network calls on every server for every deployment.
- Finally, we had to configure our replication so it didn’t replicate our internal modules back out to the public repo. Simple enough if you know what you’re doing, but still a source of unnecessary worry.
So after some discussion, we started checking in all of our modules, and every one of these problems went away. Now when someone wanted to install the app, they just did a clone, ran npm rebuild (more on this in a second), changed some settings, and they were off and running. We were fully aware of any module updates because they showed up in git diffs. And deployments were much easier.
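The new-developer setup boiled down to a few commands. This is only a sketch, not our actual setup: the repo URL, config step, and entry point below are all placeholders.

```shell
git clone git@git.example.com:yammer/some-app.git  # node_modules comes down with the clone
cd some-app
npm rebuild                          # compile native addons for this machine
cp config/sample.json config/local.json            # hypothetical settings step
node server.js
```

No registry connection is needed at any point, internal or public.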
There are still some things to consider with this approach, like native modules: those partially written in C/C++ that need to be compiled. We do not check in build directories or binaries. This is a simple entry in our .gitignore file (which is also checked in):
#.gitignore
bin
scripts
node_modules/**/out
node_modules/**/build
Ideally, native modules wouldn’t need to be rebuilt if they haven’t changed. But in practice, it’s easier to just rebuild them on every deploy. For most native modules we’ve come across this only takes a few seconds, and it still takes less time than pulling everything from npm each time. The “npm rebuild” command is very handy for this, and perhaps not something everyone is aware of: it walks all of your dependencies and runs the build scripts specified in each module’s package.json file. We run this command after the checkout as part of our automated deploy.
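That deploy step can be sketched roughly like this; the path and tag variable are illustrative, not our real layout:

```shell
cd /srv/app                      # hypothetical app directory on each server
git fetch origin
git checkout "$RELEASE_TAG"      # the release being deployed
npm rebuild                      # re-run each dependency's build scripts
```

Nothing here touches the network except the git fetch against our own server.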
You also need to add a few steps to your development process. Most things you can manage as normal: npm install modules, update your package.json, and so on. But now when you go to check in changes, you’ll see any updates that happened in your node_modules folder. You’ll get clear diffs of what changed and can decide whether it’s an upgrade you’re ready for. When you are, “git add” those modules and check them in. As an added maintenance step, if you’re maintaining a separate repo for any forked or internal modules, you may need to update those as well. This seems onerous at first, but we’ve found that only a few members of the team end up maintaining modules. For most devs, the workflow is unchanged.
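Here is a minimal, self-contained demonstration of that review step, using a throwaway repo and a fake vendored module (the module name and versions are made up for illustration):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email dev@example.com
git config user.name dev
# Vendor a fake sub-dependency, as if node_modules were checked in:
mkdir -p node_modules/connect
echo '{ "name": "connect", "version": "1.7.1" }' > node_modules/connect/package.json
git add -A && git commit -qm "vendor connect@1.7.1"
# Later, an npm install resolves the range to a newer patch release:
echo '{ "name": "connect", "version": "1.7.5" }' > node_modules/connect/package.json
git --no-pager diff -- node_modules   # the bump is visible before you commit it
git add -A && git commit -qm "upgrade connect to 1.7.5"
```

The point is simply that a sub-dependency bump becomes an ordinary diff you review and commit, instead of something npm does silently at deploy time.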
I hope I’ve painted a clear picture of an alternate option for managing node dependencies. But if checking in dependencies is still a concern, and you don’t want to rebuild native modules on every server in your cluster, a second option is to use a deployment server. This server matches the hardware of your production systems; it might even be one of your backup machines, if you have those. Your deployment process is still simplified: check out the release, npm install, and then test it out. Then you can run routines that tar the whole thing up and push it to the rest of the cluster. A deployment server can also help you avoid other potential snafus caused by configuration differences on local machines.
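That flow might be sketched like this; the hostnames, paths, and release tag are all hypothetical:

```shell
# On the deployment server (same hardware profile as production):
git clone -b v1.2.3 git@git.example.com:yammer/some-app.git some-app-v1.2.3
cd some-app-v1.2.3 && npm install     # fetch and build everything once, here
npm test                              # smoke-test the exact tree you will ship
# Package the fully built tree and push it to the cluster:
cd .. && tar czf some-app-v1.2.3.tar.gz some-app-v1.2.3
for host in web1 web2 web3; do
  scp some-app-v1.2.3.tar.gz "$host":/srv/releases/
done
```

Only the deployment server ever talks to npm; the production machines just unpack a tarball.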
Node is still a young technology, so “best practices” is a vague term that is still taking shape as we speak. Yammer engineering is constantly looking to improve and refine our procedures, so our time is spent on innovation rather than headaches. I hope some of the information here is helpful to those developing deployment plans for node applications. And of course, if you like what you’ve read, you should come work with us.