Andy Berry

Musings of a Software Engineer

April 18th, 2016 | programming

Are your dependencies as safe as your source code?


[Image: Disaster recovery]

For any company that writes software of any kind, the source code is one of the most valuable assets it owns. Without the source code, there is no software.

With the explosion of software as a service (SaaS) models, this is particularly important: without the source code there is no business model and no company. Given that importance, the source code certainly needs protecting, and there is a lot to protect it from: accidental data loss, such as hard drives breaking and servers failing; accidental data corruption, such as a force push to master deleting previous commits; malicious damage from internal sources, such as disgruntled employees deleting things; and malicious damage from external sources, such as hackers breaking in.

At this point in the evolution of software development, we can assume (I hope) that some kind of source control system is in place to keep track of the change history and manage multiple developers working on the same codebase. This is most likely Git. Git's distributed nature gives an added layer of protection and redundancy just by using it, since every full clone carries the complete history. The Git repository containing the source code is also (I hope!) automatically backed up on a fairly regular basis, so in the event of any issues it's relatively easy to recover. This process really should be part of the standard backup and disaster recovery plan, alongside things like server and database backups.

Great, so we've got ourselves some source code that's protected from data loss and corruption. We've also got a mechanism to recover from data loss if we need to. If you haven't, you should probably stop reading here and do a manual backup now, just in case.


[Image: Disaster recovery]

Now, having some source code isn't usually the whole story. With the open source movement taking off, there's plenty of source code and libraries that have already been written that we all use to save having to write more code than we need to - maintaining code is expensive, so the less code you have to maintain yourself the better. If you're doing it right, these libraries won't be in the source control system but will be downloaded by a build system as the code is compiled. So we now have external dependencies for our project or software.

So what happens if someone, either accidentally or maliciously, manages to corrupt the source code repository? Well, we have a backup we can restore from, and each machine that needs to build the software can then carry on downloading any dependencies it needs, so we're back in business. Now, what if the service we're downloading these dependencies from shuts down, loses data, or someone deletes a library they had previously provided? Then what?

That'll never happen, you say? Well, it's exactly what happened recently when one NPM package maintainer decided (rightly or wrongly) to un-publish all of his modules (source code) from NPM. This by itself wouldn't have caused a problem if it weren't for the thousands of projects that used, either directly or indirectly, one of these now un-published modules, such as left-pad. Worse still, some of these packages had millions of users in turn depending on them. Overnight, things like Babel, a popular JavaScript compiler, suddenly broke when people attempted to download them. What followed was a minor panic across the Internet due to '#leftpadgate' (yes, it's now a 'gate'). So was this just a one-off?

Fairly soon after 'leftpadgate', GitHub, THE place to host Git repositories, itself had a bug where issues (bug reports) lost their description when labels were added - not a problem with hosting code, I know, but it demonstrates that even huge companies like GitHub can lose data.

Now, things returned to normal fairly quickly after these two incidents, but they demonstrate that data can be lost or deleted by anyone, no matter how large and established the company or ecosystem. Companies that depend on writing and/or deploying software to exist need to be ready to get back up and running again should any data be lost. What's more, unless you're a huge corporation paying thousands to use a service, or you're able to kick up a massive storm, the chances are you're not going to get very far by complaining if you lose, or are denied access to, things you depend on.

So neither our dependencies nor our source code are safe. It's up to us to make sure they're protected and that data recovery is possible. The risk that you'll need to recover third-party dependencies is slim, but if your business depends on them that heavily then surely you'll be backing them up too. Right..?


So what's the solution? Well, it depends on the type of project you have and what you rely on to be available, but keep it simple. Something like a nightly backup job which does the usual database backup along with a zip of your dependencies and an archive of your source code repository will generally be adequate. Then put these in a dark (and cheap) place like Amazon Glacier and forget about them. You'll hopefully never need them but, in the event you do, you'll be able to restore.
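
As a rough illustration of the "zip of your dependencies and an archive of your source code repository" step, here is a minimal sketch in Python. It is not a definitive backup tool: the function names, the `.git` and `node_modules` locations and the dated file naming are all assumptions made for the example, and the database backup and the upload to cold storage are left out.

```python
from __future__ import annotations

import shutil
from datetime import date
from pathlib import Path


def archive_directory(source_dir: Path, backup_dir: Path, label: str, day: date) -> Path:
    """Zip source_dir into backup_dir as '<label>-YYYY-MM-DD.zip' and return its path."""
    if not source_dir.is_dir():
        raise FileNotFoundError(f"nothing to back up at {source_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    base_name = backup_dir / f"{label}-{day.isoformat()}"
    # make_archive appends the '.zip' suffix itself and returns the final file name.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(source_dir))
    return Path(archive)


def nightly_backup(project_dir: Path, backup_dir: Path, day: date) -> list[Path]:
    """Archive the repository data and the downloaded dependencies, where present."""
    targets = {
        # Assumption: the project is a Git checkout with its history in .git
        "source": project_dir / ".git",
        # Assumption: an npm-style project; use your own dependency folder here
        "dependencies": project_dir / "node_modules",
    }
    return [
        archive_directory(path, backup_dir, label, day)
        for label, path in targets.items()
        if path.is_dir()
    ]
```

A scheduled job could call `nightly_backup(Path("/path/to/project"), Path("/path/to/backups"), date.today())` (both paths are placeholders) and then ship the resulting files off-site. Zipping `.git` while something is writing to the repository may capture an inconsistent state, so a quiet time of day, or a Git-native export, is probably the safer choice.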

[Image: New disaster plan]

Credits

  • Disaster recovery - http://www.kanatek.com/wp-content/uploads/2015/12/bigstock-Disaster-Recovery-Plan-Drp-101895317.jpg
  • Dilbert disaster recovery - http://everyday-tech.com/wp-content/uploads/2013/09/Disaster-Recovery-Templates.jpg
  • New disaster plan - http://www.cloudtweaks.com/wp-content/uploads/2013/10/disaster-recovery-cartoon-1.jpg
Published April 18th, 2016
Tags: build, npm, disaster-recovery

Thoughts and opinions inspired by life in the Software Engineering industry

© Andy Berry, All rights reserved.
