Monday, March 08, 2010

You Should Probably be Using Version Control Software

If you're working on a coding project that more than one person will
*ever* contribute to, you should definitely be using version control
software. In fact you would be a fool not to, just because of all the
headaches it will save. Even if it's just the two of you, with some kind
of system to avoid stepping on each other's toes, if it's not proper
version control, your system sucks and it will waste your time
painfully. But I'm going to explain why it might be worth using even if
it's just you, and even if you don't do any programming.

How version control software works is that there is one "master" copy of
the project folder stored in something called a repository, which should
be on a shared server. When a programmer begins working on it, he or she
"checks out" that folder, into what is called a working directory on
their local hard drive. When a significant change has been finished and
tested, and the code is now back to working, they check the code back
in, updating the repository. The great advantage of using version
control software is that it handles the situation where two people are
working on the same folder or the same file: it identifies places that
were changed by both people and then asks you to manually merge the
changes together into a program that makes sense.
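
To make "places that were changed by both people" a bit more concrete, here is a minimal Python sketch of the idea. It is not how SVN actually does it (real merge tools are far more careful), and the function names and sample lines are made up for illustration. It compares two edited copies against the version they both started from and reports line ranges that both copies touched.

```python
import difflib


def changed_ranges(base, edited):
    """Return the (start, end) line ranges of `base` that `edited` modified."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]


def conflicting_ranges(base, mine, theirs):
    """Return ranges of `base` that both edited copies touched.

    Conservative simplification: ranges that merely touch also count.
    """
    conflicts = []
    for a1, a2 in changed_ranges(base, mine):
        for b1, b2 in changed_ranges(base, theirs):
            if a1 <= b2 and b1 <= a2:
                conflicts.append((min(a1, b1), max(a2, b2)))
    return conflicts


# Hypothetical file contents, one string per line.
base = ["load data", "filter trials", "compute means",
        "plot results", "save figure"]
mine = ["load data", "filter bad trials", "compute means",
        "plot results", "save figure"]
theirs_far = ["load data", "filter trials", "compute means",
              "plot results", "save figure as PDF"]
theirs_same = ["load data", "filter outlier trials", "compute means",
               "plot results", "save figure"]

# Edits to different lines can be combined without asking anyone.
print(conflicting_ranges(base, mine, theirs_far))
# Edits to the same line have to be merged by hand.
print(conflicting_ranges(base, mine, theirs_same))
```

The first call finds nothing to resolve because the two edits are far apart; the second flags the line that both people rewrote, which is the case where the software stops and asks you to merge.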

What makes it super useful even if you're writing code solo is the
version tracking aspect of the software. The repository stores in an
efficient format the state of the project at all previous check-ins, so
you can always retrieve an old version. Much better than ad hoc ways to
hang onto old code (the worst being the "copy the folder and add 'old'
to the end of the name" technique, which I've definitely been party to).
You have a detailed log message (well, it should be detailed) to
identify each version, and also powerful tools to tell you which files
were changed, and exactly which lines were changed and how, between the
current version and any past version (or between two past versions).
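
As a rough illustration of what "exactly which lines were changed and how" looks like, here is a small sketch using Python's standard difflib module. The two "revisions" are invented sample lines, and this only mimics the kind of report a version control tool gives you; it does not talk to a repository.

```python
import difflib

# Hypothetical contents of one file at two check-ins.
old_version = ["x = load_data()", "y = analyse(x)", "plot(y)"]
new_version = ["x = load_data()", "y = analyse(x, smooth=True)",
               "plot(y)", "save(y)"]


def describe_changes(old, new):
    """Return a unified diff: '-' lines were removed, '+' lines were added."""
    return list(difflib.unified_diff(old, new,
                                     fromfile="revision 1",
                                     tofile="revision 2",
                                     lineterm=""))


for line in describe_changes(old_version, new_version):
    print(line)
```

Lines starting with a minus sign exist only in the older version, and lines starting with a plus sign exist only in the newer one; unchanged lines are shown for context.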

I resisted putting my experiment code under version control for years,
but I noticed an immediate change in my behaviour when I did: I became
bold. I could plow ahead when I saw a change I wanted to make, without
worrying about how I might get back if I changed my mind. And I no
longer had many copies of the same code lying around, all with different
conventions to their names, with no idea which ones work or (if you
leave it long enough) which is even the final version. With version
control there is only one "real" version. But the whole history of the
project is well taken care of. (I'm not even going to get into the
ability to create and maintain alternate versions of a project, known as
branches, which is a more advanced topic.) It even vastly reduced the
number of useless .m files lying around, since you have to explicitly
add the files you want to use to the repository.

Which brings me to the cost of starting up version control, which is not
insignificant, and kept me from bothering with it all this time.
Subversion, or SVN, is what I use - it is free, very widespread, and
relatively simple, but it's still pretty intense for people not used to
using it. You have to create, move, and delete your files differently,
and think about checking in and updating your local copy, not to mention
learning to resolve conflicts. Most important is to really understand
the underlying concepts of what's going on, which in itself can take a
half day of reading and experimenting. You should *definitely* read at
least chapters 1 and 2 of this before starting:

I was lucky enough to be trained in version control in my software
engineering jobs. Because of the complexity, I might have warned even
people who are ok with programming away from it, except that I've found
an excellent GUI for Windows that makes it much more intuitive and
accessible, and solves a lot of the headaches SVN itself introduces. It
is called TortoiseSVN:

It integrates beautifully with Windows Explorer, and lets you easily
browse your repository and past versions, as well as do some tricky
things (like importing an existing folder) that are hard in plain SVN.
However, it's still very important to understand the underlying
concepts, so definitely read the background chapters in the TortoiseSVN
docs. (It is PC only, but apparently there is an equivalent for Mac OS
called SCPlugin, which should be usable with the same repositories and
checked-out directories, since both are based on SVN.)

TortoiseSVN also means that, for the first time, even people who don't
program should think about using SVN, for one feature: Microsoft Word
versioning. SVN is ok for storing non-text files; it just can't do
comparisons between versions or resolve conflicts. However, TortoiseSVN
*can* do that for Word files, and amazingly, differences between
versions show up as though they were changes made with Track Changes
turned on, so it's dead simple if you're used to that. So version
control is worth using on manuscripts you will be working on for weeks
for the same
reasons as code: so you can boldly strike out in a new direction with
the text, and be sure that all the old versions will still be safe if
you need to backtrack or you need a complete older version to send to
someone while you're ripping apart the current one. It has happened so
many times: my advisor and/or I decide we should go back to an earlier
take on some material. Because of that, the folder for a current paper
of mine has literally 10 copies of it with different version numbers in
the name. No more. From now on, with TortoiseSVN, there is only one Word
file for this manuscript. (Plus, I don't have to make up my own version
numbers.)

Note that this solves a different problem than backups (which I cover
here), though it is related and can help with that. If you really know
what you're doing you might be able to use Apple's Time Machine software
to replace some of this functionality for the solo user.


The biggest challenge with adopting version control software beyond learning the basic concepts, and the one that can get you into snarls, is that you have to reteach yourself to do all the regular file manipulation operations in a new way. To that end I have made a summary of how to do the basics in TortoiseSVN:

Create a file or folder - just create it as usual, then right click it and choose TortoiseSVN -> Add...

Delete a file - right click and choose TortoiseSVN -> Delete.

Delete a folder - right click and choose TortoiseSVN -> Delete. Note that in this case the folder will not disappear until you commit the changes.

Move a file or folder -

  1. select the files or directories you want to move

  2. right-drag them to the new location inside the working copy

  3. release the right mouse button

  4. in the popup menu select TortoiseSVN -> SVN Move versioned files here

Rename a file or folder - right click and choose TortoiseSVN -> Rename

More detailed instructions are in TortoiseSVN's user guide.
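
If you are not on Windows, the same operations exist as subcommands of the plain `svn` command-line client. The sketch below is only my assumed mapping from the menu actions above to those subcommands; the helper names and file names are hypothetical, and nothing is executed unless you call `run_svn` yourself inside a working copy.

```python
import shutil
import subprocess

# Assumed mapping from the menu actions above to `svn` subcommands.
SVN_SUBCOMMANDS = {
    "add": "add",        # TortoiseSVN -> Add...
    "delete": "delete",  # TortoiseSVN -> Delete
    "move": "move",      # SVN Move versioned files here
    "rename": "move",    # a rename is a move to a new name
}


def svn_argv(action, *paths):
    """Build the command line for one action without running it."""
    return ["svn", SVN_SUBCOMMANDS[action], *paths]


def run_svn(action, *paths):
    """Run the command in the current working copy, if svn is installed."""
    argv = svn_argv(action, *paths)
    if shutil.which("svn") is None:
        print("svn not found; would run:", " ".join(argv))
        return None
    return subprocess.run(argv, check=True)


# Hypothetical file names, shown only to illustrate the resulting commands.
print(" ".join(svn_argv("add", "analysis.m")))
print(" ".join(svn_argv("rename", "analysis.m", "analysis_v2.m")))
```

As with the GUI, these only schedule the change in your working copy; nothing reaches the repository until you commit.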

One more trick: with TortoiseSVN it's easy to put a folder that already exists under version control (steps copied from the manual):

  1. Use the repository browser to create a new project folder directly in the repository.

  2. Checkout the new folder over the top of the folder you want to import. You will get a warning that the local folder is not empty. Now you have a versioned top level folder with unversioned content.

  3. Use TortoiseSVN -> Add... on this versioned folder to add some or all of the content. You can add and remove files and make any other changes you need to.

  4. Commit the top level folder, and you have a new versioned tree, and a local working copy, created from your existing folder.
