science | design | engineering | common sense

Archiving Project


Two hard drives failed on me in the last two years, and until I hit a spate of good luck recently, I thought everything was irretrievably lost. With some finagling (I even froze one of the drives – twice), both drives worked long enough for me to recover most of the data, including all the important stuff. Once I’m finished gathering the data, I’m dumping it onto a reasonably safe backup source (a RAID’d external hard drive) and other backup media. It’s about time I got proper storage set up for my stuff, and this is a good excuse to sort and purge as needed and to convert everything into a reasonable folder structure and accessible formats.


Default Format:  Plaintext, Unicode or ASCII standard
Images: jpeg (default), .png, .psd or .xcf for editable documents, w/jpeg copy
Audio: mp3 (lossy) or flac (lossless)
School Papers: .doc (Word 2000/2003 compatible) original documents, w/text copies
General Documents/Essays/Books: text (if possible) or PDF (for docs which aren’t plaintext)
Disk Captures/CDs/DVDs: Standard .iso format (why use anything else?)

Sure, I know that these aren’t all ideal formats; mp3 is still semi-proprietary (the patents are expiring in a few years), but it is a widely used and standardized format, and the comparable open format (Ogg Vorbis) has to swim upstream against mp3’s popularity. The benefit just isn’t worth converting my whole collection of audio files. For lossless audio, on the other hand, I gladly use flac; there’s really no reason to go with anything else.

As for backup, I’m pushing everything out to an Iomega Ultramax Pro, a two-drive enclosure that I will configure as RAID 1 (mirroring). Really important data will get backed up to DVD, as I’ve been doing for a while. Absolutely important stuff will be printed out in hardcopy and stored.

Edit: October 4, 2009: The Ultramax Pro is working great; $160 for 700GB of external RAID 1 storage, with USB+eSATA connections, and formatted mainly with FAT32 for backwards compatibility, with an Ext2 partition for large files and an extra 50GB partition for temporary large file transfer off Windows/Mac/Linux systems.

Lessons Learned on Data Portability

File and Directory Names
  • Descriptive. For example, if naming a song, name it with the title and the group/composer/performer. That way you can find it with a simple title search, which makes finding and indexing much easier.
  • Command-line Compatible. No spaces; keep it short and typeable. Common directory names should ideally be 3-4 letters long, though this isn’t necessary everywhere. Names should never be location-dependent – name that school paper “essay-class-spr2008”. This is far more descriptive than naming it “essay” and relying on the location (the directory “School>Spring 2008”) to supply the other information.
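
The two naming rules above can be sketched as a small Python helper. This is only an illustration, not a tool I actually used: the function name cli_safe_name and its ext parameter are made up for the example, and joining the parts with hyphens is just the convention from the “essay-class-spr2008” example.

```python
import re
import unicodedata


def cli_safe_name(*parts: str, ext: str = "") -> str:
    """Join descriptive parts into a lowercase, space-free file name.

    Hypothetical helper: the name and the hyphen convention are assumptions
    made for illustration.
    """
    cleaned = []
    for part in parts:
        # Fold accented characters to their closest ASCII equivalents.
        ascii_part = (
            unicodedata.normalize("NFKD", part)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Collapse every run of non-alphanumeric characters into one hyphen.
        slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_part).strip("-").lower()
        if slug:
            cleaned.append(slug)
    name = "-".join(cleaned)
    ext = ext.lstrip(".").lower()
    return f"{name}.{ext}" if ext else name


if __name__ == "__main__":
    # Title plus composer, so a simple title search finds the file.
    print(cli_safe_name("Clair de Lune", "Debussy", ext="mp3"))
    # The name carries the class and term, not the directory it sits in.
    print(cli_safe_name("essay", "class", "spr2008", ext="txt"))
```

The helper takes the descriptive pieces separately (title, performer, term) so the name itself carries the information instead of the folder it lives in.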

Rich Text is not plaintext. It is far preferable to .doc or other formats if you need simple formatting, but use plaintext wherever possible. Plaintext>RTF>.doc, in terms of compatibility. Do not create your rtf files using MS Word – it produces terrible files.

There is PDF, and then there is PDF. Using PDF as a container for scanned pages creates huge PDF files which aren’t machine-independent (they require a large screen to view properly), and text-only PDFs might as well be stored in plaintext. PDF should only be used for forms, magazines and other documents with a lot of formatting or diagrams, which need its features to display properly. Otherwise, use plaintext.


Written by logand

September 22, 2009 at 4:20 pm

Posted in Uncategorized
