Avoiding the Chasm

January 24, 2010

Affordable Safe Backup

Filed under: Technology | vextasy @ 12:18 am

Choosing the right backup system for work or home is a difficult and, frankly, uninspiring use of time, and one that almost always ends in undesirable compromises. There always seem to be far too many options, and the good solutions come with a price tag that almost matches the cost of the system they were designed to protect. A good backup system should satisfy, at least, the following requirements:

  • It should be affordable.
  • Operation should be reasonably well automated.
  • It should be easy to restore to one of several points in time.
  • There should be some redundancy.
  • It should be simple to store some backups offsite.
  • Media should be encrypted for security.

The system described here uses an inexpensive drive enclosure to host two of three disk drives, in a scheme that provides both disk mirroring and offsite storage for roughly £260. Two of the drives stay in the enclosure; the third is kept offsite and periodically exchanged with one of the two in the enclosure. Each drive is mounted in a drive tray (a one-off operation that requires nothing more than a Phillips screwdriver), which allows it to be easily inserted into and removed from the enclosure. The enclosure connects to a USB port on the computer and appears, to the computer, as a single large disk. I use 1TB disks, which give me approximately 1,000 GB of secure storage: a lot, even for a small business.

Operation

The drive enclosure is populated with two of the three SATA disk drives and attached to a computer (Mac, Windows or Unix) through a USB connection. A DIP switch on the rear of the enclosure selects how the two drives are presented: as one big drive, as two independent drives, or as a single drive using RAID 1 mirroring for increased protection against disk failure. This backup scheme uses RAID 1, in which the enclosure maintains an exact copy of its first disk on the second, so that if either disk fails the remaining disk can continue operating without loss of data. If a failure occurs, the failed drive can be replaced with a good one and the enclosure will automatically mirror the data to the new drive without any downtime. From the computer, the enclosure appears as a single USB (external) drive.
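
To make the mirroring behaviour concrete, here is a toy Python model of a two-disk RAID 1 set. It is an illustration only, assuming a disk can be modelled as a list of blocks; it says nothing about how the enclosure's own controller is implemented, and the class and method names are hypothetical. Writes go to every healthy disk, reads can be served by either, and replacing a failed disk copies the survivor onto the new one.

```python
from typing import List, Optional


class Raid1Mirror:
    """Toy two-disk RAID 1 set; each disk is a list of blocks (None = failed/removed)."""

    def __init__(self, blocks: int) -> None:
        self.disks: List[Optional[List[bytes]]] = [[b""] * blocks, [b""] * blocks]

    def write(self, index: int, data: bytes) -> None:
        # Every write lands on all healthy disks, so the copies stay identical.
        for disk in self.disks:
            if disk is not None:
                disk[index] = data

    def read(self, index: int) -> bytes:
        # Any healthy disk can satisfy a read.
        for disk in self.disks:
            if disk is not None:
                return disk[index]
        raise OSError("both disks have failed")

    def fail(self, slot: int) -> None:
        self.disks[slot] = None

    def replace(self, slot: int) -> None:
        # Rebuild: copy every block from the surviving disk onto the new one.
        survivor = next((d for d in self.disks if d is not None), None)
        if survivor is None:
            raise OSError("nothing left to rebuild from")
        self.disks[slot] = list(survivor)


m = Raid1Mirror(blocks=4)
m.write(0, b"payroll")
m.fail(0)             # the first disk dies...
print(m.read(0))      # ...but the data is still readable: b'payroll'
m.replace(0)          # fit a new disk and rebuild it from the survivor
```

After the rebuild both disks hold identical blocks again, which is exactly the state that lets either one be pulled and taken offsite.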

The second disk can be removed at any point and replaced with another, which will be automatically mirrored with the contents of the first disk. Mirroring a 1TB disk takes about 3 or 4 hours, but the drive can be used normally during this period. A green LED above a drive indicates that it is functioning normally, a flashing amber LED indicates that the drive is being mirrored, and a red LED indicates a hardware fault with the drive.
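
The 3 to 4 hour rebuild figure implies a sustained copy rate that is easy to work out. This short sketch assumes decimal units (1 TB = 1,000 GB = 1,000,000 MB), as drive manufacturers use, and takes the rebuild times straight from the paragraph above; it is back-of-the-envelope arithmetic, not a measurement.

```python
def implied_throughput_mb_s(capacity_gb: float, hours: float) -> float:
    """Average copy rate in MB/s needed to mirror capacity_gb within the given hours."""
    megabytes = capacity_gb * 1000  # decimal units: 1 GB = 1000 MB
    return megabytes / (hours * 3600)


for hours in (3, 4):
    rate = implied_throughput_mb_s(1000, hours)
    print(f"1 TB in {hours} h -> about {rate:.0f} MB/s")
# 1 TB in 3 h -> about 93 MB/s
# 1 TB in 4 h -> about 69 MB/s
```

In other words, the enclosure is copying at roughly 70 to 90 MB/s while it rebuilds the second disk.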

Backup software on the computer populates the USB drive as it would any externally attached drive. I use the excellent SyncBackPro for Windows to pull files in from other machines on the network and write them to the drive, but any archiving software appropriate to the platform could be used.
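
SyncBackPro's internals are not described here, but the core idea of a one-way backup pass (copy anything that is new, or has changed since the last run) can be sketched in a few lines of Python. The function name `mirror_newer_files` and the rule "copy if missing at the destination or newer at the source" are assumptions for illustration; real backup products add handling of deleted files, versioning, retries and network access.

```python
import os
import shutil


def mirror_newer_files(src: str, dst: str) -> list[str]:
    """One-way pass: copy files from src to dst if missing there or newer in src.

    Returns the relative paths that were copied on this run.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_root = os.path.normpath(os.path.join(dst, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_root, name)
            if not os.path.exists(target) or os.path.getmtime(source) > os.path.getmtime(target):
                shutil.copy2(source, target)  # copy2 also preserves the modification time
                copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied
```

Run twice in a row, the second call copies nothing, because `shutil.copy2` gives each copy the same modification time as its source.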

Archival Storage

Some operating systems provide a mechanism for maintaining historical copies of your files within the file system. Windows does this with what it calls Previous Versions, and on Mac OS X similar functionality can be achieved with Time Machine. Both mechanisms make fuller use of the storage space on the drive by keeping old versions of all files, even files that have been deleted since the last backup, for as long as space remains on the backup disk. Once the disk fills up, these systems automatically begin to prune the oldest versions of files, keeping only as many old versions as the disk can hold. Both Previous Versions and Time Machine will let you view the files in any backed-up folder on your system as they were at several points in the past, typically at daily or more frequent intervals.
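
The pruning behaviour described above can be illustrated with the simplest possible policy: drop the oldest snapshot until the remainder fits on the disk. This is a hypothetical sketch; Time Machine and Previous Versions use their own, more elaborate schemes (Time Machine, for instance, thins hourly backups down to daily and then weekly ones), so treat it as a model of the idea rather than of either product.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    taken: str      # ISO date (YYYY-MM-DD), so text order is chronological order
    size_gb: float


def prune_oldest(snapshots: list[Snapshot], capacity_gb: float) -> list[Snapshot]:
    """Discard the oldest snapshots until the rest fit within capacity_gb.

    The newest snapshot is always kept, even if it alone exceeds the capacity.
    """
    kept = sorted(snapshots, key=lambda s: s.taken)
    while len(kept) > 1 and sum(s.size_gb for s in kept) > capacity_gb:
        kept.pop(0)  # the oldest version goes first
    return kept


history = [Snapshot("2010-01-01", 400), Snapshot("2010-01-08", 300), Snapshot("2010-01-15", 350)]
print([s.taken for s in prune_oldest(history, capacity_gb=1000)])
# ['2010-01-08', '2010-01-15']
```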

Redundancy and Offsite Storage

The enclosure keeps two drives in sync automatically so that, should one of them fail, the other will continue providing read and write functionality without any interruption of service. This gives one form of redundancy, but by swapping the mirrored disk with a spare on a regular (say daily) basis you can maintain as many complete backup copies of your data as you feel comfortable with. It is easy to manage a small pool of spare disks used in rotation. If we then keep one or more of these disks at a different location from the enclosure, we have an offsite backup.
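
A simple rotation for the three-disk setup might look like the following sketch. It assumes one disk stays permanently in the first bay while the two spares (given the hypothetical labels "B" and "C") take turns in the mirror bay, with whichever spare is not in the enclosure stored offsite. Each swap triggers a rebuild, so the incoming disk becomes current a few hours after it is inserted.

```python
def rotation_schedule(spares: list[str], days: int) -> list[tuple[str, list[str]]]:
    """For each day, which spare sits in the mirror bay and which are kept offsite.

    Assumes the primary disk never leaves the first bay and the spares take turns.
    """
    schedule = []
    for day in range(days):
        in_mirror_bay = spares[day % len(spares)]
        offsite = [disk for disk in spares if disk != in_mirror_bay]
        schedule.append((in_mirror_bay, offsite))
    return schedule


for day, (bay, away) in enumerate(rotation_schedule(["B", "C"], days=4), start=1):
    print(f"Day {day}: {bay} in the enclosure, {', '.join(away)} offsite")
# Day 1: B in the enclosure, C offsite
# Day 2: C in the enclosure, B offsite
# Day 3: B in the enclosure, C offsite
# Day 4: C in the enclosure, B offsite
```

Adding more spares to the list lengthens the cycle, which is how a larger pool buys more points in time to restore from.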

Very cheap USB caddies can be purchased which will hold a single SATA disk drive. These can be used in an emergency to mount any of the disks on a computer if the enclosure fails or if you need to access the data that is on one of the drives from a different computer or location.

Encryption

If securing the content of your backup data is important, then the free, open source TrueCrypt is an excellent tool. TrueCrypt provides on-the-fly encryption of an entire disk, which means that data is encrypted just before it is written to the disk and decrypted just after it is read from it. The operation works transparently, encrypting the entire disk volume without any user intervention: data is copied to or read from the encrypted disk exactly as it would be with an unencrypted disk. No data can be read from the disk until the correct password has been provided, so if one of your disks is lost or stolen you can be confident that its contents will remain safe. The documentation on the TrueCrypt web site provides a step-by-step guide to installing TrueCrypt and using it to protect a USB drive.
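
Password-based disk encryption tools such as TrueCrypt do not use the password directly as the encryption key; they run it through a key derivation function such as PBKDF2 together with a random salt, which makes guessing passwords slow. The sketch below illustrates that idea with Python's standard `hashlib.pbkdf2_hmac`. The iteration count, salt size and the `password_ok` check are illustrative assumptions, not TrueCrypt's actual parameters or volume format; a real tool typically checks the password by testing whether the encrypted volume header decrypts correctly rather than by storing a derived key.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 256-bit key from a password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)


salt = os.urandom(16)  # random, stored alongside the encrypted data, not secret
stored_key = derive_key("correct horse battery staple", salt)


def password_ok(attempt: str) -> bool:
    # Re-derive with the same salt, then compare in constant time.
    return hmac.compare_digest(derive_key(attempt, salt), stored_key)


print(password_ok("correct horse battery staple"))  # True
print(password_ok("wrong guess"))                    # False
```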

The decision to encrypt is optional and can be deferred until a later date. TrueCrypt is such a good tool that I would recommend experimenting with it even if you decide not to use it to encrypt your backups. One of TrueCrypt’s modes of operation allows you to create an encrypted file on your normal file system, which TrueCrypt can then mount as a drive (or volume) on your computer. To the computer this looks like a normal external disk, but with the advantage that all of the files you write to it are securely encrypted and cannot be read without the correct password. TrueCrypt is software that I would be prepared to pay quite a lot of money for, yet it is open source and free.

Cost

The system I describe here costs roughly £260 (including the backup media) and provides 1TB of always-available, RAID-mirrored storage with offsite backups, which makes it a very competitively priced solution for a small business or home worker.

I purchased the enclosure and an extra drive tray from Dabs.com, but the enclosure is also available from Amazon. The hard drives can be purchased from anywhere but should all be of the same capacity; good 1TB drives can currently be found for about £60. The individual component costs for the whole system were:

Component                                                          Price
1 x USB Dual Removable SATA RAID External Hard Drive Enclosure     £70
1 x Extra Hot Swap Hard Drive Tray                                 £10
3 x 1TB Internal SATA Disk Drives                                  £60 each (£180)
TrueCrypt Open Source Disk Encryption Software                     Free
Total                                                              £260
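
The component total can be re-checked, along with the resulting cost per gigabyte of usable mirrored capacity, in a few lines of Python. Prices are as listed; usable capacity is assumed to be 1,000 GB because RAID 1 presents the two 1TB disks as a single 1TB volume.

```python
components = [
    # (description, quantity, unit price in £)
    ("USB Dual Removable SATA RAID External Hard Drive Enclosure", 1, 70),
    ("Extra Hot Swap Hard Drive Tray", 1, 10),
    ("1TB Internal SATA Disk Drive", 3, 60),
    ("TrueCrypt Open Source Disk Encryption Software", 1, 0),
]

total = sum(qty * price for _name, qty, price in components)
usable_gb = 1000  # RAID 1: two 1TB disks appear as a single 1TB volume
print(f"Total: £{total}")                               # Total: £260
print(f"Cost per usable GB: £{total / usable_gb:.2f}")  # Cost per usable GB: £0.26
```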

With a total component cost of about £260, the benefits of this disk-based system over our old tape-based solution are enormous, not just in price but also in flexibility and features. In short, it is a solution that I would recommend.

April 17, 2008

Broadband Network Usage Monitoring and Measurement Tools

Filed under: Technology | vextasy @ 8:10 pm

If you have found that your ISP has been restricting your broadband bandwidth, the obvious question to ask it is: why? The answer you are likely to get is that you have been using the “broadband service inappropriately”. You might also be told that you have exceeded your usage allowance, but if, like me, you are on an unlimited contract, you are unlikely to be told what the upper usage limit is. You won’t be told the actual limit because it is likely to be more complicated than a simple figure, and that is because ISPs are principally interested in avoiding router congestion at peak times. At off-peak times, such as the early hours of the morning and during the working day, it makes little difference to an ISP whether its network is 25%, 50% or 75% utilised: it still has the same equipment costs and other overheads. But once the network reaches capacity and routers are forced to drop packets, customers start to notice and the ISP begins to get a bad name. For this reason most ISPs have quietly begun to apply traffic shaping at peak times.

Traffic shaping involves restricting the bandwidth of ‘heavy usage’ customers so that they do not interfere with the network experience enjoyed by ‘lighter usage’ customers. Unfortunately it looks like we are in for a lot more of this as network demand grows. The popularity of the Internet as a medium for watching video has rocketed in recent months and looks set to keep growing as more and more people switch their viewing habits from traditional broadcast television to Internet-based technologies. In the UK, the BBC iPlayer alone has been responsible for tremendous changes in network usage.
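
ISPs do not publish exactly how they shape traffic, but a token bucket is one of the standard techniques for this kind of rate limiting, so it makes a reasonable mental model. In the sketch below (the class and parameter names are hypothetical), tokens accumulate at the permitted rate up to a burst limit, and a packet may pass only if enough tokens are available. A heavy user quickly drains the bucket and is held to the sustained rate, while a light user rarely notices it.

```python
class TokenBucket:
    """Minimal token bucket: rate_bps bits/s sustained, burst_bits of headroom."""

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate = rate_bps
        self.capacity = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last = 0.0

    def allow(self, packet_bits: float, now: float) -> bool:
        # Refill in proportion to the time elapsed, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False  # a shaper would queue or delay this packet; a policer drops it


bucket = TokenBucket(rate_bps=1000, burst_bits=1500)
print(bucket.allow(1500, now=0.0))  # True: the burst allowance covers it
print(bucket.allow(1000, now=0.5))  # False: only 500 bits have refilled
print(bucket.allow(1000, now=1.0))  # True: 1000 bits are available again
```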

I recently learned from my ISP that I was being traffic shaped. Unfortunately, my router gave me little help in identifying the volume of data passing through my broadband connection each month, and the figures my ISP furnished seemed considerably higher than I would have expected. Finding myself in a very weak position, I decided to rearrange my home network so that I could gain a better understanding of my broadband usage.

My broadband is supplied by BT on an unlimited tariff, and I use the BT-supplied broadband router (2-Wire 1800), which hosts two wired Ethernet connections (one to a PC and one to a network attached storage device) plus a number of wireless connections to PCs and laptops. There are nine devices in total, but typically at most four are actively using the broadband line at any one time, and by active I usually mean browsing web pages. Living reasonably close to our exchange, we achieve a download speed of about 6.3 Mbit/s, and even with the bandwidth restriction we still reach this speed outside peak times.
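
To put the 6.3 Mbit/s figure in perspective, the sketch below works out the most data such a line could move if it ran flat out. It assumes decimal units and a 30-day month, and ignores protocol overhead, so the real ceiling is somewhat lower. The point is that even a background program pulling data at a fraction of line speed overnight can add up to tens of gigabytes within days.

```python
def max_volume_gb(speed_mbit_s: float, days: float = 30) -> float:
    """Data (decimal GB) a line could carry if saturated for the given number of days."""
    bytes_per_second = speed_mbit_s * 1_000_000 / 8
    return bytes_per_second * 86_400 * days / 1_000_000_000


print(f"{max_volume_gb(6.3, days=1):.1f} GB per day")  # 68.0 GB per day
print(f"{max_volume_gb(6.3):.0f} GB per 30 days")      # 2041 GB per 30 days
```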

WallWatcher

The problem with a network like this is that the only device that can really determine how much traffic is flowing is the broadband router, because everything else talks directly to it and it funnels data into and out of the ADSL line. I attempted to get my existing router to log traffic information to a PC so that I could take a close look at what was travelling up and down the external line.

There are many different log analysers available, but the one I chose was WallWatcher, a free tool with support for a large number of routers.

Tomato

Unfortunately, I found that I couldn’t get WallWatcher to recognise the format of the packet logs coming from my model of 2-Wire router. In my case the solution was to make use of a spare wireless router that was not being usefully employed, a Linksys WRT54GL. This variant of the WRT54 family of routers runs Linux and can easily be upgraded to run alternative firmware. I wanted to concentrate my network devices on the Linksys router and then run a single connection from the Linksys to an Ethernet port on the back of the 2-Wire broadband router. I also wanted to make use of the fine bandwidth reporting available in the Tomato firmware, which this router can be upgraded to run. Upgrading the router took about 10 minutes, as it can be done from a menu option within the native Linksys firmware. Once Tomato was up and running on the Linksys, it was easy to configure it to provide a good quality wireless network that replaced the old 2-Wire network and, as an added benefit, was also faster.

I configured the Linksys to store its bandwidth logs on a Windows network share and to forward its packet logs to WallWatcher as before, and the results were immediately interesting. Tomato hosts a number of web pages that show bandwidth over varying periods of time. I could see straight away that the download bandwidth on the WAN port was considerably higher than I would have expected and sustained at quite a high level (see the image on the right). The graph shows a consistently high download volume overnight, then a period of very low activity in the morning when all of the PCs were switched off, followed by high usage again from about 2pm when they were restarted. Tomato also hosts a real-time bandwidth display. Using these displays combined with WallWatcher, it was easy to identify the PC responsible for the heavy usage, and by examining the addresses of the remote end-points shown by WallWatcher it was also easy to determine the offending program.
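
Tomato keeps its own bandwidth history and WallWatcher its own logs; neither format is reproduced here. As a hypothetical illustration of the "find the heavy user" step, suppose the per-device figures had been exported to a CSV file with `ip`, `rx_bytes` and `tx_bytes` columns (an assumed layout, not either tool's format). Summing downloads per address then points straight at the busiest machine.

```python
import csv
import io
from collections import Counter


def top_talkers(csv_text: str, n: int = 3) -> list[tuple[str, int]]:
    """Total the downloaded bytes per local IP and return the n heaviest users."""
    totals: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["ip"]] += int(row["rx_bytes"])
    return totals.most_common(n)


sample = """ip,rx_bytes,tx_bytes
192.168.1.10,500,10
192.168.1.20,9000,40
192.168.1.10,700,5
"""
print(top_talkers(sample, n=2))  # [('192.168.1.20', 9000), ('192.168.1.10', 1200)]
```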

TV Tonic

In my case, the program generating the traffic was the TV Tonic RSS service, which downloads video podcasts from the Internet. I hadn’t realised that it was still active, as I had not used the client for a number of weeks. Incidentally, the TV Tonic client runs as an add-on to Windows Media Center (under Vista) and is quite a nice addition. Had I looked a little more closely at its configuration, I would have noticed that not only did it have an option to limit the download bandwidth, but it also had a download scheduler to control the times of day at which it is allowed to download at all (both options I would like to see adopted by the BBC iPlayer). I’m not quite sure why TV Tonic was downloading such large amounts of data but, not wishing to experience another month of bandwidth restriction, I immediately disabled the TV Tonic service, and the Tomato real-time monitor showed the corresponding reduction in bandwidth usage.

BBC iPlayer

Even after disabling the TV Tonic RSS service there still seemed to be a lot of network activity from my PC, although I wasn’t running any obvious client program. A closer look at the WallWatcher log display showed a large number of incoming and outgoing UDP packets being sent to external machines. WallWatcher comes with a charting tool called WallReviewer, which gives a useful interactive graphical picture of incoming and outgoing traffic over a given period of time. The WallReviewer chart of “Outbound Leaks by Remote Names” showed a large number of packets being sent to the machines iplaykdms82.telhc.bbc.co.uk and iplaykdms6.telhc.bbc.co.uk. The names of these remote sites suggested that the BBC iPlayer might be responsible, but the application wasn’t running and the option “allow programmes to be shared when you exit download manager” was not ticked in the iPlayer configuration dialogue, so I had assumed that there ought to be no network activity from the Kontiki-based iPlayer software. I found that if I disabled the Windows service named “KService” (which runs the BBC iPlayer program “C:\Program Files\Kontiki\KService.exe”), all of this network activity stopped immediately. The WallWatcher display made it clear that these packets were being sent about every 2 to 4 seconds, but WallWatcher cannot give any indication of the size of the packets.

Wireshark

To get a better indication of packet sizes, a protocol analyzer is required. The “old faithful” in this area used to be Ethereal, but its development has now moved to Wireshark. Wireshark is simple to install and runs on many platforms. It is also free and licensed under the GNU General Public License. There is a lot more to Wireshark than the casual user is ever likely to need, and a basic knowledge of networking protocols and terminology helps, but there is plenty of documentation.

Running Wireshark on my PC confirmed that data packets were being sent to the BBC domain every two to four seconds, but also showed that the packets were small: 16 bytes of payload which, once wrapped in UDP, IP and Ethernet headers, amount to a 58-byte Ethernet frame. I find that with the KService service disabled I am unable to start the BBC iPlayer, but as soon as I re-enable the service the iPlayer functions as normal.
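
The 58-byte figure follows directly from the standard header sizes, and the same arithmetic shows why these packets turned out to be harmless in volume terms. The sketch assumes an IPv4 header without options and excludes the 4-byte Ethernet frame check sequence (which Wireshark does not normally display); the 2 and 4 second intervals are the ones observed above.

```python
ETHERNET_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
IPV4_HEADER = 20      # minimum IPv4 header, no options
UDP_HEADER = 8        # source port, destination port, length, checksum


def frame_bytes(payload: int) -> int:
    """Size of an Ethernet frame carrying a UDP datagram with the given payload."""
    return payload + UDP_HEADER + IPV4_HEADER + ETHERNET_HEADER


def daily_bytes(payload: int, interval_s: float) -> float:
    """Bytes per day if one such frame is sent every interval_s seconds."""
    return frame_bytes(payload) * 86_400 / interval_s


print(frame_bytes(16))                               # 58
print(f"{daily_bytes(16, 2) / 1e6:.2f} MB per day")  # 2.51 MB per day
print(f"{daily_bytes(16, 4) / 1e6:.2f} MB per day")  # 1.25 MB per day
```

A megabyte or two a day in each direction is negligible next to the gigabytes that TV Tonic had been pulling down.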

Conclusion

Having made these networking changes, I am now in a much better position to know exactly how much traffic is being downloaded (or uploaded) over my broadband line, and to detect such traffic early enough to avoid triggering any ISP penalties. The tools required to monitor bandwidth are not expensive (in fact they are free) and are easily configured. I think that my ISP should have been able to give me the information I needed to monitor and control my bandwidth; it feels a little like having been sold a car with no fuel gauge.

One lesson to be learnt from all of this is that it is becoming more and more important for anyone with a reasonable grasp of networking to take matters into their own hands and monitor their own network usage. I don’t see the ISPs relaxing their grip on our usage patterns in the short term, at least not until their own congestion issues have been addressed. By tightening up on wasted bandwidth, we should be left with more of it for the things we really want to use it for.

April 7, 2008

My 10 Favourite (free) Windows Tools of all Time.

Filed under: Software | vextasy @ 9:57 pm

Reading Ed Bott’s postings about his and his readers’ favourite Windows programs of all time, I was surprised to note just how many of the programs on the list had an associated price tag rather than being free (as in beer). What particularly attracted my attention was that, had I been asked to guess which were free and which were not, I would probably have failed miserably: a text editor for $33, a note-taking tool for $60 and a screen capture utility for $40, but a complete news aggregator for free.

I work, mostly, in a Microsoft environment and so the majority of my main software development tools for that platform are either purchased or licensed through an (expensive) subscription but, like most readers, I like to adorn that environment with utilities that make for a more agreeable working experience. Sometimes those utilities relate directly to work tasks and sometimes less so, but what I notice is that most often those utilities are free (or effectively so – more on this later).

I constructed a list of the utilities that I use regularly at work and at home, and very quickly it grew well beyond ten. As it doesn’t seem sensible to attempt to order them in any way (because such an ordering would make an assumption about your motives for having them in the first place), I leave them unordered. Likewise, as I don’t feel comfortable choosing my top ten, I describe more than that number here, but the real list is much longer and growing.

TrueCrypt

TrueCrypt is disk encryption software which allows either an entire disk partition to be encrypted or a virtual encrypted disk to be created from a file and then mounted as a Windows drive. The software is open source, well documented and thoroughly well thought out. I haven’t yet had the courage to encrypt a real partition, but I do use it to maintain a number of well-protected virtual drives that I can mount when I need access to the documents stored securely inside them. A drive can be mounted once the required password (or password and key file, or correct encryption keys) is provided, and once mounted it can be used just like any other Windows drive. The contents of a TrueCrypt drive are never stored in decrypted form on disk; they are only ever held temporarily in RAM. TrueCrypt drives are a great place to store that collection of documents that you know should really be kept secure.

Cygwin

For software developers like me, who were brought up in a Unix environment, the lack of a real command line in Windows can be stifling. I know that the awfully named PowerShell is now available, but what made the Unix environment so complete was the rich set of commands that could be glued together with whichever variant of the Bourne shell was in vogue. Cygwin provides that same environment hosted under Windows. The choice of programs is truly massive: editors, shells, compilers, interpreters, text and document processors, libraries, windowing systems. Most things GNU-ish can be found there too, courtesy of the GNU C compiler and friends. Integration with the Windows filesystem means that all of these tools can be used to process files and media residing on any Windows drive.

TimeSnapper

TimeSnapper quietly records your activity by taking snapshots of your computer screen at regular intervals throughout the day. The interval between snapshots can be configured in seconds, and the recording happens without any noticeable pause or flicker. This is really handy on those days when you have moved from one task to another without being as meticulous about recording your exact timings as you should have been: it allows you to replay the day a snapshot at a time, or to jump to a particular time of day and see what you were working on. You give TimeSnapper a folder in which to store the snapshot images and choose the format (.png, .jpg, .gif, .wmf, .tiff, .bmp, .emf) and the resolution of the stored images as a percentage of the full screen resolution. TimeSnapper will also manage the archiving of the snapshots if you give it an age beyond which old images should be deleted, or an upper limit on the amount of space it may use for storage. Multiple displays are handled too. This is a tool you can forget about until you need it, and then it’s a lifesaver.
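
TimeSnapper's age-based clean-up can be pictured as a periodic pass over its snapshot folder. The Python sketch below is hypothetical (the function name and the reliance on file modification times are assumptions, not TimeSnapper's code) and shows only the age rule; a space limit would instead delete the oldest files until the folder's total size fell below the cap.

```python
import os
import time
from typing import Optional


def delete_older_than(folder: str, max_age_days: float, now: Optional[float] = None) -> list[str]:
    """Delete files in folder last modified more than max_age_days ago; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```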

Copernic Desktop Search

Copernic Desktop Search is one of the many similar search products, but what I really like about this program is its intuitive interface. Of course, it is packed with all of the features you would expect from any such search tool and, of course, it indexes a myriad of document and media file formats, inspecting the metadata inside the files for rapid lookup. It also understands, and so can index, email and contact information from Outlook, Outlook Express, Eudora and Mozilla Thunderbird.

The user interface, rather than relying on a web browser as some search tools do, reacts dynamically as you type, homing in on the information being sought. Indexing happens on the fly and only when the machine is not heavily loaded (this is configurable). Copernic confirm on their web site that you can “Rest assured that the data indexed by CDS stays on your PC and on no account will it be transferred to us or any of our partners”. The licence only allows non-commercial use; a separate licence exists for commercial use. That said, I know people for whom this program has revolutionised the way they use their PC, and I recommend it as a productivity tool.
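
Copernic's internals are not documented here, but the "homing in as you type" behaviour is what you get from an inverted index queried by prefix: each keystroke re-runs the query, and every additional character or word narrows the set of matching documents. The class below is a minimal, hypothetical sketch of that idea. It scans the whole vocabulary for each term, whereas a real desktop search engine would use a sorted or tree-shaped index to find prefixes quickly.

```python
import re
from collections import defaultdict
from typing import Optional


class PrefixIndex:
    """Tiny inverted index: word -> set of document ids, searchable by prefix."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(doc_id)

    def search(self, query: str) -> set[str]:
        # Each query term is treated as a prefix; all terms must match (AND).
        results: Optional[set[str]] = None
        for term in query.lower().split():
            matches: set[str] = set()
            for word, docs in self.postings.items():
                if word.startswith(term):
                    matches |= docs
            results = matches if results is None else results & matches
        return results or set()


idx = PrefixIndex()
idx.add("budget.xls", "Quarterly budget review")
idx.add("meeting.eml", "Budget meeting notes")
idx.add("holiday.txt", "Holiday photos")

for typed in ("b", "bud", "bud rev"):
    print(typed, "->", sorted(idx.search(typed)))
# b -> ['budget.xls', 'meeting.eml']
# bud -> ['budget.xls', 'meeting.eml']
# bud rev -> ['budget.xls']
```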

Virtual CloneDrive

As a software developer, I am often presented with application software in ISO format. It is always a pain to have to burn a DVD just so that it can be mounted as a Windows drive, only for the disc to be discarded (probably never even labelled) and never used again once the installation has been completed. Most of my MSDN software arrives this way. SlySoft’s Virtual CloneDrive allows these images to be mounted directly from the ISO file on the file system. Several other formats are supported in addition to ISO.

MusicBrainz Picard

If you have ripped your CD collection to MP3 or another digital format, you will almost certainly have found errors in the track and album metadata that the music files contain, or inconsistencies in the naming conventions used by the different people who provided this information. MusicBrainz Picard comes to the rescue by applying the accumulated knowledge of the very well moderated MusicBrainz database. MusicBrainz is a community music metadatabase that aims to be a comprehensive music information site, and you can use the Picard tagger to automatically identify digital music, tag it and clean up the existing metadata tags in your collection. I used Picard to correct the tags created by Windows Media Player in my own music library when I ripped my entire CD collection to MP3 format, and I use it regularly each time I purchase music.

Pidgin

Pidgin is a multi-protocol messaging client that handles a large number of instant messaging protocols: AIM, Bonjour, Gadu-Gadu, Google Talk, Groupwise, ICQ, IRC, MSN, MySpaceIM, QQ, SILC, SIMPLE, Sametime, XMPP, Yahoo!, Zephyr. I can really only claim to have used the MSN and IRC protocols, but my reason for turning to Pidgin was to communicate with my family members on MSN without having to endure advertisement hell. Pidgin supports away messages, typing indications and file transfer between clients.

Firebug

If you are anything more than the most casual of Firefox users or if you create any kind of HTML content or even if you are simply interested in the structure of the HTML page that you are viewing in Firefox you should be interested in the Firebug extension to Firefox. Firebug integrates with Firefox to enable rich examination of a web page structure including:

  1. an interactive and graphical identification of the effect of individual sections of HTML on the resulting display going from both HTML to display and from display to HTML.
  2. an indication of the CSS rules, and the order in which they have been applied, that determine the final appearance of a screen element.
  3. the ability to change elements of the CSS or HTML source and immediately see the resulting effect on the display.

Firebug was written by one of the original Firefox developers and the slickness of the integration is evident. If I could only keep one Firefox extension it would be Firebug.

7Zip

It’s difficult to get too excited about a file archive tool, especially one that performs well, is unobtrusive and just gets the job done. 7Zip is just that kind of tool, integrating well with the Windows Explorer shell context menus while providing more functionality and better performance than the native Windows archiver (Compressed Folders). When writing an archive, the 7z, ZIP, GZIP, BZIP2 and TAR formats are available, and when reading, the RAR, CAB, ISO, ARJ, LZH, CHM, MSI, WIM, Z, CPIO, RPM, DEB and NSIS formats are supported as well. 7Zip can optionally apply AES-256 encryption when creating 7z and ZIP archives.

Gimp

Gimp, the GNU Image Manipulation Program, is as close as you will get to a tool like Photoshop or Paint Shop Pro without spending a lot of money. For most of the image-related tasks that I need to perform it is overkill (by a long shot), but if you are prepared to put some time into learning the basic techniques, some impressive results can be obtained. There are plenty of helpful web sites within reach of Google containing hints, tips and tutorials for those who make the effort. Also take a look at paint.net, a relative newcomer that is receiving a lot of praise.

JungleDisk

JungleDisk is a tool that puts a user-friendly front end on top of Amazon’s S3 storage service. S3 offers inexpensive offsite storage of unlimited total capacity, for files of up to 5GB each. Storage costs are of the order of $0.18 per GB per month, with data transfer charged at between $0.10 and $0.20 per GB. JungleDisk itself is not free (in spite of the title of this post): it costs $20, but can be used on as many PCs as you like with the same Amazon S3 account. I include it here because, compared with the cost and worry of on-site storage, the combined cost of JungleDisk and Amazon S3 is effectively free, at least as far as I am concerned. JungleDisk can perform on-the-fly encryption of data as it travels from the PC to S3 and decryption on its return journey, it can make the S3 storage appear as a mapped local drive, and it can perform scheduled backups from the PC to S3.
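
Using the prices quoted above, a month's S3 bill is simple arithmetic. The sketch assumes, purely for illustration, 50 GB stored and 5 GB transferred in the month; the rates are parameters because Amazon's pricing has changed since this was written and differs by region and by direction of transfer.

```python
def monthly_s3_cost(stored_gb: float, transferred_gb: float,
                    storage_rate: float = 0.18, transfer_rate: float = 0.10) -> float:
    """Estimated monthly S3 charge in dollars (rates in $ per GB)."""
    return stored_gb * storage_rate + transferred_gb * transfer_rate


low = monthly_s3_cost(50, 5)                       # transfer at $0.10/GB
high = monthly_s3_cost(50, 5, transfer_rate=0.20)  # transfer at $0.20/GB
print(f"${low:.2f} to ${high:.2f} per month")      # $9.50 to $10.00 per month
```

On top of this comes the one-off $20 for JungleDisk itself.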

Launchy

Launchy is a smart search program which tries to guess which program you are looking for and will launch it with the minimum number of keypresses required to satisfactorily identify the desired program. It is designed to help you forget about your start menu, the icons on your desktop, and even your file manager.

This is a utility that I didn’t expect to survive my move from Windows XP to Windows Vista because, at first glance, it appears to provide much the same functionality as the search box built into Vista’s Start Menu. Indeed, to begin with, I managed without it for a couple of months, but then I began to miss the fact that Launchy is started with just a hot-key combination and requires no mouse movement or clicks. Launchy lurks in the background and responds to the Alt+Space key combination by opening a small input field to accept keyboard input. As you type, Launchy searches its indexed list of known programs for the closest match, refining the search with each additional keypress. When the desired program has been identified, a press of the Enter key is all that is required to launch it.

Launchy can be customised to search specific locations for commands, to recognise additional file types, and to provide additional arguments or accept user-supplied parameters to commands. It can also perform online searches with Google, MSN, Yahoo, Live, weather, Amazon, Wikipedia, dictionary, thesaurus, IMDb, Netflix and MSDN.
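
Launchy's actual matching and ranking rules are its own; the sketch below is a hypothetical approximation of the behaviour described above. A program matches if the typed characters appear in its name in order (not necessarily adjacent), and matches are ranked with names that start with the typed text first, then shorter names, so each extra keypress narrows the list.

```python
def is_subsequence(query: str, name: str) -> bool:
    """True if the characters of query appear in name in order (case-insensitive)."""
    remaining = iter(name.lower())
    # 'ch in remaining' consumes the iterator up to the match, preserving order.
    return all(ch in remaining for ch in query.lower())


def rank(query: str, programs: list[str]) -> list[str]:
    """Matching programs, prefix matches first, then shorter names, then alphabetical."""
    hits = [p for p in programs if is_subsequence(query, p)]
    return sorted(hits, key=lambda p: (not p.lower().startswith(query.lower()), len(p), p))


programs = ["Firefox", "FileZilla", "Notepad", "Microsoft Word"]
for typed in ("f", "fi", "fil"):
    print(typed, "->", rank(typed, programs))
# f -> ['Firefox', 'FileZilla', 'Microsoft Word']
# fi -> ['Firefox', 'FileZilla']
# fil -> ['FileZilla']
```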

SharpReader

SharpReader is an RSS feed aggregator created by Luke Hutteman and is the only RSS reader that I have ever been completely comfortable with. The application is infrequently updated but (possibly as a result) runs without faulting and simply does the job well. In addition to allowing a collection of feeds to be browsed it also presents a stacked list of alerts up the right hand edge of the display whenever new items arrive. The lifetime of these alerts can be adjusted to allow just enough time to quickly scan them without them becoming too much of a distraction to the job in hand. At the time that I started using SharpReader the only other utility that I felt came anywhere close to it was FeedDemon. FeedDemon is now also a free product and I have been dual running it alongside SharpReader – the jury is out, but SharpReader still has the edge.

gVim

Vi was one of the first Unix visual text editors, taking its name from the two-character command that switched its predecessor, ex, into a so-called visual mode. Ex, in turn, is a descendant of ed, which Ken Thompson wrote in 1969 for the then-new Unix system, building on his earlier QED editor for CTSS and Multics; QED and ed contained some of the first implementations of regular expressions. Vim was created in 1991 for the Amiga computer as an extended version of the vi editor, and gVim is its graphical variant. The expressiveness of regular expressions, combined with the terse but necessary and sufficient approach to command-driven editing that this family of editors supported, went on to fuel many of the ideas in other important Unix commands, notably grep, sed and later awk (which, you could argue, was responsible for the creation of Perl). The lineage continues with Rob Pike’s sam and acme for the Plan 9 and Inferno operating systems.

You could argue that Vim is part of Cygwin, which I have described elsewhere, but I think it deserves to be singled out here, if only because on a Windows system it lets you replace the hopeless Notepad with something that allows you to perform useful tasks and, if you are prepared to make the effort to learn its command syntax, to become more productive too.
