So, I think I might have mentioned before that Games Magazine once ran a contest where the idea was to "play Boggle backwards": given a diagram and a word list, fill in the diagram so that as many words from the list as possible could be found in it.
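For anyone who hasn't played: scoring a candidate grid means checking, for each word on the list, whether it can be traced through adjacent cells (including diagonals) without reusing a cell. A minimal sketch of that check (the grid representation and function name are my own, not from the contest):

```python
def find_word(grid, word):
    """Return True if word can be traced Boggle-style in the grid:
    successive letters on adjacent cells (incl. diagonals), no cell reused."""
    rows, cols = len(grid), len(grid[0])

    def dfs(r, c, i, used):
        if grid[r][c] != word[i]:
            return False
        if i == len(word) - 1:
            return True
        used.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols \
                        and (nr, nc) not in used:
                    if dfs(nr, nc, i + 1, used):
                        used.discard((r, c))
                        return True
        used.discard((r, c))
        return False

    return any(dfs(r, c, 0, set()) for r in range(rows) for c in range(cols))
```

Scoring a whole grid is then just counting how many list words `find_word` accepts, which is the quantity the contest asks you to maximize.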
That was a few years ago, and I toyed around with the contest, thinking that a short program might be able to find the optimal solution.
Turns out, my first run at it hit a brick wall, since I was trying to explore a very large search space exhaustively.
My second run at the problem was a genetic algorithm, which isn't guaranteed to find the best solution, but can be a good way to explore big spaces and turn up some pretty decent solutions.
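The skeleton of that kind of search is short. This is a generic sketch, not my contest code: tournament selection, one-point crossover, bitwise mutation, and elitism, run here against a toy "count the 1-bits" fitness where the real fitness would be the number of list words findable in the grid.

```python
import random

def evolve(fitness, genome_len, pop_size=40, generations=100, mutation_rate=0.05):
    """Generic GA loop: tournament selection, one-point crossover,
    bitwise mutation, with elitism (the best individual always survives).
    Returns (best fitness in generation 0, best fitness at the end)."""
    random.seed(1)  # deterministic for the example
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in pop)

    def tournament():
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)
        nxt = [elite]  # elitism: carry the champion forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = random.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]
            child = [g ^ 1 if random.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt

    return initial_best, fitness(max(pop, key=fitness))

# Toy fitness: sum of the bits ("onemax"). For the contest, the genome
# would encode a grid and fitness would count findable words.
```

Because of the elitism step, the best score can never get worse from one generation to the next, which is handy when you're letting machines grind for days.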
As it turned out, I got really close to the best submitted solution, and might have won the contest if I had let my gang of machines work a little longer on the problem.
Fast forward to the beginning of January 2009. Games has a similar contest. I put together a search program over the weekend and got a score of 20 using a single CPU for 4 hours. I scaled up to 5 CPUs over the next few days and finally got a score of 21. I want to throw more processing power at the problem, but the way I had the machines cooperating relied on all the worker nodes sharing a filesystem, which doesn't work for all my machines at home, let alone any machines outside my LAN that I might press into service.
I poked around looking at Amazon's Elastic Compute Cloud (EC2). It's a neat way to rent a (virtual) machine on the net without having to worry too much about hardware or bandwidth or colocation blah blah blah. CPU power on the network - just what I was looking for. I looked at the pricing and figured out that if I rented two of their cheap machines for the rest of the time before the submission deadline, I'd just about burn through my prize money. Still, an idea to keep in mind.
A related service to EC2 is Amazon's Simple Queue Service (SQS), which provides reliable message delivery between distributed processes. This seems like a neat tool for this problem and a few others I have on the back burner. The upshot is that I could have workers anywhere (like here at work, or an EC2 instance) submit their results to an SQS queue. I could then have a consumer of that queue at home collate the results and keep track of the best solution so far.
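The shape of that pipeline looks like this. Here I'm using Python's in-process queue.Queue as a local stand-in for SQS (with SQS, the workers could be anywhere on the net writing to the same named queue; the message format is my own):

```python
import queue
import threading

# Stand-in for the SQS queue; each message is one worker's result.
results = queue.Queue()

def worker(worker_id, scores):
    """A worker submits every (score, solution) it finds to the queue."""
    for score in scores:
        results.put({"worker": worker_id, "score": score, "grid": "grid-data"})

def collate(expected):
    """The consumer drains the queue, tracking the best solution so far."""
    best = None
    for _ in range(expected):
        msg = results.get()
        if best is None or msg["score"] > best["score"]:
            best = msg
    return best

# Three workers, each reporting two scores.
threads = [threading.Thread(target=worker, args=(i, [15 + i, 18 + i]))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

best = collate(expected=6)
```

The nice property is that the workers and the consumer never talk to each other directly - they only agree on the queue and the message format - so adding a new worker anywhere is just pointing it at the queue.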
At this point, the reader may say "yeah, that sounds good, but how much work would be involved integrating a new library of somebody else's code into a running system?", which is a fair amount of skepticism. I ended up downloading Boto, a Python interface to Amazon's web service APIs. I worked through the Boto SQS tutorial and verified that I could send and receive messages. From there, it was a simple matter to rip out the bits where I wrote results to the (file server's) hard drive and send them to the queue instead. I then created a tool to read the results coming in on the queue and write them out to the file server. Total amount of work done: perhaps an hour, and now I have 3 more nodes running than I did when I got home last night.
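The change was roughly this shape. This is a sketch against the classic Boto SQS API, not my exact code: the queue name and the JSON message format are my assumptions, and the submit/drain functions need AWS credentials in the environment to actually talk to SQS.

```python
import json

def encode_result(score, grid):
    """Serialize one result into a queue message body (format is my own)."""
    return json.dumps({"score": score, "grid": grid})

def decode_result(body):
    d = json.loads(body)
    return d["score"], d["grid"]

def submit_result(queue_name, score, grid):
    """Worker side: replaces writing results to the shared filesystem."""
    # Boto picks up AWS credentials from the environment.
    from boto.sqs.connection import SQSConnection
    from boto.sqs.message import Message
    conn = SQSConnection()
    q = conn.create_queue(queue_name)  # returns the existing queue if present
    m = Message()
    m.set_body(encode_result(score, grid))
    q.write(m)

def drain_results(queue_name):
    """Consumer side: yield decoded results, deleting each handled message."""
    from boto.sqs.connection import SQSConnection
    conn = SQSConnection()
    q = conn.create_queue(queue_name)
    while True:
        msgs = q.get_messages(num_messages=10)
        if not msgs:
            break
        for m in msgs:
            yield decode_result(m.get_body())
            q.delete_message(m)
```

One detail worth knowing: SQS redelivers a message if you don't delete it after processing, which is what makes the delivery "reliable" - a consumer that crashes mid-batch just sees those results again later.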
I still don't have a better solution than the 21-word solution I had found before all of this, but that's OK. I now can deploy all the compute power I have at home and even dabble at bringing up an EC2 machine if I feel so inclined.
Bottom line: SQS is easy and allowed me to overcome a real technology hurdle I was struggling with.