“Damn it! It just can’t read the update file any faster!”
It was late 2012, and I had a little bit of freedom after the launch of ComicBase 16 to try to take on some of the “Big Issues” for ComicBase’s future. High on the list were better mobile support (more on this in a separate article), a possible replacement of the underlying database technology, and perhaps even facing down the prospect of a complete rewrite in .Net.
“Dotnet,” as it’s pronounced, is a Microsoft technology that had clearly been the future of the company’s development path for some time–but which promised to pose a monumental struggle for porting the mammoth code base behind ComicBase. We’d actually investigated what it would take to make the move three different times over the years, but had to turn back each time when it became clear we’d have to essentially rewrite and refactor what had become a very large and complex program. Worse, if we somehow managed to rewrite ComicBase in .Net, our developers would get the benefit of much better build tools (albeit at the price of endless hours of programming and re-testing), but the customers would be unlikely to notice any difference at all.
Actually, that last part isn’t quite true: the progress bars in .Net are decidedly nicer. The rest of the changes would be technical and architectural in nature–which is to say, virtually invisible to the end user, unless we used the rewrite as an excuse to polish up various bits of the program using the newer technology.
But for today, I wasn’t worried about any of those things. I had decided that I wanted to see what could be done to make the weekly updates faster.
Introduced in ComicBase 10, the weekly updates did something previously unimaginable: the ComicBase staff had taken on the job of keeping all our customers updated with all new comic information the same week the comics appeared on the stands. We’d supply all the new data on every comic released each week–along with its artists, writers, storylines, and other special information–and our customers would be able to simply download it and have their database be instantly current, instead of adding all that data in themselves. All the customer had to do then was check off which books they owned, or better yet, use a barcode scanner to “bleep” them into their collection.
For understandable reasons, the weekly updates were a huge hit, but they meant that we had to take on an incredible amount of work: acquiring several hundred new comics each week, as well as keeping up with the constant pricing changes that were happening in the world of comics. We opened up our own Diamond Comics account, and soon were ordering one copy of virtually every issue sold, which our indexers would scan and index within a day or two of their arrival so they could be part of the Friday update.
Later, we added in a “Submit new or corrected data” feature to ComicBase which allowed several dozen amazing customers to add to the wealth of knowledge we were processing, and the pace of additions to the database doubled, then doubled again. Soon we were processing thousands of new issues and additions each week, and the database grew to encompass virtually every English-language comic that had ever been printed–as well as hundreds of thousands of foreign books.
But now there was a new problem: the sheer size of the database was starting to make downloading and processing the weekly updates an increasingly lengthy affair for our customers. What once took them only a few minutes was starting to stretch on for 15 minutes or longer–sometimes much longer if they had a slower machine or were upgrading a very old version. If customers hadn’t been updating regularly, it was not unheard of for an update to contain hundreds of thousands of changes to everything from pricing to artist credits. Unfortunately, updating this much information meant that customers were spending too much of their time watching progress bars while they waited for all those changes to be incorporated.
So I had decided to take some time and really pound on the code for the updating process, trying to wring the last bit of performance out of it. Numerous late-night hacking sessions ensued, but for every clever programming trick I came up with that saved a few ticks of the clock, the time savings soon vanished as the flood of new comics swelled the database to ever greater scope.
After yet another late night of coding, I was discussing the problem with my wife Carolyn as we walked over to Starbucks on our morning routine. “I think I’m at the limit–no matter how fast computers are going, it just looks like it’s going to take several minutes to even read–let alone process–the update file. After all, it’s got something over 10 million distinct pieces of information in each one.”
“Can’t you cut it down?” she asked.
“Not that I can see. There’s no telling how long it’s been since someone updated, and we need to be able to catch them up to date even if they haven’t checked for updates all year long. We could cut down on the amount of data we offer, but a big part of the appeal of the program is that it’s the biggest database of comics in the world.”
“If only there was some way to have the updates happen automatically so that people weren’t standing around waiting for them each week–”
“Hey, I’ve got an idea…”
(to be continued)