It’s 3 AM, the day after Daylight Savings Time threw everyone’s internal sleep clocks into absolute chaos. (I say “chaos” based both on my own personal feelings and on the flood of fire service calls we’ve had today, including an overdose, a suicide attempt, and numerous other ways that our local residents have signaled their general lack of fervor at the idea of getting up tomorrow.)
Worse yet, had it not been for the time change, I could have started this blog post with “It’s 2 AM, and the fear is gone” — and my opening would have been much cooler. Now I’m blaming Daylight Savings Time for writer’s block too. Way to go, DST.
But nevertheless, here I am, writing a pretty darned geeky blog with the hopes that some poor schmoe might stumble upon it in a session of mad Googling and save themselves some of the five hours I’ve just blown on one of the more painful programming pitfalls I’ve managed to stumble into in recent memory.
As part of a general modernization of ComicBase’s web APIs, we’re testing out a new set of calls to our servers which locate all the items you’ve sold on Atomic Avenue and let you deduct them from your inventory–as well as (minor spoilers here) find all the comics you’ve scanned with the app while you’re out in the real world and which you now want to add to your desktop database.
Since it’s incredibly helpful to be able to watch the action on both the client and the server side of things when you’re doing work like this (and since it’s considered presumptuous for the programmer to set breakpoints on the production server which would stop the site cold), I’ve been working with a local copy of the ComicBase.com and AtomicAvenue.com sites, running under a development version of the web server software called “IIS Express”. Things had been going well: I was watching the program carefully validate the user’s credentials, look up their databases, get the right data, and post it back to the user–all the while checking for the jillions of things that could go wrong in terms of bad passwords, invalid user accounts, lost network connections, and just about any other simulated problem you can imagine–trying to make sure we handled them all as gracefully as possible.
It’d been a long weekend on this project, but as I sat down around 10 to finish things up, I was feeling pretty good about my chances to knock off early, grab a beer, and maybe even check out that crazy Polish cyberpunk video game I’d started a while back (Observer). All I really had to do was step through the different cases in the debugger, make sure they were being handled right, then remove the breakpoints and watch the whole thing run at speed to get a sense for how the system would feel in real use.
Everything was going well, but as I started tidying up and removing my breakpoints, I suddenly started getting bad data back from the web requests which had been rock solid mere moments earlier.
So I put the breakpoints back and started single-stepping through them, puzzled: all the server calls came back exactly as expected–only to give 404 errors moments later when I let them run at speed.
That’s when the night started to blur into one long slog which resembled nothing so much as an escape room whose puzzles had been planned by a madman. I’d check the code, and it would behave. I’d set a breakpoint a couple of lines after the call completed, and it’d work. But if there was ever a case where two web calls in a row fired off, the second one would always fail.
“OK,” I thought… it’s probably some sort of thread issue. That seemed all the more plausible given that any call I waited even a couple of seconds on in the debugger before proceeding would run normally. Unfortunately, problems like this–whether they’re thread deadlocks or inadvertent calls to non-thread-safe libraries–are a royal pain in the tucchus to track down.
The hours went by as I double-checked that all my async calls were properly awaited, that I hadn’t accidentally blocked them by calling “.Result” at the end of any methods, and so on and so on through all manner of obscure programming lore. This was followed by endless Googling on StackOverflow to see if anyone else had hit a similar problem or could suggest answers.
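(For anyone who hasn’t been bitten by that particular trap, here’s a minimal sketch of the kind of thing I was checking for. The class, method, endpoint, and parameter names here are placeholders of my own, not the actual ComicBase code.)

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public class SalesSyncClient
{
    private static readonly HttpClient client = new HttpClient();

    // Good: the POST is awaited all the way up the call chain.
    public async Task<string> GetSoldItemsAsync(string accountToken)
    {
        var body = new StringContent("{\"token\":\"" + accountToken + "\"}");
        var response = await client.PostAsync("https://example.com/api/solditems", body);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    // Bad: blocking on .Result is the classic sync-over-async trap,
    // which is exactly the kind of thing I was hunting for.
    public string GetSoldItemsBlocking(string accountToken)
    {
        return GetSoldItemsAsync(accountToken).Result;
    }
}
```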
I tried removing the asynchronous calls; I tried marking all the relevant async calls with ConfigureAwait(false) to help them keep their context straight; I even tried rewriting all the HttpClient calls in the old-style WebClient mode, which let me get rid of the mere idea of anything being asynchronous at all. Sure, it’d hurt system performance and make the app feel slower to users, but as the clock edged past 2 AM and all the Fiddler packet traces in the world showed nothing useful, I was willing to try darn near anything to make some progress.
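(In case it’s useful to anyone doing similar late-night flailing, those two workarounds looked roughly like this. Again, the names and the endpoint are placeholders rather than the real ComicBase routines.)

```csharp
using System.Collections.Specialized;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public class SalesSyncWorkarounds
{
    private static readonly HttpClient client = new HttpClient();

    // Attempt: ConfigureAwait(false) so the continuations don't try to
    // marshal back onto the captured context after each await.
    public async Task<string> GetSoldItemsAsync(string accountToken)
    {
        var body = new StringContent("{\"token\":\"" + accountToken + "\"}");
        var response = await client
            .PostAsync("https://example.com/api/solditems", body)
            .ConfigureAwait(false);
        return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
    }

    // Attempt: the old WebClient route, fully synchronous, with no async
    // anywhere in sight (slower for the user, but simpler to reason about).
    public string GetSoldItemsSynchronously(string accountToken)
    {
        using (var webClient = new WebClient())
        {
            var form = new NameValueCollection { { "token", accountToken } };
            byte[] result = webClient.UploadValues("https://example.com/api/solditems", form);
            return Encoding.UTF8.GetString(result);
        }
    }
}
```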
But even rewriting the whole set of web calls to be fully synchronous using the ancient WebClient routines was getting me nowhere. They ran great in the debugger, but immediately returned 404 errors when run without breakpoints set. What the living heck was going on?
So then–as much to make my Fiddler traces easier to follow if I had to post the whole thing up on StackOverflow in the hopes that someone smarter than me could figure it out as anything else–I decided to move the new routines up to our production server and get a trace of them running there.
And they worked.
Perfectly.
With no debug points set.
Over the next several minutes, many curses were muttered as I leaned on the Ctrl-Z (Undo) key and watched the last several hours of my typing get undone, block by block, until I was basically back where I’d been when I sat down to work tonight. The only real difference was that the code I was using to call the routines was pointing to the real server, running the real version of IIS instead of the IIS Express instance on my development system.
And the whole darn thing was working right.
Sooo… what did we learn here? Well, there’s apparently a strange glitch in the Microsoft web client stack which keeps repeated calls to those routines from resolving properly when they’re run from a Microsoft Visual Studio 2017 debugging session against IIS Express. Basically, if you’re going to use the local server to debug, the web calls may not resolve quite as fast as they should, and if your calls start stacking up, you might want to try either slowing down your debugging, or moving some of the critical pieces to their final homes and testing there before you give up.
I also learned a lot of ways not to solve this problem, which has its own sort of value to programmers. And I wound up learning about four entirely different techniques for making web POST calls–all of which blew up in exactly the same way when run at speed on the development system. In a way, that’s what made me suspect that the problem might not have been purely code-related at all.
I also learned that I truly detest Daylight Savings Time. And now at 3:55 am, I am absolutely going to bed.