Coder Diary #32 -- A Look Inside the Sausage Factory
Three days in the life:
ORIGINAL: berto
Heads up [to the Dev Team]:
Things have gone well. Everything checks out as "good enough for now" at least.
You can anticipate my releasing a ton of new stuff, both data and EXEs, for both ME & VN, today (Saturday).
Then the map & data guys can expect follow-up posts throughout the weekend and beyond explaining the new stuff, and what still needs to be done.
Get ready to rumble! [:)]
ORIGINAL: berto
Um, hit a snag. I did something that broke edmap startup, causing the map editor to crash. Debugging it now.
ORIGINAL: berto
All bets are off. Who knows when and how I can fix this edmap launch failure bug, and when and if I will release. I'm in deep debugging hell here. [:(][:(]
ORIGINAL: berto
Evidence suggests a memory allocation error somewhere. It only manifests itself in Release-mode EXEs, not in Debug-mode EXEs. I have been building EXEs in Debug mode only for the past month or more. I need to revert to the last time I released known-good EXEs to the team, all the way back to January 29. Then diff the codebase between then and now to find the needle in the haystack.
Ugh, I particularly hate this sort of bug hunt. [:(]
ORIGINAL: berto
Like a good Boy Scout, I am prepared. I have all sorts of things I can try.
ORIGINAL: berto
I rebuilt a non-crashing edmap from the 20160129 codebase (the last release EXEs).
I rebuilt edmap from the 20160228 backup codebase. The rebuilt edmap crashes, in the same way.
It seems clear that this is a software fault that I introduced sometime between 20160129 and 20160228.
(I did various other checks, including: virus scan; validated the core Windows system files; verified that nothing has changed in my Visual C++ setup in recent months; etc. Everything checks. No, all signs point to a flaw in the codebase.)
Now my strategy will be to revert to backup codebases, doing a binary search, narrowing it down between dates, until I find the two successive codebases where the fault occurs. Then it will be a matter of diff'ing those two successive codebases. I still won't have my needle, but by then I should have a much, much smaller haystack. (Because how many diffs will there be from one day to the next? Not many.)
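The narrowing-down strategy above can be sketched as a plain binary search over the dated backups. This is only an illustration under stated assumptions: `find_first_bad` and the `crashes` callback are hypothetical names, not part of the actual toolchain, and the sketch assumes the backups are sorted oldest to newest, that every backup can be built and run, and that the fault stays present once it has been introduced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Returns the adjacent pair (last good backup, first bad backup).
// Assumptions: backups are sorted oldest to newest, the first entry is
// known good, the last entry is known bad, and the fault persists once
// introduced. `crashes` is a hypothetical callback that would build the
// given backup, launch the EXE, and report whether it crashed.
std::pair<std::string, std::string> find_first_bad(
    const std::vector<std::string>& backups,
    const std::function<bool(const std::string&)>& crashes) {
    assert(backups.size() >= 2);
    std::size_t good = 0;                  // index of a known-good backup
    std::size_t bad = backups.size() - 1;  // index of a known-bad backup
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (crashes(backups[mid])) {
            bad = mid;   // fault already present: look earlier
        } else {
            good = mid;  // still clean: look later
        }
    }
    return {backups[good], backups[bad]};
}
```

Each probe halves the remaining range, so a month of daily backups needs only about five rebuild-and-launch cycles before the two successive codebases are known and can be diffed.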
Thank God for daily backups!
ORIGINAL: berto
I narrowed it down to a three-day span, between 20160216 (good) and 20160219 (bad). I have interim code backups for both 20160217 & 20160218, but at the time of backup, the code was not successfully compiling on those days, so they are effectively unusable as points of comparison. (That is, I can't run EXEs built from those dates.)
From the codebase diffs between 20160216 & 20160219, I spotted one suspicious thing, but after fixing it, no dice, I still get the crash.
The crazy thing is that the crash only shows in edmap, not the other EXEs; and only manifests in the Release EXE, not the Debug EXE.
I have scrutinized the diffs, and darned if I can see a problem anywhere.
I know exactly in the code where the Release EXE crashes. It's where the new Map object is constructed. I have looked at the Visual C++ library code for this function.
But here's the thing: That might be a red herring. It could very well be that something else entirely separate in the code is overwriting memory somewhere, causing this strange side effect -- the "not enough space for thread data" fault. Sometimes, in coding and debugging, it's like the bug is purposely giving random clues that point you in the wrong direction.
This is shaping into one of the most difficult bugs I've ever faced. Most difficult, because I can't use my usual tricks of single stepping through the code in Debug mode -- this only happens in the Release EXEs remember, EXEs where debugging stuff has been stripped from the code. And I can't use my new code tracing mechanism, or the usual logging either, since the fault lies (maybe; see above) in the Visual C++ libraries, which of course I can't modify (so as to add my own debugging code).
Nothing to do except to keep trying this, trying that, experimenting, thinking outside the box. Doing Web searches also, though much of what I see is extremely technical, and much of it off-target or sometimes even rubbish.
Not the way I had planned to spend my weekend.
<sigh> [:(]
ORIGINAL: berto
I have found the bug. I know exactly where and how to toggle on/off the R6016: "not enough space for thread data" edmap crash. I still don't know yet how to fix this properly.
Still, it's progress, major progress even.
ORIGINAL: berto
ORIGINAL: berto
I narrowed it down to a three-day span, between 20160216 (good) and 20160219 (bad). I have interim code backups for both 20160217 & 20160218, but at the time of backup, the code was not successfully compiling on those days, so they are effectively unusable as points of comparison. (That is, I can't run EXEs built from those dates.)

After 24 hours of trying this, trying that, trying almost everything, I went back to the 20160217 & 20160218 codebase backups, in each case doing the minimum necessary to get them to compile. Miraculously, the compiled EXEs from both days were crash free.
Having narrowed down the time frame where the bug first appeared -- between the end-of-day 20160218 backup and the end-of-day 20160219 backup -- I then, file by file, incrementally applied the 20160219 edits (making the haystack smaller and smaller, as it were) until ... I found it!
ORIGINAL: berto
I know exactly in the code where the Release EXE crashes. It's where the new Map object is constructed. I have looked at the Visual C++ library code for this function.

Not!

This:

ORIGINAL: berto
But here's the thing: That might be a red herring. It could very well be that something else entirely separate in the code is overwriting memory somewhere, causing this strange side effect -- the "not enough space for thread data" fault. Sometimes, in coding and debugging, it's like the bug is purposely giving random clues that point you in the wrong direction.

[:@]
ORIGINAL: berto
This is shaping into one of the most difficult bugs I've ever faced. Most difficult, because I can't use my usual tricks of single stepping through the code in Debug mode -- this only happens in the Release EXEs remember, EXEs where debugging stuff has been stripped from the code. And I can't use my new code tracing mechanism, or the usual logging either, since the fault lies (maybe; see above) in the Visual C++ libraries, which of course I can't modify (so as to add my own debugging code).

Later in April or so, I really need to shift to my new development platform -- new PC, Visual Studio 2015, and supplemental power tools. It will then be so much faster and easier to fix these issues.
ORIGINAL: berto
Nothing to do except to keep trying this, trying that, experimenting, thinking outside the box. Doing Web searches also, though much of what I see is extremely technical, and much of it off-target or sometimes even rubbish.

In all too typical fashion, Windows supplies a crash error message that is completely irrelevant, and worse than useless, it's misleading. As was so much of what I read on the Internet.
No, one wise old sage had it right:

"I have 40 [years] of programming in every type of programming language. The error you are getting is caused by an access to a memory location outside the bounds that the operating system is allowing and causing an exception. So any type error that accesses an invalid memory [location] can be causing this problem."

In the end, the bug is a weird array index error, where memory is read or written outside one end or the other of the memory allocated to the array(s).
At this point, I have the bug cornered. But I don't quite know yet how to vanquish it without perhaps causing collateral damage.
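One common way to corner this class of array index error is to route accesses through a checked accessor, so that a bad index fails loudly at the point of the mistake instead of corrupting memory and crashing somewhere unrelated later. The sketch below is only an illustration: `Grid`, `kWidth`, and `kHeight` are hypothetical names and sizes, not taken from the actual codebase. It uses an explicit test rather than `assert`, because `assert` is compiled out when `NDEBUG` is defined, which Release configurations typically do.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical fixed-size grid standing in for the kind of array involved;
// the names and dimensions are illustrative assumptions only.
constexpr std::size_t kWidth = 4;
constexpr std::size_t kHeight = 3;

class Grid {
public:
    // Checked access: signed parameters let the check catch indexes that
    // run off either end (negative as well as too large). Throws rather
    // than silently touching neighbouring memory.
    int& at(long x, long y) {
        if (x < 0 || y < 0 ||
            static_cast<std::size_t>(x) >= kWidth ||
            static_cast<std::size_t>(y) >= kHeight) {
            throw std::out_of_range("Grid index out of range: (" +
                                    std::to_string(x) + ", " +
                                    std::to_string(y) + ")");
        }
        return cells_[static_cast<std::size_t>(y) * kWidth +
                      static_cast<std::size_t>(x)];
    }

private:
    std::array<int, kWidth * kHeight> cells_{};  // zero-initialised storage
};
```

Because the check is ordinary code, it behaves the same in Debug and Release builds. The cost is a few comparisons per access, which may or may not be acceptable in hot loops.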
ORIGINAL: berto
Not the way I had planned to spend my weekend.
<sigh> [:(]

And beyond. A torturous last 54 hours it's been. [:(]

Until the next time ...