ORIGINAL: Big B
It seems to me that your test conditions listed below are rock solid for establishing a baseline. However, after establishing that baseline, how can one know what the effects of different aircraft ratings in different categories will be - unless one starts to throw in those different stats systematically?
Yup, that is exactly what one would do!
==
It might actually be of some use to discuss the path I followed a few years back. Note that this was with WITP, not AE.
I wanted to know which factors were "high leverage" in air-to-air combat and which were not. In other words, I wanted to know which factors, if changed a little, would make a big difference to the results, and which would not. So I set off down the road of making a sandbox.
Well, I made my first sandbox and the results were asymmetrical: even though most things were the same on both sides, I was not getting "flat" results. For some reason, the attacker (the sweeper) always won by somewhere between 3 to 1 and 5 to 1. So I assumed my sandbox was still not completely neutral with respect to some key variables, and I made some more things the same. The results were still asymmetrical, so I made even more things the same. Eventually everything was the same, as per my post above, and the results were still asymmetrical: the attacker always won by between 3 to 1 and 5 to 1.
So we went and looked in the code and, holy cow, we found the "sweep bonus": basically a 3 to 2 chance that the sweeper gets the bounce! This was the single most important factor in air-to-air combat in WITP, and it probably still is in AE.
Once we figured that out, I went back to testing and started varying things like firepower, durability, experience and maneuverability. IIRC, that was roughly their order of importance. Small changes in firepower mattered, but progressively larger changes were needed in durability, then experience, and finally maneuverability to dramatically influence the results.
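For anyone wanting to repeat this kind of one-factor-at-a-time testing, here is a minimal Python sketch of the bookkeeping, assuming you record a kill ratio (side A kills divided by side B kills) for each sandbox setup. Everything in it is hypothetical: the stat names, `toy_trial` and its weights are made up purely so the sketch has something to rank. The weights only echo the ordering reported above; they are not measured from WITP or AE, and this is not the game's combat code. In real testing you would replace `toy_trial` with the averaged kill ratio from your own sandbox turns.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FighterStats:
    # HYPOTHETICAL stat block: illustrative fields, not the game's database columns.
    firepower: float = 1.0
    durability: float = 1.0
    experience: float = 1.0
    maneuver: float = 1.0


# HYPOTHETICAL placeholder weights, invented for illustration only.
TOY_WEIGHTS = {"firepower": 4.0, "durability": 2.0, "experience": 1.5, "maneuver": 1.0}


def toy_trial(a: FighterStats, b: FighterStats) -> float:
    """Stand-in for 'run the sandbox many times and return A's kill ratio'.
    Identical stat blocks give a flat 1.0 result."""
    ratio = 1.0
    for name, weight in TOY_WEIGHTS.items():
        ratio *= (getattr(a, name) / getattr(b, name)) ** weight
    return ratio


def ofat_leverage(
    baseline: FighterStats,
    factors: List[str],
    delta: float,
    run_trial: Callable[[FighterStats, FighterStats], float],
) -> Dict[str, float]:
    """Bump one factor at a time on side A only, keep side B at the baseline,
    and report how far the result moves away from the baseline result."""
    base_result = run_trial(baseline, baseline)
    leverage = {}
    for name in factors:
        tweaked = replace(baseline, **{name: getattr(baseline, name) + delta})
        leverage[name] = run_trial(tweaked, baseline) - base_result
    return leverage


if __name__ == "__main__":
    base = FighterStats()
    lev = ofat_leverage(base, list(TOY_WEIGHTS), 0.1, toy_trial)
    # Print the highest-leverage factor first.
    for name, change in sorted(lev.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<11} {change:+.3f}")
```

Checking the baseline first (both sides identical, result 1.0) is the same "flat results" check as in my first sandbox. If that check fails in a real sandbox, something like the sweep bonus is still in play, and the leverage numbers will be skewed until it is accounted for.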
Interestingly, what many WITPers called the "Zero Bonus" was really the "Sweep Bonus". Someone (I can't recall who) actually did some exhaustive testing that showed the "Zero Bonus" made almost no difference. Zeros flying sweeps will still win big time in WITP, but that is because they are flying sweeps, not because of the "Zero Bonus".
So one possible explanation for lopsided results in AE is still the sweep bonus. If you think you are seeing lopsided results, try reversing the combat: put the "uber plane" on CAP, let the "nada-uber plane" sweep, and see whether the "uber effects" are taken down a notch.
Another key factor is/was the detection of the incoming raid. In WITP there was a mid-war Allied "radar bonus" which dramatically increased the chance of Allied fighters on CAP getting the bounce. In AE this is replaced by a mechanism where detecting the incoming raid makes either side more likely to scramble more fighters. Over time, though, this results in a significant Allied advantage in AE, as the Allies have many more and much better radars.
Another factor that seems to matter a lot in AE, though I have not done exhaustive testing on it, is range. There seems to be a big range attenuation factor for fighters: whether on sweep or escort, the farther fighters fly, the worse they seem to do. I've seen P-38s on sweep get clobbered by Oscars on CAP when the P-38s were flying super long-range missions.
So, at least based on my experience in AE, if you want your fighters to win, fly short-ranged sweeps (or get your opponent to fly long-ranged ones) and have lots of firepower and durability. And if you're on CAP, have lots of long-range radar! A bit of experience won't hurt.