This kind of post REALLY needs to be put in a known issues section. It's detailed and has QUALITY all over it, thoroughly explains why the failure is happening, and most importantly does so without blowing rainbows and skittles up our collective poop chutes or making excuses.
While it doesn't make the bug any easier to play through, it does make it somewhat less frustrating to know what is going on and that a fix is inbound. It's beta, and bugs happen. Dealing with them head on with the community is always the better route, since keeping them under wraps just lets rumors blossom. Rumors and conspiracies left unaddressed are far more damaging in the long run.
More quality communications like this would go a Looooong way to repairing some of the discontent the community has harbored. The other example I'd give is the one Nick posted about the testing suite you guys are developing. That is the other post that stands out in my mind as being excellent, since it reminded me, at least, that you guys have to build the massive infrastructure an MMO requires as well as the game we see.
I'd compliment you to your boss on this excellent writeup, Karl, but I've no idea who you report to! Feel free to pass this on to him/her!
Karl Berg, on 23 April 2013 - 11:48 AM, said:
Mech input state is queued up into a 20 Hz stream of traffic sent from your system to the server. The server processes this state and relays it to all other clients. So for a 16 player game, you're sending one set of inputs up to the server and receiving inputs for 15 other players. For a 24 player game this is compounded: one set of inputs up and 23 sets of inputs down. This movement state traffic dominates all the other game traffic being sent in terms of bandwidth costs.
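To put rough numbers on that, here is a small sketch that counts input messages per second. It assumes one message per player per tick with no batching, which is a simplification; the function names are made up for illustration and are not from the game code.

```python
TICK_HZ = 20  # input rate quoted in the post


def per_client_rates(players, tick_hz=TICK_HZ):
    """Return (up, down) input messages per second for one client."""
    up = tick_hz                    # your own inputs going to the server
    down = tick_hz * (players - 1)  # everyone else's inputs coming back
    return up, down


def server_outbound(players, tick_hz=TICK_HZ):
    """Total input messages per second the server relays to all clients."""
    return players * (players - 1) * tick_hz


print(per_client_rates(16), server_outbound(16))
print(per_client_rates(24), server_outbound(24))
```

The per-client download grows linearly with player count, while the server's total relay load grows roughly with the square of it, which is why the server-to-client direction is the one worth optimizing.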
Taking a look at what is actually in that mech input state, you have some aim angles, throttle settings, jump jet status, torso turn settings, and a small collection of other essential states. At 20 Hz, a lot of that state doesn't change from one input to the next, so it's a fairly reasonable optimization to only send input values when they change, commonly referred to as delta compression. Most of our optimizations are focussed on the data being sent to you from the server, since that grows with player count.
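As a minimal sketch of the delta compression idea, assuming the input state can be modelled as a flat dictionary of fields: only fields that differ from a base state are sent, and the receiver rebuilds the full state by applying them on top of the same base. The field names here are hypothetical, and the real wire format is certainly more compact than a Python dict.

```python
def delta_encode(base, current):
    """Keep only the fields of `current` that differ from `base`."""
    return {k: v for k, v in current.items() if k not in base or base[k] != v}


def delta_decode(base, delta):
    """Rebuild the full state by applying `delta` on top of `base`."""
    state = dict(base)
    state.update(delta)
    return state


# Hypothetical field names, for illustration only.
prev = {"throttle": 0.8, "torso_yaw": 12.0, "jump_jets": False}
cur = {"throttle": 0.8, "torso_yaw": 13.5, "jump_jets": False}

delta = delta_encode(prev, cur)
print(delta)
print(delta_decode(prev, delta) == cur)
```

Decoding only works if the receiver holds the exact base state the sender encoded against, which is the bookkeeping problem the rest of the post deals with.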
To start, we add a sequence id to map transmitted movement state to a specific identifier. Now every move we send to the client is tagged with its sequence number. If this movement packet requires previous state to decompress, we add that base state identifier as well and delta compress the current state against that base state before transmission.
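The tagging described above could look something like this sketch. It assumes the sender keeps a history of the full states it has sent, keyed by sequence id; `MovePacket` and `MoveSender` are invented names, not anything from the actual netcode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MovePacket:
    seq: int                 # sequence id of this movement state
    base_seq: Optional[int]  # state the payload is a delta against, if any
    payload: dict            # full state, or only the changed fields


class MoveSender:
    def __init__(self):
        self.next_seq = 0
        self.sent = {}  # seq -> full state, so later deltas can use it as a base

    def send(self, state, base_seq=None):
        seq = self.next_seq
        self.next_seq += 1
        self.sent[seq] = dict(state)
        if base_seq is None or base_seq not in self.sent:
            # No usable base: send the full state, needs nothing to decode.
            return MovePacket(seq, None, dict(state))
        base = self.sent[base_seq]
        changed = {k: v for k, v in state.items() if k not in base or base[k] != v}
        return MovePacket(seq, base_seq, changed)


sender = MoveSender()
p0 = sender.send({"throttle": 0.5, "torso_yaw": 0.0})
p1 = sender.send({"throttle": 0.5, "torso_yaw": 4.0}, base_seq=p0.seq)
print(p0)
print(p1)
```

A real implementation would also prune old entries from the history; that is left out here to keep the sketch short.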
Well, we're using UDP, and all this traffic is unreliable and unordered. The underlying movement system is set up in such a way that it will reorder or simply reconstruct lost traffic over a very small window of time. If any received input is too old it's simply discarded.
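One way to picture the reorder-and-discard behaviour is a small buffer keyed on sequence id. This is only a sketch: the window here is counted in sequence ids rather than time, the size of 3 is an arbitrary assumption, and it does not attempt the reconstruction of lost inputs that the post mentions.

```python
class ReorderWindow:
    def __init__(self, window=3):
        self.window = window  # how far behind the newest seq we still accept
        self.newest = -1
        self.pending = {}     # seq -> state, not yet consumed

    def receive(self, seq, state):
        """Accept a packet unless it is too old. Returns True if kept."""
        if seq <= self.newest - self.window:
            return False  # too old, simply discarded
        self.pending[seq] = state
        self.newest = max(self.newest, seq)
        # Drop buffered entries that have now fallen out of the window.
        cutoff = self.newest - self.window
        for old in [s for s in self.pending if s <= cutoff]:
            del self.pending[old]
        return True

    def ordered(self):
        """Buffered states in sequence order, regardless of arrival order."""
        return [self.pending[s] for s in sorted(self.pending)]


w = ReorderWindow(window=3)
w.receive(5, "e")
w.receive(4, "d")           # arrived late but still inside the window
late = w.receive(1, "a")    # far too old, rejected
print(late, w.ordered())
```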
We still have to deal with the problem of knowing what states the client has received, so for every state the client receives, it sends back a really small 'state ack' packet containing a small identifier and the last received sequence id. The CryNetwork layer handles combining all these little tiny packets into optimally sized packets for transmission for us. Now on the server it's quite simple to always delta compress against the most recently ack'd state for each client.
It's the transmission of these ack packets in combination with small levels of packet loss that has messed things up. My guess is that our sending of lots of tiny little messages is incorrectly triggering flow control logic in the network layer, but it will take some digging to really track down where and why this is happening.
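The server side of that ack bookkeeping might be sketched as below. It assumes acks can arrive out of order (they ride on the same unreliable UDP traffic), so an older ack never overwrites a newer one. `AckTracker` and the client ids are hypothetical.

```python
class AckTracker:
    def __init__(self):
        self.last_acked = {}  # client_id -> newest sequence id that client ack'd

    def on_ack(self, client_id, seq):
        """Record a state ack, ignoring any that arrive out of order."""
        if seq > self.last_acked.get(client_id, -1):
            self.last_acked[client_id] = seq

    def base_for(self, client_id):
        """Sequence id to delta compress against, or None to send full state."""
        return self.last_acked.get(client_id)


tracker = AckTracker()
first = tracker.base_for("alice")  # nothing ack'd yet, so no base
tracker.on_ack("alice", 3)
tracker.on_ack("alice", 2)         # stale ack arriving late
tracker.on_ack("bob", 1)
print(first, tracker.base_for("alice"), tracker.base_for("bob"))
```

If acks stop arriving, the base stays stuck at an old sequence id and every delta gets larger, so ack loss shows up as extra downstream bandwidth rather than as an outright failure in this sketch.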
Small update on the hotfix: QA has it and is testing it out in the stable environment. If all goes well, the absolute earliest it might end up getting pushed out would be sometime tomorrow.
Edited by Esplodin, 24 April 2013 - 04:58 AM.