Karl Berg, on 23 April 2013 - 11:48 AM, said:
Maybe; the failure cases for this issue are usually much more extreme, but please let us know if the hotfix makes any difference.
... no comment
@Rixsaw
Not quite. Here is what was done:
Mech input state is queued up into a 20 hz stream of traffic sent from your system to the server. The server processes this state and relays it to all other clients. Obviously then, for a 16 player game, you're sending one set of inputs up to the server, and receiving inputs for 15 other players. For a 24 player game this is compounded, one set of inputs up and 23 sets of inputs down. This movement state traffic dominates all the other game traffic being sent in terms of bandwidth costs.
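To make the asymmetry concrete, here is a back-of-the-envelope sketch of how downstream movement traffic scales with player count. The 40-byte state size is an illustrative assumption, not a figure from the post; only the 20 Hz rate and player counts come from the description above.

```python
TICK_RATE_HZ = 20   # input stream rate from the post
STATE_BYTES = 40    # assumed size of one uncompressed input state (illustrative)

def downstream_bytes_per_sec(players):
    """Each client receives inputs for every *other* player in the match."""
    return (players - 1) * STATE_BYTES * TICK_RATE_HZ

print(downstream_bytes_per_sec(16))  # 15 peers' worth of state per second
print(downstream_bytes_per_sec(24))  # 23 peers' worth of state per second
```

Upstream stays constant (one input set) while downstream grows linearly with player count, which is why the server-to-client direction dominates.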
Taking a look at what is actually in that mech input state, you have some aim angles, throttle settings, jump jet status, torso turn settings, and a small collection of other essential states. At 20 hz, a lot of that state doesn't change from one input to the next, so it's a fairly reasonable optimization to only send input values when they change, commonly referred to as delta compression. Most of our optimizations are focussed on the data being sent to you from the server, since that grows with player count.
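The delta-compression idea described above can be sketched in a few lines: send only the fields whose values changed, and rebuild the full state on the other side by applying the diff to a known base. The field names are made up for illustration.

```python
def delta_encode(base, curr):
    """Return only the fields that changed relative to the base state."""
    return {k: v for k, v in curr.items() if base.get(k) != v}

def delta_decode(base, delta):
    """Rebuild the full state by applying a delta on top of the base state."""
    state = dict(base)
    state.update(delta)
    return state

prev = {"throttle": 0.8, "aim_yaw": 12.0, "jump_jets": False}
curr = {"throttle": 0.8, "aim_yaw": 13.5, "jump_jets": False}
print(delta_encode(prev, curr))  # only aim_yaw changed
```

At 20 Hz most fields are unchanged tick-to-tick, so the delta is usually a small fraction of the full state.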
To start, we tag each transmitted movement state with a sequence id, so every move we send to the client carries its own sequence number. If decompressing a movement packet requires previous state, we include that base state's identifier as well, and delta compress the current state against that base state before transmission.
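A packet built along those lines might look like the sketch below: a sequence id, the id of the base state the delta was computed against (with a sentinel for "no base, this is a full keyframe"), and the diff itself. All names here are illustrative, not CryEngine's.

```python
from dataclasses import dataclass

@dataclass
class MovePacket:
    seq: int        # sequence id of this movement state
    base_seq: int   # id of the base state the delta was built from (-1 = full keyframe)
    delta: dict     # only the fields that changed relative to the base

def make_packet(seq, state, base_seq=-1, base_state=None):
    """Tag a state with its sequence id; if a base is known, send only the diff."""
    if base_state is None:
        return MovePacket(seq, -1, dict(state))  # no shared base: send everything
    delta = {k: v for k, v in state.items() if base_state.get(k) != v}
    return MovePacket(seq, base_seq, delta)
```

The base id is what lets the receiver know exactly which earlier state to apply the diff against, even when packets arrive out of order.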
Well, we're using UDP, so all this traffic is unreliable and unordered. The underlying movement system is set up in such a way that it will reorder or simply reconstruct lost traffic over a very small window of time. If any received input is too old, it's simply discarded.
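A toy receiver illustrating that reorder-and-discard behavior might look like this. The window size and the fallback to an empty base when the base state is missing are assumptions for the sketch; a real implementation would request a fresh keyframe instead.

```python
class MoveReceiver:
    """Toy reorder window: keep recently received full states keyed by
    sequence id, drop anything older than the window."""

    WINDOW = 8  # assumed window size, in sequence numbers

    def __init__(self):
        self.latest = -1
        self.states = {}  # seq -> reconstructed full state

    def receive(self, seq, base_seq, delta):
        if self.latest - seq >= self.WINDOW:
            return None  # too old relative to what we've already seen: discard
        base = self.states.get(base_seq, {})  # real code would recover, not default
        state = {**base, **delta}
        self.states[seq] = state
        self.latest = max(self.latest, seq)
        return state
```

Because each state can be rebuilt from any retained base, a late packet inside the window still decompresses correctly; only genuinely stale input is thrown away.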
We still have to deal with the problem of knowing which states the client has received, so for every state the client receives, it sends back a really small 'state ack' packet containing an identifier and the last received sequence id. The CryNetwork layer handles combining all these tiny packets into optimally sized packets for transmission for us. Now on the server it's quite simple to always delta compress against the most recently ack'd state for each client.
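The server-side bookkeeping described above can be sketched as follows: remember the newest state each client has acknowledged, and diff every new state against that base. Class and method names are illustrative, not from the actual codebase.

```python
class AckTracker:
    """Per-client sketch: track the newest ack'd state and delta-compress
    new states against it."""

    def __init__(self):
        self.history = {}  # seq -> full state we previously sent to this client
        self.acked = -1    # newest sequence id this client has confirmed

    def on_ack(self, seq):
        self.acked = max(self.acked, seq)

    def encode(self, seq, state):
        self.history[seq] = dict(state)
        base = self.history.get(self.acked)
        if base is None:
            return seq, -1, dict(state)  # nothing ack'd yet: send a full keyframe
        delta = {k: v for k, v in state.items() if base.get(k) != v}
        return seq, self.acked, delta
```

Diffing against an ack'd base (rather than just the previous tick) means a lost packet never leaves the client unable to decompress: the server only ever references states it knows arrived.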
It's the transmission of these ack packets in combination with small levels of packet loss that have messed things up. My guess is that our sending of lots of tiny little messages is incorrectly triggering flow control logic in the network layer, but it will take some digging to really track down where and why this is happening.
Small update on the hotfix, QA has it and is testing it out in the stable environment. If all goes well, the absolute earliest it might end up getting pushed out would be sometime tomorrow.
Awesome, Karl! Your explanation was a bit more complicated than I necessarily needed, but I appreciate your willingness to flex your technical muscle.
I am, among other things, a voice engineer, so we often have to deal with the same problems you do.
Video and voice conference calling also use 20 Hz packetization, with a 150 ms stream buffer. The reason you use these is that most humans won't really notice voice that arrives 150 ms late. Maybe in the game the lag tolerance has to be tighter, so that may be where the issue is. Anyway, it looks like you fixed it, good job.
Basically the netcode is a voice conference call
Each user sends their audio up, the server combines it, and sends the official outbound audio to all participants.
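The combining step in that analogy amounts to mixing one audio frame from each participant. Here is a toy mixer that sums 16-bit samples and clamps to range; it is a minimal sketch of the idea, not a real conferencing codec.

```python
def mix(frames):
    """Combine one frame of 16-bit samples from each participant by
    summing per-sample and clamping to the signed 16-bit range."""
    length = max(len(f) for f in frames)
    out = []
    for i in range(length):
        total = sum(f[i] for f in frames if i < len(f))
        out.append(max(-32768, min(32767, total)))
    return out

print(mix([[1000, 2000], [500, -500]]))  # two participants, one frame each
```

The server then sends this single mixed frame to everyone, much like the game server relays one combined view of all players' movement state.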