
Malformed Packet Causing A Game Drop


164 replies to this topic

#141 DeathofSelf

    Member

  • Bridesmaid
  • 655 posts
  • Location: Chicago

Posted 02 June 2013 - 04:23 PM

TyR, on 02 June 2013 - 03:56 PM, said:

Since there are often people who do not read previous posts or check other areas of the forums for information, I will again quote what was said in the Ask the Devs Answers. So, yes, they are aware of the problem. Yes, they are addressing it. Complaining that the game is broken (again), or that the devs are clueless and possibly ignoring your plight, does little good. Since this appears to be another CryEngine issue, expect that it probably took a little time to track down, as did the HUD issue that was recently resolved.


I used to be supportive like you, then I took a PGI to the knee

#142 TyR

    Member

  • 133 posts
  • Twitch: Link
  • Location: IL

Posted 02 June 2013 - 07:03 PM

DeathofSelf, on 02 June 2013 - 04:23 PM, said:


I used to be supportive like you, then I took a PGI to the knee

Not trying to be supportive necessarily, just realistic. Given I am often on the receiving end of the idea that a developer can just snap his fingers or wave a magic wand to make a problem go away or add a "simple" new feature, I do not expect the issue to be resolved instantly even if it is now identified and fully understood.

#143 Butane9000

    Member

  • Elite Founder
  • 2,788 posts
  • Location: Georgia

Posted 02 June 2013 - 07:46 PM

Man, I'm not sure what's going on with this particular bug anymore. It was previously really, really bad for 2-3 days after a new patch. Then it would be intermittent for the next week or so before finally going away until the next patch. However, for the last 2 days I haven't been able to play at all. Every match I launch crashes due to this bug.

#144 R 13

    Member

  • WC 2017 Bronze Champ
  • 56 posts

Posted 02 June 2013 - 07:55 PM

Yeah, I feel a bit like a lemming at this point. I try about every night. 3 drops. Occasionally I'll be able to complete a match, or get tantalizingly close. Usually, though, it crashes in the first minute or two three times in a row, and I quit so I don't keep messing up others' games.

I guess I'll have to do something more productive or give Hawken a whirl until I hear this is fixed. Love the game, it just doesn't love me right now.

#145 Avengar

    Member

  • 98 posts

Posted 02 June 2013 - 09:47 PM

but when's release? if it's a year away that's a LONG time

#146 DeathofSelf

    Member

  • Bridesmaid
  • 655 posts
  • Location: Chicago

Posted 03 June 2013 - 06:06 AM

TyR, on 02 June 2013 - 07:03 PM, said:

Not trying to be supportive necessarily, just realistic. Given I am often on the receiving end of the idea that a developer can just snap his fingers or wave a magic wand to make a problem go away or add a "simple" new feature, I do not expect the issue to be resolved instantly even if it is now identified and fully understood.


Fair enough, but, the fact that we are this close to launch/this far into beta and this stuff is still happening is unacceptable. I understand that this stuff isn't easy but PGI has shown time and time again that they have some pretty horrid QA and are not really on the ball when it comes to bugs.

Edited by DeathofSelf, 03 June 2013 - 06:07 AM.


#147 VvFreezervV

    Rookie

  • 5 posts

Posted 03 June 2013 - 02:06 PM

gameadmin, on 02 June 2013 - 02:27 PM, said:

UDP is a dangerous protocol to use for LOGIC; it's only meant for lossy streaming (audio, video) and can be spoofed (UDP has no security at all of any type). Also UDP can arrive in ANY ORDER, by design of internet. Obviously some insane notions on UDP and networking are tainting this app and a systems level engineer is needed on the development team.

PLEASE FIX THIS HORRIBLE BUG! or at least let team mates know that the client crashed out and why. Or let users choose to have only TCP/IP packets instead of UDP "disposable" packets, as a fallback measure if your app notices disconnects.

In FIFO order of wrongness, let's try to educate you...

A) UDP is the primary protocol used for streaming, precisely because it doesn't automatically retransmit

B) TCP has no security of any type either (no, a 16-bit checksum and a sequence number aren't security)

C) Both UDP and TCP can arrive in any order ('by design of internet!') - TCP just reorders itself before being passed up the network stack

D) Both UDP and TCP are merely containers for any data you want to put in them: meaning, you can put a TCP packet in a UDP packet (/so you can TCP while you UDP :D )

E) Why would you want to do (D)? So that the application has control over networking.

F) Why is (E) important? Because if MWO dropped a packet on a TCP connection, the entire connection would block until that packet was retransmitted and received! (Which means that data from 1 second ago would need to be received before you get the data from now... which... words cannot express how wrong that is)

G) Why is (F) important? Because now both you and the servers are wasting time sending you data that you don't even want anymore.
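Points (F) and (G) can be illustrated with a toy simulation. This is only a sketch, not MWO's or CryEngine's netcode: the function names and the tick-based model are invented for illustration. It compares strict in-order delivery (TCP-like) with "newest state wins" delivery (the kind of policy a game can layer on top of UDP) when packet 1 is lost and its retransmission arrives late.

```python
def in_order_delivery(arrivals):
    """TCP-like: hand packets to the application strictly in sequence order.

    arrivals: list of (arrival_tick, seq) pairs, with seq starting at 0.
    Returns a list of (delivery_tick, seq) pairs.
    """
    delivered = []
    buffered = set()
    next_seq = 0
    for tick, seq in sorted(arrivals):
        buffered.add(seq)
        # Release everything that has become contiguous.
        while next_seq in buffered:
            delivered.append((tick, next_seq))
            buffered.remove(next_seq)
            next_seq += 1
    return delivered


def latest_state_delivery(arrivals):
    """Game-style policy over UDP: deliver on arrival, and drop anything
    older than the newest state already seen."""
    delivered = []
    newest = -1
    for tick, seq in sorted(arrivals):
        if seq > newest:
            delivered.append((tick, seq))
            newest = seq
    return delivered


# Packet 1 is lost; its retransmission only shows up at tick 10.
arrivals = [(0, 0), (2, 2), (3, 3), (4, 4), (10, 1)]

print(in_order_delivery(arrivals))
# [(0, 0), (10, 1), (10, 2), (10, 3), (10, 4)]
print(latest_state_delivery(arrivals))
# [(0, 0), (2, 2), (3, 3), (4, 4)]
```

With in-order delivery, packets 2 to 4 sit in a buffer until tick 10 even though they arrived at ticks 2 to 4. With the newest-state policy they are usable immediately, and the stale packet 1 is simply discarded.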


gameadmin, on 02 June 2013 - 02:27 PM, said:

Finally, the game is really fun, but the definition of BETA means "no known bugs at time of release of specifically enumerated version, and feature complete"; the word used in the industry, and codified by the esteemed Darin Adler in a tech paper for Apple decades ago, is "Alpha". Too many people misuse "beta" and resort to idiotic post-beta naming such as "release candidate", "release candidate 2", "golden release", etc. The ACTUAL lifecycle of code naming is v3.2.1d (development), then v3.4.2a (Alpha), then v3.4.5BETA; then, after beta testers are done and still no bugs found, it is promoted to all as v.3.4.5 or such. The numbers are irrelevant in my examples.

This code is not "beta". It is clear from web sites that it has known bugs in most releases, and therefore it is alpha code still.

It is very fun and very impressive and amazing work, but it is indeed not beta quality yet.

"beta" should and usually means NO KNOWN BUGS AT TIME OF DESIGNATION OF BUILD, AND FEATURE COMPLETE

I don't think beta means what you think it means. :)

#148 Screech

    Member

  • Knight Errant
  • 2,290 posts

Posted 03 June 2013 - 02:30 PM

Got dropped 6 times in a row today. Using PingPlotter to 141.136.110.9 I got up to 40% packet loss at 77.67.71.145. Guess I will go back to using a VPN as it does seem to work, just annoying. I'm also usually a lower-ping player, 30-50.

#149 DeathofSelf

    Member

  • Bridesmaid
  • 655 posts
  • Location: Chicago

Posted 03 June 2013 - 06:19 PM

Still totally unable to even get past the start up... You know what PGI? I'm done. Maybe I will give it another try after launch, but the lack of communication shows you really don't give half a rats ***, I have not heard one f-ing word from you. Thank you for destroying my favorite franchise.

#150 Karl Berg

    Technical Director

  • 497 posts
  • Location: Vancouver

Posted 03 June 2013 - 08:51 PM

Hey guys, it's worth at least a brief explanation about what's been going on.

A change last patch, which was intended to address huge packet send delays induced by small amounts of packet loss, ended up causing very large numbers of small packets to be sent by mistake, and due to other bugs this was not detected until too late.

The huge latency under minimal packet loss turned out to have two causes. First, we found that CryNetwork implements a form of flow control designed to detect and correct for bad network conditions. This flow control would rapidly throttle back the size of sent packets to ridiculously low values, 13-byte payloads in fact, and would take several tens of seconds to recover. Second, we determined that a core loop designed to flush outstanding messages to the socket was not iterating correctly, causing the system to send only a single packet per game update loop. The teeny packet sizes, combined with the single packet per frame, caused traffic to get extremely backlogged and induced huge delays in the network layer.
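To get a feel for why tiny payloads plus one packet per frame produce a backlog, here is a back-of-the-envelope sketch. Only the 13-byte payload and the one-packet-per-frame limit come from the post; the 30 updates per second, the 4 KB backlog, and the 1200-byte "healthy" payload are assumed numbers picked for illustration.

```python
import math


def drain_time_seconds(backlog_bytes, payload_bytes, packets_per_frame, frames_per_second):
    """Time needed to push a backlog out when every frame may send only
    `packets_per_frame` packets of at most `payload_bytes` each."""
    packets_needed = math.ceil(backlog_bytes / payload_bytes)
    frames_needed = math.ceil(packets_needed / packets_per_frame)
    return frames_needed / frames_per_second


# Assumed figures: 4 KB of queued messages, 30 game updates per second.
throttled = drain_time_seconds(4096, payload_bytes=13, packets_per_frame=1, frames_per_second=30)
healthy = drain_time_seconds(4096, payload_bytes=1200, packets_per_frame=1, frames_per_second=30)

print(f"throttled: {throttled:.1f} s, healthy: {healthy:.2f} s")
```

Under these assumptions the throttled case can move at most 13 × 30 = 390 bytes per second, so a few kilobytes of queued data take on the order of ten seconds to drain, and that is before any new data is added to the queue.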

We 'corrected' the send loop by fixing their main packet loop to at least iterate until all pending messages had been sent. Unfortunately the send queue could block messages, causing the loop to iterate too many times, causing small micro packets to get transmitted.
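The difference between "drain the queue" and "drain the queue into sensibly sized packets" can be sketched as follows. This is not the CryNetwork send loop, whose code is not shown anywhere in this thread; the function names, the deque-based queue, and the 1200-byte limit are all assumptions made for the sake of the example.

```python
from collections import deque


def flush_coalesced(pending, max_payload):
    """Drain `pending` (a deque of bytes objects), packing as many whole
    messages as fit into each packet. Assumes no single message exceeds
    `max_payload`."""
    packets = []
    current = bytearray()
    while pending:
        msg = pending.popleft()
        # Start a new packet when this message would overflow the current one.
        if current and len(current) + len(msg) > max_payload:
            packets.append(bytes(current))
            current = bytearray()
        current += msg
    if current:
        packets.append(bytes(current))
    return packets


def flush_one_per_iteration(pending):
    """Drain `pending` by emitting one packet per loop iteration, however
    small the message is. This is the micro-packet pattern."""
    packets = []
    while pending:
        packets.append(pending.popleft())
    return packets


messages = [b"x" * 10 for _ in range(100)]

print(len(flush_coalesced(deque(messages), max_payload=1200)))   # 1
print(len(flush_one_per_iteration(deque(messages))))             # 100
```

Both loops empty the queue, but the second one turns 1000 bytes of data into 100 separate packets, each of which pays its own header cost and consumes its own sequence number.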

Now normally we would have caught this, we monitor network traffic very closely. It turns out, unfortunately, that while the CryNetwork traffic metrics correctly monitor received traffic data usage, the send traffic data usage is bugged and was not accounting for the overhead these tiny packets were causing.
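One plausible reading of "not accounting for the overhead" is a counter that sums payload bytes only. The sketch below shows how far such a counter can drift from what actually goes on the wire. The 20-byte minimum IPv4 header and the 8-byte UDP header are standard; everything else (function names, the choice to ignore link-layer framing and any engine-level headers) is an assumption for illustration.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options
UDP_HEADER_BYTES = 8


def payload_only_bytes(payload_sizes):
    """What a counter sees if it sums payload sizes and nothing else."""
    return sum(payload_sizes)


def wire_bytes(payload_sizes):
    """Payload plus per-packet IPv4 and UDP headers (link layer ignored)."""
    per_packet_overhead = IPV4_HEADER_BYTES + UDP_HEADER_BYTES
    return sum(size + per_packet_overhead for size in payload_sizes)


micro = [13] * 100   # 1300 bytes sent as 100 tiny packets
single = [1300]      # the same 1300 bytes in one packet

print(payload_only_bytes(micro), wire_bytes(micro))    # 1300 4100
print(payload_only_bytes(single), wire_bytes(single))  # 1300 1328
```

A payload-only counter reports the same 1300 bytes in both cases, while the micro-packet case actually puts roughly three times as many bytes on the wire.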

We have corrected both our buggy fix to the send loop, and the network data usage counters for this next patch coming out tomorrow. This should fix the increased latency some users are experiencing, as well as the erroneous DoS detection that this bug caused.

This leads me to the disconnect issue.

While it is extremely unfortunate that the previous bug made its way to production, it turns out that this allowed us to isolate and address the major bug that was causing the disconnect-to-MechLab issues. It turns out that this whole time, at the very lowest layer, CryNetwork has been using only a single byte for packet sequence IDs. This is an extremely small size, providing only 256 possible sequence values; and we've determined that if there is a large change in connection latency causing these sequence numbers to overflow, the engine detects this as a 'malformed packet' error and forces a disconnect.

The large number of small packets introduced with last patch caused the network layer to burn through these sequence numbers at a much higher rate, hence the increased number of disconnects.

We have now doubled this sequence number size from 1 byte to 2, or from a total of 256 possible sequence values to 65536. This increases the engine's tolerance for delayed packets from a second or two at most to something far more sane, closer to 4 or 5 minutes in fact.
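The arithmetic behind those numbers can be checked with a short sketch. The helper names are invented, and the rate of 256 packets per second is an assumption chosen so that a 1-byte ID space lasts about one second, which matches the "second or two" figure; the engine's real packet rate and its exact malformed-packet check are not given in the post.

```python
def wire_id(packet_index, id_bits):
    """The sequence ID that actually travels in the packet: the running
    packet index truncated to `id_bits` bits."""
    return packet_index % (1 << id_bits)


def wrap_time_seconds(id_bits, packets_per_second):
    """How long it takes to cycle through the whole ID space once."""
    return (1 << id_bits) / packets_per_second


# Packets 10 and 266 are 256 packets apart.
print(wire_id(10, 8) == wire_id(266, 8))    # True: identical 1-byte IDs
print(wire_id(10, 16) == wire_id(266, 16))  # False: distinct 2-byte IDs

# Assumed rate of 256 packets per second.
print(wrap_time_seconds(8, 256))            # 1.0 second
print(wrap_time_seconds(16, 256) / 60)      # about 4.27 minutes
```

Going from 8 to 16 bits multiplies the ID space by 256, so whatever the real packet rate is, the tolerated delay grows by the same factor: about one second becomes about 256 seconds, which lines up with the "4 or 5 minutes" quoted above.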

This change is also coming out next patch.

I greatly apologize for all the grief these bugs have caused. It's been a pretty rocky set of patches up to now as we've tried to iron out all the issues we've been having with this engine. In addition to the fixes listed above, we've added a whole new set of test conditions to the QA test plan designed to catch and prevent similar issues. Already these new tests have caught some issues on login that users with very poor connections may have been experiencing. We have done our best to address these login issues with tomorrow's patch as well.

#151 Deathlike

    Member

  • Littlest Helper
  • 29,240 posts
  • Location: #NOToTaterBalance #BadBalanceOverlordIsBad

Posted 03 June 2013 - 09:00 PM

Now, that's a good thorough post.

Edit:

For Karl, here's some followup questions:

How much bandwidth is needed for 12v12 play for both client and server?

With respect to compression that was introduced recently, how much has that impacted the demand/requirements?

Are more optimizations necessary to make this occur and/or how close should the requirements need to be to make it viable?

Edited by Deathlike, 03 June 2013 - 09:09 PM.


#152 King Arthur IV

    Member

  • Moderate Giver
  • 2,549 posts

Posted 03 June 2013 - 09:13 PM

"""" This is an extremely small size, providing only 256 possible sequence values; and we've determined that if there is a large change in connection latency causing these sequence numbers to overflow, the engine detects this as a 'malformed packet' error and forces a disconnect. """"

this is the only part i understood!! ;) and im pretty sure this is my problem since im all the way in Australia. finally a patch that has got me excited since April, i may Actually get to play after 2 months of crashing. :angry: :excl: :excl:

#153 Deathlike

    Member

  • Littlest Helper
  • 29,240 posts
  • Location: #NOToTaterBalance #BadBalanceOverlordIsBad

Posted 03 June 2013 - 09:26 PM

King Arthur IV, on 03 June 2013 - 09:13 PM, said:

this is the only part i understood!! ;) and im pretty sure this is my problem since im all the way in Australia. finally a patch that has got me excited since April, i may Actually get to play after 2 months of crashing. :angry: :excl: :excl:


Let me break this down, for n00bs to understand.

Karl Berg, on 03 June 2013 - 08:51 PM, said:

Hey guys, it's worth at least a brief explanation about what's been going on.


It demands its own Command Chair post, because it's epic sounding.

Quote

A change last patch, which was intended to address huge packet send delays induced by small amounts of packet loss, ended up causing very large numbers of small packets to be sent by mistake, and due to other bugs this was not detected until too late.

The huge latency under minimal packet loss turned out to have two causes. First, we found that CryNetwork implements a form of flow control designed to detect and correct for bad network conditions. This flow control would rapidly throttle back the size of sent packets to ridiculously low values, 13-byte payloads in fact, and would take several tens of seconds to recover. Second, we determined that a core loop designed to flush outstanding messages to the socket was not iterating correctly, causing the system to send only a single packet per game update loop. The teeny packet sizes, combined with the single packet per frame, caused traffic to get extremely backlogged and induced huge delays in the network layer.


Packet loss is bad (the packet doesn't reach its destination and/or changes in transit, which is all bad), so what computers tend to do is resize the packet into smaller chunks. Think of it like eating food: some people can swallow big chunks of food like a vacuum, others use knives and cut them into smaller pieces so they fit their mouths.

The problem is the interaction with the changes made to address packet loss, which made certain connections and people's pings change wildly. You don't want to keep sending tiny packets; it's bad for the network (think 56k if you will). You want to send the biggest packet as OFTEN as possible and as RELIABLY as possible. That's part of writing good netcode that works with its environment.

Quote

We 'corrected' the send loop by fixing their main packet loop to at least iterate until all pending messages had been sent. Unfortunately the send queue could block messages, causing the loop to iterate too many times, causing small micro packets to get transmitted.

Now normally we would have caught this, we monitor network traffic very closely. It turns out, unfortunately, that while the CryNetwork traffic metrics correctly monitor received traffic data usage, the send traffic data usage is bugged and was not accounting for the overhead these tiny packets were causing.

We have corrected both our buggy fix to the send loop, and the network data usage counters for this next patch coming out tomorrow. This should fix the increased latency some users are experiencing, as well as the erroneous DoS detection that this bug caused.


Simply put, they had to spend more time making sure their network detection tools are up to par with their changes in their code.

Quote

This leads me to the disconnect issue.

While it is extremely unfortunate that the previous bug made its way to production, it turns out that this allowed us to isolate and address the major bug that was causing the disconnect-to-MechLab issues. It turns out that this whole time, at the very lowest layer, CryNetwork has been using only a single byte for packet sequence IDs. This is an extremely small size, providing only 256 possible sequence values; and we've determined that if there is a large change in connection latency causing these sequence numbers to overflow, the engine detects this as a 'malformed packet' error and forces a disconnect.

The large number of small packets introduced with last patch caused the network layer to burn through these sequence numbers at a much higher rate, hence the increased number of disconnects.

We have now doubled this sequence number size from 1 byte to 2, or from a total of 256 possible sequence values to 65536. This increases the engine's tolerance for delayed packets from a second or two at most to something far more sane, closer to 4 or 5 minutes in fact.


Ah... built in limitations are bad. I assume the compression made this numbering a lot more sensitive.

Quote

This change is also coming out next patch.

I greatly apologize for all the grief these bugs have caused. It's been a pretty rocky set of patches up to now as we've tried to iron out all the issues we've been having with this engine. In addition to the fixes listed above, we've added a whole new set of test conditions to the QA test plan designed to catch and prevent similar issues. Already these new tests have caught some issues on login that users with very poor connections may have been experiencing. We have done our best to address these login issues with tomorrow's patch as well.


I was suffering, so I'm thrilled that this is being resolved.

Edited by Deathlike, 03 June 2013 - 09:27 PM.


#154 zazz0000

    Member

  • 232 posts

Posted 03 June 2013 - 09:36 PM

Rockin' cool PGI. And looks like you guys have been staying up late working on this. +1.

Now throw in a buff for flamers and lbx into tomorrow's patch and we call it even?

#155 King Arthur IV

    Member

  • Moderate Giver
  • 2,549 posts

Posted 03 June 2013 - 09:42 PM

Deathlike, on 03 June 2013 - 09:26 PM, said:


Let me break this down, for n00bs to understand................


why sank you kind sir.......

your location has the word 4sshole in it huehuehuehue

Edited by King Arthur IV, 03 June 2013 - 09:43 PM.


#156 DeathofSelf

    Member

  • Bridesmaid
  • 655 posts
  • Location: Chicago

Posted 03 June 2013 - 10:22 PM

Thank you for the response, PGI, was that so hard? Hopefully this fixes the issues people are seeing.

#157 Deathlike

    Member

  • Littlest Helper
  • 29,240 posts
  • Location: #NOToTaterBalance #BadBalanceOverlordIsBad

Posted 03 June 2013 - 10:28 PM

DeathofSelf, on 03 June 2013 - 10:22 PM, said:

Thank you for the response, PGI, was that so hard? Hopefully this fixes the issues people are seeing.


Better him than Paul or Bryan giving one word/sentences/responses that provide little context/value/meaning to the issue.

#158 Dude42

    Member

  • 530 posts
  • Location: FL, USA

Posted 03 June 2013 - 10:48 PM

LOL. I went to buy some MC just to like "reward" PGI for this update (and because I'm getting low on mechbays...) and didn't think about the fact that I was still on the Cyberghost VPN when I went to do the payment. Got declined, likely legitimately, because the IP address I appeared to be on was in a much different state than the card address, which apparently locks you completely out of purchasing MC using ANY method for 48 hours (now even PayPal declines... after being approved by PayPal, go figure). So yea.... I guess they won't be getting their reward, lol.

#159 stjobe

    Member

  • Legendary Founder
  • 9,498 posts
  • Location: On your six, chipping away at your rear armour.

Posted 04 June 2013 - 01:18 AM

Karl Berg, on 03 June 2013 - 08:51 PM, said:

Hey guys, it's worth at least a brief explanation about what's been going on.

Thank you Karl, and I must say that your definition of "brief explanation" is much more appealing to me than the AtD "in-depth" standard non-answer of "no plans at the moment".

Would you mind smacking the other devs upside the head until they also subscribe to your definition of "brief explanation"?

Again, this is the kind of communication that will build you a solid and loyal community - please make sure that the others understand that.

Oh, and good work on finding and fixing the bug(s)!

#160 MentalPatient

    Member

  • 145 posts

Posted 04 June 2013 - 01:44 AM

Really hope this fixes the CONSTANT disconnects.




