
Crysis 3 Cpu Benchmarks - Vengance Of The Multicore


28 replies to this topic

#1 Thorqemada

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • 6,394 posts

Posted 21 February 2013 - 04:45 PM

Results of Crysis 3 CPU Benchmark test - Multi Cores rule - Dual Cores suck:

http://www.dslreport...e-of-Multi-Core

#2 Vulpesveritas

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,003 posts
  • Location: Wisconsin, USA

Posted 21 February 2013 - 07:32 PM

Posted Image


Well, that FX-8350 should now go into every purpose-built Crysis 3 PC, I think.

#3 Bad Karma 308

    Member

  • PipPipPipPipPipPip
  • Legendary Founder
  • 411 posts

Posted 21 February 2013 - 11:26 PM

What you're really looking at is how deep the multi-threading support in CryEngine 3 goes. It looks like it was built in pretty much from the ground up, which is really amazing when you consider how far programmers have come in the last few years. As little as 5-6 years back you couldn't find many programmers even willing to jump into the multi-threading waters.

What a lot of people don't realize is how involved programming for multiple cores can be.
Here is a great article that I use to help explain:

_____________________________________________________________________________

http://ashishkhandel...ramming-part-1/

Race Condition

A race condition occurs when two or more threads are able to access shared data and they try to change it at the same time. Because the thread scheduling algorithm can swap between threads at any point, we cannot know the order in which the threads will attempt to access the shared data. Therefore, the result of the change in data is dependent on the thread scheduling algorithm, i.e. both threads are ‘racing’ to access/change the data.

Often problems occur when one thread does a “check-then-act” (e.g. “check” if the value is X, and then “act” to do something that depends on the value being X) and another thread does something to the value in between the “check” and the “act”.

To prevent race conditions from occurring, you typically put a lock around the shared data to ensure that only one thread can access it at a time.
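
To make the check-then-act hazard concrete, here is a minimal Python sketch (my own illustration, not from the linked article): several threads read a shared counter and write back an incremented value without a lock, so updates can be lost; wrapping the same update in a lock gives the expected total.

```python
# Minimal sketch of a check-then-act race (illustrative, not from the article).
# Whether lost updates actually show up on a given run depends on how the
# scheduler interleaves the threads; that unpredictability is exactly the point.
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(iterations):
    global counter
    for _ in range(iterations):
        value = counter        # "check": read the shared value
        counter = value + 1    # "act": write back, possibly clobbering another thread's update

def safe_increment(iterations):
    global counter
    for _ in range(iterations):
        with lock:             # only one thread at a time may touch the shared data
            counter += 1

def run(worker, iterations=100_000, nthreads=4):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,)) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print("without lock:", run(unsafe_increment), "(expected 400000)")
print("with lock:   ", run(safe_increment), "(expected 400000)")
```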

Deadlock

A deadlock occurs when two or more processes/threads are unable to proceed because each is waiting for one of the others to do something.

A common example is a program communicating to a server, which may find itself waiting for output from the server before sending anything more to it, while the server is similarly waiting for more input from the controlling program before outputting anything.

Another common example is when each process is trying to send data to the other, but all buffers are full because neither is reading anything.

Another example, common in database programming, is two processes that are sharing some resource (e.g. read access to a table) but then both decide to wait for exclusive (e.g. write) access.

A real-world example is an interaction between two people who meet in a narrow corridor and each waits for the other to pass first; neither moves, so neither gets through.
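
Here is a minimal Python sketch of the classic two-lock deadlock (my own illustration, assuming nothing beyond the standard library): each thread takes one lock and then waits for the one the other thread already holds. Acquiring locks in a single agreed-upon order is the usual way to avoid this.

```python
# Minimal sketch: two threads acquire the same two locks in opposite order.
# The sleep widens the window so they reliably end up waiting on each other
# forever; the timed joins below just keep the demo from hanging.
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker(first, second, name):
    with first:
        time.sleep(0.1)              # let the other thread grab its first lock
        print(name, "waiting for its second lock...")
        with second:                 # blocks forever: the other thread holds it
            print(name, "got both locks")

t1 = threading.Thread(target=worker, args=(lock_a, lock_b, "t1"), daemon=True)
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, "t2"), daemon=True)  # opposite order
t1.start(); t2.start()
t1.join(timeout=2); t2.join(timeout=2)
print("deadlocked" if t1.is_alive() or t2.is_alive() else "finished cleanly")
```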

Livelock

A thread often acts in response to the action of another thread. If the other thread’s action is also a response to the action of another thread, then livelock may result. As with deadlock, livelocked threads are unable to make further progress. However, the threads are not blocked; they are simply too busy responding to each other to resume work. A livelock happens when a request for an exclusive lock on a shared resource is repeatedly denied because a series of overlapping shared locks keeps interfering; in the end, two or more threads continue to execute but make no progress in completing their tasks.

A livelock is very similar to a deadlock, except that the state of the two processes involved in the livelock constantly changes with respect to the other process.

As a real-world example, livelock occurs when two people meet in a narrow corridor and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress because they both always move the same way at the same time. A deadlock results in an infinite wait, whereas a livelock results in wasted CPU cycles.
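
The corridor analogy translates almost directly into code. A minimal Python sketch (mine, not the article's): two "diners" each grab one resource, see that the other resource is taken, politely back off, and retry in lockstep, so both stay busy without ever finishing.

```python
# Minimal sketch of a livelock: both threads keep running and retrying, but
# (at least while they stay in lockstep) neither makes progress. Scheduling
# jitter may eventually let one through; the attempt limit keeps the demo finite.
import threading
import time

spoon = threading.Lock()
fork = threading.Lock()

def diner(name, first, second, attempts=50):
    for _ in range(attempts):
        first.acquire()
        if second.acquire(blocking=False):   # try to grab the second resource
            print(name, "got both resources and made progress")
            second.release()
            first.release()
            return
        first.release()                      # be polite: let the other diner go first
        time.sleep(0.01)                     # ...but both back off and retry together
    print(name, "stayed busy the whole time yet made no progress (livelock)")

t1 = threading.Thread(target=diner, args=("t1", spoon, fork))
t2 = threading.Thread(target=diner, args=("t2", fork, spoon))
t1.start(); t2.start()
t1.join(); t2.join()
```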

Priority Inversion

Priority inversion occurs when two or more threads with different priorities are in contention to be scheduled.
Let's walk through it:

When we have a shared resource, we use a lock (mutex) to prevent race conditions and inconsistencies. Locks make sure that only one thread is accessing the resource at a time. In order to access a resource, a thread must acquire the lock first. If it is unable to acquire the lock (meaning another thread is using the resource), it must wait until the thread currently accessing the resource releases the lock. Now take a simple case with three threads:
  • Thread 1 with high priority
  • Thread 2 with medium priority, and
  • Thread 3 with low priority
Thread 3 is holding a lock on a resource that Thread 1 wants to use. Thread 1 must wait for Thread 3 to release the lock, because Thread 1 has called acquire on it. Because Thread 1 is waiting for the lock, it is not available to run; only Threads 2 and 3 are ready. So when the priority scheduler chooses a thread to run next, it can only choose between 2 and 3. Thread 2 has the higher priority, so it goes next. Thread 3 cannot run, so it cannot release the lock, and Thread 1 has to wait until Thread 2 finishes. In other words, the highest-priority thread, Thread 1, is inadvertently being blocked from running by a lower-priority thread, Thread 2.
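
Python's threading module doesn't expose thread priorities, so here is a toy pen-and-paper simulation instead (my own sketch, not a real scheduler): a strict priority scheduler always runs the highest-priority runnable thread, and because Thread 3 holds the lock Thread 1 needs, Thread 2 keeps winning and Thread 1 stays blocked.

```python
# Toy simulation of the scenario above (no real OS scheduler involved).
# Thread1 = high priority (blocked on the lock), Thread2 = medium, Thread3 = low
# and currently holding the lock. A strict priority scheduler keeps picking
# Thread2, so Thread3 never runs long enough to release the lock.

threads = {
    "Thread1": {"prio": 3, "needs_lock": True,  "work_left": 2},
    "Thread2": {"prio": 2, "needs_lock": False, "work_left": 5},
    "Thread3": {"prio": 1, "needs_lock": False, "work_left": 3},
}
lock_holder = "Thread3"

for tick in range(6):
    runnable = [
        name for name, t in threads.items()
        if t["work_left"] > 0
        and not (t["needs_lock"] and lock_holder not in (None, name))
    ]
    running = max(runnable, key=lambda name: threads[name]["prio"])
    threads[running]["work_left"] -= 1
    if running == lock_holder and threads[running]["work_left"] == 0:
        lock_holder = None                   # lock finally released
    print(f"tick {tick}: {running} runs (Thread1 blocked: {'Thread1' not in runnable})")

# With priority inheritance, Thread3 would temporarily run at Thread1's priority,
# finish its critical section, release the lock, and unblock Thread1 much sooner.
```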

Two-Step Dances

In a "two-step dance", threads bounce between waking and waiting without doing any work. This happens because of the way developers implement signaling.

Let's take an example: sometimes you signal an event while holding a lock, and the waking (signaled) thread, Thread 2, needs to acquire a lock still held by the signaling thread, Thread 1. In that case Thread 2 wakes up only to find that it has to wait again. Thread 1 then continues and releases the lock, and only then can Thread 2 wake a second time and take it. This is wasteful and increases the number of overall context switches. The situation is called the two-step dance, and it can extend far beyond just two steps if many locks and events are involved.
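
A minimal Python sketch of that wake/wait bounce (my own illustration): the signaling thread sets an event while still holding the lock the waiter needs, so the waiter wakes only to block again; signaling after releasing the lock avoids the extra round trip.

```python
# Minimal sketch: signaling while holding the lock forces the waiter into a
# wake -> block -> wake "two-step"; signaling after releasing the lock lets
# the waiter proceed immediately. The sleep just exaggerates the window.
import threading
import time

lock = threading.Lock()
ready = threading.Event()

def signaler(signal_inside_lock):
    with lock:
        # ...update some shared state here...
        if signal_inside_lock:
            ready.set()          # waiter wakes now, but the lock is still held
            time.sleep(0.1)
    if not signal_inside_lock:
        ready.set()              # better: signal once the lock has been released

def waiter():
    ready.wait()                 # step 1: wake up on the signal
    start = time.perf_counter()
    with lock:                   # step 2: may have to block again right here
        blocked_ms = (time.perf_counter() - start) * 1000
    print(f"waiter blocked another {blocked_ms:.0f} ms after waking")

for inside in (True, False):
    ready.clear()
    t_wait = threading.Thread(target=waiter)
    t_signal = threading.Thread(target=signaler, args=(inside,))
    t_wait.start(); t_signal.start()
    t_signal.join(); t_wait.join()
```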

Lock Convoys

A lock convoy occurs when multiple threads with equal priority compete repeatedly for the same lock. The threads do make progress, but each attempt to acquire the lock fails, which degrades the overall performance of the application because of the extra overhead of repeated context switches and wasted scheduler time.

Lock convoys tend to occur when more threads are waiting at a lock than can be serviced. The situation is most common in server-side programs, where locks protect data needed by most clients.

For example: on average, an application gets eight requests per 100 milliseconds and uses eight threads to service them (because it is hosted on an 8-CPU machine). Each thread must hold a lock for 20 milliseconds to accomplish meaningful work. Access to this lock must be serialized, so it takes 160 milliseconds for all eight threads to enter and exit the lock; after the first thread exits, another 140 milliseconds pass before the lock is free for the next incoming request. This scheme inherently will not scale, and there will be a continuously growing backlog of requests. Over time, if the arrival rate does not decrease, client requests are apt to begin timing out, and a disaster will result.
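
The arithmetic in that example is easy to check with a tiny Python sketch (mine, using the article's numbers): with a 20 ms hold time, only five requests can get through the lock per 100 ms window while eight arrive, so the backlog at the lock grows without bound.

```python
# Toy back-of-the-envelope model of the lock convoy example above: requests
# arrive faster than the serialized lock can service them, so the queue at
# the lock only ever grows. Not a real server, just the article's arithmetic.
ARRIVALS_PER_WINDOW = 8          # requests arriving per 100 ms
HOLD_MS = 20                     # lock hold time per request
WINDOW_MS = 100
capacity_per_window = WINDOW_MS / HOLD_MS   # 5 requests can clear the lock per window

backlog = 0.0
for window in range(1, 11):      # simulate one second in 100 ms windows
    backlog += ARRIVALS_PER_WINDOW
    backlog -= min(backlog, capacity_per_window)
    print(f"after {window * WINDOW_MS} ms: {backlog:.0f} requests still queued at the lock")
```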

____________________________________________________________________________


And there are many more variables that programmers have to juggle. So for this engine to be able to take advantage of so many cores is truly outstanding.

Edited by Bad Karma 308, 21 February 2013 - 11:28 PM.


#4 Thorqemada

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • 6,394 posts

Posted 22 February 2013 - 12:38 AM

Yeah - it's amazing!

#5 Thorqemada

    Member

  • PipPipPipPipPipPipPipPipPipPip
  • 6,394 posts

Posted 22 February 2013 - 01:25 AM

Vulpesveritas, on 21 February 2013 - 07:32 PM, said:

Posted Image


Well, that FX-8350 should now go into every purpose-built Crysis 3 PC, I think.


The FX 6300 is outstanding too - god, it's only $139 for a good "Crysis 3" CPU!

Edited by Thorqemada, 22 February 2013 - 01:25 AM.


#6 TheFlayedman

    Member

  • PipPipPip
  • Survivor
  • 76 posts

Posted 22 February 2013 - 06:28 AM

At first glance it looks like there's a big difference, until you notice they ran the benchmarks at 1280x1024 with no AA or AF.
Check out the Guru3D review of the Titan: they tested with a quad core and a heavily overclocked 6-core at full resolution and details, and the difference is 3 fps.
http://www.guru3d.co..._review,13.html

#7 Rumble Chicken

    Member

  • PipPipPip
  • 77 posts

Posted 22 February 2013 - 08:25 AM

Weird, my .17 hp machine seems faster.

#8 Zyllos

    Member

  • PipPipPipPipPipPipPipPipPip
  • 2,818 posts

Posted 22 February 2013 - 08:36 AM

Bad Karma 308, on 21 February 2013 - 11:26 PM, said:

What you're really looking at is how deep the multi-threading support in CryEngine 3 goes. [...] So for this engine to be able to take advantage of so many cores is truly outstanding.


Oh man...you have no idea...

#9 Catamount

    Member

  • PipPipPipPipPipPipPipPipPip
  • LIEUTENANT, JUNIOR GRADE
  • 3,305 posts
  • Location: Boone, NC

Posted 22 February 2013 - 09:48 AM

TheFlayedman, on 22 February 2013 - 06:28 AM, said:

At first glance it looks like there's a big difference, until you notice they ran the benchmarks at 1280x1024 with no AA or AF.
Check out the Guru3D review of the Titan: they tested with a quad core and a heavily overclocked 6-core at full resolution and details, and the difference is 3 fps.
http://www.guru3d.co..._review,13.html


That's an important point to be mindful of. Yes, extreme multi-core chips are faster at the CPU portion of Crysis 3, but that's an extreme minority component of gaming performance. Doubling your RAM speed would also technically make for a faster system, but it's not going to make a tangible difference.

That said, it's definitely a good look into the direction of CPUs. Chips like Bulldozer and Piledriver have been deceptively good values for software that makes use of them, and taking into account AMD's ability to make extreme multi-core chips for less, they have a hell of a product for software that can take advantage of it. It's just that that caveat has been killing them. If software increasingly takes advantage of these chips, however, it will mean AMD not only has a chip competitive with Intel's higher end, but a superior one, giving the same performance as such high-end chips as the 3770k in a massively less expensive package.

#10 Lord of All

    Member

  • PipPipPipPipPipPipPip
  • Knight Errant
  • 581 posts
  • Location: Bottom Of a Bottle

Posted 22 February 2013 - 12:33 PM

Thorqemada, on 21 February 2013 - 04:45 PM, said:

Results of Crysis 3 CPU Benchmark test - Multi Cores rule - Dual Cores suck:

http://www.dslreport...e-of-Multi-Core


My initial research on the subject (albeit cursory) has shown that the devs must write their code to support multiple processors, so testing the engine alone will not yield the same results for every iteration of said engine.

With that being said, and because I could find no specific info on the MWO iteration, I did a small test with Process Explorer last week, and indeed MWO does use all 6 of my cores. Sorry, I haven't tested beyond that, as I only have a 1601T.
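
For anyone who wants to check the same thing without Process Explorer, here is a rough Python sketch (my own, not what Lord of All used; it assumes the third-party psutil package, and the process name is a placeholder you'd swap for whatever the MWO client is actually called on your machine):

```python
# Rough sketch: report the game's thread count and per-core CPU load while it
# runs. Requires "pip install psutil". The process name below is a placeholder.
import psutil

TARGET = "MWOClient.exe"   # hypothetical name; substitute the real client process

for proc in psutil.process_iter(attrs=["name"]):
    if proc.info["name"] == TARGET:
        print(f"{TARGET} is running with {proc.num_threads()} threads")

# Per-core utilization sampled over one second. With a well-threaded game in a
# match, several cores should show real load instead of one core being pegged.
print(psutil.cpu_percent(interval=1.0, percpu=True))
```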

#11 Lord of All

    Member

  • PipPipPipPipPipPipPip
  • Knight Errant
  • 581 posts
  • Location: Bottom Of a Bottle

Posted 22 February 2013 - 12:38 PM

TheFlayedman, on 22 February 2013 - 06:28 AM, said:

At first glance it looks like there's a big difference, until you notice they ran the benchmarks at 1280x1024 with no AA or AF.
Check out the Guru3D review of the Titan: they tested with a quad core and a heavily overclocked 6-core at full resolution and details, and the difference is 3 fps.
http://www.guru3d.co..._review,13.html

Remember that the Thuban cores have full ALU units, while the Bulldozer/Piledriver/Steamroller cores have a shared ALU.

So you have the Thubans with 6 ALU units running while the B/P/S have only 4, Deneb being 4 as well.

#12 Rumble Chicken

    Member

  • PipPipPip
  • 77 posts

Posted 23 February 2013 - 01:47 PM

BTW, Windows itself has tons of threading, and this works differently on different CPUs and GPUs. There's some more FPS to be found if you've squeezed every MHz out of your chip; Intels will probably need a closer look, as they seem more clock sensitive.

#13 Youngblood

    Member

  • PipPipPipPipPipPipPip
  • 604 posts
  • Location: GMT -6

Posted 25 February 2013 - 08:44 AM

Although this is only a single tear shed on the face of the generic Intel fanboy, and although the OP misspelled the word "Vengeance", I will take that tear and I will DRINK IT! With this and HSA coming up for APUs, there will hopefully be a lot more noise made for AMD by enthusiasts, at least to try and bring them back into the mainstream desktop spotlight.

#14 Rumble Chicken

    Member

  • PipPipPip
  • 77 posts

Posted 25 February 2013 - 10:27 PM

OK, well, AMD and the PS4: bittersweet.

#15 Vulpesveritas

    Member

  • PipPipPipPipPipPipPipPipPip
  • 3,003 posts
  • Location: Wisconsin, USA

Posted 02 March 2013 - 12:14 PM

marcos6, on 25 February 2013 - 10:27 PM, said:

OK, well, AMD and the PS4: bittersweet.

How is it bittersweet? They have an 8-core, ~2 GHz APU with (basically) an integrated Radeon HD 7850, pulling under 100 watts (a guesstimate, given the average total power consumption of the last two generations of consoles and the TDP limitations of a console the same size or smaller), and AMD has said they will be bringing a 'cut down' version (i.e. with the Sony IP removed) to the desktop/laptop space.

http://www.overclock...aystation-4-apu

Edited by Vulpesveritas, 02 March 2013 - 12:16 PM.


#16 Badconduct

    Dezgra

  • PipPipPipPipPipPip
  • Knight Errant
  • 364 posts

Posted 02 March 2013 - 03:58 PM

Catamount, on 22 February 2013 - 09:48 AM, said:

That's an important point to be mindful of. Yes, extreme multi-core chips are faster at the CPU portion of Crysis 3, but that's an extreme minority component of gaming performance. Doubling your RAM speed would also technically make for a faster system, but it's not going to make a tangible difference.

That said, it's definitely a good look into the direction of CPUs. Chips like Bulldozer and Piledriver have been deceptively good values for software that makes use of them, and taking into account AMD's ability to make extreme multi-core chips for less, they have a hell of a product for software that can take advantage of it. It's just that that caveat has been killing them. If software increasingly takes advantage of these chips, however, it will mean AMD not only has a chip competitive with Intel's higher end, but a superior one, giving the same performance as such high-end chips as the 3770k in a massively less expensive package.


That's incorrect. A faster CPU is the backbone of a system, not the GPU. The GPU certainly makes a difference once your CPU hits the minimum.

AMD is arguing that better programming is the future, and I agree. I doubt we'll see many single-core games in the near future.

Reality is, Intel isn't that much quicker in real-world tests. Some benchmark programs are built to favor Intel; others work better on AMD.

#17 Catamount

    Member

  • PipPipPipPipPipPipPipPipPip
  • LIEUTENANT, JUNIOR GRADE
  • 3,305 posts
  • Location: Boone, NC

Posted 02 March 2013 - 06:28 PM

Badconduct, on 02 March 2013 - 03:58 PM, said:


That's incorrect. A faster CPU is the backbone of a system, not the GPU. The GPU certainly makes a difference once your CPU hits the minimum.

AMD is arguing that better programming is the future, and I agree. I doubt we'll see many single-core games in the near future.

Reality is, Intel isn't that much quicker in real-world tests. Some benchmark programs are built to favor Intel; others work better on AMD.


You apparently either didn't understand my post, or don't know much about what determines gaming performance within DX11 titles. Try again.

Almost any modern CPU can drive a modern gaming title at high framerates, whether it's a $100 CPU or a $300 CPU. It doesn't matter because the DX11 API offloads rendering tasks from the CPU, creating absolutely no discernible difference between slower or faster CPUs in games. Of course, that, in and of itself, isn't an argument against having a fast CPU, and I, myself, am running a rather high-end chip from today's lineup, but it is, nevertheless, a correct statement that modern games show little difference between CPUs. When you make vague, nondescript statements like "a CPU is the backbone of a system", I have no idea what that's even supposed to mean, but it's definitely not a rebuttal to said statement.

You are correct about one thing, however. Multithreading will definitely migrate to games increasingly as time goes on.

Edited by Catamount, 02 March 2013 - 06:35 PM.


#18 TheFlayedman

    Member

  • PipPipPip
  • Survivor
  • 76 posts

Posted 03 March 2013 - 08:06 AM

Lord of All, on 22 February 2013 - 12:38 PM, said:

Remember that the Thuban cores have full ALU units, while the Bulldozer/Piledriver/Steamroller cores have a shared ALU.

So you have the Thubans with 6 ALU units running while the B/P/S have only 4, Deneb being 4 as well.


Sorry you lost me. Can you explain more clearly the point you are trying to get across?

#19 Badconduct

    Dezgra

  • PipPipPipPipPipPip
  • Knight Errant
  • 364 posts

Posted 04 March 2013 - 06:16 AM

Catamount, on 02 March 2013 - 06:28 PM, said:


You apparently either didn't understand my post, or don't know much about what determines gaming performance within DX11 titles. Try again.

Almost any modern CPU can drive a modern gaming title at high framerates, whether it's a $100 CPU or a $300 CPU. It doesn't matter because the DX11 API offloads rendering tasks from the CPU, creating absolutely no discernible difference between slower or faster CPUs in games. Of course, that, in and of itself, isn't an argument against having a fast CPU, and I, myself, am running a rather high-end chip from today's lineup, but it is, nevertheless, a correct statement that modern games show little difference between CPUs. When you make vague, nondescript statements like "a CPU is the backbone of a system", I have no idea what that's even supposed to mean, but it's definitely not a rebuttal to said statement.

You are correct about one thing, however. Multithreading will definitely migrate to games increasingly as time goes on.



That's true, but you said: "Yes, extreme multi-core chips are faster at the CPU portion of Crysis 3, but that's an extreme minority component of gaming performance."

Not everyone is running modern (i.e. 2-year-old) hardware. My old processor is the 965 in that test, and it's already starting to lag behind. I believe it was released in late 2009.

#20 Catamount

    Member

  • PipPipPipPipPipPipPipPipPip
  • LIEUTENANT, JUNIOR GRADE
  • 3,305 posts
  • Location: Boone, NC

Posted 04 March 2013 - 06:25 AM

Badconduct, on 04 March 2013 - 06:16 AM, said:



That's true, but you said: "Yes, extreme multi-core chips are faster at the CPU portion of Crysis 3, but that's an extreme minority component of gaming performance."

Not everyone is running modern (i.e. 2-year-old) hardware. My old processor is the 965 in that test, and it's already starting to lag behind. I believe it was released in late 2009.


Well, keep in mind the test is designed to bring out differences in CPUs that wouldn't necessarily be there in a real-world test, although it is strange that Crysis 3 isn't managing at least 60 fps on all CPUs. A 965BE isn't a bad chip by any stretch of the imagination, so it's really odd that the game doesn't run well on one. I had one up until only four months ago, and I only swapped to my 3570k (I'm now beginning to wonder if I should have gotten the 8350 instead) because it was a free upgrade, a present from a couple of out-of-country visitors.

In most titles, even a fairly high end GPU shouldn't bottleneck on a high-end Deneb CPU. That's the good news :D Even if Crysis 3 shares Crysis 2's affinity for beefy mcbeefy CPUs, that isn't typical, especially in shooters. I'll test this later in the game with my laptop's i7-720QM and see if the game really does require such powerful hardware.


This does show why having a fast CPU does pay off, in the end, though. It is an extreme minority determinant in performance, at least most of the time, but buying a high end one up front saves you a motherboard and CPU replacement down the road. Imagine if you had bought an Athlon II X2 instead of a 965 (which was just fine for gaming three years ago) ;) I think you'd have been making the replacement a fair bit quicker. Hell, you could have stuck with that chip for another couple of years and probably been fine.

Edited by Catamount, 04 March 2013 - 06:31 AM.





