

Crysis 3 CPU Benchmarks - Vengeance Of The Multicore
#1
Posted 21 February 2013 - 04:45 PM
http://www.dslreport...e-of-Multi-Core
#2
Posted 21 February 2013 - 07:32 PM

Well, I think that FX-8350 is now going into every PC purpose-built for Crysis 3.
#3
Posted 21 February 2013 - 11:26 PM
What a lot of people don't realize is how involved programming for multiple cores can be.
Here is a great article that I use to help explain:
_____________________________________________________________________________
http://ashishkhandel...ramming-part-1/
Race Condition
A race condition occurs when two or more threads are able to access shared data and they try to change it at the same time. Because the thread scheduling algorithm can swap between threads at any point, we cannot know the order in which the threads will attempt to access the shared data. Therefore, the result of the change in data is dependent on the thread scheduling algorithm, i.e. both threads are ‘racing’ to access/change the data.
Problems often occur when one thread does a “check-then-act” (e.g. “check” if the value is X, and then “act” to do something that depends on the value being X) and another thread does something to the value in between the “check” and the “act”.
In order to prevent race conditions from occurring, you typically put a lock around the shared data to ensure that only one thread can access the data at a time.
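As a minimal sketch of the "check-then-act" case, assuming a hypothetical Account class that is not from the article: the balance check and the withdrawal happen under one lock, so no other thread can change the balance between the "check" and the "act".

```python
import threading


class Account:
    """Hypothetical shared data guarded by a single lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # "Check" and "act" run under the same lock, so they cannot be interleaved.
        with self._lock:
            if self.balance >= amount:   # check
                self.balance -= amount   # act
                return True
            return False


def run_withdrawals(start_balance=100, amount=30, workers=10):
    """Let several threads race to withdraw; return (final balance, successes)."""
    account = Account(start_balance)
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = account.withdraw(amount)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, sum(results)
```

With the default numbers, exactly three withdrawals of 30 fit into a balance of 100, whichever threads happen to get there first. Without the lock, two threads could both pass the check before either subtracts, and the balance could go negative.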
DeadLock
A deadlock occurs when two or more processes/threads are unable to proceed because each is waiting for one of the others to do something.
A common example is a program communicating with a server, which may find itself waiting for output from the server before sending anything more to it, while the server is similarly waiting for more input from the controlling program before outputting anything.
Another common example is one in which each process is trying to send data to the other but all buffers are full because nobody is reading anything.
Another example, common in database programming, is two processes that are sharing some resource (e.g. read access to a table) but then both decide to wait for exclusive (e.g. write) access.
A real-world example is an interaction between humans, as when two people meet in a narrow corridor, and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress.
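One common way to avoid the "each waits for the other" situation with locks is to always take them in the same global order. This is a swapped-in illustration rather than anything from the article, and the Account class and transfer function are hypothetical names. If one thread locked a then b while another locked b then a, each could end up holding one lock and waiting forever for the other; sorting by name removes that possibility.

```python
import threading


class Account:
    """Hypothetical resource with its own lock."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Always lock in name order, regardless of the transfer direction,
    # so two opposite transfers can never hold one lock each.
    first, second = sorted((src, dst), key=lambda acct: acct.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_transfers(rounds=1000):
    """Two threads transfer in opposite directions at the same time."""
    a = Account("a", 1000)
    b = Account("b", 1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 2)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

Both threads always finish, and the totals come out the same on every run because each transfer is applied while both locks are held.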
LiveLock
A thread often acts in response to the action of another thread. If the other thread’s action is also a response to the action of another thread, then livelock may result. As with deadlock, livelocked threads are unable to make further progress. However, the threads are not blocked; they are simply too busy responding to each other to resume work. A livelock happens when a request for an exclusive lock on the shared resource is repeatedly denied because a series of overlapping shared locks keeps interfering; in the end, two or more threads continue to execute but make no progress in completing their tasks.
A livelock is very similar to a deadlock, except that the state of the two processes involved in the livelock constantly changes with regard to the other process.
As a real-world example, livelock occurs when two people meet in a narrow corridor, and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress because they always move the same way at the same time. A deadlock results in an infinite wait, whereas a livelock results in wasted CPU cycles.
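The corridor example can be played out as a tiny simulation. This is only a sketch: the function name and the "policy" callables are made up for illustration, and it models two people rather than real threads. Both people start on the same side; each round, a policy decides whether that person steps to the other side.

```python
OTHER_SIDE = {"left": "right", "right": "left"}


def rounds_until_pass(a_moves, b_moves, max_rounds=10):
    """Return the round in which the two can pass, or None if they never do.

    a_moves / b_moves are hypothetical policies: given the round number,
    they return True if that person steps to the other side.
    """
    a_side = b_side = "left"          # both start out blocking each other
    for round_no in range(1, max_rounds + 1):
        if a_moves(round_no):
            a_side = OTHER_SIDE[a_side]
        if b_moves(round_no):
            b_side = OTHER_SIDE[b_side]
        if a_side != b_side:          # on different sides: they can pass
            return round_no
    return None                       # still swaying: busy, but no progress


def always(round_no):
    return True


def never(round_no):
    return False
```

With `rounds_until_pass(always, always)` both keep moving and never pass, which is the livelock: lots of activity, no progress. Breaking the symmetry, for example `rounds_until_pass(always, never)` where one person simply stays put, resolves it in the first round.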
Priority Inversion
Priority inversion occurs when two or more threads with different priorities are in contention to be scheduled, and a lower-priority thread ends up holding back a higher-priority one.
Let’s walk through it:
When we have a shared resource, we use a lock (mutex) to prevent race conditions and inconsistencies. Locks make sure that only one thread is accessing the resource at a time. In order to access a resource, a thread must acquire the lock first. If it is unable to acquire the lock (meaning another thread is using the resource), it must wait until the thread currently accessing the resource releases the lock. Now take a simple case with three threads:
- Thread 1 with high priority
- Thread 2 with low priority
- Thread 3 with medium priority
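A rough sketch of how these three threads can interact, assuming the usual textbook scenario (which is my assumption, not spelled out above): the low-priority thread grabs the lock first, the high-priority thread then needs the same lock, and the medium-priority thread needs no lock at all. The scheduler below is a toy tick-by-tick model with made-up arrival times and work amounts, not a real OS scheduler.

```python
def simulate_priority_inversion():
    """Toy scheduler: each tick, run the highest-priority thread that can run.

    Returns a dict mapping thread name to the tick at which it finished.
    All numbers below are illustrative assumptions.
    """
    threads = {
        "low":    {"prio": 1, "arrives": 0, "work": 3, "needs_lock": True},
        "high":   {"prio": 3, "arrives": 1, "work": 2, "needs_lock": True},
        "medium": {"prio": 2, "arrives": 2, "work": 4, "needs_lock": False},
    }
    lock_owner = None
    finished = {}
    tick = 0
    while len(finished) < len(threads):
        # A thread can run if it has arrived, is not done, and is not
        # blocked on a lock that some other thread currently holds.
        runnable = [
            name for name, t in threads.items()
            if name not in finished
            and t["arrives"] <= tick
            and (not t["needs_lock"] or lock_owner in (None, name))
        ]
        name = max(runnable, key=lambda n: threads[n]["prio"])
        thread = threads[name]
        if thread["needs_lock"]:
            lock_owner = name            # acquire (or keep holding) the lock
        thread["work"] -= 1
        tick += 1
        if thread["work"] == 0:
            finished[name] = tick
            if lock_owner == name:
                lock_owner = None        # release the lock on completion
    return finished
```

In this run the medium thread preempts the low thread while the low thread still holds the lock, so the high-priority thread, which is waiting on that lock, finishes last. That is the inversion: the medium-priority thread effectively runs ahead of the high-priority one.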
Two-Step Dances
In a “two-step dance”, threads bounce between waking and waiting without doing any work. This happens because of the way developers implement signaling.
Let’s take an example: suppose you signal an event while holding a lock, and the signaled thread (Thread 2) needs to acquire the lock that is still held by the signaling thread (Thread 1). In this case the signaled thread (Thread 2) will be awakened only to find out that it has to wait again. The signaling thread (Thread 1) will then run again and release the lock. Once the lock is released, the signaled thread (Thread 2) will wake up and get the lock. This is wasteful and increases the overall number of context switches. This situation is called the two-step dance, and it can extend far beyond just two steps if many locks and events are involved.
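The example above can be sketched with an Event and a Lock. This is an illustration under assumptions: the function name is made up, and the extra `tried` event is an artificial handshake added only so the outcome is the same on every run; it is not part of the pattern itself.

```python
import threading


def woken_thread_had_to_wait_again(signal_inside_lock):
    """Return True if the signaled thread woke up and found the lock taken."""
    lock = threading.Lock()
    ready = threading.Event()   # the signal from Thread 1 to Thread 2
    tried = threading.Event()   # artificial handshake, for repeatability only
    outcome = {}

    def signaled_thread():      # "Thread 2"
        ready.wait()
        got_it = lock.acquire(blocking=False)
        outcome["waited_again"] = not got_it
        tried.set()
        if not got_it:
            lock.acquire()      # second step of the dance: wait once more
        lock.release()

    def signaling_thread():     # "Thread 1"
        if signal_inside_lock:
            with lock:
                ready.set()     # wakes Thread 2 while the lock is still held
                tried.wait()    # keep holding the lock until Thread 2 has tried
        else:
            with lock:
                pass            # update the shared state here
            ready.set()         # signal only after the lock is released

    t2 = threading.Thread(target=signaled_thread)
    t1 = threading.Thread(target=signaling_thread)
    t2.start()
    t1.start()
    t1.join()
    t2.join()
    return outcome["waited_again"]
```

Signaling inside the lock makes the woken thread wait a second time; signaling after the release lets it take the lock straight away. Note that some primitives (Python's `threading.Condition`, for one) require the lock to be held when notifying, so this reordering is not always available.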
Lock Convoys
A lock convoy occurs when multiple threads with equal priority compete repeatedly for the same lock. In this situation the threads do make progress, but their attempts to acquire the lock repeatedly fail. This degrades the overall performance of the application because of the additional overhead of repeated context switches and underutilization of the scheduler's time slices.
A lock convoy is more likely when more threads are waiting at a lock than can be serviced. This situation is more common in server-side programs, where locks are used to protect data needed by most clients.
For example: on average, an application gets eight requests per 100 milliseconds and uses eight threads to service these requests (because it is hosted on an 8-CPU machine). Each thread must hold a lock for 20 milliseconds to accomplish meaningful work. Access to this lock must be serialized, so it takes 160 milliseconds for all eight threads to enter and exit the lock. After the first thread exits, 140 milliseconds are required before a ninth thread can access the lock. This scheme inherently will not scale, and there will be a continuously growing backlog of requests. Over time, if the arrival rate does not decrease, client requests are apt to begin timing out, and a disaster will result.
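The arithmetic in that example can be checked with a few lines. One simplifying assumption on my part: all eight requests of a batch arrive together at the start of each 100 ms window, whereas the article only says "on average". The function name is made up.

```python
def batch_latencies_ms(batches, requests_per_batch=8, period_ms=100, hold_ms=20):
    """Time from a batch's arrival until its last request releases the lock."""
    lock_free_at = 0
    latencies = []
    for batch in range(batches):
        arrival = batch * period_ms
        for _ in range(requests_per_batch):
            # The lock is serialized: each request starts when the lock is
            # free, but never before the request has arrived.
            start = max(lock_free_at, arrival)
            lock_free_at = start + hold_ms
        latencies.append(lock_free_at - arrival)
    return latencies
```

`batch_latencies_ms(3)` gives `[160, 220, 280]`: each batch needs 160 ms of lock time but a new batch shows up every 100 ms, so the backlog grows by 60 ms per batch and never drains.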
____________________________________________________________________________
And there are many more variables that programmers have to juggle. So for this engine to be able to take advantage of so many cores is truly outstanding.
Edited by Bad Karma 308, 21 February 2013 - 11:28 PM.
#4
Posted 22 February 2013 - 12:38 AM
#6
Posted 22 February 2013 - 06:28 AM
Check out the Guru3D review of the Titan: they tested with a quad-core and a heavily overclocked 6-core at full resolution and detail, and the difference is 3 fps.
http://www.guru3d.co..._review,13.html
#7
Posted 22 February 2013 - 08:25 AM
#8
Posted 22 February 2013 - 08:36 AM
Bad Karma 308, on 21 February 2013 - 11:26 PM, said:
What a lot of people don't realize is how involved programming for multiple cores can be.
And there are many more variables that programmers have to juggle. So for this engine to be able to take advantage of so many cores is truly outstanding.
Oh man...you have no idea...
#9
Posted 22 February 2013 - 09:48 AM
TheFlayedman, on 22 February 2013 - 06:28 AM, said:
Check out the Guru3D review of the Titan: they tested with a quad-core and a heavily overclocked 6-core at full resolution and detail, and the difference is 3 fps.
http://www.guru3d.co..._review,13.html
That's an important point to be mindful of. Yes, extreme multi core chips are faster at the CPU portion of Crysis 3, but that's an extreme minority component of gaming performance. Doubling your RAM speed would also make for a faster system, technically, but it's not going to make a tangible difference.
That said, it's definitely a good look into the direction of CPUs. Chips like Bulldozer and Piledriver have been deceptively good values for software that makes use of them, and taking into account their ability to make extreme multi-core chips for less, AMD has a hell of a product for software that can take advantage. It's just that that caveat has been killing them. If software continues to take advantage of these chips increasingly, however, it will mean AMD has a chip that is not just competitive with Intel's higher end but superior, giving the same performance as high-end chips such as the 3770k in a massively less expensive package.
#10
Posted 22 February 2013 - 12:33 PM
Thorqemada, on 21 February 2013 - 04:45 PM, said:
http://www.dslreport...e-of-Multi-Core
My initial research on the subject (albeit cursory) has shown that the devs must code to support multiple processors, so testing on the engine alone will not yield the same results for every iteration of said engine.
That being said, and because I could find no specific info on the MWO iteration, I did a small test with Process Explorer last week, and indeed MWO does use all 6 of my cores. Sorry, I haven't tested beyond that as I only have a 1601T.
#11
Posted 22 February 2013 - 12:38 PM
TheFlayedman, on 22 February 2013 - 06:28 AM, said:
Check out the Guru3D review of the Titan: they tested with a quad-core and a heavily overclocked 6-core at full resolution and detail, and the difference is 3 fps.
http://www.guru3d.co..._review,13.html
Remember that the Thuban cores each have a full FPU, while Bulldozer/Piledriver/Steamroller share one FPU per two-core module.
So you have the Thubans with 6 FPUs running while the B/P/S have only 4, Deneb being 4 as well.
#12
Posted 23 February 2013 - 01:47 PM
#13
Posted 25 February 2013 - 08:44 AM
#14
Posted 25 February 2013 - 10:27 PM
#15
Posted 02 March 2013 - 12:14 PM
marcos6, on 25 February 2013 - 10:27 PM, said:
How is it bittersweet? They have an 8-core ~2 GHz APU with an integrated Radeon HD 7850 (basically) pulling under 100 watts (a guesstimate given the average total power consumption of the last two generations of consoles and the TDP limitations of a console the same size or smaller), and AMD has said that they will be bringing a 'cut down' version (i.e. with the Sony IP removed) to the desktop/laptop space.
http://www.overclock...aystation-4-apu
Edited by Vulpesveritas, 02 March 2013 - 12:16 PM.
#16
Posted 02 March 2013 - 03:58 PM
Catamount, on 22 February 2013 - 09:48 AM, said:
That said, it's definitely a good look into the direction of CPUs. Chips like Bulldozer and Piledriver have been deceptively good values for software that makes use of them, and taking into account their ability to make extreme multi-core chips for less, AMD has a hell of a product for software that can take advantage. It's just that that caveat has been killing them. If software continues to take advantage of these chips increasingly, however, it will mean AMD has a chip that is not just competitive with Intel's higher end but superior, giving the same performance as high-end chips such as the 3770k in a massively less expensive package.
That's incorrect. A faster CPU is the backbone of a system, not the GPU. The GPU certainly makes a difference once your CPU hits the minimum.
AMD is arguing that better programming is the future, and I agree. I doubt we'll see many single-core games in the near future.
Reality is, Intel isn't that much quicker in real-world tests. Some benchmark programs are built to use Intel, others work better on AMD.
#17
Posted 02 March 2013 - 06:28 PM
Badconduct, on 02 March 2013 - 03:58 PM, said:
That's incorrect. A faster CPU is the backbone of a system, not the GPU. The GPU certainly makes a difference once your CPU hits the minimum.
AMD is arguing that better programming is the future, and I agree. I doubt we'll see many single-core games in the near future.
Reality is, Intel isn't that much quicker in real-world tests. Some benchmark programs are built to use Intel, others work better on AMD.
You apparently either didn't understand my post, or don't know much about what determines gaming performance within DX11 titles. Try again.
Almost any modern CPU can drive a modern gaming title at high framerates, whether it's a $100 CPU or a $300 CPU. It doesn't matter because the DX11 API offloads rendering tasks from the CPU, creating absolutely no discernible difference between slower or faster CPUs in games. Of course, that, in and of itself, isn't an argument against having a fast CPU, and I, myself, am running a rather high-end chip from today's lineup, but it is, nevertheless, a correct statement that modern games show little difference between CPUs. When you make vague, nondescript statements like "a CPU is the backbone of a system", I have no idea what that's even supposed to mean, but it's definitely not a rebuttal to said statement.
You are correct about one thing, however. Multithreading will definitely migrate to games increasingly as time goes on.
Edited by Catamount, 02 March 2013 - 06:35 PM.
#18
Posted 03 March 2013 - 08:06 AM
Lord of All, on 22 February 2013 - 12:38 PM, said:
So you have the Thubans with 6 FPUs running while the B/P/S have only 4, Deneb being 4 as well.
Sorry you lost me. Can you explain more clearly the point you are trying to get across?
#19
Posted 04 March 2013 - 06:16 AM
Catamount, on 02 March 2013 - 06:28 PM, said:
You apparently either didn't understand my post, or don't know much about what determines gaming performance within DX11 titles. Try again.
Almost any modern CPU can drive a modern gaming title at high framerates, whether it's a $100 CPU or a $300 CPU. It doesn't matter because the DX11 API offloads rendering tasks from the CPU, creating absolutely no discernible difference between slower or faster CPUs in games. Of course, that, in and of itself, isn't an argument against having a fast CPU, and I, myself, am running a rather high-end chip from today's lineup, but it is, nevertheless, a correct statement that modern games show little difference between CPUs. When you make vague, nondescript statements like "a CPU is the backbone of a system", I have no idea what that's even supposed to mean, but it's definitely not a rebuttal to said statement.
You are correct about one thing, however. Multithreading will definitely migrate to games increasingly as time goes on.
That's true, but you said: "Yes, extreme multi core chips are faster at the CPU portion of Crysis 3, but that's an extreme minority component of gaming performance."
Not everyone is running modern (i.e. 2 years old) hardware. My old processor is the 965 in that test, and it's already starting to lag behind. I believe it was released in late 2009.
#20
Posted 04 March 2013 - 06:25 AM
Badconduct, on 04 March 2013 - 06:16 AM, said:
That's true, but you said: "Yes, extreme multi core chips are faster at the CPU portion of Crysis 3, but that's an extreme minority component of gaming performance."
Not everyone is running modern (i.e. 2 years old) hardware. My old processor is the 965 in that test, and it's already starting to lag behind. I believe it was released in late 2009.
Well, keep in mind that the test is designed to bring out differences in CPUs that wouldn't necessarily be there in a real-world test, although it is strange that Crysis 3 isn't managing at least 60 fps on all CPUs. A 965BE isn't a bad chip by any stretch of the imagination, so it's really odd that the game doesn't run well on one. I had one up until only four months ago, and I only swapped to my 3570k (I'm now beginning to wonder if I should have gotten the 8350 instead) because it was a free upgrade, a present from a couple of out-of-country visitors.
In most titles, even a fairly high-end GPU shouldn't bottleneck on a high-end Deneb CPU. That's the good news.

This does show why having a fast CPU pays off in the end, though. It is an extreme minority determinant in performance, at least most of the time, but buying a high-end one up front saves you a motherboard and CPU replacement down the road. Imagine if you had bought an Athlon II X2 instead of a 965 (which was just fine for gaming three years ago).

Edited by Catamount, 04 March 2013 - 06:31 AM.