
GTX 980 Weirdness: Driver Crashing and Recovering


31 replies to this topic

#1 Big Tin Man

    Member

  • Rage
  • 1,957 posts

Posted 03 February 2016 - 10:01 AM

The machine:

OS: Windows 7 64-bit Home edition
CPU: Intel i7 930 (OC 3.4 GHz, stock 2.8 GHz)
CPU cooler: Thermaltake NIC C5
MB: EVGA X58 SLI
GPU: EVGA GTX 980 Gaming, 4 GB RAM (1126 MHz)
Storage: 240 GB SSD, 1 TB 7200 rpm WD something
Memory: 6x 2 GB Corsair DDR3-1600
PSU: Thermaltake Smart 630 W; should be using 30.7 A of 45 A on the 12 V rail. Feels OK?
http://outervision.com/b/YEg8jU
Wi-Fi: TP-Link AC1900 Archer T9E PCI card
El-cheapo front-panel SD card reader plugged into MB USB
Monitor: 21.5" Samsung something old, 1080p. Connected via DVI, no HDMI inputs (probably not relevant)
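
For reference, the 12 V rail figures above can be sanity-checked with a couple of lines of arithmetic. This is a minimal sketch that assumes the 30.7 A estimate and the 45 A rating both refer to the same single 12 V rail, as stated in the spec list; it only restates those numbers, it doesn't measure anything.

```python
RAIL_VOLTAGE = 12.0  # volts

def rail_watts(amps, volts=RAIL_VOLTAGE):
    """Convert a current on the rail to power (P = V * I)."""
    return amps * volts

def rail_utilization(draw_amps, rated_amps):
    """Fraction of the rail's rated current used by the estimated load."""
    return draw_amps / rated_amps

# Figures from the spec list (assumed: both refer to one 12 V rail).
draw, rated = 30.7, 45.0
print(f"12 V rail capacity: {rail_watts(rated):.0f} W")     # 540 W
print(f"Estimated 12 V load: {rail_watts(draw):.1f} W")      # 368.4 W
print(f"Utilization: {rail_utilization(draw, rated):.1%}")  # 68.2%
```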

So I spent about 4 hours last night trying to swap out my GTX 750 for a GTX 980. It should have been routine, but on boot Windows didn't recognize my front-panel USB flash card reader, and my video driver started crashing and recovering on any action. Boot the computer, open Chrome, the screen freezes with checkers for a minute, then unfreezes with an NVIDIA kernel driver recovery message. Google turns up this error, and its fixes are all over the board. Well, crap.

Hot-installed new drivers: no effect. Clean install of new drivers: no effect. Clean install of really old drivers: no effect. Set PhysX to run only on the CPU: no effect. Unplugged the case fan and all non-essential peripherals: no effect. Rolled the CPU overclock back to stock speed: no effect. Downloaded MSI Afterburner and downclocked the GPU: hard freeze, no recovery.

I was getting pretty frustrated, thinking I had a bum card. When I went to put the case back together for the night, I noticed that one of the DIMMs' locking clips didn't look perfectly tight and closed. I gave it a push and it clicked as if I were seating it for the first time, but with no real visual movement. Then I checked the rest, and they settled a little. Next I thought I should check the GPU power and make sure it was in all the way. For sh*ts and giggles I swapped the positions of the two PCI-E 6-pin plugs (the GTX 980 takes 2x 6-pin power) and checked that the GPU was seated nicely (it seemed like it was). Fired it up and everything seems fine. MWO is running at 1080p on max settings at 30-40 FPS. Elite Dangerous is running at 60 FPS (locked) on max settings with no stutters or dips anywhere. Switching between programs: no issue. Everything seems fine now, but the SD card reader still shows an error.

Went to put the case back on the shelf under the desk, and as soon as I moved it: driver crash and recover. I turned it off, moved it fully back into position, and it was fine on reboot. Reapplied the overclock and it was fine. Turned down particles, shaders, and AA in MWO to get it running at 60-90 FPS with everything else maxed. Called it a night and went to bed.

So the big question: what the hell is going on? My theory is that the GPU is so frigging large and heavy that it flexes the MB enough to make the DIMMs unhappy, which would fit what happened when I moved the case back toward the shelf. Never had that issue with the old GPU. The other theory is that my PSU is on its way out, but that feels like a cop-out for any weird PC issue. The last theory is a bad card, but it's an EVGA B-stock, so it just came through their QC checks.

Any guesses on what's happening?

Edited by Big Tin Man, 03 February 2016 - 10:09 AM.


#2 xWiredx

    Member

  • Elite Founder
  • 1,805 posts

Posted 03 February 2016 - 10:32 AM

Oh, lots of guesses.

The last time I saw something like this happen, the PCIe slot on my buddy's motherboard was faulty: it wasn't supplying the right amount of juice from the slot itself. It turned out the PSU wasn't doing well, and the PCIe slot was just the first of a few things that stopped working, one after another, before he gave in and switched out the motherboard.

It does sound like there is strain or stress somewhere, something is moving, or the delivery of power or commands is borked at some stage. Unfortunately it's hard to say more than that. This is probably a process-of-elimination thing. Maybe Event Viewer has more info? And maybe you could fire up some performance logging to see if there are any temperature, utilization, or spec anomalies?
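
As a rough illustration of the logging idea, here is a minimal sketch that scans a monitoring log exported as CSV and flags readings outside a chosen range. The column names (`time`, `gpu_temp_c`, etc.), the limits, and the sample rows are all hypothetical placeholders, not the log format of MSI Afterburner or any other specific tool.

```python
import csv
import io

# Hypothetical column names and limits; adjust for your own hardware and tool.
LIMITS = {
    "gpu_temp_c": (0.0, 85.0),
    "cpu_temp_c": (0.0, 80.0),
    "gpu_util_pct": (0.0, 100.0),
}

def find_anomalies(csv_text, limits=LIMITS):
    """Return (time, column, value) for every reading outside its allowed range."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, (low, high) in limits.items():
            value = float(row[column])
            if not low <= value <= high:
                flagged.append((row["time"], column, value))
    return flagged

# Made-up sample rows, for illustration only.
sample_log = """time,gpu_temp_c,cpu_temp_c,gpu_util_pct
10:01:00,31,30,4
10:01:05,32,31,97
10:01:10,88,33,100
"""
print(find_anomalies(sample_log))  # [('10:01:10', 'gpu_temp_c', 88.0)]
```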

#3 Missing Spartan

    Member

  • Veteran Founder
  • 10 posts
  • Location: USA

Posted 03 February 2016 - 10:38 AM

My guess is you're getting close to full output on your PSU. A 630 W PSU with a GTX 980 on an overclocked system is probably close to maxing out, which in turn is probably not giving your GPU a steady power supply.

#4 Big Tin Man

    Member

  • Rage
  • 1,957 posts

Posted 03 February 2016 - 10:43 AM

@wired: no temp issues. The driver crash happens within 30 seconds of Windows loading, basically as soon as a program is opened. I never made it to actually running a program without a crash. GPU and CPU were around 30 °C. I don't think it's a faulty MB either: the GTX 750 was in this slot for 2 years and did just fine, and the card before that had a 3-year tenure.

I may try checking the screws on the MB and making sure it's mounted tightly to the case.

@Spartan: So just how much additional power do you recommend? This says I need 472 W http://outervision.com/b/YEg8jU and EVGA's website calculator says I should buy their 650 W. The 12 V rail is only two-thirds utilized.

Edited by Big Tin Man, 03 February 2016 - 10:47 AM.


#5 xWiredx

    Member

  • Elite Founder
  • 1,805 posts

Posted 03 February 2016 - 11:21 AM

Big Tin Man, on 03 February 2016 - 10:43 AM, said:

@wired: no temp issues. The driver crash happens within 30 seconds of Windows loading, basically as soon as a program is opened. I never made it to actually running a program without a crash. GPU and CPU were around 30 °C. I don't think it's a faulty MB either: the GTX 750 was in this slot for 2 years and did just fine, and the card before that had a 3-year tenure.

I may try checking the screws on the MB and making sure it's mounted tightly to the case.

@Spartan: So just how much additional power do you recommend? This says I need 472 W http://outervision.com/b/YEg8jU and EVGA's website calculator says I should buy their 650 W. The 12 V rail is only two-thirds utilized.

Yeah, but the 980 is going to draw the full amount of power the PCIe slot can supply (75 W), in addition to extra power from the 6/8-pin connectors. The 750 draws a hair less than the slot's full amount, at 65-70 W. Like I said, though, this is really just an anecdote from the last time I saw an issue like yours.
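
To put numbers on the slot-power point, this sketch adds up the nominal ceilings from the PCI Express ratings (75 W from an x16 graphics slot, 75 W per 6-pin, 150 W per 8-pin). It is a budget, not a measurement: it says what a card may pull while staying in spec, not what it actually draws.

```python
# Nominal in-spec power ceilings per source, in watts.
SLOT_W = 75        # x16 graphics slot
SIX_PIN_W = 75     # 6-pin auxiliary connector
EIGHT_PIN_W = 150  # 8-pin auxiliary connector

def board_power_budget(six_pins=0, eight_pins=0):
    """Maximum in-spec power from the slot plus the card's auxiliary connectors."""
    return SLOT_W + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W

print(board_power_budget())            # GTX 750 (slot power only): 75
print(board_power_budget(six_pins=2))  # GTX 980 as described (2x 6-pin): 225
```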

See if booting into Safe Mode helps. Worth a shot. Also, if it happens so quickly that you can't get into Event Viewer, swap in a card you CAN keep stable long enough to get into Event Viewer.

A 600 W PSU would be enough to power your system. The 980 doesn't draw a ridiculous amount of power, and even a heavily overclocked Intel system isn't going to push it over that. If the PSU is degrading, though, it may not deliver the right amount of power cleanly, which could also cause odd behavior. If it's getting radically out of line and causing noise, spikes, etc. in the power delivery, it could be causing havoc or even damaging the motherboard's power circuitry, too.

#6 Goose

    Member

  • Civil Servant
  • 3,463 posts
  • Location: That flattop, up the well, overhead

Posted 03 February 2016 - 11:51 AM

Can you get into Safe Mode? Can you get into VGA Mode?

Are you using DDU for these fresh driver installs?

#7 Oderint dum Metuant

    Member

  • Ace Of Spades
  • 4,758 posts
  • Location: United Kingdom

Posted 03 February 2016 - 12:16 PM

How old is the PSU? That would be my question.

#8 Missing Spartan

    Member

  • Veteran Founder
  • 10 posts
  • Location: USA

Posted 03 February 2016 - 12:27 PM

Use this site to calculate your system draw: http://extreme.outer...culatorlite.jsp. Also, for best efficiency you want to be using only about 60-80% of your PSU. And your i7 930 is a power-hungry CPU at 130 W stock; with an overclock you could be using up to 200 W for that component alone.


EDIT: Just did a check on your system, and it looks like you still have about 100 watts of headroom. Even so, I would probably still get a 750 W PSU so you have more room if you intend to overclock the card.
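
The "up to 200 W" figure can be ballparked with the usual rule of thumb that dynamic CPU power scales with frequency times voltage squared. A rough sketch, assuming the 130 W rating approximates stock draw and using the 2.8 to 3.4 GHz overclock from the OP; the voltage increases are hypothetical, since the actual overclock voltages aren't given in the thread.

```python
def scaled_cpu_power(base_watts, base_ghz, oc_ghz, voltage_ratio=1.0):
    """Rough estimate: P scales with frequency * voltage^2.

    Ignores leakage and treats the rated figure as actual draw,
    so the result is only a ballpark.
    """
    return base_watts * (oc_ghz / base_ghz) * voltage_ratio ** 2

# 130 W and 2.8 -> 3.4 GHz come from the thread; the voltage ratios are hypothetical.
for v_ratio in (1.00, 1.05, 1.10):
    print(f"Vcore x{v_ratio:.2f}: ~{scaled_cpu_power(130, 2.8, 3.4, v_ratio):.0f} W")
```

Under these assumptions it comes out to roughly 158 W with no voltage change and about 191 W with a 10% Vcore bump, so the answer hinges on the voltage actually used.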

Edited by Missing Spartan, 03 February 2016 - 12:37 PM.


#9 Big Tin Man

    Member

  • Rage
  • 1,957 posts

Posted 03 February 2016 - 12:48 PM

Goose, on 03 February 2016 - 11:51 AM, said:

Can you get into Safe Mode? Can you get into VGA Mode?

Are you using DDU for these fresh driver installs?


Yes, I used Safe Mode with Networking to download the various versions of the NVIDIA drivers at first, and used DDU for the clean driver installs. No issues while in Safe Mode, which is why I chased the drivers for so long, thinking that was the issue. Since pushing the RAM in and switching the 6-pin connections, there haven't been any issues.

DV McKenna, on 03 February 2016 - 12:16 PM, said:

How old is the PSU? That would be my question.


Embarrassingly old. It was new when I built the system, so 5-6 years now. Thinking about it more, the fact that switching the 6-pin connectors made it work while the USB card reader is still malfunctioning is sounding more like a smoking gun here.

Missing Spartan, on 03 February 2016 - 12:27 PM, said:

Use this site to calculate your system draw: http://extreme.outer...culatorlite.jsp. Also, for best efficiency you want to be using only about 60-80% of your PSU. And your i7 930 is a power-hungry CPU at 130 W stock; with an overclock you could be using up to 200 W for that component alone.


See OP; I already had that info up there. Load is ~422 W of 630 W, so about 67%. Feels OK, but I'm not an expert on this, and I don't remember my OC voltages off the top of my head.
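
The percentage is easy to check. This sketch just divides the calculator's ~422 W load estimate by the 630 W rating and tests the result against the 60-80% band suggested in post #8 (a band that is disputed in the next reply).

```python
def psu_load(load_w, rated_w):
    """Return (fraction of rated output used, watts of headroom left)."""
    return load_w / rated_w, rated_w - load_w

fraction, headroom = psu_load(422, 630)
print(f"{fraction:.1%} of rated output, {headroom:.0f} W headroom")
# 67.0% of rated output, 208 W headroom

in_band = 0.60 <= fraction <= 0.80  # the 60-80% range from post #8
print(in_band)  # True
```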

#10 xWiredx

    Member

  • Elite Founder
  • 1,805 posts

Posted 03 February 2016 - 12:51 PM

That isn't particularly sound advice for efficiency reasons. PSUs do not typically reach peak efficiency until somewhere in the 70-90% utilization range. Going bigger when you don't need to isn't really better. Having breathing room is good, but having too much breathing room is an absolute waste.

If the cause of the issue lies in the PSU, it's the PSU's ability to do its job in its aged state and not the amount of power it can output.

Realistically, the motherboard is 7 or 8 years old now, too, so I'm kind of wondering how long ago the system was built. Just kind of wondering more than anything, but if the system was built around the board's release date and the PSU hasn't changed, we can safely assume at least the PSU should be replaced.

#11 Big Tin Man

    Member

  • Rage
  • 1,957 posts

Posted 03 February 2016 - 01:04 PM

The PSU, MB, CPU, and 3 of the 6 DIMMs are all original. I think I built the system in mid-2009 or so, and it's been good.

#12 Big Tin Man

    Member

  • Rage
  • 1,957 posts

Posted 03 February 2016 - 01:14 PM

What tools do you guys like for diagnosing a PSU issue?

#13 xWiredx

    Member

  • Elite Founder
  • 1,805 posts

Posted 03 February 2016 - 01:49 PM

Eh, there are testers available on sites like Newegg. Problem is they're not the most trustworthy way of diagnosing PSU issues. The only good way is to swap it out with another known-good one.

Here are a few questions for you: Have you tried putting the 980 in a different PCIe slot? Have you tried using a different 6/8-pin connector for the extra power? Have you tried unplugging that card reader?
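
The swap-one-thing-at-a-time approach can be written down as a tiny bookkeeping sketch. The `SwapTrial` type, its fields, and the outcomes below are hypothetical, made up purely to show the logic: change exactly one thing per trial, and the change that makes the fault disappear points at the culprit.

```python
from dataclasses import dataclass

@dataclass
class SwapTrial:
    change: str          # the ONE thing changed for this trial (hypothetical)
    fault_present: bool  # did the crash/recover still happen afterwards?

def suspects(trials):
    """Changes after which the fault went away: the likely culprits."""
    return [t.change for t in trials if not t.fault_present]

# Made-up outcomes for illustration, not results from this thread.
trials = [
    SwapTrial("GPU moved to the x8 slot", fault_present=True),
    SwapTrial("different 6/8-pin connector", fault_present=True),
    SwapTrial("card reader unplugged", fault_present=False),
]
print(suspects(trials))  # ['card reader unplugged']
```

If more than one thing changes per trial, the result no longer isolates a single part, which is why the swaps have to be done one at a time.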

#14 Oderint dum Metuant

    Member

  • Ace Of Spades
  • 4,758 posts
  • Location: United Kingdom

Posted 03 February 2016 - 01:53 PM

Big Tin Man, on 03 February 2016 - 01:04 PM, said:

The PSU, MB, CPU, and 3 of the 6 DIMMs are all original. I think I built the system in mid-2009 or so, and it's been good.


PSUs suffer wear over the years and supply less power. I think age has just caught up with it and it's not supplying enough power.

#15 Big Tin Man

    Member

  • PipPipPipPipPipPipPipPip
  • Rage
  • Rage
  • 1,957 posts

Posted 03 February 2016 - 02:05 PM

xWiredx, on 03 February 2016 - 01:49 PM, said:

Eh, there are testers available on sites like Newegg. Problem is they're not the most trustworthy way of diagnosing PSU issues. The only good way is to swap it out with another known-good one.

Here are a few questions for you: Have you tried putting the 980 in a different PCIe slot? Have you tried using a different 6/8-pin connector for the extra power? Have you tried unplugging that card reader?


I'm coming to the realization that I need a new PSU.

The MB only has one x16 slot, so I didn't think moving to an x8 slot was a good idea.

The PSU only has two 6/8-pin connectors, so I can't trade those around beyond the left/right switch I already did, and that worked. For now.

#16 xWiredx

    Member

  • PipPipPipPipPipPipPipPip
  • Elite Founder
  • Elite Founder
  • 1,805 posts

Posted 03 February 2016 - 06:22 PM

The x8 slot might slightly reduce performance, but it can't hurt to try it to see if the issue still happens.
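
For a sense of scale, here is the theoretical one-direction bandwidth of each slot width, assuming the X58 board's slots are PCIe 2.0 (5 GT/s per lane with 8b/10b encoding). It says nothing about how much a GTX 980 actually loses in games at x8.

```python
GT_PER_S = 5.0    # PCIe 2.0 signalling rate per lane, in gigatransfers/s
ENCODING = 8 / 10  # 8b/10b line coding: 8 data bits per 10 bits sent

def pcie2_bandwidth_gbytes(lanes):
    """Theoretical usable one-direction bandwidth in GB/s for a PCIe 2.0 link."""
    return GT_PER_S * ENCODING * lanes / 8  # 8 bits per byte

for lanes in (16, 8):
    print(f"x{lanes}: {pcie2_bandwidth_gbytes(lanes):.1f} GB/s per direction")
```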

#17 Big Tin Man

    Member

  • PipPipPipPipPipPipPipPip
  • Rage
  • Rage
  • 1,957 posts

Posted 04 February 2016 - 09:02 AM

So a couple things happened last night. I decided to try to have fun instead of tinkering right away.

1. Everything seemed stable with the 344.75 driver. Played probably 10-12 matches, multitasked while waiting for the group, etc. Ran Afterburner to monitor FPS and voltages; everything seemed normal. FPS was in the 40-50 range throughout matches with nearly maxed settings at 1920x1080 and vsync on. Temps stayed below 60 °C. Was pretty darn smooth.
2. Turned PhysX back to auto-decide somewhere in the middle of all those matches.
3. The front USB card reader healed itself, somehow.
4. Sometime during the evening, Windows decided to drop from 32-bit color to 16-bit color. Weird. Turned it back to 32-bit.
5. Then, just for fun at the end of the night, I tried upgrading the driver to the latest 361.86. The driver crashing immediately came back. Went to bed saying screw it, I'll roll back to 344.75 tomorrow.

The new PSU should arrive today. The overall weirdness created by upsizing the card (and the power draw) sure smells like a PSU on the way out. It could still be the MB, but I'd wager that if I put the GTX 750 back in, everything would go back to normal.

#18 Big Tin Man

    Member

  • PipPipPipPipPipPipPipPip
  • Rage
  • Rage
  • 1,957 posts

Posted 05 February 2016 - 08:54 AM

AAAARRRRGGGHHHH.

Installed the new PSU late last night: no change. Drivers still crashing.
Used DDU to remove the 361.86 drivers and reinstalled 344.75 (which worked with the previous PSU): drivers still crashing. Rage. I was better off with the old PSU.

Opened a support ticket with EVGA. Their initial suggestion was to check voltages in the BIOS, run memtest, and update the BIOS. Anything else pop into anybody's mind?

#19 Goose

    Member

  • PipPipPipPipPipPipPipPipPip
  • Civil Servant
  • Civil Servant
  • 3,463 posts
  • Location: That flattop, up the well, overhead

Posted 05 February 2016 - 09:09 AM

[grasping at straws]Bump the QPI PLL and/or its "twin," IOH Core? That seems to have been my cure for all the 0x9Fs I was racking up.[/grasping at straws]

#20 Big Tin Man

    Member

  • PipPipPipPipPipPipPipPip
  • Rage
  • Rage
  • 1,957 posts

Posted 05 February 2016 - 12:02 PM

Really dumb question: my PCI-E cable has two 6+2-pin connectors on it. I should use both of those to power the GPU, and not completely separate cables for each, right?
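
Some arithmetic relevant to the question, though not an answer to it: if both connectors hang off one cable and each is loaded to the 75 W 6-pin rating, that single cable carries twice the current of a one-connector cable. Whether its wiring is sized for that is up to the PSU maker, so the PSU's own documentation is the place to check. A minimal sketch:

```python
SIX_PIN_W = 75.0  # rated power per 6-pin PCIe connector, in watts
VOLTS = 12.0      # the connectors are fed from the 12 V rail

def cable_current(connectors_on_cable, watts_each=SIX_PIN_W):
    """Worst-case current (A) on one cable if every connector on it runs at its rating."""
    return connectors_on_cable * watts_each / VOLTS

print(cable_current(1))  # one connector per cable: 6.25
print(cable_current(2))  # both connectors on one cable: 12.5
```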
