• BSOD VIDEO_TDR_FAILURE (116)

    #2458046

    I installed the June patches on June 28 on my Win 10 Professional 64-bit 21H2 system.  Today, in the space of 45 minutes, I had three 116 BSODs:

     

    VIDEO_TDR_FAILURE (116)
    Attempt to reset the display driver and recover from timeout failed.

    MODULE_NAME: memory_corruption
    IMAGE_NAME: memory_corruption
    FOLLOWUP_NAME: memory_corruption
    DEBUG_FLR_IMAGE_TIMESTAMP: 0
    MEMORY_CORRUPTOR: LARGE
    FAILURE_BUCKET_ID: X64_MEMORY_CORRUPTION_LARGE
    BUCKET_ID: X64_MEMORY_CORRUPTION_LARGE
    Followup: memory_corruption

    These are the first times I have experienced this BSOD in Windows 10, and I have no idea if it is caused by the June patches:

    KB5013887 2022-06 Cumulative Update .NET 3.5 & 4.8 for Win 10 21H2
    KB5014699 2022-06 Cumulative Update for Win 10 v21H2 x64

    I rebooted from a Memtest 5.0 CD, and I ran Memtest for four hours.  When I cancelled the test it was on pass 2 of 4, and it had not found any errors.

    I had not really run Windbg since my Win 7 days.  I set the environment variable

    _NT_SYMBOL_PATH to c:\symbols

    Is this the correct setting?  I am not an expert in Windbg, and I do not know a lot of Windows internals.  Does anyone have any suggestions as to what I can do next to debug this?  I have four minidumps.  I have attached the output of my Windbg on the first minidump.  Thanks.
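A note on the symbol path question: a plain directory such as c:\symbols is a valid value, but the debugger can then only use symbols that are already stored there. The commonly documented symbol-server form, `srv*c:\symbols*https://msdl.microsoft.com/download/symbols`, also lets WinDbg download Microsoft's public symbols into that folder. Below is a minimal Python sketch that reports which form a value uses. The helper name `describe_symbol_path` is hypothetical and only for illustration.

```python
import os


def describe_symbol_path(value: str) -> str:
    """Classify a _NT_SYMBOL_PATH value (hypothetical helper, for illustration)."""
    if not value.strip():
        return "unset"
    # Entries are separated by ';'. A symbol-server entry starts with
    # 'srv*' or 'symsrv*'; anything else is treated here as a plain directory.
    entries = [e.strip().lower() for e in value.split(";") if e.strip()]
    if any(e.startswith(("srv*", "symsrv*")) for e in entries):
        return "symbol server"
    return "local directory only"


# Report on whatever is set in the current environment.
print(describe_symbol_path(os.environ.get("_NT_SYMBOL_PATH", "")))
```

If this prints "local directory only", stack traces for Windows components may show only module names and offsets until the symbol-server form is used.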

    • #2458052

      I’d start by looking in Device Manager for any alerts, and being sure your display/video drivers are up to date from your computer OEM or the device OEM.

    • #2458051

      Is this the correct setting?

      Yes, or you can use any other valid directory. Have you cleaned inside your computer lately, or manually installed a new video driver (or had Windows Update change the video driver)?

      Excess heat and dust build-up can cause this type of driver crash. (After many other possibilities were checked, cleaning the computer restored system stability.)

      Even though video driver issues are not listed as a known problem for this cumulative update, have you tried to uninstall this update?

      • #2458226

        I installed a new motherboard and CPU February 06.  And the case was fairly clean then.  I have not tried to uninstall this update.  I am not sure I trust an uninstall to put things back the way they were.  The four 116 BSODs were all during a Zoom session.  I started a Zoom session (with just me) as a simple test to see if I get any BSODs while Zooming for half an hour.

        I do have “unknown” heat problems with the CPU – Intel I5-10400.  I am running Open Hardware Monitor to monitor the fan speeds and the core temperatures.  Right now the cores are 113-124 degrees F.  And I have not yet determined the temperature at which the CPU will auto-reset.  I had one auto-reset this morning when I plugged a USB Passport drive into the front panel to do my Monday morning backups.

         

        I will look at the sysnative.com web site soon.

         

        When I look in Device Manager at my Asus EAH6450 graphics card, I see a large number of driver files listed.  I use Iobit Driver Booster to tell me if drivers are old.  I normally do not use this utility to update drivers; I check with the manufacturers’ web sites to find new drivers.  In Driver Booster I do not see any driver specific to my graphics card.

        • #2458258

          The max temp for an i5-10400 (i.e. the temp at which it will auto shutdown) is 100° C (212° F) and those temps aren’t anywhere close to that!

          A typical “under load” temp for that CPU would be ~75° C (167° F) and you’re well below that as well.

          If you’re actually seeing unusual temperature spikes that are causing it to overheat and shut down, it’s possible the thermal paste didn’t make a good seal between the CPU and heat sink and it’s simply taken this long for it to show up.

          BTW, what wattage is your PSU? (If it’s not large enough to provide sufficient power to the system when it’s under load, it’d cause the sort of symptoms you’re seeing.)

          • #2458276

            The PSU is a new 750 W unit that I installed with the new motherboard and CPU.  In the past, I have run a full Defender scan (26 minutes) that used all cores to the max and got the temperatures close to 200 degrees F.   I tried a full scan a week ago, and after less than a minute the machine auto-restarted.  Maybe I need more thermal paste.  I have replaced one of the chassis fans, and I have two other replacement fans.  But the current fans seem to be operating normally – CPU fan 1080 RPM and the three chassis fans 850-980 RPM.  Open Hardware Monitor shows that the max temps are around 160 degrees F.  It appears that OHM does not store max values over a reboot.

        • #2458316

          (Thanks)

          Sometimes a totally unrelated program can cause these TDR failures. A few search results mention Zoom video conferencing in connection with this failure, although the results I viewed were inconclusive and unsolved. Does it happen while watching any other video service (YouTube, Vimeo, etc.)?

          To back up @alejr: depending on ambient room conditions, 113-124F at system idle is okay. A CPU temperature of 160F under load is also okay; that is in the safe zone, but still hot.

          Windows Defender driving the CPU to nearly 200F for that amount of time can wear out the CPU, even though the silicon may take the stress. I also recommend that you replace the thermal paste, especially if the current paste is a decade old.

          Do you have any problems with static electricity all year?

          (Are your USB drivers up to date?)

          • #2458430

            Sunday afternoon I watched a one-hour YouTube webinar with no problems.  The ONLY 116 BSODs I have experienced were the four Sunday morning during a Zoom meeting.  The thermal paste is as old as the CPU, which I installed February 6.  There are no static electricity problems, as far as I know.  As for USB drivers – I do not know.  Monday I plugged in one USB Passport drive to do a backup of my data disk; there were no problems.  After that backup, I removed that Passport and plugged in a different Passport USB drive, and the system auto-rebooted.  Driver Booster says that I have many Intel(R) ICH8 Family USB Host Controller – {2834, 2836, 3830, 2835, 2832, 283A, 2831} drivers that need updating.  Current: 06/21/2006; available: 7/31/2013.  I have no idea if this is one driver for all of the USB controllers or a different driver for each; Driver Booster does not give me details before I update.  I run Intel Driver and Support Assistant, and I have no idea if that utility would alert me to any outdated Intel USB drivers; I have never looked at any output from that utility.

            Someone suggested that, even though memtest ran fine, I should re-seat the two memory sticks and maybe interchange them in the slots.

            • #2458702

              I hope that you can find out whether Zoom is the software causing the problem. It could be that searching only Zoom’s forums again shows complaints but perhaps no solutions.

              The USB drivers may be fine as they are. Updating them might not cause harm, as the drivers cover a range of chipset models, but wait until the Zoom issue gets resolved before introducing another potential issue. Intel’s utility should tell you if a driver needs replacement, although sometimes you will need to search for a driver manually.

              Has this different Passport drive caused any other computer to auto-reset (spontaneously reboot), or only this one?

            • #2461753

              “I hope that you can find out whether Zoom is the software causing the problem. It could be that searching only Zoom’s forums again shows complaints but perhaps no solutions.

              “The USB drivers may be fine as they are. Updating them might not cause harm, as the drivers cover a range of chipset models, but wait until the Zoom issue gets resolved before introducing another potential issue. Intel’s utility should tell you if a driver needs replacement, although sometimes you will need to search for a driver manually.

              “Has this different Passport drive caused any other computer to auto-reset (spontaneously reboot), or only this one?”

              My replies: I have used Zoom a number of times since, and I have not experienced any further 116 BSOD outages.

              Intel(R) Driver & Support Assistant reports, “No supported driver or software updates are available for your system.”

              I have two Passport drives that I use once a week for backups.  I have had outages with both, but I have never had outages with both on the same day.

              These two external hard drives are used ONLY on this computer; I do not use them on the Win 8.1 laptop that I manage.

              One further question.  I am assuming that the auto-reboots are initiated by the CPU chip, which may be sensing overheating.  These are not Windows-initiated BSODs, as there is no minidump produced and no traditional BSOD screen from Windows stating that a problem has occurred.  Is there any other hardware that would cause these auto-reboots?  My brother suggested that, since the 116 BSOD minidumps point to memory corruption, I re-seat the memory and/or interchange the two sticks in the slots.  Could memory cause these reboots?  I do not see how, but I am not an expert in PC hardware.  Thanks.

        • #2458433

          The PSU is a new 750 W unit that I installed with the new motherboard and CPU.

          Unless you’ve got a ton of other internal items that use a lot of power, that should be plenty.

          Maybe I need more thermal paste

          Thermal paste is one of those things where more is not better, and you should NEVER add more on top of what’s already there!

          Instead, completely clean off all the existing paste using some “lint free” alcohol wipes and then reapply new paste (a “pea sized” drop applied to either the CPU or heatsink, but not both) and then spread out to the edges using something that’s flat but not sharp (i.e. similar to a credit card.)

          The type of paste you use can also make a difference. My current preference is Thermal Grizzly Kryonaut although there are other good ones out there as well.

          Remember, the intended purpose of thermal paste is to fill the “invisible” microscopic holes between the heatsink and the CPU while allowing them to be as close as possible to “touching each other” so there’s efficient heat transfer between them.

          Applying more than needed can widen the gap between the heatsink and the CPU causing a significant reduction in heat transfer and lead to CPU overheating.

            It’ll also get squeezed out of the sides of the heatsink CPU interface onto other motherboard parts where it “may” cause problems — or at the very least be a real PITA to clean up.

          At this point, I’d say maybe you should try applying new thermal paste and see if that helps bring your temps back under control.

          • #2458458

            “B-B sized” or “lentil-sized” (for the foodies) are probably more appropriate descriptions.

            Zig

          • #2460542

            If I were to open the case and look at the existing thermal paste, how can I tell if what I have is adequate for the surface area of the CPU chip?  One of my brothers thinks that I need a new case with better cooling.  The temperatures now are 100-104 degrees F.  But this morning at 4:09 the machine reset (while I was sleeping), and I have no idea what the temperatures were at that time.

            • #2460849

              You can’t check the paste without replacing it. It’s a use once only thing.

              The case generally makes little difference, but the CPU fan does.

              As you have only managed to get to 200F by maxing out the CPU, I would think you have the temperatures under control and the issue is elsewhere.
              Reseat the memory etc and try again.

              Can you log the temperatures with OHM? If not, try Speedfan.

              cheers, Paul

            • #2461019

              To log your temps using OHM:

              Select Options, Log Sensors and specify the temp sensors.

              Select  Options, Logging Interval to set how often it updates the data.

              The sensor data will be saved to a file in the same directory OHM is in.
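Once a log exists, the per-sensor maximum can be pulled out with a few lines of Python. This is only a sketch under an assumed file layout: a first row of sensor identifiers, a second row of readable names beginning with "Time", then one row per sample. Check an actual OHM log before relying on it. The sample values and the function name `max_per_sensor` are made up for illustration.

```python
import csv
import io


def max_per_sensor(log_text: str) -> dict[str, float]:
    """Return the highest logged value for each sensor column."""
    rows = list(csv.reader(io.StringIO(log_text)))
    # Assumed layout: row 0 = sensor identifiers, row 1 = readable names,
    # remaining rows = samples with the timestamp in the first column.
    names = rows[1][1:]
    maxima: dict[str, float] = {}
    for row in rows[2:]:
        for name, cell in zip(names, row[1:]):
            if cell == "":
                continue  # skip sensors with no reading in this sample
            value = float(cell)
            maxima[name] = max(value, maxima.get(name, value))
    return maxima


# Made-up sample in the assumed layout (temperatures in Celsius).
SAMPLE = """\
,/intelcpu/0/temperature/0,/intelcpu/0/temperature/1
Time,CPU Core #1,CPU Core #2
07/23/2022 04:36:23,41,43
07/23/2022 04:38:23,44,42
"""

print(max_per_sensor(SAMPLE))
```

Because OHM does not appear to keep its max values across a reboot, reading them back from the log file is one way to see what the cores were doing just before an outage.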

    • #2458157

      0x116/117 are display driver commands that fail to reach their target due to a ‘logjam’ of other commands taking CPU priority.

      I suggest you have an expert walk you through troubleshooting: https://www.sysnative.com/forums/threads/blue-screen-of-death-bsod-posting-instructions-windows-10-8-1-8-7-vista.68/

    • #2459500

      Here is a quick update.  I have had two Zoom sessions since last Sunday morning, and both had no problems.  So, I cannot conclude that Zoom is the problem.  I have had two CPU shutdowns this morning – one a few minutes after I left the computer around midnight, and another when I started Open Hardware Monitor (after I logged in from the first reboot).  (The previous ones were 6/26 – 7/02.)  I have had no problems since 10:00AM, and the CPU cores right now are running 122 – 141 degrees F.  I have one driver update for my ASUS EAH6450 graphics card – from AMD and from Driver Easy (I am not sure that they are the same driver update).  I will install it tomorrow morning after I have done my weekly disk backups.  I have not yet had time to pursue the 116 BSOD dumps with the URL provided earlier.

      Driver Easy says
      current:  2015-09-22  AMD  ASUS EAH6450               15.201.1801.0
      new:      2016-02-26  AMD  AMD Radeon HD 7400 series  15.301.1901.0000

      The other is contained in downloaded file amd-catalyst-15.7.1-win10-64bit.exe .
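When comparing the two version strings reported by Driver Easy, it helps to compare them field by field as numbers rather than as text, so that a trailing "0000" counts the same as "0". A small sketch follows; the helper name `parse_version` is hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


current = parse_version("15.201.1801.0")
offered = parse_version("15.301.1901.0000")

# Tuples compare element by element, left to right.
print(offered > current)
```

This only shows that the offered driver has a higher version number. It does not show whether it matches the driver inside the AMD Catalyst download.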

      • #2459843

        The sudden crashes are not good, but the other news is. If those temperatures were caused by a modest system load, that is okay; but if they were recorded while the system was idle, they seem a bit too high.

        The other is contained in downloaded file amd-catalyst-15.7.1-win10-64bit.exe .

        Yes, if you have downloaded that from AMD’s site, AMD Catalyst driver version 15.7.1 will support a Radeon HD 6450 or a 7000 series model, as suggested by Driver Easy.

        ASUS’s last driver for your card is from 2013, which is a bit old; it is usually okay to use AMD’s video driver.

    • #2460221

      This morning I ran the amd-catalyst-15.7.1-win10-64bit.exe file.  It installed some software and updated some drivers.  I am not sure what driver files were updated; the XML-format report does not give filenames.  I will wait to see if I get any new 116 BSODs.

    • #2462238

      I am now logging temps with OHM every two minutes.  When I get another reboot, I will check the log and re-seat the memory.  When I have been at my computer during auto-reboots, there is nothing CPU-intensive happening, as far as I can tell.  I also will check the TPM setting in the BIOS, as I am getting conflicting info on the status of my TPM.

      • #2462533

        Have you tried checking the TPM this way:
        Win+R
        tpm.msc

        cheers, Paul

        • #2464317

          01) System Log EventID 15 – TPM (10:02:07AM)
          The device driver for the Trusted Platform Module (TPM) encountered
          a non-recoverable error in the TPM hardware, which prevents TPM
          services (such as data encryption) from being used. For further help,
          please contact the computer manufacturer.
          locationCode 0x1c0004f3

          02) System Log EventID 1282 – TPM-WMI (10:02:23AM)
          The TBS device identifier has been generated.

          03) System Log EventID 1282 – TPM-WMI (10:02:23AM)
          The Trusted Platform Module (TPM) hardware on this computer cannot be
          provisioned for use automatically. To set up the TPM interactively
          use the TPM management console (Start->tpm.msc) and use the action to
          make the TPM ready.

          Error: The operation completed successfully.
          Additional Information: 0x80000

          04) tpm.msc
          Status +
          The TPM is ready for use.

          05) PC Health Check:
          “This PC must support Secure Boot”

          It appears that some of these statements are contradictory.  What I will do (at the next reboot): Run Settings > Update & Security > Recovery and select Restart now under Advanced startup.
          From the next screen, select Troubleshoot > Advanced options > UEFI Firmware Settings > Restart to make changes.

          • #2465307

            “What I will do (at the next reboot): Run Settings > Update & Security > Recovery and select Restart now under Advanced startup.
            From the next screen, select Troubleshoot > Advanced options > UEFI Firmware Settings > Restart to make changes.”

            I tried, and (as I might have expected) these instructions did not work.  Maybe they are wrong or written for a different level of Windows 10.  I tried the first line, and when I came to “Restart now”, the computer started a reboot.  I did not get into the “next” screen.  So, at this point I have no idea what to try next.

      • #2464608

        Maybe my CPU chip has ESP and knows that the temps are being monitored and logged 🙂
        I have not had an outage since I started logging, but I have gone this long without an auto-reboot before, as per my reboot log:

        04/02/2022 04:17 Auto-reboot
        04/02/2022 16:55 Auto-reboot
        04/03/2022 20:58 Auto-reboot
        04/03/2022 22:53 Auto-reboot
        04/04/2022 08:04 Auto-reboot
        04/04/2022 08:09 Auto-reboot
        04/09/2022 04:21 Auto-reboot
        04/17/2022 14:58 Auto-reboot
        04/17/2022 16:56 Auto-reboot
        04/18/2022 09:19 Auto-reboot
        04/19/2022 11:22 Auto-reboot
        04/20/2022 16:49 Auto-reboot
        06/26/2022 10:46 Auto-reboot
        06/26/2022 16:07 Auto-reboot (during full Defender scan)
        06/28/2022 09:56 Auto-reboot (while connecting blue Passport disk)
        06/28/2022 11:27 Auto-reboot (while closing front panel)
        06/28/2022 14:51 Auto-reboot
        06/28/2022 15:02 Auto-reboot
        07/02/2022 01:59 Auto-reboot
        07/02/2022 08:34 Auto-reboot
        07/03/2022 11:08 Auto-reboot
        07/03/2022 11:23 Auto-reboot
        07/03/2022 11:31 Auto-reboot
        07/04/2022 09:39 Auto-reboot (while connecting blue Passport disk)
        07/10/2022 00:09 Auto-reboot
        07/11/2022 09:11 Auto-reboot (while connecting yellow Passport disk)
        07/11/2022 18:35 Auto-reboot
        07/13/2022 10:26 Auto-reboot
        07/14/2022 04:09 Auto-reboot
        07/14/2022 18:33 Auto-reboot
        07/15/2022 04:38 Auto-reboot
        07/15/2022 08:21 Auto-reboot
        07/16/2022 11:00 Auto-reboot
        07/17/2022 03:01 Auto-reboot
        07/17/2022 10:01 Auto-reboot
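A log in this format is easy to analyse with a short script, for example to find the longest quiet stretch between two auto-reboots. This is a sketch only: the function names are hypothetical, and the sample below is a four-line excerpt of the log above rather than the whole list.

```python
from datetime import datetime


def parse_reboots(log_text: str) -> list[datetime]:
    """Extract timestamps from lines like '04/20/2022 16:49 Auto-reboot ...'."""
    times = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] == "Auto-reboot":
            stamp = f"{parts[0]} {parts[1]}"
            times.append(datetime.strptime(stamp, "%m/%d/%Y %H:%M"))
    return times


def longest_gap(times: list[datetime]) -> tuple[datetime, datetime]:
    """Return the consecutive pair of reboots that are furthest apart."""
    pairs = zip(times, times[1:])
    return max(pairs, key=lambda pair: pair[1] - pair[0])


# Excerpt of the reboot log above.
SAMPLE = """\
04/19/2022 11:22 Auto-reboot
04/20/2022 16:49 Auto-reboot
06/26/2022 10:46 Auto-reboot
06/26/2022 16:07 Auto-reboot (during full Defender scan)
"""

times = parse_reboots(SAMPLE)
start, end = longest_gap(times)
print(f"Longest quiet period: {start} to {end} ({(end - start).days} days)")
```

The same parsed list could be grouped by hour of day or by the note in parentheses to see whether the reboots cluster around particular activities.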

        • #2464827

          Maybe the auto-reboot will clear up on its own, but have you tried to reseat all of your power connectors after one of these events happens to you?

          • #2464862

            I have not manually shut down to re-seat power cables and the two memory sticks, as I assumed that the machine would reboot on its own, and then I could shut down to do the “maintenance”.  If I run a few weeks without an auto-reboot, then I will power down and do the maintenance, including the TPM stuff, above.

            • #2465310

              I had two auto-reboots today.  The first at 04:48, when I was asleep (or maybe listening to weather alerts on my radio).  I got to my machine around 8:10, and after I logged in and while I was starting the applications needed for a Zoom session, at 08:17 there was another auto-reboot.  I did not have time to do anything then, so at 21:10 this evening, (after the failed TPM settings test above), I shut down.  I interchanged the two memory sticks, and I rebooted into memtest to test the memory for 2.5 minutes.  Then I rebooted into Windows 10.  The machine did not auto-reboot between 8:17AM and the time I did the memory exchange at 21:10.

              I was logging OHM every two minutes, and I have attached a copy of the log for today.  I did not see any temperature spike at the end, but I have not reviewed the log in detail.  I have changed the logging from 2-minute intervals to one-minute.  But I cannot tell from the System Event Log exactly when the auto-reboot occurred. 04:39:22 is the timestamp of the first entry after the reboot, but EventID 6008 says, “The previous system shutdown at 4:02:09 AM on ‎7/‎23/‎2022 was unexpected.”  And I know that if the reboot was at 04:02, the restart would have put the machine in the login screen within a minute.  So I have no idea from where the time in the 6008 Event Log record was retrieved.  And the last entry in the OHM log is 4:38:23.
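To put a number on the mismatch between the Event 6008 time and the last OHM log entry, the two timestamps can be parsed and subtracted. This is a sketch with a hypothetical helper name. It assumes an English-format message like the one quoted above, and it strips the invisible left-to-right marks that appear around the date in that quoted text.

```python
from datetime import datetime

LRM = "\u200e"  # invisible left-to-right mark found around the date fields


def parse_6008(message: str) -> datetime:
    """Pull the shutdown time out of an Event 6008 message (English format assumed)."""
    clean = message.replace(LRM, "")
    # Expected shape: "... shutdown at 4:02:09 AM on 7/23/2022 was unexpected."
    after_at = clean.split(" at ", 1)[1]
    time_part, rest = after_at.split(" on ", 1)
    date_part = rest.split(" ", 1)[0]
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %I:%M:%S %p")


MESSAGE = (
    "The previous system shutdown at 4:02:09 AM on "
    "\u200e7/\u200e23/\u200e2022 was unexpected."
)

event_time = parse_6008(MESSAGE)
last_ohm_entry = datetime(2022, 7, 23, 4, 38, 23)  # last OHM log line, from the post

print(last_ohm_entry - event_time)
```

The result shows that OHM kept logging for a little over 36 minutes after the time recorded in Event 6008, which supports treating the OHM log, not the event text, as the better estimate of when the machine actually went down.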

               

            • #2466371

              I rebooted into memtest to test the memory for 2.5 minutes.

              A successful 2.5 min memtest isn’t a reliable indicator of good memory.

              It’s barely long enough for it to heat up to normal operating temps and doesn’t even begin to test whether “long term” heat exposure might cause problems.

              Since your problem is “very random“, a much better memory test would be to leave it running overnight (simply set the number of passes to a large value and then stop the test the next morning.)

              If it passes that without any errors, it’s unlikely the memory is causing your problem.

            • #2466390

              My short memory test was done ONLY to ensure that I had properly re-inserted the two sticks of memory in their slots.  I was assuming, maybe incorrectly, that if the memory stick had not been installed properly, then memtest would fail almost immediately.  I do not know if there is any memory test during boot, so I do not know if incorrectly installed memory would cause a failure.  I assume that the Windows 10 operating system, which needs memory, would fail at some point while starting.  On July 3 I ran memtest for 4.0 hours without a problem.  It was in pass 2 of 4 when I terminated to reboot back into Windows 10.

            • #2466413

              I do not know if there is any memory test during boot, so I do not know if incorrectly installed memory would cause a failure.

              Every motherboard BIOS has a “built-in” memory test that would immediately detect incorrectly installed memory.

              It’ll also detect if you update/change the memory and run a more extensive memory test during the first boot with the new memory configuration.

    • #2466273

      On another forum someone suggested checking the power connections.  After I installed the July updates this morning, I checked the power connections on the motherboard.  I will see if I experience any more auto-reboots.

    • #2468228

      My last auto-reboot was 07/23/2022 08:17, before I shut down to re-seat the memory and check the power connections to the motherboard.  I just ran a full Defender scan, which took 29 minutes (09:44 – 10:13) and got all six CPU cores to over 150 degrees F.   For most of the time all six cores were running at 100%.  I have attached the OHM log file; the temps logged are in Celsius.

    • #2468251

      66° C (150° F) is well within acceptable limits for your CPU so it seems like maybe reseating the memory and power connections fixed things.

      • #2468335

        OHM says that the max temp for each core is 212 degrees F.

    • #2468429

      Specs for the i5-10400 indicate a Tjunction max of 100° C (212° F) so OHM is right on the mark.

      The temps for your CPU under a “normal” load should be 60—80° C (140—176° F) depending on exactly what you’re using it for (my i7-9700K runs between 66—78° C (151—172° F) under load; which regularly includes some video processing.)

      Lower temps for simple stuff like word processing/virus scans, higher temps for more complex things like games/video processing.

      Idle temps should be somewhere around 35—40° C (95—104° F) depending on your cooling setup (my i7-9700K with a 120mm Corsair H60 AIO and 3 case fans stays between 35—38° C (95—100° F) while idle.)

      FYI, Tjunction max is the maximum internal temperature a CPU can reach before it automatically “throttles” itself (i.e. lowers its operating frequency) to reduce power draw and limit further temperature rise.

      If the temp keeps rising after the CPU throttles itself and gets to ~105° C (221° F), your PC will shut down to protect the CPU from burning up.
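Since this thread mixes Fahrenheit readings with Celsius specs, a small sketch may help line them up. The thresholds are taken from the figures quoted in this reply (60 to 80° C normal load, 100° C Tjunction max, roughly 105° C shutdown); the band labels and function names are illustrative choices, not anything Intel or OHM defines.

```python
TJ_MAX_C = 100.0    # Tjunction max for the i5-10400, as quoted above
SHUTDOWN_C = 105.0  # approximate protective shutdown point, as quoted above


def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def classify(deg_c: float) -> str:
    """Place a core temperature into one of the rough bands described above."""
    if deg_c >= SHUTDOWN_C:
        return "shutdown"
    if deg_c >= TJ_MAX_C:
        return "throttling"
    if deg_c > 80:
        return "hot, below Tjunction max"
    if deg_c >= 60:
        return "normal load"
    return "idle or light use"


# Readings mentioned earlier in the thread, in Fahrenheit.
for reading_f in (124, 150, 200):
    reading_c = f_to_c(reading_f)
    print(f"{reading_f} F = {reading_c:.1f} C: {classify(reading_c)}")
```

By these bands, even the near-200° F Defender scan stayed below the throttling point, which is consistent with the view that the auto-reboots were not thermal.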

    • #2468486

      Reading through this thread reminds me of an incident I had with my old XP system.
      I had changed the memory latency timings on upgraded memory months before I encountered freezes in the then-new game Far Cry.
      Changing the latency timings back to default in the BIOS resulted in no more frustrating freezes thereafter, even though the OS reported everything was fine at my preferred memory timing settings.

      Motherboard model?
      It might be an idea to check over the BIOS settings in case they have been altered intentionally and perhaps forgotten about. If the motherboard has that facility, it’s worth checking.

      If debian is good enough for NASA...
      • #2468628

        I installed the new motherboard Feb 6 of this year, and the only changes I made to the BIOS settings since then were:

        1) Change the boot order so that it would look for a DVD before the boot disk (in case I wanted to run memtest)

        2) Change the boot type from CSM to UEFI.

        I think that there is a BIOS upgrade that I have downloaded but not yet installed.

        On another topic (see July 19 and 23, above): my machine is still not Windows 11-capable.  The System EventLog entries are conflicting, and the MS help page has instructions that do not work.  But that problem is minor now and NOT related to the auto-reboots.

    • #2487974

      One further comment.  I have had no recent auto-reboots, and every Saturday night I run a full Defender scan, which runs all cores at 100% for about 30 minutes.  There are no problems.  I was running BOINC, which uses lots of CPU cycles that would otherwise not be used.  Is it safe for me to start BOINC again?  This will keep the CPUs busy when I am not at the machine.  Thanks.

    • #2488165

      I was running BOINC, which uses lots of CPU cycles that would otherwise not be used

      It actually uses your CPU (and electricity) when you are not using it, so it makes your machine run harder than usual.
      As long as you are prepared to donate CPU and electricity, run BOINC / Folding@Home etc.

      cheers, Paul
