• BSOD Without Full Dump nor Minidump


    Author
    Topic
    #2436304

    I am running Windows 10 Professional 64-bit 21H2 with the March patches installed.  This morning and afternoon my machine experienced two BSOD outages.  There are two System EventID 6008 entries, but I can find no other EventLog entries nor full dumps nor minidumps that tell me what has happened.  Do I have to set some flag (maybe in the registry) to get a minidump?  Or has the minidump directory changed?  My C:\Windows\minidump directory is empty.  And I was not at my computer each time the BSOD occurred.  On Feb 06 I installed a new motherboard, new CPU chip, and new memory; I have not had problems since that time.  Could the problem be memory-related?
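On the registry question: Windows keeps its crash dump settings under the `CrashControl` registry key, so checking that key is a quick way to see whether a dump is expected at all and where it would be written. The following is a minimal Python sketch rather than a definitive tool. The function names are made up for illustration, the registry read only works on Windows, and reading the key does not explain why a dump is missing.

```python
# Meanings of the CrashDumpEnabled value under the CrashControl key.
CRASH_DUMP_MODES = {
    0: "none (no dump is written)",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_mode(value):
    """Translate a CrashDumpEnabled value into readable text."""
    return CRASH_DUMP_MODES.get(value, f"unknown value {value}")


def read_crash_control():
    """Read the dump settings from the registry (Windows only)."""
    import winreg  # imported here so the rest of the file loads on any OS

    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in ("CrashDumpEnabled", "DumpFile", "MinidumpDir"):
            try:
                # QueryValueEx returns (value, registry_type); keep the value.
                settings[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                settings[name] = None  # value not present on this machine
    return settings
```

On a Windows machine, `describe_mode(read_crash_control()["CrashDumpEnabled"])` reports the configured dump type, and the `DumpFile` and `MinidumpDir` entries show where Windows would write the full dump and the minidumps.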

    • #2436372

      File name is memory.dmp

      Try Nir Sofer’s
      https://www.nirsoft.net/utils/blue_screen_view.html

      BlueScreenView scans all your minidump files created during ‘blue screen of death’ crashes, and displays the information about all crashes in one table. For each crash, BlueScreenView displays the minidump filename, the date/time of the crash, the basic crash information displayed in the blue screen (Bug Check Code and 4 parameters), and the details of the driver or module that possibly caused the crash (filename, product name, file description, and file version).
      For each crash displayed in the upper pane, you can view the details of the device drivers loaded during the crash in the lower pane. BlueScreenView also mark the drivers that their addresses found in the crash stack, so you can easily locate the suspected drivers that possibly caused the crash.

      https://www.nirsoft.net/utils/application_crash_report.html

      WinCrashReport provides an alternative to the built-in crash reporting program of Windows operating system. When application crashes in your system and Windows displays the internal crash window of the operating system, you can run WinCrashReport, and get extensive report about the crashed application. The crash report of WinCrashReport is displayed as simple text or in HTML, and includes the following information: Crash memory address, Exception code, Exception description, Strings found in the stack, call stack, processor registers, modules list, threads list, and more…

    • #2436466

      I’ve been having problems of a similar nature recently.   Not exactly:  no BSOD, but rather the “Black screen of death” – i.e. the computer just stops, screens go black.  Have to hard boot restart.   This initially happened when I was running several graphics programs together.

      To test I ran several utilities.  I ran the FurMark GPU stress test for 2 hours without problems.  Then a memtest utility ran for 2 hours without problems.  Lastly I ran prime95 – which I understand to be a stress test for the microprocessor – which lasted about 2 minutes before the system died with the same black screen….

      So I was thinking that perhaps I have a microprocessor problem (AMD Ryzen 5) – but this thread gives me to think otherwise.  NirSoft’s WinCrashReport shows a whole bunch of searchapp.exe “stopped working” failures which seem to roughly correlate with the crash times.

      Looking into this I see suggestions that the KB5010415 update can be problematic.  But I don’t see it installed – though it seems to be a bundled update.

      So unclear to me.  More work….

      • #2436469

        When is the last time you updated the video card driver?

        Susan Bradley Patch Lady/Prudent patcher

    • #2436475

      For me, I updated graphics drivers, the system bios, and the chipset bios.   Didn’t want anyone to accuse me of not having updated drivers…..

      Richard

       

      • #2436478

        Also to note that I tried it in Safe Mode and had a very similar response.

        Looking at the ResMon screens while running prime95 I see what looks like virtually 100% CPU activity, but lots of memory and network activity too.  All the network stuff is background, I expect – rather gives one to think one should disconnect the internet!  (though perhaps network and internet are not interchangeable in this context.)   Memory gives a number of hard faults but I gather this is not unknown, and the system is supposed to compensate for that (…)(shows my ignorance).  Then, having said that, I don’t see the same number of hard faults when running MemTest64…  Why would one see them with one program (prime95) and not the other?

        • #2436489

          Then, a little bit further confusing.    Probably has to do with how the utilities run.

          The ResourceMon utility actually shows 100% CPU activity when the MemCheck86 utility is running (together with a lot of memory activity, without significant memory errors).  But the prime95 utility – showing a similar pattern on the CPU / Memory utilization, kills the system quite quickly.    It all means I don’t know what is going on.

          Previous message awaiting moderation says I had a similar pattern when running the system in safe mode.

    • #2436683

      A nice mixed bag here.. so this post is a bit one size fits all..

      Prime95 is a bit of software written in an attempt to find large prime numbers (https://en.wikipedia.org/wiki/Prime95) – the side effect is that the process is incredibly intensive on CPU and memory operations.. The distribution of the operations is skewed (lots of multiplication and shifting, so the heating of the CPU die is uneven on a cycle by cycle basis and the workload is continuous), which brings instabilities to the surface: the likelihood of hitting one depends on the number of operations performed. The fact that failures happen under Prime95 doesn’t mean the CPU is the cause of the instability. Ultimately more events mean more chance of a failure, as eventually all devices will fail.

      A cosmic ray passing through could flip a bit and cause a failure, so you need to test either long term or in several runs to determine the repeatability of the time to failure. Thermal issues tested from similar starting conditions give higher repeatability; noise gives more random failure times. So you can really only use the test to prove poor reliability and characterise it as a thermal or noise problem.
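One way to put a number on that repeatability is to log the time to failure over several stress-test runs started from similar conditions and look at how widely the times scatter. The sketch below uses the coefficient of variation (standard deviation divided by mean) for that. The function names and the 0.25 threshold are assumptions chosen for illustration, not established limits, and a handful of runs only gives a rough hint.

```python
from statistics import mean, stdev


def failure_time_spread(minutes_to_failure):
    """Return the coefficient of variation (stdev / mean) of the run times."""
    if len(minutes_to_failure) < 2:
        raise ValueError("need at least two runs to judge repeatability")
    return stdev(minutes_to_failure) / mean(minutes_to_failure)


def classify(minutes_to_failure, threshold=0.25):
    """Label a set of runs; the threshold is an arbitrary illustrative cut-off."""
    spread = failure_time_spread(minutes_to_failure)
    if spread <= threshold:
        return "repeatable (thermal-like)"
    return "scattered (noise-like)"
```

For example, runs that all die after roughly two minutes would come out as repeatable, while runs failing anywhere between two minutes and two hours would come out as scattered.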

      Note here thermal isn’t just the CPU. If you have the luxury of an IR camera you’ll find the inductors in the CPU voltage regulator heat faster than the capacitors there, and both of those suffer more thermally than the CPU when you load the system as the CPU  has several protection methods in its design. Unfortunately only large OEMs have resources to properly evaluate if the airflow across a main board is adequate to cool any given design of motherboard..

      I would narrow the field with processor diagnostics (Intel do those, unsure as to how good the contender’s offerings are) and memtest of various flavours (include a passmark pro demo to get the memory thrashing in Windows if you need to make it work harder..). If a graphics card is suspect something like the demo for the range is good. That said we found Mad Mod Mike was good for exercising Nvidia cards way past the series it was released for..

      Unfortunately you could be heading for a bruising if you just change a motherboard / CPU.

      All the kit in your rig ages at roughly the same rate as it all has the same on-time at the same temperature; the variation determining which bit fails first is how hard it works..

      So if you have replaced the main board, the principal quality component which determines how the machine acts under CPU load is the CPU voltage regulator, as if the capacitors there can’t meet their specification the CPU core voltage becomes noisy and it’s a literal matter of time before the CPU hits an error as the supply went soft just as the CPU hit the transfer latches to get the data in or out..

      My experience is the CPU isn’t usually the fault; it’s destroyed by the fault. Eventually. Unfortunately it has been the way for some time that you can’t get a new board for an old chip, but it could also be false economy to change all the silicon and stick with the old power supply. Even if you buy a high flyer replacement it’s probably still cheaper than the parts you’re considering replacing, and it holds the capacitors which stabilise the power to the motherboard, so the regulator on your new board is quite possibly now working harder, reducing its potential life. Look at the MTBF on its datasheet (if the PSU hasn’t a datasheet, the consumer standard is a pretty poor 80000 hours so they wouldn’t want to shout about it..) Gauge the running hours for the PSU using the hard disk operating hours if you haven’t replaced the drive – and work out if your PSU is past it before diving in and replacing the expensive bits.
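The arithmetic behind that check is simple enough to sketch. This assumes the drive's power-on hours (as reported by SMART) stand in for the PSU's running hours, as suggested above, and uses the 80000 hour figure as the default. The function name is hypothetical, and the result is only a rough gauge of how hard the PSU has been used, not a prediction of when it will fail.

```python
HOURS_PER_YEAR = 24 * 365


def psu_wear(power_on_hours, mtbf_hours=80_000):
    """Compare running hours (taken from the drive) against a quoted MTBF."""
    return {
        "fraction_of_mtbf": power_on_hours / mtbf_hours,
        "years_powered": power_on_hours / HOURS_PER_YEAR,
    }
```

A drive showing 40000 power-on hours, for instance, would put the PSU at half of the quoted figure after a little over four and a half years of continuous running.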

      Of course if you have a PCIE graphics card which uses the extra power connector that might show stability issues faster if the PSU is behind a future issue (or alternatively a graphics demo can detect a PSU issue like a bad PCIE power port – some cards give a POST display to inform the user on reboot if the power port has failed!), depending on the machine’s usage. Also, if the PSU is weak connecting a new motherboard able to take large bites of current to regulate its supplies properly will work the PSU harder so it will let you know it has issues pretty quick..

      As to the driver issues – you can change those as much as you like; the base hardware stability needs to be good for that to make a difference. In evaluating what’s changed to cause the problem, it’s time and updates, so backing up the current system and restoring a backup from before the problem appeared is going to clear or confirm software as a contributor (and you should probably have started the backup when the system first showed issues, as drive failures can be unforgivingly fast), which is why maybe you need to give CrystalDiskInfo a go for peace of mind..

       

    • #2436866

      Thanks – but very complicated as you suggested.

      Usually I would say that if a machine has been running for months (as bsfinkel said in the intro to this post) or a year (which is my case) then likely the memory or CPU aren’t suddenly failing…   your suggestion of a capacitor seems more likely.  That’s not fixable, from my point of view…

      Power supply replacement is pretty simple and not desperately expensive. Mine is newer but ?hours – I have no idea.  Do you suggest this is a reasonable thing to do?

      One concern I have is the suggestion that a recent MS KB update has been associated with such problems.  If that is the case one could replace hardware all day and not find happiness!

      Fortunately for me I don’t do anything mission critical so unless the BSOD becomes a real issue (and not one generated primarily by stress tests) I may not find it worthwhile to worry too much at this time.

      Richard

       

      • #2437032

        If you’re lucky you can literally see a capacitor failing – they swell, the vent (almost always on the end you can view now) splits to stop it exploding off the board and ultimately the solution of salts starts to crystallise on the outside. Peer through the PSU fan in the area towards the output wires and at the capacitors around the CPU and see if you see any looking like these results..

        https://www.google.co.uk/search?q=failed+capacitor+split+vent&source=lnms&tbm=isch

        Unfortunately you can only see advanced failures, fixing them is only a stop gap measure as the others will have had the same wear and will eventually fail so if you see them in that condition and the PSU is over 3 years old just replace it. I have replaced CPU voltage regulator capacitors but mainly as a vehicle to getting to the point the customer could prepare to move to a new machine.

        For anyone who might want to try to replace lead mounted motherboard capacitors..

        The size of soldering iron needed is impressive due to the wide high current connections buried in the main board. There is also hell to pay if you pull the capacitor wire out and the hole closes: the joint area is then too small to get enough heat into the board to reflow all the way through the hole with a soldering iron, without the copper lead in situ to act as a heat shunt. If you go there, be prepared for some very careful pin drill work, having picked the drill size as the lead size of the failed part (from the datasheet).

        The replacement method is thus pull the failed part off its legs (they pull through the part’s rubber base seal easily), snip off the internal bits which pulled out, leaving just the lead protruding from the board which also serves to allow you to maintain the centring of your solder pump on the joint while you move your head to view the print side and apply the soldering iron to the other side. After a bit, move the pump marginally backwards and forwards across the print – if the joint has flowed through you’ll see the end of the lead on the iron side twitch as the pump brushes the other end, and that’s the point at which you press the button on the pump and hope to see the lead and solder disappear through the hole into the pump leaving an empty hole for the new part to be fitted.

    • #2436950

      I think I have traced my BSOD problem to heat.  I happened to be at the computer when it rebooted, and it appeared the same as what happened before I changed motherboard (my CPU fan had stopped working).  I am running BOINC, and BOINC was overloading the six cores.  I run Core Temp, and it was showing occasions when the temperature of the cores was in the RED category.  I disabled BOINC and BOINC tray, and I will wait to see if I get any more BSODs.  Open Hardware Monitor shows that the three fans are running at 970 (#1), 1436 (#2), and 1045 (#5) RPM.  The motherboard lists the fan jacks as 1, 2, and 3.  So maybe I should replace the fan that OHM says is #1.  I assume that the fans are always running at full speed; is this correct?

    • #2436994

      Fans should run at variable speed depending on heat generated. The mobo controls them.
      Watch the fan speeds as you make the machine do more work.

      cheers, Paul
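If the fan readings are noted down at idle and again under load, that comparison can be sketched in a few lines. This is only an illustration: the function name, the sample layout, and the 200 RPM threshold are all assumptions, and a fan that stays flat may simply be wired to run at a fixed speed rather than being faulty.

```python
def unresponsive_fans(samples, min_rise_rpm=200):
    """Return fans whose speed barely rises between lowest and highest load.

    samples: list of (cpu_load_percent, {fan_name: rpm}) readings noted by hand.
    min_rise_rpm: illustrative threshold, not a manufacturer figure.
    """
    ordered = sorted(samples, key=lambda sample: sample[0])
    low, high = ordered[0][1], ordered[-1][1]
    return sorted(
        name
        for name in low
        if name in high and high[name] - low[name] < min_rise_rpm
    )
```

With the readings quoted earlier taken as the idle sample, and a made-up loaded sample where fan #1 only reaches 980 RPM while the others climb, the function would return `["#1"]`.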

    • #2436995

      BTW, there is no guarantee that the fans are actually cooling the system adequately even though they are specced for the system. My Dell box doesn’t have temperature monitors that I can access, but the NVMe disk does have one and every now and then it passes 60 °C when the machine is working hard.

      cheers, Paul

      • #2437033

        That sounds exactly right. Dell would only fit enough cooling to maintain the parts within the limit of the maximum running temperature. They’re not going to up the price of their machines to fit better cooling to every one to make your machine last longer, as they hope to sell you long term maintenance or the replacement.

        Also I take you’ve had the situation where SMART shuts an overheated drive down and it recovers when it cools? The mechanical drives definitely did this – and as the drive shut down there was never a log entry, of course.

         

    • #2437717

      I take you’ve had the situation where SMART shuts an overheated drive down

      Nope, it’s the only drive so I would notice.  🙂

      cheers, Paul

    • #2437788

      With respect to the fans, I have found they are far easier to stall (to a twitching state) when “barely moving” before the action so IF they’re suspect I would suggest the test there would be turn off the fan control so they run at full blast and put up with the noise long enough to be fairly certain the problem hasn’t happened in a timescale it would previously have done so.

      If that does “fix” the issue, you have a fan with a poor bearing (worn ball care on a ball bearing fan – open and look for swarf at the shaft if you can, sleeve bearing fans tend to whine – look for a dark oily ring around the circlip, but if you can open those probably just add a dot of oil with an oil pen or similar and see if it helps). There seem to even be places selling new bearings now but it’d have to be a pretty special fan to not just change it and see.

      Alternatively to test, strap a chassis fan to the CPU with zip ties (move the wire as well, obvs!) and leave the side off (with a desk fan facing in if your system’s really hot enough to need it!) and if the problem goes away buy a new CPU cooler with reasonable pedigree.

    • #2468342

      This topic has migrated to the AskWoody thread “BSOD VIDEO_TDR_FAILURE (116)“.

      In response to four 116 BSODs in one hour (cause not resolved, but not repeated), I re-seated the two memory sticks (actually interchanged them) and checked the power connections to the motherboard.  My last auto-reboot was 07/23/2022 08:17, before I shut down to do the re-seating.  See what I have posted there.
