• Random crashes then auto reboot: Win10 20H2 or h/w?


    #2393145

    This is my wife’s machine, which has been randomly crashing since the upgrade to Win 10 20H2 in January 2021. She doesn’t use it much (from 10 mins to 1 hour a day) but the longer it is on, the more likely a crash becomes. Today it crashed after 40 mins. When it crashes there is no warning – the PC automatically reboots itself, and when Win 10 restarts there is no message that Windows was not shut down properly.

    Software: Windows 10 20H2 Professional 64bit
    regularly Firefox and The Bat! (a PC email client)
    occasionally Zoom and Skype.

    Hardware built in 2015:

    • Asus A88XM-A AMD Socket FM2+ not overclocked
    • AMD A4 5300 3.4GHz Socket FM2 Dual Core  processor
    • Corsair CX550M 550W Semi-Modular 80+ Bronze PSU (new March 2021)
    • 8GB (2x4GB) G.Skill Ripjaws X DDR3 CL9 (9-9-9-24) 1600MHz (PC3-12800) Dual Channel running at 1600MHz plus
    • 8GB (2 x 4GB) Ballistix Sport DDR3 1600MHz CL9 (9-9-9-24) 1.5v 240pin running at 1600MHz
    • Western Digital Blue 1TB SATA III 3.5″   legacy MBR boot
    • Western Digital Gold Data Centre 1TB SATA III 3.5″
    • Samsung SH-224 DVDRW Internal 5.25″ Drive DL DVD
    • Asus E11100 Xonar DS Sound Card in PCI slot

    Action:

    • ran memory test (Memtest86+ I think) for 24 hours with no errors
    • ran WD low level DOS harddisk check with no errors
    • changed the PSU to a new Corsair  (as above)
    • removed the Sound Card but the crashes continued

    Log shows only 1 critical event since 18-11-2020 “critical The system has rebooted without cleanly shutting down first. ”

    What do you guys/girls suggest to help diagnose the fault and keep my wife happy!

    Thanks

    Alan

    Viewing 17 reply threads
    • #2393169

      Did you ever update the BIOS to version 3001 which was released on April 8, 2016? That update is said to improve system stability.

    • #2393177

      Sometimes the only way to determine the issue is to see it – if a problem affects the ability of Windows to write to the log file, you can’t read it later. To that end, given the diagnostics you have run, I propose you turn off the automatic reboot to see if you get the traditional blue (or maybe green?) screen.

      To do this, in a run box (hold Windows key, press R) type the following and click OK:

      control sysdm.cpl 3,3

      That should open the system properties at the advanced pane. Under start-up and recovery (last item), click the settings button and untick the box next to “automatically restart” and OK the changes, reboot and retest.
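
      For anyone who would rather confirm the setting than trust the dialog, here is a minimal Python sketch. It assumes the tick box is stored as the `AutoReboot` DWORD under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl` (1 = restart automatically, 0 = stay on the error screen). The function names are made up for illustration, and the script only reads the value, it changes nothing.

```python
import sys

# Registry location assumed to back the "Automatically restart" tick box.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_auto_reboot(value):
    """Turn the raw AutoReboot DWORD into a human-readable statement."""
    if value is None:
        return "AutoReboot not readable (not Windows, or value missing)"
    if value == 0:
        return "Automatic restart is OFF: a stop error should stay on screen"
    return "Automatic restart is ON: the PC reboots straight after a stop error"


def read_auto_reboot():
    """Read AutoReboot from the registry; return None when that is not possible."""
    if not sys.platform.startswith("win"):
        return None
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "AutoReboot")
            return value
    except OSError:
        return None


if __name__ == "__main__":
    print(describe_auto_reboot(read_auto_reboot()))
```

      If this reports OFF and the machine still restarts without showing any error screen, that would suggest the reset is not coming from a normal Windows stop error.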

      At this point I am going to assume you have done the plugging exercise. If not, do that before retesting – basically power off completely, hold the power button in to drain the power supply, and then separate each plugged connection in the machine and reseat it. In 20 years I saw three ATX power connectors with bad connections (two at ATX2, the 4-way 12V CPU feed, and one at ATX1 on the 3.3V pin, the one with the sense wire to it!) severe enough to discolour the connection and cause crashes (probably about 1%) – easier to check and be sure. We also had one machine where any vibration on the case fired the reset due to a duff switch, so I guess you can do without that switch for now as well.

      Now if the machine restarts as it did before, the problem is either something critical enough to Windows to cause an uncontrolled crash (rootkit, antivirus? something “kernel mode”..), or a hardware problem: memory allocation, ground bounce (motherboard screws tight and in the right places?) or an unexpected interrupt (you have cleared the soundcard, so that potentially leaves a motherboard issue; you may need to reset the CMOS after writing down the settings).

      If you get an error screen you can read, write down what it says as accurately as possible, or take a good photo of it with a phone or such and post it (remember to remove anything “too identifying” from the image if you see it – we need error numbers, process names etc.)

      If it just keeps rebooting, take off the side – any change? After a reboot, check the CPU temperature in the BIOS. The widely quoted figure to stay below is a relatively cool 70 degrees – if it’s crept up, it could be that the thermal paste or CPU mounting isn’t quite right.

      If the machine seems just fine after the exercise, decide whether you actually need the reset button.

      We did have one machine where the customer had routed the USB3 drive caddy wire through the hinge of his desk, ultimately crushing it. The current spikes were enough to upset the power supply throwing suspicion on that – otherwise the device on the end showed no problems! The problem showed after the sort of exercise above when we put it back under the desk!

      • #2393196

        I think you meant to say:

        Power off the computer, UNPLUG the computer, and then press the power button to drain residual charge in the power supply.

      • #2393197

        Many thanks @oldguy for a comprehensive and quick reply 🙂   Lots to work on there starting with disabling auto reboot – I didn’t know that option existed. Currently my PC is on Win 7 ESU and I’ve yet to get into the innards of Win 10 other than tackling all the privacy settings.

        Of course, when I changed the PSU all the cables were replaced, but a visual check on the sockets won’t do any harm. First, though, I will wait for the next crash and see if there are any messages.

        Alan

    • #2393188

      Actually, reviewing my thoughts on this, we had issues with MSI boards which could come into play here. With reference to the attachment, we found the inductors in the CPU regulator (the square parts indicated by a red arrow on your board) can get very hot if the capacitors (the row of silver circular parts marked by yellow arrows) lose value as they age, and that causes instability. Each group of components cyclically tops up the core voltage, so a failed set destabilises the regulator voltage, like an engine with one cylinder down: being a regulator, the average voltage is maintained, but the peaks get higher to keep the average at the right value. Unfortunately those silver parts are a complete **** to change as they are soldered to large areas of copper to get the heat out. You may find any number of the items get hot or they all do; basically if they’re over 80 degrees it doesn’t look good (assuming you have airflow..). If it goes too far the problem can potentially damage the CPU. We only had it happen once: that machine emitted a loud click when it rebooted. With the cover off we found it was coming from one of the inductors, and the time we had an ear in the right place to determine that, the click was final for the CPU. Perhaps this is the excuse you needed to buy an infra-red thermometer (or you have one already), so you can compare the temperatures if the other fault finding measures don’t produce results?

    • #2393184

      I suspect overheating, from dust in CPU fan.  But the above suggestions are great.

      • #2393201

        I do clean the PC about once per year (recall it’s not used much) and gave it a clean when the problem started. As you say, the most build-up is on the fan.

    • #2393206

      Hi Alan_uk,

      Hmm…interesting that the problem started after you last cleaned the computer.

      Try downloading and running Nirsoft’s bluescreenview.exe from his web page about his program:

      https://www.nirsoft.net/utils/blue_screen_view.html

      Hopefully it might indicate what is causing the problem.

      Also download, install and run Speccy so that you can monitor the CPU temperature. It could be that the thermal paste between the fan and CPU is no longer effective, or that the CPU fan itself is not properly seated. Get Speccy (free version) from here:

      https://www.ccleaner.com/speccy

      If the CPU is running too hot, then obviously it is time to replace the CPU fan.

      Yet I am thinking that it is a hardware issue. Once you get the computer shut down and powered off (and having discharged the PSU by pressing the power button), open up the computer and check all connections. Don’t forget to check that the screws which attach the motherboard to the inside of the case are all snug yet not overly tight. Double check that the 24-pin and 4-pin ATX power supply cables for the motherboard are properly seated by unplugging them and reseating them. And double check that the CPU fan is seated properly.

      Best regards,

      GTP

       

      • #2393207

        Hmm…interesting that the problem started after you last cleaned the computer.

        No, I cleaned it last after the problem started. Sorry if that wasn’t clear.

        Thanks for all the other hints. Bluescreenview looks very useful. I will follow the notes and configure Windows to create MiniDump files on BSOD.

        Alan

    • #2393370

      Just an update: spent 10 hours yesterday on this (multi-tasking with other things).

      First a whole system image backup then applied Sept Windows Update – then the fun 😉 started. Froze on download 100% – KB5005565  I think it was – 600MB. Tried downloading again (another few hours), tried various fixes and eventually disabled Windows Update, deleted the download cache and manually installed the cab file.

      I then set the dump option for Nirsoft’s bluescreenview (complained the virtual memory was too small (<800MB) – but it’s 4GB/8GB but on D: drive), disabled Windows restart on crash, and deferred Update until end October.

      During these 10 hours it crashed once at about 6 hours in, and then again later, immediately after a reboot, just after loading the desktop.

      Let’s see / hope the next crash gives some useful information.

      PS Who said technology makes for a productive work environment 😉

      • #2393424

        Under Control Panel >> System >> System Protection >> System Properties >> Advanced >> Startup and Recovery >> Settings >> System Failure

        select Small memory dump (256 KB) under the Write debugging information box. The small memory dump provides more than enough information for Nirsoft Blue Screen view to show you the module and process which causes the blue screen.
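
        As a rough cross-check on that setting, the sketch below decodes the dump type and lists any dump files already written. It assumes the choice is stored as the `CrashDumpEnabled` DWORD in the same `CrashControl` registry key (0 none, 1 complete, 2 kernel, 3 small, 7 automatic) and that small dumps go to the default `C:\Windows\Minidump` folder; the dialog shows the actual path. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed meaning of the CrashDumpEnabled DWORD in the CrashControl registry key.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
    7: "automatic memory dump",
}


def describe_dump_type(value):
    """Map the raw CrashDumpEnabled value to the dialog's wording."""
    return DUMP_TYPES.get(value, f"unknown setting ({value})")


def list_minidumps(folder):
    """Return .dmp files in folder, newest first; empty list if folder is missing."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    dumps = [p for p in folder.iterdir() if p.suffix.lower() == ".dmp"]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Default location assumed here; adjust to whatever the dialog shows.
    for dump in list_minidumps(r"C:\Windows\Minidump"):
        print(dump.name)
```

        An empty list after a crash means Windows never got as far as writing a dump, so Blue Screen View will have nothing to show.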

    • #2393407

      Hmm. I have seen that with a hard disk issue. The problem with the drives today is they’re too intelligent- they try to fix themselves as they are running. Perhaps you could download the zip (XP-) archive from

      https://crystalmark.info/en/download/#CrystalDiskInfo

      (escape should clear adverts..)

      Locate the DiskInfo32 or DiskInfo64 (depending on if you have 32 or 64 bit Windows) and run it. It generates a quick summary for each drive or you can save them all to a log.

      The situation I’m concerned about is listed as “pending sectors” – these can lurk in an indeterminate state and should have been resolved by the diagnostics. If any drive flags the interface amber or red, it’s a concern (the tabs will show the colour). Remember to redact the drive serial numbers from the log if you post it (they can be used to get extra support sometimes..)
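
      To make the reading of those figures concrete, here is a small Python sketch of how the sector-health attributes could be judged. The attribute IDs (0x05, 0xC5, 0xC6) are the standard SMART ones; the good/caution/bad rule is only an illustrative assumption, not the tool’s actual logic. The point is that the raw count is what matters: the normalised current/worst values often sit at something like 200 on a healthy drive and are not a count of sectors.

```python
from dataclasses import dataclass

# Standard SMART attribute IDs for the sector-health counters.
SECTOR_ATTRIBUTES = {
    0x05: "Reallocated Sectors Count",
    0xC5: "Current Pending Sector Count",
    0xC6: "Uncorrectable Sector Count",
}


@dataclass
class SmartAttribute:
    attr_id: int
    current: int    # normalised value, higher is healthier
    worst: int      # worst normalised value the drive has recorded
    threshold: int  # vendor failure threshold for the normalised value
    raw: int        # actual count, e.g. number of pending sectors


def assess(attr):
    """Illustrative rule: 'bad' at or below threshold, 'caution' on a non-zero raw count."""
    if attr.threshold and attr.current <= attr.threshold:
        return "bad"
    if attr.attr_id in SECTOR_ATTRIBUTES and attr.raw > 0:
        return "caution"
    return "good"


if __name__ == "__main__":
    # Made-up readings for illustration only.
    healthy = SmartAttribute(0xC5, current=200, worst=200, threshold=0, raw=0)
    suspect = SmartAttribute(0xC5, current=200, worst=200, threshold=0, raw=8)
    print(assess(healthy), assess(suspect))
```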

      • #2393667

        Thanks oldguy. I didn’t mention it, but towards the end of doing the manual Windows update, Windows advised a reboot and scan, which I gave permission to do. I’m not surprised that with these crashes something got corrupted. I don’t know if the scan found anything. It was running when I looked, and when I looked again Windows was starting.

        I’m wary of free/shareware but given your recommendation I will try crystalmark.

        Alan

    • #2393656

      Sorry, still mulling this over. Windows update wise next time there’s a problem, it might be worth installing the servicing stack update and trying again as manually installing an update can actually break the stack if it is out of date:

      https://msrc.microsoft.com/update-guide/en-us/vulnerability/ADV990001

      if the problem persists then dive through the relevant steps at

      https://docs.microsoft.com/en-us/windows/deployment/update/windows-update-resources

      Another thought: as I read it, the problem happened after 6 hours (well into the “running at full temperature” state) and again immediately post reboot, as the desktop appeared. That is a point of heavy workload on memory, CPU and hard disk controllers, and the graphics card is hitting its maximum workload as it initialises the full display while already at running temperature, so with minimal thermal headroom.

      When you cleaned out the machine I take it you did clean the heatsink on the chipset but did you make sure it wasn’t crowded by wires and the like? It looks pretty small, which can equate to sensitivity to its immediate surroundings. Alternatively do you have an old CPU cooler or such the fan of which you could place near enough to that heatsink to cool it? – with the aim of seeing if it affects the time the system takes to fail (all other things being equal of course). Unfortunately the chassis fan outlet is above the rear end of the top PCIE slot so it might be easier to set any fan to sit in the base of the chassis and blow air across the bottom of the case if it’s all you can do without risking damage.

      The only other thing I can think of is check the PSU power INPUT cable isn’t sharply bent where it enters the PSU and if it is, try another to see if that helps – I’ve known a sharp kink and movement (caused by free hanging from a desk for example) to break the cores of the cheaper cables internally at the kink making the connection poor, as if the PSU contact itself was bad. (It usually affects the 0.75mm square ones, 1.5mm square cables are more robust – That detail is in the tiny text running the length of the cable, which is either printed on, or embossed into the plastic. You may need a lens to read it.)

      The machine is 5 years old, which indicates the potential for a BIOS update. However, as the problem has only recently manifested, I would suggest the BIOS is unlikely to be the cause, since it worked fine for the intervening years (assuming you haven’t upgraded and just put up with an issue since..). You also need to have the machine in a stable state before updating, to avoid the potential loss of the motherboard should it reboot unexpectedly during the update process.

      The fact you saw a reboot but no blue screen is steering me towards thinking this is a hardware rather than a software issue, even if the problem is that the display adapter is failing so you literally can’t see the blue (from your spec I assume you are using the on-chip adapter in the processor). That would bring us back around to cooling, or to the slightly freaky mechanical issues there, the testing of which would involve removing and reinstalling the CPU to ensure the contacts are sound. That would probably need some more CPU paste, as it’s designed to migrate outwards on the chip to preferentially displace any air pockets in the paste (hard to believe, but the CPU heat spreader is ever so slightly domed).

      • #2393682

        Wow, that was some post 🙂

        When you cleaned out the machine I take it you did clean the heatsink on the chipset but did you make sure it wasn’t crowded by wires and the like?

        Can’t remember the details but I like to keep things tidy. The fan is as old as the motherboard & processor (2015) but the case is much older – a LIAN aluminium full size case.

        I have a box of input cables so can easily put another in.

        Correct, the BIOS has been the same for many years. I updated to v2801 on 14/12/15 when I installed Win 7. I have now moved to v3001.

        Yes, I am using the on-chip graphics adapter in the processor.

        I will wait for the next crash before trying these mechanical fixes. Hopefully the diagnostics will say if it is memory, hard disk or other.

        I would like to keep the machine for another year. I can then replace it with my PC and build myself something more suitable for Win 10 🙁   – prices might have stabilised by then, plus I’m very happy with Win 7 ESU. The PC is also the house/my backup PC so it’s important to fix this problem.

        Thanks again.

        Alan

         

    • #2394585

      OK, first crash since changing various settings. PC had only been on 10 mins or so.

      Despite setting auto restart to off the PC did reboot:

      Windows-Startup-Recovery-settings

      I then downloaded and ran Blue Screen View, but it reported zero crashes and there was no minidump file in C:\Windows. There was a DumpStack.log.tmp in D:\

      I then downloaded CrystalDiskInfo and ran DiskInfo64 as suggested by @oldguy. Screen shots below. “Pending sectors” says 200/200, but there are no amber/red flags and it says OK.

      CrystalMarkInfo-main-HD
      CrystalMarkInfo-second-HD

      I don’t think it is a M/B temperature issue as sometimes it happens within 10 mins or so of switching on.  Next idea from the previous posts is to change the main input power cable (kettle lead) and to remove all PSU cables, inspect sockets/plugs and replace.

       

       

       

    • #2394602

      See what Speccy tells you for your hard drive. What catches my eye are the worst conditions of 253 for the uncorrectable sector count and the write error rate. Also, CrystalDiskInfo does not appear to be showing the real values for your hard drive. Attached is a screen capture from Speccy for my WD 2TB OS hard drive.

    • #2394636

      Speccy screen shot attached.

    • #2394683

      To check for a software problem, do you have an old spare hard drive around?  Remove and put in a very safe place your real hard drives, install the spare, and do a clean install of Windows 10 to it, set it to no “automatically restart” and see how it works.

      Are you using any third party antivirus software?

      -BB

    • #2394689

      Thanks @BB. Now you mention it I have a HD that is a clone (out of date) of my Windows 7 PC – my PC is almost the same h/w spec (my Asus A88XM-Plus + AMD A8-7600  vs A88XM-A + AMD A4-5300) so the drivers might well be OK . I could try that. It’s a case of how long before a crash happens / doesn’t happen.  OTOH, as the clone is very old I could do as you say but again how long to wait.

      Are you using any third party antivirus software?

      Sophos. I noted today that Defender is disabled – presumably done by Sophos.

       

       

       

    • #2394690

      Have to agree it all looks fine (Crystaldiskinfo didn’t do a  great job of labelling the data  columns – the drive values are on the right, it makes more sense in a saved log.)

      It still looks as if something is causing such disruption that Windows is unable to resist restarting long enough to write anything (or is unable to write at all), which has a definite hardware feel about it. When I’ve seen this sort of thing it’s either been failing hardware (in which case arranging extra cooling often extends the periods of correct operation) or a construction issue, usually involving a PSU wire or such being grounded, or a screw trapped beneath the motherboard. The latter represents one of the few ways to get an unexpected interrupt, as the hardware this century is plug and play. PC interrupts are historically edge triggered, so there would also likely be some mechanical sensitivity there; I suspect there is none as you haven’t mentioned it. For this sort of interrupt problem you could try turning off hardware at a BIOS level, but I assume motherboard sound is off as you have a soundcard, which really only leaves the network port as a candidate here.

      To give you an idea of the flavour of hardware failure which might cause this: at one point we built in the Gigabyte GZ-X1, which has a rear case fan. In previous builds in that case we secured the case fan wire under the back of the cards to take up the slack. A couple of percent failed after a year or two because the knockout used by one builder to put the wire behind the PSU and motherboard plate lacked a rolled edge at the corners, where it slowly sawed into the insulation with movement. A sudden fan power short, whilst “polyfuse” protected (so nothing was ever damaged and power returned when the fault was cleared), was still enough of a power event to trip the PSU “power good” circuit, causing the grey wire from the PSU to go low and firing a CPU hard reset. That doesn’t give Windows any chance of fixing or noting anything – Windows knows a bad shutdown occurred because the paging file and various other temporary items are left uncleared. I was lucky – it reset when I touched the wire while reaching in to unplug the reset button from an adjacent header.

      Suggest having a really good look at the bits in the rig you only moved while you were working, as you’ve reseated the main components. Wires which run out of sight or are confined by items not intended to retain wires might need closer scrutiny. You could monitor the grey wire with a multimeter maybe – though the pulse (5V down to 0V) is likely to be very narrow and the connection is confined, so it’s not easy. Unfortunately beyond that you would have to look at swapping other components to see if any affected the fault, but those parts would likely have to be “used” to be a like for like substitution, so might have undetermined issues themselves. Going about it that way, things can get both expensive and unclear.

      Perhaps it’s time to get an old second hand drive from somewhere (even a discarded PVR), disconnect all your usual drives and install Windows 10 (for ease) on the gash drive. If the machine can achieve that without problems then that alone indicates a degree of stability. You could then use some demo test software (such as microscope (micro2000), burn in test (Passmark), and maybe some of the old AMD GPU demos (even Ladybird) which might still work (where you can get them from the right sites, obviously!) and exercise the core hardware thoroughly and see if the machine still reboots indicating a hardware issue, or if it is stable indicating it might be time to think about reworking the software. You have the tools to check if the gash drive develops issues already..

      The few days’ grace on activation should be enough to determine if the machine is stable. Also, remember to set the power scheme so nothing goes to sleep.. and if you’re throwing away the install afterwards you can turn off Windows update and test off line so that won’t interrupt or reboot either. Then leave the test software running, open Notepad and type something. If it’s all still there the next morning, fine; but if later the software isn’t running and Notepad has gone, the machine has rebooted, as opposed to the demo having expired and closed or such..
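
      A variation on the Notepad trick, if you want to know when it rebooted and not just that it did: a small heartbeat script that appends a timestamp to a file every minute, plus a helper that looks for gaps afterwards. This is only a sketch under my own assumptions (the file name, the one-minute interval and the “two missed beats” tolerance are arbitrary choices, and the function names are made up).

```python
import os
import time
from datetime import datetime, timedelta


def find_gaps(timestamps, interval_s=60, tolerance=2.0):
    """Return (last_seen, next_seen) pairs where the heartbeat paused for
    longer than tolerance * interval_s, i.e. a likely reboot or hang."""
    limit = timedelta(seconds=interval_s * tolerance)
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


def read_heartbeats(path):
    """Parse one ISO timestamp per line, skipping blank or torn lines."""
    stamps = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                stamps.append(datetime.fromisoformat(line.strip()))
            except ValueError:
                continue
    return stamps


def run_heartbeat(path, interval_s=60):
    """Append the current time to path every interval_s seconds until stopped."""
    while True:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(datetime.now().isoformat(timespec="seconds") + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push to disk so a hard reset loses little
        time.sleep(interval_s)
```

      Start it with `run_heartbeat("heartbeat.log")` and leave it running; after the next incident, `find_gaps(read_heartbeats("heartbeat.log"))` gives the last timestamp before the reset and the first one after the script was restarted.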

       

    • #2394792

      Thanks @oldguy. I’ve

      • replaced the power (kettle) cable
      • removed all PSU cables to M/B and checked contacts and pins. All look fine.
      • all cables look fine (see attachment). It’s an ATX M/B inside a full AT case so plenty of room
      • removed the reset cable.

      All fans running on power on.

      I did notice that there is tape on another front panel cable. It’s the PWR & HD LED lights. I will disconnect these if the problem persists. I have also found a 250GB drive in the spares box and will install Win10 on that if the problem persists. So far has been running 4 hours without crashing.

       

    • #2394807

      Actually having reviewed your photo I need to adjust my previous comment – The PS2 port also uses an interrupt, so along with the network card and (hopefully already disabled) on board sound card would need to be excluded should the fault finding leave you at the unexpected interrupt / motherboard side of the fault tree. To be honest the easiest way would be to swap to a USB, as though a “short pin” in the PS2 plug can cause such an effect by causing an  intermittent connection, the socket is more likely to be damaged and those contacts are not something you can inspect anyway.

      • #2394921

        Thanks @oldguy for your further thoughts. Yes, the PS2 port is being used for a mouse. It’s a simple job to try a USB mouse. I did wiggle the cable at both ends and there was no fault generated.

        I also have a PS2 keyboard but that’s more difficult to change: 1) it’s a Swedish keyboard, not so easy to get one in the UK and 2) I’ve run out of USB ports (I think I have a hub somewhere). I also wiggled that cable without fault – it’s much thicker and less subject to movement anyway.

        I left it running all night without sleep enabled and it’s still running 15+ hours later. I will know if it has rebooted because, being the Professional version, it restarts at the login screen rather than the desktop.

         

    • #2394867

      This is not a laptop but a desktop, yes?  I’m going to suggest something dumb.  Have you tried blowing the dust bunnies out?

      Susan Bradley Patch Lady/Prudent patcher

      • #2394920

        Hi Susan. It’s a desktop. It was cleaned when the problem first manifested itself and the PSU was changed at the same time as it pre-dated the build. There is a photo attached 2 posts above. Thanks, Alan

    • #2394951

      On the PS2 keyboard – it won’t cause the problem unless it’s literally disrupting the power, as it doesn’t generate an interrupt via its controller; its output is polled by the RTC clock in the old school generic AT designs, and the job is just handled by the chipset now. I think you’re just having to narrow the cause down to either hardware or software, and then from there to component level. It could take a while but you might get lucky.
