Frequent ZeroShell crashes, version 3.4.0


This topic contains 4 replies, has 0 voices, and was last updated by  TheNanny 2 years, 9 months ago.

  • #44492

    TheNanny
    Member

    Hello all.

    Last weekend I tried to update my ZeroShell router from version 3.2.1 to 3.4.0.
    Via the web interface I shut ZeroShell down (after an uptime of more than 220 days) and replaced the 3.2.1 LiveCD with the 3.4.0 LiveCD.
    It’s running on an Intel PC.

    Now the router reboots very often. Once it ran for 2 hours, but sometimes it runs for only 10 minutes.
    I think the reboots are caused by system crashes, so I checked all the logs. I can only find log entries for ZeroShell's boot sequences, but nothing about what happens before the restarts to cause them.

    Then I tried version 3.3.2, same behavior.

    After going back to version 3.2.1, the router runs stably again: no reboots, and the uptime right now is about 15 hours.

    Can someone help me with finding a solution?

    Thank you in advance.

    #54025

    TheNanny
    Member

    Hello.

    Unfortunately, I still have no solution.
    Can anyone help me?

    #54026

    TheNanny
    Member

    Hello.

    I have now tried the latest version, 3.5.0, but the behavior is the same: ZS reboots every 2-16 hours.
    For the past few days I have again been trying to debug the crashes with the help of log files and everything I could find about the issue on Google. It looks as if, at the moment of the crash, the kernel can do nothing but reboot. There are no entries in the log files, no crash dumps, and no messages or other data on the SSH connection (for example dmesg output).
    ZS does not close SSH connections at the moment of the crash. When I try to use the SSH terminal after the crash, it still appears to be connected, and then it fails with the message “pipe broken”.
    I have some experience with Linux systems, and I have never encountered behavior like this before. Linux crashes are very rare, and I have never seen one that left no hints in the log files or on the screen, just as if someone had pushed the reset button. If I had to guess, I would assume a hardware failure in the PC. But when I boot the system with version 3.2.1 again, everything is perfect again.

    For my needs, ZS is the best router distribution available (thank you, Fulvio, for your work). I also want to benefit from security fixes, so I would be really glad if someone could help me with this.

    #54027

    squigley
    Member

    This was happening to me. I had a VGA monitor connected so I would see the kernel crash output. From that I deduced it was being caused by the NIC I had in there, a Compaq ThunderLAN dual port card.

    Once I took that NIC out and replaced it with a single port 3Com NIC, it stopped crashing and rebooting.

    #54028

    alaust
    Member

    I have the same issue.

    #54029

    TheNanny
    Member

    Hello all.

    For about a month now I have been running ZeroShell 3.5.0 without crashes (uptime is currently about 28 days). I’m still not sure why ZS 3.2.1 was stable on my PC while later versions crashed frequently.
    My problem while debugging the crashes was that they occurred randomly. Sitting in front of a monitor and waiting for crash-related output was not an option, because no crashes happened while I was watching…

    Since I had assumed from the beginning that the reboots were caused by crashes (in the Linux kernel these are called ‘kernel panics’), I read a lot about debugging kernel panics.
    I still found no way to catch the panic output, and I also found no way to activate a crash kernel as on an Ubuntu system, which captures the kernel’s panic output and writes it to the log files.
    So I had another idea. I installed ZS to the HDD and switched the console output from VGA to the serial interface. Then I connected the router’s serial port via a null modem cable to a second PC running a terminal session. That let me write all of ZS’s console output to a file, and I could finally catch the kernel panic output.
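
For anyone who wants to try the same serial-console capture, here is a minimal sketch of the logging side on the second PC. It is not the exact setup used above: the device path `/dev/ttyS0`, the speed 115200 and the file name `zs-console.log` are assumptions, and so is the function name `log_console`. On a generic Linux kernel the console is usually redirected with the boot parameter `console=ttyS0,115200n8`; how ZeroShell exposes that setting is not covered in this thread.

```python
import time
from typing import BinaryIO, Callable, TextIO


def log_console(stream: BinaryIO, log: TextIO,
                clock: Callable[[], float] = time.time) -> int:
    """Copy lines from a console byte stream to a log file.

    Each line is prefixed with a local timestamp and flushed immediately,
    so the last lines before a crash are already in the file.
    Returns the number of lines written.
    """
    count = 0
    # readline() returns b"" only at end of stream; on a real serial
    # device it blocks until the router prints the next line.
    for raw in iter(stream.readline, b""):
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        log.write(f"{stamp} {text}\n")
        log.flush()
        count += 1
    return count


# Hypothetical usage on the logging PC (assumes the port was configured
# beforehand, for example with: stty -F /dev/ttyS0 115200 raw):
#
#   with open("/dev/ttyS0", "rb", buffering=0) as port, \
#        open("zs-console.log", "a") as log:
#       log_console(port, log)
```

The timestamps are the main benefit over a plain terminal capture: they show how long the router ran before each panic.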

    The kernel panics were caused by MCEs (machine check exceptions). I googled a lot and learned that MCEs are usually caused by hardware malfunctions, so I ran a lot of tests on the router’s hardware, all of which passed without errors.
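
Once the console output is in a file, the panic and MCE lines can be picked out automatically instead of by reading the whole log. The following is only a sketch: the pattern is an assumption based on wording the Linux kernel typically uses for machine checks and panics, the exact text varies between kernel versions, and `find_mce_lines` is a made-up helper name.

```python
import re
from typing import Iterable, List, Tuple

# Assumed wording: loose, case-insensitive match for typical kernel
# messages about machine checks and panics. Adjust to your own log.
MCE_PATTERN = re.compile(r"machine check|\bmce\b|kernel panic", re.IGNORECASE)


def find_mce_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that matches MCE_PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MCE_PATTERN.search(line):
            hits.append((number, line.rstrip("\r\n")))
    return hits


# Hypothetical usage with a captured console log:
#
#   with open("zs-console.log", errors="replace") as log:
#       for number, text in find_mce_lines(log):
#           print(number, text)
```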
    Then (after googling even more) I found a hint that MCEs can also occur because of bugs in the Intel processor’s firmware, called ‘microcode’. One way to update the Intel microcode is to upgrade the BIOS, and I did find a newer BIOS version for the router’s mainboard that contains a newer version of the microcode.
    Since the BIOS update, the router has been running stably with ZS 3.5.0 and the frequent crashes are gone.
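
To confirm that a BIOS update really changed the microcode, the revision can be compared before and after. A minimal sketch, assuming an x86 Linux kernel that reports a `microcode` field in `/proc/cpuinfo` (whether a given ZeroShell release does so was not checked here); the function names are made up for illustration.

```python
import re
from typing import Set


def microcode_revisions(cpuinfo_text: str) -> Set[str]:
    """Collect the distinct 'microcode' values from /proc/cpuinfo text.

    All cores normally report the same revision, so the set usually
    has one element. An empty set means the field is not present.
    """
    return set(re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text,
                          flags=re.MULTILINE))


def read_microcode_revisions(path: str = "/proc/cpuinfo") -> Set[str]:
    """Read the file at `path` and return its microcode revisions."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return microcode_revisions(handle.read())


# Hypothetical usage: run once before and once after the BIOS update
# and compare the two results.
#
#   print(read_microcode_revisions())
```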
