CF card failures – WARNING.

  • #42935

    KLGIT
    Member

    I wanted to post this to share my experience and hopefully save my fellow Zeroshell users from router failures.

    We currently have 3 zeroshell powered routers in production use. They are used to manage bandwidth for our networks and handle quite a load 24/7-365. Zeroshell has run very well in this role and so have the embedded appliances we use to run it on.

    Here is our configuration:
    Router HW
    – IBT FWA7304G fanless router appliance with VIA Eden chipset and automatic LAN bypass on power failure
    – various brands of high-quality 4GB consumer Compact Flash (CF) cards
    – 1GB RAM
    – Zeroshell

    Again, this configuration has been excellent and I highly recommend it, with one exception: the Compact Flash cards.
    DO NOT USE CONSUMER GRADE CF CARDS, THEY WILL FAIL!
    The failure is guaranteed and predictable. So far all three of these routers have had CF card failures, and all of them happened in the same time frame. For our use, the cards last approximately 13 months and then fail.
    Whether this is due to write-cycle exhaustion or to more heat than normal consumer use exposes them to, I don’t know for sure; likely a combination of both.

    Do yourself a favour and buy industrial-grade CF cards. These cards are not only rated for wider temperature ranges, but they also have an order of magnitude more write-cycle lifetime.

    Currently I am testing Transcend P/N TS4GCF150, 4GB 150x Industrial Class CF cards. I guess I’ll know in about 13 months if it makes a difference or not.

    #51673

    SysEngBD
    Member

    Did you happen to have a SanDisk Ultra CF fail?

    When I bought our Soekris boxes I opted to get everything from them (Soekris), and the CF they sell is SanDisk Ultra. Both of the routers have been in production for 10 months, so I’m wondering if I should get worried…

    (I’ve got an ALIX running ZS as well but it has a Transcend Industrial CF)

    Cheers,

    #51674

    KLGIT
    Member

    @sysengbd wrote:

    Did you happen to have a SanDisk Ultra CF fail?

    When I bought our Soekris boxes I opted to get everything from them (Soekris), and the CF they sell is SanDisk Ultra. Both of the routers have been in production for 10 months, so I’m wondering if I should get worried…

    (I’ve got an ALIX running ZS as well but it has a Transcend Industrial CF)

    Cheers,

    I don’t know specifically, as I didn’t keep the first failed CF; I figured that failure was a fluke. So far I’ve had Verbatim, Kingston and, I believe, a SanDisk CF card fail. All were generic consumer-grade CF cards purchased from local electronics stores.

    My recommendation is that you back up your profile every time you make changes to your Zeroshell boxes. I save my ZS profiles on our file server. If you do have a failure, you can be back up and running pretty quickly.

    If you’re using a ZeroShell box with a configuration that will be pretty static once it’s initially set up, then you should make backups at that time. For example, set up your router, make all your configs, test and tweak, then back up the profile. Now image that CF card to a file (I use Linux to do this, e.g. dd if=/dev/sdX of=/server/backup/routers/ZS01/ZS01.img bs=4096, where /dev/sdX stands in for the CF card’s device node).
    Then if you keep a few blank CF cards around, you can quickly replace a failed unit by imaging the backup onto a blank and slapping it into your ZS box.
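
    For the restore direction, a minimal sketch, assuming the image above and the blank card showing up as /dev/sdX (again a placeholder; substitute the real device node):

    # write the saved image onto the blank replacement card
    dd if=/server/backup/routers/ZS01/ZS01.img of=/dev/sdX bs=4096
    # flush buffers before pulling the card
    sync
    # optional: verify, comparing only as many bytes as the image contains
    cmp -n "$(stat -c%s /server/backup/routers/ZS01/ZS01.img)" /server/backup/routers/ZS01/ZS01.img /dev/sdX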

    As for worrying about your configuration, not to spread panic, but you’re approaching the time frame when mine started to fail, which was between 10 and 14 months. I guess you’ll find out whether your SanDisk CFs are industrial grade or not.

    I was half tempted to try microdrive CFs, but that kinda defeats the purpose of using flash-based storage in the first place.

    Anyway, get out ahead of this and take the chance to bring down your ZS boxes during off-peak hours, image the CF cards, and get some spare CFs in stock. Then you’ll be prepared for the worst.

    Honestly, I wish there were SSDs in CF form factor.

    #51675

    KLGIT
    Member

    BTW, I don’t think Soekris is really at fault on that.
    For most routers designed to run from flash, writes to the flash device are minimal or non-existent. Many such devices log to a ramdisk and update the flash in bulk (e.g. hourly or daily). This and other design choices greatly extend the life of the flash device.
    ZeroShell, in contrast, is a system designed without these optimizations that happens to also run from flash; it was really designed for use with a hard drive. I think this makes it much harder on flash storage when it comes to write cycles. Add to that the heat the CF card has to deal with inside an embedded router, especially compared to inside a camera, and you have a recipe for early failure.

    The only real solutions (that I see) are:

    – Create a ZeroShell Flash edition with optimizations for reducing writes to storage (a rough sketch of the idea follows below)
    – Set up ZeroShell to mount a remote network share and do all or most writes to that
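
    To illustrate the first idea, here is a minimal sketch of the ramdisk-plus-bulk-flush pattern. The flush interval, the use of rsync, and the /Database/LOG.flash destination are my own assumptions for illustration, not anything ZeroShell ships with:

    # keep the hot log directory in RAM
    mount -t tmpfs -o size=64m,mode=755 tmpfs /Database/LOG

    # /etc/crontab entry: once an hour, flush the RAM logs to a
    # (hypothetical) on-flash directory in one bulk pass
    0 * * * * root rsync -a --delete /Database/LOG/ /Database/LOG.flash/

    The flash then sees one burst of sequential writes per hour instead of a constant trickle of small log appends.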

    #51676

    SysEngBD
    Member

    Thanks for the replies. I have backups of the profiles and images of the CFs on our file servers. The routers are both at remote locations that I won’t be able to access for a few months, so I’ve dropped these lines into my PreBoot scripts to try to minimize CF writes; the tradeoff is that I’ll also lose logs and MRTG stats between power cycles.

    mount -t tmpfs -o size=64m,mode=1777,nosuid,nodev,exec tmpfs /tmp
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /var/run
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /var/lock
    mount -t tmpfs -o size=64m,mode=755,nosuid,nodev tmpfs /Database/LOG
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/counters
    mount -t tmpfs -o size=32m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/html

    I’m fairly certain this should buy me some time until I can access the units.
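
    To double-check that these mounts actually took effect on the remote units after a reboot, standard commands like the following (nothing ZeroShell-specific) should show each path backed by tmpfs:

    mount | grep tmpfs
    df -h /tmp /var/run /var/lock /Database/LOG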

    Cheers,

    #51677

    SysEngBD
    Member

    @sysengbd wrote:

    mount -t tmpfs -o size=64m,mode=1777,nosuid,nodev,exec tmpfs /tmp
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /var/run
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /var/lock
    mount -t tmpfs -o size=64m,mode=755,nosuid,nodev tmpfs /Database/LOG
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/counters
    mount -t tmpfs -o size=32m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/html

    Should be:

    mount -t tmpfs -o size=64m,mode=1777,nosuid,nodev,exec tmpfs /tmp
    mount -t tmpfs -o size=64m,mode=755,nosuid,nodev tmpfs /Database/LOG
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/counters
    mount -t tmpfs -o size=32m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/html

    /var/lock and /var/run as tmpfs are still untested on my part (the others are); I haven’t had time to check how those two behave across reboots.

    Cheers,

    #51678

    SysEngBD
    Member

    @sysengbd wrote:

    /var/lock and /var/run as tmpfs are still untested on my part (the others are); I haven’t had time to check how those two behave across reboots.

    # scratch and runtime state in RAM
    mount -t tmpfs -o size=64m,mode=1777,nosuid,nodev,exec tmpfs /tmp
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /var/run
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /var/lock
    # logs and MRTG stats in RAM -- the big write-cycle savings
    mount -t tmpfs -o size=64m,mode=755,nosuid,nodev tmpfs /Database/LOG
    mount -t tmpfs -o size=16m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/counters
    mount -t tmpfs -o size=32m,mode=755,nosuid,nodev tmpfs /Database/var/register/system/mrtg/html

    All of these seem to work well between reboots and function as expected. I would think having /var/run, /var/lock and /tmp on tmpfs would have been a good idea as a matter of course; having logs and MRTG stats on tmpfs is just for serious write-cycle saving…
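
    If these were ever to become standard mounts rather than PreBoot additions, the same thing could in principle be expressed as /etc/fstab entries. A sketch only, since I haven’t verified that ZeroShell’s boot sequence honours fstab for these paths:

    tmpfs  /tmp           tmpfs  size=64m,mode=1777,nosuid,nodev,exec  0 0
    tmpfs  /var/run       tmpfs  size=16m,mode=755,nosuid,nodev        0 0
    tmpfs  /var/lock      tmpfs  size=16m,mode=755,nosuid,nodev        0 0
    tmpfs  /Database/LOG  tmpfs  size=64m,mode=755,nosuid,nodev        0 0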

    Thanks again for the warning, KLGIT. I had assumed CF-saving features were already included in Zeroshell, since it’s a distro aimed at embedded machines.

    Cheers,
