[Janus / twin Alix2] #7 Auto-switch prime to backup and back


This topic contains 1 reply, has 0 voices, and was last updated by  PatrickB 3 years, 10 months ago.

  • #44358

    PatrickB
    Member

    Hello.

    We are close to the end now…

    Today I give you my solution to make the backup server watch the prime, become the LAN Master when the prime has disappeared, then hand the role back when the prime returns.
    This role includes:
    – the gateway,
    – the DNS server,
    – the WiFi connection point,
    – on WAN side, the unique IP set as DMZ in the DSL box.

    As explained in posts #3 and #4 of the saga, for the DHCP server and the NetBios Browse Master, the switching is implicit.

    Given the definition of the local DNS zone (post #4), the active DNS is wherever the IP .1 is up, and the same goes for the gateway. So we just have to activate that IP on the backup server (well, not quite, that would be too simple :P).

    Problem #1 – Watching the prime server

    Initially I wanted to rely on Samba (nmblookup Janus1) since there is already automatic arbitration there.
    Very bad idea: the name takes a long time to stabilize, and notably at boot, while the name is not yet found, the backup server competes with the prime and makes a mess on the LAN 😡

    For now I prefer to ping the administrative IP of the prime (.11). This works fine to detect that it is down. If it were in a zombie state, that would be harder to detect.
    ❓ Still looking for an idea there…
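    One possibility for the zombie case, as an untested sketch only: besides the ping, try to open a TCP connection to a service the prime is supposed to offer. The helper name tcp_port_open is hypothetical, and the sketch assumes that the bash on the box was built with /dev/tcp support and that the timeout command is available.

```shell
# Succeeds if a TCP connection to host $1, port $2 opens within 2 seconds.
# Assumptions: bash with /dev/tcp support and the "timeout" command exist.
tcp_port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Possible use (port 53 for the DNS is an assumption):
#   if ! tcp_port_open 192.168.xxx.11 53; then
#       echo "prime answers no DNS connection, maybe a zombie"
#   fi
```

    A prime that still answers pings but refuses the connection could then be treated like a missing one.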

    The job is done by a cron job that runs every 2 minutes, on the backup server only:

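    # Bash script: WatchJanus1-Cron
    # Pings the administrative IP of the prime and tells be-lan-master.sh what to do.

    PATH=/opt/bin:$PATH

    # Ask for the takeover only when both pings are lost
    DoIt="no"
    if [ "$(ping -c 2 192.168.xxx.11 | grep -c '100% packet loss')" == "1" ]; then
        DoIt="yes"
    fi

    echo "Action = $DoIt"

    /opt/bin/be-lan-master.sh "$DoIt"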
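    The decision taken from the ping summary can be tried without touching the network by feeding it canned summaries. This is only a sketch: the function name loss_to_action is hypothetical, and the sample summary lines are typical ping output, not captured from my servers.

```shell
# Reads ping output on stdin; prints "yes" when every packet was lost, else "no".
loss_to_action() {
    # The leading space keeps "0% packet loss" inside e.g. "50%" from matching
    if grep -q ' 100% packet loss'; then
        echo "yes"
    else
        echo "no"
    fi
}

# Typical use in the cron script:
#   DoIt=$(ping -c 2 192.168.xxx.11 | loss_to_action)
```

    Note that a partial loss (one ping out of two) gives "no", so a single dropped packet does not trigger the takeover.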

    The script be-lan-master.sh decides whether a switch is required.

    It is not so simple to see whether a given IP is up on an interface that carries several:

    # when 192.168.xxx.1 is up:
    root@janus2> ifconfig -a -v BRIDGE01:00
    BRIDGE01: Link encap:Ethernet HWaddr 00:0D:B9:XX:YY:ZZ
    inet addr:192.168.xxx.1 Bcast:192.168.xxx.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

    # when 192.168.xxx.1 is down:
    root@janus2> ifconfig -a -v BRIDGE01:00
    BRIDGE01: Link encap:Ethernet HWaddr 00:0D:B9:XX:YY:ZZ
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

    So be-lan-master.sh compares the requested state with the current one this way:

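    # Requested state, "yes" or "no" (assumed here to be the argument given by the cron script)
    ZtMode="$1"

    ZtIsLanMaster="no"

    # The trailing space keeps .1 from also matching .11 or .12
    if ifconfig -a -v 'BRIDGE01:00' | grep -qF 'inet addr:192.168.xxx.1 '; then
        ZtIsLanMaster="yes"
    fi
    if [ "$ZtMode" == "$ZtIsLanMaster" ]; then
        # Nothing to do
        exit 0
    fi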

    # Here we have to switch
    ...
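    A detail worth checking in this test: a plain substring match on 192.168.xxx.1 would also accept .11 or .12. The sketch below anchors the match on the "inet addr:" label and on the space that follows the address. The function name has_inet_addr is hypothetical, and 192.168.0.x stands in for the masked addresses.

```shell
# Succeeds if the ifconfig output read on stdin shows exactly the address $1.
has_inet_addr() {
    # -F: fixed string, so the dots are not regex wildcards
    grep -qF "inet addr:$1 "
}

# Typical use:
#   if ifconfig -a -v 'BRIDGE01:00' | has_inet_addr 192.168.0.1; then
#       ZtIsLanMaster="yes"
#   fi
```

    This relies on the older net-tools output format shown above ("inet addr:..."); newer ifconfig versions print "inet ..." instead, and the pattern would need adapting.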

    Problem #2 – Switching

    The administrative IP (.12) must stay up whatever happens. Otherwise the next step, if a reboot does not save us, is to restore the factory settings and start again from the very beginning 😯

    ➡ I recommend adding something like this to the PostBoot script, just in case:

    # Safety feature: restore the LAN interface in case it was set down !
    ifconfig BRIDGE01:02 192.168.xxx.12/24 up
    ifconfig ETH01 up

    Before experimenting, I explored the device to locate and name safely all the items to be used for switching:

    # The interfaces and IP addresses
    ZtWiredLanInterfaceName="ETH01"
    ZtAdminInterfaceName="BRIDGE01:02"
    ZtAdminIP="192.168.xxx.12"
    ZtMasterInterfaceName="BRIDGE01:00"
    ZtMasterIP="192.168.xxx.1"
    ZtWifiLanInterfaceName="WLAN00"

    # WAN side: take the unique IP set as DMZ in the DSL box
    ZtWiredWanInterfaceName="ETH00"
    ZtBoxDmzInterfaceName="BRIDGE00:01"
    ZtBoxDmzIP="192.168.yyy.103"

    In principle it should be very simple:

    if [ "$ZtMode" == "yes" ]; then
        echo -e "$(DateForLog) - Janus2 becomes LAN Master\n" > "$ZtReportEmailBodyFile"
        echo -e "Activating IP = $ZtMasterIP" >> "$ZtReportEmailBodyFile"
        ifconfig "$ZtMasterInterfaceName" "$ZtMasterIP/24" up
    elif [ "$ZtMode" == "no" ]; then
        echo -e "$(DateForLog) - Janus2 stops being LAN Master\n" > "$ZtReportEmailBodyFile"
        echo -e "Deactivating IP = $ZtMasterIP" >> "$ZtReportEmailBodyFile"
        ifconfig "$ZtMasterInterfaceName" "$ZtMasterIP/24" down
    else
        echo "$ZtThisScript: Unexpected directive '$ZtMode' !"
        exit 1
    fi
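    Since a wrong ifconfig can cut the box off, it may help to rehearse the switch first. The sketch below only prints the commands while a dry-run flag is set. ZtDryRun, RunCmd and SetLanMaster are hypothetical names that are not part of my script, and 192.168.0.1 stands in for the masked address.

```shell
ZtMasterInterfaceName="BRIDGE01:00"
ZtMasterIP="192.168.0.1"   # placeholder for 192.168.xxx.1
ZtDryRun="yes"             # hypothetical flag: "yes" prints, anything else executes

# Print the command when ZtDryRun is "yes", otherwise run it.
RunCmd() {
    if [ "$ZtDryRun" = "yes" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the master IP up ("yes") or down ("no"); reject anything else.
SetLanMaster() {
    case "$1" in
        yes) RunCmd ifconfig "$ZtMasterInterfaceName" "$ZtMasterIP/24" up ;;
        no)  RunCmd ifconfig "$ZtMasterInterfaceName" "$ZtMasterIP/24" down ;;
        *)   echo "Unexpected directive '$1' !" >&2; return 1 ;;
    esac
}
```

    With the flag set, "SetLanMaster yes" just shows the ifconfig line that would be run; setting ZtDryRun to "no" makes it real.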

    Then I ran into issues with the switch my servers are plugged into: it does not notice the change, so traffic for the IP .1 does not reach its new owner! Symptom: the ping fails.
    So I bring the physical interface down and up again to force the switch to update:

    # In both cases we may have to force the switch to rebuild its routes.
    # - turn the physical interface off for a second:

    echo -e "Switching down and up the wired interface $ZtWiredLanInterfaceName" >> "$ZtReportEmailBodyFile"
    ifconfig "$ZtWiredLanInterfaceName" down
    sleep 1
    # <!> If the next command failed, we would have a big problem!
    # It is prudent to force the interface up in the PostBoot script as well, just in case...
    ifconfig "$ZtWiredLanInterfaceName" up
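    Because a failure of that last command would leave the box unreachable, one option is to retry it a few times before giving up. This is a sketch only; the helper name retry is hypothetical and is not part of my script.

```shell
# Run a command up to $1 times, pausing $2 seconds after each failed attempt.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry() {
    local tries=$1 pause=$2 attempt=1
    shift 2
    while [ "$attempt" -le "$tries" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
        sleep "$pause"
    done
    return 1
}

# Possible use:
#   retry 5 1 ifconfig "$ZtWiredLanInterfaceName" up
```

    The return code can then be written to the report, so that a failed recovery is at least visible after the next reboot.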

    Then another issue: the DNS did not pick up the change either. Symptom: on the server itself, in the ZS GUI, DNS page, the ‘DNS Lookup’ works using localhost, but not using ‘192.168.xxx.1’.
    Restarting the service is enough to fix it:

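    # Need to restart the DNS so that it integrates the change
    echo -e "Restarting the DNS service" >> "$ZtReportEmailBodyFile"
    # In the sed pattern, the leading "." matches the ESC character and "\[" the bracket that follows it
    /etc/rc.d/init.d/dns restart | sed -e 's/.\[[0-9;-]*[A-Za-z]//g' >> "$ZtReportEmailBodyFile"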

    The sed filter removes the escape sequences that display a right-aligned green [ OK ], because some mailers (like Thunderbird) get confused by them and display an empty email 👿
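    Such a filter can be checked on a canned line before it goes into the script. The sketch below matches the ESC character explicitly instead of with ".", which is a little stricter. The name strip_ansi is hypothetical, and the sample line only imitates what an init script prints; it was not captured from the real service.

```shell
# Remove terminal escape sequences (ESC [ ... letter) from the text on stdin.
strip_ansi() {
    local esc
    esc=$(printf '\033')   # the literal ESC character
    sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g"
}

# Made-up sample: move to column 60, then a green "OK" between brackets
printf 'Starting DNS \033[60G[\033[1;32m  OK  \033[0;39m]\n' | strip_ansi
```

    The brackets around OK survive because they are not preceded by ESC; only the cursor and color sequences are dropped.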

    Thing to know: an IP switched up or down this way is not updated in the GUI, and after a reboot the manual settings are restored. But I don’t need the change to be persistent, so for this feature the solution works fine.

    The result

    After that, the gateway, DNS and all other features have switched. Janus2 replaces Janus1 within 2 minutes, only when Janus1 disappears, and gives the role back within 2 minutes after Janus1 reappears. This delay could be reduced, but I don’t want to flood the LAN with pings.

    In the same way it is possible to switch the WiFi and the DMZ IP on the WAN side (a DSL box usually accepts only one DMZ address and filters nothing for it, so performance may be better).

    At the end, my script checks the state of the interfaces and active IPs again to report the operation by email, but this is off topic.

    It works very well and achieves the purpose of twin servers 😀

    Hope the tips can help someone.

    Ideas for improvements are welcome.

    Best regards.
