10% packet loss using NAT/LoadBalancer

This topic contains 9 replies, has 0 voices, and was last updated by  nmadrane 7 years, 11 months ago.

Viewing 11 posts - 1 through 11 (of 11 total)
  • #42980

    nmadrane
    Member

    Hi,

    I recently noticed that we had quality problems with our SIP communications when using Zeroshell. After hours of trials, I tried to check the packet loss by using WinMTR, a small ping-like tool that shows the percentage of packet loss at each node on your route.

    And the results are very strange : I have a 10% packet loss during NAT traversal !

    My setup is very simple : I activated NAT between 2 LAN segments (on 2 separate network cards). I am sitting on the first LAN segment (from a Windows XP computer running WinMTR). The second LAN segment contains 2 Internet gateways and I have activated the load balancer.

    I am using the latest version of Zeroshell on a P4 2.4 GHz / 2 MB RAM. The CPU usage of this box is very low when I do my tests with WinMTR, so it can’t be a hardware performance problem.

    I thought it was a hardware issue, so I tried to change the network cards. Same problem. I also checked the cables. The problem only occurs when I go through the Zeroshell box, and WinMTR says that packet loss occurs right between my 2 network cards, i.e. during NAT.

    Cheers

    nmadrane

    #51743

    atheling
    Member

    Have you installed my net balance and QoS patch?

    I’m not familiar with WinMTR. Can it tell you whether it was the echo request packets or the echo reply packets that were dropped?

    #51744

    nmadrane
    Member

    I didn’t know that there was a patch for netbalance/QoS. Where can I download it? I would like to try it.

    For WinMTR I don’t know if it talks about the request or reply packets.

    #51745

    atheling
    Member

    Current patch is at http://dl.dropbox.com/u/19663978/ZS_nb_qos_b14_a.tar.gz

    There is an issue with it if all links go down. I’m working on that. But the basic load balancing should work better with the patch than with the stock configuration. Instructions for installation are in the .tar.gz file.

    For SIP, you should probably also make sure the SIP helper module for iptables is installed. I have the following in my Pre-Boot script:

    modprobe nf_nat_sip

    And you should have your firewall set to allow “related” as well as “established” connections. With these two items in place, the Linux routing stack in ZS will sniff out, from the SIP dialog, the ports it needs to open to allow media transport.

    #51746

    nmadrane
    Member

    Thank you for the information. I will try with your patch. However, it seems very strange that the raw installation of Zeroshell gives me such problems with simple NAT/load balancing. I am pretty sure that something in my configuration is wrong. After all I am just pinging… and there is no reason to lose packets.

    Could anyone try WinMTR on his system to see if it has the same issue ? I ran WinMTR, I specified an interval value of 0.2 instead of 1.0 (in the options menu) and then I simply tried to ping http://www.google.com.

    On the route, all nodes give me 0% loss except the very first node… the one that corresponds to the NAT traversal….

    #51747

    atheling
    Member

    Looking at the web site for WinMTR, it seems to be a GUI version of the Unix traceroute utility.

    I did a little playing around with traceroute on a machine behind my ZS router and found that if I give a fairly large count, about 10% of the responses are lost on the hop between my Unix box and the ZS router. But if I just ping the box I get 100% of my ping responses back. And I get 100% of the responses back from boxes beyond my ZS router.

    traceroute, and I would guess WinMTR, rely on having a small hop count/TTL in their packets so that the packet will expire at some hop before reaching the final destination. They also rely on the machine that handles the packet when it expires returning a message to the effect that the packet died there. It looks like there is an issue in the generation of that ICMP message in ZS. However, that does not seem to be affecting other traffic that is going through the box.
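
    To make the small-TTL idea concrete, here is a minimal Python sketch of how a probe's TTL can be set on a UDP socket (classic Unix traceroute sends UDP probes by default). The helper name `make_probe_socket` is made up for illustration. The sketch only prepares the socket: it does not send anything, and reading the ICMP replies would need a raw socket and elevated privileges, which is left out here.

```python
import socket


def make_probe_socket(ttl: int) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given IP TTL.

    Hypothetical helper for illustration; not part of traceroute or WinMTR.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Each router decrements the TTL; the router that brings it to zero
    # is expected to answer with an ICMP Time Exceeded message.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock


if __name__ == "__main__":
    # TTL 1 expires at the first router, TTL 2 at the second, and so on.
    for ttl in (1, 2, 3):
        sock = make_probe_socket(ttl)
        print(ttl, sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL))
        sock.close()
```

    A tracing tool repeats this for increasing TTL values and counts how many replies each hop returns, which is where the per-hop loss figure comes from.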

    And, from experience, I know that SIP phone calls can work well through a ZS box. So while you have uncovered an interesting area that needs investigation, I think your poor quality SIP connections have a different cause.

    What other traffic do you have going through your ZS box? Do you have any QoS rules that would put the VoIP traffic ahead of other traffic?

    #51748

    nmadrane
    Member

    Hi,

    I don’t have any QoS rule in my ZeroShell box. And the only traffic is SIP. However I tried to look at how HTTP behaves : the user experience is bad, pages are slow to load. When I go directly to the Internet, bypassing the ZeroShell box, the navigation is much better.

    I thought that the main difference between traceroute and WinMTR is that WinMTR is able to tell you the percentage of packet loss at EACH node on the route. So if you do A->B->C->D, WinMTR could perfectly tell you that there is a 10% loss at B, but 0% loss at C and D. Is that what you get with the linux traceroute ?

    By the way, do you know any other tool like WinMTR/traceroute that I could try in order to investigate this issue ? Otherwise, I think I will rebuild the whole ZeroShell box from scratch with a completely new hardware 🙁

    Thanks a lot for your time.

    #51749

    mgibbons
    Member

    Pingplotter is another good utility to trace packet loss at each hop.

    If you turn off load balancing and use just one WAN line do you experience packet loss?

    This would determine whether it is NAT traversal or the Load Balancer that is causing the problem.

    KR’s

    Mark

    #51750

    nmadrane
    Member

    I have news. I turned off NAT and used only Load Balancing (by using the same network card for balancing between several Internet lines). And the results are the same : 10% packet loss at the Zeroshell box.

    Here is my setup :
    My PC is 192.168.0.100
    My ZS box is 192.168.0.1
    My Internet gateways are 192.168.0.2, 192.168.0.3 and 192.168.0.4

    NAT is disabled
    Load Balancer is activated (192.168.0.2 is the only gateway enabled)

    A traceroute (WinMTR) on http://www.google.com gives the following :

    Hostname          Loss    Sent    Received
    192.168.0.1       11%     396     355
    192.168.0.2       0%      1316    1316
    193.194.50.130    0%      1313    1313
    etc.
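
    For reference, the Loss column can be recomputed from the Sent and Received counts. This is a minimal Python sketch; the function name `loss_percent` is only illustrative, and the rows are copied from the output above.

```python
def loss_percent(sent: int, received: int) -> float:
    """Return the share of probes that got no reply, in percent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


# Rows copied from the WinMTR output: (host, sent, received).
rows = [
    ("192.168.0.1", 396, 355),
    ("192.168.0.2", 1316, 1316),
    ("193.194.50.130", 1313, 1313),
]

for host, sent, received in rows:
    print(f"{host:<16} {loss_percent(sent, received):5.1f}%")
```

    With these counts the first hop works out to about 10.4%, slightly below the 11% that WinMTR displayed. The difference may come from rounding or from when the counters were sampled, but that is a guess.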

    #51751

    atheling
    Member

    Using traceroute on a box on a LAN behind the ZeroShell router/firewall, I see a very consistent pattern where the 7th request gets no response, and neither does every 6th request after it. The end result is that 1/6 are being dropped (16.6%). If you only use a count of 10, it looks as if you have a 10% loss rate.
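
    The arithmetic behind those two figures can be checked with a small Python sketch. It assumes exactly the pattern described above (probe 7 unanswered, then every 6th probe after it); the function names are made up for illustration.

```python
def dropped_probes(count: int, first_drop: int = 7, period: int = 6) -> int:
    """Count unanswered probes among probes 1..count.

    Assumes the observed pattern: probe `first_drop` gets no reply,
    then every `period`-th probe after it.
    """
    return sum(
        1
        for n in range(1, count + 1)
        if n >= first_drop and (n - first_drop) % period == 0
    )


def observed_loss(count: int) -> float:
    """Apparent loss in percent for a run of `count` probes."""
    return 100.0 * dropped_probes(count) / count


for count in (10, 60, 600):
    print(count, f"{observed_loss(count):.1f}%")
```

    A run of 10 probes shows 10.0%, a run of 60 shows 15.0%, and a run of 600 shows 16.5%, which approaches 1/6 as the count grows.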

    But the packets are not actually being “lost”. The way these trace tools work is by sending UDP packets with a small time to live (TTL). The IP specifications say that the TTL value is to be decremented by one on each hop, and if the TTL goes to zero the packet is to be discarded and an ICMP “Time to live exceeded” message is to be sent back to the originator. What is happening is one of two things:
    1. The TTL was ignored and the packet forwarded on.
    2. The TTL was processed but the ICMP message was not generated or was sent to the wrong destination.

    But as I mentioned in my earlier reply, this odd behavior when generating ICMP Time-to-Live Exceeded messages does not seem to affect traffic through the ZeroShell router. It is something that does need to be looked into but it is not what is causing your performance issues. If it were then all the machines past the ZeroShell router would also exhibit at least a 10% packet loss too as far as any measurement you could make from inside your LAN. That, by your own postings, is not happening.
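
    The reasoning above can be illustrated with a toy simulation in Python. It is a deliberately simplified model, not a description of what ZeroShell actually does: the function `per_hop_loss` and its parameters are hypothetical, and the "every 6th probe" pattern is borrowed from the traceroute observation. It compares a router that forwards everything but fails to send some Time Exceeded replies with a router that really drops forwarded packets.

```python
def per_hop_loss(num_probes: int, hops: int, bad_hop: int,
                 period: int = 6, drops_traffic: bool = False) -> list[float]:
    """Per-hop loss (percent) as a traceroute-style tool would report it.

    Simplified model: every `period`-th probe is affected at `bad_hop`.
    If `drops_traffic` is False, the router forwards all packets and only
    its own ICMP Time Exceeded reply is missing. If True, the router
    discards the forwarded packet itself.
    """
    losses = []
    for hop in range(1, hops + 1):
        lost = 0
        for probe in range(1, num_probes + 1):
            affected = probe % period == 0
            if drops_traffic:
                # A probe must pass through bad_hop to reach any later hop.
                lost_here = affected and hop > bad_hop
            else:
                # Traffic is forwarded; only bad_hop's own reply is missing.
                lost_here = affected and hop == bad_hop
            lost += int(lost_here)
        losses.append(100.0 * lost / num_probes)
    return losses


icmp_only = per_hop_loss(60, hops=3, bad_hop=1)
real_loss = per_hop_loss(60, hops=3, bad_hop=1, drops_traffic=True)
print("missing ICMP replies only:", icmp_only)
print("forwarded traffic dropped:", real_loss)
```

    In the first case only the first hop shows loss and the hops behind it show none, which matches the WinMTR results posted earlier. In the second case the loss appears at every hop beyond the router.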

    I am a little confused about your setup where you have

    Here is my setup :
    My PC is 192.168.0.100
    My ZS box is 192.168.0.1
    My Internet gateways are 192.168.0.2, 192.168.0.3 and 192.168.0.4

    All of your machines are on the same subnet? That does not make sense to me….

    #51752

    nmadrane
    Member

    Actually, Zeroshell is able to do load balancing even on the same subnet. In my original setup I had 2 NICs/subnets : one for my computers (192.168.0.X) and one for my Internet gateways (10.0.0.X), so NAT had to be activated.

    In order to isolate my problem of packet loss I tried to put everything on the same subnet, so that I can disable NAT and keep only load balancing.

