Very bad bonding performance


This topic contains 6 replies, has 0 voices, and was last updated by  BuddyButterfly 9 years, 5 months ago.

Viewing 8 posts - 1 through 8 (of 8 total)
    #41144

    I have set up bonding of two ADSL lines to one root server.
    Both ADSL lines are 16Mb/s. Measured downloads on the single lines
    were 1.7MB/s and 1.5MB/s, so I expected the bonded lines to reach
    about 2.5-2.8 MB/s (factoring in the overhead of VPN, NAT, etc.).
    It turns out that the TCP performance of the bonded lines is no
    better than a single line, and most of the time even worse. It
    regularly goes up to 1.1MB/s and down to 200kB/s. Sometimes the
    download speed is as expected, about 2.6MB/s, for the first
    50MB-60MB of a download before it drops again. The worst thing is
    that multiple connections do not sum up to the bandwidth. When one
    download runs at 1MB/s and another is started, together they still
    total only 1MB/s, each one dropping to 500kB/s.
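The expectation in this post can be made explicit with a little arithmetic. A minimal Python sketch: the 16 Mb/s sync rate, the single-line measurements, the 2.5-2.8 MB/s estimate and the 1.1 MB/s upper value come from the post, while treating 1 MB as 8 Mb (decimal prefixes, no 1000/1024 distinction) is an assumption.

```python
def mbit_to_mbyte(rate_mbit: float) -> float:
    """Convert Mb/s to MB/s, assuming 8 bits per byte and decimal prefixes."""
    return rate_mbit / 8


nominal_per_line = mbit_to_mbyte(16)   # 16 Mb/s ADSL sync rate -> 2.0 MB/s
measured = [1.7, 1.5]                  # single-line downloads from the post, MB/s
ideal_bonded = sum(measured)           # upper bound if aggregation were lossless
expected = (2.5, 2.8)                  # the poster's estimate after VPN/NAT overhead
observed_peak = 1.1                    # upper end of the regularly observed bonded rate

print(f"nominal per line: {nominal_per_line:.1f} MB/s")
print(f"ideal bonded:     {ideal_bonded:.1f} MB/s")
for e in expected:
    print(f"expected {e} MB/s = {e / ideal_bonded:.0%} of the ideal sum")
print(f"observed {observed_peak} MB/s = {observed_peak / ideal_bonded:.0%} of the ideal sum")
```

The single lines already run close to their nominal rate, so the bonded result of roughly a third of the ideal sum points at the aggregation itself rather than at the lines.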

    I have tried lots of things to get rid of this problem:

    1. Setting TCP reordering to 127
    2. Setting the queue length (txqueuelen) to 1000
    3. Setting TCP window sizes, etc.
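When tuning values like these, it helps to confirm that they are actually in effect on both ends of the tunnel. A minimal Python sketch that reads them back from procfs and sysfs. The paths `/proc/sys/net/ipv4/tcp_reordering` and `/sys/class/net/<dev>/tx_queue_len` are standard Linux locations, but mapping the post's "tcpqueuelength" to the interface txqueuelen and "window sizes" to `tcp_rmem`/`tcp_wmem` are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed mapping of items 1 and 3 in the list above to Linux sysctls.
SYSCTLS = ["net.ipv4.tcp_reordering", "net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"]


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl such as 'net.ipv4.tcp_reordering' from procfs.

    Dots map to path separators, so this simple version does not handle
    interface names that themselves contain dots (e.g. VLAN devices).
    """
    return Path(root, *name.split(".")).read_text().strip()


def read_txqueuelen(dev: str, root: str = "/sys/class/net") -> int:
    """Read the transmit queue length of a network device such as 'tap0'."""
    return int(Path(root, dev, "tx_queue_len").read_text().strip())


def report(devices, proc_root="/proc/sys", sys_root="/sys/class/net") -> dict:
    """Collect current values; entries that cannot be read are reported as None."""
    values = {}
    for name in SYSCTLS:
        try:
            values[name] = read_sysctl(name, proc_root)
        except OSError:
            values[name] = None
    for dev in devices:
        try:
            values[f"{dev}.txqueuelen"] = read_txqueuelen(dev, sys_root)
        except OSError:
            values[f"{dev}.txqueuelen"] = None
    return values


if __name__ == "__main__":
    for key, value in report(["tap0", "tap1"]).items():
        print(f"{key}: {value}")
```

Running it on both the Linux server and the root server and comparing the output shows quickly whether a setting was only applied on one side.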

    I might also have done something wrong in the configuration. That is
    why I am asking here in the forum. The topology is:

    1. One Linux server running Debian. This Linux box runs 2 OpenVPN servers.

    2. 2 standard ADSL lines, each with a router doing NAT, which are connected
    via LAN to two interfaces on the Debian server. The routers are registered
    dynamically with dyndns.

    3. A root server running Debian, connected to the internet at 100Mb/s.
    The root server runs 2 OpenVPN clients, which connect via the registered
    dyndns entries to the Linux server's two OpenVPN servers.

    4. The OpenVPN servers are configured with the 10.20 and 10.21 networks (255.255.0.0 mask).

    5. The bond interfaces on both sides are configured as follows:

    linux server: bond0: 10.10.0.1 /16
    root server: bond0: 10.10.0.2 /16

    6. The root server runs shorewall, which does masquerading for the bond0 interface.

    7. The default route on the Linux server is 10.4.0.2.
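The addressing plan can be sanity-checked with Python's standard `ipaddress` module. Both bond0 addresses sit in one /16, which is what lets each end reach the other directly over the bond, and the tunnel networks must not overlap that subnet. A default gateway also has to be reachable on a directly connected subnet. Reading "10.20 and 10.21 net" with a 255.255.0.0 mask as 10.20.0.0/16 and 10.21.0.0/16 is an assumption.

```python
import ipaddress

# Addresses from the topology above. Reading "10.20 and 10.21 net" with a
# 255.255.0.0 mask as 10.20.0.0/16 and 10.21.0.0/16 is an assumption.
bond_server = ipaddress.ip_interface("10.10.0.1/16")   # bond0 on the Linux server
bond_root = ipaddress.ip_interface("10.10.0.2/16")     # bond0 on the root server
tunnel_nets = [ipaddress.ip_network("10.20.0.0/16"),
               ipaddress.ip_network("10.21.0.0/16")]


def on_link(gateway: str, iface) -> bool:
    """A gateway is directly reachable only if it lies inside the interface's subnet."""
    return ipaddress.ip_address(gateway) in iface.network


same_subnet = bond_server.network == bond_root.network
clashes = [str(n) for n in tunnel_nets if n.overlaps(bond_server.network)]
tunnels_disjoint = not tunnel_nets[0].overlaps(tunnel_nets[1])

print("bond ends share a subnet:", same_subnet)
print("tunnel nets overlapping the bond net:", clashes or "none")
print("tunnel nets disjoint from each other:", tunnels_disjoint)
for gw in ("10.4.0.2", "10.10.0.2"):
    print(f"default route via {gw} reachable on bond0:", on_link(gw, bond_server))
```

With these values, 10.10.0.2 is on-link for bond0 while 10.4.0.2 is not.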

    Bond configuration on root server is as follows:

    Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

    Bonding Mode: load balancing (round-robin)
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0
    ARP Polling Interval (ms): 1000
    ARP IP target/s (n.n.n.n form): 10.10.0.1

    Slave Interface: tap0
    MII Status: up
    Link Failure Count: 2
    Permanent HW addr: bc:cd:8c:18:bc:7c

    Slave Interface: tap1
    MII Status: up
    Link Failure Count: 1
    Permanent HW addr: d2:d5:e2:63:3e:1a

    On the linux server:

    Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007)

    Bonding Mode: load balancing (round-robin)
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0
    ARP Polling Interval (ms): 1000
    ARP IP target/s (n.n.n.n form): 10.10.0.2

    Slave Interface: tap0
    MII Status: up
    Link Failure Count: 2
    Permanent HW addr: 14:be:f8:eb:f7:b1

    Slave Interface: tap1
    MII Status: up
    Link Failure Count: 1
    Permanent HW addr: 32:cd:7c:d0:7a:d6
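Comparing the two status dumps by hand is error-prone. A small Python parser for the text format shown above (normally read from `/proc/net/bonding/bond0` on Linux) lists the bond-level settings that differ between the two ends. It only assumes `Key: value` lines and that `Slave Interface:` starts a new slave section; the function names are hypothetical and the sample strings are shortened copies of the outputs in this post.

```python
def parse_bonding(text: str) -> dict:
    """Parse bonding status text into bond-level keys plus a list of slaves."""
    bond = {"slaves": []}
    section = bond                          # keys go here until the first slave
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue                        # skip blank or unlabeled lines
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            section = {"Slave Interface": value}
            bond["slaves"].append(section)
        else:
            section[key] = value
    return bond


def differing_settings(a: dict, b: dict) -> dict:
    """Bond-level keys whose values differ between two parsed outputs."""
    keys = (set(a) | set(b)) - {"slaves"}
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


# Shortened copies of the root server and Linux server outputs above.
ROOT_SERVER = """\
Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.10.0.1

Slave Interface: tap0
MII Status: up
Permanent HW addr: bc:cd:8c:18:bc:7c

Slave Interface: tap1
MII Status: up
Permanent HW addr: d2:d5:e2:63:3e:1a
"""

LINUX_SERVER = """\
Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.10.0.2

Slave Interface: tap0
Permanent HW addr: 14:be:f8:eb:f7:b1

Slave Interface: tap1
Permanent HW addr: 32:cd:7c:d0:7a:d6
"""

if __name__ == "__main__":
    root, server = parse_bonding(ROOT_SERVER), parse_bonding(LINUX_SERVER)
    print("mode:", root["Bonding Mode"])
    print("slaves:", [s["Slave Interface"] for s in root["slaves"]])
    for key, (a, b) in differing_settings(root, server).items():
        print(f"{key}: root={a!r} linux={b!r}")
```

On the outputs in this post, only the driver version and the ARP target differ, and the ARP target is expected to differ because each side monitors its peer.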

    Failover is working flawlessly.

    My questions:

    1. Is something wrong with the setup?

    2. Should the bond interfaces on both sides be on different networks? Or is the config OK?

    3. Could the version difference of the bonding driver be the reason?

    4. I noticed with Wireshark that lots of out-of-order packets and
    fast retransmits are occurring. Out-of-order packets seem to be
    inherent to bonding. Is there any chance at all of getting good
    TCP performance over a bond?
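Question 4 can be illustrated with a toy model of round-robin bonding: packet i leaves on slave i mod N, so any latency difference between the two ADSL/VPN paths reorders packets at the receiver. The sketch below assumes a fixed send interval, fixed one-way path delays and no loss; the delay values are made up for illustration, not measured on these lines. It counts the longest run of duplicate cumulative ACKs the receiver would send. Classic TCP (RFC 5681) treats three duplicate ACKs as a loss signal and fast-retransmits, and Linux starts from `net.ipv4.tcp_reordering` (default 3) for that threshold, which is why raising it is a common mitigation.

```python
def round_robin_arrivals(n_packets, send_interval_ms, path_delays_ms):
    """Return sequence numbers in arrival order when packet i uses path i mod N."""
    arrivals = []
    for seq in range(n_packets):
        path = seq % len(path_delays_ms)            # round-robin slave choice
        arrive = seq * send_interval_ms + path_delays_ms[path]
        arrivals.append((arrive, seq))
    arrivals.sort()                                  # ties resolved by sequence number
    return [seq for _, seq in arrivals]


def max_dupack_run(order):
    """Longest run of duplicate cumulative ACKs a receiver would send."""
    expected, buffered = 0, set()
    run = longest = 0
    for seq in order:
        if seq == expected:
            expected += 1
            while expected in buffered:              # hole filled: drain the buffer
                buffered.remove(expected)
                expected += 1
            run = 0
        else:
            buffered.add(seq)                        # out of order: ACK repeats `expected`
            run += 1
            longest = max(longest, run)
    return longest


DUPTHRESH = 3   # classic fast-retransmit threshold (Linux default tcp_reordering)

# Illustrative (hypothetical) one-way delays for the two paths, in ms.
for delays in ([20, 20], [20, 25], [20, 40]):
    order = round_robin_arrivals(20, 1, delays)
    run = max_dupack_run(order)
    verdict = "spurious fast retransmit" if run >= DUPTHRESH else "no retransmit"
    print(f"path delays {delays} ms: max dupACK run {run} -> {verdict}")
```

With equal delays nothing is reordered; a 5 ms skew stays below the threshold, while a 20 ms skew produces a long run of duplicate ACKs and a retransmit even though nothing was lost.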

    #46779

    imported_fulvio
    Participant

    I am waiting to try OpenVPN bonding with the next release of Zeroshell, which will include the net balancer. With it, it is easier to set up a load-balancing and failover VPN. When I know more details about VPN aggregation, I will post here.

    Regards
    Fulvio

    #46780

    Hi fulvio,

    thanks for the information.
    The information in item 7 of the setup is wrong. The default route points to 10.10.0.2, of course; otherwise it wouldn’t work.

    Do you have experience with the current bonding in Zeroshell regarding
    the performance of TCP connections?

    Thanks a lot.

    #46781

    imported_fulvio
    Participant

    I know that other people have had the same performance issues. I have not investigated them yet.

    Regards
    Fulvio

    #46782

    martinhill
    Member

    It depends on what you want, but it is not possible to say that load balancing is better than bonding.

    You may need to watch out for certain devices that claim to bond but actually load balance. Both are easy to set up. Bonding is the best choice for simply increasing upload and download bandwidth capacity; load balancing will be better if you are using VoIP. Failover is still available with both technologies.

    See the Mushroom and XRIO ubm devices. These will do both; the less expensive of the two is XRIO.

    hope it helps

    #46783

    satir
    Member

    Is there any update on this? Did you solve your problem, BuddyButterfly?

    I am facing the same issue: There are two (or more) ADSL lines, each with a theoretical bandwidth of 16 Mb/s down and 1 Mb/s up. Over each ADSL line an OpenVPN connection is established to a server in a datacenter with a 100Mb/s link. This server acts as a gateway. The OpenVPN connections are bonded together with the linux bonding module mode 0, so that every connection can consume the whole bandwidth.

    I have been struggling with this for weeks without finding any viable solution, so maybe someone with more experience can shed some light on it.

    First, here are some speed measurements, made by downloading a single file over FTP. The file was transferred from the server in the datacenter to the machine on the other side of the VPN, so no other stations were involved except the Internet routers.

    • Over an OpenVPN connection without any bonding: 13,3 Mb/s download with 0,28 Mb/s upload inside the tunnel
    • Over a bond which consists of 1 single VPN connection (pointless but just for testing): 13,2 Mb/s download with 0,28 Mb/s upload inside the tunnel
    • Over a bond which consists of 2 VPN connections: 15 Mb/s download with 0,87 Mb/s upload inside the tunnel

    As you can see, the upload of the machine receiving the file increases tremendously as soon as a second connection is added to the bond. Why is this happening? Or, more to the point: how can this be solved?
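Normalizing the upload by the download rate makes the jump easier to see. A minimal Python sketch with the figures copied from the measurement list above; the ratio is simply upstream Mb/s per downstream Mb/s, and the helper name is hypothetical.

```python
# (download Mb/s, upload Mb/s) measured inside the tunnel, from the list above
measurements = {
    "plain OpenVPN": (13.3, 0.28),
    "bond, 1 slave": (13.2, 0.28),
    "bond, 2 slaves": (15.0, 0.87),
}


def upload_per_download(down_mbps: float, up_mbps: float) -> float:
    """Upstream traffic generated per unit of downstream traffic."""
    return up_mbps / down_mbps


baseline = upload_per_download(*measurements["plain OpenVPN"])
for label, (down, up) in measurements.items():
    ratio = upload_per_download(down, up)
    print(f"{label:15s} up/down = {ratio:.1%}  ({ratio / baseline:.2f}x baseline)")
```

The one-slave bond costs the same upstream per megabit received as plain OpenVPN, while the two-slave bond costs roughly 2.8 times as much, far more than its modest download gain would explain.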

    The above numbers are all taken inside the tunnel. Since OpenVPN adds about 30 bytes per packet(!) and the replies are small and numerous, this increases the upstream load by about 30% to 50%. The ADSL lines only have 1 Mb/s upload, so I think my download speed over the bond is not higher because the receiving machine’s replies get dropped once the upload is saturated.
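The 30% to 50% figure follows from the per-packet overhead: a fixed cost of about 30 bytes (the poster's figure) is a large share of a small reply and almost nothing on a full-size data packet. A minimal Python sketch; the inner packet sizes are illustrative assumptions, not captured values.

```python
def overhead_share(inner_bytes: int, tunnel_overhead: int = 30) -> float:
    """Extra wire bytes relative to the inner packet size (30 B per packet assumed)."""
    return tunnel_overhead / inner_bytes


# Illustrative inner packet sizes in bytes: small ACK-like replies
# versus a full-size data packet.
for size in (60, 100, 1400):
    print(f"{size:5d}-byte inner packet: +{overhead_share(size):.0%} on the wire")
```

Since 30/100 is 30% and 30/60 is 50%, the quoted range corresponds to inner replies of roughly 60 to 100 bytes, while large data packets pay only about 2%.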

    Also, the speed of the bonded connection with 2 VPN slaves is very unsteady. Most of the time it’s about 11-14 Mb/s; sometimes it goes up to 25 Mb/s for a few seconds, only to fall to 2 Mb/s shortly after. I assume that at 25 Mb/s download the machine can’t send its replies fast enough because the upload is full, and the sender then decreases the rate again. Downloading multiple files at once doesn’t make a difference either… Uploading files, on the other hand, works fine at about 1,6 Mb/s over two lines.

    I have already tried tinkering with tcp_reordering, the OpenVPN queue length and buffer sizes, all to no avail. OpenVPN is running with “proto udp, auth none, no-iv, no-replay” and without tls-cipher or pings, to minimize its overhead. Can it be lowered even more, or are there other tunnel solutions with a smaller overhead?

    #46784

    satir
    Member

    An update:
    I disabled TCP timestamps and selective ACKs on the side with the two ADSL links (net.ipv4.tcp_sack=0, net.ipv4.tcp_timestamps=0). That way the ACK packets became smaller (84 bytes per packet with OpenVPN overhead, compared to 124 bytes before), and the FTP download got faster (20 Mb/s). The upload while downloading at 20 Mb/s was 0,76 Mb/s, a much better ratio than the 15 Mb/s download with 0,87 Mb/s upload from my last post.
    However, 0,76 Mb/s upload for 20 Mb/s download over the two bonded VPN connections is still twice as much as I get when I “bond” only one connection. Tcpdump shows that the machine with the two ADSL links is sending every ACK over both(!) connections, i.e. over both TAP devices. Every ACK therefore gets transmitted twice. Why is that?!
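A rough model of the ACK stream shows why both the smaller ACKs and the duplication matter. A minimal Python sketch, assuming 1500-byte data packets and one ACK per two full-size segments (common delayed-ACK behaviour); the 84- and 124-byte ACK sizes come from this post, and `copies=2` models every ACK leaving over both TAP devices. The function name is hypothetical, and the estimates are illustrative rather than a reproduction of the measured 0,76 Mb/s.

```python
def ack_upload_mbps(download_mbps: float, data_packet_bytes: int = 1500,
                    ack_bytes: int = 84, segments_per_ack: int = 2,
                    copies: int = 1) -> float:
    """Estimate upstream Mb/s consumed by ACKs for a given download rate."""
    data_pps = download_mbps * 1e6 / (data_packet_bytes * 8)  # data packets per second
    ack_pps = data_pps / segments_per_ack                      # delayed ACK: one per N segments
    return ack_pps * ack_bytes * copies * 8 / 1e6             # ACK bits per second, in Mb/s


for ack_size in (124, 84):
    for copies in (1, 2):
        est = ack_upload_mbps(20, ack_bytes=ack_size, copies=copies)
        print(f"ACK {ack_size} B, sent {copies}x: ~{est:.2f} Mb/s upstream at 20 Mb/s down")
```

Under these assumptions the upstream ACK load scales linearly with both the ACK size and the number of copies, so sending each ACK over both links doubles it regardless of how small the ACKs are made.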

    Anyway, I’m probably going to give up on this, since I have already spent more than 100 hours on it. Even if I can solve the problem with the duplicate ACKs, I still face the problem that the overhead of the multiple VPN connections eats up half of the upload of my lines… I guess bonding is only feasible with symmetrical lines that all have the same round-trip times.

    #46785

    ninnic
    Member

    Hi,

    I have the same problem, but with UMTS modems. I think the problem is that one side has only one IP address; I am waiting to get another one from my provider to try it.

    In the meantime, let me know if there is any news about this.

    bye
