Tuning packet size? High PPS/bw from ACKs over VPN bond link


This topic contains 17 replies and was last updated by houkouonchi 4 years, 10 months ago.

    #42597

    houkouonchi
    Member

    So I noticed that when a machine outside the VPN downloads from a server behind the VPN, there are *way* more ACKs, and thus packets per second, sent back to the server.

    This is actually causing problems when I try downloading off the bonded connection (capable of about 68 megabits) from a residential cable connection that is 75/5 megabits, because of that connection's bad download/upload ratio. The ACKs overwhelm the upload side of the connection and that causes speed issues on the download.

    For example, if I download from the IP of the data-center-side ZS box (which does the bonding), the client sees about this usage:


    Incoming rates: 68098.0 kbits/sec
    6104.6 packets/sec

    Outgoing rates: 4621.4 kbits/sec
    6093.2 packets/sec

    As you can see, the packets per second (pps) for upstream and downstream are about the same.

    Now here is the same download again, but this time going over only one of the connections (not through the ZS VPN/bond), so it runs at a bit more than half the speed:


    Incoming rates: 39043.6 kbits/sec
    3260.4 packets/sec

    Outgoing rates: 703.1 kbits/sec
    1417.2 packets/sec

    As you can see, the upstream bandwidth in the bonded case was about 6.5x higher. The upstream PPS was also about 4.3x higher, when normally I might only expect double. The downstream PPS here is about half (which is expected, since the speed is about half).

    Now here is a download from another server (no ZS involved) going at about the same speed (limited via wget) as I was getting through the bond. It gets:


    Incoming rates: 67828.8 kbits/sec
    5621.8 packets/sec

    Outgoing rates: 798.8 kbits/sec
    1422.8 packets/sec

    So at double the speed, the upstream PPS and bandwidth used for ACKs barely changed compared to the half-speed run, and they are *way lower* than when downloading from an IP that port forwards to a box behind the bonded ZS VPN.

    I am guessing this is being caused by duplicate ACKs and retransmits? Is there any way to tune this so it's not such a burden to download off the bonded machine (to the point that ACKs overwhelm the downloader on some high-ratio asymmetric residential connections)?
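
    If my back-of-the-envelope math is right (assuming ~1500-byte data packets), 68 Mbit/s of incoming data is roughly 68,000,000 / (1500 x 8) ≈ 5,700 packets/sec, which matches the incoming pps above. In the non-VPN runs the client sends roughly one ACK for every two to four data segments at about 60-70 bytes each, i.e. plain ACKs. In the bonded run it sends roughly one ACK per data segment averaging about 95 bytes (4,621.4 kbit/s / 6,093.2 pps), which looks like ACKs carrying SACK blocks, so duplicate ACKs triggered by out-of-order packets would fit.

    If anyone wants to double-check on a downloading client, something like this should show whether the outgoing ACKs carry SACK blocks and whether the kernel is counting reordering/retransmits (eth0 is just a placeholder for whatever the client's interface is):

    tcpdump -nvi eth0 'dst host 66.33.216.46 and tcp'
    netstat -s | egrep -i 'retrans|reorder|sack'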

    #50953

    ppalias
    Member

    Is your VPN TCP or UDP?

    #50954

    houkouonchi
    Member

    It's UDP. Keep in mind those packets/sec and upload bandwidth values are from machines that are just downloading from a host behind the VPN tunnel, and so are going through the tunnel completely unknowingly.

    #50955

    ppalias
    Member

    Another thing I can guess is the ping/keepalive functionality of OpenVPN. If it is tuned to ping too frequently, it may affect the tunnel. Post here a screenshot of the ZS configuration for the BOND, the VPNs, and the Net Balancer, in case you have one.
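
    By ping/keepalive I mean the OpenVPN ping / ping-restart directives (or the keepalive shorthand). If ZS lets you pass extra OpenVPN parameters, something along these lines keeps it down to one small packet every 10 seconds; the values are only an example:

    keepalive 10 60    # ping every 10s, assume the peer is down after 60s of silence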

    #50956

    houkouonchi
    Member

    @ppalias wrote:

    Another thing I can guess is the ping/keepalive functionality of OpenVPN. If it is tuned to ping too frequently, it may affect the tunnel. Post here a screenshot of the ZS configuration for the BOND, the VPNs, and the Net Balancer, in case you have one.

    I don't think the keepalive could cause this type of behavior. Maybe I am not explaining myself well. The extra bandwidth is not between the tunnel endpoints; rather, when an external host connects to the WAN IP of the DC ZeroShell box, that host sends excess ACKs and packets while downloading from that WAN IP. You can test this yourself by downloading from these two links.

    Not going through the VPN:

    http://fios1.houkouonchi.jp/test256

    VPN:

    http://66.33.216.46/test256

    One of my connections behind the VPN usually has a decent amount of upload activity (I stopped it during my testing), so you might want to limit the download speed on both to 200 or 300 kilobytes/sec; that way you can see how, on the second link, your connection sends way more PPS and uses way more upstream bandwidth while downloading.
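
    For example, with wget you could cap both to the same rate and watch your own outgoing pps/bandwidth while each one runs:

    wget --limit-rate=300k http://fios1.houkouonchi.jp/test256
    wget --limit-rate=300k http://66.33.216.46/test256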

    Here are the screenshots you requested.

    The other side of the VPN bond (bonded the same way) is 10.99.98.1 and I am port forwarding port 80 on 66.33.216.46 to 10.99.98.2.

    #50957

    ppalias
    Member

    Mind drawing a physical connection diagram of your setup? I am trying to understand exactly what you have done here. I tried the downloads: the first link downloaded at about 1.1 Mb/s while the second ran at around 300 Kb/s. I am under the impression that for some reason you are using only one link and not the BOND. Is the 66.33.216.46 IP NATed somewhere outside your ZS infrastructure? Please also include the IPs in the diagram, as well as the names of the ZS boxes you are using (e.g. fios1.houkouonchi.jp).

    #50958

    houkouonchi
    Member

    The setup really is pretty simple.

    There are two boxes: one at my home (two 35/35 Mbit connections) and one in the DC, which is on 1000/1000 (single NIC with multiple IPs).

    I have two VPN connections set up on my home side, each one going to a different IP address on the DC side. I then have a BOND00 which the two VPN connections are a part of (on both sides).

    BOND00 on home side is set to use IP 10.99.98.2
    BOND00 on DC side is set to use IP 10.99.98.1

    I then just have port forwarding for the local address 66.33.216.46 on the DC side (one of its multiple IPs) going to 10.99.98.2 (my home ZS box), so that traffic goes over the VPN.
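
    I set the forward up through the ZS GUI, but as far as I understand it boils down to a netfilter DNAT rule on the DC box roughly like this (ETH00 here just stands in for whatever its WAN interface is called):

    iptables -t nat -A PREROUTING -i ETH00 -d 66.33.216.46 -p tcp --dport 80 -j DNAT --to-destination 10.99.98.2
    iptables -A FORWARD -d 10.99.98.2 -p tcp --dport 80 -j ACCEPT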

    It doesn't surprise me that the second link goes slower for you. Like I mentioned, one of my connections typically has a lot of upload activity on it, so the BOND is not very fast for uploading unless I shut down the activity using that bandwidth. This is because the VPN bond is limited by the slowest connection in the mix: if I am using 30 megabits out of 35 on one line, there are only 5 free, and thus the bond will not go over about 10 megabits.

    The excessive PPS/ACKs also seem to affect high-latency paths more than low-latency paths.

    This is why I suggested you limit the transfer: so you can see what I am talking about, namely that at the same speed one link generates significantly more upload traffic (on your end, the downloader) than the other. The only difference is that one transfer goes over the bonded VPN link (which of course you, the downloader, have no way to know about).

    I mainly use the bonded connection for downloading, and in that case the upload activity doesn't affect it all that much.

    #50959

    ppalias
    Member

    Okay, now I understand your problem. You have only HTTP packets going through the BOND00 link, and when other applications use one of your lines the other one is underutilized in the BOND. Actually the point of the bond is to use it for everything, not for a specific type of traffic, although of course you can use it the way you are, passing only a specific type of traffic.
    What bandwidth do the residential lines have, exactly? In the previous post you mention 35/35 Mbps and in the first post you mention 75/5 Mbps.

    #50960

    houkouonchi
    Member

    The two lines are 35/35. What I said in the original post is that downloading from the bonded connection on another residential connection (which is 75/5) has problems maintaining download speed because the ACKs overwhelm *that* connection's upload speed. Downloading off a normal server does not do this, but downloading over the bonded VPN causes excessive upload PPS and bandwidth usage.

    #50961

    ppalias
    Member

    Ok, so the problem is seen when you download over the BOND from a residential connection of 75/5 Mbps.
    Is the line clear of all other traffic during the test?
    You must also take into consideration that with a VPN you have packet encapsulation, which lowers the ratio of data size to frame size.
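
    To put rough numbers on that (the exact figure depends on the cipher and HMAC): OpenVPN over UDP adds an outer IP header (20 bytes), a UDP header (8 bytes), plus OpenVPN's own header, IV and HMAC, so somewhere in the range of 50-100 bytes of overhead per packet in total. A full 1500-byte inner packet then either goes out as a roughly 1570-byte outer packet or, if the physical MTU is 1500, gets fragmented into two packets.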

    #50962

    houkouonchi
    Member

    @ppalias wrote:

    Ok, so the problem is seen when you download over the BOND from a residential connection of 75/5 Mbps.
    Is the line clear of all other traffic during the test?
    You must also take into consideration that with a VPN you have packet encapsulation, which lowers the ratio of data size to frame size.

    Well, the download is from the WAN IP of the DC box, which is port forwarded over the BOND link (so on my side it is upload activity on the BOND).

    Yes, the line is clear of all other traffic. The problem is that the client downloading from the BOND sends about 3x more data than usual in ACKs. I was hoping there is some way to tune this a little?

    Is there any way to tweak the data/frame size?

    #50963

    ppalias
    Member

    You could try messing with the MTU value, if it is available.
    May I suggest trying iperf to measure the exact bandwidth between the 2 endpoints of the bonded VPNs.
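
    If ZS exposes a field for extra OpenVPN parameters, the MTU-related directives would be along these lines (the values are only examples and must match on both ends of each tunnel):

    tun-mtu 1400     # MTU of the tunnel interface itself
    fragment 1300    # let OpenVPN internally fragment datagrams bigger than 1300 bytes (UDP tunnels only)
    mssfix 1300      # clamp the TCP MSS advertised through the tunnel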

    #50964

    houkouonchi
    Member

    @ppalias wrote:

    You could try messing with the MTU value, if it is available.
    May I suggest trying iperf to measure the exact bandwidth between the 2 endpoints of the bonded VPNs.

    I already tried messing with the MTU, but at low values (like under 1000) it got even worse (up to 8k PPS from 6k), so I switched it back to 1500.
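
    Maybe instead of lowering the interface MTU I should just clamp the TCP MSS of connections crossing the bond, so the data packets fit without fragmentation while the MTU stays at 1500? Just a guess, but something like this on the ZS boxes:

    iptables -t mangle -A FORWARD -o BOND00 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1300
    iptables -t mangle -A FORWARD -i BOND00 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1300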

    Here is what I get with iperf:

    Home -> DC box:

    home side:
    admin@zeroshell: 05:34 AM :~# iperf -c 10.99.98.1


    Client connecting to 10.99.98.1, TCP port 5001
    TCP window size: 16.0 KByte (default)


    [ 3] local 10.99.98.2 port 53282 connected with 10.99.98.1 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 77.8 MBytes 65.2 Mbits/sec

    DC side:
    root@zeroshell root> ./iperf -s


    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)


    [ 4] local 10.99.98.1 port 5001 connected with 10.99.98.2 port 53282
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.1 sec 77.8 MBytes 64.5 Mbits/sec

    DC Box -> home:

    home side:
    admin@zeroshell: 05:32 AM :~# iperf -s


    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)


    [ 4] local 10.99.98.2 port 5001 connected with 10.99.98.1 port 41422
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.1 sec 77.9 MBytes 64.8 Mbits/sec

    DC side:
    root@zeroshell root> ./iperf -c 10.99.98.2


    Client connecting to 10.99.98.2, TCP port 5001
    TCP window size: 16.0 KByte (default)


    [ 3] local 10.99.98.1 port 41422 connected with 10.99.98.2 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 77.9 MBytes 65.2 Mbits/sec

    iptraf, mrtg, and gkrellm seem to give me better

    #50965

    houkouonchi
    Member

    Something interesting to add is that the excessive upstream bandwidth appears to be specific to the bonding (not the VPN), at least to an extent. For example, when I use iperf over a non-bonded VPN, it goes faster (because one of my connections is closer to 45/35 than 35/35) but uses less upstream bandwidth during the transfer.

    For example, over the non-bonded VPN on a single link I see:


    Incoming rates: 44692.3 kbits/sec
    3819.0 packets/sec

    Outgoing rates: 2325.3 kbits/sec
    2045.0 packets/sec

    And when I go over the bonded VPN I see:


    Incoming rates: 36914.3 kbits/sec
    4272.8 packets/sec

    Outgoing rates: 4422.9 kbits/sec
    3326.4 packets/sec

    So it's sending a lot more data (almost double) even though it's downloading at a lower speed. How does the bonding work? Does it automatically send ACKs out of both interfaces or something?

    Is there anywhere I can read up on the exact bonding method that ZS uses to bond the VPNs?

    #50966

    houkouonchi
    Member

    Ok, so it looks like it's balance-rr:


    admin@zeroshell: 06:18 AM :~# cat /proc/net/bonding/BOND00
    Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

    Bonding Mode: load balancing (round-robin)
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 7000
    Down Delay (ms): 0

    Slave Interface: VPN00
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: da:93:72:55:43:4a

    Slave Interface: VPN01
    MII Status: up
    Link Failure Count: 2
    Permanent HW addr: 82:41:96:a3:ba:da

    So it appears that the problem might be what this guy mentions:

    http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1282742025999+28353475&threadId=1407298

    I have a feeling that this is likely why that one guy trying to do 4-6 connections is having performance issues as well.
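
    If it really is packet reordering from the round-robin, one thing that might at least cut down the spurious retransmits is raising the tcp_reordering sysctl on the Linux box that is actually serving the data behind the bond, so its TCP tolerates more reordering before it treats duplicate ACKs as loss (it won't stop the downloader from sending the duplicate ACKs, though):

    sysctl -w net.ipv4.tcp_reordering=10    # default is 3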
