Second part, where experiments with VPNs start

So, for starters: I have a small LAN here with about a dozen devices that talk to the internet. A couple of notebooks, an HTPC, one desktop and various smartphones and tablets. There are usually one or two users, with occasional guests. Most of the experiments I’ve done were set up with just one client. Before I started experimenting, all the traffic went directly through the CPE, which did the NAT.

EeeBox EB1501

To start playing I used the HTPC as the router. I hooked up a cheap HSDPA router over WiFi, which gave me two interfaces with different IP addresses that could be used to reach the internets. So far so good. The first thing to set up is multipath routing, and at this point someone may ask why not just use that without a VPN.

Huawei E5830

Well, normal multipath routing works well only when the “multi” part of it doesn’t affect the source IP address. It has to remain the same, otherwise you will get into all kinds of problems when the routing cache entries expire: an established connection that suddenly leaves through the other line gets NATed to a different public address and breaks. Since my clients are behind NAT, the only way to overcome this limitation is to route the packets based on their source address, but with only two clients that doesn’t make a lot of sense, and just using the two lines separately is not the goal here. YMMV of course. Running source-based multipath in a ten-person office is a viable option to distribute the traffic.
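
A minimal sketch of that source-based approach, assuming the two per-provider tables (ISP230 and ISP232) that are set up further down in this post and hypothetical client addresses in the 192.168.231.0/24 LAN. The helper name pin_clients is made up, and it only prints the ip rule commands so they can be reviewed before being run as root:

```shell
# Print (not run) one "ip rule" per LAN client, alternating between the two tables.
# Client addresses and the helper name are hypothetical.
pin_clients() {
    i=0
    for client in "$@"; do
        if [ $(( i % 2 )) -eq 0 ]; then
            table=ISP230
        else
            table=ISP232
        fi
        echo "ip rule add from $client table $table"
        i=$(( i + 1 ))
    done
}

pin_clients 192.168.231.10 192.168.231.11
```

Every client then keeps the same exit line, and therefore the same public address, for as long as its rule stays in place.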

Back to my HTPC. I now had two interfaces set up. The ethernet had to be both an Internet leg and the local leg, so I had to resort to aliases:

eth0:
192.168.230.253/24 → ADSL router (192.168.230.1) subnet
192.168.231.253/24 → LAN subnet
wlan0:
192.168.232.253/24 → to HSDPA (192.168.232.1) subnet

Quick and dirty, setting up multipath routing is as simple as this:

#create two separate tables for the traffic going through different providers:
ip route add 192.168.230.0/24 dev eth0 src 192.168.230.253 table ISP230
ip route add 192.168.232.0/24 dev wlan0 src 192.168.232.253 table ISP232
#add default gateways to the tables:
ip route add default via 192.168.230.1 table ISP230
ip route add default via 192.168.232.1 table ISP232
#now add some rules to be strict about what goes where
ip rule add from 192.168.230.253 table ISP230 priority 20
ip rule add from 192.168.232.253 table ISP232 priority 21
#and finally set up the default route with two gateways
#(nexthop wants the real device names, not aliases; weight 1 on the second hop is assumed)
ip route add default scope global \
nexthop via 192.168.230.1 dev eth0 weight 1 \
nexthop via 192.168.232.1 dev wlan0 weight 1
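
One thing the commands above take for granted: iproute2 resolves table names such as ISP230 through /etc/iproute2/rt_tables, so the names have to be registered there first. A small sketch, where the numeric IDs 230 and 232 are an assumption (any unused ID between 1 and 252 should do), add_rt_table is a hypothetical helper and the RT_TABLES variable is a hypothetical override for trying it on a scratch file:

```shell
# Register a routing table name unless it is already listed.
# RT_TABLES is a hypothetical override; the default is the iproute2 file.
add_rt_table() {
    id="$1"
    name="$2"
    file="${RT_TABLES:-/etc/iproute2/rt_tables}"
    if ! grep -qw -- "$name" "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$id" "$name" >> "$file"
    fi
}
```

Run as root, add_rt_table 230 ISP230 followed by add_rt_table 232 ISP232 would go before the ip route commands.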

At this point, pings to hosts on the internets will go out through different gateways. Achievement unlocked. To check that it is working as expected, start pinging something with -R to record the route (choose something close enough to you or -R will not work, since the IP header has room for only nine recorded addresses). Here is an example of pinging one of the ISP’s exchanges:

ping -R x.x.128.81
PING x.x.128.81 (x.x.128.81) 56(124) bytes of data.
64 bytes from x.x.128.81: icmp_req=1 ttl=64 time=21.6 ms
RR:     192.168.232.253
        a.a.165.227
        b.b.27.141
        c.c.27.138
        d.d.128.82
        x.x.128.81
64 bytes from x.x.128.81: icmp_req=2 ttl=64 time=21.5 ms    (same route)
Ping Packet

As the last line suggests, the route didn’t change for the second ICMP exchange, and it will not change as long as there is traffic between the two IP addresses. This is the routing cache at work, which is a good thing, but for the sake of the experiment just flush it a couple of times while the ping is running:

ip route flush cache

And sooner rather than later you will see a different path:

64 bytes from x.x.128.81: icmp_req=9 ttl=252 time=121.2 ms
RR:     192.168.230.253
        a.a.227.41
        d.d.128.82
        x.x.128.81

Side note: the IPs in the paths are similar in my example because both the ADSL and the HSDPA are hooked up to the same network. You may be less lucky in finding a host that is pingable through both lines and lets you have the RR. In that case you can just launch a torrent download and watch the traffic go through different routes.

Next, setting up the VPNs. The one I am familiar with is OpenVPN, but I’ve tried some others and there is no difference in this specific setup as long as the VPN software satisfies some requirements I’ll come back to later on. Since I already had a VPN set up between my HTPC and a server in the internets, I just copied the configuration and launched a second instance. The interesting parts of the configuration:

tls-client
proto tcp-client
dev tap0
ca /etc/openvpn/cacert.crt
cert /etc/openvpn/cert.crt
key  /etc/openvpn/key.key
remote x.x.x.x 443
cipher BF-CBC
comp-lzo
persist-key
persist-tun

After checking that I could add an IP address and ping the server via both of the links, I removed the IPs and bonded the interfaces on both the HTPC and the server:

#on both:
#load the bonding driver (the module is called bonding; bond0 is the interface it creates)
modprobe bonding
#enslave the VPN interfaces
ifenslave bond0 tap0 tap1
#on HTPC
ifconfig bond0 10.1.1.1/30
#on the server:
ifconfig bond0 10.1.1.2/30

Then ping:

ping 10.1.1.2
PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.
64 bytes from 10.1.1.2: icmp_req=1 ttl=64 time=22.1 ms
64 bytes from 10.1.1.2: icmp_req=2 ttl=64 time=124.6 ms
64 bytes from 10.1.1.2: icmp_req=3 ttl=64 time=163.7 ms
64 bytes from 10.1.1.2: icmp_req=4 ttl=64 time=21.4 ms

The different times clearly show that both HSDPA and ADSL are in use, and tcpdump confirms it.

Until now everything had gone pretty smoothly, with no problems besides the human factor, but as soon as I launched the first wget:

wget -O /dev/null http://10.1.1.2/1MB.bin
--skip--
(51.3 KB/s) - `/dev/null' saved

I realized that the easy part was over. The first few hours were spent checking and trying different MTU sizes, fragmentation parameters, queue lengths, etc., to no avail. Then I started to understand that running TCP over TCP, which usually worked just fine for most of my OpenVPN setups, was not going to work this time. And here is why. The lines are different, in both throughput and latency. TCP has some tolerance for packet reordering, but a 5-8 times difference in latency between every second packet is bad by itself; add to this the two layers of TCP and all you get is full buffers and no data. You can mitigate the problem by adding some artificial latency to the faster interface:

tc qdisc add dev eth0 root netem delay 120ms

but firstly, it only works one way (or you have to do the same on the other side, which means you must have a multihomed server in the internets as well), and secondly, it applies to all the traffic on the interface, not only the VPN. Finally, it only improves things; occasionally connections will still get stuck. So I reconfigured the VPNs to use UDP (make sure the routers support a DMZ host or at least port forwarding, you are going to have a bad time without them).
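
Switching an existing config over is mostly a matter of the proto line. A small sketch, assuming configs like the one shown earlier, with a proto tcp-client line on the client and a proto tcp-server line on the server (the server side is not shown in this post, so that part is a guess). The helper name to_udp is hypothetical:

```shell
# Print an OpenVPN config with its TCP proto line switched to UDP.
# All other lines are passed through untouched.
to_udp() {
    sed -e 's/^proto[[:space:]][[:space:]]*tcp-client[[:space:]]*$/proto udp/' \
        -e 's/^proto[[:space:]][[:space:]]*tcp-server[[:space:]]*$/proto udp/' \
        "$1"
}
```

The output goes to stdout, so it can be redirected into a new config file and compared with the old one before restarting the instance. Both ends have to be switched together.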

Packet loss

Things started to improve. Connections didn’t get stuck anymore, but strangely, using 10Mbps ADSL + 3Mbps HSDPA downlinks resulted in about 2Mbps of download speed. There are a few reasons for this. First, lots of UDP packets get dropped because the TCP above the VPN doesn’t want them (latency, excessive retransmissions and reordering). Second, the radio link is less reliable and has a lot of retransmissions by itself (compare what your smartphone says about consumed bytes with what your operator counted: you always lose). And finally, my VPN had a big overhead, 69 bytes for every packet, taken from the already small PPP MTUs. The last one I reduced to 28 bytes per packet by removing the encryption. I could have saved even more using GRE or IPIP tunnels, but I had no static IPs, so I at least had to authenticate the VPN client on the server. The first two were out of my control anyway, given the time and money budgets.
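
To put the overhead numbers in perspective, here is a quick back-of-the-envelope calculation. The 1492-byte MTU is an assumption (a typical PPPoE value; the text above only says the PPP MTUs were small), and tunnel_payload is a hypothetical helper:

```shell
# Bytes left for the tunnelled frame once the VPN overhead is taken out of the link MTU.
# The MTU of 1492 is an assumed value, not one measured on these lines.
tunnel_payload() {
    mtu="$1"
    overhead="$2"
    echo $(( mtu - overhead ))
}

tunnel_payload 1492 69   # with encryption: 1423
tunnel_payload 1492 28   # without encryption: 1464
```

Dropping the encryption hands about 40 bytes per packet back to the payload.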

Also, the native Linux bonding driver does not support weighted round robin. It can only split the traffic equally, which means the usable throughput of every single line is that of the slowest line. There are some patches for older kernels that allow weighting in the bonding driver, but I couldn’t get them to compile on a 3.x kernel and didn’t want to spend too much time porting them (I am no friend of C).
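
The effect of the equal split can be estimated with a bit of shell arithmetic. This is only an upper bound, under the assumption that round robin hands every line the same share and nothing else gets in the way. The helper name rr_ceiling is hypothetical and the rates are in kbit/s:

```shell
# Upper bound for equal-split round robin: every line carries the same share,
# so the slowest line caps all of them. Rates are integers in kbit/s.
rr_ceiling() {
    min="$1"
    for rate in "$@"; do
        if [ "$rate" -lt "$min" ]; then
            min="$rate"
        fi
    done
    echo $(( min * $# ))
}

rr_ceiling 10000 3000   # prints 6000
```

For the 10Mbps + 3Mbps pair this gives 6Mbps at best, so the roughly 2Mbps measured earlier lost a lot more to the other factors.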

Too much knobs

I used this setup for a day or two just to get a feeling for the performance; after all, I didn’t need bandwidth, just reliability. And it was awful. The connections were dropping periodically or stalling. The mobile IP kept changing. The ADSL kept dropping. Not usable. The good news was that the experiment proved to be successful: I had a working prototype that could be improved. All I needed was a second line with the same parameters as the first one.
