Measuring IPv4 vs. IPv6 Performance

Posted October 19, 2015. Bash, Networks, University, Web. 4195 words.

I produced a series of bash scripts to automate the process of pinging the list of websites. I chose bash because it is trivial to pipe the output of ping into other command-line programs such as sed, gawk and wget. As the process was completely automated, I decided to start early and just let it run. In total I pinged the top 100,000 websites up to 100 times each (50 over IPv4 and 50 over IPv6) using ping.sh (see Appendix A), recording useful statistics.

The script is very simple: a while loop iterates over each site, and the output of ping/ping6 is piped into gawk for processing. Gawk is very good at processing this sort of data, and more than fast enough for the task. The result was written to a large (11.1 MiB) CSV file, with the IPv4 and IPv6 results for each site on separate lines.
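As a rough illustration of the gawk step (a simplified sketch, not the script from Appendix A), the function below pulls the summary statistics out of `ping -q` output. The name `summarise_ping` and the canned sample text are assumptions made for this example, and the patterns assume the Linux iputils summary format; other ping implementations word these lines differently.

```shell
# Hypothetical helper: reads `ping -q` output on stdin and prints
# "min,avg,max,sent,got". Assumes the Linux iputils summary format.
summarise_ping() {
    awk '
        # e.g. "50 packets transmitted, 49 received, 2% packet loss, time 9876ms"
        /packets transmitted/ { sent = $1; got = $4 }
        # e.g. "rtt min/avg/max/mdev = 4.388/5.032/6.393/0.412 ms"
        /^rtt / { split($4, rtt, "/") }
        END { printf "%s,%s,%s,%d,%d\n", rtt[1], rtt[2], rtt[3], sent, got }
    '
}

# Canned sample (illustrative values) so the sketch runs without network access.
sample='--- example.com ping statistics ---
50 packets transmitted, 49 received, 2% packet loss, time 9876ms
rtt min/avg/max/mdev = 4.388/5.032/6.393/0.412 ms'

printf '%s\n' "$sample" | summarise_ping   # prints 4.388,5.032,6.393,50,49
```

If no reply arrives, ping prints no `rtt` line, so the three timing fields come out empty while the sent and received counts are still reported.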

As part of creating this script, I assumed that any site that was online would respond within 10 seconds. If it did not, my script would time out and assume the site was down. I feel this is a reasonable assumption, as any remotely popular site should respond quickly unless it is currently being DDoSed. Another assumption I made is that any site I scanned would be able to withstand 5 requests per second. Even a Raspberry Pi is capable of serving 43 static pages per second [1]. As I sent a maximum of 50 requests per protocol, the brief period of slightly increased load should be negligible for any of the sites I scanned.

In hindsight, I would have combined both IPv4 and IPv6 into a single line from the start, as manipulating the data in Excel is significantly easier when it is all on a single line. By the time I realized this I had already scanned the top 100,000 sites, so simply regathering the data was not practical. To fix this, I created combine.sh (see Appendix B), which simply echoes the IPv4 line without a newline, then the IPv6 line with one. This is the reason I have some duplicated columns in my combined output. These are removed in Appendix D.
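For comparison, the same pairing of lines can be sketched with the standard `paste` utility. This is an alternative to the loop in Appendix B rather than what was actually run; the function name `combine_pairs` and the sample rows (built from documentation-range addresses) are made up for the example.

```shell
# Join every two consecutive input lines with a comma.
# Each "-" tells paste to read the next line from standard input.
combine_pairs() {
    paste -d, - -
}

# Illustrative input: one IPv4 line followed by one IPv6 line per site.
printf '%s\n' \
    '1,example.com,192.0.2.1,4' \
    '1,example.com,2001:db8::1,6' \
    '2,example.org,192.0.2.2,4' \
    '2,example.org,,6' | combine_pairs
# prints:
# 1,example.com,192.0.2.1,4,1,example.com,2001:db8::1,6
# 2,example.org,192.0.2.2,4,2,example.org,,6
```

As with combine.sh, the position and host columns appear twice in each combined row.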

Whilst looking through the IPv6 column I noticed a very common prefix: “2400:cb00:”. After some research I discovered that this prefix belongs to Cloudflare [2]. Using the prefixes I found on whatmyip.co [3], I created a table mapping each hosting company to the number of sites it hosts. The results are impressive.
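One way to spot such common prefixes is to count how often each leading pair of address groups occurs. The sketch below is an assumption about how that tally could be done, not the method used for the table; `count_prefixes` and the sample addresses are hypothetical.

```shell
# Count addresses per leading prefix (the first two colon-separated groups),
# most common first. Entries without colons, such as hostnames, are skipped.
count_prefixes() {
    awk -F: 'NF > 2 { count[$1 ":" $2]++ }
             END { for (p in count) print count[p], p }' | sort -k1,1nr -k2,2
}

# Illustrative input: two addresses under 2400:cb00 and one documentation address.
printf '%s\n' 2400:cb00::1 2400:cb00::2 2001:db8::1 | count_prefixes
# prints:
# 2 2400:cb00
# 1 2001:db8
```

Mapping each prefix to a hosting company still needs a separate lookup list, which this sketch does not include.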

I decided to look up the geolocation of each website. Looking around for a convenient database or API, I stumbled upon freegeoip.net [4]. It allows you to easily gather geolocation information for a specified IP in CSV form, perfect for my coursework. To retrieve this information I self-hosted my own instance, then used lookup.sh (see Appendix C), which uses wget and a simple while loop to request the location information for each site and write it to a file. I decided to record all the information given, both to keep the script simple and to ensure I didn’t need to re-run it.

Once the data was collected, it was time to head to Excel to analyze it and draw conclusions. Having a large dataset let me create very good graphs and draw good conclusions, but it was tedious to work with in Excel. Certain formulas, such as the ones used to create the Average Response Time per Country over Distance graph, crashed Excel numerous times, and it even ran out of memory every now and then. In future, when dealing with similarly sized datasets, I would need to look into other graphing tools.

Despite not being perfect, I am happy with what data I gathered and how I gathered it. The results speak for themselves.


Results

Average Response Time for each Site

I decided to plot all 100,000 points as an X-Y scatter. In this and subsequent graphs, IPv4 is blue and IPv6 is red. Immediately I noticed rather obvious bands of pings, which are shown in the histogram below.

Histogram of Average Response Time for each Site

There are large peaks in the graph above. It’s interesting to note that despite the lower adoption of IPv6, the initial peak is half the height of IPv4. Past that, the frequency is low, indicating lower IPv6 response times.

Average Response Time for the Top 1000 Websites

Thanks to CDNs and local sites, the top 1000 sites are concentrated around the 4-6ms mark with both averages trending slowly upwards. IPv6 is always significantly lower than IPv4 in the graph above.

IPv4 vs IPv6 across the world

At a glance, you can see regions such as Africa, the Caribbean, and the Middle East without any IPv6 deployment. Sites are concentrated around the U.S.A., Europe and East Asia, with barren areas in between.

Combined Charts

The average response times for IPv4-only and IPv6-only sites are roughly the same, at about 104ms. The average min and max are largely identical as well; nothing surprising. On the other hand, the average for sites running both IPv4 and IPv6 is very low in comparison: only 25ms compared with over 100ms! Now the question is, why are sites running both IPv4 and IPv6 significantly faster?

The vast majority of the internet (90%) is IPv4 only, with only 4.5% of sites providing both. In fact, more sites provide neither than both! Without an IPv4 address, a site cannot be truly connected.

A large number of the top 100,000 websites are either blocking ICMP echo requests (ping) or are simply offline. Alternatively, they could be listening only on a specific subdomain. I didn’t check for this.
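The four categories discussed here can be tallied directly from the combined CSV. The following is a minimal sketch, assuming the combined.csv layout produced by Appendix B (ten IPv4 columns followed by ten IPv6 columns) and assuming that a site counts as reachable over a protocol when at least one reply was received. The function name `classify_sites` is hypothetical, and the sample rows reuse three entries from Appendix D.

```shell
# Tally sites by which protocols answered at least one ping.
# Only the two "got" columns are read: field 9 (IPv4) and field 19 (IPv6).
classify_sites() {
    awk -F, '
        { v4 = ($9 + 0 > 0); v6 = ($19 + 0 > 0) }
        v4 && v6   { both++ }
        v4 && !v6  { only4++ }
        !v4 && v6  { only6++ }
        !v4 && !v6 { neither++ }
        END {
            printf "ipv4_only=%d ipv6_only=%d both=%d neither=%d\n",
                   only4, only6, both, neither
        }
    '
}

printf '%s\n' \
    '1,google.com,216.58.210.46,4,4.64,5.578,6.173,50,50,0,1,google.com,lhr14s23-in-x0e.1e100.net,6,4.388,5.032,6.393,50,49,1' \
    '9,twitter.com,185.45.5.43,4,4.484,5.529,7.486,50,49,1,9,twitter.com,,6,,,,0,0,0' \
    '4,baidu.com,220.181.57.217,4,,,,49,0,49,4,baidu.com,,6,,,,0,0,0' | classify_sites
# prints ipv4_only=1 ipv6_only=0 both=1 neither=1
```

Under this definition a site that resolves but drops ICMP, such as baidu.com in the sample, lands in the "neither" group even though it is online.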

One reason sites that provide both IPv4 and IPv6 are faster is that 65% of them are behind Cloudflare or Google. Both have worldwide CDNs, and Cloudflare provides a free IPv6 gateway, allowing IPv4-only sites to be reached over IPv6.

The U.S.A. is the world leader in number of hosted sites with 43% of the market. Comparatively, every other country is trailing behind with Canada at 9%, Germany at 6%, and Hong Kong at 6%. This is despite the existence of global CDNs.

Average Response Time over Distance

To test the geolocation accuracy, I plotted estimated distance against average response time. Sites above the diagonal are likely closer than their IP suggests. Sites below the diagonal simply have a poor connection.

Average Response Time per Country over Distance

Besides a colourful graph, this shows the grouping of sites in different countries; explaining why there are so many peaks in the Histogram. There is a minimum amount of time it takes to connect to distant hosts.


Discussion

Limitations

I only ran the script once, and as the script took easily a week to analyze the top 100,000 sites, chances are the sites at the top had changed since the start. I could have gotten around this by parallelizing the script, or running it multiple times on a smaller set and taking an average. Parallelizing seemed too complex for the task at hand, and I didn’t really consider running it multiple times before it had almost finished analyzing. As I didn’t want to discard all the data, I decided to go ahead with the data I had.
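For what it's worth, a basic form of parallelism needs little extra code. The sketch below uses `xargs -P`, an extension available in GNU and BSD xargs but not required by POSIX, to run several workers at once. The function name `parallel_probe` is hypothetical, and the worker is a placeholder `printf` standing in for the real ping pipeline.

```shell
# Run up to 4 workers at a time, one input line (host) per worker.
# The worker here is a placeholder; a real one would ping "$1" and parse the output.
parallel_probe() {
    xargs -P 4 -n 1 sh -c 'printf "%s,done\n" "$1"' _
}

# Output order depends on which worker finishes first, so sort it afterwards.
printf '%s\n' example.com example.org example.net | parallel_probe | sort
# prints:
# example.com,done
# example.net,done
# example.org,done
```

With real measurements, running many pings at once could itself affect the response times, so the worker count would need to be kept modest.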

The geolocation database I used isn’t 100% accurate; that is virtually impossible. You can see this on the Average Response Time over Distance graph. Many sites are so far above the diagonal that the measured time would only be possible by breaking the speed of light, an indication that they are located much closer than their IP suggests. This is likely so common because the IPv4 address space is exhausted, so organizations are trading the limited number of IPv4 addresses they have access to. This doesn’t matter too much, due to the size of my data set. In future I could remove the outliers to produce cleaner results.
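The speed-of-light check can be made concrete: a round trip over a distance d cannot take less than 2d/c. The sketch below computes that bound using the vacuum speed of light (299792.458 km/s), which is a generous assumption since signals in fibre travel noticeably slower. The function names `min_rtt_ms` and `is_impossible` are made up for this example, and the distances are illustrative.

```shell
# Lower bound on round-trip time in milliseconds for a distance in km,
# assuming the signal travels at the vacuum speed of light.
min_rtt_ms() {
    awk -v km="$1" 'BEGIN { printf "%.2f\n", 2 * km / 299792.458 * 1000 }'
}

# Exit status 0 when a measured RTT (ms) is below the bound for the distance (km),
# i.e. the geolocated distance cannot be right.
is_impossible() {
    awk -v km="$1" -v rtt="$2" 'BEGIN { exit !(rtt < 2 * km / 299792.458 * 1000) }'
}

min_rtt_ms 5000                                   # prints 33.36
is_impossible 5000 5 && echo "geolocation is off" # prints geolocation is off
```

Filtering the dataset with a check like this would be one way to remove the outliers mentioned above.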

Speed

It is clear that sites that serve both IPv4 and IPv6 traffic are significantly faster than those that don’t (4 times faster on average). Every single graph I have produced shows this simple fact. I feel that this can be attributed to several factors:

  1. Cloudflare and Google account for ~65% of the sites that serve both protocols, and both have global CDNs to ensure a fast and reliable connection. There are many more IPv4 sites that neither is associated with. Instead of connecting to a distant site, you connect to a nearby CDN node acting as a proxy, which reduces the time it takes to connect.
  2. IPv6 isn’t widely deployed yet, with only 6% of sites serving it. Those 6% are usually the bigger websites; smaller sites may be stuck behind a single IPv4 address, in a datacenter that doesn’t support IPv6.
  3. Sites that serve both IPv4 and IPv6 traffic are concentrated in North America, Europe and East Asia, regions that are geographically closer and better connected than the rest of the world. They are practically absent from regions such as Africa, the Caribbean and the Middle East. There is also little deployment in South America, West Asia and Oceania.

However, despite the fact that the average response time for IPv6 is significantly lower than for IPv4, it’s unlikely you’ll see any speed increase switching between IPv4 and IPv6 on a host that supports both. IPv6 is faster because the hosts that serve both are well connected, with fast response times regardless of protocol.

Deployment

Despite the best efforts of organizations such as worldipv6launch.org, Cloudflare and Google, IPv6 access is an afterthought. Despite claims of 500% growth since 2012, it’s 2015 and only 4.5% of sites support IPv6. As we go into the future, IPv6 deployment will surely grow, as larger populations and the Internet of Things strain the exhausted IPv4 pool even further. Until IPv6 is widespread, anyone with only an IPv6 address will be unable to connect directly to IPv4-only hosts without the aid of a tunnel.

IPv6 isn’t evenly geographically distributed, compared with IPv4. If you’re in Africa, the Caribbean, or the Middle East, virtually no sites support IPv6. This suggests to me that the infrastructure required to support IPv6 just isn’t there.

Rank

Bigger sites are more likely to support and have a fast IPv4 and IPv6 connection than smaller sites. As you go through the different sites, the further down you get, the slower the site is to respond, on average.


Appendices

Appendix A – ping.sh

version=(4 6)
timeout=10
attempt=50

input="/home/mc21g14/ipv4-vs-ipv6/top-1m.csv"
range=(1 100000)
output=`date +/home/mc21g14/ipv4-vs-ipv6/outputs/%y%m%d-%H%M.csv`

sed -n "${range[0]},${range[1]}p" $input | while IFS=$',' read -r -a host; do
	[ "${host[0]}" = "1" ] && echo -e "pos,host,ip,v,min,max,ave,sent,got,lost"

	for ver in "${version[@]}"; do
		[ $ver = "4" ] && command="ping" || command="ping$ver"

		$command -c $attempt -i 0.2 -q -w $timeout "${host[1]}" | gawk -v pos="${host[0]}" -v \
      host="${host[1]}" -v version=$ver '
			BEGIN {
				OFS  = ","
				FS   = " "
				sent = got = lost = 0
			}
			/PING .+ \([0-9.:]+\) [0-9]+\([0-9]+\) bytes of data./ {
				ip   = substr($3, 2, length($3) - 2)
			}
			/PING .+\([0-9a-zA-Z\-_.:]+\) [0-9]+ data bytes/ {
				ip   = substr($2, index($2, "(") + 1, length($2) - index($2, "(") - 1)
			}
			/[0-9]+ packets transmitted, [0-9]+ received, [0-9]+% packet loss, time [0-9]+[a-z]+/{
				sent = $1
				got  = $4
				lost = $1 - $4
			}
			/rtt min\/avg\/max\/mdev = [0-9./]+ [a-z]+/ {
				# ping reports min/avg/max/mdev in that order
				split($4, temp, "/")
				min  = temp[1]
				ave  = temp[2]
				max  = temp[3]
			}
			END {
				print pos, host, ip, version, min, max, ave, sent, got, lost
			}
		'
	done
done 2> /dev/null | tee $output

Appendix B – combine.sh

counter=0
tail -n +2 output.csv | while read -r line; do
	if [ $counter -eq 0 ]; then
		# Pass the data as an argument, never as the printf format string
		printf '%s,' "$line"
		counter=1
	else
		printf '%s\n' "$line"
		counter=0
	fi
done | tr -d "\t " | tee combined.csv

Appendix C – lookup.sh

host="http://127.0.0.1:8080/csv/"

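while IFS=$'\t,' read -r -a array; do
	# Fetch the geolocation CSV row for the IPv4 and IPv6 address of each site
	rip4=`wget -qO- "$host${array[2]}" | tr -d '\n\r'`
	rip6=`wget -qO- "$host${array[12]}" | tr -d '\n\r'`

	# Pad failed lookups with empty fields so every row keeps the same width
	[[ $rip4 == "" ]] && rip4=",,,,,,,,,,"
	[[ $rip6 == "" ]] && rip6=",,,,,,,,,,"

	echo "${array[1]},$rip4,$rip6"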
done < combined.csv | tr -d "\t " | tee geo.csv
Appendix D – Data for top 100 sites

Rank,Host,IPv4,Min4,Ave4,Max4,Sent4,Got4,Lost4,IPv6,Min6,Ave6,Max6,Sent6,Got6,Lost6
1,google.com,216.58.210.46,4.64,5.578,6.173,50,50,0,lhr14s23-in-x0e.1e100.net,4.388,5.032,6.393,50,49,1
2,facebook.com,31.13.74.1,90.289,90.888,92.635,50,48,2,edge-star6-shv-01-ord1.facebook.com,93.663,94.24,94.955,50,46,4
3,youtube.com,216.58.210.46,4.738,5.27,5.849,50,49,1,lhr14s23-in-x0e.1e100.net,4.46,4.93,5.715,50,48,2
4,baidu.com,220.181.57.217,,,,49,0,49,,,,,0,0,0
5,yahoo.com,206.190.36.45,152.285,152.97,153.387,49,15,34,ir1.fp.vip.bf1.yahoo.com,92.439,92.828,93.196,50,49,1
6,amazon.com,176.32.103.205,,,,49,0,49,,,,,0,0,0
7,wikipedia.org,91.198.174.192,12.09,12.658,13.886,50,49,1,text-lb.esams.wikimedia.org,12.376,12.843,13.247,50,49,1
8,qq.com,125.39.240.113,237.524,238.328,239.603,50,48,2,,,,,0,0,0
9,twitter.com,185.45.5.43,4.484,5.529,7.486,50,49,1,,,,,0,0,0
10,google.co.in,216.58.210.35,4.702,5.485,8.106,50,49,1,lhr14s23-in-x03.1e100.net,4.332,4.893,5.687,50,49,1
11,taobao.com,110.75.115.70,185.638,186.337,186.899,49,16,33,,,,,0,0,0
12,live.com,65.55.206.154,,,,49,0,49,,,,,0,0,0
13,sina.com.cn,202.108.33.60,,,,49,0,49,,,,,0,0,0
14,linkedin.com,108.174.2.129,78.524,78.975,79.857,50,48,2,,,,,0,0,0
15,yahoo.co.jp,182.22.59.229,278.256,278.7,279.546,49,47,2,,,,,0,0,0
16,weibo.com,114.134.80.162,257.736,258.437,260.118,50,49,1,,,,,0,0,0
17,ebay.com,66.135.216.190,,,,49,0,49,,,,,0,0,0
18,google.co.jp,216.58.210.35,4.114,5.044,5.676,50,49,1,lhr14s23-in-x03.1e100.net,4.158,4.766,5.519,50,49,1
19,yandex.ru,5.255.255.5,58.047,58.636,59.634,50,50,0,yandex.ru,69.847,70.568,71.145,50,50,0
20,bing.com,204.79.197.200,4.524,5.338,7.763,50,49,1,,,,,0,0,0
21,vk.com,87.240.131.99,55.904,56.806,58.016,50,50,0,2a00:bdc0:3:103:1:0:403:900,47.489,48.606,53.783,50,50,0
22,hao123.com,180.149.132.19,164.102,165.279,166.886,50,48,2,,,,,0,0,0
23,google.de,216.58.210.35,4.626,5.635,6.803,50,50,0,lhr14s23-in-x03.1e100.net,3.954,4.929,5.869,50,50,0
24,t.co,185.45.5.47,4.96,5.774,6.494,50,49,1,,,,,0,0,0
25,instagram.com,54.164.44.207,,,,49,0,49,,,,,0,0,0
26,msn.com,23.101.196.141,,,,49,0,49,,,,,0,0,0
27,amazon.co.jp,54.240.248.0,,,,49,0,49,,,,,0,0,0
28,google.co.uk,216.58.210.35,4.141,5.171,5.866,50,50,0,lhr14s23-in-x03.1e100.net,4.201,4.741,6.089,50,50,0
29,pinterest.com,23.235.37.84,4.595,5.526,6.995,50,50,0,,,,,0,0,0
30,tmall.com,110.75.114.89,213.58,214.453,216.155,50,48,2,,,,,0,0,0
31,wordpress.com,192.0.78.9,4.952,5.768,8.058,50,49,1,,,,,0,0,0
32,ask.com,66.235.120.127,77.744,78.512,79.456,50,50,0,,,,,0,0,0
33,reddit.com,198.41.208.139,4.362,5.026,5.534,50,50,0,,,,,0,0,0
34,blogspot.com,216.58.210.41,4.19,4.852,5.468,50,50,0,lhr14s23-in-x09.1e100.net,3.894,4.328,5.125,50,49,1
35,paypal.com,66.211.169.66,,,,49,0,49,,,,,0,0,0
36,google.fr,216.58.210.35,4.119,4.729,5.765,50,50,0,lhr14s23-in-x03.1e100.net,3.905,4.399,5.584,50,50,0
37,mail.ru,217.69.139.200,55.976,56.271,57.219,50,50,0,,,,,0,0,0
38,apple.com,17.142.160.59,,,,49,0,49,,,,,0,0,0
39,google.com.br,216.58.210.35,4.057,4.334,5.053,50,50,0,lhr14s23-in-x03.1e100.net,3.82,4.326,5.371,50,50,0
40,onclickads.net,78.140.191.89,11.053,11.668,12.453,50,48,2,,,,,0,0,0
41,tumblr.com,66.6.41.30,73.377,74.212,75.04,50,49,1,,,,,0,0,0
42,aliexpress.com,205.204.101.160,148.093,148.093,148.093,1,1,0,,,,,0,0,0
43,microsoft.com,134.170.185.46,,,,49,0,49,,,,,0,0,0
44,google.ru,216.58.210.35,4.069,4.444,5.589,50,50,0,lhr14s23-in-x03.1e100.net,3.929,4.538,6.043,50,50,0
45,sohu.com,220.181.90.240,172.701,175.674,177.063,50,31,19,,,,,0,0,0
46,imgur.com,185.31.18.193,4.223,4.863,5.858,50,50,0,,,,,0,0,0
47,xvideos.com,141.0.174.37,10.857,11.618,12.892,50,50,0,,,,,0,0,0
48,google.it,216.58.210.35,4.212,4.863,5.501,50,49,1,lhr14s23-in-x03.1e100.net,3.933,4.776,6.495,50,49,1
49,imdb.com,207.171.166.22,78.004,78.711,79.662,50,50,0,,,,,0,0,0
50,google.es,216.58.210.35,4.338,4.907,5.715,50,50,0,lhr14s23-in-x03.1e100.net,3.963,4.543,5.461,50,50,0
51,netflix.com,50.19.210.42,,,,49,0,49,,,,,0,0,0
52,amazon.de,178.236.6.250,,,,49,0,49,,,,,0,0,0
53,gmw.cn,111.202.12.1,,,,49,0,49,,,,,0,0,0
54,fc2.com,54.148.76.135,149.872,150.924,151.974,50,49,1,,,,,0,0,0
55,360.cn,106.120.167.66,176.302,177.604,181.387,50,50,0,,,,,0,0,0
56,alibaba.com,198.11.132.23,144.695,145.18,146.226,50,49,1,,,,,0,0,0
57,stackoverflow.com,198.252.206.16,83.033,83.857,84.89,50,50,0,,,,,0,0,0
58,go.com,199.181.131.249,136.686,137.457,138.746,50,49,1,,,,,0,0,0
59,google.com.mx,216.58.210.35,4.471,5.263,5.97,50,49,1,lhr14s23-in-x03.1e100.net,4.093,4.787,5.69,50,50,0
60,ok.ru,217.20.156.159,54.942,55.848,56.438,50,50,0,,,,,0,0,0
61,google.ca,216.58.210.35,4.637,5.118,5.957,50,50,0,lhr14s23-in-x03.1e100.net,3.903,4.485,5.692,50,49,1
62,google.com.hk,216.58.210.35,4.199,5.191,6.629,50,49,1,lhr14s23-in-x03.1e100.net,4.421,4.997,6.809,50,48,2
63,tianya.cn,124.225.65.154,242.886,247.381,257.391,50,47,3,,,,,0,0,0
64,amazon.in,54.239.34.40,,,,49,0,49,,,,,0,0,0
65,amazon.co.uk,178.236.7.220,,,,49,0,49,,,,,0,0,0
66,craigslist.org,208.82.238.129,142.154,142.777,143.345,50,49,1,,,,,0,0,0
67,rakuten.co.jp,133.237.48.124,266.547,267.284,272.682,50,48,2,,,,,0,0,0
68,pornhub.com,31.192.117.132,,,,49,0,49,,,,,0,0,0
69,naver.com,202.179.177.22,,,,49,0,49,,,,,0,0,0
70,blogger.com,216.58.210.41,4.341,5.066,6.155,50,49,1,lhr14s23-in-x09.1e100.net,4.289,4.915,5.973,50,50,0
71,diply.com,184.27.136.120,4.478,4.994,6.066,50,50,0,,,,,0,0,0
72,xhamster.com,88.208.29.24,10.096,11.291,16.545,50,50,0,2a02:b48:4000:d::1,9.613,10.611,18.378,50,50,0
73,google.com.tr,216.58.210.35,4.372,5.088,11.501,50,49,1,lhr14s23-in-x03.1e100.net,3.909,4.617,6.092,50,48,2
74,flipkart.com,163.53.78.58,145.724,146.658,148.342,50,50,0,2001:df0:23e:9002::15,116.853,117.855,118.647,50,49,1
75,espn.go.com,68.71.212.186,137.183,138.1,139.633,50,49,1,,,,,0,0,0
76,googleadservices.com,216.58.210.34,4.036,4.704,5.672,50,50,0,,,,,0,0,0
77,soso.com,106.120.151.169,192.633,193.531,194.7,50,19,31,,,,,0,0,0
78,outbrain.com,66.225.223.5,,,,49,0,49,,,,,0,0,0
79,cnn.com,157.166.226.25,88.462,88.92,89.727,50,50,0,,,,,0,0,0
80,nicovideo.jp,202.248.110.243,,,,49,0,49,,,,,0,0,0
81,google.co.id,216.58.210.35,4.095,4.876,5.809,50,50,0,lhr14s23-in-x03.1e100.net,3.92,4.485,5.826,50,50,0
82,dropbox.com,108.160.172.200,152.031,152.031,152.031,1,1,0,,,,,0,0,0
83,googleusercontent.com,,,,,0,0,0,,,,,0,0,0
84,github.com,192.30.252.131,78.465,78.896,79.884,50,50,0,,,,,0,0,0
85,bongacams.com,64.210.142.13,19.519,20.284,21.492,50,50,0,,,,,0,0,0
86,kat.cr,78.138.99.144,15.043,16.015,17.915,50,46,4,,,,,0,0,0
87,xinhuanet.com,202.108.119.194,,,,49,0,49,,,,,0,0,0
88,google.co.kr,216.58.210.35,4.037,4.783,5.363,50,50,0,wl-in-x5e.1e100.net,10.99,11.717,12.582,50,50,0
89,bbc.co.uk,212.58.244.18,4.151,4.966,5.906,50,50,0,,,,,0,0,0
90,ebay.de,66.211.181.235,,,,49,0,49,,,,,0,0,0
91,google.pl,216.58.210.35,4.088,5.004,7.228,50,50,0,lhr14s23-in-x03.1e100.net,3.868,4.388,6.364,50,50,0
92,google.com.au,216.58.210.35,4.096,4.448,5.251,50,50,0,lhr14s23-in-x03.1e100.net,3.941,4.69,5.181,50,50,0
93,pixnet.net,103.23.108.107,286.382,287.107,287.813,50,46,4,,,,,0,0,0
94,popads.net,184.154.76.140,92.731,93.419,94.268,50,50,0,,,,,0,0,0
95,ebay.co.uk,66.211.181.235,,,,49,0,49,,,,,0,0,0
96,sogou.com,106.120.188.46,196.929,197.989,199.203,48,17,31,,,,,0,0,0
97,dailymotion.com,195.8.215.137,12.132,13.163,14.873,50,49,1,,,,,0,0,0
98,adcash.com,104.154.36.143,105.894,106.606,107.661,50,50,0,,,,,0,0,0
99,adobe.com,192.150.16.117,118.065,119.054,120.162,50,49,1,,,,,0,0,0
100,nytimes.com,170.149.159.130,147.227,147.976,149.528,50,50,0,,,,,0,0,0

Click to download the full dataset of the top 100,000 sites.