Hello! I studied Computer Science, and live in Bath. I write code, design games, and occasionally tweet.

Measuring IPv4 vs. IPv6 Performance Oct. 19, 2015 in Bash, Networks, Text, University, Web

I produced a series of bash scripts to automate the process of pinging the list of websites. I chose bash because it is trivial to pipe the output of ping into other command-line programs such as sed, gawk, and wget. As the process was completely automated, I decided to start early and just let it run. In total I pinged the top 100,000 websites up to 100 times each (50 over IPv4 and 50 over IPv6) using ping.sh (see Appendix A), recording useful statistics.

The script is very simple: a while loop iterates over each site, and ping/ping6 is piped into gawk to process the result. Gawk is very good at processing this sort of data, and more than fast enough to perform the task. The result was output to a large (11.1MiB) CSV file, with the IPv4 and IPv6 results for each site on separate lines.
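
As a minimal sketch of the gawk step, the function below pulls the three timings out of a ping summary. It assumes the iputils summary format that the patterns in Appendix A match; the function name `parse_rtt`, the use of plain awk, and the sample summary lines are illustrative and not part of the original script.

```shell
# Hypothetical helper: pull min/avg/max out of a ping summary read from stdin.
# Assumes the iputils "rtt min/avg/max/mdev = a/b/c/d ms" summary line.
parse_rtt() {
	awk '/^rtt min\/avg\/max\/mdev = / {
		split($4, t, "/")              # $4 is the a/b/c/d group
		print t[1] "," t[2] "," t[3]   # min,avg,max
	}'
}

# Illustrative summary (the timings are sample values, not measured here).
printf '%s\n' \
	'50 packets transmitted, 50 received, 0% packet loss, time 9804ms' \
	'rtt min/avg/max/mdev = 4.640/5.578/6.173/0.412 ms' | parse_rtt
```

Note that ping reports the values in the order min, avg, max, which is easy to mix up when assigning them to named columns.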

As part of creating this script, I assumed that every site, if it exists, would be able to respond within 10 seconds if it was online. If not, my script would time out and assume the site was down. I feel this is a reasonable assumption, as any remotely popular site should respond quickly unless it is currently being DDoSed. Another assumption I made is that any site I scanned would be able to withstand 5 requests per second. Even a Raspberry Pi is capable of serving 43 static pages per second [1]. As I sent a maximum of 50 requests per protocol, the brief period of slightly increased load should be negligible for any of the sites I scanned.

In hindsight, I would have combined both IPv4 and IPv6 into a single line from the start, as manipulating the data in Excel is significantly easier when it is all on a single line. By that time I had already scanned the top 100,000 sites, so simply regathering the data was impractical. To fix this, I created combine.sh (see Appendix B), which simply prints the IPv4 line without a newline, then the IPv6 line with one. This is the reason I have some duplicated columns in my combined output. These are removed in Appendix D.
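
An alternative sketch of the same pairing step: `paste` can join every two consecutive lines with a comma. This assumes the rows strictly alternate between IPv4 and IPv6 and that the CSV header has already been stripped. The name `combine_pairs` and the sample rows are mine, not part of combine.sh.

```shell
# Hypothetical helper: join each pair of consecutive input lines with a comma.
# Assumes rows strictly alternate IPv4, IPv6 and the header is already removed.
combine_pairs() {
	paste -d, - -
}

# Two made-up sites, two rows each (position, host, address, IP version).
printf '%s\n' \
	'1,example.com,192.0.2.1,4' \
	'1,example.com,2001:db8::1,6' \
	'2,example.org,192.0.2.2,4' \
	'2,example.org,,6' | combine_pairs
```

Each output line then carries the IPv4 columns followed by the IPv6 columns, including the repeated position and host.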

Whilst looking through the IPv6 column I noticed a very common prefix: “2400:cb00:”. After some research I discovered that this prefix belongs to Cloudflare [2]. Using the prefixes I found on whatmyip.co [3], I created a table mapping each hosting company to the number of sites it hosts. The results are impressive.
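
A rough sketch of how such a prefix tally could be produced. Grouping on the first two 16-bit groups is my assumption; real allocations vary in length, so a real mapping would need the actual prefix list. The function name and the sample addresses are illustrative only.

```shell
# Hypothetical helper: count addresses per leading IPv6 prefix (first two groups).
# Reads one IPv6 address per line on stdin; prints "count prefix", largest first.
count_prefixes() {
	awk -F: 'NF > 2 { count[$1 ":" $2 ":"]++ }
		END { for (p in count) print count[p], p }' | sort -k1,1nr -k2,2
}

# Illustrative addresses only.
printf '%s\n' \
	'2400:cb00:2048:1::681c:1' \
	'2400:cb00:2048:1::681c:2' \
	'2001:db8::1' | count_prefixes
```

Lines that are hostnames rather than addresses contain no colons and are skipped by the `NF > 2` guard.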

I decided to look up the geolocation of each website. Looking around for a convenient database or API, I stumbled upon freegeoip.net [4]. It allows you to easily gather geolocation information for a specified IP in CSV form, perfect for my coursework. To retrieve this information using lookup.sh (see Appendix C), I self-hosted my own instance, then used wget and a simple while loop to request the location information about each site and write it to a file. I decided to record all the information given, both to keep the script simple and to ensure I didn’t need to re-run the script.

Once the data was collected, it was time to head to Excel to analyze the data and draw conclusions. Having a large dataset let me create very good graphs and draw good conclusions, but it was tedious to work with in Excel. Certain formulas, such as the ones used to create the Average Response Time per Country over Distance graph, managed to crash Excel numerous times, and it even ran out of memory every now and then. In future, when dealing with similarly sized amounts of data, I would need to look into other graphing tools.


Results

Figure: Average Response Time for each Site
Figure: Histogram of Average Response Time for each Site

I decided to plot all 100,000 points as an X-Y scatter. In this, and subsequent graphs, IPv4 is blue and IPv6 is red. Immediately I noticed rather obvious bands of pings, which show up as large peaks in the histogram. It’s interesting to note that despite the lower adoption of IPv6, its initial peak is half the height of IPv4’s. Past that, the IPv6 frequency is low, indicating that IPv6 response times are generally lower.
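
The histogram itself was built in Excel, but the same binning could be sketched on the command line. The 10ms bin width and the function name are assumptions of mine. The sample values are averages and minimums taken from Appendix D.

```shell
# Hypothetical helper: bucket response times (ms, one per line) into fixed-width bins.
# Prints "bin_start count", sorted by bin. The default width of 10ms is an assumption.
histogram() {
	local width="${1:-10}"
	awk -v w="$width" '
		$1 != "" { bins[int($1 / w) * w]++ }   # skip rows with no reply
		END { for (b in bins) print b, bins[b] }
	' | sort -n
}

printf '%s\n' 4.64 5.578 12.09 78.524 78.975 | histogram 10
```

Rows for sites that never replied have an empty timing field, so they are left out of the counts rather than being binned as zero.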

Figure: Average Response Time for the Top 1000 Websites
Figure: IPv4 vs IPv6 across the world

Thanks to CDNs and local sites, the top 1000 sites are concentrated around the 4-6ms mark, with both averages trending slowly upwards. IPv6 is always significantly lower than IPv4 in the response time graph. At a glance, you can see that regions such as Africa, the Caribbean, and the Middle East are without any IPv6 deployment. Sites are concentrated around the U.S.A., Europe and East Asia, with barren areas in between.

Combined Charts

Average response times for IPv4-only and IPv6-only sites are roughly the same, at about 104ms. The average min and max are largely identical as well; nothing surprising. On the other hand, the average for sites running both IPv4 and IPv6 is very low in comparison – only 25ms compared with over 100ms! Now the question is, why are sites running both IPv4 and IPv6 significantly faster?
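
A sketch of how these per-category averages could be computed outside Excel. It assumes a comma-separated file with a header row and the column order listed in Appendix D (average IPv4 time in field 5, average IPv6 time in field 12). Averaging the two protocols for sites that answer on both is my assumption, as is the function name.

```shell
# Hypothetical helper: mean response time per category from a CSV on stdin.
# Assumed layout: header row, Ave4 in field 5, Ave6 in field 12.
category_means() {
	awk -F, '
		NR == 1 { next }                          # skip the header row
		{
			v4 = ($5 != ""); v6 = ($12 != "")
			if (v4 && v6)  { cat = "both";      t = ($5 + $12) / 2 }
			else if (v4)   { cat = "ipv4-only"; t = $5 }
			else if (v6)   { cat = "ipv6-only"; t = $12 }
			else           { next }               # no reply on either protocol
			sum[cat] += t; n[cat]++
		}
		END { for (c in n) printf "%s %.3f\n", c, sum[c] / n[c] }
	' | sort
}
```

It would be run as `category_means < combined.csv`, where the file name stands in for whichever cleaned-up CSV holds the combined rows.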

The vast majority of the Internet (90%) is IPv4 only, with only 4.5% of sites providing both. In fact, more sites provide neither than both! It’s impossible to be truly connected without an IPv4 address.

A large number of the top 100,000 websites are either blocking ICMP echo requests (ping) or are simply offline. Alternatively, they could be listening only on a specific sub-domain. I didn’t check for this.

One reason sites that provide both IPv4 and IPv6 are faster is that 65% of them are behind Cloudflare or Google. Both have worldwide CDNs, and Cloudflare provides a free IPv6 gateway, allowing IPv4-only sites to be reached over IPv6.

The U.S.A. is the world leader in number of hosted sites with 43% of the market. Comparatively, every other country is trailing behind with Canada at 9%, Germany at 6%, and Hong Kong at 6%. This is despite the existence of global CDNs.

Figure: Average Response Time over Distance
Figure: Average Response Time per Country over Distance

To test the geo-location accuracy, I plotted estimated distance against average response time. Sites above the diagonal are likely closer than their IP suggests. Sites below the diagonal simply have a poor connection. Besides a colourful graph, this shows the grouping of sites in different countries, explaining why there are so many peaks in the histogram. There is a minimum amount of time it takes to connect to distant hosts.


Discussion

Limitations

I only ran the script once, and as the script took easily a week to analyze the top 100,000 sites, chances are the sites at the top had changed since the start. I could have gotten around this by parallelizing the script, or running it multiple times on a smaller set and taking an average. Parallelizing seemed too complex for the task at hand, and I didn’t really consider running it multiple times before it had almost finished analyzing. As I didn’t want to discard all the data, I decided to go ahead with the data I had.
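
For reference, a minimal sketch of what the parallel option could have looked like, using `xargs -P` (a GNU/BSD extension rather than POSIX). The function name is hypothetical, and `echo probing` stands in for the real per-site ping command. Running many pings at once could also skew the timings, so the job count would need to stay small.

```shell
# Hypothetical sketch: run a per-site command for many hosts, several at a time.
# Reads one host per line on stdin; xargs -P is a GNU/BSD extension.
run_parallel() {   # usage: run_parallel <jobs> <command> [args...] < hosts.txt
	local jobs="$1"; shift
	xargs -P "$jobs" -n 1 "$@"
}

# "echo probing" is a stand-in for the real measurement command.
printf '%s\n' example.com example.org example.net | run_parallel 3 echo probing
```

Output order is not guaranteed with this approach, so each row would need to carry its own rank (as the existing CSV rows do) and be sorted afterwards.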

The geo-location database I used isn’t 100% accurate; that is virtually impossible. You can see this on the Average Response Time over Distance graph. Many sites are so far above the diagonal that the measured time would only be possible by breaking the speed of light, an indication that they are located much closer than their IP suggests. This is likely so common because IPv4 is exhausted and organizations are trading the limited number of IPv4 addresses they have access to. It doesn’t matter too much, due to the size of my data set. In future I could remove the outliers to produce cleaner results.
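
The speed-of-light argument can be made concrete: a round trip over d kilometres cannot take less than 2d/c. The sketch below uses the vacuum speed of light as a hard lower bound (real fibre paths are slower), and both function names are hypothetical.

```shell
# Hypothetical helper: smallest physically possible round-trip time, in ms,
# for a host the given number of kilometres away (speed of light in a vacuum).
min_rtt_ms() {
	awk -v km="$1" 'BEGIN { printf "%.3f\n", 2 * km / 299792.458 * 1000 }'
}

# Succeeds when a measured round trip beats the bound for its estimated distance,
# meaning the estimated location must be wrong.
is_impossible() {   # usage: is_impossible <distance_km> <measured_rtt_ms>
	awk -v km="$1" -v rtt="$2" 'BEGIN { exit !(rtt < 2 * km / 299792.458 * 1000) }'
}

min_rtt_ms 1000
```

A site geolocated 5,000km away that answers in 5ms, for example, fails this check and would be a candidate outlier to remove.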

Speed

It is clear to see that sites that serve both IPv4 and IPv6 traffic are, on average, significantly faster than those that don’t (4 times faster on average). Every single graph I have produced shows this simple fact. I feel that this can be attributed to several factors:

  1. Cloudflare and Google account for ~65% of the sites that serve both IPv4 and IPv6, and both have global CDNs to ensure a fast and reliable connection. There are many more IPv4 sites that neither is associated with. Instead of connecting to a distant site, you connect to a nearby CDN node that acts as a proxy, reducing the time it takes to connect.
  2. IPv6 isn’t widely deployed yet, with only 6% of sites serving it. Those 6% are usually the bigger websites; smaller sites may be stuck behind a single IPv4 address, in a data-center that doesn’t support IPv6.
  3. Sites that serve both IPv4 and IPv6 traffic are concentrated in North America, Europe and East Asia – regions that are geographically closer and better connected than the rest of the world. They are practically absent from regions such as Africa, the Caribbean and the Middle East. There is also little deployment in South America, West Asia and Oceania.

However, despite the fact that the average response time for IPv6 is significantly lower than for IPv4, it’s unlikely you’ll see any speed increase switching between IPv4 and IPv6 on a host that supports both. IPv6 is faster because the hosts that serve both are well connected with fast response times, regardless of protocol.

Deployment

Despite the best efforts of organizations such as worldipv6launch.org, Cloudflare, and Google, IPv6 access is an afterthought. Despite claims of 500% growth since 2012, it’s 2015 and only 4.5% of sites support IPv6. As we go into the future, IPv6 deployment will surely grow as larger populations and the Internet of Things strain the exhausted IPv4 pool even further. Until IPv6 is widespread, anyone with only an IPv6 address will be unable to connect directly to IPv4-only hosts without the aid of a tunnel.

IPv6 isn’t evenly geographically distributed, compared with IPv4. If you’re in Africa, the Caribbean, or the Middle East, virtually no sites support IPv6. This suggests to me that the infrastructure required to support IPv6 just isn’t there.

Rank

Bigger sites are more likely than smaller sites to support both IPv4 and IPv6 and to have a fast connection over each. The further down the ranking you go, the slower the sites are to respond, on average.


Appendices

Appendix A – ping.sh
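#!/bin/bash
# Ping each site over IPv4 and IPv6 and record one CSV row per protocol.
version=(4 6)   # IP versions to test
timeout=10      # seconds before giving up on a host (ping -w)
attempt=50      # echo requests per host and protocol (ping -c)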

input="/home/mc21g14/ipv4-vs-ipv6/top-1m.csv"
range=(1 100000)
output=`date +/home/mc21g14/ipv4-vs-ipv6/outputs/%y%m%d-%H%M.csv`

sed -n "${range[0]},${range[1]}p" $input | while IFS=$',' read -r -a host; do
	[ "${host[0]}" = "1" ] && echo -e "pos,host,ip,v,min,max,ave,sent,got,lost"

	for ver in "${version[@]}"; do
		[ $ver = "4" ] && command="ping" || command="ping$ver"

		$command -c $attempt -i 0.2 -q -w $timeout "${host[1]}" | gawk -v pos="${host[0]}" -v \
      host="${host[1]}" -v version=$ver '
			BEGIN {
				OFS  = ","
				FS   = " "
				sent = got = lost = 0
			}
			/PING .+ \([0-9.:]+\) [0-9]+\([0-9]+\) bytes of data./ {
				ip   = substr($3, 2, length($3) - 2)
			}
			/PING .+\([0-9a-zA-Z\-_.:]+\) [0-9]+ data bytes/ {
				ip   = substr($2, index($2, "(") + 1, length($2) - index($2, "(") - 1)
			}
			/[0-9]+ packets transmitted, [0-9]+ received, [0-9]+% packet loss, time [0-9]+[a-z]+/{
				sent = $1
				got  = $4
				lost = $1 - $4
			}
			/rtt min\/avg\/max\/mdev = [0-9./]+ [a-z]+/ {
				# ping reports min/avg/max/mdev in that order
				split($4, temp, "/")
				min  = temp[1]
				ave  = temp[2]
				max  = temp[3]
			}
			END {
				print pos, host, ip, version, min, max, ave, sent, got, lost
			}
		'
	done
done 2> /dev/null | tee $output

Appendix B – combine.sh
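counter=0
tail -n +2 output.csv | while read -r line; do
	if [ $counter -eq 0 ]; then
		# IPv4 row: print without a newline so the IPv6 row is appended to it
		printf '%s,' "$line"
		counter=1
	else
		# IPv6 row: finish the combined line
		printf '%s\n' "$line"
		counter=0
	fi
done | tr -d "\t " | tee combined.csv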

Appendix C – lookup.sh
host="http://127.0.0.1:8080/csv/"

while IFS=$'\t,' read -r -a array; do
	# Fields 2 and 12 hold the IPv4 and IPv6 addresses of the combined row
	rip4=`wget -qO- $host${array[2]} | tr -d '\n\r'`
	rip6=`wget -qO- $host${array[12]} | tr -d '\n\r'`

	# Pad failed lookups so every row keeps the same number of columns
	[[ $rip4 == "" ]] && rip4=",,,,,,,,,,"
	[[ $rip6 == "" ]] && rip6=",,,,,,,,,,"

	echo "${array[1]},$rip4,$rip6"
done < combined.csv | tr -d "\t " | tee geo.csv

Appendix D – Data for top 100 sites
Rank,Host,IPv4,Min4,Ave4,Max4,Sent4,Got4,Lost4,IPv6,Min6,Ave6,Max6,Sent6,Got6,Lost6
1,google.com,216.58.210.46,4.64,5.578,6.173,50,50,0,lhr14s23-in-x0e.1e100.net,4.388,5.032,6.393,50,49,1
2,facebook.com,31.13.74.1,90.289,90.888,92.635,50,48,2,edge-star6-shv-01-ord1.facebook.com,93.663,94.24,94.955,50,46,4
3,youtube.com,216.58.210.46,4.738,5.27,5.849,50,49,1,lhr14s23-in-x0e.1e100.net,4.46,4.93,5.715,50,48,2
4,baidu.com,220.181.57.217,,,,49,0,49,,,,,0,0,0
5,yahoo.com,206.190.36.45,152.285,152.97,153.387,49,15,34,ir1.fp.vip.bf1.yahoo.com,92.439,92.828,93.196,50,49,1
6,amazon.com,176.32.103.205,,,,49,0,49,,,,,0,0,0
7,wikipedia.org,91.198.174.192,12.09,12.658,13.886,50,49,1,text-lb.esams.wikimedia.org,12.376,12.843,13.247,50,49,1
8,qq.com,125.39.240.113,237.524,238.328,239.603,50,48,2,,,,,0,0,0
9,twitter.com,185.45.5.43,4.484,5.529,7.486,50,49,1,,,,,0,0,0
10,google.co.in,216.58.210.35,4.702,5.485,8.106,50,49,1,lhr14s23-in-x03.1e100.net,4.332,4.893,5.687,50,49,1
11,taobao.com,110.75.115.70,185.638,186.337,186.899,49,16,33,,,,,0,0,0
12,live.com,65.55.206.154,,,,49,0,49,,,,,0,0,0
13,sina.com.cn,202.108.33.60,,,,49,0,49,,,,,0,0,0
14,linkedin.com,108.174.2.129,78.524,78.975,79.857,50,48,2,,,,,0,0,0
15,yahoo.co.jp,182.22.59.229,278.256,278.7,279.546,49,47,2,,,,,0,0,0
16,weibo.com,114.134.80.162,257.736,258.437,260.118,50,49,1,,,,,0,0,0
17,ebay.com,66.135.216.190,,,,49,0,49,,,,,0,0,0
18,google.co.jp,216.58.210.35,4.114,5.044,5.676,50,49,1,lhr14s23-in-x03.1e100.net,4.158,4.766,5.519,50,49,1
19,yandex.ru,5.255.255.5,58.047,58.636,59.634,50,50,0,yandex.ru,69.847,70.568,71.145,50,50,0
20,bing.com,204.79.197.200,4.524,5.338,7.763,50,49,1,,,,,0,0,0
21,vk.com,87.240.131.99,55.904,56.806,58.016,50,50,0,2a00:bdc0:3:103:1:0:403:900,47.489,48.606,53.783,50,50,0
22,hao123.com,180.149.132.19,164.102,165.279,166.886,50,48,2,,,,,0,0,0
23,google.de,216.58.210.35,4.626,5.635,6.803,50,50,0,lhr14s23-in-x03.1e100.net,3.954,4.929,5.869,50,50,0
24,t.co,185.45.5.47,4.96,5.774,6.494,50,49,1,,,,,0,0,0
25,instagram.com,54.164.44.207,,,,49,0,49,,,,,0,0,0
26,msn.com,23.101.196.141,,,,49,0,49,,,,,0,0,0
27,amazon.co.jp,54.240.248.0,,,,49,0,49,,,,,0,0,0
28,google.co.uk,216.58.210.35,4.141,5.171,5.866,50,50,0,lhr14s23-in-x03.1e100.net,4.201,4.741,6.089,50,50,0
29,pinterest.com,23.235.37.84,4.595,5.526,6.995,50,50,0,,,,,0,0,0
30,tmall.com,110.75.114.89,213.58,214.453,216.155,50,48,2,,,,,0,0,0
31,wordpress.com,192.0.78.9,4.952,5.768,8.058,50,49,1,,,,,0,0,0
32,ask.com,66.235.120.127,77.744,78.512,79.456,50,50,0,,,,,0,0,0
33,reddit.com,198.41.208.139,4.362,5.026,5.534,50,50,0,,,,,0,0,0
34,blogspot.com,216.58.210.41,4.19,4.852,5.468,50,50,0,lhr14s23-in-x09.1e100.net,3.894,4.328,5.125,50,49,1
35,paypal.com,66.211.169.66,,,,49,0,49,,,,,0,0,0
36,google.fr,216.58.210.35,4.119,4.729,5.765,50,50,0,lhr14s23-in-x03.1e100.net,3.905,4.399,5.584,50,50,0
37,mail.ru,217.69.139.200,55.976,56.271,57.219,50,50,0,,,,,0,0,0
38,apple.com,17.142.160.59,,,,49,0,49,,,,,0,0,0
39,google.com.br,216.58.210.35,4.057,4.334,5.053,50,50,0,lhr14s23-in-x03.1e100.net,3.82,4.326,5.371,50,50,0
40,onclickads.net,78.140.191.89,11.053,11.668,12.453,50,48,2,,,,,0,0,0
41,tumblr.com,66.6.41.30,73.377,74.212,75.04,50,49,1,,,,,0,0,0
42,aliexpress.com,205.204.101.160,148.093,148.093,148.093,1,1,0,,,,,0,0,0
43,microsoft.com,134.170.185.46,,,,49,0,49,,,,,0,0,0
44,google.ru,216.58.210.35,4.069,4.444,5.589,50,50,0,lhr14s23-in-x03.1e100.net,3.929,4.538,6.043,50,50,0
45,sohu.com,220.181.90.240,172.701,175.674,177.063,50,31,19,,,,,0,0,0
46,imgur.com,185.31.18.193,4.223,4.863,5.858,50,50,0,,,,,0,0,0
47,xvideos.com,141.0.174.37,10.857,11.618,12.892,50,50,0,,,,,0,0,0
48,google.it,216.58.210.35,4.212,4.863,5.501,50,49,1,lhr14s23-in-x03.1e100.net,3.933,4.776,6.495,50,49,1
49,imdb.com,207.171.166.22,78.004,78.711,79.662,50,50,0,,,,,0,0,0
50,google.es,216.58.210.35,4.338,4.907,5.715,50,50,0,lhr14s23-in-x03.1e100.net,3.963,4.543,5.461,50,50,0
51,netflix.com,50.19.210.42,,,,49,0,49,,,,,0,0,0
52,amazon.de,178.236.6.250,,,,49,0,49,,,,,0,0,0
53,gmw.cn,111.202.12.1,,,,49,0,49,,,,,0,0,0
54,fc2.com,54.148.76.135,149.872,150.924,151.974,50,49,1,,,,,0,0,0
55,360.cn,106.120.167.66,176.302,177.604,181.387,50,50,0,,,,,0,0,0
56,alibaba.com,198.11.132.23,144.695,145.18,146.226,50,49,1,,,,,0,0,0
57,stackoverflow.com,198.252.206.16,83.033,83.857,84.89,50,50,0,,,,,0,0,0
58,go.com,199.181.131.249,136.686,137.457,138.746,50,49,1,,,,,0,0,0
59,google.com.mx,216.58.210.35,4.471,5.263,5.97,50,49,1,lhr14s23-in-x03.1e100.net,4.093,4.787,5.69,50,50,0
60,ok.ru,217.20.156.159,54.942,55.848,56.438,50,50,0,,,,,0,0,0
61,google.ca,216.58.210.35,4.637,5.118,5.957,50,50,0,lhr14s23-in-x03.1e100.net,3.903,4.485,5.692,50,49,1
62,google.com.hk,216.58.210.35,4.199,5.191,6.629,50,49,1,lhr14s23-in-x03.1e100.net,4.421,4.997,6.809,50,48,2
63,tianya.cn,124.225.65.154,242.886,247.381,257.391,50,47,3,,,,,0,0,0
64,amazon.in,54.239.34.40,,,,49,0,49,,,,,0,0,0
65,amazon.co.uk,178.236.7.220,,,,49,0,49,,,,,0,0,0
66,craigslist.org,208.82.238.129,142.154,142.777,143.345,50,49,1,,,,,0,0,0
67,rakuten.co.jp,133.237.48.124,266.547,267.284,272.682,50,48,2,,,,,0,0,0
68,pornhub.com,31.192.117.132,,,,49,0,49,,,,,0,0,0
69,naver.com,202.179.177.22,,,,49,0,49,,,,,0,0,0
70,blogger.com,216.58.210.41,4.341,5.066,6.155,50,49,1,lhr14s23-in-x09.1e100.net,4.289,4.915,5.973,50,50,0
71,diply.com,184.27.136.120,4.478,4.994,6.066,50,50,0,,,,,0,0,0
72,xhamster.com,88.208.29.24,10.096,11.291,16.545,50,50,0,2a02:b48:4000:d::1,9.613,10.611,18.378,50,50,0
73,google.com.tr,216.58.210.35,4.372,5.088,11.501,50,49,1,lhr14s23-in-x03.1e100.net,3.909,4.617,6.092,50,48,2
74,flipkart.com,163.53.78.58,145.724,146.658,148.342,50,50,0,2001:df0:23e:9002::15,116.853,117.855,118.647,50,49,1
75,espn.go.com,68.71.212.186,137.183,138.1,139.633,50,49,1,,,,,0,0,0
76,googleadservices.com,216.58.210.34,4.036,4.704,5.672,50,50,0,,,,,0,0,0
77,soso.com,106.120.151.169,192.633,193.531,194.7,50,19,31,,,,,0,0,0
78,outbrain.com,66.225.223.5,,,,49,0,49,,,,,0,0,0
79,cnn.com,157.166.226.25,88.462,88.92,89.727,50,50,0,,,,,0,0,0
80,nicovideo.jp,202.248.110.243,,,,49,0,49,,,,,0,0,0
81,google.co.id,216.58.210.35,4.095,4.876,5.809,50,50,0,lhr14s23-in-x03.1e100.net,3.92,4.485,5.826,50,50,0
82,dropbox.com,108.160.172.200,152.031,152.031,152.031,1,1,0,,,,,0,0,0
83,googleusercontent.com,,,,,0,0,0,,,,,0,0,0
84,github.com,192.30.252.131,78.465,78.896,79.884,50,50,0,,,,,0,0,0
85,bongacams.com,64.210.142.13,19.519,20.284,21.492,50,50,0,,,,,0,0,0
86,kat.cr,78.138.99.144,15.043,16.015,17.915,50,46,4,,,,,0,0,0
87,xinhuanet.com,202.108.119.194,,,,49,0,49,,,,,0,0,0
88,google.co.kr,216.58.210.35,4.037,4.783,5.363,50,50,0,wl-in-x5e.1e100.net,10.99,11.717,12.582,50,50,0
89,bbc.co.uk,212.58.244.18,4.151,4.966,5.906,50,50,0,,,,,0,0,0
90,ebay.de,66.211.181.235,,,,49,0,49,,,,,0,0,0
91,google.pl,216.58.210.35,4.088,5.004,7.228,50,50,0,lhr14s23-in-x03.1e100.net,3.868,4.388,6.364,50,50,0
92,google.com.au,216.58.210.35,4.096,4.448,5.251,50,50,0,lhr14s23-in-x03.1e100.net,3.941,4.69,5.181,50,50,0
93,pixnet.net,103.23.108.107,286.382,287.107,287.813,50,46,4,,,,,0,0,0
94,popads.net,184.154.76.140,92.731,93.419,94.268,50,50,0,,,,,0,0,0
95,ebay.co.uk,66.211.181.235,,,,49,0,49,,,,,0,0,0
96,sogou.com,106.120.188.46,196.929,197.989,199.203,48,17,31,,,,,0,0,0
97,dailymotion.com,195.8.215.137,12.132,13.163,14.873,50,49,1,,,,,0,0,0
98,adcash.com,104.154.36.143,105.894,106.606,107.661,50,50,0,,,,,0,0,0
99,adobe.com,192.150.16.117,118.065,119.054,120.162,50,49,1,,,,,0,0,0
100,nytimes.com,170.149.159.130,147.227,147.976,149.528,50,50,0,,,,,0,0,0

Click to download the full dataset of the top 100,000 sites.


