Measuring IPv4 vs. IPv6 Performance

Oct. 19, 2015 in Bash, Networks, Text, University, Web

I produced a series of bash scripts to automate the process of pinging the list of websites. I chose bash because it is trivial to pipe the output from ping into other command-line programs such as sed, gawk, and wget. As it was completely automated, I decided to start early and just let it run. In total I pinged the top 100,000 websites up to 100 times each (up to 50 times over IPv4 and 50 over IPv6) using ping.sh (see Appendix A), recording useful statistics.

The script is very simple: a while loop iterates over each site, and ping/ping6 is piped into gawk to process the result. Gawk is very good at processing this sort of data, and more than fast enough to perform the task. The result was output to a large (11.1MiB) CSV file, with the IPv4 and IPv6 results for each site on separate lines.
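The parsing step can be sketched on its own, without touching the network. This is a minimal illustration rather than the script from Appendix A: parse_ping is a hypothetical helper, plain awk stands in for gawk, and the canned summary lines (including the mdev and time values) are made up to match the Linux ping format.

```shell
#!/usr/bin/env bash
# Sketch: turn the summary lines of Linux (iputils) ping output into one CSV row.
# parse_ping is a hypothetical helper, not part of the original script.
parse_ping() {
	awk -v host="$1" '
		BEGIN { OFS = ","; sent = got = 0 }
		# Summary line: "<sent> packets transmitted, <got> received, ..."
		/packets transmitted/ { sent = $1; got = $4 }
		# Timing line: "rtt min/avg/max/mdev = a/b/c/d ms"
		/^rtt min\/avg\/max\/mdev/ { split($4, t, "/"); min = t[1]; avg = t[2]; max = t[3] }
		END { print host, min, avg, max, sent, got, sent - got }
	'
}

# Usage with canned (illustrative) output, so no network access is needed:
printf '%s\n' \
	'50 packets transmitted, 49 received, 2% packet loss, time 9874ms' \
	'rtt min/avg/max/mdev = 4.640/5.578/6.173/0.312 ms' |
	parse_ping example.com
```

Keeping the parser separate from the ping call makes it easy to check against saved output before starting a week-long run.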

As part of creating this script, I assumed that any site that was online would be able to respond within 10 seconds. If not, my script would time out and assume the site was down. I feel this is a reasonable assumption, as any remotely popular site should respond quickly unless it’s currently being DDoSed. Another assumption I made is that any site I scanned would be able to withstand 5 requests per second. Even a Raspberry Pi is capable of serving 43 static pages per second¹. As I sent a maximum of 50 requests per protocol, the brief period of slightly increased load should be negligible for any of the sites I scanned.

In hindsight, I would have combined both IPv4 and IPv6 into a single line from the start, as manipulating the data in Excel is significantly easier if it is all on a single line. By that time I had already scanned the top 100,000 sites, so simply regathering the data was not practical. To fix this, I created combine.sh (see Appendix B), which simply echoes the IPv4 line without a newline, then the IPv6 line with one. This is the reason I have some duplicated columns in my combined output. These are removed in Appendix D.
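The same pairing can also be sketched with paste, which joins consecutive input lines. This is an alternative to combine.sh, not the script that was actually used; combine_pairs is a hypothetical name and the sample rows use documentation addresses rather than measured data.

```shell
#!/usr/bin/env bash
# Sketch: merge each IPv4 row with the IPv6 row that follows it.
# combine_pairs is a hypothetical helper; "paste -d, - -" reads two lines
# from standard input per output line and joins them with a comma.
combine_pairs() {
	paste -d, - -
}

# Illustrative rows (documentation addresses, not measured data):
printf '%s\n' \
	'1,example.com,192.0.2.1,4' \
	'1,example.com,2001:db8::1,6' \
	'2,example.org,192.0.2.2,4' \
	'2,example.org,,6' |
	combine_pairs
```

Like combine.sh, this relies on the input strictly alternating between IPv4 and IPv6 rows, so the CSV header has to be removed first (for example with tail -n +2).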

Whilst looking through the IPv6 column I noticed a very common prefix: “2400:cb00:”. After some research I discovered that this prefix belongs to Cloudflare². Using the prefixes I found on whatmyip.co³, I created a table mapping the hosting company to the number of sites it hosts. The results are impressive.
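Spotting common prefixes does not have to be done by eye. The sketch below counts addresses by their first two groups. count_prefixes is a hypothetical helper, and the assumption that two groups are enough to identify a host is mine; the sample addresses are only there to exercise the code.

```shell
#!/usr/bin/env bash
# Sketch: count how many IPv6 addresses share each leading pair of groups.
# count_prefixes is a hypothetical helper; it reads one address per line.
# Lines without colons (for example hostnames reported by ping6) are skipped.
count_prefixes() {
	awk -F: 'NF > 2 { n[$1 ":" $2 ":"]++ }
		END { for (p in n) print n[p] "," p }' | sort -t, -k1,1nr -k2
}

# Illustrative input: two addresses under 2400:cb00: and one under another prefix.
printf '%s\n' 2400:cb00:2048:1::1 2400:cb00:2048:1::2 2a00:1450:4009:80b::200e |
	count_prefixes
```

The output is a count,prefix list sorted by count, which can then be matched against a table of known hosting prefixes.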

I decided to look up the geolocation of each website. Looking around for a convenient database or API, I stumbled upon freegeoip.net⁴. It allows you to easily gather geolocation information for a specified IP in CSV form, perfect for my coursework. To retrieve this information using lookup.sh (see Appendix C), I self-hosted my own instance, then used wget and a simple while loop to request the location information for each site and write it to a file. I decided to record all the information given, both to keep the script simple and to ensure I didn’t need to re-run the script.

Once the data was collected, it was time to head to Excel to analyze it and draw conclusions. Having a large dataset let me create detailed graphs and draw well-supported conclusions, but it was tedious to work with in Excel. Certain formulas, such as the ones used to create the Average Response Time per Country over Distance graph, crashed Excel numerous times, and it even ran out of memory every now and then. In future, when dealing with similarly sized datasets, I would need to look into other graphing tools.

Results

I decided to plot all 100,000 points as an X-Y scatter. In this, and subsequent graphs, IPv4 is blue and IPv6 is red. Immediately I noticed rather obvious bands of pings, which are shown in the histogram below. There are large peaks in the graph above. It’s interesting to note that despite the lower adoption of IPv6, its initial peak is half the height of the IPv4 peak. Past that, the frequency is low, indicating lower IPv6 response times.

Thanks to CDNs and local sites, the top 1000 sites are concentrated around the 4-6ms mark, with both averages trending slowly upwards. IPv6 is always significantly lower than IPv4 in the graph above. At a glance, you can see regions such as Africa, the Caribbean, and the Middle East without any IPv6 deployment. Sites are concentrated around the U.S.A., Europe and East Asia, with barren areas in between.

Average response times for IPv4-only and IPv6-only sites are roughly the same at about 104ms. The average min and max are largely identical as well; nothing surprising. On the other hand, the average for sites running both IPv4 and IPv6 is very low in comparison: only 25ms compared with over 100ms! Now the question is, why are sites running both IPv4 and IPv6 significantly faster?

A vast majority (90%) of the Internet is IPv4 only, with only 4.5% of sites providing both. In fact, more sites provide neither than both! It’s impossible to not have an IPv4 address and be truly connected.
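Shares like these can be computed outside Excel. The sketch below classifies each row of a tab-separated table laid out like Appendix D. It assumes that a site "provides" a protocol when at least one reply was received (Got4 or Got6 above zero), which is my reading rather than a stated rule; classify_sites and row are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: share of sites that answered over IPv4, IPv6, both or neither.
# Assumption: a protocol counts as provided when Got4/Got6 is above zero.
classify_sites() {
	awk -F'\t' '
		NR == 1 { next }                         # skip the header row
		{
			v4 = ($8 + 0 > 0); v6 = ($15 + 0 > 0)  # Got4 and Got6 columns
			if (v4 && v6) kind = "both"
			else if (v4)  kind = "ipv4-only"
			else if (v6)  kind = "ipv6-only"
			else          kind = "neither"
			count[kind]++
		}
		END {
			for (k in count) printf "%s,%d,%.1f%%\n", k, count[k], 100 * count[k] / (NR - 1)
		}
	' | LC_ALL=C sort
}

# Hypothetical helper: print its arguments as one tab-separated row.
row() { printf '%s' "$1"; shift; for f; do printf '\t%s' "$f"; done; printf '\n'; }

# Usage with four rows copied from Appendix D:
{
	row Rank Host IPv4 Min4 Ave4 Max4 Sent4 Got4 Lost4 IPv6 Min6 Ave6 Max6 Sent6 Got6 Lost6
	row 1 google.com 216.58.210.46 4.64 5.578 6.173 50 50 0 lhr14s23-in-x0e.1e100.net 4.388 5.032 6.393 50 49 1
	row 4 baidu.com 220.181.57.217 '' '' '' 49 0 49 '' '' '' '' 0 0 0
	row 7 wikipedia.org 91.198.174.192 12.09 12.658 13.886 50 49 1 text-lb.esams.wikimedia.org 12.376 12.843 13.247 50 49 1
	row 9 twitter.com 185.45.5.43 4.484 5.529 7.486 50 49 1 '' '' '' '' 0 0 0
} | classify_sites
```

Each output line is category,count,share. A site that blocks ICMP lands in "neither" under this rule, which is one reason that category is larger than might be expected.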

A large number of the top 100,000 websites are either blocking ICMP echo requests (ping) or are simply offline. Alternatively, they could be listening only on a specific sub-domain. I didn’t check for this.

One reason sites that provide both IPv4 and IPv6 are faster is that 65% of them are behind Cloudflare or Google. Both have worldwide CDNs, and Cloudflare provides a free IPv6 gateway, allowing IPv4-only sites to be reached over IPv6.

The U.S.A. is the world leader in number of hosted sites with 43% of the market. Comparatively, every other country is trailing behind with Canada at 9%, Germany at 6%, and Hong Kong at 6%. This is despite the existence of global CDNs.

To test the geo-location accuracy, I plotted estimated distance over average response time. Sites above the diagonal are likely closer than their IP suggests. Sites below the diagonal simply have a poor connection. Besides a colourful graph, this shows the grouping of sites in different countries, explaining why there are so many peaks in the Histogram. There is a minimum amount of time it takes to connect to distant hosts.

Discussion

Limitations

I only ran the script once, and as the script took easily a week to analyze the top 100,000 sites, chances are the sites at the top had changed since the start. I could have gotten around this by parallelizing the script, or running it multiple times on a smaller set and taking an average. Parallelizing seemed too complex for the task at hand, and I didn’t really consider running it multiple times before it had almost finished analyzing. As I didn’t want to discard all the data, I decided to go ahead with the data I had.
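Parallelizing need not be complicated. As a rough sketch, assuming an xargs that supports -P (GNU and BSD do; it is not part of POSIX), several hosts can be probed at once. probe_all is a hypothetical helper and the commented command line is only an example, not something that was run for this post.

```shell
#!/usr/bin/env bash
# Sketch: run one command per host with several of them in flight at once.
# probe_all is a hypothetical helper: the first argument is the number of
# parallel jobs, the remaining arguments are the command to run; each host
# read from standard input is appended to that command as its last argument.
probe_all() {
	n="$1"; shift
	xargs -P "$n" -n 1 "$@"
}

# Example (needs network access, so it is left commented out):
# cut -d, -f2 top-1m.csv | head -n 1000 | probe_all 8 ping -c 5 -q -w 10

# Harmless demonstration with echo standing in for the probe:
printf '%s\n' example.com example.org example.net | probe_all 2 echo
```

Output from parallel jobs can arrive in any order and may interleave, so each probe should either emit a single line or write to its own file. Running probes in parallel also changes the load on the local link, which could affect the measured times.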

The geo-location database I used isn’t 100% accurate; that is virtually impossible. You can see this on the Average Response Time over Distance graph. Many sites are so far above the diagonal that their response time would only be possible by breaking the speed of light, an indication that they are located much closer than their IP suggests. This is likely so common because IPv4 is exhausted and organizations are trading the limited number of IPv4 addresses that they have access to. This doesn’t matter too much, due to the size of my data set. In future I could remove the outliers to produce cleaner results.
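The speed-of-light check can be made concrete. A round trip over a distance d cannot take less than 2d / c, so any site whose measured time is below that bound for its estimated distance must be closer than the database claims. The sketch below uses the speed of light in a vacuum (299,792.458 km/s) as a hard floor; signals in fibre are slower, so the real minimum is higher. min_rtt_ms is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: lower bound on round-trip time for a given one-way distance.
# min_rtt_ms is a hypothetical helper; argument is the distance in km,
# output is the bound in milliseconds, using c in a vacuum (km/s).
min_rtt_ms() {
	awk -v km="$1" 'BEGIN { c = 299792.458; printf "%.2f\n", 2 * km / c * 1000 }'
}

# Usage: bound for a host estimated to be 10,000 km away.
min_rtt_ms 10000
```

For 10,000 km the bound is about 67ms, so a 5ms response from a host supposedly that far away can only mean the geolocation is wrong.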

Speed

It is clear that sites that serve both IPv4 and IPv6 traffic are significantly faster than those that don’t (about 4 times faster on average). Every single graph I have produced shows this simple fact. I feel that this can be attributed to several factors:

Cloudflare and Google account for ~65% of all IPv6 traffic and both have global CDNs to ensure a fast and reliable connection. There are many more IPv4 sites that neither is associated with. Instead of connecting to a distant site, you connect to a CDN that acts as a proxy, reducing the time it takes to connect.
IPv6 isn’t widely deployed yet, with only 6% of sites serving it. Those 6% are usually the bigger websites; smaller sites may be stuck behind a single IPv4 address, in a data-center that doesn’t support IPv6.
Sites that serve both IPv4 and IPv6 traffic are concentrated in North America, Europe and East Asia, regions that are geographically closer and better connected than the rest of the world. They are practically absent from regions such as Africa, the Caribbean and the Middle East. There is also little deployment in South America, West Asia and Oceania.

However, despite the fact that the average response time for IPv6 is significantly faster than for IPv4, it’s unlikely you’ll see any speed increase switching between IPv4 and IPv6 on a host that supports both. IPv6 is faster because the hosts that serve both are well connected with fast response times, regardless of protocol.
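One way to check this claim is to compare the two protocols on the same host instead of comparing group averages. The sketch below takes a tab-separated table laid out like Appendix D and reports the mean of Ave6 minus Ave4 over hosts that answered on both. paired_diff and row are hypothetical helpers, and this is an illustration rather than an analysis that was part of the original work.

```shell
#!/usr/bin/env bash
# Sketch: mean per-host difference between IPv6 and IPv4 average response time.
# Only rows with both Ave4 (column 5) and Ave6 (column 12) filled in are used.
# Output: number of hosts compared, then the mean of (Ave6 - Ave4) in ms.
paired_diff() {
	awk -F'\t' '
		NR > 1 && $5 != "" && $12 != "" { sum += $12 - $5; n++ }
		END { if (n) printf "%d,%.3f\n", n, sum / n }
	'
}

# Hypothetical helper: print its arguments as one tab-separated row.
row() { printf '%s' "$1"; shift; for f; do printf '\t%s' "$f"; done; printf '\n'; }

# Usage with four rows copied from Appendix D:
{
	row Rank Host IPv4 Min4 Ave4 Max4 Sent4 Got4 Lost4 IPv6 Min6 Ave6 Max6 Sent6 Got6 Lost6
	row 1 google.com 216.58.210.46 4.64 5.578 6.173 50 50 0 lhr14s23-in-x0e.1e100.net 4.388 5.032 6.393 50 49 1
	row 7 wikipedia.org 91.198.174.192 12.09 12.658 13.886 50 49 1 text-lb.esams.wikimedia.org 12.376 12.843 13.247 50 49 1
	row 9 twitter.com 185.45.5.43 4.484 5.529 7.486 50 49 1 '' '' '' '' 0 0 0
	row 19 yandex.ru 5.255.255.5 58.047 58.636 59.634 50 50 0 yandex.ru 69.847 70.568 71.145 50 50 0
} | paired_diff
```

A mean difference close to zero across the dual-stack hosts would support the point that the protocol itself is not what makes these sites fast.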

Deployment

Despite the best efforts of organizations such as worldipv6launch.org, Cloudflare, and Google, IPv6 access is an afterthought. Despite claims of 500% growth since 2012, it’s 2015 and only 4.5% of sites support IPv6. As we go into the future, IPv6 deployment will surely grow as larger populations and the Internet of Things strain the exhausted IPv4 pool even further. Until IPv6 is widespread, anyone with only an IPv6 address will be unable to connect directly to IPv4-only hosts without the aid of a tunnel.

IPv6 isn’t evenly geographically distributed, compared with IPv4. If you’re in Africa, the Caribbean, or the Middle East, virtually no sites support IPv6. This suggests to me that the infrastructure required to support IPv6 just isn’t there.

Rank

Bigger sites are more likely to support and have a fast IPv4 and IPv6 connection than smaller sites. As you go through the different sites, the further down you get, the slower the site is to respond, on average.

Appendices

Appendix A – ping.sh

version=(4 6)
timeout=10
attempt=50

input="/home/mc21g14/ipv4-vs-ipv6/top-1m.csv"
range=(1 100000)
output=$(date +/home/mc21g14/ipv4-vs-ipv6/outputs/%y%m%d-%H%M.csv)

sed -n "${range[0]},${range[1]}p" "$input" | while IFS=',' read -r -a host; do
	[ "${host[0]}" = "1" ] && echo -e "pos,host,ip,v,min,max,ave,sent,got,lost"

	for ver in "${version[@]}"; do
		[ $ver = "4" ] && command="ping" || command="ping$ver"

		"$command" -c "$attempt" -i 0.2 -q -w "$timeout" "${host[1]}" | gawk -v pos="${host[0]}" \
			-v host="${host[1]}" -v version="$ver" '
			BEGIN {
				OFS  = ","
				FS   = " "
				sent = got = lost = 0
			}
			/PING .+ \([0-9.:]+\) [0-9]+\([0-9]+\) bytes of data./ {
				ip   = substr($3, 2, length($3) - 2)
			}
			/PING .+\([0-9a-zA-Z\-_.:]+\) [0-9]+ data bytes/ {
				ip   = substr($2, index($2, "(") + 1, length($2) - index($2, "(") - 1)
			}
			/[0-9]+ packets transmitted, [0-9]+ received, [0-9]+% packet loss, time [0-9]+[a-z]+/{
				sent = $1
				got  = $4
				lost = $1 - $4
			}
			/rtt min\/avg\/max\/mdev = [0-9./]+ [a-z]+/ {
				# ping reports min/avg/max/mdev in that order
				split($4, temp, "/")
				min  = temp[1]
				ave  = temp[2]
				max  = temp[3]
			}
			END {
				print pos, host, ip, version, min, max, ave, sent, got, lost
			}
		'
	done
done 2> /dev/null | tee "$output"

Appendix B – combine.sh

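counter=0
# Skip the header, then join each IPv4 row with the IPv6 row that follows it.
tail -n +2 output.csv | while read -r line; do
	if [ "$counter" -eq 0 ]; then
		# Pass the row as an argument so "%" or "\" in it is not treated as a format.
		printf '%s,' "$line"
		counter=1
	else
		printf '%s\n' "$line"
		counter=0
	fi
done | tr -d "\t " | tee combined.csv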

Appendix C – lookup.sh

host="http://127.0.0.1:8080/csv/"

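# Look up the IPv4 (column 3) and IPv6 (column 13) address of each combined row.
while IFS=$'\t,' read -r -a array; do
	rip4=$(wget -qO- "$host${array[2]}" | tr -d '\n\r')
	rip6=$(wget -qO- "$host${array[12]}" | tr -d '\n\r')

	# Pad failed lookups with empty fields so every row has the same columns.
	[[ $rip4 == "" ]] && rip4=",,,,,,,,,,"
	[[ $rip6 == "" ]] && rip6=",,,,,,,,,,"

	echo "${array[1]},$rip4,$rip6"
done < combined.csv | tr -d "\t " | tee geo.csv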

Appendix D – Data for top 100 sites

Rank	Host	IPv4	Min4	Ave4	Max4	Sent4	Got4	Lost4	IPv6	Min6	Ave6	Max6	Sent6	Got6	Lost6
1	google.com	216.58.210.46	4.64	5.578	6.173	50	50	0	lhr14s23-in-x0e.1e100.net	4.388	5.032	6.393	50	49	1
2	facebook.com	31.13.74.1	90.289	90.888	92.635	50	48	2	edge-star6-shv-01-ord1.facebook.com	93.663	94.24	94.955	50	46	4
3	youtube.com	216.58.210.46	4.738	5.27	5.849	50	49	1	lhr14s23-in-x0e.1e100.net	4.46	4.93	5.715	50	48	2
4	baidu.com	220.181.57.217				49	0	49					0	0	0
5	yahoo.com	206.190.36.45	152.285	152.97	153.387	49	15	34	ir1.fp.vip.bf1.yahoo.com	92.439	92.828	93.196	50	49	1
6	amazon.com	176.32.103.205				49	0	49					0	0	0
7	wikipedia.org	91.198.174.192	12.09	12.658	13.886	50	49	1	text-lb.esams.wikimedia.org	12.376	12.843	13.247	50	49	1
8	qq.com	125.39.240.113	237.524	238.328	239.603	50	48	2					0	0	0
9	twitter.com	185.45.5.43	4.484	5.529	7.486	50	49	1					0	0	0
10	google.co.in	216.58.210.35	4.702	5.485	8.106	50	49	1	lhr14s23-in-x03.1e100.net	4.332	4.893	5.687	50	49	1
11	taobao.com	110.75.115.70	185.638	186.337	186.899	49	16	33					0	0	0
12	live.com	65.55.206.154				49	0	49					0	0	0
13	sina.com.cn	202.108.33.60				49	0	49					0	0	0
14	linkedin.com	108.174.2.129	78.524	78.975	79.857	50	48	2					0	0	0
15	yahoo.co.jp	182.22.59.229	278.256	278.7	279.546	49	47	2					0	0	0
16	weibo.com	114.134.80.162	257.736	258.437	260.118	50	49	1					0	0	0
17	ebay.com	66.135.216.190				49	0	49					0	0	0
18	google.co.jp	216.58.210.35	4.114	5.044	5.676	50	49	1	lhr14s23-in-x03.1e100.net	4.158	4.766	5.519	50	49	1
19	yandex.ru	5.255.255.5	58.047	58.636	59.634	50	50	0	yandex.ru	69.847	70.568	71.145	50	50	0
20	bing.com	204.79.197.200	4.524	5.338	7.763	50	49	1					0	0	0
21	vk.com	87.240.131.99	55.904	56.806	58.016	50	50	0	2a00:bdc0:3:103:1:0:403:900	47.489	48.606	53.783	50	50	0
22	hao123.com	180.149.132.19	164.102	165.279	166.886	50	48	2					0	0	0
23	google.de	216.58.210.35	4.626	5.635	6.803	50	50	0	lhr14s23-in-x03.1e100.net	3.954	4.929	5.869	50	50	0
24	t.co	185.45.5.47	4.96	5.774	6.494	50	49	1					0	0	0
25	instagram.com	54.164.44.207				49	0	49					0	0	0
26	msn.com	23.101.196.141				49	0	49					0	0	0
27	amazon.co.jp	54.240.248.0				49	0	49					0	0	0
28	google.co.uk	216.58.210.35	4.141	5.171	5.866	50	50	0	lhr14s23-in-x03.1e100.net	4.201	4.741	6.089	50	50	0
29	pinterest.com	23.235.37.84	4.595	5.526	6.995	50	50	0					0	0	0
30	tmall.com	110.75.114.89	213.58	214.453	216.155	50	48	2					0	0	0
31	wordpress.com	192.0.78.9	4.952	5.768	8.058	50	49	1					0	0	0
32	ask.com	66.235.120.127	77.744	78.512	79.456	50	50	0					0	0	0
33	reddit.com	198.41.208.139	4.362	5.026	5.534	50	50	0					0	0	0
34	blogspot.com	216.58.210.41	4.19	4.852	5.468	50	50	0	lhr14s23-in-x09.1e100.net	3.894	4.328	5.125	50	49	1
35	paypal.com	66.211.169.66				49	0	49					0	0	0
36	google.fr	216.58.210.35	4.119	4.729	5.765	50	50	0	lhr14s23-in-x03.1e100.net	3.905	4.399	5.584	50	50	0
37	mail.ru	217.69.139.200	55.976	56.271	57.219	50	50	0					0	0	0
38	apple.com	17.142.160.59				49	0	49					0	0	0
39	google.com.br	216.58.210.35	4.057	4.334	5.053	50	50	0	lhr14s23-in-x03.1e100.net	3.82	4.326	5.371	50	50	0
40	onclickads.net	78.140.191.89	11.053	11.668	12.453	50	48	2					0	0	0
41	tumblr.com	66.6.41.30	73.377	74.212	75.04	50	49	1					0	0	0
42	aliexpress.com	205.204.101.160	148.093	148.093	148.093	1	1	0					0	0	0
43	microsoft.com	134.170.185.46				49	0	49					0	0	0
44	google.ru	216.58.210.35	4.069	4.444	5.589	50	50	0	lhr14s23-in-x03.1e100.net	3.929	4.538	6.043	50	50	0
45	sohu.com	220.181.90.240	172.701	175.674	177.063	50	31	19					0	0	0
46	imgur.com	185.31.18.193	4.223	4.863	5.858	50	50	0					0	0	0
47	xvideos.com	141.0.174.37	10.857	11.618	12.892	50	50	0					0	0	0
48	google.it	216.58.210.35	4.212	4.863	5.501	50	49	1	lhr14s23-in-x03.1e100.net	3.933	4.776	6.495	50	49	1
49	imdb.com	207.171.166.22	78.004	78.711	79.662	50	50	0					0	0	0
50	google.es	216.58.210.35	4.338	4.907	5.715	50	50	0	lhr14s23-in-x03.1e100.net	3.963	4.543	5.461	50	50	0
51	netflix.com	50.19.210.42				49	0	49					0	0	0
52	amazon.de	178.236.6.250				49	0	49					0	0	0
53	gmw.cn	111.202.12.1				49	0	49					0	0	0
54	fc2.com	54.148.76.135	149.872	150.924	151.974	50	49	1					0	0	0
55	360.cn	106.120.167.66	176.302	177.604	181.387	50	50	0					0	0	0
56	alibaba.com	198.11.132.23	144.695	145.18	146.226	50	49	1					0	0	0
57	stackoverflow.com	198.252.206.16	83.033	83.857	84.89	50	50	0					0	0	0
58	go.com	199.181.131.249	136.686	137.457	138.746	50	49	1					0	0	0
59	google.com.mx	216.58.210.35	4.471	5.263	5.97	50	49	1	lhr14s23-in-x03.1e100.net	4.093	4.787	5.69	50	50	0
60	ok.ru	217.20.156.159	54.942	55.848	56.438	50	50	0					0	0	0
61	google.ca	216.58.210.35	4.637	5.118	5.957	50	50	0	lhr14s23-in-x03.1e100.net	3.903	4.485	5.692	50	49	1
62	google.com.hk	216.58.210.35	4.199	5.191	6.629	50	49	1	lhr14s23-in-x03.1e100.net	4.421	4.997	6.809	50	48	2
63	tianya.cn	124.225.65.154	242.886	247.381	257.391	50	47	3					0	0	0
64	amazon.in	54.239.34.40				49	0	49					0	0	0
65	amazon.co.uk	178.236.7.220				49	0	49					0	0	0
66	craigslist.org	208.82.238.129	142.154	142.777	143.345	50	49	1					0	0	0
67	rakuten.co.jp	133.237.48.124	266.547	267.284	272.682	50	48	2					0	0	0
68	pornhub.com	31.192.117.132				49	0	49					0	0	0
69	naver.com	202.179.177.22				49	0	49					0	0	0
70	blogger.com	216.58.210.41	4.341	5.066	6.155	50	49	1	lhr14s23-in-x09.1e100.net	4.289	4.915	5.973	50	50	0
71	diply.com	184.27.136.120	4.478	4.994	6.066	50	50	0					0	0	0
72	xhamster.com	88.208.29.24	10.096	11.291	16.545	50	50	0	2a02:b48:4000:d::1	9.613	10.611	18.378	50	50	0
73	google.com.tr	216.58.210.35	4.372	5.088	11.501	50	49	1	lhr14s23-in-x03.1e100.net	3.909	4.617	6.092	50	48	2
74	flipkart.com	163.53.78.58	145.724	146.658	148.342	50	50	0	2001:df0:23e:9002::15	116.853	117.855	118.647	50	49	1
75	espn.go.com	68.71.212.186	137.183	138.1	139.633	50	49	1					0	0	0
76	googleadservices.com	216.58.210.34	4.036	4.704	5.672	50	50	0					0	0	0
77	soso.com	106.120.151.169	192.633	193.531	194.7	50	19	31					0	0	0
78	outbrain.com	66.225.223.5				49	0	49					0	0	0
79	cnn.com	157.166.226.25	88.462	88.92	89.727	50	50	0					0	0	0
80	nicovideo.jp	202.248.110.243				49	0	49					0	0	0
81	google.co.id	216.58.210.35	4.095	4.876	5.809	50	50	0	lhr14s23-in-x03.1e100.net	3.92	4.485	5.826	50	50	0
82	dropbox.com	108.160.172.200	152.031	152.031	152.031	1	1	0					0	0	0
83	googleusercontent.com					0	0	0					0	0	0
84	github.com	192.30.252.131	78.465	78.896	79.884	50	50	0					0	0	0
85	bongacams.com	64.210.142.13	19.519	20.284	21.492	50	50	0					0	0	0
86	kat.cr	78.138.99.144	15.043	16.015	17.915	50	46	4					0	0	0
87	xinhuanet.com	202.108.119.194				49	0	49					0	0	0
88	google.co.kr	216.58.210.35	4.037	4.783	5.363	50	50	0	wl-in-x5e.1e100.net	10.99	11.717	12.582	50	50	0
89	bbc.co.uk	212.58.244.18	4.151	4.966	5.906	50	50	0					0	0	0
90	ebay.de	66.211.181.235				49	0	49					0	0	0
91	google.pl	216.58.210.35	4.088	5.004	7.228	50	50	0	lhr14s23-in-x03.1e100.net	3.868	4.388	6.364	50	50	0
92	google.com.au	216.58.210.35	4.096	4.448	5.251	50	50	0	lhr14s23-in-x03.1e100.net	3.941	4.69	5.181	50	50	0
93	pixnet.net	103.23.108.107	286.382	287.107	287.813	50	46	4					0	0	0
94	popads.net	184.154.76.140	92.731	93.419	94.268	50	50	0					0	0	0
95	ebay.co.uk	66.211.181.235				49	0	49					0	0	0
96	sogou.com	106.120.188.46	196.929	197.989	199.203	48	17	31					0	0	0
97	dailymotion.com	195.8.215.137	12.132	13.163	14.873	50	49	1					0	0	0
98	adcash.com	104.154.36.143	105.894	106.606	107.661	50	50	0					0	0	0
99	adobe.com	192.150.16.117	118.065	119.054	120.162	50	49	1					0	0	0
100	nytimes.com	170.149.159.130	147.227	147.976	149.528	50	50	0					0	0	0

Click to download the full dataset of the top 100,000 sites.

¹ http://raspberrypi.stackexchange.com/a/199 (1000 / 23.186 = 43.13 requests per second)
² http://whatmyip.co/view/web_hosting/4638/Cloudflare_Inc.html
³ http://whatmyip.co/browse/web_hosting/World_Hosting_Companies_DB_130000.html
⁴ https://github.com/fiorix/freegeoip
