Plenary session

Tuesday, 17 November, 2015

9 a.m.:

FILIZ YILMAZ: All right. Is this working? OK. I am loud and clear enough, I guess. We will start shortly, so can we please take our seats and make sure that we are ready to start. Thanks for gathering on this second day of the RIPE meeting. It's a bit thin in the audience but as the morning progresses we hope there will be more souls in this very room. My name is Filiz Yilmaz, I am part of the Programme Committee, and I will be chairing this session together with Shane Kerr here, my colleague. ‑‑ maybe this is better for a while.

So, the house rules are the same as yesterday. We will have speakers and then after that, we hope to have five to ten minutes of question and answer or comments. During that time, please come and line up behind the microphones and state your name. And we will coordinate those sessions.

So we will start with our first speaker, Dario Rossi, if you can come to the stage. He is going to talk about Anycasting and the technology, how it progresses so far. Thank you.

DARIO ROSSI: So can everybody hear me? OK. So, I am very excited to be here. As you can see on the slides, this is my first RIPE meeting, so I am thankful for the RACI support sponsoring my participation here. This is work that has been carried out in the context of a project, mPlane, which is about to finish. The topic of the talk is Anycast, more precisely where the Anycasters are, and as you can see I am a professor at Telecom ParisTech in Paris.

Let me just briefly mention what the talk is going to be about. A couple of years ago, seminal work at NANOG was showing who the Anycasters are, and the new work, which is the topic of this talk, is about where those Anycast addresses are. Just as a comment, let me remind you that everything you see in this talk is available as OpenSource at this URL, but please don't browse there; just follow me for a short while as I take you on this tour of Anycast. We are going to talk about background and related work, and then we are going to comment on what we have so far: basically, a technique based on latency measurement to provide geolocation facilities, and, in the IPv4 world, several censuses that we did in order to discover where the Anycasters are.

Then I am commenting on what we are starting to do, which is not finished yet: studies of infrastructure evolution, for instance, or of the usage of Anycast in real networks with measurements. And something new, which is the application of this technique to BGP hijack detection. And sorry for the acronym, I couldn't come up with anything better. And then, at last, the usual conclusions.

I am pretty sure that everybody knows what we are talking about, but just to avoid ambiguity: Anycast is a set of servers that provide the same service. We can do that at the application layer, and the way it typically works is that you have some mechanism of indirection between, for instance, a name and an address, so that you can redirect different people to different servers. This is the way in which you do it at application level. The advantage is that you have very fine control, and you can also maintain connection affinity or fast failover, but the difficulty is that you need to enforce something per service, you need to maintain the service and do measurements, so it's not so trivial. This is how Akamai is serving its CDN with several thousand boxes. What we are working on, and what is our focus, is IP layer Anycast, where the idea is that now all the servers share the same resource, which is an IP address. The way in which it works is that this address is propagated in BGP from several points of origin, and it is inter-domain routing that determines where the clients are going to be directed to. What you gain is something which is supported over IP, transparent to upper layers, with some control, like the visibility of servers being global or local. What you lose is the ability of fine-grained control: you are directed to something which is dictated by the routing metrics, which depend not on quality of service but more on economic reasons. And you have no guarantee of affinity, though this is not a big problem in practice.

So, to briefly cover the related work: there are a couple of works that focus on Anycast detection or enumeration, but prior to our work there was none able to geolocate. Others are doing characterisation of some important properties of a number of target servers, and our ‑‑ we apply our technique to the whole IPv4 world instead of limiting ourselves to some specific deployment.

So, to give a preview of what we are going to talk about in the next slides, our workflow is in two stages. Part one uses PlanetLab and RIPE Atlas. The measurements are lightweight: we use a fixed number of pings per target and vantage point, and in the case of DNS we don't even need to issue new measurements, we use historical RIPE data. In the case of CDNs, where we issue new measurements, it is going to cost 500 per target to get good enough coverage or, if you want to use all the RIPE Atlas vantage points, about 180 credits per target. The reason we also use PlanetLab is that it allows us to use some protocols like HTTP over TCP. With this we are going to build a ground truth to do our evaluation. Then we are going to move on to PlanetLab, because we are aiming to apply this technique to in the order of 5 million targets, which means we would need a lot of credits; the estimate is about 3 billion credits, we don't have that, and so we use PlanetLab. But we have ideas on how to combine them, and I am going to show one picture about how to combine PlanetLab and RIPE measurements to get something more useful for both. The definition of the problem is relatively simple. Consider Google or CloudFlare or Edgecast or the root servers or any other Anycast deployment: how do you find where those replicas are located? If you ask a commercial database you are going to get a wrong answer. Why? Because most commercial databases are only going to give you one answer, and in the case of Anycast you need multiple, so it's already implicitly wrong. It's also wrong because the answer is time-varying: all those services were giving a different answer at different points in time. These answers are sometimes very precise, like Mountain View, California, but this is the headquarters of the owner that they are locating, not the physical location of the server. In this case MaxMind was returning a latitude and longitude; if you browse to it on a map you will find yourself on a non-paved road, and it is very unlikely that this broken-down facility hosts any servers.

However, even if you are using measurements, like doing traceroute measurements from Europe, Asia and North America to the same target, and even though they do have some means to provide an answer, the answer is still wrong. You see, for instance, they are suggesting you can go from Ireland to California in two milliseconds, which is kind of strange: you have a physical limitation, the speed of light, and over the Internet light travels at roughly 100 kilometres every millisecond. It's clear you cannot go from here to there in two milliseconds. So the measurements are not good either, but they could be, and this was precisely the idea that was used in the NANOG work to detect Anycast. Whenever you see an inconsistency with the speed of light, it means that the servers being contacted are two physically separate replicas, and this is precisely what we are going to use here in our technique, bringing it to the next step in order to geolocate. Let me give you a brief overview of how the algorithm works. You start with latency measurements towards the destination. You can map those latency measurements to distances, so you can draw circles on a map, and by drawing a circle you are basically confining the boundary within which a replica can physically lie. Here we have a number of them, and you can see that some of the circles are not overlapping. Not overlapping means a speed of light violation: the vantage points located at the centres of those circles are contacting different Anycast replicas. In order to fully enumerate all the replicas, we formulate the problem as a maximum independent set, a much more ‑‑ optimisation problem. We can solve it with a brute force approach for small deployments, but we find the approximation is very good in practice, so we are willing to live with that. The next step is to pick a circle and geolocalise ‑‑ and you cannot rely on latency alone in order to geolocalise precisely.
We are using side-channel information. So, to put our technique very simply: pick a circle, take the biggest city in it, and hypothesise that this is where the Anycast replica is. The reason this might hold is that you are using Anycast in order to reduce latency and load, so you want to put servers close to where users live, in densely populated areas, which are cities. So then we can put ‑‑ we did a study and this is a good enough criterion, which works well in practice. Something you can then do at this stage is pick any circle, pick the point which you believe is the most likely location of the Anycast replica, and collapse the circle to that point, so that a circle that was overlapping with it may no longer be overlapping, and you can iterate until convergence, until exhaustion of all the Anycast replicas. And if you want to see how it works in practice: we have a very accurate enumeration, we find over 75% of the replicas in the ground truth of our measurement, whenever we use DNS CHAOS replies. It is precise: in over 75% of the cases we are right just by picking the largest city. The good thing is that it is protocol agnostic, so it works for DNS, CDN, and also other things that we hadn't anticipated yet. It is lightweight: we are using a couple of orders of magnitude fewer vantage points, so we generate less traffic. It is pretty fast, because it completes in 100 milliseconds instead of hours for brute force, and there is OpenSource code that lets you analyse and draw these kinds of maps over Google Maps.
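
The steps described above (latency circles, non-overlapping circles as distinct replicas, largest city inside a circle, collapse and iterate) can be sketched in a few lines of Python. This is only an illustrative sketch and not the speaker's released code: the function names, the greedy smallest-radius-first approximation of the maximum independent set, the conversion factor of 100 km per millisecond of round-trip time, and the city list with its round population figures are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_RTT_MS = 100.0  # assumption: each ms of round-trip time bounds the distance by ~100 km


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def to_disks(measurements):
    """Turn (vp_lat, vp_lon, rtt_ms) tuples into (centre, radius_km) disks."""
    return [((lat, lon), rtt * KM_PER_RTT_MS) for lat, lon, rtt in measurements]


def overlap(d1, d2):
    """Two disks overlap if their centres are closer than the sum of their radii."""
    return haversine_km(d1[0], d2[0]) <= d1[1] + d2[1]


def greedy_independent_disks(disks):
    """Greedy approximation of a maximum independent set: smallest disks first."""
    chosen = []
    for d in sorted(disks, key=lambda d: d[1]):
        if all(not overlap(d, c) for c in chosen):
            chosen.append(d)
    return chosen


def locate_replicas(measurements, cities):
    """Enumerate and geolocate replicas; cities are (name, lat, lon, population)."""
    remaining = to_disks(measurements)
    located = []  # list of (city_name_or_None, (lat, lon))
    while True:
        # Disks that already contain a located replica are explained; skip them.
        free = [d for d in remaining
                if all(haversine_km(d[0], pt) > d[1] for _, pt in located)]
        picked = greedy_independent_disks(free)
        if not picked:
            return located
        for centre, radius in picked:
            inside = [c for c in cities
                      if haversine_km(centre, (c[1], c[2])) <= radius]
            if inside:
                # Collapse the disk to the most populous city it contains.
                name, lat, lon, _ = max(inside, key=lambda c: c[3])
                located.append((name, (lat, lon)))
            else:
                # No known city inside: fall back to the disk centre.
                located.append((None, centre))


# Hypothetical example data: three vantage points with small RTTs to one target.
example_measurements = [(53.35, -6.26, 2.0), (37.77, -122.42, 2.0), (35.68, 139.69, 3.0)]
example_cities = [("Dublin", 53.35, -6.26, 1_200_000),
                  ("San Francisco", 37.77, -122.42, 870_000),
                  ("San Jose", 37.34, -121.89, 1_000_000),
                  ("Tokyo", 35.68, 139.69, 13_000_000)]
print([name for name, _ in locate_replicas(example_measurements, example_cities)])
```

A target is flagged as Anycast in this sketch when more than one replica is returned; a Unicast target, whose circles all overlap, yields a single location.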

OK. How do you apply this technique in order to perform a large scale study? Things change a little bit. In this case you need to start with around ten to the power of seven /24s, which are basically all the /24s that are routable. Then you need a number of vantage points in the order of 200, 300, which is the number of vantage points at PlanetLab. From each of those vantage points you probe one of the alive IPs in each /24. So you are going to issue a number of measurements which is typically ten to the power of seven times ten to the power of two. And you are issuing them at a given rate; actually we are probing at ten to the power of three, so 1,000 probes per second. We had to slow it down. If you want to know why, come to the MAT Working Group, where we are going to have a bit more detail about this. And of those 10 million probes that you send, only some are going to get a reply. Some of the replies are going to be ICMP errors, like host prohibited, which we use in order to avoid probing those targets from a lot of vantage points. And whenever you finish with some echo replies, you can run the analysis: you run the algorithm that I showed you earlier over millions of these targets, only to find that most of them are Unicast, which was expected, and at the end of this stage there are only about 1,000 deployments which are found to be Anycast.
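
As a back-of-the-envelope check of the figures quoted above, the probing budget can be worked out as follows. This is a sketch under one stated assumption that the talk does not spell out: all vantage points probe in parallel, each at the quoted 1,000 probes per second. The function name is hypothetical.

```python
def census_budget(targets, vantage_points, probes_per_second):
    """Return (total probes, seconds each vantage point needs at the given rate).

    Assumes every vantage point probes every target once, and that all
    vantage points run in parallel at the same rate.
    """
    total_probes = targets * vantage_points
    seconds_per_vantage_point = targets / probes_per_second
    return total_probes, seconds_per_vantage_point


# Orders of magnitude quoted in the talk: 10^7 targets, 10^2 vantage points, 10^3 probes/s.
total, seconds = census_budget(10**7, 10**2, 10**3)
print(f"total probes: {total:.0e}")
print(f"probing time per vantage point: {seconds / 3600:.2f} hours")
```

Under these assumptions the probing itself takes a few hours per vantage point, which is the same order of magnitude as the census duration mentioned in the talk.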

Just as a comment, 100 milliseconds done a million times makes weeks, so we needed to significantly engineer our effort in order to make everything run in about five hours. And if you do so, well, at the end you are going to find something which is the proverbial needle in the haystack. To give you an idea, this is the geographical map. Actually, there are some deployments for which we find fewer than five replicas, but we prefer to talk about deployments for which we are definitely sure that they are Anycast, so we are going to focus on those that have five in our measurement, which, by the way, account for most of the replicas that we find, which are about 10,000 overall. Here you see we have done a number of censuses during March this year. One takeaway is that when you combine measurements from multiple censuses ‑‑ number of replicas, where you see that most of the deployments we measure have about 10 replicas or fewer. But what happens when we compare, for instance, PlanetLab and RIPE? For sure we cannot issue a complete IPv4 census from RIPE Atlas, because we would need too many credits, but once we have those results made by a preliminary (frozen screen) we can issue measurements to ameliorate the coverage, so guys that have more than ‑‑ ‑‑ in order to enlarge the coverage that we have from the geographical point of view. We have a number of deployments where we only see two replicas; some of them appear to be Anycast but we are not sure, so what we can do is launch measurements from RIPE Atlas in order to validate this. And as you can see here, for small deployments, in most of the cases we found additional replicas from RIPE, and RIPE is also able to ameliorate the coverage for the large-footprint ones.

So, as I said, this is only a very quick overview here; if you want more details, come and talk about this in the MAT Working Group.

What I want to do is give you a brief overview of what we have seen. This is a top 50 chart of the largest deployments, stopped at 50 because this is about what fits on the slide, and as you can see here, it is pretty easy to spot a number of big fishes, listed here for reference, that pertain to a lot of different services. The picture shows here the AS number, ordered by the number of replicas that we find in the measurement. This is the number of /24s that the AS owns. These are the open ports that we find with a complementary survey, or census, of the open ports. And then these are the pages that are served by these ASes, and this is the CAIDA rank. The takeaways: there are important ASes that are among the top 10 in the CAIDA list. There are servers that are hosting pages among the top 1,000 Alexa pages. We see a lot of service diversity: not only port 53, port 80 and port 443 as open ports, for DNS and CDN, but also unknown ports. We have a heterogeneous footprint, with some of the ASes accounting for a significant fraction, so if I recall correctly, here 600 of the 897 are from CloudFlare, and some are just using a single /24 to Anycast. So we have a picture which is pretty complex, but the main takeaway from this picture is that we are finding a needle in the haystack, and since we see big fishes with a lot of services, and popular services, we are finding a silver needle in the haystack.
This is one overview of something that we found in the census about the ports. What we did was an Nmap census probing all the ports on one IP for each /24, and we find CDN and web, with a breakdown of the software that is popular in Anycast, which is very different from the software that is popular in the typical Unicast world. And we find some oddities: we weren't expecting to find any Gmail, e-mail over POP or gsmtp, and we didn't look into the issue in much more detail. We also found other oddities: over 10,000 open ports on OVH, the largest hosting facility in Europe, third largest in the world, which is definitely running a lot of services, and we have no idea what they are because they are on non-standard ports. Which is kind of interesting. Another surprising thing was to find open SSH running over Anycast replicas, but we haven't looked into that in detail. We are also doing different stuff, and I am showing here a couple of viewpoints. So far what we did was a spatial view over all of IPv4. What we can also do is have an orthogonal view of the evolution of the L-root server, taking measurements from historical RIPE data, so we are tracing back from 2012 to more or less now. The measurements are DNS CHAOS: we have the CHAOS name and the DNS measurement, which gives you the round-trip time at the application layer, so we can apply our technique, and this is about what we found out of what is visible in the measurement. The difference is partly by reason of the overlapping of the circles, so this is basically the limit of our technique.

Another orthogonal view; I don't want to read the picture, I just wanted to show you something which is basically about the characterisation of Anycast usage. Here we are using passive measurement at some ISP point of presence with some 40,000 users. We are characterising the traffic passively, so since we know which addresses are Anycast as per our census, we are going to look at what the relevant properties of the traffic are from this perspective. And the very last comment is about another kind of application of this technique. If you remember, in the beginning I was talking about one of the problems of Anycast being that it is prone to BGP hijacking, and this is an example of a BGP hijack. Actually it's an interception, where traffic that was local to the United States went first to Moscow and then back to the original destination. In this case it would make latency go up, and this can be detected somehow without having precise route information, only in the latency space. So the difference between Anycast and a BGP hijack, if you want, is that in the case of a BGP hijack the routers are not authorised to push the BGP updates, while in the case of Anycast the routers can legitimately do it, but the technique can also work to detect hijacks. What are the problems and how can we use it? Basically using two different viewpoints. One is a reactive scan on BGP announcements: having a process that reads announcements close to some monitor, then performs some inference about the announcements, triggers some measurements and tries to confirm what the BGP announcements say in the control plane with latency measurements in the data plane. One of the problems of doing so is that not all hijacks, but some, are of very short duration, like one or two minutes, and the control plane information can propagate to some monitors only after the problem has disappeared from the data plane, so we are not sure if this can work, because you need to have a very timely and tight workflow in order to detect it.
Another, complementary viewpoint: if the BGP hijack duration is one minute, then let's scan the Internet every minute. Let's scan all the /24 prefixes. The problem is that what you need is basically to go 100 times faster than our current speed, but since this is more challenging it is also more fun. Just a comment: we started working on this issue under the aegis of a Google faculty research award this year, and a factor of ten in terms of performance is already there. It is what I told you before: we needed to slow down our software just because we had some problem in the Internet that was filtering our replies, and I am going to talk about this more in the MAT Working Group. So a factor of ten is already there, and we just need another factor of ten. We have reason to believe that this is doable, and one of the reasons I am here is also to talk with you and exchange ideas about BGP hijack detection, so I am going to be here all week if you are interested in that.

To conclude: we are proposing a new technique that allows you to investigate, but especially geolocate, Anycast deployments. It is lightweight, fast and protocol agnostic. It is ready: there is OpenSource code, and it works from RIPE, using your credits. And it is useful: there is a web interface that is already exporting a significant subset of the census data that we have made, which is available right now. So if you are interested, don't hesitate to drop an e-mail. My PhD student is mainly working on this topic, and because my e-mail replies are known to be heavily delayed, if you want to get a timely reply, cc him. This is the reference. Now, I am happy to take questions if you have any.


SHANE KERR: Thank you, that was very interesting. Are there any questions or comments?

JIM REID: Just speaking for myself. Very interesting talk. I wonder if you had a chance to compare notes with what Ray Bellis has been doing for ISC. His effort was more about the traffic on the receiving side of an Anycast instance, so he is one of the guys that look after the F-root server, and they were finding problems to do with strange policies at other ISPs. There is an example of a very large European ISP that was ‑‑ sending the query traffic to F, and it was appearing at the Anycast instance in Palo Alto rather than ‑‑ I wonder if you gave any consideration to those factors, or are you considering so many probe points all over the place that any issues about a hosting ISP and any peering policies they have are not going to affect your overall measurements?

DARIO ROSSI: It was a pretty complex question. On the first part, basically, there was somebody from Netnod looking at the same perspective: for an Anycast facility, where does the traffic come from? Because we are deploying Anycast because it is expected, because we want, to serve close-by people. But it is not always true, and there are a lot of problems with routing, with policies, so traffic which is far away gets absorbed far away. This is a complementary viewpoint. In our case, in our measurement, if one of those guys happens to be in an AS that has a weird policy, he is going to be absorbed by some far-away replica, and for the way the measurement works, the way the algorithm works, it is going through circles in increasing radius, and basically the far-away guys are not appearing in the result because they overlap a lot with the others. So we don't see these kinds of oddities. In order to see them, you really need something more fine-grained, so you need to consider not just latency but also, as you said, more advanced information, which is not trivial to map. Even if you do traceroute, how do you map to one IXP with ‑‑ without, this is known to be not very easy to do. So that is a complementary viewpoint, but we didn't touch this stuff at all.

JIM REID: I will have a chat with you later.

SHANE KERR: I have a question, so this is sort of an observational study. Do you think ‑‑ do you have recommendations for people running Anycast, for operators, or have you established some kind of general principles about the way this works?

DARIO ROSSI: So, I am from the academic community, and this kind of work was already done in the early 2000s, when there were recommendations on how to improve affinity and other aspects ‑‑ our point was more to say that Anycast is now a reality but is not necessarily well known in the academic community. We didn't even know what was the ‑‑ of Anycast. This was an observational study; since we are not from the operational community, we don't want to give lessons, because my lesson would be how to tune an algorithm and not how to run a network. This was not the point. And moreover, it would have been fairly late with respect to academic research, and the paper would have been rejected with high probability.

SHANE KERR: Thank you very much.


Our next talk this morning is also a measurement-related one. This is Andra Lutu and she is going to be talking about mobile networks.

ANDRA LUTU: Good morning. I am a Romanian coming all the way from Oslo, from Simula Research Laboratory. This is a research project where, together with seven other partners, we are trying to build a measurement platform called MONROE, which aims to measure mobile broadband networks in Europe. First I am going to talk about the vision of the project, why we think this is necessary in today's Internet. Then I am going to talk about who can benefit from such a platform. And finally I will give you some preliminary results of what we can do with the data from such a platform.

So let's begin.

Well, what motivates this work? What we observe is that everybody has a smartphone, and the combination of these very powerful devices with high-capacity 3G and 4G networks has radically changed the way we access and use the Internet. As we ‑‑ we are becoming more reliant on mobile networks, what we observe is that there is a lack of objective data on the performance of mobile broadband networks. In the same way that organisations such as Standard and Poor's or Moody's offer ratings for countries to show their trustworthiness to external investors, MONROE plans to offer objective ratings of mobile broadband networks in Europe, and of application performance.

So, what we strive to achieve is to design, build and operate an open, European-scale, flexible, hardware-based platform that allows everybody to run experiments on operational networks in Europe. The main idea behind building this platform is to identify the key performance parameters that allow for an accurate and consistent way of describing the performance of the networks. Not only that: because the platform is flexible and contains very powerful nodes, we allow for experimentation and evaluation of innovation.

Now, as a European project, we are very open to collaborating with other projects. You just heard Dario before me speaking about research done in one of the projects we are collaborating with, and we are striving to be compatible with the mPlane architecture. And the user access and experiment scheduling come from another FP7 project. We are building on top of NorNet Edge, which is the national MONROE for Norway, so it's a hardware-based measurement platform that monitors and assesses the performance of mobile broadband providers in Norway. We collaborate with WiRover, which is a similar measurement platform, but from the University of Wisconsin-Madison. The data is going to be made available via coming out of another European project. We are very open to establishing other collaborations, because we feel it's very important to have a consistent methodology for performing these measurements, so we want to be able to put together the data sets of these measurements from different platforms to produce valuable results.

Now, where is MONROE, what is it? It's a hardware-based platform. As I said, we are building on top of the NorNet Edge infrastructure, which already consists of about 200 nodes deployed in Norway. But we are going to extend this within MONROE to three more countries in Europe: Sweden, Spain and Italy. This will allow us to compare the ecosystems in these countries and to show the effect of different configurations, regulations, frequencies and operator strategies.

As I said, it's a hardware-based platform, so what we are using is a very powerful measurement node, which is basically a Linux box. It's very flexible, allows for modification and supports very demanding applications like video. And we don't only have stationary nodes: 150 of the 450 nodes that we are going to deploy all over Europe are mobile nodes, so basically they are operating on board trains, buses and trucks. This will allow external users, experimenters, to observe the impact of mobility, and not only that but also to compare performance in rural and city areas. Each of these measurement nodes connects to at most three mobile broadband operators per country, and this allows for experimenting with different access technologies and ways of combining them, and new opportunities like 4G/Wi-Fi offloading and so on. So there is a wide range of experiments that you can imagine and deploy here.

Now, who can benefit from this? Whom is it good for? Well, what we try to achieve is lowering the barrier for external users. They are the centre of the platform. What we strive to do is open up all the data sets and provide data to the community; not only the data sets but also the measurement methodology and all the tools that we design in order to analyse the data sets. Not only that, but we are going to be running a couple of open calls, the first one coming in December this year; I am going to say a bit more about this later on. We offer experiments as a service. The measurement nodes, as I said, are very flexible and allow modifications, so you can do lots of things there. I want to hear about what you are interested in doing. So who are these external users, you say? Well, I hope everybody here in the room. But we have put them in five different categories. Obviously regulators and society at large can benefit from having such an interesting data set, which allows them to assess the stability and performance of mobile broadband networks and to guide competition, which we believe is a very positive thing. Users and consumers: as we are becoming more reliant on mobile broadband networks and their use cases, we want to make an informed decision when we buy our subscription, so this sort of measurement platform will generate the data set for us to make informed decisions as consumers, which I believe we all are.

Also organisations and businesses. Well, we have been seeing quite a few of these coming up in the last years, and we have a couple of them in the consortium. For example, ambulances or trains offer services based on mobile broadband networks, and we believe these will benefit greatly from such ratings as we were talking about.

Obviously, researchers, innovators and experimenters, and I am not just talking about academia here, but also operators and the businesses that I just mentioned: it will allow them to evaluate the performance of novel applications and protocols in a realistic scenario, in an operational scenario.

And finally, last but not least, operators, who can find different ways of, for example, testing coverage, which I am going to talk about a bit more later on. They can find new ways, better ways to do frequency planning, more cost-efficient investments and better network utilisation.

So, with all this constellation of external users, well, we started talking with them and hearing what they would be interested in, and we translated all of this into use cases for the platform. As a consortium we are dedicated to identifying all the key performance parameters that I mentioned before. We are also very interested in application performance, and in the mapping from quality of service to quality of experience, for example for video or web browsing. And not only that, but also in assessing protocol innovation, and in understanding whether you can still innovate in this Internet.

So, just to give you a little taste of what type of data we are collecting and what we can do with it, I am going to show you what we have found while working with data from NorNet Edge, trying to just look at how good coverage is.

So, as I said, NorNet Edge is also a hardware-based platform. We now have six nodes running on board trains. We are working with the train operator in Norway, NSB. Each of the nodes hosted on the trains connects to the two largest mobile broadband operators in Norway, Telenor and Netcom. And we are traversing basically about 2,500 kilometres. The routes are the green one, the blue one, the red one and the magenta one, and the question we ask is very simple: where do we have good coverage and where do we have bad coverage when on a train? It's all very simple. What we propose is a complete approach to coverage characterisation, so we look, in the same area, at all the available radio access technologies, and we believe this is of interest for regulators, for users who are using their mobile phones while on the trains, and for businesses; basically NSB is the business here, providing the service to their customers.

Now, the data set that we have, we collected it over five months, and it basically consists of the radio access technology reported by each of the modems connected to the measurement node, which we merged with the GPS information we get from the Norwegian railway system. The beauty of this set-up is that we can repeatedly run the same measurements over the routes, right? So over five months we have more than 100 different runs; by run I mean a trip of the train in either of the two directions. What we end up with is this complex geo-tagged data set where we have diversity both in space and in time, so it's pretty complex to work with, right? Because we have all these variable locations and we also have the variable time tags. So how do we deal with the limitations of this data set, with the challenges of working with it? If you plot all the points that we have in the data set on the map, it's basically what you see here, all the black points that follow the train routes that you saw before. But it's very hard to work with them because they have variable space coordinates, so we do geographical data binning: we overlay a grid of 2 by 2 kilometres on top of the Norwegian map (you will see this is an exaggeration, it is not to scale in this one) and we put together all the data points that fall within the same grid block, so the data points in the new data set become the grid blocks. For each of the grid blocks we generate a coverage chart, which tells us the percentage of coverage along the segment of the route that falls within the grid block. So what we generate are time series, and these are the new data points. And now the question becomes: how can I find a way to put together all the grid blocks that have good coverage or bad coverage? This is a very simple categorisation, right? The method that we found is clustering. I am not going to bore you with the details of that, but I am going to show you the data.
That is the data we collected over the route Oslo ‑‑ What you have here on the horizontal is all the grid blocks along the route. Both foretell nor and net come. And on the vertical starting with the first one to the last one. Now, each of the squares, the little squares that you probably not see very well here are colour coded to show all the different radio access technologies, blue, 2G, orange 3G, 4G green and no service in red. We put the gradient to show what is the percentage of the coverage within this particular grid block. So if you look at the little square here, you are going to see basically what is the distribution of this radio access technologies. Now, we put on top of all these data the clustering algorithm that, can differentiate, can separate this grid blocks based on their similarity and we end up with two clusters, so good coverage and bad coverage, in good coverage you have mostly orange‑coloured, so mostly 3G and in bad mostly reds, so no services bit of blue, bit of 2G, a bit of everything. So a lot of mixture here. And the situation is consistent both foretell another and net come. Here we observe one simple observation tell nor does a bit better in 4G because we see a bit more green.
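The binning step described above can be sketched roughly as follows. This is a minimal illustration and not the speaker's code: the function names, the `ref_lat` value and the equirectangular approximation are all assumptions made for the example; only the 2‑kilometre cell size and the four technology categories come from the talk.

```python
import math
from collections import Counter, defaultdict

CELL_KM = 2.0            # grid size mentioned in the talk
KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_block(lat, lon, ref_lat=60.0):
    """Map a GPS point to an integer (row, col) grid block.

    Assumption: an equirectangular approximation around ref_lat is
    good enough for illustration; a real pipeline would use a proper
    map projection.
    """
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * KM_PER_DEG_LAT / CELL_KM)
    col = math.floor(lon * km_per_deg_lon / CELL_KM)
    return row, col


def coverage_shares(samples):
    """Compute the share of each radio access technology per grid block.

    samples: iterable of (lat, lon, rat), rat in {'2G', '3G', '4G', 'none'}.
    Returns {block: {rat: fraction of samples in that block}}.
    """
    counts = defaultdict(Counter)
    for lat, lon, rat in samples:
        counts[grid_block(lat, lon)][rat] += 1
    return {
        block: {rat: n / sum(c.values()) for rat, n in c.items()}
        for block, c in counts.items()
    }
```

Running `coverage_shares` once per train run would give one vector of shares per grid block per run, which is the kind of time series a clustering step could then group into good and bad coverage.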

What do we do with this information? We plot it on the map, obviously. We observe that city areas have better coverage. No surprise there. But we also look at how many different runs, how many repetitions of this measurement, you need in order to consistently classify one of these little areas as having good or bad coverage. What we do is take a sliding window of N different drive runs and observe the similarity in the classification of good and bad coverage, which we calculate with the average Jaccard distance. As we hit around 10 different runs, we observe that our classification becomes stable. So this tells us we need at least 10 different runs in order to consistently classify a region as having good coverage or bad coverage based on our measurements.
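A rough sketch of that stability check follows. It is an illustration under stated assumptions, not the method used in the study: a simple mean threshold stands in for the clustering step, and the names `good_blocks` and `mean_jaccard` and the 0.5 threshold are invented for the example.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def good_blocks(window, threshold=0.5):
    """Label a grid block 'good' if its mean coverage over the window
    reaches the threshold. This threshold rule is a stand-in for the
    clustering described in the talk.

    window: list of runs, each a dict {block: coverage fraction}.
    """
    blocks = set().union(*(run.keys() for run in window))
    good = set()
    for block in blocks:
        values = [run[block] for run in window if block in run]
        if sum(values) / len(values) >= threshold:
            good.add(block)
    return good


def mean_jaccard(runs, n):
    """Average Jaccard distance between the 'good' sets of consecutive
    sliding windows of n runs. Values near 0 suggest the classification
    has stopped changing as the window moves."""
    windows = [runs[i:i + n] for i in range(len(runs) - n + 1)]
    labels = [good_blocks(w) for w in windows]
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

Evaluating `mean_jaccard` for increasing `n` and looking for the point where it flattens out is one way to estimate how many runs are needed.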

So now that I have taken you through all this travel, I am going to come back to the initial point where I was talking about the project. As I said before, we started in March of this year. We are currently working on the system design and proof of concept implementation. We have completed the hardware selection and we are going to start deploying the nodes in March next year. But what is very important here is that we are running this open call, so we want to get external users excited about running their experiments on our platform, and we offer funding. Please go to this link here and subscribe to get more information, and hopefully to get up to €150,000 to run your experiments on this platform.

So if you have any questions or doubts, please just drop me an e‑mail or find me around here, I will be here almost all week. Go to the web page of the project, MONROE. Thank you. Do you have any questions?


FILIZ YILMAZ: Thank you. Any questions? No. Well, this is very interesting and thanks for your speech. So, next up, we are ahead of time, so we will probably get early access to the coffee, but before that we have Jamie, who is from New Zealand, and he is going to give us a glimpse of what is happening over there with regard to broadband, please.

JAMIE HORRELL: Well, I am from NZRS, across from New Zealand. I am going to be talking about mapping New Zealand's broadband infrastructure, which is something that we have been working on at NZRS. I have got some background about what NZRS is, as people tend to ask. We were formerly known as .nz Registry Services, which means we operate as a provider of critical infrastructure and authoritative Internet data, primarily the DNS and the .nz registry.

We also operate a number of public services, so we have an NTP server network for public good, an RPKI validation service, an OpenPGP key server and an Internet data portal, which has got open data about the Internet in New Zealand. And we maintain an Internet research function and I am part of that function.

So, New Zealand broadband, again some background. It's been an interesting 15 years; things have changed a lot. We have gone from monopoly infrastructure providers to a lot of infrastructure competition, so we have got competition on wireless networks, fixed and cellular, fibre, HFC cable networks, ADSL, VDSL and satellite there on the edges. Consumers will often have a lot of choices about the actual infrastructure they choose. And we are in the middle of some quite aggressive builds. We have got 20 to 30 regional wireless ISPs that serve remote and rural areas; typically these guys have about 40 to 150 wireless sites a pop, typically in those rural areas. I will add that New Zealand is actually quite an urbanised country, I think we are fourth most in the ‑‑ we are not as rural as we think we are but we still have a reasonable rural population. We have got a fibre to the home build that is going to 80% of the population, and the majority of that is going to be done by 2019, so we have got these networks that are growing and expanding all the time. And we have now got more money going into rural broadband, to get to 99% of the population having 50 megabits per second plus, so from 80 to 99% we are going to see those gaps filled with copper and wireless, and the final one percent are the satellite guys, and they are really living in remote places. So we have had copper loops, VDSL is a pretty solid product competing well with fibre, and fixed wireless access there.

Back to competition, we have also got a stack of retail competition. Most of our networks are open access, some of that by legislation, some of it by choice. As an example, on our fibre networks there are about 80‑odd ISPs selling on top of the New Zealand fibre networks. So we have got this infrastructure competition and the competition at the layers above.

And that is on DSL as well: our main provider of DSL basically provides a bitstream, delivers it to wherever you like, and that's what the ISPs purchase.

And then we have got some independent open access fibre networks and wireless networks as well, though a lot of the wireless networks are not that open access. On to what I really should talk about, which is the spatial data. What we have done is pull together geospatial information related to these networks, so it's both geospatial and temporal: we have got timings because, as I said, we are growing, we see these networks getting pushed out, so we have got timings of when these networks will become available. We collect that information, create some of it, and curate it. Currently in our data service we have got 100‑plus layers of geospatial information related to the Internet and telecommunications, and we use about 20 to 30 of those on the national broadband map. With this data we have built a consumer‑focused availability tool, the national broadband map; that is for consumers. And we have built a data service, which we hope more researchers will use and which hopefully can influence good technical and policy decisions.

So, the broadband map: you can access it at broadbandmap.nz, one of our flash new registrations. What we have here is a website; it is mobile friendly, you can drop an address in there, or drop a pin in there, and it will return what services are available at that address. By services, I am talking at the infrastructure level. We don't have a mapping onto the retail products yet.

We visually represent that. Looking here, this is the city of Auckland, our largest city. The dark pink is where we have fibre coverage, the light pink is where it's coming. And we are getting those timings really good, so we have got the granularity of some of this fibre stuff down to the month, so we can say that fibre will be available in October 2016. I have worked quite closely with the builders of these fibre networks to get that data in order. And of course we provide more meaningful textual information, so we have got upload and download speeds and then some sort of link to how a consumer can actually get connected to these networks.

So they are able to take their information from us about what infrastructure is there rather than rely on a retailer who may or may not tell them what infrastructure options they have.

And again, this is the sort of stuff we can see. Same stuff. We released it in July 2015, we had 60,000 visits in the first two hours, which took down our infrastructure and it was ‑‑ yeah, it wasn't quite what we expected. It got covered by the major news sites. Now, this is the architecture. I am not going to dwell on this too long, but what we have is a stack of services: we have got some JavaScript firing off in a browser, drawing in base maps from Mapbox, and our own stuff running in the Amazon cloud using Elastic Beanstalk, and we are talking to Koordinates, which is another service, which gives us another couple of services. So basically, we really are just consuming services.

So, on to these services and their interfaces. We use AddressFinder for geocoding of addresses: you put in an address and it returns an X, Y coordinate. There are lots of options for that; we could use Google Maps. These services are not unique, they are available in a number of places.

We went with AddressFinder because it does hold some authoritative address data. Koordinates (with a K), we use that for our vector queries and web map tile services, and we augment some of the stuff from Koordinates with stuff we know. AddressFinder is simple: it converts an address to an X, Y coordinate, based on New Zealand authoritative address data, and we access that via JavaScript.

We go off to Koordinates and do a point‑in‑polygon query. What we are asking is: hey, at this particular point, what polygons does this intersect? Our coverage is polygons and the maths is simple; we are just asking, is this point within this shape? I have tried to represent that there. But when we are actually doing it, we are going through a lot of polygons. We are looking at a bit of New Zealand here; this is Marlborough, where they grow lots of grapes. And yes, there are a lot of networks at this place. We can have multiple overlapping ADSL layers, we will have VDSL, we actually don't have cable here, but there are three wireless networks going through this area, so when we do that point‑in‑polygon query we are going to return three wireless networks. And the study is also fully fibre. We get a JSON response back from our service. And while it's not clear here, you can see it does have the information that we can put into our pretty ‑‑ side of things, and there is something in there, an ID, that gives us the ability to draw our map tiles off the same service.
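The point‑in‑polygon question can be illustrated with a short sketch. This is not the Koordinates API or its implementation; it is a plain ray‑casting test with invented layer names and toy square polygons, shown only to make the idea of "which coverage polygons contain this point" concrete.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from (x, y) towards +x crosses; an odd count means 'inside'.

    polygon: list of (x, y) vertices, without repeating the first vertex.
    Points exactly on an edge are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def layers_at(x, y, layers):
    """Return the names of all coverage layers whose polygon contains the point.

    layers: dict {layer name: polygon}; the names here are hypothetical.
    """
    return [name for name, poly in layers.items()
            if point_in_polygon(x, y, poly)]
```

With overlapping layers, a single point can return several names, which matches the situation described for Marlborough where one address falls inside multiple ADSL and wireless footprints.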

So, that is kind of how we built this map up.

We have got a data service; this is what drives the map. But it does some other things. We would like as much of this data as possible to be open for reuse by others. Most of it is not our data, so we can't just re‑release it; we need the agreement of the providers, and we are working on giving them the comfort that they can release that data for others to use in different ways. We can expose it in a number of ways: we have got the APIs and tile services, I won't go into them too much, that is the visualised stuff, and we can allow direct download of the data. As for what the data service enables: vector querying is where we can ask questions like, is this point within this polygon, or what is the closest network to this point? That is simple; again it's just a simple REST interface. With the tile services, map tiles can be put into web mapping applications and pulled into desktop applications. We have got the storage and the permissions, so we can provide data on a non‑open basis; metadata management, which is really important; and distribution, it is CDN distributed.

The architecture: a vector query service, a REST query service and the map tiles service, all RESTful APIs, and we have got the ability to download the data.
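The "closest network" style of question can be sketched as a nearest‑neighbour search. This is an assumption‑laden illustration rather than the data service's actual behaviour: it reduces each network to a single representative point (real coverage is polygons), uses the haversine great‑circle formula, and the site names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_network(lat, lon, sites):
    """Return (name, distance_km) of the nearest site.

    sites: non-empty list of (network name, lat, lon) tuples; the
    one-point-per-network representation is a simplification.
    """
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon))
         for name, s_lat, s_lon in sites),
        key=lambda item: item[1],
    )
```

A spatial database would answer the same question against the polygon boundaries directly; the point here is only the shape of the query, a location in and the nearest network out.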

And we pull that data in ‑‑ we can upload it ourselves, we can actually back in to external GIS systems, so we act as a conduit, or we can potentially use other data services. And if you are familiar with desktop GIS, you can open up these RESTful interfaces and build up your own maps.

We are normally asked how we source the data. We have got over 20 providers of data. Now, it's coverage area, it's not statistical area, so it is real coverage. I have seen some other things that are based on statistical areas, whether meshblocks or area units or cities; these are not statistical areas or what you will likely get, they genuinely are true coverage areas. We work closely with those providers and we help them generate the data, particularly the smaller WISP guys, who I have got a particular interest in; they often didn't have their network maps, or didn't have that coverage available in a geospatial format to allow for reuse. So getting the data is generally about asking the providers nicely and giving them the comfort that you won't misrepresent them. Getting that consumer tool, broadbandmap.nz, out there was great: it increased enthusiasm and seemed to be more of a motivation for the providers to get their data up and keep it up to date, and we have got a nice feedback loop with those guys. Sometimes people contact us and say, hey, we think this is wrong, and we can feed that back to the providers and they can either let us know if it is wrong or perhaps give us an understanding of why somebody might think it is wrong. And we have got wireless propagation data. I will come back to these wireless networks, I do like them. 20 to 30 is a reasonable estimate, and these are little businesses and they fill this niche in rural and remote areas. They tend to be small operators and know their networks well. They have lovingly crafted their networks, built them themselves physically. Communicating their coverage has always been a problem: they will know where their coverage is, but turning it into a geospatial format to communicate it gets a bit more difficult.
We took a couple of approaches here. We wrote a wee bit of software to take georeferenced images, images referenced with KML as used in Google Earth, and convert them into vector shapefiles. Once it's in that format we can start doing some cool maths on it and work things out. So we put some stuff out there. Some of them use a tool called Radio Mobile and some use another tool; both produce georeferenced images with KML, which allowed us to take that and turn it into a true GIS format. The other thing we did: we have got Wavetrace, which is a bit of OpenSource software. That allows us to generate coverage based on terrain data and the Longley‑Rice propagation model. These are the sort of inputs we put into Wavetrace; the input is basically CSV. Its real use case is not planning a network but quantifying a network that already exists. When I see these guys, they will have 40 to 150‑odd towers and they don't really want to map each one by one, so it allows them to batch: they can fire me a spreadsheet and I will run it through that. It takes a digital elevation model, which is a terrain‑based model that tells us what the ups and downs are, and the network details: latitude, longitude, antenna height above ground level, frequency, the transmitting AP, the power they are transmitting at, polarisation, horizontal beam widths. This is real stuff; yes, it is an estimate, but a pretty good estimate.
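To make the batch CSV input concrete, here is a small sketch. It does not reproduce the tool's real file format or the Longley‑Rice model: the column names are hypothetical, and the standard free‑space path loss formula is used as a much simpler stand‑in that ignores terrain entirely.

```python
import csv
import io
import math
from dataclasses import dataclass


@dataclass
class Transmitter:
    # Field names are hypothetical, loosely following the inputs
    # listed in the talk (location, antenna height, frequency, power).
    name: str
    lat: float
    lon: float
    antenna_height_m: float
    frequency_mhz: float
    power_dbm: float  # assumed here to be effective radiated power


def load_transmitters(text):
    """Parse a CSV with one transmitter per row into Transmitter objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Transmitter(
            name=row["name"],
            lat=float(row["lat"]),
            lon=float(row["lon"]),
            antenna_height_m=float(row["antenna_height_m"]),
            frequency_mhz=float(row["frequency_mhz"]),
            power_dbm=float(row["power_dbm"]),
        )
        for row in reader
    ]


def free_space_path_loss_db(distance_km, frequency_mhz):
    """Free-space path loss in dB (distance in km, frequency in MHz).
    A terrain-free stand-in, NOT the Longley-Rice model."""
    return 20 * math.log10(distance_km) + 20 * math.log10(frequency_mhz) + 32.44


def received_power_dbm(tx, distance_km):
    """Estimated received power at a given distance, antenna gains ignored."""
    return tx.power_dbm - free_space_path_loss_db(distance_km, tx.frequency_mhz)
```

A terrain‑aware model would replace `free_space_path_loss_db` with a calculation over the elevation profile between transmitter and receiver, which is what lets the real output follow hills and valleys instead of drawing circles.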

This is the sort of output that we get. This might not be quite straightforward to see, but you can see we have got terrain playing a part here, so we are not just drawing circles around the modelled sites. This is modelled off an ISP with well over 100 sites ‑‑ I don't know if Martin Levy is here but he had something to do with the formation of it ‑‑ based in Auckland, New Zealand, so that is the sort of coverage I have got. They use this tool to map their network, and we do have networks using it for qualification of customers now so they don't have to do truck rolls. But to get on to the more interesting stuff: we are now at the stage where we can start doing some analysis, and this is cool. We have got some consumer stuff out there and this is great, that is helping people make decisions about what the best broadband options are for them, but now we have got the ability to do some analysis, and this is a bit more interesting, and hopefully we can get others doing analysis if we can get this data a bit more open.

So we can identify the true extent of service, and broadband and mobile broadband black spots, so that is kind of useful. If we jump back, we said we have got this roll‑out of fibre going to 80% of the country and we have got that final 20% to fill in; there is a significant amount of government money and industry money going into that. The industry is levied and the money is divvied out amongst them. Knowing where to invest that money is quite important. So this tool can be used to identify where you are going to get the best use of that government and industry money. So it can inform public policy.

So you can do stuff visually ‑‑ I mean, this is mapping and mapping is meant to be visual. We can style our geographic data with a simple language called CartoCSS; it's very similar to CSS and it's supported by many applications and services. What I will show is that we have styled based on black spots, so the black on here ‑‑ this is across all networks, by the way ‑‑ is where there is no broadband coverage in New Zealand. That is the top of the South Island. It's not strictly correct, we are missing some of the mobile data, but it is not too far off. So we have got a big chunk of the country that has no mobile coverage. That is not that scary; most of that is national park in the Southern Alps, I think they start there somewhere. People don't live there. But where people do live, we can start looking at where the under‑served customers are. This is Taranaki, which is a volcanic cone, one of those perfect‑looking mountains. We have got a wireless network called Primo Wireless and a stack of fibre and ADSL throughout there as well. You can't quite see the ADSL there, it isn't great, but we did try to look at it. Using an SQL‑like language and PostGIS, with its GIS extensions, we can ask that question. We have asked: OK, we want to know the addresses that can't get more than five megabits per second with ADSL but can get more than five megabits per second from Primo Wireless. From that we can return both the geometry, which shows where these under‑served customers are, and the attribute data, which is the contextualised stuff that says what these actual addresses are and where they are. That is useful for public policy and it can be useful for marketing. From memory I think there are about two‑and‑a‑half thousand addresses here that couldn't get a five megabit per second copper‑based service but could with wireless, and that is useful to know.
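The under‑served address question can be sketched in a few lines. This is an illustration of the logic only, not the spatial SQL that was actually run: coverage footprints are simplified to bounding boxes instead of real polygons, and the `Layer` class, the technology labels and the speeds are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    technology: str   # e.g. "adsl" or "wireless" (labels are hypothetical)
    max_mbps: float
    bbox: tuple       # (min_x, min_y, max_x, max_y); a stand-in for a polygon

    def covers(self, x, y):
        min_x, min_y, max_x, max_y = self.bbox
        return min_x <= x <= max_x and min_y <= y <= max_y


def best_speed(x, y, layers, technology):
    """Fastest speed available at a point for one technology (0.0 if none)."""
    speeds = [layer.max_mbps for layer in layers
              if layer.technology == technology and layer.covers(x, y)]
    return max(speeds, default=0.0)


def underserved(addresses, layers, threshold_mbps=5.0):
    """Addresses that cannot exceed the threshold on copper but can on wireless.

    addresses: dict {label: (x, y)}.
    """
    result = []
    for label, (x, y) in addresses.items():
        copper = best_speed(x, y, layers, "adsl")
        wireless = best_speed(x, y, layers, "wireless")
        if copper <= threshold_mbps < wireless:
            result.append(label)
    return result
```

In a spatial database the same question would typically be a join between an address table and the coverage layers using a containment predicate, returning both the geometry and the attribute columns for each matching address.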

So, we took data that we held, which was the ADSL coverage as well as the wireless coverage. We got address data from Land Information New Zealand; they have an open data service, on the same technology that we use, Koordinates. They make that available, and we grabbed it and used it.

Another thing that we did was open up our data service to some of our territorial authorities, basically councils, so our local government. Some of them were responding to a central government initiative, the one linked to the 80% fibre and the 50 megabit per second plus wireless, and a lot of them went out to public consultation to understand what people in their regions needed, wanted, required. So we made a tile service available to them so they could bring it, with some of the data that they held, into their GIS packages. They were building up maps, so they would have the coverage data that we held but also asset data that they had, demographic data; they could pull that in and take them out as maps so that they could better consult with the members of their community.

So that kind of worked. And it also let them identify what network providers were available. And again, we come to these smaller wireless players. Somebody sitting in a policy role within a council may not have been aware of some of the operators that were providing service in particular areas. And it's not like the wireless providers were telling them.

So, conclusions. On the broadband map side, from data to a user‑friendly application, there is a lot of work. Most of it is collaboration, it's dealing with people and working with people. But now that we have got this data, and have got it in a usable service, we can hopefully use it to better inform ourselves about the state of the Internet in New Zealand. And that's me. Are there any questions?


SHANE KERR: Questions? Questions? It's a big room. Thank you, I thought that was very interesting.

FILIZ YILMAZ: I feel I am being censored from this meeting. That is better. I don't want an answer from you right here, it's not a well‑formed question, but I have been thinking about this since I saw Andra's presentation as well, about the measurements on mobile broadband in Europe. There is an element of informing public policy out of these collaborations, and both efforts, yours and Andra's, require collaboration from the private sector, from the ISPs, from the operators. I am wondering if the informing public policy side was seen as a disincentive from the operators' side, more like: oh, you are going to be informing public policy with this, so can you give me more information? Did you see that resistance, or maybe the doubt, the wish to understand more about where this is going to go?

JAMIE HORRELL: Definitely, definitely. It took a lot of time to get some of the networks comfortable enough to even have their data visualised, and we are still not at the stage where they will readily let us distribute that data to others to further understand things. We are lucky that our largest provider, Chorus, which provides the majority of the fibre optic networks, is very open with its data and always has been; they seem to take the view that they will try to be as open as they can, so having them over the line helped a lot. But yes, lots of resistance. Some of the big boys were very concerned; it depends who you talk to. Some people were concerned that people might know where their mobile networks were, and is that really a problem? Certainly there was resistance, and that is why it was just about working with people and giving them the comfort that we weren't going to misrepresent them.

SHANE KERR: We do have a question.

AUDIENCE SPEAKER: Rob Seastrom, asking for myself not for my employer, obviously. Can you tell me a little bit about the open access for fibre to the home networks, is this open access at layer 1 or 2, or how does that work?

JAMIE HORRELL: It's layer 1 and 2 open access, and that is by legislation. The local fibre companies, LFCs as they are called, have to provide dark fibre and a Layer 2 bitstream with certain parameters. Entry level now is basically 100 megabits per second; some are still selling 50 and there are a couple of 30 meg products, you know, grandfathered. So it's open access at Layer 2, and it is open access dark fibre as well. Obviously different price points.

AUDIENCE SPEAKER: So there is no lock‑in to the particular technology being used like there would be say for a DSL DSLAM wholesale DSL set‑up?

JAMIE HORRELL: What do you mean lock‑in technology?

AUDIENCE SPEAKER: Meaning type of PON, RFoG, active Ethernet...

JAMIE HORRELL: Yes, all the Layer 2 stuff is owned by the local fibre company. It's not owned by the consumer or the ISP.

AUDIENCE SPEAKER: But you said open access at layer 1. Can you get dark fibre at layer 1?

JAMIE HORRELL: Yes, dark fibre is available.

AUDIENCE SPEAKER: At scale? To the end user?

JAMIE HORRELL: Yes, it's a different price point. You are talking about $400 times two wholesale versus $42.50 wholesale for 100 megabit stream. I think I have got those numbers sort of right.


SHANE KERR: Before you step down, I actually have a question as well. Since we are in a RIPE meeting, I am wondering how much of this was informed by related work going on in other regions ‑‑ I saw you used a little bit of OpenSource in your publishing ‑‑ and by similar research efforts, and how much you think you can push out to other regions?

JAMIE HORRELL: We probably weren't that informed by other regions. I mean, that is one reason I am here, we kind of really are out in the middle of the Pacific on our own, there is two of us on the team. So, if we can add something back, that is great. If we can take something back to New Zealand that is probably even better for us.

SHANE KERR: Of course, you have your own constituents to worry about. It just occurred to me, while you have unique characteristics, everywhere does, but there is also a lot of commonality.


SHANE KERR: Great. Well that's it, thank you.


That was our last presentation this morning and I am glad to see the room has filled up as people slowly made their way here. We are going to have a coffee break now and we will be back in a little bit more than half an hour. Go to the meeting page, RIPE 71, and log in with your access account and you will see rating boxes next to all of the slides. Please do that, it helps us see how much you like different presentations, what kind of things are working for the community. I think that is about it. We will see you soon.