JIM REID: Shall we get started? This is the DNS Working Group, so if you think we are talking about the RIPE database, you are in the wrong room and this would be a good time to pop through into the other room. And Peter has just told me we have five minutes left already.
Before we get started, there are one or two formalities. If you are going to have anything to say, please use the mics and speak clearly and identify yourself and your affiliation because the proceedings are being webcast and the people would like to know who is speaking and what they have to say.
I would also ask you to put all your little shiny devices into silent mode. Let's have no mobile phones ringing. We have one or two small tweaks to make to the agenda. First of all, there is not going to be anything for agenda item G, because we couldn't find a stuckee from SSAC to come and talk about the issue. That will give a few more minutes for the presentations in the first slot; we should still finish roughly as we expected, at the coffee break on the half hour.
A couple of other things to remind you of: you have the ability to rate the talks, and not just for the plenary sessions. If you have a RIPE Access account, rate the talks for the DNS Working Group; this will be a help in future in terms of speakers and topics and all the rest of it, and it would be useful for us to know about.
One other announcement: the current format for the meeting plan, as it has been run for this meeting, is likely to be replicated at RIPE 72 in Copenhagen, so this is roughly where the Working Groups are likely to be placed in the agenda for the overall week's meeting plan. If you have concerns or questions about that, because there are unacceptable clashes or other things like that, please let the Working Group Chairs know and we may be able to switch things around to try and minimise those clashes; the possibility then arises that we create other clashes elsewhere, so we can't guarantee we will change things. If you have any concerns, please step forward or let me know about it. We also have an apology to make: the minutes of the last meeting have kind of disappeared into a black hole somewhere, and I am not quite sure what happened. They should have been posted to the list some time ago and we are still trying to find out where this disconnect arose and how to deal with it. Apologies that the minutes from Amsterdam haven't been circulated; we will get on top of that, and I hope to get the minutes for this meeting circulated shortly after it finishes so we don't have this situation dragging out for much longer again.
With that in mind, that is all the additional topics I have to discuss at this particular point. I would like to ask Anand to come up to give us an update on NCC's various DNS activities. Anand.
ANAND BUDDHDEV: Good afternoon. I am Anand Buddhdev of the RIPE NCC and I am going to do a short presentation and tell you about the stuff that we have been busy with in the last few months and that we are going to be doing over the next few months. The first topic that I would like to address is K‑root. In case you are not aware of this already, the RIPE NCC operates one of the 13 root name servers of the DNS and it's called K‑root. We have been doing this for a long, long time, many years. We currently operate a large Anycast cluster that sees about 50,000 queries per second at peak time, so query rates have been going up steadily and we are hitting 50,000 now. Earlier this year we began expanding this network; we had 17 nodes previously and we decided to add several more, and we have added 17 new nodes this year. There is a little map on the screen showing the locations of some of these sites; most of them are in Europe, but there are a couple of new ones in the United States, and we now have some new ones in the Middle East as well, and also in Eastern Europe.
So, our operation of K‑root is business as usual, really. There is nothing extraordinary, except that, of course, we have a new model, and this is the model that we are using for our new sites. It's a single server, a Dell server, running Linux, and on this we install our DNS software and announce the K‑root prefixes using BGP. One of the things that we were trying to address with this expansion was to increase the diversity in K‑root's software, so whereas previously K‑root ran exclusively on NSD version 3, with this expansion we have added BIND and Knot DNS to the mix. We are running BIND 9 and Knot 1.6, and we have started using NSD version 4. We also have some diversity at the BGP software layer, with BIRD and ExaBGP, and for the sites where we have hardware routers we are using Cisco and Juniper. This helps protect against vulnerabilities in one type of software.
So one of the nice things about this expansion is that, generally, latency towards K‑root has gone down and we feel that clients are getting slightly better responses from K‑root. My colleague René published an article last week on RIPE Labs where he has done an extensive analysis of latency and responses towards K‑root, and in this article you can actually see the improvements. There are some graphs, and the link on this page leads you to a page of articles generally about K‑root, which also includes articles about the expansion process and how it's going. So, please feel free to visit RIPE Labs and read some of these articles.
The RIPE NCC also operates other DNS services, besides K‑root, because we have to operate the Reverse‑DNS trees of all the address space that is allocated to the RIPE NCC. So this is a completely separate and independent DNS cluster and this one peaks at about 100,000 queries per second.
So, on this DNS platform, we run the Reverse‑DNS zones for all the address space that is allocated to the RIPE NCC. And we also run the e164.arpa zone, which is used for mapping telephone numbers to resource records using the DNS. Earlier this year, there was a requirement to update our provisioning software, because a policy was coming into effect allowing transfers of address space between RIRs, and so our provisioning system needed to be updated so that it could adapt itself to address space transfers without manual intervention. We deployed this earlier this year; it's written in Python and it has very few dependencies, it only needs the dnspython module and netaddr, which are amazing modules in themselves. This provisioning system takes input from the RIPE Database, it reads domain objects that are created by users, and it also takes input from the other RIRs, and then it merges them into zones and publishes them.
Now, this provisioning system makes use of the delegated extended stats files which are published by every RIR, listing all the address space that they have assigned or allocated to their end users. The provisioning system makes use of these stats files and adapts itself by figuring out whether there is address space which has been transferred away from or into the RIPE NCC. If address space is transferred out of the RIPE NCC, for example, then the RIPE NCC needs to pick up zonelets from the other RIRs to stitch the delegation information back into the zones that we operate; conversely, if there is address space that was being used in the ARIN region or the APNIC region and is transferred into the RIPE NCC region, then we need to produce zonelets for them, and that is what this provisioning system does. It dumps these zones and zonelets to plain text files on disk and then we can load them into any DNS server; we currently use BIND, but we could use any DNS server, so it's flexible that way.
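To give a flavour of the kind of processing involved, here is a minimal sketch of reading IPv4 entries from a delegated-extended stats file with the netaddr module mentioned above. This is an illustration only, not the NCC's actual provisioning code; the file name and the status filter are assumptions.

```python
from netaddr import IPAddress, IPRange

def ipv4_networks(path):
    """Yield (registry, network) pairs for allocated/assigned IPv4 entries in an
    RIR delegated-extended stats file (pipe-separated fields:
    registry|cc|type|start|value|date|status|...)."""
    with open(path) as fh:
        for line in fh:
            if line.startswith('#') or '|' not in line:
                continue                                  # skip comments
            fields = line.rstrip().split('|')
            if len(fields) < 7 or fields[2] != 'ipv4':
                continue                                  # skip header/summary/asn/ipv6 lines
            registry, start, count, status = fields[0], fields[3], int(fields[4]), fields[6]
            if status not in ('allocated', 'assigned'):
                continue
            # 'value' is a number of addresses, not a prefix length, so turn the
            # range into CIDR blocks before deciding who should carry the
            # matching reverse-DNS delegations.
            for net in IPRange(start, IPAddress(start) + count - 1).cidrs():
                yield registry, net

# Example: list the blocks a provisioning system would treat as RIPE NCC space.
# for reg, net in ipv4_networks('delegated-ripencc-extended-latest'):
#     if reg == 'ripencc':
#         print(net)
```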
On this same DNS infrastructure, we also provide secondary DNS services. We do secondary DNS for the Reverse‑DNS zones of the other RIRs. We also provide secondary DNS for a number of ccTLDs, and RIPE NCC members (LIRs) who have large address space allocations can also make use of one of our servers, ns.ripe.net, for secondary DNS service.
I would like to address the secondary DNS service for ccTLDs. So a few meetings ago there was a requirement to try and come up with some guidelines from the community for the RIPE NCC to operate this secondary service for ccTLDs because previously we didn't really have any guidelines, so there was a focus group established at RIPE 67 and the outcome of this discussion was presented to the Working Group by Peter Koch at RIPE 68. And further to that, we have now published a draft of a RIPE document to the DNS Working Group. So if you haven't already seen this draft, please go look at the DNS Working Group archives, read this document and if you have any feedback, please review and comment on this.
A little bit about our infrastructure:
We have a virtual server on which we run the provisioning system, and this one generates all the zones. These zones are then fed into a pair of DNSSEC signers. The DNSSEC signers feed a pair of distribution masters, one in Amsterdam and one in Stockholm, and these distribution masters feed the zones to our Anycast cluster, which is in London, Amsterdam and Stockholm. At each of these sites we have one router and three servers, and again we have a mix of BIND, NSD and Knot for diversity, and Cisco and Juniper for the routers.
Something else that we were very busy with just before the RIPE meeting is DNSSEC. We have been signing our zones for several years now, and for the last few years our DNSSEC hasn't really seen any changes; we have been signing and rolling the keys as usual, so it was business as usual. But late last year we decided that we would like to look at rolling the algorithms of our DNSSEC signed zones, and the motivation for this was obviously the community, because the community was asking us to switch to SHA‑256. This increases the security, because SHA‑1 is now considered ‑‑ not hackable, sorry, but there can be collision attacks, as researchers have demonstrated. And also, by doing this algorithm roll‑over we thought we would share our experiences of doing it with the community, because this is something that other people may be looking at and they could benefit from our experience. So the first thing we needed was for our DNSSEC signer software to be updated, because the version we had did not support signing a zone with two different algorithms at the same time, and this is obviously a requirement if you are going to roll the algorithm. So we asked our vendor for software updates, which they provided, and we also noticed while researching this that much other signer software out there generally has poor support or no support at all for algorithm roll‑over, so this is something that we would like to encourage vendors to look at, because other people may want to do similar things.
We began testing in October: we set up a test zone, we began rolling the algorithm, and then we examined the output to see what would fail. I am not going to go into all the details, but we have written a RIPE Labs article about all our experiences and this has been published, so I would encourage you to read this article; it's available at the link shown on the slide. So please do read it. But I would just like to provide a quick summary of what we found in this testing.
First of all, we realised that when you are going to roll algorithms you must roll the KSK and the ZSK at the same time; this is something that you cannot do independently. If you are keeping the same algorithm then you can roll your ZSKs and KSKs independently, but with an algorithm roll you have to roll them together. And all the records in the zone have to be signed by both keys, the old ZSK and the new ZSK, using the old algorithm and the new algorithm. So, for a brief period, the size of the zone will be bigger and the size of the responses generated will also be larger, because there will be twice as many RRSIGs. And you have to introduce the signatures before you introduce the new keys, otherwise validators can start to fail, and you have to wait for the TTL of the RRSIGs to allow them to propagate. And then there is one very important thing: you have to keep the old keys and signatures in the zone until you update the DS record in the parent zone. If you don't do this, then some validators will be strict and will fail validation for your zone. I would like to point out that this is not a criticism of the validators; they are, I think, doing the right thing, but you have to be careful with them. So, you must update the DS record and only then withdraw the old ZSK and its signatures and the old KSK and its associated signatures. And obviously, be very, very careful; this is not a simple process and any small mistake will lead to validation failures, and we can't afford that.
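Because the ordering matters so much, a small sanity check before touching the DS record can help. The sketch below is my illustration, not the RIPE NCC's tooling; the zone, the server address and the algorithm numbers (RSASHA1 to RSASHA256 as an example) are placeholders. It uses dnspython to confirm that RRSIGs for both the old and the new algorithm are actually being served before the parent DS is updated.

```python
import dns.message
import dns.query
import dns.rdatatype

OLD_ALG, NEW_ALG = 5, 8   # e.g. RSASHA1 -> RSASHA256; adjust to the actual roll

def apex_rrsig_algorithms(zone, server):
    """Query the zone apex SOA with DO set and return the DNSSEC algorithm
    numbers seen in the RRSIGs of the answer."""
    q = dns.message.make_query(zone, dns.rdatatype.SOA, want_dnssec=True)
    r = dns.query.udp(q, server, timeout=5)
    return {sig.algorithm
            for rrset in r.answer
            if rrset.rdtype == dns.rdatatype.RRSIG
            for sig in rrset}

algs = apex_rrsig_algorithms('example.net.', '192.0.2.53')   # placeholders
if {OLD_ALG, NEW_ALG} <= algs:
    print('both algorithms are signing; safe to update the DS in the parent')
else:
    print('missing signatures for', {OLD_ALG, NEW_ALG} - algs, '- do not touch the DS yet')
```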
So, what we now plan to do is roll the algorithm of all the zones shortly after RIPE 71. We are going to be upgrading from SHA‑1 to SHA‑256 for all the signatures. And we are going to employ a slow start: we are not going to roll all the zones at the same time, we are going to first use a few sacrificial zones, let's say, so we will be using the Reverse‑DNS zones of the RIPE meeting space, because those are not used outside the RIPE meeting, and then we will do the RIPE NCC internal zones, and then all the big Reverse‑DNS zones and the forward zones, including e164.arpa.
A final slide, and this is about RIPE Atlas DNS measurements; there is one more request for feedback. At the moment the RIPE Atlas network can do DNS measurements, but the types of queries that you are allowed to perform are limited: there are 15 allowed at the moment, the common ones like A, AAAA, NS, etc. But we have had users requesting support for other query types, because they want to test some new feature, or they have records in their zone which are not supported. Our question is: what does the community feel? Would you like RIPE Atlas to only allow query types that are registered in the IANA registry, or would you like it to allow all types, or should it be all except for some? We would like your feedback on this, so please come and see us, talk to us about this, or send e‑mail to atlas@ripe.net with your feedback. And with that, I end my presentation and ask for questions.
JIM REID: Are there any questions for Anand? I think Geoff is going to get there first.
GEOFF HUSTON: Slide 12: where did that requirement come from, that you have got to sign everything with both ZSKs and you have got to roll the thing at once? Because I don't believe it's in DNSSEC itself, so there must be an implementation out there forcing you to do that.
ANAND BUDDHDEV: RFC 6840, which says that you have to have signatures of both algorithms in the zone. It does say to validators that they should be lenient, and the requirement is more for the zone signer to produce this, but we discovered that Unbound and Verisign's public DNS were following this particular part of the RFC rather strictly, so they wanted signatures of both algorithms to be present while the DS record was still pointing to the old KSK.
GEOFF HUSTON: Do you think it's an old DS record issue or do you think it's an issue within the zone itself?
ANAND BUDDHDEV: It's the old DS record issue, that is the problem.
GEOFF HUSTON: Thank you.
AUDIENCE SPEAKER: Dave Knight, Dyn. Anand, you mentioned that you are running different software implementations on K‑root; I found that makes fault diagnosis more interesting than it could have been. I was wondering if you could expand on that a bit: specifically, have you done any work to verify that the answers that you receive from each of them are the same? I will let you answer that.
ANAND BUDDHDEV: What we have done is, before we deployed Knot and NSD, we had scripts that run queries against a zone loaded into each of the three, and we have looked at the responses to several of these queries to see if they are identical. In some cases we have found differences, and if any of you were following the Knot users mailing list earlier, when I was steadily sending in a stream of reports about things that were different, then you may have noticed us trying to polish things up there.
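A minimal sketch of that kind of cross-implementation comparison, for illustration only: the addresses are placeholders, and real checks would also compare flags, EDNS behaviour, case and TTLs.

```python
import dns.message
import dns.query

SERVERS = {'bind': '192.0.2.1', 'nsd': '192.0.2.2', 'knot': '192.0.2.3'}  # lab addresses

def divergent_answers(qname, rdtype='A'):
    """Send the same query to each implementation serving the same test zone
    and report which ones differ from the first answer seen."""
    answers = {}
    for name, addr in SERVERS.items():
        q = dns.message.make_query(qname, rdtype)
        r = dns.query.udp(q, addr, timeout=2)
        # Compare a normalised view of the answer section only; header flags and
        # name compression also matter in practice but are omitted here.
        answers[name] = sorted(str(rr) for rrset in r.answer for rr in rrset)
    reference = next(iter(answers.values()))
    return {name: ans for name, ans in answers.items() if ans != reference}

print(divergent_answers('www.example.net.'))
```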
AUDIENCE SPEAKER: And my second question then was: how have you distributed that? Do you have single locations which answer with one software implementation, or are you doing this on a per‑server basis within a location?
ANAND BUDDHDEV: At the multi‑server sites we have three servers and each runs a different software. For K‑root sites where we have a single server, we just pick one at random and try to balance out the numbers of NSD, Knot and BIND approximately equally.
DAVE KNIGHT: One recommendation: if you are very lucky, when people are reporting problems they will give you an NSID, but it is very rare that they will look up the version. If you can expose the host name there, that can often be very helpful.
ANAND BUDDHDEV: Right, thanks for that suggestion.
AUDIENCE SPEAKER: Speaking for myself. Just out of curiosity, I wanted to know why you had a structure with only one type of DNS server and why you expanded; what was the reason behind it? Because usually people tend to stick with the thing they know works and not try more ‑‑ why did you ‑‑
ANAND BUDDHDEV: It's to guard against vulnerabilities in one implementation affecting our entire cluster of DNS servers. I mean, it's a known fact that DNS servers now and then have packet‑of‑death vulnerabilities which can cause them to crash, and if such a packet was sent to our cluster and took down every server, then that would cause a big service outage; with this set‑up we can guard against this type of problem.
AUDIENCE SPEAKER: Dimitri. I see that you use multiple DNS servers and a load‑balancing process, the round‑robin answer. What was the way that you used the two BGP daemons: at the same time, or is it one in one location and one in another?
ANAND BUDDHDEV: You mean ExaBGP and BIRD for example?
AUDIENCE SPEAKER: Correct.
ANAND BUDDHDEV: That is a different architecture. We have the K‑root model which is deployed at Internet Exchanges, and there we use BIRD because that is a full routing daemon, and ExaBGP where we only want to announce the BGP route and we don't want to modify the FIB.
AUDIENCE SPEAKER: Both running at the same time and I was wondering how you do that.
(Applause)
JIM REID: Next up we have Chris Baker from Dyn who is going to give a talk on resolver stuff, pretty much resolver and statistics stuff for the rest of this session.
CHRIS BAKER: Good afternoon. Hopefully we are all lively and awake after lunch. I do data analysis over at Dyn, and I am going to talk about some questions that came up about the announcement of the default expectation of using IPv6 for OS X. I know we all love IPv6, it goes everywhere, and the question comes up: why haven't we done it yet? And I know the folks at APNIC have done some great measurements to try and explain this, and some presentations using Atlas about some of the barriers.
Some background about how this came to be. All of a sudden, on one of the IETF mailing lists, somebody mentioned that Apple was going to change to a default of IPv6. Striking, I know, and being a very selfish person I looked in my pocket and realised I use an iOS device: what does this mean for me? Especially for the privacy people in the audience, due to the plain text nature of DNS, it's good to know where your query might end up going. So thinking about this with a more fully vetted model was of high interest. And then at the bottom there was a section where they mentioned they would like somebody to take a look at this and try to figure out and measure the impact of it on end users, and that is when things started brewing.
Now, yesterday somebody was giving a presentation about some of the new tools they developed using RIPE Atlas, and they mentioned they worked for a bank, so it was easy for them to do a lot of exploration. Not working for a bank, I had to beg, borrow and steal all my credits from my co‑workers. So, what we really wanted to know was how this is going to affect our global Anycast infrastructure, given how the change is going to be implemented.
What is being set up is a race between the two protocols: initially the AAAA query will be issued and then the A, in the event that it's not in the local cache. And, you know, it's a race, like a global race, to figure out which answer gets back first, and because they are trying to switch the preference there is this notion of a timer of 25 milliseconds: the A record response is handicapped by those 25 milliseconds, and as long as the AAAA can get in within that bound, the AAAA wins the race. So when you look out into the world all of a sudden this starts to boil up; you look at Twitter and social media and people are all of a sudden calling this a 25 millisecond tax. And that was an interesting response, because we didn't really know it was going to be a tax, we didn't really know what it was at all, because as far as I knew at that point nobody had done any testing. Of course the first thing you do when you see a 25 millisecond timer is assume the worst: all of a sudden my Internet is going to be that much slower. But being quantitative people, you figure out how you can start to think about measuring that. So, here is a description of the mental model that I came up with when I thought about what we could quantify in this space, and one of the reasons I wanted to give this presentation here is that I am wondering what people think about it and whether it makes sense; there are various holes based on tooling, and I will walk you through the rationale. We run a global Anycast network, and the question is: if we issue a DNS query over v6 and over v4 to this Anycast network, what is the impact going to be? Do they end up at the same data centre or at different places, and what is the corresponding impact on latency? In the event they do end up at different data centres, why? Are these some of the bumps in the road that we are going to face? After that, I don't know about you, but having worked in the Anycast space for a while, at the authoritative level, we all know there are these non‑RFC‑conformant resolvers that live out in the world and do horrible things to our servers, and that is how I think about them; they are nasty things, and this can be anything from not understanding large query responses and continuously sending TCP requests to a server, to issues with caching or assumptions made about caching which aren't RFC compliant. So I assume the worst and that everything is horribly broken: if the NODATA response for the lack of a AAAA record isn't properly implemented, the 25 millisecond tax might be real, because you would need to do a full recursion for each of those AAAAs but not for the As. I am not saying this is rational; this is more the paranoid response that boils up from seeing the weird code which runs the Internet.
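To make the mechanism concrete, here is a tiny model of that race. It is my reading of the described behaviour, not Apple's code: the AAAA answer wins whenever it arrives within the 25 ms resolution delay applied to the A answer.

```python
RESOLUTION_DELAY_MS = 25.0   # head start given to the AAAA answer, per the announcement

def preferred_answer(rtt_aaaa_ms, rtt_a_ms):
    """Return which address family the client would use, given the time each
    answer takes to come back from the resolver."""
    if rtt_aaaa_ms <= rtt_a_ms + RESOLUTION_DELAY_MS:
        return 'AAAA'        # the v6 answer arrived soon enough, so v6 is used
    return 'A'               # the v6 answer was more than 25 ms slower, fall back to v4

# Example: a Sydney probe whose AAAA is resolved in the US (~150 ms) while the A
# is answered locally (~20 ms) falls back to v4; roughly equal paths prefer v6.
print(preferred_answer(150, 20))   # -> 'A'
print(preferred_answer(22, 20))    # -> 'AAAA'
```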
So, lacking the wonderful APNIC system, you turn to RIPE Atlas: how can I structure some tests and measurements on here to explore these questions? As I mentioned, we operate a global Anycast network, so we have a way of probing things using Atlas and looking at the way that traffic got to our end points.
So from there, I was off to chaos queries; thanks to the Anycasters presentation we have a pretty good overview of these. We are going to issue chaos queries for hostname.bind and figure out where these are going, over v4 and v6, and reconcile the different places these things end up.
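For reference, the per-probe query itself is straightforward; a sketch with dnspython (the anycast service addresses below are placeholders, not Dyn's) would be:

```python
import dns.message
import dns.query
import dns.rdataclass
import dns.rdatatype

def anycast_instance(server):
    """Ask an authoritative anycast address which instance answered, via a
    CHAOS-class TXT query for hostname.bind."""
    q = dns.message.make_query('hostname.bind.', dns.rdatatype.TXT,
                               rdclass=dns.rdataclass.CH)
    r = dns.query.udp(q, server, timeout=3)
    return [txt.decode() for rrset in r.answer for rr in rrset for txt in rr.strings]

# Compare the instance reached over v4 and over v6 for the same service
# (203.0.113.53 / 2001:db8::53 are placeholder addresses):
print(anycast_instance('203.0.113.53'))
print(anycast_instance('2001:db8::53'))
```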
An important mention is that these findings are going to be somewhat specific to our network, and this is more a question of methodology; it is due to the ISPs we use and our network's topology. So, when you have a massive set of data, the question of how to put it on a PowerPoint slide is often one of the largest challenges you have to deal with in testing; it's not collecting the data or doing the analysis, it's how you can communicate it to the audience. We will start with text and walk through different scenarios. As you all may know, Oceania is a weird and wild place; this could be a mix of how we have deployed our infrastructure or how peerings are set up around the world. We have got examples: South America is interesting, and Africa is the most interesting of them all. Let's talk about Oceania. Oceania in general presented not quite what I expected to find, but the type of abnormality which drives you to think this test might mean something in the rest of the world. You will see if this is true as we go forward. In a few cases you have those strange queries that go from Australia to the west coast of the US, in both v4 and v6, and you run to your networking team and you ask how this could be possible, why would this ever be a thing that somebody chose to do, and you get lectured on how the Internet works. The more interesting cases were the ones where the v4 query was properly resolved at our Sydney data centre and the v6 queries went off to the US. As you go through and look at these probes, some have native v6 access and some don't, so that is another set of variables we have to consider: what does it mean? And the difference here is that, because it's going all the way to the west coast, the latency of that packet reaching the west coast versus going to Sydney means that in that one case the 25 millisecond tax would be real. You are going to say, hey, man, you are skipping the recursive, that is not fair, going right from a probe to the authoritative name server; sadly, yes. That is one of the gaps. I know we don't have the Google ‑‑ but there are some things we can learn from this, and we are going to go through those. So we look at the Middle East and Asia and see some splits, some where the world is cut in half, some resolved to Europe and Asia, and Tokyo and Los Angeles ‑‑ I wonder how that probe got there. South America: always an interesting split between landing in Miami and the west coast of the US. But on to the more interesting questions.
So, can we say that the network topology between these two protocols is different? Yes: from the same point we are going to two different places based on protocol. Is this the path the recursive would have taken? No, it's not; caveats, lies and statistics. There are some geoIP assumptions built in there, so the tower of lies continues. But the impact on latency of this v6 tax was the next notion that I wanted to consider, and that really relied on looking at the paths things took, and that is where things start to get more interesting. We take the populations we used for the initial v6 and v4 testing, pop those into a traceroute, see what path it took to get there, and go back and look at how those paths diverged.
So I present to you here a possibly geoIP‑distorted picture of Africa, and I was kind of amazed because everything looks so normal. You look at this picture and you think: if you were to guess which of these had a more developed or a more studied protocol usage, which would it be, which is v4 and which is v6? Surprisingly, the more normalised picture with the straighter lines and understandable triangular shape is the v6 map. So then you ask yourself how that could be, and this is where you start to dig into some of those strange things. Initially I liked to jump to the conclusion that it had to do with cost offsets, that it was cheaper to put traffic in one place than another, but the more I dug in the less it made sense. In one case it did look like a cost example: we have Level 3 in Frankfurt and in Amsterdam, and in this case the packets would go to Frankfurt, hang out for a bit and go down to Amsterdam, and you start to consider how this decision is made and what is going on. And all we can really do at this point is think of oddities, because we don't have any proof and I haven't heard back from Level 3 yet; knowing why that would be isn't something I have now, but it's a follow‑up item for the end of the research.
The most interesting part was some of these Hurricane Electric tunnels: in the traceroute paths they would pop up constantly, and you ask yourself what the motivation in using that specific tunnel was for some of these customers. One of the very interesting ones was a probe that existed in Sudan and was using a Frankfurt tunnel end point; that is a highly suboptimal decision, and you think about some of the other implications of why you might want to put traffic in one place versus another. So understanding where your tunnel is is going to be very important for the impact on latency. In a few cases we see some interesting activity where people were using a Hurricane Electric end point that existed in Miami, very close to them, however the traffic would end up being backhauled up to Ashburn. That raises another set of questions we have to think about: in this world now we have to think about safe harbour and where data is going, and just thinking about the rudimentary question of where our packets are going becomes a harder, more challenging situation, because we assume a normalised v4/v6 topology.
Path variation, it turned out, very much is a thing, and in a lot of cases coming up with a solid reason for why it fell the way it fell was more challenging than I would have liked. Now on to the fun stuff, the real meat and bones: the assumptions about recursive resolvers working and recursive resolvers being RFC compliant. How does one start to think about testing this? Well, if there is a 25 millisecond delay, if it actually exists, and to see how it would come to be, you have to structure a test which can expose all those different scenarios.
So, as I started to think about it, that required creating three zones: one zone that had both an A and a AAAA record, one that only had an A, and one that only had a AAAA. Through these we can explore the different patterns that might arise from something asking for all the different variations and look at the timing differences in the responses; if we did that long enough we could throw away network issues as causes of latency and try to discover whether there was a differential in timing ‑‑ allowing for multiple queries within the TTL and an assumed model of behaviour of what that would look like when you got to the cache.
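A sketch of what the probe-side measurement could look like under those assumptions, using dnspython 2.x; the three zone names are hypothetical stand-ins for the test zones described above, and real measurements would repeat this many times and record far more context.

```python
import time
import dns.exception
import dns.resolver

TEST_ZONES = {
    'dual':      'dual.example.net.',     # A and AAAA present
    'a-only':    'v4only.example.net.',   # AAAA answered with NODATA
    'aaaa-only': 'v6only.example.net.',   # A answered with NODATA
}

def timed_lookup(name, rdtype):
    """Time one lookup through the locally configured resolver; NODATA is an
    expected outcome here, so it is not treated as an error."""
    start = time.monotonic()
    try:
        dns.resolver.resolve(name, rdtype, raise_on_no_answer=False)
    except dns.exception.DNSException:
        pass                               # timeouts/SERVFAILs count as network noise
    return (time.monotonic() - start) * 1000.0

for label, name in TEST_ZONES.items():
    print(label,
          'A %.1f ms' % timed_lookup(name, 'A'),
          'AAAA %.1f ms' % timed_lookup(name, 'AAAA'))
```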
The first scenario is dual stack applications, that is the zone with both A and AAAA ‑‑ and of course we start with the same population in the United States, because it's home. But then, as I was looking at the response data, there are a number of variables we have to think about in this case: there is the destination resolver, and is the destination address local to the probe or is it something like Verisign Public DNS or OpenDNS; the response time; the resource that was requested; and the AS that the query originated from. So now you have a large blob of data and more variables than you considered, so things get more fun.
So then you think: how am I going to pull all of this together? If I start to look at things as a simple time series by destination address only, what am I going to be missing? Locality, and what the quantile ranges look like for that individual probe. So we can't group everything initially by destination address; first we have to look at each probe and what that experience looked like.
So we started looking at the quantile distribution of that resolution, and essentially there wasn't a statistically significant amount of variation that one could attribute to the resolvers. Most of the variation appeared to be network based, though my sample population isn't large enough yet. So, when considering destination resolvers, we were unable to find any where the destination resolver itself was manifesting this behaviour. If we start looking at the same distribution probe by probe, can we find any of these that are manifesting the behaviour we are looking for, somewhere where the distribution of response times varies enough to assume it's based on the record type and the request type? Well, it turns out not really: the distributions are all pretty normal across all the probes and nothing is really outlandish, and the things that you see which are large outliers are all assumed to be either network issues or packet loss.
So, the continued study here is just getting a larger population set and testing more of the Atlas surface area to see if there are areas where this arises, or stopping the testing, assuming I was totally wrong, and looking back at the world and saying: don't worry, iOS, everything is going to be OK. So that is pretty much what we ended up finding. The notion of the 25 millisecond v6 tax is, at this point, pretty much disproven at the operating system level, barring routing issues for v6 specifically, which you would probably have anyway if they didn't switch the default. So, does anybody have any questions?
JIM REID: Any questions for Chris? Going ‑‑
GEOFF HUSTON: Really confused. Geoff Huston here. I really am completely confused by what you are trying to do. If what you are saying is, I have looked at Apple giving effectively a 25 millisecond variance on DNS resolution, then the transport protocol used for the DNS queries, stub to recursive and recursive to authoritative, doesn't matter in the slightest; it's what you query for, it's not how you query.
And the real issue is: As you pointed out, you know, whether you ask for an A or AAAA makes no difference, right.
So in that respect, what I was thinking you were going to do was not what I saw you do, which was to look at the variance with multiple iterations of actually going through the normal DNS resolver mechanism, because Apple's choice of 25 milliseconds versus Chrome's choice of 300 is the interesting issue, and it's that variance I thought you were going to look at. Because I am really confused about all of this: the DNS does not use v6, right, it just does not use v6 as a transport. If you go and look with a magnifying glass you will find a few percent. Google's Public DNS doesn't use v6 if it can get away with it; almost no one does, it's just the way they work. I am trying to put what you said in context and I am really confused.
CHRIS BAKER: I think a lot of it is about the assumptions we were making from our resolver standpoint. The question was where queries end up, whether v6 itself could impact which Anycasted result somebody would get and change the response ‑‑
GEOFF HUSTON: You are saying v6 transport. Correct?
CHRIS BAKER: Yes.
GEOFF HUSTON: And the issue is almost no one does DNS queries as a v6 transport.
CHRIS BAKER: I didn't have any stats to look at that to be honest.
GEOFF HUSTON: I have done a lot of that, have a look at DNS OARC from October, you will drown in the stats.
CHRIS BAKER: I started looking at some of the posts that were out there showing an increase in v6 traffic related to OS X and iOS rolling out, so I guess ‑‑
GEOFF HUSTON: Data plane, not DNS. The DNS does not do happy eyeballs and does not even prefer v6; it violently shies away from it in most conversations between recursives and authoritatives. It's just the way these guys write their software. Something about packet too big in UDP, I don't know.
Dimitri: Geoff, you are kind of wrong, at least on ccTLDs: we see a steady rate of v6 ‑‑ we love v6, and yes, indeed, not every server does happy eyeballs at the DNS transport level, but that is not for this talk. There is a lot of curiosity for servers on v6; whether the public resolvers use it as transport is another question, and I am sure there is somebody here from Google who can speak to that.
JIM REID: Thank you, thank you, Chris.
(Applause)
Next talk is Joao Damas from Bondis, a talk on TCP traffic.
JOAO DAMAS: Hello everyone. I know some of you, a few of you, have seen this talk already elsewhere; I guess that is a symptom that you are travelling too much. What would happen, since it has been discussed in the past, if suddenly we stopped using UDP as a transport for DNS and just went on and used TCP for everything? It has its advantages in terms of spoofing avoidance and so on, but it comes with a penalty; we just wanted to find out what that penalty might be.
We will start with the data. The data we took is a capture from two recursive servers at a medium‑sized ISP. One is around a day, the other one is slightly bigger, and that is what they look like. They have basically a sustained rate of between 200 and 400 queries per second throughout the whole period, with peaks, so that you have a baseline to compare what comes next.
It's not one of the heaviest used servers as you can see but still has sufficient data there that it's of statistical significance.
So, if you do this change, if you start using TCP instead of UDP for everything in DNS, what would happen and what is the cost we incur? And that basically is the cost of maintaining state, particularly on the server. The client can probably get away without suffering much from it, but the servers, the recursive servers, are going to have to keep a lot more state than they keep now; with UDP, basically, everything is over as soon as it starts.
So this is a simulation of what the load might look like, depending on the timeouts. In TCP there are several timeouts that affect how long you keep information. There are timeouts waiting for clients to close, in case they don't; there are timeouts waiting for out‑of‑order packets; so these timers extend even beyond the point when the TCP connection is properly closed by both the client and the server. So even after the FIN and FIN‑ACK, data is still kept around for a little while longer on the servers, on the end points really, both of them.
So as you can imagine, we simulated what would be the impact of these different trailing times, how long the information gets to stay on the server side, on the recursive server. Obviously, if the timeouts were down to zero you would have UDP behaviour, or stateless TCP behaviour, so that would be the baseline. The Linux default would be the yellow line, the 120‑second timer, and you would have to be careful if you were to run a server in that mode, because you would be multiplying the number of established connections by ten compared to the UDP case. This is important not only because of the amount of resources consumed, but also because there is a limited number of ports. If you are going to keep 6,000 connections open at any time and you keep them for ten seconds, well, after those ten seconds you have used all of your ports and you cannot accept new connections, so then it becomes a problem of more than just load, right? So tuning these parameters is quite critical. But if you do it properly, if you are aware, if you look at what we have learned from running high‑throughput HTTP servers, I think it actually could be done; but if you jump naively into it you will be in big trouble.
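The arithmetic behind that warning is simple enough to sketch. The connection rate and the ten seconds of lingering state are the figures used above; the port range is the common Linux default and is an assumption on my part.

```python
# Back-of-the-envelope check of TCP state pressure on a resolver that does all
# upstream queries over TCP.
new_connections_per_second = 6_000     # figure used in the talk
linger_seconds = 10                    # time a closed connection still holds state
ephemeral_ports = 60_999 - 32_768 + 1  # typical Linux ip_local_port_range

lingering_state = new_connections_per_second * linger_seconds
print(f'{lingering_state} connections in some TCP state vs {ephemeral_ports} ephemeral ports')
# 60000 vs 28232: towards any single busy peer address the port range runs out
# well before the state expires, so the timers have to be tuned down aggressively.
```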
One of the benefits that you might consider getting if you use TCP instead of UDP is that maybe you can reuse an already open connection for subsequent queries and therefore lower the cost of using TCP back towards what UDP costs: you do the handshake once to get things started, and after that it's one packet that goes and one that comes back. Looking at the different timeouts, for how long you would be willing to keep a connection between the two end points, the recursive server and the authoritative server, open, what would be the chance of a benefit in implementing connection reuse for subsequent queries? Unfortunately, for most of the cases, nothing. Zero there on the X axis means there was a query and, for the duration of the timeout, up to 300 seconds, there was not a second query at all. So keeping that connection open doesn't buy you anything. There were a few reuses, and keep in mind that the Y axis here is logarithmic, but not sufficiently many to justify keeping these connections open for a long time. Unless you have special‑use hosts: very few, three hosts there and a few more at a lower level, actually would benefit from connection reuse, because as you see there they issue more than a million queries that would make use of this connection reuse. It turns out, looking at what those servers are, they are mail servers. Mail servers do a lot of DNS, particularly these days where SPF is checked, where you need to look up the MX and the backup MX, and probably other records and names, and all this gets checked before you send a mail. Those servers produce most of the load and they do queries so frequently that having a permanent connection to them would be highly beneficial.
For the ones that are worth keeping the connections for, we wanted to look at what the pattern of usage looks like. So this is basically connections for a few selected servers as time goes by throughout the day. As you can see, obviously, there is a pattern of day and night there. There are servers, like number ten, that are web servers that run their log processing sometime during the night, so they suddenly show up with a very strong signal and then go away; that is a consideration if you think of having a negotiation mechanism to keep connections up, maybe you want to include variable behaviour throughout the day in what you can communicate there.
Number 8 is one of the e‑mail servers, basically asking questions all day long.
The other ones look very small, but if you pick one and look at very, very small time scales, you will see that actually what happens is that you don't get one or two queries; when you get queries towards a server you tend to get little bursts of queries. I don't exactly know why at this point; the obvious candidate for an explanation might be that the client behind the recursive server was loading a page and the browser was doing a refresh. But keep in mind that these are connections between the recursive server and the authoritative server. You have to wonder, if people say they care so much about web page load times, then why don't you just use a better optimisation of how you spread your links and your domains instead of asking for so many different queries, right, because it affects how long you take; the time period there is a couple of hundred seconds.
Several of the arguments for why we might start using TCP instead of UDP are about the benefits you get, the needs. One of them is clear: TCP is much harder to spoof, and much harder to use for reflection attacks, than UDP; UDP is trivially spoofable, TCP is harder. The other one is message size. There have been arguments that with more and more stuff going into DNS, including DNSSEC, you then need to make the keys stronger, you implement NSEC3 which takes more space, and so on. The other mechanism we have to transport big chunks of data in DNS doesn't cut it, because it has problems getting through firewalls. Of course, that is sometimes true for TCP as well; many people out there sadly don't understand that even the original DNS specification called for DNS to be available over both UDP and TCP. But anyway, let's look at the message size thing.
So this is the distribution of observed messages, DNS messages going through this server, and the responses ‑‑ mainly the responses coming back, right. There is a high concentration of small sizes, below 1,500 bytes, so MTU size; there are a few items dispersed at higher sizes, but look at the small graph there, maybe it's too tiny and not focused enough: whereas the scale on the big graph is 200,000, 400,000, 600,000, the other is 20, 40, 60, so there are several orders of magnitude between them. Those messages are already falling back to TCP today, because when you get an 8,000‑byte message from an authoritative server there is no other way around it, you're going to have to do this.
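A quick way to reproduce that kind of measurement for a single name; this is a sketch, not the script used for the study, and the query type, EDNS buffer size, zone and server address are arbitrary placeholder choices.

```python
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

def response_size(qname, server, rdtype=dns.rdatatype.TXT):
    """Ask an authoritative server over UDP with a large EDNS buffer and DO set,
    and report the wire size plus whether it exceeded a 1500-byte MTU."""
    q = dns.message.make_query(qname, rdtype, use_edns=0, payload=4096,
                               want_dnssec=True)
    r = dns.query.udp(q, server, timeout=5)
    size = len(r.to_wire())
    return size, size > 1500, bool(r.flags & dns.flags.TC)

size, over_mtu, truncated = response_size('example.net.', '192.0.2.53')  # placeholders
print(size, 'bytes,', 'over 1500' if over_mtu else 'fits in 1500', ', TC set' if truncated else '')
```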
For the DNSSEC traffic specifically, I was a little bit surprised, because we did not observe any signed message that exceeded 1,500 bytes. All of them actually fitted. All the messages bigger than 1,500 bytes were not signed at all. So that was, for me at least, quite counter‑intuitive; not what I expected.
As for the difference in what happens to message size when you sign a zone with NSEC versus NSEC3, it's clear to see there: the green graph represents the message size distribution for NSEC‑signed zones and the blue for NSEC3. NSEC3 makes messages bigger, but it's not much of a problem in the general case.
So, as I said, some sort of conclusions: message size doesn't seem to be a big problem, nor the management of it. The TCP timeout is quite critical; you can very quickly run into a wall if you use default installations, you have to tune things. This is a general thing for DNS: we need more mechanisms for signalling between end points in DNS. The current way of doing things ‑‑ think even of the EDNS buffer size: it's like, I send you something big, and if it's too big I will try with something smaller, and if that's still too big, something smaller again, and that is just annoying and a waste of time for everyone. So having the capability of the two end points telling each other, having an efficient way of agreeing, would be very nice, and then caching that information, of course.
Connection reuse could be used, but only in a small number of cases; it's not a general behaviour that will always yield a gain. And this was a study between recursive servers and authoritative servers, so looking outwards from the ISP.
I want to have a quick look at what it looks like on the inside; the query load on the inside is a lot bigger, and we have the data set, so we will be looking at that. And that was it. OK.
JIM REID: Any questions for Joe? No.
AUDIENCE SPEAKER: I have one. João, you say you have been doing this work in a lab setting; have you any plans to do TCP queries to auth servers on the Internet, to measure things like brokenness caused by firewalls and routers that interfere with TCP traffic?
JOAO DAMAS: We do include that because we wanted to see what was the study ‑‑
JIM REID: Do you plan to do that in a future phase of the work?
JOAO DAMAS: It would be nice to trace, because there is the argument that EDNS is not a solution because it gets stuck in several places in the network; maybe finding out what the case is for TCP would be worthwhile.
SHANE KERR: Shane Kerr, BII. I have a few questions. I feel like I have seen this exact presentation before, but for some reason it captured my interest this time. So I don't know.
So, given the packet sizes that we are seeing, even on the DNSSEC side, it seems like maybe the recent push for elliptic curve cryptography may be an anti‑goal, at least from the authoritative to the resolver, which is the path currently suffering today. Does that seem reasonable to say?
JOAO DAMAS: To trade off the amount of data that you exchange versus CPU, right?
SHANE KERR: Right.
JOAO DAMAS: CPU seems to be in abundance today actually.
SHANE KERR: Crypto 2 ‑‑ anyway. One thing that is missing in your list of motivations for TCP is just the effect of middle boxes on traffic in general: there are a lot of packets bigger than 512 bytes, and a lot of crappy firewalls and old rules and things like that will cause problems with fragmented packets, so that in itself ‑‑ I have been doing some work, as you know, recently on application‑level fragmentation, and a common refrain against that is: just use TCP. It means there is a third or fourth or however many motivation for using TCP.
JOAO DAMAS: That goes with Jim's question before. Maybe it would be worthwhile checking how much that applies.
SHANE KERR: I have some ideas we can talk off‑line. This is the last thing. I guess you are probably following the work in the IETF on the TCP recommendations and TCP keep alive. From your presentation it actually seems to me it argues against a lot of complicated signalling and management options between servers, because it almost seems to me like, as the client ‑‑ well as the recursive resolver in this case, you basically have all the motivation in the world to cheat which means the authoritative servers have all the motivation not to trust anything that any resolver is going to give them. We are not in the Internet from 30 years ago, you basically have no trust relations any more.
JOAO DAMAS: I think it's a universal truth that clients have all the motivation in the world to cheat about everything. They will set the traffic priority in TCP so downloads go faster; they will do anything.
SHANE KERR: Right. Get me my data.
BRUCE VAN NICE: .
JIM REID: Running slightly behind time, so could questions and answers be as brief as possible.
GEOFF HUSTON: Answering your question, Jim: we did this exact same experiment about two years ago, doing a broad‑scale measurement against the authoritative servers that we own, with the recursive servers that people use asking the authoritatives. About 7% of recursive resolvers will not use TCP: you hand them a tiny response with the truncate bit set, they don't like what you are doing and will not ask again. Whether it's filtering firewalls stopping it or them just going "I am not doing TCP" ‑‑ so 7% of resolvers, but that only affects a much smaller number of users, and most of those users go and use another resolver and get the answers. A little over 2% of all users, which is what, 30‑odd million or so, just a few, get stuck; they will not get an answer. But it's not that big in the grand scheme of things. So it's not that there is a whole bunch of folk that just will not work over TCP; there is a small pool of folk and a bigger pool of resolvers. If you want the work done again, it can be done again, but it's not that serious a problem.
JIM REID: Thanks. I guess we are done. Thank you very much, João.
(Applause)
Ondrej is up next with his work on a resolver test‑bed for the Knot DNS resolver.
ONDREJ SURY: I am going to present a thing we did as part of the development of the Knot DNS resolver. We call it Deckard, and I think that we have all seen things that the DNS user wouldn't believe.
Testing DNS software is a hard problem, because on the one hand it's standards compliance, but on the other hand it's real‑world compliance, because resolvers are not always on par with operational experience. And tests have to be repeatable; and, well, should we test on the live Internet? Especially for resolvers that could be a problem, because the network could be changing all the time. Or should we set up a complicated test lab, or have hooks in the code? Well, I am sure we have all set up a complicated test lab, and it might look like this in the end.
So, what we did: we developed something like a software test lab as part of the Knot DNS resolver, and we simulate everything at run time. The thing creates a controlled environment and wraps the syscalls and library functions with its own library functions, and the tests are very fast, so you don't have to wait for everything; because it's fast it can be included as a regular part of your development cycle, your continuous integration and tests and stuff like that. It is composed of several parts, but these are the most important: socket_wrapper, which creates a fake networking environment ‑‑ we modified the library, because the original could only work with local addresses, so we have a copy of it (which we intend to send back upstream) that can also fake real addresses on the Internet.
Also, applications in user space can bind to privileged ports, and as they are running inside the socket wrapper they communicate with mock servers, which are scripted. The other part is libfaketime, which can change the flow of time and do time manipulation on the tested binary, and the last part is Jinja2, which is a way to create configuration templates that are passed to the DNS servers.
The test scenarios were heavily inspired by the Unbound replay test cases that are inside its code; we even use some of those tests and we developed some new ones. The good thing is that it runs a production binary as a process, a sort of black‑box testing: you don't have to have any hooks in the code, and all the network communication is redirected through the socket wrapper. The tests are declarative: a description of the environment, which means the variables in the configs and the network configuration, and then just a sequence of DNS messages ‑‑ queries to send, answers to respond with to specific queries, and expectations about the answers given by the server.
One of the example scenarios is a Jinja2 template for the Knot resolver. Does this work? So this is just the network configuration, where it should listen; then there are some modules, that's not important. The important stuff is in the double braces, which get replaced by the Deckard programme. So you can turn query minimisation on or off and stuff like that, and set up the Trust Anchors; these are the important things.
And then, I am not sure, is it visible, the example here? At the beginning is a description of the config, which is like the local addresses and stuff like that; it's not really important for this test. Then you can have a range with numbers, which describes the packets that will be given as answers ‑‑ this part here, so the range goes from here to here ‑‑ and it can include multiple packets defined by ENTRY_BEGIN and ENTRY_END. Each entry has a filter which will trigger on an incoming packet; then you can do adjustments to the reply and set different flags, the RCODE, stuff like that; and there are the usual sections ‑‑ question, answer, additional ‑‑ written in just standard zone file format.
And then the test continues with the steps. There can be several steps, and in step 1 for this test, which is about a lame root server, it sends a query to the tested server, and then in the next step it expects some specific answer to that. We extended the format with RAW, which was not in the original Unbound test cases; it's just encoded binary content for the packets.
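For readers who cannot see the slide, a stripped-down scenario in the Unbound-derived replay format that Deckard reuses looks roughly like the fragment below. This is an illustrative sketch from memory of that format, not one of the shipped tests; the names, addresses and exact set of keywords may differ from the real scenario files.

```
; config section rendered from the Jinja2 template goes above CONFIG_END
CONFIG_END

SCENARIO_BEGIN Minimal example: answer a simple A query.

; packets the scripted mock server will give as answers during steps 0-100
RANGE_BEGIN 0 100
	ADDRESS 192.0.2.1
ENTRY_BEGIN
MATCH opcode qtype qname
ADJUST copy_id
REPLY QR AA NOERROR
SECTION QUESTION
www.example.net. IN A
SECTION ANSWER
www.example.net. IN A 192.0.2.10
ENTRY_END
RANGE_END

; query sent to the binary under test
STEP 1 QUERY
ENTRY_BEGIN
REPLY RD
SECTION QUESTION
www.example.net. IN A
ENTRY_END

; expectation about the answer it gives back
STEP 2 CHECK_ANSWER
ENTRY_BEGIN
MATCH all
REPLY QR RD RA NOERROR
SECTION QUESTION
www.example.net. IN A
SECTION ANSWER
www.example.net. IN A 192.0.2.10
ENTRY_END

SCENARIO_END
```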
So, about the development of the software: again, it's free software, open source, found on GitLab; there is a scenario guide and more scenarios are available. We have some tests for the authoritative part, because it's basically agnostic to the binary; it just expects some stuff to be sent and received on the network interfaces. And there are also more complicated examples if you want to look at them. You are very welcome to participate: we need more test cases for resolvers, for other DNS software and even for DNS tools, and we need support for more servers to test and templates for their configuration. It was mentioned at the last DNS‑OARC meeting that it would be nice if somebody did testing of DNS servers, and frankly I think this would be a very good utility to do that, because it doesn't take many resources to set up and run these tests. If you have any questions, Deckard will answer them.
JIM REID: Any questions? Nothing from the chat room?
ONDREJ SURY: Everybody is afraid of the blade runner.
BENNO OVEREINDER: Great work.
JIM REID: Last talk is Marco Davids from SIDN about the new work they have been doing with data statistics.
MARCO DAVIDS: Hello everyone. First, apologies to those of you who have seen these slides before; they have been shown on several occasions, so I will try to keep it short. This is us: SIDN, the registry for .nl, 5.6 million domain names, half of them signed with DNSSEC. I work for the R&D department; the team we have at SIDN is called SIDN Labs, and ENTRADA, which is our small big data platform that I am here to talk about with you, has been designed within this R&D team.
The intent of ENTRADA is to collect all the DNS query data that we receive on the authoritative name servers for .nl. We are not there yet; we receive roughly 300 gigs of PCAP data per day on all the name servers combined, including the Anycast cloud that we obtained from Netnod, so at the moment we only capture roughly 10% of all that data in our small big data platform.
It took us quite a while to figure out what would be the best set‑up for us, the best infrastructure to run this stuff. We started off traditionally by looking at relational databases; that didn't work out, obviously, so we quickly moved over to NoSQL solutions, and after a small endeavour with MongoDB we ended up with a Hadoop environment: HDFS, the distributed file system, with Parquet as the storage format and Impala as the query engine on top of that. This is how it looks. In this picture there are three nodes, in reality there are five, but as you can see there is this distributed file system below, on top of that the Parquet files, and then the query engine, Impala in our case. Impala has a nice web GUI for ad hoc queries; you can enter SQL statements in there. It has a command line interface, and you can do cool stuff with Python, and there is even a Java interface.
This is the idea: to do analyses and research on DNS. As you can see, this is the system here. These are the data sources, primarily of course DNS data, but there is also an option for adding other sources; I will show you an example of that later on. It's processed in the engine and put into storage, and then we can define nice algorithms to do cool stuff with all that information, and on top of that we are hoping to build services and applications and disclose all of that through user interfaces and APIs.
Obviously privacy is an issue when doing things like this, so we have thought about that as well; you can read everything about that in our privacy framework position paper. It's at this URL, you can find it on our site; it's quite an extensive document where we clearly explain how we work. We have installed a privacy board internally at SIDN, so there are quite a few procedures and things like that that we take into account before playing with the data ourselves or even sharing it with other parties such as universities.
This is the workflow: from the name servers we collect PCAP data, which is transferred to the platform; we combine the queries and the replies together into one record, and we enrich the data, so we add, for example, geolocation to the IP addresses and also AS numbers and other information. From that point it is transferred into the system as a Parquet file and we analyse it with this Impala thing. For us the performance is satisfactory; just to give you an idea, if you would like to run a query on an entire year of DNS data, which is 2.2 terabytes in our case, that query will run for approximately six minutes or so. That is quite rare; normally we do queries on a day basis or week basis or maybe even a month, so it performs rather well.
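To give a feel for what querying such a platform looks like, here is a hedged sketch using the Impala Python client (impyla). The host name, database, table and column names are assumptions for illustration, not ENTRADA's actual schema.

```python
from impala.dbapi import connect

# Connect to an Impala daemon in the cluster (placeholder host and default port).
conn = connect(host='impala.example.net', port=21050)
cur = conn.cursor()

# Example question: which query types were seen for one domain name last week?
# Table and column names below are illustrative only.
cur.execute("""
    SELECT qtype, COUNT(*) AS queries
    FROM dns.queries
    WHERE qname = 'example.nl.'
      AND day BETWEEN '2015-11-09' AND '2015-11-15'
    GROUP BY qtype
    ORDER BY queries DESC
""")
for qtype, queries in cur.fetchall():
    print(qtype, queries)
```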
Here are just some figures to impress you, well, not really. It's a small big data cluster but it's interesting.
I would like to show a few examples of the stuff we are doing with this at the moment. One of them is the DNS security scoreboard. This is an attempt to bring together multiple sources of data. What we do here is: we have incorporated the data of Netcraft and PhishTank, and they inform us whenever a domain is used or being abused for phishing, for example. So there is this domain name with a phishing site on it, and Netcraft will inform us about that. Now, what we have done in this system is to combine that Netcraft data with our own DNS query data, and as you can see in the graph all the way at the top, the bar graph, the rightmost bar is the day when Netcraft informed us about this phishing site; as you can see, in the days prior to that we already saw different DNS behaviour, different traffic. This is a very ordinary domain name, hardly attracting any traffic at all, and suddenly, a few days before Netcraft informed us, we already noticed a difference in behaviour; and in the graph below, you can see that there is also a difference in the autonomous systems querying for the domain name. So there is clearly a relationship between what we saw and what Netcraft saw, and we hope to use this kind of intelligence to be able to discover phishing and other abuse under .nl domain names earlier.
This is another project we are running, already in production at the moment, called the resolver reputation system. The goal is to have a look at valid resolvers that are behaving perfectly according to the standards, and at other IP clients that are contacting the authoritative name servers; if you compare both of them you will see a difference, you will see odd behaviour from the malicious ones, because these are malicious IP addresses contacting us. In this case we were simply able to discover a spam botnet, and we have used that information to inform what is called the Abuse Information Exchange in the Netherlands, an initiative of ISPs working together in order to combat malware; so we feed that information exchange, it's a platform, with the information about the botnets that we discover. We discover them at the moment when they are active, so that is sooner than the traditional methods. This is an interesting product of the ENTRADA platform. And this is the reason I am here today with you: it's a relatively new initiative, we call it the open data programme. What we are hoping to achieve here is to publish aggregated data from the platform on our website; we publish it through graphs, but we also publish the raw JSON files, it's all aggregated data, and we hope to add more data as time progresses. You can think of information like who is doing TLSA queries, is DANE picking up, what percentage is IPv4 and what is IPv6, you name it, we would like to add it. That is why I am here: I would like you to have a look at that, hopefully get inspired, and give us feedback. Do you think it is useful, and would you like us to add more of that information? If you have any feedback or suggestions, please let us know, and this is how you can contact me or the team. And that is it. Thank you very much.
JIM REID: Are there questions for Marco? Well, I will ask you one: the data that you are putting on this Hadoop cluster, have you any plans to expire that, or will it grow indefinitely, or do you get rid of this stuff after 90 days or a year or whatever?
MARCO DAVIDS: There is a policy in place. This is part of that privacy board that we have internally. We will take away IP address information after 18 months. So the data will be anonymised and we will get rid of the IP addresses.
JIM REID: But the rest of the data will still remain indefinitely?
MARCO DAVIDS: At the moment, yes.
JIM REID: Thank you. Any other questions? No. Thank you very much, Marco.
(Applause)
Well, that is a famous first: we have finished ten minutes early. I have one or two announcements to make before we break for the coffee. Benno asked me to remind you that there are still opportunities available for lightning talks in the closing plenary tomorrow, if you have got anything you would like to share with the RIPE attendees. There is still time to vote for the candidates for the Programme Committee; please, if you have not voted yet, vote now, or many times if you like. The opportunity is still there to do that. We will restart at the top of the hour; have a nice coffee break. Thank you.