2014-04-18 “IPv6 Deployment in Sofia University” lecture at RIPE-SEE-3
Friday, April 18th, 2014

This is my talk from RIPE-SEE-3, on the deployment of IPv6 in Sofia University. The presentation can be downloaded in pdf or odp format.
I’ve put the slide numbers and my notes/comments in [], the rest is as close to what I said as I remember, with some corrections.
[1]
Hello. I’m going to talk about the IPv6 deployment in Sofia University. It’s an important example of how this can be done on a medium-to-large scale, with no hardware upgrades or hidden costs.
This presentation was prepared by Vesselin Kolev, who did most of the architecture of the network there and was involved in its operations until recently.
(I want to clarify, we’re not the same person, and we’re not brothers, I get that question a lot)
[2]
Please note that I’m only doing this presentation, I was not involved in the deployment or any network operations of the university’s network.
[3]
These are the people that did and operated this deployment. Most of them aren’t there any more – Vesselin Kolev is at Technion in Israel, Nikolay Nikolov and Vladislav Rusanov are at Tradeo, Hristo Dragolov is at Manson, Radoslav Buchakchiev is at Aalborg University, Ivan Yordanov is at Interoute, Georgi Naidenov is a freelancer. Stefan Dimitrov, Vladislav Georgiev and Mariana Petkova are still at the university, and some of them (and some of the new people there) are at this conference, so if you have questions related to the current operations there, they’ll probably be able to answer better than me.
[4]
So, let me start with the addressing.
[5]
This is the current unicast network that’s in use in the university. It was assigned in February 2011 and has been in use since the 11th of February 2011. It’s a /47; later on I’ll explain why.
Also please note that the maintenance of the records for the network in the RIPE database is done only with PGP-signed emails, to protect it from hijacking. As noted by a previous speaker, there are a lot of cases at RIPE NCC where bad people try to steal prefixes (as their value is rising), and that’s possible mostly because a lot of LIRs don’t have good security measures in place.
[6]
Before that, the university used a /35 from SpectrumNet (since then bought by Mobiltel), until they got their own allocation.
It should be noted that renumbering in IPv6 is a lot easier than in IPv4, and this was done very fast.
[7]
Now, on the allocation policy,
[8]
this is how unicast addresses are assigned. It’s based on RFC4291: for the basic entities in the university (faculties) there’s a /60, for each backbone network, server farm or virtual machine bridge there’s a /64, and additional allocations (on request) are the same size as the initial one.
Also, allocations of separate /64 are available for special/specific projects.
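[As a rough illustration of that policy – using the 2001:db8::/32 documentation prefix and made-up names, not the university’s real plan – it looks something like this:]

2001:db8:0::/60       faculty A (initial /60, further /60s on request)
2001:db8:0:10::/60    faculty B
2001:db8:1:1::/64     backbone segment
2001:db8:1:2::/64     server farm
2001:db8:1:3::/64     virtual machine bridge
2001:db8:1:100::/64   a specific project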
[9]
The university also uses RFC4193 (unique local) addresses for local, restricted resources. The allocations are done on a /32 basis.
[10]
Now, onto the intra- and inter-AS routing,
[11]
The software used is Quagga on CentOS.
There is a specific reason for using CentOS – the distribution is a recompilation of RHEL and follows it closely, which means it’s as stable as it gets. If there’s a security or other important update, only the fixes are backported to the current version of the software, which effectively means you can have automatic updates running on your servers and routers and be confident that they won’t break horribly.
That’s in stark contrast with almost every router vendor I can think of.
[12]
The current transit providers for the university network are Unicom-B (the academic network in Bulgaria), and SpectrumNet (now owned by Mobiltel).
The university has private peering with Digital Systems, Evolink (it changed its name from Lirex), ITD, Netissat and Neterra. It also provides an AS112 stub node.
(For the people that don’t know, AS112 is a volunteer project that runs anycast nodes with the DNS servers for some zones that generate crap traffic, e.g. “example.com”, or the reverse zones for the RFC1918 addresses)
[13]
This is the basic schema of the external connectivity. The university has two border routers, each of which is connected to both upstream providers (and to the private peers, but that would’ve cluttered the drawing too much). They’re interconnected through one of the MAN networks, and are located in the Lozenetz campus (where the University Computing Center, the main operator of the network, is) and at the Rectorate.
[14]
These are the prefixes that the university originates. Here we can see why the university has a /47 – so it can be de-aggregated for traffic engineering of the inbound traffic. That’s one problem that nothing has solved yet, and that will plague us for many more years…
Here each border router announces a /48, and both announce the /47, so they can effectively split the inbound traffic.
There’s also the IPv6 prefix for AS112, announced from both borders.
[15]
This is what every router should do with the prefix announcements it receives. Basically, everything coming from a private ASN is dropped, and any prefix that is not in 2000::/3 (the unicast part of the IPv6 space), is shorter than /3, or is longer than /49 is also dropped.
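[A rough sketch of such a filter in Quagga’s bgpd – the ASN, the peer address and the list names are placeholders, and the private-ASN regex is a deliberately loose approximation (a complete one enumerates 64512–65534 and the 32-bit private range exactly):]

cat >> /etc/quagga/bgpd.conf <<'EOF'
! only IPv6 unicast space, nothing longer than /49
ipv6 prefix-list EBGP-IN seq 10 permit 2000::/3 le 49
ipv6 prefix-list EBGP-IN seq 20 deny any
!
! drop any path that contains a private ASN (rough regex, see note above)
ip as-path access-list NO-PRIVATE-ASN deny _(64[5-9][0-9][0-9]|65[0-9][0-9][0-9])_
ip as-path access-list NO-PRIVATE-ASN permit .*
!
router bgp 64496
 no bgp default ipv4-unicast
 neighbor 2001:db8:ffff::1 remote-as 64497
 address-family ipv6
  neighbor 2001:db8:ffff::1 activate
  neighbor 2001:db8:ffff::1 prefix-list EBGP-IN in
  neighbor 2001:db8:ffff::1 filter-list NO-PRIVATE-ASN in
 exit-address-family
EOF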
[16]
Here you can see the schema of the backbone network – the two border routers, and the access routers of the faculties. They’re all under the administrative control of the network operations team.
[17]
For this schema to work efficiently, the two border routers do the job of route reflectors, and the access routers are route reflector clients – i.e. each access router has two BGP sessions, one with each border router, and that way it learns all the routes coming from the rest of the access routers.
This setup would’ve been unmanageable otherwise, as doing a full mesh of so many routers would’ve resulted in full mess.
[Ok, I didn’t think of that pun at the presentation]
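[A minimal sketch of the reflector side in Quagga’s bgpd, with a placeholder ASN and addresses – on each border, every access router is simply an iBGP neighbour marked as a route-reflector-client:]

cat >> /etc/quagga/bgpd.conf <<'EOF'
router bgp 64496
 ! one iBGP session per access router; routes get reflected to all the others
 neighbor 2001:db8:1::10 remote-as 64496
 neighbor 2001:db8:1::11 remote-as 64496
 address-family ipv6
  neighbor 2001:db8:1::10 activate
  neighbor 2001:db8:1::10 route-reflector-client
  neighbor 2001:db8:1::11 activate
  neighbor 2001:db8:1::11 route-reflector-client
 exit-address-family
EOF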
[back to 16]
The initial idea was to actually have the border routers be route reflectors only, and have all the access routers in the VLANs of the external BGP peers, for the traffic to flow directly and not have the two borders as bottlenecks. This wasn’t implemented because of administrative/layer8-9 issues.
[19]
This is how the core network connects to the faculty networks – the access router sits in a shared network with the faculty’s routers, and via OSPF (or RIP in some cases) it redistributes a default route to them.
(Yes, RIP is used and as someone told me a few hours ago, if it’s stupid and it works, maybe it’s not that stupid)
[20]
Now here’s something I would’ve done differently :)
Both OSPF and RIP are secured using IPSec; here’s what the config looks like.
[21]
This is not something that you usually see, mostly because it’s harder to configure and because of the weirdness in the interoperability of different IPSec implementations. For a university network, where risk factors like students exist, this provides a layer of protection for the routing daemons (which, to be frank, aren’t the most secure software you can find) from random packets that can be sent in those segments.
It’s reasonable to accept that the kernel’s IPSec code is more secure than the security code of the routing daemons.
This is the only thing that this setup provides over the other options – a pre-shared key is used, as there aren’t any implementations that have IKEv1 and can be used for this task.
Also, this is harder to operate and configure for untrained personnel, but the team decided to go ahead with the extra security.
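[The actual config is on the slides; what follows is only a minimal sketch of the general idea with setkey from ipsec-tools, in the spirit of RFC 4552 (manually keyed transport-mode SAs for OSPFv3). The SPI, the keys and the addresses are placeholders, every router on the segment needs matching SAs, and the neighbours’ own outbound SAs are omitted:]

cat > /etc/setkey.conf <<'EOF'
flush;
spdflush;

# require ESP for every OSPFv3 packet entering or leaving this router
spdadd ::/0 ::/0 ospf -P out ipsec esp/transport//require;
spdadd ::/0 ::/0 ospf -P in  ipsec esp/transport//require;

# manually keyed SAs from this router's link-local address to its
# neighbour and to the all-SPF-routers multicast group (ff02::5)
add fe80::1 fe80::2 esp 0x1000 -m transport -E aes-cbc 0x00112233445566778899aabbccddeeff -A hmac-sha1 0x00112233445566778899aabbccddeeff00112233;
add fe80::1 ff02::5 esp 0x1000 -m transport -E aes-cbc 0x00112233445566778899aabbccddeeff -A hmac-sha1 0x00112233445566778899aabbccddeeff00112233;
EOF
setkey -f /etc/setkey.conf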
[22]
And of course, there has to be some filtering.
[23]
Here’s a schema of one external link – in this case, to Neterra.
[24]
Here we can see the configuration of the packet filtering. This is basically an implementation of BCP38.
First, let me ask, how many of you know what BCP38 is?
[about 25-30% of the audience raise their hands]
And how many of you do it in your own networks?
[About a third as many people raise their hands]
OK, for the rest – please google BCP38, and deploy it :)
[Actually, this RFC has a whole site dedicated to it]
Basically, what BCP38 says is that you must not allow inbound traffic from source addresses that are not routed through that interface. In the Cisco world it’s known as RPF (Reverse path filtering), AFAIK in Juniper it’s the same.
In Linux, this is done best using the routing subsystem. What we can see here is that on the external interface we block everything that’s coming from addresses in our network, and on the internal one – anything that’s not from the prefixes we originate.
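[A minimal sketch of that idea with policy routing, assuming eth0 is the external link, eth1 the internal one, and 2001:db8::/47 stands in for the university’s prefixes:]

# external interface: drop packets that claim to come from our own space
ip -6 rule add iif eth0 from 2001:db8::/47 priority 100 prohibit

# internal interface: our own prefixes go through the normal routing table,
# anything else sourced from the inside is rejected
ip -6 rule add iif eth1 from 2001:db8::/47 priority 110 table main
ip -6 rule add iif eth1 priority 111 prohibit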
[25]
Here we can see the setup on an access router (so BCP38 is deployed on both the border and the access routers). At this level we can be more specific and allow only the end-user networks.
[26]
On the border routers, there’s also some filtering with IPtables, to disallow anyone outside of the backbone network to connect to the BGP daemon, and also to disallow anyone to query the NTP server.
(what’s not seen here is that connection tracking is disabled for the rest of the traffic, so as not to overload the router)
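[A minimal sketch of those rules with ip6tables, with 2001:db8::/47 standing in for the backbone prefix:]

# BGP (TCP/179) only from the backbone, dropped from everywhere else
ip6tables -A INPUT -p tcp --dport 179 -s 2001:db8::/47 -j ACCEPT
ip6tables -A INPUT -p tcp --dport 179 -j DROP

# nobody gets to query the NTP server
ip6tables -A INPUT -p udp --dport 123 -j DROP

# and no connection tracking for the transit traffic,
# so the border doesn't keep state for the whole university
ip6tables -t raw -A PREROUTING -j NOTRACK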
[27]
On the access routers, we also have the filtering for the BGP daemon and the NTP server, but also
[28]
we filter out any inbound traffic that’s not related to a connection established from the inside. This is the usually cited “benefit” of NAT and the reason some ill-informed people ask for NAT in IPv6.
With this, the big-bad-internet can’t connect directly to the workstations, which, let’s be frank, is a very good idea, knowing how bad the workstation OS security is.
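[A minimal sketch with ip6tables and connection tracking, assuming the end-user VLAN interface is called vlan-users:]

# replies and related ICMPv6 errors may reach the workstations,
# but nothing the outside world tries to initiate towards them
ip6tables -A FORWARD -o vlan-users -m state --state ESTABLISHED,RELATED -j ACCEPT
ip6tables -A FORWARD -o vlan-users -j DROP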
[29]
The relevant services provided by the university have IPv6 enabled.
[30]
Most of the web servers have had IPv6 support since 2007.
The DNS servers too, and there’s also a very interesting anycast implementation for the recursive resolvers that I’ll talk about in detail later.
The email services have also supported IPv6 since 2007, and they got their first spam message over IPv6 on the 12th of December 2007 (which should be some kind of record; I got mine two years ago, from a server in some space-related institute in Russia).
[31]
ftp.uni-sofia.bg has been IPv6-enabled since 2010; it’s a service for mirroring different projects (for example Debian).
The university also operates some OpenVPN concentrators, that are accessible over IPv6 and can provide IPv6 services.
[32]
The coverage of the deployment is very good – in the first year more than 50% of the workstations in the Lozenetz campus had IPv6 connectivity, and today more than 85% of the workstations in the whole university have it.
(the rest don’t have it for two reasons – either they are too old to support it, or their users turned it off because they had problems with it at the beginning and refuse to use it)
Around 20% of the traffic that the university does is IPv6.
[question from the public – what is in that traffic? We should be showing people that there’s actually IPv6 content out there, like facebook and youtube]
From my understanding (the university doesn’t have a policy of snooping on the users’ traffic), this is either to Google (YouTube) or to the CERN grid (which is IPv6 only). Yes, there actually exists a lot of content over IPv6; Netflix/Hulu have already deployed it, and it’s just a few of the big sites (Twitter is an example) that are still holding back.
The university provides both ways for the configuration of end-nodes – Router Advertisement (RA) and DHCPv6. For most cases they’re both needed, as RA can’t provide DNS servers, and DHCPv6 can’t provide a gateway (which is something that’s really annoying to most people doing deployments).
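[A minimal sketch of the two halves for one end-user VLAN – the prefix, the interface name and the resolver address are placeholders: radvd supplies the prefix and the default gateway (the RA itself), DHCPv6 supplies the DNS servers:]

cat > /etc/radvd.conf <<'EOF'
interface vlan-users {
    AdvSendAdvert on;
    AdvOtherConfigFlag on;          # "ask DHCPv6 for the other options" (DNS)
    prefix 2001:db8:0:100::/64 {
        AdvOnLink on;
        AdvAutonomous on;           # hosts build their addresses from the RA
    };
};
EOF

cat > /etc/dhcp/dhcpd6.conf <<'EOF'
subnet6 2001:db8:0:100::/64 {
    option dhcp6.name-servers fec0:0:0:ffff::1;
}
EOF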
[33]
Here’s how the default route is propagated in the network.
[34]
This was actually a surprise for me – there are two default routes in IPv6. One is ::/0, which includes the whole address space, and the other is 2000::/3, which includes only the unicast space. There is a use for sending just 2000::/3: to be able to fence devices/virtual machines off from the internet, but leave them access to local resources (so they can update after a security breach, for example).
[35]
Here is how you redistribute ::/0, by first creating a null route and adding it in the configuration.
[36]
And here’s how it’s propagated in the network, using OSPF.
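[The real config is on the slides; this is only a sketch of the idea in Quagga, assuming the rest of the OSPFv3 config (router-id, areas, interfaces) already exists – a static “null” default route pointing at the loopback interface gives zebra something to hold, and ospf6d redistributes it towards the access routers:]

cat >> /etc/quagga/zebra.conf <<'EOF'
! anchor default route; traffic hitting it on the border is simply dropped
ipv6 route ::/0 lo
EOF

cat >> /etc/quagga/ospf6d.conf <<'EOF'
router ospf6
 redistribute static
EOF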
[37]
We can do the same for 2000::/3,
[38]
and push it to the virtual machines, which also have connectivity to the local resources through a different path.
[39]
Now, this is something very interesting that was done to have a highly-available recursive DNS resolver for the whole network.
[40 and 41 are the same, sorry for the mistake]
[41]
This is how a node works. Basically, there’s a small program (the “manager”) that checks every millisecond whether the BIND daemon responds, and when it has just started, it adds the anycast resolver addresses to the loopback interface and redistributes a route through OSPF.
(note the addresses used – FEC0::… – it turns out these are the default resolver addresses for MS Windows, so this was the easiest choice. Seems that even in the land of IPv6 we still let MS dictate what to do)
[42]
If the name server daemon dies, the manager withdraws the addresses.
Also, if the whole node dies, the OSPF announcement will be withdrawn with it.
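[Here is only a rough sketch of the idea in shell – the resolver address, the check interval and the exact way the route ends up in OSPF (a /128 on lo picked up via “redistribute connected”) are my assumptions, not the original program:]

#!/bin/sh
# keep the anycast resolver address on lo only while BIND answers
ANYCAST=fec0:0:0:ffff::1/128

while true; do
    if dig +time=1 +tries=1 @::1 . soa >/dev/null 2>&1; then
        ip -6 addr add "$ANYCAST" dev lo 2>/dev/null   # no-op if already there
    else
        ip -6 addr del "$ANYCAST" dev lo 2>/dev/null   # withdraw; OSPF follows
    fi
    sleep 0.001
done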
[43][44][43][44][43][44]
Here you can see what changes when a node is active. If one node is down, the routers will see the route to the other one, and send traffic accordingly.
[45]
And if they’re both operational, the routers in both campuses get the route to the node in the other campus with lower local preference, so they route traffic just to their own node, and this way the traffic is load-balanced.
[46]
Thank you, and before the questions, I just want to show you again
[3]
the people that made this deployment. They deserve the credit for it :)
[In the questions section, Stefan Dimitrov from SU stood up and explained that now they have three border routers, the third one is at the 4th kilometer campus]