2009-08-12 dhcpd

by Vasil Kolev

Fun.

So, we have a working DHCP setup in failover mode, serving 52 VLANs, and everything seems fine. But I think that all its developers are idiots, as I’ve hit a few really stupid problems.
(Background information: we’re using the same version on the same hardware, on FreeBSD and Ubuntu.)

1) When a peer is defined, the declaration has to carry the same name on both sides, or they refuse to link. Seems stupid to me.
2) Again in the peer definition, if peer port and port are the same, it refuses to connect. WTF.
3) If the two machines’ clocks differ by more than 60 seconds, they don’t link. This is OK, but then one of them crashes with SIGSEGV.
4) While in failover mode, you can expect all kinds of weird messages that look like errors but are actually normal, like:

dhcpd: bind update on 87.76.100.23 got ack from SERVER2: xid mismatch.
dhcpd: bind update on 87.76.104.41 from SERVER2 rejected: incoming update is less critical than outgoing update
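The first three gotchas can be turned into a quick pre-flight check before starting the pair. This is only a sketch of the rules as I observed them, not anything dhcpd provides: `PeerSide` and `preflight` are made-up names, and the values would have to be filled in by hand from each server's config and clock.

```python
from __future__ import annotations

from dataclasses import dataclass

# Threshold taken from item 3 above (observed behaviour, not a documented constant).
MAX_CLOCK_SKEW_SECONDS = 60


@dataclass
class PeerSide:
    """Hypothetical summary of one server's failover settings."""
    peer_name: str   # name used in that server's failover peer declaration
    port: int        # local failover port
    peer_port: int   # port used to reach the other server
    clock: float     # that server's current time, in seconds since the epoch


def preflight(a: PeerSide, b: PeerSide) -> list[str]:
    """Return a list of problems that matched the failures described above."""
    problems = []
    # Item 1: the declaration must carry the same name on both sides.
    if a.peer_name != b.peer_name:
        problems.append(f"peer names differ: {a.peer_name!r} vs {b.peer_name!r}")
    # Item 2: port and peer port being equal made the link fail for me.
    for label, side in (("first", a), ("second", b)):
        if side.port == side.peer_port:
            problems.append(f"{label} server: port and peer port are both {side.port}")
    # Item 3: more than 60 seconds of clock difference prevents the link.
    skew = abs(a.clock - b.clock)
    if skew > MAX_CLOCK_SKEW_SECONDS:
        problems.append(f"clocks differ by {skew:.0f} seconds")
    return problems
```

An empty list means none of the three conditions was hit; it says nothing about the other ways the link can fail.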

5) From time to time they might lose connectivity and stay in a weird hanging mode, in which they easily overflow a pool. Most probably the problem is this one, but I haven’t seen a solution yet, and I think it’s not fixed in 4.1 either. The only way to know that this has happened is by doing

grep 'failover peer' /var/log/syslog

and seeing that the last line of the output is

dhcpd: failover peer SERVER2: peer moves from normal to communications-interrupted

(without any communications-interrupted to normal transition later)
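The grep-and-eyeball check can be automated. Below is a minimal sketch that assumes the log lines look exactly like the one quoted above; `last_peer_states` and `stuck_peers` are names I made up, and other dhcpd versions may word the message differently.

```python
import re

# Matches the transition message in the form quoted above (assumed format).
TRANSITION = re.compile(
    r"failover peer (?P<peer>[^:]+): peer moves from (?P<old>\S+) to (?P<new>\S+)"
)


def last_peer_states(lines):
    """Map each peer name to the most recent state it moved to."""
    states = {}
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last transition wins.
            states[match.group("peer")] = match.group("new")
    return states


def stuck_peers(lines):
    """Return peers whose last logged transition was to communications-interrupted."""
    return sorted(
        peer
        for peer, state in last_peer_states(lines).items()
        if state == "communications-interrupted"
    )
```

Feeding it the open log file, for example `stuck_peers(open("/var/log/syslog", errors="replace"))`, gives the list of peers to worry about; run from cron, that is enough to get a warning before a pool overflows.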

I tried running dhcpd 4.1, but it had some weird issues too (although I only tested it for a bit; I got a bit mad at the compilation, as I had to hack some things and remove the DHCPv6 support).

In short, the failover protocol is crap.
