Long ago I found myself at work, needing a place to put my work files. At the time, everything was saved to the desktop of a sketchy gateway computer. I set out to have a single network storage area for both of our computers at the time, and that took the form of a 500GB hard drive in a USB enclosure connected to a hacked PogoPlug. Over time, the growth of my department started to require separate user accounts, and that eventually meant creating a Windows domain. That was big and scary at the time, so I left it as a side project. Much later on, we had some people come in and set up a “proper” server, complete with MS Windows Server and a domain controller. When the IT company that built it dissolved, it eventually came down to me to figure out what other people had set up. Turns out, they had built in absolutely NO redundancy. There was a single VM running as a domain controller, a single fileserver, and a single DNS and DHCP server. I also didn’t like the way it was set up, as it used a .local domain suffix. Our company owns a proper domain name for our website, so it makes sense to use that as our internal domain, since that’s generally regarded as better practice than using a bogus suffix. So I set out to finally understand how this works.
First off, I wanted to get away from Microsoft, since it’s expensive and very restrictive. If I build a Linux server, I can make 1000 copies of it and nobody cares. Enter Turnkey Linux. They have a ton of prebuilt appliances just for servers. They also have a domain controller appliance that saves me the 100 lines of bash to set up all the prerequisites for a Samba DC.
So I quickly spun up one of the TKL VMs, set it up as my primary domain controller, added all the users, and done! I added a fileserver as well, copied our stuff to it, and it’s been running happily ever since. Except for one thing. It’s running on a VMware host, and the vCenter server appliance that’s running on it was set up by the aforementioned dissolved IT company, so I don’t have the login to get in and fully control this host. After much searching, it looks like the only way around this is to wipe the host and start over. Great. And all our services are running on that host with no redundancy. Thanks guys. Further, it’s all Windows Server based, and since Windows VM licensing is about as clear as an alien language written in the quantum vacuum, I don’t really know if I can just copy the VMs to my other host (that I set up) and keep them powered off until I’m ready to make the transfer. WS2K12 is too expensive to buy extra licenses just to have them sit powered off until I can make the switch, and afterwards they would be extras I don’t need.
So anyway, I’ve finally come to a point where I feel it’s very wise to start building in redundancy, so I can work toward rebuilding the host I don’t like.
I call it Lenny.
After a ton of research, I found nothing that actually helps with the failover part of this equation, since almost all the pages I found on “failover samba AD DC” setups are of the form:
“Install all this stuff before you install Samba. Set up all the Kerberos stuff. Add second server. Done.”
What I needed was between the “Add second server” and “Done” parts.
Also, as much as I like Turnkey Linux, I have a very hard time finding specific info for solving problems with their appliances, so I have to look for general Linux advice.
The first step I took was to take a copy of my existing DC, made once it was set up and working, and import it into VirtualBox, keeping the network set to internal so it doesn’t interfere with the rest of the work network. The real DC can’t be taken down or reprovisioned, or I’d have to rejoin ALL the machines in our network. I really don’t want to do that. So I imported a second domain controller appliance next to the copy. It’s worth noting that they have to be the same version, specifically the same Samba version, or they won’t sync properly.
When DC2 runs for the first time, it wants to set itself up as the primary controller, which I don’t want, so I just accepted the defaults to get through the setup process. After that, delete everything in /var/lib/samba/, as well as the smb.conf file in /etc/samba/:
# rm -rf /var/lib/samba/*
# rm /etc/samba/smb.conf
Change the hostname.
This defaults to dc1; change it to dc2.
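A minimal sketch of the rename, assuming a Debian-based appliance that keeps its name in /etc/hostname and /etc/hosts. The function name `rename_host` is made up for illustration; it takes the file paths as arguments so it can be tried on copies first.

```shell
# Rewrite the host name in the two files a Debian-based system reads at boot.
# Arguments: old name, new name, hostname file, hosts file.
rename_host() {
    old=$1 new=$2 hostname_file=$3 hosts_file=$4
    # The hostname file holds nothing but the short name.
    printf '%s\n' "$new" > "$hostname_file"
    # Replace the old name only where it appears as a whole word (GNU sed).
    sed -i "s/\b$old\b/$new/g" "$hosts_file"
}

# On the new VM, as root, followed by a reboot (or `hostname dc2`):
# rename_host dc1 dc2 /etc/hostname /etc/hosts
```

The whole-word match also turns dc1.example.com into dc2.example.com in the hosts file, which is what the join step expects.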
Then sync their clocks.
# service ntp stop
# ntpdate -s dc1
# service ntp start
Then join the domain as an additional domain controller (that is what the DC argument means). Note the uppercase domain name in the username.
# samba-tool domain join example.com DC -U"EXAMPLE\administrator"
DC2 should now show up in the Domain Controllers list in the Windows Active Directory Users and Computers tool.
At this point you’d think we are done. There it is in the list, after all. But no. It’s not that simple. And I could NOT find anywhere that said how to fix the next parts. The last two problems at this point are server replication and failover. I found several pages showing how to check if replication is working, with MANY steps and workarounds to try to diagnose problems, but it turns out it’s pretty simple. In my case, it boiled down to the DNS server on DC1. When DC1 was provisioned, it added itself all through the DNS records in many different places that I really don’t fully understand, but when you add DC2 to the directory, those records are not automatically created. After setting these records, replication was working, and failover appeared to work as well. If I disconnect DC1 from the internal network, my Windows machine can’t see it and hangs for a minute until DC1 times out, then it looks to DC2 for whatever it needs.
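A rough sketch of what setting those records can look like, assuming the missing ones are the usual two a joined Samba DC needs: an A record for the new host, and a CNAME under _msdcs named after the DC’s objectGUID. The function name `dc_dns_commands`, the IP address, and the all-zero GUID are placeholders, and the sam.ldb path assumes a packaged Samba that keeps its data under /var/lib/samba/. The function only prints the commands so they can be checked before anything touches DNS.

```shell
# Print (not run) the samba-tool commands that would add DC2's records to the
# DNS server on DC1, so they can be reviewed first.
# Arguments: DNS domain, new DC's short name, its IP address, its objectGUID.
dc_dns_commands() {
    domain=$1 name=$2 ip=$3 guid=$4
    # A record so clients and DC1 can resolve dc2.<domain>.
    echo "samba-tool dns add dc1 $domain $name A $ip -U administrator"
    # CNAME that replication partners use to find the DC by its objectGUID.
    echo "samba-tool dns add dc1 _msdcs.$domain $guid CNAME $name.$domain -U administrator"
}

# Placeholder values. The real GUID can be looked up on DC2 with:
#   ldbsearch -H /var/lib/samba/private/sam.ldb '(invocationId=*)' --cross-ncs objectguid
dc_dns_commands example.com dc2 192.168.1.12 00000000-0000-0000-0000-000000000000
```

After running the printed commands, `samba-tool drs showrepl` on either DC is a quick way to see whether the other one is listed as a replication partner and when the last successful sync happened.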
It’s worth noting at this point that if the DHCP server is running on DC1, as it is in my case, then once DC1 goes down, DHCP and DNS go down with it. To survive that, you need a second DHCP failover server, which I won’t cover here, or you can just move the service to another Linux VM, since DHCP is dead simple by comparison. Also, the DHCP server must hand out DC2 in its DNS servers list as well, so clients have a second resolver to fall back on.