This morning’s outage was caused by a kernel soft CPU lockup on the server that hosts the home directories and one virtual private server. Because it is a physical host, not a virtual machine, I had to drive to the co-location facility to power cycle it.
The lockup is caused by a race condition in the kernel, in which two or more CPUs attempt to access a resource that is not protected by proper locking.
I have changed the system configuration so that the kernel should panic and reboot automatically if this happens again.
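For anyone who wants similar behavior on their own Linux host: the post does not say which settings were changed, but a common approach (an assumption here, not necessarily what was done on this server) is to set the `kernel.softlockup_panic` sysctl to 1 so a soft lockup triggers a panic, and `kernel.panic` to a nonzero value so the kernel reboots after a panic. The following is a minimal Python sketch that checks whether a host is configured that way; the function names are illustrative only.

```python
from pathlib import Path

# Assumed settings (not confirmed by the post):
#   kernel.softlockup_panic = 1  -> panic when a soft lockup is detected
#   kernel.panic = N             -> N > 0: reboot N seconds after a panic,
#                                   N < 0: reboot immediately, N = 0: never reboot


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl such as 'kernel.panic', or None if unavailable."""
    path = Path(root) / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def reboots_on_soft_lockup(root="/proc/sys"):
    """True if a soft lockup would cause a panic followed by an automatic reboot."""
    lockup_panic = read_sysctl("kernel.softlockup_panic", root)
    panic_timeout = read_sysctl("kernel.panic", root)
    return lockup_panic == 1 and panic_timeout is not None and panic_timeout != 0


if __name__ == "__main__":
    print("panic and reboot on soft lockup:", reboots_on_soft_lockup())
```

The `root` parameter exists only so the check can be pointed at a test directory instead of the live `/proc/sys`.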