Kernel Upgrades

     Kernel upgrades are done except for four machines.  One of them is just borked; somehow the image got corrupted, so it’s being restored from backups.  I could not get the drivers for the Fusion I/O drive to compile under 5.16, so I’ve ordered a more mainstream conventional SSD.  It won’t provide quite the same I/O rate but it should still be adequate.  So the main physical server is still on 5.13.19 for now.  UUCP also won’t work with a modern kernel because CentOS’s startup routines want a feature that was deprecated.  Zorin is borked.  And Manjaro had to be restored from backups but hasn’t been brought fully current with upgrades yet, as I am having some issues with the AUR processing.

Oops

     I was doing half a dozen things at once, got into the wrong terminal, and accidentally rebooted one of the physical servers before I had intended to and without stopping the virtual servers first.  Under this circumstance it takes forever and a day to reboot, so it will probably be 15-30 minutes before the mail system and many of the shell servers are available again.  My apologies; I was trying to get kernel upgrades in place to address a potential security issue, and two older machines required a lot of modification to run a modern kernel.

Fedora Broken

     Something broke networking on Fedora.  I thought it was the new kernel since networking failed after rebooting, but it also does not start when booting off the old kernel.  So I am restoring from a backup, then will apply the kernel upgrade, and then the updates to bring it forward again.

Kernel Upgrades Between Wed March 9th and Saturday March 12th

     Because a very nasty exploit that could lead to privilege escalation has become known, a kernel upgrade will be applied to all servers as soon as possible.

     Because the exploit requires some sort of inside access to utilize, shell servers will be done first, then virtual private servers, and then the rest.  The physical machines will be done Friday evening between 11pm and midnight.  Most upgrades will take about five minutes, but the web server could take much longer as it requires compiling special drivers for the SSD device we have on that machine for fast database access.
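
     The ordering described above (most exposed machines first) can be sketched in a few lines of Python.  This is only an illustration: the category labels and host names are made up for the example and are not our real inventory.

```python
# Lower number = upgraded earlier. The order follows the notice:
# shell servers, then virtual private servers, then everything else.
# The category labels are hypothetical names chosen for this sketch.
PRIORITY = {"shell": 0, "vps": 1, "other": 2}


def upgrade_order(hosts):
    """Sort (name, category) pairs so the most exposed hosts come first.

    sorted() is stable, so hosts in the same category keep their input order.
    """
    return sorted(hosts, key=lambda host: PRIORITY[host[1]])


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    example = [("web", "other"), ("vps1", "vps"), ("shell1", "shell")]
    for name, category in upgrade_order(example):
        print(name, category)
```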

     This will affect all of our servers and services, but with the exception of the web server all outages will be brief.  Most of the work will be done during the evenings between 10pm and 2am, especially on shell servers and virtual private servers.  This will impact all eskimo.com services including https://friendica.eskimo.com/, https://hubzilla.eskimo.com, https://nextcloud.eskimo.com, e-mail, shell servers, private virtual servers, and virtual domains.

Unscheduled Reboot

     Around 12:35 I needed to reboot the server that hosts the home directories, web server, Ubuntu, Debian, and Mint because something went wrong with the quota system that I was unable to identify and correct without a reboot.

Scheduled Maintenance March 5th Cancelled

     The downtime for March 5th is cancelled.  The drive in question has only a single failed sector, and SMART estimates more than six years of life remaining.  It would not automatically re-allocate the failed sector on a read; apparently it can only do this on a write, which is odd since it’s part of a RAID array and could have gotten the data from the mirror drive.  Oh well.  Instead of physically replacing it, I’m going to fail it (stop the RAID from using it), overwrite that sector to force a re-allocation, then restore it to service, which will cause the RAID software to mirror it from the operational drive.  All this can be done without interrupting service.
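
     For anyone curious, the fail, overwrite, and re-add sequence could be scripted roughly as below.  Treat it as a sketch under assumptions: it assumes Linux software RAID managed with mdadm and a drive that accepts hdparm’s --write-sector option, and the array name, member partition, disk, and sector number are hypothetical placeholders rather than the real values on our server.  The function only builds the list of commands; it does not run anything.

```python
def reallocate_sector_plan(array, member, disk, sector):
    """Return the commands, in order, to force a bad sector to be re-allocated.

    All arguments are placeholders supplied by the caller; nothing is executed.
    """
    return [
        # Mark the member as failed so the RAID stops using it.
        ["mdadm", "--manage", array, "--fail", member],
        # Detach it from the array so it can be written to directly.
        ["mdadm", "--manage", array, "--remove", member],
        # Overwrite the bad sector; the write lets the drive re-allocate it.
        ["hdparm", "--write-sector", str(sector),
         "--yes-i-know-what-i-am-doing", disk],
        # Return the member to service; the array then resyncs from the mirror.
        ["mdadm", "--manage", array, "--add", member],
    ]


if __name__ == "__main__":
    # Hypothetical device names and sector number, for illustration only.
    for cmd in reallocate_sector_plan("/dev/md0", "/dev/sdb1", "/dev/sdb", 123456):
        print(" ".join(cmd))
```

     Building the plan as data first makes it easy to read the commands over before any of them touch a disk.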

Eskimo North Maintenance Saturday Morning Midnight – 2AM’ish March 5th Pacific Standard Time

     I will be taking the main server holding home directories down for up to two hours Saturday to replace a failed drive in a RAID array.  No data is lost as a result of this drive being ill since all data is duplicated in the RAID array.  Things will be slower for the twelve hours or so after replacement as the RAID resyncs the new drive.  Because virtually everything depends upon home directories, pretty much everything except private virtual servers will be out of service during this interval.

Mint is Repaired but Down for Backup

     I’ve repaired Mint.  At the heart of the issue, the glibc package name was somehow missing a ‘1’ at the end, and package version mismatches resulted.  I had to force a manual install and then re-apply several hundred updates.  Because chasing this down was not easy, I am making a backup now that it is fixed so I don’t have to do this again.  Mint should be available by approximately 1900 Pacific Standard Time March 1st.

Mint Unavailable for a few hours

     A routine upgrade exploded today on Mint, leaving packages in an inconsistent state of dependency loop hell that I can’t easily fix, so I’m going to restore from a backup and then re-apply the missing updates.  This will take an hour or more to complete.  Mint will be unavailable during this time.

Brief Interruptions Around 4pm Today

     I apologize for the brief interruptions of service around 4pm and 4:20pm today.  Our router was being attacked by a botnet attempting to brute-force passwords on it.

     To combat this I was able to employ fail2ban, the same software we use on our hosts, as the router is based upon Debian Stretch.  But because the router uses an overlay file system, I had to change the default logging location to a place where it would persist across boots, and testing this required a couple of reboots.
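
     For those wondering how this kind of protection works, the general idea is to count failed logins per address within a time window and ban any address that exceeds a limit.  The Python below is a minimal sketch of that idea only; it is not fail2ban’s actual code, and the class name, defaults, and example address are invented for illustration (the max_retry and find_time parameters loosely echo fail2ban’s maxretry and findtime settings).

```python
from collections import defaultdict, deque


class FailureTracker:
    """Ban an address after too many failed logins inside a time window."""

    def __init__(self, max_retry=5, find_time=600):
        self.max_retry = max_retry      # failures that trigger a ban
        self.find_time = find_time      # window length in seconds
        self.failures = defaultdict(deque)
        self.banned = set()

    def record_failure(self, ip, timestamp):
        """Record one failed login; return True if the address is now banned."""
        if ip in self.banned:
            return True
        window = self.failures[ip]
        window.append(timestamp)
        # Drop failures that are older than the window.
        while window and timestamp - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned.add(ip)
            del self.failures[ip]
        return ip in self.banned


if __name__ == "__main__":
    tracker = FailureTracker(max_retry=3, find_time=60)
    # 203.0.113.7 is a documentation-only address used as a stand-in.
    for t in (0, 10, 20):
        print(t, tracker.record_failure("203.0.113.7", t))
```

     A slow trickle of failures never trips the ban because old entries age out of the window, while a botnet hammering away gets cut off quickly.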