Crash & Reboot

     One of our physical servers, which mainly hosts virtual private servers, crashed and rebooted this morning.

     The crash was caused by snapd attempting to apply Livepatch to our kernel, which is not a Canonical kernel.  Livepatch was disabled in Software Properties, but snapd tried to apply it anyway.

     I’ve removed snapd from our physical servers to prevent a recurrence and to eliminate any security risk that snapd may represent.

Mint is Up – 20.1

     Mint is back online and is now running Mint 20.1.  However, it is a fresh install and not much software is installed yet.  If there is something you’d like that is missing, please go to our website at https://www.eskimo.com/, select Support -> Tickets, and open a trouble ticket with your request.

     Thank you.

Mint

     I attempted to upgrade Mint again today, and again it failed, this time leaving the machine in an unusable state: python3 was not properly installed, and because apt needs python3, I could not use apt to fix it.

     I could have restored from backups but instead opted to try a fresh install of Mint 20.1 (it had been on 19.3).  Unlike Mint 20, this fresh install succeeded and MATE is working.

     I am still installing other needed software and putting at least a basic configuration in place, so Mint may be unavailable this evening and possibly part of tomorrow, depending on how much I complete tonight.

Web Trouble

     Our web server was largely unavailable from 12:30 AM to about 4 AM on March 30th because one of the MariaDB system tables became corrupted.  I could have restored everything from backups in less time, but that would have lost any data written since the backup was taken.  Instead, I loaded the backups, rescued the damaged system table from them, then returned the server to its current image, dropped the corrupted table, and re-created it from the backed-up data.  That way no user data was lost.

Network OK

     Nobody at Isomedia has gotten back to me, but the packet loss has stopped, so I assume someone found what was wrong or, if it was a DDoS attack, that it has ended.  Either way, the problem has cleared up.

Network Trouble

     We are seeing significant packet loss from various locations to our servers in the Bellevue CoLo facility.  I have traced the problem to the link between Isomedia’s Seattle ring router and its Bellevue core router and opened a trouble ticket.  We are not seeing heavy traffic to our own equipment, so it may be that another customer in the facility is under a DoS attack, or there may be a hardware or routing problem.  We are waiting on a response to the ticket.

Centos-Stream Broke

     Centos-Stream is broken at present.  I tried to install the current openssl on it, but sshd was built against 1.1.1b, which has some different symbols than 1.1.1k, so sshd no longer works.  Removing the new openssl did not fix it either, for reasons I do not understand, so I am doing a fresh install.

OpenSSL and Kernel Upgrades Mostly Completed

     With the exception of the Red Hat-based machines, all updates are complete.  I am going to have to build openssl myself for the Red Hat-based machines because Red Hat seems to be ignoring the openssl exploits, so there may be some reboots later this evening of CentOS 7/8/Stream, Fedora, and Scientific Linux 7.

Kernel, Openssl Upgrade / Reboots Tonight

     I do not normally do kernel updates mid-week.  I prefer to wait until Friday so that, on the off chance something goes horribly wrong, I have the most time to recover before the business week.

     However, a serious vulnerability has been discovered in openssl.  Running programs keep the old, vulnerable copy of the library loaded until they are restarted, so I am going to have to reboot all the machines just to get old copies of openssl out of memory, and I might as well do a kernel upgrade at the same time.
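     Why a reboot is needed: when a package upgrade replaces a shared library on disk, a process that already had the old file mapped keeps using it, and Linux marks that mapping with "(deleted)" in /proc/<pid>/maps.  The following is a minimal Python sketch, not the tool used here, that lists processes still holding a replaced libssl.  The function names deleted_matches and scan_processes are hypothetical; it assumes a Linux /proc layout, and reading other users' processes generally requires root.

```python
import os
from typing import Dict, Set

DELETED = " (deleted)"


def deleted_matches(maps_text: str, pattern: str = "libssl") -> Set[str]:
    """Return paths from a /proc/<pid>/maps dump that contain `pattern`
    and are flagged "(deleted)", i.e. the file on disk was replaced
    after the process mapped it."""
    hits: Set[str] = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) < 6:
            continue  # anonymous mapping with no pathname
        path = parts[5].rstrip()
        if path.endswith(DELETED) and pattern in path:
            hits.add(path[: -len(DELETED)])
    return hits


def scan_processes(pattern: str = "libssl", proc_root: str = "/proc") -> Dict[int, Set[str]]:
    """Map PID -> stale library paths for every process we can read.
    Hypothetical helper; assumes a Linux-style /proc filesystem."""
    result: Dict[int, Set[str]] = {}
    if not os.path.isdir(proc_root):
        return result  # not Linux, or /proc not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, entry, "maps")) as f:
                hits = deleted_matches(f.read(), pattern)
        except OSError:
            continue  # process exited, or we lack permission
        if hits:
            result[int(entry)] = hits
    return result


if __name__ == "__main__":
    for pid, paths in sorted(scan_processes().items()):
        print(pid, ", ".join(sorted(paths)))
```

     Any process this reports is still running the old openssl; restarting that service would also clear it, but when nearly everything links against openssl, a reboot is the simpler way to be sure.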

     Most machines will remain on openssl 1.1.1f, but it will be a patched build that fixes the exploit.  With any luck, the web server will be on openssl 1.1.1k, simply because it already runs a self-compiled openssl to get the most current encryption.

     Normally I would start this at 11 PM, but because of the seriousness of this exploit, I am going to proceed as soon as the current software is in place on all the machines, though not before 5 PM.  Downtime for the system as a whole should be less than half an hour, and no more than about ten minutes for any given machine.