Kernel Upgrades Jan 20 11PM PST (GMT-0800)

     We will be upgrading to the 6.1.7 kernel this evening at 11pm.  Because KASAN caused problems on some of our servers (some would not boot with it and others ran slower), we will only be putting it on two NFS servers that have been problematic.  However, I believe 6.1.7 has already fixed the bug, because I found a patch in the changelog that addresses exactly the issue we’ve been experiencing, a use-after-free in nfsd.

     We will be rebooting centos7 and scientific7 earlier in the afternoon because upgrading the kernels on those machines is difficult and requires some extra steps.

     Tonight’s upgrade will affect all services.  If all goes well we should be done by 11:30pm, and no service should be out for more than 10 minutes EXCEPT for Yacy.  Yacy rebuilds its database upon reboot, and this takes 30-45 minutes.

     This will also affect all of our fediverse servers, https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://nextcloud.eskimo.com/ (currently unfederated owing to a plugin problem), and https://yacy.eskimo.com/.

Customer Service on the 18th

     Comcast will be doing work on the 18th that will take down the telephone system, fax, and Internet connection at my home office for part of that day.  I do not know at what time or for how long.  I hope to have some Internet access via my tablet, but it is very limited.  If you have an issue and the website is working, please generate a ticket; otherwise please keep trying to contact me periodically.

Server Hang – Kernel Upgrades 1/12/2023

     The server that provides home directories, shared web services, and two virtual private servers hung with the CPU stall bug, even though we’re about 12 incarnations down the road from where it first appeared.  This appears to be a use-after-free error in the kernel, but KFENCE is not finding it.  The kernel developers have asked me to compile in KASAN, which is another memory debugger.  The system is now running that kernel, and hopefully if it hangs again it will provide some debugging information.

     Between yesterday, when I compiled the kernel, and today, 6.1.5 has come out, so I am going to try to get kernels ready for another upgrade tomorrow evening.  If I make it, we will be doing a kernel upgrade at 11pm.  This affects all Eskimo North services.

Kernel Upgrade Aborted

     Tonight’s kernel upgrade is aborted because the kernel will not build with the debugging options the developers wanted me to include.  I’ve sent the compiler errors back to them and will resume when I have a fix.

Kernel Upgrade Tonight Sunday 11pm

     I am going to be upgrading the kernels on only the physical servers tonight in order to turn on some additional debugging options to help the developers chase down an error in the NFS code that is causing issues for us.  Apparently this bug only occurs when you have a mix of NFSv3 and NFSv4 clients, as we do (we also have an NFSv2 client), so it is rarely triggered elsewhere but our environment triggers it.  It is a use-after-free error that for some reason KFENCE is not finding, so they have asked me to turn on KASAN, a different, somewhat higher-overhead memory debugger.  This requires a rebuild of the kernel and a reboot of the physical servers.  Because this only affects the NFS servers, I will be installing this on Iglulik, Igloo, and Mail, but not the other servers at this time.  This will affect vps6, vps9, all the shell servers, and mail.  The outages will fall between 11pm and 11:30pm, with individual outages lasting no more than about 10 minutes, with the exception of yacy.eskimo.com, which takes about 30 to 45 minutes to rebuild its database after a reboot.

     This will affect all Eskimo North services EXCEPT for vps1-vps5, vps7, and vps8.

     It will impact our Fediverse instances including https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://nextcloud.eskimo.com/, and https://yacy.eskimo.com/.
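
     For anyone who wants to see whether their own kernel was built with the debugging options mentioned above, here is a minimal sketch in Python.  CONFIG_KASAN and CONFIG_KFENCE are the real kernel option names; the helper function and the config file location under /boot are assumptions for illustration (some distributions put the config elsewhere or do not install it), and this is not the procedure used on our servers.

```python
import platform

# Real kernel Kconfig symbols for the two memory debuggers discussed above.
DEBUG_OPTIONS = ("CONFIG_KASAN", "CONFIG_KFENCE")


def enabled_options(config_text, options=DEBUG_OPTIONS):
    """Return {option: bool}, True when the option is set to 'y' in config_text.

    Hypothetical helper: parses the plain NAME=value lines of a kernel
    config file and ignores comments such as '# CONFIG_FOO is not set'.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return {opt: opt in enabled for opt in options}


if __name__ == "__main__":
    # Assumed location of the running kernel's config; varies by distribution.
    path = f"/boot/config-{platform.release()}"
    try:
        with open(path) as fh:
            print(enabled_options(fh.read()))
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

     A kernel without KASAN compiled in shows that option as False, which is why turning it on requires a rebuild and a reboot rather than a runtime switch.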

Kernel Upgrades 1/6 11PM PST (GMT-0800)

     I am planning to upgrade to a 6.1.2 kernel on Friday 1/6 at 11pm Pacific Time.  The present kernel, 6.0.15, has a nasty bug where it locks up hard: no kernel dump, no auto reboot, and no response to the magic SysRq key.  Only power cycling the affected machine restores service.  The inability to get a kernel dump makes this bug particularly difficult to troubleshoot.  Since this bug has persisted since 6.0.12, I’m going to try a 6.1 kernel and hope for better.

     This will result in outages of all services between 11pm and 11:30pm, lasting about 5-10 minutes each, EXCEPT for yacy, which takes close to 45 minutes to rebuild its database after every reboot.

     This will affect all of Eskimo North’s paid services such as mail, web hosting, virtual private servers, shell accounts, etc., as well as our free services including https://nextcloud.eskimo.com/, https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, and, as I mentioned, https://yacy.eskimo.com/.