Fedora and Other Works in Progress

     I managed to move the mail spool and a number of virtual machines to different hosts.  Fedora is caught in the middle right now, and my quality Comcrap cable Internet keeps going down, making work close to impossible, so I’m going to leave Fedora down for the night and hopefully pick it up in the morning.

     I still have a number of virtual machines to move, both to align services with the disks that they are using and to clear one machine so I can do hardware upgrades and re-install an operating system (replacing CentOS 6.7 with Ubuntu 15.04).  Some individual machines will be interrupted tomorrow to complete this.

Wiring Replaced but Bad Jack

     I replaced all the gigabit wiring with cat6; however, I discovered that the jack provided by Isomedia to connect back to their router is flaky.  I’ve put in a trouble ticket, but this probably means there will be some interruption during the day to repair or replace it.

Out of Office

     I’ll be out of the office for a while as I am headed to Fry’s to get some cat-6e or better cable to replace the cat-5 cables that were not designed for 1G Ethernet usage.

Ethernet Cable Replacement

     There will be brief network interruptions (two, lasting maybe 3 seconds each) as I replace Ethernet cables between two physical host computers and our switch at the co-location facility this afternoon.

     Two servers are experiencing brief, intermittent network problems.  Both are connected with cables I made many years ago.  I suspect the cat5 wiring I used is not really adequate for the 1G Ethernet presently in use, resulting in a marginal signal.

Ubuntu Maintenance

     As nobody is presently utilizing ubuntu.eskimo.com, I am going to take it down briefly to move it to a different server in order to clear virtual.eskimo.com, the existing physical host computer, for hardware and software upgrades.

     The expected downtime is approximately 45 minutes.  It should be back online around 2:15pm Pacific time.  In the meantime, if you need a Debian-based shell server, debian.eskimo.com and mint.eskimo.com are available.

Mail Maintenance 10pm-11pm Pacific Time

     Tonight, October 14th 2015, around 10pm, I will be stopping all mail processing for a few moments while I move the servers to use a new spool on a newly upgraded server.

     The transfer of the servers to the new spool will take about ten minutes and then processing will resume.

     All of the shell servers will then need to be switched to the new mail spool.  Some will require a reboot.

     The end result is that the mail spool will no longer be on the same machine as the home directories, which will improve responsiveness by distributing the disk I/O load across multiple servers.

     In addition, both the mail spool and the home directories will be on RAID 10 devices, so any single disk failure will not interrupt service or cause loss of data.
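
     To illustrate the single-disk claim, here is a minimal sketch that models a RAID 10 array as striped mirror pairs.  The four-disk layout and the device names are assumptions made up for the example, not the actual configuration of these servers.

```python
from itertools import combinations


def raid10_survives(mirrors, failed):
    """Return True if every mirror set still has at least one working disk."""
    failed = set(failed)
    return all(any(disk not in failed for disk in mirror) for mirror in mirrors)


# Hypothetical four-disk layout: two mirrored pairs, striped together.
MIRRORS = [("sda", "sdb"), ("sdc", "sdd")]
DISKS = [disk for mirror in MIRRORS for disk in mirror]

# Try every possible single-disk failure, then every two-disk failure.
single = [raid10_survives(MIRRORS, [disk]) for disk in DISKS]
double = {pair: raid10_survives(MIRRORS, pair) for pair in combinations(DISKS, 2)}

print("survives every single-disk failure:", all(single))
print("two-disk failures that lose data:", [p for p, ok in double.items() if not ok])
```

     In this model the array only loses data when both halves of the same mirror fail, which is why a second failure is not guaranteed to be survivable.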

SquirrelMail (and other web applications)

     There is a problem with recent versions of Firefox that intermittently causes certain fields not to be filled in.  Sometimes it won’t fill them in at all, sometimes partially, sometimes fully.

     As a result, replies and forwarded e-mail may not get the original text filled in.

     This is not a flaw with mail; it is a flaw with the regular version of Firefox.

Fixes available at this time:

  • Install the Developer version of Firefox.
  • Downgrade to an older version of Firefox.
  • Use a different browser such as Opera, Internet Exploder, or Chrome.

Maintenance Done

     The maintenance is completed, but just about anything that could go wrong did.

     I brought an old 3×4 style monitor because I had intended to move just Sparc equipment which has an 1152×900 resolution.  That is close enough to 1200×900 that the old ViewSonic 3×4 style monitors will sync to it.

     In the process I managed to snag the cord to both existing hosts / file servers and basically take everything down.

     The old monitor will not sync to the 1680×1050 resolution of the modern equipment, except that one of the modern servers will step down to 1200×900 resolution.

     After accidentally powering everything down, that server was my only window to bring things back up.  The file server with all the user information failed to start rpc.mountd, so none of the other machines could mount file systems from it.  After fixing that, I had to reboot every other machine.

     I got the rack re-arranged so it can accommodate another couple of monster cases, which in turn accommodate more drives for RAID10 file systems and Hyper 212 Evo coolers to replace the Intel stock coolers.

     With the stock Intel coolers, the CPUs were overheating and throttling under just an ordinary work load, resulting in slower performance.  Now they can run full tilt on all four cores doing prime95 or the Linux equivalent, mprime, all day and never exceed 65°C.  They throttle at 74°C, so that basically means they can run in turbo mode indefinitely now, even with the most demanding work loads.
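
     The margin works out to 9°C.  As a minimal sketch of that arithmetic, the helper below takes a list of temperature samples and reports the peak and the remaining headroom; the sample readings and the function name are made up for illustration, and only the 74°C throttle point and the 65°C peak come from the figures above.

```python
THROTTLE_C = 74  # throttle point quoted above


def thermal_headroom(samples_c, throttle_c=THROTTLE_C):
    """Return (peak, headroom, throttling) for a list of readings in °C."""
    peak = max(samples_c)
    return peak, throttle_c - peak, peak >= throttle_c


# Made-up readings from a stress run that tops out at the 65°C peak.
samples = [58, 61, 65, 63]
peak, headroom, throttling = thermal_headroom(samples)
print(f"peak {peak}°C, headroom {headroom}°C, throttling: {throttling}")
```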

     I also have all the power and Ethernet cables routed nicely and strapped down now instead of running all over, so I am less likely to accidentally unplug a server in the future.

     However, I wasn’t able to get one server back in service yet because I didn’t have a monitor that would display its output so I could configure it, so I will have to make another trip down there tomorrow to finish it.  Since this server currently has no services on it and all the cabling is now neatly strapped, this should not be service impacting.

Shellx Back In Service – Maintenance This Afternoon

     Shellx is back in service.  After restoring from a backup image, re-installing 170 updates, and rebooting, mail continued to function.

     We use postfix for the mail transfer agent.  For some reason, instead of expanding what is in /etc/mailname, it was including it literally.

     Re-installing postfix entirely did not fix it, so the cause was something postfix depends upon that was restored from backup, but I do not know what.
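
     To make the difference concrete, here is a minimal sketch of the two behaviours.  This is not postfix’s code; it only assumes that a setting pointing at a file is supposed to be replaced by that file’s contents, and that the failure amounted to the path itself ending up in the address.  The user name, the host name, and the temporary file standing in for /etc/mailname are all made up.

```python
import os
import tempfile


def expand_file_reference(value):
    """If value is an absolute path to a file, return its first line; else return value."""
    if os.path.isabs(value) and os.path.isfile(value):
        with open(value) as fh:
            return fh.readline().strip()
    return value


def sender_address(user, origin, expand=True):
    """Build user@domain, optionally skipping the expansion step (the broken case)."""
    domain = expand_file_reference(origin) if expand else origin
    return f"{user}@{domain}"


# A temporary file stands in for /etc/mailname so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    mailname = os.path.join(tmp, "mailname")
    with open(mailname, "w") as fh:
        fh.write("shellx.example.com\n")
    good = sender_address("nobody", mailname)
    broken = sender_address("nobody", mailname, expand=False)

print("expanded:", good)
print("literal: ", broken)
```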


     Late this afternoon I will be taking eskimo.com and our RADIUS servers down to physically move them in the rack, making way for a larger case for another of our servers to address overheating issues.

     I will also be rebooting Igloo, our newest file server, to get rid of a stuck process.  This will momentarily affect all services.  This will probably happen around 4:30-5:00pm.

 

Shellx – Progress

     Restoring shellx from a backup image required fixing some things that have changed since, like the IP address of NIS servers and the file server.

     After that, shellx worked and mail functioned properly again.

     However, I will need to re-apply updates.  I am making a new disk image now in case it was one of the updates that broke things.  There are two likely candidates: a bad update, or corruption resulting from the improper shutdowns that happened several times in the process of putting a new server in place.

     The plan is to make a backup of the server now that it is in a known good state, then bulk apply all the updates and re-test.  If everything still works, I’ll know it wasn’t an update that caused the problem.  If not, I’ll restore from the backup now being made and re-apply updates one at a time to find out which one breaks things, then restore again and re-apply all except the troublesome update.
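
     The plan can be sketched roughly as follows.  This is only an outline under stated assumptions: the FakeServer class, the package names, and the snapshot / restore / apply / works callbacks are hypothetical stand-ins for the real disk image, package manager, and mail test, and it assumes a single bad update and a test that gives the same answer every time.

```python
def isolate_bad_update(updates, snapshot, restore, apply, works):
    """Bulk apply first; on failure, restore and apply one at a time to find the culprit."""
    good_state = snapshot()              # backup of the known good state
    for update in updates:               # bulk apply everything
        apply(update)
    if works():
        return None                      # no update was at fault

    restore(good_state)                  # back to the known good state
    culprit = None
    for update in updates:               # one at a time, testing after each
        apply(update)
        if not works():
            culprit = update
            break

    restore(good_state)                  # restore again, then skip the bad one
    for update in updates:
        if update != culprit:
            apply(update)
    return culprit


class FakeServer:
    """Hypothetical stand-in for the real server, for demonstration only."""

    def __init__(self, bad=None):
        self.installed = []
        self.bad = bad

    def snapshot(self):
        return list(self.installed)

    def restore(self, state):
        self.installed = list(state)

    def apply(self, update):
        self.installed.append(update)

    def works(self):
        return self.bad not in self.installed


updates = ["pkg-a", "pkg-b", "pkg-c"]    # made-up package names
server = FakeServer(bad="pkg-b")
culprit = isolate_bad_update(
    updates, server.snapshot, server.restore, server.apply, server.works
)
print("culprit:", culprit, "installed:", server.installed)
```

     With 170 updates, the one-at-a-time pass could mean up to 170 test cycles, so it is only worth doing if the bulk apply actually breaks mail again.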