Eskimo North



Friday / Saturday Outages




     First, to anyone who wants to unsubscribe from this list:

	Send e-mail to outages-list-request@eskimo.com for the outages
list, or to eskimo-announce-request@eskimo.com for the Eskimo announcements
list.  In the subject line put the word "unsubscribe".  Do not forget the
'-request' or the command will not be acted upon, and be sure to put
"unsubscribe" in the "Subject:" line, not the body of the message.

     Things were up and down late Friday and much of Saturday as a result
of problems integrating a new server into our network.

     The server is intended both to relieve some capacity issues and to
provide some new services and improved, modernized versions of older
services.

     However, Linux has evolved considerably and I'm learning all sorts of
new things as I go.

     For example, the old main NFS server that holds users' files and the
mail spool was capable of sustained disk I/O of about 6.5 MB/s.  The new
unit is capable of sustained disk I/O of around 140 MB/s, and I/O bursts to
cache of around 1 GB/s, a considerable improvement.

     Yet NFS performance using NFS version 2 was very disappointing at
around 2.5 MB/s.  The newer Linux supports versions 3 and 4.  The existing
older Linux has client support for version 3 and, hypothetically, server
support, but the server portion is seriously borked, as I discovered.
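
     To put those figures in perspective, here is a rough back-of-the-
envelope calculation using only the numbers quoted above.  The 700 MB file
size is an arbitrary assumption chosen for illustration, not a measurement.

```python
# Throughput figures quoted in the text above, in MB/s.
OLD_DISK = 6.5     # old NFS server, sustained disk I/O
NEW_DISK = 140.0   # new NFS server, sustained disk I/O
NFS_V2 = 2.5       # observed NFS version 2 throughput


def transfer_seconds(size_mb, rate_mb_s):
    """Seconds to move size_mb megabytes at a steady rate_mb_s."""
    return size_mb / rate_mb_s


def speedup(new_rate, old_rate):
    """How many times faster new_rate is than old_rate."""
    return new_rate / old_rate


if __name__ == "__main__":
    size_mb = 700  # hypothetical file size, for illustration only
    rates = [("old disk", OLD_DISK), ("new disk", NEW_DISK), ("NFS v2", NFS_V2)]
    for label, rate in rates:
        seconds = transfer_seconds(size_mb, rate)
        print(f"{label:9s} {rate:6.1f} MB/s  {seconds:7.1f} s")
    print(f"new disk vs old disk: {speedup(NEW_DISK, OLD_DISK):.1f}x")
    print(f"NFS v2 as a share of new disk speed: {NFS_V2 / NEW_DISK:.1%}")
```

     In other words the new disks are roughly 21 times faster than the old
ones, while NFS version 2 delivered under 2% of what they can sustain.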

     So my first inclination was to switch everything over to version 3.
Initially that appeared to work, but it turned out to have serious
stability problems, so I changed things back to version 2 except for the
main NFS server, which has the newer version of Linux (CentOS 5.2) and is
working as a version 3 server.
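
     With a mix of version 2 and version 3 in play, it helps to be able to
see which version each mount is actually using.  The following is a minimal
sketch, not the procedure used here: it parses text in the style of
/proc/mounts and reports the NFS version and transport per mount point.
The option spellings it looks for (vers=, nfsvers=, proto=, and the bare
v2/v3/tcp/udp flags) are assumptions and may differ between kernels, and
the sample host name and paths are made up.

```python
def nfs_mounts(mounts_text):
    """Return (mount_point, version, transport) for each NFS mount line.

    mounts_text is expected to look like /proc/mounts: whitespace-separated
    fields of device, mount point, filesystem type, and options.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version, transport = "unknown", "unknown"
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers"):
                version = value
            elif key in ("v2", "v3"):      # assumed older bare-flag spelling
                version = key[1:]
            elif key == "proto":
                transport = value
            elif key in ("tcp", "udp"):    # assumed older bare-flag spelling
                transport = key
        found.append((fields[1], version, transport))
    return found


# Hypothetical sample data; the host name and paths are made up.
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
fileserver:/home /home nfs rw,vers=3,proto=tcp,hard 0 0
fileserver:/var/mail /var/mail nfs rw,vers=2,proto=udp,hard 0 0
"""

if __name__ == "__main__":
    for mount_point, version, transport in nfs_mounts(SAMPLE):
        print(f"{mount_point}: NFS version {version} over {transport}")
```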

     Version 3 as configured here runs over TCP rather than UDP, which
makes the connections stateful, and that statefulness brought up all sorts
of interesting deadlock issues that I spent much of today coming to grips
with and resolving.

     Also, the server for version 3 on the old Red Hat 6.2 is SO broken that
even having it compiled into the kernel and not used caused sporadic
deadlock issues.  That was the cause of many of Saturday's outages.

     So I had to rebuild kernels on ALL of the Linux machines, and several
times reboots didn't go well, necessitating trips to the Bellevue co-lo
center in the snow.

     Of course it didn't help that I started out my day by vomiting up my
breakfast.  I suspect it's related to a new brand of multivitamin, as the
grocery store no longer carried the one I had been using, and I think the
new one has just too much of something.

     But we are making progress, and there have been some good
developments.  I thought Sparc Linux had been abandoned because there had
been no development in a year on Aurora Linux, which is THE Sparc port.  It
turns out this is because it's been rolled back under the Fedora umbrella,
and Fedora 11 is slated to include Sparc support.

     They are abandoning 32-bit hardware, but that's OK as we're not really
using 32-bit hardware for anything substantial anymore except the main
shell server, and that's slated to be moved to 64-bit hardware.  The 32-bit
Sparc hardware is becoming extinct, and if it were to fail we'd be very hard
pressed to find replacement hardware anyway.  So we'll be able to keep most
of our hardware in service and still modernize the software when that
release comes out.

     We're going to be offering a host of new services, including
Linux-based shell services; virtual servers; shell services with quota
limits greater than 2GB and support for large (>4GB) files, which will open
them up to video applications to a greater degree; more advanced spam
filtering; and many other things.
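
     Limits at 2GB and 4GB are the sort that typically fall out of 32-bit
fields.  The text does not say where our particular limits come from, so
take the following purely as an illustrative assumption: it shows the
largest byte count a signed and an unsigned 32-bit integer can hold, and
uses Python's struct module to confirm that a larger file size only fits
in a 64-bit field.  The 5 GiB size is a made-up example.

```python
import struct

GIB = 1024 ** 3

MAX_SIGNED_32 = 2 ** 31 - 1     # largest value in a signed 32-bit integer
MAX_UNSIGNED_32 = 2 ** 32 - 1   # largest value in an unsigned 32-bit integer


def fits(fmt, value):
    """True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(f"signed 32-bit limit:   {MAX_SIGNED_32} bytes (just under 2 GiB)")
    print(f"unsigned 32-bit limit: {MAX_UNSIGNED_32} bytes (just under 4 GiB)")
    five_gib = 5 * GIB  # hypothetical size of a large video file
    print("5 GiB fits in unsigned 32-bit:", fits("<I", five_gib))
    print("5 GiB fits in signed 64-bit:  ", fits("<q", five_gib))
```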

     Unfortunately there is a lot of learning curve involved, with a lot of
gotchas that don't seem terribly well documented.  For example, the newer
Linux needs to have NFS servers in /etc/hosts or it may not find them even
though they're resolvable in DNS and NIS.  There are incompatibilities
between old and new, etc.  Older versions of Linux had some issues with
endianness (whether byte order is MSB first or LSB first), and since we're
going from big-endian hardware (MSB first) to little-endian hardware (LSB
first) there are some issues with the older software.
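
     The byte-order issue mentioned above is easy to see in a few lines.
This is only a generic sketch using Python's struct module, not a
reproduction of any specific bug we hit; the value 0x12345678 is an
arbitrary example.  It packs the same 32-bit number MSB first and LSB
first, then shows what happens when MSB-first data is read as if it were
LSB first.

```python
import struct
import sys


def as_hex(data):
    """Render bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


VALUE = 0x12345678  # arbitrary example value

big = struct.pack(">I", VALUE)      # MSB first: most significant byte first
little = struct.pack("<I", VALUE)   # LSB first: least significant byte first

# Reading MSB-first bytes as if they were LSB first gives a wrong value.
misread = struct.unpack("<I", big)[0]

if __name__ == "__main__":
    print("this machine's native byte order:", sys.byteorder)
    print("MSB-first bytes:", as_hex(big))
    print("LSB-first bytes:", as_hex(little))
    print(f"MSB-first data misread as LSB-first: {misread:#010x}")
```

     Software that writes multi-byte values to disk or the network in its
native order, without converting, will misread them this way after a move
between the two kinds of hardware.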

     So I apologize for the glitches today, but know it's in the interest of
providing faster, better, and more service in the future.

Eskimo North Support   | Voice Numbers - (206)812-0051 or 800-246-6874
support@eskimo.com     |   Voice help available 7am to 10:45pm Mon-Fri
PO Box 55816           |      and 11am to 7pm Saturday and Sunday
Seattle, WA 98155-0816 |          Fax us at - (206)812-0054