Today we experienced an unexpected outage on the primary server that relays all incoming mail to our customers. We traced the issue to the virus scanner we were using. The software itself was aging, although all of its definitions were up to date. The problem was that the way it ran allowed it to be held in memory longer than it should have been. This virus scanner is tied into the network, and when it’s taken offline everything goes offline, which is why we had been reluctant to upgrade it.

When the master mail server began to get a little shaky today, we went ahead with the upgrade. That meant that as soon as we started removing the old virus scanner, everything went down. The downtime lasted a little longer than expected because, as we began to work on it, we were mailbombed (which the virus scanner usually stops). After we got that under control by completely disabling the network ports, we were able to get the new virus scanner set up. We then brought the network ports back online at about 7:04 PM EST, and so far performance has been very acceptable. In fact, overall performance so far has increased by about 500%. That means the main email server is using less RAM and much less CPU.

We do apologize for the downtime, but we’re always glad to make sure we’re providing you with the best and most secure services we can deliver. If you have any other questions about this downtime, please open a ticket.

TECHNICAL INFORMATION:

The server was running the old clamd way of scanning email, which conflicts with the new clamav-server way. We had to completely remove the clamd-based scanning methods and the old software, install clamav-server, and set up new methods of establishing IP connections to scan email. As a result, all of our other email servers can now scan after delivery in addition to using the old pre-delivery methods.
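
For readers curious what scanning email over an IP connection can look like, here is a minimal Python sketch. It is not our production setup. It assumes the scanner daemon is listening on a TCP socket and speaks the standard ClamAV INSTREAM command. The host, the port (3310 is the usual ClamAV default) and the function names frame_instream and scan_bytes are illustrative assumptions only.

```python
import socket
import struct


def frame_instream(data: bytes, chunk_size: int = 4096) -> bytes:
    """Build the byte stream for an INSTREAM request.

    Each chunk is prefixed with its length as a 4-byte big-endian
    integer, and a zero-length chunk marks the end of the stream.
    """
    parts = [b"zINSTREAM\0"]  # "z" prefix = null-terminated command
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        parts.append(struct.pack("!I", len(chunk)) + chunk)
    parts.append(struct.pack("!I", 0))  # end-of-stream marker
    return b"".join(parts)


def scan_bytes(data: bytes, host: str = "127.0.0.1", port: int = 3310,
               timeout: float = 10.0) -> str:
    """Send a message to the scanner over TCP and return its reply text.

    host and port are assumed values; use whatever your scanner
    is actually configured to listen on.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame_instream(data))
        reply = b""
        # The reply to a "z" command is terminated by a null byte.
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:
                break
            reply += part
    return reply.rstrip(b"\0").decode("utf-8", "replace").strip()
```

Because the scanner is reached over the network instead of being wired into one mail server, any mail server that can open a connection to it could call something like scan_bytes(raw_message) either before or after delivery. A reply ending in OK would typically mean no threat was detected, and a reply ending in FOUND would mean a signature matched.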