Server Status
This section provides real-time news, announcements and updates regarding service offerings, hosting server statuses, scheduled reboots, network statuses, outages, upgrades and improvements, and planned or unplanned maintenance. Come here to check the situation if your site or application is not responding or performing at expected speeds.
Hurricane Irene - Aug 26-28
Hurricane Irene is scheduled to pass through the neighborhood of our east coast datacenter in Herndon, VA. While we don't expect any serious problems, we are performing our full backup offloading process outside of the normal schedule, due to the limited time remaining before the weather lands. Some of this work must be performed during normal business hours, not during the standard 7-9 PM PST maintenance window, so you may notice periods of reduced performance during these operations. We expect it will all be normal again by Sunday.
Are you affected?
Ping your domain with this web tool and if the IP address is 64.34.162.117 then you're in the East Coast datacenter. NOTE: It will always say "Timed Out" because the servers are configured to not respond to ping. This does NOT mean the server is down! This page is also hosted in the Herndon datacenter - so if you're reading this, that's a good sign. :)
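If you would rather check from a script than from the web tool, the same lookup can be sketched in a few lines of Python. This is only a minimal sketch: the function name is_east_coast and its resolve parameter are illustrative names of our own, and the check simply resolves the domain through DNS and compares the result with the address given above (it does not ping anything).

```python
import socket

# IP address of the East Coast (Herndon, VA) datacenter, as given above.
EAST_COAST_IP = "64.34.162.117"


def is_east_coast(domain, resolve=socket.gethostbyname):
    """Return True if `domain` resolves to the East Coast datacenter IP.

    `resolve` defaults to a real DNS lookup; it is a parameter only so the
    comparison can be exercised without network access.
    """
    return resolve(domain) == EAST_COAST_IP
```

Calling is_east_coast with your own domain name performs a real DNS lookup, so it needs a working network connection.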
Update
The backup and its offsite transfer completed ahead of schedule, around 1 AM PST Saturday, and no further service interruptions are expected.
Monitor the situation
Currently it appears the warning cone for the hurricane puts the nearest flooding related risks nearly 20 miles away, and the general hurricane path cone around 120 miles away from the datacenter.
See the cone here. When you zoom out far enough, the nearest point shown on the map to the datacenter is Reston, VA, and you can measure the distance from there to the affected areas.
The datacenter is here: Google Map. (Note that its elevation is around 400 feet, while the White House is nearly at sea level; keep this in mind when reading any flooding-related news.)
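If you want to measure the distance yourself rather than eyeball it on the map, a great-circle (haversine) calculation is one reasonable way to do it. The sketch below assumes you have read latitude and longitude values off a map; the function name distance_miles is our own, and the Earth radius is an approximate mean value.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

For example, distance_miles(38.96, -77.36, lat, lon) gives the distance from Reston, VA (coordinates approximate) to a point (lat, lon) that you pick on the edge of the warning cone.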
An assortment of webcams in and around Washington D.C.
Here's a tool to locate webcams anywhere in the US.
Final Update
The hurricane has passed and no downtime or other problems occurred.
AyaNova: Object reference not set to an instance of an object
Some users have reported the periodic error "Object reference not set to an instance of an object" when trying to access the AyaNova scheduled users grid and possibly other areas of the software.
The AyaNova team had this to say: "A similar issue occurred with another company using AyaNova when they were viewing the Service Workorders grid. We obtained a copy of their database to determine what it was as unable to reproduce. Turned out the specific data in the grid exposed a strange bug in the Microsoft .net framework code related to some internal XML handling we do for grid sorting and filtering."
The latest hotfix, AyaNova 7.0.4.0 Hotfix 7, may resolve this issue for you.
More information about this hotfix is here:
http://forum.ayanova.com/showthread.php?1509-AyaNova-7.x-Maintenance-Update-Fixes-amp-QuickFixes
Please note that if you are using a hosted AyaNova system with us, it is important that the version on the hosted server and the version on the clients are the same.
Our maintenance window is from 7-9 PM PST (GMT -8) during which such updates can be applied. If you are running 7.0.4.0 and would like this hotfix applied, here is the procedure:
- Contact Us to request an update be scheduled.
- We will apply the update during the maintenance window and email you to confirm it is complete.
- Anytime after this, apply the hotfix locally to all of your Windows Data Portal installations.
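If you are outside the Pacific time zone, it can help to check whether a given moment falls inside the 7-9 PM PST maintenance window. The following is a minimal sketch under the assumption that the window is a fixed GMT -8 offset, as stated above, with daylight saving time ignored; the function name in_maintenance_window is our own.

```python
from datetime import datetime, time, timedelta, timezone

# Fixed GMT -8 offset, as stated above; daylight saving time is ignored.
PST = timezone(timedelta(hours=-8), "PST")
WINDOW_START = time(19, 0)  # 7 PM PST
WINDOW_END = time(21, 0)    # 9 PM PST


def in_maintenance_window(moment):
    """Return True if the timezone-aware `moment` falls within 7-9 PM PST."""
    local = moment.astimezone(PST).time()
    # The start is inclusive and the end exclusive: 9:00 PM is outside.
    return WINDOW_START <= local < WINDOW_END
```

Passing datetime.now(timezone.utc) checks the current moment regardless of where your own computer is located.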
If you would prefer to apply the hotfix before the server is updated (i.e., so that it is ready for your staff the next morning without them having to do anything), that is OK, but we recommend that you apply the update and then shut down the systems. The goal is to avoid ever executing a copy of the AyaNova program that is not the exact same version as the hosted system, so advance updates are OK as long as the program isn't executed before the server is updated.
If you did happen to execute a local copy of AyaNova that was slightly newer than the server, don't worry; it would not damage any data. One of two things would happen: either the software would throw an error on startup indicating that the versions do not match and immediately shut down, or it would go ahead and load normally. If it loads normally, it is most likely usable, but we do not recommend using it in this mismatched state for any significant activities (read-only use would be a good idea in that case).
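The rule of thumb, that the client and the hosted server should be on exactly the same version before the program is run, can be expressed as a simple comparison. This is a hypothetical sketch for your own checklists or scripts: it assumes you already know both version numbers as dotted strings such as "7.0.4.0", the function names are our own, and it says nothing about how AyaNova itself performs its startup check.

```python
def parse_version(text):
    """Turn a dotted version string such as "7.0.4.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def safe_to_run(client_version, server_version):
    """Return True only when client and server versions match exactly."""
    return parse_version(client_version) == parse_version(server_version)
```

For example, safe_to_run("7.0.4.0", "7.0.4.0") is True, while any difference in any of the four numbers makes it False.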
Service interruption, April 7, 2010
Some service outages occurred today preventing HTTP access to the server. We believe the problem has been resolved and everything is operating normally again.
The issue was that the IIS server was failing to service requests and returning an ECONNRESET. The problem was noticed around 3 PM PST and again around 5 PM PST. The first occurrence was temporarily resolved by restarting the web services while the investigation began. The cause was not discovered until the 5 PM PST outage forced a server reset.
Around this time, the investigation revealed some configuration issues with a mandatory update of the server firewall/antivirus/security system. Some time ago it was discovered that an optional email scanner feature was creating an excessive number of TCP endpoints that were not being terminated. During the update of the security software, this feature was inadvertently re-enabled and was very likely (but not 100% verified to be) the cause of the failures today. This feature has been disabled.
Also around this time, it was noticed that the update did not correctly bring in the previously stable firewall configuration from the earlier version, so the firewall needed to be reconfigured. This required additional maintenance outside the standard maintenance window of 7 PM - 9 PM PST due to the urgency of having a proper firewall configuration.
We apologize for the inconvenience and service interruption. We've done everything in our power tonight to ensure that everything is configured properly, and the situation will be monitored diligently so that swift action can be taken in case the cause of the initial ECONNRESET problem is not yet fixed.
Longstanding, devious, server performance vampire vanquished today
A very devious, tricky, and longstanding bug affecting server performance has been eliminated today. I feel like the coyote having finally caught the roadrunner on this particular issue. It has been my enemy for a long time and it is finally vanquished. I expect notable performance increases across the board, as this was a very low-level issue affecting raw filesystem and kernel performance, thus affecting every area of the system.
Server issue resolution info
The server issue that occurred (discussed in the previous blog post) has been resolved. The issue was linked to an anti-virus email scanner. While the issue is not fully understood, as it has not yet been duplicated on another system, it was unmistakably linked to this scanner: disabling it resulted in a total recovery of the system without having to restart services (note that previously, even restarting the services completely failed to resolve it).
The TCPView utility from SysInternals reported this scanner as having a large number of TCP endpoints (and possibly half-open or fully open connections) which were not shutting down. It may also have overloaded the Named Pipes, causing other havoc, but primarily the connections and endpoints appear to have caused the system to hit a limit preventing new ones from being created, essentially blocking the web server from communicating with the database. The scanner has been shut down until the situation is better understood, thus returning the server to a fully functioning state.
This scanner had been enabled and tested many weeks ago and was thought to be completely stable, which made it difficult to zero in on the problem. Why it suddenly started behaving in this manner is unknown, as there was no seriously increased influx of spam or other messages that would clearly have led to this; even so, it should have been designed to handle it better.
Thanks to our customers for their patience and prompt reporting of issues, which was helpful in resolving the situation. We apologize for the interruptions that were caused.
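For the technical among you, the kind of per-process endpoint tally that TCPView shows can be roughly approximated from saved netstat output. The sketch below is not the tool or procedure used in this investigation; it assumes text in the column layout printed by the Windows command netstat -ano (protocol, local address, foreign address, state, PID), and the function name tcp_endpoints_by_pid is our own.

```python
from collections import Counter


def tcp_endpoints_by_pid(netstat_output):
    """Count TCP rows per PID in text shaped like Windows `netstat -ano` output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five columns: proto, local, foreign, state, PID.
        # Header lines and UDP rows do not match and are skipped.
        if len(fields) == 5 and fields[0] == "TCP" and fields[4].isdigit():
            counts[int(fields[4])] += 1
    return counts
```

Calling most_common(5) on the result lists the five processes holding the most TCP endpoints, which is where a runaway process of this kind would be expected to stand out.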
Server issue, Dec 23, 2009
A server issue occurred this morning. The result of this issue was that some site functions and AyaNova systems were unavailable. The basic cause was essentially that the databases stopped responding to internal requests from the web server.
Due to the time of day and the nature of the problem as it could best be understood at the time, I made some adjustments to various configurations and then did a full shutdown / reboot in order to get everything up and running as quickly as possible. The system is back up, and hopefully the changes made will keep it stable throughout the day, with no further interruption, until further investigation can be done.
The problem is not yet fully understood due to the extensive log data that I will need to examine in order to properly investigate, which I will be doing throughout today. Any further adjustments to the server will be made during the standard maintenance window of 7-9 PM tonight if necessary.
For the technical among you and in the interest of full disclosure:
The problem spanned Microsoft SQL and extended to MySQL, causing both to fail to respond to requests simultaneously. Restarting various services revealed that the IIS FTP service refused to restart, with an error regarding “insufficient storage” (of course, it is not a disk-related error), and while the databases restarted without incident, they failed to respond to requests. The Event Viewer reported nothing of interest; however, the Microsoft SQL logs have some information that will need to be examined further. My sense of the matter, based on the FTP service failure message, is that it lies somewhere in the arena of the Microsoft MSDTC system and how it internally maps ports in the networking layers for requests.
Some significant Windows Updates were applied on Sunday; they were tested ahead of time and worked fine Monday and Tuesday, but as always, unforeseen issues can appear later, as may be the case here. The adjustments made to hopefully reduce the problem for today were reducing the port usage range for MySQL and reducing the total number of active connections in the AyaNova IIS Application Pool. There was a known MSDTC problem noted in earlier versions of IIS, but it was resolved in the version we are running, making the workarounds for the previous versions invalid for this one, which is another reason I suspect the recent Windows Updates may be at work here.


