Friday Sept 16 NZT Site Outage - Now with Incident Report
Here is the incident report for this issue.
This issue was triggered during our automated backup of the game database. We are not absolutely sure what happened yet, but the backup caused the database software to panic. Normally this would lead to the server aborting all open transactions, restarting the database and then retrying all the transactions, continuing where it left off. Unfortunately the DB recovery code was not tested well enough, and the server managed to get into a state in which it thought it had recovered, yet every transaction returned a database error.

While our monitoring did detect the issue, nobody was awake to see the problem and attempt to fix it. When I woke up this morning I restarted the realm and this brought everything back up.

Here are the steps we will be taking in response to this issue.

1) Investigate the cause of the original issue and attempt to fix it. We will be asking the developers of our database software questions about what occurred. This should prevent this specific issue happening again.

2) Fix the code that recovers the database after a failure. This should prevent problems occurring with the database in the future.

3) Investigate our options for adding support for sending an SMS message to our developers, so that if something goes wrong in the middle of the night we can investigate it immediately.

4) Potentially add an automatic restart to our monitoring, so that if the realm remains in a bad state for more than 5 minutes, it is restarted automatically. This is something that we have been considering but are reluctant to do, because we might end up causing more problems if there are false positives.

While we are in beta, server problems are expected, but longer term downtime is not acceptable. We apologise for this issue and hopefully with these steps it will not happen again!

Path of Exile II - Game Director
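To make step 4 concrete, here is a minimal sketch of how a restart-after-5-minutes check might look. It is not the actual monitoring code: `RestartWatchdog`, `probe` and `restart` are hypothetical names, and it assumes the monitoring can run a small test transaction against the realm. Requiring the realm to stay unhealthy for the whole grace period is one way to reduce the false positives mentioned above, since a single failed check never triggers a restart by itself.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RestartWatchdog:
    """Restart the realm only after it has been unhealthy continuously."""

    # Hypothetical hook: returns True if a test transaction succeeds.
    probe: Callable[[], bool]
    # Hypothetical hook: restarts the realm.
    restart: Callable[[], None]
    # "More than 5 minutes" from the report, expressed in seconds.
    grace_seconds: float = 300.0
    # Injectable clock so the logic can be tested without waiting.
    clock: Callable[[], float] = time.monotonic
    # Time at which the current run of failures started, if any.
    unhealthy_since: Optional[float] = None

    def tick(self) -> bool:
        """Run one health check; return True if a restart was triggered."""
        now = self.clock()
        try:
            healthy = self.probe()
        except Exception:
            # A probe that raises counts as a failed check.
            healthy = False

        if healthy:
            # Any successful check clears the failure window.
            self.unhealthy_since = None
            return False

        if self.unhealthy_since is None:
            # First failure in a row: start timing, do not restart yet.
            self.unhealthy_since = now
            return False

        if now - self.unhealthy_since >= self.grace_seconds:
            self.restart()
            # Start a fresh window so the restart is not repeated on the next tick.
            self.unhealthy_since = None
            return True

        return False
```

A monitoring loop would call `tick()` on a fixed interval (for example every 30 seconds). The same failure window could also be used to decide when to send the SMS alert from step 3, well before the restart threshold is reached.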
| |
beta key and a apologise u :-)
kiddin |
|
Conflicts may arise on the Apache server when running a backup, depending on what other processes are running. Mail services such as SpamAssassin in particular can occasionally crash the system when running a database backup. Just my two cents.
|
|
" Hi!!111! it's me... debbbyy server... I kow itz 3am buut I wazout nd I think u need a bootycall so come over ok and we have some fuuuuuunnnnnnnnnnsd;fdkmv *Dev's wife rolls over and asks... uhh who was that? No, seriously you're just going to leave at 3am? Who was that??? GIVE ME YOUR PHONE! OMG WHO IS DEBBIE????* *uhh hun, that's the database server at work* If you have account problems please [url="http://www.pathofexile.com/support"]Email Support[/url]
| |
" I loled. Seriously though, I thought this was a DDoS attack after experiencing a few on gaming websites a couple of months back. Glad to know this wasn't an attack and good luck with fixing the semantics. |
|
I was wondering yesterday evening why I could not log on and only got a "faulty login" reply. It was strange at the time that there was no mention of problems in the news or in the forums, but the fact that nobody was able to log in does kind of explain this ;)
I must say that I am impressed by the openness of the developers regarding the situation and their response to it. This kind of attitude gives me hope of plenty of good things to come. Keep up the good work! |
|
Nice to see that GGG reports problems to us.
|
|
Thanks for repairing that problem ;) Like GGG :]
http://pathofexile.com/forum/view-thread/2243
^^^^^Polish Corner^^^^^ |
|
Thanks Chris and Jonathan for keeping us up to date.
|
|
Cool. Good to know. Gave me a fright last night.
|
|