[ICANN-CSC] [Ext] Re: Notification of a recent service performance incident
amy.creamer at iana.org
Thu Jun 9 22:33:27 UTC 2022
Thank you for the question.
The documented change script had the side effect of undefining the setting that controlled the outbound delivery of email. Unfortunately, this error was not detected during review and testing, as the affected setting applied only in the production environment. The deployment process, while managed under our change control processes, is partly designed and constrained by the architecture of our current platform. One objective of the next generation of the root zone management system is to incorporate modern deployment practices that we believe would have prevented this error. This release is planned for later this year.
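As an illustration only (this is not IANA's tooling, and every setting name and value below is invented), a pre-deployment check for this class of error could compare each environment's configuration before and after a change script runs and flag any setting that ends up undefined. A minimal sketch, assuming configurations can be read as simple key/value mappings:

```python
# Hypothetical sketch: flag settings that a change would leave undefined.
# All setting names and values here are invented for illustration.

def undefined_after_change(before: dict, after: dict) -> set:
    """Return keys that had a value before the change but are missing or None after it."""
    return {
        key
        for key, value in before.items()
        if value is not None and after.get(key) is None
    }

# A production-only setting disappears after the change script runs.
production_before = {"db_host": "db.internal", "outbound_mail_relay": "smtp.internal"}
production_after = {"db_host": "db.internal"}

# The same setting never existed in staging, so testing there shows nothing.
staging_before = {"db_host": "db.staging"}
staging_after = {"db_host": "db.staging"}

lost_in_production = undefined_after_change(production_before, production_after)
lost_in_staging = undefined_after_change(staging_before, staging_after)
```

In this sketch the check reports nothing for staging and reports the mail setting for production, which is why such a comparison would need to run against the production configuration itself and not only against test environments.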
Director of Operations
From: Gaurav Vedi <gaurav.vedi at gmail.com>
Date: Thursday, June 9, 2022 at 10:00
To: lpyadav <lpyadav at nepal.gov.np>
Cc: Brett Carr <brett.carr at nominet.uk>, Gaurav Vedi via ICANN-CSC <icann-csc at icann.org>, Amy Creamer <amy.creamer at iana.org>
Subject: [Ext] Re: [ICANN-CSC] Notification of a recent service performance incident
Thanks Amy for keeping us informed.
Glad that the issue has been fixed. As Brett suggested, we should briefly discuss this in the next CSC meeting.
Just a quick question. Did the engineer manually change the system setting, or incorrectly change some automation/deployment scripts, which caused the issue? It may be worth hardening the process here for robust deployment.
On Thu, Jun 9, 2022 at 10:44 AM lpyadav via ICANN-CSC <icann-csc at icann.org> wrote:
Laxmi Prasad Yadav
----- Original Message -----
From: Brett Carr via ICANN-CSC <icann-csc at icann.org>
To: Amy Creamer <amy.creamer at iana.org>, Gaurav Vedi via ICANN-CSC <icann-csc at icann.org>
Sent: Thu, 09 Jun 2022 18:29:55 +0545 (NPT)
Subject: Re: [ICANN-CSC] Notification of a recent service performance incident
Thanks for keeping us informed. It feels like a one-off issue that you have identified a fix for and are learning lessons from (we’ve all been there), so I don’t believe there is anything for the CSC to worry about here. It is perhaps worth spending 10 minutes in next week’s meeting to give us a quick overview of what happened, the impact, and the changes you are making to ensure it doesn’t recur.
Manager DNS Engineering
From: ICANN-CSC <icann-csc-bounces at icann.org> on behalf of Amy Creamer via ICANN-CSC <icann-csc at icann.org>
Date: Wednesday, 8 June 2022 at 21:08
To: Gaurav Vedi via ICANN-CSC <icann-csc at icann.org>
Subject: [ICANN-CSC] Notification of a recent service performance incident
We would like to inform you of a system issue that degraded our service performance recently. This issue will impact SLA performance for May, and is expected to have an impact on our performance in June.
On 20 May 2022, during a planned upgrade to the Root Zone Management System (RZMS), an engineer managing the release mistakenly altered a system setting, which caused automatically generated emails from the system to not be delivered to external parties. These emails included prompts for TLD contacts to authorize change requests, and notifications of system events such as technical check remedy and completion. Additionally, users were unable to receive links via email to reset their passwords.
The issue was not caught through our SLA monitoring, as emails were being sent successfully from the system but were not being routed correctly once they had left RZMS. Customers first reported issues on 27 May 2022, and the issue was identified and resolved on 31 May 2022.
We’ve identified that six TLD change requests and five users attempting to reset their password were impacted. While the issue did not trigger any deviation from the SLAs as they are measured, we believe it would be consistent with the intent of the SLAs to show these instances as breaching the SLAs. For this reason, we have proactively adjusted the timings for the SLAs in these cases where possible.
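To illustrate the gap between "sent" and "delivered" described above, one common approach is a canary check: send a test message to an external mailbox and alert if it has not arrived within a deadline. The sketch below is an assumption for illustration, not a description of RZMS monitoring; the function name, the status labels, and the 30-minute threshold are all invented.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical sketch: classify a canary email by whether it reached an
# external mailbox, not merely by whether the sending system accepted it.
def delivery_status(
    sent_at: datetime,
    received_at: Optional[datetime],
    now: datetime,
    max_delay: timedelta,
) -> str:
    """Return "delivered", "pending", or "undelivered" for one canary message."""
    if received_at is not None:
        return "delivered"
    if now - sent_at > max_delay:
        return "undelivered"  # sent successfully, but never arrived externally
    return "pending"

# Example values, invented for illustration.
sent = datetime(2022, 5, 20, 12, 0)
limit = timedelta(minutes=30)

status_ok = delivery_status(sent, datetime(2022, 5, 20, 12, 2), datetime(2022, 5, 20, 12, 5), limit)
status_waiting = delivery_status(sent, None, datetime(2022, 5, 20, 12, 10), limit)
status_lost = delivery_status(sent, None, datetime(2022, 5, 20, 13, 0), limit)
```

A check of this kind measures arrival outside the system, so it would flag the case where messages leave the sender successfully but are not routed onward.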
IANA is working with ICANN’s Engineering and IT Department on adding safeguards to future upgrades to prevent this issue from recurring.
We are happy to answer any questions you may have, either on the list or at our meeting next week.
Director of Operations, IANA Services
Email: amy.creamer at iana.org
ICANN-CSC mailing list
ICANN-CSC at icann.org