Update CCADB Incident Reporting Guidelines (Fall 2024) #186
Hi Ryan! I've gone ahead and left a bevy of comments here. The vast majority are just line-edits around things like "i.e.", "e.g.", and markdown whitespace; just a couple are more substantive comments.
I think my overall feedback is:
- The updated instructions and textual requirements are really good improvements!
- The distinction between "action items" and "ongoing commitments" feels arbitrary and awkward -- if I make a code change two weeks after the incident ended that prevents this from happening ever again, is that an action item (specific task with a deadline) or an ongoing commitment (designed to prevent a future incident)? Also this distinction is only used once: in the incident closure template. It's never referenced again, not even in the incident report template. Also, the incident closure template is embedded in the midst of the requirements, instead of being with the other templates at the bottom of the doc. Altogether this leads to the "ongoing commitments" section being jarring and out-of-place when reading the doc.
- The new incident report template is difficult to read. There is so much going on there, and the use of square brackets to indicate text that should be replaced is confusing in a markdown context where square brackets generally denote hyperlinks. It's also unclear whether we're supposed to simply address each bullet point, or copy each bullet point into our report.
### What should Root Cause Analysis consider?
This section is somewhat confusing to me. It's very good material! But it doesn't contain any MUST/SHOULD/etc requirements, and its content seems to overlap with advice in the new "incident-reporting.md" doc below. Can we consolidate advice into one doc, and requirements into the other?
### Action Items
| Characteristic | Action Item | Ongoing Commitment |
| --------------------- | ------------------------------ | ---------------------------------------------- |
| **Objective** | Resolve the immediate incident | Prevent future incidents and improve practices |
This feels like a strong re-definition of the term "Action Item" away from how we've been using it in this community. For the most part, by the time an incident report has been uploaded, all action items aimed at resolving the immediate incident have long since been completed. It's only the forward-looking things (putting new processes in place, adding safeguards, deleting old systems, etc) that generally get listed as Action Items at the bottom of a report.
Are we saying that all of those are now Ongoing Commitments instead?
I agree - how can these two concepts be clearly distinguished?
|[2.0](https://github.com/mozilla/www.ccadb.org/blob/master/incident_archive/ir_version_2_0.md)|October 17, 2023|
|[1.0](https://github.com/mozilla/www.ccadb.org/blob/master/incident_archive/ir_version_1_0.md)|February 15, 2023|
## Audit Incident Reporting
Shouldn't it be "Audit Finding Reporting"?
A finding does not have to be an incident... (applies also for the rest of the template)
@romanf I believe the intent of using the naming "Audit Incident Report" is to differentiate that type of report and its contents from an "Incident Report". Depending on Root Program Policies, the "Audit Incident Report" may be required for any non-conformities, qualifications, or modified opinions included in an Audit. The "Audit Incident Report" may include one or more findings, but it is focused on the root cause of each and action items to address the root cause. The other type of report (i.e., "Incident Report") intends to include even more content.
Co-authored-by: Aaron Gable <[email protected]>
In response to @aarongable's comment: Thank you for the speedy and detailed review! We also very much appreciate the set of suggested edits made to the draft, many of which have been accepted.
Thanks for sharing this perspective. The Steering Committee can discuss this further and someone will report back on this comment.
One of the goals of consolidating the reporting expectations within the template was to promote more reliable collection of the required information. As an example, many Incident Reports filed over the last year (and beyond) have failed to include a narrative on how the root cause(s) of a subject incident avoided detection, yet this expectation is described on CCADB.org (below).
Our thought was that perhaps because the reporting expectations were separate from the reporting template, it contributed to the inadvertent omission of required information. Questions:
Thanks again!
Community feedback expressed it would be helpful to more directly present the status of action items, rather than inferring status based on date.
Open Incident Reports MUST be updated:
- on or before the "Next update" date in the "Whiteboard" field of the bug (note: CA Owners MAY request the "Next update" Whiteboard field be set by a Root Store Operator to align with a specific date related to an open Action Item.);
- weekly, if a "Next update" date is not recorded;
Given the more prescriptive requirements being added elsewhere, I wonder if "weekly" is sufficiently descriptive of this expectation. For example, can someone post an update on a Monday and then on Friday of the following week? I'm not a fan of making this strictly 'within 7 days' either. Perhaps SHOULD weekly and MUST within 14 days would be a good balance here.
7. The Appendix is for all supporting data: log files, graphs and charts, etc. In particular, in the case of incidents which directly impacted certificates, the Appendix must include a listing of the complete certificate details of all affected certificates. The recommended format is to ensure that all affected certificates are logged to CT, then to attach a text file where each line is of the form `https://crt.sh/?sha256=[sha256 fingerprint of the certificate]`. When the incident being reported involves an S/MIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA-256 hash of the certificate.
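The recommended Appendix format can be produced mechanically. The following is a minimal Python sketch, assuming the affected certificates are available as a single PEM bundle; the names `PEM_BLOCK`, `crtsh_url` and `appendix_listing` are hypothetical, and only the standard library (`hashlib`, `re`, `ssl`) is used. It computes each fingerprint as the SHA-256 digest of the certificate's DER encoding and lists each certificate once.

```python
import hashlib
import re
import ssl

# Matches each PEM certificate block in a bundle of concatenated certificates.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def crtsh_url(der_bytes: bytes) -> str:
    """Hypothetical helper: crt.sh lookup URL for one DER-encoded certificate.

    The fingerprint is the SHA-256 digest of the DER encoding, written as hex.
    """
    fingerprint = hashlib.sha256(der_bytes).hexdigest().upper()
    return f"https://crt.sh/?sha256={fingerprint}"


def appendix_listing(pem_bundle: str) -> str:
    """Hypothetical helper: one crt.sh URL per line for every certificate in a PEM bundle.

    Duplicate certificates are listed once, in first-seen order.
    """
    urls = []
    for block in PEM_BLOCK.findall(pem_bundle):
        # ssl.PEM_cert_to_DER_cert strips the PEM armor and base64-decodes the body.
        url = crtsh_url(ssl.PEM_cert_to_DER_cert(block))
        if url not in urls:
            urls.append(url)
    return "".join(url + "\n" for url in urls)
```

Writing the returned string to a `.txt` file gives the attachment described in item 7. The sketch does not check that each certificate has actually been logged to CT; that step remains with the CA Owner.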
### Incident Report Template
In the case of Incident Reports with a Whiteboard field of "revocation-delay", Incident Reports MUST be updated every 72 hours to describe a summary of:
Frankly, this feels punitive. Is there much value in getting this info every 3 days rather than every 7?
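To make the draft's update cadence concrete (on or before the "Next update" date, otherwise weekly, and every 72 hours for "revocation-delay" reports), here is a minimal Python sketch. The function `next_update_due` is hypothetical, and several readings in it are assumptions rather than draft text: "weekly" means within 7 days of the previous update, the end of the "Next update" day in UTC is the deadline, and when several rules apply the earliest deadline wins.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional

# Assumption: "weekly" is read as "within 7 days of the previous update".
WEEKLY = timedelta(days=7)
# From the draft: "revocation-delay" reports are updated every 72 hours.
REVOCATION_DELAY = timedelta(hours=72)


def next_update_due(
    last_update: datetime,
    next_update: Optional[date] = None,
    whiteboard_flags: frozenset = frozenset(),
) -> datetime:
    """Hypothetical helper: latest time the next report update may be posted.

    last_update must be timezone-aware. When several rules apply, the
    earliest deadline is returned (an assumption, not stated in the draft).
    """
    if last_update.tzinfo is None:
        raise ValueError("last_update must be timezone-aware")

    candidates = []
    if next_update is not None:
        # "On or before" the Next update date: treat the end of that UTC day as the deadline.
        candidates.append(datetime.combine(next_update, time.max, tzinfo=timezone.utc))
    else:
        candidates.append(last_update + WEEKLY)

    if "revocation-delay" in whiteboard_flags:
        candidates.append(last_update + REVOCATION_DELAY)

    return min(candidates)
```

For example, a report last updated at 2025-01-06 12:00 UTC with no "Next update" date would be due by 2025-01-13 12:00 UTC, or by 2025-01-09 12:00 UTC if its Whiteboard carries "revocation-delay".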
2. **Actionability**: The commitment describes concrete steps the CA Owner will take, ensuring it's not merely a statement of intent or a vague promise.
3. **Measurability**: The commitment includes a way to track progress and assess whether the CA has fulfilled its pledge. This could involve public reporting, audits, or other verifiable means.
I find this requirement quite confusing. How is "public reporting" (presumably by the CA) a form of measurability that is verifiable? Requiring external verifiability seems to greatly restrict the actions that can be ongoing commitments. In the example below, I fail to see how "The presence of linting failures where certificates were issued more than 7 days after a new lint was released will represent a failure of this ongoing commitment." meets this requirement, unless we're saying that "verifiable" only means that the community will notice if/when something goes wrong that implies that the commitment was not met, and that seems pointless.
I agree that having the requirements and template in separate places probably did lead to inadvertent omission of required information. However I don't think that moving the requirements into the template will be an effective solution. Details below.
Yes, simply because I think that formatting requirements-within-comments-within-markdown-within-a-code-block-within-markdown makes those requirements difficult to read and understand. I think that this arrangement is a good idea in theory, but I think in practice it will result in difficult-to-read reports that poorly copy-paste bullets from the template and accidentally leave formatting or prompt text in place. Also, I cannot imagine a world in which all of the requirements (e.g. how often reports need to be updated) end up within the template. As such, people writing incident reports will need to be referencing requirements in multiple places anyway. Having to cross-reference requirements from multiple sources increases the cognitive load required to write the report, and will result in slower and less well-constructed reports. The more onerous the report-writing process becomes, the more rules-as-written (as opposed to spirit-of-the-rules) the reports will be.
I would follow RFC 3647's lead on tackling this issue. The requirements document should have the exact same set of headings as the template, and the contents of each section should be the requirements concerning exactly that section. Then the templates can just be (markdown codeblock formatted) lists of empty headings, contained in an appendix.
The purpose of incident reporting is to help us work together to build a more secure web. Therefore, the incident report should share lessons learned that could be helpful to all CA Owners in building better systems. The incident report should explain how systems or processes failed, how the mis-issuance or incident was made possible, and why the problem was not detected earlier. In addition to the timeline of responding to and resolving the incident, the incident report should explain how the CA Owner's systems or processes will be made more robust, and how others may learn from the incident.
If being reported by the CA Owner corresponding with an incident, all fields included in the relevant template MUST be completed.
It might be helpful to add that fields that are not applicable (e.g. Appendix) shall be included in the report and marked as such. I'm not a big fan of being so prescriptive, but I think the "MUST be completed." language takes us down this road.
| Action Item | Kind | Due Date |
| ----------- | ---- | -------- |
| Example | Prevent | 2038-01-19 |
-->
Where do ongoing commitments belong?
In October 2023, the CCADB Steering Committee, with valuable feedback from the CCADB Public community, updated the CCADB Incident Reporting Guidelines (IRGs). While the resulting updates have led to some reports becoming more useful and effective, Root Store Operators have continued to stress the importance of high-quality incident reports during CA/Browser Forum Face-to-Face updates and elsewhere.
In the spirit of continuous improvement, the CCADB Steering Committee has worked over the past few months to further enhance the effectiveness of the IRGs.
Objectives for this update to the IRGs include:
The set of proposed updates are described in this Pull Request.
These changes should not be considered “final”, but instead a “work in-progress” that we hope to enhance through community contributions. Feedback is welcome either on this Pull Request (preferred), or via the discussion thread on public [at] ccadb [dot] org by January 15, 2025.