Evaluation Criteria for Conformance Models #34
-
For details of how the metrics from the WAI Symposium were applied to WCAG 3, see Metrics and Plan for Evaluating Conformance Scoring for WCAG 3 from the Conformance Architecture Testing subgroup.
-
@rachaelbradley Can you explain why you think WCAG 3 needs a conformance model that is built on a scoring system? I know there are some difficulties we're hoping to solve with this. I think the big one is that some people have been asking that WCAG 3 allow sites with minor issues to conform. That has gotten push-back both from the "WCAG is a ruler, not the rule" crowd and from the "no exceptions" crowd. Until we have agreement that conformance can ever mean less than meeting 100% of the requirements, I don't think we can make that a requirement for the conformance model.

Second, I'm somewhat skeptical that it's even possible to come up with a scoring system that meets all those criteria, especially the reliable and equitable parts. Let's say we have a scoring of 1-10. Instead of needing to ensure that we have one equitable conformance level, we now have to ensure that we have 10 equitable grades. It gets harder the more granularity you add. And on the reliability front, it gets harder the more you allow testers to decide what is and isn't important / critical / essential for someone with a disability. That feels inherently problematic and ableist to me.

I do appreciate there are significant challenges that come from full conformance being an almost unachievable target for a lot of organizations. That feels much more like a policy problem than a standards problem. Policies should describe to what extent it is okay for things to be imperfect, and what measures an organization should provide to compensate for those shortcomings. There is far more flexibility in that space than there is in WCAG 3's conformance model to consider things like how responsive the help desk is, what non-web alternatives are available, how quickly issues can be resolved, etc.
-
Please be sure to include equity as one of the criteria for evaluating conformance models. As we compare models, we need an approach that supports equity across the disability categories. This is a must.
-
Re complexity/taking "a reasonable amount of time to test": This makes sense as a criterion, but it's also concerning. Are we basically asking how much of the testing can be automated? The Metrics and Plan for Evaluating Conformance Scoring for WCAG 3 that Jeanne shared above says the way to test conformance models for complexity is to "ask experts to run a test on a site where they know how long it took to test with WCAG2 and compare how long it took to test with WCAG3." But one goal of WCAG 3 is to cover more user needs, including more needs of people with cognitive disabilities. Will an emphasis on test time mean we're less likely to cover certain user needs? Or that we'll have to cover them in a less rigorous way, such as an assertion? As we look at the current set of criteria, will equity serve as a balance to testing time? Will each criterion have the same weight?
-
Re: Adequacy
I think this should be "Proportionality" instead. As it currently stands, the title and definition allow a large change at the guideline level to create only a small scoring change. I feel large guideline changes should create large scoring changes, and small guideline changes should create small scoring changes. A toy illustration of the difference is sketched below.
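To make the proportionality concern concrete, here is a minimal sketch comparing a coarse, banded score with a proportional one. Everything here is invented for the illustration (the band thresholds, the 0-3 and 0-10 scales); it is not taken from any draft conformance model.

```python
# Toy comparison: a banded score can absorb a large change at the
# guideline level, while a proportional score tracks it.

def banded_score(pass_rate: float) -> int:
    """Hypothetical coarse score: bands 0-3."""
    if pass_rate >= 0.95:
        return 3
    if pass_rate >= 0.80:
        return 2
    if pass_rate >= 0.50:
        return 1
    return 0

def proportional_score(pass_rate: float) -> float:
    """Hypothetical proportional score on a 0-10 scale."""
    return round(10 * pass_rate, 1)

# A sizable regression (94% -> 81% of instances passing) is invisible
# to the banded model but clearly registers in the proportional one.
for rate in (0.94, 0.81):
    print(rate, banded_score(rate), proportional_score(rate))
# 0.94 -> band 2, proportional 9.4
# 0.81 -> band 2, proportional 8.1
```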
-
We should be aware that past versions of WCAG have not fully managed to support all disabilities equally. These criteria failed at creating equality, and at putting the user first. Therefore we must be careful not to repeat the same error and exclusion. Just because groups were excluded in the past does not mean they should be excluded in the future. Complexity and repeatability? No thank you. Not when compared to the need to meet our core mandate: to make guidelines that tell content creators how to include people with disabilities. To me this is not just about making tests that meet the needs of the testers. Does the content meet the needs of the users? Does it favor some groups over others? Do design choices make the effects of disabilities worse, or even create disability? Equity, equality.
-
I think it's very important to ensure that user testing is recognized as a valid way to test for reliability. There will always be aspects of accessibility that don't fit into a black and white, automated testing scheme -- which is why we should not discount the importance of well-structured user testing. Well-structured user testing follows a measurable protocol that identifies patterns in user feedback. These patterns are what equate to "consistency" and "reproducibility". "Repeatability" does not mean "fast" or "convenient" or "cheap." We need to focus on the intent of the term and recognize that if we are truly concerned about including historically excluded populations in these guidelines, then we will validate user testing as a necessary tool for measuring accessibility for certain populations and certain criteria.
-
Validity, sensitivity, and adequacy as defined above seem not so much separate as interlinked, and dependent both on the granularity of the scale and on the availability of additional conditions captured. In the scoring example cited, this is the "critical error" device, which overrules the arithmetic approach (say, dividing the number of images with appropriate alt by the total number). The mix of instances falling under a particular guideline, each of which can have a different impact when not implemented properly, makes an arithmetic approach dubious.

As a human tester looking at, say, all images on a page, I process both the qualitative aspect, the estimated impact from high (say, a missing name on a critical image-based control) to low (say, bad alt on a teaser image that is followed by a linked teaser heading), and the quantitative aspect (how many images are we talking about here). So I simultaneously process qualitative and quantitative information in determining where to rate it on our 5-point Likert scale (which, by the way, also maps onto WCAG pass/fail, losing granularity by doing so). Obviously there is some subjectivity in that rating, and other evaluators may arrive at a different result. But it saves evaluators going through a detailed (and by that token, complex and time-consuming) process as described in the scoring example. One could argue evaluation would hardly be efficiently doable otherwise, and WCAG 2.X PASS/FAIL raters apply something similar today in deciding whether content is still within the tolerances of "PASS".

Would doing it the complex way, as in the scoring example, improve replicability? Possibly, but at the penalty of being captive to a process that is much more time-consuming than assessing the likely impact of issues on a page and doing that calculation as part of an expert assessment that thankfully need not be explicit to the dot (even though it could be explained and laid out in more detail in any post mortem analysis).
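For readers who want to see the shape of the arithmetic-plus-override approach being discussed, here is a minimal sketch. It is an illustration under stated assumptions, not the cited scoring example: the 5-point scale, the thresholds, and the PASS/FAIL cutoff are all invented for the sake of the example.

```python
# Sketch of an arithmetic rating per guideline, overruled by a
# "critical error" device, then collapsed to pass/fail.

def guideline_rating(passing: int, total: int, critical_errors: int) -> int:
    """Rate one guideline on a 5-point scale (5 = best)."""
    if critical_errors > 0:
        return 1  # the "critical error" device overrules the arithmetic
    ratio = passing / total if total else 1.0
    # map the pass ratio onto the 5-point scale (illustrative cutoffs)
    if ratio >= 0.98:
        return 5
    if ratio >= 0.90:
        return 4
    if ratio >= 0.70:
        return 3
    if ratio >= 0.50:
        return 2
    return 1

def to_pass_fail(rating: int) -> str:
    """Collapse to a WCAG 2.x-style PASS/FAIL, losing granularity."""
    return "PASS" if rating >= 4 else "FAIL"

# 47 of 50 images have appropriate alt, but one missing name on a
# critical image-based control drags the rating to the floor:
print(guideline_rating(47, 50, critical_errors=1))  # 1 -> FAIL
print(guideline_rating(47, 50, critical_errors=0))  # 4 -> PASS
```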
-
When the Silver Task Force created conformance prototypes in what was called phase 3, circa 2018-19, we had and/or created evaluation criteria to compare and discuss each one. I do not recall this list of criteria from a 2011 symposium being among them. Please ensure that those evaluation criteria are also considered. Other criteria that I can imagine:
-
Just a note that these are considerations to discuss when evaluating conformance and not requirements. The purpose is that we understand tradeoffs.

On Jan 18, 2024, Lisa Seeman wrote:
> we run the risk of prioritizing supporting people's business model over user needs. That is what these requirements are heading towards (again)
-
This comment is from the COGA subgroup. As we consider WCAG 3.0, we need to be mindful that equity is foundational to ensuring the representation of historically marginalized groups. In the context of WCAG, equity means ensuring users with any disability or combination of disabilities have a digital experience functionally equivalent to the experience provided to people without disabilities.

As the group responsible for writing WCAG 3.0, we must guard ourselves against falling into subjective measures that can lead to exclusion. For example, arguing that a proposed guideline should meet the needs of a minimum, arbitrary number of people is an ableist bias that has led to the exclusion of certain types of disabilities in previous versions of WCAG. We must be committed to equity for equity's sake. We cannot allow ableist bias to creep into our justifications for accepting or denying guidelines into the levels of WCAG that will be bound to legislation and broadly adopted as the baseline of accessibility by corporations. We must also avoid dismissing the needs of users simply because solutions to address their needs might require testing beyond what can be easily automated.

Accessibility must be similar for different groups of disabilities and different combinations of disabilities at any conformance level. The needs of users must be central to our reasons for including or excluding guidelines and how we rank those guidelines. For the basic accessibility needs of some groups to be covered at Bronze, while those of other groups are relegated to Silver or Gold, would be a disaster, and would simply be repeating history.
-
In some comments the need for something to be testable seems to be interpreted as having to be automatically testable. We have never had a requirement that provisions be automatically testable in order to be included, just testable with inter-evaluator reliable results.
-
Hi, just posting https://w3c.github.io/wcag/conformance-challenges/ here as well. I see related links such as the Research Report, but wanted to post the challenges link here too for reference's sake.
-
When we look across conformance models, it will help to have a set of criteria to use to evaluate and compare the variations.
The proposed criteria from the 2011 WAI Symposium on Accessibility Metrics, as discussed in the comments above, are validity, reliability, sensitivity, adequacy, and complexity.
Additional proposed criteria from AGWG discussion:
Question for Discussion: What evaluation criteria are missing or need to be adjusted?
Note: No conformance model will meet all these criteria. The purpose of this discussion thread is to identify whether there are any other criteria we should be using to compare and discuss each model or whether we should revise/remove any of the ones listed.