Specific Criticism of CVSS4 - What is not going to be better

by Marc Ruef
on March 14, 2024
time to read: 20 minutes


Why CVSSv4 Will Fail

  • The Common Vulnerability Scoring System (CVSS) was able to establish itself as a risk metric
  • The newly released CVSS 4.0 has some problems
  • Overly long vectors, unnecessary attributes and contradictory calculations make working with it more difficult
  • Fixing these errors requires a complete reworking of the metric
  • It remains to be seen whether the new version can prevail despite these shortcomings

The Common Vulnerability Scoring System (CVSS) has established itself as a risk metric in the field of technical cyber security. Since the release of CVSSv2 in 2007, the approach has been widely used as an industry standard to assign traceable risks to vulnerabilities. However, even the improvements in CVSSv3 have not saved the system from criticism. The recently published successor CVSSv4 is, in my opinion, a concrete step in the wrong direction. This post discusses what got worse and why I hope it won’t catch on.

Our Red Team is very familiar with CVSS, as it is an integral part of our risk assessments, which we provide to our customers in reports and presentations. In addition, scip was responsible for VulDB for almost 20 years, and during this time around 150,000 vulnerabilities were classified by our moderation team with CVSSv2 and CVSSv3. VulDB is no longer part of scip. Nevertheless, we are currently supporting the introduction of CVSSv4 there too. We therefore had to deal very intensively with the new iteration, both in terms of processes (e.g. moderation) and technical implementation (e.g. calculations, parsing).

In doing so, we have noticed a number of peculiarities. Some can be dismissed as debatable paradigm shifts or cosmetic annoyances; they merely require a rethink or prevent comparability with the scores of previous versions. Other aspects, however, represent a concrete deterioration that is probably not acceptable. These could prevent broad acceptance of the new version.

Vectors Much Too Long

CVSS can be used to calculate quantitative scores ranging from 0.0 to 10.0. Before such a score can even be calculated, a vector string must be created. This consists of individual attributes that outline the nature of a vulnerability. For example, the Attack Vector (AV) is used to specify the access possibilities of an attack. An AV:N means that the attack is possible via the network. Otherwise it would indicate AV:A for Adjacent Network (in the LAN), AV:L for Local or AV:P for Physical.

A CVSSv2 Base Vector comprises 6 attributes. An unauthenticated SQL injection that can be exploited via the Internet leads to this simple vector:

AV:N/AC:L/Au:N/C:P/I:P/A:P

One of the improvements of CVSSv3 was the introduction of additional attributes, allowing the nature of a vulnerability to be outlined with more granularity. Attribute names are now either one or two characters long. Newly introduced were User Interaction (UI) and Scope (S). Neither has any influence in the established SQL injection scenario, which means that the new vector with its 8 attributes is structured as follows:

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L

Further attributes were added with CVSSv4. In addition to Attack Complexity (AC), Attack Requirements (AT) is now also used, and Scope (S), which had been rather neglected since CVSSv3, has been split into the three attributes of the Subsequent System Impact Metrics. The new vector for the same SQL injection, with its full 11 attributes, then looks like this:

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N

The number of minimum attributes used for the base score has therefore almost doubled since CVSSv2. Gone are the days when the structure of a vulnerability could be determined at a glance. Human simultaneous perception (subitizing) can no longer cope with this, as adults increasingly make mistakes with more than 4 items. Instead, the vector must be painstakingly dissected and the individual attributes considered separately. These vectors no longer seem to be made for humans.

However, creating the vectors also involves more effort. Since, as we will see shortly, the attributes no longer carry the same weight, working out CVSSv4 vectors is extremely laborious, especially when certain unknowns are present.
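To make the growth in vector length concrete, here is a minimal sketch (my own illustration, not part of any CVSS specification or tooling) that splits a vector string into its attributes and counts them:

```javascript
// Minimal sketch: split a CVSS vector string into its attributes.
// Works for v2 vectors ("AV:N/AC:L/...") as well as v3/v4 vectors
// that carry a "CVSS:x.y/" prefix. Function name is my own choice.
function parseVector(vector) {
  const attrs = {};
  let version = null;
  for (const part of vector.split('/')) {
    const [key, value] = part.split(':');
    if (key === 'CVSS') {
      version = value;     // e.g. "4.0"
    } else {
      attrs[key] = value;  // e.g. attrs.AV === "N"
    }
  }
  return { version, attrs };
}

const v2 = parseVector('AV:N/AC:L/Au:N/C:P/I:P/A:P');
const v4 = parseVector('CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N');

console.log(Object.keys(v2.attrs).length); // 6 base attributes in CVSSv2
console.log(Object.keys(v4.attrs).length); // 11 base attributes in CVSSv4
```

The attribute count alone shows why a v4 vector can no longer be grasped at a glance.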

Unimportant Subsequent System Impact Metrics

In CVSSv4, the impact is basically defined in the Vulnerable System Impact Metrics (VSIM). These specify the effects that the attacked system experiences. Now VC is used instead of C (Confidentiality), VI instead of I (Integrity) and VA instead of A (Availability).

The Scope (S) attribute was introduced in CVSSv3. It is used to declare whether an attack only affects the attacked system (S:U for Unchanged) or also has an impact on other components (S:C for Changed). In CVSSv4, this one attribute is transferred to the Subsequent System Impact Metrics. The attributes SC (Confidentiality), SI (Integrity) and SA (Availability) are now additionally used there:

The Impact metrics reflect the direct consequence of a successful exploit, and represent the consequence to the “things that suffer the impact”, which may include impact on the vulnerable system and/or the downstream impact on what is formally called the “subsequent system(s)”.

The fundamental problem here is that there is enormous scope for judgment as to what should be considered a different component and to what extent an attack can affect it. A typical example of this discussion is a cross-site scripting vulnerability in a web application. The web application is actually the vulnerable system where a vulnerability (incorrect input validation) is exploited. Traditionally, the impact in VSIM would be defined as VC:N/VI:L/VA:N. However, the vulnerability (execution of injected script code) can only effectively unfold in the web browser, which in turn can be understood as a Subsequent System. The same impact vector would therefore also have to be adopted at least in the form SC:N/SI:L/SA:N.

But is this correct? After all, an XSS vulnerability can also be used to exfiltrate data from the browser (e.g. cookies), so the vector would at least have to be adapted to SC:L/SI:L/SA:N. And as soon as the browser runs with administrative rights, one could even argue that the impact should be upgraded to SC:H/SI:H/SA:H. This would raise the score from 6.9 (Medium) to 7.9 (High). Is this really justified for all XSS attacks? According to the CVSS paradigm, the worst possible scenario should always be assumed.

In addition, these attributes are mandatory; they are not optional attributes that can be dismissed with Not Defined (X). However, it is often not even known whether and to what extent a vulnerability will have an impact on other systems, sometimes because this depends on configuration settings and individual mechanisms on site. This is why these attributes almost belong in the Environmental Metrics rather than in the Base Metrics.
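The mandatory nature of these metrics can be illustrated with a small validation sketch. The helper function and its name are my own; only the list of 11 base attributes follows the CVSS v4.0 specification:

```javascript
// The 11 mandatory CVSSv4 base metrics, including the Subsequent
// System Impact metrics SC/SI/SA, which cannot be left undefined.
const BASE_METRICS = ['AV','AC','AT','PR','UI','VC','VI','VA','SC','SI','SA'];

// Hypothetical helper: report which base metrics are missing or
// left as Not Defined (X) in a vector string.
function missingBaseMetrics(vector) {
  const attrs = new Map(
    vector.split('/')
      .filter(p => !p.startsWith('CVSS:'))
      .map(p => p.split(':'))
  );
  return BASE_METRICS.filter(m => !attrs.has(m) || attrs.get(m) === 'X');
}

console.log(missingBaseMetrics('CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:L/VA:N'));
// → ['SC', 'SI', 'SA']: the Subsequent System Impact metrics must still be chosen
```

Even when the downstream impact is genuinely unknown, a value has to be picked before a score can be produced.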

Ineffective Supplemental Metrics

There are now also Supplemental Metrics, which are to be filled in by the supplier of the vector. These 6 attributes are optional and define properties such as Safety (S) requirements, whether an attack can be automated (AU), and what recovery (R) after an attack looks like.

These attributes are not only optional, they are purely additional information and have no influence on the score. Let's take the example of the SQL injection again, which this time affects a medical device with sensitive patient data. The new overall vector with its additional properties could then look like this, for example:

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N/S:P/AU:N/R:U/V:C/RE:H/U:Red

The score remains unchanged at 6.9, regardless of whether the Supplemental Metrics are completed or not. Creating or understanding such a long vector nevertheless requires intensive consideration. The fact that Provider Urgency (U) is the only attribute whose value is not abbreviated to a single letter but written out as a whole word (e.g. U:Red) is probably the smallest problem here.
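Because the Supplemental Metrics carry no weight, a scoring tool could simply strip them before calculating. The following is a hypothetical sketch of that idea, not code from the reference implementation; the attribute names are those of the CVSS v4.0 specification:

```javascript
// The six score-neutral Supplemental Metrics of CVSSv4:
// Safety (S), Automatable (AU), Recovery (R), Value Density (V),
// Vulnerability Response Effort (RE), Provider Urgency (U).
const SUPPLEMENTAL = new Set(['S', 'AU', 'R', 'V', 'RE', 'U']);

// Hypothetical helper: remove supplemental attributes from a vector.
// The remaining vector yields exactly the same score.
function stripSupplemental(vector) {
  return vector
    .split('/')
    .filter(p => !SUPPLEMENTAL.has(p.split(':')[0]))
    .join('/');
}

const full = 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N/S:P/U:Red';
console.log(stripSupplemental(full));
// → 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N'
```

Note that matching on the full attribute key is essential here: SC, SI and SA begin with the same letter as Safety (S) but belong to the base metrics.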

Threat Metrics as Temp Scores

In CVSSv2 and CVSSv3, a distinction was made between different score types. The so-called Base Score contained all static information about a vulnerability, which remained the same for all users and over time. The Temp Score built on this and also took into account properties that change over time, such as the availability of exploits and countermeasures. And the Environmental Score took into account the individual circumstances depending on the customer and environment.

CVSSv4 handles this similarly in principle: the Base Score is denoted CVSS-B and the Environmental Score CVSS-BE. There is also a kind of Temp Score, represented as CVSS-BT, but it completely neglects most of the established properties. It is defined more narrowly and branded as a Threat Score.

Previously, the Report Confidence (RC), the Remediation Level (RL) and the Exploit Code Maturity (E) were used in the Temp Score. With CVSSv4, however, only the Exploit Maturity (E) (formerly Exploitability) is taken into account. This provides for the states Not Defined (X), Attacked (A), POC (P) and Unreported (U). What is striking here is the imbalance: POC (P) is the quality status of an exploit and Attacked (A) is the confirmed action of malicious actors. In addition, it usually does not take long from POC (P) to Attacked (A), because an attacker with technical understanding is usually able to develop a Weaponized Exploit within a very short time. And it is often the case that an attack can be carried out with a POC. Perhaps a little unwieldy and not fully automated, but exploitation may well be possible.

These threat metrics are generally very one-sided: they primarily reflect the quality of exploits or the activities on the attacker's side:

This metric measures the likelihood of the vulnerability being attacked, and is based on the current state of exploit techniques, exploit code availability, or active, “in-the-wild” exploitation.

This may be interesting for Red Teams and help Blue Teams to a small extent to assess the current state. However, the quality of the publication and the availability of countermeasures are at least as important for assessing risks. The fact that these aspects were simply discarded without replacement is irritating.

Added to this is the unpleasant effect that the selection of threat metrics always has a serious impact on the score. Let’s stay with the SQL injection vulnerability:

Metrics               Vector                                                                Score  Delta          Risk
CVSS-BT Not Defined   CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N/E:X  6.9    ±0             Medium
CVSS-BT Attacked      CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N/E:A  6.9    ±0             Medium
CVSS-BT POC           CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N/E:P  5.5    -1.4 (20.29%)  Medium
CVSS-BT Unreported    CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:L/VI:L/VA:L/SC:N/SI:N/SA:N/E:U  2.7    -4.2 (60.87%)  Low

It is noticeable that the maximum score of 6.9 is achieved either with plain CVSS-B or with CVSS-BT under the worst assumption of Attacked (A). Not Defined (X) likewise assumes the worst possible case, as long as nothing else is known.

In most cases, however, the state is not Not Defined (X) but Unreported (U), meaning that neither exploits nor attacks are known. Here, however, the worst case is suddenly no longer assumed; instead, the score is downgraded to 2.7 (Low). This corresponds to a total reduction of 4.2 points (60.87%). I doubt that this contradiction can be intentional. Unless, of course, the intention was to provide customers with a tool that allows them to easily downgrade and thus marginalize most vulnerabilities.
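The deltas in the table above can be verified with a few lines of arithmetic. The helper below is my own, not part of any CVSS implementation; it simply expresses each reduction relative to the maximum Threat Score of 6.9:

```javascript
// Percentage reduction of a downgraded score relative to the
// CVSS-B / Attacked maximum, rounded to two decimals.
function deltaPercent(base, score) {
  return Math.round((base - score) / base * 10000) / 100;
}

console.log(deltaPercent(6.9, 5.5)); // POC (E:P):        -1.4 points → 20.29 %
console.log(deltaPercent(6.9, 2.7)); // Unreported (E:U): -4.2 points → 60.87 %
```

A single optional attribute can thus cut the score by well over half, which is exactly the downgrading lever criticized here.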

Documentation and Sample Implementation

In terms of basic structure and orientation, the documentation of the various CVSS versions does not differ much. What is striking, however, is that CVSSv4, unlike CVSSv3, does not disclose and present the underlying mathematical formulas.

Instead, only an example implementation in JavaScript is made available on GitHub. It is the implementation that is also used by the official CVSS v4.0 Calculator. It has some peculiarities that are due to an idiosyncratic programming style (e.g. complex negated else-if expressions are often used where a simple else would have sufficed; incrementing for loops are preferred over more readable .forEach constructs).

Porting or writing your own implementation is therefore more difficult with CVSSv4, as there is no neutral description of how the calculation works to fall back on. As a result, there are still very few ports in other languages. In addition, there are apparently deviations when it comes to rounding values, which in the worst case can lead to a different score being generated depending on the implementation.
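How such rounding deviations arise can be illustrated with a classic floating-point example in JavaScript. This is my own illustration of the general pitfall, not code taken from the CVSS reference implementation:

```javascript
// Two seemingly equivalent ways of rounding to two decimals disagree,
// because 1.005 is stored in binary as 1.00499999999999989...
const score = 1.005;

console.log(Math.round(score * 100) / 100);
// → 1, since 1.005 * 100 evaluates to 100.49999999999999

console.log(Math.round((score + Number.EPSILON) * 100) / 100);
// → 1.01, the intuitively expected result, nudged by one ulp
```

Two ports that pick different variants of such rounding logic can therefore legitimately report different scores for the same vector.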


Conclusion

CVSSv4 is complex and complicated. The introduction of additional attributes was probably intended to improve granularity and thus increase precision. But the question should first have been asked whether this is desirable or even necessary. Most users of CVSS either are only interested in the scores, without looking at the vectors in detail.

Or they love to philosophize endlessly about which vector really reflects the scenario most correctly:

[I]t is likely that many different types of individuals will be assessing vulnerabilities (e.g., software vendors, vulnerability bulletin analysts, security product vendors), however, note that CVSS assessment is intended to be agnostic to the individual and their organization.

This discussion can now also extend to the different generations of CVSS, as the SQL injection mentioned at the beginning scores either 7.5 (CVSSv2), 7.3 (CVSSv3) or 6.9 (CVSSv4). True accuracy is therefore generally not achievable with CVSS anyway, which is why an increase in complexity only conceals the problems and brings additional disadvantages.

In addition, the degradation of the Temp Scores is an absolute mistake. The majority of CVSS users, especially administrators and Blue Teams, rely heavily on these attributes. By omitting them, the usefulness of the metric for this user group is severely diminished. For this reason, I cannot imagine that CVSSv4 will be able to establish itself with the masses. An additional metric would have to be introduced to compensate for this new shortcoming.

I am of the opinion that CVSSv4 cannot be fixed. It is true that some minor flaws will probably be addressed in version 4.1. However, some fundamental decisions were made during development that, if corrected, would force a complete rewrite of the metric; the compatibility of a minor release would be lost entirely. It is therefore to be hoped that everything will improve with CVSSv5. A little less complexity would definitely do the approach good.

The first news articles and contributions on CVSSv4 were all very optimistic. An attentive reader, however, quickly realized that the authors had not dealt with the details or used the new metric productively. I can only imagine CVSSv4 prevailing if the details of the metric continue to be neglected and the scores are blindly relied upon. I have to admit that this is one of my biggest fears, because in that case CVSS would sadly degenerate into a farce.

About the Author

Marc Ruef

Marc Ruef has been working in information security since the late 1990s. He is well-known for his many publications and books. His latest book, The Art of Penetration Testing, discusses security testing in detail. He is a lecturer at several faculties, including ETH, HWZ, HSLU and IKF. (ORCID 0000-0002-1328-6357)

