HardeningKitty Score

Pros and Cons of the Scoring of Windows Configurations

by Michael Schneider

on January 21, 2021

Keypoints

This is what the HardeningKitty Score can do

A score is supposed to be a simplification of the overall result
The score is calculated from the sum of individual checks
However, such a score cannot be judged without background knowledge
The list of check items should be adapted to one's own needs
The assessment of a configuration review is too complex to abstract it to a simple score

In a security review of a Windows configuration, many individual settings are examined and their effect on the security of the system under test is classified. The sum of these ratings yields an assessment of the strengths and weaknesses of the configuration.

The severity classification of a configuration setting differs from the classification of a vulnerability in a software component, since a misconfigured setting does not necessarily result in a vulnerability. For the classification of vulnerabilities, there is the industry standard Common Vulnerability Scoring System (CVSS), which evaluates a vulnerability according to criteria for exploitation and impact. Its counterpart for configuration reviews is the Common Configuration Scoring System (CCSS). The goal of such standards is to make different description and measurement systems compatible with one another and generally understandable.

Systems such as CVSS and CCSS are intended for classifying individual vulnerabilities and configuration settings. For the overall outcome of a configuration review, there is also a desire to present the result in a simplified form such as a score. Ideally, this score can also be used for comparisons with similar projects. This article uses the newly introduced HardeningKitty Score to examine whether such a score achieves the goal of a simplified evaluation of a Windows configuration, and whether it is useful.

HardeningKitty Score

The HardeningKitty Score was introduced with release v0.5.0. Each check item is awarded a number of points depending on its result: a check rated Passed receives four points, a finding of severity Low receives two points, a finding of severity Medium receives one point, and a finding of severity High receives no points.

The HardeningKitty Score is calculated using the following formula:

(points achieved / maximum possible points) * 5 + 1

The score obtained can be interpreted using the following table:

Score	Rating
6	Excellent
5	Good
4	Sufficient
3	Insufficient
2	Insufficient
1	Insufficient
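The formula and the table above can be sketched in a few lines of Python. This is an illustration and not HardeningKitty's actual code: the function names are hypothetical, and mapping fractional scores to a rating by their lower bound (for example, 5.25 counts as Good) is an assumption of this sketch.

```python
# Points per check result, as described in the article.
POINTS = {"Passed": 4, "Low": 2, "Medium": 1, "High": 0}
MAX_POINTS_PER_CHECK = 4


def hardening_score(results):
    """Return (points achieved / maximum possible points) * 5 + 1."""
    if not results:
        raise ValueError("at least one check result is required")
    achieved = sum(POINTS[r] for r in results)
    maximum = MAX_POINTS_PER_CHECK * len(results)
    return achieved / maximum * 5 + 1


def rating(score):
    """Map a score to the rating table (lower-bound mapping is assumed)."""
    if score >= 6:
        return "Excellent"
    if score >= 5:
        return "Good"
    if score >= 4:
        return "Sufficient"
    return "Insufficient"


# Eight passed checks, one Low finding and one High finding.
results = ["Passed"] * 8 + ["Low", "High"]
score = hardening_score(results)
print(round(score, 2), rating(score))
```

In this example, 34 of 40 possible points are achieved, which gives a score of 5.25.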

The aim of awarding points in this way is to reflect the severity of a check item in the score. The alternative would be to score each check only as pass or fail. However, assigning points according to severity means that a system without a single passed check does not drop to 0 points, and thus to a score of 1.0, unless all checks are rated High. This gradation is therefore applied in the score calculation in the HardeningKitty code.
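To make the difference concrete, the following sketch compares the severity-weighted score with a pure pass/fail variant for systems on which no check passes. The pass/fail variant is hypothetical; scaling it to the same range of 1 to 6 is an assumption made only for this comparison.

```python
POINTS = {"Passed": 4, "Low": 2, "Medium": 1, "High": 0}


def weighted_score(results):
    """Severity-weighted score as described in the article."""
    achieved = sum(POINTS[r] for r in results)
    return achieved / (4 * len(results)) * 5 + 1


def pass_fail_score(results):
    """Hypothetical alternative: one point per passed check, same scale."""
    passed = sum(1 for r in results if r == "Passed")
    return passed / len(results) * 5 + 1


for severity in ("Low", "Medium", "High"):
    results = [severity] * 10  # ten checks, none of them passed
    print(severity, weighted_score(results), pass_fail_score(results))
# Low 3.5 1.0
# Medium 2.25 1.0
# High 1.0 1.0
```

With severity weighting, a system that fails every check with Low findings still reaches 3.5, whereas the pass/fail variant yields 1.0 in all three cases.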

Pros and Cons

What is the use of a score?

The score is intended to simplify a result and to allow an assessment at a glance. When hundreds of individual check results are available, it is difficult to interpret them without technical expertise and a yardstick: are 20 medium findings out of 250 check items a good result or not? In addition, a score provides a means of comparison when the same checks are performed on two different systems.

The score is calculated on the basis of a defined metric, so it is possible to understand how a rating is arrived at. For example, a HardeningKitty Score below 4.0 is considered an insufficient implementation. Based on the points awarded, it is also possible to work out how many settings must be corrected to achieve a sufficient rating.
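As a rough sketch of that last point, the score formula can be inverted to find how many points are missing for a target score. The function name is hypothetical, and the assumption that a corrected setting is subsequently rated Passed, with the most severe findings fixed first, is made only for this illustration.

```python
from fractions import Fraction
from math import ceil

POINTS = {"Passed": 4, "Low": 2, "Medium": 1, "High": 0}


def fixes_needed(results, target="4.0"):
    """Return (missing points, minimum number of checks to fix)."""
    maximum = 4 * len(results)
    achieved = sum(POINTS[r] for r in results)
    # Invert score = achieved / maximum * 5 + 1 using exact fractions.
    required = ceil((Fraction(target) - 1) / 5 * maximum)
    missing = max(0, required - achieved)
    # A fixed check is assumed to become "Passed"; take largest gains first.
    gains = sorted((4 - POINTS[r] for r in results if r != "Passed"),
                   reverse=True)
    fixed, gained = 0, 0
    for gain in gains:
        if gained >= missing:
            break
        gained += gain
        fixed += 1
    return missing, fixed


# 250 checks: 100 passed, 100 Medium findings, 50 High findings (score 3.5).
results = ["Passed"] * 100 + ["Medium"] * 100 + ["High"] * 50
print(fixes_needed(results))  # (100, 25)
```

A score of 4.0 requires 60 percent of the maximum points. In this example that is 600 of 1000 points, so 100 points are missing, which corresponds to correcting at least 25 of the High findings.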

The score is not a complete evaluation

The score alone cannot be used to evaluate a Windows configuration. Consider a system on which many hardening recommendations have been implemented. This leads to a score above 5.0 and thus a good to very good result according to the HardeningKitty rating. However, the hard disk of the system is not encrypted. For notebooks this is rated High, since attackers with physical access can compromise a system that lacks hard disk encryption. The point deduction for a single High finding is not directly apparent in the score. Therefore, HardeningKitty displays the number of results per severity level in addition to the score, so that critical findings are not lost.
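The effect can be illustrated with a small sketch that reports the count per severity level next to the score. The structure of the summary is an assumption for illustration and does not reproduce HardeningKitty's actual output.

```python
from collections import Counter

POINTS = {"Passed": 4, "Low": 2, "Medium": 1, "High": 0}


def summarize(results):
    """Return the score together with the number of results per level."""
    counts = Counter(results)
    achieved = sum(POINTS[name] * n for name, n in counts.items())
    score = achieved / (4 * len(results)) * 5 + 1
    summary = {"score": round(score, 2)}
    # Always list every level, even when its count is zero.
    summary.update({name: counts.get(name, 0) for name in POINTS})
    return summary


# 99 passed checks and a single High finding, e.g. missing disk encryption.
summary = summarize(["Passed"] * 99 + ["High"])
print(summary)
```

The score in this example is about 5.95, which looks very good on its own. Only the separate count of one High finding shows that something critical remains open.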

In addition, combinations of vulnerabilities caused by misconfigurations cannot be represented by a score. If PowerShell version 2 is installed on a system, the LSASS process is not protected and Credential Guard is not used, then attackers with administrative rights can use a PowerShell version downgrade to execute the script Invoke-Mimikatz and read the credentials of logged-in users. The downgrade bypasses existing protection mechanisms such as Microsoft’s Antimalware Scan Interface (AMSI). Individually, these are all Medium findings in a configuration review, but in combination they can have a critical impact on system security.
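One way to surface such combinations alongside a score is to check the failed items against known chains. The following is a minimal sketch; the check identifiers and the chain name are hypothetical and do not correspond to IDs in any real finding list.

```python
# Hypothetical chain definition: all listed checks must have failed.
CHAINS = {
    "credential theft via PowerShell downgrade": {
        "powershell_v2_installed",
        "lsass_protection_missing",
        "credential_guard_disabled",
    },
}


def critical_combinations(failed_checks, chains=CHAINS):
    """Return the names of chains whose checks have all failed."""
    failed = set(failed_checks)
    return sorted(name for name, required in chains.items()
                  if required <= failed)


failed = [
    "powershell_v2_installed",
    "lsass_protection_missing",
    "credential_guard_disabled",
    "some_unrelated_medium_finding",
]
print(critical_combinations(failed))
```

If only two of the three checks fail, the chain is not reported, which mirrors the point that the individual findings are moderate and only the combination is critical.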

What should be measured?

HardeningKitty ships finding lists for various frameworks, including the CIS Benchmarks for Microsoft Windows 10 and the Microsoft Security Baseline for Windows 10. Checking a system yields a very different score depending on the finding list used. For example, scanning a hardened Windows 10 system with the 0x6d69636b list yields a score of 5.66, while the Microsoft Security Baseline yields 4.1 and CIS yields 4.26.

When analyzing the lists, it is noticeable that Microsoft specifies over 50 settings for Internet Explorer. If Internet Explorer is not used on the system, its configuration is of secondary importance. CIS, among other things, recommends disabling services. Since neither the Internet Explorer settings nor the service configuration had been applied on the checked system, the scores differ accordingly.

Therefore, before reviewing a system, it is advisable to decide what should be measured. If the system is to be checked for compliance with the recommendations from Microsoft or CIS, the respective list can be used without adjustments. Otherwise, it is worth compiling your own list from the existing recommendations. If Microsoft Edge is used as the default browser, the check should be extended to include Edge settings, while checking Internet Explorer settings may not be necessary.
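Compiling such a list could look like the following sketch, which merges existing lists and drops categories that do not apply. The field names `category` and `name`, the function name, and the sample entries are assumptions for illustration and are not taken from HardeningKitty's finding lists.

```python
def build_custom_list(source_lists, exclude_categories=()):
    """Merge several check lists into one, skipping unwanted categories."""
    excluded = set(exclude_categories)
    custom, seen = [], set()
    for checks in source_lists:
        for check in checks:
            # Skip excluded categories and checks already taken over.
            if check["category"] in excluded or check["name"] in seen:
                continue
            seen.add(check["name"])
            custom.append(check)
    return custom


# Illustrative entries only; they are not real recommendations.
baseline = [
    {"category": "Internet Explorer", "name": "Example IE setting"},
    {"category": "PowerShell", "name": "Example logging setting"},
]
edge_checks = [
    {"category": "Microsoft Edge", "name": "Example Edge setting"},
]

custom = build_custom_list([baseline, edge_checks],
                           exclude_categories=["Internet Explorer"])
print([check["name"] for check in custom])
# ['Example logging setting', 'Example Edge setting']
```

Because the score depends on the maximum possible points of the list used, removing checks that are irrelevant to the system changes the score even though the system itself is unchanged.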

There are also settings where Microsoft and CIS directly contradict each other. For example, Microsoft recommends enabling PowerShell Script Block Logging, while CIS recommends not enabling this feature. Microsoft weights the benefit of evaluating PowerShell logs higher than the risk that the logs may contain passwords.

Conclusion

The goal of using a score to represent the result of a Windows configuration review is not achieved. The score cannot be assessed without expert knowledge and detailed knowledge of the checklist used. The assessment of operating system hardening is too complex to be represented by a score alone, especially when it comes to misconfigurations that can be combined and exploited. The wish to abstract a complex entity into a single number remains unfulfilled. The overall result should therefore be presented as an analysis of strengths and weaknesses rather than as a simple number.

The HardeningKitty Score is nevertheless not without utility. For example, it can be used to simply measure the improvement achieved after repeated testing. Likewise, such a score can be used as a threshold for hardening settings to be fulfilled, provided that the advantages and disadvantages of a score have been considered and the preliminary work in compiling a checklist has been done.

About the Author

Michael Schneider has been in IT since 2000. Since 2010 he has focused on information security. He is an expert in penetration testing, hardening and the detection of vulnerabilities in operating systems. He is well-known for a variety of tools written in PowerShell to find, exploit, and mitigate weaknesses. (ORCID 0000-0003-0772-9761)
