XXE, how did a feature become a critical vulnerability?

Tushar Bhatia
Oct 25, 2021

This is my take on XML External Entities (XXE): how a feature of this markup language intrigued security researchers, and how it turns into a vulnerability when XML parsing is not implemented properly.

The process to analyze or understand any vulnerability:

1. What is it?

2. Why does it exist (basic and specific reasons)?

3. Impacts (the damage it can cause)

4. Mitigation (ways to prevent it)

Description:

What is XML?

XML stands for Extensible Markup Language. It is described as a metalanguage, meaning that XML is used to describe other languages. It was developed to address a shortcoming of HTML, which only defines how data is displayed; XML defines the structure of the data itself.

Differences between HTML and XML:

1. HTML has predefined tags (some must be closed, some need not be), but XML has no predefined tags. Instead, tags are user-defined and, unlike in HTML, every XML element must be closed, either with a closing tag or as a self-closing empty-element tag.

2. Example:

<?xml version="1.0" encoding="UTF-8"?>
<Hi>
  <Hello>
    <Title>Bug Bounty</Title>
    <Compensation>1000000</Compensation>
    <Responsibility fundamental="1">Shot web</Responsibility>
  </Hello>
</Hi>

Look closely: because all of the tags are user-defined, it’s impossible to know from the document alone how this data will look on a web page.

Note: The first line is the XML declaration, which states the XML version and the character encoding.
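To make the "user-defined tags" point concrete, here is a minimal Python sketch that parses the sample document with the standard library's xml.etree.ElementTree and lists every element it finds. The helper name list_elements is made up for illustration. The parser attaches no meaning to the tag names; it only reports the structure.

```python
import xml.etree.ElementTree as ET

# The sample document from above, written with plain ASCII quotes.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Hi>
  <Hello>
    <Title>Bug Bounty</Title>
    <Compensation>1000000</Compensation>
    <Responsibility fundamental="1">Shot web</Responsibility>
  </Hello>
</Hi>"""


def list_elements(xml_bytes):
    """Return (tag, attributes, text) for every element, in document order."""
    root = ET.fromstring(xml_bytes)
    return [
        (el.tag, dict(el.attrib), (el.text or "").strip())
        for el in root.iter()
    ]


if __name__ == "__main__":
    for tag, attrs, text in list_elements(SAMPLE):
        print(tag, attrs, repr(text))
```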

Before getting into the basic XML syntax needed to understand and build an attack, let’s look at why XML can be vulnerable.

In layman’s terms, XML is often used to carry data between the Back-End and the Front-End of an application. From an attacker’s perspective, however, this feature can become a disastrous vulnerability.

In fact, if XML parsing is not implemented properly, an attacker can abuse this feature to fetch sensitive data from the Back-End.

Basics of XML:

Document Type Definition (DTD):

A DTD is a set of declarations that defines the structure, the legal elements, and the attributes of an XML document. In other words, because all the tags are user-defined, a DTD provides the rules that a document must follow to be considered valid. For a better understanding of these terms, consider the following example:

<address category="residence">
  <name>name content</name>
  <company>company content</company>
  <phone>(011) 123-4567</phone>
</address>

Here, “address” is an element and “category” is one of its attributes. “name”, “company”, and “phone” are child elements without attributes; each is delimited by a start tag and an end tag. An element with no content can also be written as a single empty-element tag, such as <name/>.

DTDs come in two types: Internal and External.

Internal DTD: This DTD is defined within the XML document.

External DTD: This is a separate .dtd file that the XML document references and the parser fetches.

Note: The elements used in the XML document are declared in the DTD (for an external DTD, in the .dtd file) using the keyword “!ELEMENT”.

Before moving on to entities and parsable character data, let’s understand what “parsing” means in XML.

XML Parser: In computing, a parser is a program (or a piece of code or an API that you can call from your own programs) that analyses files to identify their component parts. All applications that read input have a parser of some kind; otherwise they’d never be able to figure out what the information means. An XML parser also checks that the document is well-formed and may validate it against a DTD.

A parser can also convert an XML document into an XML DOM object so that it can then be manipulated with JavaScript.
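The two jobs of a parser described above can be sketched in Python, used here as a stand-in for the browser's JavaScript DOM. The standard library's xml.dom.minidom builds a DOM tree, and the underlying parser rejects a document that is not well-formed. The helper names first_title and is_well_formed are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def first_title(xml_text):
    """Parse XML into a DOM tree and return the text of the first <Title>."""
    dom = minidom.parseString(xml_text)
    title = dom.getElementsByTagName("Title")[0]
    return title.firstChild.data


def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        # Raised for problems such as a missing or mismatched closing tag.
        return False


if __name__ == "__main__":
    print(first_title("<Job><Title>Hacker</Title></Job>"))
    print(is_well_formed("<Job><Title>Hacker</Job>"))
```

The second call passes a document whose <Title> element is never closed, so the parser refuses it instead of guessing what was meant.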

Parsable Character Data (#PCDATA):

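#PCDATA means that an element contains parsed character data: text in which the parser still recognizes markup and expands entity references. CDATA, by contrast, marks character data that the parser does not parse: as an attribute type it means a plain string, and inside a CDATA section the text is taken literally.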
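A small Python sketch may help show the difference in practice. It assumes the standard library's xml.etree.ElementTree and two made-up <note> documents: in the first, the text is parsed, so <b> becomes a child element; in the second, the same characters sit inside a CDATA section and are kept as literal text.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the parser recognizes the entity &lt; and the <b> markup.
parsed = ET.fromstring("<note>1 &lt; 2 <b>bold</b></note>")

# CDATA section: everything between <![CDATA[ and ]]> is taken literally.
literal = ET.fromstring("<note><![CDATA[1 < 2 <b>bold</b>]]></note>")

if __name__ == "__main__":
    print(repr(parsed.text), [child.tag for child in parsed])
    print(repr(literal.text), [child.tag for child in literal])
```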

XML Entities (Finally!):

XML Documents consist of a set of storage units called Entities. They act as a replacement mechanism and placeholders for information.

For example: you could create a database of names and addresses, attach it to a Word document, and then store that document as an entity so that the parser can access it.

Entities can also be used as a shortcut to embed blocks of text, or even entire documents and files, into an XML document, which makes updating documents across networks really convenient.
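The replacement mechanism is easiest to see with an internal entity, where the replacement text lives in the document itself. The sketch below is an illustration only: the <letter> document and the entity name company are invented, and it relies on Python's xml.etree.ElementTree expanding internal entities declared in the DTD.

```python
import xml.etree.ElementTree as ET

# An internal entity: the name "company" is a placeholder for the quoted text.
DOC = """<!DOCTYPE letter [
  <!ENTITY company "Example Corp">
]>
<letter>Welcome to &company;. &company; thanks you.</letter>"""


def expand(xml_text):
    """Parse the document and return the root text with entities expanded."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(expand(DOC))
```

Changing the single declaration changes every place the reference appears, which is the convenience described above. External entities work the same way, except that the replacement text comes from a file or URL instead of the document.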

Sample XML Document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Jobs [
<!ATTLIST Responsibility fundamental CDATA "0">
➊ <!ELEMENT Website ANY>
➋ <!ENTITY url SYSTEM "website.txt">
]>
<Jobs>
  <Job>
    <Title>Hacker</Title>
    <Compensation>1000000</Compensation>
    <Responsibility fundamental="1">Shot web</Responsibility>
➌   <Website>&url;</Website>
  </Job>
</Jobs>

Notice that I’ve added a Website “!ELEMENT” at ➊, but instead of (#PCDATA) I’ve used ANY. This declaration means the Website element can contain any combination of parsable data, which is useful when you have yet to decide the allowable contents of the element.

At ➋ I’ve defined an !ENTITY with the SYSTEM keyword, telling the parser to get the contents of the website.txt file wherever the placeholder “&url;” appears.

At ➌ I use the Website element, and the contents of website.txt would be fetched in place of “&url;”. Note the & in front of the entity name and the ; after it. Whenever you reference an entity in an XML document, you must write it as &name;.
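To see what the parser actually learns from this DTD, the following Python sketch feeds the sample document to the standard library's expat bindings and records the element and entity declarations it reports. The helper name list_declarations is made up. No handler for external entities is installed, so website.txt is never opened; the sketch only shows that the entity url points at it.

```python
from xml.parsers import expat

# The sample document from above, with plain ASCII quotes and no markers.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Jobs [
<!ATTLIST Responsibility fundamental CDATA "0">
<!ELEMENT Website ANY>
<!ENTITY url SYSTEM "website.txt">
]>
<Jobs>
  <Job>
    <Title>Hacker</Title>
    <Compensation>1000000</Compensation>
    <Responsibility fundamental="1">Shot web</Responsibility>
    <Website>&url;</Website>
  </Job>
</Jobs>"""


def list_declarations(xml_bytes):
    """Collect the !ELEMENT and !ENTITY declarations the parser reports."""
    found = {"elements": [], "entities": []}
    parser = expat.ParserCreate()

    def on_element_decl(name, model):
        found["elements"].append(name)

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # For an external entity, value is None and system_id names the source.
        found["entities"].append((name, value, system_id))

    parser.ElementDeclHandler = on_element_decl
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found


if __name__ == "__main__":
    print(list_declarations(SAMPLE))
```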

To sum it up, external entities can be used to fetch data from the Back-End, such as the contents of “/etc/passwd”.

XXE Vulnerability:

In an XXE attack, an attacker abuses a target application so that it includes external entities in its XML parsing. In other words, the application expects some XML but isn’t validating what it receives; it simply parses anything it gets. For instance, let’s say the job board in the previous example lets you register and upload jobs via XML.

The job board might make its DTD file available to you and assume that you’ll submit a file matching the requirements. Instead of having the !ENTITY retrieve the contents of “website.txt”, you could have it retrieve the contents of “/etc/passwd”.

Retrieving the contents of “/etc/passwd” is a common way to prove that this vulnerability exists on a Linux server, because the file is normally world-readable.

We could submit something like this as the XML document:

<?xml version="1.0" encoding="UTF-8"?>
➊ <!DOCTYPE foo [
➋ <!ELEMENT foo ANY >
➌ <!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
➍ <foo>&xxe;</foo>

➊ The parser receives this code and recognizes an internal DTD defining a foo document type.

➋ The DTD tells the parser that foo can include any parsable data.

➌ Then there’s an entity xxe that should read the server’s /etc/passwd file when the document is parsed (file:// denotes a full URI path to the /etc/passwd file). The parser should replace every &xxe; reference with those file contents.

➍ You finish it off with XML defining a <foo> element that contains “&xxe;”, which prints the server’s file contents.

Note: The above example has been taken from “REAL-WORLD BUG HUNTING by Peter Yaworski”
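The moment where things go wrong can be sketched in Python with the standard library's expat bindings. When the parser meets &xxe;, it asks the application to resolve the external entity. A vulnerable application would open whatever it is asked for; this sketch only records the request and reads nothing. The helper name requested_external_entities is hypothetical.

```python
from xml.parsers import expat

# The payload from above, with plain ASCII quotes and no markers.
PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
<foo>&xxe;</foo>"""


def requested_external_entities(xml_bytes):
    """Return the system identifiers the parser asks the application to resolve."""
    requested = []
    parser = expat.ParserCreate()

    def on_external_entity(context, base, system_id, public_id):
        # A vulnerable application would open system_id here and feed its
        # contents back into the parser. This sketch only records the request.
        requested.append(system_id)
        return 1  # a non-zero value tells expat to carry on parsing

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(xml_bytes, True)
    return requested


if __name__ == "__main__":
    print(requested_external_entities(PAYLOAD))
```

The attacker chose the string that reaches the handler, which is why an application that resolves it blindly ends up disclosing local files.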

Bonus: A little tip to find Hidden Attack Vectors for XXE

Sometimes the attack vectors are less visible than the common ones. It happens because some apps receive client-submitted data, embed it server-side in an XML document, and then parse the document.
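Here is a minimal Python sketch of that pattern, assuming a made-up server-side template with a <user> document. The function names and the ATTACK string are illustrative only. It does not perform an XXE attack; it shows how client data that is pasted into XML verbatim changes the document the server goes on to parse, and how escaping it first keeps it as plain text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_naive(username):
    # Hypothetical server-side template: client data is pasted in verbatim.
    return f"<user><name>{username}</name></user>"


def build_escaped(username):
    # Same template, but markup characters in the client data are escaped first.
    return f"<user><name>{escape(username)}</name></user>"


# Client-supplied value that closes <name> early and smuggles in a new element.
ATTACK = "x</name><role>admin</role><name>y"

if __name__ == "__main__":
    naive_root = ET.fromstring(build_naive(ATTACK))
    safe_root = ET.fromstring(build_escaped(ATTACK))
    print([child.tag for child in naive_root])
    print([child.tag for child in safe_root])
```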

Mitigation:

1. The first and foremost protection against XXE attacks is to disable DTDs and external entities completely.

2. The XML processor should be given as few features as possible. For example, it should not be allowed to retrieve the contents of /etc/passwd or any other Back-End file on behalf of the client.

3. User input, file uploads, and all URLs must be sanitized, validated, and whitelisted.
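As one possible way to apply the first point in Python, the sketch below refuses any document that carries a DOCTYPE before handing it to the real parser. It uses only the standard library; the names DoctypeForbidden and parse_untrusted are made up for illustration, and a maintained package such as defusedxml is usually a better choice in production.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_untrusted(xml_bytes):
    """Parse XML only if it has no DTD at all (hypothetical helper)."""
    checker = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    checker.StartDoctypeDeclHandler = on_doctype
    # Raises DoctypeForbidden if a DOCTYPE is present,
    # or expat.ExpatError if the document is not well-formed.
    checker.Parse(xml_bytes, True)
    return ET.fromstring(xml_bytes)


XXE_PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
<foo>&xxe;</foo>"""

if __name__ == "__main__":
    print(parse_untrusted(b"<foo>ok</foo>").text)
    try:
        parse_untrusted(XXE_PAYLOAD)
    except DoctypeForbidden as err:
        print("rejected:", err)
```

Without a DTD there is nowhere to declare an entity, so this check removes the attack surface instead of trying to filter individual payloads. The cost is that the document is parsed twice.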

For Practice: I would personally recommend the PortSwigger labs for basic XXE practice, as they simulate real environments really well.

Tushar Bhatia

Cyber Security Analyst | Trainer | Researcher | CTF Player | Ethical Hacker