Entity Processing in XML
Understanding how XML parsers process entities, the foundation of XXE vulnerabilities.
Overview
Entity processing is the mechanism by which XML parsers expand entity references into their defined values. Understanding entity processing is fundamental to comprehending XXE vulnerabilities.
What are XML Entities? Entities are placeholders that represent content in XML documents. They can be:
- Predefined entities - Built into XML (&lt; &gt; &amp; &quot; &apos;)
- Character references - Numeric references (&#65; &#x41;), not true entities
- Internal entities - Defined within the document
- External entities - Reference content from external sources
- Parameter entities - Used within DTD declarations
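To see the first two kinds in action, here is a minimal sketch using Python's standard-library ElementTree. ElementTree is chosen only for illustration, and the sample string is made up. The parser replaces predefined entities and numeric character references before the application ever sees the text:

```python
import xml.etree.ElementTree as ET

# Sample element using all five predefined entities plus two character references
doc = "<message>&lt;tag&gt; &amp; &quot;hi&quot; &apos;yo&apos; &#65;&#x41;</message>"

# The parser decodes every reference before the application sees the text
decoded = ET.fromstring(doc).text
print(decoded)  # <tag> & "hi" 'yo' AA
```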
Why Entity Processing Matters: Entity processing is powerful but dangerous. When parsers automatically expand external entity references, they can:
- Read arbitrary files from the filesystem
- Make HTTP requests to internal/external systems
- Cause denial of service through entity expansion
- Enable complex attacks through parameter entity chaining
Predefined Entities
```xml
<!-- XML has 5 predefined entities for special characters -->
<message>
  &lt;tag&gt;  <!-- displays as: <tag> -->
  &amp;        <!-- displays as: & -->
  &quot;       <!-- displays as: " -->
  &apos;       <!-- displays as: ' -->
</message>

<!-- These are safe and always available -->
<!-- Used to escape special XML characters -->
```

Internal (General) Entities
Internal entities are defined within the DTD and contain static values. They're replaced when referenced in the document.
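This substitution can be observed by parsing such a document with Python's ElementTree and serializing the result. The sketch assumes the DTD is an internal subset, which expat-based parsers like ElementTree expand by default. In the output, the DOCTYPE is gone and only the replacement text remains:

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY company "Acme Corporation">
  <!ENTITY email "contact@acme.com">
]>
<message><from>&company;</from><contact>&email;</contact></message>"""

root = ET.fromstring(DOC)
# Serializing the tree shows the expanded values; the DTD itself is not kept
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```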
Internal Entity Example
```xml
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY company "Acme Corporation">
  <!ENTITY email "contact@acme.com">
  <!ENTITY copyright "Copyright 2024 Acme Corporation. All rights reserved.">
]>
<message>
  <from>&company;</from>
  <contact>&email;</contact>
  <footer>&copyright;</footer>
</message>

<!-- When parsed, entities are expanded:
<message>
  <from>Acme Corporation</from>
  <contact>contact@acme.com</contact>
  <footer>Copyright 2024 Acme Corporation. All rights reserved.</footer>
</message>
-->
```

External Entities (The Security Risk)
External entities reference content from outside the XML document using the SYSTEM or PUBLIC keywords. This is where XXE vulnerabilities originate.
SYSTEM Entities: Reference a URI (file://, http://, ftp://, etc.)
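PUBLIC Entities: Pair a public identifier with a system identifier (URI), which is mandatory in entity declarations; a parser may resolve the public identifier through a catalog and otherwise fetches the URI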
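The effect of resolution can be reproduced with Python's standard-library SAX parser. Its feature_external_ges flag controls whether external general entities are fetched, and it has been off by default since Python 3.7.1. This sketch writes a throwaway file as a stand-in for a sensitive one such as /etc/passwd. The file name and the TextCollector and parse_text helpers are hypothetical:

```python
import io
import pathlib
import tempfile
import xml.sax
import xml.sax.handler


class TextCollector(xml.sax.handler.ContentHandler):
    """Hypothetical helper: accumulates all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(doc: str, resolve_external: bool) -> str:
    """Parse doc with SAX and return its text, toggling external entity fetching."""
    parser = xml.sax.make_parser()
    # feature_external_ges: fetch and include external *general* entities?
    parser.setFeature(xml.sax.handler.feature_external_ges, resolve_external)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(doc.encode("utf-8")))
    return "".join(handler.chunks)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a sensitive file such as /etc/passwd
    secret = pathlib.Path(tmp, "secret.txt")
    secret.write_text("top-secret-value")
    doc = f"""<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY external SYSTEM "{secret.as_uri()}">
]>
<root><data>&external;</data></root>"""

    leaked = parse_text(doc, resolve_external=True)
    blocked = parse_text(doc, resolve_external=False)

print(repr(leaked))   # 'top-secret-value': the file was read and inlined
print(repr(blocked))  # '': the reference was skipped
```

The only difference between the two calls is one parser flag. This is why XXE is a configuration problem rather than a flaw in the document itself.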
External Entity Examples
```xml
<?xml version="1.0"?>
<!DOCTYPE root [
  <!-- SYSTEM entity: references a URI -->
  <!ENTITY external SYSTEM "file:///etc/passwd">

  <!-- Can also reference HTTP URLs -->
  <!ENTITY config SYSTEM "http://internal-server/config.xml">

  <!-- PUBLIC entity: public identifier plus a required system URI -->
  <!ENTITY logo PUBLIC "-//Acme//Logo//EN" "http://acme.com/logo.xml">
]>
<root>
  <data>&external;</data>
</root>

<!-- When parsed with external entity resolution enabled,
     the parser loads /etc/passwd and inserts its content.
     This is an XXE vulnerability! -->
```

Entity Expansion Process
How Entity Expansion Works:
- Parser encounters entity reference - &entityName;
- Lookup entity definition - Searches DTD for <!ENTITY entityName ...>
- Retrieve entity value - Gets value from definition or external source
- Replace reference - Substitutes &entityName; with actual value
- Recursive expansion - If value contains entities, expand those too
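The five steps can be modeled in a few lines of Python. This is a deliberately simplified sketch that handles only internal entities. The expand helper and its regex are hypothetical; real parsers work on tokens rather than regex matches, but the lookup, substitute, and recurse cycle is the same. The model also rejects recursive definitions, which the XML specification treats as a well-formedness error:

```python
import re

# Matches a general entity reference such as "&name;" (simplified name syntax)
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(text: str, entities: dict[str, str], active: tuple[str, ...] = ()) -> str:
    """Hypothetical toy expander for internal general entities only."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)                      # step 1: reference found
        if name not in entities:                   # step 2: look up the definition
            raise KeyError(f"undefined entity: {name}")
        if name in active:                         # XML forbids self-reference
            raise ValueError(f"recursive entity reference: {name}")
        value = entities[name]                     # step 3: retrieve the value
        # step 5: the value may contain references, so expand it recursively
        return expand(value, entities, active + (name,))

    # step 4: replace every reference with its fully expanded value
    return ENTITY_REF.sub(substitute, text)


chain = {"a": "Hello", "b": "&a; World", "c": "&b;!"}
print(expand("&c;", chain))  # Hello World!
```

Passing the chain of currently active names down the recursion lets the model detect cycles such as x referencing y referencing x, instead of looping forever.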
Example Expansion Chain:
<!ENTITY a "Hello">
<!ENTITY b "&a; World">
<!ENTITY c "&b;!">
When &c; is referenced:
- &c; → &b;!
- &b;! → &a; World!
- &a; World! → Hello World!
- Final result: "Hello World!"
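A real parser follows the same chain. As a sketch, Python's ElementTree (assumed here as a convenient expat-based parser) expands the three declarations above to the same final string:

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY a "Hello">
  <!ENTITY b "&a; World">
  <!ENTITY c "&b;!">
]>
<root>&c;</root>"""

# expat expands &c; -> &b;! -> &a; World! -> Hello World!
result = ET.fromstring(DOC).text
print(result)  # Hello World!
```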
Parameter Entities (DTD Only)
Parameter entities are special entities that can only be used within the DTD. They use the % prefix instead of &.
Key Characteristics:
- Defined with <!ENTITY % name "value">
- Referenced with %name; (not &name;)
- Only valid within the DTD; in the internal subset, references may appear only between markup declarations, not inside them
- Cannot be referenced in document content
- Critical for advanced XXE exploitation
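The distinction between % and & can be checked with Python's standard library. In this sketch, the entity_declarations helper is hypothetical; it uses pyexpat's EntityDeclHandler, which reports whether each declaration is a parameter entity. ElementTree then shows that %greeting; in element content is plain text, while &company; is expanded:

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY % greeting "Hello">
  <!ENTITY company "Acme Corporation">
]>
<root>%greeting; from &company;</root>"""


def entity_declarations(doc: str) -> list[tuple[str, bool]]:
    """Return (name, is_parameter_entity) for each entity declared in the DTD."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls this for every <!ENTITY ...> declaration it processes
    parser.EntityDeclHandler = (
        lambda name, is_param, *rest: found.append((name, bool(is_param)))
    )
    parser.Parse(doc, True)
    return found


print(entity_declarations(DOC))  # [('greeting', True), ('company', False)]
# In element content, "%greeting;" is literal text; only "&company;" expands
print(ET.fromstring(DOC).text)   # %greeting; from Acme Corporation
```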
Parameter Entity Example
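XML forbids parameter-entity references inside markup declarations in the internal subset. This pattern therefore has to live in an external DTD (the file name below is assumed):

```xml
<!-- external.dtd -->
<!-- Define parameter entity -->
<!ENTITY % greeting "Hello">

<!-- Use parameter entity to define general entity; %greeting; is replaced
     when 'wrapper' is declared, so wrapper holds <!ENTITY message 'Hello World'> -->
<!ENTITY % wrapper "<!ENTITY message '%greeting; World'>">

<!-- Evaluate wrapper to create 'message' entity -->
%wrapper;
```

```xml
<?xml version="1.0"?>
<!-- The parser must load external.dtd for &message; to be defined -->
<!DOCTYPE root SYSTEM "external.dtd">
<root>
  <text>&message;</text>
</root>

<!-- Result: <text>Hello World</text> -->

<!-- Parameter entities enable dynamic DTD construction -->
<!-- Used in advanced XXE attacks for blind exfiltration -->
```

Entity Expansion Limits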
Modern XML parsers implement limits to prevent abuse:
Common Limits:
- Expansion depth - Maximum nesting depth of entity references (the exact value varies by parser)
- Expansion count - Maximum number of expansions (64,000 by default in the JDK, set by jdk.xml.entityExpansionLimit)
- Content size - Maximum size of expanded content
- Recursion detection - Prevents infinite loops
These limits protect against:
- Billion Laughs attacks (exponential expansion)
- Stack overflow from deep nesting
- Memory exhaustion from large expansions
- Infinite recursion
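A little arithmetic shows why an expansion-count limit works against Billion Laughs. Assume the classic shape: a base entity $L_0$ holds the three-character string "lol", and entities $L_1$ through $L_9$ each contain ten references to the previous one. Expanding a single reference to $L_9$ then gives:

$$
\begin{aligned}
\text{copies of } L_0 &= 10^{9}, \qquad \text{expanded size} = 3 \times 10^{9}\ \text{bytes} \approx 3\ \text{GB} \\
\text{entity references expanded} &= \sum_{k=0}^{9} 10^{k} = \frac{10^{10}-1}{9} = 1{,}111{,}111{,}111
\end{aligned}
$$

The reference count is more than 17,000 times the JDK's default limit of 64,000. A parser enforcing that limit therefore aborts long before memory is exhausted.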
Security Note: Limits prevent DoS but don't prevent XXE. A single external entity that reads a file counts as one expansion, so it never comes close to any expansion limit.
Entity Types Summary
General Entities (Prefix: &)
- Used in document content
- Can be internal or external
- Example: &entityName;
- Security risk: External entities enable XXE
Parameter Entities (Prefix: %)
- Used in DTD declarations only
- Can be internal or external
- Example: %entityName;
- Security risk: Enable blind XXE and advanced attacks
Predefined Entities
- Always available (&lt; &gt; &amp; &quot; &apos;)
- Cannot be redefined (a DTD may declare them, but only with their standard values)
- Safe to use
Character References
- Numeric (&#65; = 'A', &#x41; = 'A')
- Not entities at all: each reference names a Unicode code point directly
- Always safe
Security Implications
Vulnerable Entity Processing:
- Parser automatically loads external entities
- No validation of entity URIs
- No restriction on protocols (file://, http://, ftp://)
- Entity expansion performed before application sees data
Attack Surface:
- External general entities → Classic XXE (file disclosure, SSRF)
- Parameter entities → Blind XXE (out-of-band exfiltration)
- Nested entities → Billion Laughs DoS
- DTD injection → XXE in contexts without direct entity control
Defense Strategy: Disable or strictly control entity processing at the parser level, before any expansion occurs.
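One way to apply this in Python is to refuse DTDs before any expansion can happen. The parse_without_dtd helper and ForbiddenDTD exception below are hypothetical. The sketch first runs a pass with the standard-library expat binding, whose handlers raise on any DOCTYPE or entity declaration. Only then does it build the tree with ElementTree. It follows the spirit of defusedxml, which refuses entity declarations by default, and assumes the application never needs a DTD:

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class ForbiddenDTD(ValueError):
    """Raised when the input contains a DOCTYPE or entity declaration."""


def _reject(*_args):
    raise ForbiddenDTD("DTDs and entity declarations are not accepted")


def parse_without_dtd(data: bytes) -> ET.Element:
    """Reject any DTD in a first pass, then parse normally with ElementTree."""
    guard = xml.parsers.expat.ParserCreate()
    guard.StartDoctypeDeclHandler = _reject   # fires on <!DOCTYPE ...>
    guard.EntityDeclHandler = _reject         # fires on <!ENTITY ...>
    guard.Parse(data, True)                   # exceptions from handlers propagate
    return ET.fromstring(data)


print(parse_without_dtd(b"<root><data>ok</data></root>").findtext("data"))  # ok

try:
    parse_without_dtd(b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>')
except ForbiddenDTD as exc:
    print("rejected:", exc)
```

The guard stops at the DOCTYPE itself, so no entity is declared, fetched, or expanded. The cost is a second pass over the input.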
Parser Behavior Comparison
Default Behavior by Language/Library:
Vulnerable by Default:
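- Java (default JAXP factories such as DocumentBuilderFactory and SAXParserFactory) - Expands external entities
- PHP with libxml2 before 2.9 - Expands external entities
- lxml (Python, releases before 5.0) - Expands external entities (network fetches are blocked by its no_network=True default)
- libxmljs (Node.js) - Expands external entities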
Safe by Default:
- Python xml.etree.ElementTree - Ignores external entities
- xml2js (Node.js) - Doesn't support external entities
- .NET 4.5.2+ - External entities disabled
- PHP 8.0+ - External entities disabled
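The ElementTree entry can be spot-checked. This sketch declares an external entity that points at a temporary file and parses it with ElementTree's defaults; the file name and the etree_result helper are hypothetical. Typically the reference surfaces as an "undefined entity" ParseError rather than as the file's content. Because the exact outcome may vary by version, the sketch only checks that the content never leaks:

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET


def etree_result(doc: str) -> str:
    """Hypothetical helper: return the <data> text, or the ParseError message."""
    try:
        return ET.fromstring(doc).findtext("data") or ""
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    secret = pathlib.Path(tmp, "secret.txt")  # stand-in for a sensitive file
    secret.write_text("top-secret-value")
    doc = f"""<!DOCTYPE root [
  <!ENTITY external SYSTEM "{secret.as_uri()}">
]>
<root><data>&external;</data></root>"""
    result = etree_result(doc)

# Either an error message or empty text; never the file content
print(result)
```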
Configurable: Most modern parsers allow explicit control via settings:
- resolve_entities (Python lxml)
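- XmlResolver (.NET); EntityResolver, setFeature(...) and XMLConstants.ACCESS_EXTERNAL_DTD (Java)
- XML_PARSE_NOENT / XML_PARSE_NONET options (libxml2): NOENT turns entity substitution on, so leave it unset; NONET blocks network access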
- Parser options (Node.js)