Entity Processing in XML

Understanding how XML parsers process entities, the foundation of XXE vulnerabilities.

CWE-611: Improper Restriction of XML External Entity Reference
OWASP Top 10:2021 - A05: Security Misconfiguration

Overview

Entity processing is the mechanism by which XML parsers expand entity references into their defined values. Understanding entity processing is fundamental to comprehending XXE vulnerabilities.

What are XML Entities? Entities are placeholders that represent content in XML documents. They can be:

  • Predefined entities - Built into XML (&lt; &gt; &amp; &quot; &apos;)
  • Character entities - Numeric references (&#65; &#x41;)
  • Internal entities - Defined within the document
  • External entities - Reference content from external sources
  • Parameter entities - Used within DTD declarations

Why Entity Processing Matters: Entity processing is powerful but dangerous. When parsers automatically expand external entity references, they can:

  • Read arbitrary files from the filesystem
  • Make HTTP requests to internal/external systems
  • Cause denial of service through entity expansion
  • Enable complex attacks through parameter entity chaining

Predefined Entities

XML: predefined-entities.xml (✓ Secure)
<!-- XML has 5 predefined entities for special characters:
       &lt;   displays as  <
       &gt;   displays as  >
       &amp;  displays as  &
       &quot; displays as  "
       &apos; displays as  '
-->
<message>&lt;tag&gt; &amp; &quot;text&quot; &apos;text&apos;</message>

<!-- These are safe and always available -->
<!-- Used to escape special XML characters -->
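
As a minimal sketch of the escaping described above, Python's standard library can produce and reverse the predefined entities. The helper names to_xml_text and from_xml_text are hypothetical, chosen for illustration. Note that xml.sax.saxutils.escape only handles &, < and > on its own, so the quote mappings are passed in explicitly.

```python
from xml.sax.saxutils import escape, unescape

# escape() always handles &, < and >; quotes are added through the optional mapping.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}


def to_xml_text(raw: str) -> str:
    """Replace the five special characters with their predefined entities."""
    return escape(raw, QUOTES)


def from_xml_text(escaped: str) -> str:
    """Reverse to_xml_text()."""
    return unescape(escaped, UNQUOTES)


encoded = to_xml_text('<tag> & "quoted" \'single\'')
print(encoded)
print(from_xml_text(encoded))
```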

Internal (General) Entities

Internal entities are defined within the DTD and contain static values. They're replaced when referenced in the document.

Internal Entity Example

XML: internal-entities.xml (✓ Secure)
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY company "Acme Corporation">
  <!ENTITY email "contact@acme.com">
  <!ENTITY copyright "Copyright 2024 Acme Corporation. All rights reserved.">
]>
<message>
  <from>&company;</from>
  <contact>&email;</contact>
  <footer>&copyright;</footer>
</message>

<!-- When parsed, entities are expanded:
<message>
  <from>Acme Corporation</from>
  <contact>contact@acme.com</contact>
  <footer>Copyright 2024 Acme Corporation. All rights reserved.</footer>
</message>
-->
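
To see internal entity expansion from the application's side, the following sketch parses a shortened version of the document above with Python's xml.etree.ElementTree. The variable names are illustrative. By the time the application reads the element text, the parser has already substituted the entity values.

```python
import xml.etree.ElementTree as ET

# Shortened version of the internal-entity document shown above.
DOC = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY company "Acme Corporation">
  <!ENTITY email "contact@acme.com">
]>
<message>
  <from>&company;</from>
  <contact>&email;</contact>
</message>"""

root = ET.fromstring(DOC)

# The application never sees "&company;", only the expanded value.
sender = root.findtext("from")
contact = root.findtext("contact")
print(sender)
print(contact)
```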

External Entities (The Security Risk)

External entities reference content from outside the XML document using the SYSTEM or PUBLIC keywords. This is where XXE vulnerabilities originate.

SYSTEM Entities: Reference a URI (file://, http://, ftp://, etc.)

PUBLIC Entities: Reference a public identifier, with optional system identifier fallback

External Entity Examples

XML: external-entities.xml (⚠️ Vulnerable)
<?xml version="1.0"?>
<!DOCTYPE root [
  <!-- SYSTEM entity - references a URI -->
  <!ENTITY external SYSTEM "file:///etc/passwd">

  <!-- Can also reference HTTP URLs -->
  <!ENTITY config SYSTEM "http://internal-server/config.xml">

  <!-- PUBLIC entity with fallback -->
  <!ENTITY logo PUBLIC "-//Acme//Logo//EN" "http://acme.com/logo.xml">
]>
<root>
  <data>&external;</data>
</root>

<!-- When parsed with external entity resolution enabled:
   Parser loads /etc/passwd and inserts content
   This is an XXE vulnerability! -->

Entity Expansion Process

How Entity Expansion Works:

  1. Parser encounters entity reference - &entityName;
  2. Lookup entity definition - Searches DTD for <!ENTITY entityName ...>
  3. Retrieve entity value - Gets value from definition or external source
  4. Replace reference - Substitutes &entityName; with actual value
  5. Recursive expansion - If value contains entities, expand those too

Example Expansion Chain:

<!ENTITY a "Hello">
<!ENTITY b "&a; World">
<!ENTITY c "&b;!">

When &c; is referenced:

  • &c; → &b;!
  • &b;! → &a; World!
  • &a; World! → Hello World!
  • Final result: "Hello World!"
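
The five steps can be modeled in a few lines of Python. This is a simplified sketch, not how any real parser is implemented: the expand function and its entities dictionary are hypothetical, only &name; references are handled, and predefined entities and character references are left out. It does show the recursive substitution and why a parser must track which entities are currently being expanded.

```python
import re

# Matches general entity references such as &company; (simplified name syntax).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(text, entities, active=()):
    """Recursively replace &name; references using the entities mapping.

    `active` holds the names currently being expanded, so a reference
    back to one of them is reported instead of looping forever.
    """
    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"undefined entity: {name}")
        if name in active:
            raise ValueError(f"recursive entity reference: {name}")
        # Steps 3-5: fetch the value, then expand any references inside it.
        return expand(entities[name], entities, active + (name,))

    return ENTITY_REF.sub(replace, text)


entities = {"a": "Hello", "b": "&a; World", "c": "&b;!"}
result = expand("&c;", entities)
print(result)
```

A self-referencing definition such as {"x": "&y;", "y": "&x;"} raises ValueError in this model instead of recursing without end.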

Parameter Entities (DTD Only)

Parameter entities are special entities that can only be used within DTD declarations. They use the % prefix instead of &.

Key Characteristics:

  • Defined with <!ENTITY % name "value">
  • Referenced with %name; (not &name;)
  • Only valid within DTD (internal or external)
  • Cannot be referenced in document content
  • Critical for advanced XXE exploitation

Parameter Entity Example

XML: parameter-entities.xml (✓ Secure)
<?xml version="1.0"?>
<!DOCTYPE root [
  <!-- Define a parameter entity whose value is a complete declaration -->
  <!ENTITY % wrapper "<!ENTITY message 'Hello World'>">

  <!-- Reference it between declarations to create the 'message' entity -->
  %wrapper;

  <!-- Note: in the internal subset, a %name; reference may appear only
       between declarations. Nesting one inside another declaration's
       value (e.g. '%greeting; World') is allowed only in an external DTD. -->
]>
<root>
  <text>&message;</text>
</root>

<!-- Result (when the parser processes parameter entities): <text>Hello World</text> -->

<!-- Parameter entities enable dynamic DTD construction -->
<!-- Used in advanced XXE attacks for blind exfiltration -->

Entity Expansion Limits

Modern XML parsers implement limits to prevent abuse:

Common Limits:

  • Expansion depth - Maximum nesting level (typically 10-20)
  • Expansion count - Maximum number of expansions (64,000 has historically been the default for the JDK's entityExpansionLimit)
  • Content size - Maximum size of expanded content
  • Recursion detection - Prevents infinite loops

These limits protect against:

  • Billion Laughs attacks (exponential expansion)
  • Stack overflow from deep nesting
  • Memory exhaustion from large expansions
  • Infinite recursion

Security Note: Expansion limits mitigate DoS but do not prevent XXE. A single external entity that reads a file is only one expansion, so it stays well within these limits.
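
To make the exponential growth concrete, the sketch below computes how large a Billion Laughs payload would become without actually expanding it. The functions billion_laughs and expanded_length are hypothetical helpers, and the model assumes the entity definitions contain no cycles. Ten small declarations are enough to reach about 3 billion characters.

```python
import re
from functools import lru_cache

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def billion_laughs(levels=9, fanout=10):
    """Build entity definitions where each level references the previous one `fanout` times."""
    entities = {"lol0": "lol"}
    for i in range(1, levels + 1):
        entities[f"lol{i}"] = f"&lol{i - 1};" * fanout
    return entities


def expanded_length(name, entities):
    """Return the character count of a fully expanded entity without building it.

    Assumes the definitions are acyclic; a cycle would recurse until Python
    raises RecursionError.
    """
    @lru_cache(maxsize=None)
    def length(entity_name):
        value = entities[entity_name]
        literal_chars = len(ENTITY_REF.sub("", value))
        return literal_chars + sum(length(ref) for ref in ENTITY_REF.findall(value))

    return length(name)


entities = billion_laughs()
declared_chars = sum(len(value) for value in entities.values())
size = expanded_length("lol9", entities)
print(f"declared: {declared_chars} chars, expanded: {size} chars")
```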

Entity Types Summary

General Entities (Prefix: &)

  • Used in document content
  • Can be internal or external
  • Example: &entityName;
  • Security risk: External entities enable XXE

Parameter Entities (Prefix: %)

  • Used in DTD declarations only
  • Can be internal or external
  • Example: %entityName;
  • Security risk: Enable blind XXE and advanced attacks

Predefined Entities

  • Always available (&lt; &gt; &amp; &quot; &apos;)
  • Cannot be redefined
  • Safe to use

Character References

  • Numeric (&#65; = 'A', &#x41; = 'A')
  • Not actually entities, just character encoding
  • Always safe

Security Implications

Vulnerable Entity Processing:

  • Parser automatically loads external entities
  • No validation of entity URIs
  • No restriction on protocols (file://, http://, ftp://)
  • Entity expansion performed before application sees data

Attack Surface:

  • External general entities → Classic XXE (file disclosure, SSRF)
  • Parameter entities → Blind XXE (out-of-band exfiltration)
  • Nested entities → Billion Laughs DoS
  • DTD injection → XXE in contexts without direct entity control

Defense Strategy: Disable or strictly control entity processing at the parser level, before any expansion occurs.
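
One strict form of this strategy is to reject any document that carries a DOCTYPE at all, since no entity can be declared without one. The sketch below does this with Python's xml.parsers.expat. The function parse_without_dtd and the DoctypeForbidden exception are hypothetical names, and the approach assumes the application never needs DTDs in its input.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(data: bytes) -> list:
    """Parse `data` with expat, rejecting any DOCTYPE; return the element names seen."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called when the DOCTYPE starts, before its entity declarations are read.
        raise DoctypeForbidden(f"DOCTYPE not allowed (root name {name!r})")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda tag, attrs: names.append(tag)
    parser.Parse(data, True)
    return names


print(parse_without_dtd(b"<order><item/></order>"))
```

Rejecting the DOCTYPE outright is blunter than disabling only external entities, but it also rules out internal expansion attacks such as Billion Laughs.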

Parser Behavior Comparison

Default Behavior by Language/Library:

Vulnerable by Default:

  • Java (default JAXP parser configuration) - Expands external entities
  • PHP libxml (older, built against libxml2 before 2.9.0) - Expands external entities
  • lxml (Python, before 5.0) - Expands external entities
  • libxmljs (Node.js) - Expands external entities

Safe by Default:

  • Python xml.etree.ElementTree - Does not expand external entities (raises ParseError when one is referenced)
  • xml2js (Node.js) - Doesn't support external entities
  • .NET 4.5.2+ - External entities disabled
  • PHP 8.0+ - External entities disabled
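
The xml.etree.ElementTree entry can be checked with a short sketch. The try_parse helper is a hypothetical wrapper. The external entity is declared but never loaded, and the reference to it makes the parse fail.

```python
import xml.etree.ElementTree as ET

XXE_DOC = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY external SYSTEM "file:///etc/passwd">
]>
<root><data>&external;</data></root>"""


def try_parse(document: str):
    """Return (root, None) on success or (None, error message) on a parse error."""
    try:
        return ET.fromstring(document), None
    except ET.ParseError as exc:
        return None, str(exc)


tree, error = try_parse(XXE_DOC)
print("parsed" if tree is not None else f"rejected: {error}")
```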

Configurable: Most modern parsers allow explicit control via settings:

  • resolve_entities (Python lxml)
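  • XMLResolver / EntityResolver (Java), XmlResolver (C#)
  • XML_PARSE_NOENT / XML_PARSE_NONET options (libxml2; note that NOENT enables entity substitution)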
  • Parser options (Node.js)
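
As one more example of such a setting, Python's standard xml.sax module exposes feature_external_ges, which controls whether external general entities are loaded. The sketch below sets it to False explicitly instead of relying on the default. TextCollector, parse_text and demo are hypothetical names, and the temporary file stands in for a sensitive file on disk.

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Accumulate all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_bytes: bytes) -> str:
    """Parse with external general entities explicitly disabled; return the text content."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


def demo() -> str:
    """Point an external entity at a temporary file and return what the parser reports."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "secret.txt"
        target.write_text("SECRET-MARKER")
        doc = (
            f'<!DOCTYPE root [<!ENTITY ext SYSTEM "{target.as_uri()}">]>'
            "<root>&ext;</root>"
        )
        return parse_text(doc.encode("utf-8"))


print(repr(demo()))
```

With the feature disabled, the file's content should not appear in the collected text.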