Python XXE Prevention

Secure XML parsing configuration for Python applications using lxml, ElementTree, defusedxml, and other XML libraries.

CWE-611: Improper Restriction of XML External Entity Reference
OWASP Top 10:2021 - A05: Security Misconfiguration

Overview

Python has multiple XML parsing libraries with different security characteristics. Understanding which library you're using and its default behavior is critical.

Python XML Libraries:

  • xml.etree.ElementTree - Standard library (does not expand external entities in Python 3.x)
  • lxml.etree - Fast C library (VULNERABLE by default before lxml 5.0)
  • xml.dom.minidom - DOM implementation (safe by default)
  • xml.sax - SAX parser (safe by default since Python 3.7.1)
  • defusedxml - Security wrapper library (RECOMMENDED)

Key Security Considerations:

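  • lxml before 5.0 is VULNERABLE by default; configure the parser explicitly on every version
  • Standard library (xml.etree) does not expand external entities, but can still be exposed to entity-expansion DoS when built against Expat older than 2.4.1
  • defusedxml provides hardened drop-in replacements for the standard library parsers
  • Always use defusedxml for untrusted XML input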

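Best Practice: Use the defusedxml library for all XML parsing of untrusted input. It provides drop-in replacements for the standard library parsers that reject entity declarations and external references by default and can optionally forbid DTDs altogether. Its lxml wrapper (defusedxml.lxml) is deprecated upstream, so lxml users should also configure the parser explicitly.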
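As a minimal sketch of that recommendation, the snippet below parses untrusted input through defusedxml's ElementTree replacement. The helper name `parse_untrusted` and the choice to return `None` on rejection are illustrative assumptions, not part of defusedxml; `forbid_dtd=True` is an optional stricter setting that rejects any DOCTYPE.

```python
from defusedxml import DefusedXmlException
from defusedxml import ElementTree as SafeET

XXE_PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>"""


def parse_untrusted(xml_bytes):
    """Hypothetical helper: return the root element, or None if rejected."""
    try:
        # forbid_dtd=True rejects any DOCTYPE; entity declarations and
        # external references are already forbidden by default.
        return SafeET.fromstring(xml_bytes, forbid_dtd=True)
    except DefusedXmlException:
        # Base class of DTDForbidden, EntitiesForbidden,
        # and ExternalReferenceForbidden.
        return None


root = parse_untrusted(b"<root><data>ok</data></root>")
rejected = parse_untrusted(XXE_PAYLOAD)
```

Malformed XML still raises the usual `ParseError`, so callers would typically handle that separately from the security rejections.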

Vulnerable lxml (Default Configuration)

Python: vulnerable_lxml.py (⚠️ Vulnerable)

from lxml import etree

# VULNERABLE on lxml < 5.0: the default parser resolves external entities.
# (lxml 5.0 changed the default to resolve_entities='internal'.)
def parse_xml_vulnerable(xml_data):
    tree = etree.fromstring(xml_data)
    return tree

# Vulnerable on every lxml version: explicitly enabling entity resolution
def parse_xml_also_vulnerable(xml_data):
    parser = etree.XMLParser(
        resolve_entities=True  # Explicitly dangerous!
    )
    tree = etree.fromstring(xml_data, parser)
    return tree

# Example payload that exploits the parsers above
xxe_payload = b'''<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>'''

tree = parse_xml_vulnerable(xxe_payload)
print(tree.find('.//data').text)  # On lxml < 5.0: prints /etc/passwd contents!

Secure lxml Configuration

Python: secure_lxml.py (✓ Secure)

from lxml import etree

# SECURE: Configure lxml parser to block XXE
def parse_xml_secure(xml_data):
    # Create parser with security settings
    parser = etree.XMLParser(
        resolve_entities=False,  # Don't resolve entities
        no_network=True,         # Disable network access (the default)
        dtd_validation=False,    # Disable DTD validation
        load_dtd=False,          # Don't load external DTDs
        huge_tree=False,         # Keep libxml2's size/depth limits (DoS protection)
        remove_blank_text=False
    )

    # Parse with secure parser
    tree = etree.fromstring(xml_data, parser)
    return tree

# Alternative: Use iterparse for large files.
# lxml's iterparse() takes the parser options as keyword arguments;
# it does not accept a parser= argument.
def parse_xml_iterparse(xml_file_path):
    for event, elem in etree.iterparse(xml_file_path,
                                        events=('start', 'end'),
                                        resolve_entities=False,
                                        no_network=True,
                                        load_dtd=False):
        if event == 'end' and elem.tag == 'data':
            print(elem.text)
            elem.clear()  # Free memory

# Example secure usage
xxe_payload = b'''<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>'''

tree = parse_xml_secure(xxe_payload)
# Entity NOT expanded: the file is never read. The reference is kept as an
# unresolved entity node, so serializing the tree shows the literal &xxe;

Standard Library ElementTree (Generally Safe)

Python: elementtree_safe.py (✓ Secure)

import xml.etree.ElementTree as ET

# Standard library ElementTree does not expand external entities,
# but it doesn't support all XML features

def parse_elementtree(xml_data):
    """Standard ElementTree - does not resolve external entities in Python 3.x"""
    try:
        # A reference to an external entity raises ParseError
        # instead of being expanded
        tree = ET.fromstring(xml_data)
        return tree
    except ET.ParseError as e:
        print(f"Parse error: {e}")
        return None

# IMPORTANT: ElementTree rejects the document (ParseError: undefined entity)
# rather than expanding &xxe;, so the file is never read

xxe_payload = '''<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>'''

tree = parse_elementtree(xxe_payload)
if tree is None:
    print("Payload rejected")  # Expected path for the XXE payload
else:
    data = tree.find('.//data')
    print(f"Data: {data.text}")

# For maximum security, still use defusedxml:
# ElementTree can be exposed to entity-expansion DoS (billion laughs)
# when Python is built against Expat older than 2.4.1

Django Integration

Python: django_views.py (✓ Secure)

from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
from django.views.decorators.csrf import csrf_exempt
from defusedxml.lxml import fromstring  # Note: defusedxml.lxml is deprecated upstream
import defusedxml
import logging

logger = logging.getLogger(__name__)

@csrf_exempt  # Only if XML comes from a trusted source
@require_http_methods(["POST"])
def process_xml(request):
    """Secure XML processing endpoint in Django"""

    # Validate content type (Django strips parameters such as charset)
    if request.content_type != 'application/xml':
        return JsonResponse(
            {'error': 'Invalid content type'},
            status=400
        )

    # Get XML data
    xml_data = request.body

    # Size limit (prevent DoS)
    if len(xml_data) > 1048576:  # 1MB
        return JsonResponse(
            {'error': 'XML too large'},
            status=413
        )

    try:
        # Parse with defusedxml; forbid_dtd=True also rejects any DOCTYPE
        # (DTDs are only checked for entities by default)
        tree = fromstring(xml_data, forbid_dtd=True)

        # Process XML safely
        result = process_xml_tree(tree)

        return JsonResponse({'result': result})

    except defusedxml.DTDForbidden:
        logger.warning('DTD forbidden in XML')
        return JsonResponse(
            {'error': 'DTD not allowed'},
            status=400
        )
    except defusedxml.EntitiesForbidden:
        logger.warning('Entities forbidden in XML')
        return JsonResponse(
            {'error': 'Entities not allowed'},
            status=400
        )
    except Exception as e:
        # Log details server-side; return a generic message to the client
        logger.error(f'XML processing error: {e}')
        return JsonResponse(
            {'error': 'Invalid XML'},
            status=400
        )

def process_xml_tree(tree):
    """Process parsed XML tree"""
    # Safe processing logic
    return {'status': 'success'}

Flask Integration

Python: flask_app.py (✓ Secure)

from flask import Flask, request, jsonify
from defusedxml.lxml import fromstring  # Note: defusedxml.lxml is deprecated upstream
import defusedxml
import logging

app = Flask(__name__)
logger = logging.getLogger(__name__)

@app.route('/api/xml', methods=['POST'])
def process_xml():
    """Secure XML processing endpoint in Flask"""

    # Validate content type; request.mimetype omits parameters such as
    # charset, which request.content_type would include
    if request.mimetype != 'application/xml':
        return jsonify({'error': 'Invalid content type'}), 400

    # Get XML data
    xml_data = request.get_data()

    # Size limit
    if len(xml_data) > 1048576:  # 1MB
        return jsonify({'error': 'XML too large'}), 413

    try:
        # Parse with defusedxml; forbid_dtd=True also rejects any DOCTYPE
        tree = fromstring(xml_data, forbid_dtd=True)

        # Process XML
        result = {'status': 'success'}

        return jsonify(result), 200

    except defusedxml.DTDForbidden:
        logger.warning('Blocked DTD in XML')
        return jsonify({'error': 'DTD not allowed'}), 400

    except defusedxml.EntitiesForbidden:
        logger.warning('Blocked entities in XML')
        return jsonify({'error': 'Entities not allowed'}), 400

    except defusedxml.ExternalReferenceForbidden:
        logger.warning('Blocked external reference')
        return jsonify({'error': 'External references not allowed'}), 400

    except Exception as e:
        # Log details server-side; return a generic message to the client
        logger.error(f'XML processing error: {e}')
        return jsonify({'error': 'Processing failed'}), 400

if __name__ == '__main__':
    app.run(debug=False)  # Never run debug=True in production

FastAPI Integration

Python: fastapi_app.py (✓ Secure)

from fastapi import FastAPI, Request, HTTPException
from defusedxml.lxml import fromstring  # Note: defusedxml.lxml is deprecated upstream
import defusedxml
import logging

app = FastAPI()
logger = logging.getLogger(__name__)

@app.post("/api/xml")
async def process_xml(request: Request):
    """Secure XML processing endpoint in FastAPI"""

    # Validate content type, ignoring parameters such as "; charset=utf-8"
    content_type = request.headers.get('content-type', '')
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type != 'application/xml':
        raise HTTPException(
            status_code=400,
            detail="Invalid content type"
        )

    # Get XML data
    xml_data = await request.body()

    # Size limit
    if len(xml_data) > 1048576:  # 1MB
        raise HTTPException(
            status_code=413,
            detail="XML too large"
        )

    try:
        # Parse with defusedxml; forbid_dtd=True also rejects any DOCTYPE
        tree = fromstring(xml_data, forbid_dtd=True)

        # Process XML
        result = {"status": "success"}

        return result

    except defusedxml.DTDForbidden:
        logger.warning('Blocked DTD')
        raise HTTPException(
            status_code=400,
            detail="DTD not allowed"
        )

    except defusedxml.EntitiesForbidden:
        logger.warning('Blocked entities')
        raise HTTPException(
            status_code=400,
            detail="Entities not allowed"
        )

    except Exception as e:
        # Log details server-side; return a generic message to the client
        logger.error(f'Error: {e}')
        raise HTTPException(
            status_code=400,
            detail="Processing failed"
        )

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000

Security Testing with pytest

Python: test_xxe_prevention.py (✓ Secure)

import pytest
from defusedxml.lxml import fromstring  # Note: defusedxml.lxml is deprecated upstream
import defusedxml

class TestXXEPrevention:

    def test_xxe_blocked(self):
        """Test that XXE payloads are blocked"""
        xxe_payload = b'''<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>'''

        # Parsing must raise one of the defusedxml security exceptions
        with pytest.raises((defusedxml.EntitiesForbidden,
                           defusedxml.DTDForbidden,
                           defusedxml.ExternalReferenceForbidden)):
            fromstring(xxe_payload)

    def test_billion_laughs_blocked(self):
        """Test that billion laughs attack is blocked"""
        billion_laughs = b'''<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
]>
<lolz>&lol2;</lolz>'''

        with pytest.raises((defusedxml.EntitiesForbidden,
                           defusedxml.DTDForbidden)):
            fromstring(billion_laughs)

    def test_valid_xml_accepted(self):
        """Test that valid XML without entities is accepted"""
        valid_xml = b'<root><data>test content</data></root>'

        tree = fromstring(valid_xml)
        assert tree is not None
        assert tree.find('.//data').text == 'test content'

    def test_external_dtd_blocked(self):
        """Test that external DTD is blocked"""
        external_dtd = b'''<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "http://evil.com/evil.dtd">
<root><data>test</data></root>'''

        # A DOCTYPE that declares no entities is only rejected when
        # forbid_dtd=True (the default is forbid_dtd=False)
        with pytest.raises((defusedxml.DTDForbidden,
                           defusedxml.ExternalReferenceForbidden)):
            fromstring(external_dtd, forbid_dtd=True)

Python XXE Prevention Checklist

✅ Library Selection:

  • Use defusedxml for all untrusted XML input (HIGHLY RECOMMENDED)
  • If using lxml directly, configure parser with secure settings
  • Standard library ElementTree does not expand external entities, but use defusedxml for defense in depth
  • For untrusted data, prefer defusedxml.ElementTree's parse(), iterparse(), and fromstring() over the xml.etree.ElementTree equivalents

✅ lxml Configuration (if not using defusedxml):

  • Set resolve_entities=False
  • Set no_network=True
  • Set load_dtd=False
  • Set dtd_validation=False
  • Keep huge_tree=False (the default), which retains libxml2's built-in size and depth limits (DoS prevention)

✅ Input Validation:

  • Reject XML containing <!DOCTYPE if not required
  • Implement size limits (prevent DoS)
  • Validate against expected schema
  • Log XXE attempts for monitoring
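
The first two items can be sketched with only the standard library. This is an illustrative approach, not a replacement for defusedxml: the names `parse_without_doctype`, `DoctypeRejected`, and `MAX_XML_BYTES` are hypothetical, and the 1 MB limit simply mirrors the framework examples in this guide. The sketch scans the input with Expat and aborts as soon as a DOCTYPE declaration starts, before any entity declaration is processed.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

MAX_XML_BYTES = 1_048_576  # 1 MB; an assumed limit, tune for your application


class DoctypeRejected(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _reject_doctype(name, sysid, pubid, has_internal_subset):
    # Expat calls this handler when it reaches "<!DOCTYPE name ...".
    raise DoctypeRejected(f"DOCTYPE declaration not allowed: {name!r}")


def parse_without_doctype(xml_bytes):
    """Hypothetical helper: enforce a size limit and refuse any DOCTYPE."""
    if len(xml_bytes) > MAX_XML_BYTES:
        raise ValueError("XML document too large")

    # First pass: scan with Expat only to detect a DOCTYPE. An exception
    # raised inside the handler propagates out of Parse().
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_bytes, True)

    # Second pass: build the tree now that no DOCTYPE is present.
    return ET.fromstring(xml_bytes)
```

Parsing twice costs extra CPU time, which is acceptable for small, size-limited payloads; for larger inputs a single-pass parser with the same handler would be preferable.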

✅ Framework Integration:

  • Validate Content-Type headers (application/xml)
  • Implement request size limits
  • Use proper error handling (don't leak parse errors)
  • Log security events
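
Content-Type headers often carry parameters such as `charset`, so an exact string comparison against `application/xml` can reject valid requests. A framework-independent sketch of the check follows; the helper name `is_xml_content_type` and the `ALLOWED_XML_TYPES` set are assumptions for illustration, and whether to also accept `text/xml` is an application decision.

```python
ALLOWED_XML_TYPES = {"application/xml"}  # add "text/xml" if your clients send it


def is_xml_content_type(header_value):
    """Hypothetical helper: compare only the media type of a Content-Type header."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_XML_TYPES


print(is_xml_content_type("application/xml; charset=utf-8"))  # True
print(is_xml_content_type("application/json"))                # False
```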

✅ Testing:

  • Unit tests with XXE payloads (should be blocked)
  • Tests with billion laughs (should be blocked)
  • Tests with external DTD (should be blocked)
  • Integration tests with security scanner
  • Regular penetration testing

✅ Dependencies:

  • Install defusedxml: pip install defusedxml
  • Keep lxml updated: pip install --upgrade lxml
  • Monitor security advisories
  • Use virtual environments for isolation