Python XXE Prevention
Secure XML parsing configuration for Python applications using lxml, ElementTree, defusedxml, and other XML libraries.
CWE-611: Improper Restriction of XML External Entity Reference
OWASP Top 10:2021 - A05: Security Misconfiguration
Overview
Python has multiple XML parsing libraries with different security characteristics. Understanding which library you're using and its default behavior is critical.
Python XML Libraries:
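- xml.etree.ElementTree - Standard library; does not expand external entities in Python 3.x, but protection against entity-expansion DoS depends on the underlying Expat version
- lxml.etree - Fast C library built on libxml2; releases before 5.0 resolve external entities by default (VULNERABLE), while 5.0 and later resolve only internal entities by default
- xml.dom.minidom - DOM implementation; does not expand external entities by default
- xml.sax - SAX parser; external general entities are disabled by default since Python 3.7.1
- defusedxml - Security wrapper around the standard library parsers (RECOMMENDED)
Key Security Considerations:
- lxml before 5.0 is VULNERABLE by default; configure the parser explicitly on every version instead of relying on defaults
- Standard library (xml.etree) does not resolve external entities, but it has no option to forbid DTDs or entity declarations outright
- defusedxml provides hardened drop-in replacements for the standard library parsers; its defusedxml.lxml module is deprecated in defusedxml 0.7 and later
- Always use defusedxml for untrusted XML input
Best Practice: Use the defusedxml library for all XML parsing of untrusted input. Its wrappers reject entity declarations and external references by default and can be told to forbid DTDs entirely (forbid_dtd=True).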
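Because defaults differ by library and version, it can help to check which parsers are actually importable in the running environment. The following is a minimal sketch using only the standard library; the function name `installed_xml_libraries` and the report format are illustrative assumptions, not part of any of the libraries listed above.

```python
from importlib import metadata, util

# Import name -> PyPI distribution name (None means standard library).
XML_LIBRARIES = {
    "xml.etree.ElementTree": None,
    "xml.dom.minidom": None,
    "xml.sax": None,
    "lxml.etree": "lxml",
    "defusedxml": "defusedxml",
}


def installed_xml_libraries():
    """Return {import name: 'stdlib', a version string, or None if missing}."""
    report = {}
    for module_name, dist_name in XML_LIBRARIES.items():
        try:
            found = util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            found = False  # parent package (e.g. lxml) is not installed
        if not found:
            report[module_name] = None
        elif dist_name is None:
            report[module_name] = "stdlib"
        else:
            try:
                report[module_name] = metadata.version(dist_name)
            except metadata.PackageNotFoundError:
                report[module_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in installed_xml_libraries().items():
        print(f"{name}: {version or 'not installed'}")
```

If lxml is reported, the version string shows whether the pre-5.0 defaults discussed above apply.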
Vulnerable lxml (Default Configuration)
Python | vulnerable_lxml.py | ⚠️ Vulnerable

from lxml import etree

# VULNERABLE on lxml < 5.0: the default parser resolves external entities.
# (lxml 5.0+ defaults to resolve_entities='internal', which skips them.)
def parse_xml_vulnerable(xml_data):
    # Default parser settings are used here
    tree = etree.fromstring(xml_data)
    return tree

# Vulnerable on every version: explicitly enabling entity resolution
def parse_xml_also_vulnerable(xml_data):
    parser = etree.XMLParser(
        resolve_entities=True  # Explicitly dangerous!
    )
    tree = etree.fromstring(xml_data, parser)
    return tree

# Example usage that would be exploited
xxe_payload = b'''<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>'''

tree = parse_xml_vulnerable(xxe_payload)
print(tree.find('.//data').text)  # On lxml < 5.0: prints /etc/passwd contents!

Secure lxml Configuration
Python | secure_lxml.py | ✓ Secure

from lxml import etree

# SECURE: Configure lxml parser to block XXE
def parse_xml_secure(xml_data):
    # Create parser with security settings
    parser = etree.XMLParser(
        resolve_entities=False,  # Don't resolve entity references
        no_network=True,         # Disable network access
        dtd_validation=False,    # Disable DTD validation
        load_dtd=False,          # Don't load external DTDs
        huge_tree=False,         # Keep libxml2's built-in size limits (DoS)
        remove_blank_text=False
    )

    # Parse with secure parser
    tree = etree.fromstring(xml_data, parser)
    return tree

# Alternative: Use iterparse for large files
def parse_xml_iterparse(xml_file_path):
    # lxml's iterparse() does not take a parser argument; the
    # security options are passed directly as keyword arguments
    for event, elem in etree.iterparse(xml_file_path,
                                       events=('start', 'end'),
                                       resolve_entities=False,
                                       no_network=True,
                                       load_dtd=False):
        if event == 'end' and elem.tag == 'data':
            print(elem.text)
            elem.clear()  # Free memory

# Example secure usage
xxe_payload = b'''<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>'''

tree = parse_xml_secure(xxe_payload)
# Entity NOT expanded: <data> keeps an unresolved entity reference
# node, so no file content ever reaches the tree

defusedxml Library (Recommended)
Python | defusedxml_usage.py | ✓ Secure

# Install: pip install defusedxml

import defusedxml  # exposes DTDForbidden, EntitiesForbidden, ...
from defusedxml import ElementTree as DefusedET
# Note: defusedxml.lxml is deprecated in defusedxml 0.7 and later
from defusedxml.lxml import fromstring as defused_lxml_fromstring
from defusedxml.lxml import parse as defused_lxml_parse

# BEST PRACTICE: Use defusedxml for all untrusted XML

def parse_elementtree_safe(xml_data):
    """Safe ElementTree parsing using defusedxml"""
    try:
        tree = DefusedET.fromstring(xml_data)
        return tree
    except DefusedET.ParseError as e:
        print(f"Parse error: {e}")
        return None
    except defusedxml.DTDForbidden:
        print("DTD forbidden")
        return None
    except defusedxml.EntitiesForbidden:
        print("Entities forbidden")
        return None

def parse_lxml_safe(xml_data):
    """Safe lxml parsing using defusedxml"""
    try:
        # Wrapper rejects entity declarations by default
        tree = defused_lxml_fromstring(xml_data)
        return tree
    except Exception as e:
        print(f"Blocked malicious XML: {e}")
        return None

def parse_file_safe(xml_file_path):
    """Safe file parsing using defusedxml"""
    try:
        tree = defused_lxml_parse(xml_file_path)
        return tree.getroot()
    except Exception as e:
        print(f"Blocked malicious XML: {e}")
        return None

# Example usage
xxe_payload = b'''<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>'''

# defusedxml blocks XXE automatically
tree = parse_lxml_safe(xxe_payload)
# Returns None: the entity declaration raises EntitiesForbidden

Standard Library ElementTree (Generally Safe)
Python | elementtree_safe.py | ✓ Secure

import xml.etree.ElementTree as ET

# Standard library ElementTree does not expand external entities
# But doesn't support all XML features

def parse_elementtree(xml_data):
    """Standard ElementTree - external entities are not expanded in Python 3.x"""
    try:
        # A reference to an external entity raises ParseError
        tree = ET.fromstring(xml_data)
        return tree
    except ET.ParseError as e:
        print(f"Parse error: {e}")
        return None

# IMPORTANT: ElementTree does not silently drop external entities.
# It rejects the whole document with ParseError ("undefined entity").

xxe_payload = '''<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>'''

tree = parse_elementtree(xxe_payload)
if tree is None:
    # Expected path for the payload above: parsing failed, no file was read
    print("XXE payload rejected")
else:
    data = tree.find('.//data')
    print(f"Data: {data.text}")

# For maximum security, still use defusedxml
# ElementTree does not block entity-expansion DoS on older Expat versions

Django Integration
Python | django_views.py | ✓ Secure

from django.http import JsonResponse
from django.views.decorators.http import require_http_methods
from django.views.decorators.csrf import csrf_exempt
# Note: defusedxml.lxml is deprecated in defusedxml 0.7 and later
from defusedxml.lxml import fromstring
import defusedxml
import logging

logger = logging.getLogger(__name__)

@csrf_exempt  # Only if XML from trusted source
@require_http_methods(["POST"])
def process_xml(request):
    """Secure XML processing endpoint in Django"""

    # Validate content type (Django strips parameters such as charset)
    if request.content_type != 'application/xml':
        return JsonResponse(
            {'error': 'Invalid content type'},
            status=400
        )

    # Get XML data
    xml_data = request.body

    # Size limit (prevent DoS)
    if len(xml_data) > 1048576:  # 1MB
        return JsonResponse(
            {'error': 'XML too large'},
            status=413
        )

    try:
        # Parse with defusedxml (rejects entity declarations by default)
        tree = fromstring(xml_data)

        # Process XML safely
        result = process_xml_tree(tree)

        return JsonResponse({'result': result})

    except defusedxml.DTDForbidden:
        logger.warning('DTD forbidden in XML')
        return JsonResponse(
            {'error': 'DTD not allowed'},
            status=400
        )
    except defusedxml.EntitiesForbidden:
        logger.warning('Entities forbidden in XML')
        return JsonResponse(
            {'error': 'Entities not allowed'},
            status=400
        )
    except Exception as e:
        # Log the detail server-side; return only a generic message
        logger.error('XML processing error: %s', e)
        return JsonResponse(
            {'error': 'Invalid XML'},
            status=400
        )

def process_xml_tree(tree):
    """Process parsed XML tree"""
    # Safe processing logic
    return {'status': 'success'}

Flask Integration
Python | flask_app.py | ✓ Secure

from flask import Flask, request, jsonify
# Note: defusedxml.lxml is deprecated in defusedxml 0.7 and later
from defusedxml.lxml import fromstring
import defusedxml
import logging

app = Flask(__name__)
logger = logging.getLogger(__name__)

@app.route('/api/xml', methods=['POST'])
def process_xml():
    """Secure XML processing endpoint in Flask"""

    # Validate content type; mimetype drops parameters such as charset
    if request.mimetype != 'application/xml':
        return jsonify({'error': 'Invalid content type'}), 400

    # Get XML data
    xml_data = request.get_data()

    # Size limit
    if len(xml_data) > 1048576:  # 1MB
        return jsonify({'error': 'XML too large'}), 413

    try:
        # Parse securely with defusedxml
        tree = fromstring(xml_data)

        # Process XML
        result = {'status': 'success'}

        return jsonify(result), 200

    except defusedxml.DTDForbidden:
        logger.warning('Blocked DTD in XML')
        return jsonify({'error': 'DTD not allowed'}), 400

    except defusedxml.EntitiesForbidden:
        logger.warning('Blocked entities in XML')
        return jsonify({'error': 'Entities not allowed'}), 400

    except defusedxml.ExternalReferenceForbidden:
        logger.warning('Blocked external reference')
        return jsonify({'error': 'External references not allowed'}), 400

    except Exception as e:
        # Log the detail server-side; return only a generic message
        logger.error('XML processing error: %s', e)
        return jsonify({'error': 'Processing failed'}), 400

if __name__ == '__main__':
    app.run(debug=False)  # Never run debug=True in production

FastAPI Integration
Python | fastapi_app.py | ✓ Secure

from fastapi import FastAPI, Request, HTTPException
# Note: defusedxml.lxml is deprecated in defusedxml 0.7 and later
from defusedxml.lxml import fromstring
import defusedxml
import logging

app = FastAPI()
logger = logging.getLogger(__name__)

@app.post("/api/xml")
async def process_xml(request: Request):
    """Secure XML processing endpoint in FastAPI"""

    # Validate content type, ignoring parameters such as charset
    content_type = request.headers.get('content-type', '')
    if content_type.split(';', 1)[0].strip().lower() != 'application/xml':
        raise HTTPException(
            status_code=400,
            detail="Invalid content type"
        )

    # Get XML data
    xml_data = await request.body()

    # Size limit
    if len(xml_data) > 1048576:  # 1MB
        raise HTTPException(
            status_code=413,
            detail="XML too large"
        )

    try:
        # Parse securely
        tree = fromstring(xml_data)

        # Process XML
        result = {"status": "success"}

        return result

    except defusedxml.DTDForbidden:
        logger.warning('Blocked DTD')
        raise HTTPException(
            status_code=400,
            detail="DTD not allowed"
        )

    except defusedxml.EntitiesForbidden:
        logger.warning('Blocked entities')
        raise HTTPException(
            status_code=400,
            detail="Entities not allowed"
        )

    except Exception as e:
        # Log the detail server-side; return only a generic message
        logger.error('Error: %s', e)
        raise HTTPException(
            status_code=400,
            detail="Processing failed"
        )

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000

Security Testing with pytest
Python | test_xxe_prevention.py | ✓ Secure

import pytest
from defusedxml.lxml import fromstring
import defusedxml

class TestXXEPrevention:

    def test_xxe_blocked(self):
        """Test that XXE payloads are blocked"""
        xxe_payload = b'''<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>'''

        # Entity declarations must raise instead of being expanded
        with pytest.raises((defusedxml.EntitiesForbidden,
                            defusedxml.DTDForbidden,
                            defusedxml.ExternalReferenceForbidden)):
            fromstring(xxe_payload)

    def test_billion_laughs_blocked(self):
        """Test that billion laughs attack is blocked"""
        billion_laughs = b'''<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
]>
<lolz>&lol2;</lolz>'''

        with pytest.raises((defusedxml.EntitiesForbidden,
                            defusedxml.DTDForbidden)):
            fromstring(billion_laughs)

    def test_valid_xml_accepted(self):
        """Test that valid XML without entities is accepted"""
        valid_xml = b'<root><data>test content</data></root>'

        tree = fromstring(valid_xml)
        assert tree is not None
        assert tree.find('.//data').text == 'test content'

    def test_external_dtd_blocked(self):
        """Test that external DTD is blocked"""
        external_dtd = b'''<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "http://evil.com/evil.dtd">
<root><data>test</data></root>'''

        # DTDs are allowed by default (forbid_dtd=False), so a DOCTYPE
        # without entity declarations is only rejected with forbid_dtd=True
        with pytest.raises((defusedxml.DTDForbidden,
                            defusedxml.ExternalReferenceForbidden)):
            fromstring(external_dtd, forbid_dtd=True)

Python XXE Prevention Checklist
✅ Library Selection:
- Use defusedxml for all untrusted XML input (HIGHLY RECOMMENDED)
- If using lxml directly, configure parser with secure settings
- Standard library ElementTree does not expand external entities, but use defusedxml for defense in depth
- Do not call xml.etree.ElementTree.iterparse() or parse() on untrusted data; use the defusedxml.ElementTree equivalents instead
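To see what the standard library does with an external entity reference, the check below feeds ElementTree the same payload used in the earlier examples. This is a minimal sketch for CPython 3.x; the helper name `stdlib_rejects_external_entity` is hypothetical, and the result covers external entities only, not entity-expansion DoS.

```python
import xml.etree.ElementTree as ET

XXE_PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><data>&xxe;</data></root>"""


def stdlib_rejects_external_entity(payload: bytes = XXE_PAYLOAD) -> bool:
    """Return True if ElementTree refuses the payload with ParseError."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return True  # the entity reference was not resolved
    return False


if __name__ == "__main__":
    print(stdlib_rejects_external_entity())
```

A check like this can run at application startup or in CI to confirm the interpreter in use behaves as expected.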
✅ lxml Configuration (if not using defusedxml):
- Set resolve_entities=False
- Set no_network=True
- Set load_dtd=False
- Set dtd_validation=False
- Set huge_tree=False (DoS prevention)
✅ Input Validation:
- Reject XML containing <!DOCTYPE if not required
- Implement size limits (prevent DoS)
- Validate against expected schema
- Log XXE attempts for monitoring
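The size limit and DOCTYPE rejection above can be combined into one pre-parse gate. The sketch below uses only the standard library's Expat binding; `XMLRejected`, `check_untrusted_xml`, and the 1 MB default are illustrative assumptions. Using the parser's doctype callback instead of a substring search avoids false positives on text that merely contains the string `<!DOCTYPE`.

```python
import logging
from xml.parsers import expat

logger = logging.getLogger(__name__)

MAX_XML_BYTES = 1_048_576  # 1 MB, the same limit as the framework examples


class XMLRejected(ValueError):
    """Input failed a pre-parse safety check (hypothetical exception name)."""


def check_untrusted_xml(xml_data: bytes, max_bytes: int = MAX_XML_BYTES) -> None:
    """Raise XMLRejected if the input is too large, has a DOCTYPE, or is malformed."""
    if len(xml_data) > max_bytes:
        raise XMLRejected("XML too large")

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as <!DOCTYPE is seen, before any entity is processed
        logger.warning("Rejected XML with DOCTYPE %r", name)
        raise XMLRejected("DOCTYPE not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = reject_doctype
    try:
        parser.Parse(xml_data, True)  # True marks this as the final chunk
    except expat.ExpatError as exc:
        raise XMLRejected("Malformed XML") from exc
```

Input that passes this gate is parsed a second time by the real parser, so the sketch trades some CPU time for an early, library-independent rejection.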
✅ Framework Integration:
- Validate Content-Type headers (application/xml)
- Implement request size limits
- Use proper error handling (don't leak parse errors)
- Log security events
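Content-Type headers often carry parameters (for example `application/xml; charset=utf-8`), so an exact string comparison can reject legitimate requests. The helper below is a framework-independent sketch; the name `is_xml_content_type` and the default allow-list are assumptions made for illustration.

```python
def is_xml_content_type(header_value, allowed=("application/xml",)):
    """Return True if a Content-Type header names an allowed media type.

    Parameters after ';' (such as charset) and letter case are ignored.
    """
    if not header_value:
        return False
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in allowed


if __name__ == "__main__":
    for value in ("application/xml; charset=utf-8", "application/json", None):
        print(repr(value), is_xml_content_type(value))
```

Frameworks that already expose a parsed media type can use that instead; the helper is mainly useful where only the raw header string is available.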
✅ Testing:
- Unit tests with XXE payloads (should be blocked)
- Tests with billion laughs (should be blocked)
- Tests with external DTD (should be blocked)
- Integration tests with security scanner
- Regular penetration testing
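For the billion laughs tests, a small generator makes the payload size explicit. The sketch below only builds the document and computes its nominal expanded size; it never parses it. The function names and default parameters are illustrative assumptions.

```python
def billion_laughs_payload(levels: int = 9, fanout: int = 10) -> bytes:
    """Build a nested-entity document for use as a test fixture.

    Entity lol0 is the literal "lol"; each later level references the
    previous level `fanout` times.
    """
    if levels < 1 or fanout < 1:
        raise ValueError("levels and fanout must be positive")
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", '  <!ENTITY lol0 "lol">']
    for level in range(1, levels + 1):
        refs = f"&lol{level - 1};" * fanout
        lines.append(f'  <!ENTITY lol{level} "{refs}">')
    lines.append("]>")
    lines.append(f"<lolz>&lol{levels};</lolz>")
    return "\n".join(lines).encode("ascii")


def expanded_size(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Characters produced if every entity were fully expanded."""
    return len(seed) * fanout ** levels


if __name__ == "__main__":
    payload = billion_laughs_payload()
    print(len(payload), "bytes on the wire,", expanded_size(), "characters expanded")
```

A fixture of this kind should only be handed to a parser that is expected to reject it; a small `levels` value keeps an accidental expansion harmless.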
✅ Dependencies:
- Install defusedxml: pip install defusedxml
- Keep lxml updated: pip install --upgrade lxml
- Monitor security advisories
- Use virtual environments for isolation