Security Code Review Basics in Python
Understanding how these functions work in depth
Manual security code reviews in Python are essential for identifying business logic flaws and complex trust boundaries that automated tools frequently overlook.
A primary focus of this article involves function analysis, specifically examining how user-controlled input flows into “sinks”—sensitive functions that can trigger vulnerabilities if the data is not properly sanitized.
Analysis of High-Risk Python Functions
A critical component of manual review is identifying specific built-in and external library functions with unexpected or dangerous behaviors.
Dynamic Evaluation
The functions eval() and exec() are significant liabilities because they allow strings to be executed as code.
Examples

eval() (https://docs.python.org/3/library/functions.html#eval)
This function executes arbitrary code. Calling it with untrusted user-supplied input will lead to security vulnerabilities.
eval('4+1')
# 5
# Injecting malicious code.
eval('__import__("os").system("uname -a")')
# Linux legion 6.11.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 26 14:16:59 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
The safe way
Manual analysis should confirm these are replaced with safer alternatives like ast.literal_eval() for data structures or dispatch tables for function calls.
import ast

# Attempting to evaluate arbitrary code raises an exception instead.
ast.literal_eval('__import__("os").system("uname -a")')
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/usr/lib/python3.12/ast.py", line 112, in literal_eval
ast.literal_eval() safely evaluates only literal expressions (strings, numbers, tuples, lists, dicts, sets, booleans, and None). It can turn a dict literal into a dictionary, but it raises a ValueError for any expression that is not a plain literal, preventing arbitrary code execution.
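The dispatch-table alternative mentioned above can be sketched as follows: user input selects a function by name from a fixed mapping instead of being evaluated. The operation names and the calculate() helper here are hypothetical examples.

```python
import operator

# Hypothetical dispatch table: maps user-supplied names to vetted functions.
OPERATIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}

def calculate(op_name, a, b):
    # Unknown names fail closed instead of being evaluated as code.
    func = OPERATIONS.get(op_name)
    if func is None:
        raise ValueError(f"Unsupported operation: {op_name}")
    return func(a, b)

calculate("add", 4, 1)
# 5
```

Because the lookup only ever returns functions the developer placed in the table, there is no path from user input to code execution.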
OS Command Execution
Functions such as os.system(), os.popen(), and subprocess.run(..., shell=True) are prone to command injection if user input is directly concatenated.
Examples
# Vulnerable: User input directly concatenated to command string
import os
import subprocess
username = "John; rm -rf /"
# Dangerous - command injection via os.system
os.system(f"echo Hello {username}")
# Dangerous - command injection via os.popen
os.popen(f"grep {username} /etc/passwd").read()
# Dangerous - subprocess with shell=True
subprocess.run(f"ping -c 1 {username}", shell=True)
Using shell=True passes the command string to a system shell, allowing command chaining with ;, |, &&, etc., or arbitrary command substitution with backticks or $().
The safe way
Reviewers must ensure arguments are passed as a list with shell=False to avoid invoking a system shell.
import subprocess
username = "John; uname -a"
# Safe - using list and shell=False
subprocess.run(["echo", f"Hello {username}"], shell=False)
# Safe - using shell=False without injection risk
subprocess.run(["ping", "-c", "1", "example.com"], shell=False)
# For commands that need shell features, validate input strictly
allowed_chars = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")
if not all(c in allowed_chars for c in username):
    raise ValueError("Invalid characters in username")
subprocess.run(["echo", f"Hello {username}"], shell=False)
Passing arguments as a list with shell=False prevents shell interpretation of special characters. The command and its arguments are passed directly to execve() without shell processing.
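When shell features are genuinely required, another common mitigation (not shown above) is shlex.quote(), which escapes an untrusted value so the shell treats it as a single literal argument. A minimal sketch:

```python
import shlex
import subprocess

username = "John; uname -a"

# shlex.quote() wraps the untrusted value in single quotes,
# so ';' is treated as data, not as a command separator.
cmd = f"echo Hello {shlex.quote(username)}"
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(result.stdout)
# Hello John; uname -a
```

Quoting is a second line of defense; passing a list with shell=False remains the preferred approach whenever a shell is not strictly needed.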
Insecure Deserialization
pickle.loads()
The pickle module is inherently insecure: it can execute arbitrary code during the unpickling process via special methods like __reduce__.
Examples
import pickle
import os
# Attacker-controlled pickle data with malicious payload
class Malicious:
    def __reduce__(self):
        return (os.system, ("whoami",))
malicious_data = pickle.dumps(Malicious())
# Dangerous - deserializing untrusted data
pickle.loads(malicious_data) # Executes: whoami
During unpickling, Python calls __reduce__ to reconstruct objects. An attacker can specify arbitrary functions and arguments, leading to remote code execution.
The safe way
Use safer alternatives like json for data interchange, or implement strict validation if pickle is required.
import pickle
import json
# Safe - use JSON for untrusted data
data = '{"name": "John", "age": 30}'
parsed = json.loads(data) # No code execution possible
# If pickle is required, restrict to trusted sources only
# Or use a signed/encrypted wrapper with validation
class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Whitelist only allowed classes
        allowed = {"datetime": ["date", "timedelta"], "collections": ["deque"]}
        if module in allowed and name in allowed[module]:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden: {module}.{name}")
# Safe unpickling with restricted class loading
with open("data.pkl", "rb") as f:
    data = RestrictedUnpickler(f).load()
JSON only parses primitive data types and cannot instantiate arbitrary objects. If pickle is unavoidable, subclass Unpickler to whitelist allowed classes.
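The "signed wrapper" mentioned above can be sketched with an HMAC: the receiver verifies that the pickle bytes were produced by a holder of a shared secret key before unpickling them. The key and helper names here are hypothetical, and note that the signature only proves origin, not that the pickled content is harmless.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"example-secret-key"  # hypothetical; store securely in practice

def sign_pickle(obj):
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def verify_and_load(blob):
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids timing side channels
    if not hmac.compare_digest(tag, expected):
        raise ValueError("Signature mismatch - data was tampered with")
    return pickle.loads(payload)

blob = sign_pickle({"name": "John"})
print(verify_and_load(blob))
# {'name': 'John'}
```

This only helps when the key never reaches the attacker; if untrusted parties can sign data, unpickling remains remote code execution.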
yaml.load()
Reviewers must verify the use of yaml.safe_load() or an explicit SafeLoader to prevent the instantiation of arbitrary Python objects.
Examples
import yaml
# Malicious YAML that executes commands
malicious_yaml = """
!!python/object/apply:os.system
args: ['whoami']
"""
# Dangerous - UnsafeLoader instantiates arbitrary Python objects
yaml.load(malicious_yaml, Loader=yaml.UnsafeLoader)
# FullLoader rejects this tag in current PyYAML, but had known
# bypasses before 5.4 (CVE-2020-14343), so do not rely on it
yaml.load(malicious_yaml, Loader=yaml.FullLoader)
The !!python/object/apply tag tells PyYAML to instantiate a Python object and call it. Combined with os.system, this executes arbitrary shell commands.
The safe way
Always use yaml.safe_load() or explicit SafeLoader.
import yaml
data = """
name: John
age: 30
items: [1, 2, 3]
"""
# Safe - only basic Python objects (dicts, lists, strings, numbers, etc.)
parsed = yaml.safe_load(data)
# Or explicitly specify SafeLoader
parsed = yaml.load(data, Loader=yaml.SafeLoader)
yaml.safe_load() only constructs simple Python objects (mappings, sequences, strings, numbers, booleans, None). It explicitly forbids arbitrary Python object instantiation.
Path Manipulation Pitfalls
os.path.join() and pathlib.Path.joinpath()
A common manual review finding is absolute path truncation, where the function discards all preceding segments if a later part of the path starts with a forward slash (/).
Examples
import os
from pathlib import Path
# User provides path that starts with /
user_input = "/etc/passwd"
base_dir = "/var/www/uploads"
# Dangerous - absolute path bypasses base directory
result = os.path.join(base_dir, user_input)
print(result) # /etc/passwd - base directory is discarded!
# Same issue with pathlib
result = Path(base_dir) / user_input
print(result) # /etc/passwd
When a path component begins with /, os.path.join() treats it as an absolute path and discards all preceding components. This enables path traversal attacks.
The safe way
Validate and sanitize paths, then resolve to ensure they stay within allowed directories.
from pathlib import Path

def safe_join(base_dir, user_input):
    base_path = Path(base_dir).resolve()
    requested_path = (base_path / user_input).resolve()
    # Ensure the resolved path is within base_dir
    if not requested_path.is_relative_to(base_path):
        raise ValueError("Path traversal detected")
    return requested_path
# Usage
base_dir = "/var/www/uploads"
user_input = "/etc/passwd"
# Safe - raises ValueError for traversal attempt
safe_join(base_dir, user_input) # ValueError: Path traversal detected
# Safe - normal filenames work correctly
safe_join(base_dir, "documents/report.txt")
# Returns: /var/www/uploads/documents/report.txt
Using .resolve() canonicalizes the path, then checking is_relative_to() ensures the final path stays within the allowed directory.
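Note that Path.is_relative_to() was only added in Python 3.9. On older interpreters, an equivalent check can be sketched with os.path.commonpath(); the safe_join_compat() helper below is a hypothetical counterpart to safe_join() above.

```python
import os

def safe_join_compat(base_dir, user_input):
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_input))
    # commonpath() returns the longest shared ancestor path;
    # it must equal the base directory itself.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("Path traversal detected")
    return candidate
```

realpath() also resolves symlinks, so a symlink inside the upload directory pointing outside it is caught as well.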
urllib.parse.urljoin(): This function may discard the entire base URL if an absolute URL is provided as input, which can be abused for Open Redirects or SSRF.
Examples
from urllib.parse import urljoin
base_url = "https://example.com/path/"
user_input = "https://evil.com/phishing"
# Dangerous - absolute URL completely replaces base
result = urljoin(base_url, user_input)
print(result) # https://evil.com/phishing
# Also dangerous - protocol-relative URL
user_input = "//evil.com/phishing"
result = urljoin(base_url, user_input)
print(result) # https://evil.com/phishing
urljoin() treats absolute URLs (starting with a scheme or //) as complete replacements, ignoring the base URL entirely. This enables open redirect attacks.
The safe way
Validate that the resolved URL stays within the expected domain, or use explicit allowlists.
from urllib.parse import urljoin, urlparse
def safe_redirect(base_url, user_input):
    # Reject absolute URLs and protocol-relative URLs
    if user_input.startswith(("http://", "https://", "//")):
        raise ValueError("Absolute URLs not allowed")
    resolved = urljoin(base_url, user_input)
    parsed = urlparse(resolved)
    # Verify the resolved URL stays on the same domain
    base_domain = urlparse(base_url).netloc
    if parsed.netloc != base_domain:
        raise ValueError("Redirect to external domain forbidden")
    return resolved
# Usage
base_url = "https://example.com/path/"
# Safe - relative path works
safe_redirect(base_url, "dashboard")
# Returns: https://example.com/dashboard
# Safe - blocked: absolute URL
safe_redirect(base_url, "https://evil.com")
# Raises: ValueError: Absolute URLs not allowed
# Safe - blocked: external domain
safe_redirect(base_url, "//evil.com/phishing")
# Raises: ValueError: Absolute URLs not allowed (caught early)
Rejecting absolute URLs at input time prevents the issue entirely. Additionally validating the final domain ensures no open redirect to external sites.
Sources
- https://www.aikido.dev/blog/python-security-vulnerabilities
- https://docs.python.org/3/library/functions.html#eval
- https://docs.python.org/3/library/functions.html#compile
- https://semgrep.dev/docs/learn/vulnerabilities/insecure-deserialization/python
