Post

Security Code Review Basics in Python

Security Code Review Basics in Python

Understanding how this functions works in depth

Manual security code reviews in Python are essential for identifying business logic flaws and complex trust boundaries that automated tools frequently overlook.

A primary focus of this article involves function analysis, specifically examining how user-controlled input flows into “sinks”—sensitive functions that can trigger vulnerabilities if the data is not properly sanitized.

Analysis of High-Risk Python Functions

A critical component of manual review is identifying specific built-in and external library functions with unexpected or dangerous behaviors.

Dynamic Evaluation

The functions eval() and exec() are significant liabilities because they allow strings to be executed as code.

Examples eval() https://docs.python.org/3/library/functions.html#eval

This function executes arbitrary code. Calling it with untrusted user-supplied input will lead to security vulnerabilities.

1
2
3
4
5
6
eval('4+1')
# 5

# Injecting malicious code.
eval('__import__("os").system("uname -a")')
# Linux legion 6.11.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 26 14:16:59 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

The safe way

Manual analysis should confirm these are replaced with safer alternatives like ast.literal_eval() for data structures or dispatch tables for function calls.

1
2
3
4
5
6
# In this case when trying to execute arbitrary commands the system raises an exception.

ast.literal_eval('__import__("os").system("uname -a")')
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/usr/lib/python3.12/ast.py", line 112, in literal_eval

ast.literal_eval() safely evaluates only literals (strings, numbers, tuples, lists, dicts, booleans, None). It converts a JSON-like string into a dictionary and prevents arbitrary code execution by raising a ValueError for unsafe expressions.

OS Command Execution

Functions such as os.system(), os.popen(), and subprocess.run(..., shell=True) are prone to command injection if user input is directly concatenated.

Examples

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Vulnerable: User input directly concatenated to command string
import os
import subprocess

username = "John; rm -rf /"

# Dangerous - command injection via os.system
os.system(f"echo Hello {username}")

# Dangerous - command injection via os.popen
os.popen(f"cat /etc/passwd").read()

# Dangerous - subprocess with shell=True
subprocess.run(f"ping -c 1 {username}", shell=True)

Using shell=True passes the command string to a system shell, allowing command chaining with ;, |, &&, etc., or arbitrary command substitution with backticks or $().

The safe way

Reviewers must ensure arguments are passed as a list with shell=False to avoid invoking a system shell.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import subprocess

username = "John; uname -a"

# Safe - using list and shell=False
subprocess.run(["echo", f"Hello {username}"], shell=False)

# Safe - using shell=False without injection risk
subprocess.run(["ping", "-c", "1", "example.com"], shell=False)

# For commands that need shell features, validate input strictly
allowed_chars = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-")
if not all(c in allowed_chars for c in username):
    raise ValueError("Invalid characters in username")
subprocess.run(["echo", f"Hello {username}"], shell=False)

Passing arguments as a list with shell=False prevents shell interpretation of special characters. The command and its arguments are passed directly to execve() without shell processing.

Insecure Deserialization

pickle.loads() This module is inherently insecure and can execute arbitrary code during the unpickling process via methods like __reduce__.

Examples

1
2
3
4
5
6
7
8
9
10
11
12
import pickle
import os

# Attacker-controlled pickle data with malicious payload
class Malicious:
    def __reduce__(self):
        return (os.system, ("whoami",))

malicious_data = pickle.dumps(Malicious())

# Dangerous - deserializing untrusted data
pickle.loads(malicious_data)  # Executes: whoami

During unpickling, Python calls __reduce__ to reconstruct objects. An attacker can specify arbitrary functions and arguments, leading to remote code execution.

The safe way

Use safer alternatives like json for data interchange, or implement strict validation if pickle is required.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import pickle
import json

# Safe - use JSON for untrusted data
data = '{"name": "John", "age": 30}'
parsed = json.loads(data)  # No code execution possible

# If pickle is required, restrict to trusted sources only
# Or use a signed/encrypted wrapper with validation
class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Whitelist only allowed classes
        allowed = {"datetime": ["date", "timedelta"], "collections": ["deque"]}
        if module in allowed and name in allowed[module]:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden: {module}.{name}")

# Safe unpickling with restricted class loading
with open("data.pkl", "rb") as f:
    data = RestrictedUnpickler(f).load()

JSON only parses primitive data types and cannot instantiate arbitrary objects. If pickle is unavoidable, subclass Unpickler to whitelist allowed classes.

yaml.load() Reviewers must verify the use of yaml.safe_load() or an explicit SafeLoader to prevent the instantiation of arbitrary Python objects.

Examples

1
2
3
4
5
6
7
8
9
10
11
import yaml

# Malicious YAML that executes commands
malicious_yaml = """
!!python/object/apply:os.system
args: ['whoami']
"""

# Dangerous - loads arbitrary Python objects
yaml.load(malicious_yaml, Loader=yaml.FullLoader)
yaml.load(malicious_yaml, Loader=yaml.UnsafeLoader)

The !!python/object/apply tag tells PyYAML to instantiate a Python object and call it. Combined with os.system, this executes arbitrary shell commands.

The safe way

Always use yaml.safe_load() or explicit SafeLoader.

1
2
3
4
5
6
7
8
9
10
11
12
13
import yaml

data = """
name: John
age: 30
items: [1, 2, 3]
"""

# Safe - only basic Python objects (dicts, lists, strings, numbers, etc.)
parsed = yaml.safe_load(data)

# Or explicitly specify SafeLoader
parsed = yaml.load(data, Loader=yaml.SafeLoader)

yaml.safe_load() only constructs simple Python objects (mappings, sequences, strings, numbers, booleans, None). It explicitly forbids arbitrary Python object instantiation.

Path Manipulation Pitfalls

os.path.join() and pathlib.joinpath() A common manual review finding is absolute path truncation, where the function discards all preceding segments if a later part of the path starts with a forward slash (/).

Examples

1
2
3
4
5
6
7
8
9
10
11
12
13
14
import os
from pathlib import Path

# User provides path that starts with /
user_input = "/etc/passwd"
base_dir = "/var/www/uploads"

# Dangerous - absolute path bypasses base directory
result = os.path.join(base_dir, user_input)
print(result)  # /etc/passwd - base directory is discarded!

# Same issue with pathlib
result = Path(base_dir) / user_input
print(result)  # /etc/passwd

When a path component begins with /, os.path.join() treats it as an absolute path and discards all preceding components. This enables path traversal attacks.

The safe way

Validate and sanitize paths, then resolve to ensure they stay within allowed directories.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import os
from pathlib import Path

def safe_join(base_dir, user_input):
    base_path = Path(base_dir).resolve()
    requested_path = (base_path / user_input).resolve()
    
    # Ensure the resolved path is within base_dir
    if not requested_path.is_relative_to(base_path):
        raise ValueError("Path traversal detected")
    return requested_path

# Usage
base_dir = "/var/www/uploads"
user_input = "/etc/passwd"

# Safe - raises ValueError for traversal attempt
safe_join(base_dir, user_input)  # ValueError: Path traversal detected

# Safe - normal filenames work correctly
safe_join(base_dir, "documents/report.txt")
# Returns: /var/www/uploads/documents/report.txt

Using .resolve() canonicalizes the path, then checking is_relative_to() ensures the final path stays within the allowed directory.

urllib.parse.urljoin(): This function may discard the entire base URL if an absolute URL is provided as input, which can be abused for Open Redirects or SSRF.

Examples

1
2
3
4
5
6
7
8
9
10
11
12
13
from urllib.parse import urljoin

base_url = "https://example.com/path/"
user_input = "https://evil.com/phishing"

# Dangerous - absolute URL completely replaces base
result = urljoin(base_url, user_input)
print(result)  # https://evil.com/phishing

# Also dangerous - protocol-relative URL
user_input = "//evil.com/phishing"
result = urljoin(base_url, user_input)
print(result)  # https://evil.com/phishing

urljoin() treats absolute URLs (starting with scheme or //) as complete replacements, ignoring the base URL entirely. This enables open redirect attacks.

The safe way

Validate that the resolved URL stays within the expected domain, or use explicit allowlists.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
from urllib.parse import urljoin, urlparse

def safe_redirect(base_url, user_input):
    # Reject absolute URLs and protocol-relative URLs
    if user_input.startswith(("http://", "https://", "//")):
        raise ValueError("Absolute URLs not allowed")
    
    resolved = urljoin(base_url, user_input)
    parsed = urlparse(resolved)
    
    # Verify the resolved URL stays on the same domain
    base_domain = urlparse(base_url).netloc
    if parsed.netloc != base_domain:
        raise ValueError("Redirect to external domain forbidden")
    
    return resolved

# Usage
base_url = "https://example.com/path/"

# Safe - relative path works
safe_redirect(base_url, "dashboard")  
# Returns: https://example.com/dashboard

# Safe - blocked: absolute URL
safe_redirect(base_url, "https://evil.com")  
# Raises: ValueError: Absolute URLs not allowed

# Safe - blocked: external domain
safe_redirect(base_url, "//evil.com/phishing")
# Raises: ValueError: Absolute URLs not allowed (caught early)

Rejecting absolute URLs at input time prevents the issue entirely. Additionally validating the final domain ensures no open redirect to external sites.


Sources

  • https://www.aikido.dev/blog/python-security-vulnerabilities
  • https://docs.python.org/3/library/functions.html#eval
  • https://docs.python.org/3/library/functions.html#compile
  • https://semgrep.dev/docs/learn/vulnerabilities/insecure-deserialization/python
This post is licensed under CC BY 4.0 by the author.