Detailed Explanation of Path Traversal Vulnerability in Web Applications
I. Vulnerability Description
Path traversal, also known as directory traversal, is a security vulnerability in which an attacker accesses files or directories outside the directory an application intends to expose by manipulating input parameters (typically parameters that carry file paths or filenames). By injecting sequences such as ../ (or ..\ on Windows, as well as encoded variants), attackers escape the base directory the application meant to confine them to. Depending on what the application does with the resulting path, they can then read, write, or even execute arbitrary files on the target server's file system. For example, an attacker might submit a path like ../../../../etc/passwd to read a sensitive system file.
II. Vulnerability Principle and Causes
- Root Cause: The application fails to adequately validate, filter, or normalize user-supplied file path parameters. For instance, a web application providing file download functionality might use a URL like https://example.com/download?file=report.pdf, and the backend code might directly concatenate the user-supplied file parameter onto a base directory to build the full filesystem path, such as /var/www/app/uploads/report.pdf.
- Attack Vector: An attacker can change the file parameter value to ../../../../etc/passwd. If the backend naively concatenates it into /var/www/app/uploads/../../../../etc/passwd, operating system path resolution climbs four levels from uploads to the root directory and the path becomes /etc/passwd, giving access to the system account file.
- Input Points: This vulnerability is not limited to query parameters (GET). It is also common in POST request parameters, cookies, HTTP headers (e.g., X-Forwarded-For or User-Agent, if the application uses their values to build a file path in some context), and filenames supplied during file uploads.
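The effect of naive concatenation can be reproduced without touching a real file system. The following Python sketch is only an illustration and assumes POSIX-style paths: posixpath.normpath performs the same lexical collapsing of . and .. that path resolution applies (it does not follow symbolic links), and naive_join and resolved are hypothetical helper names. It also shows that the number of ../ sequences has to match the depth of the base directory: from /var/www/app/uploads, three levels only reach /var, and a fourth is needed to reach /.

```python
import posixpath

# Base directory from the example URL above; POSIX-style paths are assumed.
BASE_DIR = "/var/www/app/uploads/"


# Hypothetical helper for illustration: the vulnerable pattern.
def naive_join(user_input: str) -> str:
    """Plain string concatenation with no validation at all."""
    return BASE_DIR + user_input


# Hypothetical helper for illustration: what the concatenated path points to.
def resolved(user_input: str) -> str:
    """Collapse '.' and '..' lexically, as path resolution would (symlinks ignored)."""
    return posixpath.normpath(naive_join(user_input))


if __name__ == "__main__":
    for payload in ("report.pdf", "../../../etc/passwd", "../../../../etc/passwd"):
        print(f"{payload} -> {resolved(payload)}")
```

Run as a script, it prints /var/www/app/uploads/report.pdf, /var/etc/passwd, and /etc/passwd for the three payloads respectively.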
III. Vulnerability Exploitation Steps (Progressive)
Assume a simple file download service exists, with backend code written in PHP:
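```php
<?php
// Deliberately vulnerable: the user-supplied "file" parameter is appended to the base directory unchecked.
$file = $_GET['file'];
$filepath = '/var/www/html/uploads/' . $file;
readfile($filepath);
```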
Step 1: Basic Probing
The attacker first attempts to access a normal file to confirm functionality: https://example.com/download.php?file=user_guide.pdf
The server returns the content of the file /var/www/html/uploads/user_guide.pdf.
Step 2: Attempting Path Traversal
The attacker attempts to escape the uploads directory to access a file in the parent directory: https://example.com/download.php?file=../config.php
The backend constructs the path /var/www/html/uploads/../config.php, which resolves to /var/www/html/config.php. If this file exists and is readable, its source code (potentially containing database credentials) is leaked.
Step 3: Multi-level Traversal to Access System Files
To access files outside the web root directory, the attacker needs more ../. For example, attempting to access the Linux system password file: https://example.com/download.php?file=../../../../etc/passwd
The constructed path is /var/www/html/uploads/../../../../etc/passwd. It resolves as follows (assuming the web root directory is /var/www/html):
- /var/www/html/uploads/../../ -> the first .. goes up from uploads to html, the second from html to www, resulting in /var/www.
- The remaining ../../etc/passwd -> two more levels up from /var/www reach the root directory /, then the path descends into etc, finally resolving to /etc/passwd.
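The resolution walk above follows a simple stack rule: an ordinary component is pushed, .. pops the last one, and .. at the root has nothing to pop, so it stays at /. The sketch below traces that rule for the Step 3 payload. It is a simplified, purely lexical model that ignores symbolic links, and trace_resolution is a hypothetical name rather than a library function.

```python
from __future__ import annotations


def trace_resolution(path: str) -> list[tuple[str, str]]:
    """Hypothetical helper: resolve an absolute POSIX path lexically,
    recording the current location after each component.

    Simplified model: symbolic links are not followed.
    """
    stack: list[str] = []
    steps: list[tuple[str, str]] = []
    for part in path.split("/"):
        if part in ("", "."):
            continue  # '' comes from leading or doubled slashes; '.' changes nothing
        if part == "..":
            if stack:
                stack.pop()  # go up one directory level
            # at the root there is nothing to pop, so '/..' is still '/'
        else:
            stack.append(part)  # descend into the named directory (or file)
        steps.append((part, "/" + "/".join(stack)))
    return steps


if __name__ == "__main__":
    for part, current in trace_resolution("/var/www/html/uploads/../../../../etc/passwd"):
        print(f"{part:8} -> {current}")
```

The four .. entries move the location from /var/www/html/uploads to /var/www/html, /var/www, /var, and finally /, and the last line shows /etc/passwd.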
Step 4: Bypassing Potential Filtering Mechanisms
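- Filtering ../: If the program simply strips ../ from the input, attackers can use encoded or nested sequences. ..%2f encodes only the slash of ../, while %2e%2e%2f encodes every character; if the filter runs before URL decoding, neither form contains a literal ../, so both pass the check and turn back into ../ once the input is decoded. Double encoding (%252e%252e%252f) works the same way against applications that filter after one decoding pass but decode a second time later. A nested sequence such as ....// becomes ../ after a single pass that removes ../.
- Absolute Paths: If the application uses the parameter as the whole path instead of appending it to a base directory, or joins paths with a function that discards the base when given an absolute path, submitting an absolute path such as file=/etc/passwd may also succeed.
- Null Byte Injection (older PHP versions and similar runtimes): If the program appends a fixed suffix and then passes the string to a function like include, an attacker might append a URL-encoded null byte (%00) to truncate that suffix. For example, with file=../../../etc/passwd%00 and code such as include($user_input . '.php');, the underlying C-level file API treats the null byte as the end of the string, so the appended .php is ignored and ../../../etc/passwd is opened instead. PHP 5.3.4 and later treat file paths containing null bytes as invalid.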
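The filter mistakes described above are easy to reproduce. The Python sketch below assumes a hypothetical filter (naive_strip) that removes ../ in a single pass and a hypothetical pipeline (filter_then_decode) that runs that filter before URL decoding; urllib.parse.unquote stands in for the decoding step. In PHP specifically, values in $_GET have already been URL-decoded once by the runtime, so this ordering bug usually appears where code or an intermediary decodes the value again.

```python
from urllib.parse import unquote


# Hypothetical flawed filter, for illustration only.
def naive_strip(value: str) -> str:
    """Deletes '../' in a single, non-recursive pass."""
    return value.replace("../", "")


# Hypothetical flawed pipeline, for illustration only.
def filter_then_decode(raw: str) -> str:
    """Filters the raw input first and URL-decodes it afterwards."""
    return unquote(naive_strip(raw))


if __name__ == "__main__":
    # Nested sequence: each '....//' collapses to '../' once the inner '../' is deleted.
    print(naive_strip("....//....//....//....//etc/passwd"))
    # Percent-encoded sequence: the filter sees no literal '../', decoding restores it.
    print(filter_then_decode("%2e%2e%2f%2e%2e%2fetc%2fpasswd"))
    # Double encoding: one decode yields '%2e%2e%2f', a second decode yields '../'.
    print(unquote("%252e%252e%252f"), "->", unquote(unquote("%252e%252e%252f")))
```

Run as a script, it prints ../../../../etc/passwd, ../../etc/passwd, and %2e%2e%2f -> ../, so every payload ends up as a traversal sequence despite the filter.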
Step 5: Writing Files (if write operations exist)
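If the application also allows file uploads or modifications and has a path traversal flaw, attackers might attempt to write files to unintended locations, such as writing a web shell into the web directory: file=../../../../var/www/html/shell.php. Because ../ at the root directory has no further effect, enough ../ sequences to reach / followed by the absolute target work without knowing the exact depth of the base directory, although the target location itself still needs probing.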
IV. Vulnerability Defense Measures
- Input Whitelist Validation: The most effective method. Define a whitelist of allowed filenames or identifiers (e.g., only guide.pdf, intro.txt), and allow access only to these predefined files. Avoid using user input directly as part of the filesystem path.
- Path Normalization and Validation: If a whitelist cannot be used, user input must be processed:
  - Normalize Path: Use functions provided by the programming language (e.g., os.path.normpath in Python, Path.normalize in Java) to resolve . and .. in the path, then check that the normalized path remains within the intended base directory. Note that normpath works purely on the string and does not follow symbolic links; os.path.realpath in Python (or Path.toRealPath in Java) also resolves them.
  - Absolute Path Check: Concatenate the user input with a predefined safe base directory (e.g., /var/www/app/uploads/), normalize the result, then check that the full path starts with this base directory. Example: if (!fullPath.startsWith('/var/www/app/uploads/')) { throw error; }. The check must run on the normalized path, because the raw string /var/www/app/uploads/../../../../etc/passwd also starts with the base prefix.
- Use File IDs or Indexes: Do not pass filenames directly; instead, pass a database ID. The backend queries the database with the ID to retrieve the corresponding safe storage path.
- Remove Special Characters: The validation logic should reject inputs containing path traversal sequences (e.g., ../, or ..\ on Windows) or null bytes. Reject such input outright rather than stripping the sequences, since stripping can be bypassed with nested forms like ....//. Note: validation must happen after URL decoding (including any further decoding the application performs), taking the various encodings and variants into account.
- Principle of Least Privilege: The operating system user running the web server (e.g., www-data, nobody) should have only the minimum read permissions needed for application files and required directories. It should never be able to read critical system files (e.g., /etc/shadow), and it should not have write permissions outside the web root directory.
- Secure Framework Functions: Use secure APIs. For example, in Java use the Path interface with its resolve, normalize, and startsWith methods; in Python use os.path.join(base, filename) combined with a containment check. Note that os.path.join discards base and returns filename when filename is an absolute path, so the check after joining is essential.
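Several of the measures above can be combined into a small helper. The Python sketch below is one possible approach under stated assumptions, not a complete library: it assumes a POSIX server, and BASE_DIR, ALLOWED_FILES, resolve_by_name, and safe_resolve are hypothetical names. The whitelist lookup is preferred; the fallback rejects null bytes, joins the input to the base directory, resolves it with os.path.realpath (which, unlike normpath, also follows symbolic links), and accepts the result only if os.path.commonpath shows it is still inside the base directory.

```python
import os

# Hypothetical storage directory, resolved once so symlinks in the base itself are handled.
BASE_DIR = os.path.realpath("/var/www/app/uploads")

# Option 1 (hypothetical whitelist): user input only selects a key,
# it never becomes part of the path.
ALLOWED_FILES = {
    "guide": "guide.pdf",
    "intro": "intro.txt",
}


def resolve_by_name(key: str) -> str:
    """Hypothetical helper: map a whitelisted identifier to a fixed file path."""
    try:
        return os.path.join(BASE_DIR, ALLOWED_FILES[key])
    except KeyError:
        raise PermissionError(f"unknown file identifier: {key!r}") from None


def safe_resolve(user_input: str, base_dir: str = BASE_DIR) -> str:
    """Hypothetical helper (option 2): resolve user input and confirm it stays inside base_dir.

    base_dir must already be an absolute, realpath-resolved directory.
    """
    if "\x00" in user_input:
        raise PermissionError("null byte in path")
    # os.path.join discards base_dir if user_input is absolute; the containment
    # check below still catches that case because the result lies outside base_dir.
    candidate = os.path.realpath(os.path.join(base_dir, user_input))
    # Compare whole path components, not a raw string prefix.
    if os.path.commonpath([base_dir, candidate]) != base_dir:
        raise PermissionError(f"path escapes base directory: {user_input!r}")
    return candidate
```

Comparing whole path components with commonpath, rather than testing a string prefix, also rejects a sibling directory such as /var/www/app/uploads_old, which a prefix check against /var/www/app/uploads (without the trailing slash) would accept.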
Summary: The core of path traversal vulnerabilities lies in excessive trust in user-supplied path parameters. The key to defense is never trusting user input and ensuring that the final accessed file path is strictly confined within the application's predefined safe directory scope. By combining whitelists, path normalization, base directory verification, and the principle of least privilege, such attacks can be effectively defended against.