What is an overflow?
In computer programming, an overflow error happens when a calculation generates a result that exceeds the maximum value the computer can store in the designated memory location. This occurs when an arithmetic operation tries to create a number too large for representation within the available space or when the operation's result exceeds the capacity of the variable designated to hold it.
Overflow errors may cause unforeseen behavior in a program and can be difficult to debug because they may not always be immediately apparent. Programmers should consider the possibility of overflow errors while designing and coding a program.
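One way to guard against arithmetic overflow is to test the operands before performing the operation. The sketch below is a minimal illustration for signed int addition; the helper name checked_add is hypothetical and not taken from any library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b and stores the sum in *out.
 * Returns false, leaving *out untouched, if the sum does not fit in an int. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* sum would fall below INT_MIN */
        return false;
    }
    *out = a + b;
    return true;
}
```

The check happens before the addition because signed overflow in C is undefined behavior, so testing the result afterwards is not reliable.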
A related problem arises when a program tries to write more data into a buffer than the buffer can hold. The excess data overwrites adjacent memory locations, and the program may behave unpredictably. This is called a buffer overflow.
Before diving deep into overflow vulnerabilities, let's understand the basics.
Memory Layout For A Process
Memory refers to the integrated circuits that store information for immediate use in a computer. Memory holds bit patterns, and those patterns can represent instructions as well as data. A process's memory is divided into segments according to the kind of content they hold (code or data), and the system stores these segments separately.
In a multitasking OS, each process runs in its own virtual address space. The smallest unit the processor can address is 1 byte (8 bits). Addresses are 8 bytes wide on a 64-bit system, 4 bytes on a 32-bit system, and 2 bytes on a 16-bit system.
During program execution, memory is split into two regions: Kernel Space and User Space. A program runs in user space and crosses into kernel space in a controlled way whenever it needs an operating system service, then continues processing.
Kernel Space
User processes can access kernel space only through system calls. In a Unix-like environment, system calls are requests to the operating system for services such as input/output (I/O) or process creation.
User Space
In user space, the executing program has direct access to the memory allocated to it. This memory is divided into several segments.
The virtual address space contains both kernel space and user space. User space is divided into five segments: Stack, Heap, BSS, Data, and Text.
[Figure: Memory layout for a process]
Stack
This memory area sits just below the OS kernel space and grows downward toward lower addresses. On some architectures, it grows in the opposite direction.
The stack is a last-in-first-out (LIFO) data structure. Stacks are abstract data types that serve as collections of elements with two primary operations:
- Push, which adds a new element to the collection
- Pop, which removes the most recently added element that has not yet been removed
A stack frame is the block of data pushed onto the stack for a function call. When a function is called, its frame is pushed onto the stack; when the function completes, its result is returned and the frame is popped off the stack. A stack frame contains the following data:
- Parameter Values (Arguments) passed to the routine
- Return address to the routine caller
- Space for the local variables of the routine
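The push and pop operations described above can be sketched with a fixed-size array. This is only an illustration of the LIFO idea, not how the hardware call stack is implemented; the names IntStack, stack_push, and stack_pop are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 16

typedef struct {
    int items[STACK_CAPACITY];
    size_t count;                 /* number of elements currently stored */
} IntStack;

/* Adds value on top of the stack; refuses when full instead of writing past the array. */
bool stack_push(IntStack *s, int value)
{
    if (s->count == STACK_CAPACITY) {
        return false;
    }
    s->items[s->count++] = value;
    return true;
}

/* Removes the most recently pushed element and stores it in *out. */
bool stack_pop(IntStack *s, int *out)
{
    if (s->count == 0) {
        return false;
    }
    *out = s->items[--s->count];
    return true;
}
```

Note that stack_push checks the capacity before writing. Omitting that check is exactly the kind of mistake that leads to a buffer overflow.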
Heap
The heap is the segment from which memory is allocated dynamically. In most cases, it begins at the end of the BSS segment and grows upward toward higher addresses. In C, it is managed with malloc and free (in C++, with new and delete), and the allocator traditionally uses the brk and sbrk system calls to adjust its size.
Heap space is typically used in the following cases:
- The required size is known only at runtime
- The data is not limited to one scope, for example variables referenced from multiple places
- The amount of memory needed is large
Freeing objects on the heap once they are no longer needed is essential to prevent memory leaks. Languages that use garbage collection reclaim unreachable heap memory automatically, which avoids most memory leaks.
Fragmentation occurs when free and in-use blocks become interleaved, splitting the free space into small pieces. Repeated allocation and release can leave such gaps scattered across the heap. As a result, performance degrades: the allocator spends more time searching for free space, and the locality of reference of the data suffers.
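A minimal sketch of dynamic allocation in C follows. The helper duplicate_string is hypothetical; it sizes the allocation from the data at runtime and leaves ownership of the block with the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src, or NULL on failure.
 * The caller owns the result and must release it with free(). */
char *duplicate_string(const char *src)
{
    size_t len = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL) {
        return NULL;                /* allocation failed */
    }
    memcpy(copy, src, len);         /* copies exactly len bytes, no more */
    return copy;
}
```

A caller would use it as char *s = duplicate_string("hello"); and later call free(s); forgetting the free is what produces a memory leak.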
BSS
The BSS segment (Block Started by Symbol) holds uninitialized global and static variables. Before the program starts executing, the kernel initializes the data in this segment to zero. For example, a variable declared as static int i; is placed in the BSS segment.
Data
The data segment contains global and static variables that are initialized with an explicit value. It is divided into read-only and read-write areas.
Text
This segment holds the program's machine language instructions. It is normally read-only.
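The sketch below shows where typical C declarations usually end up. The exact placement depends on the compiler, linker, and platform, so the comments describe the common case rather than a guarantee; all names are hypothetical.

```c
#include <stdlib.h>

int initialized_global = 42;    /* data segment: has an explicit initial value */
int uninitialized_global;       /* BSS: set to zero before the program starts */
static int call_counter;        /* BSS as well: static storage, no initializer */
const char *const greeting = "hello";   /* the literal usually lives in read-only data */

/* The function's machine code lives in the text segment. */
int count_calls(void)
{
    int step = 1;               /* stack: exists only while the function runs */
    call_counter += step;
    return call_counter;
}

/* Returns a pointer to an int stored on the heap, or NULL on failure. */
int *make_heap_int(int value)
{
    int *p = malloc(sizeof *p); /* heap: lives until free() is called */
    if (p != NULL) {
        *p = value;
    }
    return p;
}
```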
What is a Buffer?
In computer science, a buffer is a region of memory that temporarily holds data while it moves from one place to another. A buffer can hold data during context switching, or hold user input until it is transferred to a dedicated storage area.
Buffer Memory
The buffer acts as a connector between two processes, allowing them to run independently without being blocked by each other.
A buffer can be used in a variety of contexts, such as:
- In input/output operations, data is read from an input device or written to an output device in small chunks that are staged in buffers, rather than all at once. This allows the system to handle the data more efficiently and prevents the input/output device from being overwhelmed.
- In network programming, data is often sent and received in small chunks, called "packets," rather than all at once. These packets are temporarily stored in a buffer while transmitting or receiving.
- In database management systems, a buffer temporarily holds data that is read from or written to the database.
There are different types of buffers, such as input buffers, output buffers, software buffers, and hardware buffers. Each serves a different purpose and can work in a different way.
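As an illustration of a buffer sitting between a producer and a consumer, here is a minimal sketch of a fixed-size ring buffer. It is one common design among many, and the names RingBuffer, ring_put, and ring_get are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

typedef struct {
    unsigned char data[RING_SIZE];
    size_t head;    /* index of the next byte to read */
    size_t count;   /* number of bytes currently stored */
} RingBuffer;

/* Producer side: stores one byte, or returns false if the buffer is full. */
bool ring_put(RingBuffer *rb, unsigned char byte)
{
    if (rb->count == RING_SIZE) {
        return false;
    }
    rb->data[(rb->head + rb->count) % RING_SIZE] = byte;
    rb->count++;
    return true;
}

/* Consumer side: retrieves the oldest byte, or returns false if the buffer is empty. */
bool ring_get(RingBuffer *rb, unsigned char *out)
{
    if (rb->count == 0) {
        return false;
    }
    *out = rb->data[rb->head];
    rb->head = (rb->head + 1) % RING_SIZE;
    rb->count--;
    return true;
}
```

When the buffer is full, ring_put reports failure instead of overwriting unread data, so the producer has to wait or drop input rather than corrupt memory.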
What is a buffer overflow?
A buffer overflow is a software vulnerability that occurs when a program attempts to write more data to a buffer (a memory location used to store data temporarily) than it can hold. The excess data spills past the end of the buffer, corrupting adjacent memory and potentially allowing an attacker to execute malicious code.
There are three main types of buffer overflows: stack-based, heap-based, and global.
- Stack-based buffer overflow: The stack is a section of memory that stores temporary data, such as function call frames and local variables. A stack-based buffer overflow occurs when a program writes more data to a buffer located on the stack than it can hold, causing the buffer to overflow into adjacent memory. This can corrupt data stored on the stack, potentially allowing an attacker to execute malicious code.
- Heap-based buffer overflow: The heap is a section of memory that stores dynamically allocated data, such as objects and arrays. A heap-based buffer overflow occurs when a program writes more data to a buffer located on the heap than it can hold, causing the buffer to overflow into adjacent memory. This can corrupt data stored on the heap, potentially allowing an attacker to execute malicious code.
- Global buffer overflow: A global buffer overflow occurs when a program writes more data than it can hold to a buffer in global or static storage (the Data or BSS segment), overwriting neighboring global variables. Buffer overflows of every kind can occur in any software, but they are especially widespread in C and C++ programs because these languages do not check array and pointer accesses against buffer bounds.
Stack-based buffer overflow
A stack-based buffer overflow is a type of software vulnerability that occurs when a program writes more data to a buffer located on the stack (a section of memory used to store temporary data such as function call frames and local variables) than it can hold. This causes the buffer to overflow into adjacent memory on the stack, corrupting stored data, and potentially allowing an attacker to execute malicious code.
Stack-based buffer overflow Example
What is in this program?
This program takes a single command-line argument, a string, and copies it to a buffer of size 8 using the strcpy function.
The program also includes the print_stack function that displays the contents of the stack at different points in the program.
The vulnerable_function function first copies the input string to the buffer, then prints the buffer's contents and the stack. The main function first prints the contents of the stack, then calls vulnerable_function with the input string.
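A minimal sketch of the vulnerable part of such a program might look like the following. It assumes only what the description states (an 8-byte buffer filled with strcpy); the print_stack routine is omitted, and fits_in_buffer is a hypothetical helper added here to show the missing check.

```c
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 8

/* Deliberately unsafe: strcpy copies until the source's '\0' and
 * never looks at the size of the destination. */
void vulnerable_function(const char *input)
{
    char buffer[BUFFER_SIZE];
    strcpy(buffer, input);      /* overflows when strlen(input) >= BUFFER_SIZE */
    printf("buffer: %s\n", buffer);
}

/* Hypothetical helper, not part of the described program:
 * returns nonzero if input fits in the buffer including its '\0'. */
int fits_in_buffer(const char *input)
{
    return strlen(input) < BUFFER_SIZE;
}
```

In a complete program, main would pass argv[1] to vulnerable_function. Calling fits_in_buffer first and rejecting oversized input would close the hole.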
What happens in execution?
The program runs without issues if you provide an input string shorter than eight characters. However, if the input string is eight characters or longer, a buffer overflow occurs (strcpy also writes a terminating null byte), and the extra bytes overwrite the adjacent memory on the stack, potentially corrupting the return address of vulnerable_function.
For example, if you run the program with the input string "AAAAAAAAAAAAAA" (14 bytes long), you will see that the stack has been overwritten and the return address has changed.
Output:
As you can see, the return address has been overwritten by the characters "AAAA", altering the program's execution flow.
If the attacker can control the value of the overwritten return address, they can redirect the flow of the program to execute their malicious code.
Stack-based buffer overflow Example 2
What is in this program?
In this example, vulnerable_function() is called when a client connects to the server. The function reads data from the client into a buffer of size 10 using the recv() function. However, if the client sends more than ten bytes and recv() is allowed to read them, the extra data overwrites adjacent memory on the stack.
An attacker can exploit this vulnerability by sending a specially crafted input containing the machine code they want to execute. For example, if the attacker knows the buffer's memory address, they can craft an input that contains the machine code, padding up to the saved return address, and then the buffer's address in place of the return address.
What happens in execution?
When the vulnerable_function() is called with the attacker's specially crafted input, the following sequence of events occurs:
- The recv() function reads the attacker's input into the buffer, which is on the stack.
- The input is longer than the size of the buffer, so the extra data overwrites adjacent memory on the stack. This can include essential data such as the return address, saved register values, and local variables.
- The attacker's input contains machine code they want to execute, which is now in the buffer on the stack.
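- The printf() call itself does not run the injected code. The code runs when vulnerable_function() returns and the processor jumps to the overwritten return address, which now points into the buffer.
Note: The attacker redirects the flow of execution by overwriting the function's return address with the buffer's address, so that the function returns into the code sent as input. This is the classic stack-smashing (shellcode injection) technique. A related technique, the return-to-libc attack, instead points the return address at code that already exists in a library.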
The attacker's machine code could be anything, including code that opens a shell, creates a new user, or connects to the attacker's machine to establish a remote shell. It could also be used to execute malicious code to perform various malicious activities, such as exfiltrating data or installing malware.
It is important to stress that this kind of attack can be successful only if the attacker can predict the program's memory layout.
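For contrast, a bounded receive can be sketched as follows, assuming POSIX sockets. The helper name receive_bounded is hypothetical; the key point is that the length passed to recv() is derived from the buffer size rather than chosen independently.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define BUFFER_SIZE 10

/* Hypothetical bounded counterpart to the vulnerable read.
 * Reads at most size - 1 bytes and null-terminates the result.
 * Returns the number of bytes read, or -1 on error. */
ssize_t receive_bounded(int fd, char *buffer, size_t size)
{
    if (size == 0) {
        return -1;
    }
    /* An unsafe call would pass a length larger than the buffer here. */
    ssize_t n = recv(fd, buffer, size - 1, 0);
    if (n < 0) {
        return -1;
    }
    buffer[n] = '\0';   /* safe: n is at most size - 1 */
    return n;
}
```

A caller with char buffer[BUFFER_SIZE]; would invoke receive_bounded(fd, buffer, sizeof buffer). Any bytes beyond the first nine stay in the socket for a later read instead of landing on the stack.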
Heap Based Buffer Overflow
A heap-based buffer overflow occurs when a program writes more data to a buffer on the heap than it can hold. This can happen in several ways:
- When a program allocates a buffer on the heap with an allocation function such as malloc, calloc, or realloc and then writes more data to the buffer than it can hold.
- When a program uses a string manipulation function such as strcpy, strcat, sprintf, or vsprintf, which does not check the size of the destination buffer, and writes more data to the buffer than it can hold.
- When a program corrupts the heap through related memory-management bugs, such as freeing the same block twice (double free) or using a block after it has been freed (use-after-free). These are not overflows in the strict sense, but they damage the same allocator data structures.
Heap Based Buffer Overflow Example
What is in this program?
In this example, the check_password function takes a single command-line argument, a string, and compares it to a hard-coded password, "secretpassword", using the strcmp function. It allocates a buffer of size 16 on the heap using the malloc function and copies the supplied password to the buffer using the strcpy function.
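A minimal sketch of such a function is shown below. It assumes that the string copied into the 16-byte heap buffer is the user-supplied password, since that is the copy that can overflow; the return convention (1 for a match, 0 otherwise) is also an assumption.

```c
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 16

/* Deliberately unsafe sketch: returns 1 if input matches the password, 0 otherwise. */
int check_password(const char *input)
{
    char *buffer = malloc(BUFFER_SIZE);
    if (buffer == NULL) {
        return 0;
    }
    strcpy(buffer, input);      /* overflows the heap block when strlen(input) >= BUFFER_SIZE */
    int ok = strcmp(buffer, "secretpassword") == 0;
    free(buffer);
    return ok;
}
```

With well-behaved input the function works as intended. The danger appears only when the input is 16 characters or longer, because strcpy then writes past the end of the allocated block.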
What happens in an execution?
An attacker can try to overflow the buffer by providing a long password string that contains the correct password followed by additional characters.
During execution, the strcpy function copies all the characters to the buffer and overwrites the adjacent memory on the heap, potentially corrupting heap data. This can lead to problems such as program crashes, data loss, or even remote code execution.
For example, if an attacker provides the password string "secretpasswordAAAAAAAAAAAAAA", the strcpy function copies all 28 characters to the 16-byte buffer, including the correct password.
It then overwrites the adjacent memory on the heap. This may corrupt the allocator's data structures or neighboring data, making the heap unstable, and in the worst case the check_password function may grant the attacker access regardless of the input.
Output:
Heap Based Buffer Overflow Example 2
What is in this program?
In this example, vulnerable_function() takes a string as input and copies it to a buffer of size 10 allocated on the heap using malloc(). However, if the input is ten characters or longer (counting the terminating null byte, more than the buffer can hold), strcpy() writes past the end of the buffer, overwriting adjacent memory. This can corrupt the heap memory manager's bookkeeping and lead to a crash or unstable behavior.
What happens in an execution?
When the vulnerable_function() is called with an attacker's specially crafted input, the following sequence of events occurs:
- The malloc() function allocates a buffer of size 10 on the heap.
- The strcpy() function copies the attacker's input into the buffer.
- The input is longer than the size of the buffer, so the extra data overwrites adjacent memory on the heap, including the heap metadata.
- The free() function is called to release the buffer, but the heap memory manager can no longer interpret its bookkeeping correctly because the metadata has been overwritten.
- The heap memory manager may now attempt to access invalid memory, leading to a crash or unstable behavior.
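A hardened variant of the copy step could check the input length before writing. The sketch below is one possible approach, with the hypothetical name copy_checked; it rejects oversized input instead of truncating it.

```c
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 10

/* Hypothetical hardened variant: returns a heap buffer holding a copy of input,
 * or NULL if the input does not fit or the allocation fails.
 * The caller must free() the result. */
char *copy_checked(const char *input)
{
    size_t len = strlen(input);
    if (len >= BUFFER_SIZE) {
        return NULL;                    /* no room for the text plus its '\0' */
    }
    char *buffer = malloc(BUFFER_SIZE);
    if (buffer == NULL) {
        return NULL;
    }
    memcpy(buffer, input, len + 1);     /* len + 1 <= BUFFER_SIZE, so this stays in bounds */
    return buffer;
}
```

Because the write never leaves the allocated block, the heap metadata stays intact and the later free() behaves normally.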
Global Buffer Overflow
A global buffer overflow occurs when a program writes more data than it can hold to a buffer in global or static storage. The excess data overwrites adjacent memory, typically neighboring global variables, potentially corrupting data or altering the program's intended behavior.
Global Buffer Overflow Example
What is in this program?
In this example, the function overflow_me() uses the fgets() function to read input from the user into a buffer. The fgets() function accepts three parameters: the buffer, the maximum number of characters to read, and the input source. It reads characters from the input source and stores them in the buffer until it has read one fewer than the given maximum, or until it reads a newline character or reaches end of file, whichever comes first, and then appends a terminating null byte.
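The sketch below shows the safe form of such a read, where the size passed to fgets() is taken from the buffer itself. The buffer names and the read_bounded helper are hypothetical, and the size of 64 mentioned in the comment is an assumption based on the figures used in this section.

```c
#include <stdio.h>
#include <string.h>

char input_buffer[32];      /* global buffer that receives the input */
char neighbor_buffer[32];   /* another global that may sit right next to it */

/* An unsafe version would pass a size larger than the buffer,
 * for example fgets(input_buffer, 64, in), and could spill into a neighbor.
 * This version derives the size from the buffer, so at most 31 characters
 * plus the terminating '\0' are stored. */
char *read_bounded(FILE *in)
{
    return fgets(input_buffer, sizeof input_buffer, in);
}
```

A caller would typically pass stdin. Input beyond 31 characters stays in the stream for the next read rather than overwriting the neighboring global.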
What happens in an execution?
If the user provides more than 32 characters, the excess data overflows into the adjacent global buffer, potentially corrupting it. This can happen only when the size passed to fgets() is larger than the buffer itself. For example, if the user enters 64 characters, the first 32 fill the intended buffer and the remaining characters spill into the adjacent global buffer, where an attacker can use them to corrupt program data, crash the program, or potentially execute arbitrary code.
Output:
Prevention Of Buffer Overflow Vulnerabilities
Buffer overflow vulnerabilities can be present in any application or system that uses buffers, so it is crucial to be aware of the potential risks and to take appropriate measures to prevent them. Combining the following techniques can also provide more robust protection against buffer overflow attacks.
- Input validation: This technique checks all input data for size and content before it is written to a buffer. For example, if a program expects a string of no more than 32 characters, input validation checks that the incoming data does not exceed this limit. Input validation can also check for potentially malicious characters or patterns in the input data.
- Boundary checks: This technique checks the size of the buffer before writing data to it. You can achieve this by comparing the input data size to the buffer size or by using a counter that tracks the number of bytes written to the buffer. For example, if a program has a buffer of 32 bytes, a boundary check ensures that the incoming data does not exceed 32 bytes, including any terminating null byte.
- Stack protection: The stack is a region of memory used to store function call information such as function arguments, return addresses, and local variables. Stack protection techniques detect when the stack has been overwritten by a buffer overflow. A common method is the canary value, also known as a stack cookie. A canary is a random value placed on the stack between a function's local buffers and its saved return address, and it is checked before the function returns. If the canary has changed, a buffer overflow has occurred, and the program is usually terminated.
- Address Space Layout Randomization (ASLR): This technique randomizes the memory layout of a program so that an attacker cannot predict the location of a buffer or other sensitive data. ASLR makes it more difficult for an attacker to locate and exploit a buffer overflow vulnerability by making it harder to predict the location of a buffer in memory.
- Data Execution Prevention (DEP): This technique marks certain memory regions as non-executable so that data written to a buffer cannot execute as code. DEP makes it more difficult for an attacker to take control of a program by preventing the execution of code injected through a buffer overflow.
- Use of safer libraries and functions: Some libraries provide string functions that take the size of the destination buffer and never write past it, such as strlcpy and strlcat (which originated in BSD and are not part of standard C). Using them in place of strcpy and strcat reduces the risk of buffer overflows.
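Input validation and boundary checks can be combined in a small copy routine. The following is a minimal sketch using the standard snprintf function; the helper name copy_if_fits is hypothetical, and rejecting oversized input (rather than truncating it) is a design choice made for this example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if the whole string fits.
 * Returns false and leaves dst as an empty string otherwise. */
bool copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return false;
    }
    /* snprintf never writes more than dst_size bytes and returns the
     * length the full output would have had, excluding the '\0'. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size) {
        dst[0] = '\0';      /* discard the truncated copy */
        return false;
    }
    return true;
}
```

Callers pass the real size of the destination, for example copy_if_fits(buf, sizeof buf, user_input), and treat a false return as invalid input.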
Conclusion
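In conclusion, buffer overflow is a critical security vulnerability that can have severe consequences if not correctly handled. Preventing buffer overflow attacks requires careful programming practices, such as using size-bounded string functions like strncpy, snprintf, and strncat, which accept a length limit. Even these need care: strncpy does not null-terminate the destination when the source is too long, and the limit passed to strncat is the number of characters to append, not the size of the buffer.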
To ensure the security of your application, it is essential to be aware of buffer overflow vulnerabilities and take the necessary steps to prevent them. This includes regular code reviews, penetration testing, and keeping your software up to date with the latest security patches.
References
- https://d0nut.medium.com/week-13-introduction-to-buffer-overflows-5f15c0d5b5c1
- https://medium.com/purple-team/buffer-overflow-c36dd9f2be6f
- https://medium.com/nerd-for-tech/buffer-overflow-attacks-b5e62a522e6e
- https://www.youtube.com/watch?v=yJF0YPd8lDw&t=2863s
- https://www.youtube.com/watch?v=1S0aBV-Waeo
- https://vickieli.dev/binary%20exploitation/buffer-overflow/
- https://snyk.io/blog/buffer-overflow-attacks-in-c/
- https://medium.com/techloop/understanding-buffer-overflow-vulnerability-85ac22ec8cd3
- https://blog.pentesteracademy.com/tagged/buffer-overflow