
Until about 2010, buffer overflows were the most widely exploited memory-corruption vulnerability, accounting for almost two thirds of all exploited vulnerabilities, at least in Microsoft products [10]. They are relatively simple to exploit, especially when compared to use-after-free vulnerabilities, which we introduce in Chapter 2.2.2. Conceptually, a buffer overflow is a simple vulnerability: a fixed-size buffer is filled with more data than it can hold, which causes adjacent memory to be overwritten. This chapter covers the underlying problem of buffer overflows and how they are exploited.

Underlying Problem

We identify two major issues. The first issue is that control-flow data is sometimes stored in the same memory area with other, potentially user-controlled data and not protected in any way. The second issue is that C has no automatic bounds checks, which makes buffer overflows possible in the first place.

Storage of Control-Flow Data. Oftentimes, an address that the program later uses to determine control flow is stored in writable memory, i.e., on the stack or the heap. Remember, for example, how function calls work: first, the parameters to the function are set up. Then, the actual call instruction executes and pushes the return address, i.e., the address of the instruction after the call instruction, onto the stack. Program execution then continues at the call address. When the callee has finished, the previously stored return address is used to return to the caller, where program execution continues. This means that while the callee executes, an address that will later be used to control the program flow is stored on the stack together with other data.
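The following sketch makes this observable. It assumes GCC or Clang, whose builtins __builtin_return_address and __builtin_frame_address expose the values that the call instruction and the function prologue leave on the stack. The names frame_info and inspect_frame are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper type: addresses of interest for one function call. */
struct frame_info {
    void *return_address; /* where execution resumes in the caller */
    void *frame_address;  /* frame address of the called function */
    void *local_address;  /* address of an ordinary local variable */
};

/* noinline (GCC/Clang attribute): force a real call so that a return
 * address is actually pushed onto the stack. */
__attribute__((noinline))
struct frame_info inspect_frame(void) {
    volatile char local = 0;
    struct frame_info info;
    info.return_address = __builtin_return_address(0);
    info.frame_address  = __builtin_frame_address(0);
    info.local_address  = (void *)&local;
    return info;
}
```

Printing the three fields with %p typically shows that the local variable and the frame lie close together in the same writable stack region, which is the situation described above.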

No Implicit Bounds Checks. C relies heavily on arrays whenever several elements of the same type need to be stored. A typical example is the data type string, which does not exist in C. Instead, strings are stored in an array of characters, i.e., as a series of characters terminated by the string termination character '\0'. The size of an array is static, and no meta-information, such as its length, is stored with it.⁹ Furthermore, during compilation all information about variables and their respective sizes is lost. This is because variable declarations in C are compiled to a single memory allocation, i.e., moving rsp in the function prologue to allocate space for them. Listing 2.7 shows C code, Listing 2.8 its ASM counterpart. Note how only one large block of memory, which is large enough to hold all variables, is allocated (the long long type is 8 bytes long, char is 1 byte long). Variables are accessed simply through an offset from rbp (or from rsp, if base pointer omission is used), because variables per se do not exist, only memory locations.

Listing 2.7: Variable declaration

int main() {
    long long i, j, k, l;
    char buffer[32];
    ...
}

⁹ During the design of C there were discussions whether a string terminator should be used or if the length of the array should be stored in the first byte. Ultimately, it was decided to use a terminating character, as this more closely resembled C's predecessor, B [134].

Listing 2.8: Space allocation

sub rsp, 40h ; 40h = 64
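The 64 bytes reserved in Listing 2.8 can be checked against the declarations in Listing 2.7 with a small sketch. It assumes, as stated above, that long long is 8 bytes long. The function names are hypothetical. The last function illustrates that a callee receives only a pointer, so the array length is not available to it.

```c
#include <stddef.h>

/* Sum of the sizes of the locals declared in Listing 2.7. */
size_t locals_size(void) {
    return 4 * sizeof(long long) + sizeof(char[32]);
}

/* Inside the declaring scope, the compiler still knows the array size. */
size_t size_in_scope(void) {
    char buffer[32];
    return sizeof buffer;
}

/* A callee only sees a pointer: sizeof yields the pointer size,
 * not the size of the array the pointer refers to. */
size_t size_in_callee(char *buffer) {
    return sizeof buffer;
}
```

Note that sizeof is resolved by the compiler. Nothing in the generated code records the length of buffer, which is why a bounds check cannot happen implicitly at run time.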

Combined, these two issues make it impossible to check implicitly whether accesses to an array are within its bounds. While it is to some extent possible to extract information about variables, e.g., from debugging symbols, forcing bounds checks on every array access has a negative effect on performance, one of the main benefits of C/C++. Therefore, it is the programmers' responsibility to explicitly implement correct bounds checks. If they neglect this, memory accesses may be out of bounds. Listing 2.9 shows an example of this. This simple program creates a buffer and then uses a loop to initialize its elements to the character 'A'. buffer contains 32 elements (line 2), but the loop (line 4) will execute 33 times.¹⁰ In the last iteration the program overwrites data that is not part of buffer; more precisely, it overwrites the byte following buffer. Depending on what data is stored at this location, the program might crash.

Listing 2.9: Out of bounds access in an array

1 int main() {
2     char buffer[32];
3
4     for (int i = 0; i <= 32; i++) { /* off by one: i == 32 is out of bounds */
5         buffer[i] = 'A';
6     }
7 }
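A minimal sketch of the explicit bounds checks that the programmer has to provide might look as follows. The helpers checked_store and fill are hypothetical names, not part of any library. The caller must pass the correct length, since C cannot recover it from the pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Store value at buf[index] only if index lies inside the array. */
bool checked_store(char *buf, size_t len, size_t index, char value) {
    if (index >= len) {
        return false; /* out of bounds: refuse to write */
    }
    buf[index] = value;
    return true;
}

/* Fill all len elements; the loop condition uses <, not <=. */
void fill(char *buf, size_t len, char value) {
    for (size_t i = 0; i < len; i++) {
        buf[i] = value;
    }
}
```

Calling fill(buffer, sizeof buffer, 'A') in the scope that declares buffer writes exactly 32 bytes, and checked_store rejects index 32, the access that goes out of bounds in the listing above.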

Exploiting Buffer Overflows

In Chapter 2.1.3 we reviewed the concept of function calls, which store the return address on the stack. If a buffer overflow affects a buffer located on the stack, it may be possible to overwrite a return address. By carefully crafting an input that triggers the vulnerability, an attacker may therefore overwrite the return address with an arbitrary value, hijacking the program flow. This attack is widely known as stack smashing [3]. Listing 2.10 shows code containing a buffer overflow that an attacker can exploit to hijack the control flow.

Listing 2.10: A buffer overflow vulnerability

1 int vuln() {
2     char buffer[24];
3     long long length = 0;
4     gets(buffer);
5     length = strlen(buffer);
6     return length;
7 }

This function reads user input and returns the input's length. Line 2 declares a variable called buffer, which can store 24 bytes. The actual usable size is 23 bytes, because the string termination character ('\0') requires another byte. Line 3 declares an integer variable called length and initializes it to 0. Line 4 reads user input from the keyboard into the variable buffer using the library function gets. Line 5 computes the length of buffer using the library function strlen and stores it in length. Line 6 returns the value of length to the caller.

In this toy example, line 4 introduces a buffer overflow vulnerability. In this line, user input is read into buffer, a variable with a fixed length of 24. Therefore, if the user enters more than 23 characters (the string terminator is appended automatically after hitting enter), the contents of adjacent memory, in this case the saved base pointer and the return address, are overwritten. Figure 2.4a and Figure 2.4b show the stack frame of function vuln before and after the overflow.¹¹ Note that the program still expects the return address at the same location, so the attacker can control it.

Assume a malicious user tries to actively exploit the vulnerability by entering the following string:

AAAAAAAAAAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD

The stack will then look as shown in Figure 2.5. When function vuln has finished and the function epilogue runs (see Chapter 2.1.3), Step 7 works correctly, but Steps 8 and 9 load user-controlled data into rbp and rip. rip contains 4444444444444444h (the ASCII value of 'D', which is 44h, repeated 8 times), an address where likely no code is mapped, causing an access violation that crashes the program. However, since the user can enter arbitrary data, she can craft a more malicious exploit, shown in Figure 2.6. Here, the user injects shellcode and overwrites the return address with the start address of the buffer, in this case 18FF00h. When the program executes the ret instruction, 18FF00h is loaded into rip, which causes the shellcode to be executed.
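The arithmetic behind the value 4444444444444444h can be reproduced with a short sketch. It only builds the 40-character input from the text and interprets its last eight bytes as a 64-bit value; it does not overflow anything. It assumes an ASCII character set and, as the text describes, that the final eight bytes are the ones that land on the saved return address. The names build_payload and final_qword are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PAYLOAD_LEN 40

/* Write the 40-character input from the text into out.
 * out must provide room for 41 bytes including the terminator. */
void build_payload(char out[PAYLOAD_LEN + 1]) {
    memset(out,      'A', 16);
    memset(out + 16, 'B', 8);
    memset(out + 24, 'C', 8);
    memset(out + 32, 'D', 8);
    out[PAYLOAD_LEN] = '\0';
}

/* Interpret the last 8 bytes of the payload as one 64-bit value. */
uint64_t final_qword(const char *payload) {
    uint64_t value;
    memcpy(&value, payload + PAYLOAD_LEN - 8, sizeof value);
    return value;
}
```

Because all eight bytes are identical, the result is 0x4444444444444444 regardless of byte order. For a real target address such as 18FF00h, the attacker would have to write the bytes in the little-endian order used by x86-64.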

This is just one possible exploit structure. It has the disadvantage that the attacker has only 40 bytes of space available for her shellcode. Another approach would be, for example, to place the shellcode beyond the current stack frame, resulting in a layout as shown in Figure 2.7. If the program allocates space on the heap, it may also be possible to place the shellcode there.

Listing 2.11: Fixed buffer overflow

#include <stdio.h>  /* fgets, stdin */
#include <string.h> /* strlen */

int vuln() {
    long long length = 0;
    char buffer[16];
    fgets(buffer, sizeof(buffer), stdin);
    length = strlen(buffer);
    return length;
}

Lastly, Listing 2.11 shows the code snippet from Listing 2.10 with the buffer overflow bug fixed. Instead of gets, the function uses fgets to read user input. fgets takes two additional parameters: the maximum number of bytes to store, which is set to the size of the buffer using the sizeof operator, and the stream to read from. In this example we use stdin, the standard input stream, which is typically the keyboard. Because fgets stores at most one character fewer than this maximum and always appends the string terminator, the input can no longer overflow buffer.
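As a sketch of how such a bounded read could be wrapped for reuse, the hypothetical helper read_line below takes the stream as a parameter, so it also works on streams other than stdin. It additionally removes the newline character that fgets keeps at the end of the line. This helper is an illustration under these assumptions, not part of the original listings.

```c
#include <stdio.h>
#include <string.h>

/* Read one line of at most size - 1 characters from in into buf.
 * Returns the length of the stored string, or 0 on EOF or error. */
size_t read_line(FILE *in, char *buf, size_t size) {
    if (size == 0) {
        return 0; /* no room, not even for the terminator */
    }
    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0'; /* drop the newline fgets keeps */
    return strlen(buf);
}
```

With a 16-byte buffer, read_line(stdin, buffer, sizeof buffer) stores at most 15 characters plus the terminator, however long the input line is. The rest of the line stays in the stream for the next read.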