The Hypertext Transfer Protocol (HTTP), whose most notable version, HTTP/1.1, was published in 1999 [49], is the result of coordinated work between the IETF and the World Wide Web Consortium (W3C). The standard describes a text-based request-response protocol used between a client and a server to exchange application data on top of TCP [109]. Web browsers, such as Firefox or Internet Explorer, use this protocol to download website content. The browser plays the role of an HTTP client and sends HTTP requests to a website hosted by an HTTP server, which answers with HTTP responses. HTTP is a stateless protocol that only accepts sequences of messages following a request/response pattern initiated by the client.
An HTTP request specifies a method (e.g. GET, HEAD, POST, PUT) indicating the action to be performed on a given resource. Some methods, such as HEAD, GET, OPTIONS and TRACE, are intended only for information retrieval, while others, such as POST, PUT or DELETE, may change the internal state of the server. A request message has a specific format that consists of a “Request-Line” followed by a “Request Header” and an optional message body.
The Request-Line is made of three successive fields separated by a space character. The first field contains the method name, also called the request command. Its value must be one of the following: GET, HEAD, POST, OPTIONS, CONNECT, TRACE, PUT, PATCH or DELETE. The next field denotes the resource to which the command applies and is given as a URI, typically an absolute path such as “/” or a full URL. Finally, the third field of the Request-Line contains the protocol version number.
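The three-field structure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `parse_request_line` is hypothetical, and treating the method list as a closed set is an assumption taken from the enumeration in the text.

```python
# Methods enumerated in the text above; treating them as a closed set is an
# assumption of this sketch.
METHODS = {"GET", "HEAD", "POST", "OPTIONS", "CONNECT",
           "TRACE", "PUT", "PATCH", "DELETE"}


def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a Request-Line into (method, target, version)."""
    # The three fields are separated by single space characters.
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 space-separated fields, got {len(parts)}")
    method, target, version = parts
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if not version.startswith("HTTP/"):
        raise ValueError(f"malformed protocol version: {version!r}")
    return method, target, version


print(parse_request_line("GET / HTTP/1.1\r\n"))
```

Because the delimiter is a plain space, a target that itself contains an unencoded space makes the line ambiguous; the sketch rejects such input rather than guessing.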
The Request Header consists of a set of name-value pairs, each pair denoting a property. Properties are separated by a CRLF character sequence. The HTTP specification describes a property in ABNF, as illustrated in Listing 2.1: a property is the concatenation of a token, the “:” character and a value. Earlier in the specification, a token is defined as a string that contains neither ASCII control characters nor separators. The value is made of any sequence of printable ASCII characters, tokens, separators, space characters or quoted strings. The only property that must be present in an HTTP/1.1 Request Header is the Host property, which denotes a hostname optionally followed by a port number (e.g. “www.w3c.org:80”).
The message body is separated from the header by a blank line. It carries the entity-body associated with the request or the response message. If a message body is present, the Request Header must include a Content-Length or a Transfer-Encoding property. Figure 2.1 illustrates an example of an HTTP request.
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value
                and consisting of either *TEXT or combinations
                of token, separators, and quoted-string>
Listing 2.1 – ABNF definition of the HTTP header properties as described in RFC 2616 [49]
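As an illustration of the grammar in Listing 2.1, the sketch below splits one header line into its field-name and field-value and checks that the name is a token. The helper names `is_token` and `parse_header_field` are hypothetical. The separator set is the one given in RFC 2616, section 2.2. The sketch ignores folded (multi-line) values and does not validate the field-value.

```python
# Separators as listed in RFC 2616, section 2.2 (includes SP and HT).
SEPARATORS = '()<>@,;:\\"/[]?={} \t'


def is_token(s: str) -> bool:
    """True if s is a non-empty run of ASCII chars that are neither
    control characters nor separators."""
    return bool(s) and all(
        0x20 < ord(c) < 0x7F and c not in SEPARATORS for c in s
    )


def parse_header_field(line: str) -> tuple[str, str]:
    """Split 'name: value' on the first ':' and validate the name."""
    name, colon, value = line.rstrip("\r\n").partition(":")
    if not colon:
        raise ValueError("missing ':' delimiter")
    if not is_token(name):
        raise ValueError(f"field-name is not a token: {name!r}")
    # Leading and trailing linear whitespace is not part of the value.
    return name, value.strip(" \t")


print(parse_header_field("Host: www.w3c.org:80\r\n"))
```

Splitting on the first “:” only is what allows a value such as “www.w3c.org:80” to contain the delimiter itself.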
HTTP response messages follow a very similar format. A response starts with a “Status-Line” made of the protocol version, a numeric status code such as “200” and a textual reason phrase such as “OK”. This line is followed by properties stored in a “Response Header”, and an optional message body can follow.
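The Status-Line can be parsed much like the Request-Line, with one difference: the reason phrase may itself contain spaces, so only the first two spaces act as delimiters. The following is a minimal sketch under that assumption; `parse_status_line` is a hypothetical name.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a Status-Line into (version, status_code, reason_phrase)."""
    # Split on the first two spaces only; the reason phrase keeps its spaces.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError("expected at least a version and a status code")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    if not version.startswith("HTTP/"):
        raise ValueError(f"malformed protocol version: {version!r}")
    # The status code is a three-digit number.
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"malformed status code: {code!r}")
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```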
GET / HTTP/1.1
Host: www.amossys.fr
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv: 24.0) Gecko/20131101 Firefox/24.0 Iceweasel/24.1.0
Accept: text/html,application/xhtml+xml
Accept-Language: fr,fr-fr;q=0.8,en-us;
Accept-Encoding: gzip, deflate
Connection: keep-alive
---blank line--- (Empty body)
Figure 2.1 – Sample HTTP GET request with highlighted fields.
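To show how the delimiters and the single size field cooperate, the sketch below splits a raw request into its Request-Line, header properties and body. It is a simplified illustration, not a complete parser: the name `split_request` is hypothetical, Transfer-Encoding and folded header values are deliberately not handled, and the POST request with its “/login” path is invented for the example.

```python
def split_request(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Return (request_line, headers, body) from a raw HTTP request."""
    # The blank line (two consecutive CRLFs) ends the header section.
    head, blank, rest = raw.partition(b"\r\n\r\n")
    if not blank:
        raise ValueError("no blank line terminating the header")
    lines = head.decode("ascii").split("\r\n")
    request_line, header_lines = lines[0], lines[1:]

    headers: dict[str, str] = {}
    for line in header_lines:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        # Field names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()

    if "host" not in headers:
        raise ValueError("the Host property is mandatory")

    # Content-Length is the only size field used here; no body if absent.
    length = int(headers.get("content-length", "0"))
    if len(rest) < length:
        raise ValueError("body shorter than Content-Length")
    return request_line, headers, rest[:length]


# Hypothetical request carrying a 9-byte body.
raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: www.amossys.fr\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"user=test"
)
print(split_request(raw))
```

The header section is located purely by delimiters, whereas the body can only be bounded through the length declared in the header, which is why a body requires a Content-Length or Transfer-Encoding property.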
To summarize, HTTP is a stateless text-based protocol that enables data exchanges on top of TCP. It follows a request-response communication pattern always triggered by the client. An HTTP message consists of a header made of multiple fields, followed by an optional payload that can host data carried by protocols on top of HTTP. Messages can be classified into different types according to their semantics. However, all request messages share the same format, and the type is carried by the value of one of their fields. Similarly, all HTTP response messages follow the same field definition. Regarding its format, HTTP makes heavy use of ASCII delimiters (e.g. “ ”, “:”, “CRLF”) and uses very few size fields, such as the “Content-Length” field. Besides, some of its fields are optional and no specific rule establishes the order in which they are declared. Finally, there are very few relationships between the values of its fields.
Many communication protocols that belong to the highest layers of the OSI model share these characteristics. Indeed, traditional protocols of these layers, such as the application layer, were often designed to be usable and readable by humans, which explains the use of ASCII to encode exchanged data. For example, the SMTP [110, 79], FTP [62] and IRC [102, 74] protocols share similar message formats, with ASCII fields delimited by specific ASCII characters.