The HTTP protocol

Jesús Arias Fisteus

Web Applications (2022/23)

HTTP (Hypertext Transfer Protocol) is a stateless, application-level protocol for distributed, collaborative hypertext information systems.

HTTP is based on exchanging messages on top of the TCP transport protocol:

- Clients send a request message to a server, asking it to perform an action on a specific resource (often, just getting the resource itself).
- Servers send back a response message, which usually includes the contents of the requested resource.
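
The request/response exchange described above can be sketched with Python's standard library. This is a minimal illustration, not a production server: the handler class `HelloHandler`, the helper `fetch_once` and the path `/index.html` are hypothetical names chosen for the example, and both ends run on the local loopback interface.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>Hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path="/index.html"):
    # Port 0 asks the operating system for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)        # client -> server: request message
        response = conn.getresponse()    # server -> client: response message
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
print(status, body)
```

The client opens a TCP connection, sends one request and reads one response, which is the whole exchange at the message level.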

Protocol versions

The most widely used versions nowadays are:

- HTTP/1.1: the most frequently used version since the late 1990s.
- HTTP/2: a newer version deployed over the last few years. It improves performance through binary encoding of messages, header compression, request/response multiplexing on top of a single TCP connection, server push, etc.

(Both versions are compatible in terms of message semantics and structure; the changes mainly affect how messages are encoded and transported over TCP connections.)

Resource identifiers

HTTP resources are identified through uniform resource identifiers (URIs).

A URI is a compact character sequence that identifies a physical or abstract resource. URIs are used by multiple protocols and applications.

A URI that also provides the information needed to locate and access the resource is called a uniform resource locator (URL).

Structure of a URI

- Scheme: the name of a scheme, which defines how identifiers are assigned within its scope. The schemes used for HTTP are http and https.
- Authority: a hierarchical naming authority, typically a DNS domain name or a network address (IPv4 or IPv6) and, optionally, a port number.
- Path: identifies a resource within the scope of the given scheme and authority. It is typically organized hierarchically in segments separated by slashes (“/”).
- Query: non-hierarchical data that, combined with the path, identifies the resource. It is usually presented as one or more name/value pairs.
- Fragment identifier: identifies a secondary resource in the primary resource (for example, a section of a document).

Examples of URIs with query

https://aulaglobal.uc3m.es/course/view.php?id=91019

https://www.google.com/search?q=madrid&tbm=isch

Examples of URIs with fragment identifiers

http://example.com/manual#cap3

http://example.com/manual?lang=es#cap3
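
As a small illustration, the example URIs above can be split into their components with Python's `urllib.parse` module. The variable names are arbitrary; `netloc` is the name that module uses for the authority.

```python
from urllib.parse import parse_qs, urlsplit

parts = urlsplit("http://example.com/manual?lang=es#cap3")
print(parts.scheme)    # http
print(parts.netloc)    # example.com (the authority)
print(parts.path)      # /manual
print(parts.query)     # lang=es
print(parts.fragment)  # cap3

# The query is usually a list of name/value pairs.
search = urlsplit("https://www.google.com/search?q=madrid&tbm=isch")
params = parse_qs(search.query)
print(params)          # {'q': ['madrid'], 'tbm': ['isch']}
```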

Reserved characters

- URIs can contain only US-ASCII letters, digits and a few graphical symbols (“-”, “.”, “_” and “~”), as well as reserved characters used, among other things, to delimit their components (“:”, “/”, “?”, “#”, “[”, “]”, “@”, etc.).
- Other characters, as well as any reserved character used as ordinary data instead of as a delimiter, must be encoded with URL encoding.

URL encoding

- Each character not allowed in a URI is encoded as an octet sequence. Each octet is written as the symbol “%” followed by the octet's value as two hexadecimal digits.
- For example:
  - “path=docs/index.html” is encoded as “path=docs%2Findex.html” (the “/” character is encoded as the octet 2F in ASCII, UTF-8 and most character encoding schemes).
  - “q=evaluación” is encoded as “q=evaluaci%C3%B3n” (the “ó” character is encoded in UTF-8 as the octet sequence C3 B3).
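
The two examples above can be reproduced with `urllib.parse.quote`, which applies URL encoding using UTF-8 by default. This is only a sketch of the values, not of a whole query string; the variable names are arbitrary.

```python
from urllib.parse import quote, unquote

# safe="" makes quote() encode "/" too; by default it leaves "/" untouched.
encoded_path = quote("docs/index.html", safe="")
encoded_word = quote("evaluaci\u00f3n", safe="")  # "\u00f3" is the letter o with acute accent

print(encoded_path)           # docs%2Findex.html
print(encoded_word)           # evaluaci%C3%B3n
print(unquote(encoded_word))  # decodes back to the original word
```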

HTTP methods

Request messages specify a method, which defines the action to perform on the resource.

The main methods used by Web applications are:

- GET: get the resource.
- POST: process the data included in the request message, in a way that depends on the resource.

(Other methods defined by the HTTP standard are HEAD, PUT, DELETE, CONNECT, OPTIONS and TRACE.)

Requests with GET method

GET requests:

- Are used to get the contents of resources (HTML pages, images, etc.).
- Are generated by Web browsers when, among other cases, users type a URL in the address bar or click on a hyperlink, a just-received Web page links to additional resources, or some forms are submitted.
- Are subject to caching to optimize resource utilization.
- Are supposed to be safe, that is, they should not have side effects on the server, the application state, etc.

Requests with POST method

POST requests:

- Are used to perform actions (authenticate a user, add a product to the shopping cart, confirm an order in an online shop, upload a message to a social network, etc.).
- Are generated by Web browsers when some forms are submitted.
- Aren’t normally cached.
- Can be unsafe. Among other potential problems, repeating the request could have side effects (e.g. confirming the same order twice).

Structure of a request message

An HTTP request includes:

- Resource URL (without scheme and authority).
- Method.
- Request headers: additional data about how the request needs to be processed.
- Request body (only for some methods): data to be processed by the server.

Request body

A request body:

- Is not expected in GET requests: HTTP defines no general semantics for it there.
- Carries, in POST requests, the data the server needs to process the request.
- Is usually accompanied by the Content-Type and Content-Length request headers.
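
To make the role of the body and its two companion headers concrete, here is a minimal sketch that serializes a form submission as an HTTP/1.1 POST request. The function `build_post_request`, the path `/cart/add`, the host `shop.example` and the field names are all hypothetical; real clients add more headers.

```python
from urllib.parse import urlencode


def build_post_request(path, host, fields):
    """Serialize a form submission as an HTTP/1.1 request message (bytes)."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # length of the body in bytes
        "\r\n"                              # empty line: end of headers
    ).encode("ascii")
    return head + body


message = build_post_request("/cart/add", "shop.example", {"product": "42", "qty": "2"})
print(message.decode("ascii"))
```

Content-Type tells the server how to interpret the body, and Content-Length tells it where the body ends.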

Example of an HTTP/1.1 request

GET /Inicio HTTP/1.1
Host: www.uc3m.es
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Chrome/62.0.3202.89
Upgrade-Insecure-Requests: 1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: es-ES,es;q=0.9,en;q=0.8,en-US;q=0.7
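
A request like the one above can be taken apart with a few lines of Python. This is a simplified sketch, assuming a well-formed message with CRLF line endings; the function `parse_request` is a hypothetical helper, and it keeps only the last value when a header is repeated.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, target, version, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body


raw = (
    "GET /Inicio HTTP/1.1\r\n"
    "Host: www.uc3m.es\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Language: es-ES,es;q=0.9\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)  # GET /Inicio HTTP/1.1
print(headers["host"])          # www.uc3m.es
```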

Structure of a response message

An HTTP response includes:

- A status: numeric code that indicates the result of processing the request.
- Response headers: additional data about how the response needs to be processed.
- Response body: representation of the response to the request, typically an HTML page, an image, etc.

Example of an HTTP/1.1 response

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=E26E8...; Domain=www.uc3m.es; HttpOnly
Cache-Control: no-store
Last-Modified: Fri, 10 Nov 2017 11:44:28 CET
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Date: Fri, 10 Nov 2017 10:44:28 GMT

<!DOCTYPE html>
<html lang="es" class="no-js">
<head>
<title>Inicio | UC3M</title>
(...)

Main headers used in requests and responses

- Connection: information on whether the TCP connection should be closed after completing the response to the request.
- Content-Encoding: data encoding (compression, typically) that has been applied to the request or response body.
- Content-Length: length in bytes of the request or response body.
- Content-Type: MIME type of the request or response body (e.g., text/html).

Main request-specific headers

- Accept: client preferences regarding the content types to receive.
- Accept-Encoding: client preferences regarding the encoding to apply to the response body (compression, typically).
- Cookie: cookies sent back to the server.
- Host: requested authority.
- If-Modified-Since: timestamp of the latest version of the resource in the client’s cache.
- If-None-Match: ETag value received with the latest version of the resource in the client’s cache.
- Referer: URL from which the current request originates.
- User-Agent: information (name, version, etc.) about the client’s software.

Main response-specific headers

- Cache-Control: instructions about how the resource can be stored in cache.
- ETag: code that identifies the current contents of the resource.
- Expires: time until which the resource can be served from cache.
- Vary: list of request headers whose values can change the contents returned for a resource.
- Location: in redirection responses, new URL to be requested by the client.
- Server: information (name, version, etc.) about the server’s software.
- Set-Cookie: cookies that the client should send back in future requests.

Status codes of HTTP responses

Five types of status codes:

- 1XX: informational.
- 2XX: request successfully processed.
- 3XX: redirection to another resource.
- 4XX: error in the request.
- 5XX: error in the server.
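
Since the class of a status code is given by its first digit, it can be computed with an integer division. The sketch below uses a hypothetical helper `status_class` and takes the standard reason phrases from Python's `http.HTTPStatus`; the short class labels are wording chosen for the example.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class of a status code from its first digit."""
    return CLASSES[code // 100]


for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```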

Code  Reason              Meaning
200   OK                  Request successfully processed.
301   Moved Permanently   Resource moved to another URL (Location header), which the client should always use from now on.
302   Found               Resource temporarily moved to another location (Location header).
303   See Other           The other resource (confirmation page, progress, etc.) has to be requested with the GET method.
304   Not Modified        The client can use the version of the resource it currently has in cache.
400   Bad Request         The client sent an invalid HTTP request (syntax, etc.).
403   Forbidden           The client is not allowed to access this resource.
404   Not Found           There is no resource with such path.
405   Method Not Allowed  The resource does not allow such method.
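
The 304 Not Modified code works together with the ETag and If-None-Match headers. The following sketch shows one possible server-side check; the functions `make_etag` and `respond` are hypothetical, and deriving the ETag from a content hash is an assumption made for the example (servers are free to choose other schemes).

```python
import hashlib


def make_etag(content):
    # Assumption: the ETag is a truncated hash of the content, in double quotes.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(request_headers, content):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""  # the client's cached copy is still valid
    return 200, {"ETag": etag}, content


page = b"<h1>Inicio</h1>"
status1, headers1, body1 = respond({}, page)                                   # first visit
status2, headers2, body2 = respond({"If-None-Match": headers1["ETag"]}, page)  # revalidation
print(status1, status2)  # 200 304
```

On the second request the body is empty, so only the headers travel over the network.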

Cookies

HTTP is a stateless protocol, i.e., each request is independent of other requests.

Cookies allow the server to keep state: they are small pieces of data, each associated with a name, that the server creates and sends to the client in its response messages so that the client sends them back with its subsequent requests.

Structure of cookies

A cookie is represented as a short string that contains:

- A name: a server can set several cookies with different names.
- A value: the actual data of the cookie.
- Attributes:
  - Domain and Path: they define in which requests, according to their authority and path, the client will send the cookie to the server.
  - Expires and Max-Age: they define when the client must stop using the cookie. If neither is specified, the cookie is removed when the browser is closed.
  - Secure: the cookie can only be sent through secure channels (HTTPS, typically).
  - HttpOnly: the cookie can only be sent or accessed through HTTP or HTTPS. For example, accessing it from JavaScript is not allowed.

Examples of the use of cookies

Setting cookies (in HTTP responses):

Set-Cookie: sid=4RT67aY...; Expires=Thu, 13 Feb 2020 21:47:38 GMT; Path=/; Domain=.example.com; Secure; HttpOnly

Sending cookies back (in HTTP requests):

Cookie: sid=4RT67aY...
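
Both headers can be produced and parsed with Python's `http.cookies` module. In this sketch the cookie value `abc123` and the extra `theme` cookie are placeholders, and the order in which attributes appear in the generated header is decided by the library.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header.
outgoing = SimpleCookie()
outgoing["sid"] = "abc123"
outgoing["sid"]["path"] = "/"
outgoing["sid"]["domain"] = "example.com"
outgoing["sid"]["max-age"] = 3600
outgoing["sid"]["secure"] = True
outgoing["sid"]["httponly"] = True
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Server side, on a later request: parse the Cookie header sent back by the client.
incoming = SimpleCookie()
incoming.load("sid=abc123; theme=dark")
print(incoming["sid"].value)
```

Note that the client returns only names and values; the attributes are instructions for the client and are never sent back.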

Applications of cookies

Some typical uses of cookies include:

- Session tracking: the user signs in to create a session (the server sets a cookie that includes a session token). The server identifies future requests as part of the same session because they include the same session token.
- Storing user preferences at the client side: user preferences for a Web site can be stored in cookies inside the user’s Web browser.
- User tracking: Web sites can use cookies to track the user’s behavior (when third parties do that, e.g. with commercial intentions, their use can be considered abusive).
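
Session tracking, the first use listed above, can be sketched as follows. Everything here is a simplification under stated assumptions: the in-memory dictionary `sessions`, the functions `sign_in` and `current_user`, and the cookie name `sid` are hypothetical, and there is no password check or session expiry.

```python
import secrets

sessions = {}  # hypothetical in-memory session store: token -> user name


def sign_in(user):
    """Create a session and return the value of the Set-Cookie header for it."""
    token = secrets.token_urlsafe(32)  # unguessable random session token
    sessions[token] = user
    return f"sid={token}; Path=/; Secure; HttpOnly"


def current_user(cookie_header):
    """Look up the user for the 'sid' cookie in a Cookie header, if any."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "sid":
            return sessions.get(value)
    return None


set_cookie = sign_in("alice")
token = set_cookie.split(";")[0].split("=", 1)[1]
print(current_user(f"sid={token}"))      # alice
print(current_user("sid=forged-token"))  # None
```

The token itself carries no information; it only lets the server find the state it stored when the session was created.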

HTTP over TLS (HTTPS)

HTTP over TLS, also known as HTTPS (Hypertext Transfer Protocol Secure), defines how HTTP is transported over a secure TLS (Transport Layer Security) channel.

Security properties of HTTPS

The use of HTTP over TLS provides the following security properties:

- Authentication: the server is always authenticated and, optionally, so is the client.
- Confidentiality: once the secure channel has been established, data sent through it can only be seen by the two endpoints of the channel.
- Integrity: once the secure channel has been established, any modification to the data sent through it will be detected.
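
On the client side, server authentication shows up as certificate checks. As an illustration, the default TLS context created by Python's `ssl` module for clients already requires a valid server certificate that matches the host name; the snippet only inspects those settings and opens no connection.

```python
import ssl

context = ssl.create_default_context()  # settings intended for a TLS client

# The server certificate must validate against the trusted authorities.
print(context.verify_mode == ssl.CERT_REQUIRED)
# The certificate must also match the host name the client asked for.
print(context.check_hostname)
```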

HTTP/2

HTTP/2 was designed to fix some issues in HTTP/1.1 that impact the performance of current Web applications. More specifically, it optimizes how HTTP messages are transported through the underlying connection.

HTTP/2 keeps HTTP/1.1 message semantics (structure, headers, etc.).

Main changes in HTTP/2

- The frame is the basic unit of the protocol. It is encoded in binary format in order to speed up its processing. Each HTTP message is encoded as one or more frames.
- Multiple requests and their responses are multiplexed as independent streams within the same connection, so that delays in processing some requests do not affect the other concurrent requests.
- The protocol integrates flow control and stream prioritization mechanisms.
- A server can send responses to the client without the client having sent the corresponding request (server push), thus anticipating future client requests.
- A header compression mechanism (HPACK) is applied to message headers.
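
To give an idea of what binary framing means, the sketch below encodes and decodes the 9-octet frame header defined for HTTP/2 in RFC 9113 (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier). The functions `encode_frame` and `decode_frame_header` are hypothetical helpers; a real implementation also handles HPACK, flow control and the other frame types.

```python
import struct

DATA, HEADERS = 0x0, 0x1            # two of the frame types
END_STREAM, END_HEADERS = 0x1, 0x4  # two of the flags


def encode_frame(frame_type, flags, stream_id, payload):
    """Prefix a payload with the 9-octet HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit payload length
        frame_type,                     # 8-bit type
        flags,                          # 8-bit flags
        stream_id & 0x7FFFFFFF,         # reserved bit cleared, 31-bit stream identifier
    )
    return header + payload


def decode_frame_header(data):
    """Return (length, type, flags, stream_id) from the first 9 octets."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    return (high << 16) | low, frame_type, flags, stream_id & 0x7FFFFFFF


frame = encode_frame(DATA, END_STREAM, 1, b"hello")
print(frame.hex())                 # 00000500010000000168656c6c6f
print(decode_frame_header(frame))  # (5, 0, 1, 1)
```

The stream identifier in every frame is what allows frames from different requests and responses to be interleaved on one connection.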

HTTP/3

HTTP/3 provides the same HTTP semantics as HTTP/1.1 and HTTP/2, but on top of the QUIC transport protocol instead of TCP:

- QUIC provides applications with flow-controlled streams (ordered sequences of bytes), low-latency connection establishment and network path migration.
- QUIC works on top of the UDP protocol.
- QUIC integrates with TLS for security.

References

- HTTP Semantics. IETF RFC 9110. June 2022.
- HTTP/1.1. IETF RFC 9112. June 2022.
- HTTP/2. IETF RFC 9113. June 2022.
- HTTP/3. IETF RFC 9114. June 2022.
- Uniform Resource Identifier (URI): Generic Syntax. IETF RFC 3986. January 2005.
- HTTP State Management Mechanism. IETF RFC 6265. April 2011.

Other resources

- MDN Web Docs, “Web Technology for Developers: HTTP”
- Andrew S. Tanenbaum, David J. Wetherall, Computer Networks, 5th ed., Prentice Hall (2010):
  - Chapter 7.3 (The World Wide Web).
  - Online access at O’Reilly through UC3M Library
