• To understand the Common Gateway Interface (CGI) protocol.
• To understand the Hypertext Transfer Protocol (HTTP).
• To implement CGI scripts.
• To use XHTML forms to send information to CGI scripts.
• To understand and parse query strings.
• To use module cgi to process information from XHTML forms.
This is the common air that bathes the globe. Walt Whitman
The longest part of the journey is said to be the passing of the gate.
Marcus Terentius Varro
Railway termini...are our gates to the glorious and unknown. Through them we pass out into adventure and sunshine, to them, alas! we return.
E. M. Forster
There comes a time in a man’s life when to get where he has to go—if there are no doors or windows—he walks through a wall.
6.1 Introduction
The Common Gateway Interface (CGI) describes a set of protocols through which applications (commonly called CGI programs or CGI scripts) interact with Web servers and, indirectly, with client applications such as Web browsers. A Web server is a specialized software application that responds to client application requests by providing resources (e.g., Web pages). CGI often is used to generate Web content dynamically. A Web page is dynamic if a program on the Web server generates that page’s content each time a user requests the page. For example, a form in a Web page could request that a user enter a zip code. When the user types and submits the zip code, the Web server can use a CGI program to create a page that displays information about the weather in that client’s region. In contrast, static Web page content never changes unless the Web developers edit the document.
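To make the distinction between static and dynamic content concrete, the following Python 3 sketch regenerates a page every time it is called. The function name build_page and the page layout are assumptions made for this illustration. A CGI program conventionally begins its output with a Content-type header followed by a blank line, and the sketch follows that convention; it is not a complete CGI script.

```python
import time


def build_page(now=None):
    """Return a header, a blank line and an XHTML body built at call time."""
    if now is None:
        now = time.localtime()  # content depends on when the request arrives
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", now)
    return (
        "Content-type: text/html\n"
        "\n"
        "<html><head><title>Current time</title></head>\n"
        f"<body><p>Page generated at {stamp}</p></body></html>\n"
    )


# Each call produces fresh content, unlike a static file on disk.
print(build_page(), end="")
```

Calling build_page twice, a few seconds apart, yields two different pages, which is the behavior that separates dynamic content from a static document.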
CGI is “common” because it is not specific to any operating system (e.g., Linux or Windows), to any programming language or to any Web server software. CGI can be used with virtually any programming or scripting language, such as C, Perl and Python. In this chapter, we explain how Web clients and servers interact. We introduce the basics of CGI and use Python to write CGI scripts.
The CGI protocol was developed in 1993 by the National Center for Supercomputing Applications (NCSA, www.ncsa.uiuc.edu), for use with its HTTPd Web server. NCSA developed CGI to be a simple tool to produce dynamic Web content. The simplicity of CGI resulted in its widespread use and in its adoption as an unofficial worldwide protocol. CGI was quickly incorporated into additional Web servers, such as Microsoft Internet Information Services (IIS) and Apache (www.apache.org).
Outline
6.1 Introduction
6.2 Client and Web Server Interaction
6.2.1 System Architecture
6.2.2 Accessing Web Servers
6.2.3 HTTP Transactions
6.3 Simple CGI Script
6.4 Sending Input to a CGI Script
6.5 Using XHTML Forms to Send Input and Using Module cgi to Retrieve Form Data
6.6 Using cgi.FieldStorage to Read Input
6.7 Other HTTP Headers
6.8 Example: Interactive Portal
6.9 Internet and World Wide Web Resources
6.2 Client and Web Server Interaction
In this section, we discuss the interactions between a Web server and a client application. A Web page, in its simplest form, is either a Hypertext Markup Language (HTML) document or an Extensible Hypertext Markup Language (XHTML) document. (In this chapter, we use XHTML.) An XHTML document is a plain-text file that contains markup, or tags, which describe how the document should be displayed by a Web browser. For example, the XHTML markup
<title>My Web Page</title>
indicates that the text between the opening <title> tag and the closing </title> tag is the Web page’s title. The browser renders the text between these tags in a specific manner.
XHTML requires syntactically correct documents—markup must follow specific rules. For example, XHTML tags must be in all lowercase letters and all opening tags must have corresponding closing tags. We discuss XHTML in detail in Appendix I and Appendix J.
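The closing-tag rule can be checked mechanically. The sketch below uses Python's xml.etree.ElementTree parser to test whether a piece of markup is well formed; the helper name is_well_formed and the two sample strings are assumptions for illustration. Note that this checks only XML well-formedness (matching, properly nested tags). It does not enforce the lowercase rule or validate a document against the XHTML specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


good = "<html><head><title>My Web Page</title></head><body><p>Hi</p></body></html>"
# The title and p elements below are never closed.
bad = "<html><head><title>My Web Page</head><body><p>Hi</body></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```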
Each Web page has a unique Uniform Resource Locator (URL) associated with it—an address of sorts. The URL contains information that directs a browser to the resource (most often a Web page) the user wishes to access. For example, consider the URL
http://www.deitel.com/books/downloads.html
The first part of the address, http://, indicates that the resource is to be obtained using the Hypertext Transfer Protocol (HTTP). During this interaction, the Web server and the client communicate using the platform-independent HTTP, a protocol for transferring requests and files over the Internet (e.g., between Web servers and Web browsers). Section 6.2.3 discusses HTTP.
The next section of the URL, www.deitel.com, is the hostname of the server, which is the name of the server computer, the host, on which the resource resides. A domain name system (DNS) server translates the hostname (www.deitel.com) into an Internet Protocol (IP) address (e.g., 207.60.134.230) that identifies the server computer (just as a telephone number uniquely identifies a particular phone line). This translation operation is a DNS lookup. A DNS server maintains a database of hostnames and their corresponding IP addresses.
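A hostname-to-address translation can be requested from Python through the socket module, which asks the operating system's resolver (and, through it, a DNS server when one is needed). The helper name lookup is an assumption for this sketch. We resolve localhost so that the example runs without a network connection; resolving a name such as www.deitel.com works the same way, but the address returned today may differ from the one quoted in the text.

```python
import socket


def lookup(hostname):
    """Ask the system resolver for an IPv4 address of hostname."""
    return socket.gethostbyname(hostname)


# localhost normally resolves to a loopback address without contacting DNS.
print(lookup("localhost"))
```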
The remainder of the URL specifies the requested resource, /books/downloads.html. This portion of the URL specifies both the name of the resource (downloads.html, an HTML/XHTML document) and its path (/books). The path represents a directory in the Web server’s file system. The Web server maps the URL to a file (or other resource, such as a CGI program) on the server, or to another resource on the server’s network. The Web server then returns the requested document to the client. It also is possible that the resource is created dynamically and does not reside anywhere on the server computer. In this case, the URL uses the hostname to locate the correct server, and the server uses the path and resource information to locate (or create) the resource to respond to the client’s request. As we will see, URLs also can provide input to a CGI program residing on a server.
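The parts of a URL described above can be pulled apart with Python's standard urllib.parse module. This is a minimal sketch using the example URL from the text; splitting the path into a directory and a resource name with posixpath.split is our own choice for the illustration, not something a Web server is required to do.

```python
import posixpath
from urllib.parse import urlsplit

parts = urlsplit("http://www.deitel.com/books/downloads.html")
directory, resource = posixpath.split(parts.path)

print(parts.scheme)    # http
print(parts.hostname)  # www.deitel.com
print(directory)       # /books
print(resource)        # downloads.html
```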
6.2.1 System Architecture
A Web server often is part of a multi-tier application, sometimes referred to as an n-tier application. Multi-tier applications divide functionality into separate tiers (i.e., logical groupings of functionality). Tiers can be located on a single computer or on multiple computers. Figure 6.1 presents the basic structure of a three-tier application.
The information tier (also called the data tier or the bottom tier) maintains data for the application. This tier typically stores data in a relational database management system (RDBMS). We discuss relational database management systems in further detail in Chapter 17, Database Application Programming Interface (DB-API). For example, a retail store may have a database for product information, such as descriptions, prices and quantities in stock. The same database also may contain customer information, such as user names, billing addresses and credit-card numbers.
The middle tier implements business logic and presentation logic to control interactions between application clients and application data. The middle tier acts as an intermediary between data in the information tier and the application clients. The middle-tier controller logic processes client requests from the client tier (e.g., a request to view a product catalog) and retrieves data from the database. The middle-tier presentation logic then processes data from the information tier and presents the content to the client.
Business logic in the middle tier enforces business rules and ensures that data are reliable before updating the database or presenting data to a client. Business rules dictate how clients can and cannot access application data and how applications process data.
The middle tier also implements the application’s presentation logic. Web applications typically present information to clients as XHTML documents (older applications present information as HTML). Many Web applications present information to wireless clients as Wireless Markup Language (WML) documents. We discuss WML in detail in Chapter 23, Case Study: Online Bookstore.
The client tier, or top tier, is the application’s user interface. Users interact with the application through the user interface. This causes the client to interact with the middle tier to make requests and to retrieve data from the information tier. The client then displays to the user the data retrieved from the middle tier.
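The division of labor among the three tiers can be sketched in a few lines of Python. Everything in this example is hypothetical: the products table, the sample rows, the rule that only in-stock items are shown, and the function names. An in-memory SQLite database stands in for the RDBMS of the information tier, and printing the markup stands in for a browser in the client tier.

```python
import sqlite3
from html import escape


def make_information_tier():
    """Information tier: an in-memory database with a sample products table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (name TEXT, price REAL, in_stock INTEGER)")
    db.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [("Widget", 9.99, 12), ("Gadget", 24.50, 0)],
    )
    return db


def available_products(db):
    """Middle tier, business logic: only in-stock items may be shown (assumed rule)."""
    query = "SELECT name, price FROM products WHERE in_stock > 0 ORDER BY name"
    return db.execute(query).fetchall()


def render_catalog(rows):
    """Middle tier, presentation logic: turn database rows into XHTML markup."""
    items = "".join(f"<li>{escape(name)}: ${price:.2f}</li>" for name, price in rows)
    return f"<ul>{items}</ul>"


# Client tier stand-in: a browser would render this markup for the user.
db = make_information_tier()
print(render_catalog(available_products(db)))
```

Keeping the query, the business rule and the markup generation in separate functions mirrors the separation of tiers: the database can be replaced, or the output changed from XHTML to another format, without touching the other parts.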
6.2.2 Accessing Web Servers
To request documents from Web servers, users must know the names of the machines (called hostnames) on which the Web server software resides. Users can request documents from local Web servers (i.e., those that reside on users’ machines) or remote Web servers (i.e., those that reside on different machines).
Fig. 6.1 Three-tier application model: client tier, middle tier (application), information tier (database).
We can request documents from local Web servers through the machine name or through localhost, a hostname that references the local machine. We use localhost in this book. To determine the machine name in Windows 98, right-click Network Neighborhood, and select Properties from the context menu to display the Network dialog. In the Network dialog, click the Identification tab. The computer name displays in the Computer name: field. Click Cancel to close the Network dialog. In Windows 2000, right-click My Network Places and select Properties from the context menu to display the Network and Dial-up Connections explorer. In the explorer, click Network Identification. The Full computer name: field in the System Properties window displays the computer name. To determine the machine name on most Linux machines, simply type the command hostname at a shell prompt.
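The machine name also can be read from within Python, independently of the operating system's dialogs. This short sketch uses socket.gethostname from the standard library; the variable name machine_name is our own.

```python
import socket

# Name of the machine on which this script is running.
machine_name = socket.gethostname()
print(machine_name)
```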
A client also can access a server by specifying the server’s domain name or IP address (e.g., in a Web browser’s Address field). A domain name represents a group of hosts on the Internet; it combines with a hostname (such as www, a common hostname for Web servers) and a top-level domain (TLD) to form a fully qualified hostname, which provides a user-friendly way to identify a site on the Internet. In a fully qualified hostname, the TLD often describes the type of organization that owns the domain name. For example, the com TLD usually refers to a commercial business, whereas the org TLD usually refers to a nonprofit organization. In addition, each country has its own TLD, such as cn for China, et for Ethiopia, om for Oman and us for the United States.
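As a small illustration of how a fully qualified hostname decomposes, the sketch below splits www.deitel.com into the three parts named above. The function split_hostname is hypothetical and deliberately naive: it assumes the first label is the hostname and the last label is the TLD, which does not hold for every name on the Internet, and it raises ValueError for names with fewer than three labels.

```python
def split_hostname(fqdn):
    """Split 'host.domain.tld' into (hostname, domain name, TLD)."""
    hostname, rest = fqdn.split(".", 1)   # first label, e.g. www
    domain, tld = rest.rsplit(".", 1)     # last label is treated as the TLD
    return hostname, domain, tld


print(split_hostname("www.deitel.com"))
```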
6.2.3 HTTP Transactions
Before exploring how CGI operates, it is necessary to have a basic understanding of networking and the World Wide Web. In this section, we discuss the technical aspects of how a browser interacts with a Web server to display a Web page and we examine the Hypertext Transfer Protocol (HTTP). We also explore HTTP’s components that enable clients and servers to interact and exchange information uniformly and predictably.
An HTTP request often posts data to a server-side form handler that processes the data. For example, when a user participates in a Web-based survey, the Web server receives the information specified in the XHTML form as part of the request.
When a user enters a URL, the client has to request that resource. The two most common HTTP request types (also known as request methods) are get and post. These request types retrieve resources from a Web server and send client form data to a Web server. A get request sends form content as part of the URL. For example, in the URL
www.somesite.com/search?query=value
the information following the ? (query=value) indicates the user-specified input. For example, if the user performs a search on “Massachusetts,” the last part of the URL would be
?query=Massachusetts
Most Web servers limit get request query strings to 1024 characters. If the query string exceeds this limit, the post request must be used. The data sent in a post request is not part of the URL, so it does not appear in the browser’s Address field. Forms that contain many fields are submitted most often by post requests. Sensitive form fields, such as passwords, usually are sent using this request type.
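Query strings can be taken apart and assembled with Python's urllib.parse module. In this sketch we add an http:// prefix to the example URL so that it is a complete URL, and the second search term (New Hampshire) is our own example, chosen to show how a space is encoded. The last statement shows the same kind of name/value data encoded the way it would travel in the body of a post request rather than in the URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://www.somesite.com/search?query=Massachusetts"
query_string = urlsplit(url).query   # everything after the ?
fields = parse_qs(query_string)      # maps each name to a list of values

print(query_string)  # query=Massachusetts
print(fields)        # {'query': ['Massachusetts']}

# Form data encoded for a post request body instead of the URL.
post_body = urlencode({"query": "New Hampshire"})
print(post_body)     # query=New+Hampshire
```

parse_qs returns a list for each field name because a form may submit several values under the same name.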
To make the request, the browser sends an HTTP request message to the server (step 1, Fig. 6.2). In its simplest form, a get request follows the format GET /books/downloads.html HTTP/1.1. The word GET is an HTTP method indicating that the client is requesting a resource. The next part of the request provides the path (/books/) and name (downloads.html) of the resource (an HTML/XHTML document). The final part of the request provides the protocol’s name and version number (HTTP/1.1).
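The three parts of the request line are separated by single spaces, so the line is easy to build and to take apart. The two helper functions below are hypothetical names introduced for this sketch. They handle only the first line of a request, not the headers that a real browser sends after it.

```python
def build_request_line(path, method="GET", version="HTTP/1.1"):
    """Assemble the first line of an HTTP request."""
    return f"{method} {path} {version}"


def parse_request_line(line):
    """Split a request line into its method, target and protocol version."""
    method, target, version = line.split(" ")
    return {"method": method, "target": target, "version": version}


line = build_request_line("/books/downloads.html")
print(line)
print(parse_request_line(line))
```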
Servers that understand HTTP version 1.1 translate this request and respond (step 2, Fig. 6.2). The server responds with a line indicating the HTTP version, followed by a status code that consists of a numeric code and phrase describing the status of the transaction. For example,
HTTP/1.1 200 OK
indicates success, while
HTTP/1.1 404 Not found
informs the client that the requested resource was not found on the server in the location specified by the URL.
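Both status lines can be observed with a small experiment that runs a Web server and a client in the same Python program. This is a sketch under several assumptions: the one-page PAGES dictionary, the DemoHandler class and the fetch_status function are hypothetical names, the server is Python's built-in http.server rather than a production Web server, and it listens only on the local machine. Python's server spells the second reason phrase Not Found.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "site": one path mapped to one XHTML document.
PAGES = {
    "/books/downloads.html":
        b"<html><head><title>Downloads</title></head><body></body></html>",
}


class DemoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)   # status line: HTTP/1.1 404 Not Found
            return
        self.send_response(200)    # status line: HTTP/1.1 200 OK
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the per-request log line


def fetch_status(path):
    """Start a local server, send one get request, return (code, phrase)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)          # step 1: the request
        response = conn.getresponse()      # step 2: the response
        response.read()
        conn.close()
        return response.status, response.reason
    finally:
        server.shutdown()
        server.server_close()


print(fetch_status("/books/downloads.html"))
print(fetch_status("/missing.html"))
```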
Browsers often cache (save on a local disk) Web pages for quick reloading, to reduce the amount of data that the browser needs to download. However, browsers typically do not cache server responses to post requests, because subsequent post requests may not contain the same information. For example, several users who participate in a Web-based survey
Fig. 6.2 Client interacting with server and Web server. Step 1: The request, GET /books/downloads.html HTTP/1.1. The client sends the get request to the Web server. After it receives the request, the Web server searches through its system for the resource.
Fig. 6.2 Client interacting with server and Web server. Step 2: The HTTP response, HTTP/1.1 200 OK. The server responds to the request with an appropriate message, along with the resource contents.