1 The industry of pure ideas

Draft 17.14, 9 December 07 20:19 (Zürich)

1.1 THEIR MACHINES AND OURS

Engineers design and build machines. A car is a machine for traveling; an electronic circuit is a machine for transforming signals; a bridge is a machine for crossing a river. Programmers — “software engineers” — design and build machines too. We call our machines programs or systems.

There’s a difference between our machines and theirs. If you drop one of their machines, it will hurt your feet. Ours won’t.

Programs are immaterial. This makes them closer, in some respects, to a mathematician’s theorems or a philosopher’s propositions than to an airplane or a vacuum cleaner. And yet, unlike theorems and propositions, they are engineering devices: you can operate a program, as you operate vacuum cleaners or planes, and get results.

Since one cannot operate a pure idea, you will need some tangible, material support to operate programs or, using the more common terms, to run or execute them. That support is another machine: a computer. Computers and related devices are called hardware, indicating that (although they’re getting ever lighter) computers are the kind of machine that will hurt your feet. Programs and all that relates to them are by contrast called software, a word made up in the 1950s when programs emerged as a topic of interest.

The person who writes the program — “you” in the previous paragraph — is predictably called a programmer. Others, whom we call users, can then run your program on your computer, or theirs.

If you have ever used a computer, you’ve run some program, for example to browse the Web or play a DVD, so you already are a user. This book should help you make it to the next step: programmer.

Cynics in the software industry pronounce “user” as “loser”. It’s one of the goals of this book that users of your programs will pronounce themselves winners.

The immaterial nature of the machines we build is part of what makes programming so fascinating. Given a powerful enough computer you can define any machine you want, whose operation will require billions upon billions of individual steps, and the computer will run it for you. You do not need wood or clay or iron or a hammer or anything that could wear you out carrying it up the stairs, burn you, or damage your clothes. State what you want, and you will receive it. The only limit is your imagination.

Well, all right, it is one of two limits; we don’t like to mention the other in genteel company, but you will likely encounter it before long; it is your own fallibility. Nothing personal: if you are like the rest of us, you make mistakes. Lots of mistakes. In ordinary life they are not all harmful, as most human activities are remarkably error-tolerant. You can press your fork a little too intensely, drink your water a little too fast, push the accelerator a little too hard, use the wrong word; this happens all the time and in most cases doesn’t prevent you from achieving what you wanted: eat, drink, drive, communicate. But programming is different! At a dazzling speed (hundreds of millions of basic operations per second) the computer will run your machine description, your program, exactly as you prepared it. The computer doesn’t “understand” your program, it just runs it; the slightest mistake will be faithfully carried out by the machinery. What you wrote is what you get.

[Figure: a programmer writes a program, which a user runs on a computer.]

As you learn about programming in the following chapters, this is perhaps the most important property of computers to keep in mind. You might still believe otherwise: because computer programs do things that seem so sophisticated (like finding, in less than a second, your ideal vacation rental from millions of offers available on the World-Wide Web) you may easily succumb to the impression that computers are smart. Wrong. Although some programs may embody considerable human intelligence, the computer that runs them is like a devoted and insufferable servant: infinitely faithful, almost infinitely fast, and definitely stupid. It will carry out your instructions exactly as you give them, never taking any initiative to correct mistakes, even those which a human being would find obvious and benign. The challenge for you, the programmer, is to feed this obedient brute with flawless instructions representing, in an execution of any significant program, billions of elementary operations.
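To make “what you wrote is what you get” concrete, here is a minimal sketch in Python. The language and the two function names are illustrative assumptions, not something this chapter prescribes. The functions differ only by a pair of parentheses, and the computer runs the flawed one just as faithfully as the correct one.

```python
def average_intended(a, b):
    # What the programmer meant: add first, then halve.
    return (a + b) / 2


def average_as_written(a, b):
    # What the programmer typed: without parentheses only b is halved.
    # No error is reported; the computer simply does as told.
    return a + b / 2


print(average_intended(10, 20))    # 15.0
print(average_as_written(10, 20))  # 20.0
```

A human reader would probably guess the intent; the computer does not try.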

If you have used computers you know that they do not always react the way you like. It doesn’t take very long to experience a “crash”, that state in which it seems everything goes away and execution stops. But except for the rare case of a hardware malfunction it’s not the computer that crashed; it’s a program that did not do the right thing, and behind the program it’s a programmer who did not foresee all possible execution scenarios.

You cannot learn programming without going through this experience of programs — yours, or someone else’s — that do not work as they should; and you cannot become a professional programmer without learning the techniques that will let you build programs that do work as you want.

1.2 THE OVERALL SETUP

In the next chapters we are going to jump right into program development. Initially we will not need too much detailed knowledge about computers, but let’s see their fundamental properties, as they set the context for the construction of software.

The tasks of computers

Computers — “automatic stored-program digital computers” to be precise — are machines that can store and retrieve information, perform operations on that information, and exchange information with other devices.

This definition highlights the major capabilities of computers:

Storage and retrieval capabilities are a prerequisite for everything else: computers must be able to keep information somewhere before they can apply operations to it, or communicate it. Such a “somewhere” is called a memory.

Operations include comparisons (“Are these two values the same?”), replacement (“Replace this value by that one”), arithmetic (“Find the sum of these two values”) and others. These operations are primitive; what makes computers able to perform amazing feats is not the intrinsic power of their basic mechanisms, but the speed at which they can carry them out and the ingenuity of the humans — you! — who write programs that will execute millions of them.

Communication allows us to enter information into computers, and retrieve information from them (the original information, or information that has been modified or produced by the computer’s operations). It also enables computers to communicate with other computers and with devices such as sensors, telephones, displays and many others.
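The three capabilities above can be sketched in a few lines of Python. This is only an illustration: the dictionary standing in for a “memory” and the names `x` and `y` are hypothetical choices.

```python
# A tiny stand-in for a memory: named places that hold values.
memory = {"x": 3, "y": 4}

# Comparison: "Are these two values the same?"
same = memory["x"] == memory["y"]

# Arithmetic: "Find the sum of these two values."
total = memory["x"] + memory["y"]

# Replacement: "Replace this value by that one."
memory["x"] = total

# Communication: send the results to the outside world (here, the screen).
print(same, total, memory)
```

Each step is primitive on its own; useful programs come from combining very many of them.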

What computers do

• Storage and retrieval

• Operations

• Communication

General organization

The previous definition yields the basic schematic diagram for computers:

The memories hold the information. We talk of memories in the plural because most computers have more than one storage device, of more than one kind, differing by size, speed of access to information and persistence (whether or not a memory retains information when power is switched off).

The processors perform the operations. Again there usually are several of them. Occasionally you will still see a processor called a CPU, an acronym for the older term Central Processing Unit.

The communication devices provide means of interacting with the rest of the world. The figure shows the communication devices as interfacing with the processors rather than the memories; indeed, when exchanging information between a memory and the outside world, you will usually need to go through some operations of a processor. A communication device supports input (outside world to computer), output (the other way around), or sometimes both. Examples include:

• A keyboard, through which a person enters text (input).

• A video display or “terminal” (output).

• A mouse or joystick, enabling you to designate points on the terminal screen (input).

• A sensor, regularly sending measurements of temperature or humidity to a computer in a factory (input).

• A network connection to communicate with other computers and devices (input and output).

The abbreviation I/O covers both input and output. The words “input” and “output” are also used as verbs, as in “you must input this text”.

[Figure: Components of a computer system: memories, processors, and communication devices connecting to the rest of the world.]

Information and data

The key word in the above definition of computers is “information”: what you would like to store into memories and retrieve from them, process through the processors’ operations, and exchange through the communication devices.

This is the human view. Strictly speaking, computers do not directly manipulate information, they manipulate data representing that information:

Some people will tell you that “data” should only be used in the plural, because it’s originally the plural of “datum”. Thank them for the kindness of their advice and disregard it cheerfully. Unless they intend to continue the conversation in Latin, their linguistic data is obsolete.

Information is what you want: the day’s headlines, a friend’s picture, background on someone you’ll be meeting. Data is how it’s encoded for the computer.

As an example, the MP3 audio format, which you may have used to listen to music with the help of a computer, is a way to encode enough information about a piece of music into data that can be stored in a computer, exchanged across a network, and sent to an audio device so that it will replay the music.
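A much smaller illustration of the same distinction, sketched in Python (the byte values are an arbitrary choice made for this example): the same two stored symbols yield different information depending on how they are interpreted.

```python
data = bytes([72, 105])                  # data: two symbols held in memory

as_text = data.decode("ascii")           # interpreted as text
as_number = int.from_bytes(data, "big")  # interpreted as a single integer

print(list(data))   # [72, 105]
print(as_text)      # Hi
print(as_number)    # 18537

# Going the other way: encoding information as data.
assert "Hi".encode("ascii") == data
```

The data never changes; only the interpretation applied to it does.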

The data will be stored in memory. It is the task of the communication devices to produce data from information coming from the world out there, store it in memory, and when the processors transform this data or produce new data, to send it out to the world so that it will understand it as information. Adapted to show the functions performed, the original picture becomes this:

The right-to-left arrow suggests that the process is not just one-way but repetitive, with information being repeatedly fed back to yield new results.

Definitions: Data, information

Collections of symbols held in a computer are called data. Any interpretation of data for human purposes is called information.

[Figure: Information and data processing: information is input as data, processed, and output as information.]

Computers everywhere

The familiar picture of a computer is the “desktop” or “laptop” computer, whose processor and memory components are hosted in a box of a size somewhere between a textbook like this one and a big dictionary; the terminal is often the biggest part. All this is at human size. At hand size we find such devices as mobile phones, which today are really pocket computers with extended telecommunication capabilities. At the higher end, computers used for large scientific computations (physics, weather prediction...) can reach room size. This is of course nothing compared to computers of a generation ago, which took up building size for much more modest capabilities.

Reduced to their central processor and memory components, computers can be much smaller than any of this. Increasingly, “the computer” is a device included — the technical term is embedded — in products or other devices. Today’s cars include dozens of small computers, controlling fuel delivery, braking, even windows. The printer connected to your desktop computer is not just a printing engine, it is itself a computer, able to produce fonts, smooth images, restart on the next page after a paper jam. Electric razors include computers; manual razors might include one some day. (The more expensive razor blades already contain electronic tracking tags to fight theft.) Washing machines contain computers, and in the future clothes may have their own computers, helping to tune the washing process.

Computers: desktop (a); laptop (b); PDA (c); processor to be embedded (d).

The computers you will use for exercises of this book are still of the keyboard-mouse-terminal-box kind, but keep in mind that software techniques have to cover a broader scope. Software for embedded systems must satisfy very high quality requirements: malfunctions in (for example) brake-control software can have terrible consequences, and you cannot fix them — as you would for a program running on your laptop — by stopping execution, correcting the error, and starting again.

The stored-program computer

A computer, as noted, is a universal machine: it can execute any program that you input into it.

For this input process you’ll use communication devices, typically a keyboard and mouse. Text will appear on your terminal screen as you type it, seemingly as a direct result, but this is an illusion. The keyboard is an input device, the terminal a distinct output device; echoing the input text on the screen requires a special program, a text editor, to obtain this input, process it and display it. Thanks to the speed of computers, this usually happens fast enough to give the illusion of a direct keyboard-screen connection; but if the computer responds more slowly, perhaps because it’s running too many programs at the same time, you may notice a delay between typing characters and seeing them displayed.

When you input the program, where does it go? Memories are available to host it. That’s why we talk of stored-program computers: to become a specific machine ready to carry out the specific tasks that you (as the programmer) have assigned to it, the computer will read its orders from its own memory.

This property of computers explains why we have not seen a proper definition of “memory” yet. We could have said that a memory is a device for storing and retrieving data, where (in accordance with the notion of stored-program computer) “data” includes programs. But it is clearer to separate the two notions:

This ability of computers to treat programs as data — executable data — explains their remarkable flexibility. At the dawn of the computer age, it led to visions of self-modifying programs (since a program can modify data, it can modify programs, including itself) and to some grand philosophizing about how programs were going, through repeated self-modification, to become ever more “intelligent” and take over the world. Closer to us, it’s also the reason why email users are told to be careful about opening email attachments, since the data they contain could be a maliciously written program, whose execution will destroy other data.
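The idea that a program is itself data can be sketched in Python, used here purely as an illustration. The variable names are hypothetical, and `exec` is Python’s built-in way of running program text held in a string.

```python
# A program held as ordinary data: a string of characters.
program_text = "result = 2 + 3"

# Because it is data, another program can transform it...
modified_text = program_text.replace("+", "*")

# ...and because it is a program, the computer can execute it.
namespace = {}
exec(modified_text, namespace)

print(modified_text, "->", namespace["result"])
```

Running text of unknown origin this way carries exactly the risk described for email attachments, which is why `exec` should only ever be applied to text you trust.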

Definition: Memory

For programmers, the stored-program property has a more immediate consequence: it makes programs amenable, like any other kind of data, to various transformations through computer operations. In particular, the program you write is usually not the program you run. Codes that a processor can execute are designed for machines, not humans; using them directly to construct your programs would be tedious and error-prone. Instead you will:

• Write programs in notations designed for human consumption, called programming languages. This form of a program is called its source text (or source form, or just source).

• Rely on special programs called compilers to transform such human-readable program texts into a form (their target form) appropriate for processor execution.
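As a rough analogy for the two forms, Python’s built-in `compile` function turns source text into a code object whose instructions are stored as raw bytes. This is only a sketch under an important assumption: those bytes are meant for Python’s own virtual machine rather than directly for a processor, so the example illustrates the source/target split, not a real processor-level compiler.

```python
source = "total = 6 * 7\n"           # source form: written for humans

# Transform the source text into a machine-oriented form.
target = compile(source, "<example>", "exec")

print(type(target.co_code))         # the instructions, held as bytes

namespace = {}
exec(target, namespace)             # run the compiled form
print(namespace["total"])           # 42
```

The source stays readable; the compiled form is what actually gets executed.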

We’ll often encounter the following terms reflecting this division of tasks:

The details of all this — processor codes, programming languages, compilers, examples of static and dynamic properties — appear in later chapters. What matters for the moment is knowing that the programs you are going to write, starting with the next chapter, are meant for people as well as for computers.

This human aspect of programming is central to the engineering of software. When you program you are talking not just to your computer but also to fellow humans: whoever will be reading the program later, for example to add new functions or correct a mistake. That’s a good reason to worry about program readability; and it’s not just a matter of being nice to others, since that whoever might be you, a few months older, trying to decipher what in the world you had in mind when writing the original version.

Throughout this book we’ll emphasize, along with practices that make your programs good for the computer — for example, designing programs so that they will run fast enough — practices that make them good for human readers. Program texts should be understandable; programs should be extendible (easy to change); program elements should be reusable, so that when later on you are faced with a similar problem you don’t have to reinvent the solution; programs should be robust, protecting themselves against unexpected input and abnormal circumstances; most importantly, they should be correct, producing the expected results.

Definitions: Static, Dynamic

Static properties of a program are properties of its source text, which can be analyzed by a compiler. Dynamic properties are those characterizing its individual executions.
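The two definitions can be illustrated with a short Python sketch. It is an assumption-laden example: the small program text is made up for the occasion, and Python’s standard `ast` module stands in for the analysis a compiler would perform. The names of the functions a program defines can be read off its source text without running it; the values it computes are known only from an execution.

```python
import ast

source = """
def square(n):
    return n * n

values = [square(i) for i in range(4)]
"""

# Static property: obtained from the source text alone, nothing is run.
tree = ast.parse(source)
function_names = [node.name for node in ast.walk(tree)
                  if isinstance(node, ast.FunctionDef)]

# Dynamic property: obtained only by executing the program.
namespace = {}
exec(source, namespace)

print(function_names)        # ['square']
print(namespace["values"])   # [0, 1, 4, 9]
```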

1.3 KEY CONCEPTS LEARNED IN THIS CHAPTER

• Computers are general-purpose machines. Providing a computer with a program turns it into a special-purpose machine.

• Computer programs process, store and communicate data representing information of interest to people.

• A computer consists of processors, memories and communication devices. These material devices together make up hardware.

• Programs and associated intellectual value are called software. Software is an engineering product of a purely intellectual nature.

• Programs must be stored in memory prior to execution. They may have several forms, some readable and intended for human use, others directly processable for execution by computers.

Touch of history:

It’s all in the holes

Aerospace industry old-timers tell the story of the staff engineer who, in an early rocket project, was in charge of tracking the weight of everything that would get on board. He kept pestering the programmers about how much the control software would weigh. The reply, invariably, was that the software would weigh nothing at all; but he was not convinced.

One day he came into the head programmer’s office, waving a deck of punched cards (the input medium of the time, see the picture): “This is the software”, he said, “Didn’t I tell you it had a weight like everything else!”. This did not deter the programmer: “See the holes? They are the software.”

A deck of punched cards

• Computers appear in many different guises; many are embedded in products and devices.

• Programs must be written to facilitate understanding, extension and reuse. They must be correct and robust.

New vocabulary

At the end of every chapter you’ll find such a list. Check (this is the first exercise in the chapter) that you know the meaning of each term listed; if not, find its definition, as you’ll need all terms in subsequent chapters. To find a definition, look up the Index, where definition pages appear in bold.

1-E EXERCISES

1-E.1 Vocabulary

Give a precise definition of each of the terms in the above vocabulary list.

1-E.2 Data and information

For each of the following statements, say whether it characterizes data, information or both (explain):

1 • “You can find the flight details on the Web.”

2 • “When typing into that field, use no more than 60 characters per line.”

3 • “Your password must be at least 6 characters long.”

4 • “We have no trace of your payment.”

5 • “You can’t really appreciate her site without the Macromedia Flash plug-in.”

6 • “It was nice to point me to your Web page, but I can’t read Italian!”

7 • “It was nice to point me to your Web page and I’d like to read the part in Russian, but my browser displays Cyrillic as garbage.”

Communication device    Compiler      Computer
Correct                 CPU           Data
Dynamic                 Embedded      Extendible
Hardware                Information   Input
Output                  Memory        Persistence
Processor               Programmer    Programming language
Reusable                Robust        Software
Source                  Static        Target

1-E.3 Defining precisely something that you’ve always known

You know about alphabetical order: the order in which words are listed in a dictionary or other “alphabetical” list. Alphabetical order specifies, of two different words, which is “before” the other. For example the word sofa is before soft, which itself is before software.

The question you are asked in this exercise is simply:

That is to say, define alphabetical order. This is a notion that you undoubtedly know how to apply in practice, for example to look up your name in a list of candidates for an exam; what the exercise requests is a precise definition of this intuitive knowledge, of the kind you might need for a mathematical notion, or for a concept to be implemented in a program.

To construct your definition you may assume that:

• A word is a sequence of one or more letters. (It’s also OK to use “zero or more letters”, that is to say accept the possibility of empty words, if you find this more convenient. Say which version you are using.)

• A letter is one among a finite number of possibilities.

• The exact set of letters doesn’t matter but for any two letters it is known which one is “smaller” than the other. For example, with letters of the Roman alphabet, a is smaller than b, b is smaller than c and so on.

If you prefer a fully specified set of letters, just take it to include the twenty-six used in common English words, lower-case only, no accents or other diacritical marks: a b c d e f g h i j k l m n o p q r s t u v w x y z, each “smaller” than the next.

The problem calls for a definition, not a recipe. For example, an answer of the form “You first compare the first letters of the two words; if the first word’s first letter is smaller than the second word’s first letter then the first word is before the second, otherwise...” etc. is not acceptable since it is the beginning of a recipe, not a definition. A proper definition might start: “A word w1 is before a word w2 if and only if any of the following conditions holds: ...”.

Make sure that your definition covers all possible cases, and respects the intuitive properties of alphabetical ordering; for example it is not possible to have both w1 before w2 and w2 before w1.
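Without giving a definition away, the following Python sketch shows how such properties can be checked on examples. It rests on an assumption: Python’s built-in string comparison serves as a stand-in for “before”, which for lower-case unaccented words agrees with dictionary order. The function name `before` is a hypothetical choice.

```python
from itertools import permutations

words = ["sofa", "soft", "software"]


def before(w1, w2):
    # Stand-in for "w1 is before w2": Python's built-in comparison.
    return w1 < w2


# The example given in the exercise.
assert before("sofa", "soft") and before("soft", "software")

# Never both "w1 before w2" and "w2 before w1".
for w1, w2 in permutations(words, 2):
    assert not (before(w1, w2) and before(w2, w1))

print(sorted(["software", "sofa", "soft"]))
```

A check of this kind tests a few cases only; the exercise asks for a definition that holds for all words.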

About this exercise: The purpose is to apply the kind of precise, non-operational reasoning essential in good software construction. The idea is borrowed from a comment of Edsger Dijkstra, a famous Dutch computer scientist.