Usually the host application program has no knowledge of the actual physical layout of data on the disk device; it knows about logical files of information specific to the application. The disk drive on the other hand knows nothing about applications or files. It only knows about blocks and sectors formatted on the physical storage media.
2.7.1 File system I/O
The application program makes an I/O request to a File System, which is an integral part of the operating system (OS) of the host server. The File System defines the directory structure which subdivides disk partitions into smaller files, assigns names to each file, and manages the free space available where new files can be created. For instance, in the Windows NT world, the standalone computer’s file system is known as NT File System (NTFS).
The OS manages the scheduling of system resources. It is responsible for routing the I/O request from the application, through the appropriate processes, and finally to the device driver, which controls the operation of the specific storage device.
The File System controls the organization of the data on the storage device. It also manages a cache buffer for data in the server memory. On receiving the I/O request, the File System performs a series of checks. It decides whether the file has a valid name, and whether a file with this name already exists or must be created. It determines if the file is read-only, or if data may be written to it. It establishes whether there is an appropriate directory where the user is writing the file, whether there is enough space on the disk for the file to be written, and so on. Then, if appropriate, it decides where to place the file on the device.
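The checks described above can be sketched from user space. The following is only an illustration, assuming Python's standard library; the function name `check_write_request` and the wording of the messages are hypothetical, and a real file system performs these checks inside the operating system rather than in application code.

```python
import os
import shutil

def check_write_request(path, size_bytes):
    """Return a list of problems that would block writing size_bytes to path."""
    problems = []
    name = os.path.basename(path)
    directory = os.path.dirname(os.path.abspath(path))
    # Is the file name valid? (Only the most basic rule is checked here.)
    if not name or "\0" in name:
        problems.append("invalid file name")
    # Is there an appropriate directory for the file?
    if not os.path.isdir(directory):
        problems.append("directory does not exist")
        return problems
    # Does the file already exist, and if so, may data be written to it?
    if os.path.exists(path) and not os.access(path, os.W_OK):
        problems.append("file is read-only")
    # Is there enough free space on the disk for the file?
    if shutil.disk_usage(directory).free < size_bytes:
        problems.append("not enough free space")
    return problems
```

An empty list means that none of the sketched checks failed, so the request could proceed to block allocation.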
Although the File System does not deal directly with the physical device, it does have a map of where data is located on the disk drives. This map is used to allocate space for the data, and to convert the file I/O request into storage I/O protocols. The I/O must go to the device in a format which is understandable to the device; in other words, in some number of “block-level” operations. The File System therefore creates for the I/O some metadata (data describing the data), and adds information to the I/O request which defines the location of the data on the device.
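As a rough illustration of how a file-level request becomes block-level operations, the sketch below translates a byte range within a file into per-block operations using a block map. The 4096-byte block size, the name `byte_range_to_blocks`, and the list-based map are assumptions made for the example; real file systems use more elaborate structures.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes

def byte_range_to_blocks(block_map, offset, length):
    """Translate a file byte range into (device_block, start, count) operations.

    block_map[i] is the device block that holds the file's i-th logical block.
    """
    ops = []
    end = offset + length
    while offset < end:
        # Which logical block of the file, and where inside that block?
        index, start = divmod(offset, BLOCK_SIZE)
        # Do not run past the end of the block or the end of the request.
        count = min(BLOCK_SIZE - start, end - offset)
        ops.append((block_map[index], start, count))
        offset += count
    return ops
```

For example, with `block_map = [900, 17, 305]`, a 200-byte request at file offset 4000 spans two device blocks: the last 96 bytes of block 900 and the first 104 bytes of block 17.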
The File System deals with a logical view of the physical disk drives. It maps data on to logical devices as evenly as possible in an attempt to deliver consistent performance. It passes the I/O request via a volume manager function, which processes the request based on the configuration of the disk subsystem it is managing. Then the volume manager passes the transformed I/O to the device driver in the operating system.
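One common way a volume manager spreads data evenly over several disks is striping. The text does not say which layout is used, so the following is only a sketch under that assumption; the function name and parameters are hypothetical.

```python
def stripe_location(logical_block, num_disks, stripe_blocks):
    """Map a logical block number to (disk index, block number on that disk).

    Blocks are grouped into stripes of stripe_blocks blocks, and consecutive
    stripes are placed on consecutive disks in round-robin order.
    """
    stripe_index, offset_in_stripe = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks    # which disk holds this stripe
    row = stripe_index // num_disks    # how many stripes precede it on that disk
    return disk, row * stripe_blocks + offset_in_stripe
```

With three disks and four blocks per stripe, logical block 13 falls in the fourth stripe, which wraps around to the first disk, so it maps to disk 0, block 5.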
The device driver reads or writes the data in blocks. It sizes them to the specific data structure of the storage media on a physical device, such as a SCSI disk drive. SCSI commands contain block information mapped to specific sectors on the surface of the physical disk. This block information is used to read and write data to and from the block table located on the disk device.
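To make the idea of block information in a SCSI command concrete, the sketch below builds the 10-byte command descriptor block (CDB) of a SCSI READ(10) command, which carries a logical block address and a block count. The helper name `build_read10_cdb` is hypothetical, and the flag, group number, and control bytes are simply left at zero; sending the command to a device is outside the scope of the example.

```python
import struct

READ_10_OPCODE = 0x28  # SCSI READ(10) operation code

def build_read10_cdb(lba, num_blocks):
    """Build a READ(10) CDB for num_blocks blocks starting at block lba."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("READ(10) carries a 32-bit logical block address")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("READ(10) carries a 16-bit transfer length")
    # Layout (big-endian): opcode, flags, 4-byte LBA, group number,
    # 2-byte transfer length in blocks, control byte.
    return struct.pack(">BBIBHB", READ_10_OPCODE, 0, lba, 0, num_blocks, 0)
```

The device, not the host, resolves the logical block address in the command to a physical sector on the media.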
A File System is designed to provide generalized services for many applications and different types of data. The whole process of directing the I/O via the OS File System is known as “file system I/O” (commonly abbreviated to the term “file I/O”). A file I/O is known as “cooked” in the UNIX world, because it provides pre-programmed, ready-to-use services.
2.7.2 Raw I/O
Some database applications use the OS File System facilities: they open a file for update, and leave it open while they periodically make I/O requests to update blocks within the file.
However, database applications are generally not oriented to file structures, but instead are “record” oriented, using a great deal of indexing to database tables. Different databases may have very specific I/O requirements, depending on the applications they support. For instance, a data mining database system may have very long streaming I/Os, whereas a transaction oriented database is likely to generate many short bursts of small I/Os.
High performance is frequently paramount for a database application, and use of generalized file services may not deliver good results. For instance, each I/O may involve many thousands of processor instructions. It is therefore common that a database application bypasses the File System, and itself manages the structure, caching and allocation of data storage.
In this case, the database application provides its own mechanism for creating an I/O request. It reads and writes blocks of data directly to a raw partition, and provides its own volume management functions. The database assumes control over a range of blocks (or sectors) on the disk. This range of blocks is called the “raw partition.” It then directly manages the system software component of the I/O process itself. In effect the raw partition takes the role of the File System for the database I/O operations.
The database provides its own complete method of handling the I/O requests. This includes maintenance of a tailored table, or index, which knows the location of records on the disk devices. When it recognizes that an I/O operation is required it uses this table, and directs the record-level I/O through the raw partition to the device driver, which reads or writes the data in blocks to the disk. The database application also handles security locking at the record level, to prevent multiple users updating the same record concurrently. Some other applications, especially those which stream large amounts of data to and from disk, also generate “raw I/O”.
Raw partitions can be totally optimized to the specific application or database (Oracle, UDB — formerly DB2, Sybase and so on), and tuned for its unique requirements to achieve optimal performance.
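The record-oriented approach described in this section can be sketched as a toy record store that keeps its own index and allocates blocks itself. This is only an illustration under stated assumptions: an ordinary preallocated file stands in for the raw partition (a real database would open a block device), the 512-byte block size and the class name `RawRecordStore` are hypothetical, each record is limited to one block, and record locking is omitted.

```python
BLOCK_SIZE = 512  # assumed block size in bytes

class RawRecordStore:
    """Toy record store that manages its own block allocation and index."""

    def __init__(self, path, num_blocks):
        # Preallocate the stand-in "partition" as a fixed range of zeroed blocks.
        with open(path, "wb") as f:
            f.write(b"\0" * (BLOCK_SIZE * num_blocks))
        self._dev = open(path, "r+b", buffering=0)  # unbuffered access
        self._index = {}                            # record key -> block number
        self._free = list(range(num_blocks))        # blocks not yet allocated

    def put(self, key, payload):
        """Write a record, allocating a block the first time the key is seen."""
        if len(payload) > BLOCK_SIZE:
            raise ValueError("record larger than one block")
        block = self._index.get(key)
        if block is None:
            block = self._free.pop(0)
            self._index[key] = block
        self._dev.seek(block * BLOCK_SIZE)
        self._dev.write(payload.ljust(BLOCK_SIZE, b"\0"))

    def get(self, key):
        """Read a record back by looking up its block in the index."""
        self._dev.seek(self._index[key] * BLOCK_SIZE)
        # Trailing zero padding is stripped, so payloads must not end in b"\0".
        return self._dev.read(BLOCK_SIZE).rstrip(b"\0")

    def close(self):
        self._dev.close()
```

The point of the sketch is that the application, not the File System, decides which block holds which record, and every read or write is a whole-block operation at an offset the application computes itself.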
2.7.3 Local and SAN attached storage block I/O summary
A fundamental characteristic of DAS and SAN implementations (unlike TCP/IP network storage devices) is that, regardless of whether the application uses “cooked” or “raw” I/O (that is, file system or block access), all I/O operations to the device are translated to storage protocol I/Os. That means they are formatted in the server by the database application, or by the operating system, into blocks that reflect the address and structure of the data on the physical disk device. The blocks are moved on the SCSI bus, or the Fibre Channel connection, to the disk device. Here they are mapped to a block table in the storage device I/O bus, and from there to the correct sector on the media. In mainframe parlance this is a channel I/O. A file system and a “raw partition” I/O are illustrated in Figure 2-8.
Figure 2-8 Tracing a local or Fibre Channel SAN block I/O