
DEC OSF/1 Version 3.0 Symmetric Multiprocessing Implementation

The code base on which the DEC OSF/1 product is built, i.e., the Open Software Foundation's OSF/1 software, provides a strong foundation for SMP. The OSF further strengthened this foundation in OSF/1 versions 1.1 and 1.2, when it corrected multiple SMP problems in the code base and parallelized (and thus unfunneled) additional subsystems. As the multiprocessing bootstrap effort continued, the team analyzed and incorporated the OSF/1 version 1.2 SMP improvements into DEC OSF/1 version 3.0. As strong as this starting point was, however, some structures in the system did not receive the appropriate level of synchronization. The team corrected these problems as they were uncovered through testing and code inspection.

The DEC OSF/1 operating system uses a combination of simple locks, complex locks, elevated SPL, and funneling to guarantee synchronized access to system resources and data structures. Simple locks, SPL, and funneling were described briefly in the earlier discussion of preemption. Complex locks, like elevated SPL, are used in both uniprocessor and multiprocessor environments. These locks are usually sleep locks (threads can block while they wait for the lock), which offer additional features, including multiple-reader/single-writer access and recursive acquisition.
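The multiple-reader/single-writer behavior of a sleep lock can be sketched in user space as follows. This is only an illustration built on Python's `threading.Condition`; the class name `ComplexLock` and its methods are hypothetical and are not the DEC OSF/1 kernel interface. Recursive acquisition is left out, and the sketch makes no attempt to prevent writer starvation.

```python
import threading


class ComplexLock:
    """Hypothetical sleep lock: many concurrent readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self.readers = 0      # threads currently holding the lock for read
        self.writer = False   # True while one thread holds the lock for write

    def lock_read(self):
        with self._cond:
            # Block (sleep) instead of spinning while a writer holds the lock.
            while self.writer:
                self._cond.wait()
            self.readers += 1

    def lock_write(self):
        with self._cond:
            # A writer needs exclusive access: no readers, no other writer.
            while self.writer or self.readers:
                self._cond.wait()
            self.writer = True

    def try_lock_write(self):
        """Non-blocking variant: return False instead of sleeping."""
        with self._cond:
            if self.writer or self.readers:
                return False
            self.writer = True
            return True

    def unlock(self):
        with self._cond:
            if self.writer:
                self.writer = False
            else:
                self.readers -= 1
            # Wake all sleepers so each can re-check its wait condition.
            self._cond.notify_all()
```

Two readers can hold the lock at once, while a writer sleeps until the last reader calls `unlock()`.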

An example of the use of each synchronization technique follows:

• A simple lock is used to protect the kernel's callout (timer) queue. In an SMP environment, multiple threads can update the callout queue at the same time, as each of them adds a timer entry to the queue. Each thread must obtain the callout lock before adding an entry and release the lock when done. The callout simple lock is also a good example of SPL synchronization under multiprocessing because the callout queue is scanned by the system clock ISR. Therefore, before locking the callout lock, a thread must raise the SPL to the clock's IPL. Otherwise, the thread holding the callout lock at an SPL of zero can be interrupted by the clock ISR, which will in turn attempt to take the callout lock. The result is a permanent deadlock.

• A complex lock protects the file system directory structure. A blocking lock is required because the directory lock holder must perform I/O to update the directory, which itself can block. Whenever blocking can occur while a lock is held, a complex lock is required.

• Funneling is used to synchronize access to the ISO 9660 CD-ROM file system. The decision to funnel this file system was largely due to limitations in the DEC OSF/1 version 3.0 schedule; however, the file system is a good choice for funneling because of its generally slow operation and light usage.
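The callout-lock deadlock in the first example can be made concrete with a toy, single-CPU model. Everything here is an assumption made for illustration: the `Cpu` class, the numeric SPL levels, and the names `splraise`, `splx`, and `add_timeout` are hypothetical stand-ins rather than DEC OSF/1 kernel routines, and where a real ISR would spin forever on the held lock, the model tries once and counts a deadlock.

```python
import threading

SPL_ZERO = 0
SPL_CLOCK = 5   # assumed numeric level standing in for the clock's IPL


class Cpu:
    """Toy model: an interrupt is delivered only if the SPL is below its IPL."""

    def __init__(self):
        self.spl = SPL_ZERO
        self.callout_lock = threading.Lock()   # stands in for the simple lock
        self.callout_queue = []
        self.pending_clock = False
        self.deadlocks = 0   # times the ISR found the lock held by the thread it interrupted

    def splraise(self, level):
        old, self.spl = self.spl, level
        return old

    def splx(self, old):
        self.spl = old
        # Deliver a held-off clock interrupt once the SPL drops far enough.
        if self.pending_clock and self.spl < SPL_CLOCK:
            self.pending_clock = False
            self.clock_isr()

    def clock_interrupt(self):
        if self.spl >= SPL_CLOCK:
            self.pending_clock = True   # masked until the SPL is lowered
        else:
            self.clock_isr()

    def clock_isr(self):
        # A real ISR would spin here forever; the model records the deadlock.
        if not self.callout_lock.acquire(blocking=False):
            self.deadlocks += 1
            return
        try:
            self.callout_queue.clear()   # pretend every queued timer expired
        finally:
            self.callout_lock.release()


def add_timeout(cpu, entry, raise_spl):
    """Add a timer entry; a clock tick arrives while the lock is held."""
    old = cpu.splraise(SPL_CLOCK) if raise_spl else cpu.spl
    cpu.callout_lock.acquire()
    cpu.callout_queue.append(entry)
    cpu.clock_interrupt()
    cpu.callout_lock.release()
    cpu.splx(old)
```

Calling `add_timeout(cpu, entry, raise_spl=False)` lets the modeled ISR run while the lock is held, so `cpu.deadlocks` becomes 1. With `raise_spl=True` the tick is held off until `splx` lowers the SPL, after the lock has been released.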

To ensure adequate performance and scaling as processors are added to the system, an SMP implementation must provide for as much parallelism through the kernel as possible. The granularity of locks placed in the system has a major impact on the amount of parallelism obtained.

Digital Technical Journal Vol. 6 No. 3 Summer 1994

During multiprocessing development, locking strategies were designed to

• Reduce the total number of locks per subsystem

• Reduce the number of locks taken per subsystem operation

• Improve the level of parallelism throughout the kernel

At times, these goals clashed: enhancing parallelism usually involves adding a lock to some structure or code path. This outcome conflicts with the goal of reducing lock counts. Consequently, in practice, the process of successfully parallelizing a subsystem involves striking a balance between lock reduction and the resulting increase in lock granularity. Often, benchmarking different approaches is required to fine-tune this balance.
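The tension between lock count and parallelism can be seen in a small sketch that compares one table-wide lock with one lock per hash bucket. The classes `CoarseTable` and `FineTable` and the helper `can_update_concurrently` are hypothetical names chosen for this illustration; they do not correspond to any structure in the DEC OSF/1 kernel.

```python
import threading


class CoarseTable:
    """One lock for the whole table: fewest locks, no parallel updates."""

    def __init__(self, nbuckets=8):
        self.buckets = [dict() for _ in range(nbuckets)]
        self.locks = [threading.Lock()]

    def lock_for(self, key):
        return self.locks[0]

    def insert(self, key, value):
        with self.lock_for(key):
            self.buckets[hash(key) % len(self.buckets)][key] = value


class FineTable(CoarseTable):
    """One lock per bucket: more locks, but different buckets update in parallel."""

    def __init__(self, nbuckets=8):
        super().__init__(nbuckets)
        self.locks = [threading.Lock() for _ in range(nbuckets)]

    def lock_for(self, key):
        return self.locks[hash(key) % len(self.locks)]


def can_update_concurrently(table, key_a, key_b):
    """Return True if key_b's lock is free while key_a's lock is held."""
    with table.lock_for(key_a):
        other = table.lock_for(key_b)
        if other.acquire(blocking=False):
            other.release()
            return True
        return False
```

With integer keys 0 and 1, `can_update_concurrently` returns False for `CoarseTable` and True for `FineTable`, at the cost of eight locks instead of one. Keys that hash to the same bucket still serialize, which is why benchmarking is needed to pick the granularity.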

Several general trends were uncovered during lock analysis and tuning. In some cases locks were removed because they were not needed; they were the products of overzealous synchronization. For example, a structure that is private to a thread may require no locking at all. Moreover, a data element that is read atomically needs no locking. An example of lock removal is the gettimeofday() system call, which is used frequently by DBMS servers. The system call simply reads the system time, a 64-bit quantity, and copies it to a buffer provided by the caller. The original OSF/1 system call, running on a 32-bit architecture, had to take a simple lock before reading the time to guarantee a consistent value. On the Alpha architecture, the system call can read the entire 64-bit time value atomically. Removing the lock resulted in a 40 percent speedup.
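Why a 32-bit machine needs the lock can be shown with a deterministic simulation of a torn read. The sketch below is a model only: `SplitClock` and `torn_read_demo` are hypothetical names, and the interleaving of reader and writer is written out by hand rather than produced by real concurrency.

```python
MASK32 = 0xFFFFFFFF


class SplitClock:
    """A 64-bit time value stored as two 32-bit words, as on a 32-bit machine."""

    def __init__(self, value):
        self.lo = value & MASK32
        self.hi = value >> 32

    def store(self, value):
        # Two separate 32-bit stores; a reader can run between or around them.
        self.lo = value & MASK32
        self.hi = value >> 32


def torn_read_demo():
    old = 0x00000001_FFFFFFFF
    new = old + 1                 # carries into the high word: 0x00000002_00000000
    clock = SplitClock(old)

    hi = clock.hi                 # reader loads the high word (still the old one)
    clock.store(new)              # writer updates the time between the two loads
    lo = clock.lo                 # reader loads the low word (already the new one)

    seen = (hi << 32) | lo        # value the unlocked reader assembles
    return old, new, seen
```

Here the unlocked reader assembles `0x00000001_00000000`, which is neither the old nor the new time. A single atomic 64-bit load, as described for Alpha, can only return one of the two real values, so the lock becomes unnecessary.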

In other cases, analyzing how structures are used revealed that no locking was needed. For example, an I/O control block called the buf structure was being locked in several device drivers while the block was in a state that allowed only the device driver to access it. Removing these unnecessary locks saved one complex and one simple locking sequence per I/O operation in these drivers.

Another effective optimization involved postponing locking until a thread determined that it had actual work to do. This technique was used successfully in a routine frequently called in a transaction processing benchmark. The routine, which was locking structures in anticipation of following a rarely used code path, was modified to lock only when the uncommon code path was needed. This optimization significantly reduced lock overhead.

To improve parallelism across the system, the DEC OSF/1 SMP development team modified the lock strategies in numerous other cases.
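The lock-postponing optimization described above can be sketched as a before-and-after pair. The routine names, the `rare` flag, and the `CountingLock` wrapper are all hypothetical; the source does not identify the actual kernel routine. The sketch assumes the request being tested is private to the calling thread, so that checking it without the lock is safe.

```python
import threading


class CountingLock:
    """threading.Lock wrapper that counts acquisitions (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1   # updated while the lock is held
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def process_eager(request, lock, rare_log):
    # Before: take the lock on every call, in case the rare path is needed.
    with lock:
        if request.get("rare"):
            rare_log.append(request["id"])
    return request["id"]


def process_deferred(request, lock, rare_log):
    # After: the request is thread-private, so test it without the lock and
    # acquire the lock only when the uncommon path really has work to do.
    if request.get("rare"):
        with lock:
            rare_log.append(request["id"])
    return request["id"]
```

For a workload in which 1 of 100 requests takes the rare path, `process_eager` acquires the lock 100 times and `process_deferred` acquires it once, while both record the same entry in `rare_log`.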
