Setting up FACSIMILE

The experimental data were analysed using a computer program called FACSIMILE, developed by the United Kingdom Atomic Energy Authority at Harwell. When the analysis was first begun, two versions of the program were available. One ran on an IBM personal computer; the other ran on a mainframe and had at that time been adapted to the Amdahl 5890 computer in Manchester, running the IBM Conversational Monitor System (CMS). The mainframe was accessed via the Joint Academic Network (JANET), which links the universities in the UK so as to facilitate the exchange of data. Both of these versions were designed for computer experts and neither of them was user-friendly. The instruction manual is effectively incomprehensible to the non-mathematician and presented great problems, and a substantial amount of time was spent on writing and debugging programs to solve the kinetic problem.

The analysis was initially carried out on the mainframe computer. This had the advantage that the execution time was very short. However, access to the mainframe was very difficult, and the file editor used for writing and editing programs was very unsatisfactory. Thus, the mainframe version of FACSIMILE was replaced by the PC version. At that time, the PC version was written for use on an IBM XT machine, which uses the Intel 8088 microprocessor. The Intel 8088 is a second-generation microprocessor which recognises 16-bit words and which has an internal clock running at 6 MHz. The PC used for the analysis of the data was an IBM PS2 model 30/286; this uses an Intel 80286 processor, which also recognises 16-bit words but has a faster internal clock running at 10 MHz.

At this stage, severe problems arose because of the limited memory capacity of the IBM PS2 30/286 model, which had only secondary storage (i.e. the hard disk), 1 Mbyte of RAM and no cache. The operating system (which is the main communications medium between the user and the hardware) used at the time was DOS 3, which can only address 640 kbytes of RAM, part of which is taken up by DOS 3 itself. This leaves very little space for FACSIMILE to run. FACSIMILE itself is a big program, but its storage requirement does not end there. FACSIMILE is written in FORTRAN 77, and so it requires a FORTRAN 77 compiler for execution.

After accommodating DOS 3 and the FORTRAN 77 compiler in the RAM, there was insufficient memory left to hold the entire FACSIMILE program. The capacity of RAM had to be expanded, and this was done with the aid of a piece of software known as a memory manager, which is designed to obtain the best possible use of RAM (and cache, if installed). Some of the secondary memory is made to behave like RAM, so that large chunks of programs normally resident in RAM, such as DOS 3 and compilers, can be stored in such expanded memory, leaving more RAM available for the execution of application programs. However, despite this improved memory handling, there was still insufficient RAM to store the whole FACSIMILE program. Parts of it had to be stored on the hard disk, and the program then worked by cycling parts of the program and data to and from the hard disk.

The combination of the slowness of the Intel 80286 processor and the memory limitation had severe effects on the speed of calculations. A single calculation could take anywhere from one hour to more than five hours to complete. This did not appear to be a serious problem at first because it was thought that only a few runs would be required to analyse the data. However, this view soon changed as it was realised that writing a program to analyse the data is an evolutionary process, and at least a hundred runs were necessary to get the program to work. The processing and storage limitations of the computer therefore became a big handicap. The long calculation times proved to be an insurmountable difficulty in carrying out this evolutionary task because the slightest error in coding, for example missed or incorrect punctuation, might not come to light until the calculation was complete or had crashed several hours after it had been started.

At this point, a return to the mainframe computer was considered. Unfortunately this became impossible because the mainframe at Manchester had been changed to a UNIX system, and FACSIMILE could not run on this system. A great deal of time and effort was spent (by Dr. E. M. Chance) trying to adapt FACSIMILE to the UNIX system, but that adaptation was never completed. Eventually, one of the original authors of FACSIMILE, Dr. A. R. Curtis, revised it to run on the Intel 80386 or 80486 processor. In order to use this new PC version of FACSIMILE, the existing IBM PS2 30/286 model was upgraded with a 386 SX board (Kingston SX Now, from Kingston Technology Corporation, Fountain Valley, CA 92708, USA). The Kingston SX 386 board has a 16-bit 25 MHz microprocessor chip that is equivalent to the Intel 80386 chip and a 16 kbyte memory cache. At the same time, a 25 MHz Kingston 80387 SX maths co-processor was installed, and the RAM was upgraded from 1 Mbyte to 4 Mbytes by replacing the four 256 kbyte memory chips originally present in the computer with four 1 Mbyte memory chips. With an improved RAM capacity, a faster processor (80386 equivalent running at 25 MHz) working in conjunction with a 25 MHz maths co-processor, and a 16 kbyte cache, the calculation time was dramatically decreased. What used to take hours to run now requires only a few minutes. In practice, the cache makes a great difference to the run time of FACSIMILE. For instance, a fairly simple calculation which took 389 seconds of CPU time to complete without the cache took merely 143 seconds when the cache was installed. Another advantage that came with the upgrade to an IBM 386-equivalent was that the machine became compatible with DOS 5, an updated version of the DOS 3 operating system. DOS 5 has a file editor that is far superior to the one originally used. In practical terms, this means that writing and editing the programs became much easier and faster.

Although setting up a PC to run FACSIMILE is now relatively easy, writing a program to solve an experimental model using FACSIMILE still remains a difficult task for the non-computer expert. FACSIMILE is written in FORTRAN 77, so the investigator has to write the whole analysis program in FORTRAN 77. This covers declaring the parameters, defining the model, setting up arrays for the experimentally derived data, setting up work space for calculations, telling the computer when to execute the program and how to analyse the data (e.g. which data to include in the analysis, what error margin to allocate, and the maximum number of iterations), performing a simulation based on the calculated results, and specifying how to display the output of the analysis (e.g. as a graph or as a table). This is illustrated in the final part of this section, which describes how the solution to the problem of incorporation by Klenow fragment of thymine and of cytosine opposite O⁶-meG in the template DNA was arrived at. Before this is presented, it is necessary to understand how FACSIMILE works, and this is described in the following paragraphs.

What FACSIMILE does

The aim of the mathematical analysis of the experimental data is to determine all or most of the rate constants for all the steps involved in the polymerisation of thymine and of cytosine opposite O⁶-meG in the template strand. There are two ways of solving rate equations. The first is to solve the rate equations algebraically. For example, for a simple first-order reaction A → B, the differential rate equation is $d[A]/dt = -k[A]$, where $k$ is the rate constant. Upon integration, this becomes $[A] = [A]_0 e^{-kt}$, where $[A]_0$ is the initial concentration of A, and a value for $k$ can be obtained by measuring [A] at different reaction times $t$. However, this approach becomes very difficult with a multistep reaction. Consider a slightly more complex reaction scheme involving two steps:

$$A + B \underset{k_2}{\overset{k_1}{\rightleftharpoons}} A{\cdot}B \underset{k_4}{\overset{k_3}{\rightleftharpoons}} C$$

There are four rate constants, and determining all four would require solving the four simultaneous differential rate equations that govern this reaction:

$$\frac{d[A]}{dt} = k_2[A{\cdot}B] - k_1[A][B], \qquad \frac{d[B]}{dt} = k_2[A{\cdot}B] - k_1[A][B],$$

$$\frac{d[A{\cdot}B]}{dt} = k_1[A][B] - (k_2 + k_3)[A{\cdot}B] + k_4[C], \qquad \frac{d[C]}{dt} = k_3[A{\cdot}B] - k_4[C].$$

To solve these equations would require measurement of the rates of change of substrate, intermediate and product, which may be very difficult, especially if the transient intermediates are difficult to detect. A multistep reaction scheme involving more substrates and intermediates would be more difficult to solve by this approach. The sets of simultaneous rate equations can sometimes be simplified by making certain assumptions, but the assumptions have to be very carefully considered. For example, as pointed out before, Boosalis et al. (1987) made the assumption that the dissociation of polymerase from DNA during steady state polymerisation is reversible only for the substrate DNA, but not for product DNA, in order to reduce the number of unknown rate constants in their calculations. This assumption was not substantiated, and from the experiments on Klenow fragment and T7 DNA polymerase it is clear that it is not valid. In addition, one of the most common errors in the algebraic approach to solving rate equations is the occurrence of mistakes in the rate equation itself. As noted by Davis (1992), "kinetic equations are notorious for the number of typographical and algebraic errors that appear in published material", and one has to ensure that the rate equation used is correct before proceeding with further calculations.
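The four rate equations of the two-step scheme can be written as a single function, which also gives a simple guard against the kind of algebraic slip that Davis warns about. The sketch below is illustrative only: the function name, the concentrations and the rate constants are assumptions made for the example and are not taken from the experimental work. Because every molecule of A is present as free A, as the intermediate or as C, the three corresponding rates must sum to zero.

```python
def two_step_rates(a, b, ab, c, k1, k2, k3, k4):
    """Return (d[A]/dt, d[B]/dt, d[A.B]/dt, d[C]/dt) for A + B <=> A.B <=> C."""
    bind = k1 * a * b      # A + B -> A.B
    unbind = k2 * ab       # A.B -> A + B
    forward = k3 * ab      # A.B -> C
    reverse = k4 * c       # C -> A.B
    da = unbind - bind
    db = unbind - bind
    dab = bind - unbind - forward + reverse
    dc = forward - reverse
    return da, db, dab, dc


# Illustrative concentrations and rate constants (hypothetical values)
da, db, dab, dc = two_step_rates(1.0, 2.0, 0.3, 0.1, k1=5.0, k2=1.0, k3=0.2, k4=0.05)

# Conservation of A: free A, A.B and C together hold all of the A present,
# so their rates of change must cancel if the equations are written correctly.
balance_a = da + dab + dc
print(balance_a)
```

A non-zero balance would indicate a sign or subscript error in one of the equations.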

An alternative approach to solutions of kinetic problems is to integrate the rate equations numerically. This is particularly useful in cases where the kinetic scheme is unknown and involves many substrates, and has become more popular with the availability of the appropriate computer software. The principle of numerical integration can be illustrated by considering the reversible reaction:

$$A + B \underset{k_2}{\overset{k_1}{\rightleftharpoons}} C$$

The differential rate equation for this reaction is

$$\frac{d[A]}{dt} = k_2[C] - k_1[A][B].$$

For a very small time interval $\Delta t$, the following approximation holds: $d[A]/dt \approx \Delta[A]/\Delta t$.

Thus, we can write $\Delta[A] = k_2[C]\,\Delta t - k_1[A][B]\,\Delta t$.

If this is repeated many times over successive $\Delta t$ periods, then a profile of the concentration of A over time $t$ can be obtained by summing the $\Delta[A]$ values:

$$[A] = [A]_0 + \sum_{t=0}^{t} \Delta[A] = [A]_0 + \sum_{t=0}^{t} \left(k_2[C]\,\Delta t - k_1[A][B]\,\Delta t\right)$$

where $[A]_0$ is the initial concentration of A. This method of numerical integration is known as Euler's method. Using the law of conservation of mass, we can write the following equations for B and C:

$$[B] = [B]_0 + \sum_{t=0}^{t} \Delta[A] \qquad \text{and} \qquad [C] = [C]_0 - \sum_{t=0}^{t} \Delta[A]$$

By putting in values for $k_1$ and $k_2$, we can then work out a profile for the concentrations of A, B and C with time, and if we then compare these calculated profiles with those experimentally determined, we are in a position to see how close the values of $k_1$ and $k_2$ chosen for that particular calculation are to the actual values. Thus, the principle of applying numerical integration to solve kinetic schemes is to simulate the profile of the reactants, intermediates and products over the experimental time course and compare this with the observed data. By changing the values of the rate constants, the profile of the simulated curve is changed. The simulated curves are optimized in repeated simulations by using least squares fit analysis.

For a simple reaction scheme this can be done quite easily, but when a complex reaction scheme is encountered, then there will be many rate laws and their integration becomes exceedingly tedious. This is where the power of the computer comes in. With the help of a computer, even the most complex rate laws can be integrated numerically.
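Euler's method for the reversible reaction A + B ⇌ C can be sketched in a few lines. The function name, the starting concentrations, the rate constants and the step size below are hypothetical values chosen for illustration; the update rule itself follows the expressions for Δ[A], [B] and [C] given above.

```python
def euler_profile(a0, b0, c0, k1, k2, dt, n_steps):
    """Integrate A + B <=> C by Euler's method; return a list of (t, [A], [B], [C])."""
    a, b, c = a0, b0, c0
    profile = [(0.0, a, b, c)]
    for step in range(1, n_steps + 1):
        delta_a = (k2 * c - k1 * a * b) * dt   # Delta[A] for this interval
        a += delta_a
        b += delta_a      # B is consumed and released together with A
        c -= delta_a      # every A that disappears is found in C
        profile.append((step * dt, a, b, c))
    return profile


# Hypothetical values: 5000 steps of 0.001 time units
profile = euler_profile(a0=1.0, b0=1.0, c0=0.0, k1=2.0, k2=0.5, dt=0.001, n_steps=5000)
t_end, a_end, b_end, c_end = profile[-1]
print(t_end, a_end, b_end, c_end)
```

By the end of the run the forward and reverse rates should be nearly equal, which is a convenient check that the simulated profile has approached equilibrium.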

The accuracy of numerical integration depends on the integration interval $\Delta t$; the smaller the integration interval, the better the approximation. This is particularly important in reactions where the reaction rates vary markedly over a small time interval. Such reactions are described as "stiff" systems, and in general they occur when the values of the rate constants are very different in magnitude. For example, the intermediate in a reaction could be formed very quickly but converted into product slowly, so the system is very stiff initially because of the rapid formation of the intermediate (Barshop et al., 1983). To solve differential equations that are very stiff would mean that very small integration intervals have to be used in Euler's method, and this poses a great disadvantage because the large number of calculations with very small $\Delta t$ would take up long computer execution time. FACSIMILE overcomes the problem of stiffness by using Gear's method, where the integration interval is adjusted according to the stiffness: a small $\Delta t$ is used for integration when the system is stiff, but as the stiffness decreases, $\Delta t$ is increased accordingly in order to save computing time (Chance & Curtis, 1970). The reader is referred to the paper by Chance & Curtis for details of this method.
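The difficulty that stiffness causes for Euler's method can be seen with a single fast first-order decay, $dy/dt = -ky$. The sketch below is not FACSIMILE's algorithm: it compares ordinary (explicit) Euler with backward Euler, the simplest implicit formula of the backward-differentiation family on which Gear's method is built, and it uses a fixed step rather than the adjustable one described above. The rate constant and step size are hypothetical.

```python
def explicit_euler_decay(y0, k, dt, n_steps):
    """Ordinary Euler for dy/dt = -k*y: the new value uses the old slope."""
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-k * y)
    return y


def backward_euler_decay(y0, k, dt, n_steps):
    """Backward Euler for dy/dt = -k*y: solves y_new = y + dt * (-k * y_new)."""
    y = y0
    for _ in range(n_steps):
        y = y / (1.0 + k * dt)
    return y


k_fast = 1000.0   # hypothetical fast rate constant
# The step is ten times longer than 1/k, which is far too coarse for explicit Euler
y_explicit = explicit_euler_decay(1.0, k_fast, dt=0.01, n_steps=20)
y_implicit = backward_euler_decay(1.0, k_fast, dt=0.01, n_steps=20)
print(y_explicit, y_implicit)
```

With this step the explicit result oscillates and grows without bound, whereas the implicit result decays towards zero as the true solution does. This is why an implicit method can afford larger steps on a stiff system.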

Like many other programs that fit mathematical models to experimental data, FACSIMILE uses non-linear regression analysis to do this. Regression analysis uses the least squares criterion. The residual sum of squares of the errors (RSQS), where an error is the difference between a calculated and an observed value, is first calculated:

$$\mathrm{RSQS} = \sum \mathrm{error}^2 = \sum (\mathrm{calculated} - \mathrm{observed})^2$$

The smaller the RSQS, the better the fit. The line of best fit, or the regression line, is the line for which the RSQS is minimum, and therefore the RSQS is minimised in order to find the rate equation that best fits the given data. How is the RSQS minimised? For every rate equation integrated, the RSQS is calculated; then a change is made to the rate equation and the process is repeated. In FACSIMILE, the change is made to the natural logarithm of the specified parameter, which must therefore have a positive value. If a positive initial estimate is not given, a default value of 1 is assigned. If the RSQS is smaller than the previous one, then there is an improvement of the fit. If, however, the RSQS gets bigger, then the direction of the change is reversed to minimise the RSQS. The process is repeated until the RSQS reaches a minimum, which indicates that the corresponding rate equation is the one that best fits the data. As pointed out by Smith (1992), a helpful way of visualizing non-linear regression analysis is to consider a topographical analogy of a two-parameter model: if we plot the RSQS against all the parameter values, then the lowest point of the resulting response surface represents the minimum RSQS (Fig. 3.36); the aim of non-linear regression analysis is to find this minimum, and the process is sometimes described as a convergence to the minimum.
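The idea of changing the logarithm of a parameter and reversing direction when the RSQS gets worse can be sketched for a single rate constant. This is a deliberately simple search written for illustration and should not be read as FACSIMILE's actual optimiser; the function names, the halving of the step on reversal, the tolerance and the synthetic data are all assumptions. The starting value of 1 mirrors the default mentioned above.

```python
import math


def rsqs(k, times, observed, a0):
    """Residual sum of squares between the model [A] = a0*exp(-k*t) and the data."""
    return sum((a0 * math.exp(-k * t) - obs) ** 2 for t, obs in zip(times, observed))


def fit_log_k(times, observed, a0, k_start=1.0, step=0.5, tol=1e-8):
    """Search over ln k: keep moves that lower the RSQS, otherwise reverse and shrink."""
    log_k = math.log(k_start)          # working in ln k keeps k positive
    best = rsqs(math.exp(log_k), times, observed, a0)
    while abs(step) > tol:
        trial = rsqs(math.exp(log_k + step), times, observed, a0)
        if trial < best:
            log_k, best = log_k + step, trial   # improvement: accept the change
        else:
            step = -step / 2.0                  # worse: reverse with a smaller step
    return math.exp(log_k), best


# Synthetic, noise-free data generated with k = 0.3 (hypothetical values)
times = [0.0, 0.5, 1.0, 2.0, 4.0]
observed = [2.0 * math.exp(-0.3 * t) for t in times]
k_fit, rsqs_min = fit_log_k(times, observed, a0=2.0)
print(k_fit, rsqs_min)
```

With one parameter the response surface is a single valley, so this search settles on the minimum; with several parameters a more systematic procedure is needed.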

[Figure 3.36: response surface; axes RSQS, Parameter 1 and Parameter 2.]

Figure 3.36. A topographical analogy of non-linear regression showing convergence to the minimum (taken from Smith, K. in Laboratory Equipment Digest, February 1992).

The model in Figure 3.36 is a two-parameter model. A model involving many more parameters would be much more complicated, and there is always the possibility that more than one minimum exists in such a complex model. Under these circumstances, application of suitable constraints will help in deciding which minimum is the true minimum for the system. This is discussed later.

How is the RSQS minimised in a complicated system where many rate constants need to be optimized? In FACSIMILE, the parameter fitting process is divided into two stages. In the first stage, simulation runs are carried out with different parameter values to find the minimum RSQS, but only one parameter is varied in each simulation. The first run uses the initial estimates of the parameters provided by the user. In the next p runs (where p is the number of independent parameters to be optimized) each of the parameters is varied in turn, and a sensitivity matrix is constructed which relates the sensitivity of the RSQS to the parameter being varied. If a change in a particular parameter lowers the RSQS, then the initial estimate for that parameter is replaced by the parameter value which produces the smaller RSQS. Then iterative minimisation of the RSQS is carried out, using and updating the sensitivity matrix (FACSIMILE User's Manual, 1988).
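The (p + 1) runs of the first stage can be sketched as follows. The layout of FACSIMILE's sensitivity matrix is not specified here, so the sketch assumes a common form: each entry is the change in one calculated value per unit change in the natural logarithm of one parameter, estimated by finite differences. The model, the function names and the step size are hypothetical.

```python
import math


def model(params, times):
    """Hypothetical two-parameter model: [A](t) = a0 * exp(-k * t)."""
    a0, k = params
    return [a0 * math.exp(-k * t) for t in times]


def sensitivity_matrix(params, times, log_step=1e-6):
    """Run the model once at the estimates, then once per parameter (p + 1 runs)."""
    base = model(params, times)                       # run 1: initial estimates
    columns = []
    for j in range(len(params)):                      # runs 2 to p + 1
        varied = list(params)
        varied[j] = params[j] * math.exp(log_step)    # change ln(parameter j) only
        run = model(varied, times)
        columns.append([(r - b) / log_step for r, b in zip(run, base)])
    # Transpose so that row i is a time point and column j is a parameter
    return [list(row) for row in zip(*columns)]


times = [0.0, 1.0, 2.0]
sens = sensitivity_matrix([2.0, 0.5], times)   # parameters: a0 = 2.0, k = 0.5
for row in sens:
    print(row)
```

For this model the first column should be close to the calculated values themselves, and the second close to $-kt$ times those values, which gives a quick check on the finite-difference estimates.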

The second stage of FACSIMILE's parameter fitting process begins only after the non-linear regression analysis shows no further convergence to the minimum. In the first stage, the sensitivity matrix is built from simulations with initial estimates of the parameters. In the second stage, the aim is to recalculate the sensitivity matrix using the best parameter values obtained in the first stage. Thus, another (p + 1) simulations are carried out, but this time the best parameter values from stage 1 are used, and one parameter is varied at a time.

Statistical tests to determine the goodness of fit are also carried out in the second stage of parameter fitting. Although the aim in the analysis of the experimental data is to obtain values for the rate constants, it is equally important to know how accurate the determined values are. After performing the optimization, FACSIMILE carries out a series of statistical analyses to see how well the computed solution fits the experimental data. First, FACSIMILE can estimate a range within which the RSQS should fall if the proposed model is correct and if the residuals are due to random experimental errors. It also calculates the following indices:

• CORI: a correlation index of the residuals. One can tell if the predicted curve has a good fit to the observed data points by looking at the distribution of the data points relative to the predicted curve: a curve which fits the data points well should have the data points evenly distributed above and below the calculated values. The value of |CORI| gives an indication of the distribution of the data points, and |CORI| should be close to 1 if the residuals are unbiased. If |CORI| is much bigger than 1, then the data points are biased one way.

• AUCO: an auto-correlation index. This indicates whether the residuals are random; if they are, then AUCO will be similar to AUCR.

Details of the statistical analyses can be found in the FACSIMILE/CHEKMAT User's Manual.
