Chapter 2: Problem Statement

2.5. Justification

The NIR spectroscopy algorithms used to “interpret” optical data for absorbing samples may be explained as different approaches to relating sample absorbance (A) at specific wavelengths to analyte concentrations via Beer’s law. To continue:

$$A = Mcd \qquad (7.1)$$

where A = absorbance (optical density), M = molar absorptivity, c = molar concentration of absorber, and d = sample path length, and thus

$$c = \frac{A}{Md} \qquad (7.2)$$
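As a minimal numerical sketch of Equation (7.2), the snippet below solves Beer's law for concentration. The function name and the input values are illustrative assumptions, not figures from the text.

```python
def concentration_from_absorbance(absorbance, molar_absorptivity, path_length):
    """Equation (7.2): c = A / (M d)."""
    return absorbance / (molar_absorptivity * path_length)


# Hypothetical values: A = 0.50, M = 250 L mol^-1 cm^-1, d = 1.0 cm
c = concentration_from_absorbance(0.50, 250.0, 1.0)
print(c)  # molar concentration in mol/L
```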

So the multiregression equation commonly used for calibration is

$$Y = B_0 + \sum_{i=1}^{N} B_i(-\log R_i) + E \qquad (7.3)$$

where Y = percent concentration of absorber, $B_0$ = intercept from regression, $B_i$ = regression coefficient, i = index of the wavelength used and its corresponding reflectance ($R_i$), N = total number of wavelengths used in regression, and E = random error.

This is actually a form of Beer’s law with each B term containing both path length and molar absorptivity (extinction coefficient) terms. Most simply, the concentration is related to the optical data as

$$\text{Conc.} = \frac{\text{Change in concentration}}{\text{Change in absorbance}} \times \text{Absorbance} + \text{Some error}$$

or

$$\text{Conc.} = K \times \text{Absorbance} + \text{Some error}$$

Thus K, the regression coefficient, is equal to the change in concentration divided by the change in absorbance. Therefore, if there is a large change in concentration for the calibration set with a relatively large change in absorbance, the regression coefficients tend to stay small, indicating a large sensitivity and signal-to-noise ratio. In contrast, if we have a large concentration change relative to a small change in absorbance, the regression coefficients tend to be large and indicate low sensitivity and low signal-to-noise. This simple model applies whether a univariate (one-wavelength) or multivariate (many-wavelength) regression is performed.
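The effect of the size of K can be sketched numerically. In the linear model above, an error in the absorbance reading is multiplied by K when it is converted to concentration, so a larger K amplifies the same optical noise. All numbers below are hypothetical and chosen only for illustration.

```python
def regression_coefficient(delta_conc, delta_abs):
    """K = change in concentration / change in absorbance."""
    return delta_conc / delta_abs


absorbance_noise = 0.001  # hypothetical noise level in absorbance units

# Same 10% concentration span, two different absorbance spans.
k_sensitive = regression_coefficient(10.0, 0.5)     # large absorbance change
k_insensitive = regression_coefficient(10.0, 0.05)  # small absorbance change

# Concentration error caused by the same absorbance noise in each case.
error_sensitive = k_sensitive * absorbance_noise
error_insensitive = k_insensitive * absorbance_noise
print(k_sensitive, error_sensitive)
print(k_insensitive, error_insensitive)
```

The second case has a coefficient about ten times larger, and the same absorbance noise produces about ten times the concentration error.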

If calibrations are performed for the simplest case involving one analyte per wavelength and no instrument error (i.e., no instrument drift with time, or noise), Equation (7.3) could be used with a single term.

$$Y = B_1(-\log R) \qquad (7.4)$$

By knowing Y, and measuring −log R at each wavelength i, Equation (7.4) could be used to solve for B. However, real-world complications exist, such as instrument noise, drift, nonlinearity between optical data and analyte concentration (deviations from Beer’s law), scattering, nonlinear dispersion, error in reference laboratory results, physical property variations in samples, chemically unstable samples, band overlap, band broadening, and sampling errors. If ideal conditions are approximated, that is, if (a) noise is assumed to be stochastic (random) and not unidirectional; (b) precautions are taken to optimize signal-to-noise ratios; (c) reference readings at each wavelength are collected to negate drift considerations; and (d) excellent laboratory technique and sample selection protocol are followed, then Equation (7.3) may be used to solve for B0 and Bi. This multilinear regression is a mathematical model relating the absorbance of several samples to their analyte concentration (as previously determined via primary wet chemical methods). The regression provides a best-fit linear model for absorbance vs. analyte concentration, and the mathematical model minimizes the sum of the squared residuals (distances) from each data point to the regression line (termed Y estimate). Note that Beer’s law is a rigorously derived model applying only to transmission spectroscopy.
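The multilinear regression of Equation (7.3) can be sketched with an ordinary least-squares solve. This is only an illustration on synthetic data: the "true" coefficients, noise level, and sample counts are assumptions invented for the example, and `numpy.linalg.lstsq` stands in for whatever regression software is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 30, 3

# Synthetic -log R readings at three wavelengths (hypothetical data).
neg_log_r = rng.uniform(0.2, 1.0, size=(n_samples, n_wavelengths))

# Hypothetical "true" intercept and coefficients used to simulate Y.
true_b0 = 1.5
true_b = np.array([4.0, -2.0, 0.5])
y = true_b0 + neg_log_r @ true_b + rng.normal(0.0, 0.01, size=n_samples)

# Design matrix with a leading column of ones for the intercept B0.
X = np.column_stack([np.ones(n_samples), neg_log_r])

# Least squares minimizes the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b = coef[0], coef[1:]
residuals = y - X @ coef

print("B0:", b0)
print("Bi:", b)
print("sum of squared residuals:", float(residuals @ residuals))
```

At the least-squares solution the residuals are orthogonal to every column of the design matrix, which is a convenient check that the fit has converged.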

Beer’s law applies where refractive index, scattering, and specular reflection at “infinite–finite” numbers of surfaces all obey the Fresnel formulas. A definitive reflectance theory does not exist, as the convolution of an infinite number of integrals would be required to describe all the combined light interaction effects at all surfaces under varying conditions. Thus, Beer’s law is often shown to illustrate the properties of NIR spectroscopy for lack of an ideal model.

So how does one generate a calibration? Let us go back to basics and observe that (ideally) one can find a suitable wavelength for any given substance wherein absorption is proportional to concentration (Beer’s law). But as we noted, the real world is often nonideal, and absorbances deviate from Beer’s law at higher concentrations, most often because of nonlinearity of detection systems, scattering effects (which are wavelength-dependent), and nonlinearity caused by stray light. Note that spectra of compounds with absorbance bands narrower than the instrument resolution can demonstrate substantial nonlinearities due to the stray light characteristics of the instrument.

Even when one restricts measurements to lower concentrations where the relationship between change in absorbance and concentration is linear, the calibration line seldom passes through the origin. Accordingly, even for single components, an offset is most often observed. Derivative math pretreatments are often used to reduce the nonzero bias, yet derivatives do not remove multiplicative error and nonlinearities. The offset occurs as a compensation for background interference and scattering. In a practical sense, most of the previous considerations do not matter except in an academic sense. This is because the practical tools used in NIR are mostly based on empirical calibration methods where calibration techniques are well understood. Calibration equations using multivariate techniques are used to compensate for the full variety of common variations found in “noisy” chemical values and imperfect instrumental measurements. This is why properly formulated calibration models work extremely well despite the imperfect world of the analyst.

If a set of samples is analyzed with high precision by some reference (standard) method so the analyte concentrations are known, they can be used as a “teaching set” to generate an equation suitable for subsequent predictions. To be a good teaching set, the samples must evenly span the concentration range of interest. There is analytical “danger” in developing calibrations using sample sets with uneven constituent distributions as the mathematical calibration model will most closely fit the majority of samples in the calibration set. Therefore, a calibration model will be weighted to most closely fit samples at, or approximately at, the mean concentration value. Conversely, an evenly distributed calibration set will equally weight the calibration model across the entire concentration range. A properly developed calibration model will perform most accurately for samples at high and low concentration ranges when the calibration set is evenly distributed.

Note that real calibration points do not lie exactly on a straight line but are removed from it by some distance (called a residual). With a mathematical treatment known as a linear regression, one can find the “best” straight line through these real-world points by minimizing the residuals. This line is known as the calibration line, and its equation can be used to determine the concentration of unknown samples. If we use principal components regression (PCR) or partial least-squares (PLS), we are regressing the sample scores rather than the optical data directly. A review of calibration problems encountered using standard spectroscopic techniques is outlined in the following paragraphs.
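To illustrate what "regressing the sample scores" means, here is a minimal PCR sketch: the mean-centered spectra are projected onto their leading principal components, and the concentrations are regressed on the resulting scores. The helper names `pcr_fit` and `pcr_predict`, the Gaussian band shape, and all data are hypothetical; production work would normally use a validated chemometrics package.

```python
import numpy as np


def pcr_fit(spectra, y, n_components):
    """Fit PCR: regress y on the scores of the first principal components."""
    x_mean = spectra.mean(axis=0)
    y_mean = y.mean()
    centered = spectra - x_mean
    # Rows of vt are the principal component loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T      # (n_wavelengths, n_components)
    scores = centered @ loadings        # (n_samples, n_components)
    q, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, loadings, q


def pcr_predict(model, spectra):
    """Project new spectra onto the loadings and apply the score regression."""
    x_mean, y_mean, loadings, q = model
    return y_mean + (spectra - x_mean) @ loadings @ q


# Synthetic one-analyte data: a Gaussian band scaled by concentration.
rng = np.random.default_rng(1)
conc = np.linspace(1.0, 10.0, 20)
band = np.exp(-0.5 * ((np.arange(50) - 25) / 5.0) ** 2)
spectra = np.outer(conc, band) + rng.normal(0.0, 0.001, size=(20, 50))

model = pcr_fit(spectra, conc, n_components=1)
predicted = pcr_predict(model, spectra)
print(np.max(np.abs(predicted - conc)))
```

One component suffices here only because the synthetic data contain a single source of spectral variation; real sample sets generally need more.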

7.2.2 MAJOR ERROR SOURCES IN NIRS

In reflection of radiation at solid matte surfaces, diffuse and specularly reflected energies are superimposed. The intensity of the diffusely reflected energy is dependent on the angles of incidence and observation, but also on the sample packing density, sample crystalline structure, refractive index, particle size distribution, and absorptive qualities. Thus, an ideal diffusely reflecting surface can only be approximated in practice, even with the finest possible grinding of the samples. There are always coherently reflecting surface regions acting as elementary mirrors whose reflection obeys the Fresnel formulas. Radiation returning to the surface of the sample from its interior can be assumed to be largely isotropic and should thus fulfill the requirements of the Lambert law. The assumption is made that radiant energy is continuously removed from the incident NIR beam and converted to thermal vibrational energy of atoms and molecules. The decrease in the intensity of the diffusely reflected light is dependent on the absorption coefficient of the sample. The absorption coefficient (K), when taken as the ratio K/S, where S is the scattering coefficient, is proportional to the quantity of absorbing material in the sample.

Utilizing the Kubelka–Munk (K–M) theory we then can relate the reflectance (R) to the absorption (K) and the scattering coefficient (S) by the equation:

$$\frac{K}{S} = \frac{(1 - R)^2}{2R} = F(R)$$

It may be stated that R, the diffuse reflectance, is a function of the ratio K/S, and that K/S is proportional to the addition of the absorbing species to the reflecting sample medium. On these relationships is based the assumption that the diffuse reflectance of an incident beam of radiation is directly proportional to the quantity of absorbing species interacting with the incident beam, and so R depends on analyte concentration.

NIR spectroscopic theory does not have to assume a linear relationship between the optical data and constituent concentration, as data transformations or pretreatments are used to linearize the reflectance data. The most used linear transforms include log(1/R) and K–M as math pretreatments.
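The two pretreatments named above can be written in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, reflectance is taken as a fraction between 0 and 1 (exclusive of 0), and log(1/R) is taken as base 10, which is the usual convention for absorbance-like units.

```python
import numpy as np


def kubelka_munk(reflectance):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2R) = K/S."""
    r = np.asarray(reflectance, dtype=float)
    return (1.0 - r) ** 2 / (2.0 * r)


def log_inverse_reflectance(reflectance):
    """log(1/R), assuming a base-10 logarithm."""
    r = np.asarray(reflectance, dtype=float)
    return -np.log10(r)


# Hypothetical reflectance values: strong absorber, moderate, perfect reflector.
r = np.array([0.1, 0.5, 1.0])
print(kubelka_munk(r))
print(log_inverse_reflectance(r))
```

Both transforms go to zero for a perfect reflector (R = 1) and increase as reflectance falls, but K–M grows much faster at low reflectance.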

Calibration equations can be developed that compensate to some extent for the nonlinear relationship between analyte concentrations and log(1/R) or K–M-transformed data. PCR, PLS, and multilinear regression can be used to compensate for the nonlinearity.

If a matrix absorbs at different wavelengths than the analyte, K–M can prove to be a useful linearization method for optical data [2]. If the matrix absorbs at the same wavelength as the analyte, log(1/R) will prove to be most useful to relate reflectance to concentration. Attempts to minimize the effects of scattering and multicollinearity using standard normal variate and polynomial baseline correction are described in Reference 3. When generating calibration equations using samples of known composition, the independent variable is represented by the optical readings (−log R) at specific wavelengths, while the analyte concentration (as determined by manual laboratory technique) is the dependent variable. The stepwise multiple regression statistic allows for the selection of calibration spectral features that correlate (as a group) most closely to analyte concentration for a particular sample set. Once optimal wavelengths are selected, the NIR spectroscopy instrument can be calibrated to predict unknown samples for the quantity of the desired analyte. Thus, regression analysis is used to develop the relationship (regression calibration equation) between several spectral features and the chemical analyte (constituent) being investigated. Note that calibration equations will also contain wavelength terms to compensate for repack variations and interferences such as sample moisture content.
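As a brief sketch of the standard normal variate (SNV) correction mentioned above, each spectrum is centered on its own mean and scaled by its own standard deviation. The function name `snv`, the use of the sample standard deviation (`ddof=1`), and the toy spectrum are assumptions for illustration; Reference 3 should be consulted for the authors' exact formulation.

```python
import numpy as np


def snv(spectra):
    """Row-wise SNV: subtract each spectrum's mean, divide by its std."""
    x = np.asarray(spectra, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / std


# Hypothetical spectrum, plus the same spectrum with a multiplicative
# scatter factor (1.8) and an additive baseline offset (0.25).
base = np.array([[0.30, 0.35, 0.50, 0.42, 0.31]])
scattered = 1.8 * base + 0.25

corrected = snv(np.vstack([base, scattered]))
print(corrected)
```

After correction the two rows coincide, which shows why SNV is used against scatter effects that act as a per-spectrum scale and offset.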

Questions often arise as to which mathematical treatments and instrument types perform optimally for a specific set of data. This is best addressed by saying that reasonable instrument and equation selection accounts for only a small part of the variance or error attributable to the NIR analytical technique for any application. Actually, the greatest error sources in any calibration are generally reference laboratory error (a stochastic error source), repack error (nonhomogeneity of sample, also a stochastic error source), and nonrepresentative sampling in the learning set or calibration set population (undefined error).

Total variance in analysis = Sum of all variances due to all error sources
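A short worked example shows how the variance sum identifies the dominant error source. The standard deviations below are invented for illustration, and the simple addition of variances assumes the error sources are independent.

```python
import math

# Hypothetical standard deviations (in % analyte) for three error sources.
error_sd = {
    "reference laboratory": 0.20,
    "repack": 0.10,
    "instrument": 0.05,
}

# Variances add (assuming independent sources); standard deviations do not.
total_variance = sum(sd ** 2 for sd in error_sd.values())
total_sd = math.sqrt(total_variance)

# Fraction of the total variance contributed by each source.
shares = {name: sd ** 2 / total_variance for name, sd in error_sd.items()}

print(total_sd)
for name, share in shares.items():
    print(name, round(share, 3))
```

With these numbers the reference laboratory contributes roughly three quarters of the total variance, so improving the instrument alone would barely change the overall error.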

Recent NIR research of necessity is moving toward an explanation of the chemistry involved in the various applications. Originally, the technology was totally empirical and only statistical methods were used to provide tests of calibration validity. With an increasing knowledge base, the NIR community is designing experiments in an attempt to reveal more information about this technology for samples with vastly differing chemical and physical properties. Thus, future users will be able to solve NIR spectroscopy problems with greater chemical and statistical knowledge a priori.

The largest contributions to calibration error can be minimized by reducing the major contributors to variance. These major error sources are described in Table 7.1 and in the following text.

7.2.2.1 Population Error

Population sampling error can be minimized by collecting extremely comprehensive datasets and then reducing them via subset selection algorithms (e.g., Bran and Luebbe PICKS Program, and NIRSystems Subset algorithm). These techniques allow maximum variation in a calibration set with minimum laboratory effort [4,5].

7.2.2.2 Laboratory Error

This source of error can be substantially reduced by performing an in-house audit for procedures, equipment, and personnel, paying particular attention to sample presentation, drying biases, and random moisture losses upon grinding [6].

7.2.2.3 Packing Error

Packing variation can be accommodated by compressing (averaging) multiple sample aliquots, by generating a calibration equation on the compressed data, and then by predicting duplicate packs of unknown samples [7]. Spinning or rotating sample cups can also reduce this error. The concept is to produce as large a population of measurements as possible, noting that the mean of this set of measurements more closely approximates a less “noisy” measurement value. The combination of these methods can reduce prediction errors by as much as 70% (relative) for predicted values as compared to less careful sample collection and presentation.
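The benefit of averaging repacks can be sketched with a small simulation. Assuming the packing error is random and independent between repacks, the standard deviation of the mean of n repacks falls as 1/sqrt(n). The true value, repack noise level, and number of repacks below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

true_value = 12.0   # hypothetical analyte value (%)
repack_sd = 0.5     # hypothetical repack-to-repack standard deviation
n_repacks = 4
n_trials = 20000

# Each row is one sample measured as n_repacks independent repacks.
readings = true_value + rng.normal(0.0, repack_sd, size=(n_trials, n_repacks))

single_sd = readings[:, 0].std()           # spread of a single repack
averaged_sd = readings.mean(axis=1).std()  # spread of the repack average

print(single_sd, averaged_sd, repack_sd / np.sqrt(n_repacks))
```

Averaging four repacks roughly halves the packing contribution to the error in this idealized case; systematic packing bias would not be reduced by averaging.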

7.2.3 THE MEANING OF OUTLIERS

Outlier prediction is important during the calibration modeling and monitoring phases. True spectral outliers are considered to be samples whose spectral characteristics are not represented within a specified sample set. Outliers are not considered to be part of the group that is designed to be used as a calibration set. The criterion often given for outlier selection is a sample spectrum lying more than three Mahalanobis distance units from the centroid of the data. Another quick definition is a sample where the absolute residual value is greater than three to four standard deviations from the mean residual value [8]. Standard multivariate or regression texts more extensively describe these selection criteria.
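The Mahalanobis criterion can be sketched as follows. The example works on two-dimensional synthetic data standing in for principal component scores, since the covariance matrix of raw spectra is usually not invertible; the function name, the data, and the planted outlier are all assumptions for illustration.

```python
import numpy as np


def mahalanobis_distances(scores):
    """Mahalanobis distance of each row from the centroid of all rows."""
    centered = scores - scores.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(scores, rowvar=False))
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(squared)


rng = np.random.default_rng(7)
scores = rng.normal(size=(100, 2))            # hypothetical "normal" samples
scores = np.vstack([scores, [[8.0, -8.0]]])   # one planted outlier (index 100)

distances = mahalanobis_distances(scores)
outliers = np.flatnonzero(distances > 3.0)    # the three-distance criterion
print(outliers)
```

Note that the outlier itself inflates the covariance estimate, which shrinks its own distance; this is one reason flagged samples should be reviewed by someone experienced in multivariate methods before being discarded.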

In a practical sense, outliers are those samples that have unique character so as to make them recognizably (statistically) different from a designated sample population. Evaluation criteria for selecting outliers are often subjective; therefore, there is a requirement that some expertise in multivariate methods be employed prior to discarding any samples from a calibration set.

TABLE 7.1

Calibration Error Sources with Recommended Action for Error Reduction

Each variance source is followed by its recommended solutions.

Nonhomogeneity of sample
• Improve mixing guidelines
• Improve grinding procedures
• Average replicate repacks
• Rotate sample cup
• Measure multiple readings of large sample volume

Laboratory error
• Laboratory audit to correct procedural error
• Suggest improvements on analytical procedures
• Retrain analysts on procedures
• Check and recalibrate reagents, equipment, etc.

Physical variation in sample
• Improve sample mixing during sample preparation
• Diffuse light before it strikes the sample using a light diffusing plate
• Pulverize sample to less than 40-µm particle size
• Average multiple repacks
• Rotate sample, or average five sample measurements

Chemical variation in sample with time
• Freeze-dry sample for storage and measurement
• Immediate data collection and analysis following sample preparation
• Identification of kinetics of chemical change and avoidance of rapidly changing spectral regions

Population sampling error
• Review calibration set selection criteria
• Use sample selection techniques such as SUBSET or PICKS for selecting the calibration set

Non-Beer’s law relationship (nonlinearity)
• Use smaller concentration ranges for each calibration
• Use baseline correction such as standard normal variate or polynomial baseline correction
• Use one or more indicator variables
• Try shorter path lengths
• Check dynamic range of instrument

Spectroscopy does not equal manual chemistry
• Use different chemical procedures (possibly spectroscopic)
• Redefine analytical requirements in terms of known chemistries

Instrument noise
• Check instrument performance (i.e., gain, lamp voltage, warm-up time, etc.)
• Determine signal-to-noise
• Check precision with standard sample replicate measurements

Integrated circuit problem
• Replace faulty components

Optical polarization
• Use depolarizing elements

Sample presentation extremely variable
• Improve sample presentation methods
• Investigate wide variety of commercially available sample presentation equipment

Calibration modeling incorrect
• Select and test calibration model carefully
• Calculate new equation

Poor calibration transfer
• Select calibrations with lowest noise, wavelength shift sensitivity, and offset sensitivity
• Identify and transfer actual (not nominal) wavelengths and corresponding regression coefficients

Outlier samples within calibration set
• Cumulative normal plots
• CENTER program by ISI
• DISCRIM by Bran and Luebbe

Transcription errors
• Triple-check all handscribed data

7.3 THE CALIBRATION PROCESS
