
A Practical Investigation into the use of Principal Component Analysis for the Modelling and Scale-up of High Performance Liquid Chromatography
Chapter 1

Kårsnäs and Lindblom (1992) investigated the selectivities of different hydrophobic interaction chromatography (HIC) media for protein separation. The use of PCA revealed that the different media could be divided into several groups on the basis of the resulting scores plots. For example, some media selected mainly according to protein hydrophobicity, whereas for other media the charge (or lack of charge) on the protein was the most important factor. PCA and other multivariate techniques were found to be valuable tools for understanding and optimising HIC.

An early use of statistical techniques in the analysis of chromatograms was demonstrated by Fernando Faigle et al (1991), who used gas chromatography to investigate the effect of column temperature on the overlapping peaks of mixtures of toluene, isooctane and ethanol. The statistical techniques employed were Partial Least Squares (PLS) and Principal Component Regression (PCR), both of which are similar to PCA and involve the use of scores and loadings plots. Although only a small number of samples (14) were used at each of three temperatures (105, 120 and 130 °C), the paper highlighted the potential of statistical techniques for analysing chromatographic data by revealing distinct clusters in the scores plots. The study also used complete chromatographic profiles, each comprising 41 detector readings. However, temperature was the only variable tested, so the potential of techniques such as PCA for analysing multivariate, interacting bioprocess data was not investigated.
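PCR, mentioned above, can be thought of as PCA followed by an ordinary least-squares regression of a response on the leading scores. The sketch below is a minimal illustration under stated assumptions, not the procedure of Fernando Faigle et al: `X` is a hypothetical matrix of samples by detector readings, `y` is a hypothetical response, and `pcr_fit`, `pcr_predict` and the number of components `k` are illustrative names and choices.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: PCA on X, then least squares on the first k scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                          # mean-centre each variable (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                             # loadings, shape (n_variables, k)
    T = Xc @ P                               # scores, shape (n_samples, k)
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)  # regress centred y on the scores
    b = P @ q                                # regression vector in the original variables
    return x_mean, y_mean, b

def pcr_predict(model, X_new):
    """Predict the response for new samples (rows of X_new)."""
    x_mean, y_mean, b = model
    return (X_new - x_mean) @ b + y_mean

# Hypothetical data: 20 samples x 41 detector readings driven by two latent factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(20, 2))
profiles = rng.normal(size=(2, 41))
X = latent @ profiles
y = latent @ np.array([1.5, -0.7]) + 5.0     # response depends only on the latent factors
model = pcr_fit(X, y, k=2)
```

Because the synthetic response depends only on the two latent factors, two components are enough to reproduce it; with real data, `k` would normally be chosen by cross-validation.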

A study by Malmquist and Danielsson (1994) recognised that different combinations of process variables lead to differently shaped profiles, which makes data interpretation complex. The authors also concluded that the best way to represent the ‘variables’ on which PCA is performed is to use detector readings at regular time intervals, rather than resolution criteria such as retention time and peak width. The study recognised the problem of baseline shifts of peaks, which are especially evident when the flow rate varies and which may conceal significant information. Proposed methods to ‘align’ successive chromatograms included a ‘time warping’ function proposed by Reiner et al (1979), although this was used only for visual inspection and not with multivariate techniques. Other methods included compression/expansion of the time scale [Andersson and Hämäläinen (1994)]. Another proposal was ‘normalisation to constant area’. The method employed by Malmquist and Danielsson, however, was a combined procedure: the first step compared the sample chromatogram with a target, the second applied a retention-time correction shift, and the last fine-tuned the correction factors. This study was useful for understanding effective pre-processing prior to the implementation of PCA.
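Two of the pre-processing ideas mentioned above, normalisation to constant area and a retention-time correction shift against a target, can be sketched as follows. This is a simplified illustration assuming equally spaced detector readings and a single whole-point shift chosen by maximising the dot product with the target; it is not the combined procedure of Malmquist and Danielsson, and their fine-tuning step is not reproduced. The function names and the synthetic peaks are hypothetical.

```python
import numpy as np

def normalise_area(signal):
    """Scale a chromatogram so that its total area (sum of readings) equals 1."""
    return signal / signal.sum()

def align_to_target(signal, target, max_shift):
    """Shift `signal` by the whole number of points, within +/- max_shift,
    that maximises its overlap (dot product) with `target`."""
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        score = np.dot(np.roll(signal, shift), target)
        if score > best_score:
            best_shift, best_score = shift, score
    # np.roll wraps points round the ends; this is harmless only if the
    # chromatogram is at baseline near both ends
    return np.roll(signal, best_shift), best_shift

# Hypothetical target and sample: same peak shape, sample larger and eluting later
t = np.arange(200)
target = normalise_area(np.exp(-0.5 * ((t - 80) / 5.0) ** 2))
sample = normalise_area(3.0 * np.exp(-0.5 * ((t - 86) / 5.0) ** 2))
aligned, shift = align_to_target(sample, target, max_shift=10)
```

After area normalisation the threefold difference in peak size disappears, and the alignment step recovers the six-point retention shift, so the aligned sample coincides with the target.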

Liang et al (1994) also recognised the problems of comparing chromatograms with differing baseline positions. Again PCA was performed on complete chromatograms, and it was also performed locally on selected regions of the chromatograms in order to find regions with similar concentration patterns.

Malmquist (1994) used PCA for the analysis of peptide maps, which are important in the quality control of rDNA-derived proteins. Once again the whole chromatogram was used as input data, eliminating the need to extract retention and resolution data from each profile. Peptide mapping is a very demanding analytical technique and is vital for assessing the integrity of the amino-acid sequence of proteins. Multivariate techniques like PCA are able to provide an unbiased evaluation and are capable of handling experimental variation. A set of reference chromatograms was used: PCA was performed on this dataset, and test samples were classified using SIMCA (soft independent modelling of class analogy). PCA was able to identify relationships between test samples and historical data, but no attempt was made to analyse the effect of varying the operating conditions.
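SIMCA builds a PCA model of each reference class and judges a new sample by how well that model reconstructs it. The sketch below is a much-simplified, single-class version under stated assumptions: a fixed number of components `k`, and an acceptance limit taken as a multiple of the largest reference residual distance. Full SIMCA normally uses a statistical test on residual variance instead, and all names, data and the factor of 2 are hypothetical.

```python
import numpy as np

def residual_distance(mean, P, X):
    """Distance of each row of X from the PCA subspace defined by `mean` and loadings `P`."""
    Xc = X - mean
    E = Xc - (Xc @ P) @ P.T      # part of each sample not explained by the k components
    return np.sqrt((E ** 2).sum(axis=1))

def fit_class_model(X_ref, k):
    """PCA model of one reference class: mean, first k loadings, largest reference residual."""
    mean = X_ref.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    P = Vt[:k].T
    return mean, P, residual_distance(mean, P, X_ref).max()

def in_class(model, X_new, factor=2.0):
    """Accept samples whose residual distance is within `factor` x the largest reference residual."""
    mean, P, max_ref = model
    return residual_distance(mean, P, X_new) <= factor * max_ref

t = np.arange(100)

def peak(centre, width=3.0):
    """Hypothetical Gaussian elution peak on the time grid t."""
    return np.exp(-0.5 * ((t - centre) / width) ** 2)

# Hypothetical reference set: 20 chromatograms, two peaks in varying amounts, small noise
rng = np.random.default_rng(2)
amounts = rng.uniform(0.5, 1.5, size=(20, 2))
X_ref = amounts @ np.vstack([peak(40), peak(60)]) + rng.normal(0.0, 0.001, (20, 100))
model = fit_class_model(X_ref, k=2)

ok_sample = peak(40) + peak(60)                  # same components as the references
impure_sample = ok_sample + 0.5 * peak(75)       # extra, unexpected peak
result = in_class(model, np.vstack([ok_sample, impure_sample]))
```

The sample containing only the reference components is accepted, while the sample with an additional peak leaves a large residual and is rejected.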

Some of the most interesting and effective applications of PCA to the analysis of chromatographic data have been carried out by Kvalheim and his co-workers (predominantly Liang), in a series of publications throughout the 1990s.

These publications focus on the problem of resolving co-eluting components into their pure constituents. Ensuring peak purity in liquid chromatography is a very demanding problem, since impurities of less than 1% need to be detected.


Kvalheim and Liang (1992) introduced a new technique to help solve this problem, which was then used in their subsequent papers. Called Heuristic Evolving Latent Projections (HELP), it applies PCA to multi-wavelength chromatograms: a single chromatographic run is monitored at many different wavelengths. For example, Kvalheim and Liang (1992) used 32 wavelengths at 5 nm intervals between 210 and 365 nm for the separation of two isomers. The technique can determine the number of species present in a mixture from a single analysis, and can also resolve the mixture into the spectra and concentration profiles of the pure constituents.

The data are arranged as follows prior to PCA in the HELP method. Each chromatogram is scanned at m wavelengths and contains n time points; the m wavelengths form the columns of the matrix and the n time points form the rows. (This is analogous to the type of matrix used in this thesis, where the chromatogram samples form the rows of the matrix and the time points form the columns.)

The data matrix is then decomposed using PCA, and the resulting scores and loadings plots reveal information about the number of species in the mixture. Local PCA is also performed on critical regions of the chromatograms, e.g. the front and back edges of peaks, to reveal further qualitative information.
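How the number of significant principal components reveals the number of species can be illustrated on synthetic data built as X = C S^T, where the columns of C are elution profiles and the columns of S are pure spectra, so that X has n time points as rows and m wavelengths as columns. The sketch assumes noise-free Gaussian profiles and spectra and an absolute singular-value tolerance standing in for the noise level; it shows the global rank and the local rank in a moving window of time points, which is only one ingredient of HELP. All names and values are hypothetical.

```python
import numpy as np

def n_components(X, tol):
    """Number of singular values above an absolute tolerance (an assumed noise level)."""
    return int((np.linalg.svd(X, compute_uv=False) > tol).sum())

def local_rank(X, width, tol):
    """Rank of every window of `width` consecutive time points (rows of X)."""
    return [n_components(X[i:i + width], tol) for i in range(X.shape[0] - width + 1)]

def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

# Synthetic, noise-free two-component data (hypothetical values)
t = np.arange(100)                       # n = 100 time points -> rows
wl = np.arange(210, 366, 5)              # m = 32 wavelengths, 210-365 nm -> columns
C = np.column_stack([gaussian(t, 40, 3), gaussian(t, 60, 3)])        # elution profiles
S = np.column_stack([gaussian(wl, 250, 20), gaussian(wl, 300, 20)])  # pure spectra
X = C @ S.T                              # bilinear data matrix, shape (100, 32)

tol = 1e-3
global_rank = n_components(X, tol)       # number of species in the whole run
ranks = local_rank(X, width=5, tol=tol)  # local rank along the time axis
```

The global rank is 2, while the local rank is 0 on the baseline, 1 where only one species elutes and 2 in windows where both peaks contribute; this is the kind of information that local PCA on peak edges is meant to provide.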

Noise was a major consideration in the work described in these publications, as peaks from minor species may be confused with noise. Noise effects such as those resulting from drifting baselines need to be accounted for, and Keller et al (1992) focused on such pre-treatments. One technique for filtering noise was to apply PCA to so-called zero-component regions of the chromatogram, where no chemical species are known to be present. Liang et al (1994) also recognised that some pre-processing of the data was required: baseline shifts of elution peaks (due to noise or otherwise) were accounted for prior to analysis.
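Using a zero-component region to set the noise level can be sketched as follows: the largest singular value of a baseline-only window estimates the size of the noise, and a component in another window of the same size is counted as chemical only if its singular value exceeds that estimate. The sketch assumes white noise of equal variance throughout and equally sized windows; the safety factor of 1.5, the synthetic data and all names are assumptions for illustration, not the method of Keller et al.

```python
import numpy as np

def noise_threshold(X_zero, factor=1.5):
    """Largest singular value of a zero-component (baseline-only) region times a safety factor."""
    return factor * np.linalg.svd(X_zero, compute_uv=False)[0]

def count_species(X_region, threshold):
    """Number of singular values in a region that rise above the noise threshold."""
    return int((np.linalg.svd(X_region, compute_uv=False) > threshold).sum())

def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

# Synthetic two-component data with white noise (hypothetical values)
rng = np.random.default_rng(3)
t, wl = np.arange(100), np.arange(210, 366, 5)
C = np.column_stack([gaussian(t, 45, 4), gaussian(t, 55, 4)])
S = np.column_stack([gaussian(wl, 250, 20), gaussian(wl, 300, 20)])
X = C @ S.T + rng.normal(0.0, 0.01, size=(100, 32))

threshold = noise_threshold(X[0:20])                  # rows 0-19 contain no eluting species
species_in_peak = count_species(X[40:60], threshold)  # both peaks present
species_on_front = count_species(X[20:40], threshold) # front edge: first peak only
```

The windows must be the same size because the singular values of pure noise grow with the dimensions of the matrix.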

The HELP methodology described in these papers by Kvalheim and Liang was generally validated using synthetic mixtures, and the techniques were developed for analytical chromatography. Some of the principles detailed in the publications could nevertheless be valuable for resolving peaks (and thus assessing peak purity) arising from a complex industrial separation such as the purification of erythromycin described in this thesis. Such an application, however, would require modification to allow for the non-linear adsorption behaviour encountered when the column is operated under overload conditions.

A PhD thesis by Chandwani (1995) examined how PCA could be used as a chromatographic pattern recognition tool. A size exclusion chromatography (SEC) system was used to separate three proteins: myoglobin, ovalbumin and bovine serum albumin (monomer, dimer and trimer respectively). The mixture was made up synthetically to a fixed ratio of 17:2:1.

Experimental design techniques were used to investigate four process variables: temperature, load volume, load concentration and flow rate. The aim was to characterise the separation process using PCA. Clusters in the resulting scores plots were used to identify separations that were similar, although the similarity was not easily detectable by eye. Future runs could then be tested using statistical techniques such as the F-test to determine whether or not they were within process specification.
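One common way of testing a new run against a PCA model is to compare its residual variance about the model with the mean residual variance of the training runs, giving an F-ratio. The sketch below illustrates that idea under stated assumptions and does not reproduce Chandwani's procedure: the training runs, the number of components, the residual degrees of freedom and all names are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Mean and first k loadings of the training runs (rows of X)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def residual_variance(model, X):
    """Per-run residual variance about the k-component PCA model."""
    mean, P = model
    Xc = X - mean
    E = Xc - (Xc @ P) @ P.T
    return (E ** 2).sum(axis=1) / (X.shape[1] - P.shape[1])

def f_ratio(model, X_train, x_new):
    """Residual variance of a new run divided by the mean residual variance of the training runs."""
    s0_sq = residual_variance(model, X_train).mean()
    return residual_variance(model, x_new[np.newaxis, :])[0] / s0_sq

# Hypothetical training runs: two peaks whose sizes vary with the process settings
rng = np.random.default_rng(4)
t = np.arange(100)
peaks = np.vstack([np.exp(-0.5 * ((t - c) / 4.0) ** 2) for c in (40, 55)])
X_train = rng.uniform(0.5, 1.5, (20, 2)) @ peaks + rng.normal(0.0, 0.01, (20, 100))
model = fit_pca(X_train, k=2)

good_run = np.array([1.0, 0.8]) @ peaks + rng.normal(0.0, 0.01, 100)
bad_run = good_run + 0.5 * np.exp(-0.5 * ((t - 75) / 3.0) ** 2)   # unexpected extra peak
f_good = f_ratio(model, X_train, good_run)
f_bad = f_ratio(model, X_train, bad_run)
```

A run would then be judged within specification if its ratio falls below a critical value of the F-distribution for the chosen significance level and degrees of freedom, taken from tables or, for example, from `scipy.stats.f.ppf`.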

1.8. Thesis Aims

The aims of this thesis are as follows:

• To attempt to use PCA to model the complex separation of crude erythromycin, which is similar to the type of separation that may be required following a fermentation.

• To investigate the importance of data pre-processing prior to PCA in order to maximise the performance of the PCA models.

• To examine the ability of PCA to help predict realistic, non-linear separations, which result when columns are operated under overload conditions.


• To compare PCA models which result when performing the same separation on columns with different stationary phases hence exhibiting different adsorption properties.

• To compare PCA models which result when performing the same separation on columns of different dimensions whilst maintaining the type of stationary phase.

• To use PCA on a set of chromatogram data generated using finite difference techniques.

• To outline the potential limits of using the PCA method for the modelling of chromatography.