
Computer Aided Breast Cancer Detection and Diagnosis System based on Histopathological Image Analysis of TMAs.


Academic year: 2020


(1) PhD Thesis. Computer Aided Breast Cancer Detection and Diagnosis System based on Histopathological Image Analysis of TMAs. Author: Mª del Milagro Fernández Carrobles. Ingeniería de Sistemas y Automática, E.T.S.I. Industriales. Advisors: Mª Gloria Bueno García, Oscar Déniz Suárez. May 2015.




(5) To my sister, my parents and Jorge. I love you.


(7) Who would have thought it, the truly weak never give up. (Mario Benedetti).


(9) ACKNOWLEDGMENTS. To the directors of the VISILAB group (Grupo de Visión y Sistemas Inteligentes) of the E.T.S. Ingenieros Industriales in Ciudad Real, for all their help, interest and kind treatment. And especially to Dr. Gloria Bueno, who has guided me since I joined the group, encouraged me to pursue the doctorate and has helped me improve in many respects. To my parents and my sister, for also encouraging me to keep studying and for supporting me every day through this doctorate. They have always been there when I needed them, and without them this would not have been possible. To Jorge, for his unconditional support in my good and bad moments. To my colleagues, for their friendship and for making the time we shared in the laboratory so good. And to everyone who, in one way or another, has helped me get to where I am today.


(11) ABSTRACT. Currently, TMA analysis is performed manually by the pathologist, who gives a diagnosis based on microscopic observation of the biopsy samples. The evaluation can therefore be subjective. For that reason, automating this task is fundamental in order to provide the pathologist with a tool that automatically analyses the samples (at different magnifications) and produces a diagnostic result. TMAs can gather dozens or even hundreds of tumours in a single paraffin block and can be used to analyse a large number of molecular markers. In addition, TMAs allow simultaneous, standardized studies of multiple samples with uniform staining. All this reduces the economic and time cost of TMA preparation and interpretation.

The objective of this PhD thesis was the development of a CAD system focused on breast TMAs that is able to automatically acquire and classify TMA cores. For the acquisition process, several image processing algorithms were created and applied to the TMA thumbnail image to detect, select and archive each core as an individual image at different magnifications, that is, 5x, 10x, 20x and 40x. For the tissue examination and classification process, the author conducted a thorough analysis of the existing techniques in histopathological image classification. This preliminary analysis led to a complete study of breast tissue based on colour and texture features.

TMA cores extracted by the core acquisition algorithm were used to select 628 ROIs from 4 representative tissue classes. The size of these regions was 200 x 200 pixels (0.74 µm/pixel at 10x) and the TMA tissue classes were categorized as: i) benign stromal tissue with low and medium cellularity (170 images), ii) adipose tissue (103 images), iii) benign and benign anomalous structures (163 images) and iv) different kinds of malignancy, that is, ductal and lobular carcinomas (192 images). Once the dataset was established, it was first transformed into several colour models and then filtered with several texture descriptors. A relevant set of features was thus obtained in 8 different colour models (RGB, CMYK, HSV, Lab, Luv, SCT, Lb and Hb) from 1st and 2nd order statistical descriptors computed on the intensity image and on the Fourier, wavelet, Gabor, M-LBP and texton descriptors. The number of features extracted depends on the descriptor used. Fourier, wavelet and Gabor descriptors use four bands to build their filtered images, so their filtered images are multiplied by 4. As a result, the intensity, M-LBP and texton (frequential and spatial) statistical descriptors contain an average of 229 features for each colour model, while the Fourier, wavelet and Gabor statistical descriptors contain an average of 916 (229 x 4) features.

However, such a large number of features may produce redundant information and increase the computational time, so a dimensionality reduction of the feature sets is needed. In this PhD thesis three methods of dimensionality reduction were analysed: linear discriminant analysis, correlation and sequential forward search. Finally, the training and classification stages were applied, using 10-fold cross-validation and 5 different classifiers: Fisher, support vector machine (SVM), Bagging, random forest and AdaBoost.

The results obtained in both core acquisition and classification were highly satisfactory. For core acquisition, 4 TMA datasets with a total of 21244 cores were processed with an average accuracy of 98%. For TMA core classification, four types of experiments were performed: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and descriptors and (4) classification by combination of colour models and descriptors with a previous feature set reduction. The best classification result was obtained with the CMYK&Hb&Lb&HSV&Luv&SCT colour combination and the Intensity&M-LBP&Gabor&Spatial Textons descriptors, reaching an average of 99.045% accuracy and 98.34% precision with a total of 1719 features. Once all the algorithms were developed, they were integrated into a complete CAD tool called TMA CAD System.

In conclusion, CAD systems in histopathology are still a challenge because histopathological images encompass a variety of cancer types and the analysis is still performed by the pathologist under the microscope. The development of applications such as the TMA CAD System presented in this PhD thesis is therefore important. Firstly, the tool acquires the TMA cores individually, which is essential for further tissue classification. Secondly, breast tissue classification improves when colour and texture descriptors are combined. Pathologists at the Department of Anatomical Pathology of Ciudad Real have tested and evaluated the TMA CAD System. They assessed the tool very positively.
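The correlation-based feature reduction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact rule the thesis applies, so the greedy filter below is an assumption: a feature is kept only if its absolute Pearson correlation with every feature already kept stays below a threshold (0.97 here, chosen to echo the 97% correlation level listed in the thesis contents). The names `pearson` and `drop_correlated` and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant feature is treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.97):
    """Greedy filter over feature columns (one list of values per feature).

    A column is kept only if |r| against every previously kept column
    is below the threshold. Returns the indices of the kept columns.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy feature columns, five samples each (illustrative values only).
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],    # f0
    [2.0, 4.0, 6.0, 8.0, 10.0],   # f1: exact multiple of f0
    [5.0, 3.0, 4.0, 1.0, 2.0],    # f2: only moderately correlated with f0
    [1.0, 2.1, 2.9, 4.2, 5.0],    # f3: almost identical to f0
]
kept = drop_correlated(features)
print(kept)
```

The result depends on the order in which features are visited, which is one reason a search-based method such as sequential forward search may select a different subset.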

(13) RESUMEN. Today, TMA analysis is performed manually by the pathologist, who bases the diagnosis on observations made through the microscope. The evaluation of the biopsies can therefore be subjective. For that reason, automating this kind of task is essential, always with the aim of offering the pathologist a fully automatic tool that analyses the tissue samples (at different magnifications) and produces a diagnostic result. TMAs can gather dozens or even hundreds of tumours on a single slide and also make it possible to analyse a large number of molecular markers. In addition, TMAs allow simultaneous, standardized studies of multiple cases with uniform staining. All these characteristics also reduce the economic and time cost of preparing and interpreting TMAs.

The objective of this doctoral thesis is the development of a CAD system focused on breast TMAs that is able to acquire and classify TMA cores fully automatically. For acquisition, several image processing algorithms are developed and applied to the TMA thumbnail image to detect, select and store the cores individually at different magnifications (5x, 10x, 20x, 40x). For classification, which is really the main objective of this thesis, the work aims for the system to obtain the most appropriate classification of breast tissue. In this regard, the author carried out an exhaustive analysis of the existing techniques for classifying histopathological images. This preliminary analysis led to the idea of a complete study of breast tissue based on colour and texture.

To that end, the TMA cores extracted by the acquisition algorithm were used to select 628 regions of interest from four representative tissue classes. The size chosen for these regions was 200 x 200 pixels (0.74 µm/pixel at 10x), and the breast tissue was classified into: i) stroma with low and medium cellularity (170 images), ii) adipose tissue (103 images), iii) benign and anomalous tissue structures (163 images) and iv) different kinds of malignancy, namely ductal and lobular carcinomas (192 images). Once the dataset was established, it was transformed and filtered using several colour models and texture descriptors. A substantial set of features was thus obtained in 8 colour models (RGB, CMYK, HSV, Lab, Luv, SCT, Lb and Hb) from the 1st and 2nd order statistical descriptors computed on the intensity, Fourier, wavelet, Gabor, M-LBP and texton descriptors. The number of features extracted depends on the type of descriptor used. Fourier, wavelets and Gabor use 4 bands to build the filtered images, so their filtered images are multiplied by 4. As a result, the intensity, M-LBP and texton (frequential or spatial) images yield 229 statistical features per colour model, while Fourier, wavelets and Gabor yield 916 (229 x 4).

However, a considerable number of features can also affect the data through redundant information or increased computation time. In such cases a dimensionality reduction of the features is needed. Three dimensionality reduction methods were analysed in this thesis: linear discriminant analysis, correlation between variables and sequential forward search. Finally, the classification stage is carried out, comprising a training phase and a test phase. The training phase used 10-fold cross-validation and the test phase applied 5 different classifiers: Fisher, support vector machine (SVM), Bagging, random forest and AdaBoost.

Both core acquisition and the subsequent classification gave highly satisfactory results. For core acquisition, 4 TMA datasets with a total of 21244 cores were processed with an accuracy of 98%. For the classification algorithm, four types of experiments were performed: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and texture descriptors and (4) classification by combination of colour models and texture descriptors with a dimensionality reduction of the feature set. The best classification result was obtained with the CMYK&Hb&Lb&HSV&Luv&SCT colour model combination and the Intensity&M-LBP&Gabor&Spatial Textons descriptor combination, which reached an accuracy and a precision of 99.045% and 98.34% respectively, using a total of 1719 features. Once the two algorithms described above had been developed and tested, they were integrated into the CAD system called TMA CAD System.

In conclusion, CAD systems in histopathology are still at an early stage because histopathological images cover a wide variety of cancer types and their analysis is mostly performed manually by the pathologist at the microscope. Hence the importance of the tool developed in this doctoral thesis. First, TMA CAD System acquires the TMA cores individually and at different magnifications, which is essential for their subsequent analysis and classification. Second, the tool improves breast tissue classification results compared with other similar algorithms by combining colour models and texture descriptors. Finally, TMA CAD System has been tested and evaluated by pathologists from the Department of Anatomical Pathology of the Hospital of Ciudad Real, who showed their interest and assessed the tool very positively.
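The 10-fold cross-validation named in the summary can be sketched as a partition of the 628 ROIs into ten folds, each used once as the test set. The summary does not say whether the folds were stratified or shuffled, so the deterministic, class-balanced round-robin assignment below is an assumption; the function names and the short class labels are hypothetical, and only the class sizes (170, 103, 163, 192) come from the text.

```python
def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, spreading each class evenly.

    Indices are grouped by class and dealt round-robin across folds,
    so every fold receives a near-equal share of each class.
    """
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    slot = 0
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[slot % k].append(idx)
            slot += 1
    return folds


def cross_validation_splits(labels, k=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k)
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Class sizes as reported in the text; the label names are placeholders.
class_sizes = {"stroma": 170, "adipose": 103, "benign": 163, "malignant": 192}
labels = [name for name, n in class_sizes.items() for _ in range(n)]

splits = list(cross_validation_splits(labels))
print(len(labels), [len(test) for _, test in splits])
```

In practice the indices would normally be shuffled within each class before dealing them out, and a classifier would be trained on each `train` list and scored on the matching `test` list, with the reported figure being the average over the ten folds.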


(17) CONTENTS.

1 INTRODUCTION
  1.1 Preamble
  1.2 TMA Introduction
  1.3 State of the Art: TMA Core Acquisition and Analysis
  1.4 Framework
    1.4.1 Motivation
    1.4.2 Materials for TMA Core Acquisition
    1.4.3 Materials for TMA Core Classification
    1.4.4 Objectives
  1.5 Structure of the Document

2 INTERPRETATION OF BREAST TISSUE
  2.1 Breast Tissue
    2.1.1 Benign Tissue
    2.1.2 Benign Anomalous Tissue
    2.1.3 Malignant Tissue

3 TMA CORE ACQUISITION
  3.1 Introduction
  3.2 State of the Art: TMA Core Acquisition
  3.3 Selection and Extraction Methods
    3.3.1 Tissue Core Detection
    3.3.2 Tissue Core Selection
    3.3.3 Tissue Core Positioning and Extraction
    3.3.4 Tissue Core Archiving

4 COLOUR AND HISTOPATHOLOGICAL IMAGE ANALYSIS
  4.1 Colour Models
    4.1.1 RGB Colour Model
    4.1.2 CMYK Colour Model
    4.1.3 HSV Colour Model
    4.1.4 L*a*b* Colour Model
    4.1.5 L*u*v* Colour Model
    4.1.6 SCT Colour Model
  4.2 Colour Model Combination

5 TEXTURE DESCRIPTORS
  5.1 Image and Texture
  5.2 Introduction to Texture Descriptors
  5.3 State of the Art: Descriptors and Histopathological Images
    5.3.1 Statistical Descriptors
    5.3.2 M-LBP
    5.3.3 Fourier
    5.3.4 Wavelets
    5.3.5 Gabor
  5.4 Frequential and Spatial Textons
    5.4.1 Textons Using Filter Banks: Frequential Texton
    5.4.2 Textons Using a N x N Square Neighbourhood: Spatial Texton

6 FEATURE EXTRACTION AND CLASSIFICATION
  6.1 Introduction
  6.2 Feature Dimension Reduction
    6.2.1 Linear Discriminant Analysis
    6.2.2 Correlation
    6.2.3 Sequential Forward Search
  6.3 Training & Classification
    6.3.1 Fisher
    6.3.2 Support Vector Machine
    6.3.3 Bagging
    6.3.4 Random Forest
    6.3.5 AdaBoost

7 ACQUISITION AND CLASSIFICATION
  7.1 Result Interpretation
    7.1.1 ROC Curves
  7.2 TMA Core Acquisition Results
  7.3 Classification Results
    7.3.1 Experiment 1: Results per Colour Model
    7.3.2 Experiment 2: Results of Combining Colour Models
    7.3.3 Experiment 3: Results of Combining Colour Models and Descriptors
    7.3.4 Experiment 4: Results of Combining Colour Models and Descriptors with a Previous Feature Correlation and Feature Forward Selection

8 A CAD SYSTEM FOR THE ACQUISITION AND CLASSIFICATION OF BREAST TMA
  8.1 Introduction
  8.2 TMA CAD Core Acquisition
  8.3 TMA CAD Core Classification
  8.4 TMA CAD Core Viewer
  8.5 Classification Results Obtained Using the TMA CAD System

9 CONCLUSIONS/CONCLUSIONES
  9.1 Conclusions
  9.2 Original Contribution
  9.3 Scientific Publications
    9.3.1 Journals
    9.3.2 Books
    9.3.3 International Conferences
    9.3.4 National Conferences
  9.4 Quality Aspects
  9.5 Future Improvements
  9.6 Conclusiones
  9.7 Contribución Original
  9.8 Publicaciones Científicas
    9.8.1 Publicaciones en Revistas Indexadas en el JCR
    9.8.2 Libros y Capítulos de Libro
    9.8.3 Conferencias Internacionales
    9.8.4 Conferencias Nacionales
  9.9 Aspectos de Calidad
  9.10 Mejoras futuras

BIBLIOGRAPHY

APPENDIX

A EXPERIMENTAL RESULTS: DESCRIPTORS, COLOUR MODELS AND COMBINATIONS
  A.1 Experiment 1: Results per Colour Model
  A.2 Experiment 2: Results of Combining Colour Models
  A.3 Experiment 3: Results of Combining Colour Models and Descriptors
  A.4 Experiment 4: Results of Combining Colour Models and Descriptors with a Previous Feature Correlation and Feature Forward Selection
    A.4.1 97% Correlation
    A.4.2 99% Correlation
    A.4.3 SFS

B FEATURE SET REDUCTION
  B.1 Introduction
  B.2 97% Correlation
    B.2.1 Intensity
    B.2.2 Wavelets
    B.2.3 M-LBP
    B.2.4 Gabor
    B.2.5 Spatial Textons
  B.3 99% Correlation
    B.3.1 Intensity
    B.3.2 Wavelets
    B.3.3 M-LBP
    B.3.4 Gabor
    B.3.5 Spatial Textons
  B.4 Sequential Forward Search
    B.4.1 Intensity
    B.4.2 Wavelets
    B.4.3 M-LBP
    B.4.4 Gabor
    B.4.5 Spatial Textons

(21) FIGURE LIST.

1.1 Automatic TMA core acquisition. (Image extracted from [10]).
1.2 Benign structures and benign anomalous structures in TMA images stained with HE. A) Terminal ducts and lobules, B) Sclerosing lesions (radial scar), C) Adenosis lesions, D) Fibroadenomas, E) Tubular adenomas, F) Phyllodes tumors, G) Columnar cell lesions and H) Duct ectasia.
1.3 The four classes selected to perform the TMA core classification.

2.1 Anatomy of the female breast. (Image extracted from [45]).
2.2 Benign breast tissue: (a) Stroma, (b) Fatty tissue, (c) Stroma with cellularity and (d) Lobules.
2.3 Radial scar. (a) Tissue at low magnification and (b) High magnification of the areas of the first image. (Image extracted from [46]).
2.4 Sclerosing adenosis. (a) Tissue at low magnification and (b) High magnification of the areas of the first image. (Image extracted from [46]).
2.5 Fibroadenoma. (a) Intracanalicular pattern and (b) Pericanalicular pattern. (Image extracted from [46]).
2.6 Tubular adenoma. (Image extracted from [46]).
2.7 Phyllodes tumour. (a) Benign and (b) Malignant. (Images extracted from [46]).
2.8 Columnar cell lesions: (a) Columnar cell changes, (b) Columnar cell hyperplasia and (c) Flat epithelial atypia. (Images extracted from [46]).
2.9 Intraductal papillomas. (a) Central (or solitary) papilloma and (b) Peripheral papilloma. (Image extracted from [46]).
2.10 Duct ectasia. (Image extracted from [46]).
2.11 DCIS grades and types. Low and moderate grade: (a) Cribriform, (b) Solid, (c) Papillary and (d) Micropapillary. High grade: (e) Comedo. (Images extracted from [46]).
2.12 IDC grades. (a) grade 1, (b) grade 2 and (c) grade 3. (Images extracted from [46]).
2.13 (a) LCIS at low magnification, (b) LCIS at high magnification. (Images extracted from [46]).
2.14 ILC variants: (a) Classic, (b) Solid, (c) Alveolar, (d) Tubulolobular, (e) Pleomorphic and (f) Histiocytoid. (Images extracted from [46]).

3.1 TMA core segmentation process.
3.2 Tissue core selection based on the amount of segmented tissue.
3.3 Positioning of TMA cores in the thumbnail.
3.4 Linearity problems in the positioning. a) Different row orientation, b) Non-linear columns.
3.5 Difference between tiling and stitching. a) Tiling or union of the tiles. It produces a badly reconstructed core with duplicated regions, b) Stitching or rigid registration of the image tiles. It copes with overlapped regions produced by the scanning process.
3.6 Directory hierarchy and magnification subdirectories.

4.1 Samples of breast tissue stained with HE.
4.2 RGB colour model. The coordinates with a P are the primary colours. (Image extracted from [53]).
4.3 HSV colour model. (Image extracted from [59]).
4.4 Lab colour model. (Image extracted from [60]).
4.5 TMA image sample in different colour models.

5.1 Different textural surfaces.
5.2 Co-occurrence matrix creation.
5.3 LBP descriptor.
5.4 2-D Discrete Fourier transform: (a) original image, (b) DFT magnitude with position (0,0) on the upper left corner, (c) DFT magnitude with position (0,0) on the image centre.
5.5 Frequencies and orientations in the Fourier magnitude image.
5.6 Magnitude image explanation. (a) Horizontal frequencies (v = 0 and u ≠ 0). This frequential line is associated to horizontal lines of the original image like the shoulders and the background line, (b) Vertical frequencies (v ≠ 0 and u = 0). This frequential line is associated to vertical lines like the background buildings or the central leg of the tripod, (c) Diagonal frequencies (v ≠ 0 and u ≠ 0). In this case, this frequential line takes low frequency in the u direction and very high frequency in the v direction (it is nearest to the horizontal frequency line). This line is associated to lines like the coat or the left leg of the tripod.
5.7 Difference between Fourier and wavelet functions. Fourier signals (top) and wavelet signals (bottom).
5.8 1-D DWT algorithm where hψ(−n) corresponds to a high-pass filter and hφ(−n) is a low-pass filter.
5.9 2-D DWT algorithm.
5.10 Daubechies wavelets.
5.11 MR8 filter bank. The filter bank comprises 38 filters for isotropic and anisotropic filtering.
5.12 Extracting the 38 dimensional vector of pixel 1 from a RGB image. The pixel vector consists of the first 38 pixels of each filtered image.
5.13 Texton vocabulary generation. Pixel vectors are used by the k-means clustering algorithm to select the textons which form the texton vocabulary. In this study 60 textons per class have been used, that is, the vocabulary comprises 240 textons of dimension 1 x 38.
5.14 Visualization of the first 36 textons in the RGB images as local filters.
5.15 Texton maps. Each original image pixel is now represented by its corresponding texton number.
5.16 Representative vector of pixel i composed of N x N coordinates (N = 3).

6.1 Stages in feature extraction.
6.2 High-dimensional feature space mapping. (Image extracted from [138]).
6.3 Random forest algorithm.

7.1 Diagnostic result depending on the AUC.
7.2 Computational time for TMA core extraction and archiving.
7.3 Quantitative validation by means of ROC analysis. a) ROC analysis for GIST TMA cores digitized with the Microscope and the Scanner, b) ROC analysis for breast cancer with HE TMA cores digitized with the Microscope and the Scanner.
7.4 Type I and II errors in TMA core extraction.
7.5 TMA core segmentation at different magnifications I. Results with our TMA database.
7.6 TMA core segmentation at different magnifications II. Results with other TMA databases.
7.7 Results from the best classifier using colour models: AdaBoost.
7.8 Results obtained with the Fisher classifier using each colour model independently.
7.9 Results obtained with the AdaBoost classifier using each colour model independently.
7.10 Results obtained with the Bagging classifier using each colour model independently.
7.11 Results using the Bagging classifier with a combination of colour models and descriptors. Where: (1) Intensity&M-LBP, (2) Intensity&Spatial Textons, (3) Intensity&M-LBP&Gabor, (4) Intensity&M-LBP&Spatial Textons, (5) Intensity&M-LBP&Gabor&Spatial Textons and (6) Intensity&M-LBP&Gabor&Wavelets.
7.12 Results using the Bagging classifier with a combination and previous correlation threshold of 97%. Where: (1) Intensity&M-LBP, (2) Intensity&Spatial Textons, (3) Intensity&M-LBP&Gabor, (4) Intensity&M-LBP&Spatial Textons, (5) Intensity&M-LBP&Gabor&Spatial Textons and (6) Intensity&M-LBP&Gabor&Wavelets.
7.13 Results using the Bagging classifier with a combination of colour models and descriptors with feature selection. Where: (1) Intensity&M-LBP, (2) Intensity&Spatial Textons, (3) Intensity&M-LBP&Gabor, (4) Intensity&M-LBP&Spatial Textons, (5) Intensity&M-LBP&Gabor&Spatial Textons and (6) Intensity&M-LBP&Gabor&Wavelets.

8.1 TMA core classification options.
8.2 TMA core extraction.
8.3 Selection of the cores to be extracted.
8.4 Selection of the ROIs to be classified.
8.5 TMA CAD classification process.
8.6 TMA CAD core viewer.
8.7 Zoom window.
8.8 Classified TMA core. Left: Original image. Right: Core image result.
8.9 Classification results.

B.1 Feature set reduction with 97% correlation using intensity descriptors.
B.2 Feature set reduction with 97% correlation using Wavelets descriptors.
B.3 Feature set reduction with 97% correlation using M-LBP descriptors.
B.4 Feature set reduction with 97% correlation using Gabor descriptors.
B.5 Feature set reduction with 97% correlation using spatial textons descriptors.
B.6 Feature set reduction with 99% correlation using intensity descriptors.
B.7 Feature set reduction with 99% correlation using Wavelets descriptors.
B.8 Feature set reduction with 99% correlation using M-LBP descriptors.
B.9 Feature set reduction with 99% correlation using Gabor descriptors.
B.10 Feature set reduction with 99% correlation using spatial textons descriptors.
B.11 Feature set reduction with SFS using intensity descriptors.
B.12 Feature set reduction with SFS using Wavelets descriptors.
B.13 Feature set reduction with SFS using M-LBP descriptors.
B.14 Feature set reduction with SFS using Gabor descriptors.
B.15 Feature set reduction with SFS using spatial textons descriptors.

TABLE LIST

1.1 Coloration of the breast tissue structures with HE.
3.1 Relationship between pixels and magnifications.
5.1 Types of texture models and descriptors.
5.2 Classification performance of different state-of-the-art textons methods in biomedical applications.
5.3 1st order statistical descriptors.
5.4 2nd order statistical descriptors.
6.1 Bagging algorithm.
6.2 AdaBoost algorithm.
7.1 Quantitative validation of the TMA core acquisition.
7.2 Computational time in seconds for each classifier and descriptor using Hb.
7.3 Best classification per colour model: AdaBoost classifier in Hb and intensity descriptors.
7.4 Best classification by combination of Hb&Luv&SCT colour models: Fisher classifier and M-LBP descriptors.
7.5 Best classification by combination of CMYK&Hb&Lb&HSV&Lab colour models: AdaBoost classifier and Intensity descriptors.
7.6 Best classification using a Bagging classifier and a combination of CMYK&Hb&Lb&HSV&Lab colour models and Intensity&M-LBP&Gabor&Spatial Textons descriptors.
7.7 The best final classification was obtained by a previous correlation threshold of 97% and the Bagging classifier combining CMYK&Hb&Lb&HSV&Luv&SCT colour model and Intensity&M-LBP&Gabor&Spatial Textons descriptors.
8.1 Classification results.
A.1 Classification per colour model I.
A.2 Classification per colour model II.
A.3 Classification using a combination of all colour models I.
A.4 Classification using a combination of all colour models II.
A.5 Classification using a combination of colour models and descriptors I.
A.6 Classification using a combination of colour models and descriptors II.
A.7 Classification using a combination of colour models and descriptors and a 97% correlation threshold I.
A.8 Classification using a combination of colour models and descriptors and a 97% correlation threshold II.
A.9 Classification using a combination of colour models and descriptors and a 99% correlation threshold I.
A.10 Classification using a combination of colour models and descriptors and a 99% correlation threshold II.
A.11 Classification using a combination of colour models and descriptors and a sequential forward search I.
A.12 Classification using a combination of colour models and descriptors and a sequential forward search II.
B.1 Statistical feature abbreviations.
B.2 The best features with 97% correlation using intensity descriptors.
B.3 The best features with 97% correlation using Wavelets descriptors.
B.4 The best features with 97% correlation using M-LBP descriptors.
B.5 The best features with 97% correlation using Gabor descriptors.
B.6 The best features with 97% correlation using spatial textons descriptors.
B.7 The best features with 99% correlation using intensity descriptors.
B.8 The best features with 99% correlation using Wavelets descriptors.
B.9 The best features with 99% correlation using M-LBP descriptors.
B.10 The best features with 99% correlation using Gabor descriptors.
B.11 The best features with 99% correlation using spatial textons descriptors.
B.12 The best features with SFS using intensity descriptors.
B.13 The best features with SFS using Wavelets descriptors.
B.14 The best features with SFS using M-LBP descriptors.
B.15 The best features with SFS using Gabor descriptors.
B.16 The best features with SFS using Gabor descriptors.
B.17 The best features with SFS using Gabor descriptors.
B.18 The best features with SFS correlation using spatial textons descriptors.


ABBREVIATIONS

ABC Adjuvant Breast Cancer
ACC Accuracy
AdaBoost Adaptive Boosting
AUC Area Under the Curve
BIF Basic Image Features
BI-RADS Breast Imaging Reporting and Data System
CD Clusters of Differentiation
cDNA Complementary DNA
CFT Continuous Fourier Transform
CIE International Commission on Illumination
CLBP Compound Local Binary Patterns
CMYK Cyan, Magenta, Yellow and Key
CUReT Columbia-Utrecht Reflectance and Texture
CWT Continuous Wavelet Transform
DABH Diaminobenzidine and Hematoxylin
DC Direct Current
DCIS Ductal Carcinoma In-Situ
DDSM Digital Database for Screening Mammography

DFT Discrete Fourier Transform
DNA Deoxyribonucleic Acid
DWT Discrete Wavelet Transform
ER Estrogen Receptors
FISH Fluorescence In Situ Hybridization
FN False Negative
FP False Positive
GIST Gastrointestinal Stromal Tumor
GLCM Grey Level Co-occurrence Matrix
HE Hematoxylin and Eosin
HER2 Human Epidermal growth factor Receptor-type2
HSI Hue, Saturation, Intensity
HSV Hue, Saturation, Value
ICFT Inverse Continuous Fourier Transform
ICWT Inverse Continuous Wavelet Transform
ID Identification
IDC Invasive Ductal Carcinoma
IDWT Inverse Discrete Wavelet Transform
IHC Immunohistochemistry
ILBP Improved Local Binary Patterns
ILC Invasive Lobular Carcinoma
IPP Integrated Performance Primitives
ISH In Situ Hybridization
KB Kilobytes

KNN K-Nearest Neighbours
LBP Local Binary Patterns
LCIS Lobular Carcinoma In-Situ
LDA Linear Discriminant Analysis
LGA Local Greylevel Appearances
LM Leung-Malik
M-LBP Mean Local Binary Pattern
MAE Mean Absolute Error
MB Megabytes
MCC Matthews Correlation Coefficient
MIAS Mammographic Image Analysis Society
MIB-1 Mindbomb E3 Ubiquitin Protein Ligase 1
MR4 Maximum Response 4
MR8 Maximum Response 8
NPV Negative Predictive Value
OpenCV Open Source Computer Vision
PPV Positive Predictive Value
PR Progesterone Receptors
QDA Quadratic Discriminant Analysis
QMF Quadrature Mirror Filters
RGB Red Green Blue
ROC Receiver Operating Characteristic
ROI Region of Interest
SBFS Sequential Backward Floating Search

SBS Sequential Backward Search
SCT Spherical Coordinate Transformation
SFFS Sequential Forward Floating Search
SFS Sequential Forward Search
SIFT Scale-Invariant Feature Transform
SPM Spatial Pyramid Match
SVM Support Vector Machine
TBB Threading Building Blocks
TDLUs Terminal Ductal Lobular Units
TMA Tissue Microarray
TN True Negative
TP True Positive
UIUC University of Illinois at Urbana-Champaign
WSI Whole Slide Image

CHAPTER 1. INTRODUCTION.

This chapter introduces the work presented in this PhD thesis. Large microscopy images can only be managed properly with improved computer-based systems, and the practice of anatomical pathology using image processing and artificial intelligence therefore remains a challenge. Moreover, pathology is applied to a wide range of body areas, each with its own features and study needs. This chapter describes the current technology and the published studies on the management and analysis of breast histological images.

1.1. Preamble.

This PhD thesis was carried out in the VISILAB research group. VISILAB leads national and international research projects related to computer vision and artificial intelligence. The work developed in this thesis is directly related to two research projects led by Dr. Gloria Bueno:

• Automatización y Análisis en Microscopía Óptica. Aplicación al Diagnóstico y Pronóstico por Imagen Virtual in Vitro (Spanish Research Ministry Project DPI2008-06071).
• AIDPATH-Academia and Industry Collaboration for Digital Pathology (European Commission, FP7).

The principal objective of these projects is the development of tools to support histopathological image management, including Tissue Microarray (TMA):

• Processing, acquisition and archiving of TMA cores:
  1. Core detection.
  2. Preprocessing to remove possible noise from the signal (image), morphological alterations or irregularities of the cores.
  3. Core segmentation.
  4. Registration of core alignment.
• TMA core classification with Hematoxylin and Eosin (HE) stain. This includes the classification and characterization of Regions of Interest (ROI) by colour and texture analysis.
• Integration of all the techniques and algorithms developed into a flexible system which supports pathologists in their daily tasks.

The author collaborated with several pathologists from the Department of Anatomical Pathology of Ciudad Real. They contributed to the TMA core acquisition by providing TMA glass slides and the TMA images captured with the Aperio ScanScope T2 scanner. For the classification process, the pathologists classified each training and testing image and later evaluated the classification results.

1.2. TMA Introduction.

The TMA represents a powerful new technology designed to assess the expression of proteins or genes across large sets of tissue specimens [1]. A TMA is an ordered array of up to several hundred small cylinders of single tissues (core sections) in a

paraffin block from which sections can be cut and treated like any other histological section, using Immunohistochemistry (IHC) for protein targets and In Situ Hybridization (ISH) to detect gene expression or chromosomal alterations [2], [3]. Fig. 1.1 shows the TMA acquisition process. TMAs allow rapid and reproducible investigation of biomarkers. The integration of TMA and clinical pathology data is emerging as a powerful approach to the molecular profiling of human cancer [4], [5]. Another use of the TMA is to provide random samples of a representative lesion. TMAs may be evaluated by automated methods in order to achieve an objective diagnosis in pathology, which is nowadays one of the diagnostic laboratories with the most human intervention (manual work) and the most subjective assessment [6], [7], [8]. The high speed of scanning, the lack of significant damage to donor blocks, and the regular arrangement of scanned specimens substantially facilitate automated analysis [9].

Figure 1.1: Automatic TMA core acquisition. (Image extracted from [10].)

Nowadays, TMA analysis is performed manually. The pathologist provides a diagnosis based on microscopic observation of the samples, so the evaluation can be subjective. For that reason, it is of great interest to automate this task and provide the pathologist with a tool that automatically analyses the samples (at different magnifications) and produces a decision result. A TMA can gather dozens or even hundreds of tumours in a paraffin block and be used to analyze a large number of molecular markers. Besides, it should be pointed out that TMAs allow simultaneous and standardized studies of multiple samples with

uniform staining. All this reduces the economic and time cost of TMA preparation and interpretation [11], [12], [13], [14], [15], [16], [17]. TMAs are also useful in the study of non-neoplastic diseases and in cytology [3]. They have the potential to accelerate molecular studies which seek an association between molecular changes and clinicopathological features of the tumour. It is important to know that TMAs are designed to examine tumours in a population rather than individual tumours. For this reason, they are used as a tool for population screening [5].

Conventional techniques of molecular pathology require large amounts of time and tissue. There are different methods to facilitate and optimize these studies, such as Complementary DNA (cDNA) microarrays and the aforementioned TMA. The cDNA microarray technique allows expression changes in a large number of genes to be studied in a single tumour sample [18]. TMAs allow the evaluation of genetic alterations in Deoxyribonucleic Acid (DNA) (by Fluorescence In Situ Hybridization (FISH) or IHC). Immunohistochemistry studies entail staining the TMA with markers to examine its alterations. Well-known breast tissue markers (which will be used in this PhD thesis) are the HE stain, the semi-quantitative protein analysis of Estrogen Receptors (ER), Human Epidermal growth factor Receptor-type2 (HER2) and Progesterone Receptors (PR), and the Ki67 protein expression.

HE is one of the most commonly used stains in histology. Hematoxylin is a cationic stain which stains the acidic (basophilic) tissue structures, such as cell nuclei, blue. Eosin, on the other hand, is an anionic dye which stains the basic (acidophilic) components, such as the cytoplasm, pink. The tissue structures are therefore stained as shown in Table 1.1 when this type of marker is applied to the TMA [19]. The studies and tests performed for this PhD thesis are based on breast TMAs stained with HE.
Table 1.1: Coloration of the breast tissue structures with HE.

Tissue structure    Colour
Cellular nuclei     Blue
Cytoplasm           Pink
Musculature         Red, pink, fuchsia
Red blood cells     Red, orange
Fibrin              Pink

TMAs are also used as a quality control technique to evaluate the sensitivity and specificity of antibodies, tissue sensitivity, tissue fixing methods and the optimization of staining protocols. Besides, TMAs can be used to optimize and standardize IHC interpretation [20]. They are also a valuable tool for the rapid screening of markers and alterations in different tumour types [5], [11], [21]. Another TMA feature is the provision of representative random samples of lesions which can be evaluated by objective automatic methods (image processing: nuclear density, chromatic density, immunohistochemistry staining) [6], [7], [8], [14].

Despite all these advantages, working with TMAs is difficult, both in data acquisition and in data management and interpretation. The use of IHC with TMAs generates large amounts of information, which requires careful analysis. Currently, this analysis is done manually under the microscope, which, besides being a tedious job that hinders the workflow, is subject to errors due to the subjective interpretation of the specialists. The automatic analysis of TMA data and multicenter studies is still a challenge [22], [2], [15], [23], [24], [17], [16]. As a consequence, the annotation of cells stained with IHC is one of the bottlenecks in antibody-based proteomic analysis. The use of automatic acquisition systems for digital imaging and tissue staining, as well as the development of tools for processing these images, will help to overcome these difficulties.

Another difficulty in TMA analysis is that the cores are usually neither aligned nor regular, and they may not contain enough tissue to be evaluated. The tumour sample in each core has a diameter of 0.6 mm, which represents approximately 0.3% of the tumour [25], [14]. Consequently, validation studies are needed for each tumour type and stain, and the tumour cores themselves must be analysed for their evaluation [26], [27], [28], [29], [8]. Finally, misaligned or irregular cores, together with the typical problems of digital images such as noise and distortion, may lead to lost cores in the detection process. Thus, there is a need to develop reliable tools to acquire, share and assess microarrays and related data [2], [23], [15].
1.3. State of the Art: TMA Core Acquisition and Analysis.

There are some studies on how to perform TMA core acquisition: Della Mea [15], Demichelis [23], Shaknovich [30], Liu [31], [6], [32], [33] and TAMEE [34]. However, most of them are not completely automatic or cannot handle high dimensional images, which forces the pathologist to be continuously present. Commercial tools such as SpotBrowser3 (ALPHELYS software) [35], TMA Slide Evaluation and Analysis [36] and TMALab [32] also provide analysis and classification methods, but they are based on measurements and stains other than those used for HE. SpotBrowser3 locates positive or negative nuclei based on HSV colorimetry and morphometry methods. However, this detection can be combined with manual

operations wherever detection cannot be achieved automatically, such as the identification of a tumour area. In the case of the TMA Slide Evaluation and Analysis software, detection is currently provided by five algorithms: HistoQuant, NuclearQuant, MembraneQuant, DensitoQuant and FISHQuant. HistoQuant detects regions of interest based on colour and intensity information, but the process is not automatic because the user must first interact with the tool. NuclearQuant, MembraneQuant and DensitoQuant perform automatic evaluation and measurements on IHC nuclear stainings (ER, PR, Ki67, etc.). FISHQuant performs different measurements on scanned FISH samples. Finally, TMALab is web-based and allows pathologists to analyse and manage TMA samples from anywhere. This software provides IHC nuclear, IHC membrane, colour deconvolution and co-localization algorithms. However, as far as we know, TMALab does not have the option to automatically locate malignant tissue areas.

It should also be mentioned that there are multiple studies on histopathological image classification, which will be reviewed in Chapter 4. Focusing on those which perform TMA core classification using texture and colour descriptors, we find the works of Ahonen [37], Fuchs [38] and Xin [39], using Local Binary Patterns (LBP) as textural descriptors; Le [40], using wavelets; and Amaral [41], Yang [42] and Chekkoury [43], using texton descriptors. The studies of Ahonen and Fuchs showed TMA core classification using an IHC stain, which stains the cancerous tissue areas beforehand. In the case of Le, the scoring criterion is based on stromal fibroblasts and the classification is performed over the whole core. This type of classification has some disadvantages. First, if the cancer is located only in a small tissue region, it cannot be detected. Second, the classification score may vary depending on the cancer type.
Xin used LBP descriptors, Haralick coefficients and a texture feature coding method to extract the texture features. The TMA core is first segmented to find the ROIs by means of different thresholding and morphological operations. However, it is not clear whether these operations are performed automatically or selected manually. Amaral and Yang used texton histograms to classify breast TMA cores. Amaral divided the breast tissue into four classes: tumour, normal, stroma and fat. Yang used three classes, namely cores with benign tissue and samples of ductal and lobular carcinoma, in-situ and with metastasis respectively. Both works used the complete core to perform the classification. However, a TMA core can present several of these classes at the same time, and therefore the classification result could be wrong. Finally, Chekkoury compared different methods to classify breast tissue. One of these methods is texton histograms, which obtained valuable classification results. The images used are regions of cancerous and non-cancerous HE biopsy samples. Once again, the classification process used images which had to be pre-selected by the pathologist.
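To make the LBP descriptor mentioned in the works of Ahonen, Fuchs and Xin more concrete, the following is a minimal sketch of the classic 8-neighbour LBP operator and its histogram. It is an illustration only: it is not the M-LBP variant used later in this thesis, nor the implementation of any of the cited works, and the function names and the toy patch are hypothetical.

```python
from collections import Counter


def lbp_code(image, row, col):
    """Return the basic 8-neighbour LBP code of the pixel at (row, col)."""
    centre = image[row][col]
    # Neighbours are visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram (256 bins) of LBP codes over all interior pixels."""
    rows, cols = len(image), len(image[0])
    counts = Counter(
        lbp_code(image, r, c)
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    )
    return [counts.get(code, 0) for code in range(256)]


# Toy 4 x 4 grey-level patch (hypothetical values, not thesis data).
patch = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
histogram = lbp_histogram(patch)
```

The histogram, rather than the individual codes, is what is normally passed to a classifier as the texture feature vector of a region.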

This literature review has described the common problems found in the current systems and studies on breast tissue and TMAs. The next section describes the solutions and objectives proposed in this thesis.

1.4. Framework.

1.4.1. Motivation.

TMAs are tissue biopsies that have not been widely used in research. However, as mentioned above, they represent a powerful way to perform population studies on cancer in different areas of the human body. Nowadays, the management and interpretation of breast TMAs is carried out by the pathologist. However, this is a tedious job due to the amount of information acquired and the study required. For that reason, the development of automated methods for breast TMA core acquisition and classification would reduce the diagnostic time.

Although there are several studies and systems dealing with histopathological tissue and TMAs, a complete tool which automatically handles TMA cores stained with HE does not seem to exist. For that purpose, two objectives have been set in this PhD thesis: 1) the acquisition of the individual TMA cores from the whole TMA and 2) automatic TMA core classification. The second objective is the most difficult to achieve. Several algorithms that allow TMA analysis and diagnosis based on texture and colour models will be developed. The ultimate goal of this thesis is to develop a tool that encompasses both objectives. This tool should be completely automatic and achieve good classification results with an efficient computation time. Besides, it is highly desirable that the application be able to run several processes at the same time.

1.4.2. Materials for TMA Core Acquisition.

The acquisition of the digital TMA images has been carried out using the robotized microscope ALIAS II and the Aperio ScanScope. The ALIAS II microscope (LifeSpan Biosciences Inc.) has lenses for magnifications of 1.24x (thumbnail), 5x, 10x, 20x and 40x, a LED-type light source and a large format camera with a resolution of 2048 x 2048 pixels. Four TMA datasets with a total of 21244 cores have been processed. The datasets are as follows:

a) A dataset composed of 9 TMAs, 5 Gastrointestinal Stromal Tumor (GIST) with brown staining for KIT IHC and 4 of breast cancer stained with HE, prepared with a manual tissue arrayer with 56 cores/TMA and digitized with the motorized microscope ALIAS II at 5x, 10x, 20x and 40x.
b) A dataset composed of 15 TMAs (14 GISTs and 1 of breast cancer stained with HE) prepared with a manual tissue arrayer with 56 cores/TMA and digitized with the Aperio ScanScope T2 at 40x.

c) A dataset composed of 10 TMAs stained with IHC against D2-40, anti-Clusters of Differentiation (CD)34 antibodies and Alcian blue for angiogenesis research. This dataset was prepared with an automatic tissue arrayer with 70 cores/TMA and digitized with the Aperio ScanScope T2 at 20x and 40x.
d) A dataset composed of 384 TMAs stained with IHC against anti-CD123 antibodies with the ENDVISION™ FLEX (DAKO) method using a diaminobenzidine (DAB) chromogen for breast cancer analysis. This dataset was prepared with an automatic tissue arrayer with 50 cores/TMA and digitized with the Aperio ScanScope T2 at 40x.

The TMA paraffin blocks were prepared at different institutions with different biopsy core needle sizes and, as mentioned above, with both manual and automatic tissue arrayers. Therefore, the datasets cover a full range of core sizes of 1 mm, 1.5 mm and 2 mm diameter. Furthermore, because of the CMOS sensor size, the resolution and the acquisition method of the scanner differ from those of the microscope. The scanner resolution (µm/pixel) is 1.2 times higher than that of the microscope. With the microscope, the fields are acquired square by square, from the upper left corner to the lower right one, so the final image is a mosaic composed of multiple files of 2000 x 2000 pixels each. The Aperio ScanScope T2 uses a linear camera, where the acquisition file corresponds to a set of strips of 1000 x D, where D varies between 72098 and 87891 pixels.

Experiments were performed on an Intel Core i7 950 3.07 GHz computer with 12 GB RAM. The method has been implemented using C/C++ and the Intel Integrated Performance Primitives (IPP) and Open Source Computer Vision (OpenCV) libraries for image processing. Also, the Intel Threading Building Blocks (TBB) library has been used for parallel processing.

1.4.3. Materials for TMA Core Classification.
Once the breast TMAs stained with HE had been digitized, each TMA core was extracted using several image processing methods. Then, using the TMA core images at 10x, 628 representative regions of 4 tissue classes were selected. These regions of interest were selected manually under the supervision of a pathologist. The size of these regions was 200 x 200 pixels (0.74 µm/pixel at 10x) and the TMA tissue classes were: i) benign stromal tissue with low and medium cellularity (170 images), ii) adipose tissue (103 images), iii) benign structures and benign anomalous structures (163 images) and iv) different kinds of malignancy, that is, ductal and lobular carcinomas (192 images).

The first class (i) is characterized by the pink-to-blue hue that the stromal cells acquire when the tissue is stained with HE. The second class (ii) appears in the images as bubbles on the tissue stroma. The third class (iii) shows lobules and other anomalous structures. The benign anomalies represented in class (iii) are: sclerosing and adenosis lesions, fibroadenomas, tubular adenomas, phyllodes tumours, columnar cell lesions and duct ectasia, see Fig. 1.2.
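As a quick consistency check of the figures quoted in Sections 1.4.2 and 1.4.3, the short sketch below recomputes the total number of cores, the total number of ROIs and the physical side length of a ROI. It assumes that every TMA holds its nominal number of cores; the variable names and class labels are chosen here for illustration only.

```python
# (number of TMAs, nominal cores per TMA) for datasets a) to d).
datasets = {"a": (9, 56), "b": (15, 56), "c": (10, 70), "d": (384, 50)}
cores_per_dataset = {name: tmas * cores for name, (tmas, cores) in datasets.items()}
total_cores = sum(cores_per_dataset.values())

# Number of ROI images per tissue class, classes i) to iv).
rois_per_class = {
    "benign stroma": 170,
    "adipose tissue": 103,
    "benign and benign anomalous structures": 163,
    "malignant (ductal and lobular carcinomas)": 192,
}
total_rois = sum(rois_per_class.values())

# Physical side length of a 200 x 200 pixel ROI at 10x (0.74 µm/pixel).
ROI_SIDE_PIXELS = 200
MICRONS_PER_PIXEL_10X = 0.74
roi_side_microns = ROI_SIDE_PIXELS * MICRONS_PER_PIXEL_10X

print("cores per dataset:", cores_per_dataset)
print("total cores:", total_cores)
print("total ROIs:", total_rois)
print("ROI side (µm):", round(roi_side_microns, 2))
```

Under these assumptions, the totals agree with the 21244 cores and 628 ROIs stated in the text, and each ROI covers a square of roughly 148 µm per side.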

Figure 1.2: Benign structures and benign anomalous structures in TMA images stained with HE. A) Terminal ducts and lobules, B) Sclerosing lesions (radial scar), C) Adenosis lesions, D) Fibroadenomas, E) Tubular adenomas, F) Phyllodes tumours, G) Columnar cell lesions and H) Duct ectasia.

Figure 1.3: The four classes selected to perform the TMA core classification.

Finally, the fourth class (iv) is characterized by the different kinds of malignancy. The images of this class show in-situ and invasive ductal and lobular carcinomas. Some examples of the tissue classes are shown in Fig. 1.3.

Experiments for classification were performed on the aforementioned computer. The method has been implemented using C/C++ with the Intel IPP and OpenCV libraries for image processing, and MATLAB. Also, the Intel TBB library has been used for parallel processing.

1.4.4. Objectives.

The aim of this PhD thesis is to perform a complete study of breast TMA classification based on textural descriptors and colour models. As a further valuable result, the author developed a support tool for pathologists. The tool performs TMA core acquisition and classification in an efficient and reliable way. As mentioned above, this classification is based on four different breast tissue classes. To achieve this general objective, other partial objectives were needed. These objectives are denoted O1, O2, O3 and O4.

• O1: Breast TMA database acquisition
Images obtained from microscopes or scanners are images of a complete TMA, whatever the magnification at which they are acquired. An essential phase of this PhD thesis is the acquisition of the TMA core images. For that purpose, each TMA core must be detected, extracted and archived as a new image. However, the whole TMA core is not used, because the classification would be incorrect. The reason is that a given core may contain different types of tissue: stroma, adipose tissue, lobules and an in-situ ductal carcinoma. Obviously, a pathologist classifies such a core as malignant, but for a computer the benign texture might influence the result. Therefore, significant regions of interest of each class were extracted from the TMA cores. In this way, all the distracting information that could affect the classification is removed. Details about the regions of interest are given in Section 1.4.3.
• O2: Texture feature extraction
Texture descriptor analysis is a fundamental objective of this PhD thesis. It involves the study and development of the texture descriptors explained in Chapter 5. However, prior to extracting the texture features, this objective also includes the transformation of the Red Green Blue (RGB) database images to several colour models and combinations. The texture of breast tissue looks different depending on the colour model used. That is one of the keys to improving the classification results. Once the descriptor algorithms have been developed, they are used to filter all the TMA core images. The texture features are then obtained and later used to perform the classification. However, the number of features can grow rapidly with the number of descriptors and colour models. Therefore, it is necessary to reduce the feature space. Different methods to reduce the feature set were tested and compared.
• O3: Breast TMA classification
This objective addresses the TMA classification. Most classification problems depend on the selection of two factors: 1) feature descriptors and 2) classifiers. Several tests were carried out to improve the classification results: descriptors and colour models, descriptor combinations, descriptor and colour model combinations, and feature set reduction. The importance of selecting a suitable classifier should not be underestimated, since it can improve the accuracy rates obtained on our feature dataset.
• O4: TMA CAD System
Finally, the last objective is the integration of the previous objectives into a single automatic tool that supports pathologists in their daily work. The tool

(44) CHAPTER 1. INTRODUCTION does not replace the pathologist work but it will help you to save time some essential in the cancer detection.. 1.5. Structure of the Document. This section provides an overview of the chapters of this PhD thesis starting from the second chapter. • Chapter 2: Interpretation of breast tissue. This chapter is an introduction to the different types of breast tissue: benign, benign anomalous and malignant cases. Most of them will be used in the classification process. • Chapter 3: TMA core detection and extraction. As explained previously, an algorithm to extract the TMA core images is needed. This chapter explains the methodology used to select, extract and archive the TMA core images at different magnifications. • Chapter 4: Colour and histopathological image analysis. This chapter describes in detail all colour models used in this PhD thesis and their use in other histopathological analysis. • Chapter 5: Texture Descriptors. The different texture descriptors used in this PhD thesis are explained in this chapter. The literature about texture descriptors applied to TMA or other histopathological images is reviewed. Besides, this chapter covers the dimensional reduction methods used. • Chapter 6: Feature Selection and Classification. The amount of features was increased due to the number texture descriptors and colour model combinations. A dimensional reduction was need to reduce the feature dataset. Chapter 5 shows the different methods used. Besides, this chapter details the tissue classification. Classifiers and classification methods are theoretically explained. • Chapter 7: Acquisition and Classification Results. Briefly, this chapter describes the final results obtained in the TMA core acquisition and the breast tissue classification. For classification, the results are ordered depending on the use and combination of colour models, texture descriptors and features.. 12.

• Chapter 8: TMA CAD System. Chapter 8 presents the application developed to combine the TMA core acquisition and classification algorithms. In addition, this chapter shows the results obtained by the tool and the pathologists' opinion.

• Chapter 9: Conclusions. Chapter 9 presents the conclusions reached in this PhD thesis and future improvements.
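As a rough illustration of objectives O2 and O3 described earlier in this chapter, the following sketch converts the RGB pixels of a region of interest to another colour model, computes simple statistics per channel and assigns a class with a nearest-centroid rule. It is only a minimal sketch under stated assumptions: the HSV colour model, the mean, variance and entropy statistics, the nearest-centroid rule and all function names are hypothetical choices made for illustration, not the descriptors or classifiers used in this thesis.

```python
import colorsys
import math
from collections import Counter


def rgb_to_hsv_pixels(pixels):
    """Convert (r, g, b) tuples in 0..255 to (h, s, v) tuples in 0..1."""
    return [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            for r, g, b in pixels]


def channel_features(values, bins=8):
    """Return mean, variance and histogram entropy of one channel in 0..1."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Quantise the channel into `bins` levels and measure histogram entropy.
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [mean, variance, entropy]


def roi_features(pixels):
    """Feature vector of a region of interest (assumed HSV colour model)."""
    hsv = rgb_to_hsv_pixels(pixels)
    features = []
    for channel in zip(*hsv):  # iterate over the H, S and V channels
        features.extend(channel_features(list(channel)))
    return features


def nearest_centroid(features, centroids):
    """Return the label whose centroid is closest (illustrative classifier)."""
    return min(centroids,
               key=lambda label: math.dist(features, centroids[label]))
```

With three channels and three statistics, each colour model contributes nine features in this sketch, which illustrates how quickly the feature set grows when several descriptors and colour models are combined, and why a feature reduction step becomes necessary.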


CHAPTER 2. INTERPRETATION OF BREAST TISSUE.

Breast tissue is composed of a variety of types and structures. In order to classify the tissue in a breast biopsy, it is very important to recognize each type and determine whether it is benign or malignant. This chapter gives a detailed explanation of the breast tissue types and their properties. It has been prepared in collaboration with expert pathologists from the Department of Anatomical Pathology of Hospital General Universitario de Ciudad Real, who have provided additional information and insights into breast pathology.

2.1. Breast Tissue.

Breast cancer is the most common type of cancer and the fifth leading cause of death in women over the age of 40. Thus, breast screening programmes, carried out for women aged 50 to 70 without risk factors, or from the age of 40 for women with risk factors such as genetics, are very important. The aim of screening is to reduce deaths from breast cancer by finding and treating the disease at an early stage [44].

During a woman's life, her breasts may undergo a wide variety of changes, and lumps or abnormalities may appear which can be benign or malignant. 80% of the lumps found in the breast which require a biopsy are benign. These kinds of benign breast tissue are classified in different categories, such as normal tubules and ducts, sclerosing lesions, fibroadenomas, benign phyllodes tumours or duct ectasias. In this PhD thesis all benign structures are grouped under the same class, defined as benign and benign anomalous breast tissue [45].

Three types of breast tissue can be defined: benign, benign anomalous and malignant. These three types may also be divided into sub-types with different features in order to define their grade of malignancy or benignity. These features are characterized by the number and size of cells in the stroma, anomalous structures and the number of mitoses in the cells. A brief explanation of these types and their sub-types is given in the next sections.

Figure 2.1: Anatomy of the female breast. (Image extracted from [45]).

2.1.1. Benign Tissue.

The breast is composed of several types of tissue, which are mainly classified into glandular tissue and connective tissue, the latter known as stroma (see Fig. 2.1).

Figure 2.2: Benign breast tissue: (a) Stroma, (b) Fatty tissue, (c) Stroma with cellularity and (d) Lobules.

The glandular tissue contains the structures responsible for the production of breast milk, such as ducts, lobes and lobules. The lobules produce milk when a woman is breastfeeding. This milk is then carried through the ducts to the nipple. On the other hand, connective tissue, such as fatty tissue or fibrous connective tissue, is responsible for shaping the breast. See Fig. 2.2.

2.1.2. Benign Anomalous Tissue.

Pathologists should be able to distinguish and recognize a variety of anomalous benign conditions in the breast tissue. These conditions cover a wide range of tissue structures; hence, this section explains only the most common ones, which have also been used in this work.

• Sclerosing lesions: radial scar and complex sclerosing lesion

Sclerosing lesions consist of benign glands and tubules trapped and distorted by fibrous connective or fibroelastic stromal tissue. This kind of breast lesion looks like some breast cancers. Although this lesion is benign, it can be followed by adenosis, epithelial hyperplasia, papillomas and cyst formation. Sclerosing lesions usually do not cause symptoms and are found when a biopsy is performed for another reason, but their complete surgical removal is recommended.

Figure 2.3: Radial scar. (a) Tissue at low magnification and (b) High magnification of the areas of the first image. (Image extracted from [46]).

Radial scars are sclerosing lesions characterized by a central fibroelastic nucleus with glands which are trapped and surrounded by a myoepithelial layer. Peripheral ducts and lobules radiate circumferentially from the central nucleus, see Fig. 2.3. They are called scars because they look like common scars under the microscope. Complex sclerosing lesions are radial scars which look less organized.

• Sclerosing adenosis

Sclerosing adenosis is the most common type of adenosis. It is a benign breast lesion characterized by a lobulocentric proliferation of glands and tubules accompanied by a stromal proliferation, which produces variable glandular compression and distortion, see Fig. 2.4. In addition, this type of lesion causes pain and breast discomfort. Usually, sclerosing adenosis produces microscopic changes, but in some cases adenosis can produce swelling and shows up as calcifications on mammography. Adenosis is often difficult to differentiate from a malignant lesion.

Figure 2.4: Sclerosing adenosis. (a) Tissue at low magnification and (b) High magnification of the areas of the first image. (Image extracted from [46]).

• Fibroadenoma

Fibroadenoma is the most common benign breast tumour. It appears frequently in young women, especially those under 30 years old, although it can appear at any age. Usually, this lesion presents as a solitary, palpable, firm and mobile mass with an average size of less than 3 cm. There are two fibroadenoma growth patterns: an intracanalicular pattern, where the glands are distorted, stretched and compressed in the stroma, and a pericanalicular pattern, where the stroma surrounds glandular structures with lumen, Fig. 2.5. Both patterns can coexist in the breast. However, a fibroadenoma with an intracanalicular pattern can be interpreted as a phyllodes tumour or as an intraductal papilloma. Although this kind of lesion does not increase the risk of breast cancer, its removal is recommended if it continues to grow. In some cases, if the pathologist is sure that the lesion is a fibroadenoma, it does not need to be removed but requires periodic review.

Figure 2.5: Fibroadenoma. (a) Intracanalicular pattern and (b) Pericanalicular pattern. (Image extracted from [46]).

• Tubular adenoma

The tubular adenoma is a well-defined breast lesion which shares common features with fibroadenomas. This type of lesion usually occurs in young women and does not show external signs on the skin or the nipple. Like the fibroadenoma, it is well defined, but it is softer and has a brownish tinge. The tubular adenoma presents predominantly tubular elements and a minimal amount of fibrous stroma. Tubules are lined by a double row of cells, but myoepithelial cells are often inconspicuous, Fig. 2.6.

Figure 2.6: Tubular adenoma. (Image extracted from [46]).
