
Retinopathy Detection

In this section, we extend the global, data-driven, self-explainable referable DR detector presented in Section 3.2.7 to also incorporate local, two-tier, saliency-oriented representations. To answer Question Q2.8, we use locally significant image regions to capture evidence that can be emphasized to enhance the model. Note that this stage requires a reasonably well-optimized model trained in the previous stage (last section).

[Figure 3.4 diagram. Panels: Training, Testing. Branches: Global approach, Local saliency-oriented approach. Blocks: Referral DNN, Features, Patches, Map(GT), Map(+), Map(-), Fisher Vector, NN1, NN2, AVG. Legend: information flow, network probabilities.]

Figure 3.4: Overview of the proposed method. Training: two Neural Networks (NN) are trained; the first based on features from a trained referral deep neural network and a second one based on lesion patches. Testing: the testing phase combines results of the two trained neural networks plus the probabilities of the already trained referral deep neural network.

In the next step, we describe the saliency-oriented data-driven local methodology that reinforces the performance of the purely global approach as well as the understanding of the solution. Fig. 3.4 depicts an overview of the proposed solution.

Local saliency-oriented data-driven approach

One of the main novelties of this work is our saliency-oriented local representation methodology, which relies on heatmaps to gather significant regions (from the previous data-driven global decision) in order to enhance the pipeline and provide a more robust screening method. In this section, we describe the patch extraction protocol from an image pre-processing viewpoint, briefly detail the encoding technique (Fisher Vector), and describe the methodology for local feature representation.

Patch extraction

After passing the original image through the deep network and propagating back the pixel importance for the decision taken, we obtain a saliency map (see Fig. 6.11). We then process the map and collect coordinates that are subsequently used to extract regions that could be relevant to enhance the decision.

The saliency map has the same dimensions as the image. Because we intend to capture importance while preserving locality, we first convert the 3D tensor into a 2D grayscale tensor by summing the activations across channels. Heatmaps are generally subject to visual noise. To reduce undesirable effects on region selection, we binarize the map with a threshold. Other recent alternatives, such as adding noise to reduce noise [98], could be explored, but a single threshold was a reasonable choice for efficiency. We filtered the maps with a threshold of 150, a good trade-off between removing noise and preserving small activations. Next, we invert the binary map and erode it (a basic mathematical morphology operation) in order to extend chunks and connect nearby components.
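The thresholding and morphology steps described above can be sketched as follows. This is a minimal NumPy illustration and not the original implementation: it assumes the summed map is rescaled to [0, 255] before the threshold of 150 is applied, that high values mark salient pixels, and that erosion uses a 3x3 square structuring element. The function names are hypothetical.

```python
import numpy as np

def binarize_saliency(saliency, threshold=150):
    """Collapse an H x W x C saliency tensor into a binary H x W mask."""
    gray = saliency.sum(axis=2).astype(np.float64)  # sum activations over channels
    span = gray.max() - gray.min()
    if span > 0:
        # Assumption: rescale to [0, 255] so that a threshold of 150 is meaningful.
        gray = (gray - gray.min()) / span * 255.0
    else:
        gray = np.zeros_like(gray)
    return gray >= threshold

def erode(mask, iterations=1):
    """Binary erosion with a 3x3 square structuring element (assumed shape)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        # Pad with True so that the image border does not erode the mask.
        padded = np.pad(out, 1, mode="constant", constant_values=True)
        eroded = np.ones_like(out)
        for dy in range(3):
            for dx in range(3):
                eroded &= padded[dy:dy + h, dx:dx + w]
        out = eroded
    return out

def salient_regions(saliency, threshold=150, iterations=1):
    """Threshold, invert, erode; True pixels in the result are salient."""
    binary = binarize_saliency(saliency, threshold)
    inverted = ~binary                   # salient pixels become False
    grown = erode(inverted, iterations)  # eroding the background grows salient blobs
    return ~grown                        # flip back so that True marks salient area
```

Eroding the inverted mask is equivalent to dilating the salient pixels, which is why nearby activations merge into a single component.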

their respective coordinates. To preserve the aspect ratio and produce regions that remain visible towards their boundaries (e.g., keep the boundaries of lesions), we square the regions and enlarge them by a factor inversely proportional to the original patch size. This operation doubles the height and width of small regions (microaneurysm candidates) and extends the dimensions of large regions (in general, blood vessels or possibly connected large lesions) by 10%. In Fig. 3.5, we show a fundus image superimposed with its saliency map, together with the significant regions extracted based on pixel importance for the data-driven referral decision.

Figure 3.5: Saliency-oriented squared patches from which we extract local descriptors. We enlarge patches in a controlled manner that takes the region sizes into account: e.g., small regions (in general microaneurysms) are enlarged more than large regions (e.g., soft and hard exudates).
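The squaring and size-dependent enlargement can be illustrated with the sketch below. The text fixes only the two extremes (a factor of 2 for small regions and 1.1 for large ones); the interpolation formula, the `small` reference size, and the function name are assumptions made for illustration.

```python
def enlarge_patch(x, y, w, h, img_w, img_h, small=16):
    """Square a bounding box and enlarge it by a size-dependent factor.

    `small` is a hypothetical side length (in pixels) at or below which a
    region is doubled; the factor then decays towards 1.1 as the side grows.
    """
    side = max(w, h)                                # square the region
    factor = min(2.0, 1.1 + 0.9 * small / side)     # assumed inverse-size rule
    new_side = int(round(side * factor))
    cx, cy = x + w / 2.0, y + h / 2.0               # keep the region centred
    x0 = int(round(cx - new_side / 2.0))
    y0 = int(round(cy - new_side / 2.0))
    # Shift the patch back inside the image when it crosses a border.
    x0 = max(0, min(x0, img_w - new_side))
    y0 = max(0, min(y0, img_h - new_side))
    return x0, y0, new_side, new_side
```

For example, a 10 x 8 region is squared to 10 x 10 and doubled to 20 x 20, whereas a 160 x 160 region grows to 190 x 190 under the assumed rule.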

Fisher Vector encoding

Once we have the patches, we use a Fisher Vector encoding strategy to capture their local descriptors by pooling patch features [72, 73]. Combining generative and discriminative techniques, the encoding characterizes low-level patch descriptors by their deviation from a Gaussian mixture model (GMM, the generative model), computed as the gradient of the patch log-likelihood with respect to the model parameters.

Integration

After training the deep model for referable diabetic retinopathy detection, generating the saliency maps for interpretation and local representation, and encoding the multiple patch-based features, we train a shallow neural network to make a new, complementary decision on whether the patient needs a consultation.

To prevent the new model from depending strictly on the decisions made by the baseline CNN model, we extract two separate mid-level representations for the test sets, one for each class. We extract the corresponding maps by guided backpropagation, each guided by one specific class/neuron. For the training set, the ground truth is known, so we can use it to extract a single saliency map and its respective mid-level representation. As shown in Fig. 3.4, at inference time the final local saliency-oriented decision is taken by averaging the two per-class responses.
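The inference rule for the local branch can be sketched as below. The names `nn2_predict`, `fv_positive`, and `fv_negative` are hypothetical: the first stands for the trained shallow network returning a referral probability, and the other two for the mid-level representations built from the maps guided by the referable and non-referable class, respectively.

```python
def local_decision(nn2_predict, fv_positive, fv_negative):
    """Average the responses obtained from the two class-guided maps."""
    p_pos = nn2_predict(fv_positive)  # features from the map guided by the referable class
    p_neg = nn2_predict(fv_negative)  # features from the map guided by the non-referable class
    return 0.5 * (p_pos + p_neg)
```

Averaging both responses means that the local decision does not need the baseline prediction to choose which map to use at test time.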

Per Patient Analysis

As highlighted in Section 3.2.4, the more data available, the more confident and effective the learned model. A robust data-driven model requires a large amount of data, except when parameters previously learned on a different but similar task are transferred. Since our purpose is to determine whether or not a patient needs to see a doctor within one year, we can substantially improve the accuracy of the model when photographs of both retinas are available. We therefore combine image information to provide the final patient response at both the feature level and the score level. When the method involves feature extraction, we concatenate the features of both retinas and include a binary indicator variable that refers to the left or right eye (see Equation 3.4). At the score level, we assign to the patient the response of the retina with the highest probability of needing referral.
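Both fusion levels can be illustrated with the short sketch below. The exact layout of Equation 3.4 is not reproduced here, so the position of the indicator variable (0 for left, 1 for right, appended after each eye's features) is an assumption, as are the function names.

```python
def patient_features(left_features, right_features):
    """Feature-level fusion: concatenate both eyes with an eye indicator.

    Assumed layout: [left features, 0, right features, 1].
    """
    return list(left_features) + [0.0] + list(right_features) + [1.0]

def patient_score(p_left, p_right):
    """Score-level fusion: keep the eye with the highest referral probability."""
    return max(p_left, p_right)
```

Taking the maximum at the score level reflects the screening goal: a patient is referred as soon as one retina indicates the need for consultation.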
