Image forgery analysis technique used for tampering detection.

Academic year: 2020

Image Forgery Analysis Technique used for Tampering Detection

by

Fausto Isaac Reyes González

Thesis submitted as partial requirement for fulfillment of the degree of

Master in Science, Computer Science

at

Instituto Nacional de Astrofísica, Óptica y Electrónica

June, 2018

Tonantzintla, Puebla

Advisor:

Dr. René Armando Cumplido Parra
Computer Science Department
INAOE

© INAOE 2018
All rights reserved

The author grants INAOE permission for reproduction and distribution of this dissertation

To my wife, for so much help and so many contributions, not only to the development of my thesis but also to my life; to my mother, for the unlimited and unconditional support you have always given me; to my father, for allowing me to pursue all my dreams and encouraging me to achieve them. Thank you for your unconditional support and understanding.

To my advisor and thesis director, Dr. René A. Cumplido Parra, for the guidance and support that allowed me to make good use of the work carried out and that brought this thesis to a successful conclusion.

To my thesis committee, thank you for giving me the opportunity and for the time you have devoted to reading this work.

To the Consejo Nacional de Ciencia y Tecnología (CONACYT), for granting me the scholarship

Contents

Abstract

1 Introduction
1.1 Motivation
1.2 Main Objective
1.3 Specific Objectives
1.4 Thesis Organization

2 Background
2.1 Methods: Active and Passive approaches
2.1.1 Attacks
2.2 Color Spaces
2.2.1 Color Appearance Terminology
2.2.2 RGB Color Space
2.2.3 L*a*b Color Space
2.3 Saliency
2.3.1 Object Saliency Detection
2.4 Image Segmentation
2.4.1 Superpixel segmentation
2.5 Classification
2.5.1 Techniques
2.5.2 Convolutional Neural Networks

3 Related Work
3.1 Active Methods
3.2 Passive Methods
3.2.1 Edge detection Methods
3.2.2 Noise irregularities
3.2.3 Illuminant features
3.2.4 Texture Features
3.2.5 Other Features
3.3 Saliency Detection
3.3.1 Object saliency for tampering detection
3.4 Conclusion

4 Proposed Method
4.1 Color Space Conversion
4.2 Object Saliency Detection
4.3 Classification
4.3.1 Feature Extraction
4.3.2 Classifiers
4.4 Object Segmentation
4.5 Image Restraints

5 Experiments and Results
5.1 Datasets
5.1.1 CASIA I
5.1.2 CASIA II
5.1.3 Columbia Color
5.1.4 Developed Dataset
5.2 Analysis and Discussion

6 Conclusions and Future Work
6.1 Summary
6.2 Conclusion
6.2.1 Main objective
6.2.2 Specific Objectives

List of Figures

2.1 Image Forgery Detection Approaches
2.2 Active Approach
2.3 Passive Approach
2.4 Summary of attacks
2.5 (a) Splicing Attack, (b) Copy-Move Attack
2.6 RGB model representation
2.7 L*a*b model representation
2.8 HSV model representation
2.9 Visual Saliency Examples
2.10 Superpixel Segmentation Example

3.1 Difference between edges in an image
3.2 Noise Estimation Methods
3.3 Itti - Koch Model Architecture

4.2 Analyzed Image
4.3 Hue
4.4 Saturation
4.5 "a" channel
4.6 Saliency detection
4.7 AlexNet Architecture [49]
4.8 Characteristics of restraints

5.1 Example of Images within CASIA v1.0
5.2 Example of Images within CASIA v2.0
5.3 Example of Images within Columbia Color
5.4 Example of Images within Own Developed Dataset
5.5 ROC curve for CASIA v1.0, AUC = 0.9662
5.6 Examples from CASIA v1.0
5.7 Examples from CASIA v1.0
5.8 ROC curve for CASIA v2.0, AUC = 0.9704
5.9 Examples from CASIA v2.0
5.10 ROC curve for Columbia Dataset, AUC = 0.9828
5.11 Examples from Columbia Color Dataset
5.12 ROC curve for OWN Dataset, AUC = 0.9253

A.1 Proven Color Spaces
A.2 HSL color space
A.3 HCL color space
A.4 HSV color space
A.5 L*a*b color space
A.6 YCbCr color space

List of Tables

3.1 Summary of cited works that focus on passive approaches

5.1 Complete results from CASIA v1.0 evaluation experiments
5.2 Comparison Results for CASIA v1.0
5.3 Complete results from CASIA v2.0 evaluation experiments
5.4 Comparison Results for CASIA v2.0
5.5 Complete results from Columbia Color evaluation experiments
5.6 Comparison Results for Columbia Color
5.7 Complete results from Developed Dataset evaluation experiments
5.8 Best Result for Own Dataset

Abstract

Images are found everywhere, from advertisements in printed media to the extensive information available over the Internet. However, it is not precisely known how much of this digital content is free of modification; thus it is important to have identification methods. Image forgery detection techniques aim to identify whether the media has been modified without prior information about the source, either by identifying the false object or by only assessing whether the image is authentic or counterfeit. One of the most common alterations made to images is called splicing, which consists of cropping an object or region from one image and attaching it to a different image in order to change the information present in the new source. Several methods address this attack by dividing the image into blocks or regions and extracting characteristics in an effort to identify the inconsistencies caused by the attack. These features can be edges, blobs, or characteristics of noise or illumination changes, obtained throughout the image or in sections. In this work, we propose a method that uses a human perception model called Object Saliency Detection, which helps to locate the region where splicing may occur so that it can later be determined whether the region is part of the attack. Afterwards, a superpixel segmentation process helps us to identify the copied object and point out the splicing attack correctly.

Chapter 1

Introduction

With the increase in media content found over the Internet, it is becoming more difficult to discern between authentic and false information, mainly in images, which are more likely to be modified. The wide range of software available to alter information has led to an increase in forgery. Additionally, digital media such as images and video transmitted through computer networks have a low level of security due to the open nature of the Internet. Information can be altered for different purposes, from hiding information in the original media to adding information to images or video. This kind of media can be easily manipulated with almost no perceptual distortion, and these alterations are sometimes used to generate strife or damage. Multimedia data of this kind needs a means of authentication; when images serve as vital information, it is of utmost importance to know whether they are pristine or altered. This application is essential for identifying the victim (the modified image) and for serving as evidence for multiple purposes. In order to ensure a proper integrity service for images, it is important to distinguish malicious manipulations, which divert the original image content, from manipulations associated with the use or digital storage of the image, such as format conversion, compression, resampling, and filtering, made by content providers or users themselves. There are two main approaches to this problem: active methods, which rely on information embedded inside the original image, and passive methods, which depend only on the image to be analyzed. It is important to recognize that forgery analysis in real-life scenarios differs from the specific attacks studied in research, because modified images often suffer from more than one type of forgery; additionally, in most cases the only information available is the tampered image, so a passive method seems to be the more practical approach. Image modification can be done for several purposes, from recreational uses to more objectionable cases such as the one mentioned by Farid [1], in which a researcher fabricated results that were published in one of the most prestigious scientific journals, leading to increased awareness that submitted information can be tampered with.

In this work, we propose to develop an image forgery detection technique, primarily focused on the splicing attack using a passive method approach. Additionally, we are going to work with Object Saliency Detection, which determines the most notable object or region inside an image.

1.1 Motivation

It is becoming easier to modify images; as such, tampering detection is an important area in which to further knowledge, and developing new methods for image authentication or refining current ones will enhance this application. There are several methods for image tampering detection. Active methods, which rely on inserting information into the images, need more control and supplementary processing, whereas passive methods are more realistic in the sense that in real-life applications we only have the media to analyze. Passive methods cover diverse ideas, from extracting noise inconsistencies from the tampered regions to classifying features to determine the validity of the image. Most works are based on obtaining features from the image; these can also be inconsistencies present inside images. These methods can be improved: they can generally only identify a region of the image in which tampering occurred, and it is quite challenging to identify the spliced object itself. Thus, if it is possible to assess a region of interest and determine whether this selection is original or false, then correct identification of the spliced object becomes feasible. To solve this issue, this work proposes applying a segmentation process only over the selected region.

1.2 Main Objective

Propose an Image Forgery Detection Technique focused on the splicing attack, based on a passive approach and capable of detecting a forged region by using image segmentation.

1.3 Specific Objectives

• Propose a passive approach in order to identify tampering over images focused on the "Splicing" attack.

• Using the proposed approach, obtain the spliced object from the analyzed image.

1.4 Thesis Organization

This dissertation is organized as follows:

Chapter 2 introduces the concepts of tampering detection, starting from the origins of forgery detection and continuing with its concepts and methods. It then presents background on color spaces and on the human perception models known as saliency detection.

Chapter 3 reviews the state-of-the-art methods for image forgery detection, with a main focus on splicing detection. It also covers works on saliency detection used as a human perception model, feature extraction models, and related work on forgery detection that incorporates object saliency.

Chapter 4 introduces and describes the proposed method, which merges a saliency detection method with classification and segmentation for splicing detection. Each stage of the proposed method is explained, together with how it improves on current works.

Chapter 5 presents the results for all the datasets used in this thesis, along with the conclusions and findings for each one.

Chapter 6 presents the conclusions and major contributions of this work.

Appendix A presents images from the experiments used to design the first step of the method.

Chapter 2

Background

As an outcome of the growth in digital media manipulation, and focusing on malicious modifications of media such as images, Digital Forensics and Digital Watermarking, as branches of multimedia security [2], aim at exposing these ill-disposed alterations. The forensic analysis of digital images refers to the reconstruction of the generation process of a given digital image, where the main focus lies on inference about the image’s authenticity and origin [3]. In contrast, watermarking is defined by Cox as the practice of imperceptibly altering a work to embed a message about it [4]. Digital watermarking consists of hiding a mark or a message in a picture in order to protect its copyright [2]. Furthermore, image processing for forensics shares similar techniques with digital watermarking and also with steganography, which consists of communicating secretly via some media. Taking this into account, and in order to enhance the techniques, it is important to have a classification of the image authentication approaches. Those that have information embedded in the image, either a digital signature or a watermark, are considered active approaches, whilst those where the image comes as it is are called passive methods [5]. In this chapter both approaches are presented, together with concepts of color spaces, saliency, segmentation, and classification.

one, being methods that use statistical characteristics or DCT coefficients and avoid employing semantic information. At the next level, tampering operations are detected with simple semantic data such as sharp edges, blurring, and light inconsistencies. The high level is also known as the semantic level; it is difficult for a computer to use semantic information for tampering detection, because the aim of tampering is precisely to change the meaning that the image content originally conveyed.

For visual understanding, figures 2.2 and 2.3 depict both methods.

Figure 2.2: Active Approach

2.1.1 Attacks

Now that the main methods to detect image forgery have been explained, it is necessary to classify the types of forgeries present in the literature.

Modifications are categorized into three primary groups; according to A. Kashyap [8], these are Copy-Move forgery, Image Splicing, and Image Resampling. In copy-move forgery, also known as cloning, a part of the picture of any size and shape is copied and pasted to another area in the same picture to conceal some important data. As the copied part originates from the same image, its essential properties such as noise, color, and texture do not change, which makes the recognition process troublesome. There are two variants of this attack: plain copy-move, in which one area of an image is copied and pasted onto the same image, and copy-create, in which various parts are copied from one or more different images and pasted together to create a forged image [6]. Image splicing uses cut-and-paste operations on one or more images to create another, fake image. When splicing is performed precisely, the borders between the spliced regions can be visually imperceptible. These insights can, therefore, be used to help distinguish the tampered areas. To make a convincing forged image, some selected regions have to undergo geometric transformations such as rotation, scaling, stretching, skewing, and flipping. The interpolation step plays an essential role in this resampling process and introduces non-negligible statistical changes: resampling introduces specific periodic correlations into the image, which can be used to recognize the modification. Furthermore, tampering can be soft, in the sense that it does not modify the contents but changes the image’s quality. This kind of tampering includes operations such as contrast and brightness adjustment, up-sampling, down-sampling, zooming, and rotation [9]. Figure 2.4 summarizes the attacks, and Figure 2.5 illustrates the splicing and copy-move attacks with example images.

Figure 2.4: Summary of attacks

2.2 Color Spaces

According to [10], the human eye behaves much like a camera: the cornea and lens act like the camera lens to focus an image on the retina at the back of the eye, which in turn takes the role of the image sensor. Our visual perception originates in and is influenced by the anatomical structure of the eye. These structures have a significant impact on our perception of color, and there are additionally other important cognitive visual mechanisms that affect color appearance. Some important phenomena that affect color appearance are based on color constancy, which refers to the common perception that the colors of objects remain unchanged across significant changes in illumination color and luminance level [10]. Color constancy is served by the mechanisms of chromatic adaptation and memory color, and can easily be shown to be very poor when careful observations are made. Memory color refers to the phenomenon that recognizable objects often have a prototypical color associated with them. In short, color constancy refers to the ability to recognize the colors of objects independently of the color of the light source. Following these concepts, it is possible to estimate the illuminants from the source image; Van de Weijer [11] describes five different approaches for doing so.

Figure 2.5: (a) Splicing Attack, (b) Copy-Move Attack

2.2.1 Color Appearance Terminology

As part of the color appearance terminology present in the literature, we have the following most common terms [12]:

Color: Attribute of visual perception consisting of any combination of chromatic and achromatic content.

Hue: Attribute of a visual sensation according to which an area appears to be similar to one of the perceived colors red, yellow, green, and blue, or to a combination of two of them. It is the property of a color that varies in passing from red to green [12].

Lightness: The brightness of an area judged relative to the brightness of a similarly illuminated area that appears to be white or highly transmitting.

Colorfulness: Attribute of a visual sensation according to which the perceived color of an area appears to be more or less chromatic.

Chroma: Colorfulness of an area judged as a proportion of the brightness of a similarly illuminated area that appears white or highly transmitting. It describes the intensity of the hue in a given color stimulus.

Contrast: Typically defined as the difference between the maximum and minimum luminance in a stimulus divided by the sum of the maximum and minimum luminances.

Saturation: Colorfulness of an area judged in proportion to its brightness. Saturation is a unique perceptual experience separate from chroma. Like chroma, saturation can be thought of as relative colorfulness. However, saturation is the colorfulness of a stimulus relative to its own brightness, while chroma is colorfulness relative to the brightness of a similarly illuminated area that appears white.
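The definition of contrast given in the list above is the one commonly known as Michelson contrast. Written as a formula, with L_max and L_min as symbols introduced here (they do not appear in the original text) for the maximum and minimum luminance in the stimulus:

\[
C = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}
\]

Under this definition a uniform stimulus has C = 0, and a stimulus whose darkest point has zero luminance has C = 1.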

2.2.2 RGB Color Space

The RGB color space is a linear color space that formally uses single-wavelength primaries [12]. Red, green, and blue are the three additive primary colors (individual components are added together to form a desired color) and are represented by a three-dimensional Cartesian coordinate system [13]. Figure 2.6 shows a representation of this model.

2.2.3 L*a*b Color Space

CIELAB was developed as a color space to be used for the specification of color differences [10]. The L* measure is lightness ranging from 0.0 for black to 100.0 for a diffuse white. The a* and b* dimensions correlate approximately with red–green and yellow–blue chroma perceptions. To calculate the L*a*b values from RGB images, first we need to convert from RGB space to XYZ, afterwards we convert to L*a*b color space, for this we use equations 2.1 through 2.4:

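Figure 2.6: RGB model representation

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
=
\begin{bmatrix}
0.412453 & 0.357580 & 0.180423 \\
0.212671 & 0.715160 & 0.072169 \\
0.019334 & 0.119193 & 0.950227
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{2.1}
\]

\[
L^{*} = 116 \left( \frac{Y}{Y_n} \right)^{1/3} - 16
\tag{2.2}
\]

\[
a^{*} = 500 \left[ \left( \frac{X}{X_n} \right)^{1/3} - \left( \frac{Y}{Y_n} \right)^{1/3} \right]
\tag{2.3}
\]

\[
b^{*} = 200 \left[ \left( \frac{Y}{Y_n} \right)^{1/3} - \left( \frac{Z}{Z_n} \right)^{1/3} \right]
\tag{2.4}
\]

where \(X_n\), \(Y_n\), and \(Z_n\) are the X, Y, and Z coordinates of a reference white patch.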
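The conversion in equations 2.1 through 2.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the thesis: the RGB values are assumed to be scaled to [0, 1], the reference white is assumed to be the XYZ value of RGB = (1, 1, 1), and the function names are hypothetical. Note that the cube-root form as written yields L* = -16 for pure black; the full CIE definition adds a linear segment for very small ratios, which is omitted here.

```python
# Sketch of Eqs. 2.1-2.4: RGB -> XYZ -> L*a*b*.
# Assumptions (not from the text): RGB scaled to [0, 1]; the reference
# white (Xn, Yn, Zn) is the XYZ value of RGB = (1, 1, 1).

M = [
    [0.412453, 0.357580, 0.180423],
    [0.212671, 0.715160, 0.072169],
    [0.019334, 0.119193, 0.950227],
]


def rgb_to_xyz(r, g, b):
    """Eq. 2.1: multiply the RGB vector by the 3x3 conversion matrix."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in M)


WHITE = rgb_to_xyz(1.0, 1.0, 1.0)  # assumed reference white


def xyz_to_lab(x, y, z, white=WHITE):
    """Eqs. 2.2-2.4, using the cube-root form only."""
    xn, yn, zn = white
    fx = (x / xn) ** (1.0 / 3.0)
    fy = (y / yn) ** (1.0 / 3.0)
    fz = (z / zn) ** (1.0 / 3.0)
    l_star = 116.0 * fy - 16.0
    a_star = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return l_star, a_star, b_star


def rgb_to_lab(r, g, b):
    return xyz_to_lab(*rgb_to_xyz(r, g, b))
```

With these assumptions, white maps to L* = 100 with a* and b* equal to zero, and any neutral gray also has a* = b* = 0, which matches the description of a* and b* as red-green and yellow-blue correlates.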

In Figure 2.7 there is the visual Cartesian representation of the L*a*b color space.

Figure 2.7: L*a*b model representation

2.2.4 HSV Color Space

HSV space (for hue, saturation, and value) is obtained by looking down the center axis of the RGB cube [12]. A visual representation can be found in Figure 2.8. In situations where color description plays an integral role, the HSV color model is often preferred over the RGB model. The HSV model describes colors similarly to how the human eye tends to perceive color, using more familiar notions such as color, vibrancy, and brightness. In order to convert from the RGB color space to the HSV color space, we first normalize the values of each individual channel (eq. 2.5) and then calculate the chroma (eq. 2.6). From these we can obtain the individual channels for hue, saturation, and value (eqs. 2.7 through 2.9) [12].

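\[
R' = \frac{R}{255}, \qquad G' = \frac{G}{255}, \qquad B' = \frac{B}{255}
\tag{2.5}
\]

\[
C_{\max} = \max(R', G', B'), \qquad C_{\min} = \min(R', G', B'), \qquad \Delta = C_{\max} - C_{\min}
\tag{2.6}
\]

\[
H =
\begin{cases}
0^{\circ}, & \Delta = 0 \\
60^{\circ} \times \left( \dfrac{G' - B'}{\Delta} \bmod 6 \right), & C_{\max} = R' \\
60^{\circ} \times \left( \dfrac{B' - R'}{\Delta} + 2 \right), & C_{\max} = G' \\
60^{\circ} \times \left( \dfrac{R' - G'}{\Delta} + 4 \right), & C_{\max} = B'
\end{cases}
\tag{2.7}
\]

\[
S =
\begin{cases}
0, & C_{\max} = 0 \\
\dfrac{\Delta}{C_{\max}}, & C_{\max} \neq 0
\end{cases}
\tag{2.8}
\]

\[
V = C_{\max}
\tag{2.9}
\]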
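Equations 2.5 through 2.9 translate almost line by line into code. The following is a minimal sketch assuming 8-bit RGB input (0 to 255) and a hue expressed in degrees; the function name is illustrative and not taken from the thesis.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S in [0, 1], V in [0, 1])."""
    # Eq. 2.5: normalize each channel to [0, 1].
    rp, gp, bp = r / 255.0, g / 255.0, b / 255.0

    # Eq. 2.6: chroma is the spread between the largest and smallest channel.
    c_max = max(rp, gp, bp)
    c_min = min(rp, gp, bp)
    delta = c_max - c_min

    # Eq. 2.7: hue depends on which channel is the maximum.
    if delta == 0:
        h = 0.0
    elif c_max == rp:
        h = 60.0 * (((gp - bp) / delta) % 6)
    elif c_max == gp:
        h = 60.0 * ((bp - rp) / delta + 2)
    else:
        h = 60.0 * ((rp - gp) / delta + 4)

    # Eq. 2.8: saturation is chroma relative to the maximum channel.
    s = 0.0 if c_max == 0 else delta / c_max

    # Eq. 2.9: value is the maximum channel.
    v = c_max
    return h, s, v
```

For example, pure red, green, and blue map to hues of 0, 120, and 240 degrees respectively, and any gray level has zero saturation because its chroma is zero.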

Figure 2.8: HSV model representation

2.3 Saliency

According to Li [14], one central task of the human vision system is to efficiently detect the important visual subsets, in other words, the salient subsets. These subsets are conspicuous and are processed with high priority in our brain, while other subsets are often inhibited or even ignored to increase processing efficiency. Additionally, it is believed that visual saliency plays an important role in the mechanisms that process the massive amount of visual information received by our vision system. Visual saliency refers to the idea that certain parts of a scene are pre-attentively distinctive [15] and create some form of immediate, significant visual arousal within the early stages of the Human Vision System (HVS).

Figure 2.9: Visual Saliency Examples

2.3.1 Object Saliency Detection

There are several ways to obtain the most salient subsets in an image, ranging from saliency map models to object-based saliency models. Usually, approaches to detecting salient objects in a scene fall into two categories. The first category uses location-based saliency maps: salient objects are extracted from the saliency maps computed by location-based saliency models. In the second category, the saliency of objects is measured directly. As explained by Harel [16], the leading models of visual saliency may be organized into these three stages:

• Extraction: extract feature vectors at locations over the image plane.

• Activation: form an "activation map" (or maps) using the feature vectors.

• Normalization/combination: normalize the activation map (or maps, followed by a combination of the maps into a single map).
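The staged organization described above can be illustrated with a deliberately simple toy pipeline. This sketch is not Harel's model or any published saliency method: it assumes a grayscale image given as a list of rows, uses raw intensity as the only feature, and takes the distance from the global mean as a stand-in activation. All function names are hypothetical.

```python
def extract_features(image):
    # Feature extraction: here the only feature is the pixel intensity.
    return [[float(v) for v in row] for row in image]


def activation_map(features):
    # Activation: a toy measure of how much each location differs
    # from the global mean of the feature map.
    flat = [v for row in features for v in row]
    mean = sum(flat) / len(flat)
    return [[abs(v - mean) for v in row] for row in features]


def normalize(activation):
    # Normalization: rescale the activation map to the range [0, 1].
    flat = [v for row in activation for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in activation]
    return [[(v - lo) / (hi - lo) for v in row] for row in activation]


def toy_saliency(image):
    return normalize(activation_map(extract_features(image)))
```

With a single bright pixel on a dark background, the bright pixel receives saliency 1 and the background receives 0. Real models use several feature channels (for example color and orientation) and combine their normalized maps into one, which this sketch omits.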

2.4 Image Segmentation

One natural view of segmentation is that we are attempting to determine which components of a data set naturally “belong together” [17]. Segmentation is a fundamental low-level operation on images. If an image is already partitioned into segments, where each segment is a “homogeneous” region, then a number of subsequent image processing tasks become easier. A homogeneous region is a group of connected pixels in the image that share a common feature, such as brightness, color, texture, or motion [12]. In addition to grouping, boundary detection is the dual goal of image segmentation: if the boundaries between segments are specified, the individual segments themselves are identified as well.

2.4.1 Superpixel segmentation

A superpixel is an image patch that is better aligned with intensity edges than a rectangular patch. These patches can be used to replace the rigid structure of the pixel grid, and they can be generated fairly simply with several methods. Superpixels are an over-segmentation of an image or, seen the other way around, a perceptual grouping of pixels [18]. Instead of finding the few foreground segments that correspond to objects, superpixel segmentation algorithms split the image into typically 25 to 2500 segments. The objective of this over-segmentation is a partitioning of the image such that no superpixel is split by an object boundary, while objects may be divided into multiple superpixels. This way, the object outlines can be recovered from the superpixel boundaries at later processing stages. The approach presented in Achanta [19], which is used in this dissertation, adapts a k-means clustering algorithm to generate the superpixels; it is easy to use and offers flexibility in the compactness and number of superpixels it generates. Additionally, the superpixel computation can be made adaptive to the image characteristics. The use of this method is presented and described in Chapter 4.
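To make the k-means idea behind this kind of superpixel method concrete, the sketch below clusters pixels by a combined intensity and spatial distance, searching only a local window around each cluster center. It is a simplified illustration, not the implementation of [19] nor the one used in this thesis: it works on grayscale intensity instead of color, and it omits refinements such as seed perturbation and connectivity enforcement. The function name and the parameters `k`, `m`, and `iterations` are assumptions made for this example.

```python
import math


def slic_gray(image, k, m=10.0, iterations=5):
    """Simplified SLIC-style clustering of a grayscale image (list of rows).

    k is the approximate number of superpixels; m weights spatial
    proximity against intensity similarity (the compactness).
    """
    h, w = len(image), len(image[0])
    step = max(1, int(math.sqrt(h * w / k)))  # grid interval S

    # Initialise cluster centres (intensity, y, x) on a regular grid.
    centers = [[float(image[y][x]), float(y), float(x)]
               for y in range(step // 2, h, step)
               for x in range(step // 2, w, step)]
    labels = [[-1] * w for _ in range(h)]

    for _ in range(iterations):
        dist = [[math.inf] * w for _ in range(h)]
        # Assignment: each centre only competes for pixels in a local window.
        for idx, (ci, cy, cx) in enumerate(centers):
            y0, y1 = max(0, int(cy) - step), min(h, int(cy) + step + 1)
            x0, x1 = max(0, int(cx) - step), min(w, int(cx) + step + 1)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    d_color = image[y][x] - ci
                    d_space = math.hypot(y - cy, x - cx)
                    d = math.sqrt(d_color ** 2 + (d_space / step) ** 2 * m ** 2)
                    if d < dist[y][x]:
                        dist[y][x] = d
                        labels[y][x] = idx
        # Update: move each centre to the mean of its assigned pixels.
        sums = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for y in range(h):
            for x in range(w):
                s = sums[labels[y][x]]
                s[0] += image[y][x]
                s[1] += y
                s[2] += x
                s[3] += 1
        for idx, (si, sy, sx, n) in enumerate(sums):
            if n:
                centers[idx] = [si / n, sy / n, sx / n]
    return labels
```

On an image with a sharp vertical edge, the intensity term dominates the distance near the edge, so no resulting cluster mixes pixels from both sides. A larger `m` gives more compact, grid-like superpixels, while a smaller `m` lets them follow intensity edges more closely.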


2.5 Classification

One important part of the proposed technique is the classification step. The first phase of a classification algorithm is feature selection; the output of the algorithm for a test instance may then be presented in one of two ways: as a discrete label or as a numerical score [20]. For classification, the data to be analyzed are segmented on the basis of a training data set, which encodes knowledge about the structure of the groups in the form of a target variable, and a testing set is then assigned labels or scores that indicate the group to which each instance belongs. Thus, while the segmentation of the data is usually related to notions of similarity, as in clustering, significant deviations from similarity-based segmentation may be achieved in practical settings. Because the groups are learned from labeled examples, the classification problem in this dissertation is referred to as supervised learning.

2.5.1 Techniques

Several machine learning techniques are used for classification; the most important ones used in this dissertation are as follows:

• SVM: SVM methods use linear conditions in order to separate the classes from one another. The idea is to use a linear condition that separates the two classes from each other as well as possible [20]. SVM is an algorithm for learning halfspaces with a certain type of prior knowledge, namely, a preference for large margin [21].

• Decision Trees: The decision tree is a classic and natural model of learning. It is closely related to the fundamental computer science notion of divide and conquer [22]. Decision trees create a hierarchical partitioning of the data, which relates the different partitions at the leaf level to the different classes. The hierarchical partitioning at each level is created with the use of a split criterion. The overall approach is to recursively split the training data so as to maximize the discrimination among the different classes over different nodes [20].

• Random Forests: A collection of trees is called a forest, and so classifiers built in this way are called random forests. They tend to work best when all of the features are at least marginally relevant, since the number of features selected for any given tree is small [22]. The prediction of the random forest is obtained by a majority vote over the predictions of the individual trees [21].

• Naive Bayes: The Naive Bayes classifier is a classical demonstration of how generative assumptions and parameter estimations simplify the learning process. In this approach, a generative assumption is made that, given the label, the features are independent of each other [21].

• Ensemble: Ensemble methods are learning models that achieve performance by combining the opinions of multiple learners. The main advantage of ensembles of different classifiers is that it is unlikely that all classifiers will make the same mistake [22].

• Neural Networks: An artificial neural network is a model of computation inspired by the structure of neural networks in the brain. In simplified models of the brain, it consists of a large number of basic computing devices (neurons) that are connected to each other in a complex communication network, through which the brain is able to carry out highly complex computations[21]. The basic computation unit in an artificial neural network is a neuron or unit. These units can be arranged in different kinds of architectures by connections between them. The most basic architecture of the neural network is a perceptron, which contains a set of input nodes and an output node[20].
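The majority vote mentioned for random forests and ensembles in the list above can be sketched as follows. The three threshold rules are hypothetical stand-ins for trained classifiers, and the labels "tampered" and "authentic" are chosen only for illustration; nothing here reproduces the classifiers trained in this thesis.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most of the given classifiers."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def make_stump(feature_index, threshold=0.5):
    # Hypothetical one-feature rule standing in for a trained tree.
    def stump(sample):
        return "tampered" if sample[feature_index] > threshold else "authentic"
    return stump


# An odd number of voters avoids ties between the two labels.
toy_forest = [make_stump(0), make_stump(1), make_stump(2)]
```

For the sample `(0.9, 0.1, 0.8)`, two of the three rules vote "tampered", so `majority_vote(toy_forest, (0.9, 0.1, 0.8))` returns "tampered" even though one voter disagrees. This is the sense in which an ensemble tolerates individual mistakes.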

2.5.2 Convolutional Neural Networks

A novel part of the proposed work consists of using a convolutional neural network (CNN) in order to extract features for classification. CNNs are a specialized kind of neural network for processing data that has a known grid-like topology. The name convolutional neural network indicates that the network employs a mathematical operation called convolution, a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers [23].
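As an illustration of the convolution operation itself, the sketch below computes a two-dimensional discrete convolution over the positions where the kernel fits entirely inside the image (often called "valid" mode). It is a plain-Python teaching example with a hypothetical function name, not the layer implementation of any particular CNN library; many deep learning libraries actually compute the closely related cross-correlation, which skips the kernel flip.

```python
def conv2d_valid(image, kernel):
    """2-D discrete convolution without padding (lists of rows in, list of rows out)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    # True convolution flips the kernel along both axes before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]

    return [
        [
            sum(image[y + i][x + j] * flipped[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]
```

Because the same small kernel is applied at every position, the number of parameters is independent of the image size, which is what distinguishes a convolutional layer from a general matrix multiplication over all pixels.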

For this work, the CNN model proposed by Krizhevsky [49] will be used in order to extract image features. The reason is that the dataset it is based on, ImageNet, has 15 million labeled high-resolution images belonging to roughly 22,000 categories. The network contains eight learning layers: five convolutional and three fully connected. Each layer forms a new representation of an input image by gradually extracting discriminative information; unrelated information is gradually removed from the low layers to the high layers, and the reconstructions of the last layer keep only the most discriminative parts. A more detailed explanation of this CNN and its usage is presented in Chapter 4.

The following chapter presents information on the state of the art methods reviewed for this dissertation, related to classification of image attacks, describing methods for analysis and their respective features, concluding with works that make use of saliency detection.


Chapter 3

Related Work

As described in the previous chapter, tampering detection techniques can be divided into two major areas: active and passive approaches. Most of them, however, center on the detection of copy-move forgery. This chapter begins with a brief review of active methods which, even though they focus on the copy-move attack, can also be applied to the splicing attack with some modifications; the main purpose is to exemplify how active methods work. The passive methods covered afterwards are those with a focus on the splicing attack. As a supplementary part, works that make use of object saliency detection are presented, including the only work that combines splicing detection with a saliency method.

3.1 Active Methods

Amerini et al. [24] proposed a method that is able to identify copy-move forgery and also to estimate the parameters of the transformation used. It is based on the scale-invariant feature transform (SIFT), which detects and describes the points that belong to tampered areas. To identify the possible cloned areas, a hierarchical clustering is performed on the spatial locations of the matched points. When the image is classified as forged, the geometric relationships between points can be described by a homography, which is used to determine the geometrical transformation between the original area and the copy-moved one. From the results it is clear that the method is effective as long as the forged area is visible; it must also be applied to images with a single modified area, because as the size of the tampering increases, so do the false results. It is important to note that this method could also be used for splicing detection; however, its performance would not be acceptable.

Among the methods that embed information for restoration, Bravo Solorio [25] proposed a method capable of identifying the tampered areas and restoring the original image, as long as the altered region covers no more than 24% of the host. The method is divided into three main steps: the embedding process, tampering localization with cropping endurance, and restoration. The first phase creates a hash function to store reference bits containing the image index, prefixes, and authentication reference. The image is divided into blocks and the information is embedded in the least significant bits. Once the tampered blocks have been identified, the restoration process is executed. This is an iterative process in which information from the blocks is used to estimate the original most significant bits and the reference information embedded in the previous step. The results show that perfect restoration is possible as long as the tampering percentage is below 25% and an appropriate number of iterations is used.

A. Phadikar [26] uses a semi-fragile watermarking scheme with detection and correction capabilities in the wavelet domain. The scheme first uses a digital signature as a fragile watermark, which serves for tampering identification, and a semi-fragile watermark comprised of a halftone version of the host image. Both signals are embedded in the intermediate sub-bands of the integer wavelet transform (IWT) domain and are permuted in order to increase the restoring capability. This scheme is not designed for perfect restoration; however, it can reconstruct the original image with acceptable quality for up to 40% of image modification. Additionally, the authors remark that the algorithm can be implemented in hardware to accelerate the process. From the observed results, this method can be used to restore copy-move, copy-paste, cropping and splicing forged images.

However, the work of L. Rosales-Roldan et al. [27], which also makes use of the IWT domain and compares itself against Phadikar's work, claims that the method is efficient only if the damage covers up to 25% of the image. The Rosales-Roldan process involves a multilayer perceptron (MLP) neural network and presents two different algorithms for the authentication phase: the first uses the IWT domain, while the other applies the discrete cosine transform (DCT). Lastly, for the recovery step, the neural network is trained with a backpropagation algorithm on information from the image, its halftone version, and reference data. Their results show an improvement in robustness and a decrease in false detections.

3.2 Passive Methods

Image splicing causes inconsistencies in many features, and detection works take these inconsistencies as the basis of their analysis. Works focused on this attack fall mainly into four areas: analysis of the edges inside the image, noise irregularities caused by the borders of the copied object, illuminant features, and texture features. Due to the nature of some methods, it is also possible to merge them in order to enhance detection capabilities. Besides these main areas, image analysis can be done in two ways: by dividing the image into blocks or by making use of the whole image. As for the detection output, it can be a decision on whether or not splicing exists in the image, the identification of the spliced region, or the identification of the spliced object, which is the most difficult problem to solve. At the end of this chapter, Table 3.1 presents a summary of all cited works that use a passive approach.


3.2.1 Edge detection Methods

We start with the method proposed by Zhen Fang et al. [28], which takes the edges of the image and classifies them into different categories in order to single out candidates for tampering caused by splicing. This is advantageous because, in image splicing detection, object reflectance and occlusion edges may be the splicing boundary. The work is done in the Hue, Saturation, and Value (HSV) color space because the method is based on the calculation of quasi-invariants derived from the dichromatic reflection model and obtained by projecting the image derivative onto the hue direction. An image histogram is then generated, and its entropy is used to measure the difference in color distribution between natural and spliced edges; if the value is less than a given threshold, the area is marked as forged. This method works well as long as there is a noticeable difference in textures.

The next work, proposed by Song [29], uses the fact that blur is a common phenomenon in images and that image splicing operations may alter the blur features of natural images, which makes blur a possible cue for splicing detection. The proposed method consists of four steps. First, the edges are identified with the Canny edge detection method. The image is then re-blurred with three Gaussian blur kernels in order to estimate the difference between two estimated values of σ. This value, the standard deviation σ of the defocus blur, can reveal the depths of objects in natural images. Finally, an acceleration-based criterion is used to determine image splicing. This work only delimits the possible spliced region; furthermore, when the region around the selected edge has limited texture, the detection rate of the method is reduced.

The work by Tu Huynh-Van et al. [30] continues the scheme of analyzing the image with a block approach and consists of two phases: an edge detection step followed by the analysis of both tampering attacks. A one-level discrete wavelet transform (DWT) is applied, and edges are detected in the LH, HL and HH sub-bands. Manipulation is then confirmed by feature similarity detection using a blob detection algorithm, which is applied to define the size of the tampered region. After this process, feature vectors are created with a Run Different Method, which extracts five features of the forged parts and searches for regions with similar characteristics. Figure 3.1 shows an example of the difference between a forged edge and a natural edge inside an image.

Figure 3.1: Difference between edges in an image

3.2.2 Noise irregularities

Another important feature found in spliced images is noise, which is typically introduced during the acquisition or subsequent processing of an image. For an untampered image, it is possible to assume that the noise statistics across different pixels differ only slightly. Thus, regions spliced from an image with significantly different noise statistics can be exposed through the inconsistency of local noise characteristics. The work proposed by Siwei Lyu [31] uses a statistical property of natural images in band-pass domains, projection kurtosis concentration, and its relation to the noise variance. This property is a statistical regularity present in natural images, and the method analyzes the kurtosis to observe the variances. Assuming that untampered images have spatially homogeneous noise statistics, a composite image containing regions from other images with different noise characteristics can be exposed by the inconsistencies of its local noise statistics. In this work only the spliced region can be determined, and images with complex textures or edges produce more false positive detections.

In contrast to the previous work, J. Dong [32] proposes adding white Gaussian noise to the whole image; to expose the tampered parts, the image is then divided into sub-blocks and the noise variance of each one is calculated. The boundary between the tampered regions and the original ones is then decided using a Laplace fitting based on the maximum likelihood estimation method. The main drawback of this work is its sensitivity to the block size: if the block is too large, the results have low resolution and a high error rate; if the block is too small, there is not enough information and the computational time increases.
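The block-wise analysis described for [32] can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows and uses the plain population variance of each tile as a simple stand-in for a noise estimate. The function name `block_variances` and the `block` parameter are hypothetical, and neither the added Gaussian noise nor the Laplace fitting of [32] is reproduced.

```python
from statistics import pvariance


def block_variances(image, block):
    """Return {(block_row, block_col): variance} for non-overlapping tiles.

    `image` is a list of equal-length rows of gray levels. Tiles that do
    not fit entirely inside the image are skipped.
    """
    height, width = len(image), len(image[0])
    result = {}
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            # Collect every pixel of the current block x block tile.
            pixels = [image[r][c]
                      for r in range(top, top + block)
                      for c in range(left, left + block)]
            result[(top // block, left // block)] = pvariance(pixels)
    return result
```

A tile whose variance differs strongly from that of its neighbors would be a candidate for closer inspection. With a small `block` each variance rests on few pixels, while a large `block` yields a coarse map, which mirrors the block-size trade-off mentioned for this family of methods.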

Finally, a method that uses image segmentation for the analysis is presented by Chi-Man Pun [33]. Instead of dividing the image into sub-blocks of a fixed initial size, this method segments the host image at multiple scales: a minimum initial size is defined, and the initial sizes for the remaining scales are increased progressively. Noise estimation is applied at each scale, and the results from all scales are combined into a composite result. In the local noise estimation step, the standard deviation of the noise in each segment is computed, and the average brightness of each segment is also included. All segments are then clustered into two sets according to their probability of being original, and those in the smaller cluster are regarded as potentially spliced regions.

3.2.3 Illuminant features

The color of an object in an image depends both on its intrinsic color and on the color of the light source, or illuminant. Therefore, an object in the same scene but exposed to different illuminants shows different colors. Following this idea, the color of the illuminant can be estimated by computing the average color of a scene. Yu Fan, Philippe Carré, and Christine Fernandez-Maloigne [34] present a method based on local illuminant estimation. For these estimations, the work makes use of the methods proposed by J. Van de Weijer [11], in which the illuminants are calculated with the Grey-World, Max-RGB, Shades of Grey, first-order Grey-Edge and second-order Grey-Edge algorithms. The original image is segmented into non-overlapping horizontal and vertical bands, and the illuminant estimates (one per algorithm) are calculated for each band. Once all estimates are obtained, their median is computed and taken as the reference illuminant; this is done for both sets of bands. Next, using the Euclidean distance between the reference illuminant and each estimate, a band is marked as potentially tampered if its distance is higher than a threshold. The potentially forged patches are then exposed by finding the intersection between all potentially tampered horizontal and vertical bands. However, the method is limited because it fails to identify the authenticity of the images and requires a minimum amount of human intervention to annotate suspicious objects and to make the final decision on tampering.

Figure 3.2: Noise Estimation Methods

In the work of X. Wu [35], the image is divided into overlapping blocks, the illuminant color is estimated for each block, and the difference between the estimate and the reference illuminant color is measured. If the difference is larger than a threshold, the corresponding block is labeled as spliced. A maximum likelihood classifier is used to adaptively select the illuminant estimation algorithm among Grey-Shadow, first-order Grey-Edge and second-order Grey-Edge. The intersection of the spliced blocks obtained with the different estimations is computed as the final detection result.
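Both approaches in this subsection compare a local illuminant estimate against a reference and flag the regions whose distance exceeds a threshold. The following is a minimal sketch of that idea, under simplifying assumptions: only the Grey-World estimate (the channel-wise mean) is used, only horizontal bands are examined, and the image is a list of rows of (R, G, B) tuples. The names `grey_world`, `suspicious_bands`, `band_height` and `threshold` are hypothetical and do not come from [34] or [35].

```python
from math import dist
from statistics import median


def grey_world(pixels):
    """Grey-World estimate: the mean of each RGB channel over the pixels."""
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / n for ch in range(3))


def suspicious_bands(image, band_height, threshold):
    """Return indices of horizontal bands far from the reference illuminant."""
    bands = [image[top:top + band_height]
             for top in range(0, len(image), band_height)]
    estimates = [grey_world([p for row in band for p in row])
                 for band in bands]
    # Reference illuminant: channel-wise median of all band estimates.
    reference = tuple(median(e[ch] for e in estimates) for ch in range(3))
    # A band is flagged when its Euclidean distance exceeds the threshold.
    return [i for i, e in enumerate(estimates)
            if dist(e, reference) > threshold]
```

Running the same routine on vertical bands and intersecting the flagged rows and columns would give candidate patches, in the spirit of the band intersection described for [34].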

3.2.4 Texture Features

Hakimi's work [36] makes use of texture classification, texture being another important feature present in images. The authors apply the local binary pattern (LBP) algorithm, which defines a code for each pixel in the image, and apply a Haar wavelet transform to reduce the image size. After the information is separated by a PCA algorithm, a support vector machine is used to classify the features and detect whether an image is modified or not. Using known datasets, they achieve an acceptance rate of 80% when determining real and forged images.
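To make the notion of a per-pixel LBP code concrete, the following is a minimal sketch of the basic 3x3 operator. It assumes a grayscale image stored as a list of rows; the bit ordering (clockwise from the top-left neighbor) and the function names are illustrative choices, not details taken from [36].

```python
# Neighbor offsets (row, column), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbor at least as bright as the center sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

The histogram, rather than the individual codes, is what would typically be passed to a classifier as the texture descriptor.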

Another work, considered a hybrid technique, is proposed by Agarwal [37]. It uses a standard deviation filter to highlight the image details and then applies the rotation invariant co-occurrence among adjacent local binary patterns (RIC-LBP) operator to extract the internal statistical information. The RIC-LBP operator is invariant to image rotations, has a high descriptive ability, and gives good results in texture classification. For classification, the authors use a Spectral Regression Discriminant Analysis (SRDA) classifier, based on spectral graph analysis and regression, to get better results. The time and memory complexity of this classifier are lower than those of conventional LDA and of SVM, and it is more effective. This is because SRDA only has to solve regularized least squares problems and requires no eigenvector calculation, which makes it time and memory efficient. The accuracy of this method is only about 80%.

S. Mushtaq [38] proposed the use of run-length texture features. First, the Gray Level Run Length Matrix (GLRLM) is calculated for authentic and spliced images. The GLRLM describes the pattern of gray-level pixels in a particular direction from a reference pixel; it is a way of searching the image along a particular direction for runs of collinear pixels having the same gray-level value. Afterwards, an SVM is trained to identify whether the image has been spliced.
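The idea of counting runs of equal gray levels can be sketched as follows for the horizontal direction only. The function name `glrlm_horizontal` and the dictionary-based representation (instead of a dense matrix) are illustrative assumptions; the features that [38] derives from the matrix are not reproduced.

```python
from collections import Counter
from itertools import groupby


def glrlm_horizontal(image):
    """Count horizontal runs; keys are (gray_level, run_length) pairs."""
    runs = Counter()
    for row in image:
        # groupby yields maximal runs of consecutive equal gray levels.
        for level, group in groupby(row):
            runs[(level, len(list(group)))] += 1
    return runs
```

For the row `[0, 0, 1, 1]`, for instance, the function records one run of length 2 for gray level 0 and one run of length 2 for gray level 1.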

As a combination of DWT and LBP, Mandeep Kaur [39] presents a method that works in the YCbCr color space, which represents the image by a luminance component (the Y channel) and chrominance components (the Cb and Cr channels). Cb and Cr are the blue-difference and red-difference components, respectively. The luminance channel describes the image content and is strong enough to hide the tampering traces. The chrominance channels describe the weak signal content of the image, such as edges, and thus processing is performed on the extracted chrominance channel. A single-level discrete wavelet transform is applied to obtain the approximation and detail coefficients, and the texture of the four sub-bands (LL, LH, HL, HH) is extracted using local binary patterns. The LBP histograms are then concatenated and used as inputs to an SVM classifier. One limitation of this work is that its performance is affected when the size of the forged image is very small.

3.2.5 Other Features

Another work, presented by Amerini [40], makes use of Benford's law, also known as the first-digit law, which is a well-known rule in the statistics of natural phenomena. According to it, the frequency of appearance of each digit in the first significant place of quantities observed in natural phenomena is logarithmic. To achieve localization, the authors train an SVM on image portions whose size matches that of the search window used for forgery localization. Their results show that good tampering localization is achieved; however, in images with little tampering the rate of false positives is high.
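The logarithmic distribution referred to above assigns the leading digit d a probability of log10(1 + 1/d). A minimal sketch of the expected distribution and of an observed first-digit histogram follows; the function names and the generic `values` input are hypothetical, and the feature construction and SVM training of [40] are not reproduced.

```python
from collections import Counter
from math import log10


def benford_expected():
    """Expected first-digit probabilities P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Observed frequency of each leading digit among the non-zero values.

    Assumes at least one non-zero value is supplied.
    """
    # Scientific notation puts the first significant digit in position 0.
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}
```

A large discrepancy between the observed frequencies and the expected ones is the kind of cue that a first-digit detector could pass on to a classifier.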

It is also important to note that, due to the nature of some methods, it is possible to merge them in order to detect more than one specific attack. The work presented by Shinfeng D. Lin and Tszan Wu [41] analyzes the image for splicing and copy-move attacks independently. For the first attack, the image (forged or not) is segmented into non-overlapping blocks, and the DCT coefficients are used to extract features that are collected into a histogram. Using this information, it is possible to compute the probability of forgery in each block. When tampered and unmodified blocks are analyzed, they show different behavior in the histogram: unmodified blocks produce higher peaks, whereas forged blocks tend to contribute randomly to the bins of the histogram. Copy-move detection is based on the SURF algorithm. As in the first part, the image is segmented, but this time the sub-blocks overlap. After the key points are collected, a matrix containing them and their feature descriptors is created. A matching decision is then made by calculating the Euclidean distances between the points and comparing them to a threshold, which allows the tampered region to be determined.

Finally, Yuan Rao and Jiangqun Ni [42] propose an approach that implements a deep learning technique in two main steps. First, a convolutional neural network (CNN) is pretrained using labeled patch samples of both attacks. After all the features needed to detect tampering are obtained, a support vector machine (SVM) is used for classification. Although these methods report good precision, the number of false detections remains high.


3.3 Saliency Detection

In order to identify a region of interest in an image, saliency detection methods have been proposed. Saliency allows the importance of a particular object in an image to be estimated. The first work on saliency detection was proposed by Laurent Itti, Christof Koch and Ernst Niebur [43]. In this approach, nine spatial scales are created using dyadic Gaussian pyramids, which progressively low-pass filter and subsample the input image, yielding horizontal and vertical image reduction factors. Each feature is then computed by a set of linear center-surround difference operations followed by normalization. Local orientation information is obtained using oriented Gabor pyramids. In total, 42 feature maps are computed: six for intensity, 12 for color and 24 for orientation. The purpose of the saliency map is to represent the conspicuity, also known as saliency, at every location in the visual field. A combination of the feature maps provides the input to the saliency map. The general model is presented in Figure 3.3.

Zhenhua Qu [44] proposed a method that scans a detection window across image locations and uses saliency as a means to extract features. The method works as follows. At each location, the window is divided into nine sub-blocks. In order to detect the "unusual" locations in a sub-block, a visual search is performed with a bottom-up visual attention model (VAM), in this case the Itti-Koch model. It takes an image as input and constructs a saliency map from low-level feature pyramids, such as intensity and edge direction. The salient visual locations are identified as the local maxima of the saliency map. Discriminative feature vectors are extracted from the most salient fixations to train a hierarchical classifier.


Figure 3.3: Itti - Koch Model Architecture

3.3.1 Object saliency for tampering detection

In contrast to the previous work, Oleg Muratov [45] combines salient object detection and forensic analysis to identify tampering. The assumption is that salient objects are the regions of an image whose integrity is most critical to verify, since the semantic content of the photo is highly connected to them. The generated saliency map is not used directly; instead, it is used to generate a bounding box. This makes the method more general, because a relatively high number of region-based forgery detection methods require a bounding box as input. Then, using the information within these image limits, a JPEG artifacts method is applied to determine whether splicing is present in the analyzed image.


3.4 Conclusion

Although several works try to identify tampered regions and objects, there is still room for improvement because of the low identification percentage and the limited capacity to label which part of the image is original and which is forged. According to H. Li [46], only specific methods are capable of such labeling; however, they give ambiguous results when differentiating between original and forged regions, which causes inaccuracies in detection. A fusion of methods can help to obtain the desired information about the spliced object. This thesis therefore incorporates saliency detection, but only as a means to identify the most conspicuous region; this alone is not enough to determine whether there is splicing in the image. For this reason, a classification step has to be developed. To train this classifier, a combined texture and illuminant feature approach is examined, along with a convolutional neural network approach to extract features. The proposed technique is presented in the following chapter. Although several methods are present in the state of the art, there is no direct comparison between a specific work and the proposed technique; however, Table 5.9 in Chapter 5 presents an indirect comparison with works that report the same metrics as the proposed work. Table 3.1 summarizes the methods and results.


| Author | Method | Segmentation | Result | Year |
|---|---|---|---|---|
| Zhen Fang [28] | Edge Detection | Complete Image | Region | 2010 |
| Chunhe S. [29] | Edge Detection | Complete Image | Region | 2014 |
| Tu H. [30] | Edge Detection + DWT | Sub Bands | Object | 2016 |
| Siwei L. [31] | Noise Estimation | Blocks | Object | 2013 |
| Jing D. [32] | Noise inconsistency | Blocks | Region | 2016 |
| Chi Man P. [33] | Noise Estimation | SLIC | Object | 2016 |
| Yu F. [34] | Illuminant Estimation | H / V bands | Region | 2015 |
| X. Wu [35] | Illuminant Color Consistency | Blocks | Region | 2011 |
| F. Hakimi [36] | LBP + DWT | Sub bands | Decision | 2015 |
| S. Agarwal [37] | RIC LBP | Complete Image | Decision | 2016 |
| S. Mushtaq [38] | GLRLM | Complete Image | Decision | 2014 |
| M. Kaur [39] | DWT + LBP | Sub bands | Decision | 2016 |
| Amerini [40] | First Digit Features | Blocks | Region | 2014 |
| S. Lin [41] | Double JPEG Compression | Blocks | Region | 2011 |
| Rao [42] | CNN features | Complete Image | Decision | 2016 |
| Z. Qu [44] | Visual Cues | Complete Image | Region | 2009 |
| O. Muratov [45] | Saliency Detection + JPEG comp. | Complete Image | Object | 2012 |
| Proposed Work | Saliency Detection + CNN | Superpixels | Region + Object | 2018 |


Chapter 4

Proposed Method

Following the information presented in Chapter 3, several works center only on deciding whether or not an image has been attacked with splicing. Other approaches focus on enclosing the forged region in an effort to increase detection capabilities and to include the location of the tampered area. However, an enhanced capability, the detection of the spliced object within the image, is challenging to achieve and is therefore usually not attempted.

The proposed method was devised with the idea of finding the most conspicuous region present in the image, since the splicing attack focuses on either hiding or adding information, and this can be done over the "most important" part of the image. It also considers a more realistic scenario, in which this attack tends to cover up objects inside the image in order to change its context. This work, aside from featuring a decision process and region delimitation comparable to state-of-the-art methods, enhances image splicing detection by adding a step to obtain and segment the spliced object. Figure 4.1.a provides the complete diagram of the proposed method, divided into the steps taken to address both problems: deciding whether the image contains splicing and segmenting the spliced object. The following sections explain what is developed at each stage and its purpose. Figure 4.1.b presents a brief outline of what will be presented subsequently.

(a) Proposed Method (b) Outline

Figure 4.1: Proposed Method Overview

4.1 Color Space Conversion

Methods used for tampering detection regularly make use of grayscale images. Even if the original source is a color image, most works transform it from a three-channel version to a simpler variant in order to extract features. For saliency detection, it is preferable to handle color images, as each channel contributes information needed for the method to work. This proposal is a merger of saliency detection and image classification. In order to define the most conspicuous region of an image, and then decide whether this area contains a spliced object, it was important to use the information present in three channels. Nevertheless, using the data from RGB images was not adequate in this case: results differed in each of the tested color spaces, and some channels gave more information than others.


For this reason, the first part of the proposed method is the conversion of the original image from the RGB (Red, Green, Blue) color space to an HSa (Hue, Saturation and "a" channel) color space. After careful analysis and consideration (the figures in Appendix A contain all the color space conversions and single channels examined when deciding to create the HSa color space, as well as the combinations created from these three channels), this color space gives the most useful hint about where the most salient part of the image is centered. It provides the information that saliency detection methods need to enhance selection: Hue and Saturation are the "chromaticity" of the image and thus better highlight where the spliced part might be, and the "a" channel magnifies the region of splicing. Hue provides the contrast that focuses the effect of illumination, which breaks up the homogeneity of these values. Once the best channels had been chosen, the new HSa color space was formed, and each channel was also observed as a surface plot. As seen in the following figures, these channels present discernible changes over the spliced regions.
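A per-pixel sketch of such a conversion is given below. It rests on assumptions that the text does not state explicitly: hue and saturation are taken from the HSV model, the "a" channel is taken to be the a* component of CIE L*a*b*, and the input is sRGB with components in [0, 1] under a D65 white point. The function name `rgb_to_hsa` is hypothetical.

```python
import colorsys


def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one component in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Nonlinearity used by the CIE L*a*b* definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_hsa(r, g, b):
    """Map one sRGB pixel to (hue, saturation, a*); assumptions noted above."""
    hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    # Linear sRGB -> CIE XYZ (D65); only X and Y are needed for a*.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    a_star = 500 * (_lab_f(x / 0.95047) - _lab_f(y / 1.0))
    return hue, saturation, a_star
```

In this sketch hue and saturation lie in [0, 1], while a* is positive toward red and negative toward green; the three channels would need to be rescaled to a common range before being stacked into a single image.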


Figure 4.3: Hue

Figure 4.4: Saturation


4.2 Object Saliency Detection

Subsequently, object saliency detection is implemented. This is the most novel part of the proposed method: within the area, saliency is mainly limited to feature extraction and region bounding, whereas in this work it is used to bound the spliced object. As described in Chapters 2 and 3, the main purpose of this step is to find the most noticeable region of an image. Although there are several techniques to address this problem (and it is still an open problem), this dissertation relies on the work presented by J. Harel [16], as the results obtained from his algorithm are the best for our application. The reason is that the object highlight caused by the color conversion is identified by the algorithm as activation zones. In his work, Harel implements a bottom-up visual saliency model named Graph-Based Visual Saliency (GBVS), which consists of two steps: 1) forming activation maps on certain feature channels, and 2) normalizing them in a way that highlights conspicuity and admits combination with other maps. The most important result obtained from this work is the saliency map, which is then used as a reference to extract and bound the suspicious region.

(a) (b) (c) (d) (e)

Figure 4.6: Saliency detection

This map, obtained directly from Harel's solution (Figure 4.6 (c)), gives an overview of where the spliced region might be. In order to enclose this area, the saliency map and the original image are superimposed (Figure 4.6 (d)); afterwards, a threshold is applied to trim the unusual zone, which also helps to calculate the bounding box of this region (Figure 4.6 (e)). This is needed because the classification step, which decides whether the suspected area contains splicing, is followed by the extraction of the spliced object. If the area does contain the attack, segmentation is straightforward because the search area becomes smaller and is centered on a defined region.
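The thresholding and bounding-box step can be sketched as follows. This is a minimal illustration that assumes the saliency map is a list of rows with values in [0, 1]; the function name `salient_bounding_box` and the `threshold` parameter are hypothetical, and the threshold value actually used in this work is not implied.

```python
def salient_bounding_box(saliency, threshold):
    """Return (top, left, bottom, right) of pixels with saliency >= threshold.

    Bounds are inclusive. Returns None when no pixel reaches the threshold.
    """
    hits = [(r, c)
            for r, row in enumerate(saliency)
            for c, value in enumerate(row)
            if value >= threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)
```

Cropping the image to the returned box restricts the later classification and segmentation steps to the suspicious region only.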

4.3 Classification

The final decision on tampering is made in this step. The previous phase has already identified the conspicuous region, but it is still not known whether this area is indeed spliced. The classification process therefore determines whether the selected region is original or part of a splicing attack. This stage is divided into two subsections: the first explains which features are used and how they were extracted, and the second describes the classifiers used.

4.3.1 Feature Extraction

The features most used for tampering detection problems are edges, textures, and statistical features, including characteristics obtained in the frequency domain. Most works, however, center on global features as a means to identify possible tampering. According to M. Hassaballa [47], since global features aim to represent the image as a whole, only a single feature vector is produced per image, and thus the content of images can be compared using only their feature vectors. Also, according to A. Apatean [48], the content of an image can be described by numerical measures that represent the information in different ways, which makes it possible to fuse different features without redundancy.

Using these ideas as a reference, a fusion of two features was used for the correct segmentation of the object inside the region identified as spliced. The characteristic vector obtained from this fusion was also tested for classification; nonetheless, the results of these experiments did not match the performance of the features actually used for classification. These fused features are explained in the next section.

As for the actual features used for classification, a deep learning feature extraction approach was employed. In order to obtain better accuracy results, a pretrained deep convolutional neural network (CNN) was used. This CNN comes from the work of Alex Krizhevsky [49] and is called AlexNet. It contains eight learning layers: five convolutional and three fully connected. The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192, and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully connected layers have 4096 neurons each, and the size of the obtained feature vector is 1x4096.
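As a small worked example of the kernel dimensions listed above, the number of kernel weights in each convolutional layer is the number of kernels multiplied by the kernel volume. The sketch below only restates the sizes given in the text; the names `CONV_LAYERS` and `conv_weight_counts` are hypothetical, and bias terms are deliberately left out.

```python
# (layer name, number of kernels, kernel height, kernel width, kernel depth)
CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]


def conv_weight_counts(layers=CONV_LAYERS):
    """Number of kernel weights per layer, biases excluded."""
    return {name: n * h * w * d for name, n, h, w, d in layers}
```

For the first layer this gives 96 x 11 x 11 x 3 = 34,848 weights; the remaining layers follow in the same way.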

4.3.2 Classifiers

The classification step makes splicing detection feasible. It takes the feature vector obtained from the previously explained CNN (AlexNet) and applies a training process. As in state-of-the-art methods, Support Vector Machine (SVM) and Random Forest (RF) classification are applied, mainly because these classifiers are a good choice for classifying vectors. However, to broaden the analysis, Decision Trees (DT), Ensemble Classification and the Naive Bayes Classifier (NBC) are also applied. The next chapter describes how the training and test sets were divided and compares all of the results together with the image analysis. All training and testing was done using MATLAB functions.

Figure 4.7: AlexNet Architecture [49]

Each classifier was evaluated separately on the deep learning feature vector. All of them were tested because this descriptor is not widely used for classifying splicing attacks and, as the results section shows, its performance is favorable.
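Evaluating a classifier separately amounts to training it on one split of the feature vectors and measuring its accuracy on the other. As a minimal sketch of that loop, the Python code below implements a small Gaussian Naive Bayes classifier, one of the classifier families listed above, together with an accuracy function. It is not the MATLAB code used in this work: the class name, the variance floor and the toy two-dimensional vectors standing in for the 1x4096 CNN features are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes; an illustrative stand-in, not the MATLAB classifier."""

    def fit(self, vectors, labels):
        grouped = defaultdict(list)
        for vector, label in zip(vectors, labels):
            grouped[label].append(vector)
        self.stats = {}
        for label, rows in grouped.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # Variance floor (assumed value) avoids division by zero on constant features.
            variances = [max(sum((x - m) ** 2 for x in col) / n, 1e-9)
                         for col, m in zip(zip(*rows), means)]
            self.stats[label] = (math.log(n / len(vectors)), means, variances)
        return self

    def predict(self, vector):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(vector, means, variances))
        return max(self.stats, key=log_posterior)


def accuracy(classifier, vectors, labels):
    """Fraction of vectors whose predicted label matches the true label."""
    hits = sum(classifier.predict(v) == y for v, y in zip(vectors, labels))
    return hits / len(labels)


# Toy feature vectors (hypothetical data, only to exercise the code).
train_x = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
train_y = ["authentic"] * 3 + ["spliced"] * 3
test_x = [[0.05, 0.1], [1.05, 0.95]]
test_y = ["authentic", "spliced"]

model = GaussianNaiveBayes().fit(train_x, train_y)
print("accuracy:", accuracy(model, test_x, test_y))
```

Because `accuracy` only needs an object with a `predict` method, any other classifier exposing the same interface could be compared under the same split.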

4.4 Object Segmentation

If the previous step classifies the image as spliced, a bounding box is calculated using the thresholded image as a reference. The bounding box crops only the forged region, so that superpixel segmentation is applied exclusively to that section, which improves the clustering of the spliced object as a whole.
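The bounding-box step can be sketched in a few lines of Python. The function names and the representation of the thresholded image as a list of rows of 0/1 values are assumptions made for this example; the text does not specify how the box is computed beyond using the thresholded image as a reference.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the nonzero pixels in a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # no pixel survived the threshold
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, box):
    """Cut the region delimited by an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical thresholded saliency map and a toy image of the same size.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
image = [[10 * r + c for c in range(5)] for r in range(4)]

box = bounding_box(mask)
region = crop(image, box)
print(box, region)
```

Only `region` would then be passed on to the superpixel segmentation, which is what keeps most of the background out of the clustering.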

In order to get the best segmentation results from the superpixel algorithm, C. Pun [50] presents a method that adapts the quantity of superpixel regions to the texture of the image, using a four-level Discrete Wavelet Transform (DWT) to analyze the frequency distribution. First, the DWT is applied to the host image to obtain the coefficients of the low- and high-frequency sub-bands. Then, the percentage of the low-frequency distribution $P_{LF}$ is calculated using equation 4.3, and from it the initial size $S$ (quantity of superpixels) is determined using equation 4.4.

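$$E_{LF} = \sum |CA_4| \tag{4.1}$$

$$E_{HF} = \sum_{i} \left( \sum |CD_i| + \sum |CH_i| + \sum |CV_i| \right), \quad i = 1, 2, \ldots, 4 \tag{4.2}$$

$$P_{LF} = \frac{E_{LF}}{E_{LF} + E_{HF}} \tag{4.3}$$

$$S = \begin{cases} \sqrt{0.02 \times M \times N} & P_{LF} > 50\% \\ \sqrt{0.01 \times M \times N} & P_{LF} \le 50\% \end{cases} \tag{4.4}$$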

Where $E_{LF}$ is the low-frequency energy, $E_{HF}$ is the high-frequency energy, $P_{LF}$ is the percentage of the low-frequency distribution, $CA_4$ denotes the approximation coefficients at the 4th level of the DWT, and $CD_i$, $CH_i$ and $CV_i$ denote the detail coefficients at the $i$-th level of the DWT, $i = 1, 2, \ldots, 4$. $M \times N$ is the size of the analyzed image and $S$ is the size of the superpixels.
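To make the computation of equations 4.1 to 4.4 concrete, the following Python sketch derives $S$ for a small grayscale image stored as a list of rows. Several choices are assumptions made for illustration and are not stated in the text: the orthonormal Haar wavelet, image sides divisible by 16 so that four levels can be computed, a non-zero image, and the reading of equation 4.4 with a square root in both branches. All function names are hypothetical.

```python
import math


def haar_dwt2(image):
    """One level of a 2-D orthonormal Haar DWT (assumed wavelet) on an even-sided image.

    Returns (cA, cH, cV, cD): approximation, horizontal, vertical and diagonal bands.
    """
    cA, cH, cV, cD = [], [], [], []
    for r in range(0, len(image), 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            row_a.append((a + b + d + e) / 2)
            row_h.append((a + b - d - e) / 2)
            row_v.append((a - b + d - e) / 2)
            row_d.append((a - b - d + e) / 2)
        cA.append(row_a)
        cH.append(row_h)
        cV.append(row_v)
        cD.append(row_d)
    return cA, cH, cV, cD


def abs_sum(band):
    """Sum of absolute coefficient values in one sub-band."""
    return sum(abs(x) for row in band for x in row)


def initial_superpixel_size(image, levels=4):
    """Return (S, P_LF) following equations 4.1 to 4.4."""
    m, n = len(image), len(image[0])
    approx, e_hf = image, 0.0
    for _ in range(levels):
        approx, ch, cv, cd = haar_dwt2(approx)
        e_hf += abs_sum(cd) + abs_sum(ch) + abs_sum(cv)  # equation 4.2
    e_lf = abs_sum(approx)                               # equation 4.1
    p_lf = e_lf / (e_lf + e_hf)                          # equation 4.3
    factor = 0.02 if p_lf > 0.5 else 0.01                # equation 4.4
    return math.sqrt(factor * m * n), p_lf


# Two toy 16x16 images: a flat one (smooth) and a checkerboard (highly textured).
flat = [[1] * 16 for _ in range(16)]
checker = [[(r + c) % 2 for c in range(16)] for r in range(16)]
print(initial_superpixel_size(flat))
print(initial_superpixel_size(checker))
```

The flat image concentrates all of its energy in the low-frequency band and therefore falls in the first branch of equation 4.4, while the checkerboard is dominated by detail coefficients and falls in the second.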

With the quantity of superpixels determined, the last step extracts a feature vector from each superpixel segment for subsequent clustering. As mentioned in the Feature extraction section, this vector is created by fusing two image feature descriptors: the illuminant descriptor proposed by Y. Fan [34] and Van de Weijer [11], of size 1×15, and an image descriptor based on the Histogram of Oriented Gradients (HOG) from Dalal N. [51], modified following the idea of J. O. Ludwig [52], of size 1×81. The final vector used for clustering the superpixels has size 1×96.

For clustering, the k-means algorithm was applied to the feature vectors of the calculated superpixels. Because the bounding box removes most of the background, the result is a correct and clean segmentation of the desired spliced object.
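The fusion and clustering of superpixel descriptors can be sketched as follows. This is a plain Lloyd-style k-means written in Python for illustration only: the number of clusters (k = 2, spliced object versus remainder), the seeding with the first k vectors, the iteration count and the synthetic descriptor values are all assumptions, since the text does not state them.

```python
def fuse(illuminant_vec, hog_vec):
    """Concatenate the two descriptors into one feature vector (1x15 + 1x81 -> 1x96)."""
    return list(illuminant_vec) + list(hog_vec)


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k=2, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (assumed, deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Synthetic descriptors for four superpixels (hypothetical values).
background_a = fuse([0.10] * 15, [0.0] * 81)
object_a = fuse([0.90] * 15, [1.0] * 81)
background_b = fuse([0.12] * 15, [0.0] * 81)
object_b = fuse([0.88] * 15, [1.0] * 81)

features = [background_a, object_a, background_b, object_b]
labels = kmeans(features, k=2)
print(len(features[0]), labels)
```

Superpixels that share a label are then merged into one region, so the spliced object is recovered as the union of the superpixels in its cluster.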

4.5 Image Restraints

As will be observed in the next chapter, results improve significantly when certain aspects of the images are taken into account. It is therefore important to identify some influential characteristics of the images found in the datasets. The characteristics to observe and remove are as follows:

• Texture-only images work well overall in the classification step: it is possible to detect whether the image contains splicing or not. However, if these images incorporate or simulate different "visible regions", a misclassification problem occurs.

• Following on from the previous issue, images that contain contrasting areas, for example plants, furniture, people and buildings in the same space, create misclassification problems. (Figure 4.8 c)

• Images with only one focus point, mainly dominated by luminance, create problems in classification and saliency detection. (Figure 4.8 a)

• Images containing fairly similar people, whether in clothing, skin color, faces or positions, also create complications. (Figure 4.8 b)

To better illustrate these points, the following images (Figure 4.8) serve as visual examples.

Figure 4.8: Characteristics of restraints

Having reviewed the steps of the proposed technique, the next chapter presents information on the datasets used for testing, how the experiments were designed and which metrics are reported. It is important to note that all testing was done using the illumination HOG features and the CNN extracted features. Chapter 5 is divided into sections that cover each dataset used, presenting the numerical classification results for both features and an overall comparison between classification with and without taking restrictions into account.


Chapter 5

Experiments and Results

This chapter describes the experimental setup. First, the datasets are presented, together with an explanation of how each of them was divided and used for classification. Additionally, as stated in the restrictions, results are presented in two parts: first the best results obtained with no constraints, and then a comparison between them and the outcome obtained when the aforementioned conditions are applied. The complete charts for the first analysis are given next in a complete table. This is done in order to highlight the most relevant results.

To summarize the proposed method: first, the image to evaluate is converted into the HSa color space. Then, a GBVS saliency detection step produces a saliency map that helps to locate the possible spliced region. Next, a classification step determines whether there is actually splicing within the located region. Finally, to segment the spliced object, superpixel segmentation and clustering with the fused illuminants HOG vector are applied.
