The methods proposed in the different modules can be combined to achieve very high accuracy. However, when a large database is involved, an identification system based on sensor pattern noise (SPN) faces its own set of challenges, which revolve around two main issues. The first issue is the high dimensionality of the SPN. As a result, main-memory operations such as loading an SPN take a considerable amount of time. At the same time, each SPN requires a fairly large amount of storage space. Furthermore, since the SPN resembles a random signal, compression is not very effective. Typically, the SPN extracted from a 10-megapixel image may occupy up to 50 MB even after compression. The second issue is the computational complexity of the matching algorithm. Matching relies on vector operations such as correlation, whose cost grows with the dimensionality of the data; for high-dimensional SPNs this becomes a critical concern.
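To make the cost concrete, the following minimal Python/NumPy sketch shows the baseline that the methods below try to accelerate: a linear search that computes a normalized cross-correlation between a query SPN and every reference SPN. The function names, the decision threshold, and the assumption that each element is stored as a 64-bit float are illustrative choices, not details taken from the cited works.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized SPNs."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def linear_search(query: np.ndarray, references, threshold: float):
    """Correlate the query with every reference: about L * N^2 multiplications.

    Returns the index of the best-matching reference (or None if even the
    best score is below the illustrative threshold) and all scores.
    """
    scores = [ncc(query, r) for r in references]
    best = int(np.argmax(scores))
    return (best if scores[best] > threshold else None), scores

def raw_size_bytes(num_pixels: int, bytes_per_element: int = 8) -> int:
    """Uncompressed size of a real-valued SPN, assuming 64-bit floats."""
    return num_pixels * bytes_per_element
```

For example, `raw_size_bytes(10_000_000)` gives 80,000,000 bytes (80 MB) for an uncompressed 10-megapixel SPN, and every query touches all L references in full.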
Many efforts have been made in recent years to address the prohibitive computational complexity caused by the high dimensionality of the SPN. The methods proposed in the literature fall into two categories. Methods in the first category reduce the number of correlations, and hence the number of multiplications, that must be computed. In [62, 63], Bayram et al. proposed organizing a database of reference SPNs into a binary search tree. Each leaf node represents one reference SPN from the database, and each internal node holds a composite SPN, defined as the normalized sum of all the reference SPNs at the leaf nodes of the subtree beneath it. Organizing the reference SPNs in this way allows multiple reference SPNs to be matched in a single verification step. For example, to find the reference SPN that matches a query SPN, the search traverses the tree from the root toward the leaves, correlating the query with the SPN at each visited node and making a decision. If the decision is positive, the search continues in the subtree beneath that node; if it is negative, no further comparisons are needed. On average, each comparison lets the search skip about half of the remaining tree, so each retrieval takes time proportional to the logarithm of the number of reference SPNs stored in the tree. This makes the method more computationally efficient than linear search. However, there is a trade-off between efficiency and accuracy. Since the probability of error increases with the number of reference SPNs in a tree, the method is much less accurate than linear search, especially when a large number of reference SPNs are stored in a single binary tree. Multiple binary search trees must therefore be constructed to maintain the desired identification accuracy. As a result, the method ultimately requires about $(L/t)\log t$ correlations, where $L$ is the total number of reference SPNs and $t$ is the number of reference SPNs in each tree.
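The tree organization described above can be sketched in Python/NumPy as follows. This is a simplified illustration, not Bayram et al.'s implementation: the greedy descent (follow the child with the higher correlation, stop when it falls below a threshold), the balanced split, and the threshold value are all assumptions made for the example.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-norm version of an SPN, flattened to 1-D."""
    x = x.ravel() - x.mean()
    return x / np.linalg.norm(x)

class Node:
    def __init__(self, spn, left=None, right=None, index=None):
        self.spn = spn        # (composite) SPN stored at this node
        self.left = left
        self.right = right
        self.index = index    # reference id for leaves, None for internal nodes

def build_tree(refs, indices=None) -> Node:
    """Leaves hold reference SPNs; internal nodes hold the normalized sum
    of all reference SPNs in the subtree beneath them."""
    if indices is None:
        indices = list(range(len(refs)))
    if len(indices) == 1:
        return Node(normalize(refs[indices[0]]), index=indices[0])
    mid = len(indices) // 2  # hypothetical balanced split
    left = build_tree(refs, indices[:mid])
    right = build_tree(refs, indices[mid:])
    composite = normalize(sum(normalize(refs[i]) for i in indices))
    return Node(composite, left, right)

def tree_search(root: Node, query: np.ndarray, threshold: float):
    """Greedy root-to-leaf descent; returns (reference index or None, #correlations)."""
    q = normalize(query)
    count = 1
    if q @ root.spn < threshold:        # negative decision: query not in this tree
        return None, count
    node = root
    while node.index is None:
        scores = [q @ node.left.spn, q @ node.right.spn]
        count += 2
        best = int(np.argmax(scores))
        if scores[best] < threshold:    # negative decision: stop early
            return None, count
        node = node.left if best == 0 else node.right
    return node.index, count
```

For a balanced tree of $t = 8$ references, a successful search in this sketch costs $1 + 2\log_2 8 = 7$ correlations instead of 8, and the gap widens as $t$ grows; the accuracy cost comes from the composite nodes, whose correlation with a genuine match shrinks roughly as the number of references beneath them grows.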
Approaches in the second category aim to lower the computational complexity by compressing the large SPN. In [64–66], the authors introduced a fingerprint digest as a possible solution. They assume that the larger components (in magnitude) of a reference SPN are more reliable than the smaller ones and should therefore be used in correlation detection, while the small components can be discarded. The fingerprint digest is thus formed by keeping only the $k$ elements of a reference SPN with the highest energy, together with their locations. As a result, the digest has a much lower dimensionality than the full-sized SPN. Since the cost of computing a correlation is proportional to the size of the SPN, the fingerprint digest speeds up matching by a factor of about $N^2/k$, where $N^2$ is the size of the full-sized SPN (e.g., $N^2 = 512^2$ and $k = 50000$, which gives a factor of about 5.2). An improved search strategy based on the fingerprint digest is proposed in [67] and [68].
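A minimal sketch of the digest idea, assuming the query is simply sampled at the $k$ stored locations and correlated with the stored values; the function names and the value of $k$ are illustrative, and the cited works may differ in detail (for example, in how the query side is handled).

```python
import numpy as np

def make_digest(ref: np.ndarray, k: int):
    """Keep the k largest-magnitude elements of a reference SPN and their locations."""
    flat = ref.ravel()
    locs = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |values|
    return locs, flat[locs]

def digest_correlation(query: np.ndarray, digest) -> float:
    """Correlate only at the k stored locations: O(k) work instead of O(N^2)."""
    locs, values = digest
    q = query.ravel()[locs]
    q = q - q.mean()
    v = values - values.mean()
    return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
```

With $N^2 = 512^2 = 262144$ and $k = 50000$, each correlation touches about 5.2 times fewer elements than a full-sized correlation.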
In [69], Bayram et al. proposed representing an SPN by binarizing the value of each pixel. Essentially, the authors use only the sign of each element of the query or reference SPN and completely disregard its magnitude. Each element of a real-valued SPN is therefore binarized to either −1 or +1. A binarized SPN requires only 1 bit per element, whereas a real-valued SPN stored in double precision requires 64 bits per element. Although this method does not reduce the dimensionality of the SPN, it considerably reduces storage requirements and speeds up loading SPNs into memory, which indirectly accelerates fingerprint matching. However, the loss of information caused by binarization inevitably degrades matching accuracy.
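The sign-only representation can be sketched with NumPy's bit packing. For ±1 vectors of length $n$, the normalized inner product equals $(n - 2h)/n$, where $h$ is the Hamming distance between the sign patterns, so matching reduces to XOR and bit counting. Mapping exact zeros to +1 and the helper names are assumptions of this sketch, not details from [69].

```python
import numpy as np

def binarize(spn: np.ndarray):
    """Keep only the sign of each element (>= 0 -> +1, < 0 -> -1), 8 elements per byte."""
    bits = spn.ravel() >= 0
    return np.packbits(bits), bits.size

def binary_correlation(a, b) -> float:
    """Normalized inner product of two +/-1 vectors via their Hamming distance."""
    packed_a, n = a
    packed_b, _ = b
    diff = np.unpackbits(np.bitwise_xor(packed_a, packed_b))[:n]  # drop padding bits
    hamming = int(diff.sum())
    return (n - 2 * hamming) / n
```

A 10,000-element SPN packs into 1,250 bytes, 64 times smaller than the 80,000 bytes needed for the same SPN in 64-bit floats, while the number of elements to compare is unchanged.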
Valsesia et al. [70–72] proposed compressing the sensor fingerprint via random projection. This method is based on the Johnson-Lindenstrauss (JL) lemma [73], which states that a finite set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that the distances between the points are nearly preserved. Based on this result, a random matrix satisfying the JL lemma can be constructed, and an $N^2$-dimensional SPN is projected onto an $m$-dimensional subspace, with $m < N^2$, using this matrix. The dimensionality of the original SPN is thereby reduced from $N^2$ to $m$. However, compressing the SPN in this way also incurs a penalty in matching accuracy.
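The following sketch illustrates the JL idea with a dense Gaussian matrix scaled by $1/\sqrt{m}$, one standard JL construction; it is an assumption of this example, and the matrices actually used in [70–72] may be constructed differently. For full-resolution SPNs a dense $m \times N^2$ matrix would itself be very large, so the small sizes here are for illustration only.

```python
import numpy as np

def random_projection_matrix(m: int, n: int, seed: int = 0) -> np.ndarray:
    """Dense Gaussian matrix scaled by 1/sqrt(m), a standard JL construction."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n)) / np.sqrt(m)

def project(spn: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Map an N^2-dimensional SPN to an m-dimensional vector."""
    return P @ spn.ravel()

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity (normalized inner product)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Query and reference SPNs must be projected with the same matrix (here, the same seed). The correlation between the projected vectors then approximates the original correlation, with an error that shrinks as $m$ grows; this residual error is the accuracy penalty mentioned above.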
As shown above, many efforts have been made in recent years to improve the efficiency of source camera identification. However, while these methods bring significant computational savings, they also reduce identification accuracy. In light of this limitation, the following chapters introduce our algorithms, which aim to reduce the computational complexity of source camera identification without degrading identification accuracy.