Another RVQ-based VQ method for ANN proposed in this thesis is Joint K-Means Quantization (JKM). As mentioned in Chapters 3.3.2.1 and 4.2.1, RVQ's hierarchical structure separates the quantization problem into $M$ subproblems, and the solution of each subproblem depends strongly on the previous one. However, RVQ's own training procedure does not take this dependence into account. ERVQ claims to offer a joint training scheme, but the proposed algorithm only updates the codebooks generated by RVQ, which have already been obtained independently of each other. Hence, the proposed codebook update does not really constitute a joint scheme.
Nevertheless, combining the hierarchical structure with a joint codebook generation strategy would increase the performance while retaining the low encoding complexity. Following this claim, Joint K-Means is proposed [P4]. JKM extends the "K-means"²² training performed on a single RVQ layer to all layers, providing a joint training scheme. In the training scheme of K-Means²³, an "expectation" step is performed first, in which the vectors are assigned to their nearest codevectors. A "maximization" step then follows, in which the codevectors are updated with the means of the assigned vectors. RVQ applies these steps for many iterations, separately at each layer. In JKM, it is proposed to extend them to all layers.

²² The K-Means clustering algorithm and Lloyd's vector quantization are sometimes used interchangeably in the literature.
The "expectation-maximization" steps of JKM proceed as follows. In the "expectation" step, each vector is assigned to its "selected" codevector, and the residual is immediately calculated and passed to the next layer, where the same operation is repeated until the final layer is reached. Then, in the "maximization" step, the codevectors at each layer are updated with the means of the vectors assigned to them. Therefore, while RVQ waits for the quantization at one layer to converge before moving on, JKM propagates the residuals through all layers in every iteration. Note that JKM does not assign a given vector to its nearest codevector; instead, it assigns the vector to the "selected" codevector, and this selection is made by the encoding algorithm. JKM proposes a joint encoding algorithm, which also takes the layer below the current layer into account when selecting the codevector from the current layer. Incorporating this encoding method into the training improves the codebook generation even further.
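The training loop described above can be sketched as follows. This is a minimal illustration, not the implementation from [P4]: the function names, the initialisation, and the handling of empty clusters are assumptions, and the "selected" codevector is simply the nearest one here, whereas JKM uses its joint encoding for the selection.

```python
import numpy as np

def nearest(vectors, codebook):
    """Index of the nearest codevector (squared L2) for each row of `vectors`."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def jkm_train(X, M, K, iters=10, seed=0):
    """Joint training sketch: one sweep through all M layers per iteration."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Assumed initialisation: random training vectors, scaled down per layer.
    codebooks = [X[rng.choice(N, K, replace=False)] / (2 ** m) for m in range(M)]
    for _ in range(iters):
        residual = X.copy()
        inputs, assigns = [], []
        # "Expectation": assign, compute the residual, pass it on immediately.
        for m in range(M):
            idx = nearest(residual, codebooks[m])  # simplification: nearest = "selected"
            inputs.append(residual)
            assigns.append(idx)
            residual = residual - codebooks[m][idx]
        # "Maximization": each codevector becomes the mean of its assigned vectors.
        for m in range(M):
            for k in range(K):
                mask = assigns[m] == k
                if mask.any():  # empty clusters keep their old codevector
                    codebooks[m][k] = inputs[m][mask].mean(axis=0)
    return codebooks

def quantization_error(X, codebooks):
    """Mean squared error after layer-by-layer nearest-neighbor encoding."""
    residual = X.copy()
    for C in codebooks:
        residual = residual - C[nearest(residual, C)]
    return float((residual ** 2).sum(axis=1).mean())
```

In contrast to RVQ, no layer is trained to convergence before the next one starts: every iteration touches all codebooks once.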
Encoding in RVQ is also performed independently for each layer, in a nearest-neighbor fashion. In other words, for each residual the nearest codevector in the corresponding codebook is selected. However, this does not guarantee the minimum quantization error. Let $\mathbf{c}_{1,a}$ be the closest codevector to $\mathbf{x}$, and let $\mathbf{c}_{2,a}$ be the closest codevector to the first residual $\mathbf{r}_1 = \mathbf{x} - \mathbf{c}_{1,a}$. Let $\mathbf{c}_{1,b}$ be a different codevector from the first codebook, i.e., $\mathbf{c}_{1,a} \neq \mathbf{c}_{1,b}$, and let $\mathbf{c}_{2,b}$ be a codevector from the second codebook. The suboptimality of this encoding scheme can be proven as follows:
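Lemma: Given $\|\mathbf{x} - \mathbf{c}_{1,a}\|_2^2 \le \|\mathbf{x} - \mathbf{c}_{1,b}\|_2^2$ and $\|(\mathbf{x} - \mathbf{c}_{1,a}) - \mathbf{c}_{2,a}\|_2^2 \le \|(\mathbf{x} - \mathbf{c}_{1,a}) - \mathbf{c}_{2,b}\|_2^2$, there exists at least one pair $\mathbf{c}_{1,b}$, $\mathbf{c}_{2,b}$ that satisfies

$$\|(\mathbf{x} - \mathbf{c}_{1,a}) - \mathbf{c}_{2,a}\|_2^2 \ge \|(\mathbf{x} - \mathbf{c}_{1,b}) - \mathbf{c}_{2,b}\|_2^2 \tag{4.12}$$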
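Proof: Assume that $\mathbf{x} = \mathbf{c}_{1,b} + \mathbf{c}_{2,b}$. Then the right-hand side of (4.12) vanishes and (4.12) turns into the following:

$$\|(\mathbf{x} - \mathbf{c}_{1,a}) - \mathbf{c}_{2,a}\|_2^2 \ge 0 \tag{4.13}$$

which is always true. Now, if one can show that the assumption $\mathbf{x} = \mathbf{c}_{1,b} + \mathbf{c}_{2,b}$ is compatible with the two inequalities given in the lemma, the proof is complete. Substituting $\mathbf{x} = \mathbf{c}_{1,b} + \mathbf{c}_{2,b}$ into the first inequality of the lemma gives the following: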
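$$\|\mathbf{c}_{1,b} + \mathbf{c}_{2,b} - \mathbf{c}_{1,a}\|_2^2 \le \|\mathbf{c}_{2,b}\|_2^2 \tag{4.14}$$

Rearranging the terms in (4.14), one obtains the inequality below:

$$\|\mathbf{c}_{2,b} - (\mathbf{c}_{1,a} - \mathbf{c}_{1,b})\|_2^2 \le \|\mathbf{c}_{2,b}\|_2^2 \tag{4.15}$$

which is true when $\|\mathbf{c}_{1,a} - \mathbf{c}_{1,b}\|_2^2 \le 2\langle \mathbf{c}_{2,b}, \mathbf{c}_{1,a} - \mathbf{c}_{1,b} \rangle$. Substituting the assumption $\mathbf{x} = \mathbf{c}_{1,b} + \mathbf{c}_{2,b}$ into the second inequality of the lemma gives the following:

$$\|(\mathbf{c}_{1,b} + \mathbf{c}_{2,b}) - \mathbf{c}_{1,a} - \mathbf{c}_{2,a}\|_2^2 \le \|\mathbf{c}_{1,b} - \mathbf{c}_{1,a}\|_2^2 \tag{4.16}$$

Rearranging the terms in (4.16), one obtains the inequality below:

$$\|(\mathbf{c}_{1,b} - \mathbf{c}_{1,a}) - (\mathbf{c}_{2,a} - \mathbf{c}_{2,b})\|_2^2 \le \|\mathbf{c}_{1,b} - \mathbf{c}_{1,a}\|_2^2 \tag{4.17}$$

which is true when $\|\mathbf{c}_{2,a} - \mathbf{c}_{2,b}\|_2^2 \le 2\langle \mathbf{c}_{1,b} - \mathbf{c}_{1,a}, \mathbf{c}_{2,a} - \mathbf{c}_{2,b} \rangle$. Since (4.15) and (4.17) can both hold for a suitable selection of codevectors, in other words they are not always false, $\mathbf{x} = \mathbf{c}_{1,b} + \mathbf{c}_{2,b}$ is a valid case, and the proof is complete.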
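A small numerical instance may make the lemma concrete. The one-dimensional codebooks below are illustrative values chosen for this sketch (they do not come from the thesis); they satisfy the conditions stated after (4.15) and (4.17), since $1 \le 2 \cdot 0.6$ and $0.81 \le 2 \cdot 0.9$.

```python
import numpy as np

# Illustrative one-dimensional codebooks (assumed values, not from the thesis).
codebook1 = np.array([1.0, 0.0])    # c_{1,a} = 1.0, c_{1,b} = 0.0
codebook2 = np.array([-0.3, 0.6])   # c_{2,a} = -0.3, c_{2,b} = 0.6
x = 0.6                             # x = c_{1,b} + c_{2,b}

# Layer-by-layer nearest-neighbor encoding, as in RVQ.
i1 = int(np.argmin((x - codebook1) ** 2))      # picks c_{1,a}
r1 = x - codebook1[i1]                         # first residual
i2 = int(np.argmin((r1 - codebook2) ** 2))     # picks c_{2,a}
greedy_error = float((r1 - codebook2[i2]) ** 2)

# Exhaustive search over every (layer 1, layer 2) pair.
sums = codebook1[:, None] + codebook2[None, :]
best_error = float(((x - sums) ** 2).min())

print(greedy_error, best_error)
```

The nearest-neighbor path ends at $1.0 - 0.3 = 0.7$, whereas the pair $(\mathbf{c}_{1,b}, \mathbf{c}_{2,b})$ reconstructs $x$ exactly, so the layer-wise choice is strictly worse in this instance.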
In order to improve the encoding performance, "joint encoding" is proposed in JKM. Joint encoding is similar to the beam search used in AQ or OCKM, but much less complex, since it exploits the hierarchical structure, which significantly reduces the number of required computations. The joint encoding method searches for the codevector with the minimum quantization error in a small neighborhood of the nearest codevector. Instead of only the nearest codevector, the $H$ nearest codevectors are selected and the residual is calculated for each of them. The same operation is then repeated for each residual at the next layer, giving $H^2$ candidates. The best $H$ candidates according to the quantization error are kept, and the operations proceed until the final layer is reached. To explain the computational cost of encoding in detail, the distance between the $m$-th layer residual $\mathbf{r}_m$ of the given vector $\mathbf{x}$ and the $k$-th codevector on the $m$-th layer, $\mathbf{c}_{m,k}$, can be rewritten as follows:
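$$\begin{aligned}
d(\mathbf{r}_m, \mathbf{c}_{m,k}) &= \left\| \mathbf{x} - \sum_{l=1}^{m-1} \dot{\mathbf{c}}_l - \mathbf{c}_{m,k} \right\|_2^2 \\
&= \left\| \mathbf{x} - \sum_{l=1}^{m-1} \dot{\mathbf{c}}_l \right\|_2^2 - 2 \left\langle \mathbf{x} - \sum_{l=1}^{m-1} \dot{\mathbf{c}}_l,\ \mathbf{c}_{m,k} \right\rangle + \|\mathbf{c}_{m,k}\|_2^2 \\
&= \left\| \mathbf{x} - \sum_{l=1}^{m-1} \dot{\mathbf{c}}_l \right\|_2^2 - 2 \langle \mathbf{x}, \mathbf{c}_{m,k} \rangle + 2 \sum_{l=1}^{m-1} \langle \dot{\mathbf{c}}_l, \mathbf{c}_{m,k} \rangle + \|\mathbf{c}_{m,k}\|_2^2
\end{aligned} \tag{4.18}$$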
where $\dot{\mathbf{c}}_l$ is the nearest codevector on the $l$-th layer. For each layer, note that the first term has already been calculated in the previous layers. The third and fourth terms can be retrieved from a look-up table. Hence, the second term should be calculated first for all the codevectors, which requires $O(KD)$ operations for one layer. The third and fourth terms require $O(mKH)$ look-ups and additions for the $m$-th layer. Finally, the best $H$ among all the distances are selected, which costs $O(KH \log H)$. This is repeated $M$ times, so the final cost of encoding is $O\left(MDK + \frac{(M-1)(M-2)}{2} KH + MKH \log H\right)$.
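As a quick arithmetic check, the cost expression can be evaluated directly. The helper below is a hypothetical sketch; the base-2 logarithm and the assumption that a 32-bit code with $K = 256$ uses $M = 4$ layers of 8 bits each (and a 64-bit code $M = 8$) are not stated explicitly in the text.

```python
import math

def jkm_encoding_cost(M, D, K, H):
    """Evaluate M*D*K + (M-1)(M-2)/2 * K*H + M*K*H*log2(H) (log base 2 assumed)."""
    inner_products = M * D * K                    # second term of (4.18), every layer
    lookups = (M - 1) * (M - 2) // 2 * K * H      # table look-ups and additions
    selection = M * K * H * math.log2(H)          # keeping the best H candidates
    return int(inner_products + lookups + selection)

# Assumed setting: 128-dimensional vectors, K = 256, H = 32, M = 4 layers.
print(jkm_encoding_cost(M=4, D=128, K=256, H=32))  # 319488
```

Under these assumptions the result agrees with the value listed for JKM on SIFT1M with 32-bit codes in Table 12, and $M = 8$ gives 761856, the listed 64-bit value.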
More details on this encoding scheme can be found in [P4] and [P5].
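The candidate search can be sketched as a small beam search over the layers. This is an illustrative simplification, not the algorithm of [P4] or [P5]: the function name and data layout are assumptions, and distances are recomputed directly instead of being assembled from the look-up tables implied by (4.18).

```python
import numpy as np

def joint_encode(x, codebooks, H):
    """Keep the H best partial encodings per layer; return (indices, squared error)."""
    # Each candidate is (indices chosen so far, current residual).
    candidates = [((), x)]
    for C in codebooks:
        expanded = []
        for path, residual in candidates:
            dist = ((residual[None, :] - C) ** 2).sum(axis=1)
            # Expand this candidate with its H nearest codevectors.
            for k in np.argsort(dist)[:H]:
                expanded.append((dist[k], path + (int(k),), residual - C[k]))
        # Up to H*H expansions; keep the H with the smallest error so far.
        expanded.sort(key=lambda t: t[0])
        candidates = [(p, r) for _, p, r in expanded[:H]]
    path, residual = candidates[0]
    return path, float(residual @ residual)

# Illustrative one-dimensional codebooks (assumed values).
codebooks = [np.array([[1.0], [0.0]]), np.array([[-0.3], [0.6]])]
x = np.array([0.6])
print(joint_encode(x, codebooks, H=1))  # layer-wise nearest-neighbor encoding
print(joint_encode(x, codebooks, H=2))  # joint search finds the exact pair
```

With `H=1` the sketch reduces to RVQ's layer-wise nearest-neighbor encoding; larger `H` widens the neighborhood that is searched at each layer.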
To conclude, JKM takes the lower layers into account during both the codebook generation and the vector encoding steps. As expected, this improves the quantization performance. The results on ANN benchmarks are shown in Table 11, and the computational and storage costs in Table 12. JKM is also compared with the prior art in Table 16, Table 17 and Table 18, in Chapter 4.3.
Table 11: JKM Test Results

Test results for SIFT1M, 32-bit codes
Method    recall@1    recall@10    recall@100
SOBE      0.100       0.348        0.731
JKM       0.121       0.402        0.790

Test results for GIST1M, 32-bit codes
Method    recall@1    recall@10    recall@100
SOBE      0.064       0.189        0.403
JKM       0.077       0.213        0.511

Test results for SIFT1M, 64-bit codes
Method    recall@1    recall@10    recall@100
SOBE      0.282       0.701        0.962
JKM       0.323       0.759        0.980

Test results for GIST1M, 64-bit codes
Method    recall@1    recall@10    recall@100
SOBE      0.136       0.360        0.705
Table 12: Computational and Storage Costs of JKM

Encoding cost for different datasets and code lengths (number of operations)
Method    Encoding Cost                                        SIFT1M-32    SIFT1M-64    GIST1M-32    GIST1M-64
SOBE      $O(2MKD)$                                            262144       524288       1966080      3932160
JKM       $O(MDK + \frac{(M-1)(M-2)}{2} KH + MKH \log H)$      319488       761856       1171456      2465792

Storage cost for different datasets and code lengths (MB)
Method    Storage Cost    SIFT1M-32    SIFT1M-64    GIST1M-32    GIST1M-64
SOBE      $O(MKD)$        1.00         2.00         7.5          15
JKM       $O(MKD)$        1.00         2.00         7.5          15

Parameters
Parameter                        SIFT1M-32    SIFT1M-64    GIST1M-32    GIST1M-64
$M$: number of layers            4            8            4            8
$K$: number of codevectors       256          256          256          256
$D$: number of dimensions        128          128          960          960
$H$: number of candidates        32           32           32           32