• No se han encontrado resultados

Supplementary material for the paper “The mRMR variable selection method: a comparative study for functional data”

N/A
N/A
Protected

Academic year: 2023

Share "Supplementary material for the paper “The mRMR variable selection method: a comparative study for functional data”"

Copied!
13
0
0

Texto completo

(1)

Supplementary material for the paper “The mRMR variable selection method: a comparative study for functional data”

Jos´ e R. Berrendero, Antonio Cuevas, Jos´ e L. Torrecilla Departamento de Matem´ aticas

Universidad Aut´ onoma de Madrid, Spain

1 List of models used in the simulation study and a few example outputs Our simulation study consists of 400 experiments based on 100 different underlying models. The optimal classification rule in each case depends only on a finite number of variables. Models differ in complexity and number of relevant variables. The processes involved are chosen among the following: first, the standard Brownian Motion, B.

Second, BT denotes a Brownian Motion with a trend m(t), i.e., BT (t) = B(t)+m(t);

we have considered several choices for m(t), a linear trend, m(t) = ct, a linear trend with random slope, i.e., m(t) = θt, where θ is a Gaussian r.v., and different members of two parametric families: the peak functions Φ

m,k

and the hillside functions, defined by

Φ

m,k

= Z

t

0

ϕ

m,k

(s)ds , hillside

t0,b

(t) = b(t − t

0

) I

[t0,∞)

,

where, ϕ

m,k

(t) =

√ 2

m−1

h

I (

2k−22m ,2k−12m

) − I (

2k−12m ,22km

) i

for m ∈ N, 1 ≤ k ≤ 2

m−1

. Third, the Brownian Bridge: BB(t) = B(t)−tB(1). Our fourth class of Gaussian processes is the Ornstein–Uhlenbeck process, with zero mean (OU) or different mean functions m(t) (OU t). Finally some “smooth” processes have been also include. They are obtained by convolving Brownian trajectories with Gaussian kernels. We have considered two levels of smoothing denoted by sB and ssB.

In the following list of models, µ

i

denotes de distribution of X|Y = i and variables is the set of relevant variables in each Gaussian or Mixture case. We call them “relevant”

in the sense that the optimal classification rule depends only on these variables. In the list below the variables written in boldface are “especially relevant” regarding their influence in the optimal classifier.

1. Gaussian models considered :

1. G1:

µ0: B(t)

µ1: B(t) +θt , θ∼N(0,3) variables={X100}.

2. G1b:

µ0: B(t)

µ1: B(t) +θt , θ∼N(0,5) variables={X100}.

3. G2:

µ0: B(t) +t µ1: B(t) variables={X100}.

4. G2b:

µ0: B(t) + 3t µ1: B(t) variables={X100}.

(2)

5. G3:

µ0: BB(t) µ1: B(t) variables={X100}.

6. G4:

µ0: B(t) +hillside0.5,4(t) µ1: B(t)

variables={X47,X100}.

7. G5:

µ0: B(t) + 3Φ1,1(t) µ1: B(t) variables={X1,X48, X100}.

8. G6:

µ0: B(t) + 5Φ2,2(t) µ1: B(t) variables={X48,X75, X100}.

9. G7:

µ0: B(t) + 5Φ3,2(t) + 5Φ3,4(t)

µ1: B(t)

variables={X22,X35, X49, X74,X88, X100}.

10. G8:

µ0: B(t) + 3Φ2,1.25(t) + 3Φ2,2(t)

µ1: B(t)

variables={X9,X35, X48, X62,X75, X100}.

2. logistic-type models considered: they are all defined according standard (ii) (see Sec. 4 in the main paper). The processX =X(t) follows one of the distributions mentioned above and Y = Binom(1, η(X)) with η(x) = (1 +e−ψ(x(t1),···,x(tk)))−1, a function of the relevant variables x(t1),· · ·, x(tk).

L1: ψ(X) = 10X65.

L2: ψ(X) = 10X30+ 10X70. L3: ψ(X) = 10X30−10X70. L4: ψ(X) = 20X30+ 50X5020X80. L5: ψ(X) = 20X30−50X50+ 20X80.

L6: ψ(X) = 10X10+ 30X40+ 10X72+ 10X80+ 20X95. L7: ψ(X) =P10

i=110X10i.

L8: ψ(X) = 20X302 + 10X504 + 50X803 . L9: ψ(X) = 10X10+ 10|X50|+ 0X302X85. L10: ψ(X) = 20X33+ 20|X68|.

L11: ψ(X) =X20

35 +X30

77. L12: ψ(X) = logX35+ logX77.

L13: ψ(X) = 40X20+ 30X28+ 20X62+ 10X67. L14: ψ(X) = 40X20+ 30X28−20X62−10X67. L15: ψ(X) = 40X20−30X28+ 20X62−10X67.

Some variations of these models have been also considered:

(3)

L3b: ψ(X) = 30X30−20X70.

L4b: ψ(X) = 30X30+ 20X50+ 10X80. L5b: ψ(X) = 10X30−10X50+ 10X80.

L6b: ψ(X) = 20X10+ 20X40+ 20X72+ 20X80+ 20X95. L8b: ψ(X) = 10X302 + 10X504 + 10X803.

3. Mixture-type models: they are obtained by combining (via mixtures) in several ways the above mentioned Gaussian distributions assumed forX|Y = 0 andX|Y = 1. These models are denoted M1, ..., M10 in the output tables.

1. M1:

µ0:

B(t) + 3t, 1/2 B(t)2t, 1/2

µ1: B(t)

variables={X100}.

2. M2:

µ0:

B(t) + 3Φ2,2(t), 1/2 B(t) + 5Φ3,2(t), 1/2

µ1: B(t)

variables={X22,X35, X48,X75, X100}.

3. M3:

µ0:

B(t) + 3Φ2,2(t), 1/10 B(t) + 5Φ3,2(t), 9/10

µ1: B(t)

variables={X22,X35, X48,X75, X100}.

4. M4:

µ0:

B(t) + 3Φ2,2(t), 1/2 B(t) + 5Φ3,3(t), 1/2

µ1: B(t)

variables={X48,X62,X75, X100}.

5. M5:

µ0:

B(t) + 3Φ2,1(t) ,1/3 B(t) + 3Φ2,2(t), 1/3 B(t) + 5Φ3,2(t), 1/3

µ1: B(t)

variables={X1,X22,X35, X48,X75, X100}.

6. M6:

µ0:

B(t) + 3Φ2,1(t) ,1/2 B(t) + 3t ,1/2

µ1: B(t)

variables={X1,X22, X49,X100}.

7. M7:

µ0:

B(t) + 3Φ1,1(t) ,1/2 BB(t) ,1/2

µ1: B(t)

variables={X1,X48,X100}.

8. M8:

µ0:

B(t) +θt, θN(0,5) ,1/2 B(t) +hillside0.5,5(t) ,1/2

µ1: B(t)

variables={X47,X100}.

9. M9:

µ0:

B(t) +θt, θN(0,5) ,1/2

BB(t) ,1/2

µ1: B(t)

variables=X100.

10. M10:

µ0:

B(t) + 3Φ1,1(t) ,1/3 B(t)3t ,1/3 BB(t) ,1/3

µ1: B(t)

variables={X1,X48,X100}.

11. M11:

µ0:

B(t) + 3Φ1,1(t) ,1/4 B(t)3t ,1/4 B(t) +hillside0.5,5(t) ,1/4

BB(t) ,1/4

µ1: B(t)

variables={X1,X48,X100}.

(4)

Finally, the full list of models involved is as follows:

1. L1 OU

2. L1 OUt

3. L1 B

4. L1 sB

5. L1 ssB

6. L2 OU

7. L2 OUt

8. L2 B

9. L2 sB

10. L2 ssB

11. L3 OU

12. L3b OU

13. L3 OUt

14. L3b OUt

15. L3 B

16. L3b B

17. L3 sB

18. L3 ssB

19. L4 OU

20. L4b OU

21. L4 OUt

22. L4b OUt

23. L4 B

24. L4 sB

25. L4 ssB

26. L5 OU

27. L5b OU

28. L5 OUt

29. L5 B

30. L5 sB

31. L5 ssB

32. L6 OU

33. L6b OU

34. L6 OUt

35. L6b OUt

36. L6 B

37. L6 sB

38. L6 ssB

39. L7 OU

40. L7b OU

41. L7 OUt

42. L7b OUt

43. L7 B

44. L7 sB

45. L7 ssB

46. L8 B

47. L8 sB

48. L8 ssB

49. L8b OU

50. L9 B

51. L9 sB

52. L9 ssB

53. L10 OU

54. L10 B

55. L10 sB

56. L10 ssB

57. L11 OU

58. L11 OUt

59. L11 B

60. L11 sB

61. L11 ssB

62. L12 OU

63. L12 OUt

64. L12 B

65. L12 sB

66. L12 ssB

67. L13 OU

68. L13 OUt

69. L13 B

70. L13 sB

71. L13 ssB

72. L14 OU

73. L14 OUt

74. L14 B

75. L14 sB

76. L15 OU

77. L15 OUt

78. L15 B

79. L15 sB

80. G1

81. G1b

82. G2

83. G2b

84. G3

85. G4

86. G5

87. G6

88. G7

89. G8

90. M1

91. M2

92. M3

93. M4

94. M5

95. M6

96. M7

97. M8

98. M9

99. M10

100. M11

(5)

We next provide a few simulation results. Seewww.uam.es/antonio.cuevas/exp/mRMR-outputs.

xlsxfor the full simulation outputs.

NB accuracy outputs

Model MID FCD RD VD CD Base

L7 OU 90.73 87.01 90.41 87.72 90.50 93.35 L1 OUt 72.54 74.92 74.10 74.56 74.26 73.33 L14 B 75.89 77.31 77.17 76.72 76.75 75.33 L9 sB 84.73 86.01 85.84 85.40 85.78 82.27

G1b 79.74 75.94 80.73 81.66 77.67 79.49

G3 76.22 64.54 78.84 78.78 72.67 71.24

G6 83.13 83.50 84.38 83.96 84.28 74.90

M1 79.12 73.09 80.93 81.89 76.77 77.61

M4 64.73 68.04 67.91 67.48 67.89 61.36

M6 81.25 80.09 83.06 82.45 83.21 78.02

Table 10.- Average NB accuracy (proportion of correct classification) outputs, over 200 runs of the considered methods with sample sizen= 50.

NB number of variables

Model MID FCD RD VD CD Base

L7 OU 12.1 14.1 10.1 11.5 11.1 100

L1 OUt 8.9 8.0 7.0 5.9 6.6 100

L14 B 7.9 8.0 6.0 7.5 5.9 100

L9 sB 4.9 7.5 4.7 4.9 5.3 100

G1b 6.9 13.4 4.8 1.5 9.0 100

G3 6.4 11.6 3.7 3.7 8.2 100

G6 3.4 1.9 3.5 3.1 2.7 100

M1 4.9 11.7 4.5 1.4 8.4 100

M4 6.5 5.0 5.1 3.4 4.6 100

M6 7.1 7.6 5.5 5.6 6.4 100

Table 11.- Average number of selected variables over 200 runs of the considered methods with sample sizen= 50 using NB.

(6)

k-NN accuracy outputs

Model MID FCD RD VD CD Base

L7 OU 90.74 86.89 90.78 87.79 90.67 92.21 L1 OUt 75.83 77.34 76.77 77.22 77.11 75.81 L14 B 75.34 77.16 77.29 76.20 76.49 74.43 L9 sB 86.79 87.39 87.35 87.03 87.28 86.10

G1b 79.11 75.74 80.00 80.01 78.13 78.57

G3 73.39 61.97 77.28 77.03 68.20 65.26

G6 91.95 84.16 88.22 85.80 84.68 92.19

M1 81.63 75.47 82.79 83.00 80.59 80.72

M4 72.94 71.60 74.72 70.30 71.66 73.29

M6 83.08 79.70 84.13 84.26 84.02 80.99

Table 12.- Averagek-NN accuracy (proportion of correct classification) outputs, over 200 runs of the considered methods with sample sizen= 50.

k-NN number of variables

Model MID FCD RD VD CD Base

L7 OU 11.5 14.5 10.4 12.1 11.3 100

L1 OUt 9.0 7.9 6.9 6.5 6.8 100

L14 B 8.3 7.6 5.5 8.0 6.5 100

L9 sB 6.3 7.7 6.0 7.1 6.0 100

G1b 7.8 11.7 6.5 6.3 8.7 100

G3 5.1 11.2 2.5 2.9 7.8 100

G6 11.5 12.6 9.0 8.2 7.5 100

M1 7.8 11.4 6.3 4.9 8.6 100

M4 11.7 16.1 10.4 9.7 10.1 100

M6 9.9 9.6 7.5 8.7 7.6 100

Table 13.- Average number of selected variables over 200 runs of the considered methods with sample sizen= 50 usingk-NN.

LDA accuracy outputs

Model MID FCD RD VD CD Base

L7 OU 89.27 85.13 89.75 86.81 90.10 -

L1 OUt 72.05 73.74 73.48 73.96 73.66 -

L14 B 75.25 76.35 77.12 75.62 76.27 -

L9 sB 84.91 84.88 84.96 84.69 85.12 -

G1b 53.35 52.22 54.15 54.49 51.67 -

G3 52.26 50.91 53.53 53.51 51.06 -

G6 95.28 87.92 90.54 86.59 87.80 -

M1 54.44 53.64 54.88 54.68 53.70 -

M4 78.95 70.98 76.46 71.37 71.30 -

M6 79.92 79.53 80.80 80.57 80.70 -

Table 14.- Average LDA accuracy (proportion of correct classification) outputs, over 200 runs of the considered methods with sample sizen= 50.

(7)

LDA number of variables

Model MID FCD RD VD CD Base

L7 OU 5.5 6.9 6.2 6.1 7.2 -

L1 OUt 5.6 4.6 4.6 4.4 4.9 -

L14 B 3.6 3.2 3.1 4.1 3.6 -

L9 sB 4.0 3.5 3.3 3.4 3.5 -

G1b 5.6 7.0 5.3 4.6 7.4 -

G3 7.1 8.9 5.0 5.0 8.9 -

G6 10.7 14.3 11.2 7.8 11.6 -

M1 5.5 7.2 5.3 5.7 6.8 -

M4 11.1 10.2 10.3 7.4 8.5 -

M6 6.4 4.5 5.2 5.2 4.8 -

Table 15.- Average number of selected variables over 200 runs of the considered methods with sample sizen= 50 using LDA.

SVM accuracy outputs

Model MID FCD RD VD CD Base

L7 OU 90.73 87.01 90.41 87.72 90.50 93.35 L1 OUt 72.54 74.92 74.10 74.56 74.26 73.33 L14 B 75.89 77.31 77.17 76.72 76.75 75.33 L9 sB 84.73 86.01 85.84 85.40 85.78 82.27

G1b 79.74 75.94 80.73 81.66 77.67 79.49

G3 76.22 64.54 78.84 78.78 72.67 71.24

G6 83.13 83.50 84.38 83.96 84.28 74.90

M1 79.12 73.09 80.93 81.89 76.77 77.61

M4 64.73 68.04 67.91 67.48 67.89 61.36

M6 81.25 80.09 83.06 82.45 83.21 78.02

Table 16.- Average SVM accuracy (proportion of correct classification) outputs, over 200 runs of the considered methods with sample sizen= 50.

SVM number of variables

Model MID FCD RD VD CD Base

L7 OU 12.1 14.1 10.1 11.5 11.1 100

L1 OUt 8.9 8.0 7.0 5.9 6.6 100

L14 B 7.9 8.0 6.0 7.5 5.9 100

L9 sB 4.9 7.5 4.7 4.9 5.3 100

G1b 6.9 13.4 4.8 1.5 9.0 100

G3 6.4 11.6 3.7 3.7 8.2 100

G6 3.4 1.9 3.5 3.1 2.7 100

M1 4.9 11.7 4.5 1.4 8.4 100

M4 6.5 5.0 5.1 3.4 4.6 100

M6 7.1 7.6 5.5 5.6 6.4 100

Table 17.- Average number of selected variables over 200 runs of the considered methods with sample sizen= 50 using SVM.

(8)

2 Results with quotient criterion and other classifiers

The outputs included in the paper correspond to the difference criterion (in order to combine the relevance and redundancy measures). We provide here some additional results for the quotient criterion, instead of the difference one. Besides, Figure 2 in the paper was produced using only the outputs for the k-NN classfier. In this document we show the analogous displays for LDA, SVM and NB. Again, the entire simulation outputs can be downloaded from www.uam.es/antonio.cuevas/exp/mRMR-outputs.xlsx.

Tables 1a-9a below correspond and Tables 1-9 in the main paper with the difference criterion replaced with the quotient one. Figures 2a, 2b and 2c correspond to Figure 2 using the other classifiers.

Output (NB) Sample size MIQ FCQ RQ VQ CQ Base

Avgerage accuracy n= 30 78.10 78.76 79.53 79.58 79.12 77.28 n= 50 79.59 79.73 80.86 80.81 80.26 78.29 n= 100 80.62 80.54 81.82 81.75 81.16 78.84 n= 200 81.24 81.03 82.48 82.35 81.77 79.13 Average dim. red n= 30 8.9 8.6 7.2 7.0 7.9 100

n= 50 8.3 8.1 6.7 6.7 7.4 100

n= 100 7.7 7.2 6.1 6.1 6.9 100

n= 200 7.1 6.7 5.7 5.9 6.5 100

Victories over Base n= 30 60 65 72 76 68 -

n= 50 67 61 79 78 68 -

n= 100 71 64 88 86 79 -

n= 200 75 68 92 91 84 -

Table 1a.- Performance outputs for the considered methods, using NB and the quotient criterion, with different sample sizes. Each output is the result of the 100 different models for each sample size.

Output (k-NN) Sample size MIQ FCQ RQ VQ CQ Base

Avgerage accuracy n= 30 80.02 79.65 80.85 80.82 80.09 78.98 n= 50 81.32 80.40 81.72 81.67 80.87 80.34 n= 100 82.84 81.34 82.73 82.65 81.83 81.99 n= 200 84.06 82.09 83.56 83.49 82.65 83.38

Average dim. red n= 30 9.3 9.5 7.4 7.6 8.1 100

n= 50 9.6 9.6 7.6 7.8 8.3 100

n= 100 9.9 9.9 8.0 8.2 8.6 100

n= 200 10.1 10.1 8.3 8.5 9.0 100

Victories over Base n= 30 71 58 76 74 67 -

n= 50 67 53 73 72 64 -

n= 100 71 49 64 64 55 -

n= 200 64 42 62 64 54 -

Table 2a.- Performance outputs for the considered methods, usingk-NN and the quotient criterion, with different sample sizes. Each output is the result of the 100 different models for each sample size.

(9)

Output (LDA) Sample size MIQ FCQ RQ VQ CQ Base Avgerage accuracy n= 30 78.67 77.58 78.64 78.64 78.15 60.80 n= 50 80.20 78.53 79.58 79.52 79.11 58.75 n= 100 81.74 79.62 80.65 80.52 80.25 53.11 n= 200 82.90 80.47 81.53 81.35 81.10 73.25

Average dim. red n= 30 5.8 4.7 4.7 4.7 5.1 100

n= 50 6.9 5.7 5.6 5.6 6.0 100

n= 100 8.3 7.1 6.9 7.0 7.3 100

n= 200 9.5 8.3 8.0 8.1 8.4 100

Table 3a.- Performance outputs for the considered methods, using LDA and the quotient criterion, with different sample sizes. Each output is the result of the 100 different models for each sample size.

Output (SVM) Sample size MIQ FCQ RQ VQ CQ Base

Avgerage accuracy n= 30 81.62 79.81 80.69 80.65 80.27 81.91 n= 50 82.69 80.42 81.43 81.35 80.96 82.99 n= 100 83.80 81.21 82.20 82.12 81.76 84.11 n= 200 84.61 81.79 82.90 82.76 82.42 84.91 Average dim. red n= 30 10.6 10.3 9.1 9.2 9.5 100

n= 50 10.7 10.4 9.3 9.4 9.7 100

n= 100 11.1 10.5 9.5 9.7 9.9 100

n= 200 11.4 10.7 9.7 9.9 10.0 100

Victories over Base n= 30 32 37 49 47 42 -

n= 50 35 34 51 52 44 -

n= 100 35 33 51 50 48 -

n= 200 33 31 52 51 48 -

Table 4a.- Performance outputs for the considered methods, using SVM and the quotient criterion, with different sample sizes. Each output is the result of the 100 different models for each sample size.

(10)

Ranking criterion (NB) Sample size MIQ FCQ RQ VQ CQ

Relative n= 30 2.27 5.67 8.69 8.63 7.65

n= 50 2.69 4.94 9.09 8.70 7.51

n= 100 2.75 4.75 9.21 8.80 7.44

n= 200 2.71 4.57 8.87 8.31 7.41

Positional n= 30 6.78 7.83 8.53 8.50 8.39

n= 50 6.79 7.57 8.93 8.51 8.22

n= 100 6.80 7.47 9.01 8.58 8.14

n= 200 6.84 7.56 8.92 8.43 8.25

F1 n= 30 12.25 15.85 17.39 17.58 17.01

n= 50 12.24 14.90 19.19 17.37 16.35 n= 100 12.35 14.60 19.67 17.38 16 n= 200 12.49 14.92 19.28 16.86 16.45

Table 5a.- Global scores of the considered (quotient-based) methods using three different ranking criteria with the NB classifier.

Ranking criterion (k-NN) Sample size MIQ FCQ RQ VQ CQ

Relative n= 30 3.70 3.97 8.51 8.09 6.03

n= 50 4.39 3.72 7.84 7.59 5.61

n= 100 4.93 3.46 7.24 6.91 5.15

n= 200 5.52 3.00 6.72 6.52 5.00

Positional n= 30 7.31 7.29 9.06 8.64 7.70

n= 50 7.55 7.30 8.90 8.67 7.58

n= 100 7.75 7.37 8.82 8.62 7.49

n= 200 7.96 7.43 8.56 8.45 7.60

F1 n= 30 14.32 13.96 19.66 17.80 14.26

n= 50 15.27 14.06 18.78 17.94 13.95 n= 100 16.02 14.19 18.44 17.90 13.62 n= 200 16.84 14.09 17.48 17.57 14.02 Table 6a.- Global scores of the considered (quotient-based) methods using three different ranking criteria with thek-NN classifier.

(11)

Ranking criterion (LDA) Sample size MIQ FCQ RQ VQ CQ

Relative n= 30 3.94 3.14 7.48 7.38 5.38

n= 50 4.26 2.86 6.97 6.66 5.19

n= 100 4.76 2.60 6.98 6.43 5.33

n= 200 5.27 2.36 6.78 6.13 5.23

Positional n= 30 7.49 6.99 9.01 8.96 7.55

n= 50 7.64 7.12 8.90 8.64 7.70

n= 100 7.72 7.13 8.89 8.52 7.74

n= 200 7.80 7.23 8.79 8.36 7.82

F1 n= 30 15.05 12.67 19.11 19.32 13.85

n= 50 15.63 12.95 18.91 18.07 14.44 n= 100 15.80 13.04 18.83 17.72 14.61 n= 200 16.25 13.29 18.62 17.04 14.80

Table 7a.- Global scores of the considered (quotient-based) methods using three different ranking criteria with LDA.

Ranking criterion (SVM) Sample size MIQ FCQ RQ VQ CQ

Relative n= 30 6.02 2.85 6.58 6.28 4.42

n= 50 5.99 2.72 6.70 6.15 4.73

n= 100 6.14 2.61 6.43 5.99 4.66

n= 200 6.42 2.30 6.29 5.75 4.68

Positional n= 30 8.26 7.20 8.74 8.46 7.34

n= 50 8.16 7.19 8.80 8.43 7.42

n= 100 8.26 7.31 8.66 8.32 7.48

n= 200 8.28 7.36 8.61 8.22 7.56

F1 n= 30 17.90 13.78 17.99 17.17 13.16

n= 50 17.58 13.51 18.21 17.28 13.42 n= 100 17.97 13.86 17.71 16.95 13.59 n= 200 17.89 13.85 17.70 16.58 14.06 Table 8a.- Global scores of the considered (quotient-based) methods using three different ranking criteria with the linear SVM.

(12)

NB outputs

Output Data MIQ FCQ RQ VQ CQ Base

Classification accuracy Growth 88.17 87.10 86.02 86.02 87.10 84.95 Tecator 96.28 97.67 99.53 99.53 98.14 97.21 Phoneme 73.08 80.20 80.38 80.15 80.32 74.08

Number of variables Growth 1.5 1.1 1.1 1.1 1.1 31

Tecator 4.8 5.0 1.0 1.0 4.4 100

Phoneme 14.5 10.6 16.7 16.8 14.1 256

k-NN outputs

Output Data MIQ FCQ RQ VQ CQ Base

Classification accuracy Growth 95.70 83.87 83.87 83.87 83.87 96.77 Tecator 96.74 99.07 99.53 99.53 98.60 98.60 Phoneme 75.53 81.42 79.79 80.38 80.61 78.80

Number of variables Growth 3.9 1.0 1.0 1.0 1.0 31

Tecator 4.0 3.0 1.0 1.0 4.3 100

Phoneme 18.4 12.1 12.3 15.2 6.7 256

LDA outputs

Output Data MIQ FCQ RQ VQ CQ Base

Classification accuracy Growth 95.70 91.40 91.40 91.40 91.40 -

Tecator 94.88 94.42 94.88 94.42 95.35 -

Phoneme 74.55 78.88 79.10 79.63 80.26 -

Number of variables Growth 3.7 5.0 4.9 4.9 5.0 -

Tecator 6.1 8.4 4.1 2.2 3.1 -

Phoneme 19.0 8.9 10.0 9.0 9.2 -

SVM outputs

Output Data MIQ FCQ RQ VQ CQ Base

Classification accuracy Growth 94.62 87.10 87.10 87.10 86.02 95.70 Tecator 98.14 99.07 99.53 99.07 98.60 99.07 Phoneme 75.30 80.71 80.67 80.37 80.33 80.96

Number of variables Growth 3.5 5.0 4.9 4.9 5.0 31

Tecator 6.7 2.1 1.0 1.0 4.1 100

Phoneme 19.3 10.1 11.3 10.8 12.2 256

Table 9a.- Performances of the different (quotient-based) mRMR methods in three real data sets. From top to bottom tables stand for Naive Bayes,k-NN, LDA and linear SVM outputs respectively.

(13)

Simulation experiments

50 100 150 200 250 300 350 400

MID

FDC

RD

VD

CD

0 1 2 3 4 5 6 7 8 9 10

Figure 2a.- Cromatic version of the global relative ranking table taking into account the 400 considered experiments (columns) and the Naive Bayes classifier.

Simulation experiments

50 100 150 200 250 300 350 400

MID

FDC

RD

VD

CD

0 1 2 3 4 5 6 7 8 9 10

Figure 2b.- Cromatic version of the global relative ranking table taking into account the 400 considered experiments (columns) and the Linear Discriminant Analysis.

Simulation experiments

50 100 150 200 250 300 350 400

MID

FDC

RD

VD

CD

0 1 2 3 4 5 6 7 8 9 10

Figure 2c.- Cromatic version of the global relative ranking table taking into account the 400 considered experiments (columns) and the linear SVM.

Referencias

Documento similar

Aunque todavía no se ha logrado la inclusión de las nuevas tecnologías de la información para todos los estratos de la población venezolana, sobre todo para los grupos de edad