7.1 Conclusion
DES-LEC enforces local expertise consistency, and the resulting pool of base learners is one from which a better ensemble can be selected. DES-CRL rebuilds the neighborhood and creates a better competence region that promotes good classifiers during selection. DES-RE combines the two and gets more out of both DES-LEC and DES-CRL. We also see that the performance of the methods varies with the performance score used. The right technique should therefore be selected according to the score that matters for the given task.
7.2 Future work
In DES-LEC, we currently increase or decrease the weights by a fixed percentage; other weighting strategies could also be applied. Similarly, many metric learning methods other than LMNN could be adopted in DES-CRL. The neighborhood size K is currently the same in DES-LEC and DES-CRL, which is probably not the best option.
LEC optimization needs to update hundreds of classifiers independently, so it is easy to parallelize on a distributed computing platform such as Spark.
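The independence of the per-classifier updates can be sketched as a plain map over the pool. The snippet below is a minimal illustration only: `update_classifier` and its update rule are hypothetical placeholders (the actual LEC update is not reproduced here), and a local thread pool from the Python standard library stands in for a distributed platform.

```python
from concurrent.futures import ThreadPoolExecutor


def update_classifier(state):
    """Hypothetical stand-in for one classifier's LEC update.

    The result depends only on this classifier's own state, which is what
    makes the loop over the pool embarrassingly parallel.
    """
    clf_id, weights = state
    # Placeholder rule (an assumption, not the method from this work):
    # raise weights at or above 0.5 by 10%, lower the rest by 10%.
    new_weights = [w * 1.1 if w >= 0.5 else w * 0.9 for w in weights]
    return clf_id, new_weights


def update_pool(pool, max_workers=4):
    """Apply the update to every classifier; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(update_classifier, pool))


# A toy pool of 200 "classifiers", each represented by an id and a weight list.
pool = [(i, [0.2, 0.5, 0.8]) for i in range(200)]
updated = update_pool(pool)

# The parallel result matches a plain serial loop, element by element.
serial = [update_classifier(s) for s in pool]
```

Because no update reads another classifier's state, the same function could in principle be passed to a distributed map, for example `sc.parallelize(pool).map(update_classifier).collect()` in PySpark. In CPython, threads mainly illustrate the structure; for CPU-bound updates a process pool or a cluster would be needed to see a speed-up.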
Optimization of metric learning involves heavy matrix operations, such as projection onto the positive semi-definite cone, and GPU computing is well suited to this kind of matrix calculation.
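To make the cost of that step concrete, the following is a minimal sketch of the standard positive semi-definite projection: symmetrize the matrix, take its eigendecomposition, and clip negative eigenvalues to zero. It uses NumPy on the CPU; the function name `project_psd` is an assumption introduced for illustration and is not taken from the LMNN solver used in this work.

```python
import numpy as np


def project_psd(M):
    """Project a square matrix onto the cone of symmetric PSD matrices.

    This is the nearest PSD matrix in Frobenius norm to the symmetric
    part of M.
    """
    M = np.asarray(M, dtype=float)
    S = (M + M.T) / 2.0                      # symmetric part of M
    eigvals, eigvecs = np.linalg.eigh(S)     # eigendecomposition of S
    eigvals = np.clip(eigvals, 0.0, None)    # drop negative eigenvalues
    # Rebuild V * diag(eigvals) * V^T; broadcasting scales column j by eigvals[j].
    return (eigvecs * eigvals) @ eigvecs.T


# Example: this matrix has eigenvalues 3 and -1, so it is not PSD.
M = np.array([[1.0, 2.0],
              [2.0, 1.0]])
P = project_psd(M)   # approximately [[1.5, 1.5], [1.5, 1.5]]
```

The eigendecomposition dominates the cost, at roughly cubic time in the feature dimension, and it is repeated at every projected gradient step. This is the kind of dense linear algebra that GPU libraries accelerate well.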