The data shaping technique proposed in Chapter Three works on encoded data: both the queries and the input documents must be encoded. This encoding incurs a finite overhead that can affect performance. Two approaches are possible. One is to reduce the overhead of the data encoding process. The other is to avoid data encoding and use the input as it is. Avoiding data encoding has a disadvantage: the working-memory footprint, or working set size (WSS), of the application is likely to increase, which may affect performance when using GPUs.
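As a rough illustration of the footprint trade-off, the sketch below compares a sequence of raw element names with the same sequence dictionary-encoded into 16-bit IDs. The dictionary encoding, the sample tag names, and the helpers `encode` and `decode` are assumptions made for this illustration only; they are not the encoding scheme of Chapter Three.

```python
import struct


def encode(tokens):
    """Dictionary-encode tokens: each distinct token gets the next small integer ID."""
    ids = {}
    encoded = [ids.setdefault(tok, len(ids)) for tok in tokens]
    return encoded, ids


def decode(encoded, ids):
    """Invert the dictionary to recover the original tokens."""
    names = {i: tok for tok, i in ids.items()}
    return [names[i] for i in encoded]


# Hypothetical element names, as they might appear along document paths.
tokens = ["catalog", "book", "title",
          "catalog", "book", "author",
          "catalog", "book", "price"]

encoded, ids = encode(tokens)

# Footprint of the unencoded input: UTF-8 bytes of every token occurrence.
raw_bytes = sum(len(tok.encode("utf-8")) for tok in tokens)

# Footprint of the encoded input: one little-endian 16-bit ID per occurrence.
packed = struct.pack(f"<{len(encoded)}H", *encoded)

print("raw bytes:", raw_bytes, "encoded bytes:", len(packed))
```

The encoded form pays a one-time cost for building and storing the dictionary, which corresponds to the encoding overhead discussed above. The unencoded form avoids that cost but carries the longer strings through every later processing step.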

The study in Chapter Four suggests several directions for research. One direction is to explore the design space for the record linkage problem in the unstructured data domain, building on the research discussed in that chapter. Another direction is to explore the applicability of similar hash-based algorithms for efficient mining of linked records in semistructured and unstructured data sets on parallel processor architectures.

The research in Chapter Five indicates that changes to the structure of trees, for example the orientation of a subtree, can result in different computation patterns. This idea can be used to design efficient techniques for tree matching problems in the context of unordered trees and for problems involving tree rotations, which are common in bioinformatics.

Deduplication with variable-sized chunking (on the CPU) incurs a finite overhead that could become a performance bottleneck. The research in Chapter Six suggests a research direction on how to increase the throughput of the deduplication process. To achieve high-throughput deduplication, for example wire-rate throughput, we need to address the overhead associated with variable-sized chunking. This can be done by designing faster variable-sized chunking algorithms. Another possible approach is to forgo variable-sized chunking and use fixed-sized chunking instead. Fixed-sized chunking has one drawback: it reduces deduplication gains. To mitigate the deduplication losses incurred by fixed-sized chunking, we can use smaller chunk sizes.
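The trade-off between the two chunking strategies can be sketched on synthetic data. The example below is a minimal illustration under stated assumptions, not the implementation evaluated in Chapter Six: the Gear-style boundary hash, the chunk sizes, the helper names, and the random test data are all chosen here for illustration. It measures how many bytes remain after duplicate chunks are removed in two cases: a one-byte insertion between two copies of a block, and a one-byte overwrite inside the second copy.

```python
import hashlib
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
GEAR = [rng.getrandbits(32) for _ in range(256)]  # per-byte values for the hash


def fixed_chunks(data, size):
    """Split data into consecutive chunks of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, mask=0x3FF):
    """Content-defined chunking: cut where the low bits of a Gear-style hash are zero.

    With a 10-bit mask the expected chunk size is about 1 KiB on random data.
    No minimum or maximum chunk size is enforced in this sketch.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & mask == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def stored_bytes(chunks):
    """Bytes kept after deduplication: each distinct chunk is stored once."""
    unique = {hashlib.sha256(c).digest(): len(c) for c in chunks}
    return sum(unique.values())


block = bytes(rng.getrandbits(8) for _ in range(32 * 1024))

# Case 1: one byte inserted between two copies shifts every later boundary
# for fixed-sized chunking, while content-defined boundaries resynchronize.
shifted = block + b"!" + block
fixed_shifted = stored_bytes(fixed_chunks(shifted, 1024))
variable_shifted = stored_bytes(variable_chunks(shifted))

# Case 2: one byte overwritten in place. Fixed-sized chunking stays aligned,
# and only the chunk containing the change has to be stored again.
edited = bytearray(block)
edited[5000] ^= 0xFF
overwritten = block + bytes(edited)
large = stored_bytes(fixed_chunks(overwritten, 4096))
small = stored_bytes(fixed_chunks(overwritten, 256))

print("insertion:  fixed", fixed_shifted, "variable", variable_shifted)
print("overwrite:  4096-byte chunks", large, "256-byte chunks", small)
```

In the overwrite case the stored size grows by exactly one chunk, so smaller fixed-sized chunks lose less. In the insertion case, shrinking the fixed chunk size does not restore alignment, which is where the loss in deduplication gains comes from. The per-byte hashing loop in `variable_chunks` is also the kind of work that fixed-sized chunking avoids.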
