IV. Capítulo
4. Conclusiones de la investigación
While two aspects spatial data processing (theory and HPC system design) are well
studied in this research and we made remarkable contribution to the domain by exploring
linear search space reduction filters and developing distributed heterogeneous (CPU+GPU)
computing systems, there are still many unexplored tracks in this field. In the last part of
those challenges interesting and try to propose efficient methods to address them.
• Scalability: Our model with PL load balancing algorithm shows a very good scalability. However, in order to process large spatial data (>100GB), we should have appropriate
heterogeneous cluster equipped with enough the CPU-GPU resources. Also, it is im-
portant to have access to the sate of art GPUs since there are significant improvements
in their computational power as well as their memory limitations. for instance, nvidia
P100GP U with P ascal architecture has 16GB of memory that is almost the same as
using all four K20 GPUs with Kepler architecture yet not considering the faster Tesla
architecture.
• CMBR-based filters: theCMBR-based filter can be generalized to many spatial search problem including the one discussed in Chapter 5 regarding colocation mining problem.
• More spatial join operations: In this research, we did not our libraries to handle all the spatial join operations. There are still some important operations such as overlay
that need to be addressed in terms of heterogeneous computing.
• Spatiotemporal join problem: One significantly important aspect of spatial data pro- cessing is adding the time series to the problem. Problem formulations and data
structures are fundamentally different and more complex in spatiotemporal domains
REFERENCES
[1] S. K. Prasad, D. Aghajarian, M. McDermott, D. Shah, M. Mokbel, S. Puri, S. J. Rey,
S. Shekhar, Y. Xe, R. R. Vatsavai et al., “Parallel processing over spatial-temporal
datasets from geo, bio, climate and social science communities: A research roadmap,”
in Big Data (BigData Congress), 2017 IEEE International Congress on. IEEE, 2017,
pp. 232–250.
[2] S. K. Prasad, M. McDermott, S. Puri, D. Shah, D. Aghajarian, S. Shekhar, and X. Zhou,
“A vision for GPU-accelerated parallel computation on geo-spatial datasets,” SIGSPA-
TIAL Special, vol. 6, no. 3, pp. 19–26, 2015.
[3] “NASA earth science data: https://aws.amazon.com/blogs/aws/process-earth-science-
data-on-aws-with-nasa-nex/.”
[4] S. Puri and S. K. Prasad, “A parallel algorithm for clipping polygons with improved
bounds and a distributed overlay processing system using mpi,” in Cluster, Cloud and
Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on. IEEE,
2015, pp. 576–585.
[5] S. You, J. Zhang, and L. Gruenwald, “Large-scale spatial join query processing in cloud,”
in Data Engineering Workshops (ICDEW), 2015 31st IEEE International Conference
on. IEEE, 2015, pp. 34–41.
[6] S. Ray, B. Simion, A. D. Brown, and R. Johnson, “A parallel spatial data analysis in-
frastructure for the cloud,” inProceedings of the 21st ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems. ACM, 2013, pp. 284–293.
[7] S. Puri, D. Aghajarian, and S. Prasad, “MPI-GIS : High Performance Computing and
IO for Spatial Overlay and Join,” The International Conference for High Performance
[8] A. Eldawy and M. F. Mokbel, “A demonstration of spatialhadoop: An efficient mapre-
duce framework for spatial data,” Proceedings of the VLDB Endowment, vol. 6, no. 12,
pp. 1230–1233, 2013.
[9] S. K. Prasad, M. McDermott, X. He, and S. Puri, “Gpu-based parallel r-tree con-
struction and querying,” in Parallel and Distributed Processing Symposium Workshop
(IPDPSW), 2015 IEEE International. IEEE, 2015, pp. 618–627.
[10] E. H. Jacox and H. Samet, “Spatial join techniques,” ACM Transactions on Database
Systems (TODS), vol. 32, no. 1, p. 7, 2007.
[11] PostGIS, “http://postgis.net/.”
[12] D. H. Jonassen, “Instructional design models for well-structured and iii-structured
problem-solving learning outcomes,” Educational Technology Research and Develop-
ment, vol. 45, no. 1, pp. 65–94, 1997.
[13] J. F. Voss, “On the solving of iii-structured problems,” The nature of expertise, p. 261,
2014.
[14] A. Aji, G. Teodoro, and F. Wang, “Haggis: Turbocharge a mapreduce based spatial data
warehousing system with gpu engine,” in Proceedings of the 3rd ACM SIGSPATIAL
International Workshop on Analytics for Big Geospatial Data. ACM, 2014, pp. 15–20.
[15] Y. J. Garc´ıa, M. A. Lopez, and S. T. Leutenegger, “On optimal node splitting for r-
trees,” in Proceedings of the 24rd International Conference on Very Large Data Bases.
Morgan Kaufmann Publishers Inc., 1998, pp. 334–344.
[16] J. Zhang and S. You, “Cudagis: report on the design and realization of a massive data
parallel gis on gpus,” in Proceedings of the Third ACM SIGSPATIAL International
Workshop on GeoStreaming. ACM, 2012, pp. 101–108.
[17] M. Schneider, “Spatial data types for database systems(finite resolution geometry for
[18] ——, “Spatial plateau algebra for implementing fuzzy spatial objects in databases and
gis: Spatial plateau data types and operations,” Applied Soft Computing, vol. 16, pp.
148–170, 2014.
[19] G. S. Taylor, J. Ousterhout, G. Hamachi, R. Mayo, and W. Scott, “Magic: A vlsi layout
system,” in Proceedings of the 21th Design Automation Conference, 1984, pp. 152–159.
[20] S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu, and C.-t. Lu, “Spatial databases-
accomplishments and research needs,” IEEE transactions on knowledge and data engi-
neering, vol. 11, no. 1, pp. 45–55, 1999.
[21] H. Samet,Foundations of multidimensional and metric data structures. Morgan Kauf-
mann, 2006.
[22] H. Zhu, J. Su, and O. H. Ibarra, “On multi-way spatial joins with direction predicates,”
in International Symposium on Spatial and Temporal Databases. Springer, 2001, pp.
217–235.
[23] H. Samet, “Applications of spatial data structures,” 1990.
[24] S. Berchtold, D. Keim, and H. Kriegel, “An index structure for high-dimensional data,”
Readings in multimedia computing and networking, vol. 451, 2001.
[25] N. Roussopoulos and D. Leifker, “Direct spatial search on pictorial databases using
packed r-trees,” in ACM Sigmod Record, vol. 14, no. 4. ACM, 1985, pp. 17–31.
[26] J. T. Robinson, “The kdb-tree: a search structure for large multidimensional dynamic
indexes,” in Proceedings of the 1981 ACM SIGMOD international conference on Man-
agement of data. ACM, 1981, pp. 10–18.
[27] W. G. Aref and H. Samet, “Efficient processing of window queries in the pyramid data
structure,” in Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium
[28] H. Samet and R. E. Webber, “Storing a collection of polygons using quadtrees,” ACM
Transactions on Graphics (TOG), vol. 4, no. 3, pp. 182–222, 1985.
[29] A. Guttman, R-trees: a dynamic index structure for spatial searching. ACM, 1984,
vol. 14, no. 2.
[30] D. Comer, “Ubiquitous b-tree,” ACM Computing Surveys (CSUR), vol. 11, no. 2, pp.
121–137, 1979.
[31] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The r*-tree: an efficient and
robust access method for points and rectangles,” in Acm Sigmod Record, vol. 19, no. 2.
Acm, 1990, pp. 322–331.
[32] T. Sellis, N. Roussopoulos, and C. Faloutsos, “The r+-tree: A dynamic index for multi-
dimensional objects.” Tech. Rep., 1987.
[33] H. Samet, “The quadtree and related hierarchical data structures,” ACM Computing
Surveys (CSUR), vol. 16, no. 2, pp. 187–260, 1984.
[34] R. A. Finkel and J. L. Bentley, “Quad trees a data structure for retrieval on composite
keys,” Acta informatica, vol. 4, no. 1, pp. 1–9, 1974.
[35] R. K. V. Kothuri, S. Ravada, and D. Abugov, “Quadtree and r-tree indexes in ora-
cle spatial: a comparison using gis data,” in Proceedings of the 2002 ACM SIGMOD
international conference on Management of data. ACM, 2002, pp. 546–557.
[36] W. R. Franklin, Combinatorics of hidden surface algorithms. Center for Research in
Computing Techn., Aiken Computation Laboratory, Univ., 1978.
[37] A. Appel, F. J. Rohlf, and A. J. Stein,The haloed line effect for hidden line elimination.
ACM, 1979, vol. 13, no. 2.
[38] W. R. Franklin, “Efficient polyhedron intersection and union,” inProc. Graphics Inter-
[39] W. R. Franklin and P. Y. Wu, “A polygon overlay system in prolog,” in Proceedings,
AutoCarto, vol. 8, 1987, pp. 97–106.
[40] W. R. Franklin, C. Narayanaswami, M. Kankanhalli, D. Sun, M.-C. Zhou, and P. Y. Wu,
“Uniform grids: A technique for intersection detection on serial and parallel machines,”
in Proceedings of Auto Carto, vol. 9. Citeseer, 1989, pp. 100–109.
[41] W. R. Franklin, N. Chandrasekhar, M. Kankanhalli, M. Seshan, and V. Akman, “Ef-
ficiency of uniform grids for intersection detection on serial and parallel machines,” in
New Trends in Computer Graphics. Springer, 1988, pp. 288–297.
[42] C. B. Walton, A. G. Dale, and R. M. Jenevein, “A taxonomy and performance model
of data skew effects in parallel joins.” in VLDB, vol. 91, 1991, pp. 537–548.
[43] J. M. Patel and D. J. DeWitt, “Partition based spatial-merge join,” inACM SIGMOD
Record, vol. 25, no. 2. ACM, 1996, pp. 259–270.
[44] S. Audet, C. Albertsson, M. Murase, and A. Asahara, “Robust and efficient polygon
overlay on parallel stream processors,” in Proceedings of the 21st ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems. ACM, 2013,
pp. 304–313.
[45] W. R. Franklin, V. Sivaswami, D. Sun, M. Kankanhalli, and C. Narayanaswami, “Cal-
culating the area of overlaid polygons without constructing the overlay,” Cartography
and Geographic Information Systems, vol. 21, no. 2, pp. 81–89, 1994.
[46] L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter, “Scalable sweeping-
based spatial join,” in VLDB, vol. 98. Citeseer, 1998, pp. 570–581.
[47] S. T. Leutenegger, M. Lopez, J. Edgington et al., “Str: A simple and efficient algo-
rithm for r-tree packing,” in Data Engineering, 1997. Proceedings. 13th International
[48] M. Pavlovic, F. Tauheed, T. Heinis, and A. Ailamakit, “Gipsy: joining spatial datasets
with contrasting density,” in Proceedings of the 25th International Conference on Sci-
entific and Statistical Database Management. ACM, 2013, p. 11.
[49] X. Zhou, D. J. Abel, and D. Truffet, “Data partitioning for parallel spatial join process-
ing,” Geoinformatica, vol. 2, no. 2, pp. 175–204, 1998.
[50] S. You, J. Zhang, and L. Gruenwald, “Parallel spatial query processing on gpus us-
ing r-trees,” in Proceedings of the 2nd ACM SIGSPATIAL International Workshop on
Analytics for Big Geospatial Data. ACM, 2013, pp. 23–31.
[51] D. Aghajarian, S. Puri, and S. K. Prasad, “GCMF: An efficient end-to-end spatial join
system over large polygonal datasets on gpgpu platform,” SIGSPATIAL, 2016.
[52] D. Aghajarian and S. K. Prasad, “A spatial join algorithm based on a non-uniform grid
technique over gpgpu,” in Proceedings of the 25th ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems. ACM, 2017, p. 56.
[53] T. Yampaka and P. Chongstitvatana, “Spatial join with r-tree on graphics processing
units,” KMUTNB: International Journal of Applied Science and Technology, vol. 5,
no. 3, pp. 1–7, 2013.
[54] L. Luo, M. D. Wong, and L. Leong, “Parallel implementation of r-trees on the gpu,” in
Design Automation Conference (ASP-DAC), 2012 17th Asia and South Pacific. IEEE,
2012, pp. 353–358.
[55] S. Puri and S. K. Prasad, “MPI-GIS: New parallel overlay algorithm and system pro-
totype,” 2014.
[56] S. Puri, “Efficient parallel and distributed algorithms for gis polygon overlay process-
ing,” 2015.
[57] S. Puri, D. Agarwal, and S. K. Prasad, “Polygonal overlay computation on cloud,
[58] A. Eldawy and M. F. Mokbel, “Spatialhadoop: A mapreduce framework for spatial
data,” in Data Engineering (ICDE), 2015 IEEE 31st International Conference on.
IEEE, 2015, pp. 1352–1363.
[59] F. Baig, M. Mehrotra, H. Vo, F. Wang, J. Saltz, and T. Kurc, “Sparkgis: Efficient
comparison and evaluation of algorithm results in tissue image analysis studies,” in
Biomedical Data Management and Graph Online Querying. Springer, 2015, pp. 134–
146.
[60] F. Baig, H. Vo, T. Kurc, J. Saltz, and F. Wang, “Sparkgis: Resource aware efficient
in-memory spatial query processing,” in Proceedings of the 25th ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems. ACM,
2017, p. 28.
[61] D. Agarwal, S. Puri, X. He, S. K. Prasad, and X. Shi, “Crayons-a cloud based parallel
framework for gis overlay operations,” Distributed & Mobile Systems Lab, 2011.
[62] D. Agarwal, S. Puri, X. He, and S. K. Prasad, “A system for gis polygonal overlay
computation on linux cluster-an experience and performance report,” in Parallel and
Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE
26th International. IEEE, 2012, pp. 1433–1439.
[63] D. Agarwal, S. Puri, and S. K. Prasad, “Crayons: Empowering cybergis by employing
cloud infrastructure,” inCyberGIS for Geospatial Discovery and Innovation. Springer,
2019, pp. 115–141.
[64] M. Stonebraker, J. Frew, K. Gardels, and J. Meredith, “The sequoia 2000 storage
benchmark,” inACM SIGMOD Record, vol. 22, no. 2. ACM, 1993, pp. 2–11.
[65] B. Simion, S. Ray, and A. D. Brown, “Speeding up spatial database query execution
[66] J. Zhang, S. You, and L. Gruenwald, “High-performance spatial query processing on big
taxi trip data using gpgpus,” inBig Data (BigData Congress), 2014 IEEE International
Congress on. IEEE, 2014, pp. 72–79.
[67] S. You, J. Zhang, and L. Gruenwald, “Scalable and efficient spatial data management
on multi-core cpu and gpu clusters: A preliminary implementation based on impala,”
in Data Engineering Workshops (ICDEW), 2015 31st IEEE International Conference
on. IEEE, 2015, pp. 143–148.
[68] A. N. Tantawi and D. Towsley, “Optimal static load balancing in distributed computer
systems,” Journal of the ACM (JACM), vol. 32, no. 2, pp. 445–465, 1985.
[69] A. Legrand, H. Renard, Y. Robert, and F. Vivien, “Mapping and load-balancing iter-
ative computations,” IEEE Transactions on Parallel and Distributed Systems, vol. 15,
no. 6, pp. 546–558, 2004.
[70] G. Cybenko, “Dynamic load balancing for distributed memory multiprocessors,” Jour-
nal of parallel and distributed computing, vol. 7, no. 2, pp. 279–301, 1989.
[71] G.-w. You, S.-w. Hwang, and N. Jain, “Scalable load balancing in cluster storage sys-
tems,” in Proceedings of the 12th International Middleware Conference. International
Federation for Information Processing, 2011, pp. 100–119.
[72] H. J. Moon, H. Hacıg¨um¨u¸s, Y. Chi, and W.-P. Hsiung, “Swat: a lightweight load
balancing method for multitenant databases,” in Proceedings of the 16th International
Conference on Extending Database Technology. ACM, 2013, pp. 65–76.
[73] M. D. Lieberman, J. Sankaranarayanan, and H. Samet, “A fast similarity join algorithm
using graphics processing units,” in Data Engineering, 2008. ICDE 2008. IEEE 24th
International Conference on. IEEE, 2008, pp. 1111–1120.
tems,” in Proceedings of the 17th ACM SIGSPATIAL International Conference on Ad-
vances in Geographic Information Systems. ACM, 2009, pp. 392–395.
[75] M. Shimrat, “Algorithm 112: position of point relative to polygon,” Communications
of the ACM, vol. 5, no. 8, p. 434, 1962.
[76] J. Zhang, S. You, and L. Gruenwald, “High-performance online spatial and temporal
aggregations on multi-core cpus and many-core gpus,” in Proceedings of the fifteenth
international workshop on Data warehousing and OLAP. ACM, 2012, pp. 89–96.
[77] M. Aigner and G. M. Ziegler, Proofs from the Book. Springer, 2010, vol. 274.
[78] “XSEDE resource in Pittsburgh Supercomputing Center:
https://www.psc.edu/homepage/about-psc.”
[79] Y. Huang, S. Shekhar, and H. Xiong, “Discovering colocation patterns from spatial
data sets: a general approach,”IEEE Transactions on Knowledge and data engineering,
vol. 16, no. 12, pp. 1472–1485, 2004.
[80] P. Phillips and I. Lee, “Crime analysis through spatial areal aggregated density pat-
terns,” Geoinformatica, vol. 15, no. 1, pp. 49–74, 2011.
[81] S. E. Donovan, G. J. Griffiths, R. Homathevi, and L. Winder, “The spatial pattern
of soil-dwelling termites in primary and logged forest in sabah, malaysia,” Ecological
Entomology, vol. 32, no. 1, pp. 1–10, 2007.
[82] P. Haase, “Spatial pattern analysis in ecology based on ripley’s k-function: Introduction
and methods of edge correction,” Journal of vegetation science, vol. 6, no. 4, pp. 575–
582, 1995.
[83] A. Sadilek, H. A. Kautz, and V. Silenzio, “Predicting disease transmission from geo-
[84] G. M. Vazquez-Prokopec, D. Bisanzio, S. T. Stoddard, V. Paz-Soldan, A. C. Morrison,
J. P. Elder, J. Ramirez-Paredes, E. S. Halsey, T. J. Kochel, T. W. Scott et al., “Using
gps technology to quantify human mobility, dynamic contacts and infectious disease
dynamics in a resource-poor urban environment,” PloS one, vol. 8, no. 4, p. e58802,
2013.
[85] L.-A. Tang, Y. Zheng, J. Yuan, J. Han, A. Leung, W.-C. Peng, and T. L. Porta,
“A framework of traveling companion discovery on trajectory data streams,” ACM
Transactions on Intelligent Systems and Technology (TIST), vol. 5, no. 1, p. 3, 2013.
[86] K. Koperski and J. Han, “Discovery of spatial association rules in geographic informa-
tion databases,” inInternational Symposium on Spatial Databases. Springer, 1995, pp.
47–66.
[87] Y. Morimoto, “Mining frequent neighboring class sets in spatial databases,” inProceed-
ings of the seventh ACM SIGKDD international conference on Knowledge discovery and
data mining. ACM, 2001, pp. 353–358.
[88] A. M. Sainju and Z. Jiang, “Grid-based colocation mining algorithms on gpu for big
spatial event data: A summary of results,” in International Symposium on Spatial and
Temporal Databases. Springer, 2017, pp. 263–280.
[89] P. NEUBAUER, “Neo4j spatial-gis for the rest of us,” 2012.
[90] T. P luciennik and E. P luciennik-Psota, “Using graph database in spatial data genera-
tion,” in Man-Machine Interactions 3. Springer, 2014, pp. 643–650.
[91] P. Corbett, D. Feitelson, S. Fineberg, Y. Hsu, B. Nitzberg, J.-P. Prost, M. Snirt,
B. Traversat, and P. Wong, “Overview of the mpi-io parallel i/o interface,” inInput/Out-
put in Parallel and Distributed Computer Systems. Springer, 1996, pp. 127–146.
[92] W. Yu, S. Oral, J. Vetter, and R. Barrett, “Efficiency evaluation of cray xt parallel io
[93] N. Deo and S. Prasad, “Parallel heap,” in ICPP (3), 1990, pp. 169–172.
[94] ——, “Parallel heap: An optimal parallel priority queue,”The Journal of Supercomput-
ing, vol. 6, no. 1, pp. 87–98, 1992.
[95] S. K. Prasad and S. I. Sawant, “Parallel heap: A practical priority queue for fine-
to-medium-grained applications on small multiprocessors,” in Parallel and Distributed
Processing, 1995. Proceedings. Seventh IEEE Symposium on. IEEE, 1995, pp. 328–335.
[96] X. He, D. Agarwal, and S. K. Prasad, “Design and implementation of a parallel priority
queue on many-core architectures,” inHigh Performance Computing (HiPC), 2012 19th
International Conference on. IEEE, 2012, pp. 1–10.
[97] A. Eldawy and M. F. Mokbel, “SpatialHadoop: A MapReduce Framework for Spatial
Data,” in 31st IEEE International Conference on Data Engineering, ICDE 2015,
Seoul, South Korea, April 13-17, 2015, 2015, pp. 1352–1363. [Online]. Available: