
Universidade da Coruña
Departamento de Computación

Dualgrid: A closed representation space for consistent spatial databases

Doctoral Thesis
A Coruña, September 2012

Author: José Antonio Cotelo Lema
Advisors: Dr. Miguel Ángel Rodríguez Luaces, Prof. Dr. Ralf Hartmut Güting


PhD Thesis directed by:

Dr. Miguel Ángel Rodríguez Luaces
Departamento de Computación, Facultade de Informática
Universidade da Coruña
15071 A Coruña (Spain)
Tel: +34 981 167000 ext. 1254
Fax: +34 981 167160

Prof. Dr. Ralf Hartmut Güting
LG Datenbanksysteme für neue Anwendungen
FernUniversität in Hagen
Postfach 940, D-58084 Hagen, Germany
Tel: +49 (2331) 987 4279
Fax: +49 (2331) 987 4278

To Noelia, Rubén and Irene.

Abstract

In the past decades, much effort has been devoted to the integration of spatial information within more traditional information systems. To support such integration, spatial data representation technology has been intensively improved, from conceptual and discrete models for data representation and query languages, to indexing and visualization technologies and interoperability standards. As a result of all these efforts, Geographic Information Systems (GIS) are nowadays a widely used technology.

Existing spatial database technology provides standardized data models and operations [OGC06], based on conceptually solid spatial algebras. However, translating such conceptual models into physical models suitable for implementation on computers, where only finite precision representations of space can be used, is a difficult task. As a result, current implementations of physical models are generally severely limited when compared to their conceptual counterparts. They attempt to provide an implementation fulfilling the original conceptual algebras, but at the physical level they can no longer ignore the problems of robustness and topological correctness arising from the use of finite precision numbers for representing spatial coordinates. This results in deceptive physical algebra implementations, because they break most of the properties of the conceptual algebra that they rely on. More specifically, the physical models do not remain closed under the data types and operations of the algebra, and the solutions applied to address this problem, usually some kind of approximated result, do not fulfill the properties expected from the affected operation. The consequence is that the physical models fail to provide consistent implementations of the spatial operations. This makes the development of applications that rely on the properties of the conceptual model (e.g., spatial analysis applications) much more complex, if not impossible. Moreover, even the implementation of the physical model itself becomes more complex, as it can no longer rely on the theoretical basis of the conceptual model it is supposed to implement.

The main goal of this research work is to provide a framework for developing spatial database extensions capable of fulfilling the key properties of the conceptual spatial algebra they implement. At the same time, the proposed framework meets the constraints imposed by today's real-world GIS applications in terms of performance and resource requirements, as well as interoperability with existing applications and standards. To achieve this goal, we first analyze the current state of the art in spatial information representation. The main focus is on the way the different approaches deal with the limitations imposed by computers and the effects that these solutions have on the properties of the conceptual model they intend to implement. Second, we study the sources of these problems and propose a well-grounded physical model framework (called Dualgrid) to guarantee that the implementations of spatial algebras keep their key properties from the perspective of the user application. We also provide an example of such an implementation and experimental results on how such a framework solves the consistency and even the implementation problems of an existing and widely used spatial database extension. Third, we revisit our framework to extend its properties (DualgridFF) so that it is able to meet the additional restrictions imposed by current spatial applications, tools and interoperability standards (OGC).

Resumen (Spanish abstract)

In recent decades, significant effort has been devoted to integrating Geographic Information System (GIS) technologies with more traditional information systems. To support this integration, spatial data representation technology has been improved in many respects, from the (conceptual and discrete) data representation models and query languages to indexing and visualization technologies and interoperability standards. As a result of these efforts, GIS technology is now widely used in all kinds of applications.

Current spatial database technologies offer standardized data models and operations [OGC06], inspired by spatial algebras with solid conceptual foundations. In contrast, existing implementations suffer from severe limitations (compared with the conceptual models they intend to support), resulting from the inherent difficulty of translating those conceptual models into physical models that can be implemented on computers, where finite precision representation spaces must be used. Despite the effort to offer implementations that comply with the original conceptual algebra, the robustness and topological correctness problems arising from the use of finite precision numbers to represent spatial coordinates can no longer be ignored at the physical level. The result is implementations that comply with the original conceptual algebras only in appearance, while in fact violating most of the properties on which those algebras are based. More specifically, the physical models do not remain closed under the set of implemented data types and operations, and the solutions applied to work around this, usually some kind of approximate result, do not fulfill the properties expected of the operation in question. Consequently, the resulting physical model cannot offer a consistent implementation of the spatial operations provided to users. As a result, developing applications based on the properties of the conceptual model (for example, spatial analysis applications) becomes much harder, if not impossible. In fact, even the implementation of the physical model itself becomes much more complex, since it cannot rely even on the theoretical foundations of the conceptual model it is supposed to implement.

The main goal of this thesis is to lay the foundations for developing spatial database extensions capable of fulfilling the key properties of the conceptual spatial algebra on which they are based, while also taking into account the constraints imposed by current real-world GIS applications in terms of performance, resource consumption, and interoperability with existing applications and standards. To achieve this goal, we first analyze the current state of the art in spatial information representation, paying special attention to the limitations imposed by computers and to the effects that the resulting solutions have on the (non-)fulfillment of the properties of the conceptual model. Second, we study the roots of these problems and propose a theoretical framework for the design of physical models (Dualgrid) which guarantees that implementations of spatial algebras based on it keep the key properties from the point of view of user applications. As a proof of concept, we show an example of an implementation based on Dualgrid, together with experimental results showing how its use solves the consistency and (even) implementation problems of a widely used spatial database extension. Third, we revisit the model to extend its properties (DualgridFF) so that it can meet the additional constraints (in terms of performance, storage space and interoperability) imposed by existing applications, GIS technologies and interoperability standards (OGC).

Resumo (Galician abstract)

In recent decades, significant effort has been devoted to integrating Geographic Information System (GIS) technologies with more traditional information systems. To support this integration, spatial data representation technology has been improved in numerous respects, from the (conceptual and discrete) data representation models and query languages to indexing and visualization technologies and interoperability standards. As a result of these efforts, GIS technologies are now widely used in all kinds of applications.

Current spatial database technologies offer standardized data models and operations [OGC-SFS], inspired by spatial algebras with solid conceptual foundations. By contrast, existing implementations suffer from severe limitations (compared with the conceptual models they intend to support), resulting from the inherent difficulty of translating those conceptual models into physical models that can be implemented on computers, where finite precision representation spaces must be used. Despite the efforts to offer implementations that comply with the original conceptual algebras, the robustness and topological correctness problems arising from the use of finite precision numbers to represent spatial coordinates can no longer be ignored at the physical level. The result is implementations that comply with the original conceptual algebras only in appearance, while in fact violating most of the properties on which those algebras are based. More specifically, the physical models do not remain closed under the set of implemented data types and operations, and the solutions applied to work around this, usually some kind of approximate result, do not fulfill the properties expected of the operation in question. Consequently, the resulting physical model cannot offer a consistent implementation of the spatial operations provided to users. As a result, developing applications based on the properties of the conceptual model (for example, spatial analysis applications) becomes much harder, if not impossible. In fact, even the implementation of the physical model itself becomes much more complex, since it cannot rely even on the theoretical foundations of the conceptual model it is supposed to implement.

The main goal of this thesis is to lay the foundations for developing spatial database extensions capable of fulfilling the key properties of the conceptual spatial algebra on which they are based, while also taking into account the constraints imposed by current real-world GIS applications in terms of performance, resource consumption, and interoperability with existing applications and standards. To achieve this goal, we first analyze the current state of the art in spatial information representation, paying special attention to the limitations imposed by computers and to the effects that the resulting solutions have on the (non-)fulfillment of the properties of the conceptual model. Second, we study the roots of these problems and propose a theoretical framework for the design of physical models (Dualgrid) which guarantees that implementations of spatial algebras based on it keep the key properties from the point of view of user applications. As a proof of concept, we show an example of an implementation based on Dualgrid, together with experimental results showing how its use solves the consistency and (even) implementation problems of a widely used spatial database extension. Third, we revisit the model to extend its properties (DualgridFF) so that it can meet the additional constraints (in terms of performance, storage space and interoperability) imposed by existing applications, GIS technologies and interoperability standards (OGC).

Acknowledgements

I would like to thank all those who, directly or indirectly, have helped this thesis come to be written. I am especially grateful to Ralf Hartmut Güting, Nieves Rodríguez Brisaboa, Miguel Rodríguez Luaces, Roberto Creo Hombre, Miguel Rodríguez Penabad and my family. I would also like to thank the Databases Lab at the University of A Coruña, the Datenbanksysteme für neue Anwendungen group at FernUniversität Hagen and the CHOROCHRONOS project. Had they not existed, this thesis would not have existed either.

Agradecimientos (Spanish acknowledgements)

My thanks go to all those who, directly or indirectly, have helped this thesis come to be written. In particular, to Ralf Hartmut Güting, Nieves Rodríguez Brisaboa, Miguel Rodríguez Luaces, Roberto Creo Hombre, Miguel Rodríguez Penabad and my family. Likewise, to the Databases Laboratory of the University of A Coruña, the Datenbanksysteme für neue Anwendungen group at FernUniversität Hagen and the CHOROCHRONOS project. Had they not existed, this thesis would not have existed either.

Contents

1. Introduction
   1.1 Background and motivation
   1.2 Goals
   1.3 Scope and relevance
   1.4 Thesis outline

2. State of the art
   2.1 Geographic Information Systems and Spatial Information Modeling
   2.2 Abstract spatial models
   2.3 Discrete spatial models
   2.4 Physical spatial models
       2.4.1 Commercial approaches
             2.4.1.1 PostGIS
             2.4.1.2 Oracle Spatial
             2.4.1.3 Microsoft SQL Server
       2.4.2 The ROSE Algebra: Realms
   2.5 Analysis and conclusions

3. Consistency of spatial operations
   3.1 Understanding the relevance of consistency in the development of applications
   3.2 Approaches to deal with inconsistency in vectorial spatial data models
       3.2.1 Restriction of operations
       3.2.2 Restriction to orthogonal boundaries
       3.2.3 Approximated operations
       3.2.4 Exact representation
       3.2.5 Realms approach
   3.3 Comparison of consistency support in the spatial dimension
       3.3.1 Operations classification
       3.3.2 Consistency analysis
   3.4 Conclusions

4. Dualgrid
   4.1 Definition of Dualgrid
   4.2 Data importation and exportation
   4.3 Realms and the ROSE Algebra over Dualgrid
   4.4 PostGIS-GEOS over Dualgrid
   4.5 Rigorous spatial logics over Dualgrid
   4.6 Conclusions

5. Dualgrid for floats
   5.1 Original Dualgrid drawbacks
       5.1.1 Interoperability
       5.1.2 Performance
   5.2 Dualgrid For Floats
   5.3 Implementation issues
       5.3.1 Storage and performance cost of DualgridFF
       5.3.2 Performance improving tips
       5.3.3 Interoperability
   5.4 Conclusions

6. Conclusions and future research lines
   6.1 Summary of contributions
   6.2 Future work

A. Spatial inconsistencies example
   A.1 Intersection test in Oracle Spatial
       A.1.1 Oracle commands
   A.2 Intersection test in PostgreSQL/PostGIS
       A.2.1 PostgreSQL/PostGIS commands
   A.3 Intersection test in SQL Server
       A.3.1 SQL Server commands

B. Publications and other research achievements

C. Description of the presented work (Descripción del trabajo presentado)
   C.1 Introduction
   C.2 Methodology used
   C.3 Conclusions and contributions
   C.4 Future work

List of Figures

2.1 Examples of geographic data represented at two different scales.
2.2 Object represented using (a) raster, (b) vectorial or (c) constraint databases models.
2.3 Two segments with end-points having integer coordinates. Their intersection point has non-integer coordinates.
2.4 Example of objects over a realm. a) Elements in a realm. b) Some objects over the realm.
2.5 Redrawing of a segment S.
3.1 Example of errors in intersection operations due to space discretization and approximation.
4.1 Construction of a Realm with the arguments of a ROSE operation. a) Non realm-based arguments. b) Realm-based arguments.
4.2 Replacement of Realms by a preprocessing step before applying an operation.
5.1 Example of points belonging to GPF and GPI and segments belonging to GSF and GSI.
5.2 Examples of DualgridFF points and polylines.
A.1 Simple set-theory test.

List of Tables

3.1 Consistency properties of operations for each data model in the spatial domain.
4.1 Original PostGIS vs Dualgrid PostGIS.
4.2 Consistency properties of operations for each data model in the spatial domain.
5.1 Original PostGIS vs Dualgrid-PostGIS performance comparative.
5.2 Percentage of new points generated by spatial operations.
A.1 Table test_regions in Oracle after inserting both geometries and their intersection.
A.2 Oracle answers to the contains test.
A.3 Table test_regions in PostgreSQL/PostGIS after inserting both geometries and their intersection.
A.4 PostgreSQL/PostGIS answers to the contains test.
A.5 Table test_regions in Microsoft SQL Server after inserting both geometries and their intersection.
A.6 Microsoft SQL Server answers to the contains test.

List of Algorithms

5.1 Point-segment orientation test when P ∈ GPI.
5.2 Point_comparison algorithm when P1 and P2 are GPI points.

Chapter 1. Introduction

This thesis presents the research and development work performed in the area of spatial databases towards the definition of a physical data model that keeps the theoretical properties of conceptual models. To achieve this goal, we study the capability of existing spatial physical models to reliably carry all the semantics and properties of their original spatial conceptual models over to real-world applications and spatial database extensions. Then, we analyze the consequences of their failure to keep the conceptual model properties. After that, we propose a new perspective for developing physical representation models that succeed in carrying the key conceptual model properties over to the final implementations. Our physical model drastically eases the use of such spatial database implementations by GIS application developers. It also increases their ability to develop applications with spatial reasoning functionalities.

1.1. Background and motivation

There has been much research effort in the last two decades on representing, querying and exchanging spatial information in geographic information systems (GIS) and spatial database management systems (SDBMS). This effort led these technologies to evolve from the original ad-hoc, geographic-only applications (common in the early 90s) to today's standardized, interoperable and widely used technologies, which enable applications in different domains to integrate spatial and non-spatial information.

The improvement of the GIS field is mainly driven by advances in spatial information modeling and technologies, which enable database management systems and applications to take advantage of spatial information. The use of correct spatial information models is fundamental for the evolution of the field; therefore, an important research effort has been devoted to it.

The process of representing geographic information in a computer involves the definition of a set of data models to represent real-world geographic features using appropriate data structures. Typically, three different data models defined at three different abstraction levels are used: the abstract spatial model (also called conceptual model), the discrete spatial model (also called logical model) and the physical spatial model.

The abstract spatial model describes the geographic features of the real world, the relationships between different features and the operations that can be performed on each feature. It uses formal concepts defined without taking implementation details into consideration. An example of an abstract spatial model is the ISO 19107:2003 international standard [ISO03], which defines conceptual types (e.g., curve, surface) and operations (e.g., overlaps, touches) for geographic objects.

The discrete spatial model takes into account the limitations of a computer system (e.g., limited memory and computing performance) to define data types and algorithms that can be used to implement the concepts of an abstract data model. The Open Geospatial Consortium standard for Simple Features in SQL [OGC06] is a common example of a discrete spatial model for ISO 19107:2003. This model defines data types such as linestring and polygon and ensures that the data type specifications are appropriate to support the implementation of efficient algorithms (with regard to their algorithmic complexity) for the operations it defines.
Finally, a physical spatial model is a particular implementation of a logical model in a specific computing environment. For example, the implementation of OGC SFS in PostGIS [Ref10] for PostgreSQL using GEOS is a physical data model. A physical model also defines some aspects left open by discrete models, like the particular representation of spatial coordinates and the precision used for it.

Abstract spatial models define an algebra of spatial data types and operations with solid theoretical properties. They ensure that these operations are closed, that is, that the data types are powerful enough to represent any result of the spatial operations (e.g., an abstract spatial model must ensure that the result of the intersection of any two surface values can be represented using a surface value). Regarding the underlying geographic space, they assume a continuous space over ℝ³ [ISO03].

There has been much work over the last decades on the definition of abstract spatial models, and these definitions are so mature that international standards like the aforementioned ISO 19107:2003 have been defined and approved. However, the definition of discrete and physical spatial models that maintain the properties of an abstract spatial data model has proven to be a difficult problem.

Discrete spatial models must define data types using a finite number of components, and the designer must decide whether a computer-friendly representation is used or not. For example, a discrete data model may represent curves using a set of linear segments, surfaces using a set of segments to represent their boundary, and points with a pair of numeric coordinates. If the discrete spatial model continues to assume a continuous space, it can claim that the properties of the abstract spatial model are maintained, but the problem of representing the spatial coordinates in a computer (the discretization of the geographic space) is deferred to the physical spatial model.

However, if the problem is addressed at the discrete spatial model by using a finite space for coordinates (e.g., 64-bit integers or IEEE 754 double-precision floating-point numbers), then it is really difficult to maintain the properties of the abstract spatial model. For example, it is not possible to represent the point with coordinates (1/3, 2/3) using double-precision floating-point numbers. Furthermore, if two segments (e.g., S1 = ((0, 0), (1, 2)) and S2 = ((0, 1), (1, 0))) intersect at these coordinates, the point resulting from the intersection operation cannot be represented precisely, and subsequent predicates checking whether the point belongs to the original segments can return false.

Spatial database researchers have proposed several ways of addressing the space discretization problem while trying to comply with the original spatial model properties. Two such proposals are Realms [GS93] and the use of arbitrary precision rational numbers [WS05].
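The segment example above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the helper names `line_intersection` and `on_line` are hypothetical, exact arithmetic uses the standard `fractions.Fraction` type (in the spirit of the rational-number proposal, not the implementation of [WS05]), and the "stored" point is simply the exact result rounded to the nearest doubles.

```python
from fractions import Fraction

def line_intersection(s1, s2):
    """Exact intersection of the lines through s1 and s2 (assumed non-parallel)."""
    (x1, y1), (x2, y2) = [(Fraction(x), Fraction(y)) for x, y in s1]
    (x3, y3), (x4, y4) = [(Fraction(x), Fraction(y)) for x, y in s2]
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def on_line(p, seg):
    """Exact collinearity predicate: does p lie on the line through seg?"""
    (ax, ay), (bx, by) = seg
    # Fraction(float) converts the double to its exact rational value.
    px, py = Fraction(p[0]), Fraction(p[1])
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax) == 0

S1 = ((0, 0), (1, 2))
S2 = ((0, 1), (1, 0))

exact = line_intersection(S1, S2)            # rational coordinates
stored = (float(exact[0]), float(exact[1]))  # nearest double-precision values

print("exact point:", exact)
print("exact on S1, S2: ", on_line(exact, S1), on_line(exact, S2))
print("stored on S1, S2:", on_line(stored, S1), on_line(stored, S2))
```

With rational coordinates the intersection point satisfies the exact predicate for both segments. Once rounded to doubles, it no longer satisfies it for S2, because the two stored coordinates sum to 1 − 2⁻⁵⁴ rather than 1.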
In the Realms physical spatial model [GS93], each time a new spatial object is inserted into the database, all boundary intersections with the existing objects are detected. The objects involved are rewritten so that such intersection points are made explicit in their representation. This ensures that, even if an intersection point needs to be approximated in the underlying finite resolution space, all the objects involved are adjusted so that the point continues to be part of their boundary. The result is that, once the object is inserted in the database, all the spatial operations of the abstract spatial model implemented by their proposal (the ROSE algebra [GS95]) remain closed and mutually coherent. However, this coherence is only maintained as long as no new external objects are inserted (other than the ones resulting from spatial operations over the already existing objects). Therefore, consistency is not guaranteed between answers before and after an insertion. Moreover, its implementation in commercial¹ databases can be problematic, because existing spatial objects can get modified without user knowledge. In fact, the model implies that spatial object insertions made by a user will cause modifications in objects over which the user may have no update (or even read) privileges.

¹ In the following, we will use the term commercial to refer to final product implementations in contrast to research/experimental implementations, regardless of their business model (open source or proprietary software).

Some authors [WS05] propose the use of arbitrary precision rational numbers as the underlying coordinate space. Although this would guarantee that the properties of the original model are maintained, it imposes a heavy toll on performance and storage requirements [BF09]. Therefore, no (commercial) spatial technology or DBMS has adopted it.

Following a different approach, spatial database technology implementors (Geomedia, ESRI, PostGIS, etc.) have systematically assumed that space discretization problems are inherent to the discretization process and that adopting hardware-supported representations for coordinates (that is, integer or floating-point numbers) is a requirement. The first implementations of spatial DBMS usually removed any operation of the abstract spatial model that did not remain strictly closed under the coordinate space used in the discrete model. However, most customers preferred to have operations returning an approximation of the theoretical result (one that could be represented with the given data types) rather than not having the operation at all. Therefore, current spatial DBMS implement the full set of operations of the abstract models, using approximate versions of the problematic ones (e.g., an intersection operation that, to be fair, should be called approximate_intersection).

With the adoption of approximate operations, implementors have focused on solving their own implementation issues. For example, they usually ensure that the result is still a valid spatial value after the approximation, so it can be used as the input for another operator.
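The side effect of Realms-style insertion described earlier in this section (existing objects being rewritten when a new one arrives) can be sketched in a few lines of Python. This is a heavily simplified illustration, not the Realms algorithm of [GS93]: objects are polylines on an integer grid, intersection points are snapped to the nearest grid point, parallel segments are ignored, and no care is taken to preserve topology. The names `exact_intersection` and `insert_segment` are hypothetical.

```python
from fractions import Fraction

def exact_intersection(s1, s2):
    """Exact intersection point of two segments, or None if there is none."""
    (x1, y1), (x2, y2) = [(Fraction(x), Fraction(y)) for x, y in s1]
    (x3, y3), (x4, y4) = [(Fraction(x), Fraction(y)) for x, y in s2]
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if d == 0:
        return None  # parallel or collinear: ignored in this sketch
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / d
    py = (a * (y3 - y4) - (y1 - y2) * b) / d
    bounds = ((px, x1, x2), (py, y1, y2), (px, x3, x4), (py, y3, y4))
    inside = all(min(u, v) <= p <= max(u, v) for p, u, v in bounds)
    return (px, py) if inside else None

def insert_segment(db, name, seg):
    """Insert seg into db, making snapped intersection points explicit."""
    cuts = []  # (exact point, snapped grid point) pairs found along seg
    for other in db.values():
        i = 0
        while i < len(other) - 1:
            hit = exact_intersection((other[i], other[i + 1]), seg)
            if hit is not None:
                grid = (round(hit[0]), round(hit[1]))
                if grid not in (other[i], other[i + 1]):
                    other.insert(i + 1, grid)  # the EXISTING object is rewritten
                    i += 1
                cuts.append((hit, grid))
            i += 1
    # Order the cut points along the new segment, starting from its first endpoint.
    cuts.sort(key=lambda c: (c[0][0] - seg[0][0]) ** 2 + (c[0][1] - seg[0][1]) ** 2)
    line = [seg[0]]
    for _, grid in cuts:
        if grid != line[-1] and grid != seg[1]:
            line.append(grid)
    line.append(seg[1])
    db[name] = line

db = {"S1": [(0, 0), (4, 7)]}
insert_segment(db, "S2", ((0, 5), (5, 0)))
print(db)
```

In this run the exact intersection (20/11, 35/11) is snapped to the grid point (2, 3), which becomes a vertex of both polylines. The user only inserted S2, yet the stored representation of S1 changed as well, which is the privilege and consistency concern raised above.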
They provide exact spatial predicates (implemented with algorithms such as [ABD+97]) that guarantee that all precision issues are properly handled to provide the correct answer for all spatial operations returning boolean values. They also ensure that their implementation of more complex operations makes the correct decisions. But once they have addressed their own problems, the remaining ones are left to their users.

As shown, the development of physical spatial models that properly solve the space discretization problems in a way suitable to be implemented in commercial DBMS remains, despite the research efforts, unsolved. Moreover, the evolution of spatial technologies is not driven by a coordinated effort. Instead, the improvements in spatial technologies are being driven by different forces corresponding with the (usually not coincident) perspectives of four collectives: spatial database researchers, commercial spatial database technology² developers, GIS software developers, and GIS application users. In the following paragraphs we describe the points of view and motivations of each of these four collectives. The failure to reconcile the perspectives of these four collectives, combined with the existing open problems, undermines the evolution of the GIS field.

² We will use the term commercial spatial database technology to refer to Spatial DBMS as well as other technology for data management not directly related to databases, such as programming libraries and GIS development tools. They are grouped together because their developers share a common perspective with regard to spatial models that plays, in fact, a big role in their common priorities.

Spatial database researchers tend to focus on proposing well-defined and powerful theoretical models, without taking into account how well they can be adapted to commercial DBMS requirements.

Commercial spatial database technology developers focus on creating database management systems, libraries and tools. However, they are still not oriented to final users, but rather to software developers. Therefore, they need to achieve a tradeoff between a robust model implementation, appropriate performance, low storage requirements, and the power of their implementations. In addition to that, getting a solid implementation is a key requirement. Hence, if they can manage to get the problem sorted out well enough to get their code working without a noticeable penalty in the remaining aspects, it is enough for them. For the remaining part of the problem, they can try to pass it on to the next link in the chain: GIS software developers. Of course, GIS software developers would prefer solutions that solve the problem and spare them any headache, but that would only make a difference in the selection of spatial technologies if there were significant differences between solutions regarding this issue, and not just “different flavors of tricks”.

GIS software developers need to provide final users with the functionalities they require. They can hardly ignore the problem or move it to the user, because their software is the one expected to do spatial reasoning. For example, if a user just needs to know which pieces of land are adjacent to a given one, we could draw a map of the area around the given piece of land and let the user select the adjacent ones. This way, the application itself does not need to implement the detection method. But if the application itself needs such information to compute the result the user needs (for example, to select the pieces of land adjacent to the ones that meet some criteria), then it must find it out by itself, and cannot “pass the problem” on to the user. Therefore, GIS software developers would like to use spatial technologies that do exactly what is expected of them, and that comply, as much as possible, with the original abstract model (which was, in fact, designed with the functionality requirements of GIS software developers in mind). The less the spatial technologies comply with the abstract model, the bigger the problems developers will have when they try to get their applications working. Just for a moment, think of implementing a simple small program based on a programming language that has approximate boolean operations

(which 1 in 100 times return false for a = a), approximate counters (you can try to implement a loop using floating-point-based counters) or approximate ifs (which in fact are the result of an if with an approximate boolean operation). It would probably be fun, as long as you do not have the client waiting for it and your boss counting the hours.

Finally, GIS application users (as any application user) need the applications to provide them with the functionalities they require. The implementation details are irrelevant as long as the applications have acceptable performance, behave as expected, and get the work done. GIS application users will usually be non-expert users in spatial technologies, and they need the applications to behave in a predictable and intuitive way.

Current commercial spatial DBMS/technology implementations break the main properties of the abstract data model, either because they implement an ill-defined discrete data model, or because they had to address somehow the space discretization needs that the discrete data model failed to address. As a result, their users (GIS software developers) can no longer rely on such properties when developing their applications. This is not a minor problem, because the abstract data model properties had been carefully chosen to model reality. If the basic properties are not maintained, even the simplest spatial reasoning algorithms become difficult to implement. For instance, suppose that a basic set-theory property is not maintained due to closure problems between data types and operations, which is fairly common in commercial spatial database technology implementations. If the intersection point of two segments P = S1 ∩ S2 cannot always be represented, and hence we just represent an approximation P′, it turns out that when we need to check which segments P′ belongs to, we could find that neither S1 nor S2 contains it.
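The closure problem just described can be reproduced with a few lines of code. The following Python sketch is only an illustration under stated assumptions: the two segments, the integer grid used as the finite-precision coordinate space, and the helper names `intersection` and `on_segment` are hypothetical choices made for this example and are not taken from any of the cited systems. It computes the intersection point exactly with rational arithmetic, rounds it to the grid, and tests membership in both segments.

```python
from fractions import Fraction

def intersection(p1, p2, p3, p4):
    """Exact intersection of the lines through p1-p2 and p3-p4 (assumed non-parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return (Fraction(a * (x3 - x4) - (x1 - x2) * b, d),
            Fraction(a * (y3 - y4) - (y1 - y2) * b, d))

def on_segment(a, b, p):
    """Exact test: is p collinear with a-b and inside its bounding box?"""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

# Hypothetical input segments with integer end points.
S1 = ((0, 0), (7, 3))
S2 = ((0, 3), (7, 1))

# Exact intersection point, representable only with rational coordinates.
p_exact = intersection(*S1, *S2)          # (21/5, 9/5)

# Approximation: snap the point to the nearest integer grid position.
p_approx = (round(p_exact[0]), round(p_exact[1]))   # (4, 2)

print(on_segment(*S1, p_exact), on_segment(*S2, p_exact))    # True True
print(on_segment(*S1, p_approx), on_segment(*S2, p_approx))  # False False
```

With exact rational coordinates the point belongs to both segments, as the abstract model requires. Once it is snapped to the grid, both membership tests fail even though the predicate itself is evaluated exactly, which is the inconsistency the text refers to.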
Similarly, if no consistency is provided among operations, it is possible to get false when testing whether A ∩ B ∈ A.³ With these inconsistencies, it is really difficult to implement any spatial reasoning support in GIS applications. As a result, spatial reasoning functionality is rarely implemented in GIS applications.

³ Some of the current commercial spatial DBMS/technology implementations (e.g., Geomedia by Intergraph) at least make a small effort in this direction; for example, for any two segments s1 and s2, the test s1 ∩ s2 ∈ s1 always returns true.

This thesis focuses on bridging the gap that exists between the research work on discrete spatial models and the implementations in commercial spatial database management systems. First, it analyzes the properties that a physical spatial data model needs to fulfill to be able to implement a discrete spatial data model without breaking its closure properties, that is, to ensure that the result of any operation of the discrete model continues to be representable in the physical model. Second, it proposes two new physical spatial data models, called Dualgrid and Dualgrid For Floats (DualgridFF), which succeed in maintaining the closure properties at the physical level. Dualgrid defines a finite resolution representation space that guarantees all the properties required to keep the discrete data model closed. DualgridFF goes a step further, meeting the additional requirements of real commercial databases and technologies.

DualgridFF represents a trade-off between the needs of spatial database researchers, commercial spatial database technology developers, GIS software developers and GIS application users. It provides a solid physical representation model that succeeds in keeping the main properties of the original abstract and discrete vectorial models. This way, spatial DBMS and technology implementors can easily translate them to their implementations while the behavior of the operations is kept intuitive and well-defined. This should allow application developers to focus on the problems they have to solve (the applications). At the same time, its design makes it possible to implement them without high performance and storage costs. It also illustrates how important it is to take into account the needs of all stakeholders⁴ when designing physical data models.

1.2. Goals

The problems presented in the previous section motivated the primary goal of this thesis: to improve the applicability of research works in discrete spatial models to commercial databases and technologies, allowing technology developers to avoid dealing with spatial model closure problems. This will improve the applicability of those technologies and spread the use of spatially enabled applications with spatial reasoning capabilities. To achieve this, we need to reach the following specific goals:
• Analyze the problems that are generated when a discrete/physical spatial data model breaks the original abstract model properties.
• Identify the issues that have to be addressed by the physical spatial data model to keep the properties of the discrete and abstract data models.
• Propose a first physical spatial data model that makes it possible to recover the properties of the abstract spatial model that were lost in the implementation, while at the same time requiring as few changes as possible in the data structures and algorithms that are already implemented.
• Propose a second physical spatial data model that preserves the properties of the abstract spatial model and at the same time can be efficiently implemented in new commercial spatial databases and technologies, or efficiently incorporated into existing ones through more extensive changes in their data structures and algorithms.

⁴ The term stakeholders is used in business to refer to all parties that affect or are affected by business operations. In this case, we use it to refer to all the parties that are affected by the decisions taken at the physical model. As happens with business stakeholders, the needs of physical model stakeholders should be taken into account, as the positive and negative impacts that design decisions have on their needs will drive their own decisions. If the interests of the different stakeholders are not aligned, the chances of the proposed model having a real impact on final users are jeopardized.

1.3. Scope and relevance

Spatial database technology has reached a high level of maturity. Nowadays all relevant database management systems (PostgreSQL, MySQL, Oracle, DB2, Microsoft SQL Server, Informix, etc.) provide spatial data types and operations, usually implementing widely accepted spatial standards (SFS, ISO SQL/MM, etc.). However, all existing implementations suffer from a common underlying problem: they are not robust/consistent, in the sense that they fail to fulfill even the most basic theoretical properties of the abstract model they intend to implement. This thesis makes three main contributions to the field:
1. It establishes the basic properties that a physical spatial model needs to meet to ensure that it is able to correctly implement a vectorial discrete model without breaking its theoretical properties.
2. It defines the Dualgrid representation space, designed to reincorporate into existing spatial implementations (e.g., ROSE algebra, PostGIS/GEOS, etc.) those robustness/consistency properties originally lost when implementing the discrete model.
3. It defines the DualgridFF physical spatial model, aimed at the implementation of new spatial databases and technologies. It provides all the benefits of Dualgrid while allowing the implementation of commercial quality spatial models, in terms of performance, storage overhead and interoperability.
That is, the contributions of this thesis allow a qualitative improvement in spatial databases/technologies that should provide developers of spatially-enabled applications with the solid grounds they are demanding.

1.4. Thesis outline

Chapter 2 studies the current state of the art in geographical information systems and spatial representation. It outlines the requirements of modern geographical information systems, and gives an introduction to the most common spatial representation models. Although it focuses on vectorial representations, other representation models (constraint databases, raster, etc.) are presented. For each of them, the chapter describes how physical models address the discretization problems, as well as the impact that such decisions have from the perspective of user applications. Chapter 3 analyzes in detail the consistency of spatial operations provided by the most common physical representation models. Chapter 4 focuses on the key reasons for such inconsistencies and proposes a new physical model framework to provide spatial database implementations that fulfill the requirements of their conceptual model counterpart. Chapter 5 goes a step further, and revisits the proposed physical model to incorporate the restrictions imposed by current commercial GIS applications and standards, so that the proposed physical model is suitable for use in commercial spatial databases and technologies. Finally, Chapter 6 concludes the thesis and points to future research directions.


Chapter 2. State of the art

This chapter gives an overview of the state of the art in geographic information systems (GIS) and spatial data representation. It gives an introduction to the GIS field, and more specifically to the most relevant spatial representation models. The chapter also shows how the different spatial physical models address the discretization of the space (i.e., the representation of spatial coordinates using finite-size representations) and how they handle the problems that arise from it. Furthermore, it highlights the impact of those decisions from the perspective of the user applications. Finally, this state of the art focuses mainly on vectorial representation models, as they are widely used in GIS applications, which are especially affected by the space discretization problems.

2.1. Geographic Information Systems and Spatial Information Modeling

The exponential improvement in the performance of computer systems and the advances in spatial information modeling in the last decades have made possible the appearance of new tools (Geographic Information Systems, GIS for short) to manipulate the geographic properties of objects and to represent them graphically as a map on a computer screen.

Geographic Information Systems are more than just cartographic tools to produce maps because they are a step forward over traditional information systems. They offer an appropriate environment for capturing, storing and managing both alphanumeric and geographic information, and they provide tools for processing and analyzing them together. By geographic information we mean here information about the spatial properties of objects. This information can be as simple as the position on the map of all the hospitals of a country or as complex as the partition of the country’s land with regard to the kind of vegetation that grows on it.

According to [BM98], any GIS application should provide certain functionalities that can be classified as follows:
1. Data input and verification. This covers all aspects of capturing and verifying the correctness of geographic data, as well as their conversion to digital form.
2. Data storage and management. This functionality deals with all the aspects related to the structure and organization of geographic information. It must take into account both the way the geographic information is perceived by the users (abstract model) and the way it is handled by the computers (discrete and physical models).
3. Data transformation and analysis. This functionality is covered by the processes for editing the information (to keep it up to date or to remove errors) and for analyzing it. Data analysis is one of the main tasks of GIS, and it consists of applying analysis methods to the information to answer the questions posed by the users.
4. Data output and presentation. The functionality of producing maps and map-based material is a highly distinctive feature of GIS compared with a general purpose information system. Together with the analysis techniques, this is the aspect that differs the most from traditional information systems.

A GIS must allow the efficient exploitation of all the information it manages, not only providing spatial operations for geographic data, but also allowing users to analyze and browse these data graphically and to identify geographic relationships between objects. Examples of application domains for GIS are, among others, cadastre management, sanitation and communication networks, computer assisted navigation, and decision support systems.
Some of them have even become highly popular among home users in the last decade:
• Online mapping services and applications such as Google Maps (http://maps.google.es), Microsoft Bing Maps (http://www.bing.com/maps/) or Yahoo! Maps (http://maps.yahoo.com/) have popularized the access to worldwide maps with a level of detail undreamed of a decade ago.
• GPS Navigation software (to be run on dedicated GPS devices and/or on modern general-purpose smartphones with GPS support) has become a common tool for route planning, for both commercial and domestic daily use. TOMTOM (http://www.tomtom.com/), Navman (http://www.navman.com/), ROUTE66 (http://www.66.com/route66/) and iGo (http://www.igomyway.com/) are a few examples.
• Content geolocation is starting to become common. Examples are the geolocation-oriented photo sharing services Flickr (http://www.flickr.com/) and Panoramio (http://www.panoramio.com/), or the person geolocation services Google Latitude (http://www.google.com/mobile/latitude/) and Foursquare (http://www.foursquare.com/).

It is important to distinguish clearly between a GIS application and a tool to develop GIS applications (hereafter GIS development tool). GIS development tools provide developers with the capabilities required for capturing, storing and managing both alphanumeric and geographic information. They are used as the basis to develop GIS applications that provide users with an environment adapted to their specific needs. This difference is somewhat similar to the one between database management systems (or software development tools) and information systems. Examples of GIS development tools are:
• Spatial database management systems: database extensions to provide support for spatial information. Examples of this type are Oracle Spatial (http://www.oracle.com/es/products/database/options/spatial/index.html), the commercial spatial extension for Oracle, and PostGIS (http://www.postgis.org/), a spatial extension for PostgreSQL, released as open source software.
• Geospatial web services: services for spatial data publishing, usually following OGC standards such as WFS [OGC09], WMS [OGC06b] or WCS [OGC09b]. Examples are the open source geospatial servers GeoServer (http://www.geoserver.org/) and MapServer (http://www.mapserver.org/).
• GIS desktop clients, such as the open source GIS clients QGIS (http://www.qgis.org/) and uDIG (http://udig.refractions.net/).
• Libraries and APIs for online spatial data visualization, such as the open source JavaScript library OpenLayers (http://www.openlayers.org/) or the Google Maps API (http://code.google.com/apis/maps/index.html).
• Wide range GIS development platforms, which try to cover a wide range of GIS development aspects. Examples are the commercial solutions provided by ESRI (http://www.esri.com/) and Intergraph (http://www.intergraph.com/).

The popularization of GIS in the last decade is illustrated by recent initiatives to promote the interoperability of public data infrastructures and the access to free and public spatial data sources. Two of these initiatives are INSPIRE and OpenStreetMap.
• INSPIRE (http://inspire.jrc.ec.europa.eu/) is a European Directive (Directive 2007/2/EC) whose goal is to establish an infrastructure for spatial information in Europe to support Community environmental policies. INSPIRE requires the creation of public spatial data infrastructures and initially addresses the publication of 34 spatial data themes needed for environmental applications. The public spatial data infrastructures publish the information following widely used OGC spatial standards.
• OpenStreetMap (http://www.openstreetmap.org/) is a collaborative project to create a free editable map of the world. The maps are created using data from portable GPS devices, aerial photography, other free sources or simply from local knowledge. Both rendered images and the source vector dataset are available for download under a Creative Commons Attribution-ShareAlike 2.0 license.

Both initiatives emphasize the need for spatial data interchange, highlighting the importance of semantic and technical interoperability of spatial data and tools.

The area of GIS has been widely covered by several authors from different perspectives. [RSV01] helps researchers new to the field to get a wide understanding of the current state of the art in spatial database technologies, from spatial data modeling and representation to storage, retrieval and manipulation, as well as algorithms and indexing methods. [Wor04], on the other hand, is a good reference for computer science professionals new to the development of spatial and GIS technologies, as it addresses GIS from a computing perspective.
For GIS users, [BM98] is a good reference to understand the applications of GIS, providing a wide perspective of the field and giving an introduction to the theoretical and technical principles that need to be understood to work effectively and critically with GIS. The completely different perspectives of these three books prove the existence of different interest groups in the GIS field, such as those described in Chapter 1 (spatial database researchers, commercial spatial database technology developers, GIS software developers, and GIS application users). Each of these groups has a different perspective on what is highly important, what is almost irrelevant, and what is someone else’s problem.

A remarkable characteristic of current GIS systems is their high interoperability requirements. The high costs of producing spatial data, the importance of data analysis and processing to convert those data into information relevant to users with completely different needs, and the advantages of spatial information sharing among organizations have made interoperability and modularity a key aspect in the evolution and growth of the GIS field. Nowadays, a typical Geographical Information System relies on spatial database systems to store its data. Those data have been generated by several different sources (using their own specific GIS applications) and preprocessed when fed to the system to fit the organization requirements. Some input data are processed automatically whereas others are introduced and managed by specific GIS applications. The information is processed again to create different presentations that fit the specific view of the world of different user types. Some information is even provided through standardized interfaces (SFS [OGC06], WFS [OGC09], WMS [OGC06b], WCS [OGC09b], etc.) that can be used by other organizations for different applications.

To be able to succeed in such a complex environment, the GIS domain has drastically evolved from the ad-hoc systems of the early nineties to today's highly standardized, interoperable and modularized systems. The wide effort in standards definition, mainly driven by the Open Geospatial Consortium and ISO, in data representation (SFS [OGC06], GML [OGC07, ISO07b], ISO 19107:2003 [ISO03], etc.), functional modules (ISO 19142:2010 [ISO10], WFS [OGC09], WMS [OGC06b], WCS [OGC09b], etc.) and even metadata (CSW [OGC07b]) and processing (WPS [OGC05]) services, and their adoption in the GIS field, has been a key factor in this evolution. Both OGC and ISO families of GIS services standards follow a modular approach, where each specific standard addresses some specific functionality requirements. This approach simplifies the development of GIS development tools, as developers can focus at each moment on one specific group of functionalities. It also promotes and ensures interoperability between implementations, as service modules will interconnect through well-known standards.
However, prior to all this evolution, the development of generic spatial data models (initially as a research field on its own [Güt88, SH91, GS95, Ege94] and later in the form of international standards [ISO03, ISO05a, ISO05b, OGC06]) has been of fundamental importance. Such data models pursue two basic goals. On the one hand, they provide a set of spatial data types suitable to accurately and efficiently represent the kind of spatial data managed by a wide set of application environments. On the other hand, they offer a powerful set of operations over these types that fulfills the requirements of those applications.

Spatial data models are usually classified as abstract (sometimes called conceptual), discrete (sometimes called logical) or physical spatial data models, depending on the abstraction level at which they are defined. Abstract spatial data models focus on the definition of conceptually meaningful representations of real world spatial information, as well as a powerful and meaningful set of operations for exploiting it. They are designed to model the spatial information in a way similar to how it is understood, classified and analyzed by GIS application users. Discrete spatial data models try to adapt abstract data models to the reality of computers and algorithm complexity, where finite storage space and computation power need to be taken into account. Finally, physical data models map discrete spatial data models to specific data structures and algorithms so that they can be directly (and efficiently) implemented. In the following sections each of these three types of spatial models is explained in more detail.

2.2. Abstract spatial models

Research proposals first ([GS95, LTR99]) and international standards later ([ISO05a, ISO03, ISO05b]) have defined abstract spatial data models attempting to capture the semantics of data types and operations as they are seen by spatial information users, setting a formal and high level basis to represent and query spatial information. They represent spatial information over a continuous geographic space, a space of coordinates (usually ℝ² or ℝ³) over which spatial data are mapped. This space is usually either Cartesian (the GIS uses a flat model of the earth) or geodesic (the space tries to better model reality by taking into account the Earth’s shape and curvature, representing distances and areas in a more realistic way).

The current international standard ISO 19109:2005 [ISO05a] defines the way GIS applications should model information representing the real world through features. A feature describes objects with a geographic location (buildings, a digital terrain model, a map, etc.). The standard formalizes the structure of features through the General Feature Model (GFM). In it, features have a type (e.g., roads, rivers, buildings, etc.), attributes, relations between feature types and behaviors. Features can be either geographic objects or coverages (space mapping functions).
Geographic objects are defined in depth at the abstract level by the standard ISO 19107:2003 [ISO03], whereas coverages are specified in standard ISO 19123:2005 [ISO05b].

According to ISO 19107:2003 [ISO03], a geographic object or entity is an application object for which the GIS stores geographic attributes and (optionally) alphanumeric attributes. A geographic attribute represents a geographic property (position, extension, etc.) of an object. A geographic domain is the set of values that a geographic attribute may have. The most relevant geographic domains, according to ISO 19107:2003 [ISO03], are:
• Primitives: basic types. They can be:
  – Point: represents a single point in the space, for example the location of a farm.
  – Curve: represents a sequence of contiguous points in the space (a curve). An example of such a value is the course of a road.
  – Surface: represents a connected area in the space, possibly with holes. A piece of land or the area belonging to a municipality are examples of such values.
• Complex: they are a combination of primitive elements representing a single object. They might be either homogeneous, where all the elements belong to the same type (called composites), or heterogeneous, where elements of various types coexist. An example of the former is the area covered by snow in a country. The hydrography of a given area, including both rivers (lines) and lakes (regions), is an example of the latter.
• Aggregates: collections of primitive elements. They differ from complexes in that they are intended to represent a collection of objects, instead of a complex object.

Coverages are specified by the standard ISO 19123:2005 [ISO05b] as a data representation that directly assigns values to geographical positions. A coverage is a function from the geographic domain to a value of another domain (numeric, classification, etc.), where each geographic location has a unique value assigned. Coverages can represent either discrete or continuous functions. The standard defines several interfaces for different types of mapping methods (discrete coverages, rectangular or hexagonal grids, Triangulated Irregular Networks (TIN), etc.). But, as corresponds to an abstract model, it does not impose any limitations on how the data are stored or managed.

Figure 2.1 shows two possible representations of a set of features, depending on the information relevant for users at two different scales. At scale E1 the object c is represented using the entity city_locations, whose geographical location is represented by an attribute of type point.
The geographic object r belongs to a feature rivers whose geographic attribute is represented by an attribute of type line. At scale E2, cities are represented by the entity city_areas, whose geographic attribute is a region. The rivers feature is also represented. In addition, the coverages vegetation (a discrete coverage, splitting the space into the vegetation types ts1, ts2, ts3, ts4) and salinity (a continuous coverage, drawn on the map with darker colors for lower soil salinity) are also depicted at scale E2.
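The feature model described above can be sketched as plain data types. The following Python fragment is only an illustrative sketch, not the interface defined by ISO 19107:2003 or ISO 19123:2005: the class names, the decision to approximate curves and surface boundaries by vertex sequences, and all coordinates and coverage formulas are assumptions made for this example. It mirrors the situation of Figure 2.1, where the same city is a point at scale E1 and a region at scale E2, and coverages are functions from positions to values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple, Union

Coord = Tuple[float, float]

@dataclass(frozen=True)
class Point:
    position: Coord

@dataclass(frozen=True)
class Curve:
    # Assumption: the curve is approximated by a sequence of vertices.
    vertices: Tuple[Coord, ...]

@dataclass(frozen=True)
class Surface:
    # Assumption: exterior boundary plus optional holes, each a vertex ring.
    exterior: Tuple[Coord, ...]
    holes: Tuple[Tuple[Coord, ...], ...] = ()

Geometry = Union[Point, Curve, Surface]

@dataclass
class Feature:
    feature_type: str                 # e.g. "rivers", "city_locations"
    geometry: Geometry                # the geographic attribute
    attributes: Dict[str, object] = field(default_factory=dict)

# A coverage maps every position of the geographic domain to one value.
Coverage = Callable[[Coord], object]

def vegetation(position: Coord) -> str:
    """Discrete coverage with four hypothetical vegetation zones."""
    x, y = position
    if y >= 5:
        return "ts3" if x < 5 else "ts2"
    return "ts1" if x < 5 else "ts4"

def salinity(position: Coord) -> float:
    """Continuous coverage; the linear formula is purely illustrative."""
    x, y = position
    return 0.1 * x + 0.05 * y

# Scale E1: city c as a point, river r as a curve.
city_e1 = Feature("city_locations", Point((3.0, 4.0)), {"name": "c"})
river = Feature("rivers", Curve(((0.0, 8.0), (4.0, 5.0), (9.0, 1.0))), {"name": "r"})

# Scale E2: the same city as a region.
city_e2 = Feature("city_areas",
                  Surface(((2.0, 3.0), (4.0, 3.0), (4.0, 5.0), (2.0, 5.0))),
                  {"name": "c"})

print(type(city_e1.geometry).__name__, type(city_e2.geometry).__name__)
print(vegetation((1.0, 1.0)), salinity((0.0, 0.0)))
```

Keeping the geometry as an attribute of the feature, rather than making the feature a geometry, reflects the distinction the text draws between an application object and its geographic attributes.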

[Figure 2.1: Examples of geographic data represented at two different scales. (a) Scale E1. (b) Scale E2.]

2.3. Discrete spatial models

Conceptually, the spatial values represented by abstract models are usually non-empty and infinite subsets of the geographic space. However, in order to represent such values in a computer system (and in order to implement most operations over them), infinite sets must be modeled by some kind of finite representation. Several discrete spatial data models have been proposed in the spatial scientific literature, some of them being widely used nowadays. They try to provide efficient and finite representations for the usually infinite sets of points of spatial objects. These models can be roughly classified into a few basic categories [Par95] depending on the approach followed in the representation and the intended target application domains. The three main categories are the following:

1. Raster models. Raster models are based on the concept of bitmaps. The infinite points of the space are represented by a finite number of raster points, which are uniformly distributed over the space. They are usually represented using some kind of array data structure. Raster representations have some advantages (typical spatial operations are intuitive and simple to implement), but also have important drawbacks, such as the large storage requirements of spatial objects (all the points belonging to the object have to be explicitly represented). These models are often used for spatial information generated by imagery techniques, and as a way to represent coverages.

2. Vectorial models. In vectorial models [LT92], the information in the n-dimensional space is represented using m-dimensional hyperspaces, with m < n. More informally, the infinite set of points belonging to a spatial object is represented by its boundary, usually using linear representations. For instance, in the two dimensional space the following types are usually defined [GS95, OGC06]:
• Points.
• Graphs, composed of nodes (points) and arcs between nodes (segments).
• Polylines, represented as a finite sequence of points.
• Polygons, represented as a closed and non-self-intersecting polyline.
• Complex objects, such as sets of polylines, complex polygons (composed of a set of polygons, possibly with holes) or heterogeneous sets of objects containing elements belonging to any of the previous types.
Vectorial data models are widely used and numerous query languages [Cha94, Rig94, Güt94a, Ege94] and algebras [Güt88, Güt89, Güt94b] are based on them. One of the advantages of this family of models is the existence of efficient data structures [Gun88], as well as very efficient algorithms for detecting relationships between the objects [Sam90] and for computing set-theory operations. Furthermore, they are very appropriate for GIS applications where spatial objects have clearly defined boundaries (e.g., territory administration) and for visualization in user interfaces, because the objects support zooms, rotations and other transformations without losing quality. As disadvantages, using these models to represent continuous coverages requires discretizing and vectorizing them into geometries, setting sharp and precise value boundaries on objects that are conceptually continuous. Moreover, they are not appropriate for imagery representation.

3. Constraint databases models. Under the constraint databases models, data are represented using linear constraints.
The same approach can be used for representing spatial data, as shown in [KPV95, GK97, GRSS97, GRS98]. For example, the constraint x > 1 ∧ 2x − y − 5 < 0 ∧ y < 7 (displayed in Figure 2.2.c) represents a region in the space. The advantage of this approach is that extending constraint database systems to manage spatial information is quite straightforward: spatial values are represented through constraints, like all other data types. The disadvantages are the high computational complexity of query evaluation in constraint database systems [RSV01] and the

(46)

difficulty of implementing them in non-constraint database systems (this would require incorporating a constraint engine just to support spatial data).

Figure 2.2: Object represented using (a) raster, (b) vectorial or (c) constraint database models.

Figure 2.2 shows the same object represented using a raster, a vectorial and a constraint database model. Although all these types of spatial data models are in use (each of them in its specific niche of application domains), only two of them are used in general-purpose spatial databases and tools: the raster and the vectorial models. Raster models are mainly used in imagery processing and in domains where the spatial information of interest corresponds to coverages. They are also covered by several ISO standards (e.g., they are considered in some of the interfaces for coverages defined by ISO 19123:2005 [ISO05b]). Vectorial models are more widely used in GIS applications where the spatial data to be managed correspond mainly to geographic objects. They are also used in some of the international standards (e.g., [ISO07]).

This research work focuses on vectorial data models, given that they are the ones where inconsistency problems are most relevant. Nevertheless, to put the analysis performed here in context, in Chapter 3 we compare the consistency properties of the different solutions used in vectorial models with those exhibited by the other types of data models.
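To make the comparison concrete, the following minimal Python sketch encodes the region of the constraint example given earlier in this section in the three ways discussed. It is only an illustration under our own assumptions: the function and variable names are hypothetical, the polygon vertices are derived by hand from the three constraints, and the raster is sampled at unit resolution over an arbitrarily chosen window.

```python
# Illustrative sketch only; names, grid window and resolution are assumptions.

def in_constraints(x, y):
    """Constraint model: x > 1 AND 2x - y - 5 < 0 AND y < 7."""
    return x > 1 and 2 * x - y - 5 < 0 and y < 7

# Vectorial model: the same (open) region described by its boundary, a
# triangle whose vertices are the pairwise intersections of the three
# constraint lines, listed in counter-clockwise order.
BOUNDARY = [(1, -3), (6, 7), (1, 7)]

def in_polygon(x, y, ring=BOUNDARY):
    """Strict interior test for a convex, counter-clockwise ring."""
    for i, (ax, ay) in enumerate(ring):
        bx, by = ring[(i + 1) % len(ring)]
        # The point must lie strictly to the left of every directed edge.
        if (bx - ax) * (y - ay) - (by - ay) * (x - ax) <= 0:
            return False
    return True

# Raster model: one bit per grid point of a finite window, stored as an array.
XS = range(0, 8)
YS = range(-4, 9)
BITMAP = [[in_constraints(x, y) for x in XS] for y in YS]

def in_raster(x, y):
    """Look up a grid point in the bitmap (only defined inside the window)."""
    return BITMAP[y - YS.start][x - XS.start]

# The three representations agree on every grid point of the window.
agree = all(in_polygon(x, y) == in_constraints(x, y) == in_raster(x, y)
            for x in XS for y in YS)
print("representations agree on the grid:", agree)

# Storage: one entry per grid point versus three boundary vertices.
print(len(XS) * len(YS), "raster entries versus", len(BOUNDARY), "vertices")

# Between grid points the raster holds no information, whereas the
# vectorial and constraint forms can still answer the membership query.
print(in_constraints(1.5, 6.5), in_polygon(1.5, 6.5))
```

The sketch also hints at the trade-offs mentioned above: the raster needs an entry for every grid point of the window, while the vectorial and constraint forms need only three vertices or three inequalities, respectively.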

(47)

Figure 2.3: Two segments with end-points having integer coordinates. Their intersection point has non-integer coordinates.

2.4. Physical spatial models

The main problem in defining physical spatial models that translate the expressiveness of abstract and discrete spatial models to computer applications (that is, commercial spatial database technologies) is that abstract and discrete spatial models are usually defined over a continuous domain, whereas their implementation requires a physical spatial model that uses discrete representations for the spatial types. For example, a value of type point [ISO03] can be represented at the physical level as a pair of coordinates, each of them represented as a 32-bit signed integer. A curve can be represented as a sequence of point values defining a polyline. A surface can be represented as a polygon, which in turn is represented as the sequence of point values defining its boundary.

The problems arising from the need to use a discrete space at the physical level are far from trivial. For example, if we decide to use the previously described discrete space (coordinates represented as 32-bit signed integers), then the representation space is no longer closed under the set-theory operations (e.g., union or intersection). This happens because the intersection point of two segments defined between points with integer coordinates does not usually have integer coordinates (an example is shown in Figure 2.3). The straightforward solution to such problems (for example, approximating the non-representable points by the closest representable

(48)

ones) makes the operations closed,¹ but it violates the basic properties of set theory. For example, for regions A and B, the relationships A ⊆ (A ∪ B), (A ∩ B) ⊆ A and (A \ B) ∩ B = ∅ no longer hold, which generates inconsistencies between the answers. In certain domains (e.g., graphical user interfaces) such rounding errors may be acceptable, but they cannot be tolerated in query evaluation for spatial analysis, since they may lead to wrong answers. For example, the intersection of a river and a highway may be found to lie neither on the river nor on the highway.

Depending on their approach to this problem, spatial extensions for commercial databases can be classified into one of the following two groups:

1. They do not provide operations that are not closed under the selected discrete representation. This usually means that they provide only predicates over spatial data types and operations for composing and decomposing such objects (e.g., operations for constructing the segment between two given points).

2. They provide the whole set of operations, but for those operations that are not closed over the selected discrete representation space they return an approximation of the result.

Examples of the first group were the first generations of spatial database extensions, such as Illustra 2D Spatial Datablade [Ill94], earlier versions of Oracle's spatial extension (Oracle8 Spatial Cartridge) [Ora97] and MySQL 5.5 [Mys10]. Illustra's extension [Ill94] provided the data types Point, Line (in the mathematical sense), Segment, Path (a polyline), Polygon and Polygon Set (a region as a set of polygons with holes), apart from some other data types such as Circle, Ellipse or Box. The operations provided were basically predicates and operations for decomposing or constructing a value (for example, retrieving the n-th Segment of a Path or constructing the Segment having two given Point values as end points).
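The inconsistency caused by rounding, described at the beginning of this page, can be reproduced with a few lines of exact rational arithmetic. The following Python sketch is only an illustration: the end-point coordinates are chosen by us (they are not those of Figure 2.3), the function names are hypothetical, and rounding to the nearest integer stands in for whatever approximation a particular system applies.

```python
from fractions import Fraction

def line_intersection(p1, p2, p3, p4):
    """Exact intersection of lines p1p2 and p3p4 (assumed not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return (Fraction(a * (x3 - x4) - (x1 - x2) * b, den),
            Fraction(a * (y3 - y4) - (y1 - y2) * b, den))

def on_segment(p, a, b):
    """True if point p lies exactly on the closed segment ab."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

# Hypothetical input: two segments with integer end-points.
river = ((0, 0), (4, 3))
highway = ((0, 3), (5, 0))

# The exact intersection has non-integer coordinates ...
exact = line_intersection(*river, *highway)
# ... so an integer representation has to approximate it.
snapped = (round(exact[0]), round(exact[1]))

print("exact point:", exact)
print("  on river:", on_segment(exact, *river),
      " on highway:", on_segment(exact, *highway))
print("snapped point:", snapped)
print("  on river:", on_segment(snapped, *river),
      " on highway:", on_segment(snapped, *highway))
```

With these inputs the exact point lies on both segments, whereas the rounded point lies on neither of them, so a predicate evaluated on the approximated result contradicts the operation that produced it.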
Any operations that became non-closed over the discrete representation used, such as the set-theory operations, were not provided.²

MySQL 5.5 also provides spatial extensions [Mys10]. However, the only operations returning spatial values that it provides are operations that are already expected to return approximated results (e.g., Centroid(), PointOnSurface(), etc.), decomposition operations (retrieving elements of the representation of another object, e.g., Boundary(), StartPoint(), EndPoint(), PointN(), ExteriorRing(), InteriorRingN(), etc.) and constructor operations (building another

¹ Artificially closed, because what is now implemented is not the intersection operation (for instance), but an approximate intersection operation.
² The only exceptions are operations that are already expected to return approximate values, such as getting the approximation of an Ellipse value as a Polygon or getting the bounding box of any spatial value.

(49)

spatial object representation from existing ones, e.g., Point(), LineString(), Polygon(), etc.).

Examples belonging to the second group are more modern generations of spatial databases, such as DB2's Spatial Extender [Dav98] (by ESRI, also available under Informix), more modern versions of Oracle's extension (Oracle8i Spatial Cartridge) [Ora99, Ora10] and most current spatial databases (Oracle Spatial 11g [Ora10], PostgreSQL/PostGIS [Ref10] and SQL Server 2008 [Mic09]).

The problem of robustness and topological correctness of geometric computation has also been addressed in the computational geometry literature [DS90, For85]. The literature distinguishes between perturbation-free approaches, where the idea is to perform geometric computations with sufficiently high precision so that no errors occur (e.g., [KM83, OTU87]), and perturbation (approximation) approaches (e.g., [DS90, GM95, Mil89, Sch94]), which allow slight changes to the input data of computations (in order to reduce errors in the computations) or to the results (in order to be able to represent data at a fixed level of precision).

A special application of the perturbation approach within the area of spatial databases is used in the ROSE algebra [GS95]. The ROSE algebra uses an underlying discrete geometric basis called a realm [GS93]. Intuitively, a realm contains a consistent representation of all the geometric data of an application. All numerical problems are treated at the realm level, which uses a particular perturbation method. Values of spatial data types are defined on top of realms, which ensures that all operations (including the set-theory operations) return consistent answers (for example, the relationships A ⊆ (A ∪ B), (A ∩ B) ⊆ A and (A \ B) ∩ B = ∅ all hold in the implementation).

In Section 2.4.1 we show three representative examples of how the problems arising from space discretization are addressed in commercial and research solutions.
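As a rough illustration of why perturbation-free approaches need more precision than the input data, the sketch below intersects two segments whose end-points fit in 32-bit signed integers and inspects the size of the exact result. The coordinates and names are our own assumptions, and Python's arbitrary-precision rationals merely stand in for whatever higher-precision arithmetic a perturbation-free implementation would use.

```python
from fractions import Fraction

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def line_intersection(p1, p2, p3, p4):
    """Exact intersection of lines p1p2 and p3p4 (assumed not parallel)."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p4[0] - p3[0], p4[1] - p3[1])
    den = d1[0] * d2[1] - d1[1] * d2[0]
    # Parameter of the intersection point along p1p2, as an exact rational.
    t = Fraction((p3[0] - p1[0]) * d2[1] - (p3[1] - p1[1]) * d2[0], den)
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Hypothetical segments spanning the whole non-negative 32-bit range.
M = INT32_MAX
seg_a = ((0, 0), (M, M - 1))
seg_b = ((0, M), (M - 1, 0))

# Perturbation-free: keep the exact point, at the cost of larger numbers.
exact = line_intersection(*seg_a, *seg_b)
bits = max(v.bit_length()
           for c in exact for v in (c.numerator, c.denominator))
print("bits needed by the exact coordinates:", bits)

# Perturbation: round back to the 32-bit grid, at the cost of moving the point.
snapped = (round(exact[0]), round(exact[1]))
in_range = all(INT32_MIN <= c <= INT32_MAX for c in snapped)
print("snapped point fits in 32 bits:", in_range)
print("snapped point equals exact point:", snapped == exact)
```

The exact coordinates no longer fit in the 32 bits of the input, and every further operation applied to such results can increase the size again. Rounding keeps the representation fixed but changes the geometry, which is the trade-off between the two families of approaches described above.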
A more detailed explanation of how realms work is given in Section 2.4.2.

2.4.1. Commercial approaches

Among current commercial solutions, PostGIS, Oracle Spatial and Microsoft SQL Server are the three most representative and widely used. PostGIS is the most commonly used free (GPL) spatial database extension (used with PostgreSQL), whereas Oracle Spatial is a reference among commercial (proprietary) spatial database extensions (used with the Oracle DBMS). Spatial support is a relatively new feature in Microsoft SQL Server, although it already fulfills most spatial standards. All of them follow (in one way or another) the perturbation/approximation approach and are representative of the state of the art in commercial spatial database technologies. And all of them fail to provide appropriate consistency among spatial operations. Appendix A shows a

