2. BACKGROUND AND DESCRIPTION OF THE ADVERSARIAL PARTY SYSTEM
3.1 CONCEPT
The archival of scientific data and supporting documentation is crucial to research and innovation. The scientific reasons for preserving data derive from the fact that observations, knowledge and understanding are cumulative: the more complete the record, the more we can extract from it. Observed data provide a baseline for determining rates of change and for computing the frequency of occurrence of unusual events. The longer the record, the greater our confidence in the conclusions we draw from it. Our traditional observational records have portrayed frozen instants of reality. If preserved, they will continue to provide insights, but if neglected, they will melt away (Council, 1995).
There are thus strong motivations for preserving these scientific data sets:
• Many observations about the natural world are a record of events that will never be repeated exactly. Examples include observations of an atmospheric storm, a deep ocean current, a volcanic eruption, and the energy emitted by a supernova. Once lost, such records can never be replaced.
• Observed data provide a baseline for determining rates of change and for computing the frequency of occurrence of unusual events. They specify the observed envelope of variability. The longer the record, the greater our confidence in the conclusions we draw from it.
• A data record may have more than one life. As scientific ideas advance, new concepts may emerge (in the same or entirely different disciplines) from study of observations that led earlier to different kinds of insights. New computing technologies for storing and analyzing data enhance the possibilities for finding or verifying new perspectives through reanalysis of existing data records.
• The substantial investments made to acquire data records justify their preservation. The cost of preservation will almost always be smaller than the cost of observation. Because we cannot predict which data will yield the most scientific benefit in the years ahead, the data discarded today may be the data that would have been invaluable tomorrow. This includes the raw data before any treatment is applied to it, because new ways of processing, calibrating and analysing it will produce different outcomes, usually of higher quality and accuracy.
Scientific archives in the fields of astronomy and astrophysics follow these patterns and reasoning. Astronomical and astrophysical research today is likely to draw on data from a variety of archives, typically collected by sky surveys operating at different wavelengths. This reinforces the idea that the knowledge acquired by each and every experiment is cumulative, effectively leading to more robust results. Furthermore, the cost of storing, preserving and making this data available to the scientific community is far lower than the cost of its acquisition. This is mainly due to:
• High assembly and operational costs of ground-based telescopes. These telescopes are normally installed at high-altitude sites, above a significant portion of the Earth's atmosphere, to diminish the effects of weather conditions, turbulence, absorption of infrared and submillimeter wavelengths by water vapor, etc.
• Very high costs to assemble, launch and operate a space satellite. Astronomical space missions are commonly required when we need to perform highly accurate measurements or observe very distant objects. The Gaia mission (Gaia Collaboration et al., 2016b; Mignard, 2005) and the Herschel Space Observatory (HSO) (Pilbratt, 2008) are good examples of such space missions.
• There has been a significant boost of open source software initiatives (Laurent, 2004). This holds true for many of the engines and tools in the Big Data and Data Science fields, reducing entry barriers for adoption and lowering operational costs.
• The commoditization of storage and data processing hardware, together with the fact that these new open source tools and engines are designed to run on this cheaper hardware, leads to much lower costs.
However, the complexity of the different data archives poses many issues of its own for effectively and efficiently extracting their inherent value. The architecture of a scientific archive is thus difficult to generalize. The Open Archival Information System (OAIS), in its reference model (Lavoie, 2004), identifies and describes six services or functional components that any scientific archive should implement: Ingest, Archival Storage, Data Management, Preservation Planning, Access and Administration. These six building blocks ensure, respectively, that the data can be taken into the archive, that it is digitally stored, that its metadata is made available, that a preservation strategy is in place, that the data can be accessed by relevant parties, and that the archive's day-to-day operations and activities are defined.
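To make the division of responsibilities more concrete, the following is a minimal sketch of a toy in-memory archive with one attribute or method per OAIS functional component. The class `ToyArchive`, its method names and the `copies` policy are hypothetical and chosen only for illustration; the OAIS reference model describes responsibilities, not a programming interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyArchive:
    """Hypothetical in-memory archive, loosely organised by OAIS component."""
    storage: Dict[str, bytes] = field(default_factory=dict)    # Archival Storage
    metadata: Dict[str, dict] = field(default_factory=dict)    # Data Management
    policy: Dict[str, int] = field(
        default_factory=lambda: {"copies": 2})                 # Preservation Planning (assumed policy)
    log: List[str] = field(default_factory=list)               # Administration

    def ingest(self, identifier: str, payload: bytes, description: dict) -> None:
        # Ingest: accept a submission and hand it to storage and data management.
        if identifier in self.storage:
            raise ValueError(f"{identifier} is already archived")
        self.storage[identifier] = payload
        self.metadata[identifier] = dict(description, size=len(payload))
        self.log.append(f"ingest {identifier}")

    def search(self, **criteria) -> List[str]:
        # Data Management: return identifiers whose metadata match every criterion.
        return sorted(
            identifier
            for identifier, meta in self.metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def access(self, identifier: str) -> bytes:
        # Access: deliver a stored object to a consumer and record the request.
        self.log.append(f"access {identifier}")
        return self.storage[identifier]


archive = ToyArchive()
archive.ingest("obs-001", b"raw frame", {"band": "infrared"})
found = archive.search(band="infrared")
payload = archive.access(found[0])
```

A real archive would back each component with its own service (for example, replicated storage driven by the preservation policy); the sketch only shows how the six responsibilities stay separate.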
Furthermore, the availability of more and more observed data, with different degrees of overlap, raises the need to combine it in a meaningful way, producing synergies along the way. We refer to this concept as data fusion, which is simply the integration of multiple data sets and knowledge about the same real-world object or phenomenon into a consistent, accurate, and useful representation. This is crucial for scientific research as it provides different observations and perspectives about the same reality. As an example in the field of astronomy, data fusion facilitates (among other things) the task of cross-matching objects from different catalogues surveyed at different wavelengths, enabling both a richer exploration of the galaxy and ways of cross-validating hypotheses.
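As an illustration of catalogue cross-matching, the sketch below pairs each source in one catalogue with its nearest neighbour in another, provided the two lie within a given angular radius. It assumes positions are given as right ascension and declination in degrees; the catalogue contents, the identifiers and the 1 arcsecond radius are invented for the example. The brute-force search is only suitable for small catalogues, and production systems typically rely on spatial indexing instead.

```python
import math
from typing import Dict, List, Optional, Tuple

# (identifier, right ascension in degrees, declination in degrees)
Source = Tuple[str, float, float]


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a: List[Source], cat_b: List[Source],
                radius_arcsec: float) -> Dict[str, Optional[str]]:
    """Map each source in cat_a to its nearest cat_b source within the radius, or None."""
    radius_deg = radius_arcsec / 3600.0
    matches: Dict[str, Optional[str]] = {}
    for id_a, ra_a, dec_a in cat_a:
        best_id, best_sep = None, radius_deg
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:  # keep the closest candidate seen so far
                best_id, best_sep = id_b, sep
        matches[id_a] = best_id
    return matches


# Invented catalogues: two optical sources and two infrared sources.
optical = [("opt-1", 10.0, -5.0), ("opt-2", 200.0, 45.0)]
infrared = [("ir-7", 10.0001, -5.0001), ("ir-9", 120.0, 30.0)]
matched = cross_match(optical, infrared, radius_arcsec=1.0)
# matched == {"opt-1": "ir-7", "opt-2": None}
```

Sources left unmatched (here `opt-2`) are kept with a `None` partner rather than dropped, so the fused record still reflects what each survey did and did not observe.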
To address this challenge, virtual observatories are being established in a wide range of disciplines, supported by a variety of agencies. Groups such as the International Virtual Observatory Alliance (IVOA), the Planetary Data System (PDS) and the Space Physics Archive Search and Extract (SPASE) consortium are defining metadata standards to aid in the archiving and sharing of information resources. The role of the virtual observatories is to locate available resources and help users find what they need and then gain access to it. There are many different resource providers from which virtual observatories must collect descriptions. These resource providers may have associations with other providers, so the topology of information exchange can often become complicated (King et al., 2008).
In astronomy, the Virtual Observatory (VO) was born to address interoperability and integration of both tools and data sets, using the Internet to form an environment in which scientific research programs can be conducted. Its main goal is to allow transparent and distributed access to data available worldwide. This enables scientists to discover, access, analyze, and combine heterogeneous data collections in a user-friendly manner. VO standards are developed and agreed upon within the IVOA. These standards focus on information registries, query languages, data models, semantics, data access, protocols and visualization.