Table 39 on page 93 lists the software requirements which apply to the GSA Data Ingest subsystem. Each requirement is followed by a short description of how the requirement will affect the Data Ingest subsystem. See [6] for a complete description of these requirements.

TABLE 39. User interface requirements

SR1.1 The GSA shall archive all data.

All data supersets received from Gemini will be ingested into the catalogues, and will be available through the bulk data retrieval system.

SR1.2 The GSA shall respect proprietary periods assigned by Gemini.

The data ingest system monitors the meta-data database and propagates the release date value into the GSA catalogues.

SR1.4.2 Remove previews when release date is changed.

When the release date is updated for a dataset, the catalogue update system should cause the DP Discovery Agent to re-evaluate the proprietary status of all data products derived from the now-proprietary dataset.

SR2.5.1 Remove derived image descriptors for proprietary data. See SR1.4.2.

SR2.6 Release date is a data descriptor.

Release date will be ingested into the GSA catalogues, and updates to release date will be reflected in the GSA catalogue content.

SR2.8 Data quality assessment shall be in the catalogue.

Data quality will be ingested into the GSA catalogues. This is not described explicitly in this chapter, but the data quality will be stored in the FITS headers and incorporated in the meta-data store, and will therefore be included in the GSA.

SR2.9 The science program shall be associated with the catalogue data.

Science programs will be ingested into the GSA catalogues, and a link will be established between each science program and the datasets collected under it, or the associated data supersets which incorporate data collected under it.

SR2.10 The electronic observing log shall be associated with the catalogue.

The electronic observing log will be ingested into the database. An association will be established between the electronic observing log and the data supersets in the GSA catalogue (based on exposure start/stop time, and the time-stamps associated with entries in the electronic observing log).

SR2.11 Proposal id shall be a descriptor.

Proposal id will be ingested into the GSA catalogues.

SR2.13 Calibration data shall be associated with science data.

When data supersets are ingested into the GSA catalogue, links will be established to the appropriate calibration data.

SR2.14.4 Remove derived objects for proprietary data. See SR1.4.2.

SR2.15 Publications should be linked to data supersets.

When publication information is received, links will be established to data supersets used in the publication. This requirement has been "de-scoped" to link science programs with publications.

SR2.16 The GSA shall store Gemini hardware and software versions.

Version information will be ingested into the GSA catalogues. This is not described explicitly in this chapter, but the versions will be stored in the FITS headers and incorporated in the meta-data store, and will therefore be included in the GSA.

SR4.1 Maximum and average requirements for catalogue ingest.

The ingest software must support the data rate requirements.

SR4.2 Maximum and average requirements for raw data ingest.

The ingest software must support the data rate requirements.

3. Database Schema

The data ingest subsystem uses the tables in the Gemini meta-data store as the source of catalogue data, the GSA catalogue tables as the destination for catalogue data, and the GSA data processing tables to set up data processing. The meta-data store tables are described in [3], the catalogue tables are described in Chapter 4, and the data processing tables are described in Chapter 5. Those descriptions will not be repeated here.

3.1 Data Ingest table storage

The data ingest catalogue will be a Sybase relational database. An overview of the data ingest tables is shown in Figure 16 on page 95.

3.1.1 Table: ingestProcess

This table is used to control the processing of the gemIngest program, allowing the program to determine which data is already in the GSA catalogues. As gemIngest processes each row in the meta-data tables, it records the timestamp of the last row processed, and when the processing is finished, saves that timestamp in this table. This allows the gemIngest program to resume processing with the next largest timestamp. There is one row in this table for each meta-data table processed by gemIngest.
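The checkpoint behaviour described for ingestProcess can be illustrated with a minimal sketch. It is not the gemIngest implementation: it uses Python's built-in sqlite3 module in place of Sybase, models the timestamp column as an increasing integer, and the function name ingest_new_rows, the database name "metaStore" and the sample rows are all hypothetical.

```python
import sqlite3

def ingest_new_rows(conn, database_name, table_name, handle_row):
    """Process rows newer than the saved checkpoint, then save the new checkpoint."""
    saved = conn.execute(
        "SELECT lastTimestamp FROM ingestProcess "
        "WHERE databaseName = ? AND tableName = ?",
        (database_name, table_name),
    ).fetchone()
    # No checkpoint row yet means nothing has been ingested from this table.
    last_seen = saved[0] if saved and saved[0] is not None else -1

    rows = conn.execute(
        f'SELECT datasetId, timestamp FROM "{table_name}" '
        "WHERE timestamp > ? ORDER BY timestamp",
        (last_seen,),
    ).fetchall()
    if not rows:
        return 0

    for dataset_id, row_timestamp in rows:
        handle_row(dataset_id)
        last_seen = row_timestamp  # remember the last row processed

    # One checkpoint row per (database, table) pair, saved when processing ends.
    conn.execute(
        "INSERT OR REPLACE INTO ingestProcess "
        "(databaseName, tableName, lastTimestamp) VALUES (?, ?, ?)",
        (database_name, table_name, last_seen),
    )
    conn.commit()
    return len(rows)

# Hypothetical in-memory stand-ins for the meta-data store and ingest tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingestProcess (databaseName TEXT, tableName TEXT, "
    "lastTimestamp INTEGER, PRIMARY KEY (databaseName, tableName))"
)
conn.execute("CREATE TABLE datasetWide (datasetId TEXT, timestamp INTEGER)")
conn.executemany("INSERT INTO datasetWide VALUES (?, ?)", [("a", 1), ("b", 2)])

seen = []
first_run = ingest_new_rows(conn, "metaStore", "datasetWide", seen.append)
conn.execute("INSERT INTO datasetWide VALUES ('c', 3)")
second_run = ingest_new_rows(conn, "metaStore", "datasetWide", seen.append)
```

The second call picks up only the row added after the first checkpoint was saved, which is the resume behaviour the table is meant to support.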

FIGURE 16. Ingest tables

Data ingest tables:

received
    dataSupersetName    char(32)        <pk>
    complete            binary(16)
    incomplete          binary(16)
    received            char(1)         <fk1>
    science             char(1)         <fk2>

ingestProcess
    databaseName        sysname         <pk>
    tableName           sysname         <pk>
    lastTimestamp       timestamp       <fk1,fk2,fk3,fk4,fk5,fk6>

The remaining tables are from the meta-data stores. Only one of the meta-data stores is shown, and only one of the three narrow dataset properties tables is shown (datasetPropertiesInt).

label
    databaseName        varchar(30)     <pk>
    tableName           varchar(30)     <pk>
    columnName          varchar(30)     <pk>
    value               char(6)         <pk,ak1>
    language            char(2)         <pk,ak2>
    shortText           char(15)
    mediumText          varchar(255)
    longText            text

environment
    time                datetime        <pk>
    attribute           varchar(30)     <pk>
    value               float
    timestamp           timestamp       <ak>

observingLog
    time                datetime
    identity            <Undefined>
    continuation        tinyint
    comment             varchar(255)
    timestamp           timestamp       <ak>

publications
    observingProgramId  varchar(20)     <pk,ak1>
    adsReference        varchar(2)      <pk>
    timestamp           timestamp       <ak2>

observingPrograms
    programId           varchar(?)
    programText         text
    timestamp           timestamp       <ak>

datasetPropertiesInt
    attribute           varchar(30)
    datasetId           binary(8)
    extension           integer
    value               integer
    timestamp           timestamp       <ak>

datasetWide
    datasetId           binary(8)
    dataLabel           varchar(71)
    instrument          varchar(71)
    creationDate        datetime
    releaseDate         datetime
    timestamp           timestamp       <ak>

Relationship labels in the figure: value = received, value = science, and timestamp = lastTimestamp (six times).

The columns of table ingestProcess are shown in Table 40 on page 96.

3.1.2 Table: received

The received table is used to track the availability of data in the archive. The current state of each data superset is recorded in this table. Both simple datasets and associated data supersets will be recorded in this table.

The columns of table received are shown in Table 41 on page 96.

The complete and incomplete fields indicate the status of each of the data supersets on each type of archive media. These fields are used to control the migration from one media type to another, allowing software to easily determine which datasets need to be migrated to a media type. The complete field indicates which media types contain a complete copy of the raw data. The incomplete field indicates which media types contain at least some of the data for the data superset, but which do not contain a complete copy of the data (this is only possible in the situation where the raw data is stored in more than one file, as is the case for associated data supersets). Each bit in the bit fields represents a media type.

TABLE 40. Columns of table ingestProcess

Name | Comment | Data Type | Mandatory | Primary | Foreign Key
databaseName | The name of the meta-data database containing the table. | sysname | TRUE | TRUE | FALSE
tableName | The meta-data table processed by the gsaIngest program. | sysname | TRUE | TRUE | FALSE
lastTimestamp | | timestamp | FALSE | FALSE | TRUE

TABLE 41. Columns of table received

Name | Comment | Data Type | Mandatory | Primary | Foreign Key
dataSupersetName | The name of the data superset. | char(32) | TRUE | TRUE | FALSE
complete | This flag indicates the types of archive media which have complete copies of the raw data for the data superset. See the text for a description of the values for this field. | binary(16) | TRUE | FALSE | FALSE
incomplete | This bit field indicates the types of media which contain at least some of the raw data for the data superset, but which do not contain a complete set of the data. See the text for a description of this field. | binary(16) | TRUE | FALSE | FALSE
received | This flag indicates if the data for the data superset is available from the archive. See the text for possible values of this flag. | char(1) | FALSE | FALSE | TRUE
science | This flag indicates if the data superset has been designated a science data superset. See the text for possible values for this flag. | | | |

Data ingest software

The current definitions of the bits of interest to the GSA are:

0x0001: DVD
0x0002: CD-ROM
0x0004: Magnetic disk

Note that all zeros in the complete field does not necessarily indicate that the archive center does not have a complete copy of the data. All of the files may be available, but spread over more than one type of media. The received flag indicates if all of the data is available.
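As a rough illustration of how software might test these bit fields when deciding what to migrate, the sketch below models the three listed media bits with Python's enum.IntFlag. The names Media and needs_migration are hypothetical, and the bit field is treated as a plain integer rather than a Sybase binary(16) value, whose byte layout is not specified here.

```python
from enum import IntFlag

class Media(IntFlag):
    """Media-type bits as listed in the text."""
    DVD = 0x0001
    CD_ROM = 0x0002
    MAGNETIC_DISK = 0x0004

def needs_migration(complete: int, target: Media) -> bool:
    """Return True when the target media type holds no complete copy yet."""
    return not (complete & target)

# A data superset whose only complete copy is on magnetic disk.
complete_bits = Media.MAGNETIC_DISK
to_dvd = needs_migration(complete_bits, Media.DVD)
to_disk = needs_migration(complete_bits, Media.MAGNETIC_DISK)
```

Here to_dvd is True and to_disk is False: the superset still has to be copied to DVD but is already complete on magnetic disk.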

The values of the received flag can be "Y" (yes) if all data for the data superset is available, "N" (no) if none of the data for the data superset is available, or "I" (incomplete) if some files for a data superset are available and some are not.
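The three values of the received flag can be expressed as a small classification function. This is only a sketch under the assumption that the number of files making up a data superset, and the number of those currently available, are both known; the function name and its parameters are hypothetical.

```python
def received_flag(files_available: int, files_total: int) -> str:
    """Map file availability to the received flag: "Y", "N" or "I"."""
    if files_total <= 0 or not 0 <= files_available <= files_total:
        raise ValueError("file counts are inconsistent")
    if files_available == files_total:
        return "Y"  # all data for the data superset is available
    if files_available == 0:
        return "N"  # none of the data is available
    return "I"      # some files are available and some are not
```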

Possible values for the science flag are "Y" (yes) for science datasets and " " (not checked) for datasets which have not been evaluated. The values for datasets which have been designated as not science will be determined when the gemScience program is designed.

3.2 Received table access

The received table will be accessed by several other modules. To simplify access to the table, the gemRec library will be created. This library will provide functions to:

- Select rows from the table based on various search criteria:
    - the value of the science field
    - the value of the received field
    - a range of dataset names
- Insert new rows into the table.
- Remove rows from the table.
- Update rows in the table.
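The operations listed for the gemRec library could look something like the following sketch. It is not the gemRec interface, which is not specified here: the class and method names are hypothetical, sqlite3 stands in for Sybase, the bit fields are stored as integers, and the dataset names are invented.

```python
import sqlite3

class ReceivedTable:
    """Hypothetical wrapper around the received table."""

    COLUMNS = {"complete", "incomplete", "received", "science"}

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS received ("
            "dataSupersetName TEXT PRIMARY KEY, complete INTEGER NOT NULL, "
            "incomplete INTEGER NOT NULL, received TEXT, science TEXT)"
        )

    def select(self, science=None, received=None, name_range=None):
        """Return names matching any combination of the search criteria."""
        clauses, params = [], []
        if science is not None:
            clauses.append("science = ?")
            params.append(science)
        if received is not None:
            clauses.append("received = ?")
            params.append(received)
        if name_range is not None:
            clauses.append("dataSupersetName BETWEEN ? AND ?")
            params.extend(name_range)
        sql = "SELECT dataSupersetName FROM received"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        sql += " ORDER BY dataSupersetName"
        return [row[0] for row in self.conn.execute(sql, params)]

    def insert(self, name, complete=0, incomplete=0, received="N", science=" "):
        self.conn.execute(
            "INSERT INTO received VALUES (?, ?, ?, ?, ?)",
            (name, complete, incomplete, received, science),
        )

    def update(self, name, **fields):
        """Update the named columns of one row; column names are checked."""
        unknown = set(fields) - self.COLUMNS
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        if not fields:
            return
        assignments = ", ".join(f"{column} = ?" for column in fields)
        self.conn.execute(
            f"UPDATE received SET {assignments} WHERE dataSupersetName = ?",
            (*fields.values(), name),
        )

    def remove(self, name):
        self.conn.execute(
            "DELETE FROM received WHERE dataSupersetName = ?", (name,)
        )

# Invented dataset names, for illustration only.
table = ReceivedTable(sqlite3.connect(":memory:"))
table.insert("GN-0001")
table.insert("GN-0002", received="Y", science="Y")
table.update("GN-0001", received="I")
```

Keeping all SQL inside one wrapper means the other modules never depend on the table layout directly, which is the simplification the library is intended to provide.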
