CAPÍTULO 2: FUENTES PRIMARIAS Y TIPOS DOCUMENTALES
2.1 Breve historia del desarrollo de los ferrocarriles en Colombia
1 2015 1245 1180 2887 1831.88 -564.44 10 2455 2433 1870 3277.5 2508.88 112.56 100 2324.5 3137.5 2052.5 2262.5 2448.75 52.44 1000 2307.5 3363 2645 2867.5 2795.75 399.44 col. mean 2280 2544.63 1936.88 2823.75 µ=2396.31 αj -116.31 148.31 -459.44 427.44
Table 7.7: Computation of effects for ydi f f sT
unexplained = 40.12%
As for the first response variable analysed, yAutUpdT, the model is not accurate
enough to explain the total variation, and given what we said about the causes that generate these time differences, it is understandable that the model cannot precisely
explain the behaviour of ydi f f sT. However, comparing the explained variation for
all the response variables analysed, we see that the term that is responsible for the
inaccuracy of the model with regards to the synchronisation time (yAutUpdT) is mainly
the time needed for the update file creation phase, yLogGenT.
7.6 Conclusions of Performance Analysis
In this section we collected and summarised the main observations we did throughout the previous sections of the performance study. As regards the way the different sub- phases contribute to the synchronisation time, we have seen that the main responsible is the update file creation phase (LogGenT). This phase is much more time consum- ing than the other ones and possible optimisations can be done improving the way the DBWatcher interacts with the LogMiner. The second largest time contribution is GRCSNotT, which is due to a synchronous call from the DBWatcher to the GRCS. An asynchronous call could decrease the value of GRCSNotT but it would affect the way update requests are serialized by the GRCS. This is the reason why we did not deal with it in the current release of the software. All the other sub-phases, including the update file transfer, scale well with respect to both of the factors, considering the levels used for each of them.
As regards the model used to allocate the total variation of the response variables from their mean value, we have seen that the model accurately explains the total
variations in all the cases but yAutUpdTand yLogGenT. In particular, the inaccuracy of the
model as regards yLogGenT and some time contributions we did not take into account
can be considered the causes of the inaccuracy of the model for the response variable yAutUpdT. In particular, the delay for the synchronous call from the DBWatcher to the
GRCS (message 3.3 in Figure 7.2) and the variability of the polling frequency are not included in the model.
Chapter 8
CONStanza and Oracle Streams
for Conditions Database
Replication
As we have already seen in Chapter 4, one of the main use cases for CONStanza and the replication from Oracle to MySQL is the distribution of conditions databases. The four LHC experiments, and ATLAS in particular, use relational databases to save event metadata necessary for reconstruction analyses, what is generally known as conditions data. At Tier-0, conditions data are stored in Oracle, and replicated to Tier-1 sites to other Oracle databases using Oracle Streams. This replication is supported by the LCG-3D project, as we mentioned in Section 4.4. For what concerns the replication from Tier-1 sites to Tier-2 sites, more work needs to be done in order to provide a reliable solution for replicating data to open source databases.
In this chapter we show how CONStanza can be used in this scenario. CON- Stanza supports the synchronisation from Oracle to MySQL, and can be easily ex- tended to support the synchronisation to other open source or commercial databases. The COOL API, introduced in Section 4.2.3, will be used to insert and retrieve data in a 3-tiers database architecture.
8.1 Testbed setup
The testbed we used for these functional tests comprises an Oracle database at CERN (Tier-0), an Oracle database at CNAF (Tier-1), and a MySQL database at INFN-Pisa
(Tier-2). The two Oracle databases have been connected and kept synchronised us- ing Oracle Streams, while CONStanza has been deployed at CNAF (GRCS + LRCS Master) and at INFN-Pisa (LRCS Slave) for the heterogeneous synchronisation (from Oracle to MySQL). The COOL API has been used to develop two programs to in- sert data at the Tier-0 Oracle database and to read these data at the Tier-2 MySQL database.
8.1.1 Streams setup
For configuring Streams on the two Oracle databases we used several scripts pro- vided by the LCG-3D project (see Section 4.4) and available at the project web page [22]. The scripts use Oracle predefined PL/SQL packages that simplify the creation and the setup of database objects; for example, most of the procedures used to setup the Capture, Propagation and Apply processes are included in a package, DBMS_STREAMS_ADM, that is already available in Oracle. Assuming that two Oracle databases (single instance) are running at the source and at the destination sites (Tier-
0 and Tier-1), the Streams setup consists of the following steps1:
• Creating the Streams administrator user and grant him the necessary privileges.
This operation is done both on the source and on the destination database. This administrator account will be used to create database objects in the next steps.
• Creating a database link between the source and the destination database. This
link is necessary in order to propagate LCRs from a source queue to a destina- tion queue.
• Enabling supplemental logging on the source database: this must be done for
the LogMiner to work properly during the Capture activities.
• Creating the source queue and the Capture process at the source database.
Streams replication is integrated with Oracle Advanced Queueing. The Cap- ture process will use the LogMiner to extract DML and DDL changes made on the source database and will enqueue these changes as LCRs into the source queue.
• Creating the Propagation process at the source database. The Propagation pro-
cess is the process in charge of propagating LCRs from the source queue to the destination queue. When creating the Propagation process we have to specify the database link created in the second step.
8.1. Testbed setup 163
• Creating the destination queue and the Apply process at the destination database.
The Apply process is the process that takes LCRs from the destination queue and applies them to the destination database.
• Test the infrastructure. After the previous steps, we tested the infrastructure
by issuing SQL DDL and DML queries on the source database and verifying the correct application at the destination site. A simple test would be that of creating a table with some records on the source database and querying these records at the destination database.
These steps have been successfully done. After preparing the Streams infrastruc- ture we passed to the installation and configuration of the CONStanza servers. 8.1.2 CONStanza setup
At the Tier-1 site, the destination database for Oracle Streams has been used as mas- ter database for CONStanza. Here, a GRCS server and an LRCS server have been installed and configured. After preparing the security infrastructure, installing keys and certificates, we started configuring the servers. In the GRCS server configuration file (refer to the Appendix A for the GRCS configuration file options) we specified a logical database named “LDBCOOL”; when the GRCS server starts, this logical database is automatically subscribed, initially without replicas.
For configuring the master LRCS server, the file lrcs.master.conf has been edited, providing all the necessary options including the details about a database replica named “COOLDB1”. When the master LRCS server starts, after subscribing itself to the GRCS, it automatically subscribes the database replica “COOLDB1” as replica of the logical database “LDBCOOL”.
Once the configuration files have been edited, we started the GRCS server. Before starting the LRCS server, we configured the slave LRCS server at the Tier-2 site. Here, an empty MySQL database named “COOLDB2” was created; this replica will be automatically subscribed as a replica of the logical database “LDBCOOL” as soon as the slave LRCS server starts.
Then, we started the slave LRCS server and, after, the master LRCS server. At this time, all the synchronisation infrastructure is in place, and every change to the Oracle database at Tier-0 will be propagated and applied, first at Tier-1, using Oracle Streams, and then at Tier-2, using CONStanza. The scenario is depicted in Figure 8.1 The thick lines show the data flow, while the thin ones among the RCS servers show communications needed for implementing the update propagation.
CAPTURE Source Queue PROPAGATION Destination Queue APPLY Oracle source DB Oracle dest DB COOLDB1 Master LRCS MySQL COOLDB2 TIER-0 CERN TIER-1 CNAF TIER-2 INFN-Pisa GRCS Slave LRCS COOL insert program
COOL read program
Figure 8.1: Scenario for the replication of conditions databases using Oracle Streams and CONStanza