from a remote site to the site that issued the queries. To compare different vertical fragmentation schemata, we want to compare how each of them affects the transportation costs. We can therefore simplify the cost model of Section 3.3 as follows:
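$$\mathrm{cost}_{\lambda}(Q^m) = \sum_{\theta=1}^{k} \sum_{u=1}^{v} \mathrm{need}_{\theta}(F_u) \cdot c_{\theta\theta'} \tag{4}$$
Note that the cost factor $c_{\theta\theta'} = 0$ if $\theta = \theta'$.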
Example 7.2. Assume a fragment $F$ is accessed by three queries issued from three different sites $a$, $b$ and $c$, respectively. If we allocate fragment $F$ to site $c$, i.e., $\lambda(F) = c$, then the cost of all queries that access this fragment can be calculated by summing the need at site $a$ multiplied by the cost factor $c_{ca}$ and the need at site $b$ multiplied by the cost factor $c_{cb}$. Using Formula (4) we have:
$$\mathrm{cost}_{\lambda(F)=c}(Q^m) = \mathrm{need}_a(F) \cdot c_{ca} + \mathrm{need}_b(F) \cdot c_{cb}$$
□
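The calculation in Example 7.2 can be sketched in Python. The need values and cost factors below are made-up numbers for illustration only, the function name `transport_cost` is hypothetical, and the cost factors are assumed to be symmetric, which the text does not state.

```python
def transport_cost(need, cost_factor, allocated_site):
    """Sum need_theta(F) * c_{allocated, theta} over all issuing sites theta.

    need: dict mapping an issuing site to the need of fragment F at that site
    cost_factor: dict mapping a (site, site) pair to its cost factor
    """
    total = 0
    for site, amount in need.items():
        if site == allocated_site:
            continue  # local access: the cost factor is 0
        total += amount * cost_factor[(allocated_site, site)]
    return total


# Hypothetical need of fragment F at the three issuing sites.
need = {"a": 40, "b": 25, "c": 60}

# Hypothetical cost factors, assumed symmetric for this sketch.
pairs = {("a", "b"): 5, ("a", "c"): 3, ("b", "c"): 2}
cost_factor = {}
for (s, t), c in pairs.items():
    cost_factor[(s, t)] = c
    cost_factor[(t, s)] = c

# Allocation to site c, as in Example 7.2: need_a * c_ca + need_b * c_cb
cost_at_c = transport_cost(need, cost_factor, "c")
print(cost_at_c)  # 40*3 + 25*2 = 170
```

Calling `transport_cost` once per candidate site and comparing the results is the comparison that the heuristics in the next section build on.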
7.3 A Heuristic Approach for Vertical Fragmentation
As in [88], we assume a simple transaction model in which the system collects the information at the issuing site of the query and executes the query there. In this model we can evaluate the costs of allocating a single attribute to network nodes and then decide by choosing the site that leads to the lowest query costs. Moreover, following our discussion of how fragmentation affects query costs, allocating fragments to network nodes according to the cost minimization heuristics already determines the location assignment, provided that an optimal location assignment for the queries was given prior to the fragmentation.
Taking the simplified cost model introduced above, we now analyze the relationships between the cost, the pay and the request of an attribute. We compute the following formulae:
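$$\begin{aligned}
\mathrm{cost}_{\lambda(a_i)=\theta}(Q^m) &= \sum_{\theta'=1,\,\theta'\neq\theta}^{k} \mathrm{need}_{\theta'}(a_i) \cdot c_{\theta\theta'} \\
&= \sum_{\theta'=1,\,\theta'\neq\theta}^{k} \; \sum_{j=1,\,\lambda(Q_j)=\theta'}^{m} n \cdot \ell_i \cdot f_{ji} \cdot c_{\theta\theta'} \\
&= n \cdot \ell_i \cdot \sum_{\theta'=1,\,\theta'\neq\theta}^{k} \; \sum_{j=1,\,\lambda(Q_j)=\theta'}^{m} f_{ji} \cdot c_{\theta\theta'} \\
&= n \cdot \ell_i \cdot \sum_{\theta'=1,\,\theta'\neq\theta}^{k} \mathrm{request}_{\theta'}(a_i) \cdot c_{\theta\theta'} \\
&= n \cdot \ell_i \cdot \mathrm{pay}_{\theta}(a_i).
\end{aligned}$$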
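The chain of equalities can be checked numerically with a small Python sketch. All numbers are invented for illustration; `n` and `length` (standing for $\ell_i$) are treated simply as positive constants whose meaning is defined elsewhere in the source, and the cost factors are assumed symmetric.

```python
n = 1000      # hypothetical value of n
length = 8    # hypothetical value of l_i for the attribute a_i

sites = [1, 2, 3]

# Hypothetical queries using a_i: (issuing site lambda(Q_j), frequency f_ji)
queries = [(1, 10), (2, 4), (2, 6), (3, 5)]

# Hypothetical cost factors, assumed symmetric for this sketch.
_pairs = {(1, 2): 2, (1, 3): 7, (2, 3): 3}


def c(theta, theta_prime):
    """Cost factor between two sites; 0 when the sites are the same."""
    if theta == theta_prime:
        return 0
    return _pairs.get((theta, theta_prime), _pairs.get((theta_prime, theta)))


def request(site):
    """Sum of the frequencies of the queries issued at `site`."""
    return sum(f for s, f in queries if s == site)


def need(site):
    """need = sum of n * l_i * f_ji over the queries issued at `site`."""
    return sum(n * length * f for s, f in queries if s == site)


def pay(theta):
    """pay_theta(a_i) = sum over theta' != theta of request * cost factor."""
    return sum(request(t) * c(theta, t) for t in sites if t != theta)


def cost(theta):
    """First line of the derivation: sum of need * cost factor."""
    return sum(need(t) * c(theta, t) for t in sites if t != theta)


for theta in sites:
    # Both sides of the derivation agree for every candidate site.
    print(theta, cost(theta), n * length * pay(theta))
```

Because `n * length` is the same for every candidate site, ranking sites by `cost` and ranking them by `pay` give the same order.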
The above formulae give rise to two alternative heuristics for the allocation of an attribute $a_i$ ($i = 1, \dots, n$).
7.3. A HEURISTIC APPROACH FOR VERTICAL FRAGMENTATION Hui Ma
– The first heuristic allocates $a_i$ to a network node $N_w$ such that $\mathrm{pay}_w(a_i)$ is minimal, i.e., we choose a network node in such a way that the total transport costs for all queries arising from the allocation are minimized.
– The second heuristic allocates $a_i$ to a network node $N_w$ such that $\mathrm{request}_w(a_i)$ is maximal, i.e., we choose the network node with the highest request of the attribute $a_i$. This guarantees that there are no transport costs associated with the data of attribute $a_i$ for those queries that need the data of $a_i$ most frequently. In addition, the availability of the data of attribute $a_i$ will be maximized.
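The two heuristics can be compared with a short Python sketch for a single attribute. The request values and cost factors are invented, and the function names are hypothetical. The sketch also illustrates one elementary observation: when all cost factors between distinct sites are equal, the pay at a site is a constant times the total request minus the request at that site, so minimal pay and maximal request select the same site.

```python
def pay(request, cost_factor, theta):
    """pay_theta = sum over other sites of request * cost factor."""
    return sum(r * cost_factor[theta][t]
               for t, r in request.items() if t != theta)


def min_pay_site(request, cost_factor):
    """First heuristic: the site with minimal pay (ties: first site wins)."""
    return min(request, key=lambda theta: pay(request, cost_factor, theta))


def max_request_site(request):
    """Second heuristic: the site with maximal request (ties: first wins)."""
    return max(request, key=request.get)


# Hypothetical request of one attribute at three sites.
request = {1: 10, 2: 9, 3: 8}

# Hypothetical cost factors: site 1 is "far" from sites 2 and 3.
nonuniform = {1: {1: 0, 2: 5, 3: 5},
              2: {1: 5, 2: 0, 3: 1},
              3: {1: 5, 2: 1, 3: 0}}

# Uniform cost factors between distinct sites.
uniform = {1: {1: 0, 2: 1, 3: 1},
           2: {1: 1, 2: 0, 3: 1},
           3: {1: 1, 2: 1, 3: 0}}

print(min_pay_site(request, nonuniform))  # pays are 85, 58, 59 -> site 2
print(max_request_site(request))          # site 1
print(min_pay_site(request, uniform))     # pays are 17, 18, 19 -> site 1
```

With the non-uniform cost factors the heuristics disagree (site 2 versus site 1), whereas with uniform cost factors they agree.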
Taking the first heuristic, which first occurred in [78], we perform vertical fragmentation using the six steps below. Read and write queries are distinguished because replication is not considered at this stage. The second heuristic is easily formulated. It is actually a special case of the first heuristic when a simple query environment is assumed.
1. Take the 20% most frequently used queries $Q^m$.
2. Optimize all the queries and construct an AUFM for each database type $E$ based on the queries.
3. Calculate the request at each site for each attribute to construct an Attribute Request Matrix.
4. Calculate the pay at each site for each attribute to construct an Attribute Pay Matrix.
5. Cluster all attributes to the site which has the lowest value for pay.
6. Attach the primary key to each of the fragments.
An Attribute Usage Frequency Matrix (AUFM) is used to record the frequencies of queries, the set of atomic attributes accessed by the queries, and the sites that issue the queries. Each row in the AUFM represents one query $Q_j$; the head of each column contains the set of atomic attributes of a given representation type $t_E$, the site issuing the queries and the frequency of the queries. We do not distinguish between references and attributes, but record them in the same matrix. The values in a column indicate the frequencies $f_{ji}$ of the queries $Q_j$ that use the corresponding atomic attribute $a_i$, grouped by the site that issues the queries. We treat any two queries issued at different sites as different queries, even if the queries themselves are the same. The AUFM is constructed from the optimized queries in order to record all the attributes returned by queries as well as all the attributes used in join predicates. If a query returns all the information of an attribute, then all its sub-attributes are accessed by the query.
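One possible in-memory representation of such a matrix, and of the Attribute Request Matrix derived from it, is sketched below in Python. The class `AufmRow`, the function `request_matrix`, and all query data are hypothetical; attributes are simplified to flat names rather than the nested attributes and references of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AufmRow:
    """One AUFM row: a query, its issuing site, its frequency, and the
    atomic attributes it uses (returned or used in join predicates)."""
    query: str
    site: int
    frequency: int
    attributes: frozenset


def request_matrix(rows, attributes, sites):
    """request[site][attr] = sum of the frequencies of the queries issued
    at `site` that use `attr`."""
    req = {s: {a: 0 for a in attributes} for s in sites}
    for row in rows:
        for a in row.attributes:
            req[row.site][a] += row.frequency
    return req


attributes = ["a1", "a2", "a3"]
sites = [1, 2]

# Hypothetical workload. Q1 is issued at both sites and is therefore
# recorded as two different rows.
rows = [
    AufmRow("Q1", 1, 20, frozenset({"a1", "a2"})),
    AufmRow("Q2", 1, 5, frozenset({"a2", "a3"})),
    AufmRow("Q3", 2, 10, frozenset({"a1", "a3"})),
    AufmRow("Q1", 2, 3, frozenset({"a1", "a2"})),
]

request = request_matrix(rows, attributes, sites)
print(request[1])  # {'a1': 20, 'a2': 25, 'a3': 5}
print(request[2])  # {'a1': 13, 'a2': 3, 'a3': 10}
```

Storing one row per (query, site) pair mirrors the convention that the same query issued at two sites counts as two queries.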
This procedure is formally described in Algorithm 7.5, a vertical fragmentation algorithm that takes the AUFM as input. For each atomic attribute at each site, the algorithm first calculates the request and then calculates the pay. Finally, each atomic attribute is clustered to the site that has the lowest value of the pay. In the process, a set of path expressions is obtained for each vertical fragment, and vertical fragmentation is performed using these sets of paths. A vertical fragmentation schema and an allocation schema are thus obtained simultaneously.
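As an informal illustration of the clustering step described above, the following Python sketch starts from a given request matrix, computes the pay matrix, and groups attributes by the site with minimal pay. It is not the algorithm of the source: attributes are flat names instead of path expressions, the key attribute `"id"` and all numbers are invented, and ties are broken by taking the first site.

```python
def pay_matrix(request, cost_factor):
    """pay[theta][a] = sum over other sites t of request[t][a] * c[theta][t]."""
    sites = list(request)
    return {
        theta: {
            a: sum(request[t][a] * cost_factor[theta][t]
                   for t in sites if t != theta)
            for a in request[theta]
        }
        for theta in sites
    }


def cluster_by_min_pay(request, cost_factor, primary_key):
    """Assign each attribute to the site with the lowest pay, then attach
    the primary key to every non-empty fragment."""
    pay = pay_matrix(request, cost_factor)
    fragments = {site: [] for site in request}
    attributes = next(iter(request.values()))
    for a in attributes:
        best = min(pay, key=lambda theta: pay[theta][a])  # ties: first site
        fragments[best].append(a)
    return {site: list(primary_key) + attrs
            for site, attrs in fragments.items() if attrs}


# Hypothetical Attribute Request Matrix for two sites and three attributes.
request = {1: {"a1": 20, "a2": 25, "a3": 5},
           2: {"a1": 13, "a2": 3, "a3": 10}}

# Hypothetical cost factors between the two sites.
cost_factor = {1: {1: 0, 2: 4},
               2: {1: 4, 2: 0}}

allocation = cluster_by_min_pay(request, cost_factor, primary_key=["id"])
print(allocation)  # {1: ['id', 'a1', 'a2'], 2: ['id', 'a3']}
```

The returned dictionary plays the role of both schemata at once: its keys give the allocation and its values give the fragments.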
Algorithm 7.5. [Cost-Based Vertical Fragmentation Algorithm]
Input: the AUFM of database type $E$
atomic($E$) = $\{a_1, \dots, a_n\}$ /* a set of atomic attributes */
PATH($E$) = $\{path_1, \dots, path_n\}$ /* a set of paths of all atomic attributes */
a set of network nodes $N = \{1, \dots, k\}$ with cost factors $c_{ij}$