In stream query processing, many assumptions that form the basis of traditional database systems do not hold [CcC+02, GÖ03]. Not only are the queries inherently long-running, but data are pushed to the query evaluation engine asynchronously, rather than being pulled from a permanent store on demand. Thus, instead of simply answering streams of queries over static data, data can be streamed over queries as well. As the data characteristics and arrival rates of input streams fluctuate over time, and the available resources are subject to change, AQP techniques have been employed to achieve dependability and reasonable performance. The techniques discussed next operate over streams produced by sensors and other autonomous remote sources but, as in all techniques presented to this point, the query processing takes place in a centralised manner (Table 4.4).
An initial blending of Eddies [AH00] and Stems [RDH03] yielded the CACQ (Continuously Adaptive Continuous Queries) technique [MSHR02], which performs adaptations over many queries running in parallel. Thus the focus is slightly shifted towards inter-query rather than intra-query adaptivity, which is out of the scope of this thesis. CACQ shares the characteristics of Eddies and Stems: it adapts with intra-operator frequency, its adaptations affect the complete query plan, it monitors the operator cost and the intermediate data cardinality (to infer the operator selectivity), and it revises the execution order when, in the light of updated information, this becomes suboptimal. Stems are used only to share state, and their capability to replace operator implementations is not exploited in the context of CACQ. Nevertheless, CACQ provides a promising solution for adapting to changing workload, which affects operator cost, and to skewed value distribution within streams, which affects operator selectivity.
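The routing idea that CACQ inherits from Eddies can be illustrated with a minimal sketch. It is not the CACQ implementation: it assumes a single query made of independent filter operators, and it replaces the routing policy of the original proposals with a deterministic greedy rule that always sends a tuple to the pending operator with the lowest observed pass rate. The names `Operator` and `route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Operator:
    """A filter operator that records how many tuples it saw and passed."""
    name: str
    predicate: Callable[[Any], bool]
    seen: int = 0
    passed: int = 0

    def selectivity(self) -> float:
        # Observed pass rate; operators with no history are assumed to pass everything.
        return self.passed / self.seen if self.seen else 1.0

    def apply(self, tup: Any) -> bool:
        ok = bool(self.predicate(tup))
        self.seen += 1
        self.passed += int(ok)
        return ok


def route(tup: Any, operators: List[Operator]) -> bool:
    """Route one tuple through every operator, choosing the order per tuple.

    The operator with the lowest observed pass rate is tried first, so the
    execution order is revised whenever the monitored statistics change.
    Returns True if the tuple satisfies all operators.
    """
    pending = list(operators)
    while pending:
        op = min(pending, key=lambda o: o.selectivity())
        pending.remove(op)
        if not op.apply(tup):
            return False  # tuple dropped; remaining operators are skipped
    return True
```

For a stream whose values drift so that a different filter becomes the most restrictive one, the order in which `route` visits the operators changes without any query-level re-optimisation step, which is the behaviour the paragraph above attributes to intra-operator adaptation.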
[CF03] presents PSoup, an extension to CACQ. PSoup’s query engine permits queries to refer to data that arrived before their submission. The basic concept is to treat data streams and queries in the same way, using and extending the Stems and Eddies technology. Stems in PSoup store queries as well as data, and eddies route queries and not only data tuples, although neither Stems nor Eddies were initially designed for this. It is important to note that the software platform for all the above has been TelegraphCQ [CCD+03, KCC+03].
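The symmetric treatment of data and queries can be sketched as follows. This is only an illustration of the concept, assuming selection-only queries expressed as Python predicates and ignoring windows, joins and result materialisation policies; the class `SymmetricStore` and its methods are hypothetical names, not part of PSoup or TelegraphCQ.

```python
from typing import Any, Callable, Dict, List


class SymmetricStore:
    """Keeps both arrived tuples and registered queries as state.

    A new query is probed against the stored data, and a new tuple is
    probed against the stored queries, so both are handled the same way.
    """

    def __init__(self) -> None:
        self.data: List[Any] = []
        self.queries: Dict[str, Callable[[Any], bool]] = {}
        self.results: Dict[str, List[Any]] = {}

    def add_query(self, qid: str, predicate: Callable[[Any], bool]) -> None:
        # The query may refer to data that arrived before its submission.
        self.queries[qid] = predicate
        self.results[qid] = [t for t in self.data if predicate(t)]

    def add_tuple(self, tup: Any) -> None:
        # The tuple is matched against every standing query.
        self.data.append(tup)
        for qid, predicate in self.queries.items():
            if predicate(tup):
                self.results[qid].append(tup)
```

Registering a query after some tuples have arrived yields the matching historical tuples at once, and later arrivals extend the same result list.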
dQUOB conceptualises streaming data with a relational data model, thus allowing the stream to be manipulated through SQL queries [PS00, PS01]. For query execution, runtime components called quoblets, which correspond to query operators, are embedded into the data streams. Changes in data stream behaviour are detected by a statistical sampling algorithm that runs periodically and gathers statistical information about the selectivities of the operators into an equi-depth histogram. Based on this information, the quality of the quoblet order is re-assessed, and the system may choose to reorder operators on the fly.
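The following sketch shows one way in which sampled values, an equi-depth histogram and operator reordering could fit together. It is an assumption-laden illustration rather than the dQUOB algorithm: operators are taken to be simple `attribute <= threshold` filters described by `(name, attribute, threshold)` triples, selectivity is estimated coarsely at bucket granularity, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def equi_depth_bounds(sample: Sequence[float], buckets: int) -> List[float]:
    """Return bucket upper bounds such that each bucket holds roughly
    the same number of sampled values (sample must be non-empty)."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[max(0, (i * n) // buckets - 1)] for i in range(1, buckets + 1)]


def estimate_le(bounds: List[float], threshold: float) -> float:
    """Coarse selectivity estimate of `value <= threshold`: the fraction
    of buckets whose upper bound does not exceed the threshold."""
    return bisect_right(bounds, threshold) / len(bounds)


def reorder(operators: List[Tuple[str, str, float]],
            histograms: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """Order filters so that the one expected to pass the fewest tuples runs first."""
    return sorted(operators, key=lambda op: estimate_le(histograms[op[1]], op[2]))
```

Rebuilding the histograms from a fresh sample and calling `reorder` again corresponds to the periodic re-assessment described above; a real system would also weigh operator cost, which this sketch leaves out.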
Two other, related adaptive proposals for stream systems are Chain [BBDM03] and the STREAM (Stanford Stream Data Manager) system [MWA+03, BMM+04]. Chain focuses on adapting to data arrival rates, as does, for example, XJoin [UF00]. However, its aim is to keep memory usage at a minimum level rather than to avoid idle CPU time. Thus Chain tries to control the size of the intermediate results stored in the input queues of operators, and monitors these queue sizes instead of monitoring the resource connections. Nevertheless, it uses operator rescheduling as its response form to achieve this.
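A queue-driven scheduler of this kind can be sketched as below. The sketch does not reproduce the Chain scheduling algorithm; it substitutes a simpler greedy rule that, among the operators with non-empty input queues, runs the one assumed to remove the most data per unit of processing time. The `drop_rate` parameter, the linear pipeline layout and all names are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Callable, List


class QueuedOperator:
    """A filter operator with its own input queue."""

    def __init__(self, name: str, predicate: Callable[[Any], bool], drop_rate: float):
        self.name = name
        self.predicate = predicate
        # Assumed statistic: share of tuples removed per unit of processing time.
        self.drop_rate = drop_rate
        self.queue: deque = deque()


def queued(operators: List[QueuedOperator]) -> int:
    """Total number of tuples currently held in input queues."""
    return sum(len(op.queue) for op in operators)


def step(operators: List[QueuedOperator], results: List[Any]) -> bool:
    """Process one tuple at the ready operator with the highest drop rate.

    Operators form a linear pipeline in list order. Returns False when
    every queue is empty and there is nothing to schedule.
    """
    ready = [op for op in operators if op.queue]
    if not ready:
        return False
    op = max(ready, key=lambda o: o.drop_rate)
    tup = op.queue.popleft()
    if op.predicate(tup):
        nxt = operators.index(op) + 1
        if nxt < len(operators):
            operators[nxt].queue.append(tup)  # hand over to the next operator
        else:
            results.append(tup)  # end of pipeline
    return True
```

Calling `step` in a loop while new tuples are appended to the first queue lets `queued` be tracked over time, which is the quantity such a scheduler tries to keep small.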
[MWA+03] uses the same technique as Chain in the context of the STREAM system. It also allows query operators to reconfigure explicitly the memory they are allocated. When there is not enough memory, the system starts evaluating queries over streams in an approximate manner. The memory allocation policy is re-evaluated in such a way that the precision of the final results is maximised. In general, approximation is not acceptable in traditional query processing, but in many wide-area scenarios accessing remote data there is no requirement for 100% accuracy. STREAM, like Chain, also monitors the operator selectivities to obtain their actual values, and thus to enable the determination of the optimal execution order [BMM+04]. However, heuristics are employed, as an exhaustive search of the possible orderings is not considered owing to its complexity. AQP in STREAM attempts to balance three conflicting objectives: low runtime overhead, high speed of adaptivity, and good convergence to the solution of a static system if parameters stabilise. Different variants of operator ordering algorithms are presented, which trade, to varying extents, one of these objectives in favour of the others. If the runtime overhead decreases, the convergence and the adaptivity speed degrade. Algorithms with worse convergence properties adapt more quickly and incur lower overhead. Quick adaptations may not converge satisfactorily and may be quite costly as well.

Technique | Monitoring: focus | Monitoring: frequency | Assessment: issue | Response: form | Response: impact | Architecture: data local. | Architecture: QP local. | Architecture: adapt. local.
river [ADAT+99] | resource performance | intra-operator | workload imbalance | operator reconfiguration | operator | local | central | central
flux [SHCF03] | resource performance | intra-operator | workload imbalance | operator reconfiguration | operator | local | central | central
parad [HM02] | data cardinality, resource pool, memory | inter-operator | insufficient memory | machine rescheduling | query plan | local | central | central
Table 4.5: Summarising table of parallel AQP proposals according to the classifications of Section 4.3.