

5.6 Discussion

The detailed analysis of two datasets in the previous sections highlights the effectiveness of the proposed techniques for view recommendation. In particular, we show that the interestingness of the recommended views improves, in terms of deviation, by (i) automatically generating the best binning on numerical attributes with MuVE, (ii) automatically finding subsets of data with QuRVe, which refines the input query, and (iii) automatically finding two interesting subsets of data by also refining the reference dataset. View-360 seamlessly performs all of these tasks and assists the user in the exploration process. However, we by no means claim to have resolved all open issues in this problem domain. Rather, we believe that View-360 is just an initial step towards a holistic system that effectively recommends interesting views for data exploration. In this section we discuss the lessons learnt from View-360 and some of the open questions related to it.

Figure 5.27: Top-k List with Target View Refinement

Figure 5.28: Top-k List with Comparison View Refinement

Attribute Sets: One of the key factors on which the interestingness of a view depends is the choice of attributes used as measures, dimensions and predicates. Although View-360 searches for interesting views over all combinations of A, M and P, the user provides the sets A, M and P as input. Identifying the meaning and relative importance of attributes is a non-trivial task and depends completely on the semantics of the data. How to decide which attribute belongs to which set remains an open question. One straightforward strategy is to assign all dependent and numeric attributes to the set of measures (M), and all independent and categorical attributes to the set of dimensions (A). However, some attributes are suitable as both predicates and dimensions. In View-360 the choice is left to the user; if the user defines overlapping sets A and P, View-360 assigns the overlapping attributes one role at a time, i.e., if an attribute is used as a dimension it is removed from the predicates, and vice versa.
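The role assignment described under Attribute Sets can be illustrated with a minimal Python sketch. It is not View-360's implementation: the function names, the schema encoding (attribute name mapped to "numeric" or "categorical") and the simplification of "dependent/independent" to the attribute type are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def default_roles(schema: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Split attributes into dimensions (A) and measures (M) by type.

    Assumption: the schema maps each attribute name to either
    "categorical" or "numeric"; dependence is not modelled here.
    """
    dimensions = {name for name, kind in schema.items() if kind == "categorical"}
    measures = {name for name, kind in schema.items() if kind == "numeric"}
    return dimensions, measures


def role_assignments(A: Set[str], P: Set[str]) -> List[Tuple[str, Set[str]]]:
    """Pair each dimension with the predicate attributes usable alongside it.

    An attribute that appears in both A and P plays one role at a time:
    while it serves as the dimension, it is removed from the predicates.
    """
    return [(a, P - {a}) for a in sorted(A)]
```

For example, with A = {"age_group", "gender"} and P = {"age_group", "region"}, the dimension "age_group" is paired with the predicate set {"region"} only, while "gender" keeps both predicate attributes.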

Aggregate Functions: View-360 supports COUNT, SUM and AVG as aggregate functions. When the list of top-k aggregate views is generated, each view is ranked individually. In our analysis, however, we noticed that a single view with a particular aggregate function often fails to tell a complete story about the data. For instance, the single view in Figure 5.16 was not enough to give the whole picture; views with the other aggregate functions, displayed through View-360’s further exploration feature as shown in Figure 5.17, were needed to complete it. In short, understanding an insight requires considering the views under all aggregate functions. It might therefore be a good idea to group together the views that share the same predicates, dimensions and measures but differ in aggregate function, and then rank the groups to obtain the insights.
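The grouping idea suggested above could be sketched as follows. This is a hypothetical illustration, not a feature of View-360: the `View` record, the `rank_groups` function and, in particular, the choice to score a group by the highest deviation among its members are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class View(NamedTuple):
    predicate: str
    dimension: str
    measure: str
    aggregate: str  # "COUNT", "SUM" or "AVG"
    deviation: float


GroupKey = Tuple[str, str, str]


def rank_groups(views: List[View]) -> List[Tuple[GroupKey, List[View]]]:
    """Group views that differ only in aggregate function, then rank groups."""
    groups: Dict[GroupKey, List[View]] = defaultdict(list)
    for view in views:
        groups[(view.predicate, view.dimension, view.measure)].append(view)
    # Assumption: a group is as interesting as its most deviating member.
    return sorted(
        groups.items(),
        key=lambda item: max(v.deviation for v in item[1]),
        reverse=True,
    )
```

Each returned group carries all of its COUNT, SUM and AVG views together, so the user sees them side by side rather than scattered across the top-k list. Other group scores (for example the mean deviation) would be equally plausible.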

Quality of Views: When View-360 was configured to refine the reference dataset, a large boost in the deviation of the recommended views was observed, but at the cost of their semantic quality. Simply searching for two subsets that are completely different from each other on some combination of A, M and F can be extremely noisy and misleading. Automatic refinement results in very restrictive queries that represent a small subset of the data. The power analysis of View-360 checks that the subset passes the minimum criteria; however, this appears to be insufficient to ensure an interesting insight. The smallest subset that passes the power test appears in the top-k paired with every other subset. For instance, in Section 5.5.1, after automatic refinement, the smallest subset turns out to be the patients in age group ‘[0-10)’, and the top-2 views belong to that subset, as shown in Figure 5.27. Figure 5.28 shows the list of top-k views when the refinement is applied to the reference dataset as well; it can be clearly seen that age group ‘[0-10)’ is compared with every other age group that exists in the data. These views have high deviation but are not really interesting semantically. Moreover, such views provide little information gain and are less interesting for the user. How to detect and prune these views is challenging and is an interesting direction for future work.

Ranking Criteria: View-360 ranks the views based on the deviation between the target and comparison views; however, interestingness can also be explored by incorporating other criteria. For instance, in Figure 5.26b, the comparison view in itself showed a unique pattern when compared with the comparison view of Figure 5.26a instead of the target view. This means that if, within one aggregate view, the corresponding target and comparison views are based on different aggregate functions instead of different predicates, something new and interesting may emerge. Moreover, although converting the results into probability distributions is generally useful, in some cases it leads to misleading results, and investigating the data without normalization gives better insights. Along similar lines, other ranking criteria and possibilities need to be explored.
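The effect of normalization on deviation can be seen in a small Python sketch. The Euclidean distance used here is an assumption chosen for illustration; this section does not state which distance function View-360 uses, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Convert aggregate results into a probability distribution."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize a view whose values sum to zero")
    return [v / total for v in values]


def deviation(target: Sequence[float], comparison: Sequence[float],
              normalized: bool = True) -> float:
    """Distance between a target view and a comparison view.

    Assumption: Euclidean distance stands in for the deviation measure.
    """
    if len(target) != len(comparison):
        raise ValueError("views must have the same number of bins")
    if normalized:
        target, comparison = normalize(target), normalize(comparison)
    return math.sqrt(sum((t - c) ** 2 for t, c in zip(target, comparison)))
```

With a target view [10, 20, 30] and a comparison view [100, 200, 300], the normalized deviation is zero because both views have the same shape, whereas the deviation without normalization is large. This is one way in which normalization can hide a difference in magnitude that the user might consider interesting.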

User Feedback: View-360 gives weight to the user’s preferences by allowing the user to specify a number of input parameters and then making recommendations based on automatic exploration. View-360 provides maximum coverage by exploring all possible subsets of the data and making all possible comparisons. However, exploration is an iterative process, and it is impossible to guarantee that the recommended views satisfy the user’s expectations. In most cases, the user explores in iterations by changing the input parameters. Despite all the automation, the user is still the key to effective view recommendation. Therefore, it is worth investigating how to improve the quality of the recommended views according to feedback from the user. Additionally, the history of exploration by the same user or by other users on the same dataset can also be used as an input to the recommendation process.

Chapter 6

Conclusions and Future Work

The goal of this thesis was the design, implementation and evaluation of view recommendation schemes for visual data exploration. Next, in Section 6.1, we summarize our contributions towards that goal and in Section 6.2, we describe directions for future work.

6.1 Summary of Contributions

We have addressed the challenging problem of efficiently and effectively recommending views from complex datasets for visual data exploration. While the recommended views provide the user with effortless insights into the data, quantifying relevance for view recommendation is a non-trivial task and, additionally, the recommendation process is computationally very expensive. Hence, to address these challenges, we proposed various schemes in this thesis, as summarized below.

In Chapter 3, we proposed a novel utility function and a suite of search schemes for recommending top-k views in the presence of numerical dimensions. Our utility function recognizes the impact of numerical dimensions on visualization, which is captured by means of multiple objectives, namely deviation, accuracy, and usability. Our proposed search schemes incorporate that utility function for the purpose of recommending the top-k aggregate data visualizations. A key goal in the design of those search schemes is to efficiently prune the prohibitively large search space of possible views. That goal is reasonably achieved by our first scheme, Multi-Objective View Recommendation for Data Exploration (MuVE), and is further improved by uMuVE, at the expense of high memory usage. Accordingly, we presented MuMuVE, which provides a pruning power close to that of uMuVE while keeping memory usage within a predefined constraint. Our extensive experimental results show the significant gains provided by our proposed schemes.

The most expensive operation in computing the utility of the views is executing the queries related to the views. To reduce the cost of this operation, in Section 3.5 we proposed a novel technique, materialized View (mView), which, instead of answering each query related to a view from scratch, reuses the results of already executed queries. This is done by incrementally materializing a set of views in optimal order and answering the queries from the materialized views instead of the base table.
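The general idea of answering view queries from a materialized view rather than from the base table can be sketched in Python. This is only an illustration of result reuse under stated assumptions: the helper names `materialize` and `roll_up` are hypothetical, the base table is a list of dictionaries, and the optimal materialization order mentioned above is not modelled.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Cells = Dict[Tuple[Any, ...], Tuple[float, int]]


def materialize(rows: List[Row], dims: Sequence[str], measure: str) -> Cells:
    """Scan the base table once, keeping (sum, count) per combination of dims."""
    cells: Dict[Tuple[Any, ...], List[float]] = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = tuple(row[d] for d in dims)
        cells[key][0] += row[measure]
        cells[key][1] += 1
    return {key: (s, int(c)) for key, (s, c) in cells.items()}


def roll_up(mview: Cells, dims: Sequence[str], dim: str,
            aggregate: str) -> Dict[Any, float]:
    """Answer 'aggregate(measure) GROUP BY dim' from the materialized cells."""
    position = list(dims).index(dim)
    totals: Dict[Any, List[float]] = defaultdict(lambda: [0.0, 0])
    for key, (s, c) in mview.items():
        totals[key[position]][0] += s
        totals[key[position]][1] += c
    if aggregate == "SUM":
        return {group: s for group, (s, _) in totals.items()}
    if aggregate == "COUNT":
        return {group: c for group, (_, c) in totals.items()}
    if aggregate == "AVG":
        # AVG is derived from SUM and COUNT, so it can be rolled up correctly.
        return {group: s / c for group, (s, c) in totals.items()}
    raise ValueError(f"unsupported aggregate: {aggregate}")
```

After one call to `materialize(rows, ["gender", "age_group"], "cost")`, the views grouped by `gender` and by `age_group` under SUM, COUNT and AVG can all be answered by `roll_up` without touching the base table again. Storing sums and counts rather than averages is what makes AVG reusable.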

Visual data exploration involves several iterations of selecting a subset of the data by issuing an input query and analyzing it by generating different visualizations. Motivated by the need for finding interesting views from prudent subsets of data (i.e., input queries), in Chapter 4 we proposed Query Refinement for View Recommendation (QuRVe), a set of efficient schemes that automatically refine the input query to search for subsets of data having interesting views and recommend the top-k views. However, such uncontrolled refinement of queries can lead to multiple problems, such as loss of user preference and random discoveries. Therefore, a multi-objective function is proposed to measure the relevance, interestingness and significance of the refined queries and their corresponding views. We have proposed a novel suite of schemes that efficiently navigate the search space of refined queries for recommendation of data visualizations. The main idea underlying the proposed QuRVe scheme is to incrementally access the refined queries in order of their similarity to the original query, which allows an early termination of the search and results in the pruning of a large number of views. Additionally, the uQuRVe scheme reduces the cost further by tightening the upper bounds on the utility of the views and short-circuiting unnecessary views. In addition, the uQuRVe-range scheme is proposed, which makes sure that high-utility views are probed first and, as a result, a higher number of low-utility views are pruned. We have also proposed approximation-based schemes that provide order-of-magnitude reductions in processing costs while keeping the utility of the recommended views close to that of the optimal schemes. Our extensive experimental evaluations show the efficiency exhibited by our proposed schemes under various settings, and the significant benefit they provide compared to existing methods.
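The early-termination idea behind accessing refined queries in order of similarity can be sketched as below. This is a simplified illustration, not the QuRVe algorithm itself: it assumes a two-objective utility of the hypothetical form alpha * similarity + (1 - alpha) * deviation with both terms in [0, 1], omits the significance objective, and treats each refined query as having a single view.

```python
import heapq
from typing import Callable, List, Tuple


def top_k_with_early_termination(
    queries: List[Tuple[str, float]],      # (query id, similarity in [0, 1])
    deviation_of: Callable[[str], float],  # expensive step, result in [0, 1]
    k: int,
    alpha: float = 0.5,
) -> Tuple[List[Tuple[float, str]], int]:
    """Return the top-k (utility, query id) pairs and the number of probes."""
    ordered = sorted(queries, key=lambda q: q[1], reverse=True)
    best: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    probed = 0
    for query_id, similarity in ordered:
        # Deviation is at most 1, so this bounds the utility of this query
        # and of every later (less similar) query.
        upper_bound = alpha * similarity + (1 - alpha) * 1.0
        if len(best) == k and upper_bound <= best[0][0]:
            break  # no remaining query can enter the top-k
        utility = alpha * similarity + (1 - alpha) * deviation_of(query_id)
        probed += 1
        if len(best) < k:
            heapq.heappush(best, (utility, query_id))
        elif utility > best[0][0]:
            heapq.heapreplace(best, (utility, query_id))
    return sorted(best, reverse=True), probed
```

Because the queries are visited in decreasing similarity, the upper bound only shrinks, so the first query whose bound cannot beat the current k-th best utility ends the search and all remaining queries are pruned without computing their deviation.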

QuRVe focused on finding interesting views from all subsets of data by comparing them with a user-provided reference dataset. However, the search space can be further extended by comparing all subsets of data with each other to find interesting views. Therefore, to explore this dimension of the problem, in Chapter 5 we proposed to automatically refine the query for comparison views as well. We proposed the context of comparison between the target and comparison view queries. We also outlined the design and implementation of a holistic prototype system, View-360, which includes all aspects of aggregate view recommendation, i.e., recommendation based on categorical and numerical attributes, and recommendation based on refinement of the target and comparison queries. We then showcased the effectiveness of our proposed schemes by performing a detailed analysis on two real datasets from different domains.
