
The vast majority of commercial web search engines offer some kind of text-based query interface to their users. Typically, a user constructs a string of keywords that, in the user’s perception, describe or are associated with the information the user wants to retrieve. This string, or query, can be seen as the primary input from the user to a search system. A query-based interface for the user agent of this section could also be beneficial, as most users are already familiar with this type of interface. There are, however, some design issues that must be discussed if such a query-based interface is to be a success.

Query format

The first design issue that should be addressed in a text-based query interface is the question of what these queries should look like and how “rich” the query language should be. At a minimum, a typical user query usually consists of a string of n space-separated keywords (called terms). It is also quite possible that a user might want to group a certain sequence of terms into a phrase he/she is interested in searching for.

Another important element in the query syntax is a method for specifying which words a user does NOT want featured in the returned results of a specific query. This has a significant impact on the relevance that the returned results have for the specific user. It should be noted here that by default, if a series of n space-separated keywords is submitted as a user query, the system should assume that the user is interested in documents containing all or some of the keywords in the submitted query (i.e. documents that only contain some of the query terms should also be included in the results). There should also be a mechanism for specifying that a specific word or phrase must be present in any document that is returned as a result (i.e. analogous to the boolean operator AND).

Finally, an additional feature that could be of value to users is the explicit indication of exchangeable terms in a user query. This feature gives a user the flexibility of specifying that two (or more) keywords are interchangeable and equally relevant to a specific query. This could have the effect of returning a larger, more general set of results for the query. Having users explicitly specify different options (or synonyms) for a specific keyword frees the system from having to guess them when the query is processed.

The syntax elements defined to express the ideas discussed above in a user query string are listed in Table 8.1 below.

Syntax element     Description
query              A collection of terms constituting a user’s request.
term               Any combination of alphanumeric characters (typically forming a word).
“...”              Inverted commas (“ ”) indicate that a sequence of keywords is grouped together as a phrase and should be treated as a single entity by the search system.
-term or -“...”    A minus sign (-) placed in front of a search term or phrase indicates that documents containing that term or phrase should be excluded from the search results.
+term or +“...”    Indicates that the search term or phrase must be present in any potential result found for the specific query.
|term or |“...”    Indicates that two (or more) terms are exchangeable and relevant to a given query. Results containing any of these alternatives should be returned.

Table 8.1: User query syntax elements
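The syntax elements of Table 8.1 can be illustrated with a small parser. What follows is only a minimal sketch under stated assumptions, not the implementation used by the user agent: the names `ParsedQuery` and `parse_query` are hypothetical, typographic quotes are normalised to plain double quotes, and the sketch merely collects the terms marked with | without deciding which other terms they are alternatives to, since the table leaves that binding open.

```python
import re
from dataclasses import dataclass, field
from typing import List

# One token: an optional operator (-, + or |) followed by either a quoted
# phrase or a bare term. Bare terms may not contain whitespace or quotes.
TOKEN = re.compile(r'([-+|]?)(?:"([^"]*)"|([^\s"]+))')


@dataclass
class ParsedQuery:
    """Hypothetical container for the parts of a user query."""
    optional: List[str] = field(default_factory=list)      # plain terms and phrases
    required: List[str] = field(default_factory=list)      # +term, +"..."
    excluded: List[str] = field(default_factory=list)      # -term, -"..."
    alternatives: List[str] = field(default_factory=list)  # |term, |"..."


def parse_query(text: str) -> ParsedQuery:
    """Split a query string into the element classes of Table 8.1."""
    # Assumption: curly quotes typed by the user mean the same as plain ones.
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    parsed = ParsedQuery()
    buckets = {
        "": parsed.optional,
        "+": parsed.required,
        "-": parsed.excluded,
        "|": parsed.alternatives,
    }
    for operator, phrase, term in TOKEN.findall(text):
        value = phrase if phrase else term
        if value:  # skip empty phrases such as ""
            buckets[operator].append(value)
    return parsed


if __name__ == "__main__":
    print(parse_query('sports +shoes -"high heels" |trainers'))
```

Keeping the four classes in separate lists reflects the default described above: plain terms are optional, so a document matching only some of them can still be returned, while the + and - lists act as hard filters.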

Query type

As was discussed in a previous chapter, queries can be classified as being one of three types: navigational, informational or transactional. Classification of a query into one of these types could have an effect on how the query is handled by the system. If a user is required to classify his/her query before its submission into one of these three types, the system could then favour results that are related to the type specified. Additionally, forcing classification in this manner could also yield queries which are semantically less complex to handle.

As an example, consider a query such as “Where can I buy sports shoes?”. For a human observer, it is quite easy to see what type of query it is and what the topic is. This can be less obvious to a machine. If this query were classified by the user as a transactional query, it could be reduced to only “sports shoes”, as the system would already know that results which are commercial in nature should be favoured over results that are informational in nature.

Requiring query classification could provide valuable leads as to the type of need behind the query and aid the search system in providing more relevant results.
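The reduction described for the “sports shoes” example can be sketched as follows. This is an illustration only: the `QueryType` enumeration, the `reduce_query` function and the lists of redundant lead-in phrases are hypothetical, and a real system would need a far more robust way of recognising which wording is made redundant by the declared type.

```python
import re
from enum import Enum


class QueryType(Enum):
    NAVIGATIONAL = "navigational"
    INFORMATIONAL = "informational"
    TRANSACTIONAL = "transactional"


# Hypothetical lead-in phrases that carry no extra information once the
# user has declared the query type. Longer phrases are listed first.
REDUNDANT_PHRASES = {
    QueryType.TRANSACTIONAL: ["where can i buy", "i want to buy", "where to buy", "buy"],
    QueryType.INFORMATIONAL: ["tell me about", "what is", "what are"],
    QueryType.NAVIGATIONAL: ["take me to", "homepage of", "go to"],
}


def reduce_query(query: str, query_type: QueryType) -> str:
    """Strip wording that the declared query type already implies."""
    # Drop trailing punctuation such as " ?" before matching.
    text = re.sub(r"[?!.]+\s*$", "", query.strip()).strip()
    lowered = text.lower()
    for phrase in REDUNDANT_PHRASES[query_type]:
        if lowered.startswith(phrase + " "):
            return text[len(phrase):].strip()
    return text


if __name__ == "__main__":
    print(reduce_query("Where can I buy sports shoes ?", QueryType.TRANSACTIONAL))
```

Because the type is supplied by the user rather than inferred, the same query string classified as informational is left unchanged by this sketch, apart from the removed question mark.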

Query context

If a text-based query interface approach is chosen for the user agent, the next logical question to ask is how context can be introduced into a query. There are two strategies¹ that could hold potential for introducing context into a user query:

• Explicit context specification. This strategy relies on the user to select a specific category of information of interest. From the user’s selection, the system can then identify the context of the query and treat it accordingly.

• Guessing context. Rather than explicitly requiring users to supply contextual information, this technique attempts to determine the context of a query from the keywords present in a user query.

¹ Refer to chapter 6 for a more detailed discussion of context in web search.

In the user modelling approach presented in this chapter, both of these strategies could be implemented successfully.

For explicit context specification, the user could be presented with the structure of the ODP tree and be required to indicate the category to which he/she perceives the query to belong. In effect, the user maps the query to internal nodes in the ODP tree. From this classification, the user agent could tailor the results and narrow or broaden the search effort accordingly.

The user agent could also guess context through analysis of the keywords present in a user query. In effect, this would amount to matching the keywords stored at the external nodes in the ODP tree against the keywords found in the user query. If a keyword is found, its context could be discovered by considering the ancestors and siblings of the external node where that particular keyword is stored.

A combination of the two strategies could also be of interest. By requiring users to specify a top-level category of interest (e.g. arts, business, health & sciences etc.), the context of the query could be narrowed to only the subtree of the ODP taxonomy indicated by the user. An analysis of the keywords present in the user query could then be restricted to the indicated subtree, narrowing the search space for keywords substantially.
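Both context guessing and the combined strategy can be sketched over a toy taxonomy. The sketch below is illustrative only and rests on several assumptions: the tree is modelled as nested dictionaries whose external nodes are sets of keywords, the categories and keywords in `TOY_TAXONOMY` are invented and are not actual ODP data, and `guess_context` is a hypothetical name. The context of a matched keyword is reported as the path of ancestors leading to the external node that stores it.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

# Invented miniature taxonomy: internal nodes are dicts, external nodes
# are sets of keywords. This is NOT real ODP content.
TOY_TAXONOMY = {
    "Shopping": {"Clothing": {"Footwear": {"shoes", "trainers", "boots"}}},
    "Sports": {"Running": {"Gear": {"shoes", "marathon"}}},
    "Health": {"Fitness": {"Exercise": {"running", "yoga"}}},
}

Path = Tuple[str, ...]


def iter_external_nodes(tree: dict, path: Path = ()) -> Iterator[Tuple[Path, Set[str]]]:
    """Yield (ancestor path, keyword set) for every external node."""
    for name, child in tree.items():
        if isinstance(child, dict):
            yield from iter_external_nodes(child, path + (name,))
        else:
            yield path + (name,), child


def guess_context(tree: dict, query_terms: Iterable[str],
                  top_level: Optional[str] = None) -> List[Path]:
    """Return the paths of all external nodes that store a query keyword.

    If top_level is given (the combined strategy), only the subtree under
    that user-selected category is searched.
    """
    if top_level is None:
        subtree = tree
    elif top_level in tree:
        subtree = {top_level: tree[top_level]}
    else:
        subtree = {}
    terms = {term.lower() for term in query_terms}
    return [path for path, keywords in iter_external_nodes(subtree)
            if keywords & terms]


if __name__ == "__main__":
    print(guess_context(TOY_TAXONOMY, ["sports", "shoes"]))
    print(guess_context(TOY_TAXONOMY, ["sports", "shoes"], top_level="Sports"))
```

In this toy example the keyword "shoes" is ambiguous when the whole tree is searched, since it is stored under two different categories. Restricting the search to a user-selected top-level category resolves the ambiguity and also reduces the number of external nodes that have to be inspected.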

Internal query structure and representation

As discussed so far, a number of aspects are important when developing a query-based interface to a search system. Another important factor is the actual internal representation of a user search query in the system.

The user agent discussed in this section endeavours to collect and process much more information about a user request than a simple string of words. A format is therefore needed that can encapsulate the query string, as well as other information (such as the query type, context information etc.). To address this, an XML-based scheme for query representation can be used.

XML could provide an extensible and flexible platform for the encoding and representation of user queries by the user agent. This has the additional benefit of being a standard representation format that can be understood by, and exchanged between, the agents of the proposed system.

The type of information that should be encoded in the XML query is of key importance. There are a few basic elements that must be present in an encoded query, such as the actual query string entered by the user, the type of query indicated by the user, and the context of the query itself (see the section on query augmentation for further details). It must be noted that these are only examples and that the scheme can be expanded to meet the needs of the system. An example of an XML scheme encoding this basic information is presented in Figure 8.3 below.

<QUERY>
    <STRING>...</STRING>
    <TYPE>...</TYPE>
    <CONTEXT>...</CONTEXT>
    ...
</QUERY>

Figure 8.3: Internal query representation using XML.
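The encoding of Figure 8.3 could be produced and read back with a standard XML library. The following is a minimal sketch using Python's `xml.etree.ElementTree`. The function names `encode_query` and `decode_query` are hypothetical, and so is the choice to store the context as a slash-separated category path: the figure leaves the content of each element open, and only the three basic elements are handled here.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def encode_query(query_string: str, query_type: str, context: str) -> str:
    """Encode the three basic elements of Figure 8.3 as an XML string."""
    root = ET.Element("QUERY")
    ET.SubElement(root, "STRING").text = query_string
    ET.SubElement(root, "TYPE").text = query_type
    # Assumption: the context is serialised as a single text value.
    ET.SubElement(root, "CONTEXT").text = context
    # encoding="unicode" returns a str; special characters are escaped.
    return ET.tostring(root, encoding="unicode")


def decode_query(xml_text: str) -> Dict[str, str]:
    """Recover the basic elements from an encoded query."""
    root = ET.fromstring(xml_text)
    return {
        "string": root.findtext("STRING", default=""),
        "type": root.findtext("TYPE", default=""),
        "context": root.findtext("CONTEXT", default=""),
    }


if __name__ == "__main__":
    encoded = encode_query('sports +shoes -"high heels"', "transactional",
                           "Shopping/Clothing/Footwear")
    print(encoded)
    print(decode_query(encoded))
```

Since unknown child elements are simply ignored by `decode_query`, further elements can be added to the scheme later without breaking agents that only read the basic ones, which is in keeping with the extensibility argument made above.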