
In the model of information-driven E-Commerce, access is a key concept. By access mode we understand the type of selectional actions a user takes in order to reach E-Commerce-relevant information. Selectional actions on the Web are units of user interaction, comprising interface device actions such as keystrokes, clicks, mouse movement and scrolling [21]. The approach presented here is restricted to a subset of selectional actions, namely those that trigger access to new information.

Considering common web technologies, typical access actions are filling out and submitting forms and clicking on links, either with textual anchors or images. JavaScript also allows other events to be captured, for example moving the mouse over the area of a specified HTML element. The now widely deployed AJAX makes access actions even less recognizable by constantly feeding new information. Still, two principal modes of access can be distinguished: access by browsing and access by searching.

5.2.1. Access by Browsing

Access by browsing describes the method of delivering new content based on move, click and scroll actions of the user. Following hyperlinks is one example of access by browsing, though there are several other means of browsing. All have in common that the user does not type in textual strings conveying her selection, but chooses a visible object.

Access by browsing is thus not strictly tied to the mouse; it may also be performed by keyboard. However, the selectional choice must already be present when the user triggers it, as is the case, for example, with keyboard shortcuts.

Compared with entering a query, the selectional range of a user is much more limited when she accesses information by browsing. Elements that can be browsed have to compete for the limited space on the screen. Complex menu structures with encapsulated submenus have the drawback that every additional click thins out the user traffic. In addition, JavaScript-based or Flash implementations of complex menus are often not as searchable as plain HTML pages and are not accessible to every user.

A second crucial difference is the presence of selectional choices before the user triggers one of them. Access by searching typically leaves the user for some time in ignorance about what happens next and whether any results at all are produced. Access by browsing, on the other hand, is in general accompanied by indications that some results will follow from making the browsing selection.

Anchor links as descriptors

A large part of access by navigation operates through links with anchor texts. Anchor texts are the text labels of links that appear between <a> tags in HTML. Additional glossary terms in this context are the target page, to which a link points, and the source page, on which it appears. Internal anchor texts appear with links connecting a source page and a target page of the same site. External anchor texts appear with links pointing to a page of a different site than the source page.

Navigational anchor texts, occurring mostly as internal anchor texts, do not reveal any content property of the target page. Typical navigational anchor texts are here, click here, next page, home, back or indexes of first letters.

Content anchor texts, however, provide some indication of the content of the target page. In the ideal case, they state a compressed description of the target page. Through the re-occurrence of external anchor texts, a reliable picture of what a target page is about can be discerned. The reliability depends on how independent the sites exhibiting the links are. As search engines heavily exploit external anchor texts, it is alluring to manipulate them by setting up a network of interlinked sites. Different measures, including graph algorithms that detect cyclic structures, may be helpful to detect manipulative anchor texts [22].
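The distinctions drawn above (internal vs. external, navigational vs. content) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host name stands in for the notion of a site, the stop list contains just the examples named in the text, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumed stop list of navigational labels. The entries are the examples
# given in the text; a real list would be longer and language-specific.
NAVIGATIONAL = {"here", "click here", "next page", "home", "back"}


def site_of(url: str) -> str:
    """Return the host name without a leading 'www.' (a stand-in for 'site')."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def classify_link(source_url: str, target_url: str, anchor_text: str) -> tuple[str, str]:
    """Classify a link as internal/external and its anchor as navigational/content."""
    scope = "internal" if site_of(source_url) == site_of(target_url) else "external"
    label = " ".join(anchor_text.lower().split())  # normalize case and whitespace
    # Single letters cover the "indexes of first letters" case (A, B, C, ...).
    is_navigational = label in NAVIGATIONAL or (len(label) == 1 and label.isalpha())
    return scope, "navigational" if is_navigational else "content"
```

Comparing host names is a simplification: sites spread over several subdomains or sharing one host would need a more careful definition of "same site".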

From the point of view of accumulating vocabulary, both internal and external anchor texts provide a valuable resource. As the size of cumulated internal anchor texts in a large crawl typically exceeds that of the external anchor texts by a factor of 5 to 10, neglecting internal anchor texts leads to a tremendous loss in corpus size.

A further benefit of internal anchor texts is their usefulness for classifying types of websites, for example business directory, company homepage or private homepage. The majority of company homepages have one of the following anchor texts: about us, investor relations, our products.

As compressed descriptions of the target page, anchor texts are similar to queries. Matching queries to anchor texts, especially re-occurring external anchor texts, generally produces highly relevant results, as re-occurring anchor texts contain only meaningful and characteristic descriptors of target pages.
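A minimal sketch of this matching step, assuming external links are available as (source site, target URL, anchor text) triples: an anchor text counts as re-occurring when at least min_sites distinct source sites use it for the same target. The threshold, the function names and the exact-match lookup are illustrative assumptions, not the method of the text.

```python
from collections import defaultdict


def reoccurring_anchors(links, min_sites=2):
    """Map each re-occurring anchor text to the target pages it describes.

    links: iterable of (source_site, target_url, anchor_text) for external links.
    An anchor is kept for a target only if at least min_sites distinct
    source sites use it, which filters out one-off labels.
    """
    sources_by_pair = defaultdict(set)
    for source_site, target_url, anchor in links:
        sources_by_pair[(target_url, anchor.lower().strip())].add(source_site)

    index = defaultdict(set)
    for (target_url, anchor), sources in sources_by_pair.items():
        if len(sources) >= min_sites:
            index[anchor].add(target_url)
    return index


def match_query(query, index):
    """Return the target pages whose re-occurring anchor text equals the query."""
    return sorted(index.get(query.lower().strip(), set()))
```

Counting distinct source sites rather than raw link occurrences reflects the remark that reliability depends on how independent the linking sites are; it does not by itself detect interlinked networks.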

Site map evaluation

Based on a robust recognition of important terms on a site, it is possible to print a separate site map that is ranked based on query-log-driven data. This new site index can be compared to the access possibilities originally offered to the user. Accessibility of a content page can be calculated by counting the selectional actions the user has to take (clicking on a link, choosing a form field, scrolling, entering text).

If access is restricted to navigational browsing, a simple metric just counts the number of clicks necessary to get to the page holding the content information. Evaluation could then be done either as a macroevaluation over all terms on a site that also appear in a query log or as a microevaluation for individual pieces of information such as products offered on a site. For example, locating a specific notebook model on samsung.de requires three clicks: Produkte > Notebook PC > drop-down Modell auswählen. In automatically calculating the navigational distance, clicking links has to be discriminated from selecting a form field, as crawling forms is still a challenge to automated spidering.
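The click-count metric can be sketched as a breadth-first search over a crawled link graph. The graph representation (a dictionary from page to linked pages) and the function name are assumptions made for illustration; only link edges are modelled, so steps that require a form field, such as a drop-down, are not counted here.

```python
from collections import deque


def click_distance(link_graph, start, target):
    """Minimum number of link clicks from start to target, or None if unreachable.

    link_graph maps each page to the pages it links to (link edges only).
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, clicks = queue.popleft()
        for linked_page in link_graph.get(page, ()):
            if linked_page == target:
                return clicks + 1
            if linked_page not in seen:
                seen.add(linked_page)
                queue.append((linked_page, clicks + 1))
    return None
```

A macroevaluation in the sense above would average this distance over all pages holding query-log terms, while a microevaluation would report it for a single product page.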

Refining these measures should take both the graphical representation and the textual content of the selectional elements into account. The graphical representation comprises the prominence of the link on the screen, determined by its size and position.

Examining the textual content of links that lead to a specific piece of information can be done through setting up navigational paths, i.e. the concatenation of anchor texts labeling the links that lead to this piece of information. Navigational paths can be compared both in terms of length and terms used.

DDR-RAM
    preissuchmaschine.de
        Computer > Arbeitsspeicher, CPU & Mainboards > PC-266 DDR-SDRAM
    pearl.de
        Hardware & Multimedia > Bauteile, Gehäuse, Brenner & Rohlinge > DDR > NoName

Frontlader-Waschmaschine
    preissuchmaschine.de
        Haushaltselektronik > Waschmaschine > Frontlader
    zarsen.de
        Waschen & Trocknen > Trockner > Waschmaschinen > Waschmaschinen Einbau

Miniofen
    promarkt.de
    preissuchmaschine.de
        Haushaltselektronik > Back- & Grillgerät > Backofen > Toaster/Miniöfen

From these examples it becomes clear that it is not always easy for a user to predict which navigational path might lead to the goal. Both the length of the path and the terms used in the hierarchy vary. Although it is tempting to try to deduce synonyms and related terms by comparing the paths leading to the same product on different sites, one has to be aware of the subtle semantic changes (consider for example the quasi-synonyms Haushaltskleingeräte and Haushaltselektronik) and of the editorial rules of category wording, for example the usage of compounds (Haushaltskleingeräte) and coordinated terms (Back- & Grillgerät), that occur in the terms of these paths.
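Comparing two navigational paths in terms of length and terms used can be sketched as follows. The '>' separator follows the notation of the examples; the tokenization and the Jaccard overlap are illustrative choices, and the function names are hypothetical.

```python
import re


def path_steps(path: str) -> list[str]:
    """Split a navigational path written with '>' separators into its steps."""
    return [step.strip() for step in path.split(">") if step.strip()]


def path_terms(path: str) -> set[str]:
    """Lower-cased word tokens of a path; inner hyphens are kept (e.g. ddr-sdram)."""
    tokens = (t.strip("-") for t in re.split(r"[^\w-]+", path.lower()))
    return {t for t in tokens if t}


def compare_paths(path_a: str, path_b: str) -> dict:
    """Compare two paths by number of steps and by surface term overlap."""
    terms_a, terms_b = path_terms(path_a), path_terms(path_b)
    union = terms_a | terms_b
    shared = terms_a & terms_b
    return {
        "length_a": len(path_steps(path_a)),
        "length_b": len(path_steps(path_b)),
        "shared_terms": sorted(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

Applied to the two washing machine paths, this purely surface-based comparison finds no shared term, since Waschmaschine and Waschmaschinen differ morphologically. That is exactly the kind of variation the caveat above warns about.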

5.2.2. Access by Query Search

One component of Internet literacy is recognizing and manipulating a query interface [23]. An indication of how deeply querying is ingrained in Internet literacy lies in the difficulty of describing what one does conceptually when entering a query. Several models are conceivable, but none of them alone is sufficient: asking a question, placing a directive, describing an informational need, referring to a named entity, entering pieces of text memorized from the last time the page was seen, and so on. Access by searching is used both for getting a general overview of a topic and for retrieving specific information (details on a classification of search queries and user intentions are provided in Part C). Obviously, users soon develop their own strategies in order to adapt to the capabilities of a search agent.

In this adaptation process, a standard look and feel of search engines has evolved, featuring one big query field, a submit button and a separate result page listing the results linearly and in priority order [24]. Though the size and placement of the search slot are generally less prominent on content sites (such as online newspapers), search interfaces on these sites follow the same logic. If a search agent’s interface deviates from this design, it is likely that many users will not recognize its function. The downside of a commonly shared image of how a search interface should look is that it impedes innovative interfaces. In addition, the standard Web search engines also perform in a comparable way: for example, typing in a query such as “hotel near neuschwanstein at 200-250 EUR per night available tomorrow night” will not work on any search engine, whereas “hotel neuschwanstein” will produce at least some valid results. Adding new elements to the search logic will therefore require some time until users become bolder and find adaptive strategies for the augmented capabilities.

[23] The issue of Internet literacy is discussed especially from the point of view of education and school curricula. However, it affects more aspects of society, including government and the workplace. See the White Paper of the 21st Century Internet Literacy summit, Bertelsmann Foundation and AOL Time Warner Foundation, Berlin 2002.

[24] See Jakob Nielsen, “Mental Models For Search Are Getting Firmer”, online at

Providing example queries around the search slot is a good measure to guide users to the type of search queries the system is supposed to handle well. However, detailed analysis of query logs reveals that many users do not pay much attention to the context of the search slot. For example, Yellow Page sites offering both a What? and a Where? slot typically show some amount of intersection between these fields that is due to users mistakenly typing into the wrong search slot.

5.2.3. Combining search and browsing

Both searching and browsing entail inspection cycles in which the user moves forwards (assessing the results) and backwards (reformulating the search or following a different navigational path). These two movements correspond to the two tasks any agent providing information has to accomplish: first, processing the user’s input and transforming it into a look-up command; second, performing this look-up on the database storing the results.

In the case of Web searching, the first task consists of query processing that analyzes the search input and transforms it into an index look-up. The second task organizes the content of crawled webpages into one or more index tables.

Compared to this, the first task in access by navigation consists of placing links and adding anchor texts. The second task is distributing the content of a site among several browseable units, including webpages, popups, paragraphs etc.

There are two filters that a user query has to pass before it is met with relevant results: the system has to recognize how the query is worded, and the index has to store the result wanted. For example, if searching for a hairdresser in a specific location on a category-based Yellow Page site [25], there are two possible reasons why the search could fail: either the query term is not matched to the proper category, or there happens to be no hairdresser in that location.

Naturally, many sites use both strategies to get the users to the content wanted. There is a major difference, however, between these two in how active users have to be in verbalizing their wants. Access by browsing provides a selection of already verbalized items, while a search slot is at first an empty line that the user has to fill with her own wording.

While access by browsing shows the user beforehand what can be expected to be in the result set (apart from cases where the label of the navigational element and the content of the target page deviate), access by searching is at first a jump into unknown water. If it returns no hits, the reason might be either that the result wanted is really not in the index or that the wording of the query was not suited for producing it. It is only through a trial-and-error process that the reason might be determined. The uncertainty about why a search fails is further aggravated if a search application answers similar searches with disparate results, as most Web searches do today (see Part C).

As a consequence, an evaluation of the quality of a search agent has to make sure that it does not evaluate the content of the index. Zero hits for a search might be the consequence of a genuine gap in the available data, which cannot be attributed to the performance of the search agent. Zero hits because of a lack of records must therefore be distinguished from zero hits because of a failure of query recognition. If a result is in fact not in the index, this information has its worth even without suggestions on where else to look. In this context, it is the consistent handling of similar searches that indicates that the first filter (recognizing the intent of the user’s search) performs well, that zero or few results are really due to gaps in the index, and that there is no need to try further, very similar searches.
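The two failure modes can be kept apart in code if the look-up reports which filter a query failed to pass. The following sketch assumes a category-based directory in which a dictionary maps known query terms to categories and a second dictionary holds the records per category and location; all names and data structures are hypothetical.

```python
def diagnose_search(query_term, location, category_of, index):
    """Run a two-step look-up and report which filter, if any, the query failed.

    category_of: maps known query terms to categories (first filter).
    index: maps (category, location) to a list of records (second filter).
    Returns a (status, records) pair.
    """
    category = category_of.get(query_term.lower().strip())
    if category is None:
        # The wording was not recognized; nothing can be said about the index.
        return "query not recognized", []
    records = index.get((category, location), [])
    if not records:
        # The query was understood, so zero hits reflect a genuine gap in the data.
        return "no records in index", []
    return "ok", records
```

An evaluation of the search agent would then count only the "query not recognized" cases against it, and consistent handling shows in similar wordings (here hairdresser and barber, if both are mapped to the same category) yielding the same status.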

A solution could lie in combining searching and browsing by creating links to results while the user starts to type a query. Such a suggest mode has to be capable of handling the query variation principles (orthographic, morphologic, syntactic, semantic and head/container-related; see Part C) in order to get the best out of both worlds. Shrowsing, the mixture of searching and browsing, would allow dynamic inspection of the available index data as the user types [26].
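The core of such a suggest mode, offering index terms as selectable items while the user types, can be sketched with a binary search over a sorted term list. This covers literal prefixes only; none of the query variation principles named above is handled, and the function name and data layout are assumptions.

```python
from bisect import bisect_left


def suggest(prefix, sorted_terms, limit=5):
    """Return up to `limit` index terms that start with the typed prefix.

    sorted_terms must be lower-cased and sorted; an empty prefix yields no
    suggestions so that nothing is offered before the user types.
    """
    typed = prefix.lower()
    if not typed:
        return []
    suggestions = []
    # All terms sharing the prefix form one contiguous run in the sorted list.
    for term in sorted_terms[bisect_left(sorted_terms, typed):]:
        if not term.startswith(typed) or len(suggestions) == limit:
            break
        suggestions.append(term)
    return suggestions
```

Because every suggestion is drawn from the index itself, selecting one is a browsing action whose result is known to exist, while typing remains a search action.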