
An important lesson in this work is that users are generally more able to recall the context in which a file was used than the contents of the file itself. One of the primary reasons for this is that this context often contains information about the personal ways in which a user conceptualizes a document.

In the process of doing a literature search for a research paper on contextual retrieval, one might issue the query “papers on contextual retrieval”, to which a search engine like Google might be able to return papers on a conceptually similar topic like “personalized search”. This retrieval is enabled in part by the fact that the hyperlinked structure of the web can leverage the multiple ways in which the universe of users organizes information. For example, an individual might link to a “personalized search” paper within their “context retrieval” web page, enabling search tools to connect the similar concepts. In local document retrieval, this structure cannot be leveraged. However, being able to connect the user’s initial query to the document which was ultimately retrieved through a system like SeeTrieve allows the user to implicitly describe their own documents through their behavior.

Chapter 5

Passages: tracing text as a first class entity

The Passages system enhances information management by maintaining a detailed chronicle of all the text the user ever reads or edits, and making this chronicle available for rich temporal queries about the user’s information workspace. Passages enables queries like, “which papers and web pages did I read when writing the ‘related work’ section of this paper?”, and, “which of the emails in this folder have I skimmed, but not yet read in detail?” As time and interaction history are important attributes in users’ recall of their personal information, effectively supporting them creates useful possibilities for information retrieval. We present methods to collect and make sense of the large volume of text with which the user interacts. We show through user evaluation the accuracy of Passages in building interaction history, and illustrate its capacity to both improve existing retrieval systems and enable novel ways to characterize document activity across time.

5.1 Introduction

We interact with our desktops through applications’ graphical user interfaces, through which large amounts of text are presented to us. This text can be captured and cheaply stored, making it amenable to indexing and retrieval. This work explores the application of this text to information management and retrieval, specifically by capturing the fine details of the user’s interaction as a first class entity.

There are two important attributes of the viewed desktop text — which I will refer to as the text stream. First, it contains a comprehensive record of all our text-based desktop activity, including the contents of all the web pages, emails, and other files with which we have interacted. Second, detailed timing information about its contents’ visibility is available, enabling a precise record of what was viewed and when it was viewed [26]. These attributes can be combined to form a rich history of the user’s interaction with their information, documenting for every point in time what the user was reading or writing.
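As a minimal sketch of how these two attributes combine, the interaction history can be modeled as a list of timed visibility records. The record layout, the field names, and the sample data below are assumptions made for illustration; they are not taken from the Passages implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewEvent:
    """One span of text that was visible on screen (hypothetical record layout)."""
    source: str   # assumed identifier: file path, URL, or message id
    text: str     # the text that was visible
    start: float  # time the text became visible, in seconds
    end: float    # time the text stopped being visible, in seconds


def visible_at(history, t):
    """Return the events whose text was on screen at time t."""
    return [e for e in history if e.start <= t < e.end]


def time_per_source(history):
    """Sum the visible duration of each source across the whole history."""
    totals = defaultdict(float)
    for e in history:
        totals[e.source] += e.end - e.start
    return dict(totals)


# Illustrative data only.
history = [
    ViewEvent("paper.pdf", "Related work on contextual retrieval", 0.0, 120.0),
    ViewEvent("proposal.doc", "Literature review draft", 60.0, 300.0),
    ViewEvent("paper.pdf", "Evaluation section", 300.0, 360.0),
]

print([e.source for e in visible_at(history, 90.0)])
print(time_per_source(history))
```

Storing content and visibility interval together in one record is what lets a single history answer both "what was I reading at this moment" and "how long did I spend on this source".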

This is well suited to address a need highlighted by recent studies on information retrieval, which show that the history of our interaction with information plays a fundamental and useful role in our recollection of that information [5, 15, 25]. For example, having recently read a useful fact from a research paper while writing a literature review for a grant proposal, a user may want to refer to that paper again but remember neither its location nor any specific keywords with which a search query could be issued. On the other hand, they may remember contextual, timing aspects of their interaction with the document, such as having read it within the week prior to the proposal deadline, having skimmed the document (e.g., spending under 10 minutes reading it, or only having read certain parts), or having used it contemporaneously with the grant proposal within which they wrote about the lost document.

This recollection of temporal events is very nuanced and personal; yet existing systems and applications, such as browsers, email clients, and filesystems, remain coarse and one-dimensional in supporting it. Although some research systems have addressed this limitation by supporting time from the ground up, they lack generalized applicability as they involve either a dramatic overhaul of existing systems (e.g., [17, 28]), or application-specific adaptations [49]. Our work captures the best of both approaches, being an application-agnostic adaptation of existing applications and filesystems to support rich time-awareness; in essence, migrating the state of the art from proof of concept to usable implementation. Further, our approach does not adhere to a strict definition of a file: where existing systems treat information by distinct file types (e.g., web pages, emails, or PDFs), Passages’s tracing at the text level captures information interaction without rigid, foreknown types. Hence, our approach is useful in new interaction contexts such as web-based application interaction, where traditional document definitions do not apply.

We designed the Passages system to capture the user’s text stream and transform it into a rich, fine-grained, application-agnostic, information-interaction history for use in information retrieval solutions. Passages can answer questions that are not easily answered by existing systems, such as, “which of these conference papers have I not yet thoroughly read?”, “which documents did I read when writing this literature review section?”, “what functions was I working on before I committed this code?”, and “which documents did I spend the most time on the month before the grant deadline?” In this chapter, we detail and address the challenges involved in transforming the raw text stream into a form from which useful temporal information can be drawn. We show that these methods are accurate, efficient, and substantially improve upon existing systems.
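To make the flavor of such questions concrete, the following sketch expresses two of them as interval queries over a toy interaction history. The tuple layout, the "read" and "edit" labels, the function names, and the sample data are all hypothetical and are not the Passages query interface. The 600-second threshold mirrors the "under 10 minutes" example of skimming given earlier.

```python
from collections import defaultdict

# Each event is (source, kind, start, end), with times in seconds.
# All names and values here are illustrative assumptions.
events = [
    ("review.tex", "edit", 100, 500),
    ("paperA.pdf", "read", 150, 200),
    ("news.html", "read", 450, 520),
    ("paperB.pdf", "read", 600, 2000),
    ("paperA.pdf", "read", 2100, 3000),
]


def overlaps(s1, e1, s2, e2):
    """True if the half-open intervals [s1, e1) and [s2, e2) intersect."""
    return s1 < e2 and s2 < e1


def read_while_editing(events, target):
    """Sources read during any interval in which `target` was being edited."""
    edit_spans = [(s, e) for src, kind, s, e in events
                  if src == target and kind == "edit"]
    found = set()
    for src, kind, s, e in events:
        if kind == "read" and src != target:
            if any(overlaps(s, e, es, ee) for es, ee in edit_spans):
                found.add(src)
    return sorted(found)


def skimmed(events, max_seconds=600):
    """Sources whose total reading time is positive but below max_seconds."""
    totals = defaultdict(float)
    for src, kind, s, e in events:
        if kind == "read":
            totals[src] += e - s
    return sorted(src for src, total in totals.items()
                  if 0 < total < max_seconds)


print(read_while_editing(events, "review.tex"))
print(skimmed(events))
```

Both queries reduce to arithmetic on visibility intervals, which is why the history must record when text was viewed and not only what was viewed.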