Document search and retrieval

Top PDF document search and retrieval:

Geometric and Structural-based Symbol Spotting. Application to Focused Retrieval in Graphic Document Collections

Symbol spotting systems are intended to produce a ranked list of regions of interest, cropped from the document images stored in the database, where the queried symbol is likely to be found. Symbol spotting can thus be seen as a particular application within the Information Retrieval (IR) domain. Retrieval systems are usually evaluated by precision and recall ratios, which give an idea of the relevance and the completeness of the results (we will briefly review these measures in section 7.3). These basic measures can be complemented with many other indicators depending on the application. For instance, Lu et al. [LSS07] evaluate a set of desktop search engines by deriving a set of ratios from precision and recall that indicate the abilities of the systems when incrementally retrieving documents. Müller et al. [MMS01] evaluate content-based image retrieval systems, proposing strategies to account for how the number of items stored in the collection affects the results and how user feedback can improve the response of such systems. Kang et al. [KKL04] evaluate a text retrieval system that uses semantic indexing, focusing on the distribution and number of key-indices used to index the database. Finally, [HWH07, NBM06] present performance analyses of information retrieval systems whose information is distributed over a peer-to-peer network (P2PIR), taking into account the query response time, the network resource requirements and the tradeoff between distributed and centralized systems. As we can see, the information retrieval field is so broad that, even though researchers use similar indicators to evaluate the performance of their methods, no general evaluation framework can be defined. In our case, we will also base our measures on the notions of precision and recall, adapting them to the recognition and location abilities that spotting systems should present.
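
As a minimal sketch of how precision and recall can be adapted to location as well as recognition, the Python functions below count a retrieved region as correct only if it overlaps a ground-truth occurrence of the symbol. The intersection-over-union (IoU) criterion, the 0.5 threshold and the greedy one-to-one matching are assumptions chosen for illustration; they are not necessarily the measures defined in section 7.3.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def spotting_precision_recall(retrieved, ground_truth, threshold=0.5):
    """Precision and recall where a retrieved region is a hit only if it overlaps
    a not-yet-matched ground-truth region with IoU >= threshold (assumed rule)."""
    matched = set()
    hits = 0
    for box in retrieved:  # regions in ranked order
        for j, gt in enumerate(ground_truth):
            if j not in matched and iou(box, gt) >= threshold:
                matched.add(j)  # each ground-truth region can be found only once
                hits += 1
                break
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

For example, `spotting_precision_recall([(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 10, 10)])` returns `(0.5, 1.0)`: one of the two retrieved regions is correct, and the only ground-truth occurrence is found.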

Uses and applications of georeferencing and geolocation in old cartographic and photographic document management

Applied to the field of audio-visual heritage, geolocation makes it possible to locate geographically the old photographs belonging to an institution, as well as any other type of painting, drawing, brochure or historical poster. Likewise, it allows the search engines of photographic collections to be optimised: in the case of old photographs, users search by theme or, above all, by place, rather than by the author or the title of the photograph. If the photographs have been previously geolocated, their access and retrieval will be significantly simplified, for example through a map search. This would also facilitate the study of the evolution of a place or of areas of a city through its photographs, confirming the benefits of such techniques and programs in relation to old cartography and photography.
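
The map search mentioned above can be illustrated, under simple assumptions, as a bounding-box filter over geolocated records. The `Photo` record, the `map_search` helper, the titles and the coordinates are hypothetical; a real collection would more likely rely on a spatial index or a GIS database.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    title: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def map_search(photos, south, west, north, east):
    """Return the photos whose coordinates fall inside the visible map area.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    return [p for p in photos if south <= p.lat <= north and west <= p.lon <= east]


# Hypothetical catalogue entries with illustrative coordinates
catalogue = [
    Photo("Hypothetical view of a harbour", 41.38, 2.18),
    Photo("Hypothetical city square", 40.42, -3.70),
]
hits = map_search(catalogue, south=41.3, west=2.0, north=41.5, east=2.3)
```

Here `hits` contains only the harbour photograph, since the second record lies outside the box.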

Interactive video retrieval

… include Nearest Neighbors (NN), Naïve Bayes (NB), Decision Tree (DT), Sparse Network of Winnows (SNoW) and Support Vector Machines, are tested on the automatic question classification task (Zhang and Lee, 2003). Kang and Kim (2003) classify user queries into three categories (the topic relevance task, the homepage finding task and the service finding task) using various statistics from the query words. Different linear weights for text information and hyperlink information are then assigned according to the query category to improve web document retrieval. A similar idea can naturally be extended to video retrieval. Rong, Yang, and Hauptmann (2004) proposed using query-class dependent weights within a hierarchical mixture-of-experts framework to combine multiple retrieval results. First, they classify each search task defined by TRECVID 2003 into one of four predefined categories: Named person (P-query), queries for finding a named person, possibly with certain actions; Named object (E-query), queries for a specific object with a unique name that distinguishes it from other objects of the same type; General object (O-query), queries for a certain type of object; and Scene (S-query), queries depicting a scene with multiple types of objects in certain spatial relationships (Rong, Yang, and Hauptmann et […]
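
A minimal Python sketch of query-class dependent fusion follows. It replaces the hierarchical mixture-of-experts with a fixed, hand-written weight table, so the experts (`text`, `face`, `color`) and every weight are hypothetical; only the four query classes come from the text.

```python
# Hypothetical per-class weights for three retrieval experts. The framework in the
# text embeds class-dependent weights in a hierarchical mixture-of-experts; here
# they are simply fixed by hand for illustration.
CLASS_WEIGHTS = {
    "P": {"text": 0.3, "face": 0.6, "color": 0.1},  # named person
    "E": {"text": 0.7, "face": 0.0, "color": 0.3},  # named object
    "O": {"text": 0.5, "face": 0.0, "color": 0.5},  # general object
    "S": {"text": 0.4, "face": 0.0, "color": 0.6},  # scene
}


def combine(scores_by_expert, query_class):
    """Fuse per-expert shot scores with weights chosen by the query's class.

    scores_by_expert maps an expert name to {shot_id: score}; missing scores count as 0.
    Returns shot ids sorted from best to worst fused score.
    """
    weights = CLASS_WEIGHTS[query_class]
    shots = set()
    for scores in scores_by_expert.values():
        shots.update(scores)
    fused = {
        shot: sum(w * scores_by_expert.get(expert, {}).get(shot, 0.0)
                  for expert, w in weights.items())
        for shot in shots
    }
    return sorted(fused, key=fused.get, reverse=True)
```

With text scores `{"a": 1.0, "b": 0.0}` and face scores `{"a": 0.0, "b": 1.0}`, a P-query ranks `b` first while an O-query ranks `a` first, which is the point of making the weights depend on the query class.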

Exploiting user context and preferences for intelligent web search

The first query terms generated for a Web search may not provide the definitive results. However, comparing the set of search results with the user task can help to refine subsequent queries automatically. As a first approximation, we assume that documents similar to the user context are relevant to the user task (although a different scheme could be adopted [2, 11, 8]). Once the relevance of the retrieved material has been estimated, we can proceed to assess the importance of the terms found in the set of search results. This requires a framework for weighting terms based on context. Substantial experimental evidence supports the effectiveness of using weights to reflect relative term importance in traditional information retrieval (IR) [10]. The main purpose of a term weighting system is to enhance retrieval effectiveness. The IR community has investigated the roles of terms as descriptors and discriminators for several decades. The combination of descriptors and discriminators gives rise to schemes for measuring term relevance, such as the familiar term frequency-inverse document frequency (TF-IDF) weighting model [10]. The TF-IDF scheme is a reasonable measure of term importance but is insufficient for the task domain of our research. Searching the Web to support context-based retrieval presents new challenges for the formulation of descriptors and discriminators.
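
For reference, the baseline TF-IDF weighting that the authors consider insufficient can be sketched as follows, with term frequency acting as the descriptor component and inverse document frequency as the discriminator component. This uses one common variant (length-normalized tf, idf = log(N/df)); the context-based weighting proposed in the text is not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length   (descriptor role)
    idf = log(N / document frequency)    (discriminator role)
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # docs containing each term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights
```

A term that occurs in every document, such as `web` in `[["web", "search", "search"], ["web", "context"]]`, gets weight 0 because log(2/2) = 0, while `search` in the first document gets (2/3) · log 2.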

Intelligent retrieval for biodiversity

… head and the dependent of a new link we must define. To do this, the user first selects the future head and then establishes the dependency simply by dragging the mouse to the dependent. To distinguish links of this kind from those associated with the original bg, their graphical representation is slightly different, as we can see in Fig. 12. We are now ready to launch the new search simply by pressing the button, which yields the set of answers shown in the documentary panel of Fig. 13. In contrast to the original query, the system now covers the whole spectrum of answers, including five approximate ones in the highest positions of the list. Likewise, the conceptual panel now shows the concepts and relations involved in the projection associated with the visual query with respect to the document Loxograma latifolia. The picture not only allows us to contextualize the search process, but also to see that the answer is an exact one, since no transformations are needed to build the projection from the conceptual representation of the query in the conceptual panel of Fig. 12 to the text one in Fig. 13.

Word Embeddings and Length Normalization for Document Ranking

Abstract: Distributed word representation techniques have been effectively integrated into the Information Retrieval task. The most basic approach is to map the document and the query words into a vector space and calculate the semantic similarity between them. However, this approach is biased by document length: it ranks short documents higher than documents with a larger vocabulary. When a document is represented by averaging its word vectors, each word contributes equally, which increases the distance between the query and document vectors. In this paper, we propose applying document length normalization to address this length bias in embedding-based ranking. To this end, we present an experiment applying traditional length normalization techniques to a word2vec (Skip-gram) model trained on the TREC Blog06 dataset for ad-hoc retrieval tasks. We also include relevance signals by introducing a simple Linear Ranking (LR) function, which treats the presence of query words in a document as evidence of relevance during ranking. Our combined method of length normalization and LR significantly increases Mean Average Precision, by up to 47% over a simple embedding-based baseline.
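
The length bias and its correction can be illustrated with a small sketch. The abstract does not give the exact formulas, so the following is an assumption: the document vector is the sum of its word vectors divided by a pivoted length (in the style of pivoted length normalization) instead of by its raw length, and the Linear Ranking signal is taken to be the fraction of query words present in the document. The `slope` and `alpha` defaults are hypothetical.

```python
import math


def unit_mean_vector(words, emb):
    """Mean of the available word vectors, rescaled to unit length (query side)."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else None


def pivoted_doc_vector(doc, emb, avg_len, slope=0.2):
    """Sum of word vectors divided by a pivoted length instead of the raw length.

    slope=1.0 reproduces plain averaging; slope < 1.0 shrinks documents shorter than
    avg_len and enlarges longer ones relative to the plain average.
    """
    vecs = [emb[w] for w in doc if w in emb]
    if not vecs:
        return None
    divisor = (1.0 - slope) * avg_len + slope * len(vecs)
    return [sum(col) / divisor for col in zip(*vecs)]


def score(query, doc, emb, avg_len, slope=0.2, alpha=0.5):
    """Hypothetical combined score: length-normalized embedding match plus a
    linear 'query word present' relevance signal."""
    q = unit_mean_vector(query, emb)
    d = pivoted_doc_vector(doc, emb, avg_len, slope)
    semantic = sum(a * b for a, b in zip(q, d)) if q and d else 0.0
    terms = set(query)
    presence = len(terms & set(doc)) / len(terms) if terms else 0.0
    return alpha * semantic + (1.0 - alpha) * presence
```

With `emb = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}` and `avg_len=2`, plain averaging (`slope=1.0`) scores the one-word document `["cat"]` above `["cat", "cat", "cat", "dog"]`, whereas `slope=0.2` reverses that order.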

Image Retrieval: The MIRACLE Approach

The proposed experiments were designed to retrieve the relevant images of the collection using different query languages, and therefore had to deal with monolingual and bilingual image retrieval (multilingual retrieval is not possible, as the document collection is in only one language). Although the current ImageCLEF task has clear limitations, both in the size of the collection and in the number of possible experiments (six: one monolingual and five bilingual), it represents an interesting starting point for assessing the performance of cross-language information retrieval (CLIR) systems in both monolingual and bilingual searches, and for promoting research in this information retrieval field.

Search in

Alejandro Gaona | Search in. The work consists of five canvases and a sculpture with images taken from my life. They are arranged in ascending order, as a nod to the drawing of human evolution, drawing a comparison with the evolution of oneself. The images on the canvases are painted with different techniques: the human figures are done in flat inks, and the space is built with square elements using a more diluted and looser brushstroke. One element unites the work into a whole: some bars of color that allude to the television screens of the eighties and nineties (Mira rainbow). I took this element as a common thread for several reasons. First, because it is a memory I have from childhood of seeing it on television; also, they are the colors of light, used in the production of […]

Spectrographic phase retrieval algorithm for femtosecond and attosecond pulses with frequency gaps

Spectral phase interferometry for direct electric-field reconstruction (SPIDER) [1] and frequency-resolved optical gating (FROG) [2] are the most popular pulse reconstruction techniques. The chief obstacle to successfully characterizing pulses with frequency gaps is the appearance of relative-phase ambiguities [3]. Typical examples are soliton molecules in optical fibers [4] and supercontinua generated in highly nonlinear germanosilicate bulk fibers [5]. To overcome this problem, non-self-referenced variants of FROG and SPIDER, such as XFROG [6] and XSPIDER [7], can be applied. However, because of the significance of the relative-phase ambiguity problem, a solution for self-referenced methods was sought. Indeed, in 2010, Walmsley et al. found a solution to the problem for SPIDER, a self-referenced non-spectrographic technique [8]. Other self-referenced interferometric techniques, such as blind-MEFISTO [9], also solve the problem. Recently, it was shown that interferometric and non-interferometric pulse-measurement techniques react very differently to an unstable train of ultrashort pulses [10]. A self-referenced spectrographic but non-interferometric technique that solves the relative-phase ambiguity problem is called "Very Advanced Method for Phase and Intensity Retrieval of E-fields" (VAMPIRE) [4,11]. This technique is based on the blind-FROG scheme [12,13], but contains a conditioning filter to prevent ambiguities. It was first employed to characterize soliton molecules [4], and in 2010, it […]

Identifying Information Behavior in Information Search and Retrieval Through Learning Activities Using an E-learning Platform Case : Interamerican School of Library and Information Science at the University of Antioquia (Medellín-Colombia)

Within these curriculum plans, ICT manifests itself as a thematic cluster (the ICT cluster) within the administrative and academic organizational structure of the U of A. Professors are responsible for including ICT components within their specific courses or within modules of those courses. These courses or modules need not be directly attached to the ICT cluster, but should be closely related to the application of ICT in different theoretical-conceptual and practical aspects of information and knowledge management. This pattern […]

document

Different specialized literatures show various ways of defining indicators, as well as different recommendations about their application and methodology; after an analysis […]

Job Satisfaction and On-the-Job Search: A Theoretical and Empirical Approach

The results reproduce some usual findings in the economic literature. First, job satisfaction differs by gender, with a distinct effect for women (Clark, 1997). Second, there is a well-defined U-shaped profile between age and job satisfaction (Clark et al., 1996), with the minimum of this convex relationship at around 38 years of age. Regarding employment characteristics, the signs of the correlations for wage (positive) and number of working hours (negative) are as expected. All the coefficients associated with variables indicating subjective valuations of job aspects show the expected signs, with job stability and good relationships with managers having the largest marginal effects.
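
Assuming, as is common in this literature but not stated in the excerpt, that the U-shape is captured by a quadratic term in age (in the latent satisfaction index, if the model is an ordered-response one), the reported minimum pins down the ratio of the two age coefficients:

```latex
S_i = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{age}_i^2
      + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i,
\qquad \beta_2 > 0 \ \text{(convex)}

\frac{\partial S_i}{\partial\,\mathrm{age}_i} = \beta_1 + 2\beta_2\,\mathrm{age}_i = 0
\;\Longrightarrow\;
\mathrm{age}^{*} = -\frac{\beta_1}{2\beta_2} \approx 38
\;\Longrightarrow\;
\beta_1 \approx -76\,\beta_2 < 0
```

The excerpt does not report the coefficients themselves, so only this ratio can be inferred.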

Fast and private web search using Android devices

To process XML, the servlet uses the classes “MyDom4jParser” and “MessageCreatorServer”. These classes in the “xml” package use the Dom4j library to parse or transform XML documents into in-memory objects, taking into account the XML Schema definition of the UUP protocol messages in the file “uup.xsd”. The servlet uses this XML Schema Definition file to check the validity of the UUP protocol XML messages in the version of this work. This file is defined in Appendix B and must be accessible from the servlet. It also allows these documents to be manipulated and modified. The “MessageCreatorServer” class is responsible for building the XML response messages. For the cryptographic operations, the “Generator” class was created, based on the “BigInteger” class from “java.math”. This Generator class makes it possible to generate the prime integer parameter p and the parameter g. All of this is explained in the Cryptography section below.
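
The text does not say how p and g are related, so the following Python sketch rests on an assumption common in discrete-logarithm schemes: p is a safe prime (p = 2q + 1 with q prime) and g generates the subgroup of order q. It mirrors, rather than reproduces, the Java `Generator` class; Java's `BigInteger.probablePrime` would play the role of the hand-written Miller-Rabin test here. The function names are hypothetical, and the bit sizes used below are far too small for real security.

```python
import random
import secrets


def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    d, r = n - 1, 0
    while d % 2 == 0:  # write n - 1 = d * 2**r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def generate_p_g(bits=64):
    """Return (p, g): a safe prime p = 2q + 1 and a generator g of the order-q subgroup."""
    while True:
        # Odd (bits-1)-bit candidate for q, top bit forced so p has exactly `bits` bits
        q = secrets.randbits(bits - 1) | (1 << (bits - 2)) | 1
        p = 2 * q + 1
        if is_probable_prime(q) and is_probable_prime(p):
            break
    h = 2 + secrets.randbelow(p - 3)  # h in [2, p - 2], so h is not +1 or -1 mod p
    g = pow(h, 2, p)  # a square other than 1, hence of order q
    return p, g
```

For instance, `p, g = generate_p_g(32)` returns a 32-bit safe prime and a generator; production code would use much larger parameters from a vetted cryptographic library.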

MIRACLE’s Combination of Visual and Textual Queries for Medical Images Retrieval

Our main interest is not in experiments where only image content is used in the retrieval process. Instead, our challenge was to test whether text-based image retrieval could improve the analysis of image content, or vice versa. The results show that this hypothesis was right. Our combination of a "black-box" search using a publicly accessible content-based retrieval engine with a text-based search turned out to provide results comparable to those of other, presumably "more complex", techniques. This simplicity may be a good starting point for implementing a real system.
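
The combination described above is a form of late fusion. As a hedged illustration, and not necessarily the MIRACLE team's exact procedure, the sketch below min-max normalizes the scores returned by each engine and merges them with a linear weight; the `w_text` value and the image ids are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(visual, textual, w_text=0.5):
    """Weighted late fusion of a content-based and a text-based result list.

    Images missing from one list get a normalized score of 0 for that list.
    """
    v, t = min_max(visual), min_max(textual)
    fused = {i: w_text * t.get(i, 0.0) + (1.0 - w_text) * v.get(i, 0.0)
             for i in set(v) | set(t)}
    return sorted(fused, key=fused.get, reverse=True)
```

With visual scores `{"img1": 10.0, "img2": 0.0}`, textual scores `{"img1": 0.0, "img2": 0.9, "img3": 0.3}` and `w_text=0.7`, the fused ranking is `["img2", "img1", "img3"]`. Normalizing first keeps one engine's score scale from dominating the other.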

A structural and quantitative analysis of the web of linked data and its components to perform retrieval data

According to Butt et al. (2015), the process of data retrieval is complex and has been divided into several steps. The following figure, taken from that paper, explains the process: the boxes are the steps of the process and the arrows show how the data flows. The first step, “Data Acquisition”, one of the most important, consists of using structured Semantic Web data crawlers to crawl data (e.g., Van de Maele et al., 2008; Isele et al., 2010). The aim is to obtain linked data as quickly and efficiently as possible. Then “Data Warehousing” is used to define which data the user is interested in and to automate its extraction, transformation and loading. Between the first and the second step we find the “Reasoning” process. This step is necessary because new data is sometimes inferred from the crawled data using reasoners, for example (Haarslev & Möller, 2003) or (Glimm et al., 2014). Once the data has been stored, the process continues by returning results to the user; however, because the amounts of data are very large, response times could become infeasible. To solve this problem, data is stored using a URI as its key, and it is also decided where on disk to store the information. These kinds of techniques are called “Indexing” or “Ranking”; the difference is that ranking tries to return the results most appropriate to the user's query. After that, the data is available to be retrieved. To access the data, applications provide a user interface where users write their queries. These queries go through a validation process and are then used to access the data that the user wants to retrieve.
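
The storage step, in which data is stored under a URI key and made searchable, can be sketched as a toy in-memory index. The class, the example URIs under `example.org` and the rule that treats `http(s)://` objects as links rather than text are assumptions for illustration; the FOAF predicate URIs come from the real FOAF vocabulary. Ranking is left out.

```python
from collections import defaultdict


class LinkedDataIndex:
    """Toy store: triples keyed by subject URI plus an inverted index over literals."""

    def __init__(self):
        self.by_subject = defaultdict(list)  # subject URI -> [(predicate, object)]
        self.term_index = defaultdict(set)   # lower-cased word -> {subject URI}

    def add(self, subject, predicate, obj):
        self.by_subject[subject].append((predicate, obj))
        # Crude assumption: objects that look like HTTP(S) URIs are links, not text
        if not obj.startswith(("http://", "https://")):
            for word in obj.lower().split():
                self.term_index[word].add(subject)

    def lookup(self, uri):
        """Direct access by key: every (predicate, object) pair stored for a subject."""
        return list(self.by_subject.get(uri, []))

    def search(self, query):
        """Subjects whose literals contain every query word (boolean AND)."""
        postings = [self.term_index.get(w, set()) for w in query.lower().split()]
        return set.intersection(*postings) if postings else set()


idx = LinkedDataIndex()
idx.add("http://example.org/ada", "http://xmlns.com/foaf/0.1/name", "Ada Lovelace")
idx.add("http://example.org/ada", "http://xmlns.com/foaf/0.1/knows", "http://example.org/babbage")
```

Here `idx.search("ada")` finds the subject through its name literal, whereas `idx.search("ada babbage")` finds nothing, because the `knows` object was stored as a link and never indexed as text.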

Universidad, ciencia y formación investigativa / Universities, science and search training

To address this topic, a temporal approach will be followed, placing the emergence of the university as an institution and some of its most important transformations […]

Hierarchical region based processing of images and video sequences: application to filtering, segmentation and information retrieval

In recent years, image processing has become an increasingly active field of development, driven by the ever-growing number of multimedia applications and by the increasing use of the World Wide Web. In most image processing applications, an image is viewed as a set of pixels placed on a rectangular grid, and most of the work developed so far has been based on this point of view. One of the main difficulties that arise when working directly with a pixel-based representation is the large number of pixels composing the image or video sequence, so only rather simple algorithms can be applied to them. The classical DPCM coding strategy is an example of this type of processing. Moreover, in the JPEG [98] and MPEG-1 [30] standards, also devoted to coding, the image or frame to be coded is divided into blocks of constant size without taking into account the spatial organization of the image. In recent years, there has been increasing interest in interpreting the image as a set of arbitrarily shaped regions or objects: the image is no longer understood as a set of pixels, but as a set of regions. This has led, for instance, to the development of the MPEG-4 [13] and MPEG-7 [83] standards, both of which interpret images or video sequences as sets of audiovisual objects. New image representations, not based on pixels, should therefore be developed in which the notion of region is implicitly present; in other words, an abstraction from pixels to regions is needed. Moreover, appropriate techniques able to deal with these representations should be developed.
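
To make the contrast between block-based and region-based representations concrete, the sketch below partitions a toy image into fixed-size blocks, as block-based coders do, and alternatively into flat zones (4-connected regions of identical value), one simple way of moving from pixels to regions. Using flat zones here is an illustrative choice, not a claim about the method developed in this work.

```python
from collections import deque


def fixed_blocks(height, width, size):
    """Top-left corners of the fixed-size blocks a block-based coder would use."""
    return [(y, x) for y in range(0, height, size) for x in range(0, width, size)]


def flat_zones(image):
    """Label the 4-connected regions of identical pixel value.

    Returns (labels, count), where labels[y][x] is the region index of each pixel.
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != -1:
                continue  # pixel already belongs to a region
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and image[ny][nx] == image[cy][cx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count
```

For a 4 by 4 image whose left half is 0 and right half is 1, `fixed_blocks(4, 4, 2)` yields four 2 by 2 blocks regardless of content, whereas `flat_zones` finds the two regions that actually make up the image.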

Ontology based intellectual properties (IP) search indexing and evaluation

The preceding sections conclude the system specification. Defining the system variabilities is an optional step in the process, and it defines the product line for the modeled system: for instance, removing some requirements or adapting the specific functionality of blocks would create a vast universe of systems in which the specification and modeling are reused. In the SHRMS, displaying the measurements locally may be desired on a building, but not on an aircraft; for a bridge we are only interested in capturing small vertical accelerations (a 1-axis accelerometer), whereas building oscillations are of interest along all axes. Product line architects must define at least two additional models: a system variability model and a specific feature model. The first refers to optional requirements and the second to terminal block variants, i.e., different encryption algorithms, communication standards, accelerometer types, etc.
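
The variability model and the site-specific choices can be sketched as enumerating every configuration the model allows and then filtering by deployment site. The `SHRMSConfig` class, the variation points and the placeholder encryption names are hypothetical; only the site constraints (no local display on an aircraft, a 1-axis accelerometer for a bridge, all axes for a building) come from the text.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SHRMSConfig:
    local_display: bool      # optional requirement
    accelerometer_axes: int  # terminal block variant
    encryption: str          # terminal block variant (placeholder names)


# Hypothetical variability model: the values each variation point may take
VARIATION_POINTS = {
    "local_display": (True, False),
    "accelerometer_axes": (1, 3),
    "encryption": ("cipher_a", "cipher_b"),
}


def product_line():
    """Every configuration the variability model allows (2 x 2 x 2 = 8)."""
    names = list(VARIATION_POINTS)
    return [SHRMSConfig(**dict(zip(names, combo)))
            for combo in product(*VARIATION_POINTS.values())]


def valid_for(site, cfg):
    """Site constraints taken from the examples in the text."""
    if site == "aircraft" and cfg.local_display:
        return False  # local display not wanted on an aircraft
    if site == "bridge" and cfg.accelerometer_axes != 1:
        return False  # only small vertical accelerations matter
    if site == "building" and cfg.accelerometer_axes != 3:
        return False  # oscillations are of interest along all axes
    return True
```

Filtering `product_line()` with `valid_for("bridge", cfg)` leaves the four 1-axis configurations, which differ only in the display and encryption choices that the bridge site leaves open.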

Introducción a bases de datos y recuperación de la información / Introduction to Databases and Information retrieval

Both the student's ability and skill in analyzing and solving the theoretical and practical problems applied in the course will be assessed. Consideration will be given to […]

The Informative Documentation and Retrieval of Written Information. New Competences for Cyberspace

Thus, while the print press archive is consulted less and less by journalists to meet their information needs, owing to the direct access to sources that Int[…]
