• No se han encontrado resultados

Comparación critica con la literatura existente

From the experiment results, it was found that the cataloging time was

significantly reduced with the proposed visual enhancement. The results also indicated that the system has the potential to improve metadata quality, although the increase of measured scores did not achieve a significance level of 0.05. The absence of significance

in metadata quality improvement may be explained by several reasons. First, catalogers are very familiar with the current system but it was their first time using the proposed system. Trust and job-related outcome expectation have been seen as important constructs of users’ acceptance and usage behavior of a technology (Venkatesh, et al., 2003; Cody-Allen & Kishore, 2006; Lee & Lei, 2007). The uncertainty about the new system and Health Topics it suggested, and the potential future consequence may all have an impact on their performances. To reduce the distrust and anxiety about the possible outcome brought by the new system, a follow-up study should be conducted after users become familiar with it, and fully understand how it works and what would probably happen to their work if the visual enhancement is really implemented. Second, catalogers at NC Health Info have, in the past, sought a consensus on topic metadata via iterative annotations between each other (Blake, et al, 2005). The experiment forced them to make the decision at once on their own, which may influence their decision making. Third, topic pairings which are common and established combinations of Local Terms and Health Topics developed by catalogers are frequently used in NC Health Info to facilitate the cataloging process. The current study only focused on the assignment of Health Topics due to a pressing need to reduce the time required for generation and maintenance of this type of metadata. Although the lack of Local Terms was one of reasons for the lack of insignificance in quality improvement, it noted the importance to employ the associations of Local Terms and Health Topics in the next step. Last, the suggested Health Topics were not complete due to accidental human errors and difficulties handling semantic issues, such as a general topic “Pain” versus a specific topic “Back Pain”, and implicit concepts unidentifiable with the current approach. In addition, users were limited to select only from Health Topics suggested when using the proposed system. The above mentioned conditions impeded them from some relevant Health Topics which were not recognized by the system. The identification and highlighting of potentially relevant Health Topics were executed manually in this study due to time limitation and in order to determine a proof-of-concept. A sophisticated automatic algorithm or system should be introduced and work as the back-end support of the proposed interface. Furthermore, users should be allowed to assign Health Topics which are not suggested by the system and to recommend new pairings of related Local Terms and Health Topics. Several

functions of the current system such as topic definitions look-up and Health Topics in the same group browse should also be integrated into the proposed system for their reference.

Although the quality of Health Topic assignment was not enhanced significantly in the study, it seems that there is a potential to achieve the goal once the noted

limitations are addressed. Moreover, important facts are found in the results. During the experiment period, it was proved that the important concepts relevant to the collection usually gather on several and certain types of pages, such as services and programs, and via presenting only the important pages and highlighting all the potentially relevant Health Topics could be very helpful to improve the assignment performance

quantitatively. Considering the cataloging task could be iterative and tedious, a second saved is a second earned. This time saving could eventually contribute to a cost savings and better service. Although some common Health Topics such as “Medicine” and “Pain” are often suggested and may contribute “noise” to relevance judgments when only occurrence frequencies were considered in the study, participants responded that these kinds of common topics only required a glance in order to make the decision with the visual enhancement. They further indicated that important concepts would not be missed even if related Health Topics occur only once or twice because all possibly relevant Health Topics are suggested. Although human catalogers tend to make false negative or false positive errors, they are able to identify implicit concepts related to the collection, such as the examples of “Heart Disease in Women” and “Financial Assistance” in the last section. Human errors may be alleviated via training, and it would be more cost effective and quality assured than to develop and employ a fully automatic system.

5.1.2 Perceptions

The proposed visual enhancement was also reviewed from user’s point of view. The results demonstrated notable changes with three perception measurements PEOL, PEOU, and PU, although they did not reach the significant level. Again, the lack of significance in perception improvement may be attributed to the fact that catalogers are very familiar with the current system and they have no difficulty manipulating it now, while the proposed system is totally new to them. Nevertheless, the average perceptions of three measurements toward the proposed system were rated higher, and all participants

were very positive about it. From their responses in the post-test questionnaire and the quotations in the table below (Table 8), it is confirmed that the visual enhancement proposed in the study is helpful to reduce catalogers’ cognitive and work load when performing Health Topic assignment and the implementation of the proposed interface is welcome.

“It is really exciting!”

“It was very helpful to see how they (the terms) were used (in context).” “They (the suggested terms) are already there for me.”

“I don’t have to think hard or spend too much time.” “There’s no need to go back and forth.”

“It is the least effort.”

Table 8. Quotations from Open-Ended Discussions

5.2 Limitations

A number of limitations stemming from practical research constraints have been mentioned above, such as uncertainty and anxiety of the possible outcome resulted from the proposed system, the difference between the experiment and the real world, the lack of pairings with Local Terms and a supportive automatic algorithm/system, and the missing useful functions originally provided by the current system. Additional limitations noted here are as follows: Firstly, only four catalogers participated in the study due to knowledge and experience requirement, and only twelve resources selected from the same level of difficulty assigned with no more than twenty Health Topics were cataloged. More catalogers should be involved and resources at different levels of difficulty should be tested. A longitudinal study should be able to more realistically reflect the effects of the proposed visual enhancement for metadata generation. Secondly, although catalogers were not required explicitly to participate and they all gave positive responses to the study and the system, the work environment setting is mandatory itself. Willingness to volunteer is also one of major constructs to users’ intention and usage behavior of a technology (Sharp, 2007). Thirdly, the questions used to measure users perceptions were mostly adopted from the previous NC Health Info usability study (Ellington, 2004). Although these questions were simple and straightforward, they may not have allowed the research to fully study users’ perceptions toward the two systems, given that they

were developed for a previous study. To closely approach users’ perceptions, multi- dimensional questions should be developed and tested instead. The additional limitations noted here also stem from practical research constraints and could be addressed in future research.

5.3 Recommendations

This study proposes a series of recommendations based on the research, including the experiment results. The recommendations consider the visual enhancement for

metadata generation tools and automatic algorithms/systems to be implemented as the back-end support are proposed. First, catalogers’ identification and relevance judgment of Health Topics should be further studied to improve the automatic algorithms. The

occurrence frequency of Health Topic alone is not sufficient to extract the implicit concepts specifically needed by the collection. Secondly, the effect of presenting the distribution of occurrence frequencies remains unknown since some participants like it and some neglected it. There is a need to clarify what information would assist in relevance judgment and how it should be presented in addition to the application of KWIC and highlighting. Third, even general Health Topics only take users a few seconds to decide whether they are worth further examination, it may be helpful to give different weights according to comparative generalization of Health Topics. For example, “Back Pain” should receive higher weight than “Pain”. Lastly, special care should be taken to implement the automatic algorithm/system to support the proposed interface, especially while calculating occurence frequencies of Health Topics and executing the highlighting. For instance, if “Back Pain” is detected and counted, the last half of it which is “Pain” should not be identified as the general term “Pain”. Also, be careful not to break the original HTML structure when making highlighting effects. Only contents within <body> and </body> and also outside of HTML tags should be processed. It is hoped that some of these suggestions could be pursued in the future.

6 Conclusions

In conclusion, the proposed visual enhancement via KWIC and highlighting combining human intelligence and automated technology was able to reduce the

assignment time for Health Topics in NC Health Info, The experiment results suggested a potential for metadata quality improvement. Users’ perceptions toward the new system were positive, indicating that the implementation of the semi-automatic approach is welcome. Future studies should be conducted to address the noted limitations and move this research further, in order to improve metadata generation within NC Health Info and other Go Local initiatives.

Acknowledgements

The author would like to thank Dr. Jane Greenberg, and NC Health Info director, Christie Silbajoris for their unceasing encouragement and constructive suggestions; and Beth Ellington, Linda Johnsen, and Megan Rudolph for their participation, time, and valuable feedback.

Bibliography

Anderson, J.D., & Perez-Carballo, J. (2001). The nature of indexing: how humans and machines analyze messages and texts for retrieval. part I: research, and the nature of human indexing. Information Processing and Management, 37(2), 231-254. Ardö, A., & Koch, T. (1999). Automatic classification applied to the full-text Internet

documents in a robot-generated subject index. Online Information 99: 23rd International Online Information Meeting Proceedings, 239-246.

Attardi, G., Gulli, A., & Sebastiani, F. (1999). Automatic Web page categorization by link and context analysis. Proceedings of THAI'99, European Symposium on Telematics, Hypermedia and Artificial Intelligence, 105-119.

Baldonado, M.Q.W., & Winograd, T. (1998). Hi-Cites: dynamically created citations with active highlighting. Proceedings of the SIGCHI Conference on Human

Factors in Computing Systems, 408-415.

Banerjee, K. (1997). Describing remote electronic documents in the online catalog: current issues. Cataloging & Classification Quarterly, 25(1), 5-20.

Blake, C., et al. (2005). Cataloging on-line health information: a content analysis of the NC Health Info portal. Proceedings of the Annual American Medical Informatics Association Conference (AMIA 2005), 56-60.

Brown, T.J. (1991). Visual display highlighting and information extraction. Proceedings of the Human Factors Soceity 35th Annual Meeting, 1427-1431.

Byrd, D. (1999). A scrollbar-based visualization for document navigation. Proceedings of the Fourth ACM Conference on Digital Libraries, 122-129.

Calvo, R.A., Lee, J.M., & Li, X. (2004). Managing content with automatic document classification. Journal of Digital Information, 5(2). Retrieved November 2, 2007, from http://jodi.tamu.edu/Articles/v05/i02/Calvo/

Chi, E.H., et al. (2004). eBooks with indexes that reorganize conceptually. CHI Extended Abstracts 2004, 1223-1226.

Cho, W.C., & Richards, D. (2004). BayesTH-MCRDR algorithm for automatic

classification of Web document. Lecture Notes in Computer Science, 3339, 344- 356.

Ciravegna, F., & Wilks, Y. (2003). Designing adaptive information extraction for the Semantic Web in Amilcare. In S. Handschuh & S. Staab (Eds.), Annotation for the Semantic Web. IOS Press, Amsterdam, The Netherland

Cody-Allen, E., & Kishore, R. (2006). An extension of the UTAUT model with e-quality, trust, and satisfaction constructs. Proceedings of the 2006 ACM SIGMIS CPR Conference on Computer Personnel Research, 82-89.

Ellington, B. (2004). An evaluation of cataloging input interface tools for NC Health Info System. Unpublished manuscript.

Ellis, D., & Vasconcelos, A. (1999). Ranganathan and the Net:using facet analysis to search and organize the World Wide Web. Aslib proceedings, 51(1), 3-10. Erdmann, M., et al. (2000). From manual to semi-automatic semantic annotation: about

ontology-based text annotation tools. Proceedings of the COLING 2000 Workshop on Semantic Annotation and Intelligent Content, 79-86.

Feldman, R., Dagan, I. &, Hirsh, H. (1998). Mining text using keyword distributions. Journal of Intelligent Information Systems, 10, 281-300.

Flannert, M.R. (1995). Cataloging Internet resources. Bulletin of the Medical Library Association, 83(2), 211-215.

Fox, S. (2005). Health Information online. Pew Internet & American Life Project. Retrieved April 3, 2008, from

http://www.pewinternet.org/pdfs/PIP_Healthtopics_May05.pdf

Fox, S. (2006). Online health search 2006. Pew Internet & American Life Project. Retrieved April 3, 2008, from

http://www.pewinternet.org/pdfs/pip_online_health_2006.pdf

Fox, S., & Fallows, D. (2003). Internet health resources. Pew Internet & American Life Project. Retrieved April 3, 2008, from

http://www.pewinternet.org/pdfs/PIP_Health_Report_July_2003.pdf

Fox, S., & Rainie, L. (2000). The online health care revolution. Pew Internet & American Life Project. Retrieved April 3, 2008, from

http://www.pewinternet.org/pdfs/PIP_Health_Report.pdf

Garofalakis, M.N., et al. (1999). Data mining and the Web: past, present and future. Proceedings of the 2nd International Workshop on Web Information and Data Management, 43-47.

Gietz, P. (2001). Report on automatic classification systems. Retrieved November 2, 2007, from http://www.daasi.de/reports/Report-automatic-classification.html

Golub, K., & Ardö, A. (2005). Importance of HTML structural elements and metadata in automated subject classification. Proceedings of ECDL 2005 - the 9th European

Conference on Research and Advanced Technology for Digital Libraries, 368-378. Greenberg, J. (2003). Metadata and the World Wide Web. In M.S. Drake (Ed.),

Encyclopedia of Library and Information Science (2nd ed., pp.1876-1888). New York: Marcel Dekker, Inc.

Greenberg, J. (2004). Metadata extraction and harvesting: a comparison of two automatic metadata generation applications. Journal of Internet cataloging, 6(4), 59-82. Greenberg, J., et al. (2002). Iterative design of metadata creation tools for resource authors. Retrieved November 3, 2007, from

http://www.siderean.com/dc2003/202_Paper82-color-NEW.pdf

Greenberg, J., Spurgin, K., & Crystal, A. (2006). Functionalities for automatic metadata generation applications: a survey of metadata experts' opinions. International

Journal of Metadata, Semantic and Ontologies, 1(1), 3-20.

Harper, D.J., Coulthard, S., & Sun, Y. (2002). A language modeling approach to relevance profiling for document browsing. Proceedings of the 2nd ACM/IEEE- CS joint conference on Digital libraries, 76-83.

Hearst, M.A. (1995). TileBars: visualization of term distribution information in full text information access. Proceedings of ACM Human Factors in Computing Systems Conference (CHI ’95), 59-66.

Hilligoss, B., & Silbajoris, C. (2004). MedlinePlus Goes Local in NC: the development and implementation of NC Health Info. Journal of Consumer Health on the Internet, 8(4), 2004.

Hirokawa, S., Itoh, E., & Miyahara, T. (2003). Semi-automatic construction of metadata from a series of web documents. Proceedings of Australian conference on

artificial intelligence, AI-03, Lecture Notes in Computer Science, 2903, 942-953. Hussam, A., et al. (1998). Semantic highlighting. CHI 98: Conference Summary on

Human Factors in Computing Systems, 191-192.

Irvin, K.M. (2003). Comparing information retrieval effectiveness of different metadata generation methods. Unpublished master’s thesis, University of North Carolina-

Chapel Hill.

Jenkins, C., et al. (1998). Automatic classification of Web resources using Java and Dewey Decimal Classification. Computer Networks & ISDN Systems, 30, 646- 648.

Käki, M. (2006). fKWIC: frequency-based keyword in context index for filtering web search results. Journal of the American Society for Information Science and Technology, 57(12), 1606-1615.

Koch, T., & Vizine-Goetz, D. (1999). Automatic classification and content navigation support for Web services: DESIRE II cooperates with OCLC. OCLC. Retrieved October 29, 2007, from

http://digitalarchive.oclc.org/da/ViewObject.jsp?objid=0000003489

Koehler, W.C. (1999). Classifying Web sites and Web pages: the use of metrics and URL characteristics as markers. Journal of Librarianship and Information Science, 31(1), 21-31.

Kris, C. (2007.) A dynamic learning object life cycle and its implications for automatic metadata generation. Unpublished doctoral dissertation, Katholieke Universiteit

Leuven, Leuven, Belgium.

Lawrence, S., & Giles, C. (1999). Accessibility of information on the Web. Nature, 400, 107-109.

Lee, C.B.P., & Lei, U.L.E. (2007). Adoption of e-government services in Macao. Proceedings of the 1st International Conference on Theory and Practice of

Electronic Governance, 217-220.

Liddy, E.D., et al. (2002). Automatic metadata generation & evaluation. Retrieved November 5, 2007, from

http://web.syr.edu/~diekemar/ruimte/Papers/p401liddy.pdf

Lin, S.H., et al. (1998). Extracting classification knowledge of Internet documents with mining term associations: a semantic approach. Research and Development in Information Retrieval, SIGIR, 241-249.

Litvak, M., Last, M., & Kisilevich, S. (2007). Classification of Web documents using concept extraction from ontologies. Lecture Notes in Computer Science, 4476, 287-292.

Loia, V., & Luongo, P. (2001). An evolutionary approach to automatic Web page

categorization and updating. Lecture Notes in Artificial Intelligence, International Conference on Web Intelligence(WI), 477-478.

Madria, S.K., et al. (1999). Research issues in Web data mining. Proceeding of Data Warehousing and Knowledge Discovery, First International Conference, DaWaK1999, 303-312.

Marchionini, G. (1995). Information Seeking in Electronic Environment. Cambridge: Cambridge University Press.

Marchiori, M. (1998). The limits of Web metadata, and beyond. Computer Networks and ISDN Systems, 30, 1-9.

Mase, H. (1998). Experiments on automatic web page categorization for IR system. Technical report, Stanford University, California. Retrieved October 15, 2007, from http://citeseer.ist.psu.edu/cache/papers/cs/1719/http:zSzzSzwww-

db.stanford.eduzSzpubzSzpaperszSzhisao.pdf/mase98experiments.pdf

Meire, M., Ochoa, X., & Duval, E. (2007). SAmgI: Automatic Metadata Generation v2.0. Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007, 1195-1204).

Muller, et al. (1999). Automatic classification of the World-Wide Web using the Universal Decimal Classification. Proceedings of the 23rd International Online Information Meeting (Online99), 231-237.

NC Health Info. (2003). NC Health Info documentation: NC Health Info cataloging training manual. Unpublished manuscript.

Nielsen, J. (2005). Ten usability heuristics. useit.com. Retrieved March 3, 2008, from http://www.useit.com/papers/heuristic/heuristic_list.html

Noh, S., et al. (2003). Classifying Web pages using adaptive ontology. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2144-2149. Pandia. (2007, February 25). The size of the World Wide Web. Pandia Search Engine News. Retrieved March 3, 2008, from http://www.pandia.com/sew/383-web- size.html.

Pew Internet & American Life Project. (2008). Retrieved March 15, 2008, from http://www.pewinternet.org/PPF/c/5/topics.asp

Sedano, J.M. (1964). Keyword-in-context (KWIC) indexing: background, statistical evaluation, pros and cons, applications. Unpublished doctoral thesis, Pittsburgh University. Retrieved March 28, 2008, from

http://stinet.dtic.mil/cgibin/GetTRDoc?AD=AD443912&Location=U2&doc=Get TRDoc.pdf

Sharp, J.H. (2007). Development, extension, and application: a review of the technology acceptance model. Information Systems Education Journal, 5(9), 1-11.

Song, M.H. (2006). Ontology-based automatic classification of Web documents. Lecture Notes in Computer Science, 4114, 690-700.

Spink, A., et al. (2001). Searching the Web: the public and their queries. Journal of the

Documento similar