Laburpena
Txosten honen helburu nagusia Euskararen Datu-Base Lexikalaren (EDBL) diseinu berria aurkeztea da. Datu-base hori hizkuntzaren tratamendu automatikorako euskarri lexikal orokorra da. Aurkezten dugun eskemak bere baitan ditu, gure iritziz, LNPrako datu-base lexikal batek izan behar lituzkeen ezaugarriak.
EDBLren diseinua azkenekoz aldatu zenetik (Agirre et al., 94) bost urte joan dira, eta epe horretan aurkituriko akatsak eta hobetu beharrak gauzatzera dator proposamen berri hau. Bestalde, interfaze-aplikazioa ere zaharkituta geratu da, eta gaur egungo programek erabili ohi dituzten ahalmen grafikoak izango lituzkeen interfazea eman nahi diogu datu-baseari; areago, EDBL interneteratzea helburu dugun momentu honetan.
Aurreko EDBLrekiko alderik nabarmenenak hauexek lirateke:
Eskema kontzeptual berritua, non hiru espezializazio nagusi bereizten baitira, hiru aspekturi erreparatuz: unitatearen estandartasun-estatusa, unitatea hiztegi-sarrera den ala ez, eta unitatearen grafian zuriunerik dagoen ala ez. Bestalde, eskema berrian aipatzekoak lirateke, besteak beste: lema kanonikoaren kontzeptu berria, aldaera edo errore tipikoen tratamendu zehatzagoa, morfotaktikari dagokion atal zuzendu eta aberastua, hitz anitzeko unitate lexikalak errepresentatzeko eredu berria, eta mantenimenduko taulen hobekuntza.
Interfaze berri eta moderno baten proposamena.
Informazioaren esportazioa SGMLz egiteko proposamena eta definizio formalak.
Abstract
This report presents the new design of the Lexical Database for Basque (EDBL). This database is the general lexical support for the automatic treatment of the language. The schema introduced here brings together, in the opinion of the authors, the characteristics that a lexical database for NLP purposes should have.
Five years have passed since the design of EDBL was last changed (Agirre et al., 94), and the present proposal aims to correct the flaws found during that period and to make the improvements that proved necessary. Moreover, the interface application has become outdated, and we would like to provide the database with an interface that offers the graphical capabilities present-day programs usually have, all the more so now that one of our goals is to make EDBL available on the Internet.
The most notable differences between the new design and the "old EDBL" are the following:
A renewed conceptual schema with three main specialisations, defined according to three different aspects: whether the unit is standard Basque or not, whether it is a dictionary entry or not, and whether its spelling includes blanks or not. Other aspects of the new schema that deserve mention include: the newly introduced concept of canonical lemma, a more precise treatment of variants and typical errors, a corrected and enriched section on morphotactics, a new model for representing multiword lexical units, and an improved design of the support tables.
A proposal for a new, modern interface.
A proposal for exporting the data in SGML, together with the necessary formal definitions.