
There are far fewer studies on subtitling MOOCs by MT than on subtitling films and TV shows by MT, perhaps because MOOCs have only become popular in the past ten years. Although the number of relevant projects is small, the technology deployed for MT in MOOCs is advanced. For example, as in Georgakopoulou's study (2012), Orlič et al. (2014) investigated ASR technology in their project named transLectures (Transcription and Translation of Video Lectures). This project was designed to develop advanced automatic transcription and MT (between Slovene and English) for two websites containing video lectures. They focused in particular on IT tools that can help improve the quality and speed of automatic transcription and translation. Engaged in the same project as Orlič et al., Miró et al. (2015) evaluated the efficiency of manually reviewing automatic subtitles and compared it with the conventional generation of video subtitles from scratch through transLectures. They recruited lecturers and volunteers as reviewers and evaluated two aspects: the quality of the automatic transcription or translation, and the time the reviewers spent reviewing the automatic subtitles. There were four tasks: 1) review of Spanish transcriptions; 2) review of English transcriptions; 3) review of Catalan transcriptions; and 4) review of Spanish to English translations. Volunteers who were non-native speakers of English carried out task 2, and lecturers carried out the other three tasks. The metrics adopted for the user evaluations were WER (word error rate), TER (translation edit rate) and RTF (real time factor). TER is determined by the minimum number of edits required to transform an MT output into its corrected version. RTF measured the speed of monitoring the transcriptions of a video (lecturers were asked to monitor the automatic transcriptions for their videos: they played the videos with the automatic transcriptions, and whenever they spotted an error they could correct it, at which point the video was paused automatically), while WER measured the accuracy of the automatic transcriptions (see the formulas below for further explanation).

RTF = time devoted to reviewing the transcription or translation of a video / duration of the video

WER = number of basic word editing operations required to convert the automatic transcription into the correct reviewed transcription / total number of words in the reviewed transcription

TER = number of edit operations required to convert the automatic translation into the correct reviewed translation / total number of words in the reviewed translation
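The WER and RTF ratios above can be illustrated with a minimal Python sketch. It is not the tooling used by Miró et al.; the function names (`word_edit_distance`, `wer`, `rtf`) and the whitespace tokenisation are assumptions made for illustration. WER is computed here with a standard word-level Levenshtein distance (insertions, deletions, substitutions). TER is left out because the commonly used TER metric also counts shifts of word sequences as edits, which this sketch does not implement.

```python
def word_edit_distance(hyp_words, ref_words):
    """Word-level Levenshtein distance between two lists of tokens."""
    # prev[j] = distance between the hypothesis words seen so far and ref_words[:j]
    prev = list(range(len(ref_words) + 1))
    for i, hyp_word in enumerate(hyp_words, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref_words, start=1):
            substitution_cost = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,                      # delete a hypothesis word
                curr[j - 1] + 1,                  # insert a reference word
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def wer(automatic, reviewed):
    """Edits needed to turn the automatic text into the reviewed text,
    divided by the number of words in the reviewed text."""
    ref_words = reviewed.split()  # assumption: simple whitespace tokenisation
    if not ref_words:
        raise ValueError("reviewed transcription must contain at least one word")
    return word_edit_distance(automatic.split(), ref_words) / len(ref_words)


def rtf(review_seconds, video_seconds):
    """Time spent reviewing divided by the duration of the video."""
    if video_seconds <= 0:
        raise ValueError("video duration must be positive")
    return review_seconds / video_seconds


if __name__ == "__main__":
    # Hypothetical example strings, not data from the study.
    print(wer("hello word", "hello world"))
    print(rtf(1200, 600))
```

Under these definitions, an RTF of 2.0 means the review took twice as long as the video itself, and a WER of 0.5 means that half as many word edits as there are words in the reviewed transcription were needed.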

Miró et al. found that reviewing automatic subtitles (transcriptions and translations) in all three languages took significantly less time than transcribing and translating manually. No difference in quality was found between the reviewed automatic subtitles and those produced from scratch.

Another project that used ASR and MT for lectures is a 30-month (Feb 2014 to Jul 2016) pilot action named EMMA (European Multiple MOOC Aggregator),34 which provides a system for delivering free MOOCs from various European universities. Its website supports nine languages (Catalan, Dutch, English, Estonian, French, Italian, Portuguese, Spanish, and Polish). The project provides automatic transcription in seven languages (Dutch, English, Estonian, French, Italian, Portuguese, and Spanish) and MT into English, Italian and Spanish. The transcriptions and translations are reviewed by lecturers so as to reach publishable quality. Most MOOCs in the project are bilingual (original language plus English) or trilingual (Italian, French or Spanish as a third language). Miró et al. (2018) claim that both the transLectures and EMMA projects have shown that domain-adapted ASR systems have reached a mature level, making it possible to generate low-cost automatic subtitles of (nearly) publishable quality in most cases. Their quality and efficiency evaluations for the EMMA project and the UPV media repository (a large video lecture repository developed by Universitat Politècnica de València) support this claim. As in the transLectures project mentioned at the beginning of this section (Miró et al., 2015), they measured the transcription quality using WER, the translation quality using TER, and the post-editing time for automatic transcriptions and translations using RTF. Results show that, compared to generating subtitles from scratch, the ASR/MT systems can save 25% to 75% of the time, and that the machine translated subtitles are accurate enough to be worth post-editing. The multilingual subtitles of MOOCs have a positive impact on attracting students and have increased the student enrolment rate on EMMA by 70%.

The latest study on machine translated subtitles of MOOCs is the TraMOOC project,35 carried out jointly by academic researchers, industrial organizations and user partners. The project has built a system named Translexy36 to automatically translate English-language MOOCs into 11 languages (Bulgarian, Chinese, Croatian, Czech, Dutch, German, Greek, Italian, Polish, Portuguese, Russian). Kordoni et al. (2016) point out that crowdsourcing was used in the project both for quality evaluation and for creating parallel corpora. The evaluation of output quality in TraMOOC relies on a multimodal schema that involves error type markup, an error taxonomy for translation model comparison, explicit evaluation (automatic and human evaluation), and implicit evaluation (topic identification, which focuses on topical information elements such as named entities, events and specific terms in source and target texts, and sentiment analysis, which extracts users' opinions about the MT text from forums), among other components. Kordoni et al. indicate that by using this model, more and better in-domain parallel data can be fed back to the translation engine in order to improve output quality. In order to create parallel corpora for the educational domain, Behnke et al. (2018) report that they applied crowdsourcing with strict quality controls. They found that systems using their crowdsourced data consistently outperformed both the general-domain baseline systems and the systems fine-tuned with pre-existing in-domain corpora.

35 https://tramooc.eu (Accessed: 5 December 2019)
36 http://www.translexy.com (Accessed: 5 December 2019)

A unique feature of TraMOOC is that, in addition to automatically translating subtitles, it also translates other types of MOOC text, such as forums, assignments and reading materials. The TraMOOC platform supports a wide range of file formats, including SRT, WebVTT, SCC, Microsoft PPT and XML, to name just a few. The platform has already been successfully integrated into and tested in the openHPI MOOC platform37 and the VideoLectures.NET library.38 The first MOOC that was automatically translated by the platform from English into the abovementioned 11 languages is entitled "In-Memory Data Management 2017" and offered by Prof. Hasso Plattner,39 in which over 7,000 students had enrolled by March 2019.

37 https://open.hpi.de (Accessed: 5 December 2019)
38 http://videolectures.net (Accessed: 11 March 2019)
