
There are five main formats that are used to create IMOs on the Web today:

1. Flash: Adobe Flash applications are the most widespread type of interactive applications on the Web today. 96 % of all Web browsers have a Flash plugin (OWL, 2012) and are therefore able to show Flash applets. The files are SWF files, which can also be indexed using a special parser that Adobe developed for search engines (Adobe, 2008). Using this parser, search engine bots can simulate users and virtually click through the content of the application. This way, they are also able to read text and extract images from the Flash file. Since HTML 5 is trying to provide all the features of Flash using an open standard, Flash can now be exported to HTML 5 Canvas as well.

2. Silverlight: Silverlight is Microsoft's counterpart to Adobe's Flash. Browsers need a special plugin in order to show the applications, but only roughly 60 % of all Web browsers have this plugin installed (OWL, 2012). The application is packaged in a XAP file, which contains an application markup file (XAML), the application itself (DLL), and additional contents such as images and videos. Search engines can also look into these XAP files, read the XML, and find the additional contents. The developer, however, decides how much information is put into these files to make it easier for search engines.

3. Java Applets: Java Applets are Java-based applications that run in the Java Virtual Machine on computers that have a Java runtime installed. Java is widespread: approximately 79 % of computers have a Java runtime. Still, Java applets are in the minority and are also quite difficult for search engines to index. The code is packaged as CLASS files inside a JAR file. Similar to Java applets, JavaFX runs on any client that can host the Java Virtual Machine. The focus of JavaFX is platform independence, so the applications can run on desktop computers, smart phones, and even some television sets.

1

GSMArena uses little 3D applets for many of the reviewed phones: http://www.gsmarena.com/samsung_galaxy_ace_s5830-3d-spin-3724.php, last accessed on 25th of March 2012

2

For example, for the movie Ice Age a series of games have been developed: http://www.y8.com/games/Ice_Age_Dawn_of_the_Dinosaurs, last accessed on 25th of March 2012

Extraction of Interactive Multimedia Objects and Images 165

4. QuickTime Applets: Apple’s QuickTime is a multimedia architecture consisting of a framework, an API, and a data format. QuickTime applications can embed multimedia content and interaction features. The file formats are usually MOV, QT, and QTVR, and search bots do not have easy access to the file contents. Additionally, only about 59 % of the browsers had QuickTime installed (OWL, 2012).

5. HTML 5: HTML 5 is a set of new markup elements that aims to replace the need for proprietary applets such as Flash and Silverlight. Modern browsers already support large parts of the standard (see footnote 3), and no plugin is necessary to show the applications. The most important tag in HTML 5 is the canvas, which gives a developer a surface to draw on. This way, features such as video playback, which is still the main application for Flash, can be provided without proprietary tools. HTML 5 objects are easier to index since all the parts that make up the application are on the Web server (or reachable on another one) and no proprietary formats such as Flash need to be employed. The division of the actual application into several files such as JavaScript and Cascading Style Sheets, however, also poses a problem for our purpose of finding the "object" and referring to it. An HTML 5 application is essentially an HTML page.
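The file formats named in the list above can be turned into a simple lookup for guessing the format of an IMO candidate from its URL. The following is only a minimal sketch: the table `EXTENSION_TO_FORMAT` and the function `guess_imo_format` are hypothetical names introduced for illustration, and the extension check is an assumption, not the detection method used in this work. HTML 5 applications are deliberately absent, since they consist of an ordinary HTML page plus several supporting files rather than one identifiable file.

```python
import posixpath
from urllib.parse import urlparse

# File extensions mentioned in the format list above. Using them as a
# detection heuristic is an illustrative assumption.
EXTENSION_TO_FORMAT = {
    ".swf": "Flash",
    ".xap": "Silverlight",
    ".jar": "Java applet",
    ".class": "Java applet",
    ".mov": "QuickTime",
    ".qt": "QuickTime",
    ".qtvr": "QuickTime",
}


def guess_imo_format(url):
    """Return the format suggested by the URL's file extension, or None."""
    # Only the path is inspected, so query strings such as "?v=2" are ignored.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_TO_FORMAT.get(extension)
```

A URL that ends in an unknown extension, or in none at all, yields `None`, which a caller could treat as "not an IMO candidate or possibly HTML 5".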

8.1.2 Related Work

In general, we can distinguish between two approaches to finding and extracting IMOs: context-based and content-based. Searching the context of an IMO candidate can give more information about which topic or entity is presented in the IMO. For example, when we detect an IMO on a Web page, we can read the text around the IMO to get a better understanding of what the IMO is about. WebSeer (Frankel et al., 1996) indexes images using terms found in HTML headlines, ALT-tags, hyperlinks, and other context features. The content-based approach, on the other hand, does not always work, since an IMO's content is difficult to search when it is delivered in binary form. Yang et al. (2005) and Meng and Liu (2008) have shown, however, that searching the contents of Flash applets can be rewarding. They were able to find components such as images and videos, and could also detect user interaction components such as buttons and scrollbars.
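The context-based idea of reading the text around an IMO candidate can be sketched with Python's standard `html.parser` module. This is an illustration under stated assumptions, not the extraction procedure of this work or of WebSeer: the tag set `IMO_TAGS`, the class `ContextCollector`, the function `collect_context`, and the `window` parameter are all hypothetical. As a simplification, the text inside script and style elements is not filtered out.

```python
from html.parser import HTMLParser

# Tags assumed to mark an IMO candidate (an illustrative choice).
IMO_TAGS = {"object", "embed", "applet", "canvas"}


class ContextCollector(HTMLParser):
    """Records text chunks in document order and where IMO tags occur."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []  # text and ALT values, in document order
        self.candidates = []   # (tag, source, index into text_chunks)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in IMO_TAGS:
            source = (attributes.get("src") or attributes.get("data")
                      or attributes.get("code"))
            self.candidates.append((tag, source, len(self.text_chunks)))
        # ALT text is treated as context, as in the WebSeer description.
        if attributes.get("alt"):
            self.text_chunks.append(attributes["alt"])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_chunks.append(text)


def collect_context(html, window=2):
    """Return each candidate with up to `window` text chunks on either side."""
    parser = ContextCollector()
    parser.feed(html)
    parser.close()
    results = []
    for tag, source, position in parser.candidates:
        before = parser.text_chunks[max(0, position - window):position]
        after = parser.text_chunks[position:position + window]
        results.append({"tag": tag, "source": source,
                        "context": before + after})
    return results
```

The collected chunks could then be matched against entity names such as a phone model; how that matching is done is outside this sketch.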

Since it is not possible to apply generic content-based extraction techniques to all the main formats we have described, we use a context-based approach with only minor content-based features for Flash. We manually searched Google for eight entities for each of the concepts Mobile Phone, Printer, Movie, Car, and Headphone to find out how IMOs can be found manually and which formats dominate. As a result, we discovered that Flash is the dominant format for IMOs. For all 40 entities we found at least one Flash IMO (Werner, 2010).

3

Wikipedia keeps a compatibility chart up to date: http://en.wikipedia.org/wiki/Comparison_of_layout_engines_(HTML5), last accessed on 25th of March 2012
