In total, this research analysed the content of 166 course sections in the case LMS. This accounts for approximately 11% of all courses at NTNU [88]. In total, 3483 stand-alone documents were published from these courses. These documents are referred to as “the final dataset”.
The pre-study showed that much content is reused when a course is offered multiple times, e.g. in the spring of 2005 and later in the spring of 2006. To avoid duplicate documents created by the same publishers, this research excluded courses with identical course names, with only the most recently offered course analysed. In addition, courses that were related to this research were excluded from the final dataset. In total, 32 courses were excluded from analysis due to these two issues. This includes some courses that were used in the pre-study phase.
General statistics
The final dataset consisted of 3483 documents. There were a total of 41 different stand-alone document formats that had been published. Of these, three document formats dominated the statistics: Adobe PDF documents (1943 documents, 55.8%), MS Word (DOC) (745 documents, 21.4%) and MS PowerPoint (PPT and PPS) (475 documents, 13.6%). These document formats comprised 91% of the documents in the dataset. This research effort was thus concentrated on these document formats.
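The shares quoted above follow directly from the document counts. As a minimal sketch, the arithmetic can be reproduced as follows; the variable and function names are illustrative and not part of the original study.

```python
# Counts as reported in the text; the names are illustrative only.
TOTAL_DOCUMENTS = 3483
format_counts = {"PDF": 1943, "DOC": 745, "PPT/PPS": 475}


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: share(count, TOTAL_DOCUMENTS) for name, count in format_counts.items()}
combined = share(sum(format_counts.values()), TOTAL_DOCUMENTS)

print(shares)    # per-format share of the final dataset
print(combined)  # share covered by the three dominant formats together
```

The combined share of the three formats rounds to the 91% stated in the text.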
Figure 43: Stand-alone document format types (number of documents for each document format)
Figure 44: Stand-alone document types based on the document formats' primary usage area and Dublin Core “Types”
Figure 44 shows the published document types based on the types as presented in Chapter 4.2.1.
Videos are included in the LMS only to a very limited extent. Instead there is extensive use of hyperlinks to external video sources. To create such references, the LMS “Link” document type is frequently used.
In total, 164 different element types with entities were harvested from the documents in the final dataset. These elements were located in the sections presented in Table 10. A number of these elements reflected the same issue of interest. For example, at least 5 elements reflected the “Title” element (footnote 13). Even when duplicate elements described the same document, their entities were not necessarily identical. This issue is further discussed in Chapter 4.3.
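To illustrate how duplicate elements can disagree, the sketch below groups the “Title” variants named in footnote 13 and checks whether a document's entities agree. It is only an assumed illustration: the dictionary layout, the function names and the sample record are hypothetical and do not describe the harvesting tool used in this research.

```python
# Element names taken from footnote 13; everything else is hypothetical.
TITLE_ALIASES = {"DC. Title", "iptcbylinetitle", "PDF. Title", "Title", "XAP. Title"}


def title_entities(elements):
    """Return the distinct non-empty entities stored under any title alias."""
    return {
        value.strip()
        for name, value in elements.items()
        if name in TITLE_ALIASES and value.strip()
    }


def titles_conflict(elements):
    """True when a document carries more than one distinct title entity."""
    return len(title_entities(elements)) > 1


# Hypothetical record for one document (element name -> entity).
example = {
    "Title": "Lecture 3",
    "PDF. Title": "Lecture 3",
    "DC. Title": "untitled.doc",
    "Author": "N.N.",
}

print(title_entities(example))
print(titles_conflict(example))
```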
Table 10: Recorded elements
Element section     Number of elements
General elements    33
Dublin Core         5
EXIF                40
IPTC                12
PDF                 11
PDFX                15
Photoshop           4
RDF                 8
TIFF                11
XAP                 21
All stand-alone documents were given at least seven elements regardless of document content, as discussed in Chapter 4.2.1. The average document contained 21.35 elements, with a maximum of 61 elements collected from a single document. The majority of documents contained between 16 and 30 elements, see Figure 45. The use of elements varied extensively between document formats. The PDF, Word, JPEG and PowerPoint document formats contained the greatest number of elements, see Table 11.
13 “DC. Title,” “iptcbylinetitle,” “PDF. Title,” “Title” and “XAP. Title” plus possibly “Name.”
Figure 45: Number of metadata elements collected per stand-alone document
Table 11: Number of elements per stand-alone document format
Statistic  PDF   Word  PowerPoint  Excel  JPEG  TIFF  GIF  TXT  HTML  All formats
Minimum    10    18    11          7      8     8     8    8    8     7
Mode       26    21    19          12     8     8     8    8    8     26
Maximum    60    24    24          15     61    8     8    8    10    61
Average    23.6  21.3  19.1        11.8   20.5  8.0   8.0  8.0  8.2   21.4
Median     26    21    19          12     8     8     8    8    8     21
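The rows of Table 11 are standard descriptive statistics over the number of elements per document. A minimal sketch of how such a row could be computed is given below, assuming the per-document element counts for one format are available as a list; the sample values are invented for illustration and are not taken from the dataset.

```python
from statistics import mean, median, mode


def summarise(element_counts):
    """Compute the five statistics used as rows in Table 11."""
    return {
        "Minimum": min(element_counts),
        "Mode": mode(element_counts),
        "Maximum": max(element_counts),
        "Average": round(mean(element_counts), 1),
        "Median": median(element_counts),
    }


# Invented element counts for five documents of one format.
sample_counts = [8, 8, 8, 10, 26]
summary = summarise(sample_counts)
print(summary)
```

A few documents with many elements raise the average well above the mode and median, which is the pattern the JPEG column of Table 11 shows.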
In addition to the elements presented in Table 10, the most common elements were “Author” (76.8%), “Pages” (75.9%) and “Title” (73.8%). These elements are all common to multiple document format schemas. A number of the sub-schema sections presented in Table 10 refer exclusively to technical issues. For example, the EXIF, IPTC, Photoshop and TIFF sections only contained content referring to photo-technical properties. The majority of sub-schemas were located in JPEG images and PDF documents. The TIFF document format offers the same opportunities for metadata description as JPEG. Still, TIFF documents contained barely more than the minimum number of metadata elements, and no TIFF images contained TIFF metadata (!). Only specific PDF documents contained TIFF metadata (PDF documents can contain fully fledged TIFF and JPEG images). The entities in these technical sections included issues such as the camera brand, shutter speed, white balance and colour settings. Because this research cannot verify these entities, the elements have not been included in the analysis efforts.
[Figure 45 chart: “Number of elements per file”. Horizontal axis: number of elements, in bins from 6-10 to 61-65. Vertical axis: number of files, 0 to 1400.]
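The distribution in Figure 45 groups documents into five-element bins (6-10, 11-15, and so on up to 61-65). The sketch below shows one way such a tally could be produced from a list of per-document element counts. The function names and the sample input are assumptions made for illustration; the bin boundaries are those shown on the figure's axis.

```python
from collections import Counter

BIN_WIDTH = 5        # each bin spans five element counts
FIRST_BIN_START = 6  # the first bin on the axis is 6-10


def bin_label(n):
    """Return the Figure 45 bin label (e.g. "26-30") for an element count."""
    if n < FIRST_BIN_START:
        raise ValueError("element count below the first bin")
    lower = FIRST_BIN_START + ((n - FIRST_BIN_START) // BIN_WIDTH) * BIN_WIDTH
    return f"{lower}-{lower + BIN_WIDTH - 1}"


def histogram(element_counts):
    """Count how many documents fall into each bin."""
    return Counter(bin_label(n) for n in element_counts)


# Invented element counts for four documents.
print(histogram([7, 8, 26, 61]))
```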
Use of elements that differs from the pre-study dataset
The “Description” element was not found in the pre-study dataset, although it was used in the final dataset. The first chapter presents these observations. The “Keywords” element was observed in PDF documents that presented commercial content. The second chapter describes other observations regarding this element based on the final dataset. More document formats were observed using identifiers. These observations are presented in the third chapter. The other elements analysed in the pre-study were in line with the final dataset. These observations are not presented, as they would largely duplicate the pre-study results.
The Description element
The final dataset contained a number of elements that reflected the “Description” element in the IEEE LOM schema and the “Subject” element in Dublin Core (footnote 14). One Word document (0.1%) contained a “Comments” element whose entity was a date, with no other information provided. This limits the usability of the element, since there is insufficient information to interpret the data. The date was not identical to any of the other embedded data elements. The document was based on an official NTNU template that does not contain this entity. This indicates that the user specified the entity, though it is not possible for this research to determine what it refers to.
Nineteen PowerPoint documents (4.0%) contained a “Comments” element. These all referred to the document templates upon which the documents were based.
Twenty-four PDF documents (1.2%) contained a “DC. Description” element:
• Five entities were valid entities created by the user. These entities contained keywords from the subject at hand.
• Fifteen documents contained entities that were number codes (e.g. 725-403) or default values (e.g. WithoutName-7). These documents were created using the “Adobe PageMaker 7.0” application. The number code entities were all identical to the “Title,” “PDF. Title,” “Subject” and “DC. Subject” elements. No templates were recorded for these documents. However, based on extensive visual similarities, it appears that these documents were based on the same template, which was a building legislation template. Only one section in these documents was visually the same as the “DC. Description” entity. This section contained only strictly standardized number codes. The variations discovered in the “DC. Description” element were not found in this section. This researcher concludes that the user has specified this element, although it is not possible to conclude which element was the original or correct element. All the documents were renamed to receive standardized document names based on the number codes (e.g. “725403”).
• Three documents contained the entity “Image,” which was automatically recorded by a scanner application.
• One document contained an entity with content intended for other elements (footnote 15). This has been recognized as a problem for PDF documents created using the PDF converter application included in the Mac OS X operating system. It results when the converter application specifies metadata that are not in accordance with the PDF standard.
14 Elements: “Comments,” “Notes,” “PDFX Comments,” “DC Description,” “XAP Description,” “Subject,” “DC Subject,” “PDF Subject,” “PDFX EmailSubject,” “Category,” “iptccaption” and “iptcbyline.”
Eighteen PDF documents (0.9%) contained a “PDF. Subject” element.
• Fifteen documents that were created using “Adobe PageMaker 7.0” contained entities identical to their “DC. Description” element.
• The three remaining documents were created using the most common PDF creator application “Acrobat Distiller 5.0 (Windows).” These “PDF. Subject” entities contained keywords derived from the subject at hand. One of these documents was created using a non-standardized driver. This was the only document that contained a “PDF. Subject” element, but no “DC. Subject” element.
A single PDF document (0.1%) contained a “PDFX. Comments” element. Its entity was an extensive description of the actions performed by the user. This entity was not repeated in any other element, not even the “PDF. Comments” element. The document was created with a commonly used application and application driver.
Keywords
Thirteen Word documents (1.7%) contained a “Keywords” element. All these elements referred to the document template that was used.
Seven PDF documents (0.4%) contained a “Keywords” element. All these elements referred either to the document template that was used or to commercial content from the converter application.
Two PDF documents (0.1%) contained a “PDF. Keywords” element. These elements referred to the document template used, and were identical to the “Keywords” element.
These observations confirm that users do not use the embedded “Keywords” element. The element is instead used to distribute template information and commercial content. As the entities did not describe the documents at hand in accordance with common metadata schemas, the embedded entities related to “Keywords” elements are of very low semantic quality.
15 “Capturefile: C:\Documents and Settings\Administrator\Desktop\New England\1D\
38AB1307.TIF, CaptureSN: 0000138A.014829”
Identifier
In the pre-study dataset, identifiers were only located in PDF documents. In the final dataset identifiers were located in selected JPEG and PSD image documents as well, as shown in Table 12. The percentage of use among PDF documents is almost the same in the pre-study and final datasets.
Table 12: Identifiers within stand-alone documents (both datasets)
Format  rdfabout  xapMMDocumentID
JPEG    1.6%      35.5%
PDF     33.3%     53.6%
PSD     0.0%      100.0%