Document templates are the basis for creating stand-alone documents in both the user-controlled environment and the system-controlled environment. The two environments differ in their use of content creation software, the number of available document templates and how strictly the template is enforced.
Figure 32: Blank Word template
Figure 33: Blank PowerPoint template
Figure 34: NTNU lecture slide PowerPoint template
Figure 35: NTNU thesis PowerPoint template
In the user-controlled environment it is up to the individual user to decide which content creation software to use and how to use these tools to generate the document, including its metadata, formatting and intellectual content. Templates may or may not include visual content. Figure 32 shows the MS Word default template “blank.dot,” which has no visual content, while Figure 33 shows the MS PowerPoint default template “blank.pot,” which contains visual content. Organizations can use templates to create a common identity and to standardize the appearance of official documents, as in the templates in Figure 34 and Figure 35.
The content of templates can be a disadvantage with regard to AMG, because undesired or unintended content in a template is inherited by the documents that use it.
For example, several of NTNU’s stand-alone document templates contain elements with pre-defined entities:
• Creator = “O. Rakel”
• Title = “Line one”
If the document’s elements are not updated with valid entities, then the resulting document will contain false metadata that reflects the template and not the resulting document.
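The inheritance problem described above can be illustrated with a minimal Python sketch. It assumes that the known template defaults are available as a lookup table. The function name `inherited_elements` and the table `TEMPLATE_DEFAULTS` are hypothetical; only the two entities “O. Rakel” and “Line one” are taken from the text.

```python
# Hypothetical table of template default entities.
# Only "O. Rakel" and "Line one" come from the text above.
TEMPLATE_DEFAULTS = {
    "Creator": {"O. Rakel"},
    "Title": {"Line one"},
}


def inherited_elements(metadata, template_defaults=TEMPLATE_DEFAULTS):
    """Return the names of elements whose entity still equals a template default."""
    return sorted(
        name
        for name, value in metadata.items()
        if value.strip() in template_defaults.get(name, set())
    )


# The creator was never updated, so it still reflects the template.
doc = {"Creator": "O. Rakel", "Title": "Introduction to metadata"}
print(inherited_elements(doc))  # ['Creator']
```

A check of this kind can only flag entities that match known defaults; it cannot confirm that the remaining entities are correct.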
Creating a new stand-alone document
To illustrate the processes involved in creating a new, stand-alone document, this research presents the creation of a Word document. Word documents are created in the user-controlled environment of the user’s local personal computer. The user opens the MS Word content creation application, which automatically opens its default template when creating a new document. This is normally the “blank.dot” template, which does not contain visual content. However, it does include page layout information, template identification and text formatting styles.
Figure 36: Creating a new stand-alone document
After the template is opened and presented to the user through the graphical user interface, the user is allowed to make changes to the new document. This is where the user first experiences creating document content. Here the user is allowed to use his or her creativity to develop the new document content and present its intellectual content.
Saving the stand-alone document
When the user gives the application the command to save, a number of actions are automatically performed:
• If there is no “Title” element recorded, then an algorithm is executed to generate this element. This algorithm collects data from the first line of text. The “Title” element is also used as the default document name. The file name may be changed, although the “Title” element is not automatically changed.
• The system clock is used to generate the “Creation date” element. If the user has printed the document, a “Last printed date” element is included with the collection of data from a temporary recording of the system clock at the time of printing.
• The application’s user profile is used to populate the “Author” and “Company” elements.
• Technical metadata are generated by algorithms that analyze the document to retrieve entities for elements such as the number of “Characters,” “Words” and “Pages.” Other technical elements are collected from the template including page size (e.g. “Letter” or “A4”), margins and orientation (“Landscape” or “Portrait”).
• All the metadata are placed within the document’s metadata section.
• The intellectual content included by the template and the user (excluding metadata) is placed in the main document section of the document code. Extensive formatting descriptions are included so that all the properties of the document are kept. This includes text style formatting, language, imported content, etc.
• The document format extension is automatically changed from the template format (“.dot”) to document format (“.doc”).
This shows that there are a number of different factors that influence the content of each document’s metadata elements and document content:
• The actions performed by the content creation software application
• The document template
• The user’s performed actions
• The application’s user profile
• The system clock
• The application’s metadata generating algorithms
Figure 37: Saving a new document
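The first-save sequence can be summarized in a short Python sketch. This is a simplified illustration under stated assumptions, not the actual behavior of MS Word: the function `first_save`, its parameters and the dictionary keys for the template and user profile are all hypothetical, and the character and word counts are deliberately naive.

```python
from datetime import datetime


def first_save(body_text, template, user_profile, now, title=None):
    """Build a metadata section for a document that is saved for the first time."""
    lines = [ln.strip() for ln in body_text.splitlines() if ln.strip()]
    if not title:
        # Title algorithm (assumed form): use the first non-empty line of text.
        title = lines[0] if lines else ""
    return {
        "Title": title,
        "Creation date": now.isoformat(),            # from the system clock
        "Author": user_profile.get("name", ""),      # from the user profile
        "Company": user_profile.get("company", ""),
        # Naive technical metadata; a real application may count differently.
        "Characters": len(body_text),
        "Words": len(body_text.split()),
        # Technical elements inherited from the template.
        "Page size": template.get("page_size", ""),
        "Orientation": template.get("orientation", ""),
    }


meta = first_save(
    "Metadata in practice\nThis is the first paragraph.",
    template={"page_size": "A4", "orientation": "Portrait"},
    user_profile={"name": "A. Student", "company": "NTNU"},
    now=datetime(2010, 1, 15, 9, 30),
)
print(meta["Title"])  # Metadata in practice
print(meta["Words"])  # 8
```

Each value in the returned dictionary comes from one of the data sources listed above: the document text, the template, the user profile or the system clock.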
Editing an existing stand-alone document
Based on the saved document, all the characteristics of the document should be retrievable from the document code. When opening an existing document for editing, the document code is used to bring all the document’s characteristics back into the application’s domain. The main document is presented to the user ready for editing.
Selected metadata elements and their entities are presented through the graphical user interface, normally the page count (e.g. “Page: 3 of 5”), the word count (e.g. “Words: 680”) and the language (e.g. “English (U.S.)”). The entities for these elements are automatically updated as the user edits and navigates within the document.
Figure 38: Editing an existing stand-alone document
Re-saving an existing stand-alone document
If the user gives the command to re-save the document, then a number of actions are performed to place the application’s information about the document back into the document code. However, this saving process is not identical to the first time the document was saved:
• The title-generating algorithm is not executed since the document already has a metadata “Title” element. User-specified updates of the visual title of the document are not used to update the existing, embedded metadata “Title” element.
• The system clock is used to generate the “Modified date” element. If the user has printed the document since it was last saved, then the “Last printed date” element is updated.
• The application’s user profile is used to update the “Last Author” element.
• The application once again executes an algorithm to collect and update the existing, embedded technical metadata elements.
Figure 39: Re-saving an existing stand-alone document
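The difference between a first save and a re-save can be sketched in Python as follows. The function `resave` and its parameters are hypothetical, and the counts are again naive; the point is only that the existing “Title” element is left untouched while the time, author and technical elements are refreshed.

```python
from datetime import datetime


def resave(metadata, body_text, user_profile, now, printed_at=None):
    """Return updated metadata for a document that already has a metadata section."""
    updated = dict(metadata)  # existing elements such as "Title" are kept as-is
    updated["Modified date"] = now.isoformat()          # from the system clock
    updated["Last Author"] = user_profile.get("name", "")
    if printed_at is not None:
        # Only updated if the document was printed since the last save.
        updated["Last printed date"] = printed_at.isoformat()
    # Technical metadata are recalculated on every save (naive counts).
    updated["Characters"] = len(body_text)
    updated["Words"] = len(body_text.split())
    return updated


stored = {"Title": "Line one", "Author": "A. Student", "Words": 2}
new_meta = resave(
    stored,
    "A completely new visual title\nRevised text.",
    user_profile={"name": "B. Editor"},
    now=datetime(2010, 2, 1, 12, 0),
)
print(new_meta["Title"])        # Line one
print(new_meta["Last Author"])  # B. Editor
```

The example shows how a stale “Title” entity can survive any number of edits to the visible title of the document.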
Converting a stand-alone document
Many document creators choose not to publish their original documents. The reasons for this may include a desire to restrict usage and editing opportunities, and to ensure that the document is presented in a specific way. There are multiple ways in which a conversion can take place.
Within the case LMS, 87% of PDF documents were confirmed to have been converted using a converter application running on the user’s own computer, 7% using an online web application, 2% were scanned print-outs and 4% were missing “Producer” metadata. A total of 137 applications and application versions were recorded as having been used.
Converting documents to PDF using a web application differs from using a traditional application in that the user must store the original document before the conversion can take place. Documents that are converted on the user’s computer without first being stored do not go through the initial storage process, and hence the metadata-specific storage processes described in Chapter 4.1.3 are not necessarily executed. This increases the uncertainty regarding the content of the converted document’s resulting metadata.
The remainder of this chapter presents a conversion process as if executed from the user interface of the original document format’s native application.
Figure 40: Converting a previously saved document
When the user gives the command to convert a document into a PDF document, this starts a new sequence of events. The document content and metadata are collected as if the document were to be saved or re-saved (see Chapter 4.1.3). However, instead of placing these data in a document, they are transferred to the domain of the converter application. It is then up to the converter application to decide what should be kept as document content and metadata, and what should be changed. In this process the user may be allowed to make adjustments, e.g. specify security restrictions.
The main task of the conversion application is to generate a PDF document with a visual appearance as similar as possible to the original document. Since a conversion process changes the characteristics of the document, many of the embedded metadata elements do not reflect the converted document. It is therefore common practice for the embedded entities to be discarded and replaced by metadata generated by the converter application. As with the original document creator application, the converter application can collect data from a range of data sources:
• New semantic metadata are created based on another “Title” algorithm.
• New technical metadata are created based on the new technical characteristics of the document.
• The converter application’s user profile is used for creating “Author” elements. Online converter services commonly use alternative data to be included in the “Author” element.
• Some converter applications allow the user to make corrections to the semantic metadata elements.
• The system clock is used to give a new time of when the converted document was created.
• The document content is re-formatted to the new document format. Existing non-visual formatting (e.g. formatting styles and language tags) is discarded.
• Finally the new document content and the new metadata are placed as document code within a new document.
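The metadata side of such a conversion can be sketched in Python. This is an illustration under stated assumptions rather than the behavior of any particular converter: the function `convert_metadata`, its parameters and the use of the first text line as the converter’s “Title” algorithm are hypothetical.

```python
from datetime import datetime


def convert_metadata(original_metadata, body_text, converter_profile, now,
                     producer, overrides=None):
    """Build a new metadata section for a converted document."""
    # original_metadata is accepted but deliberately not copied: the embedded
    # entities are assumed to be discarded, as described in the text.
    lines = [ln.strip() for ln in body_text.splitlines() if ln.strip()]
    converted = {
        "Title": lines[0] if lines else "",             # converter's own title algorithm
        "Author": converter_profile.get("name", ""),    # converter's user profile
        "Creation date": now.isoformat(),               # new time from the system clock
        "Producer": producer,                           # identifies the converter
        "Words": len(body_text.split()),                # naive technical metadata
    }
    # Some converters let the user correct semantic elements before writing.
    converted.update(overrides or {})
    return converted


original = {"Title": "Line one", "Author": "A. Student", "Company": "NTNU"}
pdf_meta = convert_metadata(
    original,
    "Final report\nBody text.",
    converter_profile={"name": "Office PC"},
    now=datetime(2010, 3, 1, 10, 0),
    producer="Example Converter 1.0",  # hypothetical converter name
)
print(pdf_meta["Title"])      # Final report
print("Company" in pdf_meta)  # False
```

In this sketch nothing from the original metadata section survives unless the user re-enters it through `overrides`.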
Converted documents therefore reflect both the original document creator application and its application domain, and the converter application and its application domain. As a result, there can be extensive differences between the content of the original document and the converted document. This applies to both the document’s metadata and the main document content section.
4.1.4 Summary
There is a clear distinction between documents created in the system-controlled environment and stand-alone documents created without system-enforced control. In the system-controlled environment, the user is required to use system-specific applications that are not influenced by the user’s personal computer or local software. All documents are based on predefined, system-specific templates. The application can enforce mandatory elements and restricted value spaces. To some extent, such applications can validate text-based entities provided by the user. Through log-in features, the system has full control over who the user is, the sections in which the user is allowed to create documents, and hence the context in which new documents are created. This does not guarantee that all data from the system-controlled environment are correct, high-quality entities. However, the system-controlled environment ensures consistency in the created documents while avoiding local interpretations and variations. This ensures that countermeasures can be effectively enforced if false content is detected.
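As a minimal sketch of what enforcing mandatory elements and restricted value spaces could look like, consider the following Python function. The function `validate`, the element names and the allowed values are hypothetical examples and are not taken from the case LMS.

```python
def validate(metadata, mandatory, value_spaces):
    """Return a list of error messages; an empty list means the metadata pass."""
    errors = []
    for name in mandatory:
        # A mandatory element must be present and non-blank.
        if not str(metadata.get(name, "")).strip():
            errors.append(f"missing mandatory element: {name}")
    for name, allowed in value_spaces.items():
        # A restricted element may only hold an entity from its value space.
        if name in metadata and metadata[name] not in allowed:
            errors.append(f"invalid entity for {name}: {metadata[name]!r}")
    return errors


errors = validate(
    {"Title": "", "Language": "xx"},
    mandatory=["Title"],
    value_spaces={"Language": {"en", "no"}},
)
print(errors)
# ['missing mandatory element: Title', "invalid entity for Language: 'xx'"]
```

A check of this kind confirms that entities are present and well-formed, not that they are true, which matches the limitation noted above.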
Stand-alone documents can be created in an infinite number of ways. The source code of these documents reflects the computer system of the creator, the content creation software, the templates that were used and the actions performed by the user. Validation of the user’s actions is not undertaken. Converting documents between non-compatible document formats further increases the uncertainties regarding the document code. As a result, stand-alone documents can be quite diverse with different document codes, even though the visual appearance of the documents is identical. In order to find common structures and consistency within the pre-study dataset of stand-alone documents, this research examines entities from such documents in Chapter 4.2.