
2.1.2.1 Location

Location is “the place” where data is stored and from which it can be fetched. This definition is ambiguous, however, because it is not always clear what and where this place is, or how it relates to the logical data. We must therefore distinguish two types of location: the physical location and the logical location of the data.

The physical location of data is the place where it is stored on physical storage media, such as a disk or a recording tape. How data is managed across different physical locations is beyond the scope of this thesis and is not discussed further.

The logical location of data is the place where it is stored in relation to other data or other abstractions, such as directories in file systems, from a user- or application-level perspective. In a flat file system, which has only the root directory, the logical location of a file and its name are one and the same. In hierarchical file systems, the logical location of a file is determined by its name and by the structure of the directories that contain it. Directories are the abstractions that allow files to be organised in hierarchies. The textual representation of a logical location is the path. The following, for example, is the path of a file contained in the desktop directory, which is itself contained in the home directory: home/desktop/a file

The path of a file consists of the name of the file and a description of its logical location in relation to directories and other files. Paths can be absolute or relative. An absolute path locates a file regardless of the user's current working directory, while a relative path, as the name suggests, is built relative to another path.
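The difference between absolute and relative paths can be illustrated with a minimal Python sketch. The working directory `/home`, the helper `resolve`, and the leading `/` used to mark absolute paths (POSIX convention) are assumptions made for this example only.

```python
from pathlib import PurePosixPath

# Hypothetical working directory and paths, chosen only for illustration.
cwd = PurePosixPath("/home")
relative = PurePosixPath("desktop/a file")
absolute = PurePosixPath("/home/desktop/a file")


def resolve(path: PurePosixPath, working_dir: PurePosixPath) -> PurePosixPath:
    """Return an absolute path; a relative path is interpreted against working_dir."""
    return path if path.is_absolute() else working_dir / path


# The relative path only identifies the file once a working directory is known.
resolved_relative = resolve(relative, cwd)

# The absolute path identifies the same file whatever the working directory is.
resolved_absolute = resolve(absolute, PurePosixPath("/tmp"))
```

Under these assumptions both calls yield the same logical location, and `absolute.name` and `absolute.parent` separate the file name from the directories that contain it.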

2.1. Data Storage Concepts

2.1.2.2 Naming

“Naming is a mapping between logical and physical objects” [31, p. 707]. It allows users to refer to files and directories by human-readable names, even though these are physically stored on storage media and accessed through hard-to-read physical addresses (see footnote 13). The resolution of naming mappings, such as the one below, is performed by the name service of a system. A typical name service for a local file system resolves a user-level name to its physical address, which corresponds to its actual location, as follows:

user-level textual name → system-level identifier → physical address
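The two-step resolution above can be sketched as a pair of lookup tables. This is a toy model, not how any particular file system implements its name service; the file name, the identifier 4711, and the address are invented for illustration.

```python
# Hypothetical tables for a toy local file system (all values are made up).
NAME_TO_ID = {"/home/desktop/report.txt": 4711}  # user-level name -> system-level identifier
ID_TO_ADDRESS = {4711: 0x3A200}                  # system-level identifier -> physical address


def resolve_name(name: str) -> int:
    """Resolve a user-level name to a physical address in two steps."""
    identifier = NAME_TO_ID[name]     # step 1: name -> identifier
    return ID_TO_ADDRESS[identifier]  # step 2: identifier -> address


address = resolve_name("/home/desktop/report.txt")
```

The intermediate identifier lets the system rename a file (update the first table) or relocate its data (update the second table) without touching the other mapping.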

Naming is also very important in distributed systems, where data is distributed across multiple nodes (see Glossary of Terms for the definition of node). In distributed systems, naming and location are related to one another, and the following two properties must be taken into account when looking at naming in a system [31, 32]:

• Location Transparency. The name associated with the data is not related to its physical storage location.

• Location Independence. The name associated with the data does not need to be changed when its physical storage location changes. A name service providing location independence dynamically maps names to locations.
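Location independence can be illustrated with a small sketch of a name service that maps names to nodes dynamically. The class `NameService`, its methods, and the node labels are hypothetical and serve only to show that the name stays fixed while the location changes.

```python
class NameService:
    """Toy name service: maps a stable name to the node currently storing the data."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, name: str, node: str) -> None:
        """Record that the data called `name` is stored on `node`."""
        self._locations[name] = node

    def move(self, name: str, new_node: str) -> None:
        """Relocate the data; only the mapping changes, never the name."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_node

    def locate(self, name: str) -> str:
        """Return the node that currently stores the data called `name`."""
        return self._locations[name]


service = NameService()
service.register("report", "node-a")
before = service.locate("report")
service.move("report", "node-b")
after = service.locate("report")
```

The name `report` carries no hint about where the data lives (location transparency), and clients keep using it unchanged after the move (location independence).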

Finally, the collection of all the names allowed in a system defines the namespace. There are two types of namespaces: local and global. Names in a local namespace are unique only within a given node or set of machines. Systems using a local namespace can concatenate the host name with the data name to provide wider access to such data. The web is an example of a system where file names are concatenated to a host name (i.e., http://〈domain〉/〈filename〉) in the form of a URI (Uniform Resource Identifier). This technique, however, conflicts with the location transparency and location independence properties described above, since the URI is related to the location of the data and needs to be changed if the data is moved somewhere else. On the other hand, names within a global namespace are unique over all the nodes of a system.

13 A physical address is a sequence of digits that indicates the actual location of the data on the physical storage; it is hard for humans to read.

The remainder of this section briefly describes the URI, the globally unique identifier (GUID), and content-addressable naming schemes, which are needed to understand the following chapters.

Uniform Resource Identifier

A URI is a string of characters that identifies a resource and follows the general scheme:

protocol:[//[user[:password]@]host[:port]][/path][?query][#fragment]

which is described in detail in RFC 2396 [33] and RFC 3986 [34].
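As an illustration of the general scheme, the following sketch splits a URI into its components using `urlsplit` from Python's standard library. The URI itself is a hypothetical example constructed so that every optional component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises every component of the general scheme.
uri = "http://user:secret@example.org:8080/docs/file.txt?version=2#intro"
parts = urlsplit(uri)

components = {
    "protocol": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}
```

Each key of `components` corresponds to one bracketed element of the scheme; for a URI that omits an optional element, the corresponding value is `None` or an empty string.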

There are two main types of URIs: locator identifiers, or URLs (Uniform Resource Locators), and name identifiers, or URNs (Uniform Resource Names). URLs define the location of resources, with the protocol describing how the resources should be accessed. Examples of URLs are:

• http://www.ietf.org/rfc/rfc2396.txt

• mailto:[email protected]

• https://doi.org/10.1109/JPROC.2010.2096170

URNs, unlike URLs, do not contain any hint about the location of the content. Examples of URNs are:

• urn:ietf:rfc:2648

• urn:guid:7A7B95D784007B930599D491366E0C272D119EB0

• urn:isbn:0451450523


Globally Unique Identifier

A GUID (Globally Unique Identifier) is a number, of any length, used to uniquely identify a resource in a system. Two resources cannot have the same GUID unless they represent the same entity (i.e., they are the same sequence of bytes). RFC 4122 [35], which uses the term UUID (Universally Unique Identifier), defines a GUID as a 128-bit number that can be generated in multiple ways (e.g., using a hash function) and that describes a resource uniquely within the system adopting it. GUIDs shorter or longer than 128 bits can also exist. The following is an example of a GUID in hexadecimal format (32 digits of 4 bits each):

123e4567-e89b-12d3-a456-426655440000
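The 128-bit structure of the example above can be checked with Python's standard `uuid` module, which follows RFC 4122. The sketch also shows two of the generation methods the RFC describes: a random identifier and a hash-based (SHA-1) identifier derived from a name. The choice of the URL namespace and of the name passed to `uuid5` is an assumption made for illustration.

```python
import uuid

# Parse the example identifier given in the text.
example = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

size_in_bits = len(example.bytes) * 8  # 16 bytes of 8 bits each
hex_digits = len(example.hex)          # hexadecimal digits, hyphens excluded

# Random generation: a new value on every call.
random_guid = uuid.uuid4()

# Hash-based generation: the same namespace and name always give the same value.
named_guid = uuid.uuid5(uuid.NAMESPACE_URL, "http://www.ietf.org/rfc/rfc2396.txt")
```

Hash-based generation is deterministic, so two parties can derive the same identifier independently, whereas random generation relies on the size of the 128-bit space to make collisions improbable.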

Content-Addressable Naming Schemes

A content-addressable naming scheme allows storage systems to bind data to names that are derived from its actual sequence of bytes, such that a different sequence of bytes must be bound to a different name. In such schemes, the name of some content is usually generated by applying a cryptographically secure hash function to the content (see Section 2.1.8.2), so that a change to the content also results in a change in the name.

The purpose of content-addressable naming schemes is to abstract data from locations, so that users of such storage systems are able to access data irrespective of where it is stored or where it is accessed from.
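A content-addressable scheme can be sketched as a store whose keys are hashes of the stored bytes. The class `ContentStore` is hypothetical, and SHA-256 is assumed here as one example of a cryptographically secure hash function; the text does not prescribe a specific one.

```python
import hashlib


class ContentStore:
    """Toy in-memory store in which the name of a blob is the hash of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def name_of(content: bytes) -> str:
        """Derive the name from the content (SHA-256 is an assumption of this sketch)."""
        return hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        """Store the content and return the name it is bound to."""
        name = self.name_of(content)
        self._blobs[name] = content
        return name

    def get(self, name: str) -> bytes:
        """Fetch content by name and verify that it still matches that name."""
        content = self._blobs[name]
        if self.name_of(content) != name:
            raise ValueError("stored content does not match its name")
        return content


store = ContentStore()
first = store.put(b"hello world")
second = store.put(b"hello world!")  # one byte more, hence a different name
```

Because the name depends only on the bytes, any node holding the content can serve it under the same name, and the reader can verify the result by recomputing the hash.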