
3.4. Cloud Storage

SaaS storage provides easy-to-use cloud storage for the average end-user. In particular, SaaS storage allows users to back up the files stored in a particular folder of their computer to the service’s servers. In addition, users can run SaaS storage applications on multiple machines, which are automatically synchronised. Currently, all the main SaaS storage solutions are owned by companies that share only partial information about their storage models and architecture design. Some SaaS storage solutions are: Dropbox [18], OneDrive [19], Google Drive [124], Backblaze [125], Amazon Drive [154], Apple iCloud [155], and Adobe Creative Cloud [156].

In the remaining part of this section, the Dropbox solution is described in detail, since its design and architecture is well documented. A brief summary of the Google Drive, OneDrive, and Adobe Creative Cloud solutions follows.

3.4.2.1 Dropbox

Dropbox is a storage solution that allows end-users to back up the files stored in a particular folder of their computer to Dropbox’s servers. Users can interact with Dropbox directly via a browser or through a Dropbox client installed on one or more machines, which are automatically synchronised with eventual consistency.

Files, Metadata, and Versions

In Dropbox, files are represented differently on the client- and the server-side. On the client-side, files are represented using the same abstractions as the local file system. On the server-side, files are represented as collections of immutable blocks of up to 4 MB. Each file and its blocks are uniquely identified through their SHA-256 hashes, which are used both to verify the integrity of the stored data and to enable block-level de-duplication. End-users can also set custom metadata for a given file, as explained in [157].
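The server-side representation can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `BlockStore`; the class and function names are illustrative and are not part of any Dropbox API. Only the 4 MB block limit and the use of SHA-256 come from the text.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB upper bound on block size, as stated in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_id(block: bytes) -> str:
    """Identify a block by the SHA-256 digest of its content."""
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Hypothetical in-memory store that keeps one copy per distinct block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put_file(self, data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
        """Store a file and return its ordered list of block identifiers."""
        ids = []
        for block in split_into_blocks(data, block_size):
            bid = block_id(block)
            self.blocks.setdefault(bid, block)  # identical blocks are stored once
            ids.append(bid)
        return ids

    def get_file(self, ids: list[str]) -> bytes:
        """Reassemble a file, checking each block against its identifier."""
        parts = []
        for bid in ids:
            block = self.blocks[bid]
            if block_id(block) != bid:
                raise ValueError("block failed its integrity check")
            parts.append(block)
        return b"".join(parts)
```

Because blocks are immutable and keyed by their content hash, the same lookup serves both purposes mentioned above: a repeated block maps to an existing key (de-duplication), and a block whose digest no longer matches its key is detected on read (integrity).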

All files associated with a given user are said to belong to a specific namespace. Files stored within a namespace are isolated from, and thus not de-duplicated against, the files of other namespaces. As a result, two identical files that have the same hashed identifier are stored twice on Dropbox’s servers if they belong to different namespaces. This mechanism prevents attacks in which one learns whether a particular file is stored in Dropbox and then, given the known hash, pretends to own the file in order to gain access to it. This attack was shown for the first time in 2013 and is known as a dropship attack [158]. Shared folders are namespaces to which multiple users have access.
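A minimal sketch of namespace-scoped de-duplication follows. The `NamespacedStore` class is hypothetical and only illustrates the idea of keying blocks by both namespace and hash; how Dropbox actually enforces the isolation is not described in the text.

```python
import hashlib


class NamespacedStore:
    """Hypothetical store that de-duplicates blocks only within a namespace."""

    def __init__(self) -> None:
        # The key is (namespace, block hash), so equal hashes in different
        # namespaces map to separate entries.
        self.blocks: dict[tuple[str, str], bytes] = {}

    def put(self, namespace: str, block: bytes) -> str:
        """Store a block in the given namespace and return its SHA-256 hash."""
        bid = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault((namespace, bid), block)
        return bid

    def get(self, namespace: str, bid: str) -> bytes:
        """Return a block only if it exists in the caller's own namespace."""
        try:
            return self.blocks[(namespace, bid)]
        except KeyError:
            # Knowing a valid hash is not enough to obtain the block.
            raise PermissionError("block not present in this namespace") from None
```

In this sketch, a client that presents a hash it learned elsewhere gets no data unless the block was actually uploaded to its own namespace, which is the property that defeats the attack described above. The cost is that identical content is stored once per namespace.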

Furthermore, Dropbox retains previous versions of a file for a limited amount of time [159], which are accessible to users via the Dropbox web interface. The actual versioning model is unknown.

Architecture

The known architecture of Dropbox (early 2012) consists of three important components: the client-side, the storage servers, and all the services that control and store metadata information about users, data, and notifications [160, 161] (see Figure 3.14).

This clear separation of services has enabled Dropbox to scale to hundreds of millions of users. When data is uploaded to or downloaded from Dropbox, the client interacts with the load balancer to establish a connection with the block server, which is then responsible for controlling the rest of the data transfers. Drago et al. [160] have measured that over 90% of the Dropbox traffic (in terms of throughput) is due to data being uploaded or downloaded, which is handled by the storage services while keeping the interactions with the metadata services to a minimum. Figure 3.14 shows Amazon S3 as the IaaS storage behind Dropbox, but most of the storage has been migrated [162] to Magic Pocket, Dropbox’s own IaaS storage infrastructure, over the last few years. All data stored on Dropbox is protected using AES 256-bit encryption. Moreover, uploaded files are processed through a metadata server, which stores all the relevant metadata information about the file in a database [163], while the notification server is informed of any changes within the namespace and notifies all the clients that have access to it.
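The division of labour between the metadata and notification services can be sketched as follows. This is an illustrative model only: the class names, the `commit` and `publish` methods, and the use of plain lists as client inboxes are assumptions made for the example and do not reflect Dropbox's actual interfaces.

```python
from collections import defaultdict


class NotificationServer:
    """Hypothetical service that tells clients about changes in a namespace."""

    def __init__(self) -> None:
        # namespace -> inboxes of the clients that have access to it
        self.subscribers: dict[str, list[list[str]]] = defaultdict(list)

    def subscribe(self, namespace: str, inbox: list[str]) -> None:
        self.subscribers[namespace].append(inbox)

    def publish(self, namespace: str, path: str) -> None:
        for inbox in self.subscribers[namespace]:
            inbox.append(path)


class MetadataServer:
    """Hypothetical service that records which blocks make up each file."""

    def __init__(self, notifications: NotificationServer) -> None:
        self.files: dict[tuple[str, str], list[str]] = {}  # stands in for the database
        self.notifications = notifications

    def commit(self, namespace: str, path: str, block_ids: list[str]) -> None:
        """Record a file's block list, then announce the change to the namespace."""
        self.files[(namespace, path)] = list(block_ids)
        self.notifications.publish(namespace, path)
```

Only block identifiers and paths pass through these two services in the sketch; the block contents themselves would travel between the client and the block server, which is consistent with the observation that most traffic bypasses the metadata services.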

A user making changes to a file, or directory, from two or more client machines can result in a conflict state. Dropbox attempts to resolve the conflicts without any human intervention when synchronising the clients with the Dropbox service. When Dropbox is unable to resolve the conflicts, the conflicting versions of the file are stored on the client side. The user then has to resolve the conflicts manually, by deleting all the unwanted versions of the file.


[Diagram components: Clients, Load Balancer, Block Server, Meta Server, Notification Server, In-Mem Cache, DB, AWS S3]

Figure 3.14: Dropbox architecture (early 2012) serving 50 million users. The arrows indicate the main direction of the requests made among the components involved. Diagram derived from the video at [161, minute 23].

Magic Pocket

Magic Pocket is an immutable block storage system that Dropbox, Inc. has developed over the years to replace Amazon S3. Magic Pocket is designed to provide secure and highly available storage [164].

Magic Pocket stores files as collections of blocks of up to 4 MB each. All blocks are compressed, encrypted, and assigned a unique key, such as a SHA-256 hash of the block. Blocks are aggregated in buckets (at most 1 GB in size) in order to improve I/O performance when large amounts of data are moved or copied between Dropbox storage servers. In addition, blocks that are uploaded around the same time are stored close together, thus exploiting temporal locality.
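The bucket aggregation can be sketched as an append-only writer that fills buckets in arrival order. The `Bucket` and `BucketWriter` names are hypothetical, compression and encryption are omitted, and the sketch assumes every block is smaller than the bucket capacity; only the 1 GB limit and the hash-based key come from the text.

```python
import hashlib
from dataclasses import dataclass, field

BUCKET_CAPACITY = 1024 ** 3  # 1 GB maximum bucket size, as stated in the text


@dataclass
class Bucket:
    capacity: int
    used: int = 0
    keys: list[str] = field(default_factory=list)


class BucketWriter:
    """Hypothetical writer that packs blocks into buckets in arrival order."""

    def __init__(self, capacity: int = BUCKET_CAPACITY) -> None:
        self.capacity = capacity
        self.buckets: list[Bucket] = [Bucket(capacity)]
        self.index: dict[str, int] = {}  # block key -> bucket number

    def append(self, block: bytes) -> str:
        """Place a block in the current bucket, opening a new one when full."""
        key = hashlib.sha256(block).hexdigest()
        if key in self.index:  # block already stored; blocks are immutable
            return key
        current = self.buckets[-1]
        if current.used + len(block) > self.capacity:
            current = Bucket(self.capacity)
            self.buckets.append(current)
        current.keys.append(key)
        current.used += len(block)
        self.index[key] = len(self.buckets) - 1
        return key
```

Since blocks are always appended to the most recent bucket, blocks uploaded around the same time end up in the same or in adjacent buckets, which is one simple way to obtain the temporal locality mentioned above.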

LAN Sync

The main performance bottleneck of SaaS storage is the bandwidth limit between the client and the service’s servers. Dropbox attempts to overcome this bottleneck via LAN Sync [165]. LAN Sync is a feature built into Dropbox clients that can be optionally enabled to allow data exchange directly between clients that reside in the same network. Each client periodically broadcasts its presence over UDP (on port 17500) to the other machines in the network, which are always listening. Whenever a file has to be downloaded, the client first asks all other known nodes in the local network whether they have it; if none does, the file is downloaded directly from Dropbox. LAN Sync operates on data files only and within the scope of the client’s namespace. Metadata, instead, is always synchronised with the Dropbox servers, so that it is easier to enforce consistency across multiple clients.
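The following sketch illustrates the two steps described above: announcing presence over UDP and trying local peers before falling back to the server. The JSON payload, the `Peer` class, and the `fetch_block` function are assumptions made for illustration; the actual LAN Sync wire format is not described in the text. Only the UDP port number is taken from the source.

```python
import json
import socket
from typing import Callable, Iterable, Optional

LAN_SYNC_PORT = 17500  # UDP port used for presence broadcasts, per the text


def announce(namespaces: list[str], port: int = LAN_SYNC_PORT) -> None:
    """Broadcast this client's presence (the payload format is hypothetical)."""
    payload = json.dumps({"namespaces": namespaces}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, ("255.255.255.255", port))


class Peer:
    """Hypothetical stand-in for another client discovered on the local network."""

    def __init__(self, namespace: str, blocks: dict[str, bytes]) -> None:
        self.namespace = namespace
        self.blocks = blocks

    def request(self, namespace: str, block_id: str) -> Optional[bytes]:
        # A peer only serves data from a namespace it shares with the requester.
        if namespace != self.namespace:
            return None
        return self.blocks.get(block_id)


def fetch_block(
    namespace: str,
    block_id: str,
    peers: Iterable[Peer],
    download_from_server: Callable[[str], bytes],
) -> tuple[bytes, str]:
    """Try local peers first; fall back to the server if none has the data."""
    for peer in peers:
        data = peer.request(namespace, block_id)
        if data is not None:
            return data, "lan"
    return download_from_server(block_id), "server"
```

The second element of the returned tuple records where the data came from, which makes the fallback behaviour easy to observe. Metadata is deliberately absent from the peer exchange, mirroring the statement that it is always synchronised with the Dropbox servers.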

The Placeholder Metaphor

Smart Sync (previously known as Project Infinite) is a feature of Dropbox that allows users to handle very large collections of data using a limited amount of hard disk space by storing only references (placeholders) to files and folders, whose content is retrieved on demand. Unfortunately, Smart Sync is available only to Dropbox business users, and its internal structure and format could not be studied.
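Since the internals of Smart Sync are not known, the following is only a generic sketch of the placeholder idea: a local entry that carries metadata and fetches its content the first time it is read. The `Placeholder` class and its `fetch` callback are hypothetical and make no claim about how Dropbox implements the feature.

```python
from typing import Callable, Optional


class Placeholder:
    """Hypothetical local entry that holds a file's metadata but not its content."""

    def __init__(self, path: str, size: int, fetch: Callable[[str], bytes]) -> None:
        self.path = path
        self.size = size      # known from metadata, so listings need no download
        self._fetch = fetch   # callable that retrieves the content remotely
        self._content: Optional[bytes] = None

    @property
    def is_local(self) -> bool:
        """True once the content has been downloaded and is held locally."""
        return self._content is not None

    def read(self) -> bytes:
        """Return the content, downloading it on first access only."""
        if self._content is None:
            self._content = self._fetch(self.path)
        return self._content

    def evict(self) -> None:
        """Drop the local content to free space; the placeholder remains."""
        self._content = None
```

The file stays visible, with its path and size, whether or not its content is present, and disk space is consumed only for files that have actually been opened.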

3.4.2.2 Google Drive and OneDrive

Google Drive is a SaaS storage solution developed by Google LLC [124]. The client-side data model provided by Google Drive is based on files and folders, exactly like Dropbox. Moreover, the data managed by the Google Apps (e.g., Google Docs or Google Photos) can be automatically stored to Google Drive. Google Drive also supports the placeholder metaphor [166].

OneDrive is a SaaS storage solution provided by Microsoft Corp. [19], similar to both Dropbox and Google Drive. Lindley et al. [167] have recently proposed an alternative metaphor to the file, called the file biography, in which a file is represented as an entity that changes over time through versions and can exist in multiple locations, through actions like sharing, copying, licensing, and cloning, while retaining its unique identity. In the examples presented in the paper, data stored under the file biography metaphor is integrated with OneDrive and Microsoft Word, so that its new semantic operations are preserved as data is synchronised to/from the cloud.


3.4.2.3 Adobe Creative Cloud

Adobe Creative Cloud (CC) is a collection of software applications and cloud services, developed by Adobe Systems, Inc., for creative professionals who work with digital photography, videos, and graphics in general [156]. CC provides a similar set of services to the other SaaS storage solutions presented above. The file metaphor used by the CC mobile clients, however, differs from the standard file representation used by other SaaS storage solutions. Goldman et al. have proposed and implemented DCX (Digital Composite Technology), a manifest-based data format that aggregates multiple independent components of a file together [168]. For example, a Word or PDF document can be represented in DCX as a collection of multiple components containing the text, the formatting, the provenance, or the images. Using the DCX abstraction it is possible to have (I) semantic de-duplication, where semantically meaningful components of the file (e.g., images, text, and metadata information of a PDF file) are de-duplicated rather than blocks of fixed size; (II) different data views based on context; and (III) the ability to record the provenance of the data independently of the data itself. (I) enables better distribution and synchronisation by avoiding the fragmentation of files into semantically unrelated blocks. (II) results in better network usage and user experience for end-users, but it also means that a file can be perceived and used differently depending on context. Provenance, as in (III), can help end-users to better understand content and how, where, and when it originated. Further, DCX files can be embedded within other DCX files, so that it is possible to create even richer data abstractions.
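The manifest idea can be illustrated with a small sketch. The dictionary layout, the field names (`components`, `provenance`), and the helper functions below are invented for the example and do not reproduce the actual DCX format; they only show how component-level de-duplication, context-dependent views, and separately recorded provenance could fit together.

```python
import hashlib


class ComponentStore:
    """Hypothetical store holding one copy of each distinct component."""

    def __init__(self) -> None:
        self.components: dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        """Store a component and return its SHA-256 identifier."""
        cid = hashlib.sha256(content).hexdigest()
        self.components.setdefault(cid, content)
        return cid


def make_manifest(
    name: str,
    parts: dict[str, bytes],
    store: ComponentStore,
    provenance: dict[str, str],
) -> dict:
    """Build a manifest that references components by identifier."""
    return {
        "name": name,
        "components": {role: store.add(content) for role, content in parts.items()},
        "provenance": provenance,  # kept apart from the component data
    }


def view(manifest: dict, store: ComponentStore, roles: list[str]) -> dict[str, bytes]:
    """Materialise only the components needed in a given context."""
    refs = manifest["components"]
    return {role: store.components[refs[role]] for role in roles if role in refs}
```

Two manifests that embed the same image reference a single stored component, regardless of where the image sits inside each document, and a client on a slow connection could request a text-only view without downloading the rest.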

The implementation of the DCX format can also be used with other SaaS storage solutions.⁸⁴

⁸⁴ Adobe Systems, Inc. has released an implementation of the DCX format for Dropbox under the Apache License 2.0. https://github.com/Adobe-Digital-Composites/Digital-Composites-ObjC [last accessed on 29/03/2018].
