The generic storage layer provides a ubiquitous, resilient, mutable storage facility for unstructured data, with an historical record. To support the historical record, updates are appended rather than applied destructively. The main entities supported are data blocks, PIDs and GUIDs. All entities are identified by a unique key from the underlying P2P infrastructure.
• A data block contains unstructured immutable data.
• A PID (Persistent Identifier) is used to denote a particular data block.
• A GUID (Globally Unique Identifier) is used to denote something with identity, such as a file, object or directory; what a GUID identifies is therefore meta-data.
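The three entity types and the append-only history can be sketched as a minimal in-memory model. The names `VersionedStore`, `pid_for` and `new_guid` are hypothetical. The choice of SHA-1 with 160-bit keys is an assumption made for illustration. The text states only that PIDs are derived by hashing content and that GUIDs are randomly selected keys.

```python
import hashlib
import secrets

KEY_BITS = 160  # assumed key length (SHA-1 digests are 160 bits)


def pid_for(content: bytes) -> int:
    """A PID is derived from a data block's content (SHA-1 assumed here)."""
    return int.from_bytes(hashlib.sha1(content).digest(), "big")


def new_guid() -> int:
    """A GUID is a randomly selected key from the same (assumed) key space."""
    return secrets.randbits(KEY_BITS)


class VersionedStore:
    """Append-only model: data blocks are immutable, GUIDs accumulate history."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}       # PID -> immutable data block
        self.history: dict[int, list[int]] = {}  # GUID -> PIDs, oldest first

    def put_block(self, content: bytes) -> int:
        pid = pid_for(content)
        self.blocks.setdefault(pid, content)     # same content, same PID
        return pid

    def create(self) -> int:
        guid = new_guid()
        self.history[guid] = []
        return guid

    def update(self, guid: int, content: bytes) -> int:
        """Record a new version by appending; earlier versions are kept."""
        pid = self.put_block(content)
        self.history[guid].append(pid)
        return pid

    def latest(self, guid: int) -> bytes:
        return self.blocks[self.history[guid][-1]]

    def version(self, guid: int, n: int) -> bytes:
        return self.blocks[self.history[guid][n]]
```

`update` only appends. Every earlier version therefore remains reachable through the GUID's history, which mirrors the non-destructive updates described above.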
Figure 3.3: ASA Storage Model [49]
Figure 3.3 shows how historical data is associated with PIDs and GUIDs. In addition to maintaining historical versions of data items, the generic storage layer replicates data and meta-data on multiple nodes. It actively maintains these replicas as nodes fail, misbehave or leave the network. It is insufficient, however, for data merely to be replicated; the replicas must also be accessible. The placement of replicas is therefore organised so as to reduce the probability that a malicious node can hinder access to particular replicas. Originally inserted data is referred to by a PID created by hashing its content, and replicas of this data item are distributed as determined by the cross algorithm (section 3.2). Storage hosts for meta-data are determined by GUIDs, which are derived from randomly selected P2P keys. Thus the authenticity of a data item can be verified from its content and the key under which it is addressed. In the original ASA implementation a replication factor of four was chosen.
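The self-verifying property of data blocks can be illustrated with a short sketch. It assumes SHA-1 as the content hash. It also models each replica host as a hypothetical callable that returns the stored bytes, or `None` if the host has failed. The client accepts the first replica whose recomputed PID matches. GUID-addressed meta-data cannot be checked this way, because a randomly chosen key bears no relation to the content stored under it.

```python
import hashlib
from typing import Callable, Optional, Sequence

REPLICATION_FACTOR = 4  # value used in the original ASA implementation


def pid_for(content: bytes) -> bytes:
    # Assumption: SHA-1 as the content hash; the text only says "hashing".
    return hashlib.sha1(content).digest()


def is_authentic(pid: bytes, content: bytes) -> bool:
    """A data block is self-verifying: recompute its PID and compare."""
    return pid_for(content) == pid


def fetch_verified(
    pid: bytes,
    replicas: Sequence[Callable[[bytes], Optional[bytes]]],
) -> Optional[bytes]:
    """Try the replica hosts in turn; one authentic copy is sufficient."""
    for fetch in replicas[:REPLICATION_FACTOR]:
        content = fetch(pid)  # None models a failed or unreachable host
        if content is not None and is_authentic(pid, content):
            return content
    return None  # every replica failed or returned corrupt data
```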
Access to any file stored in ASA results in a sequence of operations in which meta-data and data items are fetched. A file is specified by a file path, where each element of the path represents meta-data identified by a GUID. As meta-data is not self-verifying, at least three meta-data items have to be fetched before the data item can be fetched, in accordance with the protocol outlined in [15]. The meta-data maintains pointers to historical versions of the data. In the default configuration, the latest version of a data item is fetched on every request. In the original implementation four replicas are available for every data block, provided none of the replica-holding servers has failed. Only one of these replicas needs to be received by the requesting client, since its authenticity can be verified by recomputing the PID from the data item's content. After the data item has been received, the meta-data items for the next child directory can be fetched, and so forth.
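The alternation between meta-data and data fetches during path resolution can be sketched as follows. This is a simplified, hypothetical model (`ToyASA`, `resolve`). GUIDs are readable strings rather than random keys. Directory data items are assumed to be JSON maps from child names to GUIDs. The multi-item meta-data protocol of [15] is collapsed into a single lookup.

```python
import hashlib
import json


def pid_for(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()  # assumed hash function


class ToyASA:
    """In-memory stand-in for the generic storage layer (hypothetical)."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}      # PID -> data item
        self.meta: dict[str, list[str]] = {}  # GUID -> PID history
        self.fetches: list[str] = []          # log of fetch operations

    def store(self, guid: str, content: bytes) -> None:
        pid = pid_for(content)
        self.data[pid] = content
        self.meta.setdefault(guid, []).append(pid)  # append, never overwrite

    def fetch_meta(self, guid: str) -> str:
        # Simplification: one lookup stands in for the protocol of [15].
        self.fetches.append("meta:" + guid)
        return self.meta[guid][-1]  # default: latest version

    def fetch_data(self, pid: str) -> bytes:
        self.fetches.append("data:" + pid[:8])
        content = self.data[pid]
        if pid_for(content) != pid:  # self-verification of the data item
            raise ValueError("data item failed verification")
        return content


def resolve(asa: ToyASA, root_guid: str, path: str) -> bytes:
    """Walk the path, alternating meta-data and data fetches."""
    guid = root_guid
    for name in [p for p in path.split("/") if p]:
        directory = json.loads(asa.fetch_data(asa.fetch_meta(guid)))
        guid = directory[name]  # GUID of the next child
    return asa.fetch_data(asa.fetch_meta(guid))
```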
3.4 Conclusions
This chapter explained the fundamental building blocks of the Autonomic Storage Architecture (ASA) relevant to the work carried out in this thesis. It showed how the generic storage layer utilises properties of the P2P infrastructure. It also explained how particular effects in the P2P infrastructure affect the generic storage layer. For instance, incorrect key-to-host mappings in the P2P infrastructure would result in failed accesses to data items or would cause the replication mechanism to break.
Optimisation of P2P Overlays
Outline
This chapter investigates the scope for optimising P2P overlays with autonomic management in order to improve performance and resource usage. It outlines related work and gives some background on the maintenance mechanisms of existing structured P2P overlay networks. It then introduces an autonomic management mechanism whose goal is to detect whether maintenance effort is wasted or required, and to adapt the scheduling of maintenance accordingly.
4.1 Introduction
As outlined in chapter 3, ASA and other distributed storage systems use structured P2P overlay networks to provide scalable data location facilities even in the presence of churn in the network membership. The widespread use of P2P overlay networks in distributed storage systems motivates this investigation.
In structured P2P overlay networks, each node maintains a set of nodes as its peer-set. The peer-set is used to make routing decisions and to adapt the overlay network as new nodes join and existing nodes leave or fail. In existing P2P overlay networks, the validity of the peer-set is checked (and, if required, repaired) periodically. These periodic operations are referred to as maintenance operations in this work. Each maintenance operation involves the exchange of one or more messages with other nodes in the overlay network, and therefore consumes some network resources. The optimal scheduling of such maintenance operations depends on two factors. The first is the workload, that is, the pattern of routing calls applied to the network. The second is the churn in network membership, that is, the temporal pattern with which nodes join or leave the network. For example, if the network membership is completely static, the optimal behaviour is to perform no maintenance at all. In that case maintenance represents pure overhead and uses up network resources unnecessarily. Conversely, under rapid network churn it is beneficial for nodes to expend significant maintenance effort in order to sustain a high success rate for routing operations.
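The cost argument for a static network can be made concrete with a small simulation of fixed-interval peer-set maintenance. For illustration only, it assumes that each maintenance round sends one probe message per peer-set entry. It also assumes that dead entries are replaced from a list of known live nodes. All names are hypothetical. With no churn, every round is counted as wasted, although it still costs messages.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceStats:
    messages: int = 0         # one probe per peer-set entry per round (assumed)
    faults_detected: int = 0  # dead entries found (and replaced if possible)
    wasted_rounds: int = 0    # rounds in which every probed peer was alive


def fixed_interval_maintenance(peer_set: set[int],
                               failures: dict[int, set[int]],
                               spares: list[int],
                               rounds: int) -> MaintenanceStats:
    """Simulate a peer-set that is checked once per round (a fixed interval).

    failures maps a round number to the nodes that fail before that round;
    spares are live nodes used to replace dead entries (hypothetical model).
    """
    peers, dead, spare = set(peer_set), set(), list(spares)
    stats = MaintenanceStats()
    for r in range(rounds):
        dead |= failures.get(r, set())       # churn since the previous round
        found_fault = False
        for peer in sorted(peers):           # iterate over a snapshot
            stats.messages += 1              # probe this peer
            if peer in dead:
                peers.discard(peer)
                if spare:
                    peers.add(spare.pop(0))  # repair with a live node
                stats.faults_detected += 1
                found_fault = True
        if not found_fault:
            stats.wasted_rounds += 1         # pure overhead this round
    return stats
```

For instance, a four-entry peer-set checked for ten rounds with no failures costs 40 probe messages and detects nothing.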
In existing overlay networks, maintenance operations are scheduled at a statically fixed interval. Workload and churn, however, will often vary dynamically. Even if a predetermined fixed interval is appropriate for the initial circumstances, it will cease to be so as conditions change. An autonomic management mechanism is therefore proposed that dynamically controls maintenance scheduling. It does so by adapting the interval between maintenance operations in response to changing conditions. The proposed mechanism is governed by a policy whose objective is to detect and correct unsatisfactory situations with respect to performance and resource consumption. Autonomic management may thus balance resource usage and performance better than a statically configured system.
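One simple way to adapt the interval is multiplicative back-off. The interval is lengthened after a maintenance round that found nothing to repair and shortened after a round that found faults, within fixed bounds. This is a hypothetical illustration of the general idea, not the manager or policy proposed in section 4.5. All parameter values are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveScheduler:
    """Hypothetical interval controller; not the policy of section 4.5."""
    interval: float = 10.0      # seconds between maintenance operations
    min_interval: float = 1.0   # never probe more often than this
    max_interval: float = 300.0 # never wait longer than this
    grow: float = 2.0           # back off when maintenance was wasted
    shrink: float = 0.5         # speed up when maintenance was required

    def after_round(self, faults_found: int) -> float:
        """Update and return the interval until the next maintenance round."""
        factor = self.shrink if faults_found > 0 else self.grow
        self.interval = min(self.max_interval,
                            max(self.min_interval, self.interval * factor))
        return self.interval
```

Under static membership, the interval climbs to its upper bound, which approximates the "no maintenance" optimum. Under sustained churn, it falls to its lower bound.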
This chapter is structured as follows. Section 4.2 outlines the peer-set maintenance protocols of existing overlay networks. Related work on the optimisation of P2P overlay networks in the presence of membership churn is discussed in section 4.3. Section 4.4 summarises the problems of current approaches and introduces autonomic management to address them. In section 4.5 an autonomic manager for the scheduling of maintenance operations in a P2P overlay is proposed. Some conclusions follow in section 4.6.