Design - DEVELOPMENT OF THE SOLUTION - UNIVERSIDAD PERUANA LOS ANDES

CHAPTER IV: DEVELOPMENT OF THE SOLUTION

4.2 Design

Getput is written in Python and is available as source from the Getput repository on GitHub.

Getput v0.0.7, the version available on GitHub at the time of this writing, requires python-swiftclient v1.6.0 or later. On Ubuntu 12.04, the following instructions should install this package.

sudo apt-get install python-software-properties

sudo add-apt-repository cloud-archive:havana

sudo apt-get update && sudo apt-get install python-swiftclient

Alternatively, you can create /etc/apt/sources.list.d/cloudarchive-havana.list with these contents:

deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/havana main

deb-src http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/havana main

And then install the package using:

sudo apt-get update && sudo apt-get install python-swiftclient
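As a quick sanity check on the version requirement above, the comparison can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and how the installed version string is obtained is left to the reader.

```python
import re


def parse_version(text):
    """Turn the leading dotted-number part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("not a version string: %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum="1.6.0"):
    """Return True when `installed` is at least `minimum` (1.6.0 per the text above)."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that "1.6" and "1.6.0" compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For example, `meets_minimum("1.5.9")` is false, while `meets_minimum("1.6.0")` and `meets_minimum("2.0.3")` are true.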

Running the test against the Ceph cluster as configured required commenting out the x-trans-id references for responses and setting the value to an empty string:

#transID = response['headers']['x-trans-id']

transID = ''

Test Parameters

Using version 1 authentication on the object gateway, we set up a resource file to define the environment variables for access.

export ST_AUTH=http://<load balancer address>/auth/v1.0

export ST_USER=<main user name>:swift

export ST_KEY="<swift subuser secret key>"
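To show what these variables feed into, here is a minimal Python sketch of a version 1 authentication request. In Swift v1.0 authentication the client sends the user and key as the X-Auth-User and X-Auth-Key headers on a GET to the auth URL, and the response carries X-Storage-Url and X-Auth-Token. The function name and the sample values in the test are hypothetical; the sketch only assembles the request and does not contact a gateway.

```python
import os


def v1_auth_request(environ=None):
    """Build (url, headers) for a Swift v1.0 auth GET from the ST_* variables."""
    env = os.environ if environ is None else environ
    required = ("ST_AUTH", "ST_USER", "ST_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    headers = {
        "X-Auth-User": env["ST_USER"],  # "<main user name>:swift"
        "X-Auth-Key": env["ST_KEY"],    # the swift subuser secret key
    }
    return env["ST_AUTH"], headers
```

After sourcing the resource file, `v1_auth_request()` returns the auth URL and the two headers, which any HTTP client can then send.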

Full utility parameter help is available through man pages and running with ‘--help’. For the test run, these parameters were used:

• -n <NOBJECTS>: container/object numbers as a value or range.

• -o <ONAME>: object name prefix

• -c <CNAME>: name of container

• -t <TESTS>: tests to run, these can be gpd (GET, PUT, DELETE).

• -s <SIZESET>: object size(s)

• --procs=<PROCSET>: number of processes to run
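The parameters listed above can be assembled into a command line as in the following Python sketch. The helper name, the executable name `getput`, and every value passed in the test are placeholders chosen for illustration; the source does not give the values used in the actual run.

```python
def getput_command(container, obj_prefix, nobjects, sizes, tests="gpd", procs=1):
    """Assemble a getput argument list from the parameters described above."""
    unknown = set(tests) - set("gpd")
    if unknown:
        raise ValueError("tests may only contain g, p and d: %r" % tests)
    return [
        "getput",                # assumed executable name
        "-c", container,         # name of container
        "-o", obj_prefix,        # object name prefix
        "-n", str(nobjects),     # container/object numbers (value or range)
        "-s", str(sizes),        # object size(s)
        "-t", tests,             # tests to run: g(et), p(ut), d(elete)
        "--procs=%s" % procs,    # number of processes to run
    ]
```

The resulting list can be passed to `subprocess.run` once the ST_* environment variables are set.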

Fio

Installing

Under Ubuntu, fio can be fetched as a standard package:

sudo apt-get install fio

Test Parameters

• --rw=<IO pattern>: Type of IO pattern for the test. The sequential tests used read and write; the random and mixed tests used randread, randwrite and randrw.

• --ioengine=<IO execution method>: Defines how the job issues IO. The engine used was libaio, for Linux native asynchronous IO.

• --runtime=<maximum time to run test>: Terminates processing after specified time. 30 minutes was used.

• --numjobs=<number of workload clones>: Defines processes/threads performing same workload of this job. Used default of 1.

• --direct=<Boolean>: Selects use/non-use of buffered IO. Direct was set to 1 (true).

• --bs=<size of IO>: Block size for IO units. 8k was used for random, 256K for sequential IO.

• --time_based: Force the test to run for full runtime even if there’s already complete coverage of the file.

• --size=<total IO size of job>: fio runs to cover this size, but size of device and time limit were controlling factors for this test.

• --iodepth=<units in flight>: Number of IO units to keep in flight against the file. A depth of 8 was used for testing.

• --name <job name>: In this context, used /dev/rbd1 to specify job name and the device file being targeted.

• --rwmixwrite=<% mix writes>: Percentage of mixed workload to make writes. The mix test used 30.

• --rwmixread=<% mix reads>: Percentage of mixed workload to make reads. The mix test used 70.
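Taken together, the values listed above translate into a command line roughly like the one produced by this Python sketch. The helper name is hypothetical. The runtime is expressed in seconds (30 minutes = 1800), and --size is left out because the text does not give the value that was used.

```python
def fio_command(name="/dev/rbd1", pattern="randrw", block_size="8k",
                runtime_s=30 * 60, iodepth=8, numjobs=1, mix_read=70):
    """Assemble a fio argument list from the test parameters described above."""
    args = [
        "fio",
        "--name=" + name,             # job name; the text used /dev/rbd1
        "--rw=" + pattern,            # read, write, randread, randwrite or randrw
        "--ioengine=libaio",          # Linux native asynchronous IO
        "--direct=1",                 # bypass buffered IO
        "--bs=" + block_size,         # 8k random, 256k sequential
        "--iodepth=%d" % iodepth,     # IO units kept in flight
        "--numjobs=%d" % numjobs,     # default of 1
        "--runtime=%d" % runtime_s,   # seconds
        "--time_based",               # run for the full runtime
    ]
    if pattern == "randrw":
        # The mixed test used 70% reads and 30% writes.
        args += ["--rwmixread=%d" % mix_read, "--rwmixwrite=%d" % (100 - mix_read)]
    return args
```

Calling `fio_command(pattern="read", block_size="256k")` gives the sequential read variant, which omits the two mix options.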

Collectl

Installing

Written by Mark Seger (the author of getput), collectl is available as a standard distribution package:

sudo apt-get install collectl

Test Parameters

Collectl has an extensive list of parameters to capture and play back information. CPU, Disk and Network data are captured by default, so the only parameter added during capture was -f for redirecting playback output to a file.

Playback of the results used these parameters, and was then post-processed to get average data:

• -s <subsystem>: This field controls which subsystem data is to be collected or played back for. A subsystem of ‘c’ specifies playing back CPU data.

• -p <Filename>: Read data from the specified playback file(s).
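The post-processing step mentioned above is not described in detail. One plausible way to average playback output, for example from `collectl -p <Filename> -s c`, is sketched below in Python. It assumes that header lines start with `#` and that data rows contain only whitespace-separated numbers; output that includes timestamps would need those columns stripped first. The function name is hypothetical.

```python
def column_averages(lines):
    """Average each numeric column, skipping blank lines and '#' header lines."""
    totals, count = None, 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            values = [float(field) for field in line.split()]
        except ValueError:
            continue  # row contains non-numeric fields; ignore it
        if totals is None:
            totals = [0.0] * len(values)
        if len(values) != len(totals):
            continue  # row width differs from the first data row; ignore it
        totals = [t + v for t, v in zip(totals, values)]
        count += 1
    return [t / count for t in totals] if count else []
```

Feeding it the lines of a saved playback file returns one average per column, in column order.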

HAProxy

Installing

Under Ubuntu, haproxy can be fetched as a standard package:

sudo apt-get install haproxy

Configuration

This is a minimal configuration that forwards ports 80 and 443 to the monitor/gateway boxes and uses source balancing to keep a best-effort client/server affinity.

# this config needs haproxy-1.1.28 or haproxy-1.2.1
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    #chroot /usr/share/haproxy
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    log global
    mode http
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen http_proxy :80
    option httplog
    mode http
    balance source
    server hp-cephmon01 192.168.0.10 check
    server hp-cephmon02 192.168.0.11 check
    server hp-cephmon03 192.168.0.12 check

listen https_proxy :443
    option tcplog
    mode tcp
    option ssl-hello-chk
    balance source
    server hp-cephmon01 192.168.0.10 check
    server hp-cephmon02 192.168.0.11 check
    server hp-cephmon03 192.168.0.12 check
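The idea behind source balancing can be illustrated with a short Python sketch: the client address is hashed and the hash selects one of the available servers, so a given client keeps reaching the same backend while the server list is unchanged. The CRC32 hash here is a stand-in chosen for illustration, not HAProxy's actual hash function, and the function name is hypothetical.

```python
import zlib

SERVERS = ["hp-cephmon01", "hp-cephmon02", "hp-cephmon03"]


def pick_server(client_ip, servers=SERVERS):
    """Map a client address to a server by hashing the address (illustrative only)."""
    digest = zlib.crc32(client_ip.encode("ascii"))
    return servers[digest % len(servers)]
```

Because the index depends on the number of servers, removing or adding a backend can move clients to a different server, which is why the affinity is only best-effort.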

Glossary

• Cold, warm and hot storage—Temperature in data management refers to frequency and performance of data access in storage. Cold storage is rarely accessed and can be stored on the slowest tier of storage. As the storage ‘heat’ increases, the bandwidth over time as well as instantaneous (latency, IOPS) performance requirements increase.

• CRUSH—Controlled Replication Under Scalable Hashing. The algorithm Ceph uses to compute object storage locations.

• Epoch—Ceph maintains a history of each state change in the Ceph Monitors, Ceph OSD Daemons and PGs. Each version of cluster element state is called an “epoch.”

• Failure domain—An area of the solution impacted when a key device or service experiences failure.

• Federated storage—A collection of autonomous storage resources with centralized management that provides rules about how data is stored, managed and moved through the cluster. Multiple storage systems are combined and managed as a single storage pool.

• Object storage—A storage model focusing on data objects instead of file systems or disk blocks; objects have key/value pairs of metadata associated with them to give the data context. Typically accessed by a REST API, designed for massive scale and using a wide, flat namespace.

• PGs—Placement Groups. A grouping of objects on an OSD; pools contain a number of PGs and many PGs can map to an OSD.

• Pools—logical partitions for storing objects. Pools set ownership/access to objects, the number of object replicas, the number of placement groups, and the CRUSH rule set to use.

• RADOS—A Reliable, Autonomic Distributed Object Store. This is the core set of storage software which stores the user’s data in a Ceph Cluster (MON+OSD).

• REST—Representational State Transfer is a stateless, cacheable, layered client-server architecture with a uniform interface. In this cluster, the REST APIs are architected on top of HTTP.
