We conduct experiments on both directly attached cloud storage and network-based cloud storage to test our storage separation verification mechanism.

Local Isolation Checking

Local isolation checking aims to verify whether two conflicting files are stored separately in directly attached cloud storage. We set up a virtualization environment to simulate directly attached cloud storage. The host machine has an Intel Core i5 3.10 GHz CPU and 4 GB of RAM. Two Seagate ST3500418AS hard disks attached to the local computer have the same capacity (500 GB), disk cache¹ size (16 MB), average seek time (less than 8.5 ms), latency (4.16 ms), and rotation speed (7,200 rpm). To reduce disk activity triggered by the operating system (OS), we turn off all possible background processes in the Ubuntu 12.04 (64-bit) operating system and keep CPU utilization below 1 %, so that disk contention from other processes is minimized. The Xen [7] virtualization platform is installed on the host machine. The management domain Dom0 is located on the first hard disk, sda. Each guest VM can be launched on either hard disk, sda or sdb.
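As a quick consistency check (our own arithmetic, not part of the original measurements), the quoted average latency follows from the rotation speed, assuming the head waits on average half a revolution for the target sector:

```latex
t_{\mathrm{rot}} = \frac{1}{2} \cdot \frac{60\,\mathrm{s}}{7200} \approx 4.17\,\mathrm{ms}
```

This matches the 4.16 ms figure listed above to within rounding.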

Disk Contention Benchmark. The isolation verification on a single file is conducted on a guest VM installed on sda. One logical volume on sdb is attached to the guest VM. The activities of both Dom0 and the guest VM OS have minimal impact on disk access. We generate random files with sizes from 16 MB to 4 GB on the disk.

Four factors affect the time to read blocks from different parts of a single file: the block size of each read, the reading pattern (sequential/random), the file size, and the OS page cache (enabled/disabled). We test the impact of each factor and determine the query mode for verifying the isolation of two conflicting files. Generally, seek time dominates the reading time for small blocks. Blocks smaller than 64 KB tend to have the same reading time regardless of the disk manufacturer [11]. To control disk head movement more precisely and create sufficient disk contention, we read small blocks with sizes from 64 KB to 1 MB.

For sequential reading, we read 50 consecutive blocks from a 1 GB file stored on one disk. For random reading, we randomly select 50 blocks from individual files with sizes ranging from 16 MB to 4 GB. Each test is repeated 200 times with different files to mitigate the variability of file layout on the hard disk. Between tests, we clear both the OS page cache and the disk cache. The experimental results are shown in Fig. 2.
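The measurement loop described above can be sketched as follows. This is a minimal sketch assuming a Linux/POSIX host; the function name `time_block_reads` and its parameters are hypothetical. Clearing the OS page cache (on Linux, for example, by running `sync` and writing `3` to `/proc/sys/vm/drop_caches` as root) and the on-drive cache between runs is not shown.

```python
import os
import random
import time


def time_block_reads(path, n_blocks=50, block_size=256 * 1024,
                     sequential=False, seed=None):
    """Time n_blocks block reads from `path`; return per-read latencies in seconds.

    Sketch only: it drops neither the OS page cache nor the on-drive cache,
    and it does not use O_DIRECT, so cached reads will look fast.
    """
    rng = random.Random(seed)
    n_whole = os.path.getsize(path) // block_size  # whole blocks in the file
    if n_whole < n_blocks:
        raise ValueError("file too small for the requested number of blocks")
    if sequential:
        # n_blocks consecutive blocks starting at a random block index
        start = rng.randrange(n_whole - n_blocks + 1)
        indices = range(start, start + n_blocks)
    else:
        # n_blocks distinct block indices chosen uniformly at random
        indices = rng.sample(range(n_whole), n_blocks)
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for idx in indices:
            t0 = time.perf_counter()
            os.pread(fd, block_size, idx * block_size)  # POSIX-only call
            latencies.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    return latencies
```

Repeating the call over many files and averaging the returned latencies corresponds to the 200 repetitions described above.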

Figure 2(1) shows that random reading creates considerably more disk contention than sequential reading regardless of the OS page cache status, especially when the block size is relatively small. There are two reasons. First, the disk head has to move farther for random reads than for sequential reads. Second, random reads cannot take full advantage of the read-ahead mechanism provided by the OS page cache. The results also show that an enabled OS page cache dramatically reduces the reading time for smaller blocks. When the block size is small, sequential reading has a significant advantage since seek time dominates the access time. When the block size increases to 1 MB, data transfer time dominates in both the random and the sequential pattern. Therefore, reading small random blocks enlarges disk contention.

¹ In this paper, we call the memory on the disk drive the disk cache. The physical memory used as …
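To make the seek-versus-transfer argument concrete, the following simple model charges each random block a full positioning cost (seek plus rotational latency) and charges a sequential run only one. Seek and latency come from the disk specification above; the 100 MB/s transfer rate is an assumption, and the model ignores caching and read-ahead, so it illustrates the trend only, not the measured values in Fig. 2.

```python
SEEK_MS = 8.5             # average seek time bound quoted for the disks above
LATENCY_MS = 4.16         # average rotational latency quoted for the disks above
TRANSFER_MB_PER_S = 100   # ASSUMED sustained transfer rate (not from the text)


def transfer_ms(block_bytes):
    """Time to move one block off the platter at the assumed transfer rate."""
    return block_bytes / (TRANSFER_MB_PER_S * 1e6) * 1e3


def random_read_ms(n_blocks, block_bytes):
    # every randomly placed block pays a full seek plus rotational latency
    return n_blocks * (SEEK_MS + LATENCY_MS + transfer_ms(block_bytes))


def sequential_read_ms(n_blocks, block_bytes):
    # one positioning cost, then the blocks stream off the platter back to back
    return SEEK_MS + LATENCY_MS + n_blocks * transfer_ms(block_bytes)


def random_to_sequential_ratio(block_kb, n_blocks=50):
    block_bytes = block_kb * 1024
    return random_read_ms(n_blocks, block_bytes) / sequential_read_ms(n_blocks, block_bytes)


for kb in (64, 128, 256, 512, 1024):
    print(f"{kb:5d} KB: random/sequential = {random_to_sequential_ratio(kb):.1f}")
```

Under these assumptions the ratio falls steadily as the block size grows, because the fixed positioning cost is amortized over a longer transfer.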

Figures 2(2) and 2(3) show the random reading time for different block sizes and file sizes with the OS page cache enabled and disabled, respectively. When the OS page cache is enabled, the random reading time is affected by the file size: when the file is small, the page cache greatly benefits random reading. We also observe that reading the same number of blocks from a larger file takes longer, since the disk head has to move across a larger range of the disk platter surface. We compare the impact of the OS page cache on different block sizes in Fig. 2(4). When the block size is 256 KB, the status of the OS page cache has minimal effect on the average reading time. Therefore, we choose random reads of 256 KB blocks as the query pattern for verifying the isolation of two files, so that the impact of the OS page cache is minimal while the disk contention remains considerable.
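The block-size choice above amounts to picking the size whose average read time changes least when the OS page cache is toggled. A minimal sketch of that selection rule, assuming the per-size mean times are already measured (the function name and its inputs are hypothetical):

```python
def pick_query_block_size(cache_on, cache_off, candidates=None):
    """Pick the block size whose mean read time is least sensitive to the page cache.

    cache_on / cache_off: dicts mapping block size (bytes) -> mean read time (s),
    measured with the OS page cache enabled and disabled, respectively.
    candidates: optional iterable restricting the sizes considered, e.g. to
    sizes small enough to still produce noticeable disk contention.
    """
    sizes = set(cache_on) & set(cache_off)
    if candidates is not None:
        sizes &= set(candidates)
    if not sizes:
        raise ValueError("no block size measured in both modes")
    # relative change in mean read time caused by toggling the page cache;
    # sorting makes ties resolve to the smallest block size
    return min(sorted(sizes),
               key=lambda s: abs(cache_on[s] - cache_off[s]) / cache_off[s])
```

The second requirement, that the chosen size still creates considerable contention, is not measured by this rule; the `candidates` argument is one way to impose it by hand.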

In-House Cloud Experiments. We exploit the above observations to check whether two conflicting files are stored on the same hard disk. The separation verification of conflicting file pairs is conducted on different pairs of guest VMs. The virtualization platform is designed to allocate CPU time fairly among VMs, so CPU contention between VMs is negligible. We create three pairs of guest VMs as follows:

• Pair I. The two VMs are attached to disk volumes on different disks.

• Pair II. Both VMs are attached to disk volumes on the same disk, placed 10 GB apart.

• Pair III. Both VMs are attached to disk volumes on the same disk, placed 20 GB apart.

As shown by the analysis in Sect. 4.3, data partitions in common practice are smaller than 100 MB. We generate random files of 16, 32, 64, and 128 MB on each VM and compare the average random reading time across the different pairs of VMs, reading 50 blocks (256 KB each) from each file. The results are shown in Fig. 3. For small files, such as 16 MB, the difference between reading from one disk and from two disks is small, because small co-resident files cannot cause enough disk head movement to increase the reading time. However, when the file size is at least 32 MB, the time difference exceeds 40 %. When the volumes attached to a pair of VMs are farther apart on the disk, the access time increases slightly, since the disk head has to move over a larger area of the platter. However, this difference is small, since the disk seek time is in the range of 2–10 ms.
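One way to turn this comparison into a yes/no decision is to measure the mean 256 KB random-read time for a VM pair and compare it against a baseline measured on volumes known to sit on different disks (Pair I). The sketch below illustrates that idea under this assumption; the function names and the 0.4 threshold are hypothetical and are not the authors' decision rule.

```python
from statistics import mean


def relative_slowdown(paired_times, baseline_times):
    """Fractional increase of the mean read time over a separate-disk baseline."""
    base = mean(baseline_times)
    return (mean(paired_times) - base) / base


def likely_co_resident(paired_times, baseline_times, threshold=0.4):
    # threshold=0.4 is a HYPOTHETICAL cut-off, loosely inspired by the >40 %
    # gap reported above for files of at least 32 MB; it would need calibration.
    return relative_slowdown(paired_times, baseline_times) > threshold
```

Because small files (16 MB) barely slow each other down, such a rule is only meaningful for files large enough to generate contention.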

Public Cloud Experiments. We launch a t1.micro EC2 instance with a 160 GB hard disk in the Amazon cloud and compare the reading time of 256 KB blocks. As shown in Fig. 4, reading two files from one disk takes twice as long as reading one file from an individual disk. Therefore, our mechanism is also practical in public clouds.

Fig. 3 Eucalyptus experiment

Fig. 4 EC2 experiment

Remote Separation Checking

Remote separation checking aims to verify whether two conflicting files are stored separately in network-based cloud storage. We conduct the experiments on both an in-house cloud and a public cloud, storing the conflicting files on two network-based storage services: Walrus on Eucalyptus [25] and S3 on the Amazon cloud [3]. Both are widely used network-based cloud storage services. We also discuss how our verification mechanism can be exploited in the public cloud.

In-House Cloud Experiments. We deploy the open-source cloud platform Eucalyptus 3.1 and its object-based storage service Walrus on our host machine to evaluate the isolation verification of remote conflicting files. The Eucalyptus interface is fully compatible with the Amazon cloud [3]. Two hard disks serve the Walrus service, and we upload files of different sizes to each hard disk. We randomly read 256 KB blocks from file pairs stored either on the same disk or on separate disks. The experimental results are shown in Fig. 5. From Fig. 5(1), we observe that reading from a single disk takes more than twice as long as reading from two different disks. We then randomly read 100 pairs of files on the same disk and another 100 pairs of files on different disks, where the files in each pair have random sizes. The average reading time of each 256 KB block is shown in Fig. 5(2). With 0.02 s as the threshold, we can successfully distinguish isolated storage from co-resident storage.
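The 0.02 s threshold rule can be written as a simple classifier, together with a helper that checks it against labeled measurements. This is a sketch with hypothetical names; labeling times above the threshold as co-resident follows from the observation that reading from a single disk is slower.

```python
CO_RESIDENT_THRESHOLD_S = 0.02  # threshold on the average 256 KB read time


def classify(avg_read_time_s, threshold=CO_RESIDENT_THRESHOLD_S):
    """Label a file pair by its average per-block read time."""
    return "co-resident" if avg_read_time_s > threshold else "isolated"


def threshold_accuracy(samples, threshold=CO_RESIDENT_THRESHOLD_S):
    """Fraction of (avg_read_time_s, true_label) samples the threshold labels correctly."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples to evaluate")
    hits = sum(classify(t, threshold) == label for t, label in samples)
    return hits / len(samples)
```

Feeding the 200 labeled pairs (100 same-disk, 100 different-disk) into `threshold_accuracy` would check whether a given threshold separates the two groups.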

Fig. 5 Eucalyptus experiment

Public Cloud Experiments. We also evaluate our storage isolation checking method in the Amazon cloud, one of the most popular cloud platforms. Amazon S3 organizes data into buckets and objects. Each bucket can contain an unlimited number of objects, and an object is analogous to a file on a PC. In S3, all buckets share a single global namespace. However, Amazon rarely discloses implementation details such as data partitioning and replication. We gathered the following clues mainly from the officially published S3 best practices [20], the S3 patent [1], and our own observations.

• Network Variability: According to Amazon's website [3], making GET requests against Amazon S3 from within Amazon EC2 instances can minimize network variability.

• Bucket Separation: Multiple buckets whose names start with different alphanumeric characters will ensure a degree of partitioning from the start [20]. This implies that objects logically placed in buckets with different initial letters should not reside on the same disk.

• Object Layout: [20] also mentions that performing GETs in any sorted order can increase throughput, and the smaller the objects, the more significant the impact on overall throughput. For small files, sequential reading may benefit from the disk cache and prefetching. We infer that a number of sequentially uploaded small files should be stored on the same disk.

• Data Replication: For simplicity, we adopt Reduced Redundancy Storage (RRS) for all experimental data. With RRS, all objects have two replicas in the Amazon cloud.

Based on these limited details of the Amazon S3 storage implementation, we read pairs of files in three modes:

• Two Buckets with Different Initials: Reading two files from two buckets with different initial letters in the same region.

• Two Buckets with Same Initial: Reading two files from two buckets with the same initial letter in the same region.

• One Bucket: Reading two files from one bucket.

We launch two EC2 m1.medium instances with the same configuration in the US East region to execute the three reading modes. We create S3 buckets with different initial letters in the same region as the EC2 instances. To each bucket, we upload 100 different 1 MB files with the RRS option; according to the analysis above, most of these 100 small files should be stored on the same storage device. We issue GET requests from the two EC2 instances at the same time and evaluate the correlation coefficient of the reading times recorded by the two VMs. The results are shown in Fig. 6. We conduct the experiment at different times of day and repeat it over 2 weeks. We observe that reading from the same bucket or from two buckets with the same initial letter yields a correlation coefficient an order of magnitude larger than reading from buckets with different initial letters. Therefore, our storage separation verification method can be extended to distinguish accessing the same hard disk from accessing different hard disks in a real cloud environment.
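The correlation step can be sketched with the Pearson correlation coefficient. The text does not say which coefficient is used, so Pearson's r is an assumption here, as is pairing the i-th reading time from each VM because both requests were issued at the same moment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series of reading times."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two samples")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

Since Python 3.10, `statistics.correlation` computes the same quantity; the explicit version above just makes the formula visible.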

Fig. 6 Co-residency checking in cloud

5 Dedication Verification

We propose TerraCheck [44] to help cloud users verify whether their dedicated storage devices have been misused to store other users' data. TerraCheck detects malicious occupation of the dedicated device by monitoring changes to the shadow data, i.e., residual bits intentionally left on the disk that are invisible to the file system. When cloud providers share the dedicated disk with other users, such misuse can be detected, since the shadow data will be overwritten and become irretrievable. We describe the theoretical framework of TerraCheck and show experimentally that it works well in practice.
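The detection idea can be illustrated by recording digests of the shadow regions while the device is known to be exclusive and re-reading them later: any region whose digest changed has been overwritten. This is a simplified sketch assuming the user knows and stores the shadow regions' offsets; it does not reproduce TerraCheck's theoretical framework, and the function names are hypothetical.

```python
import hashlib


def fingerprint(path, offsets, length):
    """SHA-256 digest of `length` bytes at each offset of `path`.

    `path` stands in for the raw dedicated device (or a disk image); reading a
    real block device would normally require elevated privileges.
    """
    digests = {}
    with open(path, "rb") as dev:
        for off in offsets:
            dev.seek(off)
            digests[off] = hashlib.sha256(dev.read(length)).hexdigest()
    return digests


def overwritten_regions(path, baseline, length):
    """Offsets whose shadow bytes no longer match the recorded baseline."""
    current = fingerprint(path, baseline.keys(), length)
    return sorted(off for off, digest in baseline.items() if current[off] != digest)
```

A non-empty result from `overwritten_regions` signals that someone else has written to regions the dedicated user never touched through the file system.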
