The techniques described in this chapter can act as a form of disk level data reduction on local storage media, with a tiny fraction of a drive serving as a proxy for its content. These approaches may also be effective for cloud storage platforms, where thumbnails and item metadata are present. By greatly reducing the amount of data to read across the network, significant performance gains may be achieved, as with the sub-file hashing strategies in Section 4.5. This section first explores the possibility of using embedded metadata for contraband detection, much like the CRC checksums found in the Windows thumbnail cache, before discussing the possibility of using cloud storage thumbnails in lieu of full files. Both approaches assume that credentials have been obtained for accessing the cloud storage account, rather than relying on processing the local device cache [157].

5.7.1 Exploiting Embedded Discriminators

Cloud storage providers typically have a great deal of metadata associated with files and directories on their platform. This metadata allows for version tracking for client synchronisation, management of sharing/privacy properties, and provides the ability to uniquely identify an item on the platform. Checksums may also be used to verify the integrity of data once it has been transmitted across the network. As noted by Roussev et al. [109], this metadata is usually hidden, but APIs provided by the cloud service can access this information, which can be used to obtain a more complete acquisition of cloud storage.

The traditional approach to contraband detection is to acquire all of the binary data for a file, hash it, and then check if the hash exists in a database. However, when a highly discriminative identifier is already provided in file metadata, file data does not need to be fetched. As these discriminators are intended to track changes in a file, and cannot be manipulated by users without changing file content, they are reliable data signatures. A list of cloud storage providers and their content signatures is provided below:

Google Cloud: CRC32C/MD5 – Google Cloud provides both CRC32C checksums and MD5 hashes (footnote 39) for the purposes of verifying the integrity of downloaded files. CRC32C is standardised and uses a different polynomial to CRC32, while the standard MD5 hash is used for all non-composite objects on the platform.

Google Drive: MD5 – Files with binary data stored on Google Drive have a property named md5Checksum (footnote 40), which is a standard MD5 hash of the full file content.

39 https://cloud.google.com/storage/docs/hashes-etags
40 https://developers.google.com/drive/api/v3/reference/files

OneDrive: CRC32/SHA1/quickXorHash – OneDrive exposes three content hashes (footnote 41): i) a CRC32 checksum, ii) a SHA1 full file hash, and iii) quickXorHash, a proprietary exclusive-OR (XOR) based hash with a fully documented implementation. CRC32 and SHA1 hashes are not available on OneDrive for Business, while quickXorHashes are not available on personal accounts.

Dropbox: content_hash – Dropbox uses a non-standard hashing approach for its content_hash file property (footnote 42), which is used for file verification. Files are broken into 4 MiB blocks, which are hashed using the SHA256 algorithm. Block hashes are then concatenated into a single string, which is then hashed again with SHA256 to calculate the final hash.
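The Dropbox block-hashing scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's benchmark code: it assumes the block digests are concatenated in their raw binary form (as described in the public content-hash reference cited in footnote 42) and that the file already sits in memory. The function name `dropbox_content_hash` is hypothetical.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks, as described in the text


def dropbox_content_hash(data: bytes) -> str:
    """Return a Dropbox-style content hash for an in-memory byte string.

    Assumption: per-block SHA256 digests are concatenated as raw bytes
    before the final SHA256 pass.
    """
    block_digests = b"".join(
        hashlib.sha256(data[offset:offset + BLOCK_SIZE]).digest()
        for offset in range(0, len(data), BLOCK_SIZE)
    )
    # Hash the concatenated block digests and report the result as hex.
    return hashlib.sha256(block_digests).hexdigest()
```

Because the result differs from a plain SHA256 of the file, a reference database for Dropbox would need to be generated by running a function like this over the original contraband files.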

All of the cloud platforms listed above make use of robust cryptographic hashes for at least one of their content signatures, meaning that it is not necessary to rely on CRC based checksums. These signatures can be exploited directly by an investigator without downloading any file content, and without calculating any hashes. All that is required is for a request to be made to the platform’s API to obtain metadata for the files in question. This can typically be done in bulk, as with Dropbox, which returns content hashes for each file when requesting a directory listing, as depicted in Figure 5.10. Additionally, as most platforms use unmodified variants of hashing algorithms which are frequently used in forensics, existing contraband databases may be used without modification. The API response can be parsed for hash strings, which can then be compared directly with existing databases. However, in the case of Dropbox, a new contraband signature database would be required for this approach, as it is unlikely that an existing database has been generated using that particular hash concatenation method.
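The lookup step can be illustrated with a short sketch that scans an already-parsed directory listing for known hashes. It assumes the response has been decoded from JSON into a Python dict with the `entries`, `.tag`, `path_display` and `content_hash` fields used by the Dropbox API v2 `list_folder` response; the function name `find_flagged_files` and the hash database (a plain set of lowercase hex strings) are hypothetical.

```python
def find_flagged_files(list_folder_response: dict, known_hashes: set) -> list:
    """Return the paths of files whose content_hash appears in known_hashes.

    known_hashes is assumed to hold lowercase hex strings generated with
    the same hashing scheme as the platform.
    """
    flagged = []
    for entry in list_folder_response.get("entries", []):
        # Folders and deleted entries carry no content hash.
        if entry.get(".tag") != "file":
            continue
        content_hash = entry.get("content_hash")
        if content_hash and content_hash.lower() in known_hashes:
            flagged.append(entry.get("path_display", entry.get("name")))
    return flagged
```

No file content is transferred at any point; only the metadata strings returned with the listing are inspected.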

The metadata based approach is very fast, as it requires only strings to be requested across the network. Additionally, few API requests are needed on platforms such as Dropbox, as hashes for all files in each directory can be queried in a single request. This alleviates API rate limiting concerns, which may slow down the overall acquisition process if files are retrieved individually. However, the cloud platform hashes are as fragile as cryptographic hashes in disk based forensics, in that modifying a single bit in the file will generate a different hash. A simple obfuscation script could render contraband files invisible to cryptographic hash databases, so a more robust approach may be needed as a follow-up.
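The fragility of cryptographic hashes is easy to demonstrate. The sketch below flips a single bit in a byte string and compares the digests before and after; SHA256 is used purely for illustration, and the sample content is a placeholder.

```python
import hashlib

original = bytearray(b"example file content")
modified = bytearray(original)
modified[0] ^= 0x01  # flip the lowest bit of the first byte

original_digest = hashlib.sha256(original).hexdigest()
modified_digest = hashlib.sha256(modified).hexdigest()

# The two inputs differ by one bit, yet the digests no longer match,
# so a database lookup on the modified file would fail.
hashes_differ = original_digest != modified_digest
```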

41 https://docs.microsoft.com/en-us/onedrive/developer/rest-api/resources/hashes
42 https://www.dropbox.com/developers/reference/content-hash

Fig. 5.10 A screenshot of the Dropbox API V2 response to the list_folder endpoint. Content hashes (highlighted by boxes) are included in directory listings for easy bulk processing.

5.7.2 Processing Cloud Thumbnails

A robust method for detecting contraband on cloud services is to make use of the thumbnails provided on the platform. These thumbnails are generated for the client and web application previews, and can be accessed via the platform API. As thumbnails are a condensed form of the complete image content, they both reduce the amount of data to be transmitted across the network, and provide assurance about the content of the file. These thumbnails can then be cryptographically or perceptually hashed for comparison with contraband databases.
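As an illustration of perceptual hashing on a thumbnail, the sketch below implements a simple average hash. This technique is chosen here only as an example; the text does not specify which perceptual hash would be used. It assumes the thumbnail has already been decoded and downscaled to a small grayscale grid (for example 8×8) by an imaging library, and both function names are hypothetical.

```python
def average_hash(pixels):
    """Compute an average hash from a 2D list of grayscale values.

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")
```

Two thumbnails would then be treated as a match when the Hamming distance between their hashes falls below a chosen threshold, which tolerates small changes that would defeat a cryptographic hash.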

A small case study of the Dropbox platform was undertaken in order to understand the potential benefits of this approach. The largest 5000 images of the Flickr 1 Million dataset were used to create a subset for testing. This subset totals 1.57 GiB, with a mean file size of 337.15 KiB. Files were uploaded to the Dropbox platform and accessed via the same client workstation and 100 Mbit Internet connection as the benchmarks in Section 4.5. Where possible, requests were made using the Dropbox Python SDK [158].

Single Thumbnail: Thumbnails of size 128 were requested separately for each file using Dropbox.files_get_thumbnail.

Batch Thumbnail: Thumbnails were downloaded in batches of 25, which is the maximum number of files allowed by Dropbox.files_get_thumbnail_batch.

Directory Zip: The entire directory was requested as a zip file via the files/download_zip HTTP endpoint using the Python 2.7 requests library [159]. This workaround was required as the Python SDK timed out when requesting the entire directory.

Each approach was repeated three times, with reported times representing the mean duration taken to acquire the data and read it into memory with no further processing. Multiple simultaneous requests were issued with up to 32 threads for all but the zip approach, which only makes a single request to the API.

When executing large numbers of API requests it is necessary to take note of any rate limiting functionality employed by the endpoint server. In this case, Dropbox does perform rate limiting but does not disclose metrics for when the limit is triggered. No rate limiting was observed in this experiment up to 32 threads, though a brief test with higher thread counts did result in some requests being rejected with a timeout. Code snippets for these benchmarks are provided in Appendix F.1.2 and F.1.3. In all cases except downloading the zip, files were first enumerated using Dropbox.files_list_folder, with this enumeration time being counted towards the overall time. Results for these benchmarks are provided in Figure 5.11.
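The batching and threading pattern used for the thumbnail requests can be sketched as follows. This is not the code from Appendix F; it is a minimal outline in which `fetch_batch` is a hypothetical callable standing in for a wrapper around Dropbox.files_get_thumbnail_batch, so that the structure can be shown without network access.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 25   # maximum files per batch thumbnail request, per the text
MAX_THREADS = 32  # highest thread count used in the benchmark


def chunk(paths, size=BATCH_SIZE):
    """Split a list of file paths into consecutive batches."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def fetch_all(paths, fetch_batch, threads=MAX_THREADS):
    """Fetch every batch concurrently and flatten the results in order.

    fetch_batch is a hypothetical callable that takes a list of paths
    and returns one result per path.
    """
    batches = chunk(paths)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(fetch_batch, batches)
        return [item for batch in results for item in batch]
```

For the 5000-file dataset, batches of 25 reduce the number of thumbnail requests from 5000 to 200, in addition to the enumeration calls.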

Making many small requests to the Dropbox API is expensive, as each individual API request, regardless of size, comes with its own overheads. However, the Dropbox API has a very high rate limit and allows for many simultaneous requests over a reasonable period of time, facilitating reduced acquisition times at high thread counts. Requesting individual file thumbnails instead of full files resulted in a performance increase of 1.2–1.4×. As previously discussed in Section 3.6.3, file level data reduction approaches perform best when there is a good trade-off between the base access overhead and the cost of transporting file data. As thumbnail file sizes should be consistent, regardless of the size of the source file, it is expected that this performance gap would widen with larger files. This assumes that thumbnails of the requested size are already available on the platform and do not have to be generated as the request is made. Higher resolution images require more processing when carrying out re-scaling operations, which may mitigate some of the performance gains on larger files. However, as no decrease in request times was noted between runs, it is assumed that, at least for 128px thumbnails, they are already present on the Dropbox platform.

Fig. 5.11 Benchmark of mean time to acquire files from a Dropbox account. Comparison between downloading full files one at a time (File_Single) versus a single thumbnail at a time (Thumbnail_Single) and batch thumbnail downloading (Thumbnail_Batch), and downloading the directory as a zip file (File_Zip). Dataset is the largest 5000 files in Flickr 1 Million.

Acquiring thumbnails in batches of 25 reduced transfer times substantially. Between 1 and 8 threads the batch approach was approximately 9× faster than acquiring single files, reaching 20× at 16 threads, and 28× at 32 threads. This can be attributed to the 25-fold reduction in the number of API requests, and also to potential efficiency gains of the Dropbox server processing multiple related files simultaneously. This approach also compares favourably to downloading the entire directory via a zip file, which ends up being slower than the single file approaches at high thread counts. The reason for this is likely that zipping such a large directory requires much more memory and processing by the cloud provider, while the other approaches simply request that existing resources be transferred.

These results show that obtaining thumbnails, rather than full files, has the potential to decrease overall processing times of remote cloud storage. Small benefits are provided

much of the improvement can be attributed to reducing the number of API requests, which could also be achieved by batch downloading full files. Unfortunately, Dropbox does not provide a batch file download endpoint, such that a comparison cannot be made.
