A well-accepted definition of cloud computing comes from the National Institute of Standards and Technology (NIST), which presents computing as a utility:
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (NIST 800-145 [50])
According to this definition, the five key characteristics of cloud services are:
- on-demand self-service: access to computing power without human intervention.
- broad network access: access to computing power over the Internet.
- resource pooling: virtual resources are dynamically assigned, and the user has no control over their exact location.
- rapid elasticity: resources can be quickly allocated by increasing or decreasing both the number of virtual appliances deployed (i.e., horizontal scaling) and the resources dedicated to each of them (i.e., vertical scaling).
- measured service: control and optimization of resource use by measuring storage, processing power, bandwidth, active user accounts, etc. These measurements are the metrics used to verify compliance with a Service Level Agreement (SLA), which is essential for customer confidence.
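The two directions of rapid elasticity can be illustrated with a minimal sketch. The `Pool` class, its fields, and the vCPU-based demand figure are hypothetical names introduced only for this illustration; real providers expose scaling through their own APIs and policies.

```python
import math
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical pool of identical virtual appliances."""
    instances: int           # number of appliances (horizontal dimension)
    vcpus_per_instance: int  # size of each appliance (vertical dimension)

    def capacity(self) -> int:
        return self.instances * self.vcpus_per_instance


def scale_horizontally(pool: Pool, demand_vcpus: int) -> Pool:
    """Keep the appliance size fixed and change how many are deployed."""
    needed = max(1, math.ceil(demand_vcpus / pool.vcpus_per_instance))
    return Pool(needed, pool.vcpus_per_instance)


def scale_vertically(pool: Pool, demand_vcpus: int) -> Pool:
    """Keep the appliance count fixed and change the size of each one."""
    needed = max(1, math.ceil(demand_vcpus / pool.instances))
    return Pool(pool.instances, needed)


pool = Pool(instances=2, vcpus_per_instance=4)      # capacity: 8 vCPUs
wider = scale_horizontally(pool, demand_vcpus=20)   # 5 appliances of 4 vCPUs
taller = scale_vertically(pool, demand_vcpus=20)    # 2 appliances of 10 vCPUs
print(wider, wider.capacity())
print(taller, taller.capacity())
```

Both results cover the same demand. In practice the choice depends on whether the application can be distributed across several appliances (horizontal) or needs more resources in a single one (vertical).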
Cloud computing has developed greatly since 2006, when Amazon Elastic Compute Cloud (EC2) was first launched. It is currently reaching its productivity plateau, having moved beyond the status of a disruptive IT technology to become the basis for most future IT models [84] (Figure 1.10).
Cloud computing is a large-scale distributed computing model that provides computing, storage, networking, and other resources to many users as services. The cloud offers high scalability (i.e., horizontal and vertical elasticity) and high availability and reliability (e.g., through replication), while improving resource utilization efficiency (e.g., through virtualization techniques such as over-commitment or pre-emptible virtual machines). These key capabilities are motivating the migration of a large number of companies to cloud computing. They also attract scientists, who point to the new paradigm as a solution for big data storage and large-scale analysis in bioinformatics: virtualization may reduce data transfer and make computation more accessible to small and medium-sized laboratories, while promoting data sharing and collaborative work.
Figure 1.10: Hype cycle for cloud and cloud-related technologies.
The hype cycle [258] graphically represents the maturity and adoption of specific technologies. “Cloud Computing” is already on the slope of enlightenment and is predicted to reach the plateau of productivity in less than two years [84].
Besides the five essential characteristics, NIST describes three cloud service models and four cloud deployment models, which are introduced in the following two subsections.
Deployment Models
Clouds can be classified by their deployment model, that is, the architecture and purpose of the underlying infrastructure in terms of cloud location and management. NIST identifies four deployment models for cloud computing: private clouds, community clouds, public clouds, and hybrid clouds.
Private Clouds
They are intended for the exclusive use of a single organization with multiple users and emulate cloud computing on private networks, which are owned, managed and operated either by the organization itself (on-premises) or by an external cloud provider (off-premises). Hence, according to their physical location, private clouds are classified as “on-site” or “outsourced”. “On-site” private clouds are secured and controlled inside the organization, and clients access them from within the security perimeter or through firewalls and a Virtual Private Network (VPN). In “outsourced” private clouds, the cloud provider hosts the private cloud on its own premises and separates its resources from the provider's other cloud resources by mechanisms such as Virtual Local Area Networks (VLANs), VPNs, and separate network segments or clusters. As discussed below in Section 1.4.3.2 Cloud Management Platforms, several open-source packages are available to install private clouds.
Community Clouds
They are intended for the exclusive use of a specific community of users from several organizations that have shared objectives (e.g., security requirements, compliance considerations) and can reduce costs by sharing the infrastructure. As in private clouds, the infrastructure can be owned and managed by a third party, or operated on-premises within the community (the on-site scenario), where each organization may provide cloud infrastructure, consume services, or both.
Public Clouds
They correspond to the traditional and mainstream definition of cloud computing, where resources are dynamically provisioned on a self-service basis over the Internet. The infrastructure, which is owned, managed and operated by a business, academic or governmental organization, is intended for open use by the general public. The cloud provider is responsible for security management. Compared to private clouds, public cloud services offer the client a lower degree of control and oversight over the physical and logical security perimeters that separate computational resources (perimeters that are usually present in the outsourced private cloud model). Amazon Web Services (AWS) is the leading company, with almost 50% of the market share in 2018. Together with the other main public cloud providers (Microsoft Azure, Google Cloud, Alibaba Cloud, and IBM), it accounts for 80% of the market [85].
Hybrid Clouds
They are a composite of two or more of the other deployment models. The individual clouds remain unique entities and are bound together by standardized or proprietary technologies that enable data and application portability (see also Section 1.4.3.3). Some use cases for this complex model are accessing external clouds during periods of high demand (called cloud bursting), or running some applications in public clouds while keeping sensitive data in an on-premises private cloud.
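The two hybrid use cases above can be sketched as a simple placement policy. This is only an illustration under stated assumptions: the `Job` class, the `place_jobs` function, and the vCPU-based capacity model are hypothetical and do not correspond to any particular cloud management platform.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    vcpus: int
    sensitive: bool  # True if the job uses data that must stay on-premises


def place_jobs(jobs, private_capacity_vcpus):
    """Assign each job to 'private', 'public', or 'queued'.

    Sensitive jobs never leave the private cloud; other jobs burst to the
    public cloud only when the remaining private capacity cannot hold them.
    """
    free = private_capacity_vcpus
    placement = {}
    # Place sensitive jobs first so they get priority on private capacity.
    for job in sorted(jobs, key=lambda j: not j.sensitive):
        if job.vcpus <= free:
            placement[job.name] = "private"
            free -= job.vcpus
        elif job.sensitive:
            placement[job.name] = "queued"  # wait rather than leave the premises
        else:
            placement[job.name] = "public"  # cloud bursting
    return placement


jobs = [
    Job("align", vcpus=8, sensitive=False),
    Job("patients", vcpus=6, sensitive=True),
    Job("report", vcpus=2, sensitive=False),
]
print(place_jobs(jobs, private_capacity_vcpus=10))
```

With 10 private vCPUs, the sensitive job is placed on-premises first, the large non-sensitive job bursts to the public cloud, and the small one fits in the remaining private capacity.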
Service Models
Cloud computing enables hardware and software to be delivered as services following the SOA principles (see also Section 1.5.1 Service-Oriented Science). These services are usually described using a Something-as-a-Service (XaaS) taxonomy. In this context, the term service not only refers to SOA precepts, but also reflects that the service is provided on demand as a utility, which is very convenient for cloud vendors. According to NIST and as depicted in Figure 1.13, these services can be classified into one of three delivery models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).
Each model allows the user a different degree of control over the cloud stack:
Software as a Service (SaaS)
This service model delivers application services online and facilitates remote access to available bioinformatics software tools through the Internet. The user typically accesses a web application and is totally unaware of the underlying infrastructure. A simple and widely used SaaS is Google Docs. SaaS is the most common approach to publishing bioinformatics tools on the cloud. CloudBlast [86] is one example, but there are many others, some of which are introduced in Section 1.4.3.5 Bioinformatics in the cloud.
Platform as a Service (PaaS)
This model allows the development, installation and execution of user-developed applications on the cloud infrastructure. Applications must be created using the specific programming languages, API libraries or tools supported by the PaaS provider, which constitute the development environment. Clients have full access to and control over the tools they create and the development languages, but not over service runtimes, the web server, or the storage network. The services offered range from hosting, development and testing of the user's applications to extensive integrated services with scalability and maintenance. Examples of generic PaaS vendors are Heroku [87] and OpenShift [88], where users can execute their applications in the cloud (called “Dynos” and “Gears”, respectively) as containers in real time, without dealing with the deployment details of the infrastructure. The main benefit of these services is that users can focus on high-value software rather than on infrastructure. In addition to these stand-alone PaaS offerings, when the user's applications are meant to be exposed online through a certain SaaS, the developer's platform is called “PaaS from SaaS”.
As illustrated in Figure 1.14, in the bioinformatics domain, some platforms have been designed following the same philosophy, as a new strategy to outsource infrastructure management while keeping control of the scientific code running there. Eventually, such code can even be offered as SaaS to researchers. Section 1.4.3.5 Bioinformatics in the cloud shows some examples following this approach.
Infrastructure as a Service (IaaS)
It refers to the group of high-level APIs that control low-level details of the underlying infrastructure, such as physical computing resources, data partitioning, scaling, security, backup, etc. In this way, the user controls the virtualized servers and their specific computational capabilities and storage: VLANs, access to raw block or file storage, load balancers, IP addresses, etc. However, the user has limited control over the network settings. The most popular examples are Amazon EC2, which allows the user to create and manage virtual machines, and Amazon Simple Storage Service (S3), which allows the user to store and access data through a web-service interface.
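The different degrees of control described for the three service models can be summarized in a small lookup table. The layer names and the exact split are a simplification assumed for this sketch, based on the descriptions in this section; the network layer is left out because IaaS users have only limited control over it.

```python
# Simplified cloud stack, from the top layer down (assumed for illustration).
LAYERS = ["application", "runtime", "virtual servers", "storage", "physical hardware"]

# Layers the user controls under each service model, following the text:
# SaaS users only consume an application, PaaS users control the applications
# they create, and IaaS users control virtualized servers and storage.
USER_CONTROLS = {
    "SaaS": set(),
    "PaaS": {"application"},
    "IaaS": {"application", "runtime", "virtual servers", "storage"},
}


def responsibilities(model):
    """Return a mapping from each layer to 'user' or 'provider' for a model."""
    user_layers = USER_CONTROLS[model]
    return {
        layer: ("user" if layer in user_layers else "provider")
        for layer in LAYERS
    }


for model in ("SaaS", "PaaS", "IaaS"):
    print(model, responsibilities(model))
```

Reading the table from SaaS to IaaS shows the trade-off between the models: the more layers the user controls, the more management work moves from the provider to the user.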