
Many different definitions of Grids and Grid Computing have been proposed so far [67]. For our purposes, a Grid can be thought of as a distributed collection of heterogeneous computing, storage and network resources that are bound together by a set of software services, usually called middleware, and that are used by dynamic sets of people and institutions. The term Grid was introduced in 1998 [69] to compare this new distributed computing infrastructure to the electric power grid. The term Data Grid is often used to stress the properties of a Grid whose main purpose is the management of huge amounts of data¹, while the term Computational Grid is used when the focus is on the processing power of a Grid. However, these are minor terminological distinctions, and the two terms have recently started to be combined, describing Grids as Computational Data Grids.

Just as we can plug a home appliance into the electricity socket in the wall, according to the “Grid vision” one day we could attach our personal computer to a Grid system and receive computing and storage capacity to run our jobs instead of electric power. Of course, this is a figurative way to convey the idea. The current situation is still quite far from this vision, and Grids are mainly used in research fields where complex computations must be performed on large amounts of world-wide distributed data. The resources owned by a single institution are not enough to perform some tasks. This is why multiple institutions need to cooperate and share their resources in order to achieve better results. The World Wide Web has allowed us to access and share world-wide distributed information, and the Grid will allow us to do the same with computational power and storage capacity.

1 This is the case of the EU DataGrid project that was created for collecting and analysing data

5.1. Grids as a model for collaborative computations 59

At first sight, Grid computing may look like just another name for cluster computing or cycle-scavenging applications, but there are important differences to consider. Let us therefore refine the definition of a Grid by taking a closer look at its main characteristics.

When we say distributed collection of resources, we mean world-wide distributed: Grids today involve resources placed on different continents. This geographical distribution, together with the way the resources are controlled, is the main difference between Grid computing and cluster computing. A Grid spans different administrative domains, and there is no central point of management for the resources. A site that wants to join a Grid can share its resources while still enforcing its own access policies and retaining ownership of them. This decentralised control over the resources is one of the biggest challenges of Grid computing. The number of resources involved also poses new scalability challenges: thousands of users, and thousands of processors and disks connected through unreliable links, must be managed efficiently.
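
The following minimal Python sketch makes decentralised control concrete under simplifying assumptions. Each site advertises a number of free CPUs and applies its own, purely local admission policy based on the requesting user group. The `Site` class, the `accepts` and `candidate_sites` functions, and all site and group names are hypothetical. They do not correspond to any real Grid middleware API.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A resource provider that shares resources with the Grid but keeps ownership."""
    name: str
    free_cpus: int
    # Hypothetical local policy: the user groups this site chooses to serve.
    allowed_groups: set[str] = field(default_factory=set)

    def accepts(self, group: str, cpus: int) -> bool:
        # The decision is taken by the site itself; nothing central overrides it.
        return group in self.allowed_groups and cpus <= self.free_cpus


def candidate_sites(sites: list[Site], group: str, cpus: int) -> list[str]:
    """Collect the names of the sites whose own local policies admit the request."""
    return [s.name for s in sites if s.accepts(group, cpus)]


sites = [
    Site("site-a", free_cpus=128, allowed_groups={"physics", "biomed"}),
    Site("site-b", free_cpus=16, allowed_groups={"physics"}),
    Site("site-c", free_cpus=512, allowed_groups={"chemistry"}),
]
print(candidate_sites(sites, "physics", 64))  # ['site-a']
```

The sketch contains no central table of permissions. Whether a request can run somewhere depends only on what each site, as owner of its resources, decides to accept.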

We also mentioned that the resources are used by dynamic sets of organisations. In Grid computing we generally use the term Virtual Organisation (VO) to identify a group of people who share a common goal that can be achieved using the Grid. For example, in the LHC Computing Grid [41] each LHC experiment (see Section 4.1) has its own VO, with many institutions participating in it. Virtual Organisations are dynamic in that the people and resources forming a VO come and go. An institution or a single person can join a VO for a limited time, and resources can be made available for some time, for special purposes, and then withdrawn. This dynamic property is another complicating factor in the development of Grid software.
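
The dynamic nature of a VO can be sketched as a membership table in which each member, whether a person or a resource, is admitted only until an expiry time. This is a toy model under our own assumptions. The `VirtualOrganisation` class, its methods and the member names are hypothetical, and real Grid middleware handles VO membership with dedicated services rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class VirtualOrganisation:
    """Toy VO whose members (people or resources) join for a limited period."""
    name: str
    # Hypothetical representation: member identifier -> membership expiry time.
    members: dict[str, datetime] = field(default_factory=dict)

    def join(self, member: str, now: datetime, duration: timedelta) -> None:
        # Joining (or re-joining) sets a new expiry time for the member.
        self.members[member] = now + duration

    def leave(self, member: str) -> None:
        # Members may also leave before their membership expires.
        self.members.pop(member, None)

    def active_members(self, now: datetime) -> list[str]:
        # Only memberships that have not yet expired count at time `now`.
        return sorted(m for m, expiry in self.members.items() if now < expiry)


t0 = datetime(2007, 1, 1)
vo = VirtualOrganisation("example-vo")
vo.join("alice", t0, timedelta(days=30))
vo.join("bob", t0, timedelta(days=365))
vo.join("storage-element-1", t0, timedelta(days=90))
print(vo.active_members(t0 + timedelta(days=60)))  # ['bob', 'storage-element-1']
```

After 60 days, `alice` is no longer counted because her 30-day membership has expired. The storage resource, admitted for 90 days, is still part of the VO.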

Heterogeneity of the shared resources is another important factor. Jobs submitted to the Grid can be executed on different hardware and software components, and can read and store data using different storage technologies. Matchmaking services are generally used to find the most suitable resources to execute a job dynamically, since a job can have constraints such as a given hardware platform or the presence of certain software. The Grid user is generally unaware of where his job will run, or where it will read its data from. Much of the effort in developing Grid services goes into guaranteeing this access transparency.
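
Matchmaking can be illustrated with a minimal Python sketch. It assumes that each resource advertises a platform label, a set of installed software and a number of free CPUs, and that a job lists the platform and software it needs. The `Resource` and `Job` types, the `matchmake` function, the resource and package names, and the ranking rule (prefer resources with more free CPUs) are all assumptions made for illustration. Production matchmakers use richer requirement and ranking expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """What a resource advertises to the (hypothetical) matchmaker."""
    name: str
    platform: str               # hardware/OS label, e.g. "x86_64-linux"
    software: frozenset[str]    # software installed on the resource
    free_cpus: int


@dataclass(frozen=True)
class Job:
    """The constraints a user attaches to a job."""
    platform: str
    required_software: frozenset[str]


def matches(job: Job, res: Resource) -> bool:
    # A resource is eligible if the platform is right, every required
    # package is installed, and it currently has at least one free CPU.
    return (
        res.platform == job.platform
        and job.required_software <= res.software
        and res.free_cpus > 0
    )


def matchmake(job: Job, resources: list[Resource]) -> list[str]:
    """Return eligible resources, best first (assumed rank: most free CPUs)."""
    eligible = [r for r in resources if matches(job, r)]
    eligible.sort(key=lambda r: r.free_cpus, reverse=True)
    return [r.name for r in eligible]


resources = [
    Resource("ce-1", "x86_64-linux", frozenset({"analysis-lib", "sim-lib"}), 40),
    Resource("ce-2", "x86_64-linux", frozenset({"analysis-lib"}), 200),
    Resource("ce-3", "x86_64-linux", frozenset({"analysis-lib", "sim-lib"}), 120),
    Resource("ce-4", "ppc-linux", frozenset({"analysis-lib", "sim-lib"}), 500),
]
job = Job("x86_64-linux", frozenset({"analysis-lib", "sim-lib"}))
print(matchmake(job, resources))  # ['ce-3', 'ce-1']
```

For this job, `ce-2` is discarded because it lacks `sim-lib`, and `ce-4` because of its platform. The remaining resources are ranked by free CPUs, so the user never has to choose a site explicitly.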

Another important property of Grid computing is that it is based on open standards and uses well-known, reliable protocols as much as possible. Many services, such as file transfer and security services, rely on protocols that are not new in computer science. Grid tools are expected to use such protocols, enhancing them with new capabilities or adapting them to the new constraints of Grid environments.

5.1.1 The EGEE Project

As an example of an international Grid project, we now introduce the EGEE project, Enabling Grids for E-sciencE [7]. Funded by the European Commission, the project started its first phase in 2004. Its goal was to provide scientists with a production-quality Grid infrastructure supporting applications from various scientific domains, such as Earth Sciences, High Energy Physics, Biomedicine and Astrophysics.

The first phase of EGEE ended in 2006, when the second phase of the project, EGEE-II, started. The EGEE-II project extends and consolidates the EGEE Grid infrastructure, which is shared with the Worldwide LHC Computing Grid Project (WLCG) [41]. The capacity provided by Grids like the EGEE Grid is much larger than what is typically available in local clusters at individual centres. With EGEE, multiple institutions can effectively collaborate towards a unique and sustainable tool for computationally intensive science (e-Science).

Today, the project brings together scientists and engineers from more than 240 sites in 48 countries world-wide. More than 5000 people are registered to use the infrastructure, and the number of people benefiting from it is more than twice as large. The resources shared in the EGEE Grid infrastructure add up to roughly 30,000 CPUs and 20 PB of data storage, and these numbers are expected to increase in the next few years.

Research institutions are not the only ones to benefit from the EGEE project, which is also strongly committed to building relations with industry. The project regularly organises events to promote the adoption of Grid technologies in industry. These events analyse business needs, highlight technical and non-technical barriers, and suggest ways to overcome them.

EGEE is not only hardware infrastructure and middleware: these are just two of the 11 activities within the project. The other activities cover areas such as training, support, work on standards, international collaboration and relations with business communities.

5.1.1.1 Applications

High Energy Physics (HEP) and Biomedicine were the first two scientific communities to join the EGEE project. More and more applications are starting to use the EGEE Grid infrastructure, and today there are applications in Earth Sciences, Bioinformatics, Astrophysics, Multimedia, Finance, Archaeology, Computational Chemistry and Geology. Researchers in these fields form Virtual Organisations that collaborate, share resources and access common datasets to solve computationally and data-intensive tasks. The original community that promoted the development of the EGEE Grid infrastructure was the one working at CERN with the four LHC experiments (see Chapter 4). Other international HEP experiments also make use of the EGEE infrastructure, including BaBar (B and B-bar experiment) and CDF (Collider Detector at Fermilab).

In Computational Chemistry, the initial and primary user is the GEMS a-priori molecular simulator [58]. Several applications have already been ported to the Grid to calculate chemical reactions, simulate the molecular dynamics of complex systems, and calculate the electronic structure of molecules, molecular aggregates, liquids and solids.

In Astrophysics, several communities share computational problems involving large-scale data acquisition, simulation, data storage and data retrieval. In 2008 the European Space Agency (ESA) is expected to launch the Planck satellite. Its goal is to map the microwave sky with an unprecedented combination of sky and frequency coverage, accuracy, stability and sensitivity. PlanckEGEE [36] is a project whose main goal is to verify the feasibility of using Grid technology to process Planck satellite data. Another example is the MAGIC [24] application, which runs the simulations needed to analyse the data of the MAGIC telescope (located in the Canary Islands) to study the origin and properties of high-energy gamma rays. In Earth Sciences, the EGEE project is contributing to efforts in a wide range of topics related to the Earth’s atmosphere, oceans, crust and core, including earthquake analysis.

The Grid benefits the biomedical community by enabling remote collaboration on shared datasets and high-throughput calculations. The applications involve medical imaging, bioinformatics and drug discovery, with many individual applications deployed or being ported to the EGEE infrastructure. Among the most interesting projects is WISDOM (Wide In Silico Docking On Malaria), which uses the Grid to develop new drugs for neglected and emerging diseases, with a particular focus on malaria.

The Multimedia and Finance domains have only just started to evaluate and use the Grid through EGEE.

To summarise, in the research environment more and more application domains have started to regard Grid technologies as a good opportunity to improve their results. More computing power means being able to use increasingly complex data analysis algorithms. Access to world-wide distributed storage facilities means broadening the scope of one's analysis. Last but not least, collaboration is the most original contribution of Grid technologies. It offers small and medium-sized research institutes the possibility to join complex international projects that were previously beyond their reach.

In the commercial world, Grid technologies are mainly used in the finance sector to perform Monte Carlo simulations and complex statistical analyses. In this sector, however, one of the key ideas behind Grid computing is missing: collaboration, that is, information and resource sharing. The collaboration of dynamic groups based on open standards conflicts with some requirements of the finance sector, such as privacy and competition with other companies. This is why, in the finance sector, the distinction between Grid computing, High Performance Computing (HPC) and Service Oriented Architecture (SOA) is often unclear.
