3.8. Computing Architectures

Despite having origins that go right back to DARPANET and ARPANET³⁰ in 1969, the Internet of 2013 bears little resemblance to its predecessor. Visualisations of all kinds of data are now an everyday part of modern life, whether in the form of an App for a mobile phone or tablet computer, a web page on the Internet or a segment on the television news. Intrinsically linked to this data-rich society which we now live in is an underlying infrastructure to capture and deliver this information to us. The Internet is now mobile and able to both deliver data and be a source of information in itself. GPS in phones means that anything we do can be geolocated, opening the door to a greater quantity of machine generated data.

²⁸ Mapnik: http://mapnik.org/.

²⁹ MapTubeD is a dynamic tile rendering system introduced in section 4.1.

³⁰ See: http://www.computerhistory.org/internet_history.

Google’s search business, which is built around their search engine and indexes all the web pages on the Internet, relies on a parallel architecture using redundant computers. In the book, “The Datacenter as a Computer” [BH09], Barroso outlines the architecture developed by Google and termed ‘warehouse scale computing’. This is also covered in [HP11], which gives an insight into the economics of computing at this scale. While Google’s business was built around their own ‘BigTable’ and ‘Google File System’ (GFS), open source competitors developed an architecture along the same lines, called ‘Hadoop’. The key algorithm underpinning both systems, though, is a technique termed ‘Map-Reduce’. The ‘Map’ phase aims to spread tasks across computers in a cluster for processing, after which the ‘Reduce’ phase brings all the results back together. As an example, if the task is to count the number of words in a book, then the pages of the book are spread evenly across the processors so that each page can be counted in parallel (the map), after which the per-page counts are added together (the reduce).
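The word-counting example above can be sketched in a few lines of Python. This is a minimal illustration of the map and reduce phases only, not the implementation used by Google or Hadoop: the function names and the sample pages are hypothetical, and a local thread pool stands in for the computers of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def count_words_on_page(page: str) -> int:
    """Map step: count the words on a single page."""
    return len(page.split())


def count_words_in_book(pages: list[str], workers: int = 4) -> int:
    """Count the words in a book, one page per task.

    The thread pool is an assumption standing in for cluster nodes;
    in a real deployment each page would be sent to a separate machine.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_page_counts = list(pool.map(count_words_on_page, pages))
    # Reduce step: bring the partial results back together.
    return reduce(add, per_page_counts, 0)


# Hypothetical three-page "book".
pages = ["the cat sat", "the dog sat down", "a cat"]
total = count_words_in_book(pages)
```

Because each page is counted independently, the map step needs no communication between workers; only the final addition requires the results to be gathered in one place.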

Terms like “Big Data”, “Broad Data” and “Fast Data” [Slo12] highlight the different challenges that are now emerging from the data available to us. While “Broad Data” might not be intrinsically big, it is highly cross-referenced and linked, so its value is in its combination with similar datasets. “Fast Data” is not a new phenomenon, but describes the real-time storage and processing of data. Fast in this context could be defined with respect to the processing algorithms applied to the data. In other words, it is no use being able to forecast weather two hours ahead if it takes the algorithm two hours to run. It is these two concepts, “Broad Data” and “Fast Data”, which appear to be more interesting when combined with data stores and real-time APIs later in the section. The current trends in “Open Data” and “Open Government” mean that there is now a rich fabric of interconnected data in the public domain that has yet to be fully exploited. Public APIs, which give us access to previously unseen facets of city systems, are available due to the pervasive automation of the modern world. Information systems are ubiquitous and control a diverse range of smart city systems, from mass transit to finance, retail, hydrology and security [Keh+11]. The challenge here is to use the empirical data to discover how these systems interact, and the key to this is in building the next generation of tools which can handle the new data.
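The “Fast Data” constraint from the weather example can be expressed as a simple check: a forecast is only useful if the algorithm finishes before the forecast horizon has passed. The sketch below is illustrative only; the function names are hypothetical and the timings are assumed values rather than measurements.

```python
def usable_lead_time(horizon_s: float, runtime_s: float) -> float:
    """Seconds of warning left once the forecast has finished computing."""
    return max(0.0, horizon_s - runtime_s)


def is_fast_enough(horizon_s: float, runtime_s: float) -> bool:
    """A forecast is only useful if some lead time remains."""
    return usable_lead_time(horizon_s, runtime_s) > 0.0


TWO_HOURS = 2 * 60 * 60  # forecast horizon in seconds

# A two-hour forecast that takes two hours to compute leaves no lead time.
slow = is_fast_enough(TWO_HOURS, runtime_s=TWO_HOURS)
# The same forecast computed in ten minutes (assumed) is still useful.
fast = is_fast_enough(TWO_HOURS, runtime_s=10 * 60)
```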

The buzzword of 2012 was, arguably, “Big Data”, but this is a concept that is notoriously hard to pin down with a simple definition. One example is as follows:

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.”

(Edd Dumbill, O’Reilly Big Data Now [Slo12])

While this definition from the O’Reilly book “Big Data Now” distances Big Data from the fixed schema of the Relational Database Management System (RDBMS) first proposed by Codd [Cod72], the idea that database systems are ‘conventional’ or not scalable to big data sizes does not really fit. Oracle 10g, 11g and 12c databases are scalable across a computing grid and their Exadata hardware has been running large databases (40TB per cabinet) on bare metal for many years.

The quote is also circular, in that it defines Big Data in terms of the tools that are needed to process it and not in terms of the data itself. An alternative quote demonstrates the circularity problem:

“Big Data is when the data is too big to be manipulated on a single computer.”

(John Rauser³¹)

This definition opens the door to the sort of distributed computing found in MapReduce and Hadoop, but does at least give some notion of data that is bigger than a single desktop machine can handle.

Finally, this is the most succinct definition, even if it isn’t really a definition at all:

“Big Data is defined as when the size of the data becomes a part of the problem.”

(Mike Loukides [Lou10])

In “What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets” [KM16], Kitchin and McArdle begin with three key Big Data traits of “volume”, “velocity” and “variety” (“the three Vs”), before analysing seven further qualities of 26 datasets. They come to the conclusion that “there are multiple forms of Big Data” and that “the 3Vs meme is false and misleading”. Looking at the problem from the other end, in “BigTable: A Distributed Storage System for Structured Data” [Cha+06], the authors analyse the performance of the infrastructure behind Google’s “BigTable” and “GFS” technologies, showing “MapReduce” jobs running on hundreds of tablet servers. “MapReduce”, the algorithm used to split (map) large jobs across multiple warehouse servers and then reduce the results back down for analysis, is defined in “MapReduce: simplified data processing on large clusters” [DG08]. Once this idea was in the public domain, it was re-engineered and released as open source in the form of the Apache Hadoop project. This is the substrate that Big Data is stored on and where Big Data jobs are run. In “The Real-time city? Big Data and smart urbanism” [Kit14], Kitchin writes about the smart city and real-time city, commenting on the CASA “London City Dashboard”³². The real-time transport data on the London City Dashboard is directly dependent on the work in the real-time section of this thesis and the “Adaptive Networks for complex Transport Systems” project. The live system is running on the infrastructure described in section 4.5, “Real-time and Programmable Maps”, which is the basis for the further work in chapter 6. This work is published as a chapter in the “CyberGIS” book, “CyberGIS: Fostering a New Wave of Discovery and Innovation” [Che+14].

While these attempts at definitions all focus on data as information dumps, there is a case for defining Big Data in terms of the algorithms that are being run on it. If we consider the case, often cited in Geography, that all the cartographic data for the whole world constitutes big data, then geographers have been doing big data for a long time. The important point is that running algorithms against the whole dataset is extremely rare. While you can ask questions like, “where is the tallest building?”, the nature of space means that the data is partitioned into grid squares or cities. Even companies like Google and Microsoft, who have complex workflows that take Lidar data, textures and GPS data and automatically make 3D models, rarely work on chunks larger than discrete spatial blocks. This is more a question of hierarchy and efficient spatial indexing.
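The partitioning argument above can be illustrated with a simple grid index. This is a minimal sketch assuming planar coordinates and a fixed square cell size; the `Building` record, the function names and the sample data are all hypothetical, and production systems would normally use a hierarchical index such as a quadtree or R-tree instead.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Building(NamedTuple):
    name: str
    x: float          # easting in metres (assumed planar coordinates)
    y: float          # northing in metres
    height_m: float


Cell = tuple[int, int]


def grid_cell(x: float, y: float, cell_size: float) -> Cell:
    """Return the grid square containing the point (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def build_grid_index(buildings: list[Building], cell_size: float) -> dict[Cell, list[Building]]:
    """Partition the buildings into grid squares."""
    index: dict[Cell, list[Building]] = defaultdict(list)
    for b in buildings:
        index[grid_cell(b.x, b.y, cell_size)].append(b)
    return index


def tallest_in_cell(index: dict[Cell, list[Building]], cell: Cell) -> Optional[Building]:
    """Answer the query for one grid square without touching the others."""
    return max(index.get(cell, []), key=lambda b: b.height_m, default=None)


def tallest_overall(index: dict[Cell, list[Building]]) -> Optional[Building]:
    """Combine the per-cell answers into a global answer."""
    winners = [tallest_in_cell(index, cell) for cell in index]
    return max(winners, key=lambda b: b.height_m, default=None)


# Made-up sample data, for illustration only.
buildings = [
    Building("A", 10.0, 10.0, 50.0),
    Building("B", 15.0, 20.0, 120.0),
    Building("C", 130.0, 40.0, 300.0),
]
index = build_grid_index(buildings, cell_size=100.0)
```

A query restricted to one grid square only reads that square's list, and even the global query is assembled from independent per-cell results, which is why such workloads rarely need the whole dataset in one place.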

The fundamental question to be asked of “Big Data” today is whether it is an enabling technology that allows discoveries to be made that were not possible before, or whether it is just a scaling effect. In the O’Reilly Strata book, “Big Data Now” [Slo12], the existence of a Moore’s Law for data is suggested, based on the increase in density of disk storage over many decades. Using these long-term stores of data, it is possible to look for infrequent events or long-term events and hopefully discover models to predict them. Climate change is an obvious example of the use of very long-term data, but other applications like detecting road traffic congestion from real-time GPS of cars on the road or predicting crime hot-spots from archive data³³ are already in use today.