Large amounts of data

Top PDF Large amounts of data:

Perspectives in processing large amounts of information using Cloud

In recent years, many algorithms have been implemented to process large amounts of information. Many of these computations are conceptually simple. However, the input data is generally large, and the computations have to be distributed across many machines in order to finish in a short period of time. Parallelizing the computation, distributing the data, and handling errors obscure the original simple algorithm with large amounts of complex code. To solve this issue, a new abstraction was designed that lets programmers express the simple computations they are trying to perform while hiding the complex details of parallelism, fault tolerance, data distribution, and load balancing in a library. MapReduce is such an abstraction.
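
The map and reduce steps described above can be illustrated with a word count, a common teaching example. This is a minimal single-process sketch, not the MapReduce library itself: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the parallelism, fault tolerance, and data distribution that a real framework provides are deliberately left out.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    # In a real deployment each map call could run on a different machine;
    # here the calls run sequentially to keep the sketch self-contained.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["large data", "large amounts of data"]))
```

The user only writes `map_phase` and `reduce_phase`; everything in `shuffle` and `word_count` stands in for what the library would hide.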

Semantic Sensor Data Search in a Large-Scale Federated Sensor Network

Abstract. Sensor network deployments are a primary source of massive amounts of data about the real world that surrounds us, measuring a wide range of physical properties in real time. However, in large-scale deployments it becomes hard to effectively exploit the data captured by the sensors, since there is no precise information about what devices are available and what properties they measure. Even when metadata is available, users need to know low-level details such as database schemas or names of properties that are specific to a device or platform. Therefore the task of coherently searching, correlating and combining sensor data becomes very challenging. We propose an ontology-based approach that consists of exposing sensor observations in terms of ontologies enriched with semantic metadata, providing information such as which sensor recorded what, where, when, and under which conditions. For this, we allow the definition of virtual semantic streams, whose ontological terms are related to the underlying sensor data schemas through declarative mappings and which can be queried in terms of a high-level sensor network ontology.

NOISE MAPPING AND NOISE ACTION PLANS IN LARGE URBAN AREAS

The basic requirements for noise mapping of large urban areas represent considerable technical challenges: vast amounts of data, enormous computing power and memory, reliable data reduction procedures, and criteria for deciding which noise sources to include. Problems may also arise when combining the results with other types of data, owing to large file sizes or the complexity of shape files. Data is not always available as needed. Digital 3-D land data is now available in almost all municipalities in Portugal. It is usually integrated in a GIS, allowing acoustical data to be crossed with other types of data, such as population.

The Concept of Data Privacy Law and Its Application to the Internet

Apart from statistical evidence, the erroneous arguments about the death, and indeed evils, of privacy such as those presented above may be refuted by the following. To say that we do not need a right of privacy because our modern information society does not cater for privacy is akin to saying that we do not need a right to water in a desert - the removal of a fundamental right is justified by reference to the environment being hostile to, or making difficult the exercise of, such a right. Such reasoning is clearly flawed and does not appear generally accepted in any context. Indeed, the opposite is true. For example, no reasonable person would suggest that we should not seek to protect animals facing extinction by reference to the fact that protecting such animals is made difficult by the circumstances under which those animals live.

Cluster Ensembles for Big Data Mining Problems

Mining big data involves several problems and new challenges [8], in addition to the huge volume of information. On the one hand, these data generally come from autonomous and decentralized sources, so their dimensionality is heterogeneous and diverse, and they generally involve privacy issues. On the other hand, algorithms for mining data, such as clustering methods, have particular characteristics that make them useful for different types of data mining problems. Due to the huge amount of information, the task of choosing a single clustering approach becomes even more difficult. For instance, k-means, a very popular algorithm, always assumes spherical clusters in data; hierarchical approaches can be used when there is interest in finding this type of structure; expectation-maximization iteratively adjusts the parameters of a statistical model to fit the observed data. Moreover, all these methods work properly only with relatively small data sets. Large-volume data often make their application unfeasible, especially when the data come from autonomous sources that are constantly growing and evolving.

Predicting response and survival in chemotherapy-treated triple-negative breast cancer.

Predicting pCR after chemotherapy within TNBC. To identify predictors of chemotherapy response, we first evaluated gene expression-based signatures in the diagnostic (that is, pre-treatment) samples of the GEICAM/2006-03 trial (Alba et al, 2012). Among all patients, none of the signatures or clinical–pathological variables evaluated was found significantly associated with pCR (Figure 1A). Conversely, among patients with BLBC, high expression of the proliferation score, low expression of the luminal A signature and high Ki-67 by IHC were found to be significantly associated with pCR (Figure 1B). Interaction tests between each of these variables and the BLBC (vs others) for pCR showed a trend towards statistical significance for the luminal A signature (inverse relationship, P = 0.066), the proliferation score (P = 0.062) and no evidence of interaction with Ki-67 (P = 0.372). To confirm the findings obtained from GEICAM/2006-03, we interrogated 188 TNBC patients from the combined MDACC data sets treated neoadjuvantly with anthracycline/taxane-based chemotherapy. Similar to the data obtained in GEICAM/2006-03, none of the signatures was significantly associated with pCR within

Assessing the value of cooperation in Wikipedia

To examine the correlation between edit volume and article quality, we compared the average number of edits and contributors on “featured” articles, selected by the Wikipedia community as “the best articles in Wikipedia,” to the corresponding averages for other articles. The results show a strong correlation between number of edits, number of distinct editors, and article quality. In making this comparison, it is crucially important to control for article visibility or relevance, since featured articles tend to deal with more popular subjects. Article age must also be taken into consideration, since on average older articles have more edits. Care was taken to control for these variables.

A heuristic approach to generate good-quality linked data about hydrography

Traditionally, the process that cartography producers go through to create cartography databases is as follows: first, they identify real world features and give them names; then, they categorize the features and create models, i.e., schemata; finally, they introduce these feature types and their related instances in a database using its underlying syntax [2]. Furthermore, as a consequence of the existence of multiple geospatial producers, it is quite common to find several databases describing, at least partly, the same geographical space. Usually data are collected for specific purposes, and are very different from one source to another [1].

Development of National Satellite Image Atlas: Their Importance in Corporate and National SDI Development The Peripheral Countries Case

Recently, new systematic and cartographic programs at 1:250,000 and 1:100,000 scale have begun. Although these cartographic projects demanded strong efforts from the institutions involved, frequent changes in the political and economic environment produced severe operational difficulties. During 1996-2000 the Army Geographic Institute (IGM) completed the topographic coverage of the country at 1:250,000 scale. This cartographic initiative was sponsored by the Argentina Mining Sector Promotion Project (PASMA, 1996-2000) with funds provided by the World Bank. The objective of the PASMA-IGM agreement was to create a topographic framework for the National Geological Map Program (national law 24.224). The survey was made by digitising available cartographic information and was controlled with georeferenced Landsat TM images. Since 1996, an analogue geological mapping program has been carried out by the Geological and Mining Survey of Argentina (SEGEMAR). About six years ago, another important cartographic survey initiative at 1:100,000 was begun by the IGM and the Cartographic Institute of Catalunya (ICC, Catalunya, Spain), but the country's economic crash in 2001 temporarily halted survey operations (IGM, GIS Day IGM Institutional Presentation 2004).

Enhancement of oxygen transfer in bioprocesses by the use of an organic phase: effect of silicone oil on volumetric mass transfer coefficient of oxygen (kLa)

Cesário, M.T., Beeftink, H.H. and Tramper, J. (1992) Biological treatment of waste gases containing poorly-water-soluble pollutants. Biotechniques for Air Pollution Abatement and Odour Control Policies, A.J. Dragt and J. van Ham (editors) Elsevier Science Publishers B.V. Daugulis, A.J. (2001) Two-phase partitioning bioreactors: a new technology platform for

International Public Sector Accounting Standards Board

8. An approved budget as defined by this Standard reflects the anticipated revenues or receipts expected to arise in the annual or multi-year budget period based on current plans and the anticipated economic conditions during that budget period, and expenses or expenditures approved by a legislative body, being the legislature or other relevant authority. An approved budget is not a forward estimate or a projection based on assumptions about future events and possible management actions which are not necessarily expected to take place. Similarly, an approved budget differs from prospective financial information which may be in the form of a forecast, a projection or a combination of both ― for example, a one year forecast plus a five year projection.

Hipotermia terapéutica en el paro cardiorrespiratorio recuperado (Therapeutic hypothermia in recovered cardiorespiratory arrest)

Patients treated with therapeutic hypothermia presented better neurologic outcomes and lower mortality, despite mostly presenting a non-shockable initial rhythm and a longer arrest time. Therapeutic hypothermia has not yet been implemented as widely as recommended, despite the results of studies carried out in other parts of the world and despite being an important recommendation of the resuscitation guidelines for post-cardiac arrest care since 2010.

A Mobile Query Service for Integrated Access to Large Numbers of Online Semantic Web Data Sources

Our experimental evaluation is applied in a context-aware scenario, using the SCOUT context-aware application framework [11] as a client. As the user moves around, SCOUT continuously discovers new physical entities in the user’s vicinity (e.g., using a built-in mobile RFID reader), and extracts references to online semantic sources describing the particular entity (e.g., by reading URLs from RFID tags). To allow integrated querying over this gradually discovered semantic dataset, SCOUT dynamically passes detected source references to the query service. For the experiments, five context-aware application queries were selected that request context-relevant data, covering the different types of data in our experiment dataset (e.g., geographical entities, people). Two queries return geographical data, for instance allowing physical entities (e.g., shopping centers, airports) to be plotted on a map. The other three queries return “interesting” physical entities in the vicinity (e.g., products for sale in an affordable price range), together with details and an indication of relevance (e.g., manufacturer and user comments).

Preprocessing and analyzing genetic data with complex networks: An application to Obstructive Nephropathy

4. Feature selection methods. While it is possible to work directly with all the 834 features, i.e., microRNA expression levels, we are interested in the problem of feature selection, that is, the initial selection of a set of relevant features to be included in the analysis. Reducing the size of the initial dataset has three important advantages. Firstly, the computational cost, which approximately scales as the square of the number of features, is drastically reduced. Secondly, the elimination of features not relevant for the final result may improve the outcome of the algorithm, by reducing the quantity of noise it has to cope with. Finally, the reduction of the number of features implies that the number of dimensions of the space of the possible solutions is also reduced: this, in turn, improves the significance of results, thus leading to a more statistically relevant analysis.
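
The first advantage above can be made concrete with a short calculation. The sketch assumes the cost is exactly proportional to the square of the number of features, which the text states only as an approximation; `relative_cost` is an illustrative helper, and the subset size of 417 is a hypothetical choice, not a figure from the study.

```python
def relative_cost(n_selected, n_total=834):
    """Cost of analysing n_selected features relative to all n_total features,
    assuming cost grows with the square of the feature count."""
    return (n_selected / n_total) ** 2


if __name__ == "__main__":
    # Keeping half of the 834 features (417) leaves a quarter of the cost.
    print(relative_cost(417))
```

Under this assumption, halving the feature set cuts the computational cost by a factor of four, which is why even a moderate selection step pays off quickly.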

Equatorial ionospheric electric fields during the November 2004 magnetic storm

timescales from a few to several hours, are driven by enhanced energy deposition into the high latitude ionosphere [Blanc and Richmond, 1980]. Relatively fast (occurring about 2–3 h after major increases in convection) disturbance dynamo electric fields [e.g., Scherliess and Fejer, 1997] are most likely due to the dynamo action of fast traveling equatorward wind surges [e.g., Fuller-Rowell et al., 2002], while slower disturbances (occurring 3–12 h later) are believed to be driven mostly by the electrodynamic action of storm-enhanced high latitude equatorward winds [Blanc and Richmond, 1980]. Disturbance dynamo perturbation electric fields occurring about one day after major enhancement in geomagnetic activity [Scherliess and Fejer, 1997] are most likely due to the combined effects of storm-driven equatorward winds and conductivity variations, resulting from storm driven ionospheric composition changes. The low latitude ionospheric disturbance dynamo electric fields are westward during the day and eastward at night, with largest magnitudes in the late night sector [Scherliess and Fejer, 1997]. Although the climatological disturbance dynamo electric fields are in good agreement with the predictions from the Blanc-Richmond model, their large spatial and temporal variability [e.g., Fejer, 2002; Su et al., 2003] still remains to be understood.

New data structures and algorithms for the efficient management of large spatial datasets

also be seen as edges in a labeled directed graph. A set of RDF triples viewed as a graph is called an RDF graph in the original recommendation [MM04]. Figure 4.2 shows a small example of an RDF representation that models some statements about J.R.R. Tolkien. For example, the first triple states that Tolkien was born in South Africa; the second one shows that Tolkien wrote The Lord of the Rings; etc. The same information shown by the triples can be seen in the labeled graph in the figure. RDF datasets can be queried using a standard language called SPARQL [PS08]. This language is based on the concept of triple patterns: a triple pattern is a triple in which any component can be unknown, that is, a variable. In SPARQL this is indicated by prefixing the corresponding component with ?. Different triple patterns are created simply by changing which components of the triple are variable. The possible triple patterns are: (S, P, O), (S, P, ?O), (?S, P, O), (S, ?P, O), (?S, P, ?O), (S, ?P, ?O), (?S, ?P, O) and (?S, ?P, ?O). For instance, (S, P, ?O) is a triple pattern that matches all triples with subject S and predicate P, so it returns the values of property P for the resource S. (S, ?P, ?O) is a similar query but with an unbound predicate: its results include all the values of any property of subject S.
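
Triple-pattern matching as described above can be sketched in a few lines. This is an in-memory illustration, not a SPARQL engine: the predicate names `bornIn` and `wrote`, the plain-string identifiers, and the helpers `match` and `all_patterns` are assumptions made for the example, and the matcher does not handle a variable that is repeated within one pattern.

```python
from itertools import product

# Two statements taken from the Tolkien example; identifiers are illustrative.
TRIPLES = [
    ("Tolkien", "bornIn", "South Africa"),
    ("Tolkien", "wrote", "The Lord of the Rings"),
]


def is_variable(term):
    """A component is a variable when it starts with '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern, triples):
    """Return every triple whose bound components equal those of the pattern."""
    return [
        triple
        for triple in triples
        if all(is_variable(p) or p == value for p, value in zip(pattern, triple))
    ]


def all_patterns():
    """Enumerate the eight patterns: each component is either bound or variable."""
    return list(product(("S", "?S"), ("P", "?P"), ("O", "?O")))


if __name__ == "__main__":
    print(match(("Tolkien", "wrote", "?o"), TRIPLES))   # pattern (S, P, ?O)
    print(match(("Tolkien", "?p", "?o"), TRIPLES))      # pattern (S, ?P, ?O)
    print(len(all_patterns()))
```

Each of the three components is independently bound or variable, which is why exactly 2 × 2 × 2 = 8 patterns exist.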

Electrosíntesis de productos de interés industrial y medioambiental con electrodos de diamante

However, it is relatively expensive and difficult to store due to its high reactivity, and these drawbacks have limited its use. In recent years, scientific effort has focused on the in-situ production of PAA in order to limit the costs and hazards related to the transport and handling of concentrated PAA [12, 13]. Generally, the production of PAA is based on the chemical reaction between hydrogen peroxide and acetic acid in an aqueous reaction mixture in the presence of sulphonic acid as catalyst [14]. Based on this reaction, a few works have reported the indirect production of PAA by the electrochemical production of hydrogen peroxide in aqueous acetic acid solutions [15] or by sonochemical synthesis from mixtures of acetic acid and hydrogen peroxide in a microstructure reactor [16]. Other approaches have used the direct oxidation of acetic acid by hydroxyl radicals produced electrochemically in reactors equipped with Boron Doped Diamond (BDD) anodes [17], although in this case the efficiency of the process was limited by the mineralization of the raw material.

D2  1–Report on Dynamic Data Reconciliation of Large Scale Processes

When the measurement errors are normally distributed around their true values, the DDR approach provides the best set of estimates coherent with the model. However, for several reasons, such as serious defects in instruments or in the communication network, the solution provided by data reconciliation can be distorted. As a consequence, the error spreads to the rest of the variables, creating a smearing effect. These faults are called gross errors, and their detection and treatment are crucial for obtaining good estimates. The propagation of gross errors in measurements to the efficiency indicators must be avoided because, otherwise, the decision-support systems will recommend wrong actions. Hence, prior data treatment that includes gross-error detection, plus the use of robust estimators in the data reconciliation, is also mandatory. This step prevents corrupted data (outliers) from entering later decision-support phases and serves as a detector of systematic errors in sensors or in the process.
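
Gross-error screening of this kind can be illustrated with a simple robust statistic. The sketch below is not the report's DDR method: it substitutes a generic outlier test, the modified z-score built on the median and the median absolute deviation (MAD), applied to repeated readings of one variable. The function name, the sample readings, and the 3.5 threshold (a commonly used default for this test) are assumptions for illustration.

```python
from statistics import median


def flag_gross_errors(measurements, threshold=3.5):
    """Flag readings whose modified z-score exceeds the threshold.

    The median and MAD are used instead of the mean and standard deviation
    because a single gross error barely shifts them.
    """
    center = median(measurements)
    mad = median([abs(x - center) for x in measurements])
    if mad == 0:
        # All readings (or most of them) are identical: nothing to scale by.
        return [False] * len(measurements)
    return [abs(0.6745 * (x - center) / mad) > threshold for x in measurements]


if __name__ == "__main__":
    readings = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0]  # hypothetical sensor values
    flags = flag_gross_errors(readings)
    clean = [x for x, bad in zip(readings, flags) if not bad]
    print(flags)
    print(median(clean))  # robust estimate from the remaining readings
```

Screening before reconciliation keeps one faulty sensor from smearing its error across the other estimated variables.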

A low energy lake destratifier

Scope and Method of Study: The objectives were: (1) to design, construct and test a pump that would pump large amounts of flow with low input of energy; (2) to determine the relationships of [...]

Caracterización de la función macro- y microvascular en un modelo animal de hipertensión (rata SHR) y de síndrome metabólico (rata SHROB). Efectos de las glitazonas y papel del tejido adiposo perivascular

Therefore, taking into account the close relationship between insulin resistance, hypertension and obesity, and also knowing that adipose tissue releases large amounts of factors, many o[r]
