ELABORACIÓN DE INFORMES DE COBROS


Having information of adequate quality available at the right time and in the right place is vital for software systems to react to situations or support decisions. The applications in the area of IoT and EBS mentioned above are just a few examples of modern reactive systems in which heterogeneous information provided by various publishers has to be interpreted and in which false alarms, missed events, or otherwise inadequate Quality of Information (QoI) carry a cost [35, 50, 55, 125, 182, 250].

In general, information in an EBS can be considered to be of inadequate quality if it is not precise, accurate, or fresh enough; if notifications about events arrive out of order, without cause (false positives), or not at all (false negatives); or if data is not reliable enough because the publisher is not trustworthy or not confident enough. The degree to which any of these properties has to be met for QoI to be considered adequate, however, depends on the purpose for which each subscriber intends to use the information [35, 182]. This Value of Information (VoI) is application-specific and can change dynamically at runtime, as it depends on the individual utility function of each subscriber, which is itself subject to the subscriber’s context and state [56, 225, 421, 424].

Supporting requirements about QoI depends on the EBS satisfying individual requirements about quality-related properties at runtime, as most of these properties cannot be determined at design time [35, 42, 85, 213]. In that regard, the notion of QoI in EBS encompasses aspects of Quality of Service (QoS) and other concepts addressing runtime quality but is not limited to them. We use three motivational examples inspired by research cooperations: financial data vendors, energy-efficient data centers, and smart supply chains. The examples highlight different aspects of QoI and VoI in reactive software systems and the relevance of QoI.

Financial Data Vendors

Trading on today’s financial markets is based on software systems for analyzing, planning and executing transactions. Financial data vendors provide banks, traders, and end users with information to base investment decisions upon. The spectrum of the provided information ranges from raw data feeds about trades at stock exchanges delivered at high sampling rates and low latency to aggregated analytics that include background reports about markets or trends.

A financial data vendor is subscribed to fine-granular notifications about events at the stock market, published by the different stock exchanges. This data is directly forwarded to some consumers; for others, it is first aggregated and fused with historic data or general news, using approaches such as Complex Event Processing (CEP) or manual analysis. The sampling rate of incoming notifications from a single stock exchange can range from one notification per day to more than 100,000 notifications per second. All participants of such an EBS are usually located in data centers connected by high-speed networks.

Customers of a financial data vendor subscribe to a combination of product (e.g., tick, aggregate, report), content attributes (e.g., stock exchange, ticker symbol) and QoI properties (e.g., latency, sampling rate). At an extreme end of the subscriber spectrum, high-frequency or low-latency trading applications exploit the speed of algorithmic decision making in software systems and minute information asymmetries for arbitrage revenues. They have restrictive requirements about latency and sampling rate but are willing to pay a premium; some applications require a minimum sampling rate while others need to define a maximum sampling rate as they are unable to process information properly if the sampling rate is too high. End users managing their own portfolio manually, on the other hand, are usually more interested in aggregated updates on the development of a stock symbol or trend forecasts, rendering latency and sampling rate insignificant compared to the requirements of algorithmic traders [73].
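The structure of such a subscription can be sketched as a small data type. This is only an illustration under stated assumptions: the class name `Subscription`, its field names, the exchange and ticker placeholders, and all numeric bounds are hypothetical and are not taken from any vendor's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription: product, content attributes, QoI bounds."""
    product: str                           # e.g. "tick", "aggregate", "report"
    exchange: str                          # content attribute
    ticker: str                            # content attribute
    max_latency_ms: Optional[float] = None
    min_rate_hz: Optional[float] = None    # lower bound on sampling rate
    max_rate_hz: Optional[float] = None    # upper bound on sampling rate

    def accepts(self, latency_ms: float, rate_hz: float) -> bool:
        """Return True if an offered feed satisfies every stated QoI bound."""
        if self.max_latency_ms is not None and latency_ms > self.max_latency_ms:
            return False
        if self.min_rate_hz is not None and rate_hz < self.min_rate_hz:
            return False
        if self.max_rate_hz is not None and rate_hz > self.max_rate_hz:
            return False
        return True


# Assumed example values: a low-latency trader and a manual end user.
trader = Subscription("tick", "EXCHANGE_A", "ABC",
                      max_latency_ms=1.0, min_rate_hz=1000.0)
end_user = Subscription("aggregate", "EXCHANGE_A", "ABC", max_rate_hz=1.0)

print(trader.accepts(latency_ms=0.5, rate_hz=5000.0))     # fast, dense feed
print(end_user.accepts(latency_ms=200.0, rate_hz=10.0))   # too dense for end user
```

Leaving a bound unset expresses that the subscriber is indifferent to that property, which mirrors the end user for whom latency is insignificant.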

For financial data vendors and their subscribers, QoI directly translates into products, prices and penalties. The VoI of each subscriber determines the products it consumes and holds the financial data vendor responsible for. Providing data with insufficient QoI results in penalties and revenue loss for the financial data vendor. For consumers, receiving data with insufficient QoI results in suboptimal strategies to buy or sell – in the worst case even leading to stock market crashes such as the May 6, 2010 Flash Crash [255].

Energy-aware Reactive Data Center Management

Virtualized resources in data centers accessible via broadband networks and the Internet provide scalable infrastructures for applications, CEP engines and MOMs at different levels of abstraction. In this dissertation, they are subsumed under the term Cloud computing and allow applications to adjust resources automatically to meet fluctuations in demand. Resources are rented out to tenants by providers on a pay-per-use basis. The physical servers managed by the provider are not accessible to tenants but hosted Virtual Machines (VMs) and applications are. Network traffic in and out of the data centers is charged for by the provider.

From the perspective of a Cloud provider, this pay-per-use business model requires fine-granular monitoring of resources and applications for billing and availability management, as violations of Service Level Agreements (SLAs) result in penalties and reduce revenue. The energy consumption of servers and cooling facilities is the main cost driver in data center operations [44, 49, 257, 313]. Thus, providers try to optimize the utilization of their resources by balancing the level of utilization of each server with the energy it consumes, the heat that it produces, and the costs necessary for cooling. For this, sensory data about energy consumption and ambient temperature is incorporated into load-balancing algorithms together with metrics about applications, their SLAs, server utilization and network traffic. Resource management in this setting is done in a push-based fashion: certain events trigger a reassignment of resources for a given application, e.g., scale-in or scale-out, or runtime migration to other hosts. Triggering events can be changes to the workload or SLAs of the hosted application, critical resource utilization caused by other tenants hosted on the same resource, or outages [48, 81, 106, 128, 257, 280, 313].

At runtime, the whole technology stack has to be monitored: network traffic, racks, single servers, VMs hosted on each server, applications such as Apache Hadoop running on each VM, or single jobs executed by an application [250, 315]. Thus, runtime monitoring requires multiple publishers to provide runtime information about different entities. For example, monitoring systems like Ganglia or Nagios report on the state of a VM, application-specific agents like Hadoop TaskTrackers [16] or Borglets [414] monitor job execution, and components of the hosted application provide application-specific metrics. The data provided by these publishers is sometimes redundant in its content but differs in its QoI properties such as sampling rate, granularity, precision, or latency. The same data is consumed by many different subscribers such as applications for billing and metering, data warehouses, resource managers such as BorgManagers or Hadoop JobTrackers, dashboards, the applications themselves, load balancers, or cooling systems.

Requirements of subscribers regarding different quality-related properties of notifications are individual and can change dynamically over time. Two examples illustrate this. First, monitoring data about a VM delivered at a given sampling rate and confidence of detection might be sufficient for the purpose of one subscriber, while another subscriber might need the same type of data at a higher sampling rate but would tolerate less confidence of detection or precision; a third subscriber might not care about precision at all but require measurements about the same entity from three different publishers for cross-validation. Second, monitoring data in its current form might be sufficient for a subscriber as long as there is no indication of malfunction at the monitored entity; in case of anomalies, the same subscriber requires the same data at a high sampling rate for root cause analysis, while other subscribers still require a lower sampling rate as they are resource-restricted.
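A minimal sketch of how such individual and changing requirements could be checked against the available publishers follows. The names `Offer`, `Requirement`, `matching_publishers`, and `is_satisfied`, the publisher labels, and all numbers are assumptions chosen for illustration; the text does not prescribe a matching procedure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Offer:
    """What one publisher provides for a given entity (assumed model)."""
    publisher: str
    rate_hz: float
    confidence: float      # 0.0 .. 1.0


@dataclass(frozen=True)
class Requirement:
    """What one subscriber currently needs (assumed model)."""
    min_rate_hz: float = 0.0
    min_confidence: float = 0.0
    min_publishers: int = 1    # number of distinct publishers required


def matching_publishers(req: Requirement, offers: Iterable[Offer]) -> List[str]:
    """Distinct publishers whose offers meet the rate and confidence bounds."""
    return sorted({o.publisher for o in offers
                   if o.rate_hz >= req.min_rate_hz
                   and o.confidence >= req.min_confidence})


def is_satisfied(req: Requirement, offers: Iterable[Offer]) -> bool:
    """True if enough distinct publishers meet the requirement."""
    return len(matching_publishers(req, offers)) >= req.min_publishers


offers = [Offer("monitor-a", 0.1, 0.9),
          Offer("monitor-b", 1.0, 0.7),
          Offer("agent-c", 10.0, 0.6)]

normal = Requirement(min_rate_hz=0.1, min_confidence=0.9)
cross_check = Requirement(min_publishers=3)                 # cross-validation
anomaly = Requirement(min_rate_hz=5.0, min_confidence=0.9)  # root cause analysis

print(is_satisfied(normal, offers), is_satisfied(cross_check, offers),
      is_satisfied(anomaly, offers))
```

In this sketch the requirement that holds during normal operation is met, but the stricter requirement after an anomaly is not met by any current offer, which is the situation where a publisher would have to adapt at runtime.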

In the context of reactive data center management systems, providing, processing or consuming data with insufficient QoI has a severe impact. Resource managers and load-balancers in data centers rely on precise data about the ambient temperature and energy consumption of server racks. They are bound to misjudge the actual utilization and power consumption of resources if the data they receive is imprecise or precise data is drowned out by too much imprecise data. Consequently, resources might be overloaded and overheat, resulting in outages and violated SLAs for jobs running on these resources. Alternatively, underutilization of resources or overprovisioning of cooling facilities results in skyrocketing costs [81, 106, 313].

Smart Supply Chain Management and Industry 4.0

Advances along the whole technology stack have accelerated the emergence of the term Industry 4.0 [203, 282]. The term denotes the vision of tightly integrated production and delivery processes that rely on machine-to-machine communication and the IoT to monitor, execute and optimize the manufacturing value chain. On the device level, increasing miniaturization and decreasing production costs enable a myriad of sensors to be used in monitoring real-world conditions while actuators can manipulate objects and processes in the real world. Sensors and actuators are omnipresent in modern supply chains and production processes to feed EBSs and SOAs, for example, as part of Wireless Sensor Networks (WSNs), or as active and passive Radio-Frequency IDentification (RFID) tags in Cyber-physical Systems (CPSs) [76, 78, 176, 177, 208].

The resulting applications are distributed and form federated systems with a high degree of heterogeneity and dynamics; often they are a mix of energy- or otherwise resource-constrained participants and Cloud-based backend systems with no such constraints. As for smart buildings, multiple publishers for the same type of information are bound to become available over time as new devices are added that provide a bundle of capabilities previously provided by dedicated devices. Subscribers in those systems can range from Cloud-based Enterprise Resource Planning (ERP) applications and data warehouses to resource-constrained mobile devices such as smart glasses or handhelds. The data provided, processed and required is also very heterogeneous in terms of type, granularity and quality-related properties. For example, position information is provided or required about single items, or about the container or even the truck the items are contained in; some sensors provide precise information about the position or status of an entity while others have a certain drift; still others cannot provide data as frequently as required due to energy constraints.

In terms of QoI, applications in the domain of smart business processes in manufacturing and logistics usually do not require subsecond latencies. Rather, they require complete, precise and trustworthy information about the state of a process. Incomplete or imprecise data can lead to miscalculations of supplies, lot sizes or due dates. At the same time, energy-constrained devices have to avoid draining their batteries, which would render them useless. Thus, energy-constrained publishers have to be aware of interested subscribers and their required sampling rates [62, 99, 261, 325, 415, 418].
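One way to read the last point is that a publisher should sample no faster than its most demanding current subscriber requires, and not at all when nobody is interested. The following sketch assumes that the publisher somehow learns the required rates of its subscribers; the function name `effective_rate_hz` and the device cap are hypothetical.

```python
from typing import Sequence


def effective_rate_hz(required_rates_hz: Sequence[float],
                      device_max_rate_hz: float) -> float:
    """Pick a sampling rate for an energy-constrained publisher.

    Assumption: the highest required rate serves all subscribers, and the
    device cannot exceed device_max_rate_hz without draining its battery.
    """
    if not required_rates_hz:
        return 0.0    # no interested subscribers: stop sampling
    return min(max(required_rates_hz), device_max_rate_hz)


print(effective_rate_hz([], 2.0))           # nobody subscribed
print(effective_rate_hz([0.1, 0.5], 2.0))   # demand below the device cap
print(effective_rate_hz([0.1, 5.0], 2.0))   # demand capped by the device
```

When the cap applies, the most demanding subscriber is not fully served, so the shortfall would still have to be reported or compensated elsewhere.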

Limitations of a Typical EBS Regarding QoI

The three examples from different domains of modern reactive systems illustrate the relevance of QoI for applications in an EBS. In a typical EBS, however, runtime support for QoI is insufficient as requirements about QoI can only be supported implicitly by encoded types, or by additional metadata in notifications; some MOMs offer explicit support only for domain-specific and fixed sets of properties. Both approaches have limitations on the conceptual and technological level.

Implicit support for requirements about QoI-related properties can be provided by publishers in an EBS by advertising types that encode quality-related properties in their name (e.g., CpuUsage_rate50_confidence70), or by adding metadata to the content of each published notification (e.g., rate=50, confidence=70). Subscribers can express their requirements by subscribing to the encoded type that fits their requirements best, assuming that the semantics are known. This approach, however, is limited in terms of expressiveness and efficiency. First, using encoded types restricts the set of available properties to those determinable by publishers at design-time, excluding important runtime properties like latency and reliability that are determinable only by the MOM before dispatching notifications to subscribers. Crucially, publishers cannot coordinate their supply with the demand of subscribers, as there is no feedback from subscribers to publishers in a typical EBS. Above all, interdependent properties that require the participation of multiple publishers cannot be supported in a typical EBS using encoded types, as there is no coordination between publishers. For example, a typical EBS cannot support the requirement of a subscriber about a number of alternatives, i.e., the same type of notification has to be provided by a certain number of different publishers. Second, using encoded types for different combinations of quality-properties would result in an unmanageable growth of available types and traffic overhead as the same information has to be processed for multiple encoded types [23, 85, 171]. Such overhead, however, is not suitable for environments where processing data is expensive.
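The growth in the number of encoded types can be made concrete with a short enumeration. The sketch follows the naming scheme of the example type above; the helper `encoded_types` and the chosen property levels are assumptions for illustration only.

```python
from itertools import product
from typing import Dict, List, Sequence


def encoded_types(base: str, levels: Dict[str, Sequence[int]]) -> List[str]:
    """Enumerate one type name per combination of property levels."""
    keys = list(levels)
    names = []
    for combo in product(*(levels[k] for k in keys)):
        suffix = "_".join(f"{k}{v}" for k, v in zip(keys, combo))
        names.append(f"{base}_{suffix}")
    return names


# Two properties with three assumed levels each: 3 * 3 = 9 types.
two_props = encoded_types("CpuUsage",
                          {"rate": [1, 10, 50], "confidence": [50, 70, 90]})

# Adding a third property with four assumed levels: 3 * 3 * 4 = 36 types.
three_props = encoded_types("CpuUsage",
                            {"rate": [1, 10, 50],
                             "confidence": [50, 70, 90],
                             "precision": [1, 2, 3, 4]})

print(len(two_props), len(three_props))
print("CpuUsage_rate50_confidence70" in two_props)
```

The number of types is the product of the number of levels per property, so each additional property multiplies the types that publishers must advertise and that the same measurement must be published under.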

Explicit support for a few quality-related properties is provided by MOMs like IndiQoS [85], Adamant [213] with the underlying Data Distribution Service (DDS) [197, 269, 334, 345], or Harmony [253, 428]. These solutions are also limited on the conceptual as well as on the technological level. On the conceptual level, they focus on a fixed set of MOM-related QoS properties at a low level of abstraction, which they try to satisfy by adapting the MOM on the transport protocol level only. They do not consider requirements about QoI properties that would require publishers to adapt at runtime. On the technological level, they have specific requirements about the infrastructure and require tight vertical integration across the technology stack to switch between custom transport protocols. The applicability of these approaches in heterogeneous IoT deployments that involve Cloud-based services, however, is limited. For example, direct access to specific hardware features on the host machines for performance tuning is no longer provided and transport mechanisms like multicast are not available in Cloud environments [173].
