To define goals and objectives, gather information on all known problems, then extend it and give it structure by defining Key Performance Indicators (KPIs). You must also determine the level of reporting detail, the target audience, and whether alerts are necessary.

TIP

Do not attempt to measure all metrics for all of the traffic all the time. In most cases, this approach will not support your monitoring objectives and will negatively affect system performance, while at the same time obscuring the relevant data. For a discussion on these topics, refer to the Data Center Real User Monitoring Capacity Planning and Performance Assessment User Guide.

The list of known problems and issues is specific to each enterprise and should be the starting point for defining the goals of the solution. Convert these problem areas into measurable objectives expressed as metrics. There may be specific patterns of network degradation to quantify.

In general there are three KPI areas to consider:

1. Application performance and availability

2. Network transport

3. Volume - Business Impact Analysis (BIA)

These KPIs can be measured at the enterprise level or at lower levels (such as regions, sites, or even individual users), depending on the required granularity of the solution. Note, however, that fine granularity generates large amounts of monitoring data, which may affect monitoring capacity.

Application Performance and Availability KPIs

Application performance and availability are at the highest level of product functionality.

Application Availability

Application availability is the percentage of successfully established TCP sessions. It reflects the health of the network and server infrastructure. Availability is expected to stay at 100% all the time unless problems occur.
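The definition above can be sketched as a simple ratio. This is only an illustration, not the product's implementation: the function name and the two session counters are hypothetical, and the sketch assumes you already have counts of attempted and successfully established TCP sessions for the monitored interval.

```python
from typing import Optional


def application_availability(attempted_sessions: int,
                             established_sessions: int) -> Optional[float]:
    """Return availability as a percentage of successfully established TCP sessions.

    Both arguments are hypothetical counters for one monitoring interval.
    Returns None when no session attempts were observed, because the
    percentage is undefined in that case.
    """
    if attempted_sessions < 0 or established_sessions < 0:
        raise ValueError("session counts must be non-negative")
    if established_sessions > attempted_sessions:
        raise ValueError("established sessions cannot exceed attempted sessions")
    if attempted_sessions == 0:
        return None
    return 100.0 * established_sessions / attempted_sessions
```

For example, 199 established sessions out of 200 attempts gives 99.5%, which would indicate that a problem occurred during the interval.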

Network Transport KPIs

The most important metrics for network transport are:

Round-Trip Time (RTT)

Round-trip time between remote sites and the data center hosting business-critical applications contributes to the end-user experience. Know the acceptable RTT values. For example, RTT between Europe and the USA is expected to be between 100 and 150 ms.

Two-way loss rate

The percentage of total packets (client and server) that were lost (due to network congestion, low router queue capacity, or other reasons) and had to be retransmitted.
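As a rough sketch of this metric, the loss rate can be computed from per-direction packet counters. The function and parameter names are assumptions made for illustration, and the sketch treats every retransmitted packet as one lost packet, which may differ from how the product counts losses.

```python
from typing import Optional


def two_way_loss_rate(client_packets: int, server_packets: int,
                      client_retransmitted: int,
                      server_retransmitted: int) -> Optional[float]:
    """Return the percentage of all packets (both directions) that were retransmitted.

    All four counters are hypothetical inputs for one monitoring interval.
    Returns None when no packets were observed.
    """
    total = client_packets + server_packets
    if total == 0:
        return None
    lost = client_retransmitted + server_retransmitted
    return 100.0 * lost / total
```

With 600 client packets, 400 server packets, and 10 retransmissions in total, the two-way loss rate is 1%.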

TCP errors

The total number of TCP errors.

These errors may indicate server or application problems, so measuring them is critical to understanding the issues that may affect end-user experience. AMDs measure and report on the following types of TCP errors:

• Connection refused errors - The client attempts to open a TCP session with a server, which rejects the request. The SYN packet from the client is followed by a RESET packet from the server, with matching TCP sequence numbers. This error is typically caused by resource exhaustion on the server, which is unable to accept more concurrent TCP sessions. The cause may be either a configuration issue (too few resources allocated in the kernel) or a lack of memory. SYN flood attacks typically result in servers being unable to accept new connections.

• Server session termination errors - The server unexpectedly terminates a connection that was successfully opened, by sending a RESET packet to the client. Such an error originates in the application that uses the monitored TCP session. It does not necessarily mean application failure; usually it means that the application encountered a condition in which it decided to terminate the session with the client immediately, for example, because of an application security policy violation by the client.

• Session abort errors - The client unexpectedly terminates a connection that was successfully opened, by sending a RESET packet to the server. These errors are inspected in the context of the client application and may or may not be reported. For example, a browser running HTTP may terminate the load of a GIF file if it is older than the one that it had previously cached; this is normal behavior. However, if all connections to the server are terminated because the user hits the STOP button, this is abnormal session termination and is reported as "Aborted operation" or "Stopped Page".

• Client not responding errors (server timeout errors) - The server networking stack assumes that the network connection to the client exists, but the client remains idle and does not respond. In such a case, the server closes the TCP session with a RESET packet. This condition may occur when the client has been silently disconnected from the network, for example, due to a link failure, or when the client has crashed. Note that this error does not occur if the client has ended the session gracefully, for example, by closing the client application.

• Server not responding errors (client timeout errors) - The client networking stack assumes that the network connection to the server exists, but the server remains idle and does not respond. In such a case, the client closes the TCP session with a RESET packet. This may occur either during the session setup phase (no response to the SYN packet) or during normal data exchange. Such a situation may result from intermittent network problems between the client and the server. If the traffic is routed through asymmetric paths across the Internet, which is often the case, the path from the server to the client may be broken.
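The five error types above differ mainly in which side sends the RESET packet, whether the session was already established, and whether the other side had stopped responding. The following sketch shows that decision logic on a simplified session summary. It is not how AMDs detect these errors: the SessionSummary fields, the TcpError names, and the classify function are all hypothetical, and real detection works on packet-level data and application context.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TcpError(Enum):
    CONNECTION_REFUSED = "connection refused"
    SERVER_SESSION_TERMINATION = "server session termination"
    SESSION_ABORT = "session abort"
    CLIENT_NOT_RESPONDING = "client not responding"
    SERVER_NOT_RESPONDING = "server not responding"


@dataclass
class SessionSummary:
    """Hypothetical, simplified view of one TCP session."""
    established: bool         # True if the TCP handshake completed
    reset_by: Optional[str]   # "client", "server", or None if no RESET was seen
    peer_idle: bool           # True if the other side was silent before the RESET


def classify(session: SessionSummary) -> Optional[TcpError]:
    """Map a session summary to one of the TCP error types, or None."""
    if session.reset_by == "server":
        if not session.established:
            # SYN answered with RESET: the server rejected the request.
            return TcpError.CONNECTION_REFUSED
        if session.peer_idle:
            # The client went silent, so the server timed out.
            return TcpError.CLIENT_NOT_RESPONDING
        return TcpError.SERVER_SESSION_TERMINATION
    if session.reset_by == "client":
        if session.peer_idle:
            # No response from the server, during setup or data exchange.
            return TcpError.SERVER_NOT_RESPONDING
        if session.established:
            return TcpError.SESSION_ABORT
    return None
```

A session that ends without a RESET packet yields None, which matches the note that a gracefully closed session is not reported as an error.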

Volume and Business Impact Analysis KPIs

The most important metrics for volume analysis are:

Number of Operations or Transactions

The total number of transactions is a measure of business impact at the volume level.

Total bytes

The number of all transmitted bytes (client + server).

Unique users

The number of unique users detected in the monitored traffic.
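The three volume metrics can be derived from the same set of operation records. The sketch below assumes a hypothetical Operation record with a user identifier and per-direction byte counts; these names are illustrative and do not correspond to product data structures.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Operation:
    """Hypothetical record of one monitored operation or transaction."""
    user: str
    client_bytes: int
    server_bytes: int


def volume_kpis(operations: Iterable[Operation]) -> Dict[str, int]:
    """Return operation count, total bytes (client + server), and unique users."""
    ops = list(operations)
    return {
        "operations": len(ops),
        "total_bytes": sum(op.client_bytes + op.server_bytes for op in ops),
        "unique_users": len({op.user for op in ops}),
    }
```

Counting unique users requires keeping every user identifier seen in the interval, which is one reason user-level detail is more expensive than aggregate counts.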

Granularity of Monitoring

The required granularity of monitoring directly impacts capacity.

User

Determine whether you have to provide granularity at the individual user level. You may decide to aggregate users into sites/Autonomous Systems instead of breaking them down by individual user. For large enterprises, visibility at the level of individual users may lead to capacity problems. Determine your regions/areas/sites structure and IP address ranges for users and servers.
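As an illustration of aggregating users into sites by IP address range, the following sketch maps client addresses to sites and counts distinct clients per site. The site names and address ranges are invented for the example, and the helper functions are hypothetical; in practice the mapping comes from your own regions/areas/sites structure.

```python
import ipaddress
from collections import Counter
from typing import Iterable

# Example site definitions; the names and ranges are assumptions for illustration.
SITES = {
    "London": ipaddress.ip_network("10.1.0.0/16"),
    "Boston": ipaddress.ip_network("10.2.0.0/16"),
}


def site_for(ip: str) -> str:
    """Return the name of the site whose range contains ip, or 'Unassigned'."""
    addr = ipaddress.ip_address(ip)
    for name, network in SITES.items():
        if addr in network:
            return name
    return "Unassigned"


def users_per_site(client_ips: Iterable[str]) -> Counter:
    """Count distinct client IP addresses per site."""
    return Counter(site_for(ip) for ip in set(client_ips))
```

Reporting one row per site instead of one row per client keeps the volume of stored data proportional to the number of sites rather than the number of users.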

URL

Determine whether you need to monitor specific URLs. As with user granularity, there are potential scaling problems inherent in monitoring every URL.

Target Audience

To determine the required reports and analysis, consider the intended target audience. In general, the audience consists of networking, development and server teams.

Mode of Utilization

Consider whether you are going to use the solution straight out of the box or with custom reports. The out-of-the-box solution produces reports with many metrics and great detail. Many users decide to use a custom DMI report as an aggregation point. You may also use DC RUM as a simple data collection tool to feed data into other report engines.

Automatic Alerts

Use automated alerts when particular metrics reach specified values a specified number of times, or when certain patterns are observed in the monitored traffic. Enterprises and applications measure patterns and performance indicators for a period of a few days to a full year.
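The idea of raising an alert only after a metric crosses a value a given number of times can be sketched as follows. This is not the product's alert engine: the ThresholdAlert class, its parameters, and the sliding-window rule are assumptions chosen to illustrate the principle.

```python
from collections import deque


class ThresholdAlert:
    """Fire when a metric exceeds a threshold often enough in recent samples.

    Hypothetical sketch: the alert is active when at least required_hits of
    the last `window` samples were above `threshold`.
    """

    def __init__(self, threshold: float, required_hits: int, window: int) -> None:
        if not 1 <= required_hits <= window:
            raise ValueError("required_hits must be between 1 and window")
        self.threshold = threshold
        self.required_hits = required_hits
        # Keeps only the most recent `window` pass/fail results.
        self._recent = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one sample and return True if the alert condition holds."""
        self._recent.append(value > self.threshold)
        return sum(self._recent) >= self.required_hits
```

For example, ThresholdAlert(threshold=150.0, required_hits=3, window=5) fed with RTT samples in milliseconds stays silent on a single spike and fires only once three of the last five samples exceed 150 ms.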