1.3. Fear
1.3.2 Classification of Fear
1.3.2.4. Direct or Indirect
Software security and application security have become big business.
Advocates of different security-enhanced software processes, software security best practices, and a variety of supporting techniques and tools all suggest that those who adopt them will reap great benefits in terms of more secure (or at least less vulnerable) software. However, there are few concrete metrics by which to precisely and objectively measure the effectiveness (innate and comparative) of all these different processes, practices, techniques, and tools. Moreover, the metrics and measurement community, which is attempting to define such metrics, is actively debating exactly what can and should be measured as a meaningful indicator that software is actually secure (or not vulnerable).
In August 2006, the first-ever conference devoted to security metrics, Metricon 1.0, [119] was held in Vancouver, British Columbia, Canada. Steve Bellovin, one of the “greybeards” of the information and network security community, summed up the problem nicely in his keynote address to Metricon.
He argued that for software, meaningful security metrics are not yet possible:
Safes are rated for how long they’ll resist attack under given circumstances. Can we do the same for software?…It’s well known that any piece of software can be buggy, including security software…This means that whatever the defense, a single well-placed blow can shatter it. We can layer defenses, but once a layer is broken the next layer is exposed; it, of course, has the same problem…The strength of each layer approximates zero; adding these together doesn’t help. We need layers of assured strength; we don’t have them. I thus very reluctantly conclude that security metrics are chimeras for the foreseeable future. We can develop probabilities of vulnerability, based on things like Microsoft’s Relative Attack Surface Quotient, the effort expended in code audits, and the like, but we cannot measure strength until we overcome brittleness.
Software Security Assurance State-of-the-Art Report (SOAR)
108
Section 5 SDLC Processes and Methods and the Security of Software
By Bellovin’s criteria, metrics for software security are impossible because 100 percent security of software is not possible, i.e., one cannot measure what cannot possibly exist. During the software security metrics track that followed Bellovin’s keynote (and the very fact that there was such a track implicitly refuted Bellovin’s argument), Jeremy Epstein of webMethods agreed with Bellovin that absolute security of software is probably not possible, [120] but disagreed that it was impossible to collect some combination of statistics about software (measurements that are already being taken) and then to determine which of these metrics, alone or in combination with others, actually says something meaningful about the security of the software. In short, given a statistic such as the number of faults detected in source code, is it possible to extrapolate something about the influence of that statistic on the security of software compiled from that source code? That is, do fewer faults in source code mean software that is less vulnerable? (Incidentally, this is the premise upon which the whole source code analysis tools industry is based.)
Of course, one can reasonably argue, as Bellovin has, that even a single implementation fault, if exploited, can compromise the software. Even if there were no implementation faults, the software could exhibit overall weakness due to inadequacies in its design and architecture. This type of inadequacy is much harder to pinpoint, let alone measure.
Researchers involved in defining metrics for software security do not pretend they can define measurements of absolute security. The best they can hope for is to measure different characteristics and properties of software that can be interpreted in aggregate as indicating the relative security of that software, when compared either with itself operating under different conditions, or with other comparable software (operating under the same or different conditions).
Following Epstein’s suggestion (gather the statistics that one knows can be gathered, then consider what they indicate about software security), much of the software security metrics work to date has involved investigating already-defined information security, software quality, and software reliability metrics to determine whether any of them can be applied to the problem of measuring software security assurance (and, if so, which ones). This is the approach of the DHS Software Assurance Program’s Measurement WG, for example (see Section 6.1.9.1). The WG’s approach is typical in its attempt to “leverage” metrics from the software quality arena (e.g., CMMI) and the information security arena (e.g., CC, SSE-CMM, [121] NIST Special Publication (SP) 800-55, ISO/IEC 27004).
One software security metric that is already in use is Microsoft’s Relative Attack Surface Quotient (RASQ), to which Steve Bellovin referred in his Metricon address. Developed with the assistance of CMU to compensate for the lack of common standards for software security metrics, RASQ measures the “attackability” of a system, i.e., the likelihood that an attack on the system will occur and be successful. A RASQ score is calculated by finding the root attack vectors, which are features of the targeted system that positively or negatively affect its security. Each root attack vector has an associated attack bias value between 0 and 1, indicating the level of risk that a compromise will be achieved through that vector, and an effective attack surface, indicating the number of attack surfaces within that root attack vector. The final RASQ score for a system is the sum, over all root attack vectors, of each vector’s effective attack surface multiplied by that vector’s attack bias.
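The RASQ calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the score is read as a sum over root attack vectors of each vector’s effective attack surface weighted by its attack bias; the class name `RootAttackVector`, its fields, and every vector name and number in the example are hypothetical and are not taken from Microsoft’s published RASQ tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootAttackVector:
    """A feature of the system an attacker could target (hypothetical model)."""
    name: str
    attack_bias: float       # risk weight in [0, 1]
    effective_surfaces: int  # number of attack surfaces within this root vector


def rasq(vectors: list[RootAttackVector]) -> float:
    """Sum over root vectors of effective attack surfaces times attack bias."""
    total = 0.0
    for v in vectors:
        if not 0.0 <= v.attack_bias <= 1.0:
            raise ValueError(f"attack bias for {v.name!r} must be in [0, 1]")
        if v.effective_surfaces < 0:
            raise ValueError(f"surface count for {v.name!r} must be non-negative")
        total += v.effective_surfaces * v.attack_bias
    return total


if __name__ == "__main__":
    # Hypothetical default configuration; names and numbers are invented.
    system = [
        RootAttackVector("open sockets", 1.0, 3),
        RootAttackVector("services running as SYSTEM", 0.9, 5),
        RootAttackVector("enabled guest account", 0.9, 1),
        RootAttackVector("weak file ACLs", 0.7, 4),
    ]
    print(f"RASQ = {rasq(system):.2f}")
```

Under this reading, disabling a service or closing a socket lowers the corresponding surface count and therefore the score, which is the sense in which a system can be built and configured to lower its RASQ.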
According to a study by Ernst & Young, [122] the RASQ for an out-of-the-box Windows 2000 Server running Internet Information Server (IIS) was 341.20, a high attackability rating (based on the number of vulnerabilities found in Windows 2000 Server since its release). By contrast, the RASQ for Windows Server 2003 running IIS was significantly lower, at 156.60 (a reduction of roughly 54 percent), providing evidence that Microsoft has addressed many of the security shortfalls in the earlier Windows Server version.
The Ernst & Young study notes that RASQ’s usefulness is limited to comparing relative attack surface rates between Microsoft operating system versions, because RASQ relies heavily on parameters that are meaningful only within those operating systems. In addition, the study stressed that RASQ does not measure a system’s vulnerability to attack or its overall level of security risk. Nevertheless, building and configuring a system to lower its RASQ score will reduce the number of potentially vulnerable attack surfaces, thereby reducing its overall risk level.
In addition to RASQ, several other software security metrics have been proposed and are under development by researchers in industry, government, and academia. The following are some examples (this is by no means a comprehensive list):
• Relative Vulnerability Metric [123]: Developed by Crispin Cowan of Novell, Inc., this metric compares the ratio of exploitable vulnerabilities detected in a system’s software components when an intrusion prevention system (IPS) is present against the same ratio calculated when the IPS is not present.
• Static Analysis Tool Effectiveness Metric [124]: Devised by Katrina Tsipenyuk and Brian Chess of Fortify Software, this metric combines the actual number of flaws (true positive rate) with the tool’s false positive and false negative rates, and then weights the result according to the intended audience for the resulting measurements: tool vendors wishing to improve the accuracy of the tool, auditors attempting to avoid false negatives, or software developers trying to minimize false positives.
• Relative Attack Surface Metric [125]: Under development by Pratyusa K. Manadhata and Jeannette M. Wing of CMU, this metric extends CMU’s work on the Microsoft RASQ to define a metric that will indicate whether the size of a system’s attack surface is proportional to the size of the system overall, i.e., if system A is larger than system B, is the attack surface of A also larger than the attack surface of B? The metric will define a mathematical model for calculating the attack surface of a system based on an entry point and exit point framework for defining the individual entry and exit points of a system. These entry and exit points contribute to the attack surface according to their accessibility, attack weight, damage potential, effort, and attackability. The CMU metric is more generic than RASQ and thus applicable to a wider range of software types. In their paper, Manadhata and Wing calculate the attack surface for two versions of a hypothetical e-mail server. However, the CMU attack surface metric is also significantly more complex than RASQ and requires further development before it will be ready for practical use.
• Predictive Undiscovered Vulnerability Density Metric [126]: O.H. Alhazmi, Y.K. Malaiya, and I. Ray at Colorado State University are adapting quantitative reliability metrics to the problem of predicting the vulnerability density of future software releases. By analyzing data on vulnerabilities found in popular operating systems, the researchers have attempted to determine, first, whether “vulnerability density” is a useful metric at all and, second, whether it is possible to pinpoint the fraction of overall software defects with security implications (i.e., those that are vulnerabilities). From this analysis, they produced a “vulnerability discovery rate” metric. Based on this metric, i.e., the quantity of discovered vulnerabilities, they are now attempting to extrapolate a metric for estimating the number of undiscovered (i.e., hypothetical) vulnerabilities.
• Flaw Severity and Severity-to-Complexity Metric [127]: Pravir Chandra of Foundstone (formerly of Secure Software Inc.) is researching a set of metrics for (1) rating reported software flaws as critical, high, medium, or low severity; (2) determining whether flaw reports in general affect a product’s market share and, if so, whether reporting of low severity flaws reduces market share less than reporting of high severity flaws; and (3) determining whether it is possible to make a direct correlation between the number and severity of detected vulnerabilities and bugs and the complexity of the code that contains them.
• Security Scoring Vector (S-vector) for Web Applications [128]: This metric is under development by a team of researchers from Pennsylvania State University, Polytechnic University, and SAP as “a means to compare the security of different applications, and the basis for assessing if an application meets a set of prescribed security requirements.” The S-vector metric will be used to rate a web application’s implementation against its requirements for (1) technical capabilities (i.e., security functions), (2) structural protection (i.e., security properties), and (3) procedural methods (i.e., processes used in developing, validating, and deploying/configuring the application) in order to produce an overall security score (i.e., the S-vector) for the application.
• Practical Security Measurement (PSM) for Software and Systems [129]: In February 2004, the PSM Technical WG on Safety and Security began work to tailor the ISO/IEC 15939 PSM framework to accommodate the measurement of several aspects of software-intensive system security, including (1) compliance with policy, standards, best practices, etc.; (2) management considerations (resources/costs, schedule, project progress); (3) security engineering concerns (e.g., conformance with requirements, constraints, security properties, stability, architectural and design security, security of functionality and components, verification and test results); (4) outcome (in terms of security performance, risk reduction, customer satisfaction); (5) risk management considerations (e.g., threat modeling, attack surface, vulnerability assessment, countermeasure design/implementation, and trade-offs); and (6) assurance (in support of assurance cases, independent product evaluations, etc.).
• Measuring Framework for Software Security Properties [130]: This framework was proposed by the DistriNet research team at the Catholic University of Leuven (Belgium). The researchers started from two lists of security principles and practices: M. Graff and K. van Wyk’s Secure Coding: Principles and Practices (O’Reilly, 2003) and NIST SP 800-27, Engineering Principles for Information Technology Security. From these, they produced an initial short list of five properties that could realistically be measured: (1) smallness and simplicity; (2) separation of concerns; (3) defense in depth; (4) minimization of critical functions/components; and (5) accountability. In future research, the team plans to identify more measurable properties and to define meaningful metrics for measuring them, e.g., by applying the Goal Question Metric (GQM) approach. [131]
• Metrics Associated With Security Patterns [132]: Thomas Heyman and Christophe Huygens, also of the Catholic University of Leuven (Belgium), are investigating ways to associate security metrics directly with software security patterns in order to measure how effectively those patterns secure the software system. As discussed in Section 5.3.3, the security patterns they are considering describe software security functionality, so the metrics the team defines will likely measure the effectiveness of those security functions in terms of policy enforcement or intrusion/compromise prevention.
• Quantitative Attack-potential-based Survivability Modeling for High-consequence Systems: In 2005, John McDermott of the NRL CHACS published an extensive paper [133] on his team’s methodology for using Performance Evaluation Process Algebra (PEPA) to mathematically model and quantify the survivability of a software-based system against what he terms “human sponsored” (rather than stochastic) faults. Survivability is quantified using mean time to discovery of a vulnerability [134] as the metric: the longer a vulnerability goes undiscovered because the associated fault is not activated, the longer the software system is considered to have survived in the undetected presence of that fault.
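The Relative Vulnerability Metric in the list above compares two exploitable-vulnerability ratios. The following Python sketch assumes the comparison is expressed as a quotient (exploitable fraction with the IPS divided by the fraction without it) over the same set of known vulnerabilities; that form, the function names, and the example counts are hypothetical illustrations, not Cowan’s published formula.

```python
def exploitable_fraction(exploitable: int, total: int) -> float:
    """Fraction of a system's known vulnerabilities an attacker can still exploit."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= exploitable <= total:
        raise ValueError("exploitable must lie between 0 and total")
    return exploitable / total


def relative_vulnerability(exploitable_with_ips: int,
                           exploitable_without_ips: int,
                           total: int) -> float:
    """Exploitable fraction with the IPS divided by the fraction without it.

    1.0 means the IPS blocked nothing; values nearer 0 mean it blocked more.
    """
    baseline = exploitable_fraction(exploitable_without_ips, total)
    if baseline == 0:
        raise ValueError("nothing exploitable without the IPS; ratio undefined")
    return exploitable_fraction(exploitable_with_ips, total) / baseline


if __name__ == "__main__":
    # Hypothetical campaign: 40 known vulnerabilities, 30 exploitable with no IPS,
    # 12 still exploitable once the IPS is enabled.
    print(f"relative vulnerability = {relative_vulnerability(12, 30, 40):.2f}")
```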
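The Static Analysis Tool Effectiveness Metric in the list above weights true positives, false positives, and false negatives differently for different audiences. The sketch below does not reproduce the Tsipenyuk and Chess formula; it swaps in the standard F-beta score, where a larger beta favors recall (fewer false negatives, as an auditor would want) and a smaller beta favors precision (fewer false positives, as a developer would want). The audience-to-beta mapping and the benchmark counts are hypothetical.

```python
# Hypothetical audience weights: beta > 1 favors recall, beta < 1 favors precision.
AUDIENCE_BETA = {"vendor": 1.0, "auditor": 2.0, "developer": 0.5}


def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """Standard F-beta score from true/false positive and false negative counts."""
    if min(tp, fp, fn) < 0 or beta <= 0:
        raise ValueError("counts must be non-negative and beta positive")
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def audience_score(tp: int, fp: int, fn: int, audience: str) -> float:
    """Score a tool run using the beta assigned to the given audience."""
    return f_beta(tp, fp, fn, AUDIENCE_BETA[audience])


if __name__ == "__main__":
    # Hypothetical benchmark: 40 real flaws reported, 20 false alarms, 10 flaws missed.
    for who in AUDIENCE_BETA:
        print(f"{who:9s} score = {audience_score(40, 20, 10, who):.3f}")
```

Because this hypothetical tool misses relatively few flaws but raises many false alarms, it scores best for the auditor profile and worst for the developer profile.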
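The entry point and exit point framework behind the CMU Relative Attack Surface Metric can be illustrated with a toy model. This Python sketch assumes that each point contributes its damage potential divided by the effort an attacker needs to reach it, and that the attack surface is the sum of these contributions; the weighting, the 1-5 scales, the `AccessPoint` class, and the two e-mail server versions are hypothetical stand-ins, not the model from Manadhata and Wing’s paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPoint:
    """An entry or exit point of the system (hypothetical model)."""
    name: str
    kind: str              # "entry" or "exit"
    damage_potential: int  # hypothetical 1-5 scale: harm if the point is abused
    effort: int            # hypothetical 1-5 scale: access an attacker needs first

    def contribution(self) -> float:
        # Assumed weighting: more damage and less required effort -> larger share.
        if self.effort <= 0:
            raise ValueError(f"effort for {self.name!r} must be positive")
        return self.damage_potential / self.effort


def attack_surface(points: list[AccessPoint], kind: Optional[str] = None) -> float:
    """Sum of contributions, optionally restricted to entry or exit points."""
    return sum(p.contribution() for p in points if kind is None or p.kind == kind)


if __name__ == "__main__":
    # Two hypothetical versions of a toy e-mail server (all values invented).
    v1 = [
        AccessPoint("SMTP listener", "entry", 5, 1),
        AccessPoint("admin console", "entry", 4, 1),  # no authentication
        AccessPoint("delivery log", "exit", 2, 2),
    ]
    v2 = [
        AccessPoint("SMTP listener", "entry", 5, 1),
        AccessPoint("admin console", "entry", 4, 4),  # now requires an admin login
        AccessPoint("delivery log", "exit", 2, 2),
    ]
    print(f"v1 = {attack_surface(v1):.2f}, v2 = {attack_surface(v2):.2f}")
```

Comparing the two versions point by point shows which change (here, putting the admin console behind a login) shrinks the surface, which is the kind of relative comparison the metric is meant to support.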
In October 2006, the Second ACM Workshop on Quality of Protection was held in Alexandria, Virginia; it included a session devoted to software security metrics during which other research into software security metrics was presented. In June 2007, the third Workshop on Assurance Cases for Security will focus on security assurance metrics, including metrics for software security assurance. See Section 5.1.4.3 for more information on these workshops.
To date, the efforts of the NIST Software Assurance Metrics and Tools Evaluation (SAMATE) program (see Section 6.1.10) have focused almost exclusively on tools evaluation, although the program is expected to become more active in pursuing the metrics and measurement portion of its charter.
For Further Reading
“Measurement.”
Available from: https://buildsecurityin.us-cert.gov/daisy/bsi/articles/best-practices/measurement.html
“NIST Information Technology Laboratory Software Diagnostics and Conformance Testing: Metrics and Measures.”
Available from: http://samate.nist.gov/index.php/Metrics_and_Measures
Andy Ozment and Stuart E. Schechter (Massachusetts Institute of Technology), “Milk or Wine: Does Software Security Improve with Age?”, in Proceedings of the 15th USENIX Security Symposium, July 31–August 4, 2006.
Available from: http://www.cl.cam.ac.uk/~jo262/papers/Ozment_and_Schechter-Milk_Or_Wine-Usenix06.pdf
Andy Ozment (University of Cambridge [UK]), “Software Security Growth Modeling: Examining Vulnerabilities with Reliability Growth Models”, in Quality of Protection: Security Measurements and Metrics, Dieter Gollmann, Fabio Massacci, and Artsiom Yautsiukhin (eds.).
Available from: http://www.cl.cam.ac.uk/~jo262/papers/qop2005-ozment-security_growth_modeling.pdf
MITRE Corporation, “Making Security Measurable” [portal page]. This portal provides a “collection of information security community standardization activities and initiatives” and links to the various MITRE security initiatives that are guided by the informal mission statement on the portal page: “MITRE’s approach to improving the measurability of security is through enumerating baseline security data, providing standardized languages as means for accurately communicating the information, and encouraging the sharing of the information with users by developing repositories.”
Available from: http://makingsecuritymeasurable.mitre.org/ or http://measurablesecurity.mitre.org/.