In this section, we organize the results of our mapping study of the surveyed articles and present a discussion on the common pitfalls when using SVDBs for SE tasks.
The common uses of SVDBs within this SE task are to investigate the impact of vulnerabilities on OSS libraries, to elicit security requirements through risk-based argumentation, and to prioritize vulnerabilities based on severity impact.
3.5.1 How the Surveyed Articles Used SVDBs for Different SE Tasks
For each task, we look at two dimensions: (1) the SVDB that is used, and (2) the repositories that are used along with the SVDB. In other words, given an SE task, we want to answer the questions of which SVDB is usually used and which SE repositories are often used along with SVDBs. The results may help new researchers (and practitioners) determine how best to apply SVDBs to a particular SE task.
Table 10 shows how the surveyed articles support each of the SE tasks. We focus on the six tasks that we previously identified (see Table 4), and we show the percentage of the surveyed articles that used each kind of SVDB and SE repository. We found that most articles reported on the use of common SVDBs rather than specialized SVDBs. There are several reasons for this: (1) Specialized SVDBs contain known security vulnerabilities affecting specific systems written in a specific programming language (e.g., PHP). Analysis results obtained from specialized SVDBs typically do not generalize to other systems (e.g., from PHP to Java), which limits the potential impact of the published work. (2) Common SVDBs contain more diverse known security vulnerabilities affecting different types of software systems and can therefore accommodate different research interests. (3) Among the common SVDBs, we found NVD to be the most popular in the SE community. Reasons for the popularity of NVD include ease of access (e.g., automatic data feeds), regular updates, size, and the quality of the dataset.
Table 10: Summary of how the surveyed articles used SVDBs for different SE tasks. The numbers are percentages for each category (i.e., SVDB types and SE repositories used).

                          Empirical   Modeling   Testing   Vulnerability   Risk       Other
                          Research                         Analysis        Analysis   SE Tasks
SVDB Types
  Common                  47          22         15        13              9          3
  Specialized             2           0          0         2               0          0
SE Repo. Used
  Source Code             15          10         8         13              3          1
  Bugs / Vuln. Reports    29          9          2         2               2          2
  Logs                    2           1          2         0               0          0
  Req./Design             1           2          0         0               3          0
Even with the popularity of common SVDBs, studies have shown that developers are often not aware of known security vulnerabilities affecting their systems [156], [164], [165]. As a result, known vulnerabilities are patched late, or never, after their disclosure. This suggests limited communication between the vendors in charge of patching the vulnerabilities and the providers of common SVDBs, since vendors are expected to release a new (patched) version of components with known vulnerabilities or at least to provide users with patch information on how to fix them.
A limitation of many common SVDBs is that they do not include the actual code causing the security vulnerability, which is in contrast to specialized SVDBs that often share the code of known security vulnerabilities. Having direct access to this vulnerable code fragment simplifies the work of SE researchers evaluating their security analysis approaches.
We find that most tasks only analyze source code and bug/vulnerability reports (i.e., issue tracker data) and rarely use other repositories. Because the open source community and its supporting ecosystem provide access to source code and bug repositories, the security research community takes advantage of these knowledge resources to analyze known vulnerability reports and to link vulnerabilities reported in SVDBs with the available open source issue trackers or version control systems.
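The linking step mentioned above can be illustrated with a minimal sketch. It assumes that commit messages mention CVE identifiers explicitly, which is only one possible linking strategy and not necessarily the one used by the surveyed articles. The function name `link_commits_to_cves`, the input shapes, and the sample data are hypothetical.

```python
import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def link_commits_to_cves(commits, known_cves):
    """Map each known CVE ID to the commits whose messages mention it.

    `commits` is an iterable of (commit_hash, message) pairs and
    `known_cves` is a set of CVE IDs taken from an SVDB. Both input
    shapes are assumptions made for this illustration.
    """
    links = {}
    for commit_hash, message in commits:
        # Normalize case and drop repeated mentions within one message.
        mentioned = {m.upper() for m in CVE_PATTERN.findall(message)}
        for cve_id in mentioned & known_cves:
            links.setdefault(cve_id, []).append(commit_hash)
    return links


# Made-up commits and identifiers, for illustration only.
commits = [
    ("a1b2c3", "Fix buffer overflow (CVE-2099-0001)"),
    ("d4e5f6", "Refactor logging"),
    ("0a9b8c", "Backport fix for cve-2099-0001 and CVE-2099-99999"),
]
known_cves = {"CVE-2099-0001"}
links = link_commits_to_cves(commits, known_cves)
```

Commits that fix a vulnerability without naming its identifier are missed by this heuristic, so manual validation of the resulting links remains necessary.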
We reviewed the SE tasks discussed in SE articles that used SVDBs in their research methodology to identify how these SVDBs are used. We classified the articles by the SE task they describe for a more fine-grained analysis. Our findings reveal that empirical research, such as security studies (e.g., case studies, comparison studies), is among the most common research activities covered by our reviewed articles. Although some research has shown that combining multiple SVDBs can improve vulnerability detection coverage and performance [91], [166], we found that most articles cited only a single SVDB. We also found that most SVDBs host vulnerabilities affecting commercial (or closed-source) applications, but most of the case studies were conducted on open source applications such as Apache project39, Chromium project40, and Mozilla open source project41. We believe the reason for this is that open source projects provide rich information resources (e.g., issue tracker data, source code and version history, email archives) that can be used along with SVDB information. This helps to study the complete development environment when analyzing security vulnerabilities (i.e., their evolution). Such information is not always available for commercial (or closed-source) applications.

39 https://projects.apache.org/
40 https://www.chromium.org/
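To make the coverage argument concrete, the following minimal sketch merges the vulnerability identifiers listed by several SVDBs and reports how much of the merged set each database covers on its own. It assumes that the databases share a common identifier scheme, which does not always hold in practice. The database names, the identifiers, and the function `merge_svdbs` are hypothetical.

```python
def merge_svdbs(sources):
    """Merge vulnerability IDs from several SVDBs.

    `sources` maps a (hypothetical) SVDB name to the set of vulnerability
    IDs it lists. Returns the union of all IDs and, per source, the
    fraction of that union the source covers on its own.
    """
    union = set().union(*sources.values())
    coverage = {
        name: (len(ids) / len(union) if union else 0.0)
        for name, ids in sources.items()
    }
    return union, coverage


# Made-up identifiers, for illustration only.
sources = {
    "svdb_a": {"V-1", "V-2", "V-3"},
    "svdb_b": {"V-3", "V-4"},
}
union, coverage = merge_svdbs(sources)
```

In this toy example neither database covers the merged set alone, which is the situation in which relying on a single SVDB would miss known vulnerabilities.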
The second most common SE task covered by our reviewed articles is modeling. The common use of SVDBs in this task (e.g., [129]) is to apply a vulnerability model to a number of well-known vulnerabilities. In general, this has resulted in a more comprehensive understanding of the vulnerabilities and of the measures required to prevent them.
Lastly, testing is also a common SE task covered by our surveyed articles. Most of the testing approaches were evaluated on web applications (e.g., [54], [76], [151], [152]) and, more specifically, were validated on injection attacks (e.g., [73], [76], [146]). One reason for this focus is that injection is ranked as the most critical web application security risk in the OWASP Top 10 42. In addition, SVDBs attract SE researchers in this domain (testing) because of the rich information they provide on this type of attack. For example, in 2018, injection attacks accounted for 6.09% of the vulnerabilities hosted by NVD, ranging from SQL injection at 2.56% (injecting SQL commands that can read or modify data in a database) to OS command injection at 0.66% (which can lead to full system compromise).
3.5.2 Common Pitfalls when Using SVDBs in Software Engineering Tasks
In Section 3.4, we discussed how SVDBs are commonly used in different SE tasks. In this subsection, we further discuss the common pitfalls when using SVDBs in SE tasks.
42 https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project
Summary: The common use of SVDBs in these SE tasks is extracting vulnerability examples for validating the assumptions proposed by authors and comprehending the security vulnerability affecting the software system. Also, studies on vulnerability repositories focus on harvesting statistical trends or creating vulnerability models and using them for prediction. Other studies focus on the vulnerability reporters who possess the most important information.
3.5.2.1 Vulnerability disclosure date
Although SE researchers have proposed several approaches, ranging from empirical studies of vulnerability evolution to vulnerability prediction models, we see very few studies that consider multiple vulnerability sources in order to avoid bias that threatens the validity of their approaches. Determining the public disclosure date of a vulnerability is vital to understanding the timeline of its life cycle. Previous studies (e.g., [102], [106], [167], [168], and [169]) relied on a single SVDB (e.g., NVD) as the source for their empirical studies on vulnerability life cycles and security patches. However, SVDB entries contain a CVE publication date that corresponds to when the vulnerability was published in the database, not necessarily to when it was actually publicly disclosed. For example, among our surveyed articles, Munaiah et al. [82] and Murtaza et al. [86] each relied on a single SVDB for their empirical vulnerability analyses. Munaiah et al. [82] used NVD and relied only on the vulnerability disclosure date it provides. In the same way, Murtaza et al. [86] used NVD vulnerability release dates without considering other SVDBs. This may affect the proposed approaches to vulnerability lifetime analysis, which would in turn affect the authors' conclusions.
Current studies usually use SVDBs as a black box and do not consider how information from different SVDBs affects the SE task. Future studies may therefore want to examine this effect. For example, there is no clear guideline on how the estimate of the vulnerability disclosure date affects the results of vulnerability prediction models. Providing such a guideline could help SE researchers choose better vulnerability disclosure dates.
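One way to reduce the dependence on a single publication date is to collect the dates reported by several sources and take the earliest one as the estimated disclosure date. The sketch below illustrates this heuristic. It is an assumption made for illustration, not a guideline established by the surveyed articles, and the source names, dates, and function `estimate_disclosure_date` are hypothetical.

```python
from datetime import date


def estimate_disclosure_date(dates_by_source):
    """Return (earliest_date, source) for one vulnerability.

    `dates_by_source` maps a (hypothetical) source name to the date that
    source reports, or None when the source has no entry. Returns
    (None, None) if no source reports a date.
    """
    known = {s: d for s, d in dates_by_source.items() if d is not None}
    if not known:
        return None, None
    source = min(known, key=known.get)
    return known[source], source


# Made-up dates for a single vulnerability, for illustration only.
reported = {
    "nvd_published": date(2018, 3, 20),
    "vendor_advisory": date(2018, 3, 12),
    "exploit_archive": None,
}
estimated, source = estimate_disclosure_date(reported)

# Days by which the database publication date lags the estimate.
lag_days = (reported["nvd_published"] - estimated).days
```

In this made-up example, a lifetime analysis based only on the database publication date would place the disclosure eight days too late.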
3.5.2.2 Vulnerability information noise
When using SVDBs for tasks such as tracing security vulnerabilities, unstructured vulnerability information in some SVDBs may affect the vulnerability extraction and linking process. For example, unstructured vulnerability information requires text mining techniques and human labor in order to detect patch information and locate the causes of a vulnerability in the source code. Since the structure of this information depends on the vulnerability auditors and reporters themselves, we do not currently see a way to enforce a structured format. Other vulnerability representation-related solutions include those that synthesize exploit code examples for specific vulnerabilities and those that try to present existing online vulnerability resources in ways more useful to developers (e.g., [165]).
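As a small illustration of the text mining involved, the sketch below pulls URLs out of a free-text vulnerability description and keeps those that look like patch references. The keyword list in `PATCH_HINTS`, the function `find_patch_links`, and the sample description are assumptions chosen for this example; a heuristic of this kind does not replace the manual validation discussed above.

```python
import re

# Match http(s) URLs up to whitespace or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")

# Hypothetical substrings suggesting that a URL points to a patch.
PATCH_HINTS = ("/commit/", "/pull/", ".patch", ".diff")


def find_patch_links(description):
    """Return URLs in `description` that look like patch references."""
    # Strip sentence punctuation that the pattern may have swallowed.
    urls = [u.rstrip(".,;") for u in URL_PATTERN.findall(description)]
    return [u for u in urls if any(h in u.lower() for h in PATCH_HINTS)]


# Made-up description, for illustration only.
text = (
    "Heap overflow in the parser. Fixed upstream, see "
    "https://example.org/repo/commit/abc123. More details at "
    "https://example.org/advisories/42 and https://example.org/fix.patch"
)
patch_links = find_patch_links(text)
```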
3.5.2.3 Lack of knowledge
Researchers are sometimes not aware of the different public SVDBs that exist and, as a result, might miss important vulnerability information that is already known. For example, Scandariato et al. [61] gave a detailed background on SVDBs (i.e., NVD); however, they used the Fortify SCA tool to identify vulnerabilities via static source code analysis rather than using the vulnerabilities reported in a database such as NVD. The authors justified this by claiming that “the choice was obligatory, as there are no public databases with sufficient numbers of vulnerabilities to analyze for Android application”. Although Android vulnerabilities were rarely published in NVD at the time, they were frequently published and discussed in other SVDBs; for example, Exploit-DB43 and vulnerability-lab44 host over 200 Android app vulnerabilities.
However, relying on vulnerability scanner tools to extract vulnerabilities without (or instead of) using known vulnerabilities published in different SVDBs may increase the threats to validity of a researcher’s approach and increase the effort in evaluation.