La participación con equidad - Abriendo espacios flexibles en la escuela

Research on source code reuse has become an important part of program comprehension and software maintenance research. Reusing source code from the crowd is similar in its objective to traditional source code reuse. Thus, in this section, we briefly review both traditional code reuse within software systems and code reuse from the Web. We first discuss empirical studies covering code search on the Web, then approaches that use the Web to recommend source code, and finally work related to copying and pasting source code.

2.3.1 Empirical Studies of Code Search on the Web

A number of studies focus on understanding the role of source code search in the software development process. An early study by Sim et al. [175] reports on a user survey conducted with programmers to classify the reasons why developers resort to online source code search during programming and maintenance tasks. Their study reports that the most common search queries were related to function definitions, different uses of a function, variable definitions, and different uses of a variable. Furthermore, they observed that most search queries were performed to support defect repair, code reuse, program comprehension, feature addition, and impact analysis. Brandt et al. [36] empirically examined how developers use the Web during programming tasks. In their study, they observed the behavior of 20 programmers and analyzed Web query logs from 24,293 programmers. Their findings showed that programmers use the Web to quickly learn technologies, to seek clarification, and to recall the syntax of a certain programming language. In addition, they reported that developers do not immediately test code copied from the Web. Gallardo-Valencia and Sim [70] conducted an experimental study to investigate the types of problems that motivate developers to search the Web. In their user study, they observed 24 developers in an industrial setting. They reported that around 82% of Web searches are performed in an ad hoc manner, also known as opportunistic search. This type of Web search is done to 1) remember syntax details, 2) clarify implementation details, including fixing a bug, and 3) learn new concepts. The remaining 18% of Web searches were conducted to find open source projects or code snippets. Sadowski et al. [166] studied programmers’ behavior when searching for code at Google Inc. They analyzed

survey results. They found that developers at Google search for code very frequently, with an average of five sessions every day. Furthermore, developers perform code search for different development tasks, including code review, finding a code snippet, and retrieving programming knowledge.

These empirical studies show that software developers frequently resort to code search during programming tasks. However, there are several common threats to validity in these studies, including that the results are based on observing or surveying a limited number of human subjects and that

analysis was conducted whether the code from the crowdsourcing platforms was actually reused in software systems.

2.3.2 Source Code Recommendation from the Web

Empirical studies in software engineering [175, 36] reported that developers often search the Web for examples or code snippets. Numerous approaches have been proposed in the literature to support developers during programming tasks by providing code samples they can reuse [88, 186, 89].

To help programmers benefit effectively from Web search engines, Hoffmann et al. [88] developed Assieme, a Web search interface that combines information from Web-accessible Java Archive files, API documentation, and Web pages that include sample code snippets. It uses the Google search engine to find relevant Web pages containing source code. It then parses these Web pages to create a source code database, and uses the text on the Web pages to rank code snippets and the libraries referenced in that code. Mica [182] is a tool that augments Web search to find code examples and API documentation. It eliminates irrelevant results from the Google Web APIs and highlights source code elements. PARSEWeb [186] is a tool that assists programmers in using APIs. It uses online search to recommend relevant method call sequences by applying static analysis to the code snippets returned by a search engine. Other approaches aim to recommend code elements or code examples by mining source code in software repositories. For example, Holmes et al. [89] used structure matching on source code in the Eclipse IDE to retrieve the most similar code snippets. The approach combines three heuristics to measure similarity between code snippets: 1) an inheritance heuristic that finds classes with the same parent, 2) a method call heuristic that retrieves methods with similar call graphs, and 3) a use heuristic that finds methods using similar data types. Keivanloo et al. [102] proposed an approach that recommends reusable code by ranking high-quality code snippets. Their approach measures a code snippet's similarity to a query by combining textual similarity and clone detection techniques. The recommended code snippets are ranked according to the completeness and popularity of their usage patterns. More recently, Moreno et al. [137] presented MUSE, a technique for mining and ranking actual code examples that show how to use a specific method. Their approach parses existing applications to collect method usage patterns through static code analysis. A clone detection tool is used to group similar code examples, and MUSE ranks the groups according to their popularity. MUSE provides users with one code snippet from each group of clones, selected based on its reusability, understandability, and popularity.
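The group-then-rank idea described for MUSE can be illustrated with a minimal sketch. It is not the implementation of Moreno et al. [137]: the `normalize` function is a hypothetical stand-in for a real clone detection tool, popularity is approximated by group size, and the shortest snippet of each group is chosen as a crude proxy for understandability. All function names are assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalize(snippet: str) -> str:
    """Hypothetical clone key: drop // comments and collapse whitespace.

    A real clone detector is far more robust; this only stands in for one.
    """
    without_comments = re.sub(r"//[^\n]*", "", snippet)
    return " ".join(without_comments.split())


def rank_examples(snippets):
    """Group near-identical snippets and rank the groups by size.

    Returns a list of (representative, popularity) pairs, most popular first.
    The representative is the shortest snippet in its group (an assumption).
    """
    groups = defaultdict(list)
    for snippet in snippets:
        groups[normalize(snippet)].append(snippet)

    # Larger groups first; ties are broken by the normalized text so the
    # ordering is deterministic.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(min(members, key=len), len(members)) for _, members in ordered]


usages = [
    "reader.close();",
    "reader.close();   // done",
    "reader.close();\n",
    "reader.readLine();",
]
ranking = rank_examples(usages)
```

Here the three `reader.close()` variants fall into one group, so that group is ranked first and is represented by its shortest member.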

The aforementioned examples focus on recommending source code snippets to developers for learning API usage or source code reuse. Even though these approaches are appropriate for assisting developers in programming tasks, they lack information about the quality of the recommended source code. In addition, developers may reuse the suggested code snippets in a new context without completely understanding the functionality of the code. One of the main aims of this thesis is to study the impact of integrating reused code from crowdsourcing platforms on the quality of the target software system.

2.3.3 Empirical Studies of Copy and Paste Source Code

Using source code from crowdsourcing platforms sometimes involves code cloning (copy and paste) and detecting code reuse. Research in this area may therefore, in some cases, be relevant for studying code reuse from crowdsourcing platforms.

One of the most common code smells in software systems is cloning [104, 164], that is, copying code fragments and reusing them by pasting with or without minor modifications. Cloning can occur within the same project or across projects as part of source code reuse. However, research in the area of code cloning lacks consensus about the impact of clones on the quality of source code. For example, Kapser and Godfrey [99] described several clone patterns and found that clones often have a positive effect on open source systems. Rahman et al. [155] empirically studied the relationship between code clones and defect proneness, and found that the majority of defects are not significantly associated with code clones. On the other hand, there are studies that discuss the negative effect of code clones on source code quality. Juergens et al. [94] investigated the occurrence of clones in open source and proprietary systems. They found that inconsistent clones introduce defects that are difficult to detect. Lozano and Wermelinger [117] reported that code clones increase maintenance effort when compared to non-cloned code. Mondal et al. [136] linked the type of clone (Type-1, Type-2, and Type-3) to defect proneness in open source systems, and found that Type-3 clones are the most related to bug-fixing changes.
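The clone types referred to above are not defined in this section. The following sketch assumes the commonly used taxonomy: Type-1 clones differ only in whitespace and comments, Type-2 clones additionally differ in identifiers and literals, and Type-3 clones also contain added, removed, or modified statements. The tokenizer, the small keyword set, and the 0.7 similarity threshold are illustrative assumptions and are not taken from any of the cited studies.

```python
import difflib
import re

# Very small tokenizer for C-like code: identifiers, integers, single symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Illustrative keyword subset; a real tool would use the full language grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def tokens(code):
    """Tokenize after dropping // comments (whitespace is ignored)."""
    return TOKEN.findall(re.sub(r"//[^\n]*", "", code))


def abstract(toks):
    """Replace identifiers with ID and numeric literals with LIT."""
    result = []
    for tok in toks:
        if tok in KEYWORDS:
            result.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            result.append("ID")
        elif tok[0].isdigit():
            result.append("LIT")
        else:
            result.append(tok)
    return result


def clone_type(a, b, threshold=0.7):
    """Classify a pair of fragments; returns None if they are not clones."""
    tokens_a, tokens_b = tokens(a), tokens(b)
    if tokens_a == tokens_b:
        return "Type-1"
    abstract_a, abstract_b = abstract(tokens_a), abstract(tokens_b)
    if abstract_a == abstract_b:
        return "Type-2"
    ratio = difflib.SequenceMatcher(None, abstract_a, abstract_b).ratio()
    return "Type-3" if ratio >= threshold else None


original = "int sum = a + b; // add"
```

With this sketch, a copy of `original` without the comment is classified as Type-1, a copy with renamed variables as Type-2, and a renamed copy with an extra statement as Type-3.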

Most of these studies focus on the existence of clones within the same software system. However, developers frequently copy and paste code from outside resources such as Q&A websites [70], and little effort has been devoted to studying the impact of such copy-and-paste actions. For example, source code posted on Stack Overflow may be incomplete or may contain vulnerabilities.

Other research works focus on detecting and studying code reuse across software system boundaries. Inoue et al. [92] developed a prototype called Ichi Tracker to explore the evolution of a code fragment using online code search engines and code clone detection techniques. The tool accepts a code fragment as input and returns related files containing it. Using Ichi Tracker, developers can identify the origin of a source code fragment or a modified version of the code fragment, including potential license violations. German et al. [72] examined source code migration across three different systems (Linux, FreeBSD, and OpenBSD) and its legal implications. They tracked reused code fragments using clone detection methods. Their results showed that code migration occurred between these systems. Additionally, the copying tends to be performed without violating the license terms.

Davies et al. [48] proposed a signature-based matching technique to determine the origin of code entities. They found that their technique can be used to identify security bugs in reused libraries. Kawamitsu et al. [101] proposed a technique to automatically detect source code reuse between two software repositories at the file level. Their approach is based on measuring the similarity between pairs of source files and using the commit time to identify the original source file revision. They found that in some instances developers did not record the version of the reused file correctly.
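The combination of file similarity and commit time described for Kawamitsu et al. [101] can be sketched as follows. This is only an illustration under stated assumptions, not their technique: similarity is computed with a line-based `difflib` ratio, candidate revisions committed after the reuse are excluded, and the `Revision` type, the `likely_origin` function, and the 0.5 threshold are hypothetical.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    """One revision of a file in the (assumed) origin repository."""
    commit_id: str
    committed_at: datetime
    content: str


def similarity(a: str, b: str) -> float:
    """Line-based similarity in [0, 1]; a stand-in for a real metric."""
    return difflib.SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def likely_origin(reused, reused_at, candidates, min_similarity=0.5):
    """Return the most similar revision committed no later than the reuse.

    Ties go to the earliest revision. Returns None if no candidate is both
    early enough and similar enough.
    """
    earlier = sorted(
        (rev for rev in candidates if rev.committed_at <= reused_at),
        key=lambda rev: rev.committed_at,
    )
    if not earlier:
        return None
    # max() keeps the first maximal element, i.e. the earliest among ties.
    best = max(earlier, key=lambda rev: similarity(reused, rev.content))
    return best if similarity(reused, best.content) >= min_similarity else None


history = [
    Revision("r1", datetime(2015, 1, 1), "a\nb\nc"),
    Revision("r2", datetime(2015, 6, 1), "a\nb\nc\nd"),
    Revision("r3", datetime(2016, 1, 1), "a\nb\nc\nd\ne"),
]
copied = "a\nb\nc\nd\ne"
origin = likely_origin(copied, datetime(2015, 7, 1), history)
```

Although `r3` is identical to the copied file, it was committed after the reuse, so the sketch reports `r2` as the most plausible origin revision.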

Common to these studies is that they propose techniques to detect and investigate the reuse of software components. One of the main objectives of this thesis is to conduct empirical studies that assess, qualitatively and quantitatively, the reuse of code from crowdsourcing platforms (e.g., npm and Stack Overflow).
