Above, we described the work most closely related to the work presented in this thesis. However, the literature includes much more work in the area of knowledge discovery from source code artifacts, which we present in the following.

API specifications: To mine specifications, many approaches have been proposed that mainly rely on existing client code. However, as presented in Section 2.2.2, for specific APIs, client code might be unavailable or insufficient. As a consequence, for such APIs, other approaches have been proposed that consider the API documentation as the main source for automatically inferring specifications.

Doc2Spec [180] uses an NLP technique to analyze natural language API documentation and infer resource specifications. The inferred specifications are then used to detect bugs in open-source projects. D2Spec [167], on the other hand, uses semi-structured online API documentation (typically in the form of HTML pages) to automatically extract web API specifications. Given a seed online documentation page of an API, D2Spec first crawls all documentation pages of the API, and then uses a set of machine-learning techniques to extract the base URL, path templates, and HTTP methods from API documentation pages containing free-form text and arbitrary HTML structures. More specifically, D2Spec uses classifiers and a hierarchical clustering algorithm to extract the base URL and path templates for an API, and searches the context of a path template to infer the HTTP method.
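To make the notion of a path template concrete, the following minimal sketch generalizes concrete URL paths into templates by replacing segments that vary across otherwise identical paths. It is a simple heuristic stand-in and not D2Spec's classifier and clustering pipeline; the function name `infer_path_templates`, the grouping rule, and the `{param}` placeholder are assumptions made for illustration.

```python
from collections import defaultdict


def infer_path_templates(paths):
    """Generalize concrete URL paths into path templates.

    Simplifying assumption: paths with the same number of segments and the
    same first segment are instances of one template.
    """
    groups = defaultdict(list)
    for path in paths:
        segments = [s for s in path.split("/") if s]
        if not segments:
            continue
        groups[(len(segments), segments[0])].append(segments)

    templates = set()
    for group in groups.values():
        template = []
        # zip(*group) yields, per position, the segment of every path in the group.
        for position in zip(*group):
            if len(set(position)) == 1:
                template.append(position[0])   # constant segment
            else:
                template.append("{param}")     # varying segment becomes a parameter
        templates.add("/" + "/".join(template))
    return sorted(templates)


# Hypothetical paths, as they might appear in documentation examples.
example_paths = [
    "/users/42/repos",
    "/users/alice/repos",
    "/users/42",
    "/users/bob",
]
templates = infer_path_templates(example_paths)
print(templates)
```

A learned approach is needed in practice because documentation pages mix such paths with free-form text, and simple grouping rules like the one above break down quickly.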

Bug reports: Bug reports published in bug repositories are usually composed of a mixture of sentences in software language and natural language, together with domain-specific predefined fields. For this reason, many approaches have been proposed for automatically generating bug report summaries, which can considerably reduce the time spent wading through numerous bug reports. BNER [181] is an approach for bug-specific entity recognition based on a Conditional Random Fields (CRF) model and a word embedding technique. DeepSum [65] is an unsupervised approach that integrates bug report characteristics into a deep neural network for generating bug report summaries.

Clone detection: Deep Learning (DL) can effectively replace manual feature engineering for the task of clone detection. Source code can be represented at different levels of abstraction: identifiers, abstract syntax trees, control flow graphs, and bytecode. Tufano et al. [153] apply neural networks to these different representations of source code to identify similarities (clone detection) in source code. They conjecture that each code representation can provide a different, yet orthogonal, view of the same code fragment, thus enabling a more reliable detection of similarities in code.

Code-example retrieval: Many other approaches do not synthesize the different API usages found in source code into patterns, but instead present developers with previously written code snippets by searching through large-scale codebases based on a user query. For example, MUSE [95] is an approach that mines and ranks code examples to show concrete usages of a specific API method. In this case, the user query is the API method of interest. CODEnn [45] is based on deep neural networks and suggests relevant code snippets to developers to complete a task at hand. CODEnn jointly embeds code snippets and natural language descriptions (in the form of commented methods) into a high-dimensional vector space. In this way, code snippets related to a natural language query can be retrieved according to their vectors.
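The retrieval step over a joint embedding space can be sketched as a nearest-neighbour search by cosine similarity. In the sketch below, the three-dimensional vectors are hand-written stand-ins for embeddings that a trained model would produce, and the snippet names and the `retrieve` function are hypothetical; it illustrates the general idea of vector-based retrieval rather than CODEnn's actual implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_vector, snippet_vectors, top_k=2):
    """Return the names of the top_k snippets most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in snippet_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Toy embeddings (assumed values, not produced by any real model).
snippet_vectors = {
    "read_file_lines": [0.9, 0.1, 0.0],
    "parse_json_string": [0.1, 0.9, 0.1],
    "open_http_connection": [0.0, 0.2, 0.9],
}
# Stand-in for the embedding of a query such as "read a text file line by line".
query_vector = [0.8, 0.2, 0.1]

ranked = retrieve(query_vector, snippet_vectors)
print(ranked)
```

Because queries and snippets live in the same space, no keyword overlap between the query and the code is required for a match.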

Code synthesis: Existing heuristic methods for pairing the title of a post with the code in the accepted Stack Overflow (SO) answers are limited both in their coverage and in the correctness of the NL-code pairs obtained. Yin et al. [170] propose a method to mine high-quality aligned data from SO using two sets of features: hand-crafted features that consider the structure of the extracted snippets, and correspondence features obtained by training a probabilistic model to capture the correlation between NL and code using neural networks. These features are fed into a classifier that determines the quality of the mined NL-code pairs. The method uses labelled examples for training. Reasonable results are achieved even when training the classifier on one language and testing on another (Java, Python), showing promise for scaling NL-code mining to a wide variety of programming languages beyond those for which we are able to annotate data. In addition, Peddamail et al. [115] use neural networks for the task of code summarization, automatically generating a natural language summary for a given code snippet. Their model is trained on a dataset of <NL, code> pairs. Hu et al. [55] use neural networks to automatically generate comments for method declarations.

Software artifacts classification: Software artifacts provide insights into how people build software. Ma et al. [76] propose an automated approach based on machine learning techniques for automatic classification of these software artifacts into open-source applications.

Test generation: Borges et al. [19] mine associations between UI elements and their interactions from the most common applications. Once mined, the resulting UI interaction model can be easily applied to new apps and new test generators. For example, AppFlow [54] is a system for synthesizing robust, reusable UI tests. It leverages machine learning to automatically recognize common screens and widgets, relieving developers from writing ad hoc, fragile logic to use them in tests. It enables developers to write a library of modular tests for the main functionality of an app category. It can then quickly test a new app in the same category by synthesizing full tests from the modular ones in the library. By focusing on the main functionality, AppFlow provides "smoke testing" that requires little manual work. Optionally, developers can customize AppFlow by adding app-specific tests for completeness.

Variable names generation: Jaffe et al. [57] present a machine translation approach to generate meaningful variable names for decompiled code. They consider decompiler output to be a noisy distortion of the original source code, where the original source code is transformed into the decompiler output. Using this noisy channel model, they apply standard statistical machine translation approaches to choose natural identifiers, combining a translation model trained on a parallel corpus with a language model trained on unmodified C code.
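In a noisy channel formulation, the chosen name is the candidate that maximizes the product of a language model probability P(name) and a translation (channel) probability P(decompiled | name). The sketch below illustrates that decision rule on a single identifier. All probabilities, the candidate names, and the decompiler-style identifier `v1` are hypothetical toy values, and the real approach operates on code sequences with models estimated from corpora rather than on lookup tables.

```python
import math

# P(name): toy language model over candidate identifiers (assumed values).
language_model = {"count": 0.5, "index": 0.3, "buffer": 0.2}

# P(decompiled | name): toy translation model (assumed values).
translation_model = {
    ("v1", "count"): 0.2,
    ("v1", "index"): 0.6,
    ("v1", "buffer"): 0.1,
}


def choose_name(decompiled_identifier):
    """Pick the candidate maximizing log P(name) + log P(decompiled | name)."""

    def score(name):
        # Unseen pairs get a tiny probability instead of zero to keep log defined.
        channel = translation_model.get((decompiled_identifier, name), 1e-9)
        return math.log(language_model[name]) + math.log(channel)

    return max(language_model, key=score)


best = choose_name("v1")
print(best)
```

Here "count" is the most likely name a priori, but the channel probability shifts the decision to "index", which shows how the two models are combined.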
