After the filtering phase for the starting context of Android repositories with a history of releases, the application of the GitHub Code Search API for keywords related to the selected testing tools led to the definition of six different groups of repositories, with size ranging from 5 (for the set of repositories featuring Selendroid) to 372 (for the set of repositories featuring Espresso).
The additional filtering phases performed on the individual sets (i.e., the removal of duplicate projects and clones of Android frameworks, and the removal of projects without test classes having the “test” keyword in their absolute paths) reduced the size of all six sets. The reduction was most substantial for the set of projects associated with UI Automator, mainly because of the many framework clones it contained.
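The path-based filter described above can be sketched as follows. This is only an illustration: the restriction to `.java` and `.kt` files, the case-insensitive match, and the names `has_test_class` and `filter_projects` are assumptions, not details taken from the study.

```python
from pathlib import PurePosixPath

# Assumed source-file extensions; the study does not state which were used.
SOURCE_SUFFIXES = {".java", ".kt"}


def has_test_class(paths):
    """Return True if at least one source file has 'test' in its path."""
    return any(
        "test" in p.lower() and PurePosixPath(p).suffix in SOURCE_SUFFIXES
        for p in paths
    )


def filter_projects(projects):
    """Keep only the projects (name -> list of file paths) with test classes."""
    return {name: paths for name, paths in projects.items() if has_test_class(paths)}
```

A plain substring match also accepts paths such as `latest/Foo.java`, so a stricter variant might match whole path components instead.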
The graph in figure 7.1 shows the number of projects after each filtering phase. Table 7.3 summarizes the metrics that have been computed on the projects to answer RQ3.1. The columns show: the total number of projects featuring each of the considered testing tools at the end of the filtering procedure, and the derived Tool Diffusion (TD) metric; the average and median values for the Number of Tagged Releases (NTR), Number of Test Classes (NTC), Total Test LOCs (TTL) and Test LOCs Ratio (TLR). The last two measures have been computed on the master releases of each project. The last line of the table shows the average value for all projects, weighted by the size of each set of projects. The table does not show the measures for the single Selendroid project that remained after the application of the whole filtering procedure to the original set of projects featuring that tool.
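The size-weighted average reported in the last line of the table can be illustrated with a short sketch. The function name and the sample values in the usage note are hypothetical and are not data from the study.

```python
def weighted_average(values, sizes):
    """Average per-tool values, weighting each tool by its number of projects.

    values: mapping tool -> metric value for that set of projects
    sizes:  mapping tool -> number of projects in that set
    """
    total = sum(sizes[tool] for tool in values)
    return sum(values[tool] * sizes[tool] for tool in values) / total
```

For instance, with hypothetical values `{"A": 10.0, "B": 20.0}` and set sizes `{"A": 1, "B": 3}`, the larger set dominates and the weighted average is 17.5 rather than the unweighted 15.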
To ease the reading of this section, the acronyms used hereafter for the metrics are reported in table 7.2.
The TD metric, computed as the percentage penetration of each testing tool within the extracted context of Android repositories, ranged from nearly 0% (for the set associated with Selendroid, containing just a single project) to 4.11% (for the set of projects featuring Robolectric). The higher percentage of projects featuring
Robolectric may be explained by the fact that Robolectric has been available for longer than the other considered testing tools, and that it can also be used for forms of testing other than automated GUI testing. The set of projects featuring Appium also proved to be rather small, with just 12 projects. Of the two testing frameworks that are part of the Android Instrumentation Framework, Espresso proved to be more widespread than UI Automator. This result may hint at Espresso being an easier tool to use for creating simple test suites for Android applications, whereas UI Automator was typically used for complex sets of applications or frameworks in which the interaction with the OS user interface also had to be tested. The prevalence of Espresso test classes can also be explained by the fact that the Android Developer Guide indicates the tool as the official way to test individual activities of Android apps; additionally, the Android Studio IDE features a built-in plugin for the creation of Espresso test cases through Capture & Replay.
Overall, slightly less than 8% of the filtered set of Android projects featured tests belonging to at least one of the selected testing tools. The six sets of projects were not necessarily disjoint, since a single repository may contain references to multiple testing frameworks. This may create overlaps, and hence lead to an overestimation of the cumulative diffusion of the six considered testing tools. That considered, the resulting cumulative TD gives evidence of a lack of extensive adoption of automated GUI testing frameworks among Android repositories hosted on GitHub. As a limitation of this result, it must be considered that this value is restricted to the six testing tools considered in this study, and that many other scripted testing tools may be adopted by other Android repositories.
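The effect of overlapping sets on the cumulative TD can be made concrete with a minimal sketch. It assumes that TD is the number of projects featuring a tool divided by the total number of extracted repositories, expressed as a percentage; the function names and the repository identifiers in the usage note are hypothetical.

```python
def tool_diffusion(n_projects, n_total):
    """TD as a percentage of the extracted repositories (assumed definition)."""
    return 100.0 * n_projects / n_total


def summed_and_union_td(tool_sets, n_total):
    """Return (sum of per-tool TDs, TD of the union of all sets).

    tool_sets: mapping tool -> set of repository identifiers
    The two values differ whenever a repository appears in more than one set.
    """
    summed = sum(tool_diffusion(len(repos), n_total) for repos in tool_sets.values())
    union = set().union(*tool_sets.values())
    return summed, tool_diffusion(len(union), n_total)
```

With two hypothetical sets `{"r1", "r2", "r3"}` and `{"r3", "r4"}` out of 100 repositories, the summed TD is 5.0% while the TD of the union is 4.0%, because `r3` is counted twice in the sum.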
As a final comparison for the adoption of the considered testing tools, the same search procedure for test classes was repeated, this time searching for the JUnit keyword. This search returns the set of projects featuring any kind of unit testing classes developed with JUnit, along with classes associated with other testing frameworks that use JUnit as an automation engine. The search resulted in 3,669 projects (with tagged releases and manifest files) featuring classes containing the JUnit keyword, i.e., around 20% of the total number of extracted Android projects. This percentage, albeit significantly higher than the combined adoption percentage of the six considered testing tools, shows that the share of applications tested with any framework based on JUnit is still quite low, indicating a rather scarce adoption of any form of testing in Android open-source repositories hosted on GitHub.
The NTR metric was used to characterize the average history of the projects of each set. The averages ranged from 13 (for the projects featuring Espresso) to 37 (for the projects featuring Appium). The small average and median values for Espresso projects may suggest that Espresso is typically preferred for testing smaller applications with shorter lifespans, possibly thanks to its higher accessibility and to its integration (especially through the Espresso Test Recorder tool) in the Android Studio IDE. In the case of Appium, the result may be heavily influenced by the small size of the set of projects that have been considered.
The average and median values for the NTC metric, useful for quantifying the typical size (in terms of test classes) of an automated GUI test suite for an Android project, were rather small for all the considered sets except the one associated with Appium. This result may be a consequence of the usual coding patterns adopted when developing test classes for Android applications, with each test class associated with an Activity of production code. Most apps do not present their users with a high number of different screens, and therefore do not feature many activities to be tested. The number of activities of the considered projects was investigated by counting the activities declared in the manifest .xml file, which gave a measured average of 19 activities. The smaller average value of the NTC metric (6) suggests a partial coverage of the activities of the Android repositories, and hence a small coverage of the production code by the test classes.
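Counting the activities declared in a manifest can be sketched as below. This is a minimal illustration using Python's standard `xml.etree.ElementTree` module, not the procedure actually used in the study; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def count_declared_activities(manifest_xml):
    """Count the <activity> elements declared in an Android manifest string.

    Element names in the manifest are not namespaced (only attributes use the
    android: prefix), so a plain tag match is enough. <activity-alias>
    elements have a different tag and are not counted.
    """
    root = ET.fromstring(manifest_xml)
    return sum(1 for _ in root.iter("activity"))
```

Under the one-test-class-per-activity pattern mentioned above, the reported averages of 6 test classes against 19 declared activities would correspond to at most about 32% of the activities being exercised; this is only a rough reading of the two averages.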
Average TTL values were very large for the sets of projects featuring Appium and Robolectric. However, the TLR was very small for the considered Appium projects. This may suggest that the amount of test code written with Robolectric is typically more relevant, relative to the whole production code of a repository, than the amount of test code written with Appium. This result also suggests that, among all the considered sets of projects, Appium has typically been used for testing bigger projects in terms of production LOCs. The values of TTL and TLR were rather similar for the sets of projects featuring Espresso, UI Automator and Robotium, suggesting that these testing tools are used in a similar way on projects of similar size. The set of projects featuring Espresso had the lowest TTL value. This measure can be explained by the following reasons: (i) the fact that using a white-box
Table 7.4 Acronyms used for Evolution Metrics

Name     Explanation
TLR      Test LOCs Ratio
MTLR     Modified Test LOCs Ratio
MRTL     Modified Relative Test LOCs
MRR      Modified Releases Ratio
TSV      Test Suite Volatility
MCR      Modified Test Classes Ratio
MMR      Modified Test Methods Ratio
MCMMR    Modified Classes With Modified Methods Ratio
Table 7.5 Measures of the evolution of test code (averages on the sets of repositories)

Tool           TLR      MTLR    MRTL    MRR      TSV      MCR      MMR     MCMMR
Espresso       6.30%    4.21%   3.17%   16.64%   19.42%   15.75%   3.83%   60.12%
UI Automator   5.84%    3.10%   1.14%   10.68%   21.46%   14.48%   3.42%   55.86%
Robotium       5.11%    5.09%   3.07%   16.50%   25.13%   17.40%   3.80%   58.41%
Robolectric    11.23%   5.30%   5.93%   20.39%   18.12%   14.91%   3.88%   55.36%
Average        8.78%    4.94%   4.54%   18.37%   19.43%   15.43%   3.83%   57.21%
testing technique allows the same testing scenarios to be translated into direct operations instead of multiple operations on higher-level widget descriptions; (ii) the accessibility of the Espresso framework even to inexperienced developers, and the availability of more documentation than for the other testing tools; (iii) the integration of Espresso in the Android development environment, which may make it the first choice for trial attempts at testing that are later abandoned.
Answer to RQ3.1: The considered GUI testing tools reached a diffusion that never exceeds 4.11% individually, and a combined adoption of about 8% among the considered set of 15 thousand Android repositories hosted on GitHub. The projects that are tested with the considered tools are typically rather short-lived, with an average of 15 releases, and feature, on average, very few test classes, with test code amounting to around 10% of the total production code.