ample, if reproduction of phenomena under controlled conditions is an absolute requirement, then astrophysicists and palaeontologists are not scientists. This complaint inherits a flavour from logical empiricism; reproducible experiments are necessary to repeatedly test and increase confidence in falsifiable
3.4 practicing science of security 119 (but not yet falsified) statements of universal law. Fragile or contextual conclusions in biology – conclusions that were not readily reproducible – historically led to serious claims that biology was not a science (Cartwright, 1983).
Complaints of irreproducibility, at heart, strike out at the genuine observation that conclusions in cybersecurity research are often fragile or contextual. Philosophy of biology countered similar logical-empiricist attacks by creating a more nuanced idea of evaluating explanations and results. I will leverage this work about biology to do the same. Reproducibility is a complex term in itself; I will present no fewer than eight different senses of the term that discuss different aspects of evaluating evidence from structured observations.
alternative: evaluation takes many forms
Although the distinction between replication and repetition is not new (Cartwright, 1991), recent work provides actionable advice to scientists. I focus on the family of five terms suggested by Feitelson (2015), plus the notion of statistical reproducibility from Stodden (2015). I will discuss three distinct senses of statistical reproducibility, for a total of eight distinct methods to support the robustness of evidence. When one complains that cybersecurity lacks reproducibility, usually what is meant is that one or two of these eight senses is impossible. All sciences similarly struggle to achieve reproducibility in all these senses at once. Thus, cybersecurity is no worse off than other sciences.
Feitelson (2015) suggests five distinct terms for a computer science discussion of evaluating results:
Repetition – to rerun exactly, using original artefacts
Replication – to rerun exactly, but recreate artefacts
Variation – repetition or replication, but with a measured intentional modification of a parameter
Reproduction – to recreate the spirit, in a similar setting with similar but recreated artefacts
Corroboration – to aim to obtain the same or consistent result as another study via different means
Each of these strategies has its uses; none is strictly preferred over the others. This subsection uses an extended example of evaluation of Network Intrusion Detection and Prevention Systems (NIDPSs). One may question whether evaluating an NIDPS rule is scientific, in the sense desired. NIDPS rules may
determining whether a particular enzyme in the blood of some rare Amazonian fish actually selectively binds to some specific parasite. Chapter 7 will pick up on the example of NIDPS rules and how to integrate them into more general
knowledge using the strategies from Chapter 4.
repetition
Even in the restricted context of evaluating a single NIDPS rule, these strategies all have a sensible interpretation. Given a recorded stream of traffic, if one plays the stream back through the same NIDPS with the same network configuration, the same rules should fire. A failure of repetition would be indicative of race conditions or performance bottlenecks, or perhaps an architecture with multiple distinct NIDPS systems and a network switch that randomly assigned packets, meaning any rule that required correlation between packets would not reliably fire across repetitions. It is because of repetition experiments such as this that Bro, Snort, or Suricata work on flows, not packets. When load-balancing any of these tools, traffic is balanced based on flows, so that all packets in one conversation go to the same thread. Flow-based balancing is needed because an NIDPS works to reassemble application-level, or at least transport-layer, information (Scarfone and Mell, 2007). Any network architecture that randomized packets among instances, as opposed to maintaining flow-based balancing, would lead to unrepeatable observations because application-level reconstruction would fail to be the same, and the NIDPS tests would be applied to different information.
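The contrast between flow-based and per-packet balancing can be illustrated with a toy simulation. The following is only a sketch under stated assumptions, not a model of how Bro, Snort, or Suricata are implemented: the `Packet` record, the CRC-based worker assignment, the naive payload concatenation standing in for stream reassembly, and the sample traffic are all hypothetical and chosen for brevity.

```python
import random
import zlib
from collections import namedtuple

# Hypothetical, minimal packet record; real sensors parse far more fields.
Packet = namedtuple("Packet", "src sport dst dport proto payload")


def flow_key(pkt):
    """Direction-independent 5-tuple, so both halves of a conversation match."""
    endpoints = sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)])
    return (pkt.proto, endpoints[0], endpoints[1])


def balance_by_flow(packets, n_workers):
    """Send every packet of a flow to the same worker via a stable hash."""
    workers = [[] for _ in range(n_workers)]
    for pkt in packets:
        digest = zlib.crc32(repr(flow_key(pkt)).encode())
        workers[digest % n_workers].append(pkt)
    return workers


def balance_randomly(packets, n_workers, rng):
    """Assign each packet to a worker independently of its flow."""
    workers = [[] for _ in range(n_workers)]
    for pkt in packets:
        workers[rng.randrange(n_workers)].append(pkt)
    return workers


def rule_fires(workers, signature):
    """True if any worker finds the signature in a stream it reassembled itself."""
    for seen in workers:
        streams = {}
        for pkt in seen:
            # Naive reassembly: concatenate payloads of a flow in arrival order.
            streams.setdefault(flow_key(pkt), []).append(pkt.payload)
        if any(signature in b"".join(parts) for parts in streams.values()):
            return True
    return False


# Assumed sample traffic: the signature straddles two packets of one conversation.
TRAFFIC = [
    Packet("10.0.0.5", 40000, "10.0.0.9", 80, "tcp", b"GET /EVIL"),
    Packet("10.0.0.7", 40001, "10.0.0.9", 80, "tcp", b"GET /index.html"),
    Packet("10.0.0.5", 40000, "10.0.0.9", 80, "tcp", b"PAYLOAD HTTP/1.1"),
]
SIGNATURE = b"EVILPAYLOAD"

# Repeat the "experiment": five flow-balanced runs, 200 randomly balanced runs.
flow_runs = [rule_fires(balance_by_flow(TRAFFIC, 4), SIGNATURE) for _ in range(5)]
random_runs = [
    rule_fires(balance_randomly(TRAFFIC, 4, random.Random(seed)), SIGNATURE)
    for seed in range(200)
]
print("flow-based outcomes:", set(flow_runs))
print("random outcomes:", set(random_runs))
```

Under flow-based balancing the worker that receives the conversation always sees both halves of the signature, so every repetition gives the same answer. Under random assignment the rule fires only when both packets happen to land on the same worker, which is the kind of unrepeatable observation described above.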
replication
Replication might mean to use different NIDPS software and re-write the rule in its language to get identical detection. Replication might also mean using the same NIDPS architecture and rules on traffic that is artificially generated to have the same malicious features as a prior experiment; or perhaps it is simply to recreate the same network architecture at a different physical location and replicate that the NIDPS works on a pre-recorded traffic stream to provide evidence the newly set up sensor architecture has been installed correctly.
I have claimed that creating a new NIDPS rule and testing it on a pre-recorded traffic stream is an experiment. Am I abusing the term "experiment" here, in a way that is not consistent with other sciences? No. Perhaps strangely, the objects of the experiment are artefacts, and software artefacts at that. But if cybersecurity is a science of anything, certainly it is of software (and how people interact with it). Therefore, the NIDPS analyst making a rule has
an untested specimen (the NIDPS rule), a hypothesis about how it should
behave (what she designed it to do, in this case), and establishes a controlled environment in which to test the properties of the specimen of interest. This matches all the usual hallmarks of an experiment.
From repetition and replication, variation is straightforward and I will not discuss it in detail. For an NIDPS it is basically making a small change to a
signature and observing how the results change.
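As a minimal sketch of variation in this sense, the snippet below holds a recorded set of payloads fixed, changes one parameter of a signature, and compares the resulting alerts. The payloads, the pattern, and the choice of case sensitivity as the varied parameter are hypothetical illustrations; a plain regular expression stands in for a rule in a real NIDPS language.

```python
import re

# Hypothetical recorded payloads standing in for a replayed traffic capture.
RECORDED = [
    b"GET /cmd.exe HTTP/1.1",
    b"GET /CMD.EXE HTTP/1.1",
    b"GET /index.html HTTP/1.1",
    b"POST /upload?file=cmd.exe.txt HTTP/1.1",
]


def alerts(pattern, flags=0):
    """Return the indices of recorded payloads on which the signature fires."""
    rule = re.compile(pattern, flags)
    return {i for i, payload in enumerate(RECORDED) if rule.search(payload)}


# Baseline signature, then the same signature with one measured modification.
baseline = alerts(rb"cmd\.exe")
variant = alerts(rb"cmd\.exe", re.IGNORECASE)

print("baseline alerts:", sorted(baseline))
print("variant alerts:", sorted(variant))
print("gained by the change:", sorted(variant - baseline))
print("lost by the change:", sorted(baseline - variant))
```

Because the traffic and everything else are held constant, any difference between the two alert sets can be attributed to the single modified parameter.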