We have shown how to use the model to analyse the anonymity of a message which has travelled through the system, given the global passive attacker's observation of the system. This analysis was done by hand and was not excessively painful. With more complex systems and larger traces, however, such manual methods rapidly become infeasible, so the next step was to write an analysis tool which implements our semantics.
The tool, written in Haskell [PJH+99], can perform several tasks. Firstly, given a number of sender messages, it can act as a mix network and produce the set of resulting real traces, together with the final state in each case. Secondly, given a real trace, it can erase it to obtain the trace observed by the attacker. Finally (and most importantly), it can work out the anonymity of a message given a particular observation; this involves generating the set ScenObs. The algorithm for generating ScenObs from an erased trace is non-trivial, and much has been gained from implementing it in a high-level functional language. Unfortunately, for a trace of n messages the size of ScenObs is O(n!) if the trace has no more than one message between any two mixes during any round, and even worse otherwise. So, even having implemented the semantics, we are still limited to fairly small examples (around 5 messages). This is clearly unsatisfactory (hence we do not describe our algorithm here), and we need to look for a more efficient algorithm which computes an approximation. It is, however, unclear that a good algorithm for doing so exists. One such algorithm was proposed in our PET paper [SD02]; finding a proof, or even an argument, for its correctness has turned out to be very difficult. We leave this for exciting future work. On the other hand, one can always fall back on non-probabilistic analysis, i.e. compute the anonymity set (which is much easier to do approximately) and declare that the probability distribution over the senders in the sender anonymity set is uniform. Instead, we go on to discuss the design choices of the semantic model we have constructed, followed by a survey of formal models for anonymity systems and some speculation about the future.
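Returning briefly to the cost of computing ScenObs, the Python sketch below illustrates both the factorial blow-up and the uniform fallback. It is not the Haskell tool itself. It assumes the simplest possible case: a single threshold mix firing a single batch, where every matching of the batch's senders to its outputs is a candidate scenario and all scenarios are treated as equally likely. The function names `batch_scenarios`, `sender_distribution` and `uniform_over_anonymity_set` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def batch_scenarios(senders, outputs):
    """Every way of matching outputs to senders for one batch of one
    threshold mix (simplifying assumption: a single mix, a single round).
    Each scenario maps an output message to the sender it came from."""
    if len(senders) != len(outputs):
        raise ValueError("a threshold mix flushes exactly as many messages as it received")
    return [dict(zip(outputs, perm)) for perm in permutations(senders)]


def sender_distribution(scenarios, output):
    """Probability that each sender sent `output`, treating all scenarios
    as equally likely (an assumption of this sketch)."""
    counts = Counter(scenario[output] for scenario in scenarios)
    return {sender: Fraction(c, len(scenarios)) for sender, c in counts.items()}


def uniform_over_anonymity_set(anonymity_set):
    """Non-probabilistic fallback: spread probability evenly over the set."""
    senders = set(anonymity_set)
    if not senders:
        raise ValueError("the anonymity set must be non-empty")
    return {sender: Fraction(1, len(senders)) for sender in senders}


if __name__ == "__main__":
    for n in range(1, 8):
        senders = [f"s{i}" for i in range(n)]
        outputs = [f"o{i}" for i in range(n)]
        scenarios = batch_scenarios(senders, outputs)
        assert len(scenarios) == factorial(n)  # the enumeration grows as n!
        print(n, len(scenarios))
```

In this single-batch case the exhaustive distribution coincides with the uniform one. Across several mixes the two need not agree, which is why declaring the distribution uniform is only an approximation.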
In creating our model we made a number of design choices. Here we document the more important ones and discuss alternative approaches.
First of all, we decided to include the reordering produced by the network in the model. This is done using non-determinism in the reduction rules of the semantics. Following on from this, we assumed that the attacker observes the traffic at the senders, receivers and mixes, and showed that the attacker is able to factor out the network reordering. This is consistent with the observation that anonymity is produced by the mixes, not the network. If the attacker is not observing the traffic very close to the mixes⁶, he may not be able to deduce correctly which message belongs to which batch. It may then appear to him that messages from different batches got mixed together, thereby yielding increased anonymity. We are not aware of this simple observation having been made before.
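To see why observing close to a mix lets the attacker recover batch membership, consider the minimal Python sketch below. It assumes a threshold mix without a pool, which fires once it holds `threshold` messages and emits all of them. It also assumes a hypothetical event encoding: ("in", msg) and ("out", msg) pairs, listed in the order the attacker saw them at that mix. The function name `batches_at_threshold_mix` is likewise hypothetical.

```python
def batches_at_threshold_mix(events, threshold):
    """Split the events observed at one threshold mix into batches.

    `events` lists ("in", msg) / ("out", msg) pairs in the order the attacker
    saw them at the mix (hypothetical encoding). Assumes a threshold mix
    without a pool: it fires once it holds `threshold` messages and emits
    all of them.
    """
    batches = []
    pending_in, pending_out = [], []
    for kind, msg in events:
        if kind == "in":
            pending_in.append(msg)
        elif kind == "out":
            pending_out.append(msg)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        if len(pending_out) == threshold:
            if len(pending_in) < threshold:
                raise ValueError("mix fired before receiving a full batch")
            # The first `threshold` arrivals since the last firing form this batch;
            # any later arrivals belong to the next batch.
            batches.append((pending_in[:threshold], pending_out))
            pending_in = pending_in[threshold:]
            pending_out = []
    return batches


events = [("in", "a"), ("in", "b"), ("out", "x"), ("out", "y"),
          ("in", "c"), ("in", "d"), ("out", "z"), ("out", "w")]
print(batches_at_threshold_mix(events, 2))
```

If the same events were instead observed further from the mix, network reordering could place c's arrival before b's. The slicing above would then silently assign a and c to the first batch, which is exactly the apparent extra mixing between batches described above.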
Secondly, we have chosen to abstract away from the issue of intersection attacks. This was a valuable design decision, as our work is completely orthogonal to (and could therefore easily be combined with) previous research on statistical disclosure [Dan03c, AKP03, KAP02].
⁶ The technical condition is that the attacker sees the messages in the same order as they leave or arrive at the mixes.

We have also chosen to model a “not quite free route” mix network. This merely saved us several reduction rules in the model, though one must note that additional technicalities would be required for the construction of the attacker’s observation. The one difference between a free route mix network and the one we modelled is that, in the latter, no message may have the same mix occurring twice consecutively in its sequence of mixes. What happens if a message does have the same mix twice in a row in its sequence? The answer is simple: it does not get sent out onto the network and thus is not seen by the attacker. However, because the mixes we are dealing with in this model are threshold mixes, the attacker is able to deduce that a message remained in the mix and to infer the correct “observation” from the lack of one. This inference would have to be included in the definition of the attacker’s observation. Note that a similar inference could be made in the case of timed, timed pool and threshold pool mixes, but not in the case of binomial mixes or in the presence of a randomised dummy traffic policy. Modelling this formally is challenging future work, which may turn out to be infeasible for some mixes! A related (and much easier) inference would have to be performed to account for message drops. Take the case of a threshold mix network, as above. From an observation Obs the attacker can deduce when a mix drops messages: in this case the mix would have more arrival events than send events. A dropped message is, of course, equivalent to a message leaving the network. Hence, once a message drop has been identified, we can augment Obs with two events, in an appropriate place, which represent the message leaving the network to a special receiver, drop. From here on, the analysis proceeds mostly as before (though now R must also include this new “receiver”, and the mix must be made not to fire until n + 1 messages arrive at the appropriate time). Although we have not formalised this, it is clear that it can be added relatively easily to the framework presented above.
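The two restrictions discussed above, the route condition and the drop inference, can be sketched in Python as follows. Everything here is a hypothetical illustration, not the definitions used in our semantics. The `Event` record (kind, node, round) is an assumed encoding of observed events, and the names `is_not_quite_free_route`, `dropped_counts` and `augment_with_drops` are likewise assumptions. The sketch assumes threshold mixes that flush every message received in a round, so any surplus of arrivals over sends is read as drops. It appends the two new events per drop at the end of the trace instead of "in an appropriate place". It also does not update R or the firing condition.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    kind: str   # "arrive" or "send" (hypothetical encoding of an observed event)
    node: str   # the mix or receiver at which the event is observed
    rnd: int    # the round the attacker assigns the event to


def is_not_quite_free_route(route: List[str]) -> bool:
    """True if no mix occurs twice in a row in the route."""
    return all(a != b for a, b in zip(route, route[1:]))


def dropped_counts(obs: List[Event], mixes: Iterable[str]) -> Counter:
    """For each (mix, round), how many more arrivals than sends were observed.

    Assumes threshold mixes that flush every message received in a round,
    so any surplus of arrivals over sends is read as dropped messages.
    Events at receivers are ignored."""
    mix_set = set(mixes)
    balance = Counter()
    for e in obs:
        if e.node not in mix_set:
            continue
        if e.kind == "arrive":
            balance[(e.node, e.rnd)] += 1
        elif e.kind == "send":
            balance[(e.node, e.rnd)] -= 1
        else:
            raise ValueError(f"unknown event kind: {e.kind!r}")
    return Counter({key: n for key, n in balance.items() if n > 0})


def augment_with_drops(obs: List[Event], mixes: Iterable[str],
                       drop_receiver: str = "drop") -> List[Event]:
    """For every inferred drop, add a send from the mix and an arrival at the
    pseudo-receiver `drop_receiver`. For simplicity the new events are
    appended at the end rather than inserted at their proper position."""
    extra = []
    for (mix, rnd), n in sorted(dropped_counts(obs, mixes).items()):
        for _ in range(n):
            extra.append(Event("send", mix, rnd))
            extra.append(Event("arrive", drop_receiver, rnd))
    return obs + extra


obs = [Event("arrive", "M1", 0), Event("arrive", "M1", 0),
       Event("send", "M1", 0), Event("arrive", "R1", 1)]
print(dropped_counts(obs, ["M1"]))
print(augment_with_drops(obs, ["M1"])[-2:])
```

After augmentation, every mix in the example has as many sends as arrivals, so the observation can be fed to the same analysis as a drop-free one.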
Finally, we have chosen to abstract away the details of encryption, message transfer protocols, the retrieval of the necessary public keys by users, etc. This is the subject of other work, e.g. [Dan04], and is mostly orthogonal to our efforts.