
3.2.1

Modeling yeast two-hybrid experiments

Protein-protein interaction networks are modeled as undirected and unweighted graphs as described in chapter 2. The types of interactions included in the network differ between Y2H (only physical interactions) and affinity purification (both physical interactions and indirect interactions via other proteins). Since these differences make it difficult to define a comprehensive model for both experimental methods, the sampling procedure described by Han et al. (2005) simulates only the Y2H approach.

Although many topological properties can be analyzed, we concentrate on two of them, the degree distribution and the average clustering coefficient, as described in chapter 2. In the following, a network is defined as randomly clustered if its clustering coefficients are hardly changed by rewiring the network (see section 2.3.1). Consequently, a network is clustered less than randomly if clustering coefficients are increased by rewiring, or more than randomly if they are decreased. We will see examples for all three cases later on.

Figure 3.1 A shows the average clustering coefficients for a number of high-throughput Y2H data sets (Ito et al., 2001; Uetz et al., 2000; Li et al., 2004; Giot et al., 2003; Rual et al., 2005; Stelzl et al., 2005; LaCount et al., 2005). Here, only high-confidence interactions were considered for the data sets of Ito et al. (2001), Li et al. (2004) and Giot et al. (2003). For comparison, the same characteristics are given for the yeast protein-protein interaction network from DIP (Xenarios et al., 2002) (version of April 2nd, 2006), which contains high-throughput data as well as interactions determined with other experimental methods. Although the clustering coefficients of some of the partial networks appear to be rather small, they are in most cases at least one order of magnitude higher than clustering coefficients of random graphs with the same number of nodes and edges (see Figure 3.1 B).

[Figure 3.1. Panel A: clustering coefficient C (axis from 0 to 0.12). Panel B: ratio C/C_rand (axis from 0 to 100). Data sets in both panels: DIP, Ito, Uetz, Li, Giot, LaCount, Rual, Stelzl, labeled by species (yeast, worm, fly, P. falc., human).]

Figure 3.1: Clustering coefficients in large-scale Y2H interaction networks. Clustering coefficients (A) and the ratio to clustering coefficients of random graphs (Erdős and Rényi, 1959) with the same size (B) are shown for the following interaction networks: yeast interactions from DIP (Xenarios et al., 2002) and the Y2H studies by Ito et al. (2001) and Uetz et al. (2000); C. elegans interactions by Li et al. (2004); Drosophila interactions by Giot et al. (2003); P. falciparum interactions by LaCount et al. (2005); and human interactions from the studies of Rual et al. (2005) and Stelzl et al. (2005). Only high-confidence interactions were considered for the Ito, Li and Giot data sets, and self-edges were ignored for the calculation of clustering coefficients.
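The comparison against random graphs with the same number of nodes and edges can be illustrated with a minimal sketch. It is not the code used for Figure 3.1; the function names, the adjacency-dictionary representation and the convention that nodes of degree below 2 contribute a coefficient of 0 are assumptions made for this illustration.

```python
import random
from itertools import combinations


def average_clustering(adj):
    """Average local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbors (no self-edges).
    Assumption: nodes with fewer than two neighbors contribute 0.
    """
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbors of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def random_graph_like(adj, rng):
    """Random graph with the same nodes and the same number of edges."""
    nodes = sorted(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    rand = {v: set() for v in nodes}
    # Draw n_edges distinct node pairs uniformly at random.
    for a, b in rng.sample(list(combinations(nodes, 2)), n_edges):
        rand[a].add(b)
        rand[b].add(a)
    return rand


# Hypothetical toy network: a triangle a-b-c with a pendant node d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
c_observed = average_clustering(toy)
c_random = average_clustering(random_graph_like(toy, random.Random(0)))
```

In practice the random reference value would be averaged over many random graphs before forming the ratio, since a single draw from a sparse random graph often has a clustering coefficient of exactly 0.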

3.2.2

Missing interactions

The sampling procedure described by Han et al. (2005) simulates the effect of the Y2H method under the assumption that interactions may be missed in the process but no false interactions are obtained. It is determined uniquely by two parameters: bait coverage (denoted by β) and edge coverage (denoted by ε). Bait coverage specifies the selective effect of choosing only a fraction of the proteome as baits in a large-scale Y2H experiment, whereas edge coverage determines the fraction of true interactions that can actually be resolved for a bait. Accordingly, a network is sampled from the original network as follows. A fraction β of the nodes is selected as baits, and then for each bait a fraction ε of its interactions is selected. Edges connecting two baits can be selected from either end and are therefore selected with the higher probability 2ε − ε² = ε(2 − ε). The sampled network then contains the bait nodes as well as the non-bait nodes that are connected to a bait via a sampled edge. In the following, the latter are referred to as preys. The resulting network is referred to as G1 = (V1, E1) and the set of baits is called B. The resulting degree of a node v and its clustering coefficient are consequently referred to as $k_v^1$ and $C_v^1$. The average degree of the network and the average clustering coefficient are denoted by $\bar{k}^1$ and $C^1$.
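The sampling step can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation: the function name `sample_y2h` is hypothetical, the number of baits is obtained by rounding β times the number of nodes, and each interaction of a bait is kept independently with probability ε (an assumption that reproduces the probability ε(2 − ε) for edges between two baits, since such an edge is tried once from each end).

```python
import random


def sample_y2h(adj, beta, eps, rng):
    """Sample a network with bait coverage beta and edge coverage eps.

    adj maps each node to the set of its neighbors in the original network.
    Returns the sampled adjacency dictionary and the set of baits.
    """
    nodes = sorted(adj)
    baits = set(rng.sample(nodes, round(beta * len(nodes))))
    # Every bait is part of the sampled network, even without edges.
    sampled = {b: set() for b in baits}
    for b in sorted(baits):
        for u in sorted(adj[b]):
            # Assumption: each interaction of a bait is resolved
            # independently with probability eps.
            if rng.random() < eps:
                sampled[b].add(u)
                # Non-bait nodes enter the network as preys.
                sampled.setdefault(u, set()).add(b)
    return sampled, baits


# Hypothetical toy network: a triangle a-b-c with a pendant node d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
network, baits = sample_y2h(toy, beta=0.5, eps=0.7, rng=random.Random(42))
preys = set(network) - baits
```

With β = 1 and ε = 1 the sketch returns the original network unchanged, which is a convenient sanity check.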

3.2.3

Spurious interactions

Since false positive interactions may affect both the degree distribution and the clustering coefficient, we extended the simple sampling model to also include false interactions. For this purpose, a second step is added after the first sampling step in which false positive interactions are simulated. False positive interactions can be added between each bait and any other node u, even if no interaction of u was sampled in the first step. We add an interaction between a bait v and any other node u with a specific probability ω(v, u), and the resulting network is denoted as G2 = (V2, E2).

The probability ω(v, u) can be defined in different ways. In the first case, the probability of adding an edge between v and u depends on neither the degree of v nor the degree of u, i.e. it is constant for all pairs of nodes. Since random graphs (Erdős and Rényi (1959), see also section 2.2.1) are created in a similar way, this process is denoted as random attachment. In the second case, ω(v, u) depends only on the degree of the bait v but is constant for all its possible neighbors u. We denote this behavior as semi-preferential attachment, since new edges will be attached preferentially to baits with high degree. The last possible scenario involves preferential attachment for both v and u.
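The simplest of these scenarios, random attachment, can be sketched as a second pass over the sampled network. The function name and arguments below are hypothetical, and the constant probability `omega` stands in for ω(v, u); the sketch only illustrates the idea of adding edges between baits and arbitrary other nodes.

```python
import random


def add_random_false_positives(sampled, baits, all_nodes, omega, rng):
    """Add spurious bait-node edges with a constant probability omega.

    sampled maps nodes to neighbor sets after the first sampling step,
    baits is the set of baits, all_nodes lists every node of the original
    network. The input network is left unmodified.
    """
    noisy = {v: set(nbrs) for v, nbrs in sampled.items()}
    for v in sorted(baits):
        for u in sorted(all_nodes):
            if u == v:
                continue  # no self-edges
            # A pair of two baits is tried once from each side, and a
            # drawn edge may coincide with one that is already present.
            if rng.random() < omega:
                noisy.setdefault(v, set()).add(u)
                noisy.setdefault(u, set()).add(v)
    return noisy


# Hypothetical example: one bait "a" with a single sampled interaction.
sampled = {"a": {"b"}, "b": {"a"}}
noisy = add_random_false_positives(
    sampled, baits={"a"}, all_nodes=["a", "b", "c", "d"],
    omega=0.1, rng=random.Random(7),
)
```

Note that nodes such as "c" and "d", which have no sampled interaction, can still enter the network through a spurious edge.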

Since preferential attachment is most likely to change the degree distribution towards a power-law distribution (Barabási and Albert, 1999), our model is based on such a scenario. For this purpose, we use an adaptation of the method described by Chung and Lu (2002) for creating random graphs with a given degree distribution (see section 2.3.2). Accordingly, ω(v, u) is defined as

$$\omega(v, u) = \theta \, \frac{(k_v + \iota)(k_u + \iota)}{\sum_{w \in V} (k_w + \iota)}. \tag{3.1}$$
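Equation 3.1 translates directly into a small helper. The sketch below is illustrative only: the name `chung_lu_omega` is hypothetical, the degrees are passed in as a plain dictionary so that the caller decides which degrees enter the formula, and the value is returned without capping, so a caller using it as a probability would have to ensure it does not exceed 1.

```python
def chung_lu_omega(degree, theta, iota):
    """Return a function omega(v, u) following equation 3.1.

    degree maps every node w in V to its degree k_w; theta and iota are
    the two parameters appearing in the equation.
    """
    # Denominator of equation 3.1: sum over all nodes of (k_w + iota).
    norm = sum(k + iota for k in degree.values())

    def omega(v, u):
        return theta * (degree[v] + iota) * (degree[u] + iota) / norm

    return omega


# Hypothetical degrees and parameter values, chosen only for illustration.
omega = chung_lu_omega({"a": 3, "b": 1, "c": 0}, theta=0.5, iota=1.0)
p_ab = omega("a", "b")  # high-degree bait, low-degree partner
p_ac = omega("a", "c")  # partner of degree 0 still has a nonzero value
```

The expression is symmetric in v and u, and with ι > 0 a node of degree 0 can still receive a spurious edge, whereas with ι = 0 its value would vanish.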