To investigate the precise function of a protein assembly, it is crucial to infer the binding patterns between all its members at high spatial and temporal resolution. Several experimental protocols and technologies, denoted in the following as assays or systems, have been developed to measure the transient and permanent features of protein interactions.
Today, protein interactions can be measured by several biophysical and genetic systems(9); while small-scale interaction assays historically focused on structural methods such as X-ray crystallography and NMR spectroscopy, modern genetic approaches neglect structural aspects of the measured interactions in favor of increased experimental throughput. In particular, binary interaction assays such as Y2H and co-complex detection methods such as (T)AP/MS are today considered the premier approaches for measuring the interactome in an orthogonal and complementary manner.(10)
Yeast Two-hybrid. Yeast Two-hybrid (Y2H) is an experimental protein interaction assay that measures direct protein interactions by means of transcriptional activity.(11) The activating (AD) and DNA-binding (BD) domains of a yeast transcription factor such as Gal4 are fused separately to two open reading frames (ORFs) that encode the pair of proteins under investigation, functionally denoted as bait and prey. The fusion constructs are transfected into yeast. Upon physical interaction of bait and prey, the transcription factor domains are brought into close proximity and induce expression of a reporter gene that has an observable phenotypic effect on the yeast cell. Due to the granularity and sensitivity of this approach, large-scale studies aided by robotics are able to interrogate tens of thousands of individual interactions in parallel.(12) However, the false positive rate of the Y2H system has been reported to be high, mostly due to biases in the experimental quantification of the interactions, artificial interactions between protein partners that are not co-located in the same sub-cellular compartment under native conditions, and occasional indirect physical interactions.(13) More recent advancements of the original Y2H system concerning the activation of reporter factors, together with statistical scoring schemes, have significantly lowered the error rate, and Y2H is now the most widely used system to measure protein interactions.(14)
Affinity purification. Affinity purification and mass spectrometry (AP/MS) has been established as a more recent physical protein interaction assay that is designed to detect co-complexing proteins and thus measures both direct and indirect physical protein interactions in a single assay.
152 from basic research to clinical applications
(15) Puig et al. (2001), Rigaut et al. (1999)
(16) Babu et al. (2012), Ewing et al. (2007), Gavin et al. (2006), Krogan et al. (2006), Polanowska et al. (2004), Veraksa et al. (2005)
(17) Blagoev et al. (2003), Ranish et al. (2004, 2003)
(18) Chang (2006), Gavin et al. (2011)
(19) Havugimana et al. (2012), Kristensen et al. (2012), Wodak et al. (2009)
(20) Kapp et al. (2005), Perkins et al. (1999)
(21) Gavin et al. (2006, 2011), Krogan et al. (2006)
(22) Breitkreutz et al. (2010), Choi et al. (2012, 2011), Jäger et al. (2012), Lavallée-Adam et al. (2011), Ong et al. (2002), Sardiu et al. (2008)
(23) Stengel et al. (2012)
Similar to Y2H, affinity purification follows a bait/prey approach that requires genetic tagging of the bait ORF with a known epitope such as GFP, StrepII, or FLAG prior to transfection into a yeast strain. The expressed protein complex is subsequently purified from the cellular medium using antibody affinity to the genetic tag. The components of the purified complex are then identified using liquid chromatography and tandem mass spectrometry (LC-MS/MS).
A prominent variant implementing this interaction detection system is tandem affinity purification (TAP/MS), which allows for several sequential purifications in order to reduce contaminating proteins prior to identification.(15) In contrast to Y2H, affinity purification systems are biased towards more stable interactions that survive purification and are thus less likely to detect transient protein interactions. Affinity purification systems have been employed in large-scale studies interrogating the protein interactome of several model organisms such as yeast, worm, and fly, as well as human.(16)
Due to their ability to measure larger protein assemblies that may involve dynamic, i.e., functionally optional or temporally variable interaction partners, (T)AP/MS approaches are also suited for measuring protein complex dynamics using experimental perturbation techniques where the same complex is assayed multiple times in different states.(17)
Affinity purification exhibits significant false positive experimental errors due to its requirement to over-express bait proteins, as well as due to off-target effects or reduced binding of antibodies, nonspecific binding events, and the loss of cellular compartmentalization of the tested proteins as a result of cell lysis.(18)
Refinements of the genetic protocols have been proposed that allow for measurements in slightly more physiological environments but are still biased towards detecting highly stable interactions.(19) Similarly, statistical approaches that better quantify the uncertainty of MS protein identification have been utilized to reduce error rates.(20) For instance, while the raw false positive rate of affinity purification systems is broadly comparable to that of Y2H assays,(21) methodologies have been proposed that quantify the abundance of the identified proteins by means of peptide spectra or differential isotope labeling of test and control purifications in order to identify contaminant proteins based on their abundance and binding specificity.(22)
Similarly, false negative errors that may be incurred when physiologically relevant, transiently attached proteins are washed away during purification, or that may result from low-abundance interactions, can partly be mitigated by specialized protocols.(23)
measuring protein interactions 153
(24) Gingras and Raught (2012)
(25) Gingras et al. (2007), Hyung and Ruotolo (2012), Moyer et al. (2006), Stengel et al. (2012)
(26) Leitner et al. (2010), Sinz (2010)
(27) Yu et al. (2011)
Interpreting protein interaction assays
Apart from suffering from specific false positive and false negative error profiles, purification data pose further challenges for interpretation:(24) even after removal of contaminants, the retrieved set of co-complexed proteins is stabilized by a mixture of direct and indirect physical interactions. It is therefore not easily possible to identify proteins that share a common binding interface, nor is it straightforward to identify proteins that might be part of multiple physiological protein complexes that are co-purified within a single purification. Although additional techniques such as binary protein interaction assays, perturbation protocols, and quantitative MS approaches involving peptide counts or isotope labeling may aid in answering these questions, no consensus on a method for integrating these approaches has yet been established. Instead, ad-hoc methods for interpreting interactions within purifications are commonly used when depositing these interactions in public repositories.
These methods, termed spoke expansion (interactions between the bait protein and all its preys) and matrix expansion (interactions between all proteins of a purification regardless of bait or prey status), are prone to false negative and false positive errors: the spoke expansion neglects possible interactions between prey proteins that are physiologically likely to stabilize the complex and furthermore assumes that all preys are in direct physical contact with the bait; the latter assumption is unlikely to hold for larger purifications due to mutually exclusive binding interfaces. For the same reason, the matrix model of expansion leads to high rates of false positive interactions, since large complexes are unlikely to be fully connected. In addition, neither expansion model delineates multiple physiological protein complexes that have been captured within the same purification.
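The two expansion models described above can be sketched in a few lines of Python. This is a minimal illustration only; the function names and the toy purification (bait "A" with preys "B", "C", "D") are hypothetical and not taken from any cited study.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Return one undirected bait-prey pair per prey of a single purification."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_expansion(bait, preys):
    """Return all undirected pairs among the bait and its preys."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait "A" pulled down preys "B", "C", and "D".
bait, preys = "A", ["B", "C", "D"]
spoke = spoke_expansion(bait, preys)
matrix = matrix_expansion(bait, preys)

# Pairs asserted only by the matrix model (prey-prey interactions).
prey_prey_only = matrix - spoke
```

For a purification with n preys, the spoke model asserts n pairs while the matrix model asserts n(n+1)/2, which is why the number of potentially false positive pairs grows quickly with purification size under the matrix model.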
While additional experimental approaches like reciprocal purifications, multiple orthogonal purification steps, or perturbation approaches may be conducted to obtain this missing information from protein purifications,(25) for example by cross-linking experiments,(26) these systems are cost-intensive and require specialized training. In addition, all known experimental methods, both high-throughput and low-throughput, influence cell physiology to different degrees and in different manners, so that experimental results are significantly biased by the assay being used. This regularly results in low reproducibility between identical protein interaction assays run by different labs as well as low overlap between the results of different experimental approaches, thereby significantly complicating the interpretation and comparison of protein interaction screens on genomic scales.(27)
(28) Synthetic lethality: a class of genetic interactions where a combination of mutations in two or more genes is lethal for the cell while a mutation in a single gene is viable. Genes in a synthetic lethality relationship may exhibit redundant functions within the cell. Synthetic lethality can be measured in genome-wide screens by disabling pairs of genes by a genetic process termed double knockout.
(29) Bader et al. (2003), Chatr-aryamontri et al. (2013), Kerrien et al. (2012), Keshava Prasad et al. (2009), Licata et al. (2012), Salwiński (2004)
(30) Turinsky et al. (2010)
(31) Aranda et al. (2011), Orchard et al. (2012), Razick et al. (2008), Turner et al. (2010)
(32) Armean et al. (2013)
(33) Spectral counts: the number of mass spectra assigned to a given protein. Given appropriate normalization, these counts can be used to estimate the abundance of proteins within a purification. Spectral counts are regularly employed to identify contaminants and compare protein abundance across technical replicates.
(34) Choi et al. (2012, 2011), Sowa et al. (2009)
(35) Gavin et al. (2002), Hart et al. (2006)
(36) Collins et al. (2007), Guruharsha et al. (2011), Yu et al. (2009)
Computational interpretation and scoring methods. Due to these biases, even refined protein interaction protocols result in more measured protein interactions than would be expected given existing biological knowledge and comparisons with orthogonal protein interaction assays. Consequently, in silico methodologies have been developed to aid in the interpretation of the measured data and to support the identification of false positive experimental results by means of statistical models, additional experimental variables, or external information such as known protein interactions, functional annotations, and additional assays measuring protein-protein or gene-gene associations, for example gene expression measurements or synthetic lethality(28) experiments.
Many computational methods that classify or quantify protein interactions with regard to their status as potential contaminants make use of a background set of known, high-confidence protein interactions deposited in public databases such as HPRD, DIP, IntAct, BioGRID, MINT, and BIND.(29) However, analyses have reported low levels of concordance between these databases, which is likely due to differing methods of data curation.(30) Recent standards by the Proteomics Standards Initiative, data querying interfaces that are able to contact multiple databases, as well as meta-databases such as iRefIndex and iRefWeb have aimed to mitigate these discrepancies.(31)
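One simple way to quantify the concordance between two interaction databases is the Jaccard index of their undirected interaction sets. The sketch below is an illustration under stated assumptions: the protein identifiers and both toy databases are hypothetical, and real comparisons additionally require mapping identifiers between databases, which is not shown here.

```python
def normalize(pairs):
    """Treat interactions as undirected so that (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def jaccard(first, second):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


# Hypothetical interaction lists from two databases.
db_one = normalize([("P1", "P2"), ("P2", "P3"), ("P3", "P4")])
db_two = normalize([("P2", "P1"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")])

overlap = jaccard(db_one, db_two)  # 2 shared pairs out of 5 distinct pairs
```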
While a broad (perhaps overly broad) range of protein interaction scoring schemes exists,(32) here we will concentrate on methods that perform primary data analysis of purifications originating from (T)AP/MS assays and do not utilize additional, external data sources such as public databases. These primary data analysis methods can be further subdivided into methods that rely only on the list of proteins identified within each purification experiment (frequency-based approaches) and methods that employ non-standard experimental data from the (T)AP/MS workflow and thus require special protocols (spectral count approaches).(33) Two computational methods utilizing such additional (T)AP/MS data are CompPASS and SAINT, both of which use spectral counts and experimental replicates in combination with statistical models based on the Poisson distribution in order to detect false positive interactions within the purification data.(34)
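To illustrate the general idea of a Poisson-based spectral count test, the following sketch computes the probability of observing at least a given spectral count for a prey if it were a background contaminant. It is a deliberately simplified illustration and does not reproduce the published scoring models; the control counts, the observed count, and the rate floor of 0.1 are all hypothetical assumptions.

```python
import math


def poisson_upper_tail(k, rate):
    """Return P(X >= k) for X ~ Poisson(rate), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


# Hypothetical spectral counts of one prey in four control purifications.
control_counts = [1, 0, 2, 1]
# Hypothetical spectral count of the same prey in the bait purification.
observed = 9

# Background rate estimated from controls; the floor avoids a zero rate
# when the prey was never seen in any control (an assumption of this sketch).
background_rate = max(sum(control_counts) / len(control_counts), 0.1)

# A small value suggests the prey is enriched beyond background.
p_value = poisson_upper_tail(observed, background_rate)
```

A prey with a count close to its background rate yields a tail probability near one and would be treated as a likely contaminant under this simple model.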
On the other hand, frequency-based socio-affinity approaches apply statistical models directly to the purification data in order to detect unreliable or promiscuous interactions based on experimental replicates.(35) This class of approaches has lately been supplemented by more elaborate methods by Collins et al., Guruharsha et al., and Yu et al. that include clustering, supervised learning, and permutation-based approaches in order to increase sensitivity.(36) The last section of this chapter will discuss advantages and disadvantages of several of these methods in more detail.
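The intuition behind frequency-based scoring can be sketched as a log-odds of observed versus expected co-occurrence across purifications. This is a simplified stand-in and not the published socio-affinity index; the function name, the pseudocount, and the toy purifications (including the promiscuous protein "HSP") are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_log_odds(purifications, pseudocount=0.5):
    """Score each co-purified pair by log(observed / expected co-occurrence)."""
    total = len(purifications)
    protein_freq = Counter()
    pair_freq = Counter()
    for members in purifications:
        unique = sorted(set(members))
        protein_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    scores = {}
    for (a, b), observed in pair_freq.items():
        # Expected co-occurrence if both proteins appeared independently.
        expected = protein_freq[a] * protein_freq[b] / total
        scores[(a, b)] = math.log((observed + pseudocount) / (expected + pseudocount))
    return scores


# Hypothetical protein lists from four purifications; "HSP" appears in most of them.
purifications = [
    ["A", "B", "HSP"],
    ["A", "B"],
    ["C", "HSP"],
    ["D", "HSP"],
]
scores = cooccurrence_log_odds(purifications)
```

In this toy data set the pair ("A", "B") co-occurs more often than expected by chance and receives a positive score, while the pair ("A", "HSP") scores below zero because "HSP" is present in most purifications regardless of bait.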
(1) Durmuş Tekir and Ülgen (2013)
(2) Lengeling et al. (2001), Stebbins (2005)
(3) Barker (2006), Dyer et al. (2008), Münter et al. (2006)
(4) Thieu et al. (2012)
(5) de Chassey et al. (2012b), Dolan et al. (2006), International Human Genome Sequencing Consortium (2004), Zhaxybayeva and Doolittle (2011)
(6) Flajolet et al. (2000), McCraith et al. (2000), Rain et al. (2001)