
Escherichia coli

E. coli is a Gram-negative, rod-shaped bacterium, approximately 1 µm wide by 2 µm long, present in the

normal flora of the intestine. Most strains are harmless, but a few serotypes can cause food

poisoning and sickness [173]. E. coli is a widely studied prokaryotic organism and is well

characterised, which often makes it the platform of choice for genetic, biochemical and metabolic

research [174]. The K‐12 strain MG1655 is fully sequenced [175] and 87% of its genome has been

functionally annotated [176]. Scarab Genomics LLC (http://www.scarabgenomics.com/) aims to

create a reduced-genome E. coli, on the premise that parts of the genome are unnecessary for function;

the reduced-genome strain is therefore expected to be more stable and more efficient, to be easier to

manipulate experimentally, and to have less redundancy [174].

E. coli has around 5000 protein-coding genes, of which fewer than 1000 are highly conserved ‘core

genes’, with the remaining genome made up of variable strain specific genes, some of which are

localised on ‘gene islands’ [177]. The variable genes are the more ‘disposable’ genes, such as those

for adaptation to a specific environment. Removal of the gene islands alone from the wild type

MG1655 (WT) strain would reduce the genome by 20% [178]. Genes were selected for deletion by

comparing E. coli genomes, on the hypothesis that genes found in only a single strain are

non‐essential, having evolved to perform strain-specific adaptations [178]. The deletions were

constructed in a known strain of E. coli by recombination mediated by bacteriophage lambda Red

[178]: the targeted gene was removed, the sequence resealed and the markers used to perform

the deletion eliminated [174]. Each deletion was confirmed by PCR and checked to ensure that growth

in minimal media was not affected [178]; it was then transferred

into the latest deletion strain by P1 transduction. The majority of the gene deletions were performed

on non‐protein coding mobile elements that mediate recombination, such as insertion sequences

and transposons, plus site specific recombinations, along with elements that provide DNA sequence

repeats that mediate inversions, duplications and deletions [179]. Removal of these elements, and of

genes for unwanted functions that are suited to a specific growth environment, will lead to a more

stable genome (better for experimental manipulation) [174] and facilitate the removal of unwanted

contaminants from drug or vaccine culture [178].

The depleted strain of E. coli used in this study, MSD66 (abbreviated to GD), has a genome reduced

by around 15%, which corresponds to 875 proteins (∆ proteins) along with non‐coding regions. Of

these 875 proteins, 14 code for insertion element proteins and 47 code for the different transposase

[…] abundances in parts per million (ppm) from three individual studies; only 160 of these ∆ proteins

have any known abundance data. One study uses a YFP (yellow fluorescent protein) fusion library

to detect individual protein molecules through fluorescence image analysis [180]. The other two use

spectral counting [181, 182], which is based on the idea that the more MS/MS spectra of a peptide

are detected, the more abundant the protein is (taking into account the length of the protein). PaxDb

creates an integrated average ppm value for the abundance of each protein, weighting each study

differently (YFP, Taniguchi et al., 2010: 10%; spectral counting, Lewis et al., 2010: 100%,

and Lu et al., 2007: 50%). Using this weighted average, the copies per cell of each protein were

calculated by multiplying the average ppm by 2.4 (based on an approximation of 2.4 million copies

of protein in every E. coli cell, so that 1 ppm corresponds to 2.4 copies). By this calculation, half of the 160

∆ proteins are present in E. coli at an abundance of over 100 copies per cell, while half are below 100

copies per cell (Figure 3.1). The mass spectrometers used in this study are sensitive enough to

identify proteins present down to approximately 100 copies per cell, but they are

unlikely to detect those present at fewer than 100 copies per cell.
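
The ppm-to-copies conversion described above can be sketched as follows. This is a minimal illustration only: the text gives the study weights and the factor of 2.4, but not how PaxDb combines the studies, so the normalised weighted mean over whichever studies report a value is an assumption, as are the function names, the study keys and the example ppm values.

```python
# Study weights as quoted in the text (YFP library 10%, Lewis 100%, Lu 50%).
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {"taniguchi_2010": 0.10, "lewis_2010": 1.00, "lu_2007": 0.50}

# The text approximates 2.4 million protein copies per cell,
# so 1 ppm corresponds to 2.4 copies per cell.
COPIES_PER_PPM = 2.4

# Approximate detection limit of the instruments, in copies per cell (from the text).
DETECTION_LIMIT = 100


def weighted_ppm(ppm_by_study):
    """Weighted mean ppm over the studies that report a value.

    Assumption: weights are renormalised over the studies present;
    the actual PaxDb integration may differ.
    """
    total_weight = sum(WEIGHTS[study] for study in ppm_by_study)
    if total_weight == 0:
        raise ValueError("no abundance data for this protein")
    weighted_sum = sum(WEIGHTS[study] * ppm for study, ppm in ppm_by_study.items())
    return weighted_sum / total_weight


def copies_per_cell(ppm):
    """Convert an abundance in ppm to copies per cell."""
    return ppm * COPIES_PER_PPM


def likely_detectable(ppm):
    """True if the protein is at or above roughly 100 copies per cell."""
    return copies_per_cell(ppm) >= DETECTION_LIMIT


# Made-up ppm values for one protein, purely for illustration.
example = {"taniguchi_2010": 20.0, "lewis_2010": 50.0, "lu_2007": 80.0}
ppm = weighted_ppm(example)  # (2 + 50 + 40) / 1.6, about 57.5 ppm
print(round(ppm, 2), round(copies_per_cell(ppm), 1), likely_detectable(ppm))
```

Under these assumptions a protein needs a weighted abundance of roughly 42 ppm (100 / 2.4) to reach the 100 copies per cell threshold.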

One set of deleted genes belongs to the flagellar complex, which provides the

bacterium with motility and can sense external factors such as sugars and amino acids [183]. The

flagellum is made up of proteins, which create four helical filaments that extend out of the cell. A

hook, attached to the filament, sits just outside the outer membrane, and a rod or shaft runs

through the cell membranes and wall, with protein rings along the shaft that act as bearings (Figure

3.2A) [183]. The flagellum is powered by a rotary motor in the cell envelope, which uses the

movement of hydrogen ions across the cell membrane, driven by the difference in their concentration

inside and outside the cell (the proton motive force), as its source of power. The flow of hydrogen ions

into the cell powers the flagellum to turn both clockwise (backwards) and

anti‐clockwise (forwards), enabling the cells to move towards areas of the environment that are

more favourable to survival [183]. If nutrients are plentiful the cell no longer needs to be motile or

perform chemotaxis, so the creation of the flagellar complex is suppressed [183]. In total the

flagellum is made up of 27 individual protein components (Figure 3.2B), with 26 other proteins

contributing to the formation of the complex without being present in the final structure itself. These

proteins contribute by facilitating assembly (4 proteins), chaperoning proteins to

the complex (3 proteins), acting as sigma factors to regulate necessary genes (5 proteins),

taking part in chemotaxis (5 proteins), transducing signals (2 proteins) or acting as

response regulators (4 proteins); three proteins have an unknown function [183]. The GD

strain has had all these proteins deleted apart from two methyl-accepting chemotaxis proteins.

Figure 3.1. PaxDb abundance data for the 875 ∆ proteins in the GD strain.

Abundance in copies per cell for the 875 ∆ proteins in the GD strain, from the Taniguchi et al. 2010 YFP fusion library study and the Lewis et al. 2010 and Lu et al. 2007 spectral counting studies, combined and weighted 10%, 100% and 50% respectively. Calculated by the PaxDb database, http://pax‐db.org.

Figure 3.2. E. coli flagellar complex.

A) Structural features of the flagellar complex. Figure labels: filament, cap, junction, hook, L ring, P ring, rod, MS ring, C ring, motor, type III secretion system; extracellular, outer membrane, cell wall, periplasmic space, inner membrane, cytoplasm.

B) The individual protein components that make up the final complex structure. Figure labels: FliC, FliD, FlgL, FlgK, FlgE, FlgI, FlgH, FliG, FliM, FliN, FliH, FliJ, FliI, MotA, MotB, FlgB, FliE, FlgC, FlgF, FlgG, FliF, FlhA, FlhB, FliO, FliP, FliQ, FliR.

quantification data, and only two of these proteins have an abundance of over 100 copies per cell,

indicating these proteins may not be detected in the WT strain (Figure 3.3).

An interesting question is not only whether the gene deletions can be confirmed with proteomic techniques,

but also what the consequences of these deletions are for the remaining proteome. Does the GD

strain have to adapt to compensate for these deletions, or were they genuinely unnecessary genes?

Quantitative proteomics

Bottom-up, label-free quantification, using both data-dependent (LTQ Orbitrap Velos, Thermo) and

data-independent (MSE, Synapt™, Waters) mass spectrometry, was used to compare the proteomes of the two strains.

Samples were prepared for analysis through lysis and whole-proteome in-solution tryptic proteolysis,

including a RapiGest™ denaturing step, plus reduction and alkylation steps. The label-free approach was chosen because

there is no label addition, which reduces experimental cost. Both E. coli strains were grown in three

biological replicates to ensure any changes observed in the proteome were reproducible. After

proteolysis the biological replicates were analysed in triplicate on the mass spectrometers. This

increases both the number and confidence of the identifications and quantifications, so lowering the

false discovery rate (FDR), a value that indicates the expected proportion of false positive

identifications in any set of identifications.

Absolute label-free quantification was performed using the QuanToF Synapt™ instruments from

Waters, which use a novel ion sampling and fragmentation method known as MSE. With

the addition of a known protein at a known concentration spiked into the E. coli whole-proteome

tryptic digest, MSE can be used to perform absolute quantification of all the proteins it

identifies. The Synapt™ G2 is the second-generation QuanToF instrument and has a

resolution of 30,000, increased from 10,000 in the Synapt™ G1, leading to greater mass accuracy

over both the low-energy and high-energy scans [161]. Both QuanToF instruments contain an ion

mobility travelling wave ion guide cell [161], allowing the precursor and fragment ions to be

subjected to a second separation (after the chromatography) based on their m/z and their

interactions with a travelling wave created by voltages across the cell, with a buffer gas providing

resistance. This extra mobility separation step is performed over 30 ms and allows

the instrument to cover a larger sample dynamic range (10⁵), adding new identifications and

quantifications throughout the quantification range (full explanation of Synapt™ mass spectrometers

in chapter 1.5).

ProteinLynx Global Server (PLGS) is the instrument vendor software package for the Synapt™

Figure 3.3. PaxDb abundance data for the flagellar ∆ proteins in the GD strain.

Abundance in copies per cell for the 51 flagellar ∆ proteins in the GD strain, from the Taniguchi et al. 2010 YFP fusion library study and the Lewis et al. 2010 and Lu et al. 2007 spectral counting studies, combined and weighted 10%, 100% and 50% respectively. Calculated by the PaxDb database, http://pax‐db.org.

quantification calculations using the Hi3 methodology (full explanation of the software and

quantification in chapter 2.19). ISOQuant is a quantitation software package that can perform

quantification calculations on MSE data sets processed using PLGS. Its main advantage over the

quantification performed by PLGS is that ISOQuant treats the data set as a whole, rather than as

individual replicates: it aligns the processed raw data using the apex retention time, allowing

proteins to be identified across replicates. Quantification of a protein is performed with the top

three unique peptides, eliminating problems with isomers, homologous proteins and shared peptide

identifications. This is explained in full in Chapter 2.20.
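
The idea behind top-three (Hi3) absolute quantification with a spiked standard can be sketched as below. This is an illustrative outline, not the PLGS or ISOQuant implementation: it assumes that the mean intensity of the three most intense peptides scales with the molar amount of protein, and the function names, the femtomole unit and the intensity values are all hypothetical.

```python
def hi3_signal(peptide_intensities):
    """Mean intensity of the three most intense peptides of one protein."""
    top3 = sorted(peptide_intensities, reverse=True)[:3]
    if len(top3) < 3:
        raise ValueError("Hi3 needs at least three quantified peptides")
    return sum(top3) / 3


def hi3_amount_fmol(protein_peptides, standard_peptides, standard_fmol):
    """Estimate the on-column amount of a protein from a spiked standard.

    The standard's Hi3 signal per fmol serves as a universal response
    factor (an assumption of the Hi3 approach).
    """
    response_per_fmol = hi3_signal(standard_peptides) / standard_fmol
    return hi3_signal(protein_peptides) / response_per_fmol


# Made-up intensities: a standard spiked at 50 fmol and one E. coli protein.
standard = [900.0, 600.0, 300.0, 100.0]  # Hi3 signal = 600
protein = [300.0, 240.0, 180.0, 50.0]    # Hi3 signal = 240
print(hi3_amount_fmol(protein, standard, standard_fmol=50.0))
```

Restricting the calculation to peptides unique to one protein, as the text describes for ISOQuant, would be done before the intensities are passed to these functions.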

The LTQ Orbitrap Velos uses the more traditional mass spectrometric acquisition mode for bottom-up

proteomics, data-dependent acquisition. Here a survey scan (MS1) is performed, from

which the precursor ions are selected and taken forward for fragmentation in an MS2 scan. The LTQ

Orbitrap Velos contains a high-resolution (30,000), high mass accuracy (<1 ppm) Orbitrap analyser for

the precursor ion scan (MS1); the top 20 most intense precursor ions are then selected and sent

back into the linear ion trap for fragmentation through CID (MS2). The ability to perform MS1 and

MS2 scans simultaneously in the different analysers, plus the removal of the pre-scan and the

modification to a dual-pressure ion trap, has created a high scanning speed of 20 MS/MS

spectra per second, with no loss of resolution [158]. The Orbitrap gives this instrument its high

resolution and large dynamic range capabilities, leading to the identification of huge numbers of

proteins within samples (full explanation of mass spectrometer in chapter 1.5). The software

package chosen to perform label free analysis of the LTQ Orbitrap Velos data is Progenesis LC‐MS.

This software is explained in full in chapter 2.21.
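
The precursor selection step of data-dependent acquisition described above can be sketched as a simple top-N pick from the survey scan. This is a simplified illustration rather than the instrument's control logic: the peak representation as (m/z, intensity) pairs, the function name and the `min_intensity` threshold are assumptions made for the example.

```python
def select_precursors(survey_scan, top_n=20, min_intensity=0.0):
    """Return the m/z values of the top_n most intense MS1 peaks.

    survey_scan: iterable of (mz, intensity) pairs from one MS1 scan.
    min_intensity: hypothetical threshold below which peaks are ignored.
    """
    candidates = [peak for peak in survey_scan if peak[1] > min_intensity]
    ranked = sorted(candidates, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


# Made-up survey scan with four peaks; pick the two most intense.
scan = [(400.2, 1.0e4), (512.3, 5.0e5), (623.8, 2.0e3), (745.4, 8.0e4)]
print(select_precursors(scan, top_n=2))
```

Each selected m/z would then be isolated and fragmented in its own MS2 scan, which is why the choice of N trades survey-scan frequency against the number of peptides sampled per cycle.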

Data acquisition on the Synapt™ G2 and LTQ Orbitrap Velos is completely different: the Orbitrap Velos is

data-dependent and based on an ion trap, whereas the Synapt™ G2 is data-independent and a time-of-flight

instrument, in addition to the unique features of MSE and the Orbitrap analyser. However, both instruments are

high resolution with high mass accuracy and have a fast duty cycle. One objective of this study was

to compare the respective results to see how well label free technologies correlate across

instruments.
