Escherichia coli
E.coli is a Gram negative, rod shaped bacterium, approximately 1µm wide by 2µm long, present in the
normal flora of the intestine. Most strains are harmless, but a few serotypes can cause food
poisoning and sickness [173]. E.coli is a widely studied prokaryotic organism as it is well
characterised, making it often the platform of choice for genetic, biochemical and metabolic
research [174]. The K‐12 strain MG1655 is fully sequenced [175] and 87% of its genome has been
functionally annotated [176]. Scarab Genomics LLC (http://www.scarabgenomics.com/) aim to
create a reduced genome E.coli, on the premise that parts of the genome are unnecessary for
function, so that a reduced genome strain should be more stable, more efficient, easier to
manipulate experimentally and less redundant [174].
E.coli has around 5000 protein coding genes, of which fewer than 1000 are highly conserved ‘core
genes’, with the remaining genome made up of variable strain specific genes, some of which are
localised on ‘gene islands’ [177]. The variable genes are the more ‘disposable’ genes, such as those
for adaptation to a specific environment. Removal of the gene islands alone from the wild type
MG1655 (WT) strain would reduce the genome by 20% [178]. Genes selected for deletion were
chosen by comparing E.coli genomes, with the hypothesis that genes found in only a single strain
were non‐essential, having evolved to perform strain specific adaptations [178]. Each deletion was
constructed in a known strain of E.coli by recombination mediated by bacteriophage lambda red
[178], with the targeted gene removed, the sequence resealed and the markers used to perform the
deletion eliminated [174]. PCR was used to confirm each deletion, and growth in minimal media was
checked to confirm that it was not affected [178]; the deletion was then transferred into the
latest deletion strain by P1 transduction. The majority of the gene deletions were performed
on non‐protein coding mobile elements that mediate recombination, such as insertion sequences
and transposons, plus site specific recombinations, along with elements that provide DNA sequence
repeats that mediate inversions, duplications and deletions [179]. Removal of these elements,
together with genes for unwanted functions that are suited only to a specific growth environment,
will lead to a more stable genome (better for experimental manipulation) [174] and facilitate the
removal of unwanted contaminants from drug or vaccine culture [178].
The depleted strain of E.coli used in this study, MSD66 (abbreviated to GD), has a genome reduced
by around 15%, which corresponds to 875 proteins (∆ proteins) along with non‐coding regions. Of
these 875 proteins, 14 code for insertion element proteins and 47 code for the different transposase
abundances in parts per million (ppm) from three individual studies, only 160 of these ∆ proteins
have any known abundance data. One study uses a YFP (yellow fluorescent protein) fusion library
to detect individual protein molecules through fluorescence image analysis [180]. The other two use
spectral counting [181, 182], which is based on the idea that the more MS/MS spectra of a protein's
peptides are detected, the more abundant the protein is (taking into account the length of the
protein). PaxDb creates an integrated average ppm value for the abundance of each protein, with
each study weighted differently (YFP (Taniguchi et al, 2010) 10%, spectral counting (Lewis et al,
2010) 100% and (Lu et al, 2007) 50%). Using this weighted average, the copies per cell of each
protein were calculated by multiplying the average ppm by 2.4 (based on an approximate average of
2.4 million copies of protein in every E.coli cell). This calculation showed that half of the 160
∆ proteins are present in E.coli at an abundance of over 100 copies per cell, while half are below
100 copies per cell (Figure 3.1). The mass spectrometers used in this study are sensitive enough to
identify proteins present down to approximately 100 copies per cell, but proteins present at less
than 100 copies per cell are unlikely to be detected by these instruments.
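The conversion from ppm to copies per cell described above can be sketched as follows. This is a minimal illustration only: it assumes the integrated value is a simple normalised weighted mean over the studies that report a protein, which may not match how PaxDb computes its integrated score, and the function and study names are hypothetical.

```python
# Study weights as stated in the text (10%, 100%, 50%).
# The dictionary keys are hypothetical labels, not PaxDb identifiers.
WEIGHTS = {"taniguchi_2010": 0.10, "lewis_2010": 1.00, "lu_2007": 0.50}

# Approximate number of protein copies per E.coli cell, in millions.
PROTEINS_PER_CELL_MILLIONS = 2.4

# Approximate detection limit of the instruments, in copies per cell.
DETECTION_LIMIT_COPIES = 100


def weighted_ppm(ppm_by_study):
    """Weighted mean ppm over the studies that report the protein.

    Assumption: a plain normalised weighted mean; studies with no
    value (None) are left out of both numerator and denominator.
    """
    present = {s: v for s, v in ppm_by_study.items() if v is not None}
    if not present:
        return None
    total_weight = sum(WEIGHTS[s] for s in present)
    return sum(WEIGHTS[s] * v for s, v in present.items()) / total_weight


def copies_per_cell(ppm):
    """Convert ppm to copies per cell (ppm x 2.4 million / 1 million)."""
    return ppm * PROTEINS_PER_CELL_MILLIONS


def likely_detectable(ppm):
    """True if the protein is at or above roughly 100 copies per cell."""
    return copies_per_cell(ppm) >= DETECTION_LIMIT_COPIES


if __name__ == "__main__":
    # Invented ppm values for a single protein, for illustration only.
    example = {"taniguchi_2010": 80.0, "lewis_2010": 40.0, "lu_2007": 60.0}
    ppm = weighted_ppm(example)
    print(ppm, copies_per_cell(ppm), likely_detectable(ppm))
```

Under this conversion, the 100 copies per cell detection limit corresponds to an integrated abundance of roughly 42 ppm.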
One set of genes that have been deleted belong to the flagellar complex, which provides the
bacteria with motility and can sense external factors such as sugars and amino acids [183]. The
flagellum is made up of proteins, which create four helical filaments that extend out of the cell. A
hook is present just outside the outer membrane attached to the filament, a rod or shaft runs
through the cell membranes and wall with protein rings along the shaft that act as bearings (Figure
3.2A)[183]. The flagellum is powered by a rotary motor in the cell envelope, which uses as its
source of power the movement of hydrogen ions across the cell membrane, driven by the differing
concentrations inside and outside the cell (proton motive force). The hydrogen ions move into the
cell, which powers the flagellum to turn both clockwise (backwards) and anti‐clockwise (forwards),
thus enabling the cells to move towards areas of the environment that are
more favourable to survival [183]. If nutrients are plentiful the cell no longer needs to be motile or
perform chemotaxis, so the creation of the flagella complex is suppressed [183]. In total the
flagellum is made up of 27 individual protein components (Figure 3.2B), with 26 other proteins
contributing to the formation of the complex that are not present in the final structure itself. These
proteins contribute either through facilitating assembly (4 proteins), chaperoning proteins to
the complex (3 proteins), acting as sigma factors to regulate necessary genes (5 proteins),
through chemotaxis (5 proteins), transducing signals (2 proteins) or acting as response
regulators (4 proteins); three proteins have an unknown function [183]. The GD
strain has had all these proteins deleted apart from two methyl accepting chemotaxis proteins.
Figure 3.1. PaxDb abundance data for the 875 ∆ proteins in the GD strain.
Abundance in copies per cell for the 875 ∆ proteins in the GD strain, from the Taniguchi et al. 2010 YFP fusion library study and the Lewis et al. 2010 and Lu et al. 2007 spectral counting studies, combined and weighted 10%, 100% and 50% respectively. Calculated by the PaxDb database, http://pax‐db.org.
Figure 3.2. E.coli flagella complex.
A) Structural features of the flagella complex: filament, hook, junction, cap, L ring, P ring, rod, MS ring, C ring, motor and type III secretion system, shown relative to the extracellular space, outer membrane, cell wall, periplasmic space, inner membrane and cytoplasm.
B) The individual protein components that make up the final complex structure: FliC, FliD, FlgL, FlgK, FlgE, FlgI, FlgH, FliG, FliM, FliN, FliH, FliJ, FliI, MotA, MotB, FlgB, FliE, FlgC, FlgF, FlgG, FliF, FlhA, FlhB, FliO, FliP, FliQ and FliR.
quantification data, and only two of these proteins have an abundance of over 100 copies per cell,
indicating these proteins may not be detected in the WT strain (Figure 3.3).
An interesting question is not only can we confirm the gene deletions with proteomic techniques,
but also what are the consequences of these deletions for the remaining proteome? Does the GD
strain have to adapt to compensate for these deletions, or were they genuinely unnecessary genes?
Quantitative proteomics
Bottom up, label free quantification, using both data dependent (LTQ Orbitrap Velos, Thermo) and
data independent (MSE, Synapt™, Waters) mass spectrometry, was used to compare the proteomes of
the WT and GD strains. Samples were prepared for analysis through lysis and whole proteome in
solution tryptic proteolysis, including a RapiGest™ denaturing step, plus reduction and alkylation
steps. A label free approach was chosen because no label needs to be added, which reduces
experimental cost. Both E.coli strains were grown in three biological replicates to ensure any
changes observed in the proteome were reproducible. After proteolysis the biological replicates
were analysed in triplicate on the mass spectrometers. This increases both the number and
confidence of the identifications and quantifications, so lowering the false discovery rate (FDR),
a value that indicates the expected proportion of false positive identifications in any set of
identifications.
Absolute label free quantification was performed using the QuanToF Synapt™ instruments from
Waters, which use a novel ion sampling and fragmentation method known as MSE. With
the addition of a known protein at a known concentration spiked into the E.coli whole proteome
tryptic digest, MSE can be used to perform absolute quantification of all the proteins it
identifies. The Synapt™ G2 is the second generation QuanToF instrument and has an increased
resolution of 30,000, up from 10,000 in the Synapt™ G1, leading to greater mass accuracy
over both the low energy and high energy scans [161]. Both QuanToF instruments contain an ion
mobility travelling wave ion guide cell [161], allowing the precursor and fragment ions to be
subjected to a second separation (after the chromatography) based on their m/z and their
interactions with a travelling wave created by voltages across the cell, with a buffer gas providing
resistance. This extra mobility separation step is performed over 30ms and allows the instrument
to cover a larger sample dynamic range (10^5), adding new identifications and
quantifications throughout the quantification range (full explanation of Synapt™ mass spectrometers
in chapter 1.5).
Figure 3.3. PaxDb abundance data for the flagellar ∆ proteins in the GD strain.
Abundance in copies per cell for the 51 flagellar ∆ proteins in the GD strain, from the Taniguchi et al. 2010 YFP fusion library study and the Lewis et al. 2010 and Lu et al. 2007 spectral counting studies, combined and weighted 10%, 100% and 50% respectively. Calculated by the PaxDb database, http://pax‐db.org.
Protein Lynx Global Server (PLGS) is the instrument vendor software package for the Synapt™
instruments and performs quantification calculations using the Hi3 methodology (full explanation
of the software and quantification in chapter 2.19). ISOQuant is a quantitation software package
that can perform quantification calculations on MSE data sets processed using PLGS. Its main
advantage over the quantification performed by PLGS is that ISOQuant treats the data set as a
whole rather than as individual replicates: it aligns the processed raw data using the apex
retention time, to allow identification of proteins across replicates. Quantification of a protein
is performed with the top three unique peptides, eliminating problems with isomers, homologous
proteins and shared peptide identifications. This is explained in full in chapter 2.20.
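The top three peptide approach can be illustrated with a short sketch. It assumes the commonly described form of the calculation, in which the mean intensity of the three most intense peptides of a protein is scaled against that of the spiked standard of known amount. It is not the PLGS or ISOQuant implementation, and the function names are hypothetical.

```python
def top3_mean(peptide_intensities):
    """Mean intensity of the three most intense peptides.

    Returns None when fewer than three peptides are available, since
    the protein cannot then be quantified by this approach.
    """
    top = sorted(peptide_intensities, reverse=True)[:3]
    if len(top) < 3:
        return None
    return sum(top) / 3


def top3_amount(protein_intensities, standard_intensities, standard_amount):
    """Estimate a protein amount relative to a spiked standard.

    standard_amount is the known amount of the spiked protein (for
    example in fmol on column); the result is in the same unit.
    Assumption: signal response per mole is the same for all proteins.
    """
    protein_signal = top3_mean(protein_intensities)
    standard_signal = top3_mean(standard_intensities)
    if protein_signal is None or not standard_signal:
        return None
    return standard_amount * protein_signal / standard_signal


if __name__ == "__main__":
    # Invented peptide intensities, for illustration only.
    print(top3_amount([300.0, 200.0, 100.0, 50.0], [400.0, 400.0, 400.0], 50.0))
```

Restricting the input to peptides unique to one protein, as ISOQuant does, would be a filtering step applied before `top3_mean` is called.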
The LTQ Orbitrap Velos uses data dependent acquisition, the more traditional mass spectrometric
acquisition mode for bottom up proteomics. A survey scan (MS1) is performed, from which precursor
ions are selected and taken forward for fragmentation in an MS2 scan. The LTQ
Orbitrap Velos contains a high resolution (30,000), high mass accuracy (<1ppm) Orbitrap analyser for
the precursor ion scan (MS1); the 20 most intense precursor ions are then selected and sent
back into the linear ion trap for fragmentation through CID (MS2). The ability to perform MS1 and
MS2 scans simultaneously in the different analysers, plus the removal of the pre scan and the
modification to a dual pressure ion trap, has created a high scanning speed of 20 MS/MS
spectra per second with no loss of resolution [158]. The Orbitrap gives this instrument its high
resolution and large dynamic range capabilities, leading to the identification of large numbers of
proteins within samples (full explanation of mass spectrometer in chapter 1.5). The software
package chosen to perform label free analysis of the LTQ Orbitrap Velos data is Progenesis LC‐MS.
This software is explained in full in chapter 2.21.
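The precursor selection step of data dependent acquisition can be sketched as below. This is a simplified illustration with hypothetical names: it ranks the peaks of one survey scan by intensity and keeps the top 20, and it leaves out features a real instrument method would normally apply, such as dynamic exclusion and charge state filtering.

```python
def select_precursors(survey_scan, top_n=20):
    """Return the m/z values of the top_n most intense MS1 peaks.

    survey_scan is a list of (mz, intensity) tuples from one MS1 scan.
    The selected precursors are returned in order of decreasing
    intensity, the order in which they would be sent for MS2.
    """
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:top_n]]


if __name__ == "__main__":
    # Invented peaks, for illustration only.
    scan = [(400.2, 1.0e4), (512.3, 5.0e5), (622.8, 2.0e3), (701.4, 9.0e4)]
    print(select_precursors(scan, top_n=2))
```

Because selection depends on intensity within each survey scan, low abundance peptides that co-elute with many more intense ions may never be fragmented, which is one reason the two acquisition modes can give different identifications.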
The Synapt™ G2 and LTQ Orbitrap Velos acquire data in completely different ways: one is data
dependent and the other is not, one is a time of flight instrument and the other an ion trap, and
each has its own unique feature, MSE and the Orbitrap analyser respectively. However, both
instruments are high resolution with high mass accuracy and have a fast duty cycle. One objective
of this study was to compare the respective results to see how well label free technologies
correlate across instruments.
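One simple way to express such a comparison is a Pearson correlation of log transformed abundances for the proteins quantified on both platforms. The sketch below is illustrative only: the choice of statistic and the function names are assumptions, not a description of the analysis performed in this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def cross_instrument_correlation(abundance_a, abundance_b):
    """Correlate log10 abundances of proteins seen on both instruments.

    Each argument maps a protein identifier to a positive abundance.
    Proteins missing from either data set, or with a non-positive
    value, are ignored. Returns None with fewer than two shared proteins.
    """
    shared = sorted(
        p for p in set(abundance_a) & set(abundance_b)
        if abundance_a[p] > 0 and abundance_b[p] > 0
    )
    if len(shared) < 2:
        return None
    log_a = [math.log10(abundance_a[p]) for p in shared]
    log_b = [math.log10(abundance_b[p]) for p in shared]
    return pearson(log_a, log_b)


if __name__ == "__main__":
    # Invented abundances, for illustration only.
    synapt = {"P1": 10.0, "P2": 100.0, "P3": 1000.0}
    orbitrap = {"P1": 20.0, "P2": 200.0, "P3": 2000.0, "P4": 5.0}
    print(cross_instrument_correlation(synapt, orbitrap))
```

The log transformation is used because protein abundances span several orders of magnitude, so an untransformed correlation would be dominated by the few most abundant proteins.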