
The diffusion map representation described in Section 5.6 demonstrates that the HSPC dataset recapitulates the structure of the haematopoietic hierarchy. To confirm the apex of the hierarchy, molecular overlap cells (MolO cells) were projected onto the atlas (Fig. 5.8A). Wilson et al. identified the MolO cells while investigating a sorting strategy that enriches for functional LT-HSCs with high purity (N. K. Wilson et al. 2015). The MolO cells are HSCs that share a transcriptional profile and have an increased probability of long-term multilineage repopulation upon single-cell transplantation. When projected onto the landscape, MolO cells sat at the top of the structure with the most primitive cells, as expected. Cells belonging to the MEP and LMPP populations lay at the ends of the landscape, and intermediate populations such as MPPs and PreMegEs were positioned between the highlighted cell types.

In Chapter 3, the diffusion map was used to capture cells on differentiation trajectories towards mature cell types. Motivated by the structure of the hierarchy described by the qRT-PCR data, pseudotime analysis was performed to better understand the transcriptional changes occurring throughout differentiation. Pseudotime orders cells based on their gene expression profiles to infer their position in differentiation, which can be used to construct differentiation trajectories through single-cell expression data (Ocone et al. 2015). Coordinates on the diffusion map were used to identify cells on trajectories from HSCs to MEPs and LMPPs (Fig. 5.8B). Cells were assigned to two broad branches and were ordered in pseudotime using the Wanderlust algorithm (Bendall et al. 2014).
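As a minimal sketch of the idea behind graph-based pseudotime ordering, the Python example below ranks cells by their shortest-path distance from a chosen root cell over a k-nearest-neighbour graph built on low-dimensional coordinates. It is a simplified illustration and not the Wanderlust implementation used in this chapter: Wanderlust additionally averages over an ensemble of graphs and refines distances using waypoint cells. The toy coordinates, the function names `knn_graph` and `pseudotime`, and the value of `k` are hypothetical.

```python
import heapq
import math


def knn_graph(coords, k):
    """Build a symmetric k-nearest-neighbour graph weighted by Euclidean distance."""
    n = len(coords)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        neighbours = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        for d, j in neighbours[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def pseudotime(coords, root, k=2):
    """Shortest-path distance from the root cell, scaled so the furthest cell is 1."""
    graph = knn_graph(coords, k)
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:  # Dijkstra's algorithm
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    longest = max(dist.values()) or 1.0
    # Cells that cannot be reached from the root receive an infinite pseudotime.
    return [dist.get(i, math.inf) / longest for i in range(len(coords))]


# Hypothetical 2-D coordinates (e.g. two diffusion components), listed in shuffled order.
toy_coords = [(2.0, 0.1), (0.0, 0.0), (4.0, 1.0), (1.0, 0.2), (3.0, 0.5)]
pt = pseudotime(toy_coords, root=1, k=2)  # cell 1 plays the role of the stem cell
order = sorted(range(len(toy_coords)), key=lambda i: pt[i])
print(order)
```

In this toy case the root cell receives pseudotime 0 and the remaining cells are ordered by how far they lie from it along the graph, which is the property exploited when expression is plotted against pseudotime.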

The expression of transcription factors was visualised through the pseudotime progression for both MEP and LMPP trajectories (Fig. 5.8C). Distinct expression patterns were noted between the two trajectories. On the MEP trajectory, Nkx2.3, Meis1 and Pbx1 expression decreased as Gata1, Gfi1b and Ikzf1 expression increased, consistent with the negative correlation seen between these genes in Fig. 5.6. Along the LMPP trajectory, genes important in lymphoid development, such as Notch and Ets1, increased, whereas genes important in HSC characteristics, such as Prdm16 and Hoxb4, decreased early in the trajectory. The key erythroid gene Gata1 was not expressed at all along this trajectory. Overall, fewer genes showed increased expression on the LMPP trajectory than on the MEP trajectory, once again reflecting the myeloid-erythroid bias of the gene set. Pseudotime ordering demonstrates the gene expression dynamics occurring in haematopoiesis and suggests that the data could be further used to investigate regulatory networks along the trajectories.

Figure 5.8. Pseudotime ordering reveals two differentiation trajectories in the single-cell HSPC data. (A) Projection of MolO cells onto the qRT-PCR dataset using a diffusion plot visualisation. MEPs and LMPPs are highlighted. MolO cells – purple; MEP – red; LMPP – blue. (B) Differentiation trajectories from stem cells to MEPs or LMPPs. Cells are coloured by their pseudotime value, moving from blue (early in pseudotime) to red (late in pseudotime). Cells not in the trajectory are grey. (C) Heatmaps showing the expression of transcription factor encoding genes. Cells are ordered along the pseudotime trajectories towards MEP or LMPP fates. The colour bar at the top of each heatmap indicates the cell types along each trajectory. HSC – purple; FSR-HSC – forest green; MPP – light blue; CMP – yellow green; LMPP – blue; PreMegE – brown; MEP – red; GMP – yellow. Figure was generated by Fiona Hamey and modified by Sonia Shaw.

5.9. Conclusions

In this chapter, single-cell gene expression profiles were generated using the Fluidigm BioMark™ platform to explore heterogeneity and regulatory relationships within the haematopoietic hierarchy. A qRT-PCR dataset of 2,167 cells was generated, spanning HSCs and early progenitors. The dataset was then explored using dimensionality-reduction methods, correlation analysis and pseudotime ordering.

This investigation built on a pre-existing dataset which included HSCs, FSR-HSCs, and four additional progenitor populations: CMPs, GMPs, MEPs and LMPPs (N. K. Wilson et al. 2015). The gene set was handpicked to include 33 transcription factor encoding genes important in HSPC biology, as well as 12 other genes implicated in HSC function. While this was already a large dataset with good coverage of early haematopoiesis, it did not include intermediate progenitor populations that occur during the differentiation process. Including these intermediate populations gave a more complete picture of early haematopoietic differentiation, which proved to be useful for inferring differentiation trajectories and regulatory networks. The three additional populations isolated were FSR-HSCs, MPPs, and PreMegEs. These populations were chosen because FSR-HSCs and MPPs should have multi-lineage potential without the capability of reconstituting a mouse long-term, and PreMegEs are an early precursor of megakaryocytic, erythroid, or mixed colonies (Pronk et al. 2007; Cabezas-Wallscheid et al. 2014).

The HSPCs were visualised using three dimensionality-reduction methods: PCA, t-SNE, and diffusion maps. All three methods recapitulated the haematopoietic hierarchy but gave varying levels of resolution in terms of heterogeneity and gene expression relationships. The PCA plots showed little separation of the four HSC isolation strategies, whereas HSC4 was more separated from the other strategies in both the t-SNE and diffusion map plots. As PCA only recognises linear relationships in the data, it will miss any non-linear relationships and therefore may not provide the most suitable visualisation for more complex structures, such as single-cell data. t-SNE is useful for visualising highly heterogeneous data and positions cells with similar expression profiles close together. In this HSPC dataset, the t-SNE plot positioned the MEPs and LMPPs further away from each other. However, while t-SNE plots are often used to represent heterogeneous datasets, they are stochastic and may struggle to display continuous processes such as differentiation. Diffusion maps have been adapted to specifically deal with single-cell expression data (Coifman et al. 2005; Haghverdi, Buettner, and Theis 2015). When this dataset was visualised using diffusion maps, the LMPPs were clearly separated from PreMegEs and MEPs. Based on the structure of the data, the diffusion map was the best method for recapitulating the structure of the haematopoietic hierarchy.
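To make the diffusion map idea more concrete, the following Python sketch computes the first non-trivial diffusion component for a handful of points. It assumes a plain Gaussian kernel with a fixed width `sigma` and uses power iteration on the symmetrically normalised kernel. This is a textbook-style construction under those assumptions, not the single-cell adaptation cited above, and the function name `diffusion_component` and the toy points are hypothetical.

```python
import math


def diffusion_component(points, sigma=1.0, n_iter=500):
    """Return the first non-trivial diffusion component for a small set of points."""
    n = len(points)
    # Gaussian kernel: similar cells get weights near 1, dissimilar cells near 0.
    kernel = [
        [math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
        for p in points
    ]
    degree = [sum(row) for row in kernel]
    # S = D^-1/2 K D^-1/2 has the same eigenvalues as the transition matrix T = D^-1 K.
    sym = [
        [kernel[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    # The leading eigenvector of S is proportional to sqrt(degree) and is uninformative.
    total = math.sqrt(sum(degree))
    top = [math.sqrt(d) / total for d in degree]
    # Power iteration with the leading eigenvector projected out at every step.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        overlap = sum(t * v for t, v in zip(top, vec))
        vec = [v - overlap * t for t, v in zip(top, vec)]
        vec = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]
    # Convert back to an eigenvector of the transition matrix T.
    return [v / math.sqrt(d) for v, d in zip(vec, degree)]


# Hypothetical cells spaced along a single continuous "differentiation" axis.
toy_cells = [(float(i), 0.0) for i in range(6)]
component = diffusion_component(toy_cells)
ranking = sorted(range(len(toy_cells)), key=lambda i: component[i])
print(ranking)
```

For points that lie along one continuous path, the component places the two ends of the path at opposite extremes, which illustrates why this kind of embedding suits continuous processes. The sign of the component is arbitrary, so the ranking may run in either direction.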

Unsupervised hierarchical clustering was used to group cells based on their gene expression and clearly separated the more mature progenitors and HSC populations into two distinct clusters. However, within the HSC cluster, there was a great amount of heterogeneity and overlap with FSR-HSC, LMPP, CMP and GMP populations. The clustering showed that the four isolation strategies used for HSCs do overlap but vary in their functional purity. The four strategies capture different “contaminating” cells, i.e. non-HSC cells, and the frequency and nature of these contaminating cells depend on the sorting strategy used. Correlation analysis was performed on all 2,167 cells together as well as the individual populations to examine regulatory relationships between the different populations. Previously published positive interactions were observed between Scl and Gata2 and between Gata2 and Gfi1b, corroborating the accuracy of this dataset (Moignard et al. 2013; J. E. Pimanda et al. 2007). Furthermore, negative correlations were observed between genes involved in opposing branches of haematopoietic differentiation, such as Notch and Gata1, which are genes involved in the lymphoid and erythroid lineages, respectively (Pui et al. 1999; Hamlett et al. 2008).
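The pairwise correlation analysis can be illustrated with a short Python sketch. The correlation measure is not restated in this section, so Pearson's coefficient is assumed here purely for illustration. The expression values below are invented toy numbers and not measurements from the dataset, and the function names `pearson` and `pairwise_correlations` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a gene with constant expression
    return cov / (sd_x * sd_y)


def pairwise_correlations(expression):
    """Correlate every pair of genes; keys are alphabetically ordered gene pairs."""
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in combinations(sorted(expression), 2)
    }


# Invented per-cell expression values for three genes across five cells.
toy_expression = {
    "Gata1": [0.0, 1.0, 2.0, 3.0, 4.0],
    "Gfi1b": [0.5, 1.5, 2.0, 3.5, 4.5],
    "Notch": [4.0, 3.0, 2.5, 1.0, 0.0],
}
correlations = pairwise_correlations(toy_expression)
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

Running the same function on a subset of cells, for example one sorted population at a time, gives the per-population view, where differences in negative correlations would show up as sign changes between populations.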

The correlation analysis for individual populations showed that many positive correlations were stable among the HSPC populations, but key differences were observed in their negative correlations. MEPs, GMPs, and CMPs in particular had the most negative correlations between genes. A multipotent phenotype may therefore be more associated with positive relationships, while repression, or lack of it, becomes more important in increasingly differentiated populations. CMPs and GMPs had similar regulatory relationships, in which Notch was negatively correlated with many erythroid genes. Current research suggests that CMPs may actually be a heterogeneous population primed towards erythroid and myeloid fates (Perié et al. 2015; Jaitin et al. 2014). Indeed, this investigation suggests that CMPs and GMPs are very similar, based on the visualisation of their transcriptional structures, clustering of the cells, and correlation analyses.

The cells were ordered along differentiation using pseudotime ordering, which identified two trajectories in the data from HSCs towards MEPs and LMPPs. Visualising the expression of transcription factor encoding genes along the trajectories showed that the genes had both static and dynamic expression patterns. Gata1 was differentially expressed along the two trajectories and was associated with the MEP trajectory, while Notch and Ets1 increased along the LMPP trajectory but were only expressed in the HSC component of the MEP trajectory. Conversely, Bptf, Smarcc1 and Myb, which were positively correlated in the analysis of all 2,167 cells, were constitutively expressed in both trajectories. The complexity of the correlations observed, together with the gene expression changes occurring along pseudotime, suggests that this may be a useful dataset for inferring regulatory network models along the two trajectories, which is explored in Chapter 6.

5.9.1. Limitations

Although pairwise correlations were identified for all populations, fewer relationships could be identified for LMPPs as the gene set was biased towards myeloid-erythroid genes and focused on HSCs rather than the progenitor populations. A limitation of qRT-PCR is that the number of genes profiled is limited and chosen by the investigator, which may hinder discovery of novel regulators, and, in the context of this work, may miss key regulators of haematopoietic differentiation. Furthermore, the limited gene set fails to capture the full transcriptional heterogeneity of the different cell types. Visualisations of this dataset have suggested a significant overlap between GMP and LMPP populations; however, the work in Chapter 3 shows these populations can be separated based on the full transcriptome, where LMPPs and GMPs occupy separate territories on the transcriptional landscape. These populations could potentially be separated better if specific lymphoid genes were included in the gene set, such as Dntt, Il7r, or Cd19.

Another limitation of this work arose due to technical issues in the processing of samples, which resulted in key regulators such as Gfi1 and Spi1 being excluded from the analysis. Spi1 would have been a valuable addition to the analysis due to its proposed antagonistic relationship with Gata1 in megakaryocytic-erythroid versus granulocytic-monocytic lineage decision making (Burda, Laslo, and Stopka 2010). Recent research using continuous live cell imaging and reporter mouse lines suggests that these two genes do not initiate the megakaryocytic-erythroid versus granulocytic-monocytic lineage switch, but rather reinforce the lineage choices once made (Hoppe et al. 2016). It would have been interesting to see whether these recent findings could be reproduced in this single-cell qRT-PCR dataset. Gfi1 was previously identified as part of a regulatory triad with Gata2 and Gfi1b, and would have been a useful addition for comparing this work to previous literature (Moignard et al. 2013). Missing these key regulators renders the findings from hierarchical clustering and pairwise correlation analysis incomplete, although the dataset does accomplish its original goal of distinguishing between HSC sorting strategies.

5.9.2. Further work

In this chapter, a qRT-PCR dataset of HSPC expression profiles was generated and interrogated for heterogeneity and regulatory relationships. The results indicate that there are complex relationships between genes during differentiation, and that HSC regulators are not only involved in HSC maintenance but play a role in differentiation decisions as well. Genes that have similar expression profiles may also share regulatory mechanisms, whereas genes that do not have similar expression profiles are likely to have unrelated regulatory mechanisms (Ståhlberg and Bengtsson 2010). It would be interesting to infer regulatory networks of transcription factors along the MEP and LMPP trajectories to further explore their unique regulatory mechanisms as well as those shared between them. Furthermore, these networks can be validated using functional assays. The regulatory networks and their validation will be explored in Chapter 6.

5.9.3. Summary

Single-cell expression profiling of HSPC populations using qRT-PCR demonstrated the heterogeneity present within populations of the haematopoietic hierarchy. Pairwise correlations of the different haematopoietic lineages identified regulatory relationships in individual populations and across the HSPC compartment. Pseudotime analysis ordered the cells in two trajectories from HSC to MEP or LMPP fates and was used to compare the dynamics of transcription factor expression along these trajectories.