These subsets offer more granular data on the interaction between cell-intrinsic effects and TF-specific effects in ChIP-seq binding profiles on CODEX. For instance, in normal myeloid cells (Figure 6.10), we can observe cell-intrinsic effects for macrophages in cluster B, but we can also see TF-specific effects across cell lines with STAT5 and PU.1 in clusters A and C respectively. Clusters D and E exemplify this interaction, with cluster E forming a GATA1-specific subset within the erythroid progenitor cells in cluster D. The PU.1:GATA1 paradigm has been well described (Graf et al, 2009). It has recently been challenged by work employing novel reporter mouse lines and live imaging for continuous long-term single-cell quantification of PU.1 and GATA1 transcription (Hoppe et al, 2016), but GATA1 remains an essential driver of erythroid differentiation and forms a distinct cluster on CODEX.
Figure 6.10 Heat map showing pairwise correlations between ChIP-seq profiles for transcription factors in normal myeloid cells on CODEX. (A) erythroblasts and megakaryocytes with STAT5 or STAT3; (B) macrophages; (C) macrophages and cDCs with PU.1; (D) pro-erythroblasts, erythroblasts and erythroid progenitors; (E) a subset with GATA1 only. Figure annotations: STAT3, STAT5; PU.1; GATA1.
Figure 6.11 describes pairwise correlations between ChIP-seq binding profiles for normal lymphoid cells. The distinct clusters A and C suggest that B lymphocytes and plasmacytoid dendritic cells have distinct binding profiles; however, it is important to state that the small number of samples precludes any definitive conclusions.
It is worth stating here that the developmental biology of cDCs and pDCs remains a controversial question, with evidence that these cells can develop through both myeloid and lymphoid differentiation pathways. Our decision to place our cDC ChIP-seq profile in the normal myeloid compartment (Figure 6.10) and our pDC ChIP-seq profile in the normal lymphoid compartment (Figure 6.11) was based on our previous observations from the heat map in Figure 6.2, which examined TF binding profiles for all cells together. This is an example of how CODEX can be employed to define cryptic cell lines by pairwise correlation.
Figure 6.11 Heat map showing pairwise correlations between ChIP-seq profiles for transcription factors in normal lymphoid cells on CODEX. (A) B lymphocytes; (B) lymphoblastoid cells; (C) plasmacytoid dendritic cells.
Figure 6.12 shows pairwise correlations between malignant myeloid cells. It is difficult to draw conclusions because this subset includes only AML and K562 cell lines. However, we can observe that the most distinct clusters A and C are formed by AML cells, and that cluster B is formed by K562 cells, which are driven by master regulator TFs such as GATA1, RUNX1 and FOSB.
Figure 6.12 Heat map showing pairwise correlations between ChIP-seq profiles for transcription factors in malignant myeloid cells on CODEX. (A) AML; (B) K562 cells; (C) mostly AML cells with some K562 cells. Figure annotation: GATA1, RUNX1, FOSB.
Figure 6.13 confirms previous observations that distinct clusters are formed by binding profiles for myeloma cells (cluster A), and T-ALL and Jurkat cells (clusters B and C), and that CTCF-binding characterises pairwise correlation across Burkitt Lymphoma and myeloma cell lines (cluster D).
Figure 6.13 Heat map showing pairwise correlations between ChIP-seq profiles for transcription factors in malignant lymphoid cells on CODEX. (A) MM.1S cell line; (B) T-ALL and Jurkat cells; (C) T-ALL and Jurkat cells; (D) Burkitt Lymphoma and myeloma cell lines with CTCF. Figure annotation: CTCF.
6.7 Discussion
Several observations can be made through the HAEMCODE database of human blood cells. Firstly, TF DNA binding profiles form more distinct clusters on pairwise correlation of peaks than are formed by histone binding profiles, even when the S3norm protocol is employed to normalise data and account for SNR. Secondly, TF identity seems to exert more influence on pairwise correlations for TF binding profiles in progenitors, but cell-intrinsic correlations are more evident in lineage-committed cells. Thirdly, TFs do nonetheless continue to influence binding profiles within clusters of similar cells; for instance, we observe that GATA1 forms a distinct sub-cluster within the erythroid progenitor cluster. Fourthly, pairwise correlation of TF DNA-binding profiles can reveal surprising correlations which merit further exploration, such as the correlation between AML cell lines and T-ALL cell lines. Fifthly, clusters are more difficult to establish in malignant cells than in normal cells, and this may be related to disruption of the defined biological processes that operate in normal cells. Lastly, CTCF has a singular capacity to maintain a consistent binding profile between different cell types in both normal and malignant cells, which may be related to its effects on chromatin looping and architecture (Hanssen et al, 2017).
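The pairwise-correlation analysis referred to throughout this chapter can be illustrated with a minimal sketch. It is not the CODEX pipeline itself: it assumes each experiment has been reduced to a binary vector over a shared set of genomic regions (1 if a peak is present), uses Pearson correlation as the similarity measure, and groups experiments by simple average-linkage agglomerative clustering. The profile names and values below are invented purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return cov / sqrt(vx * vy)


def correlation_matrix(profiles):
    """All-against-all correlations, keyed by (name_a, name_b)."""
    names = sorted(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}


def average_linkage(profiles, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    corr = correlation_matrix(profiles)
    clusters = [[name] for name in sorted(profiles)]

    def similarity(c1, c2):
        # Mean pairwise correlation between members of the two clusters
        return sum(corr[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical binary peak profiles over eight shared regions (illustrative only)
toy_profiles = {
    "GATA1_erythroblast": [1, 1, 0, 0, 1, 0, 0, 0],
    "GATA1_proerythroblast": [1, 1, 0, 0, 1, 0, 1, 0],
    "PU1_macrophage": [0, 0, 1, 1, 0, 1, 0, 0],
    "PU1_cDC": [0, 0, 1, 1, 0, 1, 0, 1],
}

clusters = average_linkage(toy_profiles, n_clusters=2)
print(clusters)
```

In this toy example the two GATA1 profiles and the two PU.1 profiles fall into separate clusters, mirroring a TF-specific grouping. With real data the choice of region set, similarity measure and linkage method would all influence the clusters obtained.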
CODEX is a powerful tool to probe DNA-binding profile data. For instance, pairwise correlations of human blood cells suggest that our pDC binding profile is closely related to lymphoid cell lines and our cDC binding profile is related to myeloid cell lines.
In addition, CODEX is able to harness experimental data from multiple sources and infer novel biological relationships. For instance, one of our findings is that CTCF has a very similar DNA binding profile in cell types as diverse as myeloma and Burkitt Lymphoma. It is known that CTCF has several binding sites within the NANOG locus, and therefore the UCSC genome browser can be employed to visualise CODEX data for CTCF binding at this locus for myeloma and Burkitt Lymphoma cells:
Figure 6.14 UCSC genome browser confirms similar binding profiles for CTCF at the NANOG locus for myeloma and Burkitt Lymphoma cell lines
One of the most useful features of CODEX is that it allows users to freely upload their own ChIP-seq data and compare it to existing CODEX profiles. For instance, Jiapaer et al used CODEX to identify that the promoter for long non-coding RNA LincU, which maintains embryonic stem cells in a naïve state, was bound by KLF4 and SMAD1 as well as NANOG, and then tested this transcriptional mechanism with a LincU-promoter luciferase reporter (Jiapaer et al, 2018).
In addition, CODEX can be employed to probe results from GWAS. For instance, if a SNP is identified which correlates with a clinical condition, then CODEX can inform users if that SNP is located at a locus with a particular TF or histone modification profile in a cell type of interest or across all cell types. Hodonsky et al performed a GWAS of 12,502 Hispanic people to characterise novel SNPs and genomic loci associated with red blood cell traits. They then probed this data for loci which were likely to be of functional significance, by cross-referencing with CODEX histone modification and ChIP-seq signals for key erythroid TFs (Hodonsky et al, 2017). Similarly, it can be informative to examine TF or histone binding profiles for mutations in non-coding regions of DNA. Another potential use of CODEX is to visualise whether altered gene expression in a specific human haematological malignancy correlates with TF binding to a particular locus.
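The SNP look-up described above amounts to an interval query: given a SNP position, is it inside a peak for a TF or histone mark of interest? The sketch below is one way this could be done outside CODEX and does not describe the CODEX interface. It assumes peaks are half-open, 0-based `(chrom, start, end)` intervals (as in BED files) that do not overlap one another on a chromosome. All coordinates and SNP identifiers are invented for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def overlapping_peak(index, chrom, pos):
    """Return the (start, end) peak containing 0-based position pos, or None.

    Assumes peaks on one chromosome do not overlap each other, so only the
    last peak starting at or before pos needs to be checked.
    """
    intervals = index.get(chrom, [])
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
        return intervals[i]
    return None


# Hypothetical peak calls for one TF in one cell type (illustrative only)
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 80)]
index = index_peaks(peaks)

# Hypothetical SNPs: name -> (chromosome, 0-based position)
snps = {
    "snp_a": ("chr1", 150),
    "snp_b": ("chr1", 200),  # end coordinate is exclusive, so not in a peak
    "snp_c": ("chr2", 60),
    "snp_d": ("chr3", 10),   # chromosome with no peaks
}

hits = {name: overlapping_peak(index, chrom, pos) for name, (chrom, pos) in snps.items()}
print(hits)
```

Repeating the query across the peak sets for several cell types would show whether a SNP lies in a region bound in one cell type of interest or across many.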
In summary, our work has significantly expanded previous iterations of CODEX and we now have 459,028,795 bases with non-zero coverage, resulting in 14.8% of the genome covered with histone modification data. This latest iteration has significant potential for improving the applicability of CODEX to human disease states, as our focus has been to input data from human malignant blood cells. One of the strengths of CODEX is that its standardised pipeline permits continued expansion of data sets, which will increase the power of pairwise correlations. In this regard, future work will include the incorporation of important cell types such as normal neutrophils and common malignant blood cells such as chronic lymphocytic leukaemia and follicular lymphoma. It will also be important to explore methods other than standard hierarchical clustering to identify putative groupings of experiments. Given the high-dimensional nature of the data contained in CODEX once several hundred experiments are considered, techniques now popular in single cell RNA-seq analysis, such as t-SNE and UMAP, may be worth exploring. It is also worth noting that CODEX appears to cope well with batch effects in general, since we repeatedly observe that experiments from different centres demonstrate pairwise correlation. However, this is not always the case, suggesting that development of algorithms for batch effect removal may also be an important area for future investigations.
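The coverage figure quoted above can be checked with simple arithmetic. The sketch below assumes a human genome length of approximately 3.1 billion bases as the denominator; the exact assembly length used is not stated in the text, so this value is an assumption. The helper `covered_bases` is a hypothetical illustration of how a count of bases with non-zero coverage could be obtained from a set of half-open intervals, and is not taken from the CODEX pipeline.

```python
def covered_bases(intervals):
    """Count bases covered by at least one half-open (start, end) interval."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Overlapping toy intervals: 0-20 and 30-40 are covered, giving 30 bases
toy_total = covered_bases([(0, 10), (5, 20), (30, 40)])

# Reported number of bases with non-zero coverage (from the text)
REPORTED_COVERED_BASES = 459_028_795
# Assumed genome length; the exact assembly length is not given in the text
ASSUMED_GENOME_SIZE = 3_100_000_000

percent_covered = 100 * REPORTED_COVERED_BASES / ASSUMED_GENOME_SIZE
print(f"{percent_covered:.1f}% of the genome covered")
```

Under this assumed genome length the reported base count corresponds to the 14.8% stated above.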
7. Conclusions
CEBPA double mutations confer favourable prognosis in patients with cytogenetically normal AML in the absence of FLT3 ITD mutations, but it is important to remember that CEBPA-mutated AML nonetheless carries a significant mortality and morbidity burden, with a five-year overall survival of only 53% to 60% (Preudhomme et al, 2002, Renneville et al, 2009). Bearing in mind that CEBPA is one of the most common mutations implicated in human AML, the significant healthcare burden associated with treatment of this aggressive disease (Bewersdorf et al, 2019), and the fact that the mainstay of current chemotherapy regimens dates from the 1970s (Evans et al, 1961), it can be persuasively argued that a better understanding of the biology of CEBPA-mutated AML is urgently required.
The cellular model for testing a specific mutational perturbation is particularly important when interpreting results of interventions. The impact of distinct models can be appreciated from the discordant results of CEBPA deletion on HSC function: Porse and colleagues found that C/EBPa promotes HSC maintenance (Hasemann et al, 2014), whereas Tenen and colleagues employed a different model and concluded that C/EBPa inhibits HSC proliferation (Ye et al, 2013). Investigations of CEBPA-mutated AML have typically transduced myeloid-committed cells such as the 32Dcl3 cell line (Guchhait et al, 2003) for in vitro studies, and harvested cells from highly heterogeneous haematopoietic compartments such as bone marrow mononuclear cells (Togami et al, 2015) or foetal hepatocytes (Bereshchenko et al, 2009) for in vivo investigations. We have successfully created a novel CEBPA-mutated cell line which has several advantages over previous in vitro models. Firstly, our conditional model is based on an LMPP-like cell line which replicates a relatively early stage of haematopoiesis and is capable of reconstituting myeloid, lymphoid, and dendritic cell (DC) lineages. This is relevant because CEBPA is implicated as an early driver mutation in de novo AML, as evidenced by CEBPA-mutated clones being present in relapse, and we are more likely to recapitulate pre-leukaemic changes at the LMPP stage than in committed myeloid progenitors. Furthermore, several lines of investigation suggest that C/EBPa-mediated lineage specification is a relatively early event; for instance, Welner et al found that C/EBPa was required for differentiation from HSPCs to common dendritic progenitors, but C/EBPa deletion in mature DCs had no effect on their numbers or function (Welner et al, 2013).
Secondly, the clonal murine Hoxb8-FL model provides a reproducible and homogeneous cellular model which avoids the uncertainty associated with employing a highly heterogeneous cell compartment. Thirdly, we have employed the same model to generate Empty Vector and CEBPA WT-transduced cell lines to act as comparators.
The advantages of the precise control offered by this cellular model were initially evidenced by comparison of the behaviour of CEBPA WT-transduced cells in Hoxb8-FL conditions, which showed a cell marker profile consistent with differentiation. By contrast, CEBPA N321D-transduced cells failed to differentiate in Hoxb8-FL conditions, suggesting that the mutation has a dominant negative effect.
Our cellular model then allowed us to reproducibly test the effect of mutant C/EBPa on early differentiation in myeloid, lymphoid and DC lineages. We found that the N321D mutation had no appreciable effect on the cell marker profile of transduced cells in cytokine conditions which favoured granulocyte or lymphocyte specification, but showed a distinct phenotype in Flt3L, strongly suggesting an early DC effect in mutant C/EBPa. Furthermore, on prolonged cell culture, it was found that the C/EBPa N321D cell line was immortalised in Flt3L conditions.
Subsequent analysis by single cell RNA-seq (scRNA-seq) confirmed the C/EBPa WT and C/EBPa N321D phenotypes suggested by flow cytometry, and captured precisely the gene expression profile associated with the N321D mutation during early differentiation in Flt3L conditions. In addition, transcriptome analysis at the level of the single cell revealed a small sub-population of CEBPA N321D-transduced cells with a distinct gene expression profile, which would not have been captured by bulk RNA-seq and which can be hypothesised to represent the immortalised fraction of cells transduced with mutant CEBPA. This result underlined the importance of single cell analysis to appreciate the rare events implicated in heterogeneous pre-leukaemic alterations. ChIP-seq was then performed to identify putative direct binding events implicated in those alterations, and identified direct targets of C/EBPa N321D with altered gene expression, including NOTCH2, JAK2, SIRPA and FOS, which perform significant roles in DC lineage specification and differentiation, as well as genes such as MYB and E2F2, implicated in HSPC cell state and cell cycle control respectively. Interestingly, it was more difficult to identify a significant subset of these genes which had two-fold enrichment for H3K27Ac, and our results suggest that the N321D mutation does not obviously affect the acetylation profile. This is a surprising result, given that biologically meaningful genes are bound by C/EBPa N321D and have altered gene expression in our experiments, and given that Pundhir et al have found that chromatin accessibility and activity correlate with C/EBPa’s function as a pioneer factor in the context of early GM lineage differentiation (Pundhir et al, 2018). In this regard, it is important to remember that key genes such as IRF8, RUNX1 and SPI1 were bound by C/EBPa N321D but did not show altered gene expression. It is tempting to speculate that the phenotypic effects of the N321D mutation may be mediated in part by conformational changes and altered dimerisation profiles rather than a direct binding event between C/EBPa N321D and cis-regulatory elements, which would explain the disconnect between DNA binding patterns, acetylation profiles and gene expression. Relatedly, Thomas Graf and colleagues have found that topological reorganisation often precedes gene expression changes during cell reprogramming by transient expression of CEBPA followed by induction of the Yamanaka TFs OCT4, SOX2, KLF4 and MYC (Stadhouders et al, 2018).
In mice transplanted with the C/EBPa N321D cell line, we found that mutant cells were capable of long-term engraftment in vivo but did not reliably reproduce leukaemia, suggesting that co-operating mutations are required for a leukaemic phenotype. This differs from the results reported by Togami et al when C57BL/6J mice received transplants of BM-derived cells transduced with the N321D mutation (Togami et al, 2015). However, it does agree with reports by other groups, including Nerlov and colleagues, that C-terminal mutations result in expansion of multipotent progenitors with no progression to leukaemia (Bereshchenko et al, 2009). In this regard, the Bonnet group found long-term repopulating activity in human cord blood-derived Lin- cells transduced with NC-terminal but not C-terminal mutated CEBPA (Quintana-Bustamante et al, 2012). Our results also correspond with several lines of clinical data from human patients with CEBPA-mutated AML:
(i) these patients are typically characterised by a bi-allelic mutational profile (Wouters et al, 2009);
(ii) hereditary germline mutations in CEBPA usually affect the N-terminus but do not result in a familial AML phenotype until patients acquire a somatic C-terminal mutation (Tawana et al, 2015);
(iii) CEBPA can cooperate with mutations in ASXL1, CSF3R, FLT3, GATA2, RUNX1, TET2 and WT1 (Fasan et al, 2014, Lavallee et al, 2016); and
(iv) AML patients with single-allele CEBPA mutations typically carry such a cooperating mutation (Wouters et al, 2009).
Recent work by van Galen et al has characterised heterogeneity in AML by performing scRNA-seq on 38,410 cells from sixteen AML patients and five healthy donors, and identified a spectrum of malignant cell types including cells with a DC transcriptional profile, as well as demonstrating that leukaemic cells have immunomodulatory properties such as T cell inhibition (van Galen et al, 2019). Our own experiments suggested that expression of CEBPA N321D at an early LMPP-like stage of haematopoiesis deregulates dendritic cell differentiation, specifically that it causes an immortalised CD11c+ B220+ BST2- pDC-like progenitor phenotype, correlating with gene set enrichment of differentially expressed genes identified by scRNA-seq. In addition, exogenous mutant C/EBPa binding events were increased in our cell line after differentiation, suggesting that the pre-leukaemic events occur in the HSPC compartment.
The role of C/EBPa in myeloid differentiation has been intensively studied, but relatively little is known regarding its function specifically in DC biology. Early work by Thomas Graf and colleagues suggested that expression of PU.1 in fully committed pre-T cells induces formation of dendritic cells, whereas expression of C/EBPa reprograms T cell progenitors to macrophages, and furthermore that NOTCH signalling is able to inhibit these processes (Laiosa et al, 2006). Interestingly, our experiments demonstrated that C/EBPa N321D binds directly to NOTCH2 and that expression of NOTCH2 is downregulated in CEBPA N321D-transduced cells on differential expression against EV-transduced cells, and it is possible that NOTCH deregulation may influence DC lineage specification by the PU.1:C/EBPa axis. Later work by the Tenen group showed that there is an early requirement for C/EBPa in DC differentiation and that formation of mature DCs through myeloid progenitors is C/EBPa-dependent, whereas lymphoid progenitors are able to generate DCs in the absence of C/EBPa (Welner et al, 2013). Given that recent work by Rodrigues et al has shown that pDCs develop mostly from lymphoid progenitor cells (Rodrigues et al, 2018), it seems reasonable to hypothesise that the N321D mutation disrupts the C/EBPa-dependent myeloid pathway and favours the generation of pDCs rather than cDCs. This correlates with our experimental results that CEBPA N321D-mutated cells initially show reduced expression of cDC cell surface markers, before progressing to an immortalised pDC-like immunophenotype. It is important to recognise that the phenotypic effects of C/EBPa N321D may also be mediated by interactions with other transcription factors implicated in lineage specification. For instance, our experiments have shown that mutant C/EBPa binds to IRF8 and SPI1, though expression of these two genes was not directly altered.
Given that our work suggests for the first time that a pDC phenotype may characterise pre-leukaemic changes in mutant C/EBPa, it is useful to consider our results within the broader context of DC biology. It is well established that DC subsets include professional antigen presenting cells and immunomodulatory cells which secrete IFN-a. More recently, evidence suggests that pDCs have a unique TLR7-dependent capability to recognise virus-infected cells throughout their intracellular replication rather than only during their rare extracellular transit events (Takahashi et al, 2010), and this means that pDCs can effectively counter the multiple mechanisms of immune evasion deployed by viruses. Marlène Dreux and others have shown that sensing of infected cells by pDCs involves viral envelope protein-dependent secretion and transmission of viral RNA in cell-to-cell interactions (Decembre et al, 2014), and it is tempting to speculate that pDCs may employ a similar mechanism to play a unique role in cancer surveillance.
Regarding pDC function, Rodrigues et al have shown that lymphoid-pathway pDCs produce interferon but do not participate in antigen presentation (Rodrigues et al, 2018). It would certainly be interesting to perform functional evaluation of the CEBPA N321D immortalised pDC-like progenitor cell line, for instance by measuring IFN-a production upon TLR9 stimulation and T cell proliferation in response to LPS stimulation. Interestingly, T cell anergy has also been noted in AML-derived dendritic cells (Narita et al, 2001), and in a broader malignant context intratumoral pDCs have been functionally characterised in a murine mammary tumour model, favouring tumour progression by secreting IFN-a poorly and by