2. Investigating Intra-tumor Heterogeneity

Analyse tumor heterogeneity in neuroblastoma using the R2 data analysis platform

This resource is located online at http://r2-training-courses.readthedocs.io .

2.1. Introduction

As was mentioned in the first part of the course, Investigating Structural Variants, no well defined path of subsequently gained mutations was found to lead to more aggressive tumorigenic cell types (the Vogelstein model) in neuroblastoma.In this practical work session, we’ll integrate RNA expression data with sequence data, specifically ChIP seq data, to further unravel neuroblastoma data.

Recent research suggests that neuroblastoma consists of different cell types. There is reason to believe that this heterogeneity causes the high percentage of relapses in the aggressive subtype of neuroblastoma. Children developing a relapse almost always die.In this section, we will focus on heterogeneity in neuroblastoma. We will study not only mutations and RNA expression of genes, but also study the epigenetic modifications of the DNA-associated histones. And we will work with datasets in which genes have been manipulated in cell lines and in living tissues.

Using recent molecular biology data gathering techniques and advanced bioinformatic data analysis algorithms we set out to investigate this aggressive characteristic of neuroblastoma tumors. We obtained tumor biopsies from four patients that were taken in culture. Each biopsy gave rise to two phenotypically divergent cell lines. In this course you will conduct the research yourself, following the lines of reasoning and the same data as was used in a paper by the AMC Oncogenomics group that was published in Nature Genetics in 2017.

2.2. Tumors and origins: a first impression of your data

For a start we’ll investigate established childhood tumor cell lines, including neuroblastoma. Established cell lines can be grown and passaged in culture indefinetely. A typical example is the classic HeLa cell line, taken from a cervical adenocarcinoma of Henrieta Lacks in 1951 that has been in culture since. How do profiles of neuroblastoma cell lines relate to cell lines of other tumors? Additional data about classical cell lines from other childhood tumors is available in the resources of the scientific community. For each publication scientists are required to make their data available in public repositories. We can use these in a larger public dataset of 86 other cell lines derived from 6 different childhood tumors and see how they relate.

Data used:

  • 86 cell lines derived from 6 different childhood tumors (Cellline Childhood cancer - ITCC - 86 - MAS5.0 - u133p2)

Techniques used:

  • mRNA Microarray expression

Analysis used

  • individual gene selection
  • t-SNE: t-distributed stochastic neighbor embedding statistics


2.2.1. Expression of key genes

  • The button below brings you to the form in which you can submit your answers for section 2.2.



  • Go to R2 by clicking on the button below:


You’re now on the R2 main page. This web based molecular biology data analysis platform contains a wealth of data and methods to analyze the datasets. Step by step, researchers are guided through a web of options for data analysis. R2’s main page shows this principle: step through each of the numbered boxes to develop your analysis of choice.In this case we’re first going to see if and how the mRNA expression of several genes changes through a single dataset. The proper dataset described above has been selected already. In this dataset 86 cell lines derived from 6 different childhood tumor types can be found (ewing sarcoma, medulloblastoma, neuroblastoma, osteosarcoma, acute lymphocytic leukemia and rhabdomyosarcoma).


_images/R2d2_logo.pngFrom knowledge acquired in previous lectures, or just from quick Googling on the web… Can you think of a gene that might show different expression between some of these 6 tumor models?


  • In field 4 type the name of the gene and click Next
  • Leave all settings at default and click Next

A graph shows the expression of this gene’s mRNA in the whole set of childhood tumor cell lines. Samples are along the x-axis, mRNA expression values of the gene in a sample are on the y-axis. Below the graph is the available annotation for the samples shown in colored tracks. In R2, samples can be annotated with e.g clinical data or biological information. Each group of annotated data is called a “Track” in R2. These tracks can be used to filter, color or split data in all types of R2 analyses.Sometimes you can see the categorical tracks displayed underneath a graph. But often more annotation is available for the samples. You can hover your mouse above dots in a graph or over the tracks underneath the graph to get more information per sample.

  • Hover with your mouse over data points to show additional information.

At the bottom of the page you can find a table with adjustable settings. Many settings of the graph can be adapted.

  • Try out a different view of the same data with the following changes to the settings:
    • The expression values on the y-axis are logarithmic by default, in the settings menu set the Transform option to none instead.
    • Split the data in groups with the setting use track under Group separations: choose the itcc_model track that contains the information which sample belongs to which tumor type.
    • The samples can be sorted by setting the Extra Graph Option to Track and Gene sort.
    • Finally, click Adjust Settings to obtain the graph with these adaptations.
  • Now try the gene MYCN (Type MYCN in the Change Gene box in the upper right corner to keep all your settings, but to change your gene).
  • Hover with your mouse over the track underneath the graph or over the data points, to find out which itcc_model belongs to which group of samples.

_images/R2d2_logo.pngWhat can you say about the expression of this gene in the different tumor models?



_images/R2d2_logo.pngWhat might the expression level of this gene in neuroblastoma signify about the function of the gene in this cancer?


2.2.2. Clustering with tSNE maps

We’ve seen that the expression of genes differs among the samples and some types of tumors seem to specifically express certain genes. To further explore the type of data we’re dealing with, an unbiased unsupervised type of clustering analysis is a good idea. One recently developed algorithm is the tSNE map. Similar cells will clump together on the map.

  • Click the button below to show the tSNE map in R2



  • Under the graph, a menu allows the user to adapt settings. Colors of the graph points are not set by default. To color the graph with a biologically meaningful annotation, find the ColorMode dropdown and select Color by Track. Now set the Track for Color dropdown to use the itcc_model track, and click Next to show the changes.
  • The t-SNE algorithm has a parameter called perplexity, which determines how much attraction points on a map have towards each other. Set the perplexity value to 5 and click next again.

_images/R2d2_logo.pngWhat do you note about the clustering of the neuroblastoma samples?

_images/R2d2_logo.pngWith which other tumor models do the neuroblastoma cell lines cluster?



_images/R2d2_logo.png**Based on the above, what would you do to further investigate your observations? **


2.3. Urgency of research: patient material

In the former step we derived that neuroblastoma cell lines seem to group with cell lines of different developmental lineages. We have recently established new cell line pairs from neuroblastoma patients. In some cases multiple cell lines were obtained from the same biopsy. These cell lines share genetic defects and are therefore called isogenic cell line pairs. A microscopy image of each pair is provided below.

Figure 2: Bright field image of isogenic cell line pairs.

Figure 2: Bright field image of isogenic cell line pairs.

  • The button below brings you to the form in which you can submit your answers for section 2.3




_images/R2d2_logo.pngWhat do you note about the morphology of the cell lines?


We profiled the mRNA expression of genes using Affymetrix mRNA chips in three of these pairs and of a previously established neuroblastoma cell line that after culturing gave rise to two very divergent phenotypes. The resulting gene expression patterns can be used to perform a hierarchical clustering. An example of such clustering resulting in an ordered heatmap is provided below

Figure 3: Heatmap: unsupervised clustering of samples using the distribution of the expression data combined with the clustering of genes based on their expression through the samples.

Figure 3: Heatmap: unsupervised clustering of samples using the distribution of the expression data combined with the clustering of genes based on their expression through the samples.

Data used:

  • Cell lines recently derived from three different patients. Two morphologically different looking cells were taken per patient. This dataset is combined with two classical Neuroblastoma cell lines that clustered differently in the tSNE: SHEP and SY5Y (Mixed Neuroblastoma (MES-ADRN) - Versteeg - 8 - MAS5.0 - u133p2)

Techniques used:

  • mRNA Microarray expression

Analysis used

  • Toplister: unsupervised gene selection
  • Unsupervised hierarchical clustering
  • Heatmap visualization


For this analysis we’ll directly go to one of the analysis tools of R2: Toplister. The Toplister can assess which genes behave different throughout a dataset. It does so by selecting the genes whose expression values have the largest standard deviation within a given set of samples. This gives an unbiased view of the differences in gene expression.

  • Go to R2 by clicking the button below. R2 will find the 100 genes that have the largest variation in gene expression among these 8 cell lines, three pairs from three tumors of a patient and two classical neuroblastoma cell lines.


  • Click Next; a list of genes appears

_images/R2d2_logo.pngDo you recognize any genes that explain the difference in phenotype?


  • Use the mousewheel to scroll to the bottom of the page (or click on the shoe-print at the top of the page).

Here you can choose to perform an additional analysis. The heatmap vizualization produces a hierarchically clustered view of the expression values for the top 100 genes.


_images/R2d2_logo.pngWhat number of groups do you expect?


  • Click on Heatmap (z-score)

The cell line pairs from the patient were also investigated for the tumor stem cell marker gene CD133 and for their migration capability. See the results in the figure below:

Figure 4: CD133 FACS analysis and transwell migration assay of isogenic pairs

Figure 4: CD133 FACS analysis and transwell migration assay of isogenic pairs


_images/R2d2_logo.pngGiven these observations, what differentiation state can you assign to each group?



2.4. Which genes make a difference? Creating signatures

We have identified two different types of cells that occur within the same patient. Neuroblastoma apparently has a heterogenous nature. What genes determine the difference between the two types? We’ll use RNA expression data again but now we will use a predefined, supervised classification in groups to search for genes that characterize this classification best, or in other words, that are differentially expressed between these two groups.

Data used:

  • Mixed Neuroblastoma (MES-ADRN) - Versteeg - 8 - MAS5.0 - u133p2 (same as above)
  • Gene Ontology
  • Broad curated hallmark datasets

Techniques used:

  • mRNA arrays

Analysis used

  • Differential Expression: supervised gene selection
  • Gene Ontology Analysis: Overrepresentation calculation


  • The button below brings you to the form in which you can submit your answers for section 2.4



  • Go to the main page of R2 by clicking the button below


  • In Field 3 choose Find Differential expression between groups and click Next

This dataset has been annotated with ‘cell type’ information. Each sample was assigned to either the MESenchymal or the ADReNergic cell type. The information is stored in R2 in a track.

  • Choose the proper track in the upper most dropdown Select a track dropdown. Since we have only 8 samples make sure that the multiple testing correction is set to No correction. Click Next twice. (More information on Correction for Multiple Testing can be found here).
  • A list of differentially expressed genes appears with correlation p-value < 0.01 in this dataset is shown. Click on the hyperlinked name of your favorite gene to see its expression in the sample set; try an oppositely correlating gene as well
  • Go back to the window with the differentially expressed genes. This is still open in one of your browser tabs.
  • Click on the Heatmap(zscore) button in the right menu panel; a heatmap shows the expression of the differentially expressed genes for each sample.

_images/R2d2_logo.pngHow is this figure different from the former?



For future use, this list of genes has been stored for you in R2 as signatures (aka genesets or categories). The list has been split into two categories: one set of genes that is highly expressed in the MES type of samples (r2_mesadrn_mes) and one set of genes highly expressed in the ADRN type of samples (r2_mesadrn_adrn).

R2 provides additional analysis for sets of genes that can be accessed from the right panel of menu buttons. As a first analysis step we can check a data resource called the Gene Ontology that provides a tree of systematically ordered biological terms that is used to formally describe the biological role of each gene. The Gene Ontology Analysis tool in R2 calculates for each of the thousands of groups of genes that are annotated with a specific biological term whether your set of choice is over-represented in them.

  • On the page with the differentially expressed genes, select the Gene Ontology Analysis button in the menu on the right

_images/R2d2_logo.pngWhat can you say about the function of the differentially expressed genes?



  • Now scroll down to the end of the page (or click the filter button in the left upper corner of the page) and adapt the settings such that only the Biological Process branch of the Gene Ontology is selected, and select only the genes that are higher expressed in the MES type of cells

_images/R2d2_logo.pngWhat can you say about the function of the group of genes that are upregulated in the MES type of cells?



In R2 there are many more sets of genes that have been found to be implemented in specific processes. The Broad Institute has compiled quite some of these sets of genes that characterize hallmark biological processes.

  • Go back to the window with the differentially expressed genes.
  • Select the Gene set analysis option from the right menu
  • Select the geneset broad_2015_hallmark and click Next

_images/R2d2_logo.pngWhich hallmark category of genes pops up as most important? Can you explain this?



2.5. Identifying groups: using signatures to classify other datasets

We now have a signature that distinguishes between the two types of cells. We also obtained some hints about functional characteristics of these cells. How does this signature behave in other datasets? Does the same set of genes tell us something about other sets of tumors or cell lines? This is the next step in our analysis.We’ve assembled a more complex dataset by gathering the dataset of the 4 pairs of cell lines, additional neuroblastoma cell lines from the first dataset and publicly available data of non-malignant human neural crest tissue. The neural crest undergoes a mesenchymal transition and gives rise to cell types from the adrenergic lineage.

Data used:

  • A combination of the 8 cell lines above, additional neuroblastoma cell lines and cells from the neural crest lineage (Mixed Neuroblastoma (MES-ADRN-CREST) - Versteeg/Etchevers - 34 - MAS5.0 - u133p2)

Techniques used:

  • mRNA expression data

Analysis used

  • Heatmap analysis



  • The button below brings you to the form in which you can submit your answers for section 2.5



  • Go to the main portal of R2 by clicking the button below; the dataset described above is automatically selected


  • In field 3 select View Geneset (Heatmap) that you can find under the red header Meta-analysis.
  • Click Next and select geneset r2provided_genelists
  • Click Next, leave selection as is and click Next
  • Select both signatures that were derived before by CTRL click on the MES (r2_mesadrn_mes) and ADRN (r2_mesadrn_adrn) signatures and click Next

_images/R2d2_logo.pngWhich cell types group together?

_images/R2d2_logo.pngHow does this relate to the earlier observations on cell lineage?


When observing such clear-cut patterns it is good scientific practice to test this in additional datasets. The database of R2 contains an additional dataset consisting of neuroblastoma cell lines that were profiled by a French research team.

  • Click on the button below to go there and perform the same analysis.



_images/R2d2_logo.pngDo you observe similar patterns?


2.6. Using scores for further characterization

The expression patterns of these specific signatures can be used to compare cell types. We can do this by summarizing the expression data of all genes in the signature in each cell type in one value; a signature score. The figure below shows the signature score of the MES part of the signature in a specific sample.

Figure 5: The signature score as calculated for a specific sample in the MES signature.

Figure 5: The signature score as calculated for a specific ample in the MES signature.

R2 has calculated these scores for all samples in both signatures. We’re going to find out how they relate.

Data used:

  • Mixed Neuroblastoma (MES-ADRN-CREST) - Versteeg/Etchevers - 34 - MAS5.0 - u133p2

Techniques used:

  • mRNA expression data

Analysis used

  • Signature scores



  • The button below brings you to the form in which you can submit your answers for section 2.6



  • Go back to the main portal of R2 by clicking the button below.


  • In field 3 choose Relate 2 tracks and click Next
  • First we’ll explore the scores in each signature separately; on the X-axis (Select X track) we’ll use the unique sample id (lab_id) and on the Y-axis the signature score track that R2 has generated for the ADRN signature (s_mesadrn_adrn(#)). Click Next.
  • A graph is generated. For each sample the signature score for the mesadrn_adrn signature is shown. Select Color by Track for ColorMode and try different tracks. Click Adjust Settings to view the result.
  • Now select for the Y-axis the MES part of the signature, click Adjust Settings to view the result.
  • To compare the signature scores, select the ADRN signature for the X track
  • If you have time you can also try the Color by Gene ColorMode, choose a gene of interest (Note: the dropdown selection is linked to the database, wait for the proper selections to popup…)

_images/R2d2_logo.pngWhat conclusion would you draw from these figures?


2.7. Finding causes: homing in on transcription factors

Apparently there are two types of cells in Neuroblastoma tumors. Neuroblastoma seems to be a heterogenous tumor. Transcription factors (TF’s) are known to determine gene expression programs in cells. These gene expression programs determine the development of the cell. Can we find out which TF’s might influence the difference between both of these cell lines?

Data used:

  • Mixed Neuroblastoma (MES-ADRN-CREST) - Versteeg/Etchevers - 34 - MAS5.0 - u133p2
  • Transcription factor annotation from Gene Ontology
  • NCBI (National Center for Biotechnology Information - USA) Gene information database

Techniques used:

  • mRNA expression data

Analysis used

  • Differential expression: supervised gene selection



  • The button below brings you to the form in which you can submit your answers for section 2.7



  • Go back to the main portal of R2 by clicking the button below.


Again we’re going to find out which genes make a difference, but now in a specific subset of genes that has been annotated to have Transcription Factor activity. This is gathered from databases that collect that information from peer reviewed publications.

  • In field 3 select Find Differential expression between groups Click Next
  • Like before, select the track that contains information about the cell types*.
  • We’re now also going to filter for a specific gene filter. In the Gene Filters section, choose from the bottom dropdown the category C: geneannot as Gene set. Now click again on the same dropdown, and you will see that the dropdown list has expanded with different subcategories. Select the subcategory transcription factors, SC: TF (945). Click Next.
  • In the next screen we’re asked to further filter for the specific types of samples to compare. Here we’re focusing on the difference between ADRN and MES; select these (i.e. uncheck neural_crest). Click Next.
  • A list of genes appears. Investigate the top 4 by clicking on the hyperlinked gene symbols. This brings you to the expression view of the gene.
  • From here you can also access the NCBI gene database containing additional information on the function of the gene and related scientific publications. Do this by clicking on the hyperlinked GeneID number in the top table. You’ll arrive at a website that gathers all known information on genes. A useful section is the Bibliography containing short summaries of relevant scientific papers.

_images/R2d2_logo.pngArmed with this information, which gene would you choose for further research? Why?


2.8. Proving causes: manipulating cell lines

From experiments it is known that cells can change their nature, some cells exhibit a certain plasticity.

  • The button below brings you to the form in which you can submit your answers for section 2.8




_images/R2d2_logo.pngFrom your own biomedical knowledge, can you explain why this is of relevance to cancer?


From experiments in our lab it became evident that the two cell types found in Neuroblastoma were able to switch. After a given period of time cells in dishes changed their nature as was proven by the expression of certain marker proteins on their surface.Now that we have a candidate Transcription Factor we can try to investigate its relevance in this plasticity by manipulating the gene in cell lines we grow in the lab.


_images/R2d2_logo.pngCan you think of ways to manipulate genes in cell lines?


The TF was inducibly expressed in the SKNBE cell line and this was monitored through time for its gene expression using Affymetrix mRNA arrays. The resulting data was added to the dataset we used above for comparison.


_images/R2d2_logo.pngTo which of the cell types does SKNBE belong?


Data used:

  • A combination of the 4 cell line pairs, additional classical Neuroblastoma cell lines, cells from the neural crest lineage and lines that had the TF inducible expressed for increasing periods (Mixed Neuroblastoma (MES-ADRN-Crest-Exp) - Versteeg - 52 - MAS5.0 - u133p2)

Techniques used:

  • Inducible gene expression
  • mRNA expression data

Analysis used

  • Signature score comparison


  • Go to the R2 main page by clicking the button below, the correct dataset will be selected.


  • Select in field 3 the Relate 2 tracks option. R2 has calculated signature scores for all samples in both signatures; in this dataset these tracks are called adrn_score and mes_score. Relate the two tracks, adapt the ColorMode to Color by Track and try the mes_adrn_time track. This track contains information on the time that the PRRX1 gene expression was induced in the SKNBE cell line.

_images/R2d2_logo.pngWhat is your conclusion from this experiment?


  • This conclusion is even more obvious when the sequence of events is highlighted, as can be done in R2. The relations between the isogenic pairs are also illustrated. Click the button below to view the graph annotated in this way, a figure that can be incorporated in a publication right away.


2.9. Creating hypotheses: relating to chromatin modification data

Apparently this TF is capable of shifting cells from one state to the other. How can we further determine causal relations and ideally targetable processes in these cancer cells? How is a switch dynamically possible? A growing body of evidence implicates enhancers as key elements defining cell identity but the relationship of these enhancers to intratumoral heterogeneity is unknown. We performed ChIP-Seq analysis of the H3K27ac histone modifications for the isogenic cell line pairs.

Data used:

  • Four MES and five ADRN neuroblastoma cell lines, including three isogenic cell line pairs.

Techniques used:

  • ChIP-Seq analysis

Analysis

  • Genome Browser: analyzing histone modifications marking active enhancers
  • Differential Expression



  • The button below brings you to the form in which you can submit your answers for section 2.9




_images/R2d2_logo.pngCan you explain what the goal of this experiment was?


First we’ll check one of the HAND genes, known to play a role in the development of the sympatho-adrenal lineage from the neural crest.


_images/R2d2_logo.pngWhat do you expect for the H3K27ac signals?


  • Click on the button below to show the ChIP-Seq data for HAND1 in the four mesenchymal and five adrenergic neuroblastoma cell lines. For your convenience the signals are colored according to the type (MES or ADRN) of cell line.


Regions encoding genes are drawn at the bottom of the graph. When in red they’re encoded in the reverse direction, coding exons are darker.


_images/R2d2_logo.pngCan you explain this graph? What do you expect for the expression of this gene?


  • At the top of the page click on the zoom out 10X button. Look at the differences between the cell lines

The chromatin state is especially important for transcription factors; we’ll re-visit the list of transcription factors that are differentially expressed between the MES and ADRN cell lines.


_images/R2d2_logo.pngWhy are Transcription Factors of interest in this setting?


  • Perform the differential expression analysis again by clicking on the button below


  • Use both expression analysis and the enhancer data in the genomebrowser to decide which transcription factors would be worthwhile to further investigate. In the genomebrowser that was still open from the previous question, type the name of the gene in the left upper corner textfield. To further explore the larger region around the gene you can use the zoom buttons at the top of the page.

_images/R2d2_logo.pngWhich Transcription Factor would you consider for further study?


2.10. Suggesting therapy

  • With the current new knowledge that you derived above, can you think of a strategy to use the fact that neuroblastoma is a heterogenous tumor consisting of a mesenchymal, motile cell type and a adrenergic, differentiated cell type for therapeutic options? This is an open question, so be creative, you might find something interesting! If you want, you can follow the suggestions below.
  • Use the button of 1.9 to perform a differential expression analysis. This time explore other gene categories that could be interesting for drug development. Look at the expression profiles of some genes of your choice.
    • Hint: There is a category “drugtargets” in R2 to select druggable proteins; you can select this in the same dropdown where the TF selection was done.
    • Another very interesting gene category is the “kinase” category, this contains known kinases that have active roles in pathways.
  • The first button in 1.9 takes you to the Genome Browser at the position of the HAND1 gene. In the upper left corner of that page, you can fill in other genes of interest to look at the ChIP-Seq profiles of the samples at these locations of the genome. Do this for the genes whose expression profiles you just have studied. Try to find genes that show consistent chromatin modification profiles for the one type of neuroblastoma cell lines and a different consistent profile for the other type.
  • Knowledge about pathways can be exploited as well.
  • The NCBI database can provide additional information from literature about the genes of interest.



  • The button below brings you to the form in which you can submit your answers for section 2.10




_images/R2d2_logo.pngWhich strategy do you suggest?


2.11. Final remarks / future directions

In the March 1st 2018 issue of Nature a paper was published describing a landscape of genomic alterations across childhood cancers. The data is accessible in R2 also as a Datascope. This is another example of how R2 can visualize your genomics data.

This ends the course. Feel free to further explore the course materials or our tutorials.

We hope that this course has been helpful. If you want to have your genomics data visualized and analyzed using the R2 platform you can always consult r2-support@amc.nl

The R2 support team.