1. Student Course: Investigating Intra-tumor Heterogeneity

Analyse tumor heterogeneity in Neuroblastoma using the R2 data analysis platform

This resource is located online at http://r2-training-courses.readthedocs.io

1.1. Introduction

Cancer is a very complex disease. Much more complicated than originally anticipated when the first mutations were found to be causal for specific cancers. During the lectures you’ve been shown how this works in colorectal cancer, where a well defined path of subsequently gained mutations leads to more aggressive tumorigenic cell types (the Vogelstein model).

Figure 1: Mutation paths during cancer progression.

Figure 1: Mutation paths during cancer progression.

Although there has been extensive research into similar mutation mechanisms in Neuroblastoma (also in the AMC Oncogenomics group), such a mechanism has not been found for this type of cancer. In this practical work session we will try to bring you to the cutting edge of research into this often deadly childhood tumor.

From the lectures you should have learned already that this tumor consists of different cancer cell types. There is reason to believe that this heterogeneity causes the high percentage of relapses in the aggressive subtype of Neuroblastoma. Children developing a relapse almost always die. Fortunately new technologies have become available to molecular biology. These enable us to study not only mutations and RNA expression of genes but also study the epigenetic modifications of the DNA-associated histones. And in addition genes can now be manipulated in cell lines and living tissues. Using advanced data analysis, statistics and clustering methods, the field of bioinformatics tries to derive new insights from these experimental data and help molecular biologists to generate hypotheses that can be tested experimentally. Today you will use the web-based genomics analysis and visualization platform R2. R2 provides you with a set of bioinformatics tools to investigate recent patient and experimental data from Neuroblastoma tumors and cell lines.

Neuroblastoma is a pediatric tumor of the peripheral adrenergic lineage, which is neural crest derived. During embryogenesis, cells delaminate from the neural crest, migrate ventrally and differentiate into adrenaline- or noradrenaline-producing cells. Neuroblastomas typically express enzymes for the adrenaline-synthesis route. High-stage neuroblastomas usually go into complete remission upon therapy but often relapse as therapy-resistant disease. Using recent molecular biology data gathering techniques and advanced bioinformatic data analysis algorithms we set out to investigate this nasty characteristic of Neuroblastoma tumors. From four patients we obtained tumor biopsies that were taken in culture. Each biopsy gave rise to two phenotypically divergent cell lines.

1.2. Tumors and origins: a first impression of your data

For a start we’ll investigate established childhood tumor cell lines, including neuroblastoma. Established cell lines can be grown and passaged in culture indefinetely. A typical example is the classic HeLa cell line, taken from a cervical adenocarcinoma of Henrieta Lacks in 1951 that has been in culture since. How do profiles of neuroblastoma cell lines relate to cell lines of other tumors? Additional data about classical cell lines from other childhood tumors is available in the resources of the scientific community. For each publication scientists are required to make their data available in public repositories. We can use these in a larger public dataset of 86 other cell lines derived from 6 different childhood tumors and see how they relate.

Data used:

  • 86 cell lines derived from 6 different childhood tumors (Cellline Childhood cancer - ITCC - 86 - MAS5.0 - u133p2)

Techniques used:

  • mRNA Microarray expression

Analysis used

  • individual gene selection
  • t-SNE: t-distributed stochastic neighbor embedding statistics

1.2.1. Expression of key genes

  • Go to R2 by clicking on the button below:

You’re now on the R2 main page. This web based molecular biology data analysis platform contains a wealth of data and methods to analyze these. Step by step researchers are guided through a web of data analysis possibilities. The portal of R2 shows this principle; step through each of the fields to develop your analysis of choice. In this case we’re first going to see if and how the mRNA expression of several genes changes through a single dataset. The proper dataset described above has been selected already.

_images/R2d2_logo.pngCan you think of a gene that might mark differences between these tumor models?

  • In field 4 type the name of the gene and click Next
  • Leave all settings at default and click Next

A graph shows the expression of this gene’s mRNA in the whole set of childhood tumor cell lines. Samples are along the x-axis, mRNA expression values of the gene in a sample are on the y-axis. Below the graph is the available annotation for the samples shown in colored tracks.

  • Hover with your mouse over data points to show additional information.
  • The expression values on the y-axis are logarithmic; set the Transform option to none, and select Track and Gene sort for the Extra Graph Option. Sample annotation is stored in R2 in so called tracks, for use track choose the itcc_model track that contains the information which sample belongs to which tumor type. Click on More Settings and set the draw legend dropdown to yes and click Adjust Settings to obtain a more explicit picture.
  • Now try the gene MYCN (Click the Go to Main link in the left upper corner)

_images/R2d2_logo.pngCan you say something about the role this gene can play in cancer?

_images/R2d2_logo.pngUsing the data annotation track below the graph, what can you say about neuroblastoma?

1.2.2. Clustering with tSNE maps

We’ve seen that the expression of genes differs among the samples and some types of tumors seem to specifically express certain genes. To further explore the type of data we’re dealing with, an unbiased unsupervised type of clustering analysis is a good idea. One recently developed algorithm is the tSNE map.

  • Click the button below to show the tSNE map in R2

  • Colors are not set by default, under ColorMode select Color by Track and use the itcc_model track, click Next to show the changes

_images/R2d2_logo.pngCan you relate the tumors to a type of tissue? (Note: ALL stands for Acute Lymphocytic Leukemia)

_images/R2d2_logo.pngWhat do you note about the clustering of neuroblastoma cell lines with respect to the lineage of origin?

_images/R2d2_logo.pngIf you had to choose two cell lines for further investigation of lineage identity in neuroblastoma, which would you choose?

1.3. Urgency of research: patient material

In the former step we derived that neuroblastoma cell lines seem to group with cell lines of different developmental lineages. We have recently established new cell line pairs from neuroblastoma patients. In some cases multiple cell lines were obtained from the same biopsy. These cell lines share genetic defects and are therefore called isogenic cell line pairs. A microscopy image of each pair is provided below.

Figure 2: Bright field image of isogenic cell line pairs.

Figure 2: Bright field image of isogenic cell line pairs.

_images/R2d2_logo.pngWhat do you note about the morphology of the cell lines?

We profiled the mRNA expression of genes using Affymetrix mRNA chips in three of these pairs and of a previously established neuroblastoma cell line that after culturing gave rise to two very divergent phenotypes. The resulting gene expression patterns can be used to perform a hierarchical clustering. An example of such clustering resulting in an ordered heatmap is provided below

Figure 3: Heatmap: unsupervised clustering of samples using the distribution of the expression data combined with the clustering of genes based on their expression through the samples.

Figure 3: Heatmap: unsupervised clustering of samples using the distribution of the expression data combined with the clustering of genes based on their expression through the samples.

Data used:

  • Cell lines recently derived from three tumors from one patient. Two biopsies per tumor were taken. This dataset is combined with two classical Neuroblastoma cell lines that clustered far apart: SHEP and SY5Y (Mixed Neuroblastoma (MES-ADRN) - Versteeg - 8 - MAS5.0 - u133p2)

Techniques used:

  • mRNA Microarray expression

Analysis used

  • Toplister: unsupervised gene selection
  • Unsupervised hierarchical clustering
  • Heatmap visualization

For this analysis we’ll directly go to one of the analysis tools of R2: Toplister. The Toplister can assess which genes behave different throughout a dataset. It does so by selecting the genes whose expression values have the largest standard deviation within a given set of samples. This gives an unbiased view of the differences in gene expression.

  • Go to R2 by clicking the button below. R2 will find the 100 genes having the largest variation in gene expression throught these 8 cell lines; three pairs from three tumors of a patient and two classical neuroblastoma cell lines.

  • Click Next; a list of genes appears

_images/R2d2_logo.pngDo you recognize any genes that explain the difference in phenotype?

  • Use the mousewheel to scroll to the bottom of the page (or click on the shoe-print at the top of the page).

Here you can choose to perform an additional analysis. The heatmap vizualization produces a hierarchically clustered view of the expression values for the top 100 genes.

_images/R2d2_logo.pngWhat number of groups do you expect?

  • Click on Heatmap(z-score)

The cell line pairs from the patient were also investigated for the tumor stem cell marker gene CD133 and for their migration capability. See the results in the figure below:

Figure 4: CD133 Facs analysis and transwell migration assay of isogenic pairs

Figure 4: CD133 Facs analysis and transwell migration assay of isogenic pairs

_images/R2d2_logo.pngGiven these observations, what origin can you assign to each group?

1.4. Which genes make a difference? Creating signatures

We have identified two different types of cells that occur within the same patient. Neuroblastoma apparently has a heterogenous nature. What genes determine the difference between the two types? We’ll use RNA expression data again but now we will use a predefined, supervised classification in groups to search for genes that characterize this classification best, or in other words, that are differentially expressed between these two groups.

Data used:

  • Mixed Neuroblastoma (MES-ADRN) - Versteeg - 8 - MAS5.0 - u133p2 (same as above)
  • Gene Ontology
  • Broad curated hallmark datasets

Techniques used:

  • mRNA arrays

Analysis used

  • Differential Expression: supervised gene selection
  • Gene Ontology Analysis: Overrepresentation calculation

  • Go to the main page of R2 by clicking the button below

  • In Field 3 choose Find Differential expression between groups and click Next

This dataset has been annotated with ‘cell type’ information. Each sample was assigned to either the MESenchymal or the ADReNergic cell type, in R2 this is called a track.

  • Choose the proper track in the Select a track dropdown. Since we have only 8 samples make sure that the multiple testing correction is set to No correction. (More information on Multiple Testing is here) Click Next twice
  • A list of differentially expressed genes appears with correlation p-value < 0.01 in this dataset is shown. Click on the hyperlinked name of your favorite gene to see its expression in the sample set; try an oppositely correlating gene as well
  • Go back to the window with the differentially expressed genes. This is still open in one of your browser tabs.
  • Click on the Heatmap(zscore) button in the right menu panel; a heatmap shows the expression of the differentially expressed genes for each sample.

_images/R2d2_logo.pngHow is this figure different from the former?

For future use, this list of genes has been stored for you in R2 as signatures (aka genesets or categories). The list has been split into two categories: one set of genes that is highly expressed in the MES type of samples (r2_mesadrn_mes) and one set of genes highly expressed in the ADRN type of samples (r2_mesadrn_adrn).

R2 provides additional analysis for sets of genes that can be accessed from the right panel of menu buttons. As a first analysis step we can check a data resource called the Gene Ontology that provides a tree of systematically ordered biological terms that is used to formally describe the biological role of each gene. The Gene Ontology Analysis tool in R2 calculates for each of the thousands of groups of genes that are annotated with a specific biological term whether your set of choice is over-represented in them.

  • On the page with the differentially expressed genes, select the Gene Ontology Analysis button in the menu on the right

_images/R2d2_logo.pngWhat can you say about the function of the differentially expressed genes?

  • Now scroll down to the end of the page (or click the filter button in the left upper corner of the page) and adapt the settings such that only the Biological Process branch of the Gene Ontology is selected, and select only the genes that are higher expressed in the MES type of cells

_images/R2d2_logo.pngWhat can you say about the function of the group of genes that are upregulated in the MES type of cells?

In R2 there are much more sets of genes that have been found to be implemented in specific processes. The Broad Institute has compiled quite some of these sets of genes that characterize hallmark biological processes.

  • Go back to the window with the differentially expressed genes.
  • Select the Gene set analysis option from the right menu
  • Select the geneset_broad_2015_hallmark geneset and click Next

_images/R2d2_logo.pngWhich hallmark category of genes pops up as most important? Can you explain this?

1.5. Identifying groups: using signatures to classify other datasets

We now have a signature that distinguishes between the two types of cells. We also obtained some hints about functional characteristics of these cells. How does this signature behave in other datasets? Does the same set of genes tell us something about other sets of tumors or cell lines? This is the next step in our analysis.We’ve assembled a more complex dataset by gathering the dataset of the 4 pairs of cell lines, additional neuroblastoma cell lines from the first dataset and publicly available data of non-malignant human neural crest tissue. The neural crest undergoes a mesenchymal transition and gives rise to cell types from the adrenergic lineage.

Data used:

  • A combination of the 8 cell lines above, additional neuroblastoma cell lines and cells from the neural crest lineage (Mixed Neuroblastoma (MES-ADRN-CREST) - Versteeg/Etchevers - 34 - MAS5.0 - u133p2)

Techniques used:

  • mRNA expression data

Analysis used

  • Heatmap analysis

  • Go to the main portal of R2 by clicking the button below; the dataset described above is automatically selected

  • In field 3 select View Geneset
  • Click Next and select geneset_r2provided_genelists
  • Click Next, leave selection as is and click Next
  • Select both signatures that were derived before by CTRL click on the MES (r2_mesadrn_mes) and ADRN (r2_mesadrn_adrn) signatures and click Next

_images/R2d2_logo.pngWhich cell types group together?

_images/R2d2_logo.pngHow does this relate to the earlier observations on cell lineage?

When observing such clear-cut patterns it is good scientific practice to test this in additional datasets. The database of R2 contains an additional dataset consisting of neuroblastoma cell lines that were profiled by a French research team.

  • Click on the button below to go there and perform the same analysis.

_images/R2d2_logo.pngDo you observe similar patterns?

1.6. Using scores for further characterization

The expression patterns of these specific signatures can be used to compare cell types. We can do this by summarizing the expression data of all genes in the signature in each cell type in one value; a signature score. The figure below shows the signature score of the MES part of the signature in a specific sample.

Figure 5: The signature score as calculated for a specific sample in the MES signature.

Figure 5: The signature score as calculated for a specific ample in the MES signature.

R2 has calculated these scores for all samples in both signatures. We’re going to find out how they relate.

Data used:

  • Mixed Neuroblastoma (MES-ADRN-CREST) - Versteeg/Etchevers - 34 - MAS5.0 - u133p2

Techniques used:

  • mRNA expression data

Analysis used

  • Signature scores

  • Go back to the main portal of R2 by clicking the button below.

  • In field 3 choose Relate 2 tracks and click Next
  • First we’ll explore the scores in each signature separately; on the X-axis (Select X track) we’ll use the unique sample id (lab_id) and on the Y-axis the signature score track that R2 has generated for the MES signature (u-34_mesadrn_mes(#)). Click Next.
  • A graph is generated for each sample the signature score for the mesadrn_mes signature is shown, select Color by Track for ColorMode and try different tracks. Click Adjust Settings to view the result.
  • Now select for the Y-axis the ADRN part of the signature, click Adjust Settings to view the result.
  • Now we’re going to compare the signature scores; select the MES signature for the X track
  • If you have time you can also try the Color by Gene ColorMode, choose a gene of interest (Note: the dropdown selection is linked to the database, wait for the proper selections to popup…)

_images/R2d2_logo.pngWhat conclusion would you draw from these figures?

1.7. Finding causes: homing in on transcription factors

Apparently there are two types of cells in Neuroblastoma tumors. Neuroblastoma seems to be a heterogenous tumor. Transcription factors (TFs) are known to determine gene expression programs in cells. These gene expression programs determine the development of the cell. Can we find out which TFs might influence the difference between both of these cell lines?

Data used:

  • Mixed Neuroblastoma (MES-ADRN-CREST) - Versteeg/Etchevers - 34 - MAS5.0 - u133p2
  • Transcription factor annotation from Gene Ontology
  • NCBI (National Center for Biotechnology Information - USA) Gene information database

Techniques used:

  • mRNA expression data

Analysis used

  • Differential expression: supervised gene selection

  • Go back to the main portal of R2 by clicking the button below.

Again we’re going to find out which genes make a difference, but now in a specific subset that has been annotated to have Transcription Factor activity. This is gathered from databases that collect that information from peer reviewed publications.

  • In field 3 select Find Differential expression between groups Click Next
  • Make sure to select the proper track under Select a track. We’re now also going to filter for a specific GeneCategory; select the Transcription factors (TF(945)). Click Next.
  • In the next screen we’re asked to further filter for a specific type of samples to compare, we’re focusing on the difference between ADRN and MES; select these. Click Next.
  • A list of genes appears. Investigate the top 4 by clicking on the hyperlinked gene symbols. This brings you to the expression view of the gene.
  • From here you can also access the NCBI gene database containing additional information on the function of the gene and related scientific publications. Do this by clicking on the hyperlinked GeneID number in the top table. You’ll arrive at a website that gathers all known information on genes; a useful section is the Bibliography containing short summaries of relevant scientific papers.

_images/R2d2_logo.pngArmed with this information, which gene would you choose for further research? Why?

1.8. Proving causes: manipulating cell lines

From experiments it is known that cells can change their nature, some cells exhibit a certain plasticity.

_images/R2d2_logo.pngCan you explain why this is of relevance to cancer?

From experiments in our lab it became evident that the two cell types found in Neuroblastoma were able to switch. After a given period of time cells in dishes changed their nature as was proven by the expression of certain marker proteins on their surface.Now that we have a candidate Transcription Factor we can try to investigate its relevance in this plasticity by manipulating the gene in cell lines we grow in the lab.

_images/R2d2_logo.pngCan you think of ways to manipulate genes in cell lines?

The TF was inducibly expressed in the SKNBE cell line and this was monitored through time for its gene expression using Affymetrix mRNA arrays. The resulting data was added to the dataset we used above for comparison.

_images/R2d2_logo.pngTo which of the cell types does SKNBE belong?

Data used:

  • A combination of the 4 cell line pairs, additional classical Neuroblastoma cell lines, cells from the neural crest lineage and lines that had the TF inducible expressed for increasing periods (Mixed Neuroblastoma (MES-ADRN-Crest-Exp) - Versteeg - 52 - MAS5.0 - u133p2)

Techniques used:

  • Inducible gene expression
  • mRNA expression data

Analysis used

  • Signature score comparison

  • Go to the R2 main page by clicking the button below, the correct dataset will be selected.

  • Select in field 3 the Relate 2 tracks option. R2 has calculated signature scores for all samples in both signatures; in this dataset these tracks are called adrn_score and mes_score. Relate the two tracks, adapt the ColorMode to Color by Track and try the mes_adrn_time track. This track contains information on the time that the PRRX1 gene expression was induced in the SKNBE cell line.

_images/R2d2_logo.pngWhat is your conclusion from this experiment?

1.9. Creating hypotheses: relating to chromatin modification data

Apparently this TF is capable of shifting cells from one state to the other. How can we further determine causal relations and ideally targetable processes in these cancer cells? How is a switch dynamically possible? A growing body of evidence implicates enhancers as key elements defining cell identity but the relationship of these enhancers to intratumoral heterogeneity is unknown. We performed ChIP-Seq analysis of the H3K27ac histone modifications for the isogenic cell line pairs.

Data used:

  • Four MES and five ADRN neuroblastoma cell lines, including three isogenic cell line pairs.

Techniques used:

  • ChIP-Seq analysis


  • Chapter 1.8 Weinberg


  • Genome Browser: analyzing histone modifications marking active enhancers
  • Differential Expression

_images/R2d2_logo.pngCan you explain what the goal of this experiment was?

First we’ll check one of the HAND genes, known to play a role in the development of the sympatho-adrenal lineage from the neural crest.

_images/R2d2_logo.pngWhat do you expect for the H3K27ac signals?

  • Click on the button below to show the ChIP-Seq data for HAND1 in the four mesenchymal and five adrenergic neuroblastoma cell lines. For your convenience the signals are colored according to the type (MES or ADRN) of cell line.

Regions encoding genes are drawn at the bottom of the graph. When in red they’re encoded in the reverse direction, coding exons are darker.

_images/R2d2_logo.pngCan you explain this graph? What do you expect for the expression of this gene?

The chromatin state is especially important for transcription factors; we’ll re-visit the list of transcription factors that are differentially expressed between the MES and ADRN cell lines.

_images/R2d2_logo.pngWhy are Transcription Factors of interest in this setting?

  • Perform the differential expression analysis again by clicking on the button below

  • Use both expression analysis and the enhancer data in the genomebrowser to decide which transcription factors would be worthwhile to further investigate. In the genomebrowser you can type the name of the gene in the left upper corner textfield. To further explore the larger region around the gene you can use the zoom buttons at the top of the page

_images/R2d2_logo.pngWhich Transcription Factor would you consider for further study?

1.10. Suggesting therapy

  • With the current new knowledge that you derived above, can you think of a strategy to use the fact that neuroblastoma is a heterogenous tumor consisting of a mesenchymal, motile cell type and a adrenergic, differentiated cell type for therapeutic options? This is an open question, so be creative, you might find something interesting! If you want, you can follow the suggestions below.
  • Use the button of 1.9 to perform a differential expression analysis. This time explore other gene categories that could be interesting for drug development. Look at the expression profiles of some genes of your choice.
    • Hint: There is a category “drugtargets” in R2 to select druggable proteins; you can select this in the same dropdown where the TF selection was done.
    • Another very interesting gene category is the “kinase” category, this contains known kinases that have active roles in pathways.
  • The first button in 1.9 takes you to the Genome Browser at the position of the HAND1 gene. In the upper left corner of that page, you can fill in other genes of interest to look at the ChIP-Seq profiles of the samples at these locations of the genome. Do this for the genes whose expression profiles you just have studied. Try to find genes that show consistent chromatin modification profiles for the one type of neuroblastoma cell lines and a different consistent profile for the other type.
  • Knowledge about pathways can be exploited as well.
  • The NCBI database can provide additional information from literature about the genes of interest.

_images/R2d2_logo.pngWhich strategy do you suggest?