1. Introduction

Intra-tumor heterogeneity (ITH) is now thought to be a key factor in therapeutic failure and drug resistance, and it has attracted increasing attention in cancer research. Here, we present an R package, MesKit, for characterizing cancer genomic ITH and inferring the history of tumor evolution. MesKit provides a wide range of analyses, including ITH evaluation, enrichment analysis, mutational signature analysis and clonal evolution analysis, by implementing well-established computational and statistical methods. The source code and documentation are freely available on GitHub (https://github.com/Niinleslie/MesKit). We also developed a Shiny application to make analysis and visualization easier.

1.1 Citation

In the R console, enter citation("MesKit").

MesKit: A Tool Kit for Dissecting Cancer Evolution of Multi-region Tumor Biopsies through Somatic Alterations (In production)

2. Prepare input data

To analyze data with MesKit, you need to provide:

  • A MAF file (*.maf / *.maf.gz) of multi-region samples from patients. Required
  • A clinical file containing the clinical data of tumor samples from each patient. Required
  • Cancer cell fraction (CCF) data of somatic mutations. Optional but recommended
  • A segmentation file. Optional
  • GISTIC output files. Optional

Note: Tumor_Sample_Barcode values should be consistent across all input files.
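Barcode mismatches between input files can be caught before calling readMaf by comparing the sets of barcodes in each table. The sketch below uses only base R and assumes the files have already been read into data frames; the helper check_barcodes is hypothetical and not part of MesKit.

```r
# Hypothetical helper (not part of MesKit): compare the Tumor_Sample_Barcode
# sets of the MAF table and another input table (clinical, CCF, segments).
check_barcodes <- function(maf_df, other_df) {
  maf_ids   <- unique(maf_df$Tumor_Sample_Barcode)
  other_ids <- unique(other_df$Tumor_Sample_Barcode)
  list(
    missing_in_other = setdiff(maf_ids, other_ids),  # in MAF only
    missing_in_maf   = setdiff(other_ids, maf_ids),  # in the other table only
    consistent       = setequal(maf_ids, other_ids)
  )
}

# Toy tables using barcodes from the examples in this section
maf_df  <- data.frame(Tumor_Sample_Barcode = c("V402_P_1", "V402_P_1", "V402_P_2"))
clin_df <- data.frame(Tumor_Sample_Barcode = c("V402_P_1", "V402_P_2", "V402_BM_1"))

res <- check_barcodes(maf_df, clin_df)
res$missing_in_maf
```

Here res$missing_in_maf flags "V402_BM_1", a barcode present in the clinical table but absent from the MAF table.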

2.1 MAF files

Mutation Annotation Format (MAF) files are tab-delimited text files that aggregate mutation information from VCF files. The input MAF file can be gzip-compressed, and the allowed values of the Variant_Classification column are listed on the Mutation Annotation Format page.

MesKit requires the following fields in the MAF file.

Mandatory fields:

Hugo_Symbol, Chromosome, Start_Position, End_Position, Variant_Classification, Variant_Type, Reference_Allele, Tumor_Seq_Allele2, Ref_allele_depth, Alt_allele_depth, VAF, Tumor_Sample_Barcode
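A missing mandatory column is easy to detect before import. This minimal base R sketch lists the mandatory fields above and reports which ones a data frame lacks; missing_maf_cols is a hypothetical helper, not a MesKit function.

```r
# Mandatory MAF columns, as listed above
required_maf_cols <- c("Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
                       "Variant_Classification", "Variant_Type", "Reference_Allele",
                       "Tumor_Seq_Allele2", "Ref_allele_depth", "Alt_allele_depth",
                       "VAF", "Tumor_Sample_Barcode")

# Hypothetical helper (not part of MesKit): mandatory columns absent from a MAF table
missing_maf_cols <- function(maf_df) setdiff(required_maf_cols, colnames(maf_df))

# Toy single-row MAF (first row of the example below) with the VAF column left out
toy_maf <- data.frame(
  Hugo_Symbol = "KLHL17", Chromosome = "1",
  Start_Position = 899515, End_Position = 899515,
  Variant_Classification = "Silent", Variant_Type = "SNP",
  Reference_Allele = "C", Tumor_Seq_Allele2 = "T",
  Ref_allele_depth = 85, Alt_allele_depth = 1,
  Tumor_Sample_Barcode = "V402_P_1"
)
missing_maf_cols(toy_maf)  # "VAF"
```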

Note:

  • The Tumor_Sample_Barcode of each sample should be unique.
  • The VAF (variant allele frequency) can be on either a 0-1 or a 0-100 scale.
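In the example below, VAF matches Alt_allele_depth / (Ref_allele_depth + Alt_allele_depth): for KLHL17, 1 / (85 + 1) is about 0.012. The sketch shows how VAF can be recomputed from depths and rescaled to 0-1. Both helpers are hypothetical, not MesKit's internal handling, and the rescaling rule is a heuristic assumption: it misses percentages when every value is at most 1, so knowing the scale of your data is safer.

```r
# Hypothetical helper (not part of MesKit): VAF from read depths;
# returns NA when there is no coverage at all.
vaf_from_depth <- function(ref_depth, alt_depth) {
  total <- ref_depth + alt_depth
  ifelse(total > 0, alt_depth / total, NA_real_)
}

# Hypothetical helper: put VAF on the 0-1 scale.
# Heuristic assumption: any value above 1 means the column holds percentages.
normalize_vaf <- function(vaf) {
  if (any(vaf > 1, na.rm = TRUE)) vaf / 100 else vaf
}

vaf_from_depth(ref_depth = c(85, 53, 6), alt_depth = c(1, 0, 0))  # approx. 0.0116, 0, 0
normalize_vaf(c(1.2, 0, 45))                                      # 0.012 0.000 0.450
```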

Example MAF file

##   Hugo_Symbol Chromosome Start_Position End_Position Variant_Classification
## 1      KLHL17          1         899515       899515                 Silent
## 2       MFSD6          2      191301342    191301342      Missense_Mutation
## 3    KIAA0319          6       24596837     24596837      Missense_Mutation
##   Variant_Type Reference_Allele Tumor_Seq_Allele2 Ref_allele_depth
## 1          SNP                C                 T               85
## 2          SNP                G                 A               53
## 3          SNP                C                 T                6
##   Alt_allele_depth   VAF Tumor_Sample_Barcode
## 1                1 0.012             V402_P_1
## 2                0 0.000             V750_P_2
## 3                0 0.000            V750_BM_3

2.2 Clinical data files

The clinical data file contains clinical information about each patient and their tumor samples. The mandatory fields are Tumor_Sample_Barcode, Tumor_ID, Patient_ID, and Tumor_Sample_Label.

Example clinical data file

##   Tumor_Sample_Barcode Tumor_ID Patient_ID Tumor_Sample_Label
## 1             V402_P_1        P       V402                P_1
## 2             V402_P_2        P       V402                P_2
## 3             V402_P_3        P       V402                P_3
## 4             V402_P_4        P       V402                P_4
## 5            V402_BM_1       BM       V402               BM_1
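In the example above, each barcode is the patient ID joined to the sample label (V402 + P_1 gives V402_P_1). That is the naming convention of the example data, not necessarily a MesKit requirement. The sketch below assembles such a table in base R and writes it out; it assumes a tab-delimited text file with a header row is an acceptable clinical file.

```r
# Sketch: build the clinical table shown above
clin <- data.frame(
  Patient_ID         = "V402",  # recycled to all five rows
  Tumor_ID           = c("P", "P", "P", "P", "BM"),
  Tumor_Sample_Label = c("P_1", "P_2", "P_3", "P_4", "BM_1")
)
# Example-data convention: barcode = "<Patient_ID>_<Tumor_Sample_Label>"
clin$Tumor_Sample_Barcode <- paste(clin$Patient_ID, clin$Tumor_Sample_Label, sep = "_")
# Put the columns in the same order as the example
clin <- clin[, c("Tumor_Sample_Barcode", "Tumor_ID", "Patient_ID", "Tumor_Sample_Label")]

# Assumption: a tab-delimited file with a header row
clin_path <- tempfile(fileext = ".txt")
write.table(clin, clin_path, sep = "\t", quote = FALSE, row.names = FALSE)
```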

2.3 CCF files

By default, the input CCF file has six mandatory fields: Patient_ID, Tumor_Sample_Barcode, Chromosome, Start_Position, CCF and CCF_Std/CCF_CI_High (the last is required when identifying clonal/subclonal mutations). The Chromosome field must use the same format in your MAF file and CCF file (both plain numbers or both prefixed with “chr”). Notably, Reference_Allele and Tumor_Seq_Allele2 are also required if your CCF file includes INDELs.
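If one file uses "chr1" and the other uses "1", the chromosome names can be harmonized before import. The helpers strip_chr and add_chr below are hypothetical base R utilities, not MesKit functions.

```r
# Hypothetical helpers (not part of MesKit): harmonize chromosome naming
strip_chr <- function(chrom) sub("^chr", "", as.character(chrom), ignore.case = TRUE)
add_chr   <- function(chrom) paste0("chr", strip_chr(chrom))

maf_chrom <- c("1", "2", "X")
ccf_chrom <- c("chr1", "chr2", "chrX")

identical(strip_chr(ccf_chrom), maf_chrom)  # TRUE
add_chr(c("1", "chrX"))                     # "chr1" "chrX"
```

Because add_chr strips any existing prefix first, it is safe on mixed input.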

Example CCF file

##   Patient_ID Tumor_Sample_Barcode Chromosome Start_Position   CCF CCF_Std
## 1       V402             V402_P_1          1         899515 0.031   0.126
## 2       V402             V402_P_1          1         982996 0.031   0.117
## 3       V402             V402_P_1          1        2452742 0.125   0.239
## 4       V402             V402_P_1          1        6203883 0.422   0.750
## 5       V402             V402_P_1          1       11106655 0.094   0.324
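Conceptually, each CCF row is matched to a MAF mutation by sample and genomic position. The base R sketch below illustrates that matching with a left join on Tumor_Sample_Barcode, Chromosome and Start_Position. It is not MesKit's internal code. The second MAF position is made up so that one mutation has no CCF estimate. For INDELs, Reference_Allele and Tumor_Seq_Allele2 would be added to the keys, as noted above.

```r
# Toy MAF rows: the first matches the examples; the second position is made up
maf_df <- data.frame(
  Tumor_Sample_Barcode = c("V402_P_1", "V402_P_1"),
  Chromosome           = c("1", "1"),
  Start_Position       = c(899515, 1000000),
  Hugo_Symbol          = c("KLHL17", NA)
)
# Two rows from the example CCF file above
ccf_df <- data.frame(
  Patient_ID           = "V402",
  Tumor_Sample_Barcode = "V402_P_1",
  Chromosome           = c("1", "1"),
  Start_Position       = c(899515, 982996),
  CCF                  = c(0.031, 0.031),
  CCF_Std              = c(0.126, 0.117)
)

keys <- c("Tumor_Sample_Barcode", "Chromosome", "Start_Position")
# Left join: every MAF row is kept; mutations without a CCF estimate get NA
annotated <- merge(maf_df, ccf_df[, c(keys, "CCF", "CCF_Std")], by = keys, all.x = TRUE)
annotated
```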

2.4 Segmentation files

The segmentation file is a tab-delimited file with the following columns:

  • Patient_ID - patient ID
  • Tumor_Sample_Barcode - tumor sample barcode of samples
  • Chromosome - chromosome name or ID
  • Start_Position - genomic start position of segments (1-indexed)
  • End_Position - genomic end position of segments (1-indexed)
  • SegmentMean/CopyNumber - segment mean value or absolute integer copy number
  • Minor_CN - copy number of the minor allele. Optional
  • Major_CN - copy number of the major allele. Optional
  • Tumor_Sample_Label - the specific label of each tumor sample. Optional

Note: Positions are in base pair units.

Example Segmentation file

##   Patient_ID Tumor_Sample_Barcode Chromosome Start_Position End_Position
## 1       V402             V402_P_1          1              1      1650882
## 2       V402             V402_P_1          1        1650883     33159352
## 3       V402             V402_P_1          1       33159353     33610373
## 4       V402             V402_P_1          1       33610374     88509894
## 5       V402             V402_P_1          1       88509895     89462108
##   CopyNumber Minor_CN Major_CN Tumor_Sample_Label
## 1          2        1        1                P_1
## 2          1        0        1                P_1
## 3          3        0        3                P_1
## 4          2        1        1                P_1
## 5          2        0        2                P_1
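Because both segment ends are 1-indexed and inclusive, a segment's length is End_Position - Start_Position + 1. The sketch below applies this to the example rows and labels each segment. It assumes a diploid baseline of 2 copies and a file with absolute integer CopyNumber rather than SegmentMean. The labels are an illustrative assumption, not MesKit's own classification.

```r
# Example segments for V402_P_1 (from the table above)
seg_df <- data.frame(
  Start_Position = c(1, 1650883, 33159353, 33610374, 88509895),
  End_Position   = c(1650882, 33159352, 33610373, 88509894, 89462108),
  CopyNumber     = c(2, 1, 3, 2, 2),
  Minor_CN       = c(1, 0, 0, 1, 0)
)

# Inclusive, 1-indexed coordinates: add 1 to the difference
seg_df$Length <- seg_df$End_Position - seg_df$Start_Position + 1

# Assumption: diploid baseline; two copies with no minor allele is copy-neutral LOH
seg_df$Status <- ifelse(seg_df$CopyNumber > 2, "gain",
                 ifelse(seg_df$CopyNumber < 2, "loss",
                 ifelse(seg_df$Minor_CN == 0, "CN-LOH", "neutral")))

# Consecutive segments should be adjacent (no gaps or overlaps)
adjacent <- all(tail(seg_df$Start_Position, -1) == head(seg_df$End_Position, -1) + 1)

seg_df[, c("Length", "CopyNumber", "Minor_CN", "Status")]
```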

3. Installation

Via Bioconductor

# Installation of MesKit requires Bioconductor version 3.12 or higher
if (!requireNamespace("BiocManager", quietly = TRUE)){
  install.packages("BiocManager")
}   
# The following initializes usage of Bioc 3.12
BiocManager::install(version = "3.12")
BiocManager::install("MesKit")

Via GitHub

Install the latest version of this package by typing the commands below in the R console:

if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("Niinleslie/MesKit")

4. Start with the Maf object

The readMaf function creates Maf/MafList objects by reading MAF files, clinical files and cancer cell fraction (CCF) data (optional but recommended). The refBuild parameter sets the Homo sapiens reference genome version ("hg18", "hg19" or "hg38"). Set use.indel.ccf = TRUE when your ccfFile contains INDELs in addition to SNVs.

library(MesKit)
maf.File <- system.file("extdata", "CRC_HZ.maf", package = "MesKit")
ccf.File <- system.file("extdata", "CRC_HZ.ccf.tsv", package = "MesKit")
clin.File <- system.file("extdata", "CRC_HZ.clin.txt", package = "MesKit")
# Maf object with CCF information
maf <- readMaf(mafFile = maf.File,
               ccfFile = ccf.File,
               clinicalFile = clin.File,
               refBuild = "hg19")

5. Mutational landscape

5.1 Mutational profile

To explore genomic alterations during cancer progression with a multi-region sequencing approach, MesKit provides the classifyMut function to categorize mutations. Mutations are classified by their sharing pattern or their clonal status (CCF data required), as specified by the class option. Additionally, classByTumor can be used to reveal the mutational profile within tumors.
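The idea behind a sharing-pattern classification can be illustrated with a presence matrix of mutations by samples. A mutation found in every sample is labeled public, one found in several but not all is shared, and one found in a single sample is private. The base R sketch below is not the implementation of classifyMut. The helper classify_sharing, the toy VAF matrix and the presence rule (VAF above min_vaf) are hypothetical, and MesKit's labels and thresholds may differ.

```r
# Toy VAF matrix: rows are mutations, columns are samples (hypothetical data)
vaf <- matrix(c(0.30, 0.25, 0.40,
                0.20, 0.00, 0.15,
                0.00, 0.10, 0.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("mut1", "mut2", "mut3"),
                              c("P_1", "P_2", "BM_1")))

# Hypothetical helper (not MesKit's classifyMut)
# Assumption: a mutation is "present" in a sample when its VAF exceeds min_vaf
classify_sharing <- function(vaf, min_vaf = 0) {
  n_present <- rowSums(vaf > min_vaf)
  n_samples <- ncol(vaf)
  ifelse(n_present == 0, NA_character_,           # absent everywhere
  ifelse(n_present == n_samples, "Public",        # in every sample
  ifelse(n_present == 1, "Private", "Shared")))   # one sample vs. several
}

classify_sharing(vaf)  # mut1 "Public", mut2 "Shared", mut3 "Private"
```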

# Driver genes of CRC collected from IntOGen (https://www.intogen.org/search, v.2020.2)
driverGene.file <- system.file("extdata", "IntOGen-DriverGenes_COREAD.tsv", package = "MesKit")
driverGene <- as.character(read.table(driverGene.file)$V1)
mut.class <- classifyMut(maf, class = "SP", patient.id = "V402")
head(mut.class)

The plotMutProfile function visualizes the mutational profile of tumor samples.

plotMutProfile(maf, class = "SP", geneList = driverGene, use.tumorSampleLabel = TRUE)

5.2 CNA profile

The plotCNA function characterizes the CNA landscape across samples based on copy number data from segmentation algorithms. In addition, MesKit can integrate GISTIC2 results, which can be obtained from http://gdac.broadinstitute.org. Please make sure the genome version used to generate these results is consistent with the refBuild of the Maf/MafList object.

# Path to the segment file
segCN <- system.file("extdata", "CRC_HZ.seg.txt", package = "MesKit")
# Paths to the GISTIC output files
all.lesions <- system.file("extdata", "COREAD_all_lesions.conf_99.txt", package = "MesKit")
amp.genes <- system.file("extdata", "COREAD_amp_genes.conf_99.txt", package = "MesKit")
del.genes <- system.file("extdata", "COREAD_del_genes.conf_99.txt", package = "MesKit")
seg <- readSegment(segFile = segCN, gisticAllLesionsFile = all.lesions,
                   gisticAmpGenesFile = amp.genes, gisticDelGenesFile = del.genes)
## --Processing COREAD_amp_genes.conf_99.txt
## --Processing COREAD_del_genes.conf_99.txt
## --Processing COREAD_all_lesions.conf_99.txt
seg$V402[1:5, ]
plotCNA(seg, patient.id = c("V402", "V750", "V824"), use.tumorSampleLabel = TRUE)