What is Immunogenomics?

Cancer is complex. Cancer cells become deviant, growing and functioning abnormally. They cause disease symptoms and sabotage the function of the organs they invade. Our immune response to cancer is equally complex. When cancer turns fatal, it often means the immune system is no longer able to hold back tumor growth.

Our researchers and clinicians are using immunogenomics—the combined parallel study of the genomics of tumor and immune cells—to better understand the body’s immune response to cancer.

Unfortunately, immunotherapies, which boost the strength and specificity of immune cells trained to fight disease, are not currently universally effective. Immunotherapies work very well on some types of cancer, but not others—and very well in some patients, but not others. Our team is hard at work to understand why this is and how to make immunotherapies effective for every patient.

Among the other questions we aim to answer are: How does immunotherapy change the immune cell repertoire, and thus the selective pressure leveled against tumor cells? How does the tumor cell population evolve as this pressure changes? What is going on inside a tumor when anti-tumor immunity fails?

Ultimately, we seek to uncover how to tailor immunotherapy so that every patient experiences the greatest possible benefit with the least risk. We are working to develop new immunotherapy strategies to improve outcomes.

Principles of Immunotherapy & Immunogenetics

In cancer, normal cells lose the ability to control their own growth and become immortal. While this can sometimes occur as a result of rare inherited mutations in critical genes, these pro-cancer cell changes more often occur gradually as a cell sustains damage to its DNA over time. DNA serves as a blueprint for how a cell should function, so changes in DNA, called mutations, can be dangerous.

In addition to fighting off infections due to viruses and bacteria, our immune system also surveils our body to recognize and protect us from the dangerous effects of cancer cells. Both mutations that help cancer cells grow (“driver mutations”) and those that do not (“passenger mutations”) can cause cells to express new versions of native proteins. Because these proteins do not occur anywhere else in the body, the immune system can recognize them as foreign. These foreign proteins, recognizable by our immune system, are called neo-antigens. The key immune cells that protect us from cancer by recognizing these neo-antigens are called T cells.

Cancers evolve through characteristic interactions with the immune system, referred to as the “three Es”: elimination, equilibrium and escape. Elimination occurs when the immune system destroys an abnormal growth before it becomes a cancer. Equilibrium refers to scenarios in which the immune system keeps the tumor from growing but cannot eliminate it completely. Lastly, when the immune system is either suppressed by the tumor or no longer recognizes it as foreign, immune “escape” occurs, resulting in net growth of the cancer cell population.

Immunotherapies reinvigorate the immune system when a tumor is escaping, helping to restore equilibrium or even eliminate the tumor. Immunogenomics can provide information about the landscape of mutations and neo-antigens in a tumor as it changes over time (including during the course of therapy), helping us to better understand the biological mechanisms that make immunotherapy successful. In parallel, immunogenomics can also help us understand the specificity and strength of the immune response as it acts upon the tumor. Research in this area will help elucidate how a cancer escapes immune control and inform the development of better therapies for patients.

[Caption] Mutual selection of tumor cells and immune cells. Some mutations that occur in a tumor cell population (colored dots) give rise to mutant proteins that are immunogenic—recognizable by immune cells, such as lymphocytes. When tissue-infiltrating lymphocytes (TILs) encounter these neo-antigens, those with receptors that bind the neo-antigens will proliferate and become activated. These activated immune cells are then capable of killing the tumor cells that express the neo-antigen. Because not all neo-antigens are shared by all tumor cells in the population, this process often leads not to complete elimination of the tumor, but to tumor evolution (bottom right). Other tumor antigens now need to be targeted for immune killing or else the tumor will escape immune control and grow. Gene expression by the tumor cells and the surrounding tumor microenvironment is often a critical variable shaping whether the immune response is strong enough to eliminate the cancer, or weak, allowing escape.

Cancer is an evolutionary process and tumor cells can accumulate hundreds of mutations as they grow and divide. Some of these mutations are immunogenic—recognizable as “non-self” by our immune system. For a mutation to be immunogenic, the mutated protein has to be processed inside the cancer cell, and the resulting mutated peptide (called a neo-peptide) must bind to one of the patient’s major histocompatibility complex (MHC) class I molecules in order to be presented on the cell surface. Then, a T cell must be able to recognize the neo-peptide with its T cell receptor (TCR) in order to subsequently trigger an immune response. The physical feature that helps the TCR to recognize the neo-peptide is called an epitope. Increasing evidence suggests that the immune response to these mutation-derived antigens is very specific and critical for a successful response to immunotherapies, including immune checkpoint blockade and adoptive T cell therapy. (For background and research, see the following additional references: 1, 2, 3, 4, 5, 6, 7.)

[Caption] Antigen-processing machinery in normal and cancer cells. In both normal and tumor cells, degraded bits of proteins (peptides) are transported to the endoplasmic reticulum for loading onto the MHC class I molecules. The MHC-peptide complexes then move to the cell surface where they are monitored by T cells. If an epitope is recognized by the TCR, it leads to T cell activation, T cell differentiation and, ultimately, death of the epitope-presenting cells. These epitopes can be specific to tumor cells (neo-epitopes) or epitopes derived from normal proteins that are expressed at unusually high levels in tumor cells.

When a T cell recognizes an antigen with its T cell receptor, it activates, or “turns on,” and begins to proliferate. Once the number of activated T cells expressing that specific receptor increases, they can work together to kill the invading tumor cells or other threat.

Any immune reaction takes energy to sustain, and it can cause damage to healthy tissues and hinder the body’s ability to react to other challenges. Importantly, however, the immune system has mechanisms to maintain balance.

When a T cell becomes activated, it begins to express other receptors on its cell surface that serve as “off switches.” These switches can be triggered by other cells (such as other immune cells, healthy tissue and even tumor cells) to shut down the killing action of the activated T cells. Furthermore, the longer a T cell is exposed to the antigen it recognizes, the weaker its ability to kill becomes, a phenomenon called exhaustion. It is thought that both of these mechanisms exist because most threats to the immune system are acute, which means they occur suddenly (like an infection) and are cured by the immune response in days or weeks. Tumors (as well as some infectious diseases), on the other hand, pose a chronic challenge, in which the immune cells are stimulated by the same antigens over longer periods of time, like months or even years. The T cells specific to that threat may become exhausted or be actively suppressed by both tumor cells and healthy cells, which trigger the T cells’ “off switches” to protect themselves.

Many immunotherapies used against cancer are designed to protect or rescue T cells from this exhaustion or suppression, allowing tumor-specific immune cells to fully regain their killing functions. We are using high-throughput sequencing of both the immune cells and the tumor cells to: 1) improve immunodiagnostics for determining what aspect of a patient’s immune system is functioning suboptimally; 2) describe how the mutations in the tumor population change when selectively killed by rescued immune cells; 3) understand why these immunotherapies work better in some patients than in others; and 4) devise precision combinations of immunotherapies with chemotherapy and radiation therapy to maximize the killing of tumor cells while minimizing the damage to healthy tissues in every patient.

Core Technologies

Mutations accumulate in cells due to environmental insults, such as UV light and cigarette smoke, and from sporadic DNA replication errors that occur during normal cell proliferation. Mutations that confer the ability to proliferate unchecked by the body’s normal regulatory systems are often referred to as driver mutations. Cells with driver mutations can become abundant in the tumor population. Every time these cells divide, there is a chance that additional mutations will occur due to DNA copying errors. Thus, in addition to driver mutations, tumor cells often accumulate random damage to many other parts of the genome, including those that do not accelerate cancer’s growth (called passenger mutations).

The mutational landscape of a tumor is composed of both driver and passenger mutations, which can be identified using high-throughput next-generation sequencing. Studying the number of each, their abundance in the population and which mutations seem to have evolved together can reveal key information about the selective pressure the tumor is under (including competing for limited resources like nutrients and oxygen, struggling to maintain essential cell processes despite rapid growth, or being attacked by the immune system) and can inform precise combinations of therapies to target the genetic and immunogenic weaknesses of the tumor.

[Caption] Our current mutation-calling pipeline implements multiple state-of-the-art approaches to increase the confidence of analysis.

We use whole-exome sequencing, whole-genome sequencing and targeted gene sequencing to identify the genomic factors affecting antitumor immune activity. Briefly, our refined pipeline maps raw sequence reads to the human reference genome; annotates the positions of insertions, deletions, and nucleotide variations; and removes artifacts from library preparation.


We are interested in understanding the clonal composition of tumors. A clone is a cluster of cells that shares the same mutations, possibly due to a shared lineage. When a tumor contains many distinct lineages, it is called “subclonal.” These distinctly arising subclones can accumulate new mutations that provide growth advantages, allowing them to out-grow less competitive subclones. Over time, the most competitive subclones make up a higher overall proportion of the tumor.

Not all subclones in a tumor necessarily respond to immunotherapy the same way, however. Some subclones may carry mutations that cause a stronger immune response than others. Therefore, it is important to understand the clonal composition of tumors in order to design therapeutic strategies that target enough of the tumor to perturb its growth at a clinically measurable level.

We use genome sequencing to estimate the relative frequency of cells within a tumor that carry a mutation. For each mutation, we calculate the cancer cell fraction (CCF) based on the variant allele frequency of the mutation, its copy number and the sample’s purity. CCF analysis can help us identify subclones of cells that develop independently over the lifetime of a tumor, and deduce the relationship between the fitness of those subclones relative to others and their susceptibility to immune targeting.
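
As a rough illustration of how these three quantities combine, the sketch below uses one widely used approximation and is not a description of our pipeline. It assumes that normal cells carry two copies of the locus and that the mutation sits on a known number of copies in each mutated cell (the `multiplicity` parameter, a hypothetical input introduced here). Under those assumptions, the expected variant allele frequency is `purity * multiplicity * CCF / (purity * tumor_copy_number + 2 * (1 - purity))`, which can be solved for the CCF.

```python
def cancer_cell_fraction(vaf, purity, tumor_copy_number, multiplicity=1):
    """Estimate the fraction of cancer cells that carry a mutation.

    Assumptions (illustrative, not taken from the text):
    - normal cells in the sample are diploid at this locus;
    - every mutated cell carries the mutation on `multiplicity` copies.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 0 <= vaf <= 1:
        raise ValueError("vaf must be in [0, 1]")
    if multiplicity < 1 or tumor_copy_number < multiplicity:
        raise ValueError("need 1 <= multiplicity <= tumor_copy_number")

    # Average number of copies of the locus per cell in the mixed sample
    # (tumor cells weighted by purity, diploid normal cells by 1 - purity).
    average_copies = purity * tumor_copy_number + 2 * (1 - purity)

    ccf = vaf * average_copies / (purity * multiplicity)
    # Sampling noise can push the estimate above 1; cap it at 1.
    return min(ccf, 1.0)
```

For example, a mutation observed at a variant allele frequency of 0.2 in a sample that is 80% tumor, at a locus with two copies in the tumor, gives a CCF of about 0.5. Under these assumptions, roughly half of the cancer cells carry it, which is consistent with a subclonal mutation.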

[Caption] In a tumor cell population, particular mutations (colored circles) are often found in subsets of cells; any given tumor cell contains some but not all of the mutations observed in the population as a whole. The efficacy of an immunotherapy that bolsters the T cell response to a particular mutated tumor protein may be strongly influenced by how much of the tumor cell population expresses that mutant protein. (Top) Immunotherapy #1 enhances the T cell response to the mutant protein A (pink), which occurs in 50% of tumor cells. Thus, when immunotherapy #1 allows these T cells to become activated and kill their targets, mutation A is eliminated from the tumor, but the other 50% of tumor cells remain. (Bottom) Immunotherapy #2 enhances the T cell response to the mutant protein B (purple), which occurs in 75% of tumor cells, so treatment results in only 25% of tumor cells (those without mutation B) persisting. By measuring the abundance of mutations in a tumor cell population over time, including during therapy, we can learn about how mutations are linked. For example, when one mutation disappears or becomes more common, which other mutations go with it? We can also determine how immunotherapies are acting on the immune response. Do some therapies only bolster T cell responses to a small number of antigens, while others support more broad T cell stimulation? Together this information about both the tumor target and the nature of the stimulated immune response can be used to more precisely design therapy for each patient.

Neo-Antigen Prediction

A major obstacle to the development of a strong, effective immune response to a growing tumor is the fact that tumor cells are very similar to healthy tissue. Antigens that arise in tumor cells due to mutations (neo-antigens) allow the immune system to recognize those tumor cells as non-self and can thereby trigger a tumor-specific immune response. It is thought that the number of neo-antigens present in a tumor is a crucial factor determining whether an immunotherapy will be successful at marshaling an effective antitumor immune response.

We are actively developing novel computational approaches to identify neo-antigens in human cancers. Our current method utilizes the same somatic mutation-calling pipeline as described above (see Genomic Sequencing), followed by neo-epitope analysis.

We are developing algorithms for predicting neo-epitopes. A typical algorithm translates all mutations identified by the genomic mutation pipeline, generates candidate peptides that would contain the mutations, and predicts the ability of the peptides to bind to MHC molecules. Peptides for which the mutated version is predicted to be more strongly presented than the wild type are considered potential neo-antigens.
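
The candidate-generation and comparison steps can be sketched as follows. This is a minimal illustration for a single amino-acid substitution, not our algorithm. The peptide length of 9 is an assumed default, and `predict_binding` is a hypothetical caller-supplied function that stands in for whatever MHC-binding predictor is used (here, a higher score means stronger predicted binding).

```python
def candidate_peptides(protein, position, mutant_residue, length=9):
    """Return (wild_type, mutant) peptide pairs that span a substitution.

    `position` is the 0-based index of the substituted residue in `protein`.
    Every window of `length` residues that covers the position is returned.
    """
    if not 0 <= position < len(protein):
        raise ValueError("position is outside the protein")

    mutant = protein[:position] + mutant_residue + protein[position + 1:]

    # First and last window start such that the window covers `position`
    # and stays inside the protein.
    first_start = max(0, position - length + 1)
    last_start = min(position, len(protein) - length)

    pairs = []
    for start in range(first_start, last_start + 1):
        end = start + length
        pairs.append((protein[start:end], mutant[start:end]))
    return pairs


def potential_neoantigens(pairs, predict_binding):
    """Keep mutant peptides scored higher than their wild-type counterpart.

    `predict_binding` is a hypothetical callable: peptide -> score,
    where a higher score means stronger predicted MHC binding.
    """
    return [mut for wt, mut in pairs if predict_binding(mut) > predict_binding(wt)]
```

A substitution in the interior of a protein yields up to nine overlapping 9-mers, each paired with its wild-type counterpart so that the two can be scored side by side.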

To better understand the specific interplay between a patient’s mutations and the immune system, mutant peptides are systematically tested for immunogenicity, that is, the ability to activate T cells taken from the same patient. Results of this type of antigen screening can help in the creation of more personalized immunotherapies, such as tumor-specific vaccines or adoptive T cell therapies. Furthermore, we seek to understand the relative contributions of different types of mutations and antigens to effective immune responses, with the ultimate goal of making patient-specific therapies more precise.

[Caption] In order to maximize screening efficiency, plasmids encoding multiple tandem minigenes (TMGs) are generated. A single minigene consists of the DNA encoding a somatic mutation flanked on both sides by twelve amino acids from the wild type source protein. Up to ten minigenes are strung together to generate the TMGs used in screening. In vitro transcribed mRNA is then introduced into autologous dendritic cells (DCs) via electroporation to enable processing and HLA-presentation of the somatic mutation-containing peptides. Patient-derived T cells are co-cultured with TMG-transfected DCs. Neo-antigen peptide-induced T cell activation is quantified via detection of cytokine (e.g., interferon gamma) production using the highly sensitive ELISpot assay. Results are deconvolved by back-mutating (to wild type) each of the ten mutations contained in a reactive TMG and testing each for cytokine production in the co-culture assay described above. Intracellular cytokine staining is used for orthogonal validation of any positive hits from a minigene antigen screen.
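
The layout of a tandem minigene described in this caption can be sketched at the amino-acid level. This is a simplified illustration only: real constructs are DNA, and design details such as linkers or codon choice are not covered in the text and are not modeled here. The function names are hypothetical.

```python
def minigene(protein, position, mutant_residue, flank=12):
    """Mutated residue flanked by up to `flank` wild-type residues per side.

    `position` is the 0-based index of the substituted residue.
    Flanks are shorter when the mutation lies near either end of the protein.
    """
    if not 0 <= position < len(protein):
        raise ValueError("position is outside the protein")
    start = max(0, position - flank)
    end = min(len(protein), position + flank + 1)
    return protein[start:position] + mutant_residue + protein[position + 1:end]


def tandem_minigenes(minigenes, per_construct=10):
    """String minigenes together, at most `per_construct` per construct."""
    return [
        "".join(minigenes[i:i + per_construct])
        for i in range(0, len(minigenes), per_construct)
    ]
```

With twelve flanking residues on each side, a minigene for an interior mutation encodes 25 amino acids, so 23 mutations would be split across three constructs of ten, ten and three minigenes.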

Adaptive immune cells (T cells and B cells) help us to recognize specific threats, such as microbial pathogens (e.g., bacteria, viruses, fungi) and tumors. Each T cell or B cell expresses a surface receptor, the T cell receptor (TCR) or B cell receptor (BCR), respectively, that can bind to a particular molecular target and differs from one immune cell to the next. When a TCR or BCR finds its target molecule (called an antigen), the T or B cell is signaled to divide and multiply. Each receptor is unique, generated by random DNA recombination and alteration during development into a mature T or B cell. The number of different TCRs that can be generated by one person is huge: between 10^12 and 10^20 over the course of a lifetime, with ~10^9 present in the repertoire at any given time. It is the vast diversity of these receptors that enables any one person to respond to antigens his or her immune system has never encountered before, and to raise an “army” against a particular antigen if it represents a threat.

[Caption] Expansion of tumor-specific T cells. A. When the TCR of a T cell binds a target antigen strongly, the T cell becomes activated and proliferates. Thus, that TCR is represented at an expanded frequency in the population. B. In the context of a tumor, a T cell whose TCR recognizes a tumor-specific antigen may be represented in higher abundance by the same mechanism. With immunogenomics, we can learn about tumors and the T cells that recognize them in parallel: How many TCRs are there? What do they have in common? Do patients with the same tumor share the same expanded TCRs? How do the proportions of these TCRs reflect and predict the changes in the abundance of tumor antigen targets?

Many of these immune cells are not circulating freely in the blood, but infiltrate and provide surveillance in tissues (called tissue-infiltrating lymphocytes, or TILs). Unlike the circulating population, TILs represent only a small sample of the total repertoire. T cells surveilling any tissue may be selected to reside in that particular organ or tissue based on their receptors, growth factors and other signaling molecules.

Recent advances in high-throughput next-generation sequencing let us capture the TCRs from a whole sample (using a technology called TCRseq), including both circulating blood cells and T cell-infiltrated tissue, and describe the population in terms of TCR distribution. Using statistics, we analyze the diversity of these populations, compare them to one another and look for patterns across groups of patients being treated for cancer. How does the TCR repertoire inside a tumor differ from that in the circulating blood?
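
Two summary statistics often used to describe a repertoire are Shannon entropy and clonality (one minus the normalized entropy). The sketch below shows how they could be computed from a list of TCR sequences. It is an illustration of the general idea and not necessarily the statistics our analyses rely on, and the function names are hypothetical.

```python
import math
from collections import Counter


def clonotype_frequencies(tcr_sequences):
    """Map each distinct TCR sequence to its share of all sequenced reads."""
    counts = Counter(tcr_sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences provided")
    return {seq: n / total for seq, n in counts.items()}


def shannon_entropy(frequencies):
    """Shannon entropy in bits; higher means a more diverse repertoire."""
    return -sum(p * math.log2(p) for p in frequencies.values() if p > 0)


def clonality(frequencies):
    """1 - normalized entropy: near 0 for an even repertoire,
    near 1 when a few clonotypes dominate."""
    n = len(frequencies)
    if n == 0:
        raise ValueError("no clonotypes provided")
    if n == 1:
        return 1.0  # a single clonotype is fully clonal
    return 1 - shannon_entropy(frequencies) / math.log2(n)
```

A sample in which every TCR is equally common has a clonality of 0, while a sample dominated by one expanded clone approaches 1. Comparing this value between blood and tumor, or before and after therapy, is one simple way to express a shift in the repertoire.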

We are currently defining properties that indicate tumor-specific reactivity: What does the antitumor T cell response look like when it’s working? When it’s failing? When it has been restored through immunotherapy? These properties may be useful as multi-dimensional biomarkers to monitor tumor progression and therapeutic response. We are also using TCR repertoire sequencing to identify receptors that could be adapted for use as antitumor therapeutics.

[Caption] TCRseq libraries represent a sample of the peripheral blood and tumor-infiltrating lymphocyte (TIL) cell repertoires. While some TCRs occur at the same rate in both populations (pink), some TCRs are relatively more abundant in the TILs than in the peripheral blood (green, purple), while some are at much lower abundance, to the point that they aren’t detected (cyan). The TCRs that are most enriched in the tumor tissue may reside there disproportionally because their TCRs bind antigens that are present only in the tumor tissue, making these sequences of interest for further study, as possible biomarkers of tumor progression or as therapeutic templates or targets (see below).
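
The comparison in this caption can be expressed as a per-TCR log2 ratio of frequencies in the two samples. The sketch below is a minimal illustration: the pseudocount (an assumption introduced here) keeps TCRs that are undetected in one sample from producing a division by zero, and no statistical test is applied.

```python
import math


def log2_enrichment(til_counts, blood_counts, pseudocount=0.5):
    """log2(TIL frequency / blood frequency) for every observed TCR.

    Positive values: relatively more abundant in the tumor infiltrate.
    Negative values: relatively more abundant in peripheral blood.
    `pseudocount` is added to every count in both samples (an assumption).
    """
    clonotypes = set(til_counts) | set(blood_counts)
    if not clonotypes:
        return {}

    # Totals include the pseudocounts so that frequencies still sum to 1.
    til_total = sum(til_counts.values()) + pseudocount * len(clonotypes)
    blood_total = sum(blood_counts.values()) + pseudocount * len(clonotypes)

    enrichment = {}
    for tcr in clonotypes:
        til_freq = (til_counts.get(tcr, 0) + pseudocount) / til_total
        blood_freq = (blood_counts.get(tcr, 0) + pseudocount) / blood_total
        enrichment[tcr] = math.log2(til_freq / blood_freq)
    return enrichment
```

Sorting the result from highest to lowest gives a ranked list of the TCRs most enriched in the tumor, which are the candidates the caption singles out for further study.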

[Caption] Particular TCR sequences associated with either the progression or regression of a tumor could be used directly to develop therapeutics. (Top) The TCRs of cytolytic T cells (CTL) found to expand concomitantly with the regression of cancer could be tested as templates for engineered chimeric antigen receptor (CAR) T cells, which would be able to recognize and attack the tumor using a receptor based on that TCR. (Bottom) The regulatory T cells (Tregs) that inhibit the activity of active antitumor CTLs (thus protecting the tumor) could be blocked by immunotherapies targeting their TCRs.

One of CITI’s goals is to extract the immunogenomic information that will allow doctors to anticipate which patients are most likely to respond well to immunotherapy. We study tumor phenotype, or cell behavior, which is largely determined by the levels at which each gene is expressed. In particular, we use high-throughput sequencing of RNA from tumor biopsies to study how gene expression changes as cancer progresses, when therapy is given and when therapy is effective. Comparing tumors from patients who respond well to treatment versus those of patients who do not respond well allows us to identify distinct tumor features that can be translated into diagnostic, prognostic and therapeutic biomarkers to be used for future patients.

The expression levels of genes also provide information about the environment in which the tumor evolves, particularly how the patient’s own immune system responds. Using cutting-edge computational techniques, we can integrate this information to understand what types of immune cells are successful in this antitumor immune response.

Differential expression

Tumors differ from one another, in part because each patient’s immune system reacts to a tumor using a unique set of cells to try to destroy it. Abnormal tumor cell behavior, specific antitumor immune activity, non-specific inflammatory immune activity and tissue damage all shape the gene expression profiles of both tumor and non-tumor cells in unique ways.

[Caption] RNA from tumor tissue is subjected to high-throughput next-generation sequencing, which gives short nucleotide “reads” as output. These reads are ordered and aligned to the human genome, giving the amount of RNA from each gene. For each gene, we can then statistically test for differentially higher or lower expression between two groups of samples (for example, the tumors of therapy-responsive patients and non-responsive patients). We can then identify differentially expressed genes, or functionally related groups of genes, that reflect different programs of expressed genes or different cell compositions between tumors.
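
The per-gene comparison between two groups can be illustrated with a log2 fold change and an exact permutation test on the difference in mean expression. This is a deliberately simple stand-in chosen because it needs only the standard library. Analyses of real RNA sequencing data typically rely on dedicated count-based statistical models and correct for testing many genes at once, neither of which is shown here. The pseudocount and function names are assumptions.

```python
import math
from itertools import combinations
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression (group A over group B)."""
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way of splitting the pooled samples into groups of
    the original sizes, so it is only practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups need at least one sample")

    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    n_splits = 0
    n_extreme = 0
    for indices in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in indices)
        difference = abs(a_sum / n_a - (total - a_sum) / n_b)
        n_splits += 1
        # Small tolerance so splits that tie with the observed
        # difference are counted despite floating-point rounding.
        if difference >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits
```

For a single gene measured in four responders and four non-responders, the smallest p-value this test can return is 2/70, because only 70 distinct splits exist. That limit is one reason larger cohorts and model-based tests are preferred in practice.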

One application of differential gene expression analyses is to compare the pre-treatment and post-treatment profiles of tumors that responded to immunotherapy with those that did not. We can also identify marker genes or groups of functionally related genes that, if unusually high or low prior to treatment, correspond with better therapy response. Such predictive signatures could enable a simple pre-treatment biopsy to help tailor a patient’s treatment regimen.

At CITI, our pipeline for automating and visualizing these analyses is constantly improving. High-dimensional data visualization tools such as oncoprints and viSNE maps allow us to organize and render dozens of parameters (e.g., RNA-seq gene expression data in parallel with clinical parameters) simultaneously, without sacrificing their complexity, to enrich our understanding of the cancer immune environment.

[Caption] In one recent study, hierarchical clustering of the expression of genes across tumor biopsies from patients who were strongly responsive (+++), weakly responsive (+), or non-responsive (-) to anti-PD-1 therapy identified two subsets of genes, one of which was highly expressed among the clinically responsive patients, and one of which was highly expressed among the non-responsive patients. Enrichment of functionally related genes in these clusters can be used to infer how such a gene signature is related to these levels of immune response.

Cell composition (in silico deconvolution)

Many different immune cell types infiltrate tissues, where they perform different roles in surveilling for tumors, injuries or infections. For example, certain types of T cells are capable of directly killing dysfunctional, tumorigenic or infected cells, while monocytes and macrophages take up free-floating cell debris and present these potential antigens to T cells. This interaction, which requires both T cells and antigen-presenting cells, can help locally activate or suppress all of the T cells that recognize the same antigens. Meanwhile, B cells produce antibodies that can rapidly spread throughout the body to neutralize a particular threat. Thus, the relative abundance of different cell types can indicate which modes of tumor recognition are active, and which may be suppressed.

The type and degree of immune infiltration into tumors play an important role in the efficacy of immunotherapy. The abundance of the messenger RNA (mRNA) of particular genes in a tissue biopsy not only allows us to identify differential gene expression between samples, but also enables us to calculate the relative abundance of different immune cell types in the local microenvironment. Briefly, from the mRNA of the bulk sample, we can detect high expression of signature genes or enrichment of a subset of genes that are specific to one cell type and compare it to the expression of genes specific to other cell types. We use computational algorithms such as Support Vector Machines (SVMs) or single-sample Gene Set Enrichment Analysis (ssGSEA) to translate the expression of these signatures into relative abundances of the corresponding immune cell populations.
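
The underlying idea of scoring a bulk sample against cell-type signatures can be shown with a much simpler stand-in than SVMs or ssGSEA: averaging the log-transformed expression of each signature's marker genes. The sketch below does only that. The marker lists in the usage note are short, illustrative examples and not the signatures we use, and the resulting scores are relative signals, not cell proportions.

```python
import math
from statistics import mean


def signature_scores(expression, signatures):
    """Average log2(expression + 1) over each cell type's marker genes.

    expression: dict mapping gene symbol -> expression value in one sample.
    signatures: dict mapping cell type -> list of marker gene symbols.
    Marker genes missing from `expression` are skipped; a cell type with
    no measured markers is left out of the result.
    """
    scores = {}
    for cell_type, genes in signatures.items():
        values = [math.log2(expression[g] + 1) for g in genes if g in expression]
        if values:
            scores[cell_type] = mean(values)
    return scores
```

As a usage example, calling the function with illustrative markers such as `{"CD8 T cells": ["CD8A", "CD8B"], "B cells": ["CD19", "MS4A1"], "Monocytes": ["CD14"]}` returns one score per cell type, which can then be compared across biopsies to see which populations carry a stronger signal in a given tumor.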

Because immunotherapies perform different functions—such as maintaining immune cell activation, rescuing immune cells that become exhausted or stimulating antitumor reactivity among immune cells that were previously unexposed to tumor antigens—understanding which types of immune cells are present (or not) in the tumor microenvironment has implications for predicting immunotherapy treatment response, and therefore choosing the right option for each patient.

[Caption] From a bulk tumor sample, mRNA molecules are extracted from the mix of immune cells (colored and gray) and non-immune cells (brown, such as tumor). Running high-throughput next-generation sequencing on the mRNA mixture provides the gene expression profile of the biopsy sample (left). Using known gene expression signatures, the proportion of mRNA molecules representing each infiltrating immune population can be deconvolved (bottom), and the composition (relative abundance) of the immune cell types in the population can be inferred (right).

T cells recognize microbial threats and cancer by binding to degraded bits of foreign proteins (peptides) presented to them by the molecules of the major histocompatibility complex (MHC). These presentation molecules are expressed on the surface of most cell types, but more strongly on certain immune cells that provide tissue surveillance.

The genes that encode MHC class I proteins (called the HLA class I genes in humans) are located on chromosome 6. There are three HLA class I genes: HLA-A, HLA-B and HLA-C. Every person has two copies (alleles) of each gene (one from each parent). Since these genes are the most polymorphic (variable in DNA sequence) in the entire human genome, the six alleles each person has are often all different, and rarely do they match those of genetically unrelated individuals. There are specific alleles (e.g. HLA-A*02:01) that are more prevalent worldwide. Moreover, the frequency of HLA alleles varies across geographic regions and populations.

[Caption] Frequency of select HLA-A alleles across different geographic regions. Shown are the normalized frequencies of some HLA-A alleles in diverse geographic regions. For each HLA-A allele, each colored bar represents the frequency of the allele in a particular geographic region. The data were obtained from http://www.allelefrequencies.net/.

We are examining how the HLA alleles a patient carries affect immunotherapy responsiveness. The presentation of peptides to T cells by the MHC proteins plays a critical role in the adaptive immune response and strongly influences how T cells respond. For example, some MHC molecules activate T cells strongly, which is desirable if the specific antigen represents a threat (such as a viral infection or a dysfunctional or mutated protein produced by a tumor). However, this can be dangerous if the antigen is normal and occurs on healthy cells. Because potentiating the correct recognition by T cells of self versus non-self peptides is a major function of MHCs, and this distinction becomes muddled in the case of cancer, it is important to use genomic sequencing data to identify which six HLA alleles a patient has when determining how his or her immune system will react to the mutated tumor peptides.

Currently, the gold standard for identifying which HLA alleles a patient has is PCR-based typing, in which the HLA locus is specifically amplified and then sequenced. As genomic sequencing has achieved higher coverage, in silico HLA genotyping offers an efficient alternative that is economical when a patient’s genome is already being sequenced. Current software tools provide up to 99% accurate resolution for most clinical applications. For clinical applications that require higher accuracy, such as predicting tumor antigen presentation by certain HLA alleles that differ from their closest other alleles by only a few nucleotides, we are refining the computational pipelines for HLA identification using ensemble approaches, population-based weighting and alternative assemblies of the human reference genome.

Understanding the cellular composition of tumor and immune cells on the level of phenotypic protein markers is a critical part of investigating tumor immunology. CITI utilizes several experimental techniques to better quantify protein expression in individual tumor and immune cells. Antibody-based flow cytometry allows for the precise quantification of extracellular and intracellular proteins of interest. Using fluorescence-activated cell sorting (FACS), individual immune or tumor cell populations can be further subdivided for downstream analysis, including DNA and RNA sequencing.

Occasionally, investigators may wish to simultaneously quantify the expression of a large number of intracellular and extracellular proteins from a single sample. Conventional flow cytometry limits the number of simultaneous parameters detectable due to fluorophore-generated spectral overlap. To overcome this barrier, CITI utilizes mass cytometry by time-of-flight (CyTOF) technology. CyTOF identifies intracellular and extracellular proteins using antibodies conjugated to rare earth heavy metals. After antibody-based staining, the sample is ionized and the antibody composition of single cells is subsequently identified. The primary advantage of CyTOF is its ability to simultaneously analyze a robust user-defined panel of cellular targets from a single sample using an antibody-based approach. Multi-parametric data can subsequently be analyzed using conventional flow cytometry software or more sophisticated techniques, including SPADE or viSNE plots.

[Caption] Tumor-infiltrating lymphocytes and peripheral blood mononuclear cells from a patient with head and neck squamous carcinoma were clustered by their staining for immunophenotypic markers (clusters represent cells with similar phenotypes, where closeness on the plot indicates similarity), using the t-distributed stochastic neighbor embedding (t-SNE) algorithm. Color scale indicates CD8 protein detected, normalized across cells, which distinguishes the cell types (clusters) expressing this marker of the cytolytic T cell lineage.