Component: NGS of full-length HLA genes: Preliminary results of the Pilot Study
Lisa E Creary1, Steven J Mack2 and Marcelo Fernandez-Vina1
1Department of Pathology, Stanford Blood Center
2Children’s Hospital Oakland Research Institute
Overview
The ultimate goal of the 17th International HLA and Immunogenetics Workshop (IHIW) is to advance the fields of Histocompatibility and Immunogenetics (H & I) research through the application of Next-Generation Sequencing (NGS) technologies for HLA and KIR genotyping, and to advance the development of NGS technologies tailored to meet the needs of the H & I community.
In 2014, we initiated an international multi-center pilot study in order to assess the performance of various NGS protocols, platforms, and software for full gene typing of classical class I (HLA-A, -B, -C) and class II (HLA-DPA1, -DPB1, -DQA1, -DQB1, -DRB1, -DRB3, -DRB4, -DRB5) genes.
The specific aims of this study were four-fold:
- Evaluate the performance of different NGS protocols and platforms, and identify the limitations/nuances specific to each method.
- Evaluate software programs for analysis of sequence data and assignment of HLA genotypes.
- Inform the design of optimized methods for the exchange and storage of NGS data. A goal of the workshop is to store HLA genotyping data in a format for reanalysis. This format should allow for simple and systematic examination and comparison of genotypes obtained by different protocols and platforms.
- Clone and sequence all class I and class II alleles in a quality control (QC) reference panel. These results will constitute an unambiguous reference for the evaluation of NGS reagent/platform combinations. In addition, these reference data will contribute to the necessary completion of full-gene sequences for common alleles and aid in identifying novel alleles resulting in unambiguous HLA genotypes.
Methods
We initially surveyed twenty-five interested laboratories, using a questionnaire to gather information about the number and type of HLA genes that participants were able to sequence, the NGS protocols and instrumentation used, the software analysis packages used (e.g. commercial or in-house), and the types of output file formats. Fifty blinded QC cell line-derived genomic DNA samples, supplied by the Fred Hutchinson Cancer Research Center, Seattle, WA (http://www.ihwg.org) and collected in previous IHIWs, were distributed to seventeen laboratories. These samples were genotyped by fifteen laboratories located worldwide, or by groups within the same laboratory applying different platforms and/or reagents (Table 1). The QC panel was selected to represent a wide range of HLA allele families and to include common and well-documented (CWD) alleles, rare alleles, null alleles, and samples homozygous for at least one locus. The QC cell lines had previously been typed, for some but not all HLA genes, by Sanger sequence-based typing (SBT), sequence-specific primers (SSP), sequence-specific oligonucleotide (SSO) probes, and serological and cellular methods. One of the fifteen laboratories also cloned and determined the nucleotide sequences for the majority of the HLA alleles of the QC panel. Genotype results were collated over a period of 10 months. Primary sequencing data (e.g. FASTQ files) were also collected from five laboratories.
Table 1. NGS HLA Pilot Study participating laboratories
Laboratory |
Anthony Nolan, London, UK |
BFR, Beijing, China |
GenDx, Utrecht, Netherlands |
Georgetown University, Washington DC, USA |
H&I Laboratory, Nantes, France |
Royal Perth Hospital, Perth, Australia |
Stanford Blood Center, Group 1, CA, USA |
Stanford Blood Center, Group 2, CA, USA |
Stanford Blood Center, Group 3, CA, USA |
Transplantation and Immunology, Tuebingen, Germany |
Transplantation Immunology, Ulm, Germany |
UCLA, CA, USA |
UNC-Chapel Hill, NC, USA |
Uppsala University, Uppsala, Sweden |
University of Vienna, Vienna, Austria |
Table 2 shows the sequencing platforms used by the participating laboratories to perform NGS-based HLA typing.
Table 2. NGS platforms used by participating laboratories
Platform | Number of laboratories |
GS Roche Junior | 1 |
Illumina MiSeq | 8 |
Ion Torrent Personal Genome Machine (PGM) | 4 |
Pacific Biosciences | 2 |
Results
Genotyping
Genotyping results are shown in Tables 3 and 4. All 15 laboratories performed full-length gene sequencing of HLA-A, -B and -C alleles. For HLA-A, 32 different alleles were typed; two of these alleles were reported as novel intronic variants. HLA-B typing identified 54 alleles, including one allele with a novel exon variant and eight with novel intronic variants. HLA-C typing identified 31 alleles, of which two included novel exon variants. Only four laboratories typed DPA1, and collectively identified 11 different DPA1 alleles. DPB1 was genotyped by 10 laboratories and 19 unique alleles were sequenced. Thirteen laboratories typed DQB1, and six laboratories genotyped DQA1, identifying 19 and 23 unique alleles respectively. For DRB1, a total of 13 laboratories used different primer pairs that generated various ranges of gene coverage (full gene, exons 2 and 3, or exons 2, 3, 4 only), and identified a total of 39 unique alleles. Five laboratories typed DRB3 (number of alleles identified, n = 4), DRB4 (n = 4), and DRB5 (n = 3).
Table 3. Total number of heterozygous and homozygous alleles in the QC panel
HLA Locus | Heterozygous | Homozygous | Total |
A | 72 | 28 | 100 |
B | 78 | 22 | 100 |
C | 72 | 28 | 100 |
DPA1 | 42 | 58 | 100 |
DPB1 | 54 | 46 | 100 |
DQA1 | 74 | 26 | 100 |
DQB1 | 68 | 32 | 100 |
DRB1 | 80 | 20 | 100 |
DRB3 | 10 | 48 | 58 |
DRB4 | 10 | 36 | 46 |
DRB5 | 0 | 12 | 12 |
Table 4. The number of unique and novel alleles identified by participating laboratories
Locus | Alleles | Novel Exon variants | Novel Intron variants |
A | 32 | 0 | 2 |
B | 54 | 1 | 8 |
C | 31 | 2 | 0 |
DPA1 | 11 | 0 | 0 |
DPB1 | 19 | 0 | 0 |
DQA1 | 23 | 0 | 0 |
DQB1 | 19 | 0 | 0 |
DRB1 | 39 | 0 | 0 |
DRB3 | 4 | 0 | 0 |
DRB4 | 4 | 0 | 0 |
DRB5 | 3 | 0 | 0 |
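The per-locus allele counts in Table 4 amount to a tally of distinct allele names pooled across laboratories. The following is a minimal sketch of such a tally, not the procedure actually used in the study. The function name `unique_alleles_by_locus`, the input layout, and the example allele calls are all hypothetical, and counting at the resolution each laboratory reported is an assumption.

```python
from collections import defaultdict


def unique_alleles_by_locus(reports):
    """Map each locus to the set of distinct allele names reported by any laboratory.

    `reports` is a hypothetical layout: {laboratory: [allele names]}.
    Allele names are compared exactly as reported (an assumption).
    """
    seen = defaultdict(set)
    for lab_alleles in reports.values():
        for allele in lab_alleles:
            # The locus is the part of the name before the '*' separator.
            locus = allele.split("*", 1)[0]
            seen[locus].add(allele)
    return dict(seen)


# Hypothetical example calls, for illustration only (not study data).
example_reports = {
    "lab1": ["A*01:01:01:01", "A*02:01:01:01", "B*07:02:01"],
    "lab2": ["A*01:01:01:01", "B*08:01:01", "B*07:02:01"],
}

for locus, alleles in sorted(unique_alleles_by_locus(example_reports).items()):
    print(locus, len(alleles))
```

Because laboratories may report the same allele at different resolutions, a real tally would probably need to normalize names before pooling them.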
Cloning
Cloning experiments to generate full-length unambiguous gene sequences were conducted at the Stanford Blood Center. Class I loci in all 50 QC samples were cloned and sequenced at full genomic length using Illumina NGS. Cloning and sequencing extended or confirmed allele sequence diversity in all HLA loci. Novel or extended genomic sequences were generated for 12, 14, and 2 alleles of HLA-A, -B, and -C, respectively. Novel/extended sequences were identified for 7 DPA1 alleles and 15 DPB1 alleles. Twenty DQA1 alleles and 9 DQB1 alleles were successfully cloned and sequenced at full genomic length. For DRB1, 100 alleles were cloned and sequenced from exons 1 to 2 and exons 2 through 6. Forty-one DRB1 alleles were identified as novel/extended. For DRB3, DRB4, and DRB5, 14, 6, and 6 alleles, respectively, were found to be novel.
Concordance
Because the reference genotypes for the QC panel were incomplete and of low resolution, concordance rates were calculated by comparing the NGS genotypes from each individual laboratory with consensus genotypes across all laboratories (including cloned data). Testing laboratories were coded A through O (Table 5). Concordance was determined at 2-field resolution. Concordance with the consensus assignments was high for most laboratories and most HLA loci. Only one laboratory (E) had a markedly low concordance rate, and only for one locus (DQA1, 28.0%).
Table 5. Concordance rates (%) of participants' NGS genotypes compared to consensus genotypes
Group | HLA-A | HLA-B | HLA-C | HLA-DPA1 | HLA-DPB1 | HLA-DQA1 | HLA-DQB1 | HLA-DRB1 | HLA-DRB3 | HLA-DRB4 | HLA-DRB5 |
A | 100 | 100 | 100 | 98 | 100 | 100 | 100 | 100 | 100 | NC | 100 |
B | 100 | 100 | 100 | 99.0 | 100 | 98.0 | 99.0 | 98.0 | 100 | NC | 100 |
C | 98.0 | 97.0 | 100 | 99.0 | 100 | 99.0 | 100 | 97.0 | 100 | NC | 83.3 |
D | 99.0 | 100 | 100 | 100 | 100 | 98.0 | 100 | 98.0 | 100 | NC | 100 |
E | 98.0 | 98.0 | 94.0 | NT | 100 | 28.0 | 100 | 100 | 100 | NC | 100 |
F | 93.2 | 98.9 | 100 | NT | 98.9 | 98.9 | 97.8 | 93.3 | NT | NT | NT |
G | 100 | 100 | 100 | NT | 100 | NT | 100 | 100 | NT | NT | NT |
H | 100 | 100 | 100 | NT | NT | NT | 100 | 100 | NT | NT | NT |
I | 100 | 100 | 100 | NT | 99.0 | NT | 98.0 | 98.0 | NT | NT | NT |
J | 96.6 | 93.0 | 95.5 | NT | 95.9 | NT | 94.4 | 90.9 | NT | NT | NT |
K | 100 | 100 | 100 | NT | NT | NT | 100 | 92.7 | NT | NT | NT |
L | 100 | 94.0 | 100 | NT | NT | NT | 95.0 | 94.0 | NT | NT | NT |
M | 97.6 | 96.3 | 95.0 | NT | 100 | NT | 96.8 | 96.9 | NT | NT | NT |
N | 100 | 100 | 100 | NT | NT | NT | NT | NT | NT | NT | NT |
O | 98.0 | 96.0 | 98.0 | NT | NT | NT | NT | NT | NT | NT | NT |
NT = not tested
NC = not calculated (unable to generate a consensus genotype)
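The concordance calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual pipeline. The consensus is taken here as a simple majority vote, concordance is counted per allele, and an expression suffix such as N is retained after truncation to two fields; the source specifies none of these choices. All function names and the example genotype calls are hypothetical.

```python
from collections import Counter


def two_field(allele):
    """Truncate an allele name (e.g. 'A*01:01:01:01') to 2-field resolution ('A*01:01').

    A trailing expression suffix such as 'N' is kept (an assumption).
    """
    locus, _, fields = allele.partition("*")
    suffix = fields[-1] if fields and fields[-1].isalpha() else ""
    parts = fields.rstrip("NLSCAQ").split(":")[:2]
    return f"{locus}*{':'.join(parts)}{suffix}"


def normalize(genotype):
    """Return a genotype as a sorted tuple of 2-field names, so allele order is irrelevant."""
    return tuple(sorted(two_field(a) for a in genotype))


def consensus(genotypes):
    """Majority-vote consensus genotype (an assumption); None if there is a tie or no data."""
    counts = Counter(normalize(g) for g in genotypes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no single most common genotype, analogous to 'NC'
    return counts[0][0]


def matching_alleles(reported, reference):
    """Count reported alleles that also occur in the reference (multiset intersection)."""
    overlap = Counter(normalize(reported)) & Counter(reference)
    return sum(overlap.values())


def concordance(lab_calls, consensus_calls):
    """Percentage of alleles matching the consensus; None if nothing could be compared."""
    matched = total = 0
    for sample, reported in lab_calls.items():
        reference = consensus_calls.get(sample)
        if reference is None:
            continue  # skip samples without a consensus genotype
        matched += matching_alleles(reported, reference)
        total += len(reference)
    return None if total == 0 else round(100.0 * matched / total, 1)


# Hypothetical calls for one locus, for illustration only (not study data).
calls = {
    "lab1": {"s1": ("A*01:01:01:01", "A*02:01:01:01"), "s2": ("A*03:01:01", "A*03:01:01")},
    "lab2": {"s1": ("A*02:01", "A*01:01"), "s2": ("A*03:01", "A*03:01")},
    "lab3": {"s1": ("A*01:01:01", "A*02:05:01"), "s2": ("A*03:01:01:01", "A*03:01:01:01")},
}
samples = sorted({s for lab in calls.values() for s in lab})
consensus_calls = {
    s: consensus([lab[s] for lab in calls.values() if s in lab]) for s in samples
}
for lab, lab_calls in calls.items():
    print(lab, concordance(lab_calls, consensus_calls))
```

Returning `None` when no single genotype has a majority mirrors the NC entries in Table 5, where no consensus genotype could be generated.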
The results of this study allow us to conclude that HLA typing by various NGS methods is feasible, and that accurate results can be obtained with a range of library preparation protocols, platforms, and software. We have applied the information gained in this effort to the design of the 17th IHIW database, and to the development of strategies for collecting HLA genotype data and storing primary data so that re-analyses can be performed as a workshop activity.
We thank the Fred Hutchinson Cancer Research Center for supplying the QC samples and all laboratories that participated in the study.