NGS of Full-length HLA Genes

The HLA genes are the most polymorphic loci in the human genome; over 12,000 genetic variants (alleles) have been identified at 19 HLA genes, with some individual genes displaying several thousand alleles. The allelic and structural variation that characterizes these genes poses extreme challenges for routine genotyping of these loci; this variation cannot be characterized by a few SNPs, and the high level of polymorphism confounds de novo sequence assembly efforts. Until recently, HLA genotyping was accomplished using a variety of PCR-based approaches involving sequence-specific oligonucleotide probe (SSOP) hybridization and/or Sanger sequencing of 400-600 base pair amplicons. These methods do not readily establish the chromosomal phase between assessed sequence features, which results in considerable ambiguity regarding an HLA genotype.
Next-generation sequencing (NGS) methods are now being applied for HLA genotyping with great success. These methods apply a variety of single-molecule sequencing approaches to accomplish extensive phasing across genes, minimizing genotyping ambiguity, accelerating polymorphism discovery, and improving our understanding of allelic diversity. The inherently high-throughput nature of NGS gives it broad potential for therapeutic and diagnostic applications. However, a wide variety of NGS platforms and methodologies are available, and their relative strengths and shortcomings for Histocompatibility and Immunogenetics (H&I) research applications remain unexplored.
The goal of the 17th International HLA and Immunogenetics Workshop is to advance the fields of H&I research through the application of NGS technologies, and to advance the development of NGS technologies tailored to meet the needs of the H&I community.
Although the HLA field has a well-defined nomenclature system and a well-curated reference sequence database, significant gaps in sequence coverage remain for the majority of HLA alleles, as these variants have been defined on the basis of partial gene sequences. In the absence of complete reference sequences, sequence datasets cannot be leveraged to the greatest extent possible; phasing of raw sequence data will suffer, resulting in erroneous or ambiguous genotype assignments and in variation among the genotyping results returned by different NGS typing approaches.
Goals
1. To complete the sequence of all HLA alleles of the reference cell lines from the 13th IHIWS.
2. To perform HLA genotyping of 10,000 quartet families of varied ancestry, utilizing at least one NGS method.
Activities
1. Cloning or isolated amplification and nucleotide sequencing of all HLA alleles of the reference cell lines from the 13th IHIWS
Different methodological approaches for cloning and nucleotide sequencing are accepted. DNA or cells corresponding to cell lines from the 13th IHIWS will be distributed to laboratories interested in participating in this study. A variety of analytical methods corresponding to specific sequencing platforms will be applied for determining consensus sequences. These sequences will be deposited in GenBank and will be publicly available when validated and approved by the steering committee.
2. HLA Typing by Next Generation Sequencing Methodologies
The primary focus of the 17th IHIWS will be Next Generation Sequencing (NGS) of the classical HLA genes and the KIR genes, with the aim of advancing the field by providing deep sequencing of all exons, all introns, and the 5′ and 3′ UTRs of the classical HLA genes. We invite investigators, institutions and laboratories to participate in this component by performing NGS-based testing and/or analysis of NGS data using various NGS platforms (including but not restricted to Illumina, Ion Torrent, Roche 454, PacBio) and software analysis packages that have already been validated for HLA typing. Participation using other NGS typing approaches is also encouraged.
Investigators can participate in this component by submitting DNA specimens to be tested by a second laboratory or by performing HLA NGS typing of locally collected specimens. All specimens included in the study should have been collected with appropriate informed consent and be approved for participation by the local Institutional Review Board.
All participant investigators are invited and encouraged to participate in the data analysis and preparation of manuscripts.
The laboratories performing testing will submit sequencing and genotype results electronically. Data collection will be centralized; these data may be distributed to other investigators who may utilize different software packages.
The samples included in the studies should be from at least four individuals from a single biological family. Families previously typed for some HLA loci are encouraged. Ideally, parents and all offspring should be included in the study.
Through the analysis of the segregation of alleles in families, we will be able to determine unambiguous HLA allelic haplotypes; we propose to analyze informative quartets in which each haplotype is probed in two different family members, paired with different haplotypes. The proposed studies will render high-quality sequence data that will provide the community with a large-scale database of complete genomic segments of the HLA genes. In addition, we plan to perform SNP testing in these families. Analysis of the total genomic region, including both coding and non-coding regions with different selective pressures, will help to clarify the evolution of the HLA system.
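The segregation logic described above can be illustrated with a minimal Python sketch. It is not the workshop's analysis pipeline: the function names, the two-locus family, and the two-field allele names are all illustrative assumptions, and the sketch assumes no recombination between loci and no genotyping errors. It uses one child to split each parent's genotype into a transmitted and an untransmitted haplotype, then checks that a second child carries the expected alternative haplotypes.

```python
from typing import Dict, Optional, Tuple

Genotype = Dict[str, Tuple[str, str]]   # locus -> unordered allele pair
Haplotype = Dict[str, Optional[str]]    # locus -> allele (None if unresolved)


def parental_origin(child, father, mother):
    """Return (paternal, maternal) alleles at one locus.

    Returns None when the origin is ambiguous or inconsistent with the parents.
    """
    x, y = child
    options = {(p, m) for p, m in ((x, y), (y, x)) if p in father and m in mother}
    return options.pop() if len(options) == 1 else None


def other_allele(pair, allele):
    """Return the allele of `pair` that is not `allele` (the same one if homozygous)."""
    first, second = pair
    return second if first == allele else first


def phase_from_child(father: Genotype, mother: Genotype, child: Genotype):
    """Split each parent into the haplotype sent to `child` and the one not sent."""
    haps = {"father_sent": {}, "father_kept": {}, "mother_sent": {}, "mother_kept": {}}
    for locus in child:
        origin = parental_origin(child[locus], father[locus], mother[locus])
        if origin is None:
            # Leave this locus unresolved on all four haplotypes.
            for hap in haps.values():
                hap[locus] = None
            continue
        pat, mat = origin
        haps["father_sent"][locus] = pat
        haps["father_kept"][locus] = other_allele(father[locus], pat)
        haps["mother_sent"][locus] = mat
        haps["mother_kept"][locus] = other_allele(mother[locus], mat)
    return haps


def carries(person: Genotype, hap: Haplotype) -> bool:
    """True if every resolved allele of `hap` is present in `person`."""
    return all(a is None or a in person[locus] for locus, a in hap.items())


# Illustrative (made-up) quartet typed at two loci.
father = {"HLA-A": ("A*01:01", "A*02:01"), "HLA-B": ("B*08:01", "B*07:02")}
mother = {"HLA-A": ("A*03:01", "A*24:02"), "HLA-B": ("B*35:01", "B*44:02")}
child1 = {"HLA-A": ("A*01:01", "A*03:01"), "HLA-B": ("B*08:01", "B*35:01")}
child2 = {"HLA-A": ("A*02:01", "A*24:02"), "HLA-B": ("B*07:02", "B*44:02")}

haps = phase_from_child(father, mother, child1)
second_child_ok = carries(child2, haps["father_kept"]) and carries(child2, haps["mother_kept"])
```

In this made-up family, each of the four parental haplotypes is seen in a parent and in one child, which is the informative-quartet situation described above. The `carries` check is deliberately loose (it tests allele presence per locus only); a real analysis would also need to handle recombinants, shared parental alleles, and typing errors.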
Links for participation will be found on this page in the near future. We anticipate starting participation and data collection in January 2015.
HLA Typing for NGS – Pilot Project
We are conducting a pilot project to compare HLA typing results obtained on different NGS platforms using different sets of primers with variable coverage. We will examine the performance of the different methods and will investigate ways of transmitting results that capture coverage information and the nuances specific to each platform/reagent combination.
The testing of a common set of reference samples by several sets of platforms/reagent combinations will allow us to:
- evaluate the performance of each platform/reagent combination.
- identify limitations/nuances specific for each platform/reagent.
- evaluate the software analysis packages used to analyze sequences and assign genotypes, and the possible biases of the corresponding algorithms.
- evaluate ways of transmitting and capturing data so that they can be stored in an appropriate format, re-analyzed in the future, and used to compare genotype assignments obtained with different platform/reagent sets.
Because the HLA sequence databases change significantly in short periods of time with the identification of new alleles and the extension of sequences in incompletely covered alleles, the ability to capture data for re-analysis of genotype assignments over time is crucial to understand HLA diversity and to maintain accurate genotype assignments.
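One way to picture what "capturing data for re-analysis" might involve is a record that stores each genotype assignment together with the context that produced it. The Python sketch below is purely illustrative: the `TypingRecord` class, every field name, and the placeholder values (platform, reagent, software, database release labels) are assumptions made for this example, not a format defined by the workshop.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class TypingRecord:
    """One genotype assignment plus the context needed to re-analyze it later.

    All field names are hypothetical; they are not a workshop-defined format.
    """
    sample_id: str
    locus: str
    genotype: List[str]                 # the two assigned alleles
    platform: str
    reagent: str
    software: str
    reference_db_version: str           # database release used for the assignment
    consensus_sequences: List[str] = field(default_factory=list)
    mean_coverage: Optional[float] = None


def to_json(record: TypingRecord) -> str:
    """Serialize a record to a JSON string with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text: str) -> TypingRecord:
    """Rebuild a record from the JSON produced by `to_json`."""
    return TypingRecord(**json.loads(text))


def needs_reanalysis(record: TypingRecord, current_db_version: str) -> bool:
    """Flag records that were assigned against an older database release."""
    return record.reference_db_version != current_db_version


# Placeholder values throughout; none refer to real products or releases.
example = TypingRecord(
    sample_id="SAMPLE-001",
    locus="HLA-A",
    genotype=["A*01:01:01:01", "A*02:01:01:01"],
    platform="platform-X",
    reagent="primer-set-1",
    software="caller-0.1",
    reference_db_version="release-1",
    consensus_sequences=["ACGT", "ACGA"],   # truncated placeholder sequences
    mean_coverage=250.0,
)
restored = from_json(to_json(example))
```

Keeping the consensus sequences and the database release next to the assigned genotype is what would allow the same record to be re-interpreted when new alleles are named or existing reference sequences are extended.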
This pilot study is being conducted in 15 international laboratories performing HLA typing using various NGS platform/reagent combinations. Each participating laboratory will test a set of fifty blinded DNA samples that have been collected in previous S or cell lines that have been permanently stored and can be distributed without restrictions for future use. These cell lines were typed for some but not all HLA loci by standard Sanger sequencing methods. The laboratories performing testing are submitting sequencing and genotype results electronically. The centrally collected data will be distributed to other investigators who may utilize different software packages.
Cloning and sequencing of the classical HLA loci in the reference samples has been performed and is ongoing. The cloning and sequencing results will serve as an unambiguous reference for the evaluation of each platform/reagent/software combination used. In addition, the cloning and sequencing will contribute to the necessary completion of the sequences of common alleles, with resulting improvements in unambiguous HLA genotype assignments.
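As a rough illustration of how calls from one platform/reagent/software combination might be scored against such a reference, the Python sketch below computes concordance at a chosen nomenclature resolution. It relies only on the standard convention that HLA allele names consist of colon-separated fields; the function names and the sample data are hypothetical, and expression suffixes (such as a trailing N) are not handled.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Calls = Dict[Tuple[str, str], Tuple[str, str]]   # (sample, locus) -> allele pair


def truncate(allele: str, fields: int = 2) -> str:
    """Reduce a name such as 'A*01:01:01:01' to its first `fields` fields."""
    return ":".join(allele.split(":")[:fields])


def same_genotype(call_a, call_b, fields: int = 2) -> bool:
    """Compare two unphased allele pairs at the given resolution, ignoring order."""
    def normalize(call):
        return Counter(truncate(a, fields) for a in call)
    return normalize(call_a) == normalize(call_b)


def concordance(reference: Calls, test: Calls, fields: int = 2) -> Optional[float]:
    """Fraction of (sample, locus) keys present in both sets on which the calls agree."""
    shared = reference.keys() & test.keys()
    if not shared:
        return None
    agree = sum(same_genotype(reference[k], test[k], fields) for k in shared)
    return agree / len(shared)


# Made-up calls for one sample at two loci.
reference_calls = {
    ("S1", "HLA-A"): ("A*01:01:01:01", "A*02:01:01:01"),
    ("S1", "HLA-B"): ("B*07:02:01", "B*08:01:01"),
}
platform_calls = {
    ("S1", "HLA-A"): ("A*02:01", "A*01:01"),
    ("S1", "HLA-B"): ("B*07:02", "B*08:02"),
}

two_field = concordance(reference_calls, platform_calls, fields=2)
one_field = concordance(reference_calls, platform_calls, fields=1)
```

Reporting concordance at more than one resolution, as the `fields` parameter allows, helps separate disagreements about the allele group from disagreements that appear only in the finer fields.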
Component Leaders: Marcelo Fernandez-Viña PhD, Steven J. Mack PhD
Liaison: Harriet Noreen CHS