CN103074444A - HLA (histocompatibility locus antigen) genetic typing method of HLA determinant gene through high-throughput sequencing - Google Patents

HLA (histocompatibility locus antigen) genetic typing method of HLA determinant gene through high-throughput sequencing

Info

Publication number
CN103074444A
Authority
CN
China
Prior art keywords
hla
sequence
reads
comparison
reference sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310058260XA
Other languages
Chinese (zh)
Inventor
王申俊
其他发明人请求不公开姓名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU JINGYIN BIOLOGICAL TECHNOLOGY CO Ltd
Original Assignee
SUZHOU JINGYIN BIOLOGICAL TECHNOLOGY CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU JINGYIN BIOLOGICAL TECHNOLOGY CO Ltd
Priority to CN201310058260XA
Publication of CN103074444A

Abstract

The invention discloses an HLA (histocompatibility locus antigen) genotyping method based on high-throughput sequencing of HLA determinant genes. Developing HLA typing software that works with data from various high-throughput sequencing platforms is of great clinical and biomedical significance. Compared with traditional PCR-SBT (polymerase chain reaction sequence-based typing), high-throughput sequencing has clear advantages in both economic cost and time cost. With high-throughput sequencing, the HLA sequence data of thousands of samples can be read in a single experiment, high-resolution HLA typing is achieved in one pass, and new alleles can be discovered at the same time. The method brings a qualitative leap in detection throughput, data quality and cost control, achieving 'low cost and high data'; it avoids the additional economic burden that repeated typing places on patients, shortens the time needed to find a donor whose HLA matches the patient's, and thereby saves precious time for treatment.

Description

HLA genotyping method based on high-throughput sequencing of histocompatibility antigen determinant genes
Technical field
The present invention relates to a gene sequencing and typing method, and in particular to an HLA genotyping method based on high-throughput sequencing of histocompatibility antigen determinant genes.
Background art
The HLA region contains more than 200 genes and plays a crucial role in the human immune system. HLA is highly polymorphic, with approximately 7000 known alleles (http://www.ebi.ac.uk/imgt/hla/). In bone marrow and other organ transplantation, the better the HLA genotypes of donor and recipient match, the lower the incidence of rejection and the higher the transplant success rate and the long-term survival rate of the graft; conversely, rejection becomes more likely. In a large-scale study published in 2007, Stephanie J. Lee et al. analyzed the data of 3857 transplants recorded by the US National Marrow Donor Program from 1988 to 2003 and found that survival was highest for patients matched at all 8 relevant HLA alleles. These 8 alleles are HLA-A, -B, -C, -DRB1, -DQB1, -DQA1, -DPB1 and -DPA1; a mismatch at any one of HLA-A, -B, -C or -DRB1 brings higher mortality, with one-year survival dropping from 52% for a full 8-allele match to 43%. Mismatches at two or more loci significantly aggravate this risk (Lee, Klein et al. 2007). The timing of transplantation is equally important for the outcome. Lee et al. also found that patients matched at only 6 HLA loci who were transplanted early in the disease did better than patients matched at all 8 who were transplanted only after the disease had progressed to an advanced stage. This is because the disease state at transplantation is the only factor the physician can control, and transplanting as early as possible is probably the most important step that can affect patient survival (Lee, Klein et al. 2007). An accurate and fast HLA typing method is therefore critical for patients who need bone marrow or organ transplantation.
Besides its extensive clinical use in donor-recipient matching for organ transplantation, the HLA genotype is closely associated with many specific diseases, such as autoimmune diseases, infectious diseases and certain cancers. For example, HLA-DRB1*04:01 has been shown to be closely associated with rheumatoid arthritis (Angelini, Morozzi et al. 1992), insulin-dependent diabetes mellitus (IDDM) (Windsor, Puschendorf et al. 2005) and multiple sclerosis (Laroni, Calabrese et al. 2006). HLA-B*57:01 can protect humans against HIV infection (Fellay, Shianna et al. 2007). In addition, although many genes affect susceptibility to breast cancer and none of them is related to HLA, the HLA class II genes HLA-DQB*03032 and HLA-DRB1*11 have been found in Caucasians to possibly have a protective effect against human breast cancer (Chaudhuri, Cariappa et al. 2000). HLA typing can therefore also be used to predict human resistance or susceptibility to certain specific diseases.
An adverse drug reaction (ADR) is a harmful effect, unrelated to the purpose of the medication, that occurs when a patient takes a normal dose of a drug for the prevention, diagnosis or treatment of disease or for the regulation of physiological function. Many ADRs are drug hypersensitivity syndromes caused by T-cell immune responses to the drug, and some of them, such as Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN), can have serious consequences. Research has found that many T-cell-mediated adverse drug reactions are associated with specific HLA allele types: for example, allopurinol (a drug used to treat gout and hyperuricemia) with the HLA-B*58:02 gene carried by some Han Chinese, and carbamazepine (an anticonvulsant used to treat epilepsy) with the HLA-B*15:02 gene carried by some Han Chinese, Indians and Thais (Thorsby 2011; Bharadwaj, Illing et al. 2012). The risk of an ADR to a given drug in patients carrying certain HLA genetic markers can be 500 to 1000 times higher than in normal individuals, which far exceeds the known associations between HLA and disease (Thorsby 2011). In the coming era of personalized medicine, detecting specific HLA alleles in advance with a high-throughput, high-resolution HLA typing method can help clinicians judge a patient's risk of adverse reactions to certain drugs.
In short, research on high-throughput HLA typing methods is not only of great clinical significance but can also play a positive role in disease prevention and control. High-throughput sequencing of histocompatibility antigen determinant genes can therefore be applied, first, in the clinic in connection with transplant surgery (such as organ or bone marrow transplantation); second, in personalized medicine for disease prevention and control or for assessing adverse drug reactions; and third, in routine HLA typing of the many donors registered in organ (or bone marrow) donation and transplantation banks.
HLA typing methods have gone through two stages: serological typing and DNA typing. Recently, as PCR technology has matured, serological typing has largely been abandoned and HLA typing has fully entered the DNA typing stage. Compared with serology, DNA typing offers higher resolution and a lower error rate (Dunn 2011). Three HLA typing methods are currently established: PCR-SSP (PCR with sequence-specific primers), PCR-SSOP (PCR with sequence-specific oligonucleotide probes) and PCR-SBT (PCR with genomic DNA sequencing-based typing) (Lind, Ferriola et al. 2010; Dunn 2011). As the number of HLA alleles keeps growing, PCR-SSP and PCR-SSOP find it increasingly hard to keep up with new standards; many laboratories have stopped using them, and PCR-SBT has gradually become the accepted standard method (Dunn 2011). In theory, because it uses Sanger sequencing, PCR-SBT is the most direct and most accurate method, and it is also the only one that can be used to define new alleles (Gabriel, Danzer et al. 2009; Lind, Ferriola et al. 2010), so it is essential to every HLA typing laboratory. PCR-SBT is a simple and fast sequence-based typing method: DNA fragments are first obtained by PCR amplification, and the sequence of the amplified fragments is then determined by Sanger sequencing. HLA genotyping built on it not only gives high-resolution results but also reveals the complete nucleotide sequence of the hypervariable regions of HLA genes, yet it sometimes produces ambiguous results (Gabriel, Danzer et al. 2009; Lind, Ferriola et al. 2010; Dunn 2011). The main reasons are: (1) the alleles have identical sequences within the sequenced region (usually exons 2 and 3 for HLA class I genes and exon 2 for class II genes), and the polymorphic sites lie outside the analyzed region; (2) in the Sanger sequencing reaction, nucleotides are incorporated into all DNA templates simultaneously, so the two alleles are amplified and sequenced together, which gives rise to cis/trans ambiguity in PCR-SBT typing: different allele combinations can sometimes yield the same heterozygous sequence, so that a unique HLA genotype cannot be determined, for example A*01:01:01:01+02:01:01:01=A*01:14+92:21=A*36:04+02:36 (Adams, Barracchini et al. 2004; Listgarten, Brumme et al. 2008; Lind, Ferriola et al. 2010). Correspondingly, most sequence polymorphism of HLA class I/II alleles arises from gene conversion, recombination and exon shuffling events (Adams, Barracchini et al. 2004). Within the same exon, several sequence motifs may therefore exist at a given position, and these motifs can be shared by alleles of different subtypes or different loci (Adams, Barracchini et al. 2004); in other words, the specificity of each allele is in fact formed by a unique combination of these motifs. The more alleles there are, the greater the probability of ambiguous results. Ambiguous PCR-SBT typing results can be resolved, on the one hand, by group-specific PCR (PCR-GSSP) (Dunn 2011; Lebedeva, Mastromarino et al. 2011) or haplotype-specific extraction (Dapprich, Ferriola et al. 2008; Gabriel, Danzer et al. 2009) and, on the other hand, by population-statistics methods that infer the most likely allele type for an ambiguous result from the linkage disequilibrium of specific HLA allele types in different regions or ethnic groups (Listgarten, Brumme et al. 2008), an approach that is especially effective for historical data in databases. Nevertheless, PCR-SBT typing remains cumbersome, with a high unit cost and a long turnaround time (Lank, Wiseman et al. 2010; Erlich, Jia et al. 2011). As technology has developed, it has gradually been recognized that pyrosequencing may solve this problem (Ramon, Braden et al. 2003; Ringquist, Styche et al. 2007; Lu, Boehm et al. 2009). Unlike Sanger sequencing, pyrosequencing allows the order of nucleotide dispensation to be programmed, so different dispensation orders can be designed for the reaction. For HLA allele templates that gave an ambiguous typing result, an out-of-phase nucleotide dispensation can be used so that nucleotides are incorporated into only one of the alleles; the sequencing reaction of that allele then runs ahead of the other, which resolves in principle the cis/trans ambiguity of the Sanger method.
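The cis/trans ambiguity described above can be illustrated with a minimal Python sketch. The sequences are hypothetical toy alleles, not real HLA sequences, and the name `heterozygous_signal` is introduced here for illustration only. It models a mixed Sanger trace simply as the set of bases observed at each position when two alleles are sequenced together, which is a simplifying assumption.

```python
def heterozygous_signal(allele_a: str, allele_b: str) -> list:
    """Per-position set of bases seen when two aligned alleles are sequenced together."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    return [frozenset(pair) for pair in zip(allele_a, allele_b)]


# Hypothetical toy alleles that differ at positions 2 and 5 (1-based).
pair_1 = ("ACGTAC", "ATGTGC")  # C...A on one chromosome, T...G on the other
pair_2 = ("ACGTGC", "ATGTAC")  # same variant sites, opposite phase

# Both pairs give {C,T} at position 2 and {A,G} at position 5, so the
# mixed signal alone cannot tell the two genotypes apart.
ambiguous = heterozygous_signal(*pair_1) == heterozygous_signal(*pair_2)
```

Reads from a high-throughput platform each come from a single template molecule, so a read spanning both sites would carry the phase information that the mixed signal loses.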
High-throughput HLA typing research is currently carried out mainly on the Roche 454 GS FLX sequencer. The high-throughput sequencing provided by the 454 GS FLX can resolve the HLA polymorphism of multiple samples directly at the exon level in a single run and yields a high-resolution genotype in one pass. This eliminates the repeated re-sequencing of a given fragment that is otherwise needed to resolve a complex allele type, and greatly simplifies the experimental workflow compared with PCR-SBT. The results of 454 HLA experiments can also be processed quickly by third-party genotyping software, such as GAssign-ATF 454 from Conexio Genomics, to obtain high-resolution results (Bentley, Higuchi et al. 2009; Gabriel, Danzer et al. 2009; Lind, Ferriola et al. 2010; Holcomb, Höglund et al. 2011). In addition, because 454 sequencing data are high-throughput (millions of reads can be produced simultaneously, with a read length of 250 bp), it also becomes possible to find rare alleles in individual samples (Bentley, Higuchi et al. 2009; Holcomb, Höglund et al. 2011).
However, although the GAssign-ATF 454 software can integrate the sites and sequences of all samples, compare them with the IMGT/HLA sequence database and automatically output the HLA allele types, it is commercial software and its user interface is rather complicated, which limits its wide application. Moreover, in the literature to date, this software has only been used in analyses of Roche/454 sequencing results; no reports have appeared for other sequencing platforms such as Illumina's Solexa. It is therefore particularly necessary to develop more general software that is applicable to different sequencing platforms.
Usually, HLA allele types can be determined directly by a homology search based on sequence similarity, such as BLAST (Wiseman, Karl et al. 2009; Kita, Ando et al. 2011; Lee, Hur et al. 2011) or BLAT (Lank, Wiseman et al. 2010), in which the reads or assembled contigs obtained from 454 sequencing are aligned against the IMGT/HLA sequence database. This is a specialized database of sequences of the human major histocompatibility complex and contains all HLA sequences officially recognized and named by the WHO Nomenclature Committee for Factors of the HLA System (Robinson, Mistry et al. 2011). For most known HLA allele types, the method therefore gives accurate identification. However, homology-based methods share a limitation with PCR-SBT typing: if the polymorphic sites of the alleles lie outside the sequenced region, ambiguous results can still arise. Of course, if the entire region of the HLA gene is sequenced, sequence analysis methods such as BLAST can reduce this ambiguity; but whether routine HLA typing requires sequencing the complete HLA gene, including introns and exons, remains disputed. For introns at least, most polymorphic sites do not actually affect allele expression and have no practical significance for allele typing, yet some intronic mutations do prevent normal expression of the HLA allele (Elsner, Bernard et al. 2002; Lind, Ferriola et al. 2010). Another notable problem is that the current IMGT/HLA reference database is not yet complete (Robinson, Mistry et al. 2011), which increases the probability of incorrect alignments (Lind, Ferriola et al. 2010).
Based on GATK (Genome Analysis Toolkit), the Broad Institute of MIT and Harvard developed a general-purpose program, HLACaller (Erlich, Jia et al. 2011), which computes, for each HLA locus, the posterior probability of a pair of HLA alleles on the chromosomes from three kinds of information: (1) the genotype at each base position; (2) the phase information of neighbouring variant sites; (3) population-specific allele frequencies. The basic idea is as follows. (A) Using GATK, calculate the genotype probability of each base observed at the HLA locus, and multiply the probabilities of all positions. (B) Using the binomial distribution, calculate for a given HLA allele pair the probability that the phase of each pair of adjacent polymorphic sites is consistent with the sequence data at the corresponding sites. This probability is computed with the binomial distribution from the number of reads that match the phase of the adjacent polymorphic sites of the allele pair and the total number of reads, with the estimated sequencing error rate (P_err) assumed to be 1%; as in (A), the probabilities of all pairs of adjacent polymorphic sites are multiplied. (C) Multiply the probabilities from (A) and (B) by the population allele frequency to obtain the posterior probability for each allele pair; the pair with the largest probability is the final result of the HLACaller algorithm.
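The three-factor product described in (A) to (C) can be sketched as follows. This is an illustrative reading of the description above, not HLACaller's actual code: the function names are hypothetical, the binomial success probability is assumed to be 1 - P_err, and the result is left unnormalised because only the ranking of allele pairs matters for picking the maximum.

```python
from math import comb, prod


def phase_likelihood(matching_reads: int, total_reads: int, p_err: float = 0.01) -> float:
    """Binomial probability of observing `matching_reads` phase-consistent reads
    out of `total_reads`, assuming each read matches with probability 1 - p_err."""
    k, n = matching_reads, total_reads
    return comb(n, k) * (1 - p_err) ** k * p_err ** (n - k)


def unnormalised_posterior(genotype_probs, phase_counts, pair_frequency, p_err=0.01):
    """Score one candidate allele pair.

    genotype_probs: per-position genotype probabilities, step (A)
    phase_counts:   (matching_reads, total_reads) per adjacent site pair, step (B)
    pair_frequency: population frequency of the allele pair, step (C)
    """
    a = prod(genotype_probs)
    b = prod(phase_likelihood(k, n, p_err) for k, n in phase_counts)
    return a * b * pair_frequency


# Hypothetical numbers: identical genotype support and frequency,
# but only the first candidate agrees with the phase seen in the reads.
consistent = unnormalised_posterior([0.9, 0.8], [(10, 10)], 0.05)
inconsistent = unnormalised_posterior([0.9, 0.8], [(5, 10)], 0.05)
```

Under these assumptions the phase-consistent candidate scores several orders of magnitude higher, which is how phase information separates allele pairs that share the same per-base genotypes.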
The advantage of this algorithm is that it fully integrates population allele frequency information. Lank et al. (Lank, Wiseman et al. 2010) consider that, for the HLA class I antigen genes A, B and C, exons 2, 3 and 4 can in theory distinguish only 85% of alleles at high resolution, while the remaining 15% can only be typed at medium or low resolution. By integrating population allele frequencies, HLACaller can still type HLA alleles at high resolution even when only exons 2 and 3 are sequenced, with a typing accuracy above 96%. For the minority of samples whose HLA genotype cannot be typed with high confidence, HLACaller can still provide a set of candidate results with relatively high probability, and the problem can be effectively resolved by sequencing additional exons. The HLACaller algorithm nevertheless has many limitations, such as the bias and the accuracy of the population allele frequency information itself. The former may cause HLACaller's typing results to favor certain specific HLA alleles, and the latter directly leads to wrong calls. In addition, because designing specific primers for HLA class II antigen genes is challenging, the algorithm has not yet been applied to HLA class II, and for unknown reasons the Broad Institute has stopped updating and maintaining it.
In fact, HLACaller still performs typing prediction on the basis of 454 sequencing. Although the read length of 454 sequencing can cover almost the entire exon region of an HLA gene, the technology has an obvious defect: it cannot accurately measure the length of homopolymers, which makes the sequencing results inaccurate, and for this reason 454 sequencing frequently produces nucleotide insertion or deletion errors. By contrast, other representative high-throughput platforms, such as Illumina's Solexa sequencing or the Ion Torrent PGM sequencing of Life Technologies, perform much better in this respect. Solexa and Ion Torrent PGM also offer far higher sequencing throughput than 454, and their relative sequencing cost is lower. Developing HLA typing methods based on Illumina Solexa or Life Technologies Ion Torrent PGM sequencing is therefore highly necessary.
Summary of the invention
In view of the above defects of the prior art, the object of the present invention is to provide an HLA genotyping method based on high-throughput sequencing of histocompatibility antigen determinant genes that addresses the efficiency and cost problems of HLA genotyping.
One technical solution for achieving the above object is an HLA genotyping method based on high-throughput sequencing of histocompatibility antigen determinant genes, for HLA allele types that are known and already catalogued, characterized by comprising the steps of:
I. amplifying and sequencing on a high-throughput sequencing platform to obtain read sequence fragments;
II. using the HLA alleles contained in the latest IMGT/HLA database as reference sequences, and aligning the reads obtained in step I against the reference sequences with a nucleotide sequence alignment tool to obtain alignment results;
III. subjecting the alignment results to multiple screening, filtering and optimization by mismatch, best match, length and/or paired-end matching;
IV. defining central reads, the minimum coverage of overall reads (MCOR) and the minimum coverage of central reads (MCCR); calculating the MCOR and MCCR values of each reference sequence remaining after the filtering of step III, and discarding reference sequences with MCOR less than 20 and MCCR less than 10; for the remaining reference sequences, listing all possible combinations at the same HLA locus, including homozygotes consisting of a single sequence and heterozygotes consisting of pairwise combinations, and counting the number of distinct reads for each combination; the combination with the largest number of reads is determined to be the corresponding HLA allele type; wherein a central read is a read participating in the alignment for which, at a given site, the ratio of the sequence length to the left of the site to the sequence length to the right of the site is between 0.5 and 2.
Further, the subject of analysis includes but is not limited to humans.
Further, the high-throughput sequencing platform includes at least Roche 454, Illumina Solexa and Life Technologies Ion Torrent PGM.
Further, the nucleotide sequence alignment tool is at least BLASTN.
Further, in step III, the mismatch screening removes alignment results that contain mismatches or gaps; the best-match screening keeps only alignment results whose alignment score is above a certain threshold; the length screening comprises, first, rejecting alignment results in which the exon is longer than 50 bases but the aligned length is less than 50 bases and, second, rejecting all results in which the exon is shorter than 50 bases but the aligned length is less than the exon length; and the paired-end screening rejects alignment results in which a reference sequence is matched by only one end of a paired-end read while another reference sequence is matched by both ends.
Further, in step IV, the read count calculated for a homozygous reference sequence is multiplied by an empirical factor of 1.05.
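The combination scoring of step IV, including the homozygote factor, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name `call_genotype`, the input layout (reference name mapped to the set of read identifiers aligned to it) and the read identifiers are hypothetical, and the allele names in the usage example are used purely as labels. The references passed in are assumed to have already survived the MCOR/MCCR filter.

```python
from itertools import combinations

HOMOZYGOTE_FACTOR = 1.05  # empirical value stated in the text


def call_genotype(reads_by_reference: dict) -> tuple:
    """Return (combination, score) for the allele combination supported by the
    largest number of distinct reads at one HLA locus."""
    scored = []
    # Homozygous candidates: a single reference, read count scaled by the factor.
    for ref, reads in reads_by_reference.items():
        scored.append(((ref,), len(reads) * HOMOZYGOTE_FACTOR))
    # Heterozygous candidates: every pair, counting each read only once.
    for ref_a, ref_b in combinations(sorted(reads_by_reference), 2):
        union = reads_by_reference[ref_a] | reads_by_reference[ref_b]
        scored.append(((ref_a, ref_b), float(len(union))))
    return max(scored, key=lambda item: item[1])


# Hypothetical locus with three surviving reference sequences.
example = {
    "A*01:01": {"r1", "r2", "r3", "r4"},
    "A*02:01": {"r3", "r4", "r5", "r6"},
    "A*03:01": {"r1"},
}
best_combination, best_score = call_genotype(example)
```

In this toy input the pair A*01:01 + A*02:01 explains all six reads and wins. When a second reference only re-uses reads already explained by the first, the 1.05 factor lets the homozygous call outrank the pair.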
Another technical solution for achieving the above object is an HLA genotyping method based on high-throughput sequencing of histocompatibility antigen determinant genes, for new HLA allele types that are not yet catalogued, characterized by comprising the steps of:
I. amplifying and sequencing on a high-throughput sequencing platform to obtain read sequence fragments; where the read length cannot cover the entire exon region of an HLA allele, obtaining contig sequences by de novo assembly, and where the read length is sufficient to cover the entire exon region, keeping the read sequences;
II. using the HLA alleles contained in the latest IMGT/HLA database as reference sequences, and aligning the read sequences or contig sequences obtained in step I against the reference sequences with a nucleotide sequence alignment tool to obtain alignment results;
III. identifying the closest HLA allele type as the one with the highest sequence alignment score, and determining the differences from it, thereby finding the new allele type.
Further, the subject of analysis includes but is not limited to humans.
Further, the high-throughput sequencing platform includes at least Roche 454, Illumina Solexa and Life Technologies Ion Torrent PGM.
Further, the nucleotide sequence alignment tool is at least BLASTN.
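Step III of this second solution, picking the closest catalogued allele and recording how the new sequence differs from it, can be sketched as follows. The function names, the score values and the short sequences are hypothetical; the sketch assumes that alignment scores have already been computed by the alignment tool and that the contig and the reference are given as gap-free aligned strings of equal length, so only substitutions are reported.

```python
def closest_reference(alignment_scores: dict) -> str:
    """Name of the reference allele with the highest alignment score."""
    return max(alignment_scores, key=alignment_scores.get)


def substitutions(contig_aligned: str, reference_aligned: str) -> list:
    """List (position, reference_base, contig_base) for every mismatch, 1-based."""
    if len(contig_aligned) != len(reference_aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, contig_base)
        for pos, (ref_base, contig_base) in enumerate(
            zip(reference_aligned, contig_aligned), start=1
        )
        if ref_base != contig_base
    ]


# Hypothetical scores and toy sequences for illustration.
scores = {"A*01:01": 450.0, "A*02:01": 480.0}
nearest = closest_reference(scores)
differences = substitutions("ACGTTC", "ACGTAC")
```

A non-empty difference list against the best-scoring reference is what flags a candidate new allele type in this reading of the step.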
Compared with traditional PCR-SBT sequencing, high-throughput sequencing has significant advantages in both economic cost and time cost. A single high-throughput experiment can read the HLA sequence data of thousands of samples, achieves high-resolution HLA typing in one pass, and can also discover new alleles. It brings a qualitative leap in detection throughput, data quality and cost control, truly achieving low cost and high data output. It avoids the extra economic burden that repeated typing places on patients, and, as an efficient typing method, it also shortens the time needed to find a donor matching the patient's HLA, gaining valuable time for treatment.
Description of drawings
Fig. 1 is a schematic diagram of the HLA primers required for sequencing, which have been validated in the existing literature.
Embodiment
The technical solution is divided into two parts, one for HLA allele types that are known and catalogued and one for new HLA allele types that are not yet catalogued, and it is applicable to various high-throughput sequencing platforms.
1) For HLA allele types that are already catalogued:
The standard way to determine a genotype is to align the sequence fragments obtained by amplification and sequencing against a reference database such as IMGT/HLA (http://www.ebi.ac.uk/imgt/hla/); if a sequence fragment matches a specific reference sequence in the database perfectly, the allele type of the tested sample can be taken to be the genotype of that reference sequence. For sequence alignment, the nucleotide alignment tool BLASTN from NCBI (http://blast.ncbi.nlm.nih.gov/) is preferred, and the HLA genotyping method described here starts from the BLASTN alignment. In release 3.8.0, issued on 2012-04-12, the IMGT/HLA database contains 7527 HLA alleles in total, of which the class I loci A, B and C contain 1884, 2490 and 1384 alleles respectively, and the class II loci DRB1 and DQB1 contain 1094 and 165 alleles respectively. Only a small proportion of these allele sequences include genomic sequence; most are exon nucleotide sequences, which is consistent with the amplified products being mainly exon regions. These allele sequences form the reference database for the BLASTN alignment, and all BLASTN parameters are left at their defaults.
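As a starting point for processing the alignment results, the sketch below parses one line of BLASTN output into a small record. It assumes the alignment was written in BLAST+ tabular format (`-outfmt 6`), whose twelve default columns are listed in `FIELDS`; the text itself does not state which output format was used. The `Hit` class, the function name and the sample line (read name, accession and numbers) are hypothetical.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass(frozen=True)
class Hit:
    read_id: str      # query: the sequencing read
    allele_id: str    # subject: the reference allele sequence
    length: int       # alignment length
    mismatches: int
    gap_opens: int
    ref_start: int    # 1-based, always <= ref_end
    ref_end: int
    bitscore: float


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated alignment line into a Hit."""
    values = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    # sstart > send when the read aligns to the reverse strand; normalise.
    start, end = sorted((int(values["sstart"]), int(values["send"])))
    return Hit(
        read_id=values["qseqid"],
        allele_id=values["sseqid"],
        length=int(values["length"]),
        mismatches=int(values["mismatch"]),
        gap_opens=int(values["gapopen"]),
        ref_start=start,
        ref_end=end,
        bitscore=float(values["bitscore"]),
    )


# Hypothetical example line.
hit = parse_hit("read1\tHLA00001\t100.000\t250\t0\t0\t1\t250\t550\t301\t1e-130\t462")
```

Keeping the mismatch count, gap count, aligned length and bit score on each record is enough to drive the screening steps that follow.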
After the reads obtained by high-throughput sequencing have been aligned against the IMGT/HLA reference database with BLASTN, the alignment results are processed in the following order:
i) Mismatch screening: remove results that contain mismatches or gaps in the alignment;
ii) Best-match screening: only the results with the maximum alignment score (bit score) are kept;
iii) Length screening: first, reject results in which the exon is longer than 50 bases but the aligned length is less than 50 bases; second, reject all results in which the exon is shorter than 50 bases but the aligned length is less than the exon length; and
iv) Paired-end screening (for paired-end sequencing): reject alignment results in which a reference sequence is matched by only one end of a paired-end read while another reference sequence is matched by both ends.
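The four screening steps above can be sketched as a single Python function. This is an illustrative reading, not the patented implementation: the `Alignment` record and its field names are hypothetical, the best score is taken per read end, and an exon of exactly 50 bases (a case the text does not cover) is treated like the longer exons.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    pair_id: str         # read (or read-pair) identifier
    mate: int            # 1 or 2 for paired-end data, 1 for single-end
    reference: str
    mismatches: int
    gaps: int
    aligned_length: int
    exon_length: int
    bitscore: float


def filter_alignments(alignments, paired_end=False):
    # i) mismatch screening: drop anything with a mismatch or a gap
    kept = [a for a in alignments if a.mismatches == 0 and a.gaps == 0]

    # ii) best-match screening: keep only the top bit score of each read end
    best = defaultdict(float)
    for a in kept:
        best[(a.pair_id, a.mate)] = max(best[(a.pair_id, a.mate)], a.bitscore)
    kept = [a for a in kept if a.bitscore == best[(a.pair_id, a.mate)]]

    # iii) length screening: need 50 aligned bases, or the whole exon if shorter
    kept = [a for a in kept if a.aligned_length >= min(50, a.exon_length)]

    # iv) paired-end screening: if some reference matches both ends of a pair,
    #     drop references matched by only one end of that pair
    if paired_end:
        mates = defaultdict(set)
        for a in kept:
            mates[(a.pair_id, a.reference)].add(a.mate)
        fully_matched = {pid for (pid, _), ends in mates.items() if len(ends) == 2}
        kept = [a for a in kept
                if a.pair_id not in fully_matched
                or len(mates[(a.pair_id, a.reference)]) == 2]
    return kept
```

The order matters: applying the mismatch filter first means the best score in step ii) is chosen only among perfect alignments, as in the ordering given in the text.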
It should be noted that when sequencing reads are aligned to the correct reference sequence, they form a continuous tiling that covers the whole sequenced region; if they are aligned to an incorrect reference sequence, a staggered tiling appears at some positions in the sequenced region. To quantify this difference, 'central reads' are defined first: at a given site, a read participating in the alignment whose sequence length to the left of the site and sequence length to the right of the site have a ratio between 0.5 and 2 is called a 'central read'. Before HLA genotyping, it is further assumed that more reads align to the correct reference sequence than to incorrect ones. One could then take an exhaustive approach: list all combinations of reference sequences and count the reads participating in the alignment for each pair. Because the reference sequences are numerous, however, the number of combinations is very large and this approach is impractical. A heuristic strategy is therefore adopted in which obviously impossible reference sequences are excluded first. Two further concepts are defined here: MCOR (minimum coverage of overall reads) and MCCR (minimum coverage of central reads). MCOR is the minimum, over all sites of a reference sequence, of the number of reads covering the site after alignment filtering; MCCR is the minimum, over all sites of a reference sequence (ignoring the 30 base sites at intron-exon boundaries in the reference sequence), of the number of central reads covering the site after alignment filtering. For each reference sequence, MCOR and MCCR are calculated, and reference sequences with MCOR less than 20 and MCCR less than 10 are discarded. For the remaining reference sequences, all possible combinations at the same HLA locus are listed (a homozygote is a single sequence, a heterozygote is a pairwise combination), and the number of distinct reads for each combination is counted. Since a homozygous allele combination has only one reference sequence, its read count is multiplied by an empirical factor of 1.05. The reference sequence combination with the largest number of reads is taken as the corresponding HLA allele type.
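The central-read definition and the two coverage minima can be sketched as below. Several points are assumptions made for illustration: reads are given as 1-based inclusive (start, end) spans on the reference; the left and right lengths exclude the site itself; the boundary exclusion is modelled as a `margin` around caller-supplied `boundaries`; and a reference is kept only when both thresholds are met, which is one reading of the discard rule in the text.

```python
def is_central(read_start: int, read_end: int, site: int) -> bool:
    """True if the read covers `site` with a left/right length ratio in [0.5, 2]."""
    left = site - read_start   # bases of the read to the left of the site
    right = read_end - site    # bases of the read to the right of the site
    if left < 0 or right <= 0:
        return False
    return 0.5 <= left / right <= 2


def mcor_mccr(read_spans, ref_length, boundaries=(), margin=30):
    """Return (MCOR, MCCR) for one reference sequence."""
    overall, central = [], []
    for site in range(1, ref_length + 1):
        covering = [(s, e) for s, e in read_spans if s <= site <= e]
        overall.append(len(covering))
        # Sites close to an intron-exon boundary are ignored for MCCR.
        if all(abs(site - b) > margin for b in boundaries):
            central.append(sum(is_central(s, e, site) for s, e in covering))
    return min(overall), (min(central) if central else 0)


def keep_reference(mcor: int, mccr: int, min_mcor: int = 20, min_mccr: int = 10) -> bool:
    """Assumed reading: keep a reference only if it meets both thresholds."""
    return mcor >= min_mcor and mccr >= min_mccr


# Hypothetical 100-base reference: continuous versus staggered tiling.
continuous = [(1, 100)] * 25
staggered = [(1, 50)] * 25 + [(51, 100)] * 25
```

With `boundaries=(1, 100)` and `margin=35`, the continuous tiling gives MCOR and MCCR of 25, while the staggered tiling keeps an MCOR of 25 but drops to an MCCR of 0 near the junction at position 50, which is the signal used to reject an incorrect reference.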
2) for new HLA allelotype:
Obviously, above-mentioned method based on the reference sequences comparison can only be analyzed the known sample of HLA allelotype, and the new allelotype of not included by database then seems helpless.Therefore, in order to analyze new allelotype, need the extra method of design.Be similar to the principle that from the beginning Velvet software splice short and small reads, exploitation splicing software, the reads that fails all to compare or fail to compare reference sequences to splice those.In brief, with these reads, take 1bp as displacement unit, be divided into the fragment that length is 40 bases.Then make up a figure that direction and weight are arranged, wherein the fragment of each 40bp is as node, and will couple together from two continuous fragments of same read, and the weight setting on limit is the reads number that contains two node fragments.On figure, these weights and a maximum paths splice the contig that obtains exactly.Contig is compared on the reference sequences, just can obtain and the immediate reference sequences of contig and definite its difference.Based on the method, just can find new allelotype.
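The graph construction described above can be sketched in Python as follows. The function names are hypothetical, and the path search is a deliberate simplification: instead of finding the path with the largest sum of weights, the sketch starts at a node with no incoming edge and greedily follows the heaviest outgoing edge. The toy reads in the usage example use 4-base fragments rather than 40 so that they stay readable.

```python
from collections import Counter


def build_graph(reads, k=40):
    """Edge weights of the fragment graph: for each pair of consecutive
    k-base fragments, the number of reads that contain both."""
    edges = Counter()
    for read in reads:
        parts = [read[i:i + k] for i in range(len(read) - k + 1)]
        # A set ensures each read contributes at most once per edge.
        for edge in set(zip(parts, parts[1:])):
            edges[edge] += 1
    return edges


def greedy_contig(edges):
    """Greedy approximation of the heaviest path through the graph."""
    successors = {}
    for (src, dst), weight in edges.items():
        successors.setdefault(src, []).append((weight, dst))
    targets = {dst for _, dst in edges}
    starts = sorted(src for src in successors if src not in targets)
    if not starts:
        return ""
    node = starts[0]
    contig, seen = node, {node}
    while node in successors:
        _, nxt = max(successors[node])   # heaviest outgoing edge
        if nxt in seen:                  # stop rather than loop on a cycle
            break
        contig += nxt[-1]                # consecutive fragments overlap by k-1 bases
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical toy reads; the last one carries a sequencing error.
toy_reads = ["ATGGCCTT", "GGCCTTAG", "CCTTAGCA", "GGCCTAAG"]
toy_edges = build_graph(toy_reads, k=4)
toy_contig = greedy_contig(toy_edges)
```

In the toy data the erroneous read opens a side branch with weight 1, and the greedy walk stays on the better supported branch, so the contig comes out as ATGGCCTTAGCA.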
Because BLASTN sequence alignment and de novo assembly of sequencing reads are both general-purpose, the above technical scheme can easily be extended to nearly all new-generation high-throughput sequencing platforms. In addition, because it also handles new HLA allele types that are not yet in the database, the scheme is no longer limited by the incompleteness of the existing HLA allele database, which greatly broadens its range of use.
Specific embodiments of the invention are described in further detail below with reference to the accompanying drawings, so that the technical solution of the invention is easier to understand and apply.
1. Primer design
The HLA primers required for 454 sequencing are the validated primers published by Bentley et al. (G. Bentley et al. 2009), adopted directly, as shown in Figure 1. These primers were designed to amplify as many allele types as possible while remaining specific to their locus. In this experiment, only exons 2 and 3 of HLA class I A, B and C, and exon 2 of HLA class II DRB1 and DQB1, were amplified.
2. Samples
The experiment used blood samples from 10 healthy individuals from Ruijin Hospital. Their allele types at five loci (HLA A, B, C, DRB1 and DQB1) were determined by the Shanghai blood testing center using the standard PCR-SBT method. All 10 samples were sent for sequencing on the 454 Life Sciences GS FLX.
3. High-throughput sequencing of the HLA genes
The 8 exons of each of the 10 samples were amplified separately by PCR. Short non-specific amplification products and primer dimers were removed with the Agencourt AMPure system (Agencourt Bioscience Corporation, Beverly, MA). The purified amplicons were then quantified on a microplate spectrofluorometer using the Quant-iT PicoGreen assay (Invitrogen Corporation). After dilution to a suitable concentration, emulsion PCR, bead recovery and pyrosequencing were carried out according to the requirements of the 454 GS FLX sequencing method. This yielded 454 high-throughput sequencing data for the 8 exons of the 10 samples, with sequencing depth ranging from 20 to 500.
4. HLA genotyping
The reference sequence database for HLA genotyping was IMGT/HLA release 3.8.0, issued on 2012-04-12 (http://www.ebi.ac.uk/imgt/hla/), from which the exon 2 and 3 nucleotide fragments of HLA class I A, B and C and the exon 2 nucleotide fragments of HLA class II DRB1 and DQB1 were extracted. Following the "technical route" part of the "summary of the invention" above, the sequencing reads for the 8 exons of the 10 samples were first aligned to the reference sequence database with the local version of the NCBI BLASTN tool. The BLASTN alignment results were then passed through mismatch screening, best-match screening and length screening in turn. Because this run was single-end sequencing on the 454 GS FLX, paired-end screening was not needed. Next, after the heuristic strategy had excluded the obviously impossible reference sequences, the number of distinct reads was calculated for every possible allele combination at the same HLA locus (a homozygote is a single sequence; a heterozygote is a pair of sequences). Finally, the combination with the largest read count was identified as the HLA allele type of the sample. The results show that the great majority of the 5 HLA loci in the 10 samples were typed correctly.
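The final combination step can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the software used in the experiment: the function name, the input layout (a mapping from each surviving reference allele at one locus to the set of read identifiers aligned to it after filtering) and the allele labels and read counts in the usage example are hypothetical. Only the scoring rule, distinct reads per combination with the homozygote count multiplied by 1.05, comes from the text.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def call_genotype(reads_by_allele: Dict[str, Set[int]],
                  homozygote_factor: float = 1.05) -> Tuple[str, str]:
    """Return the allele pair supported by the most distinct reads at one locus."""
    alleles = sorted(reads_by_allele)
    if not alleles:
        raise ValueError("no candidate reference sequences left for this locus")
    best_score, best_pair = -1.0, (alleles[0], alleles[0])
    # Homozygotes: a single reference sequence, read count scaled by 1.05.
    for a in alleles:
        score = len(reads_by_allele[a]) * homozygote_factor
        if score > best_score:
            best_score, best_pair = score, (a, a)
    # Heterozygotes: pairwise combinations, counting each read only once.
    for a, b in combinations(alleles, 2):
        score = float(len(reads_by_allele[a] | reads_by_allele[b]))
        if score > best_score:
            best_score, best_pair = score, (a, b)
    return best_pair


# Hypothetical usage with made-up read identifiers:
example = {
    "A*01:01": set(range(0, 10)),
    "A*02:01": set(range(5, 15)),
    "A*03:01": set(range(0, 4)),
}
print(call_genotype(example))
```

The 1.05 factor gives a true homozygote a small advantage over a pairing with a near-identical second allele that contributes only a few extra reads.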
The present invention provides graphical HLA typing software based on data from a variety of high-throughput sequencing platforms, which is of great significance both clinically and in biomedicine. Compared with the traditional PCR-SBT sequencing method, high-throughput sequencing has a significant advantage in both economic cost and time cost. With a single experiment, high-throughput sequencing can read the HLA sequence data of thousands of samples and reach high-resolution HLA typing in one pass, and it can also find new alleles. It brings a qualitative leap in detection throughput, data quality and cost control, truly achieving "low-resolution price, high-resolution data". It avoids the extra economic burden that repeated matching imposes on the patient, and as an efficient typing method it also shortens the search for a donor whose HLA matches the patient's, gaining valuable time for treatment. Its main innovations are as follows.
1. It is the first HLA typing software that can perform high-throughput analysis for multiple high-throughput sequencing platforms, such as Roche 454, Illumina Solexa and Life Technologies Ion Torrent PGM;
2. Compared with classical PCR-SBT, both the time cost and the economic cost of high-throughput sequencing are significantly lower;
3. At the same time, the resolution and accuracy of high-throughput sequencing are significantly higher; the high resolution required for HLA matching is reached in one pass, with a prediction accuracy above 95%;
4. It is easily given a graphical interface, so that clinicians or biologists without a computing background can quickly learn to use it;
5. It can find new alleles;
6. After the HLA gene database is updated, the results can be analyzed again, that is, the typing results can be updated.

Claims (8)

1. An HLA genotyping method by high-throughput sequencing of the HLA (histocompatibility locus antigen) determinant genes, for HLA allele types that are known and included in the database, characterized by comprising the steps of:
I. amplifying and sequencing on a high-throughput sequencing platform to obtain read sequence fragments;
II. taking the HLA alleles contained in the latest IMGT/HLA database as reference sequences, and aligning the read sequence fragments obtained in step I to the reference sequences with a nucleotide sequence alignment tool to obtain alignment results;
III. filtering and optimizing the alignment results by multiple screening for mismatches, best match, length and/or paired-end matching;
IV. defining the central read, the minimum coverage of overall reads (MCOR) and the minimum coverage of central reads (MCCR); calculating the MCOR and MCCR values of each reference sequence after the filtering of step III, and discarding reference sequences with MCOR less than 20 and MCCR less than 10; for the remaining reference sequences, listing all possible combinations at the same HLA locus, comprising homozygotes of a single sequence and heterozygotes of pairwise combinations; calculating the number of distinct reads for each combination; and determining the combination with the largest read count to be the corresponding HLA allele type; wherein a central read is, at a given site, an aligned read whose ratio of sequence length to the left of the given site to sequence length to the right of the given site is between 0.5 and 2.
2. The HLA genotyping method by high-throughput sequencing of the HLA determinant genes according to claim 1, characterized in that: the high-throughput sequencing platform comprises at least Roche 454, Illumina Solexa and Life Technologies Ion Torrent PGM.
3. The HLA genotyping method by high-throughput sequencing of the HLA determinant genes according to claim 1, characterized in that: the nucleotide sequence alignment tool is at least BLASTN.
4. The HLA genotyping method by high-throughput sequencing of the HLA determinant genes according to claim 1, characterized in that: in step III, the mismatch screening removes alignment results that contain mismatches or gaps; the best-match screening keeps only alignment results whose alignment score is above a certain threshold; the length screening comprises, first, rejecting alignment results in which the exon is longer than 50 bases and the aligned length is less than 50 bases and, second, rejecting all results in which the exon is shorter than 50 bases but the aligned length is less than the exon length; and the paired-end matching screening rejects alignment results in which a reference sequence aligns to only one end of a paired-end read while other reference sequences align to both ends.
5. The HLA genotyping method by high-throughput sequencing of the HLA determinant genes according to claim 1, characterized in that: in step IV, for a homozygous reference sequence, the calculated read count is multiplied by an empirical factor of 1.05.
6. An HLA genotyping method by high-throughput sequencing of the HLA determinant genes, for new HLA allele types that are not included in the database, characterized by comprising the steps of:
I. amplifying and sequencing on a high-throughput sequencing platform to obtain read sequence fragments; where the read length cannot cover a whole HLA allele exon region, obtaining contig sequences by de novo assembly; and retaining the read sequences whose length is sufficient to cover the whole exon region;
II. taking the HLA alleles contained in the latest IMGT/HLA database as reference sequences, and aligning the read sequences or contig sequences obtained in step I to the reference sequences with a nucleotide sequence alignment tool to obtain alignment results;
III. determining the closest HLA allele type as the one with the highest sequence alignment score, and determining its differences, thereby finding the new allele type.
7. The HLA genotyping method by high-throughput sequencing of the HLA determinant genes according to claim 6, characterized in that: the high-throughput sequencing platform comprises at least Roche 454, Illumina Solexa and Life Technologies Ion Torrent PGM.
8. The HLA genotyping method by high-throughput sequencing of the HLA determinant genes according to claim 6, characterized in that: the nucleotide sequence alignment tool is at least BLASTN.
CN201310058260XA 2013-02-25 2013-02-25 HLA (histocompatibility locus antigen) genetic typing method of HLA determinant gene through high-throughput sequencing Pending CN103074444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310058260XA CN103074444A (en) 2013-02-25 2013-02-25 HLA (histocompatibility locus antigen) genetic typing method of HLA determinant gene through high-throughput sequencing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310058260XA CN103074444A (en) 2013-02-25 2013-02-25 HLA (histocompatibility locus antigen) genetic typing method of HLA determinant gene through high-throughput sequencing

Publications (1)

Publication Number Publication Date
CN103074444A true CN103074444A (en) 2013-05-01

Family

ID=48151137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310058260XA Pending CN103074444A (en) 2013-02-25 2013-02-25 HLA (histocompatibility locus antigen) genetic typing method of HLA determinant gene through high-throughput sequencing

Country Status (1)

Country Link
CN (1) CN103074444A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104894271A (en) * 2015-06-10 2015-09-09 天津诺禾致源生物信息科技有限公司 Method and device for detecting gene fusion
WO2017139945A1 (en) * 2016-02-18 2017-08-24 深圳华大基因研究院 Typing method and device
CN108573127A (en) * 2017-03-14 2018-09-25 深圳华大基因科技服务有限公司 Processing method and its application of initial data is sequenced in a kind of nucleic acid third generation
CN111312332A (en) * 2020-02-13 2020-06-19 国家卫生健康委科学技术研究所 Biological information processing method and device based on HLA genes and terminal
CN111607640A (en) * 2020-06-04 2020-09-01 北京新抗元生物技术有限公司 Quantitative detection method for expression quantity of two alleles in pair of HLA alleles
CN112509638A (en) * 2020-12-04 2021-03-16 深圳荻硕贝肯精准医学有限公司 Analysis method and analysis processing device for human HLA chromosome region heterozygosity loss

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUNLIN WANG ET AL.: "High-throughput, high-fidelity HLA genotyping with deep sequencing", 《PNAS》 *
Hao Guiqin et al.: "Application of molecular biology techniques in HLA typing", 《中国实验血液学杂志》 (Journal of Experimental Hematology) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104894271A (en) * 2015-06-10 2015-09-09 天津诺禾致源生物信息科技有限公司 Method and device for detecting gene fusion
CN104894271B (en) * 2015-06-10 2020-02-21 天津诺禾致源生物信息科技有限公司 Method and device for detecting gene fusion
WO2017139945A1 (en) * 2016-02-18 2017-08-24 深圳华大基因研究院 Typing method and device
CN108350498A (en) * 2016-02-18 2018-07-31 深圳华大生命科学研究院 Classifying method and device
CN108573127A (en) * 2017-03-14 2018-09-25 深圳华大基因科技服务有限公司 Processing method and its application of initial data is sequenced in a kind of nucleic acid third generation
CN108573127B (en) * 2017-03-14 2021-04-27 深圳华大基因科技服务有限公司 Processing method and application of original data of third-generation nucleic acid sequencing
CN111312332A (en) * 2020-02-13 2020-06-19 国家卫生健康委科学技术研究所 Biological information processing method and device based on HLA genes and terminal
CN111312332B (en) * 2020-02-13 2020-10-30 国家卫生健康委科学技术研究所 Biological information processing method and device based on HLA genes and terminal
CN111607640A (en) * 2020-06-04 2020-09-01 北京新抗元生物技术有限公司 Quantitative detection method for expression quantity of two alleles in pair of HLA alleles
CN111607640B (en) * 2020-06-04 2022-10-28 角井(北京)生物技术有限公司 Quantitative detection method for expression quantity of two alleles in pair of HLA alleles
CN112509638A (en) * 2020-12-04 2021-03-16 深圳荻硕贝肯精准医学有限公司 Analysis method and analysis processing device for human HLA chromosome region heterozygosity loss
CN112509638B (en) * 2020-12-04 2021-12-03 深圳荻硕贝肯精准医学有限公司 Analysis method and analysis processing device for human HLA chromosome region heterozygosity loss

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130501