TWI518538B - Predicting HLA genotypes using unphased and flanking single-nucleotide polymorphisms in Han Chinese population

Info

Publication number
TWI518538B
Authority
TW
Taiwan
Prior art keywords
single nucleotide
polymorphic
hla
collection
genotype
Prior art date
Application number
TW103114074A
Other languages
Chinese (zh)
Other versions
TW201441857A (en)
Inventor
范盛娟
張天鈞
楊偉勛
陳沛隆
謝璦如
陳垣崇
朱正中
Original Assignee
中央研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中央研究院
Publication of TW201441857A
Application granted
Publication of TWI518538B

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6881 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Description

Method and device for predicting human leukocyte antigen genotypes in Han Chinese using single-nucleotide polymorphisms

The present invention relates to population-specific single-nucleotide polymorphisms (SNPs) that can predict human leukocyte antigen (HLA) alleles. In particular, the present invention relates to the use of SNPs of Han Chinese to predict HLA genotypes.

The human leukocyte antigen (HLA) gene cluster is located on chromosome 6 and comprises the alleles of the major histocompatibility complex (MHC) class I genes (HLA-A, HLA-B, and HLA-C) and class II genes (HLA-DR, HLA-DQ, and HLA-DP). The polymorphism of the many alleles of each gene gives rise to graft rejection and graft-versus-host disease in tissue or organ transplantation. HLA alleles also play an important role in population genetics and in immune-related disease status. Furthermore, previous comparative studies have shown that the immune system is generally under strong selective pressure, possibly caused by virus-host interactions. Because of these selective pressures, comparisons between populations reveal linkage disequilibrium and highly variable patterns in the distribution of HLA alleles.

Genetic variation in the human leukocyte antigen (HLA) genes is associated with immune function, autoimmune diseases, and certain cancers. To date, obtaining HLA types experimentally (by serology or PCR) in large-scale studies remains time-consuming and expensive. Lower-cost single-nucleotide polymorphisms (SNPs) are therefore widely used to predict HLA genotypes, saving both cost and experimental time. However, most HLA genotype prediction models are built only on Caucasian samples, few studies have reported samples from non-Caucasians, and the distribution of HLA gene types differs among ethnic groups.

Zheng et al. (BMC Genetics, 2011) emphasized that an HLA prediction model, once constructed, cannot be applied to the HLA genotypes of a different ethnic group. Accordingly, Ayele et al. (PLoS ONE, 2011) constructed an HLA prediction model specific to Africans; however, no HLA prediction model exists yet for Asians. It is therefore necessary to construct population-specific HLA prediction models, and a model for Han Chinese is particularly needed.

Accordingly, the present invention provides a method for predicting HLA alleles, comprising the steps of: (a) providing a human nucleic acid sample; (b) determining the genotypes of a set of SNPs in the human nucleic acid sample, the set comprising distinct SNPs located on the HLA genes; (c) analyzing the genotype of each SNP from step (b) with a prediction model to obtain a calculated value, wherein the prediction model uses SNP genotypes to predict HLA alleles; and (d) predicting the HLA genotype of the human sample from the calculated value obtained in step (c); wherein the sample is from an Asian population, preferably a Han Chinese population. By genotyping the SNP sets identified by the prediction models of the present invention, the HLA genotype can be predicted accurately by assaying only a small number of SNP positions at low cost.
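Steps (b) through (d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented model: the SNP identifiers (`snp_1`, `snp_2`), the candidate labels (`allele_X`, `allele_Y`), and the lookup table standing in for the prediction model are all hypothetical placeholders, and the "calculated value" of step (c) is assumed to be one score per candidate HLA genotype.

```python
from typing import Callable, Dict, List, Tuple

Genotypes = Dict[str, int]  # SNP id -> count of one allele (0, 1, or 2)
ScoreFn = Callable[[Tuple[int, ...]], Dict[str, float]]


def extract_snp_set(sample: Genotypes, snp_set: List[str]) -> Tuple[int, ...]:
    """Step (b): read the genotypes of the chosen SNP set from one sample."""
    missing = [snp for snp in snp_set if snp not in sample]
    if missing:
        raise KeyError(f"sample lacks genotypes for: {missing}")
    return tuple(sample[snp] for snp in snp_set)


def predict_hla(sample: Genotypes, snp_set: List[str], score_fn: ScoreFn) -> Tuple[str, float]:
    """Steps (c) and (d): score each candidate HLA genotype, return the best one."""
    features = extract_snp_set(sample, snp_set)
    scores = score_fn(features)          # step (c): the calculated values
    best = max(scores, key=scores.get)   # step (d): pick the top candidate
    return best, scores[best]


# Hypothetical stand-in for a trained prediction model (made-up numbers).
TOY_TABLE: Dict[Tuple[int, ...], Dict[str, float]] = {
    (0, 0): {"allele_X/allele_X": 0.9, "allele_X/allele_Y": 0.1},
    (1, 0): {"allele_X/allele_X": 0.2, "allele_X/allele_Y": 0.8},
}


def toy_model(features: Tuple[int, ...]) -> Dict[str, float]:
    return TOY_TABLE[features]


sample = {"snp_1": 1, "snp_2": 0, "snp_3": 2}
prediction = predict_hla(sample, ["snp_1", "snp_2"], toy_model)
print(prediction)  # ('allele_X/allele_Y', 0.8)
```

Only the SNPs in the chosen set are read from the sample, which mirrors the point that a small panel suffices once the model is trained.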

In the prediction method of the present invention, the SNPs in the SNP set are drawn from (1) HLA-A, (2) HLA-B, (3) HLA-C, (4) HLA-DPB1, (5) HLA-DQB1, and (6) HLA-DRB1. For each gene, the SNPs are selected from the group consisting of the four SNP sets listed for that gene below.

(1) HLA-A (SNP sets 1 to 4):
(i) Set 1: rs1633085, rs2254071, rs407238, rs9258881, rs2975046, rs2735096, rs417162, rs9260954, rs6917477, rs6457144, rs9261394, and rs2523990;
(ii) Set 2: rs4122198, rs16895757, rs1632973, rs9357086, rs11759549, rs3115628, rs3094165, rs2734925, rs2517755, rs2256919, rs11756025, rs7382061, rs6457144, rs2517646, and rs7744914;
(iii) Set 3: rs3094165, rs9258883, rs3132714, rs1611493, rs2524005, rs2860580, rs12665039, rs6457109, rs3869062, rs3893464, rs5009448, rs2571375, rs7758512, and rs9261394;
(iv) Set 4: rs2523409, rs1611133, rs3115628, rs2517859, rs1611732, rs2523998, rs2860580, rs12202296, rs2248153, rs2975046, rs6457109, rs5009448, rs9260932, and rs6457144.

(2) HLA-B (SNP sets 5 to 8):
(i) Set 5: rs3130944, rs3130532, rs3130534, rs3134762, rs16899207, rs2524089, rs9366778, rs2524166, rs9295984, rs4394275, rs9378249, rs2523534, rs9266406, rs2844558, rs5022119, rs3099848, rs4081552, rs2848716, rs2596454, and rs2248462;
(ii) Set 6: rs11966319, rs2853948, rs6906846, rs9378228, rs2524051, rs9366778, rs16867947, rs4394274, rs4394275, rs2523591, rs9501572, rs7761068, rs2523535, rs9266406, rs5006724, rs13198903, rs9266669, rs9266689, rs3099849, rs2442749, rs1051796, rs2596464, rs3099836, and rs3131622;
(iii) Set 7: rs9264868, rs9264942, rs3094691, rs2156875, rs2523619, rs2442719, rs2596501, rs2523589, rs2523554, rs2844573, rs9266395, rs9266440, rs9295986, rs2442749, rs2596560, rs3128982, rs2284178, and rs7758090;
(iv) Set 8: rs3094691, rs7453967, rs4394274, rs4394275, rs2596509, rs2596501, rs1058026, rs2523591, rs2523589, rs2523554, rs2523545, rs9501572, rs2844575, rs9266395, rs9266406, rs5006725, rs9295986, rs6933050, rs4959068, rs5022119, rs13198903, rs9266689, rs2251396, rs1051796, rs3094584, rs9765960, and rs3128982.

(3) HLA-C (SNP sets 9 to 12):
(i) Set 9: rs2073724, rs3130713, rs3130531, rs3095250, rs3130532, rs3130534, rs2844615, rs6906846, rs2524067, rs7382297, rs2394963, rs2524095, rs16899203, rs9366778, rs9295970, and rs2523534;
(ii) Set 10: rs3130712, rs28480108, rs3134762, rs19966319, rs9264523, rs3132488, rs3134745, rs3130693, rs3132486, rs2853948, rs6906846, rs9378228, rs6457372, rs2394963, rs2524057, rs12191877, and rs9366776;
(iii) Set 11: rs2516049, rs2858870, rs660895, rs532098, rs3129763, rs1063355, rs9275141, rs9275184, rs7774434, rs7775228, and rs9275224;
(iv) Set 12: rs9263957, rs9263969, rs3134762, rs11966319, rs2248880, rs9264532, rs2524099, rs2074488, rs2395471, rs5010528, rs13207315, rs3132488, rs3130693, rs9391714, rs4386816, rs2524057, rs16899205, and rs9295970.

(4) HLA-DPB1 (SNP sets 13 to 16):
(i) Set 13: rs3128955, rs3130588, rs9277194, rs9348904, rs9296073, rs2856816, rs3135021, rs1431403, rs3128963, rs3117229, rs7763822, rs2295120, rs3117242, rs6937034, and rs1003979;
(ii) Set 14: rs9296068, rs9277183, rs3135402, rs9348904, rs2856830, rs9296073, rs2071350, rs1431402, rs1431403, rs9277550, rs3128963, rs3117229, rs9277567, rs3128918, and rs6937034;
(iii) Set 15: rs206769, rs6920606, rs375912, rs1431399, rs987870, rs3135021, rs9277535, rs9277554, rs10484569, rs2281390, rs3128917, rs2281388, rs3130215, and rs2269346;
(iv) Set 16: rs2216264, rs423639, rs3097669, rs987870, rs1431402, rs1431403, rs9277378, rs9277535, rs9277550, rs9277554, rs9277565, rs2281390, rs2281388, rs3130215, rs6937034, rs6937061, and rs2395357.

(5) HLA-DQB1 (SNP sets 17 to 20):
(i) Set 17: rs9269186, rs9270986, rs615672, rs3129768, rs9272219, rs9272346, rs6908943, rs9275134, rs9469220, rs6457617, rs2647046, rs2858308, and rs9275418;
(ii) Set 18: rs2647073, rs502055, rs3129768, rs9272535, rs9272723, rs34485459, rs3129716, rs7775228, rs6469219, rs5000634, rs6457617, and rs9275418;
(iii) Set 19: rs2516049, rs2858870, rs660895, rs532098, rs3129763, rs1063355, rs9275141, rs9275184, rs7774434, rs7775228, and rs9275224;
(iv) Set 20: rs17533090, rs9272219, rs17211510, rs41269947, rs34485459, rs1063355, rs9275141, rs3129716, rs7774434, rs9405119, rs9469219, rs9469220, and rs9275224.

(6) HLA-DRB1 (SNP sets 21 to 24):
(i) Set 21: rs9268831, rs9268861, rs7747521, rs9268877, rs9269186, rs2027852, rs615672, rs3129768, rs9272219, rs9272346, rs9275134, rs7775228, rs9469220, rs6457617, rs2647046, and rs2858308;
(ii) Set 22: rs9268877, rs4410767, rs7749092, rs17210980, rs2647073, rs615672, rs674343, rs502771, rs3997872, rs9271367, rs9271720, rs2187668, rs34485459, rs3129716, and rs9405119;
(iii) Set 23: rs9405098, rs3129871, rs13209234, rs9268832, rs6903608, rs602875, rs660895, rs9271366, rs3129769, rs17211510, rs2187668, rs9275141, rs9275184, rs9275383, rs2856717, rs2858305, rs13192471, and rs3104405;
(iv) Set 24: rs2395175, rs9405035, rs9268831, rs6903608, rs9268877, rs9269186, rs7749092, rs2027852, rs17210980, rs2516049, rs615672, rs660895, rs674313, rs502771, rs3997872, rs9271366, rs2187668, rs34485459, rs9275141, rs7755224, rs3129716, and rs3104404.

Another object of the present invention is to provide a device for predicting HLA alleles, comprising no more than 200 nucleotide probes, wherein the probes can detect the SNPs described above and are immobilized on the device.
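To make the 200-probe limit concrete, the sketch below merges two of the SNP sets listed above (Set 17 for HLA-DQB1 and Set 21 for HLA-DRB1) into one deduplicated panel and checks its size. It assumes one probe per distinct SNP, which the text does not state; the names `build_panel`, `fits_device`, and `MAX_PROBES` are illustrative only.

```python
from typing import Iterable, List

# SNP sets transcribed from the description above.
SET_17_DQB1 = [
    "rs9269186", "rs9270986", "rs615672", "rs3129768", "rs9272219",
    "rs9272346", "rs6908943", "rs9275134", "rs9469220", "rs6457617",
    "rs2647046", "rs2858308", "rs9275418",
]
SET_21_DRB1 = [
    "rs9268831", "rs9268861", "rs7747521", "rs9268877", "rs9269186",
    "rs2027852", "rs615672", "rs3129768", "rs9272219", "rs9272346",
    "rs9275134", "rs7775228", "rs9469220", "rs6457617", "rs2647046",
    "rs2858308",
]

MAX_PROBES = 200  # upper bound stated for the device


def build_panel(*snp_sets: Iterable[str]) -> List[str]:
    """Merge SNP sets into one panel, keeping first-seen order and dropping repeats."""
    seen = set()
    panel = []
    for snp_set in snp_sets:
        for rsid in snp_set:
            if rsid not in seen:
                seen.add(rsid)
                panel.append(rsid)
    return panel


def fits_device(panel: List[str], max_probes: int = MAX_PROBES) -> bool:
    """Assumption: one probe per distinct SNP."""
    return len(panel) <= max_probes


panel = build_panel(SET_17_DQB1, SET_21_DRB1)
print(len(panel), fits_device(panel))  # 19 True
```

The two sets share ten SNPs, so 13 + 16 entries collapse to 19 distinct probes. The same overlap occurs between other neighbouring loci, which helps keep a six-gene panel small.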

The present invention constructs population-specific HLA genotype prediction models for Asians, using blood samples from 437 Han Chinese individuals with Affymetrix 5.0 and Illumina 550K SNP data, of whom 214 also have Affymetrix 6.0 SNP data. All individuals were typed at six HLA loci to four-digit resolution and were used as training and testing sets for the HLA genotype prediction models. The results show that larger sample sizes and higher SNP densities generally lead to more accurate predictions. In addition, the optimal flanking regions associated with HLA alleles in the Asian population are generally shorter than those in Caucasians. In the most accurate models, the flanking regions of the HLA genes range from 20 to 200 kb (median 70 kb) across the different chip datasets. When the HLA gene is shorter, the flanking region is larger and the HLA allele density is higher. The best models of the invention provide accurate predictions in the Asian population. The invention also offers practical recommendations on gene regions, chips, and imputation for HLA genotype prediction models in Asian populations. Only about 20 SNPs are needed to correctly predict an HLA genotype, so HLA genotype information can be obtained at about one tenth of the cost.

Embodiments of the present invention are further described below with reference to the drawings. The examples listed below illustrate the invention and are not intended to limit its scope. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the appended claims.

Figure 1 shows testing accuracy as a function of flanking region size. For each of the six HLA genes (lines of different colors), testing accuracy increases as the flanking region size increases. The Affy 6.0 chip data shown are for non-imputed SNPs.

Figure 2 shows the testing accuracy of the optimized models generated from each genotyping chip. The testing accuracy and call rate (confidence threshold of 0) are shown for the six HLA genes, for each of the three genotyping chips and for the combination of the three chips (bars of different colors), with (A) imputed and (B) non-imputed SNPs.
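The two quantities reported in Figure 2 can be computed as in the sketch below. The definitions are an assumption based on common usage, since the text does not spell them out: the call rate is the fraction of predictions whose confidence reaches the threshold, and the accuracy is the fraction of those called predictions that match the typed HLA result. The labels and confidence values are made up.

```python
from typing import List, Tuple


def accuracy_and_call_rate(
    predictions: List[Tuple[str, float]],  # (predicted label, confidence)
    truths: List[str],                     # laboratory-typed labels, same order
    threshold: float = 0.0,
) -> Tuple[float, float]:
    """Return (accuracy among called predictions, call rate)."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have equal length")
    called = [
        (label, truth)
        for (label, confidence), truth in zip(predictions, truths)
        if confidence >= threshold
    ]
    call_rate = len(called) / len(truths) if truths else 0.0
    accuracy = (
        sum(label == truth for label, truth in called) / len(called)
        if called else 0.0
    )
    return accuracy, call_rate


preds = [("X", 0.9), ("Y", 0.4), ("X", 0.7), ("Z", 0.2)]
truth = ["X", "Y", "Y", "Z"]
print(accuracy_and_call_rate(preds, truth, threshold=0.0))  # (0.75, 1.0)
print(accuracy_and_call_rate(preds, truth, threshold=0.5))  # (0.5, 0.5)
```

With a threshold of 0, as in Figure 2, every prediction is called, so the call rate is 1 and accuracy is measured over all samples.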

Definitions

The terms used in this specification have their ordinary meanings in the art. For convenience, some of the terms discussed in this specification may be highlighted with special formatting, such as italics and/or parentheses. Such formatting does not affect the scope or meaning of a term; the scope and meaning are the same whether or not the term is highlighted. The use of an equivalent term or synonym does not change its meaning, and the use of one or more synonyms does not exclude the use of others. Any terms used in the examples are illustrative only and do not limit the scope or meaning of the invention. Likewise, the scope of the invention is not limited to the examples presented.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the art.

The terms "approximately" and "about" as used herein mean within 20%, preferably within 10%, and more preferably within 5%. Numerical quantities given herein are approximate; where not expressly stated, the term "approximately" or "about" is implied.

Examples

For every SNP identified by rsID in the present invention, the sequence, the position of the single-nucleotide variation, and the variant bases were published in the SNP database (dbSNP) of the National Center for Biotechnology Information (NCBI) before the filing of the present application.

The instruments, apparatus, methods, and related results of the embodiments described below are for illustration only and are not intended to limit the scope of the invention. Titles and subtitles in the examples are for ease of reading only and do not limit the scope of the invention. Furthermore, the theories disclosed herein, whether right or wrong, should not limit the scope of the invention so long as the embodiments can be practiced.

Study design

An estimating equation approach was used to build HLA genotype prediction models from unphased genotypes. For each HLA gene, prediction proceeds in two stages: the first stage constructs a prediction model, and the second stage validates the model produced in the first stage. In the first stage, a set of unphased genotypes is selected to build the prediction model. Candidate selections are evaluated with an objective function, namely the negative log-likelihood of the HLA-allele-specific unphased genotypes (based on the Akaike information criterion). Genotypes are selected by forward selection and backward elimination, starting from one genotype associated with an HLA allele and adding one genotype at a time. The second stage uses an independent set of samples, for which unphased SNP genotypes and phased HLA alleles are available, to validate the first-stage prediction model. Following the parsimony rule, the best prediction model uses the smallest possible flanking region and the fewest possible SNPs to produce the most accurate prediction. The samples used in the present invention are blood samples from 437 Han Chinese individuals residing in Taiwan, obtained from the Taiwan Han Chinese Cell and Genome Bank.
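The first-stage search can be illustrated with a generic stepwise routine. The sketch below alternates a forward-selection step with a backward-elimination pass and minimizes an AIC-style score, taken here as 2k + 2 × (negative log-likelihood) with k the number of selected SNPs. That formula is the standard AIC and is an assumption about the objective, which the text names but does not write out. The likelihood is a made-up additive toy, and `snp_a`, `snp_b`, `snp_c` are placeholders.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Objective = Callable[[FrozenSet[str]], float]


def stepwise_select(candidates: Iterable[str], objective: Objective) -> Tuple[Set[str], float]:
    """Minimize `objective` by alternating forward and backward steps."""
    pool = list(candidates)
    selected: Set[str] = set()
    best = objective(frozenset(selected))
    improved = True
    while improved:
        improved = False
        # Forward step: add the single SNP that lowers the score the most.
        trials = {
            snp: objective(frozenset(selected | {snp}))
            for snp in pool if snp not in selected
        }
        if trials:
            snp = min(trials, key=trials.get)
            if trials[snp] < best:
                selected.add(snp)
                best = trials[snp]
                improved = True
        # Backward pass: drop any SNP whose removal lowers the score.
        for snp in sorted(selected):
            if len(selected) > 1:
                score = objective(frozenset(selected - {snp}))
                if score < best:
                    selected.remove(snp)
                    best = score
                    improved = True
    return selected, best


# Toy objective: each SNP reduces the negative log-likelihood by a fixed amount.
GAIN = {"snp_a": 5.0, "snp_b": 3.0, "snp_c": 0.5}


def toy_aic(subset: FrozenSet[str]) -> float:
    nll = 10.0 - sum(GAIN[snp] for snp in subset)
    return 2 * len(subset) + 2 * nll


chosen, score = stepwise_select(GAIN, toy_aic)
print(sorted(chosen), score)  # ['snp_a', 'snp_b'] 8.0
```

In the toy run `snp_c` is rejected because its likelihood gain does not cover the penalty for an extra parameter, which is the parsimony behaviour described above. Every accepted step strictly lowers the score, so the loop terminates.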

Genotyping

Three commercial chips were used: 1) the Affymetrix Genome-Wide Human SNP Array 5.0 (Affy 5.0); 2) the Affymetrix Genome-Wide Human SNP Array 6.0 (Affy 6.0); and 3) Illumina's HumanHap550 Genotyping BeadChip (Illumina 550K). The 437 leukocyte DNA samples were genotyped with the Affy 5.0 and Illumina 550K chips, and 214 of the 437 samples were also genotyped with the Affy 6.0 chip. Within the human major histocompatibility complex (MHC) region on the short arm of chromosome 6, also called the HLA gene cluster region, the Affy 5.0 chip has 1,406 SNPs, the Affy 6.0 chip has 2,203 SNPs, and the Illumina 550K chip has 1,939 SNPs (see Table 1). The intra-MHC region is bounded by HLA-A (chromosome 6 position 30,018,310-30,021,632; NCBI build 36.3) at the telomeric end and by HLA-DPB1 (chromosome 6 position 33,151,738-33,162,954) at the centromeric end. This region includes the class I loci (HLA-A, HLA-B, HLA-C) and the class II loci (HLA-DRB1, HLA-DQB1). HLA-A, -B, -C, -DQB1, and -DRB1 were typed with the Dynal RELI SSO typing kit (Dynal Biotech Ltd., UK); HLA-DPB1 was typed with the Gold SSP HLA-DPB1 High Resolution kit (Invitrogen, California, USA). All genotyping was performed by the National Center for Genome Medicine at Academia Sinica, and the SNP call rates were all greater than 98%.

For genome-wide association studies (GWASs), the present invention evaluates the usefulness of genotype imputation in constructing the HLA genotype prediction models. For data consistency and the best imputation performance, the present invention uses the MaCH software with the Han Chinese in Beijing (CHB) and Japanese in Tokyo (JPT) data sets as the reference panel, in order to impute genotypes from the International HapMap Project for SNPs beyond those genotyped in the present invention. The present invention examines all SNPs in the MHC region. Before imputation, quality control is applied to evaluate and screen the SNPs: SNPs that seriously violate Hardy-Weinberg equilibrium (p < 10^-4), have a call rate < 0.95, or have a minor-allele frequency < 0.01 are excluded. In addition, each imputed SNP used in the present invention has a posterior probability > 0.8 as reported by MaCH, a call rate > 0.95, and a minor-allele frequency > 0.01.
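
The screening thresholds above can be sketched as two simple predicates. This is only an illustration under stated assumptions: the `SnpStats` record, the function names, and the example SNP identifiers are hypothetical, and the sketch assumes the per-SNP statistics (Hardy-Weinberg p-value, call rate, minor-allele frequency, MaCH posterior probability) have already been computed elsewhere. Only the numeric cut-offs come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical record, not from the patent)."""
    rsid: str
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    call_rate: float  # fraction of samples with a called genotype
    maf: float        # minor-allele frequency


def passes_pre_imputation_qc(snp: SnpStats) -> bool:
    """Keep a genotyped SNP unless it hits any exclusion rule in the text:
    HWE p < 1e-4, call rate < 0.95, or minor-allele frequency < 0.01."""
    return snp.hwe_p >= 1e-4 and snp.call_rate >= 0.95 and snp.maf >= 0.01


def passes_post_imputation_qc(posterior: float, call_rate: float, maf: float) -> bool:
    """Keep an imputed SNP only if posterior > 0.8, call rate > 0.95, MAF > 0.01."""
    return posterior > 0.8 and call_rate > 0.95 and maf > 0.01


# Made-up example values; the identifiers are placeholders, not real rsIDs.
candidates = [
    SnpStats("snp_ok", hwe_p=0.40, call_rate=0.99, maf=0.12),
    SnpStats("snp_hwe_fail", hwe_p=1e-6, call_rate=0.99, maf=0.12),
    SnpStats("snp_rare", hwe_p=0.40, call_rate=0.99, maf=0.005),
]
kept = [s.rsid for s in candidates if passes_pre_imputation_qc(s)]
```

Treating the three exclusion criteria as alternatives (any one of them removes the SNP) is the usual reading of such a filter and is assumed here.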

In addition, to test the reproducibility and concordance of SNP genotypes between chips, the present invention compares the SNP data shared by each pair of chips. Concordance of the genotype data is measured with Cohen's kappa coefficient; a kappa value greater than 0.9 generally indicates high concordance between the data of two chips. The present invention also compares, for each pair of chips, the genotypes selected while constructing the HLA genotype prediction models, to determine whether the selected genotypes are chip-specific, that is, unique to a chip. The difference is defined in terms of $\cup(\mathit{plat}_i, \mathit{plat}_j)$ and $\cap(\mathit{plat}_i, \mathit{plat}_j)$, where $\mathit{plat}_i$ and $\mathit{plat}_j$ are two different chips, $\cup(\mathit{plat}_i, \mathit{plat}_j)$ is the union of the SNPs of the two chips, and $\cap(\mathit{plat}_i, \mathit{plat}_j)$ is the intersection of the SNPs of the two chips.
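
As an illustration of the concordance measure, the following sketch computes Cohen's kappa for two lists of genotype calls made on the same SNPs and samples. The second function is an assumption: the text states only that the difference is built from the union and intersection of two chips' SNP sets, so the fraction of the union not shared by both chips is shown as one plausible reading, not as the definition used in the invention. All names and example values are hypothetical.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of positions where both chips agree.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from each chip's marginal genotype frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:  # both chips always give the same single genotype
        return 1.0
    return (observed - expected) / (1.0 - expected)


def non_shared_fraction(snps_i, snps_j):
    """ASSUMED form of the difference: (|union| - |intersection|) / |union|."""
    union = set(snps_i) | set(snps_j)
    if not union:
        raise ValueError("both SNP sets are empty")
    shared = set(snps_i) & set(snps_j)
    return (len(union) - len(shared)) / len(union)


# Made-up genotype calls for four sample/SNP positions.
kappa_same = cohens_kappa(["AA", "AB", "BB", "AA"], ["AA", "AB", "BB", "AA"])
kappa_chance = cohens_kappa(["AA", "AA", "BB", "BB"], ["AA", "BB", "AA", "BB"])
diff = non_shared_fraction({"s1", "s2", "s3"}, {"s2", "s3", "s4"})
```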

Differences in HLA allele frequency distribution and flanking region size among populations

HLA alleles and their distributions differ substantially among populations, reflecting the recent evolutionary history of those populations. Moreover, the HLA genes lie in different regions of chromosome 6, each spanning several SNPs. The present invention examines the allele frequency distributions in the Asian samples and in the Caucasian samples of the International HapMap Project. For each HLA gene, the chi-square test and Fisher's exact test are used to determine whether the HLA alleles differ between the two populations. The present invention constructs the HLA genotype prediction models with flanking regions extended from ±10 kb to ±400 kb. In the Han Chinese, the most suitable flanking region for each HLA gene is determined by the parsimony rule described above. In addition, the present invention compares the flanking region sizes for Asians (Affy 5.0 chip) with those previously reported for Caucasians.
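
The population comparison can be illustrated with a plain chi-square statistic on a table of allele counts, one row per population and one column per allele. This is a minimal sketch, not the analysis code of the invention: the function name and the counts are made up, every column is assumed to contain at least one observation, and the p-value lookup (and Fisher's exact test, which suits sparse tables) is left to a statistics library. With two populations, the degrees of freedom equal the number of distinct alleles minus one.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    `table` is a list of rows (populations); each row lists the counts of
    every allele in the same column order.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if allele frequencies were equal in all populations.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up counts for two alleles in two populations.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
# Thirty alleles compared across two populations give 29 degrees of freedom.
_, dof_30_alleles = chi_square_statistic([[1] * 30, [2] * 30])
```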

Cross-validation

Before the HLA prediction analysis begins, the present invention divides the data into multiple groups for cross-validation (CV). Taking 10-fold cross-validation as an example, the data are split into a training data set (9/10 of the data) and a testing data set (1/10 of the data). For each cross-validation subset, the accuracy on the test set is calculated as $T_v / N_v$, where $T_v$ is the number of correctly predicted samples in the test set and $N_v$ is the total number of samples in the test set. The mean test accuracy is the average over the 10 cross-validation subsets and represents the performance of the constructed model in predicting HLA alleles. HLA prediction can be performed without cross-validation; however, cross-validation guards against over-fitting of the prediction model and saves the time and cost of obtaining an independent sample set for evaluation. Ten-fold cross-validation is therefore used in constructing the HLA genotype prediction models of the present invention.
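
The fold accuracy and its average can be sketched as follows. The helper names, the shuffling seed, and the `fit`/`predict` callables are hypothetical stand-ins for whatever prediction model is used; the sketch assumes at least as many samples as folds so that no test fold is empty.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_test_accuracy(samples, labels, fit, predict, k=10, seed=0):
    """Average of the per-fold accuracies T_v / N_v over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        # T_v: correctly predicted test samples; N_v: size of the test fold.
        t_v = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(t_v / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Toy check with a "model" that always recovers the label from the sample.
toy_samples = list(range(20))
toy_labels = [s % 2 for s in toy_samples]
toy_accuracy = mean_test_accuracy(
    toy_samples, toy_labels,
    fit=lambda xs, ys: None,
    predict=lambda model, x: x % 2,
)
folds_214 = k_fold_indices(214, k=10)
```

With 214 samples and ten folds, four folds hold 22 samples and six hold 21, so each model is trained on roughly nine tenths of the data.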

Confidence threshold

For each sample in the test set, a probability is assigned to every possible HLA allele pair for a particular haplotype. These values are based on the supplied ambiguous (unphased) SNP genotypes and the unambiguous HLA allele pairs. After the probabilities are assigned, the HLA allele pair with the largest probability is selected if that probability exceeds a pre-specified confidence threshold. Typically the confidence threshold is set to 0, which gives a call rate of 100% (that is, every sample receives a prediction). If the confidence threshold is set to 0.5 (or any value greater than 0), only samples whose maximum predicted probability exceeds the threshold are used. The present invention sets the confidence threshold to 0, 0.5, or 0.9 to evaluate its effect on constructing the HLA genotype prediction models.
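
The thresholding rule trades call rate against accuracy, which the following sketch makes concrete. The function names and the probabilities are invented for illustration, and the allele-pair labels are only example strings; the sketch assumes each sample already has a probability for every candidate allele pair.

```python
def call_allele_pair(pair_probs, threshold=0.0):
    """Return the most probable allele pair, or None if it does not exceed
    the confidence threshold (the sample is then left uncalled)."""
    best_pair = max(pair_probs, key=pair_probs.get)
    return best_pair if pair_probs[best_pair] > threshold else None


def call_rate_and_accuracy(predictions, truths, threshold):
    """Call rate over all samples and accuracy over the called samples only."""
    calls = [call_allele_pair(p, threshold) for p in predictions]
    called = [(c, t) for c, t in zip(calls, truths) if c is not None]
    call_rate = len(called) / len(predictions)
    accuracy = (sum(c == t for c, t in called) / len(called)
                if called else float("nan"))
    return call_rate, accuracy


# Illustrative labels and made-up probabilities for three samples.
P1 = ("A*02:01", "A*11:01")
P2 = ("A*02:01", "A*24:02")
predictions = [
    {P1: 0.95, P2: 0.05},
    {P1: 0.60, P2: 0.40},
    {P1: 0.45, P2: 0.55},
]
truths = [P1, P1, P1]
at_0 = call_rate_and_accuracy(predictions, truths, 0.0)
at_09 = call_rate_and_accuracy(predictions, truths, 0.9)
```

In this toy case raising the threshold from 0 to 0.9 drops the call rate from 1 to 1/3 while the accuracy among called samples rises from 2/3 to 1, the same kind of trade-off the threshold is meant to expose.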

Results

Using the 214 samples genotyped on all three chips (Affy 5.0, Affy 6.0, and Illumina 550K), the present invention calculates the allele frequency distributions of the six classical HLA genes (HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1, HLA-DPB1). The present invention also analyzes 180 Caucasian samples from the International HapMap Project, for which no HLA-DPB1 data are available. HLA-B is the locus with the largest number of alleles: 44 alleles were observed at HLA-B in the Han Chinese, and 32 alleles were observed at HLA-B in the Caucasians of the International HapMap Project. As shown by the chi-square test and Fisher's exact test, the allele frequency distributions of HLA-A, -B, -C, -DQB1, and -DRB1 differ significantly between Caucasians and Han Chinese (all p-values < 0.0001; the degrees of freedom for HLA-A, -B, -C, -DQB1, and -DRB1 are 29, 62, 23, 16, and 35, respectively). HLA allele frequency distributions therefore differ greatly between populations; that is, an HLA genotype prediction model built from the HLA alleles of one population predicts poorly when applied to a different population.

Different chips without imputation

Using only a single genotyping technology may bias the prediction of HLA alleles. To overcome this problem, the 214 Han Chinese samples from Taiwan were genotyped on three chips (Affy 5.0, Affy 6.0, and Illumina 550K). The results of each chip, and of the union of the three chips (Union), were used to construct models for HLA prediction. Finally, the present invention evaluates whether the prediction models derived from these data sets yield comparable predictions.

There is little overlap of data between any pair of chips (as shown in Table 1). Affy 6.0 has the most SNPs in the human MHC region and Affy 5.0 the fewest (as shown in Table 1). Table 2 shows the concordance coefficients of the observed genotypes between pairs of chips. For the two Affymetrix arrays, the concordance coefficient of the genotypes present on both arrays is as high as 0.9926. This high degree of concordance indicates high-quality genotyping, which is further supported by the genotype comparisons between the other chips.

Table 1. The International HapMap Project and the three genotyping chips in the extended MHC

In general, the Union data set yields more accurate HLA allele predictions than any individual chip. With a confidence threshold of 0, the mean test accuracy of Union is 89.78%, whereas the mean test accuracies of Affy 5.0, Affy 6.0, and Illumina 550K alone are only 86.92%, 88.42%, and 88.06%, respectively (as shown in FIG. 2A), indicating that a higher SNP density improves the accuracy of HLA allele prediction.

Among the three genotyping chips, Affy 6.0 yields the most accurate HLA allele predictions. For example, Affy 6.0 is 3.52% more accurate than Affy 5.0 at the HLA-DRB1 locus and 2.58% more accurate than Illumina 550K at the HLA-DPB1 locus. This may be because Affy 6.0 has the highest genotype density in the human MHC region. With a confidence threshold of 0, the highest test accuracy is obtained with Affy 6.0 at HLA-DQB1 (95.79%) and the lowest with Illumina 550K at HLA-B (80.37%, as shown in FIG. 2A). Applying a confidence threshold of 0.9 to the maximum probability over all possible HLA allele pairs raises the highest accuracy, at the HLA-C locus, to 98.62% (obtained with Illumina 550K; call rate 77.47%) and raises the lowest accuracy, at HLA-B, to 87.67% (obtained with Affy 5.0; call rate 64.94%). The accuracies of the Illumina 550K-based prediction models vary more widely across HLA loci than those observed for the other genotyping chips: with a confidence threshold of 0, the Illumina 550K model is only 80.37% accurate for HLA-B alleles but 95.29% accurate for HLA-DQB1 alleles. For HLA-B and HLA-DPB1 alleles, Affy 5.0 predictions are 0.45% and 0.96% more accurate, respectively, than those of Illumina 550K. For HLA-A and HLA-DRB1 alleles, Illumina 550K predictions are 1.56% and 0.27% more accurate, respectively, than those of Affy 6.0 (as shown in FIG. 2). These results suggest that the slight advantages of Affy 5.0 and Illumina 550K may derive from SNPs specific to those chips. Overall, the accuracies of these prediction models are comparable across genotyping chips.

The effective flanking region (that is, the shortest flanking-sequence extension that yields the most accurate HLA allele prediction) was then examined. The shortest effective flanking region was identified at the HLA-C locus with Illumina 550K (±10 kb). HLA-C is 3,325 bp long, and this effective flanking region covers 22 SNPs, 13 of which are included in the HLA-C prediction model (test accuracy 92.01% at a confidence threshold of 0). With the Affy 6.0 data, the longest effective flanking region is ±350 kb, at HLA-A (as shown in Table 3); this region contains 299 SNPs, 16 of which are included in the HLA-A prediction model (test accuracy 85.29% at a confidence threshold of 0).
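
The notion of an effective flanking region can be sketched in two steps: collect the SNPs that fall within a given extension on either side of the gene, then pick the shortest extension that reaches the best accuracy. The function names, SNP identifiers, SNP positions, and accuracy values below are hypothetical; only the HLA-A gene coordinates are taken from the text (NCBI build 36.3), and the accuracies per window are assumed to have been measured beforehand by cross-validation.

```python
def snps_in_flank(snp_positions, gene_start, gene_end, flank_bp):
    """Names of SNPs located in [gene_start - flank_bp, gene_end + flank_bp]."""
    low, high = gene_start - flank_bp, gene_end + flank_bp
    return [name for name, pos in snp_positions.items() if low <= pos <= high]


def shortest_best_flank(accuracy_by_flank):
    """Shortest flank size (bp) whose test accuracy equals the best observed."""
    best = max(accuracy_by_flank.values())
    return min(f for f, acc in accuracy_by_flank.items() if acc == best)


HLA_A_START, HLA_A_END = 30_018_310, 30_021_632  # coordinates from the text

# Placeholder SNP names and made-up positions on chromosome 6.
snp_positions = {
    "snp_a": 30_010_000,  # 8,310 bp before the gene start
    "snp_b": 30_020_000,  # inside the gene
    "snp_c": 30_040_000,  # 18,368 bp after the gene end
    "snp_d": 31_000_000,  # far outside even a 400 kb flank
}
within_10kb = snps_in_flank(snp_positions, HLA_A_START, HLA_A_END, 10_000)
within_20kb = snps_in_flank(snp_positions, HLA_A_START, HLA_A_END, 20_000)

# Made-up accuracies for three candidate windows; the 50 kb and 100 kb
# windows tie, so the shorter one is the effective flanking region.
effective = shortest_best_flank({10_000: 0.85, 50_000: 0.88, 100_000: 0.88})
```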

For each HLA gene, the present invention further evaluates the overlap between chips of the genotypes included in the HLA genotype prediction models. Without imputed SNP data, the largest proportion of overlapping genotypes is 21.36%, observed for HLA-DRB1 when comparing Affy 6.0 with Union. This indicates that the chips draw on their own unique SNPs, so that chip-specific genotypes are selected and used to construct different HLA genotype prediction models.

Different chips with imputation

The Union data set of the present invention (mean test accuracy 90.17% at a confidence threshold of 0) yields more accurate HLA allele predictions than the three individual chips (mean test accuracies of 89.90%, 88.61%, and 89.75% for Affy 6.0, Affy 5.0, and Illumina 550K, respectively, at a confidence threshold of 0). A higher SNP density can increase the accuracy of genotype imputation and thereby the accuracy of the final prediction.

Among the three genotyping chips with imputation, Affy 6.0 generally predicts HLA alleles more accurately (at the HLA-DPB1 locus it is up to 4.23% more accurate than Affy 5.0 and up to 4.61% more accurate than Illumina 550K; as shown in FIG. 2B). Among these models (confidence threshold of 0), the HLA-DQB1 locus has the highest test accuracy (96.75%, obtained with Illumina 550K). Applying a confidence threshold of 0.9 to the maximum probability over all possible HLA allele pairs raises the highest accuracy, at the HLA-C locus, to 99.09% (obtained with Illumina 550K; call rate 77.47%). Apart from HLA-B with Affy 5.0, the gain in accuracy from this threshold adjustment is most pronounced for HLA-DRB1 alleles, whose accuracy rises from 86.67% to 95.90% when the confidence threshold is changed from 0 to 0.9. At the HLA-A locus, however, Affy 5.0 (confidence threshold of 0) yields predictions 0.45% more accurate than those of Affy 6.0. Except at HLA-DPB1, where Affy 6.0 is 4.61% more accurate, Illumina 550K predictions are more accurate than those of Affy 6.0 at the HLA-A, -B, -C, -DQB1, and -DRB1 loci by 0.27%, 1.10%, 1.11%, 0.05%, and 1.19%, respectively (as shown in FIG. 2). These results suggest that the predictive advantages of Affy 5.0 and Illumina 550K may derive from SNPs unique to those chips.

The present invention also evaluates the effective flanking region of each HLA locus with imputation. One of the shortest flanking regions is at the HLA-DPB1 locus (±20 kb), identified with Affy 5.0 (as shown in Table 3). This region covers 133 SNPs, 34 of which are selected for the HLA-DPB1 prediction model (test accuracy 88.28% at a confidence threshold of 0). Another of the shortest flanking regions is at the HLA-C locus (±20 kb), identified with Affy 6.0, Illumina 550K, and Union. The longest effective flanking region is at HLA-A (±200 kb), obtained with Illumina 550K (as shown in Table 3). This region contains 515 SNPs, 17 of which are used in the HLA-A prediction model (test accuracy 86.93% at a confidence threshold of 0).

For each HLA gene, the overlap of genotypes used by the different prediction models is at most 60.08%. Imputation therefore appears to reduce the differences between chips.

Imputed versus non-imputed data across chips

Comparing the test accuracies of the prediction models of the present invention with and without imputation across the chips, the models built with imputed SNPs are more accurate than those built without imputed SNPs (mean accuracies of 89.61% and 88.30%, respectively, at a confidence threshold of 0).

At a confidence threshold of 0, the imputed Union data set has the highest test accuracy, for HLA-DQB1 alleles (97.18%), and the non-imputed Illumina 550K has the lowest, for HLA-B alleles (80.37%). Applying a confidence threshold of 0.9 to the maximum probability over all possible HLA allele pairs raises the highest accuracy, at the HLA-C locus with Illumina 550K, to 99.09% (with imputation; call rate 82.40%).

Comparing the test accuracies with and without imputation across the chips, chip-specific genotype variation generally decreases when imputed SNPs are used to construct the HLA genotype prediction models. For each HLA gene, imputation raises the proportion of overlap among the genotypes selected for model construction on different chips by 25.02% on average. These results indicate that imputation can minimize the differences between genotyping chips.

Example 1

Based on the above results, a method for predicting HLA alleles can be developed, comprising the steps of: (a) providing a human nucleic acid sample; (b) identifying the genotypes of a set of SNPs in the human nucleic acid sample, the SNP set comprising distinct SNPs located at an HLA gene; (c) analyzing the genotype of each SNP from step (b) with a prediction model to obtain a calculated value, wherein the prediction model uses SNP genotypes to predict HLA alleles; and (d) predicting the HLA allele genotype of the human sample from the calculated value obtained in step (c).

The present invention uses SNPs specific to the Han Chinese population, together with an algorithm, to predict the allele genotype at each of the six HLA loci. The optimal number of SNPs differs between chips, as shown in Table 4.

The contents of the SNP sets for the three chips and for Union are shown in Table 5. The present invention uses these population-specific SNP sets to predict the alleles at the HLA loci of Han Chinese.

Discussion

Because conventional direct typing of HLA genotypes is not cost-effective, the present invention identifies specific HLA genotypes from the ambiguous SNP genotypes corresponding to the HLA alleles, and on that basis constructs prediction models for the class I HLA genes (HLA-A, HLA-B, and HLA-C) and the class II HLA genes (HLA-DRB1, HLA-DQB1, and HLA-DPB1). The present invention compares the allele frequency distributions of Asians (Han Chinese in Taiwan) and Caucasians (International HapMap Project) and identifies significant differences between these populations. The present invention constructs several HLA genotype prediction models with high prediction accuracy and examines the important parameters of those models (for example, the effective flanking region, the accuracy of each chip, and the effect of imputation). The models provided by the present invention can therefore accurately predict these genotypes in Asian populations and can be applied to detailed analysis of the direct effects of HLA in HLA-associated diseases.

The present invention determines whether a denser SNP set yields more accurate HLA allele predictions. In the present invention, the MaCH software used to impute the chip data and build a denser SNP set takes the data of the International HapMap Project and/or the 1000 Genomes Project (http://www.1000genomes.org) as reference. The HLA genotype prediction models built with imputation typically give higher prediction accuracy, which underscores the benefit of a higher SNP density. A novel customized SNP array can therefore be constructed that includes all of the genotyped or imputed SNPs making up the HLA genotype prediction models, in order to improve prediction accuracy.

By raising the confidence threshold to 0.5 or 0.9, the prediction accuracy of the Asian-specific HLA genotype prediction models of the present invention approaches 100%.

In addition, to produce more accurate prediction models, the present invention varies the sample sizes of the training and test sets. The Affy 6.0 genotype data of the present invention are used to construct HLA genotype prediction models with two-, four-, and ten-fold cross-validation. When the confidence threshold is 0 (that is, the call rate is 100%), the different cross-validation schemes evaluate the same number of samples; to exclude the effect of sample size, the present invention therefore compares the cross-validation schemes at a confidence threshold of 0. Test accuracy estimates are obtained from the evaluation of the prediction models built with two-, four-, and ten-fold cross-validation (confidence threshold of 0). The HLA-DQB1 locus under ten-fold cross-validation has the best test accuracy (95.55%), and the HLA-B locus under two-fold cross-validation has the lowest (76.64%). Under two-fold cross-validation, the test accuracy ranges from 76.64% (HLA-B) to 94.39% (HLA-DQB1). As the number of folds increases to ten, the test accuracy approaches 95.55% (HLA-DQB1). The improvement is most pronounced for HLA-A: the prediction model is 80.84% accurate under two-fold cross-validation and 85.29% accurate under ten-fold cross-validation. Similar trends are observed at the other five HLA loci. These trends may reflect the larger training sample size, which brings a sufficient number of HLA alleles into the training set and thereby improves prediction accuracy. Although the number of folds affects the test accuracy, the change is very small; the population-specific HLA genotype prediction models of the present invention are therefore essentially unaffected by the choice of cross-validation scheme.

The present invention focuses on generating HLA genotype prediction models from 214 Han Chinese samples from Taiwan, each genotyped for SNPs on the Affy 5.0, Affy 6.0, and Illumina 550K chips. To evaluate the effect of sample size, the present invention also generates HLA genotype prediction models from 437 Han Chinese samples (including the original 214) genotyped on the Illumina 550K chip. The models built from 437 samples (mean test accuracy 90.36%) outperform those built from 214 samples (mean test accuracy 86.84%). A larger sample size can therefore increase the accuracy of HLA prediction.

Conclusion

After a decade of research, many HLA alleles are known to have specific immune functions. Experimental approaches that link SNPs to HLA alleles save considerable time and cost compared with direct HLA typing, and make large-scale studies of HLA variation feasible. Although HLA distributions differ among human populations, most existing HLA genotype prediction models are based on Caucasian samples. By genotyping a large number of Han Chinese samples, the present invention identifies many HLA alleles specific to Han Chinese and constructs population-specific HLA genotype prediction models. The training set of the present invention covers many uncommon and population-specific alleles at the HLA loci, which substantially increases prediction accuracy.

The specific method parameters used in the present invention (e.g., sample size, SNP density, and imputation) are factors in producing HLA genotype prediction models for Asian populations. The invention achieves good prediction accuracy for HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 alleles in Han Chinese samples. Using SNP data from the Affymetrix Genome-Wide Human SNP Array 5.0, the Affymetrix Genome-Wide Human SNP Array 6.0, the Illumina HumanHap550 BeadChip, or a combination of the three arrays, the invention produces effective HLA genotype prediction models for determining the HLA genotypes of Asian individuals. The novel prediction tool of the invention can help identify genetic risk factors for immune-related diseases such as Graves' disease. It also enables persons of ordinary skill in the art to study HLA genotypes across a wide range of ethnic populations.
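The general idea of predicting an HLA genotype from unphased SNP genotypes can be sketched in a few lines of Python. The sketch below is only a toy nearest-neighbour lookup and is not the prediction model of the invention, whose algorithm is not described in this section. The four SNP identifiers are the first four of the 1st SNP set in the claims; the genotype codes, the reference panel, and the HLA-A genotypes paired with them are invented purely for illustration.

```python
from collections import Counter

# First four SNPs of the 1st SNP set (HLA-A) listed in the claims.
SNPS = ["rs1633085", "rs2254071", "rs407238", "rs9258881"]

# Toy reference panel. Every value below is INVENTED for illustration:
# each entry pairs unphased SNP genotype codes (0, 1 or 2 copies of an
# arbitrarily chosen allele at each SNP) with a two-allele HLA-A genotype.
REFERENCE = [
    ((0, 1, 2, 0), ("A*02:01", "A*11:01")),
    ((0, 1, 2, 1), ("A*02:01", "A*11:01")),
    ((2, 0, 0, 1), ("A*24:02", "A*24:02")),
    ((1, 0, 1, 1), ("A*11:01", "A*24:02")),
]


def distance(a, b):
    """Sum of absolute differences between two genotype-code vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_hla(snp_genotype, reference=REFERENCE, k=1):
    """Return the HLA genotype most common among the k nearest reference
    samples (hypothetical helper, not the patented model)."""
    if len(snp_genotype) != len(SNPS):
        raise ValueError("expected one genotype code per SNP")
    ranked = sorted(reference, key=lambda item: distance(snp_genotype, item[0]))
    votes = Counter(hla for _, hla in ranked[:k])
    return votes.most_common(1)[0][0]


print(predict_hla((0, 1, 2, 0)))
```

Because the input is a vector of allele counts per SNP, no phasing of the SNP genotypes is required, which is the property the sketch is meant to highlight. A real model would be trained on a panel of samples with both SNP genotypes and directly typed HLA alleles.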

The prediction method and its application device provided by the present invention have genuine industrial utility. The foregoing description, however, covers only preferred embodiments of the invention; those skilled in the art may make various other modifications based on it, and such changes remain within the spirit of the invention and the scope of the claims defined below.

Claims (5)

1. A method for predicting a human leukocyte antigen (HLA) genotype, comprising the steps of:
(a) providing a human nucleic acid sample;
(b) determining the genotype of a single-nucleotide polymorphism (SNP) set of the human nucleic acid sample, the SNP set comprising distinct SNPs located at a human leukocyte antigen gene; and
(c) predicting the HLA allele genotype of the human nucleic acid sample based on the result determined in step (b);
wherein the predictable HLA allele genotype includes the genotype of (1) the HLA-A gene, (2) the HLA-B gene, (3) the HLA-C gene, (4) the HLA-DPB1 gene, (5) the HLA-DQB1 gene, or (6) the HLA-DRB1 gene;

wherein the SNPs for predicting (1) the HLA-A genotype are selected from the group consisting of a 1st SNP set, a 2nd SNP set, a 3rd SNP set, and a 4th SNP set, wherein:
(i) the 1st SNP set includes: rs1633085, rs2254071, rs407238, rs9258881, rs2975046, rs2735096, rs417162, rs9260954, rs6917477, rs6457144, rs9261394, and rs2523990;
(ii) the 2nd SNP set includes: rs4122198, rs16895757, rs1632973, rs9357086, rs11759549, rs3115628, rs3094165, rs2734925, rs2517755, rs2256919, rs11756025, rs7382061, rs6457144, rs2517646, and rs7744914;
(iii) the 3rd SNP set includes: rs3094165, rs9258883, rs3132714, rs1611493, rs2524005, rs2860580, rs12665039, rs6457109, rs3869062, rs3893464, rs5009448, rs2571375, rs7758512, and rs9261394;
(iv) the 4th SNP set includes: rs2523409, rs1611133, rs3115628, rs2517859, rs1611732, rs2523998, rs2860580, rs12202296, rs2248153, rs2975046, rs6457109, rs5009448, rs9260932, and rs6457144;

wherein the SNPs for predicting (2) the HLA-B genotype are selected from the group consisting of a 5th SNP set, a 6th SNP set, a 7th SNP set, and an 8th SNP set, wherein:
(i) the 5th SNP set includes: rs3130944, rs3130532, rs3130534, rs3134762, rs16899207, rs2524089, rs9366778, rs2524166, rs9295984, rs4394275, rs9378249, rs2523534, rs9266406, rs2844558, rs5022119, rs3099848, rs4081552, rs2848716, rs2596454, and rs2248462;
(ii) the 6th SNP set includes: rs11966319, rs2853948, rs6906846, rs9378228, rs2524051, rs9366778, rs16867947, rs4394274, rs4394275, rs2523591, rs9501572, rs7761068, rs2523535, rs9266406, rs5006724, rs13198903, rs9266669, rs9266689, rs3099849, rs2442749, rs1051796, rs2596464, rs3099836, and rs3131622;
(iii) the 7th SNP set includes: rs9264868, rs9264942, rs3094691, rs2156875, rs2523619, rs2442719, rs2596501, rs2523589, rs2523554, rs2844573, rs9266395, rs9266440, rs9295986, rs2442749, rs2596560, rs3128982, rs2284178, and rs7758090;
(iv) the 8th SNP set includes: rs3094691, rs7453967, rs4394274, rs4394275, rs2596509, rs2596501, rs1058026, rs2523591, rs2523589, rs2523554, rs2523545, rs9501572, rs2844575, rs9266395, rs9266406, rs5006725, rs9295986, rs6933050, rs4959068, rs5022119, rs13198903, rs9266689, rs2251396, rs1051796, rs3094584, rs9765960, and rs3128982;

wherein the SNPs for predicting (3) the HLA-C genotype are selected from the group consisting of a 9th SNP set, a 10th SNP set, an 11th SNP set, and a 12th SNP set, wherein:
(i) the 9th SNP set includes: rs2073724, rs3130713, rs3130531, rs3095250, rs3130532, rs3130534, rs2844615, rs6906846, rs2524067, rs7382297, rs2394963, rs2524095, rs16899203, rs9366778, rs9295970, and rs2523534;
(ii) the 10th SNP set includes: rs3130712, rs28480108, rs3134762, rs19966319, rs9264523, rs3132488, rs3134745, rs3130693, rs3132486, rs2853948, rs6906846, rs9378228, rs6457372, rs2394963, rs2524057, rs12191877, and rs9366776;
(iii) the 11th SNP set includes: rs2516049, rs2858870, rs660895, rs532098, rs3129763, rs1063355, rs9275141, rs9275184, rs7774434, rs7775228, and rs9275224;
(iv) the 12th SNP set includes: rs9263957, rs9263969, rs3134762, rs11966319, rs2248880, rs9264532, rs2524099, rs2074488, rs2395471, rs5010528, rs13207315, rs3132488, rs3130693, rs9391714, rs4386816, rs2524057, rs16899205, and rs9295970;

wherein the SNPs for predicting (4) the HLA-DPB1 genotype are selected from the group consisting of a 13th SNP set, a 14th SNP set, a 15th SNP set, and a 16th SNP set, wherein:
(i) the 13th SNP set includes: rs3128955, rs3130588, rs9277194, rs9348904, rs9296073, rs2856816, rs3135021, rs1431403, rs3128963, rs3117229, rs7763822, rs2295120, rs3117242, rs6937034, and rs1003979;
(ii) the 14th SNP set includes: rs9296068, rs9277183, rs3135402, rs9348904, rs2856830, rs9296073, rs2071350, rs1431402, rs1431403, rs9277550, rs3128963, rs3117229, rs9277567, rs3128918, and rs6937034;
(iii) the 15th SNP set includes: rs206769, rs6920606, rs375912, rs1431399, rs987870, rs3135021, rs9277535, rs9277554, rs10484569, rs2281390, rs3128917, rs2281388, rs3130215, and rs2269346;
(iv) the 16th SNP set includes: rs2216264, rs423639, rs3097669, rs987870, rs1431402, rs1431403, rs9277378, rs9277535, rs9277550, rs9277554, rs9277565, rs2281390, rs2281388, rs3130215, rs6937034, rs6937061, and rs2395357;

wherein the SNPs for predicting (5) the HLA-DQB1 genotype are selected from the group consisting of a 17th SNP set, an 18th SNP set, a 19th SNP set, and a 20th SNP set, wherein:
(i) the 17th SNP set includes: rs9269186, rs9270986, rs615672, rs3129768, rs9272219, rs9272346, rs6908943, rs9275134, rs9469220, rs6457617, rs2647046, rs2858308, and rs9275418;
(ii) the 18th SNP set includes: rs2647073, rs502055, rs3129768, rs9272535, rs9272723, rs34485459, rs3129716, rs7775228, rs6469219, rs5000634, rs6457617, and rs9275418;
(iii) the 19th SNP set includes: rs2516049, rs2858870, rs660895, rs532098, rs3129763, rs1063355, rs9275141, rs9275184, rs7774434, rs7775228, and rs9275224;
(iv) the 20th SNP set includes: rs17533090, rs9272219, rs17211510, rs41269947, rs34485459, rs1063355, rs9275141, rs3129716, rs7774434, rs9405119, rs9469219, rs9469220, and rs9275224;

wherein the SNPs for predicting (6) the HLA-DRB1 genotype are selected from the group consisting of a 21st SNP set, a 22nd SNP set, a 23rd SNP set, and a 24th SNP set, wherein:
(i) the 21st SNP set includes: rs9268831, rs9268861, rs7747521, rs9268877, rs9269186, rs2027852, rs615672, rs3129768, rs9272219, rs9272346, rs9275134, rs7775228, rs9469220, rs6457617, rs2647046, and rs2858308;
(ii) the 22nd SNP set includes: rs9268877, rs4410767, rs7749092, rs17210980, rs2647073, rs615672, rs674343, rs502771, rs3997872, rs9271367, rs9271720, rs2187668, rs34485459, rs3129716, and rs9405119;
(iii) the 23rd SNP set includes: rs9405098, rs3129871, rs13209234, rs9268832, rs6903608, rs602875, rs660895, rs9271366, rs3129769, rs17211510, rs2187668, rs9275141, rs9275184, rs9275383, rs2856717, rs2858305, rs13192471, and rs3104405;
(iv) the 24th SNP set includes: rs2395175, rs9405035, rs9268831, rs6903608, rs9268877, rs9269186, rs7749092, rs2027852, rs17210980, rs2516049, rs615672, rs660895, rs674313, rs502771, rs3997872, rs9271366, rs2187668, rs34485459, rs9275141, rs7755224, rs3129716, and rs3104404.

2. The method of claim 1, wherein the human nucleic acid sample is from an Asian population.

3. The method of claim 1, wherein the human nucleic acid sample is from a Han Chinese population.

4. A device for predicting human leukocyte antigen alleles, comprising no more than 200 nucleotide probes, wherein the probes can detect the single-nucleotide polymorphisms recited in claim 1.

5. The device of claim 4, wherein the probes are immobilized on the device.
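As a rough illustration of the 200-probe limit in claim 4, the Python sketch below pools three of the claimed SNP sets (the 1st, 17th, and 21st, copied from claim 1), removes SNPs shared between sets, and checks the resulting panel size against the limit. The assumption of one probe per distinct SNP, the choice of these three sets, and the helper names are illustrative only and are not stated in the claims.

```python
MAX_PROBES = 200  # upper bound on nucleotide probes stated in claim 4

# Three SNP sets copied from claim 1; the selection is arbitrary.
SELECTED_SETS = {
    "HLA-A (1st set)": [
        "rs1633085", "rs2254071", "rs407238", "rs9258881", "rs2975046",
        "rs2735096", "rs417162", "rs9260954", "rs6917477", "rs6457144",
        "rs9261394", "rs2523990",
    ],
    "HLA-DQB1 (17th set)": [
        "rs9269186", "rs9270986", "rs615672", "rs3129768", "rs9272219",
        "rs9272346", "rs6908943", "rs9275134", "rs9469220", "rs6457617",
        "rs2647046", "rs2858308", "rs9275418",
    ],
    "HLA-DRB1 (21st set)": [
        "rs9268831", "rs9268861", "rs7747521", "rs9268877", "rs9269186",
        "rs2027852", "rs615672", "rs3129768", "rs9272219", "rs9272346",
        "rs9275134", "rs7775228", "rs9469220", "rs6457617", "rs2647046",
        "rs2858308",
    ],
}


def probe_panel(selected_sets):
    """Return the sorted list of distinct SNPs across the chosen sets.

    Assumes one probe per distinct SNP (an illustrative simplification).
    """
    unique = set()
    for snps in selected_sets.values():
        unique.update(snps)
    return sorted(unique)


def within_probe_limit(selected_sets, max_probes=MAX_PROBES):
    """True if the pooled panel needs no more than max_probes probes."""
    return len(probe_panel(selected_sets)) <= max_probes


panel = probe_panel(SELECTED_SETS)
print(len(panel), within_probe_limit(SELECTED_SETS))
```

Because neighbouring genes such as HLA-DQB1 and HLA-DRB1 share several SNPs across their sets, the pooled panel is smaller than the sum of the individual set sizes.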
TW103114074A 2013-04-17 2014-04-17 Predicting hla genotypes using unphased and flanking single-nucleotide polymorphisms in han chinese population TWI518538B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US201361812800P 2013-04-17 2013-04-17

Publications (2)

Publication Number Publication Date
TW201441857A TW201441857A (en) 2014-11-01
TWI518538B true TWI518538B (en) 2016-01-21

Family

ID=51706698

Family Applications (1)

Application Number Title Priority Date Filing Date
TW103114074A TWI518538B (en) 2013-04-17 2014-04-17 Predicting hla genotypes using unphased and flanking single-nucleotide polymorphisms in han chinese population

Country Status (2)

Country Link
CN (1) CN104109710B (en)
TW (1) TWI518538B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107893113B (en) * 2017-12-30 2020-12-25 广州博富瑞医学检验有限公司 HLA related SNP marker, detection primer pair and determination method thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005082110A2 (en) * 2004-02-26 2005-09-09 Illumina Inc. Haplotype markers for diagnosing susceptibility to immunological conditions
WO2005108624A2 (en) * 2004-05-06 2005-11-17 University Of Chicago, Uctech Use of hla-g genotyping in immune-mediated conditions
US20050266410A1 (en) * 2004-05-19 2005-12-01 Emily Walsh Methods of Human Leukocyte Antigen typing by neighboring single nucleotide polymorphism haplotypes
WO2008110206A1 (en) * 2007-03-13 2008-09-18 Genome Diagnostics B.V. Method for determining a hla-dq haplotype in a subject
WO2012068701A2 (en) * 2010-11-23 2012-05-31 深圳华大基因科技有限公司 Hla genotype-snp linkage database, its constructing method, and hla typing method

Also Published As

Publication number Publication date
TW201441857A (en) 2014-11-01
CN104109710B (en) 2018-02-09
CN104109710A (en) 2014-10-22

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees