TWI807861B - Method for identifying affinity of taiwanese population and system thereof - Google Patents

Method for identifying affinity of taiwanese population and system thereof

Info

Publication number
TWI807861B
Authority
TW
Taiwan
Prior art keywords
nucleic acid
snp
acid sample
taiwanese
kinship
Prior art date
Application number
TW111122256A
Other languages
Chinese (zh)
Other versions
TW202401445A (en)
Inventor
蔡輔仁
劉鼎元
林瑋德
Original Assignee
中國醫藥大學
Priority date
Filing date
Publication date
Application filed by 中國醫藥大學 filed Critical 中國醫藥大學
Priority to TW111122256A priority Critical patent/TWI807861B/en
Application granted granted Critical
Publication of TWI807861B publication Critical patent/TWI807861B/en
Publication of TW202401445A publication Critical patent/TW202401445A/en

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A method for identifying affinity within the Taiwanese population and a system thereof are provided. The method includes acquiring a reference genetic database, performing a nucleic acid sample providing step, performing a nucleic acid detecting step, and performing a calculating step. The reference genetic database includes a plurality of SNPs and a plurality of minor allele frequencies. After a subject nucleic acid sample and a reference nucleic acid sample are provided, the nucleotide composition of both samples at each SNP is analyzed. The nucleotide composition at each SNP is compared with the corresponding minor allele frequency, and a probability of paternity is calculated to determine the affinity between the subject and the reference. A high-accuracy, low-cost method for identifying affinity within the Taiwanese population is thereby established.

Description

Method and System for Identifying Relatedness within the Taiwanese Population

The present invention provides a method and system for identifying relatedness, and in particular a method and system for identifying relatedness within the Taiwanese population.

Paternity testing, kinship testing, and human identification are important applications in forensic genetics, clinical medicine, and social matters. Over the past century, blood groups, erythrocyte antigens, human leukocyte antigens, erythrocyte enzymes, serum proteins, and DNA typing have all been developed and used for relatedness testing, and DNA typing with genetic markers is now the most widely used approach. At present, most genetic markers used for DNA typing are short tandem repeats (STRs), and polymerase chain reaction (PCR) amplification of STRs is the most popular and mature technique.

Although STRs are relatively stable genetic markers for human DNA typing, they have several practical drawbacks: a high locus-specific mutation rate during inheritance (10^-4 to 10^-2), susceptibility to interference from stutter products generated by the DNA polymerase during PCR, and poor performance on highly degraded specimens. In contrast, single nucleotide polymorphisms (SNPs) have a markedly lower locus-specific mutation rate (10^-7 to 10^-8), yield shorter PCR products that are less prone to stutter, and are distributed more evenly across the chromosomes (1 to 10 SNPs per 1,000 nucleotides), which makes them ideal genetic markers for phylogenetic analysis.

In addition, each SNP site has its own characteristic minor allele frequency in each population, so each population should have its own set of SNP genetic markers. Taiwan has long been a transportation hub and has historically seen frequent intermixing with other populations, which has given the Taiwanese population rich genetic diversity. If genotyping for kinship identification continues to rely on SNPs chosen for general use across all human populations, the work is laborious and time-consuming, and because those general-purpose SNPs have lower minor allele frequencies in the Taiwanese population, the accuracy of kinship identification is also reduced.

In view of this, the present invention provides a method and system for identifying relatedness within the Taiwanese population. The method analyzes a Taiwanese whole-genome database to build a reference genome database specific to the Taiwanese population. This database contains genetic markers that are specific to the Taiwanese population and have higher discriminating power, so that the relatedness between a primary subject and a tested subject can be determined accurately with less time and effort.

One aspect of the present invention provides a method for identifying relatedness within the Taiwanese population, comprising the following steps: obtaining a reference genome database, performing a nucleic acid sample providing step, performing a nucleic acid detection step, and performing a calculation step. The reference genome database is built by analyzing a Taiwanese whole-genome database with a bioinformatics program to establish a reference minor allele frequency set and an SNP site combination. The SNP site combination comprises a plurality of SNP sites, and the reference minor allele frequency set comprises a plurality of minor allele frequencies corresponding to those SNP sites. The SNP sites are located on chromosomes 1 to 22, the genotype missing rate of each SNP site is less than 0.1, and after linkage disequilibrium pruning each minor allele frequency is greater than 0.4995. The nucleic acid sample providing step provides a primary nucleic acid sample from a primary subject and a tested nucleic acid sample from a tested subject. The nucleic acid detection step uses a nucleic acid detection method to determine, in both samples, the nucleotide composition at each SNP site of the SNP site combination. The calculation step compares the nucleotide composition at each SNP site with the corresponding minor allele frequency to calculate a kinship probability, and the relatedness of the primary subject and the tested subject is then determined from that probability.

Another aspect of the present invention provides a system for identifying relatedness within the Taiwanese population, comprising a nucleic acid extraction unit, a nucleic acid detection unit, and a non-transitory machine-readable medium. The nucleic acid extraction unit obtains a primary nucleic acid sample from a primary subject and a tested nucleic acid sample from a tested subject. The nucleic acid detection unit is electrically connected to the nucleic acid extraction unit and determines, in both samples, the nucleotide composition of an SNP site combination, wherein the SNP site combination comprises a plurality of SNP sites located on chromosomes 1 to 22. The non-transitory machine-readable medium is in signal connection with the nucleic acid detection unit and stores a program that analyzes the nucleotide composition of the two samples and determines a kinship probability. The non-transitory machine-readable medium comprises a reference genome database and a calculation unit. The reference genome database comprises the SNP site combination and a reference minor allele frequency set, both established by analyzing a Taiwanese whole-genome database. The reference minor allele frequency set comprises a plurality of minor allele frequencies corresponding to the SNP sites, the genotype missing rate of each SNP site is less than 0.1, and after linkage disequilibrium pruning each minor allele frequency is greater than 0.4995. The calculation unit is in signal connection with the reference genome database; it compares the nucleotide composition at each SNP site with the corresponding minor allele frequency to calculate a kinship probability, and the relatedness of the primary subject and the tested subject is then determined from that probability.

[Method for Identifying Relatedness within the Taiwanese Population]

Please refer to FIG. 1 and FIG. 2. FIG. 1 is a flow chart of the steps of a method 100 for identifying relatedness within the Taiwanese population according to an embodiment of the present invention, and FIG. 2 is a flow chart of the calculation step 140 of the method 100 of FIG. 1. In FIG. 1, the method 100 comprises step 110, step 120, step 130, and step 140.

Step 110 obtains a reference genome database. The reference genome database is built by analyzing a Taiwanese whole-genome database with a bioinformatics program to establish a reference minor allele frequency set and an SNP site combination. The SNP site combination comprises a plurality of SNP sites, and the reference minor allele frequency set comprises a plurality of minor allele frequencies corresponding to those SNP sites. The SNP sites are located on chromosomes 1 to 22, the genotype missing rate of each SNP site is less than 0.1, and after linkage disequilibrium pruning each minor allele frequency is greater than 0.4995.

Step 120 is a nucleic acid sample providing step, which provides a primary nucleic acid sample from a primary subject and a tested nucleic acid sample from a tested subject. Specifically, the primary and tested nucleic acid samples may be taken from DNA-containing specimens of the primary subject and the tested subject, respectively. Preferably, the samples come from blood, hair, bone, dander, or body fluid.

Step 130 is a nucleic acid detection step, which uses a nucleic acid detection method to determine, in the primary nucleic acid sample and the tested nucleic acid sample, the nucleotide composition at each SNP site of the SNP site combination. The nucleic acid detection method may include a genetic test performed with a biochip, a chemical reagent, or a matrix-assisted laser desorption/ionization tandem time-of-flight (MALDI-TOF/TOF) mass spectrometer, but the present invention is not limited thereto. Further, the nucleic acid detection method may be an enzyme cleavage identification method, a nucleic acid fragment mass difference detection method, a fluorescent probe detection method, a nucleic acid fragment conformation variation method, or a nucleic acid sequencing analysis method, but the present invention is not limited thereto.

Step 140 is a calculation step, which compares the nucleotide composition at each SNP site with the corresponding minor allele frequency to calculate a kinship probability, and then determines the relatedness of the primary subject and the tested subject from that probability. Referring also to FIG. 2, the calculation step 140 comprises step 141, step 142, step 143, and step 144.

Step 141 compares the nucleotide composition of the primary nucleic acid sample and the tested nucleic acid sample at the same SNP site with the corresponding minor allele frequency to obtain a plurality of target minor allele frequencies. Step 142 performs the calculation on the target minor allele frequencies for each of the SNP sites to obtain a plurality of kinship indices. Step 143 multiplies the kinship indices together to obtain a cumulative kinship index. Step 144 uses the cumulative kinship index to calculate a kinship probability. Specifically, please refer to FIG. 3 and Table 1. FIG. 3 is a schematic diagram of the positions of the SNP sites in the SNP site combination used in the method 100 according to an embodiment of the present invention, and Table 1 lists each SNP site in the SNP site combination with its corresponding minor allele frequency. The SNP sites of the SNP site combination used in the method 100 may be selected from the group consisting of the sequences shown in SEQ ID NO: 1 to SEQ ID NO: 176, 176 SNP sites in total.

Table 1 (columns: SEQ ID NO, SNP site, chromosome, reference allele, alternative allele, TPMI allele, minor allele frequency)

NO   SNP site     Chr  Ref  Alt  TPMI  MAF
1    rs6586535    1    C    T    C     0.4996
2    rs946836     1    C    T    T     0.4996
3    rs6694465    1    T    G    G     0.4996
4    rs701614     1    G    A    A     0.4996
5    rs9431708    1    A    G    G     0.4996
6    rs1425613    2    C    T    T     0.4996
7    rs11709353   3    G    A    A     0.4996
8    rs56027863   3    A    G    G     0.4996
9    rs4974500    3    T    C    C     0.4996
10   rs55768019   4    A    G    G     0.4996
11   rs16875084   5    G    A    G     0.4996
12   rs476428     5    G    A    G     0.4996
13   rs193491     5    T    C    T     0.4996
14   rs6871253    5    C    T    C     0.4996
15   rs3095250    6    C    T    C     0.4996
16   rs3851224    6    C    G    G     0.4996
17   rs12703023   7    T    C    C     0.4996
18   rs10954797   8    G    A    A     0.4996
19   rs7832232    8    A    G    A     0.4996
20   rs1025668    8    A    G    G     0.4996
21   rs1991718    8    C    T    T     0.4996
22   rs7854620    9    C    A    C     0.4996
23   rs10988509   9    G    A    G     0.4996
24   rs10826449   10   T    C    C     0.4996
25   rs7136376    12   C    T    C     0.4996
26   rs161966     12   G    C    G     0.4996
27   rs17456768   13   C    T    T     0.4996
28   rs9517294    13   G    A    A     0.4996
29   rs7992643    13   G    C    C     0.4996
30   rs7164594    15   C    T    C     0.4996
31   rs1079572    16   G    A    G     0.4996
32   rs7499814    16   C    A    A     0.4996
33   rs66491176   17   G    A    G     0.4996
34   rs4793579    17   A    G    A     0.4996
35   rs55865255   17   C    A    C     0.4996
36   rs7207216    17   G    T    T     0.4996
37   rs4891023    18   T    C    C     0.4996
38   rs9305268    21   T    C    T     0.4996
39   rs7521902    1    C    A    C     0.4997
40   rs284164     1    C    T    C     0.4997
41   rs4538254    2    C    A    A     0.4997
42   rs1344706    2    A    C    C     0.4997
43   rs10178377   2    T    C    T     0.4997
44   rs9822113    3    T    C    T     0.4997
45   rs4401376    3    T    C    C     0.4997
46   rs6786840    3    C    T    T     0.4997
47   rs13128397   4    A    G    A     0.4997
48   rs11932259   4    C    A    A     0.4997
49   rs9968429    4    A    G    G     0.4997
50   rs1443402    5    C    T    C     0.4997
51   rs4703389    5    G    A    A     0.4997
52   rs4286720    5    A    G    G     0.4997
53   rs11242704   6    A    G    A     0.4997
54   rs9372417    6    G    A    G     0.4997
55   rs6920965    6    G    A    G     0.4997
56   rs208869     6    T    C    C     0.4997
57   rs2041009    7    A    G    G     0.4997
58   rs12680146   8    C    T    C     0.4997
59   rs3847227    9    A    G    G     0.4997
60   rs7038346    9    A    G    A     0.4997
61   rs10962366   9    T    C    T     0.4997
62   rs7043796    9    C    T    T     0.4997
63   rs11006252   10   T    C    T     0.4997
64   rs4746992    10   C    T    C     0.4997
65   rs10887637   10   A    G    A     0.4997
66   rs2003906    11   A    G    G     0.4997
67   rs7926370    11   A    G    G     0.4997
68   rs10844220   12   A    C    A     0.4997
69   rs710681     12   C    T    T     0.4997
70   rs4981030    12   A    G    A     0.4997
71   rs9530834    13   A    G    A     0.4997
72   rs7166130    15   A    T    T     0.4997
73   rs8062124    16   C    A    C     0.4997
74   rs9932649    16   T    G    G     0.4997
75   rs2966063    16   A    G    A     0.4997
76   rs430639     17   G    T    T     0.4997
77   rs11081589   18   T    C    C     0.4997
78   rs2033491    19   C    A    A     0.4997
79   rs4814615    20   G    A    G     0.4997
80   rs885985     22   G    A    A     0.4997
81   rs12403557   1    G    A    A     0.4998
82   rs143290884  1    AG   -    AG    0.4998
83   rs10932127   2    G    T    G     0.4998
84   rs1032665    3    C    T    C     0.4998
85   rs4580593    3    C    A    C     0.4998
86   rs12640221   4    A    G    G     0.4998
87   rs986039     4    A    G    G     0.4998
88   rs1877731    4    C    G    G     0.4998
89   rs28582382   4    A    G    G     0.4998
90   rs9296249    6    T    C    T     0.4998
91   rs55668741   6    T    C    T     0.4998
92   rs11753921   6    T    C    T     0.4998
93   rs9690126    7    G    A    G     0.4998
94   rs12680842   8    A    G    A     0.4998
95   rs2929843    8    G    A    G     0.4998
96   rs4409435    8    T    C    T     0.4998
97   rs10809234   9    T    G    G     0.4998
98   rs7023738    9    A    C    C     0.4998
99   rs11144120   9    G    T    G     0.4998
100  rs10869499   9    A    G    G     0.4998
101  rs6482847    10   A    G    G     0.4998
102  rs2132966    11   A    G    A     0.4998
103  rs577948     11   A    G    G     0.4998
104  rs3741851    12   A    G    G     0.4998
105  rs11171598   12   C    A    A     0.4998
106  rs9573483    13   C    T    T     0.4998
107  rs12898878   15   C    T    T     0.4998
108  rs78526880   15   G    A    A     0.4998
109  rs12597411   16   C    T    C     0.4998
110  rs62034138   16   G    A    G     0.4998
111  rs67048050   16   A    G    A     0.4998
112  rs4368195    17   T    C    C     0.4998
113  rs3859191    17   G    A    G     0.4998
114  rs349989     17   T    C    T     0.4998
115  rs11871847   17   C    G    G     0.4998
116  rs6037894    20   T    C    C     0.4998
117  rs2207878    20   A    G    G     0.4998
118  rs61778328   1    C    T    T     0.4999
119  rs12759780   1    T    G    T     0.4999
120  rs642307     1    C    A    C     0.4999
121  rs910622     1    C    T    T     0.4999
122  rs33941127   1    C    T    T     0.4999
123  rs1544846    2    C    T    T     0.4999
124  rs10182721   2    C    T    C     0.4999
125  rs1158228    3    A    G    G     0.4999
126  rs2340475    3    C    T    C     0.4999
127  rs13102188   4    G    T    G     0.4999
128  rs6858430    4    T    C    T     0.4999
129  rs9502570    6    C    T    T     0.4999
130  rs9257185    6    A    G    G     0.4999
131  rs9349364    6    A    G    A     0.4999
132  rs62495696   8    A    G    G     0.4999
133  rs4397385    8    G    A    A     0.4999
134  rs1332312    9    A    G    A     0.4999
135  rs13294439   9    A    C    C     0.4999
136  rs7033078    9    T    C    T     0.4999
137  rs1452289    10   T    C    T     0.4999
138  rs7936903    11   T    C    C     0.4999
139  rs1953655    13   C    T    C     0.4999
140  rs7981566    13   C    T    C     0.4999
141  rs17792748   14   C    T    T     0.4999
142  rs61985798   14   C    T    C     0.4999
143  rs8006042    14   A    G    G     0.4999
144  rs883481     15   G    A    G     0.4999
145  rs77359952   15   G    A    G     0.4999
146  rs2305443    15   C    T    C     0.4999
147  rs4787247    16   C    T    C     0.4999
148  rs572858     18   G    A    G     0.4999
149  rs11673399   19   T    C    T     0.4999
150  rs28456308   20   C    T    C     0.4999
151  rs117294     22   A    C    A     0.4999
152  rs357063     1    T    C    C     0.5
153  rs12473958   2    A    G    A     0.5
154  rs7580245    2    T    C    T     0.5
155  rs1440512    3    C    T    T     0.5
156  rs13314271   3    T    C    T     0.5
157  rs34819461   4    C    -    -     0.5
158  rs3805285    4    G    A    A     0.5
159  rs17030363   4    G    A    A     0.5
160  rs258129     5    G    A    G     0.5
161  rs9479343    6    A    G    A     0.5
162  rs17170324   7    G    C    C     0.5
163  rs12705317   7    C    T    C     0.5
164  rs73174654   7    A    G    G     0.5
165  rs2978213    8    T    C    C     0.5
166  rs72614682   9    C    T    C     0.5
167  rs35051342   11   G    C    C     0.5
168  rs717582     11   T    C    C     0.5
169  rs11439588   11   -    G    -     0.5
170  rs72736093   15   G    A    G     0.5
171  rs4932564    15   A    G    A     0.5
172  rs918703     16   T    C    T     0.5
173  rs7499886    16   G    A    A     0.5
174  rs2058306    17   G    C    C     0.5
175  rs1785550    18   C    T    C     0.5
176  rs6089982    20   C    T    T     0.5

In detail, the reference genome database was built using the genetic database of China Medical University Hospital (CMUH) as the data source. To compute candidate SNP sites and allele frequencies, blood samples were collected from adults aged 18 to 75, nucleic acid samples (DNA) were extracted, and each sample was genotyped on the TPMv1 SNP array. PLINK 1.9 was then used to analyze each SNP site with the following quality-control filters: SNP missing rate (geno 0.1), sample missing rate (mind 0.1), exclusion of sites with a Hardy-Weinberg equilibrium p value < 10^-4, and exclusion of sites with a minor allele frequency < 0.3. The SNP sites that passed were then pruned for linkage disequilibrium (LD) with window size = 250, step size = 5, and r2 threshold = 0.1. In addition, principal component analysis (PCA) was applied to the samples that passed quality control to remove non-Taiwanese data from the reference genome database. After quality screening, 82,934 variants and 173,135 individuals remained, and the total genotyping rate of the reference genome database was 0.9972. Finally, the SNP sites with a minor allele frequency greater than 0.4995 were selected to form the SNP site combination, and their corresponding minor allele frequencies were used to establish the reference minor allele frequency set. The results are shown in Table 1.
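The final selection rule described above (autosomal site, missing rate below 0.1, minor allele frequency above 0.4995) can be sketched in Python. This is only an illustration under stated assumptions: it presumes that per-site statistics have already been computed and LD-pruned by an external tool, and the names `SnpStats` and `select_panel`, the `rsEXAMPLE` identifiers, and all missing-rate values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpStats:
    """Per-site summary statistics (hypothetical container for this sketch)."""
    rsid: str
    chromosome: str      # "1".."22" for autosomes; "X", "Y", "MT" otherwise
    missing_rate: float  # fraction of samples without a genotype call
    maf: float           # minor allele frequency in the reference cohort


AUTOSOMES = {str(n) for n in range(1, 23)}


def select_panel(stats, max_missing=0.1, min_maf=0.4995):
    """Keep autosomal sites with missing rate < max_missing and MAF > min_maf.

    Assumes the input has already been LD-pruned; no pruning happens here.
    """
    return [
        s for s in stats
        if s.chromosome in AUTOSOMES
        and s.missing_rate < max_missing
        and s.maf > min_maf
    ]


candidates = [
    SnpStats("rs6586535", "1", 0.002, 0.4996),   # MAF from Table 1; missing rate invented
    SnpStats("rsEXAMPLE1", "X", 0.001, 0.4999),  # hypothetical: not on chromosomes 1 to 22
    SnpStats("rsEXAMPLE2", "2", 0.150, 0.4998),  # hypothetical: too many missing calls
    SnpStats("rsEXAMPLE3", "3", 0.001, 0.4100),  # hypothetical: MAF below the cutoff
]

panel = select_panel(candidates)
print([s.rsid for s in panel])
```

Only the first record passes all three conditions. Keeping the thresholds as parameters makes it easy to see how the panel size changes if the cutoffs are relaxed.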

The relatedness calculation and determination in step 140 first computes the kinship index of the primary nucleic acid sample and the tested nucleic acid sample at each SNP site, then multiplies the kinship indices of the 176 sites to obtain the cumulative kinship index (combined paternity index, CPI), and then converts the CPI to PP% with the formula PP% = CPI / (CPI + 1) × 100%. When PP% > 99.99%, the two samples are judged to be related. The kinship index is obtained by first determining the genotype of each sample from its nucleotide composition at the same SNP site, then looking up the corresponding minor allele frequency in the reference minor allele frequency set, and applying the formula shown in Table 2. In Table 2, P_A is the paternally inherited minor allele frequency for the SNP site in the reference minor allele frequency set, and P_B is the maternally inherited minor allele frequency for the SNP site in that set.

Table 2

Parent genotype  Offspring genotype  Kinship index (PI)
AA               AA                  1/P_A
AA               AB                  1/(P_A × 2)
AA               BB                  0.0001
AB               AA                  1/(P_A × 2)
AB               AB                  1/(P_A × P_B × 4)
AB               BB                  1/(P_A × 2)
BB               AA                  0.0001
BB               AB                  1/(P_A × 2)
BB               BB                  1/P_A
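As a minimal sketch of the arithmetic in step 140, the Python below encodes the nine rows of Table 2, multiplies the per-site indices into a CPI, and converts the CPI to PP%. The function names are hypothetical, and the example genotype pairs and the value 0.5 used for P_A and P_B are illustrative inputs, not data from the patent.

```python
import math

MISMATCH_PI = 0.0001      # value Table 2 assigns to opposite homozygotes
RELATED_THRESHOLD = 99.99  # PP% must exceed this for a "related" call
GENOTYPES = {"AA", "AB", "BB"}


def kinship_index(parent, child, p_a, p_b):
    """Per-site index following Table 2; genotypes are 'AA', 'AB' or 'BB'."""
    if parent not in GENOTYPES or child not in GENOTYPES:
        raise ValueError("genotypes must be 'AA', 'AB' or 'BB'")
    pair = (parent, child)
    if pair in {("AA", "BB"), ("BB", "AA")}:
        return MISMATCH_PI
    if pair in {("AA", "AA"), ("BB", "BB")}:
        return 1.0 / p_a
    if pair == ("AB", "AB"):
        return 1.0 / (p_a * p_b * 4)
    return 1.0 / (p_a * 2)  # the four remaining rows of Table 2


def combined_index(site_indices):
    """CPI: product of the per-site kinship indices."""
    return math.prod(site_indices)


def probability_percent(cpi):
    """PP% = CPI / (CPI + 1) * 100."""
    return cpi / (cpi + 1.0) * 100.0


def is_related(cpi):
    return probability_percent(cpi) > RELATED_THRESHOLD


# Illustrative input: 14 sites where both samples are AA and P_A = P_B = 0.5.
matching = [("AA", "AA")] * 14
cpi = combined_index(kinship_index(p, c, 0.5, 0.5) for p, c in matching)
print(cpi, is_related(cpi))  # CPI = 2**14 = 16384.0

# One opposite-homozygote site multiplies the CPI by 0.0001.
cpi_with_mismatch = cpi * kinship_index("AA", "BB", 0.5, 0.5)
print(is_related(cpi_with_mismatch))
```

Because PP% > 99.99% is equivalent to CPI > 9999, and a matching homozygous site with a frequency near 0.5 contributes a factor of about 2, roughly 14 such sites already cross the threshold, while a single opposite-homozygote site pulls the product down by four orders of magnitude.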

[System for Identifying Relatedness within the Taiwanese Population]

Please refer to FIG. 4, which is a block diagram of a system 200 for identifying relatedness within the Taiwanese population according to an example of another embodiment of the present invention. In FIG. 4, the system 200 comprises a nucleic acid extraction unit 300, a nucleic acid detection unit 400, and a non-transitory machine-readable medium 500.

The nucleic acid extraction unit 300 is used to obtain a main test nucleic acid sample from a main subject and a tested nucleic acid sample from a tested subject. Specifically, the nucleic acid extraction unit 300 may use column purification or reagent purification to extract the main test nucleic acid sample of the main subject and the tested nucleic acid sample of the tested subject, but the present invention is not limited thereto.

The nucleic acid detection unit 400 is electrically connected to the nucleic acid extraction unit 300 and is used to detect a plurality of nucleotide compositions of the SNP site combination 511 in the main test nucleic acid sample and the tested nucleic acid sample, wherein the SNP site combination 511 includes a plurality of SNP sites (not shown) located on chromosome pairs 1 to 22. Specifically, the nucleic acid detection unit 400 may be a biochip, a chemical reagent kit or a matrix-assisted laser desorption/ionization tandem time-of-flight mass spectrometer, but the present invention is not limited thereto. Furthermore, the nucleic acid detection unit 400 may detect the nucleotide compositions using an enzyme cleavage identification method, a nucleic acid fragment mass difference detection method, a fluorescent probe detection method, a nucleic acid fragment conformation variation method or a nucleic acid sequencing analysis method, but the present invention is not limited thereto.

The non-transitory machine-readable medium 500 is signally connected to the nucleic acid detection unit 400 and is used to access a program that analyzes the nucleotide compositions of the main test nucleic acid sample and the tested nucleic acid sample and determines a kinship probability. The non-transitory machine-readable medium 500 includes a reference genome database 510 and a calculation unit 520. The reference genome database 510 includes the SNP site combination 511 and a reference minor allele frequency set 512, both of which are established by analyzing a Taiwanese whole-genome database. The reference minor allele frequency set 512 includes a plurality of minor allele frequencies (not shown) corresponding to the SNP sites; the genotype missing rate of each SNP site is less than 0.1, and each minor allele frequency is greater than 0.4995 after linkage disequilibrium pruning. The calculation unit 520 is signally connected to the reference genome database 510. It compares the nucleotide compositions at each SNP site with the corresponding minor allele frequency, calculates a kinship probability, and then determines the kinship of the main subject and the tested subject according to the kinship probability.
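The two numeric criteria on the SNP sites of the reference genome database 510 can be illustrated with the short Python sketch below. It is a sketch under assumptions: the record type `SnpRecord` and the function `select_sites` are hypothetical, the site identifiers are made up, and linkage disequilibrium pruning is assumed to have been performed beforehand by a separate tool and is not implemented here.

```python
from dataclasses import dataclass

MAX_MISSING_RATE = 0.1  # missing rate of each SNP site must be below this value
MIN_MAF = 0.4995        # minor allele frequency must exceed this value


@dataclass(frozen=True)
class SnpRecord:
    """Hypothetical per-site summary taken from a whole-genome database."""
    site_id: str
    chromosome: int      # 1..22 for the autosomal chromosome pairs
    missing_rate: float
    maf: float


def select_sites(records):
    """Keep LD-pruned autosomal sites that meet both thresholds."""
    return [
        r for r in records
        if 1 <= r.chromosome <= 22
        and r.missing_rate < MAX_MISSING_RATE
        and r.maf > MIN_MAF
    ]


# Usage with made-up records (not real SNP data).
candidates = [
    SnpRecord("site_1", 1, 0.02, 0.4998),   # kept
    SnpRecord("site_2", 7, 0.15, 0.4999),   # rejected: missing rate too high
    SnpRecord("site_3", 22, 0.01, 0.4100),  # rejected: MAF too low
    SnpRecord("site_4", 23, 0.01, 0.4999),  # rejected: not on pairs 1 to 22
]
print([r.site_id for r in select_sites(candidates)])  # ['site_1']
```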

Please refer to FIG. 5, which is a block diagram of a system 200a for identifying the kinship of the Taiwanese population according to another example of another embodiment of the present invention. In FIG. 5, the system 200a for identifying the kinship of the Taiwanese population includes a nucleic acid extraction unit 300a, a nucleic acid detection unit 400a and a non-transitory machine-readable medium 500a. The technical details of the nucleic acid extraction unit 300a and the nucleic acid detection unit 400a are the same as those of the nucleic acid extraction unit 300 and the nucleic acid detection unit 400 in FIG. 4 and are not repeated here.

The non-transitory machine-readable medium 500a includes a reference genome database 510a and a calculation unit 520a, wherein the reference genome database 510a includes a SNP site combination 511a and a reference minor allele frequency set 512a. The technical details of the reference genome database 510a are the same as those of the reference genome database 510 in FIG. 4 and are not repeated here.

The calculation unit 520a may include a comparison module 521, a kinship index calculation module 522, a combined paternity index calculation module 523 and a kinship probability calculation module 524. The comparison module 521 compares the nucleotide compositions of the main test nucleic acid sample and the tested nucleic acid sample at the same SNP site with the corresponding minor allele frequency to obtain a plurality of target minor allele frequencies. The kinship index calculation module 522 is signally connected to the comparison module 521 and uses the target minor allele frequencies to calculate a kinship index for each of the SNP sites, yielding a plurality of kinship indices. The combined paternity index calculation module 523 is signally connected to the kinship index calculation module 522 and multiplies the kinship indices together to obtain a combined paternity index. The kinship probability calculation module 524 is signally connected to the combined paternity index calculation module 523 and uses the combined paternity index to calculate a kinship probability.
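As an illustration of the first stage of this chain, the comparison performed by the comparison module 521 might be sketched as below. This is an assumption-laden sketch: the function name `target_frequencies`, the dictionary layout keyed by site identifier, and the choice to skip sites that are missing from either sample or from the reference set are all hypothetical and are not stated in the specification.

```python
def target_frequencies(main_calls, tested_calls, reference_maf):
    """Pair the two samples' genotypes at each shared site with its reference MAF.

    main_calls / tested_calls: dict mapping site id -> genotype ("AA", "AB", "BB").
    reference_maf: dict mapping site id -> minor allele frequency.
    Returns a dict mapping site id -> (main genotype, tested genotype, MAF).
    """
    # Assumption: only sites present in both samples and in the reference are used.
    shared = set(main_calls) & set(tested_calls) & set(reference_maf)
    return {
        site: (main_calls[site], tested_calls[site], reference_maf[site])
        for site in sorted(shared)
    }


# Usage with made-up site identifiers and frequencies.
main_calls = {"site_1": "AA", "site_2": "AB", "site_3": "BB"}
tested_calls = {"site_1": "AB", "site_2": "AB"}
reference_maf = {"site_1": 0.4998, "site_2": 0.4996, "site_3": 0.4999}
for site, row in target_frequencies(main_calls, tested_calls, reference_maf).items():
    print(site, row)
```

The resulting per-site tuples are what a downstream kinship index stage would consume, one tuple per SNP site.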

The following specific examples further illustrate the present invention so that a person of ordinary skill in the art to which the invention belongs can fully use and practice it without undue interpretation. These test examples should not be regarded as limiting the scope of the present invention; they illustrate the materials and methods by which the invention may be practiced.

To verify the stability and accuracy of the method and system of the present invention for identifying the kinship of the Taiwanese population, a total of 355 sample pairs were used in this test, and the genotype data of these samples were stored in the reference genome database. The kinship probability calculated from STR loci served as the control group. As comparative examples, kinship identification was performed on the same 355 sample pairs by calculating identity by descent (IBD) with PLINK 1.9, by calculating Kinship-based inference for genome-wide association (KING) with PLINK 2.0, and by calculating the kinship probability from the universal-ethnicity SNP sites released by Affymetrix (USA). These comparisons verify the stability and accuracy of the SNP site combination selected in the present invention when applied to identifying the kinship of the Taiwanese population.

In the test, a total of 15 autosomal STR loci, namely D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818 and FGA, were used as genetic markers. They served as the control group for confirming the kinship between samples and for comparing the stability and accuracy of kinship identification. The Amelogenin locus was analyzed to distinguish the X and Y sex chromosomes, and the kinship probability of the 355 pairs was then calculated to determine their kinship. For the calculation of the kinship probability, please refer to the procedures proposed by Charles Brenner, Jeffrey W. Morris et al. Table 3 shows the result of confirming the kinship between samples with the above STR loci: according to the calculated kinship probability, 314 of the 355 sample pairs were determined by the STR loci to be related, and 41 pairs were determined to be unrelated.

Table 3
                 Related    Unrelated
Sample pairs     1~314      315~355
Total pairs      314        41

The results in Table 3 were then extended to compare the other kinship identification methods as follows. In PLINK 1.9, the PIHAT value can be derived from the IBD proportions, that is, P(IBD=2) + 0.5 × P(IBD=1); in PLINK 2.0, the KINSHIP value derived from KING can be used to calculate kinship. In addition, the 176 selected SNP sites (hereinafter CMUH_176SNPs) and 83 universal-ethnicity SNP sites (hereinafter AFF_83SNPs) were each used to calculate the PP% of each sample pair according to the calculation steps disclosed in the present invention and to determine the kinship of each sample pair. The SNP sites used in AFF_83SNPs, their alleles and their minor allele frequencies are shown in Table 4.

Table 4
SNP site      Allele    Minor allele frequency    SNP site      Allele    Minor allele frequency
rs323009      G         0.4081                    rs10827221    G         0.4681
rs1540732     C         0.4943                    rs7792052     C         0.2829
rs4557603     G         0.3550                    rs1590349     G         0.4270
rs6861600     G         0.4439                    rs9709980     G         0.4633
rs2665355     G         0.3839                    rs713738      A         0.3984
rs803158      G         0.3484                    rs6134919     C         0.4395
rs6987709     C         0.3302                    rs4389065     A         0.3999
rs12285109    G         0.4091                    rs12913890    G         0.3860
rs1351407     T         0.4976                    rs11079221    G         0.4351
rs7322418     C         0.3782                    rs1202031     A         0.4555
rs4472366     G         0.4120                    rs10869208    C         0.4909
rs10819912    C         0.4045                    rs1021870     T         0.3489
rs10466213    G         0.4572                    rs284000      C         0.4272
rs2371356     G         0.3449                    rs1942355     C         0.4736
rs347301      C         0.3679                    rs1997466     C         0.4925
rs10495437    T         0.3033                    rs5752479     C         0.4661
rs2882367     C         0.4706                    rs1979097     C         0.3882
rs534665      C         0.4672                    rs9917155     C         0.4363
rs7737453     C         0.3980                    rs7534574     C         0.2894
rs13535       G         0.4201                    rs4478161     C         0.4408
rs424301      A         0.2569                    rs2630787     T         0.4652
rs2010253     C         0.4348                    rs1401858     A         0.3427
rs4550919     A         0.3943                    rs16953197    T         0.4996
rs1443118     A         0.4916                    rs9297213     C         0.4833
rs7834533     C         0.2458                    rs6005018     C         0.3574
rs8109968     A         0.4193                    rs874429      C         0.3886
rs2517455     C         0.4904                    rs9939407     C         0.3565
rs10770943    G         0.3982                    rs761223      C         0.4996
rs6931131     C         0.4363                    rs432551      G         0.3663
rs835401      C         0.4980                    rs9317420     G         0.4282
rs6708411     T         0.4209                    rs9912146     C         0.3059
rs11035666    A         0.3701                    rs735043      G         0.4442
rs2324969     G         0.3870                    rs2826803     C         0.4177
rs4746855     G         0.4844                    rs1956616     C         0.3928
rs856411      T         0.4854                    rs4805298     T         0.4641
rs3857265     C         0.3524                    rs1783305     T         0.4879
rs2465390     T         0.4238                    rs1031107     C         0.3999
rs2917817     G         0.4276                    rs1756295     A         0.4078
rs6856651     A         0.3098                    rs1423852     G         0.3186
rs4608860     G         0.4173                    rs6590574     G         0.4285
rs2344664     G         0.3915                    rs4887511     T         0.3364
rs7631088     A         0.4955
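The PIHAT expression quoted above, P(IBD=2) + 0.5 × P(IBD=1), can be written out as a small Python function. This is a sketch only: the function name `pi_hat` and its input checks are hypothetical, and it simply evaluates the stated formula rather than reproducing how PLINK estimates the IBD proportions.

```python
def pi_hat(p_ibd1: float, p_ibd2: float) -> float:
    """PIHAT = P(IBD=2) + 0.5 * P(IBD=1), as stated in the text."""
    for p in (p_ibd1, p_ibd2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("IBD proportions must lie in [0, 1]")
    if p_ibd1 + p_ibd2 > 1.0:
        raise ValueError("P(IBD=1) + P(IBD=2) cannot exceed 1")
    return p_ibd2 + 0.5 * p_ibd1


# Idealized cases: a parent and child share exactly one allele IBD at every site,
# and an unrelated pair shares none.
print(pi_hat(1.0, 0.0))  # 0.5
print(pi_hat(0.0, 0.0))  # 0.0
```

The idealized parent-offspring value of 0.5 is consistent with the PIHAT range reported for related pairs in this test.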

Please also refer to Table 5, which compares the kinship results calculated for the samples with STR loci, IBD, KING, CMUH_176SNPs and AFF_83SNPs.

Table 5
                       Related            Unrelated
STR (PP%)              99.50~99.99        0~9.56
PIHAT                  0.4963~0.5346      0~0.0766
KINSHIP                0.2219~0.2574      0~0.0334
CMUH_176SNPs (PP%)     > 99.999999        ~0
AFF_83SNPs (PP%)       90.29~99.99999     0~26.02

As shown in Table 5, all sample pairs determined by STR to be related have a PIHAT close to 0.5, while all sample pairs determined to be unrelated have a PIHAT relatively close to 0.01. Likewise, all sample pairs determined by STR to be related have a KINSHIP close to 0.247, while all sample pairs determined to be unrelated have a KINSHIP relatively close to 0.001. A regression analysis of the STR results, the PIHAT values and the KINSHIP values gives R² = 0.9957, indicating a high correlation between the KINSHIP and PIHAT values. In the identification results obtained with CMUH_176SNPs, the kinship probability of every sample pair determined by STR to be related is greater than 99.999999%, and every sample pair determined to be unrelated shows a kinship probability relatively close to 0. The SNP site combination selected in the present invention therefore provides good accuracy and stability in kinship determination.

It is worth noting that, among the 314 sample pairs determined by STR to be related, 7 pairs showed a mismatch at one STR locus during the locus detection stage, so additional STR loci on the X or Y sex chromosomes were tested to further support or exclude the kinship determination. In addition, as shown in Table 4, because AFF_83SNPs have lower minor allele frequencies in the Taiwanese population, 6 sample pairs deviated from the STR result even though the AFF_83SNPs results were otherwise almost identical to those of CMUH_176SNPs: 4 pairs in the related group had PP% < 99.9%, and 2 pairs in the unrelated group had PP% > 1. In summary, the results in Table 5 show that, compared with STR loci, CMUH_176SNPs provide better site matching as genetic markers for identifying the kinship of the Taiwanese population; and compared with AFF_83SNPs, CMUH_176SNPs have higher minor allele frequencies in the Taiwanese population and achieve better accuracy in kinship determination.

In summary, the method and system of the present invention for identifying the kinship of the Taiwanese population can be effectively applied in forensic science, medical identification, kinship identification and the field of social issues. By screening under the stated conditions for 176 SNP sites that have high stability and high minor allele frequencies in the Taiwanese population, a reference genome database for identifying kinship among Taiwanese people was established. The method and system of the present invention not only achieve a better site matching rate than the conventional identification method using STR loci, but also provide better accuracy than using the currently available universal-ethnicity SNP sites to identify the kinship of the Taiwanese population.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

100: method for identifying the kinship of the Taiwanese population
110, 120, 130, 140, 141, 142, 143, 144: steps
200, 200a: system for identifying the kinship of the Taiwanese population
300, 300a: nucleic acid extraction unit
400, 400a: nucleic acid detection unit
500, 500a: non-transitory machine-readable medium
510, 510a: reference genome database
511, 511a: SNP site combination
512, 512a: reference minor allele frequency set
520, 520a: calculation unit
521: comparison module
522: kinship index calculation module
523: combined paternity index calculation module
524: kinship probability calculation module

To make the above and other objects, features, advantages and embodiments of the present invention easier to understand, the accompanying drawings are described as follows:
FIG. 1 is a flow chart of the steps of a method for identifying the kinship of the Taiwanese population according to an embodiment of the present invention;
FIG. 2 is a flow chart of the calculating step of the method of FIG. 1;
FIG. 3 is a schematic diagram of the locations of the SNP site combination used in the method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a system for identifying the kinship of the Taiwanese population according to one example of another embodiment of the present invention; and
FIG. 5 is a block diagram of a system for identifying the kinship of the Taiwanese population according to another example of another embodiment of the present invention.

100: method for identifying the kinship of the Taiwanese population

110, 120, 130, 140: steps

Claims (7)

1. A method for identifying the kinship of the Taiwanese population, comprising:
obtaining a reference genome database, wherein the reference genome database is obtained by analyzing a Taiwanese whole-genome database with a bioinformatics computing program to establish a reference minor allele frequency set and a SNP site combination, wherein the SNP site combination comprises a plurality of SNP sites, the reference minor allele frequency set comprises a plurality of minor allele frequencies corresponding to the SNP sites, the SNP sites are located on chromosome pairs 1 to 22, the genotype missing rate of each of the SNP sites is less than 0.1, and each of the minor allele frequencies is greater than 0.4995 after linkage disequilibrium pruning;
performing a nucleic acid sample providing step, which provides a main test nucleic acid sample of a main subject and a tested nucleic acid sample of a tested subject;
performing a nucleic acid detection step, which uses a nucleic acid detection method to detect a plurality of nucleotide compositions, in the main test nucleic acid sample and the tested nucleic acid sample, corresponding to each of the SNP sites in the SNP site combination; and
performing a calculating step, which compares the nucleotide compositions of each of the SNP sites with the corresponding one of the minor allele frequencies to calculate a kinship probability, and then determines the kinship of the main subject and the tested subject according to the kinship probability, wherein the calculating step comprises:
comparing the nucleotide compositions of the main test nucleic acid sample and the tested nucleic acid sample at the same SNP site with the corresponding one of the minor allele frequencies to obtain a plurality of target minor allele frequencies;
calculating with the target minor allele frequencies for each of the SNP sites to obtain a plurality of kinship indices;
multiplying the kinship indices together to calculate a combined paternity index; and
calculating a kinship probability from the combined paternity index;
wherein the SNP sites are selected from the group consisting of rs6586535, rs946836, rs6694465, rs701614, rs9431708, rs1425613, rs11709353, rs56027863, rs4974500, rs55768019, rs16875084, rs476428, rs193491, rs6871253, rs3095250, rs3851224, rs12703023, rs10954797, rs7832232, rs1025668, rs1991718, rs7854620, rs10988509, rs10826449, rs7136376, rs161966, rs17456768, rs9517294, rs7992643, rs7164594, rs1079572, rs7499814, rs66491176, rs4793579, rs55865255, rs7207216, rs4891023, rs9305268, rs7521902, rs284164, rs4538254, rs1344706, rs10178377, rs9822113, rs4401376, rs6786840, rs13128397, rs11932259, rs9968429, rs1443402, rs4703389, rs4286720, rs11242704, rs9372417, rs6920965, rs208869, rs2041009, rs12680146, rs3847227, rs7038346, rs10962366, rs7043796, rs11006252, rs4746992, rs10887637, rs2003906, rs7926370, rs10844220, rs710681, rs4981030, rs9530834, rs7166130, rs8062124, rs9932649, rs2966063, rs430639, rs11081589, rs2033491, rs4814615, rs885985, rs12403557, rs143290884, rs10932127, rs1032665, rs4580593, rs12640221, rs986039, rs1877731, rs28582382, rs9296249, rs55668741, rs11753921, rs9690126, rs12680842, rs2929843, rs4409435, rs10809234, rs7023738, rs11144120, rs10869499, rs6482847, rs2132966, rs577948, rs3741851, rs11171598, rs9573483, rs12898878, rs78526880, rs12597411, rs62034138, rs67048050, rs4368195, rs3859191, rs349989, rs11871847, rs6037894, rs2207878, rs61778328, rs12759780, rs642307, rs910622, rs33941127, rs1544846, rs10182721, rs1158228, rs2340475, rs13102188, rs6858430, rs9502570, rs9257185, rs9349364, rs62495696, rs4397385, rs1332312, rs13294439, rs7033078, rs1452289, rs7936903, rs1953655, rs7981566, rs17792748, rs61985798, rs8006042, rs883481, rs77359952, rs2305443, rs4787247, rs572858, rs11673399, rs28456308, rs117294, rs357063, rs12473958, rs7580245, rs1440512, rs13314271, rs34819461, rs3805285, rs17030363, rs258129, rs9479343, rs17170324, rs12705317, rs73174654, rs2978213, rs72614682, rs35051342, rs717582, rs11439588, rs72736093, rs4932564, rs918703, rs7499886, rs2058306, rs1785550 and rs6089982.

2. The method for identifying the kinship of the Taiwanese population according to claim 1, wherein the nucleic acid detection method comprises a genetic testing method performed using a biochip, a chemical reagent or a matrix-assisted laser desorption/ionization tandem time-of-flight mass spectrometer.
3. The method for identifying the kinship of the Taiwanese population according to claim 1, wherein the nucleic acid detection method is an enzyme cleavage identification method, a nucleic acid fragment mass difference detection method, a fluorescent probe detection method, a nucleic acid fragment conformation variation method or a nucleic acid sequencing analysis method.

4. A system for identifying the kinship of the Taiwanese population, comprising:
a nucleic acid extraction unit for obtaining a main test nucleic acid sample of a main subject and a tested nucleic acid sample of a tested subject;
a nucleic acid detection unit electrically connected to the nucleic acid extraction unit for detecting a plurality of nucleotide compositions of a SNP site combination in the main test nucleic acid sample and the tested nucleic acid sample, wherein the SNP site combination comprises a plurality of SNP sites, and the SNP sites are located on chromosome pairs 1 to 22; and
a non-transitory machine-readable medium signally connected to the nucleic acid detection unit for accessing a program that analyzes the nucleotide compositions of the main test nucleic acid sample and the tested nucleic acid sample and determines a kinship probability, the non-transitory machine-readable medium comprising:
a reference genome database comprising the SNP site combination and a reference minor allele frequency set, wherein the SNP site combination and the reference minor allele frequency set are established by analyzing a Taiwanese whole-genome database, the reference minor allele frequency set comprises a plurality of minor allele frequencies corresponding to the SNP sites, the genotype missing rate of each of the SNP sites is less than 0.1, and each of the minor allele frequencies is greater than 0.4995 after linkage disequilibrium pruning; and
a calculation unit signally connected to the reference genome database for comparing the nucleotide compositions of each of the SNP sites with the corresponding one of the minor allele frequencies and calculating to obtain a kinship probability, and then determining the kinship of the main subject and the tested subject according to the kinship probability, wherein the calculation unit comprises:
a comparison module for comparing the nucleotide compositions of the main test nucleic acid sample and the tested nucleic acid sample at the same SNP site with the corresponding one of the minor allele frequencies to obtain a plurality of target minor allele frequencies;
a kinship index calculation module signally connected to the comparison module and used to calculate with the minor allele frequencies for each of the SNP sites to obtain a plurality of kinship indices;
a combined paternity index calculation module signally connected to the kinship index calculation module and multiplying the kinship indices together to obtain a combined paternity index; and
a kinship probability calculation module signally connected to the combined paternity index calculation module and using the combined paternity index to calculate a kinship probability;
wherein the SNP sites are selected from the group consisting of rs6586535, rs946836, rs6694465, rs701614, rs9431708, rs1425613, rs11709353, rs56027863, rs4974500, rs55768019, rs16875084, rs476428, rs193491, rs6871253, rs3095250, rs3851224, rs12703023, rs10954797, rs7832232, rs1025668, rs1991718, rs7854620, rs10988509, rs10826449, rs7136376, rs161966, rs17456768, rs9517294, rs7992643, rs7164594, rs1079572, rs7499814, rs66491176, rs4793579, rs55865255, rs7207216, rs4891023, rs9305268, rs7521902, rs284164, rs4538254, rs1344706, rs10178377, rs9822113, rs4401376, rs6786840, rs13128397, rs11932259, rs9968429, rs1443402, rs4703389, rs4286720, rs11242704, rs9372417, rs6920965, rs208869, rs2041009, rs12680146, rs3847227, rs7038346, rs10962366, rs7043796, rs11006252, rs4746992, rs10887637, rs2003906, rs7926370, rs10844220, rs710681, rs4981030, rs9530834, rs7166130, rs8062124, rs9932649, rs2966063, rs430639, rs11081589, rs2033491, rs4814615, rs885985, rs12403557, rs143290884, rs10932127, rs1032665, rs4580593, rs12640221, rs986039, rs1877731, rs28582382, rs9296249, rs55668741, rs11753921, rs9690126, rs12680842, rs2929843, rs4409435, rs10809234, rs7023738, rs11144120, rs10869499, rs6482847, rs2132966, rs577948, rs3741851, rs11171598, rs9573483, rs12898878, rs78526880, rs12597411, rs62034138, rs67048050, rs4368195, rs3859191, rs349989, rs11871847, rs6037894, rs2207878, rs61778328, rs12759780, rs642307, rs910622, rs33941127, rs1544846, rs10182721, rs1158228, rs2340475, rs13102188, rs6858430, rs9502570, rs9257185, rs9349364, rs62495696, rs4397385, rs1332312, rs13294439, rs7033078, rs1452289, rs7936903, rs1953655, rs7981566, rs17792748, rs61985798, rs8006042, rs883481, rs77359952, rs2305443, rs4787247, rs572858, rs11673399, rs28456308, rs117294, rs357063, rs12473958, rs7580245, rs1440512, rs13314271, rs34819461, rs3805285, rs17030363, rs258129, rs9479343, rs17170324, rs12705317, rs73174654, rs2978213, rs72614682, rs35051342, rs717582, rs11439588, rs72736093, rs4932564, rs918703, rs7499886, rs2058306, rs1785550 and rs6089982.

5. The system for identifying the kinship of the Taiwanese population according to claim 4, wherein the nucleic acid detection unit is a biochip, a chemical reagent kit or a matrix-assisted laser desorption/ionization tandem time-of-flight mass spectrometer.
The system for identifying affinity of Taiwanese population as described in claim 4, wherein the nucleic acid detection unit detects the nucleotide compositions by an enzyme cleavage identification method, a nucleic acid fragment mass difference detection method, a fluorescent probe detection method, a nucleic acid fragment conformation variation method or a nucleic acid sequencing analysis method.

The system for identifying affinity of Taiwanese population as described in claim 4, wherein the nucleic acid extraction unit extracts the main test nucleic acid sample and the tested nucleic acid sample by a column extraction and purification method or a reagent extraction and purification method.
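The calculation unit recited in the system claim (per-site kinship indices, multiplied into a cumulative kinship index, then converted into a kinship probability) can be illustrated with a minimal sketch. The claims do not state the per-site formula, so the one used here is an assumption: a standard parent-child likelihood ratio for a biallelic SNP under Hardy-Weinberg equilibrium, with mutation ignored and a prior of 0.5. The function names and the genotype encoding (number of minor-allele copies) are hypothetical and chosen only for illustration.

```python
from math import prod


def kinship_index(child: int, parent: int, maf: float) -> float:
    """Assumed per-site likelihood ratio (parent-child vs. unrelated).

    Genotypes are encoded as the number of minor-allele copies (0, 1 or 2).
    This formula is an illustrative assumption, not taken from the claims.
    """
    q = maf          # minor allele frequency from the reference database
    p = 1.0 - q      # major allele frequency
    t = parent / 2   # chance that the parent transmits the minor allele
    if child == 2:
        numerator, denominator = t * q, q * q
    elif child == 1:
        numerator, denominator = t * p + (1 - t) * q, 2 * p * q
    else:
        numerator, denominator = (1 - t) * p, p * p
    return numerator / denominator


def cumulative_kinship_index(indices):
    """Multiply the per-site kinship indices, as the claim describes."""
    return prod(indices)


def kinship_probability(cki: float, prior: float = 0.5) -> float:
    """Convert the cumulative index into a probability (assumed prior 0.5)."""
    return cki * prior / (cki * prior + (1 - prior))


# Hypothetical genotype data: (child, parent, minor allele frequency) per site.
sites = [(2, 2, 0.5), (2, 2, 0.5), (1, 1, 0.5)]
indices = [kinship_index(c, g, f) for c, g, f in sites]
cki = cumulative_kinship_index(indices)
probability = kinship_probability(cki)
```

With a minor allele frequency close to 0.5, as the claimed SNP selection requires, both alleles are about equally common, so each site contributes a comparable amount of information. An opposite-homozygote pair yields an index of 0 in this simplified model, which excludes the relationship outright; a practical implementation would need to allow for genotyping error and mutation.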
Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111122256A TWI807861B (en) 2022-06-15 2022-06-15 Method for identifying affinity of taiwanese population and system thereof

Publications (2)

Publication Number Publication Date
TWI807861B (en) 2023-07-01
TW202401445A (en) 2024-01-01
