CN115161403A - Method for judging species affiliation of ancient DNA sample - Google Patents

Publication number
CN115161403A
Authority
CN
China
Prior art keywords
gene
language
ancient
snp
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210564617.0A
Other languages
Chinese (zh)
Inventor
夏薇
张治洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Weihai
Original Assignee
Harbin Institute of Technology Weihai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Weihai

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Theoretical Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for determining the ethnic affiliation of ancient DNA based on single-nucleotide polymorphism (SNP) markers in language genes. Ancient DNA is extracted from an ancient sample, a library is built directly from it, and whole-genome high-throughput sequencing is performed. Information on 147 SNP sites in 13 language genes (FOXP1, FOXP2, CNTNAP2, RBFOX2, TPK1, DCDC2, KIAA0319, TM4SF20, FLNC, ATP2C2, ROBO1, ROBO2, CMIP, DYX1C1, NFXL1) is collected from the sequence under test, the site information is encoded numerically, and principal component analysis (PCA) is performed on it together with a set of control sequences to determine whether the ancient DNA sample belongs to the Hezhe or the Oroqen ethnic group.

Description

Method for judging species affiliation of ancient DNA sample
Technical Field
The invention belongs to the technical field of biological engineering within biotechnology, and in particular relates to a method for characterizing ancient DNA samples by means of multiple single-nucleotide polymorphisms (SNPs) and then determining the ethnic affiliation of the samples.
Background
The Manchu-Tungusic languages are spoken in China in Heilongjiang, northeastern Inner Mongolia and Xinjiang, and abroad in eastern Russia and Mongolia. In China there are five living languages, namely Manchu, Xibe, Hezhe, Ewenki and Oroqen, together with Jurchen, which was historically used by the Jurchen people. In Heilongjiang Province these languages are found in places including Heihe City and Xunke, Huma, Tahe, Fuyu, Tongjiang and Raohe counties; in Inner Mongolia, in the Ewenki Autonomous Banner, the Oroqen Autonomous Banner, Chen Barag Banner, the Morin Dawa Daur Autonomous Banner, Arun Banner and Ergun Left Banner; and in the Xinjiang Uygur Autonomous Region, in Qapqal Xibe Autonomous County, Huocheng County, Tacheng, Yining City and Urumqi City, among others. In the Soviet Union there were eight languages: Evenki, Even, Negidal, Nanai, Ulch, Orok, Udege and Oroch. These languages are distributed in eastern Siberia and the Far East, in the Evenk Autonomous Okrug, the Yakut Autonomous Republic, the Buryat Autonomous Republic, Khabarovsk Krai, Primorsky Krai, Sakhalin, Kamchatka, Magadan and elsewhere. In Mongolia only Ewenki is spoken, in the Barga region. The peoples who use Manchu-Tungusic languages number about 4.45 million, of whom about 4.39 million live in China (1982), 56,400 in the Soviet Union (1979) and somewhat more than 1,000 in Mongolia.
The formation of a language is a lengthy process. If people in different regions speak the same language, or languages of the same family, they probably share a long history of co-evolution (setting aside cases in which colonists or rulers imposed a language or promoted it by administrative means). Likewise, if the language gene polymorphism patterns of populations in different regions are close to each other, it is highly likely that they share a deep common origin on the ancient evolutionary path.
The Hezhe are an ethnic minority with a long history in northeast China. Their language, Hezhe, belongs to the Manchu branch of the Manchu-Tungusic group of the Altaic family (an alternative view assigns it to a different branch). The language has no script of its own; Cyrillic letters are used to record it, and because the Hezhe have long lived alongside the Han, Chinese is in general use. Because they live over a wide area, the Hezhe have had many self-designations, such as "Nanbei" and "Nannaiao". "Hezhe" as the name of the group first appears in a record from 1663 (the second year of the Kangxi reign), and it came into wide use after the publication in 1934 of Ling Chunsheng's book "The Hezhe People of the Lower Songhua River". The Hezhe live mainly on the Sanjiang Plain, formed by the confluence of the Heilongjiang, Songhua and Wusuli rivers, and in the adjoining Wanda Mountains. They are concentrated in three townships and two villages: Jiejinkou Hezhe Township and Bacha Hezhe Township in Tongjiang City, Sipai Hezhe Township in Raohe County (Shuangyashan City), Aoqi Hezhe Village in Jiamusi City, and Zhuaji Hezhe Village in Fuyuan County. According to the China Statistical Yearbook 2021, the Hezhe population in China is 5373.
The name of the Oroqen is said to have first appeared in a written record dated 28 April 1640. After 1683 (the twenty-second year of the Kangxi reign), variant written forms such as "Eluochun" and "Elunchun" appeared many times in the literature. From 1690 (the twenty-ninth year of the Kangxi reign), "Elunchun" (Oroqen) was fixed as the unified name of the group. "Oroqen" is the group's self-designation and means "people who use reindeer": "oron" is the word for reindeer and "cho" is a suffix denoting a person, and the two combine as "oroncho", that is, "Oroqen". Before the middle of the seventeenth century, the Oroqen were spread over a wide area east of Lake Baikal and north of the Heilongjiang, centered on the Jingqili River. The Oroqen are one of the least populous ethnic groups in northeast China. According to the China Statistical Yearbook 2021, the Oroqen population is 9168. The Oroqen language belongs to the Tungusic branch of the Manchu-Tungusic group of the Altaic family; it has no script, and the Oroqen now mainly use Chinese.
The Ewenki (formerly called Tungus or Solon) are an ethnic group of northeast Asia. They live mainly in Siberia in Russia and in Inner Mongolia and Heilongjiang in China, with a small number in Mongolia. In Russia they are called Evenki. "Ewenki" is the group's self-designation and means "people who live in the great mountain forests". Ewenki language and culture are distinctive. The language belongs to the northern branch of the Tungusic languages of the Altaic family; in daily life most Ewenki use their own language, which has no script of its own. Herders generally use the Mongolian script, and farmers widely use Chinese characters. The Ewenki have moved from a nomadic to a settled way of life and engage in animal husbandry. Their traditional culture is very rich, with clothing and food culture the most prominent. According to the China Statistical Yearbook 2021, the Ewenki population in China is 34617.
In the field of human evolution research, this patent reports for the first time that the Hezhe and the Oroqen are very close to, or almost completely consistent with, European Neanderthals (Homo neanderthalensis) in their language gene polymorphism pattern, which suggests that modern and ancient DNA samples from these groups have great potential research value. One reason is that there is evidence that European Neanderthals share many features with Peking Man. It is likely that the Neanderthals came from northern China, in particular from Tungusic populations (including the Hezhe and the Oroqen) in and around the Heilongjiang basin, although the prevailing European view is that Neanderthals from Europe migrated over a long period across Eurasia, hybridized with native archaic humans in East Asia, and so took part in human evolution in East Asia. It is currently accepted internationally that the Neanderthals came from Africa. Because the territory of Europe is relatively narrow, it offered limited depth for coping with the cold climate of glacial periods, so the Neanderthals had to migrate to East Asia or even to Africa. Those that stayed in Europe steadily died out during glaciations, whereas some of those that migrated to East Asia or south to Africa, and several populations descended from them, had a chance to survive. If the Neanderthals came from northeast Asia, the Out of Africa account of human origins would all but collapse, and the whole history of human evolution would have to be rewritten.
The history of human evolution may also need answers from older geological times. In those times the major continents of the world were joined, and a certain kind of ancient ape was in fact the earliest common ancestor. The land then separated into several continents, and the ancient apes of each continent evolved separately but from the same genomic basis, which ensured that the ancient humans that evolved in each region belonged to the same lineage. The periodic cycle of glacial periods made the large-scale climatic features of the continents similar and ensured a degree of interaction and hybridization among land-dwelling ancient apes, so that the apes of all regions evolved in almost the same direction until Homo erectus appeared in more than one place (though with regional differences). There was no major reproductive isolation between these Homo erectus populations, and populations at different stages of evolution existed at the same time. What played the key role later was local climate, which determined which Homo erectus or Homo sapiens populations dispersed first and drove the subsequent rapid fusion, multiregional evolution and local extinction of populations.
The overall picture from the literature is as follows.
(1) International studies, currently dominated by the Western world, indicate that humans originated in Africa; ancient apes evolved into Homo erectus about 2 million years ago; and ancient humans (Homo erectus) left Africa, reaching first Europe and western Asia, then South and East Asia, then farther south to Australia, then North America about 12,000 years ago, and finally South America.
(2) Research led by Chinese scholars (most of it not published in English and so unknown in the West) offers strong evidence that apes and monkeys first arose in southwest China, that ancient apes there evolved into Homo erectus and then into Homo sapiens, and that China holds abundant fossil evidence for every stage. Wushan Man lived in southwest China 2.14 million years ago, and earlier fossils should be found in the future.
(3) The literature indicates the following possibility: in ancient geological times, because Eurasia was connected with Africa, the ancient apes of that time probably migrated or diffused naturally from southwest China to Africa and elsewhere. This would ensure that the ancient apes that evolved into Homo erectus in southwest China and in East Africa had the same ancestors, that is, the same initial genome, and that Homo erectus in the two regions and their descendants belong to the same lineage at the genome level.
(4) The formation of the East African Rift coincides with the second uplift of the Himalayas in southwest China, about 5 to 2 million years ago. Both places underwent huge changes in landform, which posed similar challenges to the local ancient apes: forests disappeared locally, and the apes had to adapt to sparse grassland (East Africa) or river canyons (southwest China). In both places they needed to walk upright as soon as possible in order to survive.
(5) Unless Homo erectus could keep fire for long periods or make fire at will, and could make animal-skin clothing to resist severe cold, it could not survive a single glacial period and would easily die of hypothermia in forests, caves or grasslands. From this perspective, the complex geography of southwest China would more readily have helped ancient apes and ancient humans at various evolutionary stages to escape the severe cold of glacial periods, so conditions for survival were more easily met there than in Africa.
(6) Homo erectus and its descendant populations around the world are unlikely to have died out entirely in each glacial period; more probably a small proportion of the various ancient populations survived (which does not rule out the complete extinction of some). Because there were no large reproductive barriers, almost all of these ancient populations could hybridize and fuse to produce new populations, which ensured the continuity and high similarity of genome sequences during human evolution.
Neanderthals (Homo neanderthalensis) is the general term for a representative group from the middle stage of human evolutionary history; they are classed with Homo sapiens. Their fossils were found in the Neander Valley in Germany. Neanderthals were widely distributed: west to Spain and France in western Europe, east to Uzbekistan in Central Asia, south to Palestine, and north to latitude 53° N. The earliest remains date to about 200,000 years ago and the latest to about 40,000 years ago. Neanderthals are close relatives of the ancestors of modern Europeans. From 120,000 years ago they dominated the whole of Europe, western Asia and northern Africa, but these ancient humans disappeared 24,000 years ago. In 2010 a draft of the Neanderthal genome was released, and studies based on it show that modern people across Eurasia, but not Africans, carry a 1%-4% contribution of Neanderthal gene components. On 3 March 2017 the U.S. journal Science published a paper on late archaic human crania unearthed at Xuchang, China, marking a breakthrough in human evolution research: "Xuchang Man", who lived at the Lingjing site in Xuchang, Henan Province, 100,000 years ago, may be a descendant of archaic humans in China and European Neanderthals.
How the evolution of language ability matches the overall course of human evolution is an important topic. It is reasonably certain that language ability developed significantly during the Homo erectus period, was further improved and differentiated in the Homo sapiens stage, and in late Homo sapiens was the same as that of modern people. Current research finds that: (1) the language gene polymorphism patterns of modern people all over the world appear to be very close, except in Africa; (2) some individual Denisovans and Neanderthals had language gene polymorphism patterns almost as close as those among modern individuals, suggesting that Denisovans and Neanderthals could speak and articulate as modern people do; (3) the language gene polymorphism patterns of modern Africans differ greatly from those of other countries, and some countries, such as Spain and Kenya, have language gene patterns intermediate between those of Africans and those of modern people elsewhere.
In human evolution research, ancient DNA has been a powerful tool. By extracting and carefully sequencing ancient or fossil DNA and applying the quantitative methods of molecular genetics, the evolutionary distance between the corresponding population and other populations with known genome sequences can be inferred accurately. Sequencing is divided into whole-genome sequencing and partial-segment sequencing. Much ancient DNA is scarce and fragmented, which makes it difficult to obtain a whole-genome sequence from such a sample; partial-segment sequencing results can, of course, also be good predictors of evolutionary distance or genetic relationship. However, when the genetic relationships among a large number of complex samples are compared at the same time, algorithms reach their limits, and it is difficult to place some samples precisely. Using a fixed, fairly large set of language gene polymorphism parameters and a set of standard DNA sequence samples of known evolutionary distance or genetic relationship, the invention can better determine, or help determine, the ethnic affiliation of ancient DNA samples of Hezhe or Oroqen origin.
Disclosure of Invention
The invention aims to improve on the prior art by providing a method for determining the ethnic affiliation of ancient DNA based on SNP markers in language genes. Ancient DNA is extracted from an ancient sample, a library is built directly from it, and whole-genome high-throughput sequencing is performed. Information on 147 single-nucleotide polymorphism sites in 13 language genes is collected from the sequence obtained, the site information is encoded numerically, and principal component analysis (PCA) is performed on it together with a set of control sequences to determine whether the ancient DNA sample belongs to the Hezhe or the Oroqen ethnic group.
Technical route of the invention
(1) Extraction of ancient DNA: the method generally used is that of "DNA analysis of an early modern human from Tianyuan Cave, China", Proc Natl Acad Sci USA. 2013 Feb 5;110(6):2223-7. Other known methods can be adopted according to the nature and source of the sample.
(2) Whole-genome high-throughput sequencing: a whole-genome sequencing method suitable for ancient DNA, such as second-generation high-throughput sequencing, is used to obtain a genome sequence file in a format such as fastq.
(3) SNP polymorphisms of the language genes: the 13 language genes involved are FOXP1, FOXP2, CNTNAP2, RBFOX2, TPK1, DCDC2, KIAA0319, TM4SF20, FLNC, ATP2C2, ROBO1, ROBO2, CMIP, DYX1C1 and NFXL1; 4-12 SNP sites are used per gene, for a total of 147 SNP sites. As far as possible, the SNP sites used should be specific to particular regions. The more sites are used, the more easily subtle differences among samples can be identified, and the more accurately the ethnic affiliation of the ancient DNA can be placed.
(4) Control sequences: it is currently accepted internationally that humans originated in Africa; ancient apes evolved into Homo erectus about 2 million years ago; and ancient humans (Homo erectus) left Africa, reaching first Europe and western Asia, then South and East Asia, then farther south to Australia, then North America about 12,000 years ago, and finally South America. This involved two to three dispersals out of Africa, corresponding to different stages of ancient humans (Homo erectus, Homo sapiens, modern humans, etc.). Back-to-Africa events also occurred, but how many is not clear. In particular, how many major human migrations took place between the major continents will require many more years of research. Nevertheless, to place an unknown ancient DNA sample ethnically, control sequences covering several continents or regions are generally required: ancient and modern DNA from Africa, the Americas, Europe, East Asia, South Asia, Southeast Asia, Oceania and so on. Ancient DNA research has only just begun in many regions of the world, and ancient DNA of sufficiently high quality is currently lacking, but African, European and Asian ancient DNA are essential control sequences.
(5) PCA: PCA is carried out with the FactoMineR package and auxiliary packages in the R language. Whether the unknown ancient DNA belongs to the Hezhe or the Oroqen is determined, or the determination is assisted, by its position in the detailed map output by PCA relative to the positions of the control sequences.
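The analysis in step (5) can be sketched as follows. This is a minimal illustration only: the patent uses the FactoMineR package in R, whereas the sketch swaps in a plain NumPy singular value decomposition on standardized columns (FactoMineR's `PCA` also scales variables to unit variance by default). The helper names `pca_scores` and `nearest_group`, the group labels and the toy matrix are all hypothetical, and the nearest-centroid rule is an assumed stand-in for reading the relative position off the PCA map.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Standardize each column, then project samples onto the leading principal components."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0          # constant columns: avoid division by zero
    x = x / std
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def nearest_group(scores, labels, query_index):
    """Return the control label whose centroid in PC space is closest to the query sample."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels) - {labels[query_index]}):
        rows = [i for i, lab in enumerate(labels) if lab == label]
        centroid = scores[rows].mean(axis=0)
        dist = float(np.linalg.norm(scores[query_index] - centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical toy data: rows are samples, columns are numerically encoded SNP sites.
samples = [
    [999000000000, 999000000, 166834000],        # control, group 1
    [999000000000, 999000000, 999000000],        # control, group 1
    [999000000, 999000000000, 999000000999],     # control, group 2
    [999000000, 999000000000, 999000000000],     # control, group 2
    [999000000000, 999000000, 166834000],        # unknown ancient sample
]
labels = ["control_group_1", "control_group_1",
          "control_group_2", "control_group_2", "unknown"]

scores, explained = pca_scores(samples)
assigned = nearest_group(scores, labels, query_index=4)
print(assigned)
```

In practice the judgment described in the patent is made from the plotted map; the centroid distance here only makes that comparison explicit for a toy case.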
Advantages and effects of the invention
1. Specific SNP combinations:
Genes directly related to language function are called language genes. A single-base difference at a specific position of a gene sequence among different individuals is called a single-nucleotide polymorphism (SNP). For example, if a specific position of a gene is A or T in different populations, the position is an A/T polymorphic site of that gene. Such sites show different distributions across ethnic groups, regions and individuals. There are at least ten language genes, each gene can have from several to several tens of related SNPs, and the language genes must interact with at least several tens of other genes (and in turn with the environment) to determine the specific language abilities of an individual. It is therefore conceivable that the biological basis of language ability differs greatly between individuals. This is also the molecular basis for the diversity of language-related phenotypes among individuals of different ethnic groups and regions.
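As a small illustration of the A/T example above, the sketch below tallies how often each base is observed at one polymorphic site in two populations. The function name `base_frequencies` and the observation lists are hypothetical and are not taken from the patent.

```python
from collections import Counter

BASES = ("A", "T", "G", "C")

def base_frequencies(observed_bases):
    """Return the fraction of A, T, G and C among the bases observed at one SNP site."""
    counts = Counter(b.upper() for b in observed_bases if b.upper() in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/T/G/C observations at this site")
    return {base: counts[base] / total for base in BASES}

# Hypothetical observations at one A/T polymorphic site in two populations
population_1 = ["A", "A", "A", "T"]
population_2 = ["T", "T", "T", "T"]

freq_1 = base_frequencies(population_1)
freq_2 = base_frequencies(population_2)
print(freq_1)
print(freq_2)
```

The two populations share the same site but differ in how the bases are distributed, which is the kind of difference the SNP panel is meant to capture.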
Table 1. Language genes to which this patent relates
Figure BDA0003657381840000051
Note: there is corresponding experimental evidence that these 15 human genes are related to language function.
References:
[1] Lai CS et al. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature, 2001, 413(6855): 519-523.
[7] Gialluisi A et al. Genome-wide screening for DNA variants associated with reading and language traits. Genes Brain and Behavior, 2014, 13(7): 686-701.
[8] Wiszniewski W et al. TM4SF20 ancestral deletion and susceptibility to a pediatric disorder of early language delay and cerebral white matter hyperintensities. American Journal of Human Genetics, 2013, 93(2): 197-210.
[19] Bacon C & GA Rappold. The distinct and overlapping phenotypic spectra of FOXP1 and FOXP2 in cognitive disorders. Human Genetics, 2012, 131(11): 1687-1698.
[20] Villanueva P et al. Genome-wide analysis of genetic susceptibility to language impairment in an isolated Chilean population. European Journal of Human Genetics, 2011, 19(6): 687-695.
[21] Fattal I et al. The crucial role of thiamine in the development of syntax and lexical retrieval: a study of infantile thiamine deficiency. Brain, 2011, 134(6): 1720-1739.
[23] Hannula-Jouppi, K. et al. The axon guidance receptor gene ROBO1 is a candidate gene for developmental dyslexia. PLoS Genet. 1, e50 (2005).
[24] Bates, T.C. et al. Genetic variance in a component of the language acquisition device: ROBO1 polymorphisms associated with phonological buffer deficits. Behav. Genet. 41, 50–57 (2011).
[25] Paracchini, S. et al. The chromosome 6p22 haplotype associated with dyslexia reduces the expression of KIAA0319, a novel gene involved in neuronal migration. Hum. Mol. Genet. 15, 1659–1666 (2006).
[26] Newbury, D.F. et al. Investigation of dyslexia and SLI risk variants in reading- and language-impaired subjects. Behav. Genet. 41, 90–104 (2011).
[27] Scerri, T.S. et al. DCDC2, KIAA0319 and CMIP are associated with reading-related traits. Biol. Psychiatry 70, 237–245 (2011).
[28] Villanueva, P. et al. Exome sequencing in an admixed isolated population indicates NFXL1 variants confer a risk for specific language impairment. PLoS Genet. 11, e1004925 (2015).
[29] St Pourcain, B. et al. Common variation near ROBO2 is associated with expressive vocabulary in infancy. Nat. Commun. 5, 4831 (2014).
[30] Newbury, D.F. et al. CMIP and ATP2C2 modulate phonological short-term memory in language impairment. Am. J. Hum. Genet. 85, 264–272 (2009).
[31] Vernes, S.C. et al. A functional genetic link between distinct developmental language disorders. N. Engl. J. Med. 359, 2337–2345 (2008).
[32] Whitehouse, A.J., Bishop, D.V., Ang, Q.W., Pennell, C.E. & Fisher, S.E. CNTNAP2 variants affect early language development in the general population. Genes Brain Behav. 10, 451–456 (2011).
[33] Deffenbacher, K.E. et al. Refinement of the 6p21.3 quantitative trait locus influencing dyslexia: linkage and association analyses. Hum. Genet. 115, 128–138 (2004).
[34] Schumacher, J. et al. Strong genetic evidence of DCDC2 as a susceptibility gene for dyslexia. Am. J. Hum. Genet. 78, 52–62 (2006).
[35] Taipale, M. et al. A candidate gene for developmental dyslexia encodes a nuclear tetratricopeptide repeat domain protein dynamically regulated in brain. Proc. Natl. Acad. Sci. USA 100, 11553–11558 (2003).
[36] Paracchini, S. et al. Analysis of dyslexia candidate genes in the Raine cohort representing the general Australian population. Genes Brain Behav. 10, 158–165 (2011).
[37] Francks, C. et al. A 77-kilobase region of chromosome 6p22.2 is associated with dyslexia in families from the United Kingdom and from the United States. Am. J. Hum. Genet. 75, 1046–1058 (2004).
Table 2. The 147 SNP sites of the language genes
Gene | SNP name | SNP abbreviation
ATP2C2 | rs78371901, rs4782948, rs2435172, rs247885, rs247818, rs74038217, rs62640935, rs62640932, rs62640931, rs62050917, rs16973859, rs13334642, rs4782970 | ATP-1, ATP-10, ATP-11, ATP-12, ATP-13, ATP-2, ATP-3, ATP-4, ATP-5, ATP-6, ATP-7, ATP-8, ATP-9
CMIP | rs201316817, rs34119643, rs16955675, rs2288011, rs1187121850, rs183876152, rs183075361, rs114894868, rs79979027, rs74031247, rs60152409, rs57603843, rs35429777 | CMI-1, CMI-10, CMI-11, CMI-12, CMI-13, CMI-2, CMI-3, CMI-4, CMI-5, CMI-6, CMI-7, CMI-8, CMI-9
CNTNAP2 | rs1637842, rs3194, rs535454043, rs2373284, rs61732853, rs1637841, rs1479837, rs1468370, rs1062072, rs1062071, rs987456, rs700309, rs700308 | CNTN-1, CNTN-10, CNTN-11, CNTN-12, CNTN-13, CNTN-2, CNTN-3, CNTN-4, CNTN-5, CNTN-6, CNTN-7, CNTN-8, CNTN-9
DCDC2 | rs35029429, rs33914824, rs33943110, rs190254728, rs2274305, rs34584835, rs33943110, rs33914824, rs9467075, rs9460973, rs3846827, rs3789219 | DCD-1, DCD-10, DCD-11, DCD-12, DCD-2, DCD-3, DCD-4, DCD-5, DCD-6, DCD-7, DCD-8, DCD-9
FLNC | rs2291569, rs2249128, rs117864464, rs35281128, rs371111092, rs2291568, rs2291566, rs2291565, rs2291563, rs2291562, rs2291561, rs2291560, rs2291558 | FLN-1, FLN-10, FLN-11, FLN-12, FLN-13, FLN-2, FLN-3, FLN-4, FLN-5, FLN-6, FLN-7, FLN-8, FLN-9
FOXP1 | rs76145927, rs17008224, rs147756430, rs75214049, rs17008544, rs17008063, rs11914627, rs7639736, rs1499893, rs1053797, rs144080925 | FOXP1-1, FOXP1-10, FOXP1-11, FOXP1-2, FOXP1-3, FOXP1-4, FOXP1-5, FOXP1-6, FOXP1-7, FOXP1-8, FOXP1-9
FOXP2 | rs10227893, rs144807019, rs182138317, rs61758964, rs10244649, rs12705977, rs61732741, rs61758964, rs62640396, rs73210755, rs1058335, rs61753357, rs7638391 | FOXP2-1, FOXP2-10, FOXP2-11, FOXP2-12, FOXP2-2, FOXP2-3, FOXP2-4, FOXP2-5, FOXP2-6, FOXP2-7, FOXP2-8, FOXP2-9, FXP1
KIAA0319 | rs138160539, rs75720688, rs150584710, rs115399701, rs7770041, rs117692893, rs114195393, rs699461, rs699462, rs699463, rs730860, rs10946705, rs75674723 | KIA-1, KIA-10, KIA-11, KIA-12, KIA-13, KIA-2, KIA-3, KIA-4, KIA-5, KIA-6, KIA-7, KIA-8, KIA-9
NFXL1 | rs1964425, rs920462, rs147017712, rs13152765, rs34323060, rs1822030, rs1822029, rs1812964, rs1545200, rs1440228, rs1371730, rs1036681, rs978094 | NFX-10, NFX-11, NFX-12, NFX-13, NFX-2, NFX-3, NFX-4, NFX-5, NFX-6, NFX-7, NFX-8, NFX-9
ROBO1 | rs34841026, rs77350918, rs6795556, rs35456279 | ROBO-10, ROBO-14, ROBO-15, ROBO-16
ROBO2 | rs11127602, rs78817248, rs144468527, rs17525412, rs10865561, rs5788280, rs3923745, rs3923744, rs1163750, rs1163749, rs1163748, rs1031377 | ROBO-1, ROBO-11, ROBO-12, ROBO-13, ROBO-2, ROBO-3, ROBO-4, ROBO-5, ROBO-6, ROBO-7, ROBO-8, ROBO-9
TM4SF20 | rs6724955, rs137891000, rs44675173, rs4675172, rs4673192, rs4438464, rs4428010, rs4408717, rs13415654, rs80305648 | TM1, TM10, TM2, TM3, TM4, TM5, TM6, TM7, TM8, TM9
TPK1 | rs113536847, rs77358162, rs79464600, rs77358162, rs28380423, rs17170295, rs12333969, rs6953807, rs67644764 | TPK-1, TPK10, TPK-2, TPK-3, TPK-4, TPK-5, TPK-6, TPK-7, TPK-8
2. Specific control sequences:
The control sequences provided by the present invention include genomic sequences of modern humans from Africa, America, Europe, East Asia, South Asia, Southeast Asia, and the Atlantic, together with a set of ancient DNA genomic sequences from East Asia, Europe, and Africa. See Table 3, which contains 3 test samples and 73 control samples.
Table 3. Specific control sequences
[Table 3 is reproduced as images in the original publication: Figures BDA0003657381840000081, BDA0003657381840000091, and BDA0003657381840000101.]
3. Specific PCA calculation methods:
The invention uses the PCA analysis provided on the R language platform, specifically the FactoMineR package, which normalizes variables that have different units and different value ranges. All SNP site information is digitized as follows: the values 1000 × A frequency, 1000 × T frequency, 1000 × G frequency, and 1000 × C frequency are each written as a three-digit field and concatenated in that order, and leading zeros drop out when the result is read as a number. For example, if the site information of an SNP is A = 0, T = 0.166, G = 0.834, C = 0, the quantized value is 166834000; A = 0.166, T = 0, G = 0, C = 0.834 is quantized as 166000000834. If the frequency of a base is 1, its field is written as 999: A = 0, T = 1, G = 0, C = 0 is quantized as 999000000, and A = 1, T = 0, G = 0, C = 0 as 999000000000. While collecting the language gene SNP information, it was found that a given SNP site in the genome sequence of a single sample can have both a sequence containing A and a sequence containing C; such a site is recorded as A = 1, T = 0, G = 0, C = 1 and quantized as 999000000999. Likewise, a language gene SNP site determined to be A = 0, T = 1, G = 0, C = 1 is quantized as 999000999, and A = 1, T = 0, G = 1, C = 1 as 999000999999.
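The digitization rule above can be sketched in a few lines of Python. This is a minimal illustration that reproduces the worked examples in the text; the function name `quantize_snp` and the use of rounding to obtain each three-digit field are assumptions, since the patent does not specify how fractional values are rounded.

```python
def quantize_snp(a, t, g, c):
    """Encode the A, T, G, C values of one SNP site as a single integer.

    Each value is multiplied by 1000 and written as a three-digit field;
    a value of 1 is capped at 999 so the field stays three digits wide.
    """
    fields = []
    for freq in (a, t, g, c):
        value = min(round(freq * 1000), 999)  # rounding is an assumption
        fields.append(f"{value:03d}")
    # Reading the 12-character string as a number drops leading zeros,
    # which is why A = 0 sites yield shorter codes in the text.
    return int("".join(fields))


print(quantize_snp(0, 0.166, 0.834, 0))   # 166834000
print(quantize_snp(0.166, 0, 0, 0.834))   # 166000000834
print(quantize_snp(1, 0, 0, 1))           # 999000000999
```

Because leading zero fields vanish, codes for different sites can differ in length, so the values are best treated as numeric inputs to PCA rather than compared as strings.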
Detailed Description
1. Extraction of ancient DNA: using a commercial ancient DNA extraction kit (QIAquick kit), the specific steps are as follows:
(1) Weigh 0.1 g of sample, add 1 ml of 10% (v/v) SDS, 4 ml of 0.5 M EDTA, and 100 μl of 10 mg/ml proteinase K, and incubate at 50 °C for 24 h with shaking;
(2) Centrifuge the lysate at 8000 r/min for 8 min;
(3) Add 450 μl of supernatant to a Centricon YM-10 unit and centrifuge at 6500 r/min; repeat several times until the supernatant is concentrated to 100 μl;
(4) Add 5 volumes of QIAquick Buffer PB and mix with 2 μl PHI for 30 s;
(5) Load onto a QIAquick spin column and centrifuge at 13000 r/min for 1 min;
(6) Discard the filtrate and centrifuge at 13000 r/min for 1 min;
(7) Add 500 μl of Buffer PE and centrifuge at 13000 r/min for 1 min; wash 2 times in total;
(8) Centrifuge at 13000 r/min for 3 min to dry the filter;
(9) Add 100 μl of Buffer EB, incubate at 53 °C for 8 min, centrifuge at 13000 r/min for 2 min, collect the filtrate, and store it at -20 °C;
(10) Determine the concentration and purity of the ancient DNA solution extracted by the kit method using a Thermo NanoDrop 2000 spectrophotometer.
Depending on the source and amount of the sample, ancient DNA extraction can also be performed according to the following literature methods: an extraction method and kit for ancient DNA in hide, CN201810465329.3; an ancient DNA capture method for wooden cultural relics, CN202210149079.9; an ancient DNA library construction method, CN202110894520.1; an efficient ancient DNA extraction method, CN202110652184.X; a sex identification method for ancient human remains with extremely low DNA content, CN201910185048.7; a method for rapidly extracting ancient DNA with SiO2-loaded magnetic beads, CN201310675886.5; a method for identifying and analyzing ancient DNA samples, CN201710667605.X; or DNA analysis of an early modern human from Tianyuan Cave, China, Proc Natl Acad Sci USA. 2013 Feb 5;110(6):2223-7.
2. High-throughput sequencing: ancient DNA does not need fragmentation; library construction, end modification, adapter ligation, amplification, flow-cell reaction, and signal data acquisition are carried out directly to obtain raw data. The raw data are processed with the relevant platform software to obtain whole-genome sequence files in fastq or a similar format;
3. Obtaining SNP site information and digitizing it:
For the 147 sites in total across the 13 language genes, the sequence information of each SNP is downloaded from the dbSNP database (https://www.ncbi.nlm.nih.gov/SNP/). dbSNP essentially records modern human SNP information, so this patent in essence uses modern human language gene polymorphism information to judge the relative evolutionary distance between different human genome samples, and thereby determines, or assists in determining, the evolutionary classification of a specific sample. The 18-25 bases immediately before and after each site are then used as query sequences to find the specific base at that SNP in a fastq sequence file. In general, the larger the fastq genome sequence file the better, but its usable size is limited by the memory capacity of the computer; fastq files of 2G to 200G can be used for SNP information retrieval. For example, if a 200G file exceeds the memory of the computer, it can be divided into 2 smaller files that are searched separately. Because ancient samples have degraded severely over a long time, an ancient DNA genome sequence is often only a few G in size, but the ancient genomes used as control sequences were sequenced at high quality, and their total sequence size can exceed 200G. The 147 SNP sites were selected mainly on the basis of information in the 1000 Genomes/Ensembl or gnomAD genomes r3.0 databases; the core selection criterion is that the frequencies of the selected SNPs differ as much as possible between ethnic groups, which helps to distinguish different samples in the subsequent analysis.
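The flank-based lookup described above can be sketched as follows. This is a minimal illustration rather than the patent's actual retrieval tool: the flanking sequences and reads are invented, the function names are hypothetical, only exact forward-strand matches are counted, and the file is streamed line by line so that its size is not bounded by memory.

```python
import io
import re
from collections import Counter


def count_alleles(fastq_handle, left_flank, right_flank):
    """Stream a FASTQ file and count the base found between the two flanks."""
    pattern = re.compile(
        re.escape(left_flank) + "([ACGT])" + re.escape(right_flank)
    )
    counts = Counter()
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 1:  # bases sit on the 2nd line of each 4-line record
            continue
        match = pattern.search(line.strip().upper())
        if match:
            counts[match.group(1)] += 1
    return counts


def presence_calls(counts):
    """Record each base as 1 if it was seen at the site in any read, else 0."""
    return {base: int(counts[base] > 0) for base in "ATGC"}


# Hypothetical 18-base flanks and reads, for illustration only.
LEFT, RIGHT = "ACGTTGCAAGCTTAGGCA", "TTGACCGTAGGCTAACGT"
reads = [
    "GG" + LEFT + "A" + RIGHT + "CC",   # read carrying A at the site
    "T" + LEFT + "C" + RIGHT,           # read carrying C at the site
    "ACGTACGTACGTACGTACGT",             # unrelated read
]
demo_fastq = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(reads)
)

counts = count_alleles(io.StringIO(demo_fastq), LEFT, RIGHT)
calls = presence_calls(counts)
print(dict(counts), calls)
```

In this toy input the site is seen once with A and once with C, which corresponds to the A = 1, T = 0, G = 0, C = 1 case in the digitization scheme. A practical version would also need to search the reverse complement and tolerate sequencing errors in the flanks.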
All SNP site information is digitized as follows: the values 1000 × A frequency, 1000 × T frequency, 1000 × G frequency, and 1000 × C frequency are each written as a three-digit field and concatenated in that order, and leading zeros drop out when the result is read as a number. For example, if the site information of an SNP is A = 0, T = 0.166, G = 0.834, C = 0, the quantized value is 166834000; A = 0.166, T = 0, G = 0, C = 0.834 is quantized as 166000000834. If the frequency of a base is 1, its field is written as 999: A = 0, T = 1, G = 0, C = 0 is quantized as 999000000, and A = 1, T = 0, G = 0, C = 0 as 999000000000. While collecting the language gene SNP information, it was found that a given SNP site in the genome sequence of a single sample can have both a sequence containing A and a sequence containing C; such a site is recorded as A = 1, T = 0, G = 0, C = 1 and quantized as 999000000999. Likewise, a language gene SNP site determined to be A = 0, T = 1, G = 0, C = 1 is quantized as 999000999, and A = 1, T = 0, G = 1, C = 1 as 999000999999;
4. PCA calculation: Principal component analysis (PCA) extracts the important information from a multivariate data table and expresses it as a set of new variables called principal components. These new variables are linear combinations of the original variables, and the number of principal components is less than or equal to the number of original variables. The goal of PCA is to identify the directions (the principal components) along which the data vary the most, reducing the dimensionality of multivariate data to two or three principal components that can be visualized graphically with minimal loss of information. PCA is one of the dimensionality reduction methods used in machine learning, but it is suited only to linear data; for nonlinear data, t-SNE is recommended. The SNP data referred to in this patent are linear data. In general, the main purposes of principal component analysis are to identify hidden patterns in a data set, to reduce the dimensionality of the data by eliminating noise and redundancy, and to identify correlated variables. Variables are usually scaled (i.e., normalized) in principal component analysis. This is particularly advisable when the variables are measured on different scales (e.g., kilograms, kilometers, centimeters); otherwise the PCA output is severely distorted. The goal of scaling is to make the variables comparable. Typically, the variables are scaled to have a standard deviation of 1 and a mean of zero. Data normalization is widely used in gene expression data analysis prior to PCA and cluster analysis.
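To make the role of scaling concrete, the following Python sketch standardizes two variables measured in very different units and then extracts the first principal component by power iteration on their correlation matrix. It is an illustration only and is independent of the R workflow used in this patent; the data, the function names, and the choice of the population standard deviation are all assumptions.

```python
import math
import statistics


def standardize(columns):
    """Scale each variable (one list per variable) to mean 0 and SD 1."""
    scaled = []
    for col in columns:
        mean = statistics.mean(col)
        sd = statistics.pstdev(col)  # population SD; this convention is an assumption
        scaled.append([(x - mean) / sd for x in col])
    return scaled


def correlation_matrix(z):
    """Correlation matrix of already standardized variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def leading_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


# Made-up data: one variable in kilograms, one in centimeters.
weight_kg = [50, 60, 70, 80]
distance_cm = [100000, 120000, 150000, 160000]

z = standardize([weight_kg, distance_cm])
corr = correlation_matrix(z)
eigenvalue, vector = leading_component(corr)
share = eigenvalue / len(corr)  # fraction of total variance on the first component
print(round(share, 3), [round(x, 3) for x in vector])
```

Without standardization the centimeter variable would dominate the first component purely because of its numeric range; after scaling, both variables load equally on it.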
By default, the R package FactoMineR automatically standardizes the data before PCA. FactoMineR extends traditional multivariate statistical methods from the perspective of exploratory analysis (describing, plotting, and visualizing data sets) and includes (1) dimension reduction methods: principal component analysis (PCA), factor analysis (FA, including multiple factor analysis MFA, hierarchical multiple factor analysis HMFA, and factor analysis of mixed data FAMD), and correspondence analysis (CA, including multiple correspondence analysis MCA); and (2) cluster analysis methods: hierarchical clustering, k-means clustering, and model-based clustering. FactoMineR integrates the results of multivariate analysis well and can take into account different types of variables (quantitative or categorical), different types of data structures (partitions of variables, hierarchies of variables, partitions of individuals), and supplementary information (supplementary individuals and variables). The factoextra package complements FactoMineR: built on another R package, ggplot2, it takes the results computed by FactoMineR and produces more attractive visualizations of the multivariate analysis. The specific calculation and visualization process is as follows:
# Load the PCA and plotting packages
library(FactoMineR)
library(factoextra)
library(ggplot2)
library(ggrepel)
# Read the digitized SNP table and transpose it (PCA treats each row as one individual)
country <- read.delim('C:/RBook/20220516fastqSNPdata.txt', row.names = 1, sep = '\t')
country <- t(country)
# Run PCA on standardized variables, keeping the first two components
country.pca <- PCA(country, ncp = 2, scale.unit = TRUE, graph = FALSE)
plot(country.pca)
# Coordinates of each sample on the first two components
pca_sample <- data.frame(country.pca$ind$coord[, 1:2])
head(pca_sample)
# Percentage of variance explained by components 1 and 2
pca_eig1 <- round(country.pca$eig[1, 2], 2)
pca_eig2 <- round(country.pca$eig[2, 2], 2)
pca_eig1
pca_eig2
# Read the sample grouping and align it with the PCA rows
group <- read.delim('C:/RBook/group3.txt', row.names = 1, sep = '\t', check.names = FALSE)
group <- group[rownames(pca_sample), ]
pca_sample <- cbind(pca_sample, group)
pca_sample$samples <- rownames(pca_sample)
head(pca_sample)
# Scatter plot colored by group, with non-overlapping sample labels
ggplot(data = pca_sample, aes(x = Dim.1, y = Dim.2)) +
  geom_point(aes(color = group), size = 3) +
  scale_color_manual(values = c('purple', 'red', 'green', 'blue', 'brown', 'pink', 'yellow', 'orange', 'grey')) +
  theme(panel.grid = element_blank(), panel.background = element_rect(color = 'black', fill = 'transparent'), legend.key = element_rect(fill = 'transparent')) +
  labs(x = paste('PCA1:', pca_eig1, '%'), y = paste('PCA2:', pca_eig2, '%'), color = '') +
  geom_text_repel(aes(label = samples), size = 3, show.legend = FALSE, box.padding = unit(0.25, 'lines'))
* Note: the number of colors must correspond to the number of sample groups; colors can be chosen from named colors such as orange, grey, yellow, black, pink, green, brown, blue, and the like.
Table 1. List of language genes to which this patent relates
Table 2. 147 SNP sites of the language genes
Table 3. Specific control sequences
Drawings
FIG. 1. The Huchiji (ancient DNA and modern DNA), the Elunchun (modern DNA), and specific Neanderthals appear in a specific region of the PCA diagram. The ancient DNA sample (dvi, Huchiji) lies near Neanderthals nd2 and nd3 in the PCA analysis; the DNA sample c4 (Huchiji) lies near Neanderthals nd1 and nd5; the DNA sample c5 (Elunchun) lies near Neanderthals nd2 and nd3. Example 1: control samples (see Table 3) plus a set of test samples (also listed in Table 3). To judge the ethnicity of a sample: if the sample lies at the far left of the PCA diagram, near the positions of the Neanderthals nd1, nd2, nd3, and nd5, this supports judging it to be a DNA sample from the Huchiji or Elunchun ethnic group.
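The figure description judges proximity by eye on the PCA diagram. As a hedged illustration of how "near" could be made explicit, the sketch below ranks control samples by Euclidean distance from a test sample in the plane of the first two components. The coordinates are invented for the example, the function name is hypothetical, and a distance ranking is an assumption rather than a procedure stated in the patent.

```python
import math


def nearest_controls(coords, sample, k=2):
    """Return the k sample names closest to `sample` in the (Dim.1, Dim.2) plane."""
    sx, sy = coords[sample]
    distances = [
        (math.hypot(x - sx, y - sy), name)
        for name, (x, y) in coords.items()
        if name != sample
    ]
    return [name for _, name in sorted(distances)[:k]]


# Hypothetical PCA coordinates (Dim.1, Dim.2); not taken from the patent's data.
coords = {
    "dvi": (-8.0, 1.0),
    "nd2": (-7.5, 1.2),
    "nd3": (-8.4, 0.7),
    "eur1": (3.0, -2.0),
    "afr1": (5.0, 4.0),
}

closest = nearest_controls(coords, "dvi")
print(closest)  # ['nd3', 'nd2']
```

Distances in a two-component projection ignore the variance carried by the remaining components, so a ranking of this kind can at most assist a judgment.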

Claims (3)

1. A method for determining the ethnicity of ancient DNA based on SNP markers of 147 language gene polymorphisms, characterized in that the method comprises the following steps:
(1) Extracting ancient sample DNA;
(2) High-throughput genome sequencing;
(3) Extracting and quantifying the language gene polymorphism SNPs in the standard DNA sequence and the ancient DNA sequence;
(4) Principal component analysis (PCA);
(5) Determining whether the ancient DNA sample belongs to the Huchiji or Elunchun ethnic group according to its distance from the ancient Neanderthals in the PCA diagram.
2. The method for determining the ethnicity of ancient DNA based on SNP markers of 147 language gene polymorphisms according to claim 1, wherein the language genes comprise FOXP1, FOXP2, CNTNAP2, RBFOX2, TPK1, DCDC2, KIAA0319, TM4SF20, FLNC, ATP2C2, ROBO1, ROBO2, CMIP, DYX1C1 and NFXL1.
3. The method for determining the ethnicity of ancient DNA based on SNP markers of 147 language gene polymorphisms according to claim 1, wherein:
the SNPs of the language gene TPK1 are rs113536847, rs77358162, rs79464600, rs77358162, rs28380423, rs17170295, rs12333969, rs6953807 and rs67644764;
the SNPs of the language gene TM4SF20 are rs6724955, rs137891000, rs44675173, rs4675172, rs4673192, rs4438464, rs4428010, rs4408717, rs13415654 and rs80305648;
the SNPs of the language gene ROBO2 are rs11127602, rs78817248, rs144468527, rs17525412, rs10865561, rs5788280, rs3923745, rs3923744, rs1163750, rs1163749, rs1163748 and rs1031377;
the SNPs of the language gene ROBO1 are rs34841026, rs77350918, rs6795556 and rs35456279;
the SNPs of the language gene NFXL1 are rs1964425, rs920462, rs147017712, rs13152765, rs34323060, rs1822030, rs1822029, rs1812964, rs1545200, rs1440228, rs1371730, rs1036681 and rs978094;
the SNPs of the language gene KIAA0319 are rs138160539, rs75720688, rs150584710, rs115399701, rs7770041, rs117692893, rs114195393, rs699461, rs699462, rs699463, rs730860, rs10946705 and rs75674723;
the SNPs of the language gene FOXP2 are rs10227893, rs144807019, rs182138317, rs61758964, rs10244649, rs12705977, rs61732741, rs61758964, rs62640396, rs73210755, rs1058335, rs61753357 and rs7638391;
the SNPs of the language gene FOXP1 are rs76145927, rs17008224, rs147756430, rs75214049, rs17008544, rs17008063, rs11914627, rs7639736, rs1499893, rs1053797 and rs144080925;
the SNPs of the language gene FLNC are rs2291569, rs2249128, rs117864464, rs35281128, rs371111092, rs2291568, rs2291566, rs2291565, rs2291563, rs2291562, rs2291561, rs2291560 and rs2291558;
the SNPs of the language gene DCDC2 are rs35029429, rs33914824, rs33943110, rs190254728, rs2274305, rs34584835, rs33943110, rs33914824, rs9467075, rs9460973, rs3846827 and rs3789219;
the SNPs of the language gene CNTNAP2 are rs1637842, rs3194, rs535454043, rs2373284, rs61732853, rs1637841, rs1479837, rs1468370, rs1062072, rs1062071, rs987456, rs700309 and rs700308;
the SNPs of the language gene CMIP are rs201316817, rs34119643, rs16955675, rs2288011, rs1187121850, rs183876152, rs183075361, rs114894868, rs79979027, rs74031247, rs60152409, rs57603843 and rs35429777;
the SNPs of the language gene ATP2C2 are rs78371901, rs4782948, rs2435172, rs247885, rs247818, rs74038217, rs62640935, rs62640932, rs62640931, rs62050917, rs16973859, rs13334642 and rs4782970.
CN202210564617.0A 2022-05-23 2022-05-23 Method for judging species affiliation of ancient DNA sample Pending CN115161403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210564617.0A CN115161403A (en) 2022-05-23 2022-05-23 Method for judging species affiliation of ancient DNA sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210564617.0A CN115161403A (en) 2022-05-23 2022-05-23 Method for judging species affiliation of ancient DNA sample

Publications (1)

Publication Number Publication Date
CN115161403A true CN115161403A (en) 2022-10-11

Family

ID=83482721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210564617.0A Pending CN115161403A (en) 2022-05-23 2022-05-23 Method for judging species affiliation of ancient DNA sample

Country Status (1)

Country Link
CN (1) CN115161403A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108866171A (en) * 2017-05-10 2018-11-23 深圳华大基因研究院 A kind of species identification method based on new-generation sequencing
CN109402241A (en) * 2017-08-07 2019-03-01 深圳华大基因研究院 Identification and the method for analyzing ancient DNA sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
季林丹;姚彬彬;励佶佚;徐进;张亚平;: "人类群体环境适应的古DNA研究进展", 科学通报, no. 09, 30 March 2017 (2017-03-30), pages 28 - 35 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination