CN107475351B - Screening method of high-contribution pathogenic gene of rheumatoid arthritis - Google Patents


Info

Publication number
CN107475351B
Authority
CN
China
Prior art keywords
rheumatoid arthritis
dna
sequencing
group
healthy control
Legal status
Active
Application number
CN201610403048.6A
Other languages
Chinese (zh)
Other versions
CN107475351A (en)
Inventor
眭维国 (Sui Weiguo)
戴勇 (Dai Yong)
薛雯 (Xue Wen)
侯显良 (Hou Xianliang)
Current Assignee
Hui Weiguo
Original Assignee
Hui Weiguo
Application filed by Hui Weiguo
Priority to CN201610403048.6A
Publication of CN107475351A
Application granted
Publication of CN107475351B
Legal status: Active

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Abstract

A method for screening pathogenic genes that contribute highly to rheumatoid arthritis comprises the following steps: establishing a rheumatoid arthritis patient group and a healthy control group, each consisting of peripheral blood specimens collected from multiple different individuals; extracting DNA from the peripheral blood specimens; preparing a DNA pool for the patient group and a DNA pool for the healthy control group; determining the content, quality and integrity of each DNA pool and proceeding to the next step only if the pool is usable; constructing a DNA library; sequencing the patient group and the healthy control group separately by whole genome re-sequencing to obtain a sequencing result for each group; and comparing the sequencing result of the patient group with that of the healthy control group to screen out high-contribution gene variation sites. The method can identify variation sites in high-contribution pathogenic genes, which is beneficial for early treatment and prevention of the disease.

Description

Screening method of high-contribution pathogenic gene of rheumatoid arthritis
Technical Field
The invention relates to a method for obtaining, as an intermediate result, information on pathogenic genes that contribute highly to rheumatoid arthritis, and in particular to a method for screening such genes.
Background
Rheumatoid arthritis (RA) is an autoimmune disease with a high incidence worldwide, including in China, and it affects far more women than men. RA has long been one of the most common diseases disrupting people's daily lives. It is characterized mainly by symmetric, destructive joint involvement in which bones and joints are eroded. Mild cases present with intermittent pain, restricted movement and stiff limbs; severe cases present with difficulty walking, skeletal deformity, irreversible pathological changes with paroxysmal pain, and inflammation of multiple tissues and organs. RA imposes a heavy economic and psychological burden on families. The disease can occur at any age and in any region, but it is especially common in northern and coastal areas, reflecting an interplay between genetic factors and a humid environment.
Rheumatoid arthritis is closely associated with microbial infection. Bacterial invasion can aggravate RA lesions, possibly because substances secreted after invasion alter the structure of proteins. In recent years, microorganisms such as herpesviruses, staphylococci and zoster virus have been isolated from the joints of RA patients, attracting considerable attention, and it has been tentatively concluded that certain bacteria and viruses can trigger or modify RA pathology and worsen the condition. Sampling studies in families with a clear hereditary tendency toward RA have located a high-frequency RA gene in region 21 to 23 of human chromosome 6. An important pathogenic factor, rheumatoid factor (RF), is recognized by antibodies secreted by B cells and forms immune complexes that activate complement; when these complexes enter a joint, enzymatic hydrolysis damages the joint tissue, producing the earliest pathological changes. However, RF was later found in both patients and healthy individuals, so it cannot be regarded as specific to the disease. Research has also shown marked methylation of the genomic DNA of RA patients, which causes abnormal expression of many genes and affects joints and organs throughout the body. Over the long course of RA research, several influential theories have emerged. One holds that people carrying certain HLA-DR alleles are more susceptible to the disease, mainly because these alleles encode similar amino acid sequences that act as receptors in the cell-surface response mechanism. Another, the LMP theory, notes that the LMP2 and LMP7 genes are polymorphic and associated with many diseases, and it has been speculated that genetic variation at the LMP loci may be linked to the pathogenesis of RA.
Although research on rheumatoid arthritis has intensified, its pathogenic mechanism remains the central unresolved question. Pathogenic genes have been identified only preliminarily, and the disease is considered to involve both heredity and environment. Over the past decades, the methods found, such as drug therapy and laser therapy, have only delayed onset or relieved symptoms. In clinical diagnosis, rheumatoid factor and anti-CCP antibody are routinely included in blood tests. HLA gene testing usually proceeds as follows: peripheral blood mononuclear cells are isolated, DNA is extracted by the salting-out method, the target gene is amplified with suitable primers, and the products are analyzed by agarose gel electrophoresis. With the development of molecular biology, research into RA has entered a systematic stage that focuses on its genetic characteristics and susceptibility genes, and researchers have begun to apply sequencing technology to study RA mechanisms at the molecular level and so analyze the disease more deeply by effective means.
Rheumatoid arthritis seriously affects people's lives and remains a disease that must be investigated.
Disclosure of Invention
In view of the above, it is necessary to provide a method for screening pathogenic genes that contribute highly to rheumatoid arthritis.
A method for screening pathogenic genes that contribute highly to rheumatoid arthritis comprises the following steps:
S10: establishing a rheumatoid arthritis patient group and a healthy control group, each consisting of peripheral blood specimens collected from multiple different individuals;
S20: extracting DNA from the peripheral blood specimens;
S30: preparing a DNA pool for the rheumatoid arthritis patient group and a DNA pool for the healthy control group;
S40: determining the content, quality and integrity of each DNA pool, and proceeding to the next step only if the pool is usable;
S50: constructing a DNA library;
S60: sequencing the rheumatoid arthritis patient group and the healthy control group separately by whole genome re-sequencing to obtain a sequencing result for each group;
S70: comparing the sequencing result of the rheumatoid arthritis patient group with that of the healthy control group, and screening out high-contribution gene variation sites of rheumatoid arthritis.
In one embodiment, in step S40, the content and quality of the specimen DNA in the DNA pool are determined with a spectrophotometer.
In one embodiment, in step S40, the integrity of the specimen DNA in the DNA pool is evaluated by agarose gel electrophoresis.
In one embodiment, in step S40, it is determined whether the OD260/OD280 ratio of the specimen DNA in the DNA pool lies between 1.8 and 2.0; if so, the next step is executed.
In one embodiment, in step S20, the DNA extracted from the peripheral blood specimens is stored frozen at -20 ℃.
In one embodiment, in step S60, whole genome re-sequencing is performed on the Illumina HiSeq 2000 platform.
In one embodiment, in step S20, the peripheral blood specimen is centrifuged as part of extracting its DNA.
In one embodiment, in step S20, the pellet is collected after centrifugation.
In one embodiment, in step S20, a protease reagent is added to the pellet.
In one embodiment, in step S20, the pellet with added protease reagent is incubated in a constant-temperature water bath.
By screening for high-contribution pathogenic genes of rheumatoid arthritis, the method identifies variation sites in those genes. This promotes further research on the disease, provides a theoretical basis for understanding its treatment and physiological mechanism, and supports early treatment and prevention.
Drawings
FIG. 1 is a flowchart of the steps of a method for screening high-contribution pathogenic genes of rheumatoid arthritis according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
The invention aims to discover variation sites in high-contribution pathogenic genes of rheumatoid arthritis (RA) by screening for such genes, thereby promoting further research on the disease, providing a theoretical basis for the treatment and physiological mechanism of RA, and benefiting early treatment and prevention. As shown in FIG. 1, one embodiment of the present invention is a method for screening pathogenic genes that contribute highly to rheumatoid arthritis, comprising the following steps:
S10: establishing a rheumatoid arthritis patient group and a healthy control group, each consisting of peripheral blood specimens collected from multiple different individuals;
S20: extracting DNA from the peripheral blood specimens;
S30: preparing a DNA pool for the rheumatoid arthritis patient group and a DNA pool for the healthy control group;
S40: determining the content, quality and integrity of the specimen DNA in each DNA pool, and proceeding to the next step only if the DNA is usable;
S50: constructing a DNA library;
S60: sequencing the rheumatoid arthritis patient group and the healthy control group separately by whole genome re-sequencing to obtain a sequencing result for each group;
S70: comparing the sequencing result of the rheumatoid arthritis patient group with that of the healthy control group, and screening out high-contribution gene variation sites of rheumatoid arthritis.
In another embodiment, the method for screening pathogenic genes that contribute highly to rheumatoid arthritis comprises the following steps:
S10: Establish a rheumatoid arthritis patient group and a healthy control group, each consisting of peripheral blood specimens collected from multiple different individuals.
For example, in the rheumatoid arthritis patient group each specimen consists of 5 ml of peripheral blood from a patient; in the healthy control group each specimen consists of 5 ml of peripheral blood from a healthy person.
For example, each embodiment of the present invention used whole blood samples from two groups: 100 rheumatoid arthritis patients (the patient group) and 100 healthy persons without rheumatoid arthritis (the healthy control group). The selected subjects were no older than 45 years and were in good health with no unhealthy lifestyle habits. All specimens were collected at the 181st Hospital of the Chinese People's Liberation Army with the approval of that hospital's ethics committee, and all subjects gave their consent and signed informed consent forms before specimen collection.
S20: Extract DNA from the peripheral blood specimens.
For example, in step S20 the extracted DNA is stored frozen at -20 ℃; in another example, the peripheral blood specimen is centrifuged as part of DNA extraction; in another example, the pellet is collected after centrifugation; in another example, a protease reagent is added to the pellet; and in another example, the pellet with added protease reagent is incubated in a constant-temperature water bath.
For example, the step S20 specifically includes the following steps:
S21: Add 200 μL of peripheral blood to a microcentrifuge tube.
S22: Add 400 μL of cell lysis solution to the tube, shake thoroughly, and let stand at room temperature for 10 minutes. Centrifuge at 9000 rpm for 1 minute in a high-speed centrifuge, discard the supernatant and retain the pellet. Add 250 μL of buffer and mix by shaking, then add 20 μL of protease reagent and incubate in a constant-temperature water bath for 10 minutes to lyse the cells completely.
S23: After the solution has stood for a while and become clear, remove the droplets from the inside of the tube, add 200 μL of absolute ethanol, and shake thoroughly to mix, giving the sample solution.
S24: Load the sample solution onto an adsorption (spin) column, centrifuge at high speed for 30 s, discard the flow-through, and return the column to the collection tube.
S25: Add 500 μL of buffer A to the column, centrifuge at 12000 rpm for 30 s, discard the flow-through, and return the column to the collection tube.
S26: Add 500 μL of buffer B to the column, centrifuge at 12000 rpm for 30 s, discard the flow-through, and return the column to the collection tube.
S27: Add 500 μL of buffer C to the column, centrifuge at 12000 rpm for 30 s, discard the flow-through, and return the column to the collection tube. Centrifuge again at 12000 rpm for 3 minutes, discard the flow-through, and let the column stand for several minutes so that the residual liquid dries.
S28: Place the column in a new 1.5 ml microcentrifuge tube and discard the old collection tube. Add 200 μL of elution buffer, let stand at room temperature for 1 minute, then centrifuge at 6000 × g (8000 rpm) for 1 minute. Collect the eluted DNA as the specimen DNA and store it in a refrigerator until use.
S29: Repeat steps S21 to S28 to obtain multiple specimen DNAs for the rheumatoid arthritis patient group and for the healthy control group.
Through steps S21 to S29, several specimen DNAs were obtained for the rheumatoid arthritis patient group and the healthy control group.
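Steps S22 to S28 give centrifugation speeds in rpm, and S28 quotes 6000 × g as equivalent to 8000 rpm. That equivalence depends on the rotor radius, which the text does not state. The following minimal Python sketch uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in centimetres); the 8.4 cm radius in the usage block is a hypothetical value chosen only to show that a typical microcentrifuge rotor reproduces the quoted pair.

```python
import math

# Standard conversion constant for RCF (multiples of g) with radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF on a rotor of the given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 8.4  # hypothetical rotor radius in cm; the source does not give one
    for rpm in (8000, 9000, 12000):  # speeds used in steps S22 to S28
        print(f"{rpm} rpm -> {rcf_from_rpm(rpm, radius):.0f} x g")
```

On such a rotor, 8000 rpm comes out at roughly 6000 × g; a rotor with a different radius gives a different force at the same rpm, which is why g-force is the more portable unit.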
S30: preparing a DNA pool of a rheumatoid arthritis patient group and a DNA pool of a healthy control group.
For example, the step S30 specifically includes the following steps:
S31: Measure the concentration of each specimen DNA in the rheumatoid arthritis patient group.
S32: Based on the measured concentration, add the corresponding amount of buffer to dilute each patient-group specimen DNA to 20 ng/μL.
S33: Take an equal amount of each diluted patient-group specimen DNA and mix them to prepare the DNA pool of the rheumatoid arthritis patient group.
S34: Measure the concentration of each specimen DNA in the healthy control group.
S35: Based on the measured concentration, add the corresponding amount of buffer to dilute each control-group specimen DNA to 20 ng/μL.
S36: Take an equal amount of each diluted control-group specimen DNA and mix them to prepare the DNA pool of the healthy control group.
Steps S31 to S33 and steps S34 to S36 may be performed simultaneously or sequentially.
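Steps S31 to S36 normalise every specimen to 20 ng/μL and then mix equal amounts, so that each individual contributes the same mass of DNA to its pool; the dilution follows C₁V₁ = C₂V₂. A minimal Python sketch, assuming a hypothetical starting volume per specimen and hypothetical specimen names (neither is given in the source):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TARGET_NG_PER_UL = 20.0  # working concentration from steps S32 and S35


@dataclass
class Specimen:
    name: str              # hypothetical identifier
    conc_ng_per_ul: float  # concentration measured in S31/S34
    volume_ul: float       # volume taken for dilution (assumed, not in the source)


def buffer_to_add(s: Specimen, target: float = TARGET_NG_PER_UL) -> float:
    """Buffer volume (uL) that brings the specimen down to the target, from C1*V1 = C2*V2."""
    if s.conc_ng_per_ul < target:
        raise ValueError(f"{s.name} is already below {target} ng/uL and cannot be diluted to it")
    final_volume = s.conc_ng_per_ul * s.volume_ul / target
    return final_volume - s.volume_ul


def equal_pool(specimens: List[Specimen], aliquot_ul: float) -> Tuple[Dict[str, float], float]:
    """Equal-volume aliquots of normalised specimens: each contributes the same ng of DNA."""
    ng_per_specimen = TARGET_NG_PER_UL * aliquot_ul
    contributions = {s.name: ng_per_specimen for s in specimens}
    pool_volume = aliquot_ul * len(specimens)
    return contributions, pool_volume


if __name__ == "__main__":
    patients = [Specimen("RA_001", 80.0, 10.0), Specimen("RA_002", 50.0, 10.0)]
    for s in patients:
        print(s.name, "add", buffer_to_add(s), "uL buffer")
    print(equal_pool(patients, aliquot_ul=5.0))
```

Normalising before pooling is what lets the pooled allele frequencies stand in for the group average: without it, specimens with higher DNA yield would be over-represented.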
S40: Determine the content, quality and integrity of the specimen DNA in each DNA pool, and proceed to the next step if the DNA is usable.
For example, in step S40 the content and quality of the specimen DNA in the DNA pool are measured with a spectrophotometer; in another example, the integrity of the specimen DNA is evaluated by agarose gel electrophoresis; in another example, it is determined whether the OD260/OD280 ratio of the specimen DNA lies between 1.8 and 2.0, and if so, the next step is executed.
Before the DNA (deoxyribonucleic acid) library is prepared, quality inspection of the stored DNA pool samples is mandatory: the content, quality and integrity of the specimen DNA in each pool are determined before proceeding.
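The S40 acceptance check can be expressed as a simple gate. This sketch is a hedged illustration: the 1.8 to 2.0 OD260/OD280 window comes from the text, the conversion of 1.0 A260 unit to about 50 ng/μL of double-stranded DNA is the conventional spectrophotometric factor, the 5 μg minimum is borrowed from the library input in step S51, and reducing the gel-integrity result to a yes/no flag is an assumption, since the text gives no scoring rule.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional factor for double-stranded DNA, 1 cm path
LIBRARY_INPUT_NG = 5000.0        # 5 ug of purified DNA is used in step S51


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ng/uL) from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ok(a260: float, a280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """OD260/OD280 window used in step S40."""
    return low <= a260 / a280 <= high


def pool_usable(a260: float, a280: float, dilution_factor: float,
                volume_ul: float, intact_on_gel: bool) -> bool:
    """Proceed to S50 only if purity, quantity and integrity all pass (integrity as a flag: assumption)."""
    total_ng = dsdna_concentration(a260, dilution_factor) * volume_ul
    return purity_ok(a260, a280) and total_ng >= LIBRARY_INPUT_NG and intact_on_gel
```

For instance, a 1:10 dilution reading A260 = 0.4 and A280 = 0.21 gives a ratio of about 1.9 and 200 ng/μL, so 50 μL of that pool carries 10 μg and would pass if the gel shows intact DNA.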
S50: Construct a DNA library.
For example, step S50 specifically includes the following steps:
S51: Unscrew the lid, take out the nebulizer, and fit a piece of vinyl tubing over the atomizer tube, pushing it down toward the lid. Mix 5 μg of purified DNA in 50 μL of TE buffer with 700 μL of nebulization buffer, add the mixture to the nebulizer, and screw the lid back on. Connect the nebulizer with PVC tubing to a compressed-air source at 35 psi, nebulize for 6 minutes, then centrifuge at 450 × g for 2 minutes to recover the randomly fragmented DNA.
S52: Purify the recovered DNA fragments with the QIAquick PCR Purification Kit: load them onto a QIAquick column and elute and concentrate them in 30 μL of QIAGEN Buffer EB.
S53: Perform quality inspection of the DNA fragments to ensure that they meet library-construction requirements.
S54: Mix the randomly fragmented DNA thoroughly with DNA ligase, DNA polymerase and other reagents, spin briefly, and incubate in a thermal cycler at 20 ℃ for 30 minutes for end repair.
S55: Add a single "A" nucleotide to the 3' ends of the double-stranded DNA fragments using Klenow fragment lacking 3'→5' exonuclease activity (Klenow exo−).
S56: Ligate adapters carrying a single "T" overhang to the A-tailed DNA fragments.
S57: Add ethidium bromide (… μg) to 150 ml of 1× TAE agarose gel mixture, run the ligation products on the gel, excise the 150 to 250 bp fragments, and purify them according to the instructions of the QIAquick Gel Extraction Kit.
S58: Amplify the adapter-ligated DNA by PCR.
S59: Purify the final product according to the instructions of the QIAquick Gel Extraction Kit: load it onto a QIAquick column and elute and concentrate it in 30 μL of QIAGEN Buffer EB to obtain the DNA library.
S60: Sequence the rheumatoid arthritis patient group and the healthy control group separately by whole genome re-sequencing to obtain a sequencing result for each group.
Whole genome re-sequencing sequences the whole genomes of individuals of a species whose reference genome is already known, and analyzes differences at the individual or population level. It comprises three parts (library construction, sequencing and bioinformatic analysis) and typically yields more than 30× coverage per sample, which makes it a high-throughput approach. In this example, the DNA pool of the rheumatoid arthritis patient group and the DNA pool of the healthy control group were each subjected to whole genome re-sequencing. For example, in step S60 whole genome re-sequencing was performed on the Illumina HiSeq 2000 platform.
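The "more than 30×" figure is mean depth: total sequenced bases divided by genome size. A minimal sketch of that arithmetic, assuming a haploid human genome of about 3.1 Gb and 100 bp reads (a common HiSeq 2000 read length, not stated in the text); it gives raw depth before duplicate and unmapped reads are removed.

```python
import math

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def mean_depth(n_reads: int, read_len_bp: int, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Mean coverage C = N * L / G (each mate of a read pair counts as one read)."""
    return n_reads * read_len_bp / genome_bp


def reads_for_depth(target_depth: float, read_len_bp: int,
                    genome_bp: float = HUMAN_GENOME_BP) -> int:
    """Smallest number of reads whose bases reach the target mean depth."""
    return math.ceil(target_depth * genome_bp / read_len_bp)


if __name__ == "__main__":
    n = reads_for_depth(30, 100)  # 100 bp reads are an assumption
    print(n, "reads of 100 bp, i.e.", n // 2, "read pairs")
    print(mean_depth(n, 100))
```

Under these assumptions, 30× needs about 930 million 100 bp reads per pool; longer reads or a smaller effective genome size lower that figure proportionally.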
For example, step S60 specifically includes the following steps:
S61: Introduce the DNA into a flow cell (FC), where it hybridizes randomly to adapters distributed on the inner wall of the FC channels; complementary strand synthesis then immobilizes it on the FC.
S62: The free end of each DNA single strand immobilized on the FC hybridizes with a corresponding adapter on the FC wall, forming a bridge structure, and a complementary strand is synthesized by a high-fidelity DNA polymerase.
S63: Step S62 is repeated 15 to 20 times so that each single strand is amplified to 1000 to 6000 copies, forming a cluster; millions of clusters are distributed randomly on the inner wall of the FC.
S64: The clusters consist of double strands, which must be converted into single strands for sequencing. The adapters fixed on the FC carry pre-installed cleavage sites for chemical reagents or enzymes, so that one strand can be cleaved selectively and then removed by alkali denaturation and buffer washing.
S65: ddNTPs are added to block all free 3' ends and prevent unwanted random DNA extension.
S66: Sequencing primers are introduced and hybridize to the universal primer binding sites, and the FC is ready for sequencing.
S67: The FC is transferred to the HiSeq 2000 sequencer, and bases are extended using dNTPs carrying both a fluorescent label and a terminating modification, so that only one base is added per cycle; unincorporated nucleotides and reagents are then washed away.
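Steps S66 and S67 describe sequencing by synthesis with reversible terminators: because each labelled dNTP blocks further extension, exactly one base is incorporated and imaged per cycle. The toy model below illustrates only that logic. It ignores phasing, imaging and errors, and it assumes the sequencing primer anneals directly at the 3' end of the template, so the read comes out as the reverse complement of the template written 5' to 3'.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template_5to3: str, cycles: int) -> str:
    """Toy reversible-terminator sequencing: one base called per cycle."""
    read = []
    # Synthesis starts at the template's 3' end, so walk the template backwards.
    for cycle, template_base in enumerate(reversed(template_5to3.upper())):
        if cycle == cycles:
            break
        incorporated = COMPLEMENT[template_base]  # only one blocked dNTP can be added
        read.append(incorporated)                 # its fluorescence identifies the base
        # Here the dye and terminator would be removed so the next cycle can proceed.
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_synthesis("AACG", cycles=4))
```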
S70: Compare the sequencing result of the rheumatoid arthritis patient group with that of the healthy control group, and screen out high-contribution gene variation sites of rheumatoid arthritis.
For example, the sequencing results are the raw data obtained by whole genome re-sequencing of the patient group and the healthy control group, respectively, on the Illumina HiSeq 2000.
For example, after obtaining the sequencing result, the method further comprises the following steps:
S71: Remove adapter sequences and low-quality reads to obtain clean data; align the clean data to the reference genome with BWA and store the alignments as BAM files; mark duplicate reads with Picard; and perform local realignment and base quality score recalibration with GATK, among other steps, to obtain the final BAM file.
S72: Detect SNPs (single-nucleotide polymorphisms) and InDels (insertions and deletions) in the BAM file with GATK, detect CNVs (copy number variations) with CNVnator, and detect SVs (genomic structural variations) with Pindel, CNVnator and BreakDancer.
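Step S71 names the tools but not their invocations. The sketch below only assembles shell command strings for one DNA pool and does not run them. It assumes the GATK 3.x command-line syntax that was current when this application was filed (RealignerTargetCreator, IndelRealigner, BaseRecalibrator, PrintReads), samtools 1.x, and the classic Picard `I=`/`O=`/`M=` arguments. The jar paths, file names and known-sites VCF are hypothetical; adapter trimming is assumed to have already produced clean FASTQ files, and GATK additionally expects the reference `.fai` and `.dict` files and a BWA index to exist.

```python
from typing import List


def preprocessing_commands(sample: str, ref: str, fq1: str, fq2: str,
                           known_sites_vcf: str,
                           picard_jar: str = "picard.jar",
                           gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Shell commands turning clean paired FASTQ into a final, recalibrated BAM (not executed)."""
    gatk = f"java -jar {gatk_jar}"
    read_group = f"'@RG\\tID:{sample}\\tSM:{sample}'"  # GATK requires read groups
    return [
        # 1. Align clean reads to the reference (e.g. hg19) with BWA-MEM.
        f"bwa mem -R {read_group} {ref} {fq1} {fq2} > {sample}.sam",
        # 2. Coordinate-sort into BAM.
        f"samtools sort -o {sample}.sorted.bam {sample}.sam",
        # 3. Mark PCR/optical duplicates with Picard, then index for GATK.
        f"java -jar {picard_jar} MarkDuplicates I={sample}.sorted.bam "
        f"O={sample}.dedup.bam M={sample}.dup_metrics.txt",
        f"samtools index {sample}.dedup.bam",
        # 4. Local realignment around indels (GATK 3.x).
        f"{gatk} -T RealignerTargetCreator -R {ref} -I {sample}.dedup.bam "
        f"-o {sample}.intervals",
        f"{gatk} -T IndelRealigner -R {ref} -I {sample}.dedup.bam "
        f"-targetIntervals {sample}.intervals -o {sample}.realigned.bam",
        # 5. Base quality score recalibration, then write the final BAM.
        f"{gatk} -T BaseRecalibrator -R {ref} -I {sample}.realigned.bam "
        f"-knownSites {known_sites_vcf} -o {sample}.recal.table",
        f"{gatk} -T PrintReads -R {ref} -I {sample}.realigned.bam "
        f"-BQSR {sample}.recal.table -o {sample}.final.bam",
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("RA_pool", "hg19.fa", "RA_1.fq.gz",
                                      "RA_2.fq.gz", "dbsnp_135.hg19.vcf"):
        print(cmd)
```

Generating the commands rather than running them keeps the sketch tool-agnostic; newer GATK 4 releases dropped indel realignment and renamed several steps, so the strings would need revising for current versions.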
For example, the final BAM file is also used to calculate sequencing coverage and alignment rate. After preliminary data analysis of the patient group and the healthy control group, the status of each genetic locus is compared between the groups, and the functional impact of each variant is predicted with SIFT. A site whose between-group P value is below 0.01 and whose SIFT score is below 0.05 is defined as a deleterious mutation, that is, a mutation site related to rheumatoid arthritis, and the distribution of such sites is then statistically analyzed at the whole-gene level.
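The screening rule in this paragraph keeps a site when the between-group P value is below 0.01 and the SIFT score is below 0.05, but the text does not name the statistical test. As one hedged possibility, the sketch below applies a two-sided Fisher's exact test to alternate-versus-reference read counts from the two pools. Treating pooled read counts as allele counts is an assumption, and the SIFT score is taken as an input rather than computed.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: patient pool, control pool. Columns: alt reads, ref reads.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        if p <= p_observed * (1 + 1e-7):  # tables no more likely than the observed one
            p_value += p
    return min(1.0, p_value)


def is_candidate_site(p_value: float, sift_score: float) -> bool:
    """Deleterious, RA-related site under the stated thresholds."""
    return p_value < 0.01 and sift_score < 0.05


if __name__ == "__main__":
    p = fisher_exact_two_sided(40, 60, 10, 90)  # hypothetical read counts
    print(p, is_candidate_site(p, sift_score=0.02))
```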
For example, all raw data must be processed to remove unqualified reads and obtain a high-quality dataset. Unqualified reads include low-quality reads and reads in which unknown bases (N) make up more than 10% of the sequence; the qualified dataset consists of clean reads suitable for accurate bioinformatic analysis.
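The read-filtering rule can be sketched as a per-read predicate. The 10% unknown-base limit comes from the text. The text does not define "low quality", so the sketch assumes a common convention (a read is low quality when more than half of its bases have Phred quality 5 or lower) and Phred+33 quality encoding; both are labelled as assumptions in the code.

```python
from typing import List

MAX_N_FRACTION = 0.10     # from the text: reject reads with more than 10% unknown bases
LOW_Q_CUTOFF = 5          # assumption: bases with Phred <= 5 count as low quality
MAX_LOW_Q_FRACTION = 0.5  # assumption: reject reads where more than half are low quality


def phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def is_clean_read(seq: str, quality_string: str) -> bool:
    """True if the read passes both the N-content and the low-quality filters."""
    if not seq or len(seq) != len(quality_string):
        raise ValueError("sequence and quality strings must be non-empty and equal length")
    n_fraction = seq.upper().count("N") / len(seq)
    if n_fraction > MAX_N_FRACTION:
        return False
    quals = phred33(quality_string)
    low_fraction = sum(q <= LOW_Q_CUTOFF for q in quals) / len(quals)
    return low_fraction <= MAX_LOW_Q_FRACTION
```

With paired-end data, a pair is usually discarded when either mate fails, so the two FASTQ files stay in sync.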
In one embodiment, all SNP sites across the whole genome are obtained by combining sequencing-depth, duplication-rate and quality-control analyses. These sites are highly accurate and reliable, and they are detected and annotated against reference information.
For example, variant detection and annotation identified 4317412 and 4305668 SNPs genome-wide in the rheumatoid arthritis patient group and the healthy control group, respectively, of which 133981 and 134844 lie in coding regions. Most SNPs in the two groups (99.37% and 99.34%, respectively) are already present in dbSNP135; the numbers of newly detected SNP sites in the patient group and the healthy group are 27249 and 28355, and the numbers of homozygous SNPs are 551613 and 553072, respectively. These figures are broadly consistent with those reported for the human genome, indicating that the SNP calls are relatively accurate.
TABLE 1 SNP annotation results
[Table image not reproduced]
In one embodiment, SVs are detected and annotated with BreakDancer and other software. 6190 and 6310 SV sites were found in the rheumatoid arthritis patient group and the healthy control group, respectively; inversions, deletions and translocations all occur, and SV breakpoints are concentrated in non-coding, intronic and exonic regions.
TABLE 2 SV annotation results
[Table image not reproduced]
In one embodiment, copy number variations (CNVs) are detected and annotated with CNVnator. CNV is a type of variation that often occurs in major diseases and is an important molecular mechanism of disease.
For example, CNVs are classified into two types, duplications and deletions, which correspond to increases and decreases in copy number on the genome. Detection with CNVnator gave the results shown in Table 3: 7479 and 10153 CNV sites were detected in the rheumatoid arthritis patient group and the healthy control group, respectively, of which 1777 and 2308 lie in intronic regions, 1136 and 1693 in exonic regions, 85 and 115 upstream of genes, and 47 and 88 downstream of genes; one site in the patient group lies in the 5' untranslated region.
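The copy-number rule stated in this text (counts below a set minimum are deletions, counts above a set maximum are duplications) can be illustrated with a simple read-depth ratio. This is not CNVnator's algorithm, which segments binned read depth with a mean-shift approach; the diploid baseline and the 1.5 and 2.5 cut-offs below are assumptions chosen for illustration.

```python
from typing import Tuple


def classify_copy_number(observed_depth: float, expected_depth: float,
                         ploidy: int = 2, min_cn: float = 1.5,
                         max_cn: float = 2.5) -> Tuple[str, float]:
    """Estimate a region's copy number from read depth and label it."""
    if expected_depth <= 0:
        raise ValueError("expected depth must be positive")
    copy_number = ploidy * observed_depth / expected_depth
    if copy_number < min_cn:   # below the minimum: copies lost
        return "deletion", copy_number
    if copy_number > max_cn:   # above the maximum: copies gained
        return "duplication", copy_number
    return "normal", copy_number


if __name__ == "__main__":
    for depth in (15.0, 30.0, 45.0):  # hypothetical bin depths against a 30x baseline
        print(depth, classify_copy_number(depth, 30.0))
```

In a pooled design the ratio reflects the average across all individuals in the pool, so a CNV carried by only a few people shifts the estimate far less than a whole copy.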
TABLE 3 CNV annotation results
[Table image not reproduced]
In one embodiment, InDel sites are identified with SAMtools and GATK, the variant information is annotated, and the distribution of InDels across gene functional elements is statistically analyzed.
For example, InDels shorter than 50 bp were called with GATK: 699676 and 692574 InDels were detected in the rheumatoid arthritis patient group and the healthy control group, respectively, of which 66.56% and 65.53% had already been reported in dbSNP135. In the overall result, 193641 and 192299 InDels in the two groups, respectively, were newly found, with high accuracy.
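The 50 bp boundary separating InDels from larger structural variants can be expressed as a length rule on each variant's reference and alternate alleles. A minimal sketch, assuming VCF-style allele strings; the "MNV" label for equal-length multi-base substitutions is a hypothetical category that the text does not discuss.

```python
def classify_variant(ref: str, alt: str, sv_min_bp: int = 50) -> str:
    """Label a variant as SNV, InDel (< sv_min_bp) or SV from its allele lengths."""
    if len(ref) == len(alt):
        # Equal lengths: single-base or multi-base substitution (MNV label is illustrative).
        return "SNV" if len(ref) == 1 else "MNV"
    size = abs(len(ref) - len(alt))  # net inserted or deleted bases
    return "InDel" if size < sv_min_bp else "SV"
```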
TABLE 4 InDel annotation results
[Table image not reproduced]
The usable sequencing data were compared with known databases such as dbSNP, using human genome build 37 (hg19) as the reference, to eliminate the large number of variants that are widespread in healthy people. The patient group was then compared with the healthy group to determine how InDels and SNVs are distributed across gene regions, and this distribution was statistically analyzed for all four variant types with comparison software. For CNVs, attention was paid to changes in copy number: a count below the set minimum or above the set maximum is regarded as a deletion or a duplication, respectively, and the effect of each variant type on the gene was assessed. Deleterious variants were finally screened with prediction software, mainly SIFT and PolyPhen-2, together with analysis of the annotation results.
For example, genome-wide re-sequencing initially yields a very large number of variant sites. Known variants are removed by comparison with known databases, and the patient group and the healthy group are then compared to find genes with particularly marked differences, which are candidate high-contribution variant sites for rheumatoid arthritis. After comparison, about 19251 InDel sites, 43318 CNV sites, 332 SV sites and 34887 SNV sites were obtained.
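The elimination sequence described here (drop variants already in known databases, then compare the patient pool with the control pool) amounts to set differences on variant keys. A minimal sketch, assuming each variant is keyed by chromosome, position, reference and alternate allele on hg19, and that presence in the control pool is enough to exclude a site; the actual comparison may also weigh allele frequencies, which this sketch ignores.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def candidate_variants(patient: Set[Variant], control: Set[Variant],
                       known: Set[Variant]) -> Set[Variant]:
    """Variants seen in the patient pool but neither in known databases nor in controls."""
    novel = patient - known  # remove variants common in the general population (e.g. dbSNP)
    return novel - control   # remove variants also present in the healthy pool
```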
For example, based on statistical analysis of their genome-wide distribution, and taking into account mechanisms involved in rheumatoid arthritis such as apoptosis, proliferation and regulation, the following 35 mutant SNP alleles were selected from these sites using mutation prediction software such as SIFT and PolyPhen-2. All of the selected sites are located in exon regions and include gene changes caused by base substitutions.
Table 5. Information on the 2 screened SVs
[Table 5 image not reproduced]
Table 6. The 35 screened SNPs
[Table 6 image not reproduced]
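The SIFT/PolyPhen-2 screening that produced the 35 exonic SNPs can be sketched as a simple filter. The SIFT cutoff of 0.05 follows the text; "PolyPhen-2 close to 1" is represented here by an assumed cutoff of 0.9, and the record type `AnnotatedSNV`, the function `is_candidate` and the gene names `GENE_A` to `GENE_C` are hypothetical placeholders, not results from the patent.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedSNV:
    gene: str
    region: str        # e.g. "exonic", "intronic"
    sift: float        # SIFT score: lower means more likely damaging
    polyphen2: float   # PolyPhen-2 score: closer to 1 means more likely damaging


def is_candidate(v: AnnotatedSNV, sift_max: float = 0.05, polyphen_min: float = 0.9) -> bool:
    """Keep exonic SNVs predicted deleterious by both tools (polyphen_min is an assumption)."""
    return v.region == "exonic" and v.sift < sift_max and v.polyphen2 >= polyphen_min


snvs = [
    AnnotatedSNV("GENE_A", "exonic", 0.01, 0.98),
    AnnotatedSNV("GENE_B", "exonic", 0.30, 0.99),
    AnnotatedSNV("GENE_C", "intronic", 0.00, 1.00),
]
print([v.gene for v in snvs if is_candidate(v)])  # ['GENE_A']
```

Requiring agreement between the two predictors trades sensitivity for specificity; the mechanism-based selection mentioned in the text (apoptosis, proliferation, regulation) would be an additional manual step.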
Furthermore, GO annotation and signaling pathway analysis were performed on the screened genes using the DAVID software. GO analysis covers three categories: BP (biological process), MF (molecular function) and CC (cellular component). The GO analysis showed strong enrichment of genes such as PPIP5K1, HK3, MIOX and ANK3, and the biological functions in which these genes are involved can be found in the relevant literature.
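GO term enrichment of the kind reported by DAVID is based on an over-representation test. The sketch below computes a plain one-sided hypergeometric p-value with the standard library; DAVID's own EASE score is a more conservative modification of Fisher's exact test (it removes one gene from the list count for the term), so its values will be larger than those from this simplified version. The function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k), for over-representation of a GO term.

    k: genes in the candidate list annotated with the term
    n: number of genes in the candidate list
    K: genes in the background annotated with the term
    N: number of genes in the background
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical numbers: 4 of 37 candidate genes carry a term annotated to 50 of 20000 background genes.
p = enrichment_p(k=4, n=37, K=50, N=20000)
print(p < 0.05)  # True
```

With many GO terms tested at once, the resulting p-values would normally also be corrected for multiple testing before a threshold such as P < 0.05 is applied.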
For example, genome-wide re-sequencing of DNA pools from 100 patients and 100 healthy individuals yielded a large number of variant sites. At the genome-wide level, 27249 and 28355 SNP sites were found in the two groups, respectively. InDel detection and annotation identified 193641 and 192299 novel insertion/deletion sites in the two groups, respectively, with high accuracy. Structural variation was also detected: 6190 and 6310 SV sites were found in the two groups, respectively, including inversions, deletions and translocations, with SV breakpoints concentrated in non-coding regions, introns and exons. CNV detection identified 7479 and 10153 sites, respectively, of which 1777 and 2308 were located in introns, 1136 and 1693 in exons, 85 and 115 upstream of genes, and 47 and 88 downstream of genes. On this basis, each group was compared with known human genome databases, and the healthy and patient groups were compared with each other; approximately 19251 InDel sites, 43318 CNV sites, 332 SV sites and 34887 SNV sites were found to be novel variant sites.
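The per-region breakdown above (intronic, exonic, upstream, downstream) comes from annotating each variant's position against gene models. A simplified sketch follows. It assumes a single gene model, 1-based inclusive coordinates, a 1 kb flanking window and one representative position per variant (real CNVs are intervals); annotation tools such as ANNOVAR apply more detailed rules. The `Gene` type, the window size and the example coordinates are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

FLANK_BP = 1000  # assumed upstream/downstream window; not stated in the patent


@dataclass
class Gene:
    start: int                      # transcript start, 1-based inclusive
    end: int                        # transcript end, 1-based inclusive
    strand: str                     # "+" or "-"
    exons: list[tuple[int, int]]    # exon intervals, 1-based inclusive


def annotate_position(pos: int, gene: Gene, flank: int = FLANK_BP) -> str:
    """Assign a single genomic position to a region relative to one gene model."""
    if gene.start <= pos <= gene.end:
        return "exonic" if any(s <= pos <= e for s, e in gene.exons) else "intronic"
    left = gene.start - flank <= pos < gene.start
    right = gene.end < pos <= gene.end + flank
    if left:
        # The left flank is 5' (upstream) on the plus strand and 3' on the minus strand.
        return "upstream" if gene.strand == "+" else "downstream"
    if right:
        return "downstream" if gene.strand == "+" else "upstream"
    return "intergenic"


gene = Gene(10_000, 20_000, "+", [(10_000, 10_500), (15_000, 15_300), (19_800, 20_000)])
positions = [10_200, 12_000, 15_100, 9_500, 20_600, 50_000]
print(Counter(annotate_position(p, gene) for p in positions))
```

Tallying the labels with `Counter` gives the kind of per-region counts quoted in the text; positions that fall outside all genes and flanks are labelled intergenic, which is why the regional counts do not add up to the totals.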
Through bioinformatics analysis and review of the relevant literature, combined with PolyPhen-2 and SIFT analysis, 37 mutant genes were preliminarily selected from all the mutant genes. On this basis, GO analysis (P < 0.05) and pathway enrichment analysis of the mutant genes showed marked enrichment of genes such as MIOX, PPIP5K1, ANK3 and HK3 among the 37 genes, which include MRPL47, FAM114A1, HK3, IQGAP2, KIR3DL1, CD247 and CDCA7. According to the literature, PPIP5K1 has acid phosphatase activity and encodes a bifunctional inositol kinase; the encoded protein takes part in an important intracellular signaling pathway and may be relevant to rheumatoid arthritis. The CD247 and KIR3DL1 genes identified here are already known susceptibility genes for rheumatoid arthritis; KIR3DL1 encodes a receptor that recognizes HLA class I ligands, connecting it to the HLA system implicated in rheumatoid arthritis. This agreement with known susceptibility genes supports the reliability of the data of the present invention.
By screening for high-contribution pathogenic genes of rheumatoid arthritis, this screening method identifies the variant sites in such genes. This promotes further research on the disease, provides a theoretical basis for the treatment and physiological mechanism of rheumatoid arthritis, and is beneficial for early treatment and prevention.
Whole-genome re-sequencing of rheumatoid arthritis patients yielded a large number of variant sites, which can help guide clinical research and treatment of rheumatoid arthritis. The mutated genes associated with the disease are broadly characterized from a molecular biology perspective.
For example, rheumatoid arthritis (RA) is a relatively common chronic autoimmune inflammatory disease that seriously affects patients' lives. Its cause is still unknown, but it is thought to be directly related to genetic and environmental factors. In recent years, about 30 associated pathogenic loci, such as the FCRL3 gene, have been identified and validated by various methods, but most of the highly pathogenic genes involved remain to be determined before RA can be better understood and studied. Whole-genome re-sequencing sequences the genomes of different individuals of a species whose reference genome is already known. By comparing the resulting sequences, a large number of insertion/deletion sites (InDels), structural variation sites (SVs) and single nucleotide polymorphism sites (SNPs) can be obtained for each re-sequenced individual, and after bioinformatic annotation and analysis this high-throughput sequencing approach allows related genes to be discovered and studied at the molecular level.
For example, the pathogenic mechanism of rheumatoid arthritis is studied using high-throughput sequencing, DNA pooling and molecular bioinformatic analysis, and genes related to the pathogenesis of rheumatoid arthritis are identified at the whole-genome level. This promotes further research on the disease, provides a theoretical basis for the treatment and physiological mechanism of rheumatoid arthritis, and is beneficial for early treatment and prevention.
For example, 100 rheumatoid arthritis patients and 100 healthy individuals with no history of disease at medical examination (from the medical examination center of the 181 Hospital) were randomly selected as study subjects, and peripheral blood was collected from each. Genomic DNA was extracted, a DNA pool was established for the healthy group and another for the patient group, DNA libraries were constructed, and high-throughput whole-genome sequencing was performed. The sequencing data were then processed bioinformatically: adapter contamination and low-quality data were removed, the reads were aligned to the reference sequence, and SNP, InDel, CNV and SV detection was performed, followed by data collation and annotation. After the data of the two groups were processed and analyzed separately, the SNPs, SVs, InDels and CNVs of the two groups were compared using Fisher's exact test. Sites showing significant enrichment (P < 0.01), a SIFT score below 0.05 and a PolyPhen-2 score close to 1 were then analyzed with GO analysis and enrichment platforms; sites showing a strong correlation with rheumatoid arthritis pathogenesis were regarded as likely disease-related variant sites, and these were further screened and examined with GO analysis and other biological methods.
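The between-group comparison described above uses Fisher's exact test. For a single site, the test can be applied to a 2x2 table of alternate-allele versus reference-allele counts in the patient pool and the control pool. The standard-library sketch below assumes that table layout; treating read counts from pooled DNA as allele counts is itself an approximation, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = patient pool, control pool;
    columns = alternate-allele count, reference-allele count.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x alternate alleles in the patient row,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

A site would then pass the enrichment step in the text if its p-value is below 0.01, before the SIFT and PolyPhen-2 criteria are applied.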
For example, whole-genome re-sequencing of the healthy group and the patient group identified 27249 and 28355 SNP sites, respectively, at the genome-wide level. InDel detection and annotation identified 193641 and 192299 novel insertion/deletion sites in the two groups, respectively, with high accuracy. Structural variation was also detected: 6190 and 6310 SV sites were found in the two groups, respectively, including inversions, deletions and translocations, with SV breakpoints concentrated in non-coding regions, introns and exons. CNV detection identified 7479 and 10153 sites, respectively, of which 1777 and 2308 were located in introns, 1136 and 1693 in exons, 85 and 115 upstream of genes, and 47 and 88 downstream of genes. On this basis, each group was compared with known human genome databases and the healthy and patient groups were compared with each other, showing that approximately 19251 InDel sites, 43318 CNV sites, 332 SV sites and 34887 SNV sites are novel variant sites. Through bioinformatics analysis and the relevant literature, combined with PolyPhen-2 analysis, SIFT analysis, GO analysis and the associated bioinformatic pathways, PPIP5K1, MIOX, ANK3 and HK3 were identified as likely susceptibility genes among 37 genes including MRPL47, FAM114A1, HK3, IQGAP2, KIR3DL1, CD247 and CDCA7, and KIR3DL1 and CD247, which have been described in previous reports, were determined to be high-contribution pathogenic genes of rheumatoid arthritis.
For example, by combining genome re-sequencing with biostatistical analysis and biological platforms such as GO analysis, PPIP5K1 and MIOX were preliminarily identified as susceptibility genes with a certain correlation with the onset of rheumatoid arthritis.
It should be particularly noted that the direct purpose of the invention is not to obtain a diagnostic result or health condition. The invention only processes or tests a body fluid that has already been removed from the human body, namely peripheral blood (for example, peripheral blood separated from subjects in the rheumatoid arthritis patient group and the healthy control group), to obtain information as an intermediate result, or provides a method for processing such information, for example screening high-contribution gene variant sites of rheumatoid arthritis as an intermediate result. According to current medical knowledge and the disclosure of the present invention, a diagnosis of disease cannot be derived directly from the information obtained. The high-contribution gene variant sites of rheumatoid arthritis are used only to study the disease, to provide a theoretical basis for the treatment and physiological mechanism of rheumatoid arthritis, and to provide an intermediate result for early treatment and prevention; a diagnostic result cannot be obtained directly from these variant sites.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it involves no contradiction.
The embodiments described above express only several embodiments of the present invention and are described in relative detail, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (1)

1. Use of a reagent for detecting mutations in the genes MIOX, PPIP5K1, ANK3 and HK3 in the preparation of a rheumatoid arthritis diagnostic product.
CN201610403048.6A 2016-06-08 2016-06-08 Screening method of high-contribution pathogenic gene of rheumatoid arthritis Active CN107475351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610403048.6A CN107475351B (en) 2016-06-08 2016-06-08 Screening method of high-contribution pathogenic gene of rheumatoid arthritis

Publications (2)

Publication Number Publication Date
CN107475351A CN107475351A (en) 2017-12-15
CN107475351B true CN107475351B (en) 2021-02-05

Family

ID=60593702

Country Status (1)

Country Link
CN (1) CN107475351B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
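CN110993031B (en) * 2019-11-07 2020-07-28 The Third Affiliated Hospital of Guangzhou Medical University (Guangzhou Critical Care Center for Pregnant Women, Guangzhou Rouji Hospital) Analysis method, analysis device, apparatus and storage medium for autism candidate gene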

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7713696B2 (en) * 2006-04-10 2010-05-11 Academisch Zeikenhuis Leiden H.O.D.N. Lumc Genetic markers for prognosis of antifolate treatment efficacy
GB0613844D0 (en) * 2006-07-12 2006-08-23 Progenika Biopharma Sa Methods and products for in vitro genotyping
DK2601609T3 (en) * 2010-08-02 2017-06-06 Population Bio Inc COMPOSITIONS AND METHODS FOR DISCOVERING MUTATIONS CAUSING GENETIC DISORDERS
CN103468708A (en) * 2013-09-13 2013-12-25 陈刚 Pathogenic genes of schizophrenia and application
CN104250649B (en) * 2014-08-27 2018-01-16 深圳华大基因股份有限公司 The new Disease-causing gene of the white first syndrome of the few hair of Keratoderma and its encoding proteins matter and application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
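Study on high-contribution susceptibility genes of rheumatoid arthritis in children; Li Mingjiang et al.; Chinese Journal of Birth Health & Heredity; 20200125; 110-112 *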

Also Published As

Publication number Publication date
CN107475351A (en) 2017-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant