WO2016208827A1 - Method and apparatus for analyzing genes

Publication number
WO2016208827A1
Authority
WO
WIPO (PCT)
Prior art keywords
depths
genes
gene
cnv
copy number
Prior art date
Application number
PCT/KR2015/012925
Other languages
English (en)
Korean (ko)
Inventor
박웅양
김상철
남재용
Original Assignee
사회복지법인 삼성생명공익재단
Priority date
Filing date
Publication date
Application filed by 사회복지법인 삼성생명공익재단
Priority to CN201580078172.3A (patent CN107408163B)
Priority to SG11201707649SA (patent SG11201707649SA)
Priority claimed from KR1020150168833A (patent KR101828052B1)
Publication of WO2016208827A1
Priority to SA517380741A (patent SA517380741B1)

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Definitions

  • The present invention relates to a method and apparatus for analyzing genes, and more particularly, to a method and apparatus for identifying genes that exhibit copy number variation (CNV).
  • A genome is the complete set of genetic information of an organism.
  • Various technologies for sequencing a genome, such as DNA chips, next-generation sequencing (NGS), and next-next-generation sequencing (NNGS), have been developed.
  • Analysis of genetic information such as nucleic acid sequences and proteins is widely used to find genes associated with diseases such as diabetes and cancer, or to identify correlations between genetic diversity and the traits expressed by individuals.
  • In particular, the genetic data collected from an individual is important for identifying the genetic characteristics of that individual associated with different symptoms or disease progression. Therefore, genetic data such as an individual's nucleic acid sequences and proteins are essential for identifying current and future disease-related information, so that disease can be prevented or an optimal treatment method can be selected at an early stage of disease.
  • Techniques are being studied for accurately analyzing an individual's genetic data and diagnosing the individual's disease using genome detection equipment that detects single nucleotide polymorphisms (SNPs) and copy number variation (CNV) as genetic information of an organism.
  • According to one aspect, a method of analyzing a gene comprises: generating a reference data set of the depths of reads aligned to each of a plurality of reference genes by performing deep sequencing on the reference genes; analyzing the depths of reads aligned to each of a plurality of test genes by performing the deep sequencing on the test genes; and comparing the analyzed depths with the depths of the reference genes included in the reference data set to determine whether a copy number variation (CNV) gene is present among the test genes.
  • the analyzing step analyzes the depth of the reads aligned with exon sites of the test genes.
  • the presence of the copy number variation (CNV) gene may be determined by comparing the depths between the reference genes and the test genes for the same exon region.
  • In the determining, if the exon sites of the test genes include an exon site for which the difference in depth between corresponding exon sites of the reference genes and the test genes is statistically significant, the copy number variation (CNV) gene is determined to exist.
  • The generating may further include: obtaining, for each of a plurality of people, read-depths corresponding to the reference genes through the deep sequencing of the people's gene data; clustering the people into different groups according to the distribution of the obtained read-depths; and acquiring standard depths of each of the reference genes representing each of the groups by normalizing, per group, the read-depths acquired for each of the reference genes. The reference data set includes, for each of the groups, data representing the standard depths of each of the reference genes.
  • The determining may further include: determining, among the groups, the group having the smallest statistical difference between the distribution of the analyzed depths and the distribution of the standard depths; and determining whether the copy number variation (CNV) gene is present by comparing the analyzed depths with the standard depths corresponding to the determined group.
  • the method further includes obtaining the genetic data of the people from public genomic data or public HapMap data.
  • The reference genes or the test genes may be obtained from biopsy tissue or from formalin-fixed, paraffin-embedded (FFPE) tissue.
  • the method may further include performing an annotation for identifying a drug corresponding to the copy number variation (CNV) gene.
  • According to another aspect, there is provided a computer-readable recording medium having recorded thereon a program for executing the method on a computer.
  • According to another aspect, an apparatus for analyzing a gene may include: a reference data generator configured to generate a reference data set of the depths of reads aligned to each of a plurality of reference genes by performing deep sequencing on the reference genes; an analysis unit that analyzes the depths of reads aligned to each of a plurality of test genes by performing the deep sequencing on the test genes; and a determination unit that determines whether a copy number variation (CNV) gene exists among the test genes by comparing the analyzed depths with the depths of the reference genes included in the reference data set.
  • the analysis unit analyzes the depth of the reads aligned with exon sites of the test genes.
  • the determination unit determines the existence of the copy number variation (CNV) gene by comparing the depths between the reference genes and the test genes for the same exon region.
  • The determination unit determines that the copy number variation (CNV) gene is present when the exon regions of the test genes include an exon region for which the difference in depth between corresponding exon regions of the reference genes and the test genes is statistically significant.
  • The reference data generator obtains, for each of a plurality of people, read-depths corresponding to the reference genes through the deep sequencing of the people's gene data, clusters the people into different groups according to the distribution of the read-depths, and normalizes, per group, the read-depths obtained for each of the reference genes, thereby obtaining standard depths of each of the reference genes representing each of the groups. The reference data set includes, for each of the groups, data representing the standard depths of each of the reference genes.
  • The determination unit may determine, among the groups, the group having the smallest statistical difference between the distribution of the analyzed depths and the distribution of the standard depths, and may compare the analyzed depths with the standard depths corresponding to the determined group to determine whether the copy number variation (CNV) gene is present.
  • The reference data generator obtains the genetic data of the people from public genomic data or public HapMap data.
  • The reference genes or the test genes may be obtained from biopsy tissue or from formalin-fixed, paraffin-embedded (FFPE) tissue.
  • When it is determined that the copy number variation (CNV) gene is present among the test genes, the determination unit performs an annotation for identifying a drug corresponding to the copy number variation (CNV) gene.
  • FIG. 1 is a view for explaining a gene analysis apparatus according to an embodiment.
  • FIG. 2 is a block diagram illustrating hardware configurations of a gene analysis apparatus according to an exemplary embodiment.
  • FIG. 3 is a flowchart of a method of generating a reference data set according to an embodiment.
  • FIG. 4 is a diagram for describing obtaining read-depths corresponding to reference genes for each of a plurality of people (e.g., normal people), according to an exemplary embodiment.
  • FIG. 5 is a diagram for describing deep sequencing of exon regions according to an embodiment.
  • FIG. 6 is a diagram illustrating clustering people into different groups according to a distribution of read-depths obtained from a normal population 400 according to an embodiment.
  • FIG. 7 is a diagram for describing standard depths of each of reference genes representing a group according to an embodiment.
  • FIG. 8 is a diagram for describing deep sequencing of test genes obtained from biological samples of a subject, according to an exemplary embodiment.
  • FIG. 10 illustrates a method for determining whether a copy number variation (CNV) gene is present according to an embodiment.
  • FIG. 11 is a flowchart of a method of analyzing a gene, according to an embodiment.
  • FIG. 12 is a block diagram illustrating hardware configurations of a computing device according to an embodiment.
  • Throughout the specification, when a part is said to be connected to another part, this includes not only the case where it is directly connected but also the case where it is electrically connected with other components in between.
  • In addition, when a part is said to include a certain component, this means that it may further include other components, rather than excluding other components, unless specifically stated otherwise.
  • The terms "... unit" and "... module" described in the embodiments mean a unit for processing at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.
  • FIG. 1 is a view for explaining a gene analysis apparatus according to an embodiment.
  • The genetic analysis apparatus 10 uses genetic data 20 obtained from a normal population and genetic data 30 obtained from a subject to determine whether a copy number variation (CNV) gene is present among the test genes of the subject.
  • the genetic data 20 and the genetic data 30 received by the genetic analysis apparatus 10 may correspond to the genetic data in the FASTQ file format obtained by next generation sequencing (NGS).
  • the FASTQ format is usually a text-based format that stores biological sequences, such as nucleotide sequences, and corresponding quality scores.
  • the genetic analysis apparatus 10 according to the present embodiment is not limited to the FASTQ format, and the genetic data 20 and 30 in other formats can also be analyzed.
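  • As an illustration of the FASTQ input mentioned above, the following is a minimal sketch of a reader for the common four-line-per-record layout. It is not part of the described apparatus: the names `FastqRecord` and `parse_fastq` are hypothetical, and the quality decoding assumes Phred+33 encoding, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FastqRecord:
    read_id: str          # header line without the leading "@"
    sequence: str         # nucleotide sequence
    qualities: List[int]  # per-base scores, assuming Phred+33 encoding


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it, None)
        separator = next(it, None)
        quality = next(it, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, [ord(c) - 33 for c in quality])


if __name__ == "__main__":
    example = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    for record in parse_fastq(example):
        print(record.read_id, record.sequence, record.qualities)
```

  • Because the function accepts any iterable of lines, an open text file handle can be passed in directly.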
  • The gene data 20 of the normal population may be obtained from a database (DB) already known in the art, such as the National Center for Biotechnology Information (NCBI) or the Gene Expression Omnibus (GEO), or may be obtained from biological samples of recruited people. That is, the genetic data 20 may be obtained from public genomic data or public HapMap data. Meanwhile, the reference genes included in the genetic data 20 or the test genes included in the genetic data 30 may be obtained from biopsy tissue or from formalin-fixed, paraffin-embedded (FFPE) tissue.
  • Copy number variation refers to a variation in which a relatively large region of a particular chromosome appears repeated (amplified) or deleted compared to a reference genome. That is, the genetic analysis apparatus 10 may determine whether the genetic data 30 obtained from the subject contains an abnormally deleted or amplified gene compared to the genetic data 20 obtained from a normal population.
  • the gene analyzed by the genetic analysis device 10 may refer to a nucleic acid such as DNA (deoxyribonucleic acid), RNA (ribonucleic acid), and the like.
  • The normal population may refer to a population composed of ordinary people in whom no specific disease, such as cancer or a tumor, has been found, and the subject may refer to a patient in whom a specific disease such as cancer or a tumor has been found. Meanwhile, in the present embodiments, the normal population and the subject may also be animals other than humans.
  • The genetic analysis apparatus 10 may be implemented with at least one processor capable of executing the various instructions and algorithms for analyzing the gene data 20 and 30 to identify a copy number variation (CNV) gene.
  • FIG. 2 is a block diagram illustrating hardware configurations of a gene analysis apparatus according to an exemplary embodiment.
  • the genetic analysis apparatus 10 may include a reference data generator 110, an analyzer 120, and a determiner 130.
  • The gene analysis apparatus 10 shown in FIG. 2 includes only the components related to the present embodiment so as not to obscure its features. The gene analysis apparatus 10 may further include other general-purpose components in addition to the components shown in FIG. 2.
  • the reference data generator 110 receives the gene data 20 obtained from the normal population described above with reference to FIG. 1, and generates a reference data set using the received gene data 20.
  • The reference data generator 110 performs deep sequencing of the reference genes included in the gene data 20, thereby generating a reference data set of the depths of the reads aligned to each of the reference genes. Deep sequencing is a technique for sequencing nucleic acids, such as DNA fragments and RNA fragments, by repeatedly aligning reads to them. As a result of deep sequencing, data on depths, corresponding to the number of reads aligned to the nucleic acids, can be obtained.
  • The term “depth” may be used interchangeably with the term “read-depth”.
  • The reference data generator 110 first acquires, for each of a plurality of people (e.g., normal people), read-depths corresponding to the reference genes through deep sequencing of their genetic data (20 of FIG. 1). The reference data generator 110 then clusters the people into different groups according to the distribution of the obtained read-depths. The reference data generator 110 obtains standard depths of each of the reference genes representing each of the groups by normalizing, for each group, the read-depths obtained for each of the reference genes. As a result, the reference data set generated by the reference data generator 110 may include, for each of the groups, data representing the standard depths of each of the reference genes.
  • The analysis unit 120 receives the gene data 30 obtained from the subject, described above with reference to FIG. 1, and performs deep sequencing on the test genes included in the gene data 30 to analyze the depths of the reads aligned to each of the test genes.
  • deep sequencing performed by the reference data generator 110 and the analyzer 120 may be performed on exon sites in the reference gene or the test gene.
  • Accordingly, the reference data set generated by the reference data generator 110 and the depth data analyzed by the analysis unit 120, which correspond to the deep sequencing results, may include only data on the depths of reads aligned to exon sites and may exclude data on the depths of reads aligned to intron sites.
  • However, the exemplary embodiments are not limited thereto, and depth data for intron sites may also be included.
  • The determination unit 130 compares the depths analyzed by the analysis unit 120 with the depths of the reference genes included in the reference data set generated by the reference data generator 110, and determines whether there is a copy number variation (CNV) gene among the test genes. In this case, the determination unit 130 may determine the presence of the copy number variation (CNV) gene by comparing the depths of the reference genes and the test genes for the same exon region.
  • When the exon regions of the test genes include an exon region for which the difference in depth between corresponding exon regions of the reference genes and the test genes is statistically significant, the determination unit 130 can determine that a copy number variation (CNV) gene is present.
  • The determination unit 130 detects or identifies the gene corresponding to the exon region whose difference in depth is statistically significant as the copy number variation (CNV) gene. Further, when it is determined that there is a copy number variation (CNV) gene among the test genes, the determination unit 130 can perform annotation to identify a drug (for example, an anticancer agent) corresponding to the detected copy number variation (CNV) gene.
  • FIG. 3 is a flowchart of a method of generating a reference data set according to an embodiment.
  • the generation of the reference data set includes steps processed in time series in the reference data generator 110 described above.
  • In step 301, the reference data generator 110 acquires read-depths corresponding to reference genes for each of a plurality of people (e.g., normal people).
  • In step 302, the reference data generator 110 clusters the people into different groups according to the distribution of the obtained read-depths.
  • In step 303, the reference data generator 110 normalizes, for each group, the read-depths acquired for each of the reference genes.
  • In step 304, the reference data generator 110 obtains standard depths of each of the reference genes representing each of the groups.
  • FIG. 4 is a diagram for describing obtaining read-depths corresponding to reference genes for each of a plurality of people (e.g., normal people), according to an exemplary embodiment.
  • the description of FIG. 4 may relate to the method performed in step 301 of FIG. 3.
  • the reference data generator 110 may acquire read-depths by performing deep sequencing using the genetic data 401 obtained from a database (DB) 40.
  • Database (DB) 40 stores genetic data 401 of a plurality of people (eg, normal people) classified into normal population 400.
  • Genetic data 401 may be obtained using various sequencing means, such as next generation sequencing (NGS), microarrays, and the like on biological samples taken from a plurality of people.
  • the genetic data 401 may be data about a whole genome or data about a HapMap.
  • Database (DB) 40 may correspond to a database already known in the art, such as NCBI or GEO, or may be built to store the genetic data 401 of people recruited for analyzing the test genes of a subject.
  • the reference data generator 110 performs deep sequencing on genes (ie, reference genes) of individuals of the normal population 400 included in the gene data 401.
  • the reference data generator 110 may perform deep sequencing on reference genes 411 of the “person 1” 410 included in the normal population 400.
  • As a result of the deep sequencing, reads 415 are aligned to gene 1, ..., gene n (n is a natural number) included in the reference genes 411, and data on the depths (read-depths) of the reads 415 aligned to each of the reference genes 411 are obtained.
  • Likewise, the reference data generator 110 performs deep sequencing on the reference genes 421 of the “person 1” 420 included in the normal population 400 and obtains data on the depths (read-depths) of the reads 425 aligned to each of the reference genes 421.
  • In this way, the reference data generator 110 may acquire read-depth data by performing deep sequencing on the reference genes of each individual of the normal population 400 included in the gene data 401.
  • FIG. 5 is a diagram for describing deep sequencing of exon regions according to an embodiment.
  • For example, assuming that an individual's reference genes comprise gene a, gene b, and gene c, the result of deep sequencing may include data on the depths of the reads 510 aligned to exon a1 and of the reads aligned to exon a2 in gene a, the depths of the reads aligned to exon b1 and of the reads aligned to exon b2 in gene b, and the depths of the reads aligned to exon c in gene c.
  • However, the exemplary embodiments are not limited thereto, and the deep sequencing result may include data on the depths of reads aligned to the intron regions 505.
  • the analysis unit 120 of FIG. 2 may analyze the depths of reads aligned with each of the exon sites in the test genes by performing deep sequencing on the exon sites in the test genes.
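  • The exon-level depth analysis described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the embodiment: reads and exons are modeled as half-open coordinate intervals on a single chromosome, depth is taken to be the number of reads overlapping an exon, and the name `exon_read_depths` is hypothetical. Practical pipelines typically derive per-base coverage from an aligner's output instead.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def exon_read_depths(
    exons: Dict[str, Interval], reads: List[Interval]
) -> Dict[str, int]:
    """Count, for each exon, the aligned reads that overlap it.

    Reads that fall entirely within introns overlap no exon and are
    therefore not counted, mirroring the exon-only depth data in the text.
    """
    depths: Dict[str, int] = {}
    for name, (exon_start, exon_end) in exons.items():
        depths[name] = sum(
            1
            for read_start, read_end in reads
            if read_start < exon_end and read_end > exon_start
        )
    return depths


if __name__ == "__main__":
    exons = {"exon a1": (100, 200), "exon a2": (300, 400)}
    reads = [(90, 140), (150, 250), (350, 360), (200, 300)]
    print(exon_read_depths(exons, reads))
```

  • In the usage example, the read spanning positions 200 to 300 lies wholly in the intron between the two exons and contributes to neither depth.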
  • FIG. 6 is a diagram illustrating clustering people into different groups according to a distribution of read-depths obtained from a normal population 400 according to an embodiment. The description of FIG. 6 may relate to the method performed in step 302 of FIG. 3.
  • The reference data generator 110 clusters the individuals of the normal population 400 into different groups by grouping together people whose depth distributions are similar.
  • clustering may be performed by statistically analyzing the distribution of read-depth for each reference gene (exon) using a known trend analysis algorithm, a clustering algorithm, or the like.
  • reference genes of people belonging to group 1 may have a similar distribution of each gene and depth pair.
  • For example, the reference genes of people in group 1 may be obtained from biopsy samples of those people, while the reference genes of people in group M (M is a natural number) may be obtained from FFPE samples of those people.
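  • The text leaves the clustering algorithm open ("a known trend analysis algorithm, a clustering algorithm, or the like"). Purely as an illustration, the sketch below uses plain k-means on per-person depth profiles; the choice of k-means, the deterministic seeding with the first k profiles, and the name `cluster_depth_profiles` are all assumptions rather than details from the source.

```python
import math
from typing import List, Sequence


def cluster_depth_profiles(
    profiles: Sequence[Sequence[float]], k: int, iterations: int = 50
) -> List[int]:
    """Assign each person's depth profile to one of k groups (Lloyd's k-means).

    Each profile lists the read-depths of the same exons in the same order.
    """
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Assumption: seed centroids with the first k profiles for reproducibility.
    centroids = [list(p) for p in profiles[:k]]
    labels = [-1] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda g: math.dist(p, centroids[g]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for g in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == g]
            if members:
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Two people with high depths and two with low depths (e.g. different sample types).
    people = [[1000, 500], [200, 100], [1020, 510], [210, 95]]
    print(cluster_depth_profiles(people, k=2))
```

  • Seeding with the first k profiles keeps the example reproducible but can give poor groupings on real data; a library implementation with randomized restarts would be a more robust choice.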
  • FIG. 7 is a diagram for describing standard depths of each of reference genes representing a group according to an embodiment. The description of FIG. 7 may relate to the methods performed in steps 303 and 304 of FIG. 3.
  • The reference data generator 110 normalizes, for each group, the read-depths acquired for each of the reference genes, and thereby obtains the standard depths of each of the reference genes representing each of the groups.
  • For example, the reference data generator 110 can standardize the depth for “exon 1” by calculating the average of the various depths obtained for “exon 1”. Similarly, the reference data generator 110 calculates the average of the various depths obtained for each of the other reference genes (e.g., “Exon 43”, “Exon 3543”, “Exon 5623”, etc.), so that the standard depth of each gene (exon) can be calculated. As a result, the reference data generator 110 may acquire the standard depths of each of the reference genes, which represent each of the clustered groups. Meanwhile, in the present embodiment, the average of the depths is used as the representative value for convenience of description. However, the representative value of the depths may also be calculated using other types of statistics besides the average.
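  • The averaging step described above can be sketched as follows. The function name `standard_depths` and the dictionary layout (exon name mapped to depth) are assumptions made for illustration; the `statistic` parameter reflects the remark that statistics other than the average may be used.

```python
from statistics import mean, median
from typing import Callable, Dict, Iterable, List


def standard_depths(
    group_profiles: List[Dict[str, float]],
    statistic: Callable[[Iterable[float]], float] = mean,
) -> Dict[str, float]:
    """Compute one representative depth per exon for a clustered group.

    Only exons measured in every member of the group are included.
    """
    if not group_profiles:
        raise ValueError("group must contain at least one profile")
    shared = set(group_profiles[0])
    for profile in group_profiles[1:]:
        shared &= set(profile)
    return {
        exon: statistic([profile[exon] for profile in group_profiles])
        for exon in sorted(shared)
    }


if __name__ == "__main__":
    group_1 = [
        {"exon 1": 900, "exon 43": 400},
        {"exon 1": 1100, "exon 43": 600},
    ]
    print(standard_depths(group_1))          # average as representative value
    print(standard_depths(group_1, median))  # alternative statistic
```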
  • FIG. 8 is a diagram for describing deep sequencing of test genes obtained from biological samples of a subject, according to an exemplary embodiment.
  • The analysis unit 120 of FIG. 2 performs deep sequencing of the test genes based on the gene data 30 of the subject 800 to analyze the depths of the reads aligned to each of the test genes.
  • The genetic data 30 of the subject 800 may be obtained through next generation sequencing (NGS) on a biopsy sample 810 or an FFPE sample 825 taken from some tissue of the subject 800.
  • The FFPE sample 825 is a sample produced by FFPE treatment 820 of some tissue of the subject 800.
  • The analysis unit 120 of FIG. 2 may obtain depth data 830 of the test genes by analyzing the depths of the reads aligned to the test genes of the subject 800 according to the deep sequencing methods described above with reference to FIGS. 4 and 5.
  • The determination unit 130 determines, among the groups clustered by the reference data generator 110, the group having the smallest statistical difference between the distribution of the depths analyzed from the test genes and the distribution of the standard depths. That is, the determination unit 130 determines at least one group among the clustered groups (e.g., the groups of FIG. 6) whose statistical tendency is similar to the distribution of the depths analyzed from the test genes. In this case, the determination unit 130 may determine the group having the smallest standard deviation between the distribution of the depths analyzed from the test genes and the distribution of the standard depths.
  • the present invention is not limited thereto, and other statistics may be used in addition to the standard deviation to select a group having a tendency similar to the distribution of depths analyzed from the test genes.
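  • The text does not give a formula for the "standard deviation between" the test depths and a group's standard depths. The sketch below takes one possible reading, the root-mean-square difference over the exons shared by the subject and the group, and picks the group that minimizes it. Both that interpretation and the name `closest_group` are assumptions.

```python
import math
from typing import Dict


def closest_group(
    test_depths: Dict[str, float],
    group_standards: Dict[str, Dict[str, float]],
) -> str:
    """Return the name of the group whose standard depths best match the test depths."""

    def rms_difference(standard: Dict[str, float]) -> float:
        # Compare only exons present in both the test data and the group.
        shared = test_depths.keys() & standard.keys()
        if not shared:
            raise ValueError("no exons in common with the test depths")
        squared = sum((test_depths[e] - standard[e]) ** 2 for e in shared)
        return math.sqrt(squared / len(shared))

    if not group_standards:
        raise ValueError("at least one group is required")
    return min(group_standards, key=lambda name: rms_difference(group_standards[name]))


if __name__ == "__main__":
    groups = {
        "group 1": {"exon 1": 1000, "exon 43": 500},
        "group M": {"exon 1": 300, "exon 43": 150},
    }
    subject = {"exon 1": 980, "exon 43": 520}
    print(closest_group(subject, groups))
```

  • A strongly amplified exon inflates the difference to every group, so a robust statistic (for example, a median of absolute differences) may be preferable in practice; the text explicitly allows other statistics.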
  • The determination unit 130 compares the depths analyzed from the test genes with the standard depths corresponding to the determined group. More specifically, the determination unit 130 compares the depth of each of the test genes (exons) with the depth of the corresponding reference gene (corresponding exon). For example, assuming that “exon 1” and “exon 43” exist in both the test genes and the reference genes, the determination unit 130 compares the depth of “exon 1” analyzed by the analysis unit 120 with the standard depth of “exon 1”, and compares the depth of “exon 43” analyzed by the analysis unit 120 with the standard depth of “exon 43”.
  • “exon 1” and “exon 43” are arbitrary terms for indicating that they are different exons.
  • The determination unit 130 determines whether a copy number variation (CNV) gene is present as a result of the comparison. At this time, if the exon regions of the test genes include an exon region for which the difference in depth between corresponding exon regions of the reference genes and the test genes is statistically significant, the determination unit 130 can determine that the copy number variation (CNV) gene is present.
  • For example, the determination unit 130 may determine that the copy number variation (CNV) gene is present when the depth of any exon analyzed by the analysis unit 120 exceeds 4 times the standard depth.
  • However, the threshold is not limited thereto and may be variously changed. For example, when the standard depth of “exon 1” is 1000, the threshold for determining significance may be 4000. Therefore, when the depth of “exon 1” of the subject analyzed by the analysis unit 120 is 5000, the determination unit 130 may determine that the gene of “exon 1” is a copy number variation (CNV) gene.
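  • The worked example above (standard depth 1000, threshold 4000, observed depth 5000) can be reproduced with a short sketch. The function name `find_cnv_exons` and the default `fold_threshold` parameter are illustrative assumptions; only the 4-fold example comes from the text, which also notes that the threshold may be changed.

```python
from typing import Dict, List


def find_cnv_exons(
    test_depths: Dict[str, float],
    standard: Dict[str, float],
    fold_threshold: float = 4.0,
) -> List[str]:
    """Return exons whose test depth exceeds fold_threshold times the standard depth.

    Only amplification is checked, matching the example in the text; a lower
    bound for deletions is not specified there and is not modeled here.
    """
    return [
        exon
        for exon, depth in test_depths.items()
        if exon in standard and depth > fold_threshold * standard[exon]
    ]


if __name__ == "__main__":
    standard = {"exon 1": 1000, "exon 43": 500}
    subject = {"exon 1": 5000, "exon 43": 520}
    flagged = find_cnv_exons(subject, standard)
    print("CNV gene present:", bool(flagged), "exons:", flagged)
```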
  • FIG. 10 illustrates a method for determining whether a copy number variation (CNV) gene is present according to an embodiment.
  • In FIG. 10, the depths indicated by solid lines correspond to reference genes (exons), and the depths indicated by dashed lines correspond to test genes (exons).
  • the determination unit 130 compares the depths of the exons analyzed by the analysis unit 120 and the standard depths, as described above in the drawings.
  • When, among the exon regions of the test genes, there is an exon region (“exon a”) for which the difference in depth between corresponding exon regions of the reference genes and the test genes is statistically significant, the determination unit 130 may identify the test gene of “exon a” as a copy number variation (CNV) gene and determine that the copy number variation (CNV) gene is present.
  • When the copy number variation (CNV) gene is determined to be present, the determination unit 130 may perform annotation for identifying a drug (e.g., an anticancer agent) corresponding to the copy number variation (CNV) gene.
  • The gene analysis method includes steps that are processed in time series by the genetic analysis device 10 described with reference to the preceding figures. Therefore, even where omitted below, the description given above also applies to the gene analysis method of FIG. 11.
  • The reference data generator 110 performs deep sequencing on the reference genes to generate a reference data set of the depths of the reads aligned to each of the reference genes.
  • The analysis unit 120 performs deep sequencing on the test genes and analyzes the depths of the reads aligned to each of the test genes.
  • The determination unit 130 compares the analyzed depths with the depths of the reference genes included in the reference data set to determine whether a copy number variation (CNV) gene exists among the test genes.
  • FIG. 12 is a block diagram illustrating hardware configurations of a computing device according to an embodiment.
  • The computing device 1 includes a genetic analysis device (processor) 10, a data interface 11, and a memory 12.
  • FIG. 12 shows only the components of the computing device 1 that relate to the present embodiment, so as not to obscure the features of the embodiment. Therefore, other general-purpose components may be further included in addition to those shown in FIG. 12.
  • The data interface 11 receives the genetic data 20 of the normal population and the genetic data 30 of the subject described above with reference to FIG. 1. That is, the data interface 11 may be implemented as wired/wireless network interface hardware through which the computing device 1 communicates with external devices. The data interface 11 transmits the received genetic data 20 and 30 to the genetic analysis device (processor) 10.
  • The data interface 11 may receive the genetic data 20 of the normal population from the database (DB) 40 of FIG. 4.
  • The data interface 11 may receive the genetic data 30 of the subject from an external next-generation sequencing apparatus, a microarray, or the like that sequences the test genes of the subject.
  • The memory 12 is hardware that stores the data to be processed in the computing device 1 and the processed results, and may be implemented as a memory chip such as random access memory (RAM) or read-only memory (ROM), a hard disk drive (HDD), a solid state drive (SSD), or the like. That is, the memory 12 may store the genetic data 20 and 30 received by the data interface 11, as well as the reference data set, the deep sequencing data for the test genes, and the data on identified copy number variation (CNV) genes processed by the genetic analysis device (processor) 10.
  • The genetic analysis device (processor) 10 is a module implemented by one or more processing units, and may be implemented as a combination of a microprocessor having an array of logic gates and a memory module storing a program executable on the microprocessor. The genetic analysis device (processor) 10 may also be implemented in the form of a module of an application program. The genetic analysis device (processor) 10 is a hardware device that performs the gene analysis described above with reference to the preceding figures.
  • The information about the copy number variation (CNV) gene identified by the genetic analysis device (processor) 10 may be transmitted via the data interface 11 to another external device, such as a display device or another computing device, or to an external network, such as the Internet or a public database (DB) server.
  • According to the above, a copy number variation (CNV) gene may be detected using only a biopsy sample or an FFPE sample of the cancer tissue of the subject. Here, the genes of the cancer tissue serve as the test genes, and the reference genes may be obtained under similar conditions (e.g., FFPE treatment).
  • The device according to the present embodiments may include a processor, a memory for storing and executing program data, permanent storage such as a disk drive, a communication port for communicating with external devices, and user interface devices such as a touch panel, keys, and buttons.
  • Methods implemented by software modules or algorithms may be stored on a computer readable recording medium as computer readable codes or program instructions executable on the processor.
  • The computer-readable recording medium may be a magnetic storage medium (e.g., read-only memory (ROM), random-access memory (RAM), floppy disk, hard disk, etc.) or an optical reading medium (e.g., CD-ROM, DVD (Digital Versatile Disc)).
  • The computer-readable recording medium can be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
  • The medium is readable by the computer, can be stored in the memory, and can be executed by the processor.
  • This embodiment can be represented by functional block configurations and various processing steps. Such functional blocks may be implemented by any number of hardware and/or software configurations that perform particular functions.
  • For example, an embodiment may employ integrated circuit configurations, such as memory, processing, logic, and look-up tables, that can execute various functions under the control of one or more microprocessors or other control devices.
  • The present embodiment may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with the various algorithms being implemented as data structures, processes, routines, or other combinations of programming constructs.
  • The functional aspects may be implemented as an algorithm running on one or more processors.
  • The present embodiment may employ conventional techniques for electronic environment setting, signal processing, and/or data processing.
  • Terms such as “mechanism”, “element”, “means” and “configuration” can be used widely and are not limited to mechanical and physical configurations. The term may include the meaning of a series of routines of software in conjunction with a processor or the like.
  • The connecting lines or connection members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be represented as various replaceable or additional functional connections, physical connections, or circuit connections.
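
The depth comparison attributed above to the reference data generator 110, the analysis unit 120, and the determination unit 130 can be illustrated with a minimal sketch. It is not the implementation of the described device: the function names `build_reference_set` and `find_cnv_exons` are hypothetical, the use of the arithmetic mean to obtain a "standard depth" is an assumption of this sketch, and the per-exon depths are made-up numbers chosen to match the "exon 1" example (standard depth 1000, threshold 4000, subject depth 5000).

```python
from statistics import mean

# Example threshold from the description (4 times the standard depth).
# The description notes that this value may be changed.
FOLD_THRESHOLD = 4.0


def build_reference_set(reference_depths):
    """Combine per-exon depths of reference samples into standard depths.

    reference_depths: list of dicts mapping exon name -> read depth.
    Averaging across samples is an assumption made for this sketch.
    """
    exons = reference_depths[0].keys()
    return {exon: mean(sample[exon] for sample in reference_depths) for exon in exons}


def find_cnv_exons(test_depths, standard_depths, fold=FOLD_THRESHOLD):
    """Return exons whose test depth exceeds `fold` times the standard depth."""
    return [
        exon
        for exon, depth in test_depths.items()
        if exon in standard_depths and depth > fold * standard_depths[exon]
    ]


# Hypothetical depths for two reference samples and one subject.
reference = [{"exon 1": 900, "exon 2": 480}, {"exon 1": 1100, "exon 2": 520}]
standard = build_reference_set(reference)
subject = {"exon 1": 5000, "exon 2": 1500}
print(find_cnv_exons(subject, standard))  # ['exon 1']
```

With these numbers, "exon 1" is flagged because 5000 exceeds 4 × 1000, while "exon 2" is not because 1500 does not exceed 4 × 500. A statistical significance test, as mentioned for FIG. 10, could replace the fixed fold threshold.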

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Theoretical Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention relates to a method and a device for analyzing a gene, comprising: generating a reference data set by performing deep sequencing on reference genes; analyzing the depths of test genes by performing deep sequencing on the test genes; and determining, by comparing the analyzed depths with the depths of the reference genes included in the reference data set, whether copy number variation (CNV) genes exist among the test genes.
PCT/KR2015/012925 2015-06-24 2015-11-30 Method and device for analyzing gene WO2016208827A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201580078172.3A CN107408163B (zh) 2015-06-24 2015-11-30 Method and device for analyzing genes
SG11201707649SA SG11201707649SA (en) 2015-06-24 2015-11-30 Method and device for analyzing gene
SA517380741A SA517380741B1 (ar) 2015-06-24 2017-01-18 Method and device for analyzing a gene

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20150089449 2015-06-24
KR10-2015-0089449 2015-06-24
KR1020150168833A KR101828052B1 (ko) 2015-06-24 2015-11-30 Method and apparatus for analyzing copy number variation (CNV) of genes
KR10-2015-0168833 2015-11-30

Publications (1)

Publication Number Publication Date
WO2016208827A1 (fr) 2016-12-29

Family

ID=57585062

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2015/012925 WO2016208827A1 (fr) 2015-06-24 2015-11-30 Method and device for analyzing gene

Country Status (1)

Country Link
WO (1) WO2016208827A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
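CN107194208A (zh) * 2017-04-25 2017-09-22 北京荣之联科技股份有限公司 Gene analysis and annotation method and device
CN111599408A (zh) * 2020-04-15 2020-08-28 至本医疗科技(上海)有限公司 Method, apparatus, device, and storage medium for detecting the cis-trans positional relationship of gene variants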

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120208706A1 (en) * 2010-12-30 2012-08-16 Foundation Medicine, Inc. Optimization of multigene analysis of tumor samples

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120208706A1 (en) * 2010-12-30 2012-08-16 Foundation Medicine, Inc. Optimization of multigene analysis of tumor samples

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AJAY ET AL.: "Accurate and Comprehensive Sequencing of Personal Genomes", GENOME RESEARCH, vol. 21, no. 9, 2011, pages 1498 - 1505, XP055239158 *
FUJIMOTO ET AL.: "Whole-Genome Sequencing and Comprehensive Variant Analysis of a Japanese Individual Using Massively Parallel Sequencing", NATURE GENETICS, vol. 42, no. 11, 2010, pages 931 - 938, XP055287427 *
KRUMM ET AL.: "Copy Number Variation Detection and Genotyping from Exome Sequence Data", GENOME RESEARCH, vol. 22, no. 8, 2012, pages 1525 - 1532, XP055341007 *
WU ET AL.: "Copy Number Variation Detection from 1000 Genomes Project Exon Capture Sequencing Data", BMC BIOINFORMATICS, vol. 13, no. 1, 2012, pages 1 - 19, XP021138467 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
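CN107194208A (zh) * 2017-04-25 2017-09-22 北京荣之联科技股份有限公司 Gene analysis and annotation method and device
CN107194208B (zh) * 2017-04-25 2020-10-02 荣联科技集团股份有限公司 Gene analysis and annotation method and device
CN111599408A (zh) * 2020-04-15 2020-08-28 至本医疗科技(上海)有限公司 Method, apparatus, device, and storage medium for detecting the cis-trans positional relationship of gene variants
CN111599408B (zh) * 2020-04-15 2022-05-06 至本医疗科技(上海)有限公司 Method, apparatus, device, and storage medium for detecting the cis-trans positional relationship of gene variants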

Similar Documents

Publication Publication Date Title
Wright et al. Paediatric genomics: diagnosing rare disease in children
Fang et al. Indel variant analysis of short-read sequencing data with Scalpel
Paul et al. Increased DNA methylation variability in type 1 diabetes across three immune effector cell types
Zook et al. A robust benchmark for germline structural variant detection
Sekizuka et al. TGS-TB: total genotyping solution for Mycobacterium tuberculosis using short-read whole-genome sequencing
Griffin et al. Accurate mitochondrial DNA sequencing using off-target reads provides a single test to identify pathogenic point mutations
EP2926288B1 (fr) Accurate and fast mapping of targeted sequencing reads
Tekin et al. A next-generation sequencing gene panel (MiamiOtoGenes) for comprehensive analysis of deafness genes
CN107408163B (zh) Method and device for analyzing genes
CN110383385B (zh) Method for detecting mutation load from a tumor sample
Yamamoto et al. Challenges in detecting genomic copy number aberrations using next-generation sequencing data and the eXome Hidden Markov Model: a clinical exome-first diagnostic approach
Olson et al. Variant calling and benchmarking in an era of complete human genome sequences
WO2017135768A1 (fr) Method and system for predicting the risk of developing a genetic disorder in putative offspring
Zhang et al. Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data
Kishikawa et al. A metagenome-wide association study of gut microbiome in patients with multiple sclerosis revealed novel disease pathology
KR20190122909A (ko) Non-invasive prenatal molecular karyotyping from maternal plasma
EP3631657A1 (fr) System and method for detecting gene fusion
Bademci et al. Identification of copy number variants through whole-exome sequencing in autosomal recessive nonsyndromic hearing loss
Govender et al. Benchmarking taxonomic classifiers with Illumina and Nanopore sequence data for clinical metagenomic diagnostic applications
Cho et al. Prevalence of rare genetic variations and their implications in NGS-data interpretation
WO2021071181A1 (fr) Method for predicting resistance to an anticancer immunotherapeutic agent, and analysis apparatus
Luzón-Toro et al. Next-generation-based targeted sequencing as an efficient tool for the study of the genetic background in Hirschsprung patients
JP2021101629A (ja) System and method for genome analysis and gene analysis
WO2016208827A1 (fr) Method and device for analyzing gene
WO2017204414A1 (fr) Method and apparatus for analyzing the degree of cross-contamination of a sample

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15896461

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15896461

Country of ref document: EP

Kind code of ref document: A1