KR101686146B1 - Copy Number Variation Determination Method Using Sample comprising Nucleic Acid Mixture - Google Patents

Publication number
KR101686146B1
Authority
KR
South Korea
Prior art keywords
score
chromosome
chromosomes
reference value
sequence information
Prior art date
Application number
KR1020150172212A
Other languages
Korean (ko)
Inventor
조은해
이준남
전영주
장자현
이태헌
Original Assignee
주식회사 녹십자지놈
Application filed by 주식회사 녹십자지놈
Priority to KR1020150172212A
Application granted
Publication of KR101686146B1

Classifications

    • G06F19/18
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • G06F19/22

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for determining copy number variation in a sample containing a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest, and includes a bioinformatic analysis method and a statistical analysis method.
The determination method according to the present invention can be used to determine chromosomal copy number variations that are known or suspected to be associated with a medical condition of a fetus. The chromosomal copy number variations that can be determined by the method include trisomy and monosomy of any one or more of chromosomes 1-22, X, and Y, other chromosomal polysomies, and deletion and/or duplication of chromosomal segments, so the method is useful for analyzing the sex and copy number variation of a fetus.

Description

Method for Determining Copy Number Variation Using a Sample Containing a Nucleic Acid Mixture

The present invention relates to a method for detecting the sex and copy number abnormalities of a fetus. More specifically, it relates to a non-invasive method for detecting fetal chromosomal abnormalities in which DNA is extracted from a biological sample of a mother, sequence information is obtained, the chromosomal regions are normalized, and the reference chromosomes are randomly assigned.

Prenatal tests for fetal chromosomal abnormalities include ultrasonography, blood marker testing, amniocentesis, chorionic villus sampling, and percutaneous umbilical blood sampling (Malone FD, et al., 2005; Mujezinovic F, et al., 2007). Ultrasonography and blood marker testing are classified as screening tests, and amniocentesis is classified as a confirmatory test. The non-invasive methods, ultrasonography and blood marker testing, are safe because they do not sample the fetus directly, but their sensitivity is below 80% (ACOG Committee on Practice Bulletins). The invasive methods, amniocentesis, chorionic villus sampling, and percutaneous umbilical blood sampling, can confirm a fetal chromosomal abnormality, but they carry a risk of fetal loss because of the invasive procedure (Mujezinovic F, et al., 2007).

In 1997, Lo et al. showed by Y-chromosome sequence analysis that fetal genetic material is present in maternal plasma and serum, which made it possible to use fetal genetic material in maternal blood for prenatal testing (Lo YM, et al., 1997). The fetal genetic material in maternal blood originates from trophoblast cells that undergo apoptosis during placental remodeling and enters the maternal blood through the exchange of substances between mother and fetus. It is actually derived from the placenta and is called cell-free fetal DNA (cff DNA). cff DNA can be detected as early as day 18 after embryo transfer and is found in most maternal blood by day 37 (Guibert J, et al., 2003). Because cff DNA consists of short fragments of 300 bp or less and is present only in small amounts in maternal blood, massively parallel sequencing on a next-generation sequencing (NGS) platform is used to apply it to the detection of fetal chromosomal abnormalities. Non-invasive detection of fetal chromosomal abnormalities by massively parallel sequencing has a detection sensitivity of 90-99% or more depending on the chromosome, but false-positive and false-negative results occur in 1-10% of cases (Gil MM, et al., 2015).

The present inventors made efforts to develop a method for detecting fetal chromosomal abnormalities with high sensitivity and few false-positive and false-negative results. As a result, they found that normalizing the chromosomal regions and randomly assigning the reference chromosomes yields high sensitivity and low false-positive/false-negative rates, and thereby completed the present invention.

It is an object of the present invention to provide a method for non-invasively detecting the sex and copy number abnormalities of a fetus.

It is another object of the present invention to provide an apparatus for non-invasively detecting the sex and copy number abnormalities of a fetus.

It is yet another object of the present invention to provide a computer readable medium comprising instructions configured to be executed by a processor that detects the sex and copy number abnormalities of a fetus by the method described above.

In order to accomplish the above objects, the present invention provides a method for detecting the sex and copy number abnormalities of a fetus, comprising the steps of: a) extracting DNA from a biological sample of a mother to obtain sequence information (reads); b) aligning the obtained sequence information to a reference genome database; c) calculating a Q-score for the aligned sequence information and selecting only sequence information at or below a cut-off value; and d) calculating a G-score for the selected sequence information and comparing it with that of a reference chromosome combination to determine the sex and copy number variation of the fetus.

The present invention also provides an apparatus for detecting the sex and copy number abnormalities of a fetus, comprising: a decoding unit for extracting DNA from a biological sample of a mother and decoding sequence information; an alignment unit for aligning the decoded sequences to a standard chromosome sequence database; a quality control unit for calculating a Q-score for the aligned sequence information (reads) and selecting only sequence information at or below a cut-off value; and a sex and copy number variation determining unit for calculating a G-score for the selected sequence information and comparing it with that of a reference chromosome combination to determine the sex and copy number variation of the fetus.

The present invention also provides a computer readable medium comprising instructions configured to be executed by a processor that detects the sex and copy number abnormalities of a fetus through the steps of: a) extracting DNA from a biological sample of a mother to obtain sequence information (reads); b) aligning the obtained sequence information to a reference genome database; c) calculating a Q-score for the aligned sequence information and selecting only sequence information at or below a cut-off value; and d) calculating a G-score for the selected sequence information and comparing it with that of a reference chromosome combination to determine the sex and copy number variation of the fetus.

The method of determining the sex and chromosomal copy number variation of a fetus according to the present invention not only improves the accuracy of sex determination by using next-generation sequencing (NGS) but also increases the detection accuracy for sex chromosome abnormalities such as XO, XXX, and XXY, which increases its commercial utility. The method of the present invention is therefore useful for prenatal diagnosis in which abnormalities caused by fetal sex chromosome aneuploidy are identified at an early stage.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overall flow chart for detecting the sex and copy number abnormalities of a fetus according to the present invention.
FIG. 2 is a diagram illustrating the correction results before and after the GC correction by the LOESS algorithm during the QC process of the read data.
FIG. 3 is a diagram illustrating the correction results before and after the correction of the Coefficient of Variation (CV) value by the LOESS algorithm during the QC process of the read data.
FIG. 4 is a schematic diagram comparing the G-score values calculated in the normal group with the chromosomal abnormal group according to the method of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In general, the nomenclature used herein and the experimental methods described below are well known and commonly used in the art.

In the present invention, it was confirmed that high sensitivity and low false-positive/false-negative rates are obtained when the sequence analysis data obtained from a sample are normalized and the combination of reference chromosomes is randomly permuted to find the combination that maximizes the G-score difference between the normal group and the test subject.

That is, in one embodiment of the present invention, a method was developed (FIG. 1) in which DNA extracted from maternal blood is sequenced, quality is controlled using the LOESS algorithm, and the G-score is calculated. Reference chromosome combinations are then randomly assigned until the absolute value of the G-score difference between the normal population and the subject's chromosome reaches its maximum, a reference value for the G-score is determined on that basis, and the copy number of the subject's chromosome is judged to be abnormal when the G-score falls outside the reference value.

Thus, in one aspect, the present invention relates to a method for detecting the sex and copy number abnormalities of a fetus, comprising:

a) extracting DNA from a biological sample of a mother to obtain sequence information (reads);

b) aligning the obtained sequence information to a reference genome database;

c) calculating a Q-score for the aligned sequence information and selecting only sequence information at or below a cut-off value; and

d) calculating a G-score for the selected sequence information and comparing it with that of a reference chromosome combination to determine the sex and copy number variation of the fetus.

In the present invention, the reference chromosome combination may be, but is not limited to, chromosomes 4 and 6 when the selected sequence information is chromosome 13; chromosomes 4, 7, 10, and 16 when it is chromosome 18; chromosomes 7, 11, 14, and 22 when it is chromosome 21; chromosomes 16 and 20 when it is chromosome X; and chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 17, and 19 when it is chromosome Y.

In the present invention, step a) may comprise:

(i) obtaining a mixture of fetal and maternal nucleic acids from amniotic fluid obtained by amniocentesis, villi obtained by chorionic villus sampling, umbilical cord blood obtained by percutaneous umbilical blood sampling, tissue of a spontaneously miscarried fetus, or human peripheral blood;

(ii) removing proteins, fats, and other residues from the collected mixture of fetal and maternal nucleic acids using a salting-out method, a column chromatography method, or a bead method to obtain purified nucleic acid;

(iii) preparing a single-end sequencing or paired-end sequencing library from the purified nucleic acid or from nucleic acid randomly fragmented by enzymatic digestion, pulverization, or the HydroShear method;

(iv) loading the prepared library onto a next-generation sequencer; and

(v) acquiring the sequence information (reads) of the nucleic acid from the next-generation sequencer.

In the present invention, the next-generation sequencer may be, but is not limited to, the HiSeq system, the MiSeq system, or the Genome Analyzer (GA) system from Illumina, the 454 FLX from Roche, the SOLiD system from Applied Biosystems, or the Ion Torrent system from Life Technologies.

In the present invention, the alignment step may be performed using, but is not limited to, the BWA algorithm and the GRCh38 sequence.

In the present invention, step c) may comprise:

(i) specifying the region of each aligned nucleic acid sequence;

(ii) selecting the sequences that satisfy the reference values for the mapping quality score and the GC ratio;

(iii) calculating the fraction of chromosome N (ChrN) among the selected sequences of an arbitrary case 1 by Equation 1 below;

Equation 1:

Figure 112015118864862-pat00001

(iv) calculating the Z-score of the chromosome N region by Equation 2 below;

Equation 2:

Figure 112015118864862-pat00002

(v) calculating, as the Q-score, the standard deviation of the Z-scores of the remaining chromosomal regions of the arbitrary case 1, excluding the Z-scores of the regions corresponding to chromosomes 13, 18, and 21; and

(vi) determining a reference value for the Q-score and, when the calculated Q-score exceeds the reference value, regenerating the sequence information of the corresponding sample.

In the present invention, the region of the nucleic acid sequence specified in step (i) may be, but is not limited to, 20 kb to 1 Mb.

In the present invention, the mapping quality score of step (ii) may vary depending on the desired criterion, but is preferably 15 to 70, more preferably 50 to 70, and most preferably 60.

In the present invention, the GC ratio in the step (ii) may be varied depending on a desired standard, but is preferably 20 to 70%, and most preferably 30 to 60%.
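The two selection criteria of step (ii) can be illustrated with a minimal Python sketch. The function names are hypothetical, and treating the mapping quality score as a minimum and computing the GC ratio over unambiguous bases only are assumptions made for illustration, not details stated in the text.

```python
def gc_ratio(sequence):
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def satisfies_reference_values(sequence, mapq, min_mapq=60, gc_range=(0.30, 0.60)):
    """Assumed reading of step (ii): keep a sequence when its mapping quality
    reaches the threshold and its GC ratio lies inside the allowed range."""
    low, high = gc_range
    return mapq >= min_mapq and low <= gc_ratio(sequence) <= high
```

The defaults correspond to the most preferred values quoted above (a mapping quality score of 60 and a GC ratio of 30 to 60%); both can be relaxed to the wider preferred ranges.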

In the present invention, the reference value of the step (vi) may be 4, preferably 3, and most preferably 2.
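The Q-score check of steps (iii) to (vi) can be sketched as follows. Equations 1 and 2 survive only as image placeholders in this text, so the forms used here are assumptions: the chromosome fraction is taken as the reads on a chromosome divided by all reads, and the Z-score as the usual standardization against a set of reference samples. The sketch works per chromosome rather than per region, and all function names are hypothetical.

```python
from statistics import mean, stdev

# Chromosomes excluded from the Q-score, as stated in step (v).
EXCLUDED = {"chr13", "chr18", "chr21"}


def chromosome_fractions(read_counts):
    """Assumed form of Equation 1: reads on each chromosome / total reads."""
    total = sum(read_counts.values())
    return {chrom: n / total for chrom, n in read_counts.items()}


def z_scores(case_fractions, reference_fractions):
    """Assumed form of Equation 2: (case - reference mean) / reference SD.

    reference_fractions is a list of fraction dicts, one per reference sample.
    """
    scores = {}
    for chrom, frac in case_fractions.items():
        ref = [sample[chrom] for sample in reference_fractions]
        scores[chrom] = (frac - mean(ref)) / stdev(ref)
    return scores


def q_score(scores):
    """Standard deviation of the Z-scores, leaving out chromosomes 13, 18, 21."""
    return stdev([z for chrom, z in scores.items() if chrom not in EXCLUDED])


def passes_qc(q, reference_value=2.0):
    """A sample whose Q-score exceeds the reference value is re-sequenced."""
    return q <= reference_value
```

With the most preferred reference value of 2, a sample whose Z-scores on the non-target chromosomes scatter with a standard deviation above 2 would be flagged for regeneration of its sequence data.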

In the present invention, the case group refers to the samples in which the sex and chromosomal copy number of a fetus are to be detected, and the reference group means a group of reference chromosomes against which they can be compared, such as, but not limited to, a standard chromosome sequence database.

In the present invention, step d) may comprise:

(i) randomly selecting reference chromosomes from chromosomes 1 to 22;

(ii) calculating the fraction of an arbitrary chromosome N by Equation 3 below;

Equation 3:

Figure 112015118864862-pat00003

(iii) calculating the G-score of chromosome N of an arbitrary case 1 by Equation 4 below;

Equation 4:

Figure 112015118864862-pat00004

(iv) repeating steps (i) to (iii) to select the chromosome combination that maximizes the G-score difference between the normal group and the abnormal group; and

(v) calculating the G-score using the chromosome combination obtained in step (iv), and determining that the copy number is decreased if the calculated G-score is less than the reference value and increased if it is greater than the reference value.

In the present invention, the step (iv) may be repeatedly performed 100 times or more, preferably 1,000 times or more, and most preferably 100,000 times or more.

In the present invention, the reference value of the G-score in step (v) may be any value calculated from normal chromosomes, but is preferably -2 or 2, and most preferably -3 or 3.
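The random search over reference chromosome combinations in steps (i) to (v) can be sketched in Python. Equations 3 and 4 are available only as image placeholders, so this sketch assumes that the fraction of chromosome N is its read count divided by the summed read counts of the selected reference chromosomes, and that the G-score standardizes that fraction against the normal group. The function names, the fixed random seed, and the use of group means to measure the G-score difference are likewise assumptions.

```python
import random
from statistics import mean, stdev

AUTOSOMES = [f"chr{i}" for i in range(1, 23)]


def fraction(counts, target, refs):
    """Assumed Equation 3: target reads / reads on the reference chromosomes."""
    return counts[target] / sum(counts[c] for c in refs)


def g_score(counts, target, refs, normal_group):
    """Assumed Equation 4: standardize the fraction against the normal group."""
    normal = [fraction(sample, target, refs) for sample in normal_group]
    return (fraction(counts, target, refs) - mean(normal)) / stdev(normal)


def select_reference_combination(target, normal_group, abnormal_group,
                                 iterations=1000, seed=0):
    """Repeat the random selection and keep the combination that maximizes
    the absolute G-score difference between the two groups."""
    rng = random.Random(seed)
    candidates = [c for c in AUTOSOMES if c != target]
    best_refs, best_gap = None, -1.0
    for _ in range(iterations):
        refs = rng.sample(candidates, rng.randint(1, len(candidates)))
        normal_g = [g_score(s, target, refs, normal_group) for s in normal_group]
        abnormal_g = [g_score(s, target, refs, normal_group) for s in abnormal_group]
        gap = abs(mean(abnormal_g) - mean(normal_g))
        if gap > best_gap:
            best_refs, best_gap = sorted(refs), gap
    return best_refs, best_gap


def call_copy_number(g, reference_value=3.0):
    """Step (v): below -reference_value is a decrease, above +reference_value an increase."""
    if g > reference_value:
        return "increased"
    if g < -reference_value:
        return "decreased"
    return "normal"
```

The `iterations` argument corresponds to the number of repetitions of step (iv); the text prefers 100,000 or more, which this unoptimized sketch would handle slowly.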

In the present invention, the determination of the sex of the fetus in step d) may comprise:

(i) performing steps (i) to (iv) of the copy number determination above on a reference group of mothers whose fetal karyotype is 46,XX or 46,XY to determine G-score reference values for the X and Y chromosomes; and (ii) comparing the G-scores for the X and Y chromosomes of an arbitrary case with the reference values to determine the sex.

In the present invention, the G-score reference value for the X and Y chromosomes may be, but is not limited to, -2 or 2, and most preferably -3 or 3. The G- The X chromosome is determined to be three or more if the score is less than or equal to the reference value, and the Y chromosome is determined to be one or more if the G-score for the Y chromosome is equal to or greater than the reference value.

In the present invention, when the number of Y chromosomes is one or more, the fetal fraction of the X chromosome is calculated by Equation 5, the fetal fraction of the Y chromosome by Equation 6, and the ratio of the Y chromosome fraction to the X chromosome fraction by Equation 7. When this ratio is 0.7 to 1.4, the karyotype is determined to be XY, and when it is 1.4 to 2.6, it is determined to be XYY.

Equation 5:

Figure 112015118864862-pat00005

Equation 6:

Figure 112015118864862-pat00006

Equation 7:

Figure 112015118864862-pat00007
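The decision rule applied to the value of Equation 7 can be written as a short Python sketch. The equations themselves are only image placeholders here, so the sketch takes the ratio as an input. The function name is hypothetical, and two choices are assumptions: the shared boundary value 1.4 is assigned to XYY, and ratios outside both ranges are reported as undetermined.

```python
def classify_y_to_x_ratio(ratio):
    """Classify the Y-to-X fetal fraction ratio (the value of Equation 7)."""
    if 0.7 <= ratio < 1.4:
        return "XY"
    if 1.4 <= ratio <= 2.6:
        return "XYY"
    # The text gives no rule for ratios outside 0.7 to 2.6.
    return "undetermined"
```

For example, a ratio of 1.0 would be classified as XY and a ratio of 2.0 as XYY.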

In another aspect, the present invention relates to an apparatus for detecting the sex and copy number abnormalities of a fetus, comprising: a decoding unit for extracting DNA from a biological sample of a mother and decoding sequence information; an alignment unit for aligning the decoded sequences to a standard chromosome sequence database; a quality control unit for calculating a Q-score for the aligned sequence information (reads) and selecting only sequence information at or below a cut-off value; and a sex and copy number variation determining unit for calculating a G-score for the selected sequence information (reads) and comparing it with that of a reference chromosome combination to determine the sex and copy number variation of the fetus.

In the present invention, the reference chromosome combination may be, but is not limited to, chromosomes 4 and 6 when the selected sequence information is chromosome 13; chromosomes 4, 7, 10, and 16 when it is chromosome 18; chromosomes 7, 11, 14, and 22 when it is chromosome 21; chromosomes 16 and 20 when it is chromosome X; and chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 17, and 19 when it is chromosome Y.

In the present invention, the decoding unit may comprise: (i) a sample collection unit for obtaining a mixture of fetal and maternal nucleic acids from amniotic fluid obtained by amniocentesis, villi obtained by chorionic villus sampling, umbilical cord blood obtained by percutaneous umbilical blood sampling, tissue of a spontaneously miscarried fetus, or human peripheral blood; (ii) a nucleic acid acquisition unit for removing proteins, fats, and other residues from the collected mixture of fetal and maternal nucleic acids using a salting-out method, a column chromatography method, or a bead method to obtain purified nucleic acid; (iii) a library preparation unit for preparing a single-end sequencing or paired-end sequencing library from the purified nucleic acid or from nucleic acid randomly fragmented by enzymatic digestion, pulverization, or the HydroShear method; (iv) a next-generation sequencing unit for loading the prepared library onto a next-generation sequencer; and (v) a sequence information acquisition unit for acquiring the sequence information (reads) of the nucleic acid from the next-generation sequencer.

In the present invention, the next-generation sequencer may be, but is not limited to, the HiSeq system, the MiSeq system, or the Genome Analyzer (GA) system from Illumina, the 454 FLX from Roche, the SOLiD system from Applied Biosystems, or the Ion Torrent system from Life Technologies.

In the present invention, the alignment unit may perform the alignment using, but is not limited to, the BWA algorithm and the GRCh38 sequence.

In the present invention, the quality control unit may comprise:

(i) a region specifying unit for specifying the region of each aligned nucleic acid sequence;

(ii) a sequence specifying unit for selecting the sequences that satisfy the reference values for the mapping quality score and the GC ratio;

(iii) a chromosome fraction calculating unit for calculating the fraction of chromosome N among the selected sequences of an arbitrary case 1 by Equation 1 below;

Equation 1:

Figure 112015118864862-pat00008

(iv) a Z-score calculating unit for calculating the Z-score of the chromosome N region by Equation 2 below;

Equation 2:

Figure 112015118864862-pat00009

(v) a Q-score calculating unit for calculating, as the Q-score, the standard deviation of the Z-scores of the remaining chromosomal regions, excluding the Z-scores of the regions corresponding to chromosomes 13, 18, and 21; and

(vi) a control unit for determining a reference value for the Q-score and, when the calculated Q-score exceeds the reference value, regenerating the sequence information of the corresponding sample.

In the present invention, the region of the nucleic acid sequence specified by the region specifying unit may be, but is not limited to, 20 kb to 1 Mb.

In the present invention, the mapping quality score used by the sequence specifying unit may vary depending on the desired criterion, but is preferably 15 to 70, and most preferably 60.

In the present invention, the GC ratio used by the sequence specifying unit may vary depending on the desired criterion, but is preferably 20 to 70%, and most preferably 30 to 60%.

In the present invention, the reference value of the quality control unit may be 4, preferably 3, and most preferably 2.

In the present invention, the case group refers to the samples in which the sex and chromosomal copy number of a fetus are to be detected, and the reference group means a group of reference chromosomes against which they can be compared, such as, but not limited to, a standard chromosome sequence database.

In the present invention, the part of the sex and copy number variation determining unit that determines the copy number may comprise:

(i) a random permutation unit for randomly selecting reference chromosomes from chromosomes 1 to 22;

(ii) a chromosome fraction calculating unit for calculating the fraction of an arbitrary chromosome N by Equation 3 below;

Equation 3:

Figure 112015118864862-pat00010

(iii) a G-score calculating unit for calculating the G-score of chromosome N of an arbitrary case 1 by Equation 4 below;

Equation 4:

Figure 112015118864862-pat00011

(iv) a reference chromosome combination selection unit for selecting the chromosome combination that maximizes the G-score difference between the normal group and the abnormal group by repeating the operations of units (i) to (iii); and

(v) a copy number variation determining unit that calculates the G-score using the reference chromosome combination selected by the reference chromosome combination selection unit, and determines that the copy number is decreased if the calculated G-score is less than the reference value and increased if it is greater than the reference value.

In the present invention, the number of iterations performed by the reference chromosome combination selection unit may be 100 or more, preferably 1,000 or more, and most preferably 100,000 or more.

In the present invention, the reference value of the G-score used by the copy number variation determining unit may be any value calculated from normal chromosomes, but is preferably -2 or 2, and most preferably -3 or 3.

In the present invention, the part of the sex and copy number variation determining unit that determines the sex of the fetus may comprise: (i) a unit that applies units (i) to (iv) of the copy number determination above to a reference group of mothers whose fetal karyotype is 46,XX or 46,XY to obtain G-score reference values for the X and Y chromosomes; and (ii) a unit that compares the G-scores for the X and Y chromosomes of an arbitrary case with the reference values to determine the sex.

In the present invention, the G-score reference value for the X and Y chromosomes may be, but is not limited to, -2 or 2, and most preferably -3 or 3. The G- The X chromosome is determined to be three or more if the score is less than or equal to the reference value, and the Y chromosome is determined to be one or more if the G-score for the Y chromosome is equal to or greater than the reference value.

In the present invention, when the number of Y chromosomes is one or more, the fetal fraction of the X chromosome is calculated by Equation 5, the fetal fraction of the Y chromosome by Equation 6, and the ratio of the Y chromosome fraction to the X chromosome fraction by Equation 7. When this ratio is 0.7 to 1.4, the karyotype is determined to be XY, and when it is 1.4 to 2.6, it is determined to be XYY.

Equation 5:

Figure 112015118864862-pat00012

Equation 6:

Figure 112015118864862-pat00013

Equation 7:

Figure 112015118864862-pat00014

In another aspect, the present invention relates to a computer readable medium comprising instructions configured to be executed by a processor that detects the sex and copy number abnormalities of a fetus through the steps of: a) extracting DNA from a biological sample of a mother to obtain sequence information (reads); b) aligning the obtained sequence information to a reference genome database; c) calculating a Q-score for the aligned sequence information and selecting only sequence information at or below a cut-off value; and d) calculating a G-score for the selected sequence information and comparing it with that of a reference chromosome combination to determine the sex and copy number variation of the fetus.

Example

Hereinafter, the present invention will be described in more detail with reference to Examples. It is to be understood by those skilled in the art that these examples are for illustrative purposes only and that the scope of the present invention is not construed as being limited by these examples.

Example 1. DNA extraction from maternal blood and next-generation sequencing

Blood (10 ml) was collected in EDTA tubes from a total of 358 pregnant women and, within 2 hours, centrifuged at 1,200 g and 4 °C for 15 minutes to separate the plasma. The separated plasma was centrifuged again at 16,000 g and 4 °C for 10 minutes, and the plasma supernatant, excluding the precipitate, was collected. Cell-free DNA was extracted from the separated plasma using the QIAamp Circulating Nucleic Acid Kit, and 2-4 ng of DNA was prepared as a library and sequenced on a NextSeq instrument to generate sequence information data.

Example 2. Quality control of sequence information data

The following series of procedures was performed to preprocess the nucleotide sequence information of the mixed maternal and fetal genetic material before calculating the Z-score. The BCL files (containing the nucleotide sequence information) generated by the next-generation sequencer were converted to fastq format, and the library sequences in the fastq files were aligned to the reference genome Hg19 using the BWA-mem algorithm. Because errors can occur when the library sequences are aligned, three steps were performed to correct them. First, duplicated library sequences were removed. Second, among the library sequences aligned by the BWA-mem algorithm, those whose mapping quality score was not 60 were removed. Finally, regions with a mappability value of 0.75 or less were removed, and the number of library sequences aligned to each chromosome was corrected for GC ratio using the LOESS algorithm. Through this series of procedures, a bed file corrected for alignment errors was generated.
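The first filtering steps described above (duplicate removal, mapping quality, mappability) can be sketched in Python on already-parsed reads. The `Read` record and its fields are hypothetical, a duplicate is assumed to mean the same chromosome, start position, and strand, and the mappability of the region containing a read is assumed to be attached to the read. The LOESS GC correction is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical record for one aligned library sequence."""
    chrom: str
    start: int
    strand: str
    mapq: int
    mappability: float  # mappability of the region the read falls in


def filter_reads(reads, required_mapq=60, min_mappability=0.75):
    """Drop duplicates, reads whose mapping quality is not 60, and reads in
    regions with a mappability value of 0.75 or less."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key in seen:
            continue  # duplicated library sequence
        seen.add(key)
        if read.mapq != required_mapq:
            continue  # mapping quality score is not 60
        if read.mappability <= min_mappability:
            continue  # low-mappability region
        kept.append(read)
    return kept
```

In practice these filters would normally be applied to BAM files with standard alignment tools; the sketch only makes the order and the thresholds of the three criteria explicit.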

In order to manage the quality of sequencing error, the following series of processes were performed. First, we calculate the relative fraction of each chromosome. For example, the relative fraction of chromosome 1 can be expressed as:

Figure 112015118864862-pat00015

After calculating the relative fractions for all chromosomes, the Z score of chromosome N in case 1 can be expressed as:

Figure 112015118864862-pat00016

The standard deviation of the Z-score for the remaining chromosomal regions can be expressed as a Q-score, except for the Z-score for the regions corresponding to chromosomes 13, 18 and 21.

Therefore, when the standard deviation of the Z-score distribution of case 1 exceeds 2, the sample is judged as QC-fail (sequencing error), and the experiment is repeated and the data are regenerated. As a result of the above QC process, the distribution of the reads becomes constant, as shown in FIG.

Example 3. G-score calculation using permutation and determination of fetal sex and copy number abnormality

The following procedure was performed to calculate the G-score. First, we calculate the relative fraction of the chromosomes of interest. For example, the relative fractions of a particular chromosome can be expressed as:

Figure 112015118864862-pat00017

The relative fraction of such specific chromosomes may be expressed by the following equation (3).

Equation 3:

Figure 112015118864862-pat00018

For all chromosomes, the G-score of subject A can be expressed as follows.

Figure 112015118864862-pat00019

This G-score may be expressed by the following equation (4).

Equation 4:

Figure 112015118864862-pat00020

The absolute value of the difference between the G-score of chromosome N in a normal person and in subject A was obtained, and the reference chromosome combination that maximized this absolute value was determined by random assignment (permutation). When the number of random assignments was gradually increased and the results were compared, analyses with a large number of random assignments showed an improvement of more than 50%, as shown in Table 1.
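The random search for a reference chromosome combination can be sketched as follows. Equations 3 and 4 are only available as images here, so the forms below are assumptions: the fraction is taken as the reads on the chromosome of interest divided by the reads on the chosen reference chromosomes, and the G-score as that fraction standardised against a normal group. All function and parameter names are hypothetical.

```python
import random
from statistics import mean, stdev

AUTOSOMES = [str(i) for i in range(1, 23)]


def fraction(counts, target, refs):
    """Assumed form: target reads divided by reads on the reference chromosomes."""
    return counts[target] / sum(counts[r] for r in refs)


def g_score(counts, normal_group, target, refs):
    """Assumed form: fraction standardised against the normal group."""
    normal_fracs = [fraction(n, target, refs) for n in normal_group]
    sd = stdev(normal_fracs)
    if sd == 0:
        return None  # combination gives no usable spread in the normal group
    return (fraction(counts, target, refs) - mean(normal_fracs)) / sd


def search_reference_set(subject, normal_person, normal_group, target,
                         n_perm=1000, seed=0):
    """Randomly draw reference sets and keep the one with the largest
    absolute G-score difference between the subject and a normal person."""
    rng = random.Random(seed)
    candidates = [c for c in AUTOSOMES if c != target]
    best_refs, best_gap = None, 0.0
    for _ in range(n_perm):
        size = rng.randint(1, len(candidates))
        refs = tuple(sorted(rng.sample(candidates, size), key=int))
        g_subject = g_score(subject, normal_group, target, refs)
        g_normal = g_score(normal_person, normal_group, target, refs)
        if g_subject is None or g_normal is None:
            continue
        gap = abs(g_subject - g_normal)
        if gap > best_gap:
            best_refs, best_gap = refs, gap
    return best_refs, best_gap
```

Because the best combination found so far is never discarded, increasing `n_perm` with the same seed can only keep or enlarge the separation, which mirrors the trend of larger permutation counts giving equal or better results.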

Table 1. Random assignment analysis of chromosomes 13, 18 and 21 (columns: number of permutations performed)

Chromosome  100     500     1,000   1,500   2,000   5,000   10,000  15,000  50,000  100,000
13          6.903   8.142   9.361   9.361   8.955   9.361   8.955   9.361   9.361   9.361
18          -0.52   -0.09   -0.025  -0.012  -0.025  -0.025  0.122   0.136   0.128   0.136
21          1.051   1.352   1.343   1.364   1.168   1.201   1.352   1.374   1.377   1.532

The reference chromosome combination can change with the optimization performed in each analysis. The combinations that were detected at least 5 times in 10 runs when determining the G-scores of chromosomes 13, 18, 21, X and Y are shown in Table 2.

Table 2. Main reference chromosome combinations used in the G-score calculation for chromosomes 13, 18, 21, X and Y

Chromosome of interest  Reference chromosome combination
13                      4, 6
18                      4, 7, 10, 16
21                      7, 11, 14, 22
X                       16, 20
Y                       1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 17, 19

To determine whether the chromosome of interest in the test sample is aneuploid, the G-score range of the normal group is calculated and set. When an outlier is found outside the range between the maximum and minimum values of the normal-group G-score, the following applies: if the G-score is greater than the maximum value of the normal group, it is determined that the copy number of the corresponding chromosome has increased; if it is smaller than the minimum value of the normal group, it is determined that the copy number of the corresponding chromosome has been lost. Comparing the chromosomal aberration groups (Trisomy 21, Trisomy 18, Trisomy 13) with the normal group by this method confirmed that the maximum/minimum G-score ranges of the aberration groups and the normal group did not overlap. As shown in Table 3, when the G-score reference values for chromosomal aberration were 3 (Trisomy 21), 2.55 (Trisomy 18) and 3.5 (Trisomy 13), sensitivity and specificity were both 100%, and the lower bound of the 95% confidence interval of the specificity was over 98%.
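The decision rule can be sketched in a few lines. The reference values 3, 2.55 and 3.5 are taken from the text; the function names, the strict "greater than" comparison, and the labels "gain", "loss" and "normal" are assumptions made for the illustration.

```python
# G-score reference values for trisomy calls, as quoted in the text.
G_SCORE_CUTOFFS = {"21": 3.0, "18": 2.55, "13": 3.5}


def call_trisomy(g_scores, cutoffs=G_SCORE_CUTOFFS):
    """Flag a chromosome when its G-score exceeds the reference value
    (strict comparison is an assumption)."""
    return {chrom: g_scores[chrom] > cutoff for chrom, cutoff in cutoffs.items()}


def classify_against_normal_range(g, normal_scores):
    """Compare one G-score with the min/max range of the normal group."""
    low, high = min(normal_scores), max(normal_scores)
    if g > high:
        return "gain"
    if g < low:
        return "loss"
    return "normal"
```

For example, `call_trisomy({"21": 9.4, "18": 0.1, "13": 3.5})` flags only chromosome 21 under these assumptions, because a G-score equal to the reference value is not treated as exceeding it.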

Table 3. Sensitivity and specificity of chromosomal abnormality detection by the G-score calculation method

Chromosomal abnormality  Sensitivity (95% confidence interval)  Specificity (95% confidence interval)
Trisomy 21 (n = 42)      100% (91.62-100.0%)                    100% (98.80-100.0%)
Trisomy 18 (n = 21)      100% (84.54-100.0%)                    100% (98.87-100.0%)
Trisomy 13 (n = 3)       100% (43.85-100.0%)                    100% (98.93-100.0%)
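The text does not say how the 95% confidence intervals were computed. As a worked check, the sketch below uses the Wilson score interval, which is an assumption on our part; with all cases detected (42/42, 21/21, 3/3) it gives lower bounds of 91.62%, 84.54% and 43.85%, matching the sensitivity intervals reported for Table 3.

```python
from math import sqrt

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for a binomial proportion, returned as (low, high)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


for n in (42, 21, 3):
    low, high = wilson_interval(n, n)
    print(f"n = {n}: {low * 100:.2f}% to {high * 100:.1f}%")
```

The wide interval for Trisomy 13 reflects the small number of cases (n = 3) rather than any weakness in the point estimate.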

While the present invention has been described in detail with reference to specific embodiments, those skilled in the art will appreciate that these specific descriptions are merely preferred embodiments and that the scope of the present invention is not limited thereto. Accordingly, the actual scope of the present invention is defined by the appended claims and their equivalents.

Claims (16)

a) extracting DNA from a biological sample of a mother to obtain sequence information;
b) aligning the obtained sequence information to a reference genome database;
c) calculating the Q-score for the aligned sequence information by the method of (c-i) to (c-vi) below, and selecting only the sequence information whose Q-score is less than the cut-off value;
(c-i) specifying the region of each aligned nucleic acid sequence;
(c-ii) specifying sequences that satisfy the reference values of the mapping quality score and the GC ratio;
(c-iii) calculating the fraction of chromosome N among the specified sequences in any case 1 by the following Equation 1;
Equation 1:
Figure 112016090873536-pat00032

(c-iv) calculating the Z-score of the chromosome N region by the following equation 2;
Equation 2:
Figure 112016090873536-pat00033

(c-v) defining, as the Q-score, the standard deviation of the Z-scores of the remaining chromosomal regions in any case 1, excluding the Z-scores of the regions corresponding to chromosomes 13, 18 and 21; and
(c-vi) determining a reference value of the Q-score, determining whether the calculated Q-score is below that reference value, and regenerating the sequence information (reads) of the sample when it is not; and
d) calculating the G-score for the selected sequence information by the method of (d-i) to (d-vii) below and comparing it with the reference chromosome combination to determine the fetal sex and copy number abnormality;
(d-i) randomly selecting reference chromosomes from chromosomes 1 to 22;
(d-ii) calculating the fraction of an arbitrary chromosome N by the following Equation 3;
Equation 3:
Figure 112016090873536-pat00034

(d-iii) calculating the G-score of the chromosome N of the case 1 by the following equation (4);
Equation 4:
Figure 112016090873536-pat00035

(d-iv) repeating steps (d-i) to (d-iii) to select the chromosome combination that maximizes the G-score difference between the normal and abnormal groups;
(d-v) calculating the G-score using the chromosome combination obtained in step (d-iv), and determining that the copy number is decreased when the calculated G-score is less than the reference value of the G-score and that the copy number is increased when it exceeds the reference value;
(d-vi) performing steps (d-i) to (d-iv) in a reference group of mothers whose fetal karyotype is 46, XX or 46, XY to obtain the G-score reference values for the X and Y chromosomes; and
(d-vii) determining the sex by comparing the G-score for the X and Y chromosomes in any case with the G-score reference values of step (d-vi);
A method for detecting fetal sex and copy number abnormality, comprising the above steps.
The method according to claim 1, wherein the reference chromosome combination in step d) is chromosomes 4 and 6 when the selected sequence information is for chromosome 13; chromosomes 4 and 6 in the case of chromosome 18; chromosomes 7, 11, 14 and 22 in the case of chromosome 21; chromosomes 16 and 20 in the case of the X chromosome; and chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 17 and 19 in the case of the Y chromosome.
The method according to claim 1, wherein step a) comprises:
(i) obtaining the mixture of fetal and maternal nucleic acids from amniotic fluid obtained by amniocentesis, villi obtained by chorionic villus sampling, umbilical cord blood obtained by percutaneous umbilical blood sampling, spontaneously miscarried fetal tissue, or human peripheral blood;
(ii) removing proteins, fats and other residues from the collected mixture of fetal and maternal nucleic acids using a salting-out method, a column chromatography method and a beads method to obtain purified nucleic acid;
(iii) producing a single-end sequencing or paired-end sequencing library from the purified nucleic acid, or from nucleic acid randomly fragmented by enzymatic cleavage, disruption or the hydroshear method;
(iv) reacting the constructed library in a next-generation sequencer; and
(v) obtaining the sequence information of the nucleic acid from the next-generation sequencer.
(Deleted)
The method according to claim 1, wherein the mapping quality score is 15 to 70 and the GC ratio is 30 to 60%.
The method according to claim 1, wherein the reference value of the Q-score in step (c-vi) is 4.
(Deleted)
(Deleted)
The method according to claim 1, wherein the fetus is determined to be XO if the G-score for the X chromosome is less than the reference value of the G-score, and the number of X chromosomes is determined to be 3 or more if the G-score exceeds the reference value.
The method according to claim 1, wherein, when the number of Y chromosomes is one or more, the fetal fraction of the X chromosome is calculated by Equation 5, the fetal fraction of the Y chromosome is calculated by Equation 6, and the ratio of the Y chromosome fetal fraction to the X chromosome fetal fraction is calculated by Equation 7, the fetus being determined to be XY when the ratio is 0.7 to 1.4 and XYY when the ratio is 1.4 to 2.6.
Equation 5:
Figure 112016090873536-pat00025

Equation 6:
Figure 112016090873536-pat00026

Equation 7:
Figure 112016090873536-pat00027

11. The method according to any one of claims 1 to 10, wherein the reference value of the G-score for the X chromosome is -2 or 2.
The method according to claim 1, wherein the number of repetitions of the step (d-iv) is 100 or more.
A reading unit for extracting DNA from a biological sample of a mother to decode the sequence information;
An alignment unit for aligning the decoded sequence to a standard chromosome sequence database;
A quality control unit for calculating the Q-score for the aligned sequence information by the following methods (i) to (vi), and selecting only the sequence information of samples whose Q-score is less than the cut-off value;
(i) identifying the region of each aligned nucleic acid sequence;
(ii) specifying sequences that satisfy the reference values of the mapping quality score and the GC ratio;
(iii) calculating the fraction of chromosome N among the specified sequences in any case 1 by the following Equation 1;
Equation 1:
Figure 112016090873536-pat00036

(iv) calculating the Z-score of the chromosome N region by the following formula 2;
Equation 2:
Figure 112016090873536-pat00037

(v) defining, as the Q-score, the standard deviation of the Z-scores of the remaining chromosomal regions in any case 1, excluding the Z-scores of the regions corresponding to chromosomes 13, 18 and 21; and
(vi) determining a reference value of the Q-score, determining whether the calculated Q-score is below that reference value, and regenerating the sequence information of the sample when it is not; and
A sex and copy number abnormality determination unit for calculating the G-score for the selected sequence information by the following methods (a) to (g) and comparing it with the reference chromosome combination to determine fetal sex and copy number abnormality;
(a) randomly selecting reference chromosomes from chromosomes 1 to 22;
(b) calculating the fraction of an arbitrary chromosome N by the following Equation 3;
Equation 3:
Figure 112016090873536-pat00038

(c) calculating the G-score of the chromosome N in the case 1 by the following equation 4;
Equation 4:
Figure 112016090873536-pat00039

(d) repeating steps (a) to (c) to select the chromosome combination that maximizes the G-score difference between the normal and abnormal groups;
(e) calculating the G-score using the chromosome combination obtained in step (d), and determining that the copy number is decreased when the calculated G-score is less than the reference value of the G-score and that the copy number is increased when it exceeds the reference value;
(f) performing steps (a) to (d) in a reference group of mothers whose fetal karyotype is 46, XX or 46, XY to obtain the G-score reference values for the X and Y chromosomes; and
(g) comparing the G-score for the X and Y chromosomes in any case with the G-score reference values of step (f) to determine the sex;
A device for detecting fetal sex and copy number abnormality, comprising the above units.
A computer-readable medium comprising instructions configured to be executed by a processor that detects fetal sex and copy number abnormality, the instructions comprising:
a) extracting DNA from a biological sample of a mother to obtain sequence information;
b) aligning the obtained sequence information to a reference genome database;
c) calculating the Q-score for the aligned sequence information by the method of (c-i) to (c-vi) below, and selecting only the sequence information whose Q-score is less than the cut-off value;
(c-i) specifying the region of each aligned nucleic acid sequence;
(c-ii) specifying sequences that satisfy the reference values of the mapping quality score and the GC ratio;
(c-iii) calculating the fraction of chromosome N among the specified sequences in any case 1 by the following Equation 1;
Equation 1:
Figure 112016090873536-pat00040

(c-iv) calculating the Z-score of the chromosome N region by the following equation 2;
Equation 2:
Figure 112016090873536-pat00041

(c-v) defining, as the Q-score, the standard deviation of the Z-scores of the remaining chromosomal regions in any case 1, excluding the Z-scores of the regions corresponding to chromosomes 13, 18 and 21; and
(c-vi) determining a reference value of the Q-score, determining whether the calculated Q-score is below that reference value, and regenerating the sequence information (reads) of the sample when it is not; and
d) calculating the G-score for the selected sequence information by the method of (d-i) to (d-vii) below and comparing it with the reference chromosome combination to determine the fetal sex and copy number abnormality;
(d-i) randomly selecting reference chromosomes from chromosomes 1 to 22;
(d-ii) calculating the fraction of an arbitrary chromosome N by the following Equation 3;
Equation 3:
Figure 112016090873536-pat00042

(d-iii) calculating the G-score of the chromosome N of the case 1 by the following equation (4);
Equation 4:
Figure 112016090873536-pat00043

(d-iv) repeating steps (d-i) to (d-iii) to select the chromosome combination that maximizes the G-score difference between the normal and abnormal groups;
(d-v) calculating the G-score using the chromosome combination obtained in step (d-iv), and determining that the copy number is decreased when the calculated G-score is less than the reference value of the G-score and that the copy number is increased when it exceeds the reference value;
(d-vi) performing steps (d-i) to (d-iv) in a reference group of mothers whose fetal karyotype is 46, XX or 46, XY to obtain the G-score reference values for the X and Y chromosomes; and
(d-vii) comparing the G-score for the X and Y chromosomes in any case with the G-score reference values of step (d-vi) to determine the sex;
wherein the instructions are configured to be executed by a computer.
14. The method according to claim 13, wherein the reference value of the G-score for the X chromosome is -2 or 2.
15. The computer-readable medium of claim 14, wherein the reference value of the G-score for the X chromosome is -2 or 2.
KR1020150172212A 2015-12-04 2015-12-04 Copy Number Variation Determination Method Using Sample comprising Nucleic Acid Mixture KR101686146B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150172212A KR101686146B1 (en) 2015-12-04 2015-12-04 Copy Number Variation Determination Method Using Sample comprising Nucleic Acid Mixture

Publications (1)

Publication Number Publication Date
KR101686146B1 true KR101686146B1 (en) 2016-12-13

Family

ID=57575050

Country Status (1)

Country Link
KR (1) KR101686146B1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150070111A (en) * 2012-08-30 2015-06-24 프리마이타 헬스 엘티디 Method of detecting chromosomal abnormalities

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709276A (en) * 2017-01-21 2017-05-24 深圳昆腾生物信息有限公司 Genovariation cause analysis method and system
KR20180098438A (en) * 2017-02-24 2018-09-04 에스디지노믹스 주식회사 Priority calculation method for copy number variation candidate
KR101957909B1 (en) 2017-02-24 2019-03-15 에스디지노믹스 주식회사 Priority calculation method for copy number variation candidate
WO2020022733A1 (en) * 2018-07-27 2020-01-30 주식회사 녹십자지놈 Whole genome sequencing-based chromosomal abnormality detection method and use thereof
WO2021034034A1 (en) 2019-08-19 2021-02-25 주식회사 녹십자지놈 Method for detecting chromosomal abnormality by using information about distance between nucleic acid fragments
WO2021107676A1 (en) * 2019-11-29 2021-06-03 주식회사 녹십자지놈 Artificial intelligence-based chromosomal abnormality detection method
JP7539985B2 (en) 2019-11-29 2024-08-26 ジーシー ジェノム コーポレーション Artificial intelligence-based method for detecting chromosomal abnormalities
WO2022097844A1 (en) * 2020-11-04 2022-05-12 국립암센터 Method for predicting survival prognosis of pancreatic cancer patients by using gene copy number variation information
WO2022250512A1 (en) * 2021-05-28 2022-12-01 한국과학기술원 Artificial intelligence-based method for early diagnosis of cancer, using cell-free dna distribution in tissue-specific regulatory region
WO2024158118A1 (en) * 2023-01-26 2024-08-02 지놈케어 주식회사 Method for detecting fetal copy number variations based on virtual positive and negative data

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant