KR101907650B1

KR101907650B1 - Method of non-invasive trisomy detection of fetal aneuploidy

Info

Publication number: KR101907650B1
Application number: KR1020160157593A
Authority: KR
Inventors: 윤태균; 이병철; 박정선; 박동윤; 이정호
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2016-11-24
Filing date: 2016-11-24
Publication date: 2018-10-12
Also published as: KR20170036648A

Abstract

Provided is a non-invasive fetal chromosome analysis method for determining fetal chromosomal aneuploidy using chromosome sequencing information obtained from a biological sample isolated from a pregnant woman.

Description

METHOD OF NON-INVASIVE TRISOMY DETECTION OF FETAL ANEUPLOIDY BACKGROUND OF THE INVENTION 1. Field of the Invention

Provided is a non-invasive fetal chromosome analysis method for determining fetal chromosomal aneuploidy using chromosome sequencing information obtained from a biological sample isolated from a pregnant woman.

Recently, interest in prenatal diagnosis has been growing steadily, driven by rising maternal age and the development of various prenatal diagnostic instruments.

Prenatal diagnostic methods fall broadly into invasive and non-invasive methods. Invasive methods include amniocentesis, percutaneous umbilical blood sampling, chorionic villus sampling, and fetal tissue sampling; because they subject the fetus to physical stress during the procedure, they can cause miscarriage, disease, or malformation. Non-invasive diagnostic methods are being developed to overcome these problems.

The discovery of cell-free fetal DNA (cffDNA) within the cell-free DNA (cfDNA) of maternal serum provided a powerful tool for developing non-invasive prenatal genetic diagnostics. The application of cffDNA to prenatal diagnosis was further accelerated by the introduction of massively parallel sequencing technologies such as next-generation sequencing (NGS).

In addition, several studies using whole-genome sequencing (WGS) and sequencing after target enrichment of cffDNA have demonstrated that fetal and maternal DNA are uniformly distributed across the entire genome (Lo YM et al., Science Translational Medicine 2010;2:61ra91; Liao GJ et al., Clinical Chemistry 2011;57:92-101; Kitzman JO et al., Science Translational Medicine 2012;4:137ra76).

Based on these studies, methods have been proposed for testing fetal chromosomal abnormalities from the maternal and fetal cfDNA that coexist in maternal blood (e.g., plasma or serum). However, because the amount of fetal cfDNA in maternal blood is relatively very small, the usual approach is to generate a large number of NGS reads for the determination. Generating many NGS reads raises experimental cost, so a means of determination is needed that remains sensitive to fetal chromosomal abnormalities even at extremely low read counts. In addition, the sequencer, library preparation, GC content, and similar factors introduce bias into massively parallel sequencing data such as NGS data, so this bias must also be removed for a more accurate determination.

Accordingly, accurate prenatal diagnosis of fetal chromosomal abnormalities requires a chromosome analysis technique that remains sensitive even at low read counts and that removes bias from the data to yield more accurate results.

Korean Patent No. 10-1516976

One example provides a non-invasive fetal chromosome analysis method for determining fetal chromosomal aneuploidy using chromosome sequence information obtained from a biological sample isolated from a pregnant woman.

In this specification, the non-invasive fetal chromosome analysis method may also be described as a method of analyzing sequence information to determine (discriminate, confirm, or diagnose) fetal chromosomal aneuploidy, or as a method of providing information for determining (discriminating, confirming, or diagnosing) fetal chromosomal aneuploidy; these expressions all have the same meaning.

To determine accurately whether a fetus has a chromosomal aneuploidy from DNA sequence information obtained from a biological sample isolated from the pregnant woman, the non-invasive fetal chromosome analysis method compares the average read count of the specific chromosome to be tested for aneuploidy (e.g., chromosome 13, 18, or 21) with the average read count in merged bins generated from chromosomes other than that chromosome, thereby removing run-to-run variation. It then uses ratios of inter-chromosome read counts, weighted-averaged by their CV (coefficient of variation) values, to improve the reliability and specificity of the result and reduce the false-positive rate.

The DNA sequence information obtained from the biological sample isolated from the pregnant woman may be data generated by whole-genome sequencing (WGS) using massively parallel sequencing such as next-generation sequencing (NGS).

In one embodiment, the non-invasive fetal chromosome analysis method may comprise the following steps:

1-1) obtaining sequence information of polynucleotide fragments covering the whole genome from a test sample isolated from a pregnant woman;

1-2) preparing sequence information of polynucleotide fragments covering the whole genome of reference samples;

2-1) mapping the sequence information of the polynucleotide fragments of the test sample obtained in step 1-1) to a reference genome sequence, and determining test polynucleotide fragment counts such that each chromosome has a preset number of bins (bin number);

2-2) determining reference polynucleotide fragment counts, with the preset number of bins, using the sequence information of the polynucleotide fragments of the reference samples prepared in step 1-2);

3-1) among the test polynucleotide fragment counts, calculating the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer from 1 to 21) selected from the chromosomes other than the target chromosome, to obtain average test polynucleotide fragment count ratios (as many ratios are obtained as there are merged bins);

3-2) among the reference polynucleotide fragment counts, calculating the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer from 1 to 21) selected from the chromosomes other than the target chromosome, to obtain average reference polynucleotide fragment count ratios (the number of ratios obtained equals the number of reference samples multiplied by the number of merged bins);

4) obtaining a CV (coefficient of variation) value for each average reference polynucleotide fragment count ratio;

5-1) selecting, from the average test polynucleotide fragment count ratios of step 3-1), the values corresponding to the N_CV ratios with the lowest CV values, and obtaining a weighted average test polynucleotide fragment count ratio;

5-2) obtaining a weighted average reference polynucleotide fragment count ratio from the average reference polynucleotide fragment count ratios of step 3-2), using the values corresponding to the N_CV lowest-CV ratios selected in step 5-1); and

6) comparing the obtained weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio.
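As a rough illustration of steps 3-1) to 5-2), the following Python sketch computes the ratios against merged bins, the per-ratio CV over the reference samples, and a weighted average over the N_CV lowest-CV ratios. The function names, the representation of a merged bin as a plain list of per-bin read counts, and the 1/CV weights are assumptions made for this sketch; the steps above do not specify the weighting formula.

```python
from statistics import mean, stdev


def ratios_to_merged_bins(target_counts, merged_bins):
    """Ratio of the target chromosome's mean read count to each merged bin's mean read count."""
    target_mean = mean(target_counts)
    return [target_mean / mean(bin_counts) for bin_counts in merged_bins]


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_low_cv(reference_ratios, n_cv):
    """Return the indices of the n_cv lowest-CV ratios and the CV of every ratio.

    reference_ratios has one row per reference sample and one column per merged bin.
    """
    n_bins = len(reference_ratios[0])
    cvs = [
        coefficient_of_variation([sample[j] for sample in reference_ratios])
        for j in range(n_bins)
    ]
    order = sorted(range(n_bins), key=lambda j: cvs[j])
    return order[:n_cv], cvs


def weighted_ratio(ratios, selected, cvs):
    """Weighted average of the selected ratios (assumed weights: 1 / CV)."""
    weights = [1.0 / cvs[j] for j in selected]
    return sum(w * ratios[j] for w, j in zip(weights, selected)) / sum(weights)
```

The same `selected` indices and `cvs` would be applied to the test sample and to each reference sample, so that both weighted ratios are built from the same merged bins.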

In one example, the comparison of step 6) may be performed by obtaining a Z-score for the target chromosome using the weighted average test polynucleotide fragment count ratio and the weighted average reference polynucleotide fragment count ratio.
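A minimal sketch of such a Z-score, assuming the conventional definition (test value minus the reference mean, divided by the reference sample standard deviation) and assuming one weighted average ratio per reference sample; the function name is hypothetical and the text does not state the exact formula used.

```python
from statistics import mean, stdev


def z_score(test_ratio, reference_ratios):
    """Standard score of the test sample's weighted ratio against the reference samples."""
    return (test_ratio - mean(reference_ratios)) / stdev(reference_ratios)
```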

In one example, the non-invasive fetal chromosome analysis method may further comprise, after step 6), the following step:

7) determining whether the target chromosome of the fetus is aneuploid, using the result (e.g., a Z-score) of comparing the weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio obtained in step 6).
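The decision in step 7) could be sketched as a simple cutoff on the Z-score. The cutoff of 3.0 and the one-sided test (an excess of reads suggesting trisomy) are assumptions chosen for illustration; no cutoff is given in this part of the text.

```python
def suspect_trisomy(z, threshold=3.0):
    """Flag the target chromosome when its Z-score reaches a hypothetical cutoff."""
    return z >= threshold
```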

In the non-invasive fetal chromosome analysis method, steps 1-1) and 1-2) may be performed simultaneously or consecutively in either order; the same applies to steps 2-1) and 2-2), and to steps 3-1) and 3-2).

In one example, to obtain more accurate results, the non-invasive fetal chromosome analysis method may further comprise, after steps 2-1) and 2-2) (and before steps 3-1) and 3-2)), a) removing bias from the obtained test polynucleotide fragment counts and reference polynucleotide fragment counts. The bias removal may be performed by applying, for example, singular value decomposition (SVD).
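One common way to use SVD for this purpose, given here only as a sketch: arrange the binned read counts as a matrix X with one row per sample and one column per bin, decompose it, and subtract the leading components, which tend to capture systematic effects shared across samples. The matrix layout, the symbols, and the number k of removed components are assumptions; the text names SVD without specifying the procedure.

```latex
X = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad
\tilde{X} = X - \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \quad k < r
```

Here r is the rank of X, the singular values are ordered so that sigma_1 is the largest, and the corrected matrix (X with a tilde) would replace the raw counts in steps 3-1) and 3-2).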

The chromosomes may be autosomes and, in humans, may be selected from the group consisting of chromosomes 1 to 22. The 'target chromosome' is the chromosome whose aneuploidy in the fetus is to be checked; it may be, for example, human chromosome 13, 18, or 21, but is not limited thereto and may be selected from any autosome to be checked for aneuploidy. The 'n chromosomes selected from the chromosomes other than the target chromosome' are chromosomes selected from the autosomes that remain after excluding the target chromosome (n is an integer from 1 to 21).

The test sample isolated from the pregnant woman may be blood, plasma, or serum isolated from her. A pregnant woman to whom the non-invasive fetal chromosome analysis method proposed herein is applicable may be one whose own target chromosome is normal, that is, who does not herself have an aneuploidy of the target chromosome.

Another example provides a computer-readable method for determining fetal chromosomal aneuploidy, comprising the following steps:

A-1) mapping the sequence information of the polynucleotide fragments of a test sample to a reference genome sequence, and determining test polynucleotide fragment counts such that each chromosome has a preset number of bins (bin number);

A-2) determining reference polynucleotide fragment counts, with the preset number of bins, using the sequence information of the polynucleotide fragments of reference samples;

B-1) among the test polynucleotide fragment counts, calculating the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer from 1 to 21) selected from the chromosomes other than the target chromosome, to obtain average test polynucleotide fragment count ratios;

B-2) among the reference polynucleotide fragment counts, calculating the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer from 1 to 21) selected from the chromosomes other than the target chromosome, to obtain average reference polynucleotide fragment count ratios (the number of ratios obtained equals the number of reference samples multiplied by the number of merged bins);

C) obtaining a CV (coefficient of variation) value for each average reference polynucleotide fragment count ratio;

D-1) selecting, from the average test polynucleotide fragment count ratios of step B-1), the values corresponding to the N_CV ratios with the lowest CV values, and obtaining a weighted average test polynucleotide fragment count ratio;

D-2) obtaining a weighted average reference polynucleotide fragment count ratio from the average reference polynucleotide fragment count ratios of step B-2), using the values corresponding to the N_CV lowest-CV ratios selected in step D-1);

E) comparing the obtained weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio; and

F) determining whether the target chromosome of the fetus is aneuploid, using the result (e.g., a Z-score) of comparing the weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio obtained in step E).

To obtain more accurate results, the computer-readable method may further comprise, after steps A-1) and A-2) (and before steps B-1) and B-2)), a) removing bias from the obtained test polynucleotide fragment counts and reference polynucleotide fragment counts. The bias removal may be performed by applying, for example, singular value decomposition (SVD).

Another example provides a computer program stored on a computer-readable storage medium for executing the steps of the computer-readable method.

Another example provides a computer-readable storage medium (or recording medium) containing computer-executable instructions for executing the steps of the computer-readable method.

Definition of Terms

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

"Chromosomal aneuploidy" means that the number of copies of a target chromosome differs from the normal number (two), that is, that 0, 1, or 3 or more (e.g., 3) copies of the target chromosome are present. Such aneuploidy is very important in fetal diagnosis because it is associated with rare genetic disorders. For example, in humans, three copies of chromosome 13 (trisomy 13) cause Patau syndrome, three copies of chromosome 18 (trisomy 18) cause Edwards syndrome, and three copies of chromosome 21 (trisomy 21) cause Down syndrome.

"Reference genome sequence" refers to a genome sequence database that represents a species. A current human reference genome may be one built on a published (e.g., by UCSC or NCBI) reference genome sequence such as build 37 (GRCh37), hg18, hg19, or hg38.

"Massively parallel sequencing" is a general term for sequencing methods that randomly break a genome into a very large number of pieces (polynucleotide fragments), read the sequence of each piece simultaneously, and then assemble the resulting sequence data using bioinformatics to decode vast amounts of genomic information rapidly. Further description of massively parallel sequencing can be found in Rogers and Venter, Nature (2005) 437:326-327.

Unless otherwise defined, the term "about" preceding a numerical value herein may be used to include a variation (increase or decrease) of 10%, 5%, or 3% of the stated value.

Hereinafter, the present invention is described in more detail.

Step 1): Obtaining sequence information of polynucleotide fragments covering the whole genome

The sequence information of the polynucleotide fragments can be obtained by sequencing template DNA selected from the sample.

The polynucleotide fragments are assigned to specific positions on each chromosome by mapping to a reference genome sequence, and together they cover the whole genome.

The sequences of the polynucleotide fragments may be obtained by a massively parallel sequencing method, for example next-generation sequencing. In this case, a polynucleotide fragment is a read used in next-generation sequencing, the polynucleotide fragment count is a read count, and the average polynucleotide fragment count may be an average read count.

In one embodiment, the polynucleotide fragments or reads may have a length of about 10 to about 2000 bp, about 10 to about 1000 bp, about 10 to about 500 bp, about 10 to about 300 bp, about 10 to about 200 bp, about 25 to about 2000 bp, about 25 to about 1000 bp, about 25 to about 500 bp, about 25 to about 300 bp, about 25 to about 200 bp, about 25 to about 100 bp, about 50 to about 2000 bp, about 50 to about 1000 bp, about 50 to about 500 bp, about 50 to about 300 bp, about 50 to about 200 bp, about 50 to about 100 bp, about 100 to about 2000 bp, about 100 to about 1000 bp, about 100 to about 500 bp, about 100 to about 300 bp, about 100 to about 200 bp, about 150 to about 2000 bp, about 150 to about 1000 bp, about 150 to about 500 bp, or about 150 to about 300 bp, and their lengths may be the same or different. For example, each of the polynucleotide fragments or reads may independently have a length of about 100 bp, about 200 bp, about 300 bp, about 400 bp, about 500 bp, or about 1000 bp.

Polynucleotide fragments that are assigned to more than one chromosome and/or polynucleotide fragments that are not assigned to any chromosome may be ignored and left out of the subsequent steps.
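A small sketch of this filtering, reading the sentence above as excluding reads that map to more than one chromosome as well as reads that map to none. The input structure (a dictionary from read ID to the set of chromosomes the read was assigned to) and the function name are hypothetical; a real pipeline would derive this from the aligner's output.

```python
def uniquely_assigned(read_hits):
    """Keep only reads assigned to exactly one chromosome.

    read_hits maps a read ID to the set of chromosomes it was assigned to.
    Returns a mapping from read ID to its single chromosome.
    """
    return {
        read_id: next(iter(chroms))
        for read_id, chroms in read_hits.items()
        if len(chroms) == 1
    }
```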

The massively parallel sequencing may be performed using, for example, the 454 platform (Margulies et al., Nature (2005) 437:376-380), Illumina Genome Analyzer (or Solexa™ platform), Illumina HiSeq2000, HiSeq2500, MiSeq, NextSeq500, Life Tech Ion PGM, Ion Proton, Ion S5, Ion S5XL, SOLiD (Applied Biosystems), Helicos True Single Molecule DNA sequencing technology (Harris et al., Science (2008) 320:106-109), or the single molecule, real-time (SMRT™) technology of Pacific Biosciences. In addition, the massively parallel sequencing possible with nanopore sequencing (Soni and Meller, Clin Chem (2007) 53:1996-2001) enables many nucleic acid molecules isolated from a specimen to be sequenced in a parallel, highly multiplexed manner (Dear, Brief Funct Genomic Proteomic (2003) 1:397-416). Each of these platforms sequences clonally expanded or unamplified single molecules of nucleic acid fragments. Sequence information of the polynucleotide fragments can be obtained using commercially available sequencing instruments.

It will be apparent to those skilled in the art that the sequencing may also be performed by various other known sequencing methods and/or variations thereof.

1-1) Obtaining sequence information of polynucleotide fragments covering the whole genome from the test sample

The test sample isolated from the pregnant woman may be blood, plasma, or serum isolated from her. The pregnant woman may be a human female and may be one whose own target chromosome (the chromosome to be checked for aneuploidy) is normal, that is, who does not herself have an aneuploidy of the target chromosome. The blood, plasma, or serum can be isolated by conventional methods and may have been isolated from the pregnant woman at weeks 8-12, 12-16, 16-20, 20-24, 24-28, 28-32, 32-36, 36-40, or 40-44 of pregnancy, for example between weeks 8 and 28.

The step of obtaining sequence information of polynucleotide fragments covering the whole genome of the test sample may be performed by:

i) performing massively parallel sequencing, such as next-generation sequencing, on the test sample; or

ii) preparing the sequence information obtained in i) in a form stored on a data storage medium, or obtaining it through a network data transmitting and receiving device.

1-2) Obtaining sequence information of polynucleotide fragments covering the whole genome of the reference samples

The reference samples form a genome pool for which "the genome sequence information and the sequence information of polynucleotide fragments covering the whole genome" (hereinafter, "genome sequence information") is already known, and may be a set of genome sequence information obtained (e.g., from plasma or serum) from normal pregnant women carrying fetuses without aneuploidy of the target chromosome. Such genome sequence information may be selected from that of pregnant women whose babies were confirmed after birth to have no chromosomal aneuploidy. The number of reference samples (corresponding to the number of pregnant women or genomes) is not particularly limited but, considering the convenience of data processing and the accuracy of the results, may be selected from the range of about 50 to about 200,000; for example, within that range (i.e., with an upper limit of 200,000), it may be about 50 or more, about 100 or more, or about 200 or more. The reference samples may each be selected from groups of genome sequence information subdivided by ethnicity, such as Korean, Asian, or Western, or may be selected so as to combine two or more ethnicities.

The step of preparing sequence information of polynucleotide fragments covering the whole genome of the reference samples may be performed by obtaining genome sequence information from normal pregnant women carrying fetuses without aneuploidy of the target chromosome and selecting from it, or by selecting from the genome sequence information of an already established genome pool.

Step 2) Determining the polynucleotide fragment count

Step 2) maps the sequence information of the polynucleotide fragments of each of the test sample and the reference samples to a reference genome sequence and determines polynucleotide fragment counts such that each chromosome has a preset number of bins (bin number).

2-1) 시험 2-1) Test 폴리뉴클레오타이드Polynucleotide 단편 수를 결정하는 단계 Determining the number of fragments

상기 단계 2-1)은 시험 시료로부터 얻어진, 표준 게놈 염기서열에 맵핑된 전체 게놈 서열을 커버하는 폴리뉴클레오타이드 단편들의 서열 정보를 대상으로, 임의의 개수 (B개)의 bin 개수(bin number)를 갖도록 시험 폴리뉴클레오타이드 단편 수 (polynucleotide fragment count 또는 리드 수 (read count))를 계산하여 폴리뉴클레오타이드 단편 수 벡터 (polynucleotide fragment count vector 또는 리드 수 벡터 (read count vector))를 생성하는 단계에 의하여 수행될 수 있다.The above step 2-1) is a step for obtaining a bin number of an arbitrary number (B number) for the sequence information of the polynucleotide fragments covering the entire genome sequence mapped to the standard genomic nucleotide sequence obtained from the test sample (Polynucleotide fragment count or read count) to produce a polynucleotide fragment count vector (read count vector). The polynucleotide fragment count vector may be a polynucleotide fragment count vector or a read count vector. have.

예컨대, 시험 시료의 폴리뉴클레오타이드 단편 수 또는 리드 수 벡터 (S)는 아래의 수식 1으로 표현될 수 있다:For example, the number of polynucleotide fragments or the lead number vector (S) of a test sample can be expressed by the following equation:

(Equation 1) (rc: read count; B: number of bins)

In the above equation, rc denotes the read count, which is an experimentally obtained value.

In one example, the number of bins may be chosen so that each bin contains about 10,000 to about 20,000,000, about 20,000 to about 15,000,000, about 30,000 to about 10,000,000, or about 50,000 to about 1,000,000 nucleotides. For example, the number of bins may be selected in the range of about 1 to about 30,000, about 1 to about 10,000, about 1 to about 5,000, about 1 to about 1,000, about 1 to about 500, about 2 to about 30,000, about 2 to about 10,000, about 2 to about 5,000, about 2 to about 1,000, about 2 to about 500, about 5 to about 30,000, about 5 to about 10,000, about 5 to about 5,000, about 5 to about 1,000, about 5 to about 500, about 10 to about 30,000, about 10 to about 10,000, about 10 to about 5,000, about 10 to about 1,000, about 10 to about 500, about 20 to about 30,000, about 20 to about 10,000, about 20 to about 5,000, about 20 to about 1,000, about 20 to about 500, about 50 to about 30,000, about 50 to about 10,000, about 50 to about 5,000, about 50 to about 1,000, about 50 to about 500, about 100 to about 30,000, about 100 to about 10,000, about 100 to about 5,000, about 100 to about 1,000, or about 100 to about 500.
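As an illustration of step 2-1), the following is a minimal sketch of how a read count vector S = (rc_1, ..., rc_B) could be built from mapped reads. It assumes fixed-width bins on a single chromosome; the function name `read_count_vector`, the `bin_size` parameter, and the toy positions are hypothetical and are not taken from the specification.

```python
from collections import Counter


def read_count_vector(read_starts, chrom_length, bin_size):
    """Count mapped reads per fixed-width bin along one chromosome.

    read_starts: 0-based mapped start positions of reads (hypothetical input).
    Returns a list of length B = ceil(chrom_length / bin_size).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(
        pos // bin_size for pos in read_starts if 0 <= pos < chrom_length
    )
    return [counts.get(b, 0) for b in range(n_bins)]


# Toy example: a 250 kb region split into 50 kb bins (B = 5).
starts = [10, 49_999, 50_000, 120_000, 120_500, 249_999]
S = read_count_vector(starts, chrom_length=250_000, bin_size=50_000)
```

In practice the start positions would come from an alignment of the sequencing reads to the reference genome; that step is outside this sketch.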

2-2) Determining the reference polynucleotide fragment count

Step 2-2) may be performed by taking the sequence information of the polynucleotide fragments of a group of N reference samples selected from the established reference sample pool, and computing the polynucleotide fragment count (or read count) over B bins, thereby generating a reference polynucleotide fragment count matrix (or reference read count matrix).

For example, the polynucleotide fragment count matrix, or read count matrix (R), of the reference samples can be expressed by Equations 2 and 3 below:

(Equation 2)

(Equation 3)

(B: number of bins; N: number of reference samples)
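Equations 2 and 3 appear here only as placeholders. Under the definitions given in the text (N reference samples, B bins), one plausible form stacks one read count vector per reference sample; the row and column orientation and the index notation rc_{n,b} are assumptions, not the published equations.

```latex
R =
\begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_N \end{bmatrix}
=
\begin{bmatrix}
rc_{1,1} & rc_{1,2} & \cdots & rc_{1,B} \\
rc_{2,1} & rc_{2,2} & \cdots & rc_{2,B} \\
\vdots   & \vdots   & \ddots & \vdots   \\
rc_{N,1} & rc_{N,2} & \cdots & rc_{N,B}
\end{bmatrix}
\in \mathbb{R}^{N \times B}
```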

Step a) Bias removal

Step a) removes bias from the obtained polynucleotide fragment counts in order to derive a more accurate result, and may be additionally performed between step 2) and step 3).

Step a) may be performed by applying SVD (Singular Value Decomposition) to the test polynucleotide fragment count and the reference polynucleotide fragment count to remove the bias.

In one example, step a) may be performed by applying SVD, in which case, as in Equations 4 to 7 below, it may comprise: i) combining the reference polynucleotide fragment count matrix and the test polynucleotide fragment count vector to generate a matrix X; ii) performing SVD on the combined matrix; iii) selecting, for the decomposed diagonal matrix D, the top s singular values that account for within 50%, within 45%, within 40%, within 35%, within 30%, within 25%, within 20%, within 15%, or within 10% of the sum of the singular values, for example, 1 to 50%, 1 to 45%, 1 to 40%, 1 to 35%, 1 to 30%, 1 to 25%, 1 to 20%, 1 to 15%, 1 to 10%, 5 to 50%, 5 to 45%, 5 to 40%, 5 to 35%, 5 to 30%, 5 to 25%, 5 to 20%, 5 to 15%, or 5 to 10%; iv) replacing those singular values in matrix D with 0 to generate a bias-removed diagonal matrix D^BR; and v) generating a bias-removed matrix X^BR using matrix D^BR.

(Equation 4-1)

(Equation 4-2)

(Equation 4-3)

(Equation 5)

(Equation 6)

(Equation 7)

(In the above equations, UDV^T denotes the matrix decomposed by SVD before bias removal, and UD^BR V^T denotes the decomposed matrix after bias removal.)
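Equations 4 to 7 appear here only as placeholders. The following is a hedged restatement of steps i) to v) in matrix form, not a recovery of the published equations. The stacking orientation of X, the ordering of the singular values d_1 >= ... >= d_r, the fraction symbol p, and the cumulative-sum rule used to pick s are assumptions made for illustration.

```latex
% i) combine the reference matrix and the test vector
X = \begin{bmatrix} R \\ S \end{bmatrix} \in \mathbb{R}^{(N+1) \times B}

% ii) singular value decomposition
X = U D V^{T}, \qquad D = \operatorname{diag}(d_1, \dots, d_r), \quad d_1 \ge \dots \ge d_r \ge 0

% iii) top s singular values within a fraction p (e.g. 0.05 to 0.5) of the total
s = \max \Bigl\{ m : \sum_{l=1}^{m} d_l \le p \sum_{l=1}^{r} d_l \Bigr\}

% iv) zero the selected singular values
D^{BR} = \operatorname{diag}(0, \dots, 0, d_{s+1}, \dots, d_r)

% v) reconstruct the bias-removed matrix
X^{BR} = U D^{BR} V^{T}
```

Zeroing the leading singular values removes the components shared most strongly across all samples, which is where systematic effects such as GC bias would be expected to concentrate.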

As can be seen in FIG. 3 of this specification, applying SVD keeps the polynucleotide fragment count (read count) at a constant level regardless of GC content, which shows that GC bias is removed by applying SVD.

Step 3) Obtaining the average polynucleotide fragment count ratio

Step 3) compares the average polynucleotide fragment count of the target chromosome with the average polynucleotide fragment count of chromosomes other than the target chromosome and takes the ratio. This removes variation between experiments and helps further improve the sensitivity of aneuploidy detection for the small fraction of fetal chromosomes.

In humans, the chromosomes may be selected from the group consisting of chromosomes 1 to 22. The 'target chromosome' is the chromosome to be tested for fetal aneuploidy and may be, for example, human chromosome 13, 18, or 21, but is not limited thereto and may be selected from any autosome to be tested for aneuploidy. The 'n chromosomes selected from chromosomes other than the target chromosome' are chromosomes selected from the autosomes other than the target chromosome to be tested. n is an integer selected from 1 to 21. In one example, n is 21; that is, of the 22 human autosomes, the average polynucleotide fragment count of each of the 21 chromosomes other than the target chromosome can be used to obtain the average polynucleotide fragment count ratio.

The "average polynucleotide fragment count" can be obtained by averaging all polynucleotide fragment counts (read counts) within a boundary such as the target chromosome or a merged bin.

The "average polynucleotide fragment count of chromosomes other than the target chromosome" is the mean of the polynucleotide fragment counts in a region (merged bin) formed by merging bins on each chromosome so that the region has an arbitrarily set fixed length.

In one embodiment, the average test polynucleotide fragment count ratio or the average reference polynucleotide fragment count ratio can be calculated by the following steps:

i) setting mb_size, the average size of a merged bin, to the total number of bins divided by the product of 22 (the total number of autosomes) and a preset value k, and merging the bins on each chromosome so that each merged bin has a length of mb_size

(Equation 8); and

ii) obtaining the average value for the target chromosome i and for each merged bin j of the chromosomes other than the target chromosome, and obtaining the ratio between them (read count ratio)

(Equation 9).

μ_chri is the average read count of the target chromosome i, and μ_mbj is the average read count of merged bin j. The value k is chosen by the user; for example, a value of 1 to 20, 1 to 15, 1 to 10, or 1 to 5 may be used.
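Steps i) and ii) can be sketched as follows, assuming mb_size = B / (22 × k) as described in the text and a read count ratio of μ_chri / μ_mbj. The function names, the integer rounding of mb_size, the handling of a trailing partial merged bin, and the toy counts are assumptions made for illustration only.

```python
from statistics import mean


def merged_bin_size(total_bins, k, n_autosomes=22):
    """mb_size = B / (22 * k); rounding down to at least 1 bin is an assumption."""
    return max(1, total_bins // (n_autosomes * k))


def merged_bin_means(counts, mb_size):
    """Mean read count of each merged bin on one chromosome.

    A trailing partial group is kept as its own merged bin (an assumption).
    """
    return [mean(counts[i:i + mb_size]) for i in range(0, len(counts), mb_size)]


def read_count_ratios(counts_by_chrom, target, mb_size):
    """Ratio mu_chr_i / mu_mb_j for every merged bin j on non-target chromosomes."""
    mu_target = mean(counts_by_chrom[target])
    ratios = []
    for chrom, counts in counts_by_chrom.items():
        if chrom == target:
            continue
        for mu in merged_bin_means(counts, mb_size):
            if mu > 0:  # skipping empty merged bins is an assumption
                ratios.append(mu_target / mu)
    return ratios


# Toy data: three "chromosomes" with four bins each (hypothetical values).
counts_by_chrom = {
    "chr1": [100, 100, 200, 200],
    "chr2": [50, 50, 50, 50],
    "chr21": [150, 150, 150, 150],
}
rcr = read_count_ratios(counts_by_chrom, target="chr21", mb_size=2)
```

Applying the same function to each of the N reference samples would give the rows of a reference ratio matrix with one column per merged bin.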

3-1) Obtaining the average test polynucleotide fragment count ratio

Step 3-1) may be a step of obtaining, among the test polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome, thereby obtaining the average test polynucleotide fragment count ratios (as many ratios are obtained as there are merged bins).

Specifically, step 3-1) may be performed by taking, from the test polynucleotide fragment counts (test read counts), the average polynucleotide fragment count (average read count) of the target chromosome and the average polynucleotide fragment count (average read count) of each merged bin generated from the n chromosomes other than the target chromosome, calculating the ratio between them [average polynucleotide fragment count (average read count) of the target chromosome / average polynucleotide fragment count (average read count) of the merged bin] (read count ratio) (as many ratios are obtained as there are merged bins), and generating an average test polynucleotide fragment count ratio vector (or average test read count ratio vector; case read count ratio vector). The average test polynucleotide fragment count ratio vector (RCR_chri) of the i-th chromosome (chromosome i; the target chromosome) relative to the other chromosomes can be expressed by Equation 10 below (mbm: merged bin number):

(Equation 10)

3-2) Obtaining the average reference polynucleotide fragment count ratio

Step 3-2) may be a step of obtaining, among the reference polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome, thereby obtaining the average reference polynucleotide fragment count ratios (the number of ratios obtained is (number of reference samples) × (number of merged bins, mbm)).

Specifically, step 3-2) may be performed by taking, from the reference polynucleotide fragment counts (reference read counts) obtained from the N reference samples, the average polynucleotide fragment count (average read count) of the target chromosome and the average polynucleotide fragment count (average read count) of each of the mbm merged bins outside the target chromosome, calculating the ratio between them [average polynucleotide fragment count (average read count) of the target chromosome / average polynucleotide fragment count (average read count) of the merged bin] (read count ratio) (N × mbm ratios are obtained), and generating an average reference polynucleotide fragment count ratio matrix (or reference read count ratio matrix). The average reference polynucleotide fragment count ratio matrix (RCRM_chri) of the i-th chromosome (chromosome i) relative to the other chromosomes can be expressed by Equation 11 below:

(Equation 11)

Step 4) Obtaining CV (Coefficient of Variation) values

Step 4) obtains a CV (Coefficient of Variation) value for each average polynucleotide fragment count ratio from the average reference polynucleotide fragment count ratio matrix obtained above.

Specifically, this step may be performed by calculating, over the reference sample group, the CV of the average polynucleotide fragment count ratio (average read count ratio, RCR_{i,j}) for each chromosome and each merged bin. The CV for the i-th chromosome (CV_chri) can be obtained by Equation 12 below:

(Equation 12)

In the above equation, σRCR_{n,mbm} denotes the standard deviation, calculated over the reference sample group, of the read count ratio for each chromosome and each merged bin, and μRCR_{n,mbm} denotes the mean, calculated over the reference sample group, of the read count ratio for each chromosome and each merged bin.
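A minimal sketch of step 4), assuming the reference ratio matrix has one row per reference sample and one column per merged bin, and that each CV is the standard deviation divided by the mean of a column. The use of the sample standard deviation (rather than the population form), the function name, and the toy values are assumptions.

```python
from statistics import mean, stdev


def cv_per_merged_bin(rcr_matrix):
    """Coefficient of variation (sigma / mu) for each column of an N x mbm matrix."""
    columns = zip(*rcr_matrix)  # transpose: one tuple per merged bin
    return [stdev(col) / mean(col) for col in columns]


# Toy reference matrix: 3 reference samples x 2 merged bins (hypothetical values).
rcrm = [
    [1.0, 2.0],
    [1.0, 4.0],
    [1.0, 6.0],
]
cv = cv_per_merged_bin(rcrm)  # first column is constant, second varies
```

A column with a low CV is a merged bin whose ratio to the target chromosome is stable across normal pregnancies, which is why such columns are favoured in the weighting step that follows.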

Step 5) Obtaining the weighted average polynucleotide fragment count ratio

Step 5), together with step 3), further raises the reliability and accuracy of the result. It is characterized by selecting an arbitrary number of the average polynucleotide fragment count ratios obtained for the target chromosome (mbm of them) in ascending order of CV, multiplying each by the reciprocal of the CV obtained for that ratio in step 4), and using the mean of the resulting values (the weighted average polynucleotide fragment count ratio).

5-1) Obtaining the weighted average test polynucleotide fragment count ratio

Specifically, step 5-1) may be performed by selecting, for each chromosome chr_i, the N_CV average polynucleotide fragment count ratios with the lowest CV values, based on the CV values calculated over the reference sample group in step 4), and then calculating, over the average test polynucleotide fragment count ratios, a weighted average polynucleotide fragment count ratio in which each ratio is weighted by its CV value (each ratio is multiplied by the reciprocal of its CV and the results are averaged). For example, N_CV may be set so as to select the average polynucleotide fragment count ratio values (RCR) having a value about 1.1 times or more, about 1.3 times or more, about 1.5 times or more, about 1.7 times or more, about 2 times or more, or about 3 times or more greater than the minimum value of CV_chri, for example, about 1.1 to about 5 times, about 1.1 to about 3 times, about 1.1 to about 2 times, about 1.3 to about 5 times, about 1.3 to about 3 times, about 1.3 to about 2 times, about 1.5 to about 5 times, about 1.5 to about 3 times, about 1.5 to about 2 times, about 1.7 to about 5 times, about 1.7 to about 3 times, about 1.7 to about 2 times, about 2 to about 5 times, or about 2 to about 3 times greater than the minimum value of CV_chri. However, it is not limited thereto, and an appropriate value may be chosen experimentally and/or empirically.

In one example, the weighted average polynucleotide fragment count ratio (WRCR_chri) of the i-th chromosome (chromosome i) can be obtained by Equation 13 below:

(Equation 13)
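The weighting described in step 5-1) can be sketched as below. The text states that each selected ratio is multiplied by the reciprocal of its CV and the results are averaged; dividing by the sum of the weights, as done here, is an assumption, as are the function name, the fixed `n_cv` count, and the toy values.

```python
def weighted_ratio(ratios, cvs, n_cv):
    """Inverse-CV weighted mean of the n_cv ratios with the lowest CV.

    ratios[j] and cvs[j] refer to the same merged bin j; every selected
    CV must be greater than 0. Normalising by the sum of weights is an
    assumption about how the average is formed.
    """
    order = sorted(range(len(cvs)), key=lambda j: cvs[j])[:n_cv]
    weights = [1.0 / cvs[j] for j in order]
    return sum(w * ratios[j] for w, j in zip(weights, order)) / sum(weights)


# Toy values: three merged bins, keep the two most stable ones.
ratios = [1.0, 2.0, 4.0]
cvs = [0.5, 0.25, 1.0]
wrcr = weighted_ratio(ratios, cvs, n_cv=2)
```

Because the same bins and weights are applied to the test sample and to every reference sample, a different constant normalisation would rescale all values equally and leave a later Z-score unchanged.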

5-2) Obtaining the weighted average reference polynucleotide fragment count ratio

For the reference sample group as well, a reference weighted average polynucleotide fragment count ratio vector can be generated by taking, for each reference sample (N in total) and each chromosome, the top N_CV average polynucleotide fragment count ratio values and calculating the weighted average polynucleotide fragment count ratio in which each ratio is weighted by its CV value (each ratio is multiplied by the reciprocal of its CV and the results are averaged).

In one example, the weighted average reference polynucleotide fragment count ratio vector (R_WRCR_chri) of the i-th chromosome (chromosome i) can be obtained by Equation 14 below:

(Equation 14)

Step 6) Comparing the weighted average polynucleotide fragment count ratios

The comparison in step 6) compares the weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio, and may be performed by obtaining the Z-score of the target chromosome.

For example, the Z-score (Z_{CV-ratio.chri}) can be calculated by Equation 15 below by comparing the weighted average test polynucleotide fragment count ratio value of the target chromosome (chromosome i) with the weighted average reference polynucleotide fragment count ratio vector:

(Equation 15)

Equation 15 uses the mean of the weighted average reference polynucleotide fragment count ratio vector and the standard deviation of the weighted average reference polynucleotide fragment count ratio vector.
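A minimal sketch of the Z-score comparison, assuming the standard form (test value minus reference mean, divided by reference standard deviation). The sample standard deviation, the function name, and the toy numbers are assumptions.

```python
from statistics import mean, stdev


def z_score(test_wrcr, reference_wrcrs):
    """Z-score of the test sample's weighted ratio against the reference vector."""
    return (test_wrcr - mean(reference_wrcrs)) / stdev(reference_wrcrs)


# Toy values: three reference samples clustered around 1.0 (hypothetical).
reference = [0.98, 1.00, 1.02]
z = z_score(1.10, reference)
```

With a real reference panel the vector would hold one weighted ratio per reference sample, so N values for N samples.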

Step 7) Determining fetal chromosomal aneuploidy

Whether the fetus has chromosomal aneuploidy can be determined on the basis of the comparison result for the weighted average test polynucleotide fragment count ratio obtained in step 6). That is, when the weighted average test polynucleotide fragment count ratio is compared with the weighted average reference polynucleotide fragment count ratio, the more significantly the test ratio is higher or lower than the reference ratio, the higher the likelihood of aneuploidy of the target chromosome can be judged to be.

For example, when the comparison of polynucleotide fragment count ratios is performed by Z-score, the larger the Z-score value, the higher the likelihood of aneuploidy of the fetal target chromosome can be judged to be.

In one example, when the absolute value of the Z-score (Z_{CV-ratio.chri}) for the target chromosome (chromosome i) is at or above a specific value, for example, about 3 or more, it can be determined that chromosome i among the fetal chromosomes of the test sample has chromosomal aneuploidy:

(Equation 16)
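The decision rule can be sketched as a simple threshold on the absolute Z-score. The cutoff of 3 follows the "about 3" example in the text; the function name and the per-chromosome dictionary layout are hypothetical.

```python
def call_aneuploidy(z_by_chrom, threshold=3.0):
    """Flag each chromosome whose |Z| is at or above the threshold."""
    return {chrom: abs(z) >= threshold for chrom, z in z_by_chrom.items()}


# Toy Z-scores for the three commonly screened chromosomes (hypothetical).
calls = call_aneuploidy({"chr13": 0.4, "chr18": -1.2, "chr21": 5.0})
```

Using the absolute value means that both a strong excess and a strong deficit of reads on the target chromosome are flagged.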

Each step of the non-invasive fetal chromosome analysis method described above can be performed by an information processing and reading device such as a computer.

Another example of the present invention provides an information processing system (computer) for non-invasive fetal chromosome analysis. The system may comprise means adapted for use in the non-invasive fetal chromosome analysis method described above. The system may comprise:

1) a sequencer, or a computer-readable information storage medium containing sequence information; and

2) an information processing and reading medium (computer) capable of receiving information from the sequencer or reading the information in the information storage medium.

The system may further comprise a biological sample isolated from a mother and/or a plurality of polynucleotide fragments (for example, test sample polynucleotide fragments and/or reference sample polynucleotide fragments as described above).

Meanwhile, the methods and information described in this specification can be implemented on a known computer-readable medium through a program capable of executing the steps described above. More specifically, the non-invasive fetal chromosome analysis method presented above and/or the information obtained in each step may be implemented and/or processed, in whole or in part, on a known computer-readable medium as a computer-executable program (computer executable instruction). For example, the methods described in this specification may be implemented in combination with hardware. The hardware may mean a computer, a standard multi-purpose CPU, or specially designed hardware or firmware such as an ASIC (application-specific integrated circuit) or another hard-wired device, and the term 'computer' as used below may refer to these collectively.

Another example of the present invention provides a computer-readable method for determining fetal chromosomal aneuploidy, comprising the following steps:

A-1) mapping the sequence information of the polynucleotide fragments of a test sample to a reference genome sequence and determining the test polynucleotide fragment count so that each chromosome has a preset number of bins (bin number) (corresponding to step 2-1) described above);

A-2) determining the reference polynucleotide fragment count, using the sequence information of the polynucleotide fragments of the reference samples, so that it has the preset number of bins (corresponding to step 2-2) described above);

B-1) obtaining, among the test polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome, thereby obtaining the average test polynucleotide fragment count ratios (as many ratios are obtained as there are merged bins) (corresponding to step 3-1) described above);

B-2) obtaining, among the reference polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated from n chromosomes (n is an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome, thereby obtaining the average reference polynucleotide fragment count ratios (the number of ratios obtained is the number of reference samples × the number of merged bins) (corresponding to step 3-2) described above);

C) obtaining a CV (Coefficient of Variation) value for each average reference polynucleotide fragment count ratio (corresponding to step 4) described above);

D-1) selecting, from the average test polynucleotide fragment count ratios of step B-1), the values corresponding to the N_CV ratios with the lowest CV values, and obtaining the weighted average test polynucleotide fragment count ratio (corresponding to step 5-1) described above);

D-2) obtaining the weighted average reference polynucleotide fragment count ratio from the average reference polynucleotide fragment count ratios of step B-2), using the values corresponding to the N_CV lowest-CV ratios selected in step D-1) (corresponding to step 5-2) described above);

E) comparing the obtained weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio (corresponding to step 6) described above); and

F) determining whether the target chromosome of the fetus is aneuploid, using the result (for example, a Z-score) of comparing the weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio obtained in step E) (corresponding to step 7) described above).

상기한 각 단계의 상세 사항은 앞서 설명한 바와 같다.The details of each step described above are as described above.

The computer-readable method may be implemented as a computer-executable program on a computer-readable medium.

Another example provides a computer program stored on a computer-readable storage medium for executing the steps of the computer-readable method. The computer program stored on the computer-readable storage medium may be combined with hardware. The computer program is a program for causing a computer to execute each step of the computer-readable method described above, wherein all of the steps may be executed by a single program, or by two or more programs each executing one or more of the steps.

Another example provides a computer-readable storage medium (or recording medium) containing computer-executable instructions for executing the steps of the computer-readable method.

The computer-executable program may be stored in a computer-readable storage medium (e.g., memory) and implemented as software running on one or more processors. As is generally known, a processor may be combined with one or more controllers, calculation units, and/or other units of a computer system, or may be implemented in suitable firmware. When the program is implemented in software, it may be stored in a computer-readable storage medium such as RAM (Random Access Memory), ROM (Read Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory (e.g., USB (Universal Serial Bus) memory, SD (Secure Digital) memory, SSD (Solid State Drive), CF (Compact Flash) memory, xD memory, etc.), a magnetic disk, a laser disk, or another storage medium. The program or software stored in the computer-readable storage medium may be delivered to a computing device by any known delivery method, including over a communication channel such as a telephone line, the Internet, or a wireless connection, or via a transportable medium such as a computer-readable disk or a flash drive.

The various steps described above may be implemented as commonly known blocks, operations, tools, modules, and techniques, which may in turn be implemented in hardware, firmware, software, or any combination of hardware, firmware, and/or software. When implemented in hardware, some or all of the blocks, operations, techniques, etc. may be implemented in, for example, a custom integrated circuit (custom IC), an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array), or a PLA (programmable logic array). When implemented in software, the software may be stored in a known computer-readable medium such as a magnetic disk, an optical disk, or another storage medium, or in the RAM, ROM, or flash memory of a computer, a processor, a hard disk drive, an optical disk drive, a tape drive, and the like. The software may also be delivered to a user or a computer system by a known delivery method, including, for example, a computer-readable disk or another transportable computer storage mechanism.

The computer-readable method, program, and storage medium may operate in many different general-purpose or special-purpose computing system environments or configurations. Computing systems, environments, and/or configurations suitable for executing them include, but are not limited to, personal computers (PCs), server computers, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments in which tasks are performed by remote processing devices that include any of the above systems or devices and are connected through a communication network. In both integrated and distributed computing environments, program modules may be located in local and remote computer storage media, including memory storage devices.

A computer typically includes a variety of computer-readable media. A computer-readable medium may be any medium that can be accessed and used by a computer, and may include volatile and non-volatile media and removable and non-removable media. For example, computer-readable media may include computer storage media and/or communication media.

The computer storage media may include volatile or non-volatile, and/or removable or non-removable, media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, and/or other data. The computer storage media may be one or more selected from, but are not limited to, RAM, ROM, EEPROM, flash memory (e.g., USB memory, SD memory, SSD, CF memory, xD memory, etc.), magnetic disks, laser disks, or other memory, CD-ROM, DVD (digital versatile disk) or other optical disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, and any other medium that can be used to store the desired information and can be accessed by a computer.

The communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, the communication media may be one or more selected from wired media, such as a wired network or a direct-wired connection, and wireless media, such as acoustic, RF, infrared, and other wireless media.

Combinations of one or more of the above media may also be included within the scope of computer-readable media.

The non-invasive fetal chromosome analysis method proposed herein determines fetal chromosomal aneuploidy from DNA sequence information obtained from a biological sample that is isolated from the mother non-invasively with respect to the fetus. By comparing the average read count of the specific chromosome to be tested for aneuploidy with the average read counts of the other chromosomes, the method removes run-to-run variation, and by using CV (Coefficient of Variation)-weighted averages of the inter-chromosome read count ratios, it improves the reliability and specificity of the result and reduces the probability of false positives. Fetal chromosomal aneuploidy can thus be determined safely and accurately, non-invasively and without harm to the fetus.

FIG. 1 is a schematic diagram illustrating each step of a non-invasive method for determining fetal chromosomal aneuploidy according to one example.
FIG. 2 is a graph showing the Z-scores obtained in determining fetal chromosomal aneuploidy, where A is the result of the conventional method and B is the result of the method proposed herein.
FIG. 3 is a graph showing GC bias removal before and after applying SVD, where the Y axis is the read count fraction and the X axis is the GC content.

Hereinafter, the present invention is described in more detail with reference to examples, which are merely illustrative and are not intended to limit the scope of the invention. It will be apparent to those skilled in the art that the examples described below may be modified without departing from the essential gist of the invention.

Example 1: Test sample preparation and sequencing

Whole blood (10 ml) was collected from pregnant women under test at 8 to 28 weeks of gestation, and 5 ml of plasma was separated. cfDNA (cell-free DNA) was extracted from the separated plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen); an NGS library was prepared from the extracted cfDNA and sequenced on an Illumina MiSeq NGS instrument to generate FASTQ data. The reads used in the test were 200 bp in length.

Example 2: Preparation of reference sample sequences

Whole blood (10 ml) was collected from pregnant women (excluding the women under test), and 5 ml of plasma was separated. Following the method of Example 1, cfDNA was extracted from the separated plasma, an NGS library was prepared from the extracted cfDNA, and the library was sequenced on an NGS instrument to generate FASTQ data. From these, the data of women whose fetuses were confirmed to have no chromosomal aneuploidy were selected and used as reference samples in the tests below (number of reference samples = 100).

Example 3: Read count determination

The sequences of the reads obtained from the prepared test sample were mapped to a standard genome sequence (hg18, hg19, or hg38; provided by NCBI), and test read counts were calculated over approximately 100 to 30,000 bins, each bin containing 30,000 to 10,000,000 nucleotides, to generate the test read count vector (S) as follows.

(Equation 1) (rc: read count; B (number of bins) = 100 to 30,000)
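
The binning step can be sketched as follows. This is a minimal illustration only, assuming the reads have already been mapped and are available as (chromosome, position) pairs; the function name `read_count_vector`, the toy chromosome lengths, and the fixed bin width are hypothetical and do not come from the patent.

```python
from collections import Counter

def read_count_vector(mapped_reads, chrom_lengths, bin_size):
    """Count mapped reads per fixed-width bin, chromosome by chromosome.

    mapped_reads: iterable of (chromosome, 0-based start position) pairs.
    chrom_lengths: dict of chromosome name -> length in nucleotides
                   (insertion order fixes the order of the bins).
    Returns (bins, counts): bins[b] is (chromosome, bin index) and
    counts[b] is the read count rc_b of that bin.
    """
    tally = Counter((chrom, pos // bin_size) for chrom, pos in mapped_reads
                    if chrom in chrom_lengths)
    bins = []
    for chrom, length in chrom_lengths.items():
        n_bins = -(-length // bin_size)  # ceiling division
        bins.extend((chrom, i) for i in range(n_bins))
    counts = [tally.get(b, 0) for b in bins]
    return bins, counts

# Toy data (hypothetical): two short "chromosomes" and 100-nt bins.
chrom_lengths = {"chr1": 250, "chr2": 100}
reads = [("chr1", 5), ("chr1", 99), ("chr1", 100), ("chr1", 249), ("chr2", 0)]
bins, S = read_count_vector(reads, chrom_lengths, bin_size=100)
```

Here `S` plays the role of the test read count vector; applying the same function to each reference sample and stacking the results row by row would give a matrix in the role of R.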

In addition, the reference read count matrix (R) was generated from the prepared reference sample sequence information as follows:

(Equation 2)

(Equation 3)

(B (number of bins): 100 to 30,000; N (number of reference samples): 100)

Example 4: Bias removal

Bias was removed from the test read count vector (S) and the reference read count matrix (R) obtained above, as follows.

First, the reference read count matrix and the test read count vector were combined to generate a matrix X, and SVD was performed on the combined matrix as follows:

(Equation 4-1) (N: 100)

(Equation 4-2)

(Equation 4-3)

(B: 100 to 30,000; N: 100)

For the decomposed diagonal matrix D, the top s singular values, whose sum falls within 5 to 50% of the sum of all singular values, were selected,

(Equation 5)

the corresponding singular values in matrix D were replaced with 0 to generate the bias-removed diagonal matrix D^BR,

(Equation 6)

and the matrix D^BR was then used to generate the bias-removed matrix X^BR:

(Equation 7).
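
The SVD-based procedure above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the orientation of X (reference samples as rows, the test sample as the last row), the helper names `choose_s` and `remove_bias_svd`, and the reading of the 5 to 50% rule as "the largest s whose top-s singular values sum to at most the chosen fraction of the total" are all assumptions, since the original equations are not reproduced in this text.

```python
import numpy as np

def choose_s(singular_values, fraction):
    """Largest s such that the top-s singular values sum to at most
    `fraction` of the total (one reading of the 5-50% rule; an assumption)."""
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    return int(np.searchsorted(cumulative, fraction, side="right"))

def remove_bias_svd(R, S, s):
    """Zero the s largest singular values of the stacked matrix X = [R; S].

    R: (N, B) reference read counts, one row per reference sample.
    S: (B,) test read counts.
    Returns the bias-removed reference matrix and test vector.
    """
    X = np.vstack([R, S])                              # shape (N + 1, B)
    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # d is sorted descending
    d_br = d.copy()
    d_br[:s] = 0.0                                     # plays the role of D^BR
    X_br = (U * d_br) @ Vt                             # U @ diag(D^BR) @ V^T
    return X_br[:-1], X_br[-1]

# Toy data (hypothetical): 5 reference samples, 8 bins.
rng = np.random.default_rng(0)
R = rng.poisson(100, size=(5, 8)).astype(float)
S = rng.poisson(100, size=8).astype(float)
R_br, S_br = remove_bias_svd(R, S, s=1)
```

With this reading, `choose_s` returns 0 whenever the largest singular value alone exceeds the chosen fraction, so in practice s may need to be set explicitly, which is why `remove_bias_svd` takes it as a parameter.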

The GC content for the case in which bias was removed by applying SVD as above, measured as (sum of guanine (G) and cytosine (C) bases per chromosome) / (total number of bases per chromosome), is shown in FIG. 3 in comparison with the case in which bias was not removed.
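
The GC content measure described above can be written as a short helper. This is a sketch only: the function name `gc_content` is hypothetical, and excluding ambiguous bases (such as N) from the denominator is an assumption that the text does not state.

```python
def gc_content(sequence):
    """Fraction of G and C among the A/C/G/T bases of a sequence.

    Ambiguous bases (e.g. N) are ignored; that choice is an assumption.
    Returns 0.0 for a sequence with no A/C/G/T bases.
    """
    seq = sequence.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total
```

Applied per chromosome (or per bin), values of this kind would give the X axis of a plot like FIG. 3.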

As can be seen in FIG. 3, applying SVD keeps the polynucleotide fragment count (read count) at a constant level regardless of GC content, which shows that GC bias is removed by applying SVD.

Example 5: Calculation of average polynucleotide fragment count ratios

mb_size, the average size of a merged bin, was set by dividing the total number of bins by the product of 22 (the total number of autosomes) and a preset value k, and the bins of each chromosome were merged so as to have a length of mb_size:

(Equation 8).

(B = 100 to 30,000; k = 1 to 10)
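
From the description, Equation 8 appears to be mb_size = B / (22 × k). A minimal sketch of that sizing and of the per-chromosome merging follows; rounding to a whole number of bins (with a minimum of 1), letting the last merged bin of a chromosome be shorter, and the names `merged_bin_size` and `merge_bins` are assumptions made for illustration.

```python
def merged_bin_size(total_bins, k, n_autosomes=22):
    """mb_size = B / (22 * k), rounded to a whole number of bins (at least 1)."""
    return max(1, round(total_bins / (n_autosomes * k)))

def merge_bins(counts_by_chrom, mb_size):
    """Group consecutive bins of each chromosome into merged bins.

    counts_by_chrom: dict of chromosome -> list of per-bin read counts.
    Returns dict of chromosome -> list of merged bins, each a list of counts.
    The last merged bin of a chromosome may hold fewer than mb_size bins.
    """
    return {
        chrom: [counts[i:i + mb_size] for i in range(0, len(counts), mb_size)]
        for chrom, counts in counts_by_chrom.items()
    }

# Toy data (hypothetical).
mb_size = merged_bin_size(total_bins=2200, k=10)
merged = merge_bins({"chr1": [1, 2, 3, 4, 5]}, mb_size=2)
```

Under this sizing, each autosome contributes on average about k merged bins.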

For chromosome 13, 18, or 21 (the chromosome to be tested for aneuploidy) and for each merged bin j of the chromosomes other than that chromosome, average values were calculated, and the ratio between them (read count ratio) was obtained:

(Equation 9).

(μ_chri: average read count of target chromosome i; μ_mbj: average read count of merged bin j; i: 13, 18, or 21).
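
Based on the symbol definitions, Equation 9 appears to be the ratio μ_chri / μ_mbj. A minimal sketch follows, assuming the per-bin counts of the target chromosome and the merged bins of the other chromosomes are already available as lists; the function name `read_count_ratios` is hypothetical.

```python
from statistics import mean

def read_count_ratios(target_counts, merged_bins):
    """Ratio of the target chromosome's mean read count to each merged bin's mean.

    target_counts: per-bin read counts of the target chromosome (e.g. chr21).
    merged_bins: list of merged bins from the other chromosomes,
                 each a non-empty list of per-bin read counts.
    A merged bin whose mean is zero raises ZeroDivisionError.
    """
    mu_chr = mean(target_counts)
    return [mu_chr / mean(mb) for mb in merged_bins]

# Toy data (hypothetical).
rcr = read_count_ratios([10, 20, 30], [[10, 10], [40, 40]])
```

One such list per sample gives the read count ratio vector for the test sample and, stacked over the reference samples, the reference read count ratio matrix.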

For the test sample, the read count ratio was calculated for each chromosome (chr_i) and each merged bin, and the average test read count ratio vector (case read count ratio vector; RCR_chri) was generated as follows:

(Equation 10)

(mbm: merged bin number)

For the reference samples, the read count ratio was calculated for each chromosome chr_i, and a reference read count ratio matrix for each chromosome was generated as follows:

(Equation 11)

Example 6: Calculation of CV (Coefficient of Variation) values

For the reference samples, the CV of the read count ratio (RCR_i,j) for each chromosome and each merged bin was calculated as follows:

(Equation 12)

(σRCR_n,mbm: standard deviation of the read count ratio for each chromosome and each merged bin, calculated over the reference sample group; μRCR_n,mbm: mean of the read count ratio for each chromosome and each merged bin, calculated over the reference sample group)
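
Given those definitions, Equation 12 appears to be the usual coefficient of variation, CV = σ / μ, taken across the reference samples for each merged bin. A minimal sketch follows; the use of the sample standard deviation (`statistics.stdev`) rather than the population form, and the name `cv_per_merged_bin`, are assumptions.

```python
from statistics import mean, stdev

def cv_per_merged_bin(ref_rcr):
    """CV (standard deviation / mean) of each merged bin's read count ratio.

    ref_rcr: list of N rows, one per reference sample; each row holds one
             read count ratio per merged bin.
    Uses the sample standard deviation, which is an assumption here.
    """
    columns = zip(*ref_rcr)  # one column per merged bin
    return [stdev(col) / mean(col) for col in columns]

# Toy data (hypothetical): 3 reference samples, 2 merged bins.
cvs = cv_per_merged_bin([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]])
```

A low CV marks a merged bin whose ratio is stable across the reference samples, which is why the next step favours such bins.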

Example 7: Calculation of weighted average read count ratios

Based on the CV values calculated for the reference samples for each chromosome chr_i in Example 6, the N_CV read count ratios with the lowest CV values were selected, and a weighted average read count ratio, in which the average read count ratios of the test sample are weighted by the CV value corresponding to each read count ratio, was calculated by Equation 13 below.

(Equation 13)

In this example, N_CV was set so as to include the read count ratios whose CV value is up to 1.1 to 5 times the minimum value of CV_chri.
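
The selection and weighting can be sketched as follows. Equation 13 is not reproduced in this text, so the exact form is an assumption: here the selected ratios are weighted by the reciprocal of their CV (as the claims describe) and the result is normalized by the sum of the weights. The names `select_low_cv` and `weighted_average_ratio` and the default factor are hypothetical.

```python
def select_low_cv(cvs, factor=2.0):
    """Indices of ratios whose CV is at most `factor` times the minimum CV.

    The text gives a factor range of 1.1 to 5; 2.0 is an arbitrary default.
    """
    threshold = factor * min(cvs)
    return [j for j, cv in enumerate(cvs) if cv <= threshold]

def weighted_average_ratio(ratios, cvs, selected):
    """Average of the selected ratios with weights 1 / CV.

    Normalizing by the sum of the weights is an assumption.
    Every selected CV must be greater than zero.
    """
    weights = [1.0 / cvs[j] for j in selected]
    weighted_sum = sum(w * ratios[j] for w, j in zip(weights, selected))
    return weighted_sum / sum(weights)

# Toy data (hypothetical): three merged bins.
cvs = [0.1, 0.15, 0.5]
selected = select_low_cv(cvs, factor=2.0)
test_value = weighted_average_ratio([1.0, 2.0, 9.0], cvs, selected)
```

Applying `weighted_average_ratio` with the same `selected` indices to each reference sample's ratios gives one value per reference sample, which corresponds to the reference weighted average vector.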

For the average read count ratios of the reference samples as well, the same procedure was applied to the N_CV lowest-CV read count ratios of each chromosome to generate a reference weighted average read count ratio vector as follows:

(Equation 14)

Example 8: Determination of fetal chromosomal aneuploidy

The weighted average read count ratio of the test sample for each chromosome obtained in Example 7 was compared with the reference weighted average read count ratio vector, and a Z-score was calculated as follows:

(Equation 15)

(where the mean and the standard deviation in Equation 15 are, respectively, the mean and the standard deviation of the weighted average reference polynucleotide fragment count ratio vector)

If the absolute value of the Z-score was 3 or more, the fetal chromosome of the sample was determined to be aneuploid.

(Equation 16)
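
The comparison and the decision rule can be sketched as a standard Z-score. This assumes Equation 15 has the usual form Z = (test value − reference mean) / reference standard deviation and uses the sample standard deviation; the function names `z_score` and `is_aneuploid` are hypothetical.

```python
from statistics import mean, stdev

def z_score(test_value, reference_values):
    """Z-score of the test sample's weighted average read count ratio
    against the reference weighted average read count ratio vector."""
    return (test_value - mean(reference_values)) / stdev(reference_values)

def is_aneuploid(z, threshold=3.0):
    """Decision rule from the text: aneuploid if |Z| is 3 or more."""
    return abs(z) >= threshold

# Toy data (hypothetical).
reference = [0.98, 1.00, 1.02]
z = z_score(1.08, reference)
call = is_aneuploid(z)
```

In this toy case the reference mean is 1.00 and the standard deviation is 0.02, so a test value of 1.08 lies four standard deviations above the mean and is called aneuploid.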

The results obtained are shown in FIG. 2 (lower panel). For comparison, FIG. 2 (upper panel) also shows the result of determining fetal chromosomal aneuploidy by calculating the Z-score as in Example 8 [corresponding to step 6)] directly from the test read count vector and the reference read count matrix obtained in Example 3 [corresponding to steps 2-1) and 2-2)], without the intermediate steps, that is, without the SVD bias removal of Example 4 [corresponding to step a)], the polynucleotide fragment count ratio calculation of Example 5 [corresponding to steps 3-1) and 3-2)], and the weighted average read count ratio calculation of Example 7 [corresponding to steps 4), 5-1), and 5-2)].

As shown in FIG. 2, when fetal chromosomal aneuploidy was determined from maternal blood, the conventional method classified 3 of the 17 test samples with confirmed fetal chromosomal aneuploidy as having no aneuploidy, whereas the method of this example classified all 17 test samples as having fetal chromosomal aneuploidy. This result shows that the method of this example improves the accuracy of fetal chromosomal aneuploidy determination.

Claims

A method for analyzing sequence information for determining chromosomal aneuploidy of a fetus, comprising the steps of:
1-1) obtaining sequence information of polynucleotide fragments covering the whole genome from a test sample, which is blood, plasma, or serum separated from a mother;
1-2) obtaining sequence information of polynucleotide fragments covering the whole genome from a reference sample;
2-1) mapping the sequence information of the polynucleotide fragments of the test sample obtained in step 1-1) to a reference genome sequence, and determining test polynucleotide fragment counts for each chromosome so as to have a preset number of bins;
2-2) determining reference polynucleotide fragment counts so as to have the preset number of bins, using the sequence information of the polynucleotide fragments of the reference sample obtained in step 1-2);
3-1) obtaining average test polynucleotide fragment count ratios by calculating, from the test polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated on n chromosomes (n being an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome (as many ratios are obtained as the number of merged bins);
3-2) obtaining average reference polynucleotide fragment count ratios by calculating, from the reference polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated on n chromosomes (n being an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome (as many ratios are obtained as the number of reference samples × the number of merged bins);
4) obtaining a CV (Coefficient of Variation) value for each average reference polynucleotide fragment count ratio, wherein the CV for the i-th chromosome (CV_chri) is obtained by Equation 12 below,

(Equation 12)
wherein σRCR_n,mbm is the standard deviation of the read count ratio for each chromosome and each merged bin across the reference samples, and μRCR_n,mbm is the mean of the read count ratio for each chromosome and each merged bin across the reference samples;
5-1) selecting, from the average test polynucleotide fragment count ratios of step 3-1), the values corresponding to the N_CV ratios with the lowest CV values calculated in step 4), and multiplying the selected average test polynucleotide fragment count ratios of step 3-1) by the reciprocal of the CV value calculated in step 4) to obtain a weighted average test polynucleotide fragment count ratio;
5-2) selecting, from the average reference polynucleotide fragment count ratios of step 3-2), the values corresponding to the N_CV ratios with the lowest CV values calculated in step 4), and multiplying the selected average reference polynucleotide fragment count ratios of step 3-2) by the reciprocal of the CV value calculated in step 4) to obtain a weighted average reference polynucleotide fragment count ratio; and
6) comparing the obtained weighted average test polynucleotide fragment count ratio with the obtained weighted average reference polynucleotide fragment count ratio.

The method according to claim 1, further comprising, after steps 2-1) and 2-2):
a) removing bias from the test polynucleotide fragment counts and the reference polynucleotide fragment counts.

The method according to claim 2, wherein the bias removal is performed by applying SVD (Singular Value Decomposition).

◈ Claim 4 is abandoned due to the registration fee.

The method according to any one of claims 1 to 3, wherein the chromosome is an autosome.

◈ Claim 5 is abandoned due to the registration fee.

The method according to any one of claims 1 to 3, wherein the target chromosome is human chromosome 13, 18, or 21.

◈ Claim 6 is abandoned due to the registration fee.

The method according to any one of claims 1 to 3, wherein the mother does not have aneuploidy of the target chromosome.

An information processing system comprising means adapted to carry out the method for analyzing sequence information according to any one of claims 1 to 3.

A computer-readable method for determining chromosomal aneuploidy of a fetus, comprising the steps of:
A-1) mapping sequence information of polynucleotide fragments of a test sample to a reference genome sequence, and determining test polynucleotide fragment counts so as to have a preset number of bins;
A-2) determining reference polynucleotide fragment counts so as to have the preset number of bins, using sequence information of polynucleotide fragments of a reference sample;
B-1) obtaining average test polynucleotide fragment count ratios by calculating, from the test polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated on n chromosomes (n being an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome (as many ratios are obtained as the number of merged bins);
B-2) obtaining average reference polynucleotide fragment count ratios by calculating, from the reference polynucleotide fragment counts, the ratio of the average polynucleotide fragment count of the target chromosome to be tested for aneuploidy to the average polynucleotide fragment count of each merged bin generated on n chromosomes (n being an integer selected from 1 to 21) selected from the chromosomes other than the target chromosome (as many ratios are obtained as the number of reference samples × the number of merged bins);
C) obtaining a CV (Coefficient of Variation) value for each average reference polynucleotide fragment count ratio, wherein the CV for the i-th chromosome (CV_chri) is obtained by Equation 12 below,

(Equation 12)
wherein σRCR_n,mbm is the standard deviation of the read count ratio for each chromosome and each merged bin across the reference samples, and μRCR_n,mbm is the mean of the read count ratio for each chromosome and each merged bin across the reference samples;
D-1) selecting, from the average test polynucleotide fragment count ratios of step B-1), the values corresponding to the N_CV ratios with the lowest CV values calculated in step C), and multiplying the selected average test polynucleotide fragment count ratios of step B-1) by the reciprocal of the CV value calculated in step C) to obtain a weighted average test polynucleotide fragment count ratio;
D-2) selecting, from the average reference polynucleotide fragment count ratios of step B-2), the values corresponding to the N_CV ratios with the lowest CV values calculated in step C), and multiplying the selected average reference polynucleotide fragment count ratios of step B-2) by the reciprocal of the CV value calculated in step C) to obtain a weighted average reference polynucleotide fragment count ratio;
E) comparing the obtained weighted average test polynucleotide fragment count ratio with the obtained weighted average reference polynucleotide fragment count ratio; and
F) determining whether the target chromosome of the fetus is aneuploid, using the result of comparing the weighted average test polynucleotide fragment count ratio with the weighted average reference polynucleotide fragment count ratio in step E).

◈ Claim 9 is abandoned upon payment of registration fee.

The computer-readable method according to claim 8, further comprising, after steps A-1) and A-2):
a) applying SVD (Singular Value Decomposition) to remove bias from the test polynucleotide fragment counts and the reference polynucleotide fragment counts.

A computer program stored in a computer-readable storage medium for executing, in combination with hardware, the steps of the computer-readable method of claim 8 or 9.

◈ Claim 11 is abandoned due to registration fee.

The computer-readable method according to claim 8, wherein the chromosome is an autosome.

◈ Claim 12 is abandoned due to registration fee.

The computer-readable method according to claim 8, wherein the target chromosome is human chromosome 13, 18, or 21.