KR20150038216A

KR20150038216A - Highly multiplex pcr methods and compositions

Info

Publication number: KR20150038216A
Application number: KR20157004509A
Authority: KR
Inventors: 버나드 짐머맨; 매튜 엠. 힐; 필립 길버트 라크루트; 마이클 다드
Original assignee: 내테라, 인코포레이티드
Priority date: 2012-07-24
Filing date: 2012-11-21
Publication date: 2015-04-08
Also published as: JP2022037145A; JP7503043B2; IL236435A0; JP6392222B2; JP6997814B2; JP2022051949A; JP2018183189A; CA2877493A1; JP6997815B2; JP2024113133A; JP2022027971A; JP2020054402A; CA2877493C; AU2012385961B9; RU2014152883A; AU2012385961A1; JP7343563B2; HK1211058A1; AU2012385961B2; JP6997813B2

Abstract

The present invention provides methods for simultaneously amplifying multiple nucleic acid regions of interest in one reaction volume, as well as methods for selecting a library of primers for use in such amplification methods. The invention also provides libraries of primers with desirable characteristics, such as minimal formation of amplified primer dimers or other non-target amplicons.

Description

HIGHLY MULTIPLEX PCR METHODS AND COMPOSITIONS

Cross-Reference to Related Applications

This application claims the benefit of and priority to U.S. Utility Application Serial No. 13/683,604, filed November 21, 2012, and U.S. Provisional Application Serial No. 61/675,020, filed July 24, 2012. U.S. Utility Application Serial No. 13/683,604 is a continuation-in-part of U.S. Utility Application Serial No. 13/300,235, filed November 18, 2011; is a continuation-in-part of U.S. Utility Application Serial No. 13/110,685, filed May 18, 2011; and claims the benefit of U.S. Provisional Application Serial No. 61/675,020, filed July 24, 2012. U.S. Utility Application Serial No. 13/110,685 claims the benefit of U.S. Provisional Application Serial No. 61/395,850, filed May 18, 2010; U.S. Provisional Application Serial No. 61/398,159, filed June 21, 2010; U.S. Provisional Application Serial No. 61/462,972, filed February 9, 2011; U.S. Provisional Application Serial No. 61/448,547, filed March 2, 2011; and U.S. Provisional Application Serial No. 61/516,996, filed April 12, 2011. U.S. Utility Application Serial No. 13/300,235 claims the benefit of U.S. Provisional Application Serial No. 61/571,248, filed June 23, 2011. The entirety of each of these applications is incorporated herein by reference for the teachings therein.

Statement Regarding Federally Sponsored Research or Development

This work was supported by Grant No. 5R44HD60423-3 awarded by the National Institutes of Health. The U.S. Government may have certain rights in any patent issuing from this application.

Field of the Invention

The present invention generally relates to methods and compositions for simultaneously amplifying multiple nucleic acid regions of interest in one reaction volume.

To increase assay throughput and make more efficient use of nucleic acid samples, many target nucleic acids in a sample of interest can be amplified simultaneously by combining many oligonucleotides with the sample and then subjecting the sample to polymerase chain reaction (PCR) conditions, a process known in the art as multiplex PCR. The use of multiplex PCR can significantly simplify experimental procedures and shorten the time required for nucleic acid analysis and detection. However, when multiple primer pairs are added to the same PCR reaction, non-target amplification products, such as amplified primer dimers, may be generated. The risk of generating such products increases as the number of primers increases. These non-target amplicons significantly limit the use of the amplified products for further analysis and/or assays. Thus, improved methods are needed to reduce the formation of non-target amplicons during multiplex PCR.

Improved multiplex PCR methods would be useful for a variety of applications, such as Non-Invasive Prenatal Genetic Diagnosis (NPD). In particular, current methods of prenatal diagnosis can alert physicians and parents to abnormalities in growing fetuses. Without prenatal diagnosis, one in 50 babies is born with a serious physical or mental handicap, and as many as one in 30 will have some form of congenital malformation. Unfortunately, standard methods either have poor accuracy or involve an invasive procedure that carries a risk of miscarriage. Methods based on maternal blood hormone levels or ultrasound measurements are non-invasive, but they also have low accuracy. Amniocentesis, chorionic villus biopsy, and fetal blood sampling have high accuracy, but they are invasive and carry significant risks. Amniocentesis has been performed in approximately 3% of all pregnancies in the United States, although its frequency of use has increased over the past 15 years.

Normal humans have two sets of 23 chromosomes in every healthy diploid cell, with one copy coming from each parent. A condition in which a nucleated cell contains too many and/or too few chromosomes is believed to be responsible for a large percentage of failed implantations, miscarriages, and genetic diseases. In addition to increasing the chances of a successful pregnancy, detection of chromosomal abnormalities can identify individuals or embryos with conditions such as Down syndrome, Klinefelter's syndrome, and Turner syndrome, among others. Testing for chromosomal abnormalities is especially important as the age of the mother increases: between the ages of 35 and 40, at least 40% of embryos are estimated to be abnormal, and above the age of 40, more than half of embryos are estimated to be abnormal.

It has recently been discovered that cell-free fetal DNA and intact fetal cells can enter the maternal blood circulation. Consequently, analysis of this genetic material can allow early NPD. Improved methods are needed that increase the sensitivity and specificity of NPD and reduce the time and cost it requires.

Summary of the Invention

In one aspect, the invention features a method of amplifying target loci in a nucleic acid sample. In some embodiments, the method includes (i) contacting the nucleic acid sample with a library of test primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture; and (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that include target amplicons. In some embodiments, the method also includes determining the presence or absence of at least one target amplicon (e.g., at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the target amplicons). In some embodiments, the method also includes determining the sequence of at least one target amplicon (e.g., at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the target amplicons).

In various embodiments of any of the aspects of the invention, at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci are amplified. In some embodiments, at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the amplified products are target amplicons. In some embodiments, at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the target loci are amplified. In various embodiments, less than 60, 50, 40, 30, 20, 10, 5, 4, 3, 2, 1, 0.5, 0.25, 0.1, or 0.05% of the amplified products are primer dimers. In some embodiments, the library of test primers includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 test primer pairs, in which each pair of primers includes a forward test primer and a reverse test primer that hybridize to the same target locus. In some embodiments, the library of test primers includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 individual test primers that hybridize to different target loci, in which the individual primers are not part of primer pairs.
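The percentages in the preceding paragraph (fraction of amplified products that are target amplicons, fraction that are primer dimers, and fraction of target loci amplified) can be illustrated with a minimal Python sketch. The function name `amplification_summary`, the label strings, and the toy data are assumptions made for illustration only; the source does not specify how amplified products are classified.

```python
from collections import Counter

# Labels for non-target products (hypothetical naming, not from the source).
NON_TARGET = {"primer_dimer", "other"}


def amplification_summary(product_labels, n_targets):
    """Summarize a list with one label per amplified product.

    Each label is either a target locus name or one of NON_TARGET.
    """
    counts = Counter(product_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no amplified products to summarize")
    on_target = sum(n for label, n in counts.items() if label not in NON_TARGET)
    loci_seen = sum(1 for label in counts if label not in NON_TARGET)
    return {
        "pct_target_amplicons": 100.0 * on_target / total,
        "pct_primer_dimers": 100.0 * counts["primer_dimer"] / total,
        "pct_loci_amplified": 100.0 * loci_seen / n_targets,
    }


# Toy data: 10 products drawn from a 4-locus panel.
labels = ["locus1"] * 6 + ["locus2"] * 3 + ["primer_dimer"]
summary = amplification_summary(labels, n_targets=4)
```

In this toy run, 9 of 10 products are target amplicons, 1 of 10 is a primer dimer, and 2 of the 4 loci are amplified.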

In various embodiments of any of the aspects of the invention, the concentration of each test primer is less than 100, 75, 50, 25, 10, 5, 2, or 1 nM. In various embodiments, the GC content of the test primers is between 30 and 80%, such as between 40 and 70% or between 50 and 60%. In some embodiments, the range of GC content of the test primers (e.g., the maximum GC content minus the minimum GC content, such as 80% - 60% = a range of 20%) is less than 30, 20, 10, or 5%. In some embodiments, the melting temperature (T_m) of the test primers is between 40 and 80 °C, such as 50 to 70 °C, 55 to 65 °C, or 57 to 60.5 °C, inclusive. In some embodiments, the range of melting temperatures of the test primers is less than 20, 15, 10, 5, 3, or 1 °C. In some embodiments, the length of the test primers is between 15 and 100 nucleotides, such as between 15 and 75 nucleotides, 15 and 40 nucleotides, 17 and 35 nucleotides, 18 and 30 nucleotides, or 20 and 65 nucleotides, inclusive. In some embodiments, the test primers include a tag that is not target specific, such as a tag that forms an internal loop structure. In some embodiments, the tag lies between two DNA binding regions. In various embodiments, the test primers include a 5' region that is specific for a target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region that is specific for the target locus. In various embodiments, the length of the 3' region is at least 7 nucleotides. In some embodiments, the length of the 3' region is between 7 and 20 nucleotides, such as between 7 and 15 nucleotides or between 7 and 10 nucleotides, inclusive. In various embodiments, the test primers include a 5' region that is not specific for a target locus (such as a tag or a universal primer binding site), followed by a region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region that is specific for the target locus. In some embodiments, the range of the lengths of the test primers is less than 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the length of the target amplicons is between 50 and 100 nucleotides, such as between 60 and 80 nucleotides or between 60 and 75 nucleotides, inclusive. In some embodiments, the range of the lengths of the target amplicons is less than 50, 25, 15, 10, or 5 nucleotides.
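The "range" quantities above (maximum minus minimum across the library) can be made concrete with a short Python sketch for GC content and primer length. The helper names `gc_content` and `library_spread` and the three example sequences are hypothetical; melting temperature is left out because the source does not say how T_m is calculated.

```python
def gc_content(primer: str) -> float:
    """Return the GC content of a primer as a percentage (0 to 100)."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer sequence is empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def library_spread(values):
    """Range of a property across a library: maximum minus minimum."""
    values = list(values)
    return max(values) - min(values)


# Hypothetical three-primer library (illustrative sequences only).
primers = ["ACGTACGTAC", "GGCCGGAATT", "ATATGCGCAT"]
gc_values = [gc_content(p) for p in primers]      # per-primer GC content
gc_spread = library_spread(gc_values)             # max GC minus min GC
length_spread = library_spread(len(p) for p in primers)
```

Here the GC contents are 50%, 60%, and 40%, so the GC range is 20 percentage points, mirroring the "80% - 60% = 20%" style of calculation in the text.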

In various embodiments of any of the aspects of the invention, the primer extension reaction conditions are polymerase chain reaction (PCR) conditions. In various embodiments, the length of the annealing step is at least 3, 5, 8, 10, or 15 minutes. In various embodiments, the length of the extension step is at least 3, 5, 8, 10, or 15 minutes.

In various embodiments of any of the aspects of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes maternal DNA from the pregnant mother of a fetus and fetal DNA, in order to determine the presence or absence of a fetal chromosomal abnormality. In various embodiments, the method includes ligating a universal primer binding site to the DNA molecules in the sample; amplifying the ligated DNA molecules using at least 1,000 specific primers and a universal primer to produce a first set of amplified products; and amplifying the first set of amplified products using at least 1,000 pairs of specific primers to produce a second set of amplified products. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different primer pairs are used.

In various embodiments of any of the aspects of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes DNA from the alleged father of a fetus, and to simultaneously amplify the target loci in a sample that includes maternal DNA from the pregnant mother of the fetus and fetal DNA, in order to establish whether the alleged father is the biological father of the fetus.

In various embodiments of any of the aspects of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in one cell or multiple cells from an embryo to determine the presence or absence of a chromosomal abnormality. In various embodiments, cells from a set of two or more embryos are analyzed, and one embryo is selected for in vitro fertilization.

In various embodiments of any of the aspects of the invention, the test primers are used to simultaneously amplify at least 1,000 different target loci in a forensic nucleic acid sample. In various embodiments, the length of the annealing step is at least 3, 5, 8, 10, or 15 minutes.

In various embodiments of any of the aspects of the invention, the method includes using the test primers to simultaneously amplify at least 1,000 different target loci in a control nucleic acid sample to produce a first set of target amplicons and to simultaneously amplify the target loci in a test nucleic acid sample to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine whether a target locus is present in one sample but absent in the other, or whether a target locus is present at different levels in the control sample and the test sample. In various embodiments, the test sample is from an individual suspected of having a disease or phenotype of interest (such as cancer) or an increased risk for a disease or phenotype of interest; and one or more of the target loci include a sequence (e.g., a polymorphism or other mutation) associated with an increased risk for the disease or phenotype of interest or associated with the disease or phenotype of interest. In various embodiments, the method includes using the test primers to simultaneously amplify 1,000 different target loci in a control sample that includes RNA to produce a first set of target amplicons and to simultaneously amplify the target loci in a test sample that includes RNA to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine the presence or absence of a difference in RNA expression levels between the control sample and the test sample. In various embodiments, the RNA is mRNA. In various embodiments, the test sample is from an individual suspected of having a disease or phenotype of interest (such as cancer) or an increased risk for a disease or phenotype of interest (such as cancer); and one or more of the target loci include a sequence (e.g., a polymorphism or other mutation) associated with an increased risk for the disease or phenotype of interest or associated with the disease or phenotype of interest. In some embodiments, the test sample is from an individual diagnosed with a disease or phenotype of interest (such as cancer); and a difference in the RNA expression level between the control sample and the test sample indicates that the target locus includes a sequence (e.g., a polymorphism or other mutation) associated with an increased or decreased risk for the disease or phenotype of interest.

In some embodiments of any of the aspects of the invention, the test primers are selected from a library of candidate primers based on one or more parameters, such as by selecting the primers using any of the methods of the invention. In some embodiments, the test primers are selected from a library of candidate primers based at least in part on the ability of the candidate primers to form primer dimers.

In one aspect, the invention features a method of selecting test primers from a library of candidate primers. In various embodiments, the selection includes (i) calculating on a computer an undesirability score for most or all of the possible combinations of two candidate primers from the library, in which each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers; (ii) removing the candidate primer with the highest undesirability score from the library of candidate primers; (iii) if the candidate primer removed in step (ii) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and (iv) optionally repeating steps (ii) and (iii), thereby selecting a library of test primers. In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below a minimum threshold. In some embodiments, the selection method is performed until the number of candidate primers remaining in the library is reduced to a desired number. In various embodiments, undesirability scores are calculated for at least 80, 90, 95, 98, 99, or 99.5% of the possible combinations of candidate primers in the library. In various embodiments, the candidate primers remaining in the library are capable of simultaneously amplifying at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci. In various embodiments, the method also includes (v) contacting a nucleic acid sample that includes the target loci with the candidate primers remaining in the library to produce a reaction mixture; and (vi) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that include target amplicons.
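Steps (i) through (iv) above can be sketched in Python as a greedy pruning loop. This is a minimal sketch under stated assumptions, not the patented implementation: the names `select_primers`, `score`, and `partner` are hypothetical, the pairwise undesirability score is supplied by the caller, and the rule for choosing which of the two primers in the worst combination to remove (the one with the larger summed score) is an assumption, since the text does not specify it.

```python
from itertools import combinations


def select_primers(primers, score, partner, threshold):
    """Greedy pruning sketch.

    primers:   iterable of candidate primer names
    score:     function (a, b) -> undesirability of that two-primer combination
    partner:   dict mapping a primer to the other member of its pair, if any
    threshold: stop once every remaining combination scores at or below this
    """
    primers = sorted(primers)
    # Step (i): score every combination of two candidate primers once.
    scores = {frozenset(pair): score(*pair) for pair in combinations(primers, 2)}
    remaining = set(primers)
    while True:
        live = {pair: s for pair, s in scores.items() if pair <= remaining}
        if not live:
            break
        worst_pair = max(live, key=live.get)
        if live[worst_pair] <= threshold:
            break
        # Step (ii), with an assumed rule: of the two primers in the worst
        # combination, remove the one with the larger summed score.
        totals = {p: sum(s for pair, s in live.items() if p in pair)
                  for p in worst_pair}
        victim = max(sorted(worst_pair), key=totals.get)
        # Step (iii): also remove the other member of the victim's pair.
        remaining -= {victim, partner.get(victim)}
        # Step (iv): the loop repeats steps (ii) and (iii).
    return remaining


# Toy library of three primer pairs with a hypothetical score table.
table = {frozenset(("F1", "R2")): 9,
         frozenset(("F1", "F3")): 4,
         frozenset(("F2", "R3")): 2}
partner = {"F1": "R1", "R1": "F1", "F2": "R2", "R2": "F2", "F3": "R3", "R3": "F3"}
kept = select_primers(partner, lambda a, b: table.get(frozenset((a, b)), 0),
                      partner, threshold=3)
```

With a threshold of 3, the F1/R2 combination (score 9) triggers removal of F1 and its partner R1, after which no remaining combination exceeds the threshold, so the other two pairs are kept.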

In one aspect, the invention features a method of selecting test primers from a library of candidate primers. In various embodiments, the selection of test primers from the library of candidate primers includes (i) calculating on a computer an undesirability score for most or all of the possible combinations of two candidate primers from the library, in which each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers; (ii) removing from the library of candidate primers the candidate primer that is part of the greatest number of combinations of two candidate primers with an undesirability score above a first minimum threshold; (iii) if the candidate primer removed in step (ii) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and (iv) optionally repeating steps (ii) and (iii), thereby selecting a library of test primers. In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below the first minimum threshold. In some embodiments, the selection method is performed until the number of candidate primers remaining in the library is reduced to a desired number. In various embodiments, undesirability scores are calculated for at least 80, 90, 95, 98, 99, or 99.5% of all possible combinations of candidate primers in the library. In various embodiments, the candidate primers remaining in the library are capable of simultaneously amplifying at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci. In various embodiments, the method includes (v) contacting a nucleic acid sample that includes the target loci with the candidate primers remaining in the library to produce a reaction mixture; and (vi) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that include target amplicons.
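The variant in this aspect removes the primer that takes part in the greatest number of above-threshold combinations, rather than the primer in the single worst combination. A minimal Python sketch follows; the names `prune_by_conflict_count`, `score`, and `partner` are hypothetical, and the alphabetical tie-break is a placeholder assumption (the source leaves tie handling to other parameters).

```python
from collections import Counter
from itertools import combinations


def prune_by_conflict_count(primers, score, partner, threshold):
    """Remove primers until no remaining combination scores above threshold."""
    primers = sorted(primers)
    # Step (i): keep only the combinations whose score exceeds the threshold.
    bad = [frozenset(c) for c in combinations(primers, 2) if score(*c) > threshold]
    remaining = set(primers)
    while True:
        # Count, per primer, the above-threshold combinations still in play.
        counts = Counter(p for pair in bad if pair <= remaining for p in pair)
        if not counts:
            return remaining
        # Step (ii): remove the primer in the most such combinations
        # (ties broken alphabetically, an assumption for determinism).
        victim = min(counts, key=lambda p: (-counts[p], p))
        # Step (iii): also remove the other member of its primer pair.
        remaining -= {victim, partner.get(victim)}


# Toy library of four primer pairs with a hypothetical score table.
table = {frozenset(("A1", "B1")): 8, frozenset(("A1", "C1")): 7,
         frozenset(("A1", "D2")): 6, frozenset(("B2", "C2")): 9}
partner = {}
for name in "ABCD":
    partner[name + "1"] = name + "2"
    partner[name + "2"] = name + "1"
kept = prune_by_conflict_count(partner,
                               lambda a, b: table.get(frozenset((a, b)), 0),
                               partner, threshold=5)
```

A1 sits in three above-threshold combinations, so pair A is removed first even though the single highest score (9) belongs to B2/C2; pair B is removed next, leaving pairs C and D.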

In various embodiments of any of the aspects of the invention, the selection method includes further reducing the number of candidate primers remaining in the library by decreasing the first minimum threshold used in step (ii) to a lower second minimum threshold and optionally repeating steps (ii) and (iii). In some embodiments, the selection method includes increasing the first minimum threshold used in step (ii) to a higher second minimum threshold and optionally repeating steps (ii) and (iii). In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number.
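The threshold adjustment described in this paragraph can be sketched as an outer loop over a sequence of thresholds, stopping once the library is small enough. This is a hedged illustration only: `prune`, `prune_to_size`, the precomputed `scores` dictionary, the alphabetical tie-break, and the particular threshold schedule are all assumptions rather than details given in the source.

```python
def prune(remaining, scores, partner, threshold):
    """Remove primers until no remaining combination scores above threshold."""
    remaining = set(remaining)
    while True:
        counts = {}
        for pair, s in scores.items():
            if s > threshold and pair <= remaining:
                for p in pair:
                    counts[p] = counts.get(p, 0) + 1
        if not counts:
            return remaining
        # Primer in the most above-threshold combinations; ties alphabetical.
        victim = min(counts, key=lambda p: (-counts[p], p))
        remaining -= {victim, partner.get(victim)}


def prune_to_size(primers, scores, partner, thresholds, target_size):
    """Apply successive thresholds until at most target_size primers remain."""
    remaining = set(primers)
    for threshold in thresholds:  # e.g. a first threshold, then a lower second one
        if len(remaining) <= target_size:
            break
        remaining = prune(remaining, scores, partner, threshold)
    return remaining


# Toy data: three primer pairs and two scored combinations (hypothetical).
partner = {"A1": "A2", "A2": "A1", "B1": "B2", "B2": "B1", "C1": "C2", "C2": "C1"}
scores = {frozenset(("A1", "B1")): 9, frozenset(("B2", "C1")): 4}
small = prune_to_size(partner, scores, partner, thresholds=[8, 3], target_size=2)
medium = prune_to_size(partner, scores, partner, thresholds=[8, 3], target_size=4)
```

With a target of four primers, the first threshold alone suffices; with a target of two, the lower second threshold is applied and a further pair is removed.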

In various embodiments of any of the aspects of the invention, the method includes, prior to step (i), identifying or selecting primers that hybridize to the target loci. In some embodiments, multiple primers (or primer pairs) hybridize to the same target locus, and a selection method is used to choose one primer (or one primer pair) for that target locus based on one or more parameters. In various embodiments, the method includes, prior to step (ii), removing from the library a primer pair that produces a target amplicon that overlaps with a target amplicon produced by another primer pair. In various embodiments, the candidate primer to be removed from the library is chosen, based on one or more other parameters, from a group of two or more candidate primers with equal undesirability scores. In some embodiments, the candidate primers remaining in the library are used as a library of test primers in any of the methods of the invention. In some embodiments, the resulting library of test primers includes any of the primer libraries of the invention.

In various embodiments of any of the aspects of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of the heterozygosity rate of the target locus, the disease prevalence associated with a sequence (e.g., a polymorphism) at the target locus, the disease penetrance associated with a sequence (e.g., a polymorphism) at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon.

In various embodiments of any of the aspects of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes maternal DNA from the pregnant mother of a fetus and fetal DNA, to determine the presence or absence of a fetal chromosomal abnormality. In various embodiments, the method includes ligating a universal primer binding site to the DNA molecules in the sample; amplifying the ligated DNA molecules using at least 1,000 specific primers and a universal primer to produce a first set of amplified products; and amplifying the first set of amplified products using at least 1,000 pairs of specific primers to produce a second set of amplified products. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different primer pairs are used. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci are amplified.

In various embodiments of any of the aspects of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the test primers are used to simultaneously amplify at least 1,000 different target loci in a sample that includes DNA from an alleged father of a fetus, and to simultaneously amplify the target loci in a sample that includes maternal DNA from the pregnant mother of the fetus and fetal DNA, to establish whether the alleged father is the biological father of the fetus. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different primer pairs are amplified.

In various embodiments of any of the aspects of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the test primers are used to simultaneously amplify at least 1,000 different target loci in one cell or multiple cells from a fetus to determine the presence or absence of a chromosomal abnormality. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci are amplified.

In various embodiments of any of the aspects of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the test primers are used to simultaneously amplify at least 1,000 different target loci in an external nucleic acid sample. In various embodiments, the length of the annealing step is at least 3, 5, 8, 10, or 15 minutes. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different primer pairs are amplified.

In various embodiments of any of the aspects of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of the heterozygosity rate of the target locus, the disease prevalence associated with a sequence (e.g., a polymorphism) at the target locus, the disease penetrance associated with a sequence (e.g., a polymorphism) at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the method includes simultaneously amplifying at least 1,000 different target loci in a control nucleic acid sample using the test primers to produce a first set of target amplicons, and simultaneously amplifying the target loci in a test nucleic acid sample to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine whether a target locus is present in one sample but absent in the other, or whether a target locus is present at different levels in the control sample and the test sample. In various embodiments, the test sample is from an individual suspected of having a disease or phenotype of interest, or an increased risk for a disease or phenotype of interest; and one or more of the target loci include a sequence (e.g., a polymorphism) that is associated with an increased risk for the disease or phenotype of interest, or that is associated with the disease or phenotype of interest. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci are amplified.

In various embodiments of any of the aspects of the invention, the undesirability score is based at least in part on one or more parameters selected from the group consisting of the heterozygosity rate of the target locus, the disease prevalence associated with a sequence (e.g., a polymorphism) at the target locus, the disease penetrance associated with a sequence (e.g., a polymorphism) at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and the method includes simultaneously amplifying 1,000 different target loci in a control sample that includes RNA using the test primers to produce a first set of target amplicons, and simultaneously amplifying the target loci in a test sample that includes RNA to produce a second set of target amplicons; and comparing the first and second sets of target amplicons to determine the presence or absence of a difference in RNA expression between the control sample and the test sample. In various embodiments, the RNA is mRNA. In various embodiments, the test sample is from an individual suspected of having a disease or phenotype of interest (such as cancer), or an increased risk for a disease or phenotype of interest (such as cancer); and one or more of the target loci include a sequence (e.g., a polymorphism or other mutation) that is associated with an increased risk for the disease or phenotype of interest, or that is associated with the disease or phenotype of interest. In some embodiments, the test sample is from an individual diagnosed with a disease or phenotype of interest (such as cancer); and a difference in RNA expression level between the control sample and the test sample indicates that a target locus includes a sequence (e.g., a polymorphism or other mutation) associated with an increased or decreased risk for the disease or phenotype of interest. In various embodiments, at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci are amplified.

In one aspect, the invention features a library of primers. In some embodiments, the primers are selected from a library of candidate primers using any of the methods of the invention. In some embodiments, the library is selected from a library of candidate primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci. In some embodiments, the library includes primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci. In some embodiments, the library includes primers that simultaneously amplify at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci such that less than 60, 40, 30, 20, 10, 5, 4, 3, 2, 1, 0.5, 0.25, 0.1, or 0.05% of the amplified products are primer dimers. In some embodiments, the library includes primers that amplify 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci such that at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the amplified products are target amplicons. In some embodiments, the library includes primers that simultaneously amplify the target loci such that at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different targeted loci are amplified. In some embodiments, the library of primers includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 primer pairs, wherein each pair of primers includes a forward test primer and a reverse test primer, and wherein each pair of test primers hybridizes to a target locus. In some embodiments, the library of primers includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 individual primers that each hybridize to a different target locus, wherein the individual primers are not part of primer pairs.

In various embodiments of any of the aspects of the invention, the concentration of each primer is less than 100, 75, 50, 25, 10, 5, 2, or 1 nM. In various embodiments, the GC content of the primers is between 30 and 80%, such as between 40 and 70% or between 50 and 60%, inclusive. In some embodiments, the range of GC content of the primers is less than 30, 20, 10, or 5%. In some embodiments, the melting temperature of the primers is between 40 and 80 °C, such as 50 to 70 °C, 55 to 65 °C, or 57 to 60.5 °C, inclusive. In some embodiments, the range of melting temperatures of the primers is less than 15, 10, 5, 3, or 1 °C. In some embodiments, the length of the primers is between 15 and 100 nucleotides, such as 15 to 75 nucleotides, 15 to 40 nucleotides, 17 to 35 nucleotides, 18 to 30 nucleotides, or 20 to 65 nucleotides, inclusive. In some embodiments, the primers include a tag that is not target specific, such as a tag that forms an internal loop structure. In some embodiments, the tag is between two DNA binding regions. In various embodiments, the primers include a 5' region that is specific for a target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region that is specific for the target locus. In various embodiments, the length of the 3' region is at least 7 nucleotides. In some embodiments, the length of the 3' region is between 7 and 20 nucleotides, such as 7 to 15 nucleotides or 7 to 10 nucleotides, inclusive. In various embodiments, the primers include a 5' region that is not specific for a target locus (such as another tag or a universal primer binding site), followed by a region that is specific for a target locus, an internal region that is not specific for the target locus, and a 3' region that is specific for the target locus. In some embodiments, the range of the length of the primers is less than 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the length of the target amplicons is between 50 and 100 nucleotides, such as 60 to 80 nucleotides or 60 to 75 nucleotides, inclusive. In some embodiments, the range of the length of the target amplicons is less than 50, 25, 15, 10, or 5 nucleotides.
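A minimal sketch of how a primer library might be checked against limits of this kind follows. It assumes that "range" means the difference between the largest and smallest value across the library, which is an interpretation rather than something the text states. The function names, default bounds, and example sequences are hypothetical, and melting temperature is left out because the text does not specify how it is computed.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def spread(values):
    """Difference between the largest and smallest value (the assumed 'range')."""
    return max(values) - min(values)


def library_within_limits(primers, gc_bounds=(30.0, 80.0),
                          length_bounds=(15, 100), max_gc_spread=30.0):
    """Return True if every primer meets the GC and length bounds and the
    library-wide GC spread is below max_gc_spread."""
    gcs = [gc_percent(p) for p in primers]
    lengths = [len(p) for p in primers]
    gc_ok = all(gc_bounds[0] <= g <= gc_bounds[1] for g in gcs)
    length_ok = all(length_bounds[0] <= n <= length_bounds[1] for n in lengths)
    return gc_ok and length_ok and spread(gcs) < max_gc_spread


# Hypothetical 20-nucleotide primers with 50% and 60% GC content.
library = ["ATGCATGCATGCATGCATGC", "GGCCATATGGCCATATGGCC"]
ok = library_within_limits(library)
```

The same pattern extends to other per-primer properties (for example, amplicon length) by adding another list of values and another pair of bounds.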

In one aspect, the invention provides a kit that includes any of the primer libraries of the invention for amplifying target loci in a nucleic acid sample. In some embodiments, the kit includes instructions for using the library to amplify the target loci.

In one aspect, the invention features a method for determining the ploidy status of a chromosome in a gestating fetus. In some embodiments, the method includes contacting a nucleic acid sample with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci to produce a reaction mixture, wherein the nucleic acid sample includes maternal DNA from the mother of the fetus and fetal DNA from the fetus. In some embodiments, the reaction mixture is subjected to primer extension reaction conditions to produce amplified products; the amplified products are measured with a high-throughput sequencer to produce sequencing data; allele counts at the polymorphic loci are calculated on a computer based on the sequencing data; a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome, are created on a computer; a joint distribution model for the expected allele counts at the polymorphic loci on the chromosome is built on a computer for each ploidy hypothesis; the relative probability of each of the ploidy hypotheses is determined on a computer using the joint distribution model and the allele counts; and the ploidy state of the fetus is called by selecting the ploidy state corresponding to the hypothesis with the greatest probability.

In one aspect, the invention features a method for determining the ploidy status of a chromosome in a gestating fetus. In one embodiment, a method for determining the ploidy status of a chromosome in a gestating fetus includes obtaining a first sample of DNA that includes maternal DNA from the mother of the fetus and fetal DNA from the fetus; preparing the first sample by isolating the DNA to obtain a prepared sample; measuring the DNA in the prepared sample at a plurality of polymorphic loci on the chromosome; calculating, on a computer, allele counts at the plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome; building, on a computer, a joint distribution model for the expected allele counts at the plurality of polymorphic loci on the chromosome for each ploidy hypothesis; determining, on a computer, the relative probability of each of the ploidy hypotheses using the joint distribution model and the allele counts measured on the prepared sample; and calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability.
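The hypothesis-scoring and calling steps can be sketched as below, under strong simplifying assumptions that are not taken from the text: each locus is treated as independent, the reads for one allele at a locus are modeled as binomial, and the expected allele fraction at each locus under each hypothesis is supplied by the caller. A real joint distribution model would derive those fractions from parental genotypes and the fetal fraction, and may model dependence between loci. All names and numbers are hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite at the extremes
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1.0 - p)


def call_ploidy(allele_counts, hypotheses):
    """allele_counts: list of (reads_for_allele_A, total_reads) per locus.
    hypotheses: dict mapping a hypothesis name to the expected allele-A
    fraction at each locus. Returns the best hypothesis and all scores."""
    log_likelihood = {
        name: sum(binom_logpmf(a, n, p)
                  for (a, n), p in zip(allele_counts, fractions))
        for name, fractions in hypotheses.items()
    }
    best = max(log_likelihood, key=log_likelihood.get)
    return best, log_likelihood


# Hypothetical allele counts at three loci and two competing hypotheses.
observed = [(52, 100), (48, 100), (50, 100)]
candidates = {
    "disomy": [0.50, 0.50, 0.50],
    "trisomy": [0.55, 0.55, 0.55],
}
best, scores = call_ploidy(observed, candidates)
```

Working in log space keeps the product of many per-locus probabilities from underflowing when thousands of loci are scored.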

In one aspect, the invention features a method of testing for an abnormal distribution of a chromosome in a sample that includes a mixture of maternal and fetal DNA. In some embodiments, the method includes (i) contacting the sample with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes, and wherein the plurality of different chromosomes includes at least one first chromosome suspected of having an abnormal distribution in the sample and at least one second chromosome presumed to be normally distributed in the sample; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products; (iii) sequencing the amplified products to obtain a plurality of sequence tags assigned to the target loci, wherein the sequence tags are of sufficient length to be assigned to a specific target locus; (iv) assigning, on a computer, the plurality of sequence tags to their corresponding target loci; (v) determining, on a computer, the number of sequence tags assigned to the target loci of the first chromosome and the number of sequence tags assigned to the target loci of the second chromosome; and (vi) comparing, on a computer, the numbers from step (v) to determine the presence or absence of an abnormal distribution of the first chromosome.
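Steps (iv) through (vi) can be sketched as a counting exercise. This illustration assumes that each sequence tag has already been matched to a locus identifier and that a lookup table maps each locus to its chromosome. The text does not say how the two counts are compared, so the ratio computed here is only one possible comparison. The locus names and chromosome labels are hypothetical.

```python
from collections import Counter


def tags_per_chromosome(assigned_loci, locus_to_chrom):
    """Count sequence tags per chromosome from their assigned loci."""
    return Counter(locus_to_chrom[locus] for locus in assigned_loci)


def tag_ratio(counts, first_chrom, second_chrom):
    """Ratio of tag counts on the tested chromosome to the reference one."""
    if counts[second_chrom] == 0:
        raise ValueError("no tags assigned to the reference chromosome")
    return counts[first_chrom] / counts[second_chrom]


# Hypothetical locus table and tag assignments.
locus_to_chrom = {"loc1": "chr21", "loc2": "chr21", "loc3": "chr1", "loc4": "chr1"}
assigned = ["loc1", "loc2", "loc1", "loc3", "loc4"]

counts = tags_per_chromosome(assigned, locus_to_chrom)
ratio = tag_ratio(counts, "chr21", "chr1")
```

In practice the ratio would be judged against the value expected for a normally distributed chromosome, which depends on how many loci are targeted on each chromosome.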

In one aspect, the invention provides a method for determining the presence or absence of a fetal aneuploidy. In some embodiments, the method includes (i) contacting a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different non-polymorphic target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that include target amplicons; (iii) quantifying, on a computer, the relative frequencies of the target amplicons from the first and second chromosomes of interest; (iv) comparing, on a computer, the relative frequencies of the target amplicons from the first and second chromosomes of interest; and (v) identifying the presence or absence of an aneuploidy based on the compared relative frequencies of the first and second chromosomes of interest. In some embodiments, the first chromosome is a chromosome suspected of being euploid. In some embodiments, the second chromosome is a chromosome suspected of being aneuploid.

In one aspect, a method is described for determining the presence or absence of a fetal aneuploidy in a maternal tissue sample that includes fetal and maternal genomic DNA, the method including (a) obtaining a mixture of fetal and maternal genomic DNA from the maternal tissue sample; (b) conducting massively parallel DNA sequencing of DNA fragments randomly selected from the mixture of fetal and maternal genomic DNA of step (a) to determine the sequence of the DNA fragments; (c) identifying the chromosomes to which the sequences obtained in step (b) belong; (d) using the data of step (c) to determine an amount of at least one first chromosome in the mixture of maternal and fetal genomic DNA, wherein the at least one first chromosome is presumed to be euploid in the fetus; (e) using the data of step (c) to determine an amount of a second chromosome in the mixture of maternal and fetal genomic DNA, wherein the second chromosome is suspected to be aneuploid in the fetus; (f) calculating the fraction of fetal DNA in the mixture of fetal and maternal DNA; (g) calculating an expected distribution of the amount of the second target chromosome if the second target chromosome is euploid, using the number in step (d); (h) calculating an expected distribution of the amount of the second target chromosome if the second target chromosome is aneuploid, using the first number in step (d) and the calculated fraction of fetal DNA in the mixture of fetal and maternal DNA from step (f); and (i) using a maximum likelihood or maximum a posteriori approach to determine whether the amount of the second chromosome determined in step (e) is more likely to be part of the distribution calculated in step (g) or the distribution calculated in step (h), thereby indicating the presence or absence of a fetal aneuploidy.
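Steps (g) through (i) can be illustrated with a small maximum-likelihood comparison. The sketch rests on assumptions that the text does not make: the aneuploidy is a fetal trisomy, read counts follow a Poisson distribution, and the reference and tested chromosomes would yield equal counts when euploid unless a `scale` factor says otherwise. Under a trisomy with fetal fraction f, the tested chromosome is expected to contribute (1 - f) * 2 + f * 3 copies per 2 reference copies, which is a factor of 1 + f/2. All names and numbers are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of observing k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def call_aneuploidy(ref_count, test_count, fetal_fraction, scale=1.0):
    """Compare the observed count on the tested chromosome against the
    euploid and trisomy expectations and return the more likely label."""
    expected_euploid = ref_count * scale
    expected_trisomy = ref_count * scale * (1.0 + fetal_fraction / 2.0)
    ll_euploid = poisson_logpmf(test_count, expected_euploid)
    ll_trisomy = poisson_logpmf(test_count, expected_trisomy)
    label = "trisomy" if ll_trisomy > ll_euploid else "euploid"
    return label, ll_euploid, ll_trisomy


# Hypothetical counts with a 10% fetal fraction: a trisomy would raise the
# expected count on the tested chromosome by 5%.
label_high, _, _ = call_aneuploidy(100_000, 105_000, fetal_fraction=0.10)
label_flat, _, _ = call_aneuploidy(100_000, 100_000, fetal_fraction=0.10)
```

A maximum a posteriori variant would add the log of a prior probability for each hypothesis to the corresponding log-likelihood before comparing them.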

In various embodiments of any of the aspects of the invention, the method also includes obtaining genotypic data from one or both parents of the fetus. In some embodiments, obtaining genotypic data from one or both parents of the fetus includes preparing the DNA from the parents, wherein the preparing includes preferentially enriching the DNA at the plurality of polymorphic loci and, optionally, amplifying the prepared parental DNA; and measuring the maternal DNA in the prepared sample at the plurality of polymorphic loci.

In various embodiments of any of the aspects of the invention, building a joint distribution model for the expected allele count probabilities of the plurality of polymorphic loci on the chromosome is done using genetic data obtained from one or both parents. In some embodiments, the sample (e.g., a first sample) has been isolated from maternal plasma, and obtaining genotypic data from the mother is done by estimating the maternal genotypic data from the DNA measurements made on the prepared sample.

In one aspect, a diagnostic box is described for helping to determine the ploidy status of a chromosome in a gestating fetus, where the diagnostic box is capable of executing the preparing and measuring steps of any of the methods of the invention.

In various embodiments of any of the aspects of the invention, the allele counts are probabilistic rather than binary. In some embodiments, measurements of the DNA in the prepared sample at the plurality of polymorphic loci are also used to determine whether the fetus has inherited one or a plurality of disease-linked haplotypes.

In various embodiments of any of the aspects of the invention, building a joint distribution model for allele count probabilities is done by using data about the probability of chromosomes crossing over at different locations in a chromosome to model the dependence between polymorphic alleles on the chromosome. In some embodiments, building a joint distribution model for allele counts and determining the relative probability of each hypothesis are done using a method that does not require the use of a reference chromosome.

In various embodiments of any of the aspects of the invention, determining the relative probability of each hypothesis makes use of an estimated fraction of fetal DNA in the prepared sample. In some embodiments, the DNA measurements from the prepared sample that are used in calculating allele count probabilities and determining the relative probability of each hypothesis comprise primary genetic data. In some embodiments, selecting the ploidy state corresponding to the hypothesis with the greatest probability is carried out using maximum likelihood estimates or maximum a posteriori estimates.
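As a hedged illustration of selecting the ploidy state whose hypothesis has the greatest probability, the sketch below takes per-hypothesis log-likelihoods (however they were obtained) and optional priors, and returns the winning hypothesis with normalized posteriors. The function name `select_ploidy` and the hypothesis labels are hypothetical; with uniform priors the result is the maximum likelihood choice, otherwise it is the maximum a posteriori choice.

```python
import math


def select_ploidy(log_likelihoods, priors=None):
    """Return (best_hypothesis, posteriors) for a dict of log-likelihoods.

    log_likelihoods -- {hypothesis: log P(data | hypothesis)}
    priors          -- optional {hypothesis: P(hypothesis)}; uniform if omitted
    """
    hypotheses = list(log_likelihoods)
    if priors is None:
        priors = {h: 1.0 / len(hypotheses) for h in hypotheses}

    # Unnormalized log posterior for each hypothesis.
    log_post = {h: log_likelihoods[h] + math.log(priors[h]) for h in hypotheses}

    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_post.values())
    weights = {h: math.exp(v - peak) for h, v in log_post.items()}
    total = sum(weights.values())
    posteriors = {h: w / total for h, w in weights.items()}

    best = max(posteriors, key=posteriors.get)
    return best, posteriors
```

The normalized posterior of the selected hypothesis can also serve as a simple confidence figure for the call.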

In various embodiments of any of the aspects of the invention, calling the ploidy state of the fetus also includes combining the relative probabilities of each of the ploidy hypotheses determined using the joint distribution model and the allele count probabilities with relative probabilities of each of the ploidy hypotheses that are calculated using statistical techniques taken from the group consisting of a read count analysis; comparing heterozygosity rates, a statistic that is only available when maternal genetic information is used; the probability of normalized genotype signals for certain parent contexts, a statistic that is calculated using an estimated fetal fraction of the sample (e.g., a first sample) or the prepared sample; and combinations thereof.

In various embodiments of any of the aspects of the invention, a confidence estimate is calculated for the called ploidy state. In some embodiments, the method also includes considering a clinical action based on the called ploidy state of the fetus, wherein the clinical action is selected from one of terminating the pregnancy or maintaining the pregnancy.

In various embodiments of any of the aspects of the invention, the method may be performed for fetuses at between 4 and 5 weeks of gestation; between 5 and 6 weeks of gestation; between 6 and 7 weeks of gestation; between 7 and 8 weeks of gestation; between 8 and 9 weeks of gestation; between 9 and 10 weeks of gestation; between 10 and 12 weeks of gestation; between 12 and 14 weeks of gestation; between 14 and 20 weeks of gestation; between 20 and 40 weeks of gestation; in the first trimester; in the second trimester; in the third trimester; or combinations thereof.

In various embodiments of any of the aspects of the invention, a report displaying the determined ploidy status of a chromosome in a gestating fetus is generated using the method. In some embodiments, a kit is described for determining the ploidy status of a target chromosome in a gestating fetus, designed to be used with any of the methods of the invention, the kit including a plurality of inner forward primers and optionally a plurality of inner reverse primers, where each of the primers is designed to hybridize to the region of DNA immediately upstream and/or downstream from one of the polymorphic sites on the target chromosome, and optionally on additional chromosomes, where the region of hybridization is separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, 31 to 60, and combinations thereof.

In one aspect, the invention features a method for establishing whether an alleged father is the biological father of a fetus that is gestating in a pregnant mother. In some embodiments, the method includes (i) simultaneously amplifying a plurality of polymorphic loci of the alleged father, the plurality including at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci, to produce a first set of amplified products; (ii) simultaneously amplifying the corresponding plurality of polymorphic loci in a mixed sample of DNA from a biological sample taken from the pregnant mother to produce a second set of amplified products, wherein the mixed sample of DNA comprises fetal DNA and maternal DNA; (iii) determining on a computer the probability that the alleged father is the biological father of the fetus using genotypic measurements based on the first and second sets of amplified products; and (iv) establishing whether the alleged father is the biological father of the fetus using the determined probability that the alleged father is the biological father of the fetus. In various embodiments, the method further includes simultaneously amplifying the corresponding plurality of polymorphic loci in genetic material from the mother to produce a third set of amplified products, wherein the probability that the alleged father is the biological father of the fetus is determined using genotypic measurements based on the first, second, and third sets of amplified products.

In one aspect, the invention provides a method for estimating the relative likelihood that each embryo in a set of embryos will develop as desired. In some embodiments, the method includes contacting a sample from each embryo with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture for each embryo, wherein each sample is derived from one or more cells of the embryo. In some embodiments, each reaction mixture is subjected to primer extension reaction conditions to produce amplified products. In some embodiments, the method includes determining on a computer one or more characteristics of at least one cell from each embryo based on the amplified products; and estimating on a computer the relative likelihood that each embryo will develop as desired, based on the one or more characteristics of the at least one cell for each embryo.

In one aspect, the invention features a method of measuring the amount of two or more target loci in a nucleic acid sample. In some embodiments, the method includes (i) using PCR to amplify a nucleic acid sample that includes a first standard locus, a second standard locus, a first target locus, and a second target locus to form amplified products, wherein the first standard locus and the first target locus have the same number of nucleotides but a sequence that differs at one or more nucleotides, and wherein the second standard locus and the second target locus have the same number of nucleotides but a sequence that differs at one or more nucleotides; (ii) sequencing the amplified products to determine a standard ratio comparing the relative amount of the amplified first standard locus to the amplified second standard locus, wherein the standard ratio indicates the difference in PCR efficiency for amplification of the first standard locus and the second standard locus; (iii) determining a target ratio comparing the relative amount of the amplified first target locus to the amplified second target locus; and (iv) adjusting the target ratio from step (iii) based on the standard ratio from step (ii) to determine the relative amount of the first target locus and the second target locus in the sample. In various embodiments, the method includes determining the absolute amount of the first target locus and the second target locus in the sample. In various embodiments, the method further includes determining the presence or absence of target loci (e.g., at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci) in the sample. In various embodiments, the method includes using any of the primer libraries of the invention. In various embodiments, the method includes simultaneously amplifying 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci.
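The adjustment in step (iv) comes down to a short piece of arithmetic, sketched here under two assumptions that the text does not spell out: the two standard loci are added at a known input ratio (1:1 by default), and each standard amplifies with the same efficiency as its matched target, so the amplification bias seen in the standards can be divided out of the target ratio. The function and parameter names are hypothetical.

```python
def corrected_target_ratio(std1_reads, std2_reads, tgt1_reads, tgt2_reads,
                           std_input_ratio=1.0):
    """Correct the observed target ratio for locus-specific PCR efficiency.

    std1_reads, std2_reads -- sequence reads of the two amplified standard loci
    tgt1_reads, tgt2_reads -- sequence reads of the two amplified target loci
    std_input_ratio        -- assumed known input ratio of standard 1 to standard 2
    """
    if min(std1_reads, std2_reads, tgt2_reads) <= 0 or std_input_ratio <= 0:
        raise ValueError("read counts and input ratio must be positive")

    # Step (ii): observed standard ratio; its deviation from the known input
    # ratio reflects the difference in PCR efficiency between the two loci.
    standard_ratio = std1_reads / std2_reads
    amplification_bias = standard_ratio / std_input_ratio

    # Step (iii): observed target ratio.
    target_ratio = tgt1_reads / tgt2_reads

    # Step (iv): divide out the bias measured on the standards.
    return target_ratio / amplification_bias
```

For example, if the standards were added 1:1 but are read at 1200:800, locus 1 is over-amplified 1.5-fold, and an observed target ratio of 3.0 is adjusted to 2.0.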

In one aspect, the invention features a method for quantitatively measuring a plurality of genetic targets in a sample for analysis. In some embodiments, the method includes (i) mixing genetic material derived from the sample for analysis with a plurality of target-specific amplification reagents and a plurality of standard sequences corresponding to the targets of the target-specific amplification reagents; (ii) amplifying the target regions of the genetic material and the standard sequences to produce target amplicons and standard sequence amplicons; and (iii) measuring the quantity of the target amplicons and standard sequence amplicons produced. In some embodiments, the genetic material is present in a genetic library. In some embodiments, the genetic targets are polymorphic loci (such as SNPs). In some embodiments, measuring the quantity is achieved by counting sequence reads. In some embodiments, the method further includes determining the estimated copy number of at least one chromosome in the sample from which the genetic library was derived, where the determination includes comparing the number of sequence reads of the target amplicons with the number of sequence reads of the standard amplicons. In some embodiments, the standard sequences and the genetic library include common priming sites that can be primed by the same primer. In some embodiments, the mixing step includes at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target-specific amplification reagents and at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 standard sequences. In various embodiments, the method includes using any of the primer libraries of the invention. In various embodiments, the method includes simultaneously amplifying 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target regions. In some embodiments, the relative quantity of each of the standard sequences is known. In some embodiments, the relative quantity of each of the sequences has been calibrated with respect to a reference genome. In some embodiments, the sample for analysis includes a mixture of fetal and maternal genomes. In some embodiments, the sample for analysis is taken from a pregnant woman or from blood plasma. In some embodiments, the reference genome has at least one aneuploidy, such as an aneuploidy at chromosome 13, 18, 21, X, or Y. In some embodiments, the reference genome is diploid.
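The comparison of target amplicon reads with standard amplicon reads can be sketched as follows. This is an illustrative simplification: it assumes the standards were added in equal (or already calibrated) amounts at every locus, that a set of loci on a chromosome with a known copy number is available for normalization, and that the sample is not a mixture. The function name, the locus identifiers, and the use of a median are hypothetical choices.

```python
from statistics import median


def estimate_copy_number(target_reads, standard_reads, test_loci, reference_loci,
                         reference_copies=2):
    """Estimate the copy number of the chromosome carrying `test_loci`.

    target_reads   -- {locus: sequence reads of the target amplicon}
    standard_reads -- {locus: sequence reads of the matching standard amplicon}
    test_loci      -- loci on the chromosome being estimated
    reference_loci -- loci on a chromosome assumed to have `reference_copies` copies
    """
    def median_ratio(loci):
        # The target/standard read ratio at each locus removes locus-specific
        # amplification efficiency; the median damps outlier loci.
        return median(target_reads[locus] / standard_reads[locus] for locus in loci)

    return reference_copies * median_ratio(test_loci) / median_ratio(reference_loci)
```

In a mixed fetal and maternal sample the deviation from two copies would be much smaller and scaled by the fetal fraction, so this sketch only conveys the normalization idea.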

In one aspect, the invention features a mixture including a plurality of genetic standard sequences, wherein the relative quantity of each of the genetic standard sequences in the mixture has been determined by calibration against a reference genome. In various embodiments, the mixture includes at least 10; 100; 500; 10,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 genetic standard sequences. In various embodiments, the genetic standard sequences include a first common priming site, a second common priming site, a first target-specific priming site, a second target-specific priming site, and a marker sequence located between the first and second target-specific priming sites, wherein the first target-specific priming site and the second target-specific priming site are located between the first and second common priming sites. In various embodiments, the calibration includes using any of the primer libraries of the invention. In various embodiments, the calibration includes simultaneously amplifying 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target regions. In some embodiments, the reference genome has at least one aneuploidy, such as an aneuploidy at chromosome 13, 18, 21, X, or Y. In some embodiments, the reference genome is diploid.

In one aspect, the invention features a method of producing a set of calibrated genetic standard sequences. In some embodiments, the method includes (i) forming an amplification reaction mixture that includes a genetic library prepared from a reference genome, a plurality of target-specific amplification primer reagent sets, and a plurality of genetic standard sequences corresponding to the target-specific amplification reagent sets; (ii) amplifying the genetic library and the genetic standard sequences to produce amplicons from the target sequences and amplicons from the genetic standard sequences; (iii) measuring the amplicons from the target sequences and the amplicons from the genetic standard sequences; and (iv) calibrating the plurality of genetic standard sequences by determining the relative amount of each genetic standard sequence with respect to the others. In various embodiments, at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 genetic standard sequences are used. In various embodiments, the method includes using any of the primer libraries of the invention. In various embodiments, the method includes simultaneously amplifying 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different sequences. In some embodiments, the reference genome has at least one aneuploidy, such as an aneuploidy at chromosome 13, 18, 21, X, or Y. In some embodiments, the reference genome is diploid.
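One way to picture the calibration in step (iv) is the following sketch. It assumes a diploid reference genome in which every target is present at the same copy number, so that the ratio of standard reads to target reads at a locus is proportional to the amount of that standard in the mixture. Expressing each ratio relative to an anchor standard is an illustrative choice, and all names are hypothetical.

```python
def calibrate_standards(standard_reads, target_reads, anchor=None):
    """Return the relative amount of each standard sequence in the mixture.

    standard_reads -- {locus: reads of the genetic standard amplicon}
    target_reads   -- {locus: reads of the reference-genome target amplicon}
    anchor         -- locus whose standard defines relative amount 1.0
                      (defaults to the first locus in standard_reads)
    """
    ratios = {}
    for locus, std in standard_reads.items():
        tgt = target_reads[locus]
        if tgt <= 0:
            raise ValueError(f"no target reads at locus {locus!r}")
        # With equal target copy number at every locus, this ratio tracks the
        # input amount of the standard; locus-specific efficiency cancels
        # because standard and target share the same primers.
        ratios[locus] = std / tgt

    if anchor is None:
        anchor = next(iter(ratios))
    base = ratios[anchor]
    if base <= 0:
        raise ValueError("anchor standard has no reads")
    return {locus: ratio / base for locus, ratio in ratios.items()}
```

The resulting relative amounts could then be used to rescale per-locus standard read counts when the same standard mixture is added to a sample for analysis.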

In one aspect, the invention provides a set of genetic standard sequences calibrated according to any of the methods of the invention. In one aspect, the invention provides a set of genetic standard sequences that can be calibrated before, during, or after a method is performed.

In one aspect, the invention features a method of measuring the number of copies of a gene of interest having at least one allele that comprises a deletion. In some embodiments, the method includes (i) mixing genetic material derived from a sample for analysis with an amplification reagent that is specific for the gene of interest and cannot specifically amplify the deletion-comprising allele of the gene of interest, a standard sequence corresponding to the gene of interest, an amplification reagent specific for a reference sequence, and a standard sequence corresponding to the reference sequence; (ii) amplifying the gene of interest, the standard sequence corresponding to the gene of interest, the reference sequence, and the standard sequence corresponding to the reference sequence to produce gene of interest amplicons, reference sequence amplicons, and standard sequence amplicons; and (iii) measuring the quantity of the target amplicons and standard sequence amplicons produced. In some embodiments, measuring the quantity is achieved by counting sequence reads. In some embodiments, the method further includes determining the estimated copy number of at least one chromosome in the sample from which the genetic library was derived, where the determination includes comparing the number of sequences of the target amplicons with the number of sequences of the standard amplicons. In some embodiments, the standard sequences and the genetic library include common priming sites that can be primed by the same primer. In some embodiments, the relative quantity of each of the sequences has been calibrated with respect to a reference genome. In various embodiments, at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 genetic standard sequences are used. In various embodiments, the method includes using any of the primer libraries of the invention. In various embodiments, the method includes simultaneously amplifying 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target regions. In some embodiments, the reference genome is diploid. In some embodiments, the sample for analysis is taken from blood.

In some embodiments of any of the aspects of the invention, preferentially enriching the DNA of a sample (e.g., a first sample) at target loci (e.g., a plurality of polymorphic loci) includes obtaining a plurality of pre-circularized probes, where each probe targets one of the loci (e.g., polymorphic loci), and where the 3' and 5' ends of the probes are preferably designed to hybridize to a region of DNA that is separated from the polymorphic site of the locus by a small number of bases, where the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the pre-circularized probes to DNA from the sample (e.g., the first sample); filling the gap between the hybridized probe ends using DNA polymerase; circularizing the pre-circularized probes; and amplifying the circularized probes.

In some embodiments of any of the aspects of the invention, preferentially enriching the DNA at target loci (e.g., a plurality of polymorphic loci) includes obtaining a plurality of ligation-mediated PCR probes, where each PCR probe targets one of the target loci (e.g., polymorphic loci), and where the upstream and downstream PCR probes are designed to hybridize to a region of DNA, on one strand of DNA, that is preferably separated from the polymorphic site of the locus by a small number of bases, where the small number is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 to 25, 26 to 30, 31 to 60, or a combination thereof; hybridizing the ligation-mediated PCR probes to DNA from the sample (e.g., the first sample); filling the gap between the ligation-mediated PCR probe ends using DNA polymerase; ligating the ligation-mediated PCR probes; and amplifying the ligated ligation-mediated PCR probes.

In some embodiments of the various aspects of the invention, preferentially enriching the DNA at target loci (e.g., a plurality of polymorphic loci) includes obtaining a plurality of hybrid capture probes that target the loci (e.g., polymorphic loci), hybridizing the hybrid capture probes to DNA from the sample (e.g., a first sample), and physically removing some or all of the unhybridized DNA from the sample of DNA (e.g., the first sample).

In some embodiments of any of the aspects of the invention, the hybrid capture probes are designed to hybridize to a region that flanks but does not overlap the polymorphic site. In some embodiments, the hybrid capture probes are designed to hybridize to a region that flanks but does not overlap the polymorphic site, where the length of the flanking capture probe may be selected from the group consisting of less than about 120 bases, less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, and less than about 25 bases. In some embodiments, the hybrid capture probes are designed to hybridize to a region that overlaps the polymorphic site, where the plurality of hybrid capture probes includes at least two hybrid capture probes for each polymorphic locus, and where each hybrid capture probe is designed to be complementary to a different allele at that polymorphic locus.

In some embodiments of any of the aspects of the invention, preferentially enriching the DNA at a plurality of polymorphic loci includes obtaining a plurality of inner forward primers, where each primer targets one of the polymorphic loci, and where the 3' end of the inner forward primers is designed to hybridize to a region of DNA upstream from the polymorphic site and separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, or 31 to 60 base pairs; optionally obtaining a plurality of inner reverse primers, where each primer targets one of the polymorphic loci, and where the 3' end of the inner reverse primers is designed to hybridize to a region of DNA upstream from the polymorphic site and separated from the polymorphic site by a small number of bases, where the small number is selected from the group consisting of 1, 2, 3, 4, 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, or 31 to 60 base pairs; hybridizing the inner primers to the DNA; and amplifying the DNA using the polymerase chain reaction to form amplicons.
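The placement rule for an inner forward primer can be made concrete with a small coordinate sketch. It is only an illustration: it assumes a 0-based template string on the forward strand, interprets "separated by a small number of bases" as the number of template bases lying between the primer's 3' end and the polymorphic site, and ignores melting temperature, specificity, and primer-dimer screening. The function name and parameters are hypothetical.

```python
def inner_forward_primer(template, snp_index, gap, primer_length):
    """Return (start, end, sequence) of a forward primer upstream of a SNP.

    template      -- forward-strand sequence, 5' to 3'
    snp_index     -- 0-based index of the polymorphic site in `template`
    gap           -- number of bases between the primer's 3' end and the SNP
    primer_length -- length of the primer in bases
    `start` and `end` are 0-based inclusive indices into `template`.
    """
    if gap < 0 or primer_length <= 0:
        raise ValueError("gap must be >= 0 and primer_length must be > 0")

    end = snp_index - gap - 1          # 3'-most base of the primer
    start = end - primer_length + 1    # 5'-most base of the primer
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer")

    # A forward primer has the same sequence as the forward strand it copies.
    return start, end, template[start:end + 1]
```

A reverse primer would be placed symmetrically on the other side of the site and reverse-complemented; that step is omitted here.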

In some embodiments of any of the aspects of the invention, the method also includes: obtaining a plurality of outer forward primers, where each primer targets one of the target loci (e.g., polymorphic loci), and where the outer forward primers are designed to hybridize to the region of DNA upstream from the inner forward primer; obtaining a plurality of outer reverse primers, where each primer targets one of the target loci (e.g., polymorphic loci), and where the outer reverse primers are designed to hybridize to the region of DNA immediately downstream from the inner reverse primer; hybridizing the first primers to the DNA; and amplifying the DNA using the polymerase chain reaction.

In some embodiments of any of the aspects of the invention, the method also includes: obtaining a plurality of outer reverse primers, where each primer targets one of the polymorphic loci, and where the outer reverse primers are designed to hybridize to the region of DNA immediately downstream from the inner reverse primer; optionally obtaining a plurality of outer forward primers, where each primer targets one of the target loci (e.g., polymorphic loci), and where the outer forward primers are designed to hybridize to the region of DNA upstream from the inner forward primer; hybridizing the first primers to the DNA; and amplifying the DNA using the polymerase chain reaction.

In some embodiments of any of the aspects of the invention, preparing the sample (e.g., the first sample) includes appending universal adapters to the DNA in the sample and amplifying the DNA in the sample using the polymerase chain reaction. In some embodiments, at least a fraction of the amplified amplicons are less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp, where the fraction is 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 99%.
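The amplicon length criterion can be checked with a short calculation. The following is a minimal sketch only: the function name `fraction_shorter_than` and the example lengths are assumptions made for illustration and do not appear in the disclosure.

```python
def fraction_shorter_than(amplicon_lengths_bp, max_length_bp):
    """Return the fraction of amplicons strictly shorter than max_length_bp."""
    if not amplicon_lengths_bp:
        raise ValueError("at least one amplicon length is required")
    short = sum(1 for length in amplicon_lengths_bp if length < max_length_bp)
    return short / len(amplicon_lengths_bp)


# Hypothetical amplicon lengths in base pairs (illustrative values only).
lengths = [42, 58, 63, 71, 88, 95, 104, 120, 66, 79]
print(fraction_shorter_than(lengths, 100))
print(fraction_shorter_than(lengths, 65))
```

A set of amplicons would satisfy, for example, the "less than 100 bp" and "50%" combination if the first value printed is at least 0.5.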

In some embodiments of any of the aspects of the invention, amplifying the DNA is done in one or a plurality of individual reaction volumes, where each individual reaction volume contains at least 100 different forward and reverse primer pairs, at least 200 different forward and reverse primer pairs, at least 500 different forward and reverse primer pairs, at least 1,000 different forward and reverse primer pairs, at least 2,000 different forward and reverse primer pairs, at least 5,000 different forward and reverse primer pairs, at least 10,000 different forward and reverse primer pairs, at least 20,000 different forward and reverse primer pairs, at least 50,000 different forward and reverse primer pairs, or at least 100,000 different forward and reverse primer pairs.

In some embodiments of any of the aspects of the invention, preparing the sample (e.g., the first sample) includes dividing the sample into a plurality of portions, where the DNA in each portion is preferentially enriched at a subset of the target loci (e.g., a plurality of polymorphic loci). In some embodiments, the inner primers are selected by identifying primer pairs that are likely to form undesired primer duplexes and removing from the plurality of primers at least one primer of each pair identified as likely to form undesired primer duplexes. In some embodiments, the inner primers contain a region designed to hybridize either upstream or downstream of the targeted locus (e.g., the polymorphic locus) and optionally contain a universal priming sequence designed to allow PCR amplification. In some embodiments, at least some of the primers additionally contain a random region that differs for each individual primer molecule. In some embodiments, at least some of the primers additionally contain a molecular barcode.
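The primer selection step (identify primer pairs likely to form undesired duplexes, then remove at least one primer of each such pair) can be sketched as a greedy filter. This is an illustration under stated assumptions, not the selection procedure of the disclosure: the scoring function `three_prime_overlap` is a toy stand-in for a real duplex-stability estimate, and the names `select_primers` and `threshold` are hypothetical.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def three_prime_overlap(a, b):
    """Toy score (assumption): length of the longest 3' suffix of `a` that is
    the reverse complement of a 3' suffix of `b`."""
    rc_b = b.translate(COMPLEMENT)[::-1]  # reverse complement of b
    best = 0
    for k in range(1, min(len(a), len(b)) + 1):
        if a[-k:] == rc_b[:k]:
            best = k
    return best


def select_primers(primers, score, threshold):
    """Repeatedly remove the primer involved in the most undesired duplexes
    until no remaining pair scores above the threshold."""
    remaining = set(primers)
    while True:
        # Count, for each primer, how many partners exceed the threshold.
        bad = {p: 0 for p in remaining}
        for a, b in combinations(sorted(remaining), 2):
            if score(a, b) > threshold:
                bad[a] += 1
                bad[b] += 1
        worst = max(bad, key=lambda p: (bad[p], p), default=None)
        if worst is None or bad[worst] == 0:
            return sorted(remaining)
        remaining.remove(worst)


# Hypothetical candidate primers; the last two share a complementary 3' end.
candidates = ["CCCCAAAA", "GGGGACGT", "TTTTACGT"]
print(select_primers(candidates, three_prime_overlap, threshold=3))
```

Removing the primer with the most problematic partners first tends to keep more primers in the library than removing primers in arbitrary order, although a greedy pass is not guaranteed to be optimal.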

In some embodiments of any of the aspects of the invention, the preferential enrichment results in an average degree of allelic bias between the prepared sample and the sample (e.g., the first sample) of a factor selected from the group consisting of no more than a factor of 2, no more than a factor of 1.5, no more than a factor of 1.2, no more than a factor of 1.1, no more than a factor of 1.05, no more than a factor of 1.02, no more than a factor of 1.01, no more than a factor of 1.005, no more than a factor of 1.002, no more than a factor of 1.001, and no more than a factor of 1.0001. In some embodiments, the plurality of polymorphic loci are SNPs. In some embodiments, measuring the DNA in the prepared sample is done by sequencing.

In some embodiments of any of the aspects of the invention, the target loci are present in the same nucleic acid of interest (e.g., the same chromosome or the same region of a chromosome). In some embodiments, at least some of the target loci are present in different nucleic acids of interest (e.g., different chromosomes). In some embodiments, the nucleic acid sample includes fragmented or degraded nucleic acids. In some embodiments, the nucleic acid sample includes genomic DNA, cDNA, or mRNA. In some embodiments, the nucleic acid sample includes DNA from a single cell. In some embodiments, the nucleic acid sample is a blood or plasma sample that is substantially free of cells. In some embodiments, the nucleic acid sample includes or is derived from blood, plasma, saliva, semen, sperm, cell culture supernatant, mucus secretion, dental plaque, gastrointestinal tract tissue, stool, urine, hair, bone, body fluids, tears, tissue, skin, fingernails, blastomeres, embryos, amniotic fluid, chorionic villus samples, bile, lymph, cervical mucus, or a forensic sample. In some embodiments, the target loci are segments of human nucleic acids. In some embodiments, the target loci comprise or consist of single nucleotide polymorphisms (SNPs). In some embodiments, the primers are DNA molecules.

In some embodiments of any of the aspects of the invention, the DNA in the sample (e.g., the first sample) originates from maternal plasma. In some embodiments, preparing the sample (e.g., the first sample) includes amplifying the DNA. In some embodiments, preparing the sample (e.g., the first sample) includes preferentially enriching the DNA in the sample at the target loci (e.g., a plurality of polymorphic loci).

In various embodiments, the primer extension reaction or the polymerase chain reaction includes the addition of one or more nucleotides by a polymerase. In various embodiments, the primer extension reaction or the polymerase chain reaction does not include ligation-mediated PCR. In various embodiments, the primer extension reaction or the polymerase chain reaction does not include the joining of two primers by a ligase. In various embodiments, the primers do not include Linked Inverted Probes (LIPs), which can also be called pre-circularized probes, pre-circularizing probes, circularizing probes, Padlock Probes, or Molecular Inversion Probes (MIPs).

It is understood that the aspects and embodiments of the invention described herein include "comprising," "consisting of," and "consisting essentially of" aspects and embodiments.

Definitions

Single Nucleotide Polymorphism (SNP) refers to a single nucleotide that may differ between the genomes of two members of the same species. The usage of the term should not imply any limit on the frequency with which each variant occurs.

Sequence refers to a DNA sequence or a genetic sequence. It may refer to the primary, physical structure of the DNA molecule or strand in an individual. It may refer to the sequence of nucleotides found in that DNA molecule, or to the complementary strand of the DNA molecule. It may refer to the information contained in the DNA molecule as its representation in silico.

Locus refers to a particular region of interest on the DNA of an individual, which may refer to a SNP, the site of a possible insertion or deletion, or the site of some other relevant genetic variation. Disease-linked SNPs may also refer to disease-linked loci.

Polymorphic Allele, also "Polymorphic Locus," refers to an allele or locus where the genotype varies between individuals within a given species. Some examples of polymorphic alleles include single nucleotide polymorphisms, short tandem repeats, deletions, duplications, and inversions.

Polymorphic Site refers to the specific nucleotides found in a polymorphic region that vary between individuals.

Allele refers to the genes that occupy a particular locus.

Genetic Data, also "Genotypic Data," refers to the data describing aspects of the genome of one or more individuals. It may refer to one or a set of loci, partial or entire sequences, partial or entire chromosomes, or the entire genome. It may refer to the identity of one or a plurality of nucleotides; it may refer to a set of sequential nucleotides, or nucleotides from different locations in the genome, or a combination thereof. Genotypic data is typically in silico; however, it is also possible to consider physical nucleotides in a sequence as chemically encoded genetic data. Genotypic data may be said to be "on," "of," "at," or "from" the individual(s). Genotypic data may refer to output measurements from a genotyping platform where those measurements are made on genetic material.

Genetic Material, also "Genetic Sample," refers to physical matter, such as tissue or blood, from one or more individuals comprising DNA or RNA.

Noisy Genetic Data refers to genetic data with any of the following: allele dropouts, uncertain base pair measurements, incorrect base pair measurements, missing base pair measurements, uncertain measurements of insertions or deletions, uncertain measurements of chromosome segment copy numbers, spurious signals, missing measurements, other errors, or combinations thereof.

Confidence refers to the statistical likelihood that the called SNP, allele, set of alleles, ploidy call, or determined number of chromosome segment copies correctly represents the real genetic state of the individual.

Ploidy Calling, also "Chromosome Copy Number Calling" or "Copy Number Calling" (CNC), may refer to the act of determining the quantity and/or chromosomal identity of one or more chromosomes present in a cell.

Aneuploidy refers to the state where the wrong number of chromosomes (e.g., the wrong number of full chromosomes or the wrong number of chromosome segments, such as the presence of deletions or duplications of a chromosome segment) is present in a cell. In the case of a somatic human cell, it may refer to the case where a cell does not contain 22 pairs of autosomal chromosomes and one pair of sex chromosomes. In the case of a human gamete, it may refer to the case where a cell does not contain one of each of the 23 chromosomes. In the case of a single chromosome type, it may refer to the case where more or fewer than two homologous but non-identical chromosome copies are present, or where two chromosome copies are present that originate from the same parent. In some embodiments, the deletion of a chromosome segment is a microdeletion.

Ploidy State refers to the quantity and/or chromosomal identity of one or more chromosome types in a cell.

Chromosome may refer to a single chromosome copy, meaning a single molecule of DNA of which there are 46 in a normal somatic cell; an example is "the maternally derived chromosome 18." Chromosome may also refer to a chromosome type, of which there are 23 in a normal human somatic cell; an example is "chromosome 18."

Chromosomal Identity may refer to the referent chromosome number, i.e., the chromosome type. Normal humans have 22 types of numbered autosomal chromosomes and two types of sex chromosomes. It may also refer to the parental origin of the chromosome. It may also refer to a specific chromosome inherited from the parent. It may also refer to other identifying features of a chromosome.

The State of the Genetic Material, or simply "Genetic State," may refer to the identity of a set of SNPs on the DNA, to the phased haplotypes of the genetic material, and to the sequence of the DNA, including insertions, deletions, repeats, and mutations. It may also refer to the ploidy state of one or more chromosomes, chromosomal segments, or sets of chromosomal segments.

Allelic Data refers to a set of genotypic data concerning a set of one or more alleles. It may refer to the phased, haplotypic data. It may refer to SNP identities, and it may refer to the sequence data of the DNA, including insertions, deletions, repeats, and mutations. It may include the parental origin of each allele.

Allelic State refers to the actual state of the genes in a set of one or more alleles. It may refer to the actual state of the genes described by the allelic data.

Allelic Ratio, or allele ratio, refers to the ratio between the amounts of each allele at a locus that are present in a sample or in an individual. When the sample is measured by sequencing, the allelic ratio may refer to the ratio of sequence reads that map to each allele at the locus. When the sample is measured by an intensity-based measurement method, the allele ratio may refer to the ratio of the amounts of each allele present at that locus as estimated by the measurement method.
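For the sequencing case, the allelic ratio reduces to a ratio of read counts. The sketch below assumes that each read at the locus has already been assigned an allele label; the function name `allele_ratio` and the input representation are illustrative assumptions, not part of the disclosure.

```python
from collections import Counter


def allele_ratio(read_alleles, allele_a, allele_b):
    """Ratio of reads mapped to allele_a over reads mapped to allele_b at one locus."""
    counts = Counter(read_alleles)
    if counts[allele_b] == 0:
        raise ValueError("no reads map to the denominator allele")
    return counts[allele_a] / counts[allele_b]


# Hypothetical reads at one SNP: three reads carry A, one carries G.
print(allele_ratio(["A", "A", "A", "G"], "A", "G"))
```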

Allele Count refers to the number of sequences that map to a particular locus and, if that locus is polymorphic, to the number of sequences that map to each of the alleles. If each allele is counted in a binary fashion, the allele count will be a whole number. If the alleles are counted probabilistically, the allele count can be a fractional number.

Allele Count Probability refers to the number of sequences that are likely to map to a particular locus, or to a set of alleles at a polymorphic locus, combined with the probability of the mapping. Allele counts are equivalent to allele count probabilities when the probability of the mapping for each counted sequence is binary (zero or one). In some embodiments, the allele count probabilities may be binary. In some embodiments, the allele count probabilities may be set to be equal to the DNA measurements.
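The relationship between binary and probabilistic counting can be illustrated by summing per-read mapping probabilities. This is a minimal sketch: the representation of each read as an `(allele, probability)` pair and the function name `allele_counts` are assumptions made for illustration.

```python
from collections import defaultdict


def allele_counts(reads):
    """Sum mapping probabilities per allele.

    `reads` is an iterable of (allele, mapping_probability) pairs. With
    probabilities restricted to 0 or 1 the totals are whole numbers; with
    intermediate probabilities the totals can be fractional.
    """
    counts = defaultdict(float)
    for allele, probability in reads:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("mapping probability must lie between 0 and 1")
        counts[allele] += probability
    return dict(counts)


# Binary counting: every read maps with probability 1.
print(allele_counts([("A", 1), ("A", 1), ("B", 1)]))
# Probabilistic counting: one read maps to allele A with probability 0.5.
print(allele_counts([("A", 1.0), ("A", 0.5), ("B", 1.0)]))
```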

Allelic Distribution, or "allele count distribution," refers to the relative amount of each allele that is present for each locus in a set of loci. An allelic distribution can refer to an individual, to a sample, or to a set of measurements made on a sample. In the context of sequencing, the allelic distribution refers to the number or probable number of reads that map to a particular allele for each allele in a set of polymorphic loci. The allele measurements may be treated probabilistically, that is, the likelihood that a given allele is present for a given sequence read is a fraction between 0 and 1; or they may be treated in a binary fashion, that is, any given read is considered to be exactly zero or one copies of a particular allele.

Allelic Distribution Pattern refers to a set of allele distributions for different parental contexts. Certain allelic distribution patterns may be indicative of certain ploidy states.

Allelic Bias refers to the degree to which the measured ratio of alleles at a heterozygous locus differs from the ratio that was present in the original sample of DNA. The degree of allelic bias at a particular locus is equal to the observed allelic ratio at that locus, as measured, divided by the ratio of alleles in the original DNA sample at that locus. Allelic bias may be defined to be greater than 0, such that if the calculation of the degree of allelic bias returns a value, x, that is less than 1, the degree of allelic bias may be restated as 1/x. Allelic bias may be due to amplification bias, purification bias, or some other phenomenon that affects different alleles differently.
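The degree of allelic bias defined above can be computed directly from allele amounts. The sketch below follows the stated definition (observed ratio divided by original ratio, restated as 1/x when x is less than 1); the function name `allelic_bias` and the use of read counts as the allele amounts are illustrative assumptions.

```python
def allelic_bias(observed_a, observed_b, original_a, original_b):
    """Degree of allelic bias at one heterozygous locus.

    Computes x = (observed_a / observed_b) / (original_a / original_b) and
    restates it as 1 / x when x is below 1, so the result is never below 1.
    """
    if min(observed_a, observed_b, original_a, original_b) <= 0:
        raise ValueError("all allele amounts must be positive")
    x = (observed_a / observed_b) / (original_a / original_b)
    return 1.0 / x if x < 1.0 else x


# Hypothetical counts: a 50:50 original sample measured as 60:40 after enrichment.
print(allelic_bias(60, 40, 50, 50))
# The mirrored measurement (40:60) yields the same degree of bias.
print(allelic_bias(40, 60, 50, 50))
```

Under this convention a value of 1 means that the measured ratio matches the original ratio, and an average value of no more than 1.2 across loci would correspond to the "no more than a factor of 1.2" case.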

Primer, also "PCR probe," refers to a single DNA molecule (a DNA oligomer) or a collection of DNA molecules (DNA oligomers) where the DNA molecules are identical, or nearly so, and where the primer contains a region that is designed to hybridize to a targeted locus (e.g., a targeted polymorphic locus or a nonpolymorphic locus) and may contain a priming sequence designed to allow PCR amplification. A primer may also contain a molecular barcode. A primer may contain a random region that differs for each individual molecule. The terms "test primer" and "candidate primer" are not meant to be limiting and may refer to any of the primers disclosed herein.

Library of primers refers to a population of two or more primers. In various embodiments, the library includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different primers. In various embodiments, the library includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different primer pairs, wherein each pair of primers includes a forward test primer and a reverse test primer, and wherein each pair of test primers hybridizes to a target locus. In some embodiments, the library of primers includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different individual primers that each hybridize to a different target locus, wherein the individual primers are not part of primer pairs. In some embodiments, the library has both (i) primer pairs and (ii) individual primers (such as universal primers) that are not part of primer pairs.

Hybrid Capture Probe refers to any nucleic acid sequence, possibly modified, that is generated by various methods such as PCR or direct synthesis and is intended to be complementary to one strand of a specific target DNA sequence in a sample. Exogenous hybrid capture probes may be added to a prepared sample and hybridized through a denature-reannealing process to form duplexes of exogenous-endogenous fragments. These duplexes may then be physically separated from the sample by various means.

Sequence Read refers to data representing a sequence of nucleotide bases that were measured using a clonal sequencing method. Clonal sequencing may produce sequence data representing a single original DNA molecule, or clones or clusters of one original DNA molecule. A sequence read may also have an associated quality score at each base position of the sequence, indicating the probability that the nucleotide has been called correctly.
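A per-base quality score can be converted to a probability of a correct call. The text does not specify an encoding, so the sketch below assumes the widely used Phred convention, Q = -10 log10(P_error); the function names are illustrative, and the per-base independence in `read_p_all_correct` is a simplifying assumption.

```python
def phred_to_p_correct(q):
    """Probability that a base call is correct, assuming Phred scaling
    (an assumption; the source does not name a quality encoding)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def read_p_all_correct(qualities):
    """Probability that every base in a read is correct, treating the
    per-base errors as independent (a simplifying assumption)."""
    p = 1.0
    for q in qualities:
        p *= phred_to_p_correct(q)
    return p


print(phred_to_p_correct(20))
print(phred_to_p_correct(30))
print(read_p_all_correct([30, 30, 20]))
```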

Mapping a sequence read is the process of determining a sequence read's location of origin in the genome sequence of a particular organism. The location of origin of a sequence read is determined based on the similarity between the nucleotide sequence of the read and the genome sequence.

Matched Copy Error, also "Matching Chromosome Aneuploidy" (MCA), refers to a state of aneuploidy where one cell contains two identical or nearly identical chromosomes. This type of aneuploidy may arise during the formation of the gametes in meiosis, and may be referred to as a meiotic non-disjunction error. This type of error may arise in meiosis. Matching trisomy may refer to the case where three copies of a given chromosome are present in an individual and two of the copies are identical.

Unmatched Copy Error, also "Unique Chromosome Aneuploidy" (UCA), refers to a state of aneuploidy where one cell contains two chromosomes that are from the same parent and that may be homologous but not identical. This type of aneuploidy may arise during meiosis, and may be referred to as a meiotic error. Unmatching trisomy may refer to the case where three copies of a given chromosome are present in an individual and two of the copies are from the same parent and are homologous but not identical. Unmatching trisomy may also refer to the case where two homologous chromosomes from one parent are present, and where some segments of the chromosomes are identical while other segments are merely homologous.

Homologous Chromosomes refers to chromosome copies that contain the same set of genes and that normally pair up during meiosis.

Identical Chromosomes refers to chromosome copies that contain the same set of genes and that, for each gene, have the same set of alleles that are identical or nearly identical.

Allele Drop Out (ADO) refers to the situation where at least one of the base pairs in a set of base pairs from homologous chromosomes at a given allele is not detected.

Locus Drop Out (LDO) refers to the situation where both base pairs in a set of base pairs from homologous chromosomes at a given allele are not detected.
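The distinction between the two kinds of dropout can be illustrated with a small classifier. This is a simplified sketch: it assumes that the true genotype at the locus is known, it works at the level of distinct alleles (so allele dropout is observable only at heterozygous loci), and the name `classify_dropout` is hypothetical.

```python
def classify_dropout(true_alleles, detected_alleles):
    """Classify a locus as 'none', 'ADO', or 'LDO'.

    `true_alleles` holds the alleles actually present on the two homologous
    chromosomes; `detected_alleles` holds the alleles that were measured.
    """
    true_set = set(true_alleles)
    detected = set(detected_alleles) & true_set
    if not detected:
        return "LDO"  # neither allele was detected
    if detected != true_set:
        return "ADO"  # at least one allele was missed
    return "none"


print(classify_dropout(("A", "G"), ["A", "G"]))
print(classify_dropout(("A", "G"), ["A"]))
print(classify_dropout(("A", "G"), []))
```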

Homozygous refers to having similar alleles at corresponding chromosomal loci.

Heterozygous refers to having dissimilar alleles at corresponding chromosomal loci.

Heterozygosity Rate refers to the rate of individuals in a population having heterozygous alleles at a given locus. The heterozygosity rate may also refer to the expected or measured ratio of alleles at a given locus in an individual or in a sample of DNA.
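In the population sense, the heterozygosity rate is the fraction of individuals whose two alleles at the locus differ. The sketch below assumes one unphased genotype pair per individual; the function name `heterozygosity_rate` and the example genotypes are illustrative only.

```python
def heterozygosity_rate(genotypes):
    """Fraction of individuals that are heterozygous at one locus.

    `genotypes` is a list of (allele_1, allele_2) pairs, one per individual.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


# Hypothetical population of four individuals at one SNP.
population = [("A", "A"), ("A", "G"), ("G", "A"), ("G", "G")]
print(heterozygosity_rate(population))
```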

Highly Informative Single Nucleotide Polymorphism (HISNP) refers to a SNP where the fetus has an allele that is not present in the mother's genotype.
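Given a fetal genotype and a maternal genotype at a SNP, the condition above is a set difference. The sketch is illustrative only: it assumes that both genotypes are known, which in practice is the quantity being inferred, and the name `is_hisnp` is hypothetical.

```python
def is_hisnp(fetal_genotype, maternal_genotype):
    """True when the fetus carries at least one allele absent from the mother."""
    return bool(set(fetal_genotype) - set(maternal_genotype))


# The mother is AA and the fetus is AG: the G allele is informative.
print(is_hisnp(("A", "G"), ("A", "A")))
# The mother is AG and the fetus is AA: no fetal allele is absent from the mother.
print(is_hisnp(("A", "A"), ("A", "G")))
```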

Chromosomal Region refers to a segment of a chromosome, or a full chromosome.

Segment of a Chromosome refers to a section of a chromosome that can range in size from one base pair to the entire chromosome.

Chromosome refers to either a full chromosome, or a segment or section of a chromosome.

Copies refers to the number of copies of a chromosome segment. It may refer to identical copies, or to non-identical, homologous copies of a chromosome segment, wherein the different copies of the chromosome segment contain a substantially similar set of loci and where one or more of the loci differ. In some cases of aneuploidy, such as the M2 copy error, it is possible to have some copies of a given chromosome segment that are identical as well as some copies of the same chromosome segment that are not identical.

Haplotype refers to a combination of alleles at multiple loci that are typically inherited together on the same chromosome. A haplotype may refer to as few as two loci or to an entire chromosome, depending on the number of recombination events that have occurred between a given set of loci. Haplotype can also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated.

Haplotypic data, also "phased data" or "ordered genetic data," refers to data from a single chromosome in a diploid or polyploid genome, that is, either the segregated maternal or the segregated paternal copy of a chromosome in a diploid genome.

Phasing refers to the act of determining the haplotypic genetic data of an individual given unordered, diploid (or polyploid) genetic data. It may refer to the act of determining, for a set of alleles found on one chromosome, which of the two genes at an allele is associated with each of the two homologous chromosomes in the individual.

Phased data refers to genetic data in which one or more haplotypes have been determined.

Hypothesis refers to a possible ploidy state at a given set of loci, or a set of possible allelic states at a given set of loci. The set of possibilities may comprise one or more elements.

Copy number hypothesis, also "ploidy state hypothesis," refers to a hypothesis concerning the number of copies of a chromosome in an individual. It may also refer to a hypothesis concerning the identity of each of the chromosomes, including the parent of origin of each chromosome, and which of that parent's two chromosomes is present in the individual. It may also refer to a hypothesis concerning which chromosomes or chromosome segments, if any, from a related individual correspond genetically to a given chromosome from the individual.

Target individual refers to the individual whose genetic state is being determined. In some embodiments, only a limited amount of DNA is available from the target individual. In some embodiments, the target individual is a fetus. In some embodiments, there may be more than one target individual. In some embodiments, each fetus that originated from a pair of parents may be considered to be a target individual. In some embodiments, the genetic data being determined is one allele call or a set of allele calls. In some embodiments, the genetic data being determined is a ploidy call.

Related individual refers to any individual who is genetically related to, and thus shares haplotype blocks with, the target individual. In one context, the related individual may be a genetic parent of the target individual, or any genetic material derived from a parent, such as a sperm, a polar body, an embryo, a fetus, or a child. It may also refer to a sibling, a parent, or a grandparent.

Sibling refers to any individual whose parents are the same as those of the individual in question. In some embodiments, it may refer to a born child, an embryo, or a fetus, or to one or more cells originating from a born child, an embryo, or a fetus. A sibling may also refer to a haploid individual that originates from one of the parents, such as a sperm, a polar body, or any other set of haplotypic genetic matter. An individual may be considered to be a sibling of itself.

Fetal refers to "of the fetus," or "of the region of the placenta that is genetically similar to the fetus." In a pregnant woman, some portion of the placenta is genetically similar to the fetus, and the free-floating fetal DNA found in maternal blood may have originated from the portion of the placenta with a genotype that matches the fetus. Note that the genetic information in half of the chromosomes in a fetus is inherited from the mother of the fetus. In some embodiments, the DNA from these maternally inherited chromosomes that came from a fetal cell is considered to be "of fetal origin," not "of maternal origin."

DNA of fetal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the fetus.

DNA of maternal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the mother.

Child may refer to an embryo, a blastomere, or a fetus. Note that in the presently disclosed embodiments, the concepts described apply equally well to individuals who are a born child, a fetus, an embryo, or a set of cells therefrom. The use of the term child may simply be meant to connote that the individual referred to as the child is the genetic offspring of the parents.

Parent refers to the genetic mother or father of an individual. An individual typically has two parents, a mother and a father, though this is not necessarily the case, for example in genetic or chromosomal chimerism. A parent may be considered to be an individual.

Parental context refers to the genetic state of a given SNP on each of the two relevant chromosomes for one or both of the two parents of the target.

Develop as desired, also "develop normally," refers to a viable embryo implanting in a uterus and resulting in a pregnancy, and/or to a pregnancy continuing and resulting in a live birth, and/or to a born child being free of chromosomal abnormalities, and/or to a born child being free of other undesired genetic conditions such as disease-linked genes. The term "develop as desired" is meant to encompass anything that may be desired by parents or healthcare facilitators. In some cases, "develop as desired" may refer to an unviable or viable embryo that is useful for medical research or other purposes.

Insertion into a uterus refers to the process of transferring an embryo into the uterine cavity in the context of in vitro fertilization.

Maternal plasma refers to the plasma portion of the blood from a female who is pregnant.

Clinical decision refers to any decision to take or not take an action that has an outcome affecting the health or survival of an individual. In the context of prenatal diagnosis, a clinical decision may refer to a decision to abort or not abort a fetus. A clinical decision may also refer to a decision to conduct further testing, to take actions to mitigate an undesirable phenotype, or to take actions to prepare for the birth of a child with abnormalities.

Diagnostic box refers to one machine or a combination of machines designed to perform one or a plurality of aspects of the methods disclosed herein. In an embodiment, the diagnostic box may be placed at a point of patient care. In an embodiment, the diagnostic box may perform targeted amplification followed by sequencing. In an embodiment, the diagnostic box may function alone or with the help of a technician.

Informatics-based method refers to a method that relies heavily on statistics to make sense of a large amount of data. In the context of prenatal diagnosis, it refers to a method designed to determine the ploidy state at one or more chromosomes, or the allelic state at one or more alleles, by statistically inferring the most likely state given a large amount of genetic data (for example, from a molecular array or sequencing), rather than by directly physically measuring the state. In an embodiment of the present disclosure, the informatics-based method may be one disclosed in this patent. In an embodiment of the present disclosure, it may be PARENTAL SUPPORT™.

Primary genetic data refers to the analog intensity signals that are output by a genotyping platform. In the context of SNP arrays, primary genetic data refers to the intensity signals before any genotype calling has been done. In the context of sequencing, primary genetic data refers to the analog measurements, analogous to a chromatogram, that come off the sequencer before the identity of any base pair has been determined and before the sequence has been mapped to the genome.

Secondary genetic data refers to processed genetic data that are output by a genotyping platform. In the context of a SNP array, secondary genetic data refers to the allele calls made by software associated with the SNP array reader, where the software has called whether a given allele is present or absent in the sample. In the context of sequencing, secondary genetic data refers to the determined base pair identities of the sequences, and possibly also to where the sequences have been mapped to the genome.

Non-invasive prenatal diagnosis (NPD), or also "non-invasive prenatal screening" (NPS), refers to a method of determining the genetic state of a fetus that is gestating in a mother using genetic material found in the mother's blood, where the genetic material is obtained by drawing the mother's intravenous blood.

Preferential enrichment of DNA that corresponds to a locus, or preferential enrichment of DNA at a locus, refers to any method that results in the percentage of DNA molecules in the post-enrichment DNA mixture that correspond to the locus being higher than the percentage of DNA molecules in the pre-enrichment DNA mixture that correspond to the locus. The method may involve selective amplification of DNA molecules that correspond to the locus. The method may involve removing DNA molecules that do not correspond to the locus. The method may involve a combination of methods. The degree of enrichment is defined as the percentage of DNA molecules in the post-enrichment mixture that correspond to the locus divided by the percentage of DNA molecules in the pre-enrichment mixture that correspond to the locus. Preferential enrichment may be carried out at a plurality of loci. In some embodiments of the present disclosure, the degree of enrichment is greater than 20. In some embodiments of the present disclosure, the degree of enrichment is greater than 200. In some embodiments of the present disclosure, the degree of enrichment is greater than 2,000. When preferential enrichment is carried out at a plurality of loci, the degree of enrichment may refer to the average degree of enrichment of all of the loci in the set of loci.
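The definition of the degree of enrichment can be illustrated with a minimal Python sketch. The function names and the example percentages are hypothetical and chosen only for illustration; the sketch assumes the pre- and post-enrichment percentages for each locus have already been measured.

```python
def degree_of_enrichment(pre_percent, post_percent):
    """Post-enrichment percentage of molecules at a locus divided by
    the pre-enrichment percentage at the same locus."""
    if pre_percent <= 0:
        raise ValueError("pre-enrichment percentage must be positive")
    return post_percent / pre_percent


def mean_degree_of_enrichment(pre_percents, post_percents):
    """Average degree of enrichment over a set of loci (illustrative helper)."""
    ratios = [degree_of_enrichment(pre, post)
              for pre, post in zip(pre_percents, post_percents)]
    return sum(ratios) / len(ratios)


# Hypothetical percentages for two loci, before and after enrichment.
pre = [0.5, 0.25]
post = [50.0, 75.0]
print(degree_of_enrichment(pre[0], post[0]))
print(mean_degree_of_enrichment(pre, post))
```

With these made-up inputs the first locus is enriched 100-fold and the second 300-fold, so the set as a whole would satisfy the "greater than 20" and "greater than 200" embodiments when averaged, but not the "greater than 2,000" embodiment.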

Amplification refers to a method that increases the number of copies of a molecule of DNA.

Selective amplification may refer to a method that increases the number of copies of a particular molecule of DNA, or of molecules of DNA that correspond to a particular region of DNA. It may also refer to a method that increases the number of copies of a particular targeted molecule of DNA, or of a targeted region of DNA, more than it increases non-targeted molecules or regions of DNA. Selective amplification may be a method of preferential enrichment.

Universal priming sequence refers to a DNA sequence that may be appended to a population of target DNA molecules, for example by ligation, PCR, or ligation-mediated PCR. Once added to the population of target molecules, primers specific to the universal priming sequences can be used to amplify the target population using a single pair of amplification primers. Universal priming sequences are typically not related to the target sequences.

Universal adapters, or "ligation adapters" or "library tags," are DNA molecules containing a universal priming sequence that can be covalently linked to the 5-prime and 3-prime ends of a population of target double-stranded DNA molecules. The addition of the adapters provides universal priming sequences at the 5-prime and 3-prime ends of the target population from which PCR amplification can take place, so that all molecules of the target population are amplified using a single pair of amplification primers.

Targeting refers to a method used to selectively amplify, or otherwise preferentially enrich, those molecules of DNA in a mixture of DNA that correspond to a set of loci.

Joint distribution model refers to a model that defines the probability of events defined in terms of multiple random variables, given a plurality of random variables defined on the same probability space, where the probabilities of the variables are linked. In some embodiments, the degenerate case in which the probabilities of the variables are not linked may be used.

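The presently disclosed embodiments will be further explained with reference to the attached drawings, in which like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed on illustrating the principles of the presently disclosed embodiments.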
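FIG. 1 is a graphical representation of the direct multiplexed mini-PCR method.
FIG. 2 is a graphical representation of the semi-nested mini-PCR method.
FIG. 3 is a graphical representation of the fully nested mini-PCR method.
FIG. 4 is a graphical representation of the hemi-nested mini-PCR method.
FIG. 5 is a graphical representation of the triply hemi-nested mini-PCR method.
FIG. 6 is a graphical representation of the one-sided nested mini-PCR method.
FIG. 7 is a graphical representation of the one-sided mini-PCR method.
FIG. 8 is a graphical representation of the reverse semi-nested mini-PCR method.
FIG. 9 shows some possible workflows for semi-nested methods.
FIG. 10 is a graphical representation of looped ligation adapters.
FIG. 11 is a graphical representation of internally tagged primers.
FIG. 12 is an example of some primers with internal tags.
FIG. 13 is a graphical representation of a method using primers with a ligation adapter binding region.
FIG. 14 shows simulated ploidy call accuracies for the counting method with two different analysis techniques.
FIG. 15 shows the ratio of two alleles for a plurality of SNPs in a cell line in Experiment 4.
FIG. 16 shows the ratio of two alleles for a plurality of SNPs in a cell line in Experiment 4, sorted by chromosome.
FIGS. 17A to 17D show the ratio of two alleles for a plurality of SNPs in plasma samples from pregnant women, sorted by chromosome.
FIG. 18 shows the fraction of data that can be explained by binomial variance before and after data correction.
FIG. 19 is a graph showing the relative enrichment of fetal DNA in samples following a short library preparation protocol.
FIG. 20 is a depth-of-read graph comparing direct PCR and the semi-nested method.
FIG. 21 is a comparison of depth of read for direct PCR of three genomic samples.
FIG. 22 is a comparison of depth of read for semi-nested mini-PCR of three samples.
FIG. 23 is a comparison of depth of read for 1,200-plex and 9,600-plex reactions.
FIG. 24 shows read count ratios for six cells at three chromosomes.
FIG. 25 shows allele ratios at three chromosomes for two three-cell reactions and a third reaction run on 1 ng of genomic DNA.
FIG. 26 shows allele ratios for two single-cell reactions at three chromosomes.
FIG. 27 is a comparison of two primer libraries showing the number of loci with a particular minor allele frequency that are targeted by each primer library.
FIG. 28A is an electrophoresis graph of PCR products.
FIGS. 28B to 28M are electropherograms of lanes 1 to 12, respectively, of FIG. 28A.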
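FIGS. 29A to 29E are an illustrative depiction of the method of the invention for determining fetal aneuploidy (FIG. 29A). Maternal and paternal genotype data (measured using blood or buccal swabs) and crossover frequency data from the HapMap database are used to generate, in silico (FIG. 29C), multiple independent hypotheses (FIG. 29B) for each potential fetal ploidy state. Each of these hypotheses is expanded to include sub-hypotheses that consider the different possible crossover points. A data model predicts what the sequencing data would look like given each hypothetical fetal genotype and different fetal cfDNA fractions, and the prediction is compared with the actual sequencing data; the likelihood of each hypothesis is determined using Bayesian statistics. In this hypothetical example, the hypothesis with the maximum likelihood (euploidy) is determined (FIG. 29D). The individual likelihoods of FIG. 29C are summed for each copy number hypothesis family (monosomy, disomy, or trisomy). The hypothesis with the maximum likelihood is called as the ploidy state, indicates the fetal fraction, and gives the sample-specific calculated accuracy (FIG. 29E).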
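The summing of sub-hypothesis likelihoods by copy number family can be sketched as follows. This is only an illustration under stated assumptions: the likelihood values are invented, the sub-hypothesis labels are hypothetical, a uniform prior over sub-hypotheses is assumed, and the function name `call_ploidy` does not come from the disclosure.

```python
def call_ploidy(sub_hypothesis_likelihoods):
    """Sum likelihoods per copy number family, normalize, and return the
    family with the largest normalized value (uniform prior assumed)."""
    family_totals = {}
    for (family, _sub_hypothesis), likelihood in sub_hypothesis_likelihoods.items():
        family_totals[family] = family_totals.get(family, 0.0) + likelihood
    total = sum(family_totals.values())
    posteriors = {family: value / total for family, value in family_totals.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


# Invented likelihoods; keys are (family, sub-hypothesis label).
likelihoods = {
    ("monosomy", "maternal copy only"): 0.01,
    ("disomy", "no crossover"): 0.6,
    ("disomy", "one crossover"): 0.3,
    ("trisomy", "maternal meiotic"): 0.09,
}
state, confidence = call_ploidy(likelihoods)
print(state, round(confidence, 3))
```

The normalized value returned alongside the call plays the role of a per-sample confidence in this sketch; how the disclosed method computes its sample-specific accuracy is not specified here.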
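FIGS. 30A to 30H are representative graphical depictions of euploidy (FIGS. 30A to 30C), monosomy (FIG. 30D), and trisomy (FIGS. 30E to 30H). For all plots, the x-axis represents the linear position of the individual polymorphic loci along each chromosome (as indicated below the plots), and the y-axis represents the number of A allele reads as a fraction of the total (A+B) allele reads. Maternal and fetal genotypes, as well as the positions on the y-axis around which the bands are centered, are indicated to the right of the plots. If desired to facilitate visualization, the plots may be color-coded according to maternal genotype, such that red indicates a maternal genotype of AA, blue indicates a maternal genotype of BB, and green indicates a maternal genotype of AB. If desired, the maternal allele contribution may be indicated in color in the "fetal genotype" column. Allele contributions are indicated as maternal|fetal, such that alleles for which the mother is AA and the fetus is AB are indicated as AA|AB. FIG. 30A shows the plot generated when two chromosomes are present and the fetal cfDNA fraction is 0%. This plot is from a non-pregnant woman, and thus represents the pattern when the genotype is entirely maternal. Allele clusters are therefore centered around 1 (AA alleles), 0.5 (AB alleles), and 0 (BB alleles). FIG. 30B shows the plot generated when two chromosomes are present and the fetal fraction is 12%. The contribution of fetal alleles to the fraction of A allele reads shifts the position of some allele spots up or down along the y-axis, such that the bands are centered around 1 (AA|AA alleles), 0.94 (AA|AB alleles), 0.56 (AB|AA alleles), 0.50 (AB|AB alleles), 0.44 (AB|BB alleles), 0.06 (BB|AB alleles), and 0 (BB|BB alleles). FIG. 30C shows the plot generated when two chromosomes are present and the fetal fraction is 26%. The pattern, comprising two red and two blue peripheral bands and a trio of central green bands, is readily apparent (colors not shown). The bands are centered around 1 (AA|AA alleles), 0.87 (AA|AB alleles), 0.63 (AB|AA alleles), 0.50 (AB|AB alleles), 0.37 (AB|BB alleles), 0.13 (BB|AB alleles), and 0 (BB|BB alleles). FIG. 30D shows the plot generated when one chromosome is present and the fetal fraction is 26%. The characteristic pattern of one outer red and one outer blue peripheral band and two central green bands indicates maternally inherited monosomy (colors not shown). Because the fetus contributes only a single allele (A or B) to the allele reads, the inner peripheral red and blue bands are absent, and the central trio of bands condenses into two bands (colors not shown). The bands are centered around 1 (AA|A alleles), 0.57 (AB|A alleles), 0.43 (AB|B alleles), and 0 (BB|B alleles). FIG. 30E shows the plot generated when three chromosomes are present and the fetal fraction is 27%. This pattern of two red and two blue peripheral bands and two central green bands indicates maternally inherited meiotic trisomy (colors not shown). The bands are centered around 1 (AA|AAA alleles), 0.88 (AA|AAB alleles), 0.56 (AB|AAB alleles), 0.44 (AB|ABB alleles), 0.12 (BB|ABB alleles), and 0 (BB|BBB alleles). FIG. 30F shows the plot generated when three chromosomes are present and the fetal fraction is 14%. This pattern of three red and three blue peripheral bands and two central green bands indicates paternally inherited meiotic trisomy (colors not shown). The bands are centered around 1 (AA|AAA alleles), 0.93 (AA|AAB alleles), 0.87 (AA|ABB alleles), 0.60 (AB|AAA alleles), 0.53 (AB|AAB alleles), 0.47 (AB|ABB alleles), 0.40 (AB|BBB alleles), 0.13 (BB|AAB alleles), 0.07 (BB|ABB alleles), and 0 (BB|BBB alleles). FIG. 30G shows the plot generated when three chromosomes are present and the fetal fraction is 35%. This pattern of two red and two blue peripheral bands and four central green bands indicates maternally inherited mitotic trisomy (colors not shown). The bands are centered around 1 (AA|AAA alleles), 0.85 (AA|AAB alleles), 0.72 (AB|AAA alleles), 0.57 (AB|AAB alleles), 0.43 (AB|ABB alleles), 0.28 (AB|BBB alleles), 0.15 (BB|ABB alleles), and 0 (BB|BBB alleles). FIG. 30H shows the plot generated when three chromosomes are present and the fetal fraction is 25%. This pattern of two red and two blue peripheral bands and four central green bands indicates paternally inherited mitotic trisomy (colors not shown). This pattern can be distinguished from that of maternally inherited mitotic trisomy (as shown in FIG. 30G) by the position of the inner peripheral bands. Specifically, the bands are centered around 1 (AA|AAA alleles), 0.78 (AA|ABB alleles), 0.67 (AB|AAA alleles), 0.56 (AB|AAB alleles), 0.44 (AB|ABB alleles), 0.33 (AB|BBB alleles), 0.22 (BB|AAB alleles), and 0 (BB|BBB alleles).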
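The band centers quoted for FIGS. 30A to 30H are consistent with a simple mixture calculation, sketched below in Python. This is an illustrative reconstruction, not a formula stated in the disclosure: it assumes reads are proportional to chromosome copies, that the mother contributes a share (1 - f) and the fetus a share f of the cfDNA per copy, and that there is no measurement noise. The function name is hypothetical.

```python
def expected_a_fraction(maternal_genotype, fetal_genotype, fetal_fraction):
    """Expected fraction of A allele reads at one SNP in a maternal/fetal
    mixture. Genotypes are strings of 'A' and 'B', one letter per copy."""
    maternal_share = 1.0 - fetal_fraction
    a_reads = (maternal_share * maternal_genotype.count("A")
               + fetal_fraction * fetal_genotype.count("A"))
    total_reads = (maternal_share * len(maternal_genotype)
                   + fetal_fraction * len(fetal_genotype))
    return a_reads / total_reads


# Disomy at 12% fetal fraction (compare FIG. 30B).
for context in ["AA|AA", "AA|AB", "AB|AA", "AB|AB", "AB|BB", "BB|AB", "BB|BB"]:
    mother, fetus = context.split("|")
    print(context, round(expected_a_fraction(mother, fetus, 0.12), 2))
```

The same function handles monosomy and trisomy because the fetal genotype string carries the copy number, for example `expected_a_fraction("AB", "A", 0.26)` for the monosomy case of FIG. 30D.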
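FIG. 31 graphically depicts, as indicated, (FIG. 31A) euploid, (FIG. 31B) T13, (FIG. 31C) T18, (FIG. 31D) T21, (FIG. 31E) 45,X, and (FIG. 31F) 47,XXY test samples. The identity of each chromosome is indicated at the top of the plot, fetal and maternal genotypes are indicated to the right of the plot, the x-axis represents the linear position of the SNPs along each chromosome, and the y-axis represents the number of A allele reads as a fraction of the total reads. Note that the altered cluster positioning is based on the fetal fraction, as indicated herein. Each spot represents a single SNP locus.
FIG. 32 shows that the combined at-birth prevalence of sex chromosome aneuploidies is greater than that of autosomal aneuploidies.
While the drawings identified above set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments that fall within the scope and spirit of the principles of the presently disclosed embodiments can be devised by those skilled in the art.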
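Detailed Description
The present invention is based in part on the surprising discovery that often only a relatively small number of the primers in a library are responsible for a substantial amount of the amplified primer dimers that form during multiplex PCR reactions. Methods have been developed to select the most undesirable primers for removal from a library of candidate primers. By reducing the amount of primer dimers to a negligible amount (~0.1% of the PCR products), these methods allow the resulting primer libraries to amplify a large number of target loci simultaneously in a single multiplex PCR reaction. Because the primers hybridize to the target loci and amplify them, rather than hybridizing to other primers and forming amplified primer dimers, the number of different target loci that can be amplified is increased. It was also discovered that using lower primer concentrations and longer-than-normal annealing times increases the likelihood that the primers hybridize to the target loci instead of hybridizing to each other and forming primer dimers.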
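During PCR amplification and sequencing of 19,488 target loci in a genomic sample, 99.4 to 99.7% of the sequencing reads mapped to the genome, and 99.99% of those mapped to the targeted loci. For plasma samples with ten million sequencing reads, typically at least 19,350 of the 19,488 targeted loci (99.3%) were amplified and sequenced. The ability to amplify such a large number of target loci at once greatly reduces the amount of DNA and the amount of time required to analyze thousands of target loci. For example, the DNA from a single cell is sufficient for the simultaneous analysis of thousands of target loci, which is important for applications in which the amount of DNA is low, such as genetic testing of a single cell from an embryo prior to in vitro fertilization or genetic testing of a forensic sample with very little DNA. Additionally, the ability to analyze the target loci in one reaction volume (such as one chamber or well), rather than splitting the sample into multiple different reactions, reduces the variability that can occur between reactions. Methods have also been developed that use reference standards to correct for amplification bias that may occur between different target loci. For example, differences in amplification efficiency between target loci due to factors such as GC content can cause different amounts of PCR product to be produced for target loci that are actually present in the same amount. The use of a reference standard that is similar to a target locus allows such amplification bias to be detected, so that it can be corrected for during quantitation of the target loci.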
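During sequencing of PCR products, artifacts such as primer dimers are also detected and thus inhibit the detection of target amplicons. Because of this limitation, microarrays with hybridization probes are often used for detection, since microarrays are much less sensitive to interference from primer dimers. The high level of multiplexing with minimal non-target amplicons now achieved by the present invention allows PCR followed by sequencing to be used as an alternative to microarrays.
The multiplex PCR methods of the invention can be used in a variety of applications, such as genotyping, detection of chromosomal abnormalities (such as fetal chromosome aneuploidy), gene mutation and polymorphism (such as single nucleotide polymorphism, SNP) analysis, gene deletion analysis, determination of paternity, analysis of genetic differences among populations, forensic analysis, measurement of predisposition to disease, quantitative analysis of mRNA, and detection and identification of infectious agents (such as bacteria, parasites, and viruses). The multiplex PCR methods can also be used for non-invasive prenatal testing, such as paternity testing or the detection of fetal chromosome abnormalities.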
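Exemplary Primer Design Methods
Highly multiplexed PCR can often result in a very high proportion of product DNA that arises from unproductive side reactions such as primer dimer formation. In an embodiment, the particular primers that are most likely to cause unproductive side reactions may be removed from the primer library to give a primer library that yields a greater proportion of amplified DNA that maps to the genome. Removing the problematic primers, that is, those primers that are particularly likely to form dimers, has unexpectedly enabled extremely high PCR multiplexing levels for subsequent analysis by sequencing. In systems such as sequencing, where performance degrades significantly in the presence of primer dimers and/or other harmful products, multiplexing more than 10 times, more than 50 times, and more than 100 times higher than other described multiplexing has been achieved. Note that this is in contrast to probe-based detection methods, for example microarrays, TAQMAN, PCR, and the like, where an excess of primer dimers does not appreciably affect the outcome. Note also that the general belief in the art is that multiplexed PCR for sequencing is limited to about 100 assays in the same well. Fluidigm and Rain Dance offer platforms to perform 48 or 1,000 PCR assays in parallel reactions for one sample.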
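There are a number of ways to choose primers for a library such that the amount of non-mapping primer dimers or other harmful primer products is minimized. Empirical data indicate that a small number of "bad" primers are responsible for a large amount of the non-mapping primer dimer side reactions. Removing these "bad" primers can increase the percentage of sequence reads that map to the targeted loci. One way to identify the "bad" primers is to examine the sequencing data of DNA that was amplified by targeted amplification; the primers in those primer dimers that are observed with the greatest frequency can be removed to give a primer library that is significantly less likely to produce side-product DNA that does not map to the genome. There are also publicly available programs that can calculate the binding energy of various primer combinations, and removing the primers with the highest binding energy will likewise give a primer library that is significantly less likely to produce side-product DNA that does not map to the genome.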
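In some embodiments for selecting primers, an initial library of candidate primers is created by designing one or more primers or primer pairs to candidate target loci. A set of candidate target loci (such as SNPs) can be selected based on publicly available information about desired parameters for the target loci, such as the frequency of the SNPs within a target population or the heterozygosity rate of the SNPs. In one embodiment, the PCR primers may be designed using the Primer3 program (the worldwide web at primer3.sourceforge.net; libprimer3 release 2.2.3, which is hereby incorporated by reference in its entirety). If desired, the primers can be designed to anneal within a particular annealing temperature range, to have a particular range of GC content, to have a particular size range, to produce target amplicons in a particular size range, and/or to have other parameter characteristics. Starting with multiple primers or primer pairs per candidate target locus increases the likelihood that a primer or primer pair will remain in the library for most or all of the target loci. In one embodiment, the selection criteria may require that at least one primer pair per target locus remain in the library. That way, most or all of the target loci will be amplified when the final primer library is used. This is desirable for applications such as screening for deletions or duplications at a large number of locations in the genome, or screening for a large number of sequences (such as polymorphisms or other mutations) associated with a disease or with an increased risk for a disease. If a primer pair from the library would produce a target amplicon that overlaps with a target amplicon produced by another primer pair, one of the primer pairs may be removed from the library to prevent interference.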
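In some embodiments, an "undesirability score" (a higher score representing lower desirability) is calculated (such as by calculation on a computer) for most or all of the possible combinations of two primers from a library of candidate primers. In various embodiments, an undesirability score is calculated for at least 80, 90, 95, 98, 99, or 99.5% of the possible combinations of candidate primers in the library. Each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers. If desired, the undesirability score may also be based on one or more other parameters selected from the group consisting of the heterozygosity rate of the target locus, the disease prevalence associated with a sequence (e.g., a polymorphism) at the target locus, the disease penetrance associated with a sequence (e.g., a polymorphism) at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon. If multiple factors are considered, the undesirability score may be calculated based on a weighted average of the various parameters. The parameters may be assigned different weights based on their importance for the particular application in which the primers will be used. In some embodiments, the primer with the highest undesirability score is removed from the library. If the removed primer is a member of a primer pair that hybridizes to one target locus, then the other member of the primer pair may also be removed from the library. The process of removing primers may be repeated as desired. In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below a minimum threshold. In some embodiments, the selection method is performed until the number of candidate primers remaining in the library is reduced to a desired number.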
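A weighted-average undesirability score of the kind described above might be computed as in the following Python sketch. The parameter names, the weights, and the convention that every parameter has already been rescaled to a 0-to-1 penalty (higher meaning less desirable) are assumptions made for illustration; the disclosure does not fix a particular scaling.

```python
def undesirability_score(penalties, weights):
    """Weighted average of per-parameter penalties for one primer combination.

    penalties: dict mapping parameter name to a penalty in [0, 1].
    weights:   dict mapping the same parameter names to non-negative weights.
    """
    total_weight = sum(weights[name] for name in penalties)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(penalties[name] * weights[name] for name in penalties)
    return weighted_sum / total_weight


# Hypothetical penalties for one primer combination.
penalties = {"dimer_likelihood": 0.8, "amplicon_gc_content": 0.2}
# Dimer formation weighted three times as heavily as GC content.
weights = {"dimer_likelihood": 3.0, "amplicon_gc_content": 1.0}
print(round(undesirability_score(penalties, weights), 2))
```

Changing the weights is how an application-specific priority, for example favoring highly heterozygous loci, would enter such a score.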
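In various embodiments, after the undesirability scores are calculated, the candidate primer that is part of the greatest number of combinations of two candidate primers with an undesirability score above a first minimum threshold is removed from the library. This step ignores interactions at or below the first minimum threshold, since these interactions are less significant. If the removed primer is a member of a primer pair that hybridizes to one target locus, then the other member of the primer pair may also be removed from the library. The process of removing primers may be repeated as desired. In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below the first minimum threshold. If the number of candidate primers remaining in the library is higher than desired, the number of primers may be reduced by decreasing the first minimum threshold to a lower second minimum threshold and repeating the process of removing primers. If the number of candidate primers remaining in the library is lower than desired, the method can be continued by increasing the first minimum threshold to a higher second minimum threshold and repeating the process of removing primers, thereby allowing more of the candidate primers to remain in the library. In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number.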
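The threshold-based removal loop described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by the inventors: the primer identifiers, the score values, the tie-breaking rule (alphabetical order), and the names `select_primers` and `pair_partner` are all assumptions.

```python
from collections import Counter


def select_primers(primers, pair_scores, threshold, pair_partner=None):
    """Repeatedly remove the primer involved in the most combinations whose
    undesirability score exceeds `threshold`, together with its pair partner,
    until no remaining combination exceeds the threshold.

    pair_scores:  dict mapping frozenset({primer_a, primer_b}) to a score.
    pair_partner: optional dict mapping a primer to the other member of its pair.
    """
    remaining = set(primers)
    pair_partner = pair_partner or {}
    while True:
        counts = Counter()
        for pair, score in pair_scores.items():
            # Only combinations of primers still in the library are counted.
            if score > threshold and pair <= remaining:
                for primer in pair:
                    counts[primer] += 1
        if not counts:
            return remaining
        # Ties are broken alphabetically so the result is deterministic.
        worst = max(sorted(counts), key=counts.__getitem__)
        remaining.discard(worst)
        partner = pair_partner.get(worst)
        if partner is not None:
            remaining.discard(partner)


primers = ["F1", "R1", "F2", "R2", "F3", "R3"]
partners = {"F1": "R1", "R1": "F1", "F2": "R2", "R2": "F2", "F3": "R3", "R3": "F3"}
scores = {
    frozenset({"F1", "F2"}): 9.0,
    frozenset({"F1", "R3"}): 8.0,
    frozenset({"F1", "R2"}): 7.0,
    frozenset({"F2", "R3"}): 2.0,
    frozenset({"R1", "R2"}): 1.0,
}
print(sorted(select_primers(primers, scores, threshold=5.0, pair_partner=partners)))
```

If the resulting library is still larger than desired, the function can be called again with a lower threshold, which corresponds to moving from the first minimum threshold to a lower second minimum threshold.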
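If desired, primer pairs that produce a target amplicon that overlaps with a target amplicon produced by another primer pair can be divided into separate amplification reactions. Multiple PCR amplification reactions may be desirable for applications in which all of the candidate target loci are to be analyzed (instead of omitting candidate target loci from the analysis because of overlapping target amplicons).
These selection methods minimize the number of candidate primers that have to be removed from the library to achieve the desired reduction in primer dimers. By removing fewer candidate primers from the library, more (or all) of the target loci can be amplified using the resulting primer library.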
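Multiplexing large numbers of primers imposes considerable constraints on the assays that can be included. Assays that interact unintentionally produce spurious amplification products. The size constraints of mini-PCR may impose further constraints. In an embodiment, it is possible to begin with a very large number of potential SNP targets (from about 500 to more than 1 million) and attempt to design primers to amplify each SNP. Where primers can be designed, it is possible to identify primer pairs that are likely to form spurious products by evaluating the likelihood of spurious primer duplex formation between all possible pairs of primers using published thermodynamic parameters for DNA duplex formation. Primer interactions may be ranked by a scoring function related to the interaction, and the primers with the worst interaction scores are eliminated until the desired number of primers is reached. In cases where SNPs that are likely to be heterozygous are most useful, it is also possible to rank the list of assays and select the most heterozygous compatible assays. Experiments have validated that primers with high interaction scores are the most likely to form primer dimers. At high multiplexing it is not possible to eliminate all spurious interactions, but it is essential to remove in silico the primers or primer pairs with the highest interaction scores, because they can dominate an entire reaction and greatly limit amplification of the intended targets. We have performed this procedure to create multiplex primer sets of up to 10,000 primers, and in some cases more. The improvement due to this procedure is substantial, enabling more than 80%, more than 90%, more than 95%, more than 98%, and even more than 99% on-target products, as determined by sequencing of all PCR products, compared with 10% for a reaction in which the worst primers were not removed. When combined with a partial semi-nested approach as previously described, more than 90%, and even more than 95%, of the amplicons may map to the targeted sequences.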
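Note that there are other methods for determining which PCR probes are likely to form dimers. In an embodiment, analysis of a pool of DNA that has been amplified using a non-optimized set of primers may be sufficient to identify the problematic primers. For example, the analysis may be done using sequencing, and the primers in those dimers that are present in the greatest number are determined to be the most likely to form dimers and may be removed.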
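The sequencing-based identification of problematic primers can be illustrated with a short Python sketch. It assumes that an upstream step, not shown here, has already classified each primer-dimer read by the two primers it contains; the primer identifiers and the function name are hypothetical.

```python
from collections import Counter


def rank_primers_by_dimer_reads(dimer_reads):
    """Count how often each primer appears in observed primer-dimer reads.

    dimer_reads: iterable of (primer_a, primer_b) tuples, one per dimer read.
    Returns (primer, count) tuples sorted from most to least frequent.
    """
    counts = Counter()
    for primer_a, primer_b in dimer_reads:
        counts[primer_a] += 1
        if primer_b != primer_a:
            # A homodimer read is counted once for its primer.
            counts[primer_b] += 1
    return counts.most_common()


# Hypothetical dimer reads from a run with a non-optimized primer set.
reads = [("P1", "P2"), ("P1", "P3"), ("P1", "P2"), ("P4", "P4")]
ranking = rank_primers_by_dimer_reads(reads)
print(ranking[0])
```

Primers at the top of the ranking are the first candidates for removal before the multiplex reaction is repeated.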
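This method has a number of potential applications, for example SNP genotyping, heterozygosity rate determination, copy number measurement, and other targeted sequencing applications. In an embodiment, the primer design method may be used in combination with the mini-PCR method described elsewhere in this document. In some embodiments, the primer design method may be used as part of a massively multiplexed PCR method.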
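The use of tags on the primers may reduce amplification and sequencing of primer dimer products. In some embodiments, the primer contains an internal region that forms a loop structure with a tag. In particular embodiments, the primers include a 5' region that is specific for a target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region that is specific for the target locus. In some embodiments, the loop region may lie between two binding regions, where the two binding regions are designed to bind to contiguous or neighboring regions of the template DNA. In various embodiments, the length of the 3' region is at least 7 nucleotides. In some embodiments, the length of the 3' region is from 7 to 20 nucleotides, such as from 7 to 15 nucleotides or from 7 to 10 nucleotides, inclusive. In various embodiments, the primers include a 5' region that is not specific for a target locus (such as a tag or a universal primer binding site), followed by a region that is specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region that is specific for the target locus. Tag-primers can be used to shorten the necessary target-specific sequence to fewer than 20, fewer than 15, fewer than 12, and even fewer than 10 base pairs. This can occur by chance with standard primer design when the target sequence is fragmented within the primer binding site, or it can be designed into the primer. The advantages of this method are that it increases the number of assays that can be designed for a given maximal amplicon length, and that it shortens the "non-informative" sequencing of the primer sequence. It may also be used in combination with internal tagging (see elsewhere in this document).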
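In an embodiment, the relative amount of nonproductive products in the multiplexed targeted PCR amplification can be reduced by raising the annealing temperature. When amplifying libraries that carry the same tag as the target-specific primers, the annealing temperature can be increased relative to that used for genomic DNA, because the tags contribute to the primer binding. In some embodiments, we use considerably lower primer concentrations than previously reported, together with longer annealing times than reported elsewhere. In some embodiments, the annealing times may be longer than 3 minutes, longer than 5 minutes, longer than 8 minutes, longer than 10 minutes, longer than 15 minutes, longer than 20 minutes, longer than 30 minutes, longer than 60 minutes, longer than 120 minutes, longer than 240 minutes, longer than 480 minutes, and even longer than 960 minutes. In an embodiment, longer annealing times are used than in previous reports, which allows lower primer concentrations. In various embodiments, longer-than-normal extension times are used, such as longer than 3, 5, 8, 10, or 15 minutes. In some embodiments, the primer concentrations are as low as 50 nM, 20 nM, 10 nM, 5 nM, or 1 nM, and lower than 1 μM. This surprisingly results in robust performance for highly multiplexed reactions, for example 1,000-plex reactions, 2,000-plex reactions, 5,000-plex reactions, 10,000-plex reactions, 20,000-plex reactions, 50,000-plex reactions, and even 100,000-plex reactions. In an embodiment, the amplification uses one, two, three, four, or five cycles run with long annealing times, followed by PCR cycles with more usual annealing times using tagged primers.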
표적 위치를 선택하기 위해, 후보물 프라이머 쌍 설계의 혼주물을 사용하여 출발하고 프라이머 쌍 사이의 잠재적으로 역 상호작용의 열역학적 모델을 생성한 후 당해 모델을 사용하여 혼주물 속에서 다른 설계와 비혼화성인 설계를 제거할 수 있다.
After the selection process, the primers remaining in the library may be used in any of the methods of the invention.
Exemplary Primer Libraries
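In one aspect, the invention features libraries of primers, such as primers selected from a library of candidate primers using any of the methods of the invention. In some embodiments, the library includes primers that simultaneously hybridize (or are capable of simultaneously hybridizing) to, or that simultaneously amplify (or are capable of simultaneously amplifying), at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci in one reaction volume. In various embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) between 1,000 and 2,000; 2,000 and 5,000; 5,000 and 7,500; 7,500 and 10,000; 10,000 and 20,000; 20,000 and 25,000; 25,000 and 30,000; 30,000 and 40,000; 40,000 and 50,000; 50,000 and 75,000; or 75,000 and 100,000 different target loci in one reaction volume, inclusive. In various embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) between 1,000 and 100,000 different target loci in one reaction volume, such as between 1,000 and 50,000; 1,000 and 30,000; 1,000 and 20,000; 1,000 and 10,000; 2,000 and 30,000; 2,000 and 20,000; 2,000 and 10,000; 5,000 and 30,000; 5,000 and 20,000; or 5,000 and 10,000 different target loci, inclusive. In some embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that less than 60, 40, 30, 20, 10, 5, 4, 3, 2, 1, 0.5, 0.25, 0.1, or 0.05% of the amplified products are primer dimers. In various embodiments, the amount of amplified products that are primer dimers is between 0.5 and 60%, such as between 0.1 and 40%, 0.1 and 20%, 0.25 and 20%, 0.25 and 10%, 0.5 and 20%, 0.5 and 10%, 1 and 20%, or 1 and 10%, inclusive. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the amplified products are target amplicons. In various embodiments, the amount of amplified products that are target amplicons is between 50 and 99.5%, such as between 60 and 99%, 70 and 98%, 80 and 98%, 90 and 99.5%, or 95 and 99.5%, inclusive. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the targeted loci are amplified. In various embodiments, the amount of target loci that are amplified is between 50 and 99.5%, such as between 60 and 99%, 70 and 98%, 80 and 99%, 90 and 99.5%, 95 and 99.9%, or 98 and 99.99%, inclusive.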
In some embodiments, the library of primers includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 primer pairs, where each pair of primers includes a forward test primer and a reverse test primer, and where each pair of test primers hybridizes to a target locus. In some embodiments, the library of primers includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 individual primers that each hybridize to a different target locus, where the individual primers are not part of primer pairs.
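In various embodiments, the concentration of each primer is less than 100, 75, 50, 25, 20, 10, 5, 2, or 1 nM, or less than 500, 100, 10, or 1 μM. In various embodiments, the concentration of each primer is between 1 μM and 100 nM, such as between 1 μM and 1 nM, 1 and 75 nM, 2 and 50 nM, or 5 and 50 nM, inclusive. In various embodiments, the GC content of the primers is between 30 and 80%, such as between 40 and 70%, or 50 and 60%, inclusive. In some embodiments, the range of GC content of the primers is less than 30, 20, 10, or 5%. In some embodiments, the range of GC content of the primers is between 5 and 30%, such as between 5 and 20% or 5 and 10%, inclusive. In some embodiments, the melting temperature (T_m) of the test primers is between 40 and 80 °C, such as between 50 and 70 °C, 55 and 65 °C, or 57 and 60.5 °C, inclusive. In some embodiments, the T_m is calculated using the Primer3 program (libprimer3 release 2.2.3) with the built-in SantaLucia parameters (the world wide web at primer3.sourceforge.net). In some embodiments, the range of melting temperatures of the primers is less than 15, 10, 5, 3, or 1 °C. In some embodiments, the range of melting temperatures of the primers is between 1 and 15 °C, such as between 1 and 10 °C, 1 and 5 °C, or 1 and 3 °C, inclusive. In some embodiments, the length of the primers is between 15 and 100 nucleotides, such as between 15 and 75 nucleotides, 15 and 40 nucleotides, 17 and 35 nucleotides, 18 and 30 nucleotides, or 20 and 65 nucleotides, inclusive. In some embodiments, the range of the length of the primers is less than 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the range of the length of the primers is between 5 and 50 nucleotides, such as between 5 and 40 nucleotides, 5 and 20 nucleotides, or 5 and 10 nucleotides, inclusive. In some embodiments, the length of the target amplicons is between 50 and 100 nucleotides, such as between 60 and 80 nucleotides, or 60 and 75 nucleotides, inclusive. In some embodiments, the range of the length of the target amplicons is less than 50, 25, 15, 10, or 5 nucleotides. In some embodiments, the range of the length of the target amplicons is between 5 and 50 nucleotides, such as between 5 and 25 nucleotides, 5 and 15 nucleotides, or 5 and 10 nucleotides, inclusive.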
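As a minimal sketch of how such library-wide parameters might be checked, the following computes the GC content and length of each primer and the spread ("range") of both across a library. The function names, default bounds, and example sequences are assumptions chosen for illustration; melting temperature is deliberately left out because the text computes it with Primer3, which this sketch does not call.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def library_summary(primers):
    """Minimum, maximum and range of GC content and length over a library."""
    gcs = [gc_percent(p) for p in primers]
    lengths = [len(p) for p in primers]
    return {
        "gc_min": min(gcs),
        "gc_max": max(gcs),
        "gc_range": max(gcs) - min(gcs),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_range": max(lengths) - min(lengths),
    }


def within_spec(summary, gc_bounds=(30.0, 80.0), max_gc_range=30.0,
                len_bounds=(15, 100), max_len_range=50):
    """True if the library stays inside the (illustrative) bounds."""
    return (gc_bounds[0] <= summary["gc_min"]
            and summary["gc_max"] <= gc_bounds[1]
            and summary["gc_range"] < max_gc_range
            and len_bounds[0] <= summary["len_min"]
            and summary["len_max"] <= len_bounds[1]
            and summary["len_range"] < max_len_range)


# Two hypothetical primers: 18 nt at 50% GC and 20 nt at 30% GC.
library = ["ACGTACGTACGTACGTAC", "GGGCCC" + "AT" * 7]
summary = library_summary(library)
```

The defaults mirror the broadest ranges quoted above (30 to 80% GC, GC range under 30%, 15 to 100 nucleotides, length range under 50 nucleotides); tighter embodiments would simply pass narrower bounds.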
These primer libraries can be used in any of the methods of the invention.
Exemplary Primer Kits
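In one aspect, the invention features a kit, such as a kit for amplifying target loci in a nucleic acid sample, that includes any of the primer libraries of the invention. In some embodiments, a kit may be formulated that comprises a plurality of primers designed to achieve the methods described in this disclosure. The primers may be outer forward and reverse primers or inner forward and reverse primers as disclosed herein; they may be primers that have been designed to have low binding affinity to other primers in the kit, as disclosed in the section on primer design; they may be hybrid capture probes or pre-circularized probes as described in the relevant sections; or they may be some combination thereof. In an embodiment, a kit may be formulated for determining a ploidy status of a target chromosome in a gestating fetus, designed to be used with the methods disclosed herein, the kit comprising a plurality of inner forward primers and inner reverse primers, and optionally outer forward primers and outer reverse primers, where each of the primers is designed to hybridize to the region of DNA upstream and/or downstream from one of the target sites (e.g., polymorphic sites) on the target chromosome, and optionally on additional chromosomes. In an embodiment, the primer kit may be used in combination with the diagnostic box described elsewhere in this document. In some embodiments, the kit includes instructions for using the library to amplify the target loci.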
Exemplary Multiplex PCR Methods
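In one aspect, the invention features methods of amplifying target loci in a nucleic acid sample that involve (i) contacting the nucleic acid sample with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture; and (ii) subjecting the reaction mixture to primer extension reaction conditions (such as PCR conditions) to produce amplified products that include target amplicons. In some embodiments, the method also includes determining the presence or absence of at least one target amplicon (such as at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the target amplicons). In some embodiments, the method also includes determining the sequence of at least one target amplicon (such as at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the target amplicons). In some embodiments, at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the targeted loci are amplified. In various embodiments, less than 60, 50, 40, 30, 20, 10, 5, 4, 3, 2, 1, 0.5, 0.25, 0.1, or 0.05% of the amplified products are primer dimers.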
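In an embodiment, a method disclosed herein uses highly efficient, highly multiplexed targeted PCR to amplify DNA, followed by high throughput sequencing to determine the allele frequencies at each target locus. The ability to multiplex more than about 50 or 100 PCR primers in one reaction volume in such a way that most of the resulting sequence reads map to targeted loci is novel and non-obvious. One technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner involves designing primers that are unlikely to hybridize with one another. The PCR probes, typically referred to as primers, are selected by creating a thermodynamic model of potentially adverse interactions between at least 500; at least 1,000; at least 2,000; at least 5,000; at least 7,500; at least 10,000; at least 20,000; at least 25,000; at least 30,000; at least 40,000; at least 50,000; at least 75,000; or at least 100,000 potential primer pairs, or of unintended interactions between primers and sample DNA, and then using the model to eliminate designs that are incompatible with other designs in the pool. Another technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner is a partial or full nesting approach to the targeted PCR. Using one or a combination of these approaches allows multiplexing of at least 300, at least 800, at least 1,200, at least 4,000 or at least 10,000 primers in a single pool, with the resulting amplified DNA comprising a majority of DNA molecules that, when sequenced, will map to targeted loci. Using one or a combination of these approaches allows multiplexing of a large number of primers in a single pool with the resulting amplified DNA comprising greater than 50%, greater than 60%, greater than 67%, greater than 80%, greater than 90%, greater than 95%, greater than 96%, greater than 97%, greater than 98%, greater than 99%, or greater than 99.5% DNA molecules that map to targeted loci.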
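In some embodiments, the detection of the target genetic material may be done in a multiplexed fashion. The number of genetic target sequences that may be run in parallel can range from 1 to 10, 10 to 100, 100 to 1,000, 1,000 to 10,000, 10,000 to 100,000, 100,000 to 1,000,000, or 1,000,000 to 10,000,000. Prior attempts to multiplex more than 100 primers per pool have resulted in significant problems with unwanted side reactions such as primer-dimer formation.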
Targeted PCR
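In some embodiments, PCR can be used to target specific locations of the genome. In plasma samples, the original DNA is highly fragmented (typically less than 500 bp, with an average length of less than 200 bp). In PCR, both the forward and reverse primers must anneal to the same fragment to enable amplification. Therefore, if the fragments are short, the PCR assays must also amplify relatively short regions. As with MIPS, if the polymorphic positions are too close to the polymerase binding site, this could result in biases in the amplification of different alleles. Currently, PCR primers that target polymorphic regions, such as those containing SNPs, are typically designed such that the 3' end of the primer hybridizes to the base immediately adjacent to the polymorphic base or bases. In an embodiment of the present disclosure, the 3' ends of both the forward and reverse PCR primers are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the targeted allele. The number of bases between the polymorphic site (SNP or otherwise) and the base to which the 3' end of the primer is designed to hybridize may be one base, two bases, three bases, four bases, five bases, six bases, seven to ten bases, eleven to fifteen bases, or sixteen to twenty bases. The forward and reverse primers may be designed to hybridize a different number of bases away from the polymorphic site.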
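PCR assays can be generated in large numbers; however, interactions between different PCR assays make it difficult to multiplex them beyond about one hundred assays. Various complex molecular approaches can be used to increase the level of multiplexing, but it may still be limited to fewer than 100, perhaps 200, or possibly 500 assays per reaction. Samples with large quantities of DNA can be split among multiple sub-reactions and then recombined before sequencing. For samples where either the overall sample or some subpopulation of DNA molecules is limited, splitting the sample can introduce substantial noise. In an embodiment, a small or limited quantity of DNA may refer to an amount below 10 pg, between 10 and 100 pg, between 100 pg and 1 ng, between 1 and 10 ng, or between 10 and 100 ng. Note that while this method is particularly useful on small amounts of DNA, where other methods that involve splitting into multiple pools can cause significant problems related to introduced stochastic noise, this method still provides the benefit of minimizing bias when it is run on samples of any quantity of DNA. In these situations, a universal pre-amplification step may be used to increase the overall sample quantity. Ideally, this pre-amplification step should not appreciably alter the allelic distributions.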
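In an embodiment, a method of the present disclosure can generate PCR products that are specific to a large number of targeted loci, specifically 1,000 to 5,000 loci, 5,000 to 10,000 loci, or more than 10,000 loci, for genotyping by sequencing or some other genotyping method, from limited samples such as single cells or DNA from body fluids. Currently, performing multiplex PCR reactions of more than 5 to 10 targets presents a major challenge and is often hindered by primer side products, such as primer dimers, and other artifacts. When target sequences are detected using microarrays with hybridization probes, primer dimers and other artifacts may be ignored, as they are not detected. However, when sequencing is used as the method of detection, the vast majority of the sequencing reads may sequence such artifacts rather than the desired target sequences in the sample. Methods described in the prior art that multiplex more than 50 or 100 reactions in one reaction volume followed by sequencing will typically result in more than 20%, often more than 50%, in many cases more than 80%, and in some cases more than 90% off-target sequence reads.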
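In general, to perform targeted sequencing of multiple (n) targets of a sample (greater than 50, greater than 100, greater than 500, or greater than 1,000), one can split the sample into a number of parallel reactions that each amplify one individual target. This has been performed in PCR multiwell plates, or on commercial platforms such as the FLUIDIGM ACCESS ARRAY (48 reactions per sample in microfluidic chips) or DROPLET PCR by RAIN DANCE TECHNOLOGY (hundreds to a few thousand targets). Unfortunately, these split-and-pool methods are problematic for samples with a limited amount of DNA, as there are often not enough copies of the genome to ensure that one copy of each region of the genome is present in each well. This is an especially severe problem when polymorphic loci are targeted and the relative proportions of the alleles at the polymorphic loci are needed, as the stochastic noise introduced by the splitting and pooling will cause very inaccurate measurements of the proportions of the alleles that were present in the original sample of DNA. Described here is a method to effectively and efficiently amplify many PCR reactions that is applicable to cases where only a limited amount of DNA is available. In an embodiment, the method may be applied to the analysis of single cells, body fluids, mixtures of DNA such as the free floating DNA found in maternal plasma, biopsies, and environmental and/or forensic samples.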
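The dropout problem with split-and-pool approaches can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the copies of a given locus are distributed independently across wells, so that the count per well is approximately Poisson with mean (genome copies / wells); the function names and the example numbers are not from the source.

```python
import math


def dropout_probability(genome_copies, wells):
    """P(a given well receives zero copies of a given locus).

    Assumes a Poisson model with mean genome_copies / wells (an assumption).
    """
    return math.exp(-genome_copies / wells)


def expected_empty_wells(genome_copies, wells):
    """Expected number of wells that contain no copy of their target locus."""
    return wells * dropout_probability(genome_copies, wells)


# Hypothetical example: 100 genome copies split across 48 wells.
p_empty = dropout_probability(100, 48)       # roughly 0.12
n_empty = expected_empty_wells(100, 48)      # roughly 6 of 48 wells
```

Under this model, about one well in eight would contain no template for its target when only 100 genome copies are available, which is one way to see why the text favors amplifying all targets in a single reaction volume for limited samples.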
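In an embodiment, the targeted sequencing may involve one, a plurality, or all of the following steps. a) Generate and amplify a library with adaptor sequences on both ends of the DNA fragments. b) Divide into multiple reactions after library amplification. c) Generate and optionally amplify a library with adaptor sequences on both ends of the DNA fragments. d) Perform 1,000- to 10,000-plex amplification of selected targets using one target specific "forward" primer per target and one tag specific primer. e) Perform a second amplification from this product using "reverse" target specific primers and one (or more) primer specific to a universal tag that was introduced as part of the target specific forward primers in the first round. f) Perform a 1,000-plex preamplification of selected targets for a limited number of cycles. g) Divide the product into multiple aliquots and amplify subpools of targets in individual reactions (for example, 50- to 500-plex, though this can be used all the way down to singleplex). h) Pool the products of the parallel subpool reactions. i) During these amplifications, primers may carry sequencing compatible tags (partial or full length) such that the products can be sequenced.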
Highly Multiplexed PCR
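Disclosed herein are methods that permit the targeted amplification of hundreds to thousands or more target sequences (e.g., SNP loci) from a nucleic acid sample such as genomic DNA obtained from plasma. The amplified sample may be relatively free of primer dimer products and may have low allelic bias at target loci. If the products are appended with sequencing compatible adaptors during or after amplification, analysis of these products can be performed by sequencing.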
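Performing a highly multiplexed PCR amplification using methods known in the art results in the generation of primer dimer products that are in excess of the desired amplification products and are not suitable for sequencing. These can be reduced empirically by eliminating primers that form these products, or by performing in silico selection of primers. However, the larger the number of assays, the more difficult this problem becomes.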
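One solution is to split the 5,000-plex reaction into several lower-plex amplifications, e.g., one hundred 50-plex or fifty 100-plex reactions, or to use microfluidics, or even to split the sample into individual PCR reactions. However, if the sample DNA is limited, as in non-invasive prenatal diagnostics from pregnancy plasma, dividing the sample between multiple reactions should be avoided, as this will result in bottlenecking.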
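Described here are methods that first globally amplify the plasma DNA of a sample and then divide the sample into multiple multiplexed target enrichment reactions with a more moderate number of target sequences per reaction. In an embodiment, a method of the present disclosure can be used for preferentially enriching a DNA mixture at a plurality of loci, the method comprising one or more of the following steps: generating and amplifying a library from a mixture of DNA, where the molecules in the library have adaptor sequences ligated on both ends of the DNA fragments; dividing the amplified library into multiple reactions; and performing a first round of multiplex amplification of selected targets using one target specific "forward" primer per target and one or a plurality of adaptor specific universal "reverse" primers. In an embodiment, a method of the present disclosure further includes performing a second amplification using "reverse" target specific primers and one or a plurality of primers specific to a universal tag that was introduced as part of the target specific forward primers in the first round. In an embodiment, the method may involve a fully nested, hemi-nested, semi-nested, one-sided fully nested, one-sided hemi-nested, or one-sided semi-nested PCR approach. In an embodiment, a method of the present disclosure is used for preferentially enriching a DNA mixture at a plurality of loci, the method comprising performing a multiplex preamplification of selected targets for a limited number of cycles, dividing the product into multiple aliquots and amplifying subpools of targets in individual reactions, and pooling the products of the parallel subpool reactions. This approach can be used to perform targeted amplification in a manner that results in low levels of allelic bias for 50 to 500 loci, 500 to 5,000 loci, 5,000 to 50,000 loci, or even 50,000 to 500,000 loci. In an embodiment, the primers carry partial or full length sequencing compatible tags.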
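The workflow may entail (1) extracting DNA such as plasma DNA, (2) preparing a fragment library with universal adaptors on both ends of the fragments, (3) amplifying the library using universal primers specific to the adaptors, (4) dividing the amplified sample "library" into multiple aliquots, (5) performing multiplex amplifications on the aliquots (e.g., about 100-plex, 1,000-plex, or 10,000-plex, with one target specific primer per target and a tag-specific primer), (6) pooling the aliquots of one sample, (7) barcoding the sample, (8) mixing the samples and adjusting the concentration, and (9) sequencing the sample. The workflow may comprise multiple sub-steps within one of the listed steps (e.g., step (2), preparing the library, could entail three enzymatic steps (blunt ending, dA tailing and adaptor ligation) and three purification steps). Steps of the workflow may be combined, divided up, or performed in a different order (e.g., bar coding and pooling of samples).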
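It is important to note that the amplification of a library can be performed in such a way that it is biased to amplify short fragments more efficiently. In this manner it is possible to preferentially amplify shorter sequences, e.g., mono-nucleosomal DNA fragments such as the cell free fetal DNA (of placental origin) found in the circulation of pregnant women. The PCR assays can carry tags, for example sequencing tags (usually a truncated form of 15 to 25 bases). After multiplexing, the PCR multiplexes of a sample are pooled and the tags are then completed (including bar coding) by a tag-specific PCR (this could also be done by ligation). Alternatively, the full sequencing tags can be added in the same reaction as the multiplexing. In the first cycles, targets may be amplified with the target specific primers; subsequently the tag-specific primers take over to complete the SQ-adaptor sequence. Alternatively, the PCR primers may carry no tags, and the sequencing tags may be appended to the amplification products by ligation.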
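In an embodiment, highly multiplexed PCR followed by evaluation of the amplified material by clonal sequencing may be used for various applications such as the detection of fetal aneuploidy. Whereas traditional multiplex PCR evaluates up to fifty loci simultaneously, the approach described herein may be used to enable simultaneous evaluation of more than 50 loci, more than 100 loci, more than 500 loci, more than 1,000 loci, more than 5,000 loci, more than 10,000 loci, more than 50,000 loci, and more than 100,000 loci. Experiments have shown that up to and more than 10,000 distinct loci can be evaluated simultaneously, in a single reaction, with sufficiently good efficiency and specificity to make non-invasive prenatal aneuploidy diagnoses and/or copy number calls with high accuracy. Assays may be combined in a single reaction with the entirety of a sample, such as a cfDNA sample isolated from maternal plasma, a fraction thereof, or a further processed derivative of the cfDNA sample. The sample (e.g., cfDNA or a derivative) may also be split into multiple parallel multiplex reactions. The optimum sample splitting and multiplexing is determined by trading off various performance specifications. Because of the limited amount of material, splitting the sample into multiple fractions can introduce sampling noise, add handling time, and increase the possibility of error. Conversely, higher multiplexing can result in greater amounts of spurious amplification and greater inequalities in amplification, both of which can reduce test performance.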
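Two crucial, related considerations in the application of the methods described herein are the limited amount of original sample (e.g., plasma) and the number of original molecules in that material from which allele frequencies or other measurements are obtained. If the number of original molecules falls below a certain level, random sampling noise becomes significant and can affect the accuracy of the test. Typically, data of sufficient quality for making non-invasive prenatal aneuploidy diagnoses can be obtained if measurements are made on a sample comprising the equivalent of 500 to 1,000 original molecules per target locus. There are a number of ways of increasing the number of distinct measurements, for example increasing the sample volume. Each manipulation applied to the sample also potentially results in a loss of material. It is essential to characterize the losses incurred by the various manipulations and to avoid them or, where necessary, to improve the yield of certain manipulations, so as to avoid losses that could degrade the performance of the test.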
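To give a feel for the sampling noise mentioned above, the following sketch computes the standard deviation of an observed allele fraction when a fixed number of original molecules is sampled at a locus. It assumes simple binomial sampling of independent molecules, which is an idealization introduced here and not a model stated in the source; the function name is hypothetical.

```python
import math


def allele_fraction_sd(true_fraction, molecules):
    """Binomial standard deviation of the observed allele fraction."""
    return math.sqrt(true_fraction * (1.0 - true_fraction) / molecules)


# Noise at a heterozygous locus (true fraction 0.5) for several input amounts.
sd_50 = allele_fraction_sd(0.5, 50)        # about 0.071
sd_500 = allele_fraction_sd(0.5, 500)      # about 0.022
sd_1000 = allele_fraction_sd(0.5, 1000)    # about 0.016
```

Under this idealization, going from 50 to 500 to 1,000 molecules per locus shrinks the per-locus noise from roughly 7 to roughly 2 percentage points, which is in line with the 500 to 1,000 molecule figure quoted in the text.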
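In an embodiment, it is possible to mitigate potential losses in subsequent steps by amplifying all or a fraction of the original sample (e.g., the cfDNA sample). Various methods are available to amplify all of the genetic material in a sample, increasing the amount available for downstream procedures. In an embodiment using ligation mediated PCR (LM-PCR), DNA fragments are amplified by PCR after ligation of one distinct adaptor, two distinct adaptors, or many distinct adaptors. In an embodiment using multiple displacement amplification (MDA), phi-29 polymerase is used to amplify all DNA isothermally. In DOP-PCR and its variations, random priming is used to amplify the original material DNA. Each method has certain characteristics, such as the uniformity of amplification across all represented regions of the genome, the efficiency of capture and amplification of the original DNA, and amplification performance as a function of fragment length.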
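In an embodiment, LM-PCR may be used with a single heteroduplexed adaptor having a 3-prime thymidine. The heteroduplexed adaptor enables the use of a single adaptor molecule that may be converted to two distinct sequences on the 5-prime and 3-prime ends of the original DNA fragment during the first round of PCR. In an embodiment, it is possible to fractionate the amplified library by size separation, or by AMPURE, TASS or other similar methods. Prior to ligation, the sample DNA may be blunt ended, and a single adenosine base is then added to the 3-prime end. Prior to ligation, the DNA may be cleaved using a restriction enzyme or some other cleavage method. During ligation, the 3-prime adenosine of the sample fragments and the complementary 3-prime thymidine overhang of the adaptor can enhance ligation efficiency. The extension step of the PCR amplification may be limited in time to reduce amplification from fragments longer than about 200 bp, about 300 bp, about 400 bp, about 500 bp or about 1,000 bp. Since the longer DNA found in maternal plasma is nearly exclusively maternal, this may result in enrichment of fetal DNA by 10 to 50% and an improvement in test performance. A number of reactions were run using the conditions specified by commercially available kits; these resulted in successful ligation of fewer than 10% of the sample DNA molecules. A series of optimizations of the reaction conditions improved ligation to approximately 70%.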
Mini-PCR
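The following mini-PCR method is desirable for samples containing short nucleic acids, digested nucleic acids, or fragmented nucleic acids, such as cfDNA. Traditional PCR assay design results in significant losses of distinct fetal molecules, but the losses can be greatly reduced by designing very short PCR assays, termed mini-PCR assays. Fetal cfDNA in maternal serum is highly fragmented, and the fragment sizes are distributed in an approximately Gaussian fashion with a mean of 160 bp, a standard deviation of 15 bp, a minimum size of about 100 bp, and a maximum size of about 220 bp. The distribution of fragment start and end positions with respect to the targeted polymorphisms, while not necessarily random, varies widely among individual targets and among all targets collectively, and the polymorphic site of one particular target locus may occupy any position from the start to the end among the various fragments originating from that locus. Note that the term mini-PCR may equally well refer to normal PCR with no additional restrictions or limitations.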
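During PCR, amplification will only occur from template DNA fragments that comprise both the forward and reverse primer sites. Because fetal cfDNA fragments are short, the likelihood of both primer sites being present, that is, the likelihood that a fetal fragment of length L comprises both the forward and reverse primer sites, is governed by the ratio of the length of the amplicon to the length of the fragment. Under ideal conditions, assays in which the amplicon is 45, 50, 55, 60, 65, or 70 bp will successfully amplify from 72%, 69%, 66%, 63%, 59%, or 56%, respectively, of the available template fragment molecules. The amplicon length is the distance between the 5-prime ends of the forward and reverse priming sites. An amplicon length that is shorter than typically used by those skilled in the art may result in more efficient measurements of the desired polymorphic loci by requiring only short sequence reads. In an embodiment, a substantial fraction of the amplicons should be less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp.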
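The percentages quoted above can be reproduced with a short calculation. The sketch assumes, as an interpretation not spelled out in the text, that a fragment of fixed length L = 160 bp (the stated mean) starts at a uniformly random position relative to the amplicon, so that the chance of containing the whole amplicon of length A is approximately 1 - A/L. The function name is hypothetical.

```python
import math


def amplifiable_fraction(amplicon_len, fragment_len=160):
    """Approximate fraction of fragments that contain the whole amplicon.

    Assumes uniformly random fragment placement (an assumption), giving
    1 - amplicon_len / fragment_len for amplicons shorter than the fragment.
    """
    return max(0.0, 1.0 - amplicon_len / fragment_len)


def as_percent(fraction):
    """Round half up to a whole percentage."""
    return math.floor(100.0 * fraction + 0.5)


amplicons = [45, 50, 55, 60, 65, 70]
percents = [as_percent(amplifiable_fraction(a)) for a in amplicons]
# percents == [72, 69, 66, 63, 59, 56], matching the figures in the text
```

By the same formula, a conventional 100 bp amplicon would be recoverable from only about 38% of 160 bp fragments, which illustrates why short assays matter for cfDNA. A fuller treatment would integrate over the Gaussian fragment length distribution rather than using the mean alone.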
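Note that in methods known in the art, short assays such as those described herein are usually avoided, because they are not required and because they impose considerable constraints on primer design by limiting primer length, annealing characteristics, and the distance between the forward and reverse primers.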
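Also note that there is a potential for biased amplification if the 3-prime end of a primer is within roughly 1 to 6 bases of the polymorphic site. This single base difference at the site of initial polymerase binding can result in preferential amplification of one allele, which can alter the observed allele frequencies and degrade performance. All of these constraints make it very challenging to identify primers that will successfully amplify a particular locus and, furthermore, to design large sets of primers that are compatible in the same multiplex reaction. In an embodiment, the 3' ends of the inner forward and reverse primers are designed to hybridize to a region of DNA upstream from the polymorphic site, separated from the polymorphic site by a small number of bases. Ideally, the number of bases may be between 6 and 10 bases, but it may equally well be between 4 and 15 bases, between 3 and 20 bases, between 2 and 30 bases, or between 1 and 60 bases, and achieve substantially the same end.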
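Multiplex PCR may involve a single round of PCR in which all targets are amplified, or it may involve one round of PCR followed by one or more rounds of nested PCR or some variant of nested PCR. Nested PCR consists of a subsequent round or rounds of PCR amplification using one or more new primers that bind internally, by at least one base pair, to the primers used in a previous round. Nested PCR reduces the number of spurious amplification targets by amplifying, in subsequent reactions, only those amplification products from the previous reaction that have the correct internal sequence. Reducing spurious amplification targets improves the number of useful measurements that can be obtained, especially in sequencing. Nested PCR typically entails designing primers completely internal to the previous primer binding sites, which necessarily increases the minimum DNA segment size required for amplification. For samples such as maternal plasma cfDNA, in which the DNA is highly fragmented, the larger assay size reduces the number of distinct cfDNA molecules from which a measurement can be obtained. In an embodiment, to offset this effect, one may use a partial nesting approach in which one or both of the second round primers overlap the first binding sites and extend internally by some number of bases, achieving additional specificity while minimally increasing the total assay size.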
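In an embodiment, a multiplex pool of PCR assays is designed to amplify potentially heterozygous SNPs or other polymorphic or non-polymorphic loci on one or more chromosomes, and these assays are used in a single reaction to amplify DNA. The number of PCR assays may be between 50 and 200 PCR assays, between 200 and 1,000 PCR assays, between 1,000 and 5,000 PCR assays, or between 5,000 and 20,000 PCR assays (50 to 200-plex, 200 to 1,000-plex, 1,000 to 5,000-plex, or 5,000 to 20,000-plex, respectively), or more than 20,000 PCR assays (more than 20,000-plex). In an embodiment, a multiplex pool of about 10,000 PCR assays (10,000-plex) is designed to amplify potentially heterozygous SNP loci on chromosomes X, Y, 13, 18, and 21 and 1 or 2, and these assays are used in a single reaction to amplify cfDNA obtained from a maternal plasma sample, chorionic villus samples, amniocentesis samples, single or a small number of cells, other bodily fluids or tissues, cancers, or other genetic matter. The SNP frequencies at each locus may be determined by clonal sequencing of the amplicons or by some other method. Statistical analysis of the allele frequency distributions or ratios of all assays may be used to determine whether the sample contains a trisomy of one or more of the chromosomes included in the test. In another embodiment, the original cfDNA sample is split into two samples and parallel 5,000-plex assays are performed. In another embodiment, the original cfDNA sample is split into n samples and parallel (~10,000/n)-plex assays are performed, where n is between 2 and 12, or between 12 and 24, or between 24 and 48, or between 48 and 96. Data are collected and analyzed in a manner similar to that already described. Note that this method is equally well applicable to detecting translocations, deletions, duplications, and other chromosomal abnormalities.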
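In an embodiment, tails with no homology to the target genome may also be added to the 3-prime or 5-prime end of any of the primers. These tails facilitate subsequent manipulations, procedures, or measurements. In an embodiment, the tail sequence can be the same for the forward and reverse target specific primers. In an embodiment, different tails may be used for the forward and reverse target specific primers. In an embodiment, a plurality of different tails may be used for different loci or sets of loci. Certain tails may be shared among all loci or among subsets of loci. For example, using forward and reverse tails corresponding to the forward and reverse sequences required by any of the current sequencing platforms can enable direct sequencing following amplification. In an embodiment, the tails can be used as common priming sites among all amplified targets, which can be used to add other useful sequences. In some embodiments, the inner primers may contain a region that is designed to hybridize either upstream or downstream of the targeted locus (e.g., a polymorphic locus). In some embodiments, the primers may contain a molecular barcode. In some embodiments, the primers may contain a universal priming sequence designed to allow PCR amplification.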
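In an embodiment, a 10,000-plex PCR assay pool is created such that the forward and reverse primers have tails corresponding to the forward and reverse sequences required by a high throughput sequencing instrument such as the HISEQ, GAIIX, or MYSEQ available from ILLUMINA. In addition, included 5-prime to the sequencing tails is an additional sequence that can be used as a priming site in a subsequent PCR to add nucleotide barcode sequences to the amplicons, enabling multiplex sequencing of multiple samples in a single lane of the high throughput sequencing instrument.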
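In an embodiment, a 10,000-plex PCR assay pool is created such that the reverse primers have tails corresponding to the reverse sequences required by a high throughput sequencing instrument. After amplification with the first 10,000-plex assay, a subsequent PCR amplification may be performed using another 10,000-plex pool having partly nested forward primers (e.g., 6-bases nested) for all targets and a reverse primer corresponding to the reverse sequencing tail included in the first round. This subsequent round of partly nested amplification, with just one target specific primer and a universal primer, limits the required size of the assay, reducing sampling noise, while greatly reducing the number of spurious amplicons. The sequencing tags can be added to appended ligation adaptors and/or as part of the PCR probes, such that the tag is part of the final amplicon.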
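Fetal fraction affects the performance of the test. There are a number of ways to enrich the fetal fraction of the DNA found in maternal plasma. Fetal fraction can be increased by the LM-PCR method already discussed, as well as by targeted removal of long maternal fragments. In an embodiment, prior to multiplex PCR amplification of the target loci, an additional multiplex PCR reaction may be carried out to selectively remove long and largely maternal fragments corresponding to the loci targeted in the subsequent multiplex PCR. Additional primers are designed to anneal at a site that is a greater distance from the polymorphism than is expected to be present among cell free fetal DNA fragments. These primers may be used in a one-cycle multiplex PCR reaction prior to multiplex PCR of the target polymorphic loci. These distal primers are tagged with a molecule or moiety that allows selective recognition of the tagged pieces of DNA. In an embodiment, these molecules of DNA may be covalently modified with a biotin molecule, which allows removal of the newly formed double stranded DNA comprising these primers after one cycle of PCR. Double stranded DNA formed during that first round is likely maternal in origin. Removal of the hybrid material may be accomplished by the use of magnetic streptavidin beads. There are other methods of tagging that may work equally well. In an embodiment, size selection methods may be used to enrich the sample for shorter strands of DNA, for example those less than about 800 bp, less than about 500 bp, or less than about 300 bp. Amplification of the short fragments can then proceed as usual.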
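The mini-PCR method described in this disclosure enables highly multiplexed amplification and analysis of hundreds to thousands or even millions of loci in a single reaction, from a single sample. At the same time, the detection of the amplified DNA can be multiplexed; tens to hundreds of samples can be multiplexed in one sequencing lane by using barcoding PCR. This multiplexed detection has been successfully tested up to 49-plex, and a higher degree of multiplexing is possible. In effect, this allows hundreds of samples to be genotyped at thousands of SNPs in a single sequencing run. For these samples, the method allows determination of genotype and heterozygosity rate and, simultaneously, determination of copy number, both of which may be used for the purpose of aneuploidy detection. This method is particularly useful in detecting aneuploidy of a gestating fetus from the free floating DNA found in maternal plasma. This method may be used as part of a method for sexing a fetus and/or predicting the paternity of the fetus. It may be used as part of a method for mutation dosage. This method may be used for any amount of DNA or RNA, and the targeted regions may be SNPs, other polymorphic regions, non-polymorphic regions, and combinations thereof.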
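Barcoded multiplexing of samples in one sequencing lane implies a demultiplexing step after sequencing. The following is a minimal sketch of that step under simplifying assumptions that are not taken from the source: each read is a plain string that begins with a fixed-length sample barcode, barcodes must match exactly, and the barcode-to-sample mapping and read sequences are invented for illustration.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading sample barcode and strip the barcode.

    Assumes all barcodes have the same length and match exactly; reads with
    an unknown barcode are collected under the key "unassigned".
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[barcode_len:])
    return dict(by_sample)


# Hypothetical 4-base barcodes for two samples sharing one lane.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTAAAA", "TGCACCCC", "GGGGTTTT", "ACGTGGGG"]
grouped = demultiplex(reads, barcodes)
```

Production pipelines typically tolerate sequencing errors in the barcode and read the index from a separate index read; exact prefix matching is used here only to keep the example short.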
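In some embodiments, ligation mediated universal-PCR amplification of fragmented DNA may be used. The ligation mediated universal-PCR amplification can be used to amplify plasma DNA, which can then be divided into multiple parallel reactions. It may also be used to preferentially amplify short fragments, thereby enriching the fetal fraction. In some embodiments, the addition of tags to the fragments by ligation can enable detection of shorter fragments, use of shorter target sequence specific portions of the primers, and/or annealing at higher temperatures, which reduces unspecific reactions.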
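The methods described herein may be used for a number of purposes where there is a target set of DNA that is mixed with an amount of contaminating DNA. In some embodiments, the target DNA and the contaminating DNA may be from individuals who are genetically related. For example, genetic abnormalities in a fetus (target) may be detected from maternal plasma, which contains fetal (target) DNA and also maternal (contaminating) DNA; the abnormalities include whole chromosome abnormalities (e.g., aneuploidy), partial chromosome abnormalities (e.g., deletions, duplications, inversions, translocations), polynucleotide polymorphisms (e.g., STRs), single nucleotide polymorphisms, and/or other genetic abnormalities or differences. In some embodiments, the target and contaminating DNA may be from the same individual, but the target and contaminating DNA differ by one or more mutations, for example in the case of cancer (see, e.g., H. Mamon, et al. Preferential Amplification of Apoptotic DNA from Plasma: Potential for Enhancing Detection of Minor DNA Alterations in Circulating DNA. Clinical Chemistry 54:9 (2008)). In some embodiments, the DNA may be found in cell culture (apoptotic) supernatant. In some embodiments, it is possible to induce apoptosis in biological samples (e.g., blood) for subsequent library preparation, amplification and/or sequencing. A number of enabling workflows and protocols to achieve this end are presented elsewhere in this disclosure.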
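In some embodiments, the target DNA may originate from single cells, from samples of DNA consisting of less than one copy of the target genome, from low amounts of DNA, from DNA of mixed origin (e.g., pregnancy plasma: placental and maternal DNA; cancer patient plasma and tumors: a mix of healthy and cancer DNA; transplantation, etc.), from other body fluids, from cell cultures, from culture supernatants, from forensic samples of DNA, from ancient samples of DNA (e.g., insects trapped in amber), from other samples of DNA, and from combinations thereof.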
In some embodiments, a short amplicon size may be used. Short amplicon sizes are especially suited for fragmented DNA (see, e.g., A. Sikora, et al. Detection of increased amounts of cell-free fetal DNA with short PCR amplicons. Clin Chem. 2010 Jan;56(1):136-8.).
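The use of short amplicon sizes may result in some significant benefits. Short amplicon sizes may result in optimized amplification efficiency. Short amplicon sizes typically produce shorter products, so there is less chance for nonspecific priming. Shorter products can be clustered more densely on the sequencing flow cell, as the clusters will be smaller. Note that the methods described herein may work equally well for longer PCR amplicons. Amplicon length may be increased if necessary, for example when sequencing longer sequence stretches. Experiments with 146-plex targeted amplification, using assays of 100 bp to 200 bp in length as the first step in a nested-PCR protocol, were run on single cells and on genomic DNA with positive results.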
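In some embodiments, the methods described herein may be used to amplify and/or detect SNPs, copy number, nucleotide methylation, mRNA levels, other types of RNA expression levels, and other genetic and/or epigenetic features. The mini-PCR methods described herein may be used along with next-generation sequencing; they may also be used with other downstream methods such as microarrays, counting by digital PCR, real-time PCR, mass-spectrometry analysis, etc.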
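In some embodiments, the mini-PCR amplification methods described herein may be used as part of a method for accurate quantification of minority populations. They may be used for absolute quantification using spike calibrators. They may be used for mutation / minor allele quantification through very deep sequencing, and may be run in a highly multiplexed fashion. They may be used for standard paternity testing and for identity testing of relatives or ancestors, in humans, animals, plants or other creatures. They may be used for forensic testing. They may be used for rapid genotyping and copy number analysis (CN) on any kind of material, e.g., amniotic fluid and CVS, sperm, or products of conception (POC). They may be used for single cell analysis, such as genotyping on samples biopsied from embryos. They may be used for rapid embryo analysis (within less than one, one, or two days of biopsy) by targeted sequencing using mini-PCR.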
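In some embodiments, the method may be used for tumor analysis: tumor biopsies are often a mixture of healthy and tumor cells. Targeted PCR allows deep sequencing of SNPs and loci with close to no background sequences. It may be used for copy number and loss of heterozygosity analysis on tumor DNA. Said tumor DNA may be present in many different body fluids or tissues of tumor patients. It may be used for detection of tumor recurrence and/or for tumor screening. It may be used for quality control testing of seeds. It may be used for breeding or fishing purposes. Note that any of these methods could equally well be used to target non-polymorphic loci for the purpose of ploidy calling.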
Some literature describing some of the fundamental methods that underlie the methods disclosed herein includes: (1) Wang HY, Luo M, Tereshchenko IV, Frikker DM, Cui X, Li JY, Hu G, Chu Y, Azaro MA, Lin Y, Shen L, Yang Q, Kambouris ME, Gao R, Shih W, Li H. Genome Res. 2005 Feb;15(2):276-83. Department of Molecular Genetics, Microbiology and Immunology/The Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey 08903, USA. (2) High-throughput genotyping of single nucleotide polymorphisms with high sensitivity. Li H, Wang HY, Cui X, Luo M, Hu G, Greenawalt DM, Tereshchenko IV, Li JY, Chu Y, Gao R. Methods Mol Biol. 2007;396 - PubMed PMID: 18025699. (3) A method comprising multiplexing of an average of 9 assays for sequencing is described in: Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. Varley KE, Mitra RD. Genome Res. 2008 Nov;18(11):1844-50. Epub 2008 Oct 10. Note that the methods disclosed herein allow multiplexing of orders of magnitude more than in the above references.
Targeted PCR Variants - Nesting
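There are many workflows that are possible when conducting PCR; some workflows typical of the methods disclosed herein are described. The steps outlined herein are not meant to exclude other possible steps, nor do they imply that any of the steps described herein are required for the method to work properly. A large number of parameter variations or other modifications are known in the literature, and may be made without affecting the essence of the invention. One particular generalized workflow is given below, followed by a number of possible variants. The variants typically refer to possible secondary PCR reactions, for example the different types of nesting that may be done (step 3). Variants may be done at different times, or in different orders, than explicitly described herein. Examples that use polymorphic loci for illustration can readily be adapted, where appropriate, to the amplification of non-polymorphic loci.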
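1. The DNA in the sample may have ligation adaptors, often referred to as library tags or ligation adaptor tags (LTs), appended, where the ligation adaptors contain a universal priming sequence, followed by a universal amplification. In an embodiment, this may be done using a standard protocol designed to create sequencing libraries after fragmentation. In an embodiment, the DNA sample can be blunt ended, and an A can then be added at the 3' end. A Y-adaptor with a T-overhang can be added and ligated. In some embodiments, sticky ends other than an A or T overhang can be used. In some embodiments, other adaptors can be added, for example looped ligation adaptors. In some embodiments, the adaptors may have a tag designed for PCR amplification.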
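2. Specific Target Amplification (STA): Pre-amplification of hundreds to thousands to tens of thousands and even hundreds of thousands of targets may be multiplexed in one reaction volume. STA is typically run for 10 to 30 cycles, though it may be run for 5 to 40 cycles, 2 to 50 cycles, and even 1 to 100 cycles. Primers may be tailed, for example for a simpler workflow or to avoid sequencing a large proportion of dimers. Note that, typically, dimers of two primers carrying the same tag will not be amplified or sequenced efficiently. In some embodiments, between 1 and 10 cycles of PCR may be carried out; in some embodiments, between 10 and 20 cycles of PCR may be carried out; in some embodiments, between 20 and 30 cycles of PCR may be carried out; in some embodiments, between 30 and 40 cycles of PCR may be carried out; in some embodiments, more than 40 cycles of PCR may be carried out. The amplification may be a linear amplification. The number of PCR cycles may be optimized to result in an optimal depth of read (DOR) profile. Different DOR profiles may be desirable for different purposes. In some embodiments, a more even distribution of reads between all assays is desirable; if the DOR is too small for some assays, the stochastic noise can be too high for the data to be useful, while if the depth of read is too high, the marginal usefulness of each additional read is relatively small.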
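Primer tails may improve the detection of fragmented DNA from universally tagged libraries. If the library tag and the primer tails contain a homologous sequence, hybridization can be improved (for example, the melting temperature (T_M) is lowered) and primers can be extended even if only a portion of the primer target sequence is in the sample DNA fragment. In some embodiments, 13 or more target specific base pairs may be used. In some embodiments, 10 to 12 target specific base pairs may be used. In some embodiments, 8 to 9 target specific base pairs may be used. In some embodiments, 6 to 7 target specific base pairs may be used. In some embodiments, STA may be performed on pre-amplified DNA, e.g., from MDA, RCA, other whole genome amplifications, or adaptor-mediated universal PCR. In some embodiments, STA may be performed on samples that are enriched for or depleted of certain sequences and populations, e.g., by size selection, target capture, or directed degradation.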
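3. In some embodiments, it is possible to perform secondary multiplex PCRs or primer extension reactions to increase specificity and reduce undesirable products. For example, full nesting, semi-nesting, hemi-nesting, and/or subdividing into parallel reactions of smaller assay pools are all techniques that may be used to increase specificity. Experiments have shown that splitting a sample into three 400-plex reactions resulted in product DNA with greater specificity than one 1,200-plex reaction with exactly the same primers. Similarly, experiments have shown that splitting a sample into four 2,400-plex reactions resulted in product DNA with greater specificity than one 9,600-plex reaction with exactly the same primers. In an embodiment, it is possible to use target-specific and tag-specific primers of the same and of opposing directionality.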
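4. In some embodiments, it is possible to amplify a DNA sample (diluted, purified or otherwise) produced by an STA reaction using tag-specific primers and "universal amplification", i.e., to amplify many or all of the pre-amplified and tagged targets. The primers may contain additional functional sequences, e.g., barcodes, or a full adaptor sequence necessary for sequencing on a high throughput sequencing platform.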
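The depth of read (DOR) consideration raised in step 2 of the workflow can be made concrete with a small summary of per-assay read counts. The sketch below is illustrative only: the choice of the coefficient of variation as a uniformity measure, the minimum-depth threshold, and the example counts are assumptions and not metrics taken from the source.

```python
from statistics import mean, pstdev


def dor_profile(read_counts, min_depth):
    """Summarize how evenly reads are spread over the assays.

    read_counts maps assay name to number of reads. Returns the mean depth,
    the coefficient of variation, and the fraction of assays below min_depth.
    """
    depths = list(read_counts.values())
    average = mean(depths)
    return {
        "mean": average,
        "cv": pstdev(depths) / average,
        "low_fraction": sum(d < min_depth for d in depths) / len(depths),
    }


# Hypothetical per-assay read counts for an even and an uneven run.
even = dor_profile({"a": 100, "b": 100, "c": 100, "d": 100}, min_depth=50)
uneven = dor_profile({"a": 10, "b": 190}, min_depth=50)
```

Both example runs have the same mean depth of 100 reads, but in the uneven run half of the assays fall below the threshold, which corresponds to the situation where some assays are too noisy to be useful while others collect reads of little marginal value.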
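These methods may be used for the analysis of any sample of DNA, and are especially useful when the sample of DNA is particularly small, or when it is a sample of DNA where the DNA originates from more than one individual, as in the case of maternal plasma. These methods may be used on DNA samples such as a single or small number of cells, genomic DNA, plasma DNA, amplified plasma libraries, amplified apoptotic supernatant libraries, or other samples of mixed DNA. In an embodiment, these methods may be used where cells of different genetic constitution may be present in a single individual, such as with cancer or transplants.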
Protocol variants (variants and/or additions to the workflow above)
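Direct multiplexed mini-PCR: Specific target amplification (STA) of a plurality of target sequences with tagged primers is shown in FIG. 1. (101) denotes double stranded DNA with a polymorphic locus of interest at X. (102) denotes the double stranded DNA with ligation adaptors added for universal amplification. (103) denotes the single stranded DNA that has been universally amplified, with PCR primers hybridized. (104) denotes the final PCR product. In some embodiments, STA may be done on more than 100, more than 200, more than 500, more than 1,000, more than 2,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000 or more than 200,000 targets. In a subsequent reaction, tag-specific primers amplify all target sequences and lengthen the tags to include all sequences necessary for sequencing, including sample indexes. In an embodiment, the primers may not be tagged, or only certain primers may be tagged. Sequencing adaptors may be added by conventional adaptor ligation. In an embodiment, the initial primers may carry the tags.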
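In an embodiment, primers are designed so that the length of the amplified DNA is unexpectedly short. The prior art demonstrates that those of ordinary skill in the art typically design amplicons of 100+ bp. In an embodiment, the amplicons may be designed to be less than 80 bp. In an embodiment, the amplicons may be designed to be less than 70 bp. In an embodiment, the amplicons may be designed to be less than 60 bp. In an embodiment, the amplicons may be designed to be less than 50 bp. In an embodiment, the amplicons may be designed to be less than 45 bp. In an embodiment, the amplicons may be designed to be less than 40 bp. In an embodiment, the amplicons may be designed to be less than 35 bp. In an embodiment, the amplicons may be designed to be between 40 and 65 bp.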
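An experiment was performed using this protocol with 1,200-plex amplification. Both genomic DNA and pregnancy plasma were used; about 70% of sequence reads mapped to targeted sequences. Details are given elsewhere in this document. Sequencing of a 1,042-plex without design and selection of assays resulted in >99% of sequences being primer dimer products.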
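Sequential PCR: After STA 1, multiple aliquots of the product may be amplified in parallel with pools of reduced complexity, using the same primers. The first amplification can provide enough material to split. This method is especially good for small samples, for example those that are about 6 to 100 pg, about 100 pg to 1 ng, about 1 ng to 10 ng, or about 10 ng to 100 ng. The protocol was performed with a 1,200-plex split into three 400-plexes. Mapping of sequencing reads increased from around 60 to 70% in the 1,200-plex alone to over 95%.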
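The split of one large assay pool into parallel sub-pools of reduced complexity can be sketched as a simple partition. The round-robin assignment and the assay names below are assumptions made for illustration; the source does not say how assays were allocated to the three 400-plex pools.

```python
def split_pool(assays, n_subpools):
    """Partition a list of assays into n_subpools round-robin sub-pools."""
    return [assays[i::n_subpools] for i in range(n_subpools)]


# Hypothetical 1,200-plex pool split into three 400-plex sub-pools.
assays = [f"assay_{i:04d}" for i in range(1200)]
subpools = split_pool(assays, 3)
sizes = [len(pool) for pool in subpools]   # [400, 400, 400]
```

In practice the allocation would more likely be guided by predicted primer interactions, placing strongly interacting assays in different sub-pools, rather than by position in a list.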
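Semi-nested mini-PCR: (see FIG. 2) After STA 1, a second STA is performed comprising a multiplex set of internal nested forward primers (103 B, 105 b) and one (or a few) tag-specific reverse primers (103 A). (101) denotes double stranded DNA with a polymorphic locus of interest at X. (102) denotes the double stranded DNA with ligation adaptors added for universal amplification. (103) denotes the single stranded DNA that has been universally amplified, with forward primer B and reverse primer A hybridized. (104) denotes the PCR product from (103). (105) denotes the product from (104) with nested forward primer b hybridized, and with reverse tag A already part of the molecule from the PCR that occurred between (103) and (104). (106) denotes the final PCR product. With this workflow, usually greater than 95% of sequences map to the intended targets. The nested primer may overlap with the outer forward primer sequence but introduces additional 3'-end bases. In some embodiments, it is possible to use between 1 and 20 extra 3' bases. Experiments have shown that using 9 or more extra 3' bases in a 1,200-plex design works well.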
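The relationship between an outer forward primer and a nested primer that overlaps it while adding extra 3' bases can be sketched on a template string. The coordinates, the function name, and the toy template below are hypothetical and serve only to illustrate the geometry; real design would also consider melting temperature and primer interactions.

```python
def nested_forward_primer(template, outer_start, outer_len, extra_3prime,
                          nested_len=None):
    """Derive a nested forward primer that extends extra_3prime bases
    further in the 3' direction than the outer primer.

    Returns (outer primer, nested primer, number of overlapping bases).
    Coordinates are 0-based positions on the template's forward strand.
    """
    if nested_len is None:
        nested_len = outer_len
    outer_end = outer_start + outer_len
    nested_end = outer_end + extra_3prime
    nested_start = nested_end - nested_len
    if nested_start < 0 or nested_end > len(template):
        raise ValueError("nested primer falls outside the template")
    outer = template[outer_start:outer_end]
    nested = template[nested_start:nested_end]
    overlap = max(0, outer_end - max(nested_start, outer_start))
    return outer, nested, overlap


# Toy template; the outer primer covers bases 0-7, the nested one adds 4 bases.
template = "AAAACCCCGGGGTTTTACGTACGT"
outer, nested, overlap = nested_forward_primer(template, 0, 8, 4)
```

Here the nested primer shares four bases with the outer primer and adds four new 3' bases, so the second round demands additional correct internal sequence while the total assay footprint grows by only four bases.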
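Fully nested mini-PCR: (see FIG. 3) After STA step 1, it is possible to perform a second multiplex PCR (or parallel multiplex PCRs of reduced complexity) with two nested primers carrying tags (A, a, B, b). (101) denotes double stranded DNA with a polymorphic locus of interest at X. (102) denotes the double stranded DNA with ligation adaptors added for universal amplification. (103) denotes the single stranded DNA that has been universally amplified, with forward primer B and reverse primer A hybridized. (104) denotes the PCR product from (103). (105) denotes the product from (104) with nested forward primer b and the nested reverse primer hybridized. (106) denotes the final PCR product. In some embodiments, it is possible to use two full sets of primers. Experiments using a fully nested mini-PCR protocol were used to perform 146-plex amplification on single cells and on three cells, without step (102) of appending universal ligation adaptors and amplifying.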
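Hemi-nested mini-PCR: (see FIG. 4) It is possible to use target DNA that has adaptors at the fragment ends. An STA is performed comprising a multiplex set of forward primers (B) and one (or a few) tag-specific reverse primers (A). A second STA can be performed using a universal tag-specific forward primer and a target specific reverse primer. (101) denotes double stranded DNA with a polymorphic locus of interest at X. (102) denotes the double stranded DNA with ligation adaptors added for universal amplification. (103) denotes the single stranded DNA that has been universally amplified, with reverse primer A hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer A and ligation adaptor tag primer LT. (105) denotes the product from (104) with forward primer B hybridized. (106) denotes the final PCR product. In this workflow, target specific forward and reverse primers are used in separate reactions, thereby reducing the complexity of the reaction and preventing dimer formation between forward and reverse primers. Note that in this example, primers A and B may be considered to be first primers, and primers 'a' and 'b' may be considered to be inner primers. This method is a big improvement on direct PCR, as it is as good as direct PCR but avoids primer dimers. After the first round of the hemi-nested protocol, one typically sees ~99% non-targeted DNA; however, after the second round there is typically a big improvement.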
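Triply hemi-nested mini-PCR (see Figure 5): It is possible to use target DNA that has adaptors at the fragment ends. An STA is performed comprising a multiplex set of forward primers (B) and one (or a few) tag-specific reverse primers (A) and (a). A second STA can be performed using a universal tag-specific forward primer and a target-specific reverse primer. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adaptors added for universal amplification. (103) denotes the universally amplified single-stranded DNA with reverse primer A hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer A and ligation adaptor tag primer LT. (105) denotes the product from (104) with forward primer B hybridized. (106) denotes the PCR product from (105) that was amplified using reverse primer A and forward primer B. (107) denotes the product from (106) with reverse primer 'a' hybridized. (108) denotes the final PCR product. In this example, primers 'a' and B may be considered inner primers, and A may be considered a first primer. Optionally, both A and B may be considered first primers and 'a' may be considered an inner primer. The designation of reverse and forward primers may be switched. In this workflow, target-specific forward and reverse primers are used in separate reactions, thereby reducing the complexity of the reaction and preventing dimer formation between forward and reverse primers. This method is a large improvement on direct PCR: it performs as well as direct PCR but avoids primer dimers. After the first round of the hemi-nested protocol one typically sees ~99% non-targeted DNA; after the second round there is typically a large improvement.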
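One-sided nested mini-PCR (see Figure 6): It is possible to use target DNA that has adaptors at the fragment ends. An STA may also be performed with a multiplex set of nested forward primers, using the ligation adaptor tag as the reverse primer. A second STA may then be performed using a set of nested forward primers and a universal reverse primer. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adaptors added for universal amplification. (103) denotes the universally amplified single-stranded DNA with reverse primer A hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer A and ligation adaptor tag primer LT. (105) denotes the product from (104) with the nested forward primer hybridized. (106) denotes the final PCR product. By using overlapping primers in the first and second STAs, this method can detect shorter target sequences than standard PCR. The method is typically performed on a sample of DNA that has already undergone STA step 1 above (appending of universal tags and amplification); the two nested primers are on one side only, and the other side uses the library tag. The method was performed on libraries of apoptotic supernatants and pregnancy plasma. With this workflow approximately 60% of sequences mapped to the intended targets. Note that reads containing the reverse adaptor sequence were not mapped, so this number is expected to be higher if reads containing the reverse adaptor sequence are mapped.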
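One-sided mini-PCR: It is possible to use target DNA that has adaptors at the fragment ends (see Figure 7). An STA may be performed using forward primers and one (or a few) tag-specific reverse primers. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adaptors added for universal amplification. (103) denotes the single-stranded DNA with forward primer A hybridized. (104) denotes the PCR product from (103) that was amplified using forward primer A and ligation adaptor tag reverse primer LT. This method can detect shorter target sequences than standard PCR. However, it may be relatively unspecific because only one target-specific primer is used. This protocol is effectively half of the one-sided nested mini-PCR.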
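Reverse semi-nested mini-PCR: It is possible to use target DNA that has adaptors at the fragment ends (see Figure 8). An STA may be performed using a multiplex set of forward primers and one (or a few) tag-specific reverse primers. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adaptors added for universal amplification. (103) denotes the single-stranded DNA with reverse primer B hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer B and ligation adaptor tag forward primer LT. (105) denotes the PCR product (104) with forward primer A and inner reverse primer 'b' hybridized. (106) denotes the PCR product amplified from (105) using forward primer A and reverse primer 'b', which is the final PCR product. This method can detect shorter target sequences than standard PCR.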
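There may be further variants that are simply iterations or combinations of the above methods, such as doubly nested PCR, in which three sets of primers are used. Another variant is one-and-a-half-sided nested mini-PCR, in which the STA may also be performed with a multiplex set of nested forward primers and one (or a few) tag-specific reverse primers.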
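Note that in all of these variants, the identity of the forward primer and the reverse primer may be interchanged. In some embodiments, the nested variants can equally well be run without the initial library preparation that comprises appending the adaptor tags and a universal amplification step. In some embodiments, additional rounds of PCR may be included, with additional forward and/or reverse primers and amplification steps; these additional steps may be particularly useful when it is desirable to further increase the representation of DNA molecules that correspond to the targeted loci.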
Nesting Workflows
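There are many ways to perform the amplification, with different degrees of nesting and different degrees of multiplexing. Figure 9 provides a flow chart with some of the possible workflows. Note that the use of 10,000-plex PCR is meant only as an example; these flow charts would work equally well for other degrees of multiplexing.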
Looped Ligation Adaptors
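When adding universal tagged adaptors, for example for the purpose of making a library for sequencing, there are a number of ways to ligate the adaptors. One way is to blunt-end the sample DNA, perform A-tailing, and ligate with adaptors that have a T-overhang. There are a number of other ways to ligate adaptors, and there are also a number of adaptors that can be ligated. For example, a Y-adaptor can be used, in which the adaptor consists of two strands of DNA: one strand has a double-strand region and a region specified by a forward primer region, and the other strand has a double-strand region that is complementary to the double-strand region on the first strand and a region with a reverse primer. The double-stranded region, when annealed, may contain a T-overhang for the purpose of ligating to double-stranded DNA with an A overhang.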
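In an embodiment, the adaptor can be a loop of DNA in which the terminal regions are complementary, and in which the loop region contains a forward primer tagged region (LFT), a reverse primer tagged region (LRT), and a cleavage site between the two (see Figure 10). (101) refers to the double-stranded, blunt-ended target DNA. (102) refers to the A-tailed target DNA. (103) refers to the looped ligation adaptor with T overhang 'T' and cleavage site 'Z'. (104) refers to the target DNA with appended looped ligation adaptors. (105) refers to the target DNA with the ligation adaptors appended and cleaved at the cleavage site. LFT refers to the ligation adaptor forward tag, and LRT refers to the ligation adaptor reverse tag. The complementary region may end in a T overhang, or in another feature that may be used for ligation to the target DNA. The cleavage site may be a series of uracils for cleavage by UNG, or a sequence that may be recognized and cleaved by a restriction enzyme or other method of cleavage, or just a basic amplification. These adaptors can be used for library preparation, for example for sequencing. These adaptors can be used in combination with any of the other methods described herein, for example the mini-PCR amplification methods.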
Internally Tagged Primers
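When using sequencing to determine the allele present at a given polymorphic locus, the sequence read typically begins upstream of the primer binding site (a) and then proceeds to the polymorphic site (X). Tags are typically configured as shown in Figure 11, left. (101) refers to the single-stranded target DNA with polymorphic locus of interest 'X', and primer 'a' with appended tag 'b'. To avoid nonspecific hybridization, the primer binding site (the region of target DNA complementary to 'a') is typically 18 to 30 bp in length. Sequence tag 'b' is typically about 20 bp; in theory it can be any length longer than about 15 bp, though many people use the primer sequences sold by the sequencing platform company. The distance 'd' between 'a' and 'X' may be at least 2 bp so as to avoid allele bias. When performing multiplexed PCR amplification using the methods disclosed herein or other methods, where careful primer design is necessary to avoid excessive primer-primer interaction, the window of allowable distance 'd' between 'a' and 'X' may vary quite a bit: from 2 bp to 10 bp, from 2 bp to 20 bp, from 2 bp to 30 bp, or even from 2 bp to more than 30 bp. Therefore, when using the primer configuration shown in Figure 11, left, sequence reads must be a minimum of 40 bp to be long enough to measure the polymorphic locus, and depending on the lengths of 'a' and 'd' the sequence reads may need to be up to 60 or 75 bp. In general, the longer the sequence reads, the higher the cost and time of sequencing a given number of reads, so minimizing the necessary read length can save both time and money. In addition, since bases read early in a read are, on average, read more accurately than bases read later in the read, decreasing the necessary read length can also increase the accuracy of the measurement of the polymorphic region.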
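In an embodiment, termed internally tagged primers, the primer binding site (a) is split into a plurality of segments (a', a'', a''', ...), and the sequence tag (b) is on a segment of DNA that lies between two of the primer binding site segments, as shown in Figure 11, (103). This configuration allows the sequencer to make shorter sequence reads. In an embodiment, a' + a'' should be at least about 18 bp, and can be as long as 30, 40, 50, 60, 80, 100 or more than 100 bp. In an embodiment, a'' should be at least about 6 bp, and in an embodiment is between about 8 and 16 bp. All other factors being equal, using internally tagged primers can cut the required length of the sequence reads by at least 6 bp, by as much as 8 bp, 10 bp, 12 bp or 15 bp, and even by as much as 20 or 30 bp. This can yield significant advantages in cost, time and accuracy. An example of internally tagged primers is given in Figure 12.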
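The read-length saving from an internally tagged primer can be sketched with simple arithmetic. The model below is a deliberate simplification and an assumption of this example: it counts only the primer-derived bases and the distance d that a read must traverse before reaching the polymorphic site X, and it assumes that with an internal tag the read starts immediately after tag b, so that only segment a'' lies between the read start and d. The function names are hypothetical.

```python
def bases_before_site(binding_bases_in_read: int, distance_d: int) -> int:
    """Bases a read must traverse before it reaches polymorphic site X."""
    return binding_bases_in_read + distance_d


def read_length_saving(a_prime_len: int, a_dprime_len: int, distance_d: int) -> int:
    """Difference in bases read before X between the two primer layouts.

    Conventional layout: tag b sits 5' of the whole binding site a' + a''.
    Internal layout (assumed): tag b sits between a' and a'', so the read
    only has to cross a'' before the distance d.
    """
    conventional = bases_before_site(a_prime_len + a_dprime_len, distance_d)
    internal = bases_before_site(a_dprime_len, distance_d)
    return conventional - internal


# Example values chosen inside the ranges quoted in the text:
# a' + a'' = 22 bp (at least about 18), a'' = 10 bp (about 8 to 16), d = 4 bp.
saving = read_length_saving(a_prime_len=12, a_dprime_len=10, distance_d=4)
print(saving)
```

Under this model the saving equals the length of a', which is in line with the 6 to 30 bp reductions quoted above.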
Primers with Ligation Adaptor Binding Region
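One issue with fragmented DNA is that, because it is short, the chance that a polymorphism lies close to the end of a DNA strand is higher than for a long strand (see (101), Figure 10). Since PCR capture of a polymorphism requires a primer binding site of suitable length on both sides of the polymorphism, a significant number of DNA strands carrying the targeted polymorphism will be missed because of insufficient overlap between the primer and the targeted binding site. In an embodiment, the target DNA (101) can have ligation adaptors appended (102), and the target primer (103) can have a region (cr) that is complementary to the ligation adaptor tag (lt), appended upstream of the designed binding region (a) (see Figure 13); thus, in cases where the binding region (the region of (101) that is complementary to a) is shorter than the 18 bp typically required for hybridization, the region (cr) on the primer that is complementary to the library tag can increase the binding energy to a point where the PCR can proceed. Note that any specificity lost because of a shorter binding region can be made up for by other PCR primers with suitably long target binding regions. This embodiment can be used in combination with direct PCR, or with any of the other methods described herein, such as nested PCR, semi-nested PCR, hemi-nested PCR, one-sided nested or semi- or hemi-nested PCR, or other PCR protocols.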
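When using sequencing data to determine ploidy in combination with an analytical method that compares the observed allele data to the expected allele distributions for various hypotheses, each additional read from an allele with a low depth of read yields more information than a read from an allele with a high depth of read. Ideally, therefore, one would wish to see a uniform depth of read (DOR), in which each locus has a similar number of representative sequence reads. It is therefore desirable to minimize the DOR variance. In an embodiment, it is possible to decrease the coefficient of variation of the DOR (which may be defined as the standard deviation of the DOR divided by the average DOR) by increasing the annealing times. In some embodiments the annealing times may be longer than 2 minutes, longer than 4 minutes, longer than 10 minutes, longer than 30 minutes, longer than one hour, or even longer. Since annealing is an equilibrium process, there is no limit to the improvement of DOR variance with increasing annealing times. In an embodiment, increasing the primer concentration may decrease the DOR variance.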
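The coefficient of variation of the depth of read, defined above as the standard deviation of the DOR divided by the mean DOR, can be computed as in the sketch below. The per-locus depths are made-up numbers, and the use of the population standard deviation is an assumption of this example; the text does not say which estimator is intended.

```python
from statistics import mean, pstdev


def dor_coefficient_of_variation(depths):
    """Return std(DOR) / mean(DOR) for a list of per-locus read depths."""
    avg = mean(depths)
    if avg == 0:
        raise ValueError("mean depth of read is zero")
    return pstdev(depths) / avg


# Hypothetical per-locus read depths before and after a protocol change.
uneven = [50, 150, 50, 150]
even = [95, 105, 100, 100]
print(dor_coefficient_of_variation(uneven))
print(dor_coefficient_of_variation(even))
```

A lower value indicates more uniform coverage across loci, which is the goal the paragraph describes.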
Exemplary Whole Genome Amplification Methods
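In some embodiments, a method of the present disclosure can involve amplifying DNA, such as using whole genome amplification to amplify a nucleic acid sample before amplifying just the target loci. Amplification of the DNA, a process that transforms a small amount of genetic material into a larger amount of genetic material comprising a similar set of genetic data, can be done by a wide variety of methods including, but not limited to, polymerase chain reaction (PCR). One method of amplifying DNA is whole genome amplification (WGA). A number of methods are available for WGA: ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), and multiple displacement amplification (MDA). In LM-PCR, short DNA sequences called adapters are ligated to blunt ends of DNA. These adapters contain universal amplification sequences, which are used to amplify the DNA by PCR. In DOP-PCR, random primers that also contain universal amplification sequences are used in a first round of annealing and PCR. A second round of PCR is then used to further amplify the sequences with the universal primer sequences. MDA uses the phi-29 polymerase, a highly processive and non-specific enzyme that replicates DNA and has been used for single-cell analysis. The major limitations to amplification of material from a single cell are (1) the necessity of using extremely dilute DNA concentrations or extremely small volumes of reaction mixture, and (2) the difficulty of reliably dissociating DNA from proteins across the whole genome. Nevertheless, single-cell whole genome amplification has been used successfully for a variety of applications for a number of years. There are other methods of amplifying DNA from a sample of DNA. DNA amplification transforms the initial sample of DNA into a sample of DNA that is similar in its set of sequences but of much greater quantity. In some cases, amplification may not be required.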
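In some embodiments, DNA may be amplified using a universal amplification, such as WGA or MDA. In some embodiments, DNA may be amplified by targeted amplification, for example using targeted PCR or circularizing probes. In some embodiments, the DNA may be preferentially enriched using a targeted amplification method, or a method that results in the full or partial separation of desired from undesired DNA, such as capture by hybridization approaches. In some embodiments, DNA may be amplified using a combination of a universal amplification method and a preferential enrichment method. A fuller description of some of these methods can be found elsewhere in this document.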
Exemplary Enrichment and Sequencing Methods
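In an embodiment, a method disclosed herein uses selective enrichment techniques that preserve the relative allele frequencies present in the original sample of DNA at each target locus (e.g., each polymorphic locus) from a set of target loci (e.g., polymorphic loci). While enrichment is particularly advantageous for methods that analyze polymorphic loci, these enrichment methods can be readily adapted for nonpolymorphic loci if desired. In some embodiments the amplification and/or selective enrichment technique may involve PCR such as ligation-mediated PCR, fragment capture by hybridization, molecular inversion probes, or other circularizing probes. In some embodiments, methods for amplification or selective enrichment may involve using probes where, upon correct hybridization to the target sequence, the 3-prime end or 5-prime end of a nucleotide probe is separated from the polymorphic site of the allele by a small number of nucleotides. This separation reduces preferential amplification of one allele, termed allele bias. This is an improvement over methods that use probes where the 3-prime end or 5-prime end of a correctly hybridized probe is directly adjacent to or very near the polymorphic site of an allele. In an embodiment, probes in which the hybridizing region may contain, or certainly contains, a polymorphic site are excluded. Polymorphic sites at the site of hybridization can cause unequal hybridization, or inhibit hybridization altogether for some alleles, resulting in preferential amplification of certain alleles. These embodiments are improvements over other methods that involve targeted amplification and/or selective enrichment in that they better preserve the original allele frequencies of the sample at each polymorphic locus, whether the sample is a pure genomic sample from a single individual or a mixture from several individuals.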
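The use of a technique to enrich a sample of DNA at a set of target loci, followed by sequencing, as part of a method for non-invasive prenatal allele calling or ploidy calling may confer a number of unexpected advantages. In some embodiments of the present disclosure, the method involves measuring genetic data for use with an informatics-based method, such as PARENTAL SUPPORT™ (PS). The ultimate outcome of some of the embodiments is actionable genetic data of an embryo or a fetus. There are many methods that may be used to measure the genetic data of the individual and/or the related individuals as part of the embodied methods. In an embodiment, a method for enriching the concentration of a set of targeted alleles is disclosed herein, the method comprising one or more of the following steps: targeted amplification of genetic material, addition of locus-specific oligonucleotide probes, ligation of specified DNA strands, isolation of sets of desired DNA, removal of unwanted components of a reaction, detection of certain sequences of DNA by hybridization, and detection of the sequence of one or a plurality of strands of DNA by DNA sequencing methods. In some cases the DNA strands may refer to target genetic material, in some cases they may refer to primers, and in some cases they may refer to synthesized sequences, or combinations thereof. These steps may be carried out in a number of different orders.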
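For example, a universal amplification step of the DNA prior to targeted amplification may confer several advantages, such as removing the risk of bottlenecking and reducing allelic bias. The DNA may be mixed with an oligonucleotide probe that can hybridize with two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe may be connected by adding a polymerase, a means for ligation, and any necessary reagents, to allow the circularization of the probe. After circularization, an exonuclease may be added to digest non-circularized genetic material, followed by detection of the circularized probe. The DNA may be mixed with PCR primers that can hybridize with two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe may be connected by adding a polymerase, a means for ligation, and any necessary reagents to complete PCR amplification. Amplified or unamplified DNA may be targeted by hybrid capture probes that target a set of loci; after hybridization, the probe may be localized and separated from the mixture to provide a mixture of DNA that is enriched in target sequences.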
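The use of a method to target certain loci, followed by sequencing, as part of a method for allele calling or ploidy calling may confer a number of unexpected advantages. Some methods by which DNA may be targeted, or preferentially enriched, include using circularizing probes, linked inverted probes (LIPs, MIPs), capture by hybridization methods such as SURESELECT, and targeted PCR or ligation-mediated PCR amplification strategies.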
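In some embodiments, a method of the present disclosure involves measuring genetic data for use with an informatics-based method, such as PARENTAL SUPPORT™ (PS), described further herein. PARENTAL SUPPORT™ is an informatics-based approach to manipulating genetic data, aspects of which are described herein. The ultimate outcome of some of the embodiments is actionable genetic data of an embryo or a fetus, followed by a clinical decision based on the actionable data. The algorithms behind the PS method take the measured genetic data of the target individual, often an embryo or fetus, together with the measured genetic data from related individuals, and are able to increase the accuracy with which the genetic state of the target individual is known. In an embodiment, the measured genetic data is used in the context of making ploidy determinations during prenatal genetic diagnosis. In an embodiment, the measured genetic data is used in the context of making ploidy determinations or allele calls on embryos during in vitro fertilization. There are many methods that may be used to measure the genetic data of the individual and/or the related individuals in these contexts. The different methods comprise a number of steps, often involving amplification of genetic material, addition of oligonucleotide probes, ligation of specified DNA strands, isolation of sets of desired DNA, removal of unwanted components of a reaction, detection of certain sequences of DNA by hybridization, and detection of the sequence of one or a plurality of strands of DNA by DNA sequencing methods. In some cases the DNA strands may refer to target genetic material, in some cases they may refer to primers, and in some cases they may refer to synthesized sequences, or combinations thereof. These steps may be carried out in a number of different orders.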
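Note that in theory it is possible to target any number of loci in the genome, anywhere from one locus to a million loci. If a sample of DNA is subjected to targeting and then sequenced, the percentage of the alleles read by the sequencer will be enriched with respect to their natural abundance in the sample. The degree of enrichment can be anywhere from one percent (or even less) to ten-fold, a hundred-fold, a thousand-fold, or even many million-fold. The human genome contains roughly 3 billion base pairs, including roughly 75 million polymorphic loci. The more loci that are targeted, the smaller the degree of enrichment that is possible. The fewer the loci that are targeted, the greater the degree of enrichment that is possible, and the greater the depth of read that may be achieved at those loci for a given number of sequence reads.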
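The trade-off between the number of targeted loci and the achievable depth of read for a fixed sequencing budget can be illustrated with a short calculation. The function name, the on-target fraction, and the read counts below are hypothetical values chosen for illustration, and the sketch assumes reads are spread evenly across loci.

```python
def mean_depth_per_locus(total_reads: int, on_target_fraction: float,
                         n_loci: int) -> float:
    """Average number of reads per targeted locus, assuming even coverage."""
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return total_reads * on_target_fraction / n_loci


# Hypothetical run: 1 million reads, 90% of which map to targeted loci.
few_loci = mean_depth_per_locus(1_000_000, 0.9, 1_000)
many_loci = mean_depth_per_locus(1_000_000, 0.9, 10_000)
print(few_loci, many_loci)
```

With the same number of reads, targeting ten times fewer loci gives roughly ten times the depth at each locus.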
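In an embodiment of the present disclosure, the targeting or preferential enrichment may focus entirely on SNPs. In an embodiment, the targeting or preferential enrichment may focus on any polymorphic site. A number of commercial targeting products are available to enrich exons. Surprisingly, targeting exclusively SNPs, or exclusively polymorphic loci, is particularly advantageous when using a method for non-invasive prenatal diagnosis (NPD) that relies on allele distributions. There are also published methods for NPD using sequencing, for example U.S. Patent No. 7,888,017, that involve a read count analysis in which the read counting focuses on counting the number of reads that map to a given chromosome, and in which the analyzed sequence reads do not focus on regions of the genome that are polymorphic. Methods of that type, which do not focus on polymorphic alleles, would not benefit as much from targeting or preferential enrichment of a set of alleles.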
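In an embodiment of the present disclosure, it is possible to use a targeting method that focuses on SNPs to enrich a genetic sample in polymorphic regions of the genome. In an embodiment, it is possible to focus on a small number of SNPs, for example between 1 and 100 SNPs, or on a larger number, for example between 100 and 1,000, between 1,000 and 10,000, between 10,000 and 100,000, or more than 100,000 SNPs. In an embodiment, it is possible to focus on one or a small number of chromosomes that are correlated with live trisomic births, for example chromosomes 13, 18, 21, X and Y, or some combination thereof. In an embodiment, it is possible to enrich the targeted SNPs by a small factor, for example between 1.01-fold and 100-fold, or by a larger factor, for example between 100-fold and 1,000,000-fold, or even by more than 1,000,000-fold. In an embodiment of the present disclosure, it is possible to use a targeting method to create a sample of DNA that is preferentially enriched in polymorphic regions of the genome. In an embodiment, it is possible to use this method to create a mixture of DNA with any of these characteristics where the mixture of DNA contains maternal DNA and also free-floating fetal DNA. In an embodiment, it is possible to use this method to create a mixture of DNA that has any combination of these factors. For example, the method described herein may be used to produce a mixture of DNA that comprises maternal DNA and fetal DNA and that is preferentially enriched in DNA corresponding to 200 SNPs, all of which are located on either chromosome 18 or 21, and which are enriched an average of 1000-fold. In another example, it is possible to use the method to create a mixture of DNA that is preferentially enriched in 10,000 SNPs that are all or mostly located on chromosomes 13, 18, 21, X and Y, with an average enrichment per locus of more than 500-fold. Any of the targeting methods described herein can be used to create mixtures of DNA that are preferentially enriched in certain loci.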
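In some embodiments, a method of the present disclosure includes measuring the DNA in the mixed fraction using a high-throughput DNA sequencer, where the DNA in the mixed fraction contains a disproportionate number of sequences from one or more chromosomes, wherein the one or more chromosomes are taken from the group comprising chromosome 13, chromosome 18, chromosome 21, chromosome X, chromosome Y and combinations thereof.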
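Three methods are described herein: multiplex PCR, targeted capture by hybridization, and linked inverted probes (LIPs). These methods may be used to obtain and analyze measurements from a sufficient number of polymorphic loci in a maternal plasma sample in order to detect fetal aneuploidy; this is not meant to exclude other methods of selective enrichment of targeted loci. Other methods may equally well be used without changing the essence of the method. In each case the polymorphism assayed may include single nucleotide polymorphisms (SNPs), small indels, or STRs. A preferred method involves the use of SNPs. Each approach produces allele frequency data; the allele frequency data for each targeted locus and/or the joint allele frequency distributions from these loci may be analyzed to determine the ploidy of the fetus. Each approach has its own considerations because of the limited source material and the fact that maternal plasma consists of a mixture of maternal and fetal DNA. This method may be combined with other approaches to provide a more accurate determination. In an embodiment, this method may be combined with a sequence counting approach such as that described in U.S. Patent No. 7,888,017. The approaches described could also be used to detect fetal paternity non-invasively from maternal plasma samples. In addition, each approach may be applied to other mixtures of DNA or to pure DNA samples to detect the presence or absence of aneuploid chromosomes, to genotype a large number of SNPs from degraded DNA samples, to detect segmental copy number variations (CNVs), to detect other genotypic states of interest, or some combination thereof.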
Accurately Measuring the Allelic Distributions in a Sample
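Current sequencing approaches can be used to estimate the distribution of alleles in a sample. One such method involves randomly sampling sequences from a pool of DNA, termed shotgun sequencing. The proportion of any particular allele in the sequencing data is typically very low and can be determined by simple statistics. The human genome contains approximately 3 billion base pairs. So, if the sequencing method used makes 100 bp reads, a particular allele will be measured about once in every 30 million sequence reads.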
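The shotgun-sequencing estimate above follows from dividing the genome size by the read length. A minimal sketch of that arithmetic is given below, assuming a genome of roughly 3 billion base pairs and uniformly random reads; the function and constant names are illustrative only.

```python
GENOME_BP = 3_000_000_000  # approximate size of the human genome in base pairs


def reads_per_observation(read_len: int, genome_bp: int = GENOME_BP) -> int:
    """Approximate number of random reads needed to cover one given site once."""
    return genome_bp // read_len


def expected_observations(total_reads: int, read_len: int,
                          genome_bp: int = GENOME_BP) -> float:
    """Expected number of reads covering a given site under uniform sampling."""
    return total_reads * read_len / genome_bp


print(reads_per_observation(100))
print(expected_observations(30_000_000, 100))
```

This is why untargeted sequencing spends almost all of its reads on positions that carry no information about the alleles of interest.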
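In an embodiment, a method of the present disclosure is used to determine the presence or absence of two or more different haplotypes that contain the same set of loci in a sample of DNA, from the measured allele distributions of loci on that chromosome. The different haplotypes could represent two different homologous chromosomes from one individual; three different homologous chromosomes from a trisomic individual; three different homologous haplotypes from a mother and a fetus, where one of the haplotypes is shared between the mother and the fetus; three or four haplotypes from a mother and fetus, where one or two of the haplotypes are shared between the mother and the fetus; or other combinations. Alleles that are polymorphic between the haplotypes tend to be more informative; however, any allele for which the mother and the fetus are not both homozygous for the same allele will yield useful information through the measured allele distributions, beyond the information available from simple read count analysis.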
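However, shotgun sequencing of such a sample is extremely inefficient, because it yields many sequences for regions that are not polymorphic between the different haplotypes in the sample, or that come from chromosomes that are not of interest, and these reveal no information about the proportion of the target haplotypes. Described herein are methods that specifically target and/or preferentially enrich segments of DNA in the sample that are more likely to be polymorphic in the genome, to increase the yield of allelic information obtained by sequencing. Note that for the measured allele distributions in an enriched sample to be truly representative of the actual amounts present in the target individual, it is critical that there is little or no preferential enrichment of one allele over another at a given locus in the targeted segments. Current methods known in the art for targeting polymorphic alleles are designed to ensure that at least some of any alleles present are detected. However, these methods were not designed for the purpose of measuring the unbiased allelic distributions of polymorphic alleles present in the original mixture. It is not obvious that any particular method of target enrichment would be able to produce an enriched sample in which the measured allele distributions represent the allele distributions in the original unamplified sample more accurately than any other method. While many enrichment methods might be expected, in theory, to accomplish such an aim, one of ordinary skill in the art is well aware that there is a great deal of stochastic or deterministic bias in current amplification, targeting and other preferential enrichment methods. One embodiment of a method described herein allows a plurality of alleles found in a mixture of DNA that correspond to a given locus in the genome to be amplified, or preferentially enriched, in such a way that the degree of enrichment of each of the alleles is nearly the same. Put another way, the method allows the relative quantity of the alleles present in the mixture to be increased as a whole, while the ratio between the alleles that correspond to each locus remains essentially the same as it was in the original mixture of DNA. For some reported methods, preferential enrichment of loci can result in allelic biases of more than 1%, more than 2%, more than 5% and even more than 10%. This preferential enrichment may be due to capture bias when using a capture by hybridization approach, or to amplification bias, which may be small for each cycle but can become large when compounded over 20, 30 or 40 cycles. For the purposes of this disclosure, for the ratio to remain essentially the same means that the ratio of the alleles in the original mixture divided by the ratio of the alleles in the resulting mixture is between 0.95 and 1.05, between 0.98 and 1.02, between 0.99 and 1.01, between 0.995 and 1.005, between 0.998 and 1.002, between 0.999 and 1.001, or between 0.9999 and 1.0001. Note that the calculation of the allele ratios presented here may not be used in determining the ploidy state of the target individual, and may serve only as a metric for measuring allelic bias.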
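The allelic-bias metric defined above (the allele ratio in the original mixture divided by the allele ratio in the resulting mixture) and the compounding of a small per-cycle amplification bias can be sketched as follows. The function names, the tolerance argument, and the example counts are assumptions of this illustration; the simple multiplicative model of per-cycle bias is likewise an assumption and not a model stated in the text.

```python
def ratio_of_ratios(orig_a: float, orig_b: float,
                    enriched_a: float, enriched_b: float) -> float:
    """(A/B in the original mixture) divided by (A/B in the enriched mixture)."""
    return (orig_a / orig_b) / (enriched_a / enriched_b)


def essentially_preserved(metric: float, tolerance: float = 0.05) -> bool:
    """True when the metric lies within 1 +/- tolerance (e.g. 0.95 to 1.05)."""
    return 1.0 - tolerance <= metric <= 1.0 + tolerance


def compounded_bias(per_cycle_bias: float, cycles: int) -> float:
    """Fold over-representation of one allele if it amplifies
    (1 + per_cycle_bias) times better than the other in every cycle."""
    return (1.0 + per_cycle_bias) ** cycles


# Hypothetical counts: 50:50 in the original, 1000:1000 after enrichment.
print(ratio_of_ratios(50, 50, 1000, 1000))
# A 1% per-cycle advantage compounds to roughly a 35% skew after 30 cycles.
print(compounded_bias(0.01, 30))
```

The second figure shows why a bias that looks negligible per cycle can move the metric well outside even the widest window quoted above.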
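In an embodiment, once a mixture has been preferentially enriched at the set of target loci, it may be sequenced using any one of the previous, current, or next generation of sequencing instruments that sequence a clonal sample (a sample generated from a single molecule; examples include ILLUMINA GAIIx, ILLUMINA HiSeq, Life Technologies SOLiD, 5500XL). The ratios can be evaluated by sequencing through the specific alleles within the targeted region. These sequencing reads can be analyzed and counted according to allele type, and the ratios of the different alleles determined accordingly. For variations that are one to a few bases in length, detection of the alleles will be performed by sequencing, and it is essential that the sequencing read span the allele in question in order to evaluate the allelic composition of that captured molecule. The total number of captured molecules assayed for genotype can be increased by increasing the length of the sequencing read. Full sequencing of all molecules would guarantee collection of the maximum amount of data available in the enriched pool. However, sequencing is currently expensive, and a method that can measure allele distributions using fewer sequence reads will be of greater value. In addition, there are technical limitations on the maximum possible read length, as well as accuracy limitations as read lengths increase. The alleles of greatest utility will be one to a few bases in length, but in theory any allele shorter than the length of the sequencing read can be used. While allele variations come in all types, the examples provided herein focus on SNPs or on variants comprising just a few neighboring base pairs. Larger variants such as segmental copy number variants can in many cases be detected by aggregation of these smaller variations, since whole collections of SNPs internal to the segment are duplicated. Variants larger than a few bases, such as STRs, require special consideration; some targeting approaches work for them while others do not.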
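There are multiple targeting approaches that can be used to specifically isolate and enrich one or a plurality of variant positions in the genome. Typically, these rely on taking advantage of the invariant sequence flanking the variant sequence. There are reports by others related to targeting in the context of sequencing where the substrate is maternal plasma (see, e.g., Liao et al., Clin. Chem. 2011; 57(1): pp. 92-101). However, these approaches use targeting probes that target exons, and do not focus on targeting polymorphic regions of the genome. In an embodiment, a method of the present disclosure involves using targeting probes that focus exclusively or almost exclusively on polymorphic regions. In an embodiment, a method of the present disclosure involves using targeting probes that focus exclusively or almost exclusively on SNPs. In some embodiments of the present disclosure, the targeted polymorphic sites consist of at least 10% SNPs, at least 20% SNPs, at least 30% SNPs, at least 40% SNPs, at least 50% SNPs, at least 60% SNPs, at least 70% SNPs, at least 80% SNPs, at least 90% SNPs, at least 95% SNPs, at least 98% SNPs, at least 99% SNPs, at least 99.9% SNPs, or exclusively SNPs.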
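In an embodiment, a method of the present disclosure can be used to determine genotypes (the base composition of the DNA at specific loci) and the relative proportions of those genotypes from a mixture of DNA molecules, where those DNA molecules may have originated from one or a number of genetically distinct individuals. In an embodiment, a method of the present disclosure can be used to determine the genotypes at a set of polymorphic loci, and the relative ratios of the amounts of the different alleles present at those loci. In an embodiment, the polymorphic loci may consist entirely of SNPs. In an embodiment, the polymorphic loci can comprise SNPs, single tandem repeats, and other polymorphisms. In an embodiment, a method of the present disclosure can be used to determine the relative distributions of alleles at a set of polymorphic loci in a mixture of DNA, where the mixture comprises DNA that originates from a mother and DNA that originates from a fetus. In an embodiment, the joint allele distributions can be determined on a mixture of DNA isolated from the blood of a pregnant woman. In an embodiment, the allele distributions at a set of loci can be used to determine the ploidy state of one or more chromosomes in a gestating fetus.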
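In an embodiment, the mixture of DNA molecules could be derived from DNA extracted from multiple cells of one individual. In an embodiment, the original collection of cells from which the DNA is derived may comprise a mixture of diploid or haploid cells of the same or of different genotypes, if that individual is mosaic (germline or somatic). In an embodiment, the mixture of DNA molecules could also be derived from DNA extracted from single cells. In an embodiment, the mixture of DNA molecules could also be derived from DNA extracted from a mixture of two or more cells of the same individual, or of different individuals. In an embodiment, the mixture of DNA molecules could be derived from DNA isolated from biological material that has already been liberated from cells, such as blood plasma, which is known to contain cell-free DNA. In an embodiment, this biological material may be a mixture of DNA from one or more individuals, as is the case during pregnancy, where fetal DNA has been shown to be present in the mixture. In an embodiment, the biological material could be from a mixture of cells found in maternal blood, where some of the cells are fetal in origin. In an embodiment, the biological material could be cells from the blood of a pregnant woman that have been enriched in fetal cells.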
Circularizing Probes
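Some embodiments of the present disclosure involve the use of "Linked Inverted Probes" (LIPs), which have been previously described in the literature, to amplify the target loci before or after amplification using non-LIP primers in the multiplex PCR methods of the invention. LIPs is a generic term meant to encompass technologies that involve the creation of a circular molecule of DNA, where the probes are designed to hybridize to the targeted region of DNA on either side of a targeted allele, such that addition of appropriate polymerases and/or ligases, together with the appropriate conditions, buffers and other reagents, completes the complementary, inverted region of DNA across the targeted allele to create a circular loop of DNA that captures the information found in the targeted allele. LIPs may also be called pre-circularized probes, pre-circularizing probes, or circularizing probes. The LIP probe may be a linear DNA molecule between 50 and 500 nucleotides in length, and in an embodiment between 70 and 100 nucleotides in length; in some embodiments, it may be longer or shorter than described herein. Other embodiments of the present disclosure involve different incarnations of the LIP technology, such as padlock probes and molecular inversion probes (MIPs).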
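One method of targeting specific locations for sequencing is to synthesize probes in which the 3' and 5' ends anneal to the target DNA at locations adjacent to, and on either side of, the targeted region, such that the addition of DNA polymerase and DNA ligase results in extension from the 3' end, adding bases to the single-stranded probe that are complementary to the target molecule (gap-fill), followed by ligation of the extended 3' end to the 5' end of the original probe, resulting in a circular DNA molecule that can subsequently be isolated from background DNA. The probe ends are designed to flank the targeted region of interest. One aspect of this approach is commonly called MIPs and has been used in conjunction with array technologies to determine the nature of the filled-in sequence. One drawback of using MIPs in the context of measuring allele ratios is that the hybridization, circularization and amplification steps do not happen at equal rates for different alleles at the same locus. This results in measured allele ratios that are not representative of the actual allele ratios present in the original mixture.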
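In an embodiment, the circularizing probes are constructed such that the region of the probe designed to hybridize upstream of the targeted polymorphic locus and the region of the probe designed to hybridize downstream of the targeted polymorphic locus are covalently connected through a non-nucleic acid backbone. This backbone can be any biocompatible molecule or combination of biocompatible molecules. Some examples of possible biocompatible molecules are poly(ethylene glycol), polycarbonates, polyurethanes, polyethylenes, polypropylenes, sulfone polymers, silicone, cellulose, fluoropolymers, acrylic compounds, styrene block copolymers, and other block copolymers.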
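In an embodiment of the present disclosure, this approach has been modified to be easily amenable to sequencing as a means of interrogating the filled-in sequence. To retain the original allelic proportions of the original sample, at least one key consideration must be taken into account. The variable positions among the different alleles in the gap-fill region must not be too close to the probe binding sites, because there can be initiation bias by the DNA polymerase, resulting in differential amplification of the variants. Another consideration is that additional variations may be present in the probe binding sites that are correlated with the variants in the gap-fill region, which can result in unequal amplification of different alleles. In an embodiment of the present disclosure, the 3' ends and 5' ends of the pre-circularized probe are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the targeted allele. The number of bases between the polymorphic site (SNP or otherwise) and the base to which the 3' end and/or 5' end of the pre-circularized probe is designed to hybridize may be one base, two bases, three bases, four bases, five bases, six bases, 7 to 10 bases, 16 to 20 bases, 20 to 30 bases, or 30 to 60 bases. The forward and reverse primers may be designed to hybridize a different number of bases away from the polymorphic site. Circularizing probes can be generated in large numbers with current DNA synthesis technology, allowing very large numbers of probes to be generated and potentially pooled, which enables interrogation of many loci simultaneously. Working with more than 300,000 probes has been reported. Methods involving circularizing probes that may be used to measure the genomic data of the target individual include: Porreca et al., Nature Methods, 2007, 4(11), pp. 931-936; and also Turner et al., Nature Methods, 2009, 6(5), pp. 315-316. The methods described in these papers may be used in combination with other methods described herein. Certain steps of the methods from these two papers may be used in combination with other steps from other methods described herein.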
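In some embodiments of the methods disclosed herein, the genetic material of the target individual is optionally amplified, followed by hybridization of the pre-circularized probes, a gap-fill to fill in the bases between the two ends of the hybridized probes, ligation of the two ends to form a circularized probe, and amplification of the circularized probe using, for example, rolling circle amplification. Once the desired target allelic genetic information has been captured by circularizing appropriately designed oligonucleotide probes, such as in the LIP system, the genetic sequence of the circularized probes may be measured to give the desired sequence data. In an embodiment, the appropriately designed oligonucleotide probes may be circularized directly on unamplified genetic material of the target individual, and amplified afterwards. Note that a number of amplification procedures may be used to amplify the original genetic material, or the circularized LIPs, including rolling circle amplification, MDA, or other amplification protocols. Different methods may be used to measure the genetic information on the target genome, for example high-throughput sequencing, Sanger sequencing, other sequencing methods, capture-by-hybridization, capture-by-circularization, multiplex PCR, other hybridization methods, and combinations thereof.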
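Once the genetic material of the individual has been measured using one or a combination of the above methods, an informatics-based method, such as the PARENTAL SUPPORT™ method, can then be used together with the appropriate genetic measurements to determine the ploidy state of one or more chromosomes of the individual, and/or the genetic state of one or a set of alleles, specifically those alleles that are correlated with a disease or genetic state of interest. Note that the use of LIPs has been reported for multiplexed capture of genetic sequences, followed by genotyping with sequencing. However, sequencing data resulting from a LIPs-based strategy for the amplification of genetic material found in a single cell, a small number of cells, or extracellular DNA has not been used for the purpose of determining the ploidy state of a target individual.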
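Applying an informatics-based method to determine the ploidy state of an individual from genetic data measured by hybridization arrays, such as the ILLUMINA INFINIUM array or the AFFYMETRIX gene chip, has been described in documents referenced elsewhere in this document. However, the method described herein shows improvements over methods previously described in the literature. For example, the LIPs-based approach followed by high-throughput sequencing provides better genotypic data because the approach has better capacity for multiplexing, better capture specificity, better uniformity, and low allelic bias. Greater multiplexing allows more alleles to be targeted, giving more accurate results. Better uniformity results in more of the targeted alleles being measured, giving more accurate results. Lower rates of allelic bias result in lower rates of miscalls, giving more accurate results. More accurate results lead to improved clinical outcomes and better medical care.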
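It is important to note that LIPs may be used as a method for targeting specific loci in a sample of DNA for genotyping by methods other than sequencing. For example, LIPs may be used to target DNA for genotyping using SNP arrays or other DNA- or RNA-based microarrays.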
Ligation-mediated PCR
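Ligation-mediated PCR may be used to amplify the target loci before or after PCR amplification using non-ligated primers. Ligation-mediated PCR is a method of PCR used to preferentially enrich a sample of DNA by amplifying one or a plurality of loci in a mixture of DNA, the method comprising: obtaining a set of primer pairs, where each primer in the pair contains a target-specific sequence and a non-target sequence, where the target-specific sequences are designed to anneal to a target region, one upstream and one downstream of the polymorphic site, and can be separated from the polymorphic site by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, 21-30, 31-40, 41-50, 51-100, or more than 100 bases; polymerization of the DNA from the 3-prime end of the upstream primer to fill the single-stranded region between it and the 5-prime end of the downstream primer with nucleotides complementary to the target molecule; ligation of the last polymerized base of the upstream primer to the adjacent 5-prime base of the downstream primer; and amplification of only the polymerized and ligated molecules using the non-target sequences contained at the 5-prime end of the upstream primer and the 3-prime end of the downstream primer. Pairs of primers for distinct targets may be mixed in the same reaction. The non-target sequences serve as universal sequences, so that all pairs of primers that have been successfully polymerized and ligated may be amplified with a single pair of amplification primers.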
Capture by Hybridization
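In some embodiments, a method of the present disclosure may involve using any of the following capture by hybridization methods in addition to using multiplex PCR to amplify the target loci. Preferential enrichment of a specific set of sequences in a target genome can be accomplished in a number of ways. Elsewhere in this document there is a description of how LIPs can be used to target a specific set of sequences, but in all of those applications other targeting and/or preferential enrichment methods can be used equally well for the same ends. One example of another targeting method is the capture by hybridization approach. Some examples of commercial capture by hybridization technologies include AGILENT's SURE SELECT and ILLUMINA's TruSeq. In capture by hybridization, a set of oligonucleotides that is complementary or mostly complementary to the desired targeted sequences is allowed to hybridize to a mixture of DNA, and is then physically separated from the mixture. Once the desired sequences have hybridized to the targeting oligonucleotides, physically removing the targeting oligonucleotides also removes the targeted sequences. Once the hybridized oligos are removed, they can be heated to above their melting temperature and amplified. Some ways of physically removing the targeting oligonucleotides involve covalently bonding the targeting oligos to a solid support, for example a magnetic bead or a chip. Another way of physically removing the targeting oligonucleotides is to covalently bond them to a molecular moiety with a strong affinity for another molecular moiety. An example of such a molecular pair is biotin and streptavidin, as used in SURE SELECT. Thus, the targeting oligonucleotides could be covalently attached to a biotin molecule and, after hybridization, a solid support with immobilized streptavidin can be used to pull down the biotinylated oligonucleotides, to which the targeted sequences are hybridized.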
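Hybrid capture involves hybridizing probes that are complementary to the targets of interest to the target molecules. Hybrid capture probes were originally developed to target and enrich large fractions of the genome with relative uniformity between targets. In that application, it was important that all targets be amplified with enough uniformity that all regions could be detected by sequencing; however, no regard was paid to retaining the proportion of alleles in the original sample. Following capture, the alleles present in the sample can be determined by direct sequencing of the captured molecules. These sequencing reads can be analyzed and counted according to allele type. However, with current technology, the measured allele distributions of the captured sequences are typically not representative of the original allele distributions.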
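In an embodiment, detection of the alleles is performed by sequencing. To capture the allele identity at the polymorphic site, it is essential that the sequencing read span the allele in question so that the allelic composition of the captured molecule can be evaluated. Since the captured molecules are often of variable length, a read cannot be guaranteed to overlap the variant position unless the entire molecule is sequenced. However, cost considerations, as well as technical limitations on the maximum possible length and the accuracy of sequencing reads, make sequencing the entire molecule infeasible. In an embodiment, the read length can be increased from about 30 to about 50 or about 70 bases, which can greatly increase the number of reads that overlap the variant position within the targeted sequences.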
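Another way to increase the number of reads that interrogate the position of interest is to decrease the length of the probe, as long as this does not introduce bias in the underlying enriched alleles. The synthesized probe should be long enough that two probes designed to hybridize to two different alleles found at one locus will hybridize with nearly equal affinity to the various alleles in the original sample. Currently, methods known in the art describe probes that are typically longer than 120 bases. In the present embodiment, if the allele is one or a few bases, the capture probes may be less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, or less than about 25 bases, and this is sufficient to ensure equal enrichment of all alleles. When the mixture of DNA to be enriched using hybrid capture technology comprises free-floating DNA isolated from blood, for example maternal blood, the average length of the DNA is quite short, typically less than 200 bases. The use of shorter probes gives a greater chance that the hybrid capture probes will capture the desired DNA fragments. Larger variations may require longer probes. In an embodiment, the variations of interest are one (a SNP) to a few bases in length. In an embodiment, targeted regions in the genome can be preferentially enriched using hybrid capture probes that are 90 bases or less in length, and that can be less than 80 bases, less than 70 bases, less than 60 bases, less than 50 bases, less than 40 bases, less than 30 bases, or less than 25 bases in length. In an embodiment, to increase the chance that the desired allele is sequenced, the length of the probe designed to hybridize to the regions flanking the polymorphic allele location can be decreased from above 90 bases to about 80 bases, or to about 70 bases, or to about 60 bases, or to about 50 bases, or to about 40 bases, or to about 30 bases, or to about 25 bases.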
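There is a minimum overlap between the synthesized probe and the target molecule that is needed to enable capture. The synthesized probe can be made as short as possible while still being longer than this minimum required overlap. The effect of using a shorter probe to target a polymorphic region is that more molecules will overlap the target allele region. The state of fragmentation of the original DNA molecules also affects the number of reads that will overlap the targeted alleles. Some DNA samples, such as plasma samples, are already fragmented because of biological processes that take place in vivo. Samples with longer fragments, however, benefit from fragmentation prior to sequencing library preparation and enrichment. When both probes and fragments are short (~60-80 bp), maximum specificity may be achieved, with relatively few sequence reads failing to overlap the critical region of interest.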
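In an embodiment, the hybridization conditions can be adjusted to maximize uniformity in the capture of the different alleles present in the original sample. In an embodiment, hybridization temperatures are decreased to minimize differences in hybridization bias between alleles. Methods known in the art avoid using lower temperatures for hybridization because lowering the temperature increases hybridization of probes to unintended targets. However, when the goal is to preserve allele ratios with maximum fidelity, using lower hybridization temperatures provides optimally accurate allele ratios, despite the fact that the current art teaches away from this approach. The hybridization temperature can also be increased to require greater overlap between the target and the synthesized probe, so that only targets with substantial overlap of the targeted region are captured. In some embodiments of the present disclosure, the hybridization temperature is lowered from the normal hybridization temperature to about 40°C, about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, or about 70°C.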
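In an embodiment, the hybrid capture probes can be designed such that the region of the capture probe whose DNA is complementary to the DNA found in the regions flanking the polymorphic allele is not immediately adjacent to the polymorphic site. Instead, the capture probe can be designed such that the region of the capture probe designed to hybridize to the DNA flanking the polymorphic site of the target is separated from the portion of the capture probe that would be in van der Waals contact with the polymorphic site by a small distance, equivalent in length to one or a small number of bases. In an embodiment, the hybrid capture probe is designed to hybridize to a region that flanks the polymorphic allele but does not cross it; this may be termed a flanking capture probe. The length of the flanking capture probe may be less than about 120 bases, less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, or less than about 25 bases. The region of the genome targeted by the flanking capture probe may be separated from the polymorphic locus by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 to 20, or more than 20 base pairs.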
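The placement of a flanking capture probe relative to a polymorphic site can be sketched as simple coordinate arithmetic. In the sketch below, positions are 0-based, intervals are half-open, and the function name, argument names and example coordinates are all hypothetical; it only illustrates leaving a gap of a few bases between the probe's target region and the polymorphic site.

```python
def flanking_probe_interval(snp_pos: int, probe_len: int, gap: int,
                            side: str = "upstream") -> tuple:
    """Return the half-open interval [start, end) targeted by a flanking
    capture probe that stops `gap` bases short of the polymorphic site.

    The probe region never includes `snp_pos` itself.
    """
    if gap < 1:
        raise ValueError("gap should be at least 1 base")
    if side == "upstream":
        end = snp_pos - gap
        start = end - probe_len
    elif side == "downstream":
        start = snp_pos + 1 + gap
        end = start + probe_len
    else:
        raise ValueError("side must be 'upstream' or 'downstream'")
    if start < 0:
        raise ValueError("probe would start before the reference sequence")
    return (start, end)


# Hypothetical SNP at position 100, 60-base probe, 2-base separation.
print(flanking_probe_interval(100, 60, 2, "upstream"))
print(flanking_probe_interval(100, 60, 2, "downstream"))
```

Because the probe never overlaps the variable base, both alleles present an identical hybridization target, which is the rationale given above for the design.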
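The following describes a disease screening test based on targeted sequence capture. Custom targeted sequence capture is similar to what is currently offered by AGILENT (SURE SELECT), ROCHE-NIMBLEGEN, or ILLUMINA. Capture probes could be custom designed to ensure capture of various types of mutations. For point mutations, one or more probes that overlap the point mutation should be sufficient to capture and sequence the mutation.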
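For small insertions or deletions, one or more probes that overlap the mutation may be sufficient to capture and sequence fragments comprising the mutation. Hybridization to such fragments may be less efficient, limiting capture efficiency, because probes are typically designed against the reference genome sequence. To ensure capture of fragments comprising the mutation, one could design two probes, one matching the normal allele and one matching the mutant allele. A longer probe may enhance hybridization. Multiple overlapping probes may enhance capture. Finally, placing a probe immediately adjacent to, but not overlapping, the mutation may permit relatively similar capture efficiency for the normal and mutant alleles.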
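For simple tandem repeats (STRs), a probe overlapping these highly variable sites is unlikely to capture the fragment well. To enhance capture, a probe could be placed adjacent to, but not overlapping, the variable site. The fragment could then be sequenced as normal to reveal the length and composition of the STR.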
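For large deletions, a series of overlapping probes, a common approach currently used in exon capture systems, may work. However, with this approach it may be difficult to determine whether or not an individual is heterozygous. Targeting and evaluating SNPs within the captured region could potentially reveal loss of heterozygosity across the region, indicating that an individual is a carrier. In an embodiment, it is possible to place non-overlapping or singleton probes across the potentially deleted region and use the number of fragments captured as a measure of heterozygosity. Where an individual carries a large deletion, half as many fragments are expected to be available for capture as at a non-deleted (diploid) reference locus. Consequently, the number of reads obtained from the deleted region should be roughly half that obtained from a normal diploid locus. Aggregating and averaging the sequencing read depth from multiple singleton probes across the potentially deleted region may enhance the signal and improve confidence in the diagnosis. The two approaches, targeting SNPs to identify loss of heterozygosity and using multiple singleton probes to obtain a quantitative measure of the quantity of underlying fragments from that locus, can also be combined. Either or both of these strategies may be combined with other strategies to better achieve the same end.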
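The read-depth reasoning above (a heterozygous large deletion should yield roughly half the reads of a diploid reference locus) can be sketched by averaging depth over several singleton probes. The depths, function names, and the 0.4 to 0.6 decision window below are hypothetical choices for illustration, not thresholds from the disclosure.

```python
from statistics import mean


def depth_ratio(region_depths, reference_depths) -> float:
    """Mean depth across probes in the candidate region divided by the mean
    depth across probes at non-deleted (diploid) reference loci."""
    return mean(region_depths) / mean(reference_depths)


def looks_like_heterozygous_deletion(ratio: float,
                                     low: float = 0.4,
                                     high: float = 0.6) -> bool:
    """Flag ratios near 0.5; the window is an assumed, illustrative cutoff."""
    return low <= ratio <= high


# Hypothetical read depths from singleton probes.
candidate = [48, 52, 50]
reference = [100, 98, 102]
ratio = depth_ratio(candidate, reference)
print(ratio, looks_like_heterozygous_deletion(ratio))
```

Averaging over several probes is what dampens probe-to-probe noise; a single probe's depth would rarely be decisive on its own.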
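If, during testing, cfDNA detection indicates a male fetus, as shown by the presence of Y-chromosome fragments captured and sequenced in the same test, then either an X-linked dominant mutation where the mother and father are unaffected, or a dominant mutation where the mother is unaffected, would indicate heightened risk to the fetus. Detection of two mutant recessive alleles within the same gene in an unaffected mother would imply that the fetus had inherited a mutant allele from the father and potentially a second mutant allele from the mother. In all cases, follow-up testing by amniocentesis or chorionic villus sampling may be indicated.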
A targeted capture based disease screening test could be combined with a targeted capture based non-invasive prenatal diagnostic test for aneuploidy.
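There are a number of ways to decrease depth of read (DOR) variability: for example, one could increase primer concentrations, use longer targeted amplification probes, or run more STA cycles (such as more than 25, more than 30, more than 35, or even more than 40).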
Exemplary Methods for Determining the Number of DNA Molecules in a Sample
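A method is described herein to determine the number of DNA molecules in a sample by generating a uniquely identified molecule for each original DNA molecule in the sample during the first round of DNA amplification. A procedure to accomplish this end, followed by a single molecule or clonal sequencing method, is described herein.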
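The approach entails targeting one or more specific loci and generating a tagged copy of the original molecules in such a manner that most or all of the tagged molecules from each targeted locus will have a unique tag and can be distinguished from one another upon sequencing of this barcode using clonal or single molecule sequencing. Each unique sequenced barcode represents a unique molecule in the original sample. Simultaneously, the sequencing data is used to ascertain the locus from which the molecule originates. Using this information, one can determine the number of unique molecules in the original sample for each locus.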
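This method can be used for any application in which quantitative evaluation of the number of molecules in an original sample is required. Furthermore, the number of unique molecules of one or more targets can be related to the number of unique molecules of one or more other targets to determine the relative copy number, allele distribution, or allele ratio. Alternatively, the number of copies detected from various targets can be modeled by a distribution in order to identify the most likely number of copies of the original targets. Applications include but are not limited to detection of insertions and deletions such as those found in carriers of Duchenne muscular dystrophy; quantitation of deletions or duplications of segments of chromosomes such as those observed in copy number variants; chromosome copy number of samples from born individuals; and chromosome copy number of samples from unborn individuals such as embryos or fetuses.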
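The method can be combined with simultaneous evaluation of variations contained in the sequence targeted. This can be used to determine the number of molecules representing each allele in the original sample. This copy number method can be combined with the evaluation of SNPs or other sequence variations to determine the chromosome copy number of born and unborn individuals; with the discrimination and quantification of copies from loci that have short sequence variations but where PCR may amplify from multiple target regions, such as in carrier detection of spinal muscular atrophy; and with the determination of the copy number of different sources of molecules in samples consisting of mixtures of different individuals, such as in detection of fetal aneuploidy from free floating DNA obtained from maternal plasma.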
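In an embodiment, the method as it pertains to a single target locus may comprise one or more of the following steps: (1) Designing a standard pair of oligomers for PCR amplification of a specific locus. (2) Adding, during synthesis, a sequence of specified bases with no or minimal complementarity to the target locus or genome to the 5' end of one of the target specific oligomers. This sequence, termed the tail, is a known sequence, to be used for subsequent amplification, followed by a sequence of random nucleotides. These random nucleotides comprise the random region. The random region comprises a randomly generated sequence of nucleic acids that probabilistically differs between each probe molecule. Consequently, following synthesis, the tailed oligomer pool will consist of a collection of oligomers beginning with a known sequence, followed by an unknown sequence that differs between molecules, followed by the target specific sequence. (3) Performing one round of amplification (denaturation, annealing, extension) using only the tailed oligomer. (4) Adding exonuclease to the reaction, effectively stopping the PCR reaction, and incubating the reaction at the appropriate temperature to remove forward single stranded oligos that did not anneal to template and extend to form a double stranded product. (5) Incubating the reaction at a high temperature to denature the exonuclease and eliminate its activity. (6) Adding to the reaction a new oligonucleotide that is complementary to the tail of the oligomer used in the first reaction, along with the other target specific oligomer, to enable PCR amplification of the product generated in the first round of PCR. (7) Continuing amplification to generate enough product for downstream clonal sequencing. (8) Measuring the amplified PCR product by a multitude of methods, for example clonal sequencing, to a sufficient number of bases to span the sequence.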
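In an embodiment, a method of the present disclosure involves targeting multiple loci in parallel or otherwise. Primers to different target loci can be generated independently and mixed to create multiplex PCR pools. In an embodiment, original samples can be divided into sub-pools and different loci can be targeted in each sub-pool before being recombined and sequenced. In an embodiment, the tagging step and a number of amplification cycles may be performed before the pool is subdivided, to ensure efficient targeting of all targets before splitting and to improve subsequent amplification by continuing amplification with smaller sets of primers in the subdivided pools.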
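One example of an application where this technology would be particularly useful is non-invasive prenatal aneuploidy diagnosis, where the ratio of alleles at a given locus or a distribution of alleles at a number of loci can be used to help determine the number of copies of a chromosome present in a fetus. In this context, it is desirable to amplify the DNA present in the initial sample while maintaining the relative amounts of the various alleles. In some circumstances, especially where there is a very small amount of DNA, for example fewer than 5,000 copies of the genome, fewer than 1,000 copies, fewer than 500 copies, or fewer than 100 copies, one can encounter a phenomenon called bottlenecking. This is where there are a small number of copies of any given allele in the initial sample, and amplification biases can result in the amplified pool of DNA having significantly different ratios of those alleles than are present in the initial mixture of DNA. By applying a unique or nearly unique set of barcodes to each strand of DNA before standard PCR amplification, it is possible to exclude n-1 copies of DNA from a set of n identical molecules of sequenced DNA that originated from the same original molecule.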
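For example, consider a heterozygous SNP in the genome of an individual, and a mixture of DNA from the individual where ten molecules of each allele are present in the original sample of DNA. After amplification there may be 100,000 molecules of DNA corresponding to that locus. Due to stochastic processes, the ratio of DNA could be anywhere from 1:2 to 2:1; however, since each of the original molecules was tagged with a unique tag, it would be possible to determine that the DNA in the amplified pool originated from exactly 10 molecules of DNA from each allele. This method would therefore give a more accurate measure of the relative amounts of each allele than a method not using this approach. For methods where it is desirable to minimize the relative amount of allele bias, this method will provide more accurate data.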
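The counting step in this example can be sketched as follows. This is an illustrative sketch only: the read tuple layout `(locus, allele, barcode)`, the locus name `rs_demo`, the placeholder barcode strings, and the function name are all hypothetical, and sequencing errors within barcodes are ignored.

```python
from collections import defaultdict


def count_original_molecules(reads):
    """Collapse reads to unique barcodes per (locus, allele).

    reads: iterable of (locus, allele, barcode) tuples.
    Returns {(locus, allele): number of distinct barcodes}, an
    estimate of the number of original molecules per allele.
    """
    seen = defaultdict(set)
    for locus, allele, barcode in reads:
        seen[(locus, allele)].add(barcode)
    return {key: len(barcodes) for key, barcodes in seen.items()}


# Ten original molecules per allele, amplified unevenly (A twice as
# efficiently as B), so raw read counts are skewed 2:1.
reads = []
for i in range(10):
    reads += [("rs_demo", "A", f"tagA{i:02d}")] * 6
    reads += [("rs_demo", "B", f"tagB{i:02d}")] * 3

raw_a = sum(1 for r in reads if r[1] == "A")
raw_b = sum(1 for r in reads if r[1] == "B")
print("raw reads:", raw_a, raw_b)
print("unique molecules:", count_original_molecules(reads))
```

The raw read counts (60 versus 30) carry the amplification bias, while the barcode-collapsed counts recover the 10:10 composition of the starting material.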
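Associating the sequenced fragment with the target locus can be achieved in a number of ways. In an embodiment, a sequence of sufficient length is obtained from the targeted fragment to span the molecular barcode as well as a sufficient number of unique bases corresponding to the target sequence to allow unambiguous identification of the target locus. In another embodiment, the molecular bar-coding primer that contains the randomly generated molecular barcode can also contain a locus specific barcode (locus barcode) that identifies the target with which it is to be associated. This locus barcode would be identical among all molecular bar-coding primers for each individual target, and hence all resulting amplicons, but different from all other targets. In an embodiment, the tagging method described herein may be combined with a one-sided nesting protocol.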
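In an embodiment, the design and generation of molecular barcoding primers may be reduced to practice as follows: the molecular barcoding primers may consist of a sequence that is not complementary to the target sequence, followed by a random molecular barcode region, followed by a target specific sequence. The sequence 5' of the molecular barcode may be used for subsequent PCR amplification and may comprise sequences useful in the conversion of the amplicon to a library for sequencing. The random molecular barcode sequence can be generated in a multitude of ways. The preferred method synthesizes the molecular tagging primer in such a way as to include all four bases in the reaction during synthesis of the barcode region. All or various combinations of bases may be specified using the IUPAC DNA ambiguity codes. In this manner the synthesized collection of molecules will contain a random mixture of sequences in the molecular barcode region. The length of the barcode region will determine how many primers will contain unique barcodes. The number of unique sequences is related to the length of the barcode region as N^L, where N is the number of bases, typically 4, and L is the length of the barcode. A barcode of five bases can yield up to 1024 unique sequences; a barcode of eight bases can yield 65536 unique barcodes. In an embodiment, the DNA can be measured by a sequencing method, where the sequence data represents the sequence of a single molecule. This can include methods in which single molecules are sequenced directly, or methods in which single molecules are amplified to form clones detectable by the sequencing instrument but that still represent single molecules, herein called clonal sequencing.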
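The N^L relationship can be checked directly. The sketch below also estimates how many distinct barcodes to expect when a given number of molecules are tagged; that second function is an added illustration that assumes barcodes are drawn uniformly at random, which real oligonucleotide synthesis only approximates. Function names are hypothetical.

```python
def barcode_space(length, alphabet_size=4):
    """Number of possible barcodes: N ** L."""
    return alphabet_size ** length


def expected_distinct_barcodes(n_molecules, length, alphabet_size=4):
    """Expected number of distinct barcodes among n_molecules tags,
    assuming each tag is drawn uniformly at random (an assumption,
    not something stated in the source)."""
    space = barcode_space(length, alphabet_size)
    return space * (1.0 - (1.0 - 1.0 / space) ** n_molecules)


for length in (5, 8):
    print(length, barcode_space(length))  # 5 -> 1024, 8 -> 65536

# With 1,000 tagged molecules, a 5-base barcode collides often,
# while an 8-base barcode rarely does.
print(expected_distinct_barcodes(1000, 5))
print(expected_distinct_barcodes(1000, 8))
```

This is why the barcode length has to be chosen with the expected number of input molecules per locus in mind: the barcode space should be much larger than that number if most tags are to be unique.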
Exemplary Methods and Reagents for Quantification of Amplification Products
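Quantification of specific nucleic acid sequences of interest is typically done by quantitative real-time PCR techniques such as TAQMAN (LIFE TECHNOLOGIES), INVADER probes (THIRD WAVE TECHNOLOGIES), and the like. Such techniques suffer from numerous shortcomings, such as limited ability to achieve the simultaneous analysis of multiple sequences in parallel (multiplexing) and the ability to provide accurate quantitative data for only a narrow range of possible amplification cycles (for example, when the logarithm of PCR amplification yield versus number of cycles is in the linear range). DNA sequencing technologies, particularly high throughput next-generation sequencing technologies (often referred to as massively parallel sequencing technologies) such as those employed in MISEQ (ILLUMINA), HISEQ (ILLUMINA), ION TORRENT (LIFE TECHNOLOGIES), GENOME ANALYZER IIX (ILLUMINA), GS FLEX+ (ROCHE 454), and the like, can be used for quantitative measurement of the number of copies of a sequence of interest present in a sample, thereby providing quantitative information about the starting materials, for example copy number or transcription levels. High throughput genetic sequencers are amenable to the use of bar coding (that is, sample tagging with distinctive nucleic acid sequences) so as to identify specific samples from individuals, thereby permitting the simultaneous analysis of multiple samples in a single run of the DNA sequencer. The number of times a given region of the genome in a library preparation (or other nucleic acid preparation of interest) is sequenced (the number of reads) will be proportional to the number of copies of that sequence in the genome of interest (or to the expression level in the case of cDNA containing preparations). However, the preparation and sequencing of genetic libraries (and similar genome derived preparations) can introduce numerous biases that interfere with obtaining an accurate quantitative readout for the nucleic acid sequence of interest. For example, different nucleic acid sequences can amplify with different efficiencies during the nucleic acid amplification steps that take place during genetic library preparation or sample preparation.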
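The problems associated with differential amplification efficiency can be mitigated by using certain aspects of the present invention. The invention includes various methods and compositions that relate to the use of standards for inclusion in amplification processes that can be used to improve the accuracy of quantification. The invention is of use in, among other areas, the detection of aneuploidy in a fetus by analyzing free floating fetal DNA in maternal blood, as described herein and, among other places, in U.S. Pat. No. 8,008,018; U.S. Pat. No. 7,332,277; PCT Patent Publication No. WO 2012/078792A2; and PCT Published Patent Application No. WO 2011/146632 A1, each of which is incorporated herein by reference in its entirety. Embodiments of the invention are also of use in the detection of aneuploidy in in vitro generated embryos. Commercially significant aneuploidies that can be detected include aneuploidy of human chromosomes 13, 18, 21, X, and Y.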
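Embodiments of the invention can be used with either human or non-human nucleic acids, and can be applied to both animal and plant derived nucleic acids. Embodiments of the invention can also be used to detect and/or quantify alleles for other genetic disorders characterized by deletions or insertions. Deletion-containing alleles can be detected in suspected carriers of the allele of interest.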
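One embodiment of the invention includes standards that are present in known quantities (relative or absolute amounts). For example, consider a genetic library made from a genetic source that is diploid for chromosome 8 (containing locus A) and triploid for chromosome 21 (containing locus B). A genetic library can be produced from this sample that will contain sequences in quantities that are a function of the number of chromosomes present in the sample, for example 200 copies of locus A and 300 copies of locus B. However, if locus A amplifies much more efficiently than locus B, after PCR there may be 60,000 copies of the A amplicon and 30,000 copies of the B amplicon, thus obscuring the true chromosomal copy number of the initial genomic sample when analyzed by high throughput DNA sequencing (or other quantitative nucleic acid detection techniques). To mitigate this problem, a standard sequence for locus A is used, wherein the standard sequence amplifies with essentially the same efficiency as locus A. Similarly, a standard sequence for locus B is created, wherein the standard sequence amplifies with essentially the same efficiency as locus B. The standard sequence for locus A and the standard sequence for locus B are added to the mixture prior to PCR (or other amplification techniques). These standard sequences are present in known quantities, either relative or absolute. Thus, if a 1:1 mixture of standard sequence A and standard sequence B were added to the mixture in the previous example (prior to amplification), 3000 copies of the standard A amplicon may be produced and 1000 copies of the standard B amplicon may be produced, indicating that locus A is amplified 3 times more efficiently than locus B under the same set of conditions.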
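The arithmetic of this example can be worked through in a short sketch. The function name is hypothetical, and the calculation assumes, as the paragraph does, that each standard amplifies with essentially the same efficiency as its locus and that the two standards were spiked in at a 1:1 ratio.

```python
def normalized_abundance(target_amplicons, standard_amplicons,
                         standard_input=1.0):
    """Target amplicon count expressed relative to its co-amplified
    standard, scaled by the relative amount of standard added."""
    return target_amplicons / standard_amplicons * standard_input


# Amplicon counts from the example in the text.
locus_a = normalized_abundance(60_000, 3_000)   # 20.0
locus_b = normalized_abundance(30_000, 1_000)   # 30.0

print("relative efficiency A/B:", 3_000 / 1_000)    # 3.0
print("corrected B:A ratio:", locus_b / locus_a)    # 1.5
```

The corrected B:A ratio of 1.5 matches the 300:200 ratio of starting copies (triploid versus diploid), whereas the raw amplicon counts alone (30,000:60,000) would have suggested a ratio of 0.5.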
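In various embodiments, one or more selected regions of a genome containing a SNP (or other polymorphism) of interest can be specifically amplified and subsequently sequenced. This target specific amplification can take place during the formation of a genetic library for sequencing. The library can contain numerous targeted regions for amplification. In some embodiments, at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 regions of interest are included. Examples of such libraries are described herein and can be found in U.S. Patent Application Publication No. 2012/0270212, filed Nov. 18, 2011, which is herein incorporated by reference in its entirety.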
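Many high throughput DNA sequencing techniques require a modification of the genetic starting material, for example the ligation of universal priming sites and/or barcodes, so as to facilitate clonal amplification of small nucleic acid fragments prior to performing the subsequent sequencing reactions. In some embodiments, one or more standard sequences are added during genetic library formation, or added to a precursor component of the genetic library prior to amplification of the library. The standard sequences can be selected to mimic (yet remain distinguishable on the basis of nucleotide base sequence from) the target genome fragments prepared for sequencing by a high throughput genetic sequencing technique. In an embodiment, the standard sequence can be identical to the target genome fragment except for 1, 2, 3, 4 to 10, or 11 to 20 nucleotides. In some embodiments, when the target genetic sequence contains a SNP, the standard sequence can be identical to the SNP-containing sequence except for the nucleotide at the polymorphic base, which can be selected to be one of the four nucleotides not observed in nature at that position. Standard sequences can be used in highly multiplexed analysis of multiple target loci (for example, polymorphic loci). Standard sequences are added in known quantities (relative or absolute) during the process of library formation (prior to amplification), so as to provide a standard metric for greater accuracy in determining the amount of the target sequence of interest in the sample for analysis. Knowledge of the known quantities of the standard sequences, used in combination with knowledge of the ploidy level of a sequencing library formed from a genome with a previously characterized ploidy level, for example one known to be diploid for all autosomes, can be used to calibrate the amplification properties of each standard sequence with respect to its corresponding target sequence and to account for batch-to-batch variation in mixtures comprising multiple standard sequences. Given that it is often necessary to analyze a large number of loci simultaneously, it is useful to produce mixtures comprising a large set of standard sequences. Embodiments of the invention include mixtures comprising multiple standard sequences. Ideally, the amount of each standard sequence in the mixture would be known with a high degree of accuracy. As a practical matter, however, this goal is very difficult to achieve because there is a significant amount of variation in the amount of each standard sequence in a mixture, particularly in mixtures comprising large numbers of different synthetic oligonucleotides. This variation has multiple sources, for example batch-to-batch differences in the efficiency of in vitro oligonucleotide synthesis reactions, inaccuracies in volume measurement, and variation in pipetting. Moreover, such variation can occur between different batches that in theory contain exactly the same set of standard sequences in exactly the same amounts. It is therefore of interest to calibrate each batch of standard sequences independently. A batch of standard sequences can be calibrated against a reference genome of known chromosomal composition. A batch of standard sequences can also be calibrated by sequencing the batch with no or minimal amplification steps included in the sequencing protocol. Embodiments of the invention include calibrated mixtures of different standard sequences. Other embodiments of the invention include methods of calibrating mixtures of different standard sequences, and calibrated mixtures of different standard sequences made by such methods.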
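Various embodiments of the subject mixtures of standard sequences, and of methods for using them, can comprise at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 or more standard sequences, as well as various intermediate amounts. The number of standard sequences can be the same as the number of target sequences selected for analysis during the creation of a targeted library for DNA sequencing. However, in some embodiments it may be advantageous to use fewer standard sequences than the number of targeted regions in the library being constructed. Using a lower number may be advantageous in order to avoid running up against the limits of the sequencing capacity of the high throughput DNA sequencer being employed. The number of standard sequences can be less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the number of targeted regions, as well as various intermediate values. For example, if a genetic library is created using 15,000 pairs of primers targeted to specific SNP containing loci, a suitable mixture containing 1500 standard sequences corresponding to 1500 of the 15,000 targeted loci can be added prior to the amplification step of library construction.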
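The amount of standard sequence added during library construction can vary widely among different embodiments. In some embodiments, the amount of each standard sequence can be approximately the same as the predicted amount of the target sequence present in the genomic material sample used for library preparation. In other embodiments, the amount of each standard sequence can be greater or less than the predicted amount of the target sequence present in the genomic material sample used for library preparation. While the initial relative amounts of the target sequence and the standard sequence are not critical to the function of the invention, it is preferable that the amount be within the range of 100 times greater to 100 times less than the amount of the target sequence present in the genetic material sample used for library preparation. Excessive quantities of standard can use too much of the sequencing capacity of the DNA sequencer in a given run of the instrument. Using too little of the standard sequence will produce insufficient data to aid in the analysis of variation in amplification efficiency.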
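Standard sequences can be selected to be very similar in nucleotide base sequence to the amplified region of interest; preferably the standard sequence has exactly the same primer-binding sites as the genomic region being analyzed, that is, the "target sequence." The standard sequence must be distinguishable from the corresponding target sequence at a given locus. For the sake of convenience, this distinguishable region of the standard sequence will be referred to as the "marker sequence." In some embodiments, the marker sequence region of the target sequence contains the polymorphic region, for example a SNP, and can be flanked on both sides by primer binding regions. The standard sequence can be selected to closely match the GC content of the corresponding target sequence. In some embodiments, the primer binding regions of the standard sequence are flanked by universal priming sites. These universal priming sites are selected to match the universal priming sites used in the genomic library for analysis. In other embodiments, the standard sequences do not have universal priming sites, and the universal priming sites are added during the creation of the library. Standard sequences are typically provided in single stranded form. A standard sequence is defined with respect to a corresponding target sequence and the sequence specific reagents used to amplify the target sequence. In some embodiments, the target sequence contains a polymorphism of interest, for example a SNP, a deletion, or an insertion, present in the nucleic acid sample for analysis. The standard sequence is a synthetic polynucleotide that is similar in nucleotide base sequence to the target sequence, but nonetheless distinguishable from the target sequence by virtue of at least one nucleotide base difference, thereby providing a mechanism for distinguishing amplicon sequences derived from the standard sequence from amplicon sequences derived from the target sequence. Standard sequences are selected so as to have essentially the same amplification properties as the corresponding target sequence when amplified with the same set of amplification reagents, for example PCR primers. In some embodiments, the standard sequence can have the same primer binding sites as the corresponding target sequence. In other embodiments, the standard sequence can have different primer binding sites than the corresponding target sequence. In some embodiments, the standard sequence can be selected to produce amplicons of the same length as the amplicons produced from the corresponding target sequence. In other embodiments, the standard sequence can be selected to produce amplicons of slightly different length than the amplicons produced from the corresponding target sequence.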
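After the amplification reaction is completed, the library is sequenced on a high throughput DNA sequencer, wherein individual molecules are clonally amplified and sequenced. The number of sequence reads for each allele of the target sequence is counted, and the number of sequence reads for the standard sequence corresponding to the target sequence is also counted. This process is also carried out for at least one other pair of target sequence and corresponding standard sequence. For example, considering locus A, X_A1 reads are produced for allele 1 of locus A, X_A2 reads are produced for allele 2 of locus A, and X_AC reads are produced for standard sequence A. The ratio of (X_A1 + X_A2) to X_AC is determined for each locus of interest. As discussed earlier, this process can be carried out on a reference genome, for example a genome known to be diploid for all chromosomes. The process can be repeated several times to provide a large number of read values, so that the mean number of reads and the standard deviation among the reads can be determined. The process is carried out with a mixture comprising a large number of different standard sequences corresponding to different loci. By assuming (1) that X_A1 and X_A2 correspond to a known number of chromosomes, for example 2 for a normal human female genome, and (2) that the standard sequences have amplification (and detectability) properties similar to those of their corresponding natural loci, the relative amounts of the different standard sequences in the multiplex standard mixture can be determined. The calibrated multiplex standard sequences can then be used to adjust for variability in amplification efficiency between different loci in a multiplex amplification reaction.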
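The calibration described above can be illustrated with a small sketch. It is an assumption-laden simplification, not the disclosed procedure: function names and read counts are hypothetical, the standard is taken to amplify exactly like its locus, and the test sample is assumed to use the same standard batch at the same amount per genome equivalent as the reference run.

```python
def calibrate_standard(x_a1, x_a2, x_ac, known_copies=2):
    """Relative amount of standard per genome equivalent, inferred
    from a reference genome with a known copy number at the locus."""
    ratio = (x_a1 + x_a2) / x_ac
    return known_copies / ratio


def estimate_copies(x_a1, x_a2, x_ac, standard_amount):
    """Copy number estimate for a test sample, using the calibrated
    amount of the same standard."""
    ratio = (x_a1 + x_a2) / x_ac
    return ratio * standard_amount


# Reference run on a genome assumed diploid at locus A
# (hypothetical read counts).
standard_amount = calibrate_standard(400, 400, 400)

# Test sample measured with the same standard batch.
print(estimate_copies(650, 550, 400, standard_amount))  # 3.0
```

In practice the reference run would be repeated, as the text notes, so that the mean and standard deviation of each ratio are available rather than a single value.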
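Other embodiments of the invention include methods and compositions for measuring the copy number of specific genes of interest, including duplicated genes and mutant genes characterized by large deletions, which can interfere with quantitation by sequencing. Sequencing can have trouble detecting alleles having such deletions. Standard sequences included in the amplification process can be used to reduce this problem.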
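In an embodiment of the invention, the target sequence for analysis is a gene that has a wild type (that is, functional) form and a mutant form characterized by a deletion. An example of such a gene is SMN1, an allele of which having a deletion is responsible for the genetic disease spinal muscular atrophy (SMA). It is of interest to detect individuals that carry a mutant form of the gene using high throughput genetic sequencing techniques. Applying such techniques to the detection of deletion mutations can be problematic, among other reasons because of the absence of the deleted sequence from the sequence analysis (as opposed to detecting a simple point mutation or SNP). Such embodiments employ (1) a pair of amplification primers specific for the gene of interest, wherein the primers will amplify the gene of interest (or a portion thereof) and will not significantly amplify the mutant allele, (2) a standard sequence corresponding to the wild type allele of the gene of interest (that is, the target sequence) but differing by at least one detectable nucleotide base, (3) a pair of amplification primers specific for a second target sequence that serves as a reference sequence, and (4) a standard sequence corresponding to the reference sequence.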
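In one aspect of the invention, a method is provided for measuring the number of copies of a gene of interest, wherein the gene of interest has an allele comprising a deletion. The method can employ amplification reagents specific for the gene of interest, for example PCR primers, that amplify at least a portion of the gene of interest, the entire gene of interest, or a region adjacent to the gene of interest, but do not amplify the deletion-comprising allele of the gene of interest. The method also employs a standard sequence corresponding to the gene of interest, wherein the standard sequence differs from the gene of interest by at least one nucleotide base (so that the sequence of the standard can be readily distinguished from the naturally occurring gene of interest). Typically, the standard sequence will contain the same primer binding sites as the gene of interest so as to minimize any amplification differential between the gene of interest and the standard sequence corresponding to the gene of interest. The reaction will also comprise amplification reagents specific for a reference sequence. The reference sequence is a sequence of known (or at least assumed to be known) copy number in the genome to be analyzed. The reaction also comprises a standard sequence corresponding to the reference sequence. Typically, the standard sequence corresponding to the reference sequence will contain the same primer binding sites as the reference sequence so as to minimize any amplification differential between the reference sequence and the standard sequence corresponding to the reference sequence.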
Exemplary Nucleic Acid Samples
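In some embodiments, the genetic sample may be prepared and/or purified. There are a number of standard procedures known in the art to accomplish such an end. In some embodiments, the sample may be centrifuged to separate various layers. In some embodiments, the DNA may be isolated using filtration. In some embodiments, the preparation of the DNA may involve amplification, separation, purification by chromatography, liquid liquid separation, isolation, preferential enrichment, preferential amplification, targeted amplification, or any of a number of other techniques either known in the art or described herein.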
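In some embodiments, a method disclosed herein could be used in situations where there is a very small amount of DNA present, such as in in vitro fertilization or in forensic situations, where one or a few cells are available (typically fewer than ten cells, fewer than twenty cells, or fewer than 40 cells). In these embodiments, a method disclosed herein serves to make ploidy calls from a small amount of DNA that is not contaminated by other DNA, but where the ploidy calling is very difficult because of the small amount of DNA. In some embodiments, a method disclosed herein could be used in situations where the target DNA is contaminated with DNA of another individual, for example in the context of prenatal diagnosis, paternity testing, or products of conception testing. Some other situations where these methods would be particularly advantageous would be cancer testing, where only one or a small number of cells are present among a larger amount of normal cells. The genetic measurements used as part of these methods could be made on any sample comprising DNA or RNA, for example but not limited to: blood, plasma, body fluids, urine, hair, tears, saliva, tissue, skin, fingernails, blastomeres, embryos, amniotic fluid, chorionic villus samples, feces, bile, lymph, cervical mucus, semen, or other cells or materials comprising nucleic acids. In an embodiment, a method disclosed herein could be run with nucleic acid detection methods such as sequencing, microarrays, qPCR, digital PCR, or other methods used to measure nucleic acids. If for some reason it were found to be desirable, the ratios of the allele count probabilities at a locus could be calculated, and the allele ratios could be used to determine ploidy state in combination with some of the methods described herein, provided the methods are compatible. In some embodiments, a method disclosed herein involves calculating, on a computer, allele ratios at the plurality of polymorphic loci from the DNA measurements made on the processed samples. In some embodiments, a method disclosed herein involves calculating, on a computer, allele ratios at the plurality of polymorphic loci from the DNA measurements made on the processed samples, along with any combination of other improvements described in this disclosure.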
In some embodiments, this method may be used to genotype a single cell, a small number of cells, 2 to 5 cells, 6 to 10 cells, 10 to 20 cells, 20 to 50 cells, 50 to 100 cells, 100 to 1,000 cells, or a small amount of extracellular DNA, for example 1 to 10 picograms, 10 to 100 picograms, 100 picograms to 1 nanogram, 1 to 10 nanograms, 10 to 100 nanograms, or 100 nanograms to 1 microgram.
Exemplary RNA Expression Studies
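The multiplex PCR methods of the invention can be used to increase the number of target loci that can be evaluated during gene expression profiling experiments. For example, the expression levels of thousands of genes can be monitored simultaneously to determine whether an individual has a sequence (such as a polymorphism or other mutation) associated with a disease (such as cancer) or an increased risk for a disease. These methods can be used to identify sequences (such as polymorphisms or other mutations) associated with an increased or decreased risk for a disease such as cancer by comparing gene expression (such as the expression of particular mRNA alleles) in samples from patients with and without the disease. Additionally, the effect of a particular treatment, disease, or developmental stage on gene expression can be determined. Similarly, these methods can be used to identify genes whose expression is changed in response to pathogens or other organisms by comparing gene expression in infected and uninfected cells or tissues. The number of sequencing reads in these methods can be adjusted based on the frequency of the polymorphisms being analyzed, so that sufficient reads are performed for the polymorphisms to be detected if they are present.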
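In some embodiments, a sample containing RNA (such as mRNA) is reverse transcribed using reverse transcriptase (RT), and the resulting DNA (such as cDNA) is amplified using a DNA polymerase (PCR). The RT and PCR steps may be carried out sequentially in the same reaction volume or separately. Any of the primer libraries of the invention can be used in this reverse transcription polymerase chain reaction (RT-PCR) method. In various embodiments, reverse transcription is performed using oligo-dT, random primers, a mixture of oligo-dT and random primers, or primers specific to the target loci. To avoid amplification of contaminating genomic DNA, primers for RT-PCR can be designed so that part of one primer hybridizes to the 3' end of one exon and the other part of the primer hybridizes to the 5' end of the adjacent exon. Such primers anneal to cDNA synthesized from spliced mRNAs, but not to genomic DNA. To detect amplification of contaminating DNA, RT-PCR primer pairs can be designed to flank a region that contains at least one intron. Products amplified from cDNA (no introns) are smaller than those amplified from genomic DNA (containing introns). The size difference in products is used to detect the presence of contaminating DNA. In some embodiments when only the mRNA sequence is known, primer annealing sites at least 300 to 400 base pairs apart are chosen, because fragments of this size from eukaryotic DNA are likely to contain splice junctions. Alternatively, the sample can be treated with DNase to degrade contaminating DNA.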
Exemplary Methods for Paternity Testing
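The multiplex PCR methods of the invention can be used to improve the accuracy of paternity testing, since so many target loci can be analyzed at once (see, e.g., U.S. Patent Publication No. 2012/0122701, filed Dec. 22, 2011, which is hereby incorporated by reference in its entirety). For example, the multiplex PCR method can allow thousands of polymorphic loci (such as SNPs) to be analyzed for use in the PARENTAL SUPPORT algorithm described herein to determine whether an alleged father is the biological father of a fetus. In some embodiments, the method involves (i) simultaneously amplifying a plurality of polymorphic loci that includes at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci on genetic material from the alleged father to produce a first set of amplified products; (ii) simultaneously amplifying the corresponding plurality of polymorphic loci on a mixed sample of DNA originating from a blood sample from the pregnant mother to produce a second set of amplified products, wherein the mixed sample of DNA comprises fetal DNA and maternal DNA; (iii) determining on a computer the probability that the alleged father is the biological father of the fetus using genotypic measurements based on the first and second sets of amplified products; and (iv) establishing whether the alleged father is the biological father of the fetus using the determined probability that the alleged father is the biological father of the fetus. In various embodiments, the method further includes simultaneously amplifying the corresponding plurality of polymorphic loci on genetic material from the mother to produce a third set of amplified products, wherein the probability that the alleged father is the biological father of the fetus is determined using genotypic measurements based on the first, second, and third sets of amplified products.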
Exemplary Methods for Characterizing and Selecting Embryos
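The multiplex PCR methods of the invention can be used to improve the selection of embryos for in vitro fertilization by allowing thousands of target loci to be analyzed at once (see, e.g., U.S. Patent Publication No. 2011/0092763, filed May 27, 2008, and filed Dec. 22, 2011, which is hereby incorporated by reference in its entirety). For example, the multiplex PCR method can allow thousands of polymorphic loci (such as SNPs) to be analyzed for use in the PARENTAL SUPPORT algorithm described herein to select an embryo from a set of embryos for in vitro fertilization.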
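In some embodiments, the invention provides methods of estimating the relative likelihood that each embryo from a set of embryos will develop as desired. In some embodiments, the method involves contacting a sample from each embryo with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture for each embryo, wherein the samples are each derived from one or more cells from an embryo. In some embodiments, each reaction mixture is subjected to primer extension reaction conditions to produce amplified products. In some embodiments, the method includes determining on a computer one or more characteristics of at least one cell from each embryo based on the amplified products; and estimating on a computer the relative likelihood that each embryo will develop as desired, based on the one or more characteristics of the at least one cell for each embryo. In some embodiments, the method involves using an informatics based method, such as the PARENTAL SUPPORT algorithm described herein, to determine the at least one characteristic. In some embodiments, the characteristic includes a ploidy state. In some embodiments, the characteristic includes aneuploidy, euploidy, mosaicism, nullsomy, monosomy, uniparental disomy, trisomy, tetrasomy, a type of aneuploidy, unmatched copy error trisomy, matched copy error trisomy, maternal origin of aneuploidy, paternal origin of aneuploidy, the presence or absence of a disease-linked gene, the chromosomal identity of any aneuploid chromosome, an abnormal genetic condition, a deletion or duplication, the likelihood of a characteristic, and combinations thereof. The characteristic may be associated with a chromosome selected from the group consisting of chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, chromosome 17, chromosome 18, chromosome 19, chromosome 20, chromosome 21, chromosome 22, the X chromosome or the Y chromosome, and combinations thereof.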
Exemplary Prenatal Diagnostic Methods
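The multiplex PCR methods of the invention can be used to improve prenatal diagnostic methods, such as the determination of the ploidy status of fetal chromosomes. Given the large number of target loci that can be amplified simultaneously, more accurate determinations can be made.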
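In an embodiment, the present disclosure provides ex vivo methods for determining the ploidy status of a chromosome in a gestating fetus from genotypic data measured from a mixed sample of DNA (that is, DNA from the mother of the fetus and DNA from the fetus) and optionally from genotypic data measured from a sample of genetic material from the mother and possibly also from the father, wherein the determining is done by using a joint distribution model to create a set of expected allele distributions for different possible fetal ploidy states given the parental genotypic data, comparing the expected allelic distributions to the actual allelic distributions measured in the mixed sample, and choosing the ploidy state whose expected allelic distribution pattern most closely matches the observed allelic distribution pattern. In an embodiment, the mixed sample is derived from maternal blood, or from maternal serum or plasma. In an embodiment, the mixed sample of DNA may be preferentially enriched at target loci (for example, a plurality of polymorphic loci). In an embodiment, the preferential enrichment is done in a way that minimizes the allelic bias. In an embodiment, the present disclosure relates to a composition of DNA that has been preferentially enriched at a plurality of loci such that the allelic bias is low. In an embodiment, the allelic distribution(s) are measured by sequencing the DNA from the mixed sample. In an embodiment, the joint distribution model assumes that the alleles will be distributed in a binomial fashion. In an embodiment, the set of expected joint allele distributions is created for genetically linked loci while considering the extant recombination frequencies from various sources, for example using data from the International HapMap Consortium.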
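In an embodiment, the present disclosure provides methods for non-invasive prenatal diagnosis (NPD), specifically for determining the aneuploidy status of a fetus by observing allele measurements at a plurality of polymorphic loci in genotypic data measured on DNA mixtures, where certain allele measurements are indicative of an aneuploid fetus while other allele measurements are indicative of a euploid fetus. In an embodiment, the genotypic data is measured by sequencing DNA mixtures that were derived from maternal plasma. In an embodiment, the DNA sample may be preferentially enriched in molecules of DNA that correspond to the plurality of loci whose allele distributions are being calculated. In an embodiment, a sample of DNA comprising only or almost only genetic material from the mother, and possibly also a sample of DNA comprising only or almost only genetic material from the father, are measured. In an embodiment, the genetic measurements of one or both parents, along with the estimated fetal fraction, are used to create a plurality of expected allele distributions corresponding to different possible underlying genetic states of the fetus; the expected allele distributions may be termed hypotheses. In an embodiment, the maternal genetic data is not determined by measuring genetic material that is exclusively or almost exclusively maternal in nature; rather, it is estimated from the genetic measurements made on maternal plasma that comprises a mixture of maternal and fetal DNA. In some embodiments, the hypotheses may comprise the ploidy of the fetus at one or more chromosomes, which segments of which chromosomes in the fetus were inherited from which parents, and combinations thereof. In some embodiments, the ploidy state of the fetus is determined by comparing the observed allele measurements to the different hypotheses, where at least some of the hypotheses correspond to different ploidy states, and selecting the ploidy state that corresponds to the hypothesis that is most likely to be true given the observed allele measurements. In an embodiment, this method involves using allele measurement data from some or all measured SNPs, regardless of whether the loci are homozygous or heterozygous, and therefore does not involve using alleles only at loci that are heterozygous. This method may not be appropriate for situations where the genetic data pertains to only one polymorphic locus. This method is particularly advantageous when the genetic data comprises data for more than ten polymorphic loci for a target chromosome or more than twenty polymorphic loci. This method is especially advantageous when the genetic data comprises data for more than 50 polymorphic loci for a target chromosome, more than 100 polymorphic loci, or more than 200 polymorphic loci for a target chromosome. In some embodiments, the genetic data may comprise data for more than 500 polymorphic loci for a target chromosome, more than 1,000 polymorphic loci, more than 2,000 polymorphic loci, or more than 5,000 polymorphic loci for a target chromosome.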
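In an embodiment, a method disclosed herein yields a quantitative measure of the number of independent observations of each allele at a polymorphic locus. This is unlike most methods such as microarrays or qualitative PCR, which provide information about the ratio of two alleles but do not quantify the number of independent observations of either allele. With methods that provide quantitative information regarding the number of independent observations, typically only the ratio is utilized in ploidy calculations, while the quantitative information by itself is not used. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment 20 A alleles and 20 B alleles are observed; in a second experiment 200 A alleles and 200 B alleles are observed. In both experiments the ratio (A/(A+B)) is equal to 0.5; however, the second experiment conveys more information than the first about the certainty of the frequency of the A or B allele. Some methods by others involve averaging or summing allele ratios (channel ratios) (that is, x_i/y_i) from individual alleles and analyzing this ratio, either by comparing it to a reference chromosome or by using a rule pertaining to how this ratio is expected to behave in particular situations. No allele weighting is implied in such methods, where it is assumed that one can ensure about the same amount of PCR product for each allele and that all the alleles behave the same way. Such a method has a number of disadvantages and, more importantly, precludes the use of a number of improvements that are described elsewhere in this disclosure.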
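The 20/20 versus 200/200 comparison can be made concrete by attaching an uncertainty to each observed fraction. The sketch below uses the textbook binomial standard error as a simple stand-in; it is an added illustration, and the function name is hypothetical.

```python
from math import sqrt


def allele_fraction_with_se(a_count, b_count):
    """Observed A-allele fraction and its binomial standard error."""
    n = a_count + b_count
    p = a_count / n
    return p, sqrt(p * (1.0 - p) / n)


for a, b in [(20, 20), (200, 200)]:
    p, se = allele_fraction_with_se(a, b)
    print(f"A={a} B={b} fraction={p:.2f} standard error={se:.4f}")
```

Both experiments give a fraction of 0.5, but the standard error falls from about 0.079 with 40 observations to 0.025 with 400, which is exactly the information a ratio-only method discards.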
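In an embodiment, a method described herein explicitly models the allele frequency distributions expected in disomy as well as a plurality of allele frequency distributions that may be expected in cases of trisomy resulting from nondisjunction during meiosis I, nondisjunction during meiosis II, and/or nondisjunction during mitosis early in fetal development. To illustrate why this is important, imagine a case where there were no crossovers: nondisjunction during meiosis I would result in a trisomy in which two different homologs were inherited from one parent; in contrast, nondisjunction during meiosis II or during mitosis early in fetal development would result in two copies of the same homolog from one parent. Each scenario would result in different expected allele frequencies at each polymorphic locus and also at all loci considered jointly, due to genetic linkage. Crossovers, which result in the exchange of genetic material between homologs, make the inheritance pattern more complex; in an embodiment, the instant method accommodates this by using recombination rate information in addition to the physical distance between loci. In an embodiment, to enable improved distinction between meiosis I nondisjunction and meiosis II or mitotic nondisjunction, the instant method incorporates into the model an increasing probability of crossover as the distance from the centromere increases. Meiosis II and mitotic nondisjunction can be distinguished by the fact that mitotic nondisjunction typically results in identical or nearly identical copies of one homolog, while the two homologs present following a meiosis II nondisjunction event often differ due to one or more crossovers during gametogenesis.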
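In some embodiments, a method disclosed herein involves comparing the observed allele measurements to theoretical hypotheses corresponding to possible fetal genetic aneuploidy, and does not involve a step of quantitating a ratio of alleles at a heterozygous locus. Where the number of loci is lower than about 20, a ploidy determination made using a method comprising quantitating a ratio of alleles at a heterozygous locus and a ploidy determination made using a method comprising comparing the observed allele measurements to theoretical allele distribution hypotheses corresponding to possible fetal genetic states may give similar results. However, where the number of loci is above 50, these two methods are likely to give significantly different results; where the number of loci is above 400, above 1,000, or above 2,000, these two methods are very likely to give results that are increasingly significantly different. These differences arise because a method that comprises quantitating a ratio of alleles at a heterozygous locus without measuring the magnitude of each allele independently, and aggregating or averaging the ratios, precludes the use of techniques including using a joint distribution model, performing a linkage analysis, using a binomial distribution model, and/or other advanced statistical techniques, whereas a method comprising comparing the observed allele measurements to theoretical allele distribution hypotheses corresponding to possible fetal genetic states may use these techniques, which can substantially increase the accuracy of the determination.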
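In an embodiment, a method disclosed herein involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using a joint distribution model. The use of a joint distribution model is different from, and a significant improvement over, methods that determine heterozygosity rates by treating polymorphic loci independently, in that the resultant determinations are of significantly higher accuracy. Without being bound by any particular theory, it is believed that one reason they are of higher accuracy is that the joint distribution model takes into account the linkage between SNPs and the likelihood of crossovers having occurred during the meiosis that gave rise to the gametes that formed the embryo that grew into the fetus. The purpose of using the concept of linkage when creating the expected distribution of allele measurements for one or more hypotheses is that it allows the creation of expected allele measurement distributions that correspond to reality considerably better than when linkage is not used. For example, imagine that there are two SNPs, 1 and 2, located near one another, and the mother is A at SNP 1 and A at SNP 2 on homolog one, and B at SNP 1 and B at SNP 2 on homolog two. If the father is A for both SNPs on both homologs, and a B is measured for the fetus at SNP 1, this indicates that homolog two has been inherited by the fetus, and therefore that there is a much higher likelihood of a B being present in the fetus at SNP 2. A model that takes linkage into account can predict this, while a model that does not take linkage into account cannot. Alternately, if a mother is AB at SNP 1 and AB at nearby SNP 2, then two hypotheses corresponding to maternal trisomy at that location can be used: one involving a matching copy error (nondisjunction in meiosis II or mitosis in early fetal development), and one involving an unmatching copy error (nondisjunction in meiosis I). In the case of a matching copy error trisomy, if the fetus inherited AA from the mother at SNP 1, then the fetus is much more likely to inherit either AA or BB from the mother at SNP 2, but not AB. In the case of an unmatching copy error, the fetus would inherit AB from the mother at both SNPs. The allele distribution hypotheses made by a ploidy calling method that takes linkage into account can make these predictions, and therefore correspond to the actual allele measurements to a considerably greater extent than those of a ploidy calling method that does not take linkage into account. Note that a linkage approach is not possible when using a method that relies on calculating allele ratios and aggregating those allele ratios.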
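One reason it is believed that ploidy determinations made using a method that comprises comparing the observed allele measurements to theoretical hypotheses corresponding to possible fetal genetic states are of higher accuracy is that, when sequencing is used to measure the alleles, this method can glean more information from data for alleles where the total number of reads is low than other methods can; for example, a method that relies on calculating and aggregating allele ratios would produce disproportionately weighted stochastic noise. For example, imagine a case that involved measuring the alleles using sequencing, where there was a set of loci and only five sequence reads were detected for each locus. In an embodiment, for each of the alleles, the data may be compared to the hypothesized allele distribution and weighted according to the number of sequence reads; the data from these measurements would therefore be appropriately weighted and incorporated into the overall determination. This is in contrast to a method that involves quantitating a ratio of alleles at a heterozygous locus, since that method could only calculate ratios of 0%, 20%, 40%, 60%, 80%, or 100% as the possible allele ratios; none of these may be close to the expected allele ratios. In this latter case, the calculated allele ratios would either have to be discarded due to insufficient reads or else would have disproportionate weighting and introduce stochastic noise into the determination, thereby decreasing the accuracy of the determination. In an embodiment, the individual allele measurements may be treated as independent measurements, where the relationship between measurements made on alleles at the same locus is no different from the relationship between measurements made on alleles at different loci.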
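The five-read case can be sketched as follows. With five reads, a per-locus ratio can only take six values, whereas a likelihood summed over loci uses the counts directly. This is a simplified illustration that treats loci as independent binomial draws (so it ignores the linkage modeling described elsewhere); the function names, hypothesis fractions, and counts are hypothetical.

```python
from math import comb, log


def possible_ratios(n_reads):
    """Every allele ratio that can be observed with n_reads reads."""
    return [k / n_reads for k in range(n_reads + 1)]


def log_likelihood(observations, expected_fraction):
    """Sum of binomial log-likelihoods over loci.

    observations: list of (a_reads, total_reads) per locus.
    expected_fraction: hypothesized A fraction, strictly between 0 and 1.
    """
    total = 0.0
    for k, n in observations:
        total += (log(comb(n, k))
                  + k * log(expected_fraction)
                  + (n - k) * log(1.0 - expected_fraction))
    return total


print(possible_ratios(5))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# Hypothetical counts at eight loci, five reads each.
obs = [(3, 5), (2, 5), (3, 5), (4, 5), (3, 5), (3, 5), (2, 5), (4, 5)]
for fraction in (0.50, 0.60):
    print(fraction, log_likelihood(obs, fraction))
```

No single locus can report a ratio such as 0.55, yet the summed log-likelihood still discriminates between nearby hypotheses because each locus contributes in proportion to its read count.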
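In an embodiment, a method disclosed herein involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus without comparing any metrics to observed allele measurements on a reference chromosome that is expected to be disomic (termed the RC method). This is a significant improvement over methods, such as methods using shotgun sequencing, which detect aneuploidy by evaluating the proportion of randomly sequenced fragments from a suspect chromosome relative to one or more presumed disomic reference chromosomes. This RC method yields incorrect results if the presumed disomic reference chromosome is not actually disomic. This can occur in cases where the aneuploidy is more substantial than a trisomy of a single chromosome, or where the fetus is triploid and all autosomes are trisomic. In the case of a female triploid (69, XXX) fetus there are in fact no disomic chromosomes at all. The method described herein does not require a reference chromosome and would be able to correctly identify trisomic chromosomes in a female triploid fetus. For each chromosome, hypothesis, child fraction, and noise level, a joint distribution model may be fit without any of the following: reference chromosome data, an overall child fraction estimate, or a fixed reference hypothesis.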
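In an embodiment, a method disclosed herein demonstrates how observing allele distributions at polymorphic loci can be used to determine the ploidy state of a fetus with greater accuracy than methods in the prior art. In an embodiment, the method uses targeted sequencing to obtain mixed maternal-fetal genotypes and optionally maternal and/or paternal genotypes at a plurality of SNPs to first establish the various expected allele frequency distributions under the different hypotheses, and then observes the quantitative allele information obtained on the maternal-fetal mixture and evaluates which hypothesis fits the data best, where the genetic state corresponding to the hypothesis with the best fit to the data is called as the correct genetic state. In an embodiment, a method disclosed herein also uses the degree of fit to generate a confidence that the called genetic state is the correct genetic state. In an embodiment, a method disclosed herein involves using algorithms that analyze the distribution of alleles found for loci that have different parental contexts, and comparing the observed allele distributions to the expected allele distributions for different ploidy states for the different parental contexts (different parental genotypic patterns). This is different from and an improvement over methods that do not use methods that enable the estimation of the number of independent instances of each allele at each locus in a mixed maternal-fetal sample. In an embodiment, a method disclosed herein involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using observed allelic distributions measured at loci where the mother is heterozygous. This is different from and an improvement over methods that do not use observed allelic distributions at loci where the mother is heterozygous because, in cases where the DNA is not preferentially enriched or is preferentially enriched for loci that are not known to be highly informative for that particular target individual, it allows the use of about twice as much genetic measurement data from a set of sequence data in the ploidy determination, resulting in a more accurate determination.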
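In an embodiment, a method disclosed herein uses a joint distribution model that assumes that the allele frequencies at each locus are multinomial (and thus binomial when the SNPs are biallelic) in nature. In some embodiments the joint distribution model uses beta-binomial distributions. When using a measuring technique, such as sequencing, that provides a quantitative measure for each allele present at each locus, a binomial model can be applied to each locus, and the underlying allele frequencies and the confidence in those frequencies can be ascertained. With methods known in the art that generate ploidy calls from allele ratios, or methods in which quantitative allele information is discarded, the certainty in the observed ratio cannot be ascertained. The instant method is different from and an improvement over methods that calculate allele ratios and aggregate those ratios to make a ploidy call, since any method that involves calculating an allele ratio at a particular locus, and then aggregating those ratios, necessarily assumes that the measured intensities or counts that indicate the amount of DNA from any given allele or locus will be distributed in a Gaussian fashion. The method disclosed herein does not involve calculating allele ratios. In some embodiments, a method disclosed herein may involve incorporating the number of observations of each allele at a plurality of loci into a model. In some embodiments, a method disclosed herein may involve calculating the expected distributions themselves, allowing the use of a joint binomial distribution model, which may be more accurate than any model that assumes a Gaussian distribution of allele measurements. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution increases as the number of loci increases. For example, when fewer than 20 loci are interrogated, the likelihood that the binomial distribution model is significantly better is low. However, when more than 100, or especially more than 400, or especially more than 1,000, or especially more than 2,000 loci are used, the binomial distribution model will have a very high likelihood of being significantly more accurate than the Gaussian distribution model, thereby resulting in a more accurate ploidy determination. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution also increases as the number of observations at each locus increases. For example, when fewer than 10 distinct sequences are observed at each locus, the likelihood that the binomial distribution model is significantly better is low. However, when more than 50 sequence reads, or especially more than 100 sequence reads, or especially more than 200 sequence reads, or especially more than 300 sequence reads are used for each locus, the binomial distribution model will have a very high likelihood of being significantly more accurate than the Gaussian distribution model, thereby resulting in a more accurate ploidy determination.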
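For readers unfamiliar with the beta-binomial distribution mentioned above, a minimal stdlib sketch of its probability mass function is given below. It is a generic textbook formula, not the model parameterization used in the disclosure; the shape parameters `alpha` and `beta` and the example values are hypothetical.

```python
from math import comb, exp, lgamma, log


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial with n trials.

    Equivalent to a binomial whose success probability is itself
    drawn from Beta(alpha, beta), which allows for extra locus-to-locus
    variability (overdispersion) beyond a plain binomial.
    """
    return exp(log(comb(n, k))
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))


# Probability of seeing 100 A reads out of 200 when the expected
# fraction is 0.5, under weak and strong overdispersion.
print(beta_binomial_pmf(100, 200, 50.0, 50.0))
print(beta_binomial_pmf(100, 200, 5.0, 5.0))
```

Larger `alpha + beta` concentrates the underlying fraction and brings the distribution closer to a plain binomial; smaller values spread probability toward more extreme read counts.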
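In an embodiment, a method disclosed herein uses sequencing to measure the number of instances of each allele at each locus in a DNA sample. Each sequencing read may be mapped to a specific locus and treated as a binary sequence read; alternately, the probability of the identity of the read and/or the mapping may be incorporated as part of the sequence read, resulting in a probabilistic sequence read, that is, the probable whole or fractional number of sequence reads that map to a given locus. Using the binary counts or the probability of counts, it is possible to use a binomial distribution for each set of measurements, allowing a confidence interval to be calculated around the number of counts. This ability to use the binomial distribution allows more accurate ploidy estimations and more precise confidence intervals to be calculated. This is different from and an improvement over methods that use intensities to measure the amount of an allele present, for example methods that use microarrays, or methods that make measurements using fluorescence readers to measure the intensity of fluorescently tagged DNA in electrophoretic bands.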
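In an embodiment, a method disclosed herein uses aspects of the present data set to determine parameters for the estimated allele frequency distribution for that data set. This is an improvement over methods that utilize a training set of data or prior sets of data to set parameters for the present expected allele frequency distributions, or possibly expected allele ratios. This is because there are different sets of conditions involved in the collection and measurement of every genetic sample, and thus a method that uses data from the instant data set to determine the parameters for the joint distribution model that is to be used in the ploidy determination for that sample will tend to be more accurate.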
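In an embodiment, a method disclosed herein involves determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using a maximum likelihood technique. The use of a maximum likelihood technique is different from, and a significant improvement over, methods that use single hypothesis rejection techniques, in that the resultant determinations will be made with significantly higher accuracy. One reason is that single hypothesis rejection techniques set cut off thresholds based on only one measurement distribution rather than two, meaning that the thresholds are usually not optimal. Another reason is that the maximum likelihood technique allows the optimization of the cut off threshold for each individual sample instead of determining a cut off threshold to be used for all samples regardless of the particular characteristics of each individual sample. Another reason is that the use of a maximum likelihood technique allows the calculation of a confidence for each ploidy call. The ability to make a confidence calculation for each call allows a practitioner to know which calls are accurate and which are more likely to be wrong. In some embodiments, a wide variety of methods may be combined with a maximum likelihood estimation technique to enhance the accuracy of the ploidy calls. In an embodiment, the maximum likelihood technique may be used in combination with the method described in U.S. Pat. No. 7,888,017. In an embodiment, the maximum likelihood technique may be used in combination with the method of using targeted PCR amplification to amplify the DNA in the mixed sample followed by sequencing and analysis using a read counting method, such as that used by TANDEM DIAGNOSTICS as presented at the International Congress of Human Genetics 2011 in Montreal in 2011. In an embodiment, a method disclosed herein involves estimating the fetal fraction of DNA in the mixed sample and using that estimate to calculate both the ploidy call and the confidence of the ploidy call. This is different and distinct from methods that use the estimated fetal fraction only as a screen for sufficient fetal fraction, followed by a ploidy call made using a single hypothesis rejection technique that neither takes the fetal fraction into account nor produces a confidence calculation for the call.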
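A toy version of maximum likelihood calling with a per-call confidence is sketched below. It is a deliberately simplified stand-in: loci are treated as independent binomial draws, hypotheses are given equal prior weight, and the hypothesis names, expected fractions, and read counts are all hypothetical rather than taken from the disclosure.

```python
from math import comb, exp, log


def log_binom(k, n, p):
    """Binomial log-probability of k successes in n trials (0 < p < 1)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1.0 - p)


def call_hypothesis(observations, hypotheses):
    """Pick the hypothesis with the highest likelihood.

    observations: list of (a_reads, total_reads) per locus.
    hypotheses: {name: [expected A fraction per locus]}.
    Returns (best_name, confidence), where confidence is the
    normalized likelihood of the best hypothesis under equal priors.
    """
    scores = {
        name: sum(log_binom(k, n, p)
                  for (k, n), p in zip(observations, fractions))
        for name, fractions in hypotheses.items()
    }
    top = max(scores.values())
    weights = {name: exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


obs = [(100, 200), (98, 200), (103, 200)]
hyps = {"disomy": [0.50] * 3, "trisomy": [0.55] * 3}
print(call_hypothesis(obs, hyps))
```

Because every hypothesis is scored against the same data, the decision boundary adapts to each sample's read depth automatically, and the normalized likelihood gives the per-call confidence that a single fixed threshold cannot.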
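In an embodiment, a method disclosed herein takes into account the tendency for the data to be noisy and to contain errors by attaching a probability to each measurement. Using maximum likelihood techniques to choose the correct hypothesis from a set of hypotheses built on measurement data with attached probabilistic estimates makes it more likely that incorrect measurements will be discounted and that correct measurements will be used in the calculations that lead to the ploidy call. More precisely, the method systematically reduces the influence of incorrectly measured data on the ploidy determination. This is an improvement over methods in which all data are assumed to be equally correct, or in which outlying data are arbitrarily excluded from the calculations leading to a ploidy call. Existing methods that use channel ratio measurements claim to extend to multiple SNPs by averaging the individual SNP channel ratios. Failing to weight individual SNPs by their expected measurement variance, based on SNP quality and observed depth of read, reduces the accuracy of the resulting statistic and significantly reduces the accuracy of the ploidy call, especially in borderline cases.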
일 구현예에서, 본원에 개시된 방법은, 어느 SNP 또는 다른 다형성 유전자자리가 태아에서 이형접합성인지의 지식을 예측하지 않는다. 당해 방법은, 배수성 요청이, 부계 유전형 정보가 이용가능하지 않는 경우에서 이루어질 수 있도록 한다. 이는, 표적에 대한 유전자자리를 적절하게 선택하거나, 혼합된 태아/모계 DNA 시료에서 이루어진 유전적 측정을 해석하기 위하여 SNP가 이형접합성인 지식이 먼저 알려져야만 하는 방법보다 개선되어 있다.
본원에 기술된 방법은, 소량의 DNA가 이용가능하거나, 태아 DNA의 퍼센트가 낮은 경우의 시료에서 사용하는 경우에 특히 유리하다. 이는, 소량의 DNA만 이용가능한 경우 발생하는 상응하게 보다 높은 대립유전자 드롭아웃 비율 및/또는 태아 DNA의 퍼센트가 태아 및 모계 DNA의 혼합된 시료 속에서 낮은 경우 상응하게 더 높은 태아 대립유전자 드롭아웃 비율에 기인한다. 큰 퍼센트의 대립유전자가 표적 개체에 대해 측정되지 않았음을 의미하는, 높은 대립유전자 드롭아웃 비율은 불량하게 정밀한 태아 분획 계산, 및 불량하게 정밀한 배수체 측정을 초래한다. 본원에 개시된 방법은 SNP 사이의 유전 양식에 있어서 연관을 고려하는 결합 분포 모델을 사용하므로, 유의적으로 보다 정밀한 배수성 측정이 이루어질 수 있다. 본원에 기술된 방법은, 혼합물 속에서 태아 상태인 DNA의 분자의 퍼센트가 40% 미만, 30% 미만, 20% 미만, 10% 미만, 8% 미만, 및 심지어 6% 미만인 경우에 정밀한 배수체 측정이 이루어질 수 있도록 한다.
일 구현예에서, 개체의 DNA를 관련된 개체의 DNA와 합하는 경우 측정을 기반으로 개체의 배수성 상태를 측정하는 것이 가능하다. 일 구현예에서, DNA의 혼합물은 모계 혈장에서 발견된 자유로이 부유하는 DNA이며, 이는 공지된 핵형 및 공지된 유전형을 지닌 모친의 DNA를 포함할 수 있으며, 공지되지 않은 핵형 및 공지되지 않은 유전형을 지닌, 태아의 DNA와 혼합될 수 있다. 하나 또는 양쪽 부모의 공지된 유전형 정보를 사용하여 상이한 배수성 상태, 각 부모에서 태아까지의 상이한 염색체 분포, 그리고 선택적으로, 혼합물의 상이한 태아 DNA 분획에 대한 혼합된 시료의 DNA의 다수의 잠재적인 유전 상태를 예측하는 것이 가능하다. 각각의 잠재적인 조성은 가설로 언급될 수 있다. 이후에, 태아의 배수성 상태는 실제 측정을 찾아서, 어느 잠재적인 조성이 관찰된 데이터를 가장 잘 제공하는 가능성이 있는지를 측정함으로써 측정할 수 있다.
상기 의견의 추가 논의는 본 서류 어딘가에서 찾을 수 있다.
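Non-Invasive Prenatal Diagnosis (NPD)
The process of non-invasive prenatal diagnosis involves a number of steps. Some of the steps may include: (1) obtaining the genetic material from the fetus; (2) enriching, ex vivo, the genetic material of the fetus that may be in a mixed sample; (3) amplifying the genetic material, ex vivo; (4) preferentially enriching specific loci in the genetic material, ex vivo; (5) measuring the genetic material, ex vivo; and (6) analyzing the genotypic data, on a computer and ex vivo. Methods to reduce to practice these six and other relevant steps are described herein. At least some of the method steps are not directly applied on the body. In an embodiment, the present disclosure relates to methods of treatment and diagnosis applied to tissue and other biological materials isolated and separated from the body. At least some of the method steps are executed on a computer.
Some embodiments of the present disclosure allow a clinician to determine the genetic state of a fetus that is gestating in a mother in a non-invasive manner, so that the health of the baby is not put at risk by the collection of the genetic material of the fetus and the mother is not required to undergo an invasive procedure. Moreover, in certain aspects, the present disclosure allows the fetal genetic state to be determined with high accuracy, significantly greater accuracy than, for example, the non-invasive maternal serum analyte based screens, such as the triple test, that are in wide use in prenatal care.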
본원에 개시된 방법의 높은 정밀도는 본원에 기술된 바와 같은, 유전형 데이터의 분석에 대한 정보학 접근법의 결과이다. 현대의 기술적 진보는 고 배출 서열분석 및 유전형 배열과 같은 방법을 사용하여 유전 시료의 다량의 유전 정보를 측정하는 능력을 생성하여 왔다. 본원에 개시된 방법은, 임상의가 다량의 데이터를 이용가능하게 하는 보다 큰 장점을 취하도록 하며, 태아 유전 상태의 보다 정밀한 진단이 되도록 한다. 다수의 구현예의 세부사항이 하기 제공된다. 상이한 구현예는 상술한 단계의 상이한 조합을 포함할 수 있다. 상이한 단계의 상이한 구현예의 다양한 조합을 상호교환적으로 사용할 수 있다.
일 구현예에서, 혈액 시료를 임신한 모에서 취하여, 모체 기원의 DNA 및 태아 기원의 DNA 둘다의 혼합물을 함유하는, 모계 혈액의 혈장 속의 자유로이 부유하는 DNA를 분리하여 태아의 배수성 상태를 측정하는데 사용한다. 일 구현예에서, 본원에 개시된 방법은 다형성 대립유전자에 상응하는 DNA의 혼합물 속의 DNA 서열을, 대립유전자 비 및/또는 대립유전자 분포가 농축시 대부분 지속적으로 잔존하도록 하는 방식으로 우선적으로 농축시킴을 포함한다. 일 구현예에서, 본원에 개시된 방법은 고 효율의 표적화된 PCR 계 증폭을 포함함으로써, 매우 높은 퍼센트의 수득되는 분자가 표적화된 유전자자리에 상응하도록 한다. 일 구현예에서, 본원에 개시된 방법은 모체 기원의 DNA, 및 태아 기원의 DNA 둘 모두를 함유하는 DNA의 혼합물을 서열분석함을 포함한다. 일 구현예에서, 본원에 개시된 방법은 모에서 잉태중인 태아의 배수성 상태를 특정하기 위해 측정된 대립유전자 분포를 사용하는 단계를 포함한다. 일 구현예에서, 본원에 개시된 방법은 측정된 배수성 상태를 임상의에게 보고하는 단계를 포함한다. 일 구현예에서, 본원에 개시된 방법은 임상의 행동, 예를 들면, 융모막 융모 시료 채취 또는 양수 진단과 같은 후속적인 침입성 시험을 수행하는 단계, 삼염색체성 개체의 출생 또는 삼염색체성 태아의 임신 중절수술을 준비하는 단계를 포함한다.
본 출원은 2006년 11월 28일 출원된 미국 실용신안 출원 일련번호 제 11/603,406호(미국 공개 공보 제20070184467호); 2008년 3월 17일자로 출원된 미국 실용신안 출원 일련번호 제12/076,348호(미국 공개 공보 제20080243398호); 2009년 8월 4일자로 출원된, PCT 출원 일련번호 제PCT/US09/52730호(PCT 공보 제WO/2010/017214호); 2010년 9월 30일자로 출원된 PCT 출원 일련번호 제PCT/US10/050824호(PCT 공보 제WO/2011/041485호), 2011년 5월 18일자로 출원된 미국 실용신안 일련번호 제13/110,685호, 및 2012년 10월 3일자로 출원된 PCT 특허원 일련번호 제PCT/12/58578호를 참조하며, 이의 각각의 전문은 본원에 참조로 포함된다. 당해 출원에 사용된 어휘 중의 일부는 이들 참조문헌에서 이의 전례를 가질 수 있다. 본원에 기술된 개념 중의 일부는 이들 참조문헌에서 발견된 개념의 측면에서 보다 잘 이해될 것이다.
자유로이 부유하는 태아 DNA 를 포함하는 모계 혈액의 스크리닝
본원에 기술된 방법을 사용하여 자녀, 태아 또는 다른 표적 개체의 유전형을 측정하는데 도움을 줄 수 있으며, 여기서 표적의 유전 물질은 다른 유전 물질의 양의 존재로 발견된다. 일부 구현예에서 유전형은 하나 또는 다수의 염색체의 배수성 상태를 말할 수 있으며, 이는 하나 또는 다수의 질병 연결된 대립유전자, 또는 이의 일부 조합을 말할 수 있다. 당해 기재내용에서, 당해 논의는 태아의 유전 상태를 측정하는 것에 촛점을 맞추며, 여기서 태아 DNA는 모계 혈액에서 발견되지만, 당해 예는, 본 방법이 적용될 수 있는 가능한 개념으로 한정함을 의미하지 않는다. 또한, 당해 방법은, 표적 DNA의 양이 비-표적 DNA와 특정 비율로 존재하는 경우에 적용가능할 수 있으며; 예를 들면, 표적 DNA는 존재하는 DNA의 0.000001 내지 99.999999% 사이의 어디에서도 구성될 수 있다. 또한, 비-표적 DNA는, 관련 비-표적 개체(들)의 일부 또는 모두의 유전 데이터가 알려져 있는 한, 1명의 개체에서, 또는 심지어 관련 개체에서 반드시 기원할 필요는 없다. 일 구현예에서, 본원에 개시된 방법을 사용하여 태아 DNA를 함유하는 모계 혈액의 태아의 유전형 데이터를 측정할 수 있다. 이는 또한 임신한 여성의 자궁내에 다수의 태자녀 존재하는 경우에, 또는 다른 오염되는 DNA가, 예를 들면, 다른 이미 출생한 형제의 시료 속에 존재할 수 있는 경우 사용될 수 있다.
본 기술은 태반 융모를 통해 모계 순환에 접근하는 태아 혈액 세포의 현상을 이용할 수 있다. 정상적으로, 단지 소수의 태아 세포만이 이러한 양식으로 모계 순환에 도입된다(태아-모계 출혈을 위한 양성의 클라이하우어-베케 시험(Kleihauer-Betke test)을 생성하기에 충분하지 않음). 태아 세포는 다양한 기술에 의해 분류되고 분석되어 특정한 DNA 서열을 찾을 수 있지만, 침입성 과정이 고유하게 갖는 위험은 없다. 당해 기술은 또한 태반 조직의 세포자멸사 후 DNA 방출에 의한 모계 순환에 대한 접근을 획득하는 자유 부유하는 태아 DNA의 현상을 사용할 수 있으며, 여기서 문제의 태반 조직은 태아와 동일한 유전형의 DNA를 함유한다. 모계 혈장에서 발견된 자유로이 부유하는 DNA는 30 내지 40% 정도로 높은 태아 DNA의 비율로 태아 DNA를 함유하는 것으로 밝혀졌다.
일 구현예에서, 혈액은 임신한 여성에서 채혈할 수 있다. 연구는, 모계 혈액이 모체 기원의 자유로이 부유하는 DNA 외에, 태아의 소량의 자유로이 부유하는 DNA를 함유할 수 있음을 입증하여 왔다. 또한, 전형적으로 핵 DNA를 함유하지 않는, 모체 기원의 많은 혈액 세포 외에도, 태아 기원의 DNA를 포함하는 핵화된 태아 혈액 세포가 존재할 수 있다. 태아 DNA를 분리하거나 태아 DNA가 농축된 분획을 생성시키는 당해 분야에 공지된 많은 방법이 있다. 예를 들면, 크로마토그래피는 태아 DNA 속에 농축된 특정의 분획을 생성시킴을 입증한다.
비교적 비-침입성 방식으로 채취하여, 모계 DNA에 대해 이의 비율이 농축되어 있거나, 원래의 이의 비율인, 세포성 또는 자유 부유하는 태아 DNA의 양을 함유하는 모계 혈액, 혈장, 또는 다른 유액의 시료를 가지면, 당해 시료에서 발견된 DNA의 유전형을 분석할 수 있다. 일부 구현예에서, 혈액은 침을 사용하여 정맥, 예를 들면, 경정맥에서 혈액을 빼냄으로써 채혈할 수 있다. 본원에 기술된 방법을 사용하여 태아의 유전형 데이터를 측정할 수 있다. 예를 들면, 이는 하나 이상의 염색체에서 배수성 상태를 측정하는데 사용될 수 있으며, 삽입, 결실, 및 전좌를 포함하는, SNP 하나 또는 세트의 실체를 측정할 수 있다. 이는 하나 이상의 유전형의 특징의 기원인 부모를 포함하는, 하나 이상의 일배체형을 측정하는데 사용될 수 있다.
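Note that the method will work with any nucleic acids that can be used for any genotyping and/or sequencing method, such as the ILLUMINA INFINIUM ARRAY platform, AFFYMETRIX GENECHIP, ILLUMINA GENOME ANALYZER, or LIFE TECHNOLOGIES' SOLID SYSTEM. This includes extracted free-floating DNA from plasma or amplifications (e.g., whole genome amplification, PCR) of the same; and genomic DNA from other cell types (e.g., human lymphocytes from whole blood) or amplifications of the same. For preparation of the DNA, any extraction or purification method that generates genomic DNA suitable for one of these platforms will work as well. The method could work equally well with samples of RNA. In an embodiment, storage of the samples may be done in a way that will minimize degradation (e.g., below freezing, at about −20 °C, or at a lower temperature).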
부모 지지
일부 양태를 PARENTAL SUPPORT^TM(PS) 방법과 함께 사용할 수 있으며, 이의 구현예는 미국 특허원 제11/603,406호(미국 공보 제20070184467호), 미국 특허원 제12/076,348호(미국 공보 제20080243398호), 미국 특허원 제13/110,685호, PCT 특허원 제PCT/US09/52730호(PCT 공보 제WO/2010/017214호), 및 PCT 특허원 제PCT/US10/050824호(PCT 공보 제WO/2011/041485호)에 기술되어 있으며, 이들은, 전문이 본원에 참조로 포함된다. PARENTAL SUPPORT^TM은 유전 데이터를 분석하는데 사용될 수 있는 정보학 기반의 접근법이다. 일부 구현예에서, 본원에 개시된 방법은 PARENTAL SUPPORT^TM 방법의 일부인 것으로 고려될 수 있다. 일부 구현예에서, PARENTAL SUPPORT ^TM 방법은 표적 개체의 유전 데이터를 고 정밀도로, 당해 개체의 하나 또는 소수의 세포, 또는 표적 개체의 DNA 및 하나 또는 다수의 다른 개체의 DNA로 이루어진 DNA의 혼합물의 유전 데이터를 고 정밀도로 측정하거나, 질병-관련된 대립유전자, 목적한 다른 대립유전자, 및/또는 표적 개체에서 하나 또는 다수의 염색체의 배수성 상태를 특이적으로 측정하는데 사용될 수 있는 방법의 집합이다. PARENTAL SUPPORT^TM는 이들 방법들 중의 어느 것을 말할 수 있다. PARENTAL SUPPORT^TM는 정보학 기반의 방법의 예이다. PARENTAL SUPPORT^TM방법의 예시적인 구현예는 도 29 내지 도 31g에 나열되어 있으며 실시예 19에 기술되어 있다.
PARENTAL SUPPORT^TM 방법은 공지된 부모 유전 데이터, 즉, 모친 및/또는 부친의 일배체 및/또는 이배체 유전 데이터를 감수분열의 메카니즘의 지식 및 표적 DNA의 불완전한 측정의 지식, 및 가능하게는 하나 이상의 관련된 개체과 함께, 교차 빈도를 기반으로 하는 집단에 따라 사용하여, 주요 유전자자리의 위치에서 특정 표적 세포(들), 및 표적 DNA 또는 배아의 배수성 상태, 및/또는 다수의 대립유전자에서 유전형을 고도의 신뢰도로 인실리코 재작제한다. PARENTAL SUPPORT^TM 방법은 불량하게 측정된 단일 뉴클레오타이드 다형성(SNP)을 재작제할 수 있을 뿐 아니라, 전혀 측정되지 않았던 DNA의 삽입 및 결실, 및 SNP 또는 전체 영역을 재작제할 수 있다. 또한, PARENTAL SUPPORT^TM 방법은 단일 세포로부터 다수의 질병-연결된 유전자자리 및 또한 이수성에 대한 스크린 둘 모두를 측정할 수 있다. 일부 구현예에서, PARENTAL SUPPORT^TM 방법은 IVF 주기 동안에 생검된 배아의 하나 이상의 세포를 특성화하여 하나 이상의 세포의 유전 상태를 측정하는 데 사용될 수 있다.
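The PARENTAL SUPPORT™ method allows the cleaning of noisy genetic data. This may be done by inferring the correct genetic alleles in the target genome (embryo) using the genotype of related individuals (parents) as a reference. PARENTAL SUPPORT™ may be particularly relevant where only a small quantity of genetic material is available (e.g., PGD) and where direct measurements of the genotypes are inherently noisy due to the limited amounts of genetic material. PARENTAL SUPPORT™ may be particularly relevant where only a small fraction of the genetic material available is from the target individual (e.g., NPD) and where direct measurements of the genotypes are inherently noisy due to the contaminating DNA signal from another individual. The PARENTAL SUPPORT™ method is able to reconstruct highly accurate ordered diploid allele sequences on the embryo, together with the copy number of chromosome segments, even though the conventional, unordered diploid measurements may be characterized by high rates of allele dropout, drop-in, variable amplification biases and other errors. The method may employ both an underlying genetic model and an underlying model of measurement error. The genetic model may determine both allele probabilities at each SNP and crossover probabilities between SNPs. Allele probabilities may be modeled at each SNP based on data obtained from the parents, and crossover probabilities between SNPs may be modeled based on data obtained from the HapMap database, as developed by the International HapMap Project. Given the proper underlying genetic model and measurement error model, maximum a posteriori (MAP) estimation may be used, with modifications for computational efficiency, to estimate the correct, ordered allele values at each SNP in the embryo.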
일부 경우에, 위에서 요약한 기술은 개체에서 기원하는 극소량의 DNA를 제공하여 개체의 유전형을 측정할 수 있다. 이는 하나 또는 소수의 세포의 DNA일 수 있거나, 모계 혈액에서 발견된 소량의 태아 DNA에서 기원할 수 있다.
가설
본 기재내용과 관련하여, 가설은 가능한 유전 상태를 말한다. 이는 가능한 배수성 상태를 말할 수 있다. 이는 가능한 대립유전자 상태를 말할 수 있다. 가설의 세트는 가능한 유전 상태의 세트, 가능한 대립유전자 상태의 세트, 가능한 배수체 상태의 세트, 또는 이의 조합을 말할 수 있다. 일부 구현예에서, 가설들의 세트는, 그 가설들로부터 추정한 하나의 가설이 어떠한 특정 개체의 실제 유전 상태에 상응하도록 설계할 수 있다. 일부 구현예에서, 가설들의 세트를 설계하여 모든 가능한 유전 상태가 그 가설들 중 적어도 하나의 가설에 의해 기술될 수 있도록 할 수 있다. 본원의 일부 구현예에서, 방법의 하나의 측면은, 어느 가설이 문제의 개체가 실제 유전 상태에 상응하는지를 측정하는 것이다.
본원의 다른 구현예에서, 하나의 단계는 가설을 생성시킴을 포함한다. 일부 구현예에서, 이는 카피 수 가설일 수 있다. 일부 구현예에서 이는 관련된 개체 각각의 염색체의 어느 분절이, 경우에 따라, 다른 관련된 개체의 어느 분절에 상응하는지에 관한 가설을 포함할 수 있다. 가설을 생성하는 것은, 고려하에 있는 가능한 유전 상태의 전체 세트가 이들 변수에 의해 포함되도록 변수의 한계를 설정하는 작용을 말할 수 있다.
"배수성 가설", 또는 "배수성 상태 가설"로 또한 명명되는 "카피 수 가설"은 표적 개체에서 소정의 염색체 카피, 염색체 유형, 또는 염색체 단면에 대한 가능한 배수성 상태에 관한 가설을 말할 수 있다. 이는 또한 개체에서 염색체 유형 중 하나 이상의 배수성 상태를 말할 수 있다. 카피 수 가설의 세트는 가설의 세트를 말할 수 있으며, 여기서 각각의 가설은 개체에서 상이한 가능한 배수성 상태에 상응한다. 가설 세트는 가능한 배수성 상태의 세트, 가능한 부모 일배체형 분포의 세트, 혼합된 시료 속에서 가능한 태아 DNA 퍼센트의 세트, 또는 이의 조합에 관한 것일 수 있다.
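A normal individual contains one of each chromosome type from each parent. However, due to errors in meiosis and mitosis, it is possible for an individual to have 0, 1, 2, or more of a given chromosome type from each parent. In practice, it is rare to see more than two of a given chromosome from one parent. In this disclosure, some embodiments only consider the possible hypotheses where 0, 1, or 2 copies of a given chromosome come from a parent; it is a trivial extension to consider more or fewer possible copies originating from a parent. In some embodiments, for a given chromosome, there are nine possible hypotheses: the three possible hypotheses concerning 0, 1, or 2 chromosomes of maternal origin, multiplied by the three possible hypotheses concerning 0, 1, or 2 chromosomes of paternal origin. Let (m,f) refer to the hypothesis where m is the number of copies of a given chromosome inherited from the mother, and f is the number of copies of that chromosome inherited from the father. The nine hypotheses are therefore (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), and (2,2). These may also be written as H₀₀, H₀₁, H₀₂, H₁₀, H₁₁, H₁₂, H₂₀, H₂₁, and H₂₂. The different hypotheses correspond to different ploidy states. For example, (1,1) refers to a normal disomic chromosome; (2,1) refers to a maternal trisomy, and (0,1) refers to a paternal monosomy. In some embodiments, the case where two chromosomes are inherited from one parent and one chromosome is inherited from the other parent may be further differentiated into two cases: one where the two chromosomes are identical (matched copy error), and one where the two chromosomes are homologous but not identical (unmatched copy error). In these embodiments, there are sixteen possible hypotheses. It should be understood that other sets of hypotheses, and different numbers of hypotheses, may be used.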
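The (m,f) notation lends itself to direct enumeration. The following is a minimal sketch, assuming at most two copies per parent; the function names and the coarse labels returned by `ploidy_label` are illustrative choices rather than terminology fixed by the disclosure.

```python
from itertools import product

def copy_number_hypotheses(max_copies=2):
    """All (m, f) pairs: m copies from the mother, f copies from the father."""
    return list(product(range(max_copies + 1), repeat=2))

def ploidy_label(hypothesis):
    """Coarse ploidy label derived from the total copy number."""
    m, f = hypothesis
    total = m + f
    if total == 2 and 0 in (m, f):
        return "uniparental disomy"
    names = {0: "nullisomy", 1: "monosomy", 2: "disomy",
             3: "trisomy", 4: "tetrasomy"}
    return names[total]

hypotheses = copy_number_hypotheses()
for h in hypotheses:
    print(f"H{h[0]}{h[1]}: {ploidy_label(h)}")
```

Raising `max_copies` gives the "trivial extension" to more copies per parent mentioned above.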
본원의 일부 구현예에서, 배수성 가설은, 다른 관련된 개체의 어느 염색체가, 표적 개체가 게놈에서 발견된 염색체에 상응하는지에 관한 가설을 말한다. 일부 구현예에서, 당해 방법의 주요사항은, 관련된 개체가 일배체형 블록을 공유하는 것으로 예측될 수 있으며, 관련된 개체의 측정된 유전 데이터를, 표적 개체과 관련된 개체 사이의 일배체형 블록 일치의 지식과 함께 사용하여, 표적 개체에 대한 정확한 유전 데이터를 표적 개체의 유전적 측정 만을 사용하는 것보다 더 높은 신뢰도로 부여하는 것이 가능하다는 것이다. 이와 같이, 일부 구현예에서, 배수성 가설은 단지 다수의 염색체에 관한 것이 아니라, 표적 개체에서 하나 이상의 염색체를 사용하여, 관련된 개체에서 어느 염색체가 동일하거나, 거의 동일한지에 관한 것일 수 있다.
가설의 세트가 정의되면, 논리가 입력 유전 데이터에서 작동하는 경우, 이들은 고려하에 가설 각각에 대한 측정된 통계적 가능성을 출력할 수 있다. 다양한 가설의 확률은 하나 이상의 숙련된 기술, 알고리즘, 및/또는 본원의 어딘가에 기술된 방법 중 하나 이상에 의해 기술된 바와 같이, 입력으로서 관련된 유전 데이터를 사용하여, 각각의 다양한 가설, 확률이 동일한 값에 대해 수학적으로 계산함으로써 측정할 수 있다.
다수의 기술에 의해 측정된 것으로서, 상이한 가설의 확률이 추정되면, 이들을 합할 수 있다. 이는 각각의 가설에 대하여, 각각의 기술에 의해 측정된 것으로서의 확률을 곱함을 포함할 수 있다. 가설의 확률의 곱은 정규화될 수 있다. 하나의 배수성 가설이 염색체에 대한 하나의 가능한 배수성 상태를 말함에 주목한다.
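The process of "combining probabilities", also called "combining hypotheses" or combining the results of expert techniques, is a concept that should be familiar to one skilled in the art of linear algebra. One possible way to combine probabilities is as follows. When an expert technique is used to evaluate a set of hypotheses given a set of genetic data, the output of the method is a set of probabilities that are associated, in a one-to-one fashion, with each hypothesis in the set. When a set of probabilities determined by a first expert technique, each of which is associated with one of the hypotheses in the set, is combined with a set of probabilities determined by a second expert technique, each of which is associated with the same set of hypotheses, the two sets of probabilities are multiplied. This means that, for each hypothesis in the set, the two probabilities associated with that hypothesis are multiplied together, and the corresponding product is the output probability. This process may be expanded to any number of expert techniques. If only one expert technique is used, the output probabilities are the same as the input probabilities. If more than two expert techniques are used, the relevant probabilities may all be multiplied at the same time. The products may be normalized so that the probabilities of the hypotheses in the set sum to 100%.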
일부 구현예에서, 소정의 가설에 대해 합한 확률이 다른 가설의 어느 것에 대해 합한 확률보다 큰 경우, 당해 가설은 가장 높은 가능성이 있는 것으로 측정된 것으로 고려될 수 있다. 일부 구현예에서, 가설은 가장 가능성이 있는 것으로 측정될 수 있으며, 표준화된 확률이 한계보다 큰 경우 배수성 상태, 또는 다른 유전 상태가 요청될 수 있다. 일 구현예에서, 이는, 가설과 관련된 염색체의 수 및 실체가 배수성 상태로 요청될 수 있음을 의미할 수 있다. 일 구현예에서, 이는, 당해 가설과 관련된 대립유전자의 실체가 대립유전자 상태로서 요청될 수 있음을 의미한다. 일부 구현예에서, 한계는 약 50% 내지 약 80% 일 수 있다. 일부 구현예에서 한계는 약 80% 내지 약 90%일 수 있다. 일부 구현예에서 한계는 약 90% 내지 약 95%일 수 있다. 일부 구현예에서 한계는 약 95% 내지 약 99%일 수 있다. 일부 구현예에서 한계는 약 99% 내지 약 99.9%일 수 있다. 일부 구현예에서 한계는 약 99.9% 초과일 수 있다.
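The multiply-then-normalize rule and the threshold-based call described above can be sketched as follows. This is an illustration under simple assumptions: each technique reports a dictionary keyed by the same hypothesis names, and the probabilities and the 0.8 threshold in the example are made-up values.

```python
def combine_probabilities(*prob_sets):
    """Multiply per-hypothesis probabilities across techniques, then normalize."""
    hypotheses = prob_sets[0].keys()
    combined = {h: 1.0 for h in hypotheses}
    for probs in prob_sets:
        if probs.keys() != hypotheses:
            raise ValueError("all techniques must score the same hypotheses")
        for h in hypotheses:
            combined[h] *= probs[h]
    total = sum(combined.values())
    if total == 0:
        raise ValueError("every hypothesis has zero combined probability")
    return {h: p / total for h, p in combined.items()}

def call_hypothesis(probs, threshold):
    """Return the most likely hypothesis, or None if it does not exceed the threshold."""
    best = max(probs, key=probs.get)
    return best if probs[best] > threshold else None

# Hypothetical outputs of two techniques over the same three hypotheses.
technique_1 = {"H11": 0.6, "H21": 0.3, "H12": 0.1}
technique_2 = {"H11": 0.7, "H21": 0.2, "H12": 0.1}
combined = combine_probabilities(technique_1, technique_2)
print(combined, call_hypothesis(combined, threshold=0.8))
```

Returning `None` below the threshold corresponds to declining to make a call rather than reporting a low-confidence ploidy state.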
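Parental Context
The parental context refers to the genetic state of a given allele, on each of the two relevant chromosomes, for one or both of the two parents of the target. In an embodiment, the parental context does not refer to the allelic state of the target; rather, it refers to the allelic state of the parents. The parental context for a given SNP may consist of four base pairs, two paternal and two maternal; they may be the same as or different from one another. It is typically written as "m₁m₂|f₁f₂", where m₁ and m₂ are the genetic state of the given SNP on the two maternal chromosomes, and f₁ and f₂ are the genetic state of the given SNP on the two paternal chromosomes. In some embodiments, the parental context may be written as "f₁f₂|m₁m₂". The subscripts "1" and "2" refer to the genotype, at the given allele, of the first and second chromosome; note that the choice of which chromosome is labeled "1" and which is labeled "2" is arbitrary.
In this disclosure, A and B are often used to generically represent base pair identities; A or B could equally well represent C (cytosine), G (guanine), A (adenine) or T (thymine). For example, if, at a given SNP-based allele, the mother's genotype was T at that SNP on one chromosome and G at that SNP on the homologous chromosome, and the father's genotype at that allele is G at that SNP on both of the homologous chromosomes, one may say that the target individual's allele has the parental context AB|BB; it could also be said that the allele has the parental context AB|AA. Note that, in theory, any of the four possible nucleotides could occur at a given allele, and thus it is possible, for example, for the mother to have a genotype of AT and the father to have a genotype of GC at a given allele. However, empirical data indicate that in most cases only two of the four possible base pairs are observed at a given allele. It is possible, for example when using single tandem repeats, to have more than two parental contexts, more than four, and even more than ten contexts. In this disclosure the discussion assumes that only two possible base pairs will be observed at a given allele, although the embodiments disclosed herein could be modified to take into account the cases where this assumption does not hold.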
"부계 정보"는 동일한 부계 정보를 갖는 표적 SNP의 세트 또는 소세트를 말할 수 있다. 예를 들면, 표적 개체에서 소정의 염색체 상에 1000개의 대립유전자를 측정하려는 경우, 정보 AA|BB는 1,000개의 대립유전자의 그룹 중의 모든 대립유전자의 세트를 말할 수 있으며, 여기서 표적이 되는 모의 유전형은 동형접합체이고, 표적이 되는 부계 유전형은 동형접합체이지만, 여기서 모계 유전형 및 부계 유전형은 당해 유전자자리에서 같지 않다. 부계 자료가 단계적이지 않고, 따라서 AB = BA인 경우, 9개의 가능한 부모 관계가 존재한다: AA|AA, AA|AB, AA|BB, AB|AA, AB|AB, AB|BB, BB|AA, BB|AB, 및 BB|BB. 부모 데이터가 단계적이고, 따라서 AB≠BA인 경우, 16개의 상이한 가능한 부모 관계가 존재한다: AA|AA, AA|AB, AA|BA, AA|BB, AB|AA, AB|AB, AB|BA, AB|BB, BA|AA, BA|AB, BA|BA, BA|BB, BB|AA, BB|AB, BB|BA, 및 BB|BB. 성 염색체 상의 일부 SNP를 제외한 염색체 상의 모든 SNP 대립유전자는 이들 부모 관계 중 하나를 갖는다. 한 명의 부모에 대한 부모 관계가 이형접합성인 SNP의 세트는 이형접합성 정보로서 언급될 수 있다.
NPD에서 부모 관계의 용도
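Non-invasive prenatal diagnosis is an important technique that can be used to determine the genetic state of a fetus from genetic material that is obtained in a non-invasive manner, for example from a blood draw on the pregnant mother. The blood could be separated and the plasma isolated, followed by isolation of the plasma DNA. Size selection could be used to isolate DNA of the appropriate length. The DNA may be preferentially enriched at a set of loci. This DNA can then be measured by a number of means, such as by hybridizing to a genotyping array and measuring the fluorescence, or by sequencing on a high throughput sequencer.
When sequencing is used for ploidy calling of a fetus in the context of non-invasive prenatal diagnosis, there are a number of ways to use the sequence data. The most common way one could use the sequence data is simply to count the number of reads that map to a given chromosome. For example, imagine trying to determine the ploidy state of chromosome 21 in the fetus. Further imagine that the DNA in the sample is comprised of 10% DNA of fetal origin and 90% DNA of maternal origin. In this case, one could look at the average number of reads on a chromosome that can be expected to be disomic, for example chromosome 3, and compare that to the number of reads on chromosome 21, where the reads are adjusted for the number of base pairs on that chromosome that are part of a unique sequence. If the fetus were euploid, one would expect the amount of DNA per unit of genome to be about equal at all locations (subject to stochastic variations). On the other hand, if the fetus were trisomic at chromosome 21, one would expect there to be slightly more DNA per genetic unit from chromosome 21 than from the other locations on the genome. Specifically, one would expect there to be about 5% more DNA from chromosome 21 in the mixture. When sequencing is used to measure the DNA, one would expect about 5% more uniquely mappable reads from chromosome 21 per unique segment than from the other chromosomes. One could use the observation of an amount of DNA from a particular chromosome that is higher than a certain threshold, when adjusted for the number of sequences that are uniquely mappable to that chromosome, as the basis for an aneuploidy diagnosis. Another method that may be used to detect aneuploidy is similar to the one above, except that parental contexts are taken into account.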
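The "about 5%" figure follows from simple mixture arithmetic, sketched here under the assumption that the mother is disomic and that read depth is proportional to copy number; the function name is hypothetical.

```python
def expected_read_ratio(fetal_fraction, fetal_copies, maternal_copies=2):
    """Expected DNA per genome unit on a chromosome, relative to a disomic reference.

    Each unit of sample holds (1 - f) maternal and f fetal DNA, where f is
    the fetal fraction; a disomic reference contributes two copies from each.
    """
    if not 0 <= fetal_fraction <= 1:
        raise ValueError("fetal_fraction must lie in [0, 1]")
    mixture_copies = ((1 - fetal_fraction) * maternal_copies
                      + fetal_fraction * fetal_copies)
    return mixture_copies / 2.0

# 10% fetal DNA and a fetus trisomic at chromosome 21.
print(expected_read_ratio(0.10, fetal_copies=3))
```

With a 10% fetal fraction the trisomic chromosome carries (0.9 × 2 + 0.1 × 3) / 2 = 1.05 times the reference amount, and the excess shrinks in proportion as the fetal fraction drops.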
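When considering which alleles to target, one may take into account that some parental contexts are likely to be more informative than others. For example, AA|BB and the symmetric context BB|AA are the most informative contexts, because the fetus is known to carry an allele that is different from the mother. For reasons of symmetry, both the AA|BB and BB|AA contexts may be referred to as AA|BB. Another set of informative parental contexts is AA|AB and BB|AB, because in these cases the fetus has a 50% chance of carrying an allele that the mother does not have. For reasons of symmetry, both the AA|AB and BB|AB contexts may be referred to as AA|AB. A third set of informative parental contexts is AB|AA and AB|BB, because in these cases the fetus is carrying a known paternal allele, and that allele is also present in the maternal genome. For reasons of symmetry, both the AB|AA and AB|BB contexts may be referred to as AB|AA. A fourth parental context is AB|AB, where the fetus has an unknown allelic state and, whatever the allelic state, it is one in which the mother has the same alleles. The fifth parental context is AA|AA, where the mother and the father are both homozygous for the same allele.
Different Implementations of the Presently Disclosed Embodiments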
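The symmetry argument can be checked mechanically. The sketch below assumes unphased genotypes written as two-letter strings over the alleles A and B; it collapses the nine mother|father contexts into the five classes named above and computes the chance that the fetus inherits a paternal allele the mother lacks. Both function names are hypothetical.

```python
from itertools import product

SWAP = str.maketrans("AB", "BA")

def canonical_context(mother, father):
    """Name a context up to relabeling of the A and B alleles."""
    def unphased(genotype):
        return "".join(sorted(genotype))
    direct = f"{unphased(mother)}|{unphased(father)}"
    swapped = (f"{unphased(mother.translate(SWAP))}"
               f"|{unphased(father.translate(SWAP))}")
    return min(direct, swapped)

def prob_non_maternal_allele(mother, father):
    """Chance that the paternally inherited allele is absent from the mother."""
    return sum(allele not in mother for allele in father) / len(father)

genotypes = ["AA", "AB", "BB"]
classes = sorted({canonical_context(m, f)
                  for m, f in product(genotypes, repeat=2)})
for context in classes:
    m, f = context.split("|")
    print(context, prob_non_maternal_allele(m, f))
```

Only the AA|BB and AA|AB classes give a non-zero probability, which matches their description as the most informative contexts.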
표적 개체의 배수성 상태를 측정하는 방법이 본원에 기재되어 있다. 표적 개체는 난할구, 배아, 또는 태아일 수 있다. 본원의 일부 구현예에서, 표적 개체의 하나 이상의 염색체에서 배수성 상태를 측정하는 방법은 당해 문서에 기술된 단계들 중 어느 것, 및 이의 조합을 포함할 수 있다.
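In some embodiments, the source of the genetic material to be used in determining the genetic state of the fetus may be fetal cells, such as nucleated fetal red blood cells, isolated from the maternal blood. The method may involve obtaining a blood sample from the pregnant mother. The method may involve isolating a fetal red blood cell using visual techniques, based on the idea that a certain combination of colors is uniquely associated with nucleated red blood cells, and that a similar combination of colors is not associated with any other cell present in the maternal blood. The combination of colors associated with nucleated red blood cells may include the red color of the hemoglobin around the nucleus, which may be made more distinct by staining, and the color of the nuclear material, which can be stained, for example, blue. By isolating the cells from maternal blood and spreading them over a slide, and then identifying those points at which one sees both red (from the hemoglobin) and blue (from the nuclear material), one may be able to identify the location of nucleated red blood cells. One may then extract those nucleated red blood cells using a micromanipulator, and use genotyping and/or sequencing techniques to measure aspects of the genotype of the genetic material in those cells.
In an embodiment, one may stain the nucleated red blood cell with a dye that fluoresces only in the presence of fetal hemoglobin and not maternal hemoglobin, and so remove the ambiguity as to whether a nucleated red blood cell is derived from the mother or the fetus. Some embodiments of the present disclosure may involve staining or otherwise marking genetic material. Some embodiments of the present disclosure may involve specifically marking fetal nuclear material using fetal cell specific antibodies.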
모계 혈액에서 태아 세포를 분리하거나, 모계 혈액에서 태아 DNA를 분리하거나, 모계 유전 물질의 존재하에서 태아 유전 물질의 시료를 농축시키는 많은 다른 방법이 존재한다. 이들 방법들 중 일부는 본원에 나열되어 있지만, 이것이 전적인 목록인 것으로 의도되지는 않는다. 일부 적절한 기술이 편의를 위해 본원에 나열되어 있다: 형광적으로 또는 달리는 태그된 항체의 사용, 크기 배제 크로마토그래피, 자기적으로 또는 달리는 표지된 친화성 태그, 특이적인 대립유전자에서 모와 태아 세포 사이의 차등적인 메틸화와 같은 후생적 차이, 밀도 구배 원심분리에 이은 CD45/14 음성-세포의 CD45/14 고갈 및 CD71-양성 선택, 삼투질 농도가 상이한 단일 또는 이중 퍼콜 구배(Percoll gradient), 또는 갈락토즈 특이적인 렉틴 방법.
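In an embodiment of the present disclosure, the target individual is a fetus, and the different genotype measurements are made on a plurality of DNA samples from the fetus. In some embodiments, the fetal DNA samples are from isolated fetal cells, where the fetal cells may be mixed with maternal cells. In some embodiments, the fetal DNA samples are from free-floating fetal DNA, where the fetal DNA may be mixed with free-floating maternal DNA. In some embodiments, the fetal DNA samples may be derived from maternal plasma or maternal blood that contains a mixture of maternal DNA and fetal DNA. In some embodiments, the fetal DNA may be mixed with maternal DNA in maternal:fetal ratios ranging from 99.9:0.1% to 99:1%; 99:1% to 90:10%; 90:10% to 80:20%; 80:20% to 70:30%; 70:30% to 50:50%; 50:50% to 10:90%; 10:90% to 1:99%; or 1:99% to 0.1:99.9%.
The genetic data of the target individual and/or of the related individual can be transformed from a molecular state to an electronic state by measuring the appropriate genetic material using tools and/or techniques taken from a group including, but not limited to, genotyping microarrays and high throughput sequencing. Some high throughput sequencing methods include Sanger DNA sequencing, pyrosequencing, the ILLUMINA SOLEXA platform, ILLUMINA's GENOME ANALYZER, APPLIED BIOSYSTEM's 454 sequencing platform, HELICOS's TRUE SINGLE MOLECULE SEQUENCING platform, HALCYON MOLECULAR's electron microscope sequencing method, or any other sequencing method. All of these methods physically transform the genetic data stored in a sample of DNA into a set of genetic data that is typically stored in a memory device en route to being processed.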
관련된 개체의 유전 데이터는 개체의 거대 이배체 조직, 개체의 하나 이상의 이배체 세포, 개체의 하나 이상의 반수체 세포, 표적 개체의 난할구, 개체에서 발견된 세포외 유전 물질, 모친 혈액에서 발견된 개체의 세포외 유전 물질, 모친 혈액에서 발견된 개체의 세포, 관련된 개체의 생식세포(들)로부터 생성된 하나 이상의 배아, 이러한 배아에서 채취한 하나 이상의 난할구, 관련된 개체에서 발견된 세포외 유전 물질, 관련 개체에서 기원하는 것으로 알려진 유전 물질, 및 이의 조합을 포함하나, 이에 한정되지 않는 물질을 분석함으로써 측정할 수 있다.
일부 구현예에서, 적어도 하나의 배수성 상태 가설의 세트를 표적 개체의 목적한 염색체 형 각각에 대해 생성시킬 수 있다. 배수성 상태 가설 각각은 표적 개체의 염색체 또는 염색체 분절의 한가지 가능한 배수성 상태를 말할 수 있다. 이러한 가설의 세트는, 표적 개체의 염색체가 가지는 것으로 예측될 수 있는 가능한 배수성 상태 중의 일부 또는 모두를 포함할 수 있다. 가능한 배수성 상태 중의 일부는 결염색체성, 일염색체성, 이염색체성, 편친 이염색체성(uniparental disomy), 정배수성, 삼염색체성, 일치하는 삼염색체성, 일치하지 않는 삼염색체성, 모계 삼염색체성, 부계 삼염색체성, 사염색체성, 균형을 이룬 (2:2) 사염색체성, 균형을 이루지 않은(3:1) 사염색체성, 오염색체성, 육염색체성, 다른 이수성, 및 이의 조합을 포함할 수 있다. 이들 이수성 상태 중 어느 것도 균형을 이루지 않은 전좌, 균형을 이룬 전좌, 로벌슨 전좌(Robertsonian translocation), 재조합, 결실, 삽입, 교차, 및 이의 조합과 같은 혼합되거나 부분적인 이수성일 수 있다.
일부 구현예에서, 측정된 배수성 상태의 지식을 사용하여 임상 결정을 할 수 있다. 전형적으로 기억 장치 속에 물질의 물리적 배열로 저장된 당해 지식은 이후에 보고서로 전환될 수 있다. 이후에 보고서를 실행할 수 있다. 예를 들면, 임상 결정은 임신을 종결시키도록 이루어질 수 있으며; 달리는, 임상 결정은 임신을 지속시키기 위해 이루어질 수 있다. 일부 구현예에서, 임상 결정은 유전 질환의 표현형적 표시의 심각성을 감소시키도록 설계된 중재, 또는 장애아에 대해 준비하기 위한 관련 단계를 취하는 결정을 포함할 수 있다.
본원의 일 구현예에서, 본원에 개시된 방법 중 어느 것도 변형시켜 다수의 표적이 동일한 표적 개체, 예를 들면, 동일한 임신한 모친에서 채혈한 다수의 혈액에서 기원할 수 있게 할 수 있다. 이는, 다수의 유전적 측정이, 이로써 표적 유전형이 측정될 수 있는 보다 많은 데이터를 제공할 수 있으므로, 모델의 정확성을 개선시킬 수 있다. 일 구현예에서, 1개 세트의 표적 유전 데이터는 보고된 1차 데이터로서 제공되고, 다른 것은 1차 표적 유전 데이터를 이중-점검하기 위한 데이터로서 제공된다. 일 구현예에서, 각각 표적 개체에서 취한 유전 물질로부터 측정된, 복수 세트의 유전 데이터는 평행인 것으로 간주되므로, 표적 유전자 데이터의 세트 둘 모두는 고정밀도로 측정된, 모계 유전 데이터의 어느 단면이 태아 게놈을 구성하는지를 측정하는 데 도움을 주는 역할을 한다.
일 구현예에서, 당해 방법은 부계 시험의 목적으로 사용될 수 있다. 예를 들면, 모친의, 그리고 유전적 부친일 수 있거나 부친이 아닐 수 있는 남성의 SNP-기반 유전형 정보, 및 혼합된 시료로부터 측정된 유전형 정보를 제공함으로써, 또한 남성의 유전형 정보가 잉태된 태아의 실제 유전적 부친을 나타내는지를 판별하는 것이 가능하다. 이를 수행하는 단순한 방법은, 모친이 AA이고, 가능한 부친이 AB 또는 BB인 정보를 단순히 관찰하는 것이다. 이 경우, 부계 기여 1/2(AA|AB) 또는 모든(AA|BB) 시간을 각각 알아보는 것을 예측할 수 있다. 예측된 ADO를 고려하여, 관찰된 부계 SNP가 가능한 부친의 것과 관련되어 있는지를 측정하는 것이 간단하다.
본원의 일 구현예는 다음과 같을 수 있다: 임신한 여성이, 그녀의 태자녀 다운증후군으로 고생하고/하거나 낭성 섬유증으로 고생할 것인지 알기를 원하는 경우, 및 그녀가 이들 상태 중의 어느 것으로 고생할 자녀를 낳기를 원치 않는 경우. 의사는 그녀의 혈액을 채취하여, 헤모들로빈을 하나의 마커로 염색함으로써 이것이 명확하게 적색으로 보이고, 핵 물질을 다른 마커로 염색함으로써 이것이 명확이 청색으로 보이도록 한다. 모계 적혈구가 전형적으로 무핵이나, 고 비율의 태아 세포가 핵을 함유한 것을 안다면, 의사는 적색 및 청색 둘다를 나타내는 세포를 확인함으로써 핵화된 적혈구의 수를 가시적으로 분리할 수 있다. 의사는 미세조작기로 슬라이드로부터 이들 세포를 집어올려 이들을 증폭시켜 10개의 개개 세포로 유전형 분석하는 실험실로 보낸다. 유전적 측정을 사용함으로써, PARENTAL SUPPORT^TM 방법은, 10개 세포중 6개가 모계 혈액 세포이고, 10개 세포 중 4개가 태아 세포임을 측정할 수 있다. 자녀가 이미 임신한 모친으로부터 태어난 경우, PARENTAL SUPPORT^TM을 또한 사용하여, 태아 세포 상의 신뢰할 수 있는 대립유전자 요청을 이루고 이들이 태어난 자녀의 것과 유사하지 않음을 나타냄으로써 태아 세포가 태어난 자녀의 세포와 구별됨을 판별할 수 있다. 당해 방법은, 본원의 부모 시험 구현예에 대한 개념과 유사함에 주목한다. 태아 세포로부터 측정된 유전 데이터는, 많은 대립유전자 드롭 아웃을 포함하므로, 단일 세포를 유전형분석하는데 곤란성으로 인하여, 품질이 매우 불량할 수 있다. 임상의는 측정된 태아 DNA를 부모의 신뢰가능한 DNA 측정과 함께 사용하여 태아 게놈의 측면을 높은 정밀도로 PARENTAL SUPPORT^TM을 사용하여 부여할 수 있기 때문에, 태아의 유전 물질에 함유된 유전 데이터를 태아의 예측된 유전 상태로 전환시켜, 컴퓨터 상에 저장할 수 있다. 임상의는 태아의 배수성 상태, 및 목적한 질병-연결된 유전자 다수의 존재 또는 부재 둘 모두를 측정할 수 있다. 태자녀 정배수성이고, 낭성 섬유종에 대한 매개체가 아닌 것으로 밝혀지면, 모는 임신을 지속하는 것을 결정한다.
본원의 일 구현예에서, 임신한 모는, 그녀의 태자녀 어떠한 전체 염색체 비정상으로 고생하는지를 측정하기를 원할 수 있다. 그녀는 그녀의 의사한테 가서, 그녀의 혈액을 제공하고, 그녀 및 그녀의 남편이 그들의 볼 면봉에서 채취한 DNA를 제공한다. 실험실 연구자는 MDA 프로토콜을 사용하여 부모 DNA를 유전형분석함으로써 부모 DNA를 증폭하고, ILLUMINA INFINIUM 배열로 다수의 SNP에서 부모의 유전 데이터를 측정한다. 이후에, 연구자는 혈액을 스핀 다운(spin down)시켜, 혈장을 취하고, 크기 배제 크로마토그래피를 사용하여 자유로이-부유하는 DNA의 시료를 분리한다. 달리는, 연구자는 태아 헤모글로빈에 대해 특이적인 것과 같은, 하나 이상의 형광성 항체를 사용하여 핵화된 태아 적혈구를 분리한다. 연구자는 이후에, 분리되거나 농축된 태아 유전 물질을 취하고 이를 각각의 올리고뉴클레오타이드의 2개 말단이 표적 대립유전자의 한쪽 측면에서 플랭킹 서열에 상응하도록 적절히 설계된 70-머 올리고뉴클레오타이드의 라이브러리를 사용하여 증폭시킨다. 폴리머라제, 리가제, 및 적절한 시약의 첨가 시, 올리고뉴클레오타이드는 갭-충전, 순환화, 목적한 대립유전자의 포획을 겪는다. 엑소뉴클레아제를 가하고, 열-불활성화시키고, 생성물을 PCR 증폭용 주형으로 직접 사용하였다. PCR 생성물을 ILLUMINA GENOME ANALYZER 상에서 서열분석하였다. 서열 판독물을 이후, 태아의 배수성 상태를 예측하는, PARENTAL SUPPORT^TM 방법을 위한 입력으로 사용하였다.
다른 구현예에서, 모가 임신중이고, 산모 연령이 많은 한쌍이, 잉태중인 태자녀 다운 증후군, 터너증후군, 프라더 윌리 증후군(Prader Willi syndrome), 또는 일부 다른 전체 염색체 비정상을 가지고 있는지를 알고 싶어한다. 산부인과 전문의는 모친 및 부친에서 혈액을 채혈한다. 혈액은 실험실로 보내져서, 여기서 기술자들은 모계 시료를 원심분리하여 혈장 및 연막으로 분리한다. 연막 속의 DNA 및 부계 혈액 시료를 증폭을 통해 전환시키고, 증폭된 유전 물질 속에 코드화된 유전 데이터를 분자 저장된 유전 데이터로부터 전자적으로 저장된 유전 데이터로 고배출 서열분석기상에서 유전 물질을 작동시켜 추가로 전환시킴으로써 부모계 유전형을 측정한다. 혈장 시료를 5,000-플렉스 반-내포된 표적화된 PCR 방법을 사용하여 유전자자리의 세트에 우선적으로 농축시킨다. DNA 단편의 혼합물을 서열분석에 적합한 DNA 라이브러리로 제조한다. 이후에 DNA를 고 배출 서열분석 방법, 예를 들면, ILLUMINA GAIIx GENOME ANALYZER를 사용하여 서열분석한다 서열분석은 DNA 속에 분자적으로 인코딩된 정보를 컴퓨터 하드웨어 속에 전기적으로 인코딩된 정보로 전환시킨다. 현재 개시된 구현예를 포함하는 정보학 기반 기술, 예를 들면, PARENTAL SUPPORT^TM을 사용하여 태아의 배수성 상태를 측정한다. 이는 컴퓨터 상에서, 제조된 시료 상에서 이루어진 DNA 측정에 의하여 다수의 다형성 유전자자리에서 대립유전자 수 확률을 계산하고; 컴퓨터상에서 각각 염색체의 상이한 가능한 배수성 상태에 관한 다수의 배수성 가설을 생성하며; 컴퓨터상에서 각각의 배수성 가설에 대한 염색체 상의 다수의 다형성 유전자자리에서 예측된 대립유전자 수에 대한 결합 분포 모델을 구축하고; 컴퓨터 상에서 결합 분포 모델 및 제조된 시료에서 측정된 대립유전자 수를 사용하여 각각의 배수성 가설의 상대적인 확률을 측정하며; 최대 확률을 갖는 가설에 상응하는 배수성 상태를 선택함으로써 태아의 배수성 상태를 요청하는 단계를 포함할 수 있다. 태아는 다운 증후군을 갖는 것으로 측정된다. 보고서를 출력하거나, 임신한 여성의 산부인과 전문의에게 전기적으로 송신하며, 전문의는 여성에게 진단을 내린다. 여성, 그녀의 남편, 및 의사는 앉아서 이들의 의견을 논의한다. 그 쌍은, 태자녀 삼염색체성 상태로 고생한다는 지식을 기반으로 임신을 중단할 것을 결정한다.
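In an embodiment, a company may decide to offer a diagnostic technology designed to detect aneuploidy in a gestating fetus from a maternal blood draw. Their product may involve a mother presenting to her obstetrician, who may draw her blood. The obstetrician may also collect a genetic sample from the father of the fetus. A clinician may isolate the plasma from the maternal blood and purify the DNA from the plasma. A clinician may also isolate the buffy coat layer from the maternal blood and prepare the DNA from the buffy coat. A clinician may also prepare the DNA from the paternal genetic sample. The clinician may use molecular biology techniques described in this disclosure to append universal amplification tags to the DNA derived from the plasma sample. The clinician may amplify the universally tagged DNA. The clinician may preferentially enrich the DNA by a number of techniques, including capture by hybridization and targeted PCR. The targeted PCR may involve nesting, hemi-nesting or semi-nesting, or any other approach that results in efficient enrichment of the plasma-derived DNA. The targeted PCR may be massively multiplexed, for example with 10,000 primers in one reaction volume, where the primers target SNPs on chromosomes 13, 18, 21, X, and those loci that are common to both X and Y, and optionally on other chromosomes as well. The selective enrichment and/or amplification may involve tagging each individual molecule with different tags, molecular barcodes, tags for amplification, and/or tags for sequencing. The clinician may then sequence the plasma sample, and possibly also the prepared maternal and/or paternal DNA. The molecular biology steps may be executed either wholly or partly by a diagnostic box. The sequence data may be fed into a single computer, or into another type of computing platform such as may be found in "the cloud". The computing platform may calculate allele counts at the targeted polymorphic loci from the measurements made by the sequencer. The computing platform may create a plurality of ploidy hypotheses pertaining to nullisomy, monosomy, disomy, matched trisomy, and unmatched trisomy for each of chromosomes 13, 18, 21, X and Y. The computing platform may build a joint distribution model for the expected allele counts at the targeted loci on the chromosome for each ploidy hypothesis, for each of the five chromosomes being interrogated. The computing platform may determine a probability that each of the ploidy hypotheses is true using the joint distribution model and the allele counts measured on the preferentially enriched DNA derived from the plasma sample. The computing platform may call the ploidy state of the fetus, for each of chromosomes 13, 18, 21, X and Y, by selecting the ploidy state corresponding to the germane hypothesis with the greatest probability. A report comprising the called ploidy states may be generated, and it may be sent to the obstetrician electronically, displayed on an output device, or delivered to the obstetrician as a printed hard copy. The obstetrician may inform the mother and optionally the father of the fetus, and they may decide which clinical options are open to them, and which is most desirable.
In another embodiment, a pregnant woman, hereafter referred to as "the mother", may decide that she wants to know whether or not her fetus is carrying any genetic abnormalities or other conditions. She may want to ensure that there are no gross abnormalities before she is confident to continue the pregnancy. She may go to her obstetrician, who may take a sample of her blood. He may also take a genetic sample, such as a buccal swab, from her cheek. He may also take a genetic sample from the father of the fetus, such as a buccal swab, a sperm sample, or a blood sample. He may send the samples to a clinician. The clinician may enrich the fraction of free floating fetal DNA in the maternal blood sample. The clinician may enrich the fraction of nucleated fetal blood cells in the maternal blood sample. The clinician may use various aspects of the methods described herein to determine genetic data of the fetus. That genetic data may include the ploidy state of the fetus, and/or the identity of one or a number of disease linked alleles in the fetus. A report may be generated summarizing the results of the prenatal diagnosis. The report may be transmitted or mailed to the doctor, who may tell the mother the genetic state of the fetus. The mother may decide to discontinue the pregnancy based on the fact that the fetus has one or more chromosomal or genetic abnormalities, or undesirable conditions. She may also decide to continue the pregnancy based on the fact that the fetus does not have any gross chromosomal or genetic abnormalities, or any genetic conditions of interest.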
다른 예는 정자 공여자에 의해 인공 수정되어 임신된 임신한 여성을 포함할 수 있다. 그녀는, 태자녀 유전 정보를 수반할 위험을 최소화하기를 원한다. 그녀는 전문의사에게 채혈하여, 본원에 기술된 기술을 사용하여 핵화된 태아 적혈구를 분리하며, 조직 시료는 또한 모친 및 유전적 부친에서 수집된다. 태아, 모친 및 부친의 유전 물질은 적절하게는 증폭시켜 ILLUMINA INFINIUM BEADARRAY를 사용하여 유전형 분석되며, 본원에 기술된 방법은 부모 및 태아 유전형을 고 정밀도로 세정하고 단계화하며, 태아에 대한 배수성 요청을 이룬다. 태자녀 정배수성인 것으로 밝혀지고, 표현형적 감수성이 재구축된 태아 유전형으로부터 예측되며, 보고서가 생성되면 모의 주치의에게 보내어 이들이 가장 우수한 임상 결정이 무엇일지를 결정할 수 있도록 한다.
일 구현예에서, 모친 및 부친의 조악한 유전 물질은, 서열이 유사하지만, 양이 보다 많은 DNA의 양으로 증폭시킴으로써 전환시킨다. 이후에, 유전형 분석 방법으로, 핵산에 의해 인코딩된 유전형 데이터를 위에 기술된 것들과 같은 기억 장치 상에 물리적 및/또는 전자적으로 저장할 수 있다. PARENTAL SUPPORT^TM 알고리즘을 제조하는 관련 알고리즘, 이의 관련 부분은 본원에 상세히 논의되어 있으며 프로그래밍 언어를 사용하여 컴퓨터 프로그램으로 해석한다. 이후에, 비트 및 바이트를 물리적으로 인코딩하는 대신에, 조악한 측정 데이터를 나타내는 양식으로 정렬된, 컴퓨터 하드웨어 상의 컴퓨터 프로그램의 실행을 통해, 이들은 태아의 배수성 상태의 고 신뢰도 측정을 나타내는 양식으로 전환되어진다. 이러한 전환의 세부사항은 데이터 자체 및 본원에 기술된 방법을 실행하는데 사용된 컴퓨터 언어 및 하드웨어 시스템에 의존할 것이다. 이후에, 물리적으로 구성되어 태아의 고 품질 배수성 측정을 나타내는 데이터는 의사에게 보내질 수 있는 보고서로 전환된다. 이러한 전환은 프린터 또는 컴퓨터 디스플레이를 사용하여 수행될 수 있다. 보고서는 종이 위 또는 다른 적합한 매체 위에 인쇄된 카피일 수 있거나, 또는 이는 전자적일 수 있다. 전자 보고서의 경우에, 이는 전송될 수 있고, 의사가 접근가능한 컴퓨터 상의 위치에 기억 장치에 물리적으로 저장될 수 있거나, 또한 스크린에 나타내어 이를 판독할 수 있게 할 수 있다. 스크린 디스플레이의 경우에, 데이터는 디스플레이 장치 상의 화소(pixel)의 물리적 전환을 유발함으로써 판독가능한 양식으로 전환될 수 있다. 전환은 인광성 스크린에서 물리적으로 방사되는 전자의 방식에 의해, 광자를 방사하거나 흡수하는 기판의 전면에 놓일 수 있는 스크린 상의 화소의 특정한 세트의 투명성을 물리적으로 변화시키는 전기적 전하를 변경시킴에 의해 달성될 수 있다. 이러한 전환은 특수 세트의 화소에 있어서 네마틱에서 콜레스테릭 또는 스메틱 상까지, 액정 속의 분자의 나노규모 배향을 변형시킴으로써 달성할 수 있다. 이러한 전환은, 광자가 의미있는 양식으로 정렬된 다수의 발광 다이오드로 제조된 특수 세트의 픽셀로부터 방사되도록 하는 전류의 방식으로 달성할 수 있다. 이러한 전환은 컴퓨터 스크린, 또는 일부 다른 출력 장치 또는 전송 정보의 사용과 같은 디스플레이 정보를 사용한 다른 어떠한 방식으로 달성할 수 있다. 의사는 이후에 보고서에 따라 행동하여, 보고서 속의 데이터가 행동으로 전환되도록 한다. 이러한 행동은 임신을 지속하거나 중지시키는 것일 수 있으며, 이 경우 유전적 비정상인 잉태된 태아는 죽은 태아로 전환된다. 본원에 나열된 정보를 종합함으로써, 예를 들면, 임신한 모계 및 부계 유전 물질을, 본 기재내용에 요약된 다수의 단계를 통해 유전적 비정상인 태아의 낙태로 이루어지거나, 임신의 중지로 이루어진 의학적 결정으로 전환할 수 있다. 달리는, 주치의가 그의 임신한 환자를 치료하는 것을 돕는 보고서로 유전형 측정의 세트를 전환할 수 있다.
본원의 일 구현예에서, 본원에 기술된 방법을 사용하여 숙주 모, 즉, 임신한 여성이 그녀가 지닌 태아의 생물학적 모가 아닌 경우에도 태아의 배수성 상태를 측정할 수 있다. 본원의 일 구현예에서, 본원에 기술된 모를 사용하여 모계 혈액 시료만을 사용하고, 부계 유전 시료에 대한 요구없이 태아의 배수성 상태를 측정할 수 있다.
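Some of the math in the presently disclosed embodiments makes assumptions concerning a limited number of states of aneuploidy. In some cases, for example, only zero, one or two chromosomes are expected to originate from each parent. In some embodiments of the present disclosure, the mathematical derivations can be expanded to take into account other forms of aneuploidy, such as tetrasomy, where three chromosomes originate from one parent, pentasomy, hexasomy, etc., without changing the fundamental concepts of the present disclosure. At the same time, it is possible to focus on a smaller number of ploidy states, for example, only trisomy and disomy. Note that ploidy determinations that indicate a non-whole number of chromosomes may indicate mosaicism in a sample of genetic material.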
일부 구현예에서, 유전 비정상은 다운 증후군(또는 삼염색체성 21), 에드워드 증후군(Edwards syndrome)(삼염색체성 18), 파타우 증후군(Patau syndrome)(삼염색체성 13), 터너 증후군(45X), 클라인펠터 증후군(Klinefelter's syndrome)(2X 염색체를 지닌 남성), 프라더-윌리 증후군, 및 디조지 증후군(DiGeorge syndrome)(UPD 15)과 같은 이수성 형태이다. 앞서의 문장에 나열된 것과 같은 선천적 장애는 일반적으로 바람직하지 않으며, 태자녀 하나 이상의 표현형적 비정상으로 고생한다는 지식은, 주치의가 임신을 종결시키고, 장애아의 출생에 준비하기 위한 필수적인 예방책을 취하거나, 염색체 비정상의 심각성을 낮추는 것을 의미하는 일부 치료학적 접근법을 취하는 결정에 대한 근거를 제공할 수 있다.
일부 구현예에서, 본원에 기술된 방법을 매우 초기 임신 기간에, 예를 들면 4주 정도로 조기, 5주 정도로 조기, 6주 정도로 조기, 7주 정도로 조기, 8주 정도로 조기, 9주 정도로 조기, 10주 정도로 조기, 11주 정도로 조기 및 12주 정도로 조기에 사용할 수 있다.
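In some embodiments, a method disclosed herein is used in the context of pre-implantation genetic diagnosis (PGD) for embryo selection during in vitro fertilization, where the target individual is an embryo, and the parental genotypic data can be used to make ploidy determinations about the embryo from sequencing data from a single- or two-cell biopsy of a day 3 embryo or a trophectoderm biopsy of a day 5 or day 6 embryo. In a PGD setting, only the child DNA is measured, and only a small number of cells are tested, generally one to five but as many as ten, twenty or fifty. The total number of starting copies of the A and B alleles (at a SNP) is then trivially determined by the child genotype and the number of cells. In NPD, the number of starting copies is very high, and so the allele ratio after PCR is expected to accurately reflect the starting ratio. In PGD, however, the small number of starting copies means that contamination and imperfect PCR efficiency have a non-trivial effect on the allele ratio following PCR. This effect may be more important than depth of read in predicting the variance in the allele ratio measured after sequencing. The distribution of the measured allele ratio given a known child genotype may be created by Monte Carlo simulation of the PCR process based on the PCR probe efficiency and the probability of contamination. Given an allele ratio distribution for each possible child genotype, the likelihoods of the various hypotheses can be calculated as described for NIPD.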
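A Monte Carlo simulation of this kind might look like the following sketch. It is a deliberately simple model, not the simulation used in the disclosure: each molecule is duplicated in each cycle with a fixed probability `efficiency`, contamination adds at most one stray copy of a random allele before amplification, and all parameter values are made-up.

```python
import random
import statistics

def simulate_allele_ratio(a_start, b_start, cycles, efficiency, p_contam, rng):
    """One simulated PCR run; returns the final A-allele fraction."""
    a, b = a_start, b_start
    if rng.random() < p_contam:          # one stray contaminating copy
        if rng.random() < 0.5:
            a += 1
        else:
            b += 1
    for _ in range(cycles):              # each copy duplicates with prob. efficiency
        a += sum(rng.random() < efficiency for _ in range(a))
        b += sum(rng.random() < efficiency for _ in range(b))
    return a / (a + b)

def ratio_distribution(n_cells, genotype, trials=500, seed=0, cycles=6,
                       efficiency=0.8, p_contam=0.01):
    """Simulated allele ratios for a known child genotype such as 'AB'."""
    rng = random.Random(seed)
    a0 = n_cells * genotype.count("A")
    b0 = n_cells * genotype.count("B")
    return [simulate_allele_ratio(a0, b0, cycles, efficiency, p_contam, rng)
            for _ in range(trials)]

single_cell = ratio_distribution(1, "AB", trials=200)
twenty_cells = ratio_distribution(20, "AB", trials=200)
print(statistics.pstdev(single_cell), statistics.pstdev(twenty_cells))
```

The spread of the ratio for a single heterozygous cell comes out several times wider than for twenty cells, which is the effect the paragraph attributes to the small number of starting copies.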
최대 확률 평가
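Most methods known in the art for detecting the presence or absence of a biological phenomenon or medical condition involve the use of a single hypothesis rejection test, where a metric that is correlated with the condition is measured, and if the metric is on one side of a given threshold, the condition is present, while if the metric falls on the other side of the threshold, the condition is absent. A single-hypothesis rejection test looks only at the null distribution when deciding between the null and alternate hypotheses. Without taking into account the alternate distribution, one cannot estimate the likelihood of each hypothesis given the observed data and therefore cannot calculate a confidence on the call. Hence, with a single-hypothesis rejection test, one gets a yes or no answer without a feeling for the confidence associated with the specific case.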
일부 구현예에서, 본원에 개시된 방법은 최대 확률 방법을 사용하여 생물학적 현상 또는 의학적 상태의 존재 또는 부재를 검출할 수 있다. 이는, 상태의 존재 또는 부재를 요청하기 위한 역치를 각각의 경우에 대해 적절한 것으로 조절할 수 있으므로 단일의 가설 거부 기술을 사용하는 방법보다 실질적으로 개선되어 있다. 이는 특히 모계 혈장에서 발견된 자유로이 부유하는 DNA 속에 존재하는 태아 및 모계 DNA의 혼합물로부터 입수할 수 있는 유전 데이터에 의하여 잉태된 태아에서 이수성의 존재 또는 부재를 측정하는 것을 목표로 하는 진단 기술과 특히 관련되어 있다. 이는, 혈장 기원한 분획 속의 태아 DNA의 분획이 변화하면서, 이수성 대 정배수성을 요청하기 위한 최적의 한계가 변하기 때문이다. 태아 분획이 떨어지면서, 이수성과 관련된 데이터의 분포는 정배수성과 관련된 데이터의 분포와 크게 유사해진다.
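The maximum likelihood estimation method uses the distributions associated with each hypothesis to estimate the likelihood of the data conditioned on each hypothesis. These conditional probabilities can then be converted to a hypothesis call and a confidence. Similarly, the maximum a posteriori estimation method uses the same conditional probabilities as the maximum likelihood estimate, but also incorporates population priors when choosing the best hypothesis and determining the confidence.
Therefore, the use of a maximum likelihood estimate (MLE) technique, or the closely related maximum a posteriori (MAP) technique, gives two advantages: first, it increases the chance of a correct call, and second, it allows a confidence to be calculated for each call. In an embodiment, selecting the ploidy state corresponding to the hypothesis with the greatest probability is carried out using maximum likelihood estimates or maximum a posteriori estimates. In an embodiment, a method is disclosed for determining the ploidy state of a gestating fetus that involves taking any method currently known in the art that uses a single hypothesis rejection technique and reformulating it such that it uses an MLE or MAP technique. Some examples of methods that can be significantly improved by applying these techniques can be found in U.S. Pat. No. 8,008,018, U.S. Pat. No. 7,888,017, or U.S. Pat. No. 7,332,277.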
일 구현예에서, 태아 및 모계 게놈 DNA를 포함하는 모계 혈장 시료 속에서 태아 이수성의 존재 또는 부재를 측정하는 방법이 기술되어 있으며, 당해 방법은 모계 혈장 시료를 수득하는 단계; 혈장 시료 속에서 발견된 DNA 단편을 고 배출 서열분석기로 측정하는 단계; 서열을 염색체에 맵핑하여 각각의 염색체에 맵핑되는 서열 판독물의 수를 측정하는 단계; 혈장 시료 속의 태아 DNA의 분획을 계산하는 단계; 제2의 표적 염색체가 정배수성인 경우 존재하는 것으로 예측될 수 있는 표적 염색체의 양의 예측된 분포 및 이러한 염색체가 이수성이었던 경우 예측될 수 있는 하나 또는 다수의 예측된 분포를, 태아 분획 및, 정배수성인 것으로 예측된 하나 또는 다수의 참조 염색체를 맵핑하는 서열 판독물의 수를 사용하여 계산하는 단계; 및 MLE 또는 MAP를 사용하여 분포 중 어느 것이 가장 정확하게 될 가능성이 있는지를 측정함으로써 태아 이수성의 존재 또는 부재를 나타내는 단계를 포함한다. 일 구현예에서, 혈장의 DNA를 측정하는 단계는 거대하게 평행한 셧건 서열분석을 수행하는 단계를 포함할 수 있다. 일 구현예에서, 혈장 시료의 DNA를 측정하는 단계는 예를 들면, 표적화된 증폭을 통해 다수의 다형성 또는 비-다형성 유전자자리에서 우선적으로 농축된 DNA를 서열분석하는 단계를 포함할 수 있다. 다수의 유전자자리는 하나 또는 소수의 예측된 이수성 염색체 및 하나 또는 소수의 참조 염색체를 표적화하도록 설계될 수 있다. 우선적인 농축의 목적은 배수성 측정을 위해 유익한 서열 판독물의 수를 증가시키는 것이다.
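The read-count variant of the method can be sketched as below. This is a simplified stand-in for the expected distributions described above: the number of reads on the target chromosome is modeled as binomial, the share of reads expected under euploidy (`euploid_share`) is taken as given from reference chromosomes, and equal priors are used to turn likelihoods into a confidence. All names and numbers are illustrative.

```python
import math

FETAL_COPIES = {"monosomy": 1, "disomy": 2, "trisomy": 3}

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def call_ploidy(target_reads, total_reads, euploid_share, fetal_fraction):
    """Return (most likely hypothesis, its confidence) under equal priors."""
    log_lik = {}
    for name, copies in FETAL_COPIES.items():
        # Relative amount of target-chromosome DNA under this hypothesis.
        ratio = 1 + fetal_fraction * (copies - 2) / 2
        share = euploid_share * ratio / (1 - euploid_share + euploid_share * ratio)
        log_lik[name] = log_binom_pmf(target_reads, total_reads, share)
    top = max(log_lik.values())
    weights = {h: math.exp(v - top) for h, v in log_lik.items()}
    norm = sum(weights.values())
    confidence = {h: w / norm for h, w in weights.items()}
    best = max(confidence, key=confidence.get)
    return best, confidence[best]

# One million reads, 1.3% expected on the target chromosome if euploid.
print(call_ploidy(13_650, 1_000_000, 0.013, fetal_fraction=0.10))
print(call_ploidy(13_064, 1_000_000, 0.013, fetal_fraction=0.02))
```

In the second call the observed count sits between the euploid and trisomic expectations for a 2% fetal fraction, so the reported confidence is low, whereas a fixed cutoff would return an answer with no indication of how uncertain it is.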
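Ploidy Calling Informatics Methods
Described herein is a method for determining the ploidy state of a fetus given sequence data. In some embodiments, this sequence data may be measured on a high throughput sequencer. In some embodiments, the sequence data may be measured on DNA that originated from free floating DNA isolated from maternal blood, wherein the free floating DNA comprises some DNA of maternal origin and some DNA of fetal/placental origin. This section describes one embodiment in which the ploidy state of the fetus is determined assuming that the fraction of fetal DNA in the analyzed mixture is not known and will be estimated from the data. It also describes an embodiment in which the fraction of fetal DNA ("fetal fraction"), or the percentage of fetal DNA in the mixture, can be measured by another method and is assumed to be known when determining the ploidy state of the fetus. In some embodiments, the fetal fraction can be calculated using only the genotyping measurements made on the maternal blood sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction may also be calculated using the measured or otherwise known genotype of the mother and/or the measured or otherwise known genotype of the father. In another embodiment, the ploidy state of the fetus can be determined solely on the basis of the calculated fraction of fetal DNA for the chromosome in question compared to the calculated fraction of fetal DNA for a reference chromosome assumed to be disomic.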
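In a preferred embodiment, for a particular chromosome, suppose that N SNPs are observed and analyzed, for which the following are available:
● A set of NR free floating DNA sequence measurements S=(s₁,...,s_NR). Since this method uses the SNP measurements, all sequence data that corresponds to non-polymorphic loci can be disregarded. In a simplified version, where there are (A,B) counts on each SNP (A and B corresponding to the two alleles present at a given locus), S can be written as S=((a₁,b₁),...,(a_N,b_N)), where a_i is the A count on SNP i, b_i is the B count on SNP i, and
$$\sum_{i=1}^{N} (a_i + b_i) = NR.$$
● Parent data consisting of:
○ Genotypes from a SNP microarray or other intensity-based genotyping platform: mother M=(m₁,...,m_N), father F=(f₁,...,f_N), where m_i, f_i ∈ (AA, AB, BB).
○ AND/OR sequence data measurements: NRM mother measurements SM=(sm₁,...,sm_nrm), NRF father measurements SF=(sf₁,...,sf_nrf). As in the simplification above, if there are (A,B) counts on each SNP, then SM=((am₁,bm₁),...,(am_N,bm_N)) and SF=((af₁,bf₁),...,(af_N,bf_N)).
Collectively, the mother, father and child data are denoted D = (M,F,SM,SF,S). Parent data are desirable and increase the accuracy of the algorithm, but they are NOT necessary, especially the father data. This means that even in the absence of mother and/or father data, it is possible to obtain very accurate copy number results.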
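It is possible to derive the best copy number estimate H* by maximizing the data log likelihood LIK(D|H) over all hypotheses H considered. In particular, the joint distribution model and the allele counts measured on the prepared sample are used to determine the relative probability of each of the ploidy hypotheses, and those relative probabilities are used to determine the hypothesis most likely to be correct, as follows:
$$H^{*} = \arg\max_{H} \mathrm{LIK}(D \mid H)$$
Similarly, the a posteriori hypothesis likelihood given the data may be written, on the likelihood scale, as:
$$\mathrm{LIK}(H \mid D) \propto \mathrm{LIK}(D \mid H) \cdot \mathrm{priorprob}(H)$$
where priorprob(H) is the prior probability assigned to each hypothesis H, based on model design and prior knowledge. On the log scale this corresponds to adding the log prior to the log likelihood.
It is also possible to use priors to find the maximum a posteriori estimate:
$$H^{*}_{MAP} = \arg\max_{H} \mathrm{LIK}(D \mid H) \cdot \mathrm{priorprob}(H)$$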
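In an embodiment, the copy number hypotheses that may be considered are:
● Monosomy:
○ maternal H10 (one copy from the mother)
○ paternal H01 (one copy from the father)
● Disomy: H11 (one copy each from the mother and the father)
● Simple trisomy, no crossovers considered:
○ Maternal: H21_matched (two identical copies from the mother, one copy from the father), H21_unmatched (both copies from the mother, one copy from the father)
○ Paternal: H12_matched (one copy from the mother, two identical copies from the father), H12_unmatched (one copy from the mother, both copies from the father)
● Composite trisomy, allowing for crossovers (using a joint distribution model):
○ maternal H21 (two copies from the mother, one from the father),
○ paternal H12 (one copy from the mother, two copies from the father)
In other embodiments, nullsomy (H00), uniparental disomy (H20 and H02), and tetrasomy (H04, H13, H22, H31 and H40) may be considered.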
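The selection step over a hypothesis set like the one above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the per-hypothesis log likelihoods LIK(D|H) have already been computed, works on the log scale (log likelihood plus log prior), and reports the normalized posterior of the winning hypothesis as its confidence. The function name `call_hypothesis` and the example numbers are hypothetical.

```python
import math

def call_hypothesis(log_lik, prior=None):
    """Return (best hypothesis, confidence) from per-hypothesis log likelihoods.

    log_lik: dict mapping hypothesis name -> LIK(D|H), a natural-log likelihood.
    prior:   dict mapping hypothesis name -> priorprob(H) > 0, or None.
             With prior=None every hypothesis gets equal weight, which is MLE;
             with a prior the result is the MAP call.
    """
    if prior is None:
        prior = {h: 1.0 for h in log_lik}
    # Log posterior up to a constant: log likelihood + log prior.
    score = {h: ll + math.log(prior[h]) for h, ll in log_lik.items()}
    top = max(score.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weight = {h: math.exp(s - top) for h, s in score.items()}
    total = sum(weight.values())
    best = max(weight, key=weight.get)
    return best, weight[best] / total

# Hypothetical log likelihoods for three hypotheses.
example = {"H11": -100.0, "H21": -102.0, "H12": -110.0}
mle_call, mle_conf = call_hypothesis(example)
map_call, map_conf = call_hypothesis(
    example, prior={"H11": 0.01, "H21": 0.98, "H12": 0.01})
```

With equal priors the call is driven by the likelihoods alone; a strongly informative prior can change the call, which is the practical difference between the MLE and MAP variants.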
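In the case where there are no crossovers, each trisomy, whether its origin was mitosis, meiosis I, or meiosis II, would be either a matched or an unmatched trisomy. Due to crossovers, a true trisomy is usually a combination of the two. First, a method for deriving hypothesis likelihoods for simple hypotheses is described. Then, a method for deriving hypothesis likelihoods for composite hypotheses is described, combining individual SNP likelihoods with crossovers.
LIK(D|H) for Simple Hypotheses
In an embodiment, LIK(D|H) may be determined for simple hypotheses as follows. For a simple hypothesis H, the log likelihood LIK(H) of hypothesis H over all SNPs considered may be calculated as the sum of the log likelihoods of the individual SNPs, assuming a known or derived child fraction cf. In an embodiment, it is possible to derive cf from the data.
$$\mathrm{LIK}(D \mid H) = \sum_{i} \mathrm{LIK}(D \mid H, cf, i)$$
This hypothesis does not assume any linkage between SNPs, and therefore does not use a joint distribution model.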
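In some embodiments, the log likelihood may be determined on a per SNP basis. On a particular SNP i, assuming fetal ploidy hypothesis H and percent fetal DNA cf, the log likelihood of the observed data D is defined as:
$$\mathrm{LIK}(D \mid H, cf, i) = \log \sum_{m, f, c} P(D \mid m, f, c, H, cf, i)\, P(c \mid m, f, H)\, P(m \mid i)\, P(f \mid i)$$
where m ranges over the possible true mother genotypes, f ranges over the possible true father genotypes, with m, f ∈ {AA, AB, BB}, and c ranges over the possible child genotypes given the hypothesis H. In particular, for monosomy c ∈ {A, B}, for disomy c ∈ {AA, AB, BB}, and for trisomy c ∈ {AAA, AAB, ABB, BBB}.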
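Genotype prior frequency: p(m|i) is the general prior probability of mother genotype m on SNP i, based on the known population frequency at SNP i, denoted pA_i. In particular,
$$p(AA \mid pA_i) = (pA_i)^2, \quad p(AB \mid pA_i) = 2\,pA_i\,(1 - pA_i), \quad p(BB \mid pA_i) = (1 - pA_i)^2.$$
The father genotype probability p(f|i) may be determined in an analogous fashion.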
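True child probability: p(c|m,f,H) is the probability of obtaining true child genotype c, given parents m and f and assuming hypothesis H; it can be easily calculated. For example, for H11, H21 matched, and H21 unmatched, p(c|m,f,H) is given below.
[Table: p(c|m,f,H) for each parent genotype pair (m, f) under H11, H21 matched, and H21 unmatched]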
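Because the table itself is not legible here, the following sketch shows one way p(c|m,f,H) can be enumerated from the verbal definitions of the hypotheses: under H11 the child receives one randomly chosen allele from each parent; under H21 matched it receives two identical copies of one randomly chosen maternal allele plus one paternal allele; under H21 unmatched it receives both maternal alleles plus one paternal allele. Equal transmission probability for each parental allele is an assumption of this sketch, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def child_genotype_probs(m, f, hypothesis):
    """Enumerate p(c | m, f, H) for parent genotypes such as "AA", "AB", "BB".

    Child genotypes are returned as sorted allele strings, e.g. "AB" or "AAB".
    """
    probs = Counter()
    for pa in f:  # each paternal allele is transmitted with probability 1/2
        if hypothesis == "H11":
            for ma in m:  # each maternal allele with probability 1/2
                probs["".join(sorted(ma + pa))] += Fraction(1, 4)
        elif hypothesis == "H21_matched":
            for ma in m:  # the same maternal allele is inherited twice
                probs["".join(sorted(ma * 2 + pa))] += Fraction(1, 4)
        elif hypothesis == "H21_unmatched":
            # both maternal alleles are inherited, so only the father varies
            probs["".join(sorted(m + pa))] += Fraction(1, 2)
        else:
            raise ValueError("unsupported hypothesis: " + hypothesis)
    return dict(probs)

disomy = child_genotype_probs("AB", "AB", "H11")
matched = child_genotype_probs("AB", "AA", "H21_matched")
unmatched = child_genotype_probs("AB", "AA", "H21_unmatched")
```

Exact fractions are used so that each distribution can be checked to sum to one; the paternal hypotheses H12 matched and unmatched follow by swapping the roles of m and f.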
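Data likelihood: P(D|m,f,c,H,cf,i) is the probability of the given data D on SNP i, given true mother genotype m, true father genotype f, true child genotype c, hypothesis H and child fraction cf. It can be broken down into the probabilities of the mother, father and child data as follows:
$$P(D \mid m, f, c, H, cf, i) = P(SM \mid m, i)\, P(M \mid m, i)\, P(SF \mid f, i)\, P(F \mid f, i)\, P(S \mid m, c, H, cf, i)$$
Mother SNP array data likelihood: assuming the SNP array genotypes are correct, the probability of the mother SNP array genotype data m_i at SNP i, compared to the true genotype m, is simply
$$P(M \mid m, i) = \begin{cases} 1 & \text{if } m_i = m \\ 0 & \text{otherwise.} \end{cases}$$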
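Mother sequence data likelihood: the probability of the mother sequence data at SNP i, in the case of counts S_i=(am_i,bm_i) and with no extra noise or bias involved, is the binomial probability defined as P(SM|m,i)=P_{X|m}(am_i), where X|m~Binom(p_m(A), am_i+bm_i) and p_m(A) is defined as follows:
[Table: p_m(A) for each mother genotype m, including the no-call case]
Father data likelihood: a similar equation applies for the father data likelihood. Note that it is possible to determine the child genotype without the parent data, especially the father data. For example, if no father genotype data F is available, one may simply use P(F|f,i)=1. If no father sequence data SF is available, one may simply use P(SF|f,i)=1.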
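In some embodiments, the method involves building a joint distribution model for the expected allele counts at a plurality of polymorphic loci on the chromosome for each ploidy hypothesis; one method to accomplish such an end is described here. Free fetal DNA data likelihood: P(S|m,c,H,cf,i) is the probability of the free fetal DNA sequence data on SNP i, given true mother genotype m, true child genotype c, child copy number hypothesis H, and assuming child fraction cf. It is in fact the probability of sequence data S on SNP i, given the true probability μ(m,c,cf,H) of A content on SNP i:
$$P(S \mid m, c, H, cf, i) = P(S \mid \mu(m, c, cf, H), i)$$
For counts, where S_i=(a_i,b_i), with no extra noise or bias in the data,
$$P(S \mid \mu(m, c, cf, H), i) = P_X(a_i)$$
where X~Binom(p(A), a_i+b_i) with p(A)=μ(m,c,cf,H). In a more complex case, where the exact alignment and the (A,B) counts per SNP are not known, P(S|μ(m,c,cf,H),i) is a combination of integrated binomials.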
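True A content probability: μ(m,c,cf,H), the true probability of A content on SNP i in this mother/child mixture, assuming true mother genotype m, true child genotype c, and overall child fraction cf, is defined as:
$$\mu(m, c, cf, H) = \frac{\#A(m)\,(1 - cf) + \#A(c)\,cf}{n_m\,(1 - cf) + n_c\,cf}$$
where #A(g) is the number of A alleles in genotype g, n_m = 2 is the somy of the mother, and n_c is the ploidy of the child under hypothesis H (1 for monosomy, 2 for disomy, 3 for trisomy).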
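As a worked illustration of the A content definition, the sketch below computes the expected A fraction for a mother/child mixture and the binomial probability of the observed plasma counts. It assumes the mixture is a copy-number-weighted average of maternal and child alleles, with #A(g) the number of A alleles in genotype g, and it encodes the child ploidy as the length of the child genotype string; the function names are hypothetical.

```python
from math import comb

def true_a_content(m, c, cf):
    """Expected fraction of A alleles in the plasma mixture.

    m:  true mother genotype, e.g. "AB" (somy 2)
    c:  true child genotype; its length gives the child ploidy (1, 2 or 3)
    cf: child fraction, between 0 and 1
    """
    n_m, n_c = len(m), len(c)
    a_copies = m.count("A") * (1.0 - cf) + c.count("A") * cf
    all_copies = n_m * (1.0 - cf) + n_c * cf
    return a_copies / all_copies

def binom_pmf(k, n, p):
    """Probability of k successes in n independent draws with success rate p."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def plasma_likelihood(a, b, m, c, cf):
    """P(S | m, c, H, cf, i) for counts (a, b), with no extra noise or bias."""
    return binom_pmf(a, a + b, true_a_content(m, c, cf))

mu_disomy = true_a_content("AA", "AB", 0.1)    # (1.8 + 0.1) / 2.0
mu_trisomy = true_a_content("AA", "AAB", 0.1)  # (1.8 + 0.2) / 2.1
```

The two example values differ by less than half a percentage point at a 10% child fraction, which is why the depth of read and the noise model matter so much for the ploidy call.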
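Using a Joint Distribution Model: LIK(D|H) for a Composite Hypothesis
In some embodiments, the method involves building a joint distribution model for the expected allele counts at the plurality of polymorphic loci on the chromosome for each ploidy hypothesis; one method to accomplish such an end is described here. In many cases, trisomy is not purely matched or unmatched, due to crossovers, so this section derives results for the composite hypotheses H21 (maternal trisomy) and H12 (paternal trisomy), which combine matched and unmatched trisomy while accounting for possible crossovers.
In the case of trisomy, if there were no crossovers, the trisomy would be simply matched or unmatched. Matched trisomy is where the child inherits two copies of the identical chromosome segment from one parent. Unmatched trisomy is where the child inherits one copy of each homologous chromosome segment from that parent. Due to crossovers, some segments of a chromosome may have matched trisomy and other segments may have unmatched trisomy. This section describes how to build a joint distribution model for the heterozygosity rates for a set of alleles, that is, for the expected allele counts at a number of loci for one or more hypotheses.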
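On SNP i, suppose that LIK(D|H_m, i) is the fit for the matched hypothesis H_m, LIK(D|H_u, i) is the fit for the unmatched hypothesis H_u, and pc(i) is the probability of a crossover between SNPs i-1 and i. The full likelihood may then be calculated as:
$$\mathrm{LIK}(D \mid H) = \log \sum_{E \in \{H_m, H_u\}} \exp\big(\mathrm{LIK}(D \mid E, 1{:}N)\big)$$
where LIK(D|E,1:N) is the likelihood of ending in hypothesis E, for SNPs 1:N, and E is the hypothesis of the last SNP, E ∈ (H_m, H_u). Recursively, one may calculate:
$$\mathrm{LIK}(D \mid E, 1{:}i) = \mathrm{LIK}(D \mid E, i) + \log\Big( \exp\big(\mathrm{LIK}(D \mid E, 1{:}i-1)\big)\,(1 - pc(i)) + \exp\big(\mathrm{LIK}(D \mid {\sim}E, 1{:}i-1)\big)\,pc(i) \Big)$$
where ~E is the hypothesis other than E (not E), the hypotheses considered being H_m and H_u. In particular, the likelihood of SNPs 1:i is calculated from the likelihood of SNPs 1 to (i-1) with either the same hypothesis and no crossover, or the opposite hypothesis and a crossover, multiplied by the likelihood of SNP i.
For SNP 1, i=1,
$$\mathrm{LIK}(D \mid E, 1{:}1) = \mathrm{LIK}(D \mid E, 1).$$
For SNP 2, i=2,
$$\mathrm{LIK}(D \mid E, 1{:}2) = \mathrm{LIK}(D \mid E, 2) + \log\Big( \exp\big(\mathrm{LIK}(D \mid E, 1)\big)\,(1 - pc(2)) + \exp\big(\mathrm{LIK}(D \mid {\sim}E, 1)\big)\,pc(2) \Big),$$
and so on for i=3:N.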
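The recursion over SNPs described above has the shape of a two-state forward algorithm, and a minimal log-domain sketch is given below. It assumes the per-SNP log likelihoods for the matched and unmatched hypotheses and the crossover probabilities pc(i) are already available, and it combines the two ending states with a log-sum-exp at the end. The function names and the toy numbers are hypothetical.

```python
import math

def _log(x):
    """Natural log that maps 0 to -infinity instead of raising."""
    return math.log(x) if x > 0.0 else -math.inf

def _logaddexp(x, y):
    """log(exp(x) + exp(y)) computed without overflow or underflow."""
    hi, lo = max(x, y), min(x, y)
    if hi == -math.inf:
        return -math.inf
    return hi + math.log1p(math.exp(lo - hi))

def composite_trisomy_loglik(lik_m, lik_u, pc):
    """Log likelihood of a composite trisomy hypothesis over N linked SNPs.

    lik_m[i], lik_u[i]: per-SNP log likelihoods under matched / unmatched.
    pc[i]: crossover probability between SNP i-1 and SNP i (pc[0] is unused).
    """
    end_m, end_u = lik_m[0], lik_u[0]  # SNP 1: LIK(D|E,1:1) = LIK(D|E,1)
    for i in range(1, len(lik_m)):
        stay, switch = _log(1.0 - pc[i]), _log(pc[i])
        new_m = lik_m[i] + _logaddexp(end_m + stay, end_u + switch)
        new_u = lik_u[i] + _logaddexp(end_u + stay, end_m + switch)
        end_m, end_u = new_m, new_u
    return _logaddexp(end_m, end_u)

# Toy example with two SNPs (likelihoods 0.2/0.3 then 0.1/0.4).
lm = [math.log(0.2), math.log(0.1)]
lu = [math.log(0.3), math.log(0.4)]
with_crossover = composite_trisomy_loglik(lm, lu, [0.0, 0.5])
no_crossover = composite_trisomy_loglik(lm, lu, [0.0, 0.0])
```

Working in the log domain keeps the computation stable over thousands of SNPs, where the raw likelihood products would underflow.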
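In some embodiments, the child fraction may be determined. The child fraction may refer to the proportion of sequences in a mixture of DNA that originate from the child. In the context of non-invasive prenatal diagnosis, the child fraction may refer to the proportion of sequences in the maternal plasma that originate from the fetus or from the portion of the placenta with the fetal genotype. It may refer to the child fraction in a sample of DNA that has been prepared from the maternal plasma and that may be enriched in fetal DNA. One purpose of determining the child fraction in a sample of DNA is for use in an algorithm that can make ploidy calls on the fetus; the child fraction may therefore refer to whatever sample of DNA was analyzed by sequencing for the purpose of non-invasive prenatal diagnosis.
Some of the algorithms presented in this disclosure that are part of a method of non-invasive prenatal aneuploidy diagnosis assume a known child fraction, which may not always be the case. In an embodiment, it is possible to find the most likely child fraction by maximizing the likelihood for disomy on selected chromosomes, with or without the presence of the parental data.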
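In particular, suppose that LIK(D|H11, cf, chr) is the log likelihood as described above, for the disomy hypothesis and for child fraction cf on chromosome chr. For the selected chromosomes in Cset (usually 1:16), assumed to be euploid, the full likelihood is:
$$\mathrm{LIK}(cf) = \sum_{chr \in Cset} \mathrm{LIK}(D \mid H11, cf, chr)$$
The most likely child fraction cf* is derived as
$$cf^{*} = \arg\max_{cf} \mathrm{LIK}(cf).$$
It is possible to use any set of chromosomes. It is also possible to derive the child fraction without assuming euploidy on the reference chromosomes. Using this method, it is possible to determine the child fraction in any of the following situations: (1) one has array data on the parents and shotgun sequencing data on the maternal plasma; (2) one has array data on the parents and targeted sequencing data on the maternal plasma; (3) one has targeted sequencing data on both the parents and the maternal plasma; (4) one has targeted sequencing data on both the mother and the maternal plasma fraction; (5) one has targeted sequencing data on the maternal plasma fraction; (6) other combinations of parental and child fraction measurements.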
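In some embodiments, the informatics method may incorporate data dropouts; this may result in ploidy determinations of higher accuracy. Elsewhere in this disclosure it has been assumed that the probability of getting an A is a direct function of the true mother genotype, the true child genotype, the fraction of the child in the mixture, and the child copy number. It is also possible that mother or child alleles drop out; for example, instead of measuring the true child AB in the mixture, it may be the case that only sequences mapping to allele A are measured. One may denote the parent dropout rate for genomic ILLUMINA data as d_pg, the parent dropout rate for sequence data as d_ps, and the child dropout rate for sequence data as d_cs. In some embodiments, the mother dropout rate may be assumed to be zero and the child dropout rates are relatively low; in this case, the results are not severely affected by dropouts. In some embodiments, the probability of allele dropouts may be sufficiently large that they have a significant effect on the predicted ploidy call. For such a case, allele dropouts are incorporated into the algorithm as follows: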
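Parent SNP array data dropouts: for mother genomic data M, suppose that the genotype after dropout is m_d; then
$$P(M \mid m, i) = \sum_{m_d} P(M \mid m_d, i)\, P(m_d \mid m)$$
where, as before,
$$P(M \mid m_d, i) = \begin{cases} 1 & \text{if } m_i = m_d \\ 0 & \text{otherwise,} \end{cases}$$
and P(m_d|m) is the likelihood of genotype m_d after possible dropout, given the true genotype m, defined as below for dropout rate d.
[Table: P(m_d|m) for each true genotype m and each post-dropout genotype m_d, including the no-call outcome, at dropout rate d]
A similar equation applies for the father SNP array data.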
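Parent sequence data dropouts: for mother sequence data SM,
$$P(SM \mid m, i) = \sum_{m_d} P_{X \mid m_d}(am_i)\, P(m_d \mid m)$$
where P(m_d|m) is defined as in the previous section and the probability P_{X|m_d}(am_i) from a binomial distribution is defined as before in the parent data likelihood section. A similar equation applies to the paternal sequence data.
Free floating DNA sequence data dropout:
$$P(S \mid m, c, H, cf, i) = \sum_{m_d,\, c_d} P(S \mid \mu(m_d, c_d, cf, H), i)\, P(m_d \mid m)\, P(c_d \mid c)$$
where P(S|μ(m_d,c_d,cf,H),i) is as defined in the section on the free floating data likelihood.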
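In an embodiment, P(m_d|m) is the probability of the observed mother genotype m_d, given true mother genotype m and assuming dropout rate d_ps, and P(c_d|c) is the probability of the observed child genotype c_d, given true child genotype c and assuming dropout rate d_cs. If nA_T is the number of A alleles in the true genotype c and nA_D is the number of A alleles in the observed genotype c_d, where nA_T ≥ nA_D, and similarly nB_T is the number of B alleles in the true genotype c and nB_D is the number of B alleles in the observed genotype c_d, where nB_T ≥ nB_D, and d is the dropout rate, then
$$P(c_d \mid c) = \binom{nA_T}{nA_D}\, d^{\,nA_T - nA_D}\,(1 - d)^{nA_D} \times \binom{nB_T}{nB_D}\, d^{\,nB_T - nB_D}\,(1 - d)^{nB_D}.$$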
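The dropout probability just described can be sketched as a product of two binomial terms, one for the A copies and one for the B copies. The sketch assumes each allele copy drops out independently with rate d, and it represents a no-call (every copy dropped) as the empty string; the function name is hypothetical.

```python
from math import comb

def dropout_prob(true_gt, observed_gt, d):
    """P(observed genotype | true genotype) under independent allele dropout.

    true_gt, observed_gt: allele strings such as "AB", "A", or "" (no call).
    d: per-copy dropout rate, between 0 and 1.
    """
    na_t, nb_t = true_gt.count("A"), true_gt.count("B")
    na_d, nb_d = observed_gt.count("A"), observed_gt.count("B")
    if na_d > na_t or nb_d > nb_t:
        return 0.0  # dropout can only remove copies, never add them
    p_a = comb(na_t, na_d) * d ** (na_t - na_d) * (1.0 - d) ** na_d
    p_b = comb(nb_t, nb_d) * d ** (nb_t - nb_d) * (1.0 - d) ** nb_d
    return p_a * p_b

# All outcomes reachable from a true AA genotype at a 10% dropout rate.
from_aa = {obs: dropout_prob("AA", obs, 0.1) for obs in ("AA", "A", "")}
```

Summing the outcomes for a given true genotype gives one, which is a quick sanity check on any dropout table built this way.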
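In an embodiment, the informatics method may incorporate random and consistent bias. In an ideal world there is no per SNP consistent sampling bias or random noise (beyond the binomial distribution variation) in the number of sequence counts. In particular, on SNP i, for mother genotype m, true child genotype c and child fraction cf, and with X the number of A's in the set of (A+B) reads on SNP i, X behaves as X~Binomial(p, A+B), where p=μ(m,c,cf,H) is the true probability of A content.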
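In an embodiment, the informatics method may incorporate random bias. As is often the case, suppose that there is a bias in the measurements, so that the probability of getting an A on this SNP is equal to q, which is slightly different from p as defined above. How much p differs from q depends on the accuracy of the measurement process and a number of other factors, and can be quantified by the standard deviation of q away from p. In an embodiment, it is possible to model q as having a beta distribution, with parameters α, β that depend on the mean of that distribution, centered at p, and on some specified standard deviation s. In particular, this gives
$$X \mid q \sim \mathrm{Bin}(q, D_i), \quad \text{where } q \sim \mathrm{Beta}(\alpha, \beta).$$
If we let E(q) = p and V(q) = s², the parameters α, β can be derived as
$$\alpha = pN, \quad \beta = (1 - p)N, \quad \text{where } N = \frac{p(1 - p)}{s^2} - 1.$$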
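This is the definition of a beta-binomial distribution, where one samples from a binomial distribution with variable parameter q, and q follows a beta distribution with mean p. So, in a setup with no bias, on SNP i, the parent sequence data (SM) probability assuming true mother genotype (m), given the mother sequence A count on SNP i (am_i) and the mother sequence B count on SNP i (bm_i), may be calculated as:
P(SM|m,i)=P_X _|m(am_i); where X|m~Binom(p_m(A), am_i+bm_i)
Now, including random bias with standard deviation s, this becomes:
X|m~BetaBinom(p_m(A), am_i+bm_i,s)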
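A beta-binomial probability parameterized by a mean p and a standard deviation s, as in BetaBinom(p, n, s) above, can be sketched as follows. The conversion from (p, s) to beta parameters uses the standard identities for a beta distribution with mean p and variance s², namely α + β = p(1 - p)/s² - 1; it requires 0 < p < 1 and s² < p(1 - p). The function names are hypothetical.

```python
from math import exp, lgamma

def _log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_pmf(k, n, p, s):
    """P(X = k) for X ~ BetaBinom with n draws, mean rate p and rate std dev s."""
    big_n = p * (1.0 - p) / s ** 2 - 1.0  # big_n = alpha + beta
    if big_n <= 0.0:
        raise ValueError("s is too large for a beta distribution with mean p")
    alpha, beta = p * big_n, (1.0 - p) * big_n
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta))

# Distribution of A counts at depth 20 with mean 0.5 and std dev 0.05 on the rate.
pmf = [beta_binom_pmf(k, 20, 0.5, 0.05) for k in range(21)]
mean_count = sum(k * w for k, w in enumerate(pmf))
```

As s shrinks, the combined parameter grows and the distribution approaches the plain binomial, which matches the role of s (or N) as the amount of extra variation.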
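In the case with no bias, the maternal plasma DNA sequence data (S) probability assuming true mother genotype (m), true child genotype (c), child fraction (cf) and child hypothesis H, given the free floating DNA sequence A count on SNP i (a_i) and the free floating sequence B count on SNP i (b_i), may be calculated as:
$$P(S \mid m, c, cf, H, i) = P_X(a_i)$$
where X~Binom(p(A), a_i+b_i) with p(A)=μ(m,c,cf,H).
In an embodiment, including random bias with standard deviation s, this becomes X~BetaBinom(p(A),a_i+b_i,s), where the amount of extra variation is specified by the deviation parameter s, or equivalently N. The smaller the value of s (or the larger the value of N), the closer this distribution is to the regular binomial distribution. It is possible to estimate the amount of bias, that is, to estimate N above, from the unambiguous contexts AA|AA, BB|BB, AA|BB, BB|AA, and to use the estimated N in the above probability. Depending on the behavior of the data, N may be made a constant irrespective of the depth of read a_i+b_i, or a function of a_i+b_i, making the bias smaller for larger depths of read.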
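In an embodiment, the informatics method may incorporate consistent per-SNP bias. Due to artifacts of the sequencing process, some SNPs may have consistently lower or higher counts irrespective of the true amount of A content. Suppose that SNP i consistently adds a bias of w_i percent to the number of A counts. In some embodiments, this bias can be estimated from a set of training data derived under the same conditions, and added back in to the parent sequence data estimate as:
P(SM|m,i)=P_X _|m(am_i); where X|m~BetaBinom(p_m(A)+ w_i, am_i+bm_i,s)
and to the free floating DNA sequence data probability estimate as:
$$P(S \mid m, c, cf, H, i) = P_X(a_i)$$
where X~BetaBinom(p(A)+ w_i,a_i+b_i,s).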
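In some embodiments, the method may be written to specifically take into account additional noise, differential sample quality, differential SNP quality, and random sampling bias. An example of this is given here. This method has been shown to be particularly useful in the context of data generated using the massively multiplexed mini-PCR protocol, and was used in Experiments 7 through 13. The method involves several steps that each introduce a different kind of noise and/or bias into the final model:
(1) Suppose the first sample, which comprises a mixture of maternal and fetal DNA, contains an original amount of DNA of size N₀ molecules, usually in the range 1,000 to 40,000, where p is the true % reference.
(2) In the amplification using the universal ligation adaptors, assume that N₁ molecules are sampled, usually N₁ ~ N₀/2 molecules, and that random sampling bias is introduced by the sampling. The amplified sample may contain a number of molecules N₂, where N₂ >> N₁. Let X₁ represent the amount of reference loci (on a per SNP basis) out of the N₁ sampled molecules, with a variation in p₁ = X₁/N₁ that introduces random sampling bias throughout the rest of the protocol. This sampling bias is included in the model by using a beta-binomial (BB) distribution instead of a simple binomial distribution model. Parameter N of the beta-binomial distribution may be estimated later on a per sample basis from training data, after adjusting for leakage and amplification bias, on SNPs with 0<p<1. Leakage is the tendency for a SNP to be read incorrectly.
(3) The amplification step will amplify any allelic bias, so amplification bias is introduced through possibly uneven amplification. Suppose that one allele at a locus is amplified f times and the other allele at that locus is amplified g times, where f=ge^b and b=0 indicates no bias. The bias parameter b is centered at 0 and indicates how much more or less the A allele is amplified relative to the B allele on a particular SNP. The parameter b may differ from SNP to SNP. The bias parameter b may be estimated on a per SNP basis, for example from training data.
(4) The sequencing step involves sequencing a sample of the amplified molecules. In this step there may be leakage, where leakage is the situation in which a SNP is read incorrectly. Leakage may result from any number of problems, and may result in a SNP being read not as the correct allele A but as another allele B found at that locus, or as an allele C or D not typically found at that locus. Suppose the sequencing measures the sequence data of a number of DNA molecules from an amplified sample of size N₃, where N₃ < N₂. In some embodiments, N₃ may be in the range of 20,000 to 100,000; 100,000 to 500,000; 500,000 to 4,000,000; 4,000,000 to 20,000,000; or 20,000,000 to 100,000,000. Each molecule sampled has a probability p_g of being read correctly, in which case it will show up correctly as allele A. With probability 1-p_g the molecule will be read incorrectly, as an allele unrelated to the original molecule, and will look like allele A with probability p_r, like allele B with probability p_m, or like allele C or allele D with probability p_o, where p_r+p_m+p_o=1. Parameters p_g, p_r, p_m, p_o are estimated on a per SNP basis from the training data.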
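Different protocols may involve similar steps, with variations in the molecular biology steps resulting in different amounts of random sampling, different levels of amplification and different leakage bias. The following model may be applied equally well to each of these cases. The model for the amount of DNA sampled, on a per SNP basis, is given by:
X₃~BetaBinomial(L(F(p,b),p_r,p_g), N*H(p,b))
where p is the true amount of reference DNA, b is the per SNP bias as described above, p_g is the probability of a correct read, and p_r is the probability, in the case of a bad read, of the read being incorrect but serendipitously looking like the correct allele, as described above, and:
$$F(p, b) = \frac{p\,e^{b}}{p\,e^{b} + (1 - p)}, \quad H(p, b) = \frac{\big(e^{b}p + (1 - p)\big)^2}{e^{b}}, \quad L(p, p_r, p_g) = p\,p_g + p_r\,(1 - p_g)$$
In some embodiments, the method uses a beta-binomial distribution instead of a simple binomial distribution; this takes care of the random sampling bias. Parameter N of the beta-binomial distribution is estimated on a per sample basis, as needed. Using the bias corrections F(p,b), H(p,b) instead of just p takes care of the amplification bias. Parameter b of the bias is estimated ahead of time on a per SNP basis from training data.
In some embodiments, the method uses the leakage correction L(p,p_r,p_g) instead of just p; this takes care of the leakage bias, that is, varying SNP and sample quality. In some embodiments, parameters p_g, p_r, p_o are estimated ahead of time on a per SNP basis from the training data. In some embodiments, parameters p_g, p_r, p_o may be updated on the go with the current sample, to account for varying sample quality.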
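The amplification bias correction F(p,b) and the leakage correction L(p,p_r,p_g) described above can be sketched directly from their verbal definitions. This is an illustration under stated assumptions: F follows from one allele being amplified e^b times more than the other, and L from a read being correct with probability p_g and otherwise looking like the reference allele with probability p_r. The function names are hypothetical, and the variance-scaling term H(p,b) is deliberately left out of the sketch.

```python
from math import exp, log

def amplification_corrected(p, b):
    """F(p, b): reference fraction after the reference allele is amplified
    e**b times as much as the other allele (b = 0 means no bias)."""
    return p * exp(b) / (p * exp(b) + (1.0 - p))

def leakage_corrected(p, p_r, p_g):
    """L(p, p_r, p_g): probability that a read is reported as the reference
    allele, given a correct-read probability p_g and a probability p_r that
    a bad read happens to look like the reference allele."""
    return p * p_g + p_r * (1.0 - p_g)

def expected_read_fraction(p, b, p_r, p_g):
    """Mean of the read-level model: leakage applied after amplification bias."""
    return leakage_corrected(amplification_corrected(p, b), p_r, p_g)

unbiased = expected_read_fraction(0.5, 0.0, 0.5, 0.98)
doubled = amplification_corrected(0.5, log(2.0))  # reference amplified twice as much
```

With b = 0 and perfect reads both corrections reduce to p, so the corrected mean only departs from the true reference fraction to the extent that the per SNP parameters say it should.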
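The model described herein is quite general and can account for both differential sample quality and differential SNP quality. Different samples and SNPs are treated differently, as exemplified by the fact that some embodiments use beta-binomial distributions whose mean and variance are a function of the original amount of DNA as well as of sample and SNP quality.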
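Platform Modeling
Consider a single SNP where the expected allele ratio present in the plasma is r (based on the maternal and fetal genotypes). The expected allele ratio is defined as the expected fraction of A alleles in the combined maternal and fetal DNA. For maternal genotype g_m and child genotype g_c, the expected allele ratio is given by equation 1, assuming that the genotypes are represented as allele ratios as well.
$$r = f\,g_c + (1 - f)\,g_m \qquad (1)$$
The observation at the SNP consists of the number of mapped reads with each allele present, n_a and n_b, which sum to the depth of read d. Assume that thresholds have already been applied to the mapping probabilities and phred scores, so that the mappings and allele observations can be considered correct. A phred score is a numerical measure that relates to the probability that a particular measurement at a particular base is wrong. In an embodiment, where the base has been measured by sequencing, the phred score may be calculated from the ratio of the dye intensity corresponding to the called base to the dye intensity of the other bases. The simplest model for the observation likelihood is a binomial distribution, which assumes that each of the d reads is drawn independently from a large pool that has allele ratio r. Equation 2 describes this model.
$$P(n_a, n_b \mid r) = \binom{n_a + n_b}{n_a}\, r^{n_a}\,(1 - r)^{n_b} \qquad (2)$$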
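The binomial model can be extended in a number of ways. When the maternal and fetal genotypes are either all A or all B, the expected allele ratio in plasma will be 0 or 1, and the binomial probability will not be well defined. In practice, unexpected alleles are sometimes observed. In an embodiment, it is possible to use a corrected allele ratio
$$\hat{r} = \frac{1}{n_a + n_b}$$
to allow a small number of the unexpected allele. In an embodiment, it is possible to use training data to model the rate at which the unexpected allele appears on each SNP, and to use this model to correct the expected allele ratio. When the expected allele ratio is not 0 or 1, the observed allele ratio may not converge to the expected allele ratio even at a sufficiently high depth of read, due to amplification bias or other phenomena. The allele ratio can then be modeled as a beta distribution centered at the expected allele ratio, leading to a beta-binomial distribution for P(n_a, n_b|r), which has higher variance than the binomial.
The platform model for the response at a single SNP will be defined as F(a, b, g_c, g_m, f) (3), the probability of observing n_a = a and n_b = b given the maternal and fetal genotypes, which also depends on the fetal fraction through equation 1. The functional form of F may be a binomial distribution, a beta-binomial distribution, or a similar function as discussed above.
$$F(a, b, g_c, g_m, f) = P(n_a = a, n_b = b \mid g_c, g_m, f) = P\big(n_a = a, n_b = b \mid r(g_c, g_m, f)\big) \qquad (3)$$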
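In an embodiment, the child fraction may be determined as follows. A maximum likelihood estimate of the fetal fraction f for a prenatal test may be derived without the use of paternal information. This may be relevant where the paternal genetic data is not available, for example where the father of record is not actually the genetic father of the fetus. The fetal fraction is estimated from the set of SNPs where the maternal genotype is 0 or 1, resulting in a set of only two possible fetal genotypes. Define S₀ as the set of SNPs with maternal genotype 0 and S₁ as the set of SNPs with maternal genotype 1. The possible fetal genotypes on S₀ are 0 and 0.5, resulting in a set of possible allele ratios R₀(f) = {0, f/2}. Similarly, R₁(f) = {1-f/2, 1}. This method can be trivially extended to include SNPs where the maternal genotype is 0.5, but these SNPs will be less informative due to the larger set of possible allele ratios.
Define N_a0 and N_b0 as the vectors formed by n_as and n_bs for the SNPs s in S₀, and N_a1 and N_b1 similarly for S₁. The maximum likelihood estimate of f is defined by equation 4.
$$\hat{f} = \arg\max_{f} P(N_{a0}, N_{b0}, N_{a1}, N_{b1} \mid f) \qquad (4)$$
Assuming that the allele counts at each SNP are independent conditioned on the SNP's plasma allele ratio, the probabilities can be expressed as products over the SNPs in each set (5).
$$P(N_{a0}, N_{b0}, N_{a1}, N_{b1} \mid f) = \prod_{s \in S_0} P(n_{as}, n_{bs} \mid f) \prod_{s \in S_1} P(n_{as}, n_{bs} \mid f) \qquad (5)$$
The dependence on f is through the sets of possible allele ratios R₀(f) and R₁(f). The SNP probability P(n_as, n_bs|f) can be approximated by assuming the maximum likelihood genotype conditioned on f. At reasonably high fetal fraction and depth of read, the selection of the maximum likelihood genotype will be of high confidence. For example, at a fetal fraction of 10 percent and a depth of read of 1000, consider a SNP where the mother has genotype 0. The expected allele ratios are 0 and 5 percent, which are easily distinguishable at a sufficiently high depth of read. Substituting the estimated child genotype into equation 5 results in the complete equation (6) for the fetal fraction estimate.
$$\hat{f} = \arg\max_{f} \prod_{s \in S_0} \max_{r \in R_0(f)} P(n_{as}, n_{bs} \mid r) \prod_{s \in S_1} \max_{r \in R_1(f)} P(n_{as}, n_{bs} \mid r) \qquad (6)$$
The fetal fraction must be in the range [0, 1], so the optimization can be easily implemented by a constrained one-dimensional search.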
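The constrained one-dimensional search described above can be sketched as a simple grid search. This is a minimal illustration, not the disclosed implementation: it uses the plain binomial response model, picks the maximum likelihood allele ratio per SNP from R₀(f) or R₁(f), and keeps every candidate ratio at least 1/(n_a+n_b) away from 0 and 1 so that a few unexpected reads do not zero out the likelihood. The function names, the grid resolution, and the simulated counts are all hypothetical, and each SNP is assumed to have a depth of at least 2.

```python
from math import lgamma, log

def log_binom(a, b, r):
    """Log binomial probability of a A-reads and b B-reads at allele ratio r."""
    return (lgamma(a + b + 1) - lgamma(a + 1) - lgamma(b + 1)
            + a * log(r) + b * log(1.0 - r))

def corrected_ratio(r, depth):
    """Keep r within [1/depth, 1 - 1/depth] to tolerate unexpected alleles."""
    eps = 1.0 / depth
    return min(max(r, eps), 1.0 - eps)

def log_likelihood(f, s0, s1):
    """Log likelihood of fetal fraction f given counts on the S0 and S1 SNPs."""
    total = 0.0
    for a, b in s0:  # mother homozygous for B: ratios {0, f/2}
        total += max(log_binom(a, b, corrected_ratio(r, a + b))
                     for r in (0.0, f / 2.0))
    for a, b in s1:  # mother homozygous for A: ratios {1 - f/2, 1}
        total += max(log_binom(a, b, corrected_ratio(r, a + b))
                     for r in (1.0 - f / 2.0, 1.0))
    return total

def estimate_fetal_fraction(s0, s1, steps=1000):
    """Grid search for the fetal fraction over (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda f: log_likelihood(f, s0, s1))

# Idealized counts at depth 1000 for a 10% fetal fraction.
s0 = [(50, 950)] * 5 + [(0, 1000)] * 5
s1 = [(950, 50)] * 5 + [(1000, 0)] * 5
f_hat = estimate_fetal_fraction(s0, s1)
```

A grid is used here only for transparency; any bounded scalar optimizer could replace it, and the per-SNP maximum could be replaced by a sum over genotypes when depth is low.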
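In the presence of low depth of read or a high noise level, it may be preferable not to assume the maximum likelihood genotype, since doing so may result in artificially high confidences. Another method is to sum over the possible genotypes at each SNP, resulting in the following expression (7) for P(n_a, n_b|f) for a SNP in S₀. The prior probability P(r) could be assumed uniform over R₀(f), or could be based on population frequencies. The extension to group S₁ is trivial.
$$P(n_a, n_b \mid f) = \sum_{r \in R_0(f)} P(n_a, n_b \mid r)\, P(r) \qquad (7)$$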
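In some embodiments, the probabilities may be derived as follows. A confidence can be calculated from the data likelihoods of the two hypotheses H_t and H_f. The likelihood of each hypothesis is derived from the response model, the estimated fetal fraction, the mother genotypes, the allele population frequencies, and the plasma allele counts.
Define the following notation:
G_m, G_c: true maternal and child genotypes
G_af, G_tf: true genotypes of the alleged father and of the true father
G(g_c, g_m, g_tf) = P(G_c = g_c|G_m = g_m, G_tf = g_tf): inheritance probabilities
P(g) = P(G_tf = g): population frequency of genotype g at a particular SNP
Assuming that the observation at each SNP is independent conditioned on the plasma allele ratio, the likelihood of a paternity hypothesis is the product of the likelihoods on the SNPs. The following equations derive the likelihood for a single SNP. Equation (8) is a general expression for the likelihood of any hypothesis h, which is then broken down into the specific cases of H_t and H_f.
$$P(n_a, n_b \mid h, G_m, G_{af}, f) = \sum_{g_c \in \{0,\, 0.5,\, 1\}} F(n_a, n_b, g_c, G_m, f)\, P(G_c = g_c \mid G_m, G_{af}, h) \qquad (8)$$
In the case of H_t, the alleged father is the true father, and the fetal genotypes are inherited from the maternal genotypes and the alleged father genotypes according to equation (9).
$$P(G_c = g_c \mid G_m, G_{af}, h = H_t) = G(g_c, G_m, G_{af}) \qquad (9)$$
In the case of H_f, the alleged father is not the true father. The best estimate of the true father genotypes is given by the population frequencies at each SNP. Thus, the probabilities of the child genotypes are determined by the known mother genotypes and the population frequencies, as in equation (10).
$$P(G_c = g_c \mid G_m, G_{af}, h = H_f) = \sum_{g_{tf}} P(g_{tf})\, G(g_c, G_m, g_{tf}) \qquad (10)$$
The confidence C_p on correct paternity is calculated from the products over SNPs of the two likelihoods using Bayes rule (11).
$$C_p = \frac{\prod_{s} P(n_{as}, n_{bs} \mid H_t, G_{ms}, G_{af,s}, f)}{\prod_{s} P(n_{as}, n_{bs} \mid H_t, G_{ms}, G_{af,s}, f) + \prod_{s} P(n_{as}, n_{bs} \mid H_f, G_{ms}, G_{af,s}, f)} \qquad (11)$$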
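Maximum Likelihood Model Using Percent Fetal Fraction
Determining the ploidy status of a fetus by measuring the free floating DNA contained in maternal serum, or by measuring the genotypic material in any mixed sample, is a non-trivial exercise. There are a number of methods, for example performing a read count analysis under the presumption that if the fetus is trisomic at a particular chromosome, the overall amount of DNA from that chromosome found in the maternal blood will be elevated with respect to a reference chromosome. One way to detect trisomy in such fetuses is to normalize the amount of DNA expected for each chromosome, for example according to the number of SNPs in the analysis set that correspond to a given chromosome, or according to the number of uniquely mappable portions of the chromosome. Once the measurements have been normalized, any chromosome for which the amount of DNA measured exceeds a certain threshold is determined to be trisomic. This approach is described in Fan, et al. PNAS, 2008; 105(42); pp. 16266-16271, and also in Chiu et al. BMJ 2011;342:c7401. In the Chiu et al. paper, the normalization was accomplished by calculating a Z score as follows:
Z score for percentage chromosome 21 in test case = ((percentage chromosome 21 in test case) - (mean percentage chromosome 21 in reference controls)) / (standard deviation of percentage chromosome 21 in reference controls).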
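For concreteness, the Z score defined above can be computed as in the short sketch below. The reference percentages are made-up numbers, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev

def chr21_z_score(test_pct, reference_pcts):
    """Z score of the test case's chromosome 21 percentage against reference controls."""
    return (test_pct - mean(reference_pcts)) / stdev(reference_pcts)

# Hypothetical chromosome 21 percentages in euploid reference controls.
reference = [1.30, 1.32, 1.34]
z = chr21_z_score(1.38, reference)
```

A single fixed cutoff applied to this statistic is exactly the single hypothesis rejection approach: the score says how far the test case is from the euploid controls, but nothing about how far an aneuploid case would be expected to fall at a given fetal fraction.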
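These methods determine the ploidy status of the fetus using a single hypothesis rejection method. However, they suffer from some significant shortcomings. Since these methods for determining ploidy in the fetus are invariant to the percentage of fetal DNA in the sample, they use a single cutoff value; as a result, the accuracy of the determinations is not optimal, and cases where the percentage of fetal DNA in the mixture is relatively low suffer the worst accuracy.
In an embodiment, using a method of the present disclosure to determine the ploidy state of the fetus involves taking into account the fraction of fetal DNA in the sample. In another embodiment of the present disclosure, the method involves the use of maximum likelihood estimations. In an embodiment, a method of the present disclosure involves calculating the percentage of DNA in a sample that is fetal or placental in origin. In an embodiment, the threshold for calling aneuploidy is adaptively adjusted based on the calculated percentage of fetal DNA. In some embodiments, the method for estimating the percentage of DNA that is of fetal origin in a mixture of DNA comprises obtaining a mixed sample that comprises genetic material from the mother and genetic material from the fetus, obtaining a genetic sample from the father of the fetus, measuring the DNA in the mixed sample, measuring the DNA in the father sample, and calculating the percentage of DNA that is of fetal origin in the mixed sample using the DNA measurements of the mixed sample and of the father sample.
In an embodiment of the present disclosure, the fraction of fetal DNA, or the percentage of fetal DNA, in the mixture can be measured. In some embodiments, the fraction can be calculated using only the genotyping measurements made on the maternal plasma sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction may also be calculated using the measured or otherwise known genotype of the mother and/or the measured or otherwise known genotype of the father. In some embodiments, the percentage of fetal DNA may be calculated using the measurements made on the mixture of maternal and fetal DNA along with knowledge of the parental contexts. In an embodiment, the fraction of fetal DNA may be calculated using population frequencies to adjust the model for the probability of particular allele measurements.
In an embodiment of the present disclosure, a confidence may be calculated on the accuracy of the determination of the ploidy state of the fetus. In an embodiment, the confidence of the hypothesis of greatest likelihood (H_major) may be calculated as (1- H_major) / ∑(all H). It is possible to determine the confidence of a hypothesis if the distributions of all of the hypotheses are known. It is possible to determine the distributions of all of the hypotheses if the parental genotype information is known. It is possible to calculate a confidence for the ploidy determination if the expected distribution of data for a euploid fetus and the expected distribution of data for an aneuploid fetus are known. It is possible to calculate these expected distributions if the parental genotype data are known. In an embodiment, one may use knowledge of the distribution of a test statistic around a normal hypothesis and around an abnormal hypothesis both to determine the reliability of the call and to refine the threshold so as to make a more reliable call. This is particularly useful when the amount and/or percentage of fetal DNA in the mixture is low. It helps to avoid the situation where a fetus that is actually aneuploid is found to be euploid because a test statistic, such as the Z statistic, does not exceed a threshold that was optimized for the case of a higher percentage of fetal DNA.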
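In an embodiment, a method disclosed herein can be used to determine a fetal aneuploidy by determining the number of copies of maternal and fetal target chromosomes in a mixture of maternal and fetal genetic material. This method may entail obtaining maternal tissue comprising both maternal and fetal genetic material; in some embodiments this maternal tissue may be maternal plasma or a tissue isolated from maternal blood. This method may also entail obtaining a mixture of maternal and fetal genetic material from said maternal tissue by processing the aforementioned maternal tissue. This method may entail distributing the genetic material obtained into a plurality of reaction samples, to randomly provide individual reaction samples that comprise a target sequence from a target chromosome and individual reaction samples that do not comprise a target sequence from a target chromosome, for example by performing high throughput sequencing on the sample. This method may entail analyzing the target sequences of genetic material present or absent in said individual reaction samples to provide a first number of binary results representing the presence or absence of a presumably euploid fetal chromosome in the reaction samples and a second number of binary results representing the presence or absence of a possibly aneuploid fetal chromosome in the reaction samples. Either number of binary results may be calculated, for example, by way of an informatics technique that counts sequence reads that map to a particular chromosome, to a particular region of a chromosome, or to a particular locus or set of loci. This method may involve normalizing the number of binary events based on the chromosome length, the length of the region of the chromosome, or the number of loci in the set. This method may entail calculating an expected distribution of the number of binary results for a presumably euploid fetal chromosome in the reaction samples using the first number. This method may entail calculating an expected distribution of the number of binary results for a presumably aneuploid fetal chromosome in the reaction samples using the first number and an estimated fraction of fetal DNA found in the mixture, for example by multiplying the expected read count distribution of the number of binary results for a presumably euploid fetal chromosome by (1+n/2), where n is the estimated fetal fraction. In some embodiments, the sequence reads may be treated as probabilistic mappings rather than binary results; this would yield higher accuracy but requires more computing power. The fetal fraction may be estimated by a plurality of methods, some of which are described elsewhere in this disclosure. This method may involve using a maximum likelihood approach to determine whether the second number corresponds to the possibly aneuploid fetal chromosome being euploid or being aneuploid. This method may involve calling the ploidy status of the fetus to be the ploidy state that corresponds to the hypothesis with the maximum likelihood of being correct given the measured data.
Note that a maximum likelihood model may be used to increase the accuracy of any method that determines the ploidy state of a fetus. Similarly, a confidence may be calculated for any method that determines the ploidy state of a fetus. The use of a maximum likelihood model would improve the accuracy of any method in which the ploidy determination is made using a single hypothesis rejection technique. A maximum likelihood model may be used for any method in which a likelihood distribution can be calculated for both the normal and the abnormal cases. The use of a maximum likelihood model implies the ability to calculate a confidence for a ploidy call.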
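Further Discussion of the Method
In an embodiment, a method disclosed herein uses a quantitative measure of the number of independent observations of each allele at a polymorphic locus, which does not involve calculating the ratio of the alleles. This is different from methods, such as some microarray-based methods, that provide information about the ratio of two alleles at a locus but do not quantify the number of independent observations of either allele. Some methods known in the art can provide quantitative information regarding the number of independent observations, but the calculations leading to the ploidy determination use only the allele ratios and do not use the quantitative information. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment, twenty A alleles and twenty B alleles are observed; in a second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio (A/(A+B)) is equal to 0.5, but the second experiment conveys more information than the first about the certainty of the frequency of the A or B allele. Rather than using the allele ratios, the instant method uses the quantitative data to more accurately model the most likely allele frequencies at each polymorphic locus.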
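The 20/20 versus 200/200 example can be made quantitative with the binomial standard error of the observed allele ratio. The sketch below assumes a plain binomial sampling model, under which the variance of the observed ratio is r(1-r)/(a+b); the function name is hypothetical.

```python
from math import sqrt

def ratio_std_error(a, b):
    """Binomial standard error of the observed A ratio a / (a + b)."""
    n = a + b
    r = a / n
    return sqrt(r * (1.0 - r) / n)

first = ratio_std_error(20, 20)     # about 0.079
second = ratio_std_error(200, 200)  # 0.025
```

Both experiments report the same ratio of 0.5, yet the uncertainty on that ratio is about three times smaller in the second, which is exactly the information a ratio-only method discards.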
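In an embodiment, the instant method builds a genetic model for aggregating the measurements from multiple polymorphic loci in order to better distinguish trisomy from disomy and also to determine the type of trisomy. Additionally, the instant method incorporates genetic linkage information to enhance the accuracy of the method. This is in contrast to some methods known in the art, in which allele ratios are averaged across all polymorphic loci on a chromosome. The method disclosed herein explicitly models the allele frequency distributions expected in disomy as well as in trisomy resulting from nondisjunction during meiosis I, nondisjunction during meiosis II, and nondisjunction during mitosis early in fetal development. To illustrate why this is important: if there were no crossovers, nondisjunction during meiosis I would result in a trisomy in which two different homologs are inherited from one parent, whereas nondisjunction during meiosis II or during mitosis early in fetal development would result in two copies of the same homolog from one parent. Each scenario results in different expected allele frequencies at each polymorphic locus and also at all physically linked loci (that is, loci on the same chromosome) considered jointly. Crossovers, which result in the exchange of genetic material between homologs, make the inheritance pattern more complex, but the instant method accommodates this by using genetic linkage information, that is, recombination rate information and the physical distance between loci. To better distinguish between meiosis I nondisjunction and meiosis II or mitotic nondisjunction, the instant method incorporates into the model an increasing probability of crossover as the distance from the centromere increases. Meiosis II and mitotic nondisjunction can be distinguished by the fact that mitotic nondisjunction typically results in identical or nearly identical copies of one homolog, while the two homologs present following a meiosis II nondisjunction event often differ due to one or more crossovers during gametogenesis.
In an embodiment, a method of the present disclosure may not determine the haplotypes of the parents if disomy is assumed. In an embodiment, in the case of trisomy, the instant method can make a determination about the haplotypes of one or both parents by using the fact that the plasma takes two copies from one parent, so that parent phase information can be determined by noting which two copies have been inherited from the parent in question. In particular, a child can inherit either two of the same copy from the parent (matched trisomy) or both copies from the parent (unmatched trisomy). At each SNP one can calculate the likelihood of the matched trisomy and of the unmatched trisomy. A ploidy calling method that does not use a linkage model accounting for crossovers would calculate the overall likelihood of the trisomy as a simple weighted average of the matched and unmatched trisomies over all chromosomes. However, due to the biological mechanisms that result in disjunction error and crossing over, a trisomy can change from matched to unmatched (and vice versa) on a chromosome only if a crossover occurs. The instant method probabilistically takes into account the likelihood of crossover, resulting in ploidy calls of greater accuracy than those of methods that do not.
In an embodiment, a reference chromosome is used to determine the child fraction and the noise level amount or probability distribution. In an embodiment, the child fraction, noise level, and/or probability distribution is determined using only the genetic information available from the chromosome whose ploidy state is being determined. The instant method works without a reference chromosome, and also without fixing a particular child fraction or noise level. This is a significant improvement over, and point of differentiation from, methods known in the art in which genetic data from a reference chromosome is necessary to calibrate the child fraction and chromosome behavior.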
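In an embodiment where reference chromosomes are not needed to determine the fetal fraction, the hypothesis is determined as follows:
$$H^{*} = \arg\max_{H} \mathrm{LIK}(D \mid H) \cdot \mathrm{priorprob}(H)$$
With the algorithm using reference chromosomes, one typically assumes that the reference chromosomes are disomic, and then one may either (a) fix the most likely child fraction and random noise level N based on this assumption and the reference chromosome data:
$$cfr^{*}, N^{*} = \arg\max_{cfr,\, N} \mathrm{LIK}\big(D(\text{ref. chrom}) \mid H11, cfr, N\big)$$
and then reduce
$$\mathrm{LIK}(D \mid H) = \mathrm{LIK}(D \mid H, cfr^{*}, N^{*})$$
or (b) estimate the child fraction and noise level distribution based on this assumption and the reference chromosome data. In particular, one does not fix just one value for cfr and N, but assigns a probability p(cfr, N) over the wider range of possible cfr, N values:
$$p(cfr, N) \propto \mathrm{LIK}\big(D(\text{ref. chrom}) \mid H11, cfr, N\big) \cdot \mathrm{priorprob}(cfr, N)$$
where priorprob(cfr, N) is the prior probability of a particular child fraction and noise level, determined by prior knowledge and experiments. If desired, it may simply be uniform over the range of cfr, N. One may then write:
$$\mathrm{LIK}(D \mid H) = \sum_{cfr,\, N} \mathrm{LIK}(D \mid H, cfr, N) \cdot p(cfr, N)$$
Both methods above give good results.
Note that in some instances using reference chromosomes is not desirable, possible or feasible. In such a case, it is possible to derive the best ploidy call for each chromosome separately. In particular:
$$\mathrm{LIK}(D \mid H) = \sum_{cfr,\, N} \mathrm{LIK}(D \mid H, cfr, N) \cdot p(cfr, N \mid H)$$
where p(cfr, N|H) may be determined as above, for each chromosome separately, assuming hypothesis H, not just for the reference chromosomes assuming disomy. Using this method, it is possible to keep both the noise and child fraction parameters fixed, to fix either one of the parameters, or to keep both parameters in probabilistic form for each chromosome and each hypothesis.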
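Measurements of DNA are noisy and/or error prone, especially measurements where the amount of DNA is small or where the DNA is mixed with contaminating DNA. This noise results in less accurate genotypic data and less accurate ploidy calls. In some embodiments, platform modeling or some other method of noise modeling may be used to counter the deleterious effects of noise on the ploidy determination. The instant method uses a joint model of both channels, which accounts for the random noise due to the amount of input DNA, DNA quality, and/or protocol quality.
This is in contrast to some methods known in the art, in which the ploidy determinations are made using the ratio of allele intensities at a locus. Such a method precludes accurate SNP noise modeling. In particular, errors in the measurements typically do not depend specifically on the measured channel intensity ratio, and using the ratio reduces the model to one-dimensional information. Accurate modeling of noise, channel quality and channel interaction requires a two-dimensional joint model, which cannot be built from allele ratios.
In particular, projecting the two-channel information onto the ratio r, where f(x,y) is r = x/y, does not lend itself to accurate channel noise and bias modeling. Noise on a particular SNP is not a function of the ratio, that is, noise(x,y) ≠ f(x,y), but is in fact a joint function of both channels. For example, in the binomial model, the noise of the measured ratio has a variance of r(1-r)/(x+y), which is not purely a function of r. In a model where channel bias or noise is included, suppose that on SNP i the observed channel X value is x = a_iX + b_i, where X is the true channel value and b_i is the extra channel bias and random noise. Similarly, suppose that y = c_iY + d_i. The observed ratio r = x/y cannot accurately predict the true ratio X/Y or model the leftover noise, since (a_iX + b_i)/(c_iY + d_i) is not a function of X/Y.
The method disclosed herein describes an effective way to model noise and bias using joint binomial distributions of all measurement channels individually. Relevant equations may be found elsewhere in this document, in the sections that discuss per SNP consistent bias and P(good), P(ref|bad), P(mut|bad), which effectively adjust SNP behavior. In an embodiment, a method of the present disclosure uses a beta-binomial distribution, which avoids the limiting practice of relying on the allele ratios only, and instead models the behavior on the basis of both channel counts.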
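In an embodiment, a method disclosed herein can call the ploidy of a gestating fetus from genetic data found in maternal plasma by using all available measurements. In an embodiment, a method disclosed herein can call the ploidy of a gestating fetus from genetic data found in maternal plasma by using the measurements from only a subset of parental contexts. Some methods known in the art use only measured genetic data where the parental context is AA|BB, that is, where the parents are both homozygous at a given locus but for different alleles. One problem with this approach is that only a small proportion of polymorphic loci, typically less than 10%, are from the AA|BB context. In an embodiment of a method disclosed herein, the method does not use genetic measurements of the maternal plasma made at loci where the parental context is AA|BB. In an embodiment, the instant method uses plasma measurements only for those polymorphic loci with the AA|AB, AB|AA, and AB|AB parental contexts.
Some methods known in the art involve averaging allele ratios from SNPs in the AA|BB context, where both parent genotypes are present, and claim to determine the ploidy calls from the average allele ratio on these SNPs. Such a method suffers from significant inaccuracy due to differential SNP behavior. Note that it also assumes that both parent genotypes are known. In contrast, in some embodiments, the instant method uses a joint channel distribution model that does not assume the presence of either parent and does not assume uniform SNP behavior. In some embodiments, the instant method accounts for different SNP behavior/weighting. In some embodiments, the instant method does not require knowledge of one or both parental genotypes. An example of how the instant method may accomplish this follows: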
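In some embodiments, the log likelihood of a hypothesis may be determined on a per SNP basis. On a particular SNP i, assuming fetal ploidy hypothesis H and percent fetal DNA cf, the log likelihood of the observed data D is defined as:
$$\mathrm{LIK}(D \mid H, cf, i) = \log \sum_{m, f, c} P(D \mid m, f, c, H, cf, i)\, P(c \mid m, f, H)\, P(m \mid i)\, P(f \mid i)$$
where m ranges over the possible true mother genotypes, f ranges over the possible true father genotypes, with m, f ∈ {AA, AB, BB}, and c ranges over the possible child genotypes given the hypothesis H. In particular, for monosomy c ∈ {A, B}, for disomy c ∈ {AA, AB, BB}, and for trisomy c ∈ {AAA, AAB, ABB, BBB}. Note that including parental genotype data typically results in more accurate ploidy determinations; however, parental genotype data is not necessary for the instant method to work well.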
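Some methods known in the art involve averaging allele ratios from SNPs where the mother is homozygous but a different allele is measured in the plasma (either the AA|AB or the AA|BB context), and claim to determine the ploidy calls from the average allele ratio on these SNPs. Such a method is intended for cases where the paternal genotype is not available. Note that it is questionable how accurately one can claim that the plasma is heterozygous on a particular SNP without the presence of a homozygous and opposite father BB: for cases with a low child fraction, what looks like the presence of a B allele could be just the presence of noise; additionally, what looks like no B present could be simple allele dropout in the fetal measurements. Even in a case where one can actually determine the heterozygosity of the plasma, this method will not be able to distinguish paternal trisomies. In particular, for SNPs where the mother is AA and some B is measured in the plasma, if the father is GG, the resulting child genotype is AGG, giving an average ratio of 33% A (for a child fraction of 100%). If the father is AG, the resulting child genotype could be AGG for matched trisomy, contributing to a 33% A ratio, or AAG for unmatched trisomy, drawing the average ratio more toward 66% A. Given that many trisomies are on chromosomes with crossovers, the overall chromosome can have anywhere between no unmatched trisomy and all unmatched trisomy, so this ratio can vary anywhere between 33% and 66%. For a plain disomy, the ratio should be around 50%. Without the use of a linkage model or an accurate error model of the average, this method would miss many cases of paternal trisomy. In contrast, the method disclosed herein assigns parental genotype probabilities to each parental genotype candidate, based on the available genotype information and population frequency, and does not explicitly require parental genotypes. Additionally, the method disclosed herein is able to detect trisomy in the absence or presence of parent genotype data, and can compensate by identifying the points of possible crossovers from matched to unmatched trisomy using a linkage model.
Some methods known in the art claim a method for averaging allele ratios from SNPs where neither the maternal nor the paternal genotype is known, and for determining the ploidy calls from the average ratio on these SNPs. However, a method to accomplish these ends is not disclosed. The method disclosed herein is able to make accurate ploidy calls in such a situation; the reduction to practice is disclosed elsewhere in this document, using a joint probability maximum likelihood method and optionally SNP noise and bias models as well as a linkage model.
Some methods known in the art involve averaging allele ratios and claim to determine the ploidy calls from the average allele ratio at one or a few SNPs. However, such methods do not use the concept of linkage. The methods disclosed herein do not suffer from these drawbacks.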
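Using Sequence Length as a Prior to Determine the Origin of DNA
It has been reported that the distribution of sequence lengths differs between maternal and fetal DNA, with fetal DNA generally being shorter. In an embodiment of the present disclosure, it is possible to use previous knowledge in the form of empirical data to construct prior distributions for the expected length of both maternal DNA (P(X|maternal)) and fetal DNA (P(X|fetal)). Given a new unidentified DNA sequence of length x, it is possible to assign a probability that the sequence is either maternal or fetal DNA, based on the prior likelihood of x given either maternal or fetal origin. In particular, if P(x|maternal) > P(x|fetal), the DNA sequence can be classified as maternal, with P(maternal|x) = P(x|maternal)/[P(x|maternal) + P(x|fetal)], and if P(x|maternal) < P(x|fetal), the DNA sequence can be classified as fetal, with P(fetal|x) = P(x|fetal)/[P(x|maternal) + P(x|fetal)]. In an embodiment of the present disclosure, distributions of maternal and fetal sequence lengths that are specific to a sample can be determined by considering the sequences that can be assigned as maternal or fetal with high probability, and that sample-specific distribution can then be used as the expected size distribution for that sample.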
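The length-based classification rule above can be sketched as follows. The sketch treats the two origins as equally likely a priori, as the formula in the text does, and takes the two length distributions as caller-supplied functions; the toy distributions and lengths used here are invented for illustration and are not empirical values.

```python
def classify_by_length(x, p_len_maternal, p_len_fetal):
    """Classify a fragment of length x as maternal or fetal.

    p_len_maternal, p_len_fetal: functions returning P(x | maternal) and
    P(x | fetal). Returns (label, probability of that label given x).
    Ties are assigned to "fetal" in this sketch.
    """
    pm, pf = p_len_maternal(x), p_len_fetal(x)
    post_maternal = pm / (pm + pf)
    if pm > pf:
        return "maternal", post_maternal
    return "fetal", 1.0 - post_maternal

# Toy length distributions over two fragment lengths (hypothetical numbers).
maternal_dist = {166: 0.6, 143: 0.4}
fetal_dist = {166: 0.3, 143: 0.7}
short_call = classify_by_length(143, maternal_dist.get, fetal_dist.get)
long_call = classify_by_length(166, maternal_dist.get, fetal_dist.get)
```

Fragments classified with high probability could then be pooled to re-estimate sample-specific length distributions, as the paragraph above suggests.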
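Variable Read Depth to Minimize Sequencing Cost
In many clinical trials concerning a diagnostic, for example in Chiu et al. BMJ 2011;342:c7401, a protocol with a number of parameters is set, and then the same protocol is executed with the same parameters for each of the patients in the trial. In the case of determining the ploidy status of a fetus gestating in a mother using sequencing as the method of measuring genetic material, one pertinent parameter is the number of reads. The number of reads may refer to the number of actual reads, the number of intended reads, fractional lanes, full lanes, or full flow cells on a sequencer. In these studies, the number of reads is typically set at a level that will ensure that all or nearly all of the samples achieve the desired level of accuracy. Sequencing is currently an expensive technology, at a cost of roughly $200 per 5 million mappable reads, and while the price is dropping, any method that allows a sequencing-based diagnostic to operate at a similar level of accuracy with fewer reads will necessarily save a considerable amount of money.
The accuracy of a ploidy determination typically depends on a number of factors, including the number of reads and the fraction of fetal DNA in the mixture. The accuracy is typically higher when the fraction of fetal DNA in the mixture is higher. At the same time, the accuracy is typically higher when the number of reads is greater. It is possible to have two cases in which the ploidy state is determined with comparable accuracy, where the first case has a lower fraction of fetal DNA in the mixture than the second and more reads were sequenced in the first case than in the second. It is possible to use the estimated fraction of fetal DNA in the mixture as a guide in determining the number of reads necessary to achieve a given level of accuracy.
In an embodiment of the present disclosure, a set of samples can be run in which different samples in the set are sequenced to different read depths, the number of reads run on each sample being chosen to achieve a given level of accuracy given the calculated fraction of fetal DNA in each mixture. In an embodiment of the present disclosure, this may entail making a measurement of the mixed sample to determine the fraction of fetal DNA in the mixture; this estimation of the fetal fraction may be done with sequencing, with TAQMAN, with qPCR, with SNP arrays, or with any method that can distinguish different alleles at a given locus. The need for a fetal fraction estimate may be eliminated by including hypotheses that cover all or a selected set of fetal fractions in the set of hypotheses that are considered when comparing to the actual measured data. After the fraction of fetal DNA in the mixture has been determined, the number of sequences to be read for each sample may be determined.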
본원의 일 구현예에서, 100명의 임신한 여성이 이들 각각의 OB에 방문하여 그들의 혈액을 항-용해제 및/또는 DNAase를 불활성화시키는 어떤 것이 들어있는 혈액 튜브내로 채혈한다. 이들 각각은 타액 시료를 제공한 이들의 잉태된 태아의 부친에 대한 키트를 집에다 준다. 모든 100개의 쌍들에 대해 유전 물질의 세트 둘 모두를 실험실로 다시 보내며, 여기서 모계 혈액은 회전시켜 외피 및 혈장을 분리시킨다. 혈장은 모계 DNA 및 태반 기원의 DNA의 혼합물을 포함한다. 모계 연막 및 태아 혈액을 SNP 배열을 사용하여 유전형 분석하고, 모계 혈장 시료 속의 DNA는 SURESELECT 하이브리드화 브로브로 표적화한다. 프로브로 분해한(pul l down) DNA를 사용하여, 모계 시료 각각에 대해 1개씩, 100개의 표적화된 라이브러리를 생성시키고, 여기서 각각의 시료는 상이한 태그로 태그시킨다. 각각의 라이브러리에서 분획을 제거하고, 이들 분획 각각을 함께 혼합하여 다중화된 양식으로 ILLUMINA HISEQ DNA 서열분석기의 2개의 레인에 가하며, 여기서 각각의 레인은 대략 5천만 개의 맵핑가능한 판독물을 생성하며, 100개의 다중화된 혼합물에 대해 대략 1억개의 맵핑가능한 판독물을 생성하거나, 시료당 대략 백만 개의 판독물을 생성한다. 서열 판독물을 사용하여 각각이 혼합물 중 태아 DNA의 분획을 측정하였다. 시료 중 50개는 혼합물 속에서 15% 이상의 태아 DAN를 가지며, 백만 개의 판독물은 99.9%의 신뢰도로 태아의 배수성 상태를 측정하기에 충분하였다.
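Of the remaining mixtures, 25 had between 10 and 15% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 2 million reads for each sample. The two sets of sequence data for each mixture with 10 to 15% fetal DNA were combined, and the resulting 3 million reads per sample were sufficient to determine the ploidy state of those fetuses with 99.9% confidence.
Of the remaining mixtures, 13 had between 6 and 10% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 4 million reads for each sample. The two sets of sequence data for each mixture with 6 to 10% fetal DNA were combined, and the resulting total of 5 million reads per mixture was sufficient to determine the ploidy state of those fetuses with 99.9% confidence.
Of the remaining mixtures, 8 had between 4 and 6% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 6 million reads for each sample. The two sets of sequence data for each mixture with 4 to 6% fetal DNA were combined, and the resulting total of 7 million reads per mixture was sufficient to determine the ploidy state of those fetuses with 99.9% confidence.
The remaining four mixtures all had between 2 and 4% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ, generating an additional 12 million reads for each sample. The two sets of sequence data for each mixture with 2 to 4% fetal DNA were combined, and the resulting total of 13 million reads per mixture was sufficient to determine the ploidy state of those fetuses with 99.9% confidence.
This method required six lanes of sequencing on a HISEQ machine to achieve 99.9% accuracy across the 100 samples. Had the same number of reads been required for every sample in order to ensure that every ploidy determination was made with 99.9% confidence, it would have taken 25 lanes of sequencing; if a no-call rate or error rate of 4% were tolerated, this could have been achieved with 14 lanes of sequencing.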
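The lane arithmetic in this example can be checked with a short sketch. It assumes roughly 50 million mappable reads per lane, as stated above, and rounds each multiplexed pool to the nearest whole lane because the per-lane yield is only approximate; the function and constant names are illustrative and not part of the disclosure.

```python
# Minimal sketch of the lane budget in the 100-sample example.
READS_PER_LANE = 50_000_000  # approximate mappable reads per lane (from the text)

# (number of samples, additional reads per sample) for each fetal-fraction tier
TOP_UP_TIERS = [
    (25, 2_000_000),   # 10-15% fetal DNA
    (13, 4_000_000),   # 6-10% fetal DNA
    (8, 6_000_000),    # 4-6% fetal DNA
    (4, 12_000_000),   # 2-4% fetal DNA
]

def lanes_for(total_reads, reads_per_lane=READS_PER_LANE):
    """Nearest whole number of lanes for a pool, never fewer than one."""
    return max(1, round(total_reads / reads_per_lane))

def adaptive_lanes(initial_lanes=2, tiers=TOP_UP_TIERS):
    """Two initial lanes for all samples, plus one pooled run per tier."""
    return initial_lanes + sum(lanes_for(n * reads) for n, reads in tiers)

def uniform_lanes(n_samples, reads_per_sample):
    """Lanes needed if every sample is sequenced to the same depth."""
    return lanes_for(n_samples * reads_per_sample)

print(adaptive_lanes())                 # tiered plan
print(uniform_lanes(100, 7_000_000))    # uniform 7 million reads per sample
print(uniform_lanes(100, 13_000_000))   # uniform 13 million reads per sample
```

Under these assumptions the tiered plan comes to six lanes and the uniform 7-million-read plan to 14 lanes, matching the figures above. The uniform 13-million-read plan comes to 26 lanes with this simple rounding, close to the 25 lanes quoted; the difference reflects the approximate per-lane yield.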
Use of Raw Genotyping Data
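There are a number of methods that can accomplish NPD using fetal genetic information measured on fetal DNA found in maternal blood. Some of these methods involve measuring the fetal DNA using SNP arrays, some involve untargeted sequencing, and some involve targeted sequencing. The targeted sequencing may target SNPs, STRs, other polymorphic loci, non-polymorphic loci, or some combination thereof. Some of these methods may involve using a commercial or proprietary allele caller that calls the identity of the alleles from the intensity data produced by the sensors in the measuring instrument. For example, the ILLUMINA INFINIUM system or the AFFYMETRIX GENECHIP microarray system involves beads or microchips with attached DNA sequences that can hybridize to complementary segments of DNA; upon hybridization, there is a detectable change in the fluorescent properties of the sensor molecule. There are also sequencing methods, for example the ILLUMINA SOLEXA GENOME SEQUENCER or the ABI SOLID GENOME SEQUENCER, in which the genetic sequence of fragments of DNA is sequenced; upon extension of the strand of DNA complementary to the strand being sequenced, the identity of the extended nucleotide is typically detected via a fluorescent or radioactive tag appended to the complementary nucleotide. In all of these methods the genotypic or sequencing data is typically determined on the basis of fluorescent or other signals, or the lack thereof. These systems are typically combined with low-level software packages that make specific allele calls (secondary genetic data) from the analog output of the fluorescent or other detection device (primary genetic data). For example, for a given allele on a SNP array, the software will call a certain SNP present or absent if the fluorescent intensity is measured above or below a certain threshold. Similarly, the output of a sequencer is a chromatogram that indicates the level of fluorescence detected for each of the dyes, and the software will call a certain base pair as A, T, C, or G. High-throughput sequencers typically make a series of such measurements, called a read, that represents the most likely structure of the DNA sequence that was sequenced. The direct analog output of the chromatogram is defined here to be the primary genetic data, and the base pair/SNP calls made by the software are considered here to be the secondary genetic data. In an embodiment, primary data refers to the raw intensity data that is the unprocessed output of a genotyping platform, where the genotyping platform may be a SNP array or a sequencing platform. The secondary genetic data refers to the processed genetic data, where an allele call has been made, or the sequence data has been assigned base pairs, and/or the sequence reads have been mapped to the genome.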
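Many higher-level applications take advantage of these allele calls, SNP calls, and sequence reads, that is, the secondary genetic data that the genotyping software produces. For example, DNA NEXUS, ELAND, or MAQ will take the sequencing reads and map them to the genome. In the context of non-invasive prenatal diagnosis, complex informatics such as PARENTAL SUPPORT™ may leverage a large number of SNP calls to determine the genotype of an individual. Also, in the context of preimplantation genetic diagnosis, it is possible to take a set of sequence reads that are mapped to the genome and, by taking a normalized count of the reads that are mapped to each chromosome or section of a chromosome, to determine the ploidy state of an individual. In the context of non-invasive prenatal diagnosis, it may be possible to take a set of sequence reads that have been measured on DNA present in maternal plasma and map them to the genome. One may then take a normalized count of the reads that are mapped to each chromosome, or section of a chromosome, and use that data to determine the ploidy state of an individual. For example, it may be possible to conclude that those chromosomes with a disproportionately large number of reads are trisomic in the fetus gestating in the mother from whom the blood was drawn.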
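In reality, however, the initial output of the measuring instrument is an analog signal. When a certain base pair is called by the software associated with the sequencing instrument, for example when the software calls the base pair a T, the call is in reality the one that the software considers most likely. In some cases, however, the call may be of low confidence; for example, the analog signal may indicate that the particular base pair is only 90% likely to be a T and 10% likely to be an A. In another example, the genotype-calling software associated with a SNP array reader may call a certain allele to be G. In reality, however, the underlying analog signal may indicate that the allele is only 70% likely to be G and 30% likely to be T. In these cases, when higher-level applications use the genotype calls and sequence calls made by the lower-level software, they lose some information. That is, the primary genetic data, as measured directly by the genotyping platform, may be messier than the secondary genetic data determined by the attached software packages, but it contains more information. When secondary genetic data sequences are mapped to the genome, many reads are thrown out because some bases are not read with enough clarity and/or the mapping is not clear. When the primary genetic data sequence reads are used, all or many of the reads that would have been thrown out after conversion to secondary genetic data can still be used by treating the reads in a probabilistic manner.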
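The information loss described here can be illustrated with a toy comparison between hard calls and probabilistic treatment of the same reads. This is only a sketch: the per-read base probabilities, the 0.95 confidence cutoff, and the function names are hypothetical and are not taken from any particular genotyping platform.

```python
# Toy illustration: hard calls discard low-confidence reads, whereas a
# probabilistic treatment keeps every read weighted by its confidence.

def hard_call_counts(read_probs, min_confidence=0.95):
    """Count alleles using only reads whose top call passes the cutoff."""
    counts = {}
    for probs in read_probs:
        base, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= min_confidence:
            counts[base] = counts.get(base, 0) + 1
    return counts

def expected_counts(read_probs):
    """Sum per-base probabilities over all reads (expected allele counts)."""
    counts = {}
    for probs in read_probs:
        for base, p in probs.items():
            counts[base] = counts.get(base, 0.0) + p
    return counts

# Three hypothetical reads covering the same position.
reads = [
    {"T": 0.90, "A": 0.10},   # the 90%/10% case from the text
    {"T": 0.99, "A": 0.01},
    {"G": 0.70, "T": 0.30},   # the 70%/30% case from the text
]

print(hard_call_counts(reads))   # only the second read survives the cutoff
print(expected_counts(reads))    # all three reads contribute
```

With the hard-call cutoff only one of the three reads is counted, while the probabilistic sum retains the partial evidence carried by the other two.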
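In an embodiment of the present disclosure, the higher-level software does not rely on the allele calls, SNP calls, or sequence reads determined by the lower-level software. Instead, the higher-level software bases its calculations on the analog signals measured directly by the genotyping platform. In an embodiment, an informatics-based method such as PARENTAL SUPPORT™ is modified so that its ability to reconstruct the genetic data of the embryo/fetus/child is engineered to use the primary genetic data directly, as measured by the genotyping platform. In an embodiment, an informatics-based method such as PARENTAL SUPPORT™ can make allele calls and/or chromosome copy number calls using primary genetic data, without using secondary genetic data. In an embodiment, all genetic calls, SNP calls, sequence reads, and sequence mappings are treated in a probabilistic manner by using the raw intensity data as measured directly by the genotyping platform, rather than converting the primary genetic data to secondary genetic calls. In an embodiment, the DNA measurements from the prepared sample that are used in calculating allele count probabilities and in determining the relative probability of each hypothesis comprise primary genetic data.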
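In some embodiments, the method can increase the accuracy of genetic data of a target individual by incorporating genetic data of at least one related individual, the method comprising: obtaining primary genetic data specific to the target individual's genome and genetic data specific to the genome(s) of the related individual(s); creating a set of one or more hypotheses concerning which segments of which chromosomes from the related individual(s) possibly correspond to those segments in the target individual's genome; determining the probability of each of the hypotheses given the target individual's primary genetic data and the genetic data of the related individual(s); and using the probabilities associated with each hypothesis to determine the most likely state of the actual genetic material of the target individual. In some embodiments, the method can determine the number of copies of a segment of a chromosome in the genome of a target individual, the method comprising: creating a set of copy number hypotheses about how many copies of the chromosome segment are present in the genome of the target individual; incorporating primary genetic data from the target individual and genetic information from one or more related individuals into a data set; estimating the characteristics of the platform response associated with the data set, where the platform response may vary from one experiment to another; computing the conditional probability of each copy number hypothesis, given the data set and the platform response characteristics; and determining the copy number of the chromosome segment based on the most probable copy number hypothesis. In an embodiment, a method of the present disclosure can determine the ploidy state of at least one chromosome in a target individual, the method comprising: obtaining primary genetic data from the target individual and from one or more related individuals; creating a set of at least one ploidy state hypothesis for each of the chromosomes of the target individual; using one or more expert techniques to determine, for each expert technique used, a statistical probability for each ploidy state hypothesis in the set, given the obtained genetic data; combining, for each ploidy state hypothesis, the statistical probabilities determined by the one or more expert techniques; and determining the ploidy state for each of the chromosomes in the target individual based on the combined statistical probabilities of each of the ploidy state hypotheses. In an embodiment, a method of the present disclosure can determine the allelic state at a set of alleles in a target individual, in one or both parents of the target individual, and optionally in one or more related individuals, the method comprising: obtaining primary genetic data from the target individual, from the one or both parents, and from any related individuals; creating a set of at least one allelic hypothesis for the target individual, for the one or both parents, and optionally for the one or more related individuals, where the hypotheses describe possible allelic states at the set of alleles; determining a statistical probability for each allelic hypothesis in the set of hypotheses given the obtained genetic data; and determining the allelic state for the target individual, for the one or both parents, and optionally for the one or more related individuals, based on the statistical probabilities of each of the allelic hypotheses.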
일부 구현예에서, 혼합된 시료의 유전 데이터는 서열 데이터를 포함할 수 있으며, 여기서 당해 서열 데이터는 사람 게놈에 유일하게 맵핑되지 않을 수 있다. 일부 구현예에서, 혼합된 시료의 유전 데이터는 서열 데이터를 포함할 수 있으며, 여기서 서열 데이터는 게놈내 다수의 위치에 맵핑되며, 여기서 각각의 가능한 맵핑은, 소정의 맵핑이 정확한지에 대한 확률과 관련된다. 일부 구현예에서, 서열 판독물은 게놈에 특수 위치와 관련된 것으로 추정되지 않는다. 일부 구현예에서, 서열 판독물은 게놈내 다수의 위치, 및 당해 위치에 속하는 관련된 확률과 관련되어 있다.
Counting Methods for Determining Chromosome Copy Number
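In one aspect, the invention features methods of testing for an abnormal distribution of a fetal chromosome by comparing the number of sequence tags that align to different chromosomes (see, e.g., U.S. Patent No. 8,296,076, filed April 20, 2012, which is hereby incorporated by reference in its entirety). As is known in the art, the term "sequence tag" refers to a relatively short (e.g., 15 to 100 nucleotides) nucleic acid sequence that can be used to identify a certain larger sequence, for example one to be mapped to a chromosome, genomic region, or gene. In some embodiments, the method includes (i) contacting a sample that includes a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture, where the target loci are from a plurality of different chromosomes, and where the plurality of different chromosomes includes at least one first chromosome suspected of having an abnormal distribution in the sample and at least one second chromosome presumed to be normally distributed in the sample; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products; (iii) sequencing the amplified products to obtain a plurality of sequence tags that align to the target loci, where the sequence tags are of sufficient length to be assigned to a specific target locus; (iv) assigning, on a computer, the plurality of sequence tags to their corresponding target loci; (v) determining, on a computer, the number of sequence tags assigned to target loci of the first chromosome and the number of sequence tags assigned to target loci of the second chromosome; and (vi) comparing the numbers from step (v) to determine the presence or absence of an abnormal distribution of the first chromosome.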
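In one aspect, the invention provides methods for detecting the presence or absence of a fetal aneuploidy by comparing the relative frequency of target amplicons between chromosomes (see, e.g., PCT Publication No. WO 2012/103031, filed January 23, 2012, which is hereby incorporated by reference in its entirety). In some embodiments, the method includes (i) contacting a sample that includes a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different non-polymorphic target loci to produce a reaction mixture, where the target loci are from a plurality of different chromosomes; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that include target amplicons; (iii) quantifying, on a computer, the relative frequency of the target amplicons from the first and second chromosomes of interest; (iv) comparing, on a computer, the relative frequency of the target amplicons from the first and second chromosomes of interest; and (v) identifying the presence or absence of an aneuploidy based on the compared relative frequencies of the first and second chromosomes of interest. In some embodiments, the first chromosome is a chromosome presumed to be euploid. In some embodiments, the second chromosome is a chromosome suspected of being aneuploid.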
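The counting comparison in these steps can be sketched as follows. This is a simplified illustration, not the referenced methods themselves: it assumes each sequence tag has already been assigned to a chromosome, normalizes counts by the number of targeted loci per chromosome, and leaves the decision threshold to the caller. All names and the example numbers are hypothetical.

```python
from collections import Counter

def tags_per_locus(assigned_chromosomes, loci_per_chromosome):
    """Average number of assigned tags per targeted locus, per chromosome."""
    counts = Counter(assigned_chromosomes)
    return {chrom: counts[chrom] / n_loci
            for chrom, n_loci in loci_per_chromosome.items()}

def relative_frequency_ratio(assigned_chromosomes, loci_per_chromosome,
                             test_chrom, ref_chrom):
    """Ratio of normalized tag depth on the test vs. the reference chromosome."""
    depth = tags_per_locus(assigned_chromosomes, loci_per_chromosome)
    return depth[test_chrom] / depth[ref_chrom]

# Hypothetical data: 100 loci targeted on chr21, 200 on a reference chromosome.
tags = ["21"] * 105 + ["1"] * 200
loci = {"21": 100, "1": 200}

ratio = relative_frequency_ratio(tags, loci, test_chrom="21", ref_chrom="1")
print(ratio)
```

For orientation, a fetal trisomy in a mixture with fetal fraction f would be expected to raise this ratio to about 1 + f/2 (for example, about 1.05 at a 10% fetal fraction), so any practical threshold has to account for both the fetal fraction and the sampling noise.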
Combination Approaches to Prenatal Diagnosis
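There are many methods that may be used for prenatal diagnosis or prenatal screening of aneuploidy or other genetic defects. Described elsewhere in this document, and in U.S. Utility Application Serial No. 11/603,406, filed November 28, 2006; U.S. Utility Application Serial No. 12/076,348, filed March 17, 2008; and PCT Application Serial No. PCT/US09/52730, is one such method that uses the genetic data of related individuals to increase the accuracy with which the genetic data of a target individual, such as a fetus, is known or estimated. Other methods used for prenatal diagnosis involve measuring the levels of certain hormones in maternal blood, where those hormones are correlated with various genetic abnormalities. An example of this is the triple test, a test in which the levels of several (commonly two, three, four, or five) different hormones are measured in maternal blood. Where multiple methods are used to determine the likelihood of a given outcome, and none of the methods is definitive in and of itself, it is possible to combine the information given by those methods to make a prediction that is more accurate than any of the individual methods. In the triple test, combining the information given by the three different hormones can result in a prediction of genetic abnormalities that is more accurate than the individual hormone levels may predict.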
태아의 유전 상태, 구체적으로 태아내 유전적 비정상의 예측을 합하는 것을 포함하여 태아내에서 유전적 비정상의 가능성에 대해 보다 정밀한 예측을 하는 방법이 본원에 기재되어 있으며, 여기서 이러한 예측은 다양한 방법을 사용하여 이루었다. "보다 정밀한" 방법은 소정의 거짓 양성 비율에서 보다 낮은 거짓 음성 비율을 갖는 비정상성을 진단하는 방법을 말할 수 있다. 본원의 양호한 구현예에서, 하나 이상의 예측은 태아에 대해 공지된 유전 데이터를 기반으로 이루어지며, 여기서 유전적 지식은 PARENTAL SUPPORT^TM 방법, 즉, 태아와 관련된 개체의 유전 데이터를 사용하여 보다 큰 정밀도로 태아의 유전 데이터를 측정하는 방법을 사용하여 측정되었다. 일부 구현예에서 유전 데이터는 태아의 배수성 상태를 포함할 수 있다. 일부 구현예에서 유전 데이터는 태아의 게놈에 대한 대립유전자 요청의 세트를 말할 수 있다. 일부 구현예에서 예측의 일부는 3중 시험을 사용하여 이루어질 수 있었다. 일부 구현예에서, 예측의 일부는 모계 혈액 속의 다른 호르몬 수준의 측정을 사용하여 이루어질 수 있었다. 일부 구현예에서, 진단을 고려한 방법에 의해 이루어진 예측은 스크리닝을 고려한 방법에 의해 이루어진 예측과 합할 수 있다. 일부 구현예에서, 당해 방법은 알파-페토단백질(AFP)의 모계 혈액 수준을 측정하는 단계를 포함한다. 일부 구현예에서, 당해 방법은 접합되지 않은 에스트리올(UE₃)의 모계 혈액 수준을 측정하는 단계를 포함한다. 일부 구현예에서, 당해 방법은 베타 사람 융모성 고나도트로핀(베타-hCG)의 모계 혈액 수준을 측정하는 단계를 포함한다. 일부 구현예에서, 당해 방법은 침입성 영양세포줄기 항원(ITA)의 모계 혈액 수준을 측정하는 방법을 포함한다. 일부 구현예에서, 당해 방법은 인히빈의 물질 혈액 수준을 측정하는 단계를 포함한다. 일부 구현예에서, 당해 방법은 임신-관련 혈장 단백질 A(PAPP-A)의 모계 혈액 수준을 측정하는 단계를 포함한다. 일부 구현예에서, 당해 방법은 다른 호르몬의 모계 혈액 수준 또는 모계 혈청 마커를 측정하는 단계를 포함한다. 일부 구현예에서, 예측의 일부는 다른 방법을 사용하여 이루어질 수 있다. 일부 구현예에서, 예측중의 일부는 임신 12주 근처에서 초음파 및 혈액 시험 및 16주 주변에서 제2의 혈액 시료를 합한 것과 같은 완전히 통합된 시험을 사용하여 이루어 왔다. 일부 구현예에서, 당해 방법은 태아 목덜미 반투명(NT)을 측정하는 단계를 포함한다. 일부 구현예에서, 당해 방법은 예측을 이루기 위해 전술한 호르몬의 측정된 수준을 사용하는 단계를 포함한다. 일부 구현예에서, 당해 방법은 상술한 방법의 조합을 포함한다.
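There are many ways to combine the predictions. For example, one could convert the hormone measurements into multiples of the median (MoM) and then into likelihood ratios (LR). Similarly, other measurements could be transformed into LRs using a mixture model of NT distributions. The LRs for NT and the biochemical markers could be multiplied by the age- and gestation-related risk to derive the risk for various conditions, such as trisomy 21. Detection rates (DRs) and false-positive rates (FPRs) could be calculated by taking the proportions of the population with risks above a given risk threshold.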
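The likelihood-ratio combination described above can be sketched in a few lines. This is an illustrative calculation only, assuming the individual LRs are conditionally independent so that they multiply; the prior risk, the LR values, and the function names are hypothetical.

```python
def combined_risk(prior_risk, likelihood_ratios):
    """Posterior risk from a prior (age/gestation-related) risk and marker LRs.

    Works in odds: posterior odds = prior odds * product of the LRs.
    """
    odds = prior_risk / (1.0 - prior_risk)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

def fraction_above(risks, cutoff):
    """Proportion of a group whose risk exceeds the cutoff.

    Applied to affected pregnancies this gives a detection rate; applied to
    unaffected pregnancies it gives a false-positive rate.
    """
    return sum(r > cutoff for r in risks) / len(risks)

# Hypothetical example: prior risk of 1 in 1,000, NT LR of 5, biochemistry LR of 2.
risk = combined_risk(0.001, [5.0, 2.0])
print(risk)
```

In this hypothetical case the combined LR of 10 raises the risk from 1 in 1,000 to roughly 1 in 100.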
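In an embodiment, a method to call the ploidy state involves combining the relative probabilities of each of the ploidy hypotheses determined using the joint distribution model and the allele count probabilities with relative probabilities of each of the ploidy hypotheses that are calculated using statistical techniques taken from other methods that determine a risk score for a fetus being trisomic, including but not limited to: a read count ratio analysis, a comparison of heterozygosity rates, a statistic that is only available when parental genetic information is used, the probability of normalized genotype signals for certain parent contexts, a statistic calculated using an estimated fetal fraction of the first sample or the prepared sample, and combinations thereof.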
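Another method could involve a situation in which four hormone levels are measured and the probability distribution around those hormones is known: p(x₁, x₂, x₃, x₄|e) for the euploid case and p(x₁, x₂, x₃, x₄|a) for the aneuploid case. One could then measure the probability distribution for the DNA measurements, g(y|e) and g(y|a), for the euploid and aneuploid cases respectively. Assuming that the hormone and DNA measurements are independent given the assumption of euploidy or aneuploidy, one could combine them as p(x₁, x₂, x₃, x₄|a)g(y|a) and p(x₁, x₂, x₃, x₄|e)g(y|e), and then multiply each by the prior p(a) and p(e) given the maternal age. One could then choose the hypothesis with the highest value.
In an embodiment, one can invoke the central limit theorem to assume that the distribution g(y|a or e) is Gaussian, and estimate its mean and standard deviation by looking at multiple samples. In another embodiment, one could assume that the measurements are not independent given the outcome and collect enough samples to estimate the joint distribution p(x₁, x₂, x₃, x₄|a or e).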
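The combination of hormone and DNA evidence described here can be sketched as a two-hypothesis Bayesian comparison. The sketch assumes the hormone likelihoods p(x|a) and p(x|e) are supplied as numbers, that g(y|a) and g(y|e) are Gaussian with known mean and standard deviation, and that the two kinds of measurement are independent given the hypothesis; the parameter values and function names are hypothetical.

```python
import math

def gaussian_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and std. dev."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_aneuploid(hormone_lik_a, hormone_lik_e, y,
                        dna_params_a, dna_params_e, prior_a):
    """Posterior probability of aneuploidy.

    score(h) = p(x1..x4 | h) * g(y | h) * p(h), normalized over h in {a, e}.
    dna_params_* are (mean, sd) tuples for the Gaussian model of y.
    """
    score_a = hormone_lik_a * gaussian_pdf(y, *dna_params_a) * prior_a
    score_e = hormone_lik_e * gaussian_pdf(y, *dna_params_e) * (1.0 - prior_a)
    return score_a / (score_a + score_e)

def choose_hypothesis(p_aneuploid):
    """Pick the hypothesis with the larger posterior."""
    return "aneuploid" if p_aneuploid > 0.5 else "euploid"

# Hypothetical numbers: a DNA statistic y that sits at the aneuploid mean.
p = posterior_aneuploid(hormone_lik_a=1.0, hormone_lik_e=1.0, y=1.0,
                        dna_params_a=(1.0, 1.0), dna_params_e=(-1.0, 1.0),
                        prior_a=0.5)
print(p, choose_hypothesis(p))
```

Choosing the larger of the two unnormalized scores gives the same decision as thresholding the normalized posterior at 0.5; the normalized form is shown because it can also be compared against a stricter confidence threshold.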
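In an embodiment, the ploidy state for the target individual is determined to be the ploidy state associated with the hypothesis whose probability is the greatest. In some cases, one hypothesis will have a normalized, combined probability greater than 90%. Each hypothesis is associated with one ploidy state, or a set of ploidy states, and a hypothesis is called as the determined ploidy state when its normalized, combined probability exceeds a required threshold; that threshold may be chosen as 90%, or as some other value such as 50%, 80%, 95%, 98%, 99%, or 99.9%.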
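A threshold-based call of this kind might look like the following sketch, which normalizes the combined hypothesis scores and returns a call only when the best hypothesis clears the chosen threshold. The scores and names are hypothetical, and returning `None` for a no-call is a design choice made for this illustration.

```python
def call_ploidy(hypothesis_scores, threshold=0.9):
    """Return (called_state_or_None, normalized_probability_of_best).

    hypothesis_scores maps each ploidy hypothesis to its combined,
    unnormalized probability.
    """
    total = sum(hypothesis_scores.values())
    normalized = {h: s / total for h, s in hypothesis_scores.items()}
    best = max(normalized, key=normalized.get)
    p_best = normalized[best]
    if p_best > threshold:
        return best, p_best
    return None, p_best  # no call: the best hypothesis is not confident enough

print(call_ploidy({"disomy": 9.5, "trisomy": 0.5}))
print(call_ploidy({"disomy": 6.0, "trisomy": 4.0}))
```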
DNA from Children from Previous Pregnancies in Maternal Blood
비-침입성 출생전 진단의 한가지 어려움은 현재의 임신의 태아 세포를 이전 임신의 태아 세포와 구별하는 것이다. 어떤 이는, 이전 임신의 유전 물질이 얼마의 시간 후 사라질 것이지만, 결론적인 증거는 나타나지 않았다고 생각한다. 본원의 구현예에서, 부계 기원의 모계 혈액(즉, 태자녀 부친으로부터 유전받은 DNA) 속에 존재하는 태아 DNA는 PARENTAL SUPPORT^TM (PS) 방법, 및 부모계 게놈의 지식을 사용하여 측정하는 것이 가능하다. 당해 방법은 단계화된 부모계 유전 정보를 이용할 수 있다. 조부모계 유전 데이터(할아버지 정자의 측정된 유전 데이터와 같은), 또는 다른 태아안 자녀, 또는 유산의 시료의 유전 데이터를 사용하여 단계화되지 않은 유전형 정보에 의하여 부모계 유전형을 단계화하는 것이 가능하다. 또한 HapMap-계 단계화, 또는 부모계 세포의 배체형의 방법으로 단계화되지 않은 유전 정보를 단계화할 수 있다. 성공적인 배체형은, 염색체가 강력한 다발이고 미소유체를 사용하여 별도의 웰 속에서 염색체를 분리하는 경우 유사분열의 상에서 세포를 정지시켜 입증하였다. 다른 구현예에서, 단계화된 부모계 일배체형 데이터를 사용하여 부친의 하나 이상의 동족체의 존재를 검출하는 것이 가능하며, 1명 이상의 자녀의 유전 물질이 혈액 속에 존재함을 내포한다. 태아내 정배수성인 것으로 예측된 염색체에 촛점을 맞춤으로써, 태자녀 삼염색체성에 영향을 받을 가능성을 제외시킬 수 있다. 또한, 태아 DNA가 현재의 부친에서 기원하지 않는 경우를 측정하는 것이 가능하며, 이 경우 3중 시험과 같은 다른 방법을 사용하여 유전적 비정상을 예측할 수 있다.
채혈 이외의 방법을 통해 이용가능한 태아 유전 물질의 다른 공급원이 존재할 수 있다. 모계 혈액 속에서 이용가능한 태아 유전 물질의 경우에, 2개의 주요 범주가 존재한다: (1) 전체 태아 세포, 예를 들면, 핵화된 태아 적혈구 세포 또는 적아구, 및 (2) 자유로이 부유하는 태아 DNA. 전체 태아 세포의 경우에, 태아 세포가 연장된 기간 동안 모계 혈액 속에서 존재함으로써 자녀 또는 이전 임신의 태아의 DNA를 함유하는 임신한 여성의 세포를 분리할 수 있다. 자유로이 부유하는 태아 DNA가 주의 문제로 시스템에서 정화되는 증거가 또한 존재한다. 한 가지 과제(challenge)는, 이의 유전 물질이 세포 속에 함유된 개체의 실체를 측정하는 방법, 즉, 측정된 유전 물질이 이전 임신의 태아에서 기원한 것이 아니라는 것을 확인하는 것이다. 본원의 구현예에서, 모계 유전 물질의 지식을 사용하여 문제의 유전 물질이 모계 유전 물질이 아니라는 것을 확인할 수 있다. 본 서류 또는 본 서류에 언급된 특허 중 어느 것에 기술된 바와 같은, PARENTAL SUPPORT^TM와 같은 정보학 기반 방법을 포함하는, 이러한 목적을 달성하기 위한 다수의 방법이 존재한다.
본원의 일 구현예에서, 임신한 모친에서 채취한 혈액은 자유로이 부유하는 태아 DNA를 포함하는 분획, 및 핵화된 적혈구 세포를 포함하는 분획으로 분리할 수 있다. 자유로이 부유하는 DNA는 임의로 농축시킬 수 있으며, DNA의 유전형 정보를 측정할 수 있다. 자유로이 부유하는 DNA의 측정된 유전형 정보의 모계 유전형의 지식을 사용하여 태아 유전형의 측면을 측정할 수 있다. 당해 측면은 배수성 상태, 및/또는 대립유전자 실체의 세트를 말한다. 이후에, 개체의 핵화된 적혈구 세포는 본 서류, 및 다른 참조 특허, 특히 본 서류의 제1 단락에서 언급된 것들의 어딘가에 기술된 방법을 사용하여 유전형 분석할 수 있다. 모계 게놈의 지식은 소정의 단일 혈액 세포가 유전적으로 모의 것인지 또는 아닌지를 측정하도록 한다. 그리고, 상기 기술된 바와 같이 측정된 태아 유전형의 측면은, 단일 혈액 세포가 현재 임신중인 태아에서 유전적으로 기원하는지를 측정할 수 있도록 한다. 필수적으로, 본원의 이러한 측면은, 모친의 유전 지식, 및 가능하게는 부친과 같은 다른 관련된 개체의 유전 정보를, 모계 혈액 속에서 발견된 자유로이 부유하는 DNA의 측정된 유전 정보와 함께 사용하여 모계 혈액 속에서 발견된 분리된 핵화된 세포가 (a) 유전적으로 모계인지, (b) 유전적으로 현재 잉태된 태아에서 기원하는지, 또는 (c) 유전적으로 이전 임신의 태아에서 기원하는지를 측정하도록 한다.
Prenatal Sex Chromosome Aneuploidy Determination
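In methods known in the art, those attempting to determine the sex of a gestating fetus from maternal blood have used the fact that fetal free-floating DNA (fffDNA) is present in the plasma of the mother. If Y-specific loci can be detected in the maternal plasma, this implies that the gestating fetus is male. However, the lack of detection of Y-specific loci in the plasma does not always guarantee that the gestating fetus is female, because in some cases the amount of fffDNA is too low to ensure that Y-specific loci would be detected even if the fetus were male.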
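Presented here is a novel method that does not require the measurement of Y-specific nucleic acid, that is, DNA from loci that are exclusively paternally derived. The Parental Support method, disclosed previously, uses crossover frequency data, parental genotypic data, and informatics techniques to determine the ploidy state of a gestating fetus. The sex of a fetus is simply the ploidy state of the fetus at the sex chromosomes. A child that is XX is female, and a child that is XY is male. The method described herein is also able to determine the ploidy state of the fetus. Note that sexing is effectively synonymous with ploidy determination of the sex chromosomes; in the case of sexing, an assumption is often made that the child is euploid, so that there are fewer possible hypotheses.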
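The method disclosed herein involves looking at loci that are common to both the X and Y chromosomes to create a baseline for the expected amount of fetal DNA present for a fetus. Regions that are specific only to the X chromosome can then be interrogated to determine whether the fetus is female or male. In the case of a male, we expect to see less fetal DNA at loci that are specific to the X chromosome than at loci that are common to both X and Y. In contrast, in a female fetus, we expect the amount of DNA for each of these groups to be the same. The DNA in question can be measured by any technique that can quantify the amount of DNA present in a sample, for example qPCR, SNP arrays, genotyping arrays, or sequencing. For DNA that comes exclusively from one individual, we would expect to see the following: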

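For DNA from a fetus that is mixed with DNA from the mother, where the fraction of fetal DNA in the mixture is F and the fraction of maternal DNA in the mixture is M, such that F + M = 100%, we would expect to see the following: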

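Where F and M are known, the expected ratios can be computed, and the observed data can be compared to the expected data. Where M and F are not known, a threshold can be selected based on historical data. In both cases, the measured amount of DNA at loci common to both X and Y can be used as a baseline, and the test for the sex of the fetus can be based on the amount of DNA observed at loci specific to only the X chromosome. If that amount is lower than the baseline by an amount roughly equal to ½F, or by an amount that causes it to fall below a predefined threshold, the fetus is determined to be male; if that amount is about equal to the baseline, or is not lower by an amount that causes it to fall below a predefined threshold, the fetus is determined to be female.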
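The baseline comparison can be sketched as below. The expected amounts are reconstructed from the "lower than the baseline by roughly ½F" statement above, on the assumption that loci common to X and Y are present at M + F while X-specific loci are present at M + ½F for a male fetus and M + F for a female fetus. The cutoff placed midway between the two expectations, and all names, are assumptions made for illustration.

```python
def expected_x_specific(fetal_fraction, fetal_sex):
    """Expected X-specific signal relative to a baseline (M + F) of 1.0."""
    maternal = 1.0 - fetal_fraction              # M, with F + M = 100%
    if fetal_sex == "male":
        return maternal + 0.5 * fetal_fraction   # fetus carries one X
    return maternal + fetal_fraction             # fetus carries two X

def call_fetal_sex(x_specific_amount, baseline_amount, fetal_fraction):
    """Call sex from X-specific DNA relative to the X-and-Y baseline.

    The cutoff sits halfway between the male and female expectations
    (an assumption; a threshold from historical data could be used instead).
    """
    ratio = x_specific_amount / baseline_amount
    cutoff = 1.0 - 0.25 * fetal_fraction
    return "male" if ratio < cutoff else "female"

F = 0.10
print(expected_x_specific(F, "male"), expected_x_specific(F, "female"))
print(call_fetal_sex(0.95, 1.0, F), call_fetal_sex(1.0, 1.0, F))
```

At a 10% fetal fraction the two expectations differ by only 5% of the baseline, which is why the measurement precision at both groups of loci matters for this approach.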
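In another embodiment, one can look only at those loci that are common to both the X and the Y chromosomes, often termed the Z chromosome. A subset of the loci on the Z chromosome are typically always A on the X chromosome and B on the Y chromosome. If SNPs from the Z chromosome are found to have the B genotype, the fetus is called male; if the SNPs from the Z chromosome are found to have only the A genotype, the fetus is called female. In another embodiment, one can look at loci that are found only on the X chromosome. Contexts such as AA|B are particularly informative, as the presence of a B indicates that the fetus has an X chromosome from the father. Contexts such as AB|B are also informative, as we expect to see B present only half as often in the case of a female fetus as compared to a male fetus. In another embodiment, one can look at SNPs on the Z chromosome where both A and B alleles are present on both the X and the Y chromosomes, and where it is known which SNPs are from the paternal Y chromosome and which are from the paternal X chromosome.
In an embodiment, it is possible to amplify single nucleotide positions that are known to vary between the homologous non-recombining (HNR) regions shared by chromosome Y and chromosome X. The sequence within this HNR region is largely identical between the X and Y chromosomes. Within this identical region are single nucleotide positions that, while invariant among X chromosomes and among Y chromosomes in the population, differ between the X and Y chromosomes. Each PCR assay could amplify a sequence from loci that are present on both the X and Y chromosomes. Within each amplified sequence there would be a single base that can be detected using sequencing or some other method.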
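In an embodiment, the sex of the fetus can be determined from the fetal free-floating DNA found in maternal plasma, the method comprising some or all of the following steps: 1) designing PCR primers (regular or mini-PCR, with multiplexing if desired) to amplify X/Y variant single nucleotide positions within the HNR region, 2) obtaining maternal plasma, 3) PCR amplifying targets from the maternal plasma using the HNR X/Y PCR assays, 4) sequencing the amplicons, and 5) examining the sequence data for the presence of the Y-allele within one or more of the amplified sequences. The presence of one or more Y-alleles would indicate a male fetus. The absence of the Y-allele from all amplicons indicates a female fetus.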
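Step 5 above, examining the sequence data for Y-alleles, can be sketched as a simple scan over amplicons. The data layout (a list of observed bases at the X/Y-variant position for each amplicon), the `min_reads` guard against isolated sequencing errors, and all identifiers are assumptions made for this illustration.

```python
def call_sex_from_hnr(observed_bases, y_alleles, min_reads=1):
    """Call fetal sex from X/Y-variant positions in HNR amplicons.

    observed_bases: {amplicon_id: [base observed in each read at the
                     X/Y-variant position]}
    y_alleles:      {amplicon_id: base that is specific to the Y chromosome}
    min_reads:      hypothetical minimum number of Y-allele reads required
                    before an amplicon counts as Y-positive.
    """
    for amplicon_id, bases in observed_bases.items():
        if bases.count(y_alleles[amplicon_id]) >= min_reads:
            return "male"    # a Y-allele was seen in at least one amplicon
    return "female"          # no Y-allele in any amplicon

y_alleles = {"amp1": "G", "amp2": "T"}
male_like = {"amp1": ["A"] * 95 + ["G"] * 5, "amp2": ["C"] * 100}
female_like = {"amp1": ["A"] * 100, "amp2": ["C"] * 100}

print(call_sex_from_hnr(male_like, y_alleles))
print(call_sex_from_hnr(female_like, y_alleles))
```

Because the maternal DNA contributes only X-alleles, any Y-allele reads must come from the fetus; in practice a value of `min_reads` above one may be needed so that a single erroneous base does not produce a male call.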
일 구현예에서, 모계 혈장 및/또는 부모계 유전형 속에서 DNA를 측정하기 위해 표적화된 서열분석을 사용할 수 있다. 일 구현예에서, 부모로부터 받은 DNA에서 명확하게 기원한 모든 서열을 무시할 수 있다. 예를 들면, AA|AB와 관련하여, A 서열의 수를 계수하고 모든 B 서열을 무시할 수 있다. 상기 알고리즘에 대한 이형접합성 비율을 측정하기 위하여, 관찰된 A 서열의 수를 소정의 프로브에 대한 전체 서열의 예측된 수와 비교할 수 있다. 시료 기준으로 각각의 프로브에 대해 예측된 수의 서열을 계산할 수 있는 많은 방법이 존재한다. 일 구현예에서, 모든 서열의 어느 분획이 각각의 특이적인 프로브에 속하는지를 측정한 후 이러한 실험적 분획을 사용하여 전체 수의 서열 판독물과 합하여, 각각의 프로브에서 다수의 서열을 평가하는 것이 가능하다. 다른 접근법은 일부 공지된 동종접합성 대립유전자를 표적화한 후 과거 데이터를 사용하여 각각의 프로브에서의 다수의 판독물과 공지된 동종접합성 대립유전자에서 다수의 판독물을 관련시킬 수 있다. 각각의 시료의 경우, 이후에 동종접합성 대립유전자에서 판독물의 수를 측정한 후 당해 측정을 실험적으로 기원한 관계와 함께 사용하여 각각의 프로브에서 다수의 서열 판독물을 평가하는 것이 가능하였다.
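In some embodiments, it is possible to determine the sex of the fetus by combining the predictions made by a plurality of methods. In some embodiments, the plurality of methods are taken from the methods described in this disclosure. In some embodiments, at least one of the plurality of methods is taken from the methods described in this disclosure.
In some embodiments, the method described herein can be used to determine the ploidy state of the gestating fetus. In an embodiment, the ploidy calling method uses loci that are specific to the X chromosome, or common to both the X and Y chromosomes, but does not make use of any Y-specific loci. In an embodiment, the ploidy calling method uses one or more of the following: loci that are specific to the X chromosome, loci that are common to both the X and Y chromosomes, and loci that are specific to the Y chromosome. In an embodiment, where the ratios of the sex chromosomes are similar, for example 45,X (Turner syndrome), 46,XX (normal female), and 47,XXX (trisomy X), the differentiation can be accomplished by comparing the allele distributions to the allele distributions expected under the various hypotheses. In another embodiment, this can be accomplished by comparing the relative number of sequence reads for the sex chromosomes to one or a plurality of reference chromosomes that are assumed to be euploid. Note also that these methods can be expanded to include aneuploid cases.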
Single Gene Disease Screening
일 구현예에서, 태아의 배수성 상태를 측정하는 방법은 단일 유전자 질환에 대한 동시 시험이 가능하도록 확장시킬 수 있다. 단일-유전자 질병 진단은 이수성 시험에 대해 사용된 동일한 표적화된 접근법을 지렛대로 하며, 추가의 특이적인 표적을 필요로 한다. 일 구현예에서, 단일 유전자 NPD 진단은 연결 분석을 통한다. 많은 경우에, cfDNA 시료의 직접적인 시험은, 모계 DNA의 존재가 이를 태자녀 모계 돌연변이를 유전받았는지를 측정하는 것을 실제로 불가능하게 하므로, 신뢰할 수 없다. 유일한 부모-기원한 대립유전자의 검출은 거의 도전적이지 않지만, 질병이 우성이고 부친에 의해 운반되는 경우에만 충분히 유익하여, 당해 접근법의 활용을 제한한다. 일 구현예에서, 당해 방법은 PCR 또는 관련된 증폭 접근법을 포함한다.
일부 구현예에서, 당해 방법은 비정상 대립유전자를 일촌 친척(first-degree relatives)에서 비롯된 정보를 사용하여 부모에서 주변을 매우 강력하게 연결된 SNP로 단계화함을 포함한다. 이후에, 부모 지지는 이들 SNP로부터 수득된 표적화된 서열분석 데이터에서 수행하여 정상 또는 비정상인 어느 동족체가 부모 둘 모두로부터 태자녀 유전받았는지를 측정한다. SNP가 충분히 연결되어 있는 한, 태아의 유전형의 유전은 매우 신뢰가능하게 측정될 수 있다. 일부 구현예에서, 당해 방법은 (a) SNP 유전자자리의 세트를 가하여 일반적인 질병의 구체화된 세트를 이수성 시험을 위한 본 발명자들의 다중화 혼주물(mul tiplex pool)에 밀접하게 플랭킹시키는 단계; (b) 이들 가해진 SNP의 대립유전자를 다양한 친척의 유전 데이터를 기반으로 비정상 및 정상의 대립유전자로 신뢰가능하게 단계화하는 단계; 및 (c) 유전된 물질 및 질병 유전자자리 주변의 영역내 부모 동족체에서 태아 일배체형, 또는 단계적인 SNP 대립유전자의 세트를 재작제하여 태아 유전형을 측정하는 단계를 포함한다. 일부 구현예에서, 질병 연결된 유전자자리에 밀접하게 연결된 추가의 프로브는 이수성 시험에 사용되는 다형성 유전자자리의 세트에 가해진다.
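Reconstructing the fetal diplotype is challenging because the sample is a mixture of maternal and fetal DNA. In some embodiments, the method incorporates information from relatives to phase the SNPs and disease alleles, and then takes into account the physical distance between the SNPs, recombination data based on location-specific recombination likelihoods, and the data observed from the genetic measurements of the maternal plasma to obtain the most likely genotype of the fetus.
In an embodiment, a number of additional probes per disease-linked locus are included in the set of targeted polymorphic loci; the number of additional probes per disease-linked locus may be between 4 and 10, between 11 and 20, between 21 and 40, between 41 and 60, between 61 and 80, or combinations thereof.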
부모의 배수성 데이터를 단계화하는 것은 과제일 수 있으며, 달성될 수 있는 다수의 방법이 존재한다. 일부는 본 기재내용에 논의되어 있고, 다른 것은 다른 기재내용에 보다 상세시 기술되어 있다(참조: 예를 들면, 2009년 2월 9일자로 출원된 PCT 공보 제WO2009105531호, 및 2009년 8월 4일자로 출원된 PCT 공보 제WO2010017214호; 이들 각각은, 전문이 본원에 참조로 포함된다). 일 구현예에서, 부모는 예를 들면, 하나 이상의 정자 또는 난자를 측정함으로써 반수체인 부모의 조직을 측정함으로써 추론에 의해 단계화할 수 있다. 일 구현예에서, 부모는 환자의 부모(들) 또는 형제자매와 같은 제1 정도 관련성의 측정된 유전형 데이터를 사용하여 추론에 의해 단계화할 수 있다. 일 구현예에서, 부모는 희석(여기서 DNA는 다수의 웰 속에서, 각각의 웰 속에 각각의 반수체의 대략 1개 이하의 카피가 존재하는 것으로 예상되는 지점까지 희석된다)시킨 후, 하나 이상의 웰 속에서 DNA를 측정함으로써 단계화할 수 있다. 일 구현예에서, 부모계 유전형은 가장 가능성있는 상을 부여하는 일배체형 빈도를 기반으로 집단을 사용하는 컴퓨터 프로그램을 사용하여 단계화할 수 있다. 일 구현예에서, 부모는, 단계화된 일배체형 데이터가 다른 부모에 대해, 부모의 하나 이상의 유전적 자식의 단계화되지 않은 유전 데이터와 함께 공지된 경우 단계화될 수 있다. 일 구현예에서, 환자는, 단계화된 일배체형 데이터가 다른 환자에 대해, 환자의 하나 이상의 유전적 자식의 단계화되지 않은 유전 데이터와 함께 공지되어 있는 경우 단계화될 수 있다. 일부 구현예에서, 환자의 유전적 자식은 하나 이상의 배아, 태아 및/또는 출생아일 수 있다. 부모 한 명 또는 둘 모두를 단계화하기 위한 이들 방법 및 다른 방법의 일부는 2010년 8월 19일자로 출원된, 미국 공보 제2011/0033862호; 2011년 2월 3일자로 출원된 미국 공보 제2011/0178719호, 2006년 11월 22일자로 출원된 미국 공보 제2007/0184467호; 2008년 3월 17일자로 출원된 미국 공보 제2008/0243398호에 개시되어 있으며, 이들은 각각, 이의 전문이 본원에 참조로 포함된다.
Fetal Genome Reconstruction
하나의 측면에서, 본 발명은 태아의 일배체형을 측정하기 위한 방법을 특징화한다. 다양한 구현예에서, 당해 방법은, 어느 다형성 유전자자리(SNP와 같은)가 태아에게 유전되었는지를 측정하고 어느 동족체(재조합 현상을 포함)가 태아에 존재하는지(및 이의 의해 다형성 유전자자리 사이에 서열을 삽입함)를 재작제하도록 한다. 경우에 따라, 필수적으로 태아의 전체 게놈이 재작제될 수 있다. 태아의 게놈내에 일부 남아있는 애매성이 존재하는 경우(교차와의 간격과 같이), 이러한 애매성은, 추가의 다형성 유전자자리를 분석함으로써 최소화할 수 있다. 다양한 구현예에서, 다형성 유전자자리를 선택하여 어느 애매성도 바람직한 수준으로 감소시키는 밀도에서 하나 이상의 염색체를 포함한다. 당해 방법은 태아 게놈내 목적한 다른 돌연변이 또는 다형성을 검출하는 것을 지시하기 보다는 연결(태아 게놈에서 연결된 다형성 유전자자리의 존재와 같은)을 기반으로 이들의 검출이 가능하도록 하므로 태아내 목적한 다른 돌연변이 또는 다형성의 검출에 중요한 적용을 갖는다. 예를 들어, 환자가 낭포성 섬유증(CF)과 관련된 돌연변이에 대한 매개체인 경우, 태아의 모친의 모계 DNA 및 태아의 태아 DNA를 포함하는 핵산 시료를 분석하여 태아 DNA가 CF 돌연변이를 함유하는 일배체형을 포함하는지를 측정할 수 있다. 특히, 다형성 유전자자리를 분석하여, 태아 DNA가 CF 돌연변이를 함유하는 일배체형을 포함하는지를 측정함으로써 태아 DNA내 CF 돌연변이 자체를 검출할 수 있다.
일부 구현예에서, 당해 방법은 부모 일배체형(예를 들면, 태아의 모친 또는 부친의 일배체형)을 측정함을 포함한다. 일부 구현예에서, 이러한 측정은 모친 또는 부친의 친척의 데이터를 사용하지 않고 달성된다. 일부 구현예에서, 부모 일배체형은 희석 접근법에 이어서 본원 및 어딘가(참조: 예를 들면, 이의 전문이 본원에 참조로 혼입된, 2010년 8월 19일자로 출원된 미국 공보 제2011/0033862호)에 기술된 바와 같은 서열분석 또는 SNP 유전형 분석을 사용하여 측정한다. DNA는 희석되므로, 하나 이상의 일배체형이 동일한 분획(또는 튜브) 속에 존재할 가능성은 없다. 따라서, 튜브 속에 DNA의 단일 분자로 효과적으로 존재할 수 있으며, 이는 측정될 단일 DNA 분자에 일배체형을 허용한다. 일부 구현예에서, 당해 방법은 DNA 시료를 다수의 분획으로 나눔으로써 분획 중 적어도 하나가 하나의 염색체 또는 염색체 쌍의 하나의 염색체 분절을 포함하도록 하는 단계, 및 적어도 하나의 분획 속에서 DNA 시료를 유전형분석(예를 들면, 2개 이상의 다형성 유전자자리의 존재를 측정)함으로써, 부모 일배체형을 측정하는 단계를 포함한다. 일부 구현예에서, 유전형분석은 서열분석(예를 들면, 셧건 서열분석)을 포함한다. 일부 구현예에서, 유전형은 적어도 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; 또는 100,000개의 상이한 다형성 유전자자리와 같은 다형성 유전자자리를 검출가히 위한 SNP 배열의 사용을 포함한다. 일부 구현예에서, 유전형분석은 다중 PCR의 사용을 포함한다. 일부 구현예에서, 당해 방법은 분획 속의 시료를 적어도 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; 또는 100,000개의 상이한 다형성 유전자자리(예: SNP)에 동시에 하이브리드화하는 프라이머의 라이브러리와 접촉시켜 반응 혼합물을 생산하는 단계; 및 반응 혼합물을 프라이머 연장 반응 조건에 적용시켜 고 배출 서열분석기로 측정된 증폭된 생성물을 생산하여 서열분석 데이터를 생산하는 단계를 포함한다.
일부 구현예에서, 모친의 일배체형은 모친의 친척의 데이터를 사용하여 본원에 기술된 방법 중 어느 것으로도 측정된다. 일부 구현예에서, 부친의 일배체형은 본원에 기술된 방법 중 어느 것에 의해 부친의 친척의 데이터를 사용하여 측정된다. 일부 구현예에서, 일배체형은 부친과 모친 둘 모두에 대해 측정된다. 일부 구현예에서, SNP 배열은 모친(또는 부친) 및 모친(또는 부친)의 친척의 DNA 시료 속에서 적어도 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; 또는 100,000개의 상이한 다형성 유전자자리의 존재를 측정한다. 일부 구현예에서, 당해 방법은 모친(또는 부친) 및/또는 모친(또는 부친)의 친척의 DNA 시료를 적어도 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; 또는 100,000개의 상이한 다형성 유전자자리(예: SNP)에 동시에 하이브리드화하는 프라이머의 라이브러리와 접촉시켜 반응 혼합물을 생산하는 단계; 및 반응 혼합물을 프라이머 연장 반응 조건에 적용시켜 서열분석 데이터를 생산하기 위한 고 배출 서열분석기로 측정되는 증폭된 생성물을 생산하는 단계를 포함한다. 부모 일배체형은 SNP 배열 또는 서열분석 데이터를 기반으로 측정할 수 있다. 일부 구현예에서, 부모 데이터는 당해 서류의 어딘가에 기술되거나 참조된 방법으로 단계화할 수 있다.
당해 부모 일배체형을 사용하여, 태자녀 모계 일배체형을 유전받았는지를 측정할 수 있다. 일부 구현예에서, 태아의 모친의 모계 DNA 및 태아의 태아 DNA를 포함하는 핵산 시료를 SNP 배열로 분석하여 적어도 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; 또는 100,000개의 상이한 다형성 유전자자리를 검출한다. 일부 구현예에서, 태아의 모친의 모계 DNA 및 태아의 태아 DNA를 포함하는 핵산 시료는 시료를 적어도 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; 또는 100,000개의 상이한 다형성 유전자자리(예: SNP)에 동시에 하이브리드화하는 프라이머의 라이브러리와 접촉시켜 반응 혼합물을 생산함으로써 분석한다. 일부 구현예에서, 반응 혼합물은 프라이머 연장 반응 조건에 적용시켜 증폭된 생성물을 생산한다. 일부 구현예에서, 증폭된 생성물은 고 배출 서열분석기를 사용하여 측정함으로써 서열분석 데이터를 생산한다. 다양한 구현예에서, SNP 배열 또는 서열분석 데이터를 사용하여 염색체내 상이한 위치에서 염색체 교차 확률에 대한 데이터를 사용함으로써(예를 들면, HapMap 데이터베이스에서 발견될 수 있는 바와 같은 재조합 데이터를 사용함으로써 어떠한 간격에 대해 재조합 위험 점수를 생성함으로써) 부모 일배체형을 측정하여 염색체 상의 다형성 대립유전자 사이의 의존성을 모델화할 수 있다. 일부 구현예에서, 다형성 유전자자리에서 대립유전자 수는 서열분석 데이터를 기반으로 하는 컴퓨터 상에서 계산한다. 일부 구현예에서, 각각 염색체의 상이한 가능한 배수성 상태에 관한 다수의 배수성 가설을 컴퓨터에서 생성하며; 염색체 상의 다형성 유전자자리에서 예측된 대립유전자 수에 대한 모델(예: 결합 분포 모델)을 각각의 배수성 가설에 대해 컴퓨터 상에서 구축하고; 각각의 배수성 가설의 상대적인 가능성을 결합 분포 모델 및 대립유전자 수를 사용하여 컴퓨터 상에서 측정하며; 태아의 배수성 상태를 최대 확률로 당해 가설에 상응하는 배수성 상태를 선택함으로써 요청한다. 일부 구현예에서, 대립유전자 수에 대한 결합 분포 모델의 구축 및 각각의 가설의 상대적인 확률의 측정은 참조 염색체의 사용을 필요로 하지 않는 방법을 사용하여 수행한다.
일부 구현예에서, 태아 일배체형은 염색체 13, 18, 21, X, 및 Y로 이루어진 그룹 중에서 취한 하나 이상의 염색체에 대해 측정된다. 일부 구현예에서, 태아 일배체형은 모든 태아 염색체에 대해 측정된다. 다양한 구현예에서, 당해 방법은 필수적으로 태아의 전체 게놈을 측정한다. 일부 구현예에서, 일배체형은 태아의 게놈의 적어도 30, 40, 50, 60, 70, 80, 90, 또는 95%에 대해 측정된다. 일부 구현예에서, 태아의 일배체형 측정은, 어느 대립유전자가 적어도 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; 또는 100,000개의 상이한 다형성 유전자자리에 대해 존재하는지에 대한 정보를 포함한다.
Compositions of DNA
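When performing an informatics analysis on sequencing data measured on a mixture of fetal and maternal blood to determine genomic information pertaining to the fetus, for example the ploidy state of the fetus, it may be advantageous to measure the allele distributions at a set of alleles. Unfortunately, in many cases, such as when attempting to determine the ploidy state of a fetus from the DNA mixture found in the plasma of a maternal blood sample, the amount of DNA available is not sufficient to directly measure the allele distributions in the mixture with good fidelity. In these cases, amplification of the DNA mixture will provide a sufficient number of DNA molecules for the desired allele distributions to be measured with good fidelity. However, current amplification methods typically used to amplify DNA for sequencing are often very biased, meaning that they do not amplify both alleles at a polymorphic locus by the same amount. A biased amplification can result in allele distributions that are quite different from the allele distributions in the original mixture. For most purposes, highly accurate measurements of the relative amounts of the alleles present at polymorphic loci are not needed. In contrast, in an embodiment of the present disclosure, amplification or enrichment methods that specifically enrich polymorphic alleles and preserve allelic ratios are advantageous.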
대립유전자 편향을 최소화시키는 방식으로 다수의 유전자자리에서 DNA의 시료를 우선적으로 농축시키는데 사용될 수 있는 다수의 방법이 기술되어 있다. 일부 예는 다수의 유전자자리를 표적화하기 위한 순환 프로브를 사용하며, 여기서 예비-순환된 프로브의 3' 말단 및 5' 말단은 표적화된 대립유전자의 다형성 부위에서 떨어져 있는 1개 또는 소수의 위치에 있는 염기에 하이브리드화하도록 설계된다. 다른 것은 PCR 프로브를 사용하는 것이며, 여기서 3' 말단 PCR 프로브는 표적화된 대립유전자의 다형성 부위에서 떨어져 있는 1개 또는 소수의 위치에 있는 염기에 하이브리드화하도록 설계된다. 다른 것은 분열(split) 및 혼주물 접근법을 사용하여 DNA의 혼합물을 생성하는 것이며, 여기서 우선적으로 농축된 유전자자리는 직접적인 다중화의 단점없이 낮은 대립유전자 편향으로 농축된다. 다른 것은 하이브리드 포획 접근법을 사용하는 것이며, 여기서 포획 프로브는, 표적의 다형성 부위를 플랭킹하는 DNA에 하이브리드화하도록 설계된 포획 프로브의 영역이 다형성 부위에서 1개 또는 소수의 염기에 의해 분리되도록 설계된다.
다형성 유전자자리의 세트에서 측정된 대립유전자 분포를 사용하여 개체의 배수성 상태를 측정하는 경우에, 이는 유전 측정에 대해 제조되므로 DNA의 시료 속에서 대립유전자의 상대적인 양을 보존하는 것이 바람직하다. 이러한 제조는 WGA 증폭, 표적화된 증폭, 선택적인 농축 기술, 하이브리드 포획 기술, 순환 프로브 또는 특정의 대립유전자에 상응하는 DNA의 분자의 존재를 선택적으로 향상시키고/시키거나 DNA의 양을 증폭시킴을 의미하는 다른 방법을 포함할 수 있다.
본원의 일부 구현예에서, 당해 유전자자리가 최대의 작은 대립유전자 빈도를 갖는 유전자자리를 표적으로 삼도록 설계된 DNA 프로브의 세트가 있다. 본원의 일부 구현예에서, 유전자자리가 당해 유전자자리에서 고도로 유용한 정보를 주는 SNP를 갖는 태아의 최대 확률을 갖는 자리를 표적으로 삼도록 설계된 프로브의 세트가 있다. 본원의 일부 구현예에서, 당해 프로브가 소정의 집단 소그룹에 대해 최적화된 유전자자리를 표적으로 삼도록 설계된 프로브의 세트가 있다. 본원의 일부 구현예에서, 당해 프로브가 집단 소그룹의 특정한 혼합에 대해 최적화된 유전자자리를 표적으로 삼도록 설계된 프로브의 세트가 있다. 본원의 일부 구현예에서, 당해 프로브는 상이한 작은 대립유전자 빈도 프로파일을 갖는 상이한 집단 소그룹에서 기원하는 부모의 특정 쌍에 대해 최적화된 유전자자리를 표적으로 삼도록 설계된 프로브의 세트가 있다. 본원의 일부 구현예에서, 태아 기원의 DNA의 조각에 어닐링된 적어도 하나의 염기 쌍을 포함하는 DNA의 순환된 쇄가 있다. 본원의 일부 구현예에서, 태반 기원의 DNA의 조각에 어닐링된 적어도 하나의 염기 쌍을 포함하는 DNA의 순환된 쇄가 있다. 본원의 일부 구현예에서, 뉴클레오타이드의 적어도 일부는 태아 기원의 DNA에 어닐링된 한편 순환된 DNA의 순환된 쇄가 있다. 본원의 일부 구현예에서, 뉴클레오타이드의 적어도 일부는 태반 기원의 DNA에 어닐링된 한편 순환된 DNA의 순환된 쇄가 있다. 본원의 일부 구현예에서, 프로브 중의 일부는 단일의 연쇄 반복을 표적으로 삼고, 프로브 중 일부는 단일 뉴클레오타이드 다형성을 표적으로 삼는 프로브의 세트가 있다. 일부 구현예에서, 당해 유전자자리는 비-침입성 태아기 진단의 목적으로 선택된다. 일부 구현예에서, 프로브는 비-침입성 태아기 진단의 목적으로 사용된다. 일부 구현예에서, 유전자자리는 순환 프로브, MIP, 하이브리드화 프로브에 의한 포획, SNP 배열 상의 프로브, 또는 이의 조합을 포함할 수 있는 방법을 사용하여 표적이 된다. 일부 구현예에서, 프로브는 순환 프로브, MIP, 하이브리드화 프로브에 의한 포획, SNP 배열 상의 프로브, 또는 이의 조합으로서 사용된다. 일부 구현예에서, 유전자자리는 비-침입성 태아기 진단의 목적으로 서열분석된다.
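Because the relative informativeness of a sequence is greater when it is combined with the relevant parental contexts, maximizing the number of sequence reads that contain a SNP for which the parental context is known may maximize the informativeness of the set of sequencing reads on the mixed sample. In an embodiment, the number of sequence reads that contain a SNP for which the parental contexts are known may be enhanced by using qPCR to preferentially amplify specific sequences. In an embodiment, the number of sequence reads that contain a SNP for which the parental contexts are known may be enhanced by using circularizing probes (for example, MIPs) to preferentially amplify specific sequences. In an embodiment, the number of sequence reads that contain a SNP for which the parental contexts are known may be enhanced by using a capture by hybridization method (for example, SURESELECT) to preferentially amplify specific sequences. Different methods may be used to enhance the number of sequence reads that contain a SNP for which the parental contexts are known. In an embodiment, the targeting may be accomplished by extension ligation, ligation without extension, capture by hybridization, or PCR.
In a sample of fragmented genomic DNA, a fraction of the DNA sequences map uniquely to individual chromosomes; other DNA sequences may be found on different chromosomes. DNA found in plasma, whether maternal or fetal in origin, is typically fragmented, often at lengths under 500 bp. In a typical genomic sample, approximately 3.3% of the mappable sequences will map to chromosome 13; 2.2% of the mappable sequences will map to chromosome 18; approximately 1.35% of the mappable sequences will map to chromosome 21; 4.5% of the mappable sequences will map to chromosome X in a female; approximately 2.25% of the mappable sequences will map to chromosome X in a male; and 0.73% of the mappable sequences will map to chromosome Y in a male. These are the chromosomes most likely to be aneuploid in a fetus. Also, among short sequences, approximately 1 in 20 sequences will contain a SNP, using the SNPs contained in dbSNP. The proportion may well be higher, given that there may be many SNPs that have not yet been discovered.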
본원의 일 구현예에서, 표적화 방법을 사용하여 소정의 염색체에 맵핑되는 DNA의 시료 속의 DNA의 분획을 향상시킴으로서 분획이 게놈 시료에 대해 전형적인 위에서 나열한 퍼센트를 유의적으로 초과하도록 할 수 있다. 본원의 구현예에서, 표적화 방법을 사용하여 DNA의 시료 속의 DNA의 분획을 향상시켜 SNP를 함유하는 서열의 퍼센트가 게놈 시료에 대해 전형적으로 발견될 수 있는 것보다 유의적으로 크도록 할 수 있다. 본원의 일 구현예에서, 표적화 방법을 사용하여 태아기 진단의 목적을 위해 모계 및 태아 DNA의 혼합물에 있어서 SNP들의 세트의 또는 염색체의 DNA를 표적화하는 데 사용될 수 있다.
의심되는 염색체에 맵핑하는 판독물의 수를 계수하고 이를 참조 염색체에 맵핑되는 판독물의 수와 비교함으로써 태아 이수성을 측정하고, 의심되는 염색체 상의 판독물의 과잉이 이러한 염색체에서 태아내 삼염색체성에 상응한다는 가정을 사용하는 방법이 보고되어 있음에 주목한다(미국 특허 제7,888,017호). 태아기 진단을 위한 이들 방법은 어느 종류의 표적화를 사용하지 않을 뿐 아니라, 이들이 태아기 진단을 위한 표적화의 사용을 기술하지 않을 수 있다.
혼합된 시료를 서열분석하는데 있어서 표적화 접근법을 사용함으로써, 보다 적은 서열 판독물로 특정 수준의 정밀도를 달성하는 것이 가능할 수 있다. 이러한 정밀도는 민감성을 말할 수 있거나, 이는 특이성을 말할 수 있거나, 이는 이의 일부 조합을 말할 수 있다. 정밀도의 바람직한 수준은 90% 내지 95%일 수 있으며; 이는 95% 내지 98%일 수 있고; 이는 98% 내지 99%일 수 있으며; 이는 99% 내지 99.5%일 수 있으며; 이는 99.5% 내지 99.9%일 수 있으며; 이는 99.9% 내지 99.99%일 수 있으며; 이는 99.99% 내지 99.999%일 수 있으며; 이는 99.999% 내지 100%일 수 있다. 95%를 초과하는 정밀도의 수준은 고 정밀도로서 언급될 수 있다.
There are a number of published methods in the prior art that demonstrate how the ploidy state of a fetus may be determined from a mixed sample of maternal and fetal DNA, for example G.J.W. Liao et al., Clinical Chemistry 2011; 57(1), pp. 92-101. These methods focus on thousands of locations along each chromosome. The number of locations along a chromosome that may be targeted while still resulting in a high-accuracy ploidy determination on a fetus, for a given number of sequence reads from a mixed sample of DNA, is unexpectedly low. In an embodiment of the present disclosure, an accurate ploidy determination may be made using targeted sequencing, with any method of targeting, for example qPCR, ligand-mediated PCR, other PCR methods, capture by hybridization, or circularizing probes, where the number of loci along a chromosome that need to be targeted may be between 5,000 and 2,000 loci; between 2,000 and 1,000 loci; between 1,000 and 500 loci; between 500 and 300 loci; between 300 and 200 loci; between 200 and 150 loci; between 150 and 100 loci; between 100 and 50 loci; between 50 and 20 loci; or between 20 and 10 loci. Optionally, it may be between 100 and 500 loci. The high level of accuracy may be achieved by targeting a small number of loci and executing an unexpectedly small number of sequence reads. The number of reads may be between 100 million and 50 million reads; between 50 million and 20 million reads; between 20 million and 10 million reads; between 10 million and 5 million reads; between 5 million and 2 million reads; between 2 million and 1 million reads; between 1 million and 500,000 reads; between 500,000 and 200,000 reads; between 200,000 and 100,000 reads; between 100,000 and 50,000 reads; between 50,000 and 20,000 reads; between 20,000 and 10,000 reads; or below 10,000 reads. Fewer reads are necessary for larger amounts of input DNA.
일부 구현예에서, 태아 기원의 DNA, 및 모체 기원의 DNA의 혼합물을 포함하는 조성물이 존재하며, 여기서, 염색체 13에 유일하게 맵핑되는 서열의 퍼센트는 4% 이상, 5% 이상, 6% 이상, 7% 이상, 8% 이상, 9% 이상, 10% 이상, 12% 이상, 15% 이상, 20% 이상, 25% 이상, 또는 30% 이상이다. 본원의 일부 구현예에서, 태아 기원의 DNA, 및 모체 기원의 DNA의 혼합물을 포함하는 조성물이 존재하며, 여기서 염색체 18에 유일하게 맵핑되는 서열의 퍼센트는 3% 이상, 4% 이상, 5% 이상, 6% 이상, 7% 이상, 8% 이상, 9% 이상, 10% 이상, 12% 이상, 15% 이상, 20% 이상, 25% 이상, 또는 30% 이상이다. 본원의 일부 구현예에서, 태아 기원의 DNA, 및 모체 기원의 DNA의 혼합물을 포함하는 조성물이 존재하며, 여기서 염색체 21에 유일하게 맵핑되는 서열의 퍼센트는 2% 이상, 3% 이상, 4% 이상, 5% 이상, 6% 이상, 7% 이상, 8% 이상, 9% 이상, 10% 이상, 12% 이상, 15% 이상, 20% 이상, 25% 이상, 또는 30% 이상이다. 본원의 일부 구현예에서, 태아 기원의 DNA, 및 모체 기원의 DNA의 혼합물을 포함하는 조성물이 존재하며, 여기서 염색체 X에 유일하게 맵핑되는 서열의 퍼센트는 6% 이상, 7% 이상, 8% 이상, 9% 이상, 10% 이상, 12% 이상, 15% 이상, 20% 이상, 25% 이상, 또는 30% 이상이다. 본원의 일부 구현예에서, 태아 기원의 DNA, 및 모체 기원의 DNA의 혼합물을 포함하는 조성물이 존재하며, 여기서 염색체 Y에 유일하게 맵핑되는 서열의 퍼센트는 1% 이상, 2% 이상, 3% 이상, 4% 이상, 5% 이상, 6% 이상, 7% 이상, 8% 이상, 9% 이상, 10% 이상, 12% 이상, 15% 이상, 20% 이상, 25% 이상, 또는 30% 이상이다.
일부 구현예에서, 태아 기원의 DNA, 및 모체 기원의 DNA의 혼합물을 포함하는 조성물이 기술되어 있으며, 여기서 염색체에 유일하게 맵핑되고, 적어도 하나의 단일 뉴클레오타이드 다형성을 함유하는 서열의 퍼센트는 0.2% 이상, 0.3% 이상, 0.4% 이상, 0.5% 이상, 0.6% 이상, 0.7% 이상, 0.8% 이상, 0.9% 이상, 1% 이상, 1.2% 이상, 1.4% 이상, 1.6% 이상, 1.8% 이상, 2% 이상, 2.5% 이상, 3% 이상, 4% 이상, 5% 이상, 6% 이상, 7% 이상, 8% 이상, 9% 이상, 10% 이상, 12% 이상, 15% 이상, 또는 20% 이상이고, 여기서 염색체는 그룹 13, 18, 21, X, 또는 Y 중에서 취해진다. 본원의 일부 구현예에서, 태아 기원의 DNA, 및 모체 기원의 DNA의 혼합물을 포함하는 조성물이 존재하며, 여기서 염색체에 유일하게 맵핑되고 단일 뉴클레오타이드 다형성들의 세트 중 적어도 하나의 단일 뉴클레오타이드 다형성을 함유하는 서열의 퍼센트는 0.15% 이상, 0.2% 이상, 0.3% 이상, 0.4% 이상, 0.5% 이상, 0.6% 이상, 0.7% 이상, 0.8% 이상, 0.9% 이상, 1% 이상, 1.2% 이상, 1.4% 이상, 1.6% 이상, 1.8% 이상, 2% 이상, 2.5% 이상, 3% 이상, 4% 이상, 5% 이상, 6% 이상, 7% 이상, 8% 이상, 9% 이상, 10% 이상, 12% 이상, 15% 이상, 또는 20% 이상이고, 여기서 염색체는 염색체 13, 18, 21, X 및 Y의 세트로부터 취해지고, 여기서 단일 뉴클레오타이드 다형성의 세트내 단일 뉴클레오타이드 다형성의 수는 1 내지 10, 10 내지 20, 20 내지 50, 50 내지 100, 100 내지 200, 200 내지 500, 500 내지 1,000, 1,000 내지 2,000, 2,000 내지 5,000, 5,000 내지 10,000, 10,000 내지 20,000, 20,000 내지 50,000, 및 50,000 내지 100,000개이다.
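In theory, each cycle of amplification doubles the amount of DNA present; in practice, the degree of amplification is slightly lower than two. In theory, amplification, including targeted amplification, will result in bias-free amplification of a DNA mixture; in practice, different alleles tend to be amplified to a different extent than other alleles. When DNA is amplified, the degree of allelic bias typically increases with the number of amplification steps. In some embodiments, the methods described herein involve amplifying DNA with a low level of allelic bias. Since allelic bias compounds with each additional cycle, the per-cycle allelic bias can be determined by calculating the nth root of the overall bias, where n is the base-2 logarithm of the degree of enrichment. In some embodiments, there is a composition comprising a second mixture of DNA, where the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, where the degree of enrichment is at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000 or at least 1,000,000, and where the ratio of the alleles in the second mixture of DNA at each locus differs from the ratio of the alleles at that locus in the first mixture of DNA by a factor that is, on average, less than 1,000%, 500%, 200%, 100%, 50%, 20%, 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, 0.02%, or 0.01%. In some embodiments, there is a composition comprising a second mixture of DNA, where the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, where the per-cycle allelic bias for the plurality of polymorphic loci is, on average, less than 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, or 0.02%. In some embodiments, the plurality of polymorphic loci comprises at least 10 loci, at least 20 loci, at least 50 loci, at least 100 loci, at least 200 loci, at least 500 loci, at least 1,000 loci, at least 2,000 loci, at least 5,000 loci, at least 10,000 loci, at least 20,000 loci, or at least 50,000 loci.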
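The per-cycle bias calculation described above can be sketched as follows. This is a minimal illustration, assuming that the overall bias is expressed as a multiplicative ratio (for example 1.10 for a 10% distortion of the allele ratio); the function name `per_cycle_bias` and that representation are assumptions made here, not something the text specifies.

```python
import math

def per_cycle_bias(overall_bias_ratio, enrichment_factor):
    """Return the per-cycle allelic bias as a multiplicative ratio.

    n is the base-2 logarithm of the degree of enrichment, i.e. the number
    of ideal doubling cycles, and the per-cycle bias is the nth root of the
    overall bias. Both inputs are assumed to be greater than 1.
    """
    n = math.log2(enrichment_factor)
    return overall_bias_ratio ** (1.0 / n)

# A 1,024-fold enrichment corresponds to n = 10 cycles.
example = per_cycle_bias(1.10 ** 10, 1024)
```

Under these assumptions, an overall distortion of 1.10 to the tenth power across a 1,024-fold enrichment corresponds to a per-cycle ratio of about 1.10.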
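Some Embodiments
In some embodiments, a method is disclosed herein for generating a report disclosing the determined ploidy status of a chromosome in a gestating fetus, the method comprising: obtaining a first sample that contains DNA from the mother of the fetus and DNA from the fetus; obtaining genotypic data from one or both parents of the fetus; preparing the first sample by isolating the DNA so as to obtain a prepared sample; measuring the DNA in the prepared sample at a plurality of polymorphic loci; calculating, on a computer, allele counts or allele count probabilities at the plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses concerning expected allele count probabilities at the plurality of polymorphic loci on the chromosome for different possible ploidy states of the chromosome; building, on a computer, a joint distribution model for the allele count probability of each polymorphic locus on the chromosome for each ploidy hypothesis using the genotypic data from the one or both parents of the fetus; determining, on a computer, a relative probability of each of the ploidy hypotheses using the joint distribution model and the allele count probabilities calculated for the prepared sample; calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability; and generating a report disclosing the determined ploidy status.
In some embodiments, the method is used to determine the ploidy state of a plurality of gestating fetuses in a plurality of respective mothers, the method further comprising: determining the percent of DNA that is of fetal origin in each of the prepared samples; wherein the step of measuring the DNA in the prepared sample is done by sequencing a number of DNA molecules in each of the prepared samples, where more molecules of DNA are sequenced from those prepared samples that have a smaller fraction of fetal DNA than from those prepared samples that have a larger fraction of fetal DNA.
In some embodiments, the method is used to determine the ploidy state of a plurality of gestating fetuses in a plurality of respective mothers, where the measuring of the DNA in the prepared sample is done, for each of the fetuses, by sequencing a first fraction of the prepared sample of DNA to give a first set of measurements, the method further comprising: making a first relative probability determination for each of the ploidy hypotheses for each of the fetuses, given the first set of DNA measurements; resequencing a second fraction of the prepared sample from those fetuses where the first relative probability determination for each of the ploidy hypotheses indicates that a ploidy hypothesis corresponding to an aneuploid fetus has a significant but not conclusive probability, to give a second set of measurements; making a second relative probability determination for the ploidy hypotheses for those fetuses using the second set of measurements and optionally also the first set of measurements; and calling the ploidy states of the fetuses whose second sample was resequenced by selecting the ploidy state corresponding to the hypothesis with the greatest probability as determined by the second relative probability determination.
In some embodiments, a composition of matter is disclosed, the composition of matter comprising a sample of preferentially enriched DNA, wherein the sample of preferentially enriched DNA has been preferentially enriched at a plurality of polymorphic loci from a first sample of DNA, wherein the first sample of DNA consisted of a mixture of maternal DNA and fetal DNA derived from maternal plasma, where the degree of enrichment is at least a factor of 2, and wherein the allelic bias between the first sample and the preferentially enriched sample is, on average, selected from the group consisting of less than 2%, less than 1%, less than 0.5%, less than 0.2%, less than 0.1%, less than 0.05%, less than 0.02%, and less than 0.01%. In some embodiments, a method is disclosed to create a sample of such preferentially enriched DNA.
In some embodiments, a method is disclosed for determining the presence or absence of a fetal aneuploidy in a maternal tissue sample comprising fetal and maternal genomic DNA, wherein the method comprises: (a) obtaining a mixture of fetal and maternal genomic DNA from said maternal tissue sample; (b) selectively enriching the mixture of fetal and maternal DNA at a plurality of polymorphic alleles; (c) distributing selectively enriched fragments from the mixture of fetal and maternal genomic DNA of step (a) to provide reaction samples comprising a single genomic DNA molecule or amplification products of a single genomic DNA molecule; (d) conducting massively parallel DNA sequencing of the selectively enriched fragments of genomic DNA in the reaction samples of step (c) to determine the sequence of said selectively enriched fragments; (e) identifying the chromosomes to which the sequences obtained in step (d) belong; (f) analyzing the data of step (d) to determine i) the number of fragments of genomic DNA from step (d) that belong to at least one first target chromosome that is presumed to be diploid in both the mother and the fetus, and ii) the number of fragments of genomic DNA from step (d) that belong to a second target chromosome, wherein said second chromosome is suspected to be aneuploid in the fetus; (g) calculating an expected distribution of the number of fragments of genomic DNA from step (d) for the second target chromosome if the second target chromosome is euploid, using the number determined in step (f) part i); (h) calculating an expected distribution of the number of fragments of genomic DNA from step (d) for the second target chromosome if the second target chromosome is aneuploid, using the number determined in step (f) part i) and an estimated fraction of fetal DNA found in the mixture of step (b); and (i) using a maximum likelihood or maximum a posteriori approach to determine whether the number of fragments of genomic DNA determined in step (f) part ii) is more likely to be part of the distribution calculated in step (g) or the distribution calculated in step (h); thereby indicating the presence or absence of a fetal aneuploidy.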
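Steps (g) through (i) above compare an observed fragment count against two expected distributions. The sketch below is one possible reading and not the method as claimed: it assumes Poisson-distributed counts, a trisomy as the aneuploid hypothesis, and a hypothetical calibration constant `euploid_ratio` (expected target-to-reference count ratio for a euploid fetus). None of these choices is stated in the text.

```python
import math

def log_poisson(count, expected):
    """Log-probability of `count` under a Poisson with mean `expected`."""
    return count * math.log(expected) - expected - math.lgamma(count + 1)

def call_target_chromosome(reference_count, target_count,
                           euploid_ratio, fetal_fraction):
    """Maximum-likelihood choice between a euploid and a trisomic hypothesis."""
    # Step (g): expected count on the target chromosome if it is euploid.
    expected_euploid = reference_count * euploid_ratio
    # Step (h): one extra fetal copy raises the expected count by f / 2.
    expected_trisomy = expected_euploid * (1.0 + fetal_fraction / 2.0)
    # Step (i): pick the hypothesis under which the observation is more likely.
    ll_euploid = log_poisson(target_count, expected_euploid)
    ll_trisomy = log_poisson(target_count, expected_trisomy)
    return "trisomy" if ll_trisomy > ll_euploid else "euploid"
```

A maximum a posteriori variant would add the log of a prior probability to each log-likelihood before comparing them.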
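Exemplary Cancer Diagnostic Methods
Note that it has been demonstrated that DNA originating from a cancer living in a host can be found in the blood of the host. In the same way that genetic diagnoses can be made from the measurement of mixed DNA found in maternal blood, genetic diagnoses can equally well be made from the measurement of mixed DNA found in host blood. The genetic diagnoses may include aneuploidy states or gene mutations. Any claim in the present disclosure that reads on determining the ploidy state or genetic state of a fetus from measurements made on maternal blood can equally well read on determining the ploidy state or genetic state of a cancer from measurements on host blood.
In some embodiments, a method of the present disclosure allows one to determine the ploidy status of a cancer, the method including obtaining a mixed sample that contains genetic material from the host and genetic material from the cancer; measuring the DNA in the mixed sample; calculating the fraction of DNA that is of cancer origin in the mixed sample; and determining the ploidy status of the cancer using the measurements made on the mixed sample and the calculated fraction. In some embodiments, the method may further include determining a cancer therapy based on the determination of the ploidy state of the cancer. In some embodiments, the method may further include administering a cancer therapeutic based on the determination of the ploidy state of the cancer, wherein the cancer therapeutic is taken from the group comprising a pharmaceutical, a biologic therapeutic, an antibody-based therapy, and combinations thereof.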
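Exemplary Implementation Methods
Any of the embodiments disclosed herein may be implemented in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, or combinations thereof. Apparatus of the presently disclosed embodiments can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the presently disclosed embodiments can be performed by a programmable processor executing a program of instructions to perform functions of the presently disclosed embodiments by operating on input data and generating output. The presently disclosed embodiments can be implemented advantageously in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; in any case, the language can be a compiled or interpreted language. A computer program may be deployed in any form, including as a stand-alone program, or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed or interpreted on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network.
Computer readable storage media, as used herein, refers to physical or tangible storage (as opposed to signals) and includes without limitation volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules or other data. Computer readable storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical or material medium which can be used to tangibly store the desired information or data or instructions and which can be accessed by a computer or processor.
Any of the methods described herein may include the output of data in a physical format, such as on a computer screen or on a paper printout. In explanations of any embodiments elsewhere in this document, it should be understood that the described methods may be combined with the output of actionable data in a format that can be acted upon by a physician. In addition, the described methods may be combined with the actual execution of a clinical decision that results in a clinical treatment, or the execution of a clinical decision to take no action. Some of the embodiments described in this document for determining genetic data pertaining to a target individual may be combined with the decision to select one or more embryos for transfer in the context of IVF, optionally combined with the process of transferring the embryo to the womb of the prospective mother. Some of the embodiments described in this document for determining genetic data pertaining to a target individual may be combined with the notification of a potential chromosomal abnormality, or lack thereof, to a medical professional, optionally combined with the decision to abort, or to not abort, a fetus in the context of prenatal diagnosis. Some of the embodiments described herein may be combined with the output of actionable data, and the execution of a clinical decision that results in a clinical treatment, or the execution of a clinical decision to take no action.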
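Exemplary Diagnostic Box
In an embodiment, the present disclosure comprises a diagnostic box that is capable of partly or completely carrying out any of the methods described in this disclosure. In an embodiment, the diagnostic box may be located at a physician's office, a hospital laboratory, or any suitable location reasonably proximal to the point of patient care. The box may be able to run the entire method in a wholly automated fashion, or the box may require one or a number of steps to be completed manually by a technician. In an embodiment, the box may be able to analyze at least the genotypic data measured on the maternal plasma. In an embodiment, the box may be linked to means to transmit the genotypic data measured on the diagnostic box to an external computation facility, which may then analyze the genotypic data and possibly also generate a report. The diagnostic box may include a robotic unit that is capable of transferring aqueous or liquid samples from one container to another. It may comprise a number of reagents, both solid and liquid. It may comprise a high throughput sequencer. It may comprise a computer.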
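Experimental Section
The presently disclosed embodiments are described in the following Examples, which are set forth to aid in the understanding of the disclosure, and should not be construed to limit in any way the scope of the disclosure as defined in the claims which follow thereafter. The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to use the described embodiments, and are not intended to limit the scope of the disclosure, nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by volume, and temperature is in degrees Celsius. It should be understood that variations in the methods as described may be made without changing the fundamental aspects that the experiments are meant to illustrate.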
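Experiment 1
The objective was to show that a Bayesian maximum likelihood estimation (MLE) algorithm that uses parental genotypes to calculate fetal fraction improves the accuracy of non-invasive prenatal trisomy diagnosis compared to published methods.
Simulated sequencing data for maternal cfDNA was created by sampling reads obtained on trisomy-21 and respective mother cell lines. The rate of correct disomy and trisomy calls was determined from 500 simulations at various fetal fractions for a published method (Chiu et al. BMJ 2011;342:c7401) and for our MLE-based algorithm. We validated the simulations by obtaining 5 million shotgun reads from four pregnant mothers and the respective fathers, collected under an IRB-approved protocol. Parental genotypes were obtained on a 290K SNP array. (See FIG. 14.)
In simulations, the MLE-based approach achieved 99.0% accuracy for fetal fractions as low as 9% and reported confidences that corresponded well to overall accuracy. We validated these results using four real samples, wherein we obtained all correct calls with a computed confidence exceeding 99%. In contrast, our implementation of the published algorithm of Chiu et al. required an 18% fetal fraction to achieve 99.0% accuracy, and achieved only 87.8% accuracy at 9% fetal DNA.
Fetal fraction determination from parental genotypes in conjunction with an MLE-based approach achieves greater accuracy than published algorithms at the fetal fractions expected during the first and early second trimester. Furthermore, the method disclosed herein produces a confidence metric that is crucial in determining the reliability of the result, especially at low fetal fractions, where ploidy detection is more difficult. Published methods use a less accurate threshold method for calling ploidy based on large sets of disomy training data, an approach that predefines a false positive rate. In addition, without a confidence metric, published methods are at risk of reporting false negative results when there is insufficient fetal cfDNA to make a call. In some embodiments, a confidence estimate is calculated for the called ploidy state.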
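Experiment 2
The objective was to improve non-invasive detection of fetal trisomy 18, 21, and X, particularly in samples consisting of a low fetal fraction, by using parental genotypes and HapMap data in a Bayesian maximum likelihood estimation (MLE) algorithm.
Maternal samples from four euploid and two trisomy-positive pregnancies and the respective paternal samples were obtained under an IRB-approved protocol from parents where the fetal karyotype was known. Maternal cfDNA was extracted from plasma, and roughly 10 million sequence reads were obtained following preferential enrichment that targeted specific SNPs. Parent samples were similarly sequenced to obtain genotypes.
The described algorithm correctly called chromosome 18 and 21 disomy for all euploid samples and for the normal chromosomes of aneuploid samples. Trisomy 18 and 21 calls were correct, as were chromosome X copy numbers in male and female fetuses. The confidence produced by the algorithm was in excess of 98% in all cases.
The method described accurately reported the ploidy of all tested chromosomes from six samples, including samples comprised of less than 12% fetal DNA, which account for roughly 30% of first and early second trimester samples. The crucial difference between the present MLE algorithm and published methods is that it leverages parental genotypes and HapMap data to improve accuracy and generate a confidence metric. At low fetal fractions, all methods become less accurate; it is therefore important to correctly identify samples without sufficient fetal cfDNA to make a reliable call. Others have used chromosome Y specific probes to estimate the fetal fraction of male fetuses, but concurrent parental genotyping enables estimation of fetal fraction for both sexes. Another inherent limitation of published methods using untargeted shotgun sequencing is that the accuracy of ploidy calling varies among chromosomes due to differences in factors such as GC richness. The present targeted sequencing approach is largely independent of such chromosome-scale variations and yields more consistent performance between chromosomes.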
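Experiment 3
The objective was to determine whether trisomy is detectable with high confidence on a triploid fetus, using novel informatics to analyze SNP loci of free-floating fetal DNA in maternal plasma.
20 mL of blood was drawn from a pregnant patient following an abnormal ultrasound. After centrifugation, maternal DNA was extracted from the buffy coat (DNEASY, QIAGEN); cell-free DNA was extracted from plasma (QIAAMP, QIAGEN). Targeted sequencing was applied to SNP loci on chromosomes 2, 21, and X in both DNA samples. Maximum-likelihood Bayesian estimation selected the most probable hypothesis from the set of all possible ploidy states. The method determines the fetal DNA fraction, the ploidy state, and explicit confidences in the ploidy determination. No assumptions are made about the ploidy of a reference chromosome. The diagnostic uses a test statistic that is independent of sequence read counts, which are the current state of the art.
The present method accurately diagnosed trisomy of chromosomes 2 and 21. The child fraction was estimated at 11.9% [CI 11.7-12.1]. The fetus was found to have one maternal and two paternal copies of chromosomes 2 and 21 with a confidence of effectively 1 (error probability < 10^-30). This was achieved with 92,600 and 258,100 reads on chromosomes 2 and 21, respectively.
This is the first demonstration of non-invasive prenatal diagnosis of trisomic chromosomes from maternal blood where the fetus was triploid, as confirmed by metaphase karyotype. Existing methods of non-invasive diagnosis would not detect aneuploidy in this sample. Current methods rely on a surplus of sequence reads on a trisomic chromosome relative to disomic reference chromosomes; but a triploid fetus has no disomic reference. Furthermore, existing methods would not achieve a similarly high-confidence ploidy determination with this fraction of fetal DNA and this number of sequence reads. It is straightforward to extend the approach to all 24 chromosomes.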
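Experiment 4
The following protocol was used for 800-plex amplification of DNA isolated from maternal plasma from a euploid pregnancy and also of genomic DNA from a trisomy 21 cell line, using standard PCR (meaning that no nesting was used). Library preparation and amplification involved single-tube blunt ending followed by A-tailing. Adaptor ligation was run using the ligation kit found in the AGILENT SURESELECT kit, and PCR was run for 7 cycles. Then, 15 cycles of STA (95°C for 30 s; 72°C for 1 min; 60°C for 4 min; 65°C for 1 min; 72°C for 30 s) were run using 800 different primer pairs targeting SNPs on chromosomes 2, 21 and X. The reaction was run with a 12.5 nM primer concentration. The DNA was then sequenced with an ILLUMINA IIGAX sequencer. The sequencer output 1.9 million reads, of which 92% mapped to the genome; of those reads that mapped to the genome, more than 99% mapped to one of the regions targeted by the targeted primers. The numbers were essentially the same for both the plasma DNA and the genomic DNA. FIG. 15 shows the ratio of the two alleles for the ~780 SNPs that were detected by the sequencer in the genomic DNA that was taken from a cell line with known trisomy at chromosome 21. Note that the allele ratios are plotted here for ease of visualization, because the raw allele counts are not straightforward to read visually. The circles represent SNPs on disomic chromosomes, while the stars represent SNPs on a trisomic chromosome. FIG. 16 is another representation of the same data as in FIG. X, where the Y-axis is the relative number of A and B measured for each SNP, and where the X-axis is the SNP number, with the SNPs separated by chromosome. In FIG. 16, SNPs 1 to 312 are found on chromosome 2, SNPs 313 to 605 are found on chromosome 21, which is trisomic, and SNPs 606 to 800 are found on chromosome X. The data from chromosomes 2 and X show a disomic chromosome, as the relative sequence counts lie in three clusters: AA at the top of the graph, BB at the bottom of the graph, and AB in the middle of the graph. The data from chromosome 21, which is trisomic, show four clusters: AAA at the top of the graph, AAB around the 0.65 line (2/3), ABB around the 0.35 line (1/3), and BBB at the bottom of the graph.
FIGS. 17A to 17D show data for the same 800-plex protocol, but measured on DNA that was amplified from four plasma samples from pregnant women. For these four samples, we expect to see seven clusters of dots: (1) along the top of the graph are those loci where both the mother and the fetus are AA, (2) slightly below the top of the graph are those loci where the mother is AA and the fetus is AB, (3) slightly above the 0.5 line are those loci where the mother is AB and the fetus is AA, (4) along the 0.5 line are those loci where the mother and the fetus are both AB, (5) slightly below the 0.5 line are those loci where the mother is AB and the fetus is BB, (6) slightly above the bottom of the graph are those loci where the mother is BB and the fetus is AB, and (7) along the bottom of the graph are those loci where both the mother and the fetus are BB. The smaller the fetal fraction, the smaller the separation between clusters (1) and (2), between clusters (3), (4) and (5), and between clusters (6) and (7). The separation is expected to be half of the fraction of DNA that is of fetal origin. For example, if the DNA is 20% fetal and 80% maternal, we expect (1) through (7) to be centered at 1.0, 0.9, 0.6, 0.5, 0.4, 0.1 and 0.0, respectively; see, for example, FIG. 17D, POOL1_BC5_ref_rate. If instead the DNA is 8% fetal and 92% maternal, we expect (1) through (7) to be centered at 1.00, 0.96, 0.54, 0.50, 0.46, 0.04 and 0.00, respectively; see, for example, FIG. 17B, POOL1_BC2_ref_rate. If no fetal DNA is detected, we do not expect to see (2), (3), (5), or (6); alternatively, we could say that the separation is zero, and therefore (1) and (2) lie on top of each other, as do (3), (4) and (5), and also (6) and (7); see, for example, FIG. 17C, POOL1_BC7_ref_rate. Note that the fetal fraction for FIG. 17A, POOL1_BC1_ref_rate, is about 25%.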
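The seven expected cluster positions follow directly from the rule that the separation is half the fetal fraction. The short sketch below reproduces the two worked examples in the paragraph above; the function name `expected_cluster_centres` is introduced here only for illustration.

```python
def expected_cluster_centres(fetal_fraction):
    """Expected A-allele ratios for clusters (1) to (7), given the fetal fraction."""
    half = fetal_fraction / 2.0  # separation between neighbouring clusters
    return [
        1.0,         # (1) mother AA, fetus AA
        1.0 - half,  # (2) mother AA, fetus AB
        0.5 + half,  # (3) mother AB, fetus AA
        0.5,         # (4) mother AB, fetus AB
        0.5 - half,  # (5) mother AB, fetus BB
        half,        # (6) mother BB, fetus AB
        0.0,         # (7) mother BB, fetus BB
    ]

centres_20 = [round(x, 2) for x in expected_cluster_centres(0.20)]
centres_08 = [round(x, 2) for x in expected_cluster_centres(0.08)]
```

With a fetal fraction of zero, clusters (1) and (2), clusters (3) to (5), and clusters (6) and (7) collapse onto 1.0, 0.5 and 0.0, which matches the no-fetal-DNA case described in the text.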
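Experiment 5
Most methods of DNA amplification and measurement will produce some allele bias, wherein the two alleles that are typically found at a locus are detected with intensities or counts that are not representative of the actual amounts of the alleles in the sample of DNA. For example, for a single individual, at a heterozygous locus we expect to see a 1:1 ratio of the two alleles, which is the theoretical ratio expected for a heterozygous locus, but we may instead observe 55:45, or even 60:40. Also note that in the context of sequencing, if the depth of read is low, then simple stochastic noise could result in significant allele bias. In an embodiment, it is possible to model the behavior of each SNP such that, if a consistent bias is observed for particular alleles, this bias can be corrected for. FIG. 18 shows the fraction of the data that can be explained by variation in bias, before and after bias correction. In FIG. 18, the stars represent the allele bias observed in the raw sequence data for the 800-plex experiment; the circles represent the allele bias after correction. If there were no allele bias at all, we would expect the data to fall along the x=y line. A similar set of data that was produced by amplifying DNA using a 150-plex targeted amplification fell very close to the 1:1 line after bias correction.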
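Experiment 6
Universal amplification of DNA using ligated adaptors with primers specific to the adaptor tags, where the primer annealing and extension times are limited to a few minutes, has the effect of enriching the proportion of shorter DNA strands. Most library protocols designed for creating DNA libraries suitable for sequencing contain such a step, and example protocols are published and well known to those in the art. In some embodiments of the invention, adaptors with a universal tag are ligated to the plasma DNA and amplified using primers specific to the adaptor tag. In some embodiments, the universal tag can be the same tag as used for sequencing, it can be a universal tag used only for PCR amplification, or it can be a set of tags. Since the fetal DNA is typically short in nature, while the maternal DNA can be both short and long in nature, this method has the effect of enriching the proportion of fetal DNA in the mixture. The free-floating DNA, thought to be DNA from apoptotic cells, and which contains both fetal and maternal DNA, is short, mostly under 200 bp. Cellular DNA released by cell lysis, a common phenomenon after phlebotomy, is typically almost exclusively maternal, and is also quite long, mostly above 500 bp. Therefore, blood samples that have sat around for more than a few minutes will contain a mixture of short (fetal + maternal) and longer (maternal) DNA. Performing a universal amplification with relatively short extension times on maternal plasma, followed by targeted amplification, will tend to increase the relative proportion of fetal DNA when compared to plasma that has been amplified using targeted amplification alone. This can be seen in FIG. 19, which shows the measured fetal percent when the input is plasma DNA (vertical axis) versus the measured fetal percent when the input DNA is plasma DNA that has had a library prepared using the ILLUMINA GAIIx library preparation protocol. All the dots fall below the line, indicating that the library preparation step enriches the fraction of DNA that is of fetal origin. Two samples of plasma that were red, indicating hemolysis and therefore a likely increased amount of maternal DNA from cell lysis, show a particularly significant enrichment of the fetal fraction when the library preparation is performed prior to targeted amplification. The method disclosed herein is particularly useful in cases where there is hemolysis, or where some other situation has occurred in which cells comprising relatively long strands of contaminating DNA have lysed, contaminating the mixed sample of short DNA with long DNA. Typically the relatively short annealing and extension times are between 30 seconds and 2 minutes, though they could be as short as 5 or 10 seconds or less, or as long as 5 or 10 minutes.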
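Experiment 7
The following protocol was used for 1,200-plex amplification of DNA isolated from a euploid pregnancy and also of genomic DNA from a trisomy 21 cell line, using a direct PCR protocol and also a semi-nested approach. Library preparation and amplification involved single-tube blunt ending followed by A-tailing. Adaptor ligation was run using a modification of the ligation kit found in the AGILENT SURESELECT kit, and PCR was run for 7 cycles. In the targeted primer pool, there were 550 assays for SNPs from chromosome 21, and 325 assays for SNPs from each of chromosomes 1 and X. Both protocols involved 15 cycles of STA (95°C for 30 s; 72°C for 1 min; 60°C for 4 min; 65°C for 30 s; 72°C for 30 s) using a 16 nM primer concentration. The semi-nested PCR protocol involved a second amplification of 15 cycles of STA (95°C for 30 s; 72°C for 1 min; 60°C for 4 min; 65°C for 30 s; 72°C for 30 s) using an inner forward tag concentration of 29 nM and a reverse tag concentration of 1 μM or 0.1 μM. The DNA was then sequenced with an ILLUMINA IIGAX sequencer. For the direct PCR protocol, 73% of the reads mapped to the genome; for the semi-nested protocol, 97.2% of the sequence reads mapped to the genome. Therefore, the semi-nested protocol results in approximately 30% more information, presumably mostly due to the elimination of the primers that are most likely to cause primer dimers.
The depth of read variability tends to be higher when using the semi-nested protocol than when using the direct PCR protocol (see FIG. 20), where the diamonds refer to the depth of read for loci run with the semi-nested protocol, and the squares refer to the depth of read for loci run with no nesting. The SNPs are arranged by depth of read for the diamonds, so the diamonds all fall on a curved line, while the squares appear to be loosely correlated; the arrangement of the SNPs is arbitrary, and it is the height of the dot that denotes the depth of read rather than its location from left to right.
In some embodiments, the methods described herein can achieve excellent depth of read (DOR) variances. For example, in one version of this experiment (FIG. 21) using a 1,200-plex direct PCR amplification of genomic DNA, of the 1,200 assays: 1186 assays had a DOR greater than 10; the average depth of read was 400; and 1063 assays (88.6%) had a depth of read between 200 and 800, an ideal window in which the number of reads for each allele is high enough to give meaningful data, yet not so high that the marginal use of those reads is particularly small. Only 12 alleles had a higher depth of read, with the highest at 1035 reads. The standard deviation of the DOR was 290, the average DOR was 453, the coefficient of variation of the DOR was 64%, there were 950,000 total reads, and 63.1% of the reads mapped to the genome. In another experiment (FIG. 22) using the 1,200-plex hemi-nested protocol, the DOR was higher. The standard deviation of the DOR was 583, the average DOR was 630, the coefficient of variation of the DOR was 93%, there were 870,000 total reads, and 96.3% of the reads mapped to the genome. Note that in both of these cases, the SNPs are arranged by the depth of read for the mother, so the curved line represents the maternal depth of read. The differentiation between child and father is not significant; it is only the trend that is significant for the purpose of this explanation.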
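The coefficients of variation quoted above are the standard deviation of the DOR divided by its mean. A small check of the two reported pairs, with a helper name chosen here for illustration:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return std_dev / mean

# Direct PCR run (FIG. 21): standard deviation 290, mean 453.
cv_direct = coefficient_of_variation(290, 453)
# Nested run (FIG. 22): standard deviation 583, mean 630.
cv_nested = coefficient_of_variation(583, 630)
```

Rounded to whole percentages, these give 64% and 93%, in agreement with the figures stated in the paragraph.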
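Experiment 8
In this experiment, the hemi-nested 1,200-plex PCR protocol was used to amplify DNA from three cells and from one cell. This experiment is relevant to prenatal aneuploidy testing using fetal cells isolated from maternal blood, or to preimplantation genetic diagnosis using biopsied blastomeres or trophectoderm samples. There were three replicates of one and three cells from two individuals (46,XY and 47,XX+21) per condition. The assays targeted chromosomes 1, 21 and X. Three different lysis methods were used: ARCTURUS, MPERv2 and alkaline lysis. Sequencing was run multiplexing 48 samples in one sequencing lane. The algorithm returned correct ploidy calls for each of the three chromosomes and for each of the replicates.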
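Experiment 9
In one experiment, four maternal plasma samples were prepared and amplified using a hemi-nested 9,600-plex protocol. The samples were prepared in the following way: up to 40 mL of maternal blood was centrifuged to isolate the buffy coat and the plasma. The genomic DNA in the maternal sample was prepared from the buffy coat, and paternal DNA was prepared from a blood sample or saliva sample. Cell-free DNA in the maternal plasma was isolated using the QIAGEN CIRCULATING NUCLEIC ACID kit and eluted in 45 μL of TE buffer according to the manufacturer's instructions. Universal ligation adaptors were appended to the end of each molecule of 35 μL of purified plasma DNA, and libraries were amplified for 7 cycles using adaptor-specific primers. Libraries were purified with AGENCOURT AMPURE beads and eluted in 50 μL of water.
3 μL of the DNA was amplified with 15 cycles of STA (95°C for 10 min for initial polymerase activation, then 15 cycles of 95°C for 30 s; 72°C for 10 s; 65°C for 1 min; 60°C for 8 min; 65°C for 3 min and 72°C for 30 s; and a final extension at 72°C for 2 min) using a 14.5 nM primer concentration of 9,600 target-specific tagged reverse primers and one library-adaptor-specific forward primer at 500 nM.
The hemi-nested PCR protocol involved a second amplification of a dilution of the first STA product for 15 cycles of STA (95°C for 10 min for initial polymerase activation, then 15 cycles of 95°C for 30 s; 65°C for 1 min; 60°C for 5 min; 65°C for 5 min and 72°C for 30 s; and a final extension at 72°C for 2 min) using a reverse tag concentration of 1000 nM and a concentration of 16.6 nM for each of the 9,600 target-specific forward primers.
An aliquot of the STA products was then amplified by standard PCR for 10 cycles with 1 μM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries of different barcodes and purified using a spin column.
In this way, 9,600 primers were used in the single-well reactions; the primers were designed to target SNPs found on chromosomes 1, 2, 13, 18, 21, X and Y. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. Per sample, approximately 3.9 million reads were generated by the sequencer, with 3.7 million reads mapping to the genome (94%), and of those, 2.9 million reads (74%) mapped to targeted SNPs, with an average depth of read of 344 and a median depth of read of 255. The fetal fractions for the four samples were found to be 9.9%, 18.9%, 16.3%, and 21.2%.
The relevant maternal and paternal genomic DNA samples were amplified using a hemi-nested 9,600-plex protocol and sequenced. The semi-nested protocol is different in that it applies 9,600 outer forward primers and tagged reverse primers at 7.3 nM in the first STA. The thermocycling conditions and composition of the second STA, and the barcoding PCR, were the same as for the hemi-nested protocol.
The sequencing data was analyzed using informatics methods disclosed herein, and the ploidy state was called at six chromosomes for the fetuses whose DNA was present in the four maternal plasma samples. The ploidy calls for all 28 chromosomes in the set were called correctly with confidences above 99.2%, except for one chromosome that was called correctly but with a confidence of 83%.
FIG. 23 shows the depth of read of the 9,600-plex hemi-nested approach along with the depth of read of the 1,200-plex hemi-nested approach described in Experiment 7, though the number of SNPs with a depth of read greater than 100, greater than 200 and greater than 400 was significantly higher than in the 1,200-plex protocol. The number of reads at the 90th percentile can be divided by the number of reads at the 10th percentile to give a dimensionless metric that is indicative of the uniformity of the depth of read; the smaller the number, the more uniform (narrow) the depth of read. The average 90th percentile / 10th percentile ratio is 11.5 for the method run in Experiment 9, while it is 5.6 for the method run in Experiment 7. A narrower depth of read for a given protocol plexity is better for sequencing efficiency, as fewer sequence reads are necessary to ensure that a certain percentage of loci are above a read number threshold.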
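The 90th / 10th percentile uniformity metric described above can be computed from a list of per-locus depths of read. The sketch below is an illustration only: it uses Python's standard `statistics.quantiles` with its default interpolation, which is an assumption, since the text does not say how the percentiles were estimated.

```python
from statistics import quantiles

def depth_uniformity(depths):
    """Ratio of the 90th-percentile depth of read to the 10th-percentile depth.

    Smaller values indicate a narrower, more uniform depth-of-read
    distribution. `depths` needs at least two values, and the 10th
    percentile must be non-zero.
    """
    deciles = quantiles(depths, n=10)  # nine cut points: 10th to 90th percentile
    return deciles[-1] / deciles[0]

# Hypothetical depths for illustration: 100 loci with depths 1 to 100.
ratio = depth_uniformity(list(range(1, 101)))
```

A perfectly uniform run, where every locus has the same depth, gives a ratio of 1.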
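Experiment 10
In one experiment, four maternal plasma samples were prepared and amplified using a hemi-nested 9,600-plex protocol. The details of Experiment 10 were very similar to Experiment 9, the exceptions being the nesting protocol and the identity of the four samples. The ploidy calls for all 28 chromosomes in the set were called correctly with confidences above 99.7%. 7.6 million (97%) of the reads mapped to the genome, and 6.3 million (80%) of the reads mapped to the targeted SNPs. The average depth of read was 751, and the median depth of read was 396.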
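Experiment 11
In one experiment, three maternal plasma samples were split into five equal portions, and each portion was amplified using either 2,400 multiplexed primers (four portions) or 1,200 multiplexed primers (one portion) with a hemi-nested protocol, for a total of 10,800 primers. After amplification, the portions were pooled together for sequencing. The details of Experiment 11 were very similar to Experiment 9, the exceptions being the nesting protocol and the split-and-pool approach. The ploidy calls for all 21 chromosomes in the set were called correctly with confidences above 99.7%, except for one missed call where the confidence was 83%. 3.4 million reads mapped to the targeted SNPs, the average depth of read was 404, and the median depth of read was 258.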
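Experiment 12
In one experiment, four maternal plasma samples were split into four equal portions, and each portion was amplified using 2,400 multiplexed primers with a hemi-nested protocol, for a total of 9,600 primers. After amplification, the portions were pooled together for sequencing. The details of Experiment 12 were very similar to Experiment 9, the exceptions being the nesting protocol and the split-and-pool approach. The ploidy calls for all 28 chromosomes in the set were called correctly with confidences above 97%, except for one missed call where the confidence was above 78%. 4.5 million reads mapped to the targeted SNPs, the average depth of read was 535, and the median depth of read was 412.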
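Experiment 13
In one experiment, four maternal plasma samples were prepared and amplified using a 9,600-plex triply hemi-nested protocol, for a total of 9,600 primers. The details of Experiment 13 were very similar to Experiment 9, the exception being the nesting protocol, which involved three rounds of nesting; the three rounds involved 15, 10 and 15 STA cycles, respectively. The ploidy calls for 27 of the 28 chromosomes in the set were called correctly with confidences above 99.9%, except for one that was called correctly with a confidence of 94.6%, and one missed call with a confidence of 80.8%. 3.5 million reads mapped to the targeted SNPs, the average depth of read was 414, and the median depth of read was 249.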
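Experiment 14
In one experiment, 45 sets of cells were amplified using a 1,200-plex hemi-nested protocol, and ploidy determinations were made at three chromosomes. Note that this experiment is meant to simulate the conditions of performing pre-implantation genetic diagnosis on single-cell biopsies from day 3 embryos, or on trophectoderm biopsies from day 5 embryos. 15 individual single cells and 30 sets of three cells were placed in 45 individual reaction tubes for a total of 45 reactions, where each reaction contained cells from only one cell line, but the different reactions contained cells from different cell lines. The cells were prepared in 5 μL of washing buffer and lysed by adding 5 μL of ARCTURUS PICOPURE lysis buffer (APPLIED BIOSYSTEMS) and incubating at 56°C for 20 min and 95°C for 10 min.
The DNA of the single cells or three cells was amplified with 25 cycles of STA (95°C for 10 min for initial polymerase activation, then 25 cycles of 95°C for 30 s; 72°C for 10 s; 65°C for 1 min; 60°C for 8 min; 65°C for 3 min and 72°C for 30 s; and a final extension at 72°C for 2 min) using a 50 nM primer concentration of 1,200 target-specific forward and tagged reverse primers.
The hemi-nested PCR protocol involved three parallel second amplifications of a dilution of the first STA product for 20 cycles of STA (95°C for 10 min for initial polymerase activation, then 15 cycles of 95°C for 30 s; 65°C for 1 min; 60°C for 5 min; 65°C for 5 min and 72°C for 30 s; and a final extension at 72°C for 2 min) using a reverse-tag-specific primer concentration of 1000 nM and a concentration of 60 nM for each of 400 target-specific nested forward primers. Thus, in the three parallel 400-plex reactions, the total of 1,200 targets amplified in the first STA were amplified.
An aliquot of the STA products was then amplified by standard PCR for 15 cycles with 1 μM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries of different barcodes and purified using a spin column.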
In this way, 1,200 primers were used in the single-cell reactions; the primers were designed to target SNPs found on chromosomes 1, 21 and X. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. Per sample, approximately 3.9 million reads were generated by the sequencer, with 500,000 to 800,000 reads mapping to the genome (74% to 94% of all reads per sample).
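The relevant maternal and paternal genomic DNA samples from the cell lines were analyzed using the same hemi-nested 1,200-plex assay pool, with a similar protocol having fewer cycles and a 1,200-plex second STA, and sequenced.
The sequencing data was analyzed using informatics methods disclosed herein, and the ploidy state was called at the three chromosomes for the samples.
FIG. 24 shows normalized depth of read ratios (vertical axis) for six samples at three chromosomes (1 = chromosome 1; 2 = chromosome 21; 3 = chromosome X). The ratios were set equal to the number of reads mapping to that chromosome, normalized, and divided by the number of reads mapping to that chromosome averaged over three wells each comprising three 46,XY cells. The three sets of data points corresponding to the 46,XY reactions are expected to have ratios of 1:1. The three sets of data points corresponding to the 47,XX+21 cells are expected to have ratios of 1:1 for chromosome 1, 1.5:1 for chromosome 21, and 2:1 for chromosome X.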
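The expected ratios in FIG. 24 are simply the chromosome copy number in the test cells divided by the copy number in the 46,XY reference cells. The following lines illustrate that arithmetic; the dictionaries and the function name are introduced here for illustration and do not come from the text.

```python
# Chromosome copy numbers per cell for the two karyotypes used in the experiment.
COPIES_46XY = {"chr1": 2, "chr21": 2, "chrX": 1}
COPIES_47XX_21 = {"chr1": 2, "chr21": 3, "chrX": 2}

def expected_depth_ratios(sample_copies, reference_copies):
    """Expected normalized depth-of-read ratio per chromosome."""
    return {chrom: sample_copies[chrom] / reference_copies[chrom]
            for chrom in reference_copies}

ratios = expected_depth_ratios(COPIES_47XX_21, COPIES_46XY)
```

For the 47,XX+21 cells this gives 1.0 for chromosome 1, 1.5 for chromosome 21 and 2.0 for chromosome X, the values named in the paragraph.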
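FIG. 25 shows allele ratios plotted for three chromosomes (1, 21, X) for three reactions. The reaction at the lower left shows a reaction on three 46,XY cells. The left region shows the allele ratios for chromosome 1, the middle region shows the allele ratios for chromosome 21, and the right region shows the allele ratios for chromosome X. For the 46,XY cells, for chromosome 1 we expect to see ratios of 1, 0.5 and 0, corresponding to AA, AB and BB SNP genotypes. For the 46,XY cells, for chromosome 21 we expect to see ratios of 1, 0.5 and 0, corresponding to AA, AB and BB SNP genotypes. For the 46,XY cells, for chromosome X we expect to see ratios of 1 and 0, corresponding to A and B SNP genotypes. The reaction at the lower right shows a reaction on three 47,XX+21 cells. The allele ratios are segregated by chromosome as in the lower left graph. For the 47,XX+21 cells, for chromosome 1 we expect to see ratios of 1, 0.5 and 0, corresponding to AA, AB and BB SNP genotypes. For the 47,XX+21 cells, for chromosome 21 we expect to see ratios of 1, 0.67, 0.33 and 0, corresponding to AAA, AAB, ABB and BBB SNP genotypes. For the 47,XX+21 cells, for chromosome X we expect to see ratios of 1, 0.5 and 0, corresponding to AA, AB and BB SNP genotypes. The plot at the upper right was made on a reaction comprising 1 ng of genomic DNA from the 47,XX+21 cell line. FIG. 26 shows the same graphs as FIG. 25, but for reactions performed on only one cell. The left graph is for a reaction that contained a 47,XX+21 cell, and the right graph is for a reaction that contained a 46,XX cell.
From the graphs shown in FIG. 25 and FIG. 26, it is visually apparent that there are two clusters of dots for chromosomes where we expect to see ratios of 1 and 0; three clusters of dots for chromosomes where we expect to see ratios of 1, 0.5 and 0; and four clusters of dots for chromosomes where we expect to see ratios of 1, 0.67, 0.33 and 0. The Parental Support algorithm was able to make correct calls on all three chromosomes for all 45 reactions.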
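Experiment 15
In one experiment, maternal plasma samples were prepared and amplified using a hemi-nested 19,488-plex protocol. The samples were prepared in the following way: up to 20 mL of maternal blood was centrifuged to isolate the buffy coat and the plasma. The genomic DNA in the maternal sample was prepared from the buffy coat, and paternal DNA was prepared from a blood sample or saliva sample. Cell-free DNA in the maternal plasma was isolated using the QIAGEN CIRCULATING NUCLEIC ACID kit and eluted in 50 μL of TE buffer according to the manufacturer's instructions. Universal ligation adaptors were appended to the end of each molecule of 40 μL of purified plasma DNA, and libraries were amplified for 9 cycles using adaptor-specific primers. Libraries were purified with AGENCOURT AMPURE beads and eluted in 50 μL of DNA suspension buffer.
6 μL of the DNA was amplified with 15 cycles of STAR 1 (95°C for 10 min for initial polymerase activation, then 15 cycles of 96°C for 30 s; 65°C for 1 min; 58°C for 6 min; 60°C for 8 min; 65°C for 4 min and 72°C for 30 s; and a final extension at 72°C for 2 min) using a 7.5 nM primer concentration of 19,488 target-specific tagged reverse primers and one library-adaptor-specific forward primer at 500 nM.
The hemi-nested PCR protocol involved a second amplification of a dilution of the STAR 1 product for 15 cycles (STAR 2) (95°C for 10 min for initial polymerase activation, then 15 cycles of 95°C for 30 s; 65°C for 1 min; 60°C for 5 min; 65°C for 5 min and 72°C for 30 s; and a final extension at 72°C for 2 min) using a reverse tag concentration of 1000 nM and a concentration of 20 nM for each of the 19,488 target-specific forward primers.
An aliquot of the STAR 2 products was then amplified by standard PCR for 12 cycles with 1 μM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries of different barcodes and purified using a spin column.
In this way, 19,488 primers were used in the single-well reactions; the primers were designed to target SNPs found on chromosomes 1, 2, 13, 18, 21, X and Y. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. For plasma samples, approximately 10 million reads were generated by the sequencer, with 9.4 to 9.6 million reads mapping to the genome (94 to 96%), and of those, 99.95% mapped to targeted SNPs, with a mean depth of read of 460 and a median depth of read of 350. For comparison, a perfectly even distribution would be: 10M reads / 19,488 targets = 513 reads/target. As for primer dimers, 30,000 reads came from sequenced primer dimers (0.3% of the reads generated by the sequencer). For genomic samples, 99.4 to 99.7% of the reads mapped to the genome, of those, 99.99% mapped to targeted SNPs, and 0.1% of the reads generated by the sequencer were primer dimers.
For 10 million sequencing reads, typically at least 19,350 of the 19,488 targeted SNPs (99.3%) are amplified and sequenced. For DNA samples with 2M sequencing reads, typically at least 19,000 targeted SNPs (97.5%) are amplified and sequenced. The lower number may be attributable to sampling noise, since the number of reads is lower and the sequencer misses some of the amplified products. If desired, the number of sequencing reads can be increased to increase the number of targeted SNPs that are amplified and sequenced.
The relevant maternal and paternal genomic DNA samples were amplified using a hemi-nested protocol with 19,488 outer forward primers and tagged reverse primers at 7.5 nM in STAR 1. The thermocycling conditions of STAR 2 and the barcoding PCR were the same as for the hemi-nested protocol.
The average fetal fraction for 407 samples was found to be 14.8%. The sequencing data was analyzed using informatics methods disclosed herein, and the ploidy state was called at four chromosomes (13, 18, 21, Y) for the fetuses whose DNA was present in 378 of the 407 maternal plasma samples, and at chromosome X for 375 of the 407 maternal plasma samples. The ploidy calls for all 1,887 chromosomes in the set were called correctly with confidences above 90%. 1,882 of the 1,887 calls had confidences above 95%; 1,862 of the 1,887 calls were made with confidences above 99%.
Similar control experiments were performed using water instead of DNA extracted from plasma in the plasma PCR protocol. Based on six such runs of the experiment, 5 to 6% of the sequenced reads were primer dimers. Other sequenced reads were due to background noise. This experiment demonstrates that, even in the absence of a nucleic acid sample with target loci for the primers to hybridize to (rather than hybridizing to other primers and forming amplified primer dimers), few primer dimers are formed.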
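Experiment 16
The following experiment illustrates an exemplary method for designing and selecting a library of primers that can be used in any of the multiplexed PCR methods of the invention. The goal is to select primers from an initial library of candidate primers that can be used to simultaneously amplify a large number of target loci (or a subset of target loci) in a single reaction. For an initial set of candidate target loci, primers did not have to be designed or selected for each target locus. Preferably, primers are designed and selected for a large portion of the most desirable target loci.
Step 1
A set of candidate target loci (such as SNPs) was selected based on publicly available information about desired parameters for the target loci, such as the frequency of the SNPs within a target population or the heterozygosity rate of the SNPs (worldwide web at ncbi.nlm.nih.gov/projects/SNP/; Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11, each of which is incorporated by reference in its entirety). For each candidate locus, one or more PCR primer pairs were designed using the Primer3 program (the worldwide web at primer3.sourceforge.net; libprimer3 release 2.2.3, each of which is incorporated by reference in its entirety). If there was no feasible design for PCR primers for a particular target locus, the target locus was eliminated from further consideration.
If desired, a "target locus score" (a higher score representing higher desirability) can be calculated for most or all of the target loci, such as a target locus score calculated based on a weighted average of various desired parameters for the target loci. The parameters may be assigned different weights based on their importance for the particular application in which the primers will be used. Exemplary parameters include the heterozygosity rate of the target locus, the disease prevalence associated with a sequence (e.g., a polymorphism) at the target locus, the disease penetrance associated with a sequence (e.g., a polymorphism) at the target locus, the specificity of the candidate primer(s) used to amplify the target locus, the size of the candidate primer(s) used to amplify the target locus, and the size of the target amplicon.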
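A target locus score built as a weighted average could look like the sketch below. The parameter names, the example weights, and the assumption that every parameter has already been scaled so that larger values are more desirable are all hypothetical; the text only states that a weighted average of the desired parameters may be used.

```python
def target_locus_score(parameters, weights):
    """Weighted average of per-locus parameter values.

    `parameters` and `weights` are dicts keyed by parameter name. Each
    parameter is assumed to be pre-scaled so that higher means more desirable.
    """
    total_weight = sum(weights[name] for name in parameters)
    weighted_sum = sum(parameters[name] * weights[name] for name in parameters)
    return weighted_sum / total_weight

# Hypothetical example: heterozygosity weighted twice as heavily as specificity.
score = target_locus_score(
    {"heterozygosity": 0.5, "primer_specificity": 1.0},
    {"heterozygosity": 2.0, "primer_specificity": 1.0},
)
```

Changing the weights is how the score would be tuned to a particular application, as the paragraph describes.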
Step 2
A thermodynamic interaction score was calculated between each primer and all primers for all other target loci from Step 1 (see, e.g., Allawi, H. T. & SantaLucia, J., Jr. (1998), "Thermodynamics of Internal C-T Mismatches in DNA", Nucleic Acids Res. 26, 2694-2701; Peyret, N., Seneviratne, P. A., Allawi, H. T. & SantaLucia, J., Jr. (1999), "Nearest-Neighbor Thermodynamics and NMR of DNA Sequences with Internal A-A, C-C, G-G, and T-T Mismatches", Biochemistry 38, 3468-3477; Allawi, H. T. & SantaLucia, J., Jr. (1998), "Nearest-Neighbor Thermodynamics of Internal A-C Mismatches in DNA: Sequence Dependence and pH Effects", Biochemistry 37, 9435-9444; Allawi, H. T. & SantaLucia, J., Jr. (1998), "Nearest Neighbor Thermodynamic Parameters for Internal G-A Mismatches in DNA", Biochemistry 37, 2170-2179; and Allawi, H. T. & SantaLucia, J., Jr. (1997), "Thermodynamics and NMR of Internal G-T Mismatches in DNA", Biochemistry 36, 10581-10594; MultiPLX 2.1 (Kaplinski L, Andreson R, Puurand T, Remm M. MultiPLX: automatic grouping and evaluation of PCR primers. Bioinformatics. 2005 Apr 15;21(8):1701-2), each of which is incorporated herein by reference in its entirety). This step resulted in a 2D matrix of interaction scores. The interaction score predicted the likelihood of primer dimers involving the two interacting primers. The score was calculated as follows:
interaction_score = max(-deltaG_2, 0.8 * (-deltaG_1))
where deltaG_2 is the Gibbs energy (the energy required to break the dimer) for a dimer that is extensible by PCR on both ends, i.e., the 3' end of each primer anneals to the other primer; and
deltaG_1 is the Gibbs energy for a dimer that is extensible by PCR on at least one end.
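The interaction score formula translates directly into code. In this sketch the two Gibbs energies are taken as given inputs (in practice they would come from a thermodynamic model such as those cited above); the example values are hypothetical.

```python
def interaction_score(delta_g_2, delta_g_1):
    """Primer-dimer interaction score; larger means a more stable dimer.

    delta_g_2: Gibbs energy of the dimer extensible by PCR on both ends.
    delta_g_1: Gibbs energy of the dimer extensible on at least one end.
    Stable dimers have negative Gibbs energies, hence the sign flips.
    """
    return max(-delta_g_2, 0.8 * (-delta_g_1))

# Hypothetical energies: the one-end dimer dominates despite its 0.8 weighting.
score = interaction_score(delta_g_2=-5.0, delta_g_1=-8.0)
```

The 0.8 factor down-weights dimers that can be extended from only one end relative to dimers extensible from both ends.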
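Step 3
For each target locus, if there was more than one primer-pair design, the following method was used to select one design:
1. For each primer-pair design for the locus, find the worst-case (highest) interaction score between the two primers in the design and all primers from all designs for all other target loci.
2. Choose the design with the best (lowest) worst-case interaction score.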
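The two-part selection in Step 3 is a minimax choice. The sketch below assumes a design is represented as a tuple of primer identifiers and that a `score(p, q)` callable returns the Step 2 interaction score for two primers; both representations are assumptions made for illustration.

```python
def pick_design(designs, other_primers, score):
    """Choose the design whose worst-case interaction score is lowest."""
    def worst_case(design):
        # Highest interaction between either primer of this design and
        # any primer belonging to a design for another target locus.
        return max(score(p, q) for p in design for q in other_primers)
    return min(designs, key=worst_case)

# Hypothetical scores for two candidate designs of one locus.
toy_scores = {("f1", "x"): 3.0, ("r1", "x"): 9.0,
              ("f2", "x"): 4.0, ("r2", "x"): 5.0}
best = pick_design([("f1", "r1"), ("f2", "r2")], ["x"],
                   lambda p, q: toy_scores[(p, q)])
```

Here the second design is chosen because its worst interaction (5.0) is lower than that of the first design (9.0), even though the first design contains the single least-interacting primer.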
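Step 4
A graph was built such that each node represented one locus and its associated primer-pair design (e.g., a Maximal Clique problem). One edge was created between every pair of nodes. A weight was assigned to each edge equal to the worst-case (highest) interaction score between the primers associated with the two nodes connected by the edge.
Step 5
If desired, for every pair of designs for two different target loci where one of the primers from one design and one of the primers from the other design would anneal to overlapping target regions, an additional edge was added between the nodes for the two designs. The weight of these edges was set equal to the highest weight assigned in Step 4. Thus, Step 5 prevents the library from having primers that would anneal to overlapping target regions and thus interfere with each other during a multiplex PCR reaction.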
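Step 6
An initial interaction score threshold was calculated as follows:
weight_threshold = max(edge_weight) - 0.05 * (max(edge_weight) - min(edge_weight))
where
max(edge_weight) is the maximum edge weight in the graph; and
min(edge_weight) is the minimum edge weight in the graph.
The initial bounds for the threshold were set as follows:
max_weight_threshold = max(edge_weight)
min_weight_threshold = min(edge_weight)
Step 7
A new graph was constructed consisting of the same set of nodes as the graph from Step 5, but including only the edges with weights that exceed weight_threshold. Thus, this step ignores interactions with scores equal to or below weight_threshold.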
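Steps 6 and 7 can be illustrated with a few lines of code. The sketch assumes the graph's edges are held in a dict that maps a pair of node names to the edge weight; that data structure and the function names are choices made here, not taken from the text.

```python
def initial_threshold(edge_weights):
    """Step 6: start 5% of the weight range below the maximum edge weight."""
    highest = max(edge_weights.values())
    lowest = min(edge_weights.values())
    return highest - 0.05 * (highest - lowest)

def filter_edges(edge_weights, weight_threshold):
    """Step 7: keep only edges whose weight exceeds the threshold."""
    return {pair: w for pair, w in edge_weights.items() if w > weight_threshold}

# Hypothetical three-node graph.
toy_edges = {("a", "b"): 10.0, ("a", "c"): 2.0, ("b", "c"): 9.7}
threshold = initial_threshold(toy_edges)
kept = filter_edges(toy_edges, threshold)
```

In this toy graph the threshold is 9.6, so the two strongest interactions are kept as edges and the weak interaction between nodes a and c is ignored.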
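Step 8
Nodes (and all of the edges connected to the removed nodes) were removed from the graph of Step 7 until there were no edges left. Nodes were removed by applying the following procedure repeatedly:
1. Find the node with the highest degree (the highest number of edges). If there is more than one, choose one arbitrarily.
2. Define the set of nodes consisting of the node chosen above and all of the nodes connected to it, but excluding any nodes that have a degree less than that of the node chosen above.
3. Choose the node from the set that has the lowest target locus score (a lower score representing lower desirability) from Step 1. Remove that node from the graph.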
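The removal procedure of Step 8 can be sketched as a loop over an adjacency map. The representation (a dict from node to the set of its neighbours) and the function name are assumptions for illustration; ties in item 1 are broken by whichever node `max` encounters first, which is one way of choosing arbitrarily.

```python
def prune_graph(adjacency, locus_score):
    """Remove nodes until no edges remain; return the surviving nodes."""
    adj = {node: set(neighbours) for node, neighbours in adjacency.items()}
    while any(adj.values()):
        # 1. Node with the highest degree.
        hub = max(adj, key=lambda node: len(adj[node]))
        hub_degree = len(adj[hub])
        # 2. The hub plus its neighbours, excluding lower-degree neighbours.
        candidates = [hub] + [n for n in adj[hub] if len(adj[n]) >= hub_degree]
        # 3. Drop the candidate with the lowest target locus score.
        victim = min(candidates, key=lambda node: locus_score[node])
        for neighbour in adj.pop(victim):
            adj[neighbour].discard(victim)
    return set(adj)

# Hypothetical graph: node "a" interacts with both "b" and "c".
survivors = prune_graph({"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},
                        {"a": 0.9, "b": 0.5, "c": 0.4})
```

In the example, node "a" is removed even though it has the highest locus score, because its two neighbours have a lower degree and are therefore excluded from the candidate set.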
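Step 9
If the number of nodes remaining in the graph satisfied the required number of target loci for the multiplexed PCR pool (within an acceptable tolerance), the method continued at Step 10.
If there were too many or too few nodes remaining in the graph, a binary search was performed to determine what threshold value would result in the desired number of nodes remaining in the graph. If there were too many nodes in the graph, the weight threshold bounds were adjusted as follows:
max_weight_threshold = weight_threshold
Otherwise (if there were too few nodes in the graph), the weight threshold bounds were adjusted as follows:
min_weight_threshold = weight_threshold
Then, the weight threshold was adjusted as follows:
weight_threshold = (max_weight_threshold + min_weight_threshold) / 2
Steps 7 to 9 were repeated.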
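The binary search over the threshold in Steps 6 to 9 can be sketched as below. Steps 7 and 8 are abstracted behind a hypothetical callback `count_remaining(threshold)` that returns how many nodes survive at a given threshold; the iteration cap `max_iter` is also an addition made here so the loop always terminates.

```python
def tune_threshold(count_remaining, min_weight, max_weight,
                   target, tolerance=0, max_iter=50):
    """Binary-search the weight threshold until about `target` nodes remain."""
    lo, hi = min_weight, max_weight          # initial bounds from Step 6
    threshold = hi - 0.05 * (hi - lo)        # initial threshold from Step 6
    for _ in range(max_iter):
        remaining = count_remaining(threshold)   # Steps 7 and 8
        if abs(remaining - target) <= tolerance:
            break                                # proceed to Step 10
        if remaining > target:
            hi = threshold   # too many nodes: lower the upper bound
        else:
            lo = threshold   # too few nodes: raise the lower bound
        threshold = (hi + lo) / 2
    return threshold

# Toy stand-in for Steps 7 and 8: more nodes survive at higher thresholds.
toy_count = lambda t: int(t * 10)
found = tune_threshold(toy_count, 0.0, 10.0, target=37)
```

The search relies on the number of surviving nodes growing as the threshold grows, which holds in the toy stand-in and is the behaviour implied by Step 7, since a higher threshold leaves fewer edges to resolve.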
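Step 10
The primer-pair designs associated with the nodes remaining in the graph were selected for the primer library. This primer library can be used in any of the methods of the invention.
If desired, this method of designing and selecting primers can be performed for primer libraries in which only one primer (instead of a primer pair) is used for amplification of a target locus. In this case, a node represents one primer per target locus (rather than a primer pair).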
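Experiment 17
Figure 27 is a graph comparing two primer libraries designed using the methods of the invention. The graph shows the number of loci with a particular minor allele frequency targeted by each primer library. More primers were retained during selection of the "new pool" library. This library enables amplification of more target loci, especially target loci with relatively high minor allele frequencies (which are more informative alleles for some methods of the invention, such as the detection of fetal chromosomal abnormalities).
These primer libraries were used in the following multiplex PCR method. Blood (20-40 mL) was collected from each subject into two to four CELL-FREE™ DNA tubes (Streck). Plasma (a minimum of 7 mL) was isolated from each sample via a double centrifugation protocol of 2,000 g for 20 minutes followed by 3,220 g for 30 minutes, with the supernatant transferred after the first spin. cfDNA was isolated from 7 to 20 mL of plasma using the QIAGEN QIAamp Circulating Nucleic Acid kit and eluted in 45 μL of TE buffer. Pure maternal genomic DNA was isolated from the buffy coat obtained after the first centrifugation, and pure paternal genomic DNA was prepared similarly from a blood, saliva, or buccal sample.
Maternal cfDNA, maternal genomic DNA, and paternal genomic DNA samples were pre-amplified for 15 cycles using 11,000 target-specific assays, and an aliquot was transferred to a second PCR reaction of 15 cycles using nested primers. Finally, samples were prepared for sequencing by adding barcoded tags in a third 12-cycle round of PCR. Thus, 11,000 targets were amplified in a single reaction; the targets included SNPs found on chromosomes 13, 18, 21, X, and Y. The amplicons were then sequenced using an ILLUMINA GAIIx or HISEQ sequencer. The parental genotype samples were sequenced at a lower depth of read (~20% of the cfDNA depth of read).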
Experiment 18
If desired, the size and quantity of the PCR products can be analyzed using standard methods, such as an Agilent Technologies 2100 Bioanalyzer (Figures 28a to 28m). For example, the direct PCR method described herein, without nesting, was used in a 2,400-plex experiment (Figures 28b to 28g) and a 19,488-plex experiment (Figures 28h to 28m). The primer concentration was 10 nM for Figures 28b to 28d and 28h to 28j, and 1 nM for Figures 28e to 28g and 28k to 28m. The amount of input DNA was 24 ng for Figures 28b, 28e, 28h, and 28k; 80 ng for Figures 28c, 28f, 28i, and 28l; and 250 ng for Figures 28d, 28g, 28j, and 28m. Additional input DNA produced a higher proportion of the desired 180 base-pair product. The peak at 140 base pairs was primer-dimer product.
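Experiment 19
A proof-of-principle study demonstrated detection of T13, T18, T21, 45,X, and 47,XXY with equally high accuracy across all chromosomes.
Patients
Pregnant couples were enrolled at specific prenatal care centers under protocols approved by an Institutional Review Board in accordance with local laws. Inclusion criteria were an age of at least 18 years, a gestational age of at least 9 weeks, a singleton pregnancy, and signed informed consent. Blood samples were drawn from the pregnant mothers, and blood or buccal samples were collected from the fathers. Two pregnancies with T13 (Patau syndrome), two with T18 (Edwards syndrome), two with T21 (Down syndrome), two with 45,X, two with 47,XXY, and 90 normal pregnancies were selected before testing from a cohort of ~500 women to test which chromosomal abnormalities the method detects. Normal fetal karyotypes were confirmed by molecular karyotyping for samples where post-birth child tissue was available. Euploid samples were collected before invasive testing in low-risk women. Aneuploid samples were collected at least 7 days after invasive testing, and aneuploidy was confirmed by cytogenetic karyotyping or fluorescence in situ hybridization at an independent laboratory.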
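Sample Preparation and Multiplex PCR
For the data in Figures 30a to 30e, 30g, 30h, and 31a to 31g, sample preparation and 19,488-plex PCR were performed as described in Example 15. For the data in Figure 30f, sample preparation and 11,000-plex PCR were performed as described in Experiment 17.
Methodology and Data Analysis
The algorithm considered parental genotypes and crossover frequency data (such as data from the HapMap database) to calculate the expected allele distributions at the 19,488 polymorphic loci for a very large number of possible fetal ploidy states and at various fetal cfDNA fractions (Figures 29a to 29c). Unlike allele ratio-based methods, it also takes linkage disequilibrium into account and uses non-Gaussian data models to describe the expected distribution of allele measurements at a SNP given the observed platform characteristics and amplification biases. It then compares the various predicted allele distributions to the actual allele distributions measured in the cfDNA sample (Figure 29c) and calculates the likelihood of each hypothesis (monosomy, disomy, or trisomy, for which there are numerous hypotheses based on the various potential crossovers) based on the sequencing data. The algorithm sums the likelihoods of the individual monosomy, disomy, or trisomy hypotheses (Figure 29d) and calls the hypothesis with the maximum overall likelihood as the copy number and fetal fraction (Figure 29e). Although laboratory researchers were not blinded to sample karyotype, the algorithm called ploidy states without human intervention and was blind to the truth.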
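The summation and calling step described above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the per-hypothesis log-likelihoods are taken as given (the data model that produces them is not reproduced here), a uniform prior over copy-number families is assumed, and `call_ploidy` is a hypothetical name.

```python
import math
from collections import defaultdict


def _log_sum_exp(values):
    # Numerically stable log(sum(exp(v))) for a list of log-likelihoods.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def call_ploidy(log_likelihoods):
    """Sum sub-hypothesis likelihoods per copy number and pick the largest.

    log_likelihoods: dict {(copy_number, sub_hypothesis_label): log-likelihood}.
    Returns (called_copy_number, normalized_confidence).
    """
    by_copy = defaultdict(list)
    for (copies, _label), ll in log_likelihoods.items():
        by_copy[copies].append(ll)
    # Total likelihood of each copy-number family (e.g. 1, 2, or 3 copies).
    totals = {c: _log_sum_exp(v) for c, v in by_copy.items()}
    norm = _log_sum_exp(list(totals.values()))
    posterior = {c: math.exp(t - norm) for c, t in totals.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior[best]
```

The normalized value returned alongside the call plays the role of a per-sample confidence in this toy setting; it should not be read as the calculated accuracy reported by the method.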
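Data Interpretation
Graphical Representation of Generalized Data
To determine the ploidy state of a chromosome of interest, the algorithm considers the distribution of the number of sequence reads for each of the two possible alleles at 3,000 to 4,000 SNPs per chromosome. It is important to note that the algorithm makes ploidy calls using an approach that does not lend itself to visualization. Thus, for purposes of illustration, the data can be shown in a simplified fashion as ratios of the two most likely alleles, labeled here as A and B, so that the relevant trends can be more easily visualized. This simplified illustration does not take into account some features of the algorithm. For example, two important aspects of the algorithm that cannot be illustrated with a visualization method that displays allele ratios are: 1) the ability to leverage linkage disequilibrium, i.e., the influence that a measurement at one SNP has on the likely identity of a neighboring SNP, and 2) the use of non-Gaussian data models that describe the expected distribution of allele measurements at a SNP given platform characteristics and amplification biases. Also note that the algorithm considers only the two most common alleles at each SNP, ignoring other possible alleles.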
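The graphical representations in Figures 30a to 30h include samples in which two, one, or three fetal chromosomes are present. Generally, these indicate euploidy (Figures 30a to 30c), monosomy (Figure 30d), and trisomy (Figures 30e to 30h), respectively. In all plots, each spot represents a single SNP, where the targeted SNPs are plotted sequentially from left to right for one chromosome along the horizontal axis. The vertical axis indicates the number of reads for the A allele as a fraction of the total number of reads for both the A and B alleles for that SNP. Note that the measurements are made on total cfDNA isolated from maternal blood, and the cfDNA includes both maternal and fetal cfDNA; thus, each spot represents the combination of the fetal and maternal DNA contributions for that SNP. Therefore, increasing the proportion of maternal cfDNA from 0% to 100% will gradually shift some spots up or down within the plots, depending on the maternal and fetal genotypes. This is described in more detail below in conjunction with the corresponding plots.
When it is desired to facilitate visualization, the spots can be color-coded according to maternal genotype, since the maternal genotype contributes more to the localization of each spot and the majority of trisomies are maternally inherited; this aids in visualizing ploidy states. Specifically, SNPs for which the maternal genotype is AA can be shown in red, those for which the maternal genotype is AB can be shown in green, and those for which the maternal genotype is BB can be shown in blue.
In all cases, SNPs that are homozygous for the A allele (AA) in both the mother and the fetus are found tightly associated with the upper limit of the plots, because the fraction of A allele reads is high when no B alleles should be present. Conversely, SNPs that are homozygous for the B allele in both the mother and the fetus are found tightly associated with the lower limit of the plots, because the fraction of A allele reads is low when only B alleles should be present. The spots that are not tightly associated with the upper and lower limits of the plots represent SNPs for which the mother, the fetus, or both are heterozygous; these spots are useful for identifying fetal ploidy, but can also be informative for determining paternal versus maternal inheritance. These spots segregate based on both maternal and fetal genotypes and fetal fraction, and as such the precise position of each individual spot along the y-axis depends on both stoichiometry and fetal fraction. For example, loci where the mother is AA and the fetus is AB are expected to have a different fraction of A allele reads, and thus different positioning along the y-axis, depending on the fetal fraction.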
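Two Chromosomes Present
Figures 30a to 30c depict data that indicate the presence of two chromosomes when the sample is entirely maternal (no fetal cfDNA present, Figure 30a), contains a moderate fetal cfDNA fraction (Figure 30b), or contains a high fetal cfDNA fraction (Figure 30c).
Figure 30a shows data obtained from cfDNA isolated from the blood of a non-pregnant woman. When no fetal cfDNA is present and the sample contains only maternal cfDNA, the plot represents a purely euploid maternal genotype; the hallmark pattern includes "clusters" of spots: a red cluster tightly associated with the top of the plot (SNPs where the maternal genotype is AA), a blue cluster tightly associated with the bottom of the plot (SNPs where the maternal genotype is BB), and a single, centered green cluster (SNPs where the maternal genotype is AB) (color not shown).
When fetal cfDNA is present, the location of the spots shifts such that the clusters segregate into discrete "bands". For samples with a fetal fraction of 0%, groupings of spots are referred to as "clusters" (as in Figure 30a), and for all samples with a fetal fraction >0%, groupings of spots are referred to as "bands" (as in Figures 30b to 30j). When the fetal fraction is sufficiently high, these distinct bands are readily visible. Specifically, Figures 30b and 30c demonstrate the characteristic pattern associated with two fetal chromosomes at moderate and high fetal fractions, respectively. This pattern includes three central green bands that correspond to SNPs that are heterozygous in the mother, and two "peripheral" bands at each of the top (red) and bottom (blue) of the plot that correspond to SNPs that are homozygous in the mother (color not shown).
Figure 30b shows data obtained from cfDNA isolated from a plasma sample of a woman carrying a euploid fetus with a fetal cfDNA fraction of 12%. Here, the clusters of spots tightly associated with the top and bottom of the plot have each separated into two distinct bands: one red and one blue outer peripheral band that remain tightly associated with the upper or lower limit of the plot, and one red and one blue inner peripheral band that have separated from the limits of the plot (color not shown). These inner peripheral bands, centered around 0.92 and 0.08, represent SNPs for which the maternal genotype is AA and the fetal genotype is AB (shown in red), and SNPs for which the maternal genotype is BB and the fetal genotype is AB (shown in blue), respectively. The central cluster of green spots broadens, but separation into distinct bands is not readily visible at this fetal fraction.
At high fetal cfDNA fractions, the presence of two chromosomes (three green bands as well as two red and two blue peripheral bands) is readily apparent (color not shown). Figure 30c shows data obtained from a plasma sample of a woman carrying a euploid fetus at a fetal cfDNA fraction of 26%. Here, the peripheral bands have separated such that the inner bands have moved toward the center of the plot, owing to the altered level of B alleles at the increased fetal cfDNA fraction. Significantly, at the higher fetal fraction, the separation of the central green cluster into three distinct bands is now readily apparent. These three central bands, in this case clustered around 0.37, 0.50, and 0.63, correspond to SNPs for which the maternal genotype is AB and the fetal genotype is AA (top), AB (middle), and BB (bottom).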
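The band positions can be approximated with a simple mixture calculation. The sketch below is not the algorithm described in this experiment, which uses a richer data model. It assumes that read counts are proportional to the number of chromosome copies, that the mother is disomic, and that the fetal fraction `f` is measured on a disomic chromosome. The function name and arguments are illustrative.

```python
def expected_a_fraction(maternal_a, fetal_a, fetal_copies, f):
    """Expected fraction of A-allele reads at one SNP.

    maternal_a: number of A alleles in the maternal genotype (0, 1, or 2).
    fetal_a: number of A alleles in the fetal genotype.
    fetal_copies: fetal copy number of the chromosome (1, 2, or 3).
    f: fetal cfDNA fraction, between 0 and 1.
    """
    if not 0 <= fetal_a <= fetal_copies:
        raise ValueError("fetal_a must lie between 0 and fetal_copies")
    # The mother contributes 2 copies weighted by (1 - f); the fetus
    # contributes fetal_copies copies weighted by f.
    a_reads = maternal_a * (1 - f) + fetal_a * f
    all_reads = 2 * (1 - f) + fetal_copies * f
    return a_reads / all_reads


# Maternal AB with a euploid fetus at f = 0.26: fetal AA, AB, BB.
for fetal_a in (2, 1, 0):
    print(round(expected_a_fraction(1, fetal_a, 2, 0.26), 2))
```

Under these assumptions the three printed values are 0.63, 0.5, and 0.37, which agree with the central band positions quoted for Figure 30c.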
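These hallmark patterns, three green bands and four peripheral bands (two red and two blue), indicate the presence of two chromosomes, as in autosomal euploidy or the X chromosome in a female (XX) fetus.
One Chromosome Present
When the fetus inherits only a single chromosome, and thus only a single allele, fetal heterozygosity is not possible. As such, the only possible fetal SNP identities are A or B. Thus, maternally inherited monosomic chromosomes have a characteristic pattern of two central green bands that represent SNPs for which the mother is heterozygous, and single peripheral red and blue bands that represent SNPs for which the mother is homozygous and that remain tightly associated with the upper and lower limits of the plots (1 and 0), respectively (Figure 30d) (color not shown). Note the absence of inner peripheral bands. This pattern indicates the presence of one chromosome, as in a maternally inherited autosomal monosomy, or the X chromosome in a male (XY) fetus.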
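Three Chromosomes Present
Trisomic chromosomes have three characteristic patterns. The first pattern indicates a maternally inherited meiotic trisomy, a meiotic error in which the fetus inherited two homologous, non-identical chromosomes from the mother (Figure 30e); this pattern includes two central green bands with two peripheral red and two peripheral blue bands (color not shown). The second pattern indicates a paternally inherited meiotic trisomy, in which the fetus inherits two homologous, non-identical chromosomes from the father (Figure 30f); this pattern includes four central green bands and three peripheral red and three peripheral blue bands (color not shown). The third pattern indicates a maternally (Figure 30g) or paternally (Figure 30h) inherited mitotic trisomy, a mitotic error in which the fetus inherited two identical chromosomes from either the mother or the father; this pattern includes four central green bands with two peripheral red and two peripheral blue bands. Maternally and paternally inherited mitotic trisomies can be distinguished by the displacement of the flanking red and blue bands: the red and blue inner peripheral bands (those not associated with the limits of the plot) lie closer to the center in paternally inherited mitotic trisomies (color not shown). This is due to the paternal contribution of identical chromosomes. Our earlier results indicate that, at the blastomere stage, 66.7% of maternally inherited trisomies are meiotic, and only 10.2% of trisomies are paternally inherited.
For the Y chromosome, the PS method considers a different set of hypotheses: zero, one, or two chromosomes present. Because there is no maternal contribution to the sequence reads at each locus and heterozygous loci are not possible (cases with two Y chromosomes necessarily involve two identical chromosomes), the bands remain tightly associated with the top (A alleles) or bottom (B alleles) of the plot (data not shown), and the analysis is greatly simplified, relying on quantitative allele count data. Because the method can interrogate SNPs, it uses homologous, non-recombining SNPs on the Y chromosome, so that data for both X and Y are obtained with a single probe pair.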
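Identification of Aneuploidies
Identification of autosomal aneuploidy using this plot-based visualization method is straightforward given a sufficient fetal fraction, and requires only identifying plots in which an abnormal number of chromosomes is present, as described above. Knowledge of the copy number of the X and Y chromosomes is combined to determine whether a sex chromosome aneuploidy is present. Specifically, plots representing fetuses with a 47,XXX genotype will have a typical "three-chromosome" pattern, while plots representing fetuses with a 47,XXY genotype will have a typical "two-chromosome" pattern for the X chromosome, but will also have allele reads indicating the presence of one Y chromosome. The method can call 47,XYY, where the "one-chromosome" pattern indicates the presence of a single X chromosome and allele reads indicate the presence of two Y chromosomes. Fetuses with a 45,X genotype will have a typical "one-chromosome" pattern for the X chromosome and data indicating zero Y chromosomes.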
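The way the X and Y copy-number calls are combined can be written as a small lookup. The mapping below only restates the cases named in the text, plus the two euploid cases; it assumes disomy on all autosomes, and `sex_karyotype` is an illustrative name.

```python
# (X copies, Y copies) -> karyotype, assuming all autosomes are disomic.
_SEX_KARYOTYPES = {
    (2, 0): "46,XX",
    (1, 1): "46,XY",
    (1, 0): "45,X",
    (3, 0): "47,XXX",
    (2, 1): "47,XXY",
    (1, 2): "47,XYY",
}


def sex_karyotype(x_copies, y_copies):
    """Combine X and Y copy-number calls into a karyotype label."""
    return _SEX_KARYOTYPES.get((x_copies, y_copies), "unclassified")


print(sex_karyotype(2, 1))  # two-chromosome X pattern plus one Y
```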
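Effect of Fetal Fraction
As discussed above, the number of fetal sequence reads contributes to the precise location of each spot along the y-axis in the plot. Because the fetal fraction affects the proportion of reads originating from the fetus and the mother, it also affects the positioning of each spot. At high fractions of fetal cfDNA (generally above ~20%), as in Figures 30c to 30e, 30g, and 30h, spot clusters are based mainly on maternal genotype, but the presence of fetal DNA carrying alleles that differ from the maternal genotype shifts the clusters into multiple distinct bands. However, as the fetal fraction decreases (as in Figures 30b and 30f), the spots regress toward the poles and the center of the plot, resulting in tighter clusters. Specifically, the set of peripheral red bands (where the maternal genotype is AA) regresses toward the top of the plot; the set of peripheral blue bands (where the maternal genotype is BB) regresses toward the bottom; and the set of central green bands (where the mother is heterozygous) condenses into a single cluster at the center of the plot (compare Figures 30b and 30c) (color not shown). Although aneuploidy is not readily apparent by eye using this visualization technique at low fetal fractions, the algorithm is able to identify ploidy states at very low fetal fractions, such as a fetal fraction of 3%. This is possible because the statistical technique compares the observed data to a very precise data model that predicts the allele distributions for a given set of sample parameters (including copy number, parental genotypes, and fetal fraction). Data model accuracy is important at low fetal fractions, because the differences between the allele distributions for the different ploidy states are proportional to the fetal fraction. In addition, the algorithm is able to determine when a data set does not contain sufficient data to make a confident fetal ploidy determination.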
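Results
Sequencing reads that mapped to targeted SNPs were considered informative and were used by the algorithm. More than 95% of the targeted loci were observed in the sequencing results. Plots for visualizing key ploidy calls are shown in Figures 31a to 31g. Figure 31a shows a euploid sample. Here, chromosomes 13, 18, and 21 have the typical "two-chromosome" pattern (as described herein), with a trio of central green bands and two red and two blue peripheral bands. This, together with two central green bands for the X chromosome and the presence of Y chromosome bands along the periphery of the plot, indicates a euploid XY genotype (color not shown).
The most prevalent autosomal trisomies, T13, T18, and T21, are shown in Figures 31b, 31c, and 31d, respectively. Specifically, Figure 31b shows a T13 sample. Here, chromosomes 18 and 21 show the typical "two-chromosome" pattern, chromosome X shows the typical "one-chromosome" pattern, and reads from the Y chromosome are present. Together, this indicates disomy at chromosomes 18 and 21 and identifies a fetal XY genotype. Chromosome 13, however, shows the typical "three-chromosome" pattern. Similarly, Figure 31c shows a T18 sample, and Figure 31d shows a T21 sample.
The method can also detect sex chromosome aneuploidies, including 45,X (Figure 31e), 47,XXY (Figure 31f), and 47,XYY (Figure 31g). Note that the method calls copy number at chromosomes 13, 18, 21, X, and Y; the overall chromosome number is reported assuming disomy at the remaining chromosomes. The X chromosome region of the plot representing the 45,X sample indicates the presence of a single chromosome. The lack of reads from the Y chromosome, coupled with a "two-chromosome" pattern for chromosomes 13, 18, and 21, indicates a 45,X genotype. Conversely, the 47,XXY sample produces a plot that indicates the presence of two X chromosomes. The data also show reads for alleles from the Y chromosome. Together with the presence of two copies of chromosomes 13, 18, and 21, this indicates a 47,XXY genotype. The 47,XYY genotype is indicated by the "one-chromosome" pattern for the X chromosome and the presence of reads indicating two Y chromosomes.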
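Discussion
The method non-invasively detected T13, T18, T21, 45,X, 47,XXY, and 47,XYY from maternal blood. The method interrogates cfDNA from maternal plasma by targeted multiplex PCR amplification and high-throughput sequencing of 19,488 SNPs. Coupled with sophisticated informatics analysis that takes into account parental genotype information and numerous sample parameters, including fetal fraction and DNA quality, the method robustly detects fetal signal at all five chromosomes implicated in the seven most common types of at-birth aneuploidy (T13, T18, T21, 45,X, 47,XXX, 47,XXY, and 47,XYY) to make highly accurate ploidy calls. The method offers a number of clinical advantages over previous methods, most significantly greater clinical coverage and sample-specific calculated accuracies (akin to a personalized risk score).
Increased Clinical Coverage
With its ability to accurately detect autosomal trisomies and sex chromosome aneuploidies, the method offers an approximately two-fold increase in aneuploidy coverage compared with clinically available NIPT methodologies. The method presented here is the only non-invasive test that calls ploidy at the sex chromosomes with high accuracy. Previous DNA mixing experiments and isolated plasma samples analyzed in our laboratory suggest that the method will detect a larger cohort of sex chromosome abnormalities, including 47,XXX. The method presented here also detects aneuploidy at chromosomes 13, 18, and 21 with high sensitivity and specificity and, with appropriate primer design, is expected to be able to detect copy number at the remaining chromosomes as well.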
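Sample-Specific Calculated Accuracies
Significantly, the method calculates a sample-specific accuracy for the ploidy call at each chromosome in each sample. The calculated accuracies are expected to significantly lower the rate of incorrect calls by identifying and flagging individual samples with poor-quality DNA or low fetal fractions, which tend to produce poor-accuracy test results. In contrast, massively parallel shotgun sequencing (MPSS)-based methods generate positive or negative calls using a single-hypothesis rejection test, and their accuracy estimates are based on published study cohorts rather than on the characteristics of the individual sample, which is therefore presumed to have the same accuracy as the cohort. However, the accuracies for individual samples with parameters in the tails of the cohort distribution can be significantly different. This is exacerbated at low fetal fractions, such as at early gestational ages, or for samples with low DNA quality. These samples are typically not identified and flagged for follow-up, which can result in missed calls. The present method, however, takes into account numerous parameters, including fetal fraction and a number of DNA quality metrics, to arrive at each chromosome copy number call and a sample-specific accuracy for that call. This allows the method to identify individual samples with low accuracy and flag them for follow-up. This is expected to nearly eliminate missed calls, particularly at early stages of pregnancy when fetal fractions are typically low. The premise is that a no-call, which merely requires a redraw and reanalysis, is greatly preferable to a missed call.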
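Conversion of Calculated Accuracies into Traditional Risk Scores
The method can provide an adjusted risk of aneuploidy for high-risk pregnant women, where the adjusted risk takes into account the a priori risk (see Benn P, Cuckle H, Pergament E. Non-invasive prenatal diagnosis for Down syndrome: the paradigm will shift, but slowly. Ultrasound Obstet Gynecol 2012;39:127-130, which is incorporated herein by reference in its entirety). Although the present method provides each patient with a customized calculated accuracy, for clinical use these accuracies can be converted into traditional risk scores, which also indicate the risk of an aneuploid pregnancy but are expressed as a fraction. Traditional risk scores take into account various parameters, including age-related risk and serum levels of biochemical markers, to give a risk score above which a mother is considered high-risk and for whom a follow-up invasive diagnostic procedure is recommended. The method significantly refines this risk score, thereby reducing both false-positive and false-negative rates and providing a more accurate assessment of the individual maternal risk. The calculated accuracy as used herein is the likelihood that the ploidy call is correct, expressed as a percentage; however, the calculated accuracies used in Experiment 19 do not include age-related risk. Because the calculation of risk scores typically includes age-related risk, calculated accuracies and traditional risk scores are not interchangeable; they must be combined to convert to a traditional risk score. The formula for combining the age-related risk with the calculated accuracy is: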

(where R₁ is the risk score calculated by the present method and R₂ is the risk score calculated by first-trimester screening).
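SNP-Based Methods Obviate Issues with Amplification Variation
An inherent disadvantage of the counting methods used by some other approaches is that they determine fetal ploidy state by measuring the ratio of the number of reads that map to a chromosome of interest (e.g., chromosome 21) to the number that map to a reference chromosome. Chromosomes with high or low GC content, including chromosomes 13, X, and Y, amplify with high variability. This can cause signal changes that are comparable in magnitude to the fetal cfDNA signal, which can confound copy number calls by altering the ratio of reads from the chromosome of interest to those from the reference chromosome. This may result in lower accuracy for chromosomes 13, X, and Y. Significantly, this problem is exacerbated at low fetal cfDNA fractions, as tends to be the case at early gestational ages.
In contrast, SNP-based methods do not rely on a consistent level of amplification between chromosomes, and are thus expected to provide equally accurate results across all chromosomes. Because the present method looks, in part, at the relative counts of different alleles at polymorphic loci, which by definition differ by only a single nucleotide, it does not require the use of a reference chromosome. This obviates the problems with chromosome-to-chromosome amplification variation that are inherent to methods relying on quantifying read counts. Unlike quantitative methods that require euploid reference chromosomes, the present method is expected to be able to detect copy-number-neutral abnormalities such as uniparental disomy, as well as triploidy.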
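Importance of Early Detection
Significantly, the combined at-birth prevalence of sex chromosome aneuploidies is greater than that of the most common autosomal aneuploidies (Figure 32). However, there is currently no routine non-invasive screening method that reliably detects sex chromosome abnormalities. Thus, sex chromosome abnormalities are typically detected prenatally as a side effect of routine testing for Down syndrome or other autosomal aneuploidies, and a large proportion of cases are missed entirely. Early and accurate detection is important for many of these disorders, for which early therapeutic intervention improves clinical outcomes. For example, Turner syndrome, which has an overall at-birth prevalence of 1 in 2,500 females, is often not diagnosed until adolescence. Growth hormone therapy is known to prevent the short stature that results from the disorder, but treatment is significantly more effective when initiated before the age of 4. In addition, estrogen replacement therapy can stimulate secondary sexual characteristics in patients with Turner syndrome, but again therapy must be initiated in the preteen years, before the syndrome is typically detected. Together, this underscores the importance of early, routine, and safe detection of sex chromosome aneuploidies. The method offers the first approach with the potential to serve as a routine screen for sex chromosome abnormalities.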
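Additional Applications
Because the method uses targeted amplification, it is uniquely poised to detect sub-microscopic abnormalities such as microdeletions and microduplications. Although a non-targeted method such as MPSS has been shown to detect DiGeorge microdeletion syndrome, this required a level of genome coverage high enough to render the approach impractical. This is because non-targeted amplification would be several orders of magnitude less efficient for sub-microscopic regions, since only a very small fraction of the sequencing reads would be informative. In addition, the fact that currently available methods have trouble accurately identifying ploidy state at the sex chromosomes suggests that they will also face amplification variation issues at smaller chromosomal segments.
Similarly, SNP-based methods are able to detect UPD disorders, which are copy-number-neutral abnormalities that would not be detected by traditional invasive methods such as CVS and amniocentesis, which rely on cytogenetic karyotyping and/or fluorescence in situ hybridization, or by current non-invasive methods that rely on counting. This is because SNP-based methods are uniquely able to distinguish individual haplotypes, whereas clinically available MPSS-based and targeted methods amplify non-polymorphic loci and thus cannot determine, for example, whether chromosomes of interest originate from the same parent. This means that these microdeletion/microduplication and UPD syndromes, including Prader-Willi, Angelman, and Beckwith-Wiedemann syndromes, are typically not diagnosed prenatally and are often misdiagnosed early after birth. This significantly delays therapeutic intervention. In addition, because the method targets SNPs, it also facilitates parental haplotype reconstruction, allowing detection of fetal inheritance of individual disease-linked loci (see Kitzman JO, Snyder MW, Ventura M, et al. Noninvasive whole-genome sequencing of a human fetus. Sci Transl Med 2012;4:137ra76, which is incorporated herein by reference in its entirety).
The results presented here demonstrate the expanded scope of the method for identifying prenatal aneuploidies. Specifically, by amplifying and sequencing 19,488 SNPs, the method can detect copy number at chromosomes 13, 18, 21, X, and Y, and is uniquely expected to detect other chromosomal abnormalities, such as triploidy and UPD, that are not detected by any other clinically available non-invasive method. The increased clinical coverage and robust sample-specific calculated accuracies suggest that the method can offer a valuable adjunct to invasive testing for detecting fetal chromosomal aneuploidies.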
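All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety. While the methods of the present disclosure have been described in connection with specific embodiments thereof, it will be understood that they are capable of further modification. Furthermore, this application is intended to cover any variations, uses, or adaptations of the methods of the present disclosure, including such departures from the present disclosure as come within known or customary practice in the art to which the methods pertain and as fall within the scope of the appended claims. For example, any of the methods disclosed herein for DNA can be readily adapted for RNA by including a reverse transcription step to convert the RNA into DNA. Examples that use polymorphic loci for illustration can be readily adapted for amplification of non-polymorphic loci, if desired.
The presently disclosed embodiments will be further described with reference to the accompanying drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings are not necessarily to scale; emphasis is instead generally placed on illustrating the principles of the presently disclosed embodiments.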
Figure 1 is a graphical representation of the direct multiplexed mini-PCR method.
Figure 2 is a graphical representation of the semi-nested mini-PCR method.
Figure 3 is a graphical representation of the fully nested mini-PCR method.
Figure 4 is a graphical representation of the hemi-nested mini-PCR method.
Figure 5 is a graphical representation of the triply hemi-nested mini-PCR method.
Figure 6 is a graphical representation of the one-sided nested mini-PCR method.
Figure 7 is a graphical representation of the one-sided mini-PCR method.
Figure 8 is a graphical representation of the reverse semi-nested mini-PCR method.
Figure 9 shows some possible workflows for the semi-nested methods.
Figure 10 is a graphical representation of looped ligation adaptors.
Figure 11 is a graphical representation of internally tagged primers.
Figure 12 is an example of some primers with internal tags.
Figure 13 is a graphical representation of a method using primers with a ligation adaptor binding region.
Figure 14 shows the simulated ploidy call accuracies for the counting method with two different analysis techniques.
Figure 15 is the ratio of two alleles for a plurality of SNPs in the cell line of Experiment 4.
Figure 16 is the ratio of two alleles for a plurality of SNPs in the cell line of Experiment 4, sorted by chromosome.
Figures 17a-d are the ratios of two alleles for a plurality of SNPs in plasma samples from pregnant women, sorted by chromosome.
Figure 18 is the fraction of data that can be explained by binomial variance before and after data correction.
Figure 19 is a graph showing the relative enrichment of fetal DNA in samples following a short library preparation protocol.
Figure 20 is a depth-of-read graph comparing direct PCR and semi-nested methods.
Figure 21 is a comparison of depth of read for direct PCR of three genomic samples.
Figure 22 is a comparison of depth of read for semi-nested mini-PCR of three samples.
Figure 23 is a comparison of depth of read for 1,200-plex and 9,600-plex reactions.
Figure 24 shows read count ratios for six cells at three chromosomes.
Figure 25 shows allele ratios at three chromosomes for two three-cell reactions and a third reaction run on 1 ng of genomic DNA.
Figure 26 shows allele ratios for two single-cell reactions at three chromosomes.
Figure 27 is a graph comparing two primer libraries, showing the number of loci with a particular minor allele frequency targeted by each primer library.
Figure 28a is an electrophoresis image of PCR products.
Figures 28b-28m are the electropherograms of lanes 1 through 12, respectively, in Figure 28a.
Figures 29a-29e are illustrations depicting the method of the invention for determining fetal ploidy (Figure 29a). Using maternal and paternal genotype data (measured from blood or buccal swabs) and crossover frequency data from the HapMap database, numerous independent hypotheses are generated for each potential fetal ploidy state (Figures 29b and 29c). Each hypothesis is expanded to include sub-hypotheses that consider different possible crossover points. The data model predicts what the sequencing data would look like for a given hypothesized fetal genotype and different fetal cfDNA fractions, and the prediction is compared with the actual sequencing data; the likelihood of each hypothesis is determined using Bayesian statistics. In this hypothetical example, the hypothesis with the maximum likelihood (euploidy) is determined (Figure 29d). The individual likelihoods from Figure 29c are summed for each copy number hypothesis family (monosomy, disomy, or trisomy). The hypothesis with the maximum likelihood is called as the ploidy state, which indicates the fetal fraction and shows the sample-specific calculated accuracy (Figure 29e).
Figures 30a-30h are representative graphical representations of euploidy (Figures 30a-30c), monosomy (Figure 30d), and trisomy (Figures 30e-30h). For all plots, the x-axis represents the linear position of the individual polymorphic loci along each chromosome (indicated below the plots), and the y-axis represents the number of A allele reads as a fraction of the total (A + B) allele reads. The maternal and fetal genotypes, and the positions on the y-axis around which the bands are centered, are indicated to the right of the plots. When it is desired to facilitate visualization, the plots can be color-coded according to maternal genotype, such that red indicates a maternal genotype of AA, blue indicates a maternal genotype of BB, and green indicates a maternal genotype of AB. In some cases, the maternal allele distribution can be colored in the "fetal genotype" column. Allele contributions are written as maternal|fetal, so that alleles for which the mother is AA and the fetus is AB are written AA|AB.
Figure 30a shows a plot generated when two chromosomes are present and the fetal cfDNA fraction is 0%. Since the plot originates from a woman who is not pregnant, it represents the pattern when the genotype is entirely maternal. Thus, the allele clusters are centered around 1 (AA alleles), 0.5 (AB alleles), and 0 (BB alleles).
Figure 30b shows a plot generated when two chromosomes are present and the fetal fraction is 12%. The contribution of fetal alleles to the fraction of A allele reads shifts the position of some spots up or down along the y-axis, so that the bands are centered around 1 (AA|AA alleles), 0.94 (AA|AB alleles), 0.56 (AB|AA alleles), 0.50 (AB|AB alleles), 0.44 (AB|BB alleles), 0.06 (BB|AB alleles), and 0 (BB|BB alleles).
Figure 30c shows a plot generated when two chromosomes are present and the fetal fraction is 26%. The pattern, which includes two red and two blue peripheral bands and three central green bands, is readily apparent (color not shown). The bands are centered around 1 (AA|AA alleles), 0.87 (AA|AB alleles), 0.63 (AB|AA alleles), 0.50 (AB|AB alleles), 0.37 (AB|BB alleles), [...] (BB|AB alleles), and 0 (BB|BB alleles).
Figure 30d shows a plot generated when one chromosome is present and the fetal fraction is 26%. One outer red and one outer blue peripheral band and two central green bands are characteristic of a maternally inherited monosomy (color not shown). Because the fetus contributes only a single allele (A or B) to the allele reads, there are no inner peripheral red and blue bands, and the three central bands condense into two bands (color not shown). The bands are centered around 1 (AA|A alleles), 0.57 (AB|A alleles), 0.43 (AB|B alleles), and 0 (BB|B alleles).
Figure 30e shows a plot generated when three chromosomes are present and the fetal fraction is 27%. This pattern of two red and two blue peripheral bands and two central green bands indicates a maternally inherited meiotic trisomy (color not shown). The bands are centered around [...] (AA|AAA alleles), 0.88 (AA|AAB alleles), 0.56 (AB|AAB alleles), [...] (AB|ABB alleles), 0.12 (BB|ABB alleles), and 0 (BB|BBB alleles).
Figure 30f shows a plot generated when three chromosomes are present and the fetal fraction is 14%. This pattern of three red and three blue peripheral bands and two central green bands indicates a paternally inherited meiotic trisomy (color not shown). The bands are centered around 1 (AA|AAA alleles), 0.93 (AA|AAB alleles), 0.87 (AA|ABB alleles), 0.60 (AB|AAA alleles), 0.53 (AB|AAB alleles), [...] (AB|ABB alleles), 0.40 (AB|BBB alleles), 0.13 (BB|AAB alleles), 0.07 (BB|ABB alleles), and 0 (BB|BBB alleles).
Figure 30g shows a plot generated when three chromosomes are present and the fetal fraction is 35%. This pattern of two red and two blue peripheral bands and four central green bands indicates a maternally inherited mitotic trisomy (color not shown). The bands are centered around 1 (AA|AAA alleles), 0.85 (AA|AAB alleles), 0.72 (AB|AAA alleles), 0.57 (AB|AAB alleles), [...] (AB|BBB alleles), 0.15 (BB|ABB alleles), and 0 (BB|BBB alleles).
Figure 30h shows a plot generated when three chromosomes are present and the fetal fraction is 25%. This pattern of two red and two blue peripheral bands and four central green bands indicates a paternally inherited mitotic trisomy (color not shown). This pattern can be distinguished from that of a maternally inherited mitotic trisomy (as shown in Figure 30g) by the position of the inner peripheral bands. Specifically, the bands are centered around 1 (AA|AAA alleles), 0.78 (AA|ABB alleles), 0.67 (AB|AAA alleles), 0.56 (AB|AAB alleles), [...] (AB|BBB alleles), 0.22 (BB|AAB alleles), and 0 (BB|BBB alleles).
Figures 31a-31g are graphical representations of euploidy (Figure 31a), T13 (Figure 31b), T18 (Figure 31c), T21 (Figure 31d), 45,X (Figure 31e), 47,XXY (Figure 31f), and 47,XYY (Figure 31g). The chromosome identities are shown at the top of each plot, and the fetal and maternal genotypes are shown to the right of the plot. The x-axis represents the linear position of the SNPs along each chromosome, and the y-axis represents the number of A allele reads as a fraction of the total reads. Note that the position of the bands shifts according to the fetal fraction, as shown here. Each spot represents a single SNP locus.
Figure 32 shows that the combined at-birth prevalence of sex chromosome aneuploidies is greater than that of autosomal aneuploidies.
While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments that fall within the scope and spirit of the principles of the presently disclosed embodiments can be devised by those skilled in the art.
DETAILED DESCRIPTION
The present invention is based in part on the surprising discovery that often only a relatively small number of primers in a library of primers are responsible for a substantial amount of the amplified primer dimers that form during multiplex PCR reactions. Methods have been developed to select the most undesirable primers for removal from a library of candidate primers. By reducing the amount of primer dimers to a negligible amount (~0.1% of the PCR products), these methods allow the resulting primer libraries to simultaneously amplify a large number of target loci in a single multiplex PCR reaction. Because the primers hybridize to the target loci and amplify them, rather than hybridizing to other primers and forming amplified primer dimers, the number of different target loci that can be amplified is increased. It was also discovered that using lower primer concentrations and longer annealing times than normal increases the likelihood that the primers hybridize to the target loci instead of hybridizing to each other and forming primer dimers.
During PCR amplification and sequencing of 19,488 target loci in genomic samples, 99.4 to 99.7% of the sequencing reads mapped to the genome, and of those, 99.99% mapped to targeted loci. For plasma samples with 10 million sequencing reads, at least 19,350 of the 19,488 targeted loci (99.3%) were typically amplified and sequenced. The ability to simultaneously amplify such a large number of target loci greatly reduces the amount of time and the amount of DNA required to analyze thousands of target loci. For example, the DNA from a single cell is sufficient for simultaneous analysis of thousands of target loci, which is important for applications in which little DNA is available, such as genetic testing of a single cell from an embryo prior to in vitro fertilization or genetic testing of samples that contain very little DNA. Also, being able to analyze the target loci in one reaction volume (e.g., one chamber or well), rather than splitting the sample into multiple different reactions, reduces the variability that may occur between reactions. In addition, methods have been developed for using reference standards to correct for amplification biases that may occur between different target loci. For example, differences in amplification efficiency between target loci due to factors such as GC content can cause different amounts of PCR product to be produced for target loci that are actually present in the same amount. The use of reference standards similar to the target loci allows this amplification bias to be detected so that it can be corrected during quantification of the target loci.
During sequencing of PCR products, artifacts such as primer dimers are detected and thus inhibit the detection of target amplicons. Because of this limitation, microarrays with hybridization probes are commonly used for detection, since microarrays are relatively insensitive to interference from primer dimers. The high level of multiplexing with minimal non-target amplicons achieved by the present invention allows sequencing to be used as an alternative to microarrays following PCR.
The multiplex PCR methods of the present invention can be used in a variety of applications, such as genetic analysis, analysis of gene mutations and polymorphisms (e.g., single nucleotide polymorphisms, SNPs), gene deletion analysis, forensic analysis, determination of disease susceptibility, quantitative analysis of mRNA, and detection and identification of infectious agents (e.g., bacteria, parasites, and viruses). The multiplex PCR methods can also be used for non-invasive prenatal testing, such as paternity testing or the detection of fetal chromosomal abnormalities.
Exemplary Primer Design Methods
Highly multiplexed PCR can often result in a very high proportion of product DNA that arises from unproductive side reactions such as primer dimer formation. In one embodiment, the particular primers that are most likely to cause unproductive side reactions are removed from the primer library to give a primer library that yields a greater proportion of amplified DNA that maps to the genome. Removing problematic primers, that is, those primers that are particularly likely to form stable dimers, has unexpectedly enabled extremely high PCR multiplexing levels for subsequent analysis by sequencing. In systems such as sequencing, where performance is significantly degraded by primer dimers and/or other deleterious products, multiplexing at least 10-fold, at least 50-fold, and at least 100-fold higher than other described multiplexing has been achieved. Note that this is in contrast to probe-based detection methods, such as microarrays, TAQMAN, PCR, etc., where an excess of primer dimers will not appreciably affect the outcome. Also note that the general belief in the art is that multiplex PCR for sequencing is limited to about 100 assays in the same well. Fluidigm and Rain Dance offer platforms to perform 48 or 1000 PCR assays in parallel reactions for one sample.
There are a number of ways to select primers for a library in which the amount of non-mapping primer dimers or other deleterious primer products is minimized. Empirical data indicate that a small number of 'bad' primers are responsible for a large share of the non-mapping primer dimer side reactions. Removing these 'bad' primers can increase the percentage of sequence reads that map to the targeted loci. One way to identify the 'bad' primers is to examine the sequencing data from DNA amplified by targeted amplification; the primer dimers observed at the greatest frequency can be removed to give a primer library that is significantly less likely to generate by-product DNA that does not map to the genome. There are also publicly available programs that can calculate the binding energies of the various primer combinations, and removing the combinations with the highest binding energies likewise gives a primer library that is significantly less likely to generate by-product DNA that does not map to the genome.
In some embodiments for selecting primers, an initial library of candidate primers is generated by designing one or more primers or primer pairs for the candidate target loci. A set of candidate target loci (e.g., SNPs) can be selected based on publicly available information about the desired parameters for the target loci, such as the frequency of the SNPs in the target population or the heterozygosity rate of the SNPs. In one embodiment, the PCR primers can be designed using the Primer3 program (libprimer3 release 2.2.3, available on the worldwide web at primer3.sourceforge.net, the contents of which are incorporated herein by reference). If desired, the primers may be designed to anneal within a specific annealing temperature range, to have a specific range of GC content, to have a specific size range, to produce target amplicons in a specific size range, and/or to have other parameter properties. Starting with multiple primers or primer pairs per candidate target locus increases the likelihood that a primer or primer pair will remain in the library for most or all target loci. In one embodiment, the selection criteria may require that at least one primer pair per target locus remains in the library. In this way, most or all of the target loci will be amplified when using the final primer library. This is desirable for applications such as screening for deletions or duplications at a large number of locations in the genome, or screening for a large number of sequences (e.g., polymorphisms or other mutations) associated with a disease or an increased risk of disease. If a primer pair in the library would produce a target amplicon that overlaps the target amplicon produced by another primer pair, one of the primer pairs can be removed from the library to prevent interference.
In some embodiments, an "undesirability score" (a higher score indicating lower desirability) is calculated (e.g., by computation on a computer) for most or all possible combinations of two primers in a library of candidate primers. In various embodiments, an undesirability score is calculated for at least 80, 90, 95, 98, 99, or 99.5% of the possible combinations of candidate primers in the library. Each undesirability score is based, at least in part, on the likelihood of dimer formation between the two candidate primers. If desired, the undesirability score may also be based on one or more other parameters selected from the group consisting of the heterozygosity rate of the target locus, the disease prevalence associated with a sequence (e.g., a polymorphism) at the target locus, the disease penetrance associated with a sequence (e.g., a polymorphism) at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon. If multiple factors are considered, the undesirability score can be calculated as a weighted average of the various parameters. The parameters may be assigned different weights based on their importance for the particular application in which the primers will be used. In some embodiments, the primer with the highest undesirability score is removed from the library. If the removed primer is a member of a primer pair that hybridizes to one target locus, the other member of the primer pair may also be removed from the library. The process of removing primers may be repeated as desired.
In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below a minimum threshold. In some embodiments, the selection method is performed until the number of candidate primers remaining in the library is reduced to a desired number.
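The removal loop described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names (`weighted_score`, `prune_by_max_score`), the `mate` mapping, and the tie-breaking rule (dropping whichever member of the worst pair has the larger summed score against the rest of the library) are hypothetical and are not specified in this document, and `pair_score` stands in for whatever undesirability score is actually used.

```python
from itertools import combinations


def weighted_score(params, weights):
    """Weighted average of several per-pair parameters (names are hypothetical)."""
    total = sum(weights[k] for k in params)
    return sum(params[k] * weights[k] for k in params) / total


def prune_by_max_score(primers, pair_score, mate=None, threshold=0.0, desired=0):
    """Repeatedly drop one primer from the worst-scoring pair.

    Stops when every remaining pair scores <= threshold, or when the
    library has shrunk to `desired` primers. `mate` maps a primer to the
    other member of its primer pair, which is removed along with it.
    `pair_score(a, b)` is assumed to be symmetric.
    """
    mate = mate or {}
    library = set(primers)
    while len(library) > desired:
        pairs = list(combinations(sorted(library), 2))
        if not pairs:
            break
        a, b = max(pairs, key=lambda p: pair_score(*p))
        if pair_score(a, b) <= threshold:
            break

        def load(p):
            # Assumed tie-break: total score of p against the rest of the library.
            return sum(pair_score(p, q) for q in library if q != p)

        victim = a if load(a) >= load(b) else b
        library.discard(victim)
        library.discard(mate.get(victim))
    return library
```

Passing a positive `desired` corresponds to stopping at a desired library size, while `threshold` corresponds to the minimum threshold on the remaining scores.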
In various embodiments, after the undesirability scores are calculated, the candidate primer that is part of the greatest number of combinations of two candidate primers with an undesirability score above a first minimum threshold is removed from the library. This step ignores interactions equal to or below the first minimum threshold, since these interactions are less significant. If the removed primer is a member of a primer pair that hybridizes to one target locus, the other member of the primer pair may be removed from the library. The process of removing primers may be repeated as desired. In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below the first minimum threshold. If the number of candidate primers remaining in the library is greater than desired, the number of primers can be reduced by lowering the first minimum threshold to a lower second minimum threshold and repeating the process of removing primers. If the number of candidate primers remaining in the library is less than desired, the method can be continued by raising the first minimum threshold to a higher second minimum threshold and repeating the process of removing primers starting from the original candidate primer library, thereby allowing more of the candidate primers to remain in the library. In some embodiments, the selection method is performed until the undesirability scores for the candidate primer combinations remaining in the library are all equal to or below the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number.
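The count-based variant in the preceding paragraph can be sketched as below. The function name `prune_by_count`, the `mate` mapping, and the rule that ties are broken by input order are assumptions made for illustration only; `pair_score` again stands in for the actual score between two primers.

```python
from itertools import combinations


def prune_by_count(primers, pair_score, threshold, mate=None):
    """Remove the primer involved in the most above-threshold pairs, repeatedly.

    Pairs scoring at or below `threshold` are ignored. Returns the primers
    that remain once no pair exceeds the threshold. `mate` optionally maps
    a primer to the other member of its primer pair, removed with it.
    """
    mate = mate or {}
    library = list(primers)
    while True:
        counts = {p: 0 for p in library}
        for a, b in combinations(library, 2):
            if pair_score(a, b) > threshold:
                counts[a] += 1
                counts[b] += 1
        # max() returns the first maximal element, so ties follow input order.
        worst = max(library, key=lambda p: counts[p], default=None)
        if worst is None or counts[worst] == 0:
            return library
        drop = {worst, mate.get(worst)}
        library = [p for p in library if p not in drop]
```

Under these assumptions, lowering the threshold corresponds to calling the function again on the returned library with a smaller `threshold`, while raising it corresponds to calling it on the original candidate list with a larger `threshold`.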
If desired, primer pairs that produce a target amplicon overlapping a target amplicon produced by another primer pair can be divided into separate amplification reactions. Multiple PCR amplification reactions may be desirable for applications in which all of the candidate target loci are to be analyzed (instead of omitting candidate target loci from the analysis because of overlapping target amplicons).
These selection methods minimize the number of candidate primers that must be removed from the library to achieve the desired reduction in primer dimers. By removing a smaller number of candidate primers from the library, more (or all) of the target loci can be amplified using the resulting primer library.
Multiplexing large numbers of primers imposes considerable constraints on the assays that can be included. Assays that interact unintentionally generate spurious amplification products. The size constraints of mini-PCR can impose additional constraints. In one embodiment, it is possible to begin with a very large number of potential SNP targets (about 500 to 1 million or more) and attempt to design primers to amplify each SNP. Where primers can be designed, it is possible to identify primer pairs that are likely to form spurious products by evaluating the likelihood of spurious primer duplex formation between all possible pairs of primers, using published thermodynamic parameters for DNA duplex formation. Primer interactions can be ranked by a scoring function related to the interaction, and the primers with the worst interaction scores are removed until the desired number of primers is met. Where SNPs that are likely to be heterozygous are most useful, it is also possible to rank the list of assays and select the most heterozygous compatible assays. Experiments have demonstrated that primers with high interaction scores are the most likely to form primer dimers. At high multiplexing it is not possible to eliminate all spurious interactions, but it is essential to remove, in silico, the primers or primer pairs with the highest interaction scores, because they can dominate the whole reaction and greatly limit amplification of the intended targets. We have performed this procedure to produce multiplex primer sets of up to, and in some cases more than, 10,000 primers. The improvement due to this procedure is substantial, enabling amplification of more than 80%, more than 90%, more than 95%, more than 98%, and even more than 99% on-target products, as measured by sequencing of all the PCR products, compared to 10% for a reaction in which the worst primers were not removed. When combined with a partial semi-nested approach as described above, more than 90%, and even more than 95%, of the amplicons can map to the targeted sequences.
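As a rough illustration of ranking primer interactions, the sketch below scores each pair of primers by the longest 3' stretch of one primer that can base-pair with the other. This is a deliberately crude stand-in and is not the thermodynamic model referred to above: a real implementation would use published nearest-neighbor parameters for DNA duplex formation. All function names and the example sequences are hypothetical.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def three_prime_overlap(a, b):
    """Length of the longest 3' suffix of `a` that is complementary to a stretch of `b`."""
    best = 0
    for k in range(1, len(a) + 1):
        if revcomp(a[-k:]) in b:
            best = k
        else:
            break  # a longer suffix cannot match once a shorter one fails
    return best


def interaction_score(a, b):
    """Toy interaction score: the larger 3' overlap in either direction."""
    return max(three_prime_overlap(a, b), three_prime_overlap(b, a))


def rank_interactions(primers):
    """primers: dict of name -> sequence. Returns (score, name_a, name_b), worst first."""
    scored = [
        (interaction_score(primers[x], primers[y]), x, y)
        for x, y in combinations(sorted(primers), 2)
    ]
    return sorted(scored, reverse=True)


example = {"p1": "TTTTTTTTGAGC", "p2": "AAAAAAAAGCTC", "p3": "CCCCCCCCCCCC"}
ranking = rank_interactions(example)  # the p1/p2 pair ranks worst here
```

The ranked list can then feed a removal loop that discards primers from the top of the ranking until the desired number of primers remains.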
Note that there are other ways of determining which PCR probes are likely to form dimers. In one embodiment, analysis of a pool of DNA that has been amplified using a non-optimized set of primers may be sufficient to identify the problematic primers. For example, the analysis can be performed by sequencing, and the dimers present in the greatest numbers are taken to come from the primers most likely to form dimers, which can then be removed.
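The empirical approach can be sketched as a simple tally over sequencing reads. The rule used here to call a read a primer dimer (the read starts with one primer, ends with the reverse complement of another, and is no longer than the two primers combined) is an assumption made for illustration, as are the function names; real pipelines would need to tolerate sequencing errors and adapter sequence.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_dimer(read, primers):
    """Return the sorted primer-name pair if `read` looks like a dimer, else None."""
    for name_a, a in primers.items():
        if not read.startswith(a):
            continue
        for name_b, b in primers.items():
            # Assumed rule: too short to contain any target sequence between primers.
            if read.endswith(revcomp(b)) and len(read) <= len(a) + len(b):
                return tuple(sorted((name_a, name_b)))
    return None


def dimer_counts(reads, primers):
    """Count dimer reads per primer pair and per individual primer."""
    pairs, per_primer = Counter(), Counter()
    for read in reads:
        pair = classify_dimer(read, primers)
        if pair is not None:
            pairs[pair] += 1
            per_primer.update(set(pair))
    return pairs, per_primer
```

Primers at the top of `per_primer.most_common()` would be the first candidates for removal under this approach.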
These methods have a number of potential applications, for example SNP genotyping, heterozygosity rate determination, copy number measurement, and other targeted sequencing applications. In one embodiment, the primer design method can be used in combination with the mini-PCR method described elsewhere in this document. In some embodiments, the primer design method can be used as part of a massively multiplexed PCR method.
The use of tags on the primers can reduce amplification and sequencing of primer dimer products. In some embodiments, the primer contains an internal region that forms a loop structure with the tag. In particular embodiments, the primer comprises a 5' region specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region specific for the target locus. In some embodiments, the loop region can lie between two binding regions, wherein the two binding regions are designed to bind to contiguous or neighboring regions of the template DNA. In various embodiments, the length of the 3' region is at least 7 nucleotides. In some embodiments, the length of the 3' region is from 7 to 20 nucleotides, for example, from 7 to 15 nucleotides, or from 7 to 10 nucleotides. In various embodiments, the primer comprises a 5' region that is not specific for the target locus (e.g., a tag or a universal primer binding site), followed by a region specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region specific for the target locus. Tag-primers can be used to shorten the necessary target-specific sequences to fewer than 20, fewer than 15, fewer than 12, and even fewer than 10 base pairs. This can occur by chance with standard primer design when the target sequence is fragmented within the primer binding site, or it can be built into the primer design. The advantages of this method are that it increases the number of assays that can be designed for a particular maximal amplicon length, and that it shortens the "non-informative" sequencing of primer sequence. It can also be used in combination with internal tagging (see elsewhere in this document).
In one embodiment, the relative amount of non-productive product in the multiplexed targeted PCR amplification can be reduced by raising the annealing temperature. When amplifying a library that carries the same tag as the target-specific primers, the annealing temperature can be increased compared to that used for genomic DNA, since the tag contributes to primer binding. In some embodiments, considerably lower primer concentrations than previously reported are used, together with longer annealing times than reported elsewhere. In some embodiments, the annealing time is at least 3 minutes, at least 5 minutes, at least 8 minutes, at least 10 minutes, at least 15 minutes, at least 20 minutes, at least 30 minutes, at least 60 minutes, at least 120 minutes, at least 240 minutes, at least 480 minutes, and even at least 960 minutes. In one embodiment, longer annealing times are used than in previous reports, allowing lower primer concentrations. In various embodiments, longer than normal extension times are used, e.g., more than 3, 5, 8, 10, or 15 minutes. In some embodiments, the primer concentration is as low as 50 nM, 20 nM, 10 nM, 5 nM, 1 nM, and 1 μM or less. This surprisingly results in robust performance for highly multiplexed reactions, such as 1,000-plex, 2,000-plex, 5,000-plex, 10,000-plex, 20,000-plex, 50,000-plex, and even 100,000-plex reactions. In one embodiment, the amplification uses 1, 2, 3, 4, or 5 cycles run with a long annealing time, followed by PCR cycles with more usual annealing times using the tagged primers.
To select target locations, one can start with a pool of candidate primer pair designs, generate a thermodynamic model of potentially adverse interactions between the primer pairs, and then use the model to eliminate designs that are incompatible with the other designs in the pool.
After the selection process, the primers remaining in the library can be used in any of the methods of the present invention.
Illustrative Primer Library
In one aspect, the invention features a library of primers, such as primers selected from a library of candidate primers using any of the methods of the present invention. In some embodiments, the library comprises primers that simultaneously hybridize (or are capable of simultaneously hybridizing) to, or that simultaneously amplify (or are capable of simultaneously amplifying), at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci in one reaction volume. In various embodiments, the library comprises primers that simultaneously amplify (or are capable of simultaneously amplifying) from 1,000 to 2,000; 2,000 to 5,000; 5,000 to 7,500; 7,500 to 10,000; 10,000 to 20,000; 20,000 to 25,000; 25,000 to 30,000; 30,000 to 40,000; 40,000 to 50,000; 50,000 to 75,000; or 75,000 to 100,000 different target loci, inclusive, in one reaction volume. In various embodiments, the library comprises primers that simultaneously amplify (or are capable of simultaneously amplifying) from 1,000 to 100,000 different target loci in one reaction volume, for example, from 1,000 to 50,000; 1,000 to 30,000; 1,000 to 20,000; 1,000 to 10,000; 2,000 to 30,000; 2,000 to 20,000; 2,000 to 10,000; 5,000 to 30,000; 5,000 to 20,000; or 5,000 to 10,000 different target loci, inclusive. In some embodiments, the library comprises primers that simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that less than 60, 40, 30, 20, 10, 5, 4, 3, 2, 1, 0.5, 0.25, 0.1, or 0.5% of the amplified product is primer dimer. In various embodiments, the amount of amplified product that is primer dimer is from 0.5 to 60%, such as from 0.1 to 40%, from 0.1 to 20%, from 0.25 to 20%, from 0.25 to 10%, from 0.5 to 20%, from 1 to 20%, or from 1 to 10%. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the amplified product is target amplicon.
In various embodiments, the amount of amplified product that is target amplicon is from 50 to 99.5%, such as from 60 to 99%, from 70 to 98%, from 80 to 98%, from 90 to 99.5%, or from 95 to 99.5%. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the targeted loci are amplified. In various embodiments, the proportion of target loci that are amplified is from 50 to 99.5%, such as from 60 to 99%, from 70 to 98%, from 80 to 99%, from 90 to 99.5%, from 95 to 99.9%, or up to 99.99%. In some embodiments, the library of primers comprises at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 primer pairs, wherein each primer pair comprises a forward test primer and a reverse test primer, and wherein each pair of test primers hybridizes to a target locus. In some embodiments, the library of primers comprises at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 individual primers that each hybridize to a different target locus, wherein the individual primers are not part of primer pairs.
In various embodiments, the concentration of each primer is less than 100, 75, 50, 25, 20, 10, 5, 2, or 1 nM, or less than 500, 100, 10, or 1 μM. In various embodiments, the concentration of each primer is between 1 μM and 100 nM, for example between 1 μM and 1 nM, between 1 and 75 nM, between 2 and 50 nM, or between 5 and 50 nM. In various embodiments, the GC content of the primers is from 30 to 80%, for example, from 40 to 70%, or from 50 to 60%. In some embodiments, the range of GC content across the primers (the spread between the highest and lowest values) is less than 30, 20, 10, or 5%. In some embodiments, the range of GC content across the primers is from 5 to 30%, such as from 5 to 20%, or from 5 to 10%. In some embodiments, the melting temperature (T_m) of the test primers is from 40 to 80 °C, for example, 50 to 70 °C, 55 to 65 °C, or 57 to 60.5 °C. In some embodiments, T_m is calculated using the Primer3 program (libprimer3 release 2.2.3; worldwide web at primer3.sourceforge.net) with the built-in SantaLucia parameters. In some embodiments, the range of melting temperatures of the primers is less than 15, 10, 5, 3, or 1 °C. In some embodiments, the range of melting temperatures of the primers is from 1 to 15 °C, for example, from 1 to 10 °C, from 1 to 5 °C, or from 1 to 3 °C. In some embodiments, the length of the primers is 15 to 100 nucleotides, for example, 15 to 75 nucleotides, 15 to 40 nucleotides, 17 to 35 nucleotides, 18 to 30 nucleotides, or 20 to 65 nucleotides. In some embodiments, the range of primer lengths is less than 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the range of primer lengths is from 5 to 50 nucleotides, for example, 5 to 40 nucleotides, 5 to 20 nucleotides, or 5 to 10 nucleotides. In some embodiments, the length of the target amplicons is from 50 to 100 nucleotides, for example, from 60 to 80 nucleotides, or from 60 to 75 nucleotides. In some embodiments, the range of target amplicon lengths is less than 50, 25, 15, 10, or 5 nucleotides. In some embodiments, the range of target amplicon lengths is from 5 to 50 nucleotides, for example, 5 to 25 nucleotides, 5 to 15 nucleotides, or 5 to 10 nucleotides.
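To make the distinction between a per-primer value and the range across a library concrete, the following sketch computes GC content and length for each primer and the spread of both across a set of primers. The function names and default limits are illustrative assumptions (the defaults simply reuse the broadest GC and length intervals quoted above); melting temperature is left out because the text computes it with Primer3 and the SantaLucia parameters, which this sketch does not attempt to reproduce.

```python
def gc_percent(seq):
    """GC content of a primer as a percentage of its length."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def within_spec(seq, gc=(30.0, 80.0), length=(15, 100)):
    """True if one primer's GC content and length fall inside the given intervals."""
    return gc[0] <= gc_percent(seq) <= gc[1] and length[0] <= len(seq) <= length[1]


def library_spread(primers):
    """Spread (max minus min) of GC content and of length across a set of primers."""
    gcs = [gc_percent(p) for p in primers]
    lengths = [len(p) for p in primers]
    return {
        "gc_range": max(gcs) - min(gcs),
        "length_range": max(lengths) - min(lengths),
    }
```

A library could then be checked against, for example, a GC range of less than 30% by comparing `library_spread(primers)["gc_range"]` with that limit.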
These primer libraries can be used in any of the methods of the present invention.
Illustrative Primer Kit
In one aspect, the invention features a kit (e.g., a kit for amplifying target loci in a nucleic acid sample) comprising any of the primer libraries of the invention. In some embodiments, a kit may be formulated that comprises a plurality of primers designed to carry out the methods described herein. The primers may be outer forward and reverse primers or inner forward and reverse primers as disclosed herein; they may be primers designed to have low binding affinity for the other primers in the kit, as disclosed in the section on primer design; they may be hybrid capture probes or pre-circularized probes as described in the relevant sections; or they may be some combination thereof. In one embodiment, a kit may be formulated for determining the ploidy state of a target chromosome in a gestating fetus and designed for use with the methods disclosed herein, the kit comprising a plurality of inner forward primers and, optionally, a plurality of inner reverse primers, and optionally outer forward primers and outer reverse primers, wherein each primer is designed to hybridize to a DNA region upstream and/or downstream of one of the target sites (e.g., polymorphic sites) on the target chromosome and, optionally, on additional chromosomes. In one embodiment, the primer kit can be used in combination with the diagnostic box described elsewhere in this document. In some embodiments, the kit includes instructions for using the library to amplify the target loci.
Exemplary Multiplex PCR Methods
In one aspect, the invention provides a method of amplifying target loci in a nucleic acid sample, comprising: (i) contacting the nucleic acid sample with a library of primers that simultaneously hybridize to at least 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture; and (ii) subjecting the reaction mixture to primer extension reaction conditions (e.g., PCR conditions) to produce amplified products comprising target amplicons. In some embodiments, the method also comprises determining the presence or absence of at least one of the target amplicons (e.g., at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the target amplicons). In some embodiments, the method also comprises determining the sequence of at least one of the target amplicons (e.g., at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the target amplicons). In some embodiments, at least 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, or 99.5% of the targeted loci are amplified. In various embodiments, less than 60, 50, 40, 30, 20, 10, 5, 4, 3, 2, 1, 0.5, 0.25, 0.1, or 0.05% of the amplified product is primer dimer.
In one embodiment, the methods disclosed herein amplify DNA using highly efficient, highly multiplexed targeted PCR and then determine the allele frequencies at each target locus by high-throughput sequencing. The ability to multiplex more than about 50 or 100 PCR primers in a single reaction volume in such a way that most of the resulting sequence reads map to the targeted loci is novel and non-obvious. One technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner involves designing primers that are unlikely to hybridize with one another. The PCR probes, typically referred to as primers, are selected by creating a thermodynamic model of potentially adverse interactions between at least 500; at least 1,000; at least 2,000; at least 5,000; at least 7,500; at least 10,000; at least 20,000; at least 25,000; at least 30,000; at least 40,000; at least 50,000; at least 75,000; or at least 100,000 potential primer pairs, or of unintended interactions between the primers and the sample DNA, and then using this model to eliminate designs that are incompatible with the other designs in the pool. Another technique that allows highly multiplexed targeted PCR to perform in a highly efficient manner is to use a partial or full nesting approach to the targeted PCR. Using one or a combination of these approaches allows multiplexing of at least 300, at least 800, at least 1,200, at least 4,000, or at least 10,000 primers in a single pool, with the resulting amplified DNA comprising a majority of DNA molecules that, when sequenced, map to the targeted loci. Using one or a combination of these approaches allows multiplexing of a large number of primers in a single pool, with the resulting amplified DNA comprising more than 50%, more than 60%, more than 67%, more than 80%, more than 90%, more than 95%, more than 96%, more than 97%, more than 98%, more than 99%, or more than 99.5% DNA molecules that map to the targeted loci.
In some embodiments, detection of the target genetic material may be performed in a multiplexed fashion. The number of genetic target sequences that can be run in parallel can range from 1 to 10, 10 to 100, 100 to 1000, 1000 to 10,000, 10,000 to 100,000, 100,000 to 1,000,000, or 1,000,000 to 10,000,000. Prior attempts to multiplex more than 100 primers per pool have resulted in significant problems with unwanted side reactions such as primer-dimer formation.
Targeted PCR
In some embodiments, PCR can be used to target specific locations of the genome. In plasma samples, the original DNA is highly fragmented (typically less than 500 bp, with an average length of less than 200 bp). In PCR, both the forward and reverse primers must anneal to the same fragment to allow amplification. Thus, if the fragments are short, the PCR assays must also amplify relatively short regions. As with MIPS, if the polymorphic positions are too close to the polymerase binding site, this can bias amplification of the different alleles. Currently, PCR primers that target polymorphic regions, such as those containing SNPs, are typically designed so that the 3' end of the primer hybridizes to the base immediately adjacent to the polymorphic base or bases. In embodiments herein, the 3' ends of both the forward and reverse PCR primers are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the targeted allele. The number of bases between the polymorphic site (SNP or otherwise) and the base to which the 3' end of the primer is designed to hybridize can be one base, two bases, three bases, four bases, five bases, six bases, 7 to 10 bases, 11 to 15 bases, or 16 to 20 bases. The forward and reverse primers can be designed to hybridize a different number of bases away from the polymorphic site.
Although PCR assays can be generated in large numbers, interactions between different PCR assays make it difficult to multiplex them beyond about 100 assays. Various complex molecular approaches can be used to increase the level of multiplexing, but it may still be limited to fewer than 100, perhaps fewer than 200, or even fewer than 500 assays per reaction. Samples with large amounts of DNA can be split among multiple sub-reactions and then recombined prior to sequencing. For samples where either the overall sample or some subpopulation of DNA molecules is limited, splitting the sample can introduce substantial noise. In one embodiment, a small or limited amount of DNA can refer to an amount of less than 10 pg, from 10 to 100 pg, from 100 pg to 1 ng, from 1 to 10 ng, or from 10 to 100 ng. Note that while this method is particularly useful for small amounts of DNA, where other methods that involve splitting into multiple pools can cause significant problems related to the stochastic noise they introduce, it still offers the advantage of minimizing bias when run on samples with any amount of DNA. In these situations, a universal pre-amplification step can be used to increase the overall sample quantity. Ideally, this pre-amplification step should not appreciably alter the allele distributions.
In one embodiment, the methods herein can generate PCR products specific to a large number of targeted loci, specifically 1,000 to 5,000 loci, 5,000 to 10,000 loci, or more than 10,000 loci, for genotyping by sequencing or some other genotyping method, from limited samples such as single cells or DNA from body fluids. Currently, performing multiplex PCR reactions of more than 5 to 10 targets presents a major challenge and is often hindered by primer side products, such as primer dimers, and other artifacts. When target sequences are detected using microarrays with hybridization probes, primer dimers and other artifacts can be ignored, since they are not detected. However, when sequencing is used as the detection method, the majority of the sequencing reads may sequence such artifacts rather than the desired target sequences in the sample. Methods described in the prior art for multiplexing more than 50 or 100 reactions in one reaction volume followed by sequencing typically result in more than 20%, often more than 50%, in many cases more than 80%, and in some cases more than 90% off-target sequence reads.
Generally, to perform targeted sequencing of a large number (n) of targets in a sample (more than 50, more than 100, more than 500, or more than 1,000), the sample can be split into a number of parallel reactions that each amplify one individual target. This has been done in PCR multiwell plates, or it can be done on commercial platforms such as the FLUIDIGM ACCESS ARRAY (48 reactions per sample in a microfluidic chip) or DROPLET PCR by RAIN DANCE TECHNOLOGY (100 to several thousand targets). Unfortunately, these split-and-pool methods are problematic for samples with a limited amount of DNA, because there are often not enough copies of the genome to ensure that there is one copy of each region of the genome in each well. This problem is particularly acute when polymorphic loci are targeted and the relative proportions of the alleles at the polymorphic loci are needed, because the stochastic noise introduced by the splitting and pooling results in very inaccurate measurements of the proportions of the alleles that were present in the original sample of DNA. Described herein are methods for effectively and efficiently amplifying many PCR reactions that are applicable to cases where only a limited amount of DNA is available. In one embodiment, the method is applicable to the analysis of single cells, body fluids, mixtures of DNA such as the free-floating DNA found in maternal plasma, biopsies, and environmental and/or forensic samples.
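The scale of the sampling problem can be illustrated with a short calculation. The sketch below rests on two simplifying assumptions that are not taken from this document: each genome copy lands in one of the wells independently and with equal probability, and the alleles observed in a well are a binomial sample of the alleles in the original DNA. The function names are hypothetical.

```python
import math


def prob_locus_absent(copies, wells):
    """Probability that a given well receives no copy of a locus.

    Assumes each of `copies` genome copies falls into one of `wells`
    wells independently and uniformly at random.
    """
    return (1.0 - 1.0 / wells) ** copies


def allele_fraction_sd(p, copies):
    """Binomial standard deviation of a measured allele fraction.

    `p` is the true allele fraction and `copies` is the number of
    template copies the measurement is based on.
    """
    return math.sqrt(p * (1.0 - p) / copies)


# Example: 96 genome copies split across 48 wells, one target per well.
per_well = 96 / 48
sd_split = allele_fraction_sd(0.5, per_well)   # measured from about 2 copies
sd_pooled = allele_fraction_sd(0.5, 96)        # measured from all 96 copies
```

Under these assumptions, a heterozygous locus measured from about 2 copies has a standard deviation of roughly 0.35 in its allele fraction, compared with roughly 0.05 when all 96 copies contribute, which is the kind of stochastic noise the split-and-pool approach introduces.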
In one embodiment, the targeted sequencing can include one, several, or all of the following steps. a) Generating and amplifying a library with adapter sequences at both ends of the DNA fragments. b) Dividing the library into multiple reactions after library amplification. c) Generating and optionally amplifying a library with adapter sequences at both ends of the DNA fragments. d) Performing 1000- to 10,000-plex amplification of the selected targets using one target-specific "forward" primer per target and one tag-specific primer. e) Performing a second amplification from this product using "reverse" target-specific primers and one (or more) primers specific for a universal tag that was introduced as part of the target-specific forward primers in the first round. f) Performing a 1000-plex preamplification of the selected targets for a limited number of cycles. g) Dividing the product into multiple aliquots and amplifying sub-pools of targets in individual reactions (for example, 50- to 500-plex, although this can be used all the way down to singleplex). h) Pooling the products of the parallel sub-pool reactions. i) During these amplifications, having the primers carry sequencing-compatible tags (partial or full length) so that the products can be sequenced.
Highly multiplexed PCR
Described herein are methods that permit the targeted amplification of hundreds to thousands or more target sequences (e.g., SNP loci) from a nucleic acid sample, such as genomic DNA obtained from plasma. The amplified sample may be relatively free of primer dimer products and have low allelic bias at the target loci. If sequencing-compatible adapters are attached to the products during or after amplification, these products can be analyzed by sequencing.
Performing highly multiplexed PCR amplification using methods known in the art results in primer dimer products that outnumber the desired amplification products and are unsuitable for sequencing. These can be reduced empirically by removing the primers that form these products, or by performing in silico selection of the primers. However, the larger the number of assays, the more difficult this problem becomes.
One solution is to split the 5000-plex reaction into several lower-plex amplifications, e.g., one hundred 50-plex or fifty 100-plex reactions, to use microfluidics, or even to split the sample into individual PCR reactions. However, if the sample DNA is limited, as in non-invasive prenatal diagnosis from pregnancy plasma, dividing the sample among multiple reactions should be avoided because it results in bottlenecking.
Described herein are methods of first globally amplifying the plasma DNA of a sample and then dividing the sample into multiple multiplexed target-enrichment reactions with a more moderate number of target sequences per reaction. In one embodiment, the methods of the present invention can be used to preferentially enrich a DNA mixture at a plurality of loci, the method comprising one or more of the following steps: generating and amplifying a library from a mixture of DNA, in which the molecules in the library have adapter sequences ligated to both ends of the DNA fragments; dividing the amplified library into a plurality of reactions; and performing a first round of multiplex amplification of the selected targets using one target-specific "forward" primer per target and one or more adapter-specific universal "reverse" primers. In one embodiment, the methods of the invention further comprise performing a second amplification using "reverse" target-specific primers and one or more primers specific for a universal tag that was introduced as part of the target-specific forward primers in the first round. In one embodiment, the method may involve a fully nested, hemi-nested, semi-nested, one-sided fully nested, one-sided hemi-nested, or one-sided semi-nested PCR approach. In one embodiment, the method is used to preferentially enrich a DNA mixture at a plurality of loci, the method comprising performing a multiplex preamplification of the selected targets for a limited number of cycles, dividing the product into multiple aliquots, amplifying subpools of the targets in the individual reactions, and pooling the products of the parallel subpool reactions. Using this approach, targeted amplification can be performed in a manner that yields low levels of allelic bias for 50 to 500 loci, 500 to 5,000 loci, 5,000 to 50,000 loci, or even 50,000 to 500,000 loci. In one embodiment, the primers carry partial or full-length sequencing-compatible tags.
The workflow may include the steps of (1) extracting DNA, such as plasma DNA, (2) preparing a fragment library with universal adapters at both ends of the fragments, (3) amplifying the library using universal primers specific for the adapters, (4) dividing the amplified sample "library" into a plurality of aliquots, (5) performing multiplex amplification on the aliquots (e.g., with one target-specific primer per target and a tag-specific primer), (6) pooling the aliquots of one sample, (7) barcoding the sample, (8) mixing the samples and adjusting the concentration, and (9) sequencing the sample. The workflow may include multiple sub-steps within any of the listed steps (e.g., step (2), preparing the library, may entail three enzymatic steps: blunt ending, dA tailing, and adapter ligation). The steps of the workflow may be combined, divided, or performed in a different order (e.g., barcoding and pooling of samples).
It is important to note that the amplification of the library can be performed in a biased manner so that short fragments are amplified more efficiently. In this way, it is possible to preferentially amplify shorter sequences, such as mono-nucleosomal DNA fragments like the cell-free fetal DNA (of placental origin) found in the circulation of pregnant women. The PCR assays can carry tags, for example, sequencing tags (usually in a truncated form of 15 to 25 bases). After multiplexing, the PCR multiplexes of a sample are pooled, and the tags are then completed (including barcoding) by a tag-specific PCR (this could also be done by ligation). Alternatively, the full sequencing tags can be added in the same reaction as the multiplexing: in the first cycles, the targets are amplified with the target-specific primers, and subsequently the tag-specific primers take over to complete the sequencing-adapter sequence. Alternatively, the PCR primers may carry no tags, and the sequencing tags may be attached to the amplification products by ligation.
In one embodiment, highly multiplexed PCR followed by evaluation of the amplified material by clonal sequencing can be used for a variety of applications, such as the detection of fetal aneuploidy. Whereas conventional multiplex PCR evaluates up to 50 loci simultaneously, the approach described herein may enable simultaneous evaluation of more than 50 loci, more than 100 loci, more than 500 loci, more than 1,000 loci, more than 5,000 loci, more than 10,000 loci, more than 50,000 loci, or more than 100,000 loci. Experiments have shown that up to 10,000 distinct loci can be evaluated simultaneously, in a single reaction, with efficiency and specificity sufficient to make non-invasive prenatal aneuploidy diagnoses and/or copy number calls with high accuracy. The assays can be combined in a single reaction with the entirety of a sample, such as a cfDNA sample isolated from maternal plasma, a fraction thereof, or a further processed derivative of the cfDNA sample. The sample (e.g., cfDNA or a derivative) can also be split into multiple parallel multiplex reactions. The optimal sample splitting and degree of multiplexing is determined by trading off various performance specifications. Because the amount of material is limited, splitting the sample into multiple fractions can introduce sampling noise, add handling time, and increase the likelihood of error. Conversely, higher multiplexing can result in greater amounts of spurious amplification and greater unevenness in amplification, both of which can reduce test performance.
Two important related considerations in the application of the methods described herein are the limited amount of original sample (e.g., plasma) and the number of original molecules in that material from which allele frequencies or other measurements are obtained. If the number of original molecules falls below a certain level, random sampling noise becomes significant and may affect the accuracy of the test. Typically, data of sufficient quality for making a non-invasive prenatal aneuploidy diagnosis can be obtained if the measurements are made on a sample comprising the equivalent of 500 to 1,000 original molecules per target locus. There are a number of ways to increase the number of distinct measurements, for example by increasing the sample volume. Each manipulation applied to the sample also potentially results in loss of material. It is essential to characterize the losses incurred by the various manipulations and to avoid them or, where necessary, to improve the yield of particular manipulations so that losses do not degrade the performance of the test.
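The relationship between the number of original molecules and random sampling noise can be illustrated with a simple binomial model. The following Python sketch is only an illustration under that assumption and is not the statistical method of this disclosure; the function name `allele_fraction_std_error` and the default true allele fraction of 0.5 are hypothetical choices made for the example.

```python
import math


def allele_fraction_std_error(n_molecules: int, true_fraction: float = 0.5) -> float:
    """Standard error of an observed allele fraction under binomial sampling.

    Assumption (not from the disclosure): each of the n_molecules original
    molecules at a locus independently carries the allele of interest with
    probability true_fraction.
    """
    if n_molecules <= 0:
        raise ValueError("n_molecules must be positive")
    if not 0.0 <= true_fraction <= 1.0:
        raise ValueError("true_fraction must be between 0 and 1")
    return math.sqrt(true_fraction * (1.0 - true_fraction) / n_molecules)


if __name__ == "__main__":
    # Compare a small input with the 500 to 1,000 molecule range cited in the text.
    for n in (50, 500, 1000):
        print(f"{n} molecules: standard error {allele_fraction_std_error(n):.4f}")
```

Under this model the noise falls with the square root of the molecule count, so quadrupling the number of original molecules per locus halves the standard error of the measured allele fraction.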
In one embodiment, it is possible to mitigate potential losses in subsequent steps by amplifying all or a fraction of the original sample (e.g., the cfDNA sample). A variety of methods are available that amplify all of the genetic material in a sample, increasing the amount available for downstream procedures. In one embodiment, in ligation-mediated PCR (LM-PCR), DNA fragments are amplified by PCR after ligation of one distinct adapter, two distinct adapters, or many distinct adapters. In one embodiment, in multiple displacement amplification (MDA), phi-29 polymerase is used to amplify all of the DNA isothermally. In DOP-PCR and its variations, random priming is used to amplify the original material DNA. Each method has certain characteristics, such as the uniformity of amplification across all represented regions of the genome, the efficiency of capture and amplification of the original DNA, and the amplification performance as a function of fragment length.
In one embodiment, LM-PCR can be used with a single heteroduplexed adapter having a 3-prime thymidine. The heteroduplexed adapter enables the use of a single adapter molecule that can be converted into two distinct sequences at the 5-prime and 3-prime ends of the original DNA fragment during the first round of PCR. In one embodiment, it is possible to fractionate the amplified library by size separation, or by products such as AMPURE, TASS, or other similar methods. Prior to ligation, the sample DNA may be blunt ended, and a single adenosine base is then added to the 3-prime end. Prior to ligation, the DNA may be cleaved using a restriction enzyme or some other cleavage method. During ligation, the complementarity of the 3-prime adenosine of the sample fragments and the 3-prime thymidine overhang of the adapter can improve ligation efficiency. The extension step of the PCR amplification can be limited in time to reduce the amplification of fragments longer than about 200 bp, about 300 bp, about 400 bp, about 500 bp, or about 1,000 bp. Because the longer DNA found in maternal plasma is almost exclusively maternal, this can result in enrichment of the fetal DNA by 10 to 50% and an improvement in test performance. A number of reactions were run using conditions as specified by commercially available kits; these resulted in successful ligation of fewer than 10% of the sample DNA molecules. A series of optimizations of the reaction conditions improved ligation to approximately 70%.
Mini-PCR
The following mini-PCR method is preferred for samples containing short, degraded, or fragmented nucleic acids, such as cfDNA. Traditional PCR assay designs result in significant loss of distinct fetal molecules, but the loss can be greatly reduced by designing very short PCR assays, termed mini-PCR assays. Fetal cfDNA in maternal serum is highly fragmented, and the fragment sizes are distributed in an approximately Gaussian fashion with a mean of 160 bp, a standard deviation of 15 bp, a minimum size of about 100 bp, and a maximum size of about 220 bp. The distribution of fragment start and end positions with respect to the targeted polymorphisms, while not necessarily random, varies widely among individual targets and among all targets collectively, and the polymorphic site of a particular target locus may occupy any position from the start to the end of the various fragments originating from that locus. Note that the term mini-PCR may equally well refer to normal PCR with no additional restrictions or limitations.
During PCR, amplification occurs only from template DNA fragments that contain both the forward and reverse primer sites. Because fetal cfDNA fragments are short, the likelihood that both primer sites are present on a given fragment depends on the ratio of the amplicon length to the fragment length: for a fragment of length L and an amplicon of length A, it is approximately (L - A) / L. Under ideal conditions, assays with amplicons of 45, 50, 55, 60, 65, or 70 bp will successfully amplify from 72%, 69%, 66%, 63%, 59%, or 56%, respectively, of the available template fragment molecules (values consistent with a fragment length of 160 bp). The amplicon length is the distance between the 5-prime ends of the forward and reverse priming sites. Amplicon lengths shorter than those typically used in the art can result in more efficient measurement of the desired polymorphic loci by requiring only short sequence reads. In one embodiment, a substantial fraction of the amplicons may be less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp.
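The percentages quoted above are consistent with a simple model in which a fragment of length L contains a complete amplicon of length A with probability (L - A) / L, taking L = 160 bp, the mean fragment size given earlier. The following Python sketch evaluates that model; it is an illustration under the stated uniform-position assumption, and the function name `fraction_amplifiable` is hypothetical.

```python
def fraction_amplifiable(amplicon_len: int, fragment_len: int = 160) -> float:
    """Approximate fraction of fragments that contain the whole amplicon.

    Assumption (for illustration): fragment start positions are uniformly
    distributed relative to the amplicon, so the fraction is (L - A) / L,
    floored at zero when the amplicon is longer than the fragment.
    """
    if amplicon_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    return max(0.0, (fragment_len - amplicon_len) / fragment_len)


if __name__ == "__main__":
    # Amplicon lengths taken from the paragraph above.
    for amplicon in (45, 50, 55, 60, 65, 70):
        pct = 100 * fraction_amplifiable(amplicon)
        print(f"{amplicon} bp amplicon: {pct:.1f}% of 160 bp fragments")
```

The model makes the design trade-off explicit: every base added to the amplicon removes a fixed share of usable template molecules, which is why very short assays are favored for cfDNA.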
In methods known in the art, short assays such as those described herein are generally avoided because they are not considered necessary and because they impose considerable constraints on primer design by limiting the primer length, the annealing characteristics, and the distance between the forward and reverse primers.
It is also noted that there is a potential for biased amplification when the 3-prime end of either primer is within approximately 1 to 6 bases of the polymorphic site. This single-base difference at the site of initial polymerase binding may result in preferential amplification of one allele, which may alter the observed allele frequencies and degrade performance. All of these constraints make it very challenging to identify primers that will successfully amplify a particular locus and, furthermore, to design large sets of primers that are compatible in the same multiplex reaction. In one embodiment, the 3-prime ends of the inner forward and reverse primers are designed to hybridize to a region of DNA upstream of the polymorphic site and to be separated from the polymorphic site by a small number of bases. Ideally, the number of bases is from 6 to 10, but it may equally well be from 4 to 15 bases, from 3 to 20 bases, from 2 to 30 bases, or from 1 to 60 bases, and achieve substantially the same end.
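The spacing constraint between a primer's 3-prime end and the polymorphic site can be checked mechanically during primer design. The Python sketch below shows one way such a check might look. The 0-based coordinate convention, the function names, and the definition of the gap as the count of bases strictly between the primer's last base and the site are all assumptions introduced for this example; only the default 6 to 10 base window comes from the text.

```python
def gap_to_polymorphic_site(primer_start: int, primer_len: int, site_pos: int) -> int:
    """Number of bases strictly between a primer's 3' end and the polymorphic site.

    Assumed convention: 0-based coordinates along the strand on which the primer
    is extended, with the primer lying upstream of the site. A gap of 0 means the
    3' end is immediately adjacent to the site.
    """
    if primer_len <= 0:
        raise ValueError("primer_len must be positive")
    three_prime_end = primer_start + primer_len - 1  # index of the primer's last base
    gap = site_pos - three_prime_end - 1
    if gap < 0:
        raise ValueError("primer overlaps or lies downstream of the polymorphic site")
    return gap


def gap_is_acceptable(gap: int, min_gap: int = 6, max_gap: int = 10) -> bool:
    """True if the gap falls within the preferred window (6 to 10 bases by default)."""
    return min_gap <= gap <= max_gap


if __name__ == "__main__":
    gap = gap_to_polymorphic_site(primer_start=100, primer_len=20, site_pos=128)
    print(gap, gap_is_acceptable(gap))
```

The wider windows listed in the text (for example 4 to 15 bases) can be tested by passing different `min_gap` and `max_gap` values.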
Multiplex PCR can involve a single round of PCR in which all targets are amplified, or it can involve one round of PCR followed by one or more rounds of nested PCR or some variant of nested PCR. Nested PCR consists of a subsequent round or rounds of PCR amplification using one or more new primers that bind internally, by at least one base pair, to the primers used in the previous round. Nested PCR reduces the number of spurious amplification targets by amplifying, in subsequent reactions, only those amplification products of the previous reaction that have the correct internal sequence. Reducing spurious amplification targets improves the number of useful measurements that can be obtained, especially in sequencing. Nested PCR typically entails designing primers completely internal to the previous primer binding sites, which necessarily increases the minimum DNA segment size required for amplification. For samples in which the DNA is highly fragmented, such as maternal plasma cfDNA, the larger assay size reduces the number of distinct cfDNA molecules from which a measurement can be obtained. In one embodiment, to offset this effect, a partially nested approach can be used in which one or both of the second-round primers overlap the first binding sites and extend internally by some number of bases, achieving additional specificity while minimally increasing the total assay size.
In one embodiment, a multiplex pool of PCR assays is designed to amplify potentially heterozygous SNPs or other polymorphic or non-polymorphic loci on one or more chromosomes, and these assays are used in a single reaction to amplify DNA. The number of PCR assays can be 50 to 200, 200 to 1,000, 1,000 to 5,000, or 5,000 to 20,000 PCR assays, or more than 20,000 (50- to 200-plex, 200- to 1,000-plex, 1,000- to 5,000-plex, 5,000- to 20,000-plex, or more than 20,000-plex, respectively). In one embodiment, a multiplex pool of about 10,000 PCR assays (10,000-plex) is designed to amplify potentially heterozygous SNP loci on chromosomes X, Y, 13, 18, and 21 and 1 or 2, and these assays are used in a single reaction to amplify cfDNA obtained from maternal plasma samples, chorionic villus samples, amniocentesis samples, single or small numbers of cells, other body fluids or tissues, cancers, or other genetic material. The SNP frequencies at each locus can be determined by clonal sequencing of the amplicons or by some other sequencing method. A statistical analysis of the allele frequency distributions or ratios of all assays can be used to determine whether the sample is aneuploid (e.g., trisomic) for one or more of the chromosomes included in the test. In another embodiment, the original cfDNA sample is split into two samples and parallel 5,000-plex assays are performed. In another embodiment, the original cfDNA sample is split into n samples and parallel (~10,000/n)-plex assays are performed, where n is 2 to 12, or 12 to 24, or 24 to 48, or upward of 48. The data are collected and analyzed in a manner similar to that already described. It is noted that the method is equally well applicable to detecting translocations, deletions, duplications, and other chromosomal abnormalities.
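The division of a roughly 10,000-plex assay pool into n parallel pools of about 10,000/n assays each can be sketched as follows. This Python example is only an illustration of the bookkeeping: the round-robin assignment and the function name `split_assay_pool` are assumptions for the example, and in practice the assignment of assays to pools would also depend on primer compatibility.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_assay_pool(assays: Sequence[T], n_pools: int) -> List[List[T]]:
    """Distribute assays round-robin into n_pools sub-pools of near-equal size.

    Pool sizes differ by at most one assay, and every assay lands in exactly one pool.
    """
    if n_pools < 1:
        raise ValueError("n_pools must be at least 1")
    return [list(assays[i::n_pools]) for i in range(n_pools)]


if __name__ == "__main__":
    # Hypothetical assay identifiers standing in for a 10,000-plex pool.
    assay_ids = [f"assay_{i:05d}" for i in range(10_000)]
    for n in (2, 12, 48):
        pools = split_assay_pool(assay_ids, n)
        print(n, "pools, sizes from", min(map(len, pools)), "to", max(map(len, pools)))
```

With n = 2 this yields the two parallel 5,000-plex pools described above.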
In one embodiment, a tail with no homology to the target genome can be added to either the 3-prime or 5-prime end of any of the primers. These tails facilitate subsequent manipulations, procedures, or measurements. In one embodiment, the tail sequence may be the same for the forward and reverse target-specific primers. In one embodiment, different tails may be used for the forward and reverse target-specific primers. In one embodiment, a plurality of different tails can be used for different loci or sets of loci. Certain tails may be shared among all loci or among subsets of loci. For example, using forward and reverse tails corresponding to the forward and reverse sequences required by any of the current sequencing platforms may enable direct sequencing after amplification. In one embodiment, the tails can be used as common priming sites among all amplified targets, which can be used to add other useful sequences. In some embodiments, the inner primers may contain a region that is designed to hybridize upstream or downstream of the targeted locus (e.g., a polymorphic locus). In some embodiments, the primers may contain a molecular barcode. In some embodiments, the primers may contain a universal priming sequence designed to allow PCR amplification.
In one embodiment, a 10,000-plex PCR assay pool is created such that the forward and reverse primers have tails corresponding to the forward and reverse sequences required by a high-throughput sequencer, such as the HISEQ, GAIIX, or MISEQ (available from ILLUMINA). In addition, 5-prime to the sequencing tails is an additional sequence that can be used as a priming site in a subsequent PCR to add nucleotide barcode sequences to the amplicons, enabling multiplexed sequencing of multiple samples in a single lane of the high-throughput sequencer.
In one embodiment, a 10,000-plex PCR assay pool is created such that the reverse primers have tails corresponding to the reverse sequence required by a high-throughput sequencer. After amplification with the first 10,000-plex assay, a subsequent PCR amplification may be performed using another 10,000-plex pool having partially nested forward primers (e.g., nested by six bases) for all targets and a reverse primer corresponding to the reverse sequencing tail included in the first round. This subsequent round of partially nested amplification, using only one target-specific primer and a universal primer, limits the required size of the assay, which reduces sampling noise, while greatly reducing the number of spurious amplicons. The sequencing tags can be added to the attached ligation adapters and/or as part of the PCR probes, so that the tag is part of the final amplicon.
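The geometry of a partially nested forward primer can be illustrated with plain string slicing. In the Python sketch below, the second-round primer is the same length as the first-round primer but shifted a few bases toward the target, so that it overlaps most of the first binding site and extends only a short distance internally. The template sequence, the function name `partially_nested_primer`, and the choice to keep the primer length fixed are hypothetical; only the six-base nesting example comes from the text, and the sketch works on a single strand without reverse complementing.

```python
def partially_nested_primer(template: str, outer_start: int, primer_len: int,
                            nest_bases: int = 6) -> str:
    """Return a second-round primer shifted nest_bases toward the target.

    The inner primer covers template[outer_start + nest_bases :
    outer_start + nest_bases + primer_len], so it shares
    primer_len - nest_bases bases with the outer primer's binding site.
    """
    if primer_len <= 0 or nest_bases < 0:
        raise ValueError("primer_len must be positive and nest_bases non-negative")
    inner_start = outer_start + nest_bases
    inner_end = inner_start + primer_len
    if outer_start < 0 or inner_end > len(template):
        raise ValueError("primer does not fit on the template")
    return template[inner_start:inner_end]


if __name__ == "__main__":
    template = "ACGTACGTACGTACGTACGTACGTACGTACGT"  # toy sequence for illustration
    outer = template[0:10]
    inner = partially_nested_primer(template, outer_start=0, primer_len=10)
    print("outer:", outer)
    print("inner:", inner)
```

Because the inner primer extends only `nest_bases` beyond the outer site, the minimum template length grows by six bases here, rather than by a full primer length as in fully nested PCR.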
Fetal fraction affects the performance of the test. There are a number of ways to enrich the fetal fraction of the DNA found in maternal plasma. The fetal fraction can be increased by the LM-PCR method discussed above and also by targeted removal of long maternal fragments. In one embodiment, an additional multiplex PCR reaction may be performed prior to the multiplex PCR amplification of the target loci to selectively remove long, largely maternal fragments corresponding to the loci targeted in the subsequent multiplex PCR. Additional primers are designed to anneal at a site farther from the polymorphism than is expected to be present among cell-free fetal DNA fragments. These primers can be used in a one-cycle multiplex PCR reaction prior to the multiplex PCR of the target polymorphic loci. These distal primers are tagged with a molecule or moiety that enables selective recognition of the tagged pieces of DNA. In one embodiment, these molecules of DNA can be covalently modified with a biotin molecule, which allows removal of the newly formed double-stranded DNA containing these primers after one cycle of PCR. Double-stranded DNA formed during that first round is likely maternal in origin. Removal of the hybrid material can be accomplished using magnetic streptavidin beads. There are other tagging methods that may work equally well. In one embodiment, size selection methods can be used to enrich the sample for shorter strands of DNA, for example, those less than about 800 bp, less than about 500 bp, or less than about 300 bp. Amplification of the short fragments can then proceed as usual.
The mini-PCR method described in this disclosure enables highly multiplexed amplification and analysis of hundreds to thousands or even millions of loci in a single reaction from a single sample. At the same time, the detection of the amplified DNA can be multiplexed: tens or hundreds of samples can be multiplexed in one sequencing lane using barcoding PCR. This multiplexed detection has been successfully tested up to 49-plex, and a much higher degree of multiplexing is possible. In effect, this allows hundreds of samples to be genotyped at thousands of SNPs in a single sequencing run. For these samples, the method allows determination of the genotype and heterozygosity rate and, simultaneously, determination of the copy number, both of which can be used for the purpose of aneuploidy detection. This method is particularly useful for detecting aneuploidy of a gestating fetus from the free-floating DNA found in maternal plasma. This method can be used as part of a method for sexing a fetus and/or predicting the paternity of a fetus. It can be used as part of a method for determining mutation dosage. The method can be used for any amount of DNA or RNA, and the targeted regions can be SNPs, other polymorphic regions, non-polymorphic regions, and combinations thereof.
In some embodiments, ligation-mediated universal-PCR amplification of fragmented DNA can be used. Ligation-mediated universal-PCR amplification can be used to amplify plasma DNA, which can then be divided into multiple parallel reactions. It can also be used to enrich the fetal fraction by preferentially amplifying short fragments. In some embodiments, adding tags to the fragments by ligation may enable detection of shorter fragments, use of shorter target-sequence-specific portions of the primers, and/or annealing at higher temperatures, which reduces nonspecific reactions.
The methods described herein can be used for a number of purposes in which a target set of DNA is mixed with an amount of contaminating DNA. In some embodiments, the target DNA and the contaminating DNA may originate from genetically related individuals. For example, genetic abnormalities in a fetus (target) can be detected from maternal plasma, which contains fetal (target) DNA and also maternal (contaminating) DNA; the abnormalities include whole-chromosome abnormalities (e.g., aneuploidy), partial chromosome abnormalities (e.g., deletions, duplications, inversions, translocations), polynucleotide polymorphisms (e.g., STRs), single nucleotide polymorphisms, and/or other genetic abnormalities or differences. In some embodiments, the target and the contaminating DNA can originate from the same individual, but differ by one or more mutations, for example in the case of cancer (see, e.g., H. Mamon et al., Preferential Amplification of Apoptotic DNA from Plasma: Potential for Enhancing Detection of Minor DNA Alterations in Circulating DNA. Clinical Chemistry 54:9 (2008)). In some embodiments, the DNA can be found in cell culture (apoptotic) supernatants. In some embodiments, it is possible to induce apoptosis in a biological sample (e.g., blood) for subsequent library preparation, amplification, and/or sequencing. A number of possible workflows and protocols for achieving this goal are presented elsewhere herein.
In some embodiments, the target DNA may originate from single cells, from samples of DNA consisting of less than one copy of the target genome, from small amounts of DNA, from DNA of mixed origin (e.g., pregnancy plasma: placental and maternal DNA; cancer patient plasma and tumors: a mixture of healthy and cancer DNA; transplantation, etc.), from other body fluids, from cell cultures, from culture supernatants, from forensic samples of DNA, from ancient samples of DNA (e.g., insects trapped in amber), from other samples of DNA, and combinations thereof.
In some embodiments, a short amplicon size may be used. Short amplicon sizes are particularly suitable for fragmented DNA (see, e.g., A. Sikora et al., Detection of Increased Amounts of Cell-Free Fetal DNA with Short PCR Amplicons. Clin Chem. 2010 Jan; 56(1): 136-8).
The use of short amplicon sizes can produce some significant benefits. Short amplicon sizes can result in optimized amplification efficiency. Short amplicon sizes typically produce shorter products, so there is less opportunity for nonspecific priming. Shorter products can be clustered more densely on the sequencing flow cell, because the clusters are smaller. Note that the methods described herein can work equally well with longer PCR amplicons. The amplicon length may be increased where necessary, for example, when sequencing longer sequence stretches. Experiments with 146-plex targeted amplification using assays 100 bp to 200 bp in length as the first step in a nested-PCR protocol were run on single cells and on genomic DNA with positive results.
In some embodiments, the methods described herein can be used to amplify and/or detect SNPs, copy number, nucleotide methylation, mRNA levels, other types of RNA expression levels, and other genetic and/or epigenetic features. The mini-PCR methods described herein can be used in conjunction with next-generation sequencing; they can also be used with other downstream methods such as microarrays, digital PCR, real-time PCR, mass spectrometry, and so on.
In some embodiments, the mini-PCR amplification methods described herein can be used as part of a method for accurate quantification of minority populations. They can be used for absolute quantification using spike calibrators. They can be used to quantify mutations or minor alleles through very deep sequencing and can be run in a highly multiplexed fashion. They may be used for standard paternity testing and identity testing of relatives or ancestors in humans, animals, plants, or other creatures. They can also be used for forensic testing. They can be used for rapid genotyping and copy number (CN) analysis on any kind of material, such as amniotic fluid and CVS, sperm, and products of conception (POC). They can be used for single-cell analysis, such as genotyping of samples biopsied from embryos. They can be used for rapid embryo analysis (within less than one day, one day, or two days of biopsy) by targeted sequencing using mini-PCR.
In some embodiments, the methods can be used for tumor analysis: tumor biopsies are often a mixture of healthy cells and tumor cells. Targeted PCR allows deep sequencing of SNPs and loci with close to no background sequences. It can be used for copy number and loss-of-heterozygosity analysis of tumor DNA. The tumor DNA can be present in many different body fluids or tissues of a tumor patient. It can be used for detection of tumor recurrence and/or for tumor screening. It can be used for quality control testing of seeds. It can be used for breeding or fishing purposes. It is noted that any of these methods could equally well be used to target non-polymorphic loci for the purpose of ploidy calling.
Some documents describing some of the basic methods underlying the methods disclosed herein include: (1) Wang HY, Luo M, Tereshchenko IV, Frikker DM, Cui X, Li JY, Hu G, Chu Y, Azaro MA, Lin Y, Shen L, Yang Q, Kambouris ME, Gao R, Shih W, Li H. Genome Res. 2005 Feb; 15(2): 276-83. Department of Molecular Genetics, Microbiology and Immunology / The Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey 08903, USA. (2) High-throughput genotyping of single nucleotide polymorphisms with high sensitivity. Li H, Wang HY, Cui X, Luo M, Hu G, Greenawalt DM, Tereshchenko IV, Li JY, Chu Y, Gao R. Methods Mol Biol. 2007; 396 - PubMed PMID: 18025699. (3) A method comprising multiplexing of an average of 9 assays for sequencing is described in: Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. Varley KE, Mitra RD. Genome Res. 2008 Nov; 18(11): 1844-50. Epub 2008 Oct 10. Note that the methods disclosed herein permit multiplexing at orders of magnitude more than in the above references.
Targeted PCR Variants - Nesting
There are many possible workflows when performing PCR; some exemplary workflows for the methods disclosed herein are described. The steps outlined herein are not meant to exclude other possible steps, nor is it implied that any of the steps described herein is required for the method to work properly. Many parameter variations or other modifications are known in the literature and can be made without affecting the essence of the present invention. One particular generalized workflow is provided below, followed by a number of possible variants. The variants typically refer to possible secondary PCR reactions, e.g., the different types of nesting that can be performed (step 3). The variants may be performed at different times, or in an order different from those explicitly described herein. Examples that use polymorphic loci for illustration can be readily adapted for amplification of non-polymorphic loci, where appropriate.
1. The DNA in the sample may have ligation adapters attached, commonly referred to as library tags or ligation adapter tags (LTs), wherein the ligation adapters contain a universal priming sequence; this is followed by universal amplification. In one embodiment, this can be performed using a standard protocol designed to generate sequencing libraries after fragmentation. In one embodiment, the DNA sample can be blunt ended, followed by the addition of an A to the 3-prime end. A Y-adapter with a T-overhang can then be added and ligated. In some embodiments, sticky ends other than an A or T overhang may be used. In some embodiments, other adapters, such as looped ligation adapters, may be added. In some embodiments, the adapters may have a tag designed for PCR amplification.
2. Specific Target Amplification (STA): Preamplification of hundreds to thousands to tens of thousands and even hundreds of thousands of targets can be multiplexed in one reaction volume. The STA is typically run for 10 to 30 cycles, although it can be run for 5 to 40 cycles, 2 to 50 cycles, and even 1 to 100 cycles. The primers can be tailed, for example, for a simpler workflow or to avoid sequencing a large proportion of dimers. Typically, dimers in which both primers carry the same tag will not be amplified or sequenced efficiently. In some embodiments, 1 to 10 cycles of PCR can be performed; in some embodiments, 10 to 20 cycles of PCR can be performed; in some embodiments, 20 to 30 cycles of PCR can be performed; in some embodiments, 30 to 40 cycles of PCR can be performed; in some embodiments, more than 40 cycles of PCR can be performed. The amplification may be a linear amplification. The number of PCR cycles can be optimized to produce an optimal depth of read (DOR) profile. Different DOR profiles may be desirable for different purposes. In some embodiments, a more even distribution of reads among all assays is desirable; if the DOR is too small for some assays, the stochastic noise may be too high for the data to be useful, whereas if the depth of read is too large, the marginal usefulness of each additional read is relatively small.
Primer tails can improve the detection of fragmented DNA from universally tagged libraries. If the library tag and the primer tails contain a homologous sequence, hybridization can be improved (for example, the melting temperature (T_M) is lowered), and the primers can be extended even if only a portion of the primer target sequence is present in the sample DNA fragment. In some embodiments, 13 or more target-specific base pairs may be used. In some embodiments, 10 to 12 target-specific base pairs may be used. In some embodiments, 8 to 9 target-specific base pairs may be used. In some embodiments, 6 to 7 target-specific base pairs may be used. In some embodiments, the STA can be performed on pre-amplified DNA, e.g., from MDA, RCA, other whole genome amplification, or adapter-mediated universal PCR. In some embodiments, the STA can be performed on a sample that has been enriched for or depleted of particular sequences and populations, for example by size selection, target capture, or directed degradation.
3. In some embodiments, it is possible to perform a second multiplex PCR or primer extension reaction to increase specificity and reduce undesirable products. For example, full nesting, semi-nesting, hemi-nesting, and/or subdividing into parallel reactions of smaller assay pools may be used to increase specificity. Experiments have shown that splitting a sample into three 400-plex reactions produces product DNA with greater specificity than a single 1,200-plex reaction using exactly the same primers. Similarly, experiments have shown that splitting a sample into four 2,400-plex reactions produces product DNA with greater specificity than one 9,600-plex reaction using exactly the same primers. In one embodiment, it is possible to use target-specific and tag-specific primers of the same and of opposite directionality.
4. In some embodiments, it is possible to amplify a DNA sample produced by the STA reaction (diluted, purified, or otherwise) using tag-specific primers and "universal amplification", that is, to amplify many or all of the pre-amplified and tagged targets. The primers may contain additional functional sequences, such as a barcode or a complete adapter sequence required for sequencing on a high-throughput sequencing platform.
These methods can be used for the analysis of any sample of DNA and are particularly useful when the sample of DNA is very small, or when the DNA in the sample originates from more than one individual, as in the case of maternal plasma. These methods can be used on DNA samples such as a single cell or a small number of cells, genomic DNA, plasma DNA, amplified plasma libraries, amplified apoptotic supernatant libraries, or other samples of mixed DNA. In one embodiment, these methods can be used where cells of different genetic constitution may be present in a single individual, such as with cancer or a transplant.
Protocol Variants (Modifications of and/or Additions to the Workflow Above)
Direct multiplexed mini-PCR: Specific target amplification (STA) of a plurality of target sequences with tagged primers is shown in FIG. 1. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA that has been universally amplified, with PCR primers hybridized. (104) denotes the final PCR product. In some embodiments, STA can be performed on at least 100, at least 200, at least 500, at least 1,000, at least 2,000, at least 5,000, at least 10,000, at least 20,000, at least 50,000, at least 100,000, or at least 200,000 targets. In a subsequent reaction, tag-specific primers amplify all target sequences and lengthen the tags to include all sequences necessary for sequencing, including the sample index. In one embodiment, the primers may be untagged, or only certain primers may be tagged. Sequencing adapters can be added by conventional adapter ligation. In one embodiment, the initial primers may carry the tags.
In one embodiment, the primers are designed so that the length of the amplified DNA is unexpectedly short. The prior art shows that persons of ordinary skill in the art typically design amplicons of 100 bp or more. In one embodiment, the amplicon can be designed to be less than 80 bp. In one embodiment, the amplicon can be designed to be less than 70 bp. In one embodiment, the amplicon can be designed to be less than 60 bp. In one embodiment, the amplicon can be designed to be less than 50 bp. In one embodiment, the amplicon can be designed to be less than 45 bp. In one embodiment, the amplicon can be designed to be less than 40 bp. In one embodiment, the amplicon can be designed to be less than 35 bp. In one embodiment, the amplicon can be designed to be between 40 and 65 bp.
Experiments were performed with this protocol using 1,200-plex amplification. Both genomic DNA and pregnancy plasma were used; approximately 70% of the sequence reads mapped to the targeted sequences. Details are provided elsewhere in this document. Sequencing of a 1,042-plex without design and selection of assays resulted in more than 99% of the sequences being primer dimer products.
Sequential PCR: After STA 1, multiple aliquots of the product can be amplified in parallel in pools of reduced complexity with the same primers. The first amplification can yield enough material to split. The method is particularly well suited for small samples, for example, about 6 to 100 pg, about 100 pg to 1 ng, about 1 ng to 10 ng, or about 10 ng to 100 ng. The protocol was performed by splitting a 1,200-plex into three 400-plexes. The fraction of sequencing reads that mapped increased from around 60 to 70% in the 1,200-plex alone to more than 95%.
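The effect of subdividing a pool can be illustrated with a small counting sketch. It is not taken from the experiments above: it simply assumes that the number of distinct primer-primer combinations in a reaction is a rough proxy for the opportunities for dimer formation, and the function names and the round-robin assignment are hypothetical choices made for illustration.

```python
def split_into_pools(primer_pairs, n_pools):
    """Distribute primer pairs round-robin into n_pools sub-pools (illustrative scheme)."""
    pools = [[] for _ in range(n_pools)]
    for i, pair in enumerate(primer_pairs):
        pools[i % n_pools].append(pair)
    return pools


def primer_primer_combinations(n_pairs):
    """Count unordered primer-primer combinations, self-pairing included,
    among the 2 * n_pairs primers present in one reaction."""
    n_primers = 2 * n_pairs
    return n_primers * (n_primers + 1) // 2


# Hypothetical 1,200-plex assay list split into three 400-plex pools.
assays = [(f"fwd_{i}", f"rev_{i}") for i in range(1200)]
pools = split_into_pools(assays, 3)

single_reaction = primer_primer_combinations(len(assays))
split_reactions = sum(primer_primer_combinations(len(p)) for p in pools)

print(single_reaction, split_reactions)  # 2881200 961200
```

Under this simplified assumption, three 400-plex pools together contain about one third of the primer-primer combinations present in a single 1,200-plex reaction.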
Semi-nested mini-PCR (see FIG. 2): After STA 1, a second STA is performed comprising a multiplex set of internal nested forward primers (103 B, 105 b) and one (or a few) tag-specific reverse primers (103 A). (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA that has been universally amplified, with forward primer B and reverse primer A hybridized. (104) denotes the PCR product obtained from (103). (105) denotes the product from (104) with the nested forward primer b hybridized, and with reverse tag A already part of the molecule from the PCR that occurred between (103) and (104). (106) denotes the final PCR product. With this workflow, typically more than 95% of the sequences map to the intended targets. The nested primer may overlap with the outer forward primer sequence but introduces additional 3'-end bases. In some embodiments, it is possible to use between 1 and 20 extra 3' bases. Experiments have shown that using 9 or more extra 3' bases works well in a 1,200-plex design.
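The relationship between the outer forward primer and the nested forward primer can be sketched as follows. This is a minimal illustration only: the template sequence, the coordinates, and the function name are hypothetical, and real primer design would also consider melting temperature and primer-primer interactions.

```python
def nested_forward_primer(template, outer_start, outer_len, extra_3prime, nested_len=None):
    """Return (outer, nested) forward primers read from the top strand of `template`.

    The nested primer ends `extra_3prime` bases beyond the 3' end of the outer
    primer, so it may overlap the outer primer while adding new 3' bases.
    """
    if nested_len is None:
        nested_len = outer_len
    outer_end = outer_start + outer_len
    nested_end = outer_end + extra_3prime
    nested_start = nested_end - nested_len
    if nested_start < 0 or nested_end > len(template):
        raise ValueError("nested primer falls outside the template")
    return template[outer_start:outer_end], template[nested_start:nested_end]


# Hypothetical 45-base template; 20-base outer primer; 9 extra 3' bases.
template = "ACGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGATCCATG"
outer, nested = nested_forward_primer(template, outer_start=0, outer_len=20, extra_3prime=9)

print(outer)
print(nested)
```

Here the last 9 bases of the nested primer lie beyond the outer primer, while its first 11 bases overlap the 3' portion of the outer primer.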
Fully nested mini-PCR (see FIG. 3): After STA step 1, it is possible to perform a second multiplex PCR (or parallel multiplex PCRs of reduced complexity) with two nested primers carrying tags (A, a, B, b). (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA that has been universally amplified, with forward primer B and reverse primer A hybridized. (104) denotes the PCR product obtained from (103). (105) denotes the product from (104) with nested forward primer b and nested reverse primer a hybridized. (106) denotes the final PCR product. In some embodiments, it is possible to use two complete sets of primers. In experiments using the fully nested mini-PCR protocol, 146-plex amplification was performed on single cells and on three cells without step (102) of attaching universal ligation adapters and amplifying.
Hemi-nested mini-PCR (see FIG. 4): It is possible to use target DNA that has adapters at the fragment ends. An STA is performed comprising a multiplex set of forward primers (B) and one (or a few) tag-specific reverse primers (A). A second STA can be performed using a universal tag-specific forward primer and a target-specific reverse primer. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA that has been universally amplified, with reverse primer A hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer A and the ligation adapter tag primer LT. (105) denotes the product from (104) with forward primer B hybridized. (106) denotes the final PCR product. In this workflow, the target-specific forward and reverse primers are used in separate reactions, which reduces the complexity of the reaction and prevents dimer formation between forward and reverse primers. In this example, primers A and B can be considered first primers, and primers 'a' and 'b' can be considered inner primers. This method is a big improvement over direct PCR: it is as good as direct PCR but avoids primer dimers. After the first round of the hemi-nested protocol, one typically sees about 99% non-targeted DNA; however, there is typically a big improvement after the second round.
Triply hemi-nested mini-PCR (see FIG. 5): It is possible to use target DNA that has adapters at the fragment ends. An STA is performed comprising a multiplex set of forward primers (B) and one (or a few) tag-specific reverse primers (A) and (a). A second STA can be performed using a universal tag-specific forward primer and a target-specific reverse primer. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA that has been universally amplified, with reverse primer A hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer A and the ligation adapter tag primer LT. (105) denotes the product from (104) with forward primer B hybridized. (106) denotes the PCR product from (105) that was amplified using reverse primer A and forward primer B. (107) denotes the product from (106) with reverse primer 'a' hybridized. (108) denotes the final PCR product. In this example, primers 'a' and B can be considered inner primers, and A can be considered a first primer. Alternatively, both A and B can be considered first primers and 'a' can be considered an inner primer. The designation of reverse and forward primers can be switched. In this workflow, the target-specific forward and reverse primers are used in separate reactions, which reduces the complexity of the reaction and prevents dimer formation between forward and reverse primers. This method is a big improvement over direct PCR: it is as good as direct PCR but avoids primer dimers. After the first round of the hemi-nested protocol, one typically sees about 99% non-targeted DNA; however, there is typically a big improvement after the second round.
One-sided nested mini-PCR (see FIG. 6): It is possible to use target DNA that has adapters at the fragment ends. STA can be performed with a multiplex set of nested forward primers, using the ligation adapter tag as the reverse primer. A second STA can then be performed using a set of nested forward primers and a universal reverse primer. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA that has been universally amplified, with reverse primer A hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer A and the ligation adapter tag primer LT. (105) denotes the product from (104) with the nested forward primer hybridized. (106) denotes the final PCR product. By using overlapping primers in the first and second STAs, this method can detect shorter target sequences than standard PCR. The method is typically performed on a sample of DNA that has already undergone STA step 1, that is, attachment of universal tags and amplification; the two nested primers are on one side only, and the other side uses the library tag. The method was performed on libraries of apoptotic supernatants and pregnancy plasma. With this workflow, approximately 60% of the sequences mapped to the intended targets. Note that reads containing the reverse adapter sequence were not mapped, so this number is expected to be higher if those reads are mapped.
One-sided mini-PCR: It is possible to use target DNA that has adapters at the fragment ends (see FIG. 7). STA can be performed with a multiplex set of forward primers and one (or a few) tag-specific reverse primers. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA with forward primer A hybridized. (104) denotes the PCR product from (103) that was amplified using forward primer A and the ligation adapter tag reverse primer LT. This method can detect target sequences that are shorter than those detectable by standard PCR. However, it may be relatively non-specific, since only one target-specific primer is used. The protocol is effectively half of the one-sided nested mini-PCR.
Reverse semi-nested mini-PCR: It is possible to use target DNA that has adapters at the fragment ends (see FIG. 8). STA can be performed with a multiplex set of forward primers and one (or a few) tag-specific reverse primers. (101) denotes double-stranded DNA with a polymorphic locus of interest at X. (102) denotes the double-stranded DNA with ligation adapters added for universal amplification. (103) denotes the single-stranded DNA with reverse primer B hybridized. (104) denotes the PCR product from (103) that was amplified using reverse primer B and the ligation adapter tag forward primer LT. (105) denotes the PCR product (104) with forward primer A and inner reverse primer 'b' hybridized. (106) denotes the PCR product amplified from (105) using forward primer A and reverse primer 'b', which is the final PCR product. This method can detect target sequences that are shorter than those detectable by standard PCR.
There may also be further variants that are simply iterations or combinations of the above methods, such as doubly nested PCR, in which three sets of primers are used. Another variant is one-and-a-half-sided nested mini-PCR, in which STA can also be performed with a multiplex set of nested forward primers and one (or a few) tag-specific reverse primers.
Note that in all of these variants, the identities of the forward primer and the reverse primer can be interchanged. In some embodiments, the nested variants can be run equally well without the initial library preparation, which comprises attaching the adapter tags and a universal amplification step. In some embodiments, additional rounds of PCR may be included, with additional forward and/or reverse primers and amplification steps; these additional steps may be particularly useful when it is desirable to further increase the percentage of DNA molecules that correspond to the targeted loci.
Nested Workflow
There are many ways to perform the amplification, with different degrees of nesting and different degrees of multiplexing. FIG. 9 provides a flow chart with some of the possible workflows. The use of 10,000-plex PCR is meant only as an example; these flow charts would work equally well for other degrees of multiplexing.
Looped Ligation Adapters
When adding universal tagged adapters, for example for the purpose of making a library for sequencing, there are a number of ways to ligate the adapters. One method is to blunt-end the sample DNA, perform A-tailing, and ligate it to adapters that have a T-overhang. There are a number of other ways to ligate adapters. There are also a number of adapters that can be ligated. For example, a Y-adapter can be used, in which the adapter consists of two strands of DNA: one strand has a double-strand region and a region specified by a forward primer region, and the other strand has a double-strand region that is complementary to the double-strand region on the first strand and a region with a reverse primer. The double-stranded region, when annealed, may contain a T-overhang for the purpose of ligating to double-stranded DNA with an A-overhang.
In one embodiment, the adapter may be a loop of DNA in which the terminal regions are complementary and in which the loop region contains a forward primer tagged region (LFT), a reverse primer tagged region (LRT), and a cleavage site between the two (see FIG. 10). (101) refers to the double-stranded, blunt-ended target DNA. (102) refers to the A-tailed target DNA. (103) refers to the looped ligation adapter with T-overhang 'T' and cleavage site 'Z'. (104) refers to the target DNA with the looped ligation adapters attached. (105) refers to the target DNA with the ligation adapters attached and cleaved at the cleavage site. LFT refers to the ligation adapter forward tag, and LRT refers to the ligation adapter reverse tag. The complementary region may end in a T-overhang or in another feature that can be used for ligation to the target DNA. The cleavage site may be a series of uracils for cleavage by UNG, or a sequence that can be recognized and cleaved by a restriction enzyme or other method of cleavage, or simply by a basic amplification. These adapters can be used for any library preparation, for example for sequencing. These adapters can be used in combination with any of the other methods described herein, for example the mini-PCR amplification methods.
Internally Tagged Primers
When sequencing is used to determine the allele present at a given polymorphic locus, the sequence read typically begins upstream of the primer binding site (a) and continues to the polymorphic site (X). Tags are typically configured as shown in FIG. 11, left. (101) refers to the single-stranded target DNA with the polymorphic locus of interest 'X' and primer 'a' with the attached tag 'b'. To avoid nonspecific hybridization, the primer binding site (the region of target DNA complementary to 'a') is typically 18 to 30 bp in length. Sequence tag 'b' is typically about 20 bp; in theory it can be any length longer than about 15 bp, although many people use the primer sequences sold by the sequencing platform company. The distance 'd' between 'a' and 'X' can be at least 2 bp so as to avoid allele bias. When performing multiplexed PCR amplification using the methods disclosed herein or other methods, where careful primer design is necessary to avoid excessive primer-primer interaction, the window of allowable distance 'd' between 'a' and 'X' can vary considerably: from 2 bp to 10 bp, from 2 bp to 20 bp, from 2 bp to 30 bp, or even from 2 bp to more than 30 bp. Therefore, when using the primer configuration shown in FIG. 11, left, sequence reads must be at least 40 bp to be long enough to measure the polymorphic locus, and depending on the lengths of 'a' and 'd', the sequence reads may need to be up to 60 or 75 bp. Generally, the longer the sequence reads, the higher the cost and time of sequencing a given number of reads, so minimizing the required read length can save both time and money. In addition, since bases read earlier in a read are, on average, read more accurately than bases read later in the read, decreasing the required sequence read length can also increase the accuracy of the measurement of the polymorphic region.
In one embodiment, termed internally tagged primers, the primer binding site (a) is split into a plurality of segments (a', a", a'", ...), and the sequence tag (b) is on a segment of DNA located between two of the primer binding segments, as shown in FIG. 11, (103). This configuration allows the sequencer to make shorter sequence reads. In one embodiment, a' + a" should be at least about 18 bp and can be as long as 30, 40, 50, 60, 80, 100, or more than 100 bp. In one embodiment, a" is about 8 to 16 bp. All other factors being equal, using internally tagged primers can cut the required length of the sequence reads by at least 6 bp, by as much as 8 bp, 10 bp, 12 bp, or 15 bp, and even by as much as 20 or 30 bp. This can yield significant cost, time, and accuracy advantages. An example of internally tagged primers is given in FIG. 12.
Primers with a Ligation Adapter Binding Region
One issue with fragmented DNA is that, because it is short, the chance that a polymorphism lies close to the end of a DNA strand is higher than for a long strand (e.g., (101), FIG. 10). Since PCR capture of a polymorphism requires a primer binding site of suitable length on both sides of the polymorphism, a significant number of DNA strands carrying the targeted polymorphism will be missed because of insufficient overlap between the primer and the targeted binding site. In one embodiment, the target DNA (101) can have ligation adapters attached (102), and the target primer (103) can have a region (cr) that is complementary to the ligation adapter tag (lt), attached upstream of the designed binding region (a) (see FIG. 13). Thus, where the binding region (the region of (101) that is complementary to a) is shorter than the 18 bp typically required for hybridization, the region (cr) of the primer that is complementary to the library tag can increase the binding energy to a point where PCR can proceed. Note that any specificity lost because of a shorter binding region can be made up for by other PCR primers with suitably long target binding regions. This embodiment can be used in combination with direct PCR or with any of the other methods described herein, for example nested PCR, semi-nested PCR, hemi-nested PCR, one-sided nested or semi- or hemi-nested PCR, or other PCR protocols.
When sequencing data are used to determine ploidy with an analytical method that compares the observed allele data with the allele distributions expected under various hypotheses, each additional read from an allele with a low depth of read yields more information than a read from an allele with a high depth of read. Ideally, therefore, one would wish to see a uniform depth of read (DOR), in which each locus has a similar number of representative sequence reads. It is therefore desirable to minimize the DOR variance. In one embodiment, it is possible to reduce the coefficient of variation of the DOR (which can be defined as the standard deviation of the DOR divided by the average DOR) by increasing the annealing time. In some embodiments, the annealing time may be longer than 2 minutes, longer than 4 minutes, longer than 10 minutes, longer than 30 minutes, longer than one hour, or even longer. Since annealing is an equilibrium process, there is no limit to the improvement in DOR variance with increasing annealing time. In one embodiment, increasing the primer concentration can reduce the DOR variance.
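The coefficient of variation defined above, and the diminishing value of additional reads at high depth, can be sketched as follows. This is an illustrative sketch only: it assumes the population standard deviation, and it uses the binomial standard error of an observed allele fraction as a simple stand-in for the information carried by a read; neither choice is specified in the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def dor_coefficient_of_variation(depths):
    """Standard deviation of the DOR divided by the average DOR
    (population standard deviation is an assumption of this sketch)."""
    average = mean(depths)
    if average == 0:
        raise ValueError("average depth of read is zero")
    return pstdev(depths) / average


def allele_fraction_standard_error(p, depth):
    """Binomial standard error of an observed allele fraction p at a given depth."""
    return sqrt(p * (1 - p) / depth)


uniform = [100, 100, 100, 100]   # hypothetical per-locus read counts
skewed = [10, 50, 100, 240]      # same average depth, very uneven

print(dor_coefficient_of_variation(uniform))  # 0.0
print(round(dor_coefficient_of_variation(skewed), 3))

# One extra read reduces the error far more at low depth than at high depth.
gain_low = allele_fraction_standard_error(0.5, 10) - allele_fraction_standard_error(0.5, 11)
gain_high = allele_fraction_standard_error(0.5, 1000) - allele_fraction_standard_error(0.5, 1001)
print(gain_low > gain_high)  # True
```

Both depth profiles have the same average depth, but only the uniform one has a coefficient of variation of zero, which is the profile the paragraph above describes as ideal.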
Exemplary Whole Genome Amplification Methods
In some embodiments, the methods of the invention can include amplifying DNA, for example using whole genome amplification to amplify a nucleic acid sample before amplifying just the target loci. Amplification of DNA, a process that transforms a small amount of genetic material into a larger amount of genetic material comprising a similar set of genetic data, can be performed by a wide variety of methods including, but not limited to, the polymerase chain reaction (PCR). One method of amplifying DNA is whole genome amplification (WGA). A number of methods are available for WGA: ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), and multiple displacement amplification (MDA). In LM-PCR, short DNA sequences called adapters are ligated to the blunt ends of the DNA. These adapters contain universal amplification sequences, which are used to amplify the DNA by PCR. In DOP-PCR, random primers that also contain universal amplification sequences are used in a first round of annealing and PCR. A second round of PCR is then used to amplify the sequences further with the universal primer sequences. MDA uses the phi-29 polymerase, a highly processive and non-specific enzyme that replicates DNA and has been used for single-cell analysis. The major limitations on the amplification of material from a single cell are (1) the need to use extremely dilute DNA concentrations or an extremely small volume of reaction mixture, and (2) the difficulty of reliably dissociating the DNA from proteins across the whole genome. Nevertheless, single-cell whole genome amplification has been used successfully for a variety of applications for a number of years. There are other methods of amplifying DNA from a sample of DNA. DNA amplification transforms the initial sample of DNA into a sample of DNA that is similar in its set of sequences but much greater in quantity. In some cases, amplification may not be required.
In some embodiments, the DNA can be amplified using a universal amplification, such as WGA or MDA. In some embodiments, the DNA can be amplified by targeted amplification, e.g., using targeted PCR or circularizing probes. In some embodiments, the DNA can be preferentially enriched using a targeted amplification method, or a method that results in the full or partial separation of desired DNA from undesired DNA, such as capture by hybridization approaches. In some embodiments, the DNA can be amplified using a combination of a universal amplification method and a preferential enrichment method. A fuller description of some of these methods can be found elsewhere in this document.
Exemplary enrichment and sequencing methods
In one embodiment, the methods disclosed herein use selective enrichment techniques that preserve the relative allele frequencies present in the original sample of DNA at each target locus (e.g., each polymorphic locus) of a set of target loci (e.g., polymorphic loci). Although enrichment is particularly advantageous for methods of analyzing polymorphic loci, these enrichment methods can be readily adapted to non-polymorphic loci if desired. In some embodiments, the amplification and/or selective enrichment techniques may involve PCR, such as ligation-mediated PCR, fragment capture by hybridization, molecular inversion probes, or other circularizing probes. In some embodiments, the method for amplification or selective enrichment may involve using probes in which, upon correct hybridization to the target sequence, the 3-prime or 5-prime end of the nucleotide probe is separated from the polymorphic site of the allele by a small number of nucleotides. This separation reduces preferential amplification of one allele, termed allele bias. This is an improvement over methods that use probes in which the 3-prime or 5-prime end of a correctly hybridized probe is directly adjacent to, or very near, the polymorphic site of the allele. In one embodiment, probes whose hybridizing region may contain, or certainly contains, a polymorphic site are excluded. Polymorphic sites within the hybridization site can cause unequal hybridization, or inhibit hybridization altogether in some alleles, resulting in preferential amplification of certain alleles. These embodiments are improvements over other methods that involve targeted amplification and/or selective enrichment in that they better preserve the original allele frequencies of the sample at each polymorphic locus, whether the sample is a pure genomic sample from a single individual or a mixture of individuals.
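The two probe-selection rules described above (keep the probe end a few nucleotides away from the targeted polymorphic site, and exclude probes whose hybridizing region contains a polymorphic site) can be expressed as a simple filter. This is a hedged sketch: the coordinate convention, the function name, and the default minimum gap of 2 nucleotides are assumptions made for illustration, not values prescribed for probes by the text.

```python
def probe_passes_allele_bias_filter(probe_start, probe_end, target_site,
                                    known_polymorphic_sites, min_gap=2):
    """Return True if a probe hybridizing to positions [probe_start, probe_end)
    satisfies both rules.

    Rule 1: no known polymorphic site lies inside the hybridizing region.
    Rule 2: at least `min_gap` nucleotides separate the nearest probe end
            from the targeted polymorphic site.
    """
    if any(probe_start <= site < probe_end for site in known_polymorphic_sites):
        return False
    if target_site >= probe_end:
        gap = target_site - probe_end          # bases between probe 3' end and site
    elif target_site < probe_start:
        gap = probe_start - target_site - 1    # bases between site and probe 5' end
    else:
        return False                           # target site inside the probe
    return gap >= min_gap


# Hypothetical coordinates: probe covers positions 100..119.
print(probe_passes_allele_bias_filter(100, 120, 122, {122}))       # True
print(probe_passes_allele_bias_filter(100, 120, 120, {120}))       # False
print(probe_passes_allele_bias_filter(100, 120, 122, {110, 122}))  # False
```

The first probe leaves two nucleotides before the target site; the second is directly adjacent to it; the third overlaps another known polymorphic site and is excluded.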
The use of a technique for enriching a sample of DNA at a set of target loci, followed by sequencing, as part of a method for non-invasive prenatal allele calling or ploidy calling may confer a number of unexpected advantages. In some embodiments of the present application, the method involves measuring genetic data for use with an informatics-based method, such as PARENTAL SUPPORT™ (PS). The ultimate outcome of some of the embodiments is actionable genetic data of an embryo or fetus. There are many methods that can be used to measure the genetic data of the individual and/or of related individuals as part of the embodied methods. In one embodiment, a method of enriching the concentration of a set of targeted alleles is disclosed herein, the method comprising one or more of the following steps: targeted amplification of genetic material, addition of locus-specific oligonucleotide probes, ligation of specified DNA strands, isolation of sets of desired DNA, removal of unwanted components of a reaction, detection of certain sequences of DNA by hybridization, and detection of the sequence of one or a plurality of strands of DNA by DNA sequencing methods. In some cases the DNA strands may refer to target genetic material, in some cases to primers, in some cases to synthesized sequences, or to combinations thereof. These steps may be carried out in a number of different orders.
For example, a universal amplification step of the DNA prior to targeted amplification can confer several advantages, such as removing the risk of bottlenecking and reducing allelic bias. The DNA can be mixed with an oligonucleotide probe that can hybridize to two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe can be connected by adding a polymerase, a means for ligation, and any necessary reagents to allow circularization of the probe. After circularization, an exonuclease may be added to digest the non-circularized genetic material, followed by detection of the circularized probe. The DNA can be mixed with PCR primers that can hybridize to two neighboring regions of the target sequence, one on either side. After hybridization, the ends of the probe can be connected by adding a polymerase, a means for ligation, and any necessary reagents to complete the PCR amplification. Amplified or unamplified DNA can be targeted by hybrid capture probes that target a set of loci; after hybridization, the probes can be localized and separated from the mixture to provide a mixture of DNA that is enriched in the target sequences.
The use of a method for targeting certain loci, followed by sequencing, as part of a method for allele calling or ploidy calling can confer a number of unexpected advantages. Some methods by which DNA can be targeted or preferentially enriched include the use of circularizing probes, linked inverted probes (LIPs, MIPs), capture by hybridization methods such as SURESELECT, and targeted PCR or ligation-mediated PCR amplification strategies.
In some embodiments, the methods of the present invention involve measuring genetic data for use with an informatics-based method, such as PARENTAL SUPPORT™ (PS). PARENTAL SUPPORT™ is an informatics-based approach to manipulating genetic data, aspects of which are described herein. The ultimate outcome of some of the embodiments is actionable genetic data of an embryo or fetus, followed by a clinical decision based on the actionable data. The algorithms behind the PS method take the measured genetic data of the target individual, often an embryo or fetus, and the measured genetic data of related individuals, and can increase the accuracy with which the genetic state of the target individual is known. In one embodiment, the measured genetic data are used in the context of making ploidy determinations during prenatal genetic diagnosis. In one embodiment, the measured genetic data are used in the context of making ploidy determinations or allele calls on embryos during in vitro fertilization. There are many methods that can be used to measure the genetic data of the individual and/or of related individuals in the aforementioned contexts. The different methods comprise a number of steps, which often involve amplification of genetic material, addition of oligonucleotide probes, ligation of specified DNA strands, isolation of sets of desired DNA, removal of unwanted components of a reaction, detection of certain sequences of DNA by hybridization, and detection of the sequence of one or a plurality of strands of DNA by DNA sequencing methods. In some cases the DNA strands may refer to target genetic material, in some cases to primers, in some cases to synthesized sequences, or to combinations thereof. These steps may be carried out in a number of different orders.
Note that it is possible to target any number of loci in the genome, anywhere from one locus to well over one million loci. When a sample of DNA is subjected to targeting and then sequenced, the percentage of the alleles read by the sequencer will be enriched relative to their natural abundance in the sample. The degree of enrichment can be anywhere from 1 percent (or even less) to 10-fold, 100-fold, 1,000-fold, or even many million-fold. The human genome contains approximately 3 billion base pairs, or nucleotides, comprising approximately 75 million polymorphic loci. The more loci that are targeted, the smaller the degree of enrichment that is possible. The fewer loci that are targeted, the greater the degree of enrichment that is possible, and the greater the depth of read that can be achieved at those loci for a given number of sequence reads.
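The trade-off between the number of targeted loci, the achievable enrichment, and the depth of read can be illustrated with a back-of-the-envelope sketch. It assumes an idealized case in which the upper bound on enrichment is reached when every read maps to a target, uses the approximate genome size given above, and treats the 50 bp target size, the read count, and the on-target fraction as hypothetical inputs.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size stated in the text


def max_fold_enrichment(n_loci, bp_per_locus, genome_bp=GENOME_BP):
    """Upper bound on fold enrichment if all reads came from the targeted bases."""
    return genome_bp / (n_loci * bp_per_locus)


def mean_depth_of_read(total_reads, on_target_fraction, n_loci):
    """Average number of reads per locus, assuming reads spread evenly over loci."""
    return total_reads * on_target_fraction / n_loci


# Hypothetical designs with 50 bp targets.
print(max_fold_enrichment(1_000, 50))     # 60000.0
print(max_fold_enrichment(100_000, 50))   # 600.0

# Hypothetical run: 10 million reads, 95% on target.
print(round(mean_depth_of_read(10_000_000, 0.95, 10_000)))   # 950
print(round(mean_depth_of_read(10_000_000, 0.95, 100_000)))  # 95
```

Targeting one hundred times more loci lowers the enrichment ceiling one hundred-fold in this idealized model, and a tenfold increase in loci cuts the average depth of read tenfold for the same number of reads.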
In one embodiment herein, the targeting or preferential enrichment may focus entirely on SNPs. In one embodiment, the targeting or preferential enrichment may focus on any polymorphic site. A number of commercial targeting products are available for enriching exons. Surprisingly, targeting exclusively SNPs, or exclusively polymorphic loci, is particularly advantageous when using a method for NPD that relies on allele distributions. There are also published methods for NPD using sequencing, for example U.S. Patent No. 7,888,017, that involve a read count analysis in which the counting focuses on the number of reads that map to a given chromosome, and in which the analyzed sequence reads do not focus on regions of the genome that are polymorphic. Methods of that type, which do not focus on polymorphic alleles, would not benefit as much from targeting or preferential enrichment of a set of alleles.
In embodiments of the invention, it is possible to use a targeting method that focuses on SNPs to enrich a genetic sample in polymorphic regions of the genome. In one embodiment, it is possible to focus on a small number of SNPs, for example from 1 to 100 SNPs, or on a larger number, for example from 100 to 1,000, from 1,000 to 10,000, from 10,000 to 100,000, or more than 100,000 SNPs. In one embodiment, it is possible to focus on one or a small number of chromosomes that are correlated with live trisomic births, for example chromosomes 13, 18, 21, X and Y, or some combination thereof. In one embodiment, it is possible to enrich the targeted SNPs by a small factor, for example 1.01-fold to 100-fold, or by a larger factor, for example 100-fold to 1,000,000-fold, or even by more than 1,000,000-fold. In one embodiment of the invention, it is possible to use a targeting method to create a sample of DNA that is preferentially enriched in polymorphic regions of the genome. In one embodiment, it is possible to use this method to create a mixture of DNA with any of these characteristics, where the mixture of DNA contains maternal DNA and also free-floating fetal DNA. In one embodiment, it is possible to use this method to create a mixture of DNA that has any combination of these factors. For example, the methods described herein can be used to produce a mixture of DNA that comprises both maternal DNA and fetal DNA and that is preferentially enriched in DNA corresponding to 200 SNPs, all of which are located on chromosome 18 or 21. In another example, it is possible to use the method to create a mixture of DNA that is preferentially enriched in 10,000 SNPs, all or most of which are located on chromosomes 13, 18, 21, X and Y, with an average enrichment per locus of 500-fold or more. Any of the targeting methods described herein can be used to create mixtures of DNA that are preferentially enriched in particular loci.
In some embodiments, the methods further comprise measuring the DNA in the mixed fraction using a high-throughput DNA sequencer, wherein the DNA in the mixed fraction contains a disproportionate number of sequences from one or more chromosomes, and wherein the one or more chromosomes are taken from the group comprising chromosome 13, chromosome 18, chromosome 21, chromosome X, chromosome Y, and combinations thereof.
Three methods are described herein: multiplex PCR, targeted capture by hybridization, and linked inverted probes (LIPs). These may be used to obtain and analyze measurements from a sufficient number of polymorphic loci in a maternal plasma sample in order to detect fetal aneuploidy; this is not meant to exclude other methods of selective enrichment of targeted loci. Other methods can be used equally well without changing the essence of the method. In each case the polymorphism assayed may include single nucleotide polymorphisms (SNPs), small indels, or STRs. A preferred method involves the use of SNPs. Each approach produces allele frequency data; the allele frequency data for each targeted locus and/or the joint allele frequency distributions from these loci can be analyzed to determine the ploidy of the fetus. Each approach has its own considerations due to the limited source material and the fact that maternal plasma consists of a mixture of maternal and fetal DNA. The method may be combined with other approaches to provide a more accurate determination. In one embodiment, the method may be combined with a sequence counting approach such as that described in U.S. Patent No. 7,888,017. The approaches described can also be used to determine fetal paternity non-invasively from maternal plasma samples. In addition, each approach can be applied to other mixtures of DNA or to pure DNA samples to detect the presence or absence of aneuploid chromosomes, to genotype a large number of SNPs from degraded DNA samples, to detect segmental copy number variations (CNVs), to detect other genotypic states of interest, or some combination thereof.
Precise measurement of allele distribution in samples
Current sequencing approaches can be used to estimate the distribution of alleles in a sample. One such method involves randomly sampling sequences from a pool of DNA, termed shotgun sequencing. The proportion of a particular allele in the sequencing data is typically very low and can be determined by simple statistics. The human genome contains approximately 3 billion base pairs. Thus, if the sequencing method used produces 100 bp reads, a particular allele will be measured approximately once in every 30 million sequence reads.
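The arithmetic behind this estimate can be sketched as follows, assuming a genome of roughly 3 billion base pairs (the figure given earlier in this description) and reads that start uniformly at random along the genome. The function names and the uniform-sampling assumption are illustrative only.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size


def reads_per_hit(read_length_bp: int,
                  genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Expected number of shotgun reads per read that covers one given position.

    Illustrative assumption: reads start uniformly at random, so the chance
    that a read covers a given base is about read_length / genome_size.
    """
    return genome_size_bp / read_length_bp


def informative_fraction(n_loci: int, read_length_bp: int,
                         genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Approximate fraction of shotgun reads covering any of n_loci positions.

    Assumes the loci are far enough apart that no read covers two of them.
    """
    return n_loci * read_length_bp / genome_size_bp


if __name__ == "__main__":
    print(f"{reads_per_hit(100):,.0f} reads per hit at one locus")
    print(f"{informative_fraction(10_000, 100):.6f} of reads hit 10,000 loci")
```

With 100 bp reads the first function returns 30 million, matching the estimate in the text; the second shows why untargeted sequencing spends almost all reads away from the loci of interest.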
In one embodiment, the method is used to determine the presence or absence of two or more different haplotypes that contain the same set of loci in a sample of DNA, from the measured allele distributions of loci on that chromosome. The different haplotypes may represent two different homologous chromosomes from one individual; three different homologous chromosomes from a trisomic individual; three different homologous haplotypes from a mother and a fetus, where one of the haplotypes is shared between the mother and the fetus; three or four haplotypes from a mother and a fetus, where one or two of the haplotypes are shared between the mother and the fetus; or other combinations. Alleles that are polymorphic between the haplotypes tend to be more informative; however, any allele for which the mother and father are not both homozygous for the same allele will yield useful information through the measured allele distributions, beyond the information available from simple read count analysis.
Shotgun sequencing of such a sample, however, is extremely inefficient, because it yields many sequences for regions that are not polymorphic between the different haplotypes in the sample, or that lie on chromosomes that are not of interest, and that therefore reveal no information about the proportion of the target haplotypes. Described herein are methods that specifically target and/or preferentially enrich segments of DNA in the sample that are more likely to be polymorphic in the genome, in order to increase the yield of allelic information obtained by sequencing. Note that for the allele distributions measured in an enriched sample to be truly representative of the actual amounts present in the target individual, it is critical that there be little or no preferential enrichment of one allele over the other alleles at a given locus in the targeted segments. Current methods known in the art for targeting polymorphic alleles are designed to ensure that at least some of any alleles present are detected. However, these methods were not designed to measure the unbiased allele distributions of the polymorphic alleles present in the original mixture. It is not obvious that any particular method of target enrichment would produce an enriched sample in which the measured allele distributions represent the allele distributions present in the original unamplified sample more accurately than any other method. While many enrichment methods might be expected, in theory, to accomplish this aim, one of ordinary skill in the art is well aware that there is a great deal of stochastic or deterministic bias in current amplification, targeting, and other preferential enrichment methods. One embodiment of the methods described herein allows a plurality of alleles found in a mixture of DNA and corresponding to a given locus in the genome to be amplified, or preferentially enriched, in such a way that the degree of enrichment of each of the alleles is nearly the same. 
Another way of saying this is that the method allows the relative quantity of the alleles present in the mixture as a whole to be increased, while the ratio between the alleles corresponding to each locus remains essentially the same as it was in the original mixture of DNA. For some reported methods, preferential enrichment of loci can result in allelic biases of more than 1%, more than 2%, more than 5%, and even more than 10%. This preferential enrichment may be due to capture bias when using a capture by hybridization approach, or to amplification bias, which may be small for each cycle but can become large when compounded over 20, 30 or 40 cycles. For purposes herein, for the ratio to remain essentially the same means that the ratio of the alleles in the original mixture divided by the ratio of the alleles in the resulting mixture is between 0.95 and 1.05, between 0.98 and 1.02, between 0.99 and 1.01, between 0.995 and 1.005, between 0.998 and 1.002, between 0.999 and 1.001, or between 0.9999 and 1.0001. Note that the calculation of allele ratios presented here may not be used in determining the ploidy state of the target individual, and may serve only as a metric for measuring allelic bias.
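The allelic bias metric described above, and the way a small per-cycle amplification bias compounds, can be sketched as follows. The function names are hypothetical, and the compounding model (one allele amplified a constant factor more efficiently than the other in every cycle) is an assumption made for illustration rather than a model stated in this description.

```python
def allele_bias(orig_a: float, orig_b: float,
                enriched_a: float, enriched_b: float) -> float:
    """Original allele ratio divided by enriched allele ratio (1.0 = unbiased)."""
    return (orig_a / orig_b) / (enriched_a / enriched_b)


def within_tolerance(bias: float, tol: float = 0.05) -> bool:
    """True if the bias lies in [1 - tol, 1 + tol], e.g. 0.95 to 1.05."""
    return 1.0 - tol <= bias <= 1.0 + tol


def compounded_bias(per_cycle_advantage: float, cycles: int) -> float:
    """Relative over-representation of one allele after a number of cycles.

    Assumed model: in each cycle one allele is copied (1 + advantage)
    times as efficiently as the other.
    """
    return (1.0 + per_cycle_advantage) ** cycles


if __name__ == "__main__":
    # A 50:50 original mixture measured as 520:480 after enrichment.
    bias = allele_bias(50, 50, 520, 480)
    print(f"bias = {bias:.3f}, within 0.95-1.05: {within_tolerance(bias)}")
    # A 0.1% per-cycle advantage over 20, 30 and 40 cycles.
    for n in (20, 30, 40):
        print(n, f"{compounded_bias(0.001, n):.4f}")
```

Under this assumed model, even a 0.1% per-cycle advantage exceeds the tighter tolerance bands listed above after 30 cycles.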
In one embodiment, once the mixture has been preferentially enriched at the set of target loci, it can be sequenced using any of the previous, current, or next generation of sequencing instruments that sequence a clonal sample (a sample generated from a single molecule; examples include ILLUMINA GAIIx, ILLUMINA HiSeq, Life Technologies SOLiD, 5500XL). The ratios can be evaluated by sequencing through the specific alleles within the targeted region. These sequencing reads can be analyzed and counted according to allele type, and the ratios of the different alleles determined accordingly. For variations that are one to a few bases in length, detection of the alleles will be performed by sequencing, and it is essential that the sequencing read span the allele in question in order to evaluate the allelic composition of the captured molecule. The total number of captured molecules assayed for the genotype can be increased by increasing the length of the sequencing read. Full sequencing of all molecules would guarantee collection of the maximum amount of data available in the enriched pool. However, sequencing is currently expensive, and a method that can measure allele distributions using a smaller number of sequence reads would have great value. In addition, there are technical limits on the maximum possible read length, as well as accuracy limitations as read lengths increase. The alleles of greatest utility will be one to a few bases in length, but in theory any allele shorter than the length of the sequencing read can be used. Although allele variations come in all types, the examples provided herein focus on SNPs or on variants comprising just a few neighboring base pairs. Larger variants, such as segmental copy number variants, can in many cases be detected by aggregating these smaller variations, since the whole collection of SNPs internal to the segment is duplicated. 
Variants larger than a few bases, such as STRs, require special consideration; some targeting approaches will work for them while others will not.
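As a minimal sketch of counting reads by allele type, the following example tallies the base observed at a SNP position, using only reads that span that position. The read representation (a 0-based reference start plus an ungapped read sequence) and the function names are assumptions made for illustration; real aligned reads would require handling of insertions, deletions, and base quality.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_counts(reads: Iterable[Tuple[int, str]], snp_pos: int) -> Counter:
    """Count the base seen at snp_pos across reads that span it.

    Each read is (start, sequence), where start is the 0-based reference
    coordinate of the first base and the alignment is assumed ungapped.
    """
    counts: Counter = Counter()
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):  # the read spans the SNP
            counts[seq[offset]] += 1
    return counts


def allele_fractions(counts: Counter) -> Dict[str, float]:
    """Convert allele counts into fractions of all spanning reads."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


if __name__ == "__main__":
    reads = [(0, "ACGTA"), (2, "GTAAC"), (3, "TCAC"), (10, "AAAA")]
    counts = allele_counts(reads, snp_pos=4)
    print(counts, allele_fractions(counts))
```

Reads that do not span the SNP, such as the last read in the example, contribute nothing to the measured allele distribution, which is why read span matters for the yield of allelic information.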
There are a number of targeting approaches that can be used to specifically isolate and enrich one or a plurality of variant positions in the genome. Typically, these rely on taking advantage of the invariant sequence flanking the variant sequence. There are reports by others related to targeting in the context of sequencing where the substrate is maternal plasma (see, for example, Liao et al., Clin. Chem. 2011; 57(1): pp. 92-101). However, these approaches use targeting probes that target exons, and do not focus on targeting polymorphic regions of the genome. In one embodiment, the methods of the invention involve using targeting probes that focus exclusively or almost exclusively on polymorphic regions. In one embodiment, the methods described herein involve using targeting probes that focus exclusively or almost exclusively on SNPs. In some embodiments of the disclosure, the targeted polymorphic sites consist of at least 10% SNPs, at least 20% SNPs, at least 30% SNPs, at least 40% SNPs, at least 50% SNPs, at least 60% SNPs, at least 70% SNPs, at least 80% SNPs, at least 90% SNPs, at least 95% SNPs, at least 98% SNPs, at least 99% SNPs, at least 99.9% SNPs, or exclusively SNPs.
In one embodiment, the method of the present invention can be used to determine the genotypes (base composition of the DNA at specific loci) and the relative proportions of those genotypes from a mixture of DNA molecules, where those DNA molecules may have originated from one or more genetically distinct individuals. In one embodiment, the methods of the invention can be used to determine the genotypes at a set of polymorphic loci and the relative ratios of the amounts of the different alleles present at those loci. In one embodiment, the polymorphic loci may consist entirely of SNPs. In one embodiment, the polymorphic loci may comprise SNPs, single tandem repeats, and other polymorphisms. In one embodiment, the method of the present invention can be used to determine the relative distributions of alleles at a set of polymorphic loci in a mixture of DNA, wherein the mixture of DNA comprises DNA that originates from a mother and DNA that originates from a fetus. In one embodiment, the joint allele distributions can be determined on a mixture of DNA isolated from the blood of a pregnant woman. In one embodiment, the allele distributions at a set of loci can be used to determine the ploidy state of one or more chromosomes of a gestating fetus.
In one embodiment, the mixture of DNA molecules can be derived from DNA extracted from multiple cells of one individual. In one embodiment, the original collection of cells from which the DNA is derived may comprise a mixture of diploid or haploid cells of the same or of different genotypes, if that individual is mosaic (germline or somatic). In one embodiment, the mixture of DNA molecules may also be derived from DNA extracted from a single cell. In one embodiment, the mixture of DNA molecules may also be derived from DNA extracted from a mixture of two or more cells of the same individual or of different individuals. In one embodiment, the mixture of DNA molecules can be derived from DNA isolated from biological material that has already been liberated from cells, such as blood plasma, which is known to contain cell-free DNA. In one embodiment, this biological material may be a mixture of DNA from one or more individuals, as is the case during pregnancy, where fetal DNA has been shown to be present in the mixture. In one embodiment, the biological material could be from a mixture of cells found in maternal blood, wherein some of the cells are of fetal origin. In one embodiment, the biological material could be cells from the blood of a pregnant woman that have been enriched in fetal cells.
Circularizing Probes
Some embodiments herein involve the use of "linked inverted probes" (LIPs), which have been described previously in the literature, to amplify the target loci either before or after amplification with non-LIP primers in a multiplex PCR method of the invention. LIPs is a generic term meant to encompass technologies that involve the creation of a circular molecule of DNA, in which the probes are designed to hybridize to the targeted region of DNA on either side of a targeted allele, such that the addition of appropriate polymerases and/or ligases, together with the appropriate conditions, buffers and other reagents, completes the complementary, inverted region of DNA across the targeted allele to create a circular loop of DNA that captures the information found in the targeted allele. LIPs may also be called pre-circularized probes, pre-circularizing probes, or circularizing probes. The LIP probe may be a linear DNA molecule of 50 to 500 nucleotides in length, and in one embodiment 70 to 100 nucleotides in length; in some embodiments it may be longer or shorter than described herein. Other embodiments of the present application involve different incarnations of the LIP technology, such as padlock probes and molecular inversion probes (MIPs).
One method of targeting specific locations for sequencing is to synthesize probes in which the 3' and 5' ends of the probe anneal to the target DNA at locations adjacent to, and on either side of, the targeted region, in an inverted manner, such that the addition of DNA polymerase and DNA ligase results in extension from the 3' end, adding bases to the single-stranded probe that are complementary to the target molecule (gap-fill), followed by ligation of the new 3' end to the 5' end of the original probe, producing a circular DNA molecule. The probe ends are designed to flank the targeted region of interest. One aspect of this approach is commonly called MIPs and has been used in conjunction with array technologies to determine the nature of the filled-in sequence. One drawback of using MIPs in the context of measuring allele ratios is that the hybridization, circularization and amplification steps do not occur at equal rates for different alleles at the same locus. This results in measured allele ratios that are not representative of the actual allele ratios present in the original mixture.
In one embodiment, the circularizing probes are constructed such that the region of the probe designed to hybridize upstream of the targeted polymorphic locus and the region of the probe designed to hybridize downstream of the targeted polymorphic locus are covalently connected through a non-nucleic acid backbone. The backbone may be any biocompatible molecule or combination of biocompatible molecules. Some examples of possible biocompatible molecules are poly(ethylene glycol), polycarbonates, polyurethanes, polyethylenes, polypropylenes, sulfone polymers, silicones, celluloses, fluoropolymers, acrylic compounds, styrene block copolymers, and other block copolymers.
In one embodiment herein, this approach has been modified to be readily amenable to sequencing as a means of interrogating the filled-in sequence. To retain the original allelic proportions of the original sample, at least one key consideration must be taken into account. The variable positions among the different alleles in the gap-fill region must not be too close to the probe binding sites, because initiation bias by the DNA polymerase can result in differential amplification of the variants. Another consideration is that additional variations may be present in the probe binding sites that are correlated with the variants in the gap-fill region, which can result in unequal amplification of different alleles. In one embodiment herein, the 3' and 5' ends of the pre-circularized probe are designed to hybridize to bases that are one or a few positions away from the variant positions (polymorphic sites) of the targeted allele. The number of bases between the polymorphic site (SNP or otherwise) and the base to which the 3' end and/or 5' end of the pre-circularized probe is designed to hybridize may be one base, two bases, three bases, four bases, five bases, six bases, seven to ten bases, sixteen to twenty bases, twenty to thirty bases, or thirty to sixty bases. The forward and reverse primers can be designed to hybridize a different number of bases away from the polymorphic site. Circularizing probes can be generated in large numbers with current DNA synthesis technology, which allows very large numbers of probes to be generated and potentially pooled, enabling many loci to be interrogated simultaneously. The approach has been reported to work with more than 300,000 probes. Two articles that discuss methods involving circularizing probes that can be used to measure the genomic data of a target individual are: Porreca et al., Nature Methods, 2007, 4(11), pp. 931-936; and Turner et al., Nature Methods, 2009, 6(5), pp. 315-316. 
The methods described in these articles may be used in conjunction with other methods described herein. Certain steps of the methods described in these two articles can be used with other steps of the other methods described herein.
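The placement of the two probe ends relative to the polymorphic site can be sketched as simple coordinate arithmetic. The names `ProbeLayout` and `layout_probe`, the default arm length, and the default offsets are hypothetical choices for illustration; the description above specifies only that the ends hybridize one or more bases away from the polymorphic site, and that the two offsets may differ.

```python
from typing import NamedTuple, Tuple


class ProbeLayout(NamedTuple):
    """Half-open [start, end) reference intervals, 0-based."""
    upstream_arm: Tuple[int, int]
    gap_fill: Tuple[int, int]
    downstream_arm: Tuple[int, int]


def layout_probe(snp_pos: int, arm_len: int = 20,
                 up_offset: int = 3, down_offset: int = 3) -> ProbeLayout:
    """Place the probe arms so each ends a set number of bases from the SNP.

    up_offset / down_offset are the numbers of bases left between the SNP
    and the nearest base of the upstream / downstream arm.
    """
    up_end = snp_pos - up_offset            # first base not covered by the arm
    down_start = snp_pos + 1 + down_offset  # first base of the downstream arm
    if up_end - arm_len < 0:
        raise ValueError("upstream arm would start before the reference")
    return ProbeLayout(
        upstream_arm=(up_end - arm_len, up_end),
        gap_fill=(up_end, down_start),
        downstream_arm=(down_start, down_start + arm_len),
    )


if __name__ == "__main__":
    print(layout_probe(100))
    # Different offsets on each side of the polymorphic site.
    print(layout_probe(100, up_offset=1, down_offset=5))
```

In this sketch the gap-fill interval always contains the SNP plus the offset bases on each side, so the variable position is never directly adjacent to a probe binding site.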
In some embodiments of the methods disclosed herein, the genetic material of the target individual is optionally amplified, followed by hybridization of the pre-circularized probes, performing a gap fill to fill in the bases between the two ends of the hybridized probes, ligating the two ends to form a circularized probe, and amplifying the circularized probe using, for example, rolling circle amplification. Once the desired target allelic genetic information has been captured by circularizing appropriately designed oligonucleotide probes, such as in the LIP system, the genetic sequence of the circularized probes can be measured to give the desired sequence data. In one embodiment, the appropriately designed oligonucleotide probes can be circularized directly on unamplified genetic material of the target individual and amplified afterwards. Note that a number of amplification procedures, including rolling circle amplification, MDA, or other amplification protocols, can be used to amplify the original genetic material or the circularized LIPs. Different methods may be used to measure the genetic information in the target genome, for example high-throughput sequencing, Sanger sequencing, other sequencing methods, capture-by-hybridization, capture-by-circularization, multiplex PCR, other hybridization methods, and combinations thereof.
Once the individual's genetic material has been measured using one or a combination of the above methods, an informatics-based method, such as the PARENTAL SUPPORT™ method, can then be used together with the appropriate genetic measurements to determine the ploidy state of one or more chromosomes of the individual, and/or the genetic state of one or a set of alleles, specifically those alleles that are correlated with a disease or genetic state of interest. Note that the use of LIPs has been reported for multiplexed capture of genetic sequences followed by genotyping with sequencing. However, sequence data resulting from a LIP-based strategy for the amplification of genetic material found in a single cell, a small number of cells, or extracellular DNA has not been used for the purpose of determining the ploidy state of a target individual.
Applying an informatics-based method to determine the ploidy state of an individual from genetic data measured by hybridization arrays, such as the ILLUMINA INFINIUM array or the AFFYMETRIX gene chip, has been described in documents referenced elsewhere in this document. However, the methods described herein show improvements over the methods described previously in the literature. For example, a LIP-based approach followed by high-throughput sequencing provides better genotypic data because the approach has better capacity for multiplexing, better capture specificity, better uniformity, and lower allelic bias. Greater multiplexing allows more alleles to be targeted, giving more accurate results. Better uniformity results in more of the targeted alleles being measured, giving more accurate results. Lower rates of allelic bias result in lower rates of miscalls, giving more accurate results. More accurate results lead to improved clinical outcomes and better medical care.
It is important to note that LIPs can be used as a method for targeting specific loci in a sample of DNA for genotyping by methods other than sequencing. For example, LIPs can be used to target DNA for genotyping using SNP arrays or other DNA- or RNA-based microarrays.
Ligation-Mediated PCR
Ligation-mediated PCR can be used to amplify the target loci either before or after PCR amplification using non-ligated primers. Ligation-mediated PCR is a PCR method used to preferentially enrich a sample of DNA by amplifying one or a plurality of loci in a mixture of DNA, the method comprising: obtaining a set of primer pairs, wherein each primer of the pair contains a target-specific sequence and a non-target sequence, wherein the target-specific sequences are designed to anneal to the target region, one upstream and one downstream of the polymorphic site, and may be separated from the polymorphic site by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-20, 21-30, 31-40, 41-50, 51-100, or more than 100 bases; polymerization of DNA from the 3-prime end of the upstream primer to fill the single-stranded region between it and the 5-prime end of the downstream primer with nucleotides complementary to the target molecule; ligation of the last polymerized base of the upstream primer to the adjacent 5-prime base of the downstream primer; and amplification of only the polymerized and ligated molecules using the non-target sequences contained at the 5-prime end of the upstream primer and at the 3-prime end of the downstream primer. Pairs of primers for distinct targets can be mixed in the same reaction. The non-target sequences serve as universal sequences, so that all pairs of primers that have been successfully polymerized and ligated can be amplified with a single pair of amplification primers.
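The structure of the primer pair and of the polymerized and ligated product can be sketched with plain string operations. In this sketch the reference sequence is written in the orientation of the product strand (that is, complementary to the template strand to which both primers anneal), and the universal sequences, arm length, and offset are hypothetical placeholder values rather than sequences taken from this description.

```python
from typing import Tuple


def design_ligation_pair(ref: str, snp_pos: int, arm_len: int, offset: int,
                         univ_5p: str, univ_3p: str) -> Tuple[str, str]:
    """Return (upstream_primer, downstream_primer), both written 5' to 3'.

    offset is the number of bases between each target-specific arm and the
    polymorphic site; ref is given in product-strand orientation.
    """
    up_end = snp_pos - offset
    down_start = snp_pos + 1 + offset
    if up_end - arm_len < 0 or down_start + arm_len > len(ref):
        raise ValueError("primer arms do not fit on the reference")
    upstream = univ_5p + ref[up_end - arm_len:up_end]           # universal tag at 5' end
    downstream = ref[down_start:down_start + arm_len] + univ_3p  # universal tag at 3' end
    return upstream, downstream


def ligated_product(ref: str, snp_pos: int, arm_len: int, offset: int,
                    univ_5p: str, univ_3p: str) -> str:
    """Upstream primer, gap fill across the SNP, then the ligated downstream primer."""
    upstream, downstream = design_ligation_pair(
        ref, snp_pos, arm_len, offset, univ_5p, univ_3p)
    gap = ref[snp_pos - offset:snp_pos + 1 + offset]  # bases added by the polymerase
    return upstream + gap + downstream


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACGT"
    print(design_ligation_pair(ref, 10, 5, 2, "GGGG", "CCCC"))
    print(ligated_product(ref, 10, 5, 2, "GGGG", "CCCC"))
```

Because every product carries the same two universal sequences at its ends, a single pair of amplification primers matching those sequences would amplify the products of all primer pairs in the pool.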
Capture by Hybridization
In some embodiments, the methods of the invention can include using any of the following capture by hybridization methods, in addition to using multiplex PCR, to amplify the target loci. Preferential enrichment of a specific set of sequences in a target genome can be accomplished in a number of ways. Elsewhere in this document there is a description of how LIPs can be used to target a specific set of sequences, but in all of those applications other targeting and/or preferential enrichment methods can be used equally well to the same ends. One example of another targeting method is the capture by hybridization approach. Some examples of commercially available capture by hybridization technologies include AGILENT's SURE SELECT and ILLUMINA's TruSeq. In capture by hybridization, a set of oligonucleotides that is complementary or mostly complementary to the desired targeted sequences is allowed to hybridize to a mixture of DNA and is then physically separated from the mixture. Once the desired sequences have hybridized to the targeting oligonucleotides, physically removing the targeting oligonucleotides also removes the targeted sequences. Once the hybridized oligos have been removed, they can be heated above their melting temperature and the targeted sequences can be amplified. One way to physically remove the targeting oligonucleotides is to covalently bond the targeting oligos to a solid support, such as a magnetic bead or a chip. Another way to physically remove the targeting oligonucleotides is to covalently bond them to a molecular moiety that has a strong affinity for another molecular moiety. An example of such a molecular pair is biotin and streptavidin, as used in SURE SELECT. Thus, the targeting oligonucleotides can be covalently attached to a biotin molecule and, after hybridization, a solid support with affixed streptavidin can be used to pull down the biotinylated oligonucleotides, to which the targeted sequences are hybridized.
Hybrid capture involves hybridizing probes that are complementary to the targets of interest to the target molecules. Hybrid capture probes were originally developed to target and enrich large fractions of the genome with relative uniformity between targets. In that application, it was important that all targets be amplified with enough uniformity that all regions could be detected by sequencing, but no regard was paid to retaining the proportions of alleles in the original sample. After capture, the alleles present in the sample can be determined by direct sequencing of the captured molecules. These sequencing reads can be analyzed and counted according to allele type. However, using current technology, the measured allele distributions of the captured sequences are typically not representative of the original allele distributions.
In one embodiment, the detection of the alleles is performed by sequencing. To capture the allele identity at the polymorphic site, it is essential that the sequencing read span the allele in question so that the allelic composition of the captured molecule can be evaluated. Because the captured molecules are often of variable length, sequencing cannot be guaranteed to overlap the variant positions unless the entire molecule is sequenced. However, cost considerations, as well as technical limitations on the maximum possible length and accuracy of sequencing reads, make sequencing the entire molecule unfeasible. In one embodiment, increasing the read length from about 30 bases to about 50 or about 70 bases can greatly increase the number of reads that overlap the variant positions within the targeted sequences.
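The effect of read length on the number of reads that overlap a variant position can be illustrated with a deliberately simple model. The model assumes a single-end read taken from one end of a captured fragment and a variant located uniformly at random along that fragment; both assumptions, the function name, and the example fragment length are illustrative and are not taken from this description.

```python
def fraction_spanning(read_len: int, fragment_len: int) -> float:
    """Chance that a read from one fragment end covers the variant.

    Assumed model: the variant lies at a uniformly random position on the
    fragment, and the read covers the first min(read_len, fragment_len) bases.
    """
    if read_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    return min(read_len, fragment_len) / fragment_len


if __name__ == "__main__":
    # A hypothetical 140-base captured fragment read at three read lengths.
    for read_len in (30, 50, 70):
        print(read_len, f"{fraction_spanning(read_len, 140):.3f}")
```

Under this model, a longer read covers a proportionally larger share of each captured fragment, up to the point where the whole fragment is read.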
Another way to increase the number of reads that interrogate the position of interest is to decrease the length of the probe, as long as doing so does not introduce bias in the underlying enriched alleles. The synthesized probe should be long enough that two probes designed to hybridize to two different alleles found at one locus will hybridize with nearly equal affinity to the various alleles in the original sample. At present, methods known in the art typically describe probes longer than 120 bases. In the present embodiment, if the allele is one or a few bases in length, the capture probes may be less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, or less than about 25 bases, and this is sufficient to ensure equal enrichment of all alleles. When the mixture of DNA to be enriched using hybrid capture technology comprises free-floating DNA isolated from blood, for example maternal blood, the average length of the DNA is quite short, typically less than 200 bases. The use of shorter probes gives the hybrid capture probes a greater chance of capturing the desired DNA fragments. Larger variations may require longer probes. In one embodiment, the variations of interest are from one base (a SNP) to a few bases in length. In one embodiment, targeted regions in the genome can be preferentially enriched using hybrid capture probes, wherein the hybrid capture probes are less than 90 bases in length, and may be less than 80 bases, less than 70 bases, less than 60 bases, less than 50 bases, less than 40 bases, less than 30 bases, or less than 25 bases in length. 
In one embodiment, to increase the chance that the desired allele is sequenced, the length of the probe designed to hybridize to the regions flanking the polymorphic allele position can be decreased from above 90 bases to about 80 bases, to about 70 bases, to about 60 bases, to about 50 bases, to about 40 bases, to about 30 bases, or to about 25 bases.
There is a minimum overlap between the synthesized probe and the target molecule that is required to enable capture. The synthesized probe can be made as short as possible while still being longer than this minimum required overlap. The effect of using a shorter probe length to target a polymorphic region is that more of the captured molecules will overlap the target allele region. The state of fragmentation of the original DNA molecules also affects the number of reads that overlap the targeted alleles. Some DNA samples, such as plasma samples, are already fragmented due to biological processes that take place in vivo. However, samples with longer fragments may benefit from fragmentation prior to sequencing library preparation and enrichment. When both the probes and the fragments are short (~60-80 bp), maximum specificity can be achieved, with relatively few sequence reads failing to overlap the critical region of interest.
In one embodiment, the hybridization conditions can be adjusted to maximize the uniformity of capture of the different alleles present in the original sample. In one embodiment, the hybridization temperature is lowered to minimize differences in hybridization bias between alleles. Methods known in the art avoid lower hybridization temperatures because lowering the temperature increases hybridization of probes to unintended targets. However, when the goal is to preserve allele ratios with maximum fidelity, a lower hybridization temperature provides the most accurate allele ratios, even though the current art teaches away from this approach. The hybridization temperature may also be increased to require greater overlap between the target and the synthesized probe, so that only targets with substantial overlap of the targeted region are captured. In some embodiments of the invention, the hybridization temperature is lowered from the normal hybridization temperature to about 40 °C, about 45 °C, about 50 °C, about 55 °C, about 60 °C, about 65 °C, or about 70 °C.
In one embodiment, a hybrid capture probe can be designed so that the region of the capture probe whose DNA is complementary to the DNA flanking the polymorphic allele is not immediately adjacent to the polymorphic site. Instead, the capture probe is designed so that the region that hybridizes to the DNA flanking the polymorphic site of the target is separated from the portion of the capture probe that would be in van der Waals contact with the polymorphic site by a small distance, equivalent in length to one or a small number of bases. In one embodiment, the hybrid capture probe is designed to hybridize to a region flanking the polymorphic allele without crossing it; this can be termed a flanking capture probe. The length of the flanking capture probe may be less than about 120 bases, less than about 110 bases, less than about 100 bases, less than about 90 bases, less than about 80 bases, less than about 70 bases, less than about 60 bases, less than about 50 bases, less than about 40 bases, less than about 30 bases, or less than about 25 bases. The region of the genome targeted by the flanking capture probe may be separated from the polymorphic locus by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 to 20, or more than 20 base pairs.
The following describes a targeted-capture-based disease screening test using targeted sequence capture. Custom targeted sequence capture is available, such as that currently offered by AGILENT (SURE SELECT), ROCHE-NIMBLEGEN, or ILLUMINA. Capture probes can be custom designed to ensure capture of various types of mutations. For point mutations, one or more probes overlapping the point mutation should be sufficient to capture and sequence the mutation.
For small insertions or deletions, one or more probes overlapping the mutation may be sufficient to capture and sequence fragments containing the mutation. Because probes are typically designed against the reference genome sequence, hybridization to the mutant fragment may be less efficient, which limits capture efficiency. To ensure capture of fragments containing the mutation, two probes could be designed, one matching the normal allele and one matching the mutant allele. Longer probes can improve hybridization. Multiple overlapping probes can improve capture. Finally, placing a probe immediately adjacent to, but not overlapping, the mutation may permit relatively similar capture efficiency for the normal and mutant alleles.
For simple tandem repeats (STRs), probes overlapping these highly variable sites are unlikely to capture the fragments well. To improve capture, the probe can be placed adjacent to, but not overlapping, the variable region. The fragment can then be sequenced as normal to reveal the length and composition of the STR.
For large deletions, a series of overlapping probes, the common approach currently used in exon capture systems, may work. With this approach, however, it may be difficult to determine whether an individual is heterozygous. Targeting and evaluating SNPs within the captured region could reveal a loss of heterozygosity across the region, indicating that the individual is a carrier. In one embodiment, non-overlapping or singleton probes can be placed across the potentially deleted region, and the number of captured fragments can be used as a measure of heterozygosity. Where an individual carries a large deletion, half as many fragments are expected to be available for capture as at a non-deleted (diploid) reference locus. Consequently, the number of reads obtained from the deleted region should be roughly half that obtained from a normal diploid locus. Aggregating and averaging the sequencing read depth from multiple singleton probes across the potentially deleted region can strengthen the signal and improve diagnostic confidence. The two approaches can also be combined: targeting SNPs to identify loss of heterozygosity, and using multiple singleton probes to obtain a quantitative measure of the amount of underlying fragments at the locus. Either or both of these strategies may be combined with other strategies to better achieve the same goal.
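The read-depth comparison described above can be sketched in a few lines. This is a minimal illustration, not the disclosed method: the function names, the example depths, and the 0.75 cutoff (an arbitrary midpoint between the one-copy expectation of 0.5 and the two-copy expectation of 1.0) are all assumptions made for the example.

```python
from statistics import mean

def deletion_depth_ratio(region_depths, reference_depths):
    """Mean read depth over singleton probes in the candidate region,
    divided by mean read depth over probes at diploid reference loci."""
    if not region_depths or not reference_depths:
        raise ValueError("both depth lists must be non-empty")
    return mean(region_depths) / mean(reference_depths)

def looks_heterozygous_deleted(ratio, cutoff=0.75):
    # Hypothetical cutoff: a ratio near 0.5 suggests one copy, near 1.0 two copies.
    return ratio < cutoff

# Made-up read depths for four probes in each region.
region = [48, 55, 51, 46]
reference = [100, 96, 104, 100]
ratio = deletion_depth_ratio(region, reference)
carrier_suspected = looks_heterozygous_deleted(ratio)
```

Averaging over several probes, as here, damps probe-to-probe capture noise; a real assay would also need a statistical model of that noise rather than a fixed cutoff.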
If cfDNA testing detects a male fetus, as indicated by the presence of Y-chromosome fragments captured and sequenced in the same test, then either an X-linked dominant mutation where the mother and father are unaffected, or a dominant mutation where the mother is unaffected, may indicate heightened risk to the fetus. Detection of two mutant recessive alleles within the same gene in an unaffected mother may imply that the fetus has inherited one mutant allele from the father and potentially a second mutant allele from the mother. In all cases, follow-up testing by amniocentesis or chorionic villus sampling may be indicated.
A targeted-capture-based disease screening test can be combined with a targeted-capture-based non-invasive prenatal diagnostic test for aneuploidy.
There are a number of ways to reduce depth of read (DOR) variability: for example, increase the primer concentration, use longer targeted amplification probes, or run more STA cycles (e.g., more than 25, more than 30, more than 35, or even more than 40).
Exemplary methods for measuring the number of DNA molecules in a sample
Described herein is a method for determining the number of DNA molecules in a sample by generating a uniquely identified molecule for each original DNA molecule in the sample during the first round of DNA amplification. Also described is a procedure for accomplishing this, followed by a single molecule or clonal sequencing method.
The approach involves targeting one or more specific loci and generating a tagged copy of each original molecule in such a way that most or all of the tagged molecules from each targeted locus carry a unique tag and can be distinguished from one another when the barcode is sequenced by clonal or single molecule sequencing. Each unique sequenced barcode represents a unique molecule in the original sample. At the same time, the sequencing data is used to identify the locus from which the molecule originates. This information can be used to determine the number of unique molecules in the original sample for each locus.
The method can be used in any application that requires quantitative evaluation of the number of molecules in the original sample. In addition, the number of unique molecules of one or more targets can be related to the number of unique molecules of one or more other targets to determine the relative copy number, allele distribution, or allele ratio. Alternatively, the number of copies detected from the various targets can be modeled as a distribution to identify the most likely number of copies of the original targets. Applications include, but are not limited to, detection of insertions and deletions such as those found in carriers of Duchenne muscular dystrophy; quantification of deleted or duplicated chromosome segments such as those observed in copy number variants; determination of the chromosome copy number of samples taken from born individuals; and determination of the chromosome copy number of samples taken from unborn individuals, such as embryos or fetuses.
The method can be combined with simultaneous evaluation of the variations contained in the targeted sequences. This can be used to determine the number of molecules representing each allele in the original sample. The copy number method can be combined with evaluation of SNPs or other sequence variations to determine the chromosome copy number of born and unborn individuals; with discrimination and quantification of copies from loci that have short sequence variations but where PCR may amplify from multiple target regions, as in carrier detection of spinal muscular atrophy; and with determination of the copy number of molecules from different sources in samples consisting of mixtures from different individuals, as in detection of fetal aneuploidy from free floating DNA obtained from maternal plasma.
In one embodiment, the method as applied to a single target locus may comprise one or more of the following steps: (1) Designing a standard pair of oligomers for PCR amplification of the specific locus. (2) Adding, during synthesis, a sequence of specified bases with no or minimal complementarity to the target locus or genome to the 5' end of one of the target specific oligomers. This sequence, termed the tail, is a known sequence used for subsequent amplification, followed by a sequence of random nucleotides. These random nucleotides make up the random region. The random region comprises a randomly generated sequence of nucleic acids that probabilistically differs between probe molecules. Consequently, following synthesis, the tailed oligomer pool will consist of a collection of oligomers that begin with a known sequence, followed by an unknown sequence that differs between molecules, followed by the target specific sequence. (3) Performing one round of amplification (denaturation, annealing, extension) using only the tailed oligomer. (4) Adding an exonuclease to the reaction, effectively stopping the PCR reaction, and incubating the reaction at the appropriate temperature to remove forward single-stranded oligos that did not anneal to the template and extend to form a double-stranded product. (5) Incubating the reaction at high temperature to denature the exonuclease and eliminate its activity. (6) Adding to the reaction a new oligonucleotide that is complementary to the tail of the oligomer used in the first reaction, together with the other target specific oligomer, to enable PCR amplification of the product generated in the first round of PCR. (7) Continuing amplification to generate enough product for downstream clonal sequencing. (8) Measuring the amplified PCR product by any of a number of methods, for example clonal sequencing, over a sufficient number of bases to span the sequence.
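The layout of the tailed oligomer in step (2), a known tail, then a random region, then the target specific sequence, can be illustrated in silico. The sketch below is only a model of the resulting pool: the tail and target sequences are made-up placeholders, the function name is hypothetical, and a pseudo-random generator stands in for mixed-base chemical synthesis.

```python
import random

def tailed_oligo(tail, target_specific, random_len, rng):
    """Return one oligo laid out 5'->3' as: known tail + random region + target specific sequence."""
    random_region = "".join(rng.choice("ACGT") for _ in range(random_len))
    return tail + random_region + target_specific

# Placeholder sequences, invented for illustration only.
TAIL = "ACACTGACGACATGGTTCTACA"
TARGET = "GGCTAGCTAGGATCC"

rng = random.Random(0)  # seeded so the sketch is reproducible
pool = [tailed_oligo(TAIL, TARGET, 8, rng) for _ in range(5)]
```

Every oligo in `pool` shares the tail and the target specific sequence, so the later amplification in step (6) can prime on the tail, while the 8-base middle segment differs probabilistically between molecules.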
In one embodiment, the methods herein involve targeting multiple loci, in parallel or otherwise. Primers for different target loci can be generated independently and mixed to produce multiplex PCR pools. In one embodiment, the original sample can be divided into sub-pools, and different loci can be targeted in each sub-pool before the sub-pools are recombined and sequenced. In one embodiment, the tagging step and a number of amplification cycles can be performed before the pool is subdivided, to ensure efficient targeting of all targets before splitting, with subsequent amplification improved by continuing amplification using smaller sets of primers in the subdivided pools.
One application in which the technique may be particularly useful is non-invasive prenatal aneuploidy diagnosis, where the ratio of alleles at a given locus, or the distribution of alleles at multiple loci, can help determine the number of copies of a chromosome present in the fetus. In this context, it is desirable to amplify the DNA present in the initial sample while maintaining the relative amounts of the various alleles. In some situations, particularly with a very small amount of DNA, for example fewer than 5,000 copies of the genome, fewer than 1,000 copies of the genome, fewer than 500 copies of the genome, or fewer than 100 copies of the genome, one can encounter a phenomenon called bottlenecking. This occurs when there are few copies of a given allele in the initial sample, so that amplification bias can produce an amplified pool of DNA with allele ratios significantly different from those in the initial mixture of DNA. By applying a unique or nearly unique set of barcodes to each strand of DNA before standard PCR amplification, it is possible to exclude n-1 copies from a set of n identical sequenced DNA molecules that originated from the same original molecule.
For example, consider a heterozygous SNP in the genome of an individual, and a mixture of DNA from that individual in which ten molecules of each allele are present in the original sample. After amplification, there may be 100,000 molecules of DNA corresponding to this locus. Because of stochastic processes, the ratio of the alleles could be anywhere from 1:2 to 2:1. However, since each of the original molecules was tagged with a unique tag, it would be possible to determine that the DNA in the amplified pool originated from exactly 10 molecules of each allele. The method can therefore provide a more accurate measurement of the relative amount of each allele than a method that does not use this approach. For methods in which it is desirable to minimize allele bias, this method will provide more accurate data.
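The counting logic in this example can be sketched as follows. The read tuples, barcode strings, and function name are hypothetical, and the sketch assumes error-free barcodes with no collisions, which a real pipeline could not take for granted.

```python
from collections import defaultdict

def count_original_molecules(reads):
    """reads: iterable of (locus, barcode, allele) tuples from clonal sequencing.
    Returns the number of distinct barcodes seen per (locus, allele)."""
    seen = defaultdict(set)
    for locus, barcode, allele in reads:
        seen[(locus, allele)].add(barcode)
    return {key: len(barcodes) for key, barcodes in seen.items()}

# Ten original molecules per allele; allele G happened to amplify twice as well as A.
reads = (
    [("snp1", f"BC{i:02d}", "A") for i in range(10) for _ in range(3)]
    + [("snp1", f"BC{i:02d}", "G") for i in range(10, 20) for _ in range(6)]
)
raw_a = sum(1 for r in reads if r[2] == "A")  # raw read count for allele A
raw_g = sum(1 for r in reads if r[2] == "G")  # raw read count for allele G
counts = count_original_molecules(reads)
```

Here the raw read counts are skewed 1:2, while the barcode counts recover the 10:10 composition of the starting sample.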
The sequenced fragment can be associated with its target locus in a number of ways. In one embodiment, a sequence of sufficient length is obtained from the targeted fragment to span the molecular barcode as well as a sufficient number of unique bases of the target sequence to identify the target locus unambiguously. In another embodiment, the molecular barcoding primer that contains the randomly generated molecular barcode may also contain a locus specific barcode (locus barcode) that identifies the target with which it is associated. The locus barcode is identical among all molecular barcoding primers for a given target, and hence among all resulting amplicons, but differs from that of every other target. In one embodiment, the tagging method described herein can be combined with a one-sided nesting protocol.
In one embodiment, the design and production of molecular barcoding primers can be reduced to practice as follows. The molecular barcoding primer may consist of a sequence that is not complementary to the target sequence, followed by a random molecular barcode region, followed by a target specific sequence. The sequence 5' of the molecular barcode may be used for subsequent PCR amplification and may include sequences useful for converting the amplicon into a library for sequencing. The random molecular barcode sequence can be generated in a number of ways. A preferred method is to synthesize the molecular tagging primer so that all four bases are included in the reaction during synthesis of the barcode region. All or various combinations of bases may be specified using the IUPAC DNA ambiguity codes. A collection of molecules synthesized in this way will contain a random mixture of sequences in the molecular barcode region. The length of the barcode region determines how many primers will carry a unique barcode. The number of unique sequences is N^L, where N is the number of bases (typically 4) and L is the length of the barcode. A barcode of 5 bases can yield up to 1024 unique sequences; a barcode of 8 bases can yield 65536 unique barcodes. In one embodiment, the DNA can be measured by a sequencing method in which the sequence data represents the sequence of a single molecule. This may include methods in which single molecules are sequenced directly, or methods in which single molecules are amplified to form clones detectable by the sequencing instrument but that still represent single molecules, referred to herein as clonal sequencing.
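The N^L relationship can be checked directly, and it is also useful to estimate how many distinct barcodes to expect when a given number of molecules is tagged. The second function below is an addition for illustration: it uses the standard occupancy formula and assumes every barcode is equally likely, which real oligonucleotide synthesis only approximates. Both function names are hypothetical.

```python
def barcode_space(length, alphabet_size=4):
    """Number of possible barcodes: N ** L."""
    return alphabet_size ** length

def expected_distinct_barcodes(n_molecules, length, alphabet_size=4):
    """Expected number of distinct barcodes among n_molecules tagged molecules,
    assuming barcodes are drawn uniformly at random (an idealization)."""
    space = barcode_space(length, alphabet_size)
    return space * (1 - (1 - 1 / space) ** n_molecules)

five = barcode_space(5)    # 1024
eight = barcode_space(8)   # 65536
distinct_1000 = expected_distinct_barcodes(1000, 8)
```

Under the uniformity assumption, tagging 1000 molecules with an 8-base barcode is expected to leave only a handful sharing a barcode, which is why the text speaks of "unique or nearly unique" tags.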
Exemplary methods and reagents for quantification of amplification products
Quantification of specific nucleic acid sequences of interest is typically performed by quantitative real-time PCR techniques such as TAQMAN (LIFE TECHNOLOGIES), INVADER probes (THIRD WAVE TECHNOLOGIES), and the like. Such techniques have shortcomings, including a limited ability to analyze multiple sequences simultaneously (multiplexing) and the ability to provide accurate quantitative data over only a narrow range of possible amplification cycles (e.g., where a plot of the logarithm of PCR amplification product yield versus the number of cycles is in the linear range). DNA sequencing techniques, particularly high-throughput next-generation sequencing techniques (often referred to as massively parallel sequencing techniques) such as those used in MYSEQ, HISEQ, ION TORRENT, GENOME ANALYZER ILX, GS FLEX+ (ROCHE 454), and the like, can be used to quantify the number of copies of a sequence of interest present in a sample, thereby providing quantitative information about the starting material, for example copy number or transcription level. High-throughput genetic sequencers are amenable to barcoding (i.e., sample tagging with distinctive nucleic acid sequences) to identify specific samples from individuals, allowing multiple samples to be analyzed simultaneously in a single run of the DNA sequencer. The number of times a given region of the genome in a library preparation (or other nucleic acid preparation of interest) is sequenced (the number of reads) will be proportional to the number of copies of that sequence in the genome of interest (or to the expression level in the case of a cDNA-containing preparation). However, the preparation and sequencing of genetic libraries (and similar genome-derived preparations) can introduce a number of biases that interfere with obtaining an accurate quantitative reading for the nucleic acid sequence of interest.
For example, different nucleic acid sequences can be amplified with different efficiencies during the nucleic acid amplification steps that occur during gene library preparation or sample preparation.
Problems associated with differential amplification efficiency can be mitigated by using certain aspects of the present invention. The present invention includes various methods and compositions related to the use of standards included in an amplification process to improve the accuracy of quantification. The present invention finds use, among other areas, in detecting aneuploidy in a fetus by analyzing free floating fetal DNA in maternal blood, as described herein and in U.S. Patent No. 8,008,018; U.S. Patent No. 7,332,277; PCT Patent Publication No. WO 2012/078792 A2; and PCT Patent Publication No. WO 2011/146632 A1, each of which is hereby incorporated by reference in its entirety. Embodiments of the present invention also find use in detecting aneuploidy in in vitro generated embryos. Commercially significant aneuploidies that can be detected include aneuploidy of human chromosomes 13, 18, 21, X and Y.
Embodiments of the invention can be used with human or non-human nucleic acids, and can be applied to both animal and plant-derived nucleic acids. Embodiments of the invention may also be used to detect and/or quantify alleles for other genetic disorders characterized by deletions or insertions. Deletion-containing alleles can be detected in suspected carriers of the allele of interest.
One embodiment of the present invention includes standards that are present in known amounts (either relative or absolute). For example, consider a genetic library made from a genetic source that is disomic for chromosome 8 (containing locus A) and trisomic for chromosome 21 (containing locus B). A genetic library produced from this sample will contain sequences in amounts that are a function of the number of chromosomes present in the sample, for example 200 copies of locus A and 300 copies of locus B. However, if locus A is amplified more efficiently than locus B, there may be 60,000 copies of the A amplicon and 30,000 copies of the B amplicon after PCR, which obscures the actual chromosomal copy number of the initial genomic sample when it is analyzed by high-throughput DNA sequencing (or another quantitative nucleic acid detection technique). To mitigate this problem, a standard sequence for locus A is used, where the standard sequence is amplified with essentially the same efficiency as locus A. Similarly, a standard sequence for locus B is generated, where the standard sequence is amplified with essentially the same efficiency as locus B. The standard sequence for locus A and the standard sequence for locus B are added to the mixture prior to PCR (or other amplification technique). These standard sequences are present in known amounts, either relative or absolute. Thus, if a 1:1 mixture of standard sequence A and standard sequence B were added to the mixture (before amplification) in the previous example, 3000 copies of the standard A amplicon and 1000 copies of the standard B amplicon would be produced, indicating that locus A is amplified three times more efficiently than locus B under the same set of conditions.
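The arithmetic of this example can be made explicit. The sketch below divides each target amplicon count by the count of its matched standard, assuming, as the text does, that each standard amplifies with the same efficiency as its locus and that the standards were added in a 1:1 ratio. The function name and the `standard_input` parameter are illustrative only.

```python
def normalized_abundance(target_reads, standard_reads, standard_input=1.0):
    """Target abundance expressed in units of the standard's input amount."""
    return target_reads / standard_reads * standard_input

# Numbers taken from the worked example in the text.
locus_a = normalized_abundance(60_000, 3_000)   # 20.0
locus_b = normalized_abundance(30_000, 1_000)   # 30.0
ratio_b_to_a = locus_b / locus_a                # 1.5
```

The corrected B:A ratio of 1.5 matches the starting 300:200 copies, whereas the raw amplicon counts alone (30,000:60,000) would have suggested 0.5.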
In various embodiments, one or more selected regions of a genome containing a SNP of interest (or other polymorphism) may be specifically amplified and subsequently sequenced. This target-specific amplification may take place during the formation of a genetic library for sequencing. The library may contain a plurality of targeted regions for amplification. In some embodiments, the library contains at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 target regions. Examples of such libraries are found in U.S. Patent Application Publication No. 2012/0270212, filed November 18, 2011, the disclosure of which is hereby incorporated by reference in its entirety.
Many high-throughput DNA sequencing techniques require modification of the genetic starting material, for example ligation of universal priming sites and/or barcodes, to facilitate clonal amplification of small nucleic acid fragments before the subsequent sequencing reactions are performed. In some embodiments, one or more standard sequences are added during genetic library formation, or are added to a precursor component of the genetic library before the library is amplified. Standard sequences can be selected to mimic the target genomic fragments being prepared for sequencing by high-throughput genetic sequencing techniques, while remaining distinguishable on the basis of nucleotide sequence. In one embodiment, the standard sequence may be identical to the target genomic fragment except at 1, 2, 3, 4 to 10, or 11 to 20 nucleotides. In some embodiments, when the target genetic sequence contains a SNP, the standard sequence may be identical to the target sequence except for the nucleotide at the polymorphic base, which can be selected to be one of the four nucleotides not observed in nature at that position. Standard sequences may be used in highly multiplexed analysis of multiple target loci (e.g., polymorphic loci). Standard sequences are added in known amounts (relative or absolute) during library formation (prior to amplification) to provide a standard metric for greater accuracy in determining the amount of the target sequence of interest in the sample being analyzed. Knowledge of the known amounts of the standard sequences, combined with knowledge that a library was formed from a genome of previously known ploidy level, for example one known to be diploid for all autosomes, can be used to calibrate the amplification properties of each standard sequence with respect to its corresponding target sequence, and to account for variation between batches of mixtures containing multiple standard sequences.
Because it is often necessary to analyze a large number of loci simultaneously, it is useful to produce a mixture containing a large set of standard sequences. Embodiments of the invention include mixtures comprising a plurality of standard sequences. Ideally, the amount of each standard sequence in a mixture is known with a high degree of accuracy. As a practical matter, however, this goal is very difficult to achieve, particularly for mixtures comprising a large number of different synthetic oligonucleotides, because there is significant variation in the amount of each standard sequence. This variation has many potential sources, for example batch-to-batch variation in the efficiency of in vitro oligonucleotide synthesis reactions, inaccuracies in volume measurements, and variations in pipetting. Such variation can also occur between different batches that in theory contain exactly the same set of standard sequences in exactly the same amounts. It is therefore of interest to calibrate each batch of standard sequences independently. Batches of standard sequences can be calibrated against a reference genome of known chromosomal composition. Batches of standard sequences can also be calibrated by sequencing the batch with no or minimal amplification steps included in the sequencing protocol. Embodiments of the invention include calibrated mixtures of different standard sequences. Other embodiments of the invention include methods of calibrating mixtures of different standard sequences and calibrated mixtures of different standard sequences prepared by those methods.
Various embodiments of the subject mixtures of standard sequences, and of the methods of using them, include at least 10; 100; 500; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 or more standard sequences, as well as various intermediate quantities. The number of standard sequences may be equal to the number of target sequences selected for analysis during generation of the targeted library for DNA sequencing. In some embodiments, however, it may be advantageous to use fewer standard sequences than the number of regions targeted in the library being constructed, so as to avoid running up against the limits of the sequencing capacity of the high-throughput DNA sequencer being used. The number of standard sequences may be less than 50% of the number of targeted regions, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the number of targeted regions, as well as various intermediate values. For example, if a genetic library is generated using 15,000 pairs of primers targeted to specific SNP-containing loci, a suitable mixture containing 1500 standard sequences corresponding to 1500 of the 15,000 targeted loci can be added prior to the amplification step of library construction.
The amount of standard sequence added during library construction can vary widely among embodiments. In some embodiments, the amount of each standard sequence may be approximately equal to the predicted amount of the target sequence present in the genomic material sample used for library preparation. In other embodiments, the amount of each standard sequence may be greater or less than the predicted amount of the target sequence present in the genomic material sample used for library preparation. Although the initial relative amounts of the target sequence and the standard sequence are not critical to the function of the present invention, the amount is preferably within the range of 100 times greater to 100 times less than the amount of the target sequence present in the genetic material sample used for library preparation. An excessive amount of standard can consume too much of the sequencing capacity of the DNA sequencer in a given run of the instrument. Too little standard sequence will yield insufficient data to aid in analyzing variation in amplification efficiency.
The standard sequence may be selected to be very similar in nucleotide sequence to the amplified region of interest; preferably, the standard sequence has exactly the same primer-binding sites as the genomic region analyzed, i.e., the "target sequence". The standard sequence must be distinguishable from the corresponding target sequence at the given locus. For convenience, this distinguishable region of the standard sequence will be referred to as the "marker sequence". In some embodiments, the marker sequence region of the target sequence contains a polymorphic region, e.g., a SNP, and can be flanked on both sides by primer binding regions. The standard sequence may be selected to closely match the GC content of the corresponding target sequence. In some embodiments, the primer binding regions of the standard sequence are flanked by universal priming sites. These universal priming sites are chosen to match the universal priming sites used in the genomic library to be analyzed. In other embodiments, the standard sequence does not have universal priming sites, and universal priming sites are added during generation of the library. Standard sequences are typically provided in single-stranded form. Standard sequences are defined with respect to the corresponding target sequences and the sequence specific reagents used to amplify the target sequences. In some embodiments, the target sequence contains a polymorphism of interest, e.g., a SNP, a deletion, or an insertion, present in the nucleic acid sample for analysis. The standard sequence is a synthetic polynucleotide that is similar in nucleotide sequence to the target sequence but is nevertheless distinguishable from it by at least one nucleotide base difference, which provides a mechanism for distinguishing amplicon sequences derived from the standard sequence from amplicon sequences derived from the target sequence.
The standard sequence is selected to have essentially the same amplification characteristics as the corresponding target sequence when amplified with the same set of amplification reagents, e.g., PCR primers. In some embodiments, the standard sequence may have the same primer sequence binding sites as the corresponding target sequence. In other embodiments, the standard sequence may have different primer sequence binding sites from the corresponding target sequence. In some embodiments, the standard sequence may be selected to produce an amplicon of the same length as the amplicon produced from the corresponding target sequence. In other embodiments, the standard sequence may be selected to produce an amplicon of slightly different length from the amplicon produced from the corresponding target sequence.
After the amplification reaction is complete, the library is sequenced on a high-throughput DNA sequencer, where individual molecules are clonally amplified and sequenced. The number of sequence reads for each allele of the target sequence is counted, as is the number of sequence reads for the standard sequence corresponding to the target sequence. The process is also carried out on at least one other pair of target sequence and corresponding standard sequence. For example, for locus A, X_A1 reads are produced for allele 1 of locus A, X_A2 reads are produced for allele 2 of locus A, and X_AC reads are produced for standard sequence A. The ratio of (X_A1 + X_A2) to X_AC is determined for each locus of interest. As discussed above, the process may be performed on a reference genome, e.g., a genome known to be diploid for all autosomes. The process can be repeated many times to provide a large set of read counts, so that the mean number of reads and the standard deviation of the number of reads can be determined. The process is carried out with a mixture comprising a large number of different standard sequences corresponding to different loci. By assuming that (1) the sum of X_A1 and X_A2 corresponds to a known number of chromosomes, for example 2 for a normal human female genome, and (2) the standard sequences have amplification (and detectability) properties similar to those of their corresponding natural loci, the relative amounts of the different standard sequences in the multi-standard mixture can be determined. The calibrated multi-standard mixture can then be used to adjust for variability in amplification efficiency between different loci in a multiplex amplification reaction.
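One way to read the calibration just described is sketched below. It is an illustration under the two stated assumptions (known copy number in the reference genome, and matched amplification of standard and target), not the disclosed procedure. The names mirror the text's X_A1, X_A2 and X_AC; the function names and all read counts are hypothetical.

```python
from statistics import mean, stdev

def standard_amount(x_a1, x_a2, x_ac, known_copies=2):
    """Amount of the standard, in chromosome-copy units, inferred from one
    reference run: known_copies / ((X_A1 + X_A2) / X_AC)."""
    return known_copies * x_ac / (x_a1 + x_a2)

def calibrate(runs, known_copies=2):
    """Mean and standard deviation of the inferred standard amount over repeated runs."""
    amounts = [standard_amount(*run, known_copies) for run in runs]
    return mean(amounts), stdev(amounts)

def estimate_copies(x_1, x_2, x_c, calibrated_amount):
    """Copy number of the locus in a test sample, using the calibrated standard."""
    return (x_1 + x_2) / x_c * calibrated_amount

# Made-up read counts (X_A1, X_A2, X_AC) from two runs on a diploid reference.
reference_runs = [(500, 500, 250), (400, 400, 200)]
amount, spread = calibrate(reference_runs)
test_copies = estimate_copies(750, 750, 250, amount)
```

With these invented numbers the standard calibrates to 0.5 copy-equivalents, and a test sample with 1500 target reads against 250 standard reads is estimated at three copies of the locus.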
Another embodiment of the invention includes methods and compositions for determining the copy number of a specific gene of interest, including duplications and mutant genes characterized by large deletions that may interfere with quantification by sequence analysis. Sequence analysis can have difficulty detecting alleles with such deletions. Standard sequences included in the amplification process can be used to reduce this problem.
In one embodiment of the invention, the target sequence for analysis is a gene having a wild-type (i.e., functional) form and a mutant form characterized by a deletion. An example of such a gene is SMN1, deletion alleles of which are implicated in the genetic disease spinal muscular atrophy (SMA). It is of interest to detect individuals carrying mutant forms of the gene using high-throughput genetic sequencing techniques. Applying such techniques to the detection of deletion mutants can be problematic, among other reasons, because the deleted sequence is simply absent from the sequencing data (as opposed to detecting simple point mutations or SNPs). Such an embodiment may include (1) a pair of amplification primers specific for the gene of interest, wherein the amplification primers will amplify the gene of interest (or a portion thereof) and will not significantly amplify the mutant allele; (2) a standard sequence that corresponds to a wild-type allele of the gene of interest (i.e., the target sequence) but differs from it by at least one detectable nucleotide base; (3) a pair of amplification primers specific for a second target sequence that serves as a reference sequence; and (4) a standard sequence corresponding to the reference sequence.
In one embodiment of the present invention, a method is provided for measuring the copy number of a gene of interest, wherein the gene of interest has an allele comprising a deletion. The method can use amplification reagents, for example PCR primers, specific for the gene of interest, such that at least a portion of the gene, the entire gene, or a region adjacent to the gene is amplified, but the deletion-comprising allele of the gene is not amplified. The method also uses a standard sequence corresponding to the gene of interest, wherein the standard sequence differs from the gene of interest by at least one nucleotide base (so that the standard sequence can readily be distinguished from the naturally occurring gene of interest). Typically, the standard sequence will contain the same primer binding sites as the gene of interest, thereby minimizing any difference in amplification between the gene of interest and its corresponding standard sequence. The reaction will also include amplification reagents specific for the reference sequence. A reference sequence is a sequence whose copy number within the genome to be analyzed is known (or at least presumed to be known). The reaction also includes a standard sequence corresponding to the reference sequence. Typically, the standard sequence corresponding to the reference sequence will contain the same primer binding sites as the reference sequence, thereby minimizing any difference in amplification between the reference sequence and its corresponding standard sequence.
Exemplary nucleic acid samples
In some embodiments, a genetic sample can be prepared and/or purified. A number of standard procedures known in the art accomplish this purpose. In some embodiments, the sample can be separated into layers by centrifugation. In some embodiments, the DNA can be isolated using filtration. In some embodiments, preparation of the DNA can involve amplification, separation, purification by chromatography, liquid-liquid separation, isolation, preferential enrichment, preferential amplification, targeted amplification, or any of a number of other techniques either known in the art or described herein.
In some embodiments, the methods disclosed herein can be used in situations where only a very small amount of DNA is present, such as in in vitro fertilization or in forensic contexts, where one or a few cells are available (typically fewer than 10 cells, fewer than 20 cells, or fewer than 40 cells). In these embodiments, the methods disclosed herein serve to make ploidy calls from a small amount of DNA that is not contaminated by other DNA, but where ploidy calling is very difficult because of the small amount of DNA. In some embodiments, the methods disclosed herein could be used in situations where the target DNA is contaminated with DNA from another individual, for example in prenatal screening, paternity testing, or products-of-conception testing. Other situations where these methods may be particularly advantageous include cancer testing where only one or a few cancerous cells are present among a large number of normal cells. Genetic measurements used as part of these methods can be made on any sample comprising DNA or RNA, for example but not limited to blood, plasma, body fluids, urine, hair, tears, saliva, tissue, skin, nails, chorionic villus samples, feces, bile, lymph, cervical mucus, semen, or other cells or materials comprising nucleic acids. In one embodiment, the methods described herein can be performed with nucleic acid detection methods such as sequencing, microarrays, qPCR, digital PCR, or other methods used to measure nucleic acids. If, for some reason, it were found to be desirable, allele ratios at a locus could be calculated and used to determine ploidy state in combination with some of the methods described herein, provided the methods are compatible. In some embodiments, the methods disclosed herein include calculating, on a computer, allele ratios at a plurality of polymorphic loci from DNA measurements made on the processed sample.
In some embodiments, the methods described herein include calculating, on a computer, allele ratios at a plurality of polymorphic loci from DNA measurements made on the processed sample, together with any combination of the other improvements described herein.
In some embodiments, the method can be used to genotype a single cell, a small number of cells, 2 to 5 cells, 6 to 10 cells, 10 to 20 cells, 20 to 50 cells, 50 to 100 cells, or a small amount of extracellular DNA, for example 1 to 10 picograms, 10 to 100 picograms, 100 picograms to 1 nanogram, 1 to 10 nanograms, 10 to 100 nanograms, or 100 nanograms to 1 microgram.
Illustrative RNA Expression study
The multiplex PCR method of the present invention can be used to increase the number of target loci that can be evaluated in gene expression profiling experiments. For example, the expression levels of thousands of genes can be monitored simultaneously to determine whether an individual has a sequence (e.g., a polymorphism or other mutation) associated with a disease (e.g., cancer) or with an increased risk of disease. These methods can be used to identify sequences (e.g., polymorphisms or other mutations) associated with an increased or decreased risk of a disease such as cancer by comparing gene expression (e.g., expression of a particular mRNA allele) in samples taken from patients with and without the disease. In addition, the effect of a particular treatment, disease, or developmental stage on gene expression can be measured. Similarly, these methods can be used to identify genes whose expression is altered in response to a pathogen or other organism by comparing gene expression in infected and uninfected cells or tissues. In these methods, the number of sequencing reads can be adjusted based on the frequency of the polymorphism being analyzed, so that enough reads are obtained to detect the polymorphism if it is present.
In some embodiments, a sample containing RNA (e.g., mRNA) is reverse transcribed (RT), and the resulting DNA (e.g., cDNA) is amplified using a DNA polymerase (PCR). The RT and PCR steps may be performed sequentially in the same reaction volume or separately. Any of the primer libraries of the present invention may be used in the reverse transcription polymerase chain reaction (RT-PCR) method. In various embodiments, reverse transcription is carried out using oligo-dT, random primers, a mixture of oligo-dT and random primers, or primers specific for the target loci. To avoid amplification of contaminating genomic DNA, primers for RT-PCR can be designed so that one part of a primer hybridizes to the 3' end of one exon and the other part of the primer hybridizes to the 5' end of the adjacent exon. Such primers anneal to cDNA synthesized from spliced mRNA but do not anneal to genomic DNA. To detect amplification of contaminating DNA, the RT-PCR primer pair can be designed to flank a region containing at least one intron. Products amplified from cDNA (which lacks introns) are smaller than those amplified from genomic DNA (which contains introns). The size difference between the products is used to detect the presence of contaminating DNA. In some embodiments, where only the mRNA sequence is known, primer annealing sites at least 300 to 400 base pairs apart are chosen, because fragments of eukaryotic DNA of this size are likely to contain splice junctions. Alternatively, the sample can be treated with DNase to degrade contaminating DNA.
Exemplary Method for Paternity Test
The multiplex PCR method of the present invention can be used to improve the accuracy of paternity testing because many target loci can be analyzed at one time (see, for example, U.S. Patent Application Publication No. 2012/0122701, filed December 22, 2005). For example, the multiplex PCR method allows thousands of polymorphic loci (e.g., SNPs) to be analyzed for use in the PARENTAL SUPPORT algorithm to determine whether an alleged father is the biological father of a fetus. In some embodiments, the method comprises (i) simultaneously amplifying, on genetic material from the alleged father, a plurality of polymorphic loci comprising at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci to produce a first set of amplified products; (ii) simultaneously amplifying the corresponding plurality of polymorphic loci in a mixed sample of DNA originating from a blood sample of the pregnant mother to produce a second set of amplified products, wherein the mixed sample of DNA comprises fetal DNA and maternal DNA; (iii) determining on a computer the probability that the alleged father is the biological father of the fetus using genotypic measurements based on the first and second sets of amplified products; and (iv) establishing whether the alleged father is the biological father of the fetus using the determined probability. In various embodiments, the method further comprises simultaneously amplifying the corresponding plurality of polymorphic loci on genetic material from the mother to produce a third set of amplified products, wherein the probability that the alleged father is the biological father of the fetus is determined using genotypic measurements based on the first, second, and third sets of amplified products.
Exemplary methods for embryo characterization and selection
The multiplex PCR method of the present invention can be used to improve the selection of embryos for in vitro fertilization by allowing thousands of target loci to be analyzed at one time (see, for example, U.S. Patent Application Publication No. 2011/0092763 filed on May 27, 2011 and filed on December 22, 2011). For example, the multiplex PCR method allows thousands of polymorphic loci (e.g., SNPs) to be analyzed for use in the PARENTAL SUPPORT algorithm described herein, so that an embryo can be selected from a set of embryos for in vitro fertilization.
In some embodiments, the present invention provides a method for estimating the relative likelihood that each embryo from a set of embryos will develop as desired. In some embodiments, the method comprises contacting a sample from each embryo with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture for each embryo, wherein each sample is derived from one or more cells of the embryo. In some embodiments, each reaction mixture is subjected to primer extension reaction conditions to produce amplified products. In some embodiments, the method comprises determining, on a computer, one or more characteristics of at least one cell from each embryo based on the amplified products; and estimating, on a computer, the relative likelihood that each embryo will develop as desired based on the one or more characteristics of the at least one cell from each embryo. In some embodiments, the method includes using an informatics-based method, such as the PARENTAL SUPPORT algorithm described herein, to determine at least one characteristic. In some embodiments, the characteristic comprises a ploidy state. In some embodiments, the characteristic is selected from the group consisting of aneuploidy, euploidy, mosaicism, nullisomy, monosomy, uniparental disomy, trisomy, tetrasomy, unmatched copy error trisomy, matched copy error trisomy, maternal origin of aneuploidy, paternal origin of aneuploidy, presence or absence of a disease-linked gene, chromosomal identity of any aneuploid chromosome, an abnormal genetic condition, a deletion or duplication, a likelihood of a characteristic, and combinations thereof.
In some embodiments, the characteristic is associated with a chromosome selected from the group consisting of chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, chromosome 17, chromosome 18, chromosome 19, chromosome 20, chromosome 21, chromosome 22, the X chromosome, the Y chromosome, and combinations thereof.
Exemplary Prenatal Diagnostic Methods
The multiplex PCR method of the present invention can be used to improve prenatal diagnostic methods, such as determining the ploidy state of fetal chromosomes. Because a large number of target loci can be amplified at the same time, more accurate determinations can be made.
In one embodiment, the present disclosure provides an ex vivo method for determining the ploidy state of a chromosome in a gestating fetus from genotypic data measured on a mixed sample of DNA (i.e., DNA from the mother of the fetus and DNA from the fetus) and, optionally, from genotypic data measured on a sample of genetic material from the mother and possibly also from the father. The determination is made by using a joint distribution model to generate a set of expected allele distributions for the different possible fetal ploidy states given the parental genotypic data, comparing the expected allele distributions with the allele distribution actually measured in the mixed sample, and selecting the ploidy state whose expected allele distribution pattern most closely matches the observed allele distribution pattern. In one embodiment, the mixed sample is taken from maternal blood, or from maternal serum or plasma. In one embodiment, the mixed sample of DNA can be preferentially enriched at target loci (e.g., a plurality of polymorphic loci). In one embodiment, the preferential enrichment is performed in a manner that minimizes allelic bias. In one embodiment, the present disclosure relates to a composition of DNA that has been preferentially enriched at a plurality of loci such that the allelic bias is low. In one embodiment, the allele distribution(s) are determined by sequencing the DNA of the mixed sample. In one embodiment, the joint distribution model assumes that the alleles will be distributed in a binomial fashion. In one embodiment, the set of expected joint allele distributions is generated for genetically linked loci, taking into account existing recombination frequencies from various sources, for example, data from the International HapMap Consortium.
In one embodiment, the present disclosure provides a non-invasive prenatal diagnosis (NPD) method for determining the ploidy state of a fetus by observing allele measurements at a plurality of polymorphic loci in genotypic data measured on a DNA mixture, where certain allele measurements are indicative of an aneuploid fetus while other allele measurements are indicative of a euploid fetus. In one embodiment, the genotypic data is obtained by sequencing a DNA mixture from maternal plasma. In one embodiment, the DNA sample can be preferentially enriched for DNA molecules corresponding to the plurality of loci whose allele distributions are being calculated. In one embodiment, a sample of DNA comprising only or almost only genetic material from the mother, and possibly also a sample of DNA comprising only or almost only genetic material from the father, are measured. In one embodiment, the genetic measurements of one or both parents, together with an estimated fetal fraction, are used to generate a plurality of expected allele distributions corresponding to different possible underlying genetic states of the fetus; the expected allele distributions may be termed hypotheses. In one embodiment, the maternal genetic data is not determined by measuring genetic material that is exclusively or almost exclusively maternal in nature; rather, it is estimated from genetic measurements made on maternal plasma, which comprises a mixture of maternal and fetal DNA. In some embodiments, the hypotheses may comprise the ploidy of the fetus at one or more chromosomes, which segments of which chromosomes in the fetus were inherited from which parents, and combinations thereof.
In some embodiments, the ploidy state of the fetus is determined by comparing the observed allele measurements with the different hypotheses (where at least some of the hypotheses correspond to different ploidy states) and selecting the ploidy state corresponding to the hypothesis that is most likely to be true given the observed allele measurements. In one embodiment, the method uses allele measurement data from some or all of the measured SNPs, regardless of whether the loci are homozygous or heterozygous, and therefore does not rely only on alleles at heterozygous loci. The method may not be appropriate for situations in which the genetic data pertains to only one polymorphic locus. The method is particularly advantageous when the genetic data comprises data for at least 10 polymorphic loci or at least 20 polymorphic loci on the target chromosome. The method is especially advantageous when the genetic data comprises more than 50 polymorphic loci, more than 100 polymorphic loci, or more than 200 polymorphic loci on the target chromosome. In some embodiments, the genetic data may comprise at least 500 polymorphic loci, at least 1,000 polymorphic loci, at least 2,000 polymorphic loci, or at least 5,000 polymorphic loci on the target chromosome.
In one embodiment, the methods disclosed herein produce a quantitative measure of the number of independent observations of each allele at a polymorphic locus. This differs from most methods, such as microarrays or quantitative PCR, which provide information on the ratio of two alleles but do not quantify the number of independent observations of either allele. In methods that do provide quantitative information on the number of independent observations, only the ratio is used in the ploidy calculation, and the quantitative information by itself is not put to use. To illustrate the importance of retaining information on the number of independent observations, consider a sample locus with two alleles, A and B. In the first experiment, 20 A alleles and 20 B alleles are observed. In the second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio A / (A + B) equals 0.5, but the second experiment conveys more information than the first about the certainty of the frequency of the A or B allele. Some methods by others take the allele ratio (channel ratio), x_i / y_i, of individual alleles and analyze it either by comparing it with a reference chromosome or by applying a rule about how the ratio is expected to behave in particular situations. No allele weighting is implied in such methods, which assume that about the same amount of PCR product is obtained for each allele and that all alleles behave in the same way. Such a method has a number of disadvantages and, more importantly, precludes the use of a number of improvements described elsewhere in this disclosure.
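The 20/20 versus 200/200 example can be made concrete with a short calculation. The sketch below assumes a simple binomial model of the counts and reports the standard error of the observed fraction; the function name and the choice of standard error as the measure of certainty are illustrative assumptions, not part of the disclosure.

```python
from math import sqrt

def allele_fraction_with_error(a_count, b_count):
    """Return (A / (A + B), binomial standard error of that fraction)."""
    n = a_count + b_count
    if n == 0:
        raise ValueError("no observations at this locus")
    p = a_count / n
    return p, sqrt(p * (1 - p) / n)

p1, se1 = allele_fraction_with_error(20, 20)    # first experiment
p2, se2 = allele_fraction_with_error(200, 200)  # second experiment
```

Both experiments give a fraction of 0.5, but under this model the standard error falls from about 0.079 with 40 observations to 0.025 with 400, which is the information a ratio alone discards.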
In one embodiment, the methods disclosed herein explicitly model the allele frequency distribution expected in disomy as well as a plurality of allele frequency distributions that can be expected in cases of trisomy resulting from nondisjunction during meiosis I, nondisjunction during meiosis II, and/or nondisjunction during mitosis early in fetal development. To illustrate why this is important, suppose that there are no crossovers: nondisjunction during meiosis I would produce a trisomy in which two different homologs are inherited from one parent; in contrast, nondisjunction during meiosis II or during mitosis early in fetal development would produce two copies of the same homolog from one parent. Each scenario produces different expected allele frequencies at each polymorphic locus and also at all loci considered jointly, due to genetic linkage. Crossovers, which result in the exchange of genetic material between homologs, make the inheritance pattern more complex; in one embodiment, the invention accommodates this by using recombination rate information in addition to the physical distance between loci. In one embodiment, to enable improved discrimination between meiosis I nondisjunction and meiosis II or mitotic nondisjunction, the invention incorporates into the model an increasing probability of crossover with increasing distance from the centromere. Meiosis II nondisjunction and mitotic nondisjunction can be distinguished by the fact that mitotic nondisjunction typically produces identical or nearly identical copies of one homolog, whereas the two homologs present after a meiosis II nondisjunction event often differ because of one or more crossovers during gametogenesis.
In some embodiments, the methods disclosed herein involve comparing the observed allele measurements with theoretical hypotheses corresponding to possible fetal genetic states, and do not involve a step of quantifying the ratio of alleles at a heterozygous locus. When the number of loci is less than about 20, a ploidy determination made using a method that quantifies the ratio of alleles at a heterozygous locus and a ploidy determination made using a method that compares the observed allele measurements with theoretical allele distribution hypotheses corresponding to possible fetal genetic states may give similar results. However, when the number of loci exceeds 50, these two methods are likely to give significantly different results; when the number of loci is greater than 400, greater than 1,000, or greater than 2,000, the two methods are increasingly likely to give results that differ increasingly significantly. These differences arise because a method that quantifies the ratio of alleles at a heterozygous locus without independently measuring the magnitude of each allele, and then sums or averages the ratios, precludes the use of techniques such as a joint distribution model, linkage analysis, a binomial distribution model, and/or other advanced statistical techniques, whereas a method that compares the observed allele measurements with theoretical allele distribution hypotheses corresponding to possible fetal genetic states can use these techniques, which can substantially increase the accuracy of the determination.
In one embodiment, the methods disclosed herein involve determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using a joint distribution model. Use of a joint distribution model is different from, and a significant improvement over, methods that determine heterozygosity rates by treating polymorphic loci independently, in that the resulting determinations are significantly more accurate. Without wishing to be bound by any particular theory, one reason for the higher accuracy is believed to be that the joint distribution model takes into account the linkage between SNPs and the likelihood of crossovers having occurred during the meiosis that gave rise to the gametes that formed the embryo that grew into the fetus. The purpose of using the concept of linkage when generating the expected distribution of allele measurements for one or more hypotheses is that it yields expected allele measurement distributions that correspond to reality considerably better than when linkage is not used. For example, imagine that there are two SNPs, 1 and 2, located near one another, and that the mother is A at SNP 1 and A at SNP 2 on homolog 1, and B at SNP 1 and B at SNP 2 on homolog 2. If the father is A at both SNPs on both homologs and a B is measured for the fetus at SNP 1, this indicates that homolog 2 was inherited by the fetus, and therefore that there is a much higher likelihood of a B being present in the fetus at SNP 2. A model that takes linkage into account can predict this, whereas a model that does not take linkage into account cannot. Alternatively, if the mother is AB at SNP 1 and AB at nearby SNP 2, two hypotheses corresponding to maternal trisomy can be used at that location: one involving a matching copy error (nondisjunction in meiosis II or in mitosis early in fetal development), and one involving an unmatching copy error (nondisjunction in meiosis I).
In the case of a matching copy error trisomy, if the fetus inherited AA from the mother at SNP 1, then the fetus is much more likely to inherit either AA or BB from the mother at SNP 2, and not AB. In the case of an unmatching copy error, the fetus would inherit AB from the mother at both SNPs. Because a ploidy calling method that takes linkage into account can make these predictions, its allele distribution hypotheses correspond to the actual allele measurements to a considerably greater extent than those of a ploidy calling method that does not take linkage into account. Note that a linkage-based approach is not possible with methods that calculate allele ratios and rely on aggregating those ratios.
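The two-SNP disomy example (mother A-A on homolog 1 and B-B on homolog 2, father homozygous A, maternal B observed in the fetus at SNP 1) can be illustrated with a deliberately simplified sketch. It assumes a single recombination fraction between the two SNPs and ignores measurement noise; the function name and the parameter `recomb_fraction` are hypothetical.

```python
def prob_maternal_b_at_snp2(recomb_fraction, use_linkage=True):
    """P(fetus carries maternal B at SNP 2 | maternal B observed at SNP 1).

    Assumes the mother is A-A on homolog 1 and B-B on homolog 2, so a B at
    SNP 1 means homolog 2 was transmitted at that position.
    """
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if not use_linkage:
        # Loci treated independently: either maternal allele is equally likely.
        return 0.5
    # With linkage, B is seen at SNP 2 unless a crossover separates the SNPs.
    return 1.0 - recomb_fraction

linked = prob_maternal_b_at_snp2(0.01)                       # close to 0.99
unlinked = prob_maternal_b_at_snp2(0.01, use_linkage=False)  # 0.5
```

For nearby SNPs the linked model predicts B at SNP 2 with high probability, whereas a model that treats loci independently cannot do better than 0.5.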
One reason why ploidy determinations made by comparing the observed allele measurements with theoretical hypotheses corresponding to possible fetal genetic states are believed to be more accurate is that, when the alleles are measured by sequencing, this method can extract more information than other methods from alleles for which the total number of reads is low; for example, a method that relies on calculating and aggregating allele ratios would introduce disproportionately weighted stochastic noise. For example, imagine that the alleles are measured by sequencing and that there is a set of loci at each of which only five sequence reads were detected. In one embodiment, for each of the alleles, the data can be compared with the hypothesized allele distribution and weighted according to the number of sequence reads, so that the data from these measurements is appropriately weighted and incorporated into the overall determination. This is in contrast to a method that quantifies the ratio of alleles at a heterozygous locus, because such a method can only calculate allele ratios of 0%, 20%, 40%, 60%, 80%, or 100%, none of which may be close to the expected allele ratio. In the latter case, the calculated allele ratios would either have to be discarded because of insufficient reads or would carry disproportionate weight and introduce stochastic noise into the determination, reducing its accuracy. In one embodiment, the individual allele measurements can be treated as independent measurements, where the relationship between measurements made on alleles at the same locus is no different from the relationship between measurements made on alleles at different loci.
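The weighting of low-coverage loci can be sketched as a sum of binomial log-likelihoods, in which each locus contributes in proportion to its read count instead of being reduced to a coarse ratio. This is a minimal illustration under strong assumptions: each hypothesis is represented by a single expected A-allele fraction shared by all loci, and the hypothesis names, fractions, and read counts are invented for the example.

```python
from math import comb, log

def binom_log_lik(a_reads, total_reads, expected_fraction):
    """Log-probability of a_reads A-allele reads out of total_reads."""
    p = expected_fraction  # must lie strictly between 0 and 1
    return (log(comb(total_reads, a_reads))
            + a_reads * log(p)
            + (total_reads - a_reads) * log(1 - p))

def best_hypothesis(loci, hypotheses):
    """loci: list of (a_reads, total_reads); hypotheses: name -> expected fraction."""
    scores = {name: sum(binom_log_lik(a, n, p) for a, n in loci)
              for name, p in hypotheses.items()}
    return max(scores, key=scores.get), scores

# Six hypothetical loci with only five reads each (18 A reads out of 30 in total).
loci = [(3, 5), (2, 5), (3, 5), (4, 5), (3, 5), (3, 5)]
hypotheses = {"H_0.50": 0.50, "H_0.55": 0.55}  # illustrative expected fractions
best, scores = best_hypothesis(loci, hypotheses)
```

No single locus here can yield a ratio of 55%, yet the summed likelihood still favours the 0.55 hypothesis, which is the advantage the passage describes.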
In one embodiment, the methods disclosed herein involve determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus without comparing any metric with allele measurements observed on a reference chromosome that is expected to be disomic (termed the RC method). This is a significant improvement over methods, such as those using shotgun sequencing, that detect aneuploidy by evaluating the proportion of randomly sequenced fragments originating from a suspect chromosome relative to one or more presumed disomic reference chromosomes. The RC method produces incorrect results if the presumed disomic reference chromosome is not actually disomic. This can occur when the aneuploidy is more extensive than a trisomy of a single chromosome, or when the fetus is triploid and all autosomes are trisomic. In the case of a female triploid (69, XXX) fetus, there are no disomic chromosomes at all. The methods described herein do not require a reference chromosome and can correctly identify trisomic chromosomes in a female triploid fetus. For each chromosome, hypothesis, fetal fraction, and noise level, a joint distribution model can be fitted without reference chromosome data, an overall fetal fraction estimate, or a fixed reference hypothesis.
In one embodiment, the methods disclosed herein demonstrate how observing allele distributions at polymorphic loci can be used to determine the ploidy state of a fetus with greater accuracy than prior art methods. In one embodiment, the method uses targeted sequencing to obtain mixed maternal-fetal genotypes, and optionally maternal and/or paternal genotypes, at a plurality of SNPs in order first to establish the various expected allele frequency distributions under the different hypotheses, and then examines the quantitative allele information obtained from the maternal-fetal mixture to evaluate which hypothesis best fits the data, where the genetic state corresponding to the best-fitting hypothesis is called as the correct genetic state. In one embodiment, the methods disclosed herein also use the degree of fit to generate a confidence that the called genetic state is the correct genetic state. In one embodiment, the methods disclosed herein involve using algorithms that analyze the distribution of alleles found at loci with different parental contexts and comparing the observed allele distributions with the expected allele distributions for different ploidy states in the different parental contexts (different parental genotypic patterns). This is different from, and an improvement over, methods that do not allow estimation of the number of independent instances of each allele at each locus in a mixed maternal-fetal sample. In one embodiment, the methods disclosed herein involve determining whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus using observed allele distributions measured at loci where the mother is heterozygous.
This is different from, and an improvement over, methods that do not use the allele distributions observed at loci where the mother is heterozygous because, in cases where the DNA is not preferentially enriched, or is preferentially enriched for loci that are not known to be highly informative for the particular target individual, it allows about twice as much genetic measurement data from a set of sequence data to be used in the ploidy determination, resulting in a more accurate determination.
In one embodiment, the methods disclosed herein use a joint distribution model that assumes the allele frequencies at each locus are multinomial in nature (and thus binomial when the SNPs are biallelic). In some embodiments, the joint distribution model uses beta-binomial distributions. When a measurement technique such as sequencing provides a quantitative measure of each allele present at each locus, a binomial model can be applied to each locus, and both the underlying allele frequency and the confidence in that frequency can be ascertained. With methods known in the art that generate ploidy calls from allele ratios, or that discard quantitative allele information, the certainty in the observed ratio cannot be ascertained. The present method differs from, and improves on, methods that calculate allele ratios and aggregate those ratios to make a ploidy call, because any method that calculates an allele ratio at a particular locus and then aggregates those ratios necessarily assumes that the measured intensities or counts indicating the amount of DNA from any given allele or locus are distributed in a Gaussian fashion. The methods disclosed herein do not involve calculating allele ratios. In some embodiments, the methods disclosed herein can include incorporating into a model the number of observations of each allele at a plurality of loci. In some embodiments, the methods disclosed herein can include calculating the expected distributions themselves, allowing the use of a joint binomial distribution model that may be more accurate than any model that assumes a Gaussian distribution of allele measurements. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution model increases as the number of loci increases. For example, when fewer than 20 loci are interrogated, the likelihood that the binomial distribution model is significantly better is low. However, when more than 100, or especially more than 400, or especially more than 1,000, or especially more than 2,000 loci are used, the binomial distribution model will have a very high probability of being significantly more accurate than the Gaussian distribution model, resulting in a more accurate ploidy determination. The likelihood that the binomial distribution model is significantly more accurate than the Gaussian distribution model also increases as the number of observations at each locus increases. For example, when fewer than 10 distinct sequences are observed at each locus, the likelihood that the binomial distribution model is significantly better is low. However, when more than 50 sequence reads, or especially more than 100 sequence reads, or especially more than 200 sequence reads, or especially more than 300 sequence reads are used for each locus, the binomial distribution model will have a very high probability of being significantly more accurate than the Gaussian distribution model, resulting in a more accurate ploidy determination.
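As an illustration only, the idea of modeling allele counts directly (rather than allele ratios) can be sketched as a sum of per-locus binomial log-likelihoods. This is a minimal sketch, not the method of the disclosure: the function names are hypothetical, each locus is treated as independent (linkage between SNPs is ignored), and a plain binomial is used in place of the beta-binomial mentioned above.

```python
from math import lgamma, log

def binom_logpmf(k, n, p):
    """Log-probability of observing k A-allele reads out of n reads when the
    expected A-allele fraction is p (plain binomial; an assumption here)."""
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p >= 1.0:
        return 0.0 if k == n else float("-inf")
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1.0 - p)

def joint_log_likelihood(counts, expected_fractions):
    """Sum the per-locus log-likelihoods.

    counts: list of (a_reads, b_reads) tuples, one per locus.
    expected_fractions: expected A-allele fraction at each locus under the
    hypothesis being evaluated. Loci are assumed independent (simplification).
    """
    return sum(
        binom_logpmf(a, a + b, p)
        for (a, b), p in zip(counts, expected_fractions)
    )

# Hypothetical read counts at two loci, scored under two candidate models.
observed = [(52, 48), (30, 70)]
score_1 = joint_log_likelihood(observed, [0.5, 0.3])
score_2 = joint_log_likelihood(observed, [0.5, 0.5])
```

Because the raw counts enter the likelihood, a locus with 1,000 reads contributes more evidence than a locus with 10 reads, which is information an averaged allele ratio would discard.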
In one embodiment, the methods disclosed herein use sequencing to measure the number of instances of each allele at each locus in a DNA sample. Each sequencing read can be mapped to a specific locus and treated as a binary sequence read; alternatively, the probability of the identity of the read and/or of the mapping can be incorporated as part of the sequence read, resulting in a probabilistic sequence read, that is, the probable whole or fractional number of sequence reads that map to a given locus. Using the binary counts or the probabilities of counts, it is possible to use a binomial distribution for each set of measurements, allowing a confidence interval to be calculated around the number of counts. The ability to use a binomial distribution allows more accurate ploidy estimates and more precise confidence intervals to be calculated. This differs from, and improves on, methods that use intensities to measure the amount of an allele present, for example methods that use microarrays, or methods that use fluorescence readers to measure the intensity of fluorescently tagged DNA in electrophoretic bands.
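For illustration, a confidence interval around a binomial read count can be computed in several standard ways. The sketch below uses the Wilson score interval, which is one common choice and is an assumption of this example; the text does not specify which interval is used. The function name and the default z value (1.96, roughly a 95% interval) are likewise illustrative.

```python
from math import sqrt

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for the underlying allele fraction, given
    k reads of one allele out of n total reads at a locus."""
    if n <= 0:
        raise ValueError("at least one read is required")
    p_hat = k / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width

# Same observed fraction (0.5), different read depths.
shallow = wilson_interval(5, 10)
deep = wilson_interval(50, 100)
```

The interval narrows as the depth of read grows, which is the property that intensity-based measurements cannot supply directly.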
In one embodiment, the methods disclosed herein use aspects of the present data set to determine the parameters of the estimated allele frequency distribution for that data set. This is an improvement over methods that use a training set of data or prior sets of data to set the parameters for the presently expected allele frequency distributions, or possibly the expected allele ratios. Because a different set of conditions is involved in the collection and measurement of every genetic sample, a method that uses data from the present data set to determine the parameters of the joint distribution model used in the ploidy determination for that sample will tend to be more accurate.
In one embodiment, the methods disclosed herein comprise determining, using a maximum likelihood technique, whether the distribution of observed allele measurements is indicative of a euploid or an aneuploid fetus. The use of a maximum likelihood technique differs from, and significantly improves on, methods that use a single hypothesis rejection technique, in that the resulting determinations are made with significantly higher accuracy. One reason is that single hypothesis rejection techniques set cutoff thresholds based on only one measurement distribution rather than two, which means that the thresholds are generally not optimal. Another reason is that the maximum likelihood technique allows the cutoff threshold to be optimized for each individual sample, instead of determining a cutoff threshold to be used for all samples regardless of the particular characteristics of each individual sample. Another reason is that the use of a maximum likelihood technique allows a confidence to be calculated for each ploidy call. The ability to calculate a confidence for each call allows a practitioner to know which calls are accurate and which are more likely to be wrong. In some embodiments, a wide variety of methods can be combined with a maximum likelihood estimation technique to improve the accuracy of the ploidy calls. In one embodiment, the maximum likelihood technique may be used in combination with the method described in U.S. Patent No. 7,888,017. In one embodiment, the maximum likelihood technique may be used in combination with a method that uses targeted PCR amplification to amplify DNA in the mixed sample, followed by sequencing and analysis using a read counting method such as that used by TANDEM DIAGNOSTICS, as presented at the 2011 International Congress of Human Genetics in Montreal.
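A minimal sketch of the contrast drawn above: rather than testing one hypothesis against a fixed cutoff, each candidate hypothesis is scored and the best one is reported with a confidence. The function name, the hypothesis labels, and the equal prior weighting of hypotheses are assumptions made for this example only.

```python
from math import exp

def call_by_maximum_likelihood(log_likelihoods):
    """Pick the hypothesis with the highest likelihood and report a confidence.

    log_likelihoods: dict mapping a hypothesis label to log P(data | hypothesis).
    Hypotheses are weighted equally a priori (an assumption of this sketch).
    """
    peak = max(log_likelihoods.values())
    # Subtract the peak before exponentiating to avoid numeric underflow.
    weights = {h: exp(ll - peak) for h, ll in log_likelihoods.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total

# Hypothetical log-likelihoods for one sample.
best, confidence = call_by_maximum_likelihood({"euploid": -100.0, "trisomy": -103.0})
```

Because the confidence is computed from the likelihoods of the sample at hand, a low-depth or low-fetal-fraction sample naturally yields a lower confidence than a clean one, with no sample-independent cutoff involved.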
In one embodiment, the methods disclosed herein include estimating the fetal fraction of DNA in the mixed sample and using that estimate to calculate both the ploidy call and the confidence of the ploidy call. This is different and distinct from methods that use an estimated fetal fraction only as a screen for sufficient fetal fraction, followed by a ploidy call made using a single hypothesis rejection technique that neither takes the fetal fraction into account nor produces a confidence calculation for the call.
In one embodiment, the method disclosed herein takes into account the tendency of the data to be noisy and to contain errors by attaching a probability to each measurement. Using maximum likelihood techniques to choose the correct hypothesis from the set of hypotheses, with measurement data that carry attached probabilistic estimates, makes it more likely that incorrect measurements will be discounted and that correct measurements will be used in the calculations that lead to the ploidy call. More precisely, the method systematically reduces the influence of incorrectly measured data on the ploidy determination. This is an improvement over methods in which all data are assumed to be equally correct, or in which outlying data are arbitrarily excluded from the calculations leading to a ploidy call. Existing methods that use channel ratio measurements claim to extend to multiple SNPs by averaging the individual SNP channel ratios. Not weighting individual SNPs by the expected measurement variance, based on SNP quality and observed depth of read, reduces the accuracy of the resulting statistic and significantly reduces the accuracy of the ploidy call, especially in borderline cases.
In one embodiment, the methods disclosed herein do not presuppose knowledge of which SNPs or other polymorphic loci are heterozygous in the fetus. The method allows a ploidy call to be made when paternal genotypic information is not available. This is an improvement over methods in which the heterozygous SNPs must be known ahead of time in order to select the loci to target appropriately, or to interpret the genetic measurements made on the mixed fetal/maternal DNA sample.
The methods described herein are particularly advantageous when used on samples where only a small amount of DNA is available or where the percentage of fetal DNA is low. This is because of the correspondingly higher allele dropout rate that occurs when only a small amount of DNA is available, and/or the correspondingly higher fetal allele dropout rate that occurs when the percentage of fetal DNA is low in a mixed sample of fetal and maternal DNA. A high allele dropout rate, meaning that a large percentage of the alleles were not measured for the target individual, results in poorly accurate fetal fraction calculations and poorly accurate ploidy determinations. Because the methods disclosed herein can use a joint distribution model that takes into account the linkage in inheritance patterns between SNPs, significantly more accurate ploidy determinations can be made. The methods described herein allow an accurate ploidy determination to be made when the percentage of DNA molecules in the mixture that are fetal is less than 40%, less than 30%, less than 20%, less than 10%, less than 8%, and even less than 6%.
In one embodiment, it is possible to determine the ploidy state of an individual based on measurements made when that individual's DNA is mixed with the DNA of a related individual. In one embodiment, the mixture of DNA is the free-floating DNA found in maternal plasma, which may include DNA from the mother, with known karyotype and known genotype, mixed with DNA from the fetus, with unknown karyotype and unknown genotype. The known genotypic information of one or both parents can be used to predict a plurality of potential genetic states of the DNA in the mixed sample for different ploidy states, different chromosome contributions from each parent to the fetus, and, optionally, different fetal DNA fractions in the mixture. Each potential composition can be referred to as a hypothesis. The ploidy state of the fetus can then be determined by looking at the actual measurements and determining which potential compositions are most likely given the observed data.
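To make the notion of a "potential composition" concrete, the sketch below computes the expected share of one allele in a mixed sample at a single biallelic locus under a given hypothesis. It is illustrative only: the function and parameter names are hypothetical, and the fetal fraction is assumed here to be the share of genome equivalents in the sample that are fetal.

```python
def expected_a_fraction(maternal_a, fetal_a, fetal_copies,
                        fetal_fraction, maternal_copies=2):
    """Expected fraction of A-allele reads at one locus in the mixture.

    maternal_a / fetal_a: number of A alleles carried by the mother / fetus.
    fetal_copies: number of chromosome copies in the fetus under the hypothesis.
    fetal_fraction: assumed share of genome equivalents that are fetal.
    """
    a_alleles = (1 - fetal_fraction) * maternal_a + fetal_fraction * fetal_a
    all_alleles = ((1 - fetal_fraction) * maternal_copies
                   + fetal_fraction * fetal_copies)
    return a_alleles / all_alleles

# Mother AA; fetus AB (disomy) versus fetus AAB (trisomy); 10% fetal fraction.
disomy = expected_a_fraction(2, 1, 2, 0.10)
trisomy = expected_a_fraction(2, 2, 3, 0.10)
```

Each hypothesis thus predicts a slightly different allele distribution, and the observed read counts can be scored against each prediction to find the most likely one.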
Further discussion of the points above can be found elsewhere in this document.
Non-Invasive Prenatal Diagnosis (NPD)
The process of non-invasive prenatal diagnosis includes a number of steps. Some of these steps may include (1) obtaining the genetic material of the fetus; (2) enriching, ex vivo, the fetal genetic material that may be present in a mixed sample; (3) amplifying the genetic material in vitro; (4) preferentially enriching specific loci in the genetic material in vitro; (5) measuring the genetic material in vitro; and (6) analyzing the genotypic data on a computer and ex vivo. Methods to reduce to practice these six and other related steps are described herein. At least some of the method steps are not applied directly to the body. In one embodiment, the present disclosure relates to methods of treatment and diagnosis applied to tissue and other biological material that has been separated from the body. At least some of the method steps are performed on a computer.
Some embodiments of the invention allow a clinician to determine the genetic state of a fetus gestating in a mother in a non-invasive manner, so that the health of the child is not put at risk by the collection of the fetal genetic material and the mother does not have to undergo an invasive procedure. In addition, in particular aspects, the present disclosure allows the fetal genetic state to be determined with high accuracy, significantly higher than that of, for example, non-invasive maternal serum analyte-based screens such as the triple test.
The high accuracy of the methods disclosed herein is the result of an informatics approach to the analysis of genotype data, as described herein. Modern technological advances have made it possible to measure large amounts of genetic information from a genetic sample using methods such as high-throughput sequencing and genotyping arrays. The methods disclosed herein allow clinicians to take greater advantage of the large amounts of data available and to make a more accurate diagnosis of the fetal genetic state. The details of a number of embodiments are given below. Different embodiments may include different combinations of the steps described above. Various combinations of the different embodiments of the different steps may be used interchangeably.
In one embodiment, a blood sample is taken from a pregnant mother, and the free-floating DNA in the plasma of the maternal blood, which contains a mixture of DNA of maternal origin and DNA of fetal origin, is isolated and used to determine the ploidy state of the fetus. In one embodiment, the methods disclosed herein include preferentially enriching those DNA sequences in a mixture of DNA that correspond to polymorphic alleles, in such a way that the allele ratios and/or allele distributions remain largely consistent upon enrichment. In one embodiment, the methods disclosed herein include highly efficient targeted PCR amplification, such that a very high percentage of the resulting molecules correspond to the targeted loci. In one embodiment, the methods disclosed herein include sequencing a mixture of DNA containing both DNA of maternal origin and DNA of fetal origin. In one embodiment, the methods disclosed herein comprise using the measured allele distributions to determine the ploidy state of a fetus gestating in a mother. In one embodiment, the methods disclosed herein comprise reporting the determined ploidy state to a clinician. In one embodiment, the methods disclosed herein include taking a clinical action, for example performing a follow-up invasive test such as chorionic villus sampling or amniocentesis, preparing for the birth of a trisomic individual, or electively terminating a trisomic pregnancy.
This application makes reference to U.S. Utility Application Serial No. 11/603,406, filed November 28, 2006 (U.S. Publication No. 20070184467); U.S. Utility Application Serial No. 12/076,348, filed March 17, 2008 (U.S. Publication No. 20080243398); PCT Application Serial No. PCT/US09/52730, filed August 4, 2009 (PCT Publication No. WO/2010/017214); PCT Application Serial No. PCT/US10/050824, filed September 30, 2010 (PCT Publication No. WO/2011/041485); U.S. Utility Application Serial No. 13/110,685, filed May 18, 2011; and PCT Application Serial No. PCT/12/58578, filed October 3, 2012, each of which is incorporated herein by reference in its entirety. Some of the vocabulary used in the present application may have its antecedents in these references. Some of the concepts described herein may be better understood in light of the concepts found in these references.
Screening Maternal Blood Containing Free-Floating Fetal DNA
The methods described herein can be used to help determine the genotype of a child, fetus, or other target individual where the target's genetic material is found in the presence of a quantity of other genetic material. In some embodiments, the genotype may refer to the ploidy state of one or more chromosomes, to one or more disease-linked alleles, or to some combination thereof. In the present disclosure, the discussion focuses on determining the genetic state of a fetus where the fetal DNA is found in maternal blood, but this example is not meant to limit the contexts to which the method may be applied. Further, the method may be applicable when the target DNA is present in any proportion relative to the non-target DNA; for example, the target DNA could make up anywhere between 0.000001% and 99.999999% of the DNA present. Also, the non-target DNA need not originate from one individual, or even from a related individual, as long as genetic data of some or all of the relevant non-target individual(s) are known. In one embodiment, the method disclosed herein can be used to determine genotypic data of a fetus from maternal blood containing fetal DNA. It can also be used where there are multiple fetuses in the uterus of a pregnant woman, or where other contaminating DNA may be present in the sample, for example from other, already born siblings.
This technique can make use of the phenomenon of fetal blood cells gaining access to the maternal circulation through the placental villi. Ordinarily, only a very small number of fetal cells enter the maternal circulation in this manner (not enough to produce a positive Kleihauer-Betke test for fetal-maternal hemorrhage). The fetal cells can be sorted out and analyzed by a variety of techniques to look for particular DNA sequences, but without the risks that invasive procedures inherently carry. The technique may also make use of the phenomenon of free-floating fetal DNA gaining access to the maternal circulation through DNA release following apoptosis of placental tissue, where the placental tissue in question contains DNA of the same genotype as the fetus. The free-floating DNA found in maternal plasma has been shown to contain fetal DNA in proportions as high as 30-40%.
In one embodiment, blood can be drawn from a pregnant woman. Research has shown that maternal blood can contain a small amount of free-floating DNA from the fetus, in addition to free-floating DNA of maternal origin. There may also be nucleated fetal blood cells containing DNA of fetal origin, in addition to the many blood cells of maternal origin, which typically do not contain nuclear DNA. There are many methods known in the art for isolating fetal DNA or producing fractions enriched in fetal DNA. For example, chromatography has been shown to produce certain fractions that are enriched in fetal DNA.
Once a sample of maternal blood, plasma, or other fluid is in hand, drawn in a relatively non-invasive manner and containing an amount of fetal DNA, whether cellular or free-floating, and whether enriched in its proportion to the maternal DNA or in its original ratio, the DNA found in the sample can be genotyped. In some embodiments, the blood may be drawn using a needle to withdraw blood from a vein, for example, the jugular vein. The methods described herein can be used to determine genotypic data of the fetus. For example, they can be used to determine the ploidy state at one or more chromosomes, and to determine the identity of one or a set of SNPs, including insertions, deletions, and translocations. They can be used to determine one or more haplotypes, including the parent of origin of one or more genotypic features.
Note that the method will work with any nucleic acid that can be used in any genotyping and/or sequencing method, such as the ILLUMINA INFINIUM ARRAY platform, AFFYMETRIX GENECHIP, ILLUMINA GENOME ANALYZER, or LIFE TECHNOLOGIES SOLID SYSTEM. This includes free-floating DNA extracted from plasma, or amplifications thereof (e.g., whole genome amplification, PCR), and genomic DNA from other cell types (e.g., human lymphocytes from whole blood), or amplifications thereof. For preparation of the DNA, any extraction or purification method that produces genomic DNA suitable for one of these platforms will work as well. The method could work equally well with samples of RNA. In one embodiment, the samples can be stored in a manner that minimizes degradation (e.g., below freezing, at about -20 °C, or at a lower temperature).
Parental Support
Embodiments of the PARENTAL SUPPORT™ method are described in U.S. Patent Application No. 11/603,406 (U.S. Publication No. 20070184467), U.S. Patent Application No. 12/076,348 (U.S. Publication No. 20080243398), U.S. Patent Application No. 13/110,685, PCT Patent Application No. PCT/US09/52730 (PCT Publication No. WO/2010/017214), and PCT Patent Application No. PCT/US10/050824 (PCT Publication No. WO/2011/041485), which are incorporated herein by reference in their entirety. PARENTAL SUPPORT™ is an informatics-based approach that can be used to analyze genetic data. In some embodiments, the methods disclosed herein can be considered part of the PARENTAL SUPPORT™ method. In some embodiments, the PARENTAL SUPPORT™ method is a collection of methods that can be used to determine, with high accuracy, the genetic data of a target individual from one or a small number of cells of that individual, or from a mixture of DNA consisting of DNA of the target individual and DNA of one or several other individuals, specifically to determine disease-related alleles, other alleles of interest, and/or the ploidy state of one or more chromosomes in the target individual. PARENTAL SUPPORT™ can refer to any of these methods. PARENTAL SUPPORT™ is an example of an informatics-based method. An exemplary implementation of the PARENTAL SUPPORT™ method is illustrated in FIGS. 29 to 31G and described in Example 19.
The PARENTAL SUPPORT™ method uses known parental genetic data, i.e., haplotypic and/or diploid genetic data of the mother and/or father, together with knowledge of the mechanism of meiosis and the imperfect measurement of the target DNA, and possibly of one or more related individuals, along with population-based crossover frequencies, to reconstruct in silico, with a high degree of confidence, the genotype at a plurality of alleles and/or the ploidy state of an embryo or of any target cell(s), and the target DNA at the location of key loci. The PARENTAL SUPPORT™ method can reconstruct not only single-nucleotide polymorphisms (SNPs) that were measured poorly, but also insertions and deletions, and SNPs or whole regions of DNA that were not measured at all. Also, the PARENTAL SUPPORT™ method can both measure multiple disease-linked loci and screen for aneuploidy from a single cell. In some embodiments, the PARENTAL SUPPORT™ method can be used to characterize one or more cells of an embryo biopsied during an IVF cycle to determine the genetic condition of the one or more cells.
The PARENTAL SUPPORT™ method allows noisy genetic data to be cleaned. This can be accomplished by using the genotypes of related individuals (the parents) as a reference to infer the correct genetic alleles in the target genome (the embryo). PARENTAL SUPPORT™ may be particularly relevant where only a small quantity of genetic material is available (e.g., PGD) and direct measurements of the genotypes are inherently noisy because of the limited amount of genetic material. PARENTAL SUPPORT™ may also be particularly relevant where only a small fraction of the available genetic material is derived from the target individual (e.g., NPD) and direct measurements of the genotypes are inherently noisy because of the contaminating DNA signal from another individual. The PARENTAL SUPPORT™ method can reconstruct accurate ordered diploid allele sequences for the embryo even though conventional, unordered diploid measurements may be characterized by high rates of allele dropout, drop-in, variable amplification bias, and other errors. The method can use both an underlying genetic model and an underlying model of measurement error. The genetic model can determine both the allele probabilities at each SNP and the crossover probabilities between SNPs. Allele probabilities can be modeled at each SNP based on data obtained from the parents, and crossover probabilities between SNPs can be modeled based on data obtained from the HapMap database, as developed by the International HapMap Project. Given the appropriate underlying genetic model and measurement error model, maximum a posteriori (MAP) estimation can be used, with modifications for computational efficiency, to estimate the correct, ordered allele values at each SNP in the embryo.
In some cases, the techniques outlined above can determine the genotype of an individual given only a very small amount of DNA originating from that individual. This may be DNA from one or a small number of cells, or the small amount of fetal DNA found in maternal blood.
Hypotheses
In the context of the present disclosure, a hypothesis refers to a possible genetic state. It may refer to a possible ploidy state. It may refer to a possible allelic state. A set of hypotheses may refer to a set of possible genetic states, a set of possible allelic states, a set of possible ploidy states, or a combination thereof. In some embodiments, a set of hypotheses may be designed such that one hypothesis from the set corresponds to the actual genetic state of any given individual. In some embodiments, a set of hypotheses may be designed such that every possible genetic state can be described by at least one hypothesis from the set. In some embodiments of the invention, one aspect of the method is to determine which hypothesis corresponds to the actual genetic state of the individual in question.
In another embodiment of the present application, one step involves creating a hypothesis. In some embodiments, this may be a copy number hypothesis. In some embodiments, this may include a hypothesis concerning which segments of a chromosome of each of the related individuals correspond genetically to which segments, if any, of the other related individuals. Creating a hypothesis can refer to the act of setting the limits of the variables such that the entire set of possible genetic states under consideration is encompassed by those variables.
A "copy number hypothesis", also referred to as a "ploidy hypothesis" or "ploidy state hypothesis", may refer to a hypothesis concerning a possible ploidy state for a given chromosome copy, chromosome type, or section of a chromosome in the target individual. It can also refer to the ploidy state at more than one of the chromosome types in the individual. A set of copy number hypotheses can refer to a set of hypotheses in which each hypothesis corresponds to a different possible ploidy state in an individual. A set of hypotheses may concern a set of possible ploidy states, a set of possible parental haplotype contributions, a set of possible fetal DNA percentages in the mixed sample, or a combination thereof.
A normal individual contains one of each chromosome type from each parent. However, due to errors in meiosis and mitosis, it is possible for an individual to have 0, 1, 2, or more of a given chromosome type from each parent. In practice, it is rare to see more than two of a given chromosome from one parent. Herein, some embodiments consider only the possible hypotheses in which 0, 1, or 2 copies of a given chromosome come from a parent; it is a trivial extension to consider more or fewer possible copies originating from a parent. In some embodiments, for a given chromosome, there are nine possible hypotheses: the three possible hypotheses concerning 0, 1, or 2 chromosomes of maternal origin, multiplied by the three possible hypotheses concerning 0, 1, or 2 chromosomes of paternal origin. Let (m, f) denote the hypothesis in which m is the number of a given chromosome inherited from the mother and f is the number of that chromosome inherited from the father. The nine hypotheses are thus (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), and (2,2). These may also be written H₀₀, H₀₁, H₀₂, H₁₀, H₁₁, H₁₂, H₂₀, H₂₁, and H₂₂. Different hypotheses correspond to different ploidy states. For example, (1,1) refers to a normal disomic chromosome; (2,1) refers to a maternal trisomy; and (0,1) refers to a paternal monosomy. In some embodiments, the case where two chromosomes are inherited from one parent and one chromosome is inherited from the other parent can be further differentiated into two cases: one where the two chromosomes are identical (matched copy error), and one where the two chromosomes are homologous but not identical (unmatched copy error). In these embodiments, there are sixteen possible hypotheses. It is to be understood that it is possible to use other sets of hypotheses and different numbers of hypotheses.
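The nine (m, f) hypotheses described above can be enumerated mechanically. The sketch below is illustrative: the function names are hypothetical, and the descriptive labels other than those given in the text for (1,1), (2,1), and (0,1) follow ordinary cytogenetic usage and are an assumption of this example.

```python
from itertools import product

COPY_NAMES = {0: "nullisomy", 1: "monosomy", 2: "disomy",
              3: "trisomy", 4: "tetrasomy"}

def copy_number_hypotheses(max_copies=2):
    """All (m, f) pairs: m copies from the mother, f copies from the father."""
    return list(product(range(max_copies + 1), repeat=2))

def describe(m, f):
    """Human-readable label for an (m, f) hypothesis with at most 2 copies per parent."""
    total = m + f
    base = COPY_NAMES[total]
    if total == 1:
        # The single copy came from one parent only.
        return ("maternal " if m == 1 else "paternal ") + base
    if total == 3:
        # Label the trisomy by the parent who contributed two copies.
        return ("maternal " if m == 2 else "paternal ") + base
    if total == 2 and m != f:
        # Both copies from the same parent.
        return ("maternal " if m == 2 else "paternal ") + "uniparental disomy"
    return base

hypotheses = copy_number_hypotheses()
labels = {f"H{m}{f}": describe(m, f) for m, f in hypotheses}
```

Raising `max_copies` extends the set in the way the text calls a trivial extension, although the labels in `describe` would then need further entries.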
In some embodiments of the present application, the ploidy hypothesis refers to a hypothesis concerning which chromosome of other related individuals corresponds to a chromosome found in the genome of the target individual. In some embodiments, a key to the method is the fact that related individuals can be expected to share haplotype blocks; using the measured genetic data of related individuals, together with knowledge of which haplotype blocks match between the target individual and the related individuals, it is possible to infer the correct genetic data for the target individual with higher confidence than by using the target individual's genetic measurements alone. As such, in some embodiments, the ploidy hypothesis may concern not only the number of chromosomes, but also which chromosomes in related individuals are identical, or nearly identical, to one or more chromosomes in the target individual.
Once the set of hypotheses has been defined, the algorithms that operate on the input genetic data can output a determined statistical probability for each hypothesis under consideration. The probabilities of the various hypotheses can be determined by mathematically calculating, for each hypothesis, the value of its probability, using the relevant genetic data as input, according to one or more of the expert techniques, algorithms, and/or methods described elsewhere herein.
Once the probabilities of the different hypotheses have been estimated, as determined by a plurality of techniques, they can be combined. This may entail multiplying, for each hypothesis, the probabilities determined by each technique. The products of the probabilities of the hypotheses can be normalized. Note that one ploidy hypothesis refers to one possible ploidy state for a chromosome.
The process of "combining probabilities", also called "combining hypotheses" or combining the results of expert techniques, is a concept familiar to those in the field of linear algebra. One possible way of combining probabilities is as follows. When an expert technique is used to evaluate a set of hypotheses given a set of genetic data, the output of the method is a set of probabilities associated, in a one-to-one fashion, with each hypothesis in the set of hypotheses. When a set of probabilities determined by a first expert technique, each associated with one of the hypotheses in the set, is combined with a set of probabilities determined by a second expert technique, each associated with the same set of hypotheses, the two sets of probabilities are multiplied. This means that, for each hypothesis in the set, the two probabilities associated with that hypothesis, as determined by the two expert techniques, are multiplied together, and the corresponding product is the output probability. The process can be extended to any number of expert techniques. If only one expert technique is used, the output probabilities are the same as the input probabilities. If more than two expert techniques are used, the relevant probabilities can all be multiplied at the same time. The products can be normalized so that the probabilities of the hypotheses in the set sum to 100%.
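The multiply-then-normalize procedure just described can be sketched in a few lines. The function name and the example probabilities are hypothetical; the sketch assumes every technique reports a probability for exactly the same set of hypotheses.

```python
def combine_probabilities(*technique_outputs):
    """Multiply per-hypothesis probabilities across techniques, then normalize.

    Each argument is a dict mapping hypothesis label -> probability, and all
    dicts must share the same set of labels (an assumption of this sketch).
    """
    if not technique_outputs:
        raise ValueError("at least one technique output is required")
    labels = set(technique_outputs[0])
    if any(set(out) != labels for out in technique_outputs):
        raise ValueError("all techniques must score the same hypotheses")

    combined = {}
    for label in technique_outputs[0]:
        value = 1.0
        for out in technique_outputs:
            value *= out[label]
        combined[label] = value

    total = sum(combined.values())
    if total == 0.0:
        raise ValueError("every hypothesis has zero combined probability")
    # Normalize so the combined probabilities sum to 1 (i.e., 100%).
    return {label: value / total for label, value in combined.items()}

# Two hypothetical techniques scoring the same two hypotheses.
first = {"H11": 0.6, "H21": 0.4}
second = {"H11": 0.9, "H21": 0.1}
combined = combine_probabilities(first, second)
```

With a single input the function returns the input probabilities unchanged (after normalization), matching the one-technique case described in the text.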
In some embodiments, if the combined probability for a given hypothesis is greater than the combined probability for any of the other hypotheses, that hypothesis can be considered to be the most likely. In some embodiments, a hypothesis may be determined to be the most likely, and the ploidy state, or other genetic state, may be called, if the normalized probability is greater than a threshold. In one embodiment, this may mean that the number and identity of the chromosomes associated with that hypothesis are called as the ploidy state. In one embodiment, this may mean that the identity of the alleles associated with that hypothesis is called as the allelic state. In some embodiments, the threshold may be from about 50% to about 80%. In some embodiments, the threshold may be from about 80% to about 90%. In some embodiments, the threshold may be from about 90% to about 95%. In some embodiments, the threshold may be from about 95% to about 99%. In some embodiments, the threshold may be from about 99% to about 99.9%. In some embodiments, the threshold may be greater than about 99.9%.
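A threshold-based call of the kind described above might look like the following sketch. The function name and the default threshold of 0.99 are assumptions chosen for illustration; the text lists several possible threshold ranges and does not single one out.

```python
def call_state(normalized_probabilities, threshold=0.99):
    """Return the most likely hypothesis if its normalized probability exceeds
    the threshold; otherwise return None to signal that no call is made."""
    best = max(normalized_probabilities, key=normalized_probabilities.get)
    if normalized_probabilities[best] > threshold:
        return best
    return None

confident = call_state({"H11": 0.995, "H21": 0.005})
no_call = call_state({"H11": 0.60, "H21": 0.40})
lenient = call_state({"H11": 0.60, "H21": 0.40}, threshold=0.50)
```

Returning `None` rather than the best guess keeps low-confidence samples separate from called samples, so that they can be handled differently downstream.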
Parental Context
The parental context refers to the genetic state of a given allele on each of the two relevant chromosomes for one or both of the two parents of the target. Note that, in one embodiment, the parental context does not refer to the allelic state of the target, but to the allelic state of the parents. The parental context for a given SNP may consist of four base pairs, two paternal and two maternal; these may be the same as or different from one another. It is typically written "m₁m₂|f₁f₂", where m₁ and m₂ are the genetic state of the given SNP on the two maternal chromosomes, and f₁ and f₂ are the genetic state of the given SNP on the two paternal chromosomes. In some embodiments, the parental context may be written "f₁f₂|m₁m₂". Note that the subscripts "1" and "2" refer to the genotype, at the given allele, of the first and second chromosome, and that the choice of which chromosome is labeled "1" and which is labeled "2" is arbitrary.
In the present description, A and B are often used to represent base pair identities generically; A or B can equally well represent C (cytosine), G (guanine), A (adenine), or T (thymine). For example, if, at a given SNP-based allele, the mother's genotype is T at that SNP on one chromosome and G at that SNP on the homologous chromosome, and the father's genotype at that allele is G at that SNP on both homologous chromosomes, the allele of the target individual can be said to have the parental context AB|BB; it can also be said to have the parental context AB|AA. Note that, in theory, any of the four possible nucleotides could occur at a given allele, so it is possible, for example, for the mother to have a genotype of AT and the father to have a genotype of GC at a given allele. However, empirical data indicate that, in most cases, only two of the four possible base pairs are observed at a given allele. It is possible, for example when single tandem repeats are used, to have more than two, more than four, and even more than ten parental contexts. In the present description, the discussion assumes that only two possible base pairs will be observed at a given allele, although the embodiments described herein could be modified to take into account cases where this assumption does not hold.
A "parental context" may refer to a set or subset of target SNPs that have the same parental context. For example, if 1,000 alleles were measured on a given chromosome of a target individual, the context AA|BB could refer to the set of all alleles in the group of 1,000 alleles at which the genotype of the target's mother is homozygous and the genotype of the target's father is homozygous, but the maternal and paternal genotypes differ at that locus. If the parental data are not phased, and thus AB = BA, there are nine possible parental contexts: AA|AA, AA|AB, AA|BB, AB|AA, AB|AB, AB|BB, BB|AA, BB|AB, and BB|BB. If the parental data are phased, and thus AB ≠ BA, there are sixteen different possible parental contexts: AA|AA, AA|AB, AA|BA, AA|BB, AB|AA, AB|AB, AB|BA, AB|BB, BA|AA, BA|AB, BA|BA, BA|BB, BB|AA, BB|AB, BB|BA, and BB|BB. Every SNP allele on a chromosome, excluding some SNPs on the sex chromosomes, has one of these parental contexts. The set of SNPs for which the parental context of one parent is heterozygous may be referred to as the heterozygous context.
Use of Parental Contexts in NPD
Non-invasive prenatal diagnosis is an important technique that can be used to determine the genetic state of a fetus from genetic material obtained in a non-invasive manner, for example from a blood draw on the pregnant mother. The blood may be separated and the plasma isolated, followed by isolation of the plasma DNA. Size selection may be used to isolate DNA of the appropriate length. The DNA may be preferentially enriched at a set of loci. The DNA may then be measured by a number of means, such as by hybridization to a genotyping array followed by fluorescence measurement, or by sequencing on a high-throughput sequencer.
When sequencing is used to call the ploidy of a fetus in the context of non-invasive prenatal diagnosis, there are a number of ways to use the sequence data. The most common way is simply to count the number of reads that map to a given chromosome. For example, imagine trying to determine the ploidy state of chromosome 21 in the fetus. Further imagine that the DNA in the sample consists of 10% DNA of fetal origin and 90% DNA of maternal origin. In this case, one could look at the average number of reads on a chromosome that can be expected to be disomic, for example chromosome 3, and compare it to the number of reads on chromosome 21, where the reads are adjusted for the number of base pairs on each chromosome that are part of a unique sequence. If the fetus is euploid, the amount of DNA per unit of genome is expected to be about equal at all locations (subject to stochastic variation). If, on the other hand, the fetus is trisomic at chromosome 21, slightly more DNA per unit of genome is expected from chromosome 21 than from other locations in the genome. Specifically, about 5% more DNA from chromosome 21 is expected in the mixture. When the DNA is measured by sequencing, about 5% more uniquely mappable reads per unique segment are expected from chromosome 21 than from the other chromosomes. The observation that the amount of DNA from a particular chromosome is higher than a certain threshold, when adjusted for the number of sequences that are uniquely mappable to that chromosome, may serve as the basis for an aneuploidy diagnosis. Another method that may be used to detect aneuploidy is similar to the one described above, except that parental contexts are taken into account.
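The 5% figure follows from simple mixture arithmetic, which the short sketch below reproduces. It is only an illustration under stated assumptions: the function names (`expected_dna_ratio`, `normalized_depth`) are hypothetical and not part of the disclosure, the mother is assumed to be disomic, and the fetal fraction is assumed to be known.

```python
def expected_dna_ratio(fetal_fraction, fetal_copies=3, reference_copies=2):
    """Expected DNA per unit of genome on the test chromosome, relative to a
    disomic reference chromosome, in a maternal/fetal mixture.

    Assumption: the mother carries two copies of every chromosome.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    maternal_part = 2 * (1.0 - fetal_fraction)
    test = maternal_part + fetal_fraction * fetal_copies
    reference = maternal_part + fetal_fraction * reference_copies
    return test / reference


def normalized_depth(read_count, uniquely_mappable_bases):
    """Reads per uniquely mappable base, so chromosomes of different size compare fairly."""
    return read_count / uniquely_mappable_bases


# 10% fetal DNA with a trisomic chromosome 21: (1.8 + 0.3) / (1.8 + 0.2)
ratio = expected_dna_ratio(0.10)
print(f"expected excess: {100 * (ratio - 1):.1f}%")
```

Under these assumptions the excess is half the fetal fraction, which is why the signal shrinks quickly as the fetal fraction falls.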
When considering which alleles to target, one may consider the likelihood that some parental contexts are more informative than others. For example, AA|BB and the symmetric context BB|AA are the most informative contexts, because the fetus is known to carry an allele that differs from the mother's. For reasons of symmetry, both the AA|BB and BB|AA contexts may be referred to as AA|BB. Another set of informative parental contexts is AA|AB and BB|AB, because in these cases the fetus has a 50% chance of carrying an allele that the mother does not have. For reasons of symmetry, both the AA|AB and BB|AB contexts may be referred to as AA|AB. A third set of informative parental contexts is AB|AA and AB|BB, because in these cases the fetus carries a known paternal allele, and that allele is also present in the maternal genome. For reasons of symmetry, both the AB|AA and AB|BB contexts may be referred to as AB|AA. A fourth parental context is AB|AB, where the fetus has an unknown allelic state and, whatever that state is, the mother has the same alleles. The fifth parental context is AA|AA, where the mother and father are both homozygous for the same allele.
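The counting and symmetry arguments above can be checked mechanically. The following sketch enumerates the unphased and phased contexts, collapses unphased contexts that differ only by swapping the A and B labels, and computes the chance that the fetus inherits a paternal allele the mother lacks. All function names are hypothetical, and the transmission model (each paternal allele passed on with probability 1/2) is a textbook assumption rather than something specified in the text.

```python
from itertools import product

UNPHASED = ("AA", "AB", "BB")
PHASED = ("AA", "AB", "BA", "BB")


def parental_contexts(phased=False):
    """List every mother|father context for a biallelic SNP."""
    genotypes = PHASED if phased else UNPHASED
    return [f"{m}|{f}" for m, f in product(genotypes, repeat=2)]


def _swap_labels(genotype):
    """Exchange A and B, then sort so that unphased BA is written AB."""
    return "".join(sorted(genotype.translate(str.maketrans("AB", "BA"))))


def canonical_context(context):
    """Collapse an unphased context with its A/B mirror image (BB|AA -> AA|BB)."""
    mother, father = context.split("|")
    mirrored = f"{_swap_labels(mother)}|{_swap_labels(father)}"
    return min(context, mirrored)


def prob_non_maternal_allele(context):
    """Chance that the paternally inherited allele is absent from the mother."""
    mother, father = context.split("|")
    return sum(allele not in mother for allele in father) / 2


print(len(parental_contexts()), len(parental_contexts(phased=True)))
for ctx in sorted({canonical_context(c) for c in parental_contexts()}):
    print(ctx, prob_non_maternal_allele(ctx))
```

The five canonical contexts that come out of this collapse are the five groups discussed above, and only AA|BB and AA|AB give a nonzero chance of a fetal allele that is foreign to the mother.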
Different Implementations of the Presently Disclosed Embodiments
Methods for determining the ploidy state of a target individual are disclosed herein. The target individual may be a blastomere, an embryo, or a fetus. In some embodiments of the present disclosure, a method for determining the ploidy state of one or more chromosomes in a target individual may include any of the steps described in this document, and combinations thereof.
In some embodiments, the source of the genetic material to be used in determining the genetic state of the fetus may be fetal cells, such as nucleated fetal red blood cells, isolated from the maternal blood. The method may involve obtaining a blood sample from the pregnant mother. The method may involve isolating a fetal red blood cell using visual techniques, based on the idea that a certain combination of colors is uniquely associated with nucleated red blood cells, and that a similar combination of colors is not associated with any other cell present in the maternal blood. The combination of colors associated with nucleated red blood cells may include the red color of the hemoglobin around the nucleus, which may be made more distinct by staining, and the color of the nuclear material, which can be stained, for example, blue. Nucleated red blood cells may be identified by isolating the cells from the maternal blood, spreading them over a slide, and then identifying those points at which both red (from the hemoglobin) and blue (from the nuclear material) are seen. The nucleated red blood cells may then be extracted using a micromanipulator, and genotyping and/or sequencing techniques may be used to measure aspects of the genotype of the genetic material in those cells.
In one embodiment, the nucleated red blood cells may be stained with a dye that fluoresces only in the presence of fetal hemoglobin and not in the presence of maternal hemoglobin, thereby removing the ambiguity as to whether a nucleated red blood cell is derived from the mother or the fetus. Some embodiments of the present disclosure may involve staining or otherwise marking nuclear material. Some embodiments of the present disclosure may involve specifically marking fetal nuclear material using fetal-cell-specific antibodies.
There are many other ways to isolate fetal cells from maternal blood, to isolate fetal DNA from maternal blood, or to enrich samples of fetal genetic material in the presence of maternal genetic material. Some of these methods are listed herein, but this is not intended to be an exhaustive list. Some appropriate techniques are listed here for convenience: the use of fluorescently or otherwise tagged antibodies; size exclusion chromatography; magnetically or otherwise labeled affinity tags; epigenetic differences, such as differential methylation between maternal and fetal cells at specific alleles; density gradient centrifugation followed by CD45/14 depletion and CD71-positive selection from CD45/14-negative cells; single or double Percoll gradients with different osmolalities; or the galactose-specific lectin method.
In one embodiment of the present disclosure, the target individual is a fetus, and different genotype measurements are made on a plurality of DNA samples from the fetus. In some embodiments, the fetal DNA samples originate from isolated fetal cells, where the fetal cells may be mixed with maternal cells. In some embodiments, the fetal DNA samples originate from free-floating fetal DNA, where the fetal DNA may be mixed with free-floating maternal DNA. In some embodiments, the fetal DNA samples may originate from maternal plasma or maternal blood that contains a mixture of maternal DNA and fetal DNA. In some embodiments, the fetal DNA may be mixed with maternal DNA in ratios ranging from 99.9:0.1% to 99:1%; 99:1% to 90:10%; 90:10% to 80:20%; 80:20% to 70:30%; 70:30% to 50:50%; 50:50% to 10:90%; 10:90% to 1:99%; or 1:99% to 0.1:99.9%.
The genetic data of the target individual and/or of the related individual may be determined by measuring the appropriate genetic material using tools and/or techniques including, but not limited to, genotyping microarrays and high-throughput sequencing. Some high-throughput sequencing methods include Sanger DNA sequencing, pyrosequencing, the ILLUMINA SOLEXA platform, ILLUMINA's GENOME ANALYZER, or APPLIED BIOSYSTEM's 454 sequencing platform, HELICOS's TRUE SINGLE MOLECULE SEQUENCING platform, HALCYON MOLECULAR, or any other method of sequencing. All of these methods physically transform the genetic data stored in a sample of DNA into a set of genetic data that is typically stored in a memory device en route to being processed.
The genetic data of a relevant individual may be measured by analyzing substances taken from a group including, but not limited to: the individual's bulk diploid tissue, one or more diploid cells from the individual, one or more haploid cells from the individual, one or more blastomeres from the target individual, extracellular genetic material found on the individual, extracellular genetic material from the individual found in maternal blood, cells from the individual found in maternal blood, one or more embryos created from the gamete(s) of the related individual, one or more blastomeres taken from such an embryo, extracellular genetic material found on the related individual, genetic material known to have originated from the related individual, and combinations thereof.
In some embodiments, a set of at least one ploidy state hypothesis may be created for each of the chromosome types of interest of the target individual. Each ploidy state hypothesis may refer to one possible ploidy state of the chromosome or chromosome segment of the target individual. The set of hypotheses may include some or all of the possible ploidy states that the chromosome of the target individual may be expected to have. Possible ploidy states include, but are not limited to, nullsomy, monosomy, disomy, uniparental disomy, euploidy, trisomy, matching trisomy, unmatching trisomy, maternal trisomy, paternal trisomy, tetrasomy, balanced (2:2) tetrasomy, unbalanced (3:1) tetrasomy, pentasomy, hexasomy, other aneuploidy, and combinations thereof. Any of these aneuploidy states may be a mixed or partial aneuploidy, such as unbalanced translocations, balanced translocations, Robertsonian translocations, recombinations, deletions, insertions, crossovers, and combinations thereof.
In some embodiments, knowledge of the determined ploidy state may be used to make a clinical decision. This knowledge, typically stored as a physical arrangement of matter in a memory device, may then be transformed into a report. The report may then be acted upon. For example, the clinical decision may be to terminate the pregnancy; alternatively, the clinical decision may be to continue the pregnancy. In some embodiments, the clinical decision may involve an intervention designed to decrease the severity of the phenotypic presentation of a genetic disorder, or a decision to take relevant steps to prepare for a special needs child.
In one embodiment of the present disclosure, any of the methods described herein may be modified to allow multiple targets to come from the same target individual, for example, multiple blood draws from the same pregnant mother. This may improve the accuracy of the model, since multiple genetic measurements provide more data with which the target genotype may be determined. In one embodiment, one set of target genetic data serves as the primary data that is reported, and the other serves as data for double-checking the primary target genetic data. In one embodiment, a plurality of sets of genetic data, each measured from genetic material taken from the target individual, are considered in parallel, so that all sets of target genetic data help to determine which sections of the parental genetic data, measured with high accuracy, make up the fetal genome.
In one embodiment, the method may be used for the purpose of paternity testing. For example, given SNP-based genotypic information from the mother and from a man who may or may not be the genetic father, together with the genotypic information measured from the mixed sample, it is possible to determine whether the genotypic information of the man does in fact represent the actual genetic father of the gestating fetus. A simple way to do this is to look at the contexts where the mother is AA and the possible father is AB or BB. In these cases, one may expect to see the paternal contribution half of the time (AA|AB) or all of the time (AA|BB), respectively. Taking the expected allele dropout (ADO) into account, it is straightforward to determine whether or not the fetal SNPs that are observed are correlated with those of the possible father.
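The comparison described above can be sketched as a simple log-likelihood ratio over SNPs where the mother is AA. This is a minimal illustration, not the method of the disclosure: the function names are hypothetical, the ADO rate is treated as a single known constant, and `unrelated_rate` (the chance of seeing a B allele in the mixture when the man is not the father) is an assumed parameter that would in practice depend on population allele frequencies.

```python
import math

# Probability that the alleged father transmits B, for contexts where the mother is AA.
TRANSMISSION = {"AA|AB": 0.5, "AA|BB": 1.0}


def expected_detection_rate(context, ado_rate):
    """Chance of observing the paternal B allele, allowing for allele dropout."""
    return TRANSMISSION[context] * (1.0 - ado_rate)


def paternity_log_lr(observations, ado_rate=0.1, unrelated_rate=0.2):
    """Sum of per-SNP log-likelihood ratios: alleged father versus unrelated man.

    observations: iterable of (context, b_detected) pairs.
    unrelated_rate: assumed chance of detecting B if the man is unrelated.
    """
    total = 0.0
    for context, b_detected in observations:
        p_father = expected_detection_rate(context, ado_rate)
        p_father = min(max(p_father, 1e-9), 1.0 - 1e-9)  # keep the logs finite
        if b_detected:
            total += math.log(p_father) - math.log(unrelated_rate)
        else:
            total += math.log(1.0 - p_father) - math.log(1.0 - unrelated_rate)
    return total


consistent = [("AA|BB", True)] * 20 + [("AA|AB", True)] * 10 + [("AA|AB", False)] * 10
print(paternity_log_lr(consistent) > 0)
```

A positive total favors the alleged father and a negative total favors an unrelated man; how large the total must be before a call is made is a separate design choice not addressed here.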
One embodiment of the present disclosure may be as follows: a pregnant woman wants to know whether her fetus is afflicted with Down syndrome and/or whether it will suffer from cystic fibrosis, and she does not wish to bear a child afflicted with either of these conditions. A doctor takes her blood, stains the hemoglobin with one marker so that it appears clearly red, and stains the nuclear material with another marker so that it appears clearly blue. Knowing that maternal red blood cells are typically anuclear, while a high proportion of fetal cells contain a nucleus, the doctor is able to visually isolate a number of nucleated red blood cells by identifying those cells that show both red and blue. The doctor picks these cells up off the slide with a micromanipulator and sends them to a laboratory, which amplifies and genotypes ten individual cells. Using the genetic measurements, the PARENTAL SUPPORT™ method is able to determine that six of the ten cells are maternal blood cells and four of the ten cells are fetal cells. If a child has already been born to the pregnant mother, PARENTAL SUPPORT™ can also be used to determine that the fetal cells are distinct from the cells of the born child, by making reliable allele calls on the fetal cells and showing that they are dissimilar to those of the born child. Note that this method is similar in concept to the paternity testing embodiment of the present disclosure. The genetic data measured from the fetal cells may be of very poor quality, containing many allele dropouts, because of the difficulty of genotyping single cells. The clinician is able to use the measured fetal DNA, together with the reliable DNA measurements of the parents, to infer aspects of the genome of the fetus with high accuracy using PARENTAL SUPPORT™, thereby transforming the genetic data contained in the genetic material of the fetus into the predicted genetic state of the fetus, stored on a computer.
The clinician is able to determine both the ploidy state of the fetus and the presence or absence of a plurality of disease-linked genes of interest. The fetus turns out to be euploid and not a carrier of cystic fibrosis, and the mother decides to continue the pregnancy.
In one embodiment of the present disclosure, a pregnant mother would like to determine whether her fetus is afflicted with any whole-chromosome abnormalities. She goes to her doctor and gives a sample of her blood, and she and her husband give samples of their own DNA from cheek swabs. A laboratory researcher genotypes the parental DNA, using the MDA protocol to amplify the parental DNA and ILLUMINA INFINIUM arrays to measure the genetic data of the parents at a large number of SNPs. The researcher then spins down the blood, takes the plasma, and isolates a sample of free-floating DNA using size exclusion chromatography. Alternatively, the researcher uses one or more fluorescent antibodies, such as one specific to fetal hemoglobin, to isolate nucleated fetal red blood cells. The researcher then takes the isolated or enriched fetal genetic material and amplifies it using a library of 70-mer oligonucleotides appropriately designed such that the two ends of each oligonucleotide correspond to the flanking sequences on either side of a target allele. Upon addition of a polymerase, a ligase, and the appropriate reagents, the oligonucleotides undergo gap-filling circularization, capturing the desired allele. An exonuclease is added and heat-inactivated, and the products are used directly as a template for PCR amplification. The PCR products are sequenced on an ILLUMINA GENOME ANALYZER. The sequence reads are used as input for the PARENTAL SUPPORT™ method, which then predicts the ploidy state of the fetus.
In another embodiment, a couple, in which the mother is pregnant and of advanced maternal age, wants to know whether the gestating fetus has Down syndrome, Turner syndrome, Prader-Willi syndrome, or some other whole-chromosome abnormality. The obstetrician takes a blood draw from the mother and the father. The blood is sent to a laboratory, where a technician centrifuges the maternal sample to isolate the plasma and the buffy coat. The DNA in the buffy coat and in the paternal blood sample is transformed through amplification, and the genetic data encoded in the amplified genetic material is further transformed from molecularly stored genetic data into electronically stored genetic data by running the genetic material on a high-throughput sequencer to measure the parental genotypes. The plasma sample is preferentially enriched at a set of loci using a 5,000-plex hemi-nested targeted PCR method. The mixture of DNA fragments is prepared as a DNA library suitable for sequencing. The DNA is then sequenced using a high-throughput sequencing method, for example the ILLUMINA GAIIx GENOME ANALYZER. The sequencing transforms the information that is encoded molecularly in the DNA into information that is encoded electronically in computer hardware. An informatics-based technique that includes the presently disclosed embodiments, such as PARENTAL SUPPORT™, may be used to determine the ploidy state of the fetus.
This may involve calculating, on a computer, allele count probabilities at the plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome; building, on a computer, a joint distribution model for the expected allele counts at the plurality of polymorphic loci on the chromosome for each ploidy hypothesis; determining, on a computer, a relative probability of each of the ploidy hypotheses using the joint distribution model and the allele counts measured on the prepared sample; and calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability. It is determined that the fetus has Down syndrome. A report is printed out, or sent electronically to the pregnant woman's obstetrician, who conveys the diagnosis to the woman. The woman, her husband, and the doctor sit down and discuss their options. The couple decides to terminate the pregnancy based on the knowledge that the fetus is afflicted with a trisomic condition.
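The sequence of steps above (hypotheses, joint distribution model, relative probabilities, call) can be illustrated with a deliberately simplified sketch. It is not the PARENTAL SUPPORT™ algorithm: it assumes only SNPs in the AA|BB context, a known fetal fraction, independent loci with binomially distributed B-allele counts, and a small error floor. The hypothesis labels, the `FETAL_COPIES` table, and all function names are hypothetical.

```python
import math

# For AA|BB SNPs: (fetal B copies, total fetal copies) under each hypothesis.
FETAL_COPIES = {
    "monosomy_maternal_copy_only": (0, 1),
    "disomy": (1, 2),
    "maternal_trisomy": (1, 3),
    "paternal_trisomy": (2, 3),
}
ERROR_FLOOR = 1e-6  # assumed floor so that log(0) never occurs


def expected_b_fraction(hypothesis, fetal_fraction):
    """Expected fraction of B reads in plasma when the mother is AA."""
    fetal_b, fetal_total = FETAL_COPIES[hypothesis]
    maternal_copies = 2 * (1.0 - fetal_fraction)
    return fetal_fraction * fetal_b / (maternal_copies + fetal_fraction * fetal_total)


def log_binomial_pmf(k, n, p):
    p = min(max(p, ERROR_FLOOR), 1.0 - ERROR_FLOOR)
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1.0 - p)


def call_ploidy(b_counts, depths, fetal_fraction):
    """Return (best hypothesis, relative probability of every hypothesis)."""
    log_like = {
        h: sum(
            log_binomial_pmf(k, n, expected_b_fraction(h, fetal_fraction))
            for k, n in zip(b_counts, depths)
        )
        for h in FETAL_COPIES
    }
    peak = max(log_like.values())
    weights = {h: math.exp(v - peak) for h, v in log_like.items()}
    total = sum(weights.values())
    probabilities = {h: w / total for h, w in weights.items()}
    return max(probabilities, key=probabilities.get), probabilities


best, probs = call_ploidy([50] * 20, [1000] * 20, fetal_fraction=0.10)
print(best, round(probs[best], 3))
```

The joint model here is just a product of per-locus binomials; a fuller treatment would combine all parental contexts and model correlated noise, which this sketch does not attempt.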
In one embodiment, a company may decide to offer a diagnostic technology designed to detect aneuploidy in a gestating fetus from a maternal blood draw. The product may involve a mother presenting to her obstetrician, who may draw her blood. The obstetrician may also collect a genetic sample from the father of the fetus. A clinician may isolate the plasma from the maternal blood and purify the DNA from the plasma. The clinician may also isolate the buffy coat layer from the maternal blood and prepare the DNA from the buffy coat. The clinician may also prepare the DNA from the paternal genetic sample. The clinician may use molecular biology techniques described in this disclosure to append universal amplification tags to the DNA derived from the plasma sample. The clinician may amplify the universally tagged DNA. The clinician may preferentially enrich the DNA by a number of techniques, including capture by hybridization and targeted PCR. The targeted PCR may involve nesting, hemi-nesting or semi-nesting, or any other approach that results in efficient enrichment of the plasma-derived DNA. The targeted PCR may be massively multiplexed, for example with 10,000 primers in one reaction volume, where the primers target SNPs on chromosomes 13, 18, 21, and X, and loci common to both X and Y, and optionally SNPs on other chromosomes as well. The selective enrichment and/or amplification may involve tagging each individual molecule with different tags, molecular barcodes, tags for amplification, and/or tags for sequencing. The clinician may then sequence the plasma sample, and possibly also the prepared maternal and/or paternal DNA. The molecular biology steps may be executed wholly or partly by a diagnostic box. The sequence data may be fed into a single computer, or into another type of computing platform such as may be found in "the cloud". The computing platform may calculate allele counts at the targeted polymorphic loci from the measurements made by the sequencer. The computing platform may create a plurality of ploidy hypotheses pertaining to nullsomy, monosomy, disomy, matched trisomy, and unmatched trisomy for each of chromosomes 13, 18, 21, X and Y. The computing platform may build a joint distribution model for the expected allele counts at the targeted loci on the chromosome for each ploidy hypothesis, for each of the five chromosomes being interrogated. The computing platform may determine a probability that each of the ploidy hypotheses is true using the joint distribution model and the allele counts measured on the preferentially enriched DNA derived from the plasma sample. The computing platform may call the ploidy state of the fetus for each of chromosomes 13, 18, 21, X and Y by selecting the ploidy state corresponding to the relevant hypothesis with the greatest probability. A report containing the called ploidy states may be generated and sent to the obstetrician electronically, displayed on an output device, or delivered to the obstetrician as a printed hard copy. The obstetrician may inform the patient and, optionally, the father of the fetus, and they may decide which clinical options are open to them and which is most desirable.
In another embodiment, a pregnant woman, hereinafter referred to as "the mother", may decide that she wants to know whether or not her fetus is carrying any genetic abnormalities or other conditions. She may want to ensure that there are no gross abnormalities before she is confident to continue the pregnancy. She may go to her obstetrician, who may take a sample of her blood. He may also take a genetic sample, such as a buccal swab, from her cheek. He may also take a genetic sample from the father of the fetus, such as a buccal swab, a sperm sample, or a blood sample. He may send the samples to a clinician. The clinician may enrich the fraction of free-floating fetal DNA in the maternal blood sample. The clinician may enrich the fraction of nucleated fetal blood cells in the maternal blood sample. The clinician may use various aspects of the methods described herein to determine the genetic data of the fetus. That genetic data may include the ploidy state of the fetus and/or the identity of one or more disease-linked alleles in the fetus. A report may be generated summarizing the results of the prenatal diagnosis. The report may be transmitted or mailed to the doctor, who may tell the mother the genetic state of the fetus. The mother may decide to discontinue the pregnancy based on the fact that the fetus has one or more chromosomal or genetic abnormalities, or undesirable conditions. She may also decide to continue the pregnancy based on the fact that the fetus does not have any gross chromosomal or genetic abnormalities, or any genetic conditions of interest.
Another example may involve a pregnant woman who has been artificially inseminated by a sperm donor. She wants to minimize the risk that the fetus she is carrying has a genetic disease. She has blood drawn by a phlebotomist, techniques described in this disclosure are used to isolate nucleated fetal red blood cells, and tissue samples are also collected from the mother and the genetic father. The genetic material from the fetus, the mother, and the father is amplified as appropriate and genotyped using the ILLUMINA INFINIUM BEADARRAY. The methods described herein clean and phase the parental and fetal genotypes with high accuracy and make ploidy calls for the fetus. Phenotypic susceptibilities are predicted from the reconstructed fetal genotype, and a report is generated and sent to the mother's physician so that they can decide which clinical decisions may be best.
In one embodiment, the raw genetic material of the mother and the father is transformed by way of amplification into an amount of DNA that is similar in sequence but larger in quantity. Then, by way of a genotyping method, the genotypic data encoded by the nucleic acids is transformed into genetic measurements that may be stored physically and/or electronically on a memory device, such as those described above. The relevant algorithms that make up the PARENTAL SUPPORT™ algorithm, relevant parts of which are discussed in detail herein, are translated into a computer program using a programming language. Then, through the execution of the computer program on computer hardware, the physically encoded bits and bytes, instead of being arranged in a pattern that represents raw measurement data, are transformed into a pattern that represents a high-confidence determination of the ploidy state of the fetus. The details of this transformation will depend on the data itself and on the computer language and hardware system used to execute the methods described herein. Then, the data that is physically configured to represent a high-quality ploidy determination of the fetus is transformed into a report that may be sent to a health care practitioner. This transformation may be carried out using a printer or a computer display. The report may be a printed copy, on paper or another suitable medium, or it may be electronic. An electronic report may be transmitted, may be physically stored on a memory device at a location on a computer accessible by the health care practitioner, or may be displayed on a screen so that it can be read. In the case of a screen display, the data may be transformed into a readable format by causing a physical transformation of pixels on the display device. The transformation may be accomplished by physically firing electrons at a phosphorescent screen, or by altering an electric charge that physically changes the transparency of a specific set of pixels on a screen that may lie in front of a substrate that emits or absorbs photons.
This transformation may be accomplished by changing the nanoscale orientation of the molecules in a liquid crystal, for example from a nematic to a cholesteric or smectic phase, at a specific set of pixels. This transformation may be accomplished by an electric current that causes photons to be emitted from a specific set of pixels made from a plurality of light-emitting diodes arranged in a meaningful pattern. This transformation may be accomplished in any other way used to display information, such as a computer screen or some other output device, or in any other way of transmitting information. The health care practitioner may then act on the report, such that the data in the report is transformed into an action. The action may be to continue or discontinue the pregnancy, in which case a gestating fetus with a genetic abnormality is transformed into a non-living fetus. The transformations listed herein may be aggregated such that, for example, the genetic material of a pregnant mother and the father is transformed, through a number of steps outlined in this disclosure, into a medical decision consisting of aborting a fetus with genetic abnormalities or of continuing the pregnancy. Alternatively, a set of genotypic measurements may be transformed into a report that helps a physician treat his pregnant patient.
In one embodiment of the present disclosure, the method described herein can be used to determine the ploidy state of a fetus even when the host mother, that is, the woman who is pregnant, is not the biological mother of the fetus she is carrying. In one embodiment of the present disclosure, the method described herein can be used to determine the ploidy state of a fetus using only the maternal blood sample, without the need for a paternal genetic sample.
Some of the mathematics in the presently disclosed embodiments makes hypotheses concerning a limited number of states of aneuploidy. In some cases, for example, only zero, one or two chromosomes are expected to originate from each parent. In some embodiments of the present disclosure, the mathematical derivations can be expanded to take into account other forms of aneuploidy, such as tetrasomy, where three chromosomes originate from one parent, without changing the fundamental concepts of the present disclosure. At the same time, it is possible to focus on a smaller number of ploidy states, for example, only trisomy and disomy. Note that ploidy determinations that indicate a non-whole number of chromosomes may indicate mosaicism in a sample of genetic material.
In some embodiments, the genetic abnormality is a type of aneuploidy, such as Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13), Turner syndrome (45X), Klinefelter's syndrome (a male with two X chromosomes), Prader-Willi syndrome, and DiGeorge syndrome (UPD 15). Congenital disorders such as those listed in the preceding sentence are commonly undesirable, and the knowledge that a fetus is afflicted with one or more phenotypic abnormalities may provide the basis for a decision to terminate the pregnancy, to take the necessary precautions to prepare for the birth of a special needs child, or to take some therapeutic approach meant to lessen the severity of the abnormality.
In some embodiments, the methods described herein can be used at a very early gestational age, for example as early as 4 weeks, as early as 5 weeks, as early as 6 weeks, as early as 7 weeks, as early as 8 weeks, as early as 9 weeks, as early as 10 weeks, as early as 11 weeks, or as early as 12 weeks.
In some embodiments, the methods disclosed herein are used in the context of pre-implantation genetic diagnosis (PGD) for embryo selection during in vitro fertilization, where the target individual is an embryo and the genetic material is obtained from a one- or two-cell biopsy or from a trophectoderm biopsy of a day 5 or day 6 embryo. In the PGD setting, only a small number of cells is tested, typically 1 to 5 but as many as 10, 20 or 50. The total number of starting copies of the A and B alleles (at a SNP) is then trivially determined by the child genotype and the number of cells. In NPD, the number of starting copies is very high, so the allele ratio after PCR is expected to reflect the starting ratio accurately. In PGD, however, the small number of starting copies means that contamination and imperfect PCR efficiency have a non-trivial effect on the allele ratio after PCR. This effect may be more important than depth of read in predicting the variance of the allele ratio measured after sequencing. The distribution of the measured allele ratio given a known child genotype may be generated by Monte Carlo simulation of the PCR process, based on the PCR probe efficiency and the probability of contamination. Given an allele ratio distribution for each possible child genotype, the likelihoods of the various hypotheses can be calculated as described for NIPD.
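The Monte Carlo simulation mentioned above can be sketched as a simple branching process. This is an illustrative model only: each molecule is assumed to duplicate with a fixed probability (`efficiency`) in every cycle, contamination is modeled as at most one stray copy of a random allele added before amplification, and the cycle count is kept small so that the sketch runs quickly. All names and default values are hypothetical.

```python
import random
import statistics


def starting_copies(genotype, n_cells):
    """Starting A and B copies for a biopsy of n_cells with the given genotype."""
    return genotype.count("A") * n_cells, genotype.count("B") * n_cells


def amplify(copies, cycles, efficiency, rng):
    """Each molecule is duplicated with probability `efficiency` per cycle."""
    for _ in range(cycles):
        copies += sum(rng.random() < efficiency for _ in range(copies))
    return copies


def simulate_allele_ratios(a_copies, b_copies, cycles=8, efficiency=0.8,
                           contamination_prob=0.0, trials=200, seed=0):
    """Distribution of A / (A + B) after PCR, by Monte Carlo simulation."""
    if a_copies + b_copies == 0:
        raise ValueError("at least one starting copy is required")
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        a, b = a_copies, b_copies
        if rng.random() < contamination_prob:
            # Assumed contamination model: one stray copy of a random allele.
            if rng.random() < 0.5:
                a += 1
            else:
                b += 1
        a = amplify(a, cycles, efficiency, rng)
        b = amplify(b, cycles, efficiency, rng)
        ratios.append(a / (a + b))
    return ratios


one_cell = simulate_allele_ratios(*starting_copies("AB", 1))
ten_cells = simulate_allele_ratios(*starting_copies("AB", 10))
print(statistics.pstdev(one_cell) > statistics.pstdev(ten_cells))
```

Running the simulation for each possible child genotype gives an empirical allele-ratio distribution per genotype, which could then be used as the likelihood model; the spread is visibly wider for a single cell than for ten cells, consistent with the point made above about small starting copy numbers.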
Maximum Likelihood Estimates
Most methods known in the art for detecting the presence or absence of a biological phenomenon or medical condition involve the use of a single hypothesis rejection test, in which a metric correlated with the condition is measured; if the metric is on one side of a given threshold, the condition is present, while if the metric falls on the other side of the threshold, the condition is absent. A single hypothesis rejection test looks only at the null distribution when deciding between the null and alternative hypotheses. Without taking the alternative distribution into account, one cannot estimate the likelihood of each hypothesis given the observed data, and therefore cannot calculate a confidence in the call. Thus, with a single hypothesis rejection test, one obtains a yes or no answer without any sense of the confidence associated with the specific case.
In some embodiments, the methods disclosed herein can detect the presence or absence of a biological phenomenon or medical condition using a maximum likelihood method. This is a substantial improvement over methods that use a single-hypothesis rejection technique, since the threshold for calling the presence or absence of the condition can be adjusted as appropriate for each case. This is particularly relevant to diagnostic techniques that aim to determine the presence or absence of aneuploidy in a gestating fetus from genetic data available from the mixture of fetal and maternal DNA present in the free-floating DNA found in maternal plasma. This is because the optimal threshold for calling aneuploidy versus euploidy changes as the fraction of fetal DNA in the plasma-derived fraction changes. As the fetal fraction drops, the distribution of data associated with aneuploidy becomes increasingly similar to the distribution of data associated with euploidy.
The maximum likelihood estimation method uses the distribution associated with each hypothesis to estimate the likelihood of the data conditioned on each hypothesis. These conditional probabilities can then be converted into a hypothesis call and a confidence. Similarly, the maximum a posteriori estimation method uses the same conditional probabilities as the maximum likelihood estimate, but also incorporates population priors when choosing the best hypothesis and determining the confidence.
Thus, a maximum likelihood estimation (MLE) technique, or the closely related maximum a posteriori (MAP) technique, offers two advantages: first, it increases the chance of a correct call, and second, it allows a confidence to be calculated for each call. In one embodiment, selecting the ploidy state corresponding to the hypothesis with the greatest probability is carried out using maximum likelihood estimates or maximum a posteriori estimates. In one embodiment, a method is described for determining the ploidy state of a gestating fetus that involves taking any method currently known in the art that uses a single-hypothesis rejection technique and reformulating it so that it uses an MLE or MAP technique. Some examples of methods that can be significantly improved by applying these techniques can be found in U.S. Patent No. 8,008,018, U.S. Patent No. 7,888,017, or U.S. Patent No. 7,332,277.
In one embodiment, a method is described for determining the presence or absence of fetal aneuploidy in a maternal plasma sample comprising fetal and maternal genomic DNA, the method comprising: obtaining a maternal plasma sample; measuring the DNA fragments found in the plasma sample with a high-throughput sequencer; mapping the sequences to the chromosomes and determining the number of sequence reads that map to each chromosome; calculating the fraction of fetal DNA in the plasma sample; calculating an expected distribution of the amount of a target chromosome that would be expected to be present if the target chromosome were euploid, and one or more expected distributions that would be expected if that chromosome were aneuploid, using the fetal fraction and the number of sequence reads that map to one or more reference chromosomes expected to be euploid; and using MLE or MAP to determine which of the distributions is most likely to be correct, thereby indicating the presence or absence of fetal aneuploidy. In one embodiment, measuring the DNA from the plasma may involve massively parallel shotgun sequencing. In one embodiment, measuring the DNA from the plasma sample may involve sequencing DNA that has been preferentially enriched, for example through targeted amplification, at a plurality of polymorphic or non-polymorphic loci. The plurality of loci may be designed to target one or a small number of suspected aneuploid chromosomes and one or a small number of reference chromosomes. The purpose of the preferential enrichment is to increase the number of sequence reads that are informative for the ploidy determination.
Ploidy Calling Informatics Methods
Described herein is a method for determining the ploidy state of a fetus given sequence data. In some embodiments, the sequence data may be measured on a high-throughput sequencer. In some embodiments, the sequence data may be measured on DNA originating from free-floating DNA isolated from maternal blood, wherein the free-floating DNA comprises some DNA of maternal origin and some DNA of fetal/placental origin. This section describes one embodiment in which the ploidy state of the fetus is determined assuming that the fraction of fetal DNA in the analyzed mixture is not known and will be estimated from the data. It also describes an embodiment in which the fraction of fetal DNA in the mixture (the "fetal fraction"), or the percentage of fetal DNA, can be measured by another method and is assumed to be known when determining the ploidy state of the fetus. In some embodiments, the fetal fraction can be calculated using only the genotyping measurements made on the maternal blood sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction may also be calculated using the measured or otherwise known genotype of the mother and/or the measured or otherwise known genotype of the father. In another embodiment, the ploidy state of the fetus can be determined solely on the basis of the calculated fraction of fetal DNA for the chromosome in question, compared with the calculated fraction of fetal DNA for a reference chromosome assumed to be disomic.
In a preferred embodiment, suppose that for a particular chromosome we observe and analyze N SNPs, for which we have:
● A set of NR free-floating DNA sequence measurements S = (s_1, ..., s_NR). Since the method uses SNP measurements, all sequence data corresponding to non-polymorphic loci can be disregarded. In a simplified version, where we have (A, B) counts at each SNP, with A and B corresponding to the two alleles at a given locus, S can be written as S = ((a_1, b_1), ..., (a_N, b_N)), where a_i is the A count at SNP i, b_i is the B count at SNP i, and
$$\sum_{i=1}^{N} (a_i + b_i) = NR$$
● Parent data consisting of:
○ Genotypes from a SNP microarray or other intensity-based genotyping platform: mother M = (m_1, ..., m_N), father F = (f_1, ..., f_N), where m_i, f_i ∈ (AA, AB, BB).
○ AND/OR sequence data measurements: NRM mother measurements SM = (sm_1, ..., sm_NRM), NRF father measurements SF = (sf_1, ..., sf_NRF). Similar to the simplification above, if we have (A, B) counts at each SNP, SM = ((am_1, bm_1), ..., (am_N, bm_N)) and SF = ((af_1, bf_1), ..., (af_N, bf_N)).
Collectively, the mother, father and child data are denoted D = (M, F, SM, SF, S). Parent data are desirable and increase the accuracy of the algorithm, but they are NOT required, especially the father data. This means that even in the absence of mother and/or father data, it is possible to obtain very accurate copy number results.
The best copy number estimate (H*) can be derived by maximizing the data log likelihood LIK(D | H) over all hypotheses (H) considered. Specifically, the relative probability of each ploidy hypothesis is determined using the joint distribution model and the allele counts measured on the prepared sample, and those relative probabilities are used to determine the hypothesis most likely to be correct, as follows:
$$H^* = \arg\max_{H} \mathrm{LIK}(D \mid H)$$
Similarly, the a posteriori probability of a hypothesis given the data can be written as:
$$P(H \mid D) \propto P(D \mid H) \cdot \mathrm{priorprob}(H)$$
where priorprob(H) is the prior probability assigned to each hypothesis H based on the model design and prior knowledge.
It is also possible to use priors to find the maximum a posteriori estimate:
$$H_{MAP} = \arg\max_{H} P(H \mid D)$$
In one embodiment, the copy number hypotheses that may be considered are:
● Monosomy:
○ Maternal H10 (one copy from the mother)
○ Paternal H01 (one copy from the father)
● Disomy: H11 (one copy each from the mother and father)
● Simple trisomy, no crossovers considered:
○ Maternal: H21_matched (two identical copies from the mother, one copy from the father), H21_unmatched (both copies from the mother, one copy from the father)
○ Paternal: H12_matched (one copy from the mother, two identical copies from the father), H12_unmatched (one copy from the mother, both copies from the father)
● Composite trisomy, allowing for crossovers (using a joint distribution model):
○ Maternal H21 (two copies from the mother, one copy from the father)
○ Paternal H12 (one copy from the mother, two copies from the father)
In other embodiments, other ploidy states such as nullisomy (H00), uniparental disomy (H20 and H02), and tetrasomy (H04, H13, H22, H31 and H40) may be considered.
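The selection among these hypotheses by MLE or MAP can be sketched as follows. This is an illustrative sketch only: the function name `call_ploidy`, the example log likelihoods and the example priors are hypothetical, and the confidence is taken here to be the normalized posterior weight of the winning hypothesis, which is one reasonable reading of "hypothesis call and confidence" rather than a formula stated in the source.

```python
import math


def call_ploidy(log_liks, priors=None):
    """Return (best hypothesis, confidence) from per-hypothesis log likelihoods.

    log_liks: dict mapping hypothesis name to LIK(D | H).
    priors:   optional dict of strictly positive prior probabilities; when
              omitted, a flat prior is used and the result is the MLE call.
    """
    if priors is None:
        priors = {h: 1.0 for h in log_liks}
    scores = {h: ll + math.log(priors[h]) for h, ll in log_liks.items()}
    top = max(scores.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {h: math.exp(s - top) for h, s in scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


# Hypothetical log likelihoods for three of the hypotheses listed above.
example = {"H11": -100.0, "H21": -103.0, "H12": -110.0}
mle_call, mle_confidence = call_ploidy(example)
```

Unlike a single-hypothesis rejection test, the returned confidence changes with how close the competing hypotheses are for the particular sample.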
If there are no crossovers, each trisomy, whether it originated in mitosis, meiosis I, or meiosis II, is either a matched or an unmatched trisomy. Because of crossovers, a true trisomy is usually a combination of the two. First, a method of deriving hypothesis likelihoods for simple hypotheses is described. Then a method of deriving hypothesis likelihoods for composite hypotheses is described, which combines the individual SNP likelihoods with crossovers.
LIK(D | H) for a Simple Hypothesis
In one embodiment, LIK(D | H) can be determined for simple hypotheses as follows. For a simple hypothesis H, the log likelihood of H over a whole chromosome, LIK(H), can be calculated as the sum of the log likelihoods of the individual SNPs, assuming a known or derived child fraction cf. In one embodiment, cf can be derived from the data.
$$\mathrm{LIK}(D \mid H) = \sum_{i} \mathrm{LIK}(D \mid H, cf, i)$$
Since this hypothesis does not assume any linkage between SNPs, it does not use a joint distribution model.
In some embodiments, the log likelihood can be determined on a per-SNP basis. At a particular SNP i, assuming fetal ploidy hypothesis H and percent fetal DNA cf, the log likelihood of the observed data D is defined as:
$$\mathrm{LIK}(D \mid H, cf, i) = \log P(D \mid H, cf, i) = \log \sum_{m, f, c} P(D \mid m, f, c, H, cf, i)\, P(c \mid m, f, H)\, P(m \mid i)\, P(f \mid i)$$
where m ranges over the possible true maternal genotypes and f over the possible true paternal genotypes, with m, f ∈ {AA, AB, BB}, and c ranges over the possible child genotypes given the hypothesis H. In particular, for monosomy c ∈ {A, B}, for disomy c ∈ {AA, AB, BB}, and for trisomy c ∈ {AAA, AAB, ABB, BBB}.
Genotype prior frequency: p(m | i) is the general prior probability of maternal genotype m at SNP i, based on the known population frequency at SNP i, denoted pA_i. In particular:
$$p(AA \mid pA_i) = (pA_i)^2, \quad p(AB \mid pA_i) = 2\, pA_i\, (1 - pA_i), \quad p(BB \mid pA_i) = (1 - pA_i)^2$$
The father's genotype probability, p(f | i), can be determined in an analogous manner.
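As a small illustration of the genotype prior, the sketch below computes the three genotype probabilities from a population frequency of the A allele. It assumes Hardy-Weinberg proportions, which is the usual way to turn a population allele frequency into genotype priors; the function name `genotype_priors` is hypothetical.

```python
def genotype_priors(p_a):
    """Prior probabilities of AA, AB and BB given population A-allele frequency p_a.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    return {
        "AA": p_a ** 2,
        "AB": 2 * p_a * (1 - p_a),
        "BB": (1 - p_a) ** 2,
    }


# Priors for a SNP whose A allele has population frequency 0.3.
priors_at_snp = genotype_priors(0.3)
```

The same function can be applied to the father's genotype, since the prior depends only on the population frequency at the SNP.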
True child probability: p(c | m, f, H) is the probability of obtaining true child genotype c given parents m and f and assuming hypothesis H, and it can be calculated easily. For example, p(c | m, f, H) for H11, H21 matched and H21 unmatched is given below.
[Table: p(c | m, f, H) by parental genotypes (m, f) for H11, H21 matched and H21 unmatched]
Data likelihood: P(D | m, f, c, H, cf, i) is the probability of the given data D at SNP i, given the true maternal genotype m, the true paternal genotype f, the true child genotype c, the hypothesis H and the child fraction cf. It can be broken down into the probabilities of the mother, father and child data as follows:
$$P(D \mid m, f, c, H, cf, i) = P(SM \mid m, i)\, P(M \mid m, i)\, P(SF \mid f, i)\, P(F \mid f, i)\, P(S \mid m, c, H, cf, i)$$
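Mother SNP array data likelihood: assuming that the SNP array genotypes are correct, the probability of the mother SNP array genotype data m_i at SNP i, compared with the true genotype m, is simply
$$P(M \mid m, i) = \begin{cases} 1 & \text{if } m_i = m \\ 0 & \text{if } m_i \neq m \end{cases}$$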
Mother sequence data likelihood: the probability of the mother sequence data at SNP i, in the case of counts (am_i, bm_i) with no extra noise or bias involved, is the binomial probability P(SM | m, i) = P_{X|m}(am_i), where X | m ~ Binom(p_m(A), am_i + bm_i) and p_m(A) is defined as:
m         AA     AB     BB     A      B      no call
p_m(A)    1      0.5    0      1      0      0.5
Father data likelihood: a similar equation applies to the father data likelihood. Note that the child genotype can be determined in the absence of parent data, in particular father data. For example, if no father genotype data F are available, P(F | f, i) = 1 can simply be used. If no father sequence data SF are available, P(SF | f, i) = 1 can simply be used.
In some embodiments, the method involves building a joint distribution model for the expected allele counts at a plurality of polymorphic loci on the chromosome for each ploidy hypothesis; one way to accomplish this is described here. Free fetal DNA data likelihood: P(S | m, c, H, cf, i) is the probability of the free fetal DNA sequence data at SNP i, given the true maternal genotype m, the true child genotype c, the child copy number hypothesis H, and the child fraction cf. It is in fact the probability of the sequence data S at SNP i given the true probability of A content at SNP i, μ(m, c, cf, H):
$$P(S \mid m, c, H, cf, i) = P(S \mid \mu(m, c, cf, H), i)$$
For counts, where S_i = (a_i, b_i), with no extra noise or bias in the data,
$$P(S \mid \mu(m, c, cf, H), i) = P_X(a_i)$$
where X ~ Binom(p(A), a_i + b_i) with p(A) = μ(m, c, cf, H). In a more complex case, where the exact alignment and the (A, B) counts per SNP are not known, P(S | μ(m, c, cf, H), i) is a combination of integrated binomials.
True A content probability: μ(m, c, cf, H), the true probability of A content at SNP i in this mother/child mixture, assuming true maternal genotype m, true child genotype c, and overall child fraction cf, is defined as:
$$\mu(m, c, cf, H) = \frac{\#A(m)\,(1 - cf) + \#A(c)\, cf}{n_m\,(1 - cf) + n_c\, cf}$$
where #A(g) is the number of A alleles in genotype g, n_m = 2 is the somy of the mother, and n_c is the ploidy of the child under hypothesis H (1 for monosomy, 2 for disomy, and 3 for trisomy).
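The true A content and the binomial likelihood of the plasma counts can be sketched together as below. This is a minimal sketch under stated assumptions: genotypes are represented as strings such as "AB" or "AAB" (so the child's ploidy is the string length), the A content is taken to be the copy-number-weighted fraction of A alleles in the mother/child mixture, and the function names `true_a_content` and `plasma_likelihood` are hypothetical.

```python
from math import comb


def true_a_content(m, c, cf):
    """Expected fraction of A alleles in a mixture of mother and child DNA.

    m:  maternal genotype string, e.g. "AB" (two alleles).
    c:  child genotype string, e.g. "A", "AB" or "AAB"; its length is the ploidy.
    cf: child fraction, between 0 and 1.
    """
    a_copies = m.count("A") * (1 - cf) + c.count("A") * cf
    all_copies = len(m) * (1 - cf) + len(c) * cf
    return a_copies / all_copies


def plasma_likelihood(a, b, m, c, cf):
    """Binomial probability of a A-reads and b B-reads, with no extra noise or bias."""
    p = true_a_content(m, c, cf)
    return comb(a + b, a) * p ** a * (1 - p) ** b


# Mother AA, disomic child AB, 10% child fraction.
p_disomy = true_a_content("AA", "AB", 0.10)
```

For a trisomic child such as "AAB" the denominator grows with the extra copy, which is what lets the allele counts distinguish copy number hypotheses.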
Using a Joint Distribution Model: LIK(D | H) for a Composite Hypothesis
In some embodiments, the method involves building a joint distribution model for the expected allele counts at a plurality of polymorphic loci on the chromosome for each ploidy hypothesis; one way to accomplish this is described here. In many cases, a trisomy is not purely matched or unmatched because of crossovers, so this section derives results for the composite hypotheses H21 (maternal trisomy) and H12 (paternal trisomy), which combine matched and unmatched trisomy and account for possible crossovers.
In the case of trisomy, if there were no crossovers, the trisomy would be simply a matched or an unmatched trisomy. A matched trisomy is one in which the child inherits two copies of the identical chromosome segment from one parent. An unmatched trisomy is one in which the child inherits one copy of each of the homologous chromosome segments from that parent. Because of crossovers, some segments of a chromosome may have matched trisomy and others unmatched trisomy. This section describes how to build a joint distribution model for the heterozygosity rates of a set of alleles, that is, for the expected allele counts at a number of loci under one or more hypotheses.
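Suppose that at SNP i, LIK(D | H_m, i) is the fit for the matched hypothesis H_m, LIK(D | H_u, i) is the fit for the unmatched hypothesis H_u, and pc(i) is the probability of a crossover between SNPs i-1 and i. The full likelihood can then be calculated as:
$$\mathrm{LIK}(D \mid H) = \log \sum_{E} \exp\big(\mathrm{LIK}(D \mid E, 1{:}N)\big)$$
where LIK(D | E, 1:N) is the likelihood of ending in hypothesis E for SNPs 1:N, and E is the hypothesis at the last SNP, E ∈ (H_m, H_u). Recursively, the following can be calculated:
$$\mathrm{LIK}(D \mid E, 1{:}i) = \mathrm{LIK}(D \mid E, i) + \log\Big(\exp\big(\mathrm{LIK}(D \mid E, 1{:}i-1)\big)\,(1 - pc(i)) + \exp\big(\mathrm{LIK}(D \mid {\sim}E, 1{:}i-1)\big)\, pc(i)\Big)$$
where ~E is the hypothesis other than E (not E), and the hypotheses considered are H_m and H_u. In particular, the likelihood of SNPs 1:i is calculated from the likelihood of SNPs 1 to (i-1) under either the same hypothesis with no crossover or the opposite hypothesis with a crossover, multiplied by the likelihood of SNP i.
For SNP 1, i = 1:
$$\mathrm{LIK}(D \mid E, 1{:}1) = \mathrm{LIK}(D \mid E, 1)$$
For SNP 2, i = 2:
$$\mathrm{LIK}(D \mid E, 1{:}2) = \mathrm{LIK}(D \mid E, 2) + \log\Big(\exp\big(\mathrm{LIK}(D \mid E, 1)\big)\,(1 - pc(2)) + \exp\big(\mathrm{LIK}(D \mid {\sim}E, 1)\big)\, pc(2)\Big)$$
and so on for i = 3:N.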
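The recursion over SNPs can be sketched as a two-state forward pass in log space. This is an illustrative sketch, not the source's implementation: the function name `composite_log_lik` and the helper names are hypothetical, the two end states are combined by summing their likelihoods in probability space, and no prior weighting between matched and unmatched is applied at the first SNP.

```python
import math


def _log(x):
    """Natural log that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else -math.inf


def _logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    hi, lo = max(x, y), min(x, y)
    if hi == -math.inf:
        return -math.inf
    return hi + math.log1p(math.exp(lo - hi))


def composite_log_lik(lik_m, lik_u, pc):
    """Log likelihood of a composite trisomy hypothesis allowing crossovers.

    lik_m[i], lik_u[i]: per-SNP log likelihoods under the matched and
                        unmatched hypotheses.
    pc[i]: crossover probability between SNP i-1 and SNP i (pc[0] is unused).
    """
    end_m, end_u = lik_m[0], lik_u[0]
    for i in range(1, len(lik_m)):
        stay, cross = _log(1.0 - pc[i]), _log(pc[i])
        # Same hypothesis with no crossover, or the other hypothesis with a crossover.
        new_m = lik_m[i] + _logaddexp(end_m + stay, end_u + cross)
        new_u = lik_u[i] + _logaddexp(end_u + stay, end_m + cross)
        end_m, end_u = new_m, new_u
    return _logaddexp(end_m, end_u)
```

With every crossover probability set to zero, the result reduces to combining the two simple (purely matched and purely unmatched) hypotheses, which is a convenient sanity check.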
In some embodiments, the child fraction can be determined. The child fraction may refer to the proportion of sequences in a mixture of DNA that originate from the child. In the context of non-invasive prenatal diagnosis, the child fraction may refer to the proportion of sequences in the maternal plasma that originate from the fetus or from the portion of the placenta with the fetal genotype. It may refer to the child fraction in a sample of DNA that has been prepared from the maternal plasma and that may be enriched in fetal DNA. One purpose of determining the child fraction in a sample of DNA is for use in an algorithm that can make ploidy calls on the fetus; the child fraction can therefore refer to whatever sample of DNA was analyzed by sequencing for the purpose of non-invasive prenatal diagnosis.
Some of the algorithms presented in this disclosure that are part of a method of non-invasive prenatal aneuploidy diagnosis assume a known child fraction, which may not always be the case. In one embodiment, it is possible to find the most likely child fraction by maximizing the likelihood of disomy on selected chromosomes, with or without the presence of parental data.
In particular, suppose that LIK(D | H11, cf, chr) is the log likelihood, as described above, for the disomy hypothesis and child fraction cf on chromosome chr. For the selected chromosomes in Cset (usually 1:16), assumed to be euploid, the full likelihood is:
$$\mathrm{LIK}(cf) = \sum_{chr \in Cset} \mathrm{LIK}(D \mid H11, cf, chr)$$
The most likely child fraction (cf*) is derived as:
$$cf^* = \arg\max_{cf} \mathrm{LIK}(cf)$$
It is possible to use any set of chromosomes. It is also possible to derive the child fraction without assuming euploidy on the reference chromosomes. Using this method, the child fraction can be determined in any of the following situations: (1) one has array data on the parents and shotgun sequencing data on the maternal plasma; (2) one has array data on the parents and targeted sequencing data on the maternal plasma; (3) one has targeted sequencing data on both the parents and the maternal plasma; (4) one has targeted sequencing data on both the mother and the maternal plasma fraction; (5) one has targeted sequencing data on the maternal plasma fraction; (6) other combinations of parental and child fraction measurements.
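Maximizing the disomy likelihood over the child fraction can be sketched as a simple grid search. This is a sketch under stated assumptions: the function name `estimate_child_fraction`, the grid and the caller-supplied `disomy_loglik(cf, chrom)` callable (standing in for LIK(D | H11, cf, chr)) are all hypothetical, and the toy likelihood at the end exists only to make the example runnable.

```python
def estimate_child_fraction(disomy_loglik, chromosomes, grid=None):
    """Return the child fraction that maximizes the summed disomy log likelihood.

    disomy_loglik: callable (cf, chrom) -> log likelihood of the data on
                   chromosome `chrom` under disomy with child fraction cf.
    chromosomes:   chromosomes assumed to be euploid, e.g. range(1, 17).
    grid:          candidate child fractions; defaults to 0.001 .. 0.499.
    """
    if grid is None:
        grid = [k / 1000 for k in range(1, 500)]

    def total(cf):
        return sum(disomy_loglik(cf, chrom) for chrom in chromosomes)

    return max(grid, key=total)


# Toy stand-in likelihood peaked at cf = 0.12 (purely illustrative).
def toy_loglik(cf, chrom):
    return -chrom * (cf - 0.12) ** 2


cf_star = estimate_child_fraction(toy_loglik, range(1, 17))
```

A grid search is used here for transparency; any one-dimensional optimizer could be substituted, since only the summed log likelihood is needed.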
In some embodiments, the informatics method may incorporate data dropouts; this may result in ploidy determinations of higher accuracy. Elsewhere in this disclosure it has been assumed that the probability of obtaining an A is a direct function of the true maternal genotype, the true child genotype, the fraction of the child in the mixture, and the child copy number. It is also possible that maternal or child alleles drop out; for example, instead of measuring the true child genotype AB in the mixture, it may be that only sequences mapping to allele A are measured. The parent dropout rate for genomic Illumina data is denoted d_pg, the parent dropout rate for sequence data d_ps, and the child dropout rate for sequence data d_cs. In some embodiments, the maternal dropout rate can be assumed to be zero and the child dropout rate is relatively low; in this case, the results are not seriously affected by dropouts. In some embodiments, the probability of allele dropouts may be large enough to have a significant effect on the predicted ploidy call. For such cases, allele dropouts are incorporated into the algorithm as follows:
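Parent SNP array data dropouts: for the maternal genomic data M, suppose that the genotype after dropout is m_d. Then
$$P(M \mid m, i) = \sum_{m_d} P(M \mid m_d, i)\, P(m_d \mid m)$$
where, as before, P(M | m_d, i) is 1 if m_i = m_d and 0 otherwise, and P(m_d | m) is the likelihood of genotype m_d after possible dropout given the true genotype m, defined as below for a dropout rate d:
m \ m_d    AA         AB         BB         A          B          no call
AA         (1-d)^2    0          0          2d(1-d)    0          d^2
AB         0          (1-d)^2    0          d(1-d)     d(1-d)     d^2
BB         0          0          (1-d)^2    0          2d(1-d)    d^2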
A similar equation is applied to the paternal SNP array data.
Parent sequence data dropouts: for the maternal sequence data SM,
$$P(SM \mid m, i) = \sum_{m_d} P_{X \mid m_d}(am_i)\, P(m_d \mid m)$$
where P(m_d | m) is defined as in the preceding section and P_{X|m_d}(am_i), a probability from a binomial distribution, is defined as above in the section on parent data likelihood. A similar equation applies to the paternal sequence data.
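Free-floating DNA sequence data dropouts:
$$P(S \mid m, c, H, cf, i) = \sum_{m_d,\, c_d} P(S \mid \mu(m_d, c_d, cf, H), i)\, P(m_d \mid m)\, P(c_d \mid c)$$
where P(S | μ(m_d, c_d, cf, H), i) is as defined in the section on free-floating data likelihood.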
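In one embodiment, p(m_d | m) is the probability of the observed maternal genotype m_d given the true maternal genotype m and the dropout rate d_ps, and p(c_d | c) is the probability of the observed child genotype c_d given the true child genotype c and the dropout rate d_cs. If nA_T is the number of A alleles in the true genotype c and nA_D is the number of A alleles in the observed genotype c_d, where nA_T ≥ nA_D, and similarly nB_T is the number of B alleles in the true genotype c and nB_D is the number of B alleles in the observed genotype c_d, where nB_T ≥ nB_D, and d is the dropout rate, then
$$p(c_d \mid c) = \binom{nA_T}{nA_D}\, d^{\,nA_T - nA_D}\, (1 - d)^{nA_D} \cdot \binom{nB_T}{nB_D}\, d^{\,nB_T - nB_D}\, (1 - d)^{nB_D}$$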
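The dropout probability of an observed genotype given a true genotype can be sketched as below. The sketch assumes that each allele copy drops out independently with the same rate d, so that the surviving A and B copies are each binomially distributed; the function name `dropout_prob` and the use of an empty string for a no-call are hypothetical conventions.

```python
from math import comb


def dropout_prob(true_gt, observed_gt, d):
    """Probability of observing `observed_gt` when the true genotype is `true_gt`.

    Genotypes are strings of 'A' and 'B' (e.g. "AB", "AAB"); an empty string
    represents a no-call. Each allele copy is assumed to drop out
    independently with probability d.
    """
    na_t, nb_t = true_gt.count("A"), true_gt.count("B")
    na_d, nb_d = observed_gt.count("A"), observed_gt.count("B")
    if na_d > na_t or nb_d > nb_t:
        return 0.0  # dropout cannot create alleles
    p_a = comb(na_t, na_d) * d ** (na_t - na_d) * (1 - d) ** na_d
    p_b = comb(nb_t, nb_d) * d ** (nb_t - nb_d) * (1 - d) ** nb_d
    return p_a * p_b


# True genotype AB observed as A only (the B allele dropped out), d = 0.1.
p_ab_to_a = dropout_prob("AB", "A", 0.1)
```

Summing the function over every genotype that can be reached from a given true genotype returns 1, which is a quick check that the dropout model is a proper distribution.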
In one embodiment, the informatics method may incorporate random and consistent bias. In an ideal world there is no per-SNP consistent sampling bias or random noise (beyond the binomial distribution variation) in the number of sequence counts. Specifically, at SNP i, for maternal genotype m, true child genotype c and child fraction cf, if X is the number of A's in the set of (A + B) reads at SNP i, then X behaves as X ~ Binomial(p, A + B), where p = μ(m, c, cf, H) is the true probability of A content.
In one embodiment, the informatics method may incorporate random bias. As is often the case, suppose there is a bias in the measurements, so that the probability of obtaining an A at this SNP equals q, which differs slightly from p as defined above. How much q differs from p depends on the accuracy of the measurement process and a number of other factors, and can be quantified by the standard deviation of q from p. In one embodiment, q can be modeled as having a beta distribution with parameters α and β, chosen so that the mean of the distribution is centered at p with a specified standard deviation s. In particular, X | q ~ Bin(q, D_i), where D_i is the depth of read at SNP i and q ~ Beta(α, β). Setting E(q) = p and V(q) = s^2, the parameters α and β can be derived as α = pN and β = (1 - p)N, where
$$N = \frac{p\,(1 - p)}{s^2} - 1$$
This is the definition of a beta-binomial distribution, in which one samples from a binomial distribution with a variable parameter q, where q follows a beta distribution with mean p. Thus, in a setting with no bias, at SNP i, the parent sequence data (SM) probability assuming the true maternal genotype (m), given the maternal sequence A count at SNP i (am_i) and the maternal sequence B count at SNP i (bm_i), can be calculated as:
$$P(SM \mid m, i) = P_{X \mid m}(am_i), \quad X \mid m \sim \mathrm{Binom}(p_m(A),\ am_i + bm_i)$$
Now, including random bias with standard deviation s, this becomes:
$$X \mid m \sim \mathrm{BetaBinom}(p_m(A),\ am_i + bm_i,\ s)$$
In the case with no bias, the maternal plasma DNA sequence data (S) probability assuming the true maternal genotype (m), the true child genotype (c), the child fraction (cf) and the child hypothesis H, given the free-floating DNA sequence A count at SNP i (a_i) and the free-floating sequence B count at SNP i (b_i), can be calculated as:
$$P(S \mid m, c, cf, H, i) = P_X(a_i)$$
where X ~ Binom(p(A), a_i + b_i) with p(A) = μ(m, c, cf, H), the true probability of A content.
In one embodiment, including random bias with standard deviation s, this becomes X ~ BetaBinom(p(A), a_i + b_i, s), where the amount of extra variation is specified by the deviation parameter s or, equivalently, N. The smaller the value of s (or the larger the value of N), the closer this distribution is to the regular binomial distribution. The amount of bias, i.e. N above, can be estimated from the unambiguous contexts AA|AA, BB|BB, AA|BB and BB|AA, and the estimated N can be used in the probability above. Depending on the behavior of the data, N may be made constant irrespective of the depth of read a_i + b_i, or a function of a_i + b_i, making the bias smaller for larger depths of read.
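A beta-binomial probability parameterized by a mean p and a standard deviation s can be sketched as follows. The sketch matches the first two moments of a beta distribution to p and s^2, which gives α = pN and β = (1 - p)N with N = p(1 - p)/s^2 - 1; the function names `beta_params` and `beta_binom_pmf` are hypothetical, and the sketch requires 0 < p < 1 and s small enough that N is positive.

```python
from math import exp, lgamma


def beta_params(p, s):
    """Beta parameters (alpha, beta) with mean p and standard deviation s."""
    n = p * (1 - p) / s ** 2 - 1
    if n <= 0:
        raise ValueError("s is too large for a beta distribution with mean p")
    return p * n, (1 - p) * n


def _log_beta(x, y):
    """Log of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_binom_pmf(k, n, p, s):
    """P(X = k) for X ~ BetaBinom with n reads, mean fraction p and deviation s."""
    a, b = beta_params(p, s)
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + _log_beta(k + a, n - k + b) - _log_beta(a, b))


# Probability of 6 A-reads out of 20 when p(A) = 0.3 with deviation s = 0.05.
p_six = beta_binom_pmf(6, 20, 0.3, 0.05)
```

As s shrinks, N grows and the probabilities approach those of the plain binomial with the same p, in line with the behavior described above.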
In one embodiment, the informatics method may incorporate consistent per-SNP bias. Because of artifacts of the sequencing process, some SNPs may have consistently lower or higher counts irrespective of the true amount of A content. Suppose that SNP i consistently adds a bias of w_i percent to the number of A counts. In some embodiments, this bias can be estimated from a set of training data derived under the same conditions and added back into the parent sequence data estimate as follows:
$$P(SM \mid m, i) = P_{X \mid m}(am_i), \quad X \mid m \sim \mathrm{BetaBinom}(p_m(A) + w_i,\ am_i + bm_i,\ s)$$
The free-floating DNA sequence data probability can be estimated as:
$$P(S \mid m, c, cf, H, i) = P_X(a_i), \quad X \sim \mathrm{BetaBinom}(p(A) + w_i,\ a_i + b_i,\ s)$$
In some embodiments, the method can be written to take specific account of additional noise, differential sample quality, differential SNP quality, and random sampling bias. An example is given here. The method has been shown to be particularly useful in the context of data generated using the massively multiplexed mini-PCR protocol, and was used in Experiments 7 through 13. The method involves several steps, each of which introduces a different kind of noise and/or bias into the final model:
(1) Suppose the first sample, comprising a mixture of maternal and fetal DNA, contains an original amount of DNA of N₀ molecules, usually in the range of 1,000 to 40,000, where p is the true fraction of reference alleles (% ref).
(2) In the amplification using universal ligation adaptors, assume that N₁ molecules are sampled, usually N₁ ~ N₀/2, and that random sampling bias is introduced by this sampling. The amplified sample contains N₂ molecules, where N₂ >> N₁. Let X₁ represent the amount of reference loci (on a per-SNP basis) among the N₁ sampled molecules; the variation in p₁ = X₁/N₁ introduces a random sampling bias that carries through the rest of the protocol. This sampling bias is included in the model by using a beta-binomial (BB) distribution instead of a simple binomial distribution. The parameter N of the beta-binomial distribution can be estimated later, on a per-sample basis, from training data after adjusting for leakage and amplification bias, on SNPs with 0 < p < 1. Leakage is the tendency of a SNP to be read incorrectly.
(3) The amplification step amplifies any allelic bias, so amplification bias is introduced by possible uneven amplification. Suppose that one allele at a locus is amplified f times and the other allele at that locus is amplified g times, where f = g·e^b and b = 0 indicates no bias. The bias parameter b is centered at 0 and indicates how much more or less the A allele is amplified relative to the B allele at a particular SNP. The parameter b may differ from SNP to SNP and may be estimated on a per-SNP basis, for example from training data.
(4) The sequencing step involves sequencing a sample of the amplified molecules. At this step there may be leakage, which is the situation in which a SNP is read incorrectly. Leakage can result from any number of problems, and can result in a SNP being read not as the correct allele A but as another allele B found at that locus, or as an allele C or D not typically found at that locus. Suppose the sequencing measures the sequence data of N₃ DNA molecules from the amplified sample, where N₃ < N₂. In some embodiments, N₃ may be in the range of 20,000 to 100,000; 100,000 to 500,000; 500,000 to 4,000,000; 4,000,000 to 20,000,000; or 20,000,000 to 100,000,000. Each molecule sampled has probability p_g of being read correctly, in which case it appears correctly as allele A. With probability 1 - p_g the molecule is read incorrectly as an allele unrelated to the original molecule, appearing as allele A with probability p_r, as allele B with probability p_m, or as allele C or D with probability p_o, where p_r + p_m + p_o = 1. The parameters p_g, p_r, p_m and p_o are estimated on a per-SNP basis from training data.
Different protocols may involve similar steps with variations in the molecular biology steps, resulting in different amounts of random sampling, different levels of amplification and different leakage bias. The following model can be applied equally well to each of these cases. The model for the amount of DNA sampled, on a per-SNP basis, is given by:
$$X_3 \sim \mathrm{BetaBinomial}\big(L(F(p, b),\ p_r,\ p_g),\ N \cdot H(p, b)\big)$$
where p is the true amount of reference DNA, b is the per-SNP bias, and, as described above, p_g is the probability of a correct read and p_r is the probability that an incorrectly read molecule nonetheless appears as the correct allele, and:
$$F(p, b) = \frac{p\, e^{b}}{p\, e^{b} + (1 - p)}, \quad H(p, b) = \frac{\big(e^{b} p + (1 - p)\big)^2}{e^{b}}, \quad L(p, p_r, p_g) = p\, p_g + p_r\,(1 - p_g)$$
In some embodiments, the method uses a beta-binomial distribution instead of a simple binomial distribution; this accounts for the random sampling bias. The parameter N of the beta-binomial distribution is estimated on a per-sample basis, as needed. Using the bias corrections F(p, b) and H(p, b) instead of just p accounts for the amplification bias. The bias parameter b is estimated ahead of time on a per-SNP basis from training data.
In some embodiments, the method uses the leakage correction L(p, p_r, p_g) instead of just p; this accounts for the leakage bias, i.e. varying SNP and sample quality. In some embodiments, the parameters p_g, p_r and p_o are estimated ahead of time on a per-SNP basis from training data. In some embodiments, the parameters p_g, p_r and p_o may be updated on the fly with the current sample to account for varying sample quality.
The model described herein is quite general and can account for both differential sample quality and differential SNP quality. Different samples and SNPs are treated differently, as exemplified by the fact that some embodiments use beta-binomial distributions whose mean and variance are functions of the original amount of DNA as well as of sample and SNP quality.
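The way amplification bias and leakage feed into the beta-binomial parameters can be sketched as follows. The closed forms used here are assumptions of this sketch: F follows from one allele being amplified e^b times more than the other, L mixes correct reads with incorrect reads that happen to look like the reference allele, and H is the scaling of the effective N obtained from a first-order (delta-method) approximation. All function names are hypothetical.

```python
import math


def amplified_fraction(p, b):
    """F(p, b): reference fraction after amplification with log-bias b."""
    return p * math.exp(b) / (p * math.exp(b) + (1 - p))


def n_scale(p, b):
    """H(p, b): assumed scaling of the beta-binomial N caused by amplification bias."""
    return (math.exp(b) * p + (1 - p)) ** 2 / math.exp(b)


def leakage_corrected(p, p_r, p_g):
    """L(p, p_r, p_g): observed reference fraction after read errors."""
    return p * p_g + p_r * (1 - p_g)


def model_parameters(p, b, p_r, p_g, n):
    """Mean fraction and effective N of the beta-binomial for one SNP."""
    return leakage_corrected(amplified_fraction(p, b), p_r, p_g), n * n_scale(p, b)


# Hypothetical SNP: unbiased amplification, 90% correct reads, N = 1000.
mean_fraction, effective_n = model_parameters(p=0.5, b=0.0, p_r=0.25, p_g=0.9, n=1000)
```

With b = 0 and p_g = 1 the mean reduces to p and the effective N to the sample-level N, so the corrections vanish when there is neither bias nor leakage.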
Platform Modeling
Consider a single SNP at which the expected allele ratio present in the plasma is r (based on the maternal and fetal genotypes). The expected allele ratio is defined as the expected fraction of A alleles in the combined maternal and fetal DNA. For maternal genotype g_m, child genotype g_c and fetal fraction f, the expected allele ratio is given by Equation 1, assuming that the genotypes are also represented as allele ratios.
$$r = f\, g_c + (1 - f)\, g_m \qquad (1)$$
The observation at the SNP consists of the number of mapped reads with each allele present, n_a and n_b, which sum to the depth of read d. It is assumed that thresholds have already been applied to the mapping probabilities and phred scores, so that the mappings and allele observations can be considered correct. A phred score is a numerical measure of the probability that a particular measurement at a particular base is wrong. In one embodiment, where the base has been measured by sequencing, the phred score can be calculated from the ratio of the dye intensity corresponding to the called base to the dye intensity of the other bases. The simplest model for the observation likelihood is a binomial distribution, which assumes that each of the d reads is drawn independently from a large pool with allele ratio r. Equation 2 describes this model.
$$P(n_a, n_b \mid r) = \binom{n_a + n_b}{n_a}\, r^{\,n_a}\, (1 - r)^{\,n_b} \qquad (2)$$
The binomial model can be extended in a number of ways. When the maternal and fetal genotypes are either all A or all B, the expected allele ratio in plasma is 0 or 1, and the binomial probability is not well defined. In practice, unexpected alleles are sometimes observed. In one embodiment, a corrected allele ratio r̂ = 1/(n_a + n_b) can be used to allow a small number of unexpected alleles. In one embodiment, it is possible to model the rate at which the unexpected allele appears at each SNP and to use the model to correct the expected allele ratio. When the expected allele ratio is not 0 or 1, the observed allele ratio may fail to converge to the expected allele ratio even at sufficiently high depth of read, because of amplification bias or other phenomena. The allele ratio can then be modeled as a beta distribution centered at the expected allele ratio, which leads to a beta-binomial distribution for P(n_a, n_b | r) with higher variance than the binomial.
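The platform model for the response at a single SNP is defined as F(a, b, g_c, g_m, f) (3), the probability of observing n_a = a and n_b = b given the maternal and fetal genotypes, which also depends on the fetal fraction through Equation 1. The functional form of F may be a binomial distribution, a beta-binomial distribution, or a similar function, as discussed above.
$$F(a, b, g_c, g_m, f) = P(n_a = a,\ n_b = b \mid g_c, g_m, f) = P\big(n_a = a,\ n_b = b \mid r(g_c, g_m, f)\big) \qquad (3)$$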
In one embodiment, the child fraction may be determined as follows. A maximum likelihood estimate of the fetal fraction f for a prenatal test can be derived without the use of paternal information. This may be relevant where paternal genetic data are not available, for example where the father of record is not actually the genetic father of the fetus. The fetal fraction is estimated from the set of SNPs at which the maternal genotype is 0 or 1, which leaves only two possible fetal genotypes. S₀ is defined as the set of SNPs with maternal genotype 0 and S₁ as the set of SNPs with maternal genotype 1. The possible fetal genotypes on S₀ are 0 and 0.5, giving the set of possible allele ratios R₀(f) = {0, f/2}. Similarly, R₁(f) = {1 - f/2, 1}. The method can be trivially extended to include SNPs at which the maternal genotype is 0.5, but these SNPs are less informative because of the larger set of possible allele ratios.
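Let N_a0 and N_b0 be the vectors formed by n_as and n_bs for the SNPs s in S₀, and N_a1 and N_b1 similarly for S₁. The maximum likelihood estimate f̂ of f is defined by Equation 4.
$$\hat{f} = \arg\max_{f}\ P(N_{a0}, N_{b0} \mid f)\, P(N_{a1}, N_{b1} \mid f) \qquad (4)$$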
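Assuming that the allele counts at each SNP are independent conditioned on the SNP's plasma allele ratio, the probabilities can be expressed as products over the SNPs in each set (5).
$$P(N_{a0}, N_{b0} \mid f) = \prod_{s \in S_0} P(n_{as}, n_{bs} \mid f), \qquad P(N_{a1}, N_{b1} \mid f) = \prod_{s \in S_1} P(n_{as}, n_{bs} \mid f) \quad (5)$$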

The dependence on f is through the sets of possible allele ratios R₀(f) and R₁(f). The SNP probability P(n_as, n_bs | f) can be approximated by assuming the maximum likelihood genotype conditioned on f. At sufficiently high fetal fraction and depth of read, the maximum likelihood genotype will be selected with high confidence. For example, at a fetal fraction of 10 percent and a depth of read of 1000, consider a SNP where the mother has genotype 0. The expected allele ratios are 0 and 5 percent, which are easily distinguished at a sufficiently high depth of read. Substituting the estimated child genotype into Equation 5 gives the complete Equation (6) for the fetal fraction estimate.
$$\hat{f} = \arg\max_f \left[ \prod_{s \in S_0} \max_{r_s \in R_0(f)} P(n_{as}, n_{bs} \mid r_s) \right] \left[ \prod_{s \in S_1} \max_{r_s \in R_1(f)} P(n_{as}, n_{bs} \mid r_s) \right] \quad (6)$$

The fetal fraction must lie in the range [0, 1], so the optimization can easily be carried out by a constrained one-dimensional search.
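By way of a hedged example, the constrained one-dimensional search described above can be sketched in Python as a grid search over f. The sketch assumes a plain binomial response model, picks the most likely allele ratio at each SNP, and clips ratios with a small error floor `eps` so that expected ratios of 0 or 1 remain well defined. The floor, the grid resolution, and all function names are assumptions made for illustration.

```python
from math import lgamma, log


def log_binomial(n_a, n_b, r, eps=1e-3):
    """Log P(n_a, n_b | r); eps is an assumed floor keeping r away from 0 and 1."""
    r = min(max(r, eps), 1.0 - eps)
    n = n_a + n_b
    return (lgamma(n + 1) - lgamma(n_a + 1) - lgamma(n_b + 1)
            + n_a * log(r) + n_b * log(1.0 - r))


def log_likelihood(f, snps_s0, snps_s1):
    """Log likelihood of f, taking the most likely allele ratio at each SNP."""
    total = 0.0
    for n_a, n_b in snps_s0:            # maternal genotype 0: ratios {0, f/2}
        total += max(log_binomial(n_a, n_b, r) for r in (0.0, f / 2.0))
    for n_a, n_b in snps_s1:            # maternal genotype 1: ratios {1 - f/2, 1}
        total += max(log_binomial(n_a, n_b, r) for r in (1.0 - f / 2.0, 1.0))
    return total


def estimate_fetal_fraction(snps_s0, snps_s1, steps=1000):
    """Grid search over f in [0, 1]; returns the grid point with highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda f: log_likelihood(f, snps_s0, snps_s1))
```

For instance, with counts such as (50, 950) at some SNPs in S₀ and (950, 50) at some SNPs in S₁, the search returns a value close to 0.10, matching the 5 percent allele ratio in the example above.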
In the presence of low depth of read or a high noise level, it may be preferable not to assume the maximum likelihood genotype, which can result in artificially high confidences. Another method is to sum over the possible genotypes at each SNP, which gives the following expression (7) for P(n_a, n_b | f) for a SNP in S₀. The prior probability P(r) can be assumed uniform over R₀(f), or can be based on population frequencies. The extension to group S₁ is trivial.
$$P(n_a, n_b \mid f) = \sum_{r \in R_0(f)} P(n_a, n_b \mid r)\, P(r) \quad (7)$$
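A minimal sketch of this sum over possible genotypes for a SNP in S₀ is shown below. It assumes a binomial response, a uniform prior over the two possible allele ratios, and a small error floor `eps`; the function names and default values are hypothetical.

```python
from math import exp, lgamma, log


def binomial_pmf(n_a, n_b, r, eps=1e-3):
    """P(n_a, n_b | r); eps is an assumed floor keeping r away from 0 and 1."""
    r = min(max(r, eps), 1.0 - eps)
    n = n_a + n_b
    return exp(lgamma(n + 1) - lgamma(n_a + 1) - lgamma(n_b + 1)
               + n_a * log(r) + n_b * log(1.0 - r))


def snp_likelihood_s0(n_a, n_b, f, prior=(0.5, 0.5)):
    """P(n_a, n_b | f) for a SNP with maternal genotype 0, summed over R_0(f)."""
    ratios = (0.0, f / 2.0)
    return sum(p * binomial_pmf(n_a, n_b, r) for r, p in zip(ratios, prior))
```

At a low depth of read, such as one A read out of 20, neither allele ratio is clearly preferred, and the summed form keeps both contributions rather than committing to one genotype.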

In some embodiments, the probabilities can be derived as follows. The confidence is calculated from the data likelihoods of two hypotheses, H_t and H_f. The likelihood of each hypothesis is derived from the response model, the estimated fetal fraction, the maternal genotypes, the alleged father genotypes, the allele population frequencies, and the plasma allele counts.
The following notation is defined:
G_m, G_c: true maternal and child genotypes
G_af, G_tf: true genotypes of the alleged father and of the true father
G(g_c, g_m, g_tf) = P(G_c = g_c | G_m = g_m, G_tf = g_tf): inheritance probabilities
P(g) = P(G_tf = g): population frequency of genotype g at a particular SNP
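Assuming that the observations at each SNP are independent conditioned on the plasma allele ratio, the likelihood of a paternity hypothesis is the product of the likelihoods at the individual SNPs. The following equations derive the likelihood for a single SNP. Equation (8) is a general expression for the likelihood of any hypothesis h, which is then specialized to the cases H_t and H_f.
$$P(n_a, n_b \mid h, G_m = g_m, G_{af} = g_{af}, f) = \sum_{g_c} F(n_a, n_b, g_c, g_m, f)\, P(G_c = g_c \mid G_m = g_m, G_{af} = g_{af}, h) \quad (8)$$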

Under H_t, the alleged father is the true father, and the fetal genotype is inherited from the maternal genotype and the alleged father's genotype according to Equation (9).
$$P(G_c = g_c \mid G_m = g_m, G_{af} = g_{af}, H_t) = G(g_c, g_m, g_{af}) \quad (9)$$

Under H_f, the alleged father is not the true father. The best estimate of the true father's genotype is given by the population frequencies at each SNP. The probabilities of the child genotypes are therefore determined by the known maternal genotype and the population frequencies, as in Equation (10).
$$P(G_c = g_c \mid G_m = g_m, G_{af} = g_{af}, H_f) = \sum_{g_{tf}} G(g_c, g_m, g_{tf})\, P(g_{tf}) \quad (10)$$

The confidence in correct paternity, C_p, is calculated from the product over the SNPs of the two likelihoods using Bayes' rule (11).
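The paternity calculation outlined above can be illustrated with the following Python sketch. It is not the disclosed implementation: it assumes a binomial response model, a plasma allele ratio of (1 − f)·g_m + f·g_c as a stand-in for Equation 1, Mendelian inheritance probabilities, Hardy-Weinberg genotype frequencies, and equal prior probability for the two hypotheses. Genotypes are encoded as the fraction of A alleles (0, 0.5, or 1), and all names are hypothetical.

```python
from math import exp, lgamma, log

GENOTYPES = (0.0, 0.5, 1.0)  # fraction of A alleles in the genotype


def response(n_a, n_b, g_c, g_m, f, eps=1e-3):
    """Binomial stand-in for F(n_a, n_b, g_c, g_m, f)."""
    r = (1.0 - f) * g_m + f * g_c        # assumed form of the plasma allele ratio
    r = min(max(r, eps), 1.0 - eps)      # eps: assumed error floor
    n = n_a + n_b
    return exp(lgamma(n + 1) - lgamma(n_a + 1) - lgamma(n_b + 1)
               + n_a * log(r) + n_b * log(1.0 - r))


def inheritance(g_c, g_m, g_f):
    """Mendelian P(G_c = g_c | G_m = g_m, father genotype g_f)."""
    return {0.0: (1 - g_m) * (1 - g_f),
            0.5: g_m * (1 - g_f) + (1 - g_m) * g_f,
            1.0: g_m * g_f}[g_c]


def population(g, p_a):
    """Hardy-Weinberg genotype frequency for A-allele frequency p_a (assumption)."""
    return {0.0: (1 - p_a) ** 2, 0.5: 2 * p_a * (1 - p_a), 1.0: p_a ** 2}[g]


def snp_likelihoods(n_a, n_b, g_m, g_af, p_a, f):
    """Return the single-SNP likelihoods under H_t and H_f."""
    lik_t = sum(response(n_a, n_b, g_c, g_m, f) * inheritance(g_c, g_m, g_af)
                for g_c in GENOTYPES)
    lik_f = sum(response(n_a, n_b, g_c, g_m, f)
                * sum(inheritance(g_c, g_m, g_tf) * population(g_tf, p_a)
                      for g_tf in GENOTYPES)
                for g_c in GENOTYPES)
    return lik_t, lik_f


def paternity_confidence(snps, f):
    """snps: iterable of (n_a, n_b, g_m, g_af, p_a); equal priors are assumed."""
    log_t = log_f = 0.0
    for n_a, n_b, g_m, g_af, p_a in snps:
        lik_t, lik_f = snp_likelihoods(n_a, n_b, g_m, g_af, p_a, f)
        log_t += log(lik_t)
        log_f += log(lik_f)
    diff = log_f - log_t
    return 0.0 if diff > 700 else 1.0 / (1.0 + exp(diff))
```

Working in log space keeps the product over many SNPs from underflowing; the guard on `diff` avoids an overflow when the data overwhelmingly favor H_f.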

Maximum Likelihood Model Using Percent Fetal Fraction
Determining the ploidy state of a fetus by measuring the free-floating DNA contained in maternal serum, or by measuring the genotypic material in any mixed sample, is a non-trivial task. There are a number of methods, for example performing a read count analysis on the presumption that, if the fetus is trisomic at a particular chromosome, the overall amount of DNA from that chromosome found in the maternal blood will be elevated relative to a reference chromosome. One way to detect trisomy in such a fetus is to normalize the amount of DNA expected for each chromosome, for example according to the number of SNPs in the analysis set that correspond to a given chromosome, or according to the number of uniquely mappable portions of the chromosome. Once the measurements have been normalized, any chromosome for which the amount of DNA measured exceeds a certain threshold is determined to be trisomic. This approach is described in Fan et al., PNAS, 2008; 105(42); pp. 16266-16271, and also in Chiu et al., BMJ 2011; 342:c7401. In the Chiu et al. paper, the normalization was accomplished by calculating a Z score as follows:
Z score for percentage chromosome 21 in test case = ((percentage chromosome 21 in test case) − (mean percentage chromosome 21 in reference controls)) / (standard deviation of percentage chromosome 21 in reference controls).
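For illustration, the Z score above can be computed as in the following Python sketch. The function name is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def chr21_z_score(test_pct, reference_pcts):
    """Z score of the test case's chromosome 21 percentage against reference controls."""
    return (test_pct - mean(reference_pcts)) / stdev(reference_pcts)
```

A call such as `chr21_z_score(1.38, [1.30, 1.32, 1.28, 1.30])`, using made-up percentages, yields a large positive score, which a fixed cut-off would then flag as trisomic.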
These methods determine the ploidy state of the fetus using a single hypothesis rejection method. However, they suffer from some significant shortcomings. Because these methods are invariant to the percentage of fetal DNA in the sample, they use a single cut-off value. As a result, the accuracy of the determinations is not optimal, and cases in which the percentage of fetal DNA in the mixture is relatively low suffer the worst accuracy.
In one embodiment, a method of the present disclosure for determining the ploidy state of a fetus takes into account the fraction of fetal DNA in the sample. In another embodiment, the method involves the use of maximum likelihood estimation. In one embodiment, the method involves calculating the percentage of DNA in a sample that is of fetal or placental origin. In one embodiment, the threshold for calling aneuploidy is adaptively adjusted based on the calculated percentage of fetal DNA. In some embodiments, a method for estimating the percentage of DNA of fetal origin in a mixture of DNA comprises obtaining a mixed sample that contains genetic material from the mother and genetic material from the fetus, obtaining a genetic sample from the father of the fetus, measuring the DNA in the mixed sample, measuring the DNA in the paternal sample, and calculating the percentage of DNA of fetal origin in the mixed sample using the DNA measurements of the mixed sample and of the paternal sample.
In one embodiment of the present disclosure, the fraction of fetal DNA, or the percentage of fetal DNA, in the mixture can be measured. In some embodiments, the fraction can be calculated using only the genotyping measurements made on the maternal plasma sample itself, which is a mixture of fetal and maternal DNA. In some embodiments, the fraction may also be calculated using the measured or otherwise known genotype of the mother and/or the measured or otherwise known genotype of the father. In some embodiments, the percentage of fetal DNA can be calculated using the measurements made on the mixture of maternal and fetal DNA together with knowledge of the parental contexts. In one embodiment, the fraction of fetal DNA can be calculated using population frequencies to adjust the model of the probability of particular allele measurements.
In one embodiment of the present disclosure, a confidence can be calculated for the accuracy of the determination of the ploidy state of the fetus. In one embodiment, the confidence of the hypothesis of greatest likelihood (H_major) can be calculated as (1 − H_major) / Σ(all H). The confidence of a hypothesis can be determined if the distributions of all of the hypotheses are known, and the distributions of all of the hypotheses can be determined if the parental genotype information is known. A confidence for the ploidy determination can be calculated if the expected distribution of data for a euploid fetus and the expected distribution of data for an aneuploid fetus are known, and these expected distributions can be calculated if the parental genotype data are known. In one embodiment, knowledge of the distribution of a test statistic around a normal hypothesis and around an abnormal hypothesis can be used both to determine the reliability of the call and to refine the threshold so as to make a more reliable call. This is particularly useful when the amount and/or percentage of fetal DNA in the mixture is low. It helps avoid the situation in which an aneuploid fetus is called euploid because a test statistic, such as the Z statistic, does not exceed a threshold that was optimized for a higher percentage of fetal DNA.
In one embodiment, a method disclosed herein can be used to determine fetal aneuploidy by determining the number of copies of maternal and fetal target chromosomes in a mixture of maternal and fetal genetic material. The method may involve obtaining maternal tissue containing both maternal and fetal genetic material; in some embodiments, this maternal tissue may be maternal plasma or a tissue isolated from maternal blood. The method may also involve obtaining a mixture of maternal and fetal genetic material from the maternal tissue by processing that tissue. The method may involve distributing the genetic material obtained into a plurality of reaction samples, to randomly provide individual reaction samples that contain a target sequence from a target chromosome and individual reaction samples that do not contain a target sequence from a target chromosome, for example by performing high-throughput sequencing on the sample. The method may involve analyzing the target sequences of genetic material present or absent in the individual reaction samples to provide a first number of binary results representing the presence or absence of a presumably euploid fetal chromosome in the reaction samples and a second number of binary results representing the presence or absence of a possibly aneuploid fetal chromosome in the reaction samples. Either number of binary results may be calculated, for example, by an informatics technique that counts sequence reads mapping to a particular chromosome, a particular region of a chromosome, a particular locus, or a set of loci. The method may involve normalizing the number of binary events based on the chromosome length, the length of the region of the chromosome, or the number of loci in the set. The method may involve calculating an expected distribution of the number of binary results for a presumably euploid fetal chromosome in the reaction samples using the first number. The method may involve calculating an expected distribution of the number of binary results for a presumably aneuploid fetal chromosome in the reaction samples using the first number and an estimated fraction of fetal DNA found in the mixture, for example by multiplying the expected read count distribution for a presumably euploid fetal chromosome by (1 + n/2), where n is the estimated fetal fraction. In some embodiments, the sequence reads may be treated as probabilistic mappings rather than binary results; this would yield higher accuracy but require more computing power. The fetal fraction may be estimated in a number of ways, some of which are described elsewhere herein. The method may involve using a maximum likelihood approach to determine whether the second number corresponds to the possibly aneuploid fetal chromosome being euploid or aneuploid. The method may involve calling the ploidy state of the fetus as the ploidy state corresponding to the hypothesis with the maximum likelihood of being correct given the measured data.
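The maximum likelihood comparison described above can be sketched as follows. The sketch assumes that read counts follow a Poisson distribution and that the two hypotheses have equal prior probability; both are illustrative assumptions, and the function names are hypothetical. Only the scaling of the expected euploid count by (1 + n/2) comes from the text.

```python
from math import exp, lgamma, log


def poisson_log_pmf(k, mean):
    """Log probability of observing k reads when `mean` reads are expected."""
    return k * log(mean) - mean - lgamma(k + 1)


def call_ploidy(observed, expected_euploid, fetal_fraction):
    """Return (call, confidence) by comparing euploid and trisomy likelihoods."""
    expected = {
        "euploid": expected_euploid,
        "trisomy": expected_euploid * (1.0 + fetal_fraction / 2.0),
    }
    log_lik = {h: poisson_log_pmf(observed, m) for h, m in expected.items()}
    best = max(log_lik, key=log_lik.get)
    top = log_lik[best]
    weights = {h: exp(v - top) for h, v in log_lik.items()}
    return best, weights[best] / sum(weights.values())
```

Because the trisomy expectation moves with the fetal fraction, the same excess of reads produces a confident call at a 10 percent fetal fraction but only a weak one at 2 percent, which a single fixed cut-off cannot express.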
It is noted that a maximum likelihood model can be used to increase the accuracy of any method that determines the ploidy state of a fetus. Similarly, a confidence can be calculated for any method that determines the ploidy state of the fetus. The use of a maximum likelihood model would improve the accuracy of any method in which the ploidy determination is made using a single hypothesis rejection technique. A maximum likelihood model can be used for any method in which a likelihood distribution can be calculated for both the normal and abnormal cases. The use of a maximum likelihood model implies the ability to calculate a confidence for a ploidy call.
Additional discussion of methods
In one embodiment, a method disclosed herein uses a quantitative measure of the number of independent observations of each allele at a polymorphic locus, and this does not involve calculating the ratio of the alleles. This differs from methods, such as some microarray-based methods, that provide information about the ratio of two alleles at a locus but do not quantify the number of independent observations of either allele. Some methods known in the art can provide quantitative information about the number of independent observations, but the calculations leading to the ploidy determination use only the allele ratios and do not use the quantitative information. To illustrate the importance of retaining information about the number of independent observations, consider a sample locus with two alleles, A and B. In a first experiment, 20 A alleles and 20 B alleles are observed. In a second experiment, 200 A alleles and 200 B alleles are observed. In both experiments the ratio A/(A+B) is equal to 0.5, but the second experiment conveys more information than the first about the certainty of the frequency of the A or B allele. Rather than using allele ratios, the present method uses the quantitative data to model more accurately the most likely allele frequencies at each polymorphic locus.
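The two experiments can be compared numerically with the short Python sketch below. It uses the standard binomial standard error of an observed proportion as a simple measure of certainty; this choice of measure and the function name are assumptions made for illustration.

```python
from math import sqrt


def ratio_and_standard_error(n_a, n_b):
    """Observed A ratio and its binomial standard error sqrt(r(1 - r)/n)."""
    n = n_a + n_b
    r = n_a / n
    return r, sqrt(r * (1.0 - r) / n)


first = ratio_and_standard_error(20, 20)     # same ratio, fewer observations
second = ratio_and_standard_error(200, 200)  # same ratio, ten times the observations
```

Both experiments give a ratio of 0.5, but the standard error of the second is smaller by a factor of the square root of 10, which is the information lost when only the ratio is retained.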
In one embodiment, the method builds a genetic model that aggregates the measurements from multiple polymorphic loci in order to better distinguish trisomy from disomy and also to determine the type of trisomy. The method additionally incorporates genetic linkage information to improve its accuracy. This is in contrast to some methods known in the art that average allele ratios across all polymorphic loci on a chromosome. The method disclosed herein explicitly models the allele frequency distributions expected in disomy and in trisomy resulting from nondisjunction during meiosis I, nondisjunction during meiosis II, and nondisjunction during mitosis early in fetal development. To illustrate why this is important: in the absence of crossovers, nondisjunction during meiosis I would produce a trisomy in which two different homologues are inherited from one parent, whereas nondisjunction during meiosis II or during mitosis early in fetal development would produce two copies of the same homologue from one parent. Each scenario results in different expected allele frequencies at each polymorphic locus and also at all physically linked loci (i.e., loci on the same chromosome) considered jointly. Crossovers, which result in the exchange of genetic material between homologues, make the inheritance pattern more complex, but the method accommodates this by using genetic linkage information, that is, recombination rate information and the physical distance between loci. To distinguish between meiosis I nondisjunction and meiosis II or mitotic nondisjunction, the method incorporates into the model an increasing probability of crossover as the distance from the centromere increases. Meiosis II and mitotic nondisjunction can be distinguished by the fact that mitotic nondisjunction typically results in identical or nearly identical copies of one homologue, whereas the two homologues present after a meiosis II nondisjunction event often differ because of one or more crossovers during gametogenesis.
In some embodiments, the method may not determine the haplotypes of the parents if disomy is assumed. In one embodiment, in the case of trisomy, the method can make a determination about the haplotypes of one or both parents by using the fact that the plasma carries two copies from one parent, and parental phase information can be determined by noting which two copies have been inherited from the parent in question. In particular, a child can inherit either two copies of the same homologue from the parent (matched trisomy) or both homologues of the parent (unmatched trisomy). At each SNP, the likelihood of the matched trisomy and of the unmatched trisomy can be calculated. A ploidy calling method that does not use a linkage model accounting for crossovers would calculate the overall likelihood of the trisomy as a simple weighted average of the matched and unmatched trisomies over the whole chromosome. However, because of the biological mechanisms that give rise to nondisjunction errors and crossovers, a trisomy can change from matched to unmatched (and vice versa) along a chromosome only where a crossover occurs. The present method takes the likelihood of crossover into account probabilistically, which yields ploidy calls of greater accuracy than methods that do not.
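One way to picture the difference between the linked and unlinked calculations is the two-state chain sketched below in Python. This is an illustrative stand-in, not the disclosed linkage model: it assumes that per-SNP likelihoods for matched and unmatched trisomy are already available, that the state can switch between adjacent SNPs with a given crossover probability, and that the first SNP is matched with probability `prior_matched`. All names are hypothetical.

```python
def trisomy_likelihood_linked(lik_matched, lik_unmatched, crossover_prob,
                              prior_matched=0.5):
    """Forward pass over a chain whose state (matched/unmatched) switches only
    with probability crossover_prob[i] between SNP i and SNP i + 1."""
    m = prior_matched * lik_matched[0]
    u = (1.0 - prior_matched) * lik_unmatched[0]
    for i in range(1, len(lik_matched)):
        c = crossover_prob[i - 1]
        m, u = ((m * (1.0 - c) + u * c) * lik_matched[i],
                (u * (1.0 - c) + m * c) * lik_unmatched[i])
    return m + u


def trisomy_likelihood_unlinked(lik_matched, lik_unmatched, prior_matched=0.5):
    """Per-SNP weighted average of matched and unmatched, ignoring linkage."""
    total = 1.0
    for lm, lu in zip(lik_matched, lik_unmatched):
        total *= prior_matched * lm + (1.0 - prior_matched) * lu
    return total
```

When the crossover probability between every pair of SNPs is set to 0.5, the linked calculation reduces to the unlinked weighted average; with realistic, small crossover probabilities, runs of SNPs that agree on matched or unmatched status reinforce one another.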
In one embodiment, a reference chromosome is used to determine the child fraction and the noise level, either as fixed amounts or as a probability distribution. In one embodiment, the child fraction, noise level, and/or probability distribution is determined using only the genetic information available from the chromosome whose ploidy state is being determined. The method works without a reference chromosome, and also without fixing a particular child fraction or noise level. This is a significant improvement over, and point of differentiation from, methods known in the art in which genetic data from a reference chromosome is necessary to calibrate the child fraction and chromosome behavior.
In one embodiment, given that a reference chromosome is not needed to determine the fetal fraction, the hypothesis is determined as follows:
$$H^{*} = \arg\max_{H} \; \mathrm{LIK}(D \mid H) \cdot \mathrm{priorprob}(H)$$
With an algorithm based on a reference chromosome, one typically assumes that the reference chromosome is disomic, and then either (a) fixes the most likely child fraction cfr and random noise level N based on this assumption and the reference chromosome data:
$$(cfr^{*}, N^{*}) = \arg\max_{cfr, N} \; \mathrm{LIK}(D(\text{ref. chrom}) \mid \text{disomy}, cfr, N)$$
and then reduces
$$\mathrm{LIK}(D \mid H) = \mathrm{LIK}(D \mid H, cfr^{*}, N^{*})$$
or (b) estimates the child fraction and noise level distribution based on this assumption and the reference chromosome data. Specifically, one does not fix a single value for cfr and N, but assigns a probability p(cfr, N) over a wider range of possible cfr, N values:
$$p(cfr, N) \propto \mathrm{LIK}(D(\text{ref. chrom}) \mid \text{disomy}, cfr, N) \cdot \mathrm{priorprob}(cfr, N)$$
where priorprob(cfr, N) is the prior probability of a particular child fraction and noise level, determined by prior knowledge and experiments. If desired, it can simply be uniform over the range of cfr, N. One can then write:
$$\mathrm{LIK}(D \mid H) = \sum_{cfr, N} \mathrm{LIK}(D \mid H, cfr, N) \cdot p(cfr, N)$$
Both of these methods provide excellent results.
Note that in some instances it is not desirable, possible, or feasible to use a reference chromosome. In such a case, the best ploidy call can be derived for each chromosome separately. In particular:
$$\mathrm{LIK}(D \mid H) = \sum_{cfr, N} \mathrm{LIK}(D \mid H, cfr, N) \cdot p(cfr, N \mid H)$$
Here p(cfr, N | H) can be determined as above for each chromosome separately, assuming hypothesis H, rather than only for the reference chromosome assuming disomy. With this method it is possible to keep both the noise and child fraction parameters fixed, to fix either one of the parameters, or to keep both parameters in probabilistic form for each chromosome and each hypothesis.
Measurements of DNA are noisy and/or error prone, especially when the amount of DNA is small or when the DNA is mixed with contaminating DNA. This noise results in less accurate genotypic data and less accurate ploidy calls. In some embodiments, platform modeling or some other method of noise modeling may be used to counter the deleterious effects of noise on the ploidy determination. The present method uses a joint model of both channels, which accounts for the random noise due to the amount of input DNA, the DNA quality, and/or the protocol quality.
This is in contrast to some methods known in the art in which ploidy determinations are made using the ratio of allele intensities at a locus. That approach precludes accurate SNP noise modeling. In particular, errors in the measurements typically do not depend specifically on the measured channel intensity ratio, and relying on the ratio reduces the model to one-dimensional information. Accurate modeling of noise, channel quality, and channel interaction requires a two-dimensional joint model, which cannot be built from allele ratios.
In particular, projecting the two-channel information onto the ratio r, where f(x, y) is r = x/y, does not lend itself to accurate modeling of channel noise and bias. The noise at a particular SNP is not a function of the ratio, i.e., noise(x, y) ≠ f(x, y), but is in fact a joint function of both channels. For example, in the binomial model, the noise of the measured ratio has a variance of r(1 − r)/(x + y), which is not purely a function of r. In such a model, where channel bias or noise is included, suppose that at SNP i the observed channel X value is x = a_i X + b_i, where X is the true channel value and b_i is the additional channel bias and random noise. Likewise, suppose y = c_i Y + d_i. Because (a_i X + b_i)/(c_i Y + d_i) is not a function of X/Y, the observed ratio r = x/y cannot accurately predict the true ratio X/Y or model the residual noise.
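The point that the observed ratio is not a function of the true ratio can be checked with a few lines of Python. The gain and offset values used in the usage note are arbitrary illustrative numbers, and the function name is hypothetical.

```python
def observed_ratio(true_x, true_y, a, b, c, d):
    """Observed x/y when each channel has its own gain (a, c) and offset (b, d)."""
    x = a * true_x + b
    y = c * true_y + d
    return x / y
```

For example, true channel values (200, 100) and (20, 10) share the true ratio 2, yet with a = c = 1 and b = d = 5 they give observed ratios of 205/105 and 25/15, which differ. Only when the offsets are zero and the gains are equal does the observed ratio recover X/Y.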
The method disclosed herein describes an effective way to model noise and bias using joint binomial distributions of all measurement channels individually. The relevant equations can be found elsewhere in this document, in the sections that discuss per-SNP consistent bias, P(good), P(ref | bad), and P(mut | bad), which effectively adjust SNP behavior. In one embodiment, the method uses a beta-binomial distribution, which avoids the limiting practice of relying on the allele ratios only and instead models the behavior based on both channel counts.
In one embodiment, a method disclosed herein can call the ploidy of a gestating fetus from genetic data found in maternal plasma by using all available measurements. In one embodiment, a method disclosed herein can call the ploidy of a gestating fetus from genetic data found in maternal plasma by using measurements from only a subset of parental contexts. Some methods known in the art use only measured genetic data where the parental context is AA|BB, that is, where the parents are both homozygous at a given locus but for different alleles. One problem with this approach is that only a small proportion of polymorphic loci, typically less than 10%, are from the AA|BB context. In one embodiment of a method disclosed herein, the method does not use genetic measurements of the maternal plasma made at loci where the parental context is AA|BB. In one embodiment, the method uses plasma measurements for only those polymorphic loci with the AA|AB, AB|AA, and AB|AB parental contexts.
Some methods known in the art involve averaging allele ratios from SNPs in the AA|BB context, where both parental genotypes are present, and claim to determine the ploidy call from the average allele ratio at these SNPs. Such a method suffers from significant inaccuracy due to differential SNP behavior. Note that it also assumes that both parental genotypes are known. In contrast, in some embodiments, the present method uses a joint channel distribution model that does not assume the presence of either parent and does not assume uniform SNP behavior. In some embodiments, the method accounts for different SNP behavior and weighting. In some embodiments, the method does not require knowledge of one or both parental genotypes. An example of how the method can accomplish this follows:
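In some embodiments, the log likelihood of a hypothesis can be determined on a per-SNP basis. At a particular SNP i, assuming fetal ploidy hypothesis H and percent fetal DNA cf, the log likelihood of the observed data D is defined as:
$$\mathrm{LIK}(D \mid H, i) = \log P(D \mid H, cf, i) = \log \sum_{m, f, c} P(D \mid m, f, c, H, cf, i)\, P(c \mid m, f, H)\, P(m \mid i)\, P(f \mid i)$$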

where m ranges over the possible true maternal genotypes and f over the possible true paternal genotypes, with m, f ∈ {AA, AB, BB}, and c ranges over the possible child genotypes given the hypothesis H. In particular, for monosomy c ∈ {A, B}, for disomy c ∈ {AA, AB, BB}, and for trisomy c ∈ {AAA, AAB, ABB, BBB}. Note that including parental genotype data typically yields more accurate ploidy determinations, but parental genotype data is not necessary for this method to work well.
Some methods known in the art involve averaging allele ratios from SNPs where the mother is homozygous but a different allele is measured in the plasma (either the AA|AB or the AA|BB context), and claim to determine the ploidy call from the average allele ratio at these SNPs. Such a method is intended for cases where the paternal genotype is not available. It is questionable how accurately one can claim that the plasma is heterozygous at a particular SNP without the presence of a homozygous, opposite father BB: at a low child fraction, what looks like the presence of the B allele could be just noise, and what looks like the absence of B could be simple allele dropout in the fetal measurements. Even where the heterozygosity of the plasma can actually be determined, such a method will not be able to distinguish paternal trisomies. In particular, for SNPs where the mother is AA and where some B is measured in the plasma, if the father is GG, the resulting child genotype is AGG, giving an average ratio of 33% A (for a child fraction of 100%). If the father is AG, however, the resulting child genotype could be AGG for a matched trisomy, contributing to the 33% A ratio, or AAG for an unmatched trisomy, drawing the average ratio toward 66% A. Given that many trisomies occur on chromosomes with crossovers, the chromosome as a whole can have anywhere between no unmatched trisomy and all unmatched trisomy, so this ratio can vary anywhere between 33% and 66%. For a plain disomy, the ratio should be around 50%. Without the use of a linkage model or an accurate error model of the average, such a method would miss many cases of paternal trisomy. In contrast, the method disclosed herein assigns parental genotype probabilities to each parental genotype candidate based on the available genotypic information and population frequency, and does not explicitly require parental genotypes. In addition, the method disclosed herein can detect trisomy whether or not parental genotypic data is available, and can compensate by using a linkage model to identify the points of possible crossover from matched to unmatched trisomy.
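The 33% to 66% range quoted above follows from simple counting, as the Python sketch below shows for a child fraction of 100%. The mixing parameter `unmatched_share`, meaning the share of SNPs along the chromosome that behave as unmatched trisomy, and both function names are hypothetical names introduced for this illustration.

```python
def a_fraction(genotype):
    """Fraction of A alleles in a genotype string such as 'AGG' or 'AAG'."""
    return genotype.count("A") / len(genotype)


def expected_a_ratio(unmatched_share):
    """Average A ratio when a share of SNPs is AAG (unmatched) and the rest AGG (matched)."""
    return ((1.0 - unmatched_share) * a_fraction("AGG")
            + unmatched_share * a_fraction("AAG"))
```

A chromosome that is half matched and half unmatched averages to 50% A, which an averaging method cannot tell apart from plain disomy.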
Some methods known in the art claim a method for averaging allele ratios from SNPs where neither the maternal nor the paternal genotype is known, and for determining the ploidy call from the average ratio at these SNPs. However, no method for accomplishing this is disclosed. The method disclosed herein can make accurate ploidy calls in such a situation, and the reduction to practice is described elsewhere in this document, using a joint probability maximum likelihood method and optionally SNP noise and bias models as well as a linkage model.
Some methods known in the art involve averaging allele ratios and claim to determine the ploidy call from the average allele ratio at one or a few SNPs. However, such methods do not use the concept of linkage. The methods disclosed herein do not suffer from these drawbacks.
Using Sequence Length as a Prior to Determine the Origin of DNA
It has been reported that the distribution of sequence lengths differs between maternal and fetal DNA, with fetal DNA generally being shorter. In one embodiment of the present disclosure, prior knowledge in the form of empirical data can be used to construct prior distributions for the expected length of both maternal DNA, P(X | maternal), and fetal DNA, P(X | fetal). Given a new, unidentified DNA sequence of length x, a probability that the sequence is maternal or fetal can be assigned based on the prior likelihood of x given maternal or fetal origin. In particular, if P(x | maternal) > P(x | fetal), the DNA sequence can be classified as maternal with probability P(x | maternal) / [P(x | maternal) + P(x | fetal)], and if P(x | maternal) < P(x | fetal), the DNA sequence can be classified as fetal with probability P(x | fetal) / [P(x | maternal) + P(x | fetal)]. In one embodiment of the present disclosure, distributions of maternal and fetal sequence lengths that are specific to a sample can be determined by considering the sequences that can be assigned as maternal or fetal with high probability, and this sample-specific distribution can then be used as the expected size distribution for that sample.
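A minimal sketch of this length-based classification is given below. The normal length distributions and their parameters in the usage note are placeholders chosen only so the example runs; they are not empirical values from this disclosure, and the function names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used here as a placeholder length model."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def classify_fragment(length, p_len_maternal, p_len_fetal):
    """Return (label, probability) for a fragment of the given length."""
    p_m = p_len_maternal(length)
    p_f = p_len_fetal(length)
    prob_maternal = p_m / (p_m + p_f)
    if p_m > p_f:
        return "maternal", prob_maternal
    return "fetal", 1.0 - prob_maternal
```

With placeholder models such as `lambda x: normal_pdf(x, 170.0, 15.0)` for maternal and `lambda x: normal_pdf(x, 145.0, 15.0)` for fetal fragments, a short fragment is labeled fetal and a long one maternal, each with the probability given by the ratio in the text.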
Variable Read Depth to Minimize Sequencing Cost
In many clinical trials of a diagnostic, for example in Chiu et al., BMJ 2011; 342:c7401, a protocol with a number of parameters is set, and the same protocol is then executed with the same parameters for each of the patients in the trial. When sequencing is used to measure genetic material in order to determine the ploidy state of a fetus gestating in a mother, one pertinent parameter is the number of reads. The number of reads may refer to the number of actual reads, the number of intended reads, fractional lanes, full lanes, or full flow cells on a sequencer. In these studies, the number of reads is typically set at a level that will ensure that all or nearly all of the samples achieve the desired level of accuracy. Sequencing is currently an expensive technology, costing roughly $200 per 5 million mappable reads, and although the price is dropping, any method that allows a sequencing-based diagnostic to operate at a similar level of accuracy with fewer reads will save a considerable amount of money.
The accuracy of a ploidy determination typically depends on a number of factors, including the number of reads and the fraction of fetal DNA in the mixture. The accuracy is typically higher when the fraction of fetal DNA in the mixture is higher. At the same time, the accuracy is typically higher when the number of reads is greater. It is possible to have two cases in which the ploidy state is determined with comparable accuracy, where the first case has a lower fraction of fetal DNA in the mixture than the second and more reads are sequenced in the first case than in the second. The estimated fraction of fetal DNA in the mixture can be used as a guide in determining the number of reads necessary to achieve a given level of accuracy.
In one embodiment of the present disclosure, a set of samples can be run in which different samples are sequenced to different read depths, with the number of reads run on each sample chosen to achieve a given level of accuracy given the calculated fraction of fetal DNA in each mixture. In one embodiment, this may entail making a measurement of the mixed sample to determine the fraction of fetal DNA in the mixture; this estimation of the fetal fraction may be done with sequencing, with TAQMAN, with qPCR, with SNP arrays, or with any method that can distinguish different alleles at a given locus. The need for a fetal fraction estimate can be eliminated by including hypotheses that cover all or a selected set of fetal fractions in the set of hypotheses considered when comparing to the actual measured data. After the fraction of fetal DNA in the mixture has been determined, the number of sequences to be read for each sample can be determined.
In one embodiment of the invention, 100 pregnant women visit each of these OBs and draw their blood into a blood tube containing something that deactivates the anti-lysing agent and / or DNAase. Each of these gives home a kit for the father of their conceived fetus who provided saliva samples. Both sets of genetic material for all 100 pairs are sent back to the lab where the maternal blood is rotated to separate the envelope and plasma. Plasma contains a mixture of maternal DNA and placental DNA. Maternal smoke and fetal blood are genotyped using SNP sequences, and DNA in maternal plasma samples is targeted with SURESELECT hybridization broth. Using probe-down DNA, generate one hundred targeted libraries, one for each of the maternal samples, where each sample is tagged with a different tag. The fractions were removed from each library and each of these fractions was mixed together and added to the two lanes of the ILLUMINA HISEQ DNA Sequence Analyzer in a multiplexed fashion where each lane produced approximately 50 million mappable readings and 100 Generate approximately 100 million mappable readings for each of the multiplexed mixtures or produce approximately one million readings per sample. Sequence readings were used to determine the fraction of fetal DNA in each of these mixtures. Fifty of the samples had more than 15% fetal DAN in the mixture, and one million readings were sufficient to measure the maternal condition of the fetus with 99.9% confidence.
Of the remaining mixtures, 25 had 10 to 15% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ to generate an additional 2 million reads for each sample. The two sets of sequence data for each of the mixtures with 10 to 15% fetal DNA were added together, yielding 3 million reads per sample, sufficient to determine the ploidy state of these fetuses with 99.9% confidence.
Of the remaining mixtures, 13 had 6 to 10% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ to generate an additional 4 million reads for each sample. The two sets of sequence data for each of the mixtures with 6 to 10% fetal DNA were added together, yielding a total of 5 million reads per mixture, sufficient to determine the ploidy state of these fetuses with 99.9% confidence.
Of the remaining mixtures, 8 had 4 to 6% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ to generate an additional 6 million reads for each sample. The two sets of sequence data for each of the mixtures with 4 to 6% fetal DNA were added together, yielding 7 million reads per mixture, sufficient to determine the ploidy state of these fetuses with 99.9% confidence.
The remaining 4 mixtures all had 2 to 4% fetal DNA; a fraction of each of the relevant libraries prepared from these mixtures was multiplexed and run on one lane of the HISEQ to generate an additional 12 million reads for each sample. The two sets of sequence data for each of the mixtures with 2 to 4% fetal DNA were added together, yielding 13 million reads per mixture, sufficient to determine the ploidy state of these fetuses with 99.9% confidence.
This method required six lanes of sequencing on a HISEQ machine to achieve 99.9% accuracy over 100 samples. If the same number of reads had been required for every sample, to ensure that every ploidy determination was made with 99.9% confidence, it would have taken 25 lanes of sequencing; and if a no-call rate or error rate of 4% were tolerated, this could have been achieved with 14 lanes of sequencing.
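By way of illustration only, the lane budget of the tiered scheme described above can be checked with a few lines of arithmetic. The sketch below assumes exactly 50 million mappable reads per lane and uses the per-tier sample counts and read totals given in the preceding paragraphs; the function names are hypothetical. Under these round numbers the tiered scheme comes to just under six lanes, the uniform scheme to roughly 26 lanes (close to the stated 25, since lane yields are only approximate), and the scheme that drops the four lowest-fraction samples to between 13 and 14 lanes.

```python
# Illustrative lane-budget arithmetic; all numbers are taken from the example above.
READS_PER_LANE = 50_000_000  # approximate mappable reads per HISEQ lane

# (number of samples, total reads per sample needed for 99.9% confidence)
TIERS = [
    (50, 1_000_000),   # more than 15% fetal DNA
    (25, 3_000_000),   # 10 to 15% fetal DNA
    (13, 5_000_000),   # 6 to 10% fetal DNA
    (8, 7_000_000),    # 4 to 6% fetal DNA
    (4, 13_000_000),   # 2 to 4% fetal DNA
]


def lanes_tiered(tiers, reads_per_lane=READS_PER_LANE):
    """Lanes needed when each tier is sequenced only as deeply as it requires."""
    total_reads = sum(n * reads for n, reads in tiers)
    return total_reads / reads_per_lane


def lanes_uniform(tiers, reads_per_lane=READS_PER_LANE):
    """Lanes needed when every sample is sequenced to the deepest tier's depth."""
    n_samples = sum(n for n, _ in tiers)
    depth = max(reads for _, reads in tiers)
    return n_samples * depth / reads_per_lane


if __name__ == "__main__":
    print("tiered:", lanes_tiered(TIERS))
    print("uniform, all samples:", lanes_uniform(TIERS))
    # Tolerating a 4% no-call rate: drop the four samples with 2 to 4% fetal DNA.
    print("uniform, lowest tier dropped:", lanes_uniform(TIERS[:-1]))
```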
Use of raw genotyping data
There are a number of ways to accomplish non-invasive prenatal diagnosis (NPD) using fetal genetic information measured on fetal DNA found in maternal blood. Some of these methods involve measuring the fetal DNA using SNP arrays, some involve untargeted sequencing, and some involve targeted sequencing. The targeted sequencing may target SNPs, STRs, other polymorphic loci, non-polymorphic loci, or some combination thereof. Some of these methods may involve using a commercial or proprietary allele caller that calls the identity of the alleles from the intensity data coming from the sensors in the machine performing the measurement. For example, the ILLUMINA INFINIUM system or the AFFYMETRIX GENECHIP microarray system involves beads or microchips with attached DNA sequences that can hybridize to complementary segments of DNA; upon hybridization, there is a change in the fluorescent properties of the sensor molecule that can be detected. There are also sequencing methods, for example, the ILLUMINA SOLEXA GENOME SEQUENCER or the ABI SOLID GENOME SEQUENCER, in which the genetic sequence of DNA fragments is sequenced; upon extension of the strand of DNA complementary to the strand being sequenced, the identity of the extended nucleotide is typically detected via a fluorescent or radioactive tag attached to the complementary nucleotide. In all of these methods, the genotypic or sequencing data is typically determined on the basis of fluorescent or other signals, or the lack thereof. These systems are typically combined with low-level software packages that make specific allele calls (secondary genetic data) from the analog output of the fluorescent or other detection device (primary genetic data).
For example, in the case of a given allele on a SNP array, the software will make a call, for example, that a certain SNP is present or absent if the fluorescent intensity is measured above or below a certain threshold. Similarly, the output of a sequencer is a chromatogram indicating the level of fluorescence detected for each of the dyes, and the software will make a call that a certain base pair is A or T or C or G. High-throughput sequencers typically make a series of such measurements, called a read, that represents the most likely structure of the DNA sequence that was sequenced. The direct analog output of the chromatogram is defined herein as the primary genetic data, and the base pair/SNP calls made by the software are considered herein to be the secondary genetic data. In one embodiment, primary data refers to the raw intensity data that is the unprocessed output of a genotyping platform, where the genotyping platform may refer to a SNP array or to a sequencing platform. Secondary genetic data refers to processed genetic data, where an allele call has been made, or the sequence data has been assigned base pairs, and/or the sequence reads have been mapped to the genome.
Many higher-level applications take advantage of these allele calls, SNP calls and sequence reads, that is, the secondary genetic data produced by the genotyping software. For example, DNA NEXUS, ELAND or MAQ will take the sequencing reads and map them to the genome. For example, in the context of non-invasive prenatal diagnosis, complex informatics such as PARENTAL SUPPORT™ may leverage a large number of SNP calls to determine the genotype of an individual. Also, in the context of preimplantation genetic diagnosis, it is possible to take a set of sequence reads mapped to the genome and, by taking a normalized count of the reads mapped to each chromosome or section of a chromosome, determine the ploidy state of an individual. In the context of non-invasive prenatal diagnosis, it may be possible to take a set of sequence reads measured on DNA present in the maternal plasma and map them to the genome. One may then take a normalized count of the reads mapped to each chromosome, or section of a chromosome, and use that data to determine the ploidy state of an individual. For example, it may be possible to conclude that those chromosomes that have a disproportionately large number of reads are trisomic in the fetus gestating in the mother from whom the blood was drawn.
However, in reality, the initial output of the measuring instrument is an analog signal. When a certain base pair is called by the software associated with the sequencer, for example, when the software calls the base pair a T, the call is in reality the one that the software believes to be most likely. In some cases, however, the call may be of low confidence; for example, the analog signal may indicate that the particular base pair is only 90% likely to be a T and 10% likely to be an A. In another example, the genotype-calling software associated with a SNP array reader may call a certain allele to be G. In reality, however, the underlying analog signal may indicate that it is only 70% likely that the allele is G and 30% likely that the allele is T. In these cases, when higher-level applications use the genotype calls and sequence calls made by the lower-level software, they lose some information. That is, the primary genetic data, as measured directly by the genotyping platform, may be messier than the secondary genetic data determined by the attached software packages, but it contains more information. In mapping secondary genetic data sequences to the genome, many reads are thrown out because some bases are not read with sufficient clarity and/or the mapping is not clear. When primary genetic data sequence reads are used, all or many of the reads that would have been thrown out on conversion to secondary genetic data sequence reads can be used by treating the reads in a probabilistic manner.
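By way of illustration only, the information lost when low-confidence calls are discarded can be seen in a small sketch. The per-base probabilities below are hypothetical, as are the function names and the 0.7 confidence cutoff; the sketch merely contrasts hard calls (secondary data, with low-confidence reads thrown out) against probabilistic counting of every read.

```python
from collections import Counter

# Hypothetical per-read base probabilities at one position, as might be
# derived from the analog signal (primary genetic data).
READS = [
    {"T": 0.90, "A": 0.10},
    {"T": 0.60, "A": 0.40},
    {"A": 0.55, "T": 0.45},
]


def hard_counts(reads, min_confidence=0.0):
    """Count only the most likely base per read; drop low-confidence reads."""
    counts = Counter()
    for probs in reads:
        base, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= min_confidence:
            counts[base] += 1
    return counts


def soft_counts(reads):
    """Accumulate each base's probability, so every read contributes."""
    counts = Counter()
    for probs in reads:
        for base, p in probs.items():
            counts[base] += p
    return counts


if __name__ == "__main__":
    print("hard calls, cutoff 0.7:", dict(hard_counts(READS, 0.7)))
    print("probabilistic counts:", dict(soft_counts(READS)))
```

With the cutoff applied, only one of the three reads survives as a hard call, whereas the probabilistic counts retain the evidence from all three.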
In embodiments herein, the higher-level software does not rely on the allele calls, SNP calls, or sequence reads determined by the lower-level software. Instead, the higher-level software bases its calculations on the analog signals measured directly from the genotyping platform. In one embodiment herein, an informatics-based method such as PARENTAL SUPPORT™ is modified so that its ability to reconstruct the genetic data of the embryo/fetus/child is engineered to use directly the primary genetic data as measured by the genotyping platform. In one embodiment herein, an informatics-based method such as PARENTAL SUPPORT™ can make allele calls and/or chromosome copy number calls using the primary genetic data, without using the secondary genetic data. In one embodiment of the invention, all genetic calls, SNP calls, sequence reads, and sequence mappings are treated in a probabilistic manner by using the raw intensity data measured directly by the genotyping platform, rather than converting the primary genetic data to secondary genetic calls. In one embodiment, the DNA measurements from the prepared sample used to calculate the allele count probabilities and determine the relative probability of each hypothesis comprise primary genetic data.
In some embodiments, the method can increase the accuracy of the genetic data of a target individual by incorporating genetic data of at least one related individual, the method comprising: obtaining primary genetic data specific to the genome of the target individual and genetic data specific to the genome(s) of the related individual(s); creating a set of one or more hypotheses concerning which segments of which chromosomes of the related individual(s) may correspond to those segments in the genome of the target individual; determining the probability of each of the hypotheses given the primary genetic data of the target individual and the genetic data of the related individual(s); and using the probabilities associated with each hypothesis to determine the most likely state of the actual genetic material of the target individual. In some embodiments, the method can determine the number of copies of a segment of a chromosome in the genome of a target individual, the method comprising: creating a set of copy number hypotheses about how many copies of the chromosome segment are present in the genome of the target individual; incorporating primary genetic data of the target individual and genetic information of one or more related individuals into a data set; estimating the characteristics of the platform response associated with the data set, where the platform response may vary from one experiment to another; computing the conditional probability of each copy number hypothesis given the data set and the platform response characteristics; and determining the copy number of the chromosome segment based on the most probable copy number hypothesis.
In one embodiment, the method can determine the ploidy state of at least one chromosome in a target individual, the method comprising: obtaining primary genetic data from the target individual and from one or more related individuals; creating a set of at least one ploidy state hypothesis for each of the chromosomes of the target individual; using one or more expert techniques to determine a statistical probability for each ploidy state hypothesis in the set, for each expert technique used, given the obtained genetic data; combining, for each ploidy state hypothesis, the statistical probabilities as determined by the one or more expert techniques; and determining the ploidy state for each of the chromosomes in the target individual based on the combined statistical probabilities of each of the ploidy state hypotheses. In one embodiment, the method can determine the allelic state in a set of alleles in a target individual, in one or both parents of the target individual, and optionally in one or more related individuals, the method comprising: obtaining primary genetic data from the target individual, from the one or both parents, and from any related individuals; creating a set of at least one allelic hypothesis for the target individual, for the one or both parents, and optionally for the one or more related individuals, where the hypotheses describe possible allelic states in the set of alleles; determining the statistical probability of each allelic hypothesis in the set of hypotheses given the obtained genetic data; and determining the allelic state for each of the alleles in the set of alleles for the target individual, for the one or both parents, and optionally for the one or more related individuals, based on the statistical probabilities of each of the allelic hypotheses.
In some embodiments, the genetic data of the mixed sample may include sequence data, where the sequence data may not map uniquely to the human genome. In some embodiments, the genetic data of the mixed sample may include sequence data, where the sequence data maps to a plurality of locations in the genome, and where each possible mapping is associated with a probability that the given mapping is correct. In some embodiments, the sequence reads are not assumed to be associated with a particular position in the genome. In some embodiments, the sequence reads are associated with a plurality of positions in the genome, each with an associated probability of belonging to that position.
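By way of illustration only, reads that map to a plurality of locations can contribute fractionally to per-chromosome counts instead of being discarded. The sketch below assumes each read arrives as a list of (chromosome, mapping probability) pairs; the data layout, the example values, and the function name are hypothetical.

```python
from collections import defaultdict


def expected_counts(read_mappings):
    """Sum mapping probabilities per chromosome.

    Each read is a list of (chromosome, probability) candidates. The
    probabilities of a read are renormalized so that each read
    contributes a total weight of one.
    """
    totals = defaultdict(float)
    for candidates in read_mappings:
        norm = sum(p for _, p in candidates)
        if norm <= 0:
            continue  # read carries no usable mapping information
        for chrom, p in candidates:
            totals[chrom] += p / norm
    return dict(totals)


if __name__ == "__main__":
    reads = [
        [("chr21", 1.0)],                  # uniquely mapped read
        [("chr21", 0.7), ("chr13", 0.3)],  # ambiguous between two loci
        [("chr18", 0.5), ("chr13", 0.5)],
    ]
    print(expected_counts(reads))
```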
Calculation method for measuring chromosome copy number
In one aspect, the invention features a method of testing for an abnormal distribution of a fetal chromosome by comparing the number of sequence tags that align to different chromosomes (see, e.g., U.S. Patent No. 8,296,076, filed April 20, 2012). As is known in the art, the term "sequence tag" refers to a relatively short (e.g., 15-100) nucleic acid sequence that can be used to identify a certain larger sequence, for example, by being mapped to a chromosome or genomic region or gene. In some embodiments, the method comprises (i) contacting a sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture, where the target loci are from a plurality of different chromosomes, and where the plurality of different chromosomes comprises at least one first chromosome suspected of having an abnormal distribution in the sample and at least one second chromosome presumed to be normally distributed in the sample; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products; (iii) sequencing the amplified products to obtain a plurality of sequence tags aligning to the target loci, where the sequence tags are of sufficient length to be assigned to a specific target locus; (iv) assigning on a computer the plurality of sequence tags to their corresponding target loci; (v) determining on a computer the number of sequence tags aligning to the target loci of the first chromosome and the number of sequence tags aligning to the target loci of the second chromosome; and (vi) comparing the numbers from step (v) to determine the presence or absence of an abnormal distribution of the first chromosome.
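By way of illustration only, steps (iv) through (vi) above can be sketched as a tag-to-locus lookup followed by a per-chromosome count and a comparison. The sketch assumes exact-match assignment of tags to loci (a real pipeline would use an aligner), and the comparison statistic, a ratio of tags per locus, is a hypothetical choice and not the statistic of the cited patent. All names and data are invented for the example.

```python
from collections import Counter


def count_tags_by_chromosome(tags, locus_of_tag, chrom_of_locus):
    """Steps (iv) and (v): assign tags to loci, then count per chromosome."""
    counts = Counter()
    for tag in tags:
        locus = locus_of_tag.get(tag)  # unassignable tags are skipped
        if locus is not None:
            counts[chrom_of_locus[locus]] += 1
    return counts


def per_locus_ratio(counts, n_loci, first, second):
    """Step (vi): compare tags per targeted locus on two chromosomes."""
    return (counts[first] / n_loci[first]) / (counts[second] / n_loci[second])


if __name__ == "__main__":
    locus_of_tag = {"ACGT": "L1", "TTGA": "L2", "GGCC": "L3"}
    chrom_of_locus = {"L1": "chr21", "L2": "chr21", "L3": "chr1"}
    tags = ["ACGT", "TTGA", "ACGT", "GGCC", "GGCC", "NNNN"]
    counts = count_tags_by_chromosome(tags, locus_of_tag, chrom_of_locus)
    n_loci = Counter(chrom_of_locus.values())
    print(dict(counts), per_locus_ratio(counts, n_loci, "chr21", "chr1"))
```

A ratio that departs markedly from the value seen in reference samples would suggest an abnormal distribution of the first chromosome; the size of departure treated as significant is left open here.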
In one aspect, the present invention provides a method for detecting the presence or absence of a fetal aneuploidy by comparing the relative frequencies of target amplicons between chromosomes (see, e.g., PCT Publication No. WO 2012/103031, filed January 23, 2012). In some embodiments, the method comprises (i) contacting a sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different non-polymorphic target loci to produce a reaction mixture, where the target loci are from a plurality of different chromosomes; (ii) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products comprising target amplicons; (iii) quantifying on a computer the relative frequency of the target amplicons from the first and second chromosomes of interest; (iv) comparing on a computer the relative frequencies of the target amplicons from the first and second chromosomes of interest; and (v) identifying the presence or absence of an aneuploidy based on the compared relative frequencies of the first and second chromosomes of interest. In some embodiments, the first chromosome is a chromosome suspected of being euploid. In some embodiments, the second chromosome is a chromosome suspected of being aneuploid.
Combining methods of prenatal diagnosis
There are many methods that can be used for prenatal diagnosis or prenatal screening of aneuploidy or other genetic defects. U.S. Utility Application Serial No. 11/603,406, filed November 28, 2006, U.S. Utility Application Serial No. 12/076,348, filed March 17, 2008, and PCT Application Serial No. PCT/S09/52730 describe one such method, which uses the genetic data of related individuals to increase the accuracy with which the genetic data of a target individual, such as a fetus, is known or estimated. Other methods used for prenatal diagnosis involve measuring the levels of certain hormones in the maternal blood, where these hormones are correlated with various genetic abnormalities. An example of this is the triple test, in which the levels of several (usually two, three, four or five) different hormones are measured in the maternal blood. Where multiple methods are used to determine the likelihood of a given outcome, and none of the methods is definitive in and of itself, it is possible to combine the information given by those methods to make a prediction that is more accurate than that of any individual method. In the triple test, combining the information given by three different hormones can yield a prediction of genetic abnormalities that is more accurate than the individual hormone levels can provide.
Described herein is a method for making more accurate predictions about the genetic state of a fetus, specifically the likelihood of genetic abnormalities in the fetus, which comprises combining predictions of genetic abnormalities in the fetus where those predictions were made using a variety of methods. A "more accurate" method may refer to a method of diagnosing an abnormality that has a lower false negative rate at a given false positive rate. In a preferred embodiment of the invention, one or more of the predictions are made on the basis of genetic data known about the fetus, where the genetic knowledge was determined using the PARENTAL SUPPORT™ method, that is, using genetic data of individuals related to the fetus to determine the genetic data of the fetus with greater accuracy. In some embodiments, the genetic data may include the ploidy state of the fetus. In some embodiments, the genetic data may refer to a set of allele calls on the fetal genome. In some embodiments, some of the predictions may have been made using the triple test. In some embodiments, some of the predictions may have been made using measurements of other hormone levels in the maternal blood. In some embodiments, predictions made by methods considered diagnostic may be combined with predictions made by methods considered screening. In some embodiments, the method comprises measuring the maternal blood level of alpha-fetoprotein (AFP). In some embodiments, the method comprises measuring the maternal blood level of unconjugated estriol (UE₃). In some embodiments, the method comprises measuring the maternal blood level of beta-human chorionic gonadotropin (beta-hCG). In some embodiments, the method comprises measuring the maternal blood level of invasive trophoblast antigen (ITA). In some embodiments, the method comprises measuring the maternal blood level of inhibin. In some embodiments, the method comprises measuring the maternal blood level of pregnancy-associated plasma protein A (PAPP-A).
In some embodiments, the method comprises measuring the maternal blood level of other hormones or maternal serum markers. In some embodiments, some of the predictions may have been made using other methods. In some embodiments, some of the predictions may have been made using a fully integrated test, such as one that combines ultrasound and a blood test at around twelve weeks of pregnancy with a second blood test at around sixteen weeks. In some embodiments, the method comprises measuring the fetal nuchal translucency (NT). In some embodiments, the method comprises using the measured levels of the hormones described above to make predictions. In some embodiments, the method comprises a combination of the methods described above.
There are a number of ways to combine the predictions; for example, the hormone measurements could be converted into multiples of the median (MoM) and then into likelihood ratios (LR). Similarly, other measurements could be converted into LRs using a mixture model of NT distributions. The LRs for NT and the biochemical markers could be multiplied by the age- and gestation-related risk to derive the risk for various conditions, such as trisomy 21. Detection rates (DR) and false positive rates (FPR) could be calculated by taking the proportions of the population with risks above a given risk threshold.
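By way of illustration only, the conversion chain from a hormone measurement to a combined risk can be sketched as follows. The sketch assumes that log10(MoM) is Gaussian in both the affected and unaffected populations and that the markers are independent so that their LRs can be multiplied; the distribution parameters, medians, and prior risk in the example are hypothetical and not taken from this document.

```python
import math


def mom(value, median_for_gestational_age):
    """Multiple of the median for a single marker."""
    return value / median_for_gestational_age


def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def likelihood_ratio(mom_value, affected, unaffected):
    """LR = density under the affected model / density under the unaffected model.

    `affected` and `unaffected` are (mean, sd) of log10(MoM); assumed values.
    """
    x = math.log10(mom_value)
    return gaussian_pdf(x, *affected) / gaussian_pdf(x, *unaffected)


def posterior_risk(prior_risk, likelihood_ratios):
    """Multiply the prior (e.g. age-related) odds by each LR, return a probability."""
    odds = prior_risk / (1.0 - prior_risk)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


if __name__ == "__main__":
    afp_mom = mom(value=24.0, median_for_gestational_age=32.0)  # hypothetical units
    lr_afp = likelihood_ratio(afp_mom, affected=(-0.13, 0.20), unaffected=(0.0, 0.18))
    print(posterior_risk(prior_risk=1 / 300, likelihood_ratios=[lr_afp]))
```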
In one embodiment, the method for calling the ploidy state includes combining the relative probabilities of each of the ploidy hypotheses determined using the joint distribution model and the allele count probabilities with the relative probabilities of each of the ploidy hypotheses calculated using statistical techniques taken from other methods that determine a risk score for a fetus being trisomic, including but not limited to: a read count analysis, a comparison of heterozygosity rates, a statistic that is available only when parental genetic information is used, the probability of normalized genotype signals for certain parent contexts, a statistic calculated using an estimated fetal fraction of the first sample or the prepared sample, and combinations thereof.
Another method could involve a situation with four measured hormone levels, where the probability distribution around these hormones is known: p(x₁, x₂, x₃, x₄ | e) for the euploid case and p(x₁, x₂, x₃, x₄ | a) for the aneuploid case. One could then measure the probability distribution for the DNA measurements, g(y | e) and g(y | a), for the euploid and aneuploid cases, respectively. Assuming that they are independent given the assumption of euploid/aneuploid, one could combine them as p(x₁, x₂, x₃, x₄ | a) g(y | a) and p(x₁, x₂, x₃, x₄ | e) g(y | e), and then multiply each by the prior p(a) and p(e), respectively, given the maternal age. One could then choose the one that is highest.
In one embodiment, the central limit theorem can be invoked to assume that the distribution g(y | a or e) is Gaussian, and the mean and standard deviation can be measured by observing multiple samples. In another embodiment, one could assume that they are not independent given the outcome, and collect enough samples to estimate the joint distribution p(x₁, x₂, x₃, x₄ | a or e).
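By way of illustration only, estimating a Gaussian form of g(y | hypothesis) from reference samples takes little more than a sample mean and standard deviation. The reference values and function names below are hypothetical.

```python
import math
import statistics


def fit_gaussian(samples):
    """Estimate (mean, standard deviation) from reference measurements."""
    return statistics.mean(samples), statistics.stdev(samples)


def gaussian_density(y, mean, sd):
    """Gaussian density, used here as an assumed form of g(y | hypothesis)."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    # Hypothetical DNA-measurement statistic observed in known euploid samples.
    euploid_reference = [0.98, 1.01, 1.00, 0.99, 1.02]
    mean_e, sd_e = fit_gaussian(euploid_reference)
    print(gaussian_density(1.05, mean_e, sd_e))
```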
In one embodiment, the ploidy state of the target individual is determined to be the ploidy state associated with the hypothesis whose probability is greatest. In some cases, one hypothesis will have a normalized, combined probability greater than 90%. Each hypothesis is associated with one ploidy state, or a set of ploidy states, and the ploidy state associated with the hypothesis whose normalized, combined probability is greater than 90% may be called as the determined ploidy state; some other threshold value, such as 50%, 80%, 95%, 98%, 99%, or 99.9%, may instead be chosen as the threshold required for a hypothesis to be called.
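By way of illustration only, combining the hormone likelihoods, the DNA likelihoods, and the age-related prior, and then calling the hypothesis only when its normalized probability clears a threshold, can be sketched as below. The sketch assumes conditional independence of the hormone and DNA measurements given the hypothesis; all numerical values and names are hypothetical.

```python
def call_ploidy(hormone_lik, dna_lik, prior, threshold=0.90):
    """Return (call, posterior).

    hormone_lik[h] stands for p(x1..x4 | h), dna_lik[h] for g(y | h), and
    prior[h] for p(h) given maternal age. The call is None when no
    hypothesis reaches the threshold.
    """
    joint = {h: hormone_lik[h] * dna_lik[h] * prior[h] for h in prior}
    total = sum(joint.values())
    posterior = {h: v / total for h, v in joint.items()}
    best = max(posterior, key=posterior.get)
    call = best if posterior[best] >= threshold else None
    return call, posterior


if __name__ == "__main__":
    call, posterior = call_ploidy(
        hormone_lik={"euploid": 0.8, "aneuploid": 0.3},
        dna_lik={"euploid": 0.05, "aneuploid": 0.9},
        prior={"euploid": 0.99, "aneuploid": 0.01},
    )
    print(call, posterior)
```

Raising the threshold trades a higher no-call rate for fewer erroneous calls.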
DNA from children of previous pregnancies present in maternal blood
One difficulty with non-invasive prenatal diagnosis is differentiating fetal cells of the current pregnancy from fetal cells of previous pregnancies. Some believe that genetic material from prior pregnancies disappears after some time, but conclusive evidence has not been shown. In embodiments herein, fetal DNA of paternal origin present in the maternal blood (that is, DNA that the fetus inherited from the father) can be determined using the PARENTAL SUPPORT™ (PS) method and knowledge of the paternal genome. This method may utilize phased parental genetic information. It is possible to phase the parental genotype from unphased genotypic information using grandparental genetic data (such as measured genetic data from a sperm of the grandfather), genetic data from other born children, or a sample from a miscarriage. Unphased genetic information can also be phased by HapMap-based phasing or by haplotyping of paternal cells. Successful haplotyping has been demonstrated by arresting cells at the phase of mitosis when the chromosomes are tight bundles and using microfluidics to isolate the chromosomes in separate wells. In another embodiment, it is possible to use the phased parental haplotype data to detect the presence of more than one homolog from the father, which implies that genetic material from more than one child is present in the blood. By focusing on chromosomes that are expected to be euploid in the fetus, one can rule out the possibility that the fetus is afflicted with a trisomy. It is also possible to determine whether the fetal DNA does not originate from the current father, in which case other methods, such as the triple test, can be used to predict genetic abnormalities.
There may be other sources of fetal genetic material available through methods other than a blood draw. In the case of fetal genetic material available in maternal blood, there are two main categories: (1) whole fetal cells, for example, nucleated fetal red blood cells or erythroblasts, and (2) free-floating fetal DNA. In the case of whole fetal cells, there is some evidence that fetal cells can persist in the maternal blood for an extended period of time, such that it is possible to isolate from a pregnant woman a cell that contains the DNA of a child or fetus from a previous pregnancy. There is also evidence that free-floating fetal DNA is cleared from the system in a matter of weeks. One challenge is determining the identity of the individual whose genetic material is contained in the cell, namely, ensuring that the measured genetic material does not originate from a fetus of a previous pregnancy. In embodiments of the present application, knowledge of the maternal genetic material can be used to ensure that the genetic material in question is not maternal genetic material. There are a number of ways to accomplish this goal, including informatics-based methods such as PARENTAL SUPPORT™, as described in this document or in any of the patents referenced in this document.
In one embodiment of the invention, the blood drawn from a pregnant mother can be separated into a fraction comprising free-floating fetal DNA and a fraction comprising nucleated red blood cells. The free-floating DNA can optionally be enriched, and the genotypic information of the DNA can be measured. From the measured genotypic information of the free-floating DNA, knowledge of the maternal genotype can be used to determine aspects of the fetal genotype. These aspects may refer to the ploidy state and/or a set of allele identities. Individual nucleated red blood cells can then be genotyped using methods described elsewhere in this document and in other referenced patents, especially those mentioned in the first section of this document. Knowledge of the maternal genome allows one to determine whether or not any given single blood cell is genetically maternal. The aspects of the fetal genotype determined as described above allow one to determine whether a single blood cell is genetically derived from the fetus that is currently gestating. In essence, this aspect of the present invention allows one to use the genetic information of the mother, and possibly the genetic information of other related individuals such as the father, together with the measured genetic information of the free-floating DNA found in the maternal blood, to determine whether an isolated nucleated cell found in the maternal blood is (a) genetically maternal, (b) genetically from the fetus currently gestating, or (c) genetically from a fetus of a previous pregnancy.
Prenatal sex chromosome aneuploidy determination
In methods known in the art, those attempting to determine the sex of a gestating fetus from the maternal blood have used the fact that fetal free-floating DNA (fffDNA) is present in the maternal plasma. If Y-specific loci can be detected in the maternal plasma, this implies that the gestating fetus is male. However, with methods known in the prior art, the failure to detect Y-specific loci in the plasma does not always guarantee that the gestating fetus is female, because in some cases the amount of fffDNA is too low to ensure that the Y-specific loci would be detected in the case of a male fetus.
Here, we describe a novel method that does not require the measurement of Y-specific nucleic acids, that is, DNA from loci that are exclusively paternally derived. The PARENTAL SUPPORT method described above uses crossover frequency data, parental genotypic data, and informatics techniques to determine the ploidy state of a gestating fetus. The sex of the fetus is simply the ploidy state of the fetus at the sex chromosomes. A child that is XX is female, and a child that is XY is male. The method described herein is also able to determine the ploidy state of the fetus. Note that sexing is effectively synonymous with ploidy determination of the sex chromosomes; in the case of sexing, an assumption is often made that the child is euploid, so there are fewer possible hypotheses.
The method disclosed herein involves looking at loci that are common to both the X and Y chromosomes to create a baseline in terms of the expected amount of fetal DNA present for a fetus. Then, those regions that are specific only to the X chromosome can be interrogated to determine whether the fetus is female or male. In the case of a male, we expect to see less fetal DNA from loci that are specific to the X chromosome than from loci that are common to both the X and the Y. In contrast, in female fetuses, we expect the amount of DNA for each of these groups to be the same. The DNA in question can be measured by any technique capable of quantifying the amount of DNA present in a sample, such as qPCR, SNP arrays, genotyping arrays, or sequencing. For DNA that originates exclusively from one individual, we would expect to find the following:

In the case of DNA from a fetus mixed with DNA from the mother, where the fraction of fetal DNA in the mixture is F and the fraction of maternal DNA in the mixture is M, such that F + M = 100%, we would expect to find the following:

If F and M are known, the expected ratios can be computed, and the observed data can be compared to the expected data. If M and F are not known, a threshold can be selected based on historical data. In both cases, the measured amount of DNA at loci common to both X and Y can be used as a baseline, and the test for the sex of the fetus can be based on the amount of DNA observed at loci specific to the X chromosome only. If that amount is lower than the baseline by an amount roughly equal to ½F, or by an amount that causes it to fall below a predefined threshold, the fetus is determined to be male; if that amount is about equal to the baseline, or is not lower by an amount that causes it to fall below the predefined threshold, the fetus is determined to be female.
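By way of illustration only, the comparison against the baseline can be sketched as follows for the case where the fetal fraction F is known. With M + F = 1, a male fetus contributes one X copy instead of two, so the X-specific amount relative to the baseline is expected to be about 1 - F/2, while for a female fetus it is expected to be about 1. The decision rule used here, a cutoff midway between the two expected ratios, is a hypothetical choice; the function names are likewise hypothetical.

```python
def expected_x_ratio(fetal_fraction, fetus_is_male):
    """Expected (X-specific amount) / (baseline amount at loci common to X and Y)."""
    return 1.0 - fetal_fraction / 2.0 if fetus_is_male else 1.0


def call_sex(x_specific_amount, baseline_amount, fetal_fraction):
    """Call 'male' when the observed ratio falls below the midpoint cutoff.

    The midpoint between the male and female expectations is an assumed
    threshold; a threshold from historical data could be used instead.
    """
    observed = x_specific_amount / baseline_amount
    cutoff = 1.0 - fetal_fraction / 4.0
    return "male" if observed < cutoff else "female"


if __name__ == "__main__":
    # Hypothetical per-locus normalized amounts with a 10% fetal fraction.
    print(call_sex(x_specific_amount=95.2, baseline_amount=100.0, fetal_fraction=0.10))
    print(call_sex(x_specific_amount=99.6, baseline_amount=100.0, fetal_fraction=0.10))
```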
In another embodiment, one can look only at those loci that are common to both the X and Y chromosomes, often termed the Z chromosome. A subset of the loci on the Z chromosome is typically always A on the X chromosome and B on the Y chromosome. If SNPs from the Z chromosome are found to have the B genotype, the fetus is called male; if the SNPs from the Z chromosome are found to have only the A genotype, the fetus is called female. In another embodiment, one can look at loci that are found only on the X chromosome. Contexts such as AA | B are particularly informative, because the presence of a B indicates that the fetus has an X chromosome from the father. Contexts such as AB | B are also informative, because we expect to see B present only half as often in the case of a female fetus as in the case of a male fetus. In another embodiment, one can look at SNPs on the Z chromosome where both A and B alleles are present on both the X and Y chromosomes, and where it is known which SNPs originate from the paternal Y chromosome and which originate from the paternal X chromosome.
In one embodiment, it is possible to amplify single nucleotide positions known to vary between the homologous non-recombining (HNR) regions shared by chromosome Y and chromosome X. The sequence within the HNR region of interest is largely identical between the X and Y chromosomes. Within this identical region there are single nucleotide positions that, while invariant among X chromosomes and among Y chromosomes in the population, differ between the X and Y chromosomes. Each PCR assay can amplify a sequence from a locus present on both the X and Y chromosomes. Within each amplified sequence there is a single base that can be detected using sequencing or some other method.
In one embodiment, the sex of the fetus can be determined from the free-floating fetal DNA found in maternal plasma, the method comprising some or all of the following steps: 1) designing PCR (regular or mini-PCR, optionally multiplexed) primers to amplify X/Y-variant single nucleotide positions within the HNR region; 2) obtaining maternal plasma; 3) PCR-amplifying the targets from the maternal plasma using the HNR X/Y PCR assays; 4) sequencing the amplicons; and 5) examining the sequence data for the presence of the Y-allele within one or more of the amplified sequences. The presence of one or more Y-alleles indicates a male fetus. The absence of the Y-allele from all amplicons indicates a female fetus.
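Step 5 above can be illustrated with a short Python sketch. The data layout (a mapping from amplicon name to X-allele and Y-allele read counts), the function name `call_sex_from_hnr`, and the `min_y_reads` noise cutoff are assumptions made for illustration; the disclosure does not specify a read-count cutoff.

```python
def call_sex_from_hnr(amplicon_counts, min_y_reads=1):
    """Call fetal sex from HNR amplicon read counts (hypothetical helper).

    amplicon_counts -- dict mapping amplicon name to a dict with the number
                       of reads carrying the X-allele ("X") and the Y-allele
                       ("Y") at the X/Y-variant position
    min_y_reads     -- assumed minimum number of Y-allele reads for an
                       amplicon to count as Y-positive (guards against
                       sequencing errors; not taken from the source)
    """
    y_positive = [
        name
        for name, counts in amplicon_counts.items()
        if counts.get("Y", 0) >= min_y_reads
    ]
    # One or more Y-positive amplicons indicates a male fetus; the absence
    # of the Y-allele from every amplicon indicates a female fetus.
    sex = "male" if y_positive else "female"
    return sex, sorted(y_positive)
```

A call such as `call_sex_from_hnr({"hnr1": {"X": 950, "Y": 40}, "hnr2": {"X": 1000, "Y": 0}})` reports a male fetus with `hnr1` as the supporting amplicon. Because the mother contributes no Y-allele, any reliably detected Y-allele read is attributed to the fetus.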
In one embodiment, targeted sequencing can be used to measure the DNA in the maternal plasma and/or the parental genotypes. In one embodiment, all sequences that clearly originate from paternally sourced DNA can be ignored. For example, in the context AA|AB, the number of A sequences can be counted and all B sequences ignored. To determine a heterozygosity rate for the algorithm, the number of observed A sequences can be compared with the expected total number of sequences for a given probe. There are many ways to calculate the expected number of sequences for each probe on a per-sample basis. In one embodiment, historical data can be used to determine what fraction of all sequence reads belongs to each specific probe, and this empirical fraction can then be combined with the total number of sequence reads to estimate the number of sequences at each probe. Another approach is to target a number of known homozygous alleles and then use historical data to relate the number of reads at each probe to the number of reads at the known homozygous alleles. For each sample, the number of reads at the homozygous alleles can then be measured and used, together with the empirically derived relationship, to estimate the number of sequence reads at each probe.
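The first estimation approach described above (an empirical per-probe read fraction multiplied by the total number of reads) can be sketched as follows. The function names and the dictionary-based data layout are assumptions introduced for illustration only.

```python
def expected_reads_per_probe(historical_fraction, total_reads):
    """Estimate reads per probe as (historical share of reads) x (total reads).

    historical_fraction -- dict mapping probe name to the fraction of all
                           sequence reads that historically fall on it
    total_reads         -- total number of sequence reads in this sample
    """
    return {probe: frac * total_reads for probe, frac in historical_fraction.items()}


def observed_to_expected(a_counts, expected):
    """Ratio of observed A-allele reads to expected total reads, per probe."""
    return {
        probe: a_counts[probe] / expected[probe]
        for probe in a_counts
        if expected.get(probe, 0) > 0
    }
```

For instance, a probe that historically receives 25% of reads in a sample with 1,000 reads is expected to yield 250 reads; observing 200 A reads at that probe gives an observed-to-expected ratio of 0.8.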
In some embodiments, the sex of the fetus can be determined by combining the predictions made by a plurality of methods. In some embodiments, the plurality of methods are taken from the methods described in this disclosure. In some embodiments, at least one of the plurality of methods is taken from the methods described in this disclosure.
In some embodiments, the method described herein can be used to determine the ploidy state of the gestating fetus. In one embodiment, the ploidy calling method uses loci that are specific to the X chromosome or common to both the X and Y chromosomes, but does not use any Y-specific loci. In one embodiment, the ploidy calling method uses one or more of the following: loci specific to the X chromosome, loci common to both the X and Y chromosomes, and loci specific to the Y chromosome. In one embodiment, where the sex chromosome ratios are similar, for example 45,X (Turner syndrome), 46,XX (normal female), and 47,XXX (trisomy X), the cases can be distinguished by comparing the observed allele distributions with the allele distributions expected under the various hypotheses. In other embodiments, this can be accomplished by comparing the relative number of sequence reads for the sex chromosomes with that for one or more reference chromosomes that are presumed to be disomic. It should also be noted that these methods can be extended to include aneuploid cases.
Single gene disease screening
In one embodiment, the method of determining the ploidy state of the fetus can be extended to allow simultaneous testing for single-gene disorders. Single-gene disease diagnosis leverages the same targeted approach used for aneuploidy testing and requires additional specific targets. In one embodiment, the single-gene NPD diagnosis is made through linkage analysis. In many cases, direct testing of the cfDNA sample is unreliable, because the presence of maternal DNA makes it virtually impossible to determine whether the fetus has inherited the mother's mutation. Detection of a unique paternally derived allele is less challenging, but is fully informative only if the disease is dominant and carried by the father, which limits the utility of the approach. In one embodiment, the method involves PCR or a related amplification approach.
In some embodiments, the method comprises phasing the abnormal allele with surrounding, very tightly linked SNPs in the parents using information from first-degree relatives. Parental Support is then run on the targeted sequencing data obtained from these SNPs to determine which homologs, normal or abnormal, were inherited by the fetus from both parents. As long as the SNPs are sufficiently linked, the inheritance of the fetal genotype can be determined very reliably. In some embodiments, the method comprises: (a) adding a set of SNP loci that densely flank a specified set of common disease loci to the multiplex pool used for aneuploidy testing; (b) reliably phasing the alleles of these added SNPs with the normal and abnormal alleles based on genetic data from various relatives; and (c) reconstructing the fetal haplotype, or set of phased SNP alleles on the inherited maternal and paternal homologs, in the region surrounding the disease locus, thereby determining the fetal genotype. In some embodiments, additional probes closely linked to a disease-linked locus are added to the set of polymorphic loci used for aneuploidy testing.
Reconstructing the fetal diplotype is challenging because the sample is a mixture of maternal and fetal DNA. In some embodiments, the method uses relative information to phase the SNPs and disease alleles, and then takes into account the physical distance between the SNPs, recombination data based on location-specific recombination likelihoods, and the data observed from the genetic measurements of the maternal plasma to obtain the most probable genotype of the fetus.
In one embodiment, a plurality of additional probes per disease linked locus are included in the set of targeted polymorphic loci; The number of additional probes per disease-associated gene locus can be from 4 to 10, 11 to 20, 21 to 40, 41 to 60, 61 to 80, or a combination thereof.
Phasing the parental genetic data can be challenging, and there are a number of ways in which it can be accomplished. Some of these are discussed in the present disclosure and others are described in more detail elsewhere (see, for example, PCT Publication No. WO2009105531, filed February 9, 2009, and PCT Publication No. WO2010017214, filed August 4, 2009, each of which is incorporated herein by reference in its entirety). In one embodiment, a parent can be phased by inference by measuring haploid tissue from the parent, for example by measuring one or more sperm or oocytes. In one embodiment, a parent can be phased by inference using the measured genotype data of a first-degree relative, such as the parent's parent(s) or siblings. In one embodiment, a parent can be phased by dilution, wherein the DNA is diluted in one or a plurality of wells to the point where approximately one copy or less of each haplotype is expected to be present in each well, and the DNA in the one or more wells is then measured. In one embodiment, the parental genotype can be phased using a computer program that uses population-based haplotype frequencies to infer the most likely phase. In one embodiment, a parent can be phased if phased haplotype data are known for the other parent, along with unphased genetic data of one or more genetic children of the parents. In some embodiments, the genetic children of the parents can be one or more embryos, fetuses, and/or born children. Some of these methods and other methods for phasing one or both parents are described in U.S. Publication No. 2011/0033862, filed August 19, 2010; U.S. Publication No. 2011/0178719, filed February 3, 2011; U.S. Publication No. 2007/0184467, filed November 22, 2006; and U.S. Publication No. 2008/0243398, filed March 17, 2008, each of which is incorporated herein by reference in its entirety.
Fetal Genome Reconstruction
In one aspect, the invention features a method for determining a haplotype of a fetus. In various embodiments, the method can be used to determine which polymorphic loci (such as SNPs) were inherited by the fetus and to reconstruct which homologs (including recombination events) are present in the fetus (and thereby interpolate the sequence between the polymorphic loci). If desired, essentially the entire genome of the fetus can be reconstructed. If some ambiguity remains in the genome of the fetus (such as in intervals containing a crossover), this ambiguity can be minimized by analyzing additional polymorphic loci. In various embodiments, the polymorphic loci are selected to cover one or more chromosomes at a density that reduces any ambiguity to a desired level. The method has important applications for the detection of polymorphisms or other mutations of interest in a fetus, because it allows them to be detected based on linkage (such as the presence of linked polymorphic loci in the fetal genome) rather than by directly detecting the polymorphism or other mutation of interest in the fetal genome. For example, when a parent is a carrier of a mutation associated with cystic fibrosis (CF), a nucleic acid sample containing maternal DNA from the mother of the fetus and fetal DNA from the fetus can be analyzed to determine whether the fetal DNA includes the haplotype containing the CF mutation. In particular, by analyzing the polymorphic loci, it can be determined whether the fetal DNA includes the haplotype containing the CF mutation without having to detect the CF mutation itself in the fetal DNA.
In some embodiments, the method comprises determining a parental haplotype (e.g., a haplotype of the mother or father of the fetus). In some embodiments, this determination is made without using data from relatives of the mother or father. In some embodiments, the parental haplotype is determined using a dilution approach followed by SNP genotyping or sequencing, as described herein and elsewhere (see, e.g., U.S. Publication No. 2011/0033862, filed August 19, 2010, the disclosure of which is incorporated herein by reference). Because the DNA is diluted, it is unlikely that more than one haplotype will be in the same fraction (or tube). Thus, there may effectively be a single molecule of DNA in a tube, which allows the haplotype on a single DNA molecule to be determined. In some embodiments, the method comprises dividing the DNA sample into a plurality of fractions such that at least one of the fractions contains one chromosome, or one chromosome segment, from a pair of chromosomes, and genotyping the DNA sample in the at least one fraction (e.g., determining the presence of two or more polymorphic loci), thereby determining the parental haplotype. In some embodiments, genotyping includes sequencing (e.g., shotgun sequencing). In some embodiments, genotyping includes the use of a SNP array to detect polymorphic loci, such as at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci. In some embodiments, genotyping includes the use of multiplex PCR. 
In some embodiments, the method comprises contacting the sample in the fraction with a library of primers that hybridize simultaneously to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci (e.g., SNPs) to produce a reaction mixture; and subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that are measured with a high-throughput sequencer to produce sequencing data.
In some embodiments, the haplotype of the mother is determined by any of the methods described herein using data from a relative of the mother. In some embodiments, the haplotype of the father is determined by any of the methods described herein using data from a relative of the father. In some embodiments, haplotypes are determined for both the father and the mother. In some embodiments, a SNP array is used to determine the presence of at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci in a DNA sample of the mother (or father) and of a relative of the mother (or father). In some embodiments, the method comprises contacting a DNA sample of the mother (or father) and/or of a relative of the mother (or father) with a library of primers that hybridize simultaneously to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci (e.g., SNPs) to produce a reaction mixture; and subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that are measured with a high-throughput sequencer to produce sequencing data. Parental haplotypes can be determined based on the SNP array or sequencing data. In some embodiments, the parental data can be phased by a method described or referenced elsewhere in this document.
Using the parental haplotypes, it can be determined which haplotype was inherited by the fetus. In some embodiments, a nucleic acid sample comprising maternal DNA from the mother of the fetus and fetal DNA from the fetus is analyzed using a SNP array to detect at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci. In some embodiments, the nucleic acid sample comprising maternal DNA from the mother of the fetus and fetal DNA from the fetus is analyzed by contacting the sample with a library of primers that hybridize simultaneously to at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci (e.g., SNPs) to produce a reaction mixture. In some embodiments, the reaction mixture is subjected to primer extension reaction conditions to produce amplified products. In some embodiments, the amplified products are measured with a high-throughput sequencer to produce sequencing data. In various embodiments, the SNP array or sequencing data are used together with data on the probability of chromosomal crossover at different locations on a chromosome (for example, recombination data such as can be found in the HapMap database, used to generate a recombination risk score) to model the dependence between polymorphic alleles on the chromosome. In some embodiments, allele counts at the polymorphic loci are calculated on a computer based on the sequencing data. 
In some embodiments, a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome, are created on a computer; a model (e.g., a joint distribution model) for the expected allele counts at the polymorphic loci on the chromosome is built on a computer for each ploidy hypothesis; the relative probability of each of the ploidy hypotheses is determined on the computer using the joint distribution model and the allele counts; and the ploidy state of the fetus is called by selecting the ploidy state corresponding to the hypothesis with the greatest probability. In some embodiments, building the joint distribution model for the allele counts and determining the relative probability of each hypothesis are performed using a method that does not require the use of a reference chromosome.
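The hypothesis-selection step can be illustrated with a deliberately simplified Python sketch. It is not the joint distribution model of the disclosure: it assumes that loci are independent, that read counts at each locus follow a binomial distribution, and that the expected A-allele fraction at each locus under each hypothesis is supplied by the caller. The names `log_binom_pmf` and `call_ploidy` and the hypothesis labels are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # avoid log(0) at the boundaries
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def call_ploidy(counts, expected_fraction_by_hypothesis):
    """Pick the ploidy hypothesis with the greatest relative probability.

    counts -- list of (a_reads, total_reads), one entry per polymorphic locus
    expected_fraction_by_hypothesis -- dict mapping a hypothesis label to a
        list of expected A-allele fractions, one per locus (assumed inputs)
    """
    log_like = {
        hyp: sum(log_binom_pmf(a, n, p) for (a, n), p in zip(counts, fractions))
        for hyp, fractions in expected_fraction_by_hypothesis.items()
    }
    # Normalize in log space so that very small likelihoods do not underflow.
    peak = max(log_like.values())
    weights = {hyp: math.exp(value - peak) for hyp, value in log_like.items()}
    total = sum(weights.values())
    relative = {hyp: w / total for hyp, w in weights.items()}
    best = max(relative, key=relative.get)
    return best, relative
```

With equal prior weight on each hypothesis, the normalized values are the relative probabilities of the hypotheses given the data. A model closer to the text would replace the independent binomial terms with a joint distribution that accounts for parental genotypes and linkage between loci.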
In some embodiments, the fetal haplotype is determined for one or more chromosomes taken from the group consisting of chromosomes 13, 18, 21, X, and Y. In some embodiments, the fetal haplotype is determined for all fetal chromosomes. In various embodiments, the method determines essentially the entire genome of the fetus. In some embodiments, the haplotype is determined for at least 30, 40, 50, 60, 70, 80, 90, or 95% of the fetal genome. In some embodiments, the haplotype determination of the fetus includes information about which allele is present at at least 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci.
Compositions of DNA
When performing an informatics analysis on sequencing data measured on a mixture of fetal and maternal blood to determine genomic information about the fetus, for example the ploidy state of the fetus, it may be advantageous to measure the allele distributions at a set of alleles. Unfortunately, in many cases, such as when attempting to determine the ploidy state of a fetus from the DNA mixture found in the plasma of a maternal blood sample, the amount of DNA available is not sufficient to measure the allele distributions in the mixture directly with good fidelity. In these cases, amplification of the DNA mixture will provide a sufficient number of DNA molecules for the desired allele distributions to be measured with good fidelity. However, current amplification methods typically used to amplify DNA for sequencing are often highly biased, meaning that they do not amplify both alleles at a polymorphic locus by the same amount. Biased amplification can result in allele distributions that are quite different from the allele distributions in the original mixture. For most purposes, highly accurate measurements of the relative amounts of the alleles present at polymorphic loci are not needed. In contrast, in embodiments of the invention, amplification or enrichment methods that specifically enrich polymorphic alleles and preserve the allelic ratios are advantageous.
A number of methods are described herein that can be used to preferentially enrich a sample of DNA at a plurality of loci in a manner that minimizes allelic bias. One example is to use circularizing probes to target a plurality of loci, wherein the 3' and 5' ends of the pre-circularized probes are designed to hybridize to bases that are one or a few positions away from the polymorphic site of the targeted allele. Another is to use PCR probes wherein the 3' end of the PCR probe is designed to hybridize to bases that are one or a few positions away from the polymorphic site of the targeted allele. Another is to use a split-and-pool approach to generate mixtures of DNA in which the preferentially enriched loci are enriched with low allelic bias without the drawbacks of direct multiplexing. Another is to use a hybrid capture approach wherein the capture probes are designed such that the region of the capture probe designed to hybridize to the DNA flanking the polymorphic site of the target is separated from the polymorphic site by one or a few bases.
When the allele distributions measured at a set of polymorphic loci are used to determine the ploidy state of an individual, it is desirable to preserve the relative amounts of the alleles in the sample of DNA as it is prepared for genetic measurement. This preparation may involve WGA amplification, targeted amplification, selective enrichment techniques, hybrid capture techniques, circularizing probes, or other methods intended to amplify the amount of DNA and/or selectively enhance the presence of molecules of DNA that correspond to specific alleles.
In some embodiments of the invention, there is a set of DNA probes designed to target loci having maximal minor allele frequencies. In some embodiments of the invention, there is a set of probes designed to target loci at which the fetus has the greatest probability of having a highly informative SNP. In some embodiments of the invention, there is a set of probes designed to target loci that are optimized for a given population subgroup. In some embodiments of the invention, there is a set of probes designed to target loci that are optimized for a given mix of population subgroups. In some embodiments of the invention, there is a set of probes designed to target loci that are optimized for a given pair of parents who are from different population subgroups with different minor allele frequency profiles. In some embodiments of the invention, there is a circularized strand of DNA comprising at least one base pair annealed to a piece of DNA of fetal origin. In some embodiments herein, there is a circularized strand of DNA comprising at least one base pair annealed to a piece of DNA of placental origin. In some embodiments herein, there is a circularized strand of DNA that was circularized while at least some of its nucleotides were annealed to DNA of fetal origin. In some embodiments herein, there is a circularized strand of DNA that was circularized while at least some of its nucleotides were annealed to DNA of placental origin. In some embodiments of the invention, there is a set of probes wherein some of the probes target single tandem repeats and some of the probes target single nucleotide polymorphisms. In some embodiments, the loci are selected for the purpose of non-invasive prenatal diagnosis. In some embodiments, the probes are used for the purpose of non-invasive prenatal diagnosis. 
In some embodiments, the loci are targeted using a method that may include circularizing probes, MIPs, capture by hybridization probes, probes on a SNP array, or a combination thereof. In some embodiments, the probes are used as circularizing probes, MIPs, capture by hybridization probes, probes on a SNP array, or a combination thereof. In some embodiments, the loci are sequenced for the purpose of non-invasive prenatal diagnosis.
Where the relative informativeness of a sequence is greater when it is combined with the relevant parental contexts, maximizing the number of sequence reads that contain a SNP for which the parental context is known can maximize the informativeness of the set of sequencing reads on the mixed sample. In one embodiment, the number of sequence reads that contain a SNP for which the parental contexts are known can be enhanced by using qPCR to preferentially amplify specific sequences. In one embodiment, the number of sequence reads that contain a SNP for which the parental contexts are known can be enhanced by using circularizing probes (e.g., MIPs) to preferentially amplify specific sequences. In one embodiment, the number of sequence reads that contain a SNP for which the parental contexts are known can be enhanced by using a capture by hybridization method (e.g., SURESELECT) to preferentially amplify specific sequences. Different methods can be used to enhance the number of sequence reads that contain a SNP for which the parental contexts are known. In one embodiment, targeting can be accomplished by extension ligation, ligation without extension, capture by hybridization, or PCR.
In a sample of fragmented genomic DNA, a fraction of the DNA sequences map uniquely to individual chromosomes; other DNA sequences can be found on different chromosomes. DNA found in plasma, whether maternal or fetal in origin, is typically fragmented, often to lengths of less than 500 bp. In a typical genomic sample, approximately 3.3% of the mappable sequences will map to chromosome 13; 2.2% of the mappable sequences will map to chromosome 18; 1.35% of the mappable sequences will map to chromosome 21; 4.5% of the mappable sequences will map to chromosome X (in a female); 2.25% of the mappable sequences will map to chromosome X (in a male); and 0.73% of the mappable sequences will map to chromosome Y (in a male). These are the chromosomes most likely to be aneuploid in a fetus. Also, among short sequences, approximately 1 in 20 sequences will contain a SNP, using the SNPs contained in dbSNP. The proportion may well be higher, given that there may be many SNPs that have not yet been discovered.
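The percentages above give a baseline against which targeted enrichment can be judged. The following Python sketch tabulates them and computes two derived quantities: the number of reads expected per chromosome in an untargeted sample, and the fold enrichment implied by an observed fraction. The names `BASELINE_FRACTION`, `expected_reads`, and `fold_enrichment` are assumptions introduced here, and treating the sample as a single male or female genome is a simplification, since a plasma sample is a mixture of maternal and fetal DNA.

```python
# Approximate share of uniquely mappable sequences per chromosome in an
# untargeted genomic sample, as quoted in the text above.
BASELINE_FRACTION = {
    "female": {"13": 0.033, "18": 0.022, "21": 0.0135, "X": 0.045},
    "male": {"13": 0.033, "18": 0.022, "21": 0.0135, "X": 0.0225, "Y": 0.0073},
}


def expected_reads(total_mappable_reads, sex):
    """Expected number of reads per chromosome without any targeting."""
    return {
        chrom: frac * total_mappable_reads
        for chrom, frac in BASELINE_FRACTION[sex].items()
    }


def fold_enrichment(observed_fraction, chromosome, sex):
    """How many times the observed fraction exceeds the untargeted baseline."""
    return observed_fraction / BASELINE_FRACTION[sex][chromosome]
```

For example, out of one million mappable reads from an untargeted sample, about 13,500 would be expected to map to chromosome 21, and a composition in which 6.6% of sequences map to chromosome 13 corresponds to a two-fold enrichment over the 3.3% baseline.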
In one embodiment herein, targeting methods can be used to enhance the fraction of DNA in a sample of DNA that maps to a given chromosome, such that the fraction significantly exceeds the percentages listed above that are typical for genomic samples. In embodiments of the invention, targeting methods can be used to enhance the fraction of DNA in a sample of DNA such that the percentage of sequences containing a SNP is significantly greater than that typically found for genomic samples. In one embodiment of the invention, a targeting method can be used to target DNA from a chromosome or from a set of SNPs in a mixture of maternal and fetal DNA for the purposes of prenatal diagnosis.
A method has been reported (U.S. Patent No. 7,888,017) for determining fetal aneuploidy by counting the number of reads that map to a suspect chromosome and comparing it with the number of reads that map to a reference chromosome, on the assumption that an excess of reads on the suspect chromosome corresponds to a fetal trisomy at that chromosome. Those methods for prenatal diagnosis do not make use of targeting of any sort, nor do they describe the use of targeting for prenatal diagnosis.
By using a targeting approach in sequencing a mixed sample, it may be possible to achieve a given level of accuracy with fewer sequence reads. The accuracy can refer to sensitivity, to specificity, or to some combination thereof. The desired level of accuracy may be from 90% to 95%; from 95% to 98%; from 98% to 99%; from 99% to 99.5%; from 99.5% to 99.9%; from 99.9% to 99.99%; from 99.99% to 99.999%; or from 99.999% to 100%. Levels of accuracy above 95% can be referred to as high accuracy.
A number of methods have been published in the prior art that demonstrate how the ploidy state of a fetus can be determined from a mixed sample of maternal and fetal DNA [e.g., G. J. W. Liao et al., Clinical Chemistry 2011; 57(1), pp. 92-101]. These methods focus on thousands of locations along each chromosome. The number of locations along a chromosome that can be targeted while still yielding a high-accuracy ploidy determination for the fetus, for a given number of sequence reads from a mixed sample of DNA, is unexpectedly low. In embodiments herein, an accurate ploidy determination can be made using targeted sequencing with any method of targeting, such as qPCR, ligand mediated PCR, other PCR methods, capture by hybridization, or circularizing probes, wherein the number of loci along the chromosome that need to be targeted may be from 5,000 to 2,000 loci; from 2,000 to 1,000 loci; from 1,000 to 500 loci; from 500 to 300 loci; from 300 to 200 loci; from 200 to 150 loci; from 150 to 100 loci; from 100 to 50 loci; from 50 to 20 loci; or from 20 to 10 loci. Optionally, it may be between 100 and 500 loci. A high level of accuracy can be achieved by targeting a small number of loci and performing an unexpectedly small number of sequence reads. 
The number of reads may be from 100 million to 50 million; from 50 million to 20 million; from 20 million to 10 million; from 10 million to 5 million; from 5 million to 2 million; from 2 million to 1 million; from 1 million to 500,000; from 500,000 to 200,000; from 200,000 to 100,000; from 100,000 to 50,000; from 50,000 to 20,000; from 20,000 to 10,000; or 10,000 or fewer. Fewer reads are necessary for larger amounts of input DNA.
In some embodiments, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that map uniquely to chromosome 13 is greater than 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, 20%, 25%, or 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that map uniquely to chromosome 18 is greater than 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, 20%, 25%, or 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that map uniquely to chromosome 21 is greater than 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, 20%, 25%, or 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that map uniquely to chromosome X is greater than 6%, 7%, 8%, 9%, 10%, 12%, 15%, 20%, 25%, or 30%. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that map uniquely to chromosome Y is greater than 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, 20%, 25%, or 30%.
In some embodiments, a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin is described, wherein the percentage of sequences that map uniquely to a chromosome and contain at least one single nucleotide polymorphism is greater than 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1%, 1.2%, 1.4%, 1.6%, 1.8%, 2%, 2.5%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10%, and wherein the chromosome is taken from the group consisting of chromosomes 13, 18, 21, X, or Y. In some embodiments of the invention, there is a composition comprising a mixture of DNA of fetal origin and DNA of maternal origin, wherein the percentage of sequences that map uniquely to a chromosome and contain at least one single nucleotide polymorphism from a set of single nucleotide polymorphisms is greater than 0.15%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1%, 1.2%, 1.4%, 1.6%, 1.8%, 2%, 2.5%, 3%, 4%, 5%, 6%, 7%, 8%, or 9%, wherein the chromosome is taken from the set of chromosomes 13, 18, 21, X, and Y, and wherein the number of single nucleotide polymorphisms in the set of single nucleotide polymorphisms is from 1 to 10, 10 to 20, 20 to 50, 50 to 100, 100 to 200, 200 to 500, 500 to 1,000, 1,000 to 2,000, 2,000 to 5,000, 5,000 to 10,000, 10,000 to 20,000, 20,000 to 50,000, or 50,000 to 100,000.
In theory, each cycle of amplification doubles the amount of DNA present; in reality, the degree of amplification is slightly less than two-fold. In theory, amplification, including targeted amplification, would result in bias-free amplification of a DNA mixture; in reality, different alleles tend to be amplified to different extents. When DNA is amplified, the degree of allelic bias typically increases with the number of amplification steps. In some embodiments, the methods described herein comprise amplifying the DNA with a low level of allelic bias. Since the allelic bias compounds with each additional cycle, the per-cycle allelic bias can be determined by calculating the nth root of the overall bias, where n is the base-2 logarithm of the degree of enrichment. In some embodiments, there is provided a composition comprising a second mixture of DNA, wherein the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, wherein the degree of enrichment is at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000, and wherein the ratio of the alleles in the second mixture of DNA at each locus differs from the ratio of the alleles at that locus in the first mixture of DNA by a factor that is, on average, less than 1,000%, 500%, 200%, 100%, 50%, 20%, 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, or 0.02%. In some embodiments, there is a composition comprising a second mixture of DNA, wherein the second mixture of DNA has been preferentially enriched at a plurality of polymorphic loci from a first mixture of DNA, and wherein the per-cycle allelic bias for the plurality of polymorphic loci is, on average, less than 10%, 5%, 2%, 1%, 0.5%, 0.2%, 0.1%, 0.05%, or 0.02%. 
In some embodiments, the plurality of polymorphic loci comprises at least 10 loci, at least 20 loci, at least 50 loci, at least 100 loci, at least 200 loci, at least 500 loci, at least 2,000 loci, at least 5,000 loci, at least 10,000 loci, at least 20,000 loci, or at least 50,000 loci.
Some Embodiments
In some embodiments, a method is disclosed herein for generating a report disclosing the determined ploidy state of a chromosome in a gestating fetus, the method comprising: obtaining a first sample that contains DNA from the mother of the fetus and DNA from the fetus; obtaining genotypic data from one or both parents of the fetus; preparing the first sample by isolating the DNA so as to obtain a prepared sample; measuring the DNA in the prepared sample at a plurality of polymorphic loci; calculating, on a computer, allele counts or allele count probabilities at the plurality of polymorphic loci from the DNA measurements made on the prepared sample; creating, on a computer, a plurality of ploidy hypotheses concerning expected allele count probabilities at the plurality of polymorphic loci on the chromosome for different possible ploidy states of the chromosome; building, on a computer, a joint distribution model for the allele count probability of each polymorphic locus on the chromosome for each ploidy hypothesis using the genotypic data from the one or both parents of the fetus; determining, on a computer, a relative probability of each of the ploidy hypotheses using the joint distribution model and the allele count probabilities calculated for the prepared sample; calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability; and generating a report disclosing the determined ploidy state.
In some embodiments, the method is used to determine the ploidy state of a plurality of gestating fetuses in a plurality of respective mothers, the method further comprising determining the percentage of DNA that is of fetal origin in each of the prepared samples; wherein the step of measuring the DNA in the prepared sample is performed by sequencing a number of DNA molecules in each of the prepared samples, and wherein more DNA molecules are sequenced from prepared samples that have a smaller fraction of fetal DNA than from prepared samples that have a larger fraction of fetal DNA.
In some embodiments, the method is used to determine the ploidy state of a plurality of gestating fetuses in a plurality of respective mothers, wherein the step of measuring the DNA in the prepared sample is performed, for each of the fetuses, by sequencing a first fraction of the prepared sample to give a first set of measurements, the method further comprising: making a first relative probability determination for each of the ploidy hypotheses for each of the fetuses, given the first set of DNA measurements; resequencing a second fraction of the prepared sample from those fetuses for which the first relative probability determination indicates that a ploidy hypothesis corresponding to an aneuploid fetus has a significant but not conclusive probability, to give a second set of measurements; making a second relative probability determination for the ploidy hypotheses for those fetuses using the second set of measurements and optionally also the first set of measurements; and calling the ploidy state of each fetus whose second fraction was resequenced by selecting the ploidy state corresponding to the hypothesis with the greatest probability as determined by the second relative probability determination.
In some embodiments, a composition of matter is disclosed, the composition of matter comprising a sample of preferentially enriched DNA, wherein the sample of preferentially enriched DNA has been preferentially enriched at a plurality of polymorphic loci from a first sample of DNA, wherein the first sample of DNA consisted of a mixture of maternal DNA and fetal DNA derived from maternal plasma, wherein the degree of enrichment is at least a factor of 2, and wherein the allelic bias between the first sample and the preferentially enriched sample is, on average, selected from the group consisting of less than 2%, less than 1%, less than 0.5%, less than 0.2%, less than 0.1%, less than 0.05%, less than 0.02%, and less than 0.01%. In some embodiments, a method for producing such a sample of preferentially enriched DNA is disclosed.
In some embodiments, a method is disclosed for determining the presence or absence of a fetal aneuploidy in a maternal tissue sample comprising fetal and maternal genomic DNA, the method comprising: (a) obtaining a mixture of fetal and maternal genomic DNA from the maternal tissue sample; (b) selectively enriching the mixture of fetal and maternal DNA at a plurality of polymorphic alleles; (c) distributing selectively enriched fragments from the mixture of fetal and maternal genomic DNA of step (a) to provide reaction samples comprising a single genomic DNA molecule or amplification products of a single genomic DNA molecule; (d) conducting massively parallel DNA sequencing of the selectively enriched fragments of genomic DNA in the reaction samples of step (c) to determine the sequence of said selectively enriched fragments; (e) identifying the chromosomes to which the sequences obtained in step (d) belong; (f) analyzing the data of step (d) to determine i) the number of fragments of genomic DNA from step (d) that belong to at least one first target chromosome that is presumed to be diploid in both the mother and the fetus, and ii) the number of fragments of genomic DNA from step (d) that belong to a second target chromosome, wherein said second chromosome is suspected to be aneuploid in the fetus; (g) calculating an expected distribution of the number of fragments of genomic DNA from step (d) for the second target chromosome if the second target chromosome is euploid, using the number determined in step (f) part i); (h) calculating an expected distribution of the number of fragments of genomic DNA from step (d) for the second target chromosome if the second target chromosome is aneuploid, using the number determined in step (f) part i) and an estimated fraction of fetal DNA found in the mixture of step (b); and (i) using a maximum likelihood or maximum a posteriori approach to determine whether the number of fragments of genomic DNA determined in step (f) part ii) is more likely to be part of the distribution calculated in step (g) or the distribution calculated in step (h); thereby indicating the presence or absence of a fetal aneuploidy.
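Steps (g) through (i) can be illustrated with a minimal sketch. The text does not specify the form of the expected distributions, so this example assumes, purely for illustration, that fragment counts are Poisson distributed, that the aneuploidy is a trisomy (which scales the expected count by 1 + f/2 for fetal fraction f), and that both hypotheses have equal prior probability. The names `call_ploidy` and `size_ratio` (the expected target-to-reference count ratio under euploidy) are hypothetical.

```python
import math


def poisson_log_pmf(k: int, mean: float) -> float:
    """Log of the Poisson probability of observing k events given the mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def call_ploidy(ref_count: int, target_count: int,
                size_ratio: float, fetal_fraction: float):
    """Compare a euploid and a trisomy hypothesis for the target chromosome.

    Returns (call, posterior probability of trisomy), assuming Poisson
    counts and equal priors; both are illustrative assumptions.
    """
    # Step (g): expected count if the target chromosome is euploid.
    mean_euploid = ref_count * size_ratio
    # Step (h): a fetal trisomy adds half a fetal share of extra fragments.
    mean_trisomy = mean_euploid * (1.0 + fetal_fraction / 2.0)
    # Step (i): compare the likelihoods of the observed count.
    ll_euploid = poisson_log_pmf(target_count, mean_euploid)
    ll_trisomy = poisson_log_pmf(target_count, mean_trisomy)
    top = max(ll_euploid, ll_trisomy)  # subtract the max for numerical stability
    w_euploid = math.exp(ll_euploid - top)
    w_trisomy = math.exp(ll_trisomy - top)
    posterior_trisomy = w_trisomy / (w_euploid + w_trisomy)
    call = "trisomy" if ll_trisomy > ll_euploid else "euploid"
    return call, posterior_trisomy


if __name__ == "__main__":
    print(call_ploidy(100_000, 105_000, 1.0, 0.10))
    print(call_ploidy(100_000, 100_000, 1.0, 0.10))
```

The posterior returned alongside the call serves as a simple confidence value; a real implementation would need a noise model that accounts for more than counting statistics.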
Exemplary methods of diagnosing cancer
It is noted that DNA originating from a cancer living in a host has been shown to be present in the blood of the host. In the same way that genetic diagnoses can be made from the measurement of mixed DNA found in maternal blood, genetic diagnoses can equally well be made from the measurement of mixed DNA found in host blood. The genetic diagnoses may include aneuploidy states or gene mutations. Any claim in the present application that reads on determining the ploidy state or genetic state of a fetus from measurements made on maternal blood can equally well read on determining the ploidy state or genetic state of a cancer from measurements made on host blood.
In some embodiments, a method of the invention allows one to determine the ploidy state of a cancer, the method comprising: obtaining a mixed sample that contains genetic material from the host and genetic material from the cancer; measuring the DNA in the mixed sample; calculating the fraction of DNA that is of cancer origin in the mixed sample; and determining the ploidy state of the cancer using the measurements made on the mixed sample and the calculated fraction. In some embodiments, the method may further comprise determining a cancer treatment regimen based on the determination of the ploidy state of the cancer. In some embodiments, the method may further comprise administering a cancer therapeutic based on the determination of the ploidy state of the cancer, wherein the cancer therapeutic is taken from the group comprising a pharmaceutical, a biologic therapeutic, an antibody-based therapy, and combinations thereof.
Exemplary Implementation Method
Any of the embodiments disclosed herein may be implemented in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, or combinations thereof. Apparatus of the presently disclosed embodiments can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the presently disclosed embodiments can be performed by a programmable processor executing a program of instructions to perform functions of the presently disclosed embodiments by operating on input data and generating output. The presently disclosed embodiments can be implemented advantageously in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; in any case, the language can be a compiled or interpreted language. A computer program may be deployed in any form, including as a stand-alone program, or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed or interpreted on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network.
As used herein, a computer-readable storage medium refers to physical or tangible storage (as opposed to signals) and includes, without limitation, volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical or material medium that can be used to tangibly store the desired information, data or instructions and that can be accessed by a computer or processor.
Any of the methods described herein may include the output of data in a physical format, such as on a computer screen or on a paper printout. In the description of any embodiment elsewhere in this document, it should be understood that the described methods may be combined with the output of actionable data in a format that can be acted upon by a physician. In addition, the described methods may be combined with the actual execution of a clinical decision that results in a clinical treatment, or with the execution of a clinical decision to take no action. Some of the embodiments described in this document for determining genetic data pertaining to a target individual may be combined with the decision to select one or more embryos for transfer in the context of IVF, optionally combined with the process of transferring the embryo. Some of the embodiments described in this document for determining genetic data pertaining to a target individual may be combined with the notification of a potential chromosomal abnormality, or lack thereof, to a medical professional, optionally combined with the decision to abort, or not to abort, a fetus. Some of the embodiments described herein may be combined with the output of actionable data and with the execution of a clinical decision that results in a clinical treatment, or the execution of a clinical decision to take no action.
Exemplary diagnostic box
In one embodiment, the present disclosure comprises a diagnostic box that is capable of partly or completely carrying out any of the methods described herein. In one embodiment, the diagnostic box may be located at a physician's office, a hospital laboratory, or any suitable location reasonably proximal to the point of patient care. The box may be able to run the entire method in a wholly automated fashion, or the box may require one or more steps to be completed manually by a technician. In one embodiment, the box may be able to analyze at least the genotypic data measured on the maternal plasma. In one embodiment, the box may be linked to means for transmitting the genotypic data measured on the diagnostic box to an external computation facility, which may then analyze the genotypic data and possibly also generate a report. The diagnostic box may include a robotic unit that is capable of transferring aqueous or liquid samples from one container to another. It may comprise a number of reagents, both solid and liquid. It may comprise a high-throughput sequencer. It may comprise a computer.
Experimental Section
The presently disclosed embodiments are described in the following examples, which are set forth to aid in the understanding of the disclosure and should not be construed to limit in any way the scope of the disclosure as defined in the claims that follow. The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to use the described embodiments; they are not intended to limit the scope of the disclosure, nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to the numbers used (e.g., amounts, temperatures, etc.), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by volume and temperatures are in degrees Celsius. It should be understood that variations in the methods as described may be made without changing the fundamental aspects that the experiments are meant to illustrate.
Experiment 1
The objective of this study was to show that a Bayesian maximum likelihood estimation (MLE) algorithm that uses parental genotypes to calculate the fetal fraction improves the accuracy of non-invasive prenatal trisomy diagnosis compared with published methods.
Simulated sequencing data for maternal cfDNA were generated by sampling reads obtained from trisomy 21 cell lines and the respective mother cell lines. The rates of correct disomy and trisomy calls were determined from 500 simulations at various fetal fractions for a published method (see Chiu et al. BMJ 2011; 342:c7401) and for our MLE-based algorithm. We validated the simulations by obtaining 5 million shotgun reads from 4 pregnant mothers and the respective fathers, collected under an IRB-approved protocol. Maternal genotypes were obtained on a 290K SNP array (see FIG. 14).
In the simulations, the MLE-based approach achieved an accuracy of 99.0% for fetal fractions as low as 9%, and reported confidences that corresponded well to the overall accuracy. We validated these results using four real samples, for which we obtained all correct calls with a computed confidence exceeding 99%. In contrast, our implementation of the published algorithm of Chiu et al. required an 18% fetal fraction to achieve 99.0% accuracy, and achieved only 87.8% accuracy at 9% fetal DNA.
Fetal fraction determination from parental genotypes, in conjunction with an MLE-based approach, achieves greater accuracy than the published algorithm at the fetal fractions expected during the first and early second trimester. In addition, the methods disclosed herein produce a confidence metric that is important in determining the reliability of the result, particularly at low fetal fractions where ploidy detection is more difficult. Published methods use a less accurate threshold method for calling ploidy based on large sets of disomy training data, an approach that predefines a false positive rate. Also, without a confidence metric, published methods are at risk of reporting false negative results when there is insufficient fetal cfDNA to make a call. In some embodiments, a confidence estimate is calculated for the called ploidy state.
Experiment 2
The objective was to improve non-invasive detection of fetal trisomy 18, 21, and X in samples with low fetal fraction by using maternal genotype and HapMap data in a Bayesian maximum likelihood estimation (MLE) algorithm.
Maternal samples from four euploid and two trisomy-positive pregnancies, and the respective paternal samples, were obtained under IRB-approved protocols from patients where the fetal karyotype was known. Maternal cfDNA was extracted from plasma, and approximately 10 million sequence reads were obtained after preferential enrichment targeting specific SNPs. The parental samples were similarly sequenced to obtain genotypes.
The described algorithm correctly called disomy of chromosomes 18 and 21 in all euploid samples and correctly called the normal chromosomes of the aneuploid samples. The trisomy 18 and 21 calls were correct, as were the chromosome X copy numbers in male and female fetuses. The confidence produced by the algorithm exceeded 98% in all cases.
The method described accurately reported the ploidy of all tested chromosomes in the six samples, including samples containing less than 12% fetal DNA, which account for approximately 30% of first and early second trimester samples. An important distinction between this MLE algorithm and published methods is that it leverages parental genotypes and HapMap data to improve accuracy and to generate a confidence metric. At low fetal fractions, all methods become less accurate; it is important to correctly identify samples without sufficient fetal cfDNA to make a reliable call. Others have used chromosome Y-specific probes to estimate the fetal fraction of male fetuses, but concurrent parental genotyping allows estimation of the fetal fraction for both sexes. Another inherent limitation of published methods that use untargeted shotgun sequencing is that the accuracy of ploidy calling varies among chromosomes due to differences in factors such as GC richness. The present targeted sequencing approach is largely independent of such chromosome-scale variations and yields more consistent performance between chromosomes.
Experiment 3
The objective of this study was to determine whether trisomy is detectable with high confidence in a triploid fetus, using novel informatics to analyze SNP loci of free-floating fetal DNA in maternal plasma.
20 mL of blood was drawn from a pregnant patient following an abnormal ultrasound. After centrifugation, maternal DNA was extracted from the buffy coat (DNEASY, QIAGEN); cell-free DNA was extracted from plasma (QIAAMP, QIAGEN). Targeted sequencing was applied to SNP loci on chromosomes 2, 21, and X in both DNA samples. Maximum likelihood Bayesian estimation selected the most probable hypothesis from the set of all possible ploidy states. The method determines the fetal DNA fraction, the ploidy state, and an explicit confidence in the ploidy determination. No assumptions are made about the ploidy of a reference chromosome. The diagnostic uses a test statistic that is independent of sequence read counts, which are the basis of the current state of the art.
This method accurately diagnosed trisomy of chromosomes 2 and 21. The child fraction was estimated at 11.9% [CI 11.7-12.1]. The fetus was found to have one maternal and two paternal copies of chromosomes 2 and 21, with a confidence of effectively 1 (error probability <10^-30). This was achieved using 92,600 and 258,100 reads on chromosomes 2 and 21, respectively.
This is the first demonstration of non-invasive prenatal diagnosis of trisomic chromosomes from maternal blood where the fetus was triploid, as confirmed by metaphase karyotype. Existing methods of non-invasive diagnosis would not detect aneuploidy in this sample. Current methods rely on a surplus of sequence reads on a trisomic chromosome relative to disomic reference chromosomes, but a triploid fetus has no disomic reference. In addition, current methods would not achieve a similarly high-confidence ploidy determination with this fraction of fetal DNA and this number of sequence reads. It is straightforward to extend the approach to all 24 chromosomes.
Experiment 4
The following protocol was used for 800-plex amplification of DNA isolated from maternal plasma from a euploid pregnancy, and also of genomic DNA from a trisomy 21 cell line, using standard PCR (meaning that no nesting was used). Library preparation and amplification involved single-tube blunt ending followed by A-tailing. Adapter ligation was performed using the ligation kit found in the AGILENT SURESELECT kit, and PCR was run for 7 cycles. Then, 15 cycles of STA (95 °C for 30 seconds; 72 °C for 1 minute; 60 °C for 4 minutes; 65 °C for 1 minute; 72 °C for 30 seconds) were run with the 800-plex primer pairs targeting SNPs on chromosomes 2, 21 and X. The reaction was run at a primer concentration of 12.5 nM. The DNA was then sequenced with an ILLUMINA IIGAX sequencer. The sequencer output 1.9 million reads, of which 92% mapped to the genome; of the reads that mapped to the genome, more than 99% mapped to one of the regions targeted by the targeted primers. The numbers were essentially the same for both the plasma DNA and the genomic DNA. FIG. 15 shows the ratio of the two alleles for the ~780 SNPs detected by the sequencer in the genomic DNA taken from the cell line with known trisomy at chromosome 21. Note that the allele ratios are plotted here for ease of visualization, because the allele distributions are not straightforward to read visually. The circles represent SNPs on disomic chromosomes, while the stars represent SNPs on a trisomic chromosome. FIG. 16 is another representation of the same data as in Figure X, where the Y-axis is the relative number of A and B measured for each SNP, and the X-axis is the SNP number, with the SNPs separated by chromosome. In FIG. 16, SNPs 1 to 312 are found on chromosome 2, SNPs 313 to 605 are found on chromosome 21, which is trisomic, and SNPs 606 to 800 are found on chromosome X. The data from chromosomes 2 and X show disomic chromosomes, as the relative sequence counts lie in three clusters: AA at the top of the graph, BB at the bottom of the graph, and AB in the middle of the graph. The data from chromosome 21, which is trisomic, show four clusters: AAA at the top of the graph, AAB around the 0.65 line (2/3), ABB around the 0.35 line (1/3), and BBB at the bottom of the graph.
FIGS. 17A to 17D show data for the same 800-plex protocol, but measured on DNA amplified from four plasma samples from pregnant women. For these four samples, we expect to see seven clusters of points: (1) along the top of the graph are the loci where both the mother and the fetus are AA; (2) slightly below the top of the graph are the loci where the mother is AA and the fetus is AB; (3) slightly above the 0.5 line are the loci where the mother is AB and the fetus is AA; (4) along the 0.5 line are the loci where the mother and the fetus are both AB; (5) slightly below the 0.5 line are the loci where the mother is AB and the fetus is BB; (6) slightly above the bottom of the graph are the loci where the mother is BB and the fetus is AB; and (7) along the bottom of the graph are the loci where both the mother and the fetus are BB. The smaller the fetal fraction, the smaller the separation between clusters (1) and (2), between clusters (3), (4) and (5), and between clusters (6) and (7). The separation is expected to be half of the fraction of DNA that is of fetal origin. For example, if the DNA is 20% fetal and 80% maternal, we expect (1) to (7) to be centered at 1.0, 0.9, 0.6, 0.5, 0.4, 0.1 and 0.0, respectively; see, for example, FIG. 17D, POOL1_BC5_ref_rate. If instead the DNA is 8% fetal and 92% maternal, we expect (1) to (7) to be centered at 1.00, 0.96, 0.54, 0.50, 0.46, 0.04 and 0.00, respectively; see, for example, FIG. 17B, POOL1_BC2_ref_rate. If no fetal DNA is detected, we do not expect to see (2), (3), (5), or (6); alternatively, one can say that the separation is zero, so that (1) and (2) lie on top of each other, as do (3), (4) and (5), and also (6) and (7); see, for example, FIG. 17C, POOL1_BC7_ref_rate. Note that the fetal fraction for FIG. 17A, POOL1_BC1_ref_rate, is about 25%.
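The expected cluster centers quoted above follow from weighting the maternal and fetal genotypes by their share of the DNA mixture. The following sketch reproduces the stated values; the function names and the encoding of a genotype as the number of A alleles (AA = 2, AB = 1, BB = 0) are illustrative choices, not part of the described method.

```python
def expected_a_fraction(maternal_a: int, fetal_a: int, fetal_fraction: float) -> float:
    """Expected fraction of A reads at a locus in a maternal/fetal DNA mixture.

    maternal_a and fetal_a are the number of A alleles (0, 1 or 2) in the
    disomic maternal and fetal genotypes.
    """
    maternal_part = (1.0 - fetal_fraction) * maternal_a / 2.0
    fetal_part = fetal_fraction * fetal_a / 2.0
    return maternal_part + fetal_part


# Clusters (1) to (7) as (maternal A count, fetal A count).
CLUSTERS = [(2, 2), (2, 1), (1, 2), (1, 1), (1, 0), (0, 1), (0, 0)]


def cluster_centers(fetal_fraction: float) -> list:
    """Centers of clusters (1) to (7), rounded to four decimal places."""
    return [round(expected_a_fraction(m, c, fetal_fraction), 4) for m, c in CLUSTERS]


if __name__ == "__main__":
    print(cluster_centers(0.20))  # [1.0, 0.9, 0.6, 0.5, 0.4, 0.1, 0.0]
    print(cluster_centers(0.08))  # [1.0, 0.96, 0.54, 0.5, 0.46, 0.04, 0.0]
```

Adjacent clusters within each group differ by exactly half the fetal fraction, which is the separation stated in the text.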
Experiment 5
Most methods of DNA amplification and measurement will produce some allele bias, wherein the two alleles typically found at a locus are detected with intensities or counts that are not representative of the actual amounts of the alleles in the sample of DNA. For example, for a single individual, at a heterozygous locus we expect to see a 1:1 ratio of the two alleles, which is the theoretical ratio expected for a heterozygous locus; however, due to allele bias, we may see 55:45, or even 60:40. Note also that, in the context of sequencing, simple stochastic noise could produce significant allele bias if the depth of read is low. In one embodiment, it is possible to model the behavior of each SNP so that if a consistent bias is observed for particular alleles, this bias can be corrected for. FIG. 18 shows the fraction of data that can be explained by binomial variance before and after bias correction. In FIG. 18, the stars represent the allele bias observed on the raw sequence data for the 800-plex experiment; the circles represent the allele bias after correction. If there were no allele bias at all, we would expect the data to fall along the x = y line. A similar set of data, produced by amplifying DNA using 150-plex targeted amplification, fell very closely on the 1:1 line after bias correction.
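The text does not specify how the per-SNP bias is modeled. As one minimal sketch, assume for illustration that the bias at a SNP is a constant multiplicative factor on A reads relative to B reads, estimated from samples known to be heterozygous at that SNP; the function names and this calibration scheme are hypothetical.

```python
def estimate_bias(a_reads_het: int, b_reads_het: int) -> float:
    """Estimate the A:B bias factor at one SNP from known-heterozygous data.

    With no bias, a heterozygous locus yields a 1:1 ratio, so the factor is 1.
    """
    if b_reads_het <= 0:
        raise ValueError("need at least one B read to estimate the bias")
    return a_reads_het / b_reads_het


def corrected_a_fraction(a_reads: int, b_reads: int, bias: float) -> float:
    """Return the A-allele fraction after dividing out the estimated bias."""
    adjusted_a = a_reads / bias  # undo the over- or under-representation of A
    return adjusted_a / (adjusted_a + b_reads)


if __name__ == "__main__":
    # A locus that consistently reads 55:45 in heterozygous samples.
    bias = estimate_bias(5500, 4500)
    print(corrected_a_fraction(550, 450, bias))
```

With the 55:45 example from the text, the corrected fraction for a heterozygous locus returns to 0.5, while homozygous loci are left at 0 or 1.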
Experiment 6
Universal amplification of DNA using ligated adapters with primers specific to the adapter tags, where the primer annealing and extension times are limited to a few minutes, has the effect of enriching the proportion of shorter DNA strands. Most library protocols designed to create DNA libraries suitable for sequencing contain such a step, and example protocols are published and well known to those of skill in the art. In some embodiments of the invention, adapters with a universal tag are ligated to plasma DNA and amplified using primers specific to the adapter tag. In some embodiments, the universal tag may be the same tag as used for sequencing, it may be a universal tag used only for PCR amplification, or it may be a set of tags. Since fetal DNA is typically short in nature, while maternal DNA can be both short and long, this method has the effect of enriching the proportion of fetal DNA in the mixture. The free-floating DNA, which is thought to be DNA from apoptotic cells and which contains both fetal and maternal DNA, is short, mostly under 200 bp. Cellular DNA released by cell lysis, a common phenomenon after phlebotomy, is typically almost exclusively maternal and is also quite long, mostly above 500 bp. Thus, a blood sample that has sat for more than a few minutes will contain a mixture of short (fetal + maternal) and longer (maternal) DNA. Performing universal amplification with relatively short extension times on maternal plasma, followed by targeted amplification, will tend to increase the relative proportion of fetal DNA when compared with plasma amplified using targeted amplification alone. This can be seen in FIG. 19, which shows the measured fetal percentage when the input is plasma DNA (vertical axis) versus the measured fetal percentage when the input is plasma DNA that has had a library prepared using the ILLUMINA GAIIx library preparation protocol.
All the points fall below the line, indicating that the library preparation step enriches the fraction of DNA that is of fetal origin. Two plasma samples that were red, indicating hemolysis and therefore an increased amount of long maternal DNA from cell lysis, show a particularly significant enrichment of the fetal fraction when library preparation is performed prior to targeted amplification. The methods disclosed herein are particularly useful where there is hemolysis, or where some other situation has occurred in which cells containing relatively long strands of contaminating DNA have lysed and contaminated the mixed sample of short DNA with long DNA. Typically the relatively short annealing and extension times are from 30 seconds to 2 minutes, although they may be as short as 5 or 10 seconds or less, or as long as 5 or 10 minutes.
Experiment 7
The following protocol was used for 1,200-plex amplification of DNA isolated from maternal plasma from a euploid pregnancy, and also of genomic DNA from a trisomy 21 cell line, using a direct PCR protocol and also a semi-nested approach. Library preparation and amplification involved single-tube blunt ending followed by A-tailing. Adapter ligation was performed using a modification of the ligation kit found in the AGILENT SURESELECT kit, and PCR was run for 7 cycles. In the targeted primer pool, there were 550 assays for SNPs on chromosome 21 and 325 assays for SNPs on each of chromosomes 1 and X. Both protocols involved 15 cycles of STA (95 °C for 30 seconds; 72 °C for 1 minute; 60 °C for 4 minutes; 65 °C for 30 seconds; 72 °C for 30 seconds) using a 16 nM primer concentration. The semi-nested PCR protocol involved a further 15 cycles of STA (95 °C for 30 seconds; 72 °C for 1 minute; 60 °C for 4 minutes; 65 °C for 30 seconds; 72 °C for 30 seconds) using an inner forward tag concentration of 29 nM and a reverse tag concentration of 1 µM or 0.1 µM. The DNA was then sequenced using an ILLUMINA IIGAX sequencer. For the direct PCR protocol, 73% of the reads mapped to the genome; for the semi-nested protocol, 97.2% of the sequence reads mapped to the genome. Thus, the semi-nested protocol yields approximately 30% more information (97.2/73 ≈ 1.33), presumably mostly due to the elimination of the primers that are most likely to cause primer dimers.
The variability in depth of read tends to be greater when using the semi-nested protocol than when using the direct PCR protocol (see FIG. 20), where the diamonds refer to the depth of read for loci run with the semi-nested protocol, and the squares refer to the depth of read for loci run with no nesting. Because the SNPs are arranged by depth of read for the diamonds, the diamonds all fall on a curved line, while the squares appear only loosely correlated; the arrangement of the SNPs is arbitrary, and it is the height of a point, rather than its position from left to right, that denotes the depth of read.
In some embodiments, the methods described herein can achieve excellent depth of read (DOR) variances. For example, in one version of this experiment (FIG. 21) using 1,200-plex direct PCR amplification of genomic DNA, of the 1,200 assays: 1,186 assays had a DOR greater than 10; the mean depth of read was 400; and 1,063 assays (88.6%) had a depth of read between 200 and 800, an ideal window in which the number of reads for each allele is high enough to yield meaningful data, but not so high that the marginal use of those reads is particularly small. Only 12 alleles had a higher depth of read, with the highest at 1,035 reads. The standard deviation of the DOR was 290, the average DOR was 453, the coefficient of variation of the DOR was 64%, there were 950,000 total reads, and 63.1% of the reads mapped to the genome. In another experiment (FIG. 22) using the 1,200-plex semi-nested protocol, the DOR was higher. The standard deviation of the DOR was 583, the average DOR was 630, the coefficient of variation of the DOR was 93%, there were 870,000 total reads, and 96.3% of the reads mapped to the genome. Note that in both of these cases, the SNPs are arranged by the depth of read for the mother, so the curved line represents the maternal depth of read. The differences between child and father are not significant; only the trend is significant for the purposes of this description.
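The depth-of-read statistics reported above can be computed from per-assay read counts as sketched below. The coefficient of variation is the standard deviation divided by the mean, which reproduces the reported values (290/453 ≈ 64% and 583/630 ≈ 93%). The function name `dor_summary`, the use of the population standard deviation, and the example depths are assumptions made for illustration; the text does not state which estimator was used.

```python
import statistics


def dor_summary(depths, low=200, high=800):
    """Summarize depth of read (DOR) across assays.

    Returns the mean, the population standard deviation, the coefficient
    of variation, and the fraction of assays whose depth lies in the
    window [low, high].
    """
    mean = statistics.fmean(depths)
    sd = statistics.pstdev(depths)
    in_window = sum(low <= d <= high for d in depths) / len(depths)
    return {"mean": mean, "sd": sd, "cv": sd / mean, "in_window": in_window}


if __name__ == "__main__":
    # Hypothetical per-assay depths, for illustration only.
    print(dor_summary([100, 200, 300, 400]))
    # Coefficients of variation implied by the reported figures.
    print(f"direct PCR:  {290 / 453:.0%}")
    print(f"semi-nested: {583 / 630:.0%}")
```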
Experiment 8
In this experiment, DNA from single cells and from three cells was amplified using the hemi-nested 1,200-plex PCR protocol. This experiment is relevant to prenatal aneuploidy testing using fetal cells isolated from maternal blood, or to preimplantation genetic diagnosis using biopsy samples or trophectoderm samples. There were 3 replicates of 1 and 3 cells from 2 individuals (46 XY and 47 XX+21) per condition. The assays targeted chromosomes 1, 21, and X. Three different lysis methods were used: ARCTURUS, MPERv2 and alkaline lysis. Sequencing was run with 48 samples multiplexed in one sequencing lane. The algorithm returned correct ploidy calls for each of the three chromosomes and for each of the replicates.
Experiment 9
In one experiment, four maternal plasma samples were prepared and amplified using the semi-nested 9,600-plex protocol. The samples were prepared in the following way: up to 40 mL of maternal blood was centrifuged to isolate the buffy coat and the plasma. The genomic DNA in the maternal sample was prepared from the buffy coat, and paternal DNA was prepared from a blood sample or saliva sample. Cell-free DNA in the maternal plasma was isolated using the QIAGEN CIRCULATING NUCLEIC ACID kit and eluted in 45 μL of TE buffer according to the manufacturer's instructions. Universal ligation adapters were appended to the end of each molecule of 35 μL of purified plasma DNA, and the libraries were amplified for 7 cycles using adapter-specific primers. The libraries were purified with AGENCOURT AMPURE beads and eluted in 50 μL of water.
3 μL of the DNA was amplified with 15 cycles of STA (95 °C for 10 min for initial polymerase activation, then 15 cycles of 95 °C for 30 s; 72 °C for 10 s; 65 °C for 1 min; ...; 65 °C for 3 min and 72 °C for 30 s; and a final 72 °C for 2 min) using a 14.5 nM primer concentration of 9,600 target-specific tagged reverse primers and one library-adapter-specific forward primer at 500 nM.
The semi-nested PCR protocol involved a second amplification of a dilution of the first STA product for 15 cycles of STA (95 °C for 10 min for initial polymerase activation, then 15 cycles of 95 °C for 30 s; 65 °C for 1 min; 60 °C for 5 min; 65 °C for 5 min and 72 °C for 30 s; and a final 72 °C for 2 min) using a reverse tag concentration of 1000 nM and a concentration of 16.6 nM for each of the 9,600 target-specific forward primers.
An aliquot of the STA product was then amplified by standard PCR for 10 cycles using 1 μM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries of different barcodes and purified using a spin column.
In this way, 9,600 primers were used in a single-well reaction; the primers were designed to target SNPs found on chromosomes 1, 2, 13, 18, 21, X, and Y. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. Per sample, approximately 3.9 million reads were generated by the sequencer, with 3.7 million reads mapping to the genome (94%); of those, 2.9 million reads (74%) mapped to the targeted SNPs. The average depth of read was 344 and the median depth of read was 255. The fetal fractions for the four samples were found to be 9.9%, 18.9%, 16.3%, and 21.2%.
The relevant maternal and paternal genomic DNA samples were amplified using the semi-nested 9,600-plex protocol and sequenced. The semi-nested protocol differs in that it applies 9,600 outer forward primers and tagged reverse primers at 7.3 nM in the first STA. The thermocycling conditions and composition of the second STA and of the barcoding PCR were the same as for the hemi-nested protocol.
Sequencing data was analyzed using the informatics methods disclosed herein, and the ploidy state was called at six chromosomes for the fetuses whose DNA was present in the four maternal plasma samples. The ploidy calls for all 28 chromosomes in the set were correct, with confidences above 99.2% except for one chromosome that was called correctly but with a confidence of 83%.
Figure 23 shows the depth of read of the 9,600-plex semi-nested approach together with the depth of read of the 1,200-plex semi-nested approach described in Experiment 7, although the number of SNPs with a depth of read greater than 100, greater than 200, and greater than 400 was significantly higher than in the 1,200-plex protocol. The number of reads at the 90th percentile can be divided by the number of reads at the 10th percentile to give a dimensionless metric that is indicative of the uniformity of the depth of read; the smaller the number, the more uniform (narrow) the depth of read. The average 90th percentile / 10th percentile ratio is 11.5 for the method run in Experiment 9, while it is 5.6 for the method run in Experiment 7. A narrower depth of read for a given protocol plexity is better for sequencing efficiency, as fewer sequence reads are needed to ensure that a certain percentage of the reads are above a read-number threshold.
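The 90th percentile / 10th percentile uniformity metric can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and the choice of the "inclusive" interpolation rule from the standard library is an assumption, since the source does not say how the percentiles were estimated.

```python
import statistics


def percentile_ratio(depths):
    """Return (90th percentile depth) / (10th percentile depth).

    `depths` is a list with one depth-of-read value per SNP.
    The interpolation rule ("inclusive") is an assumption of this sketch.
    """
    # With n=10, statistics.quantiles returns the nine decile cut points.
    deciles = statistics.quantiles(depths, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    if p10 <= 0:
        raise ValueError("10th percentile depth must be positive")
    return p90 / p10


# A perfectly uniform library has a ratio of exactly 1.
uniform_ratio = percentile_ratio([400] * 20)

# Made-up depths spread evenly from 1 to 101 give a much larger ratio.
spread_ratio = percentile_ratio(list(range(1, 102)))
```

A smaller return value corresponds to a narrower, more uniform depth-of-read distribution, which is the sense in which the ratios 5.6 and 11.5 are compared above.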
Experiment 10
In one experiment, four maternal plasma samples were prepared and amplified using the semi-nested 9,600-plex protocol. The details of Experiment 10 were very similar to Experiment 9, the exceptions being the nesting protocol and the identity of the four samples. The ploidy calls for all 28 chromosomes in this set were correct, with confidences above 99.7%. 7.6 million (97%) of the reads mapped to the genome, and 6.3 million (80%) of the reads mapped to the targeted SNPs. The average depth of read was 751 and the median depth of read was 396.
Experiment 11
In one experiment, three maternal plasma samples were split into five equal portions, and each portion was amplified using either 2,400 multiplexed primers (four portions) or 1,200 multiplexed primers (one portion) with a hemi-nested protocol, for a total of 10,800 primers. After amplification, the portions were pooled together for sequencing. The details of Experiment 11 were very similar to Experiment 9, the exceptions being the nesting protocol and the split-and-pool approach. The ploidy calls for all 21 chromosomes in the set were correct, with confidences above 99.7%, except for one missed call where the confidence was 83%. 3.4 million reads mapped to the targeted SNPs, the average depth of read was 404, and the median depth of read was 258.
Experiment 12
In one experiment, four maternal plasma samples were split into four equal portions, and each portion was amplified using 2,400 multiplexed primers with a semi-nested protocol, for a total of 9,600 primers. After amplification, the portions were pooled together for sequencing. The details of Experiment 12 were very similar to Experiment 9, the exceptions being the nesting protocol and the split-and-pool approach. The ploidy calls for all 28 chromosomes in the set were correct, with confidences above 97%, except for one missed call where the confidence was 78%. 4.5 million reads mapped to the targeted SNPs, the average depth of read was 535, and the median depth of read was 412.
Experiment 13
In one experiment, four maternal plasma samples were prepared and amplified using a 9,600-plex triply hemi-nested protocol, for a total of 9,600 primers. The details of Experiment 13 were very similar to Experiment 9, the exception being the nesting protocol, which involved three rounds of amplification; the three rounds involved 15, 10, and 15 STA cycles, respectively. The ploidy calls for 27 of the 28 chromosomes in the set were correct, with confidences above 99.9% except for one call at 94.6%; the remaining call was missed, with a confidence of 80.8%. 3.5 million reads mapped to the targeted SNPs, the average depth of read was 414, and the median depth of read was 249.
Experiment 14
In one experiment, 45 sets of cells were amplified using the 1,200-plex hemi-nested protocol and sequenced, and ploidy determinations were made at three chromosomes. Note that this experiment is meant to simulate the conditions of performing preimplantation genetic diagnosis on single-cell biopsies from day 3 embryos, or on trophectoderm biopsies from day 5 embryos. Fifteen individual single cells and 30 sets of three cells were placed in 45 individual reaction tubes, for a total of 45 reactions, where each reaction contained cells from only one cell line but different reactions contained cells from different cell lines. The cells were prepared in 5 μL of wash buffer and lysed by adding 5 μL of ARCTURUS PICOPURE lysis buffer (APPLIED BIOSYSTEMS) and incubating at 56 °C for 20 min and 95 °C for 10 min.
DNA from the single cells or three cells was amplified with 25 cycles of STA (95 °C for 10 min for initial polymerase activation, followed by 25 cycles of 65 °C for 30 min; 72 °C for 30 min; and 72 °C for 2 min).
The semi-nested PCR protocol involved three parallel second amplifications of a dilution of the first STA product for 20 cycles of STA (95 °C for 10 min for initial polymerase activation, then 15 cycles of 95 °C for 30 s; 65 °C for 1 min; 60 °C for 5 min; 65 °C for 5 min and 72 °C for 30 s; and a final 72 °C for 2 min) using a reverse-tag-specific primer concentration of 1000 nM and a concentration of 60 nM for each of 400 target-specific nested forward primers. Thus, the total of 1,200 targets amplified in the first STA were amplified in the three parallel 400-plex reactions.
An aliquot of the STA product was then amplified by standard PCR for 15 cycles using 1 μM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries of different barcodes and purified using a spin column.
In this manner, 1,200 primers were used in the single-cell reactions; the primers were designed to target SNPs found on chromosomes 1, 21, and X. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. Per sample, approximately 3.9 million reads were generated by the sequencer, with 500,000 to 800,000 reads mapping to the genome (74% to 94% of all reads per sample).
The relevant maternal and paternal genomic DNA from the cell lines was analyzed using the same semi-nested 1,200-plex assay pool, with a similar protocol with fewer cycles and a 1,200-plex second STA, and sequenced.
Sequencing data was analyzed using the informatics methods disclosed herein, and the ploidy state was called at the three chromosomes for the samples.
Figure 24 shows normalized depth-of-read ratios (vertical axis) for six samples at three chromosomes (1 = chromosome 1; 2 = chromosome 21; 3 = chromosome X). Each ratio was set equal to the normalized number of reads mapping to that chromosome divided by the number of reads mapping to that chromosome averaged over three wells, each containing three 46XY cells. The three sets of data points corresponding to the 46XY reactions are expected to have ratios of 1:1. The three sets of data points corresponding to the 47XX+21 cells are expected to have ratios of 1:1 for chromosome 1, 1.5:1 for chromosome 21, and 2:1 for chromosome X.
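The expected ratios stated above follow directly from chromosome copy numbers. A minimal Python sketch, assuming an idealized, noise-free measurement; the dictionaries and the function name are illustrative and not taken from the source:

```python
def expected_depth_ratio(sample_copies: int, reference_copies: int) -> float:
    """Expected normalized depth ratio of a sample relative to the reference wells."""
    return sample_copies / reference_copies


# Copies of each assayed chromosome per cell.
reference_46xy = {"1": 2, "21": 2, "X": 1}   # reference: 46XY cells
sample_47xx_21 = {"1": 2, "21": 3, "X": 2}   # sample: 47XX+21 cells

expected_ratios = {
    chrom: expected_depth_ratio(sample_47xx_21[chrom], reference_46xy[chrom])
    for chrom in reference_46xy
}
print(expected_ratios)
```

The computed values reproduce the 1:1, 1.5:1, and 2:1 expectations for chromosomes 1, 21, and X.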
Figure 25 shows allele ratios plotted for three chromosomes (1, 21, X) for three reactions. The reaction at the lower left is a reaction on three 46XY cells. The left region shows the allele ratios for chromosome 1, the middle region shows the allele ratios for chromosome 21, and the right region shows the allele ratios for chromosome X. For the 46XY cells, for chromosome 1 we expect to see ratios of 1, 0.5 and 0, corresponding to the AA, AB and BB SNP genotypes. For the 46XY cells, for chromosome 21 we expect to see ratios of 1, 0.5 and 0, corresponding to the AA, AB and BB SNP genotypes. For the 46XY cells, for chromosome X we expect to see ratios of 1 and 0, corresponding to the A and B SNP genotypes. The reaction at the lower right is a reaction on three 47XX+21 cells. The allele ratios are separated by chromosome as in the lower left graph. For the 47XX+21 cells, for chromosome 1 we expect to see ratios of 1, 0.5 and 0, corresponding to the AA, AB and BB SNP genotypes. For the 47XX+21 cells, for chromosome 21 we expect to see ratios of 1, 0.67, 0.33 and 0, corresponding to the AAA, AAB, ABB and BBB SNP genotypes. For the 47XX+21 cells, for chromosome X we expect to see ratios of 1, 0.5 and 0, corresponding to the AA, AB, and BB SNP genotypes. The plot at the upper right was made from a reaction containing 1 ng of genomic DNA from the 47XX+21 cell line. Figure 26 shows the same graphs as Figure 25, but for reactions performed on only one cell. The left graph is for a reaction containing a 47XX+21 cell, and the right graph is for a reaction containing a 46XX cell.
From the graphs in Figures 25 and 26, it is clear that there are two clusters of points for chromosomes where we expect to see ratios of 1 and 0, three clusters of points for chromosomes where we expect to see ratios of 1, 0.5 and 0, and four clusters of points for chromosomes where we expect to see ratios of 1, 0.67, 0.33 and 0. The Parental Support algorithm was able to make correct calls on all three chromosomes for all of the 45 reactions.
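The number of clusters and their expected positions depend only on how many copies of the chromosome are present. As a hedged illustration (an idealized model without amplification bias or allele dropout; the function name is hypothetical), the expected A-allele ratios can be enumerated in Python:

```python
from fractions import Fraction


def expected_allele_ratios(copies: int):
    """Return the expected A-allele ratios for every genotype with `copies` copies.

    For example, two copies give AA, AB, BB, i.e. ratios 1, 1/2, 0.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return [Fraction(a_count, copies) for a_count in range(copies, -1, -1)]


one_copy = expected_allele_ratios(1)     # e.g. chromosome X in 46XY cells
two_copies = expected_allele_ratios(2)   # e.g. chromosome 1
three_copies = expected_allele_ratios(3) # e.g. chromosome 21 in 47XX+21 cells
```

One, two, and three copies yield two, three, and four expected ratios respectively, matching the two, three, and four clusters described for the plots.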
Experiment 15
In one experiment, maternal plasma samples were prepared and amplified using the semi-nested 19,488-plex protocol. The samples were prepared in the following way: up to 20 mL of maternal blood was centrifuged to isolate the buffy coat and the plasma. The genomic DNA in the maternal sample was prepared from the buffy coat, and paternal DNA was prepared from a blood sample or saliva sample. Cell-free DNA in the maternal plasma was isolated using the QIAGEN CIRCULATING NUCLEIC ACID kit and eluted in 50 μL of TE buffer according to the manufacturer's instructions. Universal ligation adapters were appended to the end of each molecule of 40 μL of purified plasma DNA, and the libraries were amplified for 9 cycles using adapter-specific primers. The libraries were purified with AGENCOURT AMPURE beads and eluted in 50 μL of DNA suspension buffer.
6 μL of the DNA was amplified with 15 cycles of STAR 1 (95 °C for 10 min for initial polymerase activation, then 15 cycles of 96 °C for 30 s; 65 °C for 1 min; 58 °C for 6 min; ...; 65 °C for 4 min and 72 °C for 30 s; and a final 72 °C for 2 min) with a single library-adapter-specific forward primer at 500 nM.
The semi-nested PCR protocol involved a second amplification of a dilution of the STAR 1 product for 15 cycles (STAR 2) (95 °C for 10 min for initial polymerase activation, then 15 cycles of 95 °C for 30 s; 65 °C for 1 min; 60 °C for 5 min; 65 °C for 5 min and 72 °C for 30 s; and a final 72 °C for 2 min) using a reverse tag concentration of 1000 nM and a concentration of 20 nM for each of the 19,488 target-specific forward primers.
An aliquot of the STAR 2 product was then amplified by standard PCR for 12 cycles using 1 μM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries of different barcodes and purified using a spin column.
In this manner, 19,488 primers were used in a single-well reaction; the primers were designed to target SNPs found on chromosomes 1, 2, 13, 18, 21, X, and Y. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. For plasma samples, approximately 10 million reads were generated by the sequencer, with 9.4 to 9.6 million reads mapping to the genome (94-96%); of those, 99.95% mapped to the targeted SNPs, with a mean depth of read of 460 and a median depth of read of 350. For comparison, a perfectly even distribution would be: 10 M reads / 19,488 targets = 513 reads/target. For primer-dimers, 30,000 reads were from sequenced primer-dimers (0.3% of the reads generated by the sequencer). For genomic samples, 99.4 to 99.7% of the reads mapped to the genome; of those, 99.99% mapped to the targeted SNPs, and 0.1% of the reads generated by the sequencer were primer-dimers.
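The even-distribution comparison and the primer-dimer percentage above are simple arithmetic, reproduced here as a short Python check. The variable names are illustrative; the numbers are the ones given in the text.

```python
total_reads = 10_000_000      # approximate reads generated per plasma sample
targets = 19_488              # number of targeted SNPs
primer_dimer_reads = 30_000   # reads attributed to sequenced primer-dimers

# Depth per target if every read were spread perfectly evenly over the targets.
uniform_depth = total_reads / targets

# Primer-dimer reads as a percentage of all reads generated by the sequencer.
primer_dimer_pct = 100 * primer_dimer_reads / total_reads

print(round(uniform_depth), primer_dimer_pct)
```

The even-distribution depth rounds to 513 reads per target, against which the reported mean of 460 and median of 350 can be compared.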
For samples with 10 million sequencing reads, typically at least 19,350 of the 19,488 targeted SNPs (99.3%) were amplified and sequenced. For DNA samples with 2 M sequencing reads, typically at least 19,000 targeted SNPs (97.5%) were amplified and sequenced. The lower number may be due to sampling noise, since the number of reads is lower and the sequencer misses some of the amplified products. If desired, the number of sequencing reads can be increased to increase the number of targeted SNPs that are amplified and sequenced.
The relevant maternal and paternal genomic DNA samples were amplified using a semi-nested protocol in which STAR 1 used 19,488 outer forward primers and tagged reverse primers at 7.5 nM. The thermocycling conditions of STAR 2 and of the barcoding PCR were the same as for the semi-nested protocol described above.
The average fetal fraction for the 407 samples was found to be 14.8%. Sequencing data was analyzed using the informatics methods disclosed herein, and the ploidy state was called for the fetuses at chromosomes 13, 18, and 21 in 378 of the 407 maternal plasma samples, at chromosome X in 377 of the 407 samples, and at chromosome Y. The ploidy calls for all 1,887 chromosomes in this set were correct, with confidences above 90%. 1,882 of the 1,887 calls were made with confidences above 95%, and 1,862 of the 1,887 calls were made with confidences above 99%.
As a control, similar experiments were performed using water instead of DNA extracted from plasma in the plasma PCR protocol. Based on six such runs of the experiment, 5-6% of the sequenced reads were primer-dimers. The other sequenced reads were due to background noise. This experiment demonstrates that even in the absence of a nucleic acid sample with target loci for the primers to hybridize to (instead of hybridizing to other primers and forming amplified primer-dimers), few primer-dimers were formed.
Experiment 16
The following experiment illustrates an exemplary method for designing and selecting a library of primers that can be used in any of the multiplex PCR methods of the invention. The goal is to select, from an initial library of candidate primers, primers that can be used to simultaneously amplify a large number of target loci (or a subset of the target loci) in a single reaction. For the initial set of candidate target loci, primers did not have to be designed or selected for every target locus. Preferably, primers are designed and selected for a large portion of the most desirable target loci.
Step 1
A set of candidate target loci (e.g., SNPs) was selected based on publicly available information about desired parameters for the target loci, such as the frequency of the SNPs within a target population or the heterozygosity rate of the SNPs (see Sherry ST et al., dbSNP: the NCBI database of genetic variation, Nucleic Acids Res. 2001 Jan 1; 29(1):308-11, which is incorporated by reference). For each candidate locus, one or more PCR primer pairs were designed using the Primer3 program (primer3.sourceforge.net; libprimer3 release 2.2.3, each of which is hereby incorporated by reference). If there was no feasible design for PCR primers for a particular target locus, that target locus was eliminated from further consideration.
If desired, a "target locus score" (a higher score representing higher desirability) can be calculated for most or all of the target loci, such as a target locus score calculated from a weighted average of various desired parameters for the target loci. The parameters may be assigned different weights based on their importance for the particular application in which the primers will be used. Exemplary parameters include the heterozygosity rate of the target locus, the disease prevalence associated with a sequence (e.g., a polymorphism) at the target locus, the disease penetrance associated with a sequence (e.g., a polymorphism) at the target locus, the size of the candidate primer(s) used to amplify the target locus, and the size of the target amplicon.
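A weighted-average target locus score of the kind described above might be computed as in the following Python sketch. The parameter names, the example values, the weights, and the assumption that each parameter has already been scaled so that higher is better are all hypothetical; the source specifies only that the score is a weighted average.

```python
def target_locus_score(parameters: dict, weights: dict) -> float:
    """Weighted average of per-locus parameter values (higher = more desirable).

    `parameters` maps a parameter name to its value for one locus;
    `weights` maps the same names to their relative importance.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * parameters[name] for name in weights) / total_weight


# Hypothetical locus: heterozygosity weighted three times as heavily as amplicon size.
example_score = target_locus_score(
    parameters={"heterozygosity": 0.5, "amplicon_size_score": 1.0},
    weights={"heterozygosity": 3.0, "amplicon_size_score": 1.0},
)
```

Changing the weights is how the same candidate loci could be ranked differently for different applications.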
Step 2
A thermodynamic interaction score was calculated between each primer and all primers for all other target loci from Step 1 (see, e.g., Allawi, H. T. & SantaLucia, J., Jr. (1998), "Thermodynamics of Internal C-T Mismatches in DNA", Nucleic Acids Res. 26, 2694-2701; Peyret, N., Seneviratne, P. A., Allawi, H. T. & SantaLucia, J., Jr. (1999), "Nearest-Neighbor Thermodynamics and NMR of DNA Sequences with Internal A-A, C-C, G-G, and T-T Mismatches", Biochemistry 38, 3468-3477; Allawi, H. T. & SantaLucia, J., Jr. (1998), "Nearest-Neighbor Thermodynamics of Internal A-C Mismatches in DNA: Sequence Dependence and pH Effects", Biochemistry 37, 9435-9444; Allawi, H. T. & SantaLucia, J., Jr. (1998), "Nearest Neighbor Thermodynamic Parameters for Internal G-A Mismatches in DNA", Biochemistry 37, 2170-2179; Allawi, H. T. & SantaLucia, J., Jr. (1997), "Thermodynamics and NMR of Internal G-T Mismatches in DNA", Biochemistry 36, 10581-10594; and MultiPLX 2.1 (Kaplinski L, Andreson R, Puurand T, Remm M. MultiPLX: automatic grouping and evaluation of PCR primers. Bioinformatics 2005; 21(8):1701-2), each of which is hereby incorporated by reference in its entirety). This step generated a 2D matrix of interaction scores. The interaction score predicted the likelihood of primer-dimers involving the two interacting primers. The score was calculated as follows:
interaction_score = max(-deltaG_2, 0.8 * (-deltaG_1))
where
deltaG_2 is the Gibbs energy (the energy required to break the dimer) for a dimer that is extensible by PCR on both ends, i.e., the 3' end of each primer anneals to the other primer; and
deltaG_1 is the Gibbs energy for a dimer that is extensible by PCR on at least one end.
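The scoring formula can be written as a one-line Python function. This sketch assumes the two Gibbs energies have already been computed by a thermodynamic model (that computation is not shown here) and that more stable dimers have more negative values, so that negating them gives a larger score; the example energies are made up.

```python
def interaction_score(delta_g_2: float, delta_g_1: float) -> float:
    """Interaction score for a primer pair.

    delta_g_2: Gibbs energy of the dimer extensible at both 3' ends.
    delta_g_1: Gibbs energy of the dimer extensible at at least one end.
    Both are assumed to be negative for stable dimers.
    """
    return max(-delta_g_2, 0.8 * (-delta_g_1))


# Made-up energies: the doubly extensible dimer dominates here.
score_both_ends = interaction_score(delta_g_2=-6.0, delta_g_1=-5.0)

# Here the singly extensible dimer, down-weighted by 0.8, dominates.
score_one_end = interaction_score(delta_g_2=-2.0, delta_g_1=-5.0)
```

The 0.8 factor means a dimer extensible at only one end must be somewhat more stable than a doubly extensible one to produce the same score.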
Step 3
For each target locus, if there was more than one primer-pair design, one design was selected using the following method:
1. For each primer-pair design for the locus, find the worst-case (highest) interaction score between the two primers in that design and all primers from all designs for all other target loci.
2. Select the design with the best (lowest) worst-case interaction score.
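The two-step selection above is a minimum over maxima. A minimal Python sketch follows; the primer names, the lookup table of scores, and the helper names are hypothetical stand-ins for the interaction score matrix of Step 2.

```python
def worst_case_score(design, other_primers, score):
    """Highest interaction score between any primer in `design` and any other primer."""
    return max(score(p, q) for p in design for q in other_primers)


def select_design(designs, other_primers, score):
    """Pick the design whose worst-case interaction score is lowest."""
    return min(designs, key=lambda d: worst_case_score(d, other_primers, score))


# Hypothetical interaction scores between this locus's primers and two other primers.
TOY_SCORES = {
    ("a1", "x"): 3.0, ("a1", "y"): 1.0, ("a2", "x"): 2.0, ("a2", "y"): 9.0,
    ("b1", "x"): 4.0, ("b1", "y"): 5.0, ("b2", "x"): 1.0, ("b2", "y"): 2.0,
}


def toy_score(p, q):
    return TOY_SCORES[(p, q)]


chosen = select_design([("a1", "a2"), ("b1", "b2")], ["x", "y"], toy_score)
```

In this toy case the first design has a worst-case score of 9.0 and the second of 5.0, so the second design is retained even though the first contains the single lowest score.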
Step 4
A graph was built such that each node represented one locus and its associated primer-pair design (e.g., a Maximal Clique problem). One edge was created between every pair of nodes. A weight was assigned to each edge equal to the worst-case (highest) interaction score between the primers associated with the two nodes connected by the edge.
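The graph of Step 4 can be represented as a mapping from unordered locus pairs to edge weights. The sketch below is one possible Python representation, not the implementation used in the source; the locus names, primer names, and toy scores are hypothetical.

```python
from itertools import combinations


def build_graph(designs, score):
    """Build a complete weighted graph over loci.

    `designs` maps a locus name to the tuple of primers chosen for it.
    Returns a dict mapping frozenset({locus_a, locus_b}) to the highest
    interaction score between any primer of one locus and any primer of the other.
    """
    edges = {}
    for locus_a, locus_b in combinations(designs, 2):
        edges[frozenset((locus_a, locus_b))] = max(
            score(p, q) for p in designs[locus_a] for q in designs[locus_b]
        )
    return edges


# Hypothetical pairwise scores; unlisted pairs are treated as 0.0 in this toy example.
TOY_PAIR_SCORES = {
    frozenset(("p1", "p3")): 4.0,
    frozenset(("p2", "p4")): 7.5,
    frozenset(("p1", "p5")): 2.0,
}


def toy_pair_score(p, q):
    return TOY_PAIR_SCORES.get(frozenset((p, q)), 0.0)


toy_designs = {"L1": ("p1", "p2"), "L2": ("p3", "p4"), "L3": ("p5", "p6")}
toy_edges = build_graph(toy_designs, toy_pair_score)
```

Three loci give three edges; each weight is the worst interaction found between the two primer pairs joined by that edge.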
Step 5
If desired, an additional edge was added between the nodes for every pair of designs for two different target loci in which one of the primers from one design and one of the primers from the other design would anneal to overlapping target regions. The weight of these edges was set equal to the highest weight assigned in Step 4. Thus, Step 5 prevents the library from containing primers that would anneal to overlapping target regions and thereby interfere with each other during a multiplex PCR reaction.
Step 6
An initial interaction score threshold was calculated as follows:
weight_threshold = max(edge_weight) - 0.05 * (max(edge_weight) - min(edge_weight))
where
max(edge_weight) is the maximum edge weight in the graph; and
min(edge_weight) is the minimum edge weight in the graph.
The initial bounds for the threshold were set as follows:
max_weight_threshold = max(edge_weight)
min_weight_threshold = min(edge_weight)
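The initial threshold sits 5% of the way down from the largest edge weight toward the smallest. A small Python sketch of Step 6, with made-up edge weights and an illustrative function name:

```python
def initial_threshold(edge_weights):
    """Return (weight_threshold, max_weight_threshold, min_weight_threshold)."""
    max_weight = max(edge_weights)
    min_weight = min(edge_weights)
    weight_threshold = max_weight - 0.05 * (max_weight - min_weight)
    # The search bounds start at the extreme edge weights.
    return weight_threshold, max_weight, min_weight


# Made-up edge weights for illustration.
threshold, upper_bound, lower_bound = initial_threshold([0.0, 2.0, 10.0])
```

Starting close to the maximum means the first pass ignores all but the strongest predicted interactions, and later passes tighten or relax the threshold from there.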
Step 7
A new graph was constructed consisting of the same set of nodes as the graph from Step 5 but including only the edges with weights exceeding weight_threshold. Thus, this step ignores interactions with scores equal to or below weight_threshold.
Step 8
Nodes (and all of the edges connected to the removed nodes) were removed from the graph of Step 7 until no edges were left. Nodes were removed by applying the following procedure repeatedly:
1. Find the node with the highest degree (the highest number of edges). If there is more than one, pick one arbitrarily.
2. Define the set of nodes consisting of the node picked above and all of the nodes connected to it, but excluding any nodes that have a degree less than that of the node picked above.
3. Pick the node in the set that has the lowest target locus score from Step 1 (a lower score representing lower desirability). Remove that node from the graph.
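The three-step removal loop above can be sketched in Python as follows. The adjacency-set representation, the function name, and the toy inputs are choices made for this illustration only; ties for the highest degree are broken by dictionary order here, which is one way of picking "arbitrarily".

```python
def prune_until_edgeless(nodes, edges, locus_score):
    """Remove nodes until no edges remain; return the set of surviving nodes.

    `edges` is an iterable of (node_a, node_b) pairs that exceeded the threshold;
    `locus_score` maps each node to its target locus score from Step 1.
    """
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    while any(adjacency.values()):
        # 1. Node with the highest degree.
        top = max(adjacency, key=lambda n: len(adjacency[n]))
        top_degree = len(adjacency[top])
        # 2. That node plus its neighbours whose degree is not lower.
        candidates = [top] + [
            n for n in adjacency[top] if len(adjacency[n]) >= top_degree
        ]
        # 3. Remove the candidate with the lowest target locus score.
        victim = min(candidates, key=lambda n: locus_score[n])
        for neighbour in adjacency.pop(victim):
            adjacency[neighbour].discard(victim)
    return set(adjacency)


# Hub "A" interacts with three loci: removing it alone clears every edge.
star = prune_until_edgeless(
    ["A", "B", "C", "D"],
    [("A", "B"), ("A", "C"), ("A", "D")],
    {"A": 0.9, "B": 0.5, "C": 0.4, "D": 0.3},
)

# Two loci of equal degree: the one with the lower locus score is dropped.
pair = prune_until_edgeless(["A", "B"], [("A", "B")], {"A": 0.9, "B": 0.1})
```

The first example shows that a highly connected locus is removed even when its score is high, because none of its neighbours has an equal degree; the second shows the locus score acting as the tie-breaker.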
Step 9
If the number of nodes remaining in the graph satisfied the required number of target loci for the multiplexed PCR pool (within an acceptable tolerance), the method continued at Step 10.
If too many or too few nodes remained in the graph, a binary search was performed to determine which threshold value would result in the desired number of nodes remaining in the graph. If there were too many nodes in the graph, the weight threshold bounds were adjusted as follows:
max_weight_threshold = weight_threshold
Otherwise, if there were too few nodes in the graph, the weight threshold bounds were adjusted as follows:
min_weight_threshold = weight_threshold
Then the weight threshold was adjusted as follows:
weight_threshold = (max_weight_threshold + min_weight_threshold) / 2
Steps 7 to 9 were repeated.
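The loop over Steps 7 to 9 is a bisection on the weight threshold. The Python sketch below assumes a caller-supplied function, here called `count_remaining` (a hypothetical name), that performs Steps 7 and 8 for a given threshold and returns the number of surviving nodes; the iteration cap and the toy counting function are also assumptions added for the illustration.

```python
def search_threshold(count_remaining, min_weight, max_weight, target,
                     tolerance=0, max_iterations=50):
    """Bisect the weight threshold until about `target` nodes remain."""
    min_bound, max_bound = min_weight, max_weight
    # Step 6: initial threshold, 5% below the maximum edge weight.
    threshold = max_weight - 0.05 * (max_weight - min_weight)
    for _ in range(max_iterations):
        remaining = count_remaining(threshold)   # Steps 7 and 8
        if abs(remaining - target) <= tolerance:
            return threshold                     # Step 9: continue at Step 10
        if remaining > target:
            max_bound = threshold                # too many nodes: lower the upper bound
        else:
            min_bound = threshold                # too few nodes: raise the lower bound
        threshold = (max_bound + min_bound) / 2
    return threshold


# Toy stand-in for Steps 7-8: a higher threshold ignores more interactions,
# so more nodes survive.
def toy_count(threshold):
    return int(10 * threshold)


found = search_threshold(toy_count, min_weight=0.0, max_weight=10.0, target=42)
```

The bisection relies on the number of surviving nodes growing as the threshold rises; with real data that trend may be only approximately monotonic, which is one reason an acceptable tolerance is useful.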
Step 10
The primer-pair designs associated with the nodes remaining in the graph were selected for the primer library. This primer library can be used in any of the methods of the invention.
If desired, this method of designing and selecting primers can be performed for primer libraries in which only one primer (instead of a primer pair) is used for amplification of a target locus. In this case, a node represents one primer per target locus (rather than a primer pair).
Experiment 17
Figure 27 is a graph comparing two primer libraries designed using the methods of the invention. The graph shows the number of loci with a particular minor allele frequency that are targeted by each primer library. More primers were retained during the selection of the "new pool" library. This library enables the amplification of more target loci, especially target loci with relatively small minor allele frequencies (which are the more informative alleles for some of the methods of the invention, such as the detection of fetal chromosomal abnormalities).
These primer libraries were used in the following multiplex PCR method. Blood (20-40 mL) was drawn from each subject into two to four CELL-FREE™ DNA tubes (Streck). Plasma (at least 7 mL) was isolated from each sample by a double centrifugation protocol of 2,000 g for 20 min followed by 3,220 g for 30 min, with the supernatant transferred after the first spin. cfDNA was isolated from 7-20 mL of plasma using the QIAGEN QIAamp Circulating Nucleic Acid kit and eluted in 45 μL of TE buffer. Pure maternal genomic DNA was isolated from the buffy coat obtained after the first centrifugation, and pure paternal genomic DNA was prepared similarly from a blood, saliva or buccal sample.
Maternal cfDNA, maternal genomic DNA, and paternal genomic DNA samples were pre-amplified for 15 cycles using 11,000 target-specific assays, and an aliquot was transferred to a second PCR reaction of 15 cycles using nested primers. Finally, the samples were prepared for sequencing by adding barcoded tags in a third, 12-cycle round of PCR. Thus, 11,000 targets were amplified in a single reaction; the targets included SNPs found on chromosomes 13, 18, 21, X, and Y. The amplicons were then sequenced using an ILLUMINA GAIIx or HISEQ sequencer. The parental genotypes were sequenced at a lower depth of read (~20% of the cfDNA depth of read) than the fetal genotypes.
Experiment 18
If desired, the size and quantity of the PCR products can be analyzed using standard methods, such as an Agilent Technologies 2100 Bioanalyzer (Figures 28a to 28m). For example, the direct PCR method described herein, without nesting, was performed for a 2,400-plex experiment (Figures 28b to 28g) and a 19,488-plex experiment (Figures 28h to 28m). The amount of primer was 10 nM for Figures 28b to 28d and 28h to 28j, and 1 nM for Figures 28e to 28g and 28k to 28m. The amount of input DNA was varied across three groups of panels: Figures 28b, 28e, 28h and 28k; Figures 28c, 28f, 28i and 28l; and Figures 28d, 28g, 28j and 28m, with 250 ng used for the last group. Additional input DNA produced a higher proportion of the desired 180 base pair product. The peak at 140 base pairs was the primer-dimer product.
Experiment 19
This proof-of-principle study demonstrated the detection of T13, T18, T21, 45,X, and 47,XXY with equally high accuracy across all chromosomes.
Patients
Pregnant couples were enrolled at selected prenatal care centers under protocols approved by an Institutional Review Board pursuant to local laws. The inclusion criteria were an age of at least 18 years, a gestational age of at least 9 weeks, a singleton pregnancy, and signed informed consent. Blood samples were drawn from the pregnant mothers, and a blood or buccal sample was collected from the fathers. Samples from two pregnancies with T13 (Patau syndrome), two pregnancies with T18 (Edwards syndrome), two pregnancies with T21 (Down syndrome), two pregnancies with 45,X, pregnancies with 47,XXY, and 90 normal pregnancies were selected before testing from a cohort of ~500 women in order to test which chromosomal abnormalities the method detected. Normal fetal karyotype was confirmed by molecular karyotyping for the samples for which post-birth child tissue was available. Euploid samples were drawn before invasive testing in low-risk women. Aneuploid samples were drawn at least 7 days after invasive testing, and aneuploidy was confirmed by cytogenetic karyotyping or fluorescence in situ hybridization at independent laboratories.
Sample preparation and multiplexing PCR
For Figures 30a to 30e, 30g, 30h, and 31a to 31g, sample preparation and 19,488-plex PCR were performed as described in Experiment 15. For Figure 30f, sample preparation and 11,000-plex PCR were performed as described in Experiment 17.
Methodology and data analysis
The algorithm considers the parental genotypes and crossover frequency data (such as data from the HapMap database) to calculate the expected allele distributions for 19,488 polymorphic loci for a very large number of possible fetal ploidy states and at various fetal cfDNA fractions (Figures 29a to 29c). Unlike allele-ratio-based methods, it also takes linkage disequilibrium into account and uses non-Gaussian data models to describe the expected distribution of allele measurements at a SNP given the observed platform characteristics and amplification biases. It then compares the various predicted allele distributions with the actual allele distributions measured in the cfDNA sample (Figure 29c), and calculates the likelihood of each hypothesis (monosomy, disomy, or trisomy, for each of which there are numerous hypotheses based on the various potential crossovers) from the sequencing data. The algorithm sums the likelihoods of the individual monosomy, disomy, or trisomy hypotheses (Figure 29d) and calls the hypothesis with the maximum overall likelihood as the copy number and fetal fraction (Figure 29e). Although the laboratory researchers were not blinded to the sample karyotype, the algorithm called the ploidy states without human intervention and was blinded to the truth.
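The final summation-and-selection step can be illustrated with a short Python sketch. This is a deliberate simplification: it covers only the grouping of hypothesis likelihoods by copy-number class and the choice of the class with the largest total, not the calculation of the likelihoods themselves or the estimation of the fetal fraction, and the function name and example values are made up.

```python
from collections import defaultdict


def call_copy_number(hypothesis_likelihoods):
    """Sum likelihoods per copy-number class and return the best class.

    `hypothesis_likelihoods` is an iterable of (class_label, likelihood) pairs,
    with one pair per individual hypothesis (e.g. one per crossover pattern).
    """
    totals = defaultdict(float)
    for label, likelihood in hypothesis_likelihoods:
        totals[label] += likelihood
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Made-up likelihoods for five individual hypotheses.
best_class, class_totals = call_copy_number([
    ("monosomy", 0.01),
    ("disomy", 0.20),
    ("disomy", 0.30),
    ("trisomy", 0.10),
    ("trisomy", 0.05),
])
```

Summing before choosing means a class supported by many moderately likely hypotheses can outrank a class containing the single most likely one.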
Data interpretation
Generalized graphical representation of data
To determine the ploidy state of a chromosome of interest, the algorithm considers the distribution of the number of sequence reads for each of the two possible alleles at 3,000 to 4,000 SNPs per chromosome. It is important to note that the algorithm makes ploidy calls using an approach that does not lend itself to visualization. Thus, for purposes of illustration, the data are presented here in simplified form as ratios of the two most likely alleles, labeled A and B, so that the relevant trends can be visualized more readily. This simplified illustration does not take into account some of the features of the algorithm. For example, two important aspects of the algorithm that cannot be illustrated with a visualization method that displays allele ratios are: 1) the ability to leverage linkage disequilibrium, i.e., the influence that a measurement at one SNP has on the likely identity of a neighboring SNP, and 2) the use of non-Gaussian data models that describe the expected distribution of allele measurements at a SNP given the platform characteristics and amplification biases. Note also that the algorithm considers only the two most common alleles at each SNP, ignoring other possible alleles.
FIGS. 30A to 30H show graphical representations that include samples with two, one, or three fetal chromosomes. Generally, these indicate euploidy (FIGS. 30A to 30C), monosomy (FIG. 30D), and trisomy (FIGS. 30E to 30H), respectively. In all plots, each point represents a single SNP, and the targeted SNPs are plotted sequentially from left to right for one chromosome along the horizontal axis. The vertical axis represents the number of reads for the A allele as a fraction of the total number of reads for both the A and B alleles at that SNP. Note that the measurements were performed on total cfDNA isolated from maternal blood, and that this cfDNA includes both maternal and fetal cfDNA; each point therefore represents the combined fetal and maternal DNA contribution at that SNP. Thus, increasing the proportion of maternal cfDNA from 0% to 100% will gradually move some points upward or downward in the plot, depending on the maternal and fetal genotypes. This is described in more detail below with the corresponding plots.
To facilitate visualization, because the maternal genotype contributes more to the position of each point and because most trisomies are maternally inherited, points can be color-coded according to the maternal genotype; this assists in visualizing the ploidy state. Specifically, SNPs for which the maternal genotype is AA can be shown in red, SNPs for which the maternal genotype is AB can be shown in green, and SNPs for which the maternal genotype is BB can be shown in blue.
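The plotted quantity and the color coding described above can be expressed in a few lines. This is an illustrative sketch only; the function names and the tuple layout are assumptions introduced for the example.

```python
# Color assigned to each maternal genotype, as described in the text.
MATERNAL_COLOR = {"AA": "red", "AB": "green", "BB": "blue"}

def a_allele_fraction(a_reads, b_reads):
    """Fraction of reads carrying the A allele among all A and B reads at one SNP."""
    total = a_reads + b_reads
    if total == 0:
        return None  # SNP not observed; nothing to plot
    return a_reads / total

def plot_point(snp_index, a_reads, b_reads, maternal_genotype):
    """Return (x, y, color) for one SNP: position along the chromosome,
    A-allele read fraction, and the color for the maternal genotype."""
    return (snp_index,
            a_allele_fraction(a_reads, b_reads),
            MATERNAL_COLOR[maternal_genotype])
```

For example, a SNP with 30 A reads and 10 B reads is plotted at a height of 0.75.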
In all cases, SNPs that are homozygous for the A allele (AA) in both the mother and the fetus are tightly associated with the upper limit of the plot, because no B allele should be present and the fraction of A allele reads is therefore high. Conversely, SNPs that are homozygous for the B allele in both the mother and the fetus are tightly associated with the lower limit of the plot, because only the B allele should be present and the fraction of A allele reads is therefore low. Points that are not tightly associated with the upper and lower limits of the plot represent SNPs for which the mother, the fetus, or both are heterozygous; these points are useful for identifying fetal ploidy, but can also be informative for distinguishing paternal from maternal inheritance. These points segregate on the basis of both the maternal and fetal genotypes and the fetal fraction, so the precise position of each individual point along the y-axis depends on both stoichiometry and fetal fraction. For example, loci at which the mother is AA and the fetus is AB are expected to have a different fraction of A allele reads, and thus a different position along the y-axis, depending on the fetal fraction.
Two chromosomes present
FIGS. 30A to 30C show data indicating the presence of two chromosomes when the sample is entirely maternal (no fetal cfDNA present, FIG. 30A), contains an intermediate fetal cfDNA fraction (FIG. 30B), or contains a high fetal cfDNA fraction (FIG. 30C).
FIG. 30A shows data obtained from cfDNA isolated from the blood of a non-pregnant woman. When no fetal cfDNA is present and the sample contains only maternal cfDNA, the plot represents a purely maternal euploid genotype. The hallmark pattern consists of "clusters" of points: a red cluster tightly associated with the top of the plot (SNPs with maternal genotype AA), a blue cluster tightly associated with the bottom of the plot (SNPs with maternal genotype BB), and a single centered green cluster (SNPs with maternal genotype AB) (color not shown).
When fetal cfDNA is present, the positions of the points shift so that the clusters separate into distinct "bands". Groupings of points are referred to as "clusters" for samples with a fetal fraction of 0% (FIG. 30A) and as "bands" for all samples with a fetal fraction greater than 0% (FIGS. 30B to 30J). If the fetal fraction is high enough, these distinct bands are readily visible. Specifically, FIGS. 30B and 30C show the characteristic pattern associated with two fetal chromosomes at intermediate and high fetal fractions, respectively. This pattern contains three central green bands, corresponding to SNPs that are heterozygous in the mother, and two "peripheral" bands at each of the top (red) and bottom (blue) of the plot, corresponding to SNPs that are homozygous in the mother (color not shown).
FIG. 30B shows data obtained from cfDNA isolated from a plasma sample of a woman carrying a euploid fetus, with a fetal cfDNA fraction of 12%. Here, the clusters of points tightly associated with the top and bottom of the plot have each separated into two distinct bands: one red and one blue outer peripheral band that remain tightly associated with the upper or lower limit of the plot, and one red and one blue inner peripheral band that have separated from the limits of the plot (color not shown). These inner peripheral bands, centered around 0.92 and 0.08, represent SNPs for which the maternal genotype is AA and the fetal genotype is AB (red), and SNPs for which the maternal genotype is BB and the fetal genotype is AB (blue). The central cluster of green points broadens, but at this fetal fraction its separation into distinct bands is not readily visible.
At high fetal cfDNA fractions, the presence of two chromosomes (three green bands together with two red and two blue peripheral bands) is readily apparent (color not shown). FIG. 30C shows data obtained from a plasma sample of a woman carrying a euploid fetus at a fetal cfDNA fraction of 26%. Here, the peripheral bands have separated further, with the inner bands moving toward the center of the plot because of the altered level of the B allele at the increased fetal cfDNA fraction. Importantly, at this higher fetal fraction, the separation of the central green cluster into three distinct bands is now readily apparent. In this case, the three central bands, centered around 0.37, 0.50, and 0.63, correspond to SNPs for which the maternal genotype is AB and the fetal genotype is AA (top), AB (middle), or BB (bottom).
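The band positions discussed above follow from simple mixing arithmetic when both the mother and the fetus carry two copies of the chromosome. The sketch below assumes an idealized model with no measurement noise or amplification bias, in which the fetal genome contributes a fraction f of the cfDNA and the maternal genome contributes 1 - f; the function name and the genotype strings are illustrative assumptions, not part of the described algorithm.

```python
def expected_a_fraction(maternal_genotype, fetal_genotype, fetal_fraction):
    """Idealized A-allele read fraction for a disomic mother and disomic fetus.

    Genotypes are two-letter strings over {"A", "B"}, e.g. "AB".
    Assumes reads are drawn in proportion to the maternal (1 - f) and
    fetal (f) shares of the cfDNA, with no noise or bias.
    """
    if len(maternal_genotype) != 2 or len(fetal_genotype) != 2:
        raise ValueError("this sketch covers the two-chromosome case only")
    maternal_a = maternal_genotype.count("A") / 2
    fetal_a = fetal_genotype.count("A") / 2
    return (1 - fetal_fraction) * maternal_a + fetal_fraction * fetal_a

# Central (maternal AB) bands at a 26% fetal fraction.
central_bands = [round(expected_a_fraction("AB", g, 0.26), 2)
                 for g in ("AA", "AB", "BB")]
```

Under these assumptions the central bands at a 26% fetal fraction fall at 0.63, 0.50, and 0.37, in agreement with the values quoted for FIG. 30C, and the spacing between bands shrinks in proportion to the fetal fraction.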
This hallmark pattern of three green bands and four peripheral bands (two red and two blue) indicates the presence of two chromosomes, as for an autosome or for the X chromosome in a female (XX) fetus.
One chromosome present
If the fetus inherits a single chromosome, and therefore only a single allele, fetal heterozygosity is impossible. Thus, the only possible fetal SNP identities are A or B. A maternally inherited monosomic chromosome therefore has a characteristic pattern of two central green bands, representing SNPs for which the mother is heterozygous, and only a single peripheral red band and a single peripheral blue band, representing SNPs for which the mother is homozygous, which remain tightly associated with the upper and lower limits of the plot (1 and 0), respectively (FIG. 30D) (color not shown). Note the absence of inner peripheral bands. This pattern indicates the presence of one chromosome, as in a maternally inherited autosomal monosomy or in the case of the X chromosome in a male (XY) fetus.
Three chromosomes present
Trisomic chromosomes show three characteristic patterns. The first pattern indicates a maternally inherited meiotic trisomy, in which the fetus inherits two homologous, non-identical chromosomes from the mother (FIG. 30E); this pattern has two central green bands and two each of the peripheral red and blue bands (color not shown). The second pattern indicates a paternally inherited meiotic trisomy, in which the fetus inherits two homologous, non-identical chromosomes from the father (FIG. 30); this pattern has four central green bands and three each of the peripheral red and blue bands (color not shown). The third pattern indicates either a maternally inherited (FIG. 30) or a paternally inherited (FIG. 30H) mitotic trisomy, in which the fetus inherits two identical chromosomes from the mother or the father; this pattern has four central green bands and two each of the peripheral red and blue bands. Maternally and paternally inherited mitotic trisomies are distinguished from each other by the replacement of the outer peripheral red and blue bands with red and blue inner peripheral bands (which are not associated with the limits of the plot) (color not shown). This is due to the paternal contribution of identical chromosomes. Our previous results indicate that 66.7% of maternally inherited trisomies are mitotic and that only 10.2% of trisomies are paternally inherited.
For the Y chromosome, the PS method considers a different set of hypotheses: that zero, one, or two chromosomes are present. Because there is no maternal contribution to the sequence reads at each locus, and because heterozygous loci are not possible (two Y chromosomes necessarily means two identical chromosomes), the bands remain at the top (A allele) or bottom (B allele) of the plot (data not shown), and the analysis is greatly simplified, relying on quantitative allele count data. Because the method interrogates SNPs, it uses homologous, non-recombining SNPs on the Y chromosome, so that data for both X and Y are obtained from a single probe pair.
Identification of Aneuploidy
Identifying autosomal aneuploidy with this plot-based visualization method is straightforward given a sufficient fetal fraction, and requires only identifying the plots in which an abnormal number of chromosomes is present, as described above. Knowledge of the copy numbers of the X and Y chromosomes is combined to identify the presence of sex chromosome aneuploidy. Specifically, plots representing a fetus with a 47,XXX genotype will show the typical "three chromosome" pattern, whereas plots representing a fetus with a 47,XXY genotype will show the typical "two chromosome" pattern for the X chromosome together with allele reads indicating the presence of one Y chromosome. The method can also call 47,XYY, in which the "one chromosome" pattern indicates the presence of a single X chromosome and the allele reads indicate the presence of two Y chromosomes. A fetus with a 45,X genotype will have data showing the "one chromosome" pattern for the X chromosome and zero Y chromosomes.
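The way X and Y copy numbers are combined into a sex chromosome call can be illustrated with a simple lookup. This is a sketch under stated assumptions: the function name and the "no call" fallback are hypothetical, the autosomes are assumed to be disomic when the total chromosome count is written, and the euploid entries (46,XX and 46,XY) are included for completeness.

```python
def sex_chromosome_call(x_copies, y_copies):
    """Map called X and Y copy numbers to a karyotype string.

    Assumes all autosomes are disomic. Combinations outside the table
    are returned as "no call" (an assumption of this sketch).
    """
    calls = {
        (2, 0): "46,XX",   # "two chromosome" X pattern, no Y reads
        (1, 1): "46,XY",   # "one chromosome" X pattern, one Y
        (1, 0): "45,X",    # "one chromosome" X pattern, no Y reads
        (3, 0): "47,XXX",  # "three chromosome" X pattern, no Y reads
        (2, 1): "47,XXY",  # "two chromosome" X pattern, one Y
        (1, 2): "47,XYY",  # "one chromosome" X pattern, two Y
    }
    return calls.get((x_copies, y_copies), "no call")
```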
Effect of Fetal Fraction
As discussed above, the number of fetal sequence reads contributes to the precise position of each point along the y-axis in the plot. The fetal fraction also affects the position of each point, because it affects the proportion of fetal to maternal reads. As shown in FIGS. 30C to 30E and FIGS. 30G and 30H, although the clusters of points are based mainly on the maternal genotype, the presence of fetal DNA carrying alleles that differ from the maternal genotype separates the clusters into distinct bands. However, as the fetal fraction decreases (FIGS. 30B and 30F), the points regress toward the poles and the center of the plot, producing tighter clusters. Specifically, the set of peripheral red bands (where the maternal genotype is AA) regresses toward the top of the plot; the set of peripheral blue bands (where the maternal genotype is BB) regresses toward the bottom; and the set of central green bands (where the mother is heterozygous) condenses into a single cluster at the center of the plot (FIGS. 30B and 30C) (color not shown). Although the ploidy state is not readily apparent by eye with this visualization technique at low fetal fractions, the algorithm can identify the ploidy state at very low fetal fractions, such as a 3% fetal fraction. This is possible because the statistical technique compares the observed data with a very precise data model that predicts the allele distribution for a given set of sample parameters (including copy number, parental genotypes, and fetal fraction). The accuracy of the data model is especially important at low fetal fractions, because the difference between the allele distributions for different ploidy states is proportional to the fetal fraction. In addition, the algorithm can determine when a data set does not contain enough data to make a confident fetal ploidy determination.
Results
Sequence reads that mapped to targeted SNPs were considered informative and were used by the algorithm. More than 95% of the targeted loci were observed in the sequencing results. Plots visualizing the main ploidy calls are shown in FIGS. 31A to 31G. FIG. 31A represents a euploid sample. Here, chromosomes 13, 18, and 21 show the typical "two chromosome" pattern (as described herein), with a set of three central green bands and two red and two blue peripheral bands. Together with the presence of two central green bands for the X chromosome and of Y chromosome bands along the periphery of the plot, this indicates a euploid XY genotype (color not shown).
The most prevalent autosomal trisomies, T13, T18, and T21, are shown in FIGS. 31B, 31C, and 31D, respectively. Specifically, FIG. 31B represents a T13 sample. Here, chromosomes 18 and 21 show the typical "two chromosome" pattern, chromosome X shows the typical "one chromosome" pattern, and reads are present for the Y chromosome. Together, this indicates disomy at chromosomes 18 and 21 and identifies an XY fetal genotype. Chromosome 13, however, shows the typical "three chromosome" pattern. Similarly, FIG. 31C represents a T18 sample, and FIG. 31D represents a T21 sample.
The method also called sex chromosome aneuploidies, including 45,X (FIG. 31E), 47,XXY (FIG. 31F), and 47,XYY (FIG. 31G). Note that the method calls copy number at chromosomes 13, 18, 21, X, and Y; the total chromosome number is reported on the assumption that the remaining chromosomes are disomic. The X chromosome region of the plot representing the 45,X sample indicates the presence of a single chromosome. The absence of Y chromosome reads, coupled with the "two chromosome" pattern for chromosomes 13, 18, and 21, indicates a 45,X genotype. Conversely, the 47,XXY sample produces a plot indicating the presence of two X chromosomes. The data also show reads for Y chromosome alleles. Together with the presence of two copies of chromosomes 13, 18, and 21, this indicates a 47,XXY genotype. The 47,XYY genotype appears as the "one chromosome" pattern for the X chromosome together with reads indicating the presence of two Y chromosomes.
Discussion
The method non-invasively detected T13, T18, T21, 45,X, 47,XXY, and 47,XYY from maternal blood. The method interrogates cfDNA from maternal plasma by targeted multiplex PCR amplification and high-throughput sequencing of 19,488 SNPs. Coupled with a sophisticated informatics analysis that takes multiple sample parameters into account, including parental genotype information, fetal fraction, and DNA quality, the method addresses the seven most common types of aneuploidy (T13, T18, T21, 45,X, 47,XXX, 47,XXY, and 47,XYY) and achieves highly accurate ploidy calls. The method offers a number of clinical advantages over previous methods, the most significant being greater clinical coverage and sample-specific calculated accuracies (akin to an individualized risk score).
Increased Clinical Coverage
The method offers an approximately twofold increase in aneuploidy coverage compared with clinically available NIPT methodologies, given its ability to accurately detect both autosomal trisomies and sex chromosome aneuploidies. The method presented here is the only non-invasive test that calls ploidy at the sex chromosomes with high accuracy. Previous DNA mixing experiments and separate plasma samples analyzed in our laboratory suggest that the method will detect a larger set of sex chromosome abnormalities, including 47,XXX. The method presented here is also expected to detect aneuploidy at chromosomes 13, 18, and 21 with high sensitivity and specificity, and can detect copy number at the remaining chromosomes given appropriate primer design.
Sample-specific calculated accuracy
Importantly, the method calculates a sample-specific accuracy for the ploidy call at each chromosome in each sample. The calculated accuracies are expected to significantly reduce the rate of incorrect calls by identifying and flagging individual samples with poor-quality DNA or low fetal fractions, which tend to produce inaccurate test results. In contrast, massively parallel shotgun sequencing (MPSS)-based methods generate positive or negative calls using a single-hypothesis rejection test, and their accuracy estimates are based on published study cohorts rather than on the characteristics of the individual sample, which is presumed to have the same accuracy as the cohort. However, the accuracy for an individual sample whose parameters lie in the tails of the cohort distribution may differ significantly. This is exacerbated at low fetal fractions, such as at early gestational ages, and for samples with low DNA quality. Such samples are generally not identified and flagged for follow-up, which may lead to missed calls. The method, by contrast, takes many parameters into account, including fetal fraction and multiple DNA quality metrics, to arrive at a sample-specific accuracy for each chromosome copy number call. This allows the method to identify individual samples with low accuracy and flag them for follow-up. This is expected to eliminate almost all missed calls, particularly at early gestational ages, when the fetal fraction is typically low. A no-call is far preferable to a missed call, since a no-call simply requires a redraw and reanalysis.
Conversion of calculated accuracy to traditional risk score
The method can provide an adjusted risk for women with high-risk pregnancies, where the adjusted risk takes the a priori risk into account (see Benn P, Cuckle H, Pergament E. Non-invasive prenatal diagnosis for Down syndrome: the paradigm will shift, but slowly. Ultrasound Obstet Gynecol 2012; 39: 127-130, which is incorporated herein by reference). While the method provides a calculated accuracy that is specific to each patient, for clinical use these accuracies can be converted into traditional risk scores, which also represent the risk of an aneuploid pregnancy but are expressed as fractions. Traditional risk scores take into account various parameters, including age-related risk and serum levels of biochemical markers, and define a risk score threshold above which a pregnancy is considered high-risk and follow-up invasive diagnostic procedures are recommended. The method significantly refines these risk scores, thereby reducing both the false positive and false negative rates and giving a more accurate assessment of the individual's overall risk. The calculated accuracy as used herein is the likelihood that the ploidy call is correct, expressed as a percentage; however, the calculated accuracy used in Experiment 19 does not include age-related risk. Calculations of risk score typically incorporate age-related risk, so calculated accuracies and traditional risk scores are not interchangeable; the accuracies must be converted into traditional risk scores. The formula for combining the age-related risk and the calculated accuracy is:

(where R₁ is the risk score calculated by the method and R₂ is the risk score calculated by first-trimester screening).
SNP-based methods obviate the issue with amplification variability
An inherent disadvantage of the counting methods used by some other approaches is that ploidy is determined by measuring the ratio of the number of reads that map to the chromosome of interest (e.g., chromosome 21) to the number that map to a reference chromosome. Chromosomes with high or low GC content, including chromosomes 13, X, and Y, amplify with high variability. This can produce signal changes comparable in magnitude to the fetal cfDNA signal, which can confound copy number calls by altering the ratio of reads from the chromosome of interest to reads from the reference chromosome. This can result in lower accuracy for chromosomes 13, X, and Y. Importantly, this problem is exacerbated at low fetal cfDNA fractions, as is the case at early gestational ages.
In contrast, SNP-based methods do not rely on consistent levels of amplification between chromosomes, and are therefore expected to provide equally accurate results across all chromosomes. Because the method considers, in part, the relative counts of different alleles at polymorphic loci, which by definition differ by only a single nucleotide, it does not require the use of a reference chromosome; this obviates the chromosome-to-chromosome amplification variability inherent in methods that depend on quantifying the number of reads. Unlike quantitative methods that require a reference chromosome, this method is expected to be able to detect triploidy as well as copy-number-neutral abnormalities such as uniparental disomy.
The Importance of Early Detection
Importantly, the combined at-birth prevalence of sex chromosome aneuploidies is greater than that of the most common autosomal aneuploidies (FIG. 32). However, there is currently no routine non-invasive screening method that reliably detects sex chromosome abnormalities. Thus, sex chromosome abnormalities are generally detected prenatally only as a by-product of routine testing for Down syndrome or other autosomal aneuploidies; many cases are missed entirely. Early and accurate detection is important for many of these disorders, in which early therapeutic intervention improves clinical outcome. Turner syndrome, for example, has an at-birth prevalence of 1 in 2,500 females, but is often not diagnosed until adolescence. Although growth hormone therapy is known to prevent the short stature that results from the disorder, treatment is significantly more effective when initiated before 4 years of age. In addition, estrogen replacement therapy can stimulate secondary sexual development in patients with Turner syndrome, but therapy should be initiated before adolescence, that is, before the syndrome is generally detected. Together, this highlights the importance of early, routine, and safe detection of sex chromosome aneuploidies. The method offers the first approach with the potential to be provided as a routine screen for sex chromosome abnormalities.
Additional application
Because the method utilizes targeted amplification, it is uniquely suited to detecting sub-microscopic abnormalities such as microdeletions and microduplications. Non-targeted methods such as MPSS have been shown to detect the DiGeorge microdeletion syndrome, but this required genome coverage so high as to make the approach impractical. This is because non-targeted amplification is several orders of magnitude less efficient for sub-microscopic regions, since only a very small fraction of the sequencing reads would be informative. In addition, the fact that currently available methods have difficulty accurately identifying the ploidy state of the sex chromosomes suggests that they will also encounter amplification variability problems at smaller chromosomal segments.
Similarly, the SNP-based method can detect copy-number-neutral abnormalities such as UPD disorders, which would not be detected by current invasive methods such as CVS and amniocentesis, which rely on cytogenetic karyotyping and/or fluorescence in situ hybridization. This is because the SNP-based method can uniquely distinguish individual haplotypes, whereas the clinically available MPSS-based and targeted methods amplify non-polymorphic loci and therefore cannot determine, for example, whether chromosomes originate from the same parent. As a result, these microdeletion/microduplication and UPD syndromes, including Prader-Willi, Angelman, and Beckwith-Wiedemann syndromes, are generally not diagnosed prenatally and are often misdiagnosed early after birth. This significantly delays therapeutic intervention. In addition, because the method targets SNPs, it also facilitates reconstruction of parental haplotypes, allowing detection of fetal inheritance of individual disease-linked loci (Kitzman JO, Snyder MW, Ventura M, et al. Noninvasive whole-genome sequencing of a human fetus. Sci Transl Med 2012; 4: 137ra76, the disclosure of which is incorporated herein by reference).
The results presented here demonstrate the expanded scope of the method for identifying prenatal aneuploidy. Specifically, by amplifying and sequencing 19,488 SNPs, the method can detect copy number at chromosomes 13, 18, 21, X, and Y, and is uniquely expected to detect triploidy and other chromosomal abnormalities, such as UPD, that are not detected by any other clinically available non-invasive method. The increased clinical coverage and the robust, sample-specific calculated accuracies suggest that the method can offer a valuable adjunct to invasive testing for detecting fetal chromosomal aneuploidy.
All patents, patent applications, and published references cited herein are incorporated herein by reference in their entirety. While the methods herein have been described in connection with specific embodiments thereof, it will be understood that they are capable of further modification. This application is also intended to cover any variations, uses, or adaptations of the methods herein, including such departures from the present disclosure as come within known or customary practice in the art and as fall within the scope of the appended claims. For example, any of the methods disclosed herein for DNA can be readily adapted for RNA by including a reverse transcription step that converts the RNA to DNA. Examples that use polymorphic loci for illustration can be readily adapted for amplification of non-polymorphic loci, as desired.

<110> Natera, Inc.
<120> Highly Multiplex PCR Methods and Compositions
<130> IPA141367-US
<150> US61/675,020
<151> 2012-07-24
<150> US13/683,604
<151> 2012-11-21
<160> 12
<170> KopatentIn 2.0
<210> 1 <211> 42 <212> DNA <213> Artificial Sequence <220> <223> Synthetic Construct
<400> 1 aactcacata gcacacgacg ctcttccgat cttgcaagca ca 42
<210> 2 <211> 39 <212> DNA <213> Synthetic construct
<400> 2 tcctctgtga cacgacgctc ttccgatctc cctgctctt 39
<210> 3 <211> 40 <212> DNA <213> Synthetic construct
<400> 3 tcctctctct acacgacgct cttccgatct cgggctgtca 40
<210> 4 <211> 42 <212> DNA <213> Synthetic construct
<400> 4 tacatccttg agacacgacg ctcttccgat ctgctgtgca gt 42
<210> 5 <211> 42 <212> DNA <213> Synthetic construct
<400> 5 tttgcttgag ctacacgacg ctcttccgat ctcgggagtt tc 42
<210> 6 <211> 42 <212> DNA <213> Synthetic construct
<400> 6 gtcttatggt ggacacgacg ctcttccgat ctcaaagcca gt 42
<210> 7 <211> 50 <212> DNA <213> Synthetic construct
<400> 7 aactcacata gctgatcggt acacgacgct cttccgatct tgcaagcaca 50
<210> 8 <211> 47 <212> DNA <213> Synthetic construct
<400> 8 tcctctgtgt gatcggtaca cgacgctctt ccgatctccc tgctctt 47
<210> 9 <211> 48 <212> DNA <213> Synthetic construct
<400> 9 tcctctctct tgatcggtac acgacgctct tccgatctcg ggctgtca 48
<210> 10 <211> 50 <212> DNA <213> Synthetic construct
<400> 10 tacatccttg agtgatcggt acacgacgct cttccgatct gctgtgcagt 50
<210> 11 <211> 50 <212> DNA <213> Synthetic construct
<400> 11 tttgcttgag cttgatcggt acacgacgct cttccgatct cgggagtttc 50
<210> 12 <211> 50 <212> DNA <213> Synthetic construct
<400> 12 gtcttatggt ggtgatcggt acacgacgct cttccgatct caaagccagt 50

Claims

(a) contacting a nucleic acid sample with a library of test primers that simultaneously hybridize to at least 1,000 different target loci to produce a reaction mixture; and
(b) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products comprising target amplicons.

The method of claim 1, wherein at least 5,000 different target loci are amplified.

The method of claim 1, wherein at least 10,000 different target loci are amplified.

The method of claim 1, wherein at least 20,000 different target loci are amplified.

The method of claim 1, wherein at least 30,000 different target loci are amplified.

The method of claim 1, wherein at least 90% of the amplified products are target amplicons.

The method of claim 1, wherein at least 95% of the amplified products are target amplicons.

The method of claim 1, wherein at least 99% of the amplified products are target amplicons.

The method of claim 1, wherein at least 90% of the targeted loci are amplified.

The method of claim 1, wherein at least 95% of the targeted loci are amplified.

The method of claim 1, wherein at least 99% of the targeted loci are amplified.

The method of claim 1, wherein less than 20% of the amplified products are test primer dimers.

The method of claim 1, wherein less than 10% of the amplified products are test primer dimers.

The method of claim 1, wherein less than 1% of the amplified products are test primer dimers.

The method of claim 1, wherein less than 0.1% of the amplified products are test primer dimers.

The method of claim 1, wherein the test primers are selected from a library of candidate primers based on one or more parameters.

The method of claim 16, wherein the test primers are selected from a library of candidate primers based at least in part on the ability of the candidate primers to form primer dimers.

The method of claim 17,
(i) calculating on a computer an undesirability score for most or all of the possible combinations of two candidate primers from the library, wherein each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers;
(ii) removing the candidate primer with the highest undesirability score from the library of candidate primers;
(iii) if the candidate primer removed in step (ii) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and
(iv) optionally repeating steps (ii) and (iii).
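Steps (i) to (iv) above can be sketched as a greedy pruning loop. This is an illustration under several assumptions that the claim does not specify: the pairwise score function is supplied by the caller, the "highest undesirability score" primer is taken to be the member of the worst-scoring pair with the larger summed score, and repetition stops at a caller-chosen library size. The names `prune_by_highest_score`, `score`, `partner`, and `target_size` are hypothetical.

```python
from itertools import combinations

def prune_by_highest_score(primers, score, partner, target_size):
    """Greedily remove candidate primers until target_size remain.

    primers: iterable of primer names.
    score(p, q): pairwise undesirability score (assumed symmetric).
    partner: dict mapping a primer to the other member of its pair, if any.
    """
    remaining = set(primers)
    # Step (i): score every combination of two candidate primers once.
    scores = {frozenset(pair): score(*pair)
              for pair in combinations(sorted(remaining), 2)}
    while len(remaining) > target_size:
        live = {pair: s for pair, s in scores.items() if pair <= remaining}
        if not live:
            break
        worst_pair = max(live, key=live.get)

        def total(primer):
            # Summed score of a primer over all pairs still in the library.
            return sum(s for pair, s in live.items() if primer in pair)

        # Step (ii): remove the worse member of the worst-scoring pair.
        victim = max(sorted(worst_pair), key=total)
        remaining.discard(victim)
        # Step (iii): remove its pair partner as well, if it has one.
        mate = partner.get(victim)
        if mate is not None:
            remaining.discard(mate)
    return remaining
```

Computing the score table once and filtering it on each pass keeps the sketch short; a larger library would likely call for incremental bookkeeping instead.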

18. The method of claim 17, comprising:
(i) computing, on a computer, an undesirability score for most or all of the possible combinations of two candidate primers from the library, wherein each undesirability score is based, at least in part, on the likelihood of dimer formation between the two candidate primers;
(ii) removing from the library of candidate primers the candidate primer that is part of the greatest number of combinations of two candidate primers with an undesirability score above a first minimum threshold;
(iii) if the candidate primer removed in step (ii) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and
(iv) optionally repeating steps (ii) and (iii).
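
The threshold-based variant above can be sketched as follows. It is a minimal illustration, not the applicant's implementation: the function name, the `pair_score` callable, and the `partner` mapping are hypothetical, and ties are broken alphabetically purely to keep the result deterministic.

```python
from collections import Counter
from itertools import combinations


def prune_by_conflict_count(primers, pair_score, partner, threshold):
    """Repeatedly drop the primer that takes part in the most two-primer
    combinations whose score exceeds `threshold` (illustrative sketch)."""
    library = set(primers)
    while True:
        # Step (i)/(ii): count, per primer, the combinations above threshold.
        counts = Counter()
        for a, b in combinations(sorted(library), 2):
            if pair_score(a, b) > threshold:
                counts[a] += 1
                counts[b] += 1
        if not counts:
            # No remaining combination exceeds the threshold.
            return library
        worst = max(sorted(counts), key=lambda p: counts[p])
        library.discard(worst)
        # Step (iii): remove the other member of the primer pair, if any.
        library.discard(partner.get(worst))
```

The loop ends when no remaining combination scores above the threshold, which is one reading of "optionally repeating" in step (iv).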

20. The method of claim 19, further comprising reducing the first minimum threshold used in step (ii) to a lower second minimum threshold, and repeating steps (ii) and (iii) until all of the undesirability scores for the combinations of candidate primers remaining in the library are equal to or less than the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number, thereby further reducing the number of candidate primers remaining in the library.
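
One way to picture the threshold adjustment is as a schedule of thresholds applied in turn, stopping early once the library reaches the desired size. The sketch below is hypothetical throughout (function name, parameters, tie-breaking) and omits removal of primer-pair partners for brevity.

```python
from itertools import combinations


def prune_with_thresholds(primers, pair_score, thresholds, desired_size):
    """Apply each threshold in order (e.g. descending), removing the most
    conflicted primer until no pair exceeds the current threshold or the
    library has shrunk to `desired_size`. Illustrative sketch only."""
    library = set(primers)
    for limit in thresholds:
        while len(library) > desired_size:
            counts = {}
            for a, b in combinations(sorted(library), 2):
                if pair_score(a, b) > limit:
                    counts[a] = counts.get(a, 0) + 1
                    counts[b] = counts.get(b, 0) + 1
            if not counts:
                break  # every remaining pair is at or below this threshold
            # Ties are resolved alphabetically (an arbitrary choice).
            library.discard(max(sorted(counts), key=counts.get))
    return library
```

Passing `thresholds=[2.0, 1.0]` corresponds to pruning at a first threshold and then at a lower second threshold.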

21. The method of claim 19, further comprising increasing the first minimum threshold used in step (ii) to a higher second minimum threshold, and repeating steps (ii) and (iii) until all of the undesirability scores for the combinations of candidate primers remaining in the library are equal to or less than the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number.

20. The method of claim 18 or 19, wherein the candidate primer is selected, from a group of two or more candidate primers having equal undesirability scores, for removal from the library based on one or more other parameters.

The method of claim 1, wherein the concentration of each test primer is less than 100 nM.

2. The method of claim 1, wherein the concentration of each test primer is less than 10 nM.

The method of claim 1, wherein the concentration of each test primer is less than 2 nM.

2. The method of claim 1, wherein the library of test primers comprises at least 1,000 test primer pairs, each comprising a forward test primer and a reverse test primer, wherein each pair of test primers hybridizes to a target locus.

2. The method of claim 1, wherein the library of test primers comprises at least 1,000 individual test primers that hybridize to different target loci, respectively.

The method of claim 1, wherein the GC content of the test primers is between 30% and 80%.

The method of claim 1, wherein the range of GC content of the test primers is less than 20%.

The method of claim 1, wherein the melting temperature of the test primers is between 40 °C and 80 °C.

The method of claim 1, wherein the range of melting temperatures of said test primers is less than 5 °C.

The method of claim 1, wherein the test primers are 17 to 35 nucleotides in length.

2. The method of claim 1, wherein the test primer comprises a tag that is not target-specific.

34. The method of claim 33, wherein the tag forms an internal loop structure.

36. The method of claim 35, wherein the tag is between two DNA binding sites.

35. The method of claim 34, wherein the test primer comprises a 5' region specific for the target locus, an internal region that is not specific for the target locus and forms a loop structure, and a 3' region specific for the target locus.

37. The method of claim 36, wherein the 3' region is at least 7 nucleotides in length.

38. The method of claim 37, wherein the 3' region is 7 to 20 nucleotides in length.

34. The method of claim 33, wherein said test primer comprises a 5' region that is not specific for the target locus, followed by a region specific for the target locus, an internal region that is not specific for the target locus, and a 3' region specific for the target locus.

2. The method of claim 1, wherein the range of lengths of the target amplicon is less than 15 nucleotides.

2. The method of claim 1, wherein the primer extension reaction conditions are polymerase chain reaction (PCR) conditions.

42. The method of claim 41 wherein the length of the annealing step is greater than 10 minutes.

The method of claim 1, further comprising measuring the presence or absence of at least one target amplicon.

3. The method of claim 1, further comprising measuring a sequence of at least one target amplicon.

2. The method of claim 1, wherein the target loci are present on the same chromosome of interest.

2. The method of claim 1, wherein at least some of the target loci are present on different chromosomes of interest.

The method of claim 1, wherein the nucleic acid sample comprises a fragmented or degraded nucleic acid.

2. The method of claim 1, wherein the nucleic acid sample comprises genomic DNA, cDNA, or mRNA.

The method of claim 1, wherein the nucleic acid sample comprises DNA from a single cell.

The method according to claim 1, wherein the nucleic acid sample comprises or is derived from blood, plasma, saliva, sperm, cell culture supernatant, mucosal secretions, dental plaque, gastrointestinal tissue, feces, urine, or a forensic sample.

2. The method of claim 1, wherein the target locus is a segment of human nucleic acid.

2. The method of claim 1, wherein the target locus comprises a single nucleotide polymorphism.

(a) computing, on a computer, an undesirability score for most or all of the possible combinations of two candidate primers from a library of candidate primers, wherein each undesirability score is based, at least in part, on the likelihood of dimer formation between the two candidate primers;
(b) removing from the library of candidate primers the candidate primer with the highest undesirability score;
(c) if the candidate primer removed in step (b) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and
(d) optionally repeating steps (b) and (c), thereby selecting a library of test primers.

(a) computing, on a computer, an undesirability score for most or all of the possible combinations of two candidate primers from a library of candidate primers, wherein each undesirability score is based, at least in part, on the likelihood of dimer formation between the two candidate primers;
(b) removing from the library of candidate primers the candidate primer that is part of the greatest number of combinations of two candidate primers with an undesirability score above a first minimum threshold;
(c) if the candidate primer removed in step (b) is a member of a primer pair, removing the other member of the primer pair from the library of candidate primers; and
(d) optionally repeating steps (b) and (c), thereby selecting test primers from the library of candidate primers.

55. The method of claim 54, further comprising reducing the first minimum threshold used in step (b) to a lower second minimum threshold, and repeating steps (b) and (c) until all of the undesirability scores for the combinations of candidate primers remaining in the library are equal to or less than the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number, thereby further reducing the number of candidate primers.

55. The method of claim 54, further comprising increasing the first minimum threshold used in step (b) to a higher second minimum threshold, and repeating steps (b) and (c) until all of the undesirability scores for the combinations of candidate primers remaining in the library are equal to or less than said second minimum threshold, or until the number of candidate primers remaining in said library is reduced to a desired number.

54. The method of claim 53 or 54, wherein the candidate primer is selected, from a group of two or more candidate primers having equal undesirability scores, for removal from the library of candidate primers based on one or more other parameters.

54. The method of claim 53 or 54, wherein said undesirability score is based, at least in part, on at least one parameter selected from the group consisting of: the heterozygosity rate of the target locus, the disease prevalence associated with a polymorphism or mutation at the target locus, a polymorphism or mutation at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon.

59. The method of claim 58, wherein the undesirability score is based, at least in part, on the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and wherein said test primers are used to determine the presence or absence of a fetal chromosomal abnormality by simultaneously amplifying at least 1,000 target loci in a sample comprising maternal DNA from the pregnant mother of a fetus and fetal DNA.

60. The method of claim 59, further comprising: ligating a universal primer binding site to the DNA molecules in the sample; amplifying the ligated DNA molecules using at least 1,000 specific primers and a universal primer to produce a first set of amplified products; and amplifying the first set of amplified products using at least 1,000 pairs of specific primers to produce a second set of amplified products.

59. The method of claim 58, wherein the undesirability score is based, at least in part, on the heterozygosity rate of the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the GC content of the target amplicon, the amplification efficiency of the target amplicon, and the size of the target amplicon; and wherein said test primers are used to simultaneously amplify at least 1,000 target loci in a sample comprising DNA from the alleged father of a fetus, and to simultaneously amplify said target loci in a sample comprising maternal DNA from the pregnant mother of said fetus and fetal DNA, thereby establishing whether the alleged father is the biological father of the fetus.

59. The method of claim 58, wherein said undesirability score is based, at least in part, on the heterozygosity rate of said target locus, the specificity of the candidate primer for said target locus, the size of the candidate primer, and the size of the target amplicon; and wherein said test primers are used to simultaneously amplify at least 1,000 target loci in a forensic nucleic acid sample using an annealing time of at least 5 minutes.

59. The method of claim 58, wherein said test primers are used to simultaneously amplify at least 1,000 target loci in a control nucleic acid sample to produce a first set of target amplicons, and to simultaneously amplify said target loci in a test nucleic acid sample to produce a second set of target amplicons; and wherein the first and second sets of target amplicons are compared to determine whether a target locus is present in one sample but absent from the other, or whether a target locus is present at different levels in the control sample and the test sample.

63. The method of claim 63, wherein the test sample is from an individual having, or suspected of having, a disease or phenotype of interest or an increased risk for the disease or phenotype of interest; and wherein at least one of the target loci comprises a polymorphism associated with the disease or phenotype of interest or with an increased risk for the disease or phenotype of interest.

59. The method of claim 58, wherein said test primers are used to simultaneously amplify at least 1,000 target loci in a control sample comprising RNA to produce a first set of target amplicons, and to simultaneously amplify the target loci in a test sample to produce a second set of target amplicons; and wherein the first and second sets of target amplicons are compared to determine the presence or absence of a difference in RNA expression level between the control sample and the test sample.

66. The method of claim 65, wherein the RNA is mRNA.

65. The method of claim 65, wherein the test sample is from an individual suspected of having cancer or an increased risk for cancer, and wherein one or more of the target loci comprise a polymorphism or other mutation associated with an increased risk for cancer.

66. The method of claim 65, wherein the test sample is from an individual diagnosed with cancer; and wherein the difference in RNA expression level between the control sample and the test sample indicates a target locus comprising a polymorphism or other mutation associated with an increased or decreased risk for cancer.

54. The method of claim 53 or 54, further comprising, prior to step (b), removing from the library a primer pair that produces a target amplicon overlapping the target amplicon produced by another primer pair.

54. The method of claim 53 or 54, wherein the candidate primers remaining in the library are capable of simultaneously amplifying at least 1,000 different target loci.

54. The method of claim 53 or 54, further comprising:
(e) contacting a nucleic acid sample comprising target loci with the candidate primers remaining in the library to produce a reaction mixture; and
(f) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products comprising target amplicons.

A library of primers that simultaneously amplifies at least 1,000 different target loci such that less than 30% of the amplified products are primer dimers.

A library of primers that simultaneously amplifies at least 1,000 different target loci such that at least 80% of the amplified products are target amplicons.

A library of primers that simultaneously amplifies at least 1,000 different target loci such that at least 80% of the target loci are amplified.

74. The library according to any one of claims 72 to 74, wherein at least 5,000 different target gene sites are amplified.

74. The library according to any one of claims 72 to 74, wherein at least 10,000 different target gene sites are amplified.

74. The library according to any one of claims 72 to 74, wherein at least 20,000 different target gene sites are amplified.

74. The library according to any one of claims 72 to 74, wherein at least 30,000 different target gene sites are amplified.

74. The library according to any one of claims 72 to 74, wherein at least 90% of the amplified product is a target amplicon.

74. The library according to any one of claims 72 to 74, wherein at least 95% of the amplified product is a target amplicon.

74. The library according to any one of claims 72 to 74, wherein at least 99% of the amplified product is a target amplicon.

74. The library according to any one of claims 72 to 74, wherein at least 90% of the target loci are amplified.

74. The library according to any one of claims 72 to 74, wherein at least 95% of the target loci are amplified.

74. The library according to any one of claims 72 to 74, wherein at least 99% of the target loci are amplified.

74. The library according to any one of claims 72 to 74, wherein less than 20% of the amplified products are primer dimers.

74. The library according to any one of claims 72 to 74, wherein less than 10% of the amplified products are primer dimers.

74. The library according to any one of claims 72 to 74, wherein less than 1% of the amplified products are primer dimers.

74. The library according to any one of claims 72 to 74, wherein less than 0.1% of the amplified products are primer dimers.

74. The library of any one of claims 72-74, comprising at least 1,000 primer pairs, each primer pair comprising a forward primer and a reverse primer that hybridize to a target gene locus.

74. The library of any one of claims 72-74, comprising at least 1,000 individual primers that hybridize to different target gene sites.

A kit comprising (i) the library according to any one of claims 72 to 74; and (ii) instructions for using the library to amplify the target loci.

(a) contacting a nucleic acid sample with a library of primers that simultaneously hybridize to at least 1,000 different polymorphic loci to produce a reaction mixture, wherein the nucleic acid sample comprises maternal DNA from the pregnant mother of a fetus and fetal DNA from the fetus;
(b) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products;
(c) measuring the amplified products with a high-throughput sequencer to produce sequencing data;
(d) calculating, on a computer, allele counts at the polymorphic loci based on the sequencing data;
(e) creating, on a computer, a plurality of ploidy hypotheses, each pertaining to a different possible ploidy state of the chromosome;
(f) building, on a computer, a joint distribution model for the expected allele counts at said polymorphic loci on said chromosome for each ploidy hypothesis;
(g) determining, on a computer, the relative probability of each of the ploidy hypotheses using the joint distribution model and the allele counts; and
(h) calling the ploidy state of the fetus by selecting the ploidy state corresponding to the hypothesis with the greatest probability.
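
The computational steps (d) through (h) above amount to scoring competing hypotheses against observed allele counts and picking the most probable one. The toy sketch below is heavily simplified and is not the model described in the specification: it assumes the joint distribution factorizes into independent per-locus binomials, that each hypothesis supplies an expected reference-allele fraction per locus, and that all hypotheses have equal prior probability. The function names and hypothesis labels are hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def call_ploidy(allele_counts, hypotheses):
    """allele_counts: list of (ref_reads, total_reads), one entry per locus.
    hypotheses: dict mapping a hypothesis name to the expected ref-allele
    fraction at each locus. ASSUMPTION: loci are independent, priors equal."""
    log_like = {
        name: sum(log_binom_pmf(k, n, p)
                  for (k, n), p in zip(allele_counts, fractions))
        for name, fractions in hypotheses.items()
    }
    # Normalize in a numerically stable way to get relative probabilities.
    top = max(log_like.values())
    weights = {name: exp(v - top) for name, v in log_like.items()}
    total = sum(weights.values())
    probs = {name: w / total for name, w in weights.items()}
    # Call the hypothesis with the greatest probability.
    return max(probs, key=probs.get), probs
```

In practice the expected allele fractions would depend on parental genotypes and fetal fraction, which this sketch takes as given inputs.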

A method of testing for an abnormal distribution of a chromosome in a sample comprising a mixture of maternal and fetal DNA, the method comprising:
(a) contacting the sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000 different target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes, said plurality of different chromosomes comprising at least one first chromosome suspected of having an abnormal distribution in said sample and at least one second chromosome presumed to be normally distributed in said sample;
(b) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products;
(c) sequencing the amplified products to obtain a plurality of sequence tags aligning to the target loci, wherein the sequence tags are of sufficient length to be assigned to a specific target locus;
(d) assigning, on a computer, the plurality of sequence tags to their corresponding target loci;
(e) determining, on a computer, the number of sequence tags aligning to the target loci of the first chromosome and the number of sequence tags aligning to the target loci of the second chromosome; and
(f) comparing, on a computer, the numbers from step (e) to determine the presence or absence of an abnormal distribution of said first chromosome.
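
Steps (d) through (f) can be illustrated with a short counting sketch. It assumes, for simplicity, that tags are assigned to loci by exact lookup in a dictionary; a real pipeline would use an alignment tool. All names (`chromosome_tag_ratio`, `locus_of_tag`, `chromosome_of_locus`) are hypothetical, and the claim does not specify how the two counts are compared, so the sketch simply returns their ratio.

```python
from collections import Counter


def chromosome_tag_ratio(tags, locus_of_tag, chromosome_of_locus, first, second):
    """Assign sequence tags to loci, count tags per chromosome, and return
    the ratio of counts for the `first` and `second` chromosomes."""
    counts = Counter()
    for tag in tags:
        locus = locus_of_tag.get(tag)  # ASSUMPTION: exact-match assignment
        if locus is not None:
            counts[chromosome_of_locus[locus]] += 1
    if counts[second] == 0:
        raise ValueError("no tags assigned to the reference chromosome")
    return counts[first] / counts[second], counts
```

Whether a given ratio indicates an abnormal distribution depends on the expected ratio and its variance for the assay, which are not modeled here.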

(a) contacting a sample comprising a mixture of maternal and fetal DNA with a library of primers that simultaneously hybridize to at least 1,000 different non-polymorphic target loci to produce a reaction mixture, wherein the target loci are from a plurality of different chromosomes;
(b) subjecting the reaction mixture to primer extension reaction conditions to produce amplified products comprising target amplicons;
(c) quantifying, on a computer, the relative frequency of the target amplicons from first and second chromosomes of interest;
(d) comparing, on a computer, the relative frequencies of the target amplicons from the first and second chromosomes of interest; and
(e) identifying the presence or absence of aneuploidy based on the compared relative frequencies of the first and second chromosomes of interest.

A method of establishing whether an alleged father is the biological father of a fetus gestating in a pregnant mother, the method comprising:
(a) simultaneously amplifying a plurality of polymorphic loci comprising at least 1,000 different polymorphic loci in genetic material from the alleged father to produce a first set of amplified products;
(b) simultaneously amplifying the corresponding plurality of polymorphic loci in a mixed sample of DNA originating from a blood sample of the pregnant mother to produce a second set of amplified products, wherein said mixed sample of DNA comprises fetal DNA and maternal DNA;
(c) determining, on a computer, the probability that the alleged father is the biological father of the fetus using genotypic measurements based on the first and second sets of amplified products; and
(d) establishing whether the alleged father is the biological father of the fetus using the determined probability that the alleged father is the biological father of the fetus.

95. The method of claim 95, further comprising simultaneously amplifying the corresponding plurality of polymorphic loci in genetic material from the mother to produce a third set of amplified products; wherein the probability that the alleged father is the biological father of the fetus is determined using genotypic measurements based on the first, second, and third sets of amplified products.

A method of measuring the amount of two or more target loci in a nucleic acid sample, the method comprising:
(a) amplifying, using PCR, a nucleic acid sample comprising a first standard locus, a second standard locus, a first target locus, and a second target locus to produce amplified products, wherein the first standard locus and the first target locus have the same number of nucleotides but differ in sequence at one or more nucleotides, and wherein the second standard locus and the second target locus have the same number of nucleotides but differ in sequence at one or more nucleotides;
(b) sequencing the amplified products to determine a standard ratio comparing the relative amount of the amplified first standard locus to the amplified second standard locus, wherein the standard ratio indicates the difference in PCR efficiency for amplification of the first standard locus and the second standard locus;
(c) determining a target ratio comparing the relative amount of the amplified first target locus to the amplified second target locus; and
(d) adjusting the target ratio from step (c) based on the standard ratio from step (b) to determine the relative amounts of the first target locus and the second target locus in the sample.
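
The adjustment in step (d) can be read as dividing out the amplification bias measured with the standards. The sketch below assumes that each standard amplifies with the same efficiency as its matched target (plausible because they have equal length and differ at only a few nucleotides) and that the input ratio of the two standards is known. The function and parameter names are hypothetical.

```python
def corrected_target_ratio(std1_reads, std2_reads, tgt1_reads, tgt2_reads,
                           std_input_ratio=1.0):
    """Correct an observed target read ratio for locus-specific PCR bias.

    ASSUMPTION: the standards were added at a known input ratio
    (`std_input_ratio`, 1.0 for equal amounts) and share amplification
    efficiency with their matched targets.
    """
    standard_ratio = std1_reads / std2_reads           # step (b)
    efficiency_bias = standard_ratio / std_input_ratio
    target_ratio = tgt1_reads / tgt2_reads             # step (c)
    return target_ratio / efficiency_bias              # step (d)
```

For example, if the standards were added in equal amounts but yield reads in a 1.2:1 ratio, an observed target ratio of 0.6 corresponds to a corrected ratio of about 0.5.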

97. The method of claim 97, further comprising determining the absolute amounts of the first target locus and the second target locus in the sample.

97. The method of claim 97, comprising determining the presence or absence of a target locus in the sample.

98. The method of claim 97, comprising simultaneously amplifying at least 1,000 different target loci.

98. The method of claim 97, wherein the target loci are present on the same chromosome of interest.

98. The method of claim 97, wherein at least some of the target loci are present on different chromosomes of interest.