WO2016011982A1 - Method and device for determining the proportion of free nucleic acids in a biological sample, and use thereof - Google Patents


Info

Publication number
WO2016011982A1
Authority
WO
WIPO (PCT)
Prior art keywords
ratio
chromosome
predetermined
free
threshold
Application number
PCT/CN2015/085109
Other languages
English (en)
French (fr)
Inventor
蒋馥蔓 (Jiang Fuman)
袁玉英 (Yuan Yuying)
王威 (Wang Wei)
尹烨 (Yin Ye)
Original Assignee
BGI Genomics Co., Ltd. (深圳华大基因股份有限公司)
Application filed by BGI Genomics Co., Ltd.
Priority to AU2015292020A (AU2015292020B2)
Priority to PL15825462T (PL3178941T3)
Priority to RS20220024A (RS62803B1)
Priority to US15/329,148 (US20180082012A1)
Priority to EP15825462.3A (EP3178941B1)
Priority to SG11201700602WA (SG11201700602WA)
Priority to SI201531771T (SI3178941T1)
Priority to CA2956105A (CA2956105C)
Priority to RU2017105504A (RU2699728C2)
Priority to ES15825462T (ES2903103T3)
Priority to BR112017001481-5A (BR112017001481B1)
Priority to KR1020177004842A (KR102018444B1)
Priority to HRP20220045TT (HRP20220045T1)
Priority to DK15825462.3T (DK3178941T3)
Publication of WO2016011982A1 (Chinese)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/10 Nucleic acid folding
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Definitions

  • the present invention relates to the field of biotechnology, and in particular to a method, device and use thereof for determining the proportion of free nucleic acids in a biological sample.
  • the present invention aims to solve at least one of the technical problems existing in the prior art. To this end, it is an object of the present invention to provide a method for accurately and efficiently determining the proportion of free nucleic acids of a predetermined source in a biological sample.
  • the inventors developed a procedure for estimating the proportion of free fetal DNA based on sequencing of maternal plasma. The method is broadly applicable to free DNA of different origins; for example, it can also be used to estimate the proportion of tumor-derived DNA in the peripheral blood of a tumor patient.
  • the invention provides a method of determining the proportion of free nucleic acids of a predetermined source in a biological sample.
  • the method comprises: (1) performing nucleic acid sequencing on a biological sample containing free nucleic acids to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining, based on the sequencing result, the number of nucleic acid molecules in the sample whose length falls within a predetermined range; and (3) determining the proportion of the free nucleic acids of the predetermined source in the biological sample based on the number of nucleic acid molecules whose length falls within the predetermined range.
  • the method of the present invention can accurately and efficiently determine the proportion of free nucleic acids of a predetermined source in a biological sample, and is particularly suitable for determining the proportion of free fetal nucleic acid in the peripheral blood of pregnant women, as well as the proportion of free tumor nucleic acid in the peripheral blood of tumor patients, suspected tumor patients or tumor screening subjects.
  • the biological sample is peripheral blood.
  • the free nucleic acid of the predetermined source is free fetal nucleic acid or maternal-derived free nucleic acid in the peripheral blood of a pregnant woman, or free tumor nucleic acid or free nucleic acid of non-tumor origin in the peripheral blood of a tumor patient, a suspected tumor patient or a tumor screening subject.
  • thereby, the proportion of free fetal nucleic acid in the peripheral blood of a pregnant woman, or the proportion of free tumor nucleic acid in the peripheral blood of a tumor patient, can be readily determined.
  • the nucleic acid is DNA.
  • the sequencing result comprises the length of the free nucleic acid.
  • the sequencing is paired-end sequencing, single-end sequencing or single-molecule sequencing.
  • the length of the free nucleic acid is easily obtained, which is advantageous for the subsequent steps.
  • step (2) further comprises: (2-1) aligning the sequencing result to a reference genome to construct a uniquely aligned sequencing data set, in which each sequencing read matches only one position of the reference genome; (2-2) determining the length of the nucleic acid molecule corresponding to each read in the uniquely aligned sequencing data set; and (2-3) determining the number of nucleic acid molecules whose length falls within the predetermined range.
  • the length of the sequence over which a sequencing read aligns to the reference genome is taken as the length of the nucleic acid molecule corresponding to that read.
  • the sequencing is paired-end sequencing, and step (2-2) comprises: (2-2-1) determining, on the reference genome, the 5' end position of the nucleic acid based on the sequencing data from one end of the paired-end sequencing data; (2-2-2) determining, on the reference genome, the 3' end position of the nucleic acid based on the sequencing data from the other end; and (2-2-3) determining the length of the nucleic acid based on its 5' end position and its 3' end position.
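The length-counting steps (2-1) to (2-3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `ReadPair` record and its fields are hypothetical stand-ins for whatever an aligner reports, uniqueness is judged from a hypothetical hit count, and coordinates are assumed 1-based and inclusive. The 185 to 204 bp default is the fetal range stated in this document.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ReadPair:
    """Hypothetical paired-end alignment record (1-based, inclusive coordinates)."""
    chrom: str
    five_prime: int   # reference position of the fragment's 5' end (from one mate)
    three_prime: int  # reference position of the fragment's 3' end (from the other mate)
    n_hits: int       # number of reference positions this pair aligns to


def fragment_length(pair: ReadPair) -> Optional[int]:
    """Steps (2-2-1) to (2-2-3): length spanned by the 5' and 3' end positions."""
    if pair.three_prime < pair.five_prime:
        return None  # improper orientation; length undefined
    return pair.three_prime - pair.five_prime + 1


def count_in_range(pairs: Iterable[ReadPair], lo: int = 185, hi: int = 204) -> Tuple[int, int]:
    """Steps (2-1) and (2-3): keep uniquely aligned pairs, count lengths in [lo, hi].

    Returns (molecules in range, uniquely aligned molecules with a defined length).
    """
    in_range = 0
    total = 0
    for pair in pairs:
        if pair.n_hits != 1:
            continue  # not part of the uniquely aligned data set
        length = fragment_length(pair)
        if length is None:
            continue
        total += 1
        if lo <= length <= hi:
            in_range += 1
    return in_range, total


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 100, 289, 1),  # 190 bp, unique: counted
        ReadPair("chr1", 500, 665, 1),  # 166 bp, unique: outside the range
        ReadPair("chr2", 10, 200, 3),   # multi-mapped: discarded
    ]
    print(count_in_range(pairs))
```

For single-end or single-molecule data, the length would instead be the aligned sequence length of each read, as described above; returning the unique total alongside the in-range count makes the frequency in step (3-1) a simple division.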
  • the predetermined range is determined based on a plurality of control samples, wherein a proportion of free nucleic acids of a predetermined source in the control sample is known.
  • thereby, the predetermined range so determined is accurate and reliable.
  • the predetermined range is determined based on at least 20 control samples.
  • the predetermined range is determined by: (a) determining the lengths of the free nucleic acid molecules contained in the plurality of control samples; (b) setting a plurality of candidate length ranges and determining, for each control sample, the frequency of free nucleic acid molecules within each candidate length range; (c) determining, based on these frequencies and the known proportion of free nucleic acids of the predetermined source in each control sample, a correlation coefficient between each candidate length range and the proportion of free nucleic acids of the predetermined source; and (d) selecting the candidate length range with the largest correlation coefficient as the predetermined range.
  • the predetermined range can be determined accurately and efficiently.
  • each candidate length range spans 5 to 20 bp.
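Steps (a) to (d) amount to a scan over candidate windows, ranking each by how well its per-sample frequency tracks the known proportions. The sketch below is illustrative only, with several assumptions: each control sample is a plain list of fragment lengths, the "frequency" in a window is the fraction of that sample's molecules falling in it, the correlation is Pearson's r, and the 50 to 300 bp search span is hypothetical. Widths of 5 to 20 bp and a 1 bp step follow the ranges given in this document. Windows are ranked by signed r, reading "largest correlation coefficient" literally.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def window_frequency(lengths: Sequence[int], lo: int, hi: int) -> float:
    """Fraction of one sample's molecules whose length lies in [lo, hi]."""
    if not lengths:
        return 0.0
    return sum(1 for length in lengths if lo <= length <= hi) / len(lengths)


def select_range(
    control_lengths: List[Sequence[int]],
    known_proportions: Sequence[float],
    span: Tuple[int, int] = (50, 300),  # hypothetical search span
    widths: range = range(5, 21),       # candidate range spans 5 to 20 bp
    step: int = 1,                      # step size (1 to 2 bp in the text)
) -> Optional[Tuple[int, int, float]]:
    """Return (lo, hi, r) for the candidate window with the largest Pearson r."""
    best: Optional[Tuple[int, int, float]] = None
    for width in widths:
        for lo in range(span[0], span[1] - width + 2, step):
            hi = lo + width - 1
            freqs = [window_frequency(ls, lo, hi) for ls in control_lengths]
            r = pearson(freqs, known_proportions)
            if math.isnan(r):
                continue  # window carries no information across the controls
            if best is None or r > best[2]:
                best = (lo, hi, r)
    return best
```

If the chosen window is expected to be depleted rather than enriched as the proportion rises, ranking by the absolute value of r would be a natural variant; that choice is an assumption, not something this text specifies.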
  • step (3) further comprises: (3-1) determining, based on the number of nucleic acid molecules whose length falls within the predetermined range, the frequency of free nucleic acid molecules within the predetermined range; and (3-2) determining, from that frequency and according to a predetermined function, the proportion of free nucleic acids of the predetermined source in the biological sample, wherein the predetermined function is determined based on the plurality of control samples.
  • thereby, the proportion of free nucleic acids in the biological sample can be determined effectively, with accurate, reliable and reproducible results.
  • the predetermined function is obtained by: (i) determining, for each of the plurality of control samples, the frequency of free nucleic acid molecules within the predetermined range; and (ii) fitting these frequencies against the known proportions of free nucleic acids of the predetermined source in the control samples to determine the predetermined function.
  • the fit is a linear fit.
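A minimal sketch of the predetermined function as an ordinary least-squares line. It assumes (our choice, not stated in the source) that the known proportion is regressed on the in-range frequency, so the fitted line can be applied directly in step (3-2); `fit_linear` and `predict_proportion` are hypothetical names.

```python
from typing import Sequence, Tuple


def fit_linear(freqs: Sequence[float], proportions: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of proportion = slope * freq + intercept over control samples."""
    n = len(freqs)
    if n < 2 or n != len(proportions):
        raise ValueError("need at least two paired control values")
    mx = sum(freqs) / n
    my = sum(proportions) / n
    sxx = sum((x - mx) ** 2 for x in freqs)
    if sxx == 0:
        raise ValueError("control frequencies have zero variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(freqs, proportions))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_proportion(freq: float, slope: float, intercept: float) -> float:
    """Step (3-2): apply the fitted function to a test sample's in-range frequency."""
    return slope * freq + intercept


if __name__ == "__main__":
    # Hypothetical control data: in-range frequency vs known proportion.
    slope, intercept = fit_linear([0.10, 0.12, 0.14], [0.05, 0.10, 0.15])
    print(slope, intercept, predict_proportion(0.13, slope, intercept))
```

The slope may come out negative if the selected window is depleted in the predetermined-source molecules; the function is used the same way in either case.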
  • the free nucleic acid of the predetermined source is free fetal nucleic acid in the peripheral blood of the pregnant woman, the predetermined range being 185 to 204 bp.
  • the control sample is a maternal peripheral blood sample with a known free fetal nucleic acid proportion.
  • thereby, the predetermined range can be determined accurately.
  • the control sample is a peripheral blood sample from a woman pregnant with a normal male fetus, with a known free fetal nucleic acid proportion determined using the Y chromosome.
  • thereby, the predetermined range can be determined accurately.
  • the free nucleic acid proportion of the control sample is the free fetal DNA proportion, which is estimated using the Y chromosome.
  • thereby, the known free nucleic acid proportions of the control samples can be used effectively to determine the predetermined range, and in turn the number of nucleic acid molecules whose length falls within that range in the test maternal sample and the proportion of free fetal DNA in that sample.
  • the invention also provides an apparatus for determining the proportion of free nucleic acids of a predetermined source in a biological sample.
  • the apparatus includes: a sequencing device for performing nucleic acid sequencing on a biological sample containing free nucleic acids to obtain a sequencing result consisting of a plurality of sequencing data; a counting device, connected to the sequencing device, for determining, based on the sequencing result, the number of nucleic acid molecules in the sample whose length falls within a predetermined range; and a free nucleic acid proportion determining device, connected to the counting device, for determining the proportion of free nucleic acids of the predetermined source in the biological sample based on the number of nucleic acid molecules whose length falls within the predetermined range.
  • the apparatus of the present invention is suitable for performing the method described above for determining the proportion of free nucleic acids of a predetermined source in a biological sample, so that this proportion can be determined accurately and efficiently. It is particularly useful for determining the proportion of free fetal nucleic acid in the peripheral blood of pregnant women, as well as of free tumor nucleic acid in the peripheral blood of tumor patients, suspected tumor patients or tumor screening subjects.
  • the biological sample is peripheral blood.
  • the free nucleic acid of the predetermined source is free fetal nucleic acid or maternal-derived free nucleic acid in the peripheral blood of a pregnant woman, or free tumor nucleic acid or free nucleic acid of non-tumor origin in the peripheral blood of a tumor patient, a suspected tumor patient or a tumor screening subject.
  • thereby, the proportion of free fetal nucleic acid in the peripheral blood of a pregnant woman, or of free tumor nucleic acid in the peripheral blood of a tumor patient, a suspected tumor patient or a tumor screening subject, can be readily determined.
  • the nucleic acid is DNA.
  • the sequencing result comprises the length of the free nucleic acid.
  • the sequencing is paired-end sequencing, single-end sequencing or single-molecule sequencing.
  • the length of the free nucleic acid is easily obtained, which is advantageous for the subsequent steps.
  • the counting device further comprises: an alignment unit for aligning the sequencing result to a reference genome to construct a uniquely aligned sequencing data set, in which each sequencing read matches only one position of the reference genome; a first length determining unit, connected to the alignment unit, for determining the length of the nucleic acid molecule corresponding to each read in the uniquely aligned sequencing data set; and a number determining unit, connected to the first length determining unit, for determining the number of nucleic acid molecules whose length falls within the predetermined range.
  • the first length determining unit takes the length of the sequence over which a sequencing read aligns to the reference genome as the length of the nucleic acid molecule corresponding to that read.
  • the sequencing is paired-end sequencing, and the first length determining unit further comprises: a 5' end position determining module for determining, on the reference genome, the 5' end position of the nucleic acid based on the sequencing data from one end of the paired-end sequencing data; a 3' end position determining module, connected to the 5' end position determining module, for determining, on the reference genome, the 3' end position of the nucleic acid based on the sequencing data from the other end; and a length calculation module, connected to the 3' end position determining module, for determining the length of the nucleic acid based on its 5' end position and its 3' end position.
  • predetermined range determining means for determining the predetermined range based on a plurality of control samples, in which the proportion of free nucleic acids of a predetermined source is known.
  • the predetermined range is determined based on at least 20 control samples.
  • the predetermined range determining device further includes: a second length determining unit for determining the lengths of the free nucleic acid molecules contained in the plurality of control samples; a first ratio determining unit, connected to the second length determining unit, for setting a plurality of candidate length ranges and determining, for each control sample, the frequency of free nucleic acid molecules within each candidate length range; a correlation coefficient determining unit, connected to the first ratio determining unit, for determining, based on these frequencies and the known proportion of free nucleic acids of the predetermined source in each control sample, a correlation coefficient between each candidate length range and the proportion of free nucleic acids of the predetermined source; and a predetermined range determining unit, connected to the correlation coefficient determining unit, for selecting the candidate length range with the largest correlation coefficient as the predetermined range.
  • each candidate length range spans 1 to 20 bp.
  • the plurality of candidate length ranges have a step size of 1 to 2 bp.
  • the free nucleic acid proportion determining device further includes: a second ratio determining unit for determining, based on the number of nucleic acid molecules whose length falls within the predetermined range, the frequency of free nucleic acid molecules within the predetermined range; and a free nucleic acid proportion calculating unit, connected to the second ratio determining unit, for determining, from that frequency and according to a predetermined function, the proportion of free nucleic acids of the predetermined source in the biological sample, wherein the predetermined function is determined based on the plurality of control samples. Thereby, the proportion of free nucleic acids in the biological sample can be determined effectively, with accurate, reliable and reproducible results.
  • the predetermined function determining means comprises: a third ratio determining unit for determining, for each of the plurality of control samples, the frequency of free nucleic acid molecules within the predetermined range; and a fitting unit, connected to the third ratio determining unit, for fitting these frequencies against the known proportions of free nucleic acids of the predetermined source in the control samples to determine the predetermined function.
  • the determined predetermined function is accurate and reliable, facilitating the subsequent steps.
  • the fit is a linear fit.
  • the free nucleic acid of the predetermined source is free fetal nucleic acid in the peripheral blood of the pregnant woman, the predetermined range being 185 to 204 bp.
  • the control sample is a maternal peripheral blood sample with a known free fetal nucleic acid proportion.
  • the control sample is a peripheral blood sample from a woman pregnant with a normal male fetus, with a known free fetal nucleic acid proportion determined using the Y chromosome.
  • thereby, the predetermined range can be determined accurately.
  • the free nucleic acid proportion of the control sample is the free fetal DNA proportion.
  • the free fetal DNA proportion is obtained using a device adapted to estimate it from the Y chromosome.
  • the invention provides a method of determining the sex of twins.
  • the method comprises: (1) performing free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining a first free fetal DNA proportion from the sequencing data using the method for determining the proportion of free nucleic acids in a biological sample described above; (3) determining a second free fetal DNA proportion based on the sequencing data derived from the Y chromosome in the sequencing result; and (4) determining the sex of the twins based on the first free fetal DNA proportion and the second free fetal DNA proportion.
  • the inventors have surprisingly found that the method of the present invention can accurately and efficiently determine the sex of the twins carried by a pregnant woman.
  • the second free fetal DNA ratio is determined based on the following formula:
  • $\mathrm{Fra.chry} = \dfrac{\mathrm{chry.ER\%} - \mathrm{Female.chry.ER\%}}{\mathrm{Man.chry.ER\%} - \mathrm{Female.chry.ER\%}} \times 100\%$
  • Fra.chry represents the second free fetal DNA proportion
  • chry.ER% represents the percentage of sequencing data derived from chromosome Y among all sequencing data in the sequencing result
  • Female.chry.ER% represents the predetermined average, over peripheral blood samples from women pregnant with normal female fetuses, of the percentage of free nucleic acid sequencing data derived from chromosome Y among all sequencing data
  • Man.chry.ER% represents the predetermined average, over peripheral blood samples from normal males, of the percentage of free nucleic acid sequencing data derived from chromosome Y among all sequencing data.
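The Fra.chry formula can be expressed directly in code. The function names and the example baseline values are hypothetical; in practice Female.chry.ER% and Man.chry.ER% come from the predetermined reference cohorts defined above. All inputs stay in percent units, so the result is also a percentage.

```python
def chry_er_percent(chry_reads: int, total_reads: int) -> float:
    """chry.ER%: share of all sequencing data derived from chromosome Y, in percent."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return chry_reads / total_reads * 100.0


def fra_chry(chry_er: float, female_chry_er: float, man_chry_er: float) -> float:
    """Second free fetal DNA proportion (percent) from the Y-chromosome formula."""
    denom = man_chry_er - female_chry_er
    if denom <= 0:
        raise ValueError("Man.chry.ER% must exceed Female.chry.ER%")
    return (chry_er - female_chry_er) / denom * 100.0


if __name__ == "__main__":
    # Hypothetical numbers: 350 chrY reads out of one million total reads,
    # with illustrative female and male baselines of 0.005% and 0.305%.
    er = chry_er_percent(350, 1_000_000)
    print(fra_chry(er, 0.005, 0.305))
```

Subtracting the female baseline removes chrY-like reads seen even without any male DNA, and dividing by the male-minus-female span rescales the result so that an all-male sample would read as 100%.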
  • step (4) comprises: (a) determining the ratio of the second free fetal DNA proportion to the first free fetal DNA proportion; and (b) comparing the ratio obtained in step (a) with a predetermined first threshold and second threshold to determine the sex of the twins. Thereby, the sex of the twins can be determined effectively.
  • the first threshold is determined based on a plurality of reference samples in which the twins are known to both be female, and the second threshold is determined based on a plurality of reference samples in which the twins are known to both be male.
  • a ratio of the second free fetal DNA proportion to the first free fetal DNA proportion that is less than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first threshold or the second threshold, or between the first threshold and the second threshold, indicates that the twins comprise one male and one female.
  • the first threshold is 0.35 and the second threshold is 0.7.
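The threshold comparison in step (4) can be sketched as a small classifier using the 0.35 and 0.7 values given above. Presumably, assuming the twins contribute comparable amounts of free DNA, a single male twin yields a Y-derived proportion around half the total fetal proportion, which is why the middle band maps to one male and one female. The function name is hypothetical, and both proportions must be in the same units (for example, both in percent).

```python
def classify_twin_sex(first_prop: float, second_prop: float,
                      first_threshold: float = 0.35,
                      second_threshold: float = 0.7) -> str:
    """Compare (second / first) free fetal DNA proportion with the two thresholds.

    first_prop: length-based free fetal DNA proportion (first proportion).
    second_prop: Y-chromosome-based proportion, Fra.chry (second proportion).
    """
    if first_prop <= 0:
        raise ValueError("first free fetal DNA proportion must be positive")
    ratio = second_prop / first_prop
    if ratio < first_threshold:
        return "both female"
    if ratio > second_threshold:
        return "both male"
    return "one male, one female"  # both boundaries count as mixed, per the text


if __name__ == "__main__":
    # Hypothetical sample: 10% total fetal DNA, 5% of it Y-derived.
    print(classify_twin_sex(10.0, 5.0))
```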
  • the invention provides a system for determining the sex of twins.
  • the system comprises: a first free fetal DNA proportion determining device, which is the device for determining the proportion of free nucleic acids in a biological sample described above, for performing free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins to obtain a sequencing result consisting of a plurality of sequencing data and determining a first free fetal DNA proportion based on the sequencing data; a second free fetal DNA proportion determining device adapted to determine a second free fetal DNA proportion based on the sequencing data derived from the Y chromosome in the sequencing result; and a sex determining device adapted to determine the sex of the twins based on the first free fetal DNA proportion and the second free fetal DNA proportion.
  • the second free fetal DNA ratio is determined based on the following formula:
  • $\mathrm{Fra.chry} = \dfrac{\mathrm{chry.ER\%} - \mathrm{Female.chry.ER\%}}{\mathrm{Man.chry.ER\%} - \mathrm{Female.chry.ER\%}} \times 100\%$
  • Fra.chry represents the second free fetal DNA proportion
  • chry.ER% represents the percentage of sequencing data derived from chromosome Y among all sequencing data in the sequencing result
  • Female.chry.ER% represents the predetermined average, over peripheral blood samples from women pregnant with normal female fetuses, of the percentage of free nucleic acid sequencing data derived from chromosome Y among all sequencing data
  • Man.chry.ER% represents the predetermined average, over peripheral blood samples from normal males, of the percentage of free nucleic acid sequencing data derived from chromosome Y among all sequencing data.
  • the sex determining device further includes: a ratio determining unit for determining the ratio of the second free fetal DNA proportion to the first free fetal DNA proportion; and a comparison unit for comparing the obtained ratio with a predetermined first threshold and second threshold to determine the sex of the twins. Thereby, the sex of the twins can be determined effectively.
  • the first threshold is determined based on a plurality of reference samples in which the twins are known to both be female, and the second threshold is determined based on a plurality of reference samples in which the twins are known to both be male.
  • a ratio of the second free fetal DNA proportion to the first free fetal DNA proportion that is less than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first threshold or the second threshold, or between the first threshold and the second threshold, indicates that the twins comprise one male and one female.
  • the first threshold is 0.35 and the second threshold is 0.7.
  • the invention provides a method of detecting chromosomal aneuploidy in twins.
  • the method comprises: (1) performing free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining a first free fetal DNA proportion from the sequencing data using the method for determining the proportion of free nucleic acids in a biological sample described above; (3) determining a third free fetal DNA proportion based on the sequencing data derived from a predetermined chromosome in the sequencing result; and (4) determining, based on the first free fetal DNA proportion and the third free fetal DNA proportion, whether the twins are aneuploid for the predetermined chromosome.
  • step (4) comprises: (a) determining the ratio of the third free fetal DNA proportion to the first free fetal DNA proportion; and (b) comparing the ratio obtained in step (a) with a predetermined third threshold and fourth threshold to determine whether the twins are aneuploid for the predetermined chromosome.
  • the third threshold is determined based on a plurality of reference samples in which the twins are known to have no aneuploidy for the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to be aneuploid for the predetermined chromosome.
  • a ratio of the third free fetal DNA proportion to the first free fetal DNA proportion that is less than the third threshold indicates that neither twin is aneuploid for the predetermined chromosome.
  • a ratio greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome, and a ratio equal to the third threshold or the fourth threshold, or between the third threshold and the fourth threshold, indicates that one twin is aneuploid for the predetermined chromosome and the other is not.
  • the third threshold is 0.35 and the fourth threshold is 0.7.
  • the predetermined chromosome is at least one of chromosomes 18, 21 and 23.
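The per-chromosome decision in step (4) can be sketched the same way, looping over several predetermined chromosomes. How the third free fetal DNA proportion is derived from a chromosome's reads is not described in this excerpt, so it is taken here as a precomputed input; the function name and the chromosome labels are hypothetical. The 0.35 and 0.7 defaults are the third and fourth thresholds given above.

```python
from typing import Dict


def classify_twin_aneuploidy(first_prop: float,
                             third_props: Dict[str, float],
                             third_threshold: float = 0.35,
                             fourth_threshold: float = 0.7) -> Dict[str, str]:
    """Per-chromosome call from the ratio (third / first) free fetal DNA proportion.

    first_prop: length-based free fetal DNA proportion.
    third_props: chromosome label -> third free fetal DNA proportion (same units).
    """
    if first_prop <= 0:
        raise ValueError("first free fetal DNA proportion must be positive")
    calls: Dict[str, str] = {}
    for chrom, third_prop in third_props.items():
        ratio = third_prop / first_prop
        if ratio < third_threshold:
            calls[chrom] = "neither twin aneuploid"
        elif ratio > fourth_threshold:
            calls[chrom] = "both twins aneuploid"
        else:
            calls[chrom] = "one twin aneuploid"  # includes both boundaries
    return calls


if __name__ == "__main__":
    # Hypothetical sample: 12% total fetal DNA.
    print(classify_twin_aneuploidy(12.0, {"chr18": 0.6, "chr21": 6.0}))
```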
  • the present invention provides a system for determining chromosomal aneuploidy in twins.
  • the system comprises: a first free fetal DNA proportion determining device, which is the device for determining the proportion of free nucleic acids in a biological sample described above, for performing free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins to obtain a sequencing result consisting of a plurality of sequencing data and determining a first free fetal DNA proportion based on the sequencing data; a third free fetal DNA proportion determining device adapted to determine a third free fetal DNA proportion based on the sequencing data derived from a predetermined chromosome in the sequencing result; and a first aneuploidy determining device adapted to determine whether the twins are aneuploid for the predetermined chromosome based on the first free fetal DNA proportion and the third free fetal DNA proportion.
  • thereby, the third free fetal DNA proportion can be determined accurately.
  • the aneuploidy determining device further includes: a ratio determining unit for determining the ratio of the third free fetal DNA proportion to the first free fetal DNA proportion; and a comparison unit for comparing the obtained ratio with a predetermined third threshold and fourth threshold to determine whether the twins are aneuploid for the predetermined chromosome.
  • the third threshold is determined based on a reference sample of a plurality of known twins for which there is no aneuploidy for the predetermined chromosome, the fourth threshold being based on a plurality of known twins
  • both of which are aneuploid for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio that is less than the third threshold indicates that neither twin is aneuploid for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio that is greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome, and a ratio that is equal to the third threshold or the fourth threshold, or lies between the third threshold and the fourth threshold, indicates that one twin is aneuploid for the predetermined chromosome and the other is not.
  • the third threshold is 0.35 and the fourth threshold is 0.7.
  • the predetermined chromosome is at least one of chromosomes 18, 21 and 23.
  • the invention provides a method of determining chromosomal aneuploidy in twins.
  • the method comprises: (1) performing free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins to obtain a sequencing result composed of a plurality of sequencing data; (2) determining, in the sequencing result, the sequencing data derived from
  • the mean value of the data as a percentage of the total sequencing data, and $\sigma_i$ represents the standard deviation of the sequencing data of the
  • $\text{fra.chrY} = \dfrac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY represents the free fetal DNA ratio and chrY.ER% represents the percentage of total sequencing data accounted for by sequencing data derived from chromosome Y in the sequencing result;
  • Female.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of pregnant women carrying a normal female fetus;
  • Man.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of normal males;
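The Y-chromosome estimate above is a linear interpolation between the female-fetus baseline and the normal-male value: under a simple mixing model, a male fetus contributing a fraction f of the free DNA moves chrY.ER% a fraction f of the way from Female.chrY.ER% toward Man.chrY.ER%. A minimal sketch, with a hypothetical function name and made-up reference values:

```python
def fetal_fraction_chry(chry_er: float, female_chry_er: float,
                        man_chry_er: float) -> float:
    """Hypothetical helper returning fra.chrY in percent.

    chry_er:        chrY.ER%, % of the sample's sequencing data from chromosome Y
    female_chry_er: Female.chrY.ER%, reference mean for female-fetus pregnancies
    man_chry_er:    Man.chrY.ER%, reference mean for normal males
    """
    span = man_chry_er - female_chry_er
    if span <= 0:
        raise ValueError("Man.chrY.ER% must exceed Female.chrY.ER%")
    return (chry_er - female_chry_er) / span * 100.0


# Made-up reference values, chosen only to keep the arithmetic easy to follow.
print(fetal_fraction_chry(chry_er=0.03, female_chry_er=0.01, man_chry_er=0.21))
```

With these inputs the result is 10 % (up to floating-point rounding). The estimate is only meaningful for male fetuses; for a female fetus chrY.ER% stays near the baseline and the result is close to 0.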
  • both twins are determined to be trisomic; when the sample to be tested falls in the second quadrant, one twin is determined to be trisomic and the other normal; when the sample to be tested falls in the third quadrant, both twins are determined to be normal; when the sample to be tested falls in the fourth quadrant, it is determined to be a sample with a relatively low twin fetal concentration, and the result is not used.
  • the inventors have surprisingly found that the method of the present invention for determining chromosomal aneuploidy in twins can accurately and efficiently detect chromosomal aneuploidy in twins carried by pregnant women and determine whether the twins are aneuploid for the predetermined chromosome.
  • the invention also provides a system for determining chromosomal aneuploidy in twins.
  • the system comprises: an $x_i$ value determining device, adapted to perform free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins to obtain a sequencing result composed of a plurality of sequencing data, and to determine $x_i$, the ratio of the number of sequencing data derived from chromosome i in the sequencing result to the total number of sequencing data, where i represents the chromosome number and is any integer from 1 to 22;
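Computing $x_i$ is a counting step. The sketch below assumes each sequencing datum has already been assigned to one chromosome, labelled "chr1" to "chr22", "chrX", "chrY"; the labels and the function name are hypothetical. Reads on the sex chromosomes count toward the total but receive no $x_i$ entry, matching the 1 to 22 range above.

```python
from collections import Counter
from typing import Dict, Iterable


def chromosome_fractions(read_chroms: Iterable[str]) -> Dict[int, float]:
    """Return x_i for autosomes i = 1..22 (hypothetical helper).

    read_chroms: one chromosome label per sequencing datum.
    """
    labels = list(read_chroms)
    total = len(labels)
    if total == 0:
        raise ValueError("no sequencing data")
    counts = Counter(labels)  # missing labels count as 0
    return {i: counts[f"chr{i}"] / total for i in range(1, 23)}


reads = ["chr1", "chr1", "chr21", "chrX"]  # toy input
x = chromosome_fractions(reads)
print(x[1], x[21], x[2])  # -> 0.5 0.25 0.0
```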
  • $\text{fra.chrY} = \dfrac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY represents the second free fetal DNA ratio; chrY.ER% represents the percentage of total sequencing data accounted for by sequencing data derived from chromosome Y in the sequencing result; Female.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of pregnant women carrying normal fetuses;
  • Man.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of normal males.
  • both twins are determined to be normal; when the sample to be tested falls in the fourth quadrant, it is determined to be a sample with a relatively low twin fetal concentration, and the result is not used.
  • the inventors have found that the system of the present invention for determining chromosomal aneuploidy in twins can accurately and efficiently detect chromosomal aneuploidy in twins carried by pregnant women and determine whether the twins are aneuploid for the predetermined chromosome.
  • the invention provides a method of detecting a fetal chimera.
  • the method comprises: (1) performing free nucleic acid sequencing on the peripheral blood of a woman pregnant with a fetus to obtain a sequencing result composed of a plurality of sequencing data, where the fetus is optionally a male fetus; (2) based on the sequencing data, determining the first free fetal DNA ratio according to the method described above, or estimating the fetal concentration fra.chrY% from the Y chromosome, as the first free fetal DNA ratio, using the following formula:
  • $\text{fra.chrY} = \dfrac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY represents the first free fetal DNA ratio; chrY.ER% represents the percentage of total sequencing data accounted for by sequencing data derived from chromosome Y in the sequencing result; Female.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of pregnant women carrying normal fetuses; and Man.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of normal males; (3) determining a third free fetal DNA ratio based on the sequencing data derived from a predetermined chromosome in the sequencing result; and (4) determining, based on the ratio between the first free fetal DNA ratio and the third free fetal DNA ratio, whether the fetus has a chimera for the predetermined chromosome. Thereby, whether a chimera of a specific chromosome is present in the fetus can be accurately analyzed.
  • the method may further have the following additional technical features:
  • the third free fetal DNA ratio is determined based on the following formula:
  • fra.chri represents the third free fetal DNA ratio, and i is the number of the predetermined chromosome, any integer from 1 to 22;
  • chri.ER% represents the percentage of total sequencing data accounted for by sequencing data derived from the predetermined chromosome in the sequencing result;
  • adjust.chri.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from the predetermined chromosome in peripheral blood samples of pregnant women carrying a normal fetus.
  • the method includes:
  • step (b) comparing the ratio obtained in step (a) with a predetermined plurality of thresholds to determine whether the fetus has a chimera for the predetermined chromosome.
  • the predetermined plurality of thresholds comprise at least one selected from the group consisting of:
  • a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
  • an eighth threshold, determined based on a plurality of reference samples known to be monosomic for the predetermined chromosome;
  • a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome; and
  • a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio that is less than the seventh threshold indicates that the fetus is completely monosomic for the predetermined chromosome;
  • a ratio that is not less than the seventh threshold and not greater than the eighth threshold indicates that the fetus is a monosomy chimera for the predetermined chromosome;
  • a ratio that is greater than the eighth threshold and less than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
  • a ratio that is not less than the ninth threshold and not greater than the tenth threshold indicates that the fetus is a trisomy chimera for the predetermined chromosome; and
  • a ratio that is greater than the tenth threshold indicates that the fetus is completely trisomic for the predetermined chromosome.
  • the seventh threshold is at least -1 and less than 0, and is optionally -0.85;
  • the eighth threshold is greater than the seventh threshold and less than 0, and is optionally -0.3;
  • the ninth threshold is greater than 0 and less than 1, and is optionally 0.3;
  • the tenth threshold is greater than the ninth threshold and less than 1, and is optionally 0.85. Thereby, the efficiency of analyzing whether a chimera of a specific chromosome is present in the fetus can be further improved.
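The five-way classification above, with its mix of strict and inclusive bounds, can be sketched as a chain of comparisons. The helper name is hypothetical and the defaults are the optional example values (-0.85, -0.3, 0.3, 0.85); the two input ratios are assumed to be already computed on the same scale.

```python
def classify_chimera(fra_third: float, fra_first: float,
                     t7: float = -0.85, t8: float = -0.3,
                     t9: float = 0.3, t10: float = 0.85) -> str:
    """Hypothetical helper mapping fra_third / fra_first onto the five
    categories defined by the seventh to tenth thresholds."""
    if fra_first <= 0:
        raise ValueError("the first free fetal DNA ratio must be positive")
    if not (t7 < t8 < t9 < t10):
        raise ValueError("thresholds must be strictly increasing")
    r = fra_third / fra_first
    if r < t7:
        return "complete monosomy"
    if r <= t8:                 # t7 <= r <= t8
        return "monosomy chimera"
    if r < t9:                  # t8 < r < t9
        return "normal"
    if r <= t10:                # t9 <= r <= t10
        return "trisomy chimera"
    return "complete trisomy"   # r > t10


print(classify_chimera(0.1, 0.2))  # ratio 0.5 -> trisomy chimera
```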
  • the invention provides a system for detecting a fetal chimera.
  • the system comprises:
  • a first free fetal DNA ratio determining device, which is the device described above for determining the ratio of free nucleic acids in a biological sample and is adapted to perform nucleic acid sequencing on the peripheral blood of a woman pregnant with a fetus to obtain a sequencing result composed of a plurality of sequencing data, and to determine the first free fetal DNA ratio based on the sequencing data; alternatively, the first free fetal DNA ratio determining device is adapted to estimate the fetal concentration fra.chrY% from the Y chromosome, as the first free fetal DNA ratio, using the following formula:
  • $\text{fra.chrY} = \dfrac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY represents the first free fetal DNA ratio; chrY.ER% represents the percentage of total sequencing data accounted for by sequencing data derived from chromosome Y in the sequencing result; Female.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of pregnant women carrying normal fetuses;
  • Man.chrY.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from chromosome Y in peripheral blood samples of normal males.
  • a third free fetal DNA ratio determining device, adapted to determine a third free fetal DNA ratio based on the sequencing data derived from a predetermined chromosome in the sequencing result; and a chimera determining device,
  • the chimera determining device being adapted to determine, based on the ratio between the first free fetal DNA ratio and the third free fetal DNA ratio, whether the fetus has a chimera for the predetermined chromosome.
  • the above system for detecting a fetal chimera can effectively implement the method for detecting a fetal chimera described above, thereby enabling effective analysis of fetal chimeras.
  • the above described system for determining a fetal chimera may also have the following additional technical features:
  • the third free fetal DNA ratio is determined based on the following formula:
  • fra.chri represents the third free fetal DNA ratio, and i is the number of the predetermined chromosome, any integer from 1 to 22;
  • chri.ER% represents the percentage of total sequencing data accounted for by sequencing data derived from the predetermined chromosome in the sequencing result;
  • adjust.chri.ER% represents the predetermined average percentage of total sequencing data accounted for by sequencing data of free nucleic acid derived from the predetermined chromosome in peripheral blood samples of pregnant women carrying a normal fetus.
  • the method includes:
  • step (b) comparing the ratio obtained in step (a) with a predetermined plurality of thresholds to determine whether the fetus has a chimera for the predetermined chromosome.
  • the predetermined plurality of thresholds comprise at least one selected from the group consisting of:
  • a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
  • an eighth threshold, determined based on a plurality of reference samples known to be monosomic for the predetermined chromosome;
  • a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome; and
  • a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio that is less than the seventh threshold indicates that the fetus is completely monosomic for the predetermined chromosome;
  • a ratio that is not less than the seventh threshold and not greater than the eighth threshold indicates that the fetus is a monosomy chimera for the predetermined chromosome;
  • a ratio that is greater than the eighth threshold and less than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
  • a ratio that is not less than the ninth threshold and not greater than the tenth threshold indicates that the fetus is a trisomy chimera for the predetermined chromosome; and
  • a ratio that is greater than the tenth threshold indicates that the fetus is completely trisomic for the predetermined chromosome.
  • the seventh threshold is greater than -1 and less than 0, and is optionally -0.85;
  • the eighth threshold is greater than the seventh threshold and less than 0, and is optionally -0.3;
  • the ninth threshold is greater than 0 and less than 1, and is optionally 0.3;
  • the tenth threshold is greater than the ninth threshold and less than 1, and is optionally 0.85. Thereby, the efficiency of analyzing whether a chimera of a specific chromosome is present in the fetus can be further improved.
  • the invention provides a method of detecting a fetal chimera; according to an embodiment of the invention, the method comprises:
  • $T_i = (x_i - \mu_i)/\sigma_i$, where i represents the chromosome number and is any integer from 1 to 22, $\mu_i$ represents the mean percentage of total sequencing data accounted for by sequencing data of the i-th chromosome in the samples selected as the reference set in the reference database, and $\sigma_i$ represents the standard deviation of that percentage;
  • $T2_i = \left(x_i - \mu_i (1 + \text{fra}/2)\right)/\sigma_i$;
  • $d(T_i, a)$ and $d(T2_i, a)$ are values of the probability density function of the t distribution, where a is the number of degrees of freedom, and fra is the free fetal DNA ratio determined by the method described above for determining the ratio of free nucleic acids in a biological sample;
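The two scores and their t densities can be sketched as below. $T_i$ measures the deviation from the normal reference; $T2_i$ measures the deviation from the value expected for a fetal trisomy, since a third copy in the fetal fraction raises the expected share to $\mu_i(1 + \text{fra}/2)$. Assumptions of this sketch: fra is a fraction (0.1 for 10 %), the helper names are hypothetical, and the t density is written out from its standard formula with `math.lgamma` rather than taken from a statistics library. How the two densities are combined into the L value is not given in this section, so the sketch stops at the densities.

```python
import math
from typing import Tuple


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def chimera_scores(x_i: float, mu_i: float, sigma_i: float,
                   fra: float, df: float) -> Tuple[float, float, float, float]:
    """Return T_i, T2_i, d(T_i, a) and d(T2_i, a) for one chromosome."""
    if sigma_i <= 0:
        raise ValueError("sigma_i must be positive")
    t = (x_i - mu_i) / sigma_i                    # distance from the normal reference
    t2 = (x_i - mu_i * (1 + fra / 2)) / sigma_i   # distance from the trisomy expectation
    return t, t2, t_pdf(t, df), t_pdf(t2, df)


# Made-up values: x_i sits between the normal and trisomy expectations.
print(chimera_scores(x_i=0.0135, mu_i=0.013, sigma_i=0.0001, fra=0.1, df=10))
```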
  • the fetus is determined to be completely monosomic or a monosomy chimera for the predetermined chromosome
  • the fetus is determined to be a monosomy chimera for the predetermined chromosome
  • the sample is determined to have a relatively low fetal concentration, and the result is not used.
  • T is greater than 0
  • a four-quadrant map is obtained according to the T value and the L value
  • the abscissa is the L value
  • the ordinate is the T value
  • the straight line T = the predetermined thirteenth threshold
  • the straight line L = the predetermined fourteenth threshold
  • the fetus is determined to be completely trisomic or a trisomy chimera for the predetermined chromosome
  • the fetus is determined to be a trisomy chimera for the predetermined chromosome
  • the sample is determined to have a relatively low fetal concentration, and the result is not used.
  • the eleventh threshold and the thirteenth threshold are each independently 3, and the twelfth threshold and the fourteenth threshold are each independently 1.
  • FIG. 1 is a schematic flow diagram of a method of determining a ratio of free nucleic acids in a biological sample, in accordance with one embodiment of the present invention
  • FIG. 2 is a schematic flow diagram of a method of determining the number of nucleic acid molecules falling within a predetermined range, in accordance with one embodiment of the present invention
  • FIG. 3 is a schematic flow diagram of a method of determining the length of a nucleic acid molecule, in accordance with one embodiment of the present invention.
  • FIG. 4 is a flow chart showing a method of determining a predetermined range according to an embodiment of the present invention
  • FIG. 5 is a schematic flow diagram of a method of determining a ratio of free nucleic acids of a predetermined source, in accordance with one embodiment of the present invention
  • FIG. 6 is a flow chart showing a method of determining a predetermined function according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an apparatus for determining a ratio of free nucleic acids of a predetermined source in a biological sample, according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a counting device according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a first length determining unit according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a predetermined range determining device according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural view of a free nucleic acid ratio determining device according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a predetermined function determining apparatus according to an embodiment of the present invention.
  • FIG. 13 shows, according to one embodiment of the present invention, a linear fit between the free fetal DNA ratio estimated from the Y chromosome in 37 samples from women pregnant with normal male fetuses and the frequency of DNA molecules with fragment lengths from 185 bp to 204 bp;
  • FIGS. 14-16 are four-quadrant plots of T values and L values for 11 samples to be tested, according to one embodiment of the present invention.
  • the invention provides a method of determining the proportion of free nucleic acids of a predetermined source in a biological sample.
  • the inventors have surprisingly found that the method of the present invention enables accurate and efficient determination of the proportion of free nucleic acids in a biological sample, and is particularly useful for determining the proportion of fetal nucleic acid in the peripheral blood of pregnant women and the proportion of tumor nucleic acid in the peripheral blood of tumor patients.
  • the expression “the ratio of free nucleic acids of a predetermined source in a biological sample” as used herein refers to the ratio of the number of free nucleic acid molecules of a specific source to the total number of free nucleic acid molecules in a biological sample.
  • the “proportion of free nucleic acid of a predetermined source in the biological sample” is the free fetal nucleic acid ratio, i.e., the ratio of the number of free fetal nucleic acid molecules contained in the peripheral blood of the pregnant woman to the total number of free nucleic acid molecules, which is sometimes referred to as “the concentration of free fetal DNA in the peripheral blood of pregnant women” or the free fetal DNA ratio.
  • the biological sample is peripheral blood of a tumor patient, a suspected tumor patient, or a person undergoing tumor screening
  • the predetermined source of free nucleic acid is free tumor nucleic acid
  • the proportion of free nucleic acid derived from the predetermined source in the biological sample, i.e., the free tumor nucleic acid ratio, represents the ratio of the number of free tumor nucleic acid molecules contained in the peripheral blood of the tumor patient, suspected tumor patient, or tumor screening subject to the total number of free nucleic acid molecules.
  • the method includes:
  • the biological sample containing the free nucleic acid is subjected to nucleic acid sequencing to obtain a sequencing result composed of a plurality of sequencing data.
  • the biological sample is peripheral blood.
  • the free nucleic acid of the predetermined source is free fetal nucleic acid in the peripheral blood of the pregnant woman, or free tumor nucleic acid.
  • the ratio of free fetal nucleic acid in the peripheral blood of the pregnant woman, or the ratio of free tumor nucleic acid in the peripheral blood of the tumor patient, the suspected tumor patient or the tumor screening person can be easily determined.
  • the nucleic acid is DNA. It should be noted that the term "sequencing data" as used herein refers to the sequence reads corresponding to the sequenced nucleic acid molecules.
  • the sequencing result comprises the length of the free nucleic acid.
  • the sequencing is paired-end sequencing, single-end sequencing, or single-molecule sequencing.
  • the length of the free nucleic acid is easily obtained, which is advantageous for the subsequent steps.
  • the number of nucleic acid molecules whose length falls within a predetermined range in the sample is determined.
  • length refers to the length of a nucleic acid molecule (reads), which can be expressed in units of base pairs, that is, bp.
  • S200 further includes:
  • S210 Align the sequencing result with a reference genome. Specifically, the sequencing result is aligned with a reference genome to construct a uniquely aligned sequencing data set, in which each sequencing datum matches only one position of the reference genome, preferably with no mismatch, or with at most 1 or at most 2 mismatches.
  • S220 Determine the length of the nucleic acid molecule. Specifically, the length of the nucleic acid molecule corresponding to each sequencing data in the unique alignment sequencing data set is determined.
  • S230 Determine the number of nucleic acid molecules falling within a predetermined range. Specifically, the number of nucleic acid molecules whose length falls within the predetermined range is determined.
  • the number of nucleic acid molecules whose length falls within a predetermined range in the sample can be easily determined, and the result is accurate and reliable, and the repeatability is good.
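Steps S210 to S230 amount to a filter followed by a count. The sketch below assumes alignment has already been done by some aligner and summarised per sequencing datum; the `AlignedRead` record, its fields, and `count_in_range` are hypothetical, the range bounds are taken as inclusive, and 185 to 204 bp is used only because it is the range given later for free fetal DNA.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AlignedRead:
    """Hypothetical summary of one sequencing datum after alignment."""
    length: int       # nucleic acid molecule length in bp (S220)
    n_hits: int       # number of reference positions it aligns to
    mismatches: int   # mismatches in its alignment


def count_in_range(reads: Iterable[AlignedRead], lo: int, hi: int,
                   max_mismatches: int = 2) -> int:
    """S210-S230: keep uniquely aligned reads, then count lengths in [lo, hi]."""
    unique = (r for r in reads if r.n_hits == 1 and r.mismatches <= max_mismatches)
    return sum(1 for r in unique if lo <= r.length <= hi)


reads = [
    AlignedRead(190, 1, 0),  # kept, in range
    AlignedRead(200, 2, 0),  # multi-mapped: dropped in S210
    AlignedRead(204, 1, 3),  # too many mismatches: dropped in S210
    AlignedRead(185, 1, 1),  # kept, in range (bounds are inclusive)
    AlignedRead(166, 1, 0),  # kept, outside the range
]
print(count_in_range(reads, 185, 204))  # -> 2
```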
  • in step S220, the length of the sequence in the sequencing data that matches the reference genome is taken as the length of the nucleic acid molecule corresponding to the sequencing data.
  • step S220 comprises:
  • S2210 Determine the 5' end position. Specifically, based on the sequencing data from one end of the paired-end sequencing data, the 5' end position of the nucleic acid on the reference genome is determined.
  • S2220 Determine the 3' end position. Specifically, based on the sequencing data from the other end of the paired-end sequencing data, the 3' end position of the nucleic acid on the reference genome is determined.
  • S2230 Determine the length of the nucleic acid. Specifically, the length of the nucleic acid is determined based on the 5' end position of the nucleic acid and the 3' end position of the nucleic acid.
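For one read pair, S2210 to S2230 reduce to taking the outermost aligned coordinates. This sketch assumes both mates are properly paired on the same chromosome and that positions are 1-based, inclusive, and on the reference forward strand; the function name is hypothetical, and it takes plain integers rather than records from any particular alignment file format.

```python
def fragment_length(mate1_start: int, mate1_end: int,
                    mate2_start: int, mate2_end: int) -> int:
    """S2210-S2230 for one read pair (hypothetical helper).

    The 5' end is taken as the leftmost aligned base of the pair and the
    3' end as the rightmost, both on the reference forward strand.
    """
    five_prime = min(mate1_start, mate2_start)   # S2210
    three_prime = max(mate1_end, mate2_end)      # S2220
    if three_prime < five_prime:
        raise ValueError("end position precedes start position")
    return three_prime - five_prime + 1          # S2230, inclusive coordinates


print(fragment_length(1000, 1099, 1090, 1189))  # -> 190
```

Because both mates are considered symmetrically, the result does not depend on which mate is labelled first.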
  • the ratio of the free nucleic acids in the biological sample is determined based on the number of nucleic acid molecules whose length falls within a predetermined range.
  • the method of the present invention further includes a step S400 of determining a predetermined range (not shown).
  • the predetermined range is determined based on a plurality of control samples, wherein the proportion of free nucleic acids in the control sample is known.
  • the determined predetermined range results are accurate and reliable.
  • the predetermined range is determined based on at least 20 control samples.
  • the step S400 of determining a predetermined range includes:
  • S410 Determine the length of the free nucleic acid molecule in the plurality of control samples. Specifically, the length of the free nucleic acid molecule contained in the plurality of control samples is determined.
  • S420 Determine the frequency of occurrence of free nucleic acids at each candidate length. Specifically, a plurality of candidate length ranges are set, and the frequencies at which the plurality of control samples exhibit free nucleic acid molecules within each candidate length range are determined, respectively.
  • S430 Determine a correlation coefficient. Specifically, a correlation coefficient between each candidate length range and the fetal nucleic acid ratio is determined based on the frequencies of free nucleic acid molecules within that candidate length range in the plurality of control samples and the fetal nucleic acid ratios of the control samples.
  • S440 Select a predetermined range. Specifically, a candidate length range in which the correlation coefficient is the largest is selected as the predetermined range.
  • the predetermined range can be determined accurately and efficiently.
  • each candidate length range spans 1 to 20 bp.
  • the plurality of candidate length ranges have a step size of 1 to 2 bp.
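Steps S420 to S440 can be sketched as: compute each control sample's in-range frequency for every candidate range, correlate those frequencies with the known fetal ratios, and keep the best range. Assumptions of this sketch: Pearson correlation is used, ranges are inclusive, and candidates are ranked by |r|. The text only says "the largest" correlation coefficient; since a length range can track the fetal ratio through a negative correlation (fetal DNA is relatively short), ranking by magnitude is a choice made here. All names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Range = Tuple[int, int]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def frequency(lengths: Sequence[int], rng: Range) -> float:
    """S420: share of molecules whose length lies in the inclusive range rng."""
    lo, hi = rng
    return sum(lo <= length <= hi for length in lengths) / len(lengths)


def select_range(control_lengths: List[Sequence[int]],
                 fetal_ratios: Sequence[float],
                 candidates: Sequence[Range]) -> Tuple[Range, float]:
    """S430-S440: pick the candidate range whose frequency best tracks the known ratio."""
    best, best_r = candidates[0], 0.0
    for rng in candidates:
        freqs = [frequency(ls, rng) for ls in control_lengths]
        r = pearson(freqs, fetal_ratios)
        if abs(r) > abs(best_r):   # ranking by |r| is an assumption of this sketch
            best, best_r = rng, r
    return best, best_r


# Toy controls: fragment lengths per sample and their known fetal ratios.
controls = [[190, 190, 190, 100], [190, 190, 100, 100], [190, 100, 100, 100]]
print(select_range(controls, [0.05, 0.10, 0.15], [(1, 50), (185, 204)]))
```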
  • step S300 further includes:
  • S310 Determine the frequency of occurrence of free nucleic acid molecules within a predetermined range. Specifically, the frequency at which the free nucleic acid molecule occurs within the predetermined range is determined based on the number of nucleic acid molecules whose length falls within a predetermined range.
  • S320 Determine the ratio of free nucleic acids. Specifically, the ratio of free nucleic acids in the biological sample is determined according to a predetermined function based on a frequency at which the free nucleic acid molecules occur within the predetermined range, wherein the predetermined function is determined based on the plurality of control samples.
  • the ratio of the free nucleic acid in the biological sample can be effectively determined, and the result is accurate and reliable, and the repeatability is good.
  • the method of the invention further comprises the step S500 of determining a predetermined function (not shown).
  • the step S500 of determining a predetermined function includes:
  • S510 Determine a frequency of occurrence of free nucleic acid within a predetermined range of the control sample. Specifically, in the plurality of control samples, respectively, the frequency at which the free nucleic acid molecules occur within the predetermined range is determined.
  • S520 Fit. Specifically, the frequency of occurrence of free nucleic acid molecules within the predetermined range in the plurality of control samples is fitted to a known ratio of free nucleic acids to determine the predetermined function.
  • the determined predetermined function is accurate and reliable, facilitating the subsequent steps.
  • the fit is a linear fit.
  • the free nucleic acid of the predetermined source is free fetal nucleic acid in the peripheral blood of the pregnant woman, the predetermined range being 185 to 204 bp.
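With a linear fit, S520 fixes a slope and intercept from the control samples, and S310 to S320 apply them to a test sample's in-range frequency. A minimal ordinary-least-squares sketch; the function names and all numbers are hypothetical and are not the fit shown in FIG. 13.

```python
from typing import Sequence, Tuple


def fit_linear(freqs: Sequence[float],
               known_ratios: Sequence[float]) -> Tuple[float, float]:
    """S520: least-squares fit of known ratio on in-range frequency."""
    n = len(freqs)
    if n < 2 or n != len(known_ratios):
        raise ValueError("need at least two paired control samples")
    mx = sum(freqs) / n
    my = sum(known_ratios) / n
    sxx = sum((x - mx) ** 2 for x in freqs)
    if sxx == 0:
        raise ValueError("frequencies have zero variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(freqs, known_ratios)) / sxx
    return slope, my - slope * mx


def predict_ratio(freq: float, slope: float, intercept: float) -> float:
    """S320: apply the predetermined linear function to a test sample."""
    return slope * freq + intercept


# Made-up controls: frequency of 185-204 bp molecules vs known fetal ratio.
slope, intercept = fit_linear([0.75, 0.50, 0.25], [0.05, 0.10, 0.15])
print(predict_ratio(0.40, slope, intercept))
```

Here the fitted slope is -0.2 and the intercept 0.2, so a test frequency of 0.40 maps to a ratio of about 0.12.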
  • the ratio of the free nucleic acid in the biological sample can be effectively determined, and the result is accurate, reliable, and reproducible.
  • the expression "the frequency of occurrence of free nucleic acid molecules within the predetermined range" as used herein means the proportion of free nucleic acid molecules whose lengths fall within the predetermined range relative to the total number of nucleic acid molecules in the biological sample.
  • the control sample is a maternal peripheral blood sample with a known free fetal nucleic acid ratio.
  • thereby, the predetermined range can be determined accurately.
  • the control sample is a peripheral blood sample from a woman pregnant with a normal male fetus, with a known free fetal nucleic acid ratio determined using the Y chromosome.
  • thereby, the predetermined range can be determined accurately.
  • the control sample is a peripheral blood sample from a woman pregnant with a normal fetus.
  • thereby, the predetermined range can be determined accurately.
  • the free nucleic acid ratio of the control sample is the ratio of free fetal DNA, and the ratio of free fetal DNA is estimated using the Y chromosome.
  • the ratio of the free nucleic acid of the control sample can be effectively utilized to determine the predetermined range, thereby determining the number of nucleic acid molecules whose length falls within a predetermined range in the sample of the pregnant woman to be tested and the proportion of free fetal DNA in the sample of the pregnant woman to be tested.
  • the method of the present invention may further include the following steps:
  • WGS (whole-genome sequencing): the samples to be tested are subjected to whole-genome sequencing on a high-throughput platform.
  • free fetal DNA in plasma is relatively short (less than 300 bp). Because the length of every free DNA molecule must be obtained, either single-end sequencing must read through the entire free DNA molecule, or double-end (paired-end) sequencing is used.
  • in step 4), a functional relationship is established between the frequency of DNA molecules in one or more strongly correlated length regions and the known free fetal DNA ratio.
  • step 4) specifically includes the following steps:
  • a reference sample, i.e. a sample with a known free fetal DNA ratio, is selected.
  • the distribution of DNA molecules is counted in sliding windows. For example, with a 5 bp window and a 2 bp step, the counts are taken in [1 bp, 5 bp], [3 bp, 7 bp], [5 bp, 9 bp], [7 bp, 11 bp], ...; with a 5 bp window and a 5 bp step, they are taken in [1 bp, 5 bp], [6 bp, 10 bp], [11 bp, 15 bp], ....
  • the total number of molecules means the total number of DNA molecules of all lengths.
  • find the window(s) in which the frequency of DNA molecules is most strongly correlated with the known free fetal DNA ratio of the reference samples.
  • establish the functional relationship from a single window or a combination of windows: select one strongly correlated window, or combine windows, i.e. select one or more strongly correlated length regions.
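Choosing the most strongly correlated window can be sketched with a Pearson correlation across reference samples. This is an illustrative sketch only: it assumes each reference sample's window frequencies are held in a dict keyed by (start, end), such as the output of a windowing step, and it ranks windows by the magnitude |r| so that a strong negative correlation also counts. That ranking rule is an assumption; the apparatus description selects the "largest correlation coefficient", so drop `abs()` if the literal reading is intended. Names are hypothetical.

```python
import math
from collections.abc import Mapping, Sequence

Window = tuple[int, int]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_window(
    per_sample: Sequence[Mapping[Window, float]], known_ratios: Sequence[float]
) -> tuple[Window, float]:
    """Return the window whose frequencies correlate most strongly (by |r|) with the known ratios."""
    scores = {
        w: pearson([sample[w] for sample in per_sample], known_ratios)
        for w in per_sample[0]
    }
    best = max(scores, key=lambda w: abs(scores[w]))
    return best, scores[best]
```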
  • the invention also provides an apparatus for determining the proportion of free nucleic acids of a predetermined source in a biological sample.
  • the apparatus of the present invention is suitable for performing the method described above for determining the proportion of free nucleic acids of a predetermined source in a biological sample, so that this proportion can be determined accurately and efficiently using the apparatus.
  • the apparatus is particularly useful for determining the proportion of free fetal nucleic acid in the peripheral blood of pregnant women, and the proportion of free tumor nucleic acid in the peripheral blood of tumor patients, suspected tumor patients or persons undergoing tumor screening.
  • the apparatus includes: a sequencing apparatus 100, a counting apparatus 200, and a free nucleic acid ratio determining apparatus 300.
  • the sequencing device 100 is configured to perform nucleic acid sequencing on a biological sample containing free nucleic acid to obtain sequencing results of a plurality of sequencing data;
  • the counting device 200 is connected to the sequencing device 100 and is configured to determine, based on the sequencing result, the number of nucleic acid molecules whose length falls within a predetermined range;
  • the free nucleic acid ratio determining device 300 is connected to the counting device 200 and is configured to determine the proportion of free nucleic acids of the predetermined source in the biological sample based on the number of nucleic acid molecules whose length falls within the predetermined range.
  • the kind of the biological sample is not particularly limited.
  • the biological sample is peripheral blood.
  • the free nucleic acid is free fetal nucleic acid or maternal-derived free nucleic acid in the peripheral blood of the pregnant woman, or free tumor nucleic acid or non-tumor-derived free nucleic acid in the peripheral blood of the tumor patient.
  • thereby, the ratio of free fetal nucleic acid in the peripheral blood of a pregnant woman, or the ratio of free tumor nucleic acid in the peripheral blood of a tumor patient, a suspected tumor patient or a person undergoing tumor screening, can be determined easily.
  • the nucleic acid is DNA.
  • the sequencing result comprises the length of the free nucleic acid.
  • the sequencing is double-end sequencing, single-end sequencing or single molecule sequencing.
  • the length of the free nucleic acid is easily obtained, which is advantageous for the subsequent steps.
  • the counting device 200 further includes a comparison unit 210, a first length determining unit 220, and a number determining unit 230.
  • the comparison unit 210 is configured to align the sequencing result to a reference genome to construct a uniquely aligned sequencing data set, in which each sequencing data item matches only one position on the reference genome;
  • the first length determining unit 220 is connected to the comparing unit 210, and configured to determine a length of the nucleic acid molecule corresponding to each sequencing data in the unique alignment sequencing data set;
  • the number determining unit 230 is connected to the first length determining unit 220 and is configured to determine the number of nucleic acid molecules whose length falls within the predetermined range.
  • the first length determining unit 220 uses a sequence length in which the sequencing data can match the reference genome as a length of a nucleic acid molecule corresponding to the sequencing data. Thereby, it is possible to accurately determine the length of the nucleic acid molecule corresponding to each of the sequencing data in the unique alignment data set.
  • the sequencing is double-end (paired-end) sequencing, in which case, referring to FIG. 9, the first length determining unit 220 further includes a 5' end position determining module 2210, a 3' end position determining module 2220 and a length calculation module 2230.
  • the 5' end position determining module 2210 is configured to determine the position of the 5' end of the nucleic acid on the reference genome based on the sequencing data from one end of the double-end sequencing data; the 3' end position determining module 2220 is connected to the 5' end position determining module 2210 and is configured to determine the position of the 3' end of the nucleic acid on the reference genome based on the sequencing data from the other end; and the length calculation module 2230 is connected to the 3' end position determining module 2220 and is configured to determine the length of the nucleic acid from its 5' end position and 3' end position. Thereby, the length of the nucleic acid molecule corresponding to each sequencing data item in the uniquely aligned data set can be determined accurately.
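The length calculation from the two ends of a read pair can be sketched as follows. This is a minimal sketch, assuming the 5' and 3' end positions have already been extracted from the alignments as 1-based inclusive reference coordinates; the `ReadPair` record and its field names are hypothetical and do not correspond to any particular aligner's output.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical aligned read-pair record; field names are illustrative only."""
    chrom: str
    five_prime_pos: int   # 1-based reference coordinate of the fragment's 5' end (one read)
    three_prime_pos: int  # 1-based reference coordinate of the fragment's 3' end (its mate)
    unique: bool          # True if the pair aligns to exactly one reference position


def fragment_length(pair: ReadPair) -> int:
    """Inclusive span between the two ends, independent of fragment orientation."""
    start, end = sorted((pair.five_prime_pos, pair.three_prime_pos))
    return end - start + 1


def unique_fragment_lengths(pairs: Iterable[ReadPair]) -> list[int]:
    """Lengths of uniquely aligned fragments only, as in the unique alignment data set."""
    return [fragment_length(p) for p in pairs if p.unique]


pairs = [
    ReadPair("chr1", 10_001, 10_166, True),  # 166 bp
    ReadPair("chr2", 5_380, 5_200, True),    # reverse orientation, 181 bp
    ReadPair("chr3", 700, 890, False),       # multi-mapped: excluded
]
print(unique_fragment_lengths(pairs))  # [166, 181]
```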
  • the apparatus of the present invention further includes a predetermined range determining device 400 for determining the predetermined range based on a plurality of control samples in which the ratio of free nucleic acids of the predetermined source is known; optionally, the predetermined range is determined based on at least 20 control samples.
  • the predetermined range determining device 400 further includes a second length determining unit 410, a first ratio determining unit 420, a correlation coefficient determining unit 430 and a predetermined range determining unit 440.
  • the second length determining unit 410 is configured to determine the lengths of the free nucleic acid molecules contained in the plurality of control samples;
  • the first ratio determining unit 420 is connected to the second length determining unit 410 and is configured to set a plurality of candidate length ranges and to determine, for each of the plurality of control samples, the frequency of free nucleic acid molecules within each candidate length range;
  • the correlation coefficient determining unit 430 is connected to the first ratio determining unit 420 and is configured to determine, based on the frequencies of free nucleic acid molecules in each candidate length range in the plurality of control samples and the ratio of free nucleic acids of the predetermined source in those control samples, the correlation coefficient between each candidate length range and the ratio of free nucleic acids of the predetermined source;
  • the predetermined range determining unit 440 is connected to the correlation coefficient determining unit 430 and is configured to select the candidate length range with the largest correlation coefficient as the predetermined range. Thereby, the predetermined range can be determined accurately.
  • each candidate length range spans 1 to 20 bp.
  • the plurality of candidate length ranges have a step size of 1 to 2 bp.
  • the free nucleic acid ratio determining apparatus 300 further includes: a second ratio determining unit 310 and a free nucleic acid ratio calculating unit 320.
  • the second ratio determining unit 310 is configured to determine the frequency at which free nucleic acid molecules occur within the predetermined range, based on the number of nucleic acid molecules whose length falls within the predetermined range; the free nucleic acid ratio calculating unit 320 is connected to the second ratio determining unit 310 and is configured to determine the ratio of free nucleic acids in the biological sample from that frequency according to a predetermined function, wherein the predetermined function is determined based on the plurality of control samples.
  • the ratio of the free nucleic acid in the biological sample can be effectively determined, and the result is accurate and reliable, and the repeatability is good.
  • the apparatus of the present invention further includes a predetermined function determining means 500.
  • the predetermined function determining means 500 includes a third ratio determining unit 510 and a fitting unit 520.
  • the third ratio determining unit 510 is configured to determine a frequency of occurrence of the free nucleic acid molecule within the predetermined range in the plurality of control samples, respectively;
  • the fitting unit 520 is connected to the third ratio determining unit 510 and is configured to fit the frequencies of free nucleic acid molecules within the predetermined range in the plurality of control samples against their known free nucleic acid ratios to determine the predetermined function.
  • the fit is a linear fit.
  • the free nucleic acid of the predetermined source is free fetal nucleic acid in the peripheral blood of the pregnant woman, the predetermined range being 185 to 204 bp.
  • the control sample is a peripheral blood sample from a pregnant woman with a known free fetal nucleic acid ratio.
  • the control sample is a peripheral blood sample from a pregnant woman carrying a normal male fetus with a known free fetal nucleic acid ratio, the known ratio having been determined using the Y chromosome.
  • thereby, the predetermined range can be determined accurately.
  • the control sample is a peripheral blood sample from a pregnant woman carrying a normal male fetus.
  • thereby, the predetermined range can be determined accurately.
  • the free nucleic acid ratio of the control sample is the free fetal DNA ratio, and the free fetal DNA ratio is obtained using a device suitable for performing Y chromosome estimation.
  • the invention provides a method of determining the sex of twins.
  • the method comprises: (1) performing free nucleic acid sequencing on the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result composed of a plurality of sequencing data; (2) based on the sequencing data, determining a first free fetal DNA ratio according to the method for determining the ratio of free nucleic acids in a biological sample as described above; (3) determining a second free fetal DNA ratio based on the sequencing data in the sequencing result derived from the Y chromosome; and (4) determining the sex of the twins based on the first free fetal DNA ratio and the second free fetal DNA ratio.
  • the inventors have surprisingly found that the method of the present invention can accurately and efficiently determine the sex of the twins carried by a pregnant woman.
  • the second free fetal DNA ratio is determined based on the following formula:
  • $\text{fra.chrY} = \frac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$
  • where fra.chrY denotes the second free fetal DNA ratio;
  • chrY.ER% denotes the percentage of the total sequencing data in the sequencing result that is derived from chromosome Y;
  • Female.chrY.ER% denotes the predetermined average, over peripheral blood samples from pregnant women carrying normal female fetuses, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y; and
  • Man.chrY.ER% denotes the predetermined average, over peripheral blood samples from normal males, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y.
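The Y-chromosome formula above is a linear rescaling of the sample's chrY read percentage between a female-fetus baseline (no fetal Y) and a normal-male baseline. A minimal sketch follows; the function names are hypothetical and the baseline and read-count values in the example are made up for illustration, not taken from the source.

```python
def er_percent(reads_on_chrom: int, total_reads: int) -> float:
    """Percentage of all sequencing reads that derive from one chromosome (e.g. chrY.ER%)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_on_chrom / total_reads * 100.0


def fra_chry(chry_er: float, female_chry_er: float, man_chry_er: float) -> float:
    """Y-based fetal DNA fraction in percent, following the fra.chrY formula above.

    chry_er:        chrY.ER% of the test sample
    female_chry_er: Female.chrY.ER% baseline (pregnancies with normal female fetuses)
    man_chry_er:    Man.chrY.ER% baseline (normal male samples)
    """
    denom = man_chry_er - female_chry_er
    if denom == 0:
        raise ValueError("male and female baselines must differ")
    return (chry_er - female_chry_er) / denom * 100.0


# Hypothetical numbers, for illustration only.
sample_er = er_percent(reads_on_chrom=150, total_reads=100_000)  # about 0.15 (%)
print(round(fra_chry(sample_er, female_chry_er=0.01, man_chry_er=0.71), 2))  # 20.0
```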
  • step (4) comprises: (a) determining the ratio of the second free fetal DNA ratio to the first free fetal DNA ratio; and (b) comparing the ratio obtained in step (a) with a predetermined first threshold and a predetermined second threshold to determine the sex of the twins. Thereby, the sex of the twins can be determined effectively.
  • the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
  • a ratio of the second free fetal DNA ratio to the first free fetal DNA ratio that is less than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first or second threshold, or lying between them, indicates that the twins are one male and one female.
  • the first threshold is 0.35 and the second threshold is 0.7.
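The threshold comparison for twin sex can be sketched as below, using the 0.35 and 0.7 values stated above as defaults. Intuitively (our reading, not stated in the source), Y-derived DNA comes only from male twins, so the ratio is near 0 for two girls, near 1 for two boys and roughly 0.5 for one of each, assuming the twins contribute similar amounts of DNA. Both fetal DNA ratios must be in the same units; the function name is hypothetical.

```python
def twin_sex(second_ratio: float, first_ratio: float,
             first_threshold: float = 0.35, second_threshold: float = 0.7) -> str:
    """Classify twin sex from the ratio second/first free fetal DNA ratio.

    second_ratio: Y-chromosome-based fetal fraction (fra.chrY)
    first_ratio:  length-based fetal fraction of the whole twin pregnancy
    """
    if first_ratio <= 0:
        raise ValueError("first free fetal DNA ratio must be positive")
    r = second_ratio / first_ratio
    if r < first_threshold:
        return "both female"
    if r > second_threshold:
        return "both male"
    return "one male, one female"  # r equal to a threshold or between them


print(twin_sex(6.0, 12.0))  # one male, one female
```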
  • the invention provides a system for determining the sex of twins.
  • the system comprises: a first free fetal DNA ratio determining device, which is the device for determining the ratio of free nucleic acids in a biological sample as described above, configured to perform free nucleic acid sequencing on the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result composed of a plurality of sequencing data, and to determine a first free fetal DNA ratio based on the sequencing data; a second free fetal DNA ratio determining device, adapted to determine a second free fetal DNA ratio based on the sequencing data in the sequencing result derived from the Y chromosome; and a sex determining device, adapted to determine the sex of the twins based on the first free fetal DNA ratio and the second free fetal DNA ratio.
  • the second free fetal DNA ratio is determined based on the following formula:
  • $\text{fra.chrY} = \frac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$
  • where fra.chrY denotes the second free fetal DNA ratio;
  • chrY.ER% denotes the percentage of the total sequencing data in the sequencing result that is derived from chromosome Y;
  • Female.chrY.ER% denotes the predetermined average, over peripheral blood samples from pregnant women carrying normal female fetuses, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y; and
  • Man.chrY.ER% denotes the predetermined average, over peripheral blood samples from normal males, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y. Thereby, the second free fetal DNA ratio can be determined accurately.
  • the sex determining device further includes: a ratio determining unit configured to determine the ratio of the second free fetal DNA ratio to the first free fetal DNA ratio; and a comparison unit configured to compare the obtained ratio with a predetermined first threshold and a predetermined second threshold to determine the sex of the twins. Thereby, the sex of the twins can be determined effectively.
  • the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
  • a ratio of the second free fetal DNA ratio to the first free fetal DNA ratio that is less than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first or second threshold, or lying between them, indicates that the twins are one male and one female.
  • the first threshold is 0.35 and the second threshold is 0.7.
  • the invention provides a method of detecting chromosomal aneuploidy in twins.
  • the method comprises: (1) performing free nucleic acid sequencing on the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result composed of a plurality of sequencing data; (2) based on the sequencing data, determining a first free fetal DNA ratio according to the method for determining the ratio of free nucleic acids in a biological sample as described above.
  • alternatively, the fetal concentration fra.chrY% can be estimated using the Y chromosome and used as the first free fetal DNA ratio, according to the following formula:
  • $\text{fra.chrY} = \frac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$
  • where fra.chrY denotes the first free fetal DNA ratio;
  • chrY.ER% denotes the percentage of the total sequencing data in the sequencing result that is derived from chromosome Y;
  • Female.chrY.ER% denotes the predetermined average, over peripheral blood samples from pregnant women carrying normal female fetuses, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y; and Man.chrY.ER% denotes the predetermined average, over peripheral blood samples from normal males, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y;
  • (3) determining a third free fetal DNA ratio based on the sequencing data in the sequencing result derived from the predetermined chromosome; and (4) determining, based on the first free fetal DNA ratio and the third free fetal DNA ratio, whether the twins are aneuploid for the predetermined chromosome. Thereby, chromosomal aneuploidy in twins can be detected accurately and efficiently.
  • step (4) comprises: (a) determining the ratio of the third free fetal DNA ratio to the first free fetal DNA ratio; and (b) comparing the ratio obtained in step (a) with a predetermined third threshold and a predetermined fourth threshold to determine whether the twins are aneuploid for the predetermined chromosome.
  • the third threshold is determined based on a plurality of reference samples in which neither twin is known to be aneuploid for the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to be aneuploid for the predetermined chromosome.
  • a ratio of the ratio of the third free fetal DNA to the ratio of the first free fetal DNA is less than the third threshold, indicating that the twins have no aneuploidy for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome; a ratio equal to the third or fourth threshold, or lying between them, indicates that one twin is aneuploid for the predetermined chromosome and the other is not.
  • the third threshold is 0.35 and the fourth threshold is 0.7.
  • the predetermined chromosome is at least one of chromosomes 13, 18 and 21.
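The aneuploidy call for twins follows the same pattern as the sex call, with the third free fetal DNA ratio (derived from the predetermined chromosome, e.g. chromosome 21) in the numerator. The formula for the third ratio is not given in this excerpt, so this sketch simply takes it as an input; the defaults are the 0.35 and 0.7 values stated above, and the function name is hypothetical.

```python
def twin_aneuploidy(third_ratio: float, first_ratio: float,
                    third_threshold: float = 0.35, fourth_threshold: float = 0.7) -> str:
    """Interpret the third/first free fetal DNA ratio for one predetermined chromosome."""
    if first_ratio <= 0:
        raise ValueError("first free fetal DNA ratio must be positive")
    if not third_threshold < fourth_threshold:
        raise ValueError("third threshold must be below the fourth")
    r = third_ratio / first_ratio
    if r < third_threshold:
        return "neither twin aneuploid"
    if r > fourth_threshold:
        return "both twins aneuploid"
    return "one twin aneuploid"  # r equal to a threshold or between them


print(twin_aneuploidy(5.0, 10.0))  # one twin aneuploid
```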
  • the present invention provides a system for determining chromosomal aneuploidy in twins.
  • the system comprises: a first free fetal DNA ratio determining device, which is the device for determining the ratio of free nucleic acids in a biological sample as described above, configured to perform free nucleic acid sequencing on the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result composed of a plurality of sequencing data, and to determine a first free fetal DNA ratio based on the sequencing data.
  • the first free fetal DNA ratio determining device is adapted to estimate the fetal concentration fra.chrY% as the first free fetal DNA ratio using the Y chromosome based on the following formula:
  • $\text{fra.chrY} = \frac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY denotes the first free fetal DNA ratio; chrY.ER% denotes the percentage of the total sequencing data in the sequencing result that is derived from chromosome Y; Female.chrY.ER% denotes the predetermined average, over peripheral blood samples from pregnant women carrying normal female fetuses, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y; and
  • Man.chrY.ER% denotes the predetermined average, over peripheral blood samples from normal males, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y.
  • the system further comprises a third free fetal DNA ratio determining device, adapted to determine a third free fetal DNA ratio based on the sequencing data in the sequencing result derived from the predetermined chromosome; and a first aneuploidy determining device, adapted to determine, based on the ratio of the first free fetal DNA ratio to the third free fetal DNA ratio, whether the twins are aneuploid for the predetermined chromosome.
  • the inventors have surprisingly found that the system of the present invention enables accurate and efficient detection of chromosomal aneuploidy in twins.
  • the aneuploidy determining device further includes: a ratio determining unit configured to determine the ratio of the third free fetal DNA ratio to the first free fetal DNA ratio; and a comparison unit configured to compare the obtained ratio with a predetermined third threshold and a predetermined fourth threshold to determine whether the twins are aneuploid for the predetermined chromosome.
  • the third threshold is determined based on a plurality of reference samples in which neither twin is known to be aneuploid for the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to be aneuploid for the predetermined chromosome.
  • a ratio of the ratio of the third free fetal DNA to the ratio of the first free fetal DNA is less than the third threshold, indicating that the twins have no aneuploidy for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome; a ratio equal to the third or fourth threshold, or lying between them, indicates that one twin is aneuploid for the predetermined chromosome and the other is not.
  • the third threshold is 0.35 and the fourth threshold is 0.7.
  • the predetermined chromosome is at least one of chromosomes 13, 18 and 21.
  • the invention also provides a method of determining chromosomal aneuploidy in twins.
  • the method comprises: (1) performing free nucleic acid sequencing on peripheral blood of a pregnant woman with twins to obtain sequencing results consisting of multiple sequencing data; (2) determining sources in the sequencing results
  • the mean of the percentage of total sequencing data, and $\sigma_i$ the standard deviation of the sequencing data of the ith chromosome
  • $\text{fra.chrY} = \frac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY denotes the free fetal DNA ratio and chrY.ER% denotes the percentage of the total sequencing data in the sequencing result that is derived from chromosome Y;
  • Female.chrY.ER% denotes the predetermined average, over peripheral blood samples from pregnant women carrying normal female fetuses, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y;
  • Man.chrY.ER% denotes the predetermined average, over peripheral blood samples from normal males, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y;
  • when the sample to be tested falls in the first quadrant, both fetuses of the twins are determined to be trisomic; when it falls in the second quadrant, one twin is determined to be trisomic and the other normal; when it falls in the third quadrant, both fetuses are determined to be normal; and when it falls in the fourth quadrant, the twin sample is determined to have a low fetal concentration, and the result is not used.
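The quadrant-to-call mapping above can be written as a lookup table. This excerpt states only that the ordinate is a T value; it does not give the abscissa or where the dividing lines sit, so the `quadrant` helper below is a hypothetical sketch that assumes the conventional counter-clockwise quadrant numbering around caller-supplied cut-offs `x_cut` and `y_cut`, with points on a dividing line assigned to the right or upper side. All names are illustrative.

```python
QUADRANT_CALLS = {
    1: "both fetuses trisomic",
    2: "one twin trisomic, one normal",
    3: "both fetuses normal",
    4: "fetal concentration too low; result not used",
}


def quadrant(x: float, y: float, x_cut: float, y_cut: float) -> int:
    """Conventional counter-clockwise quadrant numbering around (x_cut, y_cut).

    Points lying exactly on a cut line are assigned to the right/upper side (assumption).
    """
    right = x >= x_cut
    upper = y >= y_cut
    if right and upper:
        return 1
    if upper:
        return 2
    if not right:
        return 3
    return 4


def interpret(x: float, y: float, x_cut: float, y_cut: float) -> str:
    """Map a sample's plotted position to the call described in the text."""
    return QUADRANT_CALLS[quadrant(x, y, x_cut, y_cut)]


print(interpret(-1.0, -1.0, 0.0, 0.0))  # both fetuses normal
```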
  • the inventors have surprisingly found that the method of the present invention for determining chromosomal aneuploidy in twins can accurately and efficiently detect chromosomal aneuploidy in twins carried by pregnant women and determine whether the twins are aneuploid for the predetermined chromosome.
  • the invention provides a system for determining chromosomal aneuploidy in twins.
  • the system comprises: an $x_i$ value determining device, configured to perform free nucleic acid sequencing on the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result composed of a plurality of sequencing data, and to determine $x_i$, the ratio of the number of sequencing data in the sequencing result derived from chromosome i to the total sequencing data, where i denotes the chromosome number and is any integer from 1 to 22;
  • $\text{fra.chrY} = \frac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY denotes the free fetal DNA ratio; chrY.ER% denotes the percentage of the total sequencing data in the sequencing result that is derived from chromosome Y; Female.chrY.ER% denotes the predetermined average, over peripheral blood samples from pregnant women carrying normal female fetuses, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y; and Man.chrY.ER% denotes the predetermined average, over peripheral blood samples from normal males, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y.
  • the ordinate is a T value
  • when the sample to be tested falls in the first quadrant, both fetuses are judged to be trisomic; when it falls in the second quadrant, one twin is judged to be trisomic and the other normal; when it falls in the third quadrant, both fetuses are judged to be normal; and when it falls in the fourth quadrant, the twin sample is judged to have a low fetal concentration, and the result is not used.
  • the inventors have found that the system of the present invention for determining chromosomal aneuploidy in twins can accurately and efficiently detect chromosomal aneuploidy in twins carried by pregnant women and determine whether the twins are aneuploid for the predetermined chromosome.
  • $\mu_i$ denotes the mean, over the reference samples selected from the reference database, of the percentage of total sequencing data derived from chromosome i; the "reference database" means the peripheral blood free nucleic acid sequencing data of pregnant women carrying normal fetuses (male or female fetuses, singletons or twins);
  • in expressions such as "sequencing data derived from chromosome Y", "sequencing data" refers to the reads obtained by sequencing.
  • $x_i$ may be a GC-corrected value.
  • the corrected ER value is calculated from the sample's ER and GC content according to the above relationship:
  • the invention provides a method of detecting a fetal chimera.
  • the method comprises: (1) performing nucleic acid sequencing on the peripheral blood of a pregnant woman carrying a fetus to obtain a sequencing result composed of a plurality of sequencing data, where optionally the fetus is male; (2) based on the sequencing data, determining a first free fetal DNA ratio according to the method described above, or estimating the fetal concentration fra.chrY% using the Y chromosome as the first free fetal DNA ratio, according to the following formula:
  • $\text{fra.chrY} = \frac{\text{chrY.ER\%} - \text{Female.chrY.ER\%}}{\text{Man.chrY.ER\%} - \text{Female.chrY.ER\%}} \times 100\%$, where fra.chrY denotes the first free fetal DNA ratio; chrY.ER% denotes the percentage of the total sequencing data in the sequencing result that is derived from chromosome Y; Female.chrY.ER% denotes the predetermined average, over peripheral blood samples from pregnant women carrying normal female fetuses, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y;
  • and Man.chrY.ER% denotes the predetermined average, over peripheral blood samples from normal males, of the percentage of total sequencing data derived from free nucleic acids of chromosome Y; (3) determining a third free fetal DNA ratio based on the sequencing data in the sequencing result derived from the predetermined chromosome; and (4) determining, based on the ratio of the first free fetal DNA ratio to the third free fetal DNA ratio, whether the fetus is chimeric for the predetermined chromosome. Thereby, it can be determined accurately whether the fetus carries a chimera of a specific chromosome.
  • the method may further have the following additional technical features:
  • the third free fetal DNA ratio is determined based on the following formula:
  • chri.ER% represents the percentage of the sequencing data derived from the predetermined chromosome in the sequencing result as a percentage of the total sequencing data
  • adjust.chri.ER% denotes the average, over peripheral blood samples from pregnant women carrying normal fetuses, of the percentage of total sequencing data derived from free nucleic acids of the predetermined chromosome.
  • the method includes:
  • step (b) comparing the ratio obtained in step (a) with a predetermined plurality of thresholds to determine whether the fetus has a chimera for the predetermined chromosome.
  • the predetermined plurality of thresholds comprise at least one selected from the group consisting of:
  • a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
  • an eighth threshold, determined based on a plurality of reference samples known to be monosomic for the predetermined chromosome;
  • a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome;
  • a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio that is less than the seventh threshold indicates that the fetus is completely monosomic for the predetermined chromosome;
  • a ratio that is not less than the seventh threshold and not greater than the eighth threshold indicates that the fetus is a monosomy chimera for the predetermined chromosome;
  • a ratio that is greater than the eighth threshold and less than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
  • a ratio that is not less than the ninth threshold and not greater than the tenth threshold indicates that the fetus is a trisomy chimera for the predetermined chromosome;
  • a ratio that is greater than the tenth threshold indicates that the fetus is completely trisomic for the predetermined chromosome.
  • the seventh threshold is at least -1 and less than 0; optionally, the seventh threshold is -0.85;
  • the eighth threshold is greater than the seventh threshold and less than 0; optionally, the eighth threshold is -0.3;
  • the ninth threshold is greater than 0 and less than 1; optionally, the ninth threshold is 0.3;
  • the tenth threshold is greater than the ninth threshold and less than 1; optionally, the tenth threshold is 0.85. Thereby, the efficiency of analyzing whether a chimera of a specific chromosome is present in the fetus can be further improved.
  • the invention provides a system for detecting a fetal chimera.
  • the system comprises:
  • a first free fetal DNA ratio determining device, which is the device for determining the ratio of free nucleic acid in a biological sample described above and is used for performing nucleic acid sequencing on the peripheral blood of a pregnant woman carrying a fetus to obtain a sequencing result consisting of a plurality of sequencing data and determining the first free fetal DNA ratio based on the sequencing data; alternatively, the first free fetal DNA ratio determining device is adapted to use, as the first free fetal DNA ratio, the fetal concentration fra.chrY% estimated from the Y chromosome according to the following formula:
  • fra.chry = (chry.ER% - Female.chry.ER%)/(Man.chry.ER% - Female.chry.ER%)*100%, where fra.chry represents the first free fetal DNA ratio; chry.ER% represents the percentage of total sequencing data in the sequencing result that is derived from chromosome Y; Female.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from pregnant women carrying normal female fetuses; and Man.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from normal males;
  • a third free fetal DNA ratio determining device, adapted to determine a third free fetal DNA ratio based on the sequencing data derived from a predetermined chromosome in the sequencing result; and
  • a chimera determining device, adapted to determine, based on the first free fetal DNA ratio and the third free fetal DNA ratio, whether a chimera is present in the fetus for the predetermined chromosome.
  • the system described above can effectively carry out the aforementioned method for detecting fetal chimeras, thereby enabling effective analysis of fetal chimeras.
  • the third free fetal DNA ratio is determined according to the following formula: fra.chri = 2*(chri.ER%/adjust.chri.ER% - 1)*100%, where
  • fra.chri represents the third free fetal DNA ratio, i is the number of the predetermined chromosome, and i is any integer from 1 to 22;
  • chri.ER% represents the percentage of total sequencing data in the sequencing result that is derived from the predetermined chromosome;
  • adjust.chri.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from the predetermined chromosome in predetermined peripheral blood samples from pregnant women carrying normal fetuses.
  • the chimera determination includes: step (a) determining the ratio of the third free fetal DNA ratio to the first free fetal DNA ratio; and
  • step (b) comparing the ratio obtained in step (a) with a plurality of predetermined thresholds to determine whether the fetus has a chimera for the predetermined chromosome.
  • the plurality of predetermined thresholds comprises at least one of the following:
  • a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
  • an eighth threshold, determined based on a plurality of reference samples known to be monosomy chimeras for the predetermined chromosome;
  • a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome;
  • a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
  • a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio that is less than the seventh threshold indicates that the fetus is completely monosomic for the predetermined chromosome;
  • a ratio that is not less than the seventh threshold and not greater than the eighth threshold indicates that the fetus is a monosomy chimera for the predetermined chromosome;
  • a ratio that is greater than the eighth threshold and not greater than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
  • a ratio that is greater than the ninth threshold and not greater than the tenth threshold indicates that the fetus is a trisomy chimera for the predetermined chromosome;
  • a ratio that is greater than the tenth threshold indicates that the fetus is completely trisomic for the predetermined chromosome.
  • the seventh threshold is greater than -1 and less than 0; optionally, the seventh threshold is -0.85;
  • the eighth threshold is greater than the seventh threshold and less than 0; optionally, the eighth threshold is -0.3;
  • the ninth threshold is greater than 0 and less than 1; optionally, the ninth threshold is 0.3;
  • the tenth threshold is greater than the ninth threshold and less than 1; optionally, the tenth threshold is 0.85. Thereby, the efficiency of analyzing whether a chimera of a specific chromosome is present in the fetus can be further improved.
  • the invention provides a method of detecting a fetal chimera; according to an embodiment of the invention, the method comprises:
  • T_i = (x_i - μ_i)/σ_i, where i represents the chromosome number and is any integer from 1 to 22, μ_i represents the mean percentage of total sequencing data derived from the i-th chromosome selected as the reference in the reference database, and σ_i represents the standard deviation of that percentage;
  • T2_i = (x_i - μ_i*(1 + fra/2))/σ_i;
  • L_i = log(d(T_i, a))/log(d(T2_i, a)), where d(T_i, a) and d(T2_i, a) are the probability density function of the t distribution, a is the degrees of freedom, and fra is the free fetal DNA ratio determined by the method for determining the ratio of free nucleic acid in a biological sample described above;
  • the fetus is determined to be completely monosomic or a monosomy chimera for the predetermined chromosome;
  • the fetus is determined to be a monosomy chimera for the predetermined chromosome;
  • the sample is determined to have a relatively low fetal concentration, and its result is not used.
  • T is greater than 0
  • a four-quadrant diagram is plotted from the T and L values,
  • with the L value on the abscissa
  • and the T value on the ordinate;
  • the diagram is divided by the lines T = the predetermined thirteenth threshold and L = the predetermined fourteenth threshold;
  • the fetus is determined to be completely trisomic or a trisomy chimera for the predetermined chromosome;
  • the fetus is determined to be a trisomy chimera for the predetermined chromosome;
  • the sample is determined to have a relatively low fetal concentration, and its result is not used.
  • the eleventh and thirteenth thresholds are each independently 3, and the twelfth and fourteenth thresholds are each independently 1.
  • as used herein, "normal male fetus", "normal female fetus", and "normal fetus" mean that the fetal chromosomes are normal.
  • for example, a "normal male fetus" refers to a male fetus whose chromosomes are normal.
  • a "normal male fetus", "normal female fetus", or "normal fetus" may be a singleton or a twin.
  • a "normal male fetus" may be a single normal male fetus or normal male twins; a "normal fetus" is not limited in sex and is not limited to a singleton or twins.
  • the free fetal DNA ratios in plasma samples from 11 pregnant women to be tested were estimated as follows:
  • peripheral blood was collected from the 11 pregnant women to be tested and from 37 pregnant women known to carry male fetuses, and plasma was separated to obtain plasma samples from both groups.
  • the sequencing process is performed in strict accordance with the standard operating procedures of Complete Genomics Inc.
  • a window of fixed length is moved in fixed-size steps to divide the length range into multiple windows, and the DNA molecules falling within each window are counted.
  • Frequency: the frequency at which DNA molecules of a given length appear.
  • the number of DNA molecules whose lengths fall within a window's range, divided by the total number of DNA molecules, is defined as the frequency of DNA molecules in that window. For example, the window may be 1 bp, 5 bp, 10 bp, 15 bp, ..., and the step size ranges from 1 bp to the window length.
  • for example, with a 5 bp window and a 2 bp step, the distribution of DNA molecules can be counted in the windows [1 bp, 5 bp], [2 bp, 6 bp], [4 bp, 8 bp], [6 bp, 10 bp], ...; with a 5 bp window and a 5 bp step, in the windows [1 bp, 5 bp], [6 bp, 10 bp], [11 bp, 15 bp], ....
  • c) identify one or more length ranges in which the frequency of DNA molecules from the 37 pregnant women known to carry male fetuses correlates relatively strongly with their known fetal fractions.
  • the fetal fractions of the 37 samples from pregnant women known to carry male fetuses were estimated from the Y chromosome (for the specific estimation method, see: Fuman Jiang, Jinghui Ren, et al.
  • the fetal sex of the 11 samples to be tested was determined as follows:
  • if fra.chry/fra.size is not less than 0.35 and not more than 0.7, one of the twins is male and the other is female;
  • fetal chromosome aneuploidy in the 11 samples to be tested was determined from fetal concentrations as follows:
  • if fra.chri/fra.size is not less than 0.35 and not more than 0.7, one of the twins is trisomic for chromosome i and the other has a normal chromosome i;
  • fetal chromosome aneuploidy in the 11 samples to be tested was determined from the T and L values as follows:
  • T_i = (x_i - μ_i)/σ_i, where
  • x_i: the percentage of Effective Reads of the i-th chromosome in the sample;
  • μ_i: the mean percentage of Effective Reads of the i-th chromosome selected as the reference in the reference database;
  • σ_i: the standard deviation of the percentage of Effective Reads of the i-th chromosome selected as the reference in the reference database;
  • the twin sample is determined to have a relatively low fetal concentration and does not pass quality control.
  • fragmented DNA from aborted tissue was mixed with plasma from non-pregnant women at defined ratios to serve as simulated pregnant-woman samples.
  • fetal (male fetus) chromosome number abnormalities were detected by a method comprising the following steps:
  • calculating the percentage of G and C bases among all bases: based on the position and base information of the Effective Reads, the number of Effective Reads on each chromosome of the sample to be tested is expressed as a percentage of total Effective Reads, and the percentage of G and C bases among all bases of the Effective Reads on each chromosome is counted.
  • Fra.chri represents the ratio of free fetal DNA, i is the number of the predetermined chromosome, and i is an arbitrary integer in the range of 1 to 22;
  • chri.ER%: the ER% of chromosome i in the sample (ER% is short for Effective Reads Rate, the percentage of uniquely aligned reads);
  • Adjust.chri.ER%: the theoretical ER% of chromosome i when the sample is normal;
  • Fra.chry = (chry.ER% - Female.chry.ER%)/(Man.chry.ER% - Female.chry.ER%)*100%
  • chry.ER%: the ER% of chromosome Y in the sample to be tested (Effective Reads Rate, the percentage of uniquely aligned reads);
  • Female.chry.ER%: the mean ER% of chromosome Y in samples from pregnant women carrying female fetuses;
  • Man.chry.ER%: the mean ER% of chromosome Y in male samples.
  • the sequencing process is performed in strict accordance with the standard operating procedures of Complete Genomics Inc.
  • WGS: whole-genome sequencing
  • the reads of the test sample are aligned to the reference genome sequence to obtain the positions of reads that align uniquely to the reference sequence.
  • ER_i = f_i(GC_i) + ε_i is fitted, and the UR mean is calculated.
  • the corrected ER value is calculated according to the above relationship and the ER and GC of the sample:
  • x_i: the percentage of Effective Reads of the i-th chromosome in the sample;
  • μ_i: the mean percentage of Effective Reads of the i-th chromosome selected as the reference in the reference database;
  • σ_i: the standard deviation of the percentage of Effective Reads of the i-th chromosome selected as the reference in the reference database;
  • T2_i = (x_i - μ_i*(1 + fra/2))/σ_i;
  • L_i = log(d(T_i, a))/log(d(T2_i, a)), where d(T_i, a) and d(T2_i, a) are the probability density function of the t distribution, a is the degrees of freedom, and fra means fra.chry or fra.size.
  • the negative samples were plasma samples from normal non-pregnant women;
  • the positive samples were prepared by randomly fragmenting DNA from aborted tissue to 150 bp to 200 bp and mixing it with plasma from normal non-pregnant women
  • (the T21 and T18 fetuses are male; the T13 fetus is female)
  • positive chimeric samples were prepared by mixing placental tissue DNA (randomly fragmented to 150 bp to 200 bp) and YH (Yanhuang) cell line DNA (randomly fragmented to 150 bp to 200 bp) with plasma from normal women
  • (the T21 and T18 samples are male; the T13 sample is female)
  • Fra.chr13: the mixed-DNA concentration estimated from chromosome 13;
  • Fra.chr18: the mixed-DNA concentration estimated from chromosome 18;
  • Fra.chr21: the mixed-DNA concentration estimated from chromosome 21;
  • Fra.chry: the mixed-DNA concentration estimated from the Y chromosome;
  • T21-10%-30%: fragmented DNA from a trisomy 21 cell line spiked into female plasma at a concentration of 10% with a chimera ratio of 30%;
  • T.chr13: T value of chromosome 13;
  • L.chr13: L value of chromosome 13;
  • T.chr18: T value of chromosome 18;
  • L.chr18: L value of chromosome 18;
  • T.chr21: T value of chromosome 21;
  • L.chr21: L value of chromosome 21;
  • Fra.chry: the fetal concentration of the mixed DNA estimated from the Y chromosome;
  • T21-10%-30%: fragmented DNA from a trisomy 21 cell line spiked into female plasma at a concentration of 10% with a chimera ratio of 30%;
  • Fra.chr13: the mixed-DNA concentration estimated from chromosome 13;
  • Fra.chr18: the mixed-DNA concentration estimated from chromosome 18;
  • Fra.chr21: the mixed-DNA concentration estimated from chromosome 21;
  • Fra.size: the mixed-DNA concentration estimated from fragment sizes;
  • T21-10%-30%: fragmented DNA from a trisomy 21 cell line spiked into female plasma at a concentration of 10% with a chimera ratio of 30%;
  • T.chr13: T value of chromosome 13;
  • L.chr13: L value of chromosome 13;
  • T.chr18: T value of chromosome 18;
  • L.chr18: L value of chromosome 18;
  • T.chr21: T value of chromosome 21;
  • L.chr21: L value of chromosome 21;
  • Fra.size: the mixed-DNA concentration estimated from fragment sizes;
  • T21-10%-30%: fragmented DNA from a trisomy 21 cell line spiked into female plasma at a concentration of 10% with a chimera ratio of 30%;


Abstract

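The present invention provides a method and a device for determining the ratio of free nucleic acid from a predetermined source in a biological sample, and uses thereof. The method comprises: (1) performing nucleic acid sequencing on a biological sample containing free nucleic acid to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining, based on the sequencing result, the number of nucleic acid molecules in the sample whose length falls within a predetermined range; and (3) determining the ratio of the free nucleic acid in the biological sample based on the number of nucleic acid molecules whose length falls within the predetermined range.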

Description

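Method and device for determining the ratio of free nucleic acid in a biological sample, and use thereof
Technical Field
The present invention relates to the field of biotechnology, and in particular to a method and a device for determining the ratio of free nucleic acid in a biological sample, and uses thereof.
Background
Since 1977, researchers have successively found cancer-derived DNA in the peripheral blood of tumor patients and confirmed the presence of cff-DNA in the plasma of pregnant women. Detecting and estimating the ratio of cancer-derived DNA in the peripheral blood of tumor patients and the ratio of free fetal DNA in maternal plasma, that is, determining the ratio of free nucleic acid from a predetermined source in a biological sample, is of great significance.
However, existing methods for determining the ratio of free nucleic acid in a biological sample still need improvement.
Summary of the Invention
The present invention aims to solve at least one of the technical problems in the prior art. To this end, one object of the present invention is to provide a method capable of accurately and efficiently determining the ratio of free nucleic acid from a predetermined source in a biological sample.
It should be noted that the present invention was made on the basis of the following findings of the inventors:
At present, there are two main approaches to estimating the ratio of free fetal DNA in peripheral blood: 1) estimation based on the different responses of maternally derived and fetally derived free DNA fragments in maternal peripheral blood mononuclear cells to methylation of specific markers; and 2) estimation based on differences at single nucleotide polymorphism (SNP) sites, using multiple representative SNP sites. Both approaches have limitations: approach 1) requires a relatively large volume of plasma, while approach 2) requires probe capture and high sequencing depth, or paternal information. To date, however, no report has been found of estimating the free fetal DNA ratio from low-coverage whole-genome sequencing. Studies have shown that free fetal DNA fragments in the maternal circulation are generally shorter than free maternal DNA fragments, and the vast majority are shorter than 313 bp. Inspired by this, the inventors developed a workflow for estimating the free fetal DNA ratio from sequencing of maternal plasma. The method is broadly applicable and can be used wherever free DNA of different origins is present, for example, to estimate the ratio of cancer-derived DNA in the peripheral blood of tumor patients.
According to one aspect of the present invention, the present invention provides a method for determining the ratio of free nucleic acid from a predetermined source in a biological sample. According to embodiments of the present invention, the method comprises: (1) performing nucleic acid sequencing on a biological sample containing free nucleic acid to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining, based on the sequencing result, the number of nucleic acid molecules in the sample whose length falls within a predetermined range; and (3) determining the ratio of the free nucleic acid in the biological sample based on the number of nucleic acid molecules whose length falls within the predetermined range. The inventors surprisingly found that the method of the present invention can accurately and efficiently determine the ratio of free nucleic acid from a predetermined source in a biological sample, and is particularly suitable for determining the ratio of free fetal nucleic acid in maternal peripheral blood and the ratio of free tumor nucleic acid in the peripheral blood of tumor patients, suspected tumor patients, or individuals undergoing tumor screening.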
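According to embodiments of the present invention, the biological sample is peripheral blood.
According to embodiments of the present invention, the free nucleic acid from the predetermined source is free fetal nucleic acid or maternally derived free nucleic acid in maternal peripheral blood, or free tumor nucleic acid or non-tumor-derived free nucleic acid in the peripheral blood of a tumor patient, a suspected tumor patient, or an individual undergoing tumor screening. Thus, the ratio of free fetal nucleic acid in maternal peripheral blood, or the ratio of free tumor nucleic acid in the peripheral blood of a tumor patient, can be readily determined.
According to some specific examples of the present invention, the nucleic acid is DNA.
According to embodiments of the present invention, the sequencing result includes the lengths of the free nucleic acids.
According to embodiments of the present invention, the sequencing is paired-end sequencing, single-end sequencing, or single-molecule sequencing. Thus, the lengths of the free nucleic acids are readily obtained, which facilitates the subsequent steps.
According to embodiments of the present invention, the nucleic acid is DNA.
According to embodiments of the present invention, step (2) further comprises: (2-1) aligning the sequencing result to a reference genome to construct a uniquely aligned sequencing data set, in which each sequencing read matches only one position in the reference genome; (2-2) determining the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned sequencing data set; and (2-3) determining the number of nucleic acid molecules whose length falls within the predetermined range. Thus, the number of nucleic acid molecules in the sample whose length falls within the predetermined range can be readily determined, with accurate, reliable, and reproducible results.
According to embodiments of the present invention, in step (2-2), the length of the sequence over which a sequencing read matches the reference genome is taken as the length of the nucleic acid molecule corresponding to that read. Thus, the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned sequencing data set can be accurately determined.
According to embodiments of the present invention, the sequencing is paired-end sequencing, and step (2-2) comprises: (2-2-1) determining the 5' end position of the nucleic acid on the reference genome based on the read from one end of the paired-end data; (2-2-2) determining the 3' end position of the nucleic acid on the reference genome based on the read from the other end of the paired-end data; and (2-2-3) determining the length of the nucleic acid based on its 5' end position and 3' end position. Thus, the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned sequencing data set can be accurately determined.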
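Steps (2-1) to (2-3), together with the frequency used later in step (3-1), can be sketched in Python as below. This is a minimal illustration, not the patented pipeline: it assumes each uniquely aligned read pair has already been reduced to the 5' and 3' reference coordinates of its fragment in 1-based inclusive coordinates, and the helper names `fragment_length` and `frequency_in_range` are hypothetical. The text does not say whether the frequency is a proportion or a percentage; here it is a proportion between 0 and 1.

```python
from typing import Iterable, List, Tuple


def fragment_length(pos_5p: int, pos_3p: int) -> int:
    """Fragment length from its 5' and 3' end positions on the reference.

    Assumes 1-based, inclusive coordinates, so a fragment spanning
    positions 101..266 is 166 bp long.
    """
    if pos_3p < pos_5p:
        raise ValueError("3' end position precedes 5' end position")
    return pos_3p - pos_5p + 1


def frequency_in_range(lengths: Iterable[int], lo: int, hi: int) -> float:
    """Proportion of fragments whose length lies in the closed range [lo, hi]."""
    values: List[int] = list(lengths)
    if not values:
        raise ValueError("no fragments supplied")
    in_range = sum(1 for n in values if lo <= n <= hi)
    return in_range / len(values)


if __name__ == "__main__":
    # Hypothetical end positions for two uniquely aligned read pairs.
    pairs: List[Tuple[int, int]] = [(101, 266), (1000, 1190)]
    lengths = [fragment_length(a, b) for a, b in pairs]
    print(lengths)                                # [166, 191]
    print(frequency_in_range(lengths, 185, 204))  # 0.5
```

Only uniquely aligned pairs should be passed in, so that each fragment is counted once.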
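According to embodiments of the present invention, the predetermined range is determined based on a plurality of control samples in which the ratio of free nucleic acid from the predetermined source is known. Thus, the predetermined range is accurate and reliable.
According to embodiments of the present invention, the predetermined range is determined based on at least 20 control samples.
According to embodiments of the present invention, the predetermined range is determined by the following steps: (a) determining the lengths of the free nucleic acid molecules contained in the plurality of control samples; (b) setting a plurality of candidate length ranges and determining, for each control sample, the frequency of free nucleic acid molecules in each candidate length range; (c) determining, based on these frequencies and the known ratios of free nucleic acid from the predetermined source in the control samples, a correlation coefficient between each candidate length range and the ratio of nucleic acid from the predetermined source; and (d) selecting the candidate length range with the largest correlation coefficient as the predetermined range. Thus, the predetermined range can be determined accurately and effectively.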
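Steps (a) to (d) amount to a scan over candidate length windows that keeps the window whose in-range frequency correlates best with the known ratios of the control samples. A minimal Python sketch follows; it assumes Pearson's correlation coefficient (the text says only "correlation coefficient"), fixed-width windows advanced by a fixed step, and hypothetical function names. Windows whose frequency does not vary across samples are scored 0 rather than raising an error.

```python
import math
from typing import Iterator, List, Optional, Sequence, Tuple

Window = Tuple[int, int]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def candidate_windows(min_len: int, max_len: int, width: int, step: int) -> Iterator[Window]:
    """Closed windows [start, start + width - 1] that fit inside [min_len, max_len]."""
    start = min_len
    while start + width - 1 <= max_len:
        yield (start, start + width - 1)
        start += step


def best_window(
    samples: Sequence[Sequence[int]],  # fragment lengths of each control sample
    known_ratios: Sequence[float],     # known ratio of each control sample
    min_len: int, max_len: int, width: int, step: int,
) -> Tuple[Window, float]:
    """Return the candidate window with the largest correlation, and that correlation."""
    best: Optional[Tuple[Window, float]] = None
    for lo, hi in candidate_windows(min_len, max_len, width, step):
        freqs: List[float] = [
            sum(1 for n in s if lo <= n <= hi) / len(s) for s in samples
        ]
        r = pearson(freqs, known_ratios)
        if best is None or r > best[1]:  # ties keep the earlier window
            best = ((lo, hi), r)
    if best is None:
        raise ValueError("no candidate window fits inside the length range")
    return best
```

For example, `best_window(samples, ratios, 100, 300, 20, 1)` would scan 20 bp windows in 1 bp steps between 100 and 300 bp. The sketch keeps the largest signed coefficient, as step (d) describes; a negatively correlated window would need an absolute-value criterion instead.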
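According to embodiments of the present invention, the span of each candidate length range is 5 to 20 bp.
According to embodiments of the present invention, step (3) further comprises: (3-1) determining, based on the number of nucleic acid molecules whose length falls within the predetermined range, the frequency of free nucleic acid molecules within the predetermined range; and (3-2) determining the ratio of free nucleic acid from the predetermined source in the biological sample from that frequency according to a predetermined function, wherein the predetermined function is determined based on the plurality of control samples. Thus, the ratio of the free nucleic acid in the biological sample can be determined effectively, with accurate, reliable, and reproducible results.
According to embodiments of the present invention, the predetermined function is obtained by the following steps: (i) determining, in each of the plurality of control samples, the frequency of free nucleic acid molecules within the predetermined range; and (ii) fitting these frequencies against the known ratios of free nucleic acid from the predetermined source to determine the predetermined function. Thus, the predetermined function is accurate and reliable, which facilitates the subsequent steps.
According to embodiments of the present invention, the fitting is linear fitting.
According to embodiments of the present invention, the free nucleic acid from the predetermined source is free fetal nucleic acid in maternal peripheral blood, and the predetermined range is 185 to 204 bp. Thus, based on this predetermined range, the ratio of free nucleic acid in the biological sample can be accurately determined. According to embodiments of the present invention, the predetermined function is d = 0.0334*p + 1.6657, where d represents the free fetal nucleic acid ratio and p represents the frequency of free nucleic acid molecules within the predetermined range. Based on this predetermined function, the ratio of the free nucleic acid in the biological sample can be determined effectively, with accurate, reliable, and reproducible results.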
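With linear fitting, step (ii) reduces to a least-squares line of the known ratio d on the in-range frequency p, and the reported function can then be applied directly. The sketch below assumes ordinary least squares (the text says only "linear fitting") and uses hypothetical function names. The units of p and d are not stated, so the coefficients 0.0334 and 1.6657 are only meaningful with whatever scaling of p was used to derive them.

```python
from typing import Sequence, Tuple


def fit_linear(p: Sequence[float], d: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of d = slope * p + intercept."""
    n = len(p)
    if n < 2 or n != len(d):
        raise ValueError("need at least two paired observations")
    mp, md = sum(p) / n, sum(d) / n
    sxx = sum((x - mp) ** 2 for x in p)
    if sxx == 0:
        raise ValueError("frequencies have zero variance")
    sxy = sum((x - mp) * (y - md) for x, y in zip(p, d))
    slope = sxy / sxx
    return slope, md - slope * mp


def fetal_ratio(p: float, slope: float = 0.0334, intercept: float = 1.6657) -> float:
    """Apply the predetermined function d = 0.0334 * p + 1.6657 given in the text."""
    return slope * p + intercept
```

Refitting on one's own control samples, e.g. `slope, intercept = fit_linear(control_freqs, control_ratios)` followed by `fetal_ratio(p, slope, intercept)`, keeps the function consistent with the local scaling of p.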
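According to embodiments of the present invention, the control samples are maternal peripheral blood samples with known free fetal nucleic acid ratios. Thus, the predetermined range is determined accurately.
According to embodiments of the present invention, the control samples are peripheral blood samples, with known free fetal nucleic acid ratios, from pregnant women carrying normal male fetuses, and the known free fetal nucleic acid ratios are determined using the Y chromosome. Thus, the predetermined range is determined accurately.
According to embodiments of the present invention, the free nucleic acid ratio of the control samples is the free fetal DNA ratio, and the free fetal DNA ratio is estimated using the Y chromosome. Thus, the free nucleic acid ratios of the control samples can be used effectively to determine the predetermined range, and in turn the number of nucleic acid molecules whose length falls within the predetermined range and the free fetal DNA ratio in the sample from the pregnant woman to be tested.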
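According to another aspect of the present invention, the present invention further provides an apparatus for determining the ratio of free nucleic acid from a predetermined source in a biological sample. According to embodiments of the present invention, the apparatus comprises: a sequencing device for performing nucleic acid sequencing on a biological sample containing free nucleic acid to obtain a sequencing result consisting of a plurality of sequencing data; a counting device, connected to the sequencing device, for determining, based on the sequencing result, the number of nucleic acid molecules in the sample whose length falls within a predetermined range; and a free nucleic acid ratio determining device, connected to the counting device, for determining the ratio of free nucleic acid from the predetermined source in the biological sample based on the number of nucleic acid molecules whose length falls within the predetermined range. The inventors surprisingly found that the apparatus of the present invention is suitable for carrying out the above-described method for determining the ratio of free nucleic acid from a predetermined source in a biological sample, so that the apparatus can accurately and efficiently determine this ratio; it is particularly suitable for determining the ratio of free fetal nucleic acid in maternal peripheral blood and the ratio of free tumor nucleic acid in the peripheral blood of tumor patients, suspected tumor patients, or individuals undergoing tumor screening.
According to embodiments of the present invention, the biological sample is peripheral blood.
According to embodiments of the present invention, the free nucleic acid from the predetermined source is free fetal nucleic acid or maternally derived free nucleic acid in maternal peripheral blood, or free tumor nucleic acid or non-tumor-derived free nucleic acid in the peripheral blood of a tumor patient, a suspected tumor patient, or an individual undergoing tumor screening. Thus, the ratio of free fetal nucleic acid in maternal peripheral blood, or the ratio of free tumor nucleic acid in the peripheral blood of a tumor patient, a suspected tumor patient, or an individual undergoing tumor screening, can be readily determined.
According to embodiments of the present invention, the nucleic acid is DNA.
According to embodiments of the present invention, the sequencing result includes the lengths of the free nucleic acids.
According to embodiments of the present invention, the sequencing is paired-end sequencing, single-end sequencing, or single-molecule sequencing. Thus, the lengths of the free nucleic acids are readily obtained, which facilitates the subsequent steps.
According to embodiments of the present invention, the counting device further comprises: an alignment unit for aligning the sequencing result to a reference genome to construct a uniquely aligned sequencing data set, in which each sequencing read matches only one position in the reference genome; a first length determining unit, connected to the alignment unit, for determining the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned sequencing data set; and a number determining unit, connected to the first length determining unit, for determining the number of nucleic acid molecules whose length falls within the predetermined range. Thus, the number of nucleic acid molecules in the sample whose length falls within the predetermined range can be readily determined, with accurate, reliable, and reproducible results.
According to embodiments of the present invention, the first length determining unit takes the length of the sequence over which a sequencing read matches the reference genome as the length of the nucleic acid molecule corresponding to that read. Thus, the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned sequencing data set can be accurately determined.
According to embodiments of the present invention, the sequencing is paired-end sequencing, and the first length determining unit further comprises: a 5' end position determining module for determining the 5' end position of the nucleic acid on the reference genome based on the read from one end of the paired-end data; a 3' end position determining module, connected to the 5' end position determining module, for determining the 3' end position of the nucleic acid on the reference genome based on the read from the other end of the paired-end data; and a length calculation module, connected to the 3' end position determining module, for determining the length of the nucleic acid based on its 5' end position and 3' end position. Thus, the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned sequencing data set can be accurately determined.
According to embodiments of the present invention, the apparatus further comprises a predetermined range determining device for determining the predetermined range based on a plurality of control samples in which the ratio of free nucleic acid from the predetermined source is known; optionally, the predetermined range is determined based on at least 20 control samples.
According to embodiments of the present invention, the predetermined range determining device further comprises: a second length determining unit for determining the lengths of the free nucleic acid molecules contained in the plurality of control samples; a first ratio determining unit, connected to the second length determining unit, for setting a plurality of candidate length ranges and determining, for each control sample, the frequency of free nucleic acid molecules in each candidate length range; a correlation coefficient determining unit, connected to the first ratio determining unit, for determining, based on these frequencies and the known ratios of free nucleic acid from the predetermined source in the control samples, a correlation coefficient between each candidate length range and that ratio; and a predetermined range determining unit, connected to the correlation coefficient determining unit, for selecting the candidate length range with the largest correlation coefficient as the predetermined range. Thus, the predetermined range can be determined accurately and effectively.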
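According to embodiments of the present invention, the span of each candidate length range is 1 to 20 bp.
According to embodiments of the present invention, the step size between the plurality of candidate length ranges is 1 to 2 bp.
According to embodiments of the present invention, the free nucleic acid ratio determining device further comprises: a second ratio determining unit for determining, based on the number of nucleic acid molecules whose length falls within the predetermined range, the frequency of free nucleic acid molecules within the predetermined range; and a free nucleic acid ratio calculation unit, connected to the second ratio determining unit, for determining the ratio of free nucleic acid from the predetermined source in the biological sample from that frequency according to a predetermined function, wherein the predetermined function is determined based on the plurality of control samples. Thus, the ratio of the free nucleic acid in the biological sample can be determined effectively, with accurate, reliable, and reproducible results.
According to embodiments of the present invention, the apparatus further comprises a predetermined function determining device, which comprises: a third ratio determining unit for determining, in each of the plurality of control samples, the frequency of free nucleic acid molecules within the predetermined range; and a fitting unit, connected to the third ratio determining unit, for fitting these frequencies against the known ratios of free nucleic acid from the predetermined source to determine the predetermined function. Thus, the predetermined function is accurate and reliable, which facilitates the subsequent steps.
According to embodiments of the present invention, the fitting is linear fitting.
According to embodiments of the present invention, the free nucleic acid from the predetermined source is free fetal nucleic acid in maternal peripheral blood, and the predetermined range is 185 to 204 bp. Thus, based on this predetermined range, the ratio of free nucleic acid in the biological sample can be accurately determined.
According to embodiments of the present invention, the predetermined function is d = 0.0334*p + 1.6657, where d represents the free fetal nucleic acid ratio and p represents the frequency of free nucleic acid molecules within the predetermined range. Based on this predetermined function, the ratio of the free nucleic acid in the biological sample can be determined effectively, with accurate, reliable, and reproducible results.
According to embodiments of the present invention, the control samples are maternal peripheral blood samples with known free fetal nucleic acid ratios.
According to embodiments of the present invention, the control samples are peripheral blood samples, with known free fetal nucleic acid ratios, from pregnant women carrying normal male fetuses, and the known free fetal nucleic acid ratios are determined using the Y chromosome. Thus, the predetermined range is determined accurately.
According to embodiments of the present invention, the free nucleic acid ratio of the control samples is the free fetal DNA ratio, and the free fetal DNA ratio is obtained using an apparatus adapted for Y-chromosome-based estimation. Thus, the free nucleic acid ratios of the control samples can be used effectively to determine the predetermined range, and in turn the number of nucleic acid molecules whose length falls within the predetermined range and the free fetal DNA ratio in the sample from the pregnant woman to be tested.
It should be noted that the method and device of the present invention for determining the ratio of free nucleic acid in a biological sample have at least the following advantages:
1) Universality: the free fetal DNA ratio can be estimated for all samples that pass quality control, in particular those with female fetuses.
2) The accuracy of NIPT detection can be improved.
3) Ease of operation: the free fetal DNA ratio is estimated directly from the raw sequencing output.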
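According to yet another aspect of the present invention, the present invention provides a method for determining the sex of twins. According to embodiments of the present invention, the method comprises: (1) sequencing free nucleic acid in the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining a first free fetal DNA ratio based on the sequencing data according to the method for determining the ratio of free nucleic acid in a biological sample described above; (3) determining a second free fetal DNA ratio based on the sequencing data derived from the Y chromosome in the sequencing result; and (4) determining the sex of the twins based on the first free fetal DNA ratio and the second free fetal DNA ratio. The inventors surprisingly found that the method of the present invention can accurately and efficiently determine the sex of twins carried by a pregnant woman.
According to embodiments of the present invention, the second free fetal DNA ratio is determined according to the following formula:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%
where fra.chry represents the second free fetal DNA ratio; chry.ER% represents the percentage of total sequencing data in the sequencing result that is derived from chromosome Y; Female.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from pregnant women carrying normal female fetuses; and Man.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from normal males. Thus, the second free fetal DNA ratio can be accurately determined. According to embodiments of the present invention, step (4) comprises: (a) determining the ratio of the second free fetal DNA ratio to the first free fetal DNA ratio; and (b) comparing the ratio obtained in step (a) with a predetermined first threshold and a predetermined second threshold to determine the sex of the twins. Thus, the sex of the twins can be determined effectively.
According to embodiments of the present invention, the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
According to embodiments of the present invention, a ratio of the second free fetal DNA ratio to the first free fetal DNA ratio less than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first or second threshold, or between the first and second thresholds, indicates that the twins comprise one male and one female.
According to embodiments of the present invention, the first threshold is 0.35 and the second threshold is 0.7.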
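The twin-sex rule can be sketched in Python as below. This is an illustration rather than the patent's implementation: the function names are hypothetical, the two ER% reference means are inputs derived from one's own reference samples, and both fetal ratios must be on the same scale (here percentages) for their quotient to be meaningful. As an intuition, assuming the twins contribute similar amounts of free DNA, two male twins should give a ratio near 1, one male twin near 0.5, and two female twins near 0, which fits the 0.35 and 0.7 cut-offs.

```python
def fra_chry(chry_er: float, female_chry_er: float, male_chry_er: float) -> float:
    """Second free fetal DNA ratio (%) estimated from chromosome Y ER%."""
    return (chry_er - female_chry_er) / (male_chry_er - female_chry_er) * 100.0


def twin_sex(fra_y: float, fra_first: float,
             first_threshold: float = 0.35, second_threshold: float = 0.7) -> str:
    """Classify twin sex from the ratio fra.chry / first free fetal DNA ratio."""
    if fra_first <= 0:
        raise ValueError("first free fetal DNA ratio must be positive")
    ratio = fra_y / fra_first
    if ratio < first_threshold:
        return "both female"
    if ratio > second_threshold:
        return "both male"
    return "one male, one female"  # equal to either threshold or in between
```

For instance, a hypothetical fra.chry of 5% against a first free fetal DNA ratio of 10% gives 0.5 and is classed as one male and one female.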
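According to another aspect of the present invention, the present invention provides a system for determining the sex of twins. According to embodiments of the present invention, the system comprises: a first free fetal DNA ratio determining apparatus, which is the apparatus for determining the ratio of free nucleic acid in a biological sample described above, for sequencing free nucleic acid in the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result consisting of a plurality of sequencing data and determining a first free fetal DNA ratio based on the sequencing data; a second free fetal DNA ratio determining apparatus, adapted to determine a second free fetal DNA ratio based on the sequencing data derived from the Y chromosome in the sequencing result; and a sex determining apparatus, adapted to determine the sex of the twins based on the first free fetal DNA ratio and the second free fetal DNA ratio. The inventors surprisingly found that the system of the present invention can accurately and efficiently determine the sex of twins carried by a pregnant woman.
According to embodiments of the present invention, the second free fetal DNA ratio is determined according to the following formula:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%
where fra.chry represents the second free fetal DNA ratio; chry.ER% represents the percentage of total sequencing data in the sequencing result that is derived from chromosome Y; Female.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from pregnant women carrying normal female fetuses; and Man.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from normal males. Thus, the second free fetal DNA ratio can be accurately determined.
According to embodiments of the present invention, the sex determining apparatus further comprises: a ratio determining unit for determining the ratio of the second free fetal DNA ratio to the first free fetal DNA ratio; and a comparison unit for comparing the resulting ratio with a predetermined first threshold and a predetermined second threshold to determine the sex of the twins. Thus, the sex of the twins can be determined effectively.
According to embodiments of the present invention, the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
According to embodiments of the present invention, a ratio of the second free fetal DNA ratio to the first free fetal DNA ratio less than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first or second threshold, or between the first and second thresholds, indicates that the twins comprise one male and one female.
According to embodiments of the present invention, the first threshold is 0.35 and the second threshold is 0.7.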
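According to yet another aspect of the present invention, the present invention provides a method for detecting chromosomal aneuploidy in twins. According to embodiments of the present invention, the method comprises: (1) sequencing free nucleic acid in the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining a first free fetal DNA ratio based on the sequencing data according to the method for determining the ratio of free nucleic acid in a biological sample described above; (3) determining a third free fetal DNA ratio based on the sequencing data derived from a predetermined chromosome in the sequencing result; and (4) determining, based on the first free fetal DNA ratio and the third free fetal DNA ratio, whether aneuploidy of the predetermined chromosome is present in the twins. Thus, chromosomal aneuploidy in twins can be detected accurately and effectively.
According to embodiments of the present invention, the third free fetal DNA ratio is determined according to the following formula: fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%, where fra.chri represents the third free fetal DNA ratio, i is the number of the predetermined chromosome, and i is any integer from 1 to 22; chri.ER% represents the percentage of total sequencing data in the sequencing result that is derived from the predetermined chromosome; and adjust.chri.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from the predetermined chromosome in predetermined peripheral blood samples from pregnant women carrying normal twins. Thus, the third free fetal DNA ratio can be accurately determined.
According to embodiments of the present invention, step (4) comprises: (a) determining the ratio of the third free fetal DNA ratio to the first free fetal DNA ratio; and (b) comparing the ratio obtained in step (a) with a predetermined third threshold and a predetermined fourth threshold to determine whether the twins have aneuploidy of the predetermined chromosome. Thus, chromosomal aneuploidy in twins can be detected effectively.
According to embodiments of the present invention, the third threshold is determined based on a plurality of reference samples in which neither twin is known to be aneuploid for the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to be aneuploid for the predetermined chromosome.
According to embodiments of the present invention, a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio less than the third threshold indicates that neither twin is aneuploid for the predetermined chromosome; a ratio greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome; and a ratio equal to the third or fourth threshold, or between the third and fourth thresholds, indicates that one twin is aneuploid for the predetermined chromosome and the other is not.
According to embodiments of the present invention, the third threshold is 0.35 and the fourth threshold is 0.7.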
According to embodiments of the present invention, the predetermined chromosome is at least one of chromosomes 13, 18 and 21.
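The twin aneuploidy rule follows the same pattern as the twin-sex rule, with fra.chri in place of fra.chry. A minimal Python sketch with hypothetical function names; `adjust_chri_er` stands for the reference mean for the chromosome in pregnancies with normal twins, and both fetal ratios are taken in percent.

```python
def fra_chri(chri_er: float, adjust_chri_er: float) -> float:
    """Third free fetal DNA ratio (%) from the ER% of a predetermined chromosome."""
    return 2.0 * (chri_er / adjust_chri_er - 1.0) * 100.0


def twin_aneuploidy(fra_i: float, fra_first: float,
                    third_threshold: float = 0.35, fourth_threshold: float = 0.7) -> str:
    """Classify twins from the ratio fra.chri / first free fetal DNA ratio."""
    if fra_first <= 0:
        raise ValueError("first free fetal DNA ratio must be positive")
    ratio = fra_i / fra_first
    if ratio < third_threshold:
        return "neither twin aneuploid"
    if ratio > fourth_threshold:
        return "both twins aneuploid"
    return "one twin aneuploid"  # equal to either threshold or in between
```

As a hypothetical example, if chromosome 21 accounts for 1.05% of reads where normal-twin pregnancies average 1.00%, fra.chr21 is about 10%; against a first free fetal DNA ratio of 20% the quotient is about 0.5, which points to one affected twin.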
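According to still another aspect of the present invention, the present invention provides a system for determining chromosomal aneuploidy in twins. According to embodiments of the present invention, the system comprises: a first free fetal DNA ratio determining apparatus, which is the apparatus for determining the ratio of free nucleic acid in a biological sample described above, for sequencing free nucleic acid in the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result consisting of a plurality of sequencing data and determining a first free fetal DNA ratio based on the sequencing data; a third free fetal DNA ratio determining apparatus, adapted to determine a third free fetal DNA ratio based on the sequencing data derived from a predetermined chromosome in the sequencing result; and a first aneuploidy determining apparatus, adapted to determine, based on the first free fetal DNA ratio and the third free fetal DNA ratio, whether aneuploidy of the predetermined chromosome is present in the twins. The inventors surprisingly found that the system of the present invention can accurately and effectively detect chromosomal aneuploidy in twins.
According to embodiments of the present invention, the third free fetal DNA ratio is determined according to the following formula: fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%, where fra.chri represents the third free fetal DNA ratio, i is the number of the predetermined chromosome, and i is any integer from 1 to 22; chri.ER% represents the percentage of total sequencing data in the sequencing result that is derived from the predetermined chromosome; and adjust.chri.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from the predetermined chromosome in predetermined peripheral blood samples from pregnant women carrying normal twins. Thus, the third free fetal DNA ratio can be accurately determined.
According to embodiments of the present invention, the aneuploidy determining apparatus further comprises: a ratio determining unit for determining the ratio of the third free fetal DNA ratio to the first free fetal DNA ratio; and a comparison unit for comparing the resulting ratio with a predetermined third threshold and a predetermined fourth threshold to determine whether the twins have aneuploidy of the predetermined chromosome. Thus, chromosomal aneuploidy in twins can be detected effectively.
According to embodiments of the present invention, the third threshold is determined based on a plurality of reference samples in which neither twin is known to be aneuploid for the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to be aneuploid for the predetermined chromosome.
According to embodiments of the present invention, a ratio of the third free fetal DNA ratio to the first free fetal DNA ratio less than the third threshold indicates that neither twin is aneuploid for the predetermined chromosome; a ratio greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome; and a ratio equal to the third or fourth threshold, or between the third and fourth thresholds, indicates that one twin is aneuploid for the predetermined chromosome and the other is not. Thus, chromosomal aneuploidy in twins can be detected effectively.
According to embodiments of the present invention, the third threshold is 0.35 and the fourth threshold is 0.7.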
According to embodiments of the present invention, the predetermined chromosome is at least one of chromosomes 13, 18 and 21.
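According to another aspect of the present invention, the present invention provides a method for determining chromosomal aneuploidy in twins. According to embodiments of the present invention, the method comprises: (1) sequencing free nucleic acid in the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining the proportion x_i of sequencing reads derived from chromosome i among the total sequencing data in the sequencing result, where i represents the chromosome number and is any integer from 1 to 22; (3) determining the T value for chromosome i according to the formula T_i = (x_i - μ_i)/σ_i, where i represents the chromosome number and is any integer from 1 to 22, μ_i represents the mean percentage of total sequencing data derived from the i-th chromosome selected as the reference in the reference database, and σ_i represents the standard deviation of that percentage; (4) determining the L value for chromosome i according to the formula L_i = log(d(T_i, a))/log(d(T2_i, a)), where i represents the chromosome number and is any integer from 1 to 22, T2_i = (x_i - μ_i*(1 + fra/2))/σ_i, d(T_i, a) and d(T2_i, a) are the probability density function of the t distribution, a is the degrees of freedom, and fra is the first free fetal DNA ratio determined by the method for determining the ratio of free nucleic acid in a biological sample described above, or the fetal concentration fra.chrY% estimated using the Y chromosome, where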
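fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry represents the free fetal DNA ratio; chry.ER% represents the percentage of total sequencing data in the sequencing result that is derived from chromosome Y; Female.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from pregnant women carrying normal female fetuses; and Man.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from normal males; and (5) plotting a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, and dividing it into regions by the lines T = a predetermined fifth threshold and L = a predetermined sixth threshold, wherein: if the sample to be tested falls in the first quadrant, both twins are determined to be trisomic; if it falls in the second quadrant, one twin is determined to be trisomic and the other normal; if it falls in the third quadrant, both twins are determined to be normal; and if it falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used. The inventors surprisingly found that this method can accurately and efficiently detect chromosomal aneuploidy in twins carried by a pregnant woman and determine whether aneuploidy of a predetermined chromosome is present in the twins.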
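Steps (3) to (5) can be sketched in Python without external libraries by writing the log-density of Student's t distribution with `math.lgamma`. The following are our assumptions, not the text's: fra is supplied as a proportion (e.g. 0.10 rather than 10%) so that μ_i*(1 + fra/2) is the value expected when all fetal DNA is trisomic; the degrees of freedom a, the fifth and sixth thresholds, and the handling of points lying exactly on a threshold line are left to the caller; and quadrants are numbered counter-clockwise with the origin moved to (sixth threshold, fifth threshold). Because L_i is a ratio of two logarithms, the base of the logarithm cancels. All function and variable names are hypothetical.

```python
import math
from typing import Dict, Tuple


def t_logpdf(t: float, dof: float) -> float:
    """Natural log of the Student's t probability density d(t, dof)."""
    return (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
            - 0.5 * math.log(dof * math.pi)
            - (dof + 1) / 2 * math.log1p(t * t / dof))


def t_and_l(x: float, mu: float, sigma: float, fra: float, dof: float) -> Tuple[float, float]:
    """Return (T_i, L_i) for one chromosome.

    x: the sample's percentage for chromosome i; mu, sigma: reference mean and
    standard deviation; fra: fetal fraction as a proportion (assumption).
    """
    t1 = (x - mu) / sigma                    # deviation from a normal pregnancy
    t2 = (x - mu * (1 + fra / 2)) / sigma    # deviation from all-trisomic fetal DNA
    # The t density stays below 1 for dof > 0, so both logs are negative
    # and the denominator cannot be zero.
    return t1, t_logpdf(t1, dof) / t_logpdf(t2, dof)


# Calls for each quadrant as listed in step (5).
QUADRANT_CALLS: Dict[int, str] = {
    1: "both twins trisomic",
    2: "one twin trisomic, one normal",
    3: "both twins normal",
    4: "low fetal concentration; result not used",
}


def quadrant(t_value: float, l_value: float, t_cut: float, l_cut: float) -> int:
    """Quadrant of the point (L, T) relative to the lines T = t_cut and L = l_cut.

    Points exactly on a line are assigned to the lower or left side (assumption).
    """
    upper = t_value > t_cut
    right = l_value > l_cut
    if upper:
        return 1 if right else 2
    return 4 if right else 3
```

With hypothetical inputs mu = 1.0, sigma = 0.01, fra = 0.10 and 20 degrees of freedom, and cut lines T = 3 and L = 1 (the values the list above gives for the analogous chimera plot), x = 1.05 yields T = 5 and L ≈ 10, i.e. quadrant 1, whereas x = 1.00 yields T = 0 and L ≈ 0.1, i.e. quadrant 3. `scipy.stats.t.logpdf` could replace `t_logpdf` where SciPy is available.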
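According to another aspect of the present invention, the present invention further provides a system for determining chromosomal aneuploidy in twins. According to embodiments of the present invention, the system comprises: an x_i value determining apparatus for sequencing free nucleic acid in the peripheral blood of a pregnant woman carrying twins to obtain a sequencing result consisting of a plurality of sequencing data, and determining the proportion x_i of sequencing reads derived from chromosome i among the total sequencing data in the sequencing result, where i represents the chromosome number and is any integer from 1 to 22; a T value determining apparatus for determining the T value for chromosome i according to the formula T_i = (x_i - μ_i)/σ_i, where i represents the chromosome number and is any integer from 1 to 22, μ_i represents the mean percentage of total sequencing data derived from the i-th chromosome selected as the reference in the reference database, and σ_i represents the standard deviation of that percentage; an L value determining apparatus for determining the L value for chromosome i according to the formula L_i = log(d(T_i, a))/log(d(T2_i, a)), where i represents the chromosome number and is any integer from 1 to 22, T2_i = (x_i - μ_i*(1 + fra/2))/σ_i, d(T_i, a) and d(T2_i, a) are the probability density function of the t distribution, a is the degrees of freedom, and fra is the first free fetal DNA ratio determined as described above, or the fetal concentration fra.chrY% estimated using the Y chromosome, where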
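fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry represents the second free fetal DNA ratio; chry.ER% represents the percentage of total sequencing data in the sequencing result that is derived from chromosome Y; Female.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from pregnant women carrying normal female fetuses; and Man.chry.ER% represents the mean percentage of total sequencing data from free nucleic acid derived from chromosome Y in predetermined peripheral blood samples from normal males; and a second aneuploidy determining apparatus, adapted to plot a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, and to divide it into regions by the lines T = a predetermined fifth threshold and L = a predetermined sixth threshold, wherein: if the sample to be tested falls in the first quadrant, both twins are determined to be trisomic; if it falls in the second quadrant, one twin is determined to be trisomic and the other normal; if it falls in the third quadrant, both twins are determined to be normal; and if it falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used. The inventors found that this system can accurately and efficiently detect chromosomal aneuploidy in twins carried by a pregnant woman and determine whether aneuploidy of a predetermined chromosome is present in the twins.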
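In yet another aspect, the present invention provides a method for detecting fetal mosaicism. According to an embodiment of the present invention, the method includes: (1) sequencing cell-free nucleic acids from the peripheral blood of a pregnant woman so as to obtain a sequencing result consisting of a plurality of sequencing data, where, optionally, the fetus is male; (2) determining, based on the sequencing data, a first cell-free fetal DNA fraction according to the method described above, or alternatively using, as the first cell-free fetal DNA fraction, the fetal concentration fra.chrY% estimated from the Y chromosome according to the following formula:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry is the first cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; (3) determining a third cell-free fetal DNA fraction based on the sequencing data derived from a predetermined chromosome in the sequencing result; and (4) determining, based on the first and third cell-free fetal DNA fractions, whether the fetus is mosaic for the predetermined chromosome. In this way, mosaicism for a specific chromosome in the fetus can be analyzed accurately.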
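According to an embodiment of the present invention, the method may further have the following additional technical features:
According to an embodiment of the present invention, the third cell-free fetal DNA fraction is determined based on the following formula: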
fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
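where
fra.chri is the third cell-free fetal DNA fraction, i is the number of the predetermined chromosome, and i is any integer from 1 to 22;
chri.ER% is the percentage of sequencing data derived from the predetermined chromosome in the total sequencing data of the sequencing result;
adjust.chri.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from the predetermined chromosome in the total sequencing data of peripheral blood samples from women pregnant with normal fetuses. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.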
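Both fetal-fraction formulas are simple arithmetic on ER% values. The sketch below assumes that all ER% inputs are percentages of total reads and that the reference means (Female.chry.ER%, Man.chry.ER%, adjust.chri.ER%) have already been computed. The function names are illustrative only.

```python
def fra_chry(chry_er: float, female_chry_er: float, male_chry_er: float) -> float:
    """Y-chromosome fetal fraction (%):
    (chrY.ER% - Female.chrY.ER%) / (Man.chrY.ER% - Female.chrY.ER%) * 100%."""
    return (chry_er - female_chry_er) / (male_chry_er - female_chry_er) * 100.0


def fra_chri(chri_er: float, adjust_chri_er: float) -> float:
    """Chromosome-i based fetal fraction (%): 2 * (chri.ER% / adjust.chri.ER% - 1) * 100%."""
    return 2.0 * (chri_er / adjust_chri_er - 1.0) * 100.0


# Hypothetical values: a full trisomy at 10% fetal fraction raises chr i's share by about 5%.
print(fra_chry(0.05, 0.0, 0.5))      # Y-based estimate
print(fra_chri(1.30 * 1.05, 1.30))   # chromosome-i based estimate
```

The factor 2 arises because one extra fetal copy raises chromosome i's share by roughly fra/2. As a result, fra.chri is close to the true fetal fraction for a complete trisomy, close to its negative for a complete monosomy, and close to 0 for a euploid fetus. This explains why its ratio to the first fraction is compared with thresholds between -1 and 1.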
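According to an embodiment of the present invention, step (4) includes:
(a) determining the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
(b) comparing the ratio obtained in step (a) with a plurality of predetermined thresholds, so as to determine whether the fetus is mosaic for the predetermined chromosome. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
According to an embodiment of the present invention, the plurality of predetermined thresholds includes at least one of the following:
a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
an eighth threshold, determined based on a plurality of reference samples known to be monosomy mosaics for the predetermined chromosome;
a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome;
a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
According to an embodiment of the present invention, a ratio of the third to the first cell-free fetal DNA fraction smaller than the seventh threshold indicates that the fetus has complete monosomy of the predetermined chromosome;
a ratio not smaller than the seventh threshold and not greater than the eighth threshold indicates that the fetus has mosaic monosomy of the predetermined chromosome;
a ratio greater than the eighth threshold and smaller than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
a ratio not smaller than the ninth threshold and not greater than the tenth threshold indicates that the fetus has mosaic trisomy of the predetermined chromosome;
a ratio greater than the tenth threshold indicates that the fetus has complete trisomy of the predetermined chromosome.
According to an embodiment of the present invention, the seventh threshold is at least -1 and smaller than 0, optionally -0.85;
the eighth threshold is greater than the seventh threshold and smaller than 0, optionally -0.3;
the ninth threshold is greater than 0 and smaller than 1, optionally 0.3;
the tenth threshold is greater than the ninth threshold and smaller than 1, optionally 0.85. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.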
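The interval rule above can be sketched as a small classifier. This is an illustrative sketch. Its default thresholds are the optional values from the text, whereas in practice each threshold would be derived from reference samples as described. Boundary handling follows the "not smaller than / not greater than" wording of this method.

```python
def classify_mosaic_ratio(ratio: float, t7: float = -0.85, t8: float = -0.3,
                          t9: float = 0.3, t10: float = 0.85) -> str:
    """Map (third fraction / first fraction) onto the five classes above."""
    if not (t7 < t8 < 0 < t9 < t10):
        raise ValueError("thresholds must satisfy t7 < t8 < 0 < t9 < t10")
    if ratio < t7:
        return "complete monosomy"
    if ratio <= t8:            # t7 <= ratio <= t8
        return "monosomy mosaic"
    if ratio < t9:             # t8 < ratio < t9
        return "normal"
    if ratio <= t10:           # t9 <= ratio <= t10
        return "trisomy mosaic"
    return "complete trisomy"  # ratio > t10


for r in (-0.95, -0.5, 0.0, 0.5, 1.0):
    print(r, classify_mosaic_ratio(r))
```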
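In yet another aspect, the present invention provides a system for detecting fetal mosaicism. According to an embodiment of the present invention, the system includes:
a first cell-free fetal DNA fraction determination device, which is the above-described device for determining the fraction of cell-free nucleic acids in a biological sample and is used to sequence nucleic acids from the peripheral blood of a woman pregnant with a fetus so as to obtain a sequencing result consisting of a plurality of sequencing data, and to determine a first cell-free fetal DNA fraction based on the sequencing data; alternatively, the first cell-free fetal DNA fraction determination device is adapted to use, as the first cell-free fetal DNA fraction, the fetal concentration fra.chrY% estimated from the Y chromosome according to the following formula:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry is the first cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; a third cell-free fetal DNA fraction determination device, adapted to determine a third cell-free fetal DNA fraction based on the sequencing data derived from a predetermined chromosome in the sequencing result; and a mosaicism determination device, adapted to determine, based on the first and third cell-free fetal DNA fractions, whether the fetus is mosaic for the predetermined chromosome. According to an embodiment of the present invention, this system can effectively implement the above-described method for determining fetal mosaicism, and can thus effectively analyze fetal mosaicism.
According to an embodiment of the present invention, the above system for determining fetal mosaicism may further have the following additional technical features:
According to an embodiment of the present invention, the third cell-free fetal DNA fraction is determined based on the following formula: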
fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
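where
fra.chri is the third cell-free fetal DNA fraction, i is the number of the predetermined chromosome, and i is any integer from 1 to 22;
chri.ER% is the percentage of sequencing data derived from the predetermined chromosome in the total sequencing data of the sequencing result;
adjust.chri.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from the predetermined chromosome in the total sequencing data of peripheral blood samples from women pregnant with normal fetuses. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
According to an embodiment of the present invention, step (4) includes:
(a) determining the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
(b) comparing the ratio obtained in step (a) with a plurality of predetermined thresholds, so as to determine whether the fetus is mosaic for the predetermined chromosome. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
According to an embodiment of the present invention, the plurality of predetermined thresholds includes at least one of the following:
a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
an eighth threshold, determined based on a plurality of reference samples known to be monosomy mosaics for the predetermined chromosome;
a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome;
a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
According to an embodiment of the present invention, a ratio of the third to the first cell-free fetal DNA fraction smaller than the seventh threshold indicates that the fetus has complete monosomy of the predetermined chromosome;
a ratio not smaller than the seventh threshold and not greater than the eighth threshold indicates that the fetus has mosaic monosomy of the predetermined chromosome;
a ratio greater than the eighth threshold and smaller than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
a ratio not smaller than the ninth threshold and not greater than the tenth threshold indicates that the fetus has mosaic trisomy of the predetermined chromosome;
a ratio greater than the tenth threshold indicates that the fetus has complete trisomy of the predetermined chromosome.
According to an embodiment of the present invention, the seventh threshold is greater than -1 and smaller than 0, optionally -0.85;
the eighth threshold is greater than the seventh threshold and smaller than 0, optionally -0.3;
the ninth threshold is greater than 0 and smaller than 1, optionally 0.3;
the tenth threshold is greater than the ninth threshold and smaller than 1, optionally 0.85. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.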
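In yet another aspect, the present invention provides a method for detecting fetal mosaicism. According to an embodiment of the present invention, the method includes:
(1) sequencing cell-free nucleic acids from the peripheral blood of a woman pregnant with a fetus, so as to obtain a sequencing result consisting of a plurality of sequencing data;
(2) determining the proportion $x_i$ of the number of sequencing data derived from chromosome i in the total sequencing data of the sequencing result, where i is the chromosome number and is any integer from 1 to 22;
(3) determining the T value for chromosome i based on the formula $T_i=(x_i-\mu_i)/\sigma_i$, where i is the chromosome number and is any integer from 1 to 22, $\mu_i$ is the mean, over the samples selected as the reference set in the reference database, of the percentage of sequencing data from chromosome i in the total sequencing data, and $\sigma_i$ is the standard deviation of that percentage over the same reference set;
(4) determining the L value for chromosome i based on the formula $L_i=\log(d(T_i,a))/\log(d(T2_i,a))$, where i is the chromosome number and is any integer from 1 to 22,
$T2_i=(x_i-\mu_i(1+fra/2))/\sigma_i$;
$d(T_i,a)$ and $d(T2_i,a)$ are values of the probability density function of the t distribution, a is the number of degrees of freedom, and fra is the cell-free fetal DNA fraction determined by the above-described method for determining the fraction of cell-free nucleic acids in a biological sample;
(5) if T is not greater than 0, plotting a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, divided into regions by the lines T = a predetermined eleventh threshold and L = a predetermined twelfth threshold,
wherein
when the test sample falls in the first quadrant, the fetus is determined to have complete or mosaic monosomy of the predetermined chromosome;
when the test sample falls in the second quadrant, the fetus is determined to have mosaic monosomy of the predetermined chromosome;
when the test sample falls in the third quadrant, the fetus is determined to be normal for the predetermined chromosome;
when the test sample falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used;
if T is greater than 0, plotting a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, divided into regions by the lines T = a predetermined thirteenth threshold and L = a predetermined fourteenth threshold,
wherein
when the test sample falls in the first quadrant, the fetus is determined to have complete or mosaic trisomy of the predetermined chromosome;
when the test sample falls in the second quadrant, the fetus is determined to have mosaic trisomy of the predetermined chromosome;
when the test sample falls in the third quadrant, the fetus is determined to be normal for the predetermined chromosome;
when the test sample falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used;
optionally, the eleventh and thirteenth thresholds are each independently 3, and the twelfth and fourteenth thresholds are each independently 1.
In this way, fetal mosaicism can be analyzed effectively.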
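The two-branch quadrant rule in step (5) can be sketched as follows. The method does not say how a non-positive T is compared against a positive cutoff. This sketch therefore assumes, following the description in Example 3 later in this document, that the absolute values of T and L are used when T ≤ 0. The defaults 3 and 1 are the optional threshold values above.

```python
def classify_mosaic_quadrant(t: float, l: float, t_cut: float = 3.0,
                             l_cut: float = 1.0) -> str:
    """Quadrant rule: abscissa L, ordinate T, split at T = t_cut and L = l_cut."""
    if t <= 0:
        # Monosomy branch (eleventh/twelfth thresholds); absolute values are an assumption.
        t_eff, l_eff = abs(t), abs(l)
        first, second = "complete monosomy or monosomy mosaic", "monosomy mosaic"
    else:
        # Trisomy branch (thirteenth/fourteenth thresholds).
        t_eff, l_eff = t, l
        first, second = "complete trisomy or trisomy mosaic", "trisomy mosaic"
    if t_eff > t_cut and l_eff > l_cut:
        return first                                  # first quadrant
    if t_eff > t_cut:
        return second                                 # second quadrant
    if l_eff <= l_cut:
        return "normal"                               # third quadrant
    return "low fetal fraction: result not used"      # fourth quadrant


print(classify_mosaic_quadrant(5.0, 2.0), "|", classify_mosaic_quadrant(-5.0, 0.5))
```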
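Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from the following description, or will be learned through practice of the present invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a method for determining the fraction of cell-free nucleic acids in a biological sample according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for determining the number of nucleic acid molecules falling within a predetermined range according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a method for determining the length of a nucleic acid molecule according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method for determining a predetermined range according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a method for determining the fraction of cell-free nucleic acids of a predetermined origin according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a method for determining a predetermined function according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a device for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a counting apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a first length determination unit according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a predetermined range determination apparatus according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a cell-free nucleic acid fraction determination apparatus according to an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of a predetermined function determination apparatus according to an embodiment of the present invention;
Fig. 13 shows, according to an embodiment of the present invention, the linear fit and correlation coefficient between the cell-free fetal DNA fraction estimated from the Y chromosome and the frequency of DNA molecules with fragment lengths of 185 bp to 204 bp, for 37 samples from women known to be pregnant with normal male fetuses; and
Figs. 14-16 are four-quadrant diagrams of the T values and L values of 11 test samples according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below. The embodiments described below are exemplary, serve only to explain the present invention, and shall not be construed as limiting it.
Method for determining the fraction of cell-free nucleic acids in a biological sample
According to one aspect of the present invention, a method for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample is provided. The inventors surprisingly found that this method can accurately and efficiently determine the fraction of cell-free nucleic acids in a biological sample, and is particularly suitable for determining the fraction of fetal nucleic acids in the peripheral blood of pregnant women and the fraction of tumor nucleic acids in the peripheral blood of tumor patients.
It should be noted that the expression "fraction of cell-free nucleic acids of a predetermined origin in a biological sample" as used herein refers to the ratio of the number of cell-free nucleic acid molecules of a specific origin to the total number of cell-free nucleic acid molecules in the biological sample. For example, when the biological sample is the peripheral blood of a pregnant woman and the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids, this expression denotes the cell-free fetal nucleic acid fraction, i.e., the ratio of the number of cell-free fetal nucleic acid molecules to the total number of cell-free nucleic acid molecules in the maternal peripheral blood; it is sometimes also called the "cell-free fetal DNA concentration in maternal peripheral blood" or the cell-free fetal DNA fraction. As another example, when the biological sample is the peripheral blood of a tumor patient, a suspected tumor patient, or a person undergoing tumor screening, and the cell-free nucleic acids of the predetermined origin are cell-free tumor nucleic acids, this expression denotes the cell-free tumor nucleic acid fraction, i.e., the ratio of the number of cell-free tumor nucleic acid molecules to the total number of cell-free nucleic acid molecules in that person's peripheral blood. According to an embodiment of the present invention, referring to Fig. 1, the method includes: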
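S100: Nucleic acid sequencing
Nucleic acid sequencing is performed on a biological sample containing cell-free nucleic acids, so as to obtain a sequencing result of a plurality of sequencing data. According to an embodiment of the present invention, the biological sample is peripheral blood. According to an embodiment of the present invention, the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids in maternal peripheral blood, or cell-free tumor nucleic acids. This makes it easy to determine the fraction of cell-free fetal nucleic acids in maternal peripheral blood, or the fraction of cell-free tumor nucleic acids in the peripheral blood of a tumor patient, a suspected tumor patient, or a person undergoing tumor screening. According to some specific examples of the present invention, the nucleic acid is DNA. It should be noted that the term "sequencing data" as used herein means sequence reads, each corresponding to a sequenced nucleic acid molecule.
According to an embodiment of the present invention, the sequencing result includes the lengths of the cell-free nucleic acids.
According to an embodiment of the present invention, the sequencing is paired-end sequencing, single-end sequencing, or single-molecule sequencing. This makes it easy to obtain the lengths of the cell-free nucleic acids, facilitating the subsequent steps.
S200: Determining the number of nucleic acid molecules falling within a predetermined range
Based on the sequencing result, the number of nucleic acid molecules in the sample whose lengths fall within a predetermined range is determined.
It should be noted that the term "length" as used here refers to the length of a nucleic acid molecule (read) and may be expressed in base pairs (bp).
According to an embodiment of the present invention, referring to Fig. 2, S200 further includes:
S210: Aligning the sequencing result to a reference genome. Specifically, the sequencing result is aligned to a reference genome so as to construct a uniquely aligned sequencing data set, in which each sequencing data matches only one position of the reference genome, preferably with no mismatch, or with at most 1 or at most 2 mismatches.
S220: Determining the lengths of nucleic acid molecules. Specifically, the length of the nucleic acid molecule corresponding to each sequencing data in the uniquely aligned sequencing data set is determined.
S230: Determining the number of nucleic acid molecules falling within the predetermined range. Specifically, the number of nucleic acid molecules whose lengths fall within the predetermined range is determined.
In this way, the number of nucleic acid molecules in the sample whose lengths fall within the predetermined range can be determined easily, with accurate, reliable, and reproducible results.
According to some specific examples of the present invention, in step S220, the length of the sequence over which a sequencing data matches the reference genome is taken as the length of the nucleic acid molecule corresponding to that sequencing data. In this way, the length of the nucleic acid molecule corresponding to each sequencing data in the uniquely aligned sequencing data set can be determined accurately.
According to an embodiment of the present invention, the sequencing is paired-end sequencing, and, referring to Fig. 3, step S220 includes:
S2210: Determining the 5' end position. Specifically, the 5' end position of the nucleic acid on the reference genome is determined based on the sequencing data from one end of the paired-end sequencing data.
S2220: Determining the 3' end position. Specifically, the 3' end position of the nucleic acid on the reference genome is determined based on the sequencing data from the other end of the paired-end sequencing data.
S2230: Determining the length of the nucleic acid. Specifically, the length of the nucleic acid is determined based on its 5' end position and its 3' end position.
In this way, the length of the nucleic acid molecule corresponding to each sequencing data in the uniquely aligned sequencing data set can be determined accurately.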
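Steps S2210 to S2230 amount to taking the outer coordinates of a properly paired read pair. The sketch below assumes 1-based, inclusive, forward-strand coordinates for the leftmost aligned base of each mate, and that both mates map uniquely to the same chromosome. The function name is hypothetical.

```python
def fragment_length(pos1: int, len1: int, pos2: int, len2: int) -> int:
    """Fragment length from two uniquely aligned mates (1-based, inclusive)."""
    five_prime = min(pos1, pos2)                          # fragment start on the reference
    three_prime = max(pos1 + len1 - 1, pos2 + len2 - 1)   # fragment end on the reference
    return three_prime - five_prime + 1


# Mates of 19 bp and 12 bp, as in Example 1 below (positions are made up).
print(fragment_length(1000, 19, 1155, 12))  # -> 167
```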
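S300: Determining the cell-free nucleic acid fraction
Based on the number of nucleic acid molecules whose lengths fall within the predetermined range, the fraction of the cell-free nucleic acids in the biological sample is determined.
Further, the method of the present invention includes a step S400 (not shown in the figures) of determining the predetermined range. According to an embodiment of the present invention, the predetermined range is determined based on a plurality of control samples in which the cell-free nucleic acid fraction is known, which makes the predetermined range accurate and reliable. According to some embodiments of the present invention, the predetermined range is determined based on at least 20 control samples. According to an embodiment of the present invention, referring to Fig. 4, step S400 includes:
S410: Determining the lengths of cell-free nucleic acid molecules in the control samples. Specifically, the lengths of the cell-free nucleic acid molecules contained in the plurality of control samples are determined.
S420: Determining the frequency of cell-free nucleic acids in each candidate length range. Specifically, a plurality of candidate length ranges are set, and the frequency of cell-free nucleic acid molecules within each candidate length range is determined for each of the control samples.
S430: Determining correlation coefficients. Specifically, based on the frequencies of cell-free nucleic acid molecules within each candidate length range in the control samples and the fetal nucleic acid fractions of the control samples, the correlation coefficient between each candidate length range and the fetal fraction is determined.
S440: Selecting the predetermined range. Specifically, the candidate length range with the largest correlation coefficient is selected as the predetermined range.
In this way, the predetermined range can be determined accurately and effectively.
According to an embodiment of the present invention, each candidate length range spans 1 to 20 bp.
According to an embodiment of the present invention, the step size between the candidate length ranges is 1 to 2 bp.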
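Steps S410 to S440 can be sketched as a scan over candidate windows. The sketch assumes that each control sample is a list of fragment lengths in bp, and that "largest correlation" means the largest absolute Pearson coefficient, as in Example 1, where a negative R was selected. Windows in which every sample has the same frequency are skipped because their correlation is undefined. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None when either variable is constant."""
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        return None
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def range_frequency(lengths: Sequence[int], lo: int, hi: int) -> float:
    """Share of all molecules whose length lies in [lo, hi]."""
    return sum(1 for n in lengths if lo <= n <= hi) / len(lengths)


def select_range(samples: List[Sequence[int]], fractions: Sequence[float],
                 min_len: int, max_len: int, span: int,
                 step: int) -> Optional[Tuple[int, int, float]]:
    """Return (lo, hi, r) for the candidate window with the largest |r|."""
    best = None
    lo = min_len
    while lo + span - 1 <= max_len:
        hi = lo + span - 1
        r = pearson([range_frequency(s, lo, hi) for s in samples], fractions)
        if r is not None and (best is None or abs(r) > abs(best[2])):
            best = (lo, hi, r)
        lo += step
    return best
```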
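According to an embodiment of the present invention, referring to Fig. 5, step S300 further includes:
S310: Determining the frequency of cell-free nucleic acid molecules within the predetermined range. Specifically, based on the number of nucleic acid molecules whose lengths fall within the predetermined range, the frequency of cell-free nucleic acid molecules within the predetermined range is determined.
S320: Determining the cell-free nucleic acid fraction. Specifically, based on the frequency of cell-free nucleic acid molecules within the predetermined range, the fraction of cell-free nucleic acids in the biological sample is determined according to a predetermined function, which is determined based on the plurality of control samples.
In this way, the fraction of the cell-free nucleic acids in the biological sample can be determined effectively, with accurate, reliable, and reproducible results.
According to an embodiment of the present invention, the method further includes a step S500 (not shown in the figures) of determining the predetermined function. According to some specific examples of the present invention, referring to Fig. 6, step S500 includes:
S510: Determining the frequency of cell-free nucleic acids within the predetermined range in the control samples. Specifically, for each of the plurality of control samples, the frequency of cell-free nucleic acid molecules within the predetermined range is determined.
S520: Fitting. Specifically, the frequencies of cell-free nucleic acid molecules within the predetermined range in the control samples are fitted against the known cell-free nucleic acid fractions, so as to determine the predetermined function.
In this way, the predetermined function is accurate and reliable, facilitating the subsequent steps.
According to an embodiment of the present invention, the fitting is a linear fitting.
According to an embodiment of the present invention, the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids in maternal peripheral blood, and the predetermined range is 185 to 204 bp. Based on this range, the fraction of cell-free nucleic acids in the biological sample can be determined accurately.
According to an embodiment of the present invention, the predetermined function is d=0.0334*p+1.6657, where d is the cell-free fetal nucleic acid fraction and p is the frequency of cell-free nucleic acid molecules within the predetermined range. Based on this function, the fraction of the cell-free nucleic acids in the biological sample can be determined effectively, with accurate, reliable, and reproducible results. It should be noted that the expression "frequency of cell-free nucleic acid molecules within the predetermined range" as used herein denotes the ratio of the number of cell-free nucleic acid molecules in the biological sample whose lengths lie within the predetermined length range to the total number of nucleic acid molecules.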
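The fitting of step S520 and the application of the resulting function in step S320 can be sketched with ordinary least squares. This is an illustrative sketch. The default coefficients 0.0334 and 1.6657 are the values reported above. This section does not state the units of p and d, so any example input is arbitrary.

```python
from typing import Sequence, Tuple


def fit_linear(p: Sequence[float], d: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope a and intercept b for d = a * p + b."""
    n = len(p)
    mp, md = sum(p) / n, sum(d) / n
    a = (sum((pi - mp) * (di - md) for pi, di in zip(p, d))
         / sum((pi - mp) ** 2 for pi in p))
    return a, md - a * mp


def predict_fraction(p: float, a: float = 0.0334, b: float = 1.6657) -> float:
    """Apply the predetermined function d = a * p + b to a test sample."""
    return a * p + b


a, b = fit_linear([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])   # toy control data
print(a, b)                                           # -> 2.0 1.0
```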
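According to an embodiment of the present invention, the control samples are maternal peripheral blood samples with known cell-free fetal nucleic acid fractions. This allows the predetermined range to be determined accurately.
According to an embodiment of the present invention, the control samples are peripheral blood samples from women pregnant with normal male fetuses, with known cell-free fetal nucleic acid fractions determined using the Y chromosome. This allows the predetermined range to be determined accurately.
According to an embodiment of the present invention, the control samples are peripheral blood samples from women pregnant with normal fetuses. This allows the predetermined range to be determined accurately.
According to an embodiment of the present invention, the cell-free nucleic acid fraction of the control samples is the cell-free fetal DNA fraction, estimated using the Y chromosome. In this way, the known fractions of the control samples can be used effectively to determine the predetermined range, and thereby to determine both the number of nucleic acid molecules in the test maternal sample whose lengths fall within the predetermined range and the cell-free fetal DNA fraction of the test maternal sample.
According to some other embodiments of the present invention, the method may further include the following steps:
1) Whole-genome sequencing (WGS): the test sample is subjected to whole-genome sequencing on a high-throughput platform. Cell-free fetal DNA in plasma is short and rarely exceeds 300 bp, and the lengths of all cell-free DNA molecules need to be obtained; therefore, single-end sequencing must read through each entire cell-free DNA molecule, or paired-end sequencing is used instead.
2) Obtaining uniquely aligned reads (Unique Reads): the sequencing reads of the test sample are aligned to the reference genome sequence.
3) Determining, from the alignment information of the Unique Reads, the length of each DNA molecule represented by a Unique Read: the lengths of all molecules in the test sample that align uniquely to the reference genome sequence are counted.
4) Selecting one or more length regions with strong correlation: based on the distribution of DNA molecule lengths, one or more regions with strong correlation are identified.
5) Obtaining the functional relationship: the functional relationship between the DNA molecule frequency in the one or more strongly correlated length regions obtained in step 4) and the known cell-free fetal DNA fraction is established.
6) Counting the frequency of DNA molecules in the selected region(s), i.e., the frequency of DNA molecules at that length or those lengths.
7) Substituting the test sample's DNA molecule frequency in the one or more length regions into the functional relationship to obtain the cell-free fetal DNA fraction of the test sample.
Step 4) above specifically includes the following steps:
I. Selecting reference samples, i.e., samples with known cell-free fetal DNA fractions.
II. Performing WGS on all samples and obtaining, from the unique alignment information of the Unique Reads to the chromosomes, the length of each DNA molecule fragment represented by a Unique Read.
III. Counting, for all reference samples, the number of DNA molecules at each length from 0 bp to M bp (M is the maximum DNA molecule length; cell-free DNA molecule lengths can extend to about 400 bp).
IV. Taking a certain length as the window length and moving the window by a certain step size to define multiple windows, and counting the frequency of DNA molecules in each window, i.e., the frequency of DNA molecules at that length. The frequency of DNA molecules in a window is defined as the number of DNA molecules in that window (i.e., whose lengths lie within the range defined by the window) divided by the total number of molecules. For example, windows of 1 bp, 5 bp, 10 bp, 15 bp, ... may be used, with step sizes from 1 bp up to the window length. Specifically, with a 5 bp window and a 2 bp step, the distribution of DNA molecules in the windows [1bp,5bp], [3bp,7bp], [5bp,9bp], [7bp,11bp], ... can be counted; with a 5 bp window and a 5 bp step, the distribution in [1bp,5bp], [6bp,10bp], [11bp,15bp], ... can be counted. Here, the "total number of molecules" is the total number of DNA molecules of all lengths.
V. Identifying the windows, or combinations of windows, whose DNA molecule frequencies correlate relatively strongly with the known cell-free fetal DNA fractions of the samples: functional relationships are established, and the windows with relatively strong correlation are selected or combined, i.e., one or more strongly correlated length regions are selected.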
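Steps III and IV can be sketched as follows. This minimal illustration assumes 1-based, inclusive windows that start at 1 bp. As stated in step IV, the denominator is the number of molecules of all lengths, including any that are longer than the last window.

```python
from collections import Counter
from typing import Dict, Iterator, Sequence, Tuple


def sliding_windows(max_len: int, width: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive windows [start, start + width - 1] up to max_len."""
    start = 1
    while start + width - 1 <= max_len:
        yield start, start + width - 1
        start += step


def window_frequencies(lengths: Sequence[int], max_len: int, width: int,
                       step: int) -> Dict[Tuple[int, int], float]:
    """Frequency of DNA molecules in each window = count in window / all molecules."""
    counts = Counter(lengths)   # step III: number of molecules at each length
    total = len(lengths)
    return {(lo, hi): sum(counts[n] for n in range(lo, hi + 1)) / total
            for lo, hi in sliding_windows(max_len, width, step)}


print(list(sliding_windows(11, 5, 2)))   # -> [(1, 5), (3, 7), (5, 9), (7, 11)]
print(list(sliding_windows(15, 5, 5)))   # -> [(1, 5), (6, 10), (11, 15)]
```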
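Device for determining the fraction of cell-free nucleic acids in a biological sample
According to another aspect of the present invention, a device for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample is further provided. The inventors surprisingly found that this device is suitable for implementing the above-described method for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample. The device can therefore accurately and efficiently determine that fraction, and is particularly suitable for determining the fraction of cell-free fetal nucleic acids in maternal peripheral blood and of cell-free tumor nucleic acids in the peripheral blood of tumor patients, suspected tumor patients, or persons undergoing tumor screening.
According to an embodiment of the present invention, referring to Fig. 7, the device includes: a sequencing apparatus 100, a counting apparatus 200, and a cell-free nucleic acid fraction determination apparatus 300.
Specifically, the sequencing apparatus 100 is used to sequence nucleic acids in a biological sample containing cell-free nucleic acids, so as to obtain a sequencing result of a plurality of sequencing data; the counting apparatus 200 is connected to the sequencing apparatus 100 and is used to determine, based on the sequencing result, the number of nucleic acid molecules in the sample whose lengths fall within a predetermined range; and the cell-free nucleic acid fraction determination apparatus 300 is connected to the counting apparatus 200 and is used to determine, based on that number, the fraction of cell-free nucleic acids of the predetermined origin in the biological sample.
According to an embodiment of the present invention, the type of biological sample is not particularly limited. According to a specific example of the present invention, the biological sample is peripheral blood. According to an embodiment of the present invention, the cell-free nucleic acids are cell-free fetal nucleic acids or maternally derived cell-free nucleic acids in maternal peripheral blood, or cell-free tumor nucleic acids or non-tumor-derived cell-free nucleic acids in the peripheral blood of a tumor patient. This makes it easy to determine the fraction of cell-free fetal nucleic acids in maternal peripheral blood, or the fraction of cell-free tumor nucleic acids in the peripheral blood of a tumor patient, a suspected tumor patient, or a person undergoing tumor screening. According to an embodiment of the present invention, the nucleic acid is DNA.
According to an embodiment of the present invention, the sequencing result includes the lengths of the cell-free nucleic acids.
According to an embodiment of the present invention, the sequencing is paired-end sequencing, single-end sequencing, or single-molecule sequencing. This makes it easy to obtain the lengths of the cell-free nucleic acids, facilitating the subsequent steps.
According to an embodiment of the present invention, referring to Fig. 8, the counting apparatus 200 further includes: an alignment unit 210, a first length determination unit 220, and a number determination unit 230. Specifically, the alignment unit 210 is used to align the sequencing result to a reference genome so as to construct a uniquely aligned sequencing data set, in which each sequencing data matches only one position of the reference genome; the first length determination unit 220 is connected to the alignment unit 210 and is used to determine the length of the nucleic acid molecule corresponding to each sequencing data in the uniquely aligned sequencing data set; and the number determination unit 230 is connected to the first length determination unit 220 and is used to determine the number of nucleic acid molecules whose lengths fall within the predetermined range. In this way, the number of nucleic acid molecules in the sample whose lengths fall within the predetermined range can be determined easily, with accurate, reliable, and reproducible results.
According to an embodiment of the present invention, the first length determination unit 220 takes the length of the sequence over which a sequencing data matches the reference genome as the length of the corresponding nucleic acid molecule. In this way, the length of the nucleic acid molecule corresponding to each sequencing data in the uniquely aligned sequencing data set can be determined accurately.
According to an embodiment of the present invention, the sequencing is paired-end sequencing, and, referring to Fig. 9, the first length determination unit 220 further includes: a 5' end position determination module 2210, a 3' end position determination module 2220, and a length calculation module 2230. Specifically, the 5' end position determination module 2210 is used to determine, on the reference genome, the 5' end position of the nucleic acid based on the sequencing data from one end of the paired-end sequencing data; the 3' end position determination module 2220 is connected to the 5' end position determination module 2210 and is used to determine, on the reference genome, the 3' end position of the nucleic acid based on the sequencing data from the other end; and the length calculation module 2230 is connected to the 3' end position determination module 2220 and is used to determine the length of the nucleic acid based on its 5' and 3' end positions. In this way, the length of the nucleic acid molecule corresponding to each sequencing data in the uniquely aligned sequencing data set can be determined accurately.
According to an embodiment of the present invention, the device further includes a predetermined range determination apparatus 400, used to determine the predetermined range based on a plurality of control samples in which the fraction of cell-free nucleic acids of the predetermined origin is known; optionally, the predetermined range is determined based on at least 20 control samples.
According to an embodiment of the present invention, referring to Fig. 10, the predetermined range determination apparatus 400 further includes: a second length determination unit 410, a first proportion determination unit 420, a correlation coefficient determination unit 430, and a predetermined range determination unit 440. Specifically, the second length determination unit 410 is used to determine the lengths of the cell-free nucleic acid molecules contained in the control samples; the first proportion determination unit 420 is connected to the second length determination unit 410 and is used to set a plurality of candidate length ranges and to determine, for each control sample, the frequency of cell-free nucleic acid molecules within each candidate length range; the correlation coefficient determination unit 430 is connected to the first proportion determination unit 420 and is used to determine, based on those frequencies and on the fractions of cell-free nucleic acids of the predetermined origin in the control samples, the correlation coefficient between each candidate length range and that fraction; and the predetermined range determination unit 440 is connected to the correlation coefficient determination unit 430 and is used to select the candidate length range with the largest correlation coefficient as the predetermined range. In this way, the predetermined range can be determined accurately and effectively.
According to an embodiment of the present invention, each candidate length range spans 1 to 20 bp.
According to an embodiment of the present invention, the step size between the candidate length ranges is 1 to 2 bp.
According to an embodiment of the present invention, referring to Fig. 11, the cell-free nucleic acid fraction determination apparatus 300 further includes: a second proportion determination unit 310 and a cell-free nucleic acid fraction calculation unit 320. Specifically, the second proportion determination unit 310 is used to determine, based on the number of nucleic acid molecules whose lengths fall within the predetermined range, the frequency of cell-free nucleic acid molecules within the predetermined range; and the cell-free nucleic acid fraction calculation unit 320 is connected to the second proportion determination unit 310 and is used to determine, from that frequency and according to a predetermined function determined from the control samples, the fraction of cell-free nucleic acids in the biological sample. In this way, that fraction can be determined effectively, with accurate, reliable, and reproducible results.
According to an embodiment of the present invention, the device further includes a predetermined function determination apparatus 500 which, referring to Fig. 12, includes a third proportion determination unit 510 and a fitting unit 520. Specifically, the third proportion determination unit 510 is used to determine, for each control sample, the frequency of cell-free nucleic acid molecules within the predetermined range; and the fitting unit 520 is connected to the third proportion determination unit 510 and is used to fit these frequencies against the known cell-free nucleic acid fractions so as to determine the predetermined function. In this way, the predetermined function is accurate and reliable, facilitating the subsequent steps. According to an embodiment of the present invention, the fitting is a linear fitting.
According to an embodiment of the present invention, the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids in maternal peripheral blood, and the predetermined range is 185 to 204 bp. Based on this range, the fraction of cell-free nucleic acids in the biological sample can be determined accurately.
According to an embodiment of the present invention, the predetermined function is d=0.0334*p+1.6657, where d is the cell-free fetal nucleic acid fraction and p is the frequency of cell-free nucleic acid molecules within the predetermined range. Based on this function, the fraction of the cell-free nucleic acids in the biological sample can be determined effectively, with accurate, reliable, and reproducible results.
According to an embodiment of the present invention, the control samples are maternal peripheral blood samples with known cell-free fetal nucleic acid fractions.
According to an embodiment of the present invention, the control samples are peripheral blood samples from women pregnant with normal male fetuses, with known cell-free fetal nucleic acid fractions determined using the Y chromosome. This allows the predetermined range to be determined accurately.
According to an embodiment of the present invention, the control samples are peripheral blood samples from women pregnant with normal male fetuses. This allows the predetermined range to be determined accurately.
According to an embodiment of the present invention, the cell-free nucleic acid fraction of the control samples is the cell-free fetal DNA fraction, obtained using a device adapted for Y-chromosome-based estimation. In this way, the known fractions of the control samples can be used effectively to determine the predetermined range, and thereby to determine both the number of nucleic acid molecules in the test maternal sample whose lengths fall within the predetermined range and the cell-free fetal DNA fraction of the test maternal peripheral blood sample.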
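Method and system for determining the sex of twins
According to yet another aspect of the present invention, a method for determining the sex of twins is provided. According to an embodiment of the present invention, the method includes: (1) sequencing cell-free nucleic acids from the peripheral blood of a woman pregnant with twins, so as to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining, based on the sequencing data, a first cell-free fetal DNA fraction according to the above-described method for determining the fraction of cell-free nucleic acids in a biological sample; (3) determining a second cell-free fetal DNA fraction based on the sequencing data derived from the Y chromosome in the sequencing result; and (4) determining the sex of the twins based on the first and second cell-free fetal DNA fractions. The inventors surprisingly found that this method can accurately and efficiently determine the sex of twins carried by a pregnant woman.
According to an embodiment of the present invention, the second cell-free fetal DNA fraction is determined based on the following formula: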
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%
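where fra.chry is the second cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in samples from normal men. In this way, the second cell-free fetal DNA fraction can be determined accurately.
According to an embodiment of the present invention, step (4) includes: (a) determining the ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and (b) comparing the ratio obtained in step (a) with predetermined first and second thresholds, so as to determine the sex of the twins. In this way, the sex of the twins can be determined effectively.
According to an embodiment of the present invention, the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
According to an embodiment of the present invention, a ratio of the second to the first cell-free fetal DNA fraction smaller than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first or second threshold, or between them, indicates that the twins consist of one male and one female fetus.
According to an embodiment of the present invention, the first threshold is 0.35 and the second threshold is 0.7.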
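The comparison can be sketched as shown below, assuming that both fractions are on the same scale. The rationale given here is an interpretation and is not stated in the text. fra.chry counts only male fetal DNA, while the size-based fraction counts all fetal DNA. The ratio is therefore expected near 0 for two girls, near one half for a boy and a girl with similar contributions, and near 1 for two boys.

```python
def twin_sex(fra_chry: float, fra_size: float, first: float = 0.35,
             second: float = 0.7) -> str:
    """Classify twin sex from the ratio fra.chry / (size-based fetal fraction)."""
    if fra_size <= 0:
        raise ValueError("size-based fetal fraction must be positive")
    ratio = fra_chry / fra_size
    if ratio < first:
        return "both female"
    if ratio > second:
        return "both male"
    return "one male, one female"   # first <= ratio <= second


print(twin_sex(1.0, 10.0), "|", twin_sex(5.0, 10.0), "|", twin_sex(9.0, 10.0))
```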
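According to another aspect of the present invention, a system for determining the sex of twins is provided. According to an embodiment of the present invention, the system includes: a first cell-free fetal DNA fraction determination device, which is the above-described device for determining the fraction of cell-free nucleic acids in a biological sample and is used to sequence cell-free nucleic acids from the peripheral blood of a woman pregnant with twins so as to obtain a sequencing result consisting of a plurality of sequencing data, and to determine a first cell-free fetal DNA fraction based on the sequencing data; a second cell-free fetal DNA fraction determination device, adapted to determine a second cell-free fetal DNA fraction based on the sequencing data derived from the Y chromosome in the sequencing result; and a sex determination device, adapted to determine the sex of the twins based on the first and second cell-free fetal DNA fractions. The inventors surprisingly found that this system can accurately and efficiently determine the sex of twins carried by a pregnant woman.
According to an embodiment of the present invention, the second cell-free fetal DNA fraction is determined based on the following formula: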
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%
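where fra.chry is the second cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men. In this way, the second cell-free fetal DNA fraction can be determined accurately.
According to an embodiment of the present invention, the sex determination device further includes: a ratio determination unit, used to determine the ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and a comparison unit, used to compare the obtained ratio with predetermined first and second thresholds so as to determine the sex of the twins. In this way, the sex of the twins can be determined effectively.
According to an embodiment of the present invention, the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
According to an embodiment of the present invention, a ratio of the second to the first cell-free fetal DNA fraction smaller than the first threshold indicates that both twins are female; a ratio greater than the second threshold indicates that both twins are male; and a ratio equal to the first or second threshold, or between them, indicates that the twins consist of one male and one female fetus.
According to an embodiment of the present invention, the first threshold is 0.35 and the second threshold is 0.7.
Method and system for detecting chromosomal aneuploidy in twins
According to yet another aspect of the present invention, a method for detecting chromosomal aneuploidy in twins is provided. According to an embodiment of the present invention, the method includes: (1) sequencing cell-free nucleic acids from the peripheral blood of a woman pregnant with twins, so as to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining, based on the sequencing data, a first cell-free fetal DNA fraction according to the above-described method for determining the fraction of cell-free nucleic acids in a biological sample. According to an embodiment of the present invention, the fetal concentration fra.chrY% estimated using the Y chromosome according to the following formula may also be used as the first cell-free fetal DNA fraction:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%,
where fra.chry is the first cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; (3) determining a third cell-free fetal DNA fraction based on the sequencing data derived from a predetermined chromosome in the sequencing result; and (4) determining, based on the first and third cell-free fetal DNA fractions, whether the twins are aneuploid for the predetermined chromosome. In this way, chromosomal aneuploidy in twins can be detected accurately and effectively.
According to an embodiment of the present invention, the third cell-free fetal DNA fraction is determined based on the formula fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%, where fra.chri is the third cell-free fetal DNA fraction, i is the number of the predetermined chromosome and is any integer from 1 to 22, chri.ER% is the percentage of sequencing data derived from the predetermined chromosome in the total sequencing data of the sequencing result, and adjust.chri.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from the predetermined chromosome in the total sequencing data of peripheral blood samples from women pregnant with normal twins. In this way, the third cell-free fetal DNA fraction can be determined accurately.
According to an embodiment of the present invention, step (4) includes: (a) determining the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and (b) comparing the ratio obtained in step (a) with predetermined third and fourth thresholds, so as to determine whether the twins are aneuploid for the predetermined chromosome. In this way, chromosomal aneuploidy in twins can be detected effectively.
According to an embodiment of the present invention, the third threshold is determined based on a plurality of reference samples in which neither twin is known to be aneuploid for the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to be aneuploid for the predetermined chromosome.
According to an embodiment of the present invention, a ratio of the third to the first cell-free fetal DNA fraction smaller than the third threshold indicates that neither twin is aneuploid for the predetermined chromosome; a ratio greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome; and a ratio equal to the third or fourth threshold, or between them, indicates that one twin is aneuploid for the predetermined chromosome and the other is not.
According to an embodiment of the present invention, the third threshold is 0.35 and the fourth threshold is 0.7.
According to an embodiment of the present invention, the predetermined chromosome is at least one of chromosomes 13, 18 and 21.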
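Steps (3) and (4) can be combined into one small function. The sketch assumes that the first fetal fraction is expressed in percent, like fra.chri, and that the ER% values for chromosome i are already GC-corrected. The default thresholds are 0.35 and 0.7 as above, and the input values in the usage line are hypothetical.

```python
def twin_aneuploidy(chri_er: float, adjust_chri_er: float, fra_first: float,
                    third: float = 0.35, fourth: float = 0.7) -> str:
    """Classify twins for chromosome i from fra.chri / first fetal fraction."""
    fra_chri = 2.0 * (chri_er / adjust_chri_er - 1.0) * 100.0   # third fraction, %
    ratio = fra_chri / fra_first
    if ratio < third:
        return "neither twin aneuploid"
    if ratio > fourth:
        return "both twins aneuploid"
    return "one twin aneuploid"      # third <= ratio <= fourth


# Hypothetical chr21 ER% values with a 10% first fetal fraction.
print(twin_aneuploidy(1.30 * 1.025, 1.30, 10.0))
```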
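According to still another aspect of the present invention, a system for determining chromosomal aneuploidy in twins is provided. According to an embodiment of the present invention, the system includes: a first cell-free fetal DNA fraction determination device, which is the above-described device for determining the fraction of cell-free nucleic acids in a biological sample and is used to sequence cell-free nucleic acids from the peripheral blood of a woman pregnant with twins so as to obtain a sequencing result consisting of a plurality of sequencing data, and to determine a first cell-free fetal DNA fraction based on the sequencing data. According to an embodiment of the present invention, the first cell-free fetal DNA fraction determination device is adapted to use, as the first cell-free fetal DNA fraction, the fetal concentration fra.chrY% estimated from the Y chromosome according to the following formula:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry is the first cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; a third cell-free fetal DNA fraction determination device, adapted to determine a third cell-free fetal DNA fraction based on the sequencing data derived from a predetermined chromosome in the sequencing result; and a first aneuploidy determination device, adapted to determine, based on the first and third cell-free fetal DNA fractions, whether the twins are aneuploid for the predetermined chromosome. The inventors surprisingly found that this system enables accurate and effective detection of chromosomal aneuploidy in twins.
According to an embodiment of the present invention, the third cell-free fetal DNA fraction is determined based on the formula fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%, where fra.chri is the third cell-free fetal DNA fraction, i is the number of the predetermined chromosome and is any integer from 1 to 22, chri.ER% is the percentage of sequencing data derived from the predetermined chromosome in the total sequencing data of the sequencing result, and adjust.chri.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from the predetermined chromosome in the total sequencing data of peripheral blood samples from women pregnant with normal twins. In this way, the third cell-free fetal DNA fraction can be determined accurately.
According to an embodiment of the present invention, the first aneuploidy determination device further includes: a ratio determination unit, used to determine the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and a comparison unit, used to compare the obtained ratio with predetermined third and fourth thresholds so as to determine whether the twins are aneuploid for the predetermined chromosome. In this way, chromosomal aneuploidy in twins can be detected effectively.
According to an embodiment of the present invention, the third threshold is determined based on a plurality of reference samples in which neither twin is known to be aneuploid for the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to be aneuploid for the predetermined chromosome.
According to an embodiment of the present invention, a ratio of the third to the first cell-free fetal DNA fraction smaller than the third threshold indicates that neither twin is aneuploid for the predetermined chromosome; a ratio greater than the fourth threshold indicates that both twins are aneuploid for the predetermined chromosome; and a ratio equal to the third or fourth threshold, or between them, indicates that one twin is aneuploid for the predetermined chromosome and the other is not. In this way, chromosomal aneuploidy in twins can be detected effectively.
According to an embodiment of the present invention, the third threshold is 0.35 and the fourth threshold is 0.7.
According to an embodiment of the present invention, the predetermined chromosome is at least one of chromosomes 13, 18 and 21.
According to another aspect of the present invention, a further method for determining chromosomal aneuploidy in twins is provided. According to an embodiment of the present invention, the method includes: (1) sequencing cell-free nucleic acids from the peripheral blood of a woman pregnant with twins, so as to obtain a sequencing result consisting of a plurality of sequencing data; (2) determining the proportion $x_i$ of the number of sequencing data derived from chromosome i in the total sequencing data of the sequencing result, where i is the chromosome number and is any integer from 1 to 22; (3) determining the T value for chromosome i based on the formula $T_i=(x_i-\mu_i)/\sigma_i$, where i is the chromosome number and is any integer from 1 to 22, $\mu_i$ is the mean, over the samples selected as the reference set in the reference database, of the percentage of sequencing data from chromosome i in the total sequencing data, and $\sigma_i$ is the standard deviation of that percentage over the same reference set; (4) determining the L value for chromosome i based on the formula $L_i=\log(d(T_i,a))/\log(d(T2_i,a))$, where i is the chromosome number and is any integer from 1 to 22, $T2_i=(x_i-\mu_i(1+fra/2))/\sigma_i$, $d(T_i,a)$ and $d(T2_i,a)$ are values of the probability density function of the t distribution, a is the number of degrees of freedom, and fra is the cell-free fetal DNA fraction determined by the above-described method for determining the fraction of cell-free nucleic acids in a biological sample, or the fetal concentration fra.chrY% estimated using the Y chromosome, wherein
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry is the cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; (5) plotting a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, divided into regions by the lines T = a predetermined fifth threshold and L = a predetermined sixth threshold, wherein, when the test sample falls in the first quadrant, both fetuses of the twins are determined to be trisomic; when it falls in the second quadrant, one fetus is determined to be trisomic and the other normal; when it falls in the third quadrant, both fetuses are determined to be normal; and when it falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used. The inventors surprisingly found that this method enables accurate and efficient detection of chromosomal aneuploidy in twin pregnancies, i.e., determining whether a predetermined chromosome of the twins is aneuploid.
According to another aspect of the present invention, a further system for determining chromosomal aneuploidy in twins is provided. According to an embodiment of the present invention, the system includes: an $x_i$ determination device, used to sequence cell-free nucleic acids from the peripheral blood of a woman pregnant with twins so as to obtain a sequencing result consisting of a plurality of sequencing data, and to determine the proportion $x_i$ of the number of sequencing data derived from chromosome i in the total sequencing data, where i is the chromosome number and is any integer from 1 to 22; a T value determination device, used to determine the T value for chromosome i based on the formula $T_i=(x_i-\mu_i)/\sigma_i$, where i is the chromosome number and is any integer from 1 to 22, $\mu_i$ is the mean, over the samples selected as the reference set in the reference database, of the percentage of sequencing data from chromosome i in the total sequencing data, and $\sigma_i$ is the standard deviation of that percentage over the same reference set; an L value determination device, used to determine the L value for chromosome i based on the formula $L_i=\log(d(T_i,a))/\log(d(T2_i,a))$, where i is the chromosome number and is any integer from 1 to 22, $T2_i=(x_i-\mu_i(1+fra/2))/\sigma_i$, $d(T_i,a)$ and $d(T2_i,a)$ are values of the probability density function of the t distribution, a is the number of degrees of freedom, and fra is the cell-free fetal DNA fraction determined by the method according to claims 1 to 20, or the fetal concentration fra.chrY% estimated using the Y chromosome, wherein
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry is the cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; and a second aneuploidy determination device, adapted to plot a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, divided into regions by the lines T = a predetermined fifth threshold and L = a predetermined sixth threshold, wherein, when the test sample falls in the first quadrant, both fetuses of the twins are determined to be trisomic; when it falls in the second quadrant, one fetus is determined to be trisomic and the other normal; when it falls in the third quadrant, both fetuses are determined to be normal; and when it falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used. The inventors found that this system enables accurate and efficient detection of chromosomal aneuploidy in twin pregnancies, i.e., determining whether a predetermined chromosome of the twins is aneuploid.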
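It should be noted that:
the "reference database" in the definition of $\mu_i$ (the mean percentage of sequencing data from chromosome i among the samples selected as the reference set in the reference database) refers to the cell-free nucleic acid sequencing data from the peripheral blood of women pregnant with normal fetuses (male or female, singleton or twin);
the "sequencing data" in the expression "sequencing data derived from chromosome Y" used above are the reads obtained by sequencing.
According to some specific examples of the present invention, the terms "$x_i$", "$ER_i$" and "Chri.ER%" are used interchangeably herein; that is, $x_i$ may be a GC-corrected result. Specifically, the UR and GC content of each chromosome can be fitted using data from known normal samples to obtain the relationship $ER_i=f_i(GC_i)+\varepsilon_i$, and the mean UR value is calculated as follows: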
Figure PCTCN2015085109-appb-000001
For the sample to be analyzed, the corrected ER value is calculated from the above relationship and the sample's ER and GC values:
Figure PCTCN2015085109-appb-000002
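Method and system for detecting fetal mosaicism
In yet another aspect, the present invention provides a method for detecting fetal mosaicism. According to an embodiment of the present invention, the method includes: (1) sequencing nucleic acids from the peripheral blood of a pregnant woman so as to obtain a sequencing result consisting of a plurality of sequencing data, where, optionally, the fetus is male; (2) determining, based on the sequencing data, a first cell-free fetal DNA fraction according to the method described above, or alternatively using, as the first cell-free fetal DNA fraction, the fetal concentration fra.chrY% estimated from the Y chromosome according to the following formula:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry is the first cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; (3) determining a third cell-free fetal DNA fraction based on the sequencing data derived from a predetermined chromosome in the sequencing result; and (4) determining, based on the first and third cell-free fetal DNA fractions, whether the fetus is mosaic for the predetermined chromosome. In this way, mosaicism for a specific chromosome in the fetus can be analyzed accurately.
According to an embodiment of the present invention, the method may further have the following additional technical features:
According to an embodiment of the present invention, the third cell-free fetal DNA fraction is determined based on the following formula: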
fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
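where
fra.chri is the third cell-free fetal DNA fraction, i is the number of the predetermined chromosome, and i is any integer from 1 to 22;
chri.ER% is the percentage of sequencing data derived from the predetermined chromosome in the total sequencing data of the sequencing result;
adjust.chri.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from the predetermined chromosome in the total sequencing data of peripheral blood samples from women pregnant with normal fetuses. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
According to an embodiment of the present invention, step (4) includes:
(a) determining the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
(b) comparing the ratio obtained in step (a) with a plurality of predetermined thresholds, so as to determine whether the fetus is mosaic for the predetermined chromosome. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
According to an embodiment of the present invention, the plurality of predetermined thresholds includes at least one of the following:
a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
an eighth threshold, determined based on a plurality of reference samples known to be monosomy mosaics for the predetermined chromosome;
a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome;
a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
According to an embodiment of the present invention, a ratio of the third to the first cell-free fetal DNA fraction smaller than the seventh threshold indicates that the fetus has complete monosomy of the predetermined chromosome;
a ratio not smaller than the seventh threshold and not greater than the eighth threshold indicates that the fetus has mosaic monosomy of the predetermined chromosome;
a ratio greater than the eighth threshold and smaller than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
a ratio not smaller than the ninth threshold and not greater than the tenth threshold indicates that the fetus has mosaic trisomy of the predetermined chromosome;
a ratio greater than the tenth threshold indicates that the fetus has complete trisomy of the predetermined chromosome.
According to an embodiment of the present invention, the seventh threshold is at least -1 and smaller than 0, optionally -0.85;
the eighth threshold is greater than the seventh threshold and smaller than 0, optionally -0.3;
the ninth threshold is greater than 0 and smaller than 1, optionally 0.3;
the tenth threshold is greater than the ninth threshold and smaller than 1, optionally 0.85. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
In yet another aspect, the present invention provides a system for detecting fetal mosaicism. According to an embodiment of the present invention, the system includes:
a first cell-free fetal DNA fraction determination device, which is the above-described device for determining the fraction of cell-free nucleic acids in a biological sample and is used to sequence nucleic acids from the peripheral blood of a woman pregnant with a fetus so as to obtain a sequencing result consisting of a plurality of sequencing data, and to determine a first cell-free fetal DNA fraction based on the sequencing data; alternatively, the first cell-free fetal DNA fraction determination device is adapted to use, as the first cell-free fetal DNA fraction, the fetal concentration fra.chrY% estimated from the Y chromosome according to the following formula:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%, where fra.chry is the first cell-free fetal DNA fraction, chry.ER% is the percentage of sequencing data derived from chromosome Y in the total sequencing data of the sequencing result, Female.chry.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from chromosome Y in the total sequencing data of peripheral blood samples from women pregnant with normal female fetuses, and Man.chry.ER% is the predetermined mean of the same percentage in peripheral blood samples from normal men; a third cell-free fetal DNA fraction determination device, adapted to determine a third cell-free fetal DNA fraction based on the sequencing data derived from a predetermined chromosome in the sequencing result; and a mosaicism determination device, adapted to determine, based on the first and third cell-free fetal DNA fractions, whether the fetus is mosaic for the predetermined chromosome. According to an embodiment of the present invention, this system can effectively implement the above-described method for determining fetal mosaicism, and can thus effectively analyze fetal mosaicism.
According to an embodiment of the present invention, the above system for determining fetal mosaicism may further have the following additional technical features:
According to an embodiment of the present invention, the third cell-free fetal DNA fraction is determined based on the following formula: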
fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
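where
fra.chri is the third cell-free fetal DNA fraction, i is the number of the predetermined chromosome, and i is any integer from 1 to 22;
chri.ER% is the percentage of sequencing data derived from the predetermined chromosome in the total sequencing data of the sequencing result;
adjust.chri.ER% is the predetermined mean percentage of sequencing data of cell-free nucleic acids derived from the predetermined chromosome in the total sequencing data of peripheral blood samples from women pregnant with normal fetuses. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
According to an embodiment of the present invention, step (4) includes:
(a) determining the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
(b) comparing the ratio obtained in step (a) with a plurality of predetermined thresholds, so as to determine whether the fetus is mosaic for the predetermined chromosome. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
According to an embodiment of the present invention, the plurality of predetermined thresholds includes at least one of the following:
a seventh threshold, determined based on a plurality of reference samples known to be completely monosomic for the predetermined chromosome;
an eighth threshold, determined based on a plurality of reference samples known to be monosomy mosaics for the predetermined chromosome;
a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome;
a tenth threshold, determined based on a plurality of reference samples known to be completely trisomic for the predetermined chromosome.
According to an embodiment of the present invention, a ratio of the third to the first cell-free fetal DNA fraction smaller than the seventh threshold indicates that the fetus has complete monosomy of the predetermined chromosome;
a ratio not smaller than the seventh threshold and not greater than the eighth threshold indicates that the fetus has mosaic monosomy of the predetermined chromosome;
a ratio greater than the eighth threshold and not greater than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
a ratio greater than the ninth threshold and not greater than the tenth threshold indicates that the fetus has mosaic trisomy of the predetermined chromosome;
a ratio greater than the tenth threshold indicates that the fetus has complete trisomy of the predetermined chromosome.
According to an embodiment of the present invention, the seventh threshold is greater than -1 and smaller than 0, optionally -0.85;
the eighth threshold is greater than the seventh threshold and smaller than 0, optionally -0.3;
the ninth threshold is greater than 0 and smaller than 1, optionally 0.3;
the tenth threshold is greater than the ninth threshold and smaller than 1, optionally 0.85. This further improves the efficiency of analyzing whether the fetus is mosaic for a specific chromosome.
In yet another aspect, the present invention provides a method for detecting fetal mosaicism. According to an embodiment of the present invention, the method includes:
(1) sequencing cell-free nucleic acids from the peripheral blood of a woman pregnant with a fetus, so as to obtain a sequencing result consisting of a plurality of sequencing data;
(2) determining the proportion $x_i$ of the number of sequencing data derived from chromosome i in the total sequencing data of the sequencing result, where i is the chromosome number and is any integer from 1 to 22;
(3) determining the T value for chromosome i based on the formula $T_i=(x_i-\mu_i)/\sigma_i$, where i is the chromosome number and is any integer from 1 to 22, $\mu_i$ is the mean, over the samples selected as the reference set in the reference database, of the percentage of sequencing data from chromosome i in the total sequencing data, and $\sigma_i$ is the standard deviation of that percentage over the same reference set;
(4) determining the L value for chromosome i based on the formula $L_i=\log(d(T_i,a))/\log(d(T2_i,a))$, where i is the chromosome number and is any integer from 1 to 22,
$T2_i=(x_i-\mu_i(1+fra/2))/\sigma_i$;
$d(T_i,a)$ and $d(T2_i,a)$ are values of the probability density function of the t distribution, a is the number of degrees of freedom, and fra is the cell-free fetal DNA fraction determined by the above-described method for determining the fraction of cell-free nucleic acids in a biological sample;
(5) if T is not greater than 0, plotting a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, divided into regions by the lines T = a predetermined eleventh threshold and L = a predetermined twelfth threshold,
wherein
when the test sample falls in the first quadrant, the fetus is determined to have complete or mosaic monosomy of the predetermined chromosome;
when the test sample falls in the second quadrant, the fetus is determined to have mosaic monosomy of the predetermined chromosome;
when the test sample falls in the third quadrant, the fetus is determined to be normal for the predetermined chromosome;
when the test sample falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used;
if T is greater than 0, plotting a four-quadrant diagram of the T and L values, with the L value on the abscissa and the T value on the ordinate, divided into regions by the lines T = a predetermined thirteenth threshold and L = a predetermined fourteenth threshold,
wherein
when the test sample falls in the first quadrant, the fetus is determined to have complete or mosaic trisomy of the predetermined chromosome;
when the test sample falls in the second quadrant, the fetus is determined to have mosaic trisomy of the predetermined chromosome;
when the test sample falls in the third quadrant, the fetus is determined to be normal for the predetermined chromosome;
when the test sample falls in the fourth quadrant, the sample is determined to have a relatively low fetal concentration and its result is not used;
optionally, the eleventh and thirteenth thresholds are each independently 3, and the twelfth and fourteenth thresholds are each independently 1.
In this way, fetal mosaicism can be analyzed effectively.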
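It should be noted that the expression "normal male fetus/female fetus/fetus" as used herein means that the fetal chromosomes are normal; for example, "normal male fetus" means a male fetus with normal chromosomes. Moreover, "normal male fetus/female fetus/fetus" may refer to a singleton or to twins; for example, "normal male fetus" may be a normal singleton male fetus or normal twin male fetuses. "Normal fetus" restricts neither the sex of the fetus nor whether it is a singleton or twins.
The embodiments of the present invention are described in detail below with reference to examples, but those skilled in the art will understand that the following examples serve only to illustrate the present invention and shall not be regarded as limiting its scope. Where specific conditions are not indicated in the examples, conventional conditions or the conditions recommended by the manufacturer are used. Reagents or instruments whose manufacturers are not indicated are conventional, commercially available products.
Example 1:
According to the method of the present invention for determining the fraction of cell-free nucleic acids in a biological sample, the cell-free fetal DNA fractions of 11 test maternal plasma samples were estimated as follows:
1) Sample collection and processing
2 ml of peripheral blood was collected during pregnancy from each of 11 test pregnant women and 37 women known to be pregnant with male fetuses and subjected to plasma separation, so as to obtain the peripheral blood samples of the test pregnant women and of the women known to be pregnant with male fetuses.
2) Library construction
Libraries were constructed according to the plasma library construction requirements of Complete Genomics Inc.
3) Sequencing
Sequencing was performed strictly according to the standard operating procedures of Complete Genomics Inc.
4) Data analysis
Using the sequences obtained by paired-end sequencing, the DNA fragment length distribution of the samples was analyzed according to the workflow shown in Fig. 1, with the following steps:
a) Calculating the DNA fragment lengths of the 11 test samples and the 37 samples from women known to be pregnant with normal male fetuses: for unique alignment, 19 bp from one end and 12 bp from the other end of each paired-end read were used; the positions of the uniquely aligned sequences were recorded, and the DNA fragment length was obtained from the start and end positions of each sequence.
b) For each of the 37 samples from women known to be pregnant with male fetuses, taking a certain length as the window length and moving the window by a certain step size to define multiple windows, and counting the frequency of DNA molecules in each window, i.e., the frequency of DNA molecules at that length. The frequency of DNA molecules in a window is defined as the number of DNA molecules in that window (i.e., whose lengths lie within the range defined by the window) divided by the total number of molecules. For example, windows of 1 bp, 5 bp, 10 bp, 15 bp, ... may be used, with step sizes from 1 bp up to the window length. Specifically, with a 5 bp window and a 2 bp step, the distribution of DNA molecules in the windows [1bp,5bp], [3bp,7bp], [5bp,9bp], [7bp,11bp], ... can be counted; with a 5 bp window and a 5 bp step, the distribution in [1bp,5bp], [6bp,10bp], [11bp,15bp], ... can be counted.
c) Identifying one or more regions in which the frequency of DNA molecules of a given length in the 37 samples from women known to be pregnant with male fetuses correlates relatively strongly with the known fetal fraction. For a given length range M, the cell-free fetal DNA fractions of the 37 samples were estimated from the Y chromosome (for the estimation method, see: Fuman Jiang, Jinghui Ren, et al. Noninvasive Fetal Trisomy (NIFTY) test: an advanced noninvasive prenatal diagnosis methodology for fetal autosomal and sex chromosomal aneuploidies. BMC Med Genomics. 2012 Dec 1;5:57. doi:10.1186/1755-8794-5-57, which is incorporated herein by reference in its entirety), the frequency of DNA molecules with fragment lengths within M was obtained, and the correlation coefficient between the two was calculated. The length region M with the largest absolute correlation coefficient was then selected, e.g., M = 185 to 204 bp with correlation coefficient R = -0.87 (see Fig. 13), or M = 121 to 150 bp with correlation coefficient R = -0.6199.
d) Determining the functional relationship between the frequency of DNA molecules with lengths in M in maternal peripheral blood and the cell-free fetal DNA fraction (denoted d): for the above 37 samples with known cell-free fetal DNA fraction d (i.e., the 37 samples from women known to be pregnant with male fetuses), the frequencies p_i (i = 1, 2, ..., 48) of DNA molecules in the range 185 to 204 bp and the cell-free fetal DNA fractions d_i (i = 1, 2, ..., 48) were linearly fitted (Fig. 13), giving the relationship d=a*p+b, namely d=0.0334*p+1.6657.
e) Counting, for each of the 11 test maternal samples, the DNA fragment length distribution and the frequency of DNA molecules in the interval M: the frequency p of DNA fragments of each test maternal sample in the range 185 bp to 204 bp was counted; the results are shown in Table 1 below.
f) Estimating the cell-free fetal DNA fraction of the test samples: from the frequency p_j (j denotes the test sample) of DNA fragments in the range 185 bp to 204 bp obtained above and the relationship d=a*p+b, the cell-free fetal DNA fraction d_j of each test sample was calculated directly.
g) The estimated cell-free fetal DNA fractions of the test samples are shown in Table 1 below (for the 37 women known to be pregnant with male fetuses, the fractions estimated from chrY and by the method of the present invention were essentially consistent).
Table 1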
Figure PCTCN2015085109-appb-000003
Figure PCTCN2015085109-appb-000004
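Example 2:
According to the methods of the present invention for determining the sex of twins and chromosomal aneuploidy, and based on the cell-free fetal DNA fractions determined in Example 1, fetal sex determination and chromosomal aneuploidy detection were performed on the plasma of the 11 women pregnant with twins described in Example 1:
1. Determining the fetal sex of the test samples:
Based on the cell-free fetal DNA fractions determined in Example 1, the fetal sex of the 11 test samples was determined according to the following steps:
a) Calculating the fetal concentration fra.chry by the Y-chromosome method;
b) Estimating the fetal concentration fra.size from fragment lengths;
c) Criteria:
I. When fra.chry/fra.size is smaller than 0.35, both fetuses of the twins are female;
II. When fra.chry/fra.size is not smaller than 0.35 and not greater than 0.7, one fetus is male and the other is female;
III. When fra.chry/fra.size is greater than 0.7, both fetuses are male.
The results are shown in the following table:
Fetal sex determination results for the 11 test samples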
Figure PCTCN2015085109-appb-000005
Figure PCTCN2015085109-appb-000006
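2. Determining chromosomal aneuploidy of the twins in the test samples by fetal concentration:
Based on the cell-free fetal DNA fractions determined in Example 1, the fetal chromosomal aneuploidy of the 11 test samples was determined by fetal concentration according to the following steps:
a) Calculating the fetal concentration fra.chri from chromosome i (i = 13, 18, 21);
b) Estimating the fetal concentration fra.size from fragment lengths;
c) Criteria:
I. When fra.chri/fra.size is smaller than 0.35, chromosome i is normal in both fetuses;
II. When fra.chri/fra.size is not smaller than 0.35 and not greater than 0.7, one fetus is trisomic for chromosome i and the other is normal for chromosome i;
III. When fra.chri/fra.size is greater than 0.7, both fetuses are trisomic for chromosome i.
The results are shown in the following table:
Results of detecting fetal chromosomal aneuploidy in the 11 test samples by fetal concentration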
Figure PCTCN2015085109-appb-000007
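3. Determining chromosomal aneuploidy of the test samples by T values and L values:
Based on the cell-free fetal DNA fractions determined in Example 1, the fetal chromosomal aneuploidy of the 11 test samples was determined by T and L values according to the following steps:
a) Whole-genome sequencing (WGS): the test samples were subjected to whole-genome sequencing on the Illumina high-throughput platform.
b) Obtaining the position information of effective reads (Effective Reads): the sequencing reads of the test samples were aligned to the reference genome sequence hg19, and the positions of reads that align uniquely to the reference sequence were obtained.
c) Counting the percentage of Effective Reads: for the reads obtained in step b), the proportion of Effective Reads of each chromosome in the total Effective Reads was calculated.
d) GC correction:
Using data from known normal samples, the UR and GC content of each chromosome were fitted to obtain the relationship $ER_i=f_i(GC_i)+\varepsilon_i$, and the mean UR value was calculated: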
Figure PCTCN2015085109-appb-000008
For the sample to be analyzed, the corrected ER value was calculated from the above relationship and the sample's ER and GC values:
Figure PCTCN2015085109-appb-000009
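e) Calculating the T value: $T_i=(x_i-\mu_i)/\sigma_i$, where
i: chromosome number (i = 13, 18, 21);
$x_i$: percentage of Effective Reads of chromosome i in the analyzed sample;
$\mu_i$: mean percentage of Effective Reads of chromosome i among the samples selected as the reference set in the reference database;
$\sigma_i$: standard deviation of the percentage of Effective Reads of chromosome i among the samples selected as the reference set in the reference database;
f) Calculating the L value:
The L value for chromosome i was determined based on the formula $L_i=\log(d(T_i,a))/\log(d(T2_i,a))$, where i is the chromosome number, $T2_i=(x_i-\mu_i(1+fra/2))/\sigma_i$, $d(T_i,a)$ and $d(T2_i,a)$ are values of the probability density function of the t distribution, a is the number of degrees of freedom, and fra is the cell-free fetal DNA fraction determined by the method described in Example 1;
g) A four-quadrant diagram was plotted from the T and L values, with the L value on the abscissa and the T value on the ordinate, and divided into regions using T = 3 and L = 0.8 as cutoffs (samples with a fetal concentration < 5% were judged as failing quality control), as follows:
I. When the test sample falls in the first quadrant (T>3, L>0.8), both fetuses of the twins are determined to be trisomic;
II. When the test sample falls in the second quadrant (T>3, L≤0.8), one fetus is determined to be trisomic and the other normal;
III. When the test sample falls in the third quadrant (T≤3, L≤0.8), both fetuses are determined to be normal;
IV. When the test sample falls in the fourth quadrant (T≤3, L>0.8), the sample is determined to have a relatively low fetal concentration and fails quality control.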
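The cutoffs in step g) can be sketched as a classifier that also applies the 5% fetal-concentration quality-control rule. This is an illustrative sketch. The order of the checks, with the QC check applied first, is an assumption, and the input values in the usage line are hypothetical.

```python
def classify_twin_tl(t: float, l: float, fetal_fraction_pct: float,
                     t_cut: float = 3.0, l_cut: float = 0.8,
                     min_fraction_pct: float = 5.0) -> str:
    """Example 2 quadrant rule (abscissa L, ordinate T) with the fetal-fraction QC check."""
    if fetal_fraction_pct < min_fraction_pct:
        return f"QC fail: fetal concentration below {min_fraction_pct}%"
    if t > t_cut and l > l_cut:
        return "both fetuses trisomic"          # quadrant I
    if t > t_cut:
        return "one trisomic, one normal"       # quadrant II
    if l <= l_cut:
        return "both fetuses normal"            # quadrant III
    return "QC fail: low fetal concentration"   # quadrant IV


print(classify_twin_tl(6.5, 15.0, 10.0))
```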
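The T-value/L-value four-quadrant diagrams of the 11 test samples are shown in Figs. 14-16. As Figs. 14-16 show, the fetal chromosomal aneuploidy status of the 11 test samples determined by T and L values is consistent with the results determined by fetal concentration in the preceding section.
Example 3: Mosaicism detection
In the following examples, sheared DNA from abortion tissue was mixed with plasma from non-pregnant women in defined proportions to serve as simulated maternal samples. Fetal (male) chromosome number abnormalities (including complete trisomy, complete monosomy, trisomy mosaicism, and monosomy mosaicism) were detected by the following method, which includes the following steps:
1) Whole-genome sequencing (WGS): the test samples were subjected to whole-genome sequencing on a high-throughput platform.
2) Obtaining the position information of effective reads (Effective Reads): the sequencing reads of the test samples were aligned to the reference genome sequence, and the positions of reads that align uniquely to the reference sequence were obtained.
3) Counting the percentage of uniquely aligned reads of each chromosome and the percentage of G and C bases among all bases of the uniquely aligned reads of each chromosome: from the position and base information of the Effective Reads, the number of Effective Reads of each chromosome as a percentage of the total Effective Reads of the test sample was calculated; at the same time, the percentage of G and C bases among all bases in all Effective Reads of each chromosome was calculated.
4) Calculating the DNA content from chromosome i, denoted fra.chri: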
fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
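where
fra.chri is the cell-free fetal DNA fraction, i is the number of the predetermined chromosome, and i is any integer from 1 to 22;
chri.ER%: the ER% of chromosome i in the sample (ER% is short for Effective Reads Rate, i.e., the percentage of uniquely aligned reads);
adjust.chri.ER%: the theoretical ER% of chromosome i when the sample is normal;
5) Calculating the fetal concentration by the Y-chromosome method, denoted fra.chry:
fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%
where:
chry.ER%: the ER% of chromosome Y in the test sample (ER% is short for Effective Reads rate, i.e., the percentage of uniquely aligned reads);
Female.chry.ER%: the mean ER% of chromosome Y in female-fetus samples; Man.chry.ER%: the mean ER% of chromosome Y in male samples;
6) Criteria:
I. When fra.chri/fra.chry < A1 (A1 is a constant with A1 > -1, e.g., -0.85), the fetus has complete monosomy of chromosome i;
II. When fra.chri/fra.chry ∈ [A1, A2] (A1 and A2 are constants with -1 < A1 < A2 < 0, e.g., [-0.85, -0.3]), the fetus has mosaic monosomy of chromosome i;
III. When fra.chri/fra.chry ∈ [A2, A3] (A2 and A3 are constants with A2 < 0 < A3, e.g., [-0.3, 0.3]), the fetus is normal for chromosome i;
IV. When fra.chri/fra.chry ∈ [A3, A4] (A3 and A4 are constants with 0 < A3 < A4 < 1, e.g., [0.3, 0.85]), the fetus has mosaic trisomy of chromosome i;
V. When fra.chri/fra.chry > A4 (A4 is a constant, e.g., 0.85), the fetus has complete trisomy of chromosome i.
3.1 Chromosomal aneuploidy detection was performed on 19 samples (M1 to M19) of sheared DNA mixed with female plasma (details in Table a) and on 2 plasma samples (N1, N2) from women pregnant with male fetuses (a T18 mosaic and a T21 mosaic, respectively).
1) Sample collection and processing
2 ml of peripheral blood was collected and subjected to plasma separation.
2) Library construction
Libraries were constructed according to the plasma library construction requirements of Complete Genomics Inc.
3) Sequencing
Sequencing was performed strictly according to the standard operating procedures of Complete Genomics Inc.
4) Data analysis
a) Whole-genome sequencing (WGS): the test samples were subjected to whole-genome sequencing on a high-throughput platform. (The lengths of all cell-free DNA molecules must be obtained; for example, single-end sequencing must read through each entire cell-free DNA molecule, or paired-end sequencing is used. This point is important.)
b) Obtaining the position information of effective reads (Effective Reads): the sequencing reads of the test samples were aligned to the reference genome sequence, and the positions of reads that align uniquely to the reference sequence were obtained.
c) Counting the percentage of Effective Reads: for the reads obtained in step b), the proportion of Effective Reads of each chromosome in the total Effective Reads was calculated.
d) GC correction
Using data from known normal samples, the UR and GC content of each chromosome were fitted to obtain the relationship:
$ER_i=f_i(GC_i)+\varepsilon_i$, and the mean UR value was calculated. For the sample to be analyzed, the corrected ER value was calculated from the above relationship and the sample's ER and GC values: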
Figure PCTCN2015085109-appb-000011
e)通过第i(i=13,18,21)条染色体计算出胎儿浓度fra.chri;
f)通过Y染色体的方法计算胎儿浓度fra.chry;
g)通过片段的方法(即本发明的确定游离胎儿DNA比例的方法)计算胎儿浓度fra.size;
h)计算T值:Ti=(xii)/σi
i:染色体编号(i=1、2…22);
xi:分析样品中第i条染色体的Effective Reads的百分含量;
μi:在参考数据库中被选为参照系的第i条染色体的Effective Reads的百分含量的均值;
σi:在参考数据库中被选为参照系的第i条染色体的Effective Reads的百分含量标准差;
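i) Compute the L value
First compute T2: T2i=(xi-μi*(1+fra/2))/σi
Then compute L: Li=log(d(Ti,a))/log(d(T2i,a)), where d(Ti,a) and d(T2i,a) are probability density functions of the t distribution, a is the degrees of freedom, and fra denotes fra.chry or fra.size.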
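The GC correction, T value and L value of steps d), h) and i) can be sketched in Python as follows. Two parts are assumptions: the text does not give the form of fi, so a straight-line fit is used, and because the corrected-ER formula survives only as an image, the sketch uses a common residual-style correction (observed ER minus the ER predicted from GC, plus the mean ER of the normal samples) that the source does not confirm. The Student's t log-density is written out with math.lgamma to stay dependency-free (scipy.stats.t.logpdf gives the same values); the log base is irrelevant because it cancels in the ratio. fra is taken as a proportion (0.10 for 10%), the degrees of freedom a are not fixed by the text, and all function names are hypothetical.

```python
import math
from statistics import fmean


def fit_linear(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def gc_correct(er, gc, normal_er, normal_gc):
    """ASSUMED correction: subtract the ER predicted from GC by the fit on
    normal samples, then add back the normal samples' mean ER."""
    a, b = fit_linear(normal_gc, normal_er)
    return er - (a + b * gc) + fmean(normal_er)


def t_logpdf(x, df):
    """Natural log of the Student's t density with df degrees of freedom."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi)
            - (df + 1) / 2 * math.log1p(x * x / df))


def t_value(x, mu, sigma):
    """Step h): T_i = (x_i - mu_i) / sigma_i."""
    return (x - mu) / sigma


def l_value(x, mu, sigma, fra, df):
    """Step i): L_i = log d(T_i, a) / log d(T2_i, a), with fra as a proportion."""
    t1 = t_value(x, mu, sigma)
    t2 = (x - mu * (1 + fra / 2)) / sigma
    # Both log-densities are negative (the t density stays below 0.4),
    # so L > 1 exactly when d(T_i) < d(T2_i).
    return t_logpdf(t1, df) / t_logpdf(t2, df)


mu, sigma, df = 1.30, 0.01, 30  # hypothetical reference values for one chromosome
for x in (1.300, 1.365):
    print(x, round(t_value(x, mu, sigma), 2), round(l_value(x, mu, sigma, 0.10, df), 2))
```

In words, L compares how well the sample fits the euploid reference (Ti) with how well it fits a trisomy at fetal fraction fra (T2i). For the hypothetical chromosome above with a 10% fetal fraction, x = 1.365 gives T = 6.5 and an L value well above 1, whereas x = 1.30 gives T = 0 and L below 1.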
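◆ Determining chromosomal aneuploidy of the test sample from the fetal fraction, with fra.chry estimated by the chromosome Y method (see Table A)
a) Decision criteria:
I. If fra.chri/fra.chry < -0.85, the fetus has complete monosomy of chromosome i;
II. If fra.chri/fra.chry ∈ [-0.85, -0.3], the fetus is mosaic for monosomy of chromosome i;
III. If fra.chri/fra.chry ∈ [-0.3, 0.3], chromosome i of the fetus is normal;
IV. If fra.chri/fra.chry ∈ [0.3, 0.85], the fetus is mosaic for trisomy of chromosome i;
V. If fra.chri/fra.chry > 0.85, the fetus has complete trisomy of chromosome i;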
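◆ Determining chromosomal aneuploidy of the test sample from the T and L values, with fra.chry estimated from chromosome Y (see Table B)
a) Draw a four-quadrant plot from the T and L values;
b) When T ≤ 0, draw the four-quadrant plot from |T| and L, with L on the horizontal axis and |T| on the vertical axis. Divide the plot into regions using the cutoffs T = 3 and L = 1 (a sample with a fetal fraction below 5% fails quality control).
I. If the test sample falls in the first quadrant (|T| > 3, L > 1), it is a monosomy or a monosomy mosaic;
II. If the test sample falls in the second quadrant (|T| > 3, L ≤ 1), it is a monosomy mosaic;
III. If the test sample falls in the third quadrant (|T| ≤ 3, L ≤ 1), it is normal;
IV. If the test sample falls in the fourth quadrant (|T| ≤ 3, L > 1), it is judged to have a relatively low fetal fraction and fails quality control;
When T > 0, draw the four-quadrant plot from T and L, with L on the horizontal axis and T on the vertical axis. Divide the plot into regions using the cutoffs T = 3 and L = 0.8 (a sample with a fetal fraction below 5% fails quality control).
I. If the test sample falls in the first quadrant (T > 3, L > 1), it is a trisomy or a trisomy mosaic;
II. If the test sample falls in the second quadrant (T > 3, L ≤ 1), it is a trisomy mosaic;
III. If the test sample falls in the third quadrant (T ≤ 3, L ≤ 1), it is normal;
IV. If the test sample falls in the fourth quadrant (T ≤ 3, L > 1), it is judged to have a relatively low fetal fraction and fails quality control;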
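The quadrant rules above can be written as a small decision function. This is a sketch: fetal_fraction is a proportion, samples below 5% are rejected before the quadrant test as the text specifies, and for T ≤ 0 the rule is applied to |T|. The text quotes L = 0.8 as the cutoff for the T > 0 plot while its quadrant inequalities use L = 1; the default l_cut = 1.0 follows the inequalities, and l_cut can be set to 0.8 instead. The function name and return strings are hypothetical.

```python
def classify_tl(t: float, l: float, fetal_fraction: float,
                t_cut: float = 3.0, l_cut: float = 1.0,
                min_fraction: float = 0.05) -> str:
    """Four-quadrant call for one chromosome of a singleton pregnancy.

    Quadrant I: |T| > t_cut and L > l_cut; II: |T| > t_cut and L <= l_cut;
    III: |T| <= t_cut and L <= l_cut; IV: |T| <= t_cut and L > l_cut.
    """
    if fetal_fraction < min_fraction:
        return "QC fail: fetal fraction below cutoff"
    kind = "monosomy" if t <= 0 else "trisomy"
    t_abs = abs(t)  # for T <= 0 the plot uses |T|
    if t_abs > t_cut and l > l_cut:
        return f"complete {kind} or {kind} mosaic"  # quadrant I
    if t_abs > t_cut:
        return f"{kind} mosaic"                      # quadrant II
    if l <= l_cut:
        return "normal"                              # quadrant III
    return "QC fail: low fetal fraction"             # quadrant IV


for t, l in [(6.5, 15.7), (4.0, 0.6), (0.4, 0.1), (-5.0, 0.5)]:
    print(t, l, classify_tl(t, l, fetal_fraction=0.10))
```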
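◆ Determining chromosomal aneuploidy of the test sample from the fetal fraction, with fra.size estimated by the fragment method (see Table C)
a) Decision criteria:
I. If fra.chri/fra.size < -0.85, the fetus has complete monosomy of chromosome i;
II. If fra.chri/fra.size ∈ [-0.85, -0.3], the fetus is mosaic for monosomy of chromosome i;
III. If fra.chri/fra.size ∈ [-0.3, 0.3], chromosome i of the fetus is normal;
IV. If fra.chri/fra.size ∈ [0.3, 0.85], the fetus is mosaic for trisomy of chromosome i;
V. If fra.chri/fra.size > 0.85, the fetus has complete trisomy of chromosome i;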
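◆ Determining chromosomal aneuploidy of the test sample from the T and L values, with fra.size estimated from fragments (see Table D and Figure 2)
a) Draw a four-quadrant plot from the T and L values;
b) When T ≤ 0, draw the four-quadrant plot from |T| and L, with L on the horizontal axis and |T| on the vertical axis. Divide the plot into regions using the cutoffs T = 3 and L = 1 (a sample with a fetal fraction below 5% fails quality control).
I. If the test sample falls in the first quadrant (|T| > 3, L > 1), it is a monosomy or a monosomy mosaic;
II. If the test sample falls in the second quadrant (|T| > 3, L ≤ 1), it is a monosomy mosaic;
III. If the test sample falls in the third quadrant (|T| ≤ 3, L ≤ 1), it is normal;
IV. If the test sample falls in the fourth quadrant (|T| ≤ 3, L > 1), it is judged to have a relatively low fetal fraction and fails quality control;
When T > 0, draw the four-quadrant plot from T and L, with L on the horizontal axis and T on the vertical axis. Divide the plot into regions using the cutoffs T = 3 and L = 1 (a sample with a fetal fraction below 5% fails quality control).
I. If the test sample falls in the first quadrant (T > 3, L > 1), it is a trisomy or a trisomy mosaic;
II. If the test sample falls in the second quadrant (T > 3, L ≤ 1), it is a trisomy mosaic;
III. If the test sample falls in the third quadrant (T ≤ 3, L ≤ 1), it is normal;
IV. If the test sample falls in the fourth quadrant (T ≤ 3, L > 1), it is judged to have a relatively low fetal fraction and fails quality control;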
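In this example, the negative samples were plasma samples from normal non-pregnant women. The positive samples were prepared by randomly shearing DNA from abortion tissue to 150 bp to 200 bp and mixing it with plasma from normal non-pregnant women (the T21 and T18 fetuses were male; the T13 fetus was female). The positive mosaic samples were prepared by mixing placental tissue DNA and YH (Yanhuang) cell line DNA, each randomly sheared to 150 bp to 200 bp, with plasma from normal women (the T21 and T18 fetuses were male; the T13 fetus was female).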
Table a
Sample  Abortion-tissue karyotype  Fetal fraction  Mosaic ratio  Label
M1      Trisomy 13                 3.5%            0             T13-3.5%
M2      Trisomy 13                 5%              0             T13-5%
M3      Trisomy 13                 8%              0             T13-8%
M4      Trisomy 13                 8%              0             T13-8%
M5      Trisomy 18                 10%             30%           T18-10%-30%
M6      Trisomy 18                 10%             70%           T18-10%-70%
M7      Trisomy 18                 10%             70%           T18-10%-70%
M8      Trisomy 18                 10%             70%           T18-10%-70%
M9      Trisomy 18                 3.5%            0             T18-3.5%
M10     Trisomy 18                 5%              0             T18-5%
M11     Trisomy 18                 5%              0             T18-5%
M12     Trisomy 18                 8%              0             T18-8%
M13     Trisomy 21                 10%             30%           T21-10%-30%
M14     Trisomy 21                 10%             70%           T21-10%-70%
M15     Trisomy 21                 10%             70%           T21-10%-70%
M16     Trisomy 21                 10%             70%           T21-10%-70%
M17     Trisomy 21                 3.5%            0             T21-3.5%
M18     Trisomy 21                 5%              0             T21-5%
M19     Trisomy 21                 8%              0             T21-8%
Sample  Karyotype
N1      47,XN+18[3]/46,XN[20]
N2      47,XN+21[30]/46,XN[18]
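Table A Detection of chromosomal aneuploidy using the mixed-DNA fraction estimated from chromosome Y
[Table A image: PCTCN2015085109-appb-000012]
Notes:
fra.chr13: mixed-DNA fraction back-calculated from chromosome 13;
fra.chr18: mixed-DNA fraction back-calculated from chromosome 18;
fra.chr21: mixed-DNA fraction back-calculated from chromosome 21;
fra.chry: mixed-DNA fraction calculated from chromosome Y;
T21-10%-30%: sheared DNA from a trisomy 21 cell line spiked into female plasma at a 10% fraction with a 30% mosaic ratio;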
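Table B Detection of chromosomal aneuploidy using T and L values (mixed-DNA fraction estimated from chromosome Y)
[Table B image: PCTCN2015085109-appb-000013]
Notes:
T.chr13: T value of chromosome 13;
L.chr13: L value of chromosome 13;
T.chr18: T value of chromosome 18;
L.chr18: L value of chromosome 18;
T.chr21: T value of chromosome 21;
L.chr21: L value of chromosome 21;
fra.chry: fetal fraction of the mixed DNA estimated from chromosome Y;
T21-10%-30%: sheared DNA from a trisomy 21 cell line spiked into female plasma at a 10% fraction with a 30% mosaic ratio;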
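Table C Detection of chromosomal aneuploidy using the fetal fraction estimated from fragments
[Table C images: PCTCN2015085109-appb-000014, PCTCN2015085109-appb-000015]
Notes:
fra.chr13: mixed-DNA fraction back-calculated from chromosome 13;
fra.chr18: mixed-DNA fraction back-calculated from chromosome 18;
fra.chr21: mixed-DNA fraction back-calculated from chromosome 21;
fra.size: mixed-DNA fraction estimated from fragments;
T21-10%-30%: sheared DNA from a trisomy 21 cell line spiked into female plasma at a 10% fraction with a 30% mosaic ratio;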
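Table D Detection of chromosomal aneuploidy using T and L values (fetal fraction estimated from fragments)
[Table D images: PCTCN2015085109-appb-000016, PCTCN2015085109-appb-000017]
Notes:
T.chr13: T value of chromosome 13;
L.chr13: L value of chromosome 13;
T.chr18: T value of chromosome 18;
L.chr18: L value of chromosome 18;
T.chr21: T value of chromosome 21;
L.chr21: L value of chromosome 21;
fra.size: mixed-DNA fraction estimated from fragments;
T21-10%-30%: sheared DNA from a trisomy 21 cell line spiked into female plasma at a 10% fraction with a 30% mosaic ratio;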
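In the description of this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" mean that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.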

Claims (76)

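  1. A method for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample, characterized by comprising:
    (1) performing nucleic acid sequencing on a biological sample containing cell-free nucleic acids, so as to obtain a sequencing result consisting of a plurality of sequencing reads;
    (2) determining, based on the sequencing result, the number of nucleic acid molecules in the sample whose length falls within a predetermined range; and
    (3) determining, based on the number of nucleic acid molecules whose length falls within the predetermined range, the fraction of cell-free nucleic acids of the predetermined origin in the biological sample.
  2. The method according to claim 1, characterized in that the biological sample is peripheral blood.
  3. The method according to claim 2, characterized in that the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids or maternally derived cell-free nucleic acids in the peripheral blood of a pregnant woman, or cell-free tumor nucleic acids or cell-free nucleic acids of non-tumor origin in the peripheral blood of a tumor patient, a suspected tumor patient or a person undergoing tumor screening.
  4. The method according to claim 1, characterized in that the sequencing is paired-end sequencing, single-end sequencing or single-molecule sequencing.
  5. The method according to claim 1, characterized in that the nucleic acid is DNA.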
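  6. The method according to any one of claims 1 to 5, characterized in that step (2) further comprises:
    (2-1) aligning the sequencing result to a reference genome so as to construct a uniquely aligned read set, wherein each sequencing read in the uniquely aligned read set matches only one position of the reference genome;
    (2-2) determining the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned read set; and
    (2-3) determining the number of nucleic acid molecules whose length falls within the predetermined range.
  7. The method according to claim 6, characterized in that, in step (2-2), the length of the part of the sequencing read that matches the reference genome is taken as the length of the nucleic acid molecule corresponding to that sequencing read.
  8. The method according to claim 6, characterized in that the sequencing is paired-end sequencing, and step (2-2) comprises:
    (2-2-1) determining, on the reference genome, the 5' end position of the nucleic acid based on the read from one end of the paired-end sequencing data;
    (2-2-2) determining, on the reference genome, the 3' end position of the nucleic acid based on the read from the other end of the paired-end sequencing data; and
    (2-2-3) determining the length of the nucleic acid based on the 5' end position and the 3' end position of the nucleic acid.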
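  9. The method according to claim 1, characterized in that the predetermined range is determined based on a plurality of control samples, wherein the fraction of cell-free nucleic acids of the predetermined origin in the control samples is known.
  10. The method according to claim 9, characterized in that the predetermined range is determined based on at least 20 control samples.
  11. The method according to claim 9, characterized in that the predetermined range is determined by the following steps:
    (a) determining the lengths of the cell-free nucleic acid molecules contained in the plurality of control samples;
    (b) setting a plurality of candidate length ranges, and determining, for each of the plurality of control samples, the frequency of cell-free nucleic acid molecules within each candidate length range;
    (c) determining, based on the frequencies of cell-free nucleic acid molecules within each candidate length range in the plurality of control samples and on the fraction of cell-free nucleic acids of the predetermined origin in the control samples, a correlation coefficient between each candidate length range and the fraction of cell-free nucleic acids of the predetermined origin; and
    (d) determining, based on the values of the correlation coefficients, at least one candidate length range or a combination of candidate length ranges as the predetermined range.
  12. The method according to claim 11, characterized in that the span of each candidate length range is 1 to 20 bp.
  13. The method according to claim 11, characterized in that the step size between the plurality of candidate length ranges is 1 to 2 bp.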
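  14. The method according to claim 9, characterized in that step (3) further comprises:
    (3-1) determining, based on the number of nucleic acid molecules whose length falls within the predetermined range, the frequency of cell-free nucleic acid molecules within the predetermined range; and
    (3-2) determining, based on the frequency of cell-free nucleic acid molecules within the predetermined range and according to a predetermined function, the fraction of cell-free nucleic acids of the predetermined origin in the biological sample,
    wherein
    the predetermined function is determined based on the plurality of control samples.
  15. The method according to claim 14, characterized in that the predetermined function is obtained by the following steps:
    (i) determining, in each of the plurality of control samples, the frequency of cell-free nucleic acid molecules within the predetermined range; and
    (ii) fitting the frequencies of cell-free nucleic acid molecules within the predetermined range in the plurality of control samples against the known fractions of cell-free nucleic acids of the predetermined origin, so as to determine the predetermined function.
  16. The method according to claim 15, characterized in that the fitting is a linear fit.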
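  17. The method according to claim 1, characterized in that the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids in the peripheral blood of a pregnant woman, and the predetermined range is 185 to 204 bp.
  18. The method according to claim 14, characterized in that the predetermined function is d=0.0334*p+1.6657, wherein d represents the cell-free fetal nucleic acid fraction and p represents the frequency of cell-free nucleic acid molecules within the predetermined range.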
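  18. The method according to claim 9, characterized in that the control samples are peripheral blood samples from pregnant women with a known cell-free fetal nucleic acid fraction.
  19. The method according to claim 18, characterized in that the control samples are peripheral blood samples from women pregnant with a normal male fetus and with a known cell-free fetal nucleic acid fraction, and the known cell-free fetal nucleic acid fraction is determined using chromosome Y.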
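  20. A device for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample, characterized by comprising:
    a sequencing apparatus, configured to perform nucleic acid sequencing on a biological sample containing cell-free nucleic acids, so as to obtain a sequencing result consisting of a plurality of sequencing reads;
    a counting apparatus, connected to the sequencing apparatus and configured to determine, based on the sequencing result, the number of nucleic acid molecules in the sample whose length falls within a predetermined range; and
    a cell-free nucleic acid fraction determination apparatus, connected to the counting apparatus and configured to determine, based on the number of nucleic acid molecules whose length falls within the predetermined range, the fraction of cell-free nucleic acids of the predetermined origin in the biological sample.
  21. The device according to claim 20, characterized in that the biological sample is peripheral blood.
  22. The device according to claim 20, characterized in that the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids or maternally derived cell-free nucleic acids in the peripheral blood of a pregnant woman, or cell-free tumor nucleic acids or cell-free nucleic acids of non-tumor origin in the peripheral blood of a tumor patient, a suspected tumor patient or a person undergoing tumor screening.
  23. The device according to claim 20, characterized in that the sequencing is paired-end sequencing, single-end sequencing or single-molecule sequencing.
  24. The device according to claim 20, characterized in that the nucleic acid is DNA.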
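  25. The device according to any one of claims 20 to 24, characterized in that the counting apparatus further comprises:
    an alignment unit, configured to align the sequencing result to a reference genome so as to construct a uniquely aligned read set, wherein each sequencing read in the uniquely aligned read set matches only one position of the reference genome;
    a first length determination unit, connected to the alignment unit and configured to determine the length of the nucleic acid molecule corresponding to each sequencing read in the uniquely aligned read set; and
    a number determination unit, connected to the first length determination unit and configured to determine the number of nucleic acid molecules whose length falls within the predetermined range.
  26. The device according to claim 25, characterized in that the first length determination unit takes the length of the part of the sequencing read that matches the reference genome as the length of the nucleic acid molecule corresponding to that sequencing read.
  27. The device according to claim 25, characterized in that the sequencing is paired-end sequencing,
    wherein
    the first length determination unit further comprises:
    a 5' end position determination module, configured to determine, on the reference genome, the 5' end position of the nucleic acid based on the read from one end of the paired-end sequencing data;
    a 3' end position determination module, connected to the 5' end position determination module and configured to determine, on the reference genome, the 3' end position of the nucleic acid based on the read from the other end of the paired-end sequencing data; and
    a length calculation module, connected to the 3' end position determination module and configured to determine the length of the nucleic acid based on the 5' end position and the 3' end position of the nucleic acid.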
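  28. The device according to claim 20, characterized by further comprising a predetermined range determination apparatus configured to determine the predetermined range based on a plurality of control samples, wherein the fraction of cell-free nucleic acids of the predetermined origin in the control samples is known,
    optionally, the predetermined range is determined based on at least 20 control samples.
  29. The device according to claim 28, characterized in that the predetermined range determination apparatus further comprises:
    a second length determination unit, configured to determine the lengths of the cell-free nucleic acid molecules contained in the plurality of control samples;
    a first proportion determination unit, connected to the second length determination unit and configured to set a plurality of candidate length ranges and to determine, for each of the plurality of control samples, the frequency of cell-free nucleic acid molecules within each candidate length range;
    a correlation coefficient determination unit, connected to the first proportion determination unit and configured to determine, based on the frequencies of cell-free nucleic acid molecules within each candidate length range in the plurality of control samples and on the fraction of cell-free nucleic acids of the predetermined origin in the control samples, a correlation coefficient between each candidate length range and the fraction of cell-free nucleic acids of the predetermined origin; and
    a predetermined range determination unit, connected to the correlation coefficient determination unit and configured to select the candidate length range with the largest correlation coefficient as the predetermined range.
  30. The device according to claim 29, characterized in that the span of each candidate length range is 1 to 20 bp,
    optionally, the step size between the plurality of candidate length ranges is 1 to 2 bp.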
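  31. The device according to claim 28, characterized in that the cell-free nucleic acid fraction determination apparatus further comprises:
    a second proportion determination unit, configured to determine, based on the number of nucleic acid molecules whose length falls within the predetermined range, the frequency of cell-free nucleic acid molecules within the predetermined range; and
    a cell-free nucleic acid fraction calculation unit, connected to the second proportion determination unit and configured to determine, based on the frequency of cell-free nucleic acid molecules within the predetermined range and according to a predetermined function, the fraction of cell-free nucleic acids of the predetermined origin in the biological sample,
    wherein
    the predetermined function is determined based on the plurality of control samples.
  32. The device according to claim 31, characterized by further comprising a predetermined function determination apparatus, the predetermined function determination apparatus comprising:
    a third proportion determination unit, configured to determine, in each of the plurality of control samples, the frequency of cell-free nucleic acid molecules within the predetermined range; and
    a fitting unit, connected to the third proportion determination unit and configured to fit the frequencies of cell-free nucleic acid molecules within the predetermined range in the plurality of control samples against the known fractions of cell-free nucleic acids of the predetermined origin, so as to determine the predetermined function.
  33. The device according to claim 32, characterized in that the fitting is a linear fit.
  34. The device according to claim 20, characterized in that the cell-free nucleic acids of the predetermined origin are cell-free fetal nucleic acids in the peripheral blood of a pregnant woman, and the predetermined range is 185 to 204 bp.
  35. The device according to claim 31, characterized in that the predetermined function is d=0.0334*p+1.6657, wherein d represents the cell-free fetal nucleic acid fraction and p represents the frequency of cell-free nucleic acid molecules within the predetermined range.
  36. The device according to claim 28, characterized in that the control samples are peripheral blood samples from pregnant women with a known cell-free fetal nucleic acid fraction.
  37. The device according to claim 36, characterized in that the control samples are peripheral blood samples from women pregnant with a normal male fetus and with a known cell-free fetal nucleic acid fraction, and the known cell-free fetal nucleic acid fraction is determined using chromosome Y.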
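  38. A method for determining the sex of twins, characterized by comprising:
    (1) performing cell-free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins, so as to obtain a sequencing result consisting of a plurality of sequencing reads;
    (2) determining a first cell-free fetal DNA fraction based on the sequencing reads according to the method of any one of claims 1 to 19;
    (3) determining a second cell-free fetal DNA fraction based on the sequencing reads in the sequencing result derived from chromosome Y; and
    (4) determining the sex of the twins based on the first cell-free fetal DNA fraction and the second cell-free fetal DNA fraction.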
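  39. The method according to claim 38, characterized in that the second cell-free fetal DNA fraction is determined by the following formula:
    fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%
    wherein
    fra.chry represents the second cell-free fetal DNA fraction,
    chry.ER% represents the percentage of sequencing reads derived from chromosome Y among all sequencing reads in the sequencing result;
    Female.chry.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with a normal female fetus, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads; and
    Man.chry.ER% represents the predetermined mean, over peripheral blood samples from normal males, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads.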
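  40. The method according to claim 38, characterized in that step (4) comprises:
    (a) determining the ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
    (b) comparing the ratio obtained in step (a) with a predetermined first threshold and a predetermined second threshold, so as to determine the sex of the twins.
  41. The method according to claim 40, characterized in that the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
  42. The method according to claim 41, characterized in that a ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction smaller than the first threshold indicates that both twins are female,
    a ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction greater than the second threshold indicates that both twins are male,
    and a ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction equal to the first threshold or the second threshold, or between the first threshold and the second threshold, indicates that the twins comprise one male fetus and one female fetus.
  43. The method according to claim 42, characterized in that the first threshold is 0.35 and the second threshold is 0.7.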
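  44. A system for determining the sex of twins, characterized by comprising:
    a first cell-free fetal DNA fraction determination device, which is the device for determining the fraction of cell-free nucleic acids in a biological sample according to claims 21 to 45, configured to perform cell-free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins so as to obtain a sequencing result consisting of a plurality of sequencing reads, and to determine a first cell-free fetal DNA fraction based on the sequencing reads;
    a second cell-free fetal DNA fraction determination device, adapted to determine a second cell-free fetal DNA fraction based on the sequencing reads in the sequencing result derived from chromosome Y; and
    a sex determination device, adapted to determine the sex of the twins based on the first cell-free fetal DNA fraction and the second cell-free fetal DNA fraction.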
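  45. The system according to claim 44, characterized in that the second cell-free fetal DNA fraction is determined by the following formula:
    fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%
    wherein
    fra.chry represents the second cell-free fetal DNA fraction,
    chry.ER% represents the percentage of sequencing reads derived from chromosome Y among all sequencing reads in the sequencing result;
    Female.chry.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with a normal female fetus, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads; and
    Man.chry.ER% represents the predetermined mean, over peripheral blood samples from normal males, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads.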
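  46. The system according to claim 44, characterized in that the sex determination device further comprises:
    a ratio determination unit, configured to determine the ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
    a comparison unit, configured to compare the obtained ratio with a predetermined first threshold and a predetermined second threshold, so as to determine the sex of the twins.
  47. The system according to claim 46, characterized in that the first threshold is determined based on a plurality of reference samples in which both twins are known to be female, and the second threshold is determined based on a plurality of reference samples in which both twins are known to be male.
  48. The system according to claim 47, characterized in that a ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction smaller than the first threshold indicates that both twins are female,
    a ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction greater than the second threshold indicates that both twins are male,
    and a ratio of the second cell-free fetal DNA fraction to the first cell-free fetal DNA fraction equal to the first threshold or the second threshold, or between the first threshold and the second threshold, indicates that the twins comprise one male fetus and one female fetus.
  49. The system according to claim 48, characterized in that the first threshold is 0.35 and the second threshold is 0.7.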
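  50. A method for detecting chromosomal aneuploidy in twins, characterized by comprising:
    (1) performing cell-free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins, so as to obtain a sequencing result consisting of a plurality of sequencing reads;
    (2) determining a first cell-free fetal DNA fraction based on the sequencing reads according to the method of any one of claims 1 to 19;
    (3) determining a third cell-free fetal DNA fraction based on the sequencing reads in the sequencing result derived from a predetermined chromosome; and
    (4) determining, based on the first cell-free fetal DNA fraction and the third cell-free fetal DNA fraction, whether aneuploidy of the predetermined chromosome is present in the twins.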
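  51. The method according to claim 50, characterized in that the third cell-free fetal DNA fraction is determined by the following formula:
    fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
    wherein
    fra.chri represents the third cell-free fetal DNA fraction, i is the index of the predetermined chromosome, and i is any integer from 1 to 22;
    chri.ER% represents the percentage of sequencing reads derived from the predetermined chromosome among all sequencing reads in the sequencing result;
    adjust.chri.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with normal twins, of the percentage of sequencing reads of cell-free nucleic acids derived from the predetermined chromosome among all sequencing reads.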
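  52. The method according to claim 51, characterized in that step (4) comprises:
    (a) determining the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
    (b) comparing the ratio obtained in step (a) with a predetermined third threshold and a predetermined fourth threshold, so as to determine whether aneuploidy of the predetermined chromosome is present in the twins.
  53. The method according to claim 52, characterized in that the third threshold is determined based on a plurality of reference samples in which neither twin is known to have aneuploidy of the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to have aneuploidy of the predetermined chromosome.
  54. The method according to claim 53, characterized in that a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction smaller than the third threshold indicates that neither twin has aneuploidy of the predetermined chromosome,
    a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction greater than the fourth threshold indicates that both twins have aneuploidy of the predetermined chromosome,
    and a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction equal to the third threshold or the fourth threshold, or between the third threshold and the fourth threshold, indicates that one of the twins has aneuploidy of the predetermined chromosome and the other does not.
  55. The method according to claim 54, characterized in that the third threshold is 0.35 and the fourth threshold is 0.7.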
  56. The method according to claim 50, characterized in that the predetermined chromosome is at least one of chromosomes 13, 18 and 21.
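  57. A system for determining chromosomal aneuploidy in twins, characterized by comprising:
    a first cell-free fetal DNA fraction determination device, which is the device for determining the fraction of cell-free nucleic acids in a biological sample according to claims 21 to 45, configured to perform cell-free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins so as to obtain a sequencing result consisting of a plurality of sequencing reads, and to determine a first cell-free fetal DNA fraction based on the sequencing reads;
    a third cell-free fetal DNA fraction determination device, adapted to determine a third cell-free fetal DNA fraction based on the sequencing reads in the sequencing result derived from a predetermined chromosome; and
    a first aneuploidy determination device, adapted to determine, based on the first cell-free fetal DNA fraction and the third cell-free fetal DNA fraction, whether aneuploidy of the predetermined chromosome is present in the twins.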
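  58. The system according to claim 57, characterized in that the third cell-free fetal DNA fraction is determined by the following formula:
    fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
    wherein
    fra.chri represents the third cell-free fetal DNA fraction, i is the index of the predetermined chromosome, and i is any integer from 1 to 22;
    chri.ER% represents the percentage of sequencing reads derived from the predetermined chromosome among all sequencing reads in the sequencing result;
    adjust.chri.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with normal twins, of the percentage of sequencing reads of cell-free nucleic acids derived from the predetermined chromosome among all sequencing reads.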
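  59. The system according to claim 58, characterized in that the aneuploidy determination device further comprises:
    a ratio determination unit, configured to determine the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
    a comparison unit, configured to compare the obtained ratio with a predetermined third threshold and a predetermined fourth threshold, so as to determine whether aneuploidy of the predetermined chromosome is present in the twins.
  60. The system according to claim 59, characterized in that the third threshold is determined based on a plurality of reference samples in which neither twin is known to have aneuploidy of the predetermined chromosome, and the fourth threshold is determined based on a plurality of reference samples in which both twins are known to have aneuploidy of the predetermined chromosome.
  61. The system according to claim 60, characterized in that a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction smaller than the third threshold indicates that neither twin has aneuploidy of the predetermined chromosome,
    a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction greater than the fourth threshold indicates that both twins have aneuploidy of the predetermined chromosome,
    and a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction equal to the third threshold or the fourth threshold, or between the third threshold and the fourth threshold, indicates that one of the twins has aneuploidy of the predetermined chromosome and the other does not.
  62. The system according to claim 61, characterized in that the third threshold is 0.35 and the fourth threshold is 0.7.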
  63. The system according to claim 57, characterized in that the predetermined chromosome is at least one of chromosomes 13, 18 and 21.
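  64. A method for determining chromosomal aneuploidy in twins, characterized by comprising:
    (1) performing cell-free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins, so as to obtain a sequencing result consisting of a plurality of sequencing reads;
    (2) determining the proportion xi of sequencing reads derived from chromosome i among all sequencing reads in the sequencing result, wherein i denotes the chromosome index and i is any integer from 1 to 22;
    (3) determining the T value for chromosome i by the formula Ti=(xi-μi)/σi, wherein i denotes the chromosome index and i is any integer from 1 to 22, μi denotes the mean percentage of sequencing reads of chromosome i among all sequencing reads across the samples selected as the reference set in a reference database, and σi denotes the standard deviation of that percentage across the samples selected as the reference set in the reference database;
    (4) determining the L value for chromosome i by the formula Li=log(d(Ti,a))/log(d(T2i,a)), wherein i denotes the chromosome index and i is any integer from 1 to 22,
    T2i=(xi-μi*(1+fra/2))/σi
    d(Ti,a) and d(T2i,a) are probability density functions of the t distribution, a is the degrees of freedom, and fra is the first cell-free fetal DNA fraction determined by the method according to claims 1 to 19;
    (5) drawing a four-quadrant plot from the T and L values, with the L value on the horizontal axis and the T value on the vertical axis, and dividing the plot into regions by the lines T = a predetermined fifth threshold and L = a predetermined sixth threshold,
    wherein
    if the test sample falls in the first quadrant, both fetuses of the twins are judged to be trisomic;
    if the test sample falls in the second quadrant, one fetus of the twins is judged to be trisomic and the other normal;
    if the test sample falls in the third quadrant, both fetuses of the twins are judged to be normal;
    if the test sample falls in the fourth quadrant, the twin sample is judged to have a relatively low fetal fraction and its result is not used.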
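  65. A method for detecting fetal mosaicism, characterized by comprising:
    (1) performing cell-free nucleic acid sequencing on the peripheral blood of a pregnant woman, so as to obtain a sequencing result consisting of a plurality of sequencing reads, optionally, the fetus being male;
    (2) determining a first cell-free fetal DNA fraction based on the sequencing reads according to the method of any one of claims 1 to 20,
    or using chromosome Y to estimate the fetal fraction fra.chrY% as the first cell-free fetal DNA fraction by the following formula:
    fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%,
    wherein
    fra.chry represents the first cell-free fetal DNA fraction,
    chry.ER% represents the percentage of sequencing reads derived from chromosome Y among all sequencing reads in the sequencing result;
    Female.chry.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with a normal female fetus, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads; and
    Man.chry.ER% represents the predetermined mean, over peripheral blood samples from normal males, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads;
    (3) determining a third cell-free fetal DNA fraction based on the sequencing reads in the sequencing result derived from a predetermined chromosome; and
    (4) determining, based on the first cell-free fetal DNA fraction and the third cell-free fetal DNA fraction, whether mosaicism of the predetermined chromosome is present in the fetus.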
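  66. The method according to claim 65, characterized in that the third cell-free fetal DNA fraction is determined by the following formula:
    fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
    wherein
    fra.chri represents the third cell-free fetal DNA fraction, i is the index of the predetermined chromosome, and i is any integer from 1 to 22;
    chri.ER% represents the percentage of sequencing reads derived from the predetermined chromosome among all sequencing reads in the sequencing result;
    adjust.chri.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with a normal fetus, of the percentage of sequencing reads of cell-free nucleic acids derived from the predetermined chromosome among all sequencing reads.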
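  67. The method according to claim 66, characterized in that step (4) comprises:
    (a) determining the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
    (b) comparing the ratio obtained in step (a) with a plurality of predetermined thresholds, so as to determine whether mosaicism of the predetermined chromosome is present in the fetus.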
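  68. The method according to claim 67, characterized in that the plurality of predetermined thresholds comprises at least one selected from:
    a seventh threshold, determined based on a plurality of reference samples known to have complete monosomy of the predetermined chromosome,
    an eighth threshold, determined based on a plurality of reference samples known to be mosaic for monosomy of the predetermined chromosome,
    a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome,
    a tenth threshold, determined based on a plurality of reference samples known to have complete trisomy of the predetermined chromosome,
    optionally, a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction smaller than the seventh threshold indicates that the fetus has complete monosomy of the predetermined chromosome;
    a ratio not smaller than the seventh threshold and not greater than the eighth threshold indicates that the fetus is mosaic for monosomy of the predetermined chromosome;
    a ratio greater than the eighth threshold and smaller than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
    a ratio not smaller than the ninth threshold and not greater than the tenth threshold indicates that the fetus is mosaic for trisomy of the predetermined chromosome;
    a ratio greater than the tenth threshold indicates that the fetus has complete trisomy of the predetermined chromosome.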
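  69. The method according to claim 68, characterized in that the seventh threshold is greater than -1 and smaller than 0, optionally -0.85;
    the eighth threshold is greater than the seventh threshold and smaller than 0, optionally -0.3;
    the ninth threshold is greater than 0 and smaller than 1, optionally 0.3;
    the tenth threshold is greater than the ninth threshold and smaller than 1, optionally 0.85.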
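  70. A system for detecting fetal mosaicism, characterized by comprising:
    a first cell-free fetal DNA fraction determination device, which is the device for determining the fraction of cell-free nucleic acids in a biological sample according to claims 21 to 45, configured to perform nucleic acid sequencing on the peripheral blood of a woman pregnant with a fetus so as to obtain a sequencing result consisting of a plurality of sequencing reads, and to determine a first cell-free fetal DNA fraction based on the sequencing reads,
    or, the first cell-free fetal DNA fraction determination device is adapted to estimate the fetal fraction fra.chrY% from chromosome Y as the first cell-free fetal DNA fraction by the following formula:
    fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%,
    wherein
    fra.chry represents the first cell-free fetal DNA fraction,
    chry.ER% represents the percentage of sequencing reads derived from chromosome Y among all sequencing reads in the sequencing result;
    Female.chry.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with a normal female fetus, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads; and
    Man.chry.ER% represents the predetermined mean, over peripheral blood samples from normal males, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads;
    a third cell-free fetal DNA fraction determination device, adapted to determine a third cell-free fetal DNA fraction based on the sequencing reads in the sequencing result derived from a predetermined chromosome; and
    a mosaicism determination device, adapted to determine, based on the first cell-free fetal DNA fraction and the third cell-free fetal DNA fraction, whether mosaicism of the predetermined chromosome is present in the fetus.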
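  71. The system according to claim 70, characterized in that the third cell-free fetal DNA fraction is determined by the following formula:
    fra.chri=2*(chri.ER%/adjust.chri.ER%-1)*100%
    wherein
    fra.chri represents the third cell-free fetal DNA fraction, i is the index of the predetermined chromosome, and i is any integer from 1 to 22;
    chri.ER% represents the percentage of sequencing reads derived from the predetermined chromosome among all sequencing reads in the sequencing result;
    adjust.chri.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with a normal fetus, of the percentage of sequencing reads of cell-free nucleic acids derived from the predetermined chromosome among all sequencing reads.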
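  72. The system according to claim 71, characterized in that the mosaicism determination device further comprises:
    a ratio determination unit, configured to determine the ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction; and
    a comparison unit, configured to compare the obtained ratio with a plurality of predetermined thresholds, so as to determine whether mosaicism of the predetermined chromosome is present in the fetus.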
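  73. The system according to claim 72, characterized in that the plurality of predetermined thresholds comprises at least one selected from:
    a seventh threshold, determined based on a plurality of reference samples known to have complete monosomy of the predetermined chromosome,
    an eighth threshold, determined based on a plurality of reference samples known to be mosaic for monosomy of the predetermined chromosome,
    a ninth threshold, determined based on a plurality of reference samples known to be normal for the predetermined chromosome,
    a tenth threshold, determined based on a plurality of reference samples known to have complete trisomy of the predetermined chromosome,
    optionally, a ratio of the third cell-free fetal DNA fraction to the first cell-free fetal DNA fraction smaller than the seventh threshold indicates that the fetus has complete monosomy of the predetermined chromosome;
    a ratio not smaller than the seventh threshold and not greater than the eighth threshold indicates that the fetus is mosaic for monosomy of the predetermined chromosome;
    a ratio greater than the eighth threshold and smaller than the ninth threshold indicates that the fetus is normal for the predetermined chromosome;
    a ratio not smaller than the ninth threshold and not greater than the tenth threshold indicates that the fetus is mosaic for trisomy of the predetermined chromosome;
    a ratio greater than the tenth threshold indicates that the fetus has complete trisomy of the predetermined chromosome.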
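  74. The system according to claim 73, characterized in that the seventh threshold is greater than -1 and smaller than 0, optionally -0.85;
    the eighth threshold is greater than the seventh threshold and smaller than 0, optionally -0.3;
    the ninth threshold is greater than 0 and smaller than 1, optionally 0.3;
    the tenth threshold is greater than the ninth threshold and smaller than 1, optionally 0.85.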
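  75. A method for detecting fetal mosaicism, characterized by comprising:
    (1) performing cell-free nucleic acid sequencing on the peripheral blood of a woman pregnant with a fetus, so as to obtain a sequencing result consisting of a plurality of sequencing reads;
    (2) determining the proportion xi of sequencing reads derived from chromosome i among all sequencing reads in the sequencing result, wherein i denotes the chromosome index and i is any integer from 1 to 22;
    (3) determining the T value for chromosome i by the formula Ti=(xi-μi)/σi, wherein i denotes the chromosome index and i is any integer from 1 to 22, μi denotes the mean percentage of sequencing reads of chromosome i among all sequencing reads across the samples selected as the reference set in a reference database, and σi denotes the standard deviation of that percentage across the samples selected as the reference set in the reference database;
    (4) determining the L value for chromosome i by the formula Li=log(d(Ti,a))/log(d(T2i,a)), wherein i denotes the chromosome index and i is any integer from 1 to 22,
    T2i=(xi-μi*(1+fra/2))/σi
    d(Ti,a) and d(T2i,a) are probability density functions of the t distribution, a is the degrees of freedom, and fra is the cell-free fetal DNA fraction determined by the method according to claims 1 to 19, or the fetal fraction fra.chrY% estimated from chromosome Y, wherein
    fra.chry=(chry.ER%-Female.chry.ER%)/(Man.chry.ER%-Female.chry.ER%)*100%,
    wherein
    fra.chry represents the cell-free fetal DNA fraction,
    chry.ER% represents the percentage of sequencing reads derived from chromosome Y among all sequencing reads in the sequencing result;
    Female.chry.ER% represents the predetermined mean, over peripheral blood samples from women pregnant with a normal female fetus, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads; and
    Man.chry.ER% represents the predetermined mean, over peripheral blood samples from normal males, of the percentage of sequencing reads of cell-free nucleic acids derived from chromosome Y among all sequencing reads;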
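    (5) if T is not greater than 0, drawing a four-quadrant plot from the T and L values, with the L value on the horizontal axis and the T value on the vertical axis, and dividing the plot into regions by the lines T = a predetermined eleventh threshold and L = a predetermined twelfth threshold,
    wherein
    if the test sample falls in the first quadrant, the fetus is judged to have complete monosomy or monosomy mosaicism of the predetermined chromosome;
    if the test sample falls in the second quadrant, the fetus is judged to be mosaic for monosomy of the predetermined chromosome;
    if the test sample falls in the third quadrant, the fetus is judged to be normal for the predetermined chromosome;
    if the test sample falls in the fourth quadrant, the sample is judged to have a relatively low fetal fraction and its result is not used;
    if T is greater than 0, drawing a four-quadrant plot from the T and L values, with the L value on the horizontal axis and the T value on the vertical axis, and dividing the plot into regions by the lines T = a predetermined thirteenth threshold and L = a predetermined fourteenth threshold,
    wherein
    if the test sample falls in the first quadrant, the fetus is judged to have complete trisomy or trisomy mosaicism of the predetermined chromosome;
    if the test sample falls in the second quadrant, the fetus is judged to be mosaic for trisomy of the predetermined chromosome;
    if the test sample falls in the third quadrant, the fetus is judged to be normal for the predetermined chromosome;
    if the test sample falls in the fourth quadrant, the sample is judged to have a relatively low fetal fraction and its result is not used;
    optionally, the eleventh threshold and the thirteenth threshold are each independently 3, and the twelfth threshold and the fourteenth threshold are each independently 1.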
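  76. A system for determining chromosomal aneuploidy in twins, characterized by comprising:
    an xi value determination device, configured to perform cell-free nucleic acid sequencing on the peripheral blood of a woman pregnant with twins so as to obtain a sequencing result consisting of a plurality of sequencing reads, and to determine the proportion xi of sequencing reads derived from chromosome i among all sequencing reads in the sequencing result, wherein i denotes the chromosome index and i is any integer from 1 to 22;
    a T value determination device, configured to determine the T value for chromosome i by the formula Ti=(xi-μi)/σi, wherein i denotes the chromosome index and i is any integer from 1 to 22, μi denotes the mean percentage of sequencing reads of chromosome i among all sequencing reads across the samples selected as the reference set in a reference database, and σi denotes the standard deviation of that percentage across the samples selected as the reference set in the reference database;
    an L value determination device, configured to determine the L value for chromosome i by the formula Li=log(d(Ti,a))/log(d(T2i,a)), wherein i denotes the chromosome index and i is any integer from 1 to 22,
    T2i=(xi-μi*(1+fra/2))/σi
    d(Ti,a) and d(T2i,a) are probability density functions of the t distribution, a is the degrees of freedom, and fra is the cell-free fetal DNA fraction determined by the method according to claims 1 to 20;
    a second aneuploidy determination device, adapted to draw a four-quadrant plot from the T and L values, with the L value on the horizontal axis and the T value on the vertical axis, and to divide the plot into regions by the lines T = a predetermined fifth threshold and L = a predetermined sixth threshold,
    wherein
    if the test sample falls in the first quadrant, both fetuses of the twins are judged to be trisomic;
    if the test sample falls in the second quadrant, one fetus of the twins is judged to be trisomic and the other normal;
    if the test sample falls in the third quadrant, both fetuses of the twins are judged to be normal;
    if the test sample falls in the fourth quadrant, the twin sample is judged to have a relatively low fetal fraction and its result is not used.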
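The size-based fraction estimate of claims 1 to 19 and the twin-sex rule of claims 40 to 43 can be sketched in Python as below. Assumptions: aligned positions are 1-based and inclusive, so a fragment spans five_prime_pos to three_prime_pos; the "frequency" of a length range is taken as the share of molecules in that range; the search bounds min_len and max_len are hypothetical; and the window with the largest correlation coefficient is kept, as claim 29 states. The claims do not state the units of p and d in d = 0.0334*p + 1.6657, so the sketch fits its own line from control samples rather than reusing those coefficients. For twins, the Y-based fraction counts only male fetal DNA while the size-based fraction counts all fetal DNA, so their ratio is near 1 when both twins are male and near 0 when both are female; 0.35 and 0.7 are the thresholds of claims 43 and 49. All names are hypothetical.

```python
import math
from statistics import fmean


def fragment_length(five_prime_pos: int, three_prime_pos: int) -> int:
    """Claim 8: length from the outermost aligned positions (1-based, inclusive)."""
    if three_prime_pos < five_prime_pos:
        raise ValueError("3' end lies before the 5' end")
    return three_prime_pos - five_prime_pos + 1


def window_frequency(lengths, lo, hi):
    """Share of molecules whose length lies in [lo, hi] (claims 11 and 14)."""
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


def pearson(xs, ys):
    """Pearson correlation; NaN when either series is constant."""
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        return math.nan
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def select_range(samples, fractions, min_len=50, max_len=300, span=20, step=1):
    """Claims 11 and 29: slide a `span`-bp window by `step` bp and keep the one
    with the largest correlation coefficient against the known fractions.
    Returns (r, lo, hi), or None if no window varies across samples."""
    best = None
    for lo in range(min_len, max_len - span + 2, step):
        hi = lo + span - 1
        freqs = [window_frequency(s, lo, hi) for s in samples]
        r = pearson(freqs, fractions)
        if not math.isnan(r) and (best is None or r > best[0]):
            best = (r, lo, hi)
    return best


def fit_line(ps, ds):
    """Claims 15 and 16: least-squares line d = slope * p + intercept."""
    mp, md = fmean(ps), fmean(ds)
    slope = (sum((p - mp) * (d - md) for p, d in zip(ps, ds))
             / sum((p - mp) ** 2 for p in ps))
    return slope, md - slope * mp


def twin_sex(fra_chry, fra_size, low=0.35, high=0.7):
    """Claims 40 to 43: compare the Y-based and size-based fetal fractions."""
    if fra_size <= 0:
        raise ValueError("size-based fetal fraction must be positive")
    ratio = fra_chry / fra_size
    if ratio < low:
        return "both female"
    if ratio > high:
        return "both male"
    return "one male, one female"


# Hypothetical control samples in which the share of 190 bp molecules rises
# with the known fetal fraction (5%, 10%, 15%).
controls = [[166] * 9 + [190], [166] * 8 + [190] * 2, [166] * 7 + [190] * 3]
print(select_range(controls, [5.0, 10.0, 15.0], min_len=150, max_len=220))
```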
PCT/CN2015/085109 2014-07-25 2015-07-24 Method and device for determining a ratio of free nucleic acids in a biological sample and use thereof WO2016011982A1 (zh)

Priority Applications (14)

Application Number Priority Date Filing Date Title
AU2015292020A AU2015292020B2 (en) 2014-07-25 2015-07-24 Method and device for determining a ratio of free nucleic acids in a biological sample and use thereof
PL15825462T PL3178941T3 (pl) 2014-07-25 2015-07-24 Sposób oznaczania frakcji wolnych pozakomórkowych płodowych kwasów nukleinowych w próbce krwi obwodowej kobiety ciężarnej i jego zastosowanie
RS20220024A RS62803B1 (sr) 2014-07-25 2015-07-24 Postupak za određivanje frakcije slobodnih fetalnih nukleinskih kiselina u uzorku periferne krvi trudnice i njihova upotreba
US15/329,148 US20180082012A1 (en) 2014-07-25 2015-07-24 Method and device for determining fraction of cell-free nucleic acids in biological sample and use thereof
EP15825462.3A EP3178941B1 (en) 2014-07-25 2015-07-24 Method for determining the fraction of cell-free fetal nucleic acids in a peripheral blood sample from a pregnant woman and use thereof
SG11201700602WA SG11201700602WA (en) 2014-07-25 2015-07-24 Method and device for determining a ratio of free nucleic acids in a biological sample and use thereof
SI201531771T SI3178941T1 (sl) 2014-07-25 2015-07-24 Postopek za določanje deleža brezceličnih fetalnih nukleinskih kislin v vzorcu periferne krvi nosečnice in njegova uporaba
CA2956105A CA2956105C (en) 2014-07-25 2015-07-24 Method and device for determining fraction of cell-free nucleic acids in biological sample and use therof
RU2017105504A RU2699728C2 (ru) 2014-07-25 2015-07-24 Способ и устройство для определения фракции внеклеточных нуклеиновых кислот в биологическом образце и их применение
ES15825462T ES2903103T3 (es) 2014-07-25 2015-07-24 Método para determinar la fracción de ácidos nucleicos fetales libres de células en una muestra de sangre periférica de una mujer embarazada y uso del mismo
BR112017001481-5A BR112017001481B1 (pt) 2014-07-25 2015-07-24 Método e dispositivo para a determinação de uma fração de ácidos nucleicos fetal livres de células em uma amostra de sangue periférico de uma mulher grávida, sistema para determinar o sexo de gêmeos, e sistema para determinação de uma aneuploidia cromossômica de gêmeos, sistema para detectar quimera fetal
KR1020177004842A KR102018444B1 (ko) 2014-07-25 2015-07-24 생물학적 샘플 중의 무세포 핵산의 분획을 결정하기 위한 방법 및 장치 및 이의 용도
HRP20220045TT HRP20220045T1 (hr) 2014-07-25 2015-07-24 Postupak za određivanje frakcije slobodnih fetalnih nukleinskih kiselina u uzorku periferne krvi trudnice i njihova uporaba
DK15825462.3T DK3178941T3 (da) 2014-07-25 2015-07-24 Fremgangsmåde til bestemmelse af fraktionen af cellefrie føtale nukleinsyrer i en prøve af perifert blod fra en gravid kvinde og anvendelse deraf

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410359726.4 2014-07-25
CN201410359726 2014-07-25

Publications (1)

Publication Number Publication Date
WO2016011982A1 true WO2016011982A1 (zh) 2016-01-28

Family

ID=55162536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/085109 WO2016011982A1 (zh) 2014-07-25 2015-07-24 Method and device for determining a ratio of free nucleic acids in a biological sample and use thereof

Country Status (17)

Country Link
US (1) US20180082012A1 (zh)
EP (1) EP3178941B1 (zh)
KR (1) KR102018444B1 (zh)
CN (1) CN105296606B (zh)
AU (1) AU2015292020B2 (zh)
CA (1) CA2956105C (zh)
DK (1) DK3178941T3 (zh)
ES (1) ES2903103T3 (zh)
HK (1) HK1213601A1 (zh)
HR (1) HRP20220045T1 (zh)
HU (1) HUE059031T2 (zh)
PL (1) PL3178941T3 (zh)
RS (1) RS62803B1 (zh)
RU (1) RU2699728C2 (zh)
SG (1) SG11201700602WA (zh)
SI (1) SI3178941T1 (zh)
WO (1) WO2016011982A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018178700A1 (en) * 2017-03-31 2018-10-04 Premaitha Limited Method of detecting a fetal chromosomal abnormality
CN110191964A (zh) * 2017-01-24 2019-08-30 深圳华大基因股份有限公司 Method and device for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample

Families Citing this family (20)

Publication number Priority date Publication date Assignee Title
WO2012129363A2 (en) 2011-03-24 2012-09-27 President And Fellows Of Harvard College Single cell nucleic acid detection and analysis
US20160040229A1 (en) 2013-08-16 2016-02-11 Guardant Health, Inc. Systems and methods to detect rare mutations and copy number variation
IL305303A (en) 2012-09-04 2023-10-01 Guardant Health Inc Systems and methods for detecting rare mutations and changes in number of copies
US10876152B2 (en) 2012-09-04 2020-12-29 Guardant Health, Inc. Systems and methods to detect rare mutations and copy number variation
US11913065B2 (en) 2012-09-04 2024-02-27 Guardent Health, Inc. Systems and methods to detect rare mutations and copy number variation
JP6571665B2 (ja) 2013-12-28 2019-09-04 ガーダント ヘルス, インコーポレイテッド 遺伝的バリアントを検出するための方法およびシステム
CN108603228B (zh) 2015-12-17 2023-09-01 夸登特健康公司 通过分析无细胞dna确定肿瘤基因拷贝数的方法
ES2840003T3 (es) 2016-09-30 2021-07-06 Guardant Health Inc Métodos para análisis multi-resolución de ácidos nucleicos libres de células
US9850523B1 (en) 2016-09-30 2017-12-26 Guardant Health, Inc. Methods for multi-resolution analysis of cell-free nucleic acids
TW202348802A (zh) * 2017-01-25 2023-12-16 香港中文大學 Diagnostic applications using nucleic acid fragments
US11535896B2 (en) * 2017-05-15 2022-12-27 Katholieke Universiteit Leuven Method for analysing cell-free nucleic acids
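CN107239676B (zh) * 2017-05-17 2018-04-17 东莞博奥木华基因科技有限公司 Sequence data processing device for embryonic chromosomes
CN108733979A (zh) * 2017-10-30 2018-11-02 成都凡迪医疗器械有限公司 GC content calibration method and device for NIPT, and computer-readable storage medium
CN108060218A (zh) * 2017-11-14 2018-05-22 广州精科医学检验所有限公司 Method for screening nucleic acid fragments within a predetermined range in a nucleic acid sequencing library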
CA3111887A1 (en) 2018-09-27 2020-04-02 Grail, Inc. Methylation markers and targeted methylation probe panel
AU2020216438A1 (en) 2019-01-31 2021-07-29 Guardant Health, Inc. Compositions and methods for isolating cell-free DNA
US11211147B2 (en) 2020-02-18 2021-12-28 Tempus Labs, Inc. Estimation of circulating tumor fraction using off-target reads of targeted-panel sequencing
US11211144B2 (en) 2020-02-18 2021-12-28 Tempus Labs, Inc. Methods and systems for refining copy number variation in a liquid biopsy assay
US11475981B2 (en) 2020-02-18 2022-10-18 Tempus Labs, Inc. Methods and systems for dynamic variant thresholding in a liquid biopsy assay
WO2023010242A1 (zh) * 2021-08-02 2023-02-09 深圳华大生命科学研究院 Method and system for estimating fetal nucleic acid concentration in non-invasive prenatal genetic testing data

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101137760A (zh) * 2005-03-18 2008-03-05 香港中文大学 Method for detecting chromosomal aneuploidy
EP2524056A1 (en) * 2010-01-15 2012-11-21 The University Of British Columbia Multiplex amplification for the detection of nucleic acid variations
CN103923987A (zh) * 2014-04-01 2014-07-16 中山大学达安基因股份有限公司 Method for detecting trisomy 13, 18 and 21 syndromes based on high-throughput sequencing

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
EP2633311A4 (en) * 2010-10-26 2014-05-07 Univ Stanford NON-INVASIVE FETAL GENE SCREENING BY SEQUENCING ANALYSIS
US9892230B2 (en) * 2012-03-08 2018-02-13 The Chinese University Of Hong Kong Size-based analysis of fetal or tumor DNA fraction in plasma
EP3026124A1 (en) * 2012-10-31 2016-06-01 Genesupport SA Non-invasive method for detecting a fetal chromosomal aneuploidy
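WO2014075228A1 (zh) * 2012-11-13 2014-05-22 深圳华大基因医学有限公司 Method, system and computer-readable medium for determining chromosome number abnormalities in a biological sample
CN103525939B (zh) * 2013-10-28 2015-12-02 博奥生物集团有限公司 Method and system for non-invasive detection of fetal chromosomal aneuploidy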

Non-Patent Citations (2)

Title
See also references of EP3178941A4 *
ZHAO, YUEHONG; ET AL.: "Quantification of Fetal DNA by Using Methylation-Based Discrimination", JOURNAL OF XINXIANG MEDICAL UNIVERSITY, vol. 29, no. 8, 31 August 2012 (2012-08-31), pages 589 - 591, XP008184613 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
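CN110191964A (zh) * 2017-01-24 2019-08-30 深圳华大基因股份有限公司 Method and device for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample
CN110191964B (zh) * 2017-01-24 2023-12-05 深圳华大基因股份有限公司 Method and device for determining the fraction of cell-free nucleic acids of a predetermined origin in a biological sample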
WO2018178700A1 (en) * 2017-03-31 2018-10-04 Premaitha Limited Method of detecting a fetal chromosomal abnormality

Also Published As

Publication number Publication date
HUE059031T2 (hu) 2022-10-28
RU2017105504A (ru) 2018-08-27
AU2015292020A1 (en) 2017-02-16
RS62803B1 (sr) 2022-02-28
US20180082012A1 (en) 2018-03-22
ES2903103T3 (es) 2022-03-31
PL3178941T3 (pl) 2022-02-14
CA2956105C (en) 2021-04-27
EP3178941B1 (en) 2021-10-13
EP3178941A4 (en) 2017-12-06
KR102018444B1 (ko) 2019-09-04
SI3178941T1 (sl) 2022-04-29
RU2699728C2 (ru) 2019-09-09
CN105296606B (zh) 2019-08-09
RU2017105504A3 (zh) 2018-08-27
SG11201700602WA (en) 2017-03-30
DK3178941T3 (da) 2022-01-17
CN105296606A (zh) 2016-02-03
BR112017001481A2 (pt) 2017-12-05
AU2015292020B2 (en) 2018-07-05
CA2956105A1 (en) 2016-01-28
KR20170036734A (ko) 2017-04-03
HRP20220045T1 (hr) 2022-04-15
EP3178941A1 (en) 2017-06-14
HK1213601A1 (zh) 2016-07-08

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15825462; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2956105; Country of ref document: CA
NENP Non-entry into the national phase. Ref country code: DE
WWE WIPO information: entry into national phase. Ref document number: 15329148; Country of ref document: US
REG Reference to national code. Ref country code: BR; Ref legal event code: B01A; Ref document number: 112017001481; Country of ref document: BR
ENP Entry into the national phase. Ref document number: 2015292020; Country of ref document: AU; Date of ref document: 20150724; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 20177004842; Country of ref document: KR; Kind code of ref document: A
REEP Request for entry into the European phase. Ref document number: 2015825462; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 2015825462; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 2017105504; Country of ref document: RU; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 112017001481; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20170124