WO2019227420A1 - Method and system for determining presence of triploids in male sample to be tested, and computer readable medium - Google Patents

Method and system for determining presence of triploids in male sample to be tested, and computer readable medium

Info

Publication number
WO2019227420A1
Authority
WO
WIPO (PCT)
Prior art keywords
chromosome
average
sequencing
triploid
depth
Prior art date
Application number
PCT/CN2018/089328
Other languages
French (fr)
Chinese (zh)
Inventor
柴相花
王军
李佳霖
王宇秋
陈丽娜
袁玉英
张红云
彭智宇
刘娜
尹烨
Original Assignee
深圳华大临床检验中心
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大临床检验中心 filed Critical 深圳华大临床检验中心
Priority to CN201880056925.4A priority Critical patent/CN111373054A/en
Priority to PCT/CN2018/089328 priority patent/WO2019227420A1/en
Publication of WO2019227420A1 publication Critical patent/WO2019227420A1/en

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids

Definitions

  • The invention relates to the field of biomedicine. Specifically, it relates to a method, a system, and a computer-readable medium for determining whether triploidy is present in a male test sample.
  • Triploidy means that the patient's somatic cells carry three sets of chromosomes, with one extra chromosome added to each pair, for a total of 69 chromosomes (3n). The karyotypes are 69,XXY; 69,XYY; and 69,XXX.
  • Triploid fetuses account for 2-3% of pregnancies and 15% of early miscarriages.
  • The main causes of triploidy are digyny (fertilization contributing two maternal chromosome sets) and diandry (fertilization contributing two paternal chromosome sets).
  • Current methods for detecting triploidy include fluorescence in situ hybridization (FISH), comparative genomic hybridization (CGH), single nucleotide polymorphism arrays (SNP array), short tandem repeat (STR) analysis, and quantitative real-time polymerase chain reaction (qPCR).
  • FISH: fluorescence in situ hybridization
  • CGH: comparative genomic hybridization
  • SNP array: single nucleotide polymorphism array
  • STR: short tandem repeat analysis
  • qPCR: quantitative real-time polymerase chain reaction
  • SNP arrays can detect aneuploidy of all chromosomes and some monogenic disorders, but they are time-consuming, costly, and difficult to analyze. STR analysis is simple to perform and highly accurate, but it is locus-limited and serves only a single detection purpose. The operational inconvenience of these methods limits their large-scale application.
  • qPCR is prone to allele dropout or preferential allele amplification, with an incidence of 10% to 25%, which seriously affects the accuracy of the analysis results.
  • the present invention aims to solve at least one of the technical problems existing in the prior art.
  • the present invention proposes a method for determining whether a triploid exists in a male test sample.
  • the method includes: (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; (2) based on the alignment result of step (1), determining the average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome; (3) for each of the at least one autosome, determining the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DRi, where i represents the chromosome number; and (4) based on the DRi obtained in step (3), determining whether triploidy is present in the male test sample.
  • with this method, male triploidy can be detected from low-coverage sequencing data. Compared with the prior art, the detection cost and turnaround time are greatly reduced, and the accuracy of the detection result is high.
  • the invention proposes a system for determining whether a triploid is present in a male sample to be tested.
  • the system includes: an alignment device configured to align a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; an average sequencing depth determination device, which is connected to the alignment device and configured to determine the average sequencing depth of predetermined chromosomes based on the alignment result obtained by the alignment device, the predetermined chromosomes including the Y chromosome and at least one autosome; a DRi determination device, which is connected to the average sequencing depth determination device and configured to determine, for each of the at least one autosome, the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DRi, where i represents the chromosome number; and a determination device, which is connected to the DRi determination device and configured to determine, based on the DRi obtained by the DRi determination device, whether triploidy is present in the male test sample.
  • with this system, male triploidy can be detected from low-coverage sequencing data. Compared with the prior art, the detection cost and turnaround time are greatly reduced, and the accuracy of the detection result is high.
  • the invention proposes a computer-readable medium.
  • instructions are stored in the computer-readable medium, and the instructions are adapted to be executed to perform the following steps for determining whether triploidy is present in a male test sample: (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; (2) based on the alignment result of step (1), determining the average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome; (3) for each of the at least one autosome, determining the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DRi, where i represents the chromosome number; and (4) based on the DRi obtained in step (3), determining whether triploidy is present in the male test sample.
  • with the computer-readable medium of the embodiment of the present invention, male triploidy can be detected from low-coverage sequencing data. Compared with the prior art, the detection cost and turnaround time are greatly reduced, and the accuracy of the detection result is high.
  • FIG. 1 is a schematic structural diagram of a system for determining whether a triploid exists in a male test sample according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a determination device according to an embodiment of the present invention.
  • FIG. 3 is an average depth ratio distribution of a test set according to an embodiment of the present invention, wherein “o” represents a negative sample, “x” represents a positive sample, and dashed lines represent four boundaries for judging an unknown sample.
  • the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. A feature defined as "first" or "second" may therefore explicitly or implicitly include one or more of that feature. Further, in the description of the present invention, unless otherwise stated, "a plurality" means two or more.
  • the present invention provides a method for determining whether a triploid exists in a male sample to be tested.
  • the method includes: (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; (2) based on the alignment result of step (1), determining the average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome; (3) for each of the at least one autosome, determining the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DRi, where i represents the chromosome number; and (4) based on the DRi obtained in step (3), determining whether triploidy is present in the male test sample.
  • male triploidy can thus be detected from low-coverage sequencing data. Compared with the prior art, the detection cost and turnaround time are greatly reduced, and the accuracy of the detection result is high.
  • the sequencing result from the male test sample is compared with a reference sequence.
  • the alignment may be performed with SOAP v2.20.
  • the reference sequence may be the human reference sequence Hg19.
  • the triploid is XXY or XYY.
  • the sequencing result is from low-depth sequencing.
  • the method is particularly suitable for analysis of low-depth sequencing data.
  • the sample to be tested is from aborted tissue. Such material is easy to obtain, which further reduces the testing cost.
  • the average sequencing depth of the predetermined chromosome is determined based on: (a) the number of sequencing reads that can be aligned to the reference sequence of the predetermined chromosome; (b) the length of the reference sequence of the predetermined chromosome; and (c) the average length of the plurality of sequencing reads.
  • the sequencing sequence capable of being aligned with a reference sequence of the predetermined chromosome is a uniquely aligned sequence.
  • in step (2), the average sequencing depth is determined according to the following formula: $D_i = \dfrac{R_i \times R\_len}{C\_len_i}$, where:
  • $D_i$ represents the average sequencing depth of chromosome i;
  • i is at least one integer in the range of 1 to 24, where 23 and 24 represent the X and Y chromosomes, respectively;
  • $R_i$ represents the number of sequencing reads that can be aligned to the reference sequence of chromosome i;
  • $R\_len$ represents the average length of the plurality of sequencing reads; and
  • $C\_len_i$ represents the length of chromosome i in the reference sequence.
  • the average sequencing depth of each chromosome is obtained to determine the ratio of the average sequencing depth of each autosome to the average sequencing depth of the Y chromosome.
  • the predetermined chromosome includes at least 2 autosomes, preferably at least 10 autosomes, and most preferably 22 autosomes.
  • step (4) further includes: (4-1) for each of the at least one autosome in the predetermined chromosomes, determining the ratio of the DRi of the autosome to a reference depth ratio, denoted DDRi; (4-2) based on the DDRi obtained in step (4-1), determining the average depth ratio of the at least one autosome in the predetermined chromosomes; and (4-3) based on the average depth ratio obtained in step (4-2), determining whether triploidy is present in the male test sample. Determining the presence of triploidy from the average depth ratio in step (4) gives a more accurate result.
  • the reference depth ratio is determined in advance based on a plurality of control samples having a known triploid state.
  • the control samples have no triploidy with respect to the sex chromosomes.
  • the reference depth ratio is determined in advance based on at least 100, preferably 1,000, control samples having a known triploid state.
  • the reference depth ratio can be obtained by randomly selecting a plurality of (e.g., 1000) negative control samples that have no triploidy with respect to the sex chromosomes as a reference set, and calculating, for each control sample in the reference set, the ratio $DR'_i = D'_i / D'_{24}$.
  • here $D'_i$ represents the average sequencing depth of autosome i in each control sample and $D'_{24}$ represents the average sequencing depth of the Y chromosome in each control sample. $D'_i$ is obtained in the same manner as described above for $D_i$: each control sample is sequenced, its sequencing reads are aligned with the reference sequence, and the average sequencing depth of each autosome is obtained from the number of reads that can be aligned to the reference sequence of chromosome i and the average length of the sequencing reads. The mean of $DR'_i$ over all samples in the reference set is then calculated as $\overline{DR'_i} = \frac{1}{n}\sum_{j=1}^{n} DR'_{i,j}$, where n is the number of control samples in the reference set, $DR'_{i,j}$ is the value of $DR'_i$ in the j-th control sample, and $\overline{DR'_i}$ is the reference depth ratio of chromosome i.
  • the ratio DDRi is determined according to the formula $DDR_i = DR_i / \overline{DR'_i}$, where $\overline{DR'_i}$ is the reference depth ratio of chromosome i.
  • the "average depth ratio of the at least one autosome in the predetermined chromosome" described in this application refers to the mean of the DDRi values of the at least one autosome in the predetermined chromosomes, that is, the mean of the ratios of each autosome's DRi to its reference depth ratio.
  • the average depth ratio of the at least one autosome in the predetermined chromosomes, written here as $\overline{DDR}$, is determined according to the formula $\overline{DDR} = \frac{1}{k}\sum_{i} DDR_i$, where the sum runs over the k autosomes included in the predetermined chromosomes.
  • when the predetermined chromosomes include the Y chromosome and all autosomes, the average depth ratio of all the autosomes is determined according to the formula $\overline{DDR} = \frac{1}{22}\sum_{i=1}^{22} DDR_i$.
  • in step (4-3), the average depth ratio is compared with a threshold value to determine whether triploidy is present in the male test sample.
  • an average depth ratio not lower than the first threshold is an indication that the male test sample is an XXY triploid.
  • an average depth ratio not exceeding the second threshold is an indication that the male test sample is an XYY triploid.
  • the first threshold and the second threshold are determined based on a plurality of reference samples of known triploid types.
  • the first threshold and the second threshold are determined based on 100 to 10,000 reference samples of known triploid type.
  • the first threshold value is at least 1.14, preferably at least 1.15, and the second threshold value is not more than 0.9, preferably 0.88, and more preferably 0.85.
  • the method further includes: an average depth ratio lying within a predetermined interval is an indication that the male test sample is non-triploid, the predetermined interval being determined based on the first threshold and the second threshold.
  • the left end value of the predetermined interval range is not less than the second threshold value, and the right end value of the predetermined interval range is not higher than the first threshold value.
  • the difference between the left end value and the second threshold value and the difference between the right end value and the first threshold value are independently not less than 0.02, preferably not less than 0.03.
  • the invention proposes a system for determining whether a triploid is present in a male sample to be tested.
  • the system includes:
  • the comparison device 100 is configured to compare a sequencing result from the male test sample with a reference sequence, and the sequencing result is composed of multiple sequencing sequences.
  • the sequencing reads are aligned with a reference genome sequence. The alignment may use SOAP (v2.20) to align the reads to the human reference sequence (Hg19) and obtain an alignment file.
  • the sequencing sequence capable of being aligned with the reference sequence of the predetermined chromosome is a unique alignment sequence.
  • An average sequencing depth determination device 200 which is connected to the comparison device 100, and is configured to determine an average sequencing depth of a predetermined chromosome based on the comparison result obtained by the comparison device, where the predetermined chromosome includes Y chromosome and at least one autosome.
  • the average sequencing depth of the predetermined chromosome is determined based on: (a) the number of sequencing reads that can be aligned to the reference sequence of the predetermined chromosome; (b) the length of the reference sequence of the predetermined chromosome; and (c) the average length of the plurality of sequencing reads. Specifically, the average sequencing depth is determined according to the formula $D_i = \dfrac{R_i \times R\_len}{C\_len_i}$, where $D_i$ represents the average sequencing depth of chromosome i; i is at least one integer in the range of 1 to 24, with 23 and 24 representing the X and Y chromosomes, respectively; $R_i$ represents the number of sequencing reads that can be aligned to the reference sequence of chromosome i; $R\_len$ represents the average length of the plurality of sequencing reads; and $C\_len_i$ represents the length of chromosome i in the reference sequence.
  • the predetermined chromosome includes at least 2 autosomes, preferably at least 10 autosomes, and most preferably 22 autosomes.
  • a DRi determining device 300, which is connected to the average sequencing depth determination device 200 and configured to determine, for each of the at least one autosome, the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DRi.
  • DRi is calculated as $DR_i = D_i / D_{24}$, where $D_i$ represents the average sequencing depth of autosome i and $D_{24}$ represents the average sequencing depth of the Y chromosome.
  • a determining device 400 which is connected to the DR i determining device 300 and configured to determine whether the triploid exists in the male test sample based on the DR i obtained in the DR i determining device 300.
  • the triploid is XXY or XYY.
  • the determining device 400 includes:
  • a DDRi determination unit 401, which is configured to determine, for each of the at least one autosome in the predetermined chromosomes, the ratio of the DRi of the autosome to the reference depth ratio, denoted DDRi.
  • the ratio DDRi can be calculated according to the formula $DDR_i = DR_i / \overline{DR'_i}$, where $DR_i$ represents the ratio of the average sequencing depth of autosome i to that of the Y chromosome and $\overline{DR'_i}$ represents the reference depth ratio.
  • the reference depth ratio can be obtained as follows: randomly select multiple (at least 100, preferably 1000) negative control samples that have no triploidy with respect to the sex chromosomes as a reference set; calculate, for each control sample in the reference set, the ratio $DR'_i$ (i = 1, 2, 3, ..., 22) of the average sequencing depth of autosome i to that of the Y chromosome; and then calculate the mean of $DR'_i$ over all samples in the reference set, $\overline{DR'_i} = \frac{1}{n}\sum_{j=1}^{n} DR'_{i,j}$,
  • where n is the number of control samples in the reference set and $DR'_{i,j}$ is the value of $DR'_i$ in the j-th control sample.
  • the determining unit 402 is connected to the DDRi determining unit 401 and is configured to determine the average depth ratio of the at least one autosome in the predetermined chromosomes based on the DDRi obtained by the DDRi determining unit 401.
  • the average depth ratio of the at least one autosome in the predetermined chromosomes refers to the mean of the DDRi values of the at least one autosome, that is, the mean of the ratios of each autosome's DRi to its reference depth ratio. Written here as $\overline{DDR}$, it is calculated according to the formula $\overline{DDR} = \frac{1}{k}\sum_{i} DDR_i$, where the sum runs over the k autosomes included in the predetermined chromosomes.
  • when the predetermined chromosomes include the Y chromosome and all autosomes, the average depth ratio of all the autosomes can be determined according to the formula $\overline{DDR} = \frac{1}{22}\sum_{i=1}^{22} DDR_i$.
  • the average depth ratio is compared with a threshold value to determine whether triploidy is present in the male test sample.
  • an average depth ratio not lower than the first threshold is an indication that the male test sample is an XXY triploid.
  • an average depth ratio not exceeding the second threshold is an indication that the male test sample is an XYY triploid.
  • the first threshold and the second threshold are determined based on a plurality of reference samples of known triploid type, for example 100 to 10,000 reference samples of known triploid type.
  • the first threshold value is at least 1.14, preferably at least 1.15
  • the second threshold value is not more than 0.9, preferably 0.88, and more preferably 0.85.
  • an average depth ratio lying within a predetermined interval is an indication that the male test sample is non-triploid, the predetermined interval being determined based on the first threshold and the second threshold.
  • the left end value of the predetermined interval range is not less than the second threshold value
  • the right end value of the predetermined interval range is not higher than the first threshold value.
  • the difference between the left end value and the second threshold value and the difference between the right end value and the first threshold value are independently not less than 0.02, preferably not less than 0.03.
  • the determination criterion set by the determination unit 403 is as follows:
  • the triploid is XXY or XYY.
  • the sequencing result is from low-depth sequencing.
  • the system according to an embodiment of the present invention is particularly suitable for analysis of low-depth sequencing data.
  • the sample to be tested is from aborted tissue. Such material is easy to obtain, which further reduces the testing cost.
  • the detection of male triploid based on low-coverage sequencing data can be realized. Compared with the prior art, the detection cost is greatly reduced, the cycle is greatly shortened, and the accuracy of the detection result is high.
  • a total of 1438 male samples were used for the implementation of the technical scheme and the effect evaluation.
  • the total sample includes 1370 negative samples and 68 positive samples, of which 6 positive samples were retested once.
  • the sequencing data for all samples are sets of 35 bp reads obtained by single-end sequencing (SE 35 bp) on the BGISEQ-500 platform. Based on these read sets, the specific implementation steps are as follows:
  • R_len represents the average sequence length of multiple sequencing sequences
  • C_len i represents the length of chromosome i in the reference sequence
  • n represents the total number of samples in the reference set, that is, 1000; the calculation results are shown in Table 1 (mean ratio of the average sequencing depth of each autosome to that of the Y chromosome, calculated from 1000 negative samples).
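
The reference-set averaging and the threshold comparison described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: the function names, the `margin` parameter, and the "indeterminate" label for values that fall between an interval end and a threshold are hypothetical and introduced only for illustration. The default thresholds (1.15 and 0.85) and the 0.03 margin are the preferred values stated in the text.

```python
from statistics import mean
from typing import Dict, List


def reference_depth_ratios(control_drs: List[Dict[int, float]]) -> Dict[int, float]:
    """Mean autosome-to-Y depth ratio DR'_i over all control samples.

    Each element of control_drs maps an autosome number i to DR'_i for one
    negative control sample.
    """
    if not control_drs:
        raise ValueError("reference set is empty")
    return {i: mean(sample[i] for sample in control_drs) for i in control_drs[0]}


def average_depth_ratio(sample_dr: Dict[int, float], reference: Dict[int, float]) -> float:
    """Mean of DDR_i = DR_i / (reference depth ratio of chromosome i)."""
    return mean(sample_dr[i] / reference[i] for i in sample_dr)


def classify(avg_ddr: float,
             first_threshold: float = 1.15,
             second_threshold: float = 0.85,
             margin: float = 0.03) -> str:
    """Compare the average depth ratio with the two thresholds.

    The non-triploid interval is [second_threshold + margin,
    first_threshold - margin]; "indeterminate" is a hypothetical label
    for values in the gaps, which the source does not name.
    """
    if avg_ddr >= first_threshold:
        return "XXY triploid"
    if avg_ddr <= second_threshold:
        return "XYY triploid"
    if second_threshold + margin <= avg_ddr <= first_threshold - margin:
        return "non-triploid"
    return "indeterminate"
```

For example, with a toy reference set in which every autosome has a reference depth ratio near 2.1, a test sample whose DRi values are all near 3.15 yields an average depth ratio near 1.5 and would be reported as an XXY triploid by this sketch.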

Abstract

A method for determining the presence of triploids in a male sample to be tested. Said method comprises: (1) comparing a sequencing result from the male sample to be tested with a reference sequence, the sequencing result comprising a plurality of sequencing sequences; (2) determining the average sequencing depth of predetermined chromosomes on the basis of the comparison result in step (1), the predetermined chromosomes including a Y chromosome and at least one autosome; (3) determining, for each of said at least one autosome, a ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, and marking same as DRi, i representing the serial number of the chromosome; and (4) determining, on the basis of the DRi obtained in step (3), the presence of the triploids in the male sample to be tested.
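
Steps (2) and (3) of the abstract can be illustrated with a short Python sketch. It assumes that per-chromosome counts of aligned reads and chromosome lengths are already available from the alignment step, and that the average depth is the read count times the average read length divided by the chromosome length, as the definitions of the quantities in the description suggest. The function names and the dictionary layout are hypothetical; chromosome 24 stands for Y, following the numbering used in the description.

```python
from typing import Dict

Y_CHROM = 24  # the description numbers X as 23 and Y as 24


def average_depth(read_count: int, mean_read_len: float, chrom_len: int) -> float:
    """Average sequencing depth: aligned reads * mean read length / chromosome length."""
    if chrom_len <= 0:
        raise ValueError("chromosome length must be positive")
    return read_count * mean_read_len / chrom_len


def depth_ratios(read_counts: Dict[int, int],
                 chrom_lens: Dict[int, int],
                 mean_read_len: float) -> Dict[int, float]:
    """DR_i = D_i / D_24 for every autosome (1..22) present in read_counts."""
    d_y = average_depth(read_counts[Y_CHROM], mean_read_len, chrom_lens[Y_CHROM])
    if d_y == 0:
        raise ValueError("no reads aligned to the Y chromosome")
    return {
        i: average_depth(read_counts[i], mean_read_len, chrom_lens[i]) / d_y
        for i in sorted(read_counts)
        if 1 <= i <= 22  # autosomes only; X (23) and Y (24) are skipped
    }
```

Calling `depth_ratios({1: 40, 2: 80, 24: 10}, {1: 1000, 2: 2000, 24: 500}, 35.0)` with these toy numbers returns one ratio for each of chromosomes 1 and 2; real inputs would use uniquely aligned read counts and the chromosome lengths of the chosen reference.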

Description

Method, system, and computer-readable medium for determining whether triploidy is present in a male test sample
Technical field
The invention relates to the field of biomedicine. Specifically, it relates to a method, a system, and a computer-readable medium for determining whether triploidy is present in a male test sample.
Background art
Triploidy means that the patient's somatic cells carry three sets of chromosomes, with one extra chromosome added to each pair, for a total of 69 chromosomes (3n); the karyotypes are 69,XXY, 69,XYY, and 69,XXX. Triploid fetuses account for 2-3% of pregnancies and 15% of early miscarriages. Triploidy also occurs in IVF, where its incidence is as high as 2%-10%. The main causes of triploidy are digyny and diandry.
Current methods for detecting triploidy mainly include fluorescence in situ hybridization (FISH), comparative genomic hybridization (CGH), single nucleotide polymorphism arrays (SNP array), short tandem repeat (STR) analysis, and quantitative real-time polymerase chain reaction (qPCR). FISH is simple and fast, but its resolution and accuracy are low, and it is limited by the number of probes in a single hybridization. CGH can analyze all chromosomes, but the analysis takes a long time, it can detect only the XYY and XXY types, and it cannot detect balanced translocations or complex chromosomal aberrations. SNP arrays can detect aneuploidy of all chromosomes and some monogenic disorders, but they are time-consuming, costly, and difficult to analyze. STR analysis is simple to perform and highly accurate, but it is locus-limited and serves only a single detection purpose. The operational inconvenience of these methods limits their large-scale application. qPCR is prone to allele dropout or preferential allele amplification, with an incidence of 10% to 25%, which seriously affects the accuracy of the analysis results.
Therefore, detection methods for triploidy remain to be developed and improved.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art.
To this end, in a first aspect, the present invention proposes a method for determining whether triploidy is present in a male test sample. According to an embodiment of the present invention, the method includes: (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; (2) based on the alignment result of step (1), determining the average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome; (3) for each of the at least one autosome, determining the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DRi, where i represents the chromosome number; and (4) based on the DRi obtained in step (3), determining whether triploidy is present in the male test sample. With the method according to the embodiment of the present invention, male triploidy can be detected from low-coverage sequencing data. Compared with the prior art, the detection cost and turnaround time are greatly reduced, and the accuracy of the detection result is high.
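
As an idealized illustration of why the autosome-to-Y depth ratio separates the karyotypes (an assumption for exposition, not a statement from the source): suppose average sequencing depth were exactly proportional to chromosome copy number, with no contamination and no mapping bias. Dividing each DRi by the value expected for a 46,XY control then gives:

```latex
\begin{aligned}
\text{46,XY (control):}\quad & DR_i \approx \tfrac{2}{1} = 2, & \tfrac{DR_i}{2} &= 1 \\
\text{69,XXY:}\quad & DR_i \approx \tfrac{3}{1} = 3, & \tfrac{DR_i}{2} &= 1.5 \\
\text{69,XYY:}\quad & DR_i \approx \tfrac{3}{2} = 1.5, & \tfrac{DR_i}{2} &= 0.75
\end{aligned}
```

Under this simplification an XXY triploid sits above 1 and an XYY triploid below 1. Real data deviate from these ideal values, so any practical cut-off would need to be set empirically from samples of known karyotype.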
In a second aspect, the present invention proposes a system for determining whether triploidy is present in a male test sample. According to an embodiment of the present invention, the system includes: an alignment device configured to align a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; an average sequencing depth determination device, which is connected to the alignment device and configured to determine the average sequencing depth of predetermined chromosomes based on the alignment result obtained by the alignment device, the predetermined chromosomes including the Y chromosome and at least one autosome; a DRi determination device, which is connected to the average sequencing depth determination device and configured to determine, for each of the at least one autosome, the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DRi, where i represents the chromosome number; and a determination device, which is connected to the DRi determination device and configured to determine, based on the DRi obtained by the DRi determination device, whether triploidy is present in the male test sample. With the system according to the embodiment of the present invention, male triploidy can be detected from low-coverage sequencing data. Compared with the prior art, the detection cost and turnaround time are greatly reduced, and the accuracy of the detection result is high.
In a third aspect, the present invention provides a computer-readable medium. According to an embodiment of the invention, the computer-readable medium stores instructions adapted to be executed by a processor to determine whether triploidy is present in a male test sample by the following steps: (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; (2) based on the alignment result of step (1), determining the average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome; (3) for each of the at least one autosome, determining the ratio of the average sequencing depth of that autosome to the average sequencing depth of the Y chromosome, denoted $DR_i$, where i is the chromosome number; and (4) based on the $DR_i$ obtained in step (3), determining whether triploidy is present in the male test sample. The computer-readable medium according to embodiments of the invention can detect male triploidy from low-coverage sequencing data; compared with the prior art, the detection cost is greatly reduced, the turnaround time is greatly shortened, and the accuracy of the results is high.
Additional aspects and advantages of the invention are set out in part in the description below; in part they will become apparent from that description or be learned through practice of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the drawings, in which:
FIG. 1 is a schematic structural diagram of a system for determining whether triploidy is present in a male test sample according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a judgment device according to an embodiment of the invention; and
FIG. 3 is a distribution plot of the average depth ratio for a test set according to an embodiment of the invention, in which "o" denotes a negative sample, "x" denotes a positive sample, and the dashed lines mark the four boundaries used for judging unknown samples.
DETAILED DESCRIPTION
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. Further, in the description of the invention, "a plurality of" means two or more unless otherwise stated.
Method for determining whether triploidy is present in a male test sample
In a first aspect, the present invention provides a method for determining whether triploidy is present in a male test sample. According to an embodiment of the invention, the method comprises: (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads; (2) based on the alignment result of step (1), determining the average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome; (3) for each of the at least one autosome, determining the ratio of the average sequencing depth of that autosome to the average sequencing depth of the Y chromosome, denoted $DR_i$, where i is the chromosome number; and (4) based on the $DR_i$ obtained in step (3), determining whether triploidy is present in the male test sample. The method according to embodiments of the invention can detect male triploidy from low-coverage sequencing data; compared with the prior art, the detection cost is greatly reduced, the turnaround time is greatly shortened, and the accuracy of the results is high.
According to an embodiment of the invention, the alignment of the sequencing result from the male test sample with the reference sequence may be performed with SOAP (v2.20): the reads obtained by sequencing are aligned to the human reference genome sequence (Hg19) to produce an alignment file, from which the number of reads that align to the reference genome sequence is determined.
According to an embodiment of the invention, the triploid is XXY or XYY.
According to an embodiment of the invention, the sequencing result is obtained from low-depth sequencing. The method according to embodiments of the invention is particularly suitable for the analysis of low-depth sequencing data.
According to an embodiment of the invention, the test sample is derived from miscarriage tissue. Sampling is therefore convenient, which further reduces the detection cost.
According to an embodiment of the invention, the average sequencing depth of a predetermined chromosome is determined from: (a) the number of reads that align to the reference sequence of the predetermined chromosome; (b) the length of the reference sequence of the predetermined chromosome; and (c) the average read length of the plurality of sequencing reads.
According to an embodiment of the invention, the reads that align to the reference sequence of the predetermined chromosome are uniquely aligned reads.
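The restriction to uniquely aligned reads can be illustrated with a minimal sketch. The `Alignment` record below is a hypothetical, aligner-agnostic structure introduced only for illustration; it is not the SOAP output format, and the chromosome numbering (23 = X, 24 = Y) follows the convention used in this document.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one sequencing read (not an aligner format)."""
    chrom: int   # 1-22 autosomes, 23 = X, 24 = Y
    n_hits: int  # number of reference positions the read aligned to


def count_unique_reads(alignments: Iterable[Alignment]) -> Counter:
    """Count reads per chromosome, keeping only uniquely aligned reads."""
    counts: Counter = Counter()
    for aln in alignments:
        if aln.n_hits == 1:  # discard reads that map to several places
            counts[aln.chrom] += 1
    return counts


# Illustrative input: one read on chromosome 1 is multi-mapped and is dropped.
reads = [Alignment(1, 1), Alignment(1, 3), Alignment(24, 1), Alignment(1, 1)]
unique_counts = count_unique_reads(reads)
```

The resulting per-chromosome counts play the role of the read counts used in the depth calculation.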
According to a specific embodiment of the invention, in step (2), the average sequencing depth is determined according to the following formula:
$$D_i = \frac{R_i \times R\_len}{C\_len_i}$$
where
$D_i$ is the average sequencing depth of chromosome i,
i is at least one integer in the range 1 to 24, with 23 and 24 denoting the X and Y chromosomes respectively,
$R_i$ is the number of reads that align to the reference sequence of chromosome i,
$R\_len$ is the average read length of the plurality of sequencing reads, and
$C\_len_i$ is the length of chromosome i in the reference sequence.
In this way the average sequencing depth of each chromosome is obtained from the sequencing data and is then used to determine the ratio of the average sequencing depth of each autosome to that of the Y chromosome.
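As a minimal sketch of the depth calculation described above (read count times average read length, divided by chromosome length), the following function may help. The function name and the numbers are illustrative assumptions, not values from the patent.

```python
def average_depth(read_count: int, mean_read_len: float, chrom_len: int) -> float:
    """Average depth of one chromosome: reads * mean read length / chromosome length."""
    if chrom_len <= 0:
        raise ValueError("chrom_len must be positive")
    return read_count * mean_read_len / chrom_len


# Illustrative numbers only: 200,000 unique reads of 50 bp on a 100 Mb chromosome.
d = average_depth(200_000, 50.0, 100_000_000)
```

With these inputs the depth is 0.1x, a value typical of the low-coverage regime the method targets.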
According to an embodiment of the invention, the predetermined chromosomes include at least 2 autosomes, preferably at least 10 autosomes, and most preferably 22 autosomes.
According to an embodiment of the invention, the ratio of the average sequencing depth of an autosome to that of the Y chromosome in the test sample is determined by the formula $DR_i = D_i / D_{24}$, where $D_i$ (i = 1, 2, 3, ..., 22) is the average sequencing depth of the at least one autosome among the predetermined chromosomes and $D_{24}$ is the average sequencing depth of the Y chromosome.
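The ratio of autosome depth to Y-chromosome depth can be sketched as follows. The dictionary layout (chromosome number mapped to depth) and the depth values are assumptions made for illustration.

```python
from typing import Dict

Y_CHROM = 24  # this document numbers X as 23 and Y as 24


def depth_ratios(depths: Dict[int, float]) -> Dict[int, float]:
    """Return D_i / D_24 for every autosome (1-22) present in `depths`."""
    d_y = depths[Y_CHROM]
    if d_y <= 0:
        raise ValueError("Y-chromosome depth must be positive for a male sample")
    return {i: d / d_y for i, d in depths.items() if 1 <= i <= 22}


# Illustrative depths; the X chromosome (23) is ignored by the ratio.
depths = {1: 0.10, 2: 0.12, 23: 0.05, 24: 0.04}
dr = depth_ratios(depths)
```

Only autosomes enter the result, so the X and Y entries never appear as keys of `dr`.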
According to an embodiment of the invention, step (4) further comprises: (4-1) for each of the at least one autosome among the predetermined chromosomes, determining the ratio of the $DR_i$ of that autosome to a reference depth ratio, denoted $DDR_i$; (4-2) based on the $DDR_i$ obtained in step (4-1), determining the average depth ratio of the at least one autosome among the predetermined chromosomes, denoted $\overline{DDR}$; and (4-3) based on the $\overline{DDR}$ obtained in step (4-2), determining whether triploidy is present in the male test sample. Judging on the basis of $\overline{DDR}$ in step (4) makes the result of determining whether triploidy is present in the male test sample more accurate.
According to an embodiment of the invention, the reference depth ratio is determined in advance from a plurality of control samples of known triploid status.
According to a further specific embodiment of the invention, the control samples are not triploid with respect to the sex chromosomes.
According to a specific embodiment of the invention, the reference depth ratio is determined in advance from at least 100, preferably 1000, control samples of known triploid status.
According to an embodiment of the invention, the reference depth ratio may be obtained as follows. A plurality of (for example, 1000) negative control samples that are not triploid with respect to the sex chromosomes are randomly selected as a reference set. For each control sample in the reference set, the ratio of the average sequencing depth of autosome i to that of the Y chromosome, $DR'_i$ (i = 1, 2, 3, ..., 22), is calculated. $DR'_i$ is calculated in the same way as $DR_i$ above, that is, $DR'_i = D'_i / D'_{24}$, where $D'_i$ is the average sequencing depth of autosome i in the control sample and $D'_{24}$ is the average sequencing depth of the Y chromosome in the control sample. $D'_i$ is in turn obtained in the same way as $D_i$: each control sample is sequenced, its reads are aligned with the reference sequence to obtain the number of reads that align to the reference sequence of chromosome i and the average read length, and the average sequencing depth $D'_i$ is computed from these. The mean of the autosome-to-Y average sequencing depth ratios over all samples in the reference set, $\overline{DR'_i}$, is then calculated, namely
$$\overline{DR'_i} = \frac{1}{n}\sum_{j=1}^{n} DR'_{i,j}$$
where n is the number of control samples in the reference set, $DR'_{i,j}$ is the value of $DR'_i$ for the j-th control sample, and $\overline{DR'_i}$ is the reference depth ratio of chromosome i.
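The reference depth ratio is a per-chromosome mean over the negative control samples. A minimal sketch, assuming each control sample is represented as a dictionary mapping autosome number to its depth ratio (a layout chosen here for illustration), could look like this:

```python
from statistics import mean
from typing import Dict, List


def reference_depth_ratios(control_ratios: List[Dict[int, float]]) -> Dict[int, float]:
    """Average the autosome-to-Y depth ratio over all control samples, per chromosome."""
    if not control_ratios:
        raise ValueError("reference set is empty")
    chroms = control_ratios[0].keys()
    return {i: mean(sample[i] for sample in control_ratios) for i in chroms}


# Two illustrative control samples covering autosomes 1 and 2 only.
controls = [{1: 2.4, 2: 3.0}, {1: 2.6, 2: 3.2}]
ref = reference_depth_ratios(controls)
```

In practice the reference set would hold far more samples (the text suggests at least 100, preferably 1000) and all 22 autosomes.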
According to an embodiment of the invention, the ratio $DDR_i$ is determined according to the formula
$$DDR_i = \frac{DR_i}{\overline{DR'_i}}$$
where $DR_i$ is the autosome-to-Y average sequencing depth ratio of the test sample and $\overline{DR'_i}$ is the reference depth ratio of chromosome i.
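Dividing each depth ratio of the test sample by the corresponding reference depth ratio removes chromosome-specific bias, so that a negative sample is expected to give values near 1. The sketch below assumes both inputs are dictionaries keyed by autosome number; the numbers are illustrative.

```python
from typing import Dict


def normalized_ratios(dr: Dict[int, float], ref: Dict[int, float]) -> Dict[int, float]:
    """Divide each test-sample depth ratio by the reference depth ratio of the same chromosome."""
    return {i: dr[i] / ref[i] for i in dr}


# Illustrative values: the test sample's ratios are 1.5 times the reference ratios.
ddr = normalized_ratios({1: 3.75, 2: 4.65}, {1: 2.5, 2: 3.1})
```

Here every normalized ratio is about 1.5, the pattern expected when autosome depth is elevated relative to a single Y chromosome.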
According to an embodiment of the invention, the "average depth ratio of the at least one autosome among the predetermined chromosomes" referred to in this application is the mean of the $DDR_i$ of the at least one autosome among the predetermined chromosomes, that is, the mean of the ratios of the autosomal $DR_i$ to the reference depth ratio. According to a specific embodiment of the invention, the average depth ratio $\overline{DDR}$ of the at least one autosome among the predetermined chromosomes is determined according to the formula
$$\overline{DDR} = \frac{1}{m}\sum_{i} DDR_i$$
where the sum runs over the autosomes included in the predetermined chromosomes and m is the number of those autosomes. According to a further specific embodiment of the invention, the predetermined chromosomes include the Y chromosome and all autosomes, and the average depth ratio $\overline{DDR}$ of all autosomes is determined according to the formula
$$\overline{DDR} = \frac{1}{22}\sum_{i=1}^{22} DDR_i$$
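The average depth ratio is simply the arithmetic mean of the per-autosome normalized ratios. A minimal sketch, with the dictionary layout and sample values assumed for illustration:

```python
from typing import Dict


def mean_ddr(ddr: Dict[int, float]) -> float:
    """Arithmetic mean of the normalized depth ratios of the autosomes present."""
    autosomal = [v for i, v in ddr.items() if 1 <= i <= 22]
    if not autosomal:
        raise ValueError("at least one autosome is required")
    return sum(autosomal) / len(autosomal)


# Illustrative sample: 21 autosomes at exactly 1.0 and chromosome 1 at 1.22.
ddr = {i: 1.0 for i in range(1, 23)}
ddr[1] = 1.22
value = mean_ddr(ddr)
```

Averaging over all 22 autosomes damps the effect of a single outlying chromosome, which is one reason the text prefers using every autosome.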
According to an embodiment of the invention, in step (4-3), the average depth ratio $\overline{DDR}$ is compared with thresholds to determine whether triploidy is present in the male test sample.
According to an embodiment of the invention, in step (4-3), an average depth ratio $\overline{DDR}$ not lower than a first threshold indicates that the male test sample is an XXY triploid, and an average depth ratio $\overline{DDR}$ not exceeding a second threshold indicates that the male test sample is an XYY triploid.
According to an embodiment of the invention, the first threshold and the second threshold are determined from a plurality of reference samples of known triploid type.
According to an embodiment of the invention, the first threshold and the second threshold are determined from 100 to 10,000 reference samples of known triploid type.
According to an embodiment of the invention, the first threshold is at least 1.14, preferably at least 1.15, and the second threshold is at most 0.9, preferably 0.88, more preferably 0.85.
According to an embodiment of the invention, step (4-3) further provides that an average depth ratio $\overline{DDR}$ lying within a predetermined interval indicates that the male test sample is non-triploid, the predetermined interval being determined from the first threshold and the second threshold.
According to an embodiment of the invention, the left end of the predetermined interval is not less than the second threshold, and the right end of the predetermined interval is not greater than the first threshold.
According to an embodiment of the invention, the difference between the left end and the second threshold and the difference between the right end and the first threshold are each, independently, not less than 0.02, preferably not less than 0.03.
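One way to read the two preceding paragraphs is that the non-triploid interval is the span between the two thresholds, shrunk inward by a margin at each end. The sketch below takes that reading and uses the margin exactly (the text only requires "not less than"), with the preferred values 1.15, 0.85, and 0.03 as illustrative defaults; the function name and defaults are assumptions.

```python
from typing import Tuple


def negative_interval(first_threshold: float = 1.15,
                      second_threshold: float = 0.85,
                      margin: float = 0.03) -> Tuple[float, float]:
    """Return (left, right) ends of the non-triploid interval."""
    left = second_threshold + margin   # left end lies above the XYY threshold
    right = first_threshold - margin   # right end lies below the XXY threshold
    if left >= right:
        raise ValueError("thresholds and margin leave no interval")
    return left, right


left, right = negative_interval()
```

With these defaults the interval runs from about 0.88 to about 1.12, leaving a narrow band on each side in which a sample is neither clearly negative nor clearly triploid.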
It should be noted that, if the effects of sequencing-data fluctuation, sample contamination, chromosomal mosaicism, length differences between chromosomes, and similar factors are excluded, $\overline{DDR} = 1$ indicates that the male test sample is non-triploid, $\overline{DDR} = 1.5$ indicates that the male test sample is an XXY triploid, and $\overline{DDR} = 0.75$ indicates that the male test sample is an XYY triploid. These ideal values follow from copy number: a normal male has two copies of each autosome per Y chromosome, an XXY triploid has three copies per Y chromosome (3/2 = 1.5 times the normal ratio), and an XYY triploid has three copies per two Y chromosomes (1.5/2 = 0.75 times the normal ratio). In practice, under the influence of length differences between chromosomes, chromosomal mosaicism, data fluctuation, and similar factors, the value for an XXY triploid sample should be less than 1.5, the value for an XYY triploid sample should be greater than 0.75, and the value for a negative sample should fluctuate around 1. Therefore, taking the characteristics of actual data into account, the judgment criteria set according to an embodiment of the invention are as follows:
$\overline{DDR}$ not exceeding the second threshold: judged as XYY triploid;
$\overline{DDR}$ above the second threshold but below the left end of the predetermined interval: judged as an unknown sample;
$\overline{DDR}$ within the predetermined interval: judged as negative;
$\overline{DDR}$ above the right end of the predetermined interval but below the first threshold: judged as an unknown sample;
$\overline{DDR}$ not lower than the first threshold: judged as XXY triploid.
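The five-way judgment can be sketched as a small classifier. The default thresholds (1.15 and 0.85) and the 0.03 margin are taken from the preferred values stated in the text and are only one possible choice; the exact numerical bounds of this embodiment are given in figures not reproduced here, so the function below is a hedged illustration, not the embodiment's exact rule.

```python
def classify(mean_ddr: float,
             first_threshold: float = 1.15,
             second_threshold: float = 0.85,
             margin: float = 0.03) -> str:
    """Map an average depth ratio to one of four labels."""
    left = second_threshold + margin   # assumed left end of the negative interval
    right = first_threshold - margin   # assumed right end of the negative interval
    if mean_ddr <= second_threshold:
        return "XYY triploid"
    if mean_ddr >= first_threshold:
        return "XXY triploid"
    if left <= mean_ddr <= right:
        return "negative"
    return "unknown"                   # falls in one of the two buffer bands


labels = [classify(v) for v in (0.80, 0.86, 1.00, 1.13, 1.20)]
```

The two "unknown" bands keep borderline samples from being forced into a call; such samples would presumably be retested or examined by another method.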
System for determining whether triploidy is present in a male test sample
In a second aspect, the present invention provides a system for determining whether triploidy is present in a male test sample. According to an embodiment of the invention, referring to FIG. 1, the system comprises:
An alignment device 100 for aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing reads. The reads obtained by sequencing are aligned with the reference genome sequence; the alignment may be performed with SOAP (v2.20), aligning the reads to the human reference genome sequence (Hg19) to produce an alignment file, from which the number of reads that align to the reference sequence is determined. It should be noted that the reads that align to the reference sequence of a predetermined chromosome are uniquely aligned reads.
An average sequencing depth determination device 200, connected to the alignment device 100, for determining the average sequencing depth of predetermined chromosomes based on the alignment result obtained by the alignment device, the predetermined chromosomes including the Y chromosome and at least one autosome. The average sequencing depth of a predetermined chromosome is determined from: (a) the number of reads that align to the reference sequence of the predetermined chromosome, (b) the length of the reference sequence of the predetermined chromosome, and (c) the average read length of the plurality of sequencing reads. Specifically, the average sequencing depth is determined according to the following formula:
$$D_i = \frac{R_i \times R\_len}{C\_len_i}$$
where $D_i$ is the average sequencing depth of chromosome i, i is at least one integer in the range 1 to 24, 23 and 24 denote the X and Y chromosomes respectively, $R_i$ is the number of reads that align to the reference sequence of chromosome i, $R\_len$ is the average read length of the plurality of sequencing reads, and $C\_len_i$ is the length of chromosome i in the reference sequence. The predetermined chromosomes include at least 2 autosomes, preferably at least 10 autosomes, and most preferably 22 autosomes.
A $DR_i$ determination device 300, connected to the average sequencing depth determination device 200, for determining, for each of the at least one autosome, the ratio $DR_i$ of the average sequencing depth of that autosome to the average sequencing depth of the Y chromosome, where i is the chromosome number. The autosome-to-Y average sequencing depth ratio of the test sample may be determined by the formula $DR_i = D_i / D_{24}$, where $D_i$ (i = 1, 2, 3, ..., 22) is the average sequencing depth of the at least one autosome among the predetermined chromosomes and $D_{24}$ is the average sequencing depth of the Y chromosome.
A judgment device 400, connected to the $DR_i$ determination device 300, for determining whether triploidy is present in the male test sample based on the $DR_i$ obtained by the $DR_i$ determination device 300. Specifically, the triploid is XXY or XYY.
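The chain of four connected devices can be pictured as a simple data pipeline. The sketch below is only an illustration of that data flow: the alignment device is replaced by a stand-in (a list giving the chromosome of each uniquely aligned read), the judgment step is supplied by the caller, and all names and toy values are assumptions.

```python
from typing import Callable, Dict, Iterable

Y_CHROM = 24  # numbering used in this document: 23 = X, 24 = Y


def run_system(read_chroms: Iterable[int],
               mean_read_len: float,
               chrom_lens: Dict[int, int],
               judge: Callable[[Dict[int, float]], str]) -> str:
    # Alignment device 100 (stand-in): one chromosome number per unique read.
    counts: Dict[int, int] = {}
    for chrom in read_chroms:
        counts[chrom] = counts.get(chrom, 0) + 1

    # Average sequencing depth determination device 200.
    depths = {c: counts.get(c, 0) * mean_read_len / n for c, n in chrom_lens.items()}
    if depths[Y_CHROM] <= 0:
        raise ValueError("no Y-chromosome reads; a male sample is expected")

    # DR_i determination device 300: autosome depth over Y-chromosome depth.
    dr = {c: d / depths[Y_CHROM] for c, d in depths.items() if 1 <= c <= 22}

    # Judgment device 400: decision logic supplied by the caller.
    return judge(dr)


# Toy run: four reads on chromosome 1, one on Y; the judge just reports the ratio.
result = run_system([1, 1, 1, 1, 24], 50.0, {1: 1000, 24: 500},
                    judge=lambda dr: f"DR_1 = {dr[1]:.1f}")
```

Passing the judgment step in as a callable mirrors the modular structure of the system, in which each device consumes only the output of the device it is connected to.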
According to an embodiment of the invention, referring to FIG. 2, the judgment device 400 comprises:
A $DDR_i$ determination unit 401 for determining, for each of the at least one autosome among the predetermined chromosomes, the ratio $DDR_i$ of the $DR_i$ of that autosome to a reference depth ratio. The ratio $DDR_i$ may be determined according to the formula
$$DDR_i = \frac{DR_i}{\overline{DR'_i}}$$
where $DR_i$ is the autosome-to-Y average sequencing depth ratio and $\overline{DR'_i}$ is the reference depth ratio. The reference depth ratio $\overline{DR'_i}$ may be obtained as follows: a plurality of (at least 100, preferably 1000) negative control samples that are not triploid with respect to the sex chromosomes are randomly selected as a reference set; for each control sample in the reference set, the ratio of the average sequencing depth of autosome i to that of the Y chromosome, $DR'_i$ (i = 1, 2, 3, ..., 22), is calculated; and the mean of these ratios over all samples in the reference set, $\overline{DR'_i}$, is then calculated, namely
$$\overline{DR'_i} = \frac{1}{n}\sum_{j=1}^{n} DR'_{i,j}$$
where n is the number of control samples in the reference set, $DR'_{i,j}$ is the value of $DR'_i$ for the j-th control sample, and $\overline{DR'_i}$ is the reference depth ratio of chromosome i.
A $\overline{DDR}$ determination unit 402, connected to the $DDR_i$ determination unit 401, for determining the average depth ratio $\overline{DDR}$ of the at least one autosome among the predetermined chromosomes based on the $DDR_i$ obtained by the $DDR_i$ determination unit 401. The average depth ratio of the at least one autosome among the predetermined chromosomes is the mean of the $DDR_i$ of the at least one autosome, that is, the mean of the ratios of the autosomal $DR_i$ to the reference depth ratio, and may be calculated according to the formula
$$\overline{DDR} = \frac{1}{m}\sum_{i} DDR_i$$
where the sum runs over the autosomes included in the predetermined chromosomes and m is the number of those autosomes. When the predetermined chromosomes include the Y chromosome and all autosomes, the average depth ratio $\overline{DDR}$ of all autosomes may be determined according to the formula
$$\overline{DDR} = \frac{1}{22}\sum_{i=1}^{22} DDR_i$$
A judgment unit 403, connected to the $\overline{DDR}$ determination unit 402, for determining whether triploidy is present in the male test sample based on the $\overline{DDR}$ obtained by the $\overline{DDR}$ determination unit 402.
According to an embodiment of the invention, the average depth ratio $\overline{DDR}$ is compared with thresholds to determine whether triploidy is present in the male test sample.
According to an embodiment of the invention, an average depth ratio $\overline{DDR}$ not lower than a first threshold indicates that the male test sample is an XXY triploid, and an average depth ratio $\overline{DDR}$ not exceeding a second threshold indicates that the male test sample is an XYY triploid. The first threshold and the second threshold are determined from a plurality of reference samples of known triploid type, for example from 100 to 10,000 reference samples of known triploid type. According to a further embodiment of the invention, the first threshold is at least 1.14, preferably at least 1.15, and the second threshold is at most 0.9, preferably 0.88, more preferably 0.85.
According to an embodiment of the invention, an average depth ratio $\overline{DDR}$ lying within a predetermined interval indicates that the male test sample is non-triploid, the predetermined interval being determined from the first threshold and the second threshold. According to a specific embodiment of the invention, the left end of the predetermined interval is not less than the second threshold, and the right end of the predetermined interval is not greater than the first threshold. For example, the difference between the left end and the second threshold and the difference between the right end and the first threshold are each, independently, not less than 0.02, preferably not less than 0.03.
It should be noted that, if the effects of sequencing-data fluctuation, sample contamination, chromosomal mosaicism, length differences between chromosomes, and similar factors are excluded, $\overline{DDR} = 1$ indicates that the male test sample is non-triploid, $\overline{DDR} = 1.5$ indicates that the male test sample is an XXY triploid, and $\overline{DDR} = 0.75$ indicates that the male test sample is an XYY triploid. In practice, under the influence of length differences between chromosomes, chromosomal mosaicism, data fluctuation, and similar factors, the value for an XXY triploid sample should be less than 1.5, the value for an XYY triploid sample should be greater than 0.75, and the value for a negative sample should fluctuate around 1. Therefore, taking the characteristics of actual data into account, the judgment criteria set for the judgment unit 403 according to an embodiment of the invention are as follows:
Figure PCTCN2018089328-appb-000048
the sample is determined to be an XYY triploid;
Figure PCTCN2018089328-appb-000049
the sample is determined to be an unknown sample;
Figure PCTCN2018089328-appb-000050
the sample is determined to be negative;
Figure PCTCN2018089328-appb-000051
the sample is determined to be an unknown sample;
Figure PCTCN2018089328-appb-000052
the sample is determined to be an XXY triploid.
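The five-way determination above can be sketched in code. This is only an illustration under stated assumptions, not the implementation of the embodiment: the exact cut-offs appear in formula images that are not reproduced here, so the defaults below (first threshold 1.15, second threshold 0.85, margin 0.03) are assumptions taken from the preferred values stated elsewhere in this document, and the function name `classify_mean_ddr` is hypothetical. The ideal values follow from copy number: relative to a normal male (two copies of each autosome, one Y chromosome), an XXY triploid has three copies of each autosome and one Y chromosome, giving (3/1)/(2/1) = 1.5, whereas an XYY triploid has three copies of each autosome and two Y chromosomes, giving (3/2)/(2/1) = 0.75.

```python
def classify_mean_ddr(mean_ddr, first_threshold=1.15, second_threshold=0.85, margin=0.03):
    """Classify a male sample from its average depth ratio (illustrative sketch).

    All default values are assumptions; the cut-offs actually used by the
    determination unit are given in formula images not reproduced in the text.
    """
    # The negative interval lies inside the two thresholds, separated by `margin`.
    left = second_threshold + margin
    right = first_threshold - margin
    if mean_ddr <= second_threshold:   # not exceeding the second threshold
        return "XYY triploid"
    if mean_ddr >= first_threshold:    # not lower than the first threshold
        return "XXY triploid"
    if left <= mean_ddr <= right:      # within the predetermined interval
        return "negative"
    return "unknown"                   # in a gap between a threshold and the interval
```

Values falling in the two gaps between a threshold and the negative interval are reported as unknown rather than forced into a class, which mirrors the two "unknown sample" outcomes listed above.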
According to an embodiment of the present invention, the triploid is XXY or XYY.
According to an embodiment of the present invention, the sequencing result is from low-depth sequencing. The system according to the embodiment of the present invention is particularly suitable for the analysis of low-depth sequencing data.
According to an embodiment of the present invention, the test sample is from aborted tissue. Such material is easy to obtain, which further reduces the testing cost.
The system according to the embodiment of the present invention enables detection of male triploidy based on low-coverage sequencing data; compared with the prior art, the detection cost is greatly reduced, the turnaround time is greatly shortened, and the detection results are highly accurate.
Computer-readable medium
In a third aspect of the present invention, a computer-readable medium is provided. According to an embodiment of the present invention, the computer-readable medium stores instructions adapted to be executed to perform the following steps so as to determine whether a triploid exists in a male test sample: (1) aligning the sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing sequences; (2) determining, based on the result of the alignment in step (1), the average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome; (3) for each of the at least one autosome, determining the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DR_i, where i represents the chromosome number; and (4) determining, based on the DR_i obtained in step (3), whether the triploid exists in the male test sample. The computer-readable medium according to the embodiment of the present invention enables detection of male triploidy based on low-coverage sequencing data; compared with the prior art, the detection cost is greatly reduced, the turnaround time is greatly shortened, and the detection results are highly accurate.
The additional technical features and effects of the computer-readable medium according to the embodiment of the present invention are similar to those of the method and system for determining whether a triploid exists in a male test sample according to the embodiments of the present invention, and are not repeated here.
Embodiments of the present invention are described in detail below. It should be understood that the embodiments described below are exemplary and serve only to explain the present invention; they should not be construed as limiting it.
Example: Determination of method accuracy
In this example, a total of 1438 male samples were used to implement the technical solution and evaluate its performance. The samples comprised 1370 negative samples and 68 positive samples, and 6 of the positive samples were each retested once. The sequencing data of all samples were sets of 35 bp sequences obtained by single-end sequencing (SE 35 bp) on the BGISEQ-500 platform. Based on these sequence sets, the specific implementation steps were as follows:
(1) Alignment. SOAP (v2.20) was used to align the sequences obtained by sequencing to the human genome reference sequence (Hg19), yielding an alignment file;
(2) Depth statistics. For each chromosome i, count the number R_i of sequencing sequences aligned to the reference sequence of chromosome i, and calculate the average sequencing depth D_i of each chromosome (i = 1, 2, 3, ..., 23, 24) according to the following formula:
Figure PCTCN2018089328-appb-000053
where R_len represents the average sequence length of the plurality of sequencing sequences, and C_len_i represents the length of chromosome i in the reference sequence;
(3) From the set of 1370 negative samples, 1000 samples were randomly selected as the reference set; the remaining 370 samples served as the negative test set, and the 68 positive samples served as the positive test set;
(4) For each sample in each data set, calculate the ratio of the average sequencing depth of each autosome to the average sequencing depth of the Y chromosome, denoted DR_i = D_i / D_24 (i = 1, 2, 3, ..., 22);
(5) For each autosome, calculate the mean, over all samples in the reference set, of the ratio of the average sequencing depth of that autosome to that of the Y chromosome; this mean serves as the reference depth ratio and is denoted
Figure PCTCN2018089328-appb-000054
where n represents the total number of samples in the reference set, namely 1000. The results are shown in Table 1 (mean, calculated from 1000 negative samples, of the ratio of the average sequencing depth of each autosome to that of the Y chromosome).
Table 1
Chromosome    Reference depth ratio
Chr1          6.112964937
Chr2          6.736292207
Chr3          6.839641171
Chr4          6.789753739
Chr5          6.713471917
Chr6          6.789617453
Chr7          6.47769284
Chr8          6.744608694
Chr9          5.362250795
Chr10         6.600260183
Chr11         6.676404007
Chr12         6.676860392
Chr13         5.773692658
Chr14         5.617135677
Chr15         5.233490287
Chr16         5.714839197
Chr17         6.202954763
Chr18         6.727149928
Chr19         5.894805862
Chr20         6.624890827
Chr21         5.192491328
Chr22         4.297267475
(6) For each sample in the test set, calculate the ratio of the DR_i of each autosome to the corresponding reference depth ratio, denoted
Figure PCTCN2018089328-appb-000055
(7) For each sample in the test set, calculate the average, over all autosomes, of the ratios obtained in step (6); this average depth ratio is denoted
Figure PCTCN2018089328-appb-000056
The results are shown in Table 2 (average depth ratios of the 370 negative test samples and the 68 positive test samples), and the scatter plot is shown in FIG. 3.
Table 2
Figure PCTCN2018089328-appb-000057
Figure PCTCN2018089328-appb-000058
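Steps (2) and (4) to (7) above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the example: the formulas themselves are given as images that are not reproduced here, so the depth formula D_i = R_i × R_len / C_len_i and the two averaging steps are reconstructed from the variable definitions in the text, and all function names are hypothetical.

```python
from statistics import mean

AUTOSOMES = range(1, 23)  # chromosomes 1..22
Y = 24                    # the text uses D_24 for the Y chromosome


def average_depth(read_count, mean_read_len, chrom_len):
    """Step (2): D_i = R_i * R_len / C_len_i (assumed form of the formula)."""
    return read_count * mean_read_len / chrom_len


def depth_ratios(depths):
    """Step (4): DR_i = D_i / D_24; `depths` maps chromosome number to D_i."""
    return {i: depths[i] / depths[Y] for i in AUTOSOMES if i in depths}


def reference_depth_ratios(reference_samples):
    """Step (5): per-autosome mean of DR_i over the reference (negative) samples."""
    return {i: mean(s[i] for s in reference_samples) for i in reference_samples[0]}


def mean_ddr(sample_dr, reference_dr):
    """Steps (6)-(7): average over autosomes of DDR_i = DR_i / reference DR_i."""
    return mean(sample_dr[i] / reference_dr[i] for i in sample_dr)


if __name__ == "__main__":
    # Toy numbers for two autosomes only; these are not data from the document.
    reference = reference_depth_ratios([{1: 6.0, 2: 7.0}, {1: 6.2, 2: 6.8}])
    print(mean_ddr({1: 9.15, 2: 10.35}, reference))
```

Dividing each DR_i by its own reference value before averaging removes the chromosome-specific offsets visible in Table 1, so that every autosome contributes a value near 1 for a negative sample.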
According to the above steps, the test conclusions are as follows:
(1) Of the 68 positive samples, 5 were determined to be XYY; for one of these,
Figure PCTCN2018089328-appb-000059
and, in combination with the NGS analysis data, it was identified as a sample with a severe sex-chromosome abnormality, namely an XXY+ sample, where "+" denotes a Y chromosome content between YY and YYY; 61 were determined to be XXY; 2 could not be determined; and the determination results for the 6 retested samples were all consistent;
(2) Of the 370 negative samples, 366 were determined to be negative and 4 could not be determined. According to the NGS analysis data, all 4 undetermined samples resulted from the Y chromosome being low overall;
(3) The accuracy of this method reaches 98.63%.
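The reported accuracy is consistent with counting every undetermined sample as a miss. The following check assumes that accuracy is defined as correctly determined test samples divided by all test samples; the text does not state the definition explicitly.

```python
positives_called = 5 + 61   # XYY and XXY determinations among the 68 positive samples
negatives_called = 366      # negative determinations among the 370 negative samples
total = 68 + 370            # all test samples
accuracy = (positives_called + negatives_called) / total
print(f"{accuracy:.2%}")    # 432 / 438 -> 98.63%
```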
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "exemplary embodiment", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (41)

  1. A method for determining whether a triploid exists in a male test sample, characterized by comprising:
    (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing sequences;
    (2) determining, based on the result of the alignment in step (1), an average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome;
    (3) determining, for each of the at least one autosome, the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DR_i, where i represents the chromosome number; and
    (4) determining, based on the DR_i obtained in step (3), whether the triploid exists in the male test sample.
  2. The method according to claim 1, characterized in that the triploid is XXY or XYY;
    preferably, the sequencing result is from low-depth sequencing;
    optionally, the test sample is from aborted tissue.
  3. The method according to claim 1, characterized in that the average sequencing depth of the predetermined chromosomes is determined based on:
    (a) the number of the sequencing sequences that can be aligned to the reference sequence of the predetermined chromosome;
    (b) the length of the reference sequence of the predetermined chromosome; and
    (c) the average sequence length of the plurality of sequencing sequences.
  4. The method according to claim 3, characterized in that the sequencing sequences that can be aligned to the reference sequence of the predetermined chromosome are uniquely aligned sequences.
  5. The method according to claim 3, characterized in that, in step (2), the average sequencing depth is determined according to the following formula:
    Figure PCTCN2018089328-appb-100001
    where
    D_i represents the average sequencing depth of chromosome i,
    i is at least one integer in the range of 1 to 24;
    R_i represents the number of the sequencing sequences that can be aligned to the reference sequence of chromosome i,
    R_len represents the average sequence length of the plurality of sequencing sequences, and
    C_len_i represents the length of chromosome i in the reference sequence.
  6. The method according to claim 1, characterized in that the predetermined chromosomes include at least 2 autosomes, preferably at least 10 autosomes, and most preferably 22 autosomes.
  7. The method according to claim 1, characterized in that the ratio of the average sequencing depth of an autosome to that of the Y chromosome for the test sample is determined according to the formula DR_i = D_i / D_24, where D_i (i = 1, 2, 3, ..., 22) represents the average sequencing depth of the at least one autosome among the predetermined chromosomes, and D_24 represents the average sequencing depth of the Y chromosome.
  8. The method according to claim 1, characterized in that step (4) further comprises:
    (4-1) determining, for each of the at least one autosome among the predetermined chromosomes, the ratio of the DR_i of the autosome to a reference depth ratio, denoted DDR_i;
    (4-2) determining, based on the DDR_i obtained in step (4-1), an average depth ratio of the at least one autosome among the predetermined chromosomes, denoted
    Figure PCTCN2018089328-appb-100002
    and
    (4-3) determining, based on the
    Figure PCTCN2018089328-appb-100003
    obtained in step (4-2), whether the triploid exists in the male test sample.
  9. The method according to claim 8, characterized in that the reference depth ratio is determined in advance based on a plurality of control samples having a known triploid status.
  10. The method according to claim 9, characterized in that the control samples do not have a triploid with respect to the sex chromosomes.
  11. The method according to claim 9, characterized in that the reference depth ratio is determined in advance based on at least 100, preferably 1000, control samples having a known triploid status.
  12. The method according to claim 9, characterized in that the reference depth ratio is determined based on the formula
    Figure PCTCN2018089328-appb-100004
    where
    n is the number of the control samples;
    DR'_i (i = 1, 2, 3, ..., 22) represents the ratio of the average sequencing depth of autosome i to that of the Y chromosome for each control sample; and
    Figure PCTCN2018089328-appb-100005
    is the reference depth ratio of chromosome i.
  13. The method according to claim 8, characterized in that the ratio DDR_i is determined according to the formula
    Figure PCTCN2018089328-appb-100006
  14. The method according to claim 8, characterized in that the average depth ratio
    Figure PCTCN2018089328-appb-100007
    of the at least one autosome among the predetermined chromosomes is determined according to the formula
    Figure PCTCN2018089328-appb-100008
  15. The method according to claim 14, characterized in that the predetermined chromosomes include the Y chromosome and all autosomes, and the average depth ratio
    Figure PCTCN2018089328-appb-100009
    of all the autosomes is determined according to the formula
    Figure PCTCN2018089328-appb-100010
  16. The method according to claim 8, characterized in that, in step (4-3), determining, based on the
    Figure PCTCN2018089328-appb-100011
    obtained in step (4-2), whether the triploid exists in the male test sample is carried out by:
    comparing the average depth ratio
    Figure PCTCN2018089328-appb-100012
    with a threshold.
  17. The method according to claim 16, characterized in that the average depth ratio
    Figure PCTCN2018089328-appb-100013
    being not lower than a first threshold is an indication that the male test sample is an XXY triploid, and the average depth ratio
    Figure PCTCN2018089328-appb-100014
    being not higher than a second threshold is an indication that the male test sample is an XYY triploid.
  18. The method according to claim 17, characterized in that the first threshold and the second threshold are determined based on a plurality of reference samples of known triploid type.
  19. The method according to claim 18, characterized in that the first threshold and the second threshold are determined based on 100 to 10,000 reference samples of known triploid type.
  20. The method according to claim 18, characterized in that the first threshold is at least 1.14, preferably at least 1.15, and the second threshold is not more than 0.9, preferably 0.88, more preferably 0.85.
  21. The method according to claim 18, characterized in that step (4-3) further comprises: the average depth ratio lying within a predetermined interval is an indication that the male test sample is non-triploid, the predetermined interval being determined based on the first threshold and the second threshold.
  22. The method according to claim 21, characterized in that the left endpoint of the predetermined interval is not less than the second threshold, and the right endpoint of the predetermined interval is not higher than the first threshold.
  23. The method according to claim 22, characterized in that the difference between the left endpoint and the second threshold and the difference between the right endpoint and the first threshold are each independently not less than 0.02, preferably not less than 0.03.
  24. A system for determining whether a triploid exists in a male test sample, characterized by comprising:
    an alignment device, configured to align a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing sequences;
    an average sequencing depth determination device, connected to the alignment device and configured to determine, based on the alignment result obtained by the alignment device, an average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome;
    a DR_i determination device, connected to the average sequencing depth determination device and configured to determine, for each of the at least one autosome, the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DR_i, where i represents the chromosome number; and
    a determination device, connected to the DR_i determination device and configured to determine, based on the DR_i obtained in the DR_i determination device, whether the triploid exists in the male test sample.
  25. The system according to claim 24, characterized in that the triploid is XXY or XYY;
    preferably, the sequencing result is from low-depth sequencing;
    optionally, the test sample is from aborted tissue;
    optionally, the average sequencing depth of the predetermined chromosomes is determined based on:
    (a) the number of the sequencing sequences that can be aligned to the reference sequence of the predetermined chromosome;
    (b) the length of the reference sequence of the predetermined chromosome; and
    (c) the average sequence length of the plurality of sequencing sequences;
    optionally, the sequencing sequences that can be aligned to the reference sequence of the predetermined chromosome are uniquely aligned sequences;
    optionally, the average sequencing depth is determined according to the following formula:
    Figure PCTCN2018089328-appb-100016
    where
    D_i represents the average depth of chromosome i,
    i is at least one integer in the range of 1 to 24,
    R_i represents the number of the sequencing sequences that can be aligned to the reference sequence of chromosome i,
    R_len represents the average sequence length of the plurality of sequencing sequences, and
    C_len_i represents the length of chromosome i in the reference sequence;
    optionally, the predetermined chromosomes include at least 2 autosomes, preferably at least 10 autosomes, and most preferably 22 autosomes.
  26. The system according to claim 24, characterized in that the ratio of the average sequencing depth of an autosome to that of the Y chromosome for the test sample is determined according to the formula DR_i = D_i / D_24, where D_i (i = 1, 2, 3, ..., 22) represents the average sequencing depth of the at least one autosome among the predetermined chromosomes, and D_24 represents the average sequencing depth of the Y chromosome.
  27. The system according to claim 24, characterized in that the determination device comprises:
    a DDR_i determining unit, configured to determine, for each of the at least one autosome among the predetermined chromosomes, the ratio DDR_i of the DR_i of the autosome to a reference depth ratio;
    a
    Figure PCTCN2018089328-appb-100017
    determining unit, connected to the DDR_i determining unit and configured to determine, based on the DDR_i obtained in the DDR_i determining unit, the average depth ratio
    Figure PCTCN2018089328-appb-100018
    of the at least one autosome among the predetermined chromosomes; and
    a determination unit, connected to the
    Figure PCTCN2018089328-appb-100019
    determining unit and configured to determine, based on the
    Figure PCTCN2018089328-appb-100021
    obtained in the
    Figure PCTCN2018089328-appb-100020
    determining unit, whether the triploid exists in the male test sample;
    optionally, the reference depth ratio is determined in advance based on a plurality of control samples having a known triploid status;
    optionally, the control samples do not have a triploid with respect to the sex chromosomes;
    optionally, the reference depth ratio is determined in advance based on at least 100, preferably 1000, control samples having a known triploid status;
    optionally, the reference depth ratio is determined based on the formula
    Figure PCTCN2018089328-appb-100022
    where
    n is the number of the control samples;
    DR'_i (i = 1, 2, 3, ..., 22) represents the ratio of the average sequencing depth of autosome i to that of the Y chromosome for each control sample; and
    Figure PCTCN2018089328-appb-100023
    is the reference depth ratio of chromosome i.
  28. The system according to claim 27, characterized in that the ratio DDR_i is determined according to the formula
    Figure PCTCN2018089328-appb-100024
  29. The system according to claim 27, characterized in that the average depth ratio
    Figure PCTCN2018089328-appb-100025
    of the at least one autosome among the predetermined chromosomes is determined according to the formula
    Figure PCTCN2018089328-appb-100026
    optionally, the predetermined chromosomes include the Y chromosome and all autosomes, and the average depth ratio
    Figure PCTCN2018089328-appb-100027
    of all the autosomes is determined according to the formula
    Figure PCTCN2018089328-appb-100028
  30. The system according to claim 27, characterized in that the determination unit is adapted to perform the following operation:
    comparing the average depth ratio
    Figure PCTCN2018089328-appb-100029
    with a threshold to determine whether the triploid exists in the male test sample;
    preferably, the average depth ratio
    Figure PCTCN2018089328-appb-100030
    being not lower than a first threshold is an indication that the male test sample is an XXY triploid, and the average depth ratio
    Figure PCTCN2018089328-appb-100031
    being not higher than a second threshold is an indication that the male test sample is an XYY triploid.
  31. The system according to claim 30, characterized in that the first threshold and the second threshold are determined based on a plurality of reference samples of known triploid type;
    optionally, the first threshold and the second threshold are determined based on 100 to 10,000 reference samples of known triploid type;
    preferably, the first threshold is at least 1.14, preferably at least 1.15, and the second threshold is not more than 0.9, preferably 0.88, more preferably 0.85.
  32. The system according to claim 30, characterized in that the determination unit is further adapted to perform the following operation: the average depth ratio
    Figure PCTCN2018089328-appb-100032
    lying within a predetermined interval is an indication that the male test sample is non-triploid, the predetermined interval being determined based on the first threshold and the second threshold;
    optionally, the left endpoint of the predetermined interval is not less than the second threshold, and the right endpoint of the predetermined interval is not higher than the first threshold;
    optionally, the difference between the left endpoint and the second threshold and the difference between the right endpoint and the first threshold are each independently not less than 0.02, preferably not less than 0.03.
  33. A computer-readable medium, characterized in that the computer-readable medium stores instructions adapted to be executed to perform the following steps so as to determine whether a triploid exists in a male test sample:
    (1) aligning a sequencing result from the male test sample with a reference sequence, the sequencing result consisting of a plurality of sequencing sequences;
    (2) determining, based on the result of the alignment in step (1), an average sequencing depth of predetermined chromosomes, the predetermined chromosomes including the Y chromosome and at least one autosome;
    (3) determining, for each of the at least one autosome, the ratio of the average sequencing depth of the autosome to the average sequencing depth of the Y chromosome, denoted DR_i, where i represents the chromosome number; and
    (4) determining, based on the DR_i obtained in step (3), whether the triploid exists in the male test sample.
  34. The computer-readable medium according to claim 33, wherein the triploid is XXY or XYY;
    Preferably, the sequencing result is from low-depth sequencing;
    Optionally, the test sample is from abortion tissue;
    Optionally, the average sequencing depth of the predetermined chromosome is determined based on:
    (a) the number of the sequencing sequences that can be aligned to the reference sequence of the predetermined chromosome;
    (b) the length of the reference sequence of the predetermined chromosome; and
    (c) the average sequence length of the plurality of sequencing sequences;
    Optionally, the sequencing sequences that can be aligned to the reference sequence of the predetermined chromosome are uniquely aligned sequences;
    Optionally, in step (2), the average sequencing depth is determined according to the following formula:
    $$D_i = \frac{R_i \times R\_len}{C\_len_i}$$
    where
    $D_i$ represents the average depth of chromosome i,
    i is at least one integer in the range of 1 to 24;
    $R_i$ represents the number of the sequencing sequences that can be aligned to the reference sequence of chromosome i,
    $R\_len$ represents the average sequence length of the plurality of sequencing sequences,
    $C\_len_i$ represents the length of chromosome i in the reference sequence;
    Optionally, the predetermined chromosome comprises at least 2 autosomes, preferably at least 10 autosomes, and most preferably 22 autosomes.
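The depth calculation in claim 34 can be illustrated with a short Python sketch. It assumes the formula is the usual reads-times-read-length-over-chromosome-length estimate implied by the definitions of R_i, R_len and C_len_i; the function name and all numbers are hypothetical and are not taken from the patent.

```python
def average_depth(read_count, mean_read_length, chrom_length):
    """Average sequencing depth of one chromosome: D_i = R_i * R_len / C_len_i.

    read_count       -- R_i, reads (ideally uniquely aligned) on chromosome i
    mean_read_length -- R_len, average read length in bases
    chrom_length     -- C_len_i, reference length of chromosome i in bases
    """
    if chrom_length <= 0:
        raise ValueError("chromosome length must be positive")
    return read_count * mean_read_length / chrom_length


# Invented low-depth example: round chromosome lengths and read counts,
# chosen only to show the arithmetic (24 denotes the Y chromosome).
chrom_lengths = {1: 250_000_000, 24: 57_000_000}
unique_reads = {1: 400_000, 24: 45_000}
depths = {i: average_depth(unique_reads[i], 50, chrom_lengths[i])
          for i in chrom_lengths}
```

With low-depth sequencing the resulting values are well below 1x, which is why the later claims work with ratios of depths rather than absolute depths.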
  35. The computer-readable medium according to claim 33, wherein the average sequencing depth ratio of an autosome to the Y chromosome of the test sample is determined according to the formula $DR_i = D_i / D_{24}$, where $D_i$ (i = 1, 2, 3, ..., 22) represents the average sequencing depth of the at least one autosome in the predetermined chromosome, and $D_{24}$ represents the average sequencing depth of the Y chromosome.
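A minimal Python sketch of the ratio in claim 35 follows. The dictionary layout (chromosome number mapped to depth, with 23 for X and 24 for Y) and the function name are assumptions made for illustration.

```python
Y_INDEX = 24  # the claims number the Y chromosome as chromosome 24


def depth_ratios(depths):
    """Return {i: D_i / D_24} for every autosome i (1..22) present in `depths`.

    `depths` maps a chromosome number to its average sequencing depth.
    Sex chromosomes (23, 24) are excluded from the result.
    """
    d_y = depths[Y_INDEX]
    if d_y <= 0:
        raise ValueError("Y-chromosome depth must be positive for a male sample")
    return {i: d / d_y for i, d in depths.items() if 1 <= i <= 22}
```

For example, `depth_ratios({1: 0.5, 2: 0.75, 24: 0.25})` yields a ratio of 2.0 for chromosome 1 and 3.0 for chromosome 2.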
  36. The computer-readable medium of claim 33, wherein step (4) further comprises:
    (4-1) for each of the at least one autosome in the predetermined chromosome, determining the ratio of the $DR_i$ of the autosome to a reference depth ratio, recorded as $DDR_i$;
    (4-2) based on the $DDR_i$ obtained in step (4-1), determining an average depth ratio of the at least one autosome in the predetermined chromosome, recorded as $\overline{DDR}$;
    (4-3) based on the $\overline{DDR}$ obtained in step (4-2), determining whether the triploid is present in the male test sample;
    Optionally, the reference depth ratio is determined in advance based on a plurality of control samples having a known triploid state;
    Optionally, the control samples have no triploidy with respect to the sex chromosomes;
    Optionally, the reference depth ratio is determined in advance based on at least 100, preferably 1000, control samples having a known triploid state;
    Optionally, the reference depth ratio is determined based on the formula
    $$\overline{DR_i} = \frac{1}{n}\sum_{1}^{n} DR'_i$$
    where
    n is the number of the control samples;
    $DR'_i$ (i = 1, 2, 3, ..., 22) represents the average sequencing depth ratio of autosome i to the Y chromosome in each control sample,
    $\overline{DR_i}$ is the reference depth ratio of chromosome i.
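The reference depth ratio of claim 36 can be sketched in Python as below. The sketch assumes the formula is an arithmetic mean of DR'_i over the n control samples, which is what the definitions of n and DR'_i suggest; the function name and input layout are hypothetical.

```python
from statistics import fmean


def reference_depth_ratios(control_ratios):
    """Per-autosome reference depth ratio, taken here as the mean over controls.

    `control_ratios` is a list with one dict per control sample, each mapping
    an autosome number i to that sample's DR'_i (autosome depth / Y depth).
    """
    if not control_ratios:
        raise ValueError("need at least one control sample")
    autosomes = set(control_ratios[0])
    if any(set(sample) != autosomes for sample in control_ratios):
        raise ValueError("all control samples must cover the same autosomes")
    return {i: fmean(sample[i] for sample in control_ratios)
            for i in sorted(autosomes)}
```

In practice the claim calls for at least 100 control samples without sex-chromosome triploidy; the two-sample inputs one might use to try the function are only for checking the arithmetic.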
  37. The computer-readable medium of claim 36, wherein the ratio $DDR_i$ is determined according to the formula
    $$DDR_i = \frac{DR_i}{\overline{DR_i}}$$
  38. The computer-readable medium of claim 36, wherein the average depth ratio $\overline{DDR}$ of the at least one autosome in the predetermined chromosome is determined according to the formula shown in Figure PCTCN2018089328-appb-100040;
    Optionally, the predetermined chromosome includes the Y chromosome and all autosomes, and the average depth ratio $\overline{DDR}$ of all the autosomes is determined according to the formula
    $$\overline{DDR} = \frac{1}{22}\sum_{i=1}^{22} DDR_i$$
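Claims 37 and 38 can be read together as a two-step normalisation, sketched here in Python. The sketch assumes that DDR_i is the sample's DR_i divided by the reference depth ratio of the same autosome, and that the average depth ratio is the arithmetic mean of the DDR_i values; function names are hypothetical.

```python
def normalized_ratios(sample_ratios, reference_ratios):
    """DDR_i = DR_i / reference depth ratio, for each autosome in the sample."""
    return {i: sample_ratios[i] / reference_ratios[i] for i in sample_ratios}


def average_depth_ratio(ddr):
    """Arithmetic mean of the DDR_i values over the autosomes supplied."""
    if not ddr:
        raise ValueError("need at least one autosome")
    return sum(ddr.values()) / len(ddr)


# Invented example: the sample's autosome/Y ratios are 1.5x the reference.
sample = {1: 3.0, 2: 3.0}
reference = {1: 2.0, 2: 2.0}
avg = average_depth_ratio(normalized_ratios(sample, reference))
```

When all 22 autosomes are supplied, `average_depth_ratio` divides by 22, matching the optional all-autosome case in claim 38.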
  39. The computer-readable medium of claim 36, wherein in step (4-3), determining whether the triploid is present in the male test sample based on the $\overline{DDR}$ obtained in step (4-2) is achieved by:
    comparing the average depth ratio $\overline{DDR}$ with a threshold;
    Optionally, an average depth ratio $\overline{DDR}$ not lower than a first threshold is an indication that the male test sample is an XXY triploid, and an average depth ratio $\overline{DDR}$ not exceeding a second threshold is an indication that the male test sample is an XYY triploid.
  40. The computer-readable medium of claim 39, wherein the first threshold and the second threshold are determined based on a plurality of reference samples of known triploid type;
    Optionally, the first threshold and the second threshold are determined based on 100 to 10,000 reference samples of known triploid type;
    Preferably, the first threshold is at least 1.14, preferably at least 1.15, and the second threshold is not more than 0.9, preferably 0.88, and more preferably 0.85.
  41. The computer-readable medium of claim 40, wherein step (4-3) further comprises: the average depth ratio $\overline{DDR}$ being within a predetermined interval range is an indication that the male test sample is non-triploid, the predetermined interval range being determined based on the first threshold and the second threshold;
    Optionally, the left end value of the predetermined interval range is not smaller than the second threshold, and the right end value of the predetermined interval range is not higher than the first threshold;
    Preferably, the difference between the left end value and the second threshold and the difference between the right end value and the first threshold are each independently not less than 0.02, preferably not less than 0.03.
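The threshold logic of claims 39 to 41 can be sketched as a small Python classifier. The default thresholds (1.15 and 0.85) and the 0.03 margin are picked from the preferred values stated in claims 40 and 41; the function name, the returned labels, and the "indeterminate" outcome for values between a threshold and the non-triploid interval are assumptions added for illustration.

```python
def classify(avg_ddr, first_threshold=1.15, second_threshold=0.85, margin=0.03):
    """Classify a male sample from its average depth ratio.

    avg_ddr >= first_threshold   -> XXY triploid
    avg_ddr <= second_threshold  -> XYY triploid
    inside [second + margin, first - margin] -> non-triploid
    anything else is reported as indeterminate (an assumption of this sketch).
    """
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be below the first threshold")
    if avg_ddr >= first_threshold:
        return "XXY triploid"
    if avg_ddr <= second_threshold:
        return "XYY triploid"
    left = second_threshold + margin
    right = first_threshold - margin
    if left <= avg_ddr <= right:
        return "non-triploid"
    return "indeterminate"
```

As a rough sanity check, and assuming a triploid carries three copies of every autosome, an XXY triploid has three autosome copies per Y copy instead of two, giving an expected ratio near 1.5, while an XYY triploid has three autosome copies per two Y copies, giving a value near 0.75. Both fall on the expected side of the preferred thresholds.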
PCT/CN2018/089328 2018-05-31 2018-05-31 Method and system for determining presence of triploids in male sample to be tested, and computer readable medium WO2019227420A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880056925.4A CN111373054A (en) 2018-05-31 2018-05-31 Method, system and computer readable medium for determining the presence of triploids in a male test sample
PCT/CN2018/089328 WO2019227420A1 (en) 2018-05-31 2018-05-31 Method and system for determining presence of triploids in male sample to be tested, and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/089328 WO2019227420A1 (en) 2018-05-31 2018-05-31 Method and system for determining presence of triploids in male sample to be tested, and computer readable medium

Publications (1)

Publication Number Publication Date
WO2019227420A1 (en) 2019-12-05

Family

ID=68697709

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089328 WO2019227420A1 (en) 2018-05-31 2018-05-31 Method and system for determining presence of triploids in male sample to be tested, and computer readable medium

Country Status (2)

Country Link
CN (1) CN111373054A (en)
WO (1) WO2019227420A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113113081B (en) * 2020-08-31 2021-12-14 东莞博奥木华基因科技有限公司 System for detecting polyploid and genome homozygous region ROH based on CNV-seq sequencing data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110003293A1 (en) * 2006-06-14 2011-01-06 Artemis Health, Inc. Fetal aneuploidy detection by sequencing
CN104120181A (en) * 2011-06-29 2014-10-29 深圳华大基因医学有限公司 Method and device for carrying out GC correction on chromosome sequencing results
CN104156631A (en) * 2014-07-14 2014-11-19 天津华大基因科技有限公司 Triploid testing method for chromosomes
CN105074011A (en) * 2013-06-13 2015-11-18 阿瑞奥萨诊断公司 Statistical analysis for non-invasive sex chromosome aneuploidy determination
CN105825076A (en) * 2015-01-08 2016-08-03 北京圣庭生物技术有限公司 Method for removing GC preferences in euchromosomes and between chromosomes as well as detection system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100112590A1 (en) * 2007-07-23 2010-05-06 The Chinese University Of Hong Kong Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment
AU2012261664B2 (en) * 2011-06-29 2014-07-03 Bgi Genomics Co., Ltd. Noninvasive detection of fetal genetic abnormality
EP2728014B1 (en) * 2012-10-31 2015-10-07 Genesupport SA Non-invasive method for detecting a fetal chromosomal aneuploidy
CN106029899B (en) * 2013-09-30 2021-08-03 深圳华大基因股份有限公司 Method, system and computer readable medium for determining SNP information in predetermined region of chromosome
WO2015089726A1 (en) * 2013-12-17 2015-06-25 深圳华大基因科技有限公司 Chromosome aneuploidy detection method and apparatus therefor
WO2017220156A1 (en) * 2016-06-23 2017-12-28 Trisomytest, S.R.O. A method for non-invasive prenatal detection of fetal chromosome aneuploidy from maternal blood

Also Published As

Publication number Publication date
CN111373054A (en) 2020-07-03

Similar Documents

Publication Publication Date Title
US11031100B2 (en) Size-based sequencing analysis of cell-free tumor DNA for classifying level of cancer
TWI640636B (en) A method for simultaneous performing gene locus, chromosome and linkage analysis
Hou et al. Genome analyses of single human oocytes
AU2019246833B2 (en) Maternal plasma transcriptome analysis by massively parallel RNA sequencing
Hayes et al. Diagnosis of copy number variation by Illumina next generation sequencing is comparable in performance to oligonucleotide array comparative genomic hybridisation
US20220199196A1 (en) Comprehensive detection of single cell genetic structural variations
BR112013020220B1 (en) METHOD FOR DETERMINING THE PLOIDIA STATUS OF A CHROMOSOME IN A PREGNANT FETUS
US20210130900A1 (en) Multiplexed parallel analysis of targeted genomic regions for non-invasive prenatal testing
WO2022105629A1 (en) Method for screening snp sites for detecting contamination level of sample and method for detecting contamination level of sample
CN106795551B (en) CNV analysis method and detection device for single cell chromosome
Mamoor A single nucleotide variant on chromosome 12 residing within RAB3IP distinguishes patients with luminal B breast cancer.
WO2019227420A1 (en) Method and system for determining presence of triploids in male sample to be tested, and computer readable medium
WO2014075228A1 (en) Method, system and computer readable medium for determining whether chromosome number variation exists in biological sample
TWI637058B (en) Determining fetal genomes for multiple fetus pregnancies
CN114269948A (en) Method for detecting loss of heterozygosity by low-depth genome sequencing
CN114836536B (en) Method and system for screening single-cell high-amplification region based on MALBAC
US20160138105A1 (en) System and methods for determining a woman's risk of aneuploid conception
Xing et al. Long‐read Oxford nanopore sequencing reveals a de novo case of complex chromosomal rearrangement involving chromosomes 2, 7, and 13
Mamoor A single nucleotide variant on chromosome 5 residing within LINC01331 distinguishes patients with basal-like human breast cancer.
TWI564742B (en) Methods for determining the aneuploidy of fetal chromosomes, systems and computer-readable media
Ip et al. Molecular Techniques in the Diagnosis and Monitoring of Acute and Chronic Leukaemias
Veerappa et al. Global patterns of large copy number variations in the human genome reveal complexity in chromosome organization
Mamoor A single-nucleotide variant on chromosome 9, rs1156793, residing within PTPRD is associated with the basal-like (triple negative) phenotype in human breast cancer, and black women display an allele frequency imbalance at this site.
Paulraj et al. 44. Ring chromosome 7 in patients with dysplastic features in bone marrow
CN117737272A (en) Screening method for target microorganism markers and application of screening method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18920198; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18920198; Country of ref document: EP; Kind code of ref document: A1)