CN107273715A - A detection method and device - Google Patents

Publication number
CN107273715A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710326413.2A
Other languages
Chinese (zh)
Other versions
CN107273715B (en)
Inventor
李阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yaji Technology Co.,Ltd.
Original Assignee
Anji Kang Er (shenzhen) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anji Kang Er (shenzhen) Technology Co Ltd
Priority to CN201710326413.2A
Publication of CN107273715A
Application granted
Publication of CN107273715B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention belongs to the technical field of molecular biological information detection and provides a detection method and device. The method comprises: obtaining result data produced by high-throughput sequencing of a sample; computing, from the result data and using a Bayesian method, a probability set for the contamination rate of the sample; and determining the contamination level of the sample from the computed probability set. The technical solution achieves accurate detection of the sample contamination level from high-throughput sequencing data, avoids the difficulty and uncertainty of controlling sample contamination before sequencing, and greatly improves the accuracy of downstream analysis results.

Description

A detection method and device
Technical field
The present invention belongs to the technical field of molecular biological information detection, and in particular relates to a detection method and device.
Background
When high-throughput sequencing is used for genetic testing of hereditary diseases, a large number of short reads are produced for deoxyribonucleic acid (DNA) sequencing and gene mutation detection. In this process, cross-contamination, especially cross-contamination between samples, strongly affects high-throughput sequencing: even a small amount of contamination can make the test result inaccurate and produce false-positive or false-negative errors.
In existing high-throughput sequencing workflows, cross-contamination is usually reduced through the upstream experimental procedure and sample handling. Controlling cross-contamination in this way is difficult and uncertain. Because of limits on sequencing depth and computational methods, there is currently no method for detecting cross-contamination directly from high-throughput sequencing data, nor a method for quantitatively analyzing sample cross-contamination in high-throughput sequencing during genetic testing for hereditary diseases.
Summary of the invention
In view of this, the embodiments of the present invention provide a detection method and device to solve the problem that, in the prior art, sample contamination cannot be detected from high-throughput sequencing data.
A first aspect of the embodiments of the present invention provides a detection method, comprising:
obtaining result data produced by high-throughput sequencing of a sample;
computing, from the result data and using a Bayesian method, a probability set for the contamination rate of the sample;
determining the contamination level of the sample from the computed probability set.
A second aspect of the embodiments of the present invention provides a detection device, comprising:
an acquisition module, configured to obtain result data produced by high-throughput sequencing of a sample;
a statistics module, configured to compute, from the result data and using a Bayesian method, a probability set for the contamination rate of the sample;
a determination module, configured to determine the contamination level of the sample from the computed probability set.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: result data produced by high-throughput sequencing of a sample are obtained, a probability set for the contamination rate of the sample is computed from the result data using a Bayesian method, and the contamination level of the sample is determined from this probability set. This achieves accurate detection of the sample contamination level from high-throughput sequencing data, avoids the difficulty and uncertainty of controlling sample contamination before sequencing, and greatly improves the accuracy of downstream analysis results.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a detection method provided by embodiment one of the present invention;
Fig. 2 is a flowchart of a detection method provided by embodiment two of the present invention;
Fig. 3 is a structural block diagram of a detection device provided by embodiment three of the present invention;
Fig. 4 is a structural block diagram of a detection device provided by embodiment four of the present invention.
Detailed description
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present invention can be thoroughly understood. However, it will be clear to those skilled in the art that the present invention can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment one:
Fig. 1 is a flowchart of a detection method provided by embodiment one of the present invention. The method is executed by a computing device, which may specifically be a computer, a server, or the like. The detection method illustrated in Fig. 1 may specifically include steps S101 to S103, detailed as follows:
S101: obtain the result data produced by high-throughput sequencing of a sample.
High-throughput sequencing, also called "next-generation" sequencing technology, is characterized by sequencing hundreds of thousands to millions of DNA molecules in parallel in a single run and by generally short read lengths.
Specifically, high-throughput sequencing is performed on the sample to obtain a set of result data.
S102: from the result data, compute a probability set for the contamination rate of the sample using a Bayesian method.
The Bayesian method (Bayesian analysis) provides a way to calculate the probability of a hypothesis from the prior probability of the hypothesis, the probability of observing different data given the hypothesis, and the observed data themselves. Prior information about an unknown parameter is combined with sample information, posterior information is derived with Bayes' formula, and the unknown parameter is then inferred from the posterior information.
Specifically, based on the result data obtained in step S101, a prior probability is assigned to the unknown parameter, namely the contamination rate of the sample. This prior probability is combined with the high-throughput sequencing data of the sample, and the posterior probability of the contamination rate is derived with Bayes' formula. The posterior probabilities corresponding to the possible values of the contamination rate form the probability set, which therefore contains the probability of each candidate contamination rate of the sample.
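As a minimal illustration of the Bayesian update described in step S102, the sketch below multiplies a prior by a likelihood for a few candidate contamination rates and normalizes the result into a probability set. The function name `posterior_over_rates`, the uniform prior, and the likelihood numbers are assumptions made up for this example; none of them come from the patent.

```python
def posterior_over_rates(rates, prior, likelihood):
    """Return {rate: posterior}, with posterior proportional to prior * likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("all candidate rates have zero posterior mass")
    return {r: u / total for r, u in zip(rates, unnormalized)}


# Illustrative inputs only (assumed, not taken from the source).
rates = [0.0, 0.1, 0.2]            # candidate contamination rates
prior = [1 / 3, 1 / 3, 1 / 3]      # uniform prior over the candidates
likelihood = [0.02, 0.06, 0.02]    # P(data | rate), invented numbers

probability_set = posterior_over_rates(rates, prior, likelihood)
```

With a uniform prior the posterior is simply the normalized likelihood, so the candidate with the largest likelihood also has the largest posterior probability.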
S103: determine the contamination level of the sample from the computed probability set.
Specifically, in the probability set of contamination rates obtained in step S102, the contamination rate corresponding to the probability that satisfies a preset condition is determined to be the contamination level of the sample.
The preset condition may be the average probability value obtained by taking the arithmetic mean of the probability values that exceed a preset probability threshold; it may also be the largest probability value, or the probability value that occurs most often, and so on. It is configured according to the needs of the actual application and is not limited here.
In this embodiment, result data produced by high-throughput sequencing of a sample are obtained, a probability set for the contamination rate of the sample is computed from the result data using a Bayesian method, and the contamination level of the sample is determined from the probability set. This achieves accurate detection of the sample contamination level from high-throughput sequencing data, avoids the difficulty and uncertainty of controlling sample contamination before sequencing, and greatly improves the accuracy of downstream analysis results.
Embodiment two:
Fig. 2 is a flowchart of a detection method provided by embodiment two of the present invention. The method is executed by a computing device, which may specifically be a computer, a server, or the like. The detection method illustrated in Fig. 2 may specifically include steps S201 to S206, detailed as follows:
S201: obtain the result data produced by high-throughput sequencing of a sample.
High-throughput sequencing, also called "next-generation" sequencing technology, is characterized by sequencing hundreds of thousands to millions of DNA molecules in parallel in a single run and by generally short read lengths.
Specifically, high-throughput sequencing is performed on the sample to obtain a set of result data.
S202: enumerate candidate contamination rates of the sample at a fixed step size.
Specifically, the contamination rate of the sample takes values between 0 and 1, that is, greater than or equal to 0 and less than or equal to 1.
The fixed step size may be set to 0.01, but is not limited to this value; it is configured according to the needs of the actual application and is not limited here.
For example, with a fixed step size of 0.1, enumerating the contamination rate of the sample yields 11 values: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.
It can be understood that the smaller the fixed step size, the more candidate values of the contamination rate there are and the more precise the resulting contamination level.
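The enumeration in step S202 can be sketched as follows. The helper name `enumerate_rates` is hypothetical, and parameterizing by the number of steps rather than by the step size itself is a choice made for this sketch: a step size of 0.1 corresponds to 10 steps and a step size of 0.01 to 100 steps.

```python
def enumerate_rates(num_steps):
    """Return num_steps + 1 evenly spaced contamination rates covering [0, 1]."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    # Dividing integers avoids the drift that comes from repeatedly adding a float step.
    return [k / num_steps for k in range(num_steps + 1)]


rates_coarse = enumerate_rates(10)    # step size 0.1, 11 candidates
rates_fine = enumerate_rates(100)     # step size 0.01, 101 candidates
```

Halving the step size roughly doubles the number of candidates, and therefore the number of times the probability of a contamination rate has to be evaluated.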
S203: calculate the probability of each contamination rate according to formula (1):
$$F(\alpha)=\prod_{i=1}^{S}\sum_{G_i}\sum_{g_i}\left\{\prod_{j=1}^{R}\Bigl((1-\alpha)\,P(b_{ij}\mid G)+\alpha\,P(b_{ij}\mid g)\Bigr)\right\}P(G\mid af)\,P(g\mid af)\tag{1}$$
where α is the contamination rate, F(α) is the probability of the contamination rate, S is the number of mutation sites in the result data, R is the number of high-throughput sequencing reads at each site, G is the sample genotype, g is the contaminant genotype, b_ij is the base observed in the j-th high-throughput sequencing read at the i-th site, af is the allele frequency of the sample population, P denotes probability, and P(b_ij | G) denotes the prior probability of the base of the j-th high-throughput sequencing read at the i-th site that occurs in the reference genome.
Specifically, for each contamination rate value enumerated in step S202, the probability of that contamination rate is calculated according to formula (1).
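The structure of formula (1), as also stated in claim 2, is a product over sites of a sum over (G, g) genotype pairs, where each pair contributes a product over reads of (1 − α)·P(b_ij | G) + α·P(b_ij | g), weighted by P(G | af)·P(g | af). The sketch below evaluates its logarithm for one candidate α. Several pieces are assumptions that the source does not specify: biallelic sites, Hardy-Weinberg proportions for P(G | af), a simple per-base error model for P(b | G), and all function names and data values.

```python
import math

SEQ_ERROR = 0.01  # assumed per-base sequencing error rate (not from the source)


def p_base_given_genotype(base, genotype, err=SEQ_ERROR):
    """P(b | G): average over the two alleles with a simple error model (assumption)."""
    return sum((1 - err) if base == allele else err / 3 for allele in genotype) / 2


def genotype_priors(ref, alt, af):
    """P(G | af) for a biallelic site under Hardy-Weinberg proportions (assumption)."""
    return {
        (ref, ref): (1 - af) ** 2,
        (ref, alt): 2 * af * (1 - af),
        (alt, alt): af ** 2,
    }


def log_f_alpha(alpha, sites):
    """log F(alpha); each site is a tuple (ref, alt, af, observed_bases)."""
    total = 0.0
    for ref, alt, af, bases in sites:
        priors = genotype_priors(ref, alt, af)
        site_sum = 0.0
        for G, p_G in priors.items():        # sample genotype
            for g, p_g in priors.items():    # contaminant genotype
                read_product = 1.0
                for b in bases:              # product over reads at this site
                    read_product *= ((1 - alpha) * p_base_given_genotype(b, G)
                                     + alpha * p_base_given_genotype(b, g))
                site_sum += read_product * p_G * p_g
        total += math.log(site_sum)          # product over sites, in log space
    return total


# Invented toy data: two sites with 20 reads each.
sites = [("A", "G", 0.3, ["A"] * 18 + ["G"] * 2),
         ("C", "T", 0.5, ["C"] * 19 + ["T"] * 1)]
scores = {k / 10: log_f_alpha(k / 10, sites) for k in range(11)}
```

Summing logarithms across sites keeps the product over many sites from underflowing; at high read depth the inner product over reads would need the same treatment. Note also that, because G and g share the same prior in this sketch, the score is the same for α and 1 − α, so a practical implementation would need an additional restriction (for example α ≤ 0.5, which is an assumption here) to tell the sample from the contaminant.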
A site is the position of a gene or marker on a chromosome.
There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). DNA is a helical structure composed of these four bases. Bases occur as base pairs, each of which consists of two matching bases joined by hydrogen bonds.
It should be noted that the four bases can form n base pairs, B1, B2, B3, ..., Bn. Both the sample genotype G and the contaminant genotype g take values among these n base pairs, i.e. G ∈ {B1, B2, B3, ..., Bn} and g ∈ {B1, B2, B3, ..., Bn}.
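The source leaves the number n of base pairs open. One possible reading, assumed here purely for illustration, is that each genotype is an unordered pair of bases at a site (a diploid genotype), which gives n = 10. The names `BASES` and `GENOTYPES` are hypothetical.

```python
from itertools import combinations_with_replacement

BASES = ("A", "C", "G", "T")

# Unordered pairs of bases: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = list(combinations_with_replacement(BASES, 2))
```

Under this reading, both the sample genotype G and the contaminant genotype g would range over the same 10 candidates at every site.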
Allele frequency is the percentage that an allele accounts for at a specific locus in the sample population; it indicates the diversity of genes in the sample population.
Further, when calculating the prior probability P(b_ij | g) of the base of the j-th high-throughput sequencing read at the i-th site that occurs in the reference genome, a reference genome bias is introduced: a probability bias is applied to the prior probability to adjust the inferred prior and thereby improve the accuracy of the probability calculated for each contamination rate.
The specific probability bias can be configured according to the actual application and is not limited here. For example, the probability bias may be set to 0.025, which raises the prior probability of a base that occurs in the reference genome from 0.5 to 0.525 and lowers the prior probability of a base that does not occur in the reference genome from 0.5 to 0.475.
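The reference bias adjustment in the example above amounts to shifting an even 0.5/0.5 prior by the chosen bias. A small sketch follows; the function name `biased_base_priors` and the two-base dictionary representation are assumptions made for this example.

```python
def biased_base_priors(reference_base, alternate_base, bias=0.025):
    """Shift the 0.5/0.5 prior toward the base that occurs in the reference genome."""
    if not 0 <= bias <= 0.5:
        raise ValueError("bias must lie in [0, 0.5]")
    return {reference_base: 0.5 + bias, alternate_base: 0.5 - bias}


# With the bias of 0.025 from the text: reference base 0.525, other base 0.475.
priors = biased_base_priors("A", "G")
```

The two priors still sum to 1, so the adjustment only redistributes probability between the reference and non-reference base.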
S204: form the probability set from the probabilities of the contamination rates.
Specifically, the probabilities of the contamination rates obtained in step S203 form the probability set.
For example, with a fixed step size of 0.1, the probabilities of 11 contamination rates are obtained, and these 11 probabilities form the probability set.
S205: compare the probabilities of the contamination rates in the probability set and determine the contamination rate corresponding to the largest probability as the contamination level of the sample.
Specifically, in the probability set obtained in step S204, the contamination rate corresponding to the largest probability value is selected and determined as the contamination level of the sample.
For example, with a fixed step size of 0.1, suppose the resulting probability set is {0.1, 0, 0.2, 0.3, 0.15, 0.4, 0.3, 0.25, 0.32, 0.2, 0.24}. The largest probability value in the set is 0.4, so the contamination rate for which formula (1) produced this largest probability value is the contamination level of the sample.
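The selection in step S205 is an argmax over the probability set. The sketch below uses the 11 probability values from the example; pairing them in listed order with the rates 0, 0.1, ..., 1 is an assumption, since the text does not say which probability belongs to which rate. The function name `most_probable_rate` is hypothetical.

```python
rates = [k / 10 for k in range(11)]
probabilities = [0.1, 0, 0.2, 0.3, 0.15, 0.4, 0.3, 0.25, 0.32, 0.2, 0.24]


def most_probable_rate(rates, probabilities):
    """Return the rate whose probability is largest (the first one on ties)."""
    best_index = max(range(len(rates)), key=lambda k: probabilities[k])
    return rates[best_index]


# Under the assumed pairing, the largest value 0.4 sits at index 5, i.e. rate 0.5.
contamination_level = most_probable_rate(rates, probabilities)
```

If several rates share the largest probability, this sketch returns the smallest of them; the source does not specify a tie-breaking rule.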
S206: check the credibility of the result data according to the contamination level of the sample.
Specifically, the credibility of the result data obtained by high-throughput sequencing is checked according to the contamination level determined in step S205.
The credibility check may specifically be as follows: if the contamination level of the sample exceeds a preset contamination threshold, the result data are deemed not credible; if the contamination level of the sample does not exceed the preset contamination threshold, the result data are deemed credible. For example, suppose the result data show a positive result; if the contamination level of the sample exceeds the preset contamination threshold, the positive shown by the result data is determined to be a false positive.
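The threshold rule of step S206 can be written as a small predicate. The function name `assess_result` and the default threshold of 0.05 are placeholders chosen for this sketch; the source only says that the threshold is preset.

```python
def assess_result(contamination_level, reported_positive, threshold=0.05):
    """Apply the threshold rule: above the threshold, the result is not trusted."""
    credible = contamination_level <= threshold
    # A positive call from a result that is not credible is treated as a false positive.
    suspected_false_positive = reported_positive and not credible
    return {"credible": credible,
            "suspected_false_positive": suspected_false_positive}


heavily_contaminated = assess_result(0.5, reported_positive=True)
clean_sample = assess_result(0.01, reported_positive=True)
```

A contamination level exactly equal to the threshold counts as credible here, matching the wording "does not exceed" in the text.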
In this embodiment, result data produced by high-throughput sequencing of a sample are obtained, and candidate contamination rates of the sample are enumerated at a fixed step size. The probability of each contamination rate is calculated according to formula (1), with a reference genome bias introduced during the calculation by applying a probability bias to the prior probability. The calculated probabilities form the probability set, the probabilities in the set are compared, and the contamination rate corresponding to the largest probability is determined as the contamination level of the sample. This achieves accurate detection of the sample contamination level from high-throughput sequencing data, avoids the difficulty and uncertainty of controlling sample contamination before sequencing, and greatly improves the accuracy of downstream analysis results. Moreover, checking the credibility of the result data against the contamination level of the sample helps exclude erroneous test results caused by sample contamination and improves the reliability and precision of high-throughput sequencing data processing. Hereditary disease gene mutations can thus be determined more accurately with high-throughput sequencing and false-positive sites can be effectively excluded, so that high-throughput sequencing can be better used in research and clinical practice.
Embodiment three:
Fig. 3 is a structural diagram of a detection device provided by embodiment three of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown. The detection device illustrated in Fig. 3 may be the executing entity of the detection method provided by embodiment one. The detection device illustrated in Fig. 3 includes an acquisition module 31, a statistics module 32, and a determination module 33. The functional modules are described in detail as follows:
Acquisition module 31, configured to obtain result data produced by high-throughput sequencing of a sample;
Statistics module 32, configured to compute, from the result data obtained by the acquisition module 31 and using a Bayesian method, a probability set for the contamination rate of the sample;
Determination module 33, configured to determine the contamination level of the sample from the probability set computed by the statistics module 32.
For the process by which each module of the detection device provided in this embodiment performs its function, refer to the description of the embodiment shown in Fig. 1; details are not repeated here.
As can be seen from the detection device illustrated in Fig. 3, in this embodiment result data produced by high-throughput sequencing of a sample are obtained, a probability set for the contamination rate of the sample is computed from the result data using a Bayesian method, and the contamination level of the sample is determined from this probability set. This achieves accurate detection of the sample contamination level from high-throughput sequencing data, avoids the difficulty and uncertainty of controlling sample contamination before sequencing, and greatly improves the accuracy of downstream analysis results.
Embodiment four:
Fig. 4 is a structural diagram of a detection device provided by embodiment four of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown. The detection device illustrated in Fig. 4 may be the executing entity of the detection method provided by embodiment one. The detection device illustrated in Fig. 4 includes an acquisition module 41, a statistics module 42, and a determination module 43. The functional modules are described in detail as follows:
Acquisition module 41, configured to obtain result data produced by high-throughput sequencing of a sample;
Statistics module 42, configured to compute, from the result data obtained by the acquisition module 41 and using a Bayesian method, a probability set for the contamination rate of the sample;
Determination module 43, configured to determine the contamination level of the sample from the probability set computed by the statistics module 42.
Further, the statistics module 42 includes:
Enumeration submodule 421, configured to enumerate candidate contamination rates of the sample at a fixed step size;
Calculation submodule 422, configured to calculate, according to formula (2), the probability of each contamination rate obtained by the enumeration submodule 421:
$$F(\alpha)=\prod_{i=1}^{S}\sum_{G_i}\sum_{g_i}\left\{\prod_{j=1}^{R}\Bigl((1-\alpha)\,P(b_{ij}\mid G)+\alpha\,P(b_{ij}\mid g)\Bigr)\right\}P(G\mid af)\,P(g\mid af)\tag{2}$$
where α is a contamination rate obtained by the enumeration submodule 421, F(α) is the probability of the contamination rate, S is the number of mutation sites in the result data, R is the number of high-throughput sequencing reads at each site, G is the sample genotype, g is the contaminant genotype, b_ij is the base observed in the j-th high-throughput sequencing read at the i-th site, af is the allele frequency of the sample population, P denotes probability, and P(b_ij | g) denotes the prior probability of the base of the j-th high-throughput sequencing read at the i-th site that occurs in the reference genome;
Combination submodule 423, configured to form the probability set from the probabilities of the contamination rates obtained by the calculation submodule 422.
Further, the calculation submodule 422 is also configured to:
apply a probability bias to the prior probability.
Further, the determination module 43 includes:
Selection submodule 431, configured to compare the probabilities of the contamination rates in the probability set obtained by the combination submodule 423 and to determine the contamination rate corresponding to the largest probability as the contamination level of the sample.
Further, the device also includes:
Inspection module 44, configured to check the credibility of the result data according to the contamination level determined by the selection submodule 431.
For the process by which each module of the detection device provided in this embodiment performs its function, refer to the description of the embodiment shown in Fig. 4; details are not repeated here.
As can be seen from the detection device illustrated in Fig. 4, in this embodiment result data produced by high-throughput sequencing of a sample are obtained, and candidate contamination rates of the sample are enumerated at a fixed step size. The probability of each contamination rate is calculated according to formula (2), with a reference genome bias introduced during the calculation by applying a probability bias to the prior probability. The calculated probabilities form the probability set, the probabilities in the set are compared, and the contamination rate corresponding to the largest probability is determined as the contamination level of the sample. This achieves accurate detection of the sample contamination level from high-throughput sequencing data, avoids the difficulty and uncertainty of controlling sample contamination before sequencing, and greatly improves the accuracy of downstream analysis results. Moreover, checking the credibility of the result data against the contamination level of the sample helps exclude erroneous test results caused by sample contamination and improves the reliability and precision of high-throughput sequencing data processing. Hereditary disease gene mutations can thus be determined more accurately with high-throughput sequencing and false-positive sites can be effectively excluded, so that high-throughput sequencing can be better used in research and clinical practice.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic and should not limit the implementation of the embodiments of the present invention in any way.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is used only as an example. In practice, the functions may be assigned to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working process of the units and modules in the above system, refer to the corresponding process in the foregoing method embodiments; details are not repeated here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and all of them shall fall within the scope of protection of the present invention.

Claims (10)

1. A detection method, characterized in that the detection method comprises:
obtaining result data produced by high-throughput sequencing of a sample;
computing, according to the result data, a probability set of contamination rates of the sample using a Bayesian method;
determining the contamination level of the sample according to the probability set obtained.
2. The detection method according to claim 1, characterized in that computing, according to the result data, the probability set of contamination rates of the sample using a Bayesian method comprises:
enumerating candidate contamination rates of the sample at a fixed step size;
calculating the probability of each contamination rate according to the following formula:
$$F(\alpha) = \prod_{i=1}^{S} \sum_{G_i} \sum_{g_i} \left\{ \prod_{j=1}^{R} \Big( (1-\alpha)\, P(b_{ij} \mid G) + \alpha\, P(b_{ij} \mid g) \Big) \right\} P(G \mid \mathit{af})\, P(g \mid \mathit{af})$$
where α is the contamination rate, F(α) is the probability of that contamination rate, S is the number of variant sites in the result data, R is the number of high-throughput sequencing reads at each site, G is the sample genotype, g is the contaminant genotype, b_ij is the base observed in the j-th high-throughput sequencing read at the i-th site, af is the allele frequency of the sample population, P denotes probability, and P(b_ij | g) denotes the prior probability that the base of the j-th high-throughput sequencing read at the i-th site occurs in the reference genome;
combining the probabilities of the contamination rates into the probability set.
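As a rough illustration of the enumeration and formula recited in claim 2, the following Python sketch evaluates F(α) for each candidate contamination (pollution) rate on a fixed-step grid. Several things here are assumptions that do not come from the patent text: sites are treated as biallelic with genotypes coded as 0, 1 or 2 alternate alleles, P(G | af) and P(g | af) use Hardy-Weinberg proportions, P(b | G) uses a single symmetric error rate `error`, and the function names (`genotype_prior`, `base_likelihood`, `log_f`, `probability_set`) are hypothetical. The product over sites is accumulated as a sum of logarithms to limit underflow.

```python
import math

GENOTYPES = (0, 1, 2)  # alternate-allele count; assumes biallelic sites


def genotype_prior(genotype, af):
    """P(G | af) under Hardy-Weinberg proportions (an assumption)."""
    return ((1 - af) ** 2, 2 * af * (1 - af), af ** 2)[genotype]


def base_likelihood(is_alt, genotype, error=0.01):
    """P(b | G) for one read, with a symmetric error rate (an assumption)."""
    p_alt = (genotype / 2) * (1 - error) + (1 - genotype / 2) * error
    return p_alt if is_alt else 1 - p_alt


def log_f(alpha, sites, error=0.01):
    """Natural log of F(alpha).

    sites is a list of (af, reads) pairs, where reads is a list of booleans
    that are True when the read shows the alternate allele.
    """
    total = 0.0
    for af, reads in sites:                      # product over i = 1..S
        site_sum = 0.0
        for G in GENOTYPES:                      # sum over sample genotypes
            for g in GENOTYPES:                  # sum over contaminant genotypes
                prod = 1.0
                for is_alt in reads:             # product over j = 1..R
                    prod *= ((1 - alpha) * base_likelihood(is_alt, G, error)
                             + alpha * base_likelihood(is_alt, g, error))
                site_sum += prod * genotype_prior(G, af) * genotype_prior(g, af)
        total += math.log(site_sum)
    return total


def probability_set(sites, step=0.01, max_alpha=0.5, error=0.01):
    """Enumerate rates 0, step, 2*step, ... and map each to log F(alpha)."""
    n = int(round(max_alpha / step))
    return {round(k * step, 10): log_f(k * step, sites, error)
            for k in range(n + 1)}
```

The inner product is kept in linear space for readability, so very deep coverage at a single site could still underflow; a production version would likely work with log-sum-exp throughout.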
3. The detection method according to claim 2, characterized in that the detection method further comprises:
setting a probability deviation for the prior probability.
4. The detection method according to claim 2, characterized in that determining the contamination level of the sample according to the probability set obtained comprises:
comparing the probabilities of the contamination rates in the probability set, and taking the contamination rate with the highest probability as the contamination level of the sample.
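The selection step in claim 4 amounts to an argmax over the probability set. A minimal Python sketch follows, assuming the set is held as a mapping from candidate rate to log-probability; that representation, the helper names `contamination_level` and `normalize`, and the numbers in the usage lines are all illustrative rather than taken from the patent.

```python
import math


def contamination_level(prob_set):
    """Return the candidate rate whose (log-)probability is largest."""
    return max(prob_set, key=prob_set.get)


def normalize(log_probs):
    """Turn log-probabilities into weights that sum to 1.

    The largest log value is subtracted before exponentiating so that
    very negative log-probabilities do not all underflow to zero.
    """
    peak = max(log_probs.values())
    weights = {rate: math.exp(value - peak) for rate, value in log_probs.items()}
    total = sum(weights.values())
    return {rate: weight / total for rate, weight in weights.items()}


# Illustrative values only, not measured data.
example = {0.00: -120.5, 0.01: -98.2, 0.02: -101.7}
best = contamination_level(example)
posterior = normalize(example)
```

Normalizing is optional for picking the maximum, but it makes the values comparable across samples and shows how sharply the estimate is peaked.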
5. The detection method according to any one of claims 1 to 4, characterized in that, after determining the contamination level of the sample according to the probability set obtained, the detection method further comprises:
checking the confidence of the result data according to the contamination level.
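One simple way to read the confidence check in claim 5 is as a threshold rule on the estimated contamination level. The sketch below is only that reading: the function name `check_result_confidence`, the three labels, and the default cut-offs `warn_at` and `reject_at` are placeholders chosen for illustration, and the claim itself does not specify any thresholds.

```python
def check_result_confidence(contamination_level, warn_at=0.02, reject_at=0.05):
    """Map a contamination level in [0, 1] to a coarse confidence label.

    warn_at and reject_at are hypothetical cut-offs; real values would
    have to be validated for the assay in question.
    """
    if not 0.0 <= contamination_level <= 1.0:
        raise ValueError("contamination level must lie in [0, 1]")
    if contamination_level >= reject_at:
        return "unreliable"
    if contamination_level >= warn_at:
        return "review"
    return "reliable"
```

Keeping the cut-offs as parameters lets the same check be reused for applications with different tolerance for contamination.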
6. A detection device, characterized in that the detection device comprises:
an acquisition module, configured to obtain result data produced by high-throughput sequencing of a sample;
a statistics module, configured to compute, according to the result data, a probability set of contamination rates of the sample using a Bayesian method;
a determination module, configured to determine the contamination level of the sample according to the probability set obtained.
7. The detection device according to claim 6, characterized in that the statistics module comprises:
an enumeration submodule, configured to enumerate candidate contamination rates of the sample at a fixed step size;
a calculation submodule, configured to calculate the probability of each contamination rate according to the following formula:
$$F(\alpha) = \prod_{i=1}^{S} \sum_{G_i} \sum_{g_i} \left\{ \prod_{j=1}^{R} \Big( (1-\alpha)\, P(b_{ij} \mid G) + \alpha\, P(b_{ij} \mid g) \Big) \right\} P(G \mid \mathit{af})\, P(g \mid \mathit{af})$$
where α is the contamination rate, F(α) is the probability of that contamination rate, S is the number of variant sites in the result data, R is the number of high-throughput sequencing reads at each site, G is the sample genotype, g is the contaminant genotype, b_ij is the base observed in the j-th high-throughput sequencing read at the i-th site, af is the allele frequency of the sample population, P denotes probability, and P(b_ij | g) denotes the prior probability that the base of the j-th high-throughput sequencing read at the i-th site occurs in the reference genome;
a combination submodule, configured to combine the probabilities of the contamination rates into the probability set.
8. The detection device according to claim 7, characterized in that the calculation submodule is further configured to:
set a probability deviation for the prior probability.
9. The detection device according to claim 7, characterized in that the determination module comprises:
a selection submodule, configured to compare the probabilities of the contamination rates in the probability set and take the contamination rate with the highest probability as the contamination level of the sample.
10. The detection device according to any one of claims 6 to 9, characterized in that the device further comprises:
an inspection module, configured to check the confidence of the result data according to the contamination level.
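The module structure of claims 6 to 10 could be mirrored in software roughly as follows. This is a sketch under assumptions: the class `DetectionDevice`, its field names, and the stub callables in the usage lines are hypothetical, and the claims do not prescribe any particular software interface. Each module is injected as a callable so that the acquisition, statistics, determination and inspection steps can be swapped independently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class DetectionDevice:
    # Each field stands in for one claimed module (names are hypothetical).
    acquire: Callable[[str], Any]                               # acquisition module
    compute_probability_set: Callable[[Any], Dict[float, float]]  # statistics module
    determine_level: Callable[[Dict[float, float]], float]      # determination module
    inspect: Callable[[float], str]                             # inspection module

    def run(self, sample_id: str) -> Tuple[float, str]:
        """Chain the four modules in the order the claims describe."""
        data = self.acquire(sample_id)
        prob_set = self.compute_probability_set(data)
        level = self.determine_level(prob_set)
        return level, self.inspect(level)


# Placeholder stubs to show the wiring; they do not read real sequencing data.
device = DetectionDevice(
    acquire=lambda sample_id: {"sample": sample_id},
    compute_probability_set=lambda data: {0.0: 0.2, 0.01: 0.7, 0.02: 0.1},
    determine_level=lambda prob_set: max(prob_set, key=prob_set.get),
    inspect=lambda level: "reliable" if level < 0.02 else "review",
)
result = device.run("S1")
```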
CN201710326413.2A 2017-05-10 2017-05-10 Detection method and device Active CN107273715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710326413.2A CN107273715B (en) 2017-05-10 2017-05-10 Detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710326413.2A CN107273715B (en) 2017-05-10 2017-05-10 Detection method and device

Publications (2)

Publication Number Publication Date
CN107273715A true CN107273715A (en) 2017-10-20
CN107273715B CN107273715B (en) 2020-03-17

Family

ID=60074097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710326413.2A Active CN107273715B (en) 2017-05-10 2017-05-10 Detection method and device

Country Status (1)

Country Link
CN (1) CN107273715B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103114150A (en) * 2013-03-11 2013-05-22 上海美吉生物医药科技有限公司 Single nucleotide polymorphism site identification method based on digestion library-establishing and sequencing and bayesian statistics
WO2015164432A1 (en) * 2014-04-21 2015-10-29 Natera, Inc. Detecting mutations and ploidy in chromosomal segments

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273715B (en) * 2017-05-10 2020-03-17 安吉康尔(深圳)科技有限公司 Detection method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273715B (en) * 2017-05-10 2020-03-17 安吉康尔(深圳)科技有限公司 Detection method and device
CN111341383A (en) * 2020-03-17 2020-06-26 安吉康尔(深圳)科技有限公司 Method, device and storage medium for detecting copy number variation
CN111341383B (en) * 2020-03-17 2021-06-29 安吉康尔(深圳)科技有限公司 Method, device and storage medium for detecting copy number variation
CN117436532A (en) * 2023-12-21 2024-01-23 中用科技有限公司 Root cause analysis method for gaseous molecular pollutants in clean room
CN117436532B (en) * 2023-12-21 2024-03-22 中用科技有限公司 Root cause analysis method for gaseous molecular pollutants in clean room

Also Published As

Publication number Publication date
CN107273715B (en) 2020-03-17

Similar Documents

Publication Publication Date Title
EP3619653B1 (en) Deep learning-based variant classifier
WO2020014280A1 (en) DEEP LEARNING-BASED FRAMEWORK FOR IDENTIFYING SEQUENCE PATTERNS THAT CAUSE SEQUENCE-SPECIFIC ERRORS (SSEs)
EP3619712B1 (en) Deep learning-based framework for identifying sequence patterns that cause sequence-specific errors
CN109063417B (en) Genotype filling method for constructing hidden Markov chain
US20230207057A1 (en) Variant calling without a target reference genome
CN107273715A (en) A kind of detection method and device
Sesia et al. Controlling the false discovery rate in GWAS with population structure
CN115035950A (en) Genotype detection method, sample contamination detection method, apparatus, device and medium
Adam et al. Performing post-genome-wide association study analysis: overview, challenges and recommendations
US20200140925A1 (en) Methods for validation of microbiome sequence processing and differential abundance analyses via multiple bespoke spike-in mixtures
Utro et al. iXora: exact haplotype inferencing and trait association
Lijoi et al. A Bayesian nonparametric approach for comparing clustering structures in EST libraries
Zararsiz et al. Introduction to statistical methods for microRNA analysis
Weber et al. Phyolin: Identifying a linear perfect phylogeny in single-cell dna sequencing data of tumors
Yap et al. Modeling DNA base substitution in large genomic regions from two organisms
Pipes et al. A rapid phylogeny-based method for accurate community profiling of large-scale metabarcoding datasets
NL2021473B1 (en) DEEP LEARNING-BASED FRAMEWORK FOR IDENTIFYING SEQUENCE PATTERNS THAT CAUSE SEQUENCE-SPECIFIC ERRORS (SSEs)
US10964407B2 (en) Method for estimating the probe-target affinity of a DNA chip and method for manufacturing a DNA chip
Mikhaylenko Signature Topology: functional analysis of omics data
Haque Testing Global Hypotheses Using Combination Tests, with Application to Phylogenetic Inference
Aloqaily et al. Feature prioritisation on big genomic data for analysing gene-gene interactions
Taub Analysis of high-throughput biological data: some statistical problems in RNA-seq and mouse genotyping
Cao et al. Constructing genotype and phenotype network helps reveal disease heritability and phenome-wide association studies
Ramdas et al. Next-Generation Sequencing in Genetic Studies of Psychiatric Disorders
Naiman [16] Random Data Set Generation to Support Microarray Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 518000 a3803, building 11, Shenzhen Bay science and technology ecological park, No. 16, Keji South Road, community, high tech Zone, Yuehai street, Nanshan District, Shenzhen, Guangdong

Patentee after: Shenzhen Yaji Technology Co.,Ltd.

Address before: 518000 unit B, 3 / F, Shenzhen North Science and technology innovation building, No. 9, Yuexing fifth road, South District, high tech park, Yuehai street, Nanshan District, Shenzhen, Guangdong

Patentee before: AEGICARE (SHENZHEN) TECHNOLOGY CO.,LTD.