CN110648720A - Metagenome sequencing quality control prediction evaluation method and model - Google Patents

Publication number
CN110648720A
Authority
CN
China
Prior art keywords
sequencing
strain
length
evaluation
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910911574.7A
Other languages
Chinese (zh)
Other versions
CN110648720B (en)
Inventor
许腾
刘足
李永军
王小锐
苏杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Vision Gene Technology Co ltd
Guangzhou Weiyuan Medical Equipment Co ltd
Guangzhou Weiyuan Medical Laboratory Co ltd
Shenzhen Weiyuan Medical Technology Co ltd
Original Assignee
Guangzhou Weiyuan Gene Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weiyuan Gene Technology Co Ltd
Priority to CN201910911574.7A
Publication of CN110648720A
Application granted
Publication of CN110648720B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a metagenome sequencing quality control prediction evaluation method and a metagenome sequencing quality control prediction evaluation model, and belongs to the technical field of gene detection. The method comprises the following flows. Q20 threshold system evaluation flow: obtaining sequencing parameters, constructing a Q20 threshold model, and inputting the sequencing parameters into the Q20 threshold model for solving to obtain the relationship between the Q20 proportion and the accuracy. Sequencing data volume threshold system evaluation flow: obtaining data parameters, constructing a sequencing data volume threshold model, and inputting the data parameters into the model for solving to obtain the relationship between the sequencing data volume and detection of the strain unique region. Sequencing fragment length threshold system evaluation flow: obtaining strain mutation parameters, constructing a sequencing fragment length threshold model, and inputting the strain mutation parameters into the model for solving to obtain the relationship between the sequencing fragment length and the strain reduction accuracy. The method can be applied in metagenome detection to evaluate preset quality control standards.

Description

Metagenome sequencing quality control prediction evaluation method and model
Technical Field
The invention relates to the technical field of gene detection, in particular to a metagenome sequencing quality control prediction evaluation method and a metagenome sequencing quality control prediction evaluation model.
Background
The concept of metagenomics was first introduced in 1998. It refers to a technique in which the nucleic acid molecules in an environmental or biological sample are sequenced without discrimination or selection using genomic techniques, and the sequencing results are compared against a database of known microbial sequences, so that the microorganisms contained in the sample can be analyzed qualitatively or quantitatively. With the birth and development of next-generation sequencing (NGS), the cost of whole-genome sequencing fell by a factor of about ten thousand within ten years. Detection based on metagenomic NGS (mNGS) covers pathogens very comprehensively, requires no microbial culture and no experience-based presetting of targets, can detect drug-resistance mutations and virulence genes, and provides a brand-new approach to the clinical diagnosis and treatment of critical infections and infections of unknown cause.
However, in conventional metagenome detection projects, the key quality control indexes, such as Q20, sequencing data volume and sequencing fragment length, are usually set empirically to commonly used values. The thresholds are not derived from systematic theoretical modeling that takes the actual conditions of the project (sequencing platform, expected detection performance, etc.) into account, so they lack theoretical support and are often not fully applicable.
In addition, the detection performance of a metagenome detection project is usually evaluated by accumulating a certain number of samples and then reviewing the historical samples statistically. As a result, there is no theoretical expectation of detection performance when the project is established: the achievable theoretical detection performance cannot be calculated from the set thresholds, and the thresholds cannot be set according to the desired detection performance.
Disclosure of Invention
Therefore, it is necessary to provide a metagenome sequencing quality control prediction evaluation method and model with which, when a metagenome detection project is started, the detection performance of the project can be predicted from the set thresholds of the key indexes, and the thresholds of the key indexes can be set according to the detection performance to be achieved.
A metagenome sequencing quality control prediction evaluation method comprises the following flows:
Q20 threshold system evaluation flow: obtaining sequencing parameters of a preset sequencing process, constructing a Q20 threshold model, and inputting the sequencing parameters into the Q20 threshold model for solving to obtain the relationship between the Q20 proportion and the accuracy;
sequencing data volume threshold system evaluation flow: obtaining data parameters of a preset sequencing process, constructing a sequencing data volume threshold model, and inputting the data parameters into the model for solving to obtain the relationship between the sequencing data volume and detection of the strain unique region (namely a sequence specific to the strain);
sequencing fragment length threshold system evaluation flow: obtaining strain mutation parameters of a preset sequencing process, constructing a sequencing fragment length threshold model, and inputting the strain mutation parameters into the model for solving to obtain the relationship between the sequencing fragment length and the strain reduction accuracy.
In one embodiment, in the Q20 threshold system evaluation flow:
the sequencing parameters include: sequencing platform type, sequencing data volume and sequencing fragment length;
the Q20 threshold model is calculated as follows:
1) converting the sequencing data volume and the sequencing fragment length into a number of bases;
2) counting the sequencing quality values of a predetermined sequencing platform and their proportional distribution;
3) dividing the number of bases into a correct part and a wrong part according to the error rate converted from the sequencing quality values;
4) representing the correct part as a set of characters "A" and the wrong part as a set of characters "B", and constructing a sampling pool;
5) randomly sampling from the sampling pool to construct a test sequence set, wherein the test sequence set consists of a preset number of strain sequences, and the length of each strain sequence is the sequencing fragment length; for example, the number of strain sequences can be calculated as 1% × 20M = 200,000;
6) counting the number of characters "A" contained in each sequence of the test sequence set constructed by random sampling; for example, if a sequence constructed by sampling is "AAAAAAAABAAAAAAA", the number of "A"s it contains is 15;
7) when the number of characters "A" in a strain sequence of the test sequence set is greater than or equal to a preset value, judging that the sequencing errors in that strain sequence have no influence on the result; defining the proportion of such sequences among the constructed strain sequences as the accuracy;
8) setting a Q20 proportion gradient and calculating the corresponding accuracy.
The sequencing quality values are obtained in a conventional manner from statistics of data from the corresponding sequencing platform. It is understood that "A" and "B" are only labels; other symbols can be used, provided that the correct part and the wrong part are distinguished.
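Steps 1) to 4) can be sketched as follows. The Phred relation (error probability = 10^(-Q/10)) is the standard definition of a sequencing quality value; the quality profile used in the example (90% of bases at Q30, 10% at Q10) and the function names are hypothetical and stand in for the platform statistics, which are not given here.

```python
def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality value Q: 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def split_bases(n_reads: int, read_len: int, quality_dist: dict) -> tuple:
    """Split the total base count into (correct, wrong) parts.

    quality_dist maps a Phred quality value to the proportion of bases
    carrying that value; the proportions are expected to sum to 1.
    """
    total = n_reads * read_len
    mean_error = sum(p * phred_to_error(q) for q, p in quality_dist.items())
    wrong = round(total * mean_error)
    return total - wrong, wrong


# Hypothetical platform profile: 90% of bases at Q30, 10% at Q10.
correct, wrong = split_bases(20_000_000, 50, {30: 0.9, 10: 0.1})

# The sampling pool is kept as counts of "A" and "B" rather than as a string.
pool = {"A": correct, "B": wrong}
```

Keeping the pool as two counts avoids building a string of 10^9 characters while describing the same pool.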
In one embodiment, the number of strain sequences is 1% of the sequencing data volume, and the Q20 proportion gradient is set to 100%, 95%, 90%, 85%, 80%, 75% and 70%. The 1% is the proportion of microbial sequences in conventional batch samples, obtained from statistics; the rest are human sequences.
In one embodiment, in the sequencing data volume threshold system evaluation flow, the data parameters include: input amount, adapter ligation efficiency, average fragmentation length, human-derived proportion, proportion of the target strain among the strains, strain genome size, length of the strain unique region, duplication (dup) rate of the sequencing data, and human DNA molecular weight;
the sequencing data volume threshold model is calculated as follows:
1) calculating the read composition of the library from the data parameters to obtain the number of human-derived reads, the number of target strain unique reads and the number of non-target strain unique reads;
2) representing the human-derived reads as a set of characters "O", the non-target strain unique reads as a set of characters "P" and the target strain unique reads as a set of characters "Q", and constructing a sampling pool according to the respective calculated numbers of reads;
3) setting a target sequencing data volume;
4) randomly sampling from the constructed sampling pool according to the set target sequencing data volume, and determining how the probability that the number of extracted characters "Q" is greater than or equal to 3 depends on the sequencing data volume, namely the relationship between the sequencing data volume and detection of the strain unique region.
It is understood that "O", "P" and "Q" are only labels; other symbols may be used, provided that the different parts are distinguished.
In one embodiment, the target sequencing data volume is a series of preset gradient data volumes, and the sequencing data volume threshold model calculation yields a curve relating the data volume to the probability that the number of extracted characters "Q" is greater than or equal to 3.
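The random-sampling procedure can be illustrated with a minimal Monte Carlo sketch. The pool sizes, the number of trials and the function name `prob_q_at_least` are hypothetical and scaled down for illustration; they are not the read counts of any real library.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def prob_q_at_least(pool: dict, depth: int, min_hits: int = 3,
                    trials: int = 50, seed: int = 0) -> float:
    """Estimate P(number of 'Q' draws >= min_hits) by repeated sampling.

    pool maps each label ('O', 'P', 'Q') to its read count. Reads are drawn
    without replacement by sampling positions in a virtual pool, so the pool
    never has to be materialised as a string.
    """
    labels = list(pool)
    bounds = list(accumulate(pool[label] for label in labels))  # cumulative counts
    total = bounds[-1]
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        positions = rng.sample(range(total), depth)
        hits = sum(1 for pos in positions
                   if labels[bisect_right(bounds, pos)] == "Q")
        if hits >= min_hits:
            successes += 1
    return successes / trials


# Scaled-down, hypothetical pool: human (O), non-target (P), target (Q) reads.
toy_pool = {"O": 990_000, "P": 9_000, "Q": 1_000}
curve = {depth: prob_q_at_least(toy_pool, depth)
         for depth in (100, 1_000, 5_000, 20_000)}
```

Plotting `curve` gives the kind of data-volume-versus-probability relation described above; for realistic pool sizes an analytic binomial or hypergeometric tail is cheaper than simulation.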
In one embodiment, the sequencing fragment length threshold system evaluation flow comprises a sequencing length comparison evaluation flow and/or a minimum sequencing length evaluation flow;
the sequencing length comparison evaluation flow is as follows:
1) obtaining the mutation rate of the target sequencing strain, and setting a gradient range according to the mutation rate;
2) setting the sequencing lengths to be evaluated as X and Y, where X > Y, and generating a simulated sample of length X for each mutation rate;
3) extracting the first Y bp of each simulated sample of length X to generate a corresponding simulated sample of length Y;
4) analyzing the generated simulated samples of lengths X and Y, collecting statistics on the analysis results, and comparing the two lengths in terms of number of unique reads and strain reduction accuracy to obtain a comparative evaluation result;
the minimum sequencing length evaluation flow is as follows:
1) evaluating the mutation rate of the target sequencing strain, and setting a gradient range according to the mutation rate;
2) generating a plurality of simulated samples of length N for each mutation rate; for example, for each mutation rate, five 50 bp simulated samples can be generated using Mason (software for generating simulated data);
3) on the basis of each simulated sample of length N, truncating the data at gradient lengths with an interval of M to generate a set of sub-samples Nm, where N > M; for example, from the 50 bp data, the first 5 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp and 45 bp are taken to generate nine additional samples, so that the total number of samples is 4 × 5 × 10 = 200 (for example, 4 mutation rates × 5 simulated samples × 10 lengths);
4) analyzing all the generated simulated samples of length N and the Nm sub-samples, collecting statistics on the analysis results, and comparing simulated samples of different lengths in terms of number of unique reads and strain reduction accuracy to determine the minimum sequencing length.
The target sequencing strains are the strains to be detected in the specific metagenome sequencing; all strains that the microorganism detection product can detect may be included, as determined by the specific project requirements.
In one embodiment, in step 2) of the minimum sequencing length evaluation flow, 5 simulated samples of length N are generated for each mutation rate, and M is selected from 4, 5 and 6.
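The sample count in the minimum sequencing length evaluation follows from enumerating every combination of mutation rate, replicate and truncation length. A minimal sketch, assuming four mutation rates (an illustrative gradient), 5 replicates, N = 50 bp and M = 5 bp; the function names are hypothetical:

```python
from itertools import product


def truncation_lengths(n: int, m: int) -> list:
    """Lengths m, 2m, ... below n, followed by the full length n."""
    return list(range(m, n, m)) + [n]


def design(mutation_rates, replicates: int, n: int, m: int) -> list:
    """Enumerate (mutation_rate, replicate, length) for every simulated sample."""
    return list(product(mutation_rates,
                        range(1, replicates + 1),
                        truncation_lengths(n, m)))


# Illustrative values: four mutation rates, 5 replicates, N = 50 bp, M = 5 bp.
samples = design([0.00, 0.01, 0.02, 0.03], replicates=5, n=50, m=5)
```

With these values, `truncation_lengths(50, 5)` contains the nine truncated lengths plus the full 50 bp, and `len(samples)` equals 4 × 5 × 10.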
The invention also discloses a metagenome sequencing quality control prediction evaluation model, which comprises the following modules:
a data input module, used for obtaining the sequencing parameters, data parameters and strain mutation parameters of a preset sequencing process;
a model calculation module, used for carrying out evaluation analysis according to the above metagenome sequencing quality control prediction evaluation method;
a result output module, used for outputting the evaluation analysis result of the model calculation module.
In one embodiment, the model calculation module comprises: a Q20 threshold system evaluation module, a sequencing data volume threshold system evaluation module and a sequencing fragment length threshold system evaluation module;
the Q20 threshold system evaluation module performs its calculation as follows:
1) converting the sequencing data volume and the sequencing fragment length into a number of bases;
2) counting the sequencing quality values of a predetermined sequencing platform and their proportional distribution;
3) dividing the number of bases into a correct part and a wrong part according to the error rate converted from the sequencing quality values;
4) representing the correct part as a set of characters "A" and the wrong part as a set of characters "B", and constructing a sampling pool;
5) randomly sampling from the sampling pool to construct a test sequence set, wherein the test sequence set consists of a preset number of strain sequences, and the length of each strain sequence is the sequencing fragment length;
6) counting the number of characters "A" contained in each sequence of the test sequence set constructed by random sampling;
7) when the number of characters "A" in a strain sequence of the test sequence set is greater than or equal to a preset value, judging that the sequencing errors in that strain sequence have no influence on the result; defining the proportion of such sequences among the constructed strain sequences as the accuracy;
8) setting a Q20 proportion gradient and calculating the corresponding accuracy;
the sequencing data volume threshold system evaluation module performs its calculation as follows:
1) calculating the read composition of the library from the data parameters to obtain the number of human-derived reads, the number of target strain unique reads and the number of non-target strain unique reads;
2) representing the human-derived reads as a set of characters "O", the non-target strain unique reads as a set of characters "P" and the target strain unique reads as a set of characters "Q", and constructing a sampling pool according to the respective calculated numbers of reads;
3) setting a target sequencing data volume;
4) randomly sampling from the constructed sampling pool according to the set target sequencing data volume, and determining how the probability that the number of extracted characters "Q" is greater than or equal to 3 depends on the sequencing data volume, namely the relationship between the sequencing data volume and detection of the strain unique region.
The sequencing fragment length threshold system evaluation module comprises a sequencing length comparison evaluation module and/or a minimum sequencing length evaluation module;
the sequencing length comparison evaluation module performs its calculation as follows:
1) obtaining the mutation rate of the target sequencing strain, and setting a gradient range according to the mutation rate;
2) setting the sequencing lengths to be evaluated as X and Y, where X > Y, and generating a simulated sample of length X for each mutation rate;
3) extracting the first Y bp of each simulated sample of length X to generate a corresponding simulated sample of length Y;
4) analyzing the generated simulated samples of lengths X and Y, collecting statistics on the analysis results, and comparing the two lengths in terms of number of unique reads and strain reduction accuracy to obtain a comparative evaluation result;
the minimum sequencing length evaluation module performs its calculation as follows:
1) evaluating the mutation rate of the target sequencing strain, and setting a gradient range according to the mutation rate;
2) generating a plurality of simulated samples of length N for each mutation rate;
3) on the basis of each simulated sample of length N, truncating the data at gradient lengths with an interval of M to generate a set of sub-samples Nm, where N > M;
4) analyzing all the generated simulated samples of length N and the Nm sub-samples, collecting statistics on the analysis results, and comparing simulated samples of different lengths in terms of number of unique reads and strain reduction accuracy to determine the minimum sequencing length.
The invention also discloses application of the metagenome sequencing quality control prediction evaluation method in the metagenome detection as a preset quality control standard evaluation.
Compared with the prior art, the invention has the following beneficial effects:
according to the method for predicting and evaluating the quality control of the metagenome sequencing, the relation between the proportion of Q20 and the accuracy is obtained by evaluating a Q20 threshold system; evaluating a sequencing data quantity threshold system to obtain the relationship between the sequencing data quantity and the detected strain unique region (namely the specific sequence of the strain); and evaluating a sequencing fragment length threshold system to obtain the relationship between the sequencing fragment length and the strain reduction accuracy. Therefore, when the metagenome detection item is started, the detection performance of the item can be predicted according to the set threshold value of the key index, and the threshold value of the key index is set according to the preset detection performance to be achieved.
Drawings
FIG. 1 is a graph showing the relationship between the Q20 ratio and the accuracy in example 1;
FIG. 2 is a schematic representation of unique reads of the bacterial mock sample of example 1;
FIG. 3 is a diagram showing the accuracy of reduction of the bacterial species in the bacterial mock sample of example 1;
FIG. 4 is a schematic representation of unique reads for the fungal simulant sample of example 1;
FIG. 5 is a diagram showing the accuracy of strain reduction of a fungal simulant sample in example 1;
FIG. 6 is a schematic representation of unique reads for the parasite mock sample of example 1;
FIG. 7 is a graph showing the accuracy of the reduction of the species in the parasite analog sample of example 1;
FIG. 8 is a schematic view of unique reads of a minimal performance-determining bacterial mock sample of example 1;
FIG. 9 is a graph showing the recovery accuracy of the strains in the simulated samples of the lowest-performing bacteria in example 1;
FIG. 10 is a graph of the relationship between the Q20 proportion and the accuracy for the measured samples and the simulated samples in example 3.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The programs referred to in the following embodiments may be run in a Windows or Linux environment if they are written in R: running in Windows requires the R IDE "Rgui" or "RStudio", and running in Linux requires R to be installed. Programs written in Perl need to be run in a Linux environment.
In the following embodiments, where simulated data are used for the results, Mason is selected as the software for generating the simulated data. It is used as follows:
mason illumina GCF_001729245.1_ASM172924v1_genomic.fna.fa -aNg -N 10000 -sq -i -f -n 75 -pmm 0.004 -pmmb 0.002 -pmme 0.012 -sn 10000000 -o GCF_001729245.1_ASM172924v1_genomic.fna.fa.fq
The above command is a specific example of generating simulated data with Mason. The meaning of each part of the command is as follows:
1. mason: the software;
2. illumina: the sequencing platform;
3. GCF_001729245.1_ASM172924v1_genomic.fna.fa: the microbial reference genome sequence used to generate the simulated data;
4. -aNg: allows "N" to be present when generating the simulated data;
5. -N: the number of reads to generate (10000);
6. -sq: simulate sequencing quality values;
7. -i: include read description information in the simulated data;
8. -f: generate only forward sequences;
9. -n: the read length (75);
10. -pmm: the average sequencing error rate;
11. -pmmb: the sequencing error rate of the first base;
12. -pmme: the sequencing error rate of the last base;
13. -o: the output simulated FASTQ file.
Example 1
A metagenome sequencing quality control prediction evaluation method comprises the following steps:
I. Q20 threshold system evaluation flow.
A Q20 threshold model is constructed, the sequencing parameters of a preset sequencing process are obtained and input into the Q20 threshold model for solving, and the relationship between the Q20 proportion and the accuracy is obtained.
The specific process is as follows:
1. Obtaining parameters
1) The model calculation function is loaded into the R environment.
source("Q20.r")
2) The parameters are set according to the preset sequencing process (the value of each parameter is set according to actual requirements), and the loaded function is run to obtain the result.
In this embodiment, parameter 1: the sequencing data volume is set to 20M (a = 20).
Parameter 2: the sequencing fragment length is set to 50 bp (b = 50).
Parameter 3: the sequencing platform is Illumina.
2. Building models
1) Converting the sequencing data volume (20M reads) and the sequencing fragment length (50 bp) into a number of bases, namely 20M × 50 bp;
2) counting the sequencing quality values of the predetermined sequencing platform and their proportional distribution;
3) dividing the number of bases into a correct part and a wrong part according to the error rate converted from the sequencing quality values;
4) representing the correct part as a set of characters "A" and the wrong part as a set of characters "B", and constructing a sampling pool;
5) randomly sampling from the sampling pool to construct a test sequence set, wherein the test sequence set consists of a preset number of strain sequences, and the length of each strain sequence is the sequencing fragment length; in this embodiment, the number of strain sequences is calculated as 1% × 20M = 200,000, each 50 bp long;
6) counting the number of characters "A" contained in each sequence of the test sequence set constructed by random sampling, namely the number of bases without mismatch;
7) when the number of characters "A" in a strain sequence of the test sequence set is greater than or equal to a preset value, judging that the sequencing errors in that strain sequence have no influence on the result; defining the proportion of such sequences among the constructed strain sequences as the accuracy; in this embodiment, the preset value is 48, which is set according to the alignment parameters of the specific process: the sequencing fragment length is 50 bp and the tolerable proportion of mismatches is 4%, so the value is set to 48. Likewise, if the sequencing fragment length is 75 bp, the value can be set to 72; in general it is set according to the tolerable number of mismatches (e.g. 4%);
8) setting the Q20 proportion gradient to 100%, 95%, 90%, 85%, 80%, 75% and 70%, and calculating the corresponding accuracy.
The model function Q20(a, b) is built as described above, and the loaded function is run.
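The contents of Q20.r are not shown here, so the following Python sketch only illustrates the sampling idea of steps 5) to 8) and is not the Q20(a, b) function itself. It rests on a hypothetical two-bin quality model (bases at or above Q20 behave like Q30, bases below Q20 behave like Q10), draws each base independently because the pool of 20M × 50 bases is far larger than any one read, and uses fewer reads than the 200,000 of this embodiment to keep the run short.

```python
import random


def simulate_accuracy(q20_fraction: float, read_len: int = 50,
                      min_correct: int = 48, n_reads: int = 5_000,
                      err_high_q: float = 0.001, err_low_q: float = 0.1,
                      seed: int = 0) -> float:
    """Fraction of simulated reads with at least min_correct 'A' characters.

    Assumed two-bin quality model (hypothetical, not a platform profile):
    bases at or above Q20 have error rate err_high_q (Q30) and bases below
    Q20 have error rate err_low_q (Q10).
    """
    p_wrong = q20_fraction * err_high_q + (1 - q20_fraction) * err_low_q
    rng = random.Random(seed)
    unaffected = 0
    for _ in range(n_reads):
        # Each base is one draw from a very large pool: 'A' (correct) or 'B'.
        n_correct = sum(1 for _base in range(read_len)
                        if rng.random() >= p_wrong)
        if n_correct >= min_correct:
            unaffected += 1
    return unaffected / n_reads


# Q20 proportion gradient from step 8).
gradient = [1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70]
accuracy = {q: simulate_accuracy(q) for q in gradient}
```

Replacing the two-bin model with the measured quality distribution of the platform in use would change the numbers but not the structure of the calculation.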
3. Model solution
The model function is evaluated and returns the correspondence between the Q20 proportion and the accuracy; this is a result at the level of theoretical calculation and is shown in FIG. 1. The accuracy increases steadily as the Q20 proportion increases; when the Q20 proportion reaches 85%, the accuracy exceeds 95%, which meets the acceptable threshold.
II. Sequencing data volume threshold system evaluation flow.
Data parameters of a preset sequencing process are obtained, a sequencing data volume threshold model is constructed, and the data parameters are input into the model for solving to obtain the relationship between the sequencing data volume and detection of the strain unique region (namely a sequence specific to the strain).
The specific process is as follows:
1. Obtaining parameters
1) Input amount: 30 ng.
2) Adapter ligation efficiency: 50%.
3) Average fragmentation length: 200 bp.
4) Human-derived proportion: 99%.
5) Proportion of the target strain among the strains: 5%.
6) Strain genome size: 2M.
7) Strain unique region: 1K.
8) Dup rate of sequencing data: 5%.
9) Human DNA molecular weight: 3.3 pg.
It will be appreciated that the above parameters are obtained based on the sample conditions and other conditions of the intended sequencing process, and may be adjusted according to particular needs in different evaluation environments.
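As a plausibility check on how these parameters combine, the sketch below estimates the number of human-derived library fragments. It assumes a human genome size of 3 × 10^9 bp, which is not among the listed parameters, and treats 3.3 pg as the mass of one human genome copy; the function name is hypothetical. Under those assumptions the arithmetic gives 6.75 × 10^10, but this is one possible route to such a figure and not necessarily the calculation used in Table 1.

```python
def human_library_reads(input_ng: float = 30.0, human_fraction: float = 0.99,
                        genome_mass_pg: float = 3.3, genome_bp: float = 3e9,
                        fragment_bp: float = 200.0,
                        ligation_efficiency: float = 0.5) -> float:
    """Estimate the number of human-derived library fragments.

    genome_bp (3e9) is an assumed human genome size; it is not listed among
    the parameters above.
    """
    human_pg = input_ng * 1000.0 * human_fraction      # ng -> pg of human DNA
    genome_copies = human_pg / genome_mass_pg          # genome equivalents
    fragments_per_genome = genome_bp / fragment_bp     # fragments per copy
    return genome_copies * fragments_per_genome * ligation_efficiency


human_reads = human_library_reads()
```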
2. Building models
1) Calculating the read composition of the library from the data parameters to obtain the number of human-derived reads (6.75 × 10^10), the number of target strain unique reads (1.7 × 10^5) and the number of non-target strain unique reads (6.8 × 10^8). These values are calculated as follows:
A. Parameter settings: input = input amount; species_size = strain genome size; adapter_rate = adapter ligation efficiency; reads_length = average fragmentation length; unique_size = strain unique region; target_rate = proportion of the target strain among the strains.
B. Calculation:
Table 1. Parameter calculation
2) Representing the human-derived reads as a set of characters "O", the non-target strain unique reads as a set of characters "P" and the target strain unique reads as a set of characters "Q", and constructing a sampling pool according to the respective calculated numbers of reads;
3) setting the target sequencing data volume as a gradient, such as 5M, 6M, 7M, …, 25M; the range of the sequencing data volume and the gradient interval may be adjusted according to the situation to be evaluated;
4) randomly sampling from the constructed sampling pool according to the set target sequencing data volume, and determining how the probability that the number of extracted characters "Q" is greater than or equal to 3 depends on the sequencing data volume, namely the relationship between the sequencing data volume and detection of the strain unique region.
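Because the pool is several orders of magnitude larger than any of the target data volumes, the random sampling in step 4) can be approximated analytically by a binomial tail. The sketch below uses the three read counts quoted in step 1) and the 5M to 25M gradient; the binomial approximation, the neglect of the dup rate and the function name are assumptions of this illustration.

```python
import math


def prob_at_least(n_draws: int, p: float, k: int = 3) -> float:
    """P(X >= k) for X ~ Binomial(n_draws, p), summed in log space."""
    below = 0.0
    for i in range(k):
        log_term = (math.log(math.comb(n_draws, i)) + i * math.log(p)
                    + (n_draws - i) * math.log1p(-p))
        below += math.exp(log_term)
    return 1.0 - below


# Read counts quoted in step 1): human, target unique, non-target unique.
human, target_unique, non_target = 6.75e10, 1.7e5, 6.8e8
p_target = target_unique / (human + target_unique + non_target)

# Target data volumes 5M, 6M, ..., 25M reads (dup rate not modelled here).
curve = {m: prob_at_least(m * 1_000_000, p_target) for m in range(5, 26)}
```

Under these assumptions every point of the gradient gives a probability above 99.9%, the lowest being at 5M reads.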
3. Model solution
Through the above analysis procedure, the results shown in the following table were obtained.
TABLE 2. Theoretical probability that the number of target strain unique reads detected is greater than or equal to 3
Figure BDA0002214853440000091
As can be seen from the above results, the theoretical probability of detecting at least 3 target strain unique reads is 100%.
Third, sequencing fragment length threshold system evaluation.
Strain mutation parameters in a preset sequencing process are acquired, a sequencing fragment length threshold model is constructed, and the strain mutation parameters are input into the sequencing fragment length threshold model for solving, to obtain the relationship between the sequencing fragment length and the strain reduction accuracy.
The sequencing fragment length threshold system evaluation flow comprises a sequencing length comparison evaluation flow and a lowest sequencing length evaluation flow.
The specific process of sequencing length comparison and evaluation is as follows:
1) obtaining the mutation rate of the target sequencing strain, and setting a gradient range (such as 0%, 1%, 2%, 3%) according to the mutation rate;
2) setting the sequencing lengths to be evaluated as X (75 bp) and Y (50 bp), and using masson (software for generating simulation data) to generate a simulation sample with a length of 75 bp for each mutation rate;
3) extracting the first 50 bp of each 75 bp simulation sample to generate the corresponding 50 bp simulation sample;
4) analyzing the generated 75 bp and 50 bp simulation samples from scratch according to the predetermined sequencing procedure;
5) counting the analysis results, comparing the 75 bp and 50 bp simulation samples in terms of unique reads number and strain reduction accuracy, and evaluating to obtain a comparative evaluation result.
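Step 3) of this flow, taking the first Y bp of every X bp read, can be sketched in Python as follows. This is a minimal illustration under the assumption that the simulation samples are stored as FASTQ text with four lines per record; the source does not state the file format, and the function name is hypothetical.

```python
def truncate_fastq_records(lines, length):
    """Yield FASTQ lines with bases and qualities cut to the first `length` positions.

    `lines` is any iterable of FASTQ text lines (assumed 4 lines per record).
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record hold the bases and their quality string.
        if index % 4 in (1, 3):
            line = line[:length]
        yield line

# A made-up 75 bp record, truncated to 50 bp.
example = ["@read1", "ACGTA" * 15, "+", "I" * 75]
short = list(truncate_fastq_records(example, 50))
print(len(short[1]), len(short[3]))  # 50 50
```

Deriving the shorter sample from the longer one means both lengths are compared on the same underlying reads, so any difference in unique reads number or accuracy is attributable to length alone.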
Through the above analysis process, the results shown in FIGS. 2 to 7 were obtained.
FIG. 2 is a schematic diagram of the unique reads number of the bacterial simulation samples; it can be seen that the unique reads number at 75 bp is higher than at 50 bp.
However, when the mutation rate is 3%, the unique reads number at 75 bp is lower than at 50 bp. This is caused by parameters such as mapping rate and mismatch rate, which define a unique alignment in the detection procedure; the same applies to the fungal and parasite simulation samples below.
FIG. 3 is a schematic diagram of the strain reduction accuracy of the bacterial simulation samples; it can be seen that the strain reduction accuracy at 75 bp is higher than at 50 bp.
FIG. 4 is a schematic diagram of the unique reads number of the fungal simulation samples; it can be seen that the unique reads number at 75 bp is higher than at 50 bp.
FIG. 5 is a schematic diagram of the strain reduction accuracy of the fungal simulation samples; it can be seen that the strain reduction accuracy at 75 bp is higher than at 50 bp.
FIG. 6 is a schematic diagram of the unique reads number of the parasite simulation samples; it can be seen that the unique reads number at 75 bp is higher than at 50 bp.
FIG. 7 is a schematic diagram of the strain reduction accuracy of the parasite simulation samples; it can be seen that the strain reduction accuracy at 75 bp is higher than at 50 bp.
Taking the above results together, a sequencing length of 75 bp performs better than 50 bp.
The specific process of the minimum sequencing length evaluation is as follows:
1) evaluating the mutation rate of the target sequencing strain, and setting a gradient range (0%, 1%, 2%, 3%) according to the mutation rate;
2) using masson (software for generating simulation data) to generate five 50 bp simulation samples for each mutation rate (20 samples in total);
3) extracting data at different length gradients from each 50 bp sample (for example, taking the first 5 bp, 10 bp, 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp and 45 bp of the 50 bp data to generate another nine samples, so that the total number of samples is 4 × 5 × 10 = 200);
4) batch-analyzing the 200 generated simulation samples from scratch according to the predetermined sequencing procedure, counting the analysis results, comparing simulation samples of different lengths in terms of unique reads number and strain reduction accuracy, and evaluating to obtain the lowest sequencing length.
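The sample design in steps 1) to 3) can be enumerated with a short Python sketch. It only illustrates how the 4 × 5 × 10 = 200 samples arise and how the length gradient is cut from one read; the variable and function names are hypothetical, and the reads themselves would come from the simulation software.

```python
from itertools import product

mutation_rates = [0.00, 0.01, 0.02, 0.03]   # gradient from step 1)
replicates = range(1, 6)                    # five samples per mutation rate
lengths = list(range(5, 51, 5))             # 5, 10, ..., 45 plus the original 50

# Every (mutation rate, replicate, length) combination is one simulation sample.
design = list(product(mutation_rates, replicates, lengths))
print(len(design))  # 200

def prefixes(read, step=5):
    """Return {length: prefix} for every multiple of `step` up to len(read)."""
    return {n: read[:n] for n in range(step, len(read) + 1, step)}

cuts = prefixes("ACGTA" * 10)   # a made-up 50 bp read
print(sorted(cuts))
```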
Through the above analysis process, the results shown in FIGS. 8 to 9 were obtained.
FIG. 8 is a schematic diagram of the unique reads number of the bacterial simulation samples; it can be seen that the unique reads number tends to decrease as the fragment length decreases, and the decrease is particularly pronounced from 35 bp to 30 bp.
FIG. 9 is a schematic diagram of the strain reduction accuracy of the bacterial simulation samples; it can be seen that the strain reduction accuracy varies within a range of 0.1%.
Considering the unique reads number and the strain reduction accuracy together with the lowest performance acceptable to the project, the lowest sequencing length is judged to be 35 bp.
Example 2
A metagenome sequencing quality control prediction evaluation model comprises:
a data input module: used for obtaining the sequencing parameters, data parameters and strain mutation parameters in a preset sequencing process;
a model calculation module: used for performing evaluation analysis according to the metagenome sequencing quality control prediction evaluation method of Example 1;
a result output module: used for outputting the evaluation analysis result of the model calculation module.
Example 3
Verification testing, analysis and comparison.
First, Q20 threshold system comparison.
Streptococcus agalactiae was diluted with primary water to 6 different concentrations: 10 cfu/ml, 10^2 cfu/ml, 10^3 cfu/ml, 10^4 cfu/ml, 10^5 cfu/ml and 10^6 cfu/ml, giving 6 samples.
NGS sequencing and data analysis were performed following the predetermined sequencing procedure in example 1, and the relationship between the Q20 ratio and "correct ratio" was calculated for 6 samples and compared to theoretical data, with the results shown in fig. 10.
It can be seen from the figure that the theoretical values and the actual performance obtained from the above analysis substantially match, taking into account that there is a certain proportion of mutations in the actual data.
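The theoretical values referred to here come from the Q20 threshold model, in which correct bases (characters "A") and erroneous bases (characters "B") form a sampling pool from which reads are drawn. The Python sketch below illustrates that idea under loudly stated assumptions that are not in the source: bases at or above Q20 are all given quality 30, the remaining bases quality 10, each base is drawn independently, and a read counts as having no influence on the result when at least 73 of its 75 bases are correct. All names and default values are hypothetical.

```python
import random

def phred_to_error(q):
    """Convert a Phred quality value to a per-base error probability."""
    return 10 ** (-q / 10)

def simulate_accuracy(q20_fraction, read_length=75, n_reads=5_000,
                      min_correct=73, high_q=30, low_q=10, seed=0):
    """Estimate the share of reads with at least `min_correct` correct bases.

    Assumptions (illustrative only): a two-level quality distribution and
    independent draws from a pool much larger than the sample.
    """
    rng = random.Random(seed)
    # Overall probability that a drawn character is "B" (an erroneous base).
    error_rate = (q20_fraction * phred_to_error(high_q)
                  + (1 - q20_fraction) * phred_to_error(low_q))
    unaffected = 0
    for _ in range(n_reads):
        correct = sum(rng.random() >= error_rate for _ in range(read_length))
        if correct >= min_correct:
            unaffected += 1
    return unaffected / n_reads

for fraction in (1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70):
    print(f"Q20 = {fraction:.0%}: accuracy = {simulate_accuracy(fraction):.3f}")
```

With a real platform, the two-level assumption would be replaced by the measured distribution of quality values, and the number of simulated reads would follow the chosen proportion of the sequencing data amount.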
Second, sequencing data amount threshold system comparison.
Continuing with the comparison using Streptococcus agalactiae set forth above, the main parameters of this group are shown in the following table.
TABLE 3 actual Streptococcus agalactiae data parameters
Figure BDA0002214853440000111
NGS sequencing and data analysis were performed according to the predetermined sequencing procedure in example 1, with the results shown in the table below.
TABLE 4 actual Streptococcus agalactiae assay results
Sample name | Species name | Chinese name | Number of reads
L1-79D | Streptococcus_agalactiae | Streptococcus agalactiae | 43
L2-80D | Streptococcus_agalactiae | Streptococcus agalactiae | 472
L3-81D | Streptococcus_agalactiae | Streptococcus agalactiae | 5644
L4-82D | Streptococcus_agalactiae | Streptococcus agalactiae | 12677
L5-83D | Streptococcus_agalactiae | Streptococcus agalactiae | 14038
L6-84D | Streptococcus_agalactiae | Streptococcus agalactiae | 300014
The above results indicate that the actual detection results are consistent with the predicted results of the theoretical model.
Third, sequencing fragment length threshold system comparison.
Continuing the comparison with the Streptococcus agalactiae samples described above, and based on the actual data (the same data as used for the Q20 threshold system), the 75 bp data were cut into the first 50 bp (positions 1-50), the middle 50 bp (positions 11-60) and the last 50 bp (positions 26-75); the performance of 75 bp and 50 bp was then compared in terms of unique reads number and strain reduction accuracy, and the results are shown in the following table.
TABLE 5 actual Streptococcus agalactiae test results
Figure BDA0002214853440000112
From the results of the actual data, the performance of 75bp is better than that of 50bp, and the actual detection result is consistent with the prediction result of a theoretical model.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A metagenome sequencing quality control prediction evaluation method is characterized by comprising the following steps:
Q20 threshold system evaluation flow: obtaining sequencing parameters in a preset sequencing process, constructing a Q20 threshold model, inputting the sequencing parameters into the Q20 threshold model for solving, and obtaining the relationship between the Q20 proportion and the accuracy;
sequencing data volume threshold system evaluation flow: acquiring data parameters in a preset sequencing process, constructing a sequencing data quantity threshold model, inputting the data parameters into the sequencing data quantity threshold model for solving to obtain the relation between the sequencing data quantity and a detected strain unique area;
sequencing fragment length threshold system evaluation flow: acquiring strain mutation parameters in a preset sequencing process, constructing a sequencing fragment length threshold model, inputting the strain mutation parameters into the sequencing fragment length threshold model for solving, and obtaining the relationship between the sequencing fragment length and the strain reduction accuracy.
2. The metagenomic sequencing quality control prediction assessment method according to claim 1, wherein in the Q20 threshold system assessment procedure:
the sequencing parameters include: sequencing platform type, sequencing data amount and sequencing fragment length;
the Q20 threshold model calculation method is as follows:
1) converting the sequencing data quantity and the sequencing fragment length into base number;
2) counting the sequencing quality value and proportional distribution of a predetermined sequencing platform;
3) converting the base number into a correct part and an incorrect part according to the error rate converted by the sequencing quality value;
4) configuring the correct part into a set formed by characters 'A' and configuring the wrong part into a set formed by characters 'B', and constructing a sampling pool;
5) randomly sampling in a sampling pool to construct a test sequence set, wherein the test sequence set consists of preset strain sequences, and the length of each strain sequence is the length of the sequencing fragment;
6) counting the number of characters A contained in a test sequence set constructed by random sampling;
7) when the number of characters A in one strain sequence in the constructed test sequence set is more than or equal to a preset value, judging that the strain sequence has no influence on the result; defining the proportion of sequences which have no influence on the result in the constructed predetermined strain sequences as the accuracy;
8) setting a Q20 proportional gradient and calculating the corresponding accuracy.
3. The metagenome sequencing quality control prediction evaluation method according to claim 2, wherein the number of strain sequences is 1% of the sequencing data amount; the Q20 proportional gradient was set at 100%, 95%, 90%, 85%, 80%, 75%, 70%.
4. The metagenomic sequencing quality control prediction assessment method according to claim 1, wherein in the sequencing data volume threshold system assessment process: the data parameters include: input amount, adapter ligation efficiency, average fragment length, human-derived proportion, proportion of target bacteria among the strains, strain genome size, strain unique region length, dup rate of sequencing data and human DNA molecular weight;
the sequencing data amount threshold model calculation method comprises the following steps:
1) calculating the reads composition of the library according to the data parameters to obtain the number of human reads, the number of target strain unique reads and the number of non-target strain unique reads;
2) configuring the human reads as a set of characters "O", the non-target strain unique reads as a set of characters "P", and the target strain unique reads as a set of characters "Q", and constructing a sampling pool according to the respective calculated read counts;
3) setting a target sequencing data volume;
4) randomly sampling from the constructed sampling pool according to the set target sequencing data volume, and counting the correspondence between the sequencing data volume and the probability that the number of extracted characters "Q" is greater than or equal to 3, that is, the relationship between the sequencing data volume and detection of the strain unique region.
5. The metagenome sequencing quality control prediction evaluation method according to claim 4, wherein the target sequencing data volume is a plurality of preset gradient data volumes, and a relation curve between the data volume and the probability that the number of extracted characters Q is more than or equal to 3 is obtained through the sequencing data volume threshold model calculation.
6. The metagenomic sequencing quality control prediction evaluation method of claim 1, wherein the sequencing fragment length threshold system evaluation procedure comprises a sequencing length comparison evaluation procedure and/or a minimum sequencing length evaluation procedure;
the sequencing length comparison evaluation flow comprises the following steps:
1) obtaining the mutation rate of a target sequencing strain, and setting a gradient range according to the mutation rate;
2) setting the sequencing length to be evaluated as X and Y, wherein X is more than Y, and generating a simulation sample with the length of X for each mutation rate;
3) extracting the first Y bp of the simulation sample with the length of X to generate a corresponding simulation sample with the length of Y;
4) analyzing the generated X and Y simulation samples, counting analysis results, comparing the differences of the X and Y simulation samples in terms of unique reads and strain reduction accuracy, and evaluating to obtain a comparison evaluation result;
the minimum sequencing length evaluation flow is as follows:
1) evaluating the mutation rate of the target sequencing strain, and setting a gradient range according to the mutation rate;
2) generating a plurality of simulation samples with the length of N for each mutation rate;
3) on the basis of each simulation sample with the length of N, intercepting gradient data with the interval of M to generate a plurality of sub-simulation sample Nm sets; n is more than M;
4) analyzing all the generated N simulation samples and Nm sub-simulation samples, counting the analysis results, comparing simulation samples of different lengths in terms of unique reads number and strain reduction accuracy, and evaluating to obtain the lowest sequencing length.
7. The metagenome sequencing quality control prediction evaluation method according to claim 6, wherein in step 2) of the minimum sequencing length evaluation procedure, 5 simulation samples with a length of N are generated for each mutation rate; and M is selected from 4, 5 and 6.
8. A metagenome sequencing quality control prediction evaluation model, characterized by comprising:
a data input module: used for obtaining the sequencing parameters, data parameters and strain mutation parameters in a preset sequencing process;
a model calculation module: used for performing evaluation analysis according to the metagenome sequencing quality control prediction evaluation method of any one of claims 1 to 7;
a result output module: used for outputting the evaluation analysis result of the model calculation module.
9. The metagenomic sequencing quality control prediction evaluation model of claim 8, wherein the model computation module comprises: a Q20 threshold system evaluation module, a sequencing data quantity threshold system evaluation module and a sequencing fragment length threshold system evaluation module;
the Q20 threshold system evaluation module solves according to the following method:
1) converting the sequencing data quantity and the sequencing fragment length into base number;
2) counting the sequencing quality value and proportional distribution of a predetermined sequencing platform;
3) converting the base number into a correct part and an incorrect part according to the error rate converted by the sequencing quality value;
4) configuring the correct part into a set formed by characters 'A' and configuring the wrong part into a set formed by characters 'B', and constructing a sampling pool;
5) randomly sampling in a sampling pool to construct a test sequence set, wherein the test sequence set consists of preset strain sequences, and the length of each strain sequence is the length of the sequencing fragment;
6) counting the number of characters 'A' contained in a test sequence set constructed by random sampling;
7) when the number of characters A in one strain sequence in the constructed test sequence set is more than or equal to a preset value, judging that the strain sequence has no influence on the result; defining the proportion of sequences which have no influence on the result in the constructed predetermined strain sequences as the accuracy;
8) setting a Q20 proportional gradient, and calculating the corresponding accuracy;
the sequencing data quantity threshold system evaluation module solves according to the following method:
1) calculating the reads composition of the library according to the data parameters to obtain the number of human reads, the number of target strain unique reads and the number of non-target strain unique reads;
2) configuring the human reads as a set of characters "O", the non-target strain unique reads as a set of characters "P", and the target strain unique reads as a set of characters "Q", and constructing a sampling pool according to the respective calculated read counts;
3) setting a target sequencing data volume;
4) randomly sampling from the constructed sampling pool according to the set target sequencing data volume, and counting the correspondence between the sequencing data volume and the probability that the number of extracted characters "Q" is greater than or equal to 3, that is, the relationship between the sequencing data volume and detection of the strain unique region;
the sequencing fragment length threshold system evaluation module comprises a sequencing length comparison evaluation module and/or a lowest sequencing length evaluation module;
the sequencing length comparison evaluation module solves according to the following method:
1) obtaining the mutation rate of a target sequencing strain, and setting a gradient range according to the mutation rate;
2) setting the sequencing length to be evaluated as X and Y, wherein X is more than Y, and generating a simulation sample with the length of X for each mutation rate;
3) extracting the first Y bp of the simulation sample with the length of X to generate a corresponding simulation sample with the length of Y;
4) analyzing the generated X and Y simulation samples, counting analysis results, comparing the differences of the X and Y simulation samples in terms of unique reads and strain reduction accuracy, and evaluating to obtain a comparison evaluation result;
the minimum sequencing length evaluation module solves according to the following method:
1) evaluating the mutation rate of the target sequencing strain, and setting a gradient range according to the mutation rate;
2) generating a plurality of simulation samples with the length of N for each mutation rate;
3) on the basis of each simulation sample with the length of N, intercepting gradient data with the interval of M to generate a plurality of sub-simulation sample Nm sets; n is more than M;
4) analyzing all the generated N simulation samples and Nm sub-simulation samples from scratch, counting the analysis results, comparing simulation samples of different lengths in terms of unique reads number and strain reduction accuracy, and evaluating to obtain the lowest sequencing length.
10. Use of the metagenome sequencing quality control prediction evaluation method of any one of claims 1-7 for evaluating preset quality control standards in metagenome detection.
CN201910911574.7A 2019-09-25 2019-09-25 Metagenome sequencing quality control prediction evaluation method and model Active CN110648720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910911574.7A CN110648720B (en) 2019-09-25 2019-09-25 Metagenome sequencing quality control prediction evaluation method and model

Publications (2)

Publication Number Publication Date
CN110648720A true CN110648720A (en) 2020-01-03
CN110648720B CN110648720B (en) 2020-06-19

Family

ID=68992140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910911574.7A Active CN110648720B (en) 2019-09-25 2019-09-25 Metagenome sequencing quality control prediction evaluation method and model

Country Status (1)

Country Link
CN (1) CN110648720B (en)

Also Published As

Publication number Publication date
CN110648720B (en) 2020-06-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201015

Address after: 510130 No. 301, building G10, South China new material innovation park, self compiled building 3, No. 31, Kefeng Road, Guangzhou high tech Industrial Development Zone, Guangdong Province

Patentee after: Guangzhou Weiyuan Medical Equipment Co.,Ltd.

Patentee after: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.

Patentee after: Guangzhou Weiyuan medical laboratory Co.,Ltd.

Patentee after: Shenzhen Weiyuan Medical Technology Co.,Ltd.

Patentee after: Weiyuan (Shenzhen) Medical Research Center Co.,Ltd.

Address before: 510130 Three South China New Materials Innovation Park G10 Building 303, No. 31 Kefeng Road, Guangzhou High-tech Industrial Development Zone, Guangdong Province

Patentee before: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Prediction and evaluation method and model of quality control of metagenome sequencing

Effective date of registration: 20220424

Granted publication date: 20200619

Pledgee: Bank of China Limited Guangzhou Development Zone Branch

Pledgor: Shenzhen Weiyuan Medical Technology Co.,Ltd.|Guangzhou Weiyuan medical laboratory Co.,Ltd.|Weiyuan (Shenzhen) Medical Research Center Co.,Ltd.|GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.|Guangzhou Weiyuan Medical Equipment Co.,Ltd.

Registration number: Y2022980004742

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230509

Granted publication date: 20200619

Pledgee: Bank of China Limited Guangzhou Development Zone Branch

Pledgor: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.|Guangzhou Weiyuan Medical Equipment Co.,Ltd.|Guangzhou Weiyuan medical laboratory Co.,Ltd.|Shenzhen Weiyuan Medical Technology Co.,Ltd.|Weiyuan (Shenzhen) Medical Research Center Co.,Ltd.

Registration number: Y2022980004742

PC01 Cancellation of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Macrogenomic sequencing quality control prediction evaluation methods and models

Effective date of registration: 20230510

Granted publication date: 20200619

Pledgee: Bank of China Limited Guangzhou Development Zone Branch

Pledgor: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.|Guangzhou Weiyuan Medical Equipment Co.,Ltd.|Guangzhou Weiyuan medical laboratory Co.,Ltd.|Shenzhen Weiyuan Medical Technology Co.,Ltd.|Weiyuan (Shenzhen) Medical Research Center Co.,Ltd.

Registration number: Y2023980040254

PE01 Entry into force of the registration of the contract for pledge of patent right
TR01 Transfer of patent right

Effective date of registration: 20230831

Address after: Room 301, G10, South China new material innovation park, building 3, No. 31, Kefeng Road, Guangzhou hi tech Industrial Development Zone, Guangdong 510130

Patentee after: Guangzhou Weiyuan Medical Equipment Co.,Ltd.

Patentee after: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.

Patentee after: Guangzhou Weiyuan medical laboratory Co.,Ltd.

Patentee after: Shenzhen Weiyuan Medical Technology Co.,Ltd.

Address before: Room 301, G10, South China new material innovation park, building 3, No. 31, Kefeng Road, Guangzhou hi tech Industrial Development Zone, Guangdong 510130

Patentee before: Guangzhou Weiyuan Medical Equipment Co.,Ltd.

Patentee before: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.

Patentee before: Guangzhou Weiyuan medical laboratory Co.,Ltd.

Patentee before: Shenzhen Weiyuan Medical Technology Co.,Ltd.

Patentee before: Weiyuan (Shenzhen) Medical Research Center Co.,Ltd.

PC01 Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 20200619

Pledgee: Bank of China Limited Guangzhou Development Zone Branch

Pledgor: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.|Guangzhou Weiyuan Medical Equipment Co.,Ltd.|Guangzhou Weiyuan medical laboratory Co.,Ltd.|Shenzhen Weiyuan Medical Technology Co.,Ltd.|Weiyuan (Shenzhen) Medical Research Center Co.,Ltd.

Registration number: Y2023980040254

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and model for quality control prediction and evaluation of metagenomic sequencing

Granted publication date: 20200619

Pledgee: Bank of China Limited Guangzhou Development Zone Branch

Pledgor: GUANGZHOU VISION GENE TECHNOLOGY Co.,Ltd.|Guangzhou Weiyuan Medical Equipment Co.,Ltd.|Guangzhou Weiyuan medical laboratory Co.,Ltd.|Shenzhen Weiyuan Medical Technology Co.,Ltd.|Weiyuan (Shenzhen) Medical Research Center Co.,Ltd.

Registration number: Y2024980019292
