CN115831224B - Method and device for predicting probiotics potential of microorganism


Info

Publication number
CN115831224B
Authority
CN
China
Legal status
Active
Application number
CN202211397170.9A
Other languages
Chinese (zh)
Other versions
CN115831224A (en)
Inventor
左永春
李海成
孙宇
郭树春
赵小庆
Current Assignee
Inner Mongolia University
Original Assignee
Inner Mongolia University
Priority date
2022-11-09
Filing date
2022-11-09
Publication date
2022-11-09 Application filed by Inner Mongolia University
2022-11-09 Priority to CN202211397170.9A
Publication of CN115831224A
Application granted
Publication of CN115831224B
Status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a method and a device for predicting the probiotic potential of microorganisms. The method comprises: determining a sample genome sequence corresponding to the microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism; determining, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence comprises a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence; and inputting the sub-fragment set and its abundance into a support vector machine model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence. The embodiments of the invention improve the efficiency of determining the probiotic potential of a microorganism while maintaining prediction accuracy.

Description

Method and device for predicting probiotics potential of microorganism
Technical Field
The invention relates to the technical field of gene sequencing, and in particular to a method and a device for predicting the probiotic potential of microorganisms.
Background
Genomic research based on high-throughput sequencing has recently become a major research focus. Bioinformatic methods for genome analysis are steadily maturing, which has greatly advanced genome research and has produced notable results in areas such as heredity and evolution, gene discovery, and research on human diseases.
Microorganisms are highly diverse in both type and number. Collecting microbial samples and mining their probiotic potential requires complex experimental verification, which involves long experimental cycles and high labor costs. A method for predicting the probiotic potential of microorganisms is therefore needed to improve the efficiency of determining that potential.
Disclosure of Invention
The invention aims to provide a method and a device for predicting the probiotic potential of microorganisms that improve the efficiency of determining that potential while maintaining prediction accuracy.
To achieve the above object, in a first aspect, the present invention provides a method for predicting the probiotic potential of a microorganism, comprising:
determining a sample genome sequence corresponding to the microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism;
determining, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence comprises a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence;
and inputting the sub-fragment set and its abundance into a support vector machine model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence.
Optionally, before the sub-fragment set and its abundance are input into the support vector machine model, the method further includes:
training an initial support vector machine model to obtain the support vector machine model, wherein the initial support vector machine model is a model that has not yet been trained.
Optionally, training the initial support vector machine model includes:
determining a historical genome training sequence;
and training the initial support vector machine model based on the historical genome training sequence.
Optionally, determining the historical genome training sequence includes:
acquiring a historical genome sequence;
performing k-mer calculation on the historical genome sequence to obtain a historical sub-fragment set and the abundance of the historical sub-fragment set;
and normalizing the historical sub-fragment set and its abundance, and performing feature screening to obtain the historical genome training sequence.
Optionally, determining the sample genome sequence corresponding to the microorganism comprises:
performing high-throughput sequencing on genomic DNA from a sample of the microorganism to obtain a sequencing result;
and aligning the sequencing result with a reference genome to obtain the sample genome sequence.
In a second aspect, the present invention provides an apparatus for predicting the probiotic potential of a microorganism, comprising:
a sample genome sequence determining module, configured to determine a sample genome sequence corresponding to the microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism;
a sub-fragment extraction module, configured to determine, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence comprises a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence;
and a prediction module, configured to input the sub-fragment set and its abundance into a support vector machine model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence.
Optionally, the apparatus further comprises:
a training module, configured to train an initial support vector machine model to obtain the support vector machine model, wherein the initial support vector machine model is a model that has not yet been trained.
Optionally, the training module trains the initial support vector machine model by:
determining a historical genome training sequence;
and training the initial support vector machine model based on the historical genome training sequence.
Optionally, the training module determines the historical genome training sequence by:
acquiring a historical genome sequence;
performing k-mer calculation on the historical genome sequence to obtain a historical sub-fragment set and the abundance of the historical sub-fragment set;
and normalizing the historical sub-fragment set and its abundance, and performing feature screening to obtain the historical genome training sequence.
Optionally, the sample genome sequence determining module determines the sample genome sequence corresponding to the microorganism by:
performing high-throughput sequencing on genomic DNA from a sample of the microorganism to obtain a sequencing result; and aligning the sequencing result with a reference genome to obtain the sample genome sequence.
In summary, the invention provides a method and a device for predicting the probiotic potential of microorganisms. The method comprises: determining a sample genome sequence corresponding to the microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism; determining, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence comprises a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence; and inputting the sub-fragment set and its abundance into a support vector machine model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence. Because the sample genome sequence is selected on the basis of gene sequences, the method replaces manual screening of microorganisms and shortens the selection process; because the prediction is made by a support vector machine model, the efficiency of determining probiotic potential is improved; and because the model is trained in advance, prediction accuracy can also be improved. The embodiments of the invention therefore improve the efficiency of determining the probiotic potential of a microorganism while maintaining prediction accuracy.
Drawings
To illustrate the embodiments of the invention or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the steps of a method for predicting the probiotic potential of a microorganism provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating steps for training an initial SVM model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating steps for determining a historical genome training sequence according to an embodiment of the present invention;
FIG. 4 is a graph of AUROC values corresponding to an initial support vector machine model in an embodiment of the present invention;
FIG. 5 is an alternative block diagram of an apparatus for predicting the probiotic potential of a microorganism provided in an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
As described in the background, existing microorganism selection relies on screening techniques that are costly and slow, and determining the probiotic potential of a microorganism is complex and inefficient. The inventors found that different microorganisms can be effectively distinguished on the basis of their gene sequences. The embodiments of the invention therefore provide a method for predicting the probiotic potential of microorganisms: first, a sample genome sequence corresponding to the microorganism is determined, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism; next, based on the sample genome sequence, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set are determined by using a k-mer algorithm; finally, the sub-fragment set and its abundance are input into a support vector machine model to predict the probiotic potential of the microorganism, the support vector machine model being obtained by training an initial support vector machine model on a historical genome training sequence. In this way, specific microorganisms can be screened from a large microbial population through their gene sequences to obtain sample genome sequences, and the support vector machine model then predicts probiotic potential from the sub-fragment set and abundance derived from those sequences.
Selecting sample genome sequences on the basis of gene sequences replaces manual screening of microorganisms and shortens the selection process; making the prediction with a support vector machine model improves the efficiency of determining probiotic potential; and because the model is trained in advance, prediction accuracy can also be improved. The embodiments of the invention therefore improve the efficiency of determining the probiotic potential of a microorganism while maintaining prediction accuracy.
FIG. 1 is a schematic diagram of the steps of a method for predicting the probiotic potential of a microorganism provided in an embodiment of the present invention.
Referring to FIG. 1, the method for predicting the probiotic potential of microorganisms includes:
Step S12, determining a sample genome sequence corresponding to the microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism.
Optionally, determining the sample genome sequence corresponding to the microorganism comprises:
performing high-throughput sequencing on genomic DNA from a sample of the microorganism to obtain a sequencing result;
and aligning the sequencing result with a reference genome to obtain the sample genome sequence.
In an alternative embodiment, the microorganism is a probiotic, that is, an edible microorganism generally considered to benefit the host (e.g., an animal or a human) after ingestion. In this embodiment, the sample genome sequence of the probiotic may be stored in FASTA format.
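Because this embodiment stores the sample genome sequence in FASTA format, the sequence text has to be read before any k-mer processing. The following is a minimal, standard-library-only sketch of a FASTA reader; the function name `read_fasta` and the choice to key records by the first header token are illustrative assumptions, not part of the described method. In practice a maintained parser such as Biopython's `Bio.SeqIO` may be preferable.

```python
from typing import Dict, Iterable, List, Optional

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is the first whitespace-separated token after '>'; sequence
    lines are concatenated and upper-cased. Blank lines are ignored.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        else:
            if current_id is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)  # close the final record
    return records

# Toy input; real input would be the sample genome sequence file.
example = """>contig_1 hypothetical sample
ACGTAC
GTTA
>contig_2
ttgaca
"""
genome = read_fasta(example.splitlines())
# genome == {'contig_1': 'ACGTACGTTA', 'contig_2': 'TTGACA'}
```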
In other alternative implementations of the invention, the sequencing results may also be evaluated and summarized to compile statistics on the microorganisms and to organize them.
Step S14, determining, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence comprises a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence.
Illustratively, the k-mer algorithm extracts from the sample genome sequence a sub-fragment set whose sub-fragments each contain k bases, together with the abundance of the sub-fragment set. If the genome length is L and the k-mer length is set to k, the number of sub-fragments (overlapping windows) generated is L - k + 1.
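Step S14 can be illustrated with a plain overlapping k-mer counter. This sketch assumes a single contiguous sequence, a sliding window that advances one base at a time, and no special handling of ambiguous bases such as N. It shows why L - k + 1 is the number of sub-fragments (windows, including repeats), whereas the number of distinct k-mers is usually smaller. The function name `kmer_abundance` is hypothetical.

```python
from collections import Counter

def kmer_abundance(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer (sub-fragment of k bases) in `sequence`.

    A sequence of length L has L - k + 1 windows of width k, so the counts
    sum to L - k + 1 when L >= k (and the Counter is empty when L < k).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

genome = "ACGTACGTTA"              # toy sequence, L = 10
counts = kmer_abundance(genome, k=3)
print(sum(counts.values()))        # 8 windows = 10 - 3 + 1
print(len(counts))                 # 6 distinct 3-mers (ACG and CGT occur twice)
```

For a genome split into several contigs, windows would be counted per contig, so the total becomes the sum of (contig length - k + 1); this is a property of the sketch, not something the text specifies.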
The sub-fragment set may be a set of k-mer fragments, from which a model subsequently predicts the probiotic potential of the microorganism.
Step S16, inputting the sub-fragment set and its abundance into a support vector machine (SVM) model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence.
In a further alternative implementation of the present invention, with continued reference to FIG. 1, before the sub-fragment set and its abundance are input into the support vector machine model in step S16, the method further includes:
Step S15, training an initial support vector machine model to obtain the support vector machine model, wherein the initial support vector machine model is a model that has not yet been trained.
Optionally, with reference to FIG. 2, training the initial support vector machine model may include:
Step S151, determining a historical genome training sequence.
The historical genome training sequence is obtained from historical microbial genome sequences; the longer the sequences in the historical genome training sequence, the better the initial support vector machine model is trained.
Optionally, as shown in FIG. 3, determining the historical genome training sequence may include:
Step S1511, acquiring a historical genome sequence;
Step S1512, performing k-mer calculation on the historical genome sequence to obtain a historical sub-fragment set and the abundance of the historical sub-fragment set;
Step S1513, normalizing the historical sub-fragment set and its abundance, and performing feature screening to obtain the historical genome training sequence.
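The text does not specify which normalization or feature-screening method step S1513 uses. As one possible reading, the sketch below converts raw counts to relative abundances (so genomes of different lengths become comparable) and then keeps the k-mers whose relative abundance varies most across the historical genomes. Both choices, and the names `normalize`, `screen_features`, and `to_matrix`, are assumptions for illustration.

```python
from statistics import pvariance
from typing import Dict, List, Sequence

def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw k-mer counts into relative abundances that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / total for kmer, c in counts.items()}

def screen_features(profiles: Sequence[Dict[str, float]], top_n: int) -> List[str]:
    """Rank k-mers by variance of relative abundance across genomes; keep top_n.

    Variance ranking is an assumed placeholder criterion; ties are broken
    alphabetically so the result is deterministic.
    """
    vocabulary = sorted({kmer for p in profiles for kmer in p})
    scored = [(pvariance([p.get(kmer, 0.0) for p in profiles]), kmer) for kmer in vocabulary]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [kmer for _, kmer in scored[:top_n]]

def to_matrix(profiles: Sequence[Dict[str, float]], features: Sequence[str]) -> List[List[float]]:
    """Lay profiles out as rows with one column per selected k-mer (missing = 0.0)."""
    return [[p.get(f, 0.0) for f in features] for p in profiles]

# Toy historical genomes with made-up counts.
profiles = [normalize({"AAC": 3, "GTT": 1}), normalize({"AAC": 1, "GTT": 2, "TTA": 1})]
selected = screen_features(profiles, top_n=1)   # ['AAC']
X = to_matrix(profiles, selected)               # [[0.75], [0.25]]
```

Screening of this kind would normally be fitted on the training genomes only, with the selected k-mer list then reused unchanged for new samples, so that information from evaluation samples does not leak into the feature choice.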
Step S152, training the initial support vector machine model based on the historical genome training sequence.
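The text does not state the kernel, solver, or hyperparameters of the support vector machine. As a self-contained illustration only, the sketch below trains a linear SVM with Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss), a substitute technique chosen for brevity; a practical pipeline would more likely rely on an established implementation such as LIBSVM or scikit-learn's `sklearn.svm.SVC`. Labels are assumed to be +1 for probiotic and -1 for non-probiotic genomes, and the function names and toy data are hypothetical.

```python
import random
from typing import List, Sequence

def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    y must contain +1 / -1 labels. A constant 1.0 is appended to every sample so
    the last weight acts as a bias term (regularized along with the others).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)                           # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]        # regularizer shrinkage
            if margin < 1.0:                                # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def decision_function(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed score w . [x, 1]; positive means probiotic under this sketch."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if decision_function(w, x) >= 0.0 else -1

# Toy normalized k-mer profiles (two features) with made-up labels.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y_train = [1, 1, -1, -1]
w = train_linear_svm(X_train, y_train)
```

A linear model keeps the sketch short and its weights can be read per k-mer; if the real data are not linearly separable, a kernel SVM from a library would be the natural next step.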
In an alternative implementation of the present invention, with reference to FIG. 4, training may be terminated when the average AUROC value of the initial support vector machine model reaches 98.00%; the trained model is then used as the support vector machine model to predict the probiotic potential of microorganisms.
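The stopping rule refers to an average AUROC of 98.00% but does not say how the average is formed. Assuming it is taken over cross-validation folds, the sketch below computes AUROC in its Mann-Whitney form (the probability that a positive genome scores above a negative one, with ties counted as one half) and averages it across folds. The names `auroc`, `mean_auroc`, and `TARGET_AUROC`, and the fold scores, are illustrative assumptions.

```python
from typing import Sequence, Tuple

TARGET_AUROC = 0.98  # the 98.00% stopping threshold quoted in the text

def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Label +1 marks a positive; any other label is a negative. Tied scores count 0.5.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label != 1]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_auroc(folds: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Average AUROC over cross-validation folds given as (scores, labels) pairs."""
    return sum(auroc(scores, labels) for scores, labels in folds) / len(folds)

# Hypothetical held-out decision scores from two folds.
folds = [
    ([0.9, 0.8, 0.3, 0.1], [1, 1, -1, -1]),  # every pair ranked correctly: 1.0
    ([0.9, 0.2, 0.4, 0.1], [1, 1, -1, -1]),  # 3 of 4 pairs correct: 0.75
]
avg = mean_auroc(folds)               # 0.875
keep_training = avg < TARGET_AUROC    # True: threshold not yet reached
```

The pairwise loop grows with the product of positives and negatives; for large evaluation sets a rank-based computation, or `sklearn.metrics.roc_auc_score`, gives the same value more efficiently.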
In a further alternative embodiment of the present invention, the sub-fragment set consists of the following k-mers (lengths of 5 to 8 bases):
GTCAT,ATGAC,CATGA,TCATG,GTTCA,TGAAC,CAATTG,TGATCA,GATCAT,ATGATC,CGTCAA,AATCGT,TCATGA,ACGATT,CATGAT,ATCATG,TAATCG,CGATTA,GGAATC,CGAATT,AATTCG,TTCATG,GAATTC,CATGAA,GGTTCA,TGAACC,ATGAAC,GTTCAT,TCAATTG,TGATCAT,ATGATCA,AATCGTT,GGCAATT,TCATGAT,ATCATGA,TCGTCAA,TTGACGA,AATTGCC,TTCATGA,TCATGAA,TGTCAGC,CGATTGA,TCGTCAT,GCTGACA,AACGGTT,GTCATTG,TTAATCG,TGATGAC,TTCGTCA,ATGACGA,CAATCGT,TGACGAA,AACCGTT,ACGATTG,CGTTCAA,CAATGAC,CGTCAAT,TTGACGG,TTGAACG,TAATCGT,ATTGACG,AACGGCT,ACGATTA,AATCGTC,CGGAATT,TGAAGAC,GACGATT,CGTTCTT,AAGAACG,AATTCCG,ATTGTCG,TTAACGG,AATGACG,CGACAAT,CGTCATT,AAGACGA,ACGTCAA,ACGAATT,GGAATCA,AATTCGT,GTCATGA,TCGGAAT,GAATTCA,ATGATCC,GGTCATT,CGAATTG,CAATTCG,GGATCAT,CTTCATG,AATCGTG,TGAATTC,TGATCAAG,CCGATTA,GACCGTT,CCGAATT,AACGGTC,AATTCGA,TCAATTGA,CGTAATC,CGTCCAA,AATTCGG,CTTGATCA,GGAATTC,TAATCGG,ATGACGT,TCGAATT,TTGATCAT,TGTTCAT,ACGTCTT,ATGATCAA,CATGTCA,ATTCACG,TGACATG,ATGAACC,CGGCAATT,ATCATGAT,TTGTCAGC,GGCAATTG,TGTCAGCA,AATTGCCG,GGTTCAT,TCATGATC,CAATTGCC,GGGAACA,GATCATGA,ATGGTCC,CGATTAAT,GGGTTCA,AATTGACG,ATTAATCG,GACATGT,CAGTCATT,CCGCAATT,AATTGCGG,TGTCAGCT,CGTTAATT,AATTAACG,GGATCAAT,CCGATTAA,TTAATCGG,CAAGTCCG,CCGTTAAT,TTAAGACG,GCCATATG,CGTTAGTC,GGTCTAT,GTGTAGA,CCCTATC,ACCCTAT,TCTACAC,CTCTACC,GCTATAC,ACTCCTG,CTCTACA,TGTAGAG,TCTATCC,GTAGAGA,TCTCTAC,CCATAGC,GCTATGG,GCTCTAT,GAGATAG,TATCTGC,CTATCTC,CTATCTG,GATAGAG,GGTATAG,ATAGAAC,GTTCTAT,CTATACC,CTCTATC,ATAGGG,CCCTAT,GTAGAG,AGATAGA,TCTATCT,CTCTAC,CTATGG,AGATAG,TCTATC,CTATCT,CTCTAT,GTAGA,TCTAC.
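The listed sub-fragments mix k-mer lengths from 5 to 8, and several occur as reverse-complement pairs (for example GTCAT and ATGAC, or GTTCA and TGAAC), while palindromic entries such as CAATTG and GAATTC are their own reverse complements. The sketch below turns a genome into a fixed-order feature vector over such a panel, dividing each count by the number of windows of that k-mer's own length. That per-length normalization, the helper names, and the use of only the first eight panel entries are assumptions for illustration; the text does not say how the panel is encoded for the model.

```python
from collections import Counter
from typing import Dict, List, Sequence

# First eight entries of the panel listed above; a real run would use all of them.
PANEL_EXCERPT = ["GTCAT", "ATGAC", "CATGA", "TCATG", "GTTCA", "TGAAC", "CAATTG", "TGATCA"]

def reverse_complement(kmer: str) -> str:
    """Reverse complement of an upper-case DNA string over A/C/G/T."""
    return kmer.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def panel_features(sequence: str, panel: Sequence[str]) -> List[float]:
    """Relative abundance of each panel k-mer, in panel order.

    Counts use overlapping windows; each k-mer is divided by the number of
    windows of its own length (L - k + 1) because the panel mixes k = 5..8.
    """
    seq = sequence.upper()
    counts_by_k: Dict[int, Counter] = {}
    for k in {len(kmer) for kmer in panel}:
        counts_by_k[k] = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    features = []
    for kmer in panel:
        windows = len(seq) - len(kmer) + 1
        features.append(counts_by_k[len(kmer)][kmer] / windows if windows > 0 else 0.0)
    return features

vector = panel_features("GTCATGAC", PANEL_EXCERPT)
# [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]
```

Keeping the panel order fixed matters because a trained SVM expects every sample's features in the same column positions.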
Based on the above, in the embodiments of the invention, specific microorganisms can be screened from a large microbial population through their gene sequences to obtain sample genome sequences, and a support vector machine model then predicts probiotic potential from the sub-fragment set and abundance derived from those sequences. Selecting sample genome sequences on the basis of gene sequences replaces manual screening of microorganisms, predicting with a support vector machine model improves the efficiency of determining probiotic potential, and pre-training the model further improves prediction accuracy.
The foregoing describes a method for predicting the probiotic potential of microorganisms according to an embodiment of the present invention. Correspondingly, an embodiment of the present invention further provides an apparatus for predicting the probiotic potential of microorganisms. Because the apparatus embodiments are substantially similar to the method embodiments, they are described only briefly; for details of the relevant technical features, refer to the corresponding descriptions of the method embodiments above. The following descriptions of the apparatus embodiments are merely illustrative. As shown in FIG. 5, which is an alternative block diagram of the apparatus provided in this embodiment, the apparatus comprises:
a sample genome sequence determining module 520, configured to determine a sample genome sequence corresponding to a microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism;
a sub-fragment extraction module 540, configured to determine, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence includes a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence;
and a prediction module 560, configured to input the sub-fragment set and its abundance into a support vector machine model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence.
Optionally, with continued reference to FIG. 5, the apparatus further includes:
a training module 550, configured to train an initial support vector machine model to obtain the support vector machine model, wherein the initial support vector machine model is a model that has not yet been trained.
Optionally, the training module 550 trains the initial support vector machine model by:
determining a historical genome training sequence;
and training the initial support vector machine model based on the historical genome training sequence.
Optionally, the training module 550 determines the historical genome training sequence by:
acquiring a historical genome sequence;
performing k-mer calculation on the historical genome sequence to obtain a historical sub-fragment set and the abundance of the historical sub-fragment set;
and normalizing the historical sub-fragment set and its abundance, and performing feature screening to obtain the historical genome training sequence.
Optionally, the sample genome sequence determining module 520 determines the sample genome sequence corresponding to the microorganism by:
performing high-throughput sequencing on genomic DNA from a sample of the microorganism to obtain a sequencing result; and aligning the sequencing result with a reference genome to obtain the sample genome sequence.
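The module structure of FIG. 5 can be pictured as a chain of components. The sketch below wires hypothetical classes for modules 520, 540, and 560 together: the sequencing and alignment performed by module 520 are stubbed out with a loader callback, and the prediction module applies a linear decision function whose weights would, in this reading, come from training module 550. All class names, the weights, and the toy genomes are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SampleGenomeModule:
    """Module 520 (hypothetical): supplies the sample genome sequence.

    Sequencing and alignment to a reference genome are stubbed out by `loader`.
    """
    loader: Callable[[str], str]

    def run(self, sample_id: str) -> str:
        return self.loader(sample_id).upper()

@dataclass
class SubFragmentModule:
    """Module 540 (hypothetical): overlapping k-mer counts as relative abundances."""
    k: int

    def run(self, genome: str) -> Dict[str, float]:
        counts = Counter(genome[i:i + self.k] for i in range(len(genome) - self.k + 1))
        total = sum(counts.values())
        return {kmer: c / total for kmer, c in counts.items()} if total else {}

@dataclass
class PredictionModule:
    """Module 560 (hypothetical): linear decision function w . x + b."""
    weights: Dict[str, float]  # in this reading, produced by training module 550
    bias: float = 0.0

    def run(self, profile: Dict[str, float]) -> int:
        score = sum(w * profile.get(kmer, 0.0) for kmer, w in self.weights.items())
        return 1 if score + self.bias >= 0.0 else -1

@dataclass
class ProbioticPredictor:
    """Chains modules 520, 540 and 560: genome -> k-mer profile -> +1 / -1."""
    genome_module: SampleGenomeModule
    fragment_module: SubFragmentModule
    prediction_module: PredictionModule

    def predict(self, sample_id: str) -> int:
        genome = self.genome_module.run(sample_id)
        profile = self.fragment_module.run(genome)
        return self.prediction_module.run(profile)

# Toy genomes and hand-picked weights, for illustration only.
toy_genomes = {"s1": "GTCATGTCAT", "s2": "AAAAAAAAAA"}
predictor = ProbioticPredictor(
    SampleGenomeModule(loader=lambda sample_id: toy_genomes[sample_id]),
    SubFragmentModule(k=5),
    PredictionModule(weights={"GTCAT": 10.0}, bias=-1.0),
)
```

Injecting each module as a separate object mirrors the apparatus claims, where each module can be replaced (for example, a different k or a retrained model) without touching the others.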
The foregoing describes several embodiments of the present invention. The alternatives presented in the various embodiments may be combined and cross-referenced with one another where they do not conflict, thereby extending to further possible embodiments, all of which are considered embodiments disclosed by the present invention.
Although the embodiments of the present invention are disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by a person skilled in the art without departing from the spirit and scope of the invention; accordingly, the scope of protection of the invention shall be defined by the appended claims.

Claims (10)

1. A method of predicting the probiotic potential of a microorganism, comprising:
determining a sample genome sequence corresponding to the microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism;
determining, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence comprises a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence;
extracting, by the k-mer algorithm, from the sample genome sequence a sub-fragment set whose sub-fragments each contain k bases, together with the abundance of the sub-fragment set, wherein the genome length is L, the k-mer length is set to k, and the number of sub-fragments in the generated sub-fragment set is L - k + 1;
and inputting the sub-fragment set and its abundance into a support vector machine model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence.
2. The method of claim 1, further comprising, before inputting the sub-fragment set and its abundance into the support vector machine model:
training an initial support vector machine model to obtain the support vector machine model, wherein the initial support vector machine model is a model that has not yet been trained.
3. The method of predicting the probiotic potential of a microorganism according to claim 2, wherein training the initial support vector machine model comprises:
determining a historical genome training sequence;
and training the initial support vector machine model based on the historical genome training sequence.
4. The method of predicting the probiotic potential of a microorganism according to claim 3, wherein determining the historical genome training sequence comprises:
acquiring a historical genome sequence;
performing k-mer calculation on the historical genome sequence to obtain a historical sub-fragment set and the abundance of the historical sub-fragment set;
and normalizing the historical sub-fragment set and its abundance, and performing feature screening to obtain the historical genome training sequence.
5. The method of claim 1, wherein determining the sample genome sequence corresponding to the microorganism comprises:
performing high-throughput sequencing on genomic DNA from a sample of the microorganism to obtain a sequencing result;
and aligning the sequencing result with a reference genome to obtain the sample genome sequence.
6. An apparatus for predicting the probiotic potential of a microorganism, comprising:
a sample genome sequence determining module, configured to determine a sample genome sequence corresponding to the microorganism, the sample genome sequence being obtained by high-throughput sequencing of genomic DNA from a sample of the microorganism;
a sub-fragment extraction module, configured to determine, based on the sample genome sequence and by using a k-mer algorithm, a sub-fragment set corresponding to the sub-genome sequences and the abundance of the sub-fragment set, wherein the sample genome sequence comprises a plurality of sub-genome sequences, and the abundance of the sub-fragment set represents the abundance distribution of the sub-genome sequences in the sample genome sequence;
wherein the k-mer algorithm is used to extract, from the sample genome sequence, a sub-fragment set whose sub-fragments each contain k bases, together with the abundance of the sub-fragment set, the genome length being L, the k-mer length being set to k, and the number of sub-fragments in the generated sub-fragment set being L - k + 1;
and a prediction module, configured to input the sub-fragment set and its abundance into a support vector machine model to predict the probiotic potential of the microorganism, wherein the support vector machine model is obtained by training an initial support vector machine model on a historical genome training sequence.
7. The apparatus for predicting the probiotic potential of a microorganism of claim 6, further comprising:
a training module, configured to train an initial support vector machine model to obtain the support vector machine model, wherein the initial support vector machine model is an untrained model.
8. The apparatus for predicting the probiotic potential of a microorganism of claim 7, wherein training the initial support vector machine model by the training module comprises:
determining a historical genome training sequence; and
training the initial support vector machine model on the historical genome training sequence.
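The claims state that a support vector machine is trained and then applied, but they give no kernel, solver, hyperparameters or label encoding. The sketch below assumes a linear kernel and labels +1 (probiotic potential) and -1 (none). It trains with the Pegasos stochastic sub-gradient method on the L2-regularized hinge loss and appends a constant 1.0 feature so the bias is learned as an ordinary weight. In practice a library class such as scikit-learn's `sklearn.svm.SVC` would be more typical. `train_linear_svm`, `predict`, `lam`, `epochs` and the toy feature vectors are all hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos training of a linear SVM (L2-regularized hinge loss).

    X: feature vectors (e.g. screened k-mer abundances); y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the bias is learned as an ordinary weight.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the regularizer
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 (probiotic potential) or -1 from the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical normalized abundances of two marker k-mers in four historical genomes
X_hist = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y_hist = [1, 1, -1, -1]
w = train_linear_svm(X_hist, y_hist)
print(predict(w, [0.85, 0.15]))
```

A linear kernel keeps the learned weights directly interpretable as per-k-mer contributions. This could matter if the screened k-mers are meant to serve as candidate markers, but a nonlinear kernel may fit the real data better.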
9. The apparatus of claim 8, wherein determining the historical genome training sequence by the training module comprises:
acquiring a historical genome sequence;
performing k-mer calculation on the historical genome sequence to obtain a historical sub-fragment set and the abundance of the historical sub-fragment set; and
normalizing the historical sub-fragment set and the abundance of the historical sub-fragment set, and performing feature screening, to obtain the historical genome training sequence.
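Before an SVM can use the k-mer profiles of many historical genomes, every genome has to be expressed over the same ordered set of k-mers, with 0 for any k-mer it lacks. A new sample must also be projected onto that same vocabulary before prediction. The claims do not describe this bookkeeping, so the sketch below is an assumption. It uses a sorted union of all observed k-mers as the vocabulary, and its function names are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in one genome sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_feature_matrix(genomes, k):
    """Map each historical genome onto one shared, sorted k-mer vocabulary.

    Returns (vocabulary, rows), where rows[g][j] is the count of vocabulary[j] in genome g.
    """
    profiles = [kmer_counts(g.upper(), k) for g in genomes]
    vocabulary = sorted(set().union(*profiles))
    rows = [[profile[kmer] for kmer in vocabulary] for profile in profiles]
    return vocabulary, rows


def project(profile, vocabulary):
    """Express a new sample's k-mer profile in the training vocabulary (unseen k-mers are dropped)."""
    return [profile.get(kmer, 0) for kmer in vocabulary]


vocab, rows = build_feature_matrix(["ACGT", "CGTT"], k=2)
print(vocab)  # ['AC', 'CG', 'GT', 'TT']
print(rows)   # [[1, 1, 1, 0], [0, 1, 1, 1]]
```

A full set of k-mers for real genomes grows quickly with k (up to 4^k columns), so in practice this matrix would usually be stored sparsely before normalization and feature screening.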
10. The apparatus of claim 6, wherein determining the sample genome sequence corresponding to the microorganism by the sample genome sequence determining module comprises:
performing high-throughput sequencing on the genomic DNA of a sample of the microorganism to obtain a sequencing result; and
aligning the sequencing result against a reference genome to obtain the sample genome sequence.
CN202211397170.9A 2022-11-09 2022-11-09 Method and device for predicting probiotics potential of microorganism Active CN115831224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211397170.9A CN115831224B (en) 2022-11-09 2022-11-09 Method and device for predicting probiotics potential of microorganism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211397170.9A CN115831224B (en) 2022-11-09 2022-11-09 Method and device for predicting probiotics potential of microorganism

Publications (2)

Publication Number Publication Date
CN115831224A CN115831224A (en) 2023-03-21
CN115831224B CN115831224B (en) 2024-05-03

Family

ID=85527308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211397170.9A Active CN115831224B (en) 2022-11-09 2022-11-09 Method and device for predicting probiotics potential of microorganism

Country Status (1)

Country Link
CN (1) CN115831224B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955629A (en) * 2014-02-18 2014-07-30 Jilin University Micro genome segment clustering method based on fuzzy k-mean
CN105095688A (en) * 2014-08-28 2015-11-25 Jilin University Method for detecting bacterial communities and abundances of human intestinal metagenome
CN107480474A (en) * 2017-08-01 2017-12-15 Shandong Normal University Grader modeling evaluation method of calibration and system based on gut flora abundance
WO2019126824A1 (en) * 2017-12-22 2019-06-27 Trace Genomics, Inc. Metagenomics for microbiomes
CN110462064A (en) * 2017-04-18 2019-11-15 Shenzhen BGI Life Science Research Institute The method and its application of microorganism detection are carried out based on excretion body nucleic acid
WO2020038765A1 (en) * 2018-08-21 2020-02-27 Koninklijke Philips N.V. Method for the identification of organisms from sequencing data from microbial genome comparisons
KR20200027900A (en) * 2018-09-05 2020-03-13 ChunLab, Inc. taxonomy profiling method for microorganism in sample
CN111816258A (en) * 2020-07-20 2020-10-23 Hangzhou Guhe Information Technology Co., Ltd. Optimization method for accurately identifying human flora 16S rDNA high-throughput sequencing species
CN111951889A (en) * 2020-08-18 2020-11-17 Anhui Agricultural University Identification prediction method and system for M5C site in RNA sequence
CN113744807A (en) * 2021-11-03 2021-12-03 Weiyan Medical Technology (Beijing) Co., Ltd. Macrogenomics-based pathogenic microorganism detection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
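Mining probiotic genome molecular markers based on machine learning and constructing a visual screening and prediction platform; Sun Yu; China Master's Theses Full-text Database, Basic Sciences; abstract and pp. 21-49 *
Sun Yu. Mining probiotic genome molecular markers based on machine learning and constructing a visual screening and prediction platform. China Master's Theses Full-text Database, Basic Sciences. 2021, abstract and pp. 21-49. *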

Also Published As

Publication number Publication date
CN115831224A (en) 2023-03-21

Similar Documents

Publication Publication Date Title
Lowe et al. Transcriptomics technologies
Shin et al. Analysis of the mouse gut microbiome using full-length 16S rRNA amplicon sequencing
CN109273053B (en) High-throughput sequencing microbial data processing method
Faure et al. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies
CN109887546B (en) Single-gene or multi-gene copy number detection system and method based on next-generation sequencing
CN111816258B (en) Optimization method for accurate identification of human flora 16S rDNA high-throughput sequencing species
CN113744807B (en) Macrogenomics-based pathogenic microorganism detection method and device
CN106295246A (en) Find the lncRNA relevant to tumor and predict its function
EP3405573A1 (en) Methods and systems for high fidelity sequencing
CN115719616B (en) Screening method and system for pathogen species specific sequences
AU2022298428A1 (en) Gene sequencing analysis method and apparatus, and storage medium and computer device
CN114974411A (en) Metagenome pathogenic microorganism genome database and construction method thereof
CN115831224B (en) Method and device for predicting probiotics potential of microorganism
CN108504750B (en) Method and system for determining flora SNP site set and application thereof
Lötstedt et al. Spatial host-microbiome sequencing
CN110970091A (en) Label quality control method and device
CN106906220A (en) A kind of COL4A5 genes of mutation and its application
CN115261499B (en) Intestinal microbial marker related to endurance and application thereof
CN115976235B (en) Identification method of Lactobacillus delbrueckii CICC6047 strain, and primer, kit and application thereof
CN109215736B (en) High-throughput detection method and application of enterovirus group
CN114317725B (en) Crohn disease biomarker, kit and screening method of biomarker
WO2020252320A1 (en) Dna methylation based high resolution characterization of microbiome using nanopore sequencing
CN110993024B (en) Method and device for establishing fetal concentration correction model and method and device for quantifying fetal concentration
JP2008161056A (en) Dna sequence analyzer and method and program for analyzing dna sequence
CN113736773A (en) Cross-species individual identification method and individual identification analysis system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant