CN114058691B - Skeletal muscle early injury time prediction method based on Stacking ensemble learning - Google Patents

Skeletal muscle early injury time prediction method based on Stacking ensemble learning

Info

Publication number
CN114058691B
Authority
CN
China
Prior art keywords
skeletal muscle
ensemble learning
stacking
prediction
stacking ensemble
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111317633.1A
Other languages
Chinese (zh)
Other versions
CN114058691A (en)
Inventor
李娜
党丽虹
李健
冯娜
梁芯瑞
安国帅
任康
杜秋香
曹洁
靳茜茜
孙俊红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Medical University
Original Assignee
Shanxi Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-09
Filing date
2021-11-09
Publication date
2023-02-03
Application filed by Shanxi Medical University
Priority to CN202111317633.1A
Publication of CN114058691A
Application granted
Publication of CN114058691B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Abstract

The invention relates to the field of forensic medicine, and in particular to a method for predicting early skeletal muscle injury time based on Stacking ensemble learning, comprising the following steps: collecting skeletal muscle samples from rats at different injury times and obtaining the expression levels of genes related to skeletal muscle injury repair; establishing prediction models for three base classifiers, stacking the predicted probability values of the three base classifiers to form a new feature set, and training on it to obtain the final Stacking ensemble learning model; and inputting the data of an unknown sample into the Stacking ensemble learning model to predict its injury time. The method uses Stacking ensemble learning to integrate the prediction results of the three base classifiers, whose parameters are optimized by grid search and cross-validation, thereby effectively improving the accuracy and stability of early skeletal muscle injury time inference.

Description

Skeletal muscle early injury time prediction method based on Stacking ensemble learning
Technical Field
The invention relates to the field of forensic medicine, in particular to a skeletal muscle early injury time prediction method based on Stacking ensemble learning.
Background
In forensic practice and research, accurate inference of injury time is a critical problem that urgently needs to be solved. Inference is especially difficult for early injuries, because the vital reactions of the body change only slightly. In general, when mechanical injury occurs to human tissue, a series of characteristic changes such as bleeding, wound formation, inflammatory reaction and changes in enzyme activity appear on the body surface and in the tissues. However, for individuals who die immediately after injury or survive only briefly, it is difficult to infer the injury time accurately from vital reactions alone. With the development of biological techniques, research on injury time estimation has expanded from histology-based morphological indicators to molecular biological indicators, in which the injury time is estimated by detecting proteins, mRNAs and the like. Since mRNA is produced earlier than protein, changes in mRNAs during injury repair are better suited to early injury time inference. The repair of skeletal muscle after injury is a complex process involving multiple genes, pathways and cells, so the injury time is difficult to infer accurately from a single indicator. A growing number of scholars hold that identifying more indicators related to injury time and analyzing them jointly can reduce the error of injury time inference and improve its accuracy.
With the rapid development of science and technology, especially computer technology, machine learning (ML) algorithms such as the support vector machine (SVM), random forest (RF) classifier and multilayer perceptron (MLP) have gradually been applied in forensic medicine, and they provide algorithmic support for injury time inference and for the joint analysis of multiple indicators. However, given the variety of algorithms, choosing a suitable machine learning algorithm and improving accuracy remain difficult. Each algorithm differs in principle and in its sensitivity to data, so for the same classification problem the training error and generalization error of the models may differ, which complicates prediction and decision-making. Stacking ensemble learning can integrate multiple sub-learners and use the outputs of the group of learners to compensate for individual errors, giving better decision performance and generalization ability than a single model. To date, no research applying Stacking ensemble learning to injury time inference has been reported.
Disclosure of Invention
The invention provides a skeletal muscle early injury time prediction method based on Stacking ensemble learning, and aims to better integrate multiple mRNA changes to improve the accuracy of early injury time inference and the stability of prediction.
The invention is realized by the following technical scheme: a skeletal muscle early damage time prediction method based on Stacking ensemble learning comprises the following steps:
1) Collecting skeletal muscle samples of rats at different injury times, extracting total RNA in tissues, carrying out reverse transcription on the total RNA to obtain cDNA, and acquiring expression quantity data of genes related to skeletal muscle injury repair at a transcription level by using an RT-qPCR technology;
2) Selecting a support vector machine, a random forest and a multilayer perceptron as the ensemble learning base classifiers, establishing a prediction model for each of the three base classifiers, stacking the predicted probability values of the three base classifiers to form a new feature set, and training a logistic regression on the new feature set to obtain the final Stacking ensemble learning model;
3) Inputting the expression level data of the skeletal muscle injury repair related genes of an unknown sample into the Stacking ensemble learning model to predict the injury time of the unknown sample.
As a further improvement of the technical scheme of the invention, the RT-qPCR technique uses the mRNAs of the reference genes RPL13 and RPL32 as the normalization reference, and the 2^(-ΔΔCt) method is applied to the Ct values of the target genes measured by RT-qPCR to obtain the relative expression levels of the target genes.
As a further improvement of the technical scheme of the invention, in the step 2), according to the principle of random sampling, one part of the expression quantity data is selected as a training set, and the other part of the expression quantity data is selected as a test set.
As a further improvement of the technical scheme of the invention, the training set is 70 percent, and the test set is 30 percent.
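The random 70%/30% division can be sketched as follows. This is a minimal illustration using only the Python standard library, not the inventors' code: the function name `split_70_30`, the fixed seed and the rounding rule (the training set is rounded down, so the test set receives the remainder) are assumptions. The sample count of 56 is taken from the embodiment described later in this document.

```python
import random


def split_70_30(n_samples, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) from a seeded random shuffle of sample indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Assumption: round the training size down; the test set gets the remainder.
    n_train = int(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_70_30(56)
print(len(train_idx), len(test_idx))  # 39 17
```

With 56 samples this rule yields 39 training samples and 17 test samples; a stratified split that preserves the proportion of each injury-time group would be a reasonable alternative, although the patent does not say whether stratification was used.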
As a further improvement of the technical scheme of the invention, the expression level data of the skeletal muscle injury repair related genes of the unknown sample are obtained by the same method as in step 1).
As a further improvement of the technical scheme of the invention, in the step 1), the rat skeletal muscle sample comprises an undamaged skeletal muscle sample.
According to the prediction method, an early injury time inference prediction model is established based on skeletal muscle injury repair related gene expression data, the model integrates prediction results of three base classifiers by adopting Stacking ensemble learning, and the three base classifiers are optimized in parameters through grid search and cross validation, so that accuracy and stability of skeletal muscle early injury time inference are effectively improved.
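Parameter optimization by grid search and cross-validation might look like the sketch below, shown for the support vector machine only. It assumes scikit-learn, which the patent does not name explicitly. The candidate values in `param_grid`, the 4-fold split and the synthetic stand-in data (56 samples, 9 features, 7 classes) are all assumptions made for illustration; the patent reports only the selected values, not the grid that was searched.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the expression matrix: 7 time classes x 8 samples, 9 genes.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(7), 8)
X = rng.normal(size=(y.size, 9)) + 0.8 * y[:, None]

# Hypothetical search grid; the patent does not disclose the candidate values.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1], "kernel": ["rbf"]}

search = GridSearchCV(
    SVC(),
    param_grid,
    cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to the random forest and the multilayer perceptron by swapping the estimator and the grid.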
The invention provides a new research thought and method for the damage time prediction method, and provides an algorithm model basis for the human skeletal muscle early damage time inference method.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of the prediction method of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Specific examples of the technical solution of the present invention are given below.
1. Grouping of laboratory animals
Fifty-six male Sprague-Dawley rats, 6-8 weeks old and weighing about 180-220 g, were selected for this study and were provided by the Experimental Animal Center of Shanxi Medical University. The rats were randomly divided into a control group and injury groups. The control group consisted of rats with intact skeletal muscle, and the injury groups comprised the 4 h, 8 h, 12 h, 16 h, 20 h and 24 h groups, with 8 rats in each group.
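The random allocation described above (56 animals, 7 groups of 8) can be reproduced with a seeded shuffle. This is only a sketch: the animal identifiers 1-56, the seed and the function name `allocate` are hypothetical, and the patent does not describe the randomization procedure actually used.

```python
import random

GROUPS = ["control", "4h", "8h", "12h", "16h", "20h", "24h"]


def allocate(animal_ids, groups=GROUPS, per_group=8, seed=0):
    """Randomly assign animals to equally sized groups."""
    ids = list(animal_ids)
    if len(ids) != len(groups) * per_group:
        raise ValueError("number of animals must equal groups x per_group")
    random.Random(seed).shuffle(ids)
    return {
        group: sorted(ids[i * per_group:(i + 1) * per_group])
        for i, group in enumerate(groups)
    }


allocation = allocate(range(1, 57))
for group, members in allocation.items():
    print(group, members)
```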
2. Preparation of animal model with skeletal muscle injury
After fasting for 12 h, the rats were anesthetized by intraperitoneal injection of 3% sodium pentobarbital (40 mg/kg). A 500 g weight was dropped in free fall from a height of 30 cm inside a plastic sleeve onto the skeletal muscle of the right hind limb to create the rat skeletal muscle injury model. The rats were then given sufficient food and water and were sacrificed at 4 h, 8 h, 12 h, 16 h, 20 h and 24 h after injury by intraperitoneal injection of a lethal dose of sodium pentobarbital; the control rats were sacrificed by the same method. From each rat in the injury groups, 100 mg of muscle tissue was taken from the center of the injured area of the right hind limb; from each control rat, 100 mg of muscle tissue was taken from the corresponding area of the right hind limb. Each tissue sample was divided into two equal parts, wrapped in tin foil, snap-frozen in liquid nitrogen and stored at -80 ℃ until use.
3. Total RNA extraction and quality control of samples
Total RNA was extracted from skeletal muscle by the TRIzol method. Its purity and concentration were measured with an Infinite M200 Pro microplate reader, and RNA with an OD260/280 absorbance ratio between 1.8 and 2.2 was used for subsequent experiments. Total RNA integrity was measured with the Agilent RNA 6000 Nano kit and the Agilent 2100 (Agilent Technologies, USA); samples with an RNA Integrity Number (RIN) greater than 7.0 were considered to have good integrity and could be used in subsequent experiments.
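The two acceptance criteria can be expressed as a simple filter. In this sketch the OD260/280 range is treated as inclusive at both ends and the RIN threshold as strict ("greater than 7.0"); the inclusive bounds are an assumption, and the sample identifiers and measurements are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RnaSample:
    sample_id: str    # hypothetical identifier
    od260_280: float  # absorbance ratio from the microplate reader
    rin: float        # RNA Integrity Number


def passes_qc(sample, ratio_range=(1.8, 2.2), min_rin=7.0):
    """Accept a sample if its purity ratio is in range and its RIN exceeds the threshold."""
    low, high = ratio_range
    return low <= sample.od260_280 <= high and sample.rin > min_rin


samples = [
    RnaSample("rat01_4h", 1.95, 8.4),  # passes both criteria
    RnaSample("rat02_4h", 1.72, 8.9),  # purity ratio too low
    RnaSample("rat03_8h", 2.01, 6.5),  # integrity too low
]
accepted = [s.sample_id for s in samples if passes_qc(s)]
print(accepted)  # ['rat01_4h']
```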
4. Synthesis of cDNA by reverse transcription of RNA
After quality control, total RNA was reverse transcribed into cDNA using the PrimeScript™ RT Master Mix (Perfect Real Time) kit (TaKaRa). The specific steps were as follows: on a clean bench, a 10 μL reverse transcription system was prepared with the PrimeScript™ RT Master Mix kit, containing 2 μL of 5× PrimeScript™ RT Master Mix and 400 ng of total RNA, made up to 10 μL with RNase-free water. The prepared reaction system was placed in a T-1 thermal cycler, and reverse transcription was carried out at 37 ℃ for 15 min followed by 85 ℃ for 15 s. The resulting cDNA was aliquoted and stored at -20 ℃ for subsequent experiments.
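The pipetting volumes for the 10 μL reverse transcription system follow from the measured RNA concentration. The helper below is a sketch of that arithmetic; the function name and the example concentration of 200 ng/μL are hypothetical, while the 400 ng input, 2 μL of master mix and 10 μL total come from the text above.

```python
def rt_mix_volumes(rna_conc_ng_per_ul, rna_mass_ng=400.0, total_ul=10.0, master_mix_ul=2.0):
    """Return the volumes (in microliters) of master mix, RNA and water for one reaction."""
    rna_ul = rna_mass_ng / rna_conc_ng_per_ul
    water_ul = total_ul - master_mix_ul - rna_ul
    if water_ul < 0:
        # Below 50 ng/uL, 400 ng of RNA no longer fits into the remaining 8 uL.
        raise ValueError("RNA is too dilute to fit 400 ng into the reaction volume")
    return {"master_mix_ul": master_mix_ul, "rna_ul": rna_ul, "water_ul": water_ul}


volumes = rt_mix_volumes(200.0)
print(volumes)  # {'master_mix_ul': 2.0, 'rna_ul': 2.0, 'water_ul': 6.0}
```

Note that 2 μL of a 5× master mix in a 10 μL reaction gives the intended 1× final concentration.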
5. RT-qPCR detection of the relative expression level of the target mRNA
The full sequences of the target genes were obtained from GenBank, and the positions of introns in the corresponding gene sequences were then located using the BLAT function of UCSC. Primers and probes for the reference genes RPL13 and RPL32 and for the target genes were designed with AlleleID 6.0 software so that the sequence between each pair of upstream and downstream primers spans an intron position, avoiding interference from genomic DNA during amplification. The primers and probes for the reference genes RPL13 and RPL32 and for the target genes were synthesized by Shanghai Biotechnology, Inc.; their sequences and amplification efficiencies are shown in Table 1.
TABLE 1 Primer and probe sequences for reference and target genes
(Table 1 appears as images in the original publication.)
Following the instructions of the Premix Ex Taq™ (Probe qPCR) kit (TaKaRa), a multiplex amplification reaction system (4 genes in total, comprising 2 reference genes and 2 target genes) was prepared as follows: 12.5 μL of Taq DNA polymerase, 0.5 μL each of the forward primers, reverse primers and fluorescent probes (8 primers and 4 probes in total), 2.0 μL of 10% DMSO, 1.5 μL of cDNA and 3 μL of RNase-free water. Reverse transcription real-time fluorescent quantitative PCR was performed on the Bio-Rad CFX384 Touch™ fluorescent quantitative PCR detection system (Bio-Rad, USA), with each sample run in triplicate. The reaction conditions were: pre-denaturation at 95 ℃ for 30 s, followed by 40 cycles of denaturation at 95 ℃ for 5 s and annealing and extension at 60 ℃ for 40 s, with fluorescence signals collected at the end of each cycle. Using the expression levels of RPL13 and RPL32 mRNA as the normalization reference, the 2^(-ΔΔCt) method was applied to the Ct values of the target genes measured by RT-qPCR to obtain the relative expression levels of the 9 target genes.
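The 2^(-ΔΔCt) calculation with two reference genes and triplicate wells can be sketched as below. Two points are assumptions, since the patent does not spell them out: the Ct values of RPL13 and RPL32 are combined by arithmetic averaging, and the uninjured control group serves as the calibrator. All Ct values shown are invented.

```python
from statistics import mean


def delta_ct(target_cts, ref_cts_by_gene):
    """Delta Ct = mean target Ct minus the average of the reference genes' mean Cts."""
    ref_ct = mean([mean(cts) for cts in ref_cts_by_gene.values()])
    return mean(target_cts) - ref_ct


def relative_expression(sample, calibrator):
    """Return 2^(-ddCt), where ddCt = dCt(sample) - dCt(calibrator)."""
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** (-ddct)


# Hypothetical triplicate Ct values for one target gene.
control = ([26.0, 26.0, 26.0], {"RPL13": [18.0, 18.0, 18.0], "RPL32": [20.0, 20.0, 20.0]})
injured = ([24.0, 24.0, 24.0], {"RPL13": [18.0, 18.0, 18.0], "RPL32": [20.0, 20.0, 20.0]})

fold_change = relative_expression(injured, control)
print(fold_change)  # 4.0
```

In this example the target Ct drops by 2 cycles relative to the references, so ΔΔCt = -2 and the relative expression is 2^2 = 4.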
6. Constructing the Stacking prediction model for injury time
The Stacking prediction model consists of two stacked layers of models. The first layer uses random forest, support vector machine and multilayer perceptron models as base classifiers; the second layer uses logistic regression, trained on a new feature set formed by stacking the predicted probability values of the three base classifiers, to obtain the final ensemble model. The specific steps are as follows:
(1) According to the principle of random sampling, 70% of the data set is selected as the training set and 30% as the test set.
(2) Selecting a support vector machine, a random forest and a multilayer perceptron as ensemble learning base classifiers, and respectively establishing prediction models of the three base classifiers, wherein the specific method comprises the following steps:
1) Establishing the support vector machine (SVM) prediction model: the training set data are fed into the support vector machine model for training, and the optimal hyperparameters of the SVM model are obtained by screening through grid search and cross-validation. Among the key parameters, the penalty parameter C is 1, the kernel function (kernel) is 'rbf' and the kernel parameter (gamma) is 1. The support vector machine model is established with these optimal hyperparameters;
2) For the random forest (RF) model, the parameters are optimized by grid search and cross-validation: the number of base decision trees is 400, the maximum depth of each base decision tree is 80, and bootstrap (a Boolean value) is True, meaning that bootstrap sampling is used to generate the training data for each decision tree. The random forest classification model is built on the training set data using the optimal parameter combination found by the search.
3) The training set is fed into a multilayer perceptron (MLP) classifier, and grid search and cross-validation are applied to optimize two key parameters of the multilayer perceptron, the hidden layer sizes (hidden_layer_sizes) and the optimization method (solver). In the optimal hyperparameters, there are three hidden layers with 64, 128 and 256 neurons respectively, and the solver is adam. The multilayer perceptron model is constructed with these optimal hyperparameters.
(3) Establishing the Stacking ensemble learning model: the predicted probability values of the three base classifiers are stacked to form a new feature set, which is used to train a logistic regression to obtain the final Stacking ensemble model.
(4) Testing on unseen data: the test set randomly split off in step (1) is fed into the trained Stacking ensemble model and the three base classifiers, and the models are evaluated using metrics such as ROC (receiver operating characteristic) curves, the area under the curve (AUC) and accuracy (Table 2). The results show that, of the three classification models built with the optimal hyperparameters, the MLP classification model has the highest accuracy, with a test accuracy of 88.24% and an AUC of 0.98. The prediction accuracy of the Stacking ensemble model reaches 94.12%, better than that of any single model. This indicates that the Stacking ensemble model combines the advantages of the three classification models, is more stable and reliable, classifies injury time with more comprehensive and balanced accuracy and coverage, and has higher reliability and predictive power.
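Steps (1) to (4) can be sketched end to end with scikit-learn, which is assumed here because the reported parameter names match its API; the patent itself does not name the library. The hyperparameters are the ones reported above. Everything else is an assumption: the data are synthetic (7 classes, 20 samples per class, 9 features, larger than the real data set so that the internal cross-validation is stable), the split is stratified, and `StackingClassifier` builds the second-layer features from 5-fold out-of-fold probabilities, whereas the patent does not state how the stacked probabilities were generated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data: 7 injury-time classes, 9 gene-expression features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(7), 20)
X = rng.normal(size=(y.size, 9)) + 0.8 * y[:, None]

# Step (1): 70% training set, 30% test set (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Step (2): base classifiers with the hyperparameters reported in the text.
base_classifiers = [
    ("svm", SVC(C=1, kernel="rbf", gamma=1, probability=True, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=400, max_depth=80,
                                  bootstrap=True, random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64, 128, 256), solver="adam",
                          max_iter=500, random_state=0)),
]

# Step (3): logistic regression trained on the stacked class probabilities.
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

# Step (4): evaluate on the held-out test set.
proba = stack.predict_proba(X_test)
acc = accuracy_score(y_test, stack.predict(X_test))
auc = roc_auc_score(y_test, proba, multi_class="ovr")
print(f"accuracy={acc:.3f}  macro one-vs-rest AUC={auc:.3f}")
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the performance reported in the patent.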
TABLE 2 Comparison of the results of the SVM, RF and MLP classification models with the Stacking ensemble model
(Table 2 appears as an image in the original publication.)
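As a plausibility check on the reported accuracies, the arithmetic below shows that 88.24% and 94.12% are what one obtains with 15 and 16 correct predictions out of 17 test samples, and that 17 is the test-set size when 30% of 56 samples is rounded up. This reading is an inference from the numbers, not something the patent states; the function name is hypothetical.

```python
import math


def accuracy_percent(n_correct, n_total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)


n_samples = 56
n_test = math.ceil(0.30 * n_samples)  # 16.8 rounded up

print(n_test)                        # 17
print(accuracy_percent(15, n_test))  # 88.24
print(accuracy_percent(16, n_test))  # 94.12
```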
7. Unknown sample preparation, detection and result inference
A rat skeletal muscle sample to be tested is processed according to steps 1-5 to obtain the relative expression levels of the 9 target genes; these are input into the Stacking ensemble model for prediction, and the rat skeletal muscle injury time is inferred.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
<110> Shanxi Medical University
<120> skeletal muscle early injury time prediction method based on Stacking ensemble learning
<160>33
<210>1
<211>22
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of RPL13
<400>1
TCGTGAGGTGCCCTACAGTTAG
<210>2
<211>23
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of RPL13
<400>2
CACACCAAGGTCCGGGCTGGCAG
<210>3
<211>21
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of RPL13
<400>3
GGTGCGTGCCATTTTCTTGTG
<210>4
<211>22
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of RPL32
<400>4
ATCTGGCCCTTGAATCTTCTCC
<210>5
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of RPL32
<400>5
TGTCGATGCCTCTGGGTTTCCGCC
<210>6
<211>23
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of RPL32
<400>6
AGAGGACCAAGAAGTTCATCAGG
<210>7
<211>21
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Rae1
<400>7
AAGCTGAAGACCTCAGAGCAG
<210>8
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Rae1
<400>8
CCGTGGCAGCGTGTGGCTTCAACC
<210>9
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Rae1
<400>9
TTATAAAACTCATGCCCCTTGGAC
<210>10
<211>20
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Ier3
<400>10
CGTGCGTCCGAACACTTCTC
<210>11
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Ier3
<400>11
CGAAAACGCAGCCGACGGGTGCTC
<210>12
<211>21
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Ier3
<400>12
AATGTTGGGTTCCTCGGTTGG
<210>13
<211>23
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Leprot
<400>13
GGGATTGTTGTTTCTGCCTTTGG
<210>14
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Leprot
<400>14
TGCCAGCCAGCACAAGACCACAGG
<210>15
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Leprot
<400>15
GCCTTGGATCGTGAGGAAAATAAC
<210>16
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Impact
<400>16
AAGGTTCTTGCCAAGTTGTATGAG
<210>17
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Impact
<400>17
TCGCCAGTGCCACCCACAACATCT
<210>18
<211>22
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Impact
<400>18
GCTGTTTCTCCATCATCTTCGG
<210>19
<211>21
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Asb5
<400>19
GGTCGTCTTCTTGCTCTGAGG
<210>20
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Asb5
<400>20
CCACATGGTCACCCAGGCAGGCTT
<210>21
<211>20
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Asb5
<400>21
TCCAGCTTCCAGGAGAGTCC
<210>22
<211>22
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Sc65
<400>22
GGAGATGAGTCCCTCACTGATC
<210>23
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Sc65
<400>23
CCGCTCCATGTGTTCTGTGCTGCT
<210>24
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Sc65
<400>24
AGCAAAGACGGTCATATAATCAGC
<210>25
<211>20
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Myg1
<400>25
ACCTCGCAACAACCTCATGG
<210>26
<211>23
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Myg1
<400>26
CGAATCGGGACGCACAACGGCAC
<210>27
<211>20
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Myg1
<400>27
CCGAGTCCGCACAATCTCTG
<210>28
<211>20
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Dennd5a
<400>28
TACCATCCGTCAGCCCAAAC
<210>29
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Dennd5a
<400>29
CCTGTCTCCCTCGGTCATTGCCCA
<210>30
<211>22
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Dennd5a
<400>30
CCCATCTTCTCTACCAGCATCC
<210>31
<211>21
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of Slfn3/4
<400>31
AAAGGCCCTCTTCAGTCAAGC
<210>32
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of Slfn3/4
<400>32
CTGCCACACAGTCCCCGTAGCTGC
<210>33
<211>21
<212>DNA
<213> Artificial sequence
<220>
<223> R5 of Slfn3/4
<400>33
TGAGAACAGTTTCCCGCAGAG
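A listing in this layout can be checked mechanically, for example to confirm that each sequence matches its declared <211> length. The parser below is a minimal sketch that handles only the fields appearing in this listing (<210>, <211>, <223>, <400>); the function names are hypothetical, and only the first two records are embedded as sample input.

```python
import re

LISTING = """
<210>1
<211>22
<212>DNA
<213> Artificial sequence
<220>
<223> F5 of RPL13
<400>1
TCGTGAGGTGCCCTACAGTTAG
<210>2
<211>23
<212>DNA
<213> Artificial sequence
<220>
<223> P5 of RPL13
<400>2
CACACCAAGGTCCGGGCTGGCAG
"""


def parse_listing(text):
    """Parse records into dicts with id, declared_length, description and sequence."""
    records, current, expect_sequence = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if expect_sequence:  # the line after <400> holds the sequence itself
            current["sequence"] = line.upper()
            expect_sequence = False
            continue
        match = re.match(r"<(\d{3})>\s*(.*)", line)
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "210":  # start of a new sequence record
            current = {"id": int(value)}
            records.append(current)
        elif current is None:  # header fields before the first record
            continue
        elif tag == "211":
            current["declared_length"] = int(value)
        elif tag == "223":
            current["description"] = value
        elif tag == "400":
            expect_sequence = True
    return records


def length_mismatches(records):
    """Return the ids of records whose sequence length differs from <211>."""
    return [r["id"] for r in records if len(r["sequence"]) != r["declared_length"]]


records = parse_listing(LISTING)
print([(r["id"], r["description"], len(r["sequence"])) for r in records])
print(length_mismatches(records))  # []
```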

Claims (5)

1. A skeletal muscle early damage time prediction method based on Stacking ensemble learning is characterized by comprising the following steps:
1) Collecting skeletal muscle samples from rats at different injury times, extracting total RNA from the tissue, reverse transcribing the total RNA into cDNA, and obtaining transcript-level expression data of skeletal muscle injury repair related genes by RT-qPCR (reverse transcription quantitative polymerase chain reaction), wherein the skeletal muscle injury repair related genes are Rae1, Ier3, Leprot, Impact, Asb5, Sc65, Myg1, Dennd5a and Slfn3/4;
2) Selecting a support vector machine, a random forest and a multilayer perceptron as the ensemble learning base classifiers, establishing a prediction model for each of the three base classifiers, optimizing their parameters by grid search and cross-validation to obtain the optimal parameters, stacking the predicted probability values of the three base classifiers to form a new feature set, and training a logistic regression on the new feature set to obtain the final Stacking ensemble learning model;
3) Inputting the expression level data of the skeletal muscle injury repair related genes of an unknown sample into the Stacking ensemble learning model to predict the injury time of the unknown sample;
wherein the method is for the purpose of non-disease diagnosis.
2. The method for predicting the early skeletal muscle injury time based on Stacking ensemble learning as claimed in claim 1, wherein the RT-qPCR technique uses the mRNAs of the reference genes RPL13 and RPL32 as the normalization reference, and the 2^(-ΔΔCt) method is applied to the Ct values of the target genes measured by RT-qPCR to obtain the relative expression levels of the target genes.
3. The method for predicting the early skeletal muscle damage time based on Stacking ensemble learning as claimed in claim 1, wherein in step 2), one part of the expression quantity data is selected as a training set and the other part is selected as a testing set according to a random sampling principle.
4. The method for predicting the early skeletal muscle injury time based on Stacking ensemble learning as claimed in claim 3, wherein the training set is 70% and the testing set is 30%.
5. The method for predicting the early skeletal muscle damage time based on Stacking ensemble learning according to claim 1, wherein the method for acquiring the expression level data of the skeletal muscle damage repair-related gene of the unknown sample is the same as that in step 1).
CN202111317633.1A 2021-11-09 2021-11-09 Skeletal muscle early injury time prediction method based on Stacking ensemble learning Active CN114058691B (en)

Publications (2)

Publication Number Publication Date
CN114058691A CN114058691A (en) 2022-02-18
CN114058691B true CN114058691B (en) 2023-02-03

Family

ID=80274365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111317633.1A Active CN114058691B (en) 2021-11-09 2021-11-09 Skeletal muscle early injury time prediction method based on Stacking ensemble learning

Country Status (1)

Country Link
CN (1) CN114058691B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925716B (en) * 2022-03-22 2023-08-25 西南交通大学 Carbon fiber composite material damage positioning method based on ensemble learning algorithm
CN115281662B (en) * 2022-09-26 2023-01-17 北京科技大学 Intelligent auxiliary diagnosis system for instable chronic ankle joints

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020163314A1 (en) * 2019-02-05 2020-08-13 Smith & Nephew, Inc. Algorithm-based optimization for knee arthroplasty procedures

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113189249A (en) * 2021-06-07 2021-07-30 山西医科大学 Method for deducing death time of rat based on UPLC-MS technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A novel stacking technique for prediction of diabetes;Kalagotla SK等;《Comput Biol Med》;20210608;第4页左栏最后一段,图1-2 *
基于多种数学模型推断损伤时间的比较和优化的初步研究;张艺博;《中国优秀博硕士学位论文全文数据库(硕士)医药卫生科技辑》;20200915(第9期);摘要,第1.3节,第24页第1段 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant