CN116705150A - Method, device, equipment and medium for determining gene expression efficiency - Google Patents


Info

Publication number
CN116705150A
Authority
CN
China
Prior art keywords
efficiency
genetic material
preset
expression efficiency
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310659118.4A
Other languages
Chinese (zh)
Inventor
吴琪
杜佳伟
菅晓东
康波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Supercomputer Center In Tianjin
Original Assignee
National Supercomputer Center In Tianjin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Supercomputer Center In Tianjin
Priority to CN202310659118.4A
Publication of CN116705150A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The embodiment of the disclosure relates to a method, a device, equipment and a medium for determining gene expression efficiency, and relates to the technical field of artificial intelligence, wherein the method comprises the following steps: intercepting preset genetic material to obtain a genetic material fragment to be detected containing a promoter; inputting the genetic material fragments to be detected into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material. According to the embodiment of the disclosure, the genetic material fragments including the promoter are determined, and the whole genetic material expression efficiency result is determined based on the genetic material fragments, so that the determination of the expression efficiency result is realized without determining the specific position and specific type of the promoter in the genetic material, the complexity of the expression efficiency result determination process is reduced, and the method is suitable for a scene in which the position and/or type of the promoter cannot be determined, and is wide in application scene.

Description

Method, device, equipment and medium for determining gene expression efficiency
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular relates to a method, a device, equipment and a medium for determining gene expression efficiency.
Background
Gene expression efficiency is an important parameter for gene expression prediction.
In the related art, a promoter detection tool can detect the type and position of a promoter in a segment of deoxyribonucleic acid (DNA) according to existing promoter data, and the gene expression efficiency of the DNA segment can then be determined according to the type and position of the promoter. However, because determining the type and position of the promoter is complicated, determining the expression efficiency is also complicated. Moreover, when the type and position of the promoter cannot be determined, the gene expression efficiency of the DNA fragment cannot be determined at all, so the applicable scenarios of this method are limited.
Disclosure of Invention
In order to solve the above technical problems or at least partially solve the above technical problems, the present disclosure provides a method, apparatus, device and medium for determining gene expression efficiency.
The embodiment of the disclosure provides a method for determining gene expression efficiency, which comprises the following steps:
intercepting preset genetic material to obtain a genetic material fragment to be detected containing a promoter;
inputting the genetic material segment to be tested into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material.
The embodiment of the disclosure also provides a device for determining gene expression efficiency, which comprises:
the intercepting module is used for intercepting preset genetic materials to obtain a genetic material fragment to be detected containing a promoter;
the processing module is used for inputting the genetic material fragments to be detected into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement a method for determining gene expression efficiency as provided by an embodiment of the present disclosure.
The present disclosure also provides a computer-readable storage medium storing a computer program for executing the method of determining gene expression efficiency as provided by the embodiments of the present disclosure.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages: according to the method for determining the gene expression efficiency, provided by the embodiment of the disclosure, preset genetic materials are intercepted, and a genetic material fragment to be detected containing a promoter is obtained; inputting the genetic material fragments to be detected into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material. By adopting the technical scheme, the genetic material fragments comprising the promoter are determined, and the whole genetic material expression efficiency result is determined based on the genetic material fragments, so that the determination of the expression efficiency result is realized without determining the specific position and specific type of the promoter in the genetic material, the complexity of the expression efficiency result determination process is reduced, and the method is suitable for scenes in which the position and/or type of the promoter cannot be determined, and is wide in application scenes.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a method for determining gene expression efficiency according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another method for determining gene expression efficiency according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a network structure of an efficiency classification model according to an embodiment of the disclosure;
FIG. 4 is a flow chart of another method for determining gene expression efficiency according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of another method for determining gene expression efficiency according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of accuracy of an efficiency classification model according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram of the Pearson correlation coefficient of an efficiency regression model provided by embodiments of the present disclosure;
FIG. 8 is a schematic diagram of a correspondence relationship between a preset number of bits and a correct rate according to an embodiment of the disclosure;
FIG. 9 is a schematic diagram of a device for determining gene expression efficiency according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
Gene expression efficiency is an important parameter for gene expression prediction.
Gene expression efficiency is influenced by promoters, enhancers, boundary elements and the like. Different promoters regulate expression efficiency through different biochemical mechanisms; an enhancer is a DNA fragment that increases the expression activity of a gene, and its effect does not depend on its position or orientation relative to the gene; a boundary element is a DNA fragment that can block the effect of an enhancer on gene expression activity.
In the related art, a promoter detection tool can detect the type and position of a promoter in a DNA fragment based on existing promoter data, and the gene expression efficiency of the DNA fragment can then be determined based on the type and position of the promoter.
However, because determining the type and position of the promoter is complicated, determining the expression efficiency in this way is also complicated. Moreover, when the type and position of the promoter cannot be determined, the gene expression efficiency of the DNA fragment cannot be determined at all, so the applicable scenarios of this method are limited. In addition, gene expression efficiency is influenced not only by the promoter but also by other factors such as enhancers and boundary elements, so a method that determines expression efficiency only from the type and position of the promoter is strongly limited.
In order to solve the above-described problems, embodiments of the present disclosure provide a method of determining gene expression efficiency, which is described below in connection with specific embodiments.
Fig. 1 is a flow chart of a method for determining gene expression efficiency according to an embodiment of the present disclosure, where the method may be performed by a device for determining gene expression efficiency, and the device may be implemented in software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 1, the method includes:
Step 101, intercepting preset genetic material to obtain a genetic material fragment to be detected containing a promoter.
The genetic material may be a material that transmits genetic information. This embodiment does not limit the type of the genetic material; for example, the genetic material may be deoxyribonucleic acid (DNA). This embodiment also does not limit the species from which the genetic material originates; for example, the genetic material may be yeast DNA. The promoter may be a DNA sequence that a ribonucleic acid (RNA) polymerase recognizes and binds to in order to initiate transcription. The genetic material fragment to be tested may be a part of the preset genetic material, namely a continuous stretch of base pairs in the preset genetic material. The length of the genetic material fragment to be tested may be greater than the length of the promoter.
In the embodiment of the disclosure, the device for determining gene expression efficiency may obtain preset genetic material and intercept consecutive base pairs in the preset genetic material according to a preset interception policy, to obtain a genetic material fragment to be tested that contains a promoter. The interception policy may be set according to user requirements and the like, and is not limited in this embodiment. It should be noted that intercepting the genetic material fragment to be tested that contains the promoter does not require knowing the specific position of the promoter. The user can estimate the position range of the promoter from experience and specify that range as the interception policy; the device for determining gene expression efficiency then intercepts the preset genetic material according to the interception policy.
In some embodiments of the present disclosure, intercepting predetermined genetic material to obtain a fragment of genetic material to be tested comprising a promoter, comprising:
determining the site position of a transcription initiation site in the preset genetic material; determining an interception start position in the upstream direction of the site position according to a first preset number, and determining an interception end position in the downstream direction of the site position according to a second preset number; and intercepting the preset genetic material according to the interception start position and the interception end position to obtain the genetic material fragment to be tested.
Wherein the transcription initiation site (Transcription Start Site, TSS) may be the base on the DNA strand corresponding to the first nucleotide of the nascent RNA strand. The site position may characterize the position of the transcription initiation site in the genetic material. A preset number may be a user-determined number of base pairs to intercept from the preset genetic material, and its unit may be base pairs (bp). The first preset number may be the number of base pairs intercepted in the upstream direction of the site position, and the second preset number may be the number of base pairs intercepted in the downstream direction of the site position. The first preset number and the second preset number may be set according to user experience and the like, which is not limited in this embodiment. In an alternative embodiment, the first preset number and the second preset number may be integers between 49 and 151.
The upstream direction may be a direction near the 5 'end of the predetermined genetic material, and the downstream direction may be a direction near the 3' end of the predetermined genetic material. The interception start position may be a position on the preset genetic material at which interception starts, and the interception end position may be a position on the preset genetic material at which interception ends.
In this embodiment, the determination means of gene expression efficiency may analyze a preset genetic material to determine the site position of the transcription initiation site therein. Alternatively, the preset genetic material may be input into analysis software, the analysis software determining the site position of the transcription initiation site therein, and the site position may be sent to a gene expression efficiency determining device which receives the site position.
Further, taking the site position as a reference point, the device for determining gene expression efficiency may trace back a first preset number of base pairs upstream of the site position to determine the interception start position, and advance a second preset number of base pairs downstream of the site position to determine the interception end position. The preset genetic material located between the interception start position and the interception end position is then intercepted to obtain the genetic material fragment to be tested.
In this scheme, the fragment of the preset genetic material that contains the promoter is determined based on the transcription initiation site, the first preset number and the second preset number, so the specific position, type and the like of the promoter do not need to be detected. The promoter detection of the related art is replaced by detection of the transcription initiation site, which reduces the difficulty of extracting the data to be tested and the difficulty of determining the expression efficiency result.
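The interception around the transcription initiation site can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `intercept_fragment`, the 0-based indexing, the 5'-to-3' string representation, and the choice to include the TSS base itself in the fragment are all assumptions made for illustration.

```python
def intercept_fragment(sequence, tss_index, upstream, downstream):
    """Cut a window around the transcription initiation site (TSS).

    Assumptions of this sketch (not stated in the source):
      - `sequence` is a string of bases written 5' -> 3'
      - `tss_index` is a 0-based index of the TSS base in `sequence`
      - the TSS base itself is included in the returned fragment
    `upstream` and `downstream` play the role of the first and second
    preset numbers (in base pairs).
    """
    if not 0 <= tss_index < len(sequence):
        raise ValueError("tss_index is outside the sequence")
    start = tss_index - upstream   # interception start position
    end = tss_index + downstream   # interception end position (inclusive)
    if start < 0 or end >= len(sequence):
        raise ValueError("window extends beyond the sequence")
    return sequence[start:end + 1]


if __name__ == "__main__":
    # Toy sequence: the TSS is the single 'G' at index 8.
    print(intercept_fragment("AAAACCCCGTTTT", 8, 3, 2))
```

Raising an error when the window runs past either end is one possible policy; padding or clipping the window would be equally plausible choices that the source does not specify.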
Step 102, inputting the genetic material segment to be tested into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material.
Wherein the expression efficiency, also called gene expression efficiency, can be characterized by the expression amount in a cell of the part of a gene that is transcribed. It is understood that the greater the expression efficiency, the more protein the genetic material produces in the cell. The expression efficiency detection model may be a model for detecting expression efficiency and may be generated based on neural network techniques. The number and type of expression efficiency detection models are not limited in this embodiment; for example, there may be two expression efficiency detection models, one a classification model and the other a regression model. The expression efficiency result may be a prediction of expression efficiency. It may be an efficiency classification, for example high expression efficiency or low expression efficiency; it may also be an efficiency prediction value, that is, a specific numerical value of the expression efficiency.
In the embodiment of the disclosure, after obtaining a genetic material segment to be tested including a promoter, the determining device of gene expression efficiency inputs the genetic material segment to be tested into a trained expression efficiency detection model, and the expression efficiency detection model outputs a corresponding expression efficiency result.
According to the method for determining the gene expression efficiency, provided by the embodiment of the disclosure, preset genetic materials are intercepted, and a genetic material fragment to be detected containing a promoter is obtained; inputting the genetic material fragments to be detected into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material. By adopting the technical scheme, the genetic material fragments comprising the promoter are determined, and the whole genetic material expression efficiency result is determined based on the genetic material fragments, so that the determination of the expression efficiency result is realized without determining the specific position and specific type of the promoter in the genetic material, the complexity of the expression efficiency result determination process is reduced, and the method is suitable for scenes in which the position and/or type of the promoter cannot be determined, and is wide in application scenes.
In addition, the genetic material fragment to be tested may contain base pairs other than the promoter, so the expression efficiency result can be determined from both the promoter and these other base pairs, which makes the data input into the expression efficiency detection model more comprehensive.
Fig. 2 is a flow chart of another method for determining gene expression efficiency according to an embodiment of the present disclosure, as shown in fig. 2, in some embodiments of the present disclosure, an expression efficiency detection model includes an efficiency classification model and an efficiency regression model, and a genetic material fragment to be tested is input into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to a preset genetic material, including:
Step 201, inputting the genetic material segment to be tested into the efficiency classification model to obtain a first efficiency classification of the genetic material segment to be tested.
Step 202, inputting the genetic material segment to be tested into an efficiency regression model to obtain an efficiency prediction value of the genetic material segment to be tested.
The efficiency classification model may be a neural network model trained in advance for determining efficiency classification of genetic material, through which qualitative analysis of expression efficiency can be performed, and the embodiment does not limit a model type of the efficiency classification model, for example, the efficiency classification model may be a two-class model or a multi-class model, and in particular, the efficiency classification model may be a convolutional neural network (Convolutional Neural Networks, CNN) model.
FIG. 3 is a schematic diagram of a network structure of an efficiency classification model according to an embodiment of the present disclosure. As shown in FIG. 3, the base sequence of the genetic material fragment to be tested is one-hot encoded to obtain a one-hot encoded sequence corresponding to the fragment, and a position matrix feature sequence corresponding to the fragment is determined based on a position matrix. The one-hot encoded sequence and the position matrix feature sequence are concatenated to obtain a feature matrix, and the feature matrix is input into the efficiency classification model, which outputs the first efficiency classification of the fragment as either high expression efficiency or low expression efficiency.
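The input encoding described for FIG. 3 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the base order A, C, G, T, the helper names `one_hot_encode` and `build_feature_matrix`, and the choice to concatenate the two sequences position by position are not specified in the source. The source also does not define how the position matrix features are computed, so `position_features` is a hypothetical placeholder supplied by the caller.

```python
BASES = "ACGT"  # assumed column order; the source does not specify one


def one_hot_encode(fragment):
    """Return one row per base, with a 1.0 in the column of that base."""
    rows = []
    for base in fragment.upper():
        if base not in BASES:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([1.0 if base == b else 0.0 for b in BASES])
    return rows


def build_feature_matrix(fragment, position_features):
    """Concatenate one-hot rows with per-position feature rows.

    `position_features` is a hypothetical stand-in for the position
    matrix feature sequence: one list of floats per base position.
    """
    one_hot = one_hot_encode(fragment)
    if len(position_features) != len(one_hot):
        raise ValueError("position_features must have one row per base")
    return [oh + list(pf) for oh, pf in zip(one_hot, position_features)]


if __name__ == "__main__":
    matrix = build_feature_matrix("ACGT", [[0.1], [0.2], [0.3], [0.4]])
    for row in matrix:
        print(row)
```

The resulting list of rows has one row per base and could be converted to an array for whichever neural network framework is used; that step is left out because the source does not name one.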
The efficiency regression model may be a neural network model trained in advance for determining a specific value of expression efficiency of genetic material, through which quantitative analysis of expression efficiency can be performed, and the embodiment does not limit a model type of the efficiency regression model, and in particular, the efficiency regression model may be a Long Short-Term Memory (LSTM) model.
The first efficiency classification may be a classification of expression efficiency determined based on a neural network model, and the present embodiment does not limit the first efficiency classification, for example, the first efficiency classification may include high expression efficiency and low expression efficiency; the efficiency prediction value may be a specific value of expression efficiency determined based on the neural network model.
In this embodiment, the expression efficiency detection model is formed based on the efficiency classification model and the efficiency regression model together, and the determining device of the gene expression efficiency inputs the genetic material fragment to be tested into the efficiency classification model, and the efficiency classification model determines the first efficiency classification of the genetic material fragment to be tested. The determining device of the gene expression efficiency inputs the genetic material fragment to be detected into an efficiency regression model, and the efficiency regression model determines the efficiency prediction value of the genetic material fragment to be detected.
Step 203, determining an expression efficiency result according to the first efficiency classification and the efficiency prediction value.
In this embodiment, after determining the first efficiency classification and the efficiency prediction value, it is determined whether the first efficiency classification and the efficiency prediction value are consistent, if yes, the expression efficiency result is determined to be the first efficiency classification or the efficiency prediction value, otherwise, the expression efficiency result is determined to be null.
In some embodiments of the present disclosure, determining an expression efficiency result from the first efficiency classification and the efficiency prediction value includes:
determining a second efficiency classification of the piece of genetic material to be tested according to the efficiency prediction value; and if the first efficiency classification is consistent with the second efficiency classification, determining the expression efficiency result as the first efficiency classification.
The second efficiency classification may be an expression efficiency classification determined based on a specific efficiency prediction value, and the embodiment does not limit the second efficiency classification, for example, the second efficiency classification may include a high expression efficiency and a low expression efficiency.
In this embodiment, the device for determining gene expression efficiency may determine the second efficiency classification according to the magnitude relation between the efficiency prediction value and a preset expression efficiency threshold. Specifically, if the efficiency prediction value is greater than the expression efficiency threshold, the second efficiency classification is determined to be high expression efficiency; if the efficiency prediction value is less than or equal to the expression efficiency threshold, the second efficiency classification is determined to be low expression efficiency. The expression efficiency threshold may be set according to user requirements and the like, and is not limited in this embodiment; for example, the expression efficiency threshold may be 10.
After the second efficiency classification is determined, the first efficiency classification is compared with the second efficiency classification. If the two are the same, the results determined by the efficiency classification model and the efficiency regression model are consistent, and the final expression efficiency result is determined to be the first efficiency classification. If the two are different, the results determined by the efficiency classification model and the efficiency regression model are inconsistent, and the final expression efficiency result is determined to be null or another mark representing detection failure.
In the scheme, the efficiency prediction value is converted into the second efficiency classification, so that the detection result of the efficiency classification model can be compared with the detection result of the efficiency regression model, and if the two detection results are consistent, the expression efficiency result is determined to be the corresponding efficiency classification, so that the accuracy of qualitative analysis on the expression efficiency is improved.
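The consistency check between the two models can be sketched in Python as below. The function names, the string labels `"high"` and `"low"`, and the use of `None` for the null result are illustrative assumptions; the default threshold of 10 is the example value given in the description.

```python
HIGH = "high"  # label for high expression efficiency (assumed representation)
LOW = "low"    # label for low expression efficiency (assumed representation)


def second_classification(efficiency_prediction, threshold=10.0):
    """Map the regression output to a class using the expression efficiency threshold."""
    return HIGH if efficiency_prediction > threshold else LOW


def expression_efficiency_result(first_classification, efficiency_prediction,
                                 threshold=10.0):
    """Return the first classification if both models agree, otherwise None."""
    second = second_classification(efficiency_prediction, threshold)
    if first_classification == second:
        return first_classification
    return None  # null result: the two models disagree


if __name__ == "__main__":
    print(expression_efficiency_result(HIGH, 12.5))  # models agree
    print(expression_efficiency_result(HIGH, 3.0))   # models disagree
```

A prediction exactly equal to the threshold is classified as low expression efficiency, matching the "less than or equal to" rule in the text.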
In some embodiments of the present disclosure, there are a plurality of preset genetic materials, and the method for determining gene expression efficiency further comprises:
arranging the preset genetic materials in descending order of efficiency prediction value, and determining the preset genetic materials ranked within a preset number of top positions as a plurality of candidate genetic materials; and determining, among the plurality of candidate genetic materials, each candidate genetic material whose first efficiency classification is consistent with a preset efficiency classification and whose expression efficiency result is the first efficiency classification as a target genetic material.
Here, descending order means from largest to smallest. The preset number of positions may be set according to user requirements and is not limited in this embodiment; for example, it may be 100 or 1000. The preset efficiency classification is the classification that the screening is intended to retain, and may likewise be set according to user requirements; for example, if genetic material with higher expression efficiency needs to be screened out, the preset efficiency classification may be high expression efficiency. The target genetic material may be the finally determined genetic material that meets the requirement. The candidate genetic material may be intermediate genetic material determined in the course of determining the target genetic material.
In this embodiment, there are a plurality of preset genetic materials, each with a corresponding first efficiency classification and efficiency prediction value. The device for determining gene expression efficiency may sort the preset genetic materials by efficiency prediction value from high to low and select those ranked within the preset number of top positions as candidate genetic materials.
The candidate genetic materials are then screened further: those whose first efficiency classification is the same as the preset efficiency classification and whose expression efficiency result is the same as the first efficiency classification are determined as target genetic materials. A first efficiency classification that equals the preset efficiency classification indicates that the candidate genetic material matches the user requirement; an expression efficiency result that equals the first efficiency classification indicates that the first efficiency classification of the candidate genetic material is more accurate.
For example, in some application scenarios, genetic material with higher expression efficiency needs to be selected for experiments; the preset number of positions may be 1000, and the preset efficiency classification may be high expression efficiency. Specifically, the preset genetic materials whose efficiency prediction values rank in the top 1000 are determined as candidate genetic materials, and among them, those whose first efficiency classification is high expression efficiency and whose second efficiency classification, determined from the efficiency prediction value, is also high expression efficiency are determined as target genetic materials.
In this scheme, genetic material with a relatively accurate expression efficiency result and relatively high expression efficiency is determined. Because experiments generally require genetic material with relatively high expression efficiency, this provides a foundation for subsequent experiments based on the target genetic material.
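The ranking and screening steps above can be illustrated with the following sketch. The `Prediction` record, the function name and the threshold of 10 used to derive the second efficiency classification are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    name: str               # identifier of a preset genetic material (hypothetical)
    first_class: str        # output of the efficiency classification model
    predicted_value: float  # output of the efficiency regression model


def select_targets(predictions: List[Prediction],
                   top_n: int,
                   preset_class: str = "high",
                   threshold: float = 10.0) -> List[Prediction]:
    """Rank by predicted value, keep the top_n candidates, then screen them."""
    # Descending order: largest efficiency prediction value first.
    ranked = sorted(predictions, key=lambda p: p.predicted_value, reverse=True)
    candidates = ranked[:top_n]

    targets = []
    for p in candidates:
        # Second classification derived from the regression output (assumed rule).
        second_class = "high" if p.predicted_value > threshold else "low"
        agrees = p.first_class == second_class       # result equals first classification
        wanted = p.first_class == preset_class       # matches the user requirement
        if agrees and wanted:
            targets.append(p)
    return targets
```

With `top_n=1000` and `preset_class="high"`, this corresponds to the application scenario described above, in which only candidates that both models place in the high expression efficiency class are kept.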
FIG. 4 is a flow chart of another method for determining gene expression efficiency according to an embodiment of the present disclosure. As shown in FIG. 4, in some embodiments of the present disclosure, the training process of the expression efficiency detection model includes:
Step 401, obtaining a sample genetic material fragment and a sample efficiency result of the sample genetic material fragment.
Step 402, concatenating the one-hot encoded sequence and the position matrix feature sequence of the sample genetic material fragment to obtain the sample feature sequence of the sample genetic material fragment.
Here, the sample genetic material fragment may be a genetic material fragment whose expression efficiency value has already been determined, and the sample efficiency result may be an expression efficiency result determined from that expression efficiency value. It will be appreciated that if the expression efficiency detection model comprises an efficiency regression model, the sample efficiency result may comprise a specific sample efficiency value; if the expression efficiency detection model comprises an efficiency classification model, the sample efficiency result may comprise a sample efficiency classification.
The one-hot encoded sequence may be the sequence obtained by encoding the fragment with one-hot encoding. The position matrix feature sequence is also called a position-specific scoring matrix (PSSM) sequence. The sample feature sequence may be the sequence that ultimately characterizes a sample genetic material fragment.
In this embodiment, the device for determining gene expression efficiency may acquire a plurality of sample genetic material fragments and the sample efficiency result corresponding to each sample genetic material fragment. Each sample genetic material fragment is converted into a one-hot encoded sequence, its position matrix feature sequence is calculated using the BLAST+ tool, and the one-hot encoded sequence is concatenated in front of the position matrix feature sequence to obtain the corresponding sample feature sequence.
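The feature construction just described can be sketched as below. Several points are assumptions not stated in the source: the A/C/G/T column order, the handling of unknown bases, and the choice to concatenate per position so that the one-hot columns come before the PSSM columns. The PSSM rows are simply taken as input here; computing them with BLAST+ is not shown.

```python
from typing import List, Sequence

# Assumed column order A, C, G, T; the source does not specify one.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(fragment: str) -> List[List[int]]:
    """Encode a DNA fragment as one 4-element row per base.

    Assumption: any base outside A/C/G/T (for example N) maps to all zeros.
    """
    return [list(ONE_HOT.get(base, (0, 0, 0, 0))) for base in fragment.upper()]


def build_feature_sequence(fragment: str,
                           pssm_rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate one-hot columns in front of the PSSM columns, row by row."""
    encoded = one_hot_encode(fragment)
    if len(encoded) != len(pssm_rows):
        raise ValueError("fragment and PSSM must cover the same number of positions")
    return [one_hot + list(row) for one_hot, row in zip(encoded, pssm_rows)]
```

Under these assumptions, a 200 bp fragment with a 4-column PSSM would give a 200 x 8 sample feature sequence.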
Step 403, training a preset initial model according to the sample feature sequence and the sample efficiency result to obtain an expression efficiency detection model.
The number of preset initial models is not limited in this embodiment; for example, the number of preset initial models may be 2. The model type of the preset initial model is likewise not limited in this embodiment; for example, the model types may be a classification model (for example, a CNN model) and a regression model (for example, an LSTM model).
In this embodiment, a sample feature sequence is used as an input of a preset initial model, a sample efficiency result is used as an output of the preset initial model, and the preset initial model is trained to obtain an expression efficiency detection model.
In some embodiments of the present disclosure, if the preset initial model includes an initial classification model and an initial regression model, the sample feature sequence may be used as the input of the initial classification model and the sample efficiency classification as its output, and training yields a trained efficiency classification model. The sample feature sequence is likewise used as the input of the initial regression model and the sample efficiency value as the output of the initial regression model, and training yields a trained efficiency regression model. The trained efficiency classification model and the trained efficiency regression model are combined into the expression efficiency detection model.
In this embodiment, the initial hyperparameters of the initial classification model may include the number of convolution layers, the number of hidden nodes, the convolution kernel size, the batch size, the learning rate, the dropout rate, the number of training epochs, and the like. During training, the initial hyperparameters are adjusted by grid search to obtain the target hyperparameters and the efficiency classification model that uses the target hyperparameters.
In this embodiment, the initial hyperparameters of the initial regression model may be preset, and, using the Pearson correlation coefficient as the evaluation index, the cycle of training the model, validating the model and adjusting the hyperparameters may be repeated until the efficiency regression model is obtained.
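A grid search of the kind mentioned above can be sketched generically as follows. The `evaluate` callback is hypothetical: in practice it would train the classification model with the given hyperparameters and return its validation accuracy, which is not shown here.

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple


def grid_search(grid: Dict[str, Sequence[Any]],
                evaluate: Callable[[Dict[str, Any]], float]
                ) -> Tuple[Dict[str, Any], float]:
    """Try every combination in the grid and return the best one with its score.

    `evaluate` is a caller-supplied function (an assumption of this sketch)
    that returns a score where larger is better, e.g. validation accuracy.
    """
    names = list(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    # product() enumerates the Cartesian product of all candidate values.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A grid such as `{"learning_rate": [0.1, 0.01], "dropout": [0.2, 0.5]}` would result in four training runs; the number of runs grows multiplicatively with each added hyperparameter, which is the main cost of this approach.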
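For reference, the Pearson correlation coefficient used as the evaluation index can be computed as in the sketch below. The function name is illustrative, and this is the standard textbook definition rather than code taken from the disclosure.

```python
import math
from typing import Sequence


def pearson(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation between predicted and actual expression efficiencies."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and standard deviations, up to the common factor 1/n.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    dev_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    if dev_p == 0 or dev_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_p * dev_a)
```

The value lies between -1 and 1; a value near 1 means the predicted values rise and fall together with the actual values.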
In some embodiments of the present disclosure, the method for determining the expression efficiency of a gene further comprises:
clustering the sample genetic material fragments into a plurality of sample groups; and determining the result difference of the sample efficiency results in each sample group, and deleting the sample group if the result difference is larger than a preset difference threshold value.
Here, a sample group is a set comprising at least one genetic material fragment, and the genetic material fragments in a sample group have similar characteristics. The result difference may be a parameter characterizing how much the sample efficiency results within the same sample group differ; the type of the result difference is not limited in this embodiment, and it may, for example, be a variance. The preset difference threshold may be a preset maximum value of the result difference under normal conditions; a result difference greater than the preset difference threshold indicates that the sample efficiency results in the sample group differ too much.
In this embodiment, the sample genetic material fragments may be clustered in various ways, which are not limited here. For example, the device for determining gene expression efficiency may cluster the sample genetic material fragments into a plurality of sample groups according to their sample feature sequences. Alternatively, the sample genetic material fragments may be clustered into a plurality of sample groups by a preset clustering tool. The preset clustering tool may be any preset tool capable of grouping genetic material fragments and is not limited in this embodiment; for example, it may be the CD-HIT tool.
After the sample genetic material fragments are clustered into a plurality of sample groups, the result difference of the sample efficiency results in each sample group is calculated and compared with the preset difference threshold. A result difference greater than the preset difference threshold is too large: although the sample genetic material fragments in the sample group are highly similar, their expression efficiencies differ greatly, so the fragments in that sample group are considered abnormal and the sample group is deleted.
In the scheme, the sample group with higher similarity but larger expression efficiency difference is deleted, so that the influence of the abnormal data on the subsequent training of the expression efficiency detection model is avoided, and the detection accuracy of the expression efficiency detection model is improved.
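The group-wise filtering described above can be illustrated with the following sketch. It assumes that clustering has already been done (for example with CD-HIT, whose invocation and output parsing are not shown) and that the result difference is the population variance; the function name and the mapping from group identifier to efficiency values are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def filter_sample_groups(groups: Dict[str, List[float]],
                         max_variance: float) -> Dict[str, List[float]]:
    """Keep only sample groups whose efficiency variance is within the threshold.

    `groups` maps a cluster identifier to the sample efficiency results of the
    fragments in that cluster. A single-member group has variance 0 and is kept.
    """
    kept = {}
    for group_id, efficiencies in groups.items():
        # Result difference, taken here as population variance (one of the
        # options the description mentions).
        if pvariance(efficiencies) <= max_variance:
            kept[group_id] = efficiencies
    return kept
```

Whole groups are dropped rather than individual fragments, which matches the description: similar fragments with widely differing efficiencies are treated as abnormal as a group.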
Next, the method for determining gene expression efficiency in the embodiments of the present disclosure is further described by way of a specific example. FIG. 5 is a flowchart of another method for determining gene expression efficiency according to an embodiment of the present disclosure. As shown in FIG. 5, the method for determining gene expression efficiency includes:
Step 501, preprocessing to generate sample genetic material fragments and sample efficiency results.
75120 DNA sequences, together with the gene expression efficiency of each DNA sequence in yeast cells, were collected. According to the distribution of promoters in DNA sequences, 150 bp upstream and 50 bp downstream of the transcription start site were intercepted from each sequence, yielding 75120 DNA fragments of 200 bp each. The expression efficiencies of the DNA sequences ranged from 0.2 to 13416; 39946 sequences (53% of the total) had an expression efficiency of less than 1, and 63411 sequences (84% of the total) had an expression efficiency of less than 10. The expression efficiency data thus span a wide range but are mostly concentrated between 0 and 10; DNA sequences with an expression efficiency greater than 10 are fewer and sparsely distributed, which is consistent with the real situation in nature.
The DNA fragments were clustered using the CD-HIT tool to obtain a plurality of sample groups. In some sample groups the DNA fragments were highly similar but their expression efficiencies differed greatly; these sample groups were deleted. The expression efficiency threshold was set to 10, dividing the DNA fragments into low expression samples (63409) with an expression efficiency of less than 10 and high expression samples (11711) with an expression efficiency greater than 10. To improve the balance between positive and negative samples, 11700 samples were randomly drawn from each of the low expression samples and the high expression samples, giving 23400 DNA fragments of 200 bp in total. These 23400 DNA fragments were divided into a training set (18000), a validation set (2700) and a test set (2700). Each DNA fragment was converted into a one-hot encoded sequence, its position matrix feature sequence was calculated using the BLAST+ tool, and the one-hot encoded sequence and the position matrix feature sequence were concatenated into a sample feature sequence.
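The balancing and splitting in this step can be sketched as follows, using the figures from the example as defaults. How the split was actually performed is not stated in the source, so the shuffling, the fixed seed and the function name are assumptions; samples with an efficiency exactly equal to the threshold are left out because the example only defines "less than" and "greater than".

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, float]  # (DNA fragment, expression efficiency)


def balance_and_split(samples: Sequence[Sample],
                      threshold: float = 10.0,
                      per_class: int = 11700,
                      sizes: Tuple[int, int, int] = (18000, 2700, 2700),
                      seed: int = 0) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Draw an equal number of low and high samples, then split into three sets."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    low = [s for s in samples if s[1] < threshold]
    high = [s for s in samples if s[1] > threshold]

    # random.sample draws without replacement and raises ValueError
    # if a class has fewer than per_class members.
    balanced = rng.sample(low, per_class) + rng.sample(high, per_class)
    rng.shuffle(balanced)

    n_train, n_val, n_test = sizes
    if n_train + n_val + n_test != len(balanced):
        raise ValueError("split sizes must add up to the number of balanced samples")
    train = balanced[:n_train]
    val = balanced[n_train:n_train + n_val]
    test = balanced[n_train + n_val:]
    return train, val, test
```

With the defaults, 2 x 11700 = 23400 fragments are split 18000 / 2700 / 2700, as in the example above.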
Step 502, training, verifying and optimizing a preset initial model to obtain an expression efficiency detection model.
A convolutional neural network model is constructed and its initial hyperparameters are set; the initial hyperparameters include, but are not limited to, those described above for the initial classification model. The convolutional neural network model is trained using the 18000 DNA fragments and their corresponding efficiency classifications as the training set, and its prediction accuracy is evaluated using 2700 DNA fragments and their corresponding efficiency classifications as the validation set. Whether the convolutional neural network model reaches a preset accuracy is then judged; if not, its hyperparameters are adjusted by grid search and training continues until the preset accuracy is reached. Once the preset accuracy is reached, whether the convolutional neural network model reaches a preset generalization capability is judged; if not, the data in the training set, the validation set and the test set are adjusted and the process is repeated until the preset generalization capability is reached. The convolutional neural network model that reaches the preset generalization capability is used as the efficiency classification model, yielding an efficiency classification model that performs well on the validation set.
A long short-term memory (LSTM) network model is constructed and its initial hyperparameters are set. Using the Pearson correlation coefficient as the evaluation index, the cycle of training the model, validating the model and adjusting the hyperparameters is repeated to obtain an efficiency regression model that performs well on the validation set.
Step 503, testing and result analysis of the expression efficiency detection model.
The generalization capability of the expression efficiency detection model was evaluated using the 2700 pieces of test data. The prediction accuracy of the efficiency classification model on the 2700 pieces of test data reached 76%, which is better than the prior-art approach of determining expression efficiency from the position and type of the promoter. For the efficiency regression model, the Pearson correlation coefficient between the predicted results and the actual results on the 2700 pieces of test data was close to 0.6, indicating a relatively high positive correlation.
Data analysis of gene expression efficiency shows that more than 84% of the DNA fragments in the sample have low expression efficiency, whereas in practical experimental applications samples with high expression efficiency are generally of greater interest. Therefore, the samples were ranked from high to low both by the efficiency prediction values output by the efficiency regression model and by their actual expression efficiency. It was found that the accuracy for the samples ranked in the top 100 by actual expression efficiency was 95%, that the predicted expression values of these top-100 samples were generally high, and that the accuracy for the samples ranked in the top 1000 by actual expression value was 85%. DNA fragments with high expression efficiency can then be screened out by combining the efficiency classification model and the efficiency regression model, which provides a basis for finding highly expressed gene regulatory elements.
In this scheme, both the efficiency classification model and the efficiency regression model perform well on the test set: the accuracy of the efficiency classification model reaches 76%, and the Pearson correlation coefficient of the efficiency regression model is about 0.6, which is superior to the prior art.
FIG. 6 is a schematic diagram of the accuracy of an efficiency classification model according to an embodiment of the present disclosure. As shown in FIG. 6, as the number of training rounds increases, the loss decreases, the accuracy on the training set rises to approximately 0.9, and the accuracy on the validation set and the test set stays at about 0.76.
FIG. 7 is a schematic diagram of the Pearson correlation coefficient of an efficiency regression model according to an embodiment of the present disclosure. As shown in FIG. 7, as the number of training rounds increases, the loss decreases, the Pearson correlation coefficient on the training set rises to approximately 0.75, and the Pearson correlation coefficient on the validation set and the test set stays at about 0.6.
FIG. 8 is a schematic diagram of the correspondence between the preset number of positions and the accuracy. As shown in FIG. 8, the genetic materials are arranged in descending order of efficiency prediction value and the top preset number of genetic materials is taken. The accuracy decreases as the preset number of positions increases, but when the preset number of positions is 1000, the accuracies of the efficiency regression model and the efficiency classification model are both above 0.84.
In experiments, research on gene expression efficiency generally focuses on DNA with higher gene expression efficiency, for example regulatory elements in synthetic biology that drive genes to high expression efficiency. How to identify DNA fragments with high expression efficiency from a large number of high-throughput sequencing results is therefore an evaluation criterion of practical significance. By combining the results of the efficiency classification model and the efficiency regression model, the method achieves a prediction accuracy of 85-95% and provides a basis for the subsequent prediction of gene regulatory elements based on DNA with high expression efficiency.
FIG. 9 is a schematic structural diagram of a device for determining gene expression efficiency according to an embodiment of the present disclosure. The device 900 may be implemented by software and/or hardware, and may generally be integrated in an electronic device. As shown in FIG. 9, the device includes:
the intercepting module 901 is used for intercepting preset genetic materials to obtain a genetic material fragment to be detected containing a promoter;
the processing module 902 is configured to input the genetic material segment to be tested into a pre-trained expression efficiency detection model, so as to obtain an expression efficiency result corresponding to the preset genetic material.
In an alternative embodiment, the intercepting module 901 is configured to:
Determining the position of a transcription initiation site in preset genetic material;
determining interception starting positions in the upstream direction of the locus position according to a first preset number, and determining interception ending positions in the downstream direction of the locus position according to a second preset number;
and intercepting the preset genetic material according to the interception starting position and the interception ending position to obtain the genetic material fragment to be detected.
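The interception performed by the intercepting module can be sketched as follows. The defaults of 150 and 50 are the values used in the specific example earlier in this description; the 0-based index convention, the choice to count the transcription start site itself as the first downstream base, and the function name are assumptions made for illustration.

```python
def intercept_fragment(sequence: str,
                       tss_index: int,
                       upstream: int = 150,
                       downstream: int = 50) -> str:
    """Cut out the fragment around the transcription start site (TSS).

    tss_index is the 0-based position of the TSS in `sequence`. The fragment
    holds `upstream` bases before the TSS and `downstream` bases starting at it.
    """
    start = tss_index - upstream   # interception starting position
    end = tss_index + downstream   # interception ending position (exclusive)
    if start < 0 or end > len(sequence):
        raise ValueError("sequence is too short around the TSS for this window")
    return sequence[start:end]
```

With the default values the returned fragment is always 200 bp long; sequences that do not extend far enough on either side of the site are rejected rather than padded, which is a design choice of this sketch and not something the source specifies.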
In an alternative embodiment, the processing module 902 includes:
the first processing unit is used for inputting the genetic material fragments to be detected into the efficiency classification model to obtain first efficiency classifications of the genetic material fragments to be detected;
the second processing unit is used for inputting the genetic material segment to be detected into the efficiency regression model to obtain an efficiency prediction value of the genetic material segment to be detected;
and the determining unit is used for determining the expression efficiency result according to the first efficiency classification and the efficiency prediction value.
In an alternative embodiment, the determining unit is configured to:
determining a second efficiency classification of the genetic material fragment to be tested according to the efficiency prediction value;
And if the first efficiency classification is consistent with the second efficiency classification, determining that the expression efficiency result is the first efficiency classification.
In an alternative embodiment, there are a plurality of preset genetic materials, and the device further comprises:
the arrangement module is used for arranging the preset genetic materials in a descending order according to the efficiency predicted value, and determining a plurality of preset genetic materials arranged in the preset number of bits as a plurality of candidate genetic materials;
and the determining module is used for determining the candidate genetic materials with the first efficiency classification consistent with the preset efficiency classification and the expression efficiency result of the first efficiency classification as target genetic materials.
In an alternative embodiment, the apparatus further comprises a training module for:
obtaining a sample genetic material fragment and a sample efficiency result of the sample genetic material fragment;
concatenating the one-hot encoded sequence and the position matrix feature sequence of the sample genetic material fragment to obtain a sample feature sequence of the sample genetic material fragment;
training a preset initial model according to the sample characteristic sequence and the sample efficiency result to obtain the expression efficiency detection model.
In an alternative embodiment, the training module is further configured to:
clustering the sample genetic material fragments into a plurality of sample groups;
and determining the result difference of the sample efficiency result in each sample group, and deleting the sample group if the result difference is larger than a preset difference threshold value.
The device for determining the gene expression efficiency provided by the embodiment of the disclosure can execute the method for determining the gene expression efficiency provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 10, an electronic device 1000 includes one or more processors 1001 and memory 1002.
The processor 1001 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities and may control other components in the electronic device 1000 to perform desired functions.
Memory 1002 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory, and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1001 to implement the method for determining gene expression efficiency of the embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 1000 may further include: an input device 1003 and an output device 1004, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
In addition, the input device 1003 may include, for example, a keyboard, a mouse, and the like.
The output device 1004 may output various information to the outside, including the determined distance information, direction information, and the like. The output device 1004 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device 1000 that are relevant to the present disclosure are shown in fig. 10 for simplicity, components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 1000 may include any other suitable components depending on the particular application.
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the method of determining the efficiency of gene expression provided by the embodiments of the present disclosure.
The computer program product may include program code, written in any combination of one or more programming languages, for performing the operations of embodiments of the present disclosure. The programming languages include object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Further, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the method of determining gene expression efficiency provided by embodiments of the present disclosure.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for determining the expression efficiency of a gene, comprising:
intercepting preset genetic material to obtain a genetic material fragment to be detected containing a promoter;
inputting the genetic material segment to be tested into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material.
2. The method of claim 1, wherein said intercepting the preset genetic material to obtain the genetic material fragment to be tested containing the promoter comprises:
determining the position of a transcription initiation site in preset genetic material;
determining interception starting positions in the upstream direction of the locus position according to a first preset number, and determining interception ending positions in the downstream direction of the locus position according to a second preset number;
and intercepting the preset genetic material according to the interception starting position and the interception ending position to obtain the genetic material fragment to be detected.
3. The method according to claim 1, wherein the expression efficiency detection model includes an efficiency classification model and an efficiency regression model, and the inputting the genetic material segment to be tested into the pre-trained expression efficiency detection model to obtain the expression efficiency result corresponding to the preset genetic material includes:
Inputting the genetic material fragments to be detected into the efficiency classification model to obtain a first efficiency classification of the genetic material fragments to be detected;
inputting the genetic material segment to be detected into the efficiency regression model to obtain an efficiency prediction value of the genetic material segment to be detected;
and determining the expression efficiency result according to the first efficiency classification and the efficiency prediction value.
4. The method according to claim 3, wherein said determining the expression efficiency result according to the first efficiency classification and the efficiency prediction value comprises:
determining a second efficiency classification of the genetic material fragment to be tested according to the efficiency prediction value;
and if the first efficiency classification is consistent with the second efficiency classification, determining that the expression efficiency result is the first efficiency classification.
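By way of non-limiting illustration, the consistency check of claims 3 and 4 can be sketched as follows. The two class labels ("high", "low"), the single threshold of 0.5 used to derive the second efficiency classification, and the choice to return `None` when the two classifications disagree are hypothetical; the claims only specify the outcome for the consistent case.

```python
from typing import Optional


def classify_by_value(predicted_value: float, threshold: float = 0.5) -> str:
    """Derive the second efficiency classification from the regression output.

    The labels and the threshold are illustrative assumptions.
    """
    return "high" if predicted_value >= threshold else "low"


def expression_result(first_class: str, predicted_value: float,
                      threshold: float = 0.5) -> Optional[str]:
    """Return the first classification when both models agree.

    Returning None on disagreement is an assumption; the claims do not
    say what happens in that case.
    """
    second_class = classify_by_value(predicted_value, threshold)
    return first_class if first_class == second_class else None
```

Here `first_class` stands for the output of the efficiency classification model and `predicted_value` for the output of the efficiency regression model, both obtained elsewhere.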
5. The method according to claim 3, wherein there are a plurality of preset genetic materials, the method further comprising:
arranging the plurality of preset genetic materials in descending order of the efficiency prediction value, and determining the preset genetic materials ranked within a preset number of top positions as a plurality of candidate genetic materials;
and determining, among the plurality of candidate genetic materials, each candidate genetic material whose first efficiency classification is consistent with a preset efficiency classification and whose expression efficiency result is the first efficiency classification as a target genetic material.
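By way of non-limiting illustration, the ranking and screening of claim 5 can be sketched as follows. The `Prediction` record, its field names, and the convention that `result` is `None` when no expression efficiency result was determined are hypothetical and serve only to make the sketch self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    name: str                 # identifier of a preset genetic material
    first_class: str          # first efficiency classification
    predicted_value: float    # efficiency prediction value
    result: Optional[str]     # expression efficiency result, None if undetermined


def select_targets(predictions: List[Prediction], top_k: int,
                   wanted_class: str) -> List[Prediction]:
    """Rank by predicted value, keep the top_k, then screen by class."""
    # Descending order of the efficiency prediction value.
    ranked = sorted(predictions, key=lambda p: p.predicted_value, reverse=True)
    candidates = ranked[:top_k]
    # Keep candidates whose first classification matches the preset class
    # and whose expression efficiency result equals that first classification.
    return [p for p in candidates
            if p.first_class == wanted_class and p.result == p.first_class]
```

In this sketch `top_k` plays the role of the preset number of top positions and `wanted_class` the role of the preset efficiency classification.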
6. The method of claim 1, wherein the training process of the expression efficiency detection model comprises:
obtaining a sample genetic material fragment and a sample efficiency result of the sample genetic material fragment;
concatenating a one-hot encoding sequence and a position matrix feature sequence of the sample genetic material fragment to obtain a sample feature sequence of the sample genetic material fragment;
training a preset initial model according to the sample feature sequence and the sample efficiency result to obtain the expression efficiency detection model.
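By way of non-limiting illustration, the feature construction of claim 6 can be sketched as follows. The claims do not define the position matrix feature here, so this sketch assumes one possible reading: a per-position base frequency table computed over equal-length sample fragments, from which each position contributes the frequency of the base actually observed there. The function names and the four-letter alphabet `ACGT` are likewise assumptions.

```python
from typing import Dict, List

BASES = "ACGT"


def one_hot(fragment: str) -> List[List[float]]:
    """One-hot encode each base; bases outside ACGT become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in fragment]


def position_frequencies(fragments: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies over equal-length fragments (assumed)."""
    length = len(fragments[0])
    table = []
    for i in range(length):
        column = [fragment[i] for fragment in fragments]
        table.append({b: column.count(b) / len(column) for b in BASES})
    return table


def sample_features(fragment: str,
                    table: List[Dict[str, float]]) -> List[List[float]]:
    """Concatenate, per position, the one-hot vector and the position feature."""
    encoded = one_hot(fragment)
    return [encoded[i] + [table[i].get(base, 0.0)]
            for i, base in enumerate(fragment)]
```

Each position then carries five values (four one-hot channels plus one position feature), which could be passed together with the sample efficiency result to whatever initial model is being trained.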
7. The method of claim 6, wherein the method further comprises:
clustering the sample genetic material fragments into a plurality of sample groups;
and determining a result difference of the sample efficiency results within each sample group, and deleting the sample group if the result difference is greater than a preset difference threshold.
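By way of non-limiting illustration, the sample screening of claim 7 can be sketched as follows. The sketch assumes that each sample already carries a group identifier produced by some clustering step (the clustering method is not fixed here) and that the result difference of a group is the gap between its largest and smallest sample efficiency result; both are assumptions, as is the function name `filter_groups`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, float, int]  # (fragment, sample efficiency result, group id)


def filter_groups(samples: List[Sample],
                  max_difference: float) -> Dict[int, List[Tuple[str, float]]]:
    """Drop every sample group whose efficiency results disagree too much."""
    groups: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for fragment, efficiency, group_id in samples:
        groups[group_id].append((fragment, efficiency))

    kept = {}
    for group_id, members in groups.items():
        values = [efficiency for _, efficiency in members]
        # Result difference assumed to be max minus min within the group.
        if max(values) - min(values) <= max_difference:
            kept[group_id] = members
    return kept
```

The intent is that near-identical fragments with widely differing measured efficiencies are treated as unreliable and excluded from training.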
8. A device for determining gene expression efficiency, comprising:
an intercepting module configured to intercept a preset genetic material to obtain a genetic material fragment to be tested that contains a promoter;
a processing module configured to input the genetic material fragment to be tested into a pre-trained expression efficiency detection model to obtain an expression efficiency result corresponding to the preset genetic material.
9. An electronic device, the electronic device comprising:
a processor and a memory;
the processor is adapted to perform the steps of the method according to any of claims 1 to 7 by invoking a program or instruction stored in the memory.
10. A computer readable storage medium storing a program or instructions for causing a computer to perform the steps of the method according to any one of claims 1 to 7.
CN202310659118.4A 2023-06-05 2023-06-05 Method, device, equipment and medium for determining gene expression efficiency Pending CN116705150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310659118.4A CN116705150A (en) 2023-06-05 2023-06-05 Method, device, equipment and medium for determining gene expression efficiency

Publications (1)

Publication Number Publication Date
CN116705150A true CN116705150A (en) 2023-09-05

Family

ID=87830614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310659118.4A Pending CN116705150A (en) 2023-06-05 2023-06-05 Method, device, equipment and medium for determining gene expression efficiency

Country Status (1)

Country Link
CN (1) CN116705150A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831630A (en) * 2024-03-05 2024-04-05 北京普译生物科技有限公司 Method and device for constructing training data set for base recognition model and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063416A (en) * 2018-07-23 2018-12-21 太原理工大学 Gene expression prediction method based on LSTM recurrent neural network
CN112837741A (en) * 2021-01-25 2021-05-25 浙江工业大学 Protein secondary structure prediction method based on recurrent neural network
CN113205230A (en) * 2021-05-31 2021-08-03 平安科技(深圳)有限公司 Data prediction method, device and equipment based on model set and storage medium
CN114283888A (en) * 2021-12-22 2022-04-05 山东大学 Differential expression gene prediction system based on hierarchical self-attention mechanism
CN114694756A (en) * 2020-12-31 2022-07-01 微软技术许可有限责任公司 Protein structure prediction
CN115019876A (en) * 2022-05-31 2022-09-06 清华大学 Gene expression prediction method and device
CN115579060A (en) * 2022-12-08 2023-01-06 国家超级计算天津中心 Gene locus detection method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAO CHENG et al.: "A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets", GENOME BIOLOGY, 16 January 2011 (2011-01-16), pages 2 - 3 *
杨科利; 许强: "Drosophila promoter prediction based on increment of diversity combined with a support vector machine", Biotechnology (生物技术), no. 02, 15 April 2008 (2008-04-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831630A (en) * 2024-03-05 2024-04-05 北京普译生物科技有限公司 Method and device for constructing training data set for base recognition model and electronic equipment
CN117831630B (en) * 2024-03-05 2024-05-17 北京普译生物科技有限公司 Method and device for constructing training data set for base recognition model and electronic equipment

Similar Documents

Publication Publication Date Title
Hassan et al. Evaluation of computational techniques for predicting non-synonymous single nucleotide variants pathogenicity
JP7319197B2 (en) Methods for Aligning Target Nucleic Acid Sequencing Data
CN112235327A (en) Abnormal log detection method, device, equipment and computer readable storage medium
CN116705150A (en) Method, device, equipment and medium for determining gene expression efficiency
CN116432184A (en) Malicious software detection method based on semantic analysis and bidirectional coding characterization
CN110491443B (en) lncRNA protein correlation prediction method based on projection neighborhood non-negative matrix decomposition
CN114139636B (en) Abnormal operation processing method and device
US20210398605A1 (en) System and method for promoter prediction in human genome
CN104462870A (en) Method and device for identifying human gene promoter
CN107516020B (en) Method, device, equipment and storage medium for determining importance of sequence sites
CN115579060B (en) Gene locus detection method, device, equipment and medium
CN116153396A (en) Non-coding variation prediction method based on transfer learning
CN111048145A (en) Method, device, equipment and storage medium for generating protein prediction model
WO2023183422A1 (en) Identifying genome features in health and disease
CN113704464B (en) Construction method and system of time-evaluation composition material corpus based on network news
CN109308934A A gene regulatory network construction method based on integrated feature importance and the chicken swarm algorithm
CN112634947B (en) Animal voice and emotion feature set sequencing and identifying method and system
CN111951889B (en) Recognition prediction method and system for M5C locus in RNA sequence
CN108595914A A high-precision method for predicting tobacco mitochondrial RNA (mt RNA) editing sites
CN114300036A (en) Genetic variation pathogenicity prediction method and device, storage medium and computer equipment
KR102376212B1 (en) Gene expression marker screening method using neural network based on gene selection algorithm
KR101128425B1 (en) Methods for providing information for an inhibition prediction of hERG channel
CN107622184B (en) Evaluation method for amino acid reliability and modification site positioning
KR20210050362A (en) Ensemble pruning method, ensemble model generation method for identifying programmable nucleases and apparatus for the same
CN111009287B (en) SLiMs prediction model generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination