CN115512396B

CN115512396B - Method and system for predicting anti-cancer peptide and antibacterial peptide based on deep neural network

Info

Publication number: CN115512396B
Application number: CN202211352672.XA
Authority: CN
Inventors: 柳军涛; 周婉芸; 刘雨菲
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-11-01
Filing date: 2022-11-01
Publication date: 2023-04-07
Anticipated expiration: 2042-11-01
Also published as: CN115512396A

Abstract

The invention discloses a method and a system for predicting anti-cancer peptides and antibacterial peptides based on a deep neural network, belonging to the technical field of peptide identification. The method comprises the following steps: obtaining a peptide sequence; extracting fingerprint information, evolution information and physicochemical property information of the peptide sequence; and obtaining a peptide identification result from the fingerprint information, the evolution information, the physicochemical property information and a trained peptide sequence identification model. The peptide sequence identification model comprises a first feature extraction network, a second feature extraction network and a third feature extraction network: the first feature extraction network extracts fingerprint features from the fingerprint information, the second feature extraction network extracts evolution features from the evolution information, and the third feature extraction network extracts physicochemical property features from the physicochemical property information. The fingerprint features, the evolution features and the physicochemical property features are fused to obtain fusion information, and the fusion information is identified to obtain the peptide identification result. The accuracy of the peptide identification result is thereby improved.

Description

Method and system for predicting anti-cancer peptide and antibacterial peptide based on deep neural network

Technical Field

The invention relates to the technical field of peptide prediction, in particular to a method and a system for predicting anti-cancer peptides and antibacterial peptides based on a deep neural network.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

The antibacterial peptide and the anticancer peptide are bioactive peptides consisting of a plurality of amino acids, the antibacterial peptide can solve the problem of drug effect reduction caused by drug resistance generated by bacterial pathogens, and the anticancer peptide effectively controls the drug resistance of cancer cells to anticancer drugs, thereby improving the curative effect of the drugs. Therefore, accurate prediction of anticancer and antibacterial peptides plays an important role in the treatment of cancer and the design of antibacterial drugs.

Existing anti-cancer peptide and antibacterial peptide prediction techniques mainly consist of two parts, feature extraction and model prediction, and most of them train a model by simply combining an existing sequence feature extraction method with a deep learning network. Although these methods achieve some predictive performance, they have three limitations:

1. In terms of feature extraction, no feature is available that represents the global information of a peptide chain. In addition, specific physicochemical properties are often used directly when extracting physicochemical property features, which leads to redundant, low-quality sequence feature information.

2. No machine learning or deep learning method is designed specifically for each kind of feature. Many models process multiple features with the same or similar neural networks, so the sequence feature information is not used appropriately.

3. Conventional neural network training randomly divides the data into a training set and a verification set to obtain the final model, and cannot exploit the preference of the neural network for the existing data set to divide the training set and the verification set reasonably. Because the division is random, different divisions yield widely varying results on the same test set, so the prediction performance of the final network model is extremely unstable.

Due to the reasons, the existing identification methods for the anti-cancer peptides and the antibacterial peptides have low identification accuracy and poor generalization capability.

Disclosure of Invention

In order to solve the problems, the invention provides a method and a system for predicting an anti-cancer peptide and an anti-bacterial peptide based on a deep neural network, and the accuracy of peptide sequence identification is improved.

In order to realize the purpose, the invention adopts the following technical scheme:

In a first aspect, a method for predicting an anticancer peptide and an antimicrobial peptide based on a deep neural network is provided, which includes:

obtaining a peptide sequence;

determining the physicochemical properties of each amino acid in the peptide sequence;

extracting evolution information of the peptide sequence, and extracting fingerprint information and physicochemical property information of the peptide sequence according to the physicochemical property of the amino acid;

obtaining a peptide identification result from the fingerprint information, the evolution information, the physicochemical property information and a trained peptide sequence identification model, wherein the peptide sequence identification model comprises a first feature extraction network, a second feature extraction network and a third feature extraction network; the first feature extraction network extracts fingerprint features from the fingerprint information, the second feature extraction network extracts evolution features from the evolution information, and the third feature extraction network extracts physicochemical property features from the physicochemical property information; the fingerprint features, the evolution features and the physicochemical property features are fused to obtain fusion information, and the fusion information is identified to obtain the peptide identification result.

In a second aspect, a deep neural network based anticancer and antibacterial peptide prediction system is provided, comprising:

a data acquisition module for acquiring a peptide sequence;

the information extraction module is used for extracting evolution information of the peptide sequence, determining the physicochemical property of each amino acid in the peptide sequence, and extracting fingerprint information and physicochemical property information of the peptide sequence according to the physicochemical property of the amino acid;

and the identification module is used for acquiring a peptide identification result through the fingerprint information, the evolution information, the physicochemical property information and the trained peptide sequence identification model, wherein the peptide sequence identification model comprises a first feature extraction network, a second feature extraction network and a third feature extraction network, the first feature extraction network extracts fingerprint features from the fingerprint information, the second feature extraction network extracts evolution features from the evolution information, the third feature extraction network extracts physicochemical property features from the physicochemical property information, the fingerprint features, the evolution features and the physicochemical property features are fused to acquire fused information, and the fused information is identified to acquire the peptide identification result.

In a third aspect, an electronic device is provided, which includes a memory and a processor, and computer instructions stored in the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method for predicting an anticancer peptide and an antimicrobial peptide based on a deep neural network.

In a fourth aspect, a computer readable storage medium is provided for storing computer instructions, which when executed by a processor, perform the steps of a deep neural network-based method for predicting anticancer and antibacterial peptides.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention obtains comprehensive feature information of the peptide sequence by extracting its evolution information, fingerprint information and physicochemical property information, fuses the three kinds of features, and obtains the peptide identification result by identifying the fused information, which improves the accuracy of the peptide identification result.

2. The invention extracts the characteristics of the evolution information, the fingerprint information and the physicochemical property information through different characteristic extraction networks, can extract effective characteristics from each kind of information, realizes the reasonable utilization of the information, and thus effectively ensures the accuracy of the peptide identification result.

3. When the peptide sequence recognition model is trained, firstly, the training set and the verification set are randomly divided, then the training set and the verification set are reasonably divided according to the preference of the peptide sequence recognition model, and the peptide sequence recognition model is trained through the training set and the verification set, so that the recognition accuracy of the trained peptide sequence recognition model can be improved, and the accuracy of a peptide recognition result is further ensured.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, and the description of the exemplary embodiments and illustrations of the application are intended to explain the application and are not intended to limit the application.

FIG. 1 is a flow diagram of the method disclosed in Example 1;

FIG. 2 is a block diagram of the first feature extraction network disclosed in Example 1;

FIG. 3 is a block diagram of the second feature extraction network disclosed in Example 1;

FIG. 4 is a block diagram of the third feature extraction network disclosed in Example 1.

Detailed Description

The invention is further described with reference to the following figures and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

Example 1

In order to improve the accuracy of the peptide identification result, in this embodiment, a method for predicting anticancer peptides and antibacterial peptides based on a deep neural network is disclosed, as shown in fig. 1, including:

S1: obtaining a peptide sequence;

S2: determining the physicochemical properties of each amino acid in the peptide sequence; extracting evolution information of the peptide sequence, and extracting fingerprint information and physicochemical property information of the peptide sequence according to the physicochemical properties of the amino acids;

S3: obtaining a peptide identification result from the fingerprint information, the evolution information, the physicochemical property information and a trained peptide sequence identification model, wherein the peptide sequence identification model comprises a first feature extraction network, a second feature extraction network and a third feature extraction network; the first feature extraction network extracts fingerprint features from the fingerprint information, the second feature extraction network extracts evolution features from the evolution information, and the third feature extraction network extracts physicochemical property features from the physicochemical property information; the fingerprint features, the evolution features and the physicochemical property features are fused to obtain fusion information, and the fusion information is identified to obtain the peptide identification result.

In particular implementations, 158 physicochemical properties of each amino acid in the peptide sequence were determined, the 158 physicochemical properties being determined by a physicochemical properties database (AAindex).

The evolution information of the peptide sequence is represented by constructing a PSSM matrix of the peptide sequence. Specifically, a position-specific scoring matrix (PSSM) of the peptide sequence is obtained using PSI-BLAST. Given the 20 amino acid types, the evolution information obtained for the peptide sequence is an $L \times 20$ PSSM feature matrix, where $L$ is the length of the peptide sequence, and the $(i, j)$-th element of the matrix is proportional to the probability that position $i$ is substituted by amino acid $j$.

According to the physicochemical properties of amino acids, the process of extracting the fingerprint information of the peptide sequence comprises the following steps:

s21: CGR curves of peptides were constructed according to the physicochemical properties of amino acids. The method comprises the following specific steps:

sequencing all amino acids in a peptide sequence according to the numerical values of the physicochemical properties, and uniformly mapping all the amino acids on a unit circle to construct a CGR curve of the peptide, wherein the coordinates of the unit circle are as follows:

wherein,

representing the constituent peptide sequenceTo the first of the columniAnd (3) amino acid.

For the firstiAmino acid of each amino acid

Its coordinates in the unit circle are:

wherein,Nindicates the length of the peptide sequence.

For each peptide sequence, 158 kinds of CGR curves were obtained using 158 kinds of physicochemical properties selected in the physicochemical property database (AAindex).

S22: The CGR curve is divided into a plurality of sub-blocks, and the points on the boundaries of adjacent sub-blocks are determined; the divided CGR curve is rotated to obtain the corresponding points of the rotated sub-blocks and the points on the boundaries of adjacent sub-blocks after rotation; the Euclidean distances between the points on adjacent boundaries and between the corresponding points after rotation are calculated to form distance matrices.

Preferably, the CGR curve is divided into four sub-blocks according to the four quadrants of the coordinate system. To make use of the position-related information of points on the boundaries of adjacent sub-blocks, the coordinate axes are rotated by 45° to obtain four rotated sub-blocks. For each CGR curve, the Euclidean distances between all point pairs within each of the eight sub-blocks are calculated to obtain eight distance matrices.
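The partitioning in S22 can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the patented procedure itself: the CGR points are taken as given `(x, y)` pairs, the eight sub-blocks are the four quadrants of the original axes plus the four quadrants of axes rotated by 45°, and the special handling of boundary points described above is not modelled. All function names are hypothetical.

```python
import math


def quadrant(x, y):
    """Return a quadrant index 0..3; points on an axis are assigned by the >= tests."""
    if x >= 0 and y >= 0:
        return 0
    if x < 0 and y >= 0:
        return 1
    if x < 0 and y < 0:
        return 2
    return 3


def rotate_axes(points, degrees):
    """Express the points in a coordinate system whose axes are rotated by `degrees`."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    # Rotating the axes by +t is equivalent to rotating the points by -t.
    return [(c * x + s * y, -s * x + c * y) for x, y in points]


def distance_matrix(points):
    """Pairwise Euclidean distances between the points of one sub-block."""
    return [[math.dist(p, q) for q in points] for p in points]


def eight_distance_matrices(points):
    """Four quadrant sub-blocks in the original frame, four in the rotated frame."""
    blocks = []
    for frame in (points, rotate_axes(points, 45.0)):
        groups = [[], [], [], []]
        for p in frame:
            groups[quadrant(*p)].append(p)
        blocks.extend(groups)
    return [distance_matrix(g) for g in blocks]
```

A sub-block that happens to contain no points yields an empty matrix here; how such cases are treated is not stated in the text.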

S23: The principal eigenvalue of each distance matrix is extracted, and the principal eigenvalues form the peptide sequence fingerprint information.

Calculating the principal eigenvalue of each distance matrix yields the feature matrix $M_{DCGR}$, in which each row holds the principal eigenvalues of the eight distance matrices of one CGR curve, so that the 158 CGR curves give a $158 \times 8$ matrix.
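One way to compute such a principal eigenvalue is power iteration, sketched below. The sketch assumes that "principal eigenvalue" means the eigenvalue of largest magnitude, which for a symmetric, non-negative distance matrix is the largest eigenvalue; the text does not say which numerical method is actually used, and the function name is hypothetical.

```python
import math


def principal_eigenvalue(matrix, iterations=500, tol=1e-12):
    """Estimate the dominant eigenvalue of a symmetric non-negative matrix."""
    n = len(matrix)
    if n == 0:
        return 0.0  # empty sub-block: no points, no eigenvalue to report
    v = [1.0 / math.sqrt(n)] * n  # unit-length start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # zero matrix, e.g. a sub-block with a single point
        # Rayleigh quotient v^T A v (v has unit length).
        new_eigenvalue = sum(x * y for x, y in zip(v, w))
        v = [x / norm for x in w]
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue
        eigenvalue = new_eigenvalue
    return eigenvalue
```

Applying this function to each of the eight distance matrices of one CGR curve gives one eight-element row of the feature matrix.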

According to the physicochemical property of amino acid, the process of extracting the physicochemical property information of the peptide sequence comprises the following steps:

clustering all physicochemical properties in the physicochemical property database, and extracting the most representative property in each cluster as the representative physicochemical property of the amino acid;

and extracting the representative physicochemical property of the amino acid from the physicochemical property of each amino acid of the peptide sequence to obtain the physicochemical property information of the peptide sequence.

Compared with conventional methods, which usually select specific physicochemical properties directly, this embodiment first divides the 556 physicochemical properties in AAindex into 8 clusters in order to avoid redundancy and obtain a more comprehensive distribution of physicochemical properties.

The most representative physicochemical property of each cluster is extracted, giving eight representative physicochemical properties, and each amino acid is encoded as an eight-dimensional vector by physicochemical property embedding (PCPE).

An $L \times 8$ feature matrix of the peptide sequence is constructed from the encoded vector of each amino acid in the peptide sequence, giving the physicochemical property information of the peptide sequence.

Before the fingerprint information, the evolution information and the physicochemical property information are input into the trained peptide sequence identification model, the lengths of the evolution information and the physicochemical property information of the peptide sequence are unified to suit the subsequent peptide sequence identification model.

Specifically, the length of the information may be uniformly set to 50, and if the length is insufficient, the information is padded with 0.
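The length unification can be sketched as below for any per-residue feature matrix (for example the $L \times 20$ PSSM matrix or the $L \times 8$ physicochemical matrix). The text only specifies zero padding for short sequences; truncating sequences longer than 50 is an assumption of this sketch, and the function name is hypothetical.

```python
def pad_rows(matrix, width, target_len=50):
    """Return a copy of an L x width matrix with exactly target_len rows."""
    # Assumption: rows beyond target_len are dropped (not stated in the text).
    rows = [list(row) for row in matrix[:target_len]]
    # Short sequences are padded with all-zero rows, as the text describes.
    rows.extend([0.0] * width for _ in range(target_len - len(rows)))
    return rows
```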

The peptide sequence recognition model is described in detail below.

In order to extract effective features from fingerprint information, evolution information and physicochemical property information, a first feature extraction network, a second feature extraction network and a third feature extraction network are arranged in a peptide sequence identification model, fingerprint features are extracted from the fingerprint information through the first feature extraction network, evolution features are extracted from the evolution information through the second feature extraction network, and physicochemical property features are extracted from the physicochemical property information through the third feature extraction network.

The first feature extraction network adopts a multi-channel convolutional neural network with an added channel attention mechanism. As shown in FIG. 2, the fingerprint information $M_{DCGR}$ is input into the first feature extraction network, which captures important features through local connectivity and weight sharing.

Since each row of the feature matrix $M_{DCGR}$ corresponds to one of the 158 physicochemical properties and the columns hold the 8-dimensional features extracted from one CGR curve, the model learns weights shared across the 158 physicochemical properties using a convolution kernel of size 1 × 8, which is more appropriate than a general square convolution kernel. In the present invention, the number of filters is set to 16.

Global information is obtained by comprehensively considering all 158 attributes using a Channel Attention Module (CAM).

The specific steps are as follows:

(1) A three-dimensional feature map is obtained by a convolution layer: $M'_{DCGR} = f_{conv}(M_{DCGR})$.

(2) Global average pooling and max pooling are performed on $M'_{DCGR}$, and a multi-layer perceptron (MLP) with shared weights, consisting of two fully-connected layers, is then used to obtain the importance of each channel $i$ of $M'_{DCGR}$ for classification, which is assigned to that channel as the channel weight $CAM_i$.

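The overall process of the CAM is represented as:

$$CAM = \sigma\left(W_1\left(W_0\left(F_{avg}\right)\right) + W_1\left(W_0\left(F_{max}\right)\right)\right)$$

where $\sigma$ denotes the sigmoid function, $F_{avg}$ and $F_{max}$ are computed from $M'_{DCGR}$ by average pooling and max pooling respectively, and $W_0$ and $W_1$ denote the weight matrices of the shared MLP.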

The channel weights are applied to the corresponding channels of the three-dimensional feature map $M'_{DCGR}$ by element-wise multiplication to obtain a weighted feature map.

The weighted feature map is flattened and passed through a fully-connected layer to generate the final 400-dimensional DCGR feature vector.
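The channel attention step can be illustrated with a small pure-Python sketch. It assumes the feature map is given as a list of channels, each a flat list of values, and that a ReLU sits between the two layers of the shared MLP (a common choice that the text does not confirm). The names `channel_attention`, `w0` and `w1` are hypothetical, and the convolution and final fully-connected layer are left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def shared_mlp(vector, w0, w1):
    """Two fully-connected layers; the ReLU in between is an assumption."""
    hidden = [max(0.0, h) for h in matvec(w0, vector)]
    return matvec(w1, hidden)


def channel_attention(feature_map, w0, w1):
    """Return per-channel weights and the re-weighted feature map."""
    avg = [sum(ch) / len(ch) for ch in feature_map]  # global average pooling
    mx = [max(ch) for ch in feature_map]             # global max pooling
    # The same MLP is applied to both pooled vectors and the results are summed.
    summed = zip(shared_mlp(avg, w0, w1), shared_mlp(mx, w0, w1))
    weights = [sigmoid(a + m) for a, m in summed]
    # Scale every value of a channel by that channel's weight.
    weighted = [[wt * x for x in ch] for wt, ch in zip(weights, feature_map)]
    return weights, weighted
```

With 16 filters, `feature_map` would hold 16 channels, `w0` would map 16 values to a smaller hidden size, and `w1` would map back to 16.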

The second feature extraction network adopts a bidirectional long short-term memory network (Bi-LSTM). As shown in FIG. 3, the evolution information is input into the Bi-LSTM.

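The calculation of the forward LSTM is summarized as follows:

$$\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t \\
h_t &= o_t \otimes \tanh\left(c_t\right)
\end{aligned}$$

where the $W$ terms are weight matrices and the $b$ terms are bias vectors; $f_t$ is the forget gate; $i_t$ is the input gate; $o_t$ is the output gate; $x_t$ is the current input; $c_{t-1}$ is the previous cell state; $c_t$ is the current cell state; $\tilde{c}_t$ is the new value added to the cell state; $h_{t-1}$ and $h_t$ are the previous and current hidden states, respectively; and $\otimes$ denotes element-by-element multiplication.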

The backward LSTM works on the same principle as the forward LSTM, and the current hidden state of the Bi-LSTM is calculated from the forward and backward hidden states.

The resulting PSSM feature vector is 256-dimensional and is taken at the last time step.
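A single LSTM step and a bidirectional pass can be sketched as follows, using the standard LSTM gate equations. Several points are assumptions of the sketch rather than statements of the text: each gate acts on the concatenation of the previous hidden state and the current input, the two directions are combined by concatenating their final hidden states, and the parameter layout (`params` mapping `"f"`, `"i"`, `"c"`, `"o"` to a weight matrix and bias) is hypothetical. Under the concatenation assumption, a hidden size of 128 per direction would give the 256 dimensions mentioned above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; each gate sees the concatenation [h_prev, x_t]."""
    z = h_prev + x_t  # list concatenation
    f_t = [sigmoid(v) for v in affine(*params["f"], z)]        # forget gate
    i_t = [sigmoid(v) for v in affine(*params["i"], z)]        # input gate
    o_t = [sigmoid(v) for v in affine(*params["o"], z)]        # output gate
    c_tilde = [math.tanh(v) for v in affine(*params["c"], z)]  # candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run_lstm(sequence, params, hidden_size):
    """Run the LSTM over a sequence and return the last hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return h


def bilstm_final(sequence, fwd_params, bwd_params, hidden_size):
    """Assumption: forward and backward final states are concatenated."""
    forward = run_lstm(sequence, fwd_params, hidden_size)
    backward = run_lstm(sequence[::-1], bwd_params, hidden_size)
    return forward + backward
```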

The third feature extraction network adopts a multi-head self-attention network (Transformer network), which takes as input the feature matrix obtained by PCPE. Since the order of the residues plays a crucial role in a peptide sequence, this example uses sine and cosine position encoding to reflect the distribution of physicochemical properties along the peptide sequence, as follows:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right)$$

where $pos$ indicates the amino acid position in the peptide sequence, $2i$ and $2i+1$ denote the even and odd element positions of the embedding vector, and $d$ is the dimension of the embedding vector.

A new feature matrix incorporating the position encoding information is obtained, in which the $i$-th row is the feature vector of the $i$-th residue, and $L = 50$ denotes the peptide chain length.
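Sine and cosine position encoding can be sketched as below. The base constant 10000 is the value used in the standard Transformer formulation and is assumed here, since the text does not state it. Adding the encoding to the feature matrix element-wise is likewise the conventional choice, and both function names are hypothetical.

```python
import math


def positional_encoding(length, dim):
    """Sinusoidal encoding: sine at even element positions, cosine at odd ones."""
    pe = [[0.0] * dim for _ in range(length)]
    for pos in range(length):
        for k in range(dim):
            # k - k % 2 equals 2i for both element positions 2i and 2i + 1.
            angle = pos / (10000 ** ((k - k % 2) / dim))
            pe[pos][k] = math.sin(angle) if k % 2 == 0 else math.cos(angle)
    return pe


def add_position(features):
    """Add the encoding element-wise to an L x d feature matrix."""
    pe = positional_encoding(len(features), len(features[0]))
    return [[x + p for x, p in zip(row, prow)]
            for row, prow in zip(features, pe)]
```

For the physicochemical branch described here, `features` would be the padded 50 × 8 matrix.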

As shown in FIG. 4, the new feature matrix $X$ is input into the encoder of the Transformer. Specifically:

The dependency between amino acid residues at any distance is extracted using a single-head self-attention mechanism, so that the physicochemical property distribution information of the peptide is obtained effectively. The calculation is as follows:

$$Q = XW^{Q}, \qquad K = XW^{K}, \qquad V = XW^{V}$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $\mathrm{softmax}\left(QK^{T}/\sqrt{d_k}\right)$ is the attention score matrix; $Q$, $K$ and $V$ denote the query, key and value, respectively; $d_k$ is the dimension of the key vectors; and $W^{Q}$, $W^{K}$ and $W^{V}$ are the corresponding weight matrices.

Average pooling is applied to the feature matrix output by the encoder to obtain the 50-dimensional PCPE feature vector of the peptide chain.
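Scaled dot-product self-attention followed by average pooling can be sketched as follows. The sketch covers a single attention head only and omits the other parts of a Transformer encoder (residual connections, normalization, feed-forward layer). Pooling over the feature dimension of each residue is an assumption, inferred from the 50-dimensional output matching the sequence length of 50. All names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    attn = [softmax(row) for row in scores]  # one distribution per residue
    return matmul(attn, v)


def average_pool(rows):
    """Assumption: average each residue's feature vector to a single value."""
    return [sum(r) / len(r) for r in rows]
```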

Fusing the fingerprint characteristics, the evolutionary characteristics and the physicochemical property characteristics to obtain fusion information, identifying the fusion information to obtain a peptide identification result, specifically:

by batch normalization, 400-dimensional, 256-dimensional, and 50-dimensional feature vectors are obtained as the output of each branch.

The feature vectors output by the branches are concatenated and passed through a Dropout layer and a fully-connected layer to obtain the final prediction result for the anticancer peptide or the antibacterial peptide; the network is trained by back propagation.
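The fusion step can be sketched as a concatenation followed by one fully-connected output unit. The sigmoid output and the single output unit are assumptions of this sketch, the Dropout layer is omitted because it is inactive at prediction time, and the name `fuse_and_predict` is hypothetical.

```python
import math


def fuse_and_predict(f_dcgr, f_pssm, f_pcpe, weights, bias):
    """Concatenate the three branch outputs and apply one output unit."""
    fused = list(f_dcgr) + list(f_pssm) + list(f_pcpe)
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    # Assumption: a sigmoid turns the logit into a probability for binary output.
    return fused, 1.0 / (1.0 + math.exp(-logit))
```

With the branch sizes given above, the fused vector has 400 + 256 + 50 = 706 elements.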

The process of obtaining the trained peptide sequence recognition model is as follows:

acquiring a training set and a verification set for each training, training the constructed peptide sequence recognition model through the training set for each training, and verifying the training effect of the peptide sequence recognition model through the verification set for each training;

From the verification set used for the current training, samples that the model mispredicted more than a set number of times during the last set number of training rounds are selected to form a screened verification set; from the training set used for the current training, samples that were classified correctly in the last set number of training rounds are selected to form a screened training set. Samples are selected from the screened verification set to form a verification-set exchange sample set, and samples are selected from the screened training set to form a training-set exchange sample set. The two exchange sample sets are swapped between the verification set and the training set used for the current training, forming a new verification set and a new training set, which serve as the verification set and the training set for the next training.

The process of obtaining the training set and the verification set for the first training is as follows:

acquiring a peptide sequence for training, and labeling the peptide sequence for training;

the evolution information, fingerprint information and physicochemical property information of the training peptide sequence are extracted from the training peptide sequence to form a training data set, and the training data set is randomly divided into a training set and a verification set, namely the training set and the verification set for the first training.

And training the constructed peptide sequence recognition model through a training set and a verification set for the last training, and obtaining the trained peptide sequence recognition model after the training is finished.

In a specific implementation:

S31: Peptide sequences for training are acquired and labeled;

S32: The training data set is randomly divided into a training set $T$ and a verification set $V$, which form the training set and the verification set for the first training. Each set consists of the features of its samples together with the corresponding sample labels.

S33: training the constructed peptide sequence recognition model through a training set, and verifying the training effect of the peptide sequence recognition model through a verification set;

S34: In the verification set $V$, samples that were misclassified more than 5 times during the last 10 rounds of training are found to generate the screened verification set. Meanwhile, samples in the training set $T$ that were classified correctly in the last 10 rounds of training are found to generate the screened training set. $[k/2]$ samples are randomly selected from the screened verification set to form the sample set $V_{change}$, and $[k/2]$ samples are randomly selected from the screened training set to form the sample set $T_{change}$. The sample set $T_{change}$ in the training set $T$ and the sample set $V_{change}$ in the verification set $V$ are exchanged to construct a new training set $T_{new}$ and a new verification set $V_{new}$, i.e. $T_{new} = T - T_{change} + V_{change}$ and $V_{new} = V - V_{change} + T_{change}$.

S35: S33 and S34 are repeated on the two new sets $T_{new}$ and $V_{new}$ to obtain the final training set and verification set.

S36: The peptide sequence recognition model is reinitialized and trained with the final training set and verification set to obtain the trained peptide sequence recognition model.
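One exchange of step S34 can be sketched with sample identifiers and sets. The sketch makes several assumptions that the text leaves open: $[k/2]$ is read as integer division, the same number of samples is taken from both sides so the set sizes stay fixed, and the error counts and the set of correctly classified training samples are supplied by the caller. The function and parameter names are hypothetical.

```python
import random


def tvi_exchange(train_ids, val_ids, val_errors, train_correct, k,
                 max_errors=5, seed=0):
    """Swap hard validation samples with well-learned training samples.

    val_errors maps a validation id to its error count over the last rounds;
    train_correct is the set of training ids classified correctly in them.
    """
    rng = random.Random(seed)
    # Screened sets, sorted so that sampling is reproducible for a given seed.
    v_screened = sorted(s for s in val_ids if val_errors.get(s, 0) > max_errors)
    t_screened = sorted(s for s in train_ids if s in train_correct)
    # Assumption: take the same number from each side, at most k // 2.
    n = min(k // 2, len(v_screened), len(t_screened))
    v_change = set(rng.sample(v_screened, n))
    t_change = set(rng.sample(t_screened, n))
    # T_new = T - T_change + V_change, V_new = V - V_change + T_change.
    t_new = (set(train_ids) - t_change) | v_change
    v_new = (set(val_ids) - v_change) | t_change
    return t_new, v_new
```

Repeating this exchange after each block of training rounds, then retraining a freshly initialized model on the final sets, corresponds to steps S35 and S36.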

The prediction method provided by the embodiment can effectively fuse sequence information in multiple aspects, and on the basis, a three-branch neural network model framework (TriNet) is designed according to the characteristics of the three characteristics, so that each characteristic is properly processed and effectively fused for final prediction.

On the ACP740 data set, using five-fold cross validation, TriNet is compared with ACP-DL, MHCNN, iACP-DRLF, CL-ACP and DeepACPpred, and the percentage improvements in accuracy, sensitivity, specificity, precision, F1 score and Matthews correlation coefficient are 3.2%-8.6%, 1.9%-6.9%, 3.2%-21.5%, 3.0%-12.0% and 3.2%-6.9%, respectively. On the ACPmain data set, using five-fold cross validation, TriNet is compared with ACP-DL, MHCNN, iACP-DRLF, AntiCP 2.0-AAC and AntiCP 2.0-DPC, and the percentage improvements in accuracy, sensitivity, specificity, precision, F1 score and Matthews correlation coefficient are 9.3%-23.7%, 15.3%-47.3%, 3.6%-13.6%, 5.6%-14.3%, 10.4%-28.2% and 25.4%-73.1%, respectively. On the AMP2828 data set, using its independent test set, the TriNet of the invention is compared with ACP-DL, MHCNN, iACP-DRLF, AntiCP 2.0-AAC and AntiCP 2.0-DPC, and the percentage improvements in accuracy, sensitivity, specificity, precision, F1 score and Matthews correlation coefficient are 6.1%-29.5%, 0.7%-11.0%, 1.8%-79.4%, 2.3%-44.6%, 6.7%-22.8% and 13.2%-67.4%, respectively.

In addition to the commonly used evolutionary features, this example introduces two new features, sequence fingerprint information and physicochemical property information, to characterize the global sequence information and the physicochemical property distribution of polypeptides. Test results show that the proposed features are widely applicable and effective. Specifically, the feature matrices of the branches are flattened and concatenated, and the traditional machine learning method XGBoost is used for prediction in place of the neural network structure. On the ACP740 data set, the percentage improvements in accuracy and Matthews correlation coefficient are 0.47%-5.8% and 0.28%-15.3%, respectively; on the ACPmain data set, they are 4.0%-17.7% and 11.3%-53.8%, respectively; on the AMP2828 data set, they are 4.3%-27.2% and 9.1%-61.2%, respectively.

In addition, this embodiment provides a novel TVI training method, which generates a more suitable training set and verification set according to the preference of the network model. Compared with the traditional training method, on the ACP740 data set the average percentage improvements in accuracy, sensitivity, specificity, precision, F1 score and Matthews correlation coefficient are 2.0%, 2.1%, 1.9%, 1.9%, 2.0% and 4.6%, respectively, and the maximum improvements are 3.9%, 6.6%, 8.2%, 6.6%, 4.4% and 9.1%; on the ACPmain data set, the average improvements are 1.7%, 2.3%, 1.0%, 1.1%, 1.7% and 4.5%, respectively, and the maximum improvements are 4.8%, 6.0%, 6.3%, 5.6%, 4.8% and 12.8%; on the AMP2828 data set, the average improvements are 0.5%, 0.3%, 0.7%, 0.7%, 0.5% and 1.0%, respectively, and the maximum improvements are 1.1%, 1.1%, 2.9%, 2.9%, 1.1% and 2.2%.
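The six evaluation metrics reported above are not defined in the text. The sketch below uses their standard definitions for binary classification, computed from the counts of true positives, false positives, true negatives and false negatives. Returning 0.0 when a denominator is zero is a convention chosen for this sketch, and the function name is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # convention: 0.0 for empty denominators

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on the positive class
        "specificity": ratio(tn, tn + fp),   # recall on the negative class
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```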

Example 2

In this embodiment, disclosed is a deep neural network-based anticancer and antibacterial peptide prediction system, comprising:

a data acquisition module for acquiring a peptide sequence;

the identification module is used for obtaining a peptide identification result through fingerprint information, evolution information, physicochemical property information and a trained peptide sequence identification model, wherein the peptide sequence identification model comprises a first feature extraction network, a second feature extraction network and a third feature extraction network, the first feature extraction network extracts fingerprint features from the fingerprint information, the second feature extraction network extracts evolution features from the evolution information, the third feature extraction network extracts physicochemical property features from the physicochemical property information, the fingerprint features, the evolution features and the physicochemical property features are fused to obtain fused information, and the fused information is identified to obtain the peptide identification result.

Example 3

In this embodiment, an electronic device is disclosed, comprising a memory, a processor, and computer instructions stored in the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method for predicting anti-cancer peptides and antibacterial peptides based on a deep neural network disclosed in Example 1.

Example 4

In this embodiment, a computer-readable storage medium is disclosed for storing computer instructions which, when executed by a processor, perform the steps of the method for predicting anti-cancer peptides and antibacterial peptides based on a deep neural network disclosed in Example 1.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the embodiments without departing from the spirit and scope of the invention, and all such modifications and equivalents fall within the scope of the claims.

Claims

1. A method for predicting anti-cancer peptides and antibacterial peptides based on a deep neural network is characterized by comprising the following steps:

obtaining a peptide sequence;

extracting evolution information of the peptide sequence, and extracting fingerprint information and physicochemical property information of the peptide sequence according to the physicochemical properties of the amino acids, wherein the evolution information of the peptide sequence is represented by constructing a PSSM matrix of the peptide sequence; constructing a CGR curve of the peptide according to the physicochemical properties of the amino acids; dividing the CGR curve into a plurality of sub-blocks, and determining the points on the boundaries of adjacent sub-blocks; rotating the partitioned CGR curve to obtain the corresponding points of the rotated sub-blocks and the points on the boundaries of adjacent sub-blocks after rotation; calculating the Euclidean distances between the points on two adjacent boundaries and the Euclidean distances between the corresponding points of two adjacent rotated sub-blocks to form distance matrices; extracting the principal eigenvalues of each distance matrix to form the fingerprint information of the peptide sequence; clustering all physicochemical properties in a physicochemical property database, and extracting the most representative property in each cluster as a representative physicochemical property of the amino acids; extracting, for each amino acid of the peptide sequence, the representative physicochemical properties from its physicochemical properties to obtain the physicochemical property information of the peptide sequence;

obtaining a peptide identification result through fingerprint information, evolution information, physicochemical property information and a trained peptide sequence identification model, wherein the peptide sequence identification model comprises a first feature extraction network, a second feature extraction network and a third feature extraction network, the first feature extraction network extracts fingerprint features from the fingerprint information, the second feature extraction network extracts evolution features from the evolution information, the third feature extraction network extracts physicochemical property features from the physicochemical property information, the fingerprint features, the evolution features and the physicochemical property features are fused to obtain fusion information, and the fusion information is identified to obtain a peptide identification result;

the first feature extraction network adopts a multi-channel convolutional neural network, and a channel attention mechanism is added to the multi-channel convolutional neural network; the second feature extraction network adopts a bidirectional long short-term memory network; the third feature extraction network adopts a multi-head self-attention network;

from the verification set used for the current training, selecting the samples whose number of model prediction errors during the most recent set number of training rounds exceeds a set number of errors, to form a screened verification set; from the training set used for the current training, selecting the samples that were classified correctly during the most recent set number of training rounds, to form a screened training set; selecting samples from the screened verification set to form a verification-set to-be-exchanged sample set, and selecting samples from the screened training set to form a training-set to-be-exchanged sample set; exchanging the verification-set to-be-exchanged sample set in the verification set used for the current training with the training-set to-be-exchanged sample set in the training set used for the current training to form a new verification set and a new training set; and using the new verification set and the new training set as the verification set and the training set for the next training.

2. The method of claim 1, wherein the evolution information and the physicochemical property information of the peptide sequence are unified before inputting the fingerprint information, the evolution information and the physicochemical property information into the trained peptide sequence recognition model.

3. An anticancer peptide and antibacterial peptide prediction system based on a deep neural network, which is characterized by comprising:

a data acquisition module for acquiring a peptide sequence;

an information extraction module for extracting evolution information of the peptide sequence, determining the physicochemical properties of each amino acid in the peptide sequence, and extracting fingerprint information and physicochemical property information of the peptide sequence according to the physicochemical properties of the amino acids, wherein the evolution information of the peptide sequence is represented by constructing a PSSM matrix of the peptide sequence; constructing a CGR curve of the peptide according to the physicochemical properties of the amino acids; dividing the CGR curve into a plurality of sub-blocks, and determining the points on the boundaries of adjacent sub-blocks; rotating the partitioned CGR curve to obtain the corresponding points of the rotated sub-blocks and the points on the boundaries of adjacent sub-blocks after rotation; calculating the Euclidean distances between the points on two adjacent boundaries and the Euclidean distances between the corresponding points of two adjacent rotated sub-blocks to form distance matrices; extracting the principal eigenvalues of each distance matrix to form the fingerprint information of the peptide sequence; clustering all physicochemical properties in a physicochemical property database, and extracting the most representative property in each cluster as a representative physicochemical property of the amino acids; extracting, for each amino acid of the peptide sequence, the representative physicochemical properties from its physicochemical properties to obtain the physicochemical property information of the peptide sequence;

an identification module for acquiring a peptide identification result through fingerprint information, evolution information, physicochemical property information and a trained peptide sequence identification model, wherein the peptide sequence identification model comprises a first feature extraction network, a second feature extraction network and a third feature extraction network; the first feature extraction network extracts fingerprint features from the fingerprint information, the second feature extraction network extracts evolution features from the evolution information, and the third feature extraction network extracts physicochemical property features from the physicochemical property information; the fingerprint features, the evolution features and the physicochemical property features are fused to acquire fusion information, and the fusion information is identified to acquire the peptide identification result.

4. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method for predicting anti-cancer peptides and antibacterial peptides based on a deep neural network according to any one of claims 1 to 2.

5. A computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method for predicting anti-cancer peptides and antibacterial peptides based on a deep neural network according to any one of claims 1 to 2.
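For illustration only, and forming no part of the claims, the exchange of samples between the training set and the verification set recited in claim 1 might be sketched in Python as follows. The function name `tvi_exchange`, its parameters, the per-sample counters, the random choice of which screened samples to swap, and the equal-size swap capped by `max_swap` are all assumptions of this sketch; the claim does not specify how many samples are exchanged or how they are chosen. "Accurately classified" is read here as correct in every one of the most recent rounds.

```python
import random


def tvi_exchange(train_ids, val_ids, val_errors, train_correct,
                 rounds, error_threshold, max_swap, rng=None):
    """Perform one exchange between the training and verification sets.

    val_errors[i]    : times verification sample i was mispredicted in the
                       most recent `rounds` training rounds.
    train_correct[i] : times training sample i was classified correctly in
                       the most recent `rounds` training rounds.
    """
    rng = rng or random.Random(0)

    # Screened verification set: mispredicted more than the set number of times.
    screened_val = [i for i in val_ids if val_errors.get(i, 0) > error_threshold]
    # Screened training set: correct in every recent round (assumed reading).
    screened_train = [i for i in train_ids if train_correct.get(i, 0) == rounds]

    # Assumption: swap equally many samples so that both set sizes are preserved.
    n = min(len(screened_val), len(screened_train), max_swap)
    val_to_train = set(rng.sample(screened_val, n))
    train_to_val = set(rng.sample(screened_train, n))

    new_train = [i for i in train_ids if i not in train_to_val]
    new_train += [i for i in screened_val if i in val_to_train]
    new_val = [i for i in val_ids if i not in val_to_train]
    new_val += [i for i in screened_train if i in train_to_val]
    return new_train, new_val


if __name__ == "__main__":
    # Hypothetical sample identifiers and counters, for illustration only.
    train, val = tvi_exchange(
        train_ids=["t1", "t2", "t3", "t4"],
        val_ids=["v1", "v2", "v3"],
        val_errors={"v1": 3, "v2": 0, "v3": 2},
        train_correct={"t1": 3, "t2": 1, "t3": 3, "t4": 3},
        rounds=3, error_threshold=1, max_swap=2,
    )
    print("next training set:", train)
    print("next verification set:", val)
```

In this sketch, hard verification samples move into the training set and easily learned training samples move into the verification set, and the returned sets would be used for the next training.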