CN114242169B - Antigen epitope prediction method for B cells - Google Patents

Antigen epitope prediction method for B cells

Info

Publication number
CN114242169B
CN114242169B (application CN202111537519.XA; publication of application CN114242169A)
Authority
CN
China
Prior art keywords
action
amino acid
state
value
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111537519.XA
Other languages
Chinese (zh)
Other versions
CN114242169A (en)
Inventor
羊红光
周云飞
成彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Applied Mathematics Hebei Academy Of Sciences
Original Assignee
Institute Of Applied Mathematics Hebei Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Applied Mathematics Hebei Academy Of Sciences filed Critical Institute Of Applied Mathematics Hebei Academy Of Sciences
Priority to CN202111537519.XA priority Critical patent/CN114242169B/en
Publication of CN114242169A publication Critical patent/CN114242169A/en
Application granted granted Critical
Publication of CN114242169B publication Critical patent/CN114242169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

An epitope prediction method for B cells. The method first forms a pre-training set PT. In each episode of the Q_learning algorithm, the Q agent takes any 8 consecutive amino acid residues in the primary sequence of a protein as a state; as the first action it selects k residues from the 12 consecutive residues following the state and incorporates them into the state, and as the second action it selects one of n complementary classifiers. It searches in PT according to the continuous action search method, the searched amino acid sequence is given an instant reward by the tendentious reward rule, and the Q value is calculated and updated until the change of the value function is less than 1%, at which point training ends. An amino acid sequence is then searched for in the protein primary sequence using the trained strategy and classified by the selected classifier. According to the invention, the prediction capability for B cell epitopes is greatly enhanced through automatic iteration, and the accuracy of epitope classification is improved.

Description

Antigen epitope prediction method for B cells
Technical Field
The invention relates to an epitope prediction method for B cells which can accurately predict B cell epitopes, and belongs to the technical field of artificial-intelligence-based detection of microorganisms.
Background
Accurate determination of B cell antigen epitopes is an important basis for designing bioactive drugs and epitope vaccines, a key step in developing diagnostic kits, and a fundamental technology for immunodiagnosis and immunotherapy research. Machine-learning-based B cell epitope prediction is an important technical route for determining epitopes and, compared with other routes, greatly saves time, money and labor.
SEPPA is an epitope prediction software recommended by the Immune Epitope Database (IEDB), established by the National Institute of Allergy and Infectious Diseases, and was updated to version 3.0 in 2019. The researchers responsible for developing SEPPA 3.0 pointed out in their paper that conformational epitope prediction has progressed steadily but slowly over the last decade.
Existing epitope prediction adopts a supervised learning strategy: epitope samples and non-epitope samples are learned to obtain a classification predictor. Although new epitope prediction methods are continuously developed and the prediction accuracy keeps improving, problems such as low generality, low classification accuracy and slow updating of the prediction model remain. In particular, the conventional window method, in which an integer is preset before prediction as the number of amino acids in the predicted result, is highly artificial, making it difficult to predict an epitope of optimal length.
The revolutionary breakthrough of AlphaFold in the field of protein structure prediction and the victory of AlphaGo over the strongest human Go players offer great insight. Both breakthroughs share a common characteristic: an automatic learning mechanism is introduced so that the model continuously iterates on itself and gradually develops strong recognition capability.
However, existing methods do not learn automatically and cannot enhance their prediction capability through automatic iteration. It is therefore necessary to introduce an automatic mechanism into B cell antigen epitope prediction and to design a method capable of accurately determining B cell antigen epitopes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an epitope prediction method for B cells in order to improve the accuracy of B cell epitope prediction.
The problems addressed by the invention are solved by the following technical solution:
An epitope prediction method for B cells: first, B cell epitope sequence data are retrieved from the IEDB database to form a set EPT, and the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT. Training is based on the Q_learning algorithm, with the single action of the algorithm changed into two actions. In each episode, the Q agent takes any 8 consecutive amino acid residues in the primary sequence of a protein as a state; as the first action it selects k residues from the 12 consecutive residues following the state and incorporates them into the state, and as the second action it selects one of n complementary classifiers. It searches the protein primary sequences in PT according to the continuous action search method, the searched amino acid sequence is given an instant reward by the tendentious reward rule, and the Q value is calculated and updated until the change of the value function is less than 1%, at which point training ends. An amino acid sequence is then searched for in the protein primary sequence using the strategy obtained by training and classified by the selected classifier, thereby realizing B cell epitope prediction.
The above epitope prediction method for B cells, comprising the steps of:
a. B cell antigen epitope sequence data are retrieved from the IEDB database to form a set EPT, the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT, and a set containing n ≥ 2 complementary classifiers is selected as the second action set;
b. taking any 8 consecutive amino acid residues in the primary sequence of the protein as a state, and selecting k residues from the 12 consecutive residues following each state to incorporate into the state as the first action; selecting one of the n complementary classifiers as the second action; initializing the Q values corresponding to all states and actions to 0, setting the learning rate α to any number between 0 and 1, setting the discount factor γ to any number between 0 and 1, setting the number of episodes, and initializing the state s_0 to any 8 amino acid residues of the pre-training set;
c. in each episode, the Q agent searches among the primary sequences of the proteins in the set PT according to the continuous action search method: at step t, the Q agent selects an action a_t^1 from the first action set and then an action a_t^2 from the second action set; after the two actions are executed, the reward R_t and the next observed state s_{t+1} are given according to the tendentious reward rule; the Q value is then updated, and the state and action tables are updated at the same time; the search training process ends when the change of the value function is less than 1%;
d. searching out an amino acid combination in the primary sequence of each protein using the strategy obtained by training and classifying it with the selected classifier; if the classifier indicates that the searched amino acid sequence is an epitope, it is judged to be a B cell epitope, otherwise it is not.
In the above method for predicting B cell epitopes, the specific search process of the continuous action search method is as follows:
any 8 amino acid residues of the primary sequence of each protein are taken as initial state s 0 The corresponding amino acid sequence is denoted as X 1 X 2 …X 8, wherein Xj Represents the jth amino acid, j=1, 2, …,8 to go from the initial state s 0 K residues in the following 12 continuous residues are selected to be combined into the state to be used as a first action, wherein k is more than or equal to 1 and less than or equal to 12, and one of n complementary classifiers is selected to be used as a second action option; according to the correspondingValue selection of a first action and a second action, wherein a 1 ,a 2 Respectively, all possible actions in the first action and all possible actions in the second action, then calculating rewards for the two actions by a tendentiousness rewards rule, and calculating a cost function according to the following formula:
wherein ,Vπ (s) is the cost function in state s, pi is the policy,is expected to be R t Is the benefit of the t step, V(s) t+1 ) Is the next state s t+1 A lower cost function;
the Q value is calculated according to the following formula:
wherein Qπ (s,a 1 ,a 2 ) Is a cost function of performing two consecutive actions in state s,is the next state s t+1 Execute two consecutive actions down->Is a cost function of (2);
and simultaneously updating the Q value according to the following steps:
then changing the state, repeating the steps, and updating the Q value.
In the above method for predicting B cell epitopes, the tendentious reward rule is as follows:
Features are extracted from the amino acid sequence searched out by the first action and used as the input of the classifier selected by the second action, and the classifier calculates the classification score SC_t of the amino acid sequence. In the set EPT, the occurrence probability of each amino acid and of each amino acid pair consisting of two consecutive amino acids is calculated. For any amino acid as_i, the occurrence probability P(as_i) is calculated according to the following formula:
P(as_i) = ( num(as_i) − minnum(as_1, as_2, …, as_20) ) / ( maxnum(as_1, as_2, …, as_20) − minnum(as_1, as_2, …, as_20) )
where as_i denotes any one of the 20 amino acids, num(as_i) denotes the number of times as_i occurs in the set EPT, maxnum(as_1, as_2, …, as_20) denotes the maximum of the occurrence counts of the 20 amino acids in the set EPT, and minnum(as_1, as_2, …, as_20) denotes the minimum of the occurrence counts of the 20 amino acids in the set EPT.
For any amino acid pair AA_i, the occurrence probability P(AA_i) is calculated according to the following formula:
P(AA_i) = ( num(AA_i) − minnum(AA_1, AA_2, …, AA_400) ) / ( maxnum(AA_1, AA_2, …, AA_400) − minnum(AA_1, AA_2, …, AA_400) )
where AA_i denotes one of the 400 amino acid pairs, num(AA_i) denotes the number of times AA_i occurs in the set EPT, maxnum(AA_1, AA_2, …, AA_400) denotes the maximum of the occurrence counts of the 400 amino acid pairs in the set EPT, and minnum(AA_1, AA_2, …, AA_400) denotes the minimum of the occurrence counts of the 400 amino acid pairs in the set EPT.
The instant reward obtained for the amino acid sequence sq_t generated at step t is calculated from the classification score SC_t together with these occurrence probabilities, where len(sq_t) denotes the number of amino acids contained in the sequence sq_t and len(sq_t) − 1 denotes the number of consecutive amino acid pairs.
Advantageous effects
The method combines the Q_learning algorithm, the continuous action search method and the tendentious reward rule, and introduces complementary classifiers; the prediction capability for B cell epitopes is greatly enhanced through automatic iteration, and the accuracy of epitope classification is improved.
Drawings
The invention is described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the Q_learning algorithm.
Detailed Description
The invention provides an epitope prediction method for B cells based on continuous action search with the Q-learning method, in which one action selects the sequence length and the other action selects a complementary classifier, so that the sequence length is chosen autonomously and the most suitable classifier can be selected for classification.
In the Q-Learning reinforcement learning algorithm, each state-action pair has a corresponding Q value. The learning process of the Q-Learning algorithm is therefore to iteratively calculate the Q values of the state-action pairs. Finally, the optimal action strategy obtained by the learner is to select, in state s, the action corresponding to the maximum Q value. The Q value Q(s, a) of action a in state s is defined as the cumulative return obtained by the learner executing action a in state s and thereafter acting according to a certain action strategy. The basic equation for the Q value update is:
Q(s_t, a_t) = Q(s_t, a_t) + α[ R_{s_t} + γ·max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
the above formula (wherein: a is optional action in the state; rs) t An immediate prize awarded for the environment in state s at time t; alpha is the learning rate; q(s) t ,a t ) Evaluation value of state-operation (s, a) at time t.
The Q-Learning algorithm pseudocode is shown in Table 1:
TABLE 1: Q-Learning algorithm pseudocode
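Table 1 itself is not reproduced above. As an aid only, the following is a minimal Python sketch of standard tabular Q-Learning that matches the update equation given earlier; the environment interface (env.reset, env.step, env.actions) and the hyper-parameter values are assumptions for illustration, not part of the patented method.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Standard tabular Q-Learning corresponding to the update
    Q(s,a) <- Q(s,a) + alpha * [R + gamma * max_a' Q(s',a') - Q(s,a)]."""
    Q = defaultdict(float)                      # Q[(state, action)] initialised to 0
    for _ in range(episodes):
        s = env.reset()                         # initial state of the episode
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda x: Q[(s, x)])
            s_next, reward, done = env.step(s, a)
            # Bellman-style update; the future value is zero at a terminal state
            best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in env.actions(s_next))
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```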
FIG. 1 is a schematic diagram of the Q_learning algorithm.
Interpretation of technical terms
(1) Protein primary sequence: a sequence composed of the 20 amino acids, for example ADFCEGHIKLST.
(2) B cell epitope: a part of the primary sequence of a protein; it may be composed of a partial sequence of that protein.
(3) Episode: the process in which an agent executes a policy within an environment from start to finish.
(4) Agent: a software or hardware mechanism that takes corresponding actions through interaction with its surrounding environment. In the Q-learning algorithm it is often referred to as the Q agent, and it is the subject that explores and learns from the environment.
(5) Action: one of the various moves the agent may take. Although actions are largely self-explanatory, the agent must be able to choose from a set of discrete possible actions.
(6) Environment: the external world that interacts with and responds to the agent. The environment takes the agent's current state and action as input and returns the agent's reward and next state as output. The environment is everything that exists outside the agent.
(7) State: the concrete, immediate situation in which the agent finds itself, including its specific location, the moment in time, and the instantaneous configuration relating the agent to other relevant things.
(8) Rewards (Reward): rewards are feedback by which we can measure the success or failure of various actions of an agent in a given state.
(9) Discount factor (discount): a multiplier. Future rewards found by the agent are multiplied by this factor to attenuate their cumulative impact on the agent's current action choice. This is central to reinforcement learning: the value of future rewards is gradually decreased so that more weight is given to nearer-term rewards. It is critical for a paradigm built on the principle of delayed reward.
(10) Policy: a function that takes a state observation as input and outputs an action. It is what the agent uses to determine its next action from the current state, mapping states to the actions that promise the highest reward.
(11) Value: the discounted long-term expected reward (as opposed to the short-term reward) of the current state under a specific policy. The short-term reward is the transient reward the agent earns by taking a specific action in a certain state; the value is the total reward the agent expects to accumulate from a certain state onward into the future.
(12) Q value (Q-value) or action value: it differs from the "value" in that it takes an additional parameter, the current action. It refers to the long-term reward obtained from the current state when a given action is taken under a specific policy.
(13) Bellman equation: a set of equations that decompose the value function into an instant reward plus discounted future values.
(14) Value iteration: an algorithm that computes the optimal state-value function by iteratively refining the value estimates. The algorithm initializes the value function to arbitrary random values and then repeatedly updates the Q values and the value function until they converge.
(15) Policy iteration: since the agent is only concerned with finding the optimal policy, the optimal policy sometimes converges before the value function does. Policy iteration therefore does not keep refining the value-function estimate alone; instead, it redefines the policy at each step and computes the value under the new policy until the policy converges.
(16) Q learning (Q-learning): as a model-free learning algorithm, it does not assume that the agent has complete knowledge of the state-transition and reward models; rather, it assumes the agent will find the correct actions through trial and error. The basic idea of Q learning is therefore: while the agent interacts with the environment, samples of the Q-value function are observed and used to approximate the Q function of the state-action pairs.
Machine-learning-based epitope prediction methods mostly use labeled sequences as positive samples and unlabeled sequences as negative samples, take physicochemical properties of amino acids, statistical properties, structural information and the like as feature inputs, train a classifier with a common classification learning algorithm, and then use the classifier to classify sequences. Such prediction methods usually set a window in advance, with the window size fixing the number of amino acids contained in the result. Because epitope sequences differ greatly, it is difficult to predict epitopes with a single trained classifier, so an ensemble approach is more advantageous.
The method first retrieves B cell epitope sequence data from the IEDB database to form the set EPT and extracts the corresponding protein primary sequences from the UniProt database to form the pre-training set PT. In each episode, the Q agent searches for a combination of residues in PT according to the continuous action search method, taking any 8 consecutive amino acid residues in PT as a state and, as the first action, selecting k residues from the 12 consecutive residues following the state to incorporate into it; as the second action it selects one of n complementary classifiers. The searched amino acid sequence is given an instant reward by the tendentious reward rule, and the Q value is calculated and updated until the change of the value function is less than 1%; the B cell epitope is then predicted using the trained strategy and classifier.
The specific searching steps are as follows:
first, searching B cell epitope sequence data from an IEDB database to form a set EPT, extracting corresponding protein primary sequences from a uniport database to form a pre-training set PT, and selecting a set containing n more than or equal to 2 complementary classifiers as a second action.
Second, the Q values corresponding to all states and actions are initialized to 0, the learning rate α is set to any number between 0 and 1, the discount factor γ is set to any number between 0 and 1, the number of episodes is set, and the state s_0 is initialized to any 8 amino acid residues from the pre-training set.
Third, in each episode the Q agent searches in the set PT according to the continuous action search method. At step t, the Q agent selects an action a_t^1 from the first action set and then an action a_t^2 from the second action set; the two actions are carried out consecutively, and after they are executed the reward R_t and the next observed state s_{t+1} are given according to the tendentious reward rule. The Q value is then updated, as are the state and action tables. When the change in the value function is less than 1%, the search training process ends.
Fourth, an amino acid combination is searched out in the primary sequence of each protein using the strategy obtained by training and classified by the selected classifier; if the classifier indicates that the searched amino acid sequence is an epitope, it is considered a B cell epitope, otherwise it is not.
In the third step, the continuous action search method proceeds as follows. Any 8 amino acid residues of the primary sequence of each protein are taken as the initial state s_0, and the corresponding sequence is denoted X_1X_2…X_8; in s_0 an action incorporates the following i residues into the state, i = 1, 2, …, 12. According to the corresponding Q(s_0, a^1, a^2) values and the state s_0, the first action a_0^1 and the second action a_0^2 are selected, where a^1 and a^2 refer to all possible actions in the first action set and in the second action set, respectively. Executing action a_0^1 corresponds, in the initial state s_0, to selecting k residues from the following 12 consecutive residues and incorporating them into the state, yielding a first amino acid sequence fragment of length k+8, X_1X_2…X_8…X_{k+8}. Executing action a_0^2 selects the m-th classifier in the second action set, with 1 ≤ m ≤ n. The reward for the two actions is calculated with the tendentious reward rule, the value function is calculated according to formula (1), the Q value is calculated according to formula (2), the next state s_1 is observed, and at the same time the Q value is updated according to formula (3). Then, from Q(s_1, a^1, a^2) and state s_1, the actions a_1^1 and a_1^2 are selected, the value function is calculated according to formula (1), the Q value according to formula (2), the next state s_2 is observed, and the Q value is updated according to formula (3). Training then continues in the same manner.
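As an illustration of a single step of the continuous action search described above, the sketch below selects the first action (the number k of appended residues) and the second action (the classifier index) greedily from a Q table and forms the fragment of length 8+k. The ε-greedy exploration, the Q-table layout and the helper names are assumptions for illustration only and are not taken from the patent.

```python
import random

def select_actions(Q, state, n_classifiers, epsilon=0.1):
    """Choose the first action (k = number of residues appended, 1..12) and the
    second action (index of the complementary classifier, 0..n-1) for the state,
    greedily with respect to Q(s, a1, a2), with epsilon-greedy exploration."""
    candidates = [(k, m) for k in range(1, 13) for m in range(n_classifiers)]
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda a: Q.get((state, a[0], a[1]), 0.0))

def apply_first_action(sequence, start, k):
    """Form the fragment X1...X(8+k): the 8-residue state plus the next k residues."""
    return sequence[start:start + 8 + k]

# usage sketch with hypothetical data
Q = {}
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"
state = protein[0:8]
k, m = select_actions(Q, state, n_classifiers=3)
fragment = apply_first_action(protein, 0, k)
```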
The value function under each group of consecutive actions in the learning network is:
V_π(s_t) = E_π[ R_t + γ·V(s_{t+1}) ]    (1)
where V_π(s_t) is the value function in state s_t, π is the policy, E_π denotes the expectation under π, R_t is the reward at step t, and V(s_{t+1}) is the value function in the next state s_{t+1}.
The Q value of each group of actions at step t is calculated by the formula:
Q_π(s_t, a_t^1, a_t^2) = E_π[ R_t + γ·max_{a^1,a^2} Q(s_{t+1}, a^1, a^2) ]    (2)
where Q_π(s_t, a_t^1, a_t^2) is the value function for performing the two consecutive actions a_t^1, a_t^2 in state s_t, and Q(s_{t+1}, a^1, a^2) is the value function for performing two consecutive actions in the next state s_{t+1}.
The Q value is updated according to:
Q(s_t, a_t^1, a_t^2) ← Q(s_t, a_t^1, a_t^2) + α[ R_t + γ·max_{a^1,a^2} Q(s_{t+1}, a^1, a^2) − Q(s_t, a_t^1, a_t^2) ]    (3)
In the third step, the tendentious reward rule is calculated from two parts: the classification score given to the sequence by the classifier, and the occurrence probability of the sequence. Features are extracted from the amino acid sequence searched out by the first action and used as input to the classifier selected by the second action, which calculates a classification score for the sequence; for example, at step t the score calculated by the classifier of the second action for the amino acid sequence obtained by the first action is denoted SC_t. In the set EPT, the numbers of occurrences of each amino acid and of each amino acid pair consisting of two consecutive amino acids are counted. For any amino acid as_i, the occurrence probability P(as_i) is calculated according to the formula
P(as_i) = ( num(as_i) − minnum(as_1, as_2, …, as_20) ) / ( maxnum(as_1, as_2, …, as_20) − minnum(as_1, as_2, …, as_20) )
where as_i denotes any one of the 20 amino acids, num(as_i) denotes the number of times as_i occurs in the set EPT, maxnum(as_1, as_2, …, as_20) denotes the maximum of the occurrence counts of the 20 amino acids in the set EPT, and minnum(as_1, as_2, …, as_20) denotes the minimum of the occurrence counts of the 20 amino acids in the set EPT. For any amino acid pair AA_i, the occurrence probability P(AA_i) is calculated according to the formula
P(AA_i) = ( num(AA_i) − minnum(AA_1, AA_2, …, AA_400) ) / ( maxnum(AA_1, AA_2, …, AA_400) − minnum(AA_1, AA_2, …, AA_400) )
where AA_i denotes one of the 400 amino acid pairs, num(AA_i) denotes the number of times AA_i occurs in the set EPT, maxnum(AA_1, AA_2, …, AA_400) denotes the maximum of the occurrence counts of the 400 amino acid pairs in the set EPT, and minnum(AA_1, AA_2, …, AA_400) denotes the minimum of the occurrence counts of the 400 amino acid pairs in the set EPT.
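The two occurrence probabilities above are min-max normalisations of occurrence counts over the set EPT. The following short sketch shows one way to compute them; the function names and the use of Python's Counter are illustrative assumptions.

```python
from collections import Counter

def occurrence_probabilities(ept_sequences):
    """Min-max normalised occurrence probabilities of single amino acids and of
    consecutive amino-acid pairs over all epitope sequences in the set EPT."""
    aa_counts, pair_counts = Counter(), Counter()
    for seq in ept_sequences:
        aa_counts.update(seq)                                          # single amino acids
        pair_counts.update(seq[i:i + 2] for i in range(len(seq) - 1))  # consecutive pairs

    def normalise(counts):
        lo, hi = min(counts.values()), max(counts.values())
        span = (hi - lo) or 1                                          # guard against all-equal counts
        return {key: (num - lo) / span for key, num in counts.items()}

    return normalise(aa_counts), normalise(pair_counts)

# usage sketch with toy data
p_aa, p_pair = occurrence_probabilities(["ADFCEGHIKLST", "KLSTADQW"])
```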
The instant reward obtained for the amino acid sequence sq_t generated at step t is calculated from the classification score SC_t together with these occurrence probabilities, where len(sq_t) denotes the number of amino acids contained in the sequence sq_t and len(sq_t) − 1 denotes the number of consecutive amino acid pairs.
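Because the reward formula image is not reproduced in this text, the exact combination of terms cannot be recovered here. The sketch below only illustrates the structure described above: the classification score SC_t combined with the length-averaged amino acid and amino acid pair probabilities, with the equal weighting of the three terms being an assumption rather than the patented formula. It uses the probability dictionaries from the previous sketch.

```python
def tendentious_reward(sc_t, fragment, p_aa, p_pair):
    """Illustrative reward for the fragment sq_t produced at step t: classifier
    score plus the mean single-amino-acid probability (over len(sq_t) residues)
    and the mean consecutive-pair probability (over len(sq_t) - 1 pairs).
    The equal weighting of the three terms is an assumption."""
    mean_aa = sum(p_aa.get(a, 0.0) for a in fragment) / len(fragment)
    pairs = [fragment[i:i + 2] for i in range(len(fragment) - 1)]
    mean_pair = sum(p_pair.get(p, 0.0) for p in pairs) / len(pairs)
    return sc_t + mean_aa + mean_pair
```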
Compared with the traditional epitope prediction scheme based on the window method, the epitope prediction method based on continuous action search realizes autonomous selection of the epitope sequence, eliminates the influence of human factors, introduces a complementary classifier, and improves the classification accuracy.
The invention adopts the continuous action search method, designed around the fact that common epitope sequences contain 8-20 amino acids; it thus covers common epitope lengths while meeting the computational requirements of the complementary classifiers.
The method adopts the tendentious reward rule, which considers not only the classifier's score for the searched sequence but also the statistical characteristics of the sequence; it jointly weighs the classifier score, the amino acid occurrence probabilities and the amino acid pair occurrence probabilities, and calculates the instant reward of the two consecutive actions in each state.

Claims (1)

1. An epitope prediction method for B cells, characterized in that: first, B cell epitope sequence data are retrieved from the IEDB database to form a set EPT, and the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT; training is based on the Q_learning algorithm, with the single action of the algorithm changed into two actions; in each episode, the Q agent takes any 8 consecutive amino acid residues in the primary sequence of a protein as a state and, as the first action, selects k residues from the 12 consecutive residues following the state to incorporate into it; as the second action it selects one of n complementary classifiers; it searches the protein primary sequences in PT according to the continuous action search method, the searched amino acid sequence is given an instant reward by the tendentious reward rule, and the Q value is calculated and updated until the change of the value function is less than 1%, at which point training ends; an amino acid sequence is then searched for in the protein primary sequence using the strategy obtained by training and classified by the selected classifier, thereby realizing B cell epitope prediction;
the method comprises the following steps:
a. B cell antigen epitope sequence data are retrieved from the IEDB database to form a set EPT, the corresponding protein primary sequences are extracted from the UniProt database to form a pre-training set PT, and n complementary classifiers are selected as the second action set, where n ≥ 2;
b. taking any 8 consecutive amino acid residues in the primary sequence of the protein as a state, and selecting k residues from the 12 consecutive residues following each state to incorporate into the state as the first action; selecting one of the n complementary classifiers as the second action; initializing the Q values corresponding to all states and actions to 0, setting the learning rate α to any number between 0 and 1, setting the discount factor γ to any number between 0 and 1, setting the number of episodes, and initializing the state s_0 to any 8 amino acid residues of the pre-training set;
c. in each episode, the Q agent searches among the primary sequences of the proteins in the set PT according to the continuous action search method: at step t, let the state be s_t; the Q agent selects an action a_t^1 from the first action set and then an action a_t^2 from the second action set; after the two actions are executed, the reward R_t and the next observed state s_{t+1} are given according to the tendentious reward rule; the Q value is then updated, and the state and action tables are updated at the same time; the search training process ends when the change of the value function is less than 1%;
d. searching out an amino acid combination in the primary sequence of each protein using the strategy obtained by training and classifying it with the selected classifier; if the classifier indicates that the searched amino acid sequence is an epitope, it is considered a B cell epitope, otherwise it is not;
the specific searching process of the continuous action searching method comprises the following steps:
any 8 amino acid residues of the primary sequence of each protein are taken as the initial state s_0, and the corresponding amino acid sequence is denoted X_1X_2…X_8, where X_j denotes the j-th amino acid, j = 1, 2, …, 8; from the initial state s_0, k residues among the following 12 consecutive residues are selected and incorporated into the state as the first action, with 1 ≤ k ≤ 12, and one of the n complementary classifiers is selected as the second action; the first action and the second action are selected according to the corresponding Q(s_t, a^1, a^2) values, where a^1 and a^2 are all possible actions in the first action set and all possible actions in the second action set, respectively; the reward for the two actions is then calculated by the tendentious reward rule, and the value function is calculated according to the following formula:
V_π(s_t) = E_π[ R_t + γ·V(s_{t+1}) ]
where V_π(s_t) is the value function in state s_t, π is the policy, E_π denotes the expectation under π, R_t is the reward at step t, and V(s_{t+1}) is the value function in the next state s_{t+1};
the Q value is calculated according to the following formula:
Q_π(s_t, a_t^1, a_t^2) = E_π[ R_t + γ·max_{a^1,a^2} Q(s_{t+1}, a^1, a^2) ]
where Q_π(s_t, a_t^1, a_t^2) is the value function for performing the two consecutive actions a_t^1, a_t^2 in state s_t, and Q(s_{t+1}, a^1, a^2) is the value function for performing two consecutive actions in the next state s_{t+1};
and the Q value is updated at the same time according to:
Q(s_t, a_t^1, a_t^2) ← Q(s_t, a_t^1, a_t^2) + α[ R_t + γ·max_{a^1,a^2} Q(s_{t+1}, a^1, a^2) − Q(s_t, a_t^1, a_t^2) ]
the state is then changed, the above steps are repeated, and the Q value is updated;
the tendentious reward rule is as follows:
features are extracted from the amino acid sequence searched out by the first action and used as the input of the classifier selected by the second action, and the classifier calculates the classification score SC_t of the amino acid sequence; in the set EPT, the occurrence probability of each amino acid and of each amino acid pair consisting of two consecutive amino acids is calculated; for any amino acid as_u, the occurrence probability P(as_u) is calculated according to the following formula:
P(as_u) = ( num(as_u) − minnum(as_1, as_2, …, as_20) ) / ( maxnum(as_1, as_2, …, as_20) − minnum(as_1, as_2, …, as_20) )
where as_u denotes any one of the 20 amino acids, num(as_u) denotes the number of times as_u occurs in the set EPT, maxnum(as_1, as_2, …, as_20) denotes the maximum of the occurrence counts of the 20 amino acids in the set EPT, and minnum(as_1, as_2, …, as_20) denotes the minimum of the occurrence counts of the 20 amino acids in the set EPT;
for any amino acid pair AA_v, the occurrence probability P(AA_v) is calculated according to the following formula:
P(AA_v) = ( num(AA_v) − minnum(AA_1, AA_2, …, AA_400) ) / ( maxnum(AA_1, AA_2, …, AA_400) − minnum(AA_1, AA_2, …, AA_400) )
where AA_v denotes one of the 400 amino acid pairs, num(AA_v) denotes the number of times AA_v occurs in the set EPT, maxnum(AA_1, AA_2, …, AA_400) denotes the maximum of the occurrence counts of the 400 amino acid pairs in the set EPT, and minnum(AA_1, AA_2, …, AA_400) denotes the minimum of the occurrence counts of the 400 amino acid pairs in the set EPT;
the instant reward obtained for the amino acid sequence sq_t generated at step t is calculated from the classification score SC_t together with these occurrence probabilities, where len(sq_t) denotes the number of amino acids contained in the sequence sq_t.
CN202111537519.XA 2021-12-15 2021-12-15 Antigen epitope prediction method for B cells Active CN114242169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111537519.XA CN114242169B (en) 2021-12-15 2021-12-15 Antigen epitope prediction method for B cells

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111537519.XA CN114242169B (en) 2021-12-15 2021-12-15 Antigen epitope prediction method for B cells

Publications (2)

Publication Number Publication Date
CN114242169A CN114242169A (en) 2022-03-25
CN114242169B (en) 2023-10-20

Family

ID=80756774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111537519.XA Active CN114242169B (en) 2021-12-15 2021-12-15 Antigen epitope prediction method for B cells

Country Status (1)

Country Link
CN (1) CN114242169B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013177214A2 (en) * 2012-05-21 2013-11-28 Distributed Bio Inc Epitope focusing by variable effective antigen surface concentration
CN105868583A (en) * 2016-04-06 2016-08-17 东北师范大学 Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence
CN107033226A (en) * 2017-06-27 2017-08-11 中国农业科学院兰州兽医研究所 A kind of PPR virus F protein epitope peptide and its determination, preparation method and application
WO2017184590A1 (en) * 2016-04-18 2017-10-26 The Broad Institute Inc. Improved hla epitope prediction
CN107341363A (en) * 2017-06-29 2017-11-10 河北省科学院应用数学研究所 A kind of Forecasting Methodology of proteantigen epitope
CN107909153A (en) * 2017-11-24 2018-04-13 天津科技大学 The modelling decision search learning method of confrontation network is generated based on condition
JP6500144B1 (en) * 2018-03-28 2019-04-10 Kotaiバイオテクノロジーズ株式会社 Efficient clustering of immune entities
CN112106141A (en) * 2018-03-16 2020-12-18 弘泰生物科技股份有限公司 Efficient clustering of immune entities

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008097802A2 (en) * 2007-02-02 2008-08-14 Medical Discovery Partners Llc Epitope-mediated antigen prediction
CA3060900A1 (en) * 2018-11-05 2020-05-05 Royal Bank Of Canada System and method for deep reinforcement learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013177214A2 (en) * 2012-05-21 2013-11-28 Distributed Bio Inc Epitope focusing by variable effective antigen surface concentration
CN105868583A (en) * 2016-04-06 2016-08-17 东北师范大学 Method for predicting epitope through cost-sensitive integrating and clustering on basis of sequence
WO2017184590A1 (en) * 2016-04-18 2017-10-26 The Broad Institute Inc. Improved hla epitope prediction
CN107033226A (en) * 2017-06-27 2017-08-11 中国农业科学院兰州兽医研究所 A kind of PPR virus F protein epitope peptide and its determination, preparation method and application
CN107341363A (en) * 2017-06-29 2017-11-10 河北省科学院应用数学研究所 A kind of Forecasting Methodology of proteantigen epitope
CN107909153A (en) * 2017-11-24 2018-04-13 天津科技大学 The modelling decision search learning method of confrontation network is generated based on condition
CN112106141A (en) * 2018-03-16 2020-12-18 弘泰生物科技股份有限公司 Efficient clustering of immune entities
JP6500144B1 (en) * 2018-03-28 2019-04-10 Kotaiバイオテクノロジーズ株式会社 Efficient clustering of immune entities

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Reinforcement Learning Research; Ma Chengqian; Xie Wei; Sun Weijie; Command Control & Simulation, (06); full text *

Also Published As

Publication number Publication date
CN114242169A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
Erev et al. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria
CN109190537A (en) A multi-person posture estimation method based on mask-aware deep reinforcement learning
CN106021990B (en) A method of biological gene is subjected to classification and Urine scent with specific character
CN111367790B (en) Meta heuristic test case ordering method based on mixed model
CN111950393B (en) Time sequence action fragment segmentation method based on boundary search agent
CN109145373B (en) Residual life prediction method and device based on improved ESGP and prediction interval
CN111352419B (en) Path planning method and system for updating experience playback cache based on time sequence difference
CN111260658B (en) Deep reinforcement learning method for image segmentation
CN111784595A (en) Dynamic label smooth weighting loss method and device based on historical records
CN111310799A (en) Active learning algorithm based on historical evaluation result
CN113239211A (en) Reinforced learning knowledge graph reasoning method based on course learning
CN114242169B (en) Antigen epitope prediction method for B cells
CN110488020A (en) A kind of protein glycation site identification method
CN112651499A (en) Structural model pruning method based on ant colony optimization algorithm and interlayer information
CN113283467A (en) Weak supervision picture classification method based on average loss and category-by-category selection
CN106709829B (en) Learning situation diagnosis method and system based on online question bank
CN116865251A (en) Short-term load probability prediction method and system
CN109378034B (en) Protein prediction method based on distance distribution estimation
CN116680578A (en) Cross-modal model-based deep semantic understanding method
JP2021047586A (en) Apple quality estimation program and system
CN115062759A (en) Fault diagnosis method based on improved long and short memory neural network
JP2001312712A (en) Non-linear time series prediction method and recording medium with non-linear time series prediction program recorded thereon
CN115240871A (en) Epidemic disease prediction method based on deep embedded clustering element learning
CN112989088B (en) Visual relation example learning method based on reinforcement learning
CN109326319B (en) Protein conformation space optimization method based on secondary structure knowledge

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant