CN114863508B - Expression recognition model generation method, medium and device of self-adaptive attention mechanism


Info

Publication number
CN114863508B
CN114863508B (application CN202210298795.3A)
Authority
CN
China
Prior art keywords
module
expression recognition
individual
individuals
attention
Prior art date
Legal status
Active
Application number
CN202210298795.3A
Other languages
Chinese (zh)
Other versions
CN114863508A (en)
Inventor
张通
叶汉云
陈俊龙
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202210298795.3A
Publication of CN114863508A
Application granted
Publication of CN114863508B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods


Abstract

The invention provides a method, medium, and device for generating an expression recognition model with an adaptive attention mechanism. The method comprises the following steps: set the number of individuals in each generation of the population to k; set the number of modules contained in each individual, the number of nodes of each module, the selectable interval for each module's number of output channels, and the gene format and gene length of an individual; initialize the genes of the first-generation population to generate k individuals; train and test the individuals; when the number of iterations reaches the maximum, find the individual with the highest fitness across all generations to obtain the optimal expression recognition model; otherwise, select the parent individuals of the next-generation population, change them, and perform the next iteration. With this method, an expression recognition network can be designed for different expression-data scenarios, and an attention mechanism is introduced adaptively to improve recognition performance, so that the network performs expression recognition quickly, accurately, and with a lightweight architecture.

Description

Expression recognition model generation method, medium and device of self-adaptive attention mechanism
Technical Field
The present invention relates to the field of expression recognition technology, and in particular to a method, medium, and device for generating an expression recognition model with an adaptive attention mechanism.
Background
Emotion is an indispensable part of human interaction; different emotions convey a large amount of information during interaction and thereby influence communication and decision-making between people. Emotional capability is also an important hallmark of artificial intelligence, which is expected to enable human-computer interaction in a friendlier way. How to enable computers to compute, and even possess, emotion has therefore become an important direction of artificial intelligence research.
Affective computing was proposed to address this problem; its goal is to endow computers with the ability to perceive, understand, and even express emotion. Under this premise, how a computer perceives and understands human emotion becomes particularly important. Among the various ways of expressing emotion, facial expression data are the easiest to obtain and represent the subject's emotional changes most intuitively, so they have received a great deal of attention.
The task of facial expression recognition is to locate and extract facial feature information from a picture or video sequence and to use this information to classify expressions into different categories. At present, the most commonly used approach to expression recognition is deep learning, whose core structure is generally a deep convolutional neural network (DCNN).
Many existing expression recognition methods are applied to facial expression datasets collected in a laboratory, where conditions such as pose, angle, and illumination are relatively fixed. However, many expression recognition tasks now need to recognize expressions in real-world scenes; laboratory-based facial expression datasets do not account sufficiently for complex and changeable environments, so the robustness of such recognition systems is low. Moreover, although deep neural networks can extract effective features for expression recognition, training a complex neural network requires considerable computational cost and training time.
At present, designing a mature expression recognition model requires a great deal of time and manual experience. In the face of increasingly complex network architectures and facial expressions, purely manual design becomes ever more difficult, and researchers lacking extensive prior knowledge may be unable to design a good expression recognition model.
Disclosure of Invention
To overcome the defects and shortcomings of the prior art, the invention aims to provide a method, medium, and device for generating an expression recognition model with an adaptive attention mechanism. With this method, an expression recognition network can be designed for different expression-data scenarios, and an attention mechanism is introduced adaptively to improve recognition performance, so that the network performs expression recognition quickly, accurately, and with a lightweight architecture.
To achieve the above purpose, the invention is realized by the following technical scheme: a method for generating an expression recognition model with an adaptive attention mechanism, characterized by comprising the following steps:
S1, setting the number of individuals in each generation of the population to k; setting the number of modules contained in each individual to m, the number of nodes of the j-th module to n_j (j = 1, …, m), and the selectable interval for the number of output channels of each module;
The gene length of each individual is L. The gene of each individual comprises the information of the m modules, and the information of each module includes five parts: the first part indicates whether the module exists; the second part represents the number of output channels of the module; the third part indicates whether the output of the module is input to the attention network so as to control the learning of the sparse center loss; the fourth part indicates whether a multi-head attention module is added after the module's output; the fifth part represents the connection pattern of the nodes inside the module;
S2, randomly sampling k binary codes of length L from a Gaussian distribution, and initializing the genes of the individuals of the first-generation population to the k binary codes respectively;
S3, generating the k individuals of the current-generation population according to the individuals' genes;
S4, inputting training samples into the k individuals of the current-generation population for training, to obtain k trained individuals;
S5, inputting test samples into the k trained individuals for testing, to obtain the fitness of the k individuals of the current-generation population;
S6, judging whether the number of iterations has reached the maximum number of iterations:
if yes, finding the individual with the highest fitness among all generations of the population and setting it as the optimal expression recognition model;
otherwise, repeatedly selecting k individuals from the current-generation population by a roulette-wheel algorithm as the k parent individuals of the next-generation population, and mutating the genes of the k parent individuals respectively with a mutation operator;
comparing the fitness of the k parent individuals with the fitness of the individuals of the previous-generation population respectively to obtain reward values; calculating the selection probabilities of the inversion operator and the crossover operator from the reward values by a reinforcement learning algorithm; selecting the inversion operator or the crossover operator according to these probabilities and applying it to further change the genes of the k mutated parent individuals, to obtain the k individual genes of the next-generation population; and then jumping to step S3 for the next iteration.
Preferably, the gene length of each individual is L = Σ_{j=1}^{m} (5 + n_j(n_j - 1)/2), i.e., for each module one bit for existence, two bits for the number of output channels, one bit for the attention-network input, one bit for the multi-head attention module, and n_j(n_j - 1)/2 bits for the internal node connections.
Preferably, in each individual, a pre-convolution layer is arranged before each module, and a post-convolution layer and a pooling layer are arranged after each module;
In step S3, the method for generating the k individuals of the current-generation population according to the individuals' genes is as follows:
judging whether each module exists according to the first part: if the module exists, adding the module to the individual and connecting it, determining the number of convolution-kernel channels of the module according to the second part, and determining the connection pattern of the module's internal nodes according to the fifth part; if the module does not exist, inputting the output of the module's pre-convolution layer directly to the pooling layer, and discarding the content of the module;
and meanwhile, controlling the learning of the sparse center loss according to whether the third part decides that the module's output is input to the attention network, and judging according to the fourth part whether a multi-head attention module is added after the module's output.
Preferably, controlling the learning of the sparse center loss according to whether the third part decides that the module's output is input to the attention network means: for a module whose output is input to the attention network, the output of the module's post-convolution layer is input to a convolution layer, the flattened features are input to the attention network, and then into the learning of the sparse center loss; the output of the module's post-convolution layer is also input directly to the pooling layer;
judging according to the fourth part whether a multi-head attention module is added after the module's output means: for a module to which a multi-head attention module is added, the output of the module's post-convolution layer is input to the multi-head attention module, and the output of the multi-head attention module is input to the pooling layer.
Preferably, the sparse center loss L_SC is calculated as follows:

L_SC = (1/2) Σ_{r=1}^{N} a_r (x_r - c)²

wherein a_r is the attention weight; x_r represents the input features of the sparse center loss; N is the number of input features; c is the feature center of the category corresponding to the input feature; the square and the product with a_r are taken element-wise, and the result is summed over all elements.
The attention weight a_r is calculated as follows:

(s_r^1, s_r^0) = W e_r + b

a_r = exp(s_r^1) / (exp(s_r^1) + exp(s_r^0))

wherein e_r is the feature vector obtained after the input feature x_r passes through several fully connected layers; W and b are parameters learned automatically by the network; s_r^1 represents the probability of including the corresponding element; s_r^0 represents the probability of excluding the corresponding element; the pair (s_r^1, s_r^0) is obtained by mapping e_r to a two-dimensional vector; exp is the exponential function.
Preferably, in the multi-head attention module, let H = {H_1, …, H_z} be the heads of spatial attention and s = {s_1, …, s_z} the corresponding spatial attention feature maps, where z represents the number of attention heads; the spatial attention output by the d-th head can then be expressed as:

s_d = x′ × H_d(w_s, x′),  d ∈ {1, …, z}

wherein x′ is the input feature of the multi-head attention module and w_s is the weight.
Preferably, in step S6, the reward value is calculated as follows:
setting reward values A_1, A_2, A_3, A_4, A_5, with A_1 > A_2 > A_3 and A_4 < A_5; the reward values of the corresponding individuals of the previous-generation population are used as the reward base values of the k parent individuals; the fitness of each of the k parent individuals is compared with the fitness of its corresponding individual (i.e., its ancestor) in the previous-generation population:
if the fitness of the parent individual is greater than the fitness of the corresponding individual of the previous-generation population, and also greater than the fitness of the best among the previous generation's corresponding individuals, then reward value = reward base value + A_1;
if the fitness of the parent individual is greater than the fitness of the corresponding individual of the previous-generation population but less than or equal to the fitness of the best among the previous generation's corresponding individuals, then when the reward base value is greater than 0, reward value = reward base value + A_2; when the reward base value is less than or equal to 0, reward value = reward base value + A_3;
if the fitness of the parent individual is less than the fitness of the corresponding individual of the previous-generation population, then when the reward base value is greater than 0, reward value = reward base value - A_4; when the reward base value is less than or equal to 0, reward value = reward base value - A_5.
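The three reward rules above can be sketched in Python; the handling of exactly equal fitness is not specified in the text, so leaving the base value unchanged in that case is an assumption:

```python
def reward_value(fit_parent, fit_prev, fit_prev_best, base, A1, A2, A3, A4, A5):
    # Reward rules as stated in step S6; requires A1 > A2 > A3 and A4 < A5.
    # fit_prev: fitness of the parent's corresponding individual in the
    # previous generation; fit_prev_best: best fitness among those individuals.
    if fit_parent > fit_prev and fit_parent > fit_prev_best:
        return base + A1                       # beats its ancestor and the best
    if fit_parent > fit_prev:                  # beats its ancestor only
        return base + (A2 if base > 0 else A3)
    if fit_parent < fit_prev:                  # worse than its ancestor
        return base - (A4 if base > 0 else A5)
    return base                                # equal fitness: assumption
```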
Preferably, in step S6, the method for calculating the selection probabilities of the inversion operator and the crossover operator from the reward value by a reinforcement learning algorithm is:
setting a parameter w_i(t) to assist the learning of the reinforcement algorithm, where p_i is the probability that the inversion operator is selected and 1 - p_i is the probability that the crossover operator is selected; the probability p_i is initialized to 0.5 and w_i(t) = 0;
At the t-th iteration, the probability p_i(t) for individual i is updated according to the following formulas:

w_i(t) = w_i(t - 1) + Δw_i

p_i(t) = p_i(t - 1) + Δp_i(t)

wherein Δw_i and Δp_i(t) are the increments produced by the reinforcement learning algorithm; g_i is the probability mass function of the operator choice, i.e., g_i(y_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}; y_i = 1 denotes that the inversion operator is selected, and y_i = 0 denotes that the crossover operator is selected; b_i is the baseline value of reinforcement learning, η_i is the learning rate of reinforcement learning, and r_i(t) is the reward value.
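Since only the accumulation equations are given here, the concrete increments below follow a standard REINFORCE-style update for a Bernoulli operator choice with p_i = sigmoid(w_i); this realization is an assumption, not the patent's exact formula:

```python
import math

def reinforce_update(w, p, y, reward, baseline, lr):
    # One plausible REINFORCE-style realization of the update equations
    # w_i(t) = w_i(t-1) + dw_i and p_i(t) = p_i(t-1) + dp_i(t), assuming
    # p = sigmoid(w) and the Bernoulli pmf g(y; p) = p^y * (1-p)^(1-y).
    grad_log = y - p                          # d ln g / dw at p = sigmoid(w)
    dw = lr * (reward - baseline) * grad_log  # scaled by the reward advantage
    w = w + dw
    p = 1.0 / (1.0 + math.exp(-w))            # keeps p a valid probability
    return w, p
```

With y = 1 (inversion chosen) and a reward above the baseline, the probability of choosing inversion again increases; below the baseline, it decreases.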
A storage medium having stored therein a computer program which, when executed by a processor, causes the processor to perform the expression recognition model generation method of the adaptive attention mechanism described above.
A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the expression recognition model generation method of the adaptive attention mechanism.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method can adaptively design an expression recognition model for expression datasets of different scenarios, removing the need for guidance by human prior knowledge, so that the obtained model better fits the actual requirements of a specific scenario; at the same time, because no manual guidance is needed, the threshold for designing a mature expression recognition model is lowered;
2. The method searches within a limited search space and prunes redundant network connections according to the specific requirements; the number of parameters of the network is, to a certain extent, smaller than that of manually designed expression recognition models, which improves training efficiency;
3. The model found by the method performs well on multiple datasets: it achieves a recognition accuracy of 85.85% on the RAF-DB expression dataset and 86.05% on the FER+ dataset, which is superior to many existing expression recognition models.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a network architecture diagram of an individual example of the method of the present invention;
FIG. 3 is a diagram of the network structure of an individual with a binary code 110-001010-100-101-010101 in the method of the present invention;
FIG. 4 is a diagram of the network configuration of the module's outputs as they are input to the attention network in the method of the present invention;
FIG. 5 is a diagram of an attention network for calculating an attention weight a r in the method of the present invention;
FIG. 6 is a diagram of the network architecture of the method of the present invention when a multi-head attention module is added after the module is output;
FIG. 7 is a block diagram of the overall architecture of a multi-headed attention module in the method of the present invention;
FIG. 8(a) and FIG. 8(b) are internal structure diagrams of the multi-head attention module in the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and specific embodiments.
Example 1
The expression recognition model generation method of the adaptive attention mechanism according to this embodiment is mainly based on designing the expression model with an evolutionary algorithm, on which basis a reinforcement learning algorithm is used to guide the design toward better optimization.
Evolutionary algorithms are inspired by evolutionary phenomena in biology; their main steps are population initialization, parent selection, offspring generation, and so on, and they are search algorithms for solving optimization problems. By binary-encoding the network structure, each network can be regarded as an independent individual whose encoding is its gene. Evolution starts from a population of completely random individuals, and the population changes as offspring are produced through mutation and crossover. An individual's fitness is the recognition accuracy of the expression model on the specific problem: the higher the accuracy, the better the individual is adapted to the problem, and the more likely its genes are to be retained.
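The evolutionary search described above can be sketched as the following skeleton; build, train, evaluate, and mutate are placeholders for the operations of steps S3 to S6 below, not a fixed API:

```python
import random

def evolve(init_genes, build, train, evaluate, mutate, max_iter):
    # Evolutionary search skeleton: individuals are binary genes, fitness is
    # the recognition accuracy returned by evaluate, and fitter individuals
    # are more likely to pass their genes on via roulette-wheel selection.
    population, best = init_genes, None
    for _ in range(max_iter):
        models = [train(build(gene)) for gene in population]
        fitness = [evaluate(model) for model in models]
        for gene, fit in zip(population, fitness):
            if best is None or fit > best[1]:
                best = (gene, fit)            # track the optimum across generations
        parents = random.choices(population, weights=fitness, k=len(population))
        population = [mutate(gene) for gene in parents]
    return best                               # (best gene, best fitness)
```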
The method of the invention is shown in figure 1 and comprises the following steps:
S1, setting the number of individuals in each generation of the population to k; setting the number of modules contained in each individual to m, the number of nodes of the j-th module to n_j (j = 1, …, m), and the selectable interval for the number of output channels of each module; FIG. 2 is a network architecture diagram of an example individual;
The gene length of each individual is L; the gene of each individual comprises the information of the m modules, and the information of each module includes five parts. The first part indicates whether the module exists. The second part represents the number of output channels of the module and can be represented by two binary bits, i.e., 00, 01, 10, 11 represent four channel counts. The third part indicates whether the output of the module is input to the attention network so as to control the learning of the sparse center loss. The fourth part indicates whether a multi-head attention module is added after the module's output. The fifth part represents the connection pattern of the nodes inside the module: the connections can be regarded as a directed acyclic graph in which only a lower-numbered node can feed a higher-numbered node (the node numbered 2 can only receive input from the node numbered 1, and so on), so the internal connections can be encoded with n_j(n_j - 1)/2 binary bits: the first bit indicates whether the node numbered 1 is connected to the node numbered 2, the second and third bits indicate whether the nodes numbered 1 and 2 are connected to the node numbered 3, and so on.
Thus, the gene length of each individual is L = Σ_{j=1}^{m} (5 + n_j(n_j - 1)/2). FIG. 3 is the network structure diagram of an individual with the binary encoding 110-001010-100-101-010101; a pre-convolution layer is arranged before each module, and a post-convolution layer and a pooling layer are arranged after each module.
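As a concrete illustration of the five-part gene described above, the following sketch decodes one individual's gene; laying the fields out contiguously per module is an assumption, since the example string groups its bits field by field:

```python
def split_gene(gene, node_counts):
    # Decode a gene (list of 0/1 bits) into per-module fields: 1 existence
    # bit, 2 output-channel bits, 1 attention-network bit, 1 multi-head
    # attention bit, and n*(n-1)/2 internal-connection bits for n nodes.
    modules, pos = [], 0
    for n in node_counts:
        conn_bits = n * (n - 1) // 2
        exists = gene[pos]
        channels = 2 * gene[pos + 1] + gene[pos + 2]  # 00/01/10/11 -> 4 choices
        to_attention = gene[pos + 3]
        add_mha = gene[pos + 4]
        connections = gene[pos + 5 : pos + 5 + conn_bits]
        modules.append((exists, channels, to_attention, add_mha, connections))
        pos += 5 + conn_bits
    assert pos == len(gene), "gene length must equal the sum of field widths"
    return modules
```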
S2, randomly sampling k binary codes of length L from a Gaussian distribution, and initializing the genes of the individuals of the first-generation population to the k binary codes respectively.
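One plausible reading of sampling binary codes "from a Gaussian distribution" is to threshold standard-normal draws at zero, giving each bit the values 0 and 1 with equal probability; the thresholding step itself is an assumption:

```python
import numpy as np

def init_population(k, L, seed=None):
    # Sample k binary codes of length L by thresholding Gaussian draws at 0;
    # how the patent maps Gaussian samples to bits is not stated, so this
    # sign-threshold rule is our assumption.
    rng = np.random.default_rng(seed)
    return (rng.standard_normal((k, L)) > 0).astype(np.int8)
```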
S3, generating the k individuals of the current-generation population according to the individuals' genes.
To ensure the continuity of the network, a pre-convolution layer is arranged before each module in each individual, and a post-convolution layer and a pooling layer are arranged after each module;
in step S3, the method for generating the k individuals of the current-generation population according to the individuals' genes is as follows:
judging whether each module exists according to the first part: if the module exists, adding the module to the individual and connecting it, determining the number of convolution-kernel channels of the module according to the second part, and determining the connection pattern of the module's internal nodes according to the fifth part; if the module does not exist, inputting the output of the module's pre-convolution layer directly to the pooling layer, and discarding the content of the module;
and the learning of the sparse center loss is controlled according to whether the third part decides that the module's output is input to the attention network: for a module whose output is input to the attention network, the output of the module's post-convolution layer is input to a convolution layer, the flattened features are input to the attention network, and then into the learning of the sparse center loss; the output of the module's post-convolution layer is also input directly to the pooling layer;
As shown in FIG. 4, when the binary bit is 1 the output of the module is input to the attention network, and when the binary bit is 0 it is not; FIG. 4 shows the case where an individual consists of 3 modules and all 3 binary bits are 1. For a more intuitive presentation, FIG. 4 treats both the pre-convolution layer and the post-convolution layer of each module as part of the module.
Considering that the expression recognition task is easily affected by large similarity between different categories and large variation within the same category, the sparse center loss is proposed to improve the performance of the expression recognition model. However, since not all elements of a module's output features are responsible for distinguishing expressions, only a subset of elements needs to be selected for discrimination; the sparse center loss therefore weights the center loss by inputting the features into an attention network, thus better constraining the optimization of the center loss.
The sparse center loss L_SC is calculated as follows:

L_SC = (1/2) Σ_{r=1}^{N} a_r (x_r - c)²

wherein a_r is the attention weight; x_r represents the input features of the sparse center loss; N is the number of input features; c is the feature center of the category corresponding to the input feature; the square and the product with a_r are taken element-wise, and the result is summed over all elements.
The attention network for calculating the attention weight a_r is shown in FIG. 5. The fully connected layers receive the features output by the module; after several fully connected layers the feature vector e_r is obtained, and the multi-head binary classification unit produces two outputs:

(s_r^1, s_r^0) = W e_r + b

wherein e_r is the feature vector obtained after the input feature x_r passes through several fully connected layers; W and b are parameters learned automatically by the network; s_r^1 represents the probability of including the corresponding element; s_r^0 represents the probability of excluding the corresponding element; the pair (s_r^1, s_r^0) is obtained by mapping e_r to a two-dimensional vector.
The attention weight a_r of the corresponding element is:

a_r = exp(s_r^1) / (exp(s_r^1) + exp(s_r^0))

wherein exp is the exponential function.
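Combining the two formulas above, a minimal NumPy sketch of the attention weight and the sparse center loss follows; applying the weight element-wise and summing is our reading of the reconstructed loss:

```python
import numpy as np

def attention_weights(s_in, s_out):
    # a_r = exp(s^1) / (exp(s^1) + exp(s^0)): a softmax over the "include"
    # and "exclude" scores produced by the multi-head binary unit.
    return np.exp(s_in) / (np.exp(s_in) + np.exp(s_out))

def sparse_center_loss(features, centers, labels, a):
    # L_SC = 1/2 * sum_r a_r * (x_r - c)^2, with the attention weight a
    # applied element-wise to the squared distance to the class center.
    diff = features - centers[labels]        # (N, D) residuals to the centers
    return 0.5 * np.sum(a * diff ** 2)
```

With all attention weights equal to 1, the expression reduces to the ordinary center loss.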
Because every hidden-layer feature can potentially contribute to separating a sample from its class center, the method of the present invention uses neural architecture search to adaptively select the modules that need attention. The attention network is also encoded: the encoding determines whether the corresponding module's output is input to the attention network, so an m-bit encoding (one bit per module) is required; when the outputs of several modules are used, the outputs are added together and input to the attention network.
Meanwhile, whether a multi-head attention module is added after the module's output is determined according to the fourth part: for a module to which a multi-head attention module is added, the output of the post-convolution layer arranged after the module is input to the multi-head attention module, and the output of the multi-head attention module is input to the pooling layer;
As shown in FIG. 6, when the binary bit is 1 a multi-head attention module is added after the module's output, and when the binary bit is 0 it is not; FIG. 6 shows the case where an individual consists of 3 modules and the first two binary bits are 1. For a more intuitive presentation, FIG. 6 treats both the pre-convolution layer and the post-convolution layer of each module as part of the module.
The method provides a focusing mechanism that attends to the key parts of the hidden-layer feature matrix rather than to all features, so that different expressions can be distinguished better.
Specifically, the multi-head attention module is composed of a spatial attention module and a channel attention module. The spatial attention module receives the features output from the hidden layer and extracts spatial attention weights; these are then input to the channel attention module to extract channel attention weights; the extracted spatial and channel attention weights are combined with the features output from the hidden layer to serve as the final extracted features. The overall structure is shown in FIG. 7.
The attention weights obtained after the features are input to the attention module are combined with the features themselves by matrix multiplication.
The internal structure of the attention modules is shown in FIG. 8(a) and FIG. 8(b). The spatial attention module constructs 3×3, 1×3, and 3×1 convolution kernels to capture local features at multiple scales, and applies max pooling and average pooling over the spatial dimensions to the feature information extracted by the convolution kernels to obtain the final attention weights. The channel attention module directly applies max pooling and average pooling over the channel dimension to extract the channel information to be attended to, and obtains the final attention weights through a multi-layer perceptron.
In formula terms, let H = {H_1, …, H_z} be the heads of spatial attention and s = {s_1, …, s_z} the corresponding spatial attention feature maps, where z represents the number of attention heads; the spatial attention output by the d-th head can then be expressed as:

s_d = x′ × H_d(w_s, x′),  d ∈ {1, …, z}

wherein x′ is the input feature of the multi-head attention module and w_s is the weight.
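The per-head formula can be sketched as follows; modelling each head H_d as a callable that returns a weight map of matching shape, and realising the product element-wise, are simplifying assumptions:

```python
import numpy as np

def multi_head_spatial_attention(x, heads):
    # s_d = x' x H_d(w_s, x'): each head maps the input feature map x' to an
    # attention map that re-weights x'; heads are callables here, so any
    # learned parameters w_s are assumed to live inside each head.
    return [x * head(x) for head in heads]
```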
Similar to the sparse center loss, the multi-head attention modules are also associated with each module of an individual, so the modules after which a multi-head attention module needs to be added are adaptively selected through the neural architecture search and encoded into m bits (one bit per module) for the search.
S4, inputting the training samples into the k individuals of the current-generation population for training, to obtain k trained individuals.
S5, inputting the test samples into the k trained individuals for testing, to obtain the fitness (i.e., the expression recognition accuracy) of the k individuals of the current-generation population.
S6, judging whether the number of iterations has reached the maximum number of iterations:
if yes, finding the individual with the highest fitness among all generations of the population and setting it as the optimal expression recognition model;
otherwise, repeatedly selecting k individuals from the current-generation population by a roulette-wheel algorithm as the k parent individuals of the next-generation population, where the higher an individual's fitness, the greater its probability of being selected, and mutating the genes of the k parent individuals respectively with a mutation operator that flips each gene bit from 0 to 1 or from 1 to 0 with a certain probability;
The fitness of each of the k parent individuals is compared with the fitness of the individuals of the previous generation population to obtain a reward value. A reinforcement learning algorithm then computes, from the reward value, the selection probabilities of the inversion operator and the crossover operator; the inversion operator or the crossover operator is selected according to these probabilities and applied to each of the k parent genes already changed by the interleaving operator, yielding the k individual genes of the next generation population. The inversion operator reverses the order of a randomly chosen segment of the gene; the crossover operator, respecting the practical meaning of the encoding, replaces a random segment of the individual being processed with the gene at the corresponding position of an individual with higher fitness. The procedure then jumps to step S3 for the next iteration.
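The three operators and the roulette selection above can be sketched on binary genes. This is an illustrative implementation under stated assumptions (flip probability, segment choice, and fitness values are all hypothetical), not the patented encoding.

```python
import random

def roulette_select(population, fitness, k):
    """Roulette-wheel selection with replacement: probability ∝ fitness."""
    return random.choices(population, weights=fitness, k=k)

def interleave(gene, p=0.05):
    """The text's interleaving operator: flip each bit (0↔1) with probability p."""
    return [b ^ 1 if random.random() < p else b for b in gene]

def invert(gene):
    """Inversion operator: reverse a randomly chosen contiguous segment."""
    i, j = sorted(random.sample(range(len(gene)), 2))
    return gene[:i] + gene[i:j + 1][::-1] + gene[j + 1:]

def crossover(gene, fitter_gene):
    """Crossover operator: replace a random segment with the bits at the
    corresponding positions of a fitter individual."""
    i, j = sorted(random.sample(range(len(gene)), 2))
    return gene[:i] + fitter_gene[i:j + 1] + gene[j + 1:]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(16)] for _ in range(4)]
fit = [0.61, 0.58, 0.70, 0.55]               # illustrative accuracies
parents = roulette_select(pop, fit, k=4)
best = pop[fit.index(max(fit))]              # fittest individual
child = crossover(invert(interleave(parents[0])), best)
print(len(child))                            # 16
```

All three operators preserve the gene length L, which is essential because each bit position has a fixed architectural meaning.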
Specifically, the reward value is calculated as follows:
Set reward values A_1, A_2, A_3, A_4, A_5 with A_1 > A_2 > A_3 and A_4 < A_5. The reward values of the corresponding individuals of the previous generation population are used as the reward base values of the k parent individuals. The fitness of each of the k parent individuals is compared with the fitness of the corresponding individual of the previous generation population (i.e., its ancestor):
If the fitness of the parent individual is greater than that of the corresponding individual of the previous generation population, and also greater than the fitness of the best of those corresponding individuals, then reward value = reward base value + A_1;
If the fitness of the parent individual is greater than that of the corresponding individual of the previous generation population, but less than or equal to the fitness of the best of those corresponding individuals, then reward value = reward base value + A_2 when the reward base value is greater than 0, and reward value = reward base value + A_3 when the reward base value is less than or equal to 0;
If the fitness of the parent individual is less than that of the corresponding individual of the previous generation population, then reward value = reward base value − A_4 when the reward base value is greater than 0, and reward value = reward base value − A_5 when the reward base value is less than or equal to 0.
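The branching rule above can be written directly as a function. The constants A_1..A_5 below are illustrative (the text only constrains A_1 > A_2 > A_3 and A_4 < A_5), and the equal-fitness case, which the text leaves unspecified, is treated here like the "less than" branch.

```python
def reward_value(base, parent_fit, prev_fit, best_prev_fit,
                 A1=3.0, A2=2.0, A3=1.0, A4=1.0, A5=2.0):
    """Reward rule from the text; base is the parent's reward base value,
    prev_fit the ancestor's fitness, best_prev_fit the best ancestor fitness."""
    if parent_fit > prev_fit:
        if parent_fit > best_prev_fit:
            return base + A1                   # beat ancestor and the best
        return base + (A2 if base > 0 else A3)  # beat ancestor only
    return base - (A4 if base > 0 else A5)      # failed to improve

# A parent that beats both its ancestor (0.60) and the previous best (0.65):
print(reward_value(0.0, 0.70, 0.60, 0.65))   # 3.0
```

Because the base value carries over between generations, consistently improving lineages accumulate large positive rewards, which the reinforcement learning step below converts into operator preferences.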
The method for calculating the selection probability of the inversion operator and the crossover operator according to the reward value by using the reinforcement learning algorithm is as follows:
Let y_i = 1 denote that the inversion operator is selected and y_i = 0 that the crossover operator is selected. For y_i, the probability mass function g_i is that of a Bernoulli variable:
g_i(y_i) = p_i^{y_i} (1 − p_i)^{1 − y_i}
where p_i is the probability that the inversion operator is selected and 1 − p_i is the probability that the crossover operator is selected;
A parameter w_i(t) is set to assist the learning of the reinforcement algorithm; after the reward of each round of reinforcement learning is obtained, w_i(t) is updated so as to control the update of the selection probability:
At the t-th iteration,
w_i(t) = w_i(t−1) + Δw_i
p_i(t) = p_i(t−1) + Δp_i(t)
where b_i is the baseline value of reinforcement learning, against which the currently obtained reward is compared: if the reward falls below the baseline, the probability of selecting the inversion operator should decrease; if it rises above the baseline, that probability should increase. The baseline is initially set to 0. η_i is the learning rate of reinforcement learning, and r_i(t) is the reward value.
The performance of the two operators cannot be judged at the start, so the probability p_i is initialized to 0.5 and w_i(t) = 0. As learning progresses, the selection probabilities develop a clear preference: the better an operator performs, the larger the reward it yields and the greater the probability of selecting the corresponding operator, and vice versa.
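One round of this operator selection and probability update can be sketched as follows. The patent's exact Δw_i and Δp_i expressions are not reproduced above, so this sketch substitutes the common REINFORCE-style form Δ = η·(r − b)·(y − p); the clipping range is likewise an assumption to keep both operators selectable.

```python
import random

def select_and_update(p, w, r, b, eta=0.1):
    """Sample the inversion operator (y=1) with probability p, else the
    crossover operator (y=0), then nudge p and w toward operators whose
    reward r exceeds the baseline b (stand-in update rule)."""
    y = 1 if random.random() < p else 0       # sample an operator
    delta = eta * (r - b) * (y - p)
    w = w + delta                             # w_i(t) = w_i(t-1) + Δw_i
    p = min(0.95, max(0.05, p + delta))       # p_i(t) = p_i(t-1) + Δp_i, clipped
    return y, p, w

random.seed(2)
p, w, b = 0.5, 0.0, 0.0                       # initial values from the text
for r in (2.0, 2.0, -1.0, 3.0):               # illustrative rewards
    y, p, w = select_and_update(p, w, r, b)
print(0.0 < p < 1.0)                          # True
```

With p initialized to 0.5 the two operators start on equal footing, and the sign of (r − b) decides whether the sampled operator is reinforced or suppressed.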
The present invention introduces a reinforcement learning mechanism to decide which operator to use in each round; the search performance of the operators is fed back to the reinforcement learning algorithm, which adaptively selects the operator to be executed: the better an operator's optimization performance, the more likely it is to be selected.
Embodiment 2
The storage medium of this embodiment stores a computer program which, when executed by a processor, causes the processor to perform the expression recognition model generation method of the adaptive attention mechanism of Embodiment 1.
Embodiment 3
The computing device of this embodiment includes a processor and a memory for storing a program executable by the processor; when the processor executes the program stored in the memory, the expression recognition model generation method of the adaptive attention mechanism of Embodiment 1 is implemented.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be deemed an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. A method for generating an expression recognition model with an adaptive attention mechanism, characterized in that the method comprises the following steps:
S1, setting the number of individuals of each generation of population as k; setting the number of modules contained in each individual as m, the number of nodes of the j-th module as n_j, j = 1, …, m, and a selectable range for the number of output channels of each module;
The gene length of each individual is L; the genes of each individual respectively comprise information corresponding to m modules; the information of each module includes five parts: the first part indicates whether a module exists; the second part represents the number of output channels of the module; the third part represents whether the output of the module is input to the attention network to control the learning of sparse center loss; the fourth part represents whether the multi-head attention module is added after the module is output; the fifth part represents the connection mode of the nodes inside the module;
S2, randomly sampling k binary codes with the length L from Gaussian distribution; respectively initializing the genes of k individuals in the generation 1 population into k binary codes;
s3, generating k individuals of the current generation population according to the genes of the individuals;
S4, respectively inputting training samples of the expression data set into k individuals in the current generation population for training to obtain k trained individuals;
s5, respectively inputting test samples of the expression data set into k trained individuals for testing to obtain expression recognition accuracy of k individuals in the current generation population;
s6, judging whether the iteration number reaches the maximum iteration number or not:
if yes, finding out the individual with the highest expression recognition accuracy among all generations of populations, and setting that individual as the optimal expression recognition model;
Otherwise, repeatedly selecting k individuals from the current generation population by using a roulette algorithm as k parent individuals of the next generation population, and respectively changing genes of the k parent individuals by using an interleaving operator;
comparing the expression recognition accuracy of each of the k parent individuals with the expression recognition accuracy of the individuals of the previous generation population to obtain a reward value; calculating the selection probabilities of the inversion operator and the crossover operator from the reward value using a reinforcement learning algorithm, selecting the inversion operator or the crossover operator according to these probabilities, and applying it to each of the k parent individual genes already changed by the interleaving operator, to obtain the k individual genes of the next generation population; then jumping to step S3 for the next iteration;
S7, performing expression recognition by adopting an optimal expression recognition model.
2. The expression recognition model generation method of the adaptive attention mechanism according to claim 1, wherein: the gene length of each individual
3. The expression recognition model generation method of the adaptive attention mechanism according to claim 1, wherein: in each individual, a front convolution layer is arranged in front of each module, and a rear convolution layer and a pooling layer are arranged behind each module;
In the step S3, the method for generating k individuals of the current generation population according to the genes of each individual is as follows:
judging whether the module exists according to the first part: if the module exists, adding the module into an individual for connection; determining the number of convolution kernel channels of the module according to the second part, and determining the connection mode of the internal nodes of the module according to the fifth part; if the module does not exist, the output of the front convolution layer of the module is directly input to the pooling layer, and the content of the module is not reserved;
and meanwhile, controlling the learning of the sparse center loss according to whether the output of the third part decision module is input into the attention network, and judging whether the multi-head attention module is added after the output of the fourth part decision module.
4. The expression recognition model generation method of the adaptive attention mechanism according to claim 3, wherein controlling the learning of the sparse center loss according to whether the third part decides that the module's output is input to the attention network means: for a module whose output is input to the attention network, the output of the module's rear convolution layer is input to a convolution layer, the flattened features are input to the attention network and then to the learning of the sparse center loss; the output of the module's rear convolution layer is also input directly to the pooling layer;
whether the fourth part decides that a multi-head attention module is added after the module's output means: for a module to which a multi-head attention module is added, the output of the module's rear convolution layer is input to the multi-head attention module, and the output of the multi-head attention module is input to the pooling layer.
5. The expression recognition model generation method of the adaptive attention mechanism according to claim 1, wherein the sparse center loss L_SC is calculated as follows:
where a_r is the attention weight; x_r denotes the input features of the sparse center loss; n is the number of input features; and c is the feature center of the category corresponding to the input feature;
the attention weight a_r is calculated as follows:
where e_r is the feature vector obtained after the input feature x_r passes through several fully connected layers; the remaining quantities are parameters learned automatically by the network and represent, respectively, the probability of retaining the corresponding element and the probability of excluding it; and exp is the exponential function.
6. The expression recognition model generation method of the adaptive attention mechanism according to claim 1, wherein, in the multi-head attention module, H = {H_1, …, H_z} is set as the heads of spatial attention and S = {s_1, …, s_z} as the corresponding spatial attention feature maps, where z denotes the number of attention heads; the spatial attention output by the d-th head can then be expressed as:
s_d = x′ × H_d(w_s, x′), d ∈ {1, …, z}
where x′ is the input feature of the multi-head attention module and w_s is the weight.
7. The expression recognition model generation method of the adaptive attention mechanism according to claim 1, wherein, in step S6, the reward value is calculated as follows:
setting reward values A_1, A_2, A_3, A_4, A_5 with A_1 > A_2 > A_3 and A_4 < A_5; using the reward values of the corresponding individuals of the previous generation population as the reward base values of the k parent individuals; and comparing the expression recognition accuracy of each of the k parent individuals with that of the corresponding individual of the previous generation population:
if the expression recognition accuracy of the parent individual is greater than that of the corresponding individual of the previous generation population, and also greater than that of the best of those corresponding individuals, then reward value = reward base value + A_1;
if the expression recognition accuracy of the parent individual is greater than that of the corresponding individual of the previous generation population, but less than or equal to that of the best of those corresponding individuals, then reward value = reward base value + A_2 when the reward base value is greater than 0, and reward value = reward base value + A_3 when the reward base value is less than or equal to 0;
if the expression recognition accuracy of the parent individual is less than that of the corresponding individual of the previous generation population, then reward value = reward base value − A_4 when the reward base value is greater than 0, and reward value = reward base value − A_5 when the reward base value is less than or equal to 0.
8. The expression recognition model generation method of the adaptive attention mechanism according to claim 1, wherein: in the step S6, the method for calculating the selection probabilities of the inversion operator and the crossover operator according to the reward value by using the reinforcement learning algorithm is as follows:
setting a parameter w_i(t) to assist the learning of the reinforcement algorithm, where p_i is the probability that the inversion operator is selected and 1 − p_i is the probability that the crossover operator is selected; the probability p_i is initialized to 0.5 and w_i(t) = 0;
at the t-th iteration, the probability p_i(t) of individual i is calculated according to the following formulas:
w_i(t) = w_i(t−1) + Δw_i
p_i(t) = p_i(t−1) + Δp_i(t)
where g_i is the probability mass function; y_i = 1 denotes that the inversion operator is selected and y_i = 0 that the crossover operator is selected; b_i is the baseline value of reinforcement learning, η_i is the learning rate of reinforcement learning, and r_i(t) is the reward value.
9. A storage medium having stored therein a computer program which, when executed by a processor, causes the processor to perform the expression recognition model generation method of the adaptive attention mechanism of any one of claims 1-8.
10. A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the expression recognition model generation method of the adaptive attention mechanism of any one of claims 1-8.
CN202210298795.3A 2022-03-24 2022-03-24 Expression recognition model generation method, medium and device of self-adaptive attention mechanism Active CN114863508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210298795.3A CN114863508B (en) 2022-03-24 2022-03-24 Expression recognition model generation method, medium and device of self-adaptive attention mechanism

Publications (2)

Publication Number Publication Date
CN114863508A CN114863508A (en) 2022-08-05
CN114863508B true CN114863508B (en) 2024-08-06

Family

ID=82629647


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740795B (en) * 2023-08-16 2023-11-24 天津师范大学 Expression recognition method, model and model training method based on attention mechanism

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465120A (en) * 2020-12-08 2021-03-09 上海悠络客电子科技股份有限公司 Fast attention neural network architecture searching method based on evolution method
CN114004153A (en) * 2021-11-01 2022-02-01 河海大学 Penetration depth prediction method based on multi-source data fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414849B (en) * 2020-03-19 2020-12-29 四川大学 Face recognition method based on evolution convolutional neural network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant