WO2020233368A1 - Expression recognition model training method and apparatus, and device and storage medium - Google Patents

Expression recognition model training method and apparatus, and device and storage medium

Info

Publication number
WO2020233368A1
WO2020233368A1 (PCT/CN2020/087605)
Authority
WO
WIPO (PCT)
Prior art keywords
image
sub
original
training
resolution
Prior art date
Application number
PCT/CN2020/087605
Other languages
French (fr)
Chinese (zh)
Inventor
王丽杰
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2020233368A1 publication Critical patent/WO2020233368A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Definitions

  • This application relates to the field of image processing technology, in particular to an expression recognition model training method, device, equipment and storage medium.
  • The facial expression recognition model is used to recognize facial expressions. Facial expression recognition refers to assigning an expression category to a given facial image, such as anger, disgust, happiness, sadness, fear, or surprise.
  • The inventor realizes that facial expression recognition technology is showing broad application prospects in fields such as human-computer interaction, clinical diagnosis, remote education, and investigation and interrogation, and is a popular research direction in computer vision and artificial intelligence.
  • The facial expression recognition model needs to be trained in advance.
  • The training images used by existing expression recognition model training methods have the same or similar resolutions or tones, so the trained expression recognition model can only accurately recognize expression images within a fixed resolution or tone range; reducing the resolution or changing the tone of the same expression image lowers the recognition accuracy of the model.
  • The main purpose of this application is to solve the technical problems that labeling training images in existing expression recognition model training methods is time-consuming and laborious, and that the recognition accuracy of the trained expression recognition model is easily affected by the resolution and tone of the expression image.
  • An expression recognition model training method includes: acquiring an original training image set, where the original training image set includes a plurality of labeled original training images; performing the following processing on the original training image set: reducing the resolution of each original training image in the original training image set to obtain a first type of training image set; rendering the background light of each original training image in the original training image set to obtain a second type of training image set; and reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third type of training image set; and training an expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.
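The three augmented sets described above can be sketched in Python/NumPy. This is a hedged illustration, not the patent's implementation: all function names are hypothetical, block-averaging is a stand-in for the patent's deep-neural-network resolution reduction, and a simple colour-tint blend stands in for the unspecified background-light rendering.

```python
import numpy as np

def reduce_resolution(img, factor=2):
    """Downscale an H x W x 3 image by block-averaging, then upscale back
    by pixel repetition, giving a blurrier image of (nearly) the same size.
    Stand-in for the patent's deep-neural-network resolution reduction."""
    h, w, c = img.shape
    h2, w2 = h // factor * factor, w // factor * factor
    small = img[:h2, :w2].reshape(h2 // factor, factor,
                                  w2 // factor, factor, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def render_background_light(img, tint=(1.0, 0.9, 0.7), strength=0.3):
    """Blend a warm colour cast over the image to simulate a change of
    background lighting/tone (hypothetical rendering method)."""
    tinted = img * np.asarray(tint)
    return np.clip((1 - strength) * img + strength * tinted, 0.0, 1.0)

def build_training_sets(originals):
    """Derive the three augmented sets from already-labeled originals;
    labels carry over unchanged, so no new manual labeling is needed."""
    first = [reduce_resolution(x) for x in originals]            # lower resolution
    second = [render_background_light(x) for x in originals]     # re-lit background
    third = [render_background_light(reduce_resolution(x)) for x in originals]
    return first, second, third
```

Because each augmented image inherits the label of its original, the expression recognition model can then be trained on all four sets without extra annotation work.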
  • Based on the same technical concept, this application also provides an expression recognition model training device, which includes an acquisition module for acquiring the original training image set.
  • The original training image set includes a plurality of labeled original training images.
  • The processing module is configured to perform the following processing on the original training image set obtained by the acquisition module: reduce the resolution of each original training image in the original training image set to obtain the first type of training image set; render the background light of each original training image in the original training image set to obtain the second type of training image set; and reduce the resolution of each original training image in the original training image set and render the background light of each original training image to obtain the third type of training image set.
  • The processing module is further configured to train the expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.
  • Based on the same technical concept, the present application also provides a computer device, including an input/output unit, a memory, and a processor.
  • The memory stores computer-readable instructions that, when executed by the processor, cause the processor to execute the steps of the above expression recognition model training method.
  • Based on the same technical concept, the present application also provides a storage medium storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above expression recognition model training method.
  • This application obtains multiple types of new training images by adjusting features such as the clarity or background tone of the original training images.
  • The new training images do not need to be manually labeled, which enriches the training sample image set of the facial expression recognition model.
  • FIG. 1 is a schematic flowchart of a method for training an expression recognition model in an embodiment of the application.
  • Fig. 2 is a schematic structural diagram of an expression recognition model training device in an embodiment of the application.
  • Fig. 3 is a schematic structural diagram of a computer device in an embodiment of the application.
  • FIG. 1 is a flowchart of an expression recognition model training method in some embodiments of the application.
  • The expression recognition model training method is executed by an expression recognition model training device.
  • The expression recognition model training device may be a computer or similar device. As shown in FIG. 1, the method can include the following steps S1-S3:
  • The original training image set includes a plurality of labeled original training images.
  • The original training images are manually labeled training sample images used to train the facial expression recognition model.
  • The number of training sample images required for expression recognition model training is very large.
  • The traditional method of labeling training sample images is to manually label them one by one, which consumes a lot of time and labor.
  • A deep neural network model is used to reduce the resolution of each original training image in the original training image set.
  • The deep neural network model is generated by the model generation device according to low-resolution image samples, image conversion algorithms, and a deep neural network framework.
  • The deep neural network model includes a plurality of nonlinear conversion convolutional layers that alternately use different parameter matrices as convolution template parameters.
  • Before step S1, the method further includes the following steps S01-S03:
  • S01. Segment the low-resolution image samples into multiple low-resolution sub-image samples to enrich the set of low-resolution image samples.
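Step S01 amounts to tiling each low-resolution sample into patches. A minimal sketch with NumPy; `segment_into_subimages` is a hypothetical helper, and since the patent does not specify a tile size or overlap, non-overlapping fixed-size tiles are assumed here.

```python
import numpy as np

def segment_into_subimages(img, tile=16):
    """Split a grayscale image (H x W) into non-overlapping tile x tile
    sub-image samples; ragged borders are dropped for simplicity."""
    h, w = img.shape
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(img[y:y + tile, x:x + tile])
    return tiles
```

One image then yields many sub-image samples, which is exactly how the sample set gets enriched.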
  • S02. Perform image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  • Step S02 includes the following steps S021-S024:
  • Total variation (TV) is often used for image restoration.
  • Image decomposition based on total variation splits a low-resolution sub-image sample into a cartoon part and a texture part.
  • The cartoon part captures the structural information of the low-resolution sub-image sample: pixel values change significantly only at object boundaries, change little inside objects, and the image is smooth.
  • The texture part captures the detail of the low-resolution sub-image sample, where pixel values change significantly.
  • the expression of the image total variation algorithm is:
  • (x p , y p ) represents the current central pixel in the low-resolution sub-image sample;
  • (x q , y q ) represents the total variational pixel of (x p , y p );
  • T g is the preset threshold;
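The total-variation expression itself is not reproduced in this text, but the cartoon/texture split it performs can be sketched with a simple ROF-style gradient flow. This is a hedged stand-in, not the patent's formula: the smoothed result is taken as the cartoon part and the residual as the texture part.

```python
import numpy as np

def tv_decompose(img, iters=100, step=0.1, weight=0.1):
    """Small ROF-style total-variation smoothing: iterate the gradient flow
    u_t = div(grad u / |grad u|) - weight * (u - img); the smoothed u is the
    'cartoon' part and img - u is the 'texture' part."""
    u = img.astype(float).copy()
    for _ in range(iters):
        # forward differences (gradient)
        ux = np.roll(u, -1, axis=1) - u
        uy = np.roll(u, -1, axis=0) - u
        norm = np.sqrt(ux**2 + uy**2 + 1e-8)
        px, py = ux / norm, uy / norm
        # divergence of the normalized gradient (curvature term)
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u += step * (div - weight * (u - img))
    cartoon = u
    texture = img - u
    return cartoon, texture
```

By construction the two parts sum back to the input, matching the decomposition described above.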
  • S022. Use an interpolation algorithm to enlarge the cartoon sub-image sample to obtain an enlarged cartoon sub-image sample.
  • The pixel points of the cartoon sub-image sample are interpolated using an interpolation template function to obtain the enlarged cartoon sub-image sample.
  • Image interpolation belongs to the prior art, and will not be repeated here.
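Step S022's interpolation-based enlargement can be sketched with plain bilinear interpolation in NumPy. The patent's "interpolation template function" is not specified, so bilinear weights are a hedged substitute.

```python
import numpy as np

def bilinear_enlarge(img, scale=2):
    """Enlarge a grayscale image by bilinear interpolation: each output
    pixel is a weighted average of its four nearest input pixels."""
    h, w = img.shape
    H, W = h * scale, w * scale
    ys = np.linspace(0, h - 1, H)
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (H, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, W)
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

Smooth interpolation suits the cartoon part well, since by construction its pixel values vary little away from object boundaries.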
  • The homotopy method describes the "continuous change" between two objects in topology: if one topological space can be transformed into the other through a series of continuous deformations, the two topological spaces are said to be homotopic.
  • Step S023 includes the following steps: use a dictionary training algorithm to obtain an image block dictionary of the texture sub-image sample; use the image block dictionary and an orthogonal matching pursuit method to enlarge the texture sub-image sample to obtain an initial high-resolution sub-image; perform nearest-neighbor edge-addition processing on the initial high-resolution sub-image to obtain an edged high-resolution sub-image; perform a first homotopy processing on the edged high-resolution sub-image to obtain a first edged high-resolution sub-image; and perform a second homotopy processing on the first edged high-resolution sub-image to obtain the enlarged texture sub-image sample.
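The "orthogonal matching tracking" named in this step is the orthogonal matching pursuit (OMP) algorithm. A minimal generic OMP sketch follows (not the patent's exact procedure); it assumes a dictionary `D` with unit-norm columns and greedily selects the atoms that best explain the signal.

```python
import numpy as np

def orthogonal_matching_pursuit(D, y, n_nonzero=3):
    """Greedy OMP: approximate y as a sparse combination of dictionary
    atoms (columns of D, assumed unit-norm)."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares fit of y on the selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef
```

In the patent's setting, each texture patch would be coded this way against the low-resolution block dictionary, and the same coefficients applied to the high-resolution block dictionary to synthesize the enlarged patch.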
  • The image block dictionary includes a high-resolution image block dictionary and a low-resolution image block dictionary.
  • The dictionary training algorithm is a K-SVD dictionary training algorithm.
  • K-SVD is a classic dictionary training algorithm.
  • In K-SVD, the error term is decomposed by singular value decomposition (SVD), and the decomposition term that minimizes the error is selected as the updated dictionary atom and its corresponding atomic coefficients; after continuous iteration, an optimized dictionary is obtained.
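The K-SVD atom update described above can be sketched as follows. `ksvd_update_atom` is a hypothetical helper showing one dictionary-update step; the sparse-coding stage (e.g. OMP) that K-SVD alternates with is omitted.

```python
import numpy as np

def ksvd_update_atom(D, X, Y, k):
    """One K-SVD dictionary-update step for atom k: restrict to the signals
    that actually use atom k, form the error with atom k's contribution
    removed, and take the rank-1 SVD approximation of that error as the
    new atom and its coefficients."""
    users = np.nonzero(X[k])[0]
    if users.size == 0:
        return D, X  # unused atom: nothing to update
    # error over the users, with atom k's contribution added back
    E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]            # updated atom (unit norm)
    X[k, users] = s[0] * Vt[0]   # updated coefficients
    return D, X
```

Because the rank-1 SVD is the best rank-1 fit of `E`, the reconstruction error can only decrease (or stay equal) at each such update, which is why iterating over all atoms converges toward an optimized dictionary.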
  • The expression of image synthesis is:
  • f_H is the high-resolution sub-image sample;
  • f_t is the enlarged texture sub-image sample;
  • f_c is the enlarged cartoon sub-image sample;
  • G(f_t) is the modulus of the Roberts gradient of the image f_t;
  • λ_1 is a constant greater than zero.
  • The high-resolution sub-image sample is the image obtained from the low-resolution sub-image sample after resolution conversion.
  • Background light rendering of an image is a process of toning the image background; it belongs to the prior art and will not be repeated here.
  • The facial expression recognition model is used to recognize the micro-expressions of people in facial images, such as happy, sad, fearful, angry, surprised, and disgusted.
  • The obtained first-type, second-type, and third-type training images are new images for the expression recognition model. Therefore, based on the already-annotated original training images, the obtained first-type, second-type, and third-type training image sets do not need to be manually labeled, which enriches the training sample image set of the expression recognition model.
  • After step S3, the method further includes the following steps S4-S7:
  • The original test image set includes a plurality of original test images.
  • The original test images are used to test the accuracy of facial image recognition by the trained expression recognition model.
  • S5. Perform the following processing on the original test image set: reduce the resolution of each original test image in the original test image set to obtain the first type of test image set; render the background light of each original test image in the original test image set to obtain the second type of test image set; and reduce the resolution of each original test image in the original test image set and render the background light of each original test image to obtain the third type of test image set.
  • The resolution reduction and background light rendering of the original test images are performed in the same way as for the original training images, and will not be repeated here.
  • The trained expression recognition model recognizes the first-type test image set and outputs a recognition result for each first-type test image. Each recognition result is compared with a preset comparison result: if they are consistent, the recognition result output by the expression recognition model is determined to be correct; otherwise, it is determined to be wrong. The number of accurately recognized first-type test images is recorded, and this number is divided by the total number of first-type test images to obtain the recognition accuracy of the expression recognition model on the first-type test image set.
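The accuracy bookkeeping in this step is simply the correct count divided by the total. A minimal sketch in Python; `model_predict` stands in for the trained model's inference function and is an assumption, not part of the patent.

```python
def recognition_accuracy(predictions, labels):
    """Fraction of test images whose predicted expression matches the label."""
    assert len(predictions) == len(labels)
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

def accuracy_per_set(model_predict, test_sets, labels):
    """Accuracy of the trained model on each named test image set.
    Assumes the augmented sets share the labels of the original set,
    since augmentation does not change the expression category."""
    return {name: recognition_accuracy([model_predict(x) for x in imgs], labels)
            for name, imgs in test_sets.items()}
```

Running this over the original, first-type, second-type, and third-type test sets yields the four per-set accuracies used to evaluate the model.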
  • By adjusting characteristics of the original training images such as sharpness or background hue, multiple types of new training images are obtained.
  • The new training images do not need to be manually labeled, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images.
  • In addition, statistics on the recognition accuracy of the expression recognition model for the original, first-type, second-type, and third-type training images provide a basis for evaluating the actual effect of the expression recognition model.
  • Based on the same technical concept, this application also provides an expression recognition model training device, which can be used to enrich the training image set and improve the efficiency of expression recognition model training.
  • The device in the embodiment of the present application can implement the steps of the expression recognition model training method performed in the embodiment corresponding to FIG. 1.
  • The functions realized by the device can be realized by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware.
  • The device includes an acquisition module 1 and a processing module 2.
  • The processing module 2 can be used to control the receiving and sending operations of the acquisition module 1.
  • The acquisition module 1 is used to acquire the original training image set.
  • The original training image set includes a plurality of labeled original training images.
  • The processing module 2 is configured to perform the following processing on the original training image set acquired by the acquisition module 1: reduce the resolution of each original training image in the original training image set to obtain the first type of training image set; render the background light of each original training image in the original training image set to obtain the second type of training image set; and reduce the resolution of each original training image in the original training image set and render the background light of each original training image to obtain the third type of training image set.
  • The processing module 2 is also configured to train the expression recognition model through the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set.
  • The acquisition module 1 is also used to acquire an original test image set; the original test image set includes a plurality of original test images; the original test images are used to test the accuracy of facial image recognition by the trained expression recognition model.
  • The processing module 2 is also configured to perform the following processing on the original test image set acquired by the acquisition module 1: reduce the resolution of each original test image in the original test image set to obtain the first type of test image set; render the background light of each original test image in the original test image set to obtain the second type of test image set; and reduce the resolution of each original test image in the original test image set and render the background light of each original test image to obtain the third type of test image set.
  • The trained expression recognition model then recognizes the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  • The processing module 2 is also used to separately count the recognition accuracy of the trained expression recognition model on the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  • A deep neural network model is used to reduce the resolution of each original training image in the original training image set.
  • The processing module 2 is further configured to use high-resolution sub-image samples as input samples of the deep neural network framework and low-resolution sub-image samples as output comparison samples of the deep neural network framework to generate the deep neural network model.
  • The high-resolution sub-image sample is the image obtained from the low-resolution sub-image sample after resolution conversion.
  • The processing module 2 is further used to divide the low-resolution image samples into multiple low-resolution sub-image samples, and to perform image conversion on the low-resolution sub-image samples using an image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  • The processing module 2 is specifically configured to: decompose the low-resolution sub-image samples using an image total variation algorithm to obtain cartoon sub-image samples and texture sub-image samples; use an interpolation algorithm to enlarge the cartoon sub-image samples to obtain enlarged cartoon sub-image samples; use the homotopy method to enlarge the texture sub-image samples to obtain enlarged texture sub-image samples; and synthesize the enlarged cartoon sub-image samples and the enlarged texture sub-image samples to obtain high-resolution sub-image samples.
  • the expression of the image total variation algorithm is:
  • (x p , y p ) represents the current central pixel in the low-resolution sub-image sample;
  • (x q , y q ) represents the total variational pixel of (x p , y p );
  • T g is the preset threshold;
  • The processing module 2 is specifically configured to: use a dictionary training algorithm to obtain an image block dictionary of the texture sub-image sample; use the image block dictionary and an orthogonal matching pursuit method to enlarge the texture sub-image sample to obtain an initial high-resolution sub-image; perform nearest-neighbor edge-addition processing on the initial high-resolution sub-image to obtain an edged high-resolution sub-image; perform a first homotopy processing on the edged high-resolution sub-image to obtain a first edged high-resolution sub-image; and perform a second homotopy processing on the first edged high-resolution sub-image to obtain the enlarged texture sub-image sample.
  • By adjusting characteristics of the original training images such as sharpness or background hue, multiple types of new training images are obtained.
  • The new training images do not need to be manually labeled, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images.
  • In addition, statistics on the recognition accuracy of the expression recognition model for the original, first-type, second-type, and third-type training images provide a basis for evaluating the actual effect of the expression recognition model.
  • Based on the same technical concept, the present application also provides a computer device. As shown in FIG. 3, the computer device includes an input/output unit 31, a processor 32, and a memory 33.
  • The memory 33 stores computer-readable instructions; when the computer-readable instructions are executed by the processor 32, the processor executes the steps of the expression recognition model training method in the foregoing embodiments.
  • The physical device corresponding to the acquisition module 1 shown in FIG. 2 is the input/output unit 31 shown in FIG. 3, which can realize part or all of the functions of the acquisition module 1, or realize the same or similar functions as the acquisition module 1.
  • The physical device corresponding to the processing module 2 shown in FIG. 2 is the processor 32 shown in FIG. 3, which can implement part or all of the functions of the processing module 2, or implement the same or similar functions as the processing module 2.
  • Based on the same technical concept, the present application also provides a storage medium storing computer-readable instructions.
  • The computer-readable storage medium may be non-volatile or volatile.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the expression recognition model training method in the foregoing embodiments.
  • The methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; they can also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions used to make a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) execute the method described in each embodiment of the present application.

Abstract

Disclosed are an expression recognition model training method and apparatus, and a device and a storage medium, relating to the technical field of artificial intelligence. The method comprises: respectively carrying out the following processing on an original training image set: reducing the resolution of the original training image set to obtain a first-type training image set; rendering background light of the original training image set to obtain a second-type training image set; and reducing the resolution of the original training image set, and rendering the background light of the original training image set to obtain a third-type training image set (S2); and training an expression recognition model by means of the original training image set, the first-type training image set, the second-type training image set and the third-type training image set (S3). By adjusting features such as the definition or background tone of an original training image, multiple types of new training images are obtained, manual marking processing does not need to be carried out on the new training images, and a training sample image set of an expression recognition model is enriched.

Description

Expression recognition model training method, device, equipment and storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 22, 2019, with application number 201910427443.1 and the invention title "Expression Recognition Model Training Method, Apparatus, Equipment, and Storage Medium", the entire content of which is incorporated by reference in this application.

Technical Field

This application relates to the field of image processing technology, and in particular to an expression recognition model training method, device, equipment, and storage medium.

Background

The facial expression recognition model is used to recognize facial expressions. Facial expression recognition refers to assigning an expression category to a given facial image, such as anger, disgust, happiness, sadness, fear, or surprise. The inventor realizes that facial expression recognition technology is showing broad application prospects in fields such as human-computer interaction, clinical diagnosis, remote education, and investigation and interrogation, and is a popular research direction in computer vision and artificial intelligence. The facial expression recognition model needs to be trained in advance. During training, a large number of training images must be collected manually, and each training image must then be manually labeled by type according to its dimensional information, so labeling training images is time-consuming and laborious. In addition, the training images used by existing expression recognition model training methods have the same or similar resolutions or tones, so the trained expression recognition model can only accurately recognize expression images within a fixed resolution or tone range; reducing the resolution or changing the tone of the same expression image lowers the recognition accuracy of the model.

Summary of the Invention

The main purpose of this application is to solve the technical problems that labeling training images in existing expression recognition model training methods is time-consuming and laborious, and that the recognition accuracy of the trained expression recognition model is easily affected by the resolution and tone of the expression image.
An expression recognition model training method includes: acquiring an original training image set, where the original training image set includes a plurality of labeled original training images; performing the following processing on the original training image set: reducing the resolution of each original training image in the original training image set to obtain a first type of training image set; rendering the background light of each original training image in the original training image set to obtain a second type of training image set; and reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third type of training image set; and training an expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.

Based on the same technical concept, this application also provides an expression recognition model training device, including: an acquisition module for acquiring the original training image set, where the original training image set includes a plurality of labeled original training images; and a processing module configured to perform the following processing on the original training image set acquired by the acquisition module: reduce the resolution of each original training image in the original training image set to obtain the first type of training image set; render the background light of each original training image in the original training image set to obtain the second type of training image set; and reduce the resolution of each original training image in the original training image set and render the background light of each original training image to obtain the third type of training image set. The processing module is further configured to train the expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.

Based on the same technical concept, this application also provides a computer device, including an input/output unit, a memory, and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the processor to execute the steps of the above expression recognition model training method.

Based on the same technical concept, this application also provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above expression recognition model training method.

This application obtains multiple types of new training images by adjusting features such as the clarity or background tone of the original training images. The new training images do not need to be manually labeled, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images. In addition, training the expression recognition model with training sample images of various definitions and background tones improves its recognition accuracy.
附图说明Description of the drawings
图1为本申请实施例中表情识别模型训练方法的流程示意图。FIG. 1 is a schematic flowchart of a method for training an expression recognition model in an embodiment of the application.
图2为本申请实施例中表情识别模型训练装置的结构示意图。Fig. 2 is a schematic structural diagram of an expression recognition model training device in an embodiment of the application.
图3为本申请实施例中计算机设备的结构示意图。Fig. 3 is a schematic structural diagram of a computer device in an embodiment of the application.
具体实施方式Detailed Description
应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。It should be understood that the specific embodiments described herein are only used to explain the application, and not used to limit the application.
本技术领域技术人员可以理解,除非特意声明,这里使用的单数形式“一”、“一个”、“所述”和“该”也可以包括复数形式。应该进一步理解的是,本申请的说明书中使用的措辞“包括”是指存在所述特征、程序、步骤、操作、元件和/或组件,但是并不排除存在或添加一个或多个其他特征、程序、步骤、操作、元件、组件和/或它们的组。Those skilled in the art can understand that unless specifically stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the term "comprising" used in the specification of this application refers to the presence of the features, procedures, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, Procedures, steps, operations, elements, components, and/or groups of them.
图1为本申请一些实施方式中一种表情识别模型训练方法的流程图，该表情识别模型训练方法由表情识别模型训练设备执行，表情识别模型训练设备可以是电脑等设备，如图1所示，可以包括以下步骤S1-S3：FIG. 1 is a flowchart of an expression recognition model training method in some embodiments of this application. The method is executed by an expression recognition model training device, which may be a computer or similar equipment. As shown in FIG. 1, the method may include the following steps S1-S3:
S1、获取原训练图像集合。S1. Obtain the original training image set.
所述原训练图像集合包括多个已标注的原训练图像。The original training image set includes a plurality of labeled original training images.
原训练图像为人工标注好的训练样本图像,用于训练表情识别模型。表情识别模型训练所需的训练样本图像的数量很大,传统的训练样本图像打标方式为采用人工一一对训练样本图像进行标注,消耗很多的时间以及人力成本。The original training image is a manually labeled training sample image, which is used to train the facial expression recognition model. The number of training sample images required for expression recognition model training is very large. The traditional method of marking training sample images is to manually label training sample images one by one, which consumes a lot of time and labor costs.
S2、对所述原训练图像集合分别进行以下处理：降低所述原训练图像集合中的各原训练图像的分辨率，得到第一类训练图像集合；渲染所述原训练图像集合中的各原训练图像的背景光线，得到第二类训练图像集合；降低所述原训练图像集合中的各原训练图像的分辨率，并且渲染各原训练图像的背景光线，得到第三类训练图像集合。S2. Perform the following processing on the original training image set: reduce the resolution of each original training image in the original training image set to obtain a first type of training image set; render the background light of each original training image in the original training image set to obtain a second type of training image set; and reduce the resolution of each original training image in the original training image set while rendering the background light of each original training image to obtain a third type of training image set.
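下面给出步骤S2所述三类增强处理的一个示意性草图（非本申请原文内容：以二维列表表示灰度图，用2×2均值降采样近似“降低分辨率”，用整体亮度偏移近似“渲染背景光线”，函数名与参数均为示例假设）。A minimal illustrative sketch of the three augmentations in step S2, assuming grayscale images stored as 2D lists; 2x2-average downsampling stands in for resolution reduction and a global brightness shift stands in for background-light rendering:

```python
def reduce_resolution(img):
    """Halve the resolution by 2x2 averaging, then upscale back with
    nearest-neighbor so the image keeps its original dimensions."""
    h, w = len(img), len(img[0])
    small = [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
               img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) // 4
              for c in range(w // 2)] for r in range(h // 2)]
    return [[small[r // 2][c // 2] for c in range(w)] for r in range(h)]

def render_background(img, shift=30):
    """Crude stand-in for background-light rendering: a global brightness shift."""
    return [[min(255, p + shift) for p in row] for row in img]

def augment(labeled_images):
    """Derive the three new training sets; labels carry over unchanged."""
    first, second, third = [], [], []
    for img, label in labeled_images:
        first.append((reduce_resolution(img), label))
        second.append((render_background(img), label))
        third.append((render_background(reduce_resolution(img)), label))
    return labeled_images, first, second, third
```

由于三类处理均不改变标注，新图像可直接复用原标签。Because none of the three operations changes the annotation, the new images reuse the original labels directly.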
一些实施方式中,采用深度神经网络模型降低所述原训练图像集合中的各原训练图像的分辨率。In some embodiments, a deep neural network model is used to reduce the resolution of each original training image in the original training image set.
所述深度神经网络模型由模型生成设备根据低分辨率图像样本、图像转换算法以及深度神经网络框架生成。所述深度神经网络模型包括交替采用不同参数矩阵作为卷积模板参数的多个非线性转换卷积层。The deep neural network model is generated by the model generation device according to low-resolution image samples, image conversion algorithms, and a deep neural network framework. The deep neural network model includes a plurality of nonlinear conversion convolutional layers alternately using different parameter matrices as convolution template parameters.
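“交替采用不同参数矩阵作为卷积模板参数的多个非线性转换卷积层”这一结构可用如下草图示意（卷积核数值为示例假设，并非本申请训练得到的参数）。The layer structure described above can be sketched as follows; the two 3x3 kernels are invented placeholders, not the patent's trained parameters:

```python
def conv2d(img, kernel):
    """Valid (no-padding) 2D convolution of a 2D list with a square kernel."""
    h, w, k = len(img), len(img[0]), len(kernel)
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(w - k + 1)] for r in range(h - k + 1)]

def relu(img):
    """Nonlinear transformation applied after each convolutional layer."""
    return [[max(0.0, v) for v in row] for row in img]

KERNEL_A = [[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]]      # smoothing-like
KERNEL_B = [[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]]  # Laplacian-like

def alternating_network(img, num_layers=4):
    """Stack conv layers that alternate between two parameter matrices."""
    x = img
    for layer in range(num_layers):
        x = relu(conv2d(x, KERNEL_A if layer % 2 == 0 else KERNEL_B))
    return x
```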
在步骤S1之前,该方法还包括以下步骤S01-S03:Before step S1, the method further includes the following steps S01-S03:
S01、将低分辨率图像样本分割为多个低分辨率子图像样本。S01. Divide the low-resolution image sample into multiple low-resolution sub-image samples.
对低分辨率图像样本进行分割,以丰富低分辨率图像样本集。Segment low-resolution image samples to enrich the set of low-resolution image samples.
S02、采用图像转换算法对低分辨率子图像样本进行图像转换,得到低分辨率子图像样本对应的高分辨率子图像样本。S02. Perform image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
一些实施方式中,步骤S02包括以下步骤S021-S024:In some embodiments, step S02 includes the following steps S021-S024:
S021、采用图像全变分算法对低分辨率子图像样本进行分解,得到卡通子图像样本和纹理子图像样本。S021. Decompose the low-resolution sub-image samples using the image total variation algorithm to obtain cartoon sub-image samples and texture sub-image samples.
全变分（Total Variation）也称为全变差，常用于图像复原。Total variation (TV), also known as total variation regularization, is commonly used in image restoration.
采用图像分解把低分辨率子图像样本分解为卡通(cartoon)和纹理(texture)部分。其中,卡通部分提取的是低分辨率子图像样本的结构信息,像素点值只在物体交界处有较大变化,在物体内部的像素点值变化小,图像平滑。纹理部分提取的是低分辨率子图像样本的细节部分,其中的像素点值的变化较大。Image decomposition is used to decompose low-resolution sub-image samples into cartoon and texture parts. Among them, the cartoon part extracts the structural information of the low-resolution sub-image samples. The pixel value only changes greatly at the boundary of the object, and the pixel value inside the object changes little, and the image is smooth. The texture part extracts the detailed part of the low-resolution sub-image sample, in which the pixel value changes greatly.
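卡通/纹理分解的思想可用如下草图示意（此处以3×3均值平滑近似代替全变分分解，仅作说明，非本申请的算法本身）。An illustrative sketch of the cartoon/texture split; a 3x3 box filter stands in here for the total-variation decomposition, so cartoon + texture reconstructs the original exactly:

```python
def box_smooth(img):
    """3x3 mean filter: a crude stand-in for the cartoon (structure) part."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

def decompose(img):
    """Split an image into a smooth cartoon part and a residual texture part."""
    cartoon = box_smooth(img)
    texture = [[img[r][c] - cartoon[r][c] for c in range(len(img[0]))]
               for r in range(len(img))]
    return cartoon, texture
```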
一些实施方式中,所述图像全变分算法的表达式为:In some embodiments, the expression of the image total variation algorithm is:
[公式图像：PCTCN2020087605-appb-000001（原文中该全变分表达式以图像形式给出）]
其中，(x_p, y_p)表示低分辨率子图像样本中当前中心像素点；(x_q, y_q)表示(x_p, y_p)的全变分的像素点；图像PCTCN2020087605-appb-000002所示的量是(x_p, y_p)和(x_q, y_q)所在物体内的像素值的方差；c_{p,q}为相乘因子；T_g为预设阈值（另见图像PCTCN2020087605-appb-000003）。
[Formula image PCTCN2020087605-appb-000001: the total-variation expression is given only as an image in the original.] Here, (x_p, y_p) denotes the current center pixel of the low-resolution sub-image sample; (x_q, y_q) denotes a total-variation pixel of (x_p, y_p); the quantity shown in image PCTCN2020087605-appb-000002 is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q); c_{p,q} is a multiplication factor; and T_g is a preset threshold (see also image PCTCN2020087605-appb-000003).
S022、采用插值算法对所述卡通子图像样本进行放大,得到放大后的卡通子图像样本。S022: Use an interpolation algorithm to enlarge the cartoon sub-image sample to obtain an enlarged cartoon sub-image sample.
对所述卡通子图像样本的像素点采用插值模板函数进行插值,得到所述放大后的卡通子图像样本。图像插值属于现有技术,在此不再赘述。The pixel points of the cartoon sub-image sample are interpolated using an interpolation template function to obtain the enlarged cartoon sub-image sample. Image interpolation belongs to the prior art, and will not be repeated here.
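插值放大可用如下最近邻插值草图示意（本申请的插值模板函数未在此具体给出，下面的实现为示例假设）。A nearest-neighbor upscaling sketch stands in for the interpolation-based enlargement; the patent's actual interpolation template function is not specified here:

```python
def upscale_nearest(img, factor=2):
    """Enlarge a 2D list by an integer factor with nearest-neighbor interpolation."""
    h, w = len(img), len(img[0])
    return [[img[r // factor][c // factor] for c in range(w * factor)]
            for r in range(h * factor)]
```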
S023、采用同伦法对所述纹理子图像样本进行放大,得到放大后的纹理子图像样本。S023: Use the homotopy method to enlarge the texture sub-image sample to obtain an enlarged texture sub-image sample.
同伦（Homotopy Method）法在拓扑上描述了两个对象间的“连续变化”，两个拓扑空间如果可以通过一系列连续的形变从一个变到另一个，那么就称这两个拓扑空间同伦。The homotopy method describes a "continuous change" between two objects in topology: if one topological space can be deformed into another through a series of continuous deformations, the two spaces are said to be homotopic.
一些实施方式中，步骤S023包括以下步骤：采用字典训练算法得到所述纹理子图像样本的图像块字典；采用所述图像块字典和正交匹配跟踪方法对所述纹理子图像样本进行放大，得到初始高分辨率子图像；对所述初始高分辨率子图像进行最近邻的加边处理，得到加边高分辨子图像；对所述加边高分辨子图像进行第一次同伦处理，得到第一加边高分辨率子图像；对所述第一加边高分辨率子图像进行第二次同伦处理，得到所述放大后的纹理子图像样本。In some embodiments, step S023 includes the following steps: obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm; enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image; performing nearest-neighbor edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image; performing a first homotopy process on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and performing a second homotopy process on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
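上述步骤中的正交匹配跟踪（即正交匹配追踪，OMP）可用如下示意实现（基于NumPy的说明性草图，并非本申请的具体实现）。The orthogonal matching pursuit step above can be sketched as follows (an illustrative NumPy version, not the patent's implementation):

```python
import numpy as np

def omp(D, x, sparsity):
    """Orthogonal matching pursuit: greedily pick the dictionary atom most
    correlated with the residual, then re-fit all chosen coefficients by
    least squares before updating the residual."""
    residual = x.astype(float).copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(sparsity):
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0          # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs
```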
所述图像块字典包括高分辨率图像块字典和低分辨率图像块字典。The image block dictionary includes a high-resolution image block dictionary and a low-resolution image block dictionary.
可选地，字典训练算法为K-SVD字典训练算法。K-SVD是一种经典的字典训练算法，依据误差最小原则，对误差项进行奇异值分解（Singular Value Decomposition，SVD），选择使误差最小的分解项作为更新的字典原子和对应的原子系数，经过不断的迭代从而得到优化的解。Optionally, the dictionary training algorithm is the K-SVD dictionary training algorithm. K-SVD is a classic dictionary training algorithm: following the minimum-error principle, it performs singular value decomposition (SVD) on the error term and selects the decomposition that minimizes the error as the updated dictionary atom and its corresponding atomic coefficients, iterating until an optimized solution is obtained.
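K-SVD中单个字典原子的更新步骤可示意如下（基于NumPy的说明性草图：对去除该原子贡献后的误差项做SVD，以最优秩一近似同时更新原子及其系数）。A sketch of one K-SVD atom update, following the minimum-error principle described above:

```python
import numpy as np

def ksvd_atom_update(D, A, X, k):
    """Update dictionary atom k and its coefficients from the rank-1 SVD of
    the error term computed without atom k's contribution."""
    omega = np.nonzero(A[k, :])[0]                 # signals that use atom k
    if omega.size == 0:
        return D, A
    E = X - D @ A + np.outer(D[:, k], A[k, :])     # error ignoring atom k
    U, S, Vt = np.linalg.svd(E[:, omega], full_matrices=False)
    D[:, k] = U[:, 0]                              # updated dictionary atom
    A[k, omega] = S[0] * Vt[0, :]                  # updated atomic coefficients
    return D, A
```

每次这样的更新都不会增大重构误差，对各原子迭代即可逐步得到优化的解。Each such update never increases the reconstruction error, so iterating over the atoms moves toward an optimized solution.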
S024、对所述放大后的卡通子图像样本和所述放大后的纹理子图像样本进行合成,得到高分辨率子图像样本。S024. Synthesize the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
一些实施方式中,图像合成的表达式为:In some embodiments, the expression of image synthesis is:
f_H = f_c + f_t + λ_1·G(f_t)
其中，f_H为所述高分辨率子图像样本，f_t为放大后的纹理子图像样本，f_c为放大后的卡通子图像样本，G(f_t)为对图像f_t求Robert梯度的模值，λ_1为大于0的常数。Here f_H is the high-resolution sub-image sample, f_t is the enlarged texture sub-image sample, f_c is the enlarged cartoon sub-image sample, G(f_t) is the modulus of the Roberts gradient of the image f_t, and λ_1 is a constant greater than 0.
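该合成步骤可用如下草图示意。由于原文表达式仅部分可辨，此处采用 f_H = f_c + f_t + λ_1·G(f_t) 这一假设读法，G为Robert梯度的模值。A hypothetical sketch of the synthesis step; since the original expression is only partially legible, the form f_H = f_c + f_t + λ_1·G(f_t) used here is an assumed reading:

```python
def roberts_gradient(img):
    """Modulus of the Roberts cross gradient of a 2D list image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h - 1):
        for c in range(w - 1):
            gx = img[r][c] - img[r + 1][c + 1]
            gy = img[r][c + 1] - img[r + 1][c]
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def synthesize(cartoon, texture, lam=0.5):
    """Assumed synthesis: f_H = f_c + f_t + lam * G(f_t)."""
    g = roberts_gradient(texture)
    h, w = len(cartoon), len(cartoon[0])
    return [[cartoon[r][c] + texture[r][c] + lam * g[r][c]
             for c in range(w)] for r in range(h)]
```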
上述实施方式,通过创建具有非线性转换卷积层的深度神经网络模型,提高了将高分辨率图像转换为低分辨率图像的准确性。In the above-mentioned embodiment, by creating a deep neural network model with a nonlinear conversion convolutional layer, the accuracy of converting a high-resolution image into a low-resolution image is improved.
S03、以高分辨率子图像样本作为深度神经网络框架的输入样本,以低分辨率子图像样本作为所述深度神经网络框架的输出对比样本,生成所述深度神经网络模型。S03. Use high-resolution sub-image samples as input samples of the deep neural network framework, and use low-resolution sub-image samples as output comparison samples of the deep neural network framework to generate the deep neural network model.
高分辨率子图像样本为低分辨率子图像样本分辨率转化后的图像。The high-resolution sub-image sample is an image after resolution conversion of the low-resolution sub-image sample.
此外,图像的背景光线渲染是对图像的背景进行调色的过程,属于现有技术,在此不再赘述。In addition, the background light rendering of the image is a process of toning the background of the image, which belongs to the prior art and will not be repeated here.
S3、分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型。S3. Train an expression recognition model through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection respectively.
表情识别模型用于识别人脸图像中人物的微表情,如快乐、伤心、恐惧、愤怒、惊讶和厌恶等微表情。The facial expression recognition model is used to recognize the micro-expressions of people in facial images, such as happy, sad, fearful, angry, surprised and disgusted.
本实施例中，对原训练图像的清晰度或背景色调等特征进行调整，均不改变原训练图像的维度结果。而所得到的第一类训练图像、第二类训练图像、第三类训练图像对于表情识别模型而言，却是新的图像。所以基于已做标注的原训练图像，所得到的第一类训练图像集合、第二类训练图像集合、第三类训练图像集合无需再做人工打标处理，且丰富了表情识别模型的训练样本图像集。In this embodiment, adjusting features of the original training images such as sharpness or background tone does not change the dimensional results of the original training images. To the expression recognition model, however, the resulting first, second, and third types of training images are new images. Therefore, the first, second, and third types of training image sets derived from the already-labeled original training images require no additional manual labeling, and they enrich the training sample image set of the expression recognition model.
一些实施方式中,在步骤S3之后,该方法还包括以下步骤S4-S7:In some embodiments, after step S3, the method further includes the following steps S4-S7:
S4、获取原测试图像集合。S4. Obtain the original test image collection.
所述原测试图像集合包括多个原测试图像。原测试图像用于测试训练后的表情识别模型对人脸图像识别的准确率。The original test image set includes a plurality of original test images. The original test image is used to test the accuracy of facial image recognition by the trained expression recognition model.
S5、对所述原测试图像集合分别进行以下处理：降低所述原测试图像集合中的各原测试图像的分辨率，得到第一类测试图像集合；渲染所述原测试图像集合中的各原测试图像的背景光线，得到第二类测试图像集合；降低所述原测试图像集合中的各原测试图像的分辨率，并且渲染各原测试图像的背景光线，得到第三类测试图像集合。S5. Perform the following processing on the original test image set: reduce the resolution of each original test image in the original test image set to obtain a first type of test image set; render the background light of each original test image in the original test image set to obtain a second type of test image set; and reduce the resolution of each original test image in the original test image set while rendering the background light of each original test image to obtain a third type of test image set.
所述原测试图像的分辨率和背景光线的处理过程与前述原训练图像的分辨率和背景光线的处理过程相同,在此不再赘述。The resolution of the original test image and the processing process of the background light are the same as the resolution of the original training image and the processing process of the background light, which will not be repeated here.
S6、通过训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合进行识别。S6. Recognizing the original test image set, the first type test image set, the second type test image set, and the third type test image set through the trained expression recognition model.
S7、分别统计所述训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合识别的准确率。S7. Separately count the recognition accuracy of the trained expression recognition model on the original test image set, the first type of test image set, the second type of test image set, and the third type of test image set.
以所述第一类测试图像集合为例，训练后的表情识别模型对所述第一类测试图像集合进行识别，输出每个第一类测试图像的识别结果；将识别结果与预设比对结果进行比较，若识别结果与预设比对结果一致，则判定表情识别模型输出的识别结果正确，否则，判定表情识别模型输出的识别结果错误。记录被准确识别的第一类测试图像的数量，将被准确识别的第一类测试图像的数量与第一类测试图像的总量做除法运算，得到表情识别模型对所述第一类测试图像集合识别的准确率。Taking the first type of test image set as an example, the trained expression recognition model recognizes the first type of test image set and outputs a recognition result for each first-type test image. Each recognition result is compared with a preset reference result: if they are consistent, the recognition result output by the expression recognition model is judged correct; otherwise it is judged incorrect. The number of correctly recognized first-type test images is recorded and divided by the total number of first-type test images to obtain the recognition accuracy of the expression recognition model on the first type of test image set.
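上述“正确数除以总数”的准确率统计可示意如下（标签为虚构示例）。The accuracy bookkeeping above (correct count divided by total) can be sketched as follows, with invented labels:

```python
def recognition_accuracy(predictions, references):
    """Compare each model output with its preset reference result and return
    the fraction judged correct."""
    assert len(predictions) == len(references)
    correct = sum(1 for p, ref in zip(predictions, references) if p == ref)
    return correct / len(references)
```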
上述实施例中，通过对原训练图像的清晰度或背景色调等特征进行调整，得到多类新的训练图像，新的训练图像无需再做人工打标处理，丰富了表情识别模型的训练样本图像集，大大降低了训练样本图像打标作业所消耗的时间和人力成本。此外，统计所述表情识别模型对原训练图像、所述第一类训练图像、所述第二类训练图像、所述第三类训练图像识别的准确率，为评估表情识别模型的实际效果提供依据。In the above embodiments, multiple new types of training images are obtained by adjusting features of the original training images such as sharpness or background tone. The new training images require no additional manual labeling, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images. In addition, counting the recognition accuracy of the expression recognition model on the original training images and the first, second, and third types of training images provides a basis for evaluating the actual effect of the model.
基于相同的技术构思,本申请还提供了一种表情识别模型训练装置,其可用于丰富训练图像集,提高表情识别模型训练的效率。本申请实施例中的装置能够实现对应于上述图1所对应的实施例中所执行的表情识别模型训练的方法的步骤。该装置实现的功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。硬件或软件包括一个或多个与上述功能相对应的模块,所述模块可以是软件和/或硬件。如图2所示,该装置包括获取模块1和处理模块2。所述处理模块2和获取模块2的功能实现可参考图1所对应的实施例中所执行的操作,此处不作赘述。所述处理模块2可用于控制所述获取模块1的收发操作。Based on the same technical concept, this application also provides an expression recognition model training device, which can be used to enrich the training image set and improve the efficiency of expression recognition model training. The device in the embodiment of the present application can implement the steps corresponding to the method for training an expression recognition model performed in the embodiment corresponding to FIG. 1. The functions realized by the device can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware. As shown in Figure 2, the device includes an acquisition module 1 and a processing module 2. For the functional realization of the processing module 2 and the acquisition module 2, reference may be made to the operations performed in the embodiment corresponding to FIG. 1, which will not be repeated here. The processing module 2 can be used to control the receiving and sending operations of the acquiring module 1.
所述获取模块1，用于获取原训练图像集合。The acquisition module 1 is configured to acquire the original training image set.
所述原训练图像集合包括多个已标注的原训练图像。The original training image set includes a plurality of labeled original training images.
所述处理模块2，用于对所述获取模块1所获取的所述原训练图像集合分别进行以下处理：降低所述原训练图像集合中的各原训练图像的分辨率，得到第一类训练图像集合；渲染所述原训练图像集合中的各原训练图像的背景光线，得到第二类训练图像集合；降低所述原训练图像集合中的各原训练图像的分辨率，并且渲染各原训练图像的背景光线，得到第三类训练图像集合。The processing module 2 is configured to perform the following processing on the original training image set acquired by the acquisition module 1: reducing the resolution of each original training image in the original training image set to obtain a first type of training image set; rendering the background light of each original training image in the original training image set to obtain a second type of training image set; and reducing the resolution of each original training image in the original training image set while rendering the background light of each original training image to obtain a third type of training image set.
所述处理模块2还用于分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型。The processing module 2 is also configured to train an expression recognition model through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection.
一些实施方式中，所述获取模块1还用于获取原测试图像集合；所述原测试图像集合包括多个原测试图像；原测试图像用于测试训练后的表情识别模型对人脸图像识别的准确率。In some embodiments, the acquisition module 1 is further configured to acquire an original test image set; the original test image set includes a plurality of original test images; the original test images are used to test the accuracy of facial image recognition by the trained expression recognition model.
所述处理模块2还用于对所述获取模块1所获取的所述原测试图像集合分别进行以下处理：降低所述原测试图像集合中的各原测试图像的分辨率，得到第一类测试图像集合；渲染所述原测试图像集合中的各原测试图像的背景光线，得到第二类测试图像集合；降低所述原测试图像集合中的各原测试图像的分辨率，并且渲染各原测试图像的背景光线，得到第三类测试图像集合；通过训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合进行识别。The processing module 2 is further configured to perform the following processing on the original test image set acquired by the acquisition module 1: reducing the resolution of each original test image in the original test image set to obtain a first type of test image set; rendering the background light of each original test image in the original test image set to obtain a second type of test image set; reducing the resolution of each original test image in the original test image set while rendering the background light of each original test image to obtain a third type of test image set; and recognizing the original test image set and the first, second, and third types of test image sets through the trained expression recognition model.
所述处理模块2还用于分别统计所述训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合识别的准确率。The processing module 2 is further configured to separately count the recognition accuracy of the trained expression recognition model on the original test image set, the first type of test image set, the second type of test image set, and the third type of test image set.
一些实施方式中,采用深度神经网络模型降低所述原训练图像集合中的各原训练图像的分辨率。In some embodiments, a deep neural network model is used to reduce the resolution of each original training image in the original training image set.
所述处理模块2还用于以高分辨率子图像样本作为深度神经网络框架的输入样本，以低分辨率子图像样本作为所述深度神经网络框架的输出对比样本，生成所述深度神经网络模型；高分辨率子图像样本为低分辨率子图像样本分辨率转化后的图像。The processing module 2 is further configured to use high-resolution sub-image samples as input samples of a deep neural network framework and low-resolution sub-image samples as output comparison samples of the deep neural network framework to generate the deep neural network model; a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
一些实施方式中，所述处理模块2还用于将低分辨率图像样本分割为多个低分辨率子图像样本；采用图像转换算法对低分辨率子图像样本进行图像转换，得到低分辨率子图像样本对应的高分辨率子图像样本。In some embodiments, the processing module 2 is further configured to divide a low-resolution image sample into multiple low-resolution sub-image samples, and to perform image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
一些实施方式中，所述处理模块2具体用于采用图像全变分算法对低分辨率子图像样本进行分解，得到卡通子图像样本和纹理子图像样本；采用插值算法对所述卡通子图像样本进行放大，得到放大后的卡通子图像样本；采用同伦法对所述纹理子图像样本进行放大，得到放大后的纹理子图像样本；对所述放大后的卡通子图像样本和所述放大后的纹理子图像样本进行合成，得到高分辨率子图像样本。In some embodiments, the processing module 2 is specifically configured to: decompose a low-resolution sub-image sample into a cartoon sub-image sample and a texture sub-image sample by using an image total variation algorithm; enlarge the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample; enlarge the texture sub-image sample by using the homotopy method to obtain an enlarged texture sub-image sample; and synthesize the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
一些实施方式中,所述图像全变分算法的表达式为:In some embodiments, the expression of the image total variation algorithm is:
[公式图像：PCTCN2020087605-appb-000004（原文中该全变分表达式以图像形式给出）]
其中，(x_p, y_p)表示低分辨率子图像样本中当前中心像素点；(x_q, y_q)表示(x_p, y_p)的全变分的像素点；图像PCTCN2020087605-appb-000005所示的量是(x_p, y_p)和(x_q, y_q)所在物体内的像素值的方差；c_{p,q}为相乘因子；T_g为预设阈值（另见图像PCTCN2020087605-appb-000006）。
[Formula image PCTCN2020087605-appb-000004: the total-variation expression is given only as an image in the original.] Here, (x_p, y_p) denotes the current center pixel of the low-resolution sub-image sample; (x_q, y_q) denotes a total-variation pixel of (x_p, y_p); the quantity shown in image PCTCN2020087605-appb-000005 is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q); c_{p,q} is a multiplication factor; and T_g is a preset threshold (see also image PCTCN2020087605-appb-000006).
一些实施方式中，所述处理模块2具体用于采用字典训练算法得到所述纹理子图像样本的图像块字典；采用所述图像块字典和正交匹配跟踪方法对所述纹理子图像样本进行放大，得到初始高分辨率子图像；对所述初始高分辨率子图像进行最近邻的加边处理，得到加边高分辨子图像；对所述加边高分辨子图像进行第一次同伦处理，得到第一加边高分辨率子图像；对所述第一加边高分辨率子图像进行第二次同伦处理，得到所述放大后的纹理子图像样本。In some embodiments, the processing module 2 is specifically configured to: obtain an image block dictionary of the texture sub-image sample by using a dictionary training algorithm; enlarge the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image; perform nearest-neighbor edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image; perform a first homotopy process on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and perform a second homotopy process on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
上述实施例中，通过对原训练图像的清晰度或背景色调等特征进行调整，得到多类新的训练图像，新的训练图像无需再做人工打标处理，丰富了表情识别模型的训练样本图像集，大大降低训练样本图像打标作业所消耗的时间和人力成本。此外，统计所述表情识别模型对原训练图像、所述第一类训练图像、所述第二类训练图像、所述第三类训练图像识别的准确率，为评估表情识别模型的实际效果提供依据。In the above embodiments, multiple new types of training images are obtained by adjusting features of the original training images such as sharpness or background tone. The new training images require no additional manual labeling, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images. In addition, counting the recognition accuracy of the expression recognition model on the original training images and the first, second, and third types of training images provides a basis for evaluating the actual effect of the model.
基于相同的技术构思，本申请还提供了一种计算机设备，如图3所示，该计算机设备包括输入输出单元31、处理器32和存储器33，所述存储器33中存储有计算机可读指令，所述计算机可读指令被所述处理器32执行时，使得所述处理器执行上述各实施方式中的所述的表情识别模型训练方法的步骤。Based on the same technical concept, the present application further provides a computer device. As shown in FIG. 3, the computer device includes an input/output unit 31, a processor 32, and a memory 33. The memory 33 stores computer-readable instructions that, when executed by the processor 32, cause the processor to perform the steps of the expression recognition model training method in the foregoing embodiments.
图2中所示的获取模块1对应的实体设备为图3所示的输入输出单元31，该输入输出单元31能够实现获取模块1部分或全部的功能，或者实现与获取模块1相同或相似的功能。The physical device corresponding to the acquisition module 1 shown in FIG. 2 is the input/output unit 31 shown in FIG. 3. The input/output unit 31 can implement some or all of the functions of the acquisition module 1, or implement the same or similar functions as the acquisition module 1.
图2中所示的处理模块2对应的实体设备为图3所示的处理器32,该处理器32能够实现处理模块2部分或全部的功能,或者实现与处理模块2相同或相似的功能。The physical device corresponding to the processing module 2 shown in FIG. 2 is the processor 32 shown in FIG. 3, and the processor 32 can implement part or all of the functions of the processing module 2 or implement the same or similar functions as the processing module 2.
基于相同的技术构思,本申请还提供了一种存储有计算机可读指令的存储介质,所述计算机可读存储介质可以是非易失性,也可以是易失性。所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行上述各实施方式中的所述的表情识别模型训练方法的步骤。Based on the same technical concept, the present application also provides a storage medium storing computer-readable instructions. The computer-readable storage medium may be non-volatile or volatile. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the expression recognition model training method in the foregoing embodiments.
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质（如ROM/RAM）中，包括若干指令用以使得一台终端（可以是手机，计算机，服务器或者网络设备等）执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
上面结合附图对本申请的实施例进行了描述，但是本申请并不局限于上述的具体实施方式，上述的具体实施方式仅仅是示意性的，而不是限制性的，本领域的普通技术人员在本申请的启示下，在不脱离本申请宗旨和权利要求所保护的范围情况下，还可做出很多形式，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，这些均属于本申请的保护之内。The embodiments of this application have been described above with reference to the accompanying drawings, but this application is not limited to the above specific implementations, which are merely illustrative rather than restrictive. Under the inspiration of this application, those of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope protected by the claims. Any equivalent structure or equivalent process transformation made by using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, falls within the protection of this application.

Claims (21)

  1. 一种表情识别模型训练方法,其中,包括:An expression recognition model training method, which includes:
    获取原训练图像集合;所述原训练图像集合包括多个已标注的原训练图像;Acquiring an original training image set; the original training image set includes a plurality of labeled original training images;
    对所述原训练图像集合分别进行以下处理:Perform the following processing on the original training image set:
    降低所述原训练图像集合中的各原训练图像的分辨率,得到第一类训练图像集合;Reducing the resolution of each original training image in the original training image set to obtain the first type of training image set;
    渲染所述原训练图像集合中的各原训练图像的背景光线,得到第二类训练图像集合;Rendering the background light of each original training image in the original training image set to obtain the second type of training image set;
    降低所述原训练图像集合中的各原训练图像的分辨率,并且渲染各原训练图像的背景光线,得到第三类训练图像集合;Reducing the resolution of each original training image in the original training image set, and rendering the background light of each original training image, to obtain a third type of training image set;
    分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型。The expression recognition model is trained through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection.
  2. 根据权利要求1所述的表情识别模型训练方法,其中,The expression recognition model training method according to claim 1, wherein:
    在所述分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型之后,所述方法还包括:After the expression recognition model is trained through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection respectively, the method further includes:
    获取原测试图像集合;所述原测试图像集合包括多个原测试图像;原测试图像用于测试训练后的表情识别模型对人脸图像识别的准确率;Acquiring an original test image set; the original test image set includes a plurality of original test images; the original test images are used to test the accuracy of facial image recognition by the expression recognition model after training;
    对所述原测试图像集合分别进行以下处理:Perform the following processing on the original test image set:
    降低所述原测试图像集合中的各原测试图像的分辨率,得到第一类测试图像集合;Reducing the resolution of each original test image in the original test image set to obtain the first type of test image set;
    渲染所述原测试图像集合中的各原测试图像的背景光线,得到第二类测试图像集合;Rendering the background light of each original test image in the original test image set to obtain the second type of test image set;
    降低所述原测试图像集合中的各原测试图像的分辨率,并且渲染各原测试图像的背景光线,得到第三类测试图像集合;Reducing the resolution of each original test image in the original test image set, and rendering the background light of each original test image to obtain the third type of test image set;
    通过训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合进行识别;Recognizing the original test image collection, the first type test image collection, the second type test image collection, and the third type test image collection through the trained expression recognition model;
    分别统计所述训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合识别的准确率。Count the recognition accuracy rates of the original test image set, the first type test image set, the second type test image set, and the third type test image set by the trained expression recognition model respectively.
  3. The expression recognition model training method according to claim 1, wherein a deep neural network model is used to reduce the resolution of each original training image in the original training image set; and
    before the acquiring of the original training image set, the method further comprises:
    generating the deep neural network model by using high-resolution sub-image samples as input samples of a deep neural network framework and using low-resolution sub-image samples as output comparison samples of the deep neural network framework, wherein a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
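The generation step in claim 3 pairs each high-resolution sub-image sample (input) with its low-resolution counterpart (output comparison sample). As a toy sketch under assumed flattened shapes, a single linear layer trained by gradient descent stands in for the unspecified deep neural network framework:

```python
import numpy as np

def train_downscale_model(high_samples, low_samples, lr=0.05, epochs=500):
    """Fit W so that high_samples @ W approximates the low-resolution targets.

    high_samples: (n, d_high) flattened high-resolution sub-images (inputs)
    low_samples:  (n, d_low) flattened low-resolution sub-images (comparison outputs)
    """
    n, d_high = high_samples.shape
    d_low = low_samples.shape[1]
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(d_high, d_low))
    for _ in range(epochs):
        pred = high_samples @ W
        grad = high_samples.T @ (pred - low_samples) / n  # gradient of mean squared error
        W -= lr * grad
    return W
```

A real implementation would replace the linear map with a multi-layer network, but the input/target pairing is the same.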
  4. The expression recognition model training method according to claim 3, wherein, before the generating of the deep neural network model by using the high-resolution sub-image samples as the input samples of the deep neural network framework and using the low-resolution sub-image samples as the output comparison samples of the deep neural network framework, the method further comprises:
    dividing a low-resolution image sample into a plurality of low-resolution sub-image samples; and
    performing image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
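The dividing step in claim 4 can be sketched as non-overlapping tiling. The tile size is an assumption here; the patent does not specify how the sub-images are cut:

```python
def split_into_subimages(image, tile_h, tile_w):
    """Split a 2-D image (a list of rows) into non-overlapping tile_h x tile_w blocks.

    Assumes the image dimensions are exact multiples of the tile size."""
    h, w = len(image), len(image[0])
    tiles = []
    for top in range(0, h, tile_h):
        for left in range(0, w, tile_w):
            tiles.append([row[left:left + tile_w] for row in image[top:top + tile_h]])
    return tiles
```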
  5. The expression recognition model training method according to claim 4, wherein the performing of image conversion on the low-resolution sub-image samples by using the image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples comprises:
    decomposing a low-resolution sub-image sample by using an image total variation algorithm to obtain a cartoon sub-image sample and a texture sub-image sample;
    enlarging the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample;
    enlarging the texture sub-image sample by using a homotopy method to obtain an enlarged texture sub-image sample; and
    synthesizing the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
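The four steps of claim 5 compose as decompose → enlarge each component → synthesize. The following structural sketch uses a box blur as a crude stand-in for the total variation decomposition and nearest-neighbour repetition as a stand-in for both the interpolation and homotopy enlargements; the patent's actual operators are not reproduced here:

```python
import numpy as np

def box_blur(img, k=3):
    # Crude smoothing stand-in for the TV "cartoon" extraction.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def upscale_nn(img, factor=2):
    # Nearest-neighbour enlargement stand-in.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def super_resolve(low_res, factor=2):
    cartoon = box_blur(low_res)            # smooth structure component
    texture = low_res - cartoon            # residual detail component
    big_cartoon = upscale_nn(cartoon, factor)
    big_texture = upscale_nn(texture, factor)
    return big_cartoon + big_texture       # synthesis of the two enlarged parts
```

The point of the real decomposition is that cartoon and texture components tolerate different enlargement methods (interpolation preserves smooth regions; the dictionary-based homotopy step preserves fine texture), which a single upscaler cannot do for both.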
  6. The expression recognition model training method according to claim 5, wherein the expression of the image total variation algorithm is:
    Figure PCTCN2020087605-appb-100001
    where (x_p, y_p) denotes the current central pixel in the low-resolution sub-image sample, (x_q, y_q) denotes a pixel in the total variation neighbourhood of (x_p, y_p),
    Figure PCTCN2020087605-appb-100002
    is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q), c_{p,q} is a multiplication factor, T_g is a preset threshold, and
    Figure PCTCN2020087605-appb-100003
  7. The expression recognition model training method according to claim 5, wherein the enlarging of the texture sub-image sample by using the homotopy method to obtain the enlarged texture sub-image sample comprises:
    obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm;
    enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image;
    performing nearest-neighbour edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image;
    performing a first homotopy processing on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and
    performing a second homotopy processing on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
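The second step of claim 7 reconstructs each texture patch as a sparse combination of dictionary atoms via orthogonal matching pursuit. A compact OMP sketch over an assumed column-normalized dictionary `D`; the dictionary training step and the two homotopy refinements are omitted:

```python
import numpy as np

def omp(D, y, n_atoms):
    """Greedy orthogonal matching pursuit: approximate y with n_atoms columns of D."""
    residual = y.copy()
    chosen = []
    for _ in range(n_atoms):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in chosen:
            chosen.append(idx)
        # Least-squares fit on the chosen atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, chosen], y, rcond=None)
        residual = y - D[:, chosen] @ coef
    x = np.zeros(D.shape[1])
    x[chosen] = coef
    return x
```

In a full pipeline, `y` would be a low-resolution texture patch and the reconstruction coefficients would be applied to the corresponding high-resolution dictionary to produce the initial high-resolution sub-image.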
  8. A computer device, comprising an input/output unit, a memory, and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the following steps:
    acquiring an original training image set, the original training image set comprising a plurality of labeled original training images;
    performing the following processing on the original training image set respectively:
    reducing the resolution of each original training image in the original training image set to obtain a first-type training image set;
    rendering the background light of each original training image in the original training image set to obtain a second-type training image set;
    reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third-type training image set; and
    training an expression recognition model with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively.
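The three derived training sets in claim 8 come from two basic transforms applied alone and in combination. A sketch with naive stand-ins: 2x subsampling for resolution reduction and a uniform brightness shift for background-light rendering (the patent's DNN-based downscaler and its light-rendering method are not reproduced here):

```python
import numpy as np

def lower_resolution(img):
    # Keep every second pixel: a crude stand-in for the DNN-based downscaling.
    return img[::2, ::2]

def render_light(img, offset=40):
    # Uniform brightness shift as a stand-in for background-light rendering.
    return np.clip(img.astype(int) + offset, 0, 255).astype(img.dtype)

def build_training_sets(originals):
    first = [lower_resolution(im) for im in originals]                 # resolution only
    second = [render_light(im) for im in originals]                    # light only
    third = [render_light(lower_resolution(im)) for im in originals]   # both
    return first, second, third
```

Training on the original set plus all three derived sets exposes the model to the degradations it will meet at recognition time.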
  9. The computer device according to claim 8, wherein, after the expression recognition model is trained with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively, the processor further executes the following steps:
    acquiring an original test image set, the original test image set comprising a plurality of original test images, wherein the original test images are used to test the accuracy of the trained expression recognition model in recognizing face images;
    performing the following processing on the original test image set respectively:
    reducing the resolution of each original test image in the original test image set to obtain a first-type test image set;
    rendering the background light of each original test image in the original test image set to obtain a second-type test image set;
    reducing the resolution of each original test image in the original test image set and rendering the background light of each original test image to obtain a third-type test image set;
    recognizing the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set with the trained expression recognition model; and
    separately counting the recognition accuracy of the trained expression recognition model on the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  10. The computer device according to claim 8, wherein a deep neural network model is used to reduce the resolution of each original training image in the original training image set; and
    before the acquiring of the original training image set, the processor further executes the following step:
    generating the deep neural network model by using high-resolution sub-image samples as input samples of a deep neural network framework and using low-resolution sub-image samples as output comparison samples of the deep neural network framework, wherein a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
  11. The computer device according to claim 10, wherein, before the generating of the deep neural network model by using the high-resolution sub-image samples as the input samples of the deep neural network framework and using the low-resolution sub-image samples as the output comparison samples of the deep neural network framework, the processor further executes the following steps:
    dividing a low-resolution image sample into a plurality of low-resolution sub-image samples; and
    performing image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  12. The computer device according to claim 11, wherein the performing of image conversion on the low-resolution sub-image samples by using the image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples comprises:
    decomposing a low-resolution sub-image sample by using an image total variation algorithm to obtain a cartoon sub-image sample and a texture sub-image sample;
    enlarging the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample;
    enlarging the texture sub-image sample by using a homotopy method to obtain an enlarged texture sub-image sample; and
    synthesizing the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
  13. The computer device according to claim 12, wherein the expression of the image total variation algorithm is:
    Figure PCTCN2020087605-appb-100004
    where (x_p, y_p) denotes the current central pixel in the low-resolution sub-image sample, (x_q, y_q) denotes a pixel in the total variation neighbourhood of (x_p, y_p),
    Figure PCTCN2020087605-appb-100005
    is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q), c_{p,q} is a multiplication factor, T_g is a preset threshold, and
    Figure PCTCN2020087605-appb-100006
  14. The computer device according to claim 13, wherein the enlarging of the texture sub-image sample by using the homotopy method to obtain the enlarged texture sub-image sample comprises:
    obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm;
    enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image;
    performing nearest-neighbour edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image;
    performing a first homotopy processing on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and
    performing a second homotopy processing on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
  15. A storage medium storing computer-readable instructions, wherein, when executed by one or more processors, the computer-readable instructions cause the one or more processors to execute the following steps:
    acquiring an original training image set, the original training image set comprising a plurality of labeled original training images;
    performing the following processing on the original training image set respectively:
    reducing the resolution of each original training image in the original training image set to obtain a first-type training image set;
    rendering the background light of each original training image in the original training image set to obtain a second-type training image set;
    reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third-type training image set; and
    training an expression recognition model with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively.
  16. The storage medium according to claim 15, wherein, after the expression recognition model is trained with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively, the one or more processors further execute the following steps:
    acquiring an original test image set, the original test image set comprising a plurality of original test images, wherein the original test images are used to test the accuracy of the trained expression recognition model in recognizing face images;
    performing the following processing on the original test image set respectively:
    reducing the resolution of each original test image in the original test image set to obtain a first-type test image set;
    rendering the background light of each original test image in the original test image set to obtain a second-type test image set;
    reducing the resolution of each original test image in the original test image set and rendering the background light of each original test image to obtain a third-type test image set;
    recognizing the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set with the trained expression recognition model; and
    separately counting the recognition accuracy of the trained expression recognition model on the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  17. The storage medium according to claim 15, wherein a deep neural network model is used to reduce the resolution of each original training image in the original training image set; and
    before the acquiring of the original training image set, the one or more processors further execute the following step:
    generating the deep neural network model by using high-resolution sub-image samples as input samples of a deep neural network framework and using low-resolution sub-image samples as output comparison samples of the deep neural network framework, wherein a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
  18. The storage medium according to claim 17, wherein, before the generating of the deep neural network model by using the high-resolution sub-image samples as the input samples of the deep neural network framework and using the low-resolution sub-image samples as the output comparison samples of the deep neural network framework, the one or more processors further execute the following steps:
    dividing a low-resolution image sample into a plurality of low-resolution sub-image samples; and
    performing image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  19. The storage medium according to claim 18, wherein the performing of image conversion on the low-resolution sub-image samples by using the image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples comprises:
    decomposing a low-resolution sub-image sample by using an image total variation algorithm to obtain a cartoon sub-image sample and a texture sub-image sample;
    enlarging the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample;
    enlarging the texture sub-image sample by using a homotopy method to obtain an enlarged texture sub-image sample; and
    synthesizing the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
  20. The storage medium according to claim 19, wherein the expression of the image total variation algorithm is:
    Figure PCTCN2020087605-appb-100007
    where (x_p, y_p) denotes the current central pixel in the low-resolution sub-image sample, (x_q, y_q) denotes a pixel in the total variation neighbourhood of (x_p, y_p),
    Figure PCTCN2020087605-appb-100008
    is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q), c_{p,q} is a multiplication factor, T_g is a preset threshold, and
    Figure PCTCN2020087605-appb-100009
  21. The storage medium according to claim 20, wherein the enlarging of the texture sub-image sample by using the homotopy method to obtain the enlarged texture sub-image sample comprises:
    obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm;
    enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image;
    performing nearest-neighbour edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image;
    performing a first homotopy processing on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and
    performing a second homotopy processing on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
PCT/CN2020/087605 2019-05-22 2020-04-28 Expression recognition model training method and apparatus, and device and storage medium WO2020233368A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910427443.1A CN110309713A (en) 2019-05-22 2019-05-22 Expression Recognition model training method, device, equipment and storage medium
CN201910427443.1 2019-05-22

Publications (1)

Publication Number Publication Date
WO2020233368A1 true WO2020233368A1 (en) 2020-11-26

Family

ID=68075415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087605 WO2020233368A1 (en) 2019-05-22 2020-04-28 Expression recognition model training method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN110309713A (en)
WO (1) WO2020233368A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784857A (en) * 2021-01-29 2021-05-11 北京三快在线科技有限公司 Model training and image processing method and device
CN113255517A (en) * 2021-05-24 2021-08-13 中国科学技术大学 Privacy-protecting expression recognition model training method and expression recognition method and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309713A (en) * 2019-05-22 2019-10-08 深圳壹账通智能科技有限公司 Expression Recognition model training method, device, equipment and storage medium
CN111597476B (en) * 2020-05-06 2023-08-22 北京金山云网络技术有限公司 Image processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767416A (en) * 2017-09-05 2018-03-06 华南理工大学 The recognition methods of pedestrian's direction in a kind of low-resolution image
CN108492343A (en) * 2018-03-28 2018-09-04 东北大学 A kind of image combining method for the training data expanding target identification
US20180286037A1 (en) * 2017-03-31 2018-10-04 Greg Zaharchuk Quality of Medical Images Using Multi-Contrast and Deep Learning
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision
CN110309713A (en) * 2019-05-22 2019-10-08 深圳壹账通智能科技有限公司 Expression Recognition model training method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
有三AI (YOUSAN AI): "一文道尽深度学习中的数据增强方法 (上) (non-official translation: All you Need to Know on Data Enhancement Method in Deep Learning (part 1))", HTTPS://WWW.JIANSHU.COM/P/99450DBDADCF, 28 June 2018 (2018-06-28), DOI: 20200707132046Y *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784857A (en) * 2021-01-29 2021-05-11 北京三快在线科技有限公司 Model training and image processing method and device
CN112784857B (en) * 2021-01-29 2022-11-04 北京三快在线科技有限公司 Model training and image processing method and device
CN113255517A (en) * 2021-05-24 2021-08-13 中国科学技术大学 Privacy-protecting expression recognition model training method and expression recognition method and device
CN113255517B (en) * 2021-05-24 2023-10-24 中国科学技术大学 Expression recognition model training method for protecting privacy and expression recognition method and device

Also Published As

Publication number Publication date
CN110309713A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
WO2020233368A1 (en) Expression recognition model training method and apparatus, and device and storage medium
Nhan Duong et al. Temporal non-volume preserving approach to facial age-progression and age-invariant face recognition
JP6891351B2 (en) How to generate a human hairstyle based on multi-feature search and deformation
CN108596024B (en) Portrait generation method based on face structure information
CN109190722B (en) Font style migration transformation method based on Manchu character picture
CN110969250B (en) Neural network training method and device
Ding et al. Latent low-rank transfer subspace learning for missing modality recognition
CN107730451A (en) A kind of compressed sensing method for reconstructing and system based on depth residual error network
CN111931602B (en) Attention mechanism-based multi-flow segmented network human body action recognition method and system
Tian et al. Kaokore: A pre-modern japanese art facial expression dataset
CN109829924B (en) Image quality evaluation method based on principal feature analysis
Zhang et al. Sienet: Siamese expansion network for image extrapolation
CN106777986A (en) Ligand molecular fingerprint generation method based on depth Hash in drug screening
CN111028319B (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
CN105654127A (en) End-to-end-based picture character sequence continuous recognition method
CN110727819B (en) Method for retrieving scale-adaptive pathological full-section image database
Wang et al. An encrypted traffic classification framework based on convolutional neural networks and stacked autoencoders
Zhang et al. A separation–aggregation network for image denoising
CN112598587A (en) Image processing system and method combining face mask removal and super-resolution
CN108898568A (en) Image composition method and device
CN114495119A (en) Real-time irregular text recognition method under complex scene
CN110766695B (en) Image sparse representation-based matting method
CN113888501A (en) Non-reference image quality evaluation method based on attention positioning network
CN109615005A (en) Image set categorizing system and method based on manifold deep learning and extreme learning machine
CN111368831B (en) Positioning system and method for vertical text

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20810789

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20810789

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2022)
