WO2020233368A1 - Expression recognition model training method and apparatus, and device and storage medium - Google Patents

Expression recognition model training method and apparatus, and device and storage medium

Info

Publication number
WO2020233368A1
WO2020233368A1 (PCT/CN2020/087605)
Authority
WO
WIPO (PCT)
Prior art keywords
image
sub
original
training
resolution
Prior art date
Application number
PCT/CN2020/087605
Other languages
French (fr)
Chinese (zh)
Inventor
王丽杰
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2020233368A1 publication Critical patent/WO2020233368A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Definitions

  • This application relates to the field of image processing technology, in particular to an expression recognition model training method, device, equipment and storage medium.
  • The facial expression recognition model is used to recognize facial expressions. Facial expression recognition refers to assigning an expression category to a given facial image, such as anger, disgust, happiness, sadness, fear, or surprise.
  • The inventor realizes that facial expression recognition technology is showing broad application prospects in fields such as human-computer interaction, clinical diagnosis, remote education, and investigation and interrogation, and is a popular research direction in computer vision and artificial intelligence.
  • The facial expression recognition model needs to be trained in advance.
  • The training images used by existing expression recognition model training methods have the same or similar resolutions or tones, so the trained expression recognition model can only accurately recognize expression images within a fixed resolution or tone range; reducing the resolution or changing the tone of the same expression image lowers the recognition accuracy of the model.
  • The main purpose of this application is to solve the technical problems that labeling training images in existing expression recognition model training methods is time-consuming and laborious, and that the recognition accuracy of the trained expression recognition model is easily affected by the resolution and tone of the expression image.
  • An expression recognition model training method includes: acquiring an original training image set, where the original training image set includes a plurality of labeled original training images; performing the following processing on the original training image set: reducing the resolution of each original training image in the original training image set to obtain a first type of training image set; rendering the background light of each original training image in the original training image set to obtain a second type of training image set; and reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third type of training image set; and training an expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.
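The three augmented sets described above can be sketched in Python/NumPy. This is a hedged illustration, not the patent's implementation: all function names are hypothetical, block-averaging is a stand-in for the patent's deep-neural-network resolution reduction, and a simple colour-tint blend stands in for the unspecified background-light rendering.

```python
import numpy as np

def reduce_resolution(img, factor=2):
    """Downscale an H x W x 3 image by block-averaging, then upscale back
    by pixel repetition, giving a blurrier image of (nearly) the same size.
    Stand-in for the patent's deep-neural-network resolution reduction."""
    h, w, c = img.shape
    h2, w2 = h // factor * factor, w // factor * factor
    small = img[:h2, :w2].reshape(h2 // factor, factor,
                                  w2 // factor, factor, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def render_background_light(img, tint=(1.0, 0.9, 0.7), strength=0.3):
    """Blend a warm colour cast over the image to simulate a change of
    background lighting/tone (hypothetical rendering method)."""
    tinted = img * np.asarray(tint)
    return np.clip((1 - strength) * img + strength * tinted, 0.0, 1.0)

def build_training_sets(originals):
    """Derive the three augmented sets from already-labeled originals;
    labels carry over unchanged, so no new manual labeling is needed."""
    first = [reduce_resolution(x) for x in originals]            # lower resolution
    second = [render_background_light(x) for x in originals]     # re-lit background
    third = [render_background_light(reduce_resolution(x)) for x in originals]
    return first, second, third
```

Because each augmented image inherits the label of its original, the expression recognition model can then be trained on all four sets without extra annotation work.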
  • Based on the same technical concept, this application also provides an expression recognition model training device, which includes an acquisition module for acquiring the original training image set.
  • The original training image set includes a plurality of labeled original training images.
  • The processing module is configured to perform the following processing on the original training image set obtained by the acquisition module: reduce the resolution of each original training image in the original training image set to obtain the first type of training image set; render the background light of each original training image in the original training image set to obtain the second type of training image set; and reduce the resolution of each original training image in the original training image set and render the background light of each original training image to obtain the third type of training image set.
  • The processing module is further configured to train the expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.
  • Based on the same technical concept, the present application also provides a computer device, including an input/output unit, a memory, and a processor.
  • The memory stores computer-readable instructions that, when executed by the processor, cause the processor to execute the steps of the above expression recognition model training method.
  • Based on the same technical concept, the present application also provides a storage medium storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above expression recognition model training method.
  • This application obtains multiple types of new training images by adjusting features such as the clarity or background tone of the original training images.
  • The new training images do not need to be manually labeled, which enriches the training sample image set of the facial expression recognition model.
  • FIG. 1 is a schematic flowchart of a method for training an expression recognition model in an embodiment of the application.
  • Fig. 2 is a schematic structural diagram of an expression recognition model training device in an embodiment of the application.
  • Fig. 3 is a schematic structural diagram of a computer device in an embodiment of the application.
  • FIG. 1 is a flowchart of an expression recognition model training method in some embodiments of the application.
  • The expression recognition model training method is executed by an expression recognition model training device.
  • The expression recognition model training device may be a computer or similar device. As shown in FIG. 1, the method can include the following steps S1-S3:
  • The original training image set includes a plurality of labeled original training images.
  • The original training images are manually labeled training sample images used to train the facial expression recognition model.
  • The number of training sample images required for expression recognition model training is very large.
  • The traditional method of labeling training sample images is to manually label them one by one, which consumes a lot of time and labor.
  • A deep neural network model is used to reduce the resolution of each original training image in the original training image set.
  • The deep neural network model is generated by the model generation device according to low-resolution image samples, image conversion algorithms, and a deep neural network framework.
  • The deep neural network model includes a plurality of nonlinear conversion convolutional layers that alternately use different parameter matrices as convolution template parameters.
  • Before step S1, the method further includes the following steps S01-S03:
  • S01. Segment the low-resolution image samples into multiple low-resolution sub-image samples to enrich the set of low-resolution image samples.
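Step S01 amounts to tiling each low-resolution sample into patches. A minimal sketch with NumPy; `segment_into_subimages` is a hypothetical helper, and since the patent does not specify a tile size or overlap, non-overlapping fixed-size tiles are assumed here.

```python
import numpy as np

def segment_into_subimages(img, tile=16):
    """Split a grayscale image (H x W) into non-overlapping tile x tile
    sub-image samples; ragged borders are dropped for simplicity."""
    h, w = img.shape
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(img[y:y + tile, x:x + tile])
    return tiles
```

One image then yields many sub-image samples, which is exactly how the sample set gets enriched.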
  • S02. Perform image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  • Step S02 includes the following steps S021-S024:
  • Total variation (TV) is often used for image restoration.
  • Image decomposition based on total variation splits a low-resolution sub-image sample into a cartoon part and a texture part.
  • The cartoon part captures the structural information of the low-resolution sub-image sample: pixel values change significantly only at object boundaries, change little inside objects, and the image is smooth.
  • The texture part captures the detail of the low-resolution sub-image sample, where pixel values change significantly.
  • the expression of the image total variation algorithm is:
  • (x p , y p ) represents the current central pixel in the low-resolution sub-image sample;
  • (x q , y q ) represents the total variational pixel of (x p , y p );
  • T g is the preset threshold;
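The total-variation expression itself is not reproduced in this text, but the cartoon/texture split it performs can be sketched with a simple ROF-style gradient flow. This is a hedged stand-in, not the patent's formula: the smoothed result is taken as the cartoon part and the residual as the texture part.

```python
import numpy as np

def tv_decompose(img, iters=100, step=0.1, weight=0.1):
    """Small ROF-style total-variation smoothing: iterate the gradient flow
    u_t = div(grad u / |grad u|) - weight * (u - img); the smoothed u is the
    'cartoon' part and img - u is the 'texture' part."""
    u = img.astype(float).copy()
    for _ in range(iters):
        # forward differences (gradient)
        ux = np.roll(u, -1, axis=1) - u
        uy = np.roll(u, -1, axis=0) - u
        norm = np.sqrt(ux**2 + uy**2 + 1e-8)
        px, py = ux / norm, uy / norm
        # divergence of the normalized gradient (curvature term)
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u += step * (div - weight * (u - img))
    cartoon = u
    texture = img - u
    return cartoon, texture
```

By construction the two parts sum back to the input, matching the decomposition described above.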
  • S022. Use an interpolation algorithm to enlarge the cartoon sub-image sample to obtain an enlarged cartoon sub-image sample.
  • The pixel points of the cartoon sub-image sample are interpolated using an interpolation template function to obtain the enlarged cartoon sub-image sample.
  • Image interpolation belongs to the prior art, and will not be repeated here.
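Step S022's interpolation-based enlargement can be sketched with plain bilinear interpolation in NumPy. The patent's "interpolation template function" is not specified, so bilinear weights are a hedged substitute.

```python
import numpy as np

def bilinear_enlarge(img, scale=2):
    """Enlarge a grayscale image by bilinear interpolation: each output
    pixel is a weighted average of its four nearest input pixels."""
    h, w = img.shape
    H, W = h * scale, w * scale
    ys = np.linspace(0, h - 1, H)
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (H, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, W)
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

Smooth interpolation suits the cartoon part well, since by construction its pixel values vary little away from object boundaries.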
  • The homotopy method describes the "continuous change" between two objects in topology: if one topological space can be transformed into the other through a series of continuous deformations, the two topological spaces are said to be homotopic.
  • Step S023 includes the following steps: use a dictionary training algorithm to obtain an image block dictionary of the texture sub-image sample; use the image block dictionary and an orthogonal matching pursuit method to enlarge the texture sub-image sample to obtain an initial high-resolution sub-image; perform nearest-neighbor edge-addition processing on the initial high-resolution sub-image to obtain an edged high-resolution sub-image; perform a first homotopy processing on the edged high-resolution sub-image to obtain a first edged high-resolution sub-image; and perform a second homotopy processing on the first edged high-resolution sub-image to obtain the enlarged texture sub-image sample.
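The "orthogonal matching tracking" named in this step is the orthogonal matching pursuit (OMP) algorithm. A minimal generic OMP sketch follows (not the patent's exact procedure); it assumes a dictionary `D` with unit-norm columns and greedily selects the atoms that best explain the signal.

```python
import numpy as np

def orthogonal_matching_pursuit(D, y, n_nonzero=3):
    """Greedy OMP: approximate y as a sparse combination of dictionary
    atoms (columns of D, assumed unit-norm)."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares fit of y on the selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef
```

In the patent's setting, each texture patch would be coded this way against the low-resolution block dictionary, and the same coefficients applied to the high-resolution block dictionary to synthesize the enlarged patch.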
  • The image block dictionary includes a high-resolution image block dictionary and a low-resolution image block dictionary.
  • The dictionary training algorithm is a K-SVD dictionary training algorithm.
  • K-SVD is a classic dictionary training algorithm.
  • In K-SVD, the error term is decomposed by singular value decomposition (SVD), and the decomposition term that minimizes the error is selected as the updated dictionary atom and its corresponding atomic coefficients; after continuous iteration, an optimized dictionary is obtained.
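The K-SVD atom update described above can be sketched as follows. `ksvd_update_atom` is a hypothetical helper showing one dictionary-update step; the sparse-coding stage (e.g. OMP) that K-SVD alternates with is omitted.

```python
import numpy as np

def ksvd_update_atom(D, X, Y, k):
    """One K-SVD dictionary-update step for atom k: restrict to the signals
    that actually use atom k, form the error with atom k's contribution
    removed, and take the rank-1 SVD approximation of that error as the
    new atom and its coefficients."""
    users = np.nonzero(X[k])[0]
    if users.size == 0:
        return D, X  # unused atom: nothing to update
    # error over the users, with atom k's contribution added back
    E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]            # updated atom (unit norm)
    X[k, users] = s[0] * Vt[0]   # updated coefficients
    return D, X
```

Because the rank-1 SVD is the best rank-1 fit of `E`, the reconstruction error can only decrease (or stay equal) at each such update, which is why iterating over all atoms converges toward an optimized dictionary.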
  • The expression of image synthesis is:
  • f_H is the high-resolution sub-image sample;
  • f_t is the enlarged texture sub-image sample;
  • f_c is the enlarged cartoon sub-image sample;
  • G(f_t) is the modulus of the Roberts gradient of the image f_t;
  • λ_1 is a constant greater than zero.
  • The high-resolution sub-image sample is the image obtained from the low-resolution sub-image sample after resolution conversion.
  • Background light rendering of an image is a process of toning the image background; it belongs to the prior art and will not be repeated here.
  • The facial expression recognition model is used to recognize the micro-expressions of people in facial images, such as happy, sad, fearful, angry, surprised, and disgusted.
  • The obtained first-type, second-type, and third-type training images are new images for the expression recognition model. Therefore, based on the already-annotated original training images, the obtained first-type, second-type, and third-type training image sets do not need to be manually labeled, which enriches the training sample image set of the expression recognition model.
  • After step S3, the method further includes the following steps S4-S7:
  • The original test image set includes a plurality of original test images.
  • The original test images are used to test the accuracy of facial image recognition by the trained expression recognition model.
  • S5. Perform the following processing on the original test image set: reduce the resolution of each original test image in the original test image set to obtain the first type of test image set; render the background light of each original test image in the original test image set to obtain the second type of test image set; and reduce the resolution of each original test image in the original test image set and render the background light of each original test image to obtain the third type of test image set.
  • The resolution reduction and background light rendering of the original test images are performed in the same way as for the original training images, and will not be repeated here.
  • The trained expression recognition model recognizes the first-type test image set and outputs a recognition result for each first-type test image. Each recognition result is compared with a preset comparison result: if they are consistent, the recognition result output by the expression recognition model is determined to be correct; otherwise, it is determined to be wrong. The number of accurately recognized first-type test images is recorded, and this number is divided by the total number of first-type test images to obtain the recognition accuracy of the expression recognition model on the first-type test image set.
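The accuracy bookkeeping in this step is simply the correct count divided by the total. A minimal sketch in Python; `model_predict` stands in for the trained model's inference function and is an assumption, not part of the patent.

```python
def recognition_accuracy(predictions, labels):
    """Fraction of test images whose predicted expression matches the label."""
    assert len(predictions) == len(labels)
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

def accuracy_per_set(model_predict, test_sets, labels):
    """Accuracy of the trained model on each named test image set.
    Assumes the augmented sets share the labels of the original set,
    since augmentation does not change the expression category."""
    return {name: recognition_accuracy([model_predict(x) for x in imgs], labels)
            for name, imgs in test_sets.items()}
```

Running this over the original, first-type, second-type, and third-type test sets yields the four per-set accuracies used to evaluate the model.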
  • By adjusting characteristics of the original training images such as sharpness or background hue, multiple types of new training images are obtained.
  • The new training images do not need to be manually labeled, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images.
  • In addition, statistics on the recognition accuracy of the expression recognition model for the original, first-type, second-type, and third-type training images provide a basis for evaluating the actual effect of the expression recognition model.
  • Based on the same technical concept, this application also provides an expression recognition model training device, which can be used to enrich the training image set and improve the efficiency of expression recognition model training.
  • The device in the embodiment of the present application can implement the steps of the expression recognition model training method performed in the embodiment corresponding to FIG. 1.
  • The functions realized by the device can be realized by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware.
  • The device includes an acquisition module 1 and a processing module 2.
  • The processing module 2 can be used to control the receiving and sending operations of the acquisition module 1.
  • The acquisition module 1 is used to acquire the original training image set.
  • The original training image set includes a plurality of labeled original training images.
  • The processing module 2 is configured to perform the following processing on the original training image set acquired by the acquisition module 1: reduce the resolution of each original training image in the original training image set to obtain the first type of training image set; render the background light of each original training image in the original training image set to obtain the second type of training image set; and reduce the resolution of each original training image in the original training image set and render the background light of each original training image to obtain the third type of training image set.
  • The processing module 2 is also configured to train the expression recognition model through the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set.
  • The acquisition module 1 is also used to acquire an original test image set; the original test image set includes a plurality of original test images; the original test images are used to test the accuracy of facial image recognition by the trained expression recognition model.
  • The processing module 2 is also configured to perform the following processing on the original test image set acquired by the acquisition module 1: reduce the resolution of each original test image in the original test image set to obtain the first type of test image set; render the background light of each original test image in the original test image set to obtain the second type of test image set; and reduce the resolution of each original test image in the original test image set and render the background light of each original test image to obtain the third type of test image set.
  • The trained expression recognition model then recognizes the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  • The processing module 2 is also used to separately count the recognition accuracy of the trained expression recognition model on the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  • A deep neural network model is used to reduce the resolution of each original training image in the original training image set.
  • The processing module 2 is further configured to use high-resolution sub-image samples as input samples of the deep neural network framework and low-resolution sub-image samples as output comparison samples of the deep neural network framework to generate the deep neural network model.
  • The high-resolution sub-image sample is the image obtained from the low-resolution sub-image sample after resolution conversion.
  • The processing module 2 is further used to divide the low-resolution image samples into multiple low-resolution sub-image samples, and to perform image conversion on the low-resolution sub-image samples using an image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  • The processing module 2 is specifically configured to: decompose the low-resolution sub-image samples using an image total variation algorithm to obtain cartoon sub-image samples and texture sub-image samples; use an interpolation algorithm to enlarge the cartoon sub-image samples to obtain enlarged cartoon sub-image samples; use the homotopy method to enlarge the texture sub-image samples to obtain enlarged texture sub-image samples; and synthesize the enlarged cartoon sub-image samples and the enlarged texture sub-image samples to obtain high-resolution sub-image samples.
  • the expression of the image total variation algorithm is:
  • (x p , y p ) represents the current central pixel in the low-resolution sub-image sample;
  • (x q , y q ) represents the total variational pixel of (x p , y p );
  • T g is the preset threshold;
  • The processing module 2 is specifically configured to: use a dictionary training algorithm to obtain an image block dictionary of the texture sub-image sample; use the image block dictionary and an orthogonal matching pursuit method to enlarge the texture sub-image sample to obtain an initial high-resolution sub-image; perform nearest-neighbor edge-addition processing on the initial high-resolution sub-image to obtain an edged high-resolution sub-image; perform a first homotopy processing on the edged high-resolution sub-image to obtain a first edged high-resolution sub-image; and perform a second homotopy processing on the first edged high-resolution sub-image to obtain the enlarged texture sub-image sample.
  • By adjusting characteristics of the original training images such as sharpness or background hue, multiple types of new training images are obtained.
  • The new training images do not need to be manually labeled, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images.
  • In addition, statistics on the recognition accuracy of the expression recognition model for the original, first-type, second-type, and third-type training images provide a basis for evaluating the actual effect of the expression recognition model.
  • Based on the same technical concept, the present application also provides a computer device. As shown in FIG. 3, the computer device includes an input/output unit 31, a processor 32, and a memory 33.
  • The memory 33 stores computer-readable instructions; when the computer-readable instructions are executed by the processor 32, the processor executes the steps of the expression recognition model training method in the foregoing embodiments.
  • The physical device corresponding to the acquisition module 1 shown in FIG. 2 is the input/output unit 31 shown in FIG. 3, which can realize part or all of the functions of the acquisition module 1, or realize the same or similar functions as the acquisition module 1.
  • The physical device corresponding to the processing module 2 shown in FIG. 2 is the processor 32 shown in FIG. 3, which can implement part or all of the functions of the processing module 2, or implement the same or similar functions as the processing module 2.
  • Based on the same technical concept, the present application also provides a storage medium storing computer-readable instructions.
  • The computer-readable storage medium may be non-volatile or volatile.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the expression recognition model training method in the foregoing embodiments.
  • The methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; they can also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions used to make a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) execute the method described in each embodiment of the present application.

Abstract

Disclosed are an expression recognition model training method and apparatus, and a device and a storage medium, relating to the technical field of artificial intelligence. The method comprises: respectively carrying out the following processing on an original training image set: reducing the resolution of the original training image set to obtain a first-type training image set; rendering background light of the original training image set to obtain a second-type training image set; and reducing the resolution of the original training image set, and rendering the background light of the original training image set to obtain a third-type training image set (S2); and training an expression recognition model by means of the original training image set, the first-type training image set, the second-type training image set and the third-type training image set (S3). By adjusting features such as the definition or background tone of an original training image, multiple types of new training images are obtained, manual marking processing does not need to be carried out on the new training images, and a training sample image set of an expression recognition model is enriched.

Description

Expression recognition model training method, device, equipment and storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 22, 2019, with application number 201910427443.1 and the invention title "Expression Recognition Model Training Method, Apparatus, Equipment, and Storage Medium", the entire content of which is incorporated by reference in this application.

Technical Field

This application relates to the field of image processing technology, and in particular to an expression recognition model training method, device, equipment, and storage medium.

Background

The facial expression recognition model is used to recognize facial expressions. Facial expression recognition refers to assigning an expression category to a given facial image, such as anger, disgust, happiness, sadness, fear, or surprise. The inventor realizes that facial expression recognition technology is showing broad application prospects in fields such as human-computer interaction, clinical diagnosis, remote education, and investigation and interrogation, and is a popular research direction in computer vision and artificial intelligence. The facial expression recognition model needs to be trained in advance. During training, a large number of training images must be collected manually, and each training image must then be manually labeled by type according to its dimensional information, so labeling training images is time-consuming and laborious. In addition, the training images used by existing expression recognition model training methods have the same or similar resolutions or tones, so the trained expression recognition model can only accurately recognize expression images within a fixed resolution or tone range; reducing the resolution or changing the tone of the same expression image lowers the recognition accuracy of the model.

Summary of the Invention

The main purpose of this application is to solve the technical problems that labeling training images in existing expression recognition model training methods is time-consuming and laborious, and that the recognition accuracy of the trained expression recognition model is easily affected by the resolution and tone of the expression image.
An expression recognition model training method includes: acquiring an original training image set, where the original training image set includes a plurality of labeled original training images; performing the following processing on the original training image set: reducing the resolution of each original training image in the original training image set to obtain a first type of training image set; rendering the background light of each original training image in the original training image set to obtain a second type of training image set; and reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third type of training image set; and training an expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.

Based on the same technical concept, this application also provides an expression recognition model training device, including: an acquisition module for acquiring the original training image set, where the original training image set includes a plurality of labeled original training images; and a processing module configured to perform the following processing on the original training image set acquired by the acquisition module: reduce the resolution of each original training image in the original training image set to obtain the first type of training image set; render the background light of each original training image in the original training image set to obtain the second type of training image set; and reduce the resolution of each original training image in the original training image set and render the background light of each original training image to obtain the third type of training image set. The processing module is further configured to train the expression recognition model through the original training image set, the first type of training image set, the second type of training image set, and the third type of training image set, respectively.

Based on the same technical concept, this application also provides a computer device, including an input/output unit, a memory, and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the processor to execute the steps of the above expression recognition model training method.

Based on the same technical concept, this application also provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above expression recognition model training method.

This application obtains multiple types of new training images by adjusting features such as the clarity or background tone of the original training images. The new training images do not need to be manually labeled, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images. In addition, training the expression recognition model with training sample images of various definitions and background tones improves its recognition accuracy.
附图说明Description of the drawings
图1为本申请实施例中表情识别模型训练方法的流程示意图。FIG. 1 is a schematic flowchart of a method for training an expression recognition model in an embodiment of the application.
图2为本申请实施例中表情识别模型训练装置的结构示意图。Fig. 2 is a schematic structural diagram of an expression recognition model training device in an embodiment of the application.
图3为本申请实施例中计算机设备的结构示意图。Fig. 3 is a schematic structural diagram of a computer device in an embodiment of the application.
具体实施方式Detailed Description
应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。It should be understood that the specific embodiments described herein are only used to explain the application, and not used to limit the application.
本技术领域技术人员可以理解,除非特意声明,这里使用的单数形式“一”、“一个”、“所述”和“该”也可以包括复数形式。应该进一步理解的是,本申请的说明书中使用的措辞“包括”是指存在所述特征、程序、步骤、操作、元件和/或组件,但是并不排除存在或添加一个或多个其他特征、程序、步骤、操作、元件、组件和/或它们的组。Those skilled in the art can understand that unless specifically stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the term "comprising" used in the specification of this application refers to the presence of the features, procedures, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, Procedures, steps, operations, elements, components, and/or groups of them.
图1为本申请一些实施方式中一种表情识别模型训练方法的流程图，该表情识别模型训练方法由表情识别模型训练设备执行，表情识别模型训练设备可以是电脑等设备，如图1所示，可以包括以下步骤S1-S3：FIG. 1 is a flowchart of an expression recognition model training method in some embodiments of this application. The method is executed by an expression recognition model training device, which may be a computer or similar equipment. As shown in FIG. 1, the method may include the following steps S1-S3:
S1、获取原训练图像集合。S1. Obtain the original training image set.
所述原训练图像集合包括多个已标注的原训练图像。The original training image set includes a plurality of labeled original training images.
原训练图像为人工标注好的训练样本图像,用于训练表情识别模型。表情识别模型训练所需的训练样本图像的数量很大,传统的训练样本图像打标方式为采用人工一一对训练样本图像进行标注,消耗很多的时间以及人力成本。The original training image is a manually labeled training sample image, which is used to train the facial expression recognition model. The number of training sample images required for expression recognition model training is very large. The traditional method of marking training sample images is to manually label training sample images one by one, which consumes a lot of time and labor costs.
S2、对所述原训练图像集合分别进行以下处理：降低所述原训练图像集合中的各原训练图像的分辨率，得到第一类训练图像集合；渲染所述原训练图像集合中的各原训练图像的背景光线，得到第二类训练图像集合；降低所述原训练图像集合中的各原训练图像的分辨率，并且渲染各原训练图像的背景光线，得到第三类训练图像集合。S2. Perform the following processing on the original training image set: reduce the resolution of each original training image in the original training image set to obtain a first type of training image set; render the background light of each original training image in the original training image set to obtain a second type of training image set; and reduce the resolution of each original training image in the original training image set while rendering the background light of each original training image to obtain a third type of training image set.
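下面给出步骤S2所述三类增强处理的一个示意性草图（非本申请原文内容：以二维列表表示灰度图，用2×2均值降采样近似“降低分辨率”，用整体亮度偏移近似“渲染背景光线”，函数名与参数均为示例假设）。A minimal illustrative sketch of the three augmentations in step S2, assuming grayscale images stored as 2D lists; 2x2-average downsampling stands in for resolution reduction and a global brightness shift stands in for background-light rendering:

```python
def reduce_resolution(img):
    """Halve the resolution by 2x2 averaging, then upscale back with
    nearest-neighbor so the image keeps its original dimensions."""
    h, w = len(img), len(img[0])
    small = [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
               img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) // 4
              for c in range(w // 2)] for r in range(h // 2)]
    return [[small[r // 2][c // 2] for c in range(w)] for r in range(h)]

def render_background(img, shift=30):
    """Crude stand-in for background-light rendering: a global brightness shift."""
    return [[min(255, p + shift) for p in row] for row in img]

def augment(labeled_images):
    """Derive the three new training sets; labels carry over unchanged."""
    first, second, third = [], [], []
    for img, label in labeled_images:
        first.append((reduce_resolution(img), label))
        second.append((render_background(img), label))
        third.append((render_background(reduce_resolution(img)), label))
    return labeled_images, first, second, third
```

由于三类处理均不改变标注，新图像可直接复用原标签。Because none of the three operations changes the annotation, the new images reuse the original labels directly.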
一些实施方式中,采用深度神经网络模型降低所述原训练图像集合中的各原训练图像的分辨率。In some embodiments, a deep neural network model is used to reduce the resolution of each original training image in the original training image set.
所述深度神经网络模型由模型生成设备根据低分辨率图像样本、图像转换算法以及深度神经网络框架生成。所述深度神经网络模型包括交替采用不同参数矩阵作为卷积模板参数的多个非线性转换卷积层。The deep neural network model is generated by the model generation device according to low-resolution image samples, image conversion algorithms, and a deep neural network framework. The deep neural network model includes a plurality of nonlinear conversion convolutional layers alternately using different parameter matrices as convolution template parameters.
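“交替采用不同参数矩阵作为卷积模板参数的多个非线性转换卷积层”这一结构可用如下草图示意（卷积核数值为示例假设，并非本申请训练得到的参数）。The layer structure described above can be sketched as follows; the two 3x3 kernels are invented placeholders, not the patent's trained parameters:

```python
def conv2d(img, kernel):
    """Valid (no-padding) 2D convolution of a 2D list with a square kernel."""
    h, w, k = len(img), len(img[0]), len(kernel)
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(w - k + 1)] for r in range(h - k + 1)]

def relu(img):
    """Nonlinear transformation applied after each convolutional layer."""
    return [[max(0.0, v) for v in row] for row in img]

KERNEL_A = [[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]]      # smoothing-like
KERNEL_B = [[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]]  # Laplacian-like

def alternating_network(img, num_layers=4):
    """Stack conv layers that alternate between two parameter matrices."""
    x = img
    for layer in range(num_layers):
        x = relu(conv2d(x, KERNEL_A if layer % 2 == 0 else KERNEL_B))
    return x
```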
在步骤S1之前,该方法还包括以下步骤S01-S03:Before step S1, the method further includes the following steps S01-S03:
S01、将低分辨率图像样本分割为多个低分辨率子图像样本。S01. Divide the low-resolution image sample into multiple low-resolution sub-image samples.
对低分辨率图像样本进行分割,以丰富低分辨率图像样本集。Segment low-resolution image samples to enrich the set of low-resolution image samples.
S02、采用图像转换算法对低分辨率子图像样本进行图像转换,得到低分辨率子图像样本对应的高分辨率子图像样本。S02. Perform image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
一些实施方式中,步骤S02包括以下步骤S021-S024:In some embodiments, step S02 includes the following steps S021-S024:
S021、采用图像全变分算法对低分辨率子图像样本进行分解,得到卡通子图像样本和纹理子图像样本。S021. Decompose the low-resolution sub-image samples using the image total variation algorithm to obtain cartoon sub-image samples and texture sub-image samples.
全变分（Total Variation）也称为全变差，常用于图像复原。Total variation (TV), also known as total variation regularization, is commonly used in image restoration.
采用图像分解把低分辨率子图像样本分解为卡通(cartoon)和纹理(texture)部分。其中,卡通部分提取的是低分辨率子图像样本的结构信息,像素点值只在物体交界处有较大变化,在物体内部的像素点值变化小,图像平滑。纹理部分提取的是低分辨率子图像样本的细节部分,其中的像素点值的变化较大。Image decomposition is used to decompose low-resolution sub-image samples into cartoon and texture parts. Among them, the cartoon part extracts the structural information of the low-resolution sub-image samples. The pixel value only changes greatly at the boundary of the object, and the pixel value inside the object changes little, and the image is smooth. The texture part extracts the detailed part of the low-resolution sub-image sample, in which the pixel value changes greatly.
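卡通/纹理分解的思想可用如下草图示意（此处以3×3均值平滑近似代替全变分分解，仅作说明，非本申请的算法本身）。An illustrative sketch of the cartoon/texture split; a 3x3 box filter stands in here for the total-variation decomposition, so cartoon + texture reconstructs the original exactly:

```python
def box_smooth(img):
    """3x3 mean filter: a crude stand-in for the cartoon (structure) part."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

def decompose(img):
    """Split an image into a smooth cartoon part and a residual texture part."""
    cartoon = box_smooth(img)
    texture = [[img[r][c] - cartoon[r][c] for c in range(len(img[0]))]
               for r in range(len(img))]
    return cartoon, texture
```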
一些实施方式中,所述图像全变分算法的表达式为:In some embodiments, the expression of the image total variation algorithm is:
[公式图像：PCTCN2020087605-appb-000001（原文中该全变分表达式以图像形式给出）]
其中，(x_p, y_p)表示低分辨率子图像样本中当前中心像素点；(x_q, y_q)表示(x_p, y_p)的全变分的像素点；图像PCTCN2020087605-appb-000002所示的量是(x_p, y_p)和(x_q, y_q)所在物体内的像素值的方差；c_{p,q}为相乘因子；T_g为预设阈值（另见图像PCTCN2020087605-appb-000003）。
[Formula image PCTCN2020087605-appb-000001: the total-variation expression is given only as an image in the original.] Here, (x_p, y_p) denotes the current center pixel of the low-resolution sub-image sample; (x_q, y_q) denotes a total-variation pixel of (x_p, y_p); the quantity shown in image PCTCN2020087605-appb-000002 is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q); c_{p,q} is a multiplication factor; and T_g is a preset threshold (see also image PCTCN2020087605-appb-000003).
S022、采用插值算法对所述卡通子图像样本进行放大,得到放大后的卡通子图像样本。S022: Use an interpolation algorithm to enlarge the cartoon sub-image sample to obtain an enlarged cartoon sub-image sample.
对所述卡通子图像样本的像素点采用插值模板函数进行插值,得到所述放大后的卡通子图像样本。图像插值属于现有技术,在此不再赘述。The pixel points of the cartoon sub-image sample are interpolated using an interpolation template function to obtain the enlarged cartoon sub-image sample. Image interpolation belongs to the prior art, and will not be repeated here.
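插值放大可用如下最近邻插值草图示意（本申请的插值模板函数未在此具体给出，下面的实现为示例假设）。A nearest-neighbor upscaling sketch stands in for the interpolation-based enlargement; the patent's actual interpolation template function is not specified here:

```python
def upscale_nearest(img, factor=2):
    """Enlarge a 2D list by an integer factor with nearest-neighbor interpolation."""
    h, w = len(img), len(img[0])
    return [[img[r // factor][c // factor] for c in range(w * factor)]
            for r in range(h * factor)]
```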
S023、采用同伦法对所述纹理子图像样本进行放大,得到放大后的纹理子图像样本。S023: Use the homotopy method to enlarge the texture sub-image sample to obtain an enlarged texture sub-image sample.
同伦（Homotopy Method）法在拓扑上描述了两个对象间的“连续变化”，两个拓扑空间如果可以通过一系列连续的形变从一个变到另一个，那么就称这两个拓扑空间同伦。The homotopy method describes a "continuous change" between two objects in topology: if one topological space can be deformed into another through a series of continuous deformations, the two spaces are said to be homotopic.
一些实施方式中，步骤S023包括以下步骤：采用字典训练算法得到所述纹理子图像样本的图像块字典；采用所述图像块字典和正交匹配跟踪方法对所述纹理子图像样本进行放大，得到初始高分辨率子图像；对所述初始高分辨率子图像进行最近邻的加边处理，得到加边高分辨子图像；对所述加边高分辨子图像进行第一次同伦处理，得到第一加边高分辨率子图像；对所述第一加边高分辨率子图像进行第二次同伦处理，得到所述放大后的纹理子图像样本。In some embodiments, step S023 includes the following steps: obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm; enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image; performing nearest-neighbor edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image; performing a first homotopy process on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and performing a second homotopy process on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
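上述步骤中的正交匹配跟踪（即正交匹配追踪，OMP）可用如下示意实现（基于NumPy的说明性草图，并非本申请的具体实现）。The orthogonal matching pursuit step above can be sketched as follows (an illustrative NumPy version, not the patent's implementation):

```python
import numpy as np

def omp(D, x, sparsity):
    """Orthogonal matching pursuit: greedily pick the dictionary atom most
    correlated with the residual, then re-fit all chosen coefficients by
    least squares before updating the residual."""
    residual = x.astype(float).copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(sparsity):
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0          # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs
```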
所述图像块字典包括高分辨率图像块字典和低分辨率图像块字典。The image block dictionary includes a high-resolution image block dictionary and a low-resolution image block dictionary.
可选地，字典训练算法为K-SVD字典训练算法。K-SVD是一种经典的字典训练算法，依据误差最小原则，对误差项进行奇异值分解（Singular Value Decomposition，SVD），选择使误差最小的分解项作为更新的字典原子和对应的原子系数，经过不断的迭代从而得到优化的解。Optionally, the dictionary training algorithm is the K-SVD dictionary training algorithm. K-SVD is a classic dictionary training algorithm: following the minimum-error principle, it performs singular value decomposition (SVD) on the error term and selects the decomposition that minimizes the error as the updated dictionary atom and its corresponding atomic coefficients, iterating until an optimized solution is obtained.
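K-SVD中单个字典原子的更新步骤可示意如下（基于NumPy的说明性草图：对去除该原子贡献后的误差项做SVD，以最优秩一近似同时更新原子及其系数）。A sketch of one K-SVD atom update, following the minimum-error principle described above:

```python
import numpy as np

def ksvd_atom_update(D, A, X, k):
    """Update dictionary atom k and its coefficients from the rank-1 SVD of
    the error term computed without atom k's contribution."""
    omega = np.nonzero(A[k, :])[0]                 # signals that use atom k
    if omega.size == 0:
        return D, A
    E = X - D @ A + np.outer(D[:, k], A[k, :])     # error ignoring atom k
    U, S, Vt = np.linalg.svd(E[:, omega], full_matrices=False)
    D[:, k] = U[:, 0]                              # updated dictionary atom
    A[k, omega] = S[0] * Vt[0, :]                  # updated atomic coefficients
    return D, A
```

每次这样的更新都不会增大重构误差，对各原子迭代即可逐步得到优化的解。Each such update never increases the reconstruction error, so iterating over the atoms moves toward an optimized solution.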
S024、对所述放大后的卡通子图像样本和所述放大后的纹理子图像样本进行合成,得到高分辨率子图像样本。S024. Synthesize the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
一些实施方式中,图像合成的表达式为:In some embodiments, the expression of image synthesis is:
f_H = f_c + f_t + λ_1·G(f_t)
其中，f_H为所述高分辨率子图像样本，f_t为放大后的纹理子图像样本，f_c为放大后的卡通子图像样本，G(f_t)为对图像f_t求Robert梯度的模值，λ_1为大于0的常数。Here f_H is the high-resolution sub-image sample, f_t is the enlarged texture sub-image sample, f_c is the enlarged cartoon sub-image sample, G(f_t) is the modulus of the Roberts gradient of the image f_t, and λ_1 is a constant greater than 0.
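该合成步骤可用如下草图示意。由于原文表达式仅部分可辨，此处采用 f_H = f_c + f_t + λ_1·G(f_t) 这一假设读法，G为Robert梯度的模值。A hypothetical sketch of the synthesis step; since the original expression is only partially legible, the form f_H = f_c + f_t + λ_1·G(f_t) used here is an assumed reading:

```python
def roberts_gradient(img):
    """Modulus of the Roberts cross gradient of a 2D list image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h - 1):
        for c in range(w - 1):
            gx = img[r][c] - img[r + 1][c + 1]
            gy = img[r][c + 1] - img[r + 1][c]
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def synthesize(cartoon, texture, lam=0.5):
    """Assumed synthesis: f_H = f_c + f_t + lam * G(f_t)."""
    g = roberts_gradient(texture)
    h, w = len(cartoon), len(cartoon[0])
    return [[cartoon[r][c] + texture[r][c] + lam * g[r][c]
             for c in range(w)] for r in range(h)]
```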
上述实施方式,通过创建具有非线性转换卷积层的深度神经网络模型,提高了将高分辨率图像转换为低分辨率图像的准确性。In the above-mentioned embodiment, by creating a deep neural network model with a nonlinear conversion convolutional layer, the accuracy of converting a high-resolution image into a low-resolution image is improved.
S03、以高分辨率子图像样本作为深度神经网络框架的输入样本,以低分辨率子图像样本作为所述深度神经网络框架的输出对比样本,生成所述深度神经网络模型。S03. Use high-resolution sub-image samples as input samples of the deep neural network framework, and use low-resolution sub-image samples as output comparison samples of the deep neural network framework to generate the deep neural network model.
高分辨率子图像样本为低分辨率子图像样本分辨率转化后的图像。The high-resolution sub-image sample is an image after resolution conversion of the low-resolution sub-image sample.
此外,图像的背景光线渲染是对图像的背景进行调色的过程,属于现有技术,在此不再赘述。In addition, the background light rendering of the image is a process of toning the background of the image, which belongs to the prior art and will not be repeated here.
S3、分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型。S3. Train an expression recognition model through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection respectively.
表情识别模型用于识别人脸图像中人物的微表情,如快乐、伤心、恐惧、愤怒、惊讶和厌恶等微表情。The facial expression recognition model is used to recognize the micro-expressions of people in facial images, such as happy, sad, fearful, angry, surprised and disgusted.
本实施例中，对原训练图像的清晰度或背景色调等特征进行调整，均不改变原训练图像的维度结果。而所得到的第一类训练图像、第二类训练图像、第三类训练图像对于表情识别模型而言，却是新的图像。所以基于已做标注的原训练图像，所得到的第一类训练图像集合、第二类训练图像集合、第三类训练图像集合无需再做人工打标处理，且丰富了表情识别模型的训练样本图像集。In this embodiment, adjusting features of the original training images such as sharpness or background tone does not change the dimensional results of the original training images. To the expression recognition model, however, the resulting first, second, and third types of training images are new images. Therefore, the first, second, and third types of training image sets derived from the already-labeled original training images require no additional manual labeling, and they enrich the training sample image set of the expression recognition model.
一些实施方式中,在步骤S3之后,该方法还包括以下步骤S4-S7:In some embodiments, after step S3, the method further includes the following steps S4-S7:
S4、获取原测试图像集合。S4. Obtain the original test image collection.
所述原测试图像集合包括多个原测试图像。原测试图像用于测试训练后的表情识别模型对人脸图像识别的准确率。The original test image set includes a plurality of original test images. The original test image is used to test the accuracy of facial image recognition by the trained expression recognition model.
S5、对所述原测试图像集合分别进行以下处理：降低所述原测试图像集合中的各原测试图像的分辨率，得到第一类测试图像集合；渲染所述原测试图像集合中的各原测试图像的背景光线，得到第二类测试图像集合；降低所述原测试图像集合中的各原测试图像的分辨率，并且渲染各原测试图像的背景光线，得到第三类测试图像集合。S5. Perform the following processing on the original test image set: reduce the resolution of each original test image in the original test image set to obtain a first type of test image set; render the background light of each original test image in the original test image set to obtain a second type of test image set; and reduce the resolution of each original test image in the original test image set while rendering the background light of each original test image to obtain a third type of test image set.
所述原测试图像的分辨率和背景光线的处理过程与前述原训练图像的分辨率和背景光线的处理过程相同,在此不再赘述。The resolution of the original test image and the processing process of the background light are the same as the resolution of the original training image and the processing process of the background light, which will not be repeated here.
S6、通过训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合进行识别。S6. Recognizing the original test image set, the first type test image set, the second type test image set, and the third type test image set through the trained expression recognition model.
S7、分别统计所述训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合识别的准确率。S7. Separately count the recognition accuracy of the trained expression recognition model on the original test image set, the first type of test image set, the second type of test image set, and the third type of test image set.
以所述第一类测试图像集合为例，训练后的表情识别模型对所述第一类测试图像集合进行识别，输出每个第一类测试图像的识别结果；将识别结果与预设比对结果进行比较，若识别结果与预设比对结果一致，则判定表情识别模型输出的识别结果正确，否则，判定表情识别模型输出的识别结果错误。记录被准确识别的第一类测试图像的数量，将被准确识别的第一类测试图像的数量与第一类测试图像的总量做除法运算，得到表情识别模型对所述第一类测试图像集合识别的准确率。Taking the first type of test image set as an example, the trained expression recognition model recognizes the first type of test image set and outputs a recognition result for each first-type test image. Each recognition result is compared with a preset reference result: if they are consistent, the recognition result output by the expression recognition model is judged correct; otherwise it is judged incorrect. The number of correctly recognized first-type test images is recorded and divided by the total number of first-type test images to obtain the recognition accuracy of the expression recognition model on the first type of test image set.
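上述“正确数除以总数”的准确率统计可示意如下（标签为虚构示例）。The accuracy bookkeeping above (correct count divided by total) can be sketched as follows, with invented labels:

```python
def recognition_accuracy(predictions, references):
    """Compare each model output with its preset reference result and return
    the fraction judged correct."""
    assert len(predictions) == len(references)
    correct = sum(1 for p, ref in zip(predictions, references) if p == ref)
    return correct / len(references)
```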
上述实施例中，通过对原训练图像的清晰度或背景色调等特征进行调整，得到多类新的训练图像，新的训练图像无需再做人工打标处理，丰富了表情识别模型的训练样本图像集，大大降低了训练样本图像打标作业所消耗的时间和人力成本。此外，统计所述表情识别模型对原训练图像、所述第一类训练图像、所述第二类训练图像、所述第三类训练图像识别的准确率，为评估表情识别模型的实际效果提供依据。In the above embodiments, multiple new types of training images are obtained by adjusting features of the original training images such as sharpness or background tone. The new training images require no additional manual labeling, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images. In addition, counting the recognition accuracy of the expression recognition model on the original training images and the first, second, and third types of training images provides a basis for evaluating the actual effect of the model.
基于相同的技术构思,本申请还提供了一种表情识别模型训练装置,其可用于丰富训练图像集,提高表情识别模型训练的效率。本申请实施例中的装置能够实现对应于上述图1所对应的实施例中所执行的表情识别模型训练的方法的步骤。该装置实现的功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。硬件或软件包括一个或多个与上述功能相对应的模块,所述模块可以是软件和/或硬件。如图2所示,该装置包括获取模块1和处理模块2。所述处理模块2和获取模块2的功能实现可参考图1所对应的实施例中所执行的操作,此处不作赘述。所述处理模块2可用于控制所述获取模块1的收发操作。Based on the same technical concept, this application also provides an expression recognition model training device, which can be used to enrich the training image set and improve the efficiency of expression recognition model training. The device in the embodiment of the present application can implement the steps corresponding to the method for training an expression recognition model performed in the embodiment corresponding to FIG. 1. The functions realized by the device can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware. As shown in Figure 2, the device includes an acquisition module 1 and a processing module 2. For the functional realization of the processing module 2 and the acquisition module 2, reference may be made to the operations performed in the embodiment corresponding to FIG. 1, which will not be repeated here. The processing module 2 can be used to control the receiving and sending operations of the acquiring module 1.
所述获取模块1，用于获取原训练图像集合。The acquisition module 1 is configured to acquire the original training image set.
所述原训练图像集合包括多个已标注的原训练图像。The original training image set includes a plurality of labeled original training images.
所述处理模块2，用于对所述获取模块1所获取的所述原训练图像集合分别进行以下处理：降低所述原训练图像集合中的各原训练图像的分辨率，得到第一类训练图像集合；渲染所述原训练图像集合中的各原训练图像的背景光线，得到第二类训练图像集合；降低所述原训练图像集合中的各原训练图像的分辨率，并且渲染各原训练图像的背景光线，得到第三类训练图像集合。The processing module 2 is configured to perform the following processing on the original training image set acquired by the acquisition module 1: reducing the resolution of each original training image in the original training image set to obtain a first type of training image set; rendering the background light of each original training image in the original training image set to obtain a second type of training image set; and reducing the resolution of each original training image in the original training image set while rendering the background light of each original training image to obtain a third type of training image set.
所述处理模块2还用于分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型。The processing module 2 is also configured to train an expression recognition model through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection.
一些实施方式中，所述获取模块1还用于获取原测试图像集合；所述原测试图像集合包括多个原测试图像；原测试图像用于测试训练后的表情识别模型对人脸图像识别的准确率。In some embodiments, the acquisition module 1 is further configured to acquire an original test image set; the original test image set includes a plurality of original test images; the original test images are used to test the accuracy of facial image recognition by the trained expression recognition model.
所述处理模块2还用于对所述获取模块1所获取的所述原测试图像集合分别进行以下处理：降低所述原测试图像集合中的各原测试图像的分辨率，得到第一类测试图像集合；渲染所述原测试图像集合中的各原测试图像的背景光线，得到第二类测试图像集合；降低所述原测试图像集合中的各原测试图像的分辨率，并且渲染各原测试图像的背景光线，得到第三类测试图像集合；通过训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合进行识别。The processing module 2 is further configured to perform the following processing on the original test image set acquired by the acquisition module 1: reducing the resolution of each original test image in the original test image set to obtain a first type of test image set; rendering the background light of each original test image in the original test image set to obtain a second type of test image set; reducing the resolution of each original test image in the original test image set while rendering the background light of each original test image to obtain a third type of test image set; and recognizing the original test image set and the first, second, and third types of test image sets through the trained expression recognition model.
所述处理模块2还用于分别统计所述训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合识别的准确率。The processing module 2 is further configured to separately count the recognition accuracy of the trained expression recognition model on the original test image set, the first type of test image set, the second type of test image set, and the third type of test image set.
一些实施方式中,采用深度神经网络模型降低所述原训练图像集合中的各原训练图像的分辨率。In some embodiments, a deep neural network model is used to reduce the resolution of each original training image in the original training image set.
所述处理模块2还用于以高分辨率子图像样本作为深度神经网络框架的输入样本，以低分辨率子图像样本作为所述深度神经网络框架的输出对比样本，生成所述深度神经网络模型；高分辨率子图像样本为低分辨率子图像样本分辨率转化后的图像。The processing module 2 is further configured to use high-resolution sub-image samples as input samples of a deep neural network framework and low-resolution sub-image samples as output comparison samples of the deep neural network framework to generate the deep neural network model; a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
一些实施方式中，所述处理模块2还用于将低分辨率图像样本分割为多个低分辨率子图像样本；采用图像转换算法对低分辨率子图像样本进行图像转换，得到低分辨率子图像样本对应的高分辨率子图像样本。In some embodiments, the processing module 2 is further configured to divide a low-resolution image sample into multiple low-resolution sub-image samples, and to perform image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
一些实施方式中，所述处理模块2具体用于采用图像全变分算法对低分辨率子图像样本进行分解，得到卡通子图像样本和纹理子图像样本；采用插值算法对所述卡通子图像样本进行放大，得到放大后的卡通子图像样本；采用同伦法对所述纹理子图像样本进行放大，得到放大后的纹理子图像样本；对所述放大后的卡通子图像样本和所述放大后的纹理子图像样本进行合成，得到高分辨率子图像样本。In some embodiments, the processing module 2 is specifically configured to: decompose a low-resolution sub-image sample into a cartoon sub-image sample and a texture sub-image sample by using an image total variation algorithm; enlarge the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample; enlarge the texture sub-image sample by using the homotopy method to obtain an enlarged texture sub-image sample; and synthesize the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
一些实施方式中,所述图像全变分算法的表达式为:In some embodiments, the expression of the image total variation algorithm is:
[公式图像：PCTCN2020087605-appb-000004（原文中该全变分表达式以图像形式给出）]
其中，(x_p, y_p)表示低分辨率子图像样本中当前中心像素点；(x_q, y_q)表示(x_p, y_p)的全变分的像素点；图像PCTCN2020087605-appb-000005所示的量是(x_p, y_p)和(x_q, y_q)所在物体内的像素值的方差；c_{p,q}为相乘因子；T_g为预设阈值（另见图像PCTCN2020087605-appb-000006）。
[Formula image PCTCN2020087605-appb-000004: the total-variation expression is given only as an image in the original.] Here, (x_p, y_p) denotes the current center pixel of the low-resolution sub-image sample; (x_q, y_q) denotes a total-variation pixel of (x_p, y_p); the quantity shown in image PCTCN2020087605-appb-000005 is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q); c_{p,q} is a multiplication factor; and T_g is a preset threshold (see also image PCTCN2020087605-appb-000006).
一些实施方式中，所述处理模块2具体用于采用字典训练算法得到所述纹理子图像样本的图像块字典；采用所述图像块字典和正交匹配跟踪方法对所述纹理子图像样本进行放大，得到初始高分辨率子图像；对所述初始高分辨率子图像进行最近邻的加边处理，得到加边高分辨子图像；对所述加边高分辨子图像进行第一次同伦处理，得到第一加边高分辨率子图像；对所述第一加边高分辨率子图像进行第二次同伦处理，得到所述放大后的纹理子图像样本。In some embodiments, the processing module 2 is specifically configured to: obtain an image block dictionary of the texture sub-image sample by using a dictionary training algorithm; enlarge the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image; perform nearest-neighbor edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image; perform a first homotopy process on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and perform a second homotopy process on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
上述实施例中，通过对原训练图像的清晰度或背景色调等特征进行调整，得到多类新的训练图像，新的训练图像无需再做人工打标处理，丰富了表情识别模型的训练样本图像集，大大降低训练样本图像打标作业所消耗的时间和人力成本。此外，统计所述表情识别模型对原训练图像、所述第一类训练图像、所述第二类训练图像、所述第三类训练图像识别的准确率，为评估表情识别模型的实际效果提供依据。In the above embodiments, multiple new types of training images are obtained by adjusting features of the original training images such as sharpness or background tone. The new training images require no additional manual labeling, which enriches the training sample image set of the expression recognition model and greatly reduces the time and labor cost of labeling training sample images. In addition, counting the recognition accuracy of the expression recognition model on the original training images and the first, second, and third types of training images provides a basis for evaluating the actual effect of the model.
基于相同的技术构思，本申请还提供了一种计算机设备，如图3所示，该计算机设备包括输入输出单元31、处理器32和存储器33，所述存储器33中存储有计算机可读指令，所述计算机可读指令被所述处理器32执行时，使得所述处理器执行上述各实施方式中的所述的表情识别模型训练方法的步骤。Based on the same technical concept, the present application further provides a computer device. As shown in FIG. 3, the computer device includes an input/output unit 31, a processor 32, and a memory 33. The memory 33 stores computer-readable instructions that, when executed by the processor 32, cause the processor to perform the steps of the expression recognition model training method in the foregoing embodiments.
图2中所示的获取模块1对应的实体设备为图3所示的输入输出单元31，该输入输出单元31能够实现获取模块1部分或全部的功能，或者实现与获取模块1相同或相似的功能。The physical device corresponding to the acquisition module 1 shown in FIG. 2 is the input/output unit 31 shown in FIG. 3. The input/output unit 31 can implement some or all of the functions of the acquisition module 1, or implement the same or similar functions as the acquisition module 1.
图2中所示的处理模块2对应的实体设备为图3所示的处理器32,该处理器32能够实现处理模块2部分或全部的功能,或者实现与处理模块2相同或相似的功能。The physical device corresponding to the processing module 2 shown in FIG. 2 is the processor 32 shown in FIG. 3, and the processor 32 can implement part or all of the functions of the processing module 2 or implement the same or similar functions as the processing module 2.
基于相同的技术构思,本申请还提供了一种存储有计算机可读指令的存储介质,所述计算机可读存储介质可以是非易失性,也可以是易失性。所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行上述各实施方式中的所述的表情识别模型训练方法的步骤。Based on the same technical concept, the present application also provides a storage medium storing computer-readable instructions. The computer-readable storage medium may be non-volatile or volatile. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the expression recognition model training method in the foregoing embodiments.
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质（如ROM/RAM）中，包括若干指令用以使得一台终端（可以是手机，计算机，服务器或者网络设备等）执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
上面结合附图对本申请的实施例进行了描述，但是本申请并不局限于上述的具体实施方式，上述的具体实施方式仅仅是示意性的，而不是限制性的，本领域的普通技术人员在本申请的启示下，在不脱离本申请宗旨和权利要求所保护的范围情况下，还可做出很多形式，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，这些均属于本申请的保护之内。The embodiments of this application have been described above with reference to the accompanying drawings, but this application is not limited to the above specific implementations, which are merely illustrative rather than restrictive. Under the inspiration of this application, those of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope protected by the claims. Any equivalent structure or equivalent process transformation made by using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, falls within the protection of this application.

Claims (21)

  1. 一种表情识别模型训练方法,其中,包括:An expression recognition model training method, which includes:
    获取原训练图像集合;所述原训练图像集合包括多个已标注的原训练图像;Acquiring an original training image set; the original training image set includes a plurality of labeled original training images;
    对所述原训练图像集合分别进行以下处理:Perform the following processing on the original training image set:
    降低所述原训练图像集合中的各原训练图像的分辨率,得到第一类训练图像集合;Reducing the resolution of each original training image in the original training image set to obtain the first type of training image set;
    渲染所述原训练图像集合中的各原训练图像的背景光线,得到第二类训练图像集合;Rendering the background light of each original training image in the original training image set to obtain the second type of training image set;
    降低所述原训练图像集合中的各原训练图像的分辨率,并且渲染各原训练图像的背景光线,得到第三类训练图像集合;Reducing the resolution of each original training image in the original training image set, and rendering the background light of each original training image, to obtain a third type of training image set;
    分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型。The expression recognition model is trained through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection.
  2. 根据权利要求1所述的表情识别模型训练方法,其中,The expression recognition model training method according to claim 1, wherein:
    在所述分别通过所述原训练图像集合、所述第一类训练图像集合、所述第二类训练图像集合以及所述第三类训练图像集合训练表情识别模型之后,所述方法还包括:After the expression recognition model is trained through the original training image collection, the first type training image collection, the second type training image collection, and the third type training image collection respectively, the method further includes:
    获取原测试图像集合;所述原测试图像集合包括多个原测试图像;原测试图像用于测试训练后的表情识别模型对人脸图像识别的准确率;Acquiring an original test image set; the original test image set includes a plurality of original test images; the original test images are used to test the accuracy of facial image recognition by the expression recognition model after training;
    对所述原测试图像集合分别进行以下处理:Perform the following processing on the original test image set:
    降低所述原测试图像集合中的各原测试图像的分辨率,得到第一类测试图像集合;Reducing the resolution of each original test image in the original test image set to obtain the first type of test image set;
    渲染所述原测试图像集合中的各原测试图像的背景光线,得到第二类测试图像集合;Rendering the background light of each original test image in the original test image set to obtain the second type of test image set;
    降低所述原测试图像集合中的各原测试图像的分辨率,并且渲染各原测试图像的背景光线,得到第三类测试图像集合;Reducing the resolution of each original test image in the original test image set, and rendering the background light of each original test image to obtain the third type of test image set;
    通过训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合进行识别;Recognizing the original test image collection, the first type test image collection, the second type test image collection, and the third type test image collection through the trained expression recognition model;
    分别统计所述训练后的表情识别模型对所述原测试图像集合、所述第一类测试图像集合、所述第二类测试图像集合以及所述第三类测试图像集合识别的准确率。Count the recognition accuracy rates of the original test image set, the first type test image set, the second type test image set, and the third type test image set by the trained expression recognition model respectively.
  3. The expression recognition model training method according to claim 1, wherein a deep neural network model is used to reduce the resolution of each original training image in the original training image set; and
    before the acquiring of the original training image set, the method further comprises:
    generating the deep neural network model by using high-resolution sub-image samples as input samples of a deep neural network framework and using low-resolution sub-image samples as output comparison samples of the deep neural network framework, wherein a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
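The generation step in claim 3 pairs each high-resolution sub-image sample (input) with its low-resolution counterpart (output comparison sample). As a toy sketch under assumed flattened shapes, a single linear layer trained by gradient descent stands in for the unspecified deep neural network framework:

```python
import numpy as np

def train_downscale_model(high_samples, low_samples, lr=0.05, epochs=500):
    """Fit W so that high_samples @ W approximates the low-resolution targets.

    high_samples: (n, d_high) flattened high-resolution sub-images (inputs)
    low_samples:  (n, d_low) flattened low-resolution sub-images (comparison outputs)
    """
    n, d_high = high_samples.shape
    d_low = low_samples.shape[1]
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(d_high, d_low))
    for _ in range(epochs):
        pred = high_samples @ W
        grad = high_samples.T @ (pred - low_samples) / n  # gradient of mean squared error
        W -= lr * grad
    return W
```

A real implementation would replace the linear map with a multi-layer network, but the input/target pairing is the same.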
  4. The expression recognition model training method according to claim 3, wherein, before the generating of the deep neural network model by using the high-resolution sub-image samples as the input samples of the deep neural network framework and using the low-resolution sub-image samples as the output comparison samples of the deep neural network framework, the method further comprises:
    dividing a low-resolution image sample into a plurality of low-resolution sub-image samples; and
    performing image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
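The dividing step in claim 4 can be sketched as non-overlapping tiling. The tile size is an assumption here; the patent does not specify how the sub-images are cut:

```python
def split_into_subimages(image, tile_h, tile_w):
    """Split a 2-D image (a list of rows) into non-overlapping tile_h x tile_w blocks.

    Assumes the image dimensions are exact multiples of the tile size."""
    h, w = len(image), len(image[0])
    tiles = []
    for top in range(0, h, tile_h):
        for left in range(0, w, tile_w):
            tiles.append([row[left:left + tile_w] for row in image[top:top + tile_h]])
    return tiles
```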
  5. The expression recognition model training method according to claim 4, wherein the performing of image conversion on the low-resolution sub-image samples by using the image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples comprises:
    decomposing a low-resolution sub-image sample by using an image total variation algorithm to obtain a cartoon sub-image sample and a texture sub-image sample;
    enlarging the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample;
    enlarging the texture sub-image sample by using a homotopy method to obtain an enlarged texture sub-image sample; and
    synthesizing the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
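The four steps of claim 5 compose as decompose → enlarge each component → synthesize. The following structural sketch uses a box blur as a crude stand-in for the total variation decomposition and nearest-neighbour repetition as a stand-in for both the interpolation and homotopy enlargements; the patent's actual operators are not reproduced here:

```python
import numpy as np

def box_blur(img, k=3):
    # Crude smoothing stand-in for the TV "cartoon" extraction.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def upscale_nn(img, factor=2):
    # Nearest-neighbour enlargement stand-in.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def super_resolve(low_res, factor=2):
    cartoon = box_blur(low_res)            # smooth structure component
    texture = low_res - cartoon            # residual detail component
    big_cartoon = upscale_nn(cartoon, factor)
    big_texture = upscale_nn(texture, factor)
    return big_cartoon + big_texture       # synthesis of the two enlarged parts
```

The point of the real decomposition is that cartoon and texture components tolerate different enlargement methods (interpolation preserves smooth regions; the dictionary-based homotopy step preserves fine texture), which a single upscaler cannot do for both.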
  6. The expression recognition model training method according to claim 5, wherein the expression of the image total variation algorithm is:
    Figure PCTCN2020087605-appb-100001
    where (x_p, y_p) denotes the current central pixel in the low-resolution sub-image sample, (x_q, y_q) denotes a pixel in the total variation neighbourhood of (x_p, y_p),
    Figure PCTCN2020087605-appb-100002
    is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q), c_{p,q} is a multiplication factor, T_g is a preset threshold, and
    Figure PCTCN2020087605-appb-100003
  7. The expression recognition model training method according to claim 5, wherein the enlarging of the texture sub-image sample by using the homotopy method to obtain the enlarged texture sub-image sample comprises:
    obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm;
    enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image;
    performing nearest-neighbour edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image;
    performing a first homotopy processing on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and
    performing a second homotopy processing on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
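The second step of claim 7 reconstructs each texture patch as a sparse combination of dictionary atoms via orthogonal matching pursuit. A compact OMP sketch over an assumed column-normalized dictionary `D`; the dictionary training step and the two homotopy refinements are omitted:

```python
import numpy as np

def omp(D, y, n_atoms):
    """Greedy orthogonal matching pursuit: approximate y with n_atoms columns of D."""
    residual = y.copy()
    chosen = []
    for _ in range(n_atoms):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in chosen:
            chosen.append(idx)
        # Least-squares fit on the chosen atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, chosen], y, rcond=None)
        residual = y - D[:, chosen] @ coef
    x = np.zeros(D.shape[1])
    x[chosen] = coef
    return x
```

In a full pipeline, `y` would be a low-resolution texture patch and the reconstruction coefficients would be applied to the corresponding high-resolution dictionary to produce the initial high-resolution sub-image.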
  8. A computer device, comprising an input/output unit, a memory, and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the following steps:
    acquiring an original training image set, the original training image set comprising a plurality of labeled original training images;
    performing the following processing on the original training image set respectively:
    reducing the resolution of each original training image in the original training image set to obtain a first-type training image set;
    rendering the background light of each original training image in the original training image set to obtain a second-type training image set;
    reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third-type training image set; and
    training an expression recognition model with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively.
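The three derived training sets in claim 8 come from two basic transforms applied alone and in combination. A sketch with naive stand-ins: 2x subsampling for resolution reduction and a uniform brightness shift for background-light rendering (the patent's DNN-based downscaler and its light-rendering method are not reproduced here):

```python
import numpy as np

def lower_resolution(img):
    # Keep every second pixel: a crude stand-in for the DNN-based downscaling.
    return img[::2, ::2]

def render_light(img, offset=40):
    # Uniform brightness shift as a stand-in for background-light rendering.
    return np.clip(img.astype(int) + offset, 0, 255).astype(img.dtype)

def build_training_sets(originals):
    first = [lower_resolution(im) for im in originals]                 # resolution only
    second = [render_light(im) for im in originals]                    # light only
    third = [render_light(lower_resolution(im)) for im in originals]   # both
    return first, second, third
```

Training on the original set plus all three derived sets exposes the model to the degradations it will meet at recognition time.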
  9. The computer device according to claim 8, wherein, after the expression recognition model is trained with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively, the processor further executes the following steps:
    acquiring an original test image set, the original test image set comprising a plurality of original test images, wherein the original test images are used to test the accuracy of the trained expression recognition model in recognizing face images;
    performing the following processing on the original test image set respectively:
    reducing the resolution of each original test image in the original test image set to obtain a first-type test image set;
    rendering the background light of each original test image in the original test image set to obtain a second-type test image set;
    reducing the resolution of each original test image in the original test image set and rendering the background light of each original test image to obtain a third-type test image set;
    recognizing the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set with the trained expression recognition model; and
    separately counting the recognition accuracy of the trained expression recognition model on the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  10. The computer device according to claim 8, wherein a deep neural network model is used to reduce the resolution of each original training image in the original training image set; and
    before the acquiring of the original training image set, the processor further executes the following step:
    generating the deep neural network model by using high-resolution sub-image samples as input samples of a deep neural network framework and using low-resolution sub-image samples as output comparison samples of the deep neural network framework, wherein a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
  11. The computer device according to claim 10, wherein, before the generating of the deep neural network model by using the high-resolution sub-image samples as the input samples of the deep neural network framework and using the low-resolution sub-image samples as the output comparison samples of the deep neural network framework, the processor further executes the following steps:
    dividing a low-resolution image sample into a plurality of low-resolution sub-image samples; and
    performing image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  12. The computer device according to claim 11, wherein the performing of image conversion on the low-resolution sub-image samples by using the image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples comprises:
    decomposing a low-resolution sub-image sample by using an image total variation algorithm to obtain a cartoon sub-image sample and a texture sub-image sample;
    enlarging the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample;
    enlarging the texture sub-image sample by using a homotopy method to obtain an enlarged texture sub-image sample; and
    synthesizing the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
  13. The computer device according to claim 12, wherein the expression of the image total variation algorithm is:
    Figure PCTCN2020087605-appb-100004
    where (x_p, y_p) denotes the current central pixel in the low-resolution sub-image sample, (x_q, y_q) denotes a pixel in the total variation neighbourhood of (x_p, y_p),
    Figure PCTCN2020087605-appb-100005
    is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q), c_{p,q} is a multiplication factor, T_g is a preset threshold, and
    Figure PCTCN2020087605-appb-100006
  14. The computer device according to claim 13, wherein the enlarging of the texture sub-image sample by using the homotopy method to obtain the enlarged texture sub-image sample comprises:
    obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm;
    enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image;
    performing nearest-neighbour edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image;
    performing a first homotopy processing on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and
    performing a second homotopy processing on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
  15. A storage medium storing computer-readable instructions, wherein, when executed by one or more processors, the computer-readable instructions cause the one or more processors to execute the following steps:
    acquiring an original training image set, the original training image set comprising a plurality of labeled original training images;
    performing the following processing on the original training image set respectively:
    reducing the resolution of each original training image in the original training image set to obtain a first-type training image set;
    rendering the background light of each original training image in the original training image set to obtain a second-type training image set;
    reducing the resolution of each original training image in the original training image set and rendering the background light of each original training image to obtain a third-type training image set; and
    training an expression recognition model with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively.
  16. The storage medium according to claim 15, wherein, after the expression recognition model is trained with the original training image set, the first-type training image set, the second-type training image set, and the third-type training image set respectively, the one or more processors further execute the following steps:
    acquiring an original test image set, the original test image set comprising a plurality of original test images, wherein the original test images are used to test the accuracy of the trained expression recognition model in recognizing face images;
    performing the following processing on the original test image set respectively:
    reducing the resolution of each original test image in the original test image set to obtain a first-type test image set;
    rendering the background light of each original test image in the original test image set to obtain a second-type test image set;
    reducing the resolution of each original test image in the original test image set and rendering the background light of each original test image to obtain a third-type test image set;
    recognizing the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set with the trained expression recognition model; and
    separately counting the recognition accuracy of the trained expression recognition model on the original test image set, the first-type test image set, the second-type test image set, and the third-type test image set.
  17. The storage medium according to claim 15, wherein a deep neural network model is used to reduce the resolution of each original training image in the original training image set; and
    before the acquiring of the original training image set, the one or more processors further execute the following step:
    generating the deep neural network model by using high-resolution sub-image samples as input samples of a deep neural network framework and using low-resolution sub-image samples as output comparison samples of the deep neural network framework, wherein a high-resolution sub-image sample is an image obtained by resolution conversion of a low-resolution sub-image sample.
  18. The storage medium according to claim 17, wherein, before the generating of the deep neural network model by using the high-resolution sub-image samples as the input samples of the deep neural network framework and using the low-resolution sub-image samples as the output comparison samples of the deep neural network framework, the one or more processors further execute the following steps:
    dividing a low-resolution image sample into a plurality of low-resolution sub-image samples; and
    performing image conversion on the low-resolution sub-image samples by using an image conversion algorithm to obtain high-resolution sub-image samples corresponding to the low-resolution sub-image samples.
  19. The storage medium according to claim 18, wherein the performing of image conversion on the low-resolution sub-image samples by using the image conversion algorithm to obtain the high-resolution sub-image samples corresponding to the low-resolution sub-image samples comprises:
    decomposing a low-resolution sub-image sample by using an image total variation algorithm to obtain a cartoon sub-image sample and a texture sub-image sample;
    enlarging the cartoon sub-image sample by using an interpolation algorithm to obtain an enlarged cartoon sub-image sample;
    enlarging the texture sub-image sample by using a homotopy method to obtain an enlarged texture sub-image sample; and
    synthesizing the enlarged cartoon sub-image sample and the enlarged texture sub-image sample to obtain a high-resolution sub-image sample.
  20. The storage medium according to claim 19, wherein the expression of the image total variation algorithm is:
    Figure PCTCN2020087605-appb-100007
    where (x_p, y_p) denotes the current central pixel in the low-resolution sub-image sample, (x_q, y_q) denotes a pixel in the total variation neighbourhood of (x_p, y_p),
    Figure PCTCN2020087605-appb-100008
    is the variance of the pixel values within the object containing (x_p, y_p) and (x_q, y_q), c_{p,q} is a multiplication factor, T_g is a preset threshold, and
    Figure PCTCN2020087605-appb-100009
  21. The storage medium according to claim 20, wherein the enlarging of the texture sub-image sample by using the homotopy method to obtain the enlarged texture sub-image sample comprises:
    obtaining an image block dictionary of the texture sub-image sample by using a dictionary training algorithm;
    enlarging the texture sub-image sample by using the image block dictionary and an orthogonal matching pursuit method to obtain an initial high-resolution sub-image;
    performing nearest-neighbour edge padding on the initial high-resolution sub-image to obtain an edge-padded high-resolution sub-image;
    performing a first homotopy processing on the edge-padded high-resolution sub-image to obtain a first edge-padded high-resolution sub-image; and
    performing a second homotopy processing on the first edge-padded high-resolution sub-image to obtain the enlarged texture sub-image sample.
PCT/CN2020/087605 2019-05-22 2020-04-28 Expression recognition model training method and apparatus, and device and storage medium WO2020233368A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910427443.1A CN110309713A (en) 2019-05-22 2019-05-22 Expression Recognition model training method, device, equipment and storage medium
CN201910427443.1 2019-05-22

Publications (1)

Publication Number Publication Date
WO2020233368A1 true WO2020233368A1 (en) 2020-11-26

Family

ID=68075415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087605 WO2020233368A1 (en) 2019-05-22 2020-04-28 Expression recognition model training method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN110309713A (en)
WO (1) WO2020233368A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784857A (en) * 2021-01-29 2021-05-11 北京三快在线科技有限公司 Model training and image processing method and device
CN113255517A (en) * 2021-05-24 2021-08-13 中国科学技术大学 Privacy-protecting expression recognition model training method and expression recognition method and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309713A (en) * 2019-05-22 2019-10-08 深圳壹账通智能科技有限公司 Expression Recognition model training method, device, equipment and storage medium
CN111597476B (en) * 2020-05-06 2023-08-22 北京金山云网络技术有限公司 Image processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767416A (en) * 2017-09-05 2018-03-06 华南理工大学 The recognition methods of pedestrian's direction in a kind of low-resolution image
CN108492343A (en) * 2018-03-28 2018-09-04 东北大学 A kind of image combining method for the training data expanding target identification
US20180286037A1 (en) * 2017-03-31 2018-10-04 Greg Zaharchuk Quality of Medical Images Using Multi-Contrast and Deep Learning
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision
CN110309713A (en) * 2019-05-22 2019-10-08 深圳壹账通智能科技有限公司 Expression Recognition model training method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
有三AI (YOUSAN AI): "一文道尽深度学习中的数据增强方法 (上) (non-official translation: All you Need to Know on Data Enhancement Method in Deep Learning (part 1))", HTTPS://WWW.JIANSHU.COM/P/99450DBDADCF, 28 June 2018 (2018-06-28), DOI: 20200707132046Y *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784857A (en) * 2021-01-29 2021-05-11 北京三快在线科技有限公司 Model training and image processing method and device
CN112784857B (en) * 2021-01-29 2022-11-04 北京三快在线科技有限公司 Model training and image processing method and device
CN113255517A (en) * 2021-05-24 2021-08-13 中国科学技术大学 Privacy-protecting expression recognition model training method and expression recognition method and device
CN113255517B (en) * 2021-05-24 2023-10-24 中国科学技术大学 Expression recognition model training method for protecting privacy and expression recognition method and device

Also Published As

Publication number Publication date
CN110309713A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
WO2020233368A1 (en) Expression recognition model training method and apparatus, and device and storage medium
Nhan Duong et al. Temporal non-volume preserving approach to facial age-progression and age-invariant face recognition
JP6891351B2 (en) How to generate a human hairstyle based on multi-feature search and deformation
CN108596024B (en) Portrait generation method based on face structure information
CN109190722B (en) Font style migration transformation method based on Manchu character picture
CN110969250B (en) Neural network training method and device
Ding et al. Latent low-rank transfer subspace learning for missing modality recognition
CN107730451A (en) A kind of compressed sensing method for reconstructing and system based on depth residual error network
CN111931602B (en) Attention mechanism-based multi-flow segmented network human body action recognition method and system
Tian et al. Kaokore: A pre-modern japanese art facial expression dataset
CN109829924B (en) Image quality evaluation method based on principal feature analysis
Zhang et al. Sienet: Siamese expansion network for image extrapolation
CN106777986A (en) Ligand molecular fingerprint generation method based on depth Hash in drug screening
CN111028319B (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
CN105654127A (en) End-to-end-based picture character sequence continuous recognition method
CN110727819B (en) Method for retrieving scale-adaptive pathological full-section image database
Wang et al. An encrypted traffic classification framework based on convolutional neural networks and stacked autoencoders
Zhang et al. A separation–aggregation network for image denoising
CN112598587A (en) Image processing system and method combining face mask removal and super-resolution
CN108898568A (en) Image composition method and device
CN114495119A (en) Real-time irregular text recognition method under complex scene
CN110766695B (en) Image sparse representation-based matting method
CN113888501A (en) Non-reference image quality evaluation method based on attention positioning network
CN109615005A (en) Image set categorizing system and method based on manifold deep learning and extreme learning machine
CN111368831B (en) Positioning system and method for vertical text

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20810789

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20810789

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2022)
