CN112380898A - Method, device and equipment for recognizing facial expressions in live lessons - Google Patents

Method, device and equipment for recognizing facial expressions in live lessons

Info

Publication number
CN112380898A
Authority
CN
China
Prior art keywords
image
data set
training data
initial
facial
Prior art date
Legal status
Pending
Application number
CN202011065779.7A
Other languages
Chinese (zh)
Inventor
李天驰
孙悦
王帅
Current Assignee
Shenzhen Dianmao Technology Co Ltd
Original Assignee
Shenzhen Dianmao Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Dianmao Technology Co Ltd
Priority to CN202011065779.7A
Publication of CN112380898A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed

Abstract

The invention discloses a method, an apparatus and a device for recognizing facial expressions in a live lesson, wherein the method comprises the following steps: acquiring an original training data set of facial expressions, and performing region-of-interest (ROI) processing on the original training data set to generate a target training data set; constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, and training the initial facial expression recognition model on the target training data set to generate a target facial expression recognition model; and acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result. By adopting ROI processing during image preparation, the embodiment of the invention expands the training data set, improves the accuracy of facial expression recognition, and enhances the robustness of the trained model.

Description

Method, device and equipment for recognizing facial expressions in live lessons
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a device for recognizing facial expressions in a live lesson.
Background
With the rise of the artificial intelligence industry, facial expression recognition based on deep learning has attracted growing attention. In an online live-broadcast class in particular, a student's current attentiveness can be inferred by analyzing his or her facial expressions in the live video, which helps teachers manage the class and teach. Current facial expression recognition methods suffer from two main problems. First, although many and varied facial expression data sets exist, most of their images are shot by a camera from a single angle and the number of expression images is small, so a trained model carries considerable uncertainty, generalizes weakly to new random data, and has low robustness. Second, the traditional LeNet-5 convolutional neural network was designed for handwritten digit recognition and does not take low-level detail features into account during feature extraction, so as the network deepens it is prone to vanishing or exploding gradients. Consequently, prior-art facial expression recognition models for live webcast classes have poor robustness and low recognition accuracy.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
In view of the foregoing deficiencies of the prior art, an object of the present invention is to provide a method, an apparatus and a device for recognizing facial expressions in a live broadcast class, which aim to solve the technical problems of poor robustness and low recognition accuracy of a facial expression recognition model in a live broadcast class in the prior art.
The technical scheme of the invention is as follows:
a method of identifying facial expressions in a live lesson, the method comprising:
acquiring an original training data set of original facial expressions, and performing ROI (region of interest) processing on the original training data set to generate a target training data set;
constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, and training the initial facial expression recognition model according to a target training data set to generate a target facial expression recognition model;
and acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result.
Further, before the acquiring an original training data set of an original facial expression and performing ROI processing on the original training data set to generate a target training data set, the method includes:
the method comprises the steps of obtaining student images in a live broadcast course through a camera, carrying out face recognition on the student images, and generating facial images to be recognized.
Further preferably, the performing face recognition on the student image to generate a face image to be recognized includes:
the student images are identified through a face identification algorithm, and facial images to be identified, including eyes, a nose and a mouth, are generated according to the identification result.
Further preferably, the acquiring an original training data set of an original facial expression, performing ROI processing on the original training data set, and generating a target training data set includes:
acquiring an initial image from the original training data set of facial expressions, dividing the initial image into four equal parts, and scaling the four parts so that each scaled image matches the size of the initial image, thereby generating four equally divided images;
occluding the upper half and the lower half of the initial image respectively to generate two occlusion images;
mirroring the initial image to generate a mirror image;
center-focusing and scaling the initial image to generate a focused image;
generating an extended training data set corresponding to the initial image from the initial image, the equally divided images, the occlusion images, the mirror image and the focused image;
after the ROI processing operation has been performed on all initial images of the original training data set, generating the extended training data sets corresponding to all initial images in the original training data set, and generating a target training data set from these extended training data sets.
Preferably, the occluding the upper half and the lower half of the initial image respectively to generate two occlusion images includes:
occluding the upper half and the lower half of the initial image respectively with an image processing tool to generate an upper-half occlusion image and a lower-half occlusion image.
Further, the mirroring the initial image to generate a mirror image includes:
mirroring the initial image about an axis, and recalculating the labeled feature points according to the mirroring principle to generate a mirror image.
Further, the center-focusing and scaling the initial image to generate a focused image includes:
center-focusing the initial image and scaling the focused image so that the scaled image is the same size as the initial image, and recalculating the labeled feature points according to the scaling ratio to generate a focused image.
Another embodiment of the present invention provides an apparatus for recognizing a facial expression in a live lesson, the apparatus comprising:
the ROI processing module is used for acquiring an original training data set of original facial expressions, and performing ROI processing on the original training data set to generate a target training data set;
the model training module is used for constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, training the initial facial expression recognition model according to a target training data set and generating a target facial expression recognition model;
and the facial expression recognition module is used for acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model and generating a facial expression recognition result.
Another embodiment of the present invention provides an apparatus for recognizing facial expressions in a live lesson, the apparatus comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described method of identifying facial expressions in a live class.
Yet another embodiment of the present invention provides a non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above-described method of identifying facial expressions in a live class.
Advantageous effects: by adopting ROI processing during image preparation, the embodiment of the invention expands the training data set, improves the accuracy of facial expression recognition, and enhances the robustness of the trained model.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a preferred embodiment of a method for identifying facial expressions in a live lesson according to the present invention;
FIG. 2 is a schematic structural diagram of a cross-layer connected convolutional neural network according to a preferred embodiment of the present invention;
FIG. 3 is a functional block diagram of an apparatus for recognizing facial expressions in a live lesson according to an embodiment of the present invention;
fig. 4 is a diagram illustrating a hardware configuration of an apparatus for recognizing facial expressions in a live lesson according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more definite, the present invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. Embodiments of the present invention will be described below with reference to the accompanying drawings.
The embodiment of the invention provides a method for identifying facial expressions in a live lesson. Referring to fig. 1, fig. 1 is a flowchart illustrating a method for recognizing facial expressions in a live lesson according to a preferred embodiment of the present invention. As shown in fig. 1, it includes the steps of:
s100, acquiring an original training data set of original facial expressions, and performing ROI (region of interest) processing on the original training data set to generate a target training data set;
s200, constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, and training the initial facial expression recognition model according to a target training data set to generate a target facial expression recognition model;
Step S300, acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result.
In specific implementation, aiming at the problems of facial expression recognition in live video lessons in the prior art, the embodiment of the invention provides a method for recognizing facial expressions in a live lesson. The method processes the expression data set based on the region-of-interest (ROI) idea to generate a target training data set, then improves the LeNet-5 neural network with a cross-layer connection method so that low-level network features are also taken into account, constructs an initial facial expression recognition model, and trains the initial facial expression recognition model on the target training data set to generate the target facial expression recognition model. A facial image to be recognized is then acquired and input into the target facial expression recognition model to obtain the facial expression recognition result. The algorithm not only improves the accuracy of facial expression recognition but also enhances the robustness of the trained model.
Fig. 2 is a schematic structural diagram of the cross-layer connected convolutional neural network. As shown in fig. 2, the network includes 1 input layer, 3 convolutional layers, 2 pooling layers, 1 fully connected layer, and 1 output layer. The Input layer receives expression images of 32 × 32 pixels. Layer 1 is a convolutional layer with 6 feature maps: the input 32 × 32 pixel image is convolved with 6 convolution kernels of 5 × 5 pixels to obtain feature maps of 28 × 28 pixels. Layer 2 is a pooling layer: the 28 × 28 pixel feature maps are pooled into 14 × 14 pixel feature maps. Layer 3 is a convolutional layer with 16 feature maps: the 14 × 14 pixel feature maps from the previous layer are convolved with 16 convolution kernels of 5 × 5 pixels to obtain feature maps of 10 × 10 pixels. Layer 4 is a pooling layer: the 10 × 10 pixel feature maps are pooled into 5 × 5 pixel feature maps. Layer 5 is a convolutional layer with 120 feature maps: the 5 × 5 pixel feature maps from the previous layer are convolved with 120 convolution kernels of 5 × 5 pixels to obtain feature maps of 1 × 1 pixel. Layer 6 is a fully connected layer with 84 units in total. The Output layer outputs 7 expression types: happy, sad, surprised, afraid, angry, disgusted, and neutral.
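By way of non-limiting illustration, the network of fig. 2 can be sketched in Python with PyTorch as follows. The figure does not specify the activation functions, the pooling type, or where the cross-layer connections attach, so this sketch assumes ReLU activations, max pooling, and that the flattened Layer 2 and Layer 4 pooling outputs are concatenated with the Layer 5 features before the fully connected layer; the class name CrossLayerLeNet5 is illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerLeNet5(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)     # Layer 1: 1x32x32 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, 5)    # Layer 3: 6x14x14 -> 16x10x10
        self.conv3 = nn.Conv2d(16, 120, 5)  # Layer 5: 16x5x5  -> 120x1x1
        self.pool = nn.MaxPool2d(2)         # Layers 2 and 4 (assumed max pooling)
        # 120 + 6*14*14 + 16*5*5 = 1696 features after the assumed cross-layer concatenation
        self.fc1 = nn.Linear(120 + 6 * 14 * 14 + 16 * 5 * 5, 84)  # Layer 6: 84 units
        self.fc2 = nn.Linear(84, num_classes)                     # Output: 7 expression types

    def forward(self, x):
        p1 = self.pool(F.relu(self.conv1(x)))   # 6x14x14
        p2 = self.pool(F.relu(self.conv2(p1)))  # 16x5x5
        c3 = F.relu(self.conv3(p2))             # 120x1x1
        # Cross-layer connection: low-level pooled features also feed the classifier.
        feats = torch.cat([c3.flatten(1), p1.flatten(1), p2.flatten(1)], dim=1)
        return self.fc2(F.relu(self.fc1(feats)))

logits = CrossLayerLeNet5()(torch.randn(1, 1, 32, 32))  # logits over the 7 expression types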
Further, before obtaining an original training data set of an original facial expression and performing ROI processing on the original training data set to generate a target training data set, the method includes:
the method comprises the steps of obtaining student images in a live broadcast course through a camera, carrying out face recognition on the student images, and generating facial images to be recognized.
In specific implementation, before the ROI processing is carried out, the face must first be detected by face recognition, and the face should fill the whole image area as much as possible so as to reduce errors.
Further, the carrying out face recognition on the student images to generate facial images to be recognized includes:
recognizing the student images through a face recognition algorithm, and generating facial images to be recognized, containing the eyes, nose and mouth, according to the recognition result.
In particular implementation, the key point of the ROI setting scheme is to recognize a facial expression by detecting changes in eyes, nose, and mouth. Therefore, the image after face recognition must be a face image including eyes, nose, and mouth.
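By way of non-limiting illustration, this face detection step can be sketched in Python with OpenCV's bundled Haar cascade; the patent does not name a particular detector, so the cascade choice and the helper name crop_face are assumptions.

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame):
    # Return the largest detected face region so that the face fills the
    # image area as much as possible, or None if no face is found.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return frame[y:y + h, x:x + w]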
Further, the acquiring an original training data set of original facial expressions, performing ROI processing on the original training data set, and generating a target training data set includes:
acquiring an initial image from the original training data set of facial expressions, dividing the initial image into four equal parts, and scaling the four parts so that each scaled image matches the size of the initial image, thereby generating four equally divided images;
occluding the upper half and the lower half of the initial image respectively to generate two occlusion images;
mirroring the initial image to generate a mirror image;
center-focusing and scaling the initial image to generate a focused image;
generating an extended training data set corresponding to the initial image from the initial image, the equally divided images, the occlusion images, the mirror image and the focused image;
after the ROI processing operation has been performed on all initial images of the original training data set, generating the extended training data sets corresponding to all initial images in the original training data set, and generating a target training data set from these extended training data sets.
In specific implementation, ROI processing is first performed on the training images: each training image is processed into several specific images through its regions of interest, and these images together form a new database. The new database is then input into the improved cross-layer connected convolutional neural network module and trained to obtain the facial expression recognition model.
The image is first cut: the original image is divided equally into four quadrants, which contain, respectively, the complete left-eye region, the complete right-eye region, and the left and right halves of the mouth. Each cropped image is then scaled so that the new image matches the original image in size; the corresponding annotation category stays consistent with the original, and the annotated feature points are recalculated in proportion. This yields 4 new data items, recorded as the four equally divided images.
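A minimal sketch of this quartering step, assuming OpenCV images and omitting the proportional recalculation of the annotated feature points for brevity (the helper name quarter_images is hypothetical):

import cv2

def quarter_images(img):
    # Cut the image into four equal quadrants (left-eye and right-eye regions
    # on top, the two mouth halves below) and scale each back to the original size.
    h, w = img.shape[:2]
    quads = [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
             img[h // 2:, :w // 2], img[h // 2:, w // 2:]]
    return [cv2.resize(q, (w, h)) for q in quads]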
In total, the facial expression library undergoes 8 kinds of processing that augment its data, so that together with the original image 9 different ROI variants are formed per image (1 original + 4 equally divided + 2 occluded + 1 mirrored + 1 focused). Adding the original images, a new batch of data sets is reconstructed; the data set constructed by this ROI scheme is 9 times the size of the original, which greatly enriches sample diversity. The expansion is effective because the different ROI regions are mutually connected and complement one another, which strengthens the reliability of the predicted target. Most importantly, the data set produced by this method resolves the detailed features of an expression more finely and accurately than the original data, so the model can focus more on the emotional expression conveyed by subtle changes in the face.
Further, the occluding the upper half and the lower half of the initial image respectively to generate two occlusion images includes:
occluding the upper half and the lower half of the initial image respectively with an image processing tool to generate an upper-half occlusion image and a lower-half occlusion image.
In specific implementation, the original image is further subjected to occlusion processing: the upper half and the lower half of the original image are occluded respectively using the OpenCV image processing tool. The resulting images are the same size as the original, the corresponding annotation category stays consistent with the original, and the coordinates of the annotated feature points need not be recalculated. This yields 2 new data items, recorded as the upper-half occlusion image and the lower-half occlusion image.
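A minimal sketch of this occlusion step; the text names OpenCV but not the fill value, so blacking out each half with zeros is an assumption (occlude_halves is a hypothetical name):

def occlude_halves(img):
    # img is a NumPy array as returned by cv2.imread; the image size and
    # annotations are unchanged, so no feature points need recalculating.
    h = img.shape[0]
    top = img.copy()
    top[:h // 2] = 0      # upper-half occlusion image
    bottom = img.copy()
    bottom[h // 2:] = 0   # lower-half occlusion image
    return top, bottom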
Further, the mirroring the initial image to generate a mirror image includes:
mirroring the initial image about an axis, and recalculating the labeled feature points according to the mirroring principle to generate a mirror image.
In specific implementation, the original image is mirrored about an axis. The mirrored image is the same size as the original, the corresponding annotation category stays consistent with the original, and the annotated feature points are recalculated by the mirroring principle. This yields 1 new data item, recorded as the mirror image.
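A minimal sketch of the mirroring step, remapping each labeled point across the vertical axis (mirror_with_landmarks is a hypothetical name):

import cv2

def mirror_with_landmarks(img, points):
    # Flip horizontally; by the mirroring principle each labeled point
    # (x, y) maps to (width - 1 - x, y).
    w = img.shape[1]
    flipped = cv2.flip(img, 1)  # flipCode=1 flips around the vertical axis
    return flipped, [(w - 1 - x, y) for (x, y) in points]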
Further, the center-focusing and scaling the initial image to generate a focused image includes:
center-focusing the initial image and scaling the focused image so that the scaled image is the same size as the initial image, and recalculating the labeled feature points according to the scaling ratio to generate a focused image.
In specific implementation, the original image is center-focused and the focused image is scaled so that the new image is the same size as the original. The corresponding annotation category stays consistent with the original, and the annotated feature points are recalculated in proportion. This yields 1 new data item, recorded as the focused image.
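A minimal sketch of the center-focus step; the text does not state how large the focus window is, so crop_ratio is an assumed parameter (center_focus is a hypothetical name):

import cv2

def center_focus(img, points, crop_ratio=0.8):
    # Crop the central crop_ratio of the image, resize back to the original
    # size, and rescale the labeled points by the same proportions.
    h, w = img.shape[:2]
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    focused = cv2.resize(img[y0:y0 + ch, x0:x0 + cw], (w, h))
    new_points = [((x - x0) * w / cw, (y - y0) * h / ch) for (x, y) in points]
    return focused, new_points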
As the above method embodiments show, the invention provides a method for recognizing facial expressions in a live lesson. Based on the region-of-interest (ROI) idea, 8 kinds of processing are applied to the expression data set to form a new database that is generated specifically for training this model, and whose image characteristics are particularly conducive to improving the accuracy of the expression recognition model. The second key point of the invention is that the LeNet-5 neural network is improved with a cross-layer connection method so that low-level network features are also taken into account, which avoids vanishing or exploding gradients as the network deepens, improves the accuracy of facial expression recognition, and enhances the robustness of the trained model.
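Putting the pieces together, a hedged end-to-end sketch using the hypothetical helpers above; the training loop itself is ordinary supervised learning, shown here with an assumed Adam optimizer and cross-entropy loss.

def expand_sample(img, points):
    # Build the 9-image ROI set for one sample: the original plus the
    # 8 processed variants described above.
    variants = [img]
    variants += quarter_images(img)                         # 4 quadrant images
    variants += list(occlude_halves(img))                   # 2 occlusion images
    variants.append(mirror_with_landmarks(img, points)[0])  # 1 mirror image
    variants.append(center_focus(img, points)[0])           # 1 focused image
    return variants                                         # 9 images in total

# Training sketch (x: batches of 1x32x32 grayscale crops, y: labels 0..6):
# model = CrossLayerLeNet5()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# for x, y in loader:
#     opt.zero_grad()
#     F.cross_entropy(model(x), y).backward()
#     opt.step()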
It should be noted that the above steps do not necessarily follow a fixed order. As those skilled in the art will understand from the description of the embodiments of the present invention, in different embodiments the above steps may be executed in different orders, for example in parallel or interchangeably.
Another embodiment of the present invention provides an apparatus for recognizing facial expressions in a live lesson, as shown in fig. 3, the apparatus 1 including:
the ROI processing module 11 is configured to acquire an original training data set of an original facial expression, perform ROI processing on the original training data set, and generate a target training data set;
the model training module 12 is used for constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, training the initial facial expression recognition model according to a target training data set, and generating a target facial expression recognition model;
and the facial expression recognition module 13 is configured to acquire a facial image to be recognized, input the image to be recognized into the target facial expression recognition model, and generate a facial expression recognition result.
The specific implementation is shown in the method embodiment, and is not described herein again.
Another embodiment of the present invention provides an apparatus for recognizing facial expressions in a live lesson, as shown in fig. 4, the apparatus 10 including:
one or more processors 110 and a memory 120, where one processor 110 is illustrated in fig. 4, the processor 110 and the memory 120 may be connected by a bus or other means, and fig. 4 illustrates a connection by a bus as an example.
Processor 110 is operative to implement various control logic of apparatus 10, which may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single chip, an ARM (Acorn RISC machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. Also, the processor 110 may be any conventional processor, microprocessor, or state machine. Processor 110 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The memory 120, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions corresponding to a method for identifying facial expressions in a live lesson in embodiments of the present invention. The processor 110 executes various functional applications and data processing of the device 10, i.e. implements the method of identifying facial expressions in a live lesson in the above-described method embodiments, by running non-volatile software programs, instructions and units stored in the memory 120.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to the use of the device 10, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 120 optionally includes memory located remotely from processor 110, which may be connected to device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120, which when executed by the one or more processors 110, perform the method of identifying facial expressions in a live lesson in any of the method embodiments described above, e.g., performing the method steps S100-S300 in fig. 1 described above.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S300 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The disclosed memory components or memory of the operating environment described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of identifying facial expressions in a live class of the above-described method embodiment. For example, the method steps S100 to S300 in fig. 1 described above are performed.
The above-described embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a general hardware platform, and may also be implemented by hardware. Based on such understanding, the above technical solutions, in essence or in the part contributing to the related art, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which can be a personal computer, a server, a network device, or the like) to execute the methods of the various embodiments or some parts of the embodiments.
Conditional language such as "can," "might," or "may" is generally intended to convey that a particular embodiment can include (while other embodiments do not include) particular features, elements, and/or operations, unless specifically stated otherwise or understood otherwise within the context as used. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more embodiments, or that one or more embodiments must include logic for deciding, with or without user input or prompting, whether such features, elements, and/or operations are included or are to be performed in any particular embodiment.
What has been described in this specification and the accompanying drawings includes examples of methods and apparatuses that can provide recognition of facial expressions in a live class. It is, of course, not possible to describe every conceivable combination of components and/or methodologies when describing the various features of the disclosure, but it can be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications can be made to the disclosure without departing from its scope or spirit. In addition, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. It is intended that the examples set forth in this specification and the drawings be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (10)

1. A method of recognizing facial expressions in a live lesson, the method comprising:
acquiring an original training data set of original facial expressions, and performing ROI (region of interest) processing on the original training data set to generate a target training data set;
constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, and training the initial facial expression recognition model according to a target training data set to generate a target facial expression recognition model;
and acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result.
2. The method of claim 1, wherein before the acquiring an original training data set of original facial expressions and performing ROI processing on the original training data set to generate a target training data set, the method further comprises:
obtaining student images in a live lesson through a camera, carrying out face recognition on the student images, and generating facial images to be recognized.
3. The method of claim 2, wherein the performing face recognition on the student image to generate the facial image to be recognized comprises:
the student images are identified through a face identification algorithm, and facial images to be identified, including eyes, a nose and a mouth, are generated according to the identification result.
4. The method of claim 3, wherein the obtaining an original training data set of original facial expressions, performing ROI processing on the original training data set, and generating a target training data set comprises:
acquiring an initial image from the original training data set of facial expressions, dividing the initial image into four equal parts, and scaling the four parts so that each scaled image matches the size of the initial image, thereby generating four equally divided images;
occluding the upper half and the lower half of the initial image respectively to generate two occlusion images;
mirroring the initial image to generate a mirror image;
center-focusing and scaling the initial image to generate a focused image;
generating an extended training data set corresponding to the initial image from the initial image, the equally divided images, the occlusion images, the mirror image and the focused image;
after the ROI processing operation has been performed on all initial images of the original training data set, generating the extended training data sets corresponding to all initial images in the original training data set, and generating a target training data set from these extended training data sets.
5. The method of claim 4, wherein the occluding the upper half and the lower half of the initial image respectively to generate two occlusion images comprises:
occluding the upper half and the lower half of the initial image respectively with an image processing tool to generate an upper-half occlusion image and a lower-half occlusion image.
6. The method of claim 5, wherein the mirroring the initial image to generate a mirror image comprises:
mirroring the initial image about an axis, and recalculating the labeled feature points according to the mirroring principle to generate the mirror image.
7. The method of claim 6, wherein the center-focusing and scaling the initial image to generate a focused image comprises:
center-focusing the initial image and scaling the focused image so that the scaled image is the same size as the initial image, and recalculating the labeled feature points according to the scaling ratio to generate the focused image.
8. An apparatus for recognizing facial expressions in a live lesson, the apparatus comprising:
the ROI processing module is used for acquiring an original training data set of original facial expressions, and performing ROI processing on the original training data set to generate a target training data set;
the model training module is used for constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, training the initial facial expression recognition model according to a target training data set and generating a target facial expression recognition model;
and the facial expression recognition module is used for acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model and generating a facial expression recognition result.
9. An apparatus for recognizing facial expressions in a live lesson, the apparatus comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of identifying facial expressions in a live lesson of any one of claims 1-7.
10. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of identifying facial expressions in a live lesson of any one of claims 1-7.
CN202011065779.7A 2020-09-30 2020-09-30 Method, device and equipment for recognizing facial expressions in live lessons Pending CN112380898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011065779.7A CN112380898A (en) 2020-09-30 2020-09-30 Method, device and equipment for recognizing facial expressions in live lessons

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011065779.7A CN112380898A (en) 2020-09-30 2020-09-30 Method, device and equipment for recognizing facial expressions in live lessons

Publications (1)

Publication Number Publication Date
CN112380898A true CN112380898A (en) 2021-02-19

Family

ID=74581001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011065779.7A Pending CN112380898A (en) 2020-09-30 2020-09-30 Method, device and equipment for recognizing facial expressions in live lessons

Country Status (1)

Country Link
CN (1) CN112380898A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330420A (en) * 2017-07-14 2017-11-07 河北工业大学 The facial expression recognizing method of rotation information is carried based on deep learning
CN107491726A (en) * 2017-07-04 2017-12-19 重庆邮电大学 A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks
EP3324333A2 (en) * 2016-11-21 2018-05-23 Samsung Electronics Co., Ltd. Method and apparatus to perform facial expression recognition and training

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3324333A2 (en) * 2016-11-21 2018-05-23 Samsung Electronics Co., Ltd. Method and apparatus to perform facial expression recognition and training
CN107491726A (en) * 2017-07-04 2017-12-19 重庆邮电大学 A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks
CN107330420A (en) * 2017-07-14 2017-11-07 河北工业大学 The facial expression recognizing method of rotation information is carried based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
孙晓 et al., "Facial expression recognition based on an ROI-KNN convolutional neural network", Acta Automatica Sinica (自动化学报), vol. 42, no. 06, 15 June 2016 (2016-06-15), page 2 *
李勇 et al., "Facial expression recognition based on a cross-connected LeNet-5 network", Acta Automatica Sinica (自动化学报), vol. 44, no. 01, 15 January 2018 (2018-01-15), page 2 *
郭昕刚 et al., "A facial expression recognition algorithm using connected convolutional neural networks", Journal of Changchun University of Technology (长春工业大学学报), vol. 41, no. 04, 15 August 2020 (2020-08-15), pages 2-5 *

Similar Documents

Publication Publication Date Title
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
US11348249B2 (en) Training method for image semantic segmentation model and server
US11908244B2 (en) Human posture detection utilizing posture reference maps
WO2020238560A1 (en) Video target tracking method and apparatus, computer device and storage medium
KR102591961B1 (en) Model training method and device, and terminal and storage medium for the same
WO2021017261A1 (en) Recognition model training method and apparatus, image recognition method and apparatus, and device and medium
Wen et al. End-to-end detection-segmentation system for face labeling
US20230049533A1 (en) Image gaze correction method, apparatus, electronic device, computer-readable storage medium, and computer program product
CN109960742B (en) Local information searching method and device
WO2020215573A1 (en) Captcha identification method and apparatus, and computer device and storage medium
CN109508638A (en) Face Emotion identification method, apparatus, computer equipment and storage medium
CN106326853B (en) Face tracking method and device
WO2021068325A1 (en) Facial action recognition model training method, facial action recognition method and apparatus, computer device, and storage medium
CN107886474A (en) Image processing method, device and server
CN112149651B (en) Facial expression recognition method, device and equipment based on deep learning
CN112633423B (en) Training method of text recognition model, text recognition method, device and equipment
US20220215558A1 (en) Method and apparatus for three-dimensional edge detection, storage medium, and computer device
CN113469092B (en) Character recognition model generation method, device, computer equipment and storage medium
WO2021169642A1 (en) Video-based eyeball turning determination method and system
CN112836653A (en) Face privacy method, device and apparatus and computer storage medium
CN109711356A (en) A kind of expression recognition method and system
CN111275051A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN112115860A (en) Face key point positioning method and device, computer equipment and storage medium
CN113269013A (en) Object behavior analysis method, information display method and electronic equipment
WO2021073150A1 (en) Data detection method and apparatus, and computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination