CN112380898A - Method, device and equipment for recognizing facial expressions in live lessons - Google Patents
- Publication number
- CN112380898A (Application CN202011065779.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- data set
- training data
- initial
- facial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
Abstract
The invention discloses a method, an apparatus, and a device for recognizing facial expressions in a live lesson, wherein the method comprises the following steps: acquiring an original training data set of original facial expressions, and performing ROI (region of interest) processing on the original training data set to generate a target training data set; constructing an initial facial expression recognition model based on a cross-layer-connected convolutional neural network, and training the initial facial expression recognition model on the target training data set to generate a target facial expression recognition model; and acquiring a facial image to be recognized, inputting it into the target facial expression recognition model, and generating a facial expression recognition result. By adopting ROI processing during image processing, the embodiment of the invention expands the training data set, improves the accuracy of facial expression recognition, and enhances the robustness of the trained model.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a device for recognizing facial expressions in a live lesson.
Background
With the rise of the artificial intelligence industry, facial expression recognition technology based on deep learning has attracted growing attention. In a live webcast class in particular, the current attentiveness of a student can be inferred by analyzing the student's facial expressions in the live video, which helps teachers manage the class and teach. Current facial expression recognition methods have two main problems. First, although many and varied facial expression data sets exist, most of their expressions are shot by a camera from a single angle and the number of expression images is small, so the trained model carries a degree of uncertainty, generalizes weakly to unseen data, and has low robustness. Second, the traditional LeNet-5 convolutional neural network was designed for handwritten digit recognition and does not take low-level detail features into account during feature extraction, so vanishing or exploding gradients easily arise as the network grows deeper. As a result, facial expression recognition models for live webcast classes in the prior art suffer from poor robustness and low recognition accuracy.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
In view of the foregoing deficiencies of the prior art, an object of the present invention is to provide a method, an apparatus and a device for recognizing facial expressions in a live broadcast class, which aim to solve the technical problems of poor robustness and low recognition accuracy of a facial expression recognition model in a live broadcast class in the prior art.
The technical scheme of the invention is as follows:
a method of identifying facial expressions in a live lesson, the method comprising:
acquiring an original training data set of original facial expressions, and performing ROI (region of interest) processing on the original training data set to generate a target training data set;
constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, and training the initial facial expression recognition model according to a target training data set to generate a target facial expression recognition model;
and acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result.
Further, before the acquiring an original training data set of original facial expressions and performing ROI processing on the original training data set to generate a target training data set, the method further comprises:
the method comprises the steps of obtaining student images in a live broadcast course through a camera, carrying out face recognition on the student images, and generating facial images to be recognized.
Further preferably, the performing face recognition on the student image to generate a face image to be recognized includes:
the student images are identified through a face identification algorithm, and facial images to be identified, including eyes, a nose and a mouth, are generated according to the identification result.
Further preferably, the acquiring an original training data set of an original facial expression, performing ROI processing on the original training data set, and generating a target training data set includes:
acquiring an initial image from the original training data set of original facial expressions, dividing the initial image into four equal parts, and scaling each part so that its size matches that of the initial image, thereby generating four equally divided images;
performing upper-half occlusion and lower-half occlusion on the initial image, respectively, to generate two occluded images;
mirroring the initial image to generate a mirror image;
center-focusing and scaling the initial image to generate a focused image;
generating an extended training data set corresponding to the initial image from the initial image, the four equally divided images, the two occluded images, the mirror image, and the focused image;
after the ROI processing operations have been performed on all initial images of the original training data set, generating an extended training data set corresponding to each initial image in the original training data set, and generating a target training data set from the extended training data sets.
Preferably, the performing upper-half occlusion and lower-half occlusion on the initial image, respectively, to generate two occluded images includes:
occluding the upper half and the lower half of the initial image, respectively, with an image processing tool, to generate an upper-half occluded image and a lower-half occluded image.
Further, the mirroring the initial image to generate a mirror image includes:
mirroring the initial image about a coordinate axis, and recalculating the annotated feature points according to the mirroring principle, to generate a mirror image.
Further, the center-focusing and scaling the initial image to generate a focused image includes:
center-focusing the initial image and scaling the focused image so that the scaled image is the same size as the initial image, and recalculating the annotated feature points according to the scaling ratio, to generate a focused image.
Another embodiment of the present invention provides an apparatus for recognizing a facial expression in a live lesson, the apparatus comprising:
the ROI processing module is used for acquiring an original training data set of original facial expressions, and performing ROI processing on the original training data set to generate a target training data set;
the model training module is used for constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, training the initial facial expression recognition model according to a target training data set and generating a target facial expression recognition model;
and the facial expression recognition module is used for acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model and generating a facial expression recognition result.
Another embodiment of the present invention provides a device for identifying facial expressions in a live lesson, the device comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described method of identifying facial expressions in a live lesson.
Yet another embodiment of the present invention provides a non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above-described method of identifying facial expressions in a live class.
Beneficial effects: by adopting ROI processing during image processing, the embodiment of the invention expands the training data set, improves the accuracy of facial expression recognition, and enhances the robustness of the trained model.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a preferred embodiment of a method for identifying facial expressions in a live lesson according to the present invention;
FIG. 2 is a schematic structural diagram of a cross-layer connected convolutional neural network according to a preferred embodiment of the present invention;
FIG. 3 is a functional block diagram of an apparatus for recognizing facial expressions in a live lesson according to an embodiment of the present invention;
fig. 4 is a diagram illustrating a hardware configuration of an apparatus for recognizing facial expressions in a live lesson according to a preferred embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and effects of the present invention clearer and more explicit, the present invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. Embodiments of the present invention are described below with reference to the accompanying drawings.
The embodiment of the invention provides a method for identifying facial expressions in a live lesson. Referring to fig. 1, fig. 1 is a flowchart illustrating a method for recognizing facial expressions in a live lesson according to a preferred embodiment of the present invention. As shown in fig. 1, it includes the steps of:
s100, acquiring an original training data set of original facial expressions, and performing ROI (region of interest) processing on the original training data set to generate a target training data set;
s200, constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, and training the initial facial expression recognition model according to a target training data set to generate a target facial expression recognition model;
and step S300, acquiring a facial image to be recognized, inputting the image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result.
In a specific implementation, the embodiment of the invention addresses the problems of facial expression recognition in live video lessons in the prior art. An expression data set is processed based on the region-of-interest (ROI) idea to generate a target training data set; the LeNet-5 neural network is then improved with a cross-layer connection method so that low-level network features are also taken into account, an initial facial expression recognition model is constructed, and this initial model is trained on the target training data set to generate a target facial expression recognition model. A facial image to be recognized is acquired and input into the target facial expression recognition model to obtain the facial expression recognition result. This algorithm not only improves the accuracy of facial expression recognition but also enhances the robustness of the trained model.
Fig. 2 is a schematic structural diagram of the cross-layer-connected convolutional neural network. As shown in fig. 2, the network comprises 1 input layer, 3 convolutional layers, 2 pooling layers, 1 fully connected layer, and 1 output layer. The Input layer receives expression images of 32 × 32 pixels. Layer1 is a convolutional layer with 6 feature maps: the 32 × 32 input is convolved with 6 convolution kernels of 5 × 5 pixels to obtain 28 × 28 feature maps. Layer2 is a pooling layer: the 28 × 28 feature maps are pooled into 14 × 14 feature maps. Layer3 is a convolutional layer with 16 feature maps: the 14 × 14 maps from the previous layer are convolved with 16 kernels of 5 × 5 pixels to obtain 10 × 10 feature maps. Layer4 is a pooling layer: the 10 × 10 feature maps are pooled into 5 × 5 feature maps. Layer5 is a convolutional layer with 120 feature maps: the 5 × 5 maps from the previous layer are convolved with 120 kernels of 5 × 5 pixels to obtain 1 × 1 feature maps. Layer6 is a fully connected layer with 84 units. The Output layer outputs 7 expression types: happy, sad, surprised, afraid, angry, disgusted, and neutral.
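As a quick sanity check (not from the patent text), the feature-map sizes stated for the Fig. 2 network follow from the standard valid-convolution and 2 × 2 non-overlapping pooling size formulas. The cross-layer connections themselves are not detailed in the description and are omitted here:

```python
def conv_out(size, kernel=5):
    """Output size of a valid (no-padding, stride-1) convolution."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Output size of non-overlapping pooling with the given window."""
    return size // window

s = 32                       # Input: 32 x 32 expression image
s = conv_out(s); l1 = s      # Layer1: 6 kernels of 5 x 5    -> 28 x 28
s = pool_out(s); l2 = s      # Layer2: pooling               -> 14 x 14
s = conv_out(s); l3 = s      # Layer3: 16 kernels of 5 x 5   -> 10 x 10
s = pool_out(s); l4 = s      # Layer4: pooling               -> 5 x 5
s = conv_out(s); l5 = s      # Layer5: 120 kernels of 5 x 5  -> 1 x 1
print(l1, l2, l3, l4, l5)    # 28 14 10 5 1
```

The 1 × 1 × 120 output of Layer5 then feeds the 84-unit fully connected Layer6 and the 7-way output layer.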
Further, before obtaining an original training data set of an original facial expression and performing ROI processing on the original training data set to generate a target training data set, the method includes:
the method comprises the steps of obtaining student images in a live broadcast course through a camera, carrying out face recognition on the student images, and generating facial images to be recognized.
In a specific implementation, before ROI processing is carried out, the face must first be detected by face recognition and cropped so that it fills the image area as much as possible, in order to reduce error.
Further, the face recognition is carried out on the student image, and a face image to be recognized is generated, wherein the face recognition comprises the following steps:
the student images are identified through a face identification algorithm, and facial images to be identified, including eyes, a nose and a mouth, are generated according to the identification result.
In particular implementation, the key point of the ROI setting scheme is to recognize a facial expression by detecting changes in eyes, nose, and mouth. Therefore, the image after face recognition must be a face image including eyes, nose, and mouth.
Further, acquiring an original training data set of an original facial expression, performing ROI processing on the original training data set, and generating a target training data set, including:
acquiring an initial image from the original training data set of original facial expressions, dividing the initial image into four equal parts, and scaling each part so that its size matches that of the initial image, thereby generating four equally divided images;
performing upper-half occlusion and lower-half occlusion on the initial image, respectively, to generate two occluded images;
mirroring the initial image to generate a mirror image;
center-focusing and scaling the initial image to generate a focused image;
generating an extended training data set corresponding to the initial image from the initial image, the four equally divided images, the two occluded images, the mirror image, and the focused image;
after the ROI processing operations have been performed on all initial images of the original training data set, generating an extended training data set corresponding to each initial image in the original training data set, and generating a target training data set from the extended training data sets.
In a specific implementation, ROI processing is first applied to the training images: each training image is processed into several specific images, which together form a new database. The new database is then fed into the improved cross-layer-connected convolutional neural network module and trained to obtain the facial expression recognition model.
The image is first cut: the original image is divided equally into four quadrants, which contain the complete left-eye region, the complete right-eye region, and the two halves of the mouth region, respectively. Each cropped image is then scaled so that the new image is the same size as the original; its annotation category remains that of the original image, and the annotated feature points are recalculated according to the scaling ratio. This yields 4 new data items, recorded as the four equally divided images.
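The quartering step can be sketched as follows. The function name, the nearest-neighbour rescaling, the (row, col) point format, and the assumption of even image dimensions are all illustrative choices, not taken from the patent:

```python
import numpy as np

def quarter_images(image, points):
    """Split an image into four equal quadrants and scale each quadrant
    back to the original size, rescaling annotated feature points by the
    same factor. `points` maps a point name to (row, col) coordinates.
    Assumes even height and width; uses nearest-neighbour upscaling."""
    h, w = image.shape[:2]
    quadrants = [(0, h // 2, 0, w // 2), (0, h // 2, w // 2, w),
                 (h // 2, h, 0, w // 2), (h // 2, h, w // 2, w)]
    results = []
    for r0, r1, c0, c1 in quadrants:
        crop = image[r0:r1, c0:c1]
        # Nearest-neighbour upscale back to (h, w): each quadrant pixel
        # is repeated twice along both axes.
        scaled = crop.repeat(2, axis=0).repeat(2, axis=1)[:h, :w]
        # A feature point (r, c) inside this quadrant maps to
        # ((r - r0) * 2, (c - c0) * 2) in the rescaled image.
        new_points = {k: ((r - r0) * 2, (c - c0) * 2)
                      for k, (r, c) in points.items()
                      if r0 <= r < r1 and c0 <= c < c1}
        results.append((scaled, new_points))
    return results
```

Each returned image has the same size as the original, and only the feature points that fall inside a given quadrant are kept for it, matching the per-quadrant label recalculation described above.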
In total, 8 kinds of processing are applied to the facial expression library, which, together with the original image, yields 9 different ROI regions per image. The reconstructed data set is therefore 9 times the size of the original, which greatly enriches the diversity of the samples. The expansion is effective because the different ROI regions are interrelated and complement one another, which strengthens the reliability of the predicted target. Most importantly, the data set produced by this method captures the detailed characteristics of an expression more finely and accurately than the original data, so the model can focus on the emotional expression conveyed by even tiny changes in the face.
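The 9x expansion factor follows directly from the variant counts listed in the steps above (the data-set size used below is a hypothetical example, not from the patent):

```python
# Variants produced per training image by the ROI scheme described above.
variants_per_image = {
    "original": 1,
    "quadrants": 4,        # four equally divided, rescaled images
    "occlusions": 2,       # upper-half and lower-half occluded images
    "mirror": 1,
    "center_focused": 1,
}
expansion_factor = sum(variants_per_image.values())
print(expansion_factor)                   # 9

original_size = 1000                      # hypothetical original data-set size
print(original_size * expansion_factor)   # 9000 images after expansion
```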
Further, the performing upper-half occlusion and lower-half occlusion on the initial image, respectively, to generate two occluded images includes:
occluding the upper half and the lower half of the initial image, respectively, with an image processing tool, to generate an upper-half occluded image and a lower-half occluded image.
In a specific implementation, the original image is then occluded: the upper half and the lower half are occluded in turn using the OpenCV image processing tool, so that each resulting image is the same size as the original, its annotation category remains that of the original, and the coordinates of the annotated feature points need not be recalculated. This yields 2 new data items, recorded as the upper-half occluded image and the lower-half occluded image.
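The occlusion step can be sketched as follows. The patent uses an OpenCV tool; plain NumPy indexing is used here as an equivalent, and the constant fill value is an assumption:

```python
import numpy as np

def occlude_halves(image, fill=0):
    """Return (upper-occluded, lower-occluded) copies of `image`.
    The occluded half is filled with a constant, so the image size and
    the annotated feature-point coordinates are unchanged and labels
    need no recalculation."""
    h = image.shape[0]
    upper = image.copy()
    upper[: h // 2] = fill      # hide the upper half (eye region)
    lower = image.copy()
    lower[h // 2 :] = fill      # hide the lower half (mouth region)
    return upper, lower
```

Because only pixel values change, these two variants are the cheapest to produce: unlike the quartered, mirrored, and focused images, no coordinate remapping is needed.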
Further, the mirroring the initial image to generate a mirror image includes:
mirroring the initial image about a coordinate axis, and recalculating the annotated feature points according to the mirroring principle, to generate a mirror image.
In a specific implementation, the original image is mirrored about a coordinate axis; the mirrored image is the same size as the original, its annotation category remains that of the original, and the annotated feature points are recalculated using the mirroring principle. This yields 1 new data item, called the mirror image.
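The mirroring step can be sketched as below. The patent does not specify the axis; a horizontal flip about the vertical axis (the usual choice for face images) is assumed here, under which a feature point at column c maps to column w - 1 - c:

```python
import numpy as np

def mirror_image(image, points):
    """Mirror `image` about its vertical axis (horizontal flip) and
    recompute annotated feature points: (row, col) -> (row, w - 1 - col).
    The image size and annotation category are unchanged."""
    w = image.shape[1]
    mirrored = image[:, ::-1]
    new_points = {k: (r, w - 1 - c) for k, (r, c) in points.items()}
    return mirrored, new_points
```

Note that for labels that distinguish left from right (e.g. left eye vs. right eye), the point names would also need to be swapped after mirroring; the sketch leaves names untouched.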
Further, the center-focusing and scaling the initial image to generate a focused image includes:
center-focusing the initial image and scaling the focused image so that the scaled image is the same size as the initial image, and recalculating the annotated feature points according to the scaling ratio, to generate a focused image.
In a specific implementation, the original image is center-focused and the focused image is scaled so that the new image is the same size as the original; its annotation category remains that of the original, and the annotated feature points are recalculated according to the scaling ratio. This yields 1 new data item, recorded as the focused image.
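The center-focusing step can be sketched as a centred crop followed by an upscale back to the original size. The crop fraction is a hypothetical choice (the patent does not give it), and nearest-neighbour upscaling with even dimensions is assumed:

```python
import numpy as np

def center_focus(image, points, crop_frac=0.5):
    """Crop a centred region (crop_frac of each dimension), scale it
    back to the original size (nearest-neighbour), and rescale the
    annotated feature points by the same factor. Points outside the
    crop are dropped."""
    h, w = image.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    r0, c0 = (h - ch) // 2, (w - cw) // 2
    crop = image[r0 : r0 + ch, c0 : c0 + cw]
    fy, fx = h // ch, w // cw  # integer scale factors (2 for crop_frac=0.5)
    scaled = crop.repeat(fy, axis=0).repeat(fx, axis=1)[:h, :w]
    # A point (r, c) inside the crop maps to ((r - r0) * fy, (c - c0) * fx).
    new_points = {k: ((r - r0) * fy, (c - c0) * fx)
                  for k, (r, c) in points.items()
                  if r0 <= r < r0 + ch and c0 <= c < c0 + cw}
    return scaled, new_points
```

With crop_frac=0.5 this magnifies the central face region by a factor of 2, emphasising eyes, nose, and mouth, in line with the ROI idea described above.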
As the above method embodiment shows, the invention provides a method for recognizing facial expressions in a live lesson. Based on the region-of-interest (ROI) idea, 8 kinds of processing are applied to the expression data set to form a new database; this database is generated specifically for training the model, and its image characteristics are particularly beneficial to improving the accuracy of the expression recognition model. The second key point of the invention is the use of a cross-layer connection method to improve the LeNet-5 neural network: low-level network features are also taken into account, vanishing or exploding gradients are avoided as the network deepens, the accuracy of facial expression recognition is improved, and the robustness of the trained model is enhanced.
It should be noted that the above steps do not necessarily follow a fixed order; as those skilled in the art will understand from the description of the embodiments, in different embodiments the steps may be executed in different orders, in parallel, or interchangeably.
Another embodiment of the present invention provides an apparatus for recognizing facial expressions in a live lesson, as shown in fig. 3, the apparatus 1 including:
the ROI processing module 11 is configured to acquire an original training data set of an original facial expression, perform ROI processing on the original training data set, and generate a target training data set;
the model training module 12 is used for constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, training the initial facial expression recognition model according to a target training data set, and generating a target facial expression recognition model;
and the facial expression recognition module 13 is configured to acquire a facial image to be recognized, input the image to be recognized into the target facial expression recognition model, and generate a facial expression recognition result.
The specific implementation is shown in the method embodiment, and is not described herein again.
Another embodiment of the present invention provides an apparatus for recognizing facial expressions in a live lesson, as shown in fig. 4, the apparatus 10 including:
one or more processors 110 and a memory 120, where one processor 110 is illustrated in fig. 4, the processor 110 and the memory 120 may be connected by a bus or other means, and fig. 4 illustrates a connection by a bus as an example.
The memory 120, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions corresponding to a method for identifying facial expressions in a live lesson in embodiments of the present invention. The processor 110 executes various functional applications and data processing of the device 10, i.e. implements the method of identifying facial expressions in a live lesson in the above-described method embodiments, by running non-volatile software programs, instructions and units stored in the memory 120.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an application program required for operating the device, at least one function; the storage data area may store data created according to the use of the device 10, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 120 optionally includes memory located remotely from processor 110, which may be connected to device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120, which when executed by the one or more processors 110, perform the method of identifying facial expressions in a live lesson in any of the method embodiments described above, e.g., performing the method steps S100-S300 in fig. 1 described above.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S300 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The disclosed memory components or memory of the operating environment described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of identifying facial expressions in a live class of the above-described method embodiment. For example, the method steps S100 to S300 in fig. 1 described above are performed.
The above-described embodiments are merely illustrative. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a general hardware platform, and may also be implemented by hardware. Based on such understanding, the above technical solutions, in essence or in the part contributing to the related art, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disk, and which includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts thereof.
Conditional language such as "can," "might," or "may" is generally intended to convey that a particular embodiment can include certain features, elements, and/or operations while other embodiments do not, unless specifically stated otherwise or otherwise understood within the context as used. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more embodiments, or that one or more embodiments must include logic for deciding, with or without input or prompting, whether such features, elements, and/or operations are included or are to be performed in any particular embodiment.
What has been described in this specification and the accompanying drawings includes examples of methods and apparatuses that can provide recognition of facial expressions in a live class. It is, of course, not possible to describe every conceivable combination of components and/or methodologies for purposes of describing the various features of the disclosure, but it can be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications can be made to the disclosure without departing from its scope or spirit. In addition, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. The examples set forth in this specification and the drawings are to be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (10)
1. A method of recognizing facial expressions in a live lesson, the method comprising:
acquiring an original training data set of original facial expressions, and performing ROI (region of interest) processing on the original training data set to generate a target training data set;
constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, and training the initial facial expression recognition model according to a target training data set to generate a target facial expression recognition model;
and acquiring a facial image to be recognized, inputting the facial image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result.
2. The method of claim 1, wherein the acquiring a facial image to be recognized comprises:
acquiring student images in a live lesson through a camera, and performing face recognition on the student images to generate the facial image to be recognized.
3. The method of claim 2, wherein the performing face recognition on the student image to generate the facial image to be recognized comprises:
the student images are identified through a face identification algorithm, and facial images to be identified, including eyes, a nose and a mouth, are generated according to the identification result.
4. The method of claim 3, wherein the acquiring an original training data set of original facial expressions and performing ROI processing on the original training data set to generate a target training data set comprises:
acquiring an initial image in the original training data set of original facial expressions, dividing the initial image into four equal parts, and scaling each of the four parts to the size of the initial image to generate four equally divided images;
performing upper-half occlusion and lower-half occlusion processing on the initial image, respectively, to generate two occlusion images;
performing mirror processing on the initial image to generate a mirror image;
performing center focusing and scaling on the initial image to generate a focused image;
generating an extended training data set corresponding to the initial image according to the initial image, the equally divided images, the occlusion images, the mirror image and the focused image; and
after the ROI processing operation is performed on all initial images in the original training data set, generating the extended training data sets corresponding to all the initial images, and generating the target training data set according to the extended training data sets.
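For illustration only (this sketch is not part of the claims), the extension step of claim 4 can be written in a few lines of NumPy. The function names `augment` and `resize_nn` are hypothetical, nearest-neighbour resizing stands in for whatever scaling the claimed "image processing tool" uses, and zeroed pixels stand in for the occlusion mask:

```python
import numpy as np

def resize_nn(img, h, w):
    """Nearest-neighbour resize of a 2-D array to (h, w)."""
    ys = np.arange(h) * img.shape[0] // h
    xs = np.arange(w) * img.shape[1] // w
    return img[np.ix_(ys, xs)]

def augment(img):
    """Extended set for one initial image per claim 4:
    initial + 4 resized quarters + 2 half-occlusions + mirror + focused."""
    h, w = img.shape
    hy, hx = h // 2, w // 2
    out = [img]
    # four equal quarters, each scaled back to the initial size
    for q in (img[:hy, :hx], img[:hy, hx:], img[hy:, :hx], img[hy:, hx:]):
        out.append(resize_nn(q, h, w))
    # upper-half and lower-half occlusion (zero fill is an assumption)
    top = img.copy(); top[:hy] = 0
    bot = img.copy(); bot[hy:] = 0
    out += [top, bot]
    # horizontal mirror
    out.append(img[:, ::-1].copy())
    # centre "focus": crop the middle half of each dimension, scale back up
    c = img[h // 4:h - h // 4, w // 4:w - w // 4]
    out.append(resize_nn(c, h, w))
    return out

example = np.arange(48 * 48, dtype=float).reshape(48, 48)
extended = augment(example)
print(len(extended))  # 9 images per initial image
```

Run over every initial image, this yields the extended training data set of the final step of the claim; a 9x expansion of a labelled expression data set is the kind of ROI-based augmentation the cited Sun et al. literature also describes.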
5. The method of claim 4, wherein the generating two occlusion images after respectively performing top half occlusion and bottom half occlusion on the initial image comprises:
performing an occlusion operation on the upper half and the lower half of the initial image, respectively, using an image processing tool, to generate an upper-half occlusion image and a lower-half occlusion image.
6. The method of claim 5, wherein mirroring the initial image to generate a mirror image comprises:
performing mirror processing on the initial image about a coordinate axis, and recalculating the labeled feature points according to the mirroring principle to generate the mirror image.
7. The method of claim 6, wherein the step of center focusing and zooming the initial image to generate a focused image comprises:
performing center focusing on the initial image, scaling the focused initial image to the same size as the initial image, and recalculating the labeled feature points according to the scaling to generate the focused image.
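Again for illustration only, the feature-point recalculation of claims 6 and 7 reduces to simple coordinate maps. The function names are made up, and the crop geometry (a quarter-margin centre crop followed by a 2x scale back to the original size) is an assumption consistent with the augmentation described above, not a detail taken from the patent:

```python
def mirror_landmarks(points, width):
    """Reflect labelled (x, y) feature points about the vertical centre axis,
    matching a horizontal mirror of a width-pixel-wide image (claim 6)."""
    return [(width - 1 - x, y) for (x, y) in points]

def focus_landmarks(points, height, width):
    """Map feature points through a centre crop that removes a quarter margin
    on each side, then a 2x resize back to (height, width) (claim 7)."""
    return [((x - width // 4) * 2, (y - height // 4) * 2) for (x, y) in points]

print(mirror_landmarks([(10, 20)], 48))     # [(37, 20)]
print(focus_landmarks([(20, 20)], 48, 48))  # [(16, 16)]
```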
8. An apparatus for recognizing facial expressions in a live lesson, the apparatus comprising:
the ROI processing module is used for acquiring an original training data set of original facial expressions, and performing ROI processing on the original training data set to generate a target training data set;
the model training module is used for constructing an initial facial expression recognition model based on a cross-layer connection convolutional neural network, training the initial facial expression recognition model according to a target training data set and generating a target facial expression recognition model;
and the facial expression recognition module is used for acquiring a facial image to be recognized, inputting the facial image to be recognized into the target facial expression recognition model, and generating a facial expression recognition result.
9. An apparatus for recognizing facial expressions in a live lesson, the apparatus comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of recognizing facial expressions in a live lesson of any one of claims 1-7.
10. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of recognizing facial expressions in a live lesson of any one of claims 1-7.
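The "cross-layer connection convolutional neural network" named in claims 1 and 8 is not elaborated in this text; the cited non-patent literature (Li et al., cross-connected LeNet-5) suggests feeding pooled features from an earlier layer directly to the classifier alongside the deeper features, which mitigates gradient vanishing. A minimal NumPy sketch of that idea, with made-up kernel sizes and random weights, might look like:

```python
import numpy as np

def conv2d(x, k):
    """'Valid' 2-D cross-correlation of a single-channel image with kernel k."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def pool2(x):
    """2x2 average pooling (assumes even spatial dimensions)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
face = rng.random((32, 32))  # stand-in for a cropped face region

f1 = pool2(np.maximum(conv2d(face, rng.standard_normal((5, 5))), 0))  # 14x14
f2 = pool2(np.maximum(conv2d(f1, rng.standard_normal((3, 3))), 0))    # 6x6

# cross-layer connection: the classifier sees BOTH the shallow and deep maps
features = np.concatenate([f1.ravel(), f2.ravel()])
print(features.shape)  # (232,)
```

In a trained model the concatenated feature vector would feed a fully connected softmax layer over the expression classes; a framework implementation would realize the same skip connection with a concatenation layer.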
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011065779.7A CN112380898A (en) | 2020-09-30 | 2020-09-30 | Method, device and equipment for recognizing facial expressions in live lessons |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112380898A true CN112380898A (en) | 2021-02-19 |
Family
ID=74581001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011065779.7A Pending CN112380898A (en) | 2020-09-30 | 2020-09-30 | Method, device and equipment for recognizing facial expressions in live lessons |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112380898A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107330420A (en) * | 2017-07-14 | 2017-11-07 | 河北工业大学 | The facial expression recognizing method of rotation information is carried based on deep learning |
CN107491726A (en) * | 2017-07-04 | 2017-12-19 | 重庆邮电大学 | A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks |
EP3324333A2 (en) * | 2016-11-21 | 2018-05-23 | Samsung Electronics Co., Ltd. | Method and apparatus to perform facial expression recognition and training |
Non-Patent Citations (3)
Title |
---|
SUN Xiao et al.: "Facial Expression Recognition Based on ROI-KNN Convolutional Neural Network", Acta Automatica Sinica, vol. 42, no. 06, 15 June 2016 (2016-06-15), pages 2 *
LI Yong et al.: "Facial Expression Recognition Based on a Cross-Connected LeNet-5 Network", Acta Automatica Sinica, vol. 44, no. 01, 15 January 2018 (2018-01-15), pages 2 *
GUO Xingang et al.: "Facial Expression Recognition Algorithm Based on a Connected Convolutional Neural Network", Journal of Changchun University of Technology, vol. 41, no. 04, 15 August 2020 (2020-08-15), pages 2-5 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11842487B2 (en) | Detection model training method and apparatus, computer device and storage medium | |
US11348249B2 (en) | Training method for image semantic segmentation model and server | |
US11908244B2 (en) | Human posture detection utilizing posture reference maps | |
WO2020238560A1 (en) | Video target tracking method and apparatus, computer device and storage medium | |
KR102591961B1 (en) | Model training method and device, and terminal and storage medium for the same | |
WO2021017261A1 (en) | Recognition model training method and apparatus, image recognition method and apparatus, and device and medium | |
Wen et al. | End-to-end detection-segmentation system for face labeling | |
US20230049533A1 (en) | Image gaze correction method, apparatus, electronic device, computer-readable storage medium, and computer program product | |
CN109960742B (en) | Local information searching method and device | |
WO2020215573A1 (en) | Captcha identification method and apparatus, and computer device and storage medium | |
CN109508638A (en) | Face Emotion identification method, apparatus, computer equipment and storage medium | |
CN106326853B (en) | Face tracking method and device | |
WO2021068325A1 (en) | Facial action recognition model training method, facial action recognition method and apparatus, computer device, and storage medium | |
CN107886474A (en) | Image processing method, device and server | |
CN112149651B (en) | Facial expression recognition method, device and equipment based on deep learning | |
CN112633423B (en) | Training method of text recognition model, text recognition method, device and equipment | |
US20220215558A1 (en) | Method and apparatus for three-dimensional edge detection, storage medium, and computer device | |
CN113469092B (en) | Character recognition model generation method, device, computer equipment and storage medium | |
WO2021169642A1 (en) | Video-based eyeball turning determination method and system | |
CN112836653A (en) | Face privacy method, device and apparatus and computer storage medium | |
CN109711356A (en) | A kind of expression recognition method and system | |
CN111275051A (en) | Character recognition method, character recognition device, computer equipment and computer-readable storage medium | |
CN112115860A (en) | Face key point positioning method and device, computer equipment and storage medium | |
CN113269013A (en) | Object behavior analysis method, information display method and electronic equipment | |
WO2021073150A1 (en) | Data detection method and apparatus, and computer device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||