CN112364737A - Facial expression recognition method, device and equipment for live webcast lessons - Google Patents
- Publication number
- CN112364737A (application CN202011193684.3A)
- Authority
- CN
- China
- Prior art keywords
- facial expression
- neural network
- convolutional neural
- network model
- expression recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a facial expression recognition method, device and equipment for live webcast lessons. The method comprises: constructing an initial convolutional neural network model, optimizing it, and building a deep convolutional neural network model with a dual-channel fully-connected layer; acquiring facial expression training samples, training the deep convolutional neural network model on them, and generating a facial expression recognition model; and collecting facial expression images during a live webcast lesson, inputting them into the facial expression recognition model, and generating facial expression recognition results. By taking into account the influence of fully-connected layers of different scales on the expression of high-level semantic image features, the embodiment of the invention designs a fused dual-channel fully-connected layer, which strengthens the feature expression capability of the deep convolutional neural network model and improves facial expression recognition accuracy.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a facial expression recognition method, device and equipment for live webcast lessons.
Background
With the rise of the artificial intelligence industry, facial expression recognition based on deep learning has attracted growing attention. In a live webcast class in particular, analyzing the facial expressions of students in the live video reveals their current attentiveness, which helps teachers manage the class and teach. In recent years, deep learning has achieved excellent results in many computer vision tasks such as image classification and face recognition, and facial expression recognition methods based on deep learning have followed. The feature information in a DCNN is distributed hierarchically across the network: the lower layers mainly capture texture and corner features, which are local features of the image, while the higher layers capture class-specific features better suited to complex tasks that require global features. As the layers deepen, features become more complex and global. The features extracted by the fully-connected layer are generally regarded as high-level features. Traditional DCNNs such as LeNet and AlexNet use a single-channel fully-connected layer, which retains only some of the "important" features of the last pooling layer and discards those deemed "less important". The features extracted by such a fully-connected layer are therefore limited in image expression capability, and expression recognition accuracy is low.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
In view of the defects of the prior art, the present invention aims to provide a facial expression recognition method, device and equipment for live webcast lessons, to solve the technical problem that prior facial expression recognition methods adopt a single-channel fully-connected layer, so that the extracted features are limited in image expression capability and facial expression recognition accuracy is low.
The technical scheme of the invention is as follows:
a facial expression recognition method for a live webcast session, the method comprising:
constructing an initial convolutional neural network model, optimizing the initial convolutional neural network model, and constructing a deep convolutional neural network model of a double-channel full-connection layer;
acquiring a facial expression training sample, training a deep convolutional neural network model according to the facial expression training sample, and generating a facial expression recognition model;
and acquiring a facial expression image in a network live broadcast course, inputting the facial expression image into a facial expression recognition model, and generating a facial expression recognition result.
Further, the constructing an initial convolutional neural network model, optimizing the initial convolutional neural network model, and constructing a deep convolutional neural network model of a dual-channel fully-connected layer includes:
constructing an initial convolutional neural network model, continuously convolving hidden layers of the initial convolutional neural network by adopting a minimum-scale convolution kernel, and then pooling;
optimizing the network internal structure of the pooled convolutional neural network model, and constructing a deep convolutional neural network model of a double-channel full-connection layer.
Further preferably, the optimizing the network internal structure of the pooled convolutional neural network model includes:
and optimizing the network internal structure of the pooled convolutional neural network model according to the Maxout activating function and the Dropout algorithm.
Further preferably, the obtaining of the facial expression training sample, training the deep convolutional neural network model according to the facial expression training sample, and generating the facial expression recognition model includes:
acquiring a facial expression training sample, and training a deep convolutional neural network model according to the facial expression training sample;
and in the training process, learning is carried out by adopting an A-Softmax algorithm, and a facial expression recognition model is generated according to the learning result.
Preferably, the constructing an initial convolutional neural network model, performing continuous convolution on the hidden layer of the initial convolutional neural network by using a minimum-scale convolution kernel, and then performing pooling includes:
constructing an initial convolutional neural network model, obtaining the convolutional layers of the convolutional neural network model, and applying a zero-value padding technique in the convolutional layers;
performing continuous convolution on the padded convolutional layers with a minimum-scale convolution kernel, followed by pooling.
Further, performing continuous convolution on hidden layers of the initial convolutional neural network by using a minimum-scale convolution kernel, and then performing pooling, includes:
and continuously convolving hidden layers of the initial convolutional neural network by using a filter convolution kernel of 3x3, and then pooling.
Further, the convolutional neural network is provided with a fully-connected fusion layer, and the network internal structure of the pooled convolutional neural network model is optimized according to the Maxout activation function and the Dropout algorithm, and the method further comprises the following steps:
and optimizing the fully-connected fusion layer of the pooled convolutional neural network model according to the Maxout activating function and the Dropout algorithm.
Another embodiment of the present invention provides a facial expression recognition device for a live webcast session, the device comprising:
the model building module is used for building an initial convolutional neural network model, optimizing the initial convolutional neural network model and building a deep convolutional neural network model of a double-channel full-connection layer;
the model training module is used for acquiring a facial expression training sample, training the deep convolutional neural network model according to the facial expression training sample and generating a facial expression recognition model;
and the facial expression recognition module is used for acquiring a facial expression image in a live network course, inputting the facial expression image into the facial expression recognition model and generating a facial expression recognition result.
Another embodiment of the present invention provides facial expression recognition equipment for a live webcast lesson, the equipment comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described facial expression recognition method for a live webcast lesson.
Another embodiment of the present invention also provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above-mentioned facial expression recognition method for a live webcast session.
Advantageous effects: by taking into account the influence of fully-connected layers of different scales on the expression of high-level semantic image features, the embodiment of the invention designs a fused dual-channel fully-connected layer, which strengthens the feature expression capability of the deep convolutional neural network model and improves facial expression recognition accuracy.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart illustrating a preferred embodiment of a facial expression recognition method for live online lessons according to the present invention;
fig. 2 is a schematic network structure diagram of a face recognition model according to a specific application embodiment of the facial expression recognition method for live webcast lessons in the present invention;
FIG. 3a is a schematic diagram of parameters of each network layer of a face recognition model in the prior art;
fig. 3b is a schematic diagram of parameters of each network layer in a specific application embodiment of the facial expression recognition method for the live webcast lesson according to the present invention;
FIG. 4 is a functional block diagram of an embodiment of a facial expression recognition apparatus for live online lessons according to the present invention;
fig. 5 is a schematic diagram of a hardware structure of a facial expression recognition device for live webcast lessons according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. Embodiments of the present invention are described below with reference to the accompanying drawings.
The embodiment of the invention provides a facial expression recognition method for a live webcast course. Referring to fig. 1, fig. 1 is a flowchart illustrating a facial expression recognition method for live webcast lessons according to a preferred embodiment of the present invention. As shown in fig. 1, it includes the steps of:
s100, constructing an initial convolutional neural network model, optimizing the initial convolutional neural network model, and constructing a deep convolutional neural network model of a double-channel full-connection layer;
s200, obtaining a facial expression training sample, training a deep convolutional neural network model according to the facial expression training sample, and generating a facial expression recognition model;
step S300, collecting facial expression images in a live online course, inputting the facial expression images into a facial expression recognition model, and generating a facial expression recognition result.
In a specific implementation, the facial expression recognition algorithm of the embodiment recognizes the expressions of students in a live webcast class; from the recognized expressions, the students' attentiveness can be inferred, which facilitates subsequent monitoring of how well they follow the class.
An initial convolutional neural network model is obtained, its hidden layers are continuously convolved with a minimum-scale convolution kernel and then pooled; the internal structure of the network is optimized, and a deep convolutional neural network model with a dual-channel fully-connected layer is constructed. Features of collected face images are fed into the deep convolutional neural network model for training, yielding a trained facial expression recognition model. Facial expression images are then collected by an image acquisition device, the face image to be recognized is input into the facial expression recognition model, and the recognized facial expression is produced.
Further, an initial convolutional neural network model is constructed, the initial convolutional neural network model is optimized, and a deep convolutional neural network model of a double-channel full-connection layer is constructed, and the method comprises the following steps:
constructing an initial convolutional neural network model, continuously convolving hidden layers of the initial convolutional neural network by adopting a minimum-scale convolution kernel, and then pooling;
optimizing the network internal structure of the pooled convolutional neural network model, and constructing a deep convolutional neural network model of a double-channel full-connection layer.
In a specific implementation, an initial convolutional neural network model is constructed, continuous convolution is applied to the hidden layers with a small-scale convolution kernel followed by (max + average) pooling, the internal structure of the network is optimized, and the traditional single-channel fully-connected layer is improved to construct a DCNN (Deep Convolutional Neural Network) model with a dual-channel fully-connected layer.
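As a rough sketch of such a dual-channel fully-connected DCNN, the following PyTorch module uses two parallel fully-connected channels whose outputs are concatenated into a fusion layer. The concrete sizes (48x48 grayscale input, 32/64/128 filters, 256-unit channels fused to 512, 7 expression classes) are assumptions read off figs. 2 and 3b rather than stated in this text, and ReLU with plain max pooling stands in for the patent's Maxout activation and combined (max + average) pooling.

```python
import torch
import torch.nn as nn

class TwoChannelFCNet(nn.Module):
    """Sketch of a DCNN with a dual-channel fully-connected (FC) layer."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),     # C1, zero-padded
            nn.MaxPool2d(2),                               # 48 -> 24
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),    # C2 \ two successive
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),    # C3 / 3x3 convolutions
            nn.MaxPool2d(2),                               # 24 -> 12
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),   # C4 \ two successive
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),  # C5 / 3x3 convolutions
            nn.MaxPool2d(2),                               # 12 -> 6
        )
        flat = 128 * 6 * 6
        # Two parallel FC channels F1 and F2 (256 units each).
        self.fc1 = nn.Sequential(nn.Linear(flat, 256), nn.ReLU(), nn.Dropout(0.5))
        self.fc2 = nn.Sequential(nn.Linear(flat, 256), nn.ReLU(), nn.Dropout(0.5))
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        # F3: parameter-free fusion of the two channels by concatenation.
        fused = torch.cat([self.fc1(h), self.fc2(h)], dim=1)
        return self.classifier(fused)
```

A forward pass on a batch of 48x48 grayscale face crops, `TwoChannelFCNet()(torch.zeros(2, 1, 48, 48))`, yields a `(2, 7)` logit tensor.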
Further, optimizing the network internal structure of the pooled convolutional neural network model, including:
and optimizing the network internal structure of the pooled convolutional neural network model according to the Maxout activating function and the Dropout algorithm.
In a specific implementation, the internal structure of the network is optimized by combining the Maxout activation function with the Dropout technique.
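As an illustration of the Maxout activation mentioned above, the sketch below implements a single Maxout layer in NumPy: each output unit takes the maximum over k affine "pieces", so the layer learns its own piecewise-linear activation rather than using a fixed ReLU. All sizes here are illustrative, not taken from the patent.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: x (n_in,), W (k, n_out, n_in), b (k, n_out) -> (n_out,)."""
    z = np.einsum("koi,i->ko", W, x) + b   # k affine "pieces"
    return z.max(axis=0)                   # element-wise max over the pieces

rng = np.random.default_rng(0)
k, n_in, n_out = 3, 8, 4                   # illustrative sizes
W = rng.normal(size=(k, n_out, n_in))
b = rng.normal(size=(k, n_out))
x_in = rng.normal(size=n_in)
y = maxout(x_in, W, b)                     # shape (n_out,)
```

Because the max is taken over learned affine maps, a Maxout layer can approximate ReLU (one piece fixed at zero) or any convex piecewise-linear activation, which is why the patent pairs it with Dropout.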
Further, acquiring a facial expression training sample, training the deep convolutional neural network model according to the facial expression training sample, and generating a facial expression recognition model, including:
acquiring a facial expression training sample, and training a deep convolutional neural network model according to the facial expression training sample;
and in the training process, learning is carried out by adopting an A-Softmax algorithm, and a facial expression recognition model is generated according to the learning result.
In a specific implementation, the A-Softmax loss is used during training: the angle serves as the distance measure, and combining the angular distance with the learned features enhances their discriminability. This markedly improves network performance and feature extraction capability while reducing the number of parameters in the training process, yielding an effective facial expression recognition model.
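The angular-margin idea behind the A-Softmax loss can be sketched as follows: class weight vectors are normalized so each logit becomes ||x||·cos(θ_j), and the target class's cosine is replaced by the monotone surrogate ψ(θ) = (−1)^k cos(mθ) − 2k for θ ∈ [kπ/m, (k+1)π/m], which enforces an angular margin governed by m. This is a simplified sketch (no bias terms, no annealing between Softmax and A-Softmax as used in practice).

```python
import numpy as np

def a_softmax_logits(x, W, target, m=4):
    """x: (d,) feature; W: (C, d) class weights; target: target class index.

    Returns the C logits with the A-Softmax angular margin applied to the
    target class (simplified sketch: no bias, no annealing).
    """
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)   # ||w_j|| = 1
    xnorm = np.linalg.norm(x)
    cos = Wn @ x / xnorm                                # cos(theta_j)
    logits = xnorm * cos
    theta = np.arccos(np.clip(cos[target], -1.0, 1.0))
    k = np.floor(theta * m / np.pi)
    psi = (-1.0) ** k * np.cos(m * theta) - 2.0 * k     # monotone surrogate
    logits[target] = xnorm * psi
    return logits
```

Since ψ(θ) ≤ cos(θ), the target logit is penalized unless the feature lies well inside its class's angular region, which is what pushes classes apart on the hypersphere.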
Further, constructing an initial convolutional neural network model, continuously convolving hidden layers of the initial convolutional neural network by adopting a minimum-scale convolution kernel, and then pooling, wherein the pooling comprises the following steps:
constructing an initial convolutional neural network model, obtaining the convolutional layers of the convolutional neural network model, and applying a zero-value padding technique in the convolutional layers;
performing continuous convolution on the padded convolutional layers with a minimum-scale convolution kernel, followed by pooling.
In a specific implementation, the dual-channel convolutional neural network, as shown in fig. 2, comprises 5 convolutional layers, 3 pooling layers, a fully-connected fusion layer, and a dual-channel fully-connected layer. A zero-value padding technique is applied at the convolutional layers, and two successive convolution operations are performed at the C2, C3 convolutional layers and at the C4, C5 convolutional layers, respectively. Combining max pooling with average pooling retains more diversified feature information.
Further, performing continuous convolution on hidden layers of the initial convolutional neural network by using a minimum-scale convolution kernel, and then performing pooling, wherein the pooling comprises:
and continuously convolving hidden layers of the initial convolutional neural network by using a filter convolution kernel of 3x3, and then pooling.
In a specific implementation, as shown in fig. 2, two successive convolution operations with 3x3 filters are performed at the C2, C3 convolutional layers and at the C4, C5 convolutional layers, respectively. Combining max pooling with average pooling retains more diversified feature information.
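One plausible reading of combining max and average pooling, averaging the two responses per window, can be sketched as below. The 50/50 combination is an assumption; the patent only states that the two poolings are combined, not how.

```python
import numpy as np

def max_avg_pool2x2(x):
    """2x2 pooling that averages the max-pool and mean-pool responses.

    x: (H, W) with even H, W -> (H/2, W/2). The 50/50 weighting is an
    assumption; the text only says max and average pooling are combined.
    """
    h, w = x.shape
    blocks = (x.reshape(h // 2, 2, w // 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(h // 2, w // 2, 4))   # gather each 2x2 window
    return 0.5 * (blocks.max(axis=2) + blocks.mean(axis=2))
```

For `x = np.arange(16.).reshape(4, 4)`, the top-left 2x2 window [[0, 1], [4, 5]] pools to 0.5 x (5 + 2.5) = 3.75: the max keeps the strongest activation while the mean preserves the window's overall response, which is the "more diversified feature information" the text refers to.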
Further, the convolutional neural network is provided with a fully-connected fusion layer, and the network internal structure of the pooled convolutional neural network model is optimized according to the Maxout activation function and the Dropout algorithm, and the method further comprises the following steps:
and optimizing the fully-connected fusion layer of the pooled convolutional neural network model according to the Maxout activating function and the Dropout algorithm.
In a specific implementation, the Dropout technique is applied to the convolutional layers and the fully-connected layer to prevent overfitting, and batch normalization is added after the convolutional layers to improve the generalization of the DCNN model.
As can be seen from fig. 3a and fig. 3b, fig. 3a shows the network layer parameters of the DCNN model before modification, and fig. 3b shows those of the modified TCNN model. Using successive convolutions reduces the trainable parameters of the C2 and C3 layers by (5x5x24x24 - 3x3x24x24x2) x 64 = 258,048, and those of the C4 and C5 layers by (5x5x12x12 - 3x3x12x12x2) x 128 = 129,024, for a total reduction of 387,072 trainable parameters. Because the improved TCNN model uses two channels at the fully-connected layer, the number of trainable parameters there increases by 256 + 256 = 512. In fig. 3b, the F3 layer is a feature fusion layer formed by fusing the F1 and F2 fully-connected layers, so its parameters are not trainable. Overall, the TCNN model optimizes the number of parameters in the network and reduces its trainable parameters.
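The parameter counts in the preceding paragraph can be checked by reproducing the arithmetic directly:

```python
# Parameters saved by replacing one 5x5 convolution with two successive 3x3
# convolutions, at the layer sizes stated in the description above.
saved_c2_c3 = (5 * 5 * 24 * 24 - 3 * 3 * 24 * 24 * 2) * 64    # C2/C3 layers
saved_c4_c5 = (5 * 5 * 12 * 12 - 3 * 3 * 12 * 12 * 2) * 128   # C4/C5 layers
total_saved = saved_c2_c3 + saved_c4_c5
added_fc = 256 + 256   # extra parameters from the second FC channel
```

This reproduces the figures in the text: 258,048 + 129,024 = 387,072 parameters saved, against only 512 added at the dual-channel fully-connected layer.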
The embodiments of the present invention provide a facial expression recognition method for live webcast lessons that improves network performance, increases feature extraction capability, and reduces the number of parameters in the training process. The method performs (max + average) pooling after continuous convolution with a small-scale convolution kernel in the hidden layers, optimizes the internal network structure by combining the Maxout activation function with the Dropout technique, and improves the conventional single-channel fully-connected layer to construct a DCNN model with a dual-channel fully-connected layer. The A-Softmax loss is used during training, with the angle as the distance measure, combining angular distance and the learned features to enhance discriminability. An effective facial expression recognition model is thereby obtained.
According to the embodiment of the invention, the influence of full-connection layers with different scales on the high-level semantic feature expression capability of the image is fully considered, the dual-channel fusion full-connection layer is designed, and the feature expression capability of the DCNN model is enhanced.
A Maxout activation function replaces the traditional ReLU activation function at the dual-channel fully-connected layer, enabling the network to express more accurate high-dimensional feature information.
To satisfy the criterion that ideal facial features for FER should have a maximum intra-class distance smaller than the minimum inter-class distance, the A-Softmax loss is used during training, allowing the TCNN to learn facial features with a geometrically interpretable angular margin.
It should be noted that, a certain order does not necessarily exist between the above steps, and those skilled in the art can understand, according to the description of the embodiments of the present invention, that in different embodiments, the above steps may have different execution orders, that is, may be executed in parallel, may also be executed interchangeably, and the like.
Another embodiment of the present invention provides a facial expression recognition apparatus for live online lessons, as shown in fig. 4, the apparatus 1 includes:
the model building module 11 is used for building an initial convolutional neural network model, optimizing the initial convolutional neural network model and building a deep convolutional neural network model of a double-channel full-connection layer;
the model training module 12 is used for acquiring a facial expression training sample, training the deep convolutional neural network model according to the facial expression training sample, and generating a facial expression recognition model;
and the facial expression recognition module 13 is configured to collect facial expression images in a live network course, input the facial expression images into a facial expression recognition model, and generate a facial expression recognition result.
The specific implementation is shown in the method embodiment, and is not described herein again.
Another embodiment of the present invention provides a facial expression recognition apparatus for a live webcast session, as shown in fig. 5, the apparatus 10 includes:
one or more processors 110 and a memory 120. Fig. 5 takes one processor 110 as an example; the processor 110 and the memory 120 may be connected by a bus or other means, a bus connection being illustrated in fig. 5.
The memory 120 is a non-volatile computer-readable storage medium, and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions corresponding to the facial expression recognition method for live webcast lessons in the embodiment of the present invention. The processor 110 executes various functional applications and data processing of the device 10, namely, implements the facial expression recognition method for live webcast lessons in the above-described method embodiments, by running the nonvolatile software programs, instructions, and units stored in the memory 120.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an application program required for operating the device, at least one function; the storage data area may store data created according to the use of the device 10, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 120 optionally includes memory located remotely from processor 110, which may be connected to device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120, and when executed by the one or more processors 110, perform the facial expression recognition method for webcast lessons in any of the above-described method embodiments, e.g., performing the above-described method steps S100 to S300 in fig. 1.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S300 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory components or memory of the operating environment described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the facial expression recognition method for webcast lessons of the above-described method embodiment. For example, the method steps S100 to S300 in fig. 1 described above are performed.
The above-described embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a general hardware platform, and may also be implemented by hardware. Based on such understanding, the above technical solutions essentially or contributing to the related art can be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Conditional language such as "can," "might," or "may," unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, particular features, elements, and/or operations. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more embodiments, or that one or more embodiments must include logic for deciding, with or without input or prompting, whether such features, elements, and/or operations are included or are to be performed in any particular embodiment.
What has been described herein in the specification and drawings includes examples that can provide a facial expression recognition method and apparatus for a live webcast session. It will, of course, not be possible to describe every conceivable combination of components and/or methodologies for purposes of describing the various features of the disclosure, but it can be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications can be made to the disclosure without departing from the scope or spirit thereof. In addition, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. It is intended that the examples set forth in this specification and the drawings be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (10)
1. A facial expression recognition method for a live webcast course is characterized by comprising the following steps:
constructing an initial convolutional neural network model, optimizing the initial convolutional neural network model, and constructing a deep convolutional neural network model with a dual-channel fully-connected layer;
acquiring a facial expression training sample, training the deep convolutional neural network model according to the facial expression training sample, and generating a facial expression recognition model;
and acquiring a facial expression image in the live webcast course, inputting the facial expression image into the facial expression recognition model, and generating a facial expression recognition result.
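For illustration only, and not as part of the claimed subject matter, the three steps of claim 1 can be sketched in Python. A toy linear classifier stands in for the claimed dual-channel deep convolutional neural network, and every name below (`build_model`, `train`, `recognize`, the class count) is a hypothetical assumption:

```python
import numpy as np

N_CLASSES = 7  # assumed number of expression categories, not stated in the claims

def build_model(n_features, n_classes, seed=0):
    # Step 1 stand-in: "constructing" a model (here one linear layer,
    # instead of the claimed dual-channel deep CNN).
    rng = np.random.default_rng(seed)
    return {"W": rng.normal(0.0, 0.01, (n_features, n_classes)),
            "b": np.zeros(n_classes)}

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(model, X, y, lr=0.1, epochs=50):
    # Step 2 stand-in: plain cross-entropy gradient descent; the patent
    # instead trains with an A-Softmax loss (claim 4).
    for _ in range(epochs):
        grad = softmax(X @ model["W"] + model["b"])
        grad[np.arange(len(y)), y] -= 1.0   # softmax prob minus one-hot label
        model["W"] -= lr * X.T @ grad / len(y)
        model["b"] -= lr * grad.mean(axis=0)
    return model

def recognize(model, X):
    # Step 3: feed facial-expression features through the trained model
    # and return one predicted expression label per sample.
    return softmax(X @ model["W"] + model["b"]).argmax(axis=1)
```

The real method would replace the linear layer with the deep CNN of claims 2 to 7; only the build/train/recognize control flow mirrors the claim.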
2. The facial expression recognition method for the live webcast course according to claim 1, wherein the constructing of the initial convolutional neural network model, the optimizing of the initial convolutional neural network model, and the constructing of the deep convolutional neural network model with the dual-channel fully-connected layer comprises:
constructing the initial convolutional neural network model, performing successive convolutions on the hidden layers of the initial convolutional neural network with a minimum-scale convolution kernel, and then pooling;
and optimizing the internal network structure of the pooled convolutional neural network model, and constructing the deep convolutional neural network model with the dual-channel fully-connected layer.
3. The facial expression recognition method for the live webcast course according to claim 2, wherein the optimizing of the internal network structure of the pooled convolutional neural network model comprises:
optimizing the internal network structure of the pooled convolutional neural network model according to a Maxout activation function and a Dropout algorithm.
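As an informal illustration of the two optimization components named in claim 3, the NumPy sketch below implements a generic Maxout activation and inverted Dropout; the tensor shapes and the piece count k are assumptions, not taken from the patent:

```python
import numpy as np

def maxout(x, W, b):
    # Maxout: compute k affine pieces and keep the element-wise maximum.
    # x: (batch, in_dim); W: (in_dim, k, out_dim); b: (k, out_dim).
    pieces = np.einsum("bi,iko->bko", x, W) + b
    return pieces.max(axis=1)

def dropout(x, rate, rng, training=True):
    # Inverted dropout: zero each unit with probability `rate` during
    # training and rescale survivors so the expected activation is unchanged.
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```

At inference time `training=False` makes dropout the identity, the usual convention, so no rescaling is needed when the model is deployed.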
4. The facial expression recognition method for the live webcast course according to claim 3, wherein the acquiring of the facial expression training sample, the training of the deep convolutional neural network model according to the facial expression training sample, and the generating of the facial expression recognition model comprises:
acquiring the facial expression training sample, and training the deep convolutional neural network model according to the facial expression training sample;
and, in the training process, learning with an A-Softmax algorithm, and generating the facial expression recognition model according to the learning result.
5. The facial expression recognition method for the live webcast course according to claim 4, wherein the constructing of the initial convolutional neural network model and the pooling after successive convolutions of the hidden layers of the initial convolutional neural network with the minimum-scale convolution kernel comprises:
constructing the initial convolutional neural network model, obtaining a convolutional layer of the convolutional neural network model, and applying a zero-value padding technique in the convolutional layer;
and performing successive convolutions on the convolutional layers of the padded convolutional neural network with the minimum-scale convolution kernel, and then pooling.
6. The facial expression recognition method for the live webcast course according to claim 5, wherein the pooling after the successive convolutions of the hidden layers of the initial convolutional neural network with the minimum-scale convolution kernel comprises:
successively convolving the hidden layers of the initial convolutional neural network with 3×3 convolution kernels, and then pooling.
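A minimal single-channel NumPy sketch of the convolution scheme in claims 5 and 6: zero-value padding so each 3×3 convolution preserves spatial size, two successive 3×3 convolutions (a 5×5 receptive field with fewer parameters than one 5×5 kernel), then 2×2 max pooling. Channel counts, strides, and biases are omitted as assumptions, and the operation is cross-correlation, as in common deep-learning frameworks:

```python
import numpy as np

def conv3x3(img, kernel):
    # "Same" 3x3 convolution: zero-pad by 1 pixel (the 0-value filling
    # of claim 5) so the output keeps the input's spatial size.
    p = np.pad(img, 1)
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def max_pool2x2(img):
    # Non-overlapping 2x2 max pooling; odd trailing rows/columns dropped.
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def conv_block(img, k1, k2):
    # Two successive 3x3 convolutions, then pooling (claim 6's scheme).
    return max_pool2x2(conv3x3(conv3x3(img, k1), k2))
```

Stacking two 3×3 kernels uses 18 weights against 25 for one 5×5 kernel while covering the same receptive field, which is the usual rationale for minimum-scale kernels.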
7. The facial expression recognition method for the live webcast course according to claim 6, wherein the convolutional neural network is provided with a fully-connected fusion layer, and the optimizing of the internal network structure of the pooled convolutional neural network model according to the Maxout activation function and the Dropout algorithm further comprises:
optimizing the fully-connected fusion layer of the pooled convolutional neural network model according to the Maxout activation function and the Dropout algorithm.
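One possible reading of the dual-channel fully-connected fusion layer touched on by claims 2 and 7 is two parallel fully-connected channels, each passed through Maxout and Dropout, whose outputs are concatenated in the fusion layer. The NumPy sketch below is a hypothetical interpretation, not the architecture disclosed in the patent:

```python
import numpy as np

def dual_channel_fusion(feat, params, rate, rng, training=True):
    # feat: (batch, in_dim) pooled CNN features.
    # params: one (Wp, bp) pair per channel, Wp: (in_dim, k, out_dim),
    # bp: (k, out_dim); two pairs give the assumed dual-channel layout.
    channels = []
    for Wp, bp in params:
        # Maxout over k affine pieces (claim 7's activation).
        z = np.max(np.einsum("bi,iko->bko", feat, Wp) + bp, axis=1)
        # Inverted Dropout during training only.
        if training and rate > 0.0:
            mask = rng.random(z.shape) >= rate
            z = z * mask / (1.0 - rate)
        channels.append(z)
    # Fusion by concatenating the channel outputs.
    return np.concatenate(channels, axis=1)
```

A subsequent classifier (e.g. the A-Softmax layer of claim 4) would consume the fused vector; concatenation is only one plausible fusion choice, with element-wise summation being another.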
8. A facial expression recognition apparatus for a live webcast course, the apparatus comprising:
a model building module, configured to build an initial convolutional neural network model, optimize the initial convolutional neural network model, and build a deep convolutional neural network model with a dual-channel fully-connected layer;
a model training module, configured to acquire a facial expression training sample, train the deep convolutional neural network model according to the facial expression training sample, and generate a facial expression recognition model;
and a facial expression recognition module, configured to acquire a facial expression image in the live webcast course, input the facial expression image into the facial expression recognition model, and generate a facial expression recognition result.
9. A facial expression recognition device for a live webcast course, the device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the facial expression recognition method for the live webcast course of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the facial expression recognition method for the live webcast course of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011193684.3A CN112364737A (en) | 2020-10-30 | 2020-10-30 | Facial expression recognition method, device and equipment for live webcast lessons |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112364737A true CN112364737A (en) | 2021-02-12 |
Family
ID=74514219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011193684.3A Pending CN112364737A (en) | 2020-10-30 | 2020-10-30 | Facial expression recognition method, device and equipment for live webcast lessons |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364737A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529503A (en) * | 2016-11-30 | 2017-03-22 | South China University of Technology | Method for recognizing face emotion by using integrated convolutional neural network |
CN109272107A (en) * | 2018-08-10 | 2019-01-25 | Guangdong University of Technology | A method of improving the number of parameters of deep layer convolutional neural networks |
KR20190123372A (en) * | 2018-04-12 | 2019-11-01 | Gachon University Industry-Academic Cooperation Foundation | Apparatus and method for robust face recognition via hierarchical collaborative representation |
Non-Patent Citations (1)
Title |
---|
ZHANG, LINLIN: "Research on Facial Expression Recognition Based on Convolutional Neural Networks", China National Knowledge Infrastructure (CNKI), no. 09, 15 September 2019 (2019-09-15), pages 1 - 4 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113569975A (en) * | 2021-08-04 | 2021-10-29 | South China Normal University | Sketch work rating method and device based on model fusion |
CN113688714A (en) * | 2021-08-18 | 2021-11-23 | South China Normal University | Method, device, equipment and storage medium for identifying multi-angle facial expressions |
CN113688714B (en) * | 2021-08-18 | 2023-09-01 | South China Normal University | Multi-angle facial expression recognition method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||