CN115439915A - Classroom participation identification method and device based on region coding and sample balance optimization - Google Patents
Classroom participation identification method and device based on region coding and sample balance optimization
- Publication number
- CN115439915A (application CN202211246980.4A)
- Authority
- CN
- China
- Prior art keywords
- participation
- model
- sample
- representing
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The application provides a classroom participation identification method based on region coding and sample balance optimization, which comprises the following steps: acquiring video data of students learning online and generating original sample data from the video data, wherein the original sample data comprises high-participation sample data and low-participation sample data; inputting the low-participation sample data into a StarGAN model to generate target low-participation samples with different styles; inputting the original sample data and the target low-participation samples into an RCN model for training to obtain a trained RCN model; acquiring video data to be identified and generating image data to be identified from the video data to be identified; and inputting the image data to be identified into the trained RCN model to obtain a participation identification result. The method and device effectively address the problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task, and significantly improve the discriminative power and robustness of the network model.
Description
Technical Field
The application relates to the interdisciplinary field of intelligent education and computer vision, and in particular to a classroom participation identification method and device based on region coding and sample balance optimization.
Background
Online education provides a new mode of knowledge dissemination and learning. Teachers can carry out teaching activities such as live lectures, recorded playback, online question answering and homework correction through online education platforms such as MOOCs, and students can complete learning tasks at their own pace. Online teaching offers abundant learning resources, timely access to knowledge and diverse learning modes, and has gradually become an organic component of regular education and teaching activities. Interaction between teachers and students is a key link in the teaching process. In a traditional classroom, a teacher can directly observe students' facial expressions and behaviors to judge their level of engagement. In an online class, however, owing to the teaching setting, students lack the real-time, face-to-face interaction with teachers and their attention is easily distracted; teachers cannot obtain real-time feedback on students' engagement and can only judge learning outcomes through in-class questions and after-class assignments. Therefore, how to automatically evaluate students' learning participation in an online learning environment through computer vision technology is a problem that urgently needs to be solved.
Research on automatic participation recognition can be divided into two categories: approaches based on traditional machine learning and approaches based on deep learning. Recognition methods based on traditional computer vision typically estimate engagement from facial features, or hand-crafted features of other modalities, using machine learning. For the participation recognition task, whether in an online or an offline class, most students listen attentively and only a few are inattentive, so participation data collected in a natural environment suffers from severely unbalanced sample distribution: low-participation samples are very few, while high-participation samples account for a large proportion. Most existing participation recognition algorithms achieve high accuracy on the overall classification task, but they mainly improve the classification of the majority class while neglecting the judgment of minority-class samples. In addition, because students' behavior during learning is not artificially constrained in a natural environment, part of the facial area is often inadvertently covered by the hands, so that changes in facial expression cannot be captured; such cases are easily recognized by the model as distraction and assigned a low participation prediction.
In summary, learning participation identification methods in the prior art do not fully consider the unbalanced distribution of sample data or the hand occlusion present in participation samples, and therefore suffer from low identification accuracy.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a classroom participation identification method based on region coding and sample balance optimization, which solves the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing methods. A StarGAN (Star Generative Adversarial Network) model is proposed to generate low-participation samples and thereby augment the participation database; at the same time, an RCN (Region Coding Network) model for face region coding is proposed, which can adaptively learn the attention weights of different face regions and combine model feature learning with occlusion region coding, significantly improving the discriminative power and robustness of the network model.
A second objective of the present application is to provide a classroom participation identification device with optimized region coding and sample balance.
A third object of the present application is to propose a non-transitory computer-readable storage medium.
In order to achieve the above object, an embodiment of a first aspect of the present application provides a classroom participation identification method based on region coding and sample balance optimization, including: acquiring video data of on-line learning of a student, and generating original sample data according to the video data, wherein the original sample data comprises high participation sample data and low participation sample data; inputting low participation sample data into a StarGAN model to generate target low participation samples with different styles; inputting original sample data and target low participation samples into an RCN model for training to obtain a trained RCN model; acquiring video data to be identified, and generating image data to be identified according to the video data to be identified; and inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
Optionally, in an embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label of the video data by utilizing manual work and prior information;
extracting an image frame from video data, cutting a face area of the extracted image frame to obtain a face image as original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to a participation degree label.
Optionally, in an embodiment of the present application, the StarGAN model includes a mapping network, a style encoder, a generator and a discriminator, and before inputting the low-participation sample data into the StarGAN model to generate target low-participation samples with different styles, the method further comprises:
acquiring low participation training data, wherein the low participation training data are face images;
inputting low participation training data into the StarGAN model for training, and performing iterative optimization on the StarGAN model through a loss function.
Optionally, in an embodiment of the present application, the loss functions of the StarGAN model include an adversarial loss, a style reconstruction loss, a diversity sensitivity loss and a cycle consistency loss, wherein

the adversarial loss is expressed as:

L_adv = E_{x,y}[log D_y(x)] + E_{x,ỹ,z}[log(1 − D_ỹ(G(x, s̃)))]

where L_adv denotes the adversarial loss, E[·] denotes the mathematical expectation, x denotes the input image, y denotes the original domain of the input image, D_y(x) is the output of the discriminator for the original domain y, ỹ denotes the target domain, z denotes random Gaussian noise, s̃ denotes the style feature of the target domain generated by the mapping network from the random Gaussian noise, D_ỹ(G(x, s̃)) denotes the output of the discriminator on the image generated by the generator, and G(x, s̃) denotes the fake image of the target domain ỹ generated by the generator from the input image and the target style feature;
the style reconstruction penalty is expressed as:
wherein L is sty Representing a loss of stylistic reconstruction, E () representing a mathematical expectation value, x representing an input image, y representing an original field of the input image,representing the target domain, z represents random gaussian noise,representing the style characteristics of the target domain generated by the mapping network based on random gaussian noise,a representation generator generates a false image with a field of y according to the input image and the target style characteristics;
the loss of diversity sensitivity is expressed as:
wherein L is ds Representing loss of diversity sensitivity, E () representing the mathematical expectation value, z 1 And z 2 Representing a random gaussian noise vector and representing the noise,andrespectively representing a vector z of random gaussian noise by a mapping network 1 And z 2 Outputting the obtained style characteristic vector and outputting the style characteristic vector,the representation generator is based on the input image and the style characteristicsThe image to be generated is then displayed on the display,the representation generator is based on the input image and the style characteristicsA generated image;
the cycle consistency loss is expressed as:
wherein L is cyc Representing a loss of cyclic consistency, E () representing a mathematical expectation value, x representing an input image, y representing an original field of the input image,representing the target domain, z represents random gaussian noise,is an estimated stylistic encoding of the input image x,representing a false image to be generated using a generatorAndreconstructing to obtain a styleThe image of (a) is displayed on the display,the representation generator is based on the input image and the style characteristicsA generated image;
the StarGAN model is optimized using an objective function, where the objective function is expressed as:
min G,F,E max D L adv +λ sty L sty -λ ds L ds +λ cyc L cyc;
wherein, min G,F,E Representing minimization of an objective function, max, by a training generator, a mapping network and a style encoder D Representing maximization of an objective function, L, by training discriminators adv Denotes the resistance to loss, L sty Represents a loss of style reconstruction, L ds Indicates a loss of diversity sensitivity, L cyc Denotes the loss of cyclic consistency, λ sty 、λ ds And λ cyc Is a hyper-parameter used to balance losses.
Optionally, in an embodiment of the present application, inputting low participation sample data into the StarGAN model, and generating target low participation samples with different styles, includes:
the method comprises the steps of inputting a face image in low-participation sample data into a StarGAN model, generating different style characteristics through a mapping network or a style encoder, and generating target low-participation samples with different styles through a generator according to the input face image and the different style characteristics.
Optionally, in an embodiment of the present application, the RCN model includes a feature extraction unit, a region attention unit and a global attention unit, and inputting the original sample data and the target low-participation samples into the RCN model for training to obtain a trained RCN model includes:
inputting original sample data and a target low participation sample into an RCN model, and performing feature extraction on the original sample data and the target low participation sample through a feature extraction unit to obtain local area features of the sample;
in the feature space, the local region features of the sample are subjected to region coding through a region attention unit learning attention weights of different face regions, and global features of the sample are obtained;
respectively connecting the local area characteristics of the sample with the global characteristics of the sample in series to obtain sample characteristics, obtaining the attention weight of the sample characteristics through a global attention unit, and performing weighted fusion on the sample characteristics to obtain final sample characteristics;
and according to the characteristics of the final sample, performing iterative updating and optimization on the network parameters of the RCN model by using an SGD algorithm through combining the regional deviation loss and the cross entropy loss to obtain the trained RCN model.
Optionally, in an embodiment of the present application, inputting image data to be recognized into a trained RCN model to obtain an engagement recognition result, including:
inputting image data to be identified into a feature extraction unit for feature extraction to obtain a feature map, and randomly cutting the feature map into a preset number of regional features;
inputting the regional characteristics into a regional attention unit, calculating attention weight of the regional characteristics, and weighting the regional characteristics to obtain global characteristics;
and respectively connecting the regional characteristics with the global characteristics in series to obtain target characteristics, obtaining the attention weight of the target characteristics through a global attention unit, weighting the target characteristics to obtain final characteristics, and identifying and classifying the final characteristics to obtain the participation degree identification result of the image data to be identified.
To achieve the above object, a second aspect of the present application provides a classroom participation identification device optimized by region coding and sample balance, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring video data of on-line learning of a student and generating original sample data according to the video data, and the original sample data comprises high participation sample data and low participation sample data;
the generating module is used for inputting the low participation sample data into the StarGAN model and generating target low participation samples with different styles;
the training module is used for inputting original sample data and target low participation samples into the RCN model for training to obtain a trained RCN model;
the second acquisition module is used for acquiring the video data to be identified and generating image data to be identified according to the video data to be identified;
and the recognition module is used for inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
Optionally, in an embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label of the video data by utilizing manual work and prior information;
extracting an image frame from the video data, cutting a face area of the extracted image frame to obtain a face image as original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to the participation degree label.
In order to achieve the above object, a non-transitory computer-readable storage medium is provided in a third aspect of the present application, and when executed by a processor, the instructions in the storage medium can perform a classroom participation identification method based on region coding and sample balance optimization.
The classroom participation identification method, device and non-transitory computer-readable storage medium based on region coding and sample balance optimization solve the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing methods: low-participation samples are generated with the proposed StarGAN model to augment the participation database, and at the same time a region coding network for face region coding is provided, which can adaptively learn the attention weights of different face regions and combine model feature learning with occlusion region coding, significantly improving the discriminative power and robustness of the network model.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a classroom participation identification method for area coding and sample balance optimization according to an embodiment of the present application;
fig. 2 is another flowchart of a classroom participation identification method for area coding and sample balance optimization according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an online learning low-participation image generated based on a StarGAN model by a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application;
fig. 4 is a schematic diagram of a low-participation sample generated based on the StarGAN model by the area coding and sample balance optimized classroom participation identification method according to the embodiment of the present application;
FIG. 5 is a schematic structural diagram of a feature extraction convolutional neural network of a classroom participation identification method for regional coding and sample balance optimization according to an embodiment of the present application;
fig. 6 is a schematic diagram of an RCN model-based engagement recognition framework of a classroom engagement recognition method for area coding and sample balance optimization according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a classroom participation identification device with area coding and sample balance optimization according to a second embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The method and apparatus for classroom participation identification with region coding and sample balance optimization according to the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application.
As shown in fig. 1, the classroom participation identification method based on region coding and sample balance optimization includes the following steps:

101, acquiring video data of students learning online, and generating original sample data from the video data, wherein the original sample data includes high-participation sample data and low-participation sample data;

102, inputting the low-participation sample data into a StarGAN model to generate target low-participation samples with different styles;

103, inputting the original sample data and the target low-participation samples into an RCN model for training to obtain a trained RCN model;
104, acquiring video data to be identified, and generating image data to be identified according to the video data to be identified;
and 105, inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
According to the classroom participation identification method based on region coding and sample balance optimization of the embodiment of the application, video data of students learning online are acquired and original sample data are generated from the video data, where the original sample data include high-participation sample data and low-participation sample data; the low-participation sample data are input into a StarGAN model to generate target low-participation samples with different styles; the original sample data and the target low-participation samples are input into an RCN model for training to obtain a trained RCN model; video data to be identified are acquired and image data to be identified are generated from them; and the image data to be identified are input into the trained RCN model to obtain a participation identification result. This solves the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing methods: low-participation samples are generated with the proposed StarGAN model to augment the participation database, and a region coding network for face region coding is proposed, which can adaptively learn the attention weights of different face regions, combine model feature learning with occlusion region coding, and significantly improve the discriminative power and robustness of the network model.
Further, in this embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label of the video data by utilizing manual work and prior information;
extracting an image frame from video data, cutting a face area of the extracted image frame to obtain a face image as original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to a participation degree label.
Illustratively, videos of online learning of students can be acquired through a camera, saved as one video every 10 seconds, and an engagement label {0,1,2,3} is defined for each video by utilizing manual and prior information.
Image frames are extracted with OpenCV (Open Source Computer Vision), the face area of each extracted frame is cropped with the open-source face recognition tool face_recognition, and the face images are stored in a database. The original sample data can be divided into high-participation sample data and low-participation sample data according to the participation label of the video data; for example, original samples generated from videos with participation labels 0 and 1 are assigned to the low-participation sample data, and original samples generated from videos with participation labels 2 and 3 are assigned to the high-participation sample data.
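As an illustrative sketch only (not part of the claimed method), the frame extraction and face cropping described above could be implemented roughly as follows; the sampling rate, file handling and label bookkeeping are assumptions for demonstration.

```python
# Illustrative sketch: extract frames with OpenCV and crop face regions with the
# face_recognition toolkit, as described above. Paths and parameters are assumed.
import cv2
import face_recognition

def extract_face_samples(video_path, engagement_label, frame_step=30):
    """Read a short clip, sample frames, and return cropped face images with labels."""
    samples = []
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            # face_recognition returns (top, right, bottom, left) bounding boxes
            for top, right, bottom, left in face_recognition.face_locations(rgb):
                samples.append((rgb[top:bottom, left:right], engagement_label))
        index += 1
    capture.release()
    return samples

# Labels 0 and 1 would go to the low-participation set, 2 and 3 to the high-participation set.
```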
Further, in an embodiment of the present application, the StarGAN model includes a mapping network, a style encoder, a generator and a discriminator, and before the low-participation sample data are input into the StarGAN model to generate target low-participation samples with different styles, the method further includes the following steps:
acquiring low participation training data, wherein the low participation training data are face images;
inputting the training data with low participation into a StarGAN model for training, and performing iterative optimization on the StarGAN model through a loss function.
The present application introduces the idea of the adversarial game used in generative adversarial networks: low-participation samples are generated based on the star generative adversarial network StarGAN, expanding the number of minority-class samples in the database and enhancing the participation database, thereby mitigating the impact of the unbalanced data set.
Initializing StarGAN model parameters, inputting low-participation sample data with participation degree labels of 0 and 1 into a StarGAN model, generating low-participation samples with different styles, and enhancing a database.
The StarGAN model of the present application includes a mapping network, a style encoder, a generator, and a discriminator. The mapping network is composed of a multi-layer perceptron with a plurality of output branches, and can map given random Gaussian noise into diversified style characteristic representations. The style encoder can extract different style feature representations using a depth network given different reference images. The mapping network and the style encoder each have a plurality of output branches, each branch corresponding to a style characteristic of a particular domain. The generator generates a false image with multiple styles but unchanged content according to the given input image and style characteristics. The discriminator has a plurality of output branches corresponding to a plurality of target domains, each output branch being a classifier for discriminating whether the input image is authentic at a specific target domain thereof.
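For illustration, a minimal PyTorch-style sketch of the multi-branch structure described above is given below; the layer sizes, latent dimension and number of domains are assumptions rather than values specified by the application.

```python
# Sketch of a multi-branch mapping network: one style output branch per domain
# (here, per engagement label). The discriminator would analogously expose one
# real/fake output branch per domain.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, latent_dim=16, style_dim=64, num_domains=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                    nn.Linear(512, 512), nn.ReLU())
        # one output branch per domain
        self.branches = nn.ModuleList(
            [nn.Linear(512, style_dim) for _ in range(num_domains)])

    def forward(self, z, y):                        # y: domain index per sample
        h = self.shared(z)
        styles = torch.stack([b(h) for b in self.branches], dim=1)
        return styles[torch.arange(z.size(0)), y]   # pick the branch of domain y
```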
During StarGAN model training, the generator combines the input style features to generate, as far as possible, realistic images with specific style characteristics, while the discriminator tries its best to identify the fake images produced by the generator; the two continuously play against each other, the generator's ability to produce realistic images keeps improving, and ultimately the fake images generated by the generator become as close as possible to real images.
According to the data distribution of students' online learning engagement, the domains of the engagement data are set based on the students' engagement levels; that is, the concept of a domain in this application refers to the engagement label, while the style features of an image include a person's hair style, skin color, beard, whether glasses are worn, and the angle and posture with which the eyes gaze at the screen.
Further, in the embodiment of the present application, the loss functions of the StarGAN model include an adversarial loss, a style reconstruction loss, a diversity sensitivity loss and a cycle consistency loss, wherein

the adversarial loss is expressed as:

L_adv = E_{x,y}[log D_y(x)] + E_{x,ỹ,z}[log(1 − D_ỹ(G(x, s̃)))]

where L_adv denotes the adversarial loss, E[·] denotes the mathematical expectation, x denotes the input image, y denotes the original domain of the input image, D_y(x) is the output of the discriminator for the original domain y, ỹ denotes the target domain, z denotes random Gaussian noise, s̃ denotes the style feature of the target domain generated by the mapping network from the random Gaussian noise, D_ỹ(G(x, s̃)) denotes the output of the discriminator on the image generated by the generator, and G(x, s̃) denotes the fake image of the target domain ỹ generated by the generator from the input image and the target style feature; the fake image and the target domain are input into the discriminator so that the discriminator learns to distinguish real from fake input images;
the style reconstruction penalty is expressed as:
wherein L is sty Representing a loss of stylistic reconstruction, E () representing a mathematical expectation value, x representing an input image, y representing an original field of the input image,representing the target domain, z represents random gaussian noise,representing the style characteristics of the target domain generated by the mapping network from random gaussian noise,a representation generator generates a false image with a field of y according to the input image and the target style characteristics;
the loss of diversity sensitivity is expressed as:
wherein L is ds Representing loss of diversity sensitivity, E () representing the mathematical expectation value, z 1 And z 2 Representing a random gaussian noise vector and representing the noise,andrespectively representing a vector z of random gaussian noise by a mapping network 1 And z 2 Outputting the obtained style feature vector, and outputting the style feature vector,the representation generator is based on the input image and the style characteristicsThe image to be generated is then displayed on the display,the representation generator is based on the input image and the style characteristicsGenerated images, the method maximizing the loss between generated images having different styles, thereby encouraging the generator to generate more diverse styles of images during the training process;
the cycle consistency loss is expressed as:
wherein L is cyc Representing a loss of cyclic consistency, E () representing a mathematical expectation value, x representing an input image, y representing an original field of the input image,representing the target domain, z represents random gaussian noise,is an estimated stylistic encoding of the input image x,representing a false image to be generated using a generatorAndreconstructing to obtain a styleImage of (2)The representation generator is based on the input image and the style characteristicsGenerated image, by constrainingL1 loss from the input image x, so that the generator retains some of the original features of x while changing the style;
the StarGAN model is optimized using an objective function, where the objective function is expressed as:
min G,F,E max D L adv +λ sty L sty -λ ds L ds +λ cyc L cyc ;
wherein, min G,F,E Represents the minimization of an objective function, max, by a training generator, a mapping network and a style encoder D Representing maximization of an objective function, L, by training an arbiter adv Denotes the loss of antagonism, L sty Represents a loss of style reconstruction, L ds Indicates a loss of diversity sensitivity, L cyc Denotes the loss of cyclic consistency, λ sty 、λ ds And λ cyc Is a hyper-parameter used to balance the losses.
The loss functions of the StarGAN model of the present application include an adversarial loss, a style reconstruction loss, a diversity sensitivity loss and a cycle consistency loss. The adversarial loss lets the generator and the discriminator be optimized against each other during training, continuously improving model performance. The style reconstruction loss makes the generator use a specific style representation when generating an image, yielding a larger loss value if a different style representation is used. The diversity sensitivity loss makes the images produced by the generator diverse by maximizing the L1 loss between two images generated with different styles, where the L1 loss, used to minimize error, is expressed as the absolute value of the difference between the true and predicted values. The cycle consistency loss is used to ensure that certain unaltered features of the input image are correctly retained in the generated image.
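The following PyTorch-style sketch shows, under assumed module interfaces (a generator G, mapping network F_map, style encoder E_sty and discriminator D, each taking a domain index as second argument), how the four losses described above could be combined for one generator update; it is an illustration, not the application's reference implementation.

```python
# Sketch of the combined StarGAN objective from the generator's side.
import torch
import torch.nn.functional as F_nn

def generator_losses(G, F_map, E_sty, D, x, y_src, y_trg, z1, z2,
                     lambda_sty=1.0, lambda_ds=1.0, lambda_cyc=1.0):
    s_trg = F_map(z1, y_trg)                        # target style from Gaussian noise
    x_fake = G(x, s_trg)                            # fake image in the target domain

    # adversarial term (generator side): try to fool the target-domain branch of D
    logits_fake = D(x_fake, y_trg)
    adv = F_nn.binary_cross_entropy_with_logits(logits_fake,
                                                torch.ones_like(logits_fake))

    # style reconstruction: the style encoder should recover s_trg from x_fake
    sty = torch.mean(torch.abs(E_sty(x_fake, y_trg) - s_trg))

    # diversity sensitivity: images generated from two noise vectors should differ
    ds = torch.mean(torch.abs(x_fake - G(x, F_map(z2, y_trg))))

    # cycle consistency: map the fake image back using the source style of x
    cyc = torch.mean(torch.abs(x - G(x_fake, E_sty(x, y_src))))

    # combined objective for the generator, mapping network and style encoder
    return adv + lambda_sty * sty - lambda_ds * ds + lambda_cyc * cyc
```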
Further, in the embodiment of the present application, inputting low participation sample data into the StarGAN model, and generating target low participation samples with different styles, includes:
inputting the face image in the low participation sample data into a StarGAN model, generating different style characteristics through a mapping network or a style encoder, and generating target low participation samples with different styles through a generator according to the input face image and the different style characteristics.
Further, in this embodiment of the present application, the RCN model includes a feature extraction unit, a region attention unit and a global attention unit, and inputting the original sample data and the target low-participation samples into the RCN model for training to obtain a trained RCN model includes:
inputting original sample data and a target low participation sample into an RCN model, and performing feature extraction on the original sample data and the target low participation sample through a feature extraction unit to obtain local area features of the sample;
in a feature space, a region attention unit learns attention weights of different face regions to perform region coding on local region features of a sample to obtain global features of the sample;
respectively connecting the local area characteristics of the sample with the global characteristics of the sample in series to obtain sample characteristics, obtaining the attention weight of the sample characteristics through a global attention unit, and performing weighted fusion on the sample characteristics to obtain final sample characteristics;
and according to the characteristics of the final sample, performing iterative updating and optimization on the network parameters of the RCN model by using an SGD algorithm through combining the regional deviation loss and the cross entropy loss to obtain the trained RCN model.
According to the method and the device, the region coding is carried out by learning the attention weights of different face regions, so that the model focuses more on the region with larger weight, and the model identification performance is further improved.
The method comprises the steps of inputting original sample data and a target low participation sample into an RCN together, firstly, carrying out feature extraction on the input sample, and then carrying out region coding in a feature space by learning attention weights of different face regions; weighting and fusing all local region features to obtain a global feature, connecting the local feature and the global feature in series, obtaining more accurate weight by adopting an attention mechanism, and obtaining final feature representation after weighting and fusing; and finally, carrying out iterative updating and optimization on network parameters by using an SGD algorithm through combining the regional deviation loss and the cross entropy loss to obtain a more optimal participation degree identification model.
The region bias loss is used to constrain the attention weights α_i; that is, a hyper-parameter δ is used to enforce that the attention weight α_i of some local region F_i is larger than the weight α_0 of the original face image F_0.

The region bias loss is expressed as:

L_RB = max{0, δ − (α_max − α_0)}

where L_RB denotes the region bias loss, δ denotes the hyper-parameter, α_0 is the attention weight of the original face image, and α_max denotes the maximum weight among all local regions.
The cross entropy loss is expressed as:

L_CE(p, y) = −(1/N) Σ_{i=1}^{N} y_i·log(p_i)

where L_CE(p, y) denotes the cross entropy loss, N denotes the number of samples, y_i denotes the label of the i-th sample, and p_i denotes the i-th predicted output of the model.
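For illustration, the combined training loss described above (region bias loss plus cross entropy, optimized with SGD) might be computed as in the following sketch; the tensor layout and the margin value are assumptions.

```python
# Sketch of the combined RCN training loss: region bias loss L_RB plus cross entropy L_CE.
import torch
import torch.nn.functional as F

def rcn_loss(logits, labels, region_weights, delta=0.02):
    # region_weights: (batch, n+1) tensor [alpha_0, alpha_1, ..., alpha_n],
    # where alpha_0 belongs to the whole (uncropped) face image.
    alpha_0 = region_weights[:, 0]
    alpha_max = region_weights[:, 1:].max(dim=1).values
    l_rb = torch.clamp(delta - (alpha_max - alpha_0), min=0).mean()
    l_ce = F.cross_entropy(logits, labels)          # averaged over the batch
    return l_ce + l_rb

# e.g. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```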
Further, in this embodiment of the present application, inputting image data to be recognized into a trained RCN model to obtain a result of participation degree recognition, including:
inputting image data to be identified into a feature extraction unit for feature extraction to obtain a feature map, and randomly cutting the feature map into a preset number of regional features;
inputting the regional characteristics into a regional attention unit, calculating attention weight of the regional characteristics, and weighting the regional characteristics to obtain global characteristics;
and respectively connecting the regional characteristics with the global characteristics in series to obtain target characteristics, obtaining the attention weight of the target characteristics through a global attention unit, weighting the target characteristics to obtain final characteristics, and identifying and classifying the final characteristics to obtain the participation degree identification result of the image data to be identified.
OpenCV is used to extract image frames from the video to be recognized, and the open-source face recognition tool face_recognition is used to crop and extract the face area of each image frame, yielding face images as the images to be recognized. The images to be recognized are input into the trained RCN model, which first extracts the facial features of the input image and randomly crops them, then adaptively learns the weights of different face regions and performs weighted fusion to obtain the global features; the local features and the global features are concatenated, participation recognition is carried out, and the recognition result is output.
The RCN model in the application comprises a feature extraction unit, a region attention unit and a global attention unit.
The method for recognizing the image to be recognized based on the RCN model is described in detail below.
The feature extraction unit takes the facial expression image to be recognized, of size 224 × 224 × 3, as input and performs feature extraction with a convolutional neural network to obtain a feature map f_0 of size 28 × 28 × 512. The convolutional neural network includes 10 convolutional layers and 3 pooling layers: after two convolutions with 64 convolution kernels, pooling is performed once; after two further convolutions with 128 convolution kernels, pooling is performed again; after three convolutions with 256 convolution kernels, pooling is performed again; and finally three convolutions with 512 convolution kernels yield the feature map f_0. Then f_0 is randomly cropped into n region features f_i (i = 1, 2, ..., n) of size 6 × 6 × 512, and each region is processed separately by the region attention unit. The region attention unit is implemented by an attention network comprising a pooling layer, two convolutional layers with 512 and 128 convolution kernels respectively, a fully connected layer and a sigmoid layer. By computing the attention weight α_i (i = 0, 1, ..., n) of each input region feature f_i (i = 0, 1, ..., n) and weighting the region features f_i, a global attention representation f_m is obtained, which helps optimize the region coding mechanism from a global perspective and adaptively adjusts the weight parameters.
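Before turning to the attention formulas, the following is a minimal sketch of such a feature extraction backbone (a VGG-style stack of 10 convolutions and 3 poolings); kernel sizes and padding are assumptions chosen so that a 224 × 224 × 3 input yields a 28 × 28 × 512 feature map.

```python
# Sketch of the 10-convolution / 3-pooling backbone described above.
import torch.nn as nn

def conv_block(in_ch, out_ch, num_convs, pool=True):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))              # halves width and height
    return nn.Sequential(*layers)

feature_extractor = nn.Sequential(
    conv_block(3, 64, 2),                           # 224 -> 112
    conv_block(64, 128, 2),                         # 112 -> 56
    conv_block(128, 256, 3),                        # 56  -> 28
    conv_block(256, 512, 3, pool=False),            # stays 28x28, 512 channels
)
```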
The attention weight α_i of a region feature is expressed as:

α_i = sigmoid(f_i^T · q)

where sigmoid(·) is the nonlinear activation function, f_i^T is the transposed region feature, and q denotes the parameters of the fully connected layer.
The global attention representation f_m is expressed as:

f_m = (Σ_{i=0}^{n} α_i·f_i) / (Σ_{i=0}^{n} α_i)

where n denotes the number of regions, α_i the attention weight of a region feature, and f_i the region feature.
A region bias loss is used in the region attention unit to constrain the attention weights α_i; that is, it enforces that the attention weight α_i (i = 1, 2, ..., n) of some region f_i is larger than the weight α_0 of the original face image f_0. This "encourages" the RCN model to pay more attention to important regions, so that the model obtains better region and global representation weights.

The region bias loss function is expressed as:

L_RB = max{0, δ − (α_max − α_0)}

where L_RB denotes the region bias loss, δ is a hyper-parameter, α_0 is the attention weight of the original face image, and α_max denotes the maximum weight among all local regions.
The global attention unit is implemented by an attention network comprising a fully connected layer and a sigmoid layer. The region features f_i (i = 0, 1, ..., n) are each concatenated with the global representation feature f_m to obtain the target features [f_i : f_m]; the attention weights β_i (i = 0, 1, ..., n) are then derived by the global attention unit, the target features [f_i : f_m] are weighted to obtain the final feature representation P, and finally P is used for recognition and classification.
The attention weight β_i of a target feature is expressed as:

β_i = sigmoid([f_i : f_m]^T · q′)

where sigmoid(·) is the nonlinear activation function, [f_i : f_m]^T denotes the transposed feature obtained by concatenating a region feature with the global feature, and q′ denotes the parameters of the fully connected layer.
The final feature representation P is expressed as:

P = (Σ_{i=0}^{n} α_i·β_i·[f_i : f_m]) / (Σ_{i=0}^{n} α_i·β_i)

where n denotes the number of regions, α_i the attention weight of a region feature, β_i the attention weight of a target feature, and [f_i : f_m] the target feature.
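A compact PyTorch-style sketch of the region attention and global attention units described above is given below; the pooling of region crops to vectors, the layer sizes and the number of classes are assumptions for illustration.

```python
# Sketch of the region attention and global attention units of the RCN model.
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.fc = nn.Linear(dim, 1)                 # the 'q' parameters

    def forward(self, region_feats):                # (batch, n+1, dim) pooled region vectors
        alpha = torch.sigmoid(self.fc(region_feats))            # alpha_i
        f_m = (alpha * region_feats).sum(1) / alpha.sum(1)      # global representation f_m
        return alpha.squeeze(-1), f_m

class GlobalAttention(nn.Module):
    def __init__(self, dim=512, num_classes=4):
        super().__init__()
        self.fc = nn.Linear(2 * dim, 1)             # the q' parameters
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, region_feats, alpha, f_m):
        target = torch.cat([region_feats,
                            f_m.unsqueeze(1).expand_as(region_feats)], dim=-1)
        beta = torch.sigmoid(self.fc(target)).squeeze(-1)       # beta_i
        w = (alpha * beta).unsqueeze(-1)
        p = (w * target).sum(1) / w.sum(1)                      # final feature P
        return self.classifier(p)
```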
Fig. 2 is another flowchart of a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application.
As shown in FIG. 2, the classroom participation identification method based on region coding and sample balance optimization comprises: capturing real-time online learning images of learners with a camera and performing data preprocessing synchronously; inputting the low-participation samples into a StarGAN model and generating low-participation samples with different styles through the mapping network or the style encoder, expanding the number of minority-class samples in the database to mitigate the impact of the unbalanced data set; inputting the original data together with the generated low-participation samples into the RCN, and performing region coding by learning the attention weights of different face regions so that the model pays more attention to regions with larger weights; and passing the real-time learning videos collected online through the trained participation recognition framework to obtain the participation recognition result.
Fig. 3 is a schematic structural diagram of an online learning low-participation image generated based on a StarGAN model by a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application.
As shown in FIG. 3, random Gaussian noise z and a reference image are input into the mapping network and the style encoder, respectively, to generate the target style feature s̃; the target style feature s̃ and the given image x are input into the generator G to generate a fake image; finally, the discriminator D discriminates the generated image, and a low-participation sample is obtained.
Fig. 4 is a schematic diagram of a low-participation sample generated based on the StarGAN model by the area coding and sample balance optimized classroom participation identification method according to the embodiment of the present application.
As shown in fig. 4, given an input image and a reference image, different low-participation generated image samples are obtained after different training iterations.
Fig. 5 is a schematic structural diagram of a feature extraction convolutional neural network of a classroom participation identification method for regional coding and sample balance optimization according to an embodiment of the present application.
As shown in fig. 5, the feature extraction unit uses a convolutional neural network: it takes a facial expression image of size 224 × 224 × 3 as input, reduces the width and height of the features and increases the number of channels through the VGG16 model, and finally obtains a feature map of size 28 × 28 × 512.
Fig. 6 is a schematic diagram of an RCN model-based engagement recognition framework of a classroom engagement recognition method for region coding and sample balance optimization according to an embodiment of the present application.
As shown in FIG. 6, the original images and the generated images are input into the feature extraction unit for feature extraction to obtain a feature map f_0; f_0 is randomly cropped into n region features f_i (i = 0, 1, ..., n); the region attention unit computes the attention weight α_i (i = 0, 1, ..., n) of each input region feature f_i and weights the region features f_i to obtain a global attention representation f_m; the region features f_i (i = 0, 1, ..., n) are each concatenated with the global representation feature f_m to obtain the target features [f_i : f_m]; the attention weights β_i (i = 0, 1, ..., n) are then obtained by the global attention unit, and the target features [f_i : f_m] are weighted to obtain the final feature representation P.
Fig. 7 is a schematic structural diagram of a classroom participation identification device with area coding and sample balance optimization according to a second embodiment of the present application.
As shown in fig. 7, the classroom participation identification device for area coding and sample balance optimization comprises:
the first obtaining module 10 is configured to obtain video data of online learning of a student, and generate original sample data according to the video data, where the original sample data includes high participation sample data and low participation sample data;
the generating module 20 is configured to input the low participation sample data into the StarGAN model, and generate target low participation samples with different styles;
the training module 30 is configured to input the original sample data and the target low participation sample into the RCN model for training, so as to obtain a trained RCN model;
the second obtaining module 40 is configured to obtain video data to be identified, and generate image data to be identified according to the video data to be identified;
and the recognition module 50 is configured to input the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
The classroom participation identification device based on region coding and sample balance optimization of the embodiment of the application comprises a first acquisition module for acquiring video data of students learning online and generating original sample data from the video data, the original sample data including high-participation sample data and low-participation sample data; a generation module for inputting the low-participation sample data into a StarGAN model to generate target low-participation samples with different styles; a training module for inputting the original sample data and the target low-participation samples into an RCN model for training to obtain a trained RCN model; a second acquisition module for acquiring video data to be identified and generating image data to be identified from them; and a recognition module for inputting the image data to be identified into the trained RCN model to obtain a participation identification result. This solves the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing methods: low-participation samples are generated with the proposed StarGAN model to augment the participation database, and a region coding network for face region coding is proposed, which can adaptively learn the attention weights of different face regions, combine model feature learning with occlusion region coding, and significantly improve the discriminative power and robustness of the network model.
Further, in this embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label of the video data by utilizing manual work and prior information;
extracting an image frame from the video data, cutting a face area of the extracted image frame to obtain a face image as original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to the participation degree label.
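As an illustration of this frame-extraction and face-cropping step, the following is a minimal OpenCV sketch. The Haar cascade detector, the sampling interval and the output size are assumptions made for the example rather than the patent's actual pipeline; the participation label is supplied by the manual annotation described above.

```python
import cv2

def extract_face_samples(video_path, label, every_n_frames=30, out_size=224):
    """Sample frames from an online-learning video and crop the face region.

    `label` is the manually assigned participation label; the Haar cascade
    detector, sampling interval and output size are stand-ins for whatever
    the original pipeline actually uses."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    samples, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
                face = cv2.resize(frame[y:y + h, x:x + w], (out_size, out_size))
                samples.append((face, label))
        idx += 1
    cap.release()
    return samples
```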
In order to achieve the above embodiments, the present application further proposes a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements the classroom participation identification method for area coding and sample balance optimization of the above embodiments.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application and that changes, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A classroom participation identification method based on regional coding and sample balance optimization is characterized by comprising the following steps:
the method comprises the steps of obtaining video data of on-line learning of a student, and generating original sample data according to the video data, wherein the original sample data comprises high-participation sample data and low-participation sample data;
inputting the low participation sample data into a StarGAN model to generate target low participation samples with different styles;
inputting the original sample data and the target low participation sample into an RCN model for training to obtain a trained RCN model;
acquiring video data to be identified, and generating image data to be identified according to the video data to be identified;
and inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
2. The method of claim 1, wherein said generating original sample data from said video data comprises:
defining a participation degree label of the video data by utilizing manual and prior information;
and extracting an image frame from the video data, cutting and extracting a face area of the image frame to obtain a face image as the original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to the participation degree label.
3. The method of claim 1, wherein the StarGAN model comprises a mapping network, a style encoder, a generator and a discriminator, and wherein, before inputting the low participation sample data into the StarGAN model to generate target low participation samples with different styles, the method further comprises:
acquiring low participation training data, wherein the low participation training data are face images;
inputting the low participation training data into a StarGAN model for training, and performing iterative optimization on the StarGAN model through a loss function.
4. The method of claim 3, wherein the loss function of the StarGAN model comprises: an adversarial loss, a style reconstruction loss, a diversity sensitivity loss, and a cycle consistency loss, wherein,
the adversarial loss is expressed as:

L_adv = 𝔼_{x,y}[log D_y(x)] + 𝔼_{x,ỹ,z}[log(1 − D_ỹ(G(x, s̃)))];

wherein L_adv represents the adversarial loss, 𝔼[·] represents the mathematical expectation, x represents the input image, y represents the original domain of the input image, D_y(x) is the output of the discriminator for the original domain y, ỹ represents the target domain, z represents random Gaussian noise, s̃ = F_ỹ(z) represents the style features of the target domain generated by the mapping network F from the random Gaussian noise, D_ỹ(G(x, s̃)) represents the output of the discriminator on the image generated by the generator, and G(x, s̃) represents the false image of domain ỹ generated by the generator from the input image and the target style features;
the style reconstruction loss is expressed as:

L_sty = 𝔼_{x,ỹ,z}[ ‖ s̃ − E_ỹ(G(x, s̃)) ‖₁ ];

wherein L_sty represents the style reconstruction loss, 𝔼[·] represents the mathematical expectation, x represents the input image, y represents the original domain of the input image, ỹ represents the target domain, z represents random Gaussian noise, s̃ = F_ỹ(z) represents the style features of the target domain generated by the mapping network from the random Gaussian noise, G(x, s̃) represents the false image of domain ỹ generated by the generator from the input image and the target style features, and E_ỹ(·) represents the output of the style encoder for the target domain ỹ;
the diversity sensitivity loss is expressed as:

L_ds = 𝔼_{x,ỹ,z_1,z_2}[ ‖ G(x, s̃_1) − G(x, s̃_2) ‖₁ ];

wherein L_ds represents the diversity sensitivity loss, 𝔼[·] represents the mathematical expectation, z_1 and z_2 represent random Gaussian noise vectors, s̃_1 = F_ỹ(z_1) and s̃_2 = F_ỹ(z_2) respectively represent the style feature vectors output by the mapping network for the noise vectors z_1 and z_2, G(x, s̃_1) represents the image generated by the generator from the input image and the style features s̃_1, and G(x, s̃_2) represents the image generated by the generator from the input image and the style features s̃_2;
the cycle consistency loss is expressed as:

L_cyc = 𝔼_{x,y,ỹ,z}[ ‖ x − G(G(x, s̃), ŝ) ‖₁ ];

wherein L_cyc represents the cycle consistency loss, 𝔼[·] represents the mathematical expectation, x represents the input image, y represents the original domain of the input image, ỹ represents the target domain, z represents random Gaussian noise, ŝ = E_y(x) is the estimated style code of the input image x obtained by the style encoder, G(x, s̃) represents the image generated by the generator from the input image and the style features s̃, and G(G(x, s̃), ŝ) represents the image of style ŝ reconstructed by applying the generator to the false image G(x, s̃), so that the generated image can be mapped back to the original input;
optimizing the StarGAN model using an objective function, wherein the objective function is represented as:
min_{G,F,E} max_D  L_adv + λ_sty·L_sty − λ_ds·L_ds + λ_cyc·L_cyc;

wherein min_{G,F,E} denotes minimizing the objective function by training the generator G, the mapping network F and the style encoder E, max_D denotes maximizing the objective function by training the discriminator D, L_adv denotes the adversarial loss, L_sty denotes the style reconstruction loss, L_ds denotes the diversity sensitivity loss, L_cyc denotes the cycle consistency loss, and λ_sty, λ_ds and λ_cyc are hyper-parameters used to balance the losses.
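To make the objective above concrete, here is a minimal PyTorch-style sketch of how the four losses might be combined for one generator update. The module names (generator, mapping_net, style_encoder, discriminator), their call signatures, and the loss weights are illustrative assumptions, not the patent's actual code; real StarGAN-style training alternates this with a separate discriminator update.

```python
import torch
import torch.nn.functional as F

def generator_objective(generator, mapping_net, style_encoder, discriminator,
                        x, y_src, y_tgt, style_dim=64,
                        lambda_sty=1.0, lambda_ds=1.0, lambda_cyc=1.0):
    """L_adv + lambda_sty*L_sty - lambda_ds*L_ds + lambda_cyc*L_cyc for one batch,
    from the generator's point of view (the term log D_y(x) on real images does
    not depend on G and is omitted here)."""
    z1 = torch.randn(x.size(0), style_dim)
    z2 = torch.randn(x.size(0), style_dim)
    s1 = mapping_net(z1, y_tgt)          # target-domain style from noise z1
    s2 = mapping_net(z2, y_tgt)          # target-domain style from noise z2

    fake1 = generator(x, s1)
    fake2 = generator(x, s2)

    # adversarial term: the generator tries to make D label the fake as real
    logits_fake = discriminator(fake1, y_tgt)
    adv = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))

    # style reconstruction: the style encoder should recover s1 from fake1
    sty = torch.mean(torch.abs(s1 - style_encoder(fake1, y_tgt)))

    # diversity sensitivity: different noise vectors should yield different images
    ds = torch.mean(torch.abs(fake1 - fake2))

    # cycle consistency: translating fake1 back with the source style recovers x
    s_hat = style_encoder(x, y_src)
    cyc = torch.mean(torch.abs(x - generator(fake1, s_hat)))

    return adv + lambda_sty * sty - lambda_ds * ds + lambda_cyc * cyc
```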
5. The method of claim 4, wherein said inputting the low engagement sample data into a StarGAN model, generating target low engagement samples having different styles, comprises:
inputting the face image in the low-participation sample data into a StarGAN model, generating different style characteristics through the mapping network or the style encoder, and generating target low-participation samples with different styles through a generator according to the input face image and the different style characteristics.
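As a usage illustration of this claim, batch sample generation under assumed StarGAN-style interfaces (the function and argument names are hypothetical) might look as follows; styles come either from the mapping network driven by random Gaussian noise or from the style encoder applied to reference images, mirroring the two routes named in the claim.

```python
import torch

@torch.no_grad()
def generate_low_engagement_samples(generator, mapping_net, style_encoder,
                                    face_images, target_domain, num_styles=5,
                                    style_dim=64, reference_batches=None):
    """Produce `num_styles` differently styled low-participation samples per input.

    `reference_batches`, if given, is a list of `num_styles` reference image
    batches to be encoded by the style encoder; otherwise styles are drawn from
    the mapping network with fresh Gaussian noise."""
    fakes = []
    for i in range(num_styles):
        if reference_batches is not None:
            style = style_encoder(reference_batches[i], target_domain)
        else:
            z = torch.randn(face_images.size(0), style_dim)
            style = mapping_net(z, target_domain)
        fakes.append(generator(face_images, style))
    return torch.cat(fakes, dim=0)
```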
6. The method of claim 1, wherein the RCN model comprises a feature extraction unit, a region attention unit, and a global attention unit, and the inputting the original sample data and the target low participation sample into the RCN model for training to obtain a trained RCN model comprises:
inputting the original sample data and the target low participation sample into an RCN model, and performing feature extraction on the original sample data and the target low participation sample through the feature extraction unit to obtain local area features of the sample;
in a feature space, the region attention unit learns the attention weights of different face regions to perform region coding on the local region features of the sample to obtain global features of the sample;
respectively connecting the local area characteristics of the sample with the global characteristics of the sample in series to obtain sample characteristics, obtaining attention weights of the sample characteristics through the global attention unit, and performing weighted fusion on the sample characteristics to obtain final sample characteristics;
and according to the final sample characteristics, iteratively updating and optimizing the network parameters of the RCN model with the SGD algorithm by combining the regional deviation loss and the cross-entropy loss, so as to obtain the trained RCN model.
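A minimal training-loop sketch of this step is given below, assuming the RCNHead sketched earlier together with a hypothetical backbone that returns pooled region-crop features. The particular form of the region-bias regularizer, the margin, and the optimizer settings are illustrative assumptions rather than values from the patent.

```python
import torch
import torch.nn.functional as F

def train_rcn(backbone, rcn_head, classifier, loader, epochs=30, lr=0.01, margin=0.02):
    params = (list(backbone.parameters()) + list(rcn_head.parameters())
              + list(classifier.parameters()))
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=1e-4)
    for _ in range(epochs):
        for images, labels in loader:
            region_feats = backbone(images)    # (B, n, d): n random crops of f_0
            p, alpha = rcn_head(region_feats)  # final representation P and weights alpha_i
            logits = classifier(p)
            ce = F.cross_entropy(logits, labels)
            # region-bias style regularizer: the strongest region weight should
            # exceed the mean region weight by at least `margin`
            rb = F.relu(margin + alpha.mean(dim=1) - alpha.max(dim=1).values).mean()
            loss = ce + rb
            opt.zero_grad()
            loss.backward()
            opt.step()
    return backbone, rcn_head, classifier
```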
7. The method of claim 6, wherein the inputting the image data to be recognized into the trained RCN model to obtain an engagement recognition result comprises:
inputting the image data to be identified into the feature extraction unit for feature extraction to obtain a feature map, and randomly cutting the feature map into a preset number of regional features;
inputting the regional features into the regional attention unit, calculating attention weights of the regional features, and weighting the regional features to obtain global features;
and respectively connecting the region features with the global features in series to obtain target features, obtaining attention weights of the target features through the global attention unit, weighting the target features to obtain final features, and identifying and classifying the final features to obtain a participation identification result of the image data to be identified.
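Putting the pieces together, an end-to-end inference sketch for this claim could look like the following. It reuses the hypothetical helpers sketched above (extract_face_samples, the backbone returning region-crop features, RCNHead) and a simple two-class label set, all of which are assumptions for illustration rather than the patent's actual interfaces.

```python
import torch

ENGAGEMENT_LABELS = ["low", "high"]   # illustrative two-class label set

@torch.no_grad()
def predict_engagement(video_path, backbone, rcn_head, classifier):
    """Crop faces from a video to be identified, run the RCN, and report
    a participation prediction per detected face."""
    faces = [face for face, _ in extract_face_samples(video_path, label=None)]
    results = []
    for face in faces:
        # HWC uint8 crop -> normalized CHW float tensor with a batch axis
        x = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        region_feats = backbone(x)        # (1, n, d): random region crops of f_0
        p, _ = rcn_head(region_feats)     # final feature representation P
        pred = classifier(p).argmax(dim=1).item()
        results.append(ENGAGEMENT_LABELS[pred])
    return results
```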
8. An area coding and sample balance optimized classroom participation identification device, comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring video data of on-line learning of a student and generating original sample data according to the video data, and the original sample data comprises high-participation sample data and low-participation sample data;
the generating module is used for inputting the low participation sample data into a StarGAN model and generating target low participation samples with different styles;
the training module is used for inputting the original sample data and the target low participation sample into an RCN model for training to obtain a trained RCN model;
the second acquisition module is used for acquiring video data to be identified and generating image data to be identified according to the video data to be identified;
and the recognition module is used for inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
9. The apparatus of claim 8, wherein said generating original sample data from said video data comprises:
defining a participation label of the video data by utilizing manual work and prior information;
and extracting an image frame from the video data, cutting and extracting a face area of the image frame to obtain a face image as the original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to the participation degree label.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211246980.4A CN115439915A (en) | 2022-10-12 | 2022-10-12 | Classroom participation identification method and device based on region coding and sample balance optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115439915A true CN115439915A (en) | 2022-12-06 |
Family
ID=84251064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211246980.4A Pending CN115439915A (en) | 2022-10-12 | 2022-10-12 | Classroom participation identification method and device based on region coding and sample balance optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115439915A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111291819A (en) * | 2020-02-19 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Image recognition method and device, electronic equipment and storage medium |
CN111597978A (en) * | 2020-05-14 | 2020-08-28 | 公安部第三研究所 | Method for automatically generating pedestrian re-identification picture based on StarGAN network model |
CN113158872A (en) * | 2021-04-16 | 2021-07-23 | 中国海洋大学 | Online learner emotion recognition method |
CN113159002A (en) * | 2021-05-26 | 2021-07-23 | 重庆大学 | Facial expression recognition method based on self-attention weight auxiliary module |
CN113421187A (en) * | 2021-06-10 | 2021-09-21 | 山东师范大学 | Super-resolution reconstruction method, system, storage medium and equipment |
CN113344479A (en) * | 2021-08-06 | 2021-09-03 | 首都师范大学 | Online classroom-oriented learning participation intelligent assessment method and device |
CN113537254A (en) * | 2021-08-27 | 2021-10-22 | 重庆紫光华山智安科技有限公司 | Image feature extraction method and device, electronic equipment and readable storage medium |
CN113936317A (en) * | 2021-10-15 | 2022-01-14 | 南京大学 | Priori knowledge-based facial expression recognition method |
CN114065874A (en) * | 2021-11-30 | 2022-02-18 | 河北省科学院应用数学研究所 | Medicine glass bottle appearance defect detection model training method and device and terminal equipment |
CN114973126A (en) * | 2022-05-17 | 2022-08-30 | 中南大学 | Real-time visual analysis method for student participation degree of online course |
Non-Patent Citations (2)
Title |
---|
KAI WANG et al.: "Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition", IEEE Transactions on Image Processing, vol. 29, pages 1 *
YUNJEY CHOI et al.: "StarGAN v2: Diverse Image Synthesis for Multiple Domains", arXiv:1912.01865v2, pages 1 - 3 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Matern et al. | Exploiting visual artifacts to expose deepfakes and face manipulations | |
CN110889672B (en) | Student card punching and class taking state detection system based on deep learning | |
CN109558832A (en) | A kind of human body attitude detection method, device, equipment and storage medium | |
CN110263681A (en) | The recognition methods of facial expression and device, storage medium, electronic device | |
CN114359526B (en) | Cross-domain image style migration method based on semantic GAN | |
Li et al. | Image manipulation localization using attentional cross-domain CNN features | |
CN113283334B (en) | Classroom concentration analysis method, device and storage medium | |
Ververas et al. | Slidergan: Synthesizing expressive face images by sliding 3d blendshape parameters | |
CN111275638A (en) | Face restoration method for generating confrontation network based on multi-channel attention selection | |
Gafni et al. | Wish you were here: Context-aware human generation | |
CN113112416A (en) | Semantic-guided face image restoration method | |
Liu et al. | Modern architecture style transfer for ruin or old buildings | |
CN116403262A (en) | Online learning concentration monitoring method, system and medium based on machine vision | |
CN115731596A (en) | Spontaneous expression recognition method based on progressive label distribution and depth network | |
CN114549341A (en) | Sample guidance-based face image diversified restoration method | |
CN112070181A (en) | Image stream-based cooperative detection method and device and storage medium | |
CN114841887B (en) | Image recovery quality evaluation method based on multi-level difference learning | |
CN108665455B (en) | Method and device for evaluating image significance prediction result | |
CN115439915A (en) | Classroom participation identification method and device based on region coding and sample balance optimization | |
CN110210574A (en) | Diameter radar image decomposition method, Target Identification Unit and equipment | |
CN112115779B (en) | Interpretable classroom student emotion analysis method, system, device and medium | |
JP7362924B2 (en) | Data augmentation-based spatial analysis model learning device and method | |
Li et al. | Face mask removal based on generative adversarial network and texture network | |
CN114049303A (en) | Progressive bone age assessment method based on multi-granularity feature fusion | |
Narayana | Improving gesture recognition through spatial focus of attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20221206 |