CN115439915A - Classroom participation identification method and device based on region coding and sample balance optimization - Google Patents


Info

Publication number
CN115439915A
CN115439915A (application CN202211246980.4A)
Authority
CN
China
Prior art keywords: participation, model, sample, representing, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211246980.4A
Other languages
Chinese (zh)
Inventor
徐敏
张曦淼
王嘉豪
孙众
邱德慧
董瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University filed Critical Capital Normal University
Priority to CN202211246980.4A priority Critical patent/CN115439915A/en
Publication of CN115439915A publication Critical patent/CN115439915A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Medical Informatics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a classroom participation identification method based on region coding and sample balance optimization, which comprises the following steps: acquiring video data of students' online learning and generating original sample data from the video data, wherein the original sample data comprises high-participation sample data and low-participation sample data; inputting the low-participation sample data into a StarGAN model to generate target low-participation samples with different styles; inputting the original sample data and the target low-participation samples into an RCN model for training to obtain a trained RCN model; acquiring video data to be identified and generating image data to be identified from it; and inputting the image data to be identified into the trained RCN model to obtain a participation identification result. The method and the device effectively address the problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task, and significantly improve the discrimination and robustness of the network model.

Description

Classroom participation identification method and device based on region coding and sample balance optimization
Technical Field
The application relates to the cross-disciplinary technical field of intelligent education and computer vision, and in particular to a classroom participation identification method and device based on region coding and sample balance optimization.
Background
Online education provides a brand-new mode of knowledge dissemination and learning. Teachers can carry out educational activities such as live teaching, recorded playback, online question answering and homework correction through online education platforms such as MOOCs, and students can complete learning tasks at their own pace. Online teaching has the characteristics of abundant learning resources, timely knowledge acquisition and diverse learning modes, and has gradually become an organic component of routine education and teaching activities. Interaction between teachers and students is a key link in the teaching process. In a traditional classroom, a teacher can directly observe students' facial expressions and behaviors to judge their degree of engagement. In an online class, however, owing to factors such as the teaching scene, students lack the real-time, face-to-face interaction with teachers, their attention is easily dispersed, and teachers cannot obtain real-time feedback on students' engagement states; learning effects can only be judged through in-class questioning and after-class assignment feedback. Therefore, how to automatically evaluate students' learning participation in an online learning environment through computer vision technology is a problem that urgently needs to be solved.
Research on automatic participation recognition can be divided into two categories: methods based on traditional machine learning and methods based on deep learning. Recognition methods based on traditional computer vision typically estimate participation from facial features or manually extracted features of other modalities by means of machine learning. For the participation identification task, whether in an online or offline class, most students listen attentively and only a few are not concentrating, so participation data collected in a natural environment suffer from severely unbalanced sample distribution: the number of low-participation samples is very small, while high-participation samples account for a large proportion. Most existing participation recognition algorithms can obtain high accuracy on the overall classification task, but they tend to improve classification of the majority classes while neglecting the discrimination of minority-class samples. In addition, since student behavior during learning is not artificially constrained in the natural environment, part of the facial area is often inadvertently covered by the hands, so that facial expression changes cannot be captured; such cases are easily recognized by the model as distraction and receive a low participation prediction.
In summary, the learning participation identification methods in the prior art do not fully consider characteristics of the participation identification task such as unbalanced sample distribution and hand occlusion in participation samples, and therefore suffer from low identification accuracy.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a classroom participation identification method based on region coding and sample balance optimization, which solves the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing participation identification methods. A StarGAN (Star Generative Adversarial Network) model is provided to generate low-participation samples and enhance the participation database, and at the same time an RCN (Region Coding Network) model for face region coding is provided, which can adaptively learn the attention weights of different face regions and combine model feature learning with occlusion region coding, thereby significantly improving the discriminative power and robustness of the network model.
A second objective of the present application is to provide a classroom participation identification device based on region coding and sample balance optimization.
A third object of the present application is to propose a non-transitory computer-readable storage medium.
In order to achieve the above object, an embodiment of a first aspect of the present application provides a classroom participation identification method based on region coding and sample balance optimization, including: acquiring video data of on-line learning of a student, and generating original sample data according to the video data, wherein the original sample data comprises high participation sample data and low participation sample data; inputting low participation sample data into a StarGAN model to generate target low participation samples with different styles; inputting original sample data and target low participation samples into an RCN model for training to obtain a trained RCN model; acquiring video data to be identified, and generating image data to be identified according to the video data to be identified; and inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
Optionally, in an embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label of the video data by utilizing manual work and prior information;
extracting an image frame from video data, cutting a face area of the extracted image frame to obtain a face image as original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to a participation degree label.
Optionally, in an embodiment of the present application, the StarGAN model includes a mapping network, a style encoder, a generator and a discriminator, and before inputting the low participation sample data into the StarGAN model to generate target low participation samples with different styles, the method further comprises:
acquiring low participation training data, wherein the low participation training data are face images;
inputting low participation training data into the StarGAN model for training, and performing iterative optimization on the StarGAN model through a loss function.
Optionally, in an embodiment of the present application, the loss functions of the StarGAN model include an adversarial loss, a style reconstruction loss, a diversity sensitivity loss and a cycle consistency loss, wherein,

the adversarial loss is expressed as:

$$L_{adv}=\mathbb{E}_{x,y}\big[\log D_{y}(x)\big]+\mathbb{E}_{x,\tilde{y},z}\big[\log\big(1-D_{\tilde{y}}(G(x,\tilde{s}))\big)\big]$$

wherein $L_{adv}$ represents the adversarial loss, $\mathbb{E}[\cdot]$ represents the mathematical expectation, $x$ represents the input image, $y$ represents the original domain of the input image, $D_{y}(x)$ is the output of the discriminator branch for the original domain $y$, $\tilde{y}$ represents the target domain, $z$ represents random Gaussian noise, $\tilde{s}=F_{\tilde{y}}(z)$ represents the target-domain style feature generated by the mapping network from the random Gaussian noise, $D_{\tilde{y}}(G(x,\tilde{s}))$ represents the output of the discriminator on the image produced by the generator, and $G(x,\tilde{s})$ represents the fake image of domain $\tilde{y}$ generated by the generator from the input image and the target style feature;

the style reconstruction loss is expressed as:

$$L_{sty}=\mathbb{E}_{x,\tilde{y},z}\big[\big\|\tilde{s}-E_{\tilde{y}}(G(x,\tilde{s}))\big\|_{1}\big]$$

wherein $L_{sty}$ represents the style reconstruction loss, $E_{\tilde{y}}(\cdot)$ denotes the style encoder branch for the target domain, and the remaining symbols are as defined above;

the diversity sensitivity loss is expressed as:

$$L_{ds}=\mathbb{E}_{x,\tilde{y},z_{1},z_{2}}\big[\big\|G(x,\tilde{s}_{1})-G(x,\tilde{s}_{2})\big\|_{1}\big]$$

wherein $L_{ds}$ represents the diversity sensitivity loss, $z_{1}$ and $z_{2}$ represent random Gaussian noise vectors, $\tilde{s}_{1}$ and $\tilde{s}_{2}$ represent the style feature vectors output by the mapping network from $z_{1}$ and $z_{2}$ respectively, and $G(x,\tilde{s}_{1})$ and $G(x,\tilde{s}_{2})$ represent the images generated by the generator from the input image and the style features $\tilde{s}_{1}$ and $\tilde{s}_{2}$;

the cycle consistency loss is expressed as:

$$L_{cyc}=\mathbb{E}_{x,y,\tilde{y},z}\big[\big\|x-G(G(x,\tilde{s}),\hat{s})\big\|_{1}\big]$$

wherein $L_{cyc}$ represents the cycle consistency loss, $\hat{s}=E_{y}(x)$ is the estimated style code of the input image $x$, and $G(G(x,\tilde{s}),\hat{s})$ represents the image of style $\hat{s}$ reconstructed by the generator from the fake image $G(x,\tilde{s})$;

the StarGAN model is optimized using an objective function, where the objective function is expressed as:

$$\min_{G,F,E}\max_{D}\; L_{adv}+\lambda_{sty}L_{sty}-\lambda_{ds}L_{ds}+\lambda_{cyc}L_{cyc}$$

wherein $\min_{G,F,E}$ represents minimizing the objective function by training the generator, the mapping network and the style encoder, $\max_{D}$ represents maximizing the objective function by training the discriminator, $L_{adv}$ denotes the adversarial loss, $L_{sty}$ denotes the style reconstruction loss, $L_{ds}$ denotes the diversity sensitivity loss, $L_{cyc}$ denotes the cycle consistency loss, and $\lambda_{sty}$, $\lambda_{ds}$ and $\lambda_{cyc}$ are hyper-parameters used to balance the losses.
Optionally, in an embodiment of the present application, inputting low participation sample data into the StarGAN model, and generating target low participation samples with different styles, includes:
the method comprises the steps of inputting a face image in low-participation sample data into a StarGAN model, generating different style characteristics through a mapping network or a style encoder, and generating target low-participation samples with different styles through a generator according to the input face image and the different style characteristics.
Optionally, in an embodiment of the present application, the RCN model includes a feature extraction unit, a region attention unit and a global attention unit, and inputting the original sample data and the target low participation samples into the RCN model for training to obtain the trained RCN model includes:
inputting original sample data and a target low participation sample into an RCN model, and performing feature extraction on the original sample data and the target low participation sample through a feature extraction unit to obtain local area features of the sample;
in the feature space, the local region features of the sample are subjected to region coding through a region attention unit learning attention weights of different face regions, and global features of the sample are obtained;
respectively connecting the local area characteristics of the sample with the global characteristics of the sample in series to obtain sample characteristics, obtaining the attention weight of the sample characteristics through a global attention unit, and performing weighted fusion on the sample characteristics to obtain final sample characteristics;
and according to the characteristics of the final sample, performing iterative updating and optimization on the network parameters of the RCN model by using an SGD algorithm through combining the regional deviation loss and the cross entropy loss to obtain the trained RCN model.
Optionally, in an embodiment of the present application, inputting image data to be recognized into a trained RCN model to obtain an engagement recognition result, including:
inputting image data to be identified into a feature extraction unit for feature extraction to obtain a feature map, and randomly cutting the feature map into a preset number of regional features;
inputting the regional characteristics into a regional attention unit, calculating attention weight of the regional characteristics, and weighting the regional characteristics to obtain global characteristics;
and respectively connecting the regional characteristics with the global characteristics in series to obtain target characteristics, obtaining the attention weight of the target characteristics through a global attention unit, weighting the target characteristics to obtain final characteristics, and identifying and classifying the final characteristics to obtain the participation degree identification result of the image data to be identified.
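For illustration, a minimal sketch of the random cropping step above is given below; the 28×28×512 feature-map size and 6×6 crop size follow the detailed description later in this application, while the number of crops and the implementation details are assumptions.

```python
# Sketch of randomly cropping the feature map into n region features
# (the whole map is kept as region 0). Sizes follow the description
# elsewhere in this application; details are illustrative.
import torch


def random_region_crops(feature_map, n_regions=5, crop=6):
    """feature_map: (batch, 512, 28, 28) -> list of region feature tensors."""
    _, _, h, w = feature_map.shape
    regions = [feature_map]                       # region 0: the full feature map
    for _ in range(n_regions):
        top = torch.randint(0, h - crop + 1, (1,)).item()
        left = torch.randint(0, w - crop + 1, (1,)).item()
        regions.append(feature_map[:, :, top:top + crop, left:left + crop])
    return regions
```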
To achieve the above object, an embodiment of a second aspect of the present application provides a classroom participation identification device based on region coding and sample balance optimization, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring video data of on-line learning of a student and generating original sample data according to the video data, and the original sample data comprises high participation sample data and low participation sample data;
the generating module is used for inputting the low participation sample data into the StarGAN model and generating target low participation samples with different styles;
the training module is used for inputting original sample data and target low participation samples into the RCN model for training to obtain a trained RCN model;
the second acquisition module is used for acquiring the video data to be identified and generating image data to be identified according to the video data to be identified;
and the recognition module is used for inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
Optionally, in an embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label of the video data by utilizing manual work and prior information;
extracting an image frame from the video data, cutting a face area of the extracted image frame to obtain a face image as original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to the participation degree label.
In order to achieve the above object, a non-transitory computer-readable storage medium is provided in a third aspect of the present application, and when executed by a processor, the instructions in the storage medium can perform a classroom participation identification method based on region coding and sample balance optimization.
According to the classroom participation identification method and device based on region coding and sample balance optimization and the non-transitory computer-readable storage medium of the present application, the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing participation identification methods are solved: low-participation samples are generated with a StarGAN model to enhance the participation database, and at the same time a region coding network for face region coding is provided, which can adaptively learn the attention weights of different face regions and combine model feature learning with occlusion region coding, thereby significantly improving the discrimination and robustness of the network model.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a classroom participation identification method for area coding and sample balance optimization according to an embodiment of the present application;
fig. 2 is another flowchart of a classroom participation identification method for area coding and sample balance optimization according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an online learning low-participation image generated based on a StarGAN model by a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application;
fig. 4 is a schematic diagram of a low-participation sample generated based on the StarGAN model by the area coding and sample balance optimized classroom participation identification method according to the embodiment of the present application;
FIG. 5 is a schematic structural diagram of a feature extraction convolutional neural network of a classroom participation identification method for regional coding and sample balance optimization according to an embodiment of the present application;
fig. 6 is a schematic diagram of an RCN model-based engagement recognition framework of a classroom engagement recognition method for area coding and sample balance optimization according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a classroom participation identification device with area coding and sample balance optimization according to a second embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The method and apparatus for classroom participation identification with region coding and sample balance optimization according to the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application.
As shown in fig. 1, the classroom participation identification method based on region coding and sample balance optimization includes the following steps:
step 101, acquiring video data of online learning of a student, and generating original sample data according to the video data, wherein the original sample data comprises high participation sample data and low participation sample data;
step 102, inputting low-participation sample data into a StarGAN model to generate target low-participation samples with different styles;
103, inputting original sample data and a target low participation sample into an RCN model for training to obtain a trained RCN model;
104, acquiring video data to be identified, and generating image data to be identified according to the video data to be identified;
and 105, inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
According to the classroom participation identification method based on region coding and sample balance optimization of the present application, video data of students' online learning are acquired and original sample data are generated from the video data, wherein the original sample data comprise high-participation sample data and low-participation sample data; the low-participation sample data are input into the StarGAN model to generate target low-participation samples with different styles; the original sample data and the target low-participation samples are input into the RCN model for training to obtain the trained RCN model; video data to be identified are acquired and image data to be identified are generated from them; and the image data to be identified are input into the trained RCN model to obtain the participation identification result. Therefore, the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing participation identification methods can be solved: low-participation samples are generated with the StarGAN model to enhance the participation database, and at the same time a region coding network for face region coding is provided, so that the attention weights of different face regions can be adaptively learned and model feature learning is combined with occlusion region coding, thereby significantly improving the discrimination and robustness of the network model.
Further, in this embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label of the video data by utilizing manual work and prior information;
extracting an image frame from video data, cutting a face area of the extracted image frame to obtain a face image as original sample data, wherein the original sample data is divided into high-participation sample data and low-participation sample data according to a participation degree label.
Illustratively, videos of students' online learning can be acquired through a camera and saved as one video every 10 seconds, and a participation label in {0,1,2,3} is defined for each video manually with the aid of prior information.
Image frames are extracted with OpenCV, the face area of each extracted image frame is cropped with the open-source face recognition tool face_recognition, and the resulting face images are stored in a database as the original sample data. The original sample data can be divided into high-participation sample data and low-participation sample data according to the participation label of the video data; for example, original sample data generated from videos with participation labels 0 and 1 are assigned to the low-participation sample data, and original sample data generated from videos with participation labels 2 and 3 are assigned to the high-participation sample data.
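The preprocessing described above can be sketched as follows; this is a minimal illustration using OpenCV and the face_recognition package, in which the frame-sampling stride, the function name and the grouping threshold are assumptions for illustration rather than values fixed by the present application.

```python
# Minimal sketch of the preprocessing step: extract frames from a labelled
# 10-second clip, crop the face regions, and assign the clip to the high- or
# low-participation set according to its manual label in {0,1,2,3}.
import cv2
import face_recognition


def extract_face_samples(video_path, label, stride=30):
    """Return cropped face images and the sample group for one labelled clip."""
    faces = []
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % stride == 0:                      # sample one frame per `stride`
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            for top, right, bottom, left in face_recognition.face_locations(rgb):
                faces.append(rgb[top:bottom, left:right])
        index += 1
    capture.release()
    group = "low_participation" if label in (0, 1) else "high_participation"
    return faces, group
```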
Further, in an embodiment of the present application, the StarGAN model includes a mapping network, a style encoder, a generator and a discriminator, and before inputting the low participation sample data into the StarGAN model to generate target low participation samples with different styles, the method further includes the following steps:
acquiring low participation training data, wherein the low participation training data are face images;
inputting the training data with low participation into a StarGAN model for training, and performing iterative optimization on the StarGAN model through a loss function.
The present application introduces the idea of the adversarial game underlying generative adversarial networks and, based on the star generative adversarial network StarGAN, generates low-participation samples to expand the number of minority samples in the database and enhance the participation database, thereby mitigating the influence of the unbalanced data set.
Initializing StarGAN model parameters, inputting low-participation sample data with participation degree labels of 0 and 1 into a StarGAN model, generating low-participation samples with different styles, and enhancing a database.
The StarGAN model of the present application includes a mapping network, a style encoder, a generator, and a discriminator. The mapping network is composed of a multi-layer perceptron with a plurality of output branches, and can map given random Gaussian noise into diversified style characteristic representations. The style encoder can extract different style feature representations using a depth network given different reference images. The mapping network and the style encoder each have a plurality of output branches, each branch corresponding to a style characteristic of a particular domain. The generator generates a false image with multiple styles but unchanged content according to the given input image and style characteristics. The discriminator has a plurality of output branches corresponding to a plurality of target domains, each output branch being a classifier for discriminating whether the input image is authentic at a specific target domain thereof.
In the StarGAN model training process, the generator combines the input style features to generate images that are as realistic as possible with the given style, while the discriminator tries to identify the fake images produced by the generator; the two play against each other continuously, the generator's ability to produce realistic images keeps improving, and finally the fake images produced by the generator become as close as possible to real images.
According to the data distribution of students' online learning participation, the domains of the participation data are defined based on the students' participation degree, i.e., the concept of domain in this application refers to the participation label, and the style features of an image include the person's hair style, skin color, beard, whether glasses are worn, the angle and posture of the eyes gazing at the screen, and so on.
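As a concrete illustration of the multi-branch design described above, a minimal PyTorch sketch of a mapping network head is given below; the layer sizes, latent dimension and number of domains are illustrative assumptions, not the reference implementation of this application.

```python
# Sketch of the per-domain branch design: the mapping network turns random
# Gaussian noise into a style vector, with one output branch per domain
# (here, per participation label). Dimensions are illustrative.
import torch
import torch.nn as nn


class MappingNetwork(nn.Module):
    def __init__(self, latent_dim=16, style_dim=64, num_domains=4):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        # one output branch per domain; each branch emits a style vector
        self.branches = nn.ModuleList(
            [nn.Linear(512, style_dim) for _ in range(num_domains)]
        )

    def forward(self, z, target_domain):
        h = self.shared(z)                            # (batch, 512)
        styles = torch.stack([b(h) for b in self.branches], dim=1)
        # pick the style vector of the requested target domain per sample
        idx = target_domain.view(-1, 1, 1).expand(-1, 1, styles.size(2))
        return styles.gather(1, idx).squeeze(1)       # (batch, style_dim)


z = torch.randn(8, 16)
y_trg = torch.randint(0, 4, (8,))
style = MappingNetwork()(z, y_trg)                    # style features for domain y_trg
```

A style encoder head can follow the same branching pattern, with a convolutional trunk in place of the multilayer perceptron.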
Further, in the embodiment of the present application, the loss functions of the StarGAN model include an adversarial loss, a style reconstruction loss, a diversity sensitivity loss and a cycle consistency loss, wherein,

the adversarial loss is expressed as:

$$L_{adv}=\mathbb{E}_{x,y}\big[\log D_{y}(x)\big]+\mathbb{E}_{x,\tilde{y},z}\big[\log\big(1-D_{\tilde{y}}(G(x,\tilde{s}))\big)\big]$$

wherein $L_{adv}$ represents the adversarial loss, $\mathbb{E}[\cdot]$ represents the mathematical expectation, $x$ represents the input image, $y$ represents the original domain of the input image, $D_{y}(x)$ is the output of the discriminator branch for the original domain $y$, $\tilde{y}$ represents the target domain, $z$ represents random Gaussian noise, $\tilde{s}=F_{\tilde{y}}(z)$ represents the target-domain style feature generated by the mapping network from the random Gaussian noise, $D_{\tilde{y}}(G(x,\tilde{s}))$ represents the output of the discriminator on the image produced by the generator, and $G(x,\tilde{s})$ represents the fake image of domain $\tilde{y}$ generated by the generator from the input image and the target style feature; the fake image and the target domain are input into the discriminator so that the discriminator learns to distinguish whether the input image is real;

the style reconstruction loss is expressed as:

$$L_{sty}=\mathbb{E}_{x,\tilde{y},z}\big[\big\|\tilde{s}-E_{\tilde{y}}(G(x,\tilde{s}))\big\|_{1}\big]$$

wherein $L_{sty}$ represents the style reconstruction loss, $E_{\tilde{y}}(\cdot)$ denotes the style encoder branch for the target domain, and the remaining symbols are as defined above;

the diversity sensitivity loss is expressed as:

$$L_{ds}=\mathbb{E}_{x,\tilde{y},z_{1},z_{2}}\big[\big\|G(x,\tilde{s}_{1})-G(x,\tilde{s}_{2})\big\|_{1}\big]$$

wherein $L_{ds}$ represents the diversity sensitivity loss, $z_{1}$ and $z_{2}$ represent random Gaussian noise vectors, $\tilde{s}_{1}$ and $\tilde{s}_{2}$ represent the style feature vectors output by the mapping network from $z_{1}$ and $z_{2}$ respectively, and $G(x,\tilde{s}_{1})$ and $G(x,\tilde{s}_{2})$ represent the images generated by the generator from the input image and the style features $\tilde{s}_{1}$ and $\tilde{s}_{2}$; this loss maximizes the difference between images generated with different styles, thereby encouraging the generator to produce images of more diverse styles during training;

the cycle consistency loss is expressed as:

$$L_{cyc}=\mathbb{E}_{x,y,\tilde{y},z}\big[\big\|x-G(G(x,\tilde{s}),\hat{s})\big\|_{1}\big]$$

wherein $L_{cyc}$ represents the cycle consistency loss, $\hat{s}=E_{y}(x)$ is the estimated style code of the input image $x$, and $G(G(x,\tilde{s}),\hat{s})$ represents the image of style $\hat{s}$ reconstructed by the generator from the fake image $G(x,\tilde{s})$; by constraining the L1 loss between $G(G(x,\tilde{s}),\hat{s})$ and the input image $x$, the generator is made to retain some of the original characteristics of $x$ while changing its style;

the StarGAN model is optimized using an objective function, where the objective function is expressed as:

$$\min_{G,F,E}\max_{D}\; L_{adv}+\lambda_{sty}L_{sty}-\lambda_{ds}L_{ds}+\lambda_{cyc}L_{cyc}$$

wherein $\min_{G,F,E}$ represents minimizing the objective function by training the generator, the mapping network and the style encoder, $\max_{D}$ represents maximizing the objective function by training the discriminator, $L_{adv}$ denotes the adversarial loss, $L_{sty}$ denotes the style reconstruction loss, $L_{ds}$ denotes the diversity sensitivity loss, $L_{cyc}$ denotes the cycle consistency loss, and $\lambda_{sty}$, $\lambda_{ds}$ and $\lambda_{cyc}$ are hyper-parameters used to balance the losses.
The loss functions of the StarGAN model of the present application include the adversarial loss, the style reconstruction loss, the diversity sensitivity loss and the cycle consistency loss. The adversarial loss makes the generator and the discriminator optimize against each other during training, continuously improving model performance. The style reconstruction loss makes the generator use the specified style representation when generating an image; a larger loss value results if another style representation is used. The diversity sensitivity loss makes the images produced by the generator diverse by maximizing the L1 loss between two generated images with different styles, where the L1 loss, used to minimize error, is expressed as the absolute value of the difference between the true and predicted values. The cycle consistency loss is used to ensure that certain unaltered characteristics of the input image are correctly retained in the generated image.
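To make the interaction of these losses concrete, a minimal PyTorch sketch of the generator-side objective is given below; the call signatures of the generator G, discriminator D, mapping network F_map and style encoder E_sty are assumptions consistent with the sketches in this description, not the reference implementation of the application.

```python
# Sketch of the generator-side StarGAN objective
# L_adv + lambda_sty * L_sty - lambda_ds * L_ds + lambda_cyc * L_cyc.
# G(x, s), D(x, domain), F_map(z, domain) and E_sty(x, domain) are assumed
# module interfaces; hyper-parameter values are illustrative.
import torch
import torch.nn.functional as nnf


def generator_objective(G, D, F_map, E_sty, x, y_org, y_trg, z1, z2,
                        lambda_sty=1.0, lambda_ds=1.0, lambda_cyc=1.0):
    s1 = F_map(z1, y_trg)                      # target style from Gaussian noise
    s2 = F_map(z2, y_trg)
    x_fake = G(x, s1)

    # adversarial term: the generator wants the target-domain branch of the
    # discriminator to judge the fake image as real
    logit_fake = D(x_fake, y_trg)
    l_adv = nnf.binary_cross_entropy_with_logits(logit_fake,
                                                 torch.ones_like(logit_fake))

    # style reconstruction: the style encoder should recover s1 from the fake
    l_sty = torch.mean(torch.abs(E_sty(x_fake, y_trg) - s1))

    # diversity sensitivity: push two differently styled outputs apart (L1)
    l_ds = torch.mean(torch.abs(x_fake - G(x, s2)))

    # cycle consistency: translating back with the source style recovers x
    s_org = E_sty(x, y_org)
    l_cyc = torch.mean(torch.abs(x - G(x_fake, s_org)))

    return l_adv + lambda_sty * l_sty - lambda_ds * l_ds + lambda_cyc * l_cyc
```

The discriminator side is updated with the corresponding adversarial term on real and generated images, alternating with the generator update.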
Further, in the embodiment of the present application, inputting low participation sample data into the StarGAN model, and generating target low participation samples with different styles, includes:
inputting the face image in the low participation sample data into a StarGAN model, generating different style characteristics through a mapping network or a style encoder, and generating target low participation samples with different styles through a generator according to the input face image and the different style characteristics.
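A minimal sketch of this generation step is shown below: given a trained generator and mapping network (or style encoder), several differently styled low-participation images are produced from one low-participation face image. The module interfaces follow the sketches above and are assumptions for illustration.

```python
# Sketch: generate diverse low-participation samples from one real
# low-participation face image. `generator`, `mapping_net` and
# `style_encoder` are assumed trained modules with the interfaces used above.
import torch


@torch.no_grad()
def generate_low_participation_samples(x, y_trg, generator, mapping_net,
                                       style_encoder=None, x_ref=None, n_samples=5):
    """x: (1, 3, H, W) source face; y_trg: target (low-participation) domain."""
    fakes = []
    for _ in range(n_samples):
        if style_encoder is not None and x_ref is not None:
            s = style_encoder(x_ref, y_trg)          # style from a reference image
        else:
            z = torch.randn(x.size(0), 16)           # latent dim is illustrative
            s = mapping_net(z, y_trg)                # style from Gaussian noise
        fakes.append(generator(x, s))                # same content, new style
    return fakes
```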
Further, in this embodiment of the present application, the RCN model includes a feature extraction unit, a region attention unit and a global attention unit, and inputting the original sample data and the target low-participation samples into the RCN model for training to obtain the trained RCN model includes:
inputting original sample data and a target low participation sample into an RCN model, and performing feature extraction on the original sample data and the target low participation sample through a feature extraction unit to obtain local area features of the sample;
in a feature space, a region attention unit learns attention weights of different face regions to perform region coding on local region features of a sample to obtain global features of the sample;
respectively connecting the local area characteristics of the sample with the global characteristics of the sample in series to obtain sample characteristics, obtaining the attention weight of the sample characteristics through a global attention unit, and performing weighted fusion on the sample characteristics to obtain final sample characteristics;
and according to the characteristics of the final sample, performing iterative updating and optimization on the network parameters of the RCN model by using an SGD algorithm through combining the regional deviation loss and the cross entropy loss to obtain the trained RCN model.
According to the method and the device, the region coding is carried out by learning the attention weights of different face regions, so that the model focuses more on the region with larger weight, and the model identification performance is further improved.
The method comprises the steps of inputting original sample data and a target low participation sample into an RCN together, firstly, carrying out feature extraction on the input sample, and then carrying out region coding in a feature space by learning attention weights of different face regions; weighting and fusing all local region features to obtain a global feature, connecting the local feature and the global feature in series, obtaining more accurate weight by adopting an attention mechanism, and obtaining final feature representation after weighting and fusing; and finally, carrying out iterative updating and optimization on network parameters by using an SGD algorithm through combining the regional deviation loss and the cross entropy loss to obtain a more optimal participation degree identification model.
The region bias loss is used to constrain the attention weights $\alpha_i$, i.e., a hyper-parameter $\delta$ is used to require that the attention weight $\alpha_i$ of some local region $F_i$ be larger, by a margin, than the weight $\alpha_0$ of the original full face image $F_0$.

The region bias loss is expressed as:

$$L_{RB}=\max\{0,\;\delta-(\alpha_{max}-\alpha_{0})\}$$

wherein $L_{RB}$ denotes the region bias loss, $\delta$ denotes the hyper-parameter, $\alpha_{0}$ is the attention weight of the original face image, and $\alpha_{max}$ denotes the maximum weight over all local regions.

The cross entropy loss is expressed as:

$$L_{CE}(p,y)=-\sum_{i=1}^{N}y_{i}\log p_{i}$$

wherein $L_{CE}(p,y)$ represents the cross entropy loss, $N$ represents the number of samples, $y_{i}$ represents the label of the $i$-th sample, and $p_{i}$ represents the $i$-th output computed by the model.
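A minimal sketch of the combined training objective is given below; it adds the region bias loss to the cross entropy loss and performs one SGD update. The margin value, learning rate and the way logits and attention weights are returned by the model are illustrative assumptions.

```python
# Sketch of one RCN training step with the combined loss L = L_CE + L_RB,
# optimized with SGD. Hyper-parameters are illustrative.
import torch
import torch.nn.functional as F


def region_bias_loss(alpha, delta=0.02):
    """alpha: (batch, n+1) attention weights, column 0 = whole-face weight."""
    alpha0 = alpha[:, 0]
    alpha_max = alpha[:, 1:].max(dim=1).values
    return torch.clamp(delta - (alpha_max - alpha0), min=0).mean()


def train_step(model, optimizer, images, labels, delta=0.02):
    # the model is assumed to return class logits and the region weights alpha
    logits, alpha = model(images)
    loss = F.cross_entropy(logits, labels) + region_bias_loss(alpha, delta)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```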
Further, in this embodiment of the present application, inputting image data to be recognized into a trained RCN model to obtain a result of participation degree recognition, including:
inputting image data to be identified into a feature extraction unit for feature extraction to obtain a feature map, and randomly cutting the feature map into a preset number of regional features;
inputting the regional characteristics into a regional attention unit, calculating attention weight of the regional characteristics, and weighting the regional characteristics to obtain global characteristics;
and respectively connecting the regional characteristics with the global characteristics in series to obtain target characteristics, obtaining the attention weight of the target characteristics through a global attention unit, weighting the target characteristics to obtain final characteristics, and identifying and classifying the final characteristics to obtain the participation degree identification result of the image data to be identified.
OpenCV is used to extract image frames from the video to be recognized, and the open-source face recognition tool face_recognition is used to crop the face area of each image frame, yielding face images as the images to be recognized. An image to be recognized is input into the trained RCN model: the facial features of the input image are first extracted and randomly cropped, the weights of the different face regions are then learned adaptively and fused by weighting to obtain the global feature, the local features and the global feature are concatenated, participation recognition is performed, and the recognition result is output.
The RCN model in the application comprises a feature extraction unit, a region attention unit and a global attention unit.
The method for recognizing the image to be recognized based on the RCN model is described in detail below.
The feature extraction unit takes the facial expression image to be recognized, of size 224×224×3, as input and performs feature extraction with a convolutional neural network, obtaining a feature map $f_0$ of size 28×28×512. The convolutional neural network includes 10 convolutional layers and 3 pooling layers: two convolutions with 64 kernels followed by one pooling, two convolutions with 128 kernels followed by another pooling, three convolutions with 256 kernels followed by a further pooling, and finally three convolutions with 512 kernels, which yields the feature map $f_0$. Then $f_0$ is randomly cropped into $n$ region features $f_i$ $(i=1,2,\dots,n)$ of size 6×6×512, and each region is processed separately by the region attention unit. The region attention unit is implemented by an attention network comprising a pooling layer, two convolutional layers with 512 and 128 kernels respectively, a fully connected layer and a sigmoid layer. By computing the attention weights $\alpha_i$ $(i=0,1,\dots,n)$ of the input region features $f_i$ $(i=0,1,\dots,n)$ and weighting the region features $f_i$, a global attention representation $f_m$ is obtained, which assists the region coding mechanism to optimize from a global perspective and adaptively adjusts the weight parameters.

The attention weight $\alpha_i$ of a region feature is expressed as:

$$\alpha_{i}=\mathrm{sigmoid}\big(f_{i}^{T}\cdot q\big)$$

wherein $\mathrm{sigmoid}(\cdot)$ is the nonlinear activation function, $f_{i}^{T}$ is the transposed region feature, and $q$ denotes the parameters of the fully connected layer.

The global attention representation $f_m$ is expressed as:

$$f_{m}=\frac{\sum_{i=0}^{n}\alpha_{i}f_{i}}{\sum_{i=0}^{n}\alpha_{i}}$$

wherein $n$ represents the number of regions, $\alpha_i$ represents the attention weight of a region feature, and $f_i$ represents the region feature.
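A minimal PyTorch sketch of the region attention unit described above follows; the layer sequence (pooling, 512- and 128-kernel convolutions, fully connected layer, sigmoid) mirrors the description, while the pooling size and the normalized weighted sum are assumptions for illustration.

```python
# Sketch of the region attention unit: each region feature is scored with an
# attention weight alpha_i, and the weighted regions form the global
# attention representation f_m. Layer sizes follow the description above.
import torch
import torch.nn as nn


class RegionAttentionUnit(nn.Module):
    def __init__(self, in_channels=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(3),                      # pooling layer
            nn.Conv2d(in_channels, 512, kernel_size=1), nn.ReLU(),
            nn.Conv2d(512, 128, kernel_size=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 3 * 3, 1),                    # fully connected layer q
            nn.Sigmoid(),                                 # alpha_i in (0, 1)
        )

    def forward(self, regions):
        """regions: list of (batch, 512, h, w) crops, index 0 = whole feature map."""
        pooled = [r.mean(dim=(2, 3)) for r in regions]    # (batch, 512) per region
        alphas = torch.cat([self.score(r) for r in regions], dim=1)  # (batch, n+1)
        feats = torch.stack(pooled, dim=1)                # (batch, n+1, 512)
        f_m = (alphas.unsqueeze(-1) * feats).sum(1) / alphas.sum(1, keepdim=True)
        return f_m, feats, alphas
```

This unit composes directly with the random cropping sketch given earlier: the list returned by random_region_crops is passed to RegionAttentionUnit.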
A region bias loss is used in the region attention unit to constrain the attention weights $\alpha_i$, i.e., to require that the attention weight $\alpha_i$ of some region $f_i$ $(i=1,2,\dots,n)$ be larger than the weight $\alpha_0$ of the original face image $f_0$; by "encouraging" the RCN model in this way, the attention paid to important regions is increased, so that the model obtains better region and global representation weights.

The region bias loss function is expressed as:

$$L_{RB}=\max\{0,\;\delta-(\alpha_{max}-\alpha_{0})\}$$

wherein $L_{RB}$ denotes the region bias loss, $\delta$ is a hyper-parameter, $\alpha_{0}$ is the attention weight of the original face image, and $\alpha_{max}$ denotes the maximum weight over all local regions.
The global attention unit is implemented by an attention network comprising a fully connected layer and a sigmoid layer. The region features $f_i$ $(i=0,1,\dots,n)$ are each concatenated with the global representation feature $f_m$ to obtain the target features $[f_i:f_m]$; the attention weights $\beta_i$ $(i=0,1,\dots,n)$ are then obtained through the global attention unit, the target features $[f_i:f_m]$ are weighted to obtain the final feature representation $P$, and finally $P$ is used for recognition and classification.

The attention weight $\beta_i$ of a target feature is expressed as:

$$\beta_{i}=\mathrm{sigmoid}\big([f_{i}:f_{m}]^{T}\cdot q_{1}\big)$$

wherein $\mathrm{sigmoid}(\cdot)$ is the nonlinear activation function, $[f_{i}:f_{m}]^{T}$ denotes the transposed feature obtained by concatenating the region feature with the global feature, and $q_{1}$ denotes the parameters of the fully connected layer.

The final feature representation $P$ is expressed as:

$$P=\frac{\sum_{i=0}^{n}\alpha_{i}\beta_{i}[f_{i}:f_{m}]}{\sum_{i=0}^{n}\alpha_{i}\beta_{i}}$$

wherein $n$ represents the number of regions, $\alpha_i$ represents the attention weight of a region feature, $\beta_i$ represents the attention weight of a target feature, and $[f_i:f_m]$ represents the target feature.
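A companion sketch of the global attention unit is given below; it consumes the per-region features and weights produced by the region attention sketch above, and the normalized fusion of the target features follows the formula just stated. The classifier head and its output size are illustrative assumptions.

```python
# Sketch of the global attention unit: concatenate each region feature with
# the global representation f_m, score it with beta_i, fuse into the final
# representation P, and classify P into participation levels {0,1,2,3}.
import torch
import torch.nn as nn


class GlobalAttentionUnit(nn.Module):
    def __init__(self, feat_dim=512, num_classes=4):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim * 2, 1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim * 2, num_classes)

    def forward(self, feats, alphas, f_m):
        """feats: (batch, n+1, 512); alphas: (batch, n+1); f_m: (batch, 512)."""
        f_m_rep = f_m.unsqueeze(1).expand_as(feats)
        target = torch.cat([feats, f_m_rep], dim=-1)          # [f_i : f_m]
        betas = self.score(target).squeeze(-1)                 # (batch, n+1)
        weight = (alphas * betas).unsqueeze(-1)
        p = (weight * target).sum(1) / weight.sum(1).clamp_min(1e-8)
        return self.classifier(p), betas
```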
Fig. 2 is another flowchart of a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application.
As shown in FIG. 2, the classroom participation identification method based on region coding and sample balance optimization comprises: capturing real-time online learning pictures of the learners with a camera and performing data preprocessing synchronously; inputting the low-participation samples into the StarGAN model and generating low-participation samples with different styles through the mapping network or the style encoder, so as to expand the number of minority samples in the database and mitigate the influence of the unbalanced data set; inputting the original data and the generated low-participation samples into the RCN together and performing region coding by learning the attention weights of different face regions, so that the model pays more attention to regions with larger weights; and passing the real-time learning video collected online through the trained participation recognition framework to obtain the participation recognition result.
Fig. 3 is a schematic structural diagram of an online learning low-participation image generated based on a StarGAN model by a classroom participation identification method based on region coding and sample balance optimization according to an embodiment of the present application.
As shown in FIG. 3, random Gaussian noise $z$ and a reference image are input into the mapping network and the style encoder respectively to generate the target style feature $\tilde{s}$; the target style feature $\tilde{s}$ and the given image $x$ are input into the generator $G$ to generate a fake image; finally, the discriminator $D$ judges the generated image, and a low-participation sample is obtained.
Fig. 4 is a schematic diagram of a low-participation sample generated based on the StarGAN model by the area coding and sample balance optimized classroom participation identification method according to the embodiment of the present application.
As shown in fig. 4, given an input image and a reference image, different low-participation generated image samples are obtained after different training iterations.
Fig. 5 is a schematic structural diagram of a feature extraction convolutional neural network of a classroom participation identification method for regional coding and sample balance optimization according to an embodiment of the present application.
As shown in fig. 5, the feature extraction unit performs feature extraction by using a convolutional neural network, performs feature extraction by using a facial expression image with a size of 224 × 224 × 3 as an input, reduces the dimensions of the width and the height of the features after passing through the VGG16 model, and increases the number of channels, thereby finally obtaining a feature map with dimensions of 28 × 28 × 512.
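The VGG16-style feature extractor of FIG. 5 can be sketched as follows; this minimal version reuses the torchvision VGG16 convolutional layers up to the stage that outputs a 28×28×512 map for a 224×224×3 input, which is an assumed but shape-consistent reading of the description.

```python
# Sketch of the feature extraction unit: a VGG16-style stack of 10
# convolutional layers and 3 poolings mapping a 224x224x3 face image to a
# 28x28x512 feature map. Using torchvision's VGG16 features up to index 22
# (conv4_3 + ReLU) is an assumption that matches the stated output size.
import torch
from torchvision.models import vgg16

backbone = vgg16(weights=None).features[:23]   # conv1_1 ... conv4_3, 3 max-pools

x = torch.randn(1, 3, 224, 224)                # a face image to be recognized
f0 = backbone(x)
print(f0.shape)                                # torch.Size([1, 512, 28, 28])
```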
Fig. 6 is a schematic diagram of an RCN model-based engagement recognition framework of a classroom engagement recognition method for region coding and sample balance optimization according to an embodiment of the present application.
As shown in FIG. 6, the original images and the generated images are input into the feature extraction unit for feature extraction to obtain the feature map $f_0$; $f_0$ is randomly cropped into $n$ region features $f_i$ $(i=1,\dots,n)$; the region attention unit computes the attention weights $\alpha_i$ $(i=0,1,\dots,n)$ of the input region features $f_i$ and weights the region features to obtain the global attention representation $f_m$; the region features $f_i$ $(i=0,1,\dots,n)$ are each concatenated with the global representation feature $f_m$ to obtain the target features $[f_i:f_m]$; the attention weights $\beta_i$ $(i=0,1,\dots,n)$ are then obtained through the global attention unit, and the target features $[f_i:f_m]$ are weighted to obtain the final feature representation $P$.
Fig. 7 is a schematic structural diagram of a classroom participation identification device with area coding and sample balance optimization according to a second embodiment of the present application.
As shown in fig. 7, the classroom participation identification device for area coding and sample balance optimization comprises:
the first obtaining module 10 is configured to obtain video data of online learning of a student, and generate original sample data according to the video data, where the original sample data includes high participation sample data and low participation sample data;
the generating module 20 is configured to input the low participation sample data into the StarGAN model, and generate target low participation samples with different styles;
the training module 30 is configured to input the original sample data and the target low participation sample into the RCN model for training, so as to obtain a trained RCN model;
the second obtaining module 40 is configured to obtain video data to be identified, and generate image data to be identified according to the video data to be identified;
and the recognition module 50 is configured to input the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
The classroom participation identification device based on region coding and sample balance optimization of the present application comprises: a first acquisition module for acquiring video data of students' online learning and generating original sample data from the video data, wherein the original sample data comprise high-participation sample data and low-participation sample data; a generating module for inputting the low-participation sample data into the StarGAN model and generating target low-participation samples with different styles; a training module for inputting the original sample data and the target low-participation samples into the RCN model for training to obtain the trained RCN model; a second acquisition module for acquiring the video data to be identified and generating image data to be identified from them; and a recognition module for inputting the image data to be identified into the trained RCN model to obtain the participation identification result. Therefore, the technical problems of extremely unbalanced sample distribution and faces occluded by hands in the participation identification task of existing participation identification methods can be solved: low-participation samples are generated with the StarGAN model to enhance the participation database, and at the same time a region coding network for face region coding is provided, so that the attention weights of different face regions can be adaptively learned and model feature learning is combined with occlusion region coding, thereby significantly improving the discrimination and robustness of the network model.
Further, in this embodiment of the present application, generating original sample data according to video data includes:
defining a participation degree label for the video data by means of manual annotation and prior information;
extracting image frames from the video data and cropping the face region of each extracted frame to obtain face images as the original sample data, wherein the original sample data is divided into high participation sample data and low participation sample data according to the participation degree label.
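A hedged sketch of this sample-generation step is given below; the OpenCV Haar-cascade face detector and the frame-sampling interval are illustrative assumptions, since the embodiment only requires that face regions be cropped from extracted frames and paired with the engagement label.

```python
# Extract frames from a learning video, crop detected faces, and attach the
# manually defined participation label to each face crop.
import cv2

def extract_face_samples(video_path, label, every_n_frames=30):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    samples = []          # (face_image, participation_label) pairs
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, 1.1, 5)
            for (x, y, w, h) in faces:
                samples.append((frame[y:y + h, x:x + w], label))
        idx += 1
    cap.release()
    return samples
```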
In order to achieve the above embodiments, the present application further proposes a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements the classroom participation identification method for area coding and sample balance optimization of the above embodiments.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.
Any process or method description in the flow charts or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a custom logic function or process. Alternate implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application pertain.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application and that changes, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A classroom participation identification method based on regional coding and sample balance optimization is characterized by comprising the following steps:
the method comprises the steps of obtaining video data of on-line learning of a student, and generating original sample data according to the video data, wherein the original sample data comprises high-participation sample data and low-participation sample data;
inputting the low participation sample data into a StarGAN model to generate target low participation samples with different styles;
inputting the original sample data and the target low participation sample into an RCN model for training to obtain a trained RCN model;
acquiring video data to be identified, and generating image data to be identified according to the video data to be identified;
and inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
2. The method of claim 1, wherein said generating original sample data from said video data comprises:
defining a participation degree label for the video data by means of manual annotation and prior information;
and extracting image frames from the video data and cropping the face region of each extracted image frame to obtain face images as the original sample data, wherein the original sample data is divided into high participation sample data and low participation sample data according to the participation degree label.
3. The method of claim 1, wherein the StarGAN model comprises a mapping network, a style encoder, a generator and a discriminator, and wherein before inputting the low participation sample data into the StarGAN model to generate target low participation samples with different styles, the method further comprises:
acquiring low participation training data, wherein the low participation training data are face images;
inputting the low participation training data into a StarGAN model for training, and performing iterative optimization on the StarGAN model through a loss function.
4. The method of claim 3, wherein the loss function of the StarGAN model comprises: an adversarial loss, a style reconstruction loss, a diversity sensitivity loss, and a cycle consistency loss, wherein,
the adversarial loss is expressed as:

L_{adv} = \mathbb{E}_{x,y}\left[\log D_y(x)\right] + \mathbb{E}_{x,\tilde{y},z}\left[\log\left(1 - D_{\tilde{y}}(G(x,\tilde{s}))\right)\right]

wherein L_{adv} denotes the adversarial loss, \mathbb{E}(\cdot) denotes the mathematical expectation, x denotes the input image, y denotes the original domain of the input image, D_y(x) is the output of the discriminator in the original domain y, \tilde{y} denotes the target domain, z denotes random Gaussian noise, \tilde{s} = F_{\tilde{y}}(z) denotes the target-domain style feature generated by the mapping network from the random Gaussian noise, D_{\tilde{y}}(G(x,\tilde{s})) denotes the output of the discriminator on the image generated by the generator, and G(x,\tilde{s}) denotes the fake image of the target domain generated by the generator from the input image and the target style feature;
the style reconstruction loss is expressed as:

L_{sty} = \mathbb{E}_{x,\tilde{y},z}\left[\left\| \tilde{s} - E_{\tilde{y}}(G(x,\tilde{s})) \right\|_1\right]

wherein L_{sty} denotes the style reconstruction loss, \mathbb{E}(\cdot) denotes the mathematical expectation, x denotes the input image, y denotes the original domain of the input image, \tilde{y} denotes the target domain, z denotes random Gaussian noise, \tilde{s} = F_{\tilde{y}}(z) denotes the target-domain style feature generated by the mapping network from the random Gaussian noise, G(x,\tilde{s}) denotes the fake image generated by the generator from the input image and the target style feature, and E_{\tilde{y}}(\cdot) denotes the style code extracted by the style encoder in the target domain;
the diversity sensitivity loss is expressed as:

L_{ds} = \mathbb{E}_{x,\tilde{y},z_1,z_2}\left[\left\| G(x,\tilde{s}_1) - G(x,\tilde{s}_2) \right\|_1\right]

wherein L_{ds} denotes the diversity sensitivity loss, \mathbb{E}(\cdot) denotes the mathematical expectation, z_1 and z_2 denote random Gaussian noise vectors, \tilde{s}_1 = F_{\tilde{y}}(z_1) and \tilde{s}_2 = F_{\tilde{y}}(z_2) denote the style feature vectors output by the mapping network for z_1 and z_2 respectively, and G(x,\tilde{s}_1) and G(x,\tilde{s}_2) denote the images generated by the generator from the input image and the style features \tilde{s}_1 and \tilde{s}_2 respectively;
the cycle consistency loss is expressed as:

L_{cyc} = \mathbb{E}_{x,y,\tilde{y},z}\left[\left\| x - G(G(x,\tilde{s}), \hat{s}) \right\|_1\right]

wherein L_{cyc} denotes the cycle consistency loss, \mathbb{E}(\cdot) denotes the mathematical expectation, x denotes the input image, y denotes the original domain of the input image, \tilde{y} denotes the target domain, z denotes random Gaussian noise, \hat{s} = E_y(x) denotes the estimated style code of the input image x, G(x,\tilde{s}) denotes the fake image generated by the generator from the input image and the target style feature, and G(G(x,\tilde{s}), \hat{s}) denotes the image of style \hat{s} reconstructed by feeding the fake image G(x,\tilde{s}) and \hat{s} back into the generator;
optimizing the StarGAN model using an objective function, wherein the objective function is represented as:
\min_{G,F,E} \max_{D} \; L_{adv} + \lambda_{sty} L_{sty} - \lambda_{ds} L_{ds} + \lambda_{cyc} L_{cyc}

wherein \min_{G,F,E} denotes minimizing the objective function by training the generator, the mapping network and the style encoder, \max_{D} denotes maximizing the objective function by training the discriminator, L_{adv} denotes the adversarial loss, L_{sty} denotes the style reconstruction loss, L_{ds} denotes the diversity sensitivity loss, L_{cyc} denotes the cycle consistency loss, and \lambda_{sty}, \lambda_{ds} and \lambda_{cyc} are hyperparameters used to balance the losses.
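For illustration, the following is a minimal sketch of how the four losses of claim 4 combine into this objective; the individual loss tensors are assumed to be computed elsewhere according to the formulas above, and the λ values shown are placeholders rather than values fixed by the application.

```python
# Illustrative lambda weights; not prescribed by the application.
lambda_sty, lambda_ds, lambda_cyc = 1.0, 1.0, 1.0

def generator_objective(l_adv, l_sty, l_ds, l_cyc):
    # minimized over G, the mapping network F and the style encoder E;
    # the diversity term is subtracted so the generator is pushed toward diverse outputs
    return l_adv + lambda_sty * l_sty - lambda_ds * l_ds + lambda_cyc * l_cyc

def discriminator_objective(l_adv):
    # the discriminator maximizes the adversarial term, i.e. minimizes its negative
    return -l_adv
```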
5. The method of claim 4, wherein said inputting the low participation sample data into a StarGAN model to generate target low participation samples with different styles comprises:
inputting the face image in the low-participation sample data into a StarGAN model, generating different style characteristics through the mapping network or the style encoder, and generating target low-participation samples with different styles through a generator according to the input face image and the different style characteristics.
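To illustrate claim 5, the sketch below draws a target style code either from the mapping network (given random noise) or from the style encoder (given a reference face) before passing it to the generator. The call signatures of G, F and E, and the latent dimension, are assumptions for illustration only.

```python
# Generate a styled low-participation sample with trained StarGAN components
# G (generator), F (mapping network) and E (style encoder), assumed callables.
import torch

def generate_styled_low_participation(G, F, E, face, target_domain, ref_face=None):
    z = torch.randn(face.size(0), 16)            # random Gaussian noise (latent dim assumed)
    style = E(ref_face, target_domain) if ref_face is not None else F(z, target_domain)
    return G(face, style)                        # target low-participation sample
```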
6. The method of claim 1, wherein the RCN model comprises a feature extraction unit, a region attention unit, and a global attention unit, and the inputting the original sample data and the target low participation sample into the RCN model for training to obtain a trained RCN model comprises:
inputting the original sample data and the target low participation sample into an RCN model, and performing feature extraction on the original sample data and the target low participation sample through the feature extraction unit to obtain local area features of the sample;
in a feature space, the region attention unit learns the attention weights of different face regions to perform region coding on the local region features of the sample to obtain global features of the sample;
respectively connecting the local area characteristics of the sample with the global characteristics of the sample in series to obtain sample characteristics, obtaining attention weights of the sample characteristics through the global attention unit, and performing weighted fusion on the sample characteristics to obtain final sample characteristics;
and according to the final sample characteristics, iteratively updating and optimizing the network parameters of the RCN model with the SGD algorithm by combining the regional deviation loss and the cross entropy loss, so as to obtain the trained RCN model.
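As a concrete illustration of the training objective in claim 6, the sketch below combines a cross-entropy term with a regional deviation (region-bias) term and optimizes with SGD. The exact form of the regional deviation loss used here is an assumption borrowed from common region-attention practice, since the claim names the loss but does not fix its formula.

```python
# Combined RCN training loss: cross-entropy on the classifier logits plus a
# margin term that encourages the most attended region weight to stand out.
import torch
import torch.nn.functional as F

def rcn_loss(logits, labels, alpha, margin=0.02):
    ce = F.cross_entropy(logits, labels)
    # alpha: (batch, n_regions, 1) region attention weights from the region attention unit
    region_bias = torch.clamp(
        margin + alpha.mean(dim=1) - alpha.max(dim=1).values, min=0).mean()
    return ce + region_bias

# Typical use (assumed training loop):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# loss = rcn_loss(logits, labels, alpha); loss.backward(); optimizer.step()
```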
7. The method of claim 6, wherein the inputting the image data to be recognized into the trained RCN model to obtain an engagement recognition result comprises:
inputting the image data to be identified into the feature extraction unit for feature extraction to obtain a feature map, and randomly cutting the feature map into a preset number of regional features;
inputting the regional features into the regional attention unit, calculating attention weights of the regional features, and weighting the regional features to obtain global features;
and respectively connecting the region features with the global features in series to obtain target features, obtaining attention weights of the target features through the global attention unit, weighting the target features to obtain final features, and identifying and classifying the final features to obtain a participation identification result of the image data to be identified.
8. An area coding and sample balance optimized classroom participation identification device, comprising:
the first acquisition module is configured to acquire video data of on-line learning of a student and generate original sample data from the video data, wherein the original sample data comprises high participation sample data and low participation sample data;
the generating module is used for inputting the low participation sample data into a StarGAN model and generating target low participation samples with different styles;
the training module is used for inputting the original sample data and the target low participation sample into an RCN model for training to obtain a trained RCN model;
the second acquisition module is used for acquiring video data to be identified and generating image data to be identified according to the video data to be identified;
and the recognition module is used for inputting the image data to be recognized into the trained RCN model to obtain a participation degree recognition result.
9. The apparatus of claim 8, wherein said generating original sample data from said video data comprises:
defining a participation degree label for the video data by means of manual annotation and prior information;
and extracting image frames from the video data and cropping the face region of each extracted image frame to obtain face images as the original sample data, wherein the original sample data is divided into high participation sample data and low participation sample data according to the participation degree label.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method of any one of claims 1-7.
CN202211246980.4A 2022-10-12 2022-10-12 Classroom participation identification method and device based on region coding and sample balance optimization Pending CN115439915A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211246980.4A CN115439915A (en) 2022-10-12 2022-10-12 Classroom participation identification method and device based on region coding and sample balance optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211246980.4A CN115439915A (en) 2022-10-12 2022-10-12 Classroom participation identification method and device based on region coding and sample balance optimization

Publications (1)

Publication Number Publication Date
CN115439915A true CN115439915A (en) 2022-12-06

Family

ID=84251064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211246980.4A Pending CN115439915A (en) 2022-10-12 2022-10-12 Classroom participation identification method and device based on region coding and sample balance optimization

Country Status (1)

Country Link
CN (1) CN115439915A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291819A (en) * 2020-02-19 2020-06-16 腾讯科技(深圳)有限公司 Image recognition method and device, electronic equipment and storage medium
CN111597978A (en) * 2020-05-14 2020-08-28 公安部第三研究所 Method for automatically generating pedestrian re-identification picture based on StarGAN network model
CN113159002A (en) * 2021-05-26 2021-07-23 重庆大学 Facial expression recognition method based on self-attention weight auxiliary module
CN113158872A (en) * 2021-04-16 2021-07-23 中国海洋大学 Online learner emotion recognition method
CN113344479A (en) * 2021-08-06 2021-09-03 首都师范大学 Online classroom-oriented learning participation intelligent assessment method and device
CN113421187A (en) * 2021-06-10 2021-09-21 山东师范大学 Super-resolution reconstruction method, system, storage medium and equipment
CN113537254A (en) * 2021-08-27 2021-10-22 重庆紫光华山智安科技有限公司 Image feature extraction method and device, electronic equipment and readable storage medium
CN113936317A (en) * 2021-10-15 2022-01-14 南京大学 Priori knowledge-based facial expression recognition method
CN114065874A (en) * 2021-11-30 2022-02-18 河北省科学院应用数学研究所 Medicine glass bottle appearance defect detection model training method and device and terminal equipment
CN114973126A (en) * 2022-05-17 2022-08-30 中南大学 Real-time visual analysis method for student participation degree of online course

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI WANG等: "Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》, vol. 29, pages 1 *
YUNJEY CHOI等: "StarGAN v2: Diverse Image Synthesis for Multiple Domains", 《ARXIV:1912.01865V2》, pages 1 - 3 *

Similar Documents

Publication Publication Date Title
Matern et al. Exploiting visual artifacts to expose deepfakes and face manipulations
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
CN109558832A (en) A kind of human body attitude detection method, device, equipment and storage medium
CN110263681A (en) The recognition methods of facial expression and device, storage medium, electronic device
CN114359526B (en) Cross-domain image style migration method based on semantic GAN
Li et al. Image manipulation localization using attentional cross-domain CNN features
CN113283334B (en) Classroom concentration analysis method, device and storage medium
Ververas et al. Slidergan: Synthesizing expressive face images by sliding 3d blendshape parameters
CN111275638A (en) Face restoration method for generating confrontation network based on multi-channel attention selection
Gafni et al. Wish you were here: Context-aware human generation
CN113112416A (en) Semantic-guided face image restoration method
Liu et al. Modern architecture style transfer for ruin or old buildings
CN116403262A (en) Online learning concentration monitoring method, system and medium based on machine vision
CN115731596A (en) Spontaneous expression recognition method based on progressive label distribution and depth network
CN114549341A (en) Sample guidance-based face image diversified restoration method
CN112070181A (en) Image stream-based cooperative detection method and device and storage medium
CN114841887B (en) Image recovery quality evaluation method based on multi-level difference learning
CN108665455B (en) Method and device for evaluating image significance prediction result
CN115439915A (en) Classroom participation identification method and device based on region coding and sample balance optimization
CN110210574A (en) Diameter radar image decomposition method, Target Identification Unit and equipment
CN112115779B (en) Interpretable classroom student emotion analysis method, system, device and medium
JP7362924B2 (en) Data augmentation-based spatial analysis model learning device and method
Li et al. Face mask removal based on generative adversarial network and texture network
CN114049303A (en) Progressive bone age assessment method based on multi-granularity feature fusion
Narayana Improving gesture recognition through spatial focus of attention

Legal Events

Code: Title / Description
PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20221206)