CN114519789A - Classroom scene classroom switching discrimination method and device and electronic equipment - Google Patents

Classroom scene classroom switching discrimination method and device and electronic equipment

Info

Publication number
CN114519789A
CN114519789A (application CN202210102525.0A; granted as CN114519789B)
Authority
CN
China
Prior art keywords
videos
area
different
classroom
student
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210102525.0A
Other languages
Chinese (zh)
Other versions
CN114519789B (en)
Inventor
陈奕名
王超
霍卫涛
马丁
阚海鹏
The other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinghong Software Technology Co ltd
Original Assignee
Beijing Jinghong Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jinghong Software Technology Co ltd filed Critical Beijing Jinghong Software Technology Co ltd
Priority to CN202210102525.0A priority Critical patent/CN114519789B/en
Publication of CN114519789A publication Critical patent/CN114519789A/en
Application granted granted Critical
Publication of CN114519789B publication Critical patent/CN114519789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a classroom scene classroom switching discrimination method and apparatus and electronic equipment. The method comprises: acquiring a video of the student area and a video of the teaching area covering the same time period, and capturing photos at different time points from the video of the student area; tallying the dressing classifications and dressing-position distributions of the students in the classroom at the different time points based on a preset clothing detection algorithm; arranging the dressing classifications and dressing-position distributions in chronological order to judge the different lessons in the student-area video and their times; and, according to the start and end times of the different lessons contained in the student-area video, cutting sub-videos of the different lessons out of the teaching-area video. The scheme addresses the prior-art problems that manual cutting slows the rate at which videos go online, while cutting strictly according to timetable time cannot accommodate special scenarios.

Description

Classroom scene classroom switching discrimination method and device and electronic equipment
Technical Field
The present disclosure relates to the field of video segmentation, and in particular, to a classroom scene classroom switching determination method and apparatus, and an electronic device.
Background
With the growth of offline recording and online playback in education scenes, intelligent classroom switching has become especially important. In particular, when the on-site situation is complicated, teachers inevitably run over time and the breaks between lessons vary in length, so choosing the cut times for online lesson playback is difficult.
In the prior art, the cut points for online playback are usually chosen manually, but manual cutting is too costly; especially when many videos are recorded, a manual cutting scheme seriously slows the rate at which videos go online.
On the other hand, if a manual cutting scheme is not adopted and cutting is performed strictly according to the timetable, teachers are forced to finish within the scheduled time, and common situations such as evening classes and lessons that run over cannot be accommodated.
Disclosure of Invention
The present disclosure aims to provide a classroom scene classroom switching discrimination method, apparatus, and electronic device to solve the prior-art problems that a manual cutting scheme slows the rate at which videos go online, while cutting strictly according to timetable time cannot accommodate special scenarios.
In order to achieve the above object, a first aspect of the present disclosure provides a classroom scene classroom switching discrimination method, where the method includes:
acquiring videos of a student area and videos of a teaching area in the same time period, which are acquired by an image acquisition unit arranged in the classroom, and capturing photos at different time points from the videos of the student area;
counting the dressing classification and the dressing position distribution of students at different time points in the classroom based on the pictures of different time points intercepted from the videos of the student areas and a preset clothing detection algorithm;
the dressing classifications and the dressing position distribution of the students at different time points are arranged according to a time sequence, and different classes in the video of the student area, and the class-giving time and the class-leaving time corresponding to the different classes are judged;
according to the class time and the class leaving time corresponding to different classes contained in the video of the student area and the time corresponding relation between the video of the student area and the video of the teaching area, the sub-videos of different classes are cut out from the video of the teaching area.
Optionally, the statistics of the dressing classifications and the dressing position distributions of the students at different time points in the classroom based on the pictures of different time points captured from the videos of the student areas and a preset clothing detection algorithm includes:
aiming at the photos of different time points intercepted from the videos of the student areas, extracting the interested area corresponding to each student from the photos by adopting a target detection algorithm, and recording the position distribution of each interested area in the photos;
and for each region of interest, obtaining the dressing classification of each student by adopting the clothing detection algorithm, and further counting the dressing classifications of the students at different time points in the classroom.
Optionally, the clothing detection algorithm is a multi-label algorithm; the multi-label detection algorithm adopts a convolutional neural network to extract the characteristics of the input pictures and outputs classification results of different levels at different levels of the convolutional neural network;
wherein the number of levels of the convolutional neural network is greater than or equal to 2; and the classification results of different grades are used for representing dressing styles of students.
Optionally, the convolutional neural network comprises a plurality of backbone networks; the photo generates feature data of a first full connection layer through a first backbone network of the plurality of backbone networks, and the feature data of the first full connection layer is processed by utilizing a softmax function to obtain a corresponding first-stage classification result;
the feature data of the first full connection layer is processed through a second backbone network of the plurality of backbone networks to generate feature data of a second full connection layer, and a softmax function is used for processing the feature data of the second full connection layer to obtain a corresponding second-stage classification result;
and in other backbone networks of the plurality of backbone networks, sequentially analogizing until the convolutional neural network outputs classification results of all levels.
Optionally, the loss function of the convolutional neural network is the sum of the loss functions of the different hierarchical levels;
when calculating the loss functions of different levels, for each level, respectively calculating the cross entropy loss function of each level as the loss function of each level.
Optionally, the backbone network of the convolutional neural network is a network formed by involution layers, and the network formed by involution layers adopts the involution operator.
Optionally, the method further includes:
capturing photos at different time points from the video of the teaching area;
identifying the category of the teaching content at different time points based on the pictures of different time points intercepted from the video of the teaching area and a preset teaching content identification algorithm;
and marking the categories of the sub-videos of different classes according to the categories of the teaching contents of different time points and by combining the class-giving time and the class-leaving time corresponding to different classes.
Optionally, the method for recognizing the teaching content at different time points based on the pictures at different time points captured from the video in the teaching area and a preset teaching content recognition algorithm includes:
and classifying each input photo by adopting a convolutional neural network based on the photos of different time points intercepted from the video of the teaching area, and outputting the position of the teaching content in the photo and the category of the teaching content.
Optionally, the backbone network of the convolutional neural network is a network formed by involution layers, and the network formed by involution layers adopts the involution operator.
A second aspect of the present disclosure provides a classroom scene classroom switching determination device, including:
the acquisition module is used for acquiring videos of a student area and videos of a teaching area in the same time period, which are acquired by an image acquisition unit arranged in the classroom, and capturing photos at different time points from the videos of the student area;
the statistical module is used for counting the dressing classification and the dressing position distribution of students at different time points in the classroom based on the pictures of different time points intercepted from the videos of the student areas and a preset clothing detection algorithm;
the judging module is used for arranging the dressing classification and the dressing position distribution of the students at different time points according to the time sequence, and judging different classes in the video of the student area and the class-on time and class-off time corresponding to the different classes;
the video cutting module is used for cutting out the sub-videos of different classes from the videos of the teaching area according to the class time and the class leaving time corresponding to different classes contained in the videos of the student area and the time corresponding relation between the videos of the student area and the videos of the teaching area.
A third aspect of the disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
A fourth aspect of the present disclosure provides an electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of the first aspect.
According to the technical scheme, the dressing distribution of the students in the classroom is judged and arranged in chronological order to determine whether the same batch of students is attending class and to judge the start and end times of the different lessons; the sub-videos of the different lessons are then cut out of the teaching-area video according to the time correspondence between the student-area video and the teaching-area video.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1A is a schematic illustration of a video of a teaching area shown in accordance with an exemplary embodiment;
FIG. 1B is a schematic diagram of a video of a student area shown in accordance with an exemplary embodiment;
fig. 2 is a flow diagram illustrating a classroom scene classroom switch discrimination method in accordance with an exemplary embodiment;
FIG. 3 is a schematic illustration of a face region and a dressing region shown in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating a detection network in accordance with an exemplary embodiment;
FIG. 5 is a schematic illustration of a teaching area shown in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating a detection method according to an exemplary embodiment;
fig. 7 is a block diagram illustrating a classroom switching discrimination apparatus according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
The scheme in the present disclosure is applicable to various classroom scenes, including scenes where classrooms are not fixed and scenes where they are. Non-fixed classrooms are typical of the university model: the classes and courses held in each room differ from day to day. Fixed classrooms are typical of primary school, junior middle school, high school, and similar settings: the students of one class stay in a fixed room for their lessons, but the timetable differs from day to day.
To better distinguish different classroom contents, two cameras can be deployed in a classroom: one films the teacher, producing the video of the teaching area — the video that needs to be cut, as shown in FIG. 1A — and the other films the students, producing the video of the student area, as shown in FIG. 1B.
Referring to fig. 2, a classroom scene classroom switching discrimination method in the disclosed embodiment includes the following steps.
Step 201, obtaining a video of a student area and a video of a teaching area in the same time period, which are collected by an image collecting unit arranged in the classroom, and capturing photos at different time points from the video of the student area.
In the embodiment of the present disclosure, the distinguishing and the cutting may be performed in real time during the video recording process, or the previously recorded video may be reviewed at a certain time interval, which is not limited in the present disclosure.
And step 202, counting the dressing classification and the dressing position distribution of students at different time points in the classroom based on the pictures of different time points intercepted from the videos of the student areas and a preset clothing detection algorithm.
The clothing classification may include, among other things, the color, style, decoration, etc. of the clothing, and the distribution of positions of the clothing is used to indicate the distribution of positions of different clothing in a classroom.
Step 203, the dressing classifications and the dressing position distributions of the students at different time points are arranged according to a time sequence, and different classes in the video of the student area, and the class time and the class leaving time corresponding to the different classes are judged.
Because the students attending different lessons are not the same, their dressing distributions differ completely, which makes different lessons distinguishable. During a lesson the dressing-position distribution should be very stable, whereas during breaks students walk back and forth; different lessons and the break periods can therefore be judged from the dressing classifications and the dressing-position distributions.
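The chronological comparison described above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: each snapshot is summarized as a histogram of dressing classes, and a sharp change between consecutive snapshots marks a candidate class switch. The function names and the threshold value are assumptions.

```python
from collections import Counter

def dressing_histogram(labels):
    """Summarize one snapshot as a normalized histogram of dressing classes."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def histogram_distance(h1, h2):
    """Total-variation distance between two dressing histograms (0 = identical)."""
    keys = set(h1) | set(h2)
    return 0.5 * sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)

def detect_switches(snapshots, threshold=0.5):
    """Return indices of snapshots whose dressing distribution differs sharply
    from the previous snapshot, i.e. candidate class-switch points."""
    hists = [dressing_histogram(s) for s in snapshots]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]
```

In practice the per-snapshot labels would come from the clothing detection algorithm, and the position distribution would be compared in the same windowed fashion.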
And 204, cutting out sub-videos with different lessons from the videos in the teaching area according to the lesson-taking time and the lesson-leaving time corresponding to different lessons contained in the videos in the student area and the time corresponding relation between the videos in the student area and the videos in the teaching area.
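Since both videos cover the same time period, the cutting step amounts to mapping each detected (start, end) pair onto the teaching-area timeline and slicing. A hedged sketch, assuming a known clock offset between the two recordings; the ffmpeg invocation and file names are illustrative, and any video-slicing tool could be substituted:

```python
def map_segments(class_times, offset_seconds=0.0):
    """Map (start, end) lesson times detected in the student-area video onto
    the teaching-area timeline, given the clock offset between the two
    recordings (0 when both cameras start simultaneously)."""
    return [(start + offset_seconds, end + offset_seconds)
            for start, end in class_times]

def cut_commands(teaching_video, segments):
    """Build ffmpeg commands that extract each lesson as its own sub-video,
    using stream copy to avoid re-encoding."""
    return [
        f"ffmpeg -i {teaching_video} -ss {start:.0f} -to {end:.0f} "
        f"-c copy lesson_{i + 1}.mp4"
        for i, (start, end) in enumerate(segments)
    ]
```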
According to the scheme of the embodiments of the disclosure, the dressing distribution of the students in the classroom is judged and arranged in chronological order to determine whether the same batch of students is attending class, and the start and end times of the different lessons are judged accurately by combining the position distribution of the dressing. This avoids both a manual cutting scheme and cutting strictly according to timetable time, yielding a fast and accurate classroom switching discrimination method and teaching-video cutting scheme.
In the embodiments of the present disclosure, the positions of persons in the student-area video are detected first; any common detection algorithm may be used for person detection, for example SSD, Faster R-CNN, or YOLO. This disclosure uses a CenterNet network to detect person positions. Then, for each person's ROI (region of interest), a preset clothing detection algorithm tallies the clothing classification, including clothing type, color, and so on. Finally, tallying the clothing distribution of the persons achieves the goal of distinguishing different lessons.
In the embodiments of the present disclosure, an ROI is an output region of the detection network: inputting a picture into the trained detection network yields a region box (the ROI) for each desired target together with its category. Taking the photos captured from the student-area video as an example, for each input photo the ROIs are extracted first — that is, the area in which each student is located; a photo with 30 students yields 30 ROIs — and feature extraction is then performed on each ROI.
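The ROI extraction step — cropping one sub-image per detected student from a frame — can be sketched as follows. A minimal illustration assuming the detector emits (x, y, w, h) pixel boxes; the function name is an assumption:

```python
import numpy as np

def crop_rois(frame, boxes):
    """Crop one region of interest per detected student from a frame.

    `frame` is an H x W x C array; each box is (x, y, w, h) in pixels,
    as a detector such as CenterNet might produce. Boxes are clipped
    to the frame so partially out-of-frame detections still yield crops."""
    h_img, w_img = frame.shape[:2]
    rois = []
    for (x, y, w, h) in boxes:
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(w_img, x + w), min(h_img, y + h)
        rois.append(frame[y0:y1, x0:x1])
    return rois
```

Each returned crop would then be fed to the clothing detection algorithm for classification.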
The reason for this design is that, in the non-fixed-classroom scenario, different lessons in the same classroom are attended by different batches of students, and the lessons can usually be told apart by differences in the students' clothing. Conventional schemes distinguish people by face recognition, but a student's face occupies a very small area of the camera picture, while clothing and similar dressing information occupy a much larger area, as shown in FIG. 3. With existing deep-learning detection and recognition models, accuracy on small targets (faces) is very low, whereas accuracy on large targets (clothing and the like) is much higher. Judging classroom switching by clothing detection therefore greatly improves judgment accuracy compared with judging by faces.
Next, a clothing detection algorithm in the embodiment of the present disclosure will be explained.
In the embodiments of the disclosure, the clothing detection algorithm is a multi-label algorithm. A conventional detection algorithm outputs only one label per target class, like the network on the left of FIG. 4. In the clothing-detection scene, however, the style may be a middle class within the large class of upper garments, the color may be a subclass within the style's middle class, and there may be further subclasses. If every combination were treated as its own class, the network would have far too many outputs, making it redundant and hard to train. The embodiments of the disclosure exploit the feature-extraction hierarchy of a convolutional neural network and emit outputs at different depths to solve the excessive-output problem; the structure is shown by the network on the right of FIG. 4.
With continued reference to FIG. 4, the detection network in the embodiments of the disclosure uses the newer RedNet-18 backbone network in place of an ordinary convolutional backbone, because RedNet uses the involution operator, which improves the final output accuracy compared with ordinary convolution. In addition, the algorithm uses this intermediate-output arrangement to emit labels of different levels at different depths. For example, a picture containing clothing first passes through the first RedNet-18 backbone, which outputs the large-class result (e.g., blue garment); fully connected layer 1 then feeds a second RedNet-18 network, which outputs the middle-class result (e.g., trimmed, lengthened); finally, fully connected layer 2 feeds a third RedNet-18 network, which outputs the subclass result (e.g., buttons, necktie decorations). This follows the usual reasoning about deep networks: tiny features (buttons and the like) need a deeper model to extract.
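The staged-output idea can be illustrated with a toy stand-in for the patent's network. This is purely a structural sketch under stated assumptions: random linear maps with ReLU stand in for the RedNet-18 backbone stages, and each stage's features feed both the next stage and one softmax classification head (large class, middle class, subclass). All names and dimensions are illustrative, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class HierarchicalClassifier:
    """Toy stand-in for the multi-level network of FIG. 4 (right side):
    each 'backbone' stage refines the features, and each stage emits
    one level's classification before passing features onward."""

    def __init__(self, in_dim, hidden_dim, level_sizes):
        dims = [in_dim] + [hidden_dim] * (len(level_sizes) - 1)
        self.stages = [rng.normal(size=(d, hidden_dim)) * 0.1 for d in dims]
        self.heads = [rng.normal(size=(hidden_dim, n)) * 0.1
                      for n in level_sizes]

    def forward(self, x):
        outputs = []
        feat = x
        for stage, head in zip(self.stages, self.heads):
            feat = np.maximum(feat @ stage, 0.0)   # backbone stage + ReLU
            outputs.append(softmax(feat @ head))   # this level's classification
        return outputs
```

Running the forward pass on one feature vector returns one probability vector per level, matching the "large / middle / subclass" cascade described above.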
In the embodiment of the present disclosure, the feature data of the full connection layer may be processed through a softmax function to obtain classification results of each level, and in practical application, the feature data may also be processed in other manners, which is not limited in the present disclosure.
In the embodiments of the present disclosure, when the loss functions of the different levels are calculated, the cross-entropy loss of each level is calculated separately as that level's loss function, and the loss function of the convolutional neural network is the sum of the losses of the different levels. The loss function of the network can be expressed as loss = Σ_{k∈K} crossentropy_k, where K is the set of output levels — for the three levels (large class, middle class, subclass) of FIG. 4, |K| = 3 — and crossentropy = −Σ p(x) log q(x). Because each level's classes compute their loss separately and the results are then summed, rather than all classes being lumped together, the computation is more accurate.
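The summed per-level loss can be written directly from the formulas above. A minimal sketch, assuming each level's target and prediction are probability vectors; the function names are assumptions:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Per-level loss: H(p, q) = -sum over x of p(x) * log q(x).
    `eps` guards against log(0) for zero-probability predictions."""
    return -np.sum(np.asarray(p) * np.log(np.asarray(q) + eps))

def hierarchical_loss(targets, predictions):
    """Total network loss: the sum of the cross entropies of all output
    levels (large class, middle class, subclass), computed separately."""
    return sum(cross_entropy(p, q) for p, q in zip(targets, predictions))
```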
In a possible implementation, the categories of the sub-videos of different lessons can also be marked by using the video of the teaching area, for example: the current lesson is a Chinese lesson, a physics lesson, and so on.
In the embodiment of the disclosure, after a video of a teaching area is obtained, photos at different time points are intercepted from the video of the teaching area; then, identifying the category of the teaching content at different time points based on the pictures of different time points intercepted from the video of the teaching area and a preset teaching content identification algorithm; and marking the categories of the sub-videos of different classes according to the categories of the teaching contents of different time points and by combining the class-giving time and the class-leaving time corresponding to different classes.
In the embodiment of the present disclosure, based on the photos at different time points captured from the video in the teaching area, a convolutional neural network may also be used to classify each input photo, and output the position of the teaching content in the photo and the category of the teaching content.
In one possible implementation, CornerNet can directly detect the teacher's PPT teaching area for classification, as shown in FIG. 5. This patent uses CornerNet as the main detection and classification network; the key point of the scheme is replacing CornerNet's backbone network with RedNet-50 to obtain higher accuracy. The backbone of the original CornerNet is an Hourglass network, built mainly from ordinary convolution, whereas in the scheme of the embodiments of the disclosure RedNet is built mainly from involution, which increases the overall accuracy of the network. The teaching area may be a PPT slide, blackboard writing, or the like; the embodiments of the disclosure do not limit the form of the teaching area.
As shown in FIG. 6, in the embodiments of the present disclosure a classification network directly classifies the PPT content: a camera picture (a photo captured from the teaching-area video) is input, and the CornerNet network outputs the coordinates of the top-left corner of the PPT-area box together with its width and height, (x0, y0, w, h), plus a classification category — for example, when the current lesson is a Chinese lesson, the corresponding Chinese-lesson label is output.
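Interpreting that detector output amounts to pairing the box with a subject label. A hedged sketch — the subject-id table below is entirely hypothetical (real ids depend on how the classifier was trained), as is the function name:

```python
# Hypothetical subject-label table; the real class ids and subjects
# depend on the training data of the classification network.
SUBJECTS = {0: "Chinese", 1: "Mathematics", 2: "Physics", 3: "English"}

def tag_lesson(detection):
    """Turn one (x0, y0, w, h, class_id) detection of the teaching area
    (e.g. a projected slide) into a region box plus a subject tag that
    can be attached to the corresponding sub-video."""
    x0, y0, w, h, class_id = detection
    return {"box": (x0, y0, w, h),
            "subject": SUBJECTS.get(class_id, "unknown")}
```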
Through the scheme in the embodiment of the disclosure, intelligent judgment, cutting and class marking of different classes can be realized, and the cutting speed and accuracy can be ensured.
Based on the same inventive concept, as shown in fig. 7, an embodiment of the present disclosure further provides a classroom scene classroom switching determination apparatus 700, including:
an obtaining module 701, configured to obtain a video of a student area and a video of a teaching area in the same time period, which are collected by an image collection unit arranged in the classroom, and capture photos at different time points from the video of the student area;
a statistic module 702, configured to count the dressing classifications and the dressing position distributions of the students at different time points in the classroom based on the photos at different time points captured from the videos in the student area and a preset clothing detection algorithm;
a determining module 703, configured to arrange the dressing classifications and the dressing position distributions of the students at different time points according to a time sequence, and determine different lessons in the video of the student area, and the class time and the class leaving time corresponding to the different lessons;
a video cutting module 704, configured to cut out sub-videos of different classes from the video of the teaching area according to the class time and class-leaving time corresponding to the different classes contained in the video of the student area, and the time correspondence between the video of the student area and the video of the teaching area.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
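The cutting step performed with the time correspondence between the two videos can be sketched as follows. This is illustrative only: the fixed-offset model of the time correspondence, and the ffmpeg command strings, are assumptions added for illustration, not the patented implementation:

```python
def map_to_teaching_video(lessons, offset_seconds):
    """lessons: list of (start, end) timestamps in the student-area video.
    offset_seconds: teaching-area clock minus student-area clock (a
    simple fixed-offset model of the 'time correspondence' between the
    two videos). Returns cut intervals in the teaching-area timeline."""
    return [(s + offset_seconds, e + offset_seconds) for s, e in lessons]

def ffmpeg_cut_commands(video_path, intervals):
    """Build ffmpeg commands that would cut each lesson into its own
    sub-video (output names and flags are illustrative assumptions)."""
    cmds = []
    for i, (s, e) in enumerate(intervals):
        cmds.append(
            f"ffmpeg -ss {s} -to {e} -i {video_path} -c copy lesson_{i}.mp4"
        )
    return cmds
```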
Fig. 8 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. As shown in fig. 8, the electronic device 800 may include: a processor 801, a memory 802. The electronic device 800 may also include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communications component 805.
The processor 801 is configured to control the overall operation of the electronic device 800 so as to complete all or part of the steps described above. The memory 802 is used to store various types of data to support operation at the electronic device 800, such as instructions for any application or method operating on the electronic device 800, as well as application-related data such as contact data, transmitted and received messages, pictures, audio, video, and so forth. The Memory 802 may be implemented by any type of volatile or non-volatile Memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signal may further be stored in the memory 802 or transmitted through the communication component 805. The audio component further comprises at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons; these buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the electronic device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited herein.
Accordingly, the communication component 805 may include: a Wi-Fi module, a Bluetooth module, an NFC module, and the like.
In an exemplary embodiment, the electronic Device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described classroom scene classroom switching discrimination method.
In another exemplary embodiment, a computer-readable storage medium is also provided, which includes program instructions that, when executed by a processor, implement the steps of the classroom scene classroom switching discrimination method described above. For example, the computer-readable storage medium may be the memory 802 described above, which includes program instructions executable by the processor 801 of the electronic device 800 to perform the classroom scene classroom switching discrimination method described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code sections for performing the above-mentioned classroom scene classroom switching discrimination method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (12)

1. A classroom scene classroom switching discrimination method is characterized by comprising the following steps:
acquiring videos of a student area and videos of a teaching area in the same time period, collected by an image acquisition unit arranged in the classroom, and capturing photos at different time points from the videos of the student area;
counting the dressing classifications and the dressing position distributions of students at the different time points in the classroom based on the photos at different time points captured from the videos of the student area and a preset clothing detection algorithm;
arranging the dressing classifications and the dressing position distributions of the students at the different time points in time order, and determining different classes in the videos of the student area and the class time and class-leaving time corresponding to the different classes;
and cutting out sub-videos of the different classes from the videos of the teaching area according to the class time and class-leaving time corresponding to the different classes contained in the videos of the student area and the time correspondence between the videos of the student area and the videos of the teaching area.
2. The method of claim 1, wherein counting the dressing classifications and the dressing position distributions of the students in the classroom at different time points based on the photos at different time points captured from the videos of the student area and the preset clothing detection algorithm comprises:
for the photos at different time points captured from the videos of the student area, extracting the region of interest corresponding to each student from each photo by using a target detection algorithm, and recording the position distribution of each region of interest in the photo;
and for each region of interest, obtaining the dressing classification of each student by using the clothing detection algorithm, thereby counting the dressing classifications of the students at the different time points in the classroom.
3. The method of claim 1, wherein the clothing detection algorithm is a multi-label detection algorithm; the multi-label detection algorithm uses a convolutional neural network to extract features from an input photo, and classification results of different grades are output at different levels of the convolutional neural network;
wherein the number of levels of the convolutional neural network is greater than or equal to 2; and the classification results of the different grades are used for representing the dressing styles of the students.
4. The method of claim 3, wherein the convolutional neural network comprises a plurality of backbone networks; the photo passes through a first backbone network of the plurality of backbone networks to generate feature data of a first fully connected layer, and the feature data of the first fully connected layer is processed using a softmax function to obtain a corresponding first-grade classification result;
the feature data of the first fully connected layer passes through a second backbone network of the plurality of backbone networks to generate feature data of a second fully connected layer, and the feature data of the second fully connected layer is processed using a softmax function to obtain a corresponding second-grade classification result;
and so on through the remaining backbone networks of the plurality of backbone networks, until the convolutional neural network has output the classification results of all grades.
5. The method of claim 3, wherein the loss function of the convolutional neural network is the sum of the loss functions of the different levels;
when calculating the loss functions of the different levels, the cross-entropy loss function of each level is calculated separately and taken as the loss function of that level.
6. The method of claim 3, wherein the backbone network of the convolutional neural network is a network of involution convolutions, i.e. a network that employs involution operators.
7. The method of any one of claims 1-6, further comprising:
capturing photos at different time points from the video of the teaching area;
identifying the categories of the teaching content at different time points based on the photos at different time points captured from the video of the teaching area and a preset teaching content identification algorithm;
and marking the categories of the sub-videos of the different classes according to the categories of the teaching content at the different time points, in combination with the class time and class-leaving time corresponding to the different classes.
8. The method of claim 7, wherein identifying the category of the teaching content at different time points based on the photos of different time points taken from the video of the teaching area and a preset teaching content identification algorithm comprises:
classifying each of the photos at different time points captured from the video of the teaching area by using a convolutional neural network, and outputting the position of the teaching content in the photo and the category of the teaching content.
9. The method of claim 8, wherein the backbone network of the convolutional neural network is a network of involution convolutions that employs involution operators.
10. A classroom scene classroom switching discriminating device is characterized by comprising:
the acquisition module is used for acquiring videos of a student area and videos of a teaching area in the same time period, which are acquired by an image acquisition unit arranged in the classroom, and capturing photos at different time points from the videos of the student area;
the statistical module is used for counting the dressing classification and the dressing position distribution of students at different time points in the classroom based on the pictures of different time points intercepted from the videos of the student areas and a preset clothing detection algorithm;
the judging module is used for arranging the dressing classification and the dressing position distribution of the students at different time points according to the time sequence, and judging different classes in the video of the student area and the class-on time and class-off time corresponding to the different classes;
the video cutting module is used for cutting out sub-videos of the different classes from the videos of the teaching area according to the class time and class-leaving time corresponding to the different classes contained in the videos of the student area and the time correspondence between the videos of the student area and the videos of the teaching area.
11. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
12. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 9.
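Claims 3 to 5 describe a multi-level classification network whose total loss is the sum of per-level cross-entropy losses. A minimal numeric sketch of that loss composition follows (pure Python, no deep-learning framework; the two-level logits in the usage are made-up numbers for illustration, not outputs of the patented network):

```python
import math

def softmax(logits):
    """Numerically stable softmax over one level's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target_index):
    """Cross-entropy loss of one level's classification head."""
    return -math.log(softmax(logits)[target_index])

def total_loss(per_level_logits, per_level_targets):
    """Per claim 5: the network loss is the sum of each level's
    cross-entropy loss."""
    return sum(
        cross_entropy(logits, target)
        for logits, target in zip(per_level_logits, per_level_targets)
    )
```

For example, with a 2-class first level and a 3-class second level, `total_loss([[2.0, 0.0], [0.0, 0.0, 1.0]], [0, 2])` simply adds the two levels' cross-entropy terms.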
CN202210102525.0A 2022-01-27 2022-01-27 Classroom scene classroom switching discriminating method and device and electronic equipment Active CN114519789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210102525.0A CN114519789B (en) 2022-01-27 2022-01-27 Classroom scene classroom switching discriminating method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114519789A true CN114519789A (en) 2022-05-20
CN114519789B CN114519789B (en) 2024-05-24

Family

ID=81597546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210102525.0A Active CN114519789B (en) 2022-01-27 2022-01-27 Classroom scene classroom switching discriminating method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114519789B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007003128A1 (en) * 2005-07-01 2007-01-11 Weigiao Huang A network teaching system and method
CN110599835A (en) * 2019-09-25 2019-12-20 淄博职业学院 Interactive computer remote education system
CN112001251A (en) * 2020-07-22 2020-11-27 山东大学 Pedestrian re-identification method and system based on combination of human body analysis and clothing color
CN112132079A (en) * 2020-09-29 2020-12-25 中国银行股份有限公司 Method, device and system for monitoring students in online teaching
CN112200818A (en) * 2020-10-15 2021-01-08 广州华多网络科技有限公司 Image-based dressing area segmentation and dressing replacement method, device and equipment
CN113052085A (en) * 2021-03-26 2021-06-29 新东方教育科技集团有限公司 Video clipping method, video clipping device, electronic equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant