CN114519789B - Classroom scene classroom switching discriminating method and device and electronic equipment - Google Patents

Info

Publication number
CN114519789B
CN114519789B (application CN202210102525.0A)
Authority
CN
China
Prior art keywords: different, video, area, student, dressing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210102525.0A
Other languages
Chinese (zh)
Other versions
CN114519789A (en)
Inventor
陈奕名
王超
霍卫涛
(Name withheld on request)
马丁
阚海鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinghong Software Technology Co ltd
Original Assignee
Beijing Jinghong Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jinghong Software Technology Co ltd filed Critical Beijing Jinghong Software Technology Co ltd
Priority to CN202210102525.0A priority Critical patent/CN114519789B/en
Publication of CN114519789A publication Critical patent/CN114519789A/en
Application granted granted Critical
Publication of CN114519789B publication Critical patent/CN114519789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a method, a device, and electronic equipment for discriminating class switching in classroom scenes. The method comprises: acquiring video of the student area and video of the teaching area over the same time period, and capturing photos at different time points from the video of the student area; based on a preset clothing detection algorithm, counting the dressing classifications and the position distribution of the dressing of the students in the classroom at the different time points; arranging the dressing classifications and the dressing position distributions of the students in time order, and discriminating the different classes in the video of the student area together with the start and end times of each class; and cutting sub-videos of the different classes from the video of the teaching area according to the class times corresponding to the different classes contained in the video of the student area. This solves the technical problems of the prior art that manual cutting slows down the speed at which videos come online, while cutting directly according to the timetable cannot accommodate special scenes.

Description

Classroom scene classroom switching discriminating method and device and electronic equipment
Technical Field
The disclosure relates to the field of video segmentation, and in particular to a method, a device, and electronic equipment for discriminating class switching in classroom scenes.
Background
With the growth of recording offline lessons for online playback in education, intelligent switching between classes has become especially important. When the offline scene is comparatively complicated, situations such as a teacher starting a class late or running over time are hard to avoid, and the interval between classes varies, which makes it difficult to cut the online playback into classes at the correct times.
In the prior art, the class start and end times are generally cut manually before online playback is provided, but the cost of manual cutting is too high; especially when many videos are recorded, a manual cutting scheme severely slows down the speed at which the videos come online.
On the other hand, if manual cutting is abandoned and the video is cut directly according to the timetable, teachers are forced to finish exactly within the scheduled time, and frequently occurring situations such as starting late or running over time cannot be accommodated.
Disclosure of Invention
The invention aims to provide a method, a device, and electronic equipment for discriminating class switching in classroom scenes, so as to solve the technical problems of the prior art that manual cutting slows down the speed at which videos come online, while cutting directly according to the timetable cannot accommodate special scenes.
In order to achieve the above object, a first aspect of the present disclosure provides a classroom scene class switching discrimination method, the method including:
Acquiring video of a student area and video of a teaching area over the same time period, collected by image acquisition units arranged in the classroom, and capturing photos at different time points from the video of the student area;
based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm, counting the dressing classifications and the position distribution of the dressing of the students in the classroom at the different time points;
arranging the dressing classifications and the dressing position distributions of the students at the different time points in time order, and discriminating the different classes in the video of the student area and the class start and end times corresponding to the different classes;
and cutting sub-videos of the different classes from the video of the teaching area according to the class start and end times corresponding to the different classes contained in the video of the student area and the time correspondence between the video of the student area and the video of the teaching area.
Optionally, counting the dressing classifications and the position distribution of the dressing of the students in the classroom at different time points, based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm, includes:
for the photos captured at different time points from the video of the student area, extracting the region of interest corresponding to each student from the photo by means of a target detection algorithm, and recording the position distribution of each region of interest within the photo;
and for each region of interest, obtaining the dressing classification of each student by means of the clothing detection algorithm, and thereby counting the dressing classifications of the students in the classroom at the different time points.
Optionally, the clothing detection algorithm is a multi-label detection algorithm; the multi-label detection algorithm uses a convolutional neural network to extract features from the input photo and outputs classification results of different levels at different depths of the convolutional neural network;
wherein the number of levels of the convolutional neural network is greater than or equal to 2, and the classification results of the different levels are used to characterize the dressing style of the students.
Optionally, the convolutional neural network includes a plurality of backbone networks; the photo passes through the first of the backbone networks to generate the feature data of a first fully connected layer, and the feature data of the first fully connected layer is processed with a softmax function to obtain the corresponding first-level classification result;
the feature data of the first fully connected layer is processed through the second of the backbone networks to generate the feature data of a second fully connected layer, and the feature data of the second fully connected layer is processed with a softmax function to obtain the corresponding second-level classification result;
and so on through the remaining backbone networks, until the convolutional neural network has output the classification results of all levels.
Optionally, the loss function of the convolutional neural network is the sum of the loss functions of the different levels;
when calculating the loss functions of the different levels, the cross-entropy loss of each level is computed and used as that level's loss function.
Optionally, the backbone network of the convolutional neural network is a network built from involution layers, which use the involution ("inner convolution") operator.
Optionally, the method further comprises:
Capturing photos at different time points from the video of the teaching area;
identifying the categories of the teaching content at the different time points, based on the photos captured at different time points from the video of the teaching area and a preset teaching content identification algorithm;
and marking the categories of the sub-videos of the different classes according to the categories of the teaching content at the different time points, combined with the class start and end times corresponding to the different classes.
Optionally, identifying the categories of the teaching content at different time points, based on the photos captured at different time points from the video of the teaching area and a preset teaching content identification algorithm, includes:
for the photos captured at different time points from the video of the teaching area, classifying each input photo with a convolutional neural network, and outputting the position of the teaching content within the photo and the category of the teaching content.
Optionally, the backbone network of the convolutional neural network is a network built from involution layers, which use the involution ("inner convolution") operator.
The second aspect of the present disclosure provides a classroom scene class switching discrimination device, including:
an acquisition module, used to acquire video of the student area and video of the teaching area over the same time period, collected by image acquisition units arranged in the classroom, and to capture photos at different time points from the video of the student area;
a statistics module, used to count the dressing classifications and the position distribution of the dressing of the students in the classroom at different time points, based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm;
a discrimination module, used to arrange the dressing classifications and the dressing position distributions of the students at different time points in time order, and to discriminate the different classes in the video of the student area and the class start and end times corresponding to the different classes;
and a video cutting module, used to cut sub-videos of the different classes from the video of the teaching area according to the class start and end times corresponding to the different classes contained in the video of the student area and the time correspondence between the video of the student area and the video of the teaching area.
A third aspect of the present disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
A fourth aspect of the present disclosure provides an electronic device, comprising:
A memory having a computer program stored thereon;
A processor for executing the computer program in the memory to implement the steps of the method of the first aspect.
According to the above technical scheme, by judging the dressing distribution of the students in the classroom and arranging it in time order, it can be determined whether a class is in session, and the start and end times of the different classes can be determined; sub-videos of the different classes are then cut from the video of the teaching area according to the time correspondence between the video of the student area and the video of the teaching area.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
FIG. 1A is a schematic diagram of a video of a teaching area shown according to an example embodiment;
FIG. 1B is a schematic diagram of a video of a student area shown according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a classroom scene class switch decision method in accordance with an exemplary embodiment;
FIG. 3 is a schematic diagram of a face region and a dressing region, according to an example embodiment;
FIG. 4 is a schematic diagram of a detection network shown according to an example embodiment;
FIG. 5 is a schematic diagram of a teaching area shown according to an example embodiment;
FIG. 6 is a schematic diagram illustrating a detection method according to an example embodiment;
FIG. 7 is a block diagram of a classroom switching decision device in accordance with an exemplary embodiment;
Fig. 8 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the disclosure, are not intended to limit the disclosure.
The scheme of the present disclosure can be applied to various classroom scenes, including scenes where classrooms are not fixed and scenes where they are. The unfixed-classroom scene is typical of the university model, where each classroom hosts different classes and courses every day. The fixed-classroom scene is typical of primary and secondary schools, where the students of one class stay in a single classroom for lessons, although the daily timetable may differ.
In order to better discriminate the content of different classes, two cameras can be deployed in a classroom: one shoots the teacher, producing the video of the teaching area, i.e., the video that needs to be cut, as shown in the video frame of Fig. 1A; the other shoots the students, producing the video of the student area, as shown in the video frame of Fig. 1B.
Referring to fig. 2, the classroom scene class switching discrimination method in the embodiment of the disclosure includes the following steps.
Step 201: obtain video of the student area and video of the teaching area over the same time period, collected by image acquisition units arranged in the classroom, and capture photos at different time points from the video of the student area.
In the embodiment of the disclosure, discrimination and cutting may be performed in real time while the video is being recorded, or previously recorded video may be reviewed at regular intervals; the disclosure does not limit this.
Step 202: based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm, count the dressing classifications and the position distribution of the dressing of the students in the classroom at the different time points.
The dressing classifications may include the colors, styles, decorations, and so on of the clothing; the position distribution of the dressing indicates where the differently dressed students are located in the classroom.
Step 203: arrange the dressing classifications and the dressing position distributions of the students at different time points in time order, and discriminate the different classes in the video of the student area and the class start and end times corresponding to the different classes.
Since different courses are attended by different students, their dressing distributions differ completely, which allows different classes to be distinguished; the dressing position distribution should be very stable while a class is in session, whereas between classes students walk around, so the different classes and their start and end times can be determined from the dressing classifications and dressing position distributions.
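For illustration, a minimal sketch of how such time-ordered dressing statistics could be turned into lesson boundaries follows; the histogram summary, the L1 distance, and the thresholds are assumptions made for the sketch, since the disclosure does not prescribe a specific comparison rule:

```python
import numpy as np

def find_lessons(times, hists, dist_thresh=0.5, min_points=3):
    """Split time-ordered dressing histograms into lessons. A switch is
    declared when the L1 distance between consecutive normalized
    histograms jumps; a stable run of at least min_points samples is
    taken as one lesson, returned as a (start_time, end_time) pair."""
    lessons, start = [], 0
    for i in range(1, len(times)):
        if np.abs(hists[i] - hists[i - 1]).sum() > dist_thresh:
            if i - start >= min_points:
                lessons.append((times[start], times[i - 1]))
            start = i  # a new batch of students, hence a new lesson
    if len(times) - start >= min_points:
        lessons.append((times[start], times[-1]))
    return lessons
```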
Step 204: cut sub-videos of the different classes from the video of the teaching area according to the class start and end times corresponding to the different classes contained in the video of the student area and the time correspondence between the video of the student area and the video of the teaching area.
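A minimal cutting sketch, assuming both cameras share a common clock so that (start, end) offsets in seconds measured in the student-area video index directly into the teaching-area video; the ffmpeg stream-copy approach is one convenient choice, not the patent's stated method:

```python
import subprocess

def cut_sub_videos(teaching_video, lessons, out_tmpl="lesson_{:02d}.mp4"):
    # One sub-video per (start, end) lesson window, in seconds. Stream copy
    # ("-c copy") cuts without re-encoding.
    for idx, (start, end) in enumerate(lessons):
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-i", teaching_video,
             "-t", str(end - start), "-c", "copy", out_tmpl.format(idx)],
            check=True,
        )
```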
Through the above scheme, the dressing distribution of the students in the classroom is judged and arranged in time order; whether a class is in session is determined, and the start and end times of the different classes are determined accurately by combining this with the dressing position distribution. This avoids both the manual cutting scheme and cutting directly according to the timetable, providing a fast and accurate method for discriminating class switching and cutting teaching video.
In the embodiment of the disclosure, the positions of people in the video are detected from the video of the student area; any common detection algorithm suffices for person detection, for example the SSD, Faster R-CNN, or YOLO algorithms. The present disclosure uses a CenterNet network to detect person positions. Then, for the ROI (region of interest) of each person, a preset clothing detection algorithm is used to obtain the clothing classification, including clothing type, color, and so on. Finally, the purpose of distinguishing different classes is achieved through statistics over the distribution of people's clothing.
In the embodiment of the disclosure, the ROI is the region output by the detection network: a picture is fed into the trained detection network, which yields a bounding box (the ROI) for each target together with its category. Taking a photo captured from the video of the student area as an example, the ROIs are extracted first for each input photo, i.e., the region where each student is located; for example, with 30 students, 30 ROIs are extracted, and feature extraction is then performed on each ROI.
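A sketch of this ROI step using an off-the-shelf torchvision detector standing in for the CenterNet the disclosure uses (the description notes any common detector works); the model choice and score threshold are illustrative assumptions:

```python
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Off-the-shelf person detector standing in for the patent's CenterNet.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def student_rois(photo: Image.Image, score_thresh: float = 0.6):
    """Return one (crop, box) pair per detected student: the crop is the
    ROI later fed to the clothing classifier, and the box records the
    position distribution within the photo."""
    with torch.no_grad():
        pred = model([to_tensor(photo)])[0]
    rois = []
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if label.item() == 1 and score.item() > score_thresh:  # COCO: 1 = person
            x0, y0, x1, y1 = [int(v) for v in box.tolist()]
            rois.append((photo.crop((x0, y0, x1, y1)), (x0, y0, x1, y1)))
    return rois
```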
The reason for this design is that, in the unfixed-classroom scene, different classes in the same classroom are attended by different batches of students, and the differences between class sessions can generally be distinguished from differences in the students' clothing. A traditional scheme would use face recognition to judge whether the same batch of people is present, but in the student camera pictures the face region is small, while regions such as clothing are much larger than the face region, as shown in Fig. 3. Existing deep-learning detection and recognition models achieve very low accuracy on such small targets (faces), whereas their accuracy on large targets (clothing and the like) is much higher. Taken together, judging class switching by clothing detection therefore greatly improves accuracy compared with judging by faces.
Next, a clothing detection algorithm in an embodiment of the present disclosure will be described.
In the embodiment of the disclosure, the clothing detection algorithm is a multi-label algorithm. In a conventional detection algorithm, each output target type has exactly one label, as in the network on the left side of Fig. 4. In the clothing detection scene, however, a major class such as "jacket" contains middle classes of styles, and each middle class in turn contains color subclasses or finer subclasses. If every combination were flattened into its own category, the network would have far too many outputs, becoming redundant and difficult to train. The embodiment of the disclosure exploits the feature extraction hierarchy of the convolutional neural network, producing outputs at different depths to solve the problem of excessive outputs; the structure of this convolutional neural network is shown as the network on the right side of Fig. 4.
With continued reference to Fig. 4, the detection network in the disclosed embodiment uses the recent RedNet18 backbone instead of a ResNet50 backbone, because RedNet uses the involution ("inner convolution") operator, which improves the final output accuracy relative to ordinary convolution. Moreover, the algorithm of the disclosed embodiment uses this intermediate-output design, emitting the labels of different levels at different depths. For example, a photo containing clothing is input, and the major-class result (such as "blue jacket") is output after the first RedNet backbone; fully connected layer 1 then continues through the second RedNet backbone to output the middle-class result (such as a long slim-fit style); finally, fully connected layer 2 passes through the third RedNet backbone, which outputs the subclass results (such as buttons, tie decorations, and the like). This also conforms to the usual reasoning about deep learning networks: tiny features (buttons and the like) require deeper models to extract.
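As an illustration, a minimal PyTorch sketch of this cascaded multi-level design follows. The stage modules stand in for the three RedNet backbones, and all class counts and feature dimensions are illustrative assumptions rather than the patent's configuration; one plausible reading of Fig. 4 is that each stage continues from the previous stage's feature map, with one head tapping the features at each depth:

```python
import torch.nn as nn

class HierarchicalClothingNet(nn.Module):
    """Cascaded backbone stages with one classification head per level of
    the label hierarchy (major class -> middle class -> subclass)."""
    def __init__(self, stage1, stage2, stage3, dims=(256, 512, 1024),
                 n_major=10, n_middle=20, n_sub=40):
        super().__init__()
        self.stage1, self.stage2, self.stage3 = stage1, stage2, stage3
        self.pool = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.fc1 = nn.Linear(dims[0], n_major)       # e.g. "blue jacket"
        self.fc2 = nn.Linear(dims[1], n_middle)      # e.g. "long slim-fit"
        self.fc3 = nn.Linear(dims[2], n_sub)         # e.g. "buttons, tie"

    def forward(self, x):
        f1 = self.stage1(x)                          # shallow features
        major = self.fc1(self.pool(f1).flatten(1))   # level-1 logits
        f2 = self.stage2(f1)
        middle = self.fc2(self.pool(f2).flatten(1))  # level-2 logits
        f3 = self.stage3(f2)                         # tiny features need depth
        sub = self.fc3(self.pool(f3).flatten(1))     # level-3 logits
        return major, middle, sub
```

Each stage must output the channel count given in dims; a softmax over each returned logit vector gives the per-level classification result described above.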
In the embodiment of the disclosure, the feature data of each fully connected layer may be processed with a softmax function to obtain the classification result of that level; in practical applications other processing may also be used, which the disclosure does not limit.
In the embodiment of the disclosure, when calculating the loss functions of the different levels, the cross-entropy loss of each level is computed as that level's loss function, and the loss function of the convolutional neural network is the sum of the loss functions of the different levels. The loss function of the network can be expressed as loss = Σ_k crossentropy_k, k = 1, ..., K, where K is the number of output levels; for example, with the three levels (major, middle, sub) shown in Fig. 4, K = 3, and crossentropy = -Σ_x p(x) log q(x). Because the classification of each level is computed as its own loss term and the terms are then summed, rather than lumping all classes into a single loss, the result is more accurate.
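A minimal sketch of this summed loss, assuming the per-level logits and integer class targets come from a network like the one sketched above:

```python
import torch.nn.functional as F

def hierarchical_loss(level_logits, level_targets):
    # loss = sum_k crossentropy_k over the K output levels (here K = 3:
    # major, middle, sub); F.cross_entropy applies log-softmax internally.
    return sum(F.cross_entropy(logits, target)
               for logits, target in zip(level_logits, level_targets))
```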
In one possible implementation, the categories of the sub-videos of the different classes may also be marked in conjunction with the video of the teaching area, for example: the current class is a Chinese class, a physics class, and so on.
In the embodiment of the disclosure, after the video of the teaching area is obtained, photos at different time points are captured from it; the categories of the teaching content at the different time points are then identified based on these photos and a preset teaching content identification algorithm; and the categories of the sub-videos of the different classes are marked according to the categories of the teaching content at the different time points, combined with the class start and end times corresponding to the different classes.
In the embodiment of the disclosure, based on the photos captured at different time points from the video of the teaching area, a convolutional neural network can be used to classify each input photo, outputting the position of the teaching content within the photo and the category of the teaching content.
In one possible implementation, CornerNet can be used directly to detect and classify the teacher's PPT teaching area, as shown in Fig. 5. This disclosure uses CornerNet as the main detection-plus-classification network; the key point of the scheme is that CornerNet's backbone is replaced with RedNet50 to obtain higher accuracy. The original CornerNet backbone is the Hourglass Network, which is based mainly on ordinary convolution, whereas in the scheme of the disclosed embodiment RedNet is based on involution, which increases the overall accuracy of the network. In the embodiment of the disclosure, the teaching area may be a PPT, blackboard writing, or the like; the embodiment does not limit the form of the teaching area.
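For concreteness, a minimal PyTorch sketch of a single involution layer in the spirit of RedNet follows (stride 1; the kernel size, group count, and reduction ratio are illustrative assumptions, not the patent's configuration):

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Minimal involution layer: the kernel is generated from the input at
    each spatial position and shared across channels within a group, the
    inverse of convolution's sharing pattern."""
    def __init__(self, channels, kernel_size=7, groups=4, reduction=4):
        super().__init__()
        self.k, self.groups = kernel_size, groups
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
        )
        # one K*K kernel per group, per spatial position
        self.span = nn.Conv2d(channels // reduction,
                              kernel_size * kernel_size * groups, 1)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        kernels = self.span(self.reduce(x))               # B, K*K*G, H, W
        kernels = kernels.view(b, self.groups, 1, self.k * self.k, h, w)
        patches = self.unfold(x)                          # B, C*K*K, H*W
        patches = patches.view(b, self.groups, c // self.groups,
                               self.k * self.k, h, w)
        out = (kernels * patches).sum(dim=3)              # weighted window sum
        return out.view(b, c, h, w)
```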
As shown in Fig. 6, in the embodiment of the disclosure the PPT content is classified directly with a classification network: a camera picture (a photo captured from the video of the teaching area) is input, and after passing through the CornerNet network the output is the upper-left corner coordinates of the PPT region box in the picture, the width and height of the box (x0, y0, w, h), and the classification category; for example, if the current class is a Chinese class, the corresponding classification label is output.
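A small sketch of how such per-snapshot subject labels could then be attached to the cut sub-videos; the majority-vote rule is an assumption made for the sketch:

```python
from collections import Counter

def label_lessons(lessons, snapshot_times, snapshot_subjects):
    """Give each cut lesson a category by majority vote over the subjects
    predicted from teaching-area snapshots inside its (start, end) window."""
    labels = []
    for start, end in lessons:
        votes = [subj for t, subj in zip(snapshot_times, snapshot_subjects)
                 if start <= t <= end]
        labels.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return labels
```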
Through the scheme of the embodiments of the disclosure, intelligent discrimination, cutting, and category marking of different classes can be realized, while ensuring both the speed and the accuracy of cutting.
Based on the same inventive concept, as shown in fig. 7, the embodiment of the disclosure further provides a classroom scene classroom switching discriminating apparatus 700, including:
an obtaining module 701, used to acquire video of the student area and video of the teaching area over the same time period, collected by image acquisition units arranged in the classroom, and to capture photos at different time points from the video of the student area;
a statistics module 702, used to count the dressing classifications and the position distribution of the dressing of the students in the classroom at different time points, based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm;
a discrimination module 703, used to arrange the dressing classifications and the dressing position distributions of the students at different time points in time order, and to discriminate the different classes in the video of the student area and the class start and end times corresponding to the different classes;
and a video cutting module 704, used to cut sub-videos of the different classes from the video of the teaching area according to the class start and end times corresponding to the different classes contained in the video of the student area and the time correspondence between the video of the student area and the video of the teaching area.
The specific manner in which the modules of the apparatus in the above embodiment perform their operations has been described in detail in the embodiments of the method and will not be elaborated here.
Fig. 8 is a block diagram of an electronic device 800, according to an example embodiment. As shown in fig. 8, the electronic device 800 may include: a processor 801, a memory 802. The electronic device 800 may also include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.
The processor 801 is used to control the overall operation of the electronic device 800 to perform all or part of the steps described above. The memory 802 is used to store various types of data supporting operation of the electronic device 800, which may include, for example, instructions for any application or method operating on the electronic device 800, as well as application-related data such as contacts, sent and received messages, pictures, audio, video, and so on. The memory 802 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen; the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; received audio signals may further be stored in the memory 802 or transmitted through the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 805 is used for wired or wireless communication between the electronic device 800 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, 5G, NB-IoT, eMTC, or others, or a combination of several of them, which is not limited here. The corresponding communication component 805 may thus include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the classroom scene class switching discrimination method described above.
In another exemplary embodiment, there is also provided a computer readable storage medium including program instructions which, when executed by a processor, implement the steps of the classroom scene class switch discrimination method described above. For example, the computer readable storage medium may be the memory 802 including the program instructions described above, which are executable by the processor 801 of the electronic device 800 to perform the classroom scene class switch determination method described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the classroom scene class switch discrimination method described above when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the embodiments described above, and various simple modifications may be made to the technical solutions of the present disclosure within the scope of the technical concept of the present disclosure, and all the simple modifications belong to the protection scope of the present disclosure.
In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, the present disclosure does not further describe various possible combinations.
Moreover, any combination of the various embodiments of the present disclosure is possible as long as it does not depart from the spirit of the present disclosure, and such combinations should likewise be regarded as content disclosed by the present disclosure.

Claims (12)

1. A classroom scene class switching discriminating method is characterized by comprising the following steps:
acquiring video of a student area and video of a teaching area over the same time period, collected by image acquisition units arranged in the classroom, and capturing photos at different time points from the video of the student area;
based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm, counting the dressing classifications and the position distribution of the dressing of the students in the classroom at the different time points;
arranging the dressing classifications and the dressing position distributions of the students at the different time points in time order, and discriminating the different classes in the video of the student area and the class start and end times corresponding to the different classes;
and cutting sub-videos of the different classes from the video of the teaching area according to the class start and end times corresponding to the different classes contained in the video of the student area and the time correspondence between the video of the student area and the video of the teaching area.
2. The method of claim 1, wherein counting the dressing classifications and the position distribution of the dressing of the students in the classroom at different time points, based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm, comprises:
for the photos captured at different time points from the video of the student area, extracting the region of interest corresponding to each student from the photo by means of a target detection algorithm, and recording the position distribution of each region of interest within the photo;
and for each region of interest, obtaining the dressing classification of each student by means of the clothing detection algorithm, and thereby counting the dressing classifications of the students in the classroom at the different time points.
3. The method of claim 1, wherein the clothing detection algorithm is a multi-label detection algorithm; the multi-label detection algorithm uses a convolutional neural network to extract features from the input photo and outputs classification results of different levels at different depths of the convolutional neural network;
wherein the number of levels of the convolutional neural network is greater than or equal to 2, and the classification results of the different levels are used to characterize the dressing style of the students.
4. The method of claim 3, wherein the convolutional neural network comprises a plurality of backbone networks; the photo passes through the first of the backbone networks to generate the feature data of a first fully connected layer, and the feature data of the first fully connected layer is processed with a softmax function to obtain the corresponding first-level classification result;
the feature data of the first fully connected layer is processed through the second of the backbone networks to generate the feature data of a second fully connected layer, and the feature data of the second fully connected layer is processed with a softmax function to obtain the corresponding second-level classification result;
and so on through the remaining backbone networks, until the convolutional neural network has output the classification results of all levels.
5. The method of claim 3, wherein the loss function of the convolutional neural network is the sum of the loss functions of the different levels;
and when calculating the loss functions of the different levels, the cross-entropy loss of each level is computed and used as that level's loss function.
6. The method of claim 3, wherein the backbone network of the convolutional neural network is a network built from involution layers, which use the involution ("inner convolution") operator.
7. The method of any one of claims 1-6, wherein the method further comprises:
Capturing photos at different time points from the video of the teaching area;
identifying the categories of the teaching content at the different time points, based on the photos captured at different time points from the video of the teaching area and a preset teaching content identification algorithm;
and marking the categories of the sub-videos of the different classes according to the categories of the teaching content at the different time points, combined with the class start and end times corresponding to the different classes.
8. The method of claim 7, wherein identifying the categories of the teaching content at different time points, based on the photos captured at different time points from the video of the teaching area and a preset teaching content identification algorithm, comprises:
for the photos captured at different time points from the video of the teaching area, classifying each input photo with a convolutional neural network, and outputting the position of the teaching content within the photo and the category of the teaching content.
9. The method of claim 8, wherein the backbone network of the convolutional neural network is a network built from involution layers, which use the involution ("inner convolution") operator.
10. A classroom scene class switching discriminating device, characterized by comprising:
an acquisition module, used to acquire video of the student area and video of the teaching area over the same time period, collected by image acquisition units arranged in the classroom, and to capture photos at different time points from the video of the student area;
a statistics module, used to count the dressing classifications and the position distribution of the dressing of the students in the classroom at different time points, based on the photos captured at different time points from the video of the student area and a preset clothing detection algorithm;
a discrimination module, used to arrange the dressing classifications and the dressing position distributions of the students at different time points in time order, and to discriminate the different classes in the video of the student area and the class start and end times corresponding to the different classes;
and a video cutting module, used to cut sub-videos of the different classes from the video of the teaching area according to the class start and end times corresponding to the different classes contained in the video of the student area and the time correspondence between the video of the student area and the video of the teaching area.
11. A non-transitory computer readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-9.
12. An electronic device, comprising:
A memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1-9.
CN202210102525.0A 2022-01-27 2022-01-27 Classroom scene classroom switching discriminating method and device and electronic equipment Active CN114519789B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210102525.0A | 2022-01-27 | 2022-01-27 | Classroom scene classroom switching discriminating method and device and electronic equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210102525.0A | 2022-01-27 | 2022-01-27 | Classroom scene classroom switching discriminating method and device and electronic equipment

Publications (2)

Publication Number | Publication Date
CN114519789A (en) | 2022-05-20
CN114519789B | 2024-05-24

Family

ID=81597546

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210102525.0A (Active, granted as CN114519789B) | Classroom scene classroom switching discriminating method and device and electronic equipment | 2022-01-27 | 2022-01-27

Country Status (1)

Country Link
CN (1) CN114519789B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007003128A1 (en) * 2005-07-01 2007-01-11 Weigiao Huang A network teaching system and method
CN110599835A (en) * 2019-09-25 2019-12-20 淄博职业学院 Interactive computer remote education system
CN112001251A (en) * 2020-07-22 2020-11-27 山东大学 Pedestrian re-identification method and system based on combination of human body analysis and clothing color
CN112132079A (en) * 2020-09-29 2020-12-25 中国银行股份有限公司 Method, device and system for monitoring students in online teaching
CN112200818A (en) * 2020-10-15 2021-01-08 广州华多网络科技有限公司 Image-based dressing area segmentation and dressing replacement method, device and equipment
CN113052085A (en) * 2021-03-26 2021-06-29 新东方教育科技集团有限公司 Video clipping method, video clipping device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114519789A (en) 2022-05-20

Similar Documents

Publication Publication Date Title
CN109697416B (en) Video data processing method and related device
CN111507283B (en) Student behavior identification method and system based on classroom scene
CN106874826A (en) Face key point-tracking method and device
CN110837795A (en) Teaching condition intelligent monitoring method, device and equipment based on classroom monitoring video
CN111739027B (en) Image processing method, device, equipment and readable storage medium
CN113793336B (en) Method, device and equipment for detecting blood cells and readable storage medium
CN112633313B (en) Bad information identification method of network terminal and local area network terminal equipment
CN111061898A (en) Image processing method, image processing device, computer equipment and storage medium
CN107330455A (en) Image evaluation method
CN106650670A (en) Method and device for detection of living body face video
CN112446437A (en) Goods shelf commodity specification identification method based on machine vision
CN111160277A (en) Behavior recognition analysis method and system, and computer-readable storage medium
CN110543811A (en) non-cooperation type examination person management method and system based on deep learning
CN206557851U (en) A kind of situation harvester of listening to the teacher of imparting knowledge to students
CN112487981A (en) MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN113705510A (en) Target identification tracking method, device, equipment and storage medium
Han et al. Improved visual background extractor using an adaptive distance threshold
CN115719516A (en) Multichannel-based classroom teaching behavior identification method and system
CN111178263B (en) Real-time expression analysis method and device
CN111353439A (en) Method, device, system and equipment for analyzing teaching behaviors
CN106682669A (en) Image processing method and mobile terminal
CN114519789B (en) Classroom scene classroom switching discriminating method and device and electronic equipment
CN111898525B (en) Construction method of smoke identification model, and method and device for detecting smoke
CN117541442A (en) Teaching attendance management method, device, equipment and storage medium
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant