CN112597800B - Method and system for detecting sitting-up actions of students in recording and broadcasting system - Google Patents


Info

Publication number
CN112597800B
CN112597800B (application CN202011327975.7A)
Authority
CN
China
Prior art keywords
motion
sitting
judgment
standing
angle
Prior art date
Legal status
Active
Application number
CN202011327975.7A
Other languages
Chinese (zh)
Other versions
CN112597800A (en)
Inventor
张进
蒋守欢
廖亮亮
王满海
Current Assignee
ANHUI TELEHOME DIGITAL TECHNOLOGY CO LTD
Original Assignee
ANHUI TELEHOME DIGITAL TECHNOLOGY CO LTD
Priority date
Filing date
Publication date
Application filed by ANHUI TELEHOME DIGITAL TECHNOLOGY CO LTD
Priority claimed from CN202011327975.7A
Publication of CN112597800A
Application granted
Publication of CN112597800B
Status: Active

Classifications

    • G06V 40/20: Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06V 10/50: Extraction of image or video features by performing operations within image blocks or by using histograms, e.g. histogram of oriented gradients [HoG]
    • G06V 40/166: Human faces; detection, localisation or normalisation using acquisition arrangements
    • G06V 40/168: Human faces; feature extraction and face representation
    • Y02D 30/70: Reducing energy consumption in wireless communication networks


Abstract

The invention discloses a method and a system for detecting the standing and sitting actions of students in a recording and broadcasting system, comprising the following steps: S100, acquiring a frame of image; S200, preprocessing the acquired image; S300, performing foreground extraction on the preprocessed image to obtain a motion history image; S400, judging possible targets in the motion history image and storing them; S500, performing two-stage judgment on the stored information through a judgment module to obtain the coordinates of standing and sitting. The method detects standing and sitting actions based on the motion history image and the gradient direction, and adopts two-stage judgment to improve detection accuracy. The motion history image overcomes the incomplete-target problem of background-modeling methods, the global motion angle is then extracted from the complete target, and the final decision is made by the two-stage discrimination method.

Description

Method and system for detecting sitting-up actions of students in recording and broadcasting system
Technical Field
The invention relates to the technical field of motion detection, in particular to a method and a system for detecting the standing and sitting actions of students in a recording and broadcasting system.
Background
In intelligent recording and broadcasting, student interaction is monitored mainly by detecting the students' standing and sitting actions; some detection devices use capacitive pressure sensing and similar hardware.
To save cost, methods based on vision processing are receiving increasing attention. CN102096930A uses background modeling to determine the moving target and then template matching to decide whether a student stands up or sits down; it does not consider interference from other movements at all, so its false-detection rate is high. CN110728696A first selects the moving object by background modeling, then makes the standing judgment with feature points and sparse optical flow, and finally gives the result according to a preset line. This method has several disadvantages: 1. a simple background-modeling method can hardly extract a complete moving object in a complex environment such as a classroom; 2. feature points are easily disturbed by lighting and clothing, and the optical-flow method is computationally expensive; 3. the preset-line method is too restrictive: if the camera moves even slightly, the entire decision condition fails.
Disclosure of Invention
The invention provides a method and a system for detecting the standing and sitting actions of students in a recording and broadcasting system, which overcome the above technical defects.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method for detecting sitting actions of students in a recording and broadcasting system comprises the following steps:
s100, acquiring a frame of image;
s200, preprocessing the acquired image;
s300, performing foreground extraction on the preprocessed image to obtain a motion history image;
s400, judging and storing a possibility target of the motion history graph;
s500, two-stage judgment is carried out on the stored information through a judgment module to obtain coordinates of standing and sitting.
Further, preprocessing the acquired image in S200 includes scaling the height and width of the acquired 4K image to 1/4 of the original height and width, respectively, and then performing gaussian filtering.
Further, the step S300 is to extract the foreground of the preprocessed image to obtain a motion history map;
the method specifically comprises the following steps:
(c1) The inter-frame difference method takes the absolute value of the pixel-value difference between each point in the current frame and the corresponding point of a previous frame; the two-frame difference method uses the previous frame, the three-frame difference method the frame before that, and so on. The formula is shown in equation (1):
F(x,y) = |I(x,y) - I_pre(x,y)|   (1)
where F(x,y) is the result after the frame difference, |·| denotes the absolute value, I(x,y) is the pixel value of the current frame at coordinate (x,y), and I_pre(x,y) is the pixel value of the previous frame at (x,y);
specifically, an annular buffer buf storing 4 frames of images is created; the images in buf[i] and buf[i+2] are taken out, and the result after the frame difference is calculated with equation (1);
(c2) The motion history image sets pixels where motion occurred to the current timestamp and clears pixels whose last motion happened too long ago, as shown in equation (2):
mhi(x,y) = timestamp, if silh(x,y) ≠ 0;  mhi(x,y) = 0, if silh(x,y) = 0 and mhi(x,y) < timestamp - duration;  mhi(x,y) unchanged, otherwise.   (2)
Specifically, a binarization threshold of 13 is set and F(x,y) is binarized to obtain silh(x,y). When silh(x,y) ≠ 0, the motion history image mhi(x,y) takes the current system time timestamp; when silh(x,y) = 0 and mhi(x,y) is smaller than the current timestamp minus the duration, mhi(x,y) is set to 0; otherwise mhi(x,y) is kept unchanged;
the binarized image of the moving object is obtained through the steps, 3*3 morphological corrosion and expansion processing is carried out on the binarized image, and then the circumscribed rectangle of the moving object is extracted to be used as input for judging the possible standing and sitting object module.
Further, in S300 the value of the duration variable is set between 0.3 and 0.5 s according to the actual seating situation of the students in the classroom scene.
Further, step S400, judging possible targets in the motion history image and storing them, specifically comprises the following:
the upward angle of standing is set to 80 < angle < 170, and the downward angle of sitting to 200 < angle < 300;
the specific angle solving comprises gradient direction solving and global motion direction solving;
wherein,
(d1) Gradient direction solving
The gradient direction is calculated as shown in equation (3):
angle(x,y) = fastAtan2(∂mhi/∂y, ∂mhi/∂x)   (3)
Specifically, the Sobel operator is used to compute ∂mhi/∂x and ∂mhi/∂y respectively; the fastAtan2 function then computes the arctangent of the two derivatives to obtain the gradient direction of the motion history image;
(d2) Global motion direction solving
The global motion direction is the average direction of the selected region, from which an angle value between 0 and 360 is obtained;
the average direction is calculated from a weighted orientation histogram. The weight formula is ω = a·mhi(x,y) + b, with b = 1 - t·a and a = 1/dt (so that ω = 1 for pixels updated at the current timestamp and ω = 0 for pixels about to expire), where t is the timestamp in mhi and dt the duration in mhi; thus the most recent motion carries the greatest weight and motion that occurred further in the past carries less;
specifically, the gradient direction map is divided into 12 equal parts over 0 to 360 degrees to obtain an orientation histogram; the coordinate of the histogram maximum is taken as the basic direction; the weights of the motion-history pixels are computed from the weight formula and the initialized weight coefficients; the weighted relative offset from the basic direction is calculated; and the final motion-direction angle is obtained from the offset plus the basic direction.
Further, in step S500 the stored information is judged in two stages by the judgment module to obtain the coordinates of standing and sitting; the two-stage judgment mode is adopted to improve detection precision.
The first-stage judgment uses the continuity of standing and sitting motions: the motion direction of a target is computed over consecutive frames. In the second stage, the standing judgment adds face detection on top of the first-stage result to determine the final decision, while the sitting judgment uses HOG feature-similarity comparison.
Further, in the first-stage judgment, based on the characteristics of students' standing and sitting motions, an upward movement sustained for 10 consecutive frames is considered a standing action and a downward movement sustained for 10 consecutive frames a sitting action.
First, motion judgment between similar regions of adjacent frames is carried out: a feature vector is initialized to store the motion angles and motion coordinate regions of all possible targets. If the motion angle of a foreground region extracted from the current frame satisfies 80 < angle < 170, its coordinates are compared with all coordinates of the previous frame in a non-maximum-suppression manner; two regions are judged similar when the ratio of their overlap area to the smaller of the two areas is greater than 0.5. Considering interference during the motion, the angle may temporarily leave the window, so the frame count is cleared only when the condition 80 < angle < 170 fails for 3 consecutive frames. When a feature vector reaches a frame count greater than 10, all angles stored in the vector are further examined, and only when enough frames satisfy 80 < angle < 95 is the motion judged to be vertical (standing);
the sitting judgment follows the same logic with the downward angle window; at the final determination it is only necessary to check whether the frame count of the feature vector exceeds 10 frames, in which case a sitting action is determined.
Further, the second stage uses face detection for the standing judgment:
10 frames within two seconds at the position determined in the first stage are taken out and combined into one image for detection; if two or more faces are detected and their geometric positions lie in the upper half of the selected region, a standing action is determined;
for the sitting action, the HOG features extracted from each frame after the first-stage position determination are compared for similarity with the HOG features of the stored image; if the similarity value is less than 0.7, sitting is determined.
On the other hand, the invention also discloses a system for detecting the sitting action of students in the recording and broadcasting system, which comprises the following units,
the image acquisition module is used for acquiring images in the lesson scene of the student;
the image preprocessing module is used for preprocessing the acquired image;
the foreground extraction module is used for extracting the foreground of the preprocessed image to obtain a motion history image;
the possibility standing sitting target judging module is used for judging and storing a possibility target of the motion history map;
and the judging module is used for carrying out two-stage judgment on the stored information through the judging module to obtain the coordinates of standing and sitting.
According to the technical scheme, the method for detecting the standing and sitting actions of students in a recording and broadcasting system provided by the invention is based on the motion history image and the gradient direction, and adopts two-stage judgment to improve detection accuracy. The motion history image overcomes the incomplete-target problem of background-modeling methods, the global motion angle is then extracted from the complete target, and the final decision is made by the two-stage discrimination method.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention uses the three-frame difference method together with the motion history image to determine the moving object, which both reduces computational complexity and solves the problem of incomplete moving-object detection. Since multi-frame information is ultimately needed for the judgment, complete targets greatly reduce the complexity of the subsequent decision;
(2) The two-stage discrimination method greatly improves the detection rate. The first-stage discrimination filters out interference from slight movements, but larger-amplitude actions such as raising a hand may still cause misjudgment; adding face detection over several consecutive combined frames avoids the instability of single-frame detection and reduces interference caused by viewing angle;
(3) The invention is simple to deploy: only one pan-tilt camera is needed, extra operations such as drawing reference lines are omitted, and construction complexity is reduced.
Drawings
FIG. 1 is a flow schematic of the method of the present invention;
FIG. 2 is a schematic diagram of a first level decision flow of the present invention;
FIG. 3 is a schematic diagram of the second-stage standing decision by face detection of the present invention;
fig. 4 is a schematic diagram of a sitting motion determination flow of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention.
As shown in fig. 1, in the method for detecting the standing and sitting actions of students in a recording and playing system according to this embodiment, the whole process flow includes: collecting a frame of image; preprocessing the image; performing foreground extraction on the preprocessed image to obtain a motion history image; judging possible targets in the motion history image and storing them; and performing two-stage judgment on the stored information through the judgment module to obtain the coordinates of standing and sitting.
The following is a specific description:
(a) Image acquisition module
Images in the student classroom scene are collected.
(b) Image preprocessing module
The image preprocessing module mainly comprises the steps of scaling and filtering the image, specifically, scaling the height and width of the acquired 4K image to 1/4 of the original height and width respectively, and then performing Gaussian filtering processing.
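As an illustrative sketch (not part of the patent), the preprocessing step can be implemented roughly as follows. A real pipeline would use cv2.resize and cv2.GaussianBlur; this NumPy-only version substitutes plain subsampling and a separable 3×3 binomial kernel for those calls:

```python
import numpy as np

def preprocess(frame):
    """Scale height and width to 1/4 and apply 3x3 Gaussian smoothing.

    cv2.resize + cv2.GaussianBlur would be used in practice; here we
    subsample and convolve a separable kernel to stay self-contained.
    """
    small = frame[::4, ::4].astype(np.float64)   # 4K frame -> quarter size
    k = np.array([1.0, 2.0, 1.0]) / 4.0          # 1-D binomial (Gaussian) kernel
    pad = np.pad(small, 1, mode="edge")          # replicate borders
    rows = k[0] * pad[:, :-2] + k[1] * pad[:, 1:-1] + k[2] * pad[:, 2:]
    return k[0] * rows[:-2, :] + k[1] * rows[1:-1, :] + k[2] * rows[2:, :]
```

The smoothing kernel and border handling are assumptions; the 1/4 scaling factor comes from the text.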
(c) Foreground extraction module
The foreground extraction module is one of the key steps of standing and sitting action detection and localization; subsequent operations are judged on the basis of the foreground target. As discussed above, a simple background-modeling method can hardly extract a complete target, so the combination of the frame-difference method and the motion history image is adopted to solve the completeness problem of target extraction.
(c1) The basic idea of the inter-frame difference method is to take the absolute value of the pixel-value difference between each point in the current frame and the corresponding point of a previous frame; the two-frame difference method uses the previous frame, the three-frame difference method the frame before that, and so on. The formula is shown in equation (1):
F(x,y) = |I(x,y) - I_pre(x,y)|   (1)
where F(x,y) is the result after the frame difference, |·| denotes the absolute value, I(x,y) is the pixel value of the current frame at coordinate (x,y), and I_pre(x,y) is the pixel value of the previous frame at (x,y).
Specifically, an annular buffer buf storing 4 frames of images is created; the images in buf[i] and buf[i+2] are taken out, and the result after the frame difference is calculated with equation (1).
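A minimal sketch of this ring-buffer frame difference (the 4-frame buffer and the buf[i] versus buf[i+2] pairing follow the text; the deque-based implementation is an illustrative assumption):

```python
import numpy as np
from collections import deque

buf = deque(maxlen=4)  # annular buffer holding the last 4 frames

def push_and_diff(frame):
    """Store a frame; once at least 3 are buffered, return the absolute
    difference between frames two apart, i.e. equation (1) with I_pre
    taken two frames back (three-frame style)."""
    buf.append(frame)
    if len(buf) < 3:
        return None
    a, b = buf[-3], buf[-1]  # buf[i] and buf[i+2]
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)
```

Casting to int16 before subtracting avoids uint8 wrap-around on negative differences.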
(c2) The basic idea of the motion history image is to set pixels where motion occurred to the current timestamp and to clear pixels whose last motion happened too long ago, as shown in equation (2):
mhi(x,y) = timestamp, if silh(x,y) ≠ 0;  mhi(x,y) = 0, if silh(x,y) = 0 and mhi(x,y) < timestamp - duration;  mhi(x,y) unchanged, otherwise.   (2)
Specifically, a binarization threshold of 13 is set and F(x,y) is binarized to obtain silh(x,y). When silh(x,y) ≠ 0, mhi(x,y) takes the current system time timestamp; when silh(x,y) = 0 and mhi(x,y) is smaller than the current timestamp minus the duration, mhi(x,y) is set to 0; otherwise mhi(x,y) remains unchanged. The value of the duration variable is typically between 0.3 and 0.5 s, depending on the actual seating situation of the students in the classroom scene.
Through the above steps the binarized image of the moving object is obtained; a 3×3 morphological erosion and dilation is applied to it, and the bounding rectangles of the moving objects are then extracted as input to the possible standing/sitting target judgment module.
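The MHI update of equation (2), with the binarization threshold from the text, can be sketched as follows (the erosion/dilation and bounding-rectangle steps are omitted; parameter names are illustrative):

```python
import numpy as np

THRESH = 13  # binarization threshold from the text

def update_mhi(mhi, fdiff, timestamp, duration=0.5):
    """Equation (2): moving pixels take the current timestamp; pixels
    whose last motion is older than `duration` are cleared to 0."""
    silh = fdiff >= THRESH               # binarized silhouette of motion
    out = mhi.copy()
    out[silh] = timestamp                # motion -> current timestamp
    stale = ~silh & (out < timestamp - duration)
    out[stale] = 0.0                     # expired motion -> 0
    return out                           # all other pixels unchanged
```

OpenCV's motion-template module (cv2.motempl.updateMotionHistory in opencv-contrib) implements the same update.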
(d) Target judgment module for possible standing and sitting
The foreground extraction module outputs a number of possible target bounding rectangles. According to the characteristics of the standing and sitting actions, these rectangles are taller than they are wide, so rectangles that do not match upward or downward motion are excluded first.
Students' standing and sitting actions are continuous processes: standing is an upward movement and sitting a downward movement. However, considering that the actions are not always standard (some students lean forward in the first half of standing up, some sway left and right), the standing angle is set to 80 < angle < 170 and the sitting angle to 200 < angle < 300 in order to cover all possible standing and sitting actions.
Specific angle solutions include gradient direction solutions and global motion direction solutions.
(d1) Gradient direction solving
The gradient direction is calculated as shown in equation (3):
angle(x,y) = fastAtan2(∂mhi/∂y, ∂mhi/∂x)   (3)
Specifically, the Sobel operator is used to compute ∂mhi/∂x and ∂mhi/∂y respectively; the fastAtan2 function then computes the arctangent of the two derivatives to obtain the gradient direction of the motion history image.
(d2) Global motion direction solution
The basic idea of the global motion direction is to calculate the average direction of the selected region, from which an angle value between 0 and 360 is obtained. The average direction is calculated from a weighted orientation histogram. The weight formula is ω = a·mhi(x,y) + b, with b = 1 - t·a and a = 1/dt (so that ω = 1 for pixels updated at the current timestamp and ω = 0 for pixels about to expire), where t is the timestamp in mhi and dt the duration in mhi; from the formula, the most recent motion carries the greatest weight and motion that occurred further in the past carries less.
Specifically, the gradient direction map is divided into 12 equal parts over 0 to 360 degrees to obtain an orientation histogram; the coordinate of the histogram maximum is taken as the basic direction; the weights of the motion-history pixels are computed from the weight formula and the initialized weight coefficients; the weighted relative offset from the basic direction is calculated; and the final motion-direction angle is obtained from the offset plus the basic direction.
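The global-orientation procedure above can be sketched as follows. The 12-bin histogram, the peak-as-basic-direction rule, and the weighted offset follow the text; the exact weight coefficients (a = 1/duration, b = 1 - timestamp·a) are an assumption consistent with the reconstructed formula:

```python
import numpy as np

BINS = 12  # 0..360 degrees split into 12 bins, as in the text

def global_orientation(angles, mhi, timestamp, duration):
    """Weighted average direction of a region: histogram peak as the
    basic direction, plus a time-weighted relative offset."""
    mask = mhi > 0
    if not mask.any():
        return 0.0
    ang = angles[mask]
    hist, edges = np.histogram(ang, bins=BINS, range=(0.0, 360.0))
    base = edges[np.argmax(hist)] + 360.0 / BINS / 2.0  # peak bin centre
    a = 1.0 / duration                 # assumed coefficient
    b = 1.0 - timestamp * a            # b = 1 - t*a from the text
    w = np.clip(mhi[mask] * a + b, 0.0, 1.0)   # recent motion weighs more
    rel = np.mod(ang - base + 180.0, 360.0) - 180.0  # offset in (-180, 180]
    shift = (w * rel).sum() / max(w.sum(), 1e-9)
    return float(np.mod(base + shift, 360.0))
```

OpenCV's cv2.motempl.calcGlobalOrientation performs the equivalent computation on real motion templates.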
(e) Decision module
The judgment module makes the final decision from the foreground content determined in the previous steps and the solved motion direction. The design of this module determines detection precision, so a two-stage judgment mode is adopted. The first stage uses the continuity of standing and sitting motions, judging them by computing the motion direction of a target over consecutive frames; in the second stage, the standing judgment adds face detection on top of the first-stage result to determine the final decision, while the sitting judgment uses HOG feature-similarity comparison.
The first-stage judgment, based on the characteristics of students' standing and sitting motions, considers an upward movement trend sustained for 10 consecutive frames to be a standing action and a downward trend sustained for 10 consecutive frames a sitting action, as shown in fig. 2. In the specific steps, motion judgment between similar regions of adjacent frames is performed first: a feature vector is initialized to store the motion angles and motion coordinate regions of all possible targets. If the motion angle of a foreground region extracted from the current frame satisfies 80 < angle < 170, its coordinates are compared with all coordinates of the previous frame in a non-maximum-suppression manner; two regions are judged similar when the ratio of their overlap area to the smaller of the two areas is greater than 0.5. Considering interference generated during the motion, the angle of up to 3 consecutive frames may fail to satisfy 80 < angle < 170, so the frame count is cleared only when the condition fails for 3 consecutive frames. When a feature vector reaches a frame count greater than 10, all angles stored in the vector are further examined, and only when enough frames satisfy 80 < angle < 95 is the motion judged to be vertical (standing). The sitting judgment is logically similar; at the final determination it is only necessary to check whether the frame count of the feature vector exceeds 10 frames, in which case a sitting action is determined.
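The first-stage logic can be sketched as below. The overlap test, the angle windows, the 3-frame tolerance, and the 10-frame count follow the text; the "enough near-vertical frames" criterion (here: at least half) is an assumption, since the original is ambiguous on the exact threshold:

```python
def overlap_ratio(r1, r2):
    """Intersection area divided by the smaller rectangle's area.
    Rectangles are (x, y, w, h)."""
    x1, y1 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x2 = min(r1[0] + r1[2], r2[0] + r2[2])
    y2 = min(r1[1] + r1[3], r2[1] + r2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    smaller = min(r1[2] * r1[3], r2[2] * r2[3])
    return inter / smaller if smaller else 0.0

class StandTracker:
    """First-stage standing judgment: more than 10 matched frames with
    80 < angle < 170, tolerating up to 2 consecutive off-angle frames
    (the count resets on the 3rd, per the text)."""
    def __init__(self):
        self.count = 0       # consecutive matched frames
        self.misses = 0      # consecutive frames outside the window
        self.angles = []     # angles stored for the final check

    def update(self, angle, region_matches):
        if 80 < angle < 170 and region_matches:
            self.count += 1
            self.misses = 0
            self.angles.append(angle)
        else:
            self.misses += 1
            if self.misses >= 3:                 # 3 bad frames in a row
                self.count, self.misses, self.angles = 0, 0, []
        if self.count > 10:
            # near-vertical frames (80 < angle < 95) must dominate;
            # "at least half" is an assumed threshold
            vertical = sum(1 for a in self.angles if 80 < a < 95)
            return vertical >= len(self.angles) // 2
        return False
```

The sitting tracker would be identical with the 200-300 window and no vertical check.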
The second stages of the standing and sitting actions differ considerably. For the standing action, the first-stage judgment handles misjudgment caused by students' slight movements, but strongly interfering actions such as raising a hand can still cause misjudgment. To handle this interference, the second stage uses face detection for the standing judgment, as shown in fig. 3. Face detection is a mature technique applied in many scenes; the minimum detectable face can reach 10 × 10 pixels, with high precision and high speed. To improve detection efficiency and robustness, 10 frames within two seconds at the position determined by the first stage are taken out and combined into one image for detection; if two or more faces are detected and their geometric positions lie in the upper half of the selected region, a standing action is determined. For the sitting action, as shown in fig. 4, the HOG features extracted from each frame after the first-stage position determination are compared for similarity with the HOG features of the stored image; if the similarity value is less than 0.7, sitting is determined.
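As an illustrative sketch of the second-stage sitting check: the patent does not specify the similarity metric for the HOG comparison, so cosine similarity is assumed here, together with the 0.7 threshold from the text:

```python
import numpy as np

def hog_similarity(feat_a, feat_b):
    """Cosine similarity between two HOG feature vectors (assumed metric)."""
    a = np.asarray(feat_a, dtype=float).ravel()
    b = np.asarray(feat_b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def judge_sit(stored_feat, current_feat, thresh=0.7):
    """Sitting is confirmed when similarity to the stored standing-frame
    features drops below the 0.7 threshold from the text."""
    return hog_similarity(stored_feat, current_feat) < thresh
```

In practice the HOG vectors themselves would come from an extractor such as cv2.HOGDescriptor.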
Therefore, the invention achieves accurate localization of standing and sitting with only one camera, requires no extra parameter configuration during deployment, and effectively reduces mistaken camera cuts during interaction.
On the other hand, the invention also discloses a system for detecting the sitting action of students in the recording and broadcasting system, which comprises the following units,
the image acquisition module is used for acquiring images in the lesson scene of the student;
the image preprocessing module is used for preprocessing the acquired image;
the foreground extraction module is used for extracting the foreground of the preprocessed image to obtain a motion history image;
the possibility standing sitting target judging module is used for judging and storing a possibility target of the motion history map;
and the judging module is used for carrying out two-stage judgment on the stored information through the judging module to obtain the coordinates of standing and sitting.
According to the technical scheme, the method for detecting the standing and sitting actions of students in a recording and broadcasting system provided by the invention is based on the motion history image and the gradient direction, and adopts two-stage judgment to improve detection accuracy. The motion history image overcomes the incomplete-target problem of background-modeling methods, the global motion angle is then extracted from the complete target, and the final decision is made by the two-stage discrimination method.
It may be understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention; for explanation, examples and beneficial effects of the related content, reference may be made to the corresponding parts of the method above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for detecting standing and sitting actions of students in a recording and broadcasting system, characterized in that the method comprises the following steps:
s100, acquiring a frame of image;
s200, preprocessing the acquired image;
s300, performing foreground extraction on the preprocessed image to obtain a motion history image;
s400, judging and storing the possibility of sitting down of the target standing up of the motion history map;
s500, two-stage judgment is carried out on the stored information through a judgment module to obtain coordinates of standing and sitting, the two-stage judgment is carried out on the stored information through the judgment module to obtain the coordinates of standing and sitting, and the detection precision is improved by adopting a two-stage judgment mode;
in the first stage of judgment, the standing and sitting actions are judged by calculating the motion direction of the target over continuous multiple frames, utilizing the continuity characteristic of standing and sitting movements; in the second stage, the standing judgment adds face detection on the basis of the first-stage result to determine the final judgment result, and the sitting judgment uses HOG feature similarity comparison;
in the first stage of judgment, according to the characteristics of students' standing and sitting movements, an upward movement lasting 10 continuous frames is considered a standing action, and a downward movement lasting 10 continuous frames is considered a sitting action;
firstly, motion judgment of similar areas of adjacent frames is carried out: a feature extraction vector is initialized to store the motion angle and motion coordinate area of each possible target; if the motion angle of a foreground area extracted from the current frame satisfies angle > 80 and angle < 170, the coordinates of the area are compared with all coordinates of the previous frame in a non-maximum-suppression manner, and when the ratio of the overlapping area of the two regions to the smaller of their two areas is greater than 0.5, the two areas are judged to be similar areas; during the motion, up to 3 consecutive frames failing the angle > 80 and angle < 170 condition are tolerated, but if 3 consecutive frames fail the condition the frame count is cleared; when a feature vector whose frame count is greater than 10 is found, all angles stored in the vector are further judged, and only when the stored frames satisfy angle > 80 and angle < 95 is the motion judged to be a vertical standing motion;
the sitting action judgment follows the same logic as the standing action judgment; in the final judgment it is only necessary to check whether the frame count of the feature vector exceeds 10 frames, and if so, a sitting action is judged;
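The first-stage continuity check above can be sketched in pure Python. The class and function names, and the per-track bookkeeping, are illustrative assumptions; the angle range, the 3-frame tolerance, the 10-frame confirmation count, and the overlap/min-area > 0.5 similarity test come from the claim:

```python
# Illustrative sketch of the first-stage continuity check.

STAND_MIN, STAND_MAX = 80, 170   # upward motion angle range from the claim
MISS_LIMIT = 3                   # clear the count after 3 consecutive misses
CONFIRM_FRAMES = 10              # frame count needed to declare an action

def is_similar_region(a, b):
    """Two regions (x, y, w, h) are 'similar' when their overlap area
    divided by the smaller region's area exceeds 0.5, as in the claim."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ow = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    oh = max(0, min(ay + ah, by + bh) - max(ay, by))
    smaller = min(aw * ah, bw * bh)
    return smaller > 0 and ow * oh / smaller > 0.5

class StandTrack:
    """Per-target frame counter for the first-stage standing judgment."""
    def __init__(self):
        self.count = 0
        self.misses = 0
        self.angles = []

    def update(self, angle):
        """Feed one frame's global motion angle; return True once the
        target has moved upward for more than CONFIRM_FRAMES frames."""
        if STAND_MIN < angle < STAND_MAX:
            self.count += 1
            self.misses = 0
            self.angles.append(angle)
        else:
            self.misses += 1
            if self.misses >= MISS_LIMIT:   # 3 straight bad frames: reset
                self.count = 0
                self.angles.clear()
        return self.count > CONFIRM_FRAMES
```

The sitting judgment would reuse the same counter with the 200–300 downward angle range substituted.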
for the possible standing targets, the second-stage judgment uses a face detection method: within two seconds at the position determined by the first-stage judgment, 10 frames are taken out and merged into one image for detection, and if two or more faces are detected with their geometric positions in the upper half of the selected area, the target is judged to be a standing action;
for the second-stage sitting judgment, HOG feature similarity comparison is used; if the similarity value is smaller than 0.7, the target is judged to be a sitting action.
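A minimal sketch of the two second-stage decision rules follows. The face-box representation and helper names are assumptions; the "two or more faces in the upper half" rule and the 0.7 HOG-similarity threshold come from the claims:

```python
# Second-stage decision rules (sketch; helper names are assumptions).

def second_stage_standing(face_boxes, region):
    """Confirm standing when >= 2 detected faces have their vertical
    centres in the upper half of the candidate region (x, y, w, h)."""
    rx, ry, rw, rh = region
    upper = [f for f in face_boxes
             if ry <= f[1] + f[3] / 2.0 < ry + rh / 2.0]
    return len(upper) >= 2

def second_stage_sitting(hog_similarity):
    """Confirm sitting when the compared HOG feature similarity
    drops below 0.7."""
    return hog_similarity < 0.7
```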
2. The method for detecting standing and sitting actions of students in a recording and broadcasting system according to claim 1, characterized in that: the preprocessing of the acquired image in S200 comprises scaling the height and width of the acquired 4K image to 1/4 of the original, respectively, and then performing Gaussian filtering.
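The S200 preprocessing can be sketched as follows. This is a rough stand-in, not the patented implementation: the 1/4 scaling is done by plain subsampling rather than interpolated resizing, and a small 3-tap kernel stands in for the unspecified Gaussian filter:

```python
# Sketch of claim 2's preprocessing: 1/4 downscale, then Gaussian blur.

GAUSS_K = [0.25, 0.5, 0.25]  # 3-tap separable Gaussian kernel (assumed size)

def downscale_quarter(img):
    """Keep every 4th row and column of a 2-D pixel grid, so height
    and width each shrink to 1/4 of the original."""
    return [row[::4] for row in img[::4]]

def gaussian_blur(img):
    """Apply the 1-D kernel along rows, then columns (edges replicated)."""
    def blur_1d(row):
        n = len(row)
        return [sum(GAUSS_K[j] * row[min(max(i + j - 1, 0), n - 1)]
                    for j in range(3)) for i in range(n)]
    rows = [blur_1d(r) for r in img]
    cols = [blur_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```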
3. The method for detecting standing and sitting actions of students in a recording and broadcasting system according to claim 1, characterized in that: in S300, foreground extraction is performed on the preprocessed image to obtain a motion history image;
the method specifically comprises the following steps:
(c1) An inter-frame difference method is adopted, i.e., the absolute value of the pixel-value difference between each point of the current frame and the corresponding point of a previous frame; the two-frame difference method uses the previous frame, the three-frame difference method uses the frame two steps back, and so on; the formula is shown in equation (1),
F(x,y) = abs(I(x,y) - I_pre(x,y))    (1)
where F(x,y) represents the result after the frame difference, abs() represents taking the absolute value, I(x,y) represents the pixel value at coordinates (x,y) of the current frame, and I_pre(x,y) represents the pixel value at coordinates (x,y) of the previous frame;
specifically, a ring buffer buf for storing 4 frames of images is created, the images in buf[i] and buf[i+2] are taken out, and the result after the frame difference is calculated using formula (1);
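The ring-buffer pairing above can be sketched as follows (frames are flattened lists of pixel values; the class name is illustrative). With a 4-slot ring, the frame written two pushes earlier sits at index (i + 2) mod 4, which matches the buf[i]/buf[i+2] pairing:

```python
# Sketch of the 4-frame ring buffer feeding equation (1):
# F(x, y) = abs(I(x, y) - I_pre(x, y)), with I_pre two frames back.

BUF_SIZE = 4  # the claim stores 4 frames in the ring buffer

class RingFrameDiff:
    def __init__(self):
        self.buf = [None] * BUF_SIZE
        self.idx = 0  # next write position

    def push(self, frame):
        """Store one frame (flat list of pixels) and return the absolute
        difference against the frame two pushes back, or None until
        enough history has accumulated."""
        self.buf[self.idx % BUF_SIZE] = frame
        prev = self.buf[(self.idx + 2) % BUF_SIZE]  # two steps back mod 4
        self.idx += 1
        if prev is None:
            return None
        return [abs(a - b) for a, b in zip(frame, prev)]
```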
(c2) In the motion history map, pixels where motion occurred are set to the current timestamp, and pixels whose last motion occurred too long ago are cleared, as shown in equation (2),
mhi(x,y) = timestamp, if silh(x,y) ≠ 0; 0, if silh(x,y) = 0 and mhi(x,y) < timestamp − duration; mhi(x,y) unchanged, otherwise    (2)
specifically, a binarization threshold of 13 is set and F(x,y) is binarized to obtain silh(x,y); when silh(x,y) ≠ 0, the motion history map value mhi(x,y) takes the current system time timestamp; when silh(x,y) = 0 and mhi(x,y) is smaller than the current system time timestamp minus the duration, mhi(x,y) is set to 0; otherwise mhi(x,y) is kept unchanged;
the binarized image of the moving object is obtained through the above steps; 3×3 morphological erosion and dilation are applied to it, and then the circumscribed rectangle of the moving object is extracted as input for the possible standing/sitting target judging module.
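The per-pixel update rule of equation (2) can be sketched directly (flat lists stand in for images; the function name is an assumption):

```python
THRESH = 13  # binarization threshold from the claim

def update_mhi(frame_diff, mhi, timestamp, duration):
    """Motion history update per equation (2): moving pixels take the
    current timestamp, stale pixels (last motion older than
    timestamp - duration) are cleared, everything else is kept."""
    out = []
    for f, m in zip(frame_diff, mhi):
        silh = 1 if f > THRESH else 0        # binarize the frame difference
        if silh != 0:
            out.append(float(timestamp))     # motion here: stamp it
        elif m < timestamp - duration:
            out.append(0.0)                  # too old: clear
        else:
            out.append(m)                    # keep previous history
    return out
```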
4. The method for detecting standing and sitting actions of students in a recording and broadcasting system according to claim 3, characterized in that: in S300, the value of the duration variable is set between 0.3 and 0.5 according to the actual seating of students in the classroom scene.
5. The method for detecting standing and sitting actions of students in a recording and broadcasting system according to claim 3, characterized in that: judging and storing possible standing and sitting targets in the motion history map in S400 specifically comprises,
the upward angle of standing is set to angle >80 and angle <170, and the downward angle of sitting is set to angle >200 and angle <300;
the specific angle solving comprises gradient direction solving and global motion direction solving;
wherein,
(d1) Gradient direction solving
The gradient direction calculation is shown in equation (3),
angle(x,y) = fastAtan2(∂mhi/∂y, ∂mhi/∂x)    (3)
specifically, the Sobel operator is used to calculate ∂mhi/∂x and ∂mhi/∂y respectively; the fastAtan2 function is then used to calculate the arctangent, giving the gradient direction of the motion history map;
(d2) Global motion direction solution
The global motion direction is the average direction of the selected area, and an angle value of 0 to 360 is calculated according to the average direction;
the average direction is calculated from a weighted direction histogram, the weight calculation formula being ω = a·x + b, where a = 1/dt and b = 1 − t·a, so that ω = (x − t)/dt + 1, where x is the timestamp stored in mhi at each pixel, t represents the current timestamp in mhi, and dt represents the duration in mhi; in this way the most recent motion has a greater weight and motion that occurred further in the past has a lesser weight;
specifically, the gradient direction diagram is divided into 12 equal parts according to 0 to 360 degrees to obtain a gradient direction histogram, the coordinate of the maximum value of the gradient direction histogram is searched to be used as a basic direction, the weight of the motion history diagram is calculated according to a weight formula and an initialized weight coefficient, the relative offset of the basic direction is calculated, and the final motion direction angle can be obtained according to the offset and the basic direction.
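A pure-Python sketch of this weighted-histogram procedure follows. Two details are simplifying assumptions: the base direction is taken at the centre of the peak bin, and the recency weight uses the reconstructed ω = (x − t)/dt + 1 clamped to [0, 1]:

```python
# Sketch of the global motion direction from a 12-bin weighted histogram.

BINS = 12  # the gradient direction map is split into 12 equal 30-degree bins

def global_motion_direction(angles, stamps, timestamp, duration):
    """Find the peak histogram bin as the base direction, weight each
    pixel's angular offset by recency, and add the weighted mean offset
    to the base direction; returns an angle in [0, 360)."""
    width = 360 // BINS
    hist = [0] * BINS
    for a in angles:
        hist[int(a % 360 // width)] += 1
    base = hist.index(max(hist)) * width + width // 2  # peak bin centre

    w_sum = off_sum = 0.0
    for a, t_pix in zip(angles, stamps):
        w = (t_pix - timestamp) / duration + 1.0  # recent motion weighs more
        w = max(0.0, min(1.0, w))
        off = (a - base + 180) % 360 - 180        # signed offset in (-180, 180]
        w_sum += w
        off_sum += w * off
    if w_sum == 0:
        return float(base % 360)
    return (base + off_sum / w_sum) % 360
```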
6. A system for detecting standing and sitting actions of students in a recording and broadcasting system, for implementing the method of claim 1, characterized in that:
comprising the following units of the device,
the image acquisition and action judgment module is used for acquiring images in the students' lesson scene, judging and storing possible standing and sitting targets in the motion history map, and carrying out two-stage judgment on the stored information through the judgment module to obtain the coordinates of standing and sitting actions;
the image preprocessing module is used for preprocessing the acquired image;
and the foreground extraction module is used for carrying out foreground extraction on the preprocessed image to obtain a motion history image.
CN202011327975.7A 2020-11-24 2020-11-24 Method and system for detecting sitting-up actions of students in recording and broadcasting system Active CN112597800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011327975.7A CN112597800B (en) 2020-11-24 2020-11-24 Method and system for detecting sitting-up actions of students in recording and broadcasting system

Publications (2)

Publication Number Publication Date
CN112597800A CN112597800A (en) 2021-04-02
CN112597800B true CN112597800B (en) 2024-01-26

Family

ID=75183643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011327975.7A Active CN112597800B (en) 2020-11-24 2020-11-24 Method and system for detecting sitting-up actions of students in recording and broadcasting system

Country Status (1)

Country Link
CN (1) CN112597800B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096930A (en) * 2011-01-30 2011-06-15 吴柯维 Student standing and sitting detection method for intelligent recorded broadcasting system for teaching
CN106780565A (en) * 2016-11-15 2017-05-31 天津大学 A kind of many students based on light stream and k means clusters rise and sit detection method
CN107480607A (en) * 2017-07-28 2017-12-15 青岛大学 A kind of method that standing Face datection positions in intelligent recording and broadcasting system
CN110414479A (en) * 2019-08-08 2019-11-05 燕山大学 A kind of drinking behavior cognitive method, continuous and discontinuous movement segmentation recognition method
CN110728696A (en) * 2019-09-06 2020-01-24 天津大学 Student standing detection method of recording and broadcasting system based on background modeling and optical flow method
WO2020207328A1 (en) * 2019-04-11 2020-10-15 华为技术有限公司 Image recognition method and electronic device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Motion segmentation and pose recognition with motion history gradients; Gary R; Machine Vision and Applications; pp. 174-184 *
Human behavior recognition and its application in educational recording and broadcasting systems; Dang Dongli; China Master's Theses Full-text Database; pp. 10-26, 36-42 *
Student classroom behavior detection based on Faster R-CNN with multi-channel feature fusion and transfer learning; Bai Jie et al.; Journal of Guangxi Normal University (Natural Science Edition) (No. 05); full text *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant