CN114639166A - Examination room abnormal behavior recognition method based on motion recognition - Google Patents
Examination room abnormal behavior recognition method based on motion recognition
- Publication number
- CN114639166A CN114639166A CN202210257498.4A CN202210257498A CN114639166A CN 114639166 A CN114639166 A CN 114639166A CN 202210257498 A CN202210257498 A CN 202210257498A CN 114639166 A CN114639166 A CN 114639166A
- Authority
- CN
- China
- Prior art keywords
- examination room
- abnormal behavior
- behavior recognition
- channel
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention relates to the technical field of motion recognition, and in particular to an examination room abnormal behavior recognition method based on motion recognition. A temporal interaction module and a channel-spatial attention module are added to each bottleneck residual block of a ResNet-50 backbone network to form an abnormal behavior recognition network model. Video from the examination room is processed and then fed into this model for training until convergence, and the resulting classification outputs are fused to recognize abnormal behavior in the examination room. The temporal interaction module and the channel-spatial attention module identify the actions of each examinee in the examination room and, in particular, effectively capture fine-grained examinee actions, remedying the inability of existing deep-learning-based examination room abnormal behavior analysis methods to accurately recognize small-scale abnormal actions. Moreover, the temporal interaction module captures temporal context information at low computational cost.
Description
Technical Field
The invention relates to the technical field of motion recognition, and in particular to an examination room abnormal behavior recognition method based on motion recognition.
Background
With the development of computer technology and the spread of motion recognition applications, colleges and universities need intelligent monitoring of examination rooms in order to maintain examination fairness. Traditional offline invigilation consumes substantial human and financial resources; in particular, when many examinations are held in a concentrated period, invigilators tire physically, their attention drops, and abnormal examination room behaviors are missed. Although examination rooms are equipped with monitoring cameras, the electronic equipment in general use today can only record and store video, and a large amount of manual time is still needed to screen and identify the monitored content. To reduce the labor cost of invigilation, computer vision has been combined with the monitoring task; however, the real-time performance and accuracy of existing intelligent invigilation systems do not yet meet the standard of practical application, and they suffer from several drawbacks, such as unsatisfactory recognition under partial occlusion, complex backgrounds, and viewpoint changes. Most existing methods identify and judge difference information between adjacent video frames; they struggle to capture subtle abnormal actions in the examination room, such as fine-grained actions like slightly turning aside or glancing at another examinee's paper, and thus have low recognition ability for them. In terms of efficiency, many methods also fall short of real-time monitoring requirements.
Disclosure of Invention
The invention aims to provide an examination room abnormal behavior recognition method based on motion recognition, which remedies the inability of existing deep-learning-based examination room abnormal behavior analysis methods to accurately identify small-scale abnormal behaviors.
To achieve this purpose, the invention provides an examination room abnormal behavior recognition method based on motion recognition, comprising the following steps:
collecting real-time original video content of an examination room;
carrying out image segmentation on the video to obtain a motion image of each examinee in the examination room;
selecting motion image processing of a single examinee to obtain an input image sequence;
inputting the image sequence into an abnormal behavior recognition network model for training until convergence, and outputting a classification result;
and fusing the classification results to realize the identification of abnormal behaviors of the examination room.
In the process of obtaining the input image sequence from a single examinee's motion images, the video of the single examinee is divided into 5 segments to obtain 5 continuous frame sequences, which are then preprocessed to form the input image sequence.
The data preprocessing specifically comprises resizing the short side of each RGB image to 256, augmenting the data with position jittering, horizontal flipping, corner cropping and scale jittering, and resizing the cropped region to 224 × 224.
The abnormal behavior recognition network model is an improvement of the ResNet-50 network and comprises 5 stages, each stage comprising several bottleneck residual blocks, and each bottleneck residual block comprising a temporal interaction module and a channel-spatial attention module.
Wherein the temporal interaction module uses a channel-wise convolution to learn the temporal evolution of each channel independently, reducing computational complexity.
The channel-spatial attention module consists of a channel attention module and a spatial attention module, providing both a channel attention mechanism and a spatial attention mechanism.
The model weights are randomly initialized at the start of training and then learned by continuous backpropagation according to the quality of the training results; finally, the classification results are fused by an average weighting method to effectively identify abnormal behaviors in the examination room.
The invention provides an examination room abnormal behavior recognition method based on motion recognition. A temporal interaction module and a channel-spatial attention module are added to each bottleneck residual block of a ResNet-50 backbone network to form an abnormal behavior recognition network model. Video from the examination room is processed and then fed into this model for training until convergence, and the resulting classification outputs are fused to recognize abnormal behavior in the examination room. The temporal interaction module and the channel-spatial attention module identify the actions of each examinee in the examination room and, in particular, effectively capture fine-grained examinee actions, remedying the inability of existing deep-learning-based examination room abnormal behavior analysis methods to accurately recognize small-scale abnormal actions. Moreover, the temporal interaction module captures temporal context information at low computational cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an examination room abnormal behavior recognition method based on motion recognition according to the present invention.
FIG. 2 is a process diagram of an example of the construction of the time interaction module of the present invention.
Fig. 3 is a schematic diagram of the channel space attention mechanism module of the present invention.
FIG. 4 is a schematic diagram of the configuration of the channel attention module of the present invention.
FIG. 5 is a schematic structural diagram of a spatial attention module of the present invention.
Fig. 6 is a network architecture diagram of the improved ResNet-50 of the present invention.
Fig. 7 is a schematic structural diagram of the abnormal behavior recognition network model of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1 to 7, the present invention provides a method for identifying abnormal behaviors in an examination room based on motion identification, which includes the following steps:
s1: collecting real-time original video content of an examination room;
s2: carrying out image segmentation on the video to obtain a motion image of each examinee in the examination room;
s3: selecting motion image processing of a single examinee to obtain an input image sequence;
s4: inputting the image sequence into an abnormal behavior recognition network model for training until convergence, and outputting a classification result;
s5: and fusing the classification results to realize the identification of abnormal behaviors of the examination room.
In the process of obtaining the input image sequence from a single examinee's motion images, the video of the single examinee is divided into 5 segments to obtain 5 continuous frame sequences, which are then preprocessed to form the input image sequence.
The data preprocessing specifically comprises resizing the short side of each RGB image to 256, augmenting the data with position jittering, horizontal flipping, corner cropping and scale jittering, and resizing the cropped region to 224 × 224.
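The segment-sampling step can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names and the choice of one representative frame per segment are assumptions.

```python
import random
import numpy as np

def segment_indices(n_frames: int, n_segments: int = 5) -> list:
    """Split a clip of n_frames into n_segments equal chunks and sample one
    frame index from each chunk, giving one representative frame per segment."""
    edges = np.linspace(0, n_frames, n_segments + 1, dtype=int)
    return [random.randrange(edges[i], max(edges[i] + 1, edges[i + 1]))
            for i in range(n_segments)]

def short_side_resize_shape(h: int, w: int, short: int = 256) -> tuple:
    """Target (height, width) after resizing so the short side equals `short`,
    matching the 256-short-side step described above."""
    scale = short / min(h, w)
    return round(h * scale), round(w * scale)
```

For example, a 100-frame clip with 5 segments yields one index from each of the ranges [0, 20), [20, 40), and so on; a 480 × 640 frame is resized to 256 × 341 before the 224 × 224 crop.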
The abnormal behavior recognition network model is an improvement of the ResNet-50 network and comprises 5 stages, each stage comprising several bottleneck residual blocks, and each bottleneck residual block comprising a temporal interaction module and a channel-spatial attention module.
The temporal interaction module uses a channel-wise convolution to learn the temporal evolution of each channel independently, reducing computational complexity.
The channel-spatial attention module consists of a channel attention module and a spatial attention module, providing both a channel attention mechanism and a spatial attention mechanism.
The model weights are randomly initialized at the start of training and then learned by continuous backpropagation according to the quality of the training results; finally, the classification results are fused by an average weighting method to effectively identify abnormal behaviors in the examination room.
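The average-weighting fusion of per-segment classification results can be illustrated with a small sketch (plain NumPy; the shape convention and function name are assumptions, not taken from the patent):

```python
import numpy as np

def fuse_scores(segment_scores: np.ndarray) -> int:
    """segment_scores: (n_segments, n_classes) class-probability vectors,
    one per segment. Average them with equal weight (the average weighting
    method described above) and return the fused class id."""
    fused = segment_scores.mean(axis=0)  # equal-weight average over segments
    return int(fused.argmax())
```

Equal weighting means a single confident segment cannot dominate; with 5 segments each contributes 1/5 to the fused score.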
The following is further illustrated from the various modules of the abnormal behavior recognition network model:
1. time interaction module
Fig. 2 shows the construction of an example of the temporal interaction module (TIM). The TIM captures temporal context information at low computational cost: a channel-wise convolution learns the temporal evolution of each channel independently, keeping the computational complexity of the model design low.
As shown in fig. 2, given an input $X = \{X_1, X_2, \ldots, X_T\}$ of shape $T \times C \times H \times W$, its shape is first converted to $C \times T \times H \times W$ (denoted $X'$ to avoid ambiguity). A channel-wise convolution is then applied to $X'$, as shown below:

$$Y_{c,t,x,y} = \sum_i V_{c,i} \cdot X'_{c,t+i,x,y} \tag{1}$$

where $V$ is the channel-wise convolution kernel and $Y_{c,t,x,y}$ is the output of the temporal convolution. Compared with three-dimensional convolution, channel-wise convolution greatly reduces the amount of computation. In the setup here, the kernel size of the channel-wise convolution is $3 \times 1 \times 1$, which means features interact only with features at adjacent time steps; however, the temporal receptive field grows gradually as the feature maps pass through deeper layers of the network. After the convolution, the output $Y$ is reshaped back to $T \times C \times H \times W$. An ordinary 3D convolution has $C_{out} \times C_{in} \times t$ parameters, whereas the TIM has only $C_{out} \times 1 \times t$, so the number of parameters is greatly reduced compared with other temporal convolution operators. In fact, the TSM module can be viewed as a channel-wise temporal convolution with a fixed kernel: $[0, 1, 0]$ for no shift, $[1, 0, 0]$ for a backward shift, and $[0, 0, 1]$ for a forward shift. TIM generalizes the TSM operation into a flexible module with learnable convolution kernels, which captures temporal context information for motion recognition more effectively than hand-designed shifts.
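A minimal PyTorch sketch of the channel-wise temporal convolution in Eq. (1) is given below. The patent provides no code, so the class name, the identity initialization (matching the TSM "no shift" kernel $[0,1,0]$), and the segment-count argument are assumptions.

```python
import torch
import torch.nn as nn

class TemporalInteractionModule(nn.Module):
    """Channel-wise 1D temporal convolution over features shaped (N*T, C, H, W)."""
    def __init__(self, channels: int, n_segments: int = 5, kernel_size: int = 3):
        super().__init__()
        self.n_segments = n_segments
        # groups=channels gives one kernel per channel (C_out x 1 x t parameters),
        # so each channel's temporal evolution is learned independently.
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels, bias=False)
        # Start from the identity kernel [0, 1, 0] ("no shift"), so early training
        # behaves like the plain backbone; the kernel is then learned freely.
        nn.init.zeros_(self.conv.weight)
        self.conv.weight.data[:, 0, kernel_size // 2] = 1.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        nt, c, h, w = x.shape
        t = self.n_segments
        n = nt // t
        # (N*T, C, H, W) -> (N*H*W, C, T): put time last so Conv1d slides over it
        y = x.view(n, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(n * h * w, c, t)
        y = self.conv(y)  # Y_{c,t} = sum_i V_{c,i} * X'_{c,t+i}  (Eq. 1)
        y = y.view(n, h, w, c, t).permute(0, 4, 3, 1, 2).reshape(nt, c, h, w)
        return y
```

With the identity initialization the module initially passes features through unchanged, which makes it easy to drop into a pretrained ResNet-50 without disturbing its features.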
2. Channel space attention mechanism module
The channel-spatial attention module (CBAM) comprises both a channel attention mechanism and a spatial attention mechanism. In examination room abnormal behavior recognition, a video contains more than a single student, and changes in background illumination and differences in scale interfere with the model during feature extraction. The invention therefore introduces an attention mechanism into the convolution blocks, which effectively extracts the important features in the video content, ignores secondary features, and safeguards the accuracy of the final recognition result.
Fig. 3 is a schematic diagram of the whole CBAM: the output of a convolutional layer first passes through a channel attention module to obtain a weighted result, which then passes through a spatial attention module and is weighted again to produce the final result. Given an intermediate feature map $F \in \mathbb{R}^{C \times H \times W}$ as input, CBAM sequentially infers a one-dimensional channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and a two-dimensional spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$, as shown in fig. 3. The overall attention process can be summarized as:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F' \tag{2}$$

where $\otimes$ in equation (2) denotes element-wise multiplication. During multiplication the attention values are broadcast accordingly: channel attention values are replicated along the spatial dimensions, and vice versa. $F''$ is the final output.
2.1 channel attention Module
The channel attention module is shown in fig. 4. The spatial information of the feature map is first aggregated using average pooling and max pooling, generating two different spatial context descriptors, $F^c_{avg}$ and $F^c_{max}$, which denote the average-pooled and max-pooled features respectively. Both descriptors are then forwarded to a shared network to generate the channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$. The shared network is a multi-layer perceptron (MLP) with one hidden layer; to reduce parameter overhead, the hidden activation size is set to $\mathbb{R}^{C/r \times 1 \times 1}$, where $r$ is the reduction ratio. After the shared network is applied to each descriptor, the output feature vectors are merged by element-wise summation. In brief, the channel attention is computed as:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big) \tag{3}$$

where $\sigma$ denotes the sigmoid function, $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the MLP weights shared by both inputs, and $W_0$ is followed by a ReLU activation.
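The channel attention computation above can be sketched in PyTorch as follows (a hedged illustration; the class name and default reduction ratio are assumptions, and the MLP is shared across both pooled descriptors as the text requires):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention of Eq. (3): shared MLP over average- and max-pooled
    descriptors, merged by element-wise sum and squashed with a sigmoid."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W0, hidden size C/r
            nn.ReLU(inplace=True),                       # ReLU follows W0
            nn.Linear(channels // reduction, channels),  # W1
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(f.mean(dim=(2, 3)))   # MLP(AvgPool(F)) -> F^c_avg path
        mx = self.mlp(f.amax(dim=(2, 3)))    # MLP(MaxPool(F)) -> F^c_max path
        return torch.sigmoid(avg + mx)[:, :, None, None]  # M_c in R^{C x 1 x 1}
```

The returned map is broadcast-multiplied with the input feature map to produce $F'$, per Eq. (2).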
2.2 spatial attention Module
The spatial attention module is shown in fig. 5. A spatial attention map is generated from the spatial relations among features. Unlike the channel attention module, the spatial attention module focuses on "where" the informative part is, which is complementary to channel attention. To compute spatial attention, average pooling and max pooling are first applied along the channel axis and the results are concatenated to generate an efficient feature descriptor; applying pooling along the channel axis effectively highlights informative regions. A convolutional layer is then applied to the concatenated descriptor to generate the spatial attention map, which encodes which locations to emphasize or suppress. In detail, the channel information of the feature map is aggregated by the two pooling operations into two 2D maps, $F^s_{avg} \in \mathbb{R}^{1 \times H \times W}$ and $F^s_{max} \in \mathbb{R}^{1 \times H \times W}$, denoting the channel-wise average-pooled and max-pooled features respectively. These are concatenated and convolved by a standard convolutional layer to generate the 2D spatial attention map. In brief, the spatial attention is computed as:

$$M_s(F) = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7 \times 7}([F^s_{avg}; F^s_{max}])\big) \tag{4}$$

where $\sigma$ denotes the sigmoid function and $f^{7 \times 7}$ denotes a convolution with a $7 \times 7$ filter.
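A matching PyTorch sketch of the spatial attention computation (class name and bias choice are assumptions; the $7 \times 7$ filter follows the text):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention of Eq. (4): pool along the channel axis, concatenate
    the two 1 x H x W maps, then a 7x7 convolution followed by a sigmoid."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = f.mean(dim=1, keepdim=True)  # F^s_avg: channel-wise average pool
        mx = f.amax(dim=1, keepdim=True)   # F^s_max: channel-wise max pool
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # 1 x H x W
```

The output is broadcast-multiplied with $F'$ to give the final CBAM output $F''$.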
The backbone network adopted by the invention is ResNet-50, whose structure is shown in figure 6. It comprises 5 stages, each stage comprising several bottleneck residual blocks, and each bottleneck residual block comprising a temporal interaction module and a channel-spatial attention module. The input image passes through the convolution of stage 0 into stage 1, through stage 1 into stage 2, and so on until stage 4, after which the classification result is output.
The temporal interaction module is added before the first convolution block of each bottleneck layer, and the channel-spatial attention module is added after the last convolution block, leaving the middle convolutional layers of the bottleneck unchanged; the resulting attention information is added to the output of the bottleneck layer and serves as the input of the next bottleneck residual block. Finally, the obtained feature results are fused: the features learned from the multi-frame video through the temporal interaction module and the channel-spatial attention module are classified by fully connected layers, and the classification results of each group are fused to obtain the final classification result. The resulting examination room abnormal behavior recognition model is shown in FIG. 7.
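The bottleneck wiring just described can be sketched as follows. This is a self-contained illustration, not the patent's code: `tim` and `cbam` default to `nn.Identity` so the block runs standalone, whereas in the described model they would be TIM and CBAM instances; the class name and channel arguments are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckWithTIMCBAM(nn.Module):
    """ResNet-style bottleneck with a temporal module before the first 1x1
    convolution and an attention module after the last one, with the middle
    3x3 convolution left unchanged, as described above."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int,
                 tim: nn.Module = None, cbam: nn.Module = None):
        super().__init__()
        self.tim = tim or nn.Identity()    # stand-in for the TIM
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 1, bias=False),
                                   nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
                                   nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(nn.Conv2d(mid_ch, out_ch, 1, bias=False),
                                   nn.BatchNorm2d(out_ch))
        self.cbam = cbam or nn.Identity()  # stand-in for the CBAM
        self.short = (nn.Identity() if in_ch == out_ch
                      else nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv3(self.conv2(self.conv1(self.tim(x))))
        y = self.cbam(y)  # attention applied after the last convolution block
        return self.relu(y + self.short(x))  # added to the residual shortcut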
In summary, the invention combines the advantages of the temporal interaction module (TIM) and the channel-spatial attention module (CBAM): it both captures temporal context information at low computational cost and attends closely to the important features of actions, bringing the following beneficial effects:
1) the TIM temporal interaction module and the CBAM channel-spatial attention module identify the actions of each examinee in the examination room and, in particular, effectively capture fine-grained examinee actions;
2) compared with the previously proposed temporal shift module (TSM), the temporal interaction module offers greater learning flexibility and captures temporal context information at lower computational cost;
3) the examination room abnormal behavior model can effectively replace manual invigilation, greatly saving human resources.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. An examination room abnormal behavior recognition method based on motion recognition is characterized by comprising the following steps:
collecting real-time original video content of an examination room;
performing image segmentation on the video to obtain a motion image of each examinee in the examination room;
selecting motion image processing of a single examinee to obtain an input image sequence;
inputting the image sequence into an abnormal behavior recognition network model for training until convergence, and outputting a classification result;
and fusing the classification results to realize the identification of abnormal behaviors of the examination room.
2. The examination room abnormal behavior recognition method based on motion recognition according to claim 1,
in the process of obtaining the input image sequence from a single examinee's motion images, the video of the single examinee is divided into 5 segments to obtain 5 continuous frame sequences, which are then preprocessed to form the input image sequence.
3. The examination room abnormal behavior recognition method based on motion recognition according to claim 2,
the data preprocessing specifically comprises resizing the short side of each RGB image to 256, augmenting the data with position jittering, horizontal flipping, corner cropping and scale jittering, and resizing the cropped region to 224 × 224.
4. The examination room abnormal behavior recognition method based on motion recognition according to claim 1,
the abnormal behavior recognition network model is an improvement of the ResNet-50 network and comprises 5 stages, each stage comprising several bottleneck residual blocks, and each bottleneck residual block comprising a temporal interaction module and a channel-spatial attention module.
5. The examination room abnormal behavior recognition method based on motion recognition according to claim 4,
the temporal interaction module uses a channel-wise convolution to learn the temporal evolution of each channel independently, reducing computational complexity.
6. The examination room abnormal behavior recognition method based on motion recognition according to claim 4,
the channel-spatial attention module consists of a channel attention module and a spatial attention module, providing both a channel attention mechanism and a spatial attention mechanism.
7. The examination room abnormal behavior recognition method based on motion recognition according to claim 1,
the model weights are randomly initialized at the start of training and then learned by continuous backpropagation according to the quality of the training results; finally, the classification results are fused by an average weighting method to effectively identify abnormal behaviors in the examination room.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210257498.4A CN114639166A (en) | 2022-03-16 | 2022-03-16 | Examination room abnormal behavior recognition method based on motion recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210257498.4A CN114639166A (en) | 2022-03-16 | 2022-03-16 | Examination room abnormal behavior recognition method based on motion recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114639166A true CN114639166A (en) | 2022-06-17 |
Family
ID=81949045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210257498.4A Pending CN114639166A (en) | 2022-03-16 | 2022-03-16 | Examination room abnormal behavior recognition method based on motion recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114639166A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116935229A (en) * | 2023-09-12 | 2023-10-24 | 山东博昂信息科技有限公司 | Method and system for identifying hook-in state of ladle hook |
-
2022
- 2022-03-16 CN CN202210257498.4A patent/CN114639166A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |