CN114049585A - Mobile phone action detection method based on motion foreground extraction - Google Patents
Mobile phone action detection method based on motion foreground extraction

- Publication number: CN114049585A
- Application number: CN202111187354.8A
- Authority: CN (China)
- Prior art keywords: mobile phone, motion, module, layer, network
- Prior art date: 2021-10-12
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a mobile phone action detection method based on motion foreground extraction. Background modeling and background comparison analysis are used to extract the motion foreground from a video sequence, which is segmented to obtain small-size images containing the motion region; a convolutional neural network then detects the mobile phone target in the motion-region images, thereby detecting the action of using a mobile phone. The invention makes full use of the spatio-temporal information provided by the video and realizes a coarse-to-fine detection process with simple steps and high practicability. With the surveillance cameras already fixed in places such as laboratories, conference rooms and classrooms, it can detect people using mobile phones and improve the monitoring effect.
Description
Technical Field
The invention relates to action detection methods, and in particular to a mobile phone action detection method based on motion foreground extraction.
Background
With the rapid development of computer vision and the steady growth of computing power, intelligent video surveillance technology has gradually entered public view. It applies image processing, pattern recognition and related methods to analyze the video collected by surveillance cameras, automatically recognizing specific targets or abnormal situations in the video images and issuing timely warnings. The application and popularization of intelligent video surveillance greatly improve public security and are of great significance for improving quality of life and guarding against disasters. However, limited by detection and recognition algorithms and hardware platforms, some deployed intelligent video surveillance systems suffer from low recognition accuracy and poor real-time performance, and a mature detection method universally applicable to all application scenarios and requirements is still lacking; action detection methods that perform well and are simple to implement therefore need to be provided for different scenes.
At present, in fixed indoor scenes such as laboratories, conference rooms and classrooms, methods for detecting phone-use actions mainly process and analyze single frames of the video and perform object detection with the mobile phone as the target, using the result as the basis for judging whether a phone-use action occurs. Such methods adopt a typical deep-learning object detection algorithm: a detection model is trained on image samples annotated with phone bounding boxes; at run time a single frame is selected from the video as input and the trained model detects the phone target, so that the action of using a phone is considered to exist whenever a phone is detected. In surveillance video, however, the mobile phone is small, its features are not salient, it closely resembles other objects such as notebooks, and its apparent shape and size change with the field of view and angle of the camera; when held in the hand it is easily occluded and appears unclear in the image. Using phone detection alone as the basis for action detection therefore easily causes false detections and missed detections. Moreover, this kind of method is based on a single frame and uses only the spatial features of the image: it judges whether a phone-use action exists by detecting the phone target at a single instant, without exploiting the temporal information of the video.
Disclosure of Invention
The invention aims to provide a mobile phone action detection method based on motion foreground extraction, which solves the false-detection and missed-detection problems of current single-frame mobile phone target detection.
A mobile phone action detection method based on motion foreground extraction specifically comprises the following steps:

First step: build the mobile phone action detection system based on motion foreground extraction

The mobile phone action detection system based on motion foreground extraction comprises: a background model construction module, a motion foreground extraction module, an offline training module, and a mobile phone action detection module.

The background model construction module functions as follows: fit the background image with a function to obtain a model, and update the background model according to actual scene changes in the video.

The motion foreground extraction module functions as follows: compare the video sequence with the background model, extract the motion foreground, and segment the motion region through connectivity analysis.

The offline training module functions as follows: determine the detection network model, construct a motion-region image sample library, and train the network offline with the sample library.

The mobile phone action detection module functions as follows: run the network model on the motion-region images and detect whether a phone-use action exists.
Second step: the background model construction module completes background modeling and background updating for the use scene

The background model construction module quantizes the background accurately with Gaussian probability density functions, fitting each pixel with K Gaussian distributions to construct a background model for the use scene, expressed by formula (1):

$$P(X_t) = \sum_{i=1}^{K} w_{i,t}\,\eta(X_t, \mu_{i,t}, \Sigma_{i,t}), \qquad \eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(X_t-\mu)^{T}\Sigma^{-1}(X_t-\mu)\right) \quad (1)$$

In formula (1), $X_t$ is the value of pixel $(x, y)$ at time $t$; $w_{i,t}$ is the weight of the $i$-th Gaussian distribution; $\eta(X_t, \mu_{i,t}, \Sigma_{i,t})$, $\mu_{i,t}$ and $\Sigma_{i,t}$ are respectively the $i$-th Gaussian probability density function, its mean and its covariance matrix; $n$ is the dimension of the Gaussian distribution.
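To make the modeling concrete, here is a minimal sketch using OpenCV's built-in Gaussian-mixture background subtractor (an illustrative choice — the patent does not name a library, and the file name and parameter values are hypothetical):

```python
import cv2

# Gaussian-mixture background model: each pixel is fitted with a mixture of
# Gaussians; varThreshold controls the background match test.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500,        # frames used to update w, mu, sigma (update rate ~ 1/history)
    varThreshold=16.0,  # squared Mahalanobis distance for the background match
    detectShadows=False,
)

cap = cv2.VideoCapture("room.mp4")  # hypothetical surveillance video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Binary mask: 0 = background point, 255 = motion foreground point
    mask = subtractor.apply(frame, learningRate=-1)  # -1: automatic learning rate
cap.release()
```

MOG2 maintains the per-pixel weights, means and variances of formulas (1) to (5) internally; the next sketch spells the update rules out explicitly.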
The background model is updated in real time according to changes in the scene, expressed by formulas (2) to (4):

$$w_{i,t} = (1-\alpha)\,w_{i,t-1} + \alpha \quad (2)$$

$$\mu_{i,t} = (1-\rho)\,\mu_{i,t-1} + \rho X_t \quad (3)$$

$$\Sigma_{i,t} = (1-\rho)\,\Sigma_{i,t-1} + \rho\,(X_t-\mu_{i,t})(X_t-\mu_{i,t})^{T} \quad (4)$$
In formulas (2) to (4), $\alpha$ is the learning rate and $\rho$ is the update rate of the model. After the model is updated, the $w_{i,t}/\sigma_{i,t}$ value of each Gaussian distribution is computed for every pixel in the image and the distributions are sorted in descending order; the largest $B$ distributions are selected as the background model, i.e. the number of Gaussian distributions describing the background is $B$, where $T$ is the weight accumulation threshold and $T \in (0.5, 1)$, expressed by formula (5):

$$B = \arg\min_{b}\left(\sum_{k=1}^{b} w_{k,t} > T\right) \quad (5)$$
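To spell out the update rules, a toy NumPy sketch of formulas (2) to (5) for a single grayscale pixel; the 2.5σ match test and the simplified ρ = α are assumptions the patent does not specify, and a real implementation vectorizes over all pixels:

```python
import numpy as np

# Toy single-pixel mixture, grayscale case (n = 1); K, T, alpha as in the patent.
K, T, alpha = 3, 0.7, 0.01
w   = np.full(K, 1.0 / K)             # weights w_i
mu  = np.array([80.0, 128.0, 200.0])  # means mu_i
var = np.full(K, 15.0 ** 2)           # variances sigma_i^2

def update(x):
    """Update the mixture with pixel value x; return True if x is a background point."""
    matched = np.abs(x - mu) / np.sqrt(var) < 2.5   # assumed 2.5-sigma match test
    w[:] = (1 - alpha) * w + alpha * matched        # formula (2), applied to matches
    rho = alpha                                     # simplified update rate rho
    for i in np.flatnonzero(matched):
        mu[i]  = (1 - rho) * mu[i] + rho * x                  # formula (3)
        var[i] = (1 - rho) * var[i] + rho * (x - mu[i]) ** 2  # formula (4)
    w[:] /= w.sum()
    order = np.argsort(-(w / np.sqrt(var)))         # rank distributions by w / sigma
    B = int(np.searchsorted(np.cumsum(w[order]), T)) + 1      # formula (5)
    return bool(matched[order[:B]].any())           # background if a top-B Gaussian matched
```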
thirdly, the motion foreground extraction module extracts the motion foreground and divides the motion area to finish the crude extraction
The motion foreground extraction module compares a current frame image of the video sequence with the background image model for calculation, extracts the motion foreground, and divides a target area containing human motion from the current frame image according to the motion foreground.
Starting from the detection time t, the data is inputComparing the frame image with the background model, and calculating pixel values X one by onetAnd matching relation with the obtained B Gaussian distributions, wherein when the pixel value is matched with one of the previous B Gaussian distributions, the pixel point is a background point, otherwise, the pixel point is divided into a motion foreground. And calculating the pixel points in the frame image one by one according to the matching relation, and determining whether the pixel points can be matched with Gaussian distribution to obtain a binary image. The matching relationship is expressed by equation (6):
in the formula (6), the point with the gray value of 0 is a background point, and the point with the gray value of 1 is a moving foreground point.
And after the motion foreground is extracted, performing connectivity analysis on the motion foreground, and segmenting a target area image containing human motion from the current frame image to obtain a small-size image with the size of w x h, thereby completing coarse extraction.
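A sketch of this connectivity analysis and region cropping, assuming the binary foreground mask from the previous step (OpenCV's connected-components API; the minimum-area threshold is a hypothetical noise filter):

```python
import cv2
import numpy as np

def crop_motion_regions(frame, mask, min_area=400):
    """Segment w x h sub-images containing motion from the binary foreground mask."""
    # Remove speckle noise before the connectivity analysis.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    regions = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:  # discard small noise components
            regions.append(frame[y:y + h, x:x + w])
    return regions
```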
Fourth step: the offline training module completes the determination and training of the mobile phone detection network

The offline training module annotates the mobile phone in the motion-region images obtained by the motion foreground extraction module, completing the construction of the training sample library. It determines and constructs a deep convolutional neural network model for detecting the mobile phone in images containing the human motion region, fixing the number of network layers, the definition of each layer, the number of convolution surfaces per layer, the convolution kernel size, the pooling size, the pooling computation function, the activation function and the loss function. It then trains the unknown parameters of each convolution kernel of the deep convolutional neural network offline with the constructed sample library.
The basic convolutional layer operation of the network is expressed by formula (7):

$$X_{a,b+1} = f\left(\sum X_b \cdot W_{a,b} + b_{a,b}\right) \quad (7)$$

In formula (7), $f$ is the activation function; $W_{a,b}$ and $b_{a,b}$ are respectively the convolution kernel and the bias of the $a$-th convolution surface in the $b$-th layer of the network; $X_b$ denotes the input channels of the $b$-th layer; $X_{a,b+1}$ denotes the output of the $a$-th convolution surface of the $b$-th layer.
The basic pooling layer operation of the network is expressed by formula (8):

$$X_{a,b+1} = p(X_{a,b}) \quad (8)$$

In formula (8), $X_{a,b}$ denotes the input of the $a$-th channel of the $b$-th layer of the network; $X_{a,b+1}$ denotes the output of the $a$-th channel of the $b$-th layer; $p$ is the pooling computation function.
The basic fully-connected layer operation of the network is expressed by formula (9):

$$y_b = f\left(\sum x_b \cdot w_b + b_b\right) \quad (9)$$

In formula (9), $w_b$ and $b_b$ denote respectively the weight and bias of the $b$-th fully-connected layer; $x_b$ denotes the input of the $b$-th fully-connected layer and $y_b$ its output.
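A minimal PyTorch sketch of a network assembled from exactly these three operations — convolution (7), pooling (8) and fully-connected layers (9). The patent fixes none of the layer counts or sizes, so everything structural here is a hypothetical example; the two output classes stand for phone present / absent:

```python
import torch
import torch.nn as nn

class PhoneDetectorCNN(nn.Module):
    """Illustrative conv -> pool -> fully-connected stack for w x h motion-region crops."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # formula (7): X = f(sum X_b * W + b)
            nn.ReLU(),                                   # activation function f
            nn.MaxPool2d(2),                             # formula (8): pooling function p
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),                     # tolerate variable w x h crop sizes
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64),                   # formula (9): y = f(sum x * w + b)
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```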
During the training process, the parameters are updated by formula (10):

$$W^{(m+1)} = W^{(m)} - \eta\,\frac{\partial\, loss}{\partial W^{(m)}} \quad (10)$$

In formula (10), $\eta$ denotes the learning rate designed for the training process, and the superscript $(m)$ denotes the quantity computed at the $m$-th iteration.

After iterative computation, the loss function $loss$ converges to its minimum value, a deep convolutional neural network model suitable for detecting the mobile phone is obtained, and the offline preparation stage is completed.
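A matching training-loop sketch that applies the gradient update of formula (10) via stochastic gradient descent; PhoneDetectorCNN is the illustrative class above, and the data loader is assumed to yield annotated motion-region crops:

```python
import torch

model = PhoneDetectorCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # eta in formula (10)
criterion = torch.nn.CrossEntropyLoss()                   # the loss function

def train(loader, epochs=20):
    """loader yields (batch of motion-region crops, phone / no-phone labels)."""
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()      # d loss / d W^(m)
            optimizer.step()     # W^(m+1) = W^(m) - eta * grad, formula (10)
```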
Fifth step: the mobile phone action detection module completes the final detection

The mobile phone action detection module detects the mobile phone with the network model obtained by the offline training module: the motion-region images produced by the motion foreground extraction module are input to the network model for computation, and the mobile phone detection result is output. When the module detects a mobile phone in a motion-region image, a phone-use action is considered to exist; when no mobile phone is detected in the motion-region image, no phone-use action is considered to exist.
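Tying the modules together, a hedged end-to-end sketch of the fifth step, reusing the illustrative subtractor, crop_motion_regions and model objects defined in the earlier sketches (the class index for "phone present" is an assumption):

```python
import torch

@torch.no_grad()
def detect_phone_use(frame):
    """Coarse-to-fine detection: motion foreground -> region crops -> CNN."""
    mask = subtractor.apply(frame)                 # second/third step: foreground mask
    for crop in crop_motion_regions(frame, mask):  # coarse extraction
        if crop.shape[0] < 16 or crop.shape[1] < 16:
            continue                               # skip crops too small for the network
        x = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        if model(x).argmax(dim=1).item() == 1:     # class 1: phone present (assumed)
            return True                            # phone-use action detected
    return False
```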
Therefore, mobile phone action detection based on motion foreground extraction is achieved.
The invention realizes phone-use action detection: the motion foreground is extracted for coarse detection in the use scene, and a deep learning network then performs fine detection of the mobile phone within the small-size motion-foreground images obtained by the coarse stage. This coarse-to-fine detection process makes full use of spatio-temporal feature information and improves detection accuracy.
Claims (5)
1. A mobile phone action detection method based on motion foreground extraction is characterized by comprising the following specific steps:
firstly, a mobile phone action detection system based on motion foreground extraction is built
the mobile phone action detection system based on motion foreground extraction comprises: a background model construction module, a motion foreground extraction module, an offline training module and a mobile phone action detection module;
secondly, the background model construction module completes background modeling and background updating for the use scene

the background model construction module quantizes the background accurately with Gaussian probability density functions, fitting each pixel with K Gaussian distributions to construct a background model for the use scene, expressed by formula (1):

$$P(X_t) = \sum_{i=1}^{K} w_{i,t}\,\eta(X_t, \mu_{i,t}, \Sigma_{i,t}), \qquad \eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(X_t-\mu)^{T}\Sigma^{-1}(X_t-\mu)\right) \quad (1)$$

in formula (1), $X_t$ is the value of pixel $(x, y)$ at time $t$; $w_{i,t}$ is the weight of the $i$-th Gaussian distribution; $\eta(X_t, \mu_{i,t}, \Sigma_{i,t})$, $\mu_{i,t}$ and $\Sigma_{i,t}$ are respectively the $i$-th Gaussian probability density function, its mean and its covariance matrix; $n$ is the dimension of the Gaussian distribution;
the background model is updated in real time according to changes in the scene, expressed by formulas (2) to (4):

$$w_{i,t} = (1-\alpha)\,w_{i,t-1} + \alpha \quad (2)$$

$$\mu_{i,t} = (1-\rho)\,\mu_{i,t-1} + \rho X_t \quad (3)$$

$$\Sigma_{i,t} = (1-\rho)\,\Sigma_{i,t-1} + \rho\,(X_t-\mu_{i,t})(X_t-\mu_{i,t})^{T} \quad (4)$$
in formulas (2) to (4), $\alpha$ is the learning rate and $\rho$ is the update rate of the model; after the model is updated, the $w_{i,t}/\sigma_{i,t}$ value of each Gaussian distribution is computed for every pixel in the image and the distributions are sorted in descending order; the largest $B$ distributions are selected as the background model, i.e. the number of Gaussian distributions describing the background is $B$, where $T$ is the weight accumulation threshold and $T \in (0.5, 1)$, expressed by formula (5):

$$B = \arg\min_{b}\left(\sum_{k=1}^{b} w_{k,t} > T\right) \quad (5)$$
thirdly, the motion foreground extraction module extracts the motion foreground and divides the motion area to finish the crude extraction
The motion foreground extraction module compares a current frame image of the video sequence with the background image model for calculation, extracts a motion foreground, and divides a target area containing human motion from the current frame image according to the motion foreground;
inputting the frame image from the detection time t, comparing the frame image with a background model, and calculating pixel values X one by onetMatching relation with the obtained B Gaussian distributions, wherein when the pixel value is matched with one of the previous B Gaussian distributions, the pixel point is a background point, otherwise, the pixel point is divided into a motion foreground; calculating pixel points in the frame image one by one according to a matching relation, and determining whether the pixel points can be matched with Gaussian distribution to obtain a binary image; the matching relationship is expressed by equation (6):
in the formula (6), a point with a gray value of 0 is a background point, and a point with a gray value of 1 is a point motion foreground point;
after the motion foreground is extracted, performing connectivity analysis on the motion foreground, and segmenting a target area image containing human motion from the current frame image to obtain a small-size image with the size of w x h, thereby completing coarse extraction;
fourthly, the offline training module completes the determination and training of the mobile phone detection network

the offline training module annotates the mobile phone in the motion-region images obtained by the motion foreground extraction module, completing the construction of the training sample library; it determines and constructs a deep convolutional neural network model for detecting the mobile phone in images containing the human motion region, fixing the number of network layers, the definition of each layer, the number of convolution surfaces per layer, the convolution kernel size, the pooling size, the pooling computation function, the activation function and the loss function; it then trains the unknown parameters of each convolution kernel of the deep convolutional neural network offline with the constructed sample library;
the basic convolutional layer operation of the network is expressed by formula (7):

$$X_{a,b+1} = f\left(\sum X_b \cdot W_{a,b} + b_{a,b}\right) \quad (7)$$

in formula (7), $f$ is the activation function; $W_{a,b}$ and $b_{a,b}$ are respectively the convolution kernel and the bias of the $a$-th convolution surface in the $b$-th layer of the network; $X_b$ denotes the input channels of the $b$-th layer; $X_{a,b+1}$ denotes the output of the $a$-th convolution surface of the $b$-th layer;
the basic pooling layer operation of the network is expressed by formula (8):

$$X_{a,b+1} = p(X_{a,b}) \quad (8)$$

in formula (8), $X_{a,b}$ denotes the input of the $a$-th channel of the $b$-th layer of the network; $X_{a,b+1}$ denotes the output of the $a$-th channel of the $b$-th layer; $p$ is the pooling computation function;
the basic fully-connected layer operation of the network is expressed by formula (9):

$$y_b = f\left(\sum x_b \cdot w_b + b_b\right) \quad (9)$$

in formula (9), $w_b$ and $b_b$ denote respectively the weight and bias of the $b$-th fully-connected layer; $x_b$ denotes the input of the $b$-th fully-connected layer and $y_b$ its output;
during the training process, the parameters are updated by formula (10):

$$W^{(m+1)} = W^{(m)} - \eta\,\frac{\partial\, loss}{\partial W^{(m)}} \quad (10)$$

in formula (10), $\eta$ denotes the learning rate designed for the training process, and the superscript $(m)$ denotes the quantity computed at the $m$-th step of the iterative process;
after iterative computation, the loss function $loss$ converges to its minimum value, a deep convolutional neural network model suitable for detecting the mobile phone is obtained, and the offline preparation stage is completed;
fifthly, the mobile phone action detection module completes the final detection

the mobile phone action detection module detects the mobile phone with the network model obtained by the offline training module: the motion-region images produced by the motion foreground extraction module are input to the network model for computation, and the mobile phone detection result is output; when the module detects a mobile phone in a motion-region image, a phone-use action is considered to exist; when no mobile phone is detected in the motion-region image, no phone-use action is considered to exist;
therefore, mobile phone action detection based on motion foreground extraction is achieved.
2. The mobile phone action detection method based on motion foreground extraction as claimed in claim 1, wherein the background model construction module functions as follows: fit the background image with a function to obtain a model, and update the background model according to actual scene changes in the video.
3. The mobile phone action detection method based on motion foreground extraction as claimed in claim 1, wherein the motion foreground extraction module functions as follows: compare the video sequence with the background model, extract the motion foreground, and segment the motion region through connectivity analysis.
4. The mobile phone action detection method based on motion foreground extraction as claimed in claim 1, wherein the offline training module functions as follows: determine the detection network model, construct a motion-region image sample library, and train the network offline with the sample library.
5. The mobile phone action detection method based on motion foreground extraction as claimed in claim 1, wherein the mobile phone action detection module functions as follows: run the network model on the motion-region images and detect whether a phone-use action exists.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111187354.8A CN114049585B (en) | 2021-10-12 | 2021-10-12 | Mobile phone operation detection method based on motion prospect extraction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111187354.8A CN114049585B (en) | 2021-10-12 | 2021-10-12 | Mobile phone operation detection method based on motion prospect extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114049585A true CN114049585A (en) | 2022-02-15 |
CN114049585B CN114049585B (en) | 2024-04-02 |
Family
ID=80205355
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111187354.8A Active CN114049585B (en) | 2021-10-12 | 2021-10-12 | Mobile phone operation detection method based on motion prospect extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114049585B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107133974A (en) * | 2017-06-02 | 2017-09-05 | 南京大学 | The vehicle type classification method that Gaussian Background modeling is combined with Recognition with Recurrent Neural Network |
CN107749067A (en) * | 2017-09-13 | 2018-03-02 | 华侨大学 | Fire hazard smoke detecting method based on kinetic characteristic and convolutional neural networks |
WO2019237567A1 (en) * | 2018-06-14 | 2019-12-19 | 江南大学 | Convolutional neural network based tumble detection method |
Non-Patent Citations (1)
Title |
---|
Zhao Hongwei; Feng Jia; Zang Xuebai; Song Botao: "A practical moving object detection and tracking algorithm", Journal of Jilin University (Engineering and Technology Edition), no. 2, 30 September 2009 (2009-09-30), pages 386-390 *
Also Published As
Publication number | Publication date |
---|---|
CN114049585B (en) | 2024-04-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||