CN113139452A - Method for detecting behavior of using mobile phone based on target detection - Google Patents

Method for detecting behavior of using mobile phone based on target detection

Info

Publication number
CN113139452A
CN113139452A
Authority
CN
China
Prior art keywords
picture
mobile phone
hand
data set
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110416796.9A
Other languages
Chinese (zh)
Inventor
陈鸣
冯晓硕
杨文韬
张大伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
People's Liberation Army 91054 Troops
Original Assignee
People's Liberation Army 91054 Troops
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by People's Liberation Army 91054 Troops filed Critical People's Liberation Army 91054 Troops
Priority to CN202110416796.9A priority Critical patent/CN113139452A/en
Publication of CN113139452A publication Critical patent/CN113139452A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis

Abstract

The invention provides a method for detecting mobile phone usage behavior based on target detection, and relates to the technical field of target detection. First, a video or image to be checked for mobile phone usage is input to a human hand target detection model, which detects whether a human hand is present; pictures containing hands are saved to a human hand picture data set and output to a mobile phone classifier model. Second, a trained HOG + SVM model serving as the mobile phone classifier model performs binary classification on the hand pictures, judging whether each is a hand picture with a mobile phone, and outputs the result through three output modes. The pictures in the hand picture data set are manually labeled, irrelevant data are deleted, the data set is renamed, and a configuration file is generated. Finally, an HOG + SVM classifier model is trained and tested with the processed hand data set and configuration file, realizing the detection of mobile phone usage behavior. The method combines candidate-region-based and end-to-end target detection techniques, balancing detection accuracy and efficiency.

Description

Method for detecting behavior of using mobile phone based on target detection
Technical Field
The invention relates to the technical field of target detection, and in particular to a method for detecting mobile phone usage behavior based on target detection.
Background
At present, detecting human hands in videos mainly relies on target detection technology from computer vision. Target detection is broadly divided into traditional target detection and current deep-learning target detection. Traditional human hand detection mainly uses context-based detection, HOG + SVM hand detection, and detection based on color features. Deep learning frameworks fall into two categories: two-stage and one-stage target detection algorithms. Two-stage algorithms first generate a series of candidate boxes as samples and then classify the samples with a convolutional neural network, such as R-CNN, Fast R-CNN, and Faster R-CNN. One-stage algorithms convert the target box localization problem directly into a regression problem without generating candidate boxes, such as YOLO and SSD.
Although human hand detection technology is mature, its specific applications are mainly gesture detection, human-computer interaction, and the like; application scenarios that detect whether a person holds an article or is using a mobile phone are few. Intelligent monitoring applications of target detection mostly cover monitoring abnormal behavior of indoor crowds, taking classroom attendance through face recognition, and detecting vehicles or license plates on roads. The most common mobile phone detection application is a camera checking whether a driver makes a call while driving, and the relatively mature algorithms for this are based on face recognition technology; if the face is occluded, detection fails, so the method cannot be applied to other scenes. Moreover, data sets of hands holding mobile phones or other articles are scarce, and public data sets contain only hands.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a method for detecting mobile phone usage behavior based on target detection, which uses target detection technology from computer vision to recognize human hands and detect whether a mobile phone is held, thereby discovering illegal mobile phone use in a given scene.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: the method for detecting the behavior of the mobile phone based on the target detection comprises the following steps:
step 1, human hand detection: inputting a video or image to be checked for mobile phone usage into a human hand target detection model, detecting whether a human hand is present in the video or image, saving pictures containing human hands to a human hand picture data set, and outputting them to the mobile phone classifier model;
transmitting the video to be detected into a trained SSD target detection model through two input modes, image detection and video detection; the SSD target detection model performs multi-scale feature detection on each video frame to obtain the coordinates and similarity of all candidate boxes; the similarity of each candidate box is compared against a human-hand binary classification threshold; if any candidate box has similarity greater than the threshold, the frame is judged to contain a human hand, saved to the human hand picture data set, and passed into the mobile phone classifier model;
step 2, constructing a mobile phone classifier model and identifying mobile phones: acquiring the hand pictures input in step 1, performing binary classification on each hand picture using a trained HOG + SVM model as the mobile phone classifier model, judging whether it is a hand picture containing a mobile phone, and outputting the result through three output modes: displaying the result video, saving the result video, and saving the result picture;
after receiving a hand picture from the hand detection step, calling the trained HOG + SVM model to classify it: HOG feature values of each candidate window of the picture are computed with a sliding window technique to form a feature matrix; the feature matrix is evaluated by the prediction function of the SVM classifier model to obtain a mobile phone binary classification prediction value; if this value is greater than the set mobile phone binary classification threshold, a mobile phone is present in the window, and the mobile phone region is framed with a detection box; the result is output through the three output modes: displaying the result video, saving the result video, and saving the result picture;
step 3, processing the hand picture data set: manually labeling pictures in the collected hand picture data set, deleting irrelevant data sets, renaming the data sets and generating configuration files;
labeling the picture data in the hand picture data set with labeling software: framing the mobile phone region with a labeling box, attaching the corresponding class label, generating an annotation file, and saving it under the hand picture data set path; picture data left unlabeled after manual labeling are useless, so unlabeled pictures are deleted from the hand data set; the file names of the labeled pictures and annotation files in the data set are renamed according to a standard format; finally, a configuration file is created to record the correspondence and file paths of the labeled pictures and annotation files in the hand data set;
step 4, training a human hand classifier model: training and testing an HOG + SVM classifier model through the processed hand data set and the configuration file to realize the detection of the behavior of the used mobile phone;
acquiring the labeled picture data and annotation file paths in the data set through the configuration file of the hand picture data set; resizing each picture to a fixed size, obtaining the coordinates of its labeling box from the annotation file, and computing the HOG feature values as positive sample data; masking the labeled box region in each labeled picture, the masked pictures forming a test set; then computing the HOG feature value of each window of the test-set pictures with the sliding window technique as negative samples, and generating a new SVM classifier model; performing hard negative mining with the new SVM classifier model, adding test-set negative samples predicted as positive to the negative sample set to suppress false alarms, and generating the final SVM classifier model; and replacing the SVM classifier model of step 2 with the final SVM classifier model for binary classification of mobile phone presence, thereby realizing the detection of mobile phone usage behavior.
The beneficial effects of the above technical scheme are as follows: the method for detecting mobile phone usage behavior based on target detection not only helps a user detect mobile phone usage through video input or a real-time camera data stream, but also lets the user collect a data set from the input source and adjust the detection model's parameters for optimization; meanwhile, the method combines candidate-region-based and end-to-end target detection techniques, balancing detection accuracy and efficiency, and the self-built data set fills a gap left by public data sets.
Drawings
FIG. 1 is a flowchart of the method for detecting mobile phone usage behavior based on target detection according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of the mobile phone usage behavior monitoring software system according to an embodiment of the present invention;
FIG. 3 is a functional framework diagram of the mobile phone usage behavior monitoring software system according to an embodiment of the present invention;
FIG. 4 is a timing diagram of the hand acquisition function provided by an embodiment of the present invention;
FIG. 5 is a timing diagram of the mobile phone detection function provided by an embodiment of the present invention;
FIG. 6 is a timing diagram of the data set processing function provided by an embodiment of the present invention;
FIG. 7 is a timing diagram of the classifier training function provided by an embodiment of the present invention;
FIG. 8 is an overall flowchart of the human hand detection module according to an embodiment of the present invention;
FIG. 9 is an overall flowchart of the mobile phone classifier module according to an embodiment of the present invention;
FIG. 10 is an overall flowchart of the classifier training module according to an embodiment of the present invention;
FIG. 11 is an interface diagram of the mobile phone detection layer according to an embodiment of the present invention;
FIG. 12 is an interface diagram of the data set processing layer according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In this embodiment, the method for detecting mobile phone usage behavior based on target detection, as shown in FIG. 1, comprises the following steps:
step 1, human hand detection: inputting a video or image to be checked for mobile phone usage into a human hand target detection model, detecting whether a human hand is present in the video or image, saving pictures containing human hands to a human hand picture data set, and outputting them to the mobile phone classifier model;
transmitting the video to be detected into a trained SSD target detection model through two input modes, image detection and video detection; the SSD target detection model performs multi-scale feature detection on each video frame to obtain the coordinates and similarity of all candidate boxes; the similarity of each candidate box is compared against a human-hand binary classification threshold; if any candidate box has similarity greater than the threshold, the frame is judged to contain a human hand, saved to the human hand picture data set, and passed into the mobile phone classifier model;
step 2, constructing a mobile phone classifier model and identifying mobile phones: acquiring the hand pictures input in step 1, performing binary classification on each hand picture using a trained HOG + SVM model as the mobile phone classifier model, judging whether it is a hand picture containing a mobile phone, and outputting the result through three output modes: displaying the result video, saving the result video, and saving the result picture;
after receiving a hand picture from the hand detection step, calling the trained HOG + SVM model to classify it: HOG feature values of each candidate window of the picture are computed with a sliding window technique to form a feature matrix; the feature matrix is evaluated by the prediction function of the SVM classifier model to obtain a mobile phone binary classification prediction value; if this value is greater than the set mobile phone binary classification threshold, a mobile phone is present in the window, and the mobile phone region is framed with a detection box; the result is output through the three output modes: displaying the result video, saving the result video, and saving the result picture;
step 3, processing the hand picture data set: manually labeling pictures in the collected hand picture data set, deleting irrelevant data sets, renaming the data sets and generating configuration files;
labeling the picture data in the hand picture data set with labeling software: framing the mobile phone region with a labeling box, attaching the corresponding class label, generating an annotation file, and saving it under the hand picture data set path; picture data left unlabeled after manual labeling are useless, so unlabeled pictures are deleted from the hand data set; then, because the labeled picture files are inconsistently named, the file names of the labeled pictures and annotation files in the data set are renamed according to a standard format; finally, a configuration file is created to record the correspondence and file paths of the labeled pictures and annotation files in the hand data set;
step 4, training a human hand classifier model: training and testing an HOG + SVM classifier model through the processed hand data set and the configuration file to realize the detection of the behavior of the used mobile phone;
acquiring the labeled picture data and annotation file paths in the data set through the configuration file of the hand picture data set; resizing each picture to a fixed size, obtaining the coordinates of its labeling box from the annotation file, and computing the HOG feature values as positive sample data; masking the labeled box region in each labeled picture, the masked pictures forming a test set; then computing the HOG feature value of each window of the test-set pictures with the sliding window technique as negative samples, and generating a new SVM classifier model; performing hard negative mining with the new SVM classifier model, adding test-set negative samples predicted as positive to the negative sample set to suppress false alarms, and generating the final SVM classifier model; and replacing the SVM classifier model of step 2 with the final SVM classifier model for binary classification of mobile phone presence, thereby realizing the detection of mobile phone usage behavior.
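The masking step described above, which turns a labeled positive region into a negative sample, can be sketched as follows. This is a minimal illustration in plain Python; the function name is an assumption, and a real implementation would operate on OpenCV image arrays rather than nested lists.

```python
def mask_labeled_region(image, xmin, ymin, xmax, ymax, fill=0):
    """Return a copy of the picture (list of pixel rows) with the labeled
    mobile phone region overwritten by a constant fill value, so the
    masked picture no longer contains a positive example and can serve
    as a negative sample for the test set."""
    masked = [row[:] for row in image]
    for y in range(ymin, ymax):
        for x in range(xmin, xmax):
            masked[y][x] = fill
    return masked

# Example: a fake 8 x 8 grayscale picture with a labeled region at (2,2)-(6,6).
img = [[200] * 8 for _ in range(8)]
neg = mask_labeled_region(img, xmin=2, ymin=2, xmax=6, ymax=6)
```

The original picture is left untouched; only the copy is masked, so the same picture can still supply its positive HOG features.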
In this embodiment, the method for detecting mobile phone usage behavior based on target detection is used to develop a mobile phone usage behavior monitoring software system with a client architecture, as shown in FIG. 2. The front-end interface is rendered with PyQt5 + Qt Designer, all background business logic is written in Python, and the library used throughout the software for processing video pictures is opencv-python. The human hand detection module is based on the TensorFlow Object Detection API, with TensorFlow as the machine learning platform, and calls a human hand detection model from TensorFlow Models. The mobile phone classifier module and the modules of the data set acquisition layer are each independent, provide corresponding interfaces, interact with the interface, and respond to user requests. The whole software system runs on the Windows 10 operating system.
In this embodiment, the developed mobile phone usage behavior monitoring software system comprises two functional layers, as shown in FIG. 3: the mobile phone detection layer and the data set acquisition layer. The mobile phone detection layer is divided into a human hand detection module and a mobile phone classifier module. The human hand detection module detects the position of a human hand in each frame of the input stream and passes frames containing hands to the mobile phone classifier module; a sequence diagram of the hand detection function is shown in FIG. 4. The mobile phone classifier module performs binary classification on the input hand pictures, outputting them as pictures with or without a mobile phone; for pictures with a mobile phone, the mobile phone region is marked and displayed. A timing diagram of the mobile phone detection function is shown in FIG. 5.
The data set processing layer is divided into: a data set processing module and a classifier training module. The data set processing module is responsible for manually adding labels to the stored picture data, renaming the data, deleting useless data and the like. FIG. 6 is a timing diagram of data set processing functions. The classifier training module is mainly responsible for training a classifier by using the labeled data set, and finally obtaining a model for detecting the mobile phone. FIG. 7 is a timing diagram of the classifier training function.
(1) The human hand detection module is implemented with an SSD-based human hand target detection algorithm. As shown in FIG. 8, the user inputs a video stream; the VideoCapture function in the opencv library captures the current frame of the video, and the module checks whether the frame is empty. If the return value is empty, either the acquisition failed or the end of the video was reached; otherwise the hand detection operation proceeds. The main detection flow converts the picture's color space to RGB as preprocessing and calls the detect_objects function to obtain the candidate box regions and their similarity values. The candidate boxes, similarity values, picture, and similarity threshold are then passed as parameters to the detect_box_on_image function, which collects pictures whose similarity exceeds the threshold for classification. Finally, picture frames are captured in a loop; when a captured frame is empty the loop exits and the process ends. Before detection, the human hand detection module loads a trained hand inference graph, frozen_inference_graph, as its model and executes the loaded neural network with TensorFlow's Session function. The loaded inference graph provides the candidate boxes and similarity values used for detection, and Session executes the detection to obtain boxes, scores, classes, and num, which are respectively the candidate box positions, similarity values, detected target classes, and number of detections. If the similarity reaches the threshold, the hand picture is saved with opencv's imwrite function and the classification stage begins.
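The threshold comparison applied to the detector outputs (boxes and scores) can be sketched independently of TensorFlow. The function name, tuple layout, and threshold value below are illustrative assumptions, not part of the patent.

```python
def filter_hand_boxes(boxes, scores, threshold=0.5):
    """Keep only candidate boxes whose similarity score exceeds the
    human-hand binary classification threshold. A frame is judged to
    contain a hand when at least one box survives the filter."""
    return [(box, score) for box, score in zip(boxes, scores)
            if score > threshold]

# Two candidate boxes in normalized (ymin, xmin, ymax, xmax) form:
boxes = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9)]
scores = [0.35, 0.82]
hands = filter_hand_boxes(boxes, scores, threshold=0.5)
```

With these scores only the second box survives, so this frame would be saved to the hand picture data set and forwarded to the classifier.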
(2) The mobile phone classifier module implements the classification target detection algorithm with an SVM classifier based on HOG features. As shown in FIG. 9, its main workflow is: the human hand detection module passes a picture with a detected hand to the svm_run function, which first instantiates the HOGDescriptor class from the opencv library as the HOG feature descriptor. The input picture is then resized to 500 pixels wide by 500 pixels high, a 50 x 50 sliding window stepped every 10 pixels computes the HOG features of each window, and the maximum similarity over all detected windows is recorded. Finally, after the loop ends, a prediction box is drawn in the picture frame at the position of the window with the maximum similarity as the processed result, and the mobile phone classifier module finishes.
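The sliding-window search over a 500 x 500 picture with a 50 x 50 window and a 10-pixel step can be sketched as follows. The HOG + SVM scoring is replaced by a stand-in score function, and the function names are assumptions for illustration.

```python
def sliding_windows(width, height, win=50, stride=10):
    """Yield the top-left corner of every win x win window, stepping
    by stride pixels, exactly covering the resized picture."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield x, y

def best_window(score_fn, width=500, height=500, win=50, stride=10):
    """Return the window position with the highest score under score_fn,
    standing in for the SVM prediction over the window's HOG features."""
    return max(sliding_windows(width, height, win, stride),
               key=lambda pos: score_fn(*pos))

# Stand-in scorer whose maximum sits at window position (200, 300):
score = lambda x, y: -((x - 200) ** 2 + (y - 300) ** 2)
```

In the real module the score would be the SVM prediction over the window's HOG feature vector, and the winning position is where the prediction box is drawn.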
(3) The data set processing module processes and integrates the collected hand picture data into a data set. The procedure is mainly: open the labeling software labelImg, open the picture folder to be labeled, screen the hand pictures under the file path, draw a target region with a labeling box wherever a mobile phone appears in a hand picture, and save an xml file in the default PASCAL VOC format. After labeling is finished, data processing begins. First, the pictures under the picture path and the annotation files under the annotation path are traversed, their file names are compared, and pictures without annotations are deleted. Then the remaining pictures are traversed, and the picture files are renamed and ordered according to the PASCAL VOC naming format. Finally, for the convenience of the classifier training module, all annotation files are traversed, and the positions xmin, xmax, ymin, and ymax of each labeled region are extracted from the xml files using a parsing method of the xml library; the file name of each labeled picture and the center point coordinates of its labeled region are recorded in a label.txt configuration file.
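The extraction of xmin/xmax/ymin/ymax from a PASCAL VOC annotation and the computation of the labeled region's center point can be sketched as below. The tag names follow the PASCAL VOC convention; the exact record layout of the label file is an assumption.

```python
import xml.etree.ElementTree as ET

VOC_XML = """<annotation>
  <filename>hand_0001.jpg</filename>
  <object>
    <name>phone</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>220</xmax><ymax>240</ymax></bndbox>
  </object>
</annotation>"""

def voc_center(xml_text):
    """Parse a PASCAL VOC annotation, extract xmin/ymin/xmax/ymax, and
    return the picture's file name plus the labeled region's center."""
    root = ET.fromstring(xml_text)
    name = root.findtext("filename")
    box = root.find("object/bndbox")
    xmin, ymin = int(box.findtext("xmin")), int(box.findtext("ymin"))
    xmax, ymax = int(box.findtext("xmax")), int(box.findtext("ymax"))
    return name, (xmin + xmax) // 2, (ymin + ymax) // 2

name, cx, cy = voc_center(VOC_XML)
line = f"{name} {cx} {cy}"  # one hypothetical record of the label configuration file
```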
(4) The classifier training module trains the HOG + SVM model used in the mobile phone classifier module, on the data set produced by the data set processing module. As shown in FIG. 10, its main workflow is: read the data set configuration file label.txt to obtain the file name of each labeled picture and the center point coordinates of its labeled region. Then read all pictures, resize them to 500 x 500, compute the HOG features of each labeled region, and build an HOG feature matrix as the positive sample feature matrix; a mask slightly larger than the candidate region is added to each picture to turn the positive sample into a negative sample, and the HOG features of these negatives form the negative sample feature matrix. A label matrix is created recording whether each feature vector is positive or negative. An SVM model from the sklearn library is then trained as the SVM classifier on the HOG feature matrix and label matrix. The last step is hard negative mining: negative samples in the data set that are easily classified as positive are used for retraining to improve the classifier's accuracy. Concretely, the positive sample pictures are taken, a mask is added over the labeled region so each positive sample becomes a negative one, and the masked pictures form a test set; the existing trained model is run on these pictures, and any picture predicted as positive has its HOG feature values appended to the feature matrix with a negative label in the label matrix, after which the SVM classifier model is retrained.
Finally, the retrained SVM classifier model is saved under the current directory with the joblib library, and the classifier training module finishes. The interface of the mobile phone usage behavior monitoring software system provided by this embodiment is likewise divided into a mobile phone detection layer interface and a data set processing layer interface. The mobile phone detection layer interface mainly detects user input streams and offers the user two input modes, real-time monitoring and video detection. FIG. 11 is the mobile phone detection layer interface diagram. The data set processing layer mainly provides developers with functions for optimizing and modifying the data set; the user opens its interface window by clicking the data set processing button in the mobile phone detection layer. FIG. 12 is the data set processing layer interface diagram.
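The hard negative mining loop described for the classifier training module can be sketched generically as follows. The real system would use an sklearn SVM; here it is replaced by a tiny stub with the same fit/predict interface, and all names are illustrative assumptions.

```python
def hard_negative_mining(model, pos_feats, neg_feats, masked_test_feats):
    """Retrain the classifier after adding test-set samples that the
    current model wrongly predicts as positive (the hard negatives)."""
    hard = [f for f in masked_test_feats if model.predict(f) == 1]
    negs = neg_feats + hard
    feats = pos_feats + negs
    labels = [1] * len(pos_feats) + [0] * len(negs)
    model.fit(feats, labels)
    return model

class StubSVM:
    """Stand-in with the fit/predict interface of an SVM classifier."""
    def __init__(self):
        self.threshold = 0.5
    def predict(self, f):
        return 1 if f[0] > self.threshold else 0
    def fit(self, feats, labels):
        # Move the decision threshold up to the largest negative feature.
        self.threshold = max(f[0] for f, l in zip(feats, labels) if l == 0)

model = StubSVM()
model = hard_negative_mining(model,
                             pos_feats=[[0.9], [0.95]],
                             neg_feats=[[0.1], [0.2]],
                             masked_test_feats=[[0.6], [0.3]])
```

After mining, the masked sample at 0.6 that was previously misclassified as positive is now classified as negative, while the true positives remain positive.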
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (5)

1. A method for detecting mobile phone usage behavior based on target detection, characterized in that the method comprises the following steps:
step 1, human hand detection: inputting a video or image to be checked for mobile phone usage into a human hand target detection model, detecting whether a human hand is present in the video or image, saving pictures containing human hands to a human hand picture data set, and outputting them to the mobile phone classifier model;
step 2, constructing a mobile phone classifier model and identifying mobile phones: acquiring the hand pictures input in step 1, performing binary classification on each hand picture using a trained HOG + SVM model as the mobile phone classifier model, judging whether it is a hand picture containing a mobile phone, and outputting the result through three output modes: displaying the result video, saving the result video, and saving the result picture;
step 3, processing the hand picture data set: manually labeling pictures in the collected hand picture data set, deleting irrelevant data sets, renaming the data sets and generating configuration files;
step 4, training a human hand classifier model: and training and testing an HOG + SVM classifier model through the processed hand data set and the configuration file to realize the detection of the behavior of the used mobile phone.
2. The method for detecting mobile phone usage behavior based on target detection as claimed in claim 1, wherein the specific method of step 1 comprises the following steps:
transmitting the video to be detected into a trained SSD target detection model through two input modes, image detection and video detection; the SSD target detection model performs multi-scale feature detection on each video frame to obtain the coordinates and similarity of all candidate boxes; the similarity of each candidate box is compared against a human-hand binary classification threshold; if any candidate box has similarity greater than the threshold, the frame is judged to contain a human hand, saved to the human hand picture data set, and passed into the mobile phone classifier model.
3. The method for detecting mobile phone usage behavior based on target detection as claimed in claim 2, wherein the specific method of step 2 comprises:
after receiving a hand picture from the hand detection step, calling the trained HOG + SVM model to classify it: computing the HOG feature values of each candidate window of the picture with a sliding-window technique to form a feature matrix; evaluating the feature matrix with the prediction function of the SVM classifier model to obtain a mobile phone binary-classification prediction value; if this prediction value is greater than the set mobile phone binary-classification threshold, judging that the window contains a mobile phone part and marking that part with a detection box; and outputting the result through three output modes: displaying the result video, saving the result video, and saving the result pictures.
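The sliding-window scan and per-window feature computation can be sketched as below. This is a simplified stand-in, not the patented pipeline: real HOG descriptors use cell/block normalization, whereas `hog_feature` here builds a single global gradient-orientation histogram; window size, stride, and bin count are assumptions.

```python
import numpy as np

def sliding_windows(h, w, win=64, stride=32):
    """Yield (x, y, w, h) window coordinates covering an h-by-w image."""
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield (x, y, win, win)

def hog_feature(patch, bins=9):
    """Simplified HOG-style feature: an L2-normalized histogram of
    gradient orientations weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientations
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-6)

def window_has_phone(feat, weights, bias, thresh=0.0):
    """Linear SVM decision: prediction value vs. the phone threshold."""
    return float(feat @ weights + bias) > thresh
```

In the claimed method each window whose prediction value clears the phone threshold is then framed with a detection box.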
4. The method for detecting mobile phone usage behavior based on target detection as claimed in claim 3, wherein the specific method of step 3 comprises:
labeling the picture data in the hand picture data set with labeling software: framing the mobile phone part with a labeling box, assigning the corresponding class label, generating an annotation file, and saving it under the hand picture data set path; since picture data left unlabeled after manual labeling is useless, deleting all unlabeled picture data from the hand data set; renaming the labeled pictures and annotation files in the data set according to a standard format; and finally creating a configuration file that records the correspondence and file paths of the labeled pictures and annotation files in the hand data set.
5. The method for detecting mobile phone usage behavior based on target detection as claimed in claim 2, wherein the specific method of step 4 comprises:
obtaining the paths of the labeled picture data and annotation files through the configuration file of the hand picture data set; resizing each picture to a fixed size, obtaining the coordinates of its labeling box from the annotation file, and computing the HOG feature values of the boxed region as positive sample data; masking the labeling-box region of each labeled picture in the data set and forming a test set from the masked pictures; computing the HOG feature values of every window of the test-set pictures with a sliding-window technique as negative samples and training a new SVM classifier model; performing hard negative mining with the new SVM classifier model by adding the test-set negative samples that were predicted as positive to the negative sample set, thereby suppressing false alarms, and generating the final SVM classifier model; and replacing the SVM classifier model of step 2 with this final SVM classifier model to perform the binary classification of whether a mobile phone is present, thereby realizing detection of mobile phone usage behavior.
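The hard-negative-mining loop at the heart of this step can be sketched compactly. The toy hinge-loss trainer below stands in for a real SVM library, and all feature values, learning-rate, and regularization constants are illustrative assumptions:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.1, C=1.0, reg=0.01, epochs=200):
    """Minimal hinge-loss sub-gradient trainer; labels y are in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:            # margin violated
                w += lr * (C * yi * xi - reg * w)
                b += lr * C * yi
            else:                                # only apply weight decay
                w -= lr * reg * w
    return w, b

def hard_negative_mining(pos, neg, test_windows):
    """Train an initial model, collect test-set windows wrongly predicted
    positive ("hard negatives"), add them to the negative set, retrain,
    and return the final model (w, b)."""
    X = np.vstack([pos, neg])
    y = np.array([1.0] * len(pos) + [-1.0] * len(neg))
    w, b = train_linear_svm(X, y)
    hard = [f for f in test_windows if f @ w + b > 0]
    if hard:
        X = np.vstack([X, np.array(hard)])
        y = np.concatenate([y, -np.ones(len(hard))])
        w, b = train_linear_svm(X, y)
    return w, b
```

In the claimed method, `pos` would be the HOG features of the labeled phone regions and `test_windows` the sliding-window features of the masked pictures; the retrained model then replaces the classifier of step 2.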
CN202110416796.9A 2021-04-19 2021-04-19 Method for detecting behavior of using mobile phone based on target detection Pending CN113139452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110416796.9A CN113139452A (en) 2021-04-19 2021-04-19 Method for detecting behavior of using mobile phone based on target detection

Publications (1)

Publication Number Publication Date
CN113139452A true CN113139452A (en) 2021-07-20

Family

ID=76813076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110416796.9A Pending CN113139452A (en) 2021-04-19 2021-04-19 Method for detecting behavior of using mobile phone based on target detection

Country Status (1)

Country Link
CN (1) CN113139452A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564034A (en) * 2018-04-13 2018-09-21 湖北文理学院 The detection method of operating handset behavior in a kind of driver drives vehicle
CN108596064A (en) * 2018-04-13 2018-09-28 长安大学 Driver based on Multi-information acquisition bows operating handset behavioral value method
CN108647619A (en) * 2018-05-02 2018-10-12 安徽大学 The detection method and device that safety cap is worn in a kind of video based on deep learning
CN108647617A (en) * 2018-05-02 2018-10-12 深圳市唯特视科技有限公司 A kind of positioning of driver's hand and grasping analysis method based on convolutional neural networks
CN111091101A (en) * 2019-12-23 2020-05-01 中国科学院自动化研究所 High-precision pedestrian detection method, system and device based on one-step method
CN111222449A (en) * 2020-01-02 2020-06-02 上海中安电子信息科技有限公司 Driver behavior detection method based on fixed camera image
CN112183356A (en) * 2020-09-28 2021-01-05 广州市几米物联科技有限公司 Driving behavior detection method and device and readable storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989608A (en) * 2021-12-01 2022-01-28 西安电子科技大学 Student experiment classroom behavior identification method based on top vision
CN116503695A (en) * 2023-06-29 2023-07-28 天津所托瑞安汽车科技有限公司 Training method of target detection model, target detection method and device
CN116503695B (en) * 2023-06-29 2023-10-03 天津所托瑞安汽车科技有限公司 Training method of target detection model, target detection method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination