CN111832450B - Knife holding detection method based on image recognition - Google Patents

Knife holding detection method based on image recognition

Info

Publication number
CN111832450B
CN111832450B
Authority
CN
China
Prior art keywords
knife
picture
holding
model
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010616604.4A
Other languages
Chinese (zh)
Other versions
CN111832450A
Inventor
Ji Xiang (吉翔)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Ruiyan Technology Co ltd
Original Assignee
Chengdu Ruiyan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Ruiyan Technology Co ltd filed Critical Chengdu Ruiyan Technology Co ltd
Priority to CN202010616604.4A priority Critical patent/CN111832450B/en
Publication of CN111832450A publication Critical patent/CN111832450A/en
Application granted granted Critical
Publication of CN111832450B publication Critical patent/CN111832450B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 40/103: recognition of human or animal bodies in image or video data; static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/214: pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: neural networks; combinations of networks
    • G06N 3/084: neural network learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/40: extraction of image or video features
    • G06V 20/41: video scenes; higher-level, semantic clustering, classification or understanding, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/52: surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

The invention discloses a knife-holding detection method based on image recognition, belonging to the technical field of security protection. The method mainly comprises: training a pedestrian detection model and an attention-based knife-holding classification model; decoding the surveillance video into pictures; obtaining the position information of each human figure in a picture through the pedestrian detection model; expanding the positions and cropping the picture at the expanded positions to obtain sampled pictures; and sending the sampled pictures into the attention-based knife-holding classification model to identify whether the current pedestrian holds a knife, giving an alarm if so. According to the invention, after each individual is detected by the pedestrian detection model, that individual alone is used as input, excluding environmental interference, so that the invalid information entering the attention-based knife-holding classification model is greatly reduced and classification accuracy improved; and the feature extraction module and attention module added to the attention-based knife-holding classification model enhance the features of key parts of the picture, so that the model focuses on the important features, greatly improving classification accuracy.

Description

Knife holding detection method based on image recognition
Technical Field
The invention relates to the technical field of security and protection, in particular to a knife holding detection method based on image recognition.
Background
Security mainly refers to security protection, in particular preparing to cope with attacks or to avoid damage, so that the protected object is in a safe state free of danger, invasion or accident. Safety is the goal and prevention is the means: safety is achieved through preventive measures.
Public places such as railway stations, shopping malls, subways and hospitals have high flows of people of many kinds, and a single dangerous act can cause multiple casualties. The safety requirements of public places are therefore very strict: weapons must be found as early as possible to eliminate potential safety hazards and reduce losses of property and, above all, of life. Current weapon detection technology is mainly based on two approaches: metal detectors and target-detection neural networks.
The technology based on the metal detector works by using a varying magnetic field to induce eddy currents on a metal surface; the eddy currents in turn generate a magnetic field that the detector senses. The core of a metal detector is a detection coil: when the coil is energized it generates a magnetic field, and metal entering the field perturbs it, revealing the presence of a metallic object. The detector generates a periodically varying magnetic field, which produces an eddy-current electric field in space; if the eddy currents encounter metal, it can be detected. In practical applications, however, some non-metallic objects, such as high-salt products, also trigger detection, sometimes even more strongly than metal, so false detections occur. The method is also constrained by the equipment, which occupies a large space and can only screen at fixed positions, making detection inconvenient and its range limited.
The deep neural network method using target detection aims at the knife itself or at the knife holder. In this method, a large number of knife pictures are collected and the knives in them are annotated manually; these pictures are then used to train a target detection model based on a neural network algorithm. In actual deployment, a picture is input into the target detection model, the backbone extracts the picture's feature information, and the subsequent network predicts based on this feature information and the predefined anchors at each position in the image. Predictions fall into category prediction, a classification task that predicts whether a position contains the target object, i.e., a knife, and position prediction, a regression task that predicts the deviation of the actual object position from the predefined position. In summary, training such a model requires a large amount of annotated data, i.e., many pictures with knives in different scenes; such data are difficult to collect and limited in variety, so the effect is poor on knives that occupy only a small proportion of the image. The algorithm's performance on small objects is also hard to guarantee, and false alarms easily occur on objects whose colors and shapes resemble a knife.
Disclosure of Invention
The invention aims to provide a knife-holding detection method based on image recognition that solves the technical problems of existing knife-holding detection methods: poor detection performance, difficulty in guaranteeing the effect on small objects, and frequent false alarms on objects whose colors and shapes resemble a knife.
The technical scheme adopted by the invention is as follows:
the knife holding detection method based on image recognition comprises the following steps:
s1, training a pedestrian detection model for identifying a human shape through a deep learning neural network structure;
s2, training an attention knife holding classification model for identifying knife holding behaviors through a deep learning neural network structure;
s3, transmitting the video shot by the monitoring camera in real time back to the background server;
s4, decoding the returned video stream data into pictures by the background server;
s5, inputting the picture obtained in the step S4 into a pedestrian detection model, and acquiring the humanoid position in the picture by the pedestrian detection model: marking the humanoid forms by using rectangular frames, and acquiring the position information (x, y, w, h) of each humanoid form according to the position and the size information of the rectangular frame marked with the humanoid form in the picture, wherein x and y are the left upper corner coordinates of the rectangular frame where the humanoid form is located, and w and h are the width and the height of the rectangular frame of the humanoid form;
s6, expanding the rectangular frame according to the position information (x, y, w and h) obtained in the step S5 to obtain a new rectangular frame, and then intercepting a picture according to the new rectangular frame to obtain a sampling picture;
s7, sending the sampling pictures into an attention-holding knife classifying model, extracting features from the input humanoid images by the attention-holding knife classifying model, identifying whether the current pedestrian holds a knife or not based on the features, and giving an alarm if the current pedestrian holds a knife.
Further, in step S6, the position of the human figure in the new rectangular frame is (x - 0.1w, y, 1.2w, 1.1h).
Further, the step S1 specifically includes the following steps:
a. data preparation: shooting videos and/or collecting various public videos related to pedestrians;
b. labeling: decoding the videos into pictures, marking each human figure in a picture with a rectangular frame using annotation software, and obtaining the frame coordinates (x, y, w, h), which are the position of the human figure in the picture;
c. training: adopting either a full YOLOv3 network, or an EfficientNet-B0 backbone followed by a lightweight YOLOv3 detection head, as the network structure of the pedestrian detection model; then taking the pictures annotated in step b as the input of the network, which outputs predicted human-figure position information (x1, y1, w1, h1); with the annotated human-figure position as the prediction target, computing the difference between the prediction and the ground truth and updating the network weights with the back propagation algorithm, finally training a pedestrian detection model, used for identifying human figures, whose predictions substantially match the ground truth.
Further, the step S2 specifically includes the following steps:
A. data preparation: shooting pictures of people holding knives and/or collecting various public pictures of people holding knives;
B. machine preprocessing: cropping the human-figure region out of each original picture obtained in step A with a rectangular frame, obtaining pictures whose main subject is a person;
C. labeling: annotators label each picture obtained in step B as to whether the person holds a knife; holding a knife is a positive sample and not holding a knife is a negative sample;
D. training: taking the positive and negative samples obtained in step C as the input of the network structure, which predicts whether each sample is a positive (knife-holding) sample; computing the difference between the prediction and the ground truth and updating the network weights with the back propagation algorithm, finally training an attention-based knife-holding classification model, used for identifying knife-holding behavior, whose predictions substantially match the ground truth.
Further, in step S7, the attention-based knife-holding classification model extracts features from the input picture through a backbone; the extracted features are processed by the feature extraction module, the output of the feature extraction module is processed by the attention module, and finally the Class module classifies whether the human figure in the picture is a person holding a knife.
Due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. In the knife-holding detection method based on image recognition, the human-figure positions in the picture are detected first; after the pedestrian detection model detects an individual, that individual alone is used as input, excluding environmental interference, so that the invalid information entering the attention-based knife-holding classification model is greatly reduced and classification accuracy improved; the invention realizes reliable knife-holding detection, is convenient to implement without complex equipment, and with its streamlined model structure achieves fast, accurate real-time alarm monitoring;
2. In the knife-holding detection method based on image recognition, the feature extraction module and the attention module added to the attention-based knife-holding classification model enhance the features of key parts, so that the model focuses on the important features, greatly improving classification accuracy.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should not be considered as limiting its scope; those skilled in the art may obtain other related drawings from them without creative effort. The proportions of the components in the drawings do not represent the proportions of an actual design; the drawings are merely schematic diagrams of structures or positions:
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of the network structure of the attention-based knife-holding classification model.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.
The present invention will be described in detail with reference to FIG. 1 and FIG. 2.
Example 1
As shown in FIG. 1 and FIG. 2, the knife-holding detection method based on image recognition of the present invention comprises the following steps:
s1, training a pedestrian detection model for identifying a human shape through a deep learning neural network structure;
s2, training an attention knife holding classification model for identifying knife holding behaviors through a deep learning neural network structure;
s3, transmitting the video shot by the monitoring camera in real time back to the background server;
s4, decoding the returned video stream data into pictures by the background server;
s5, inputting the picture obtained in the step S4 into a pedestrian detection model, and acquiring the humanoid position in the picture by the pedestrian detection model: marking the humanoid by using a rectangular frame through marking software, and acquiring the position information (x, y, w, h) of each humanoid according to the position and the size information of the rectangular frame marked with the humanoid in the picture, wherein x and y are the left upper corner coordinates of the rectangular frame where the humanoid is located, and w and h are the width and the height of the rectangular frame of the humanoid;
s6, expanding the rectangular frame according to the position information (x, y, w and h) obtained in the step S5 to obtain a new rectangular frame, and then intercepting a picture according to the new rectangular frame to obtain a sampling picture;
s7, sending the sampling pictures into an attention-holding knife classifying model, extracting features from the input humanoid images by the attention-holding knife classifying model, identifying whether the current pedestrian holds a knife or not based on the features, and giving an alarm if the current pedestrian holds a knife.
The rectangular frame can be expanded in various ways; in the present invention, preferably, in step S6 the position of the human figure in the new rectangular frame is (x - 0.1w, y, 1.2w, 1.1h). A minimal sketch of this crop follows.
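To make the expansion concrete, below is a minimal Python sketch of the step-S6 expansion and crop, assuming the picture is a NumPy array in OpenCV's (height, width, channels) layout; the clamping to the image bounds is an added safeguard, not something the text specifies.

```python
import numpy as np

def expand_and_crop(img: np.ndarray, box):
    """Expand a detected human-figure box to (x - 0.1w, y, 1.2w, 1.1h)
    as in step S6, then crop the picture to the expanded box."""
    x, y, w, h = box
    nx = x - 0.1 * w                           # shift left so the hands fit
    x1 = max(0, int(nx))
    y1 = max(0, int(y))
    x2 = min(img.shape[1], int(nx + 1.2 * w))  # widened to 1.2w
    y2 = min(img.shape[0], int(y + 1.1 * h))   # heightened to 1.1h
    return img[y1:y2, x1:x2]
```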
To identify in real time and with high accuracy whether someone is holding a knife, the invention mainly comprises two deep neural network models: the pedestrian detection model and the attention-based knife-holding classification model. Since the invention aims to detect whether a person holds a knife, the core idea is to analyze the attributes of every person appearing in the surveillance scene and judge whether that person holds a knife. After the video shot by the camera is decoded into pictures through OpenCV, each picture is first passed into the pedestrian detection model, which detects the people in the image in real time and obtains the location of each person. If the human-figure position detected by the model were output directly, it might contain only the main body of the figure; for example, with the arms stretched out, the palms might fall outside the rectangular frame. Therefore, each acquired human-figure position is expanded, enlarging the rectangular frame to ensure that the hand region is included; a human-figure picture is then cropped at the expanded position and passed into the subsequent attention-based knife-holding classification model. That model analyzes the input human-figure picture and extracts its various appearance features, the attention mechanism highlighting the feature information of regions such as the hand positions; based on the extracted features it analyzes whether the pedestrian holds a knife, and outputs an alarm if the figure in the picture holds one. The end-to-end flow is sketched below.
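The flow just described can be summarized in a short Python sketch. It reuses expand_and_crop from the previous example; detector and classifier are hypothetical callables wrapping the two trained models (their interfaces are assumptions, not the patent's API), and the stream URL is a placeholder.

```python
import cv2

def monitor(stream_url, detector, classifier, threshold=0.5):
    """Decode the surveillance stream, detect people, crop each person,
    and raise an alarm when the classifier reports a knife."""
    cap = cv2.VideoCapture(stream_url)        # video returned to the server (S3)
    while cap.isOpened():
        ok, frame = cap.read()                # decode the stream into pictures (S4)
        if not ok:
            break
        for x, y, w, h in detector(frame):    # human-figure positions (S5)
            crop = expand_and_crop(frame, (x, y, w, h))  # expand and crop (S6)
            if classifier(crop) > threshold:  # knife-holding score (S7)
                print("ALARM: pedestrian holding a knife")
    cap.release()
```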
In the invention, the pedestrian detection model analyzes the whole picture and detects the positions of all human figures; each position is expanded to ensure that the person's hands lie within the rectangular frame, yielding detailed position information for each figure and laying the foundation for the subsequent attention-based knife-holding classification model to judge whether the person holds a knife.
In the knife-holding detection method designed by the invention, human-figure positions are first detected in the picture; after the pedestrian detection model detects an individual, that individual alone is used as input, excluding environmental interference, greatly reducing the invalid information entering the attention-based knife-holding classification model and improving classification accuracy. The invention realizes reliable knife-holding detection, is convenient to implement without complex equipment, and with its streamlined model structure achieves fast, accurate real-time alarm monitoring.
Example 2
This embodiment is a description of training of the pedestrian detection model in embodiment 1.
The step S1 specifically comprises the following steps:
a. data preparation: shooting videos and/or collecting various public videos related to pedestrians;
b. labeling: decoding the videos into pictures, marking each human figure in a picture with a rectangular frame using annotation software, and obtaining the frame coordinates (x, y, w, h), which are the position of the human figure in the picture;
c. training: adopting either a full YOLOv3 network, or an EfficientNet-B0 backbone followed by a lightweight YOLOv3 detection head, as the network structure of the pedestrian detection model; then taking the pictures annotated in step b as the input of the network, which outputs predicted human-figure position information (x1, y1, w1, h1); with the annotated human-figure position as the prediction target, computing the difference between the prediction and the ground truth and updating the network weights with the back propagation algorithm, finally training a pedestrian detection model, used for identifying human figures, whose predictions substantially match the ground truth.
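As an illustration of step c, the following is a minimal PyTorch sketch of a single weight update; the model, criterion and optimizer objects are assumptions standing in for a YOLOv3-style detector and its loss, not the patent's exact implementation.

```python
import torch

def detector_train_step(model, images, gt_positions, criterion, optimizer):
    """One back-propagation update: predict human-figure positions
    (x1, y1, w1, h1), measure the difference to the annotated positions,
    and update the network weights."""
    optimizer.zero_grad()
    predictions = model(images)                  # predicted position information
    loss = criterion(predictions, gt_positions)  # prediction vs. ground truth
    loss.backward()                              # back propagation algorithm
    optimizer.step()                             # update the network weights
    return loss.item()
```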
The pedestrian detection model can be called a single-stage detection model. The image first undergoes feature extraction by the deep neural network; as the layers deepen, the extracted features come closer to high-level semantic features and the receptive field enlarges. With this network structure the model can extract target features of different sizes: shallow layers are better suited to extracting small-target features, deep layers to large-target features, and feature fusion enriches the feature information of every layer. This structure ensures that both the large and the small targets that may appear in different service scenarios are detected well.
Example 3
This example describes the training of the attention-based knife-holding classification model of embodiment 1.
The step S2 specifically comprises the following steps:
A. data preparation: shooting pictures of people holding knives and/or collecting various public pictures of people holding knives;
B. machine preprocessing: cropping the human-figure region out of each original picture obtained in step A with a rectangular frame, obtaining pictures whose main subject is a person;
C. labeling: annotators label each picture obtained in step B as to whether the person holds a knife; holding a knife is a positive sample and not holding a knife is a negative sample;
D. training: taking the positive and negative samples obtained in step C as the input of the network structure, which predicts whether each sample is a positive (knife-holding) sample; computing the difference between the prediction and the ground truth and updating the network weights with the back propagation algorithm, finally training an attention-based knife-holding classification model, used for identifying knife-holding behavior, whose predictions substantially match the ground truth.
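A minimal PyTorch sketch of step D follows, assuming a binary classifier that outputs one logit per cropped person picture; the data-loader interface and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

def train_knife_classifier(model, loader, epochs=10, lr=1e-4):
    """Train on cropped person pictures: label 1 = holding a knife
    (positive sample), label 0 = not holding one (negative sample)."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images).squeeze(1)  # one logit per picture
            loss = criterion(logits, labels.float())
            loss.backward()                    # back propagation
            optimizer.step()                   # update the network weights
```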
Example 4
In the invention, the network structure of the attention-based knife-holding classification model is specifically as follows:
It is an improvement and modification of the Inception V3 network structure, specifically: the network is cut after mixed6 and connected to a 1×1 convolution, then a 5×5 convolution, then a 512-dimensional fully connected layer, which finally produces the output; the structure is shown in FIG. 2. Concretely, the attention-based knife-holding classification model extracts features from the input picture through a backbone; the extracted features are processed by the feature extraction module, the output of the feature extraction module is processed by the attention module, and finally the Class module classifies whether the person in the picture is holding a knife.
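As one concrete reading of this architecture, the sketch below assembles the FIG. 2 pipeline in PyTorch, truncating torchvision's Inception V3 after its Mixed_6e block (the mixed6 of the text). The use of torchvision, the intermediate channel widths and the head layout are assumptions; the attention slot takes the channel/spatial module sketched in the next example and defaults to identity here.

```python
import torch.nn as nn
from torchvision import models

class KnifeHoldingNet(nn.Module):
    """FIG. 2 as a sketch: Inception V3 backbone cut after mixed6,
    a 1x1 conv, a 5x5 conv and a 512-d fully connected layer, with an
    attention module between feature extraction and classification."""

    def __init__(self, attention=None):
        super().__init__()
        m = models.inception_v3(weights=None, init_weights=True)
        self.backbone = nn.Sequential(           # layers up to and incl. Mixed_6e
            m.Conv2d_1a_3x3, m.Conv2d_2a_3x3, m.Conv2d_2b_3x3, m.maxpool1,
            m.Conv2d_3b_1x1, m.Conv2d_4a_3x3, m.maxpool2,
            m.Mixed_5b, m.Mixed_5c, m.Mixed_5d,
            m.Mixed_6a, m.Mixed_6b, m.Mixed_6c, m.Mixed_6d, m.Mixed_6e,
        )                                         # 768-channel feature map
        self.feature_extraction = nn.Sequential(  # 1x1 conv, then 5x5 conv
            nn.Conv2d(768, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 5, padding=2), nn.ReLU(inplace=True),
        )
        self.attention = attention if attention is not None else nn.Identity()
        self.classify = nn.Sequential(            # Class module
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, 512), nn.ReLU(inplace=True),  # 512-d FC layer
            nn.Linear(512, 1),                    # knife-holding logit
        )

    def forward(self, x):
        x = self.backbone(x)            # backbone features
        x = self.feature_extraction(x)  # amplify hand-region features
        x = self.attention(x)           # re-weight channels and positions
        return self.classify(x)
```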
With the existing Inception V3 network structure, the accuracy in judging whether a pedestrian holds a knife is low, at 87%; after the above improvement of the Inception V3 network structure, the accuracy increases markedly and can reach 95%.
The pictures output by the pedestrian detection model are input into the attention-based knife-holding classification model and processed by the improved Inception V3 network structure as follows. After a picture is input, features are extracted through the backbone. The backbone adopts a model pretrained on large-scale classification tasks, which greatly strengthens the robustness and convergence speed of the model. The extracted features are then processed by the feature extraction module, which is introduced for the following reason: a traditional classification task classifies the semantics of the whole picture, whereas this task classifies whether a person carries a knife, and the knife occupies only a small proportion of the global semantic information; the module is introduced to amplify the features of the hand, because the features of that part matter more. After the feature extraction module, the attention module is added. Its inspiration comes from biology, from the way the human eye works: attention differs across the parts of an observed picture, the eye paying more attention to important positions and little to positions far from them. The goal of the attention module is to make the model learn which region should receive more focus and to amplify the features of that region. Finally, the Class module classifies the features as knife-holding or not.
The attention module takes the features extracted by the backbone and the feature extraction module as input, computes the relative weights of the channels and of the spatial positions from the channel and spatial characteristics, and combines the computed weights with the feature map positions to control the importance of each channel and spatial position, passing the resulting output onward for classification. The module weakens unimportant features, such as the noise of the background, and strengthens the features of important parts, such as the hands of the human figure, greatly improving the accuracy of the subsequent classification result.
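A minimal sketch of such a channel-plus-spatial attention module follows, in the spirit of CBAM; the reduction ratio, the 7×7 spatial kernel and the pooling choices are assumptions, since the patent does not publish the exact layers.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Compute relative weights per channel and per spatial position and
    multiply them onto the feature map, strengthening important parts
    (e.g. the hands) and weakening background noise."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel = nn.Sequential(  # per-channel weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(  # per-position weights
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)                           # weigh channels
        stats = torch.cat([x.mean(dim=1, keepdim=True),   # channel-avg map
                           x.amax(dim=1, keepdim=True)],  # channel-max map
                          dim=1)
        return x * self.spatial(stats)                    # weigh positions
```

For instance, KnifeHoldingNet(attention=ChannelSpatialAttention(256)) wires this module into the previous sketch.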
First, compared with the data required by a knife target-detection network, the data required by the pedestrian detection model adopted here are readily obtained, which greatly reduces the difficulty and greatly improves the reliability of the detection result.
The classification network designed by the invention takes a single person as input, excluding environmental interference and greatly reducing the invalid information entering the network. The feature extraction module and the attention module added to the classification network enhance the features of key parts, so that the model focuses on important features, greatly improving classification accuracy.
The invention realizes reliable knife-holding detection, is convenient to implement without complex equipment, and with its streamlined model structure achieves real-time alarm monitoring.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution that can readily be conceived by those skilled in the art within the technical scope disclosed by the present invention should be covered by the protection scope of the present invention, which should therefore be subject to the protection scope defined by the claims.

Claims (3)

1. A knife-holding detection method based on image recognition, characterized by comprising the following steps:
s1, training a pedestrian detection model for identifying a human shape through a deep learning neural network structure;
s2, training an attention knife holding classification model for identifying knife holding behaviors through a deep learning neural network structure;
s3, transmitting the video shot by the monitoring camera in real time back to the background server;
s4, decoding the returned video stream data into pictures by the background server;
s5, inputting the picture obtained in the step S4 into a pedestrian detection model, and acquiring the humanoid position in the picture by the pedestrian detection model: marking the humanoid forms by using rectangular frames, and acquiring the position information (x, y, w, h) of each humanoid form according to the position and the size information of the rectangular frame marked with the humanoid form in the picture, wherein x and y are the left upper corner coordinates of the rectangular frame where the humanoid form is located, and w and h are the width and the height of the rectangular frame of the humanoid form;
s6, expanding the rectangular frame according to the position information (x, y, w and h) obtained in the step S5 to obtain a new rectangular frame, and then intercepting a picture according to the new rectangular frame to obtain a sampling picture;
s7, sending the sampling pictures into an attention-holding knife classifying model, extracting features from the input humanoid images by the attention-holding knife classifying model, identifying whether a pedestrian holds a knife or not based on the features, and giving an alarm if the pedestrian holds the knife;
the step S1 specifically comprises the following steps:
a. data preparation: shooting videos and/or collecting various public videos related to pedestrians;
b. labeling: decoding the videos into pictures, marking each human figure in a picture with a rectangular frame using annotation software, and obtaining the frame coordinates (x, y, w, h), which are the position of the human figure in the picture;
c. training: adopting either a full YOLOv3 network, or an EfficientNet-B0 backbone followed by a lightweight YOLOv3 detection head, as the network structure of the pedestrian detection model; then taking the pictures annotated in step b as the input of the network, which outputs predicted human-figure position information (x1, y1, w1, h1); with the annotated human-figure position as the prediction target, computing the difference between the prediction and the ground truth and updating the network weights with the back propagation algorithm, finally training a pedestrian detection model, used for identifying human figures, whose predictions substantially match the ground truth;
the step S2 specifically comprises the following steps:
A. data preparation: shooting pictures of people holding knives and/or collecting various public pictures of people holding knives;
B. machine preprocessing: cropping the human-figure region out of each original picture obtained in step A with a rectangular frame, obtaining pictures whose main subject is a person;
C. labeling: annotators label each picture obtained in step B as to whether the person holds a knife; holding a knife is a positive sample and not holding a knife is a negative sample;
D. training: taking the positive and negative samples obtained in step C as the input of the network structure, which predicts whether each sample is a positive (knife-holding) sample; computing the difference between the prediction and the ground truth and updating the network weights with the back propagation algorithm, finally training an attention-based knife-holding classification model, used for identifying knife-holding behavior, whose predictions substantially match the ground truth.
2. The knife-holding detection method based on image recognition according to claim 1, characterized in that: in step S6, the position of the human figure in the new rectangular frame is (x - 0.1w, y, 1.2w, 1.1h).
3. The knife-holding detection method based on image recognition according to claim 1, characterized in that: in step S7, the attention-based knife-holding classification model extracts features from the input picture through a backbone; the extracted features are processed by the feature extraction module, the output of the feature extraction module is processed by the attention module, and finally the Class module classifies whether the human figure in the picture is a person holding a knife.
CN202010616604.4A 2020-06-30 2020-06-30 Knife holding detection method based on image recognition Active CN111832450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010616604.4A CN111832450B (en) 2020-06-30 2020-06-30 Knife holding detection method based on image recognition

Publications (2)

Publication Number Publication Date
CN111832450A 2020-10-27
CN111832450B 2023-11-28

Family

ID=72899921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010616604.4A Active CN111832450B (en) 2020-06-30 2020-06-30 Knife holding detection method based on image recognition

Country Status (1)

Country Link
CN (1) CN111832450B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065482A (en) * 2021-04-09 2021-07-02 上海云从企业发展有限公司 Behavior detection method, system, computer device and medium based on image recognition
CN116475815B (en) * 2023-05-17 2023-12-15 广州里工实业有限公司 Automatic tool changing method, system and device of numerical control machine tool and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017190574A1 (en) * 2016-05-04 2017-11-09 北京大学深圳研究生院 Fast pedestrian detection method based on aggregation channel features
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN109886245A (en) * 2019-03-02 2019-06-14 山东大学 A kind of pedestrian detection recognition methods based on deep learning cascade neural network
CN110969130A (en) * 2019-12-03 2020-04-07 厦门瑞为信息技术有限公司 Driver dangerous action identification method and system based on YOLOV3

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Surveillance Video Pedestrian Detection and Tracking System Based on Machine Learning; Lyu Yunxiang, Ma Liantao, Xiong Hanbiao, Xu Yunan; Industry and Information Technology Education, No. 11; full text *

Also Published As

Publication number Publication date
CN111832450A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN108053427B (en) Improved multi-target tracking method, system and device based on KCF and Kalman
CN110516609B (en) Fire disaster video detection and early warning method based on image multi-feature fusion
CN108062349B (en) Video monitoring method and system based on video structured data and deep learning
CN108009473B (en) Video structuralization processing method, system and storage device based on target behavior attribute
CN108230594B (en) Method for generating alarm in video monitoring system
CA3094424C (en) Safety monitoring and early-warning method for man-machine interaction behavior of underground conveyor belt operator
KR101788269B1 (en) Method and apparatus for sensing innormal situation
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
CN105574501B (en) A kind of stream of people's video detecting analysis system
EP1553516B1 (en) Pedestrian extracting apparatus
WO2017122258A1 (en) Congestion-state-monitoring system
KR101116273B1 (en) Apparatus and Method for Traffic Accident Recognition
Kumar et al. Study of robust and intelligent surveillance in visible and multi-modal framework
CN113011367A (en) Abnormal behavior analysis method based on target track
CN111832450B (en) Knife holding detection method based on image recognition
CN111814638A (en) Security scene flame detection method based on deep learning
CN110852179B (en) Suspicious personnel invasion detection method based on video monitoring platform
WO2019220589A1 (en) Video analysis device, video analysis method, and program
CN111814510A (en) Detection method and device for remnant body
CN113111839A (en) Behavior recognition method and device, equipment and storage medium
CN115083088A (en) Railway perimeter intrusion early warning method
CN115620471A (en) Image identification security system based on big data screening
US20220189210A1 (en) Occlusion-Aware Prediction of Human Behavior
CN113920585A (en) Behavior recognition method and device, equipment and storage medium
KR101840042B1 (en) Multi-Imaginary Fence Line Setting Method and Trespassing Sensing System

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant