CN117593698A - Regional target intrusion detection method, device and system and storage medium

Info

Publication number: CN117593698A
Application number: CN202311651203.2A
Authority: CN
Inventors: 马忠丽; 刘甲甲; 周巧; 黄俊杰; 万毅; 张佳鹏; 张顺; 安若瑾; 张航天; 宋兴洋
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2023-12-04
Filing date: 2023-12-04
Publication date: 2024-02-23
Anticipated expiration: 2043-12-04
Also published as: CN117593698B

Abstract

The invention discloses a regional target intrusion detection method, device, system and storage medium. The method comprises the following steps: constructing a lightweight target detection model, wherein the lightweight target detection model comprises a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks are used for extracting and combining features of different scales of an input image to output a feature map, and the classifier comprises a global average pooling layer and a fully connected layer; training the lightweight target detection model with a data set to obtain a trained lightweight target detection model; and completing regional target intrusion detection based on the trained lightweight target detection model. The invention occupies fewer computing resources while offering high detection precision, good overall cost performance and good economic benefit.

Description

Regional target intrusion detection method, device and system and storage medium

Technical Field

The present invention relates to the field of target detection technologies, and in particular, to a method, an apparatus, a system, and a storage medium for detecting intrusion of a regional target.

Background

Traditional intrusion detection methods play an important role in the security field. They generally depend on manually designed features and rules, and as technology develops and application scenarios change they struggle to adapt to complex and variable intrusion scenes. This approach has the following limitations:

(1) Difficult feature design: manually designed features rely on expert knowledge and experience and may not generalize to complex intrusion behavior and scene changes.

(2) Limited representation capability: traditional methods have limited feature representation capability, have difficulty capturing complex targets and intrusion behaviors, and therefore offer limited detection performance.

Deep learning is a powerful machine learning technique that can achieve excellent performance in recognition and classification tasks by learning high-level features from data. However, conventional deep learning models typically have a large number of parameters and high computational complexity, and are therefore not suitable for resource-constrained devices or scenarios.

Disclosure of Invention

The present invention has been made to solve the above-mentioned problems in the prior art. There is therefore a need for a regional target intrusion detection method, device, system and storage medium. To reduce the number of parameters and the computational complexity of a network, researchers typically optimize CNNs, for example by using a lightweight network architecture (e.g., MobileNet, ShuffleNet) or by applying model pruning and compression techniques. In this way the number of parameters and the computational complexity of the model can be significantly reduced while high detection performance is maintained.

According to a first aspect of the present invention, there is provided a method for detecting regional target intrusion, the method comprising:

constructing a lightweight target detection model, wherein the lightweight target detection model comprises a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks are used for extracting and combining features of different scales of an input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, and the global average pooling layer is used for converting the feature map output by the last Block into a feature vector of fixed size, on which classification prediction is carried out through the fully connected layer;

training the lightweight target detection model by utilizing a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images marked with target information;

and finishing the regional target intrusion detection based on the trained lightweight target detection model.

Further, the Block is used to implement three operations: channel-by-channel grouped convolution, depthwise separable convolution and point convolution;

the channel-by-channel grouped convolution divides an input feature map into a plurality of channel groups and rearranges them in the channel dimension;

the depthwise separable convolution decomposes a standard convolution into two steps, a depthwise convolution and a pointwise convolution, wherein the depthwise convolution processes each channel of the input feature map independently in the channel dimension, and the pointwise convolution linearly combines the features of different channels;

the point convolution uses a 1x1 convolution kernel to adjust the number of channels and to linearly combine features.
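The channel rearrangement described above can be illustrated with a minimal NumPy sketch. It assumes the ShuffleNet-style shuffle (reshape into groups, swap the group axis with the per-group channel axis, flatten); the function name `channel_shuffle` and the (N, C, H, W) layout are illustrative choices, not taken from this document.

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Rearrange the channels of an (N, C, H, W) array across `groups` channel groups."""
    n, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    # Split C into (groups, channels_per_group), swap those two axes, flatten back.
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# Six channels labelled 0..5, split into two groups: [0, 1, 2] and [3, 4, 5].
x = np.arange(6, dtype=np.float32).reshape(1, 6, 1, 1)
shuffled = channel_shuffle(x, groups=2)
channel_order = shuffled[0, :, 0, 0].astype(int).tolist()
print(channel_order)  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, neighbouring channels come from different groups, so a following grouped convolution sees features from every group.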

Further, the detection head network structure comprises a dp_detect structure, wherein the dp_detect structure is used for decomposing a target detection task into two subtasks of target classification and target positioning, the target classification is responsible for predicting the class information of the target, and the target positioning is responsible for predicting the boundary box position information of the target.

Further, the image is divided into grids, each grid predicting a fixed number of bounding boxes; for each bounding box, the probabilities that the target belongs to different categories are predicted; for each grid, the model outputs a vector containing class probabilities, where each element corresponds to a particular class; the vector is converted into a probability distribution by a softmax function, the probability distribution representing the probability that the target belongs to each class;

the target classification adopts a classification network for predicting the class probability distribution of the target; the target positioning adopts a regression network, which predicts the coordinate offset of each bounding box relative to its grid cell and the width and height of the bounding box; the position information of the bounding box is normalized by a sigmoid function, so that the coordinate offsets and the box sizes lie between 0 and 1; the calculation formula for optimizing the preset target boxes is as follows:

wherein R(x) is the distance calculated from the initial cluster center point to each data sample x_i, and P(x) is the probability that each sample becomes the next cluster center.
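The formula itself is not reproduced in this text. The definitions of R(x) and P(x) resemble the seeding step of k-means++ clustering, which is commonly used to choose preset (anchor) boxes. Under that assumption, which this document does not confirm, the selection probability would take the following form, where n (a symbol introduced here) is the number of data samples:

```latex
P(x) = \frac{R(x)^{2}}{\sum_{i=1}^{n} R(x_i)^{2}}
```

Under this reading, samples far from the existing cluster centers are more likely to be chosen as the next center, which spreads the preset boxes over the range of target sizes in the data set.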

Further, the lightweight target detection model is trained by:

the pixel points of the original image are taken as two-dimensional coordinates, and a closed area is delimited according to these coordinates, i.e., a detection area is marked out, thereby determining the detection range of the electronic fence; once the detection range of the electronic fence is determined, the image within that range is kept unchanged, and the area outside the detection range is covered with a solid color;

target detection is performed on the electronic fence area delimited in the image, and target information is recorded, the target information comprising the category, probability and target box of each target;

the recorded target information is labeled on the original image.

Further, after the data set is obtained, data enhancement is performed on the data set to obtain a training data set, the data enhancement comprising random cropping, flipping, rotation and scaling, and the lightweight target detection model is trained with the training data set to obtain a trained lightweight target detection model.

Further, when the lightweight target detection model is trained with the data set, the training process and performance of the model are optimized by adjusting hyperparameters, wherein the hyperparameters comprise the learning rate, the batch size and the regularization coefficient.

According to a second aspect of the present invention, there is provided a regional target intrusion detection device, the device comprising: a model building module configured to build a lightweight target detection model, the lightweight target detection model comprising a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks are used for extracting and combining features of different scales of an input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, and the global average pooling layer is used for converting the feature map output by the last Block into a feature vector of fixed size, on which classification prediction is carried out through the fully connected layer; a model training module configured to train the lightweight target detection model with a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images labeled with target information; and a target detection module configured to complete regional target intrusion detection based on the trained lightweight target detection model.

According to a third aspect of the present invention, there is provided a regional target intrusion detection system, the system comprising: a memory for storing a computer program; a processor for executing the computer program to implement the method as described above.

According to a fourth aspect of the invention, there is provided a non-transitory computer readable storage medium storing instructions which, when executed by a processor, perform the method as described above.

The method, the device, the system and the storage medium for detecting the regional target intrusion according to the various schemes have at least the following technical effects:

1) The invention aims to combine the advantages of deep learning and the characteristics of a lightweight model, and the lightweight model can realize efficient and accurate target detection and intrusion detection on equipment with limited computing resources while keeping higher detection performance.

2) The invention emphasizes that the target detection and intrusion detection are carried out on the interested region in the image or video, can improve the detection efficiency and accuracy, reduces the processing of irrelevant regions, and makes the system more suitable for real-time application and resource-limited environments.

3) The regional target intrusion detection method based on the lightweight deep learning model has high application potential and can be widely applied in fields such as video monitoring, security systems and intelligent devices. By taking advantage of deep learning and the lightweight model, the method is expected to improve the performance and usability of intrusion detection systems.

Drawings

Fig. 1 shows a flowchart of a method for detecting regional target intrusion according to an embodiment of the present invention.

Fig. 2 shows a schematic structural diagram of a lightweight object detection model y_lst according to an embodiment of the present invention.

Fig. 3 shows a Block_Net network structure diagram according to an embodiment of the present invention.

Fig. 4 shows a schematic diagram of a coupled_detect network structure according to an embodiment of the present invention.

Fig. 5 shows a schematic diagram of region detection according to an embodiment of the present invention, where (a) represents image pixel coordinate points and (b) represents a custom detection region.

Fig. 6 shows a diagram of detection results of the PC-side y_lst algorithm according to an embodiment of the present invention.

Fig. 7 shows a model conversion flow diagram according to an embodiment of the present invention.

Fig. 8 shows a diagram of the detection result on a MatePad 11 mobile device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail below with reference to the drawings and specific embodiments so that those skilled in the art can better understand its technical solution. The embodiments are described in further detail below with reference to the drawings and specific examples, which are not to be taken as limiting the invention. Where the steps described herein do not necessarily depend on one another, the order in which they are described by way of example should not be construed as limiting; those skilled in the art will understand that the order of the steps may be adjusted as long as doing so does not disrupt the logic between them and prevent the overall process from being realized.

An embodiment of the present invention provides a regional target intrusion detection method. As shown in the flowchart of Fig. 1, the method includes steps S100 to S300.

Step S100: construct a lightweight target detection model.

A schematic structural diagram of the lightweight target detection model y_lst of this embodiment is shown in Fig. 2. The backbone network adopts gcs_block, and the detection head adopts dp_detect.

Specifically, the backbone network of this embodiment adopts the Block_Net network structure. This is a model optimization technique at the level of model structure design. It consists mainly of two parts, group convolution and channel rearrangement (Channel Shuffle), as in ShuffleNet, which is characterized by a small amount of computation and few parameters, high model efficiency and high running speed. Combining the two networks aims to unite the light weight and efficiency of ShuffleNet with the accuracy of a convolutional target detection algorithm.

As shown in Fig. 3, the Block_Net network structure is a lightweight deep convolutional neural network intended to achieve efficient image classification and target detection on devices with limited computing resources. It employs an operation called "Shuffle" to reduce the amount of computation and the number of parameters while maintaining model accuracy. The Block_Net network structure is briefly introduced below.

The Block_Net network structure consists of a backbone network and a classifier, wherein the backbone network adopts a series of Block blocks.

1. Block: the core of the Block_Net network structure is the Block, which consists of three key operations: channel-by-channel grouped convolution (Channel Shuffle), depthwise separable convolution, and point convolution.

(1) Channel-by-channel grouped convolution (Channel Shuffle): this operation divides the input feature map into multiple channel groups and reorders them in the channel dimension to facilitate information exchange and mixing. It helps to increase the diversity and richness of the feature representation.

(2) Depthwise separable convolution: this convolution layer decomposes the standard convolution into two steps, a depthwise convolution and a pointwise convolution, similar to the design in the MobileNet series. The depthwise convolution processes each channel of the input feature map independently in the channel dimension, while the pointwise convolution linearly combines the features of the different channels.

(3) Point convolution: a 1x1 convolution kernel is used to adjust the number of channels and to linearly combine features.

2. Backbone network: the backbone network of the Block_Net network structure consists of a series of stacked Block blocks and pooling layers. By stacking Blocks multiple times, it extracts and combines features of different scales to capture more comprehensive image information.

3. Classifier: the classifier of ShuffleNet consists of a global average pooling layer and a series of fully connected layers. The global average pooling layer converts the feature map of the last Block into a feature vector of fixed size, and classification prediction is then carried out through the fully connected layers.
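The classifier stage can be sketched in a few lines of NumPy. This is only an illustration of why global average pooling yields a fixed-size vector regardless of the spatial size of the feature map; the channel count, class count and random weights are placeholders, not values from this document.

```python
import numpy as np

def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """(N, C, H, W) -> (N, C): average each channel over its spatial extent."""
    return feature_map.mean(axis=(2, 3))

def fully_connected(vec: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear layer: (N, C) x (num_classes, C)^T + (num_classes,)."""
    return vec @ weight.T + bias

rng = np.random.default_rng(0)
channels, num_classes = 8, 3          # placeholder sizes
weight = rng.normal(size=(num_classes, channels))
bias = np.zeros(num_classes)

# Two different spatial sizes give the same pooled vector length.
for h, w in [(7, 7), (10, 13)]:
    fmap = rng.normal(size=(2, channels, h, w))
    pooled = global_average_pool(fmap)
    logits = fully_connected(pooled, weight, bias)
    print(fmap.shape, "->", pooled.shape, "->", logits.shape)
```

Because the pooled vector length depends only on the channel count, the same fully connected layer can be applied to inputs of different resolutions.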

Specifically, classification is achieved by the following method:

The image is divided into grids, each grid predicting a fixed number of bounding boxes. For each bounding box, the probabilities that the target belongs to different categories are predicted. For each grid, the model outputs a vector containing class probabilities, where each element corresponds to a particular class. This vector is converted into a probability distribution by a softmax function, the probability distribution representing the probability that the target belongs to each class.
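As a minimal sketch of the softmax step, the snippet below converts raw class scores into per-box probability distributions. The grid size, number of boxes per cell and number of classes are hypothetical values chosen only for illustration.

```python
import numpy as np

def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical layout: 2x2 grid, 3 boxes per cell, 4 classes.
rng = np.random.default_rng(0)
class_scores = rng.normal(size=(2, 2, 3, 4))

class_probs = softmax(class_scores)            # one distribution per box
predicted_class = class_probs.argmax(axis=-1)  # most likely class per box
print(class_probs.sum(axis=-1))                # every entry is 1 up to rounding
```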

ShuffleNet reduces the amount of computation and the number of parameters by using channel-by-channel grouped convolution and depthwise separable convolution, thereby achieving a lightweight network architecture. The channel-by-channel grouped convolution facilitates feature exchange and mixing, helping to improve feature diversity. Depthwise separable convolution provides an efficient way to process the input feature map.

In general, the ShuffleNet architecture achieves efficient image classification and target detection on resource-limited devices through Block blocks, channel-by-channel grouped convolution and depthwise separable convolution. By combining channel-by-channel grouped convolution with depthwise separable convolution, ShuffleNet reduces the amount of computation and the number of parameters while maintaining high accuracy. It strikes a good balance among model size, computational cost and accuracy, and is suitable for mobile devices, embedded systems and other scenarios with limited computing resources.
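The parameter saving of a depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights only (biases ignored) for one layer; the channel counts and kernel size are illustrative and do not come from this document.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixing the channels
    return depthwise + pointwise

c_in, c_out, k = 64, 128, 3    # illustrative layer sizes
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print(standard, separable, round(standard / separable, 2))  # 73728 8768 8.41
```

For this example layer the separable form needs roughly one eighth of the weights, which is the kind of reduction the paragraph above refers to.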

A decoupled head, dp_detect, is adopted as the detection head. The dp_detect structure decomposes the target detection task into two subtasks, target classification and target positioning: the former is responsible for predicting the class information of the target and the latter for predicting the bounding box position information of the target. The target classification branch generally adopts a classification network for predicting the class probability distribution of the target; the target positioning branch adopts a regression network for predicting the bounding box position of the target. The network structure of coupled_detect is shown in Fig. 4.

The dp_detect structure processes the two subtasks separately, so that the network can better learn the detail and spatial information of the target, improving both the detection precision and the positioning accuracy.
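The idea of two separate branches can be sketched as follows. This is not the dp_detect implementation: each branch is reduced to a single linear layer with random weights, and the decoding rule (sigmoid offsets added to the grid cell index, sigmoid width and height) is an assumption based on the sigmoid normalization described in the Disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decoupled_head(feature, w_cls, w_reg):
    """Run two independent linear branches on the same feature vector."""
    class_probs = softmax(w_cls @ feature)       # classification branch
    tx, ty, tw, th = sigmoid(w_reg @ feature)    # positioning branch, values in (0, 1)
    return class_probs, (tx, ty, tw, th)

def decode_box(offsets, col, row, grid_size):
    """Turn cell-relative offsets into a normalized (cx, cy, w, h) box (assumed rule)."""
    tx, ty, tw, th = offsets
    cx = (col + tx) / grid_size
    cy = (row + ty) / grid_size
    return cx, cy, tw, th

rng = np.random.default_rng(1)
feature = rng.normal(size=16)         # placeholder feature for one grid cell
w_cls = rng.normal(size=(3, 16))      # 3 hypothetical classes
w_reg = rng.normal(size=(4, 16))      # tx, ty, tw, th

class_probs, offsets = decoupled_head(feature, w_cls, w_reg)
box = decode_box(offsets, col=2, row=1, grid_size=4)
print(class_probs, box)
```

Keeping the branches separate means the classification weights and the regression weights are trained on their own objectives rather than sharing one output layer.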

The lightweight target detection model comprises a Block_Net network structure and a detection head network structure. The Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks are used for extracting and combining features of different scales of an input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, and the global average pooling layer is used for converting the feature map output by the last Block into a feature vector of fixed size, on which classification prediction is carried out through the fully connected layer.

Step S200: train the lightweight target detection model with a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images labeled with target information.

The collection scene of this embodiment is a reservoir, including the roads around the reservoir, checkpoints, important facilities, places where people tend to gather, and the water surface of the reservoir. Only the regions within the camera view where entry is not permitted are detected; that is, an irregular electronic fence is defined as the detection range. The training process of the lightweight target detection model is therefore as follows:

Step 1: determine the detection range of the electronic fence. This requires calculating the (x, y) coordinates of each vertex of the irregular range; the coordinates are expressed as ratios of the image width and height, with reference to the lower-left and upper-right corner vertices of the image, as shown in (a) of Fig. 5;

Step 2: after the defined detection range is obtained, keep the image within the range unchanged and cover the area outside the detection range with a solid color; the blue region shown in (b) of Fig. 5 is the detection region;

Step 3: perform detection on the custom electronic fence area of the image and record information such as the category, probability and target box of each target;

Step 4: label the target information recorded in Step 3 on the original image.
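The fence definition and masking of the first two steps can be sketched with NumPy alone. The sketch makes several assumptions not stated in this document: vertices are given as fractions of image width and height with the origin at the top-left (the usual array convention, whereas the figure refers to the lower-left and upper-right corners), the inside test is an even-odd ray-casting rule evaluated at pixel centres, and the fill color is black.

```python
import numpy as np

def polygon_mask(height, width, vertices):
    """Boolean (height, width) mask, True where the pixel centre lies inside the polygon."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y1 > py) != (y2 > py)
        x_at_py = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at_py)  # toggle on each edge crossing
    return inside

def apply_fence(image, norm_vertices, fill=(0, 0, 0)):
    """Keep pixels inside the fence and cover everything else with a solid color."""
    h, w = image.shape[:2]
    vertices = [(fx * w, fy * h) for fx, fy in norm_vertices]
    mask = polygon_mask(h, w, vertices)
    fenced = np.empty_like(image)
    fenced[:] = fill
    fenced[mask] = image[mask]
    return fenced, mask

image = np.full((8, 8, 3), 200, dtype=np.uint8)                   # dummy gray image
fence = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]  # hypothetical fence
fenced, mask = apply_fence(image, fence)
print(int(mask.sum()))  # 16 pixels remain visible
```

The masked image would then be passed to the detector (Step 3), and the recorded boxes drawn back on the unmasked original (Step 4).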

In some embodiments, the training method of the lightweight model refers to a training process that, for devices or application scenarios with limited resources, reduces the computational complexity, parameter count and storage footprint of the model through specific training strategies and techniques.

The training method of lightweight models generally includes the following techniques and strategies:

(1) Model structure design: design a lightweight model structure using, for example, depthwise separable convolution, lightweight inverted residual structures and channel-by-channel grouped convolution. These structures can reduce the amount of computation and the number of parameters while maintaining a certain accuracy.

(2) Data enhancement and preprocessing: data enhancement and preprocessing techniques increase the diversity and richness of the training data and improve the generalization capability of the model. Common data enhancement techniques include random cropping, flipping, rotation and scaling.

(3) Hyperparameter tuning: the training process and performance of the model are optimized by adjusting hyperparameters such as the learning rate, batch size and regularization coefficient.
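One of the listed enhancements, flipping, is sketched below to show that the labeled target boxes must be transformed together with the image. The box format (x_min, y_min, x_max, y_max) in pixels and the hyperparameter values are assumptions for illustration only; this document gives no concrete settings.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Flip an (H, W, C) image left-right and mirror its (x_min, y_min, x_max, y_max) boxes."""
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=float)
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]  # new x_min comes from the old x_max
    new_boxes[:, 2] = width - boxes[:, 0]  # new x_max comes from the old x_min
    return flipped, new_boxes

def random_flip(image, boxes, rng, p=0.5):
    """Apply the flip with probability p, otherwise return the inputs unchanged."""
    if rng.random() < p:
        return horizontal_flip(image, boxes)
    return image, np.asarray(boxes, dtype=float)

# Hypothetical hyperparameters; placeholder values, not taken from this document.
hyperparams = {"learning_rate": 0.01, "batch_size": 16, "weight_decay": 5e-4}

image = np.zeros((4, 10, 3), dtype=np.uint8)
image[:, 0] = 255                                   # bright column on the left edge
flipped, new_boxes = horizontal_flip(image, [[1, 0, 4, 3]])
print(new_boxes.tolist())  # [[6.0, 0.0, 9.0, 3.0]]
```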

By using the above training methods in combination, the lightweight model can be trained effectively to meet the requirements of devices or application scenarios with limited resources. These methods can improve the inference speed of the model, reduce the storage space it occupies and maintain its accuracy to a certain extent. The specific training method can be adjusted and selected according to factors such as task requirements, data set characteristics and model structure.

Finally, in step S300, area target intrusion detection is completed based on the trained lightweight target detection model.

The regional target intrusion detection method based on the lightweight deep learning model has several advantages. First, thanks to the feature extraction capability of the deep learning model, complex targets and intrusion behaviors can be captured, improving detection accuracy. Second, the use of a lightweight model allows detection to be performed on resource-constrained devices such as embedded systems and mobile devices. In addition, the method offers good real-time performance and adaptability, and can cope with rapidly changing intrusion scenes.

In summary, the area target intrusion detection method based on the lightweight deep learning model combines the advantages of deep learning and the high-efficiency performance of the lightweight model, and brings new possibility to the intrusion detection field. The method has wide application prospect, and can be used for improving the safety and reliability of the fields of video monitoring, safety systems, intelligent equipment and the like.

Through preprocessing the data set and improving the network model, and through training the data set, a network detection model with excellent detection performance can be obtained, and the model is used for final target detection, so that the types and the number of targets entering and exiting can be accurately detected.
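Counting the types and number of targets inside the fence can be sketched as a post-processing step on the detector output. The detection record layout (category, probability, box), the use of the box centre as the test point, the confidence threshold and the category names are all assumptions made for this illustration.

```python
from collections import Counter

def point_in_polygon(x, y, vertices):
    """Even-odd ray-casting test for a point against a polygon given as (x, y) vertices."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_intrusions(detections, fence, min_prob=0.5):
    """Count detections per category whose box centre lies inside the fence."""
    counts = Counter()
    for det in detections:
        x_min, y_min, x_max, y_max = det["box"]
        cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
        if det["probability"] >= min_prob and point_in_polygon(cx, cy, fence):
            counts[det["category"]] += 1
    return counts

fence = [(100, 100), (500, 100), (500, 400), (100, 400)]  # hypothetical fence, in pixels
detections = [
    {"category": "person", "probability": 0.91, "box": (120, 150, 180, 300)},   # inside
    {"category": "person", "probability": 0.88, "box": (520, 150, 580, 300)},   # outside
    {"category": "vehicle", "probability": 0.76, "box": (300, 200, 420, 280)},  # inside
    {"category": "person", "probability": 0.30, "box": (200, 200, 240, 300)},   # low confidence
]
intrusions = count_intrusions(detections, fence)
print(dict(intrusions))  # {'person': 1, 'vehicle': 1}
```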

The detection effect is shown in Fig. 6. Analysis and verification of the detection results show that they are usable: the final results contain no false detections or missed detections, the target types and numbers within the range are accurately detected according to the preset detection areas and target types, and the requirements of actual engineering can be met.

The .pt model obtained by training with the above method is converted into an ncnn model; the model conversion steps are shown in Fig. 7. The model is then deployed through Android Studio to the HarmonyOS system of a MatePad 11 terminal, a mobile device that uses a Qualcomm Snapdragon 865 processor. As can be seen from Fig. 8, when the method is deployed on a mobile device with low computing power, the targets in the image are still detected accurately. Detecting one image takes about 90 ms, about 50 ms faster than the original YOLOv5 algorithm on the same mobile device, which demonstrates the light weight of the model.
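The reported timings can be put in perspective with a short calculation. It assumes that the 50 ms figure means 50 ms less time per image than the original YOLOv5, which is the most natural reading but is not stated explicitly; the symbols t_ours and t_YOLOv5 are introduced here.

```latex
t_{\text{ours}} \approx 90\ \text{ms}, \qquad
t_{\text{YOLOv5}} \approx 90\ \text{ms} + 50\ \text{ms} = 140\ \text{ms}

\text{speed-up} \approx \frac{140}{90} \approx 1.56, \qquad
\text{throughput} \approx \frac{1000}{90} \approx 11.1\ \text{images/s}
\quad \text{versus} \quad \frac{1000}{140} \approx 7.1\ \text{images/s}
```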

Furthermore, although exemplary embodiments have been described herein, the scope thereof includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of the various embodiments across), adaptations or alterations as pertains to the present invention. Elements in the claims are to be construed broadly based on the language employed in the claims and are not limited to examples described in the present specification or during the practice of the present application, which examples are to be construed as non-exclusive. It is intended, therefore, that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.

Claims

1. A method for detecting regional target intrusion, the method comprising:

2. The method of claim 1, wherein the Block is used to implement three operations: channel-by-channel grouped convolution, depthwise separable convolution, and point convolution;

3. The method of claim 1, wherein the detection head network structure includes a dp_detect structure for decomposing a target detection task into two subtasks, target classification and target positioning, the target classification being responsible for predicting class information of a target and the target positioning being responsible for predicting bounding box position information of the target.

4. A method according to claim 3, characterized in that the image is divided into grids, each grid predicting a fixed number of bounding boxes; for each bounding box, the probabilities that the targets belong to different categories are predicted. For each grid, the model outputs a vector containing class probabilities, where each element corresponds to a particular class; converting the vector containing the class probabilities into a probability distribution by using a softmax function, wherein the probability distribution represents the probability that the target belongs to each class;

5. The method of claim 1, wherein the lightweight object detection model is trained by:

according to the pixel points of the original image as two-dimensional coordinates, and according to the coordinates, a closed area is defined, namely a detection area is divided, so that the detection range of the electronic fence can be determined;

after the detection range of the electronic fence is determined, the image in the corresponding range is kept unchanged, and the outside of the detection range is covered by pure color;

and labeling the recorded target information on the original image.

6. The method of claim 5, wherein after the data set is acquired, data enhancement is performed on the data set to obtain a training data set, the data enhancement including random cropping, flipping, rotation, and scaling, and the lightweight target detection model is trained with the training data set to obtain a trained lightweight target detection model.

7. The method of claim 1, wherein, while the lightweight target detection model is trained with the data set, the training process and performance of the model are optimized by adjusting hyperparameters including the learning rate, batch size, and regularization coefficient.

8. An area target intrusion detection device, the device comprising:

a model building module configured to build a lightweight target detection model, the lightweight target detection model comprising a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks are used for extracting and combining features of different scales of an input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, and the global average pooling layer is used for converting the feature map output by the last Block into a feature vector of fixed size, on which classification prediction is carried out through the fully connected layer;

the model training module is configured to train the lightweight target detection model by utilizing a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images marked with target information;

and the target detection module is configured to complete regional target intrusion detection based on the trained lightweight target detection model.

9. A regional target intrusion detection system, the system comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the method of any one of claims 1 to 7.

10. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, perform the method of any one of claims 1 to 7.