CN117593698A - Regional target intrusion detection method, device and system and storage medium - Google Patents
Regional target intrusion detection method, device and system and storage medium
- Publication number
- CN117593698A (application number CN202311651203.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- lightweight
- detection model
- model
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a regional target intrusion detection method, device, system and storage medium, wherein the method comprises the following steps: constructing a lightweight target detection model, wherein the lightweight target detection model comprises a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks extract and combine features of different scales from an input image to output a feature map, and the classifier comprises a global average pooling layer and a fully connected layer; and training the lightweight target detection model with a data set to obtain a trained lightweight target detection model, and completing regional target intrusion detection based on the trained lightweight target detection model. The invention occupies fewer computing resources while achieving high detection accuracy, offering strong overall cost-effectiveness and good economic benefit.
Description
Technical Field
The present invention relates to the field of target detection technologies, and in particular, to a method, an apparatus, a system, and a storage medium for detecting intrusion of a regional target.
Background
Traditional intrusion detection methods play an important role in the security field. They generally depend on manually designed features and rules, but as technology develops and application scenarios change, they struggle to adapt to complex and variable intrusion scenes. This approach has some limitations:
(1) Feature design is difficult: manually designed features rely on expert knowledge and experience, and may not generalize to complex intrusion behaviors and scene changes.
(2) Limited representation capability: traditional methods have limited feature representation capability, struggle to capture complex targets and intrusion behaviors, and thus offer limited detection performance.
Deep learning is a powerful machine learning technique that achieves excellent performance in recognition and classification tasks by learning high-level features from data. However, conventional deep learning models typically have a large number of parameters and high computational complexity, and are not suitable for resource-constrained devices or scenarios.
Disclosure of Invention
The present invention has been made to solve the above-mentioned problems in the prior art. There is therefore a need for a regional target intrusion detection method, apparatus, system and storage medium in which the CNN is optimized to reduce its number of parameters and computational complexity. By adopting a lightweight network architecture (e.g., MobileNet or ShuffleNet) or applying model pruning and compression techniques, the number of parameters and the computational complexity of the model can be significantly reduced while maintaining high detection performance.
According to a first aspect of the present invention, there is provided a method for detecting regional target intrusion, the method comprising:
constructing a lightweight target detection model, wherein the lightweight target detection model comprises a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks extract and combine features of different scales from an input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, and the global average pooling layer converts the feature map output by the last Block into a feature vector of fixed size, on which the fully connected layer performs classification prediction;
training the lightweight target detection model by utilizing a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images marked with target information;
and completing the regional target intrusion detection based on the trained lightweight target detection model.
Further, each Block implements three operations: channel-by-channel grouping convolution, depthwise separable convolution, and point convolution;
the channel-by-channel grouping convolution divides the input feature map into a plurality of channel groups and rearranges them in the channel dimension;
the depthwise separable convolution decomposes a standard convolution into two steps, a depthwise convolution and a pointwise convolution; the depthwise convolution processes each channel of the input feature map independently in the channel dimension, and the pointwise convolution linearly combines the features of different channels;
the point convolution uses a 1x1 convolution kernel to adjust the number of channels and linearly combine features.
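As a minimal illustration of the channel rearrangement described above, the grouping and reordering can be written as a reshape/transpose/reshape over the channel axis (a NumPy sketch; the tensor sizes and group count are arbitrary choices, not values from the patent):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Rearrange channels of an NCHW feature map across groups.

    Splits the C channels into `groups` groups, then interleaves them so
    that subsequent grouped convolutions see channels from every group.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by group count"
    # (N, groups, C/groups, H, W) -> swap the two channel axes -> flatten back
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# 1 image, 6 channels tagged 0..5, shuffled in 3 groups of 2
feat = np.arange(6).reshape(1, 6, 1, 1) * np.ones((1, 6, 2, 2))
shuffled = channel_shuffle(feat, groups=3)
print(shuffled[0, :, 0, 0])  # channel order becomes [0, 2, 4, 1, 3, 5]
```

After the shuffle, channels that originated in different groups sit next to each other, which is what lets the following grouped convolution mix information between groups.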
Further, the detection head network structure comprises a dp_detect structure, which decomposes the target detection task into two subtasks, target classification and target positioning: target classification predicts the class information of the target, and target positioning predicts the bounding-box position information of the target.
Further, the image is divided into grids, each grid predicting a fixed number of bounding boxes; for each bounding box, the probabilities that the target belongs to different categories are predicted. For each grid, the model outputs a vector containing class probabilities, where each element corresponds to a particular class; a softmax function converts this vector into a probability distribution representing the probability that the target belongs to each class;
the target classification adopts a classification network to predict the class probability distribution of the target; the target positioning adopts a regression network to predict the coordinate offset of each bounding box relative to its grid cell and the width and height of the bounding box; the bounding-box position information is normalized with a sigmoid function, ensuring that the coordinate offsets and box sizes lie between 0 and 1, and the calculation formula for the optimization of the preset target box is as follows:
P(x) = R(x)² / Σᵢ R(xᵢ)²
wherein R(x) is the distance from the initial cluster center point to each data sample xᵢ, and P(x) is the probability that each sample becomes the next cluster center.
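The center-selection rule implied by R(x) and P(x) above can be sketched as k-means++-style seeding (a NumPy illustration; the 1-D sample data and the squared-distance weighting are assumptions, since the patent does not reproduce the exact formula):

```python
import numpy as np

def next_center_probs(samples, centers):
    """P(x): probability of each sample becoming the next cluster center.

    R(x) is the distance from each sample to its nearest existing center;
    samples far from all current centers get proportionally higher weight.
    """
    samples = np.asarray(samples, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # R(x): distance to the nearest already-chosen center
    r = np.min(np.abs(samples[:, None] - centers[None, :]), axis=1)
    return r ** 2 / np.sum(r ** 2)

probs = next_center_probs(samples=[0.0, 1.0, 5.0, 6.0], centers=[0.0])
print(probs)  # the farthest samples dominate the distribution
```

A sample that coincides with an existing center gets probability 0, while distant samples are preferred, which spreads the preset boxes over the data.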
Further, the lightweight target detection model is trained by:
according to the pixel points of the original image as two-dimensional coordinates, and according to the coordinates, a closed area is defined, namely a detection area is divided, so that the detection range of the electronic fence can be determined; after the detection range of the electronic fence, the image of the corresponding range is kept unchanged, and the outside of the detection range is covered by pure color;
detecting an electronic fence area defined by the image, and recording target information, wherein the target information comprises the category, probability and target frame of a target;
and labeling the recorded target information on the original image.
Further, after the data set is obtained, data enhancement is performed on the data set to obtain a training data set; the data enhancement comprises random cropping, flipping, rotation and scaling, and the training data set is used to train the lightweight target detection model to obtain a trained lightweight target detection model.
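The augmentation operations listed above can be sketched as simple array transforms (a NumPy illustration covering only a random horizontal flip and a random crop; the function name and sizes are assumptions):

```python
import numpy as np

def augment(image, crop_size, rng):
    """Randomly flip an (H, W) image horizontally, then take a random crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    h, w = image.shape
    ch, cw = crop_size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return image[top:top + ch, left:left + cw]

rng = np.random.default_rng(42)
img = np.arange(64).reshape(8, 8)
patch = augment(img, crop_size=(5, 5), rng=rng)
print(patch.shape)  # (5, 5)
```

Each call yields a different view of the same labeled image, which is what increases the diversity of the training data without new annotation effort.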
Further, when the lightweight target detection model is trained with the data set, hyperparameters are adjusted to optimize the training process and performance of the model, wherein the hyperparameters include the learning rate, batch size and regularization coefficient.
According to a second aspect of the present invention, there is provided a regional target intrusion detection apparatus, the apparatus comprising: a model building module configured to build a lightweight target detection model, the lightweight target detection model comprising a Block_Net network structure and a detection head network structure, the Block_Net network structure comprising a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks extract and combine features of different scales from an input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, and the global average pooling layer converts the feature map output by the last Block into a feature vector of fixed size, on which the fully connected layer performs classification prediction; a model training module configured to train the lightweight target detection model with a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images marked with target information; and a target detection module configured to complete regional target intrusion detection based on the trained lightweight target detection model.
According to a third aspect of the present invention, there is provided a regional target intrusion detection system, the system comprising: a memory for storing a computer program; a processor for executing the computer program to implement the method as described above.
According to a fourth aspect of the invention, there is provided a non-transitory computer readable storage medium storing instructions which, when executed by a processor, perform the method as described above.
The regional target intrusion detection method, device, system and storage medium according to the above schemes have at least the following technical effects:
1) The invention combines the advantages of deep learning with the characteristics of a lightweight model: while maintaining high detection performance, the lightweight model can realize efficient and accurate target detection and intrusion detection on devices with limited computing resources.
2) The invention emphasizes target detection and intrusion detection on the region of interest in an image or video, which improves detection efficiency and accuracy, reduces processing of irrelevant regions, and makes the system more suitable for real-time applications and resource-limited environments.
3) The regional target intrusion detection method based on a lightweight deep learning model has high application potential and can be widely applied in video monitoring, security systems, intelligent devices and other fields. By leveraging the representational capability of deep learning and the efficiency of the lightweight model, the method is expected to improve the performance and usability of intrusion detection systems.
Drawings
Fig. 1 shows a flowchart of a method for detecting regional target intrusion according to an embodiment of the present invention.
Fig. 2 shows a schematic structural diagram of a lightweight object detection model y_lst according to an embodiment of the present invention.
Fig. 3 shows a block_net network structure diagram according to an embodiment of the present invention.
Fig. 4 shows a schematic diagram of a coupled_detect network structure according to an embodiment of the present invention.
Fig. 5 shows a schematic diagram of region detection according to an embodiment of the present invention, where (a) represents image pixel coordinate points and (b) represents a custom detection region.
Fig. 6 shows a diagram of detection results of the PC-side y_lst algorithm according to an embodiment of the present invention.
FIG. 7 shows a model conversion flow diagram according to an embodiment of the invention.
Fig. 8 shows a diagram of a detection result on a MatePad 11 mobile device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and detailed description to enable those skilled in the art to better understand the technical scheme of the present invention. Embodiments of the present invention are described in further detail below with reference to the drawings and specific examples, but not by way of limitation. Where steps have no necessary relationship to one another, the order in which they are described should not be construed as limiting; those skilled in the art will understand that such steps may be reordered without disrupting the overall logic of the process.
Fig. 1 shows a flowchart of a method for detecting an intrusion of a regional target, and an embodiment of the present invention provides a method for detecting an intrusion of a regional target, as shown in fig. 1, where the method includes steps S100-S300.
And S100, constructing a lightweight target detection model.
The schematic structure of the lightweight target detection model y_lst in this embodiment is shown in fig. 2; its backbone network adopts gcs_block and its detection head adopts dp_detect.
Specifically, the backbone network of this embodiment adopts a block_net network structure, a model structure design and optimization technique that mainly comprises two parts: group convolution and channel rearrangement (Channel Shuffle). The ShuffleNet design has a small computation and parameter count, high model efficiency and fast operation. "Combining" the two networks aims to unite the light weight and efficiency of ShuffleNet with the accuracy of a convolutional target detection network.
As shown in fig. 3, the block_net network structure is a lightweight deep convolutional neural network, and aims to realize efficient image classification and target detection on devices with limited computing resources. It employs an operation called "Shuffle" to reduce the amount of computation and the number of parameters while maintaining model accuracy. The following is a brief introduction of the block_net network architecture:
the block_Net network structure consists of a backbone network and a classifier, wherein the backbone network adopts a series of Block blocks.
1. Block: the core of the Block_Net network architecture is the Block, which consists of three key operations: channel-by-channel grouping convolution (Channel Shuffle), depthwise separable convolution, and point convolution.
(1) Channel-by-channel grouping convolution (Channel Shuffle): this operation divides the input feature map into multiple channel groups and reorders them in the channel dimension to facilitate information exchange and mixing. It helps increase the diversity and richness of the feature representation.
(2) Depthwise separable convolution: this convolution layer decomposes the standard convolution into two steps, a depthwise convolution and a pointwise convolution, similar to the design in the MobileNet series. The depthwise convolution processes each channel of the input feature map independently in the channel dimension, while the pointwise convolution linearly combines the features of the different channels.
(3) Point convolution: a 1x1 convolution kernel is used to adjust the number of channels and linearly combine features.
2. Backbone network: the backbone network of the Block_Net network architecture consists of a series of stacked Block blocks and pooling layers. It extracts and combines features of different scales by stacking blocks multiple times to capture more comprehensive image information.
3. Classifier: the classifier consists of a global average pooling layer and a series of fully connected layers. The global average pooling layer converts the feature map of the last Block into a feature vector of fixed size, and classification prediction is then carried out through the fully connected layers.
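The global average pooling step can be sketched as averaging each channel's spatial map down to a single number, so inputs of different resolutions yield feature vectors of the same fixed size (a NumPy sketch; shapes are illustrative):

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse an (N, C, H, W) feature map to (N, C) feature vectors."""
    return feature_map.mean(axis=(2, 3))

# Two inputs of different spatial size still yield length-4 vectors
small = global_average_pool(np.ones((1, 4, 7, 7)))
large = global_average_pool(np.ones((1, 4, 14, 14)))
print(small.shape, large.shape)  # (1, 4) (1, 4)
```

Because the output length depends only on the channel count, the fully connected classifier behind it needs no change when the input resolution changes.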
Specifically, classification is achieved by the following method:
The image is divided into grids, each grid predicting a fixed number of bounding boxes. For each bounding box, the probabilities that the target belongs to different categories are predicted: for each grid, the model outputs a vector containing class probabilities, where each element corresponds to a particular class, and a softmax function converts this vector into a probability distribution representing the probability that the target belongs to each class.
The target classification adopts a classification network to predict the class probability distribution of the target; the target positioning adopts a regression network to predict the coordinate offset of each bounding box relative to its grid cell and the width and height of the bounding box. The bounding-box position information is normalized with a sigmoid function, ensuring that the coordinate offsets and box sizes lie between 0 and 1, and the calculation formula for the optimization of the preset target box is as follows:
P(x) = R(x)² / Σᵢ R(xᵢ)²
wherein R(x) is the distance from the initial cluster center point to each data sample xᵢ, and P(x) is the probability that each sample becomes the next cluster center.
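The softmax and sigmoid normalizations described above can be sketched as follows (a NumPy illustration; the raw logit and offset values are arbitrary):

```python
import numpy as np

def softmax(logits):
    """Convert a raw class-score vector into a probability distribution."""
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

def sigmoid(x):
    """Squash box coordinate offsets and sizes into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-x))

class_probs = softmax(np.array([2.0, 1.0, 0.1]))       # sums to 1
box_offsets = sigmoid(np.array([-1.5, 0.0, 3.2, 0.7]))  # each in (0, 1)
print(class_probs.sum())
```

Softmax keeps the class scores comparable as probabilities, while sigmoid guarantees the predicted offsets stay inside the grid cell, as the text requires.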
The network reduces the amount of computation and the number of parameters by using channel-by-channel grouping convolution and depthwise separable convolution to achieve a lightweight architecture. The channel-by-channel grouping convolution facilitates feature exchange and mixing, helping to improve feature diversity, while the depthwise separable convolution provides an efficient way to process the input feature map.
In general, the ShuffleNet-style architecture operates through Block blocks, channel-by-channel grouping convolution, and depthwise separable convolution for efficient image classification and object detection on resource-limited devices. By combining these operations it reduces the amount of computation and the number of parameters while maintaining high accuracy, achieving a good balance among model size, computational cost and accuracy, and is suitable for mobile devices, embedded systems and scenarios with limited computing resources.
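The parameter savings claimed above can be illustrated with a back-of-the-envelope comparison of a standard convolution against its depthwise-separable factorization (a hedged sketch; the channel counts and kernel size are arbitrary, and bias terms are ignored):

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per channel) + 1x1 pointwise conv."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out  # the 1x1 "point convolution"
    return depthwise + pointwise

std = conv_params(64, 128, 3)                 # 73728
sep = depthwise_separable_params(64, 128, 3)  # 576 + 8192 = 8768
print(std, sep, round(std / sep, 1))          # roughly 8.4x fewer parameters
```

The ratio grows with kernel size and output channels, which is why the factorization pays off most in the wider, deeper stages of the backbone.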
The dp_detect decoupled head is adopted as the detection head. The dp_detect structure breaks the object detection task down into two sub-tasks, target classification and target positioning: the former predicts class information of the object and the latter predicts bounding-box position information of the object. The target classification branch generally adopts a classification network to predict the class probability distribution of the target; the target positioning branch adopts a regression network to predict the bounding-box position of the target. The network structure of the decoupled detection head is shown in fig. 4.
The DP_detect structure processes the two subtasks separately, so the network can better learn the detail and spatial information of the target, improving detection precision and positioning accuracy.
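A minimal sketch of this decoupling, with independent classification and localization branches over a shared feature vector, might look as follows (NumPy; the layer sizes and random weights are illustrative placeholders, not the actual dp_detect parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, BOX_DIMS, FEAT = 3, 4, 16

# Two independent linear branches over the same shared backbone feature
w_cls = rng.normal(size=(FEAT, NUM_CLASSES))
w_box = rng.normal(size=(FEAT, BOX_DIMS))

def decoupled_head(feature):
    """Return class probabilities and a bounding-box prediction separately."""
    cls_logits = feature @ w_cls
    e = np.exp(cls_logits - cls_logits.max())   # stable softmax
    cls_probs = e / e.sum()
    box = 1.0 / (1.0 + np.exp(-(feature @ w_box)))  # sigmoid-normalized box
    return cls_probs, box

cls_probs, box = decoupled_head(rng.normal(size=FEAT))
print(cls_probs.shape, box.shape)  # (3,) (4,)
```

Because each branch has its own weights, classification and localization gradients do not compete inside one shared output layer, which is the motivation the text gives for decoupling.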
The lightweight target detection model comprises a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Block blocks extract and combine features of different scales from an input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, and the global average pooling layer converts the feature map output by the last Block into a feature vector of fixed size, on which the fully connected layer performs classification prediction.
step 200, training the light-weight target detection model by using a data set to obtain a trained light-weight target detection model, wherein the data set comprises a plurality of original images marked with target information.
The collection scene of this embodiment is a reservoir, including the road around the reservoir, checkpoints, important facilities, places where people easily gather, and the reservoir's water surface; that is, detection is performed only in regions of the camera view where entry is not allowed, i.e., an irregular electronic fence defines the detection range. The training process of the lightweight target detection model is therefore as follows:
step 1: determining the detection range of the electronic fence, wherein the process needs to calculate the (x, y) coordinates of each vertex of the irregular range, and the ratio of the width to the height in the whole occupied image is represented by the coordinates of the vertices of the lower left corner and the upper right corner of the image as shown in (a) in fig. 5;
step 2: after obtaining the defined detection range, keeping the image of the range unchanged, and covering the outside of the detection range with pure colors, wherein a blue region is a detection region as shown in (b) of fig. 5;
step 3: detecting the self-defined electronic fence area of the image, and recording information such as the category, probability, target frame and the like of the target;
step 4: and (3) marking the target information recorded in the step (3) on the original image.
In some embodiments, the training method of a lightweight model refers to a training process that, through specific training strategies and techniques, reduces the computational complexity, parameter count and storage footprint of the model for resource-limited devices or application scenarios.
The training method of lightweight models generally includes the following techniques and strategies:
(1) And (3) model structural design: and designing a lightweight model structure, such as a depth separable convolution, a lightweight inverse residual structure, a channel-by-channel grouping convolution and the like. These structures can reduce the amount of calculation and the number of parameters while maintaining a certain accuracy.
(2) Data enhancement and preprocessing: through data enhancement and preprocessing technology, the diversity and the richness of training data are increased, and the generalization capability of the model is improved. Common data enhancement techniques include random cropping, flipping, rotation, scaling, and the like.
(3) Super parameter tuning: the training process and performance of the model are optimized by adjusting super parameters such as learning rate, batch size, regularization coefficient and the like.
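The parameter savings promised by the depthwise separable convolution in (1) follow from simple counting: a standard k x k convolution couples every input channel to every output channel, while the separable form pays for one k x k kernel per input channel plus a 1x1 mixing step. A back-of-envelope sketch (the channel and kernel sizes are illustrative, not the patent's):

```python
def standard_conv_params(c_in, c_out, k):
    # One k x k kernel per (input channel, output channel) pair
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k kernel per input channel;
    # pointwise: a 1x1 convolution mixing c_in channels into c_out
    return c_in * k * k + c_in * c_out

std = standard_conv_params(128, 128, 3)        # 147,456 parameters
sep = depthwise_separable_params(128, 128, 3)  # 17,536 parameters
print(f"reduction: {std / sep:.1f}x")          # roughly 8x fewer parameters
```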
By combining these methods, a lightweight model can be trained effectively to meet the needs of resource-constrained devices or applications: inference speeds up and storage shrinks while accuracy is largely preserved. The specific training method can be selected and adjusted according to task requirements, data set characteristics, model structure, and other factors.
Finally, in step S300, area target intrusion detection is completed based on the trained lightweight target detection model.
The area target intrusion detection method based on a lightweight deep learning model has several advantages. First, the feature extraction capability of the deep learning model captures complex targets and intrusion behaviors, improving detection accuracy. Second, the lightweight model allows detection to run on resource-constrained devices such as embedded systems and mobile devices. In addition, the method offers strong real-time performance and adaptability, and can cope with rapidly changing intrusion scenes.
In summary, the method combines the advantages of deep learning with the efficiency of a lightweight model, bringing new possibilities to the intrusion detection field. It has broad application prospects and can improve the safety and reliability of video monitoring, security systems, intelligent equipment, and related fields.
By preprocessing the data set, improving the network model, and training on the data set, a detection model with excellent performance is obtained; using this model for final target detection, the categories and numbers of targets entering and leaving the area can be detected accurately.
The detection effect is shown in fig. 6. Analysis and verification of the detection results show no false detections or missed targets: the method accurately detects the target categories and counts within the preset detection areas and target types, meeting the requirements of practical engineering.
The .pt model obtained by training is converted into an ncnn model; the conversion steps are shown in fig. 7. The model is deployed via Android Studio to the HarmonyOS system of a MatePad 11 terminal equipped with a Qualcomm Snapdragon 865 processor. As can be seen from fig. 8, when deployed on this low-compute mobile device the method accurately detects the targets in the image, taking about 90 ms per image, roughly 50 ms faster than the original YOLOv5 algorithm on the same mobile terminal, realizing the lightweighting of the model.
Furthermore, although exemplary embodiments have been described herein, the scope thereof includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations, or alterations pertaining to the present invention. The elements in the claims are to be construed broadly based on the language employed in the claims and are not limited to examples described in the present specification or during the prosecution of the present application, which examples are to be construed as non-exclusive. It is intended, therefore, that the specification and examples be considered as exemplary only, with the true scope and spirit being indicated by the following claims and their full scope of equivalents.
Claims (10)
1. A method for detecting regional target intrusion, the method comprising:
constructing a lightweight target detection model, wherein the lightweight target detection model comprises a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Blocks extract and combine features of different scales from the input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, the global average pooling layer converts the feature map output by the last Block into a fixed-size feature vector, and the fully connected layer performs classification prediction;
training the lightweight target detection model by utilizing a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images marked with target information;
and finishing the regional target intrusion detection based on the trained lightweight target detection model.
2. The method of claim 1, wherein the Block implements three operations: channel-by-channel grouped convolution, depthwise separable convolution, and point convolution;
the channel-by-channel grouped convolution divides the input feature map into a plurality of channel groups and rearranges them in the channel dimension;
the depthwise separable convolution decomposes a standard convolution into two steps, a depthwise convolution and a pointwise convolution, wherein the depthwise convolution processes each channel of the input feature map independently in the channel dimension, and the pointwise convolution linearly combines the features of different channels;
the point convolution uses a 1x1 convolution kernel to adjust the number of channels and linearly combine features.
3. The method of claim 1, wherein the detection head network structure includes a dp_detect structure for decomposing the target detection task into two sub-tasks: target classification, responsible for predicting the class information of the target, and target localization, responsible for predicting the bounding-box position information of the target.
4. A method according to claim 3, characterized in that the image is divided into grids, each grid predicting a fixed number of bounding boxes; for each bounding box, the probabilities that the target belongs to different categories are predicted; for each grid, the model outputs a vector of class probabilities, where each element corresponds to a particular class; a softmax function converts this vector into a probability distribution representing the probability that the target belongs to each class;
the target classification adopts a classification network for predicting the class probability distribution of the target; the target localization adopts a regression network, which predicts the coordinate offset of each bounding box relative to its grid cell and the width and height of the bounding box; the position information of the bounding box is normalized by a sigmoid function so that the coordinate offsets and box sizes lie between 0 and 1; the calculation formula for the optimization of the preset target boxes is:

P(x) = R(x)^2 / Σ R(x_i)^2

wherein R(x) is the distance from the initial cluster center to each data sample x_i, and P(x) is the probability that each sample becomes the next cluster center.
5. The method of claim 1, wherein the lightweight object detection model is trained by:
determining the detection range of the electronic fence by taking the pixel points of the original image as two-dimensional coordinates and delimiting a closed region, i.e., the detection area, from those coordinates;
after the detection range of the electronic fence is determined, keeping the image within the corresponding range unchanged and covering the area outside the detection range with a solid color;
performing detection within the electronic fence area defined on the image, and recording target information, wherein the target information comprises the category, probability, and target frame of the target;
and labeling the recorded target information on the original image.
6. The method of claim 5, wherein after the data set is acquired, data enhancement is performed on the data set to obtain a training data set, the data enhancement including random cropping, flipping, rotation, and scaling, and the lightweight target detection model is trained with the training data set to obtain a trained lightweight target detection model.
7. The method of claim 1, wherein, while training the lightweight target detection model with the data set, the training process and performance of the model are optimized by adjusting hyperparameters including the learning rate, batch size, and regularization coefficients.
8. An area target intrusion detection device, the device comprising:
a model building module configured to build a lightweight target detection model, the lightweight target detection model comprising a Block_Net network structure and a detection head network structure; the Block_Net network structure comprises a backbone network and a classifier, wherein the backbone network comprises stacked Block blocks and a pooling layer, the stacked Blocks extract and combine features of different scales from the input image to output feature maps, the classifier comprises a global average pooling layer and a fully connected layer, the global average pooling layer converts the feature map output by the last Block into a fixed-size feature vector, and the fully connected layer performs classification prediction;
the model training module is configured to train the lightweight target detection model by utilizing a data set to obtain a trained lightweight target detection model, wherein the data set comprises a plurality of original images marked with target information;
and the target detection module is configured to complete regional target intrusion detection based on the trained lightweight target detection model.
9. A regional target intrusion detection system, characterized in that the system comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the method of any one of claims 1 to 7.
10. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, perform the method of any one of claims 1 to 7.
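The preset-target-box optimization described in claim 4, with P(x) proportional to the squared distance R(x) to the nearest existing cluster center, matches the k-means++ seeding rule commonly used to derive anchor boxes from labeled box sizes. A minimal sketch under that assumption (the box list and distance metric are hypothetical):

```python
import random

def kmeans_pp_init(samples, k, dist, seed=0):
    """Pick k initial centers; each next center is drawn with probability
    P(x) = R(x)^2 / sum(R(x_i)^2), R(x) = distance to nearest chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(samples)]
    while len(centers) < k:
        r2 = [min(dist(x, c) for c in centers) ** 2 for x in samples]
        total = sum(r2)
        pick, acc = rng.random(), 0.0
        for x, d in zip(samples, r2):
            acc += d / total          # accumulate P(x)
            if pick <= acc:
                centers.append(x)
                break
    return centers

# Cluster (width, height) pairs of labeled boxes to obtain preset boxes
boxes = [(10, 12), (11, 13), (50, 60), (52, 58), (100, 110)]
euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
centers = kmeans_pp_init(boxes, 3, euclid)
```

Because an already-chosen center has R(x) = 0, it is drawn again with probability zero, so the k centers end up as distinct box sizes spread across the data.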
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311651203.2A CN117593698B (en) | 2023-12-04 | 2023-12-04 | Regional target intrusion detection method, device and system and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117593698A | 2024-02-23 |
CN117593698B | 2024-08-20 |
Family
ID=89922679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311651203.2A Active CN117593698B (en) | 2023-12-04 | 2023-12-04 | Regional target intrusion detection method, device and system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117593698B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063666A (en) * | 2018-08-14 | 2018-12-21 | 电子科技大学 | The lightweight face identification method and system of convolution are separated based on depth |
CN113344188A (en) * | 2021-06-18 | 2021-09-03 | 东南大学 | Lightweight neural network model based on channel attention module |
CN114120019A (en) * | 2021-11-08 | 2022-03-01 | 贵州大学 | Lightweight target detection method |
CN116232694A (en) * | 2023-01-31 | 2023-06-06 | 清华大学深圳国际研究生院 | Lightweight network intrusion detection method and device, electronic equipment and storage medium |
US20230215166A1 (en) * | 2021-12-30 | 2023-07-06 | Wuhan University | Few-shot urban remote sensing image information extraction method based on meta learning and attention |
Non-Patent Citations (2)
Title |
---|
YAO Minghai et al.: "Research on Workpiece Defect Recognition Based on Lightweight CNN and Active Learning", High Technology Letters, 15 April 2020 (2020-04-15) *
WANG Rong et al.: "Intrusion Detection Method Based on Federated Learning and Convolutional Neural Network", Netinfo Security, 10 April 2020 (2020-04-10) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108961235B (en) | Defective insulator identification method based on YOLOv3 network and particle filter algorithm | |
CN114549563A (en) | Real-time composite insulator segmentation method and system based on deep LabV3+ | |
CN112614136B (en) | Infrared small target real-time instance segmentation method and device | |
CN112818969A (en) | Knowledge distillation-based face pose estimation method and system | |
CN115205264A (en) | High-resolution remote sensing ship detection method based on improved YOLOv4 | |
CN113011338B (en) | Lane line detection method and system | |
CN114743119A (en) | High-speed rail contact net dropper nut defect detection method based on unmanned aerial vehicle | |
CN112580662A (en) | Method and system for recognizing fish body direction based on image features | |
CN113313703A (en) | Unmanned aerial vehicle power transmission line inspection method based on deep learning image recognition | |
CN115170746A (en) | Multi-view three-dimensional reconstruction method, system and equipment based on deep learning | |
CN115565043A (en) | Method for detecting target by combining multiple characteristic features and target prediction method | |
CN115410087A (en) | Transmission line foreign matter detection method based on improved YOLOv4 | |
CN113326734A (en) | Rotary target detection method based on YOLOv5 | |
CN115810149A (en) | High-resolution remote sensing image building extraction method based on superpixel and image convolution | |
CN116935332A (en) | Fishing boat target detection and tracking method based on dynamic video | |
CN116310328A (en) | Semantic segmentation knowledge distillation method and system based on cross-image similarity relationship | |
CN118212572A (en) | Road damage detection method based on improvement YOLOv7 | |
CN114140622A (en) | Real-time significance detection image method based on double-branch network | |
CN111160372B (en) | Large target identification method based on high-speed convolutional neural network | |
CN117593698B (en) | Regional target intrusion detection method, device and system and storage medium | |
CN116152699B (en) | Real-time moving target detection method for hydropower plant video monitoring system | |
CN106934344B (en) | quick pedestrian detection method based on neural network | |
CN114219757B (en) | Intelligent damage assessment method for vehicle based on improved Mask R-CNN | |
CN113780305B (en) | Significance target detection method based on interaction of two clues | |
CN111950586B (en) | Target detection method for introducing bidirectional attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||