CN117197756B - Hidden danger area intrusion detection method, device, equipment and storage medium - Google Patents

Hidden danger area intrusion detection method, device, equipment and storage medium Download PDF

Info

Publication number
CN117197756B
CN117197756B (Application CN202311454716.4A)
Authority
CN
China
Prior art keywords
detection
module
image
resolution
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311454716.4A
Other languages
Chinese (zh)
Other versions
CN117197756A (en)
Inventor
朱志发
梁浩
陈旭林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Santachi Video Technology Shenzhen Co ltd
Original Assignee
Santachi Video Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Santachi Video Technology Shenzhen Co ltd filed Critical Santachi Video Technology Shenzhen Co ltd
Priority to CN202311454716.4A priority Critical patent/CN117197756B/en
Publication of CN117197756A publication Critical patent/CN117197756A/en
Application granted granted Critical
Publication of CN117197756B publication Critical patent/CN117197756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a hidden danger area intrusion detection method, device, equipment, and storage medium. The method comprises the following steps: acquiring a target detection data set, wherein the data set comprises first resolution images and second resolution images in one-to-one correspondence; constructing a target detection model, wherein the target detection model comprises a detection network model and a super-resolution network model, and the super-resolution network model is used to generate a super-resolution image from the high-dimensional and low-dimensional feature maps output by the detection network model; training the detection network model, wherein the input data are the first resolution images, the loss value is calculated from the original loss of the detection network model and an auxiliary loss, and the auxiliary loss is calculated from the super-resolution image and the corresponding second resolution image; acquiring a real-time monitoring image and performing target detection through the trained detection network model to obtain a target detection frame; and performing intrusion detection according to the target detection frame and the hidden danger area. The invention can improve detection performance without introducing extra computational overhead.

Description

Hidden danger area intrusion detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting intrusion in a hidden danger area.
Background
With the popularization of video monitoring equipment and the research and application of computer vision and related technologies, image-based hidden danger area intrusion detection has developed rapidly: the monitoring area is watched in real time, and target detection technology is used to detect targets suspected of intruding into it, which improves detection efficiency. However, video monitoring with a fixed camera suffers from a wide field of view in which small targets with fine features are easily missed. On the server side, the hidden danger area can be magnified by increasing the input resolution or applying a super-resolution algorithm before target detection, thereby enhancing small-target detection; however, mobile-terminal resources are limited, and such methods cannot meet the requirement of real-time detection on the mobile terminal.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a hidden danger area intrusion detection method, device, equipment, and storage medium that can improve detection performance without introducing extra computational overhead.
In a first aspect, the present invention provides a method for detecting intrusion in a hidden danger area, including:
acquiring a target detection data set, wherein the target detection data set comprises a first resolution image and a second resolution image which are in one-to-one correspondence, and the second resolution is a preset multiple of the first resolution;
constructing a target detection model, wherein the target detection model comprises a detection network model and a super-resolution network model; the super-resolution network model is used for fusing a high-dimensional feature map and a low-dimensional feature map output by a backbone network in the detection network model and generating a super-resolution image; the detection network model is a YOLO model; the high-dimensional feature map is a feature map whose downsampling multiple is greater than or equal to a preset multiple, and the low-dimensional feature map is a feature map whose downsampling multiple is smaller than the preset multiple; and the resolution of the super-resolution image is the preset multiple of the resolution of the input image of the detection network model;
training a detection network model in the target detection model according to the target detection data set and a preset target loss function, wherein input data of the detection network model is a first resolution image in the target detection data set, the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the super-resolution network model, and the auxiliary loss function is used for calculating errors between the super-resolution image output by the super-resolution network model and a second resolution image corresponding to the input first resolution image;
acquiring a real-time monitoring image, and performing target detection through a trained detection network model to obtain a target detection frame;
and performing intrusion detection according to the target detection frame and a preset hidden danger area.
In a second aspect, the present invention further provides a hidden danger area intrusion detection device, including:
the acquisition module is used for acquiring a target detection data set, wherein the target detection data set comprises a first resolution image and a second resolution image which are in one-to-one correspondence, and the second resolution is a preset multiple of the first resolution;
the construction module is used for constructing a target detection model, wherein the target detection model comprises a detection network model and a super-resolution network model; the super-resolution network model is used for fusing a high-dimensional feature map and a low-dimensional feature map output by a backbone network in the detection network model and generating a super-resolution image; the detection network model is a YOLO model; the high-dimensional feature map is a feature map whose downsampling multiple is greater than or equal to a preset multiple, and the low-dimensional feature map is a feature map whose downsampling multiple is smaller than the preset multiple; and the resolution of the super-resolution image is the preset multiple of the resolution of the input image of the detection network model;
the training module is used for training a detection network model in the target detection model according to the target detection data set and a preset target loss function, wherein input data of the detection network model is a first resolution image in the target detection data set, the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the super-resolution network model, and the auxiliary loss function is used for calculating errors between the super-resolution image output by the super-resolution network model and a second resolution image corresponding to the input first resolution image;
the target detection module is used for acquiring a real-time monitoring image, and carrying out target detection through a trained detection network model to obtain a target detection frame;
and the intrusion detection module is used for performing intrusion detection according to the target detection frame and a preset hidden danger area.
In a third aspect, the present invention also provides an electronic device, including:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for detecting intrusion into a hidden danger area as provided in the first aspect.
In a fourth aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the hidden danger area intrusion detection method as provided in the first aspect.
The invention has the beneficial effects that: an auxiliary super-resolution branch, namely the super-resolution network model, is added on the basis of the detection network model. In the training stage, the low-dimensional local texture features and the high-dimensional semantic features extracted by the backbone network of the detection network model are fused, and after fusion a high-resolution image is reconstructed by the super-resolution algorithm; the auxiliary super-resolution branch thereby improves the ability of the main detection branch, i.e. the detection network model, to detect small targets, improving detection performance. In the application stage, the auxiliary super-resolution branch is discarded and target detection is performed only through the trained detection network model, so no extra computational overhead is introduced and the requirement of real-time detection on the mobile terminal can be met. The invention can improve the recall rate of area intrusion target detection without adding extra computation.
Drawings
FIG. 1 is a flow chart of a hidden danger area intrusion detection method provided by the invention;
FIG. 2 is a flow chart of a method according to a first embodiment of the invention;
FIG. 3 is a schematic structural diagram of a target detection model according to a first embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the super-resolution network model according to the first embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a hidden danger area intrusion detection device according to the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, sub-computer programs, and the like.
Furthermore, the terms "first," "second," and the like, may be used herein to describe various directions, acts, steps, or elements, etc., but these directions, acts, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, the first information may be referred to as second information, and similarly, the second information may be referred to as first information, without departing from the scope of the present application. Both the first information and the second information are information, but they are not the same information. The terms "first," "second," and the like, are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Referring to fig. 1, a hidden danger area intrusion detection method includes:
acquiring a target detection data set, wherein the target detection data set comprises a first resolution image and a second resolution image which are in one-to-one correspondence, and the second resolution is a preset multiple of the first resolution;
constructing a target detection model, wherein the target detection model comprises a detection network model and a super-resolution network model; the super-resolution network model is used for fusing a high-dimensional feature map and a low-dimensional feature map output by a backbone network in the detection network model and generating a super-resolution image; the detection network model is a YOLO model; the high-dimensional feature map is a feature map whose downsampling multiple is greater than or equal to a preset multiple, and the low-dimensional feature map is a feature map whose downsampling multiple is smaller than the preset multiple; and the resolution of the super-resolution image is the preset multiple of the resolution of the input image of the detection network model;
training a detection network model in the target detection model according to the target detection data set and a preset target loss function, wherein input data of the detection network model is a first resolution image in the target detection data set, the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the super-resolution network model, and the auxiliary loss function is used for calculating errors between the super-resolution image output by the super-resolution network model and a second resolution image corresponding to the input first resolution image;
acquiring a real-time monitoring image, and performing target detection through a trained detection network model to obtain a target detection frame;
and performing intrusion detection according to the target detection frame and a preset hidden danger area.
From the above description, it can be seen that by adding an auxiliary super-resolution branch, namely the super-resolution network model, on the basis of the detection network model, the ability of the detection network model to detect small targets can be improved through the auxiliary branch in the training stage, so that detection performance is improved; in the application stage, the auxiliary super-resolution branch is discarded and target detection is performed only through the trained detection network model, so no extra computational overhead is introduced and the requirement of real-time detection on the mobile terminal can be met.
In an alternative embodiment, the acquiring the target detection data set is specifically:
simultaneously acquiring a first resolution image and a second resolution image through a zooming monitoring device to obtain a first resolution image set and a second resolution image set;
data cleaning is carried out on each first resolution image in the first resolution image set and each second resolution image in the second resolution image set, a positive sample image and a negative sample image are obtained according to a preset positive-negative sample proportion, and a target detection data set is obtained, wherein the positive sample image is an image containing a target, and the negative sample image is an image not containing the target;
and labeling the rectangular frame of the target in the positive sample image by a labeling tool.
As can be seen from the above description, by taking images that do not contain a target as negative samples, false detections of non-target objects can be reduced.
In an alternative embodiment, the detection network model includes a backbone network, a feature enhancement network, and a detection head;
the super-resolution network model comprises a first convolution module, a first activation function module, an up-sampling module, a fusion module, a second convolution module, a second activation function module and a super-resolution module; the first convolution module is connected with the first activation function module; the first activation function module and the up-sampling module are respectively connected with the fusion module; and the fusion module, the second convolution module, the second activation function module and the super-resolution module are connected in sequence;
the input data of the first convolution module is a low-dimensional feature map output by the backbone network, and the input data of the up-sampling module is a high-dimensional feature map output by the backbone network.
From the above description, it can be seen that the low-dimensional feature map, after sequentially undergoing the convolution processing of the first convolution module and the activation function calculation of the first activation function module, is fused with the up-sampled high-dimensional feature map; the fused result then sequentially undergoes the convolution processing of the second convolution module and the activation function calculation of the second activation function module, enters the super-resolution module, and finally the super-resolution image is output.
In an alternative embodiment, the activation function used in the first activation function module and the second activation function module is LeakyReLU; the fusion module adopts Concat for feature fusion; and the super-resolution algorithm adopted in the super-resolution module is the ESRGAN algorithm.
From the above description, it can be seen that adopting LeakyReLU as the activation function helps extend the output range of the ReLU activation function to negative inputs.
In an alternative embodiment, the target loss function is L_total = c_1·L_0 + c_2·L_s, wherein L_total represents the overall loss value of the target detection model, L_0 represents the original loss value calculated from the original loss function, L_s represents the auxiliary loss value calculated from the auxiliary loss function, and c_1 and c_2 are preset weights.
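The weighted combination of the two losses can be sketched in a few lines of Python; the default weights 0.7 and 0.3 are the values adopted in the first embodiment below, and any preset weights may be substituted:

```python
def total_loss(original_loss: float, aux_loss: float,
               c1: float = 0.7, c2: float = 0.3) -> float:
    # L_total = c1 * L_0 + c2 * L_s
    return c1 * original_loss + c2 * aux_loss
```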
From the above description, it can be seen that the overall loss is calculated by detecting the original loss of the network model and the auxiliary loss weighting of the superdivision network model.
In an alternative embodiment, the original loss function includes a class loss function that is a VFL loss function and a rim regression loss function that is a DFL loss function; the auxiliary loss function is an L1 loss function.
In an optional embodiment, the intrusion detection is performed according to the target detection frame and a preset hidden danger area, specifically:
and if the intersection ratio (IoU) of the target detection frame and the preset hidden danger area is larger than a preset threshold value, it is judged that a target has intruded into the hidden danger area.
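The intersection-ratio criterion above can be sketched as follows. The axis-aligned (x1, y1, x2, y2) box representation of the hidden danger area and the example threshold of 0.1 are illustrative assumptions; the patent only specifies a preset threshold:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_intrusion(det_box, danger_zone, threshold=0.1):
    # flag an intrusion when the detection box overlaps the zone enough
    return iou(det_box, danger_zone) > threshold
```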
The invention also provides a hidden danger area intrusion detection device, which comprises:
the acquisition module is used for acquiring a target detection data set, wherein the target detection data set comprises a first resolution image and a second resolution image which are in one-to-one correspondence, and the second resolution is a preset multiple of the first resolution;
the construction module is used for constructing a target detection model, wherein the target detection model comprises a detection network model and a super-resolution network model; the super-resolution network model is used for fusing a high-dimensional feature map and a low-dimensional feature map output by a backbone network in the detection network model and generating a super-resolution image; the detection network model is a YOLO model; the high-dimensional feature map is a feature map whose downsampling multiple is greater than or equal to a preset multiple, and the low-dimensional feature map is a feature map whose downsampling multiple is smaller than the preset multiple; and the resolution of the super-resolution image is the preset multiple of the resolution of the input image of the detection network model;
the training module is used for training a detection network model in the target detection model according to the target detection data set and a preset target loss function, wherein input data of the detection network model is a first resolution image in the target detection data set, the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the super-resolution network model, and the auxiliary loss function is used for calculating errors between the super-resolution image output by the super-resolution network model and a second resolution image corresponding to the input first resolution image;
the target detection module is used for acquiring a real-time monitoring image, and carrying out target detection through a trained detection network model to obtain a target detection frame;
and the intrusion detection module is used for performing intrusion detection according to the target detection frame and a preset hidden danger area.
The invention also provides an electronic device, which comprises:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as described above.
Example 1
Referring to fig. 2-4, a first embodiment of the present invention is as follows: a hidden danger area intrusion detection method that can realize effective monitoring of hidden danger areas. In this embodiment, detecting whether a pedestrian target intrudes into the hidden danger area around a distribution box is taken as an example.
As shown in fig. 2, the method comprises the following steps:
s1: acquiring a target detection data set, wherein the target detection data set comprises a first resolution image and a second resolution image which are in one-to-one correspondence, and the second resolution is a preset multiple of the first resolution; in this embodiment, the first resolution is 2k, the second resolution is 4k, and the preset multiple is 2 times.
Specifically, the method comprises the following steps:
s101: and simultaneously acquiring the first resolution image and the second resolution image by the zoom monitoring equipment to obtain a first resolution image set and a second resolution image set.
Zoom-capable monitoring equipment mounted on a utility pole or tower is used to simultaneously collect images at 2k and 4k resolution; a 2k resolution image and a 4k resolution image captured at the same moment form a corresponding 2k/4k image pair.
S102: and cleaning data of each first resolution image in the first resolution image set and each second resolution image in the second resolution image set, and acquiring a positive sample image and a negative sample image according to a preset positive and negative sample proportion to obtain a target detection data set.
The collected images are subjected to data cleaning, so that the one-to-one correspondence between the 2k resolution image and the 4k resolution image is ensured, and meanwhile, positive and negative sample images are obtained according to the positive and negative sample proportion (9:1 in the embodiment). Wherein, the 2k resolution image and the 4k resolution image in the obtained positive and negative sample images are in one-to-one correspondence; the positive sample image is an image containing an object, and the negative sample image is an image not containing an object.
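The 9:1 positive-to-negative selection can be sketched as below. The exact sampling policy (here, random subsampling of the negatives with a fixed seed) is an assumption for illustration; the patent only fixes the ratio:

```python
import random

def select_samples(image_pairs, has_target, pos_parts=9, neg_parts=1, seed=0):
    """image_pairs: list of (path_2k, path_4k) tuples; has_target: predicate on a pair."""
    positives = [p for p in image_pairs if has_target(p)]
    negatives = [p for p in image_pairs if not has_target(p)]
    # keep neg_parts negatives for every pos_parts positives
    n_neg = min(len(negatives), len(positives) * neg_parts // pos_parts)
    random.Random(seed).shuffle(negatives)
    return positives, negatives[:n_neg]
```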
In this embodiment, by taking images that do not contain a pedestrian target as negative samples, false detection of non-pedestrian targets is reduced.
S103: and labeling the rectangular frame of the target in the positive sample image by a labeling tool.
S2: and constructing a target detection model.
The target detection model includes a detection network model (a YOLO model; the YOLOv8 network model is adopted in this embodiment) and a super-resolution network model; that is, an auxiliary super-resolution branch is added on the basis of the YOLOv8 network model. As shown in fig. 3, the YOLOv8 network model includes a backbone network (backbone), a feature enhancement network (neck) and a detection head (head), wherein the data output by the P1-P5 layers in the backbone network are the feature maps corresponding to 2x, 4x, 8x, 16x and 32x downsampling respectively. For example, with an original input of 640x640, the feature map output by the P3 layer has a size of 80x80, the feature map output by the P5 layer has a size of 20x20, and so on.
In this embodiment, a feature map whose downsampling multiple is greater than or equal to 32 is defined as a high-dimensional feature map (High-Level Feature), and a feature map whose downsampling multiple is less than 32 is defined as a low-dimensional feature map (Lower-Level Feature). Therefore, the high-dimensional feature map in this embodiment is the feature map output by the P5 layer of the backbone network, and the low-dimensional feature map is a feature map output by the P1, P2, P3 or P4 layer of the backbone network. Preferably, considering computation and memory overhead, the feature map output by the P3 layer is selected in this embodiment as the low-dimensional feature map fed to the super-resolution network model.
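The correspondence between backbone level and feature-map size follows directly from the power-of-two strides and can be checked with a one-liner:

```python
def feature_map_size(input_size: int, p_level: int) -> int:
    # P1..P5 downsample the input by 2, 4, 8, 16 and 32 respectively
    return input_size // (2 ** p_level)
```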
The super-resolution network model mainly comprises an Encoder and a Decoder, wherein the input data of the Encoder are the low-dimensional feature map (Lower-Level Feature) output by the P3 layer and the high-dimensional feature map (High-Level Feature) output by the P5 layer of the backbone network. The Encoder is mainly used for fusing the high-dimensional and low-dimensional features, and the Decoder is mainly used for generating an image at a preset multiple (2x in this embodiment) of the input resolution through the super-resolution algorithm.
As shown in fig. 4, the super-resolution network model includes a first convolution module (Conv), a first activation function module (LeakyReLU), an up-sampling module (Upsample), a fusion module (Concat), a second convolution module (Conv), a second activation function module (LeakyReLU) and a super-resolution module (ESRGAN); the first convolution module is connected with the first activation function module, the first activation function module and the up-sampling module are respectively connected with the fusion module, and the fusion module, the second convolution module, the second activation function module and the super-resolution module are connected in sequence.
The input data of the first convolution module is a low-dimensional Feature map (Lower-Level Feature) output by the P3 layer of the backbone network of the YOLOv8 network model, and the input data of the upsampling module is a High-dimensional Feature map (High-Level Feature) output by the P5 layer of the backbone network of the YOLOv8 network model.
The low-dimensional feature map, after sequentially undergoing the convolution processing of the first convolution module and the activation function calculation of the first activation function module, is Concat-fused with the up-sampled high-dimensional feature map; the fused result then sequentially undergoes the convolution processing of the second convolution module and the activation function calculation of the second activation function module, enters the Decoder structure, namely the super-resolution module, and finally the super-resolution image is output.
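A pure-Python trace of the spatial sizes through the branch makes the wiring concrete. The 4x up-sampling factor is inferred from the stride ratio between P5 (32x) and P3 (8x) and is an assumption, since the patent does not state it explicitly:

```python
def sr_branch_sizes(input_hw: int = 640, scale: int = 2):
    """Trace spatial sizes through the auxiliary super-resolution branch."""
    p3 = input_hw // 8         # low-dimensional feature map from P3 (stride 8)
    p5 = input_hw // 32        # high-dimensional feature map from P5 (stride 32)
    p5_up = p5 * 4             # Upsample must bring P5 to the P3 resolution
    assert p5_up == p3         # required for Concat along the channel axis
    sr_out = input_hw * scale  # decoder reconstructs at 2x the input resolution
    return p3, p5, p5_up, sr_out
```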
In this embodiment, the activation function adopts LeakyReLU which, unlike ReLU, passes a small non-zero output for negative inputs; its calculation formula is as follows:
LeakyReLU(x) = x if x ≥ 0, and LeakyReLU(x) = α·x if x < 0
wherein α takes the value 0.01.
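A scalar sketch of the activation, with the value of alpha given above:

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    # passes positive inputs unchanged; scales negative inputs by alpha
    return x if x >= 0 else alpha * x
```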
The Decoder adopts a common super-resolution structure; in this embodiment, the ESRGAN algorithm is adopted, finally obtaining data magnified to twice the resolution, which is used during subsequent training to calculate the auxiliary loss against the original 4k resolution image.
S3: training a detection network model in the target detection model according to the target detection data set and a preset target loss function, wherein the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the superdivision network model.
In the training stage, the input data is a first resolution image in the target detection data set, i.e. a 2k resolution image is input; an image magnified to 2x resolution, namely the super-resolution image, is generated through the super-resolution network model, and the auxiliary loss value L_s is then calculated from the super-resolution image and the 4k resolution image corresponding to the input 2k resolution image. The output data of the detection network model include a prediction box and a prediction probability.
In this embodiment, a stochastic gradient descent (SGD) optimization algorithm is used to optimize the overall network loss during the training stage. The overall loss value L_total is obtained from the original loss value L_0 and the auxiliary loss value L_s; the specific calculation formula is L_total = c_1·L_0 + c_2·L_s, wherein the original loss value L_0 is calculated by the original loss function, the auxiliary loss value L_s is calculated by the auxiliary loss function, and c_1 and c_2 are preset weights; in this embodiment, c_1 = 0.7 and c_2 = 0.3.
In this embodiment, the auxiliary loss function is an L1 loss function, and the L1 loss function is also called an average absolute value error, and calculates an average value of absolute difference values of an actual value and a target value. In this embodiment, the 4k resolution image (i.e., the super-resolution image) generated from the 2k resolution image is as close to the real 4k resolution image as possible, so that the error of the pixel value of the pixel position corresponding to the super-resolution image and the original 4k resolution image is calculated. The calculation formula of the L1 loss function can be expressed as:
where y_p and f(x_p) respectively represent the pixel values at the p-th pixel position of the super-resolution image and of the second resolution image corresponding to the input first resolution image, and n is the total number of pixels.
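A minimal sketch of this pixel-wise L1 loss, using flattened toy images (the pixel values are illustrative, not real data):

```python
def l1_loss(pred, target):
    """Mean absolute error over corresponding pixel values:
    Ls = (1/n) * sum(|y_p - f(x_p)|)."""
    assert len(pred) == len(target)
    n = len(pred)
    return sum(abs(y - f) for y, f in zip(target, pred)) / n

# Toy flattened images (illustrative pixel values)
sr = [10, 20, 30, 40]   # super-resolution branch output
gt = [12, 18, 30, 44]   # real second-resolution (4k) image
print(l1_loss(sr, gt))  # (2 + 2 + 0 + 4) / 4 = 2.0
```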
The original loss function is the loss function of the detection network model. In this embodiment, it includes a class loss function and a bounding-box regression loss function, where the class loss function is VFL (Varifocal Loss). VFL introduces an asymmetric weighting operation so that more valuable positive samples can be mined. Its calculation formula can be expressed as:

VFL(p, q) = −q·(q·log(p) + (1 − q)·log(1 − p)) when q > 0, and VFL(p, q) = −α·p^γ·log(1 − p) when q = 0
where p represents the prediction score, i.e., the IoU of the prediction box and the real box; q represents the target score, which takes the value of the IoU between the prediction box and the real box when the class prediction is correct and 0 when it is incorrect; α and γ are hyperparameters. In this embodiment, α = 0.75 and γ = 2.
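A per-prediction sketch of this loss, assuming the standard published Varifocal Loss definition (the sample scores below are illustrative):

```python
import math

def vfl(p: float, q: float, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """Per-prediction Varifocal Loss.
    Positive sample (q > 0): binary cross-entropy weighted by the target score q.
    Negative sample (q == 0): focal down-weighting by alpha * p**gamma."""
    eps = 1e-12  # numerical guard for log(0)
    if q > 0:
        return -q * (q * math.log(p + eps) + (1 - q) * math.log(1 - p + eps))
    return -alpha * (p ** gamma) * math.log(1 - p + eps)

# A confidently wrong negative (p = 0.9, q = 0) is penalized more heavily
# than an uncertain one (p = 0.1, q = 0) -- the asymmetric weighting at work.
print(vfl(0.9, 0.0) > vfl(0.1, 0.0))  # True
```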
The bounding-box regression loss function adopts DFL (Distribution Focal Loss). DFL optimizes, in a cross-entropy manner, the probabilities of the two positions closest to the label y on its left and right, so that the network focuses more quickly on the distribution of the neighborhood of the target position, making coordinate regression more robust. Its calculation formula can be expressed as:

DFL(S_i, S_{i+1}) = −((y_{i+1} − y)·log(S_i) + (y − y_i)·log(S_{i+1}))
where y represents the coordinate of the real box; y_i represents the coordinate of the prediction position closest to and to the left of the real box, with S_i its corresponding prediction probability; and y_{i+1} represents the coordinate of the prediction position closest to and to the right of the real box, with S_{i+1} its corresponding prediction probability.
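A single-coordinate sketch of DFL under the standard definition (the label and probabilities below are illustrative):

```python
import math

def dfl(y: float, y_i: float, y_ip1: float, s_i: float, s_ip1: float) -> float:
    """Distribution Focal Loss for one coordinate: cross-entropy pushing
    probability mass toward the two positions y_i (left) and y_{i+1} (right)
    that bracket the label y."""
    return -((y_ip1 - y) * math.log(s_i) + (y - y_i) * math.log(s_ip1))

# Label y = 4.3 lies between positions 4 and 5. The loss is lower when the
# predicted probabilities match the interpolation weights (0.7 left, 0.3 right)
# than when they are reversed.
good = dfl(4.3, 4, 5, s_i=0.7, s_ip1=0.3)
bad = dfl(4.3, 4, 5, s_i=0.3, s_ip1=0.7)
print(good < bad)  # True
```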
In this embodiment, training ends when the original loss value L0 of YOLOv8 and the auxiliary loss value Ls of the auxiliary super-resolution branch have both decreased to a certain value and oscillate stably; the optimal model is determined by calculating the mAP value of YOLOv8.
The main branch adopts the YOLOv8 structure. The feature sizes retained by multi-scale detection are far smaller than the original input image size, so small detection targets become smaller still. Feature size is usually recovered through an upsampling operation, but upsampling cannot recover lost information such as texture and semantics well, so an auxiliary super-resolution branch is added to extract this feature information. The features extracted by the auxiliary super-resolution branch affect the overall loss value, thereby improving the main branch's ability to detect small targets.
S4: and acquiring a real-time monitoring image, and performing target detection through a trained detection network model to obtain a target detection frame.
Specifically, when the target detection model is exported as a mobile-side model, the low-dimensional and high-dimensional feature layers fed to the auxiliary super-resolution branch are removed, the auxiliary branch is discarded, and only the trained YOLOv8 network model is retained. The trained YOLOv8 network model is then deployed to the high-altitude monitoring equipment on the utility pole tower to detect pedestrian targets in the high-altitude scene in real time.
S5: and judging whether the intersection ratio of the target detection frame and the preset hidden danger area is larger than a preset threshold value, if so, executing the step S6.
S6: and judging that the target invasion hidden danger area exists, and carrying out alarm processing.
Specifically, the IoU of the pedestrian target detection frame and the preset distribution box hidden danger area is calculated. If the IoU is greater than 0, it indicates that a pedestrian has intruded into the distribution box hidden danger area, and a timely alarm is issued to reduce hidden danger accidents. The IoU calculation formula can be expressed as IoU = |A ∩ B| / |A ∪ B|, where A and B represent the target detection frame and the distribution box hidden danger area, respectively.
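The intersection-over-union check above can be sketched for axis-aligned rectangles. The (x1, y1, x2, y2) box format and the coordinate values are illustrative assumptions, not taken from the patent:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2): |A ∩ B| / |A ∪ B|."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

detection = (0, 0, 10, 10)    # pedestrian detection frame (illustrative)
danger = (5, 5, 15, 15)       # distribution box hidden danger area
print(iou(detection, danger))  # 25 / 175 ≈ 0.143 > 0 → intrusion alarm
```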
In this embodiment, an auxiliary super-resolution branch is added on the basis of the YOLOv8 network model. In the training stage, the low-dimensional local texture features and high-dimensional semantic features of the YOLOv8 backbone network are extracted and fused, and after fusion a decoder structure from a generative adversarial network reconstructs a high-resolution image; the auxiliary super-resolution branch thus improves the main branch's detection capability, i.e., the YOLOv8 network model's ability to detect small targets, improving detection performance. In the application stage, the auxiliary super-resolution branch is discarded and target detection is performed only through the trained YOLOv8 network model, so no extra computational cost is introduced and the real-time detection requirement of the mobile side can be met.
Example two
Referring to fig. 5, a second embodiment of the present invention is as follows: the hidden danger area intrusion detection device can execute the hidden danger area intrusion detection method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. The device may be implemented by software and/or hardware, and specifically includes:
an obtaining module 201, configured to obtain a target detection data set, where the target detection data set includes a first resolution image and a second resolution image that are in a one-to-one correspondence, and the second resolution is a preset multiple of the first resolution;
the building module 202 is configured to build a target detection model, where the target detection model includes a detection network model and a super-resolution network model, where the super-resolution network model is configured to fuse a high-dimensional feature map and a low-dimensional feature map output by a backbone network in the detection network model, and generate a super-resolution image, the detection network model is a YOLO model, the high-dimensional feature map is a feature map with a downsampling multiple greater than or equal to a preset multiple, the low-dimensional feature map is a feature map with a downsampling multiple less than the preset multiple, and a resolution of the super-resolution image is a preset multiple of a resolution of an input image of the detection network model;
the training module 203 is configured to train a detection network model in the target detection model according to the target detection data set and a preset target loss function, where input data of the detection network model is a first resolution image in the target detection data set, the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the superminute network model, and the auxiliary loss function is used to calculate an error between a superminute image output by the superminute network model and a second resolution image corresponding to the input first resolution image;
the target detection module 204 is configured to acquire a real-time monitoring image, and perform target detection through a trained detection network model to obtain a target detection frame;
and the intrusion detection module 205 is configured to perform intrusion detection according to the target detection frame and a preset hidden danger area.
In an alternative embodiment, the obtaining module 201 includes:
the acquisition unit is used for simultaneously acquiring the first resolution image and the second resolution image through the zoom monitoring equipment to obtain a first resolution image set and a second resolution image set;
the acquisition unit is used for carrying out data cleaning on each first resolution image in the first resolution image set and each second resolution image in the second resolution image set, acquiring a positive sample image and a negative sample image according to a preset positive and negative sample proportion, and acquiring a target detection data set, wherein the positive sample image is an image containing a target, and the negative sample image is an image not containing the target;
and the labeling unit is used for labeling the rectangular frame of the target in the positive sample image through a labeling tool.
In an alternative embodiment, the detection network model includes a backbone network, a feature enhancement network, and a detection head; the super-division network model comprises a first convolution module, a first activation function module, an up-sampling module, a fusion module, a second convolution module, a second activation function module and a super-division module; the first convolution module is connected with the first activation function module, the first activation function module and the up-sampling module are respectively connected with the fusion module, and the fusion module, the second convolution module, the second activation function module and the super-division module are sequentially connected; the input data of the first convolution module is a low-dimensional feature map output by the backbone network, and the input data of the up-sampling module is a high-dimensional feature map output by the backbone network.
The activation functions adopted in the first activation function module and the second activation function module are LeakyRelu; the fusion module adopts Concat to perform feature fusion; the super division algorithm adopted in the super division module is an ESRGAN algorithm.
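The branch's fusion step can be sketched with NumPy. The layer shapes and the 4x nearest-neighbor upsampling factor below are illustrative assumptions; the real branch uses learned convolutions, LeakyReLU activations, and an ESRGAN decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative backbone feature maps (channels, height, width):
low_dim = rng.standard_normal((64, 80, 80))    # shallow layer: local texture
high_dim = rng.standard_normal((256, 20, 20))  # deep layer: semantics

# Upsample the high-dimensional map 4x (nearest neighbor) to match the
# spatial size of the low-dimensional map, then fuse by channel concat.
up = high_dim.repeat(4, axis=1).repeat(4, axis=2)
fused = np.concatenate([low_dim, up], axis=0)  # Concat along channel axis

print(fused.shape)  # (320, 80, 80)
```

The fused tensor would then pass through the second convolution, activation, and super-resolution modules to produce the reconstructed high-resolution image.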
In an alternative embodiment, the objective loss function is L_total = c1·L0 + c2·Ls, where L_total represents the overall loss value of the target detection model, L0 represents the original loss value calculated from the original loss function, Ls represents the auxiliary loss value calculated from the auxiliary loss function, and c1 and c2 are preset weights.
The original loss function comprises a category loss function and a frame regression loss function, wherein the category loss function is a VFL loss function, and the frame regression loss function is a DFL loss function; the auxiliary loss function is an L1 loss function.
In an alternative embodiment, the intrusion detection module 205 is specifically configured to determine that there is a target intrusion hidden danger area if the intersection ratio of the target detection frame and the preset hidden danger area is greater than a preset threshold.
Example III
Referring to fig. 6, a third embodiment of the present invention is as follows: an electronic device, the electronic device comprising:
one or more processors 301;
a storage device 302 for storing one or more programs;
when the one or more programs are executed by the one or more processors 301, the one or more processors 301 implement the processes in the embodiment of the hidden danger area intrusion detection method as described above, and the same technical effects can be achieved, so that repetition is avoided and detailed description is omitted.
Example IV
The fourth embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, where the computer program when executed by a processor implements each process in the hidden danger area intrusion detection method embodiment described above, and the same technical effects can be achieved, and for avoiding repetition, a detailed description is omitted herein.
In summary, according to the hidden danger area intrusion detection method, device, equipment and storage medium provided by the invention, on the basis of a detection network model, an auxiliary super-division algorithm branch, namely a super-division network model is added, in a training stage, the low-dimensional local texture features and the high-dimensional semantic features of a main network of the detection network model are extracted to be fused, a high-resolution image is reconstructed through the super-division algorithm after fusion, and finally, the main branch detection, namely the capability of detecting a small target of the detection network model, is improved through the auxiliary super-division branch, so that the detection performance is improved; in the application stage, the auxiliary superbranch is discarded, and target detection is carried out only through a trained detection network model, so that extra calculation cost is not introduced, and the requirement of real-time detection of the mobile terminal can be met.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that, in the embodiment of the apparatus, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent changes made by the specification and drawings of the present invention, or direct or indirect application in the relevant art, are included in the scope of the present invention.

Claims (10)

1. The hidden danger area intrusion detection method is characterized by comprising the following steps of:
acquiring a target detection data set, wherein the target detection data set comprises a first resolution image and a second resolution image which are in one-to-one correspondence, and the second resolution is a preset multiple of the first resolution;
the method comprises the steps that a target detection model is built, the target detection model comprises a detection network model and a superdivision network model, the superdivision network model is used for fusing a high-dimensional feature image and a low-dimensional feature image which are output by a main network in the detection network model, the detection network model is a YOLO model, the high-dimensional feature image is a feature image with a downsampling multiple being greater than or equal to a preset multiple, the low-dimensional feature image is a feature image with the downsampling multiple being smaller than the preset multiple, and the resolution of the superdivision image is a preset multiple of the resolution of an input image of the detection network model;
training a detection network model in the target detection model according to the target detection data set and a preset target loss function, wherein input data of the detection network model is a first resolution image in the target detection data set, the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the super-resolution network model, and the auxiliary loss function is used for calculating errors between the super-resolution image output by the super-resolution network model and a second resolution image corresponding to the input first resolution image;
acquiring a real-time monitoring image, and performing target detection through a trained detection network model to obtain a target detection frame;
and performing intrusion detection according to the target detection frame and a preset hidden danger area.
2. The hidden danger area intrusion detection method according to claim 1, wherein the acquiring the target detection data set specifically comprises:
simultaneously acquiring a first resolution image and a second resolution image through a zooming monitoring device to obtain a first resolution image set and a second resolution image set;
data cleaning is carried out on each first resolution image in the first resolution image set and each second resolution image in the second resolution image set, a positive sample image and a negative sample image are obtained according to a preset positive-negative sample proportion, and a target detection data set is obtained, wherein the positive sample image is an image containing a target, and the negative sample image is an image not containing the target;
and labeling the rectangular frame of the target in the positive sample image by a labeling tool.
3. The hidden danger area intrusion detection method according to claim 1, wherein the detection network model comprises a backbone network, a feature enhancement network and a detection head;
the super-division network model comprises a first convolution module, a first activation function module, an up-sampling module, a fusion module, a second convolution module, a second activation function module and a super-division module; the first convolution module is connected with the first activation function module, the first activation function module and the up-sampling module are respectively connected with the fusion module, and the fusion module, the second convolution module, the second activation function module and the super-division module are sequentially connected;
the input data of the first convolution module is a low-dimensional feature map output by the backbone network, and the input data of the up-sampling module is a high-dimensional feature map output by the backbone network.
4. The hidden danger area intrusion detection method according to claim 3, wherein the activation function adopted in the first activation function module and the second activation function module is a LeakyRelu; the fusion module adopts Concat to perform feature fusion; the super division algorithm adopted in the super division module is an ESRGAN algorithm.
5. The hidden danger area intrusion detection method according to claim 1, wherein the objective loss function is L_total = c1·L0 + c2·Ls, wherein L_total represents the overall loss value of the target detection model, L0 represents the original loss value calculated from the original loss function, Ls represents the auxiliary loss value calculated from the auxiliary loss function, and c1 and c2 are preset weights.
6. The hidden danger area intrusion detection method according to claim 1, wherein the original loss function includes a class loss function and a border regression loss function, the class loss function being a VFL loss function, the border regression loss function being a DFL loss function; the auxiliary loss function is an L1 loss function.
7. The hidden danger area intrusion detection method according to claim 1, wherein the intrusion detection is performed according to the target detection frame and a preset hidden danger area, specifically:
and if the intersection ratio of the target detection frame and the preset hidden danger area is larger than a preset threshold value, judging that the target intrusion hidden danger area exists.
8. A hidden danger area intrusion detection device, comprising:
the acquisition module is used for acquiring a target detection data set, wherein the target detection data set comprises a first resolution image and a second resolution image which are in one-to-one correspondence, and the second resolution is a preset multiple of the first resolution;
the system comprises a building module, a target detection module and a display module, wherein the target detection module comprises a detection network module and a super-resolution network module, the super-resolution network module is used for fusing a high-dimensional feature image and a low-dimensional feature image which are output by a main network in the detection network module and generating a super-resolution image, the detection network module is a YOLO module, the high-dimensional feature image is a feature image with the downsampling multiple being greater than or equal to a preset multiple, the low-dimensional feature image is a feature image with the downsampling multiple being less than the preset multiple, and the resolution of the super-resolution image is a preset multiple of the resolution of an input image of the detection network module;
the training module is used for training a detection network model in the target detection model according to the target detection data set and a preset target loss function, wherein input data of the detection network model is a first resolution image in the target detection data set, the target loss function is constructed according to an original loss function corresponding to the detection network model and an auxiliary loss function corresponding to the super-resolution network model, and the auxiliary loss function is used for calculating errors between the super-resolution image output by the super-resolution network model and a second resolution image corresponding to the input first resolution image;
the target detection module is used for acquiring a real-time monitoring image, and carrying out target detection through a trained detection network model to obtain a target detection frame;
and the intrusion detection module is used for performing intrusion detection according to the target detection frame and a preset hidden danger area.
9. An electronic device, the electronic device comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the hidden danger area intrusion detection method according to any one of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the hidden danger area intrusion detection method according to any one of claims 1-7.
CN202311454716.4A 2023-11-03 2023-11-03 Hidden danger area intrusion detection method, device, equipment and storage medium Active CN117197756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311454716.4A CN117197756B (en) 2023-11-03 2023-11-03 Hidden danger area intrusion detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311454716.4A CN117197756B (en) 2023-11-03 2023-11-03 Hidden danger area intrusion detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117197756A CN117197756A (en) 2023-12-08
CN117197756B true CN117197756B (en) 2024-02-27

Family

ID=89000179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311454716.4A Active CN117197756B (en) 2023-11-03 2023-11-03 Hidden danger area intrusion detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117197756B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062872A (en) * 2019-12-17 2020-04-24 暨南大学 Image super-resolution reconstruction method and system based on edge detection
CN113221925A (en) * 2021-06-18 2021-08-06 北京理工大学 Target detection method and device based on multi-scale image
CN113901928A (en) * 2021-10-13 2022-01-07 长沙理工大学 Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN114022359A (en) * 2021-11-03 2022-02-08 深圳大学 Image super-resolution model training method and device, storage medium and equipment
CN115719463A (en) * 2022-11-21 2023-02-28 珠海市金锐电力科技有限公司 Smoke and fire detection method based on super-resolution reconstruction and adaptive extrusion excitation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062872A (en) * 2019-12-17 2020-04-24 暨南大学 Image super-resolution reconstruction method and system based on edge detection
CN113221925A (en) * 2021-06-18 2021-08-06 北京理工大学 Target detection method and device based on multi-scale image
CN113901928A (en) * 2021-10-13 2022-01-07 长沙理工大学 Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN114022359A (en) * 2021-11-03 2022-02-08 深圳大学 Image super-resolution model training method and device, storage medium and equipment
CN115719463A (en) * 2022-11-21 2023-02-28 珠海市金锐电力科技有限公司 Smoke and fire detection method based on super-resolution reconstruction and adaptive extrusion excitation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Real-time personnel search mechanism and analysis based on aerial images; Liang Yongchun; Tian Liqin; Chen Nan; Zhu Honggen; Journal of North China Institute of Science and Technology (02); pp. 104-111 *

Also Published As

Publication number Publication date
CN117197756A (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN107808111B (en) Method and apparatus for pedestrian detection and attitude estimation
US20180114071A1 (en) Method for analysing media content
CN111696110B (en) Scene segmentation method and system
CN111480169A (en) Method, system and apparatus for pattern recognition
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
Liu et al. An attention-based multiscale transformer network for remote sensing image change detection
CN115375999B (en) Target detection model, method and device applied to hazardous chemical vehicle detection
CN111985374A (en) Face positioning method and device, electronic equipment and storage medium
CN115063786A (en) High-order distant view fuzzy license plate detection method
CN111652181B (en) Target tracking method and device and electronic equipment
CN113628245A (en) Multi-target tracking method, device, electronic equipment and storage medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN113989744A (en) Pedestrian target detection method and system based on oversized high-resolution image
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN114463624A (en) Method and device for detecting illegal buildings applied to city management supervision
CN112085680B (en) Image processing method and device, electronic equipment and storage medium
CN113052139A (en) Deep learning double-flow network-based climbing behavior detection method and system
CN112597996A (en) Task-driven natural scene-based traffic sign significance detection method
CN117197756B (en) Hidden danger area intrusion detection method, device, equipment and storage medium
CN116310868A (en) Multi-level attention interaction cloud and snow identification method, equipment and storage medium
CN112446292B (en) 2D image salient object detection method and system
CN111160255B (en) Fishing behavior identification method and system based on three-dimensional convolution network
CN114582012A (en) Skeleton human behavior recognition method, device and equipment
CN114511702A (en) Remote sensing image segmentation method and system based on multi-scale weighted attention
CN113554655A (en) Optical remote sensing image segmentation method and device based on multi-feature enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant