CN111815570A - Regional intrusion detection method and related device thereof - Google Patents

Regional intrusion detection method and related device thereof

Info

Publication number
CN111815570A
Authority
CN
China
Prior art keywords
area
current scene
scene image
layer
warning area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010549904.5A
Other languages
Chinese (zh)
Inventor
任宇鹏
卢维
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010549904.5A priority Critical patent/CN111815570A/en
Publication of CN111815570A publication Critical patent/CN111815570A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30232 Surveillance
    • G06T 2207/30248 Vehicle exterior or interior

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a regional intrusion detection method and a related device. The regional intrusion detection method comprises the following steps: obtaining a current scene image; determining a warning area integral map corresponding to the current scene image; obtaining a target object frame belonging to a target object in the current scene image by using an object detection model; calculating the overlap area of the target object frame and the warning area from the pixel values of the integral-map pixel points corresponding to the vertices of the target object frame; and determining, based on the overlap area, whether the target object has intruded into the warning area. The method reduces computational complexity and computation time and improves the real-time performance of detection.

Description

Regional intrusion detection method and related device thereof
Technical Field
The present application relates to the field of image processing technology, and in particular to a regional intrusion detection method and a related device.
Background
With the continuous development of intelligent monitoring and the increasing maturity of image processing technology, manual inspection alone can no longer meet the demand for judging whether a target object has intruded into a warning area. Intelligent monitoring systems built on artificial intelligence and video analysis can largely make up for this shortage of manpower by automatically judging from images whether a target object has intruded into a warning area. However, existing regional intrusion detection methods have high computational complexity, long computation times, and poor real-time performance.
Disclosure of Invention
The application provides a regional intrusion detection method and a related device. The regional intrusion detection method reduces computational complexity and computation time and improves the real-time performance of detection.
In order to solve the above technical problem, the present application provides a regional intrusion detection method, including:
obtaining a current scene image;
determining a warning area integral map corresponding to the current scene image, wherein the value of each pixel point of the integral map is proportional to the area occupied by the warning area within the rectangular frame formed, in the current scene image, by that pixel point and a fixed pixel point;
obtaining a target object frame belonging to a target object in the current scene image by using an object detection model;
calculating the overlap area of the target object frame and the warning area according to the pixel values of the pixel points, in the warning area integral map, corresponding to the vertices of the target object frame;
and determining whether the target object has intruded into the warning area based on the overlap area.
In order to solve the above technical problem, the application further provides a regional intrusion detection device, which comprises a processor and a memory; the memory stores a computer program, and the processor executes the computer program to perform the steps of the regional intrusion detection method described above.
To solve the above technical problem, the present application further provides a computer storage medium storing a computer program which, when executed, implements the steps of the regional intrusion detection method described above.
After the current scene image is obtained, the warning area integral map corresponding to the current scene image is determined first. A target object frame belonging to a target object in the current scene image is then obtained through the object detection model. The overlap area of the target object frame and the warning area can then be calculated from the pixel values of the integral-map pixel points corresponding to the four vertices of the target object frame, and whether the target object has intruded into the warning area is determined from the calculated overlap area. Because the overlap area is obtained from only four pixel values of the warning area integral map, the computational complexity is low, the computation time is short, and the real-time performance is high.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of an embodiment of the regional intrusion detection method of the present application;
FIG. 2 is a schematic diagram of a warning area integral map in the regional intrusion detection method of the present application;
FIG. 3 is a schematic diagram illustrating the calculation of the overlap area in the regional intrusion detection method of the present application;
FIG. 4 is a schematic diagram of an application example of the regional intrusion detection method of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the panorama segmentation network of the present application;
FIG. 6 is a schematic flow chart of an embodiment of warning area division in the regional intrusion detection method of the present application;
FIG. 7 is a schematic structural diagram of an embodiment of the regional intrusion detection device of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of the computer storage medium of the present application.
Detailed Description
In order to help those skilled in the art better understand the technical solution of the present application, the regional intrusion detection method and related device provided by the present application are described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of the regional intrusion detection method of the present application. As shown in fig. 1, the regional intrusion detection method of this embodiment specifically includes the following steps.
S110: acquiring a current scene image.
S120: determining a warning area integral map corresponding to the current scene image.
The value of each pixel point of the warning area integral map is proportional to the area occupied by the warning area within the rectangular frame formed, in the current scene image, by that pixel point and a fixed pixel point. The fixed pixel point may be any pixel point in the current scene image; specifically, it may be the top-left or top-right vertex of the current scene image. Optionally, the value of each pixel point of the warning area integral map is the area occupied by the warning area within the rectangular frame formed by that pixel point and the top-left vertex of the current scene image.
In one implementation, a previously computed warning area integral map corresponding to the current scene image can be obtained directly, and the overlap area of the target object frame and the warning area is calculated from it. Whether a target object has intruded into the warning area in the current scene image can then be judged by reading and combining just a few pixel values of the integral map, so the time overhead of computing the intrusion result is independent of the size of the current scene image; the computation time and complexity are low, and the real-time performance is high.
In another implementation, the warning area integral map may be computed from the current scene image. Specifically, as shown in fig. 2, obtaining the warning area integral map from the current scene image may include: performing warning area division on the current scene image and determining the warning area in it; assigning the pixel points inside the warning area a fixed non-zero value (for example 1) and assigning the pixel points outside the warning area the value 0, to obtain a region division map; and taking, for each pixel point, the sum of all pixel values within the rectangular frame formed by that pixel point and the fixed pixel point in the region division map as the value of the corresponding pixel point of the warning area integral map.
It is understood that the rectangular frame formed by a pixel point and the fixed pixel point refers to the rectangle whose two diagonal vertices are that pixel point and the fixed pixel point.
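For illustration only (not part of the original disclosure), this construction can be sketched in a few lines of Python, assuming the region division map is a binary numpy array and the fixed pixel point is the top-left vertex; the function name is ours:

```python
import numpy as np

def warning_area_integral_map(warning_mask: np.ndarray) -> np.ndarray:
    """Build the warning area integral map from a binary region division map.

    warning_mask: H x W array, 1 inside the warning area and 0 elsewhere.
    Each output pixel holds the sum of mask values over the rectangle spanned
    by that pixel and the top-left vertex, i.e. the warning-area pixel count.
    """
    # Cumulative sums along rows and then columns yield a summed-area table.
    return warning_mask.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
```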
In practice, if the monitored area or scene does not change, the warning area integral map only needs to be built once, from the warning area division result of one or more frames of the current scene; the same integral map can then be used to judge whether target objects in all subsequent frames intrude into the warning area, until the monitored area or scene changes. As long as the scene is unchanged, intrusion can be judged with the previously built integral map, so the time overhead of computing an intrusion result is independent of the input image size: the result is obtained from the pixel values at the four vertices of the target object frame, with time complexity O(1). This saves computation time as well as the space overhead of the regional intrusion detection method, and its advantage is pronounced when processing surveillance video with high resolution and frame rate.
It will be appreciated that the warning area of the current scene image is the same as the warning area of the image from which the integral map was built. The warning area integral map contains a pixel point for every pixel point of the current scene image, so the overlap area of a target object frame and the warning area can be computed from the integral map at any position of the current scene image.
S130: obtaining a target object frame belonging to the target object in the current scene image by using the object detection model.
In this embodiment, the target object frame belonging to a target object in the current scene image may be detected by any object detection network.
For example, an object frame classification model may be used to determine a target object frame that frames the target object in the current scene image. In this embodiment, the target object frame is rectangular.
In addition, the confidence corresponding to each target object frame can be determined. To ensure that the finally obtained target object frames correspond to different objects, non-maximum suppression (NMS) can be applied to remove redundant target object frames according to their confidences, which makes it convenient to determine from the remaining frames whether a target object has intruded into the warning area. Specifically, suppose the candidate target object frames, arranged by confidence from high to low, are A, B, C, D, E, F. First select the frame A with the highest confidence, compute the IoU between A and each remaining frame, and remove the frames, say B and C, whose IoU with A exceeds a threshold (e.g. 0.6); A is then taken out of the candidates and stored as one of the final target object frames. Next select the remaining frame with the highest confidence, say D, compute the IoU between D and the remaining frames E and F, and again remove those exceeding the threshold. This repeats until no candidate frames remain; the stored frames are the finally confirmed target object frames.
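The greedy procedure just described is standard NMS; a generic sketch follows (not code from the patent; the [x1, y1, x2, y2] box layout and the 0.6 threshold are illustrative):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.6) -> list:
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]   # candidate indices, confidence descending
    keep = []
    while order.size > 0:
        i = order[0]                 # highest-confidence remaining frame
        keep.append(i)
        # Intersection of frame i with every other remaining frame.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop frames whose IoU with the kept frame exceeds the threshold.
        order = order[1:][iou <= iou_thresh]
    return keep
```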
S140: calculating the overlap area of the target object frame and the warning area according to the pixel values of the pixel points, in the warning area integral map, corresponding to the vertices of the target object frame.
Optionally, the overlap area S_intrusion of the target object frame and the warning area is calculated from the pixel values of the integral-map pixel points corresponding to the four vertices of the target object frame.
Optionally, the pixel values of the four vertices of the target object frame on the warning area integral map are looked up first, and the overlap area is then obtained by adding and subtracting them. For example, as shown in fig. 3, S_intrusion = S_DO - S_BO - S_CO + S_AO, where S_intrusion is the overlap area of the target object frame and the warning area, and S_AO, S_BO, S_CO and S_DO are the pixel values on the warning area integral map at the points corresponding to the four vertices A, B, C and D of the target object frame.
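In code, the four-lookup computation might read as below; the boundary convention (reading one pixel above/left of the top and left edges so that the sum covers the frame inclusively) is a standard summed-area-table detail that the patent leaves implicit:

```python
def overlap_area(integral_map, box):
    """Overlap of a rectangular target object frame with the warning area.

    integral_map: summed-area table of the binary warning mask.
    box: (x1, y1, x2, y2), inclusive pixel coordinates of the object frame.
    Implements S_intrusion = S_DO - S_BO - S_CO + S_AO with four lookups.
    """
    x1, y1, x2, y2 = box
    s_do = integral_map[y2, x2]                                    # vertex D
    s_bo = integral_map[y1 - 1, x2] if y1 > 0 else 0               # vertex B
    s_co = integral_map[y2, x1 - 1] if x1 > 0 else 0               # vertex C
    s_ao = integral_map[y1 - 1, x1 - 1] if min(x1, y1) > 0 else 0  # vertex A
    return s_do - s_bo - s_co + s_ao
```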
S150: determining whether the target object has intruded into the warning area based on the overlap area.
In one implementation, whether the target object has intruded into the warning area can be determined directly from the overlap area: if the overlap area is larger than a first threshold, it is determined that the target object has intruded into the warning area; if the overlap area is smaller than the first threshold, it is determined that it has not.
In another implementation, the ratio of the overlap area to the area of the target object frame may be calculated, and intrusion is then judged from this ratio: if the ratio is greater than a second threshold, it is determined that the target object has intruded into the warning area; if the ratio is less than the second threshold, it is determined that it has not.
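Both decision rules are one-liners on top of the lookup above; here is the ratio variant, reusing the overlap_area sketch (the 0.3 value for the second threshold is purely illustrative, the patent does not fix it):

```python
def intrudes(integral_map, box, second_threshold=0.3):
    """Judge intrusion from the ratio of overlap area to object-frame area."""
    x1, y1, x2, y2 = box
    frame_area = (x2 - x1 + 1) * (y2 - y1 + 1)  # inclusive pixel coordinates
    return overlap_area(integral_map, box) / frame_area > second_threshold
```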
In this embodiment, after the current scene image is obtained, the warning area integral map corresponding to the current scene image is determined; a target object frame belonging to a target object in the current scene image is obtained through the object detection model; the overlap area of the target object frame and the warning area is calculated from the pixel values of the integral-map pixel points corresponding to the four vertices of the target object frame; and whether the target object has intruded into the warning area is determined from the calculated overlap area. Because the overlap area is computed from only four pixel values of the warning area integral map, the computational complexity is low, the computation time is short, and the real-time performance is high.
In one application scenario, shown in fig. 4, the above regional intrusion detection method is applied to judging whether a vehicle intrudes into the diversion-line (gore) area at an expressway checkpoint. First, panorama segmentation is performed on the original checkpoint image to confirm the warning area; for example, the no-driving diversion-line area in panel (f) is divided out as the warning area, and the warning area integral map is computed from it. After the vehicle detection frame of each frame is obtained, the overlap area of the vehicle detection frame and the warning area can be computed rapidly, and whether the vehicle has intruded into the warning area is judged rapidly by computing the ratio of the overlap area to the area of the target object frame and comparing it with the second threshold.
Further, in step S120, to determine the warning area integral map corresponding to the current scene image, the current scene image may first undergo warning area division, and the integral map is then built from the divided warning area. Optionally, a trained panorama segmentation network is used to segment the current scene image and obtain the warning area. The panorama segmentation network can be trained on multiple scene images annotated with regions of various semantic categories (such as warning areas), so that the trained network can segment a variety of scene images and adaptively segment the warning area when the scene changes. Further, the panorama segmentation network may be trained on training samples annotated both with semantic category regions and with the regions where instances are located. As shown in fig. 5, the panorama segmentation network may include a feature extraction layer, an object detection layer, and a panorama segmentation layer. Specifically, as shown in fig. 6, warning area division of the current scene image may include the following steps.
S121: extracting a feature map of the current scene image through the feature extraction layer.
Optionally, the feature extraction layer may use an encoder-decoder structure or a dilated (atrous) convolution structure, both of which generate richer semantic information and higher-resolution feature maps, thereby strengthening the ability to recognize larger and smaller objects and making that recognition more robust.
Preferably, the feature extraction layer uses an encoder-decoder structure. The encoding structure in the feature extraction layer may include down-sampling units that process the current scene image and extract its feature maps. Optionally, the encoding structure includes at least two down-sampling units, so that feature maps of multiple levels of the current scene image are obtained. These levels correspond to features of different complexity: low-level feature maps correspond to relatively simple features, such as points and lines in the image, while mid- and high-level feature maps correspond to more complex features; for example, mid-level features describe the outline of an object in the image, and high-level features describe its semantic information. Specifically, a down-sampling unit may include a second convolution layer and a down-sampling layer, where the down-sampling layer may be a pooling layer.
The decoding structure is connected after the encoding structure and outputs a feature map with the same resolution as the current scene image. The decoding structure includes up-sampling units, of which at least two may be provided. Optionally, an up-sampling unit includes an up-sampling layer and a third convolution layer, where the up-sampling layer may be an interpolation layer.
In addition, the feature extraction layer may include inter-layer skip connection layers. An inter-layer skip connection layer connects the input feature map of a down-sampling layer in the encoding structure with the same-resolution output feature map of an up-sampling layer in the decoding structure, so that feature maps containing low-level information from the encoding structure are fused with feature maps containing high-level information from the decoding structure, and the final feature map of the network contains both image detail and high-level semantic information. The number of inter-layer skip connection layers may equal the number of down-sampling units and of up-sampling units in the feature extraction layer.
Further, the feature extraction layer may also include a first convolution layer, connected after all the down-sampling units and before all the up-sampling units. The first convolution layer processes the output feature map of the last down-sampling unit to obtain a feature map with the same resolution as that output.
For example, as shown in fig. 5, the feature extraction layer includes 3 down-sampling units, 2 first convolution layers, 3 up-sampling units, and 3 inter-layer skip connection layers. A down-sampling unit has a Conv3-Conv3-Maxpooling structure. An up-sampling unit has an Upsampling-Conv3-Conv3 structure, i.e. it comprises a bilinear interpolation layer and third convolution layers. An inter-layer skip connection layer adds, point by point, or directly concatenates the input feature map of a pooling layer with the same-resolution output feature map of the interpolation layer in an up-sampling unit. The first convolution layer has a Conv3 structure. In one application scenario, an image with resolution 512 × 512 is input into the feature extraction layer; the resolution of the feature map is halved by each down-sampling unit (e.g. from 512 × 512 to 256 × 256) and doubled by each up-sampling unit (e.g. from 256 × 256 to 512 × 512).
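A compact PyTorch sketch of this layout follows, under our own assumptions (64 channels throughout, ReLU activations, additive skip connections; the patent fixes none of these):

```python
import torch
import torch.nn as nn

def conv3(in_ch, out_ch):
    # One "Conv3" block: 3x3 convolution followed by an activation.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU(inplace=True))

class FeatureExtractionLayer(nn.Module):
    """Encoder-decoder with 3 Conv3-Conv3-Maxpooling down-sampling units,
    two Conv3 bottleneck layers (the "first convolution layers"), 3
    Upsampling-Conv3-Conv3 up-sampling units, and point-by-point skips."""

    def __init__(self, ch=64):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Sequential(conv3(3 if i == 0 else ch, ch), conv3(ch, ch))
            for i in range(3))
        self.pool = nn.MaxPool2d(2)
        self.first_convs = nn.Sequential(conv3(ch, ch), conv3(ch, ch))
        self.up = nn.ModuleList(
            nn.Sequential(conv3(ch, ch), conv3(ch, ch)) for _ in range(3))
        self.interp = nn.Upsample(scale_factor=2, mode="bilinear",
                                  align_corners=False)

    def forward(self, x):
        skips = []
        for down in self.down:
            x = down(x)
            skips.append(x)    # input of the pooling layer, kept for the skip
            x = self.pool(x)   # resolution halves, e.g. 512 -> 256
        x = self.first_convs(x)
        for i, up in enumerate(self.up):
            x = self.interp(x)           # bilinear interpolation doubles resolution
            x = up(x + skips[-(i + 1)])  # point-by-point skip, then Conv3-Conv3
        return x
```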
The feature map extracted by the feature extraction layer is fed into both the object detection layer and the panorama segmentation layer, realizing parameter sharing between them. The panorama segmentation network can thereby effectively integrate instance information with semantic category region information, better eliminate the interference of instances on semantic region segmentation, and produce more robust segmentation results.
S122: performing object frame detection on the feature map through the object detection layer to obtain at least one object frame.
The object detection layer may be any model that implements the object detection function. It is trained in advance on label maps in which objects have been annotated: in the pre-training stage, objects are first annotated in the images, and the object detection layer is trained with these label maps to obtain the trained object detection layer.
Object detection is performed by the trained object detection layer on the feature map of the current scene image extracted by the feature extraction layer, confirming an object frame that frames each object. The object frame may be a rectangular frame that frames an object in the current scene image.
In addition, the confidence corresponding to each object frame can be determined. To ensure that the finally obtained object frames correspond to different objects, non-maximum suppression (NMS) may again be applied, removing redundant object frames according to their confidences to obtain the finally confirmed object frames.
Of course, after the object frame of each object is finally determined, the object category and/or instance category corresponding to each object frame may also be recorded, so that the panorama segmentation layer can determine which instance each pixel point belongs to. Specifically, the object detection layer can output the probability that the region selected by an object frame belongs to each instance, i.e. the confidence of each object frame for each instance; the instance with the highest confidence for an object frame can be taken as the instance corresponding to that frame.
S123: processing the feature map according to the detected object frames to obtain an attention mask map.
The attention mask map may be obtained based on the at least one object frame acquired in step S122.
In one implementation, a heatmap corresponding to each object frame may be generated directly from the at least one object frame obtained in step S122 to form the attention mask map.
In another implementation, a predetermined number of object frames may be selected from the at least one detected object frame, and the feature map is processed according to the selected frames. If the predetermined number is greater than the number of detected object frames, a heatmap is generated for each detected frame, and the channels corresponding to the surplus frame slots are set to zero to obtain the attention mask map. If the predetermined number is smaller than the number of detected object frames, the predetermined number of frames with the highest confidence are selected from the detected frames, in descending order of detection confidence; a heatmap is then generated for each selected frame to obtain the attention mask map.
It can be understood that the pixel values of the heatmaps corresponding to the surplus frame slots are all zero, i.e. those channels of the attention mask map are zeroed.
Optionally, for each detected object frame, a heatmap can be generated from the frame's position and size. Specifically, the pixel values of the heatmap can be given by a two-dimensional Gaussian kernel function whose mean is the coordinates of the object frame's center point and whose covariance matrix has diagonal elements equal to 1/4 of the object frame's width and height; that is, the heatmap of each object is obtained with this two-dimensional Gaussian kernel.
In one implementation, the heatmaps of the individual object frames can be directly concatenated to obtain the attention mask map.
In another implementation, the pixel values of each heatmap may first be normalized to the range 0 to 1, and all the normalized heatmaps are then concatenated to obtain the attention mask map.
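A sketch of this heatmap construction, reading the covariance description as per-axis standard deviations of w/4 and h/4 (one plausible interpretation; the patent does not say whether the diagonal entries are variances or standard deviations), with the 50-slot channel count borrowed as an assumption from the segmentation head described later:

```python
import numpy as np

def box_heatmap(h, w, box):
    """Heatmap for one object frame: a 2-D Gaussian centered on the frame,
    with sigmas of one quarter of the frame's width and height."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sx, sy = max(x2 - x1, 1) / 4.0, max(y2 - y1, 1) / 4.0
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)
    return g / g.max()  # normalize pixel values to the range 0..1

def attention_mask(h, w, boxes, num_slots=50):
    """Stack per-frame heatmaps into a fixed number of channels; slots beyond
    the number of detected frames stay all-zero, as described above."""
    mask = np.zeros((num_slots, h, w), dtype=np.float32)
    for i, box in enumerate(boxes[:num_slots]):
        mask[i] = box_heatmap(h, w, box)
    return mask
```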
S124: processing the attention mask map and the feature map through the panorama segmentation layer to obtain a segmentation result.
In one implementation, the feature map and the attention mask map may be combined along the channel dimension to obtain a feature map that includes the attention mask.
In another implementation, the feature maps of multiple resolutions extracted by the feature extraction layer can first be aggregated across scales to obtain an aggregated feature map, which conveniently combines deep and shallow features so as to retain both detail information and semantic information. The aggregated feature map and the attention mask map are then combined along the channel dimension to obtain the feature map that includes the attention mask.
Specifically, the output feature maps of all the up-sampling units in the decoding structure may be aggregated across scales: they are first brought to the same resolution, and these same-resolution feature maps are then directly concatenated, or added point by point, to obtain the aggregated feature map. It is understood that the output feature maps of all the up-sampling units include the feature map output by the feature extraction layer.
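For instance, the aggregation step might look like this (concatenation chosen over point-by-point addition; torch.nn.functional.interpolate stands in for whatever resizing the network actually uses):

```python
import torch
import torch.nn.functional as F

def multi_scale_aggregate(feature_maps):
    """Resize the output feature maps of all up-sampling units to one common
    resolution, then concatenate them along the channel dimension."""
    h, w = feature_maps[-1].shape[-2:]  # take the finest resolution as common
    resized = [F.interpolate(f, size=(h, w), mode="bilinear",
                             align_corners=False) for f in feature_maps]
    return torch.cat(resized, dim=1)
```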
After the feature map including the attention mask is obtained in the above manner, it is processed by the panorama segmentation layer to obtain the segmentation result. Specifically, the probability that each pixel of the current scene image belongs to each instance and to each semantic segmentation category is determined from the feature map including the attention mask; the category of each pixel is then determined from these probabilities, and the categories of all pixel points are combined into the segmentation result.
The panorama segmentation layer can apply multiple convolution steps, through multiple convolution layers, to the feature map including the attention mask, generating a feature map with (50 + number of semantic segmentation categories) channels. Each of these channels has the same width and height as the current scene image, and the pixel value at each position of a channel is the probability that the corresponding pixel of the current scene image belongs to the instance or semantic segmentation category associated with that channel.
The panorama segmentation layer may determine the category of each pixel of the current scene image from these probabilities through an Argmax function; specifically, each pixel is assigned the category for which its probability is highest.
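In other words, the head reduces to a channel-wise argmax; a sketch follows (the softmax is our assumption for turning the final convolution outputs into the per-channel probabilities the text describes):

```python
import torch

def per_pixel_categories(head_output):
    """head_output: [50 + num_semantic_classes, H, W] feature map from the
    panorama segmentation layer. Returns an [H, W] map of category indices;
    per the description above, the first 50 channels correspond to instances
    and the remaining channels to semantic segmentation categories."""
    probs = torch.softmax(head_output, dim=0)  # probability per channel per pixel
    return probs.argmax(dim=0)                 # highest-probability category wins
```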
In addition, because an instance in the current scene image may occlude part of the region of a semantic segmentation category, several current scene images may be segmented and their segmentation results combined into the final panorama segmentation result. Specifically, the union over the several images of each semantic segmentation category's region may be taken as the region of that category in the final segmentation result.
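The cross-frame merge is then a simple union of binary masks per semantic category, e.g. (hypothetical helper, with segmentations a list of per-frame category maps):

```python
import numpy as np

def merged_class_region(segmentations, class_id):
    """Union over several frames of the region labelled class_id, so that a
    region occluded by an instance in one frame is still recovered."""
    return np.logical_or.reduce([seg == class_id for seg in segmentations])
```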
In the process of segmenting the current scene image with the panorama segmentation network, the feature extraction layer first extracts the feature map of the current scene image; the object detection layer then detects at least one object frame from the feature map, and the attention mask map is confirmed from these frames; finally the panorama segmentation layer processes the attention mask map together with the feature map to obtain the segmentation result. The feature extraction layer thus extracts a feature map used by both the object detection layer and the panorama segmentation layer, realizing parameter sharing and reducing the amount of computation. Moreover, because the attention mask fed into the panorama segmentation layer is derived from the object detection layer, the panorama segmentation layer can judge not only the semantic category of each pixel point of the current scene image but also the instance it belongs to. Semantic segmentation and instance segmentation are therefore realized by the single panorama segmentation network of the present application, without relying on two independent networks, which further reduces the amount of computation.
The above regional intrusion detection method is generally implemented by a regional intrusion detection device, so the present application also provides a regional intrusion detection device. Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the regional intrusion detection device of the present application. The regional intrusion detection device 10 comprises a processor 12 and a memory 11; the memory 11 stores a computer program, and the processor 12 executes the computer program to implement the steps of the regional intrusion detection method described above.
The logic of the above regional intrusion detection method can be embodied as a computer program, and if the computer program is sold or used as a stand-alone software product it can be stored in a computer storage medium; the present application therefore also proposes a computer storage medium. Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of the computer storage medium 20 of the present application, in which a computer program 21 is stored; when executed by a processor, the computer program implements the steps of the regional intrusion detection method described above.
The computer storage medium 20 may be a medium that can store a computer program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, or it may be a server storing the computer program; the server can send the stored computer program to another device for execution or run it itself. Physically, the computer storage medium 20 may also be a combination of several entities, for example several servers, a server plus a memory, or a memory plus a removable hard disk.
The above embodiments are merely examples and are not intended to limit the scope of the present disclosure, and all modifications, equivalents, and flow charts using the contents of the specification and drawings of the present disclosure or those directly or indirectly applied to other related technical fields are intended to be included in the scope of the present disclosure.

Claims (12)

1. A regional intrusion detection method is characterized by comprising the following steps:
obtaining a current scene image;
determining a warning area integral map corresponding to the current scene image, wherein the value of each pixel point of the integral map is proportional to the area occupied by the warning area within the rectangular frame formed, in the current scene image, by that pixel point and a fixed pixel point;
obtaining a target object frame belonging to a target object in the current scene image by using an object detection model;
calculating the overlap area of the target object frame and the warning area according to the pixel values of the pixel points, in the warning area integral map, corresponding to the vertices of the target object frame;
and determining whether the target object has intruded into the warning area based on the overlap area.
2. The method according to claim 1, wherein the determining the warning area integral map corresponding to the current scene image comprises:
performing warning area division on the current scene image, and determining a warning area in the current scene image;
assigning the pixel points in the warning area a fixed non-zero value, and assigning the pixel points outside the warning area in the current scene image the value 0, to obtain a region division map;
and taking the sum of all pixel values within a rectangular frame formed by each pixel point and the fixed pixel point in the region division map as the value of the corresponding pixel point of the warning area integral map.
3. The method according to claim 2, wherein the warning area division of the current scene image comprises:
segmenting the current scene image by utilizing a trained panoramic segmentation network to obtain a warning area, wherein the panoramic segmentation network comprises a feature extraction layer, an object detection layer and a panoramic segmentation layer;
the segmenting the current scene image by using the trained panorama segmentation network comprises the following steps:
extracting a feature map of the current scene image through the feature extraction layer;
performing object frame detection on the feature map through the object detection layer to obtain at least one object frame;
processing the feature map according to the detected object frames to obtain an attention mask map;
and processing the attention mask map and the feature map through the panorama segmentation layer to obtain a segmentation result.
4. The method according to claim 3, wherein the processing the attention mask map and the feature map through the panorama segmentation layer comprises:
combining the attention mask map and the feature map along the channel dimension;
and processing the feature map including the attention mask map through the panorama segmentation layer.
5. The method according to claim 4, wherein the feature extraction layer comprises at least two down-sampling units, a first convolution layer connected after the at least two down-sampling units, at least two inter-layer skip connection layers, and at least two up-sampling units connected after the first convolution layer, wherein the inter-layer skip connection layers are used for skip-connecting the feature maps obtained by the down-sampling units with the feature maps obtained by the corresponding up-sampling units.
6. The method according to claim 5, wherein the combining of the attention mask map and the feature map along the channel dimension comprises:
performing a multi-scale aggregation operation on the feature maps output by all the up-sampling units to obtain an aggregated feature map, wherein the feature maps output by all the up-sampling units comprise the feature map;
and combining the aggregated feature map and the attention mask map along the channel dimension.
7. The method according to claim 3,
the processing the feature map according to the detected object frame includes:
and selecting a preset number of detected object frames, and processing the characteristic diagram according to the preset number of selected object frames.
8. The method according to claim 7, wherein
the selecting of the predetermined number of detected object frames and the processing of the feature map according to the selected predetermined number of object frames includes:
and if the preset number is larger than the number of the object frames obtained by detection, performing zero setting processing on the feature map corresponding to the redundant number of the object frames.
9. The method according to claim 7, wherein
the selecting of the predetermined number of detected object frames comprises:
selecting the predetermined number of object frames in descending order of object frame detection confidence.
10. The method according to claim 3,
processing the feature map according to the detected object frame to obtain an attention mask map, including:
processing the feature map with a two-dimensional Gaussian kernel function whose mean is the coordinates of the center point of a detected object frame and whose covariance matrix has diagonal elements equal to one quarter of the width and the height of the object frame, so as to obtain a heat mask map;
and normalizing the heat mask map to obtain the attention mask map.
11. A regional intrusion detection device, characterized in that the regional intrusion detection device comprises a memory and a processor; the memory stores a computer program, and the processor executes the computer program to carry out the steps of the regional intrusion detection method of any one of claims 1 to 10.
12. A computer storage medium on which a computer program is stored, characterized in that the computer program realizes the steps in the method of any of claims 1-10 when executed by a processor.
CN202010549904.5A 2020-06-16 2020-06-16 Regional intrusion detection method and related device thereof Pending CN111815570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010549904.5A CN111815570A (en) 2020-06-16 2020-06-16 Regional intrusion detection method and related device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010549904.5A CN111815570A (en) 2020-06-16 2020-06-16 Regional intrusion detection method and related device thereof

Publications (1)

Publication Number Publication Date
CN111815570A true CN111815570A (en) 2020-10-23

Family

ID=72845705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010549904.5A Pending CN111815570A (en) 2020-06-16 2020-06-16 Regional intrusion detection method and related device thereof

Country Status (1)

Country Link
CN (1) CN111815570A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060093187A1 (en) * 2004-10-12 2006-05-04 Anurag Mittal Video-based encroachment detection
JP2008102593A (en) * 2006-10-17 2008-05-01 Seiko Epson Corp Object detection device and control method and control program of object detection device
CN103116987A (en) * 2013-01-22 2013-05-22 华中科技大学 Traffic flow statistic and violation detection method based on surveillance video processing
CN103700087A (en) * 2013-11-28 2014-04-02 深圳市智美达科技有限公司 Motion detection method and device
CN106600333A (en) * 2016-12-20 2017-04-26 北京平塔科技有限公司 Virtual reality advertisement position content exposure monitoring method and device
CN110310301A (en) * 2018-03-27 2019-10-08 华为技术有限公司 A kind of method and device detecting target image
CN109257569A (en) * 2018-10-24 2019-01-22 广东佳鸿达科技股份有限公司 Security protection video monitoring analysis method
CN109948626A (en) * 2019-03-08 2019-06-28 旺微科技(浙江)有限公司 A kind of object detection method and device
CN111178371A (en) * 2019-12-17 2020-05-19 深圳市优必选科技股份有限公司 Target detection method, apparatus and computer storage medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580443B (en) * 2020-12-02 2022-03-15 燕山大学 Pedestrian detection method based on embedded device improved CenterNet
CN112580443A (en) * 2020-12-02 2021-03-30 燕山大学 Pedestrian detection method based on embedded device improved CenterNet
CN112766137A (en) * 2021-01-14 2021-05-07 华南理工大学 Dynamic scene foreign matter intrusion detection method based on deep learning
CN112883866A (en) * 2021-02-08 2021-06-01 上海新纪元机器人有限公司 Method, system and storage medium for detecting regional invasion in real time
CN113269197B (en) * 2021-04-25 2024-03-08 南京三百云信息科技有限公司 Certificate image vertex coordinate regression system and identification method based on semantic segmentation
CN113269197A (en) * 2021-04-25 2021-08-17 南京三百云信息科技有限公司 Certificate image vertex coordinate regression system and identification method based on semantic segmentation
CN113591651A (en) * 2021-07-22 2021-11-02 浙江大华技术股份有限公司 Image capturing method, image display method and device and storage medium
CN115018904A (en) * 2022-06-02 2022-09-06 如你所视(北京)科技有限公司 Mask generation method and device for panoramic image
CN115018904B (en) * 2022-06-02 2023-10-20 如你所视(北京)科技有限公司 Method and device for generating mask of panoramic image
CN115713500A (en) * 2022-11-07 2023-02-24 广州汽车集团股份有限公司 Visual perception method and device
CN116030423A (en) * 2023-03-29 2023-04-28 浪潮通用软件有限公司 Regional boundary intrusion detection method, equipment and medium
CN116030423B (en) * 2023-03-29 2023-06-16 浪潮通用软件有限公司 Regional boundary intrusion detection method, equipment and medium
CN116778696A (en) * 2023-08-14 2023-09-19 易启科技(吉林省)有限公司 Visual-based intelligent urban waterlogging early warning method and system
CN116778696B (en) * 2023-08-14 2023-11-14 易启科技(吉林省)有限公司 Visual-based intelligent urban waterlogging early warning method and system

Similar Documents

Publication Publication Date Title
CN111815570A (en) Regional intrusion detection method and related device thereof
US11651477B2 (en) Generating an image mask for a digital image by utilizing a multi-branch masking pipeline with neural networks
CN110084095A (en) Method for detecting lane lines, lane detection device and computer storage medium
US11393100B2 (en) Automatically generating a trimap segmentation for a digital image by utilizing a trimap generation neural network
CN109934216B (en) Image processing method, device and computer readable storage medium
CN112329702B (en) Method and device for rapid face density prediction and face detection, electronic equipment and storage medium
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN113112519A (en) Key frame screening method based on interested target distribution
CN114937086A (en) Training method and detection method for multi-image target detection and related products
CN112651274A (en) Road obstacle detection device, road obstacle detection method, and recording medium
CN115578616A (en) Training method, segmentation method and device of multi-scale object instance segmentation model
CN116612280A (en) Vehicle segmentation method, device, computer equipment and computer readable storage medium
CN115439803A (en) Smoke optical flow identification method based on deep learning model
CN113743378B (en) Fire monitoring method and device based on video
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN111340139A (en) Method and device for judging complexity of image content
CN112785595A (en) Target attribute detection, neural network training and intelligent driving method and device
CN116486153A (en) Image classification method, device, equipment and storage medium
CN111047614A (en) Feature extraction-based method for extracting target corner of complex scene image
CN112446292B (en) 2D image salient object detection method and system
CN112862846A (en) Official seal identification method and system combining multiple detection methods
Jang et al. Robust detection of mosaic regions in visual image data
CN113706636A (en) Method and device for identifying tampered image
CN112418218B (en) Target area detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination