CN112861711A - Regional intrusion detection method and device, electronic equipment and storage medium - Google Patents

Regional intrusion detection method and device, electronic equipment and storage medium

Info

Publication number
CN112861711A
Authority
CN
China
Prior art keywords
detection
target
frame set
detection frame
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110164424.1A
Other languages
Chinese (zh)
Inventor
张松华
闫潇宁
郑双午
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Anruan Huishi Technology Co ltd
Shenzhen Anruan Technology Co Ltd
Original Assignee
Shenzhen Anruan Huishi Technology Co ltd
Shenzhen Anruan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Anruan Huishi Technology Co ltd, Shenzhen Anruan Technology Co Ltd filed Critical Shenzhen Anruan Huishi Technology Co ltd
Priority to CN202110164424.1A
Publication of CN112861711A
Legal status: Pending

Classifications

    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06N 3/08 Learning methods
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30196 Human being; Person


Abstract

An embodiment of the invention provides a regional intrusion detection method and device, an electronic device and a storage medium. The method comprises: acquiring video data, wherein the video data comprises consecutive multi-frame images; detecting a target person through a first detection model and a second detection model respectively based on the consecutive multi-frame images to obtain a corresponding first detection result and second detection result, wherein the first detection result comprises a first detection frame set and the second detection result comprises a second detection frame set; extracting feature information of the target person through a third detection model based on the consecutive multi-frame images and the first detection frame set; screening the first detection frame set through the second detection frame set to obtain a target detection frame set, and matching the feature information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person; and judging, based on the target detection frame, whether the corresponding target person has intruded into a preset area. The method improves the accuracy of pedestrian area intrusion detection.

Description

Regional intrusion detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method and an apparatus for detecting intrusion in a region, an electronic device, and a storage medium.
Background
In recent years, with the development of deep learning in the field of artificial intelligence and the improvement of the computing power of related hardware, computer-vision applications have been deployed and further developed in many fields. For example, area intrusion detection is increasingly widely applied in the security field, and accurate, efficient pedestrian detection and recognition results play an important auxiliary role in assisting security personnel on patrol, improving human-computer interaction experience, and the like.
In the prior art, detection models based on convolutional neural networks perform well on image classification and recognition. For continuous pedestrian area intrusion detection, however, the position and size of each pedestrian must be accurately predicted and further tracked, at the same time as judging whether a pedestrian is present in the image, so that a timely prompt can be given. Existing detection models may predict bounding boxes inaccurately, and re-identification and tracking are difficult once a detected target is lost, so the accuracy of pedestrian area intrusion detection is not high.
Disclosure of Invention
The embodiment of the invention provides a regional intrusion detection method, which can improve the accuracy of pedestrian regional intrusion detection.
In a first aspect, an embodiment of the present invention provides a method for detecting an area intrusion, including:
acquiring video data, wherein the video data comprises continuous multi-frame images, and each frame of image comprises at least one target person;
respectively detecting the target person through a first detection model and a second detection model based on the continuous multi-frame images to obtain a corresponding first detection result and a corresponding second detection result, wherein the first detection result comprises a first detection frame set, and the second detection result comprises a second detection frame set;
extracting feature information of the target person through a third detection model based on the continuous multi-frame images and the first detection frame set;
screening the first detection frame set through the second detection frame set to obtain a target detection frame set, and matching the characteristic information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person;
and judging whether the corresponding target personnel invade a preset area or not based on the target detection frame.
Optionally, the screening the first detection frame set by the second detection frame set to obtain a target detection frame set includes:
overlapping the detection frames in the second detection frame set with the detection frames in the first detection frame set, and calculating the overlapping area;
if the overlapping area is larger than a preset overlapping threshold value, putting the corresponding detection frame in the second detection frame set into the target detection frame set;
otherwise, putting the corresponding detection frame in the first detection frame set into the target detection frame set.
Optionally, the preset matching algorithm includes a Hungarian algorithm, and the matching the feature information of the target person with the target detection box set through the preset matching algorithm to obtain the target detection box corresponding to the target person includes:
carrying out graph structure processing on the characteristic information of the target person and the target detection frame set to obtain graph structure combined data;
and matching the graph structure joint data through the Hungarian algorithm to obtain a target detection box of the target personnel.
Optionally, after the feature information of the target person is extracted through the third detection model based on the continuous multi-frame images and the first detection frame set, and before the first detection frame set is screened through the second detection frame set to obtain the target detection frame set, the method further includes:
and carrying out de-duplication and filtering processing on the first detection frame set and the second detection frame set.
Optionally, the first detection model and the third detection model are constructed by a residual convolution, a standard convolution and a channel mixing algorithm, and are pre-trained by corresponding training data sets.
Optionally, the second detection model is constructed by a gaussian mixture background modeling method.
Optionally, the pre-training of the first detection model and the third detection model includes:
acquiring video data;
acquiring a multi-frame image from the video data, acquiring a pedestrian from the image and marking the pedestrian to obtain a pedestrian data set;
dividing the pedestrian data set into a training set, a verification set and a test set;
and constructing a first detection model or a third detection model and pre-training the first detection model or the third detection model through the pedestrian data set to obtain the trained first detection model or the trained third detection model.
In a second aspect, an embodiment of the present invention provides an area intrusion detection apparatus, including:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring video data, and the video data comprises continuous multi-frame images which comprise at least one target person;
the detection module is used for respectively detecting the target person through a first detection model and a second detection model based on the continuous multi-frame images to obtain a corresponding first detection result and a corresponding second detection result, wherein the first detection result comprises a first detection frame set, and the second detection result comprises a second detection frame set;
the extracting module is used for extracting the characteristic information of the target person through a third detection model based on the continuous multi-frame images and the first detection frame set;
the screening and matching module is used for screening the first detection frame set through the second detection frame set to obtain a target detection frame set, and matching the characteristic information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person;
and the judging module is used for judging whether the corresponding target personnel invade the preset area or not based on the target detection frame.
Optionally, the screening and matching module includes:
the overlapping submodule is used for overlapping the detection frames in the second detection frame set with the detection frames in the first detection frame set and calculating the overlapping area;
the first input sub-module is used for inputting the corresponding detection frame in the second detection frame set into the target detection frame set if the overlapping area is larger than a preset overlapping threshold value;
and the second putting sub-module is used for putting the corresponding detection frame in the first detection frame set into the target detection frame set if the overlapping area is not larger than the preset overlapping threshold value.
Optionally, the screening and matching module further includes:
the processing submodule is used for carrying out graph structure processing on the characteristic information of the target personnel and the target detection frame set to obtain graph structure combined data;
and the matching submodule is used for matching the graph structure joint data through the Hungarian algorithm to obtain a target detection box of the target personnel.
Optionally, after the extracting module, the apparatus further includes:
and the de-duplication and filtering processing module is used for carrying out de-duplication and filtering processing on the first detection frame set and the second detection frame set.
Optionally, the first detection model and the third detection model are constructed by a residual convolution, a standard convolution and a channel mixing algorithm, and are pre-trained by corresponding training data sets.
Optionally, the second detection model is constructed by a gaussian mixture background modeling method.
Optionally, the apparatus further includes a pre-training module of the first detection model and the third detection model, where the pre-training module includes:
the first obtaining submodule is used for obtaining video data;
the second acquisition submodule is used for acquiring multi-frame images from the video data, acquiring pedestrians from the images and marking the pedestrians to obtain a pedestrian data set;
the dividing submodule is used for dividing the pedestrian data set into a training set, a verification set and a test set;
and the training submodule is used for constructing a first detection model or a third detection model and pre-training the first detection model or the third detection model through the pedestrian data set to obtain the trained first detection model or the trained third detection model.
In a third aspect, an embodiment of the present invention provides an electronic device, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps in the area intrusion detection method provided by the embodiment of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps in the area intrusion detection method provided by the embodiment of the present invention.
In the embodiment of the invention, video data is acquired, wherein the video data comprises continuous multi-frame images, and the images comprise at least one target person; the target person is detected through a first detection model and a second detection model respectively based on the continuous multi-frame images to obtain a corresponding first detection result and second detection result, wherein the first detection result comprises a first detection frame set and the second detection result comprises a second detection frame set; feature information of the target person is extracted through a third detection model based on the continuous multi-frame images and the first detection frame set; the first detection frame set is screened through the second detection frame set to obtain a target detection frame set, and the feature information of the target person is matched with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person; and whether the corresponding target person has intruded into a preset area is judged based on the target detection frame. In other words, continuous multi-frame images of video data containing at least one target pedestrian are obtained, and the first detection model and the second detection model are used to detect the first detection frame set and the second detection frame set of the target person from those images. Screening the first detection frame set through the second detection frame set yields more accurate human body detection frames as the target detection frame set. The feature information of the target person is then extracted from the continuous multi-frame images through the third detection model and matched with the target detection frame set to obtain the final target detection frame, through which the target person corresponding to the feature information can be continuously identified and tracked and a judgment made as to whether that person has intruded into the preset area, thereby improving the accuracy of pedestrian area intrusion detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for detecting an intrusion into a region according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a first detection model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a third detection model according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for screening test frames according to an embodiment of the present invention;
FIG. 5 is a flow chart of a matching method provided by an embodiment of the invention;
fig. 6 is a schematic structural diagram of an area intrusion detection apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a screening and matching module according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of another screening and matching module provided in an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for detecting an intrusion into a region according to an embodiment of the present invention, as shown in fig. 1, including the following steps:
101. video data is acquired, the video data comprises a plurality of continuous frames of images, and each frame of image comprises at least one target person.
In the embodiment of the invention, the regional intrusion detection method can be applied to pedestrian area intrusion recognition and detection scenes under video monitoring. The electronic device on which the regional intrusion detection method runs can acquire the video data and transmit the data through a wired or wireless connection. It should be noted that the wireless connection manner may include, but is not limited to, a 3G/4G connection, a WiFi (Wireless Fidelity) connection, a Bluetooth connection, a WiMAX (Worldwide Interoperability for Microwave Access) connection, a ZigBee (low-power local area network protocol) connection, a UWB (ultra wideband) connection, and other wireless connection manners known now or developed in the future.
The video data can be captured, collected and transmitted in real time by video acquisition equipment, or can be uploaded manually from a terminal, and the obtained image data is then stored or input directly into the electronic device on which the regional intrusion detection method runs for detection. The video acquisition equipment can include a camera, or electronic equipment provided with a camera that can acquire video images. The video comprises multiple frames of continuous image data, or may be a sequence of frames acquired at certain intervals; the multiple frames of continuous image data may be an image sequence containing at least one target person and requiring pedestrian area intrusion detection, or may contain a plurality of target persons, for example 2 or 3 target persons.
102. On the basis of continuous multi-frame images, the target person is detected through the first detection model and the second detection model respectively, and corresponding first detection results and second detection results are obtained, wherein the first detection results comprise a first detection frame set, and the second detection results comprise a second detection frame set.
In the embodiment of the present invention, target detection may be performed on the target persons in the consecutive multi-frame images through the first detection model and the second detection model, and the human body detection frames of all target persons in each frame are correspondingly output, that is, the first detection frame set and the second detection frame set each contain the detection frames of all target persons across all frames. The first detection model and/or the second detection model may be a trained neural network model capable of target recognition and positioning, such as an existing Fast R-CNN or YOLO, a self-designed neural network model, or a non-neural-network model. At least one target person is detected from the multi-frame images through the first detection model and the second detection model, and the human body detection frame of the corresponding target person (namely, the rectangular frame coordinates of the target person in the image) is obtained.
Further, the first detection model is constructed from residual convolutions, standard convolutions and a channel mixing algorithm, and is pre-trained on a corresponding training data set. Specifically, referring to fig. 2, which is a schematic structural diagram of the first detection model according to an embodiment of the present invention, the first detection model includes a multi-layer neural network whose structure consists of an input image layer, a standard convolution module and three feature dimension reduction modules. The input of the input image layer is a 416 × 416 RGB image. The number of channels is first increased to 64 by the standard convolution module with 3 × 3 convolution kernels: the module contains two sets of standard convolution structures, the last two-dimensional convolution of the first set increasing the channels to 32 and the last two-dimensional convolution of the second set increasing the channels to 64, so that the feature map size becomes 104 × 104. After the standard convolution module come the three feature dimension reduction modules. Each feature dimension reduction module is composed of eight groups of grouped convolutions, 1 × 1 standard convolutions and average pooling layers forming a residual convolution structure, followed by a two-dimensional convolution that increases the number of channels. The feature maps output by the three feature dimension reduction modules (with scales of 52 × 52, 26 × 26 and 13 × 13 respectively) are collected and output after channel mixing and superposition. The network structure has a small number of model parameters, which increases the running speed of the model, and it can extract more detailed pedestrian features, increasing the accuracy of pedestrian detection. A minimal sketch of such a backbone is given below.
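The following PyTorch sketch illustrates one possible reading of this structure. The module names, the channel counts of the three stages, strides, normalization layers and activation choices are assumptions made for illustration; only the 416 × 416 input, the 104/52/26/13 feature-map sizes, the grouped-convolution residual blocks and the channel mixing come from the description above.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # "Channel mixing": interleave channels across groups.
    b, c, h, w = x.size()
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class FeatureReduceBlock(nn.Module):
    """Feature dimension reduction module (assumed arrangement): grouped 3x3
    convolutions plus a 1x1 standard convolution as a residual branch, an
    average pooling layer that halves the spatial size, and a trailing 2D
    convolution that increases the channel count."""
    def __init__(self, in_ch, out_ch, groups=8):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, in_ch, 1, bias=False), nn.BatchNorm2d(in_ch),
        )
        self.groups = groups
        self.pool = nn.AvgPool2d(2)
        self.expand = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        x = x + self.residual(x)                  # residual convolution structure
        x = channel_shuffle(x, self.groups)
        return self.expand(self.pool(x))

class FirstDetectionBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        # Standard convolution module: two 3x3 convolution groups raising the
        # channels to 32 and then 64; strides chosen so the map becomes 104x104.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.stage1 = FeatureReduceBlock(64, 128)    # -> 52 x 52
        self.stage2 = FeatureReduceBlock(128, 256)   # -> 26 x 26
        self.stage3 = FeatureReduceBlock(256, 512)   # -> 13 x 13

    def forward(self, x):                            # x: (B, 3, 416, 416)
        x = self.stem(x)                             # (B, 64, 104, 104)
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        return f1, f2, f3                            # multi-scale feature maps
```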
The second detection model is constructed by a Gaussian-mixture background modelling method. Moving-object detection problems fall broadly into two categories: fixed camera and moving camera. In the intelligent security field, cameras are generally installed at fixed positions, so Gaussian-mixture background modelling is well suited to separating background and foreground in the video image sequence under this condition and to suppressing non-pedestrian targets in the foreground, yielding more accurate bounding boxes of the target person contours over the multiple frames of images, which serve as the second detection frame set. A possible implementation is sketched below.
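As an illustration only, the sketch below uses OpenCV's MOG2 background subtractor to stand in for the Gaussian-mixture background modelling; the history length, thresholds, morphology kernel and minimum contour area are assumptions, not values from the patent.

```python
import cv2

def gmm_detect_boxes(frames, min_area=500):
    """Return per-frame lists of (x, y, w, h) foreground bounding boxes,
    i.e. an approximation of the 'second detection frame set'."""
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                    detectShadows=True)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    all_boxes = []
    for frame in frames:
        mask = subtractor.apply(frame)
        mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadow pixels (value 127)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)        # suppress small non-pedestrian blobs
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
        all_boxes.append(boxes)
    return all_boxes
```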
103. And extracting the characteristic information of the target person through a third detection model based on the continuous multi-frame images and the first detection frame set.
In the embodiment of the present invention, the corresponding target persons may be cropped from the continuous multi-frame images using the first detection frame set output by the first detection model, and then the feature information of those target persons, for example face information, may be extracted through a third detection model. The third detection model can likewise be constructed from residual convolutions, standard convolutions and a channel mixing algorithm, and pre-trained on a corresponding training data set. Specifically, referring to fig. 3, which is a schematic structural diagram of the third detection model according to an embodiment of the present invention, the third detection model includes a multi-layer neural network whose structure consists of an input image layer, a standard convolution module, two feature dimension reduction modules and a global mean pooling layer. The input of the input image layer is a target person cropped out by the first detection model, with each target person image normalized to a size of 128 × 64. The number of channels is raised to 256 by the standard convolution module and then to 1024 dimensions by the two lightweight feature dimension reduction modules, after which the global mean pooling layer extracts one value per feature channel, giving a 1024-dimensional target person feature. Each feature dimension reduction module is composed of a residual convolution structure formed by two groups of grouped convolutions, a 1 × 1 standard convolution and an average pooling layer, followed by a two-dimensional convolution that increases the number of channels; the network structure has few model parameters, which increases the running speed of the model. Finally the target person feature is fed to two branches: one branch is trained with a cross-entropy loss for target person feature classification, and the other uses a triplet loss to pull the features of the same target person closer together and push the features of different target persons further apart, so that the feature information of different target persons is separated. A sketch of these two training branches follows.
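To make the two training branches concrete, here is a minimal PyTorch sketch; the backbone is abstracted away, and the identity count, triplet margin and equal loss weighting are assumptions. The real model uses the standard-convolution and feature-dimension-reduction modules described above.

```python
import torch.nn as nn

class ReIDHead(nn.Module):
    """Global mean pooling over the 1024-channel feature map plus the
    classification branch of the third detection model (assumed layout)."""
    def __init__(self, feat_dim=1024, num_identities=751):  # identity count is illustrative
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # one value per feature channel
        self.classifier = nn.Linear(feat_dim, num_identities)

    def forward(self, feature_map):                  # feature_map: (B, 1024, H, W)
        embedding = self.pool(feature_map).flatten(1)    # (B, 1024) target person feature
        logits = self.classifier(embedding)              # branch 1: identity classification
        return embedding, logits

ce_loss = nn.CrossEntropyLoss()                      # branch 1: cross-entropy loss
tri_loss = nn.TripletMarginLoss(margin=0.3)          # branch 2: pull same identities together

def reid_loss(logits, labels, anchor, positive, negative):
    # anchor/positive share an identity, negative does not; equal weighting assumed
    return ce_loss(logits, labels) + tri_loss(anchor, positive, negative)
```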
It should be noted that, the first detection model and the third detection model are trained in advance, and the pre-training step may include:
acquiring video data;
acquiring a multi-frame image from the video data, acquiring a pedestrian from the image and marking the pedestrian to obtain a pedestrian data set;
dividing the pedestrian data set into a training set, a verification set and a test set;
and constructing the first detection model or the third detection model and pre-training it on the pedestrian data set to obtain the trained first detection model or third detection model. The trained model is then verified and tested on the verification set and the test set until it meets the requirements (a minimal sketch of the data preparation is given below).
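For illustration only, a minimal sketch of the data-set splitting step is given below; the 7:2:1 ratio and the random shuffling are assumptions, since the description only names the three subsets.

```python
import random

def split_pedestrian_dataset(samples, train_ratio=0.7, val_ratio=0.2, seed=0):
    """Split an annotated pedestrian data set into training, verification and
    test sets. `samples` is any list, e.g. (image_path, boxes) pairs."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_train = int(len(samples) * train_ratio)
    n_val = int(len(samples) * val_ratio)
    return (samples[:n_train],                     # training set
            samples[n_train:n_train + n_val],      # verification set
            samples[n_train + n_val:])             # test set
```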
104. And screening the first detection frame set through the second detection frame set to obtain a target detection frame set, and matching the characteristic information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person.
Optionally, referring to fig. 4, fig. 4 is a flowchart of a method for screening detection frames according to an embodiment of the present invention, and as shown in fig. 4, the screening the first detection frame set by the second detection frame set to obtain the target detection frame set includes the following steps:
401. overlapping the detection frames in the second detection frame set with the detection frames in the first detection frame set, and calculating the overlapping area;
402. if the overlapping area is larger than a preset overlapping threshold value, putting the corresponding detection frame in the second detection frame set into the target detection frame set;
403. otherwise, putting the corresponding detection frame in the first detection frame set into the target detection frame set.
In an embodiment of the present invention, the second detection frame set is the set of human body detection frames of the target persons in the multiple frames of images output by the second detection model constructed by the Gaussian-mixture background modelling method, and the first detection frame set is the corresponding set output by the first detection model constructed on the basis of a neural network. The human body detection frames in the second detection frame set are then combined with those in the first detection frame set (i.e. the union of the two sets of frames is taken), and the overlap IoU (intersection over union) of each pair of detection frames is calculated. If the overlap IoU of a human body detection frame is larger than a preset overlap threshold, the corresponding human body detection frame in the first detection frame set is deleted and only the corresponding frame in the second detection frame set is retained; otherwise, if the overlap IoU is smaller than the preset overlap threshold, the corresponding human body detection frame in the second detection frame set is deleted and only the corresponding frame in the first detection frame set is retained. The retained human body detection frames are put into the target detection frame set, and the screening ends when the last human body detection frame in the second detection frame set and/or the first detection frame set has been processed. This screening method, which fuses the neural-network-based first detection model with the second detection model based on Gaussian-mixture background modelling, can greatly improve the accuracy of pedestrian target detection. A sketch of the screening logic follows.
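The sketch below illustrates the screening logic of steps 401 to 403 under some assumptions: boxes are (x1, y1, x2, y2) tuples, the overlap measure is IoU, each frame from the second set is compared with its best-overlapping frame from the first set, and the 0.5 threshold is illustrative; the description does not fix these details.

```python
def iou(a, b):
    # a, b: (x1, y1, x2, y2) human body detection frames
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def screen_boxes(first_set, second_set, overlap_threshold=0.5):
    """Screen the first detection frame set through the second detection frame set."""
    target_set = []
    for box2 in second_set:
        if not first_set:
            break
        best = max(first_set, key=lambda box1: iou(box1, box2))
        if iou(best, box2) > overlap_threshold:
            target_set.append(box2)   # keep the background-modelling frame
        else:
            target_set.append(best)   # keep the neural-network frame
    return target_set
```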
Optionally, referring to fig. 5, fig. 5 is a flowchart of a matching method provided in an embodiment of the present invention. As shown in fig. 5, the preset matching algorithm includes a Hungarian algorithm, and matching the feature information of the target person with the target detection box set through the preset matching algorithm to obtain the target detection box corresponding to the target person includes the following steps:
501. carrying out graph structure processing on the characteristic information of the target personnel and the target detection frame set to obtain graph structure combined data;
502. and matching the combined data of the graph structures through a Hungarian algorithm to obtain a target detection box of the target personnel.
In the embodiment of the present invention, the feature information, such as face information, of all target persons corresponding to the human body detection boxes in the first detection box set may be extracted from the multi-frame images through the third detection model and the first detection box set. The target detection box set and the feature information of all target persons are then processed into a graph structure to obtain graph-structure joint data, which can specifically be organised as a bipartite graph: each target person's feature information is regarded as one vertex, the feature information of all target persons forms one vertex set of the bipartite graph, and the human body detection boxes in the target detection box set form the other vertex set. The two vertex sets can then be matched with the Hungarian algorithm to obtain the sequence of target detection boxes corresponding to each target person, so that each target person can be continuously tracked through the target detection boxes, which improves the accuracy of pedestrian tracking. A sketch of this matching step is given below.
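The following sketch uses SciPy's linear_sum_assignment as the Hungarian algorithm. Treating the cost between a person's feature vector and a candidate box as the cosine distance between appearance features is an assumption; the description only specifies that the two vertex sets of the bipartite graph are matched with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_persons_to_boxes(person_features, box_features):
    # person_features: (P, 1024) array, one row per tracked target person
    # box_features:    (B, 1024) array, one row per box in the target detection box set
    a = person_features / np.linalg.norm(person_features, axis=1, keepdims=True)
    b = box_features / np.linalg.norm(box_features, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                        # cosine-distance cost matrix (P x B)
    rows, cols = linear_sum_assignment(cost)    # Hungarian algorithm on the bipartite graph
    return list(zip(rows.tolist(), cols.tolist()))  # (person index, box index) pairs
```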
105. And judging whether the corresponding target personnel invade the preset area or not based on the target detection frame.
In the embodiment of the present invention, the preset area may be a manually designated area, a dangerous area or another area where entry is prohibited, and may be drawn within the video image acquired by the camera. The corresponding target person is then continuously tracked through the sequence of target detection frames obtained in the steps above, and whether that person has entered the preset area is judged from the overlapping area between the target detection frame and the preset area, with a corresponding prompt given; for example, when the overlapping area between the target detection frame and the preset area is greater than zero, a warning message is sent to remind the relevant personnel. Furthermore, the overlapping area can be used to grade the intrusion level of the alarm information: the larger the overlapping area, the deeper the pedestrian has intruded, the higher the intrusion level and the stronger the alarm. A sketch of this judgment is given below.
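The sketch below shows one way to implement the intrusion judgment and level grading; the rectangular preset area and the ratio thresholds for the levels are assumptions made for illustration.

```python
def intrusion_level(box, region):
    # box, region: (x1, y1, x2, y2); region is the preset area drawn in the image
    ix1, iy1 = max(box[0], region[0]), max(box[1], region[1])
    ix2, iy2 = min(box[2], region[2]), min(box[3], region[3])
    overlap = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if overlap == 0:
        return None                                   # no intrusion, no alarm
    ratio = overlap / ((box[2] - box[0]) * (box[3] - box[1]))
    if ratio < 0.3:
        return "low"        # pedestrian has just entered the preset area
    if ratio < 0.7:
        return "medium"
    return "high"           # deeper intrusion, stronger alarm
```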
In summary, in the embodiment of the invention, continuous multi-frame images of video data containing at least one target pedestrian are obtained, and the first detection model and the second detection model are used to detect the first detection frame set and the second detection frame set of the target person from those images. The first detection frame set is screened through the second detection frame set, so that more accurate human body detection frames are obtained as the target detection frame set. The feature information of the target person is then extracted from the continuous multi-frame images through the third detection model and matched with the target detection frame set to obtain the final target detection frame. Through the target detection frame, the target person corresponding to the feature information can be continuously identified and tracked, and it can be judged whether that person has intruded into the preset area, thereby improving the accuracy of pedestrian area intrusion detection.
It should be noted that the method for detecting regional intrusion provided by the embodiment of the present invention may be applied to devices such as a mobile phone, a monitor, a computer, and a server that can perform regional intrusion detection.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an area intrusion detection apparatus according to an embodiment of the present invention, and as shown in fig. 6, the area intrusion detection apparatus 600 includes:
the acquiring module 601 is configured to acquire video data, where the video data includes a plurality of consecutive frames of images, and the images include at least one target person.
In the embodiment of the invention, the area intrusion detection device can be applied to pedestrian area intrusion recognition and detection scenes under video monitoring. The electronic equipment on which the area intrusion detection device runs can acquire the video data and transmit the data through a wired or wireless connection. It should be noted that the wireless connection manner may include, but is not limited to, a 3G/4G connection, a WiFi (Wireless Fidelity) connection, a Bluetooth connection, a WiMAX (Worldwide Interoperability for Microwave Access) connection, a ZigBee (low-power local area network protocol) connection, a UWB (ultra wideband) connection, and other wireless connection manners known now or developed in the future.
The video data can be captured, collected and transmitted in real time by video acquisition equipment, or can be uploaded manually from a terminal, and the obtained image data is then stored or input directly into the electronic equipment on which the area intrusion detection device runs for detection. The video acquisition equipment can include a camera, or electronic equipment provided with a camera that can acquire video images. The video comprises multiple frames of continuous image data, or may be a sequence of frames acquired at certain intervals; the multiple frames of continuous image data may be an image sequence containing at least one target person and requiring pedestrian area intrusion detection, or may contain a plurality of target persons, for example 2 or 3 target persons.
The detection module 602 is configured to detect the target person through a first detection model and a second detection model respectively based on the consecutive multi-frame images to obtain a corresponding first detection result and a corresponding second detection result, where the first detection result includes a first detection frame set, and the second detection result includes a second detection frame set.
In the embodiment of the present invention, target detection may be performed on the target persons in the consecutive multi-frame images through the first detection model and the second detection model, and the human body detection frames of all target persons in each frame are correspondingly output, that is, the first detection frame set and the second detection frame set each contain the detection frames of all target persons across all frames. The first detection model and/or the second detection model may be a trained neural network model capable of target recognition and positioning, such as an existing Fast R-CNN or YOLO, a self-designed neural network model, or a non-neural-network model. At least one target person is detected from the multi-frame images through the first detection model and the second detection model, and the human body detection frame of the corresponding target person (namely, the rectangular frame coordinates of the target person in the image) is obtained.
Further, the first detection model is constructed from residual convolutions, standard convolutions and a channel mixing algorithm, and is pre-trained on a corresponding training data set. Specifically, referring to fig. 2, the first detection model includes a multi-layer neural network whose structure consists of an input image layer, a standard convolution module and three feature dimension reduction modules. The size of the input image of the input image layer is 416 × 416. The number of channels is first increased to 64 by the standard convolution module with 3 × 3 convolution kernels: the module contains two groups of standard convolution structures, the last two-dimensional convolution of the first group increasing the channels to 32 and the last two-dimensional convolution of the second group increasing the channels to 64, so that the feature map size becomes 104 × 104. After the standard convolution module come the three feature dimension reduction modules. Each feature dimension reduction module is composed of eight groups of grouped convolutions, 1 × 1 standard convolutions and average pooling layers forming a residual convolution structure, followed by a two-dimensional convolution that increases the number of channels. The feature maps output by the three feature dimension reduction modules (with scales of 52 × 52, 26 × 26 and 13 × 13 respectively) are collected and output after channel mixing and superposition. The network structure has a small number of model parameters, which increases the running speed of the model, and it can extract more detailed pedestrian features, increasing the accuracy of pedestrian detection.
Optionally, the second detection model is constructed by a Gaussian-mixture background modelling method. Moving-object detection problems fall broadly into two categories: fixed camera and moving camera. In the intelligent security field, cameras are generally installed at fixed positions, so Gaussian-mixture background modelling is well suited to separating background and foreground in the video image sequence under this condition and to suppressing non-pedestrian targets in the foreground, yielding more accurate bounding boxes of the target person contours over the multiple frames of images, which serve as the second detection frame set.
An extracting module 603, configured to extract, based on the consecutive multi-frame images and the first detection frame set, feature information of the target person through a third detection model.
In the embodiment of the present invention, a corresponding target person may be obtained from the continuous multiple frames of images through the first detection frame set output by the first detection model, and then feature information of the obtained target person, for example, face information, may be extracted through a third detection model.
Optionally, the third detection model may be constructed from residual convolutions, standard convolutions and a channel mixing algorithm, and pre-trained on a corresponding training data set. Specifically, referring to fig. 3, the third detection model includes a multi-layer neural network whose structure consists of an input image layer, a standard convolution module, two feature dimension reduction modules and a global mean pooling layer. The input of the input image layer is a target person cropped out by the first detection model, with each target person image normalized to a size of 128 × 64. The number of channels is raised to 256 by the standard convolution module and then to 1024 dimensions by the two lightweight feature dimension reduction modules, after which the global mean pooling layer extracts one value per feature channel, giving a 1024-dimensional target person feature. Each feature dimension reduction module is composed of a residual convolution structure formed by two groups of grouped convolutions, a 1 × 1 standard convolution and an average pooling layer, followed by a two-dimensional convolution that increases the number of channels; the network structure has few model parameters, which increases the running speed of the model. Finally the target person feature is fed to two branches: one branch is trained with a cross-entropy loss for target person feature classification, and the other uses a triplet loss to pull the features of the same target person closer together and push the features of different target persons further apart, so that the feature information of different target persons is separated.
It should be noted that, the first detection model and the third detection model are trained in advance, and correspondingly, the area intrusion detection apparatus further includes a pre-training module of the first detection model and the third detection model, where the pre-training module includes:
the first obtaining submodule is used for obtaining video data;
the second acquisition submodule is used for acquiring multi-frame images from the video data, acquiring pedestrians from the images and marking the pedestrians to obtain a pedestrian data set;
the dividing submodule is used for dividing the pedestrian data set into a training set, a verification set and a test set;
and the training submodule is used for constructing a first detection model or a third detection model and pre-training the first detection model or the third detection model through the pedestrian data set to obtain the trained first detection model or the trained third detection model.
And a screening and matching module 604, configured to screen the first detection frame set through the second detection frame set to obtain a target detection frame set, and match the feature information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person.
Optionally, as shown in fig. 7, the screening and matching module 604 includes:
an overlap sub-module 6041, configured to overlap the detection frames in the second detection frame set and the detection frames in the first detection frame set, and calculate an overlap area;
a first putting sub-module 6042, configured to put a corresponding detection frame in the second detection frame set into the target detection frame set if the overlapping area is greater than a preset overlapping threshold;
and a second placing sub-module 6043, configured to place the corresponding detection frame in the first detection frame set into the target detection frame set otherwise.
Optionally, after the extracting module 603, the apparatus further includes a de-duplication and filtering processing module, configured to perform de-duplication and filtering processing on the first detection frame set and the second detection frame set.
In an embodiment of the present invention, the second detection frame set is the set of human body detection frames of the target persons in the multiple frames of images output by the second detection model constructed by the Gaussian-mixture background modelling method, and the first detection frame set is the corresponding set output by the first detection model constructed on the basis of a neural network. The human body detection frames in the second detection frame set are then combined with those in the first detection frame set (i.e. the union of the two sets of frames is taken), and the overlap IoU (intersection over union) of each pair of detection frames is calculated. If the overlap IoU of a human body detection frame is larger than a preset overlap threshold, the corresponding human body detection frame in the first detection frame set is deleted and only the corresponding frame in the second detection frame set is retained; otherwise, if the overlap IoU is smaller than the preset overlap threshold, the corresponding human body detection frame in the second detection frame set is deleted and only the corresponding frame in the first detection frame set is retained. The retained human body detection frames are put into the target detection frame set, and the screening ends when the last human body detection frame in the second detection frame set and/or the first detection frame set has been processed. This screening method, which fuses the neural-network-based first detection model with the second detection model based on Gaussian-mixture background modelling, can greatly improve the accuracy of pedestrian target detection.
Optionally, as shown in fig. 8, the screening and matching module 604 further includes:
the processing submodule 6044 is configured to perform graph structure processing on the feature information of the target person and the target detection box set to obtain graph structure joint data;
and the matching submodule 6045 is used for matching the graph structure joint data through the Hungarian algorithm to obtain a target detection box of the target person.
In the embodiment of the present invention, the feature information, such as face information, of all target persons corresponding to the human body detection boxes in the first detection box set may be extracted from the multi-frame images through the third detection model and the first detection box set. The target detection box set and the feature information of all target persons are then processed into a graph structure to obtain graph-structure joint data, which can specifically be organised as a bipartite graph: each target person's feature information is regarded as one vertex, the feature information of all target persons forms one vertex set of the bipartite graph, and the human body detection boxes in the target detection box set form the other vertex set. The two vertex sets can then be matched with the Hungarian algorithm to obtain the sequence of target detection boxes corresponding to each target person, so that each target person can be continuously tracked through the target detection boxes, which improves the accuracy of pedestrian tracking.
And a determining module 605, configured to determine whether the corresponding target person invades the preset area based on the target detection frame.
In the embodiment of the present invention, the preset area may be a manually designated area, a dangerous area or another area where entry is prohibited, and may be drawn within the video image acquired by the camera. The corresponding target person is then continuously tracked through the sequence of target detection frames obtained in the steps above, and whether that person has entered the preset area is judged from the overlapping area between the target detection frame and the preset area, with a corresponding prompt given; for example, when the overlapping area between the target detection frame and the preset area is greater than zero, a warning message is sent to remind the relevant personnel. Furthermore, the overlapping area can be used to grade the intrusion level of the alarm information: the larger the overlapping area, the deeper the pedestrian has intruded, the higher the intrusion level and the stronger the alarm.
It should be noted that the regional intrusion detection device provided by the embodiment of the present invention may be applied to a mobile phone, a monitor, a computer, a server, and other devices capable of performing regional intrusion detection.
The regional intrusion detection device provided by the embodiment of the invention can implement each process of the regional intrusion detection method in the method embodiment and can achieve the same beneficial effects. To avoid repetition, further description is omitted here.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 9, the electronic device 900 includes: a memory 902, a processor 901 and a computer program stored on the memory 902 and executable on the processor 901, wherein:
the processor 901 is used for calling the computer program stored in the memory 902 and executing the following steps:
acquiring video data, wherein the video data comprises continuous multi-frame images, and the images comprise at least one target person;
respectively detecting the target person through a first detection model and a second detection model based on the continuous multi-frame images to obtain a corresponding first detection result and a corresponding second detection result, wherein the first detection result comprises a first detection frame set, and the second detection result comprises a second detection frame set;
extracting feature information of the target person through a third detection model based on the continuous multi-frame images and the first detection frame set;
screening the first detection frame set through the second detection frame set to obtain a target detection frame set, and matching the characteristic information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person;
and judging whether the corresponding target personnel invade a preset area or not based on the target detection frame.
Optionally, the screening, performed by the processor 901, of the first detection frame set through the second detection frame set to obtain a target detection frame set includes:
overlapping the detection frames in the second detection frame set with the detection frames in the first detection frame set, and calculating the overlapping area;
if the overlapping area is larger than a preset overlapping threshold value, putting the corresponding detection frame in the second detection frame set into the target detection frame set;
otherwise, putting the corresponding detection frame in the first detection frame set into the target detection frame set.
Optionally, in the steps executed by the processor 901, the preset matching algorithm includes the Hungarian algorithm, and the matching the feature information of the target person with the target detection box set through the preset matching algorithm to obtain the target detection box corresponding to the target person includes:
carrying out graph structure processing on the characteristic information of the target person and the target detection frame set to obtain graph structure combined data;
and matching the graph structure joint data through the Hungarian algorithm to obtain a target detection box of the target personnel.
Optionally, after the extracting, by the third detection model, of the feature information of the target person based on the continuous multi-frame images and the first detection frame set, and before the screening of the first detection frame set through the second detection frame set to obtain the target detection frame set, the method executed by the processor 901 further includes:
and carrying out de-duplication and filtering processing on the first detection frame set and the second detection frame set.
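The patent does not name the de-duplication and filtering procedure; one plausible reading is a standard IoU-based non-maximum suppression applied to each detection frame set, sketched below with an assumed 0.5 threshold:

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy non-maximum suppression over boxes given as (x1, y1, x2, y2) with confidence scores.
    def iou(a, b):
        ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep  # indices of the retained, de-duplicated boxes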
Optionally, in the steps executed by the processor 901, the first detection model and the third detection model are constructed by a residual convolution, a standard convolution and a channel mixing algorithm, and are pre-trained by a corresponding training data set.
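As a rough illustration of a block combining standard convolution, channel mixing and a residual connection, the following PyTorch-style sketch shows one possible form; the layer arrangement, channel counts and group number are assumptions, since the patent does not disclose the network architecture of the first and third detection models in detail:

import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # Rearrange channels across groups (the channel-mixing step).
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class MixBlock(nn.Module):
    # Grouped 3x3 convolution followed by channel shuffle, wrapped in a residual connection.
    def __init__(self, channels, groups=2):
        super().__init__()
        assert channels % groups == 0
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.groups = groups

    def forward(self, x):
        return x + channel_shuffle(self.conv(x), self.groups)  # residual skip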
Optionally, in the step executed by the processor 901, the second detection model is constructed by a gaussian mixture background modeling method.
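Gaussian mixture background modeling is available off the shelf, for instance as OpenCV's MOG2 background subtractor; the parameter values and the contour-area filter below are illustrative assumptions (OpenCV 4 API assumed):

import cv2

# The second detection model here is simply a Gaussian-mixture (MOG2) background subtractor.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

def foreground_boxes(frame, min_area=500):
    # Return bounding boxes (x, y, w, h) of moving foreground regions in one video frame.
    mask = subtractor.apply(frame)
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]        # drop shadow pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]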
Optionally, in the steps executed by the processor 901, the pre-training of the first detection model or the third detection model includes:
acquiring video data;
acquiring a multi-frame image from the video data, acquiring a pedestrian from the image and marking the pedestrian to obtain a pedestrian data set;
dividing the pedestrian data set into a training set, a verification set and a test set;
and constructing a first detection model or a third detection model and pre-training the first detection model or the third detection model through the pedestrian data set to obtain the trained first detection model or the trained third detection model.
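A simple sketch of the dataset division step, assuming an 8:1:1 train/validation/test ratio; the proportions are not fixed by the patent:

import random

def split_dataset(samples, train_ratio=0.8, val_ratio=0.1, seed=42):
    # Shuffle the labelled pedestrian samples and split them into training, validation and test sets.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])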
The electronic device may be a mobile phone, a monitor, a computer, a server, or another device capable of performing regional intrusion detection.
The electronic device provided by the embodiment of the invention can implement each process of the regional intrusion detection method in the method embodiment and can achieve the same beneficial effects; to avoid repetition, the details are not repeated here.
The memory 902 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 902 may be an internal storage unit of the electronic device 900, such as a hard disk or a memory of the electronic device 900. In other embodiments, the memory 902 may also be an external storage device of the electronic device 900, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the electronic device 900. Of course, the memory 902 may also include both an internal storage unit and an external storage device of the electronic device 900. In this embodiment, the memory 902 is generally used for storing the operating system installed in the electronic device 900 and various application software, such as the program code of the regional intrusion detection method. In addition, the memory 902 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 901 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip in some embodiments. The processor 901 is typically used to control the overall operation of the electronic device 900. In this embodiment, the processor 901 is configured to execute the program code stored in the memory 902 or to process data, for example, to execute the program code of the regional intrusion detection method.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements each process of the regional intrusion detection method provided in the embodiments of the present invention and can achieve the same technical effects. To avoid repetition, the details are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the present invention and certainly cannot be taken to limit the scope of the claims of the present invention; equivalent changes made according to the claims of the present invention still fall within the scope covered by the present invention.

Claims (16)

1. A method for regional intrusion detection, comprising the steps of:
acquiring video data, wherein the video data comprises continuous multi-frame images, and each frame of image comprises at least one target person;
respectively detecting the target person through a first detection model and a second detection model based on the continuous multi-frame images to obtain a corresponding first detection result and a corresponding second detection result, wherein the first detection result comprises a first detection frame set, and the second detection result comprises a second detection frame set;
extracting feature information of the target person through a third detection model based on the continuous multi-frame images and the first detection frame set;
screening the first detection frame set through the second detection frame set to obtain a target detection frame set, and matching the characteristic information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person;
and judging whether the corresponding target personnel invade a preset area or not based on the target detection frame.
2. The method of claim 1, wherein the screening the first set of detection boxes through the second set of detection boxes to obtain a set of target detection boxes comprises:
overlapping the detection frames in the second detection frame set with the detection frames in the first detection frame set, and calculating the overlapping area;
if the overlapping area is larger than a preset overlapping threshold value, putting the corresponding detection frame in the second detection frame set into the target detection frame set;
otherwise, putting the corresponding detection frame in the first detection frame set into the target detection frame set.
3. The method as claimed in claim 2, wherein the preset matching algorithm comprises the Hungarian algorithm, and the matching the feature information of the target person with the target detection box set through the preset matching algorithm to obtain the target detection box corresponding to the target person comprises:
carrying out graph structure processing on the characteristic information of the target person and the target detection frame set to obtain graph structure combined data;
and matching the graph structure joint data through the Hungarian algorithm to obtain a target detection box of the target personnel.
4. The method of claim 1, wherein after the extracting, by the third detection model, the feature information of the target person based on the consecutive multi-frame images and the first detection frame set, before the screening, by the second detection frame set, the first detection frame set to obtain a target detection frame set, the method further comprises:
and carrying out de-duplication and filtering processing on the first detection frame set and the second detection frame set.
5. The method of claim 4, wherein the first detection model and the third detection model are constructed by residual convolution, standard convolution and channel mixing algorithms and pre-trained by corresponding training data sets.
6. The method of claim 5, wherein the second detection model is constructed by a Gaussian mixture background modeling method.
7. The method of claim 6, wherein the pre-training of the first detection model and the third detection model comprises:
acquiring video data;
acquiring a multi-frame image from the video data, acquiring a pedestrian from the image and marking the pedestrian to obtain a pedestrian data set;
dividing the pedestrian data set into a training set, a verification set and a test set;
and constructing a first detection model or a third detection model and pre-training the first detection model or the third detection model through the pedestrian data set to obtain the trained first detection model or the trained third detection model.
8. An area intrusion detection device, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring video data, and the video data comprises continuous multi-frame images which comprise at least one target person;
the detection module is used for respectively detecting the target person through a first detection model and a second detection model based on the continuous multi-frame images to obtain a corresponding first detection result and a corresponding second detection result, wherein the first detection result comprises a first detection frame set, and the second detection result comprises a second detection frame set;
the extracting module is used for extracting the characteristic information of the target person through a third detection model based on the continuous multi-frame images and the first detection frame set;
the screening and matching module is used for screening the first detection frame set through the second detection frame set to obtain a target detection frame set, and matching the characteristic information of the target person with the target detection frame set through a preset matching algorithm to obtain a target detection frame corresponding to the target person;
and the judging module is used for judging whether the corresponding target personnel invade the preset area or not based on the target detection frame.
9. The apparatus of claim 8, wherein the screening and matching module comprises:
the overlapping submodule is used for overlapping the detection frames in the second detection frame set with the detection frames in the first detection frame set and calculating the overlapping area;
the first putting sub-module is used for putting the corresponding detection frame in the second detection frame set into the target detection frame set if the overlapping area is larger than a preset overlapping threshold value;
and the second putting sub-module is used for putting the corresponding detection frame in the first detection frame set into the target detection frame set if the overlapping area is not larger than the preset overlapping threshold value.
10. The apparatus of claim 9, wherein the screening and matching module further comprises:
the processing submodule is used for carrying out graph structure processing on the characteristic information of the target personnel and the target detection frame set to obtain graph structure combined data;
and the matching submodule is used for matching the graph structure joint data through the Hungarian algorithm to obtain a target detection box of the target personnel.
11. The apparatus of claim 10, wherein after the extraction module, the apparatus further comprises:
and the de-duplication and filtering processing module is used for carrying out de-duplication and filtering processing on the first detection frame set and the second detection frame set.
12. The apparatus of claim 11, wherein the first detection model and the third detection model are constructed by residual convolution, standard convolution and channel mixing algorithms and are pre-trained by corresponding training data sets.
13. The apparatus of claim 12, wherein the second detection model is constructed by a gaussian mixture background modeling method.
14. The apparatus of claim 13, further comprising a pre-training module of the first detection model and the third detection model, the pre-training module comprising:
the first obtaining submodule is used for obtaining video data;
the second acquisition submodule is used for acquiring multi-frame images from the video data, acquiring pedestrians from the images and marking the pedestrians to obtain a pedestrian data set;
the dividing submodule is used for dividing the pedestrian data set into a training set, a verification set and a test set;
and the training submodule is used for constructing a first detection model or a third detection model and pre-training the first detection model or the third detection model through the pedestrian data set to obtain the trained first detection model or the trained third detection model.
15. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing the steps in the method of regional intrusion detection according to any of claims 1 to 7 when executing the computer program.
16. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for regional intrusion detection according to any one of claims 1 to 7.
CN202110164424.1A 2021-02-05 2021-02-05 Regional intrusion detection method and device, electronic equipment and storage medium Pending CN112861711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110164424.1A CN112861711A (en) 2021-02-05 2021-02-05 Regional intrusion detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112861711A true CN112861711A (en) 2021-05-28

Family

ID=75988673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110164424.1A Pending CN112861711A (en) 2021-02-05 2021-02-05 Regional intrusion detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112861711A (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107645652A (en) * 2017-10-27 2018-01-30 深圳极视角科技有限公司 A kind of illegal geofence system based on video monitoring
CN109257569A (en) * 2018-10-24 2019-01-22 广东佳鸿达科技股份有限公司 Security protection video monitoring analysis method
CN109598743A (en) * 2018-11-20 2019-04-09 北京京东尚科信息技术有限公司 Pedestrian target tracking, device and equipment
DE102018220274A1 (en) * 2018-11-26 2020-05-28 Osram Gmbh Methods for the joint detection, tracking and classification of objects
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN111696130A (en) * 2019-03-12 2020-09-22 北京京东尚科信息技术有限公司 Target tracking method, target tracking apparatus, and computer-readable storage medium
CN110210474A (en) * 2019-04-30 2019-09-06 北京市商汤科技开发有限公司 Object detection method and device, equipment and storage medium
CN110427905A (en) * 2019-08-08 2019-11-08 北京百度网讯科技有限公司 Pedestrian tracting method, device and terminal
CN110738125A (en) * 2019-09-19 2020-01-31 平安科技(深圳)有限公司 Method, device and storage medium for selecting detection frame by using Mask R-CNN
CN110838133A (en) * 2019-09-27 2020-02-25 深圳云天励飞技术有限公司 Multi-target tracking method and related equipment
CN110889334A (en) * 2019-11-06 2020-03-17 江河瑞通(北京)技术有限公司 Personnel intrusion identification method and device
CN111784750A (en) * 2020-06-22 2020-10-16 深圳日海物联技术有限公司 Method, device and equipment for tracking moving object in video image and storage medium
CN112116635A (en) * 2020-09-17 2020-12-22 赵龙 Visual tracking method and device based on rapid human body movement
CN112233097A (en) * 2020-10-19 2021-01-15 中国科学技术大学 Road scene other vehicle detection system and method based on space-time domain multi-dimensional fusion

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116862980A (en) * 2023-06-12 2023-10-10 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge
CN116862980B (en) * 2023-06-12 2024-01-23 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge

Similar Documents

Publication Publication Date Title
CN109325964B (en) Face tracking method and device and terminal
CN107358149B (en) Human body posture detection method and device
CN105279484B (en) Method for checking object and object test equipment
CN108268867B (en) License plate positioning method and device
CN108256404B (en) Pedestrian detection method and device
WO2021051601A1 (en) Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium
CN110781836A (en) Human body recognition method and device, computer equipment and storage medium
US20060067562A1 (en) Detection of moving objects in a video
CN108009466B (en) Pedestrian detection method and device
CN111860318A (en) Construction site pedestrian loitering detection method, device, equipment and storage medium
CN110533950A (en) Detection method, device, electronic equipment and the storage medium of parking stall behaviour in service
CN111210399B (en) Imaging quality evaluation method, device and equipment
CN111160169B (en) Face detection method, device, equipment and computer readable storage medium
CN112085952A (en) Vehicle data monitoring method and device, computer equipment and storage medium
CN108197544B (en) Face analysis method, face filtering method, face analysis device, face filtering device, embedded equipment, medium and integrated circuit
CN110659391A (en) Video detection method and device
CN112434566A (en) Passenger flow statistical method and device, electronic equipment and storage medium
CN111126411B (en) Abnormal behavior identification method and device
CN112307994A (en) Obstacle identification method based on sweeper, electronic device and storage medium
CN113205510B (en) Railway intrusion foreign matter detection method, device and terminal
CN114359618A (en) Training method of neural network model, electronic equipment and computer program product
CN112861711A (en) Regional intrusion detection method and device, electronic equipment and storage medium
CN111950507B (en) Data processing and model training method, device, equipment and medium
CN113673308A (en) Object identification method, device and electronic system
CN116823884A (en) Multi-target tracking method, system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination