WO2022141962A1 - Intrusion detection method and apparatus, device, storage medium, and program product - Google Patents

Intrusion detection method and apparatus, device, storage medium, and program product

Info

Publication number
WO2022141962A1
WO2022141962A1 (PCT/CN2021/087835)
Authority: WO, WIPO (PCT)
Prior art keywords: detection frame, intrusion, detection, preset, frame
Application number: PCT/CN2021/087835
Other languages: French (fr), Chinese (zh)
Inventors: 朱铖恺, 赵永磊, 武伟, 路少卿, 闫俊杰
Original Assignee: 深圳市商汤科技有限公司
Application filed by 深圳市商汤科技有限公司
Publication of WO2022141962A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Definitions

  • the present disclosure relates to the field of intelligent detection, and in particular, to an intrusion detection method and apparatus, device, storage medium and program product.
  • Pedestrians and non-motor vehicles often enter expressways, by mistake or intentionally, which disrupts the normal driving of vehicles on the road and poses a serious threat to traffic safety.
  • such long-tail events are characterized by a low probability of occurrence within any given period of time, while the image data collected by the camera over that period is massive.
  • deep-learning target detection methods in the related art achieve the target accuracy by increasing training data and model capacity. However, this requires substantial manpower to annotate detection frames and greatly increases the hardware cost of running the algorithm, which is an urgent problem for deploying long-tail event algorithms such as pedestrian intrusion detection.
  • embodiments of the present application provide an intrusion detection method, apparatus, device, and storage medium.
  • an embodiment of the present application provides an intrusion detection method, including: obtaining a to-be-processed image from a to-be-processed video stream; detecting objects in the to-be-processed image to obtain at least one object detection frame; determining whether a preset intrusion object exists in the object detection frame; in the case of determining that a preset intrusion object exists in the object detection frame, identifying the to-be-processed image to obtain an intrusion detection area; and determining, based on the position of the preset intrusion object and the intrusion detection area, whether an intrusion event occurs.
  • the detecting objects in the to-be-processed image to obtain at least one object detection frame includes: detecting objects in the to-be-processed image to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame; when it is determined, based on the category of the object, that the preset intrusion object exists in any of the object detection frames, determining that the preset intrusion object exists in the to-be-processed image; and determining the position of the preset intrusion object based on the position of the object detection frame in which it exists.
  • the approach of first obtaining the object detection frames and then determining whether a preset intrusion object exists in them can effectively detect preset intrusion objects in the image to be processed, and the detection frame can also be used subsequently to determine whether the intrusion object lies within the intrusion detection area.
  • the detector can be used to accurately determine whether the intrusion object is included in the to-be-processed image and the position of the intrusion object.
  • the detecting objects in the image to be processed to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame includes: extracting features from the image to be processed based on a deep convolutional network to obtain a first feature map; generating candidate target regions in the first feature map based on a region generation network to obtain a second feature map, where the second feature map includes at least one detection frame and the position and confidence of each detection frame; performing position-sensitive candidate-region pooling on the first feature map and the second feature map based on a pooling layer; and determining, based on the at least one detection frame and the confidence of each detection frame, a detection frame that satisfies a preset condition as an object detection frame, together with the category of the object in that frame.
  • in this way, object detection in the image to be processed can be realized, obtaining at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame.
  • the determining, based on the at least one detection frame and the confidence of each detection frame, a detection frame that meets a preset condition as an object detection frame includes: using a non-maximum suppression algorithm, based on the confidence of each detection frame and the intersection-over-union between the detection frames, to determine a detection frame satisfying a preset condition as an object detection frame.
  • in this way, using the non-maximum suppression algorithm, a single most suitable object detection frame can ultimately be determined for each object in the image to be processed.
  • using the non-maximum suppression algorithm to determine, based on the confidence of each detection frame and the intersection-over-union between the detection frames, a detection frame satisfying a preset condition as an object detection frame includes: determining, based on the confidences, the detection frame with the highest confidence among the at least one detection frame as a target detection frame; determining the target detection frame as one of the object detection frames; computing the intersection-over-union between the target detection frame and each other detection frame, where the other detection frames are the detection frames among the at least one detection frame other than the target detection frame; deleting, from the at least one detection frame, the other detection frames whose intersection-over-union is greater than a threshold, to obtain a candidate detection frame set; removing the target detection frame from the candidate detection frame set; determining the detection frame with the highest remaining confidence as a new target detection frame; determining the new target detection frame as one of the object detection frames; and computing the intersection-over-union between the new target detection frame and each remaining detection frame, repeating until no detection frames remain.
  • in this way, the non-maximum suppression algorithm suppresses detection frames whose intersection-over-union exceeds the threshold, so that the detection frames satisfying the preset condition are determined as object detection frames and each object in the image to be processed is ultimately assigned a single best detection frame.
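  • The iterative procedure above can be sketched in a few lines of Python. The box format (x1, y1, x2, y2), the scores, and the 0.5 IoU threshold below are illustrative assumptions, not values from the application.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-confidence frame, drop frames overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-confidence remaining detection frame
        keep.append(best)
        # suppress the "other detection frames" whose IoU exceeds the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

  • For example, with boxes [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)] and scores [0.9, 0.8, 0.7], the second box overlaps the first with IoU 0.81 and is suppressed, so the kept indices are [0, 2].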
  • the determining whether the preset intrusion object exists in the object detection frame includes: using a first-level classifier to perform a first classification, based on the category of the object, on the object detection frames corresponding to the target category, to obtain a first classification result; using a second-level classifier cascaded with the first-level classifier to perform, based on the first classification result, a second classification on the object detection frames that meet the preset condition, to obtain a second classification result; and, when it is determined based on the second classification result that the preset intrusion object exists in any of the object detection frames, determining that the preset intrusion object exists in the object detection frame.
  • the first classification can be regarded as a preliminary judgment
  • the second classification can be regarded as a re-judgment. This two-stage approach of a preliminary judgment followed by a re-judgment can effectively improve classification efficiency and reduce the misjudgment rate.
  • the first-level classifier and the second-level classifier have the following relationship: the classification accuracy of the first-level classifier is lower than that of the second-level classifier; the number of convolutional layers in the first-level classifier is less than that in the second-level classifier; and the confidence of the first-level classifier is lower than that of the second-level classifier.
  • using a classifier composed of the first-level classifier and the second-level classifier can effectively improve classification efficiency and confidence and reduce the misjudgment rate while maintaining classification accuracy.
  • the identifying the to-be-processed image to obtain the intrusion detection area includes: using a convolutional neural network model to perform semantic segmentation on the to-be-processed image to obtain the intrusion detection area.
  • the convolutional neural network is used for semantic segmentation of the image to be processed, which automatically identifies the intrusion detection area without manual labeling of the area, making large-scale online application convenient.
  • the determining whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area includes: judging, based on the position of the preset intrusion object, whether the preset intrusion object is located in the intrusion detection area; in response to the preset intrusion object being located in the intrusion detection area, determining that an intrusion event occurs; or, in response to the preset intrusion object being located outside the intrusion detection area, determining that no intrusion event occurs.
  • determining whether the preset intrusion object is located in the intrusion detection area based on the position of the intrusion object can effectively improve the accuracy of that determination.
  • the determining whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area includes: determining the object detection frame in which the preset intrusion object exists as a target detection frame; determining the center point of the bottom edge of the target detection frame as the position of the preset intrusion object; and determining, based on the relative positional relationship between that center point and the intrusion detection area, whether an intrusion event occurs.
  • the method further comprises: in response to the occurrence of the intrusion event, outputting an alarm identification.
  • the intrusion object can be quickly guided to leave the intrusion detection area according to the alarm identification, thereby effectively preventing the intrusion object from entering the intrusion detection area.
  • the method further includes: in response to the occurrence of the intrusion event, recording the intrusion event based on the category of the intrusion object and the intrusion detection area to obtain an intrusion record, and storing the intrusion record or sending it to an associated terminal.
  • intrusion events can be recorded, high-frequency intrusion locations can be identified from the intrusion records, and preventive measures can be strengthened accordingly.
  • an embodiment of the present application provides an intrusion detection device, including: an obtaining module, configured to obtain the to-be-processed image from a to-be-processed video stream; a detection module, configured to detect objects in the to-be-processed image to obtain at least one object detection frame; a first determination module, configured to determine whether a preset intrusion object exists in the object detection frame; an identification module, configured to identify, in the case of determining that a preset intrusion object exists in the object detection frame, the to-be-processed image to obtain an intrusion detection area; and a second determination module, configured to determine, based on the position of the preset intrusion object and the intrusion detection area, whether an intrusion event occurs.
  • an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program that can be run on the processor, and the processor, when executing the program, implements the above intrusion detection method.
  • an embodiment of the present application provides a storage medium storing executable instructions which, when executed by a processor, implement the above intrusion detection method.
  • an embodiment of the present application provides a computer program product, including one or more instructions suitable for being loaded and executed by a processor to implement the above intrusion detection method.
  • the object in the image to be processed is detected to obtain at least one object detection frame, and then it is determined whether a preset intrusion object exists in the object detection frame.
  • the provided detection model and classification model are decoupled, and the classification model can be customized for special scenarios during the implementation of the algorithm to quickly achieve the expected performance.
  • since the detection model and the classification model are decoupled, optimizing for false positives in a new scenario only requires adding false-positive data to train a new classifier, which can then be cascaded with the existing detection model; this suits the rapid upgrade iterations of algorithm deployment. Filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events.
  • FIG. 1 is a schematic flowchart of the implementation of an intrusion detection method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a cascaded classification model provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a semantic segmentation model provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a detection model provided by an embodiment of the present application.
  • FIG. 5 is a display diagram of a smart transportation platform provided by an embodiment of the present application.
  • FIG. 6 is a pedestrian/vehicle intrusion diagram provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an implementation flowchart of an intrusion detection method provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an intrusion detection device according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
  • the embodiment of the present application proposes an intrusion detection method to be applied to a computer device.
  • the computer device may include a removable device or a non-removable device.
  • the functions implemented by the method may be implemented by calling a program code by a processor in the computer device.
  • the program code can be stored in a computer storage medium, and it can be seen that the computer device includes at least a processor and a storage medium.
  • Step S101 obtaining the to-be-processed image from the to-be-processed video stream
  • the video stream acquired by an image acquisition device can be used as input, and the image to be processed is acquired from the video stream. Because of the length of the acquisition period, the acquired video stream data is massive in most cases.
  • the image capturing device may be a camera.
  • the current road image acquisition system can be reused, effectively avoiding the limitations of dedicated hardware; the image to be processed can also be obtained by means of timed snapshots, so that pedestrians and non-motor vehicles entering the expressway can be identified and warned, assisting the traffic police in maintaining expressway order and improving road-network safety.
  • Step S102 detecting objects in the to-be-processed image to obtain at least one object detection frame
  • an object detection model may be used to detect objects in the image to be processed to obtain at least one object detection frame.
  • the target detection model can be trained based on one of Faster R-CNN (Faster Regions with Convolutional Neural Network), YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), or similar networks.
  • the two-step target detection methods represented by Faster R-CNN have the advantage of high detection accuracy but the disadvantage of slow detection speed; the single-step target detection methods represented by YOLO and SSD have the advantage of being faster than the two-step methods.
  • the input of any of the above three types of object detection models may be images to be processed, and at least one object detection frame is output after processing.
  • Step S103 determining whether the preset intrusion object exists in the object detection frame
  • a cascaded classifier model can be used to determine whether a preset intrusion object exists in the object detection frame.
  • the cascaded classifier model may include multiple levels of classifiers, each level completing its own classification task. In this way, the classification result determined by the cascaded classifier model is more accurate than that determined by a single-level classifier model, and classification efficiency is effectively improved.
  • the preset intrusion object may be a pedestrian or a non-motor vehicle.
  • Step S104 in the case of determining that a preset intrusion object exists in the object detection frame, identify the to-be-processed image to obtain an intrusion detection area;
  • the intrusion detection method provided by the embodiments of the present application can be applied to identifying pedestrians or non-motor vehicles that inadvertently or intentionally enter expressways, and can also be applied to other long-tail events such as detecting lost children at kindergarten entrances, people falling into lakes or other waters, or prison breaks. Since such long-tail events have a low probability of occurring within any given period, the image data collected by the camera is massive; if the target area were identified for every image, the computing-power requirement on the system would be very high.
  • the system only recognizes the to-be-processed images that are judged to have intrusion objects, and the recognition method can use a semantic segmentation model.
  • the intrusion detection method provided by this embodiment of the present application is applied to the pedestrian/non-motor vehicle accidental intrusion occurring on a high-speed road as an example for description.
  • Step S105 Determine whether an intrusion event occurs based on the preset position of the intrusion object and the intrusion detection area.
  • the position of the identified preset intrusion object and the intrusion detection area obtained in step S104 are processed to determine whether there is a preset intrusion object in the intrusion detection area.
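  • Taken together, steps S101 to S105 form a pipeline. The sketch below is illustrative: the detector, cascade classifier, segmenter, and area test are passed in as placeholder callables, and all names are hypothetical stand-ins rather than the patent's actual models.

```python
def detect_intrusions(frames, detect_objects, is_intrusion_object,
                      segment_area, in_area):
    """Yield (frame_index, box) for every detected intrusion event."""
    for idx, frame in enumerate(frames):                # S101: images from the video stream
        boxes = detect_objects(frame)                   # S102: object detection frames
        intruders = [b for b in boxes
                     if is_intrusion_object(frame, b)]  # S103: classify each frame
        if not intruders:
            continue                                    # no intruder: skip segmentation
        area = segment_area(frame)                      # S104: intrusion detection area
        for box in intruders:                           # S105: position vs. detection area
            if in_area(box, area):
                yield idx, box
```

  • Because segmentation runs only when step S103 finds a candidate intruder, most frames of the massive video stream never reach the expensive segmentation stage, which reflects the computing-power saving described in the embodiments.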
  • the object in the image to be processed is detected to obtain at least one object detection frame, and then it is determined whether a preset intrusion object exists in the object detection frame.
  • the provided detection model and classification model are decoupled, and the classification model can be customized for special scenarios during the implementation of the algorithm to quickly achieve the expected performance.
  • since the detection model and the classification model are decoupled, optimizing for false positives in a new scenario only requires adding false-positive data to train a new classifier, which can then be cascaded with the existing detection model; this suits rapid upgrade iterations during algorithm deployment, and filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events.
  • the cascaded classification model includes a first-level classifier 220 and a second-level classifier 230, wherein:
  • the first-level classifier 220 includes a first residual network 221 and a first fully connected layer 222, wherein the first residual network 221 performs feature extraction on the image content in the input object detection frame 210 to obtain a feature map P1;
  • the first fully connected layer 222 performs the first classification based on the feature map P1 to obtain a first classification result; based on the first classification result, the object detection frames that do not meet the requirements are filtered out.
  • the second-level classifier 230 includes a second residual network 231 and a second fully connected layer 232, wherein the second residual network 231 performs feature extraction on the image content in the object detection frame when the first classification result satisfies the condition, to obtain a feature map P2; the second fully connected layer 232 performs the second classification based on the feature map P2 to obtain a second classification result; based on the second classification result, object detection frames that do not meet the requirements are filtered out.
  • the second classification result 240 represents the classification result of the object detection frame after being classified by the cascade model.
  • the first residual network 221 in the first-level classifier 220 can use a classification model with a fast classification speed, such as the residual network ResNet18, which can filter out most of the negative samples;
  • the second residual network 231 in the second-level classifier 230 can use a slower but more accurate classification model, such as the residual network ResNet50, to improve accuracy; since it runs only on the candidates that survive the first stage, the overall speed is not much slower while the accuracy improves considerably.
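  • The two-stage filtering can be sketched as below. The scoring callables stand in for the ResNet18- and ResNet50-based classifiers, and the thresholds 0.3 and 0.7 are illustrative assumptions, not values from the application.

```python
def cascade_classify(crops, fast_score, accurate_score, t1=0.3, t2=0.7):
    """Return indices of crops judged to contain a preset intrusion object."""
    # Stage 1 (fast, e.g. a ResNet18-sized model): cheap preliminary judgment
    # that filters out most negative samples.
    survivors = [i for i, c in enumerate(crops) if fast_score(c) >= t1]
    # Stage 2 (slower but more accurate, e.g. a ResNet50-sized model):
    # re-judges only the survivors, so overall speed stays high.
    return [i for i in survivors if accurate_score(crops[i]) >= t2]

# Toy scorers standing in for the two networks:
crops = ["person", "shadow", "bicycle", "lamp_post"]
fast = {"person": 0.9, "shadow": 0.4, "bicycle": 0.8, "lamp_post": 0.1}.get
accurate = {"person": 0.95, "shadow": 0.2, "bicycle": 0.85, "lamp_post": 0.05}.get
result = cascade_classify(crops, fast, accurate)  # stage 1 drops lamp_post, stage 2 drops shadow
```

  • With these toy scorers, only the person and bicycle crops survive both stages, illustrating how the coarse judgment discards most negatives cheaply while the fine judgment keeps the misjudgment rate low.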
  • Step S201 obtaining the to-be-processed image from the to-be-processed video stream
  • Step S202 detecting the object in the to-be-processed image to obtain at least one object detection frame, the position of each of the object detection frames, and the category of the object in each of the object detection frames;
  • the detection model input is a high-speed road image
  • the output is an object detection frame.
  • the objects can be workers, pedestrians, animals, cars, motorcycles, electric bicycles, and the like.
  • in the feature extraction stage, a deep convolutional network is used to extract features from the expressway image, and a region generation network is used to extract candidate object detection frames;
  • in the detection stage, position-sensitive candidate-region pooling, i.e., category classification and coordinate regression, is performed on the candidate frame features obtained in the feature extraction stage, yielding the position of each object detection frame and the category of the object in each object detection frame.
  • Step S203 when it is determined that the preset intrusion object exists in any of the object detection frames based on the category of the object, determine that the preset intrusion object exists in the to-be-processed image;
  • the detection result includes, but is not limited to, at least one of the following: workers, pedestrians, animals, cars, motorcycles, electric bicycles, and the like on the high-speed road.
  • objects classified as pedestrians or non-motor vehicles are preset intrusion objects; that is, when a pedestrian or non-motor vehicle is present in any object detection frame, it is determined that a preset intrusion object exists in the image to be processed.
  • Step S204 determining the position of the preset intrusion object based on the position of the object detection frame where the preset intrusion object exists;
  • the position of the object detection frame where there is a preset intrusion object may be represented by position coordinates. Based on the position coordinates of the object detection frame, the position of the intrusion object in the image to be processed can be determined.
  • Step S205 using a first-level classifier to perform a first classification on the object detection frame corresponding to the target category based on the category of the object, to obtain a first classification result;
  • a cascaded classification model as shown in FIG. 2 may be used, and the first-level classifier 220 may include a first residual network 221 and a first fully connected layer 222 .
  • the first residual network 221 can use the ResNet18 network
  • 18 represents the depth of the network, i.e., 18 layers with weights, including convolutional layers and fully connected layers but excluding pooling layers and batch normalization (Batch Normalization, BN) layers.
  • the ResNet18 network performs feature extraction on the object detection frame to obtain a feature map; the first fully connected layer 222 then performs the first classification based on the feature map to obtain the first classification result, i.e., object detection frames that do not meet the requirements are filtered out in a first filtering pass.
  • the first-level classifier 220 completes the initial judgment of the image, which may also be called coarse judgment; coarse judgment is efficient but has a relatively high misjudgment rate.
  • Step S206 using the second-level classifier cascaded with the first-level classifier, and based on the first classification result, perform a second classification on the object detection frame that meets the preset condition, and obtain a second classification result;
  • the first-level classifier and the second-level classifier have the following relationship: the classification accuracy of the first-level classifier is lower than that of the second-level classifier; the number of convolutional layers in the first-level classifier is less than that in the second-level classifier; and the confidence of the first-level classifier is lower than that of the second-level classifier.
  • the object detection frame is classified using the second-level classifier 230, wherein the second-level classifier 230 includes a second residual network 231 and the second fully connected layer 232.
  • the second residual network 231 can use the ResNet50 network.
  • the ResNet50 network performs feature extraction on the object detection frame to obtain a feature map; the second fully connected layer 232 performs the second classification based on the feature map to obtain the second classification result, i.e., object detection frames that do not meet the requirements are filtered out in a second filtering pass.
  • What the second-level classifier 230 completes is the re-judgment of the object detection frame, which may also be called fine judgment.
  • the fine judgment is characterized by high classification accuracy and low misjudgment rate.
  • Step S207 when it is determined that the preset intrusion object exists in any of the object detection frames based on the second classification result, determine that the preset intrusion object exists in the object detection frame;
  • Step S208 in the case of determining that there is a preset intrusion object in the object detection frame, identify the to-be-processed image to obtain an intrusion detection area;
  • Step S209 determining the object detection frame in which the preset intrusion object exists as the target detection frame
  • Step S210 determining the center point of the bottom edge of the target detection frame as the preset position of the intrusion object
  • the center point of the bottom edge of the target detection frame corresponds to a position coordinate
  • the position coordinate is determined as the position of the intrusion object.
  • alternatively, the position coordinate corresponding to the center point of any edge of the target detection frame may be determined as the position of the intrusion object.
  • Step S211 based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area, determine whether an intrusion event occurs.
  • the position coordinates corresponding to the center point of the bottom edge of the target detection frame may be compared with the position coordinates of the intrusion detection area: if the coordinates fall within the intrusion area, it is determined that an intrusion event has occurred; if they do not, it is determined that no intrusion event has occurred.
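  • A minimal sketch of that comparison, assuming the intrusion detection area is given as a binary mask (1 = expressway, 0 = elsewhere) from the segmentation step; the coordinates and the toy mask below are illustrative.

```python
def bottom_center(box):
    """Bottom-edge center of an (x1, y1, x2, y2) box, with y growing downward."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) // 2, y2)

def is_intrusion(box, road_mask):
    """True if the target detection frame's bottom-center lies in the road area."""
    x, y = bottom_center(box)
    if 0 <= y < len(road_mask) and 0 <= x < len(road_mask[0]):
        return road_mask[y][x] == 1
    return False

# Toy 4x4 mask: top two rows are off-road (0), bottom two rows are road (1).
mask = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
```

  • With this mask, a box whose bottom edge reaches row 2, such as (0, 0, 2, 2), counts as an intrusion, while one whose bottom edge stops at row 1 does not.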
  • the classified object detection frames are obtained, and only the image regions cropped from the matching object detection frames are input into the cascade classifier. In this way, not all images to be processed need to be identified, which significantly reduces the computing-power requirement on the hardware device.
  • the to-be-processed image is identified to obtain the intrusion detection area.
  • the cascade classifier is used for classification in two steps.
  • the first-level classification can be regarded as a preliminary judgment
  • the second-level classification can be regarded as a secondary judgment.
  • this two-stage classification method of a preliminary judgment followed by a secondary judgment can effectively improve classification efficiency and reduce the misjudgment rate.
  • filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events.
  • the semantic segmentation model includes a multi-layer convolution network 302, a multi-layer deconvolution network 303, and an image 304 on which semantic segmentation has been completed, wherein:
  • the multi-layer convolutional network 302, when the number of layers is 5, is a 5-layer convolutional network, which is used to downsample the image to be processed by a factor of 32 while encoding the image to be processed.
  • the multi-layer deconvolution network 303, when the number of layers is 4, is a 4-layer deconvolution network, which is used to upsample the encoding result by a factor of 32 and to decode and semantically understand the encoding result.
  • the image to be processed 301 is input into the convolutional neural network model to obtain the semantically segmented image 304, that is, the expressway area (gray, labeled 1) and the non-expressway area (black, labeled 0) are obtained.
  • marking the expressway area as gray and the non-expressway area as black achieves a visualization effect; marking the expressway area as 1 and the non-expressway area as 0 makes it possible to use the different labels of the areas to quickly identify the location of the intruding object.
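The label-to-color mapping used for visualization can be sketched as follows; the grayscale values (128 for gray, 0 for black) and the function name are illustrative assumptions, only the label convention (1 = expressway, 0 = non-expressway) comes from the text:

```python
import numpy as np

GRAY, BLACK = 128, 0  # assumed display values, not mandated by the method

def labels_to_visualization(label_map):
    """Map segmentation labels to grayscale pixels for display:
    expressway area (label 1) -> gray, non-expressway area (label 0) -> black."""
    return np.where(label_map == 1, GRAY, BLACK).astype(np.uint8)
```

The same integer label map is what the later point-in-area lookup uses; the color image is purely for human inspection.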
  • the detection model includes: a deep convolutional network 402, a region generation network (Region Proposal Network, RPN) 403, a position-sensitive candidate region pooling layer (Position Sensitive Regions of Interest Pooling, PSROIPooling) 404, bounding box regression result 405, and classification result 406.
  • the deep convolutional network 402 is used to perform feature extraction on the to-be-processed image 301 (which is the same image as the to-be-processed image 301 shown in FIG. 3 ) to obtain a first feature map.
  • the region generation network (Region Proposal Network, RPN) 403 is used to generate candidate target regions (object detection frames) on the first feature map to obtain a second feature map; the second feature map includes at least one detection frame and the position and confidence of each detection frame.
  • the position-sensitive candidate region pooling (Position Sensitive Regions of Interest Pooling, PSROIPooling) layer 404 is used to perform position-sensitive candidate region pooling on the simultaneously input first feature map and at least one object detection frame to obtain a frame regression result 405 and a classification result 406. In this embodiment, the classification result 406 realizes the prediction of the detection result, where the detection result includes but is not limited to at least one of the following: staff, pedestrians, animals, vehicles on the expressway, motorcycles, electric bicycles, etc.; the frame regression result 405 predicts the precise coordinates of the detection frame corresponding to the detection result.
  • At least one object detection frame cropped from the to-be-processed image is obtained, and the position, confidence level and category of the object in the object detection frame are determined for each of the object detection frames.
  • Step S401 obtaining the to-be-processed image from the to-be-processed video stream
  • Step S402 performing feature extraction on the to-be-processed image based on a deep convolutional network to obtain a first feature map
  • a detector based on Faster-RCNN (Faster Region-based Convolutional Neural Network) can be used to detect the image to be processed; the input of the Faster-RCNN network is the image to be processed, and at least one object detection frame is output after processing;
  • the first stage uses the deep convolution network 402 to perform feature extraction, wherein the deep convolution network 402 includes: vector convolution operation 1 (conv1), vector convolution operation 2 (conv2), dense vector convolution operation 3 (dense conv3) and dense vector convolution operation 4 (dense conv4); these four deep convolutional layers are used to perform feature extraction on the image.
  • Step S403 based on the region generation network, generate candidate target regions in the first feature map to obtain a second feature map;
  • the second feature map includes at least one detection frame and the position and confidence of each detection frame;
  • candidate target regions are generated in the first feature map to obtain the second feature map.
  • the second feature map includes at least one detection frame, the position of each detection frame, and the confidence level of the detected target.
  • Step S404 in the process of performing position-sensitive candidate region pooling on the first feature map and the second feature map based on the pooling layer, determine, based on the at least one detection frame and the confidence of each detection frame, the detection frame that satisfies the preset condition as the object detection frame, and determine the category of the object in the object detection frame;
  • the first feature map and the second feature map are subjected to position-sensitive candidate region pooling, that is, the first feature map and the second feature map are simultaneously input into the position-sensitive candidate region pooling layer 404 to obtain the frame regression result 405 and the classification result 406; from these, the confidence of the detected target and the position of the detection frame are obtained, the detection frame satisfying the preset condition is determined as the object detection frame, and the category of the object in the object detection frame is determined.
  • Step S405 when it is determined that the preset intrusion object exists in any of the object detection frames based on the category of the object, determine that the preset intrusion object exists in the to-be-processed image;
  • Step S406 determining the position of the preset intrusion object based on the position of the object detection frame where the preset intrusion object exists;
  • Step S407 determining whether the preset intrusion object exists in the object detection frame
  • Step S408 In the case of determining that a preset intrusion object exists in the object detection frame, use a convolutional neural network model to perform semantic segmentation on the to-be-processed image to obtain the intrusion detection area;
  • the convolutional neural network model shown in FIG. 3 includes a multi-layer convolution network 302 and a multi-layer deconvolution network 303 .
  • the multi-layer convolution network 302, when the number of layers is 5, is a 5-layer convolution network, which is used to downsample the image to be processed by a factor of 32 and to encode the image to be processed at the same time;
  • the multi-layer deconvolution network 303, when the number of layers is 4, is a 4-layer deconvolution network, which is used to upsample the encoding result by a factor of 32 and to decode and semantically understand the encoding result.
  • inputting the to-be-processed image 301 into the convolutional neural network model yields the expressway area (gray, labeled 1) and the non-expressway area (black, labeled 0).
  • marking the expressway area as gray and the non-expressway area as black achieves a visualization effect; marking the expressway area as 1 and the non-expressway area as 0 makes it possible to use the different labels of the areas to quickly identify the location of the intruding object.
  • Step S409 judging whether the preset intrusion object is located in the intrusion detection area based on the position of the preset intrusion object;
  • the position of the intrusion object may correspond to a set of position coordinates.
  • a set of position coordinates of the object detection frame is used as the position coordinates of the intrusion object, and the coordinate that best represents the position of the intrusion object is selected from the set of position coordinates as the position of the intrusion object.
  • the position coordinates of the object are compared with the position coordinates of the intrusion detection area to determine whether the intrusion object is located in the intrusion detection area.
  • Step S410 determining that the intrusion event occurs in response to the preset intrusion object being located within the intrusion detection area; or determining that the intrusion event has not occurred in response to the preset intrusion object being located outside the intrusion detection area.
  • the first feature map is obtained based on the deep convolutional network; the second feature map is obtained based on the region generation network; in the pooling process, based on the confidence of at least one detection frame and each detection frame, a detection frame that satisfies a preset condition is determined as an object detection frame, and the category of the object in the object detection frame is determined. In this way, the obtained object detection frame is a detection frame containing a preset intrusion object. If no object detection frame containing an intrusion object is detected, subsequent processing is not required, which can effectively improve the detection efficiency of long-tail events.
  • in the presence of a preset intrusion object, a convolutional neural network model is used to semantically segment the image to be processed, obtaining an intrusion detection area and a non-intrusion detection area distinguished by colors and labels.
  • marking the intrusion detection areas with different colors achieves a visualization effect; marking the areas with different labels makes it possible to quickly identify the location of the intrusion object.
  • determining whether the preset intrusion object is located in the intrusion detection area based on the position of the preset intrusion object can effectively improve the accuracy of that determination.
  • Step S421 obtaining the to-be-processed image from the to-be-processed video stream
  • Step S422 performing feature extraction on the to-be-processed image based on a deep convolutional network to obtain a first feature map
  • Step S423 based on the region generation network, generate candidate target regions in the first feature map to obtain a second feature map;
  • the second feature map includes at least one detection frame and the position and confidence of each detection frame;
  • Step S424 in the process of performing position-sensitive candidate region pooling on the first feature map and the second feature map based on the pooling layer, a non-maximum suppression algorithm is used to determine, based on the confidence of each detection frame and the intersection ratio between the detection frames in the at least one detection frame, the detection frame that meets the preset condition as the object detection frame.
  • This includes: based on the confidence of each detection frame, determining the detection frame with the highest confidence among the at least one detection frame as a target detection frame, and determining the target detection frame as one of the object detection frames;
  • determining the intersection ratio between the target detection frame and each other detection frame, where the other detection frames refer to the detection frames among the at least one detection frame other than the target detection frame; deleting the other detection frames whose intersection ratio is greater than the threshold from the at least one detection frame to obtain a candidate detection frame set;
  • determining the detection frame with the highest confidence in the candidate detection frame set, excluding the target detection frame, as a new target detection frame, and determining the new target detection frame as one of the object detection frames; determining the intersection ratio between the new target detection frame and each new other detection frame,
  • where each new other detection frame refers to a detection frame in the candidate detection frame set other than the new target detection frame; deleting the new other detection frames whose intersection ratio is greater than the threshold from the candidate detection frame set to obtain a new candidate detection frame set; and so on, until the object detection frames are obtained.
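The iterative procedure above is standard greedy non-maximum suppression; a minimal sketch follows. The box format (x1, y1, x2, y2) and the 0.5 threshold are illustrative assumptions:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union (intersection ratio) of boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS as in step S424: repeatedly keep the highest-confidence
    frame as the target detection frame and delete the remaining frames
    whose IoU with it exceeds the threshold."""
    order = np.argsort(scores)[::-1].tolist()  # indices, highest confidence first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Each pass of the loop corresponds to one "new target detection frame / new candidate detection frame set" iteration described in the steps above.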
  • Step S425 in the case that the preset intrusion object exists in any of the object detection frames based on the category of the object, determine that the preset intrusion object exists in the to-be-processed image;
  • Step S427 determining whether the preset intrusion object exists in the object detection frame
  • Step S429 Determine whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area.
  • Step S430 outputting an alarm identifier in response to the occurrence of the intrusion event
  • Step S431 in response to the occurrence of the intrusion event, record the intrusion event based on the category of the intrusion object and the intrusion detection area to obtain an intrusion record;
  • Step S432 Store the intrusion record or send it to an associated terminal.
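Steps S430-S432 (alarm, record, store/send) could be sketched as below. The record fields and JSON serialization are hypothetical choices for illustration; the method itself only requires that the category and intrusion detection area be recorded:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class IntrusionRecord:
    """Hypothetical intrusion record; field names are illustrative."""
    category: str   # e.g. "pedestrian" or "non-motor vehicle"
    area: str       # identifier of the intrusion detection area
    timestamp: str

def make_intrusion_record(category, area, now=None):
    """Record the intrusion event based on the intrusion object's category
    and the intrusion detection area (step S431)."""
    now = now or datetime.now().isoformat(timespec="seconds")
    return IntrusionRecord(category=category, area=area, timestamp=now)

def serialize_record(record):
    """Serialize the record for storage or for sending to an associated
    terminal (step S432)."""
    return json.dumps(asdict(record))
```

Such records are what make the later analysis of high-frequency intrusion points possible.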
  • the confidence level of the detected object and the position of the detection frame are obtained first. Then, a non-maximum suppression algorithm is used to merge the detection frames whose intersection ratio is greater than the threshold, and the detection frame that meets the preset condition is determined as the object detection frame. In this way, using the non-maximum suppression algorithm, a most suitable object detection frame can finally be determined for each object in the image to be processed.
  • an alarm flag is output; the intrusion event is recorded based on the category of the intrusion object and the intrusion detection area, and the intrusion record is obtained; the intrusion record is stored or sent to the associated terminal.
  • the intrusion object can be quickly guided to leave the intrusion detection area according to the alarm identification, thereby effectively preventing the intrusion object from entering the intrusion detection area.
  • Intrusion events can also be recorded, and high-frequency points of intrusion by intrusion objects can be found according to the intrusion records, and preventive measures can be strengthened.
  • Pedestrians/non-motor vehicles often enter by mistake or intentionally on expressways, which affects the normal driving of vehicles on the road and has a great impact on traffic safety.
  • Video patrol needs to conduct real-time, active detection of pedestrians/non-motor vehicles on the road.
  • If pedestrians/non-motor vehicles are found within the driving range of the expressway, relevant early warnings should be issued in time, and the traffic police department should be notified to respond promptly and guide and urge the pedestrians/non-motor vehicles to leave the driving area of the expressway, eliminating hidden dangers of road driving and improving the road driving safety index.
  • the early video patrol system mainly relied on human judges polling the video from the image acquisition equipment to detect pedestrians who strayed onto the expressway, and taking corresponding measures.
  • Although this scheme can detect pedestrians entering by mistake, the research-and-judgment efficiency is low, omissions are likely to occur, and the real-time performance of polling is poor.
  • the target detection algorithm has been greatly improved. It is used to pre-screen pedestrians appearing in images and videos, which greatly improves the work efficiency of judges.
  • data-driven target detection solutions based on deep learning have been proposed, further improving the accuracy and recall rate of pedestrian intrusion detection. How the algorithm accuracy can reach or even surpass manual judgment has become a research hotspot.
  • pedestrian intrusion is not a common event on expressways, which places high requirements on the accuracy of detection algorithms; for example, 99% accuracy means that only one false positive is allowed within a hundred incidents.
  • Although the target detection method based on deep learning can theoretically achieve the target accuracy by increasing the training data and model capacity, it requires a lot of manpower to label the detection frames and greatly increases the hardware cost of running the algorithm, which is an urgent problem to be solved in deploying pedestrian intrusion detection algorithms.
  • pedestrian intrusion detection requires a prohibited-intrusion area to be set, and methods based on manually demarcating the area entail a large amount of redundant operation and maintenance work in large-scale applications.
  • Fig. 5 is a display diagram of an intelligent transportation platform provided by an embodiment of the present application. As shown in Fig. 5, an intelligent transportation platform 501 is used to display the intrusion images of pedestrians/non-motor vehicles identified on the Kuaigao Expressway using the intrusion detection method.
  • FIG. 6 is a pedestrian/vehicle intrusion diagram provided by an embodiment of the present application.
  • the pedestrian/vehicle intrusion view 601 is displayed after clicking the pedestrian/non-motor vehicle intrusion image shown in FIG. 5, showing a magnified image of the pedestrian/non-motor vehicle intrusion and image details, such as the time and location of the intrusion.
  • FIG. 7 is a schematic diagram of the implementation flow of an intrusion detection method provided by an embodiment of the present application.
  • As shown in FIG. 7, the time axis involves six moments, from T1 to T6 in sequence, and the workflow is described as follows:
  • Step S700 input the image to be processed into the detector at time T1 to obtain at least one candidate pedestrian detection frame
  • the image to be processed may be an original image or an image after preprocessing the original image.
  • the processing process of step S700 is divided into a feature extraction stage and a detection stage:
  • In the feature extraction stage, the deep convolution network 402 shown in FIG. 4 is used, including vector convolution operation 1 (conv1), vector convolution operation 2 (conv2), dense vector convolution operation 3 (dense conv3) and dense vector convolution operation 4 (dense conv4).
  • the above four depth convolutional layers can be used to perform feature extraction on the image to be processed to obtain a first feature map, and based on the region generation network 403, a candidate target region is generated in the first feature map to obtain a second feature map.
  • the second feature map includes at least one detection frame, the position of each detection frame, and the confidence level of the detected target.
  • the first feature map and the second feature map are subjected to position-sensitive candidate region pooling, that is, they are simultaneously input into the position-sensitive candidate region pooling layer 404 to obtain the frame regression result 405 and the classification result 406; after this processing, the confidence of the detected target and the position of the detection frame can be obtained, the detection frame that meets the preset conditions is determined as the object detection frame, and the category of the object in the object detection frame is determined.
  • the detection frames whose intersection ratio is greater than the threshold are merged, and the object detection frame (candidate pedestrian detection frame) that meets the requirements is output.
  • Step S701 cutting out each candidate pedestrian detection frame at time T2, inputting it to the cascade classifier, and obtaining a classification result;
  • the cascade classifier can be obtained through training, for example, in the training phase: collect 300,000 small pictures of detection alarms, including 60,000 positive samples and 240,000 negative samples.
  • the pedestrian/non-pedestrian binary classification is performed on these data, and then the stochastic gradient descent algorithm is used to train the ResNet18 network and the ResNet50 network respectively.
  • the first residual network 211 can use the ResNet18 network to realize the rough judgment of the image, which has the characteristics of high judgment efficiency and high error rate;
  • the second residual network 231 can use the ResNet50 network to realize fine judgment of the image, with a low misjudgment rate.
  • the classification improves the accuracy step by step, and finally obtains high-precision pedestrian detection results.
  • Step S702 at time T3, determine whether the cascaded classification model has acquired a valid image
  • with the cascade classification model, it can be judged whether the required valid image exists in the image to be processed. For example, to judge pedestrians prohibited from intruding on an expressway, at time T3 it can be determined whether the cascade classification model has obtained an image of an intruding pedestrian. When no valid image is obtained, there is no need to identify the target area (expressway area) of the image to be processed, and the process ends. In this way, the target area does not need to be identified for every input original image, which significantly saves the computing power of the hardware device.
  • Step S703 when it is determined that the cascaded classifier has acquired a valid image, input the to-be-processed image into the semantic segmentation model at time T4 to obtain a pedestrian prohibited entry area on a high expressway;
  • the input of the semantic segmentation model is the to-be-processed image 301, which is first downsampled by a factor of 32 through a 5-layer convolutional network (conv1, conv2, conv3, conv4 and conv5) 302 while being encoded; the encoding result is then upsampled by a factor of 32 through 4 layers of deconvolution (dconv1, dconv2, dconv3 and dconv4) 303 and is decoded and semantically understood, obtaining the expressway area (gray, labeled 1) and the non-expressway area (black, labeled 0).
  • marking the expressway area as gray and the non-expressway area as black achieves a visualization effect; marking the expressway area as 1 and the non-expressway area as 0 makes it possible to use the different labels of the areas to quickly identify the location of the intruding object.
  • Step S704 at time T5, according to the pedestrian classification result and the pedestrian prohibited entry area of the expressway, determine whether the pedestrian has entered the prohibited area;
  • the pedestrian detection frame and the semantic segmentation map can be obtained.
  • the semantic segmentation result is a two-dimensional matrix G.
  • let the upper-left point of the pedestrian detection frame be (x1, y1) and the lower-right point be (x2, y2); the semantic segmentation result is the two-dimensional matrix G.
  • the center point of the bottom edge of the pedestrian detection frame, ((x1 + x2)/2, y2), can be selected as the pedestrian positioning point to determine whether the pedestrian is intruding, that is, the following formula (1): the pedestrian is determined to be intruding when G[y2][(x1 + x2)/2] = 1.
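Formula (1) amounts to a single lookup in the segmentation matrix; a minimal sketch follows, assuming integer pixel coordinates that lie within G and the label convention 1 = expressway, 0 = non-expressway:

```python
def pedestrian_intrudes(G, box):
    """Formula (1): take the center of the bottom edge of the pedestrian
    detection frame (x1, y1, x2, y2) as the positioning point and look it
    up in the segmentation matrix G (1 = expressway, 0 = non-expressway)."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) // 2          # column index of the bottom-edge center
    return G[y2][cx] == 1        # intruding iff the point lies in the expressway area
```

Using the bottom-edge center rather than the frame center reflects where the pedestrian's feet touch the road surface, which is what the segmented area actually describes.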
  • Step S705, at time T6, output the result of pedestrian intrusion.
  • the images of pedestrian/non-motor vehicle intrusions identified on Kuaigao Expressway using the intrusion detection method are shown.
  • the enlarged image and image details of the pedestrian/non-motor vehicle intrusion are displayed, such as the time and location of the pedestrian/non-motor vehicle intrusion.
  • the embodiment of the present application proposes an intrusion detection method for cascading event detection.
  • pedestrian detection is performed based on the detection model at time T1
  • candidate targets are filtered through the cascade classifier at time T2, and then judged at time T3.
  • the to-be-processed image with an intrusion object will be semantically segmented to determine whether the target appears in the prohibited intrusion area.
  • at time T5, the object detection frame and the intrusion detection area are input simultaneously for judgment, and the judgment is completed at time T6.
  • the scheme realizes fully automatic pedestrian intrusion detection on high-speed roads without significantly increasing the computing power requirement. In this way, no-entry areas are identified using semantic segmentation without human annotation.
  • the algorithm modules for detection, classification, and segmentation can be upgraded independently. For long-tail event detection, the algorithm's computing power requirements are reduced.
  • the embodiments of the present application provide an intrusion detection apparatus; the modules included in the apparatus and the submodules included in each module can be implemented by a processor in a computer device, or, of course, by a specific logic circuit; in the course of implementation, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
  • FIG. 8 is a schematic diagram of the structure and composition of the intrusion detection device according to an embodiment of the present application. As shown in FIG. 8 , the device 800 includes:
  • Obtaining module 810 configured to obtain the to-be-processed image from the to-be-processed video stream
  • a detection module 820 configured to detect objects in the to-be-processed image to obtain at least one object detection frame
  • a first determining module 830 configured to determine whether the preset intrusion object exists in the object detection frame
  • the identification module 840 is configured to identify the to-be-processed image when it is determined that a preset intrusion object exists in the object detection frame to obtain an intrusion detection area;
  • the second determination module 850 is configured to determine whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area.
  • the detection module 820 includes a detection sub-module, a first determination sub-module and a second determination sub-module, wherein the detection sub-module is configured to detect an object in the image to be processed to obtain at least one object detection frame, the position of each of the object detection frames, and the category of the object in each of the object detection frames; the first determination sub-module is configured to determine, based on the category of the object, whether the preset intrusion object exists in any one of the object detection frames;
  • the second determination sub-module is configured to determine the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists.
  • the detection sub-module includes a deep convolutional network, a region generation network and a pooling layer, wherein the deep convolutional network is configured to perform feature extraction on the to-be-processed image to obtain the first feature map;
  • the region generation network is configured to generate candidate target regions in the first feature map to obtain a second feature map
  • the second feature map includes at least one detection frame and the position and confidence of each detection frame;
  • the pooling layer is configured to perform position-sensitive candidate region pooling on the first feature map and the second feature map, to determine, based on the at least one detection frame and the confidence of each detection frame, the detection frame that satisfies the preset condition as the object detection frame, and to determine the category of the object in the object detection frame.
  • the detection sub-module includes a non-maximum suppression algorithm unit, which uses a non-maximum suppression algorithm to determine, based on the confidence of each of the detection frames and the intersection ratio between the detection frames in the at least one detection frame, the detection frame that meets the preset condition as the object detection frame.
  • the non-maximum value suppression algorithm unit includes a first determination subunit, a second determination subunit, a deletion subunit, a third determination subunit, and a fourth determination subunit, wherein the first determination subunit is configured to determine, based on the confidence of each of the detection frames, the detection frame with the highest confidence among the at least one detection frame as a target detection frame, and to determine the target detection frame as an object detection frame; the second determination subunit is configured to determine the intersection ratio between the target detection frame and each other detection frame, wherein the other detection frames refer to the detection frames among the at least one detection frame other than the target detection frame;
  • the deletion subunit is configured to delete other detection frames whose intersection ratio is greater than a threshold from the at least one detection frame to obtain a candidate detection frame set;
  • the third determination subunit is configured to determine the detection frame with the highest confidence in the candidate detection frame set, except the target detection frame, as a new target detection frame, and to determine the new target detection frame as one of the object detection frames;
  • the fourth determination subunit is configured to determine the intersection ratio of the new target detection frame and each new other detection frame; wherein, the each new other detection frame refers to the a detection frame other than the new target detection frame in the candidate detection frame set;
  • the deletion subunit is further configured to delete other new detection frames whose intersection ratio is greater than a threshold from the candidate detection frame set, A new set of candidate detection frames is obtained; and by analogy, the object detection frame is obtained.
  • the first determination module 830 includes a first classification sub-module, a second classification sub-module and a third determination sub-module, wherein the first classification sub-module uses a first-level classifier to perform, based on the category of the object, a first classification on the object detection frame corresponding to the target category to obtain a first classification result; the second classification sub-module adopts a second-level classifier cascaded with the first-level classifier and performs, based on the first classification result, a second classification on the object detection frame that meets the preset condition to obtain a second classification result; the third determination sub-module determines, based on the second classification result, whether the preset intrusion object exists in any one of the object detection frames.
  • the first-level classifier and the second-level classifier have the following relationship: the classification accuracy of the first-level classifier is lower than the classification accuracy of the second-level classifier; the first-level classification The number of convolutional layers in the classifier is less than the number of convolutional layers in the second-level classifier; the confidence level of the first-level classifier is lower than that of the second-level classifier.
  • the identification module 840 is further configured to use a convolutional neural network model to perform semantic segmentation on the to-be-processed image to obtain the intrusion detection area.
  • the second determination module 850 includes a determination sub-module and a fourth determination sub-module, wherein the determination sub-module is configured to determine, based on the position of the preset intrusion object, whether the preset intrusion object is located in the intrusion detection area; and the fourth determination sub-module is configured to determine that the intrusion event occurs in response to the preset intrusion object being located inside the intrusion detection area, or to determine that the intrusion event does not occur in response to the preset intrusion object being located outside the intrusion detection area.
  • the second determination module 850 further includes a fifth determination sub-module, a sixth determination sub-module and a seventh determination sub-module, wherein the fifth determination sub-module is configured to determine the object detection frame in which the preset intrusion object exists as the target detection frame; the sixth determination sub-module is configured to determine the center point of the bottom edge of the target detection frame as the position of the preset intrusion object; and the seventh determination sub-module is configured to determine whether an intrusion event occurs based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area.
  • the intrusion detection apparatus further includes an output module configured to output an alarm identification in response to the occurrence of the intrusion event.
  • the intrusion detection apparatus further includes a recording module and a sending module, wherein the recording module is configured to, in response to the occurrence of the intrusion event, record the intrusion event based on the category of the intrusion object and the intrusion detection area, to obtain an intrusion record;
  • the sending module is configured to store or send the intrusion record to an associated terminal.
  • if the above-mentioned intrusion detection method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence or in the parts contributing over the related art, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a robot, a server, etc.) to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.
  • the embodiments of the present application are not limited to any specific combination of hardware and software.
  • the embodiments of the present application provide a computer-readable storage medium, which may be a volatile or non-volatile storage medium, on which a computer program is stored; when the computer program is executed by a processor, the steps in the intrusion detection method provided in the foregoing embodiments are implemented.
  • FIG. 9 is a schematic diagram of a hardware entity of the computer device according to the embodiment of the present application.
  • as shown in FIG. 9, the hardware entity of the computer device 900 includes a memory 901 and a processor 902; the memory 901 stores a computer program that can run on the processor 902, and when the processor 902 executes the program, the steps in the intrusion detection method provided in the above embodiments are implemented.
  • the memory 901 is configured to store instructions and applications executable by the processor 902, and can also cache data to be processed or already processed by the processor 902 and by various modules in the computer device 900 (e.g., image data, audio data, voice communication data and video communication data); it can be implemented by a flash memory (FLASH) or a random access memory (RAM).
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined, or may be integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit; it may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist separately as one unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented either in the form of hardware or in the form of hardware plus software functional units.
  • the aforementioned program can be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are performed; and the aforementioned storage medium includes media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
  • if the above-mentioned integrated units of the present application are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a robot, a server, etc.) to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
  • the to-be-processed image is obtained from the to-be-processed video stream, so that images in the video stream collected by an image capture device can be used as input to analyze the video stream, which can effectively improve the utilization rate of the image capture device; first, the objects in the to-be-processed image are detected to obtain at least one object detection frame, and then it is determined whether a preset intrusion object exists in the object detection frame. In this way, the detection model and the classification model are decoupled, and the classification model can be customized for special scenarios during algorithm deployment to quickly reach the expected performance, removing the dependence on the accuracy of a single detection model and greatly improving the speed and accuracy of the algorithm.
  • since the detection model and the classification model are decoupled, to optimize false positives in a new scenario it is only necessary to add false-positive data to train a new classifier and cascade it with the existing detection model, which suits rapid upgrade iteration during algorithm deployment; filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events. When it is determined that the preset intrusion object exists in the object detection frame, the to-be-processed image is identified to obtain an intrusion detection area, and whether an intrusion event occurs is determined based on the position of the preset intrusion object and the intrusion detection area.
  • the identification of the intrusion detection area is performed only on the to-be-processed images confirmed to contain intrusion objects, without identifying all the to-be-processed images, which can significantly reduce the computing power required of hardware devices, thereby efficiently and fully automatically detecting whether an intrusion object has entered the intrusion detection area, without manually marking the area, which facilitates large-scale online application.
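The compute-saving gating described above (the segmentation step runs only on frames in which a preset intrusion object has already been confirmed) can be sketched as follows. This is an illustrative sketch only: `detect`, `classify`, `segment` and `is_inside` are hypothetical stand-ins for the detection model, cascaded classifier, segmentation model and region test described in this disclosure, not their actual implementations.

```python
def process_frame(frame, detect, classify, segment, is_inside):
    """Gated intrusion check for one frame.

    detect(frame)        -> list of (box, category) candidates   (assumed interface)
    classify(box)        -> True if the box holds a preset intrusion object
    segment(frame)       -> intrusion detection area             (assumed interface)
    is_inside(box, area) -> True if the box position lies in the area
    """
    intruders = [box for box, cat in detect(frame) if classify(box)]
    if not intruders:
        # No intrusion object: the costly segmentation step is skipped entirely.
        return False
    # Segmentation runs only on frames that contain a confirmed intrusion object.
    area = segment(frame)
    return any(is_inside(box, area) for box in intruders)
```

Counting calls to `segment` over a video stream makes the compute saving measurable: frames without intrusion objects never trigger segmentation.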

Abstract

An invasion detection method and apparatus, a device, a storage medium, and a program product. The method comprises: obtaining an image to be processed from a video stream to be processed (S101); detecting objects in said image, and obtaining at least one object detection box (S102); determining whether a preset invading object is present in the object detection box (S103); when it is determined that the preset invading object is present in the object detection box, performing recognition on said image, and obtaining an invasion detection area (S104); and determining, on the basis of the location of the preset invading object and the invasion detection area, whether an invasion event has occurred (S105).

Description

Intrusion detection method, apparatus, device, storage medium and program product
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure is based on, and claims priority to, the Chinese patent application No. 202011620177.3, filed on December 31, 2020 and entitled "Intrusion Detection Method, Device, Equipment and Storage Medium", the entire contents of which are hereby incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to the field of intelligent detection, and in particular, to an intrusion detection method and apparatus, a device, a storage medium and a program product.
Background
Pedestrians and non-motor vehicles often enter expressways by mistake or intentionally, affecting the normal driving of vehicles on the road and greatly impacting traffic safety. Such long-tail events are characterized by a low probability of occurrence within a given time period, while the image data collected by cameras is massive. Deep-learning target detection methods in the related art achieve the target accuracy by increasing training data and model capacity, but this requires a lot of manpower for detection-frame annotation and greatly increases the hardware cost of running the algorithm, which is an urgent problem for deploying long-tail event algorithms such as pedestrian intrusion.
Summary of the Invention
In view of this, embodiments of the present application provide an intrusion detection method, apparatus, device, and storage medium.
The technical solutions of the embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides an intrusion detection method, including: obtaining a to-be-processed image from a to-be-processed video stream; detecting objects in the to-be-processed image to obtain at least one object detection frame; determining whether a preset intrusion object exists in the object detection frame; in the case of determining that the preset intrusion object exists in the object detection frame, identifying the to-be-processed image to obtain an intrusion detection area; and determining whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area.
In some embodiments, detecting the objects in the to-be-processed image to obtain at least one object detection frame includes: detecting the objects in the to-be-processed image to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame; in a case where it is determined, based on the category of the object, that the preset intrusion object exists in any of the object detection frames, determining that the preset intrusion object exists in the to-be-processed image; and determining the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists.
In this way, the method of first obtaining the object detection frames and then determining that a preset intrusion object exists in an object detection frame can effectively detect the preset intrusion object in the to-be-processed image, and the detection frame can also be used to subsequently judge whether the intrusion object is located in the intrusion detection area. This enables the detector to accurately determine whether the to-be-processed image includes an intrusion object, as well as the position of that object.
In some embodiments, detecting the objects in the to-be-processed image to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame includes: performing feature extraction on the to-be-processed image based on a deep convolutional network to obtain a first feature map; generating candidate target regions in the first feature map based on a region proposal network to obtain a second feature map, the second feature map including at least one detection frame and the position and confidence of each detection frame; and, in the process of performing position-sensitive candidate-region pooling on the first feature map and the second feature map in a pooling layer, determining, based on the at least one detection frame and the confidence of each detection frame, the detection frames that meet a preset condition as object detection frames, and determining the category of the object in each object detection frame.
In this way, based on the deep convolutional network, the region proposal network and the pooling layer, the objects in the to-be-processed image can be detected to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame.
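As a rough illustration of the final selection step of such a pipeline (not the disclosed implementation), the sketch below assumes the proposal stage yields candidates carrying a confidence and per-class scores, and keeps those meeting a preset confidence threshold as object detection frames with the arg-max category. The proposal dictionary layout is an assumption made for the example.

```python
def select_object_frames(proposals, conf_threshold=0.5):
    """Keep proposals whose confidence meets the preset condition and
    attach the highest-scoring category to each surviving frame.

    proposals: list of dicts {"box": (x1, y1, x2, y2), "confidence": float,
               "class_scores": {category: score}} -- assumed structure.
    """
    frames = []
    for p in proposals:
        if p["confidence"] >= conf_threshold:
            # The object's category is the class with the highest score.
            category = max(p["class_scores"], key=p["class_scores"].get)
            frames.append({"box": p["box"], "category": category,
                           "confidence": p["confidence"]})
    return frames
```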
In some embodiments, determining the detection frames that meet the preset condition as object detection frames based on the at least one detection frame and the confidence of each detection frame includes: using a non-maximum suppression algorithm to determine the detection frames that meet the preset condition as object detection frames, based on the confidence of each detection frame and the intersection-over-union between the detection frames in the at least one detection frame.
In this way, by using the non-maximum suppression algorithm, one most suitable object detection frame can finally be determined for each object in the to-be-processed image.
In some embodiments, using the non-maximum suppression algorithm to determine the detection frames that meet the preset condition as object detection frames, based on the confidence of each detection frame and the intersection-over-union between the detection frames, includes: determining, based on the confidence of each detection frame, the detection frame with the highest confidence among the at least one detection frame as the target detection frame; determining the target detection frame as an object detection frame; determining the intersection-over-union between the target detection frame and each other detection frame, where the other detection frames refer to the detection frames among the at least one detection frame other than the target detection frame; deleting the other detection frames whose intersection-over-union is greater than a threshold from the at least one detection frame to obtain a candidate detection frame set; determining the detection frame with the highest confidence in the candidate detection frame set, other than the target detection frame, as a new target detection frame; determining the new target detection frame as an object detection frame; determining the intersection-over-union between the new target detection frame and each new other detection frame, where each new other detection frame refers to a detection frame in the candidate detection frame set other than the new target detection frame; deleting the new other detection frames whose intersection-over-union is greater than the threshold from the candidate detection frame set to obtain a new candidate detection frame set; and so on, until the object detection frames are obtained.
In this way, by using the non-maximum suppression algorithm and suppressing the detection frames whose intersection-over-union is greater than the threshold, the detection frames that meet the preset condition can be determined as object detection frames, so that one most suitable object detection frame is finally determined for each object in the to-be-processed image.
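The greedy procedure described above is standard non-maximum suppression; a minimal reference sketch (boxes as `(x1, y1, x2, y2)` tuples, written for illustration rather than as the patented implementation) is:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression: pick the highest-confidence box,
    delete every remaining box whose IoU with it exceeds the threshold,
    and repeat on the remainder. Returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

Each iteration mirrors one round of the claim language: the kept index is the "target detection frame", and the filtered `order` list is the new "candidate detection frame set".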
In some embodiments, determining whether the preset intrusion object exists in the object detection frame includes: using a first-level classifier to perform, based on the category of the object, a first classification on the object detection frames corresponding to the target category to obtain a first classification result; using a second-level classifier cascaded with the first-level classifier to perform, based on the first classification result, a second classification on the object detection frames that meet the preset condition to obtain a second classification result; and, when it is determined based on the second classification result that the preset intrusion object exists in any of the object detection frames, determining that the preset intrusion object exists in the object detection frame.
In this way, the first classification can be regarded as a preliminary judgment and the second classification as a re-judgment; this two-stage classification, a preliminary judgment followed by a second judgment, can effectively improve classification efficiency and reduce the misjudgment rate.
In some embodiments, the first-level classifier and the second-level classifier have the following relationship: the classification accuracy of the first-level classifier is lower than that of the second-level classifier; the number of convolutional layers in the first-level classifier is smaller than that in the second-level classifier; and the confidence of the first-level classifier is lower than that of the second-level classifier.
In this way, the classifier composed of the first-level classifier and the second-level classifier can effectively improve classification efficiency and confidence and reduce the misjudgment rate while ensuring classification accuracy.
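A minimal sketch of such a two-level cascade follows. The `coarse` and `fine` classifiers and both thresholds are hypothetical placeholders: the cheap, shallower first-level classifier filters out easy negatives with a low threshold, and only its survivors are passed to the deeper, higher-precision second-level classifier.

```python
def cascade_classify(crops, coarse, fine, coarse_threshold=0.3, fine_threshold=0.7):
    """Two-stage cascade sketch.

    coarse(crop) / fine(crop) -> intrusion-object probability (assumed interfaces).
    Returns the indices of crops confirmed as the preset intrusion object.
    """
    # Stage 1: cheap classifier rejects obvious negatives (preliminary judgment).
    survivors = [i for i, c in enumerate(crops) if coarse(c) >= coarse_threshold]
    # Stage 2: expensive classifier confirms the remainder (re-judgment).
    return [i for i in survivors if fine(crops[i]) >= fine_threshold]
```

Because the second stage sees only the first stage's survivors, most frames pay only the cheap classifier's cost, which is the efficiency gain described above.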
In some embodiments, identifying the to-be-processed image to obtain the intrusion detection area includes: performing semantic segmentation on the to-be-processed image using a convolutional neural network model to obtain the intrusion detection area.
In this way, using a convolutional neural network to perform semantic segmentation on the to-be-processed image realizes automatic identification of the intrusion detection area, without manually marking the area, which facilitates large-scale online application.
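Assuming the segmentation network's head outputs per-pixel class scores, the intrusion detection area can be read off as the set of pixels whose arg-max class is the area's class. The grid-of-score-lists representation below is a toy stand-in for the CNN output, chosen only to illustrate the post-processing step:

```python
def segment_region(score_map, region_class):
    """Turn per-pixel class scores into a binary mask of the intrusion
    detection area.

    score_map: 2-D grid where each cell is a list of per-class scores
               (assumed CNN head output); region_class: class id of the area.
    """
    mask = []
    for row in score_map:
        # Arg-max class per pixel; mark pixels belonging to the area class.
        mask.append([1 if max(range(len(s)), key=s.__getitem__) == region_class else 0
                     for s in row])
    return mask
```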
In some embodiments, determining whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area includes: judging, based on the position of the preset intrusion object, whether the preset intrusion object is located in the intrusion detection area; determining that the intrusion event occurs in response to the preset intrusion object being located inside the intrusion detection area; or determining that the intrusion event does not occur in response to the preset intrusion object being located outside the intrusion detection area.
In this way, judging whether the preset intrusion object is located in the intrusion detection area based on its position can effectively improve the accuracy of determining that the preset intrusion object is located in the intrusion detection area.
In some embodiments, determining whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area includes: determining the object detection frame in which the preset intrusion object exists as the target detection frame; determining the center point of the bottom edge of the target detection frame as the position of the preset intrusion object; and determining whether an intrusion event occurs based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area.
In this way, whether an intrusion event occurs is determined based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area: the position coordinates corresponding to this center point are compared with the position coordinates of the intrusion detection area, and when the coordinates are determined to fall within the intrusion area, an intrusion event is determined to have occurred, which can effectively improve the accuracy of determining intrusion events.
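The bottom-edge midpoint test can be sketched with a standard ray-casting point-in-polygon check. Representing the intrusion detection area as a polygon of `(x, y)` vertices is an assumption made for this illustration; image coordinates are taken with y growing downward, so the bottom edge has the larger y.

```python
def bottom_center(box):
    """Bottom-edge midpoint of a detection frame (x1, y1, x2, y2),
    used as the position of the intrusion object."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, max(y1, y2))

def point_in_polygon(pt, polygon):
    """Ray-casting test: is the point inside the polygon describing
    the intrusion detection area? polygon: list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def intrusion_occurred(box, area_polygon):
    """An intrusion event occurs when the bottom-edge midpoint of the
    target detection frame falls inside the intrusion detection area."""
    return point_in_polygon(bottom_center(box), area_polygon)
```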
In some embodiments, the method further includes: outputting an alarm identification in response to the occurrence of the intrusion event.
In this way, the intrusion object can be quickly guided to leave the intrusion detection area according to the alarm identification, effectively preventing the intrusion object from entering the intrusion detection area.
In some embodiments, the method further includes: in response to the occurrence of the intrusion event, recording the intrusion event based on the category of the intrusion object and the intrusion detection area to obtain an intrusion record; and storing the intrusion record or sending it to an associated terminal.
In this way, intrusion events can be recorded, high-frequency intrusion locations can be found from the intrusion records, and preventive measures can be strengthened accordingly.
In a second aspect, an embodiment of the present application provides an intrusion detection apparatus, including: an obtaining module configured to obtain the to-be-processed image from a to-be-processed video stream; a detection module configured to detect the objects in the to-be-processed image to obtain at least one object detection frame; a first determination module configured to determine whether the preset intrusion object exists in the object detection frame; an identification module configured to identify the to-be-processed image to obtain an intrusion detection area when it is determined that the preset intrusion object exists in the object detection frame; and a second determination module configured to determine whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area.
In a third aspect, an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the above intrusion detection method when executing the program.
In a fourth aspect, an embodiment of the present application provides a storage medium storing executable instructions that, when executed by a processor, implement the above intrusion detection method.
In a fifth aspect, an embodiment of the present application provides a computer program product, including one or more instructions suitable for being loaded and executed by a processor to perform the above intrusion detection method.
The embodiments of the present application have the following advantages:
1) The to-be-processed image is obtained from the to-be-processed video stream, so that images in the video stream collected by an image capture device can be used as input to analyze the video stream, which can effectively improve the utilization rate of the image capture device.
2) The objects in the to-be-processed image are first detected to obtain at least one object detection frame, and it is then determined whether a preset intrusion object exists in the object detection frame. In this way, the detection model and the classification model are decoupled, and the classification model can be customized for special scenarios during algorithm deployment to quickly reach the expected performance, removing the dependence on the accuracy of a single detection model and greatly improving the speed and accuracy of the algorithm. Further, since the detection model and the classification model are decoupled, to optimize false positives in a new scenario it is only necessary to add false-positive data to train a new classifier and cascade it with the existing detection model, which suits rapid upgrade iteration during algorithm deployment; filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events.
3) When it is determined that a preset intrusion object exists in the object detection frame, the to-be-processed image is identified to obtain an intrusion detection area, and whether an intrusion event occurs is determined based on the position of the preset intrusion object and the intrusion detection area. In this way, the identification of the intrusion detection area is performed only on the to-be-processed images confirmed to contain intrusion objects, without identifying all the to-be-processed images, which can significantly reduce the computing power required of hardware devices, thereby efficiently and fully automatically detecting whether an intrusion object has entered the intrusion detection area, without manually marking the area, which facilitates large-scale online application.
Description of Drawings
FIG. 1 is a schematic flowchart of an implementation of an intrusion detection method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a cascaded classification model provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a semantic segmentation model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a detection model provided by an embodiment of the present application;
FIG. 5 is a display diagram of a smart transportation platform provided by an embodiment of the present application;
FIG. 6 is a pedestrian/vehicle intrusion diagram provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of an implementation of an intrusion detection method provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an intrusion detection apparatus according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the specific technical solutions of the invention are described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following embodiments are used to illustrate the present application but are not intended to limit its scope.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application.
It should be understood that some embodiments described herein are only used to explain the technical solutions of the embodiments of the present application and are not used to limit their technical scope.
An embodiment of the present application provides an intrusion detection method applied to a computer device. The computer device may be a movable device or a non-movable device, and the functions implemented by the method may be realized by a processor in the computer device calling program code. The program code may be stored in a computer storage medium; the computer device therefore includes at least a processor and a storage medium.
FIG. 1 is a schematic flowchart of an implementation of an intrusion detection method provided by an embodiment of the present application. As shown in FIG. 1, the method includes:
Step S101: Obtain the to-be-processed image from a to-be-processed video stream.
In some embodiments, a video stream acquired by an image acquisition device may be used as input, and the to-be-processed image is obtained from the video stream. Depending on the acquisition period, such video streams usually contain massive amounts of data. Here, the image acquisition device may be a camera. In practice, the existing road image acquisition system can be reused, which effectively avoids the limitation of dedicated hardware; to-be-processed images can also be obtained by timed snapshots, so that pedestrians/non-motor vehicles entering an expressway are identified and warned about, assisting the traffic police in maintaining expressway order and improving road-network safety.
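The timed-snapshot acquisition described above can be sketched as a simple frame sampler. The function name and the fixed-interval policy are illustrative assumptions, not part of the embodiment:

```python
def sample_frames(frames, fps, interval_s):
    """Yield one frame every `interval_s` seconds from a stream decoded at
    `fps` frames per second, approximating a timed-snapshot policy."""
    step = max(1, int(fps * interval_s))
    for i, frame in enumerate(frames):
        if i % step == 0:
            yield frame
```

Sampling at such an interval keeps the downstream detection workload bounded even though the raw stream is massive.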
Step S102: Detect objects in the to-be-processed image to obtain at least one object detection frame.
In some embodiments, a target detection model may be used to detect objects in the to-be-processed image to obtain at least one object detection frame. In practice, the target detection model may be a trained network based on one of Faster-Regions with Convolutional Neural Network (Faster-RCNN), You Only Look Once (YOLO), Single Shot MultiBox Detector (SSD), and the like. Two-step target detection methods, represented by Faster R-CNN, offer high detection accuracy but slow detection speed; single-step target detection methods, represented by YOLO and SSD, detect faster than two-step methods.
In practice, the input of any of the above three types of target detection models may be the to-be-processed image, and at least one object detection frame is output after processing.
Step S103: Determine whether the preset intrusion object exists in the object detection frame.
In some embodiments, a cascaded classifier model may be used to determine whether a preset intrusion object exists in the object detection frame. The cascaded classifier model may include multiple levels of classifiers, each level completing a corresponding classification task. In this way, the classification result determined by the cascaded classifier model is more accurate than that of a single-level classifier model, and classification efficiency is effectively improved.
In practice, for example in an expressway scenario, the preset intrusion object may be a pedestrian or a non-motor vehicle.
Step S104: When it is determined that a preset intrusion object exists in the object detection frame, recognize the to-be-processed image to obtain an intrusion detection area.
The intrusion detection method provided by the embodiments of the present application can be applied to recognizing pedestrians/non-motor vehicles entering an expressway by mistake or intentionally, and also to other long-tail events such as detecting a lost child at a kindergarten entrance, a person falling into the water at a lakeside or waterside, or a prison break. Such long-tail events are characterized by a low probability of occurring within a given period, while the image data collected by the camera is massive; if the target area were recognized in every image, the computing power required of the system would be high. Here, the system recognizes only the to-be-processed images judged to contain an intrusion object, and the recognition may use a semantic segmentation model. In this embodiment, applying the intrusion detection method provided by this embodiment of the present application to pedestrians/non-motor vehicles entering an expressway by mistake is taken as an example for description.
Step S105: Determine whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area.
In the same to-be-processed image, the position of the recognized preset intrusion object and the intrusion detection area obtained in step S104 are compared through computation to determine whether the preset intrusion object is inside the intrusion detection area.
The embodiments of the present application have the following advantages:
1) The to-be-processed image is obtained from a to-be-processed video stream, so that images from the video stream collected by an image acquisition device can be used as input and the video stream can be analyzed, effectively improving the utilization of the image acquisition device.
2) Objects in the to-be-processed image are first detected to obtain at least one object detection frame, and it is then determined whether a preset intrusion object exists in the object detection frame. In this way, the detection model and the classification model are decoupled: during algorithm deployment, the classification model can be customized for special scenarios to quickly reach the expected performance, which removes the dependence on the accuracy of a single detection model and greatly improves both the speed and the accuracy of the algorithm. Further, because the detection model and the classification model are decoupled, optimizing against false positives in a new scenario only requires training a new classifier with the false-positive data and cascading it with the existing detection model, which suits rapid upgrade iteration during deployment; filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events.
3) When it is determined that a preset intrusion object exists in the object detection frame, the to-be-processed image is recognized to obtain an intrusion detection area, and whether an intrusion event occurs is determined based on the position of the preset intrusion object and the intrusion detection area. In this way, the intrusion detection area is recognized only in to-be-processed images confirmed to contain an intrusion object, rather than in all to-be-processed images, which significantly reduces the computing-power requirements on hardware devices. Efficient, fully automatic detection of whether an intrusion object has entered the intrusion detection area is thereby achieved without manually annotating the area, which facilitates large-scale online deployment.
An embodiment of the present application provides a cascaded classification model. As shown in FIG. 2, the cascaded classification model includes a first-level classifier 220 and a second-level classifier 230. Specifically:
The first-level classifier 220 includes a first residual network 221 and a first fully connected layer 222. The first residual network 221 performs feature extraction on the picture content in the input object detection frame 210 to obtain a feature map P1; the first fully connected layer 222 performs a first classification based on the feature map P1 to obtain a first classification result, and object detection frames that do not meet the requirements are filtered out based on the first classification result.
The second-level classifier 230 includes a second residual network 231 and a second fully connected layer 232. When the first classification result satisfies the condition, the second residual network 231 performs feature extraction on the picture content in the object detection frame to obtain a feature map P2; the second fully connected layer 232 performs a second classification based on the feature map P2 to obtain a second classification result, and object detection frames that do not meet the requirements are filtered out based on the second classification result.
The second classification result 240 represents the object detection frame classification result after cascaded-model classification.
In the cascaded classification model shown in FIG. 2, the first residual network 221 in the first-level classifier 220 may use a fast classification model, for example the residual network ResNet18, which can filter out most negative samples; the second residual network 231 in the second-level classifier 230 may use a slower but more accurate classification model, for example the residual network ResNet50, to improve accuracy. The overall speed therefore does not slow down much, while accuracy improves considerably.
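As a minimal sketch of this coarse-then-fine filtering, with the two scoring functions standing in for the ResNet18 and ResNet50 classifiers (function names and thresholds are illustrative assumptions):

```python
def cascade_filter(crops, fast_score, accurate_score, t1=0.3, t2=0.5):
    """Two-level cascade: a fast first-level classifier cheaply discards
    most negative samples, and a slower, more accurate second-level
    classifier re-judges only the survivors."""
    survivors = [c for c in crops if fast_score(c) >= t1]      # coarse judgment
    return [c for c in survivors if accurate_score(c) >= t2]   # fine judgment
```

Only crops passing both levels are treated as containing the preset intrusion object; samples rejected at the first level never reach the expensive second-level model, which is what keeps the overall speed high.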
An embodiment of the present application provides an intrusion detection method, which includes:
Step S201: Obtain the to-be-processed image from a to-be-processed video stream.
Step S202: Detect objects in the to-be-processed image to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame.
In some embodiments, in an expressway scenario, the detection model takes an expressway image as input and outputs object detection frames. Here, the objects may be workers, pedestrians, animals, cars, motorcycles, electric bicycles, and the like. In the feature-extraction stage, a deep convolutional network extracts features from the expressway image, and a region proposal network extracts candidate object detection frames; in the detection stage, position-sensitive region-of-interest pooling is performed on the candidate detection-frame features obtained in the feature-extraction stage, that is, category classification and coordinate regression, to obtain the position of each object detection frame and the category of the object in each frame.
Step S203: When it is determined, based on the category of the object, that the preset intrusion object exists in any object detection frame, determine that the preset intrusion object exists in the to-be-processed image.
In some embodiments, in an expressway scenario, the detection result includes but is not limited to at least one of: workers on the expressway, pedestrians, animals, cars, motorcycles, electric bicycles, and the like. Here, objects whose category is pedestrian or non-motor vehicle may be determined as preset intrusion objects; that is, when a pedestrian or non-motor vehicle exists in any object detection frame, it is determined that a preset intrusion object exists in the to-be-processed image.
Step S204: Determine the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists.
In some embodiments, the position of the object detection frame in which the preset intrusion object exists may be represented by position coordinates. The position of the intrusion object in the to-be-processed image can be determined based on the position coordinates of the object detection frame.
Step S205: Using the first-level classifier, perform a first classification on the object detection frames corresponding to the target category based on the category of the object, to obtain a first classification result.
In some embodiments, the cascaded classification model shown in FIG. 2 may be used, where the first-level classifier 220 may include a first residual network 221 and a first fully connected layer 222. The first residual network 221 may use the ResNet18 network, where 18 denotes the depth of the network, that is, 18 weighted layers, including convolutional layers and fully connected layers but excluding pooling and batch normalization (BN) layers. The ResNet18 network performs feature extraction on the object detection frame to obtain a feature map; the first fully connected layer 222 performs the first classification based on the feature map to obtain the first classification result, that is, object detection frames that do not meet the requirements are filtered out in a first filtering step. Here the first-level classifier 220 completes a preliminary judgment of the image, which may also be called a coarse judgment; coarse judgment is efficient but has a high misjudgment rate.
Step S206: Using the second-level classifier cascaded with the first-level classifier, perform, based on the first classification result, a second classification on the object detection frames that satisfy a preset condition, to obtain a second classification result.
In some embodiments, the first-level classifier and the second-level classifier have the following relationships: the classification accuracy of the first-level classifier is lower than that of the second-level classifier; the number of convolutional layers in the first-level classifier is smaller than that in the second-level classifier; and the confidence of the first-level classifier is lower than that of the second-level classifier.
In some embodiments, as shown in FIG. 2, when the first classification result satisfies the condition, the object detection frame is classified using the second-level classifier 230, which includes a second residual network 231 and a second fully connected layer 232. The second residual network 231 may use the ResNet50 network. The ResNet50 network performs feature extraction on the object detection frame to obtain a feature map; the second fully connected layer 232 performs a second classification based on the feature map to obtain the second classification result, that is, object detection frames that do not meet the requirements are filtered out in a second filtering step. The second-level classifier 230 completes a re-judgment of the object detection frame, which may also be called a fine judgment; fine judgment features high classification accuracy and a low misjudgment rate.
Step S207: When it is determined, based on the second classification result, that the preset intrusion object exists in any object detection frame, determine that the preset intrusion object exists in the object detection frame.
Step S208: When it is determined that a preset intrusion object exists in the object detection frame, recognize the to-be-processed image to obtain an intrusion detection area.
Step S209: Determine the object detection frame in which the preset intrusion object exists as the target detection frame.
Step S210: Determine the center point of the bottom edge of the target detection frame as the position of the preset intrusion object.
In some embodiments, the center point of the bottom edge of the target detection frame corresponds to a position coordinate, and this position coordinate is determined as the position of the intrusion object.
In other embodiments, the position coordinate corresponding to the center point of any edge of the target detection frame may also be determined as the position of the intrusion object.
Step S211: Determine whether an intrusion event occurs based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area.
In practice, the position coordinate corresponding to the center point of the bottom edge of the target detection frame may be compared with the position coordinates of the intrusion detection area. If the position coordinate is determined to belong to the intrusion area, it is determined that an intrusion event has occurred; if the position coordinate is determined not to belong to the intrusion area, it is determined that no intrusion event has occurred.
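A minimal sketch of steps S210 and S211, assuming detection frames in (x1, y1, x2, y2) pixel coordinates and the intrusion detection area given as a binary mask (1 = intrusion area, 0 = outside) as produced by the segmentation model; the function names are illustrative:

```python
def bottom_center(box):
    """Midpoint of the bottom edge of an (x1, y1, x2, y2) detection frame."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)

def intrusion_occurred(box, area_mask):
    """True when the frame's bottom-center pixel lies inside the
    intrusion area (mask value 1)."""
    x, y = bottom_center(box)
    return area_mask[int(y)][int(x)] == 1
```

The bottom-center point approximates where the object touches the ground, which is why comparing that single coordinate against the area mask suffices to decide the intrusion event.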
In the embodiments of the present application, classified object detection frames are obtained, and only the images cropped from object frames of the matching categories are input to the cascaded classification. In this way, not all to-be-processed images need to be recognized, which significantly reduces the computing-power requirements on hardware devices.
In the embodiments of the present application, when the cascaded classification confirms an object detection frame containing an intrusion object, the to-be-processed image is recognized to obtain the intrusion detection area. In this way, the cascaded classifier classifies in two passes: the first-level classification can be regarded as a preliminary judgment and the second-level classification as a secondary judgment. This two-pass method of preliminary judgment followed by secondary judgment effectively improves classification efficiency and lowers the misjudgment rate. For similar long-tail event detection, for example smoke-and-fire detection, filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events.
In the embodiments of the present application, whether an intrusion event occurs is determined based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area: the position coordinate corresponding to that center point is compared with the position coordinates of the intrusion detection area, and when the coordinate is determined to belong to the intrusion area, it is determined that an intrusion event has occurred. This effectively improves the accuracy of determining intrusion events.
An embodiment of the present application provides a semantic segmentation model. As shown in FIG. 3, the semantic segmentation model includes a multi-layer convolutional network 302, a multi-layer deconvolutional network 303, and an image 304 on which semantic segmentation is completed. Specifically:
The multi-layer convolutional network 302, with the number of layers set to 5, is a 5-layer convolutional network used to downsample the to-be-processed image by a factor of 32 while encoding it.
The multi-layer deconvolutional network 303, with the number of layers set to 4, is a 4-layer deconvolutional network used to upsample the encoding result by a factor of 32 and to perform decoding and semantic understanding on it.
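The 32x encoder downsampling follows from five stride-2 stages (2^5 = 32). As an illustrative check of the resulting feature-map size (the helper name and the stride-2 assumption are ours, not stated in the embodiment):

```python
def encoder_output_size(h, w, num_stride2_layers=5):
    """Spatial size after `num_stride2_layers` stride-2 convolutions;
    five such layers give the 32x downsampling used by the encoder."""
    f = 2 ** num_stride2_layers
    return (h // f, w // f)
```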
Taking an expressway scene as the to-be-processed image for example, the to-be-processed image 301 is input into the convolutional neural network model to obtain the image 304 on which semantic segmentation is completed, that is, the expressway area (gray, labeled 1) and the non-expressway area (black, labeled 0) are obtained. Marking the expressway area gray and the non-expressway area black achieves a visualization effect; labeling the expressway area 1 and the non-expressway area 0 makes it possible to use the different labels to quickly identify where an intrusion object is located.
An embodiment of the present application provides a detection model. As shown in FIG. 4, the detection model includes a deep convolutional network 402, a Region Proposal Network (RPN) 403, a Position Sensitive Regions of Interest Pooling (PSROIPooling) layer 404, a bounding-box regression result 405, and a classification result 406. Specifically:
The deep convolutional network 402 is used to perform feature extraction on the to-be-processed image 301 (the same image as the to-be-processed image 301 shown in FIG. 3) to obtain a first feature map.
The Region Proposal Network (RPN) 403 is used to generate candidate target regions (object detection frames) on the first feature map to obtain a second feature map, where the second feature map includes at least one detection frame and the position and confidence of each detection frame.
The Position Sensitive Regions of Interest Pooling (PSROIPooling) layer 404 is used to perform position-sensitive region-of-interest pooling on the simultaneously input first feature map and at least one object detection frame, to obtain the bounding-box regression result 405 and the classification result 406. In this embodiment, the classification result 406 realizes prediction of the detection result, where the detection result includes but is not limited to at least one of: workers on the expressway, pedestrians, animals, cars, motorcycles, electric bicycles, and the like; the bounding-box regression result 405 predicts the precise coordinates of the detection frame corresponding to the detection result.
In this way, after the to-be-processed image is detected by the detection model, at least one object detection frame cropped from the to-be-processed image is obtained, and the position and confidence of each object detection frame and the category of the object in each frame are determined.
本申请实施例提供的一种入侵检测方法,该方法包括:An intrusion detection method provided by an embodiment of the present application includes:
步骤S401、从待处理的视频流中获得所述待处理图像;Step S401, obtaining the to-be-processed image from the to-be-processed video stream;
步骤S402、基于深度卷积网络,对所述待处理图像进行特征提取,得到第一特征图;Step S402, performing feature extraction on the to-be-processed image based on a deep convolutional network to obtain a first feature map;
在一些实施例中,可以使用如图4所示的,基于快速区域卷积神经网络(Faster-Regions with Convolutional Neural Network,Faster-RCNN)的检测器进行待处理图像的检测,Faster-RCNN网络输入的是待处理图像,经过处理输出至少一个对象检测框;In some embodiments, as shown in FIG. 4 , a detector based on Faster-Regions with Convolutional Neural Network (Faster-RCNN) can be used to detect images to be processed, and the Faster-RCNN network inputs is the image to be processed, and at least one object detection frame is output after processing;
如图4所示,阶段一(特征提取阶段)使用深度卷积网络402进行特征提取,其中,深度卷积网络402包括:向量卷积运算1(conv1)、向量卷积运算2(conv2)、稠密向量卷积运算3(dense conv3)和稠密向量卷积运算4(dense conv4),使用以上4个深度卷积网络,对图像进行特征提取。As shown in FIG. 4 , the first stage (feature extraction stage) uses the deep convolution network 402 to perform feature extraction, wherein the deep convolution network 402 includes: vector convolution operation 1 (conv1), vector convolution operation 2 (conv2), Dense vector convolution operation 3 (dense conv3) and dense vector convolution operation 4 (dense conv4) use the above four deep convolutional networks to perform feature extraction on images.
步骤S403、基于区域生成网络,在所述第一特征图中生成候选目标区域,得到第二特征图;所述第二特征图包括至少一个检测框和每一所述检测框的位置、置信度;Step S403, based on the region generation network, generate candidate target regions in the first feature map to obtain a second feature map; the second feature map includes at least one detection frame and the position and confidence of each detection frame ;
如图4所示,基于区域生成网络403,在第一特征图中生成候选目标区域,得到第二特征图。第二特征图包括至少一个检测框和每一所述检测框的位置、检出目标的置信度。As shown in FIG. 4 , based on the region generation network 403 , candidate target regions are generated in the first feature map to obtain the second feature map. The second feature map includes at least one detection frame, the position of each detection frame, and the confidence level of the detected target.
Step S404: During position-sensitive candidate-region pooling of the first feature map and the second feature map by the pooling layer, determine, based on the at least one detection frame and the confidence of each detection frame, a detection frame that satisfies a preset condition as an object detection frame, and determine the category of the object in the object detection frame;
As shown in FIG. 4, the first feature map and the second feature map are subjected to position-sensitive candidate-region pooling; that is, both feature maps are input into the position-sensitive candidate-region pooling layer 404 to obtain the bounding-box regression result 405 and the classification result 406. This yields the confidence of each detected target and the position of its detection frame; a detection frame that satisfies the preset condition is determined as an object detection frame, and the category of the object in the object detection frame is determined.
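The selection of object detection frames from the pooled outputs can be sketched as a simple confidence filter over the regression and classification results. This is a minimal illustration, not the patent's implementation: the box layout `(x1, y1, x2, y2)`, the 0.5 threshold standing in for the "preset condition", and the function name are all assumptions.

```python
# Minimal sketch: keep only detections whose confidence satisfies an assumed
# "preset condition" (score >= 0.5). Boxes are (x1, y1, x2, y2) tuples.

def select_object_frames(boxes, scores, labels, score_thresh=0.5):
    """Return object detection frames whose confidence passes the threshold."""
    kept = []
    for box, score, label in zip(boxes, scores, labels):
        if score >= score_thresh:
            kept.append({"box": box, "score": score, "category": label})
    return kept

detections = select_object_frames(
    boxes=[(10, 20, 50, 120), (200, 40, 230, 90)],
    scores=[0.92, 0.31],
    labels=["pedestrian", "vehicle"],
)
print(detections)  # only the high-confidence pedestrian box survives
```

In a deployed detector the threshold would be tuned per category; the point here is only that low-confidence frames are discarded before any later stage runs.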
Step S405: When it is determined, based on the category of the object, that the preset intrusion object exists in any of the object detection frames, determine that the preset intrusion object exists in the to-be-processed image;
Step S406: Determine the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists;
Step S407: Determine whether the preset intrusion object exists in the object detection frame;
Step S408: When it is determined that a preset intrusion object exists in the object detection frame, perform semantic segmentation on the to-be-processed image using a convolutional neural network model to obtain the intrusion detection area;
During implementation, the convolutional neural network model shown in FIG. 3 includes a multi-layer convolutional network 302 and a multi-layer deconvolutional network 303. When its layer count is 5, the multi-layer convolutional network 302 is a five-layer convolutional network that downsamples the to-be-processed image by a factor of 32 while encoding it; when its layer count is 4, the multi-layer deconvolutional network 303 is a four-layer deconvolutional network that upsamples the encoding result by a factor of 32, decoding it and performing semantic understanding.
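The 32× down-sampling can be checked with simple shape arithmetic: five stride-2 convolution stages divide each spatial dimension by 2^5 = 32. The sketch below only tracks spatial sizes and, for simplicity, mirrors the five encoding stages with five doubling steps on the decoding side so that the original resolution is restored; it is a sanity-check sketch under those assumptions, not the network itself.

```python
# Sketch: track spatial sizes through stride-2 encoding and mirrored decoding.
# Assumes a square-friendly input whose sides are divisible by 32.

def encoder_decoder_sizes(h, w, down_stages=5):
    """Return the (H, W) sizes seen at every stage of encode/decode."""
    sizes = [(h, w)]
    for _ in range(down_stages):      # each encoding stage halves H and W
        h, w = h // 2, w // 2
        sizes.append((h, w))
    for _ in range(down_stages):      # decoding doubles back to the input size
        h, w = h * 2, w * 2
        sizes.append((h, w))
    return sizes

sizes = encoder_decoder_sizes(512, 512)
print(sizes[5])   # (16, 16): the 32x downsampled bottleneck
print(sizes[-1])  # (512, 512): resolution restored for per-pixel labels
```

Restoring the input resolution is what makes per-pixel region labels (1 for the expressway area, 0 otherwise) possible.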
Taking a to-be-processed image of an expressway scene as an example, feeding the input image 301 into the convolutional neural network model yields the expressway area (gray, labeled 1) and the non-expressway area (black, labeled 0). Marking the expressway area gray and the non-expressway area black provides a visualization; labeling them 1 and 0 respectively makes it possible to use the differently labeled regions to quickly identify the location of an intruding object.
Step S409: Determine, based on the position of the preset intrusion object, whether the preset intrusion object is located within the intrusion detection area;
During implementation, the position of the intrusion object may correspond to a set of position coordinates. For example, the set of position coordinates of the object detection frame may be used as the position coordinates of the intrusion object; from this set, the coordinate that best represents the position of the intrusion object is selected and compared with the position coordinates of the intrusion detection area, to determine whether the intrusion object is located within the intrusion detection area.
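The comparison between the representative coordinate and the intrusion detection area can be sketched as a lookup into a binary region mask. The mask values, the choice of the bottom-edge center as the representative point, and the helper names are illustrative assumptions, not details fixed by the patent.

```python
# Sketch: decide intrusion by looking up one representative coordinate of the
# detection frame in a 0/1 region mask (1 = intrusion detection area).

def bottom_center(box):
    """Bottom-edge center point: an illustrative choice of the coordinate
    that best represents where the object touches the ground."""
    x1, y1, x2, y2 = box
    return (x1 + x2) // 2, y2

def point_in_region(mask, point):
    """True if the point lies inside the mask and on a cell labeled 1."""
    x, y = point
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x] == 1

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(point_in_region(mask, bottom_center((1, 0, 2, 2))))  # True
```

The same lookup generalizes to any other representative coordinate drawn from the frame's coordinate set.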
Step S410: In response to the preset intrusion object being located within the intrusion detection area, determine that the intrusion event has occurred; or, in response to the preset intrusion object being located outside the intrusion detection area, determine that the intrusion event has not occurred.
In the embodiments of the present application, the first feature map is obtained based on a deep convolutional network, and the second feature map is obtained based on the region proposal network; during position-sensitive candidate-region pooling of the two feature maps by the pooling layer, a detection frame that satisfies a preset condition is determined as an object detection frame, based on the at least one detection frame and the confidence of each detection frame, and the category of the object in the object detection frame is determined. The resulting object detection frame is a frame that contains the preset intrusion object; if no object detection frame containing an intrusion object is detected, no subsequent processing is required, which effectively improves the detection efficiency of long-tail events.
In the embodiments of the present application, when a preset intrusion object exists, a convolutional neural network model is used to semantically segment the to-be-processed image, yielding an intrusion detection area and a non-intrusion detection area distinguished by color and by label. Marking the intrusion detection areas in different colors provides a visualization; marking them with different labels makes it possible to use the differently labeled regions to quickly identify the location of the intruding object. By introducing a convolutional neural network model to identify the intrusion detection area in the to-be-processed image, the area is identified automatically, without manual annotation, which facilitates large-scale online deployment: because there is no need to pre-annotate the intrusion detection area, the system is easy to deploy and bring online.
In the embodiments of the present application, judging whether the preset intrusion object is located within the intrusion detection area based on its position effectively improves the accuracy of determining that the preset intrusion object is located in the intrusion detection area.
An intrusion detection method provided by an embodiment of the present application includes:
Step S421: Obtain the to-be-processed image from the to-be-processed video stream;
Step S422: Perform feature extraction on the to-be-processed image based on a deep convolutional network to obtain a first feature map;
Step S423: Based on the region proposal network, generate candidate target regions in the first feature map to obtain a second feature map; the second feature map includes at least one detection frame and the position and confidence of each detection frame;
Step S424: During position-sensitive candidate-region pooling of the first feature map and the second feature map by the pooling layer, use a non-maximum suppression algorithm to determine, based on the confidence of each detection frame and the intersection-over-union (IoU) between detection frames in the at least one detection frame, a detection frame that satisfies a preset condition as an object detection frame, and determine the category of the object in the object detection frame;
During implementation, using a non-maximum suppression algorithm to determine a detection frame satisfying the preset condition as an object detection frame, based on the confidence of each detection frame and the IoU between detection frames in the at least one detection frame, includes: based on the confidence of each detection frame, determining the detection frame with the highest confidence among the at least one detection frame as a target detection frame; determining the target detection frame as one object detection frame; determining the IoU between the target detection frame and each other detection frame, where the other detection frames are the detection frames in the at least one detection frame other than the target detection frame; deleting the other detection frames whose IoU is greater than a threshold from the at least one detection frame to obtain a candidate detection frame set; determining the detection frame with the highest confidence in the candidate detection frame set, excluding the target detection frame, as a new target detection frame; determining the new target detection frame as another object detection frame; determining the IoU between the new target detection frame and each new other detection frame, where each new other detection frame is a detection frame in the candidate detection frame set other than the new target detection frame; deleting the new other detection frames whose IoU is greater than the threshold from the candidate detection frame set to obtain a new candidate detection frame set; and so on, until the object detection frames are obtained.
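The procedure spelled out above is standard greedy non-maximum suppression and can be sketched as follows. The IoU threshold of 0.5 and the box format are assumed values for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Repeatedly take the highest-confidence box as the target detection
    frame and delete remaining boxes whose IoU with it exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return kept  # indices of the surviving object detection frames

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the overlapping box 1 is suppressed
```

This greedy loop guarantees that each object keeps exactly one highest-confidence frame, which is the property the embodiment relies on.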
Step S425: When it is determined, based on the category of the object, that the preset intrusion object exists in any of the object detection frames, determine that the preset intrusion object exists in the to-be-processed image;
Step S426: Determine the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists;
Step S427: Determine whether the preset intrusion object exists in the object detection frame;
Step S428: When it is determined that a preset intrusion object exists in the object detection frame, identify the to-be-processed image to obtain an intrusion detection area;
Step S429: Determine whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area.
Step S430: In response to the occurrence of the intrusion event, output an alarm identifier;
During implementation, taking the identification of pedestrian intrusion events on expressways as an example, outputting an alarm identifier can quickly guide the intruding object out of the dangerous area and prevent traffic accidents. The identification results can also be used to discover locations with high-frequency intrusions and to strengthen preventive measures.
Step S431: In response to the occurrence of the intrusion event, record the intrusion event based on the category of the intrusion object and the intrusion detection area to obtain an intrusion record;
Step S432: Store the intrusion record or send it to an associated terminal.
In the embodiments of the present application, in the process of detecting a preset intrusion object, the confidence of each detected object and the position of its detection frame are obtained first. A non-maximum suppression algorithm is then used to merge detection frames whose IoU is greater than the threshold, and detection frames satisfying the preset condition are determined as object detection frames. In this way, non-maximum suppression finally assigns each object in the to-be-processed image a single, most suitable object detection frame.
In the embodiments of the present application, when the object detection frame is determined to be within the intrusion detection area, an alarm identifier is output; the intrusion event is recorded based on the category of the intrusion object and the intrusion detection area to obtain an intrusion record; and the intrusion record is stored or sent to an associated terminal. In this way, the intruding object can be quickly guided out of the intrusion detection area according to the alarm identifier, effectively preventing intrusion objects from entering the area. Intrusion events can also be recorded, locations with high-frequency intrusions can be discovered from the records, and preventive measures can be strengthened.
Pedestrians and non-motor vehicles often enter expressways by mistake or intentionally, affecting the normal movement of vehicles on the road and severely impacting traffic safety. Video patrols need to perform real-time, active detection of pedestrians and non-motor vehicles on the road; when a pedestrian or non-motor vehicle is found within the driving range of an expressway, a timely warning should be issued and the traffic police department notified to respond promptly, guiding and urging the pedestrian or non-motor vehicle to leave the expressway driving area, thereby eliminating road-safety hazards and improving the road-safety index.
Early video patrol systems relied mainly on analysts polling the recordings of image acquisition devices to find pedestrians who had strayed onto the expressway, and then taking corresponding measures. Although this approach can effectively find intruding pedestrians, analysis efficiency is low, omissions are relatively likely, and the polling is not very timely. With the development of computer vision technology, object detection algorithms have improved greatly; using them to pre-screen pedestrians appearing in images and videos has greatly improved analysts' work efficiency. In recent years, object detection algorithms have adopted data-driven, deep-learning-based approaches that further improve the precision and recall of pedestrian intrusion detection. How algorithm accuracy can reach or even surpass manual analysis has become a research hotspot.
Normally, pedestrian intrusion is not a common event on expressways, which places high demands on detection accuracy: for example, 99% accuracy means that only one false alarm is allowed per hundred events. Although deep-learning-based object detection can in theory reach the target accuracy by increasing training data and model capacity, doing so requires substantial manual effort to annotate detection frames and greatly increases the hardware cost of running the algorithm, which is an urgent problem for deploying algorithms for long-tail events such as pedestrian intrusion. On the other hand, pedestrian intrusion requires a prohibited area to be defined, and manually demarcating this area entails a large amount of redundant operation and maintenance work in large-scale applications.
FIG. 5 is a display diagram of a smart transportation platform provided by an embodiment of the present application. As shown in FIG. 5, the smart transportation platform 501 is used to display pedestrian/non-motor-vehicle intrusion images identified on expressways using the intrusion detection method.
FIG. 6 is a pedestrian/vehicle intrusion view provided by an embodiment of the present application. As shown in FIG. 6, the pedestrian/vehicle intrusion view 601 is displayed after clicking a pedestrian/non-motor-vehicle intrusion image shown in FIG. 5, and shows an enlarged image of the intrusion together with image details, such as the time and location of the pedestrian/non-motor-vehicle intrusion.
To illustrate that the solution provided by the embodiments of the present application can significantly save the computing power of hardware devices, the judgment of a pedestrian entering a prohibited area on an expressway is described in chronological order as an example. FIG. 7 is a schematic flowchart of an intrusion detection method provided by an embodiment of the present application. As shown in FIG. 7, the time axis involves six moments, T1 to T6 in sequence, and the workflow is as follows:
Step S700: At time T1, input the to-be-processed image into the detector to obtain at least one candidate pedestrian detection frame;
The to-be-processed image may be the original image or an image obtained by preprocessing the original image. The processing of step S700 is divided into a feature extraction stage and a detection stage:
Feature extraction stage: the deep convolutional network 402 shown in FIG. 4 is used, comprising vector convolution operation 1 (conv1), vector convolution operation 2 (conv2), dense vector convolution operation 3 (dense conv3), and dense vector convolution operation 4 (dense conv4). These four convolutional stages extract features from the to-be-processed image to obtain the first feature map; based on the region proposal network 403, candidate target regions are generated in the first feature map to obtain the second feature map. The second feature map includes at least one detection frame, the position of each detection frame, and the confidence of the detected target.
Detection stage: as shown in FIG. 4, the first and second feature maps are subjected to position-sensitive candidate-region pooling, i.e., both are input into the position-sensitive candidate-region pooling layer 404 to obtain the bounding-box regression result 405 and the classification result 406. After this processing, the confidence of each detected target and the position of its detection frame are obtained; detection frames satisfying the preset condition are determined as object detection frames, and the category of the object in each object detection frame is determined.
Finally, the non-maximum suppression algorithm merges detection frames whose IoU is greater than the threshold and outputs the object detection frames (candidate pedestrian detection frames) that meet the requirements.
Step S701: At time T2, crop out each candidate pedestrian detection frame and input it into the cascade classifier to obtain a classification result;
Here, the cascade classifier can be obtained through training. For example, in the training stage, 300,000 small detection-alarm images are collected, of which 60,000 are positive samples and 240,000 are negative samples. These data are first annotated with pedestrian/non-pedestrian binary labels, and then a ResNet18 network and a ResNet50 network are trained separately using stochastic gradient descent. As shown in FIG. 2, the first residual network 211 may use the ResNet18 network to make a coarse judgment on the image, which is efficient but has a higher misjudgment rate; the second residual network 231 may use the ResNet50 network to make a fine judgment, with a low misjudgment rate. By cascading multiple classifiers, classification accuracy is improved stage by stage while maintaining high recall, finally yielding high-precision pedestrian detection results.
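The coarse-to-fine cascade can be sketched as two stages in which a candidate reaches the expensive fine classifier only if the cheap coarse one does not reject it. The stub scoring functions, field names, and thresholds below are illustrative stand-ins for the trained ResNet18/ResNet50 models, not the patent's implementation.

```python
# Sketch of a two-stage cascade: a fast, permissive coarse classifier filters
# candidates, and only survivors reach the slower, more precise classifier.

def coarse_score(crop):   # stand-in for the fast ResNet18 stage
    return crop["coarse"]

def fine_score(crop):     # stand-in for the precise ResNet50 stage
    return crop["fine"]

def cascade_classify(crops, coarse_thresh=0.3, fine_thresh=0.7):
    confirmed = []
    for crop in crops:
        if coarse_score(crop) < coarse_thresh:
            continue                        # rejected cheaply; recall stays high
        if fine_score(crop) >= fine_thresh:
            confirmed.append(crop["name"])  # high-precision pedestrian
    return confirmed

crops = [
    {"name": "a", "coarse": 0.9, "fine": 0.95},  # true pedestrian
    {"name": "b", "coarse": 0.8, "fine": 0.2},   # coarse-stage false alarm
    {"name": "c", "coarse": 0.1, "fine": 0.9},   # rejected at the cheap stage
]
print(cascade_classify(crops))  # ['a']
```

The design point is cost: most candidates never incur the fine model's inference cost, which is why the cascade keeps high recall without high compute.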
Step S702: At time T3, determine whether the cascade classification model has acquired a valid image;
The cascade classification model can determine whether the required valid image exists in the to-be-processed image. For example, when detecting pedestrians prohibited from entering an expressway, at time T3 it can be determined whether the cascade classification model has acquired an image containing an intruding pedestrian. When no valid image is acquired, there is no need to identify the target area (the expressway area) in the to-be-processed image, and the process ends. In this way, target-area identification does not have to be performed on every input original image, which significantly saves the computing power of hardware devices.
Step S703: When it is determined that the cascade classifier has acquired a valid image, at time T4 input the to-be-processed image into the semantic segmentation model to obtain the pedestrian-prohibited area of the expressway;
Here, because the appearance of a pedestrian on an expressway is a long-tail event, i.e., an event with an extremely low probability of occurring in a fixed scene, the to-be-processed image is input into the semantic segmentation model to obtain the pedestrian-prohibited area of the expressway only after it is determined that the cascade classifier has acquired a valid image. Since semantic segmentation then only needs to be performed on valid images, the algorithm's demand for computing power in similar long-tail event analysis is effectively reduced.
Referring to FIG. 3, the input of the semantic segmentation model is the to-be-processed image 301. It first passes through a five-layer convolutional network (conv1, conv2, conv3, conv4 and conv5) 302, which downsamples the image by a factor of 32 while encoding it; it then passes through four deconvolution layers (dconv1, dconv2, dconv3 and dconv4) 303, which upsample the encoding result by a factor of 32, decoding it and performing semantic understanding, to obtain the expressway area (gray, labeled 1) and the non-expressway area (black, labeled 0). Marking the expressway area gray and the non-expressway area black provides a visualization; labeling them 1 and 0 respectively makes it possible to use the differently labeled regions to quickly identify the location of an intruding object.
Step S704: At time T5, based on the pedestrian classification result and the pedestrian-prohibited area of the expressway, determine whether the pedestrian has entered the prohibited area;
After steps S701 to S704 are completed, the pedestrian detection frame and the semantic segmentation map are obtained.
Let the top-left point of the pedestrian detection frame be (x1, y1) and the bottom-right point be (x2, y2), and let the semantic segmentation result be a two-dimensional matrix G. Here, because of the perspective relationship, judging whether a pedestrian has entered the area requires choosing a pedestrian anchor point lying as close as possible to the ground plane.
In theory, the center point of the bottom edge of the pedestrian detection frame can be selected as the pedestrian anchor point, and the pedestrian is judged to be intruding when this point falls in the region labeled 1, i.e., formula (1):

    G[y2][(x1 + x2) / 2] = 1    (1)
In actual use, to obtain a more robust result, the mean of the segmentation labels over a region near the center point of the pedestrian's bottom edge is computed to judge whether the pedestrian has entered the prohibited area. The formula for judging pedestrian intrusion is formula (2):

    (1 / |Ω|) · Σ_{(x, y) ∈ Ω} G[y][x] > τ    (2)

where Ω is a neighborhood of the bottom-edge center point ((x1 + x2) / 2, y2) and τ is a preset decision threshold.
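The region-mean decision can be sketched as averaging the mask values in a small window around the bottom-edge center point. The window half-width of 1 and the 0.5 decision threshold are illustrative assumptions standing in for Ω and τ.

```python
def intrusion_by_region_mean(G, box, half_width=1, tau=0.5):
    """Average the segmentation labels in a window around the bottom-edge
    center point (cx, y2) of the detection frame; the pedestrian is judged
    to have intruded if the mean exceeds the threshold tau."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) // 2
    h, w = len(G), len(G[0])
    vals = [
        G[y][x]
        for y in range(max(0, y2 - half_width), min(h, y2 + half_width + 1))
        for x in range(max(0, cx - half_width), min(w, cx + half_width + 1))
    ]
    return sum(vals) / len(vals) > tau

G = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
print(intrusion_by_region_mean(G, (1, 0, 3, 2)))  # True
```

Averaging over a window rather than testing a single pixel tolerates small segmentation errors at the region boundary, which is the robustness property formula (2) provides over formula (1).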
Step S705: At time T6, output the pedestrian intrusion result.
As shown in FIG. 5, pedestrian/non-motor-vehicle intrusion images identified on the expressway using the intrusion detection method are displayed. After clicking a pedestrian/non-motor-vehicle intrusion image shown in FIG. 5, as shown in FIG. 6, an enlarged image of the intrusion and image details, such as the time and location of the pedestrian/non-motor-vehicle intrusion, are displayed.
The embodiments of the present application propose an intrusion detection method with cascaded event detection. First, at time T1, pedestrian detection is performed based on the detection model; second, at time T2, candidate targets are filtered a second time by the cascade classifier; at time T3, it is judged whether an intrusion object exists in the to-be-processed image, and if there is no intrusion object, the original image is not semantically segmented; at time T4, a to-be-processed image containing an intrusion object is semantically segmented to determine whether the target appears in the prohibited area; at time T5, the object detection frame and the intrusion detection area are input together for judgment; and the judgment is completed at time T6. By cascading multiple algorithm modules, the scheme achieves fully automatic pedestrian intrusion detection on expressways without significantly increasing the demand for computing power. Semantic segmentation is used to identify the prohibited area, so no manual annotation is required. The multi-model cascade implementation allows the detection, classification, and segmentation algorithm modules to be upgraded independently, and reduces the algorithm's demand for computing power for long-tail event detection.
Based on the foregoing embodiments, an embodiment of the present application provides an intrusion detection apparatus. The apparatus includes the modules described below and the sub-modules included in each module, and may be implemented by a processor in a computer device; it may, of course, also be implemented by a specific logic circuit. During implementation, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or the like.
An embodiment of the present application provides an intrusion detection apparatus. FIG. 8 is a schematic diagram of the structure of the intrusion detection apparatus according to an embodiment of the present application. As shown in FIG. 8, the apparatus 800 includes:
an obtaining module 810, configured to obtain the to-be-processed image from the to-be-processed video stream;
a detection module 820, configured to detect objects in the to-be-processed image to obtain at least one object detection frame;
a first determination module 830, configured to determine whether the preset intrusion object exists in the object detection frame;
an identification module 840, configured to, when it is determined that a preset intrusion object exists in the object detection frame, identify the to-be-processed image to obtain an intrusion detection area;
a second determination module 850, configured to determine whether an intrusion event occurs based on the position of the preset intrusion object and the intrusion detection area.
In some embodiments, the detection module 820 includes a detection sub-module, a first determination sub-module, and a second determination sub-module. The detection sub-module is configured to detect objects in the image to be processed to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame. The first determination sub-module is configured to determine that the preset intrusion object exists in the image to be processed when it is determined, based on the category of the object, that the preset intrusion object exists in any of the object detection frames. The second determination sub-module is configured to determine the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists.
In some embodiments, the detection sub-module includes a deep convolutional network, a region proposal network, and a pooling layer. The deep convolutional network is configured to perform feature extraction on the image to be processed to obtain a first feature map. The region proposal network is configured to generate candidate target regions in the first feature map to obtain a second feature map; the second feature map includes at least one detection frame and the position and confidence of each detection frame. The pooling layer is configured to, in the process of performing position-sensitive candidate-region pooling on the first feature map and the second feature map, determine a detection frame satisfying a preset condition as an object detection frame based on the at least one detection frame and the confidence of each detection frame, and determine the category of the object in the object detection frame.
In some embodiments, the detection sub-module includes a non-maximum suppression unit that uses a non-maximum suppression algorithm to determine a detection frame satisfying a preset condition as an object detection frame, based on the confidence of each detection frame and the intersection-over-union between the detection frames in the at least one detection frame.
In some embodiments, the non-maximum suppression unit includes a first determination sub-unit, a second determination sub-unit, a deletion sub-unit, a third determination sub-unit, and a fourth determination sub-unit. The first determination sub-unit is configured to determine, based on the confidence of each detection frame, the detection frame with the highest confidence among the at least one detection frame as a target detection frame, and to determine the target detection frame as one object detection frame. The second determination sub-unit is configured to determine the intersection-over-union between the target detection frame and each other detection frame, where the other detection frames are the detection frames, among the at least one detection frame, other than the target detection frame. The deletion sub-unit is configured to delete the other detection frames whose intersection-over-union is greater than a threshold from the at least one detection frame to obtain a candidate detection frame set. The third determination sub-unit is configured to determine the detection frame with the highest confidence in the candidate detection frame set, other than the target detection frame, as a new target detection frame, and to determine the new target detection frame as one object detection frame. The fourth determination sub-unit is configured to determine the intersection-over-union between the new target detection frame and each new other detection frame, where each new other detection frame is a detection frame in the candidate detection frame set other than the new target detection frame. The deletion sub-unit is further configured to delete the new other detection frames whose intersection-over-union is greater than the threshold from the candidate detection frame set to obtain a new candidate detection frame set; and so on, until the object detection frames are obtained.
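The greedy procedure these sub-units describe is standard non-maximum suppression. A minimal sketch, assuming each detection frame is an `(x1, y1, x2, y2)` tuple paired with a confidence score (the representation is illustrative, not mandated by the embodiment):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-confidence box as the
    target detection frame and delete remaining boxes whose IoU with
    it exceeds the threshold, until no candidates remain."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)  # highest-confidence box -> target frame
        kept.append(boxes[best])
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return kept
```

Each iteration corresponds to one round of the "determine target frame, compute IoU, delete overlapping frames" cycle described above.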
In some embodiments, the first determination module 830 includes a first classification sub-module, a second classification sub-module, and a third determination sub-module. The first classification sub-module uses a first-level classifier to perform a first classification, based on the category of the object, on the object detection frames corresponding to a target category, to obtain a first classification result. The second classification sub-module uses a second-level classifier cascaded with the first-level classifier to perform, based on the first classification result, a second classification on the object detection frames satisfying a preset condition, to obtain a second classification result. The third determination sub-module determines that the preset intrusion object exists in the object detection frame when it is determined, based on the second classification result, that the preset intrusion object exists in any of the object detection frames. The first-level classifier and the second-level classifier have the following relationships: the classification accuracy of the first-level classifier is lower than that of the second-level classifier; the number of convolutional layers in the first-level classifier is smaller than that in the second-level classifier; and the confidence of the first-level classifier is lower than that of the second-level classifier.
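The two-level cascade can be sketched as follows: the shallow, cheap first-level classifier discards easy negatives, and only candidates it accepts reach the deeper, more accurate second-level classifier. The classifier callables and the thresholds `t1`/`t2` here are illustrative assumptions, not values from the embodiment.

```python
def cascade_filter(crops, first_level, second_level, t1=0.3, t2=0.7):
    """Two-level cascade classification sketch.

    first_level/second_level are hypothetical scoring functions
    returning a confidence in [0, 1]; t1 and t2 are illustrative
    acceptance thresholds for each stage.
    """
    confirmed = []
    for crop in crops:
        if first_level(crop) < t1:    # cheap early rejection
            continue
        if second_level(crop) >= t2:  # accurate confirmation
            confirmed.append(crop)
    return confirmed
```

Because most candidates are rejected by the first stage, the expensive second-level classifier runs on only a small fraction of the crops, which is why the cascade filters false positives without a large compute cost.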
In some embodiments, the identification module 840 is further configured to perform semantic segmentation on the image to be processed by using a convolutional neural network model, to obtain the intrusion detection area.
In some embodiments, the second determination module 850 includes a judgment sub-module and a fourth determination sub-module. The judgment sub-module is configured to judge, based on the position of the preset intrusion object, whether the preset intrusion object is located within the intrusion detection area. The fourth determination sub-module is configured to determine that the intrusion event has occurred in response to the preset intrusion object being located within the intrusion detection area, or to determine that the intrusion event has not occurred in response to the preset intrusion object being located outside the intrusion detection area.
In some embodiments, the second determination module 850 further includes a fifth determination sub-module, a sixth determination sub-module, and a seventh determination sub-module. The fifth determination sub-module is configured to determine the object detection frame in which the preset intrusion object exists as a target detection frame. The sixth determination sub-module is configured to determine the center point of the bottom edge of the target detection frame as the position of the preset intrusion object. The seventh determination sub-module is configured to determine whether an intrusion event has occurred based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area.
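The judgment in these sub-modules reduces to a point-in-region test on the bottom-edge center of the target detection frame (a pedestrian's foot position). A sketch, assuming the intrusion detection area is provided as a binary mask from the segmentation step; the mask representation as a 2D list of 0/1 values is an assumption for illustration.

```python
def bottom_center(box):
    """Bottom-edge center of an (x1, y1, x2, y2) detection frame,
    used as the foot position of the intrusion object."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) // 2, y2)

def is_intrusion(box, area_mask):
    """True if the bottom-center point of the target detection frame
    falls inside the intrusion detection area; area_mask is a 2D
    list of 0/1 values indexed as [row][col]."""
    x, y = bottom_center(box)
    h, w = len(area_mask), len(area_mask[0])
    return 0 <= y < h and 0 <= x < w and area_mask[y][x] == 1
```

Using the bottom center rather than the box center avoids false alarms for objects whose upper body overlaps the area in the image while their feet remain outside it.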
In some embodiments, the intrusion detection apparatus further includes an output module configured to output an alarm identifier in response to the occurrence of the intrusion event.
Based on the foregoing embodiments, the intrusion detection apparatus further includes a recording module and a sending module. The recording module is configured to record the intrusion event, in response to its occurrence, based on the category of the intrusion object and the intrusion detection area, to obtain an intrusion record. The sending module is configured to store the intrusion record or send it to an associated terminal.
The descriptions of the above apparatus embodiments are similar to those of the above method embodiments and have similar beneficial effects. For technical details not disclosed in the apparatus embodiments of the present application, refer to the descriptions of the method embodiments of the present application.
It should be noted that, in the embodiments of the present application, if the above intrusion detection method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence or in the parts contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a robot, a server, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application provides a computer-readable storage medium, which may be a volatile or non-volatile storage medium, on which a computer program is stored. When the computer program is executed by a processor, the steps of the intrusion detection method provided in the above embodiments are implemented.
Correspondingly, an embodiment of the present application provides a computer device. FIG. 9 is a schematic diagram of a hardware entity of the computer device according to an embodiment of the present application. As shown in FIG. 9, the hardware entity of the device 900 includes a memory 901 and a processor 902. The memory 901 stores a computer program executable on the processor 902, and when the processor 902 executes the program, the steps of the intrusion detection method provided in the above embodiments are implemented.
The memory 901 is configured to store instructions and applications executable by the processor 902, and may also cache data (for example, image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 902 and the modules in the computer device 900. It may be implemented by flash memory (FLASH) or random access memory (RAM).
It should be pointed out here that the descriptions of the above storage medium and device embodiments are similar to those of the above method embodiments and have similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of the present application, refer to the descriptions of the method embodiments of the present application.
It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Thus, the appearances of "in one embodiment" or "in an embodiment" in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The above sequence numbers of the embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
It should be noted that, herein, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation; for instance, multiple units or components may be combined, or may be integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence or in the parts contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a robot, a server, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The methods disclosed in the several method embodiments provided in this application may be arbitrarily combined, provided there is no conflict, to obtain new method embodiments.
The above are only embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions readily conceivable by those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial Applicability
In the embodiments of the present disclosure, the image to be processed is obtained from a video stream to be processed, so that images from the video stream collected by the image capture device can be used as input for analyzing the video stream, which effectively improves the utilization of the image capture device. Objects in the image to be processed are first detected to obtain at least one object detection frame, and it is then determined whether a preset intrusion object exists in the object detection frame. In this way, the provided detection model and classification model are decoupled: during algorithm deployment, the classification model can be customized for special scenarios to quickly reach the expected performance, removing the dependence on the accuracy of a single detection model, so that the speed and accuracy of the algorithm are greatly improved. Further, because the detection model and classification model are decoupled, optimizing against false positives in a new scenario only requires adding false-positive data to train a new classifier and cascading it with the existing detection model, which suits the rapid upgrade iterations of algorithm deployment. Filtering false positives in a cascaded manner can greatly improve the detection accuracy of long-tail events. When it is determined that a preset intrusion object exists in the object detection frame, the image to be processed is identified to obtain an intrusion detection area, and whether an intrusion event has occurred is determined based on the position of the preset intrusion object and the intrusion detection area. In this way, intrusion detection area identification is performed only on the images to be processed that are confirmed to contain an intrusion object, rather than on all images to be processed, which significantly reduces the computing power required of the hardware. This achieves efficient, fully automatic detection of whether an intrusion object has intruded into the intrusion detection area, eliminates the need for manual annotation of the area, and facilitates large-scale online deployment.

Claims (27)

  1. An intrusion detection method, the method being executed by a computer device, the method comprising:
    obtaining the image to be processed from a video stream to be processed;
    detecting objects in the image to be processed to obtain at least one object detection frame;
    determining whether a preset intrusion object exists in the object detection frame;
    identifying the image to be processed to obtain an intrusion detection area when it is determined that a preset intrusion object exists in the object detection frame; and
    determining, based on the position of the preset intrusion object and the intrusion detection area, whether an intrusion event has occurred.
  2. The method according to claim 1, wherein the detecting objects in the image to be processed to obtain at least one object detection frame comprises:
    detecting objects in the image to be processed to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame;
    determining that the preset intrusion object exists in the image to be processed when it is determined, based on the category of the object, that the preset intrusion object exists in any of the object detection frames; and
    determining the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists.
  3. The method according to claim 2, wherein the detecting objects in the image to be processed to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame comprises:
    performing feature extraction on the image to be processed based on a deep convolutional network to obtain a first feature map;
    generating candidate target regions in the first feature map based on a region proposal network to obtain a second feature map, the second feature map comprising at least one detection frame and the position and confidence of each detection frame; and
    in the process of performing position-sensitive candidate-region pooling on the first feature map and the second feature map based on a pooling layer, determining, based on the at least one detection frame and the confidence of each detection frame, a detection frame satisfying a preset condition as an object detection frame, and determining the category of the object in the object detection frame.
  4. The method according to claim 3, wherein the determining, based on the at least one detection frame and the confidence of each detection frame, a detection frame satisfying a preset condition as an object detection frame comprises:
    determining, by using a non-maximum suppression algorithm, a detection frame satisfying a preset condition as an object detection frame, based on the confidence of each detection frame and the intersection-over-union between the detection frames in the at least one detection frame.
  5. The method according to claim 4, wherein the determining, by using a non-maximum suppression algorithm, a detection frame satisfying a preset condition as an object detection frame based on the confidence of each detection frame and the intersection-over-union between the detection frames in the at least one detection frame comprises:
    determining, based on the confidence of each detection frame, the detection frame with the highest confidence among the at least one detection frame as a target detection frame, and determining the target detection frame as one object detection frame;
    determining the intersection-over-union between the target detection frame and each other detection frame, wherein the other detection frames are the detection frames, among the at least one detection frame, other than the target detection frame;
    deleting the other detection frames whose intersection-over-union is greater than a threshold from the at least one detection frame to obtain a candidate detection frame set;
    determining the detection frame with the highest confidence in the candidate detection frame set, other than the target detection frame, as a new target detection frame, and determining the new target detection frame as one object detection frame;
    determining the intersection-over-union between the new target detection frame and each new other detection frame, wherein each new other detection frame is a detection frame in the candidate detection frame set other than the new target detection frame;
    deleting the new other detection frames whose intersection-over-union is greater than the threshold from the candidate detection frame set to obtain a new candidate detection frame set; and
    so on, until the object detection frames are obtained.
  6. The method according to claim 1, wherein the determining whether the preset intrusion object exists in the object detection frame comprises:
    performing, by using a first-level classifier, a first classification on the object detection frames corresponding to a target category based on the category of the object, to obtain a first classification result;
    performing, by using a second-level classifier cascaded with the first-level classifier, a second classification on the object detection frames satisfying a preset condition based on the first classification result, to obtain a second classification result; and
    determining that the preset intrusion object exists in the object detection frame when it is determined, based on the second classification result, that the preset intrusion object exists in any of the object detection frames.
  7. The method according to claim 6, wherein the first-level classifier and the second-level classifier have the following relationship:
    the classification accuracy of the first-level classifier is lower than that of the second-level classifier;
    the first-level classifier has fewer convolutional layers than the second-level classifier;
    the confidence of the first-level classifier is lower than that of the second-level classifier.
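Claims 6 and 7 describe a classifier cascade: a shallow, low-precision first stage cheaply rejects most candidate frames, so the deeper, more accurate second stage only runs on the few survivors. A hedged sketch, where `coarse` and `fine` stand in for the two classifiers and both names and thresholds are hypothetical:

```python
def cascade_classify(crops, coarse, fine, coarse_threshold=0.3, fine_threshold=0.5):
    # Stage 1: a cheap, low-confidence classifier scores every crop and
    # discards obvious negatives.
    survivors = [c for c in crops if coarse(c) >= coarse_threshold]
    # Stage 2: a slower, higher-precision classifier confirms only the
    # crops that passed the first stage.
    return [c for c in survivors if fine(c) >= fine_threshold]
```

This is why the first-level classifier may have lower accuracy and confidence: its only job is to keep the expensive second stage off the vast majority of non-intrusion frames.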
  8. The method according to any one of claims 1 to 7, wherein identifying the to-be-processed image to obtain an intrusion detection area comprises:
    performing semantic segmentation on the to-be-processed image with a convolutional neural network model to obtain the intrusion detection area.
  9. The method according to any one of claims 1 to 8, wherein determining, based on the position of the preset intrusion object and the intrusion detection area, whether an intrusion event occurs comprises:
    judging, based on the position of the preset intrusion object, whether the preset intrusion object is located within the intrusion detection area;
    in response to the preset intrusion object being located within the intrusion detection area, determining that the intrusion event occurs; or, in response to the preset intrusion object being located outside the intrusion detection area, determining that the intrusion event does not occur.
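Since claim 8 obtains the intrusion detection area by semantic segmentation, the area is naturally a per-pixel binary mask, and the containment test of claim 9 reduces to a bounds-checked pixel lookup. A minimal sketch under that assumption:

```python
def is_inside(mask, point):
    # `mask` is a 2-D binary array (rows of 0/1) from the segmentation
    # model; `point` is (x, y) in pixel coordinates. Points outside the
    # image are treated as outside the intrusion detection area.
    x, y = int(point[0]), int(point[1])
    h, w = len(mask), len(mask[0])
    return 0 <= x < w and 0 <= y < h and mask[y][x] == 1
```

If the area were instead stored as a polygon, the same decision could be made with a point-in-polygon test; the mask lookup is simply the cheaper form when segmentation output is already available.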
  10. The method according to any one of claims 1 to 7, wherein determining, based on the position of the preset intrusion object and the intrusion detection area, whether an intrusion event occurs comprises:
    determining the object detection frame in which the preset intrusion object exists as a target detection frame;
    determining the center point of the bottom edge of the target detection frame as the position of the preset intrusion object;
    determining, based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area, whether an intrusion event occurs.
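The bottom-edge midpoint of claim 10 approximates the point where a pedestrian or vehicle touches the ground plane, which is the natural point to test against a road-surface region. A one-line sketch, assuming the `(x1, y1, x2, y2)` box format with y increasing downward:

```python
def object_position(box):
    # Midpoint of the bottom edge of an (x1, y1, x2, y2) detection box:
    # roughly where the detected object meets the ground.
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)
```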
  11. The method according to any one of claims 1 to 10, further comprising:
    outputting an alarm identifier in response to the occurrence of the intrusion event.
  12. The method according to any one of claims 1 to 11, further comprising:
    recording, in response to the occurrence of the intrusion event, the intrusion event based on the category of the intrusion object and the intrusion detection area, to obtain an intrusion record;
    storing the intrusion record or sending it to an associated terminal.
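The intrusion record of claim 12 can be sketched as a small serializable structure. The field names below are hypothetical; the claim only requires that the object category and the intrusion detection area be recorded:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class IntrusionRecord:
    # Illustrative fields: the claim mandates recording the intrusion
    # object's category and the detection area; a timestamp is a common
    # but assumed addition.
    category: str
    area_id: str
    timestamp: float

    def serialize(self):
        # JSON is an assumed wire format for storing the record or
        # sending it to an associated terminal.
        return json.dumps(asdict(self))
```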
  13. An intrusion detection apparatus, comprising:
    an obtaining module, configured to obtain a to-be-processed image from a to-be-processed video stream;
    a detection module, configured to detect objects in the to-be-processed image to obtain at least one object detection frame;
    a first determination module, configured to determine whether a preset intrusion object exists in the object detection frame;
    an identification module, configured to identify the to-be-processed image to obtain an intrusion detection area when it is determined that the preset intrusion object exists in the object detection frame;
    a second determination module, configured to determine, based on the position of the preset intrusion object and the intrusion detection area, whether an intrusion event occurs.
  14. The apparatus according to claim 13, wherein the detection module comprises:
    a detection submodule, configured to detect objects in the to-be-processed image to obtain at least one object detection frame, the position of each object detection frame, and the category of the object in each object detection frame;
    a first determination submodule, configured to determine that the preset intrusion object exists in the to-be-processed image when it is determined, based on the categories of the objects, that the preset intrusion object exists in any of the object detection frames;
    a second determination submodule, configured to determine the position of the preset intrusion object based on the position of the object detection frame in which the preset intrusion object exists.
  15. The apparatus according to claim 14, wherein the detection submodule comprises:
    a deep convolutional network, configured to perform feature extraction on the to-be-processed image to obtain a first feature map;
    a region proposal network, configured to generate candidate target regions in the first feature map to obtain a second feature map, the second feature map comprising at least one detection frame and the position and confidence of each detection frame;
    a pooling layer, configured to determine, in the process of performing position-sensitive candidate-region pooling on the first feature map and the second feature map, a detection frame satisfying a preset condition as an object detection frame based on the at least one detection frame and the confidence of each detection frame, and to determine the category of the object in the object detection frame.
  16. The apparatus according to claim 15, wherein the detection submodule comprises a non-maximum suppression unit, configured to determine, using a non-maximum suppression algorithm, a detection frame satisfying a preset condition as an object detection frame based on the confidence of each detection frame and the intersection-over-union between the detection frames in the at least one detection frame.
  17. The apparatus according to claim 16, wherein the non-maximum suppression unit comprises:
    a first determination subunit, configured to determine, based on the confidence of each detection frame, the detection frame with the highest confidence among the at least one detection frame as a target detection frame, and to determine the target detection frame as one of the object detection frames;
    a second determination subunit, configured to determine the intersection-over-union between the target detection frame and each other detection frame, where each other detection frame refers to a detection frame in the at least one detection frame other than the target detection frame;
    a deletion subunit, configured to delete the other detection frames whose intersection-over-union is greater than a threshold from the at least one detection frame, to obtain a candidate detection frame set;
    a third determination subunit, configured to determine the detection frame with the highest confidence in the candidate detection frame set, other than the target detection frame, as a new target detection frame, and to determine the new target detection frame as one of the object detection frames;
    a fourth determination subunit, configured to determine the intersection-over-union between the new target detection frame and each new other detection frame, where each new other detection frame refers to a detection frame in the candidate detection frame set other than the new target detection frame;
    the deletion subunit being further configured to delete the new other detection frames whose intersection-over-union is greater than the threshold from the candidate detection frame set, to obtain a new candidate detection frame set;
    and so on, until the object detection frames are obtained.
  18. The apparatus according to claim 13, wherein the first determination module comprises:
    a first classification submodule, configured to perform, with a first-level classifier, a first classification on the object detection frames corresponding to a target category based on the categories of the objects, to obtain a first classification result;
    a second classification submodule, configured to perform, with a second-level classifier cascaded with the first-level classifier, a second classification on the object detection frames satisfying a preset condition based on the first classification result, to obtain a second classification result;
    a third determination submodule, configured to determine, when it is determined based on the second classification result that the preset intrusion object exists in any of the object detection frames, that the preset intrusion object exists in the object detection frame.
  19. The apparatus according to claim 18, wherein the first-level classifier and the second-level classifier have the following relationship: the classification accuracy of the first-level classifier is lower than that of the second-level classifier; the first-level classifier has fewer convolutional layers than the second-level classifier; and the confidence of the first-level classifier is lower than that of the second-level classifier.
  20. The apparatus according to any one of claims 13 to 19, wherein the identification module is further configured to perform semantic segmentation on the to-be-processed image with a convolutional neural network model to obtain the intrusion detection area.
  21. The apparatus according to any one of claims 13 to 20, wherein the second determination module comprises:
    a judgment submodule, configured to judge, based on the position of the preset intrusion object, whether the preset intrusion object is located within the intrusion detection area;
    a fourth determination submodule, configured to determine that the intrusion event occurs in response to the preset intrusion object being located within the intrusion detection area, or to determine that the intrusion event does not occur in response to the preset intrusion object being located outside the intrusion detection area.
  22. The apparatus according to any one of claims 13 to 19, wherein the second determination module further comprises:
    a fifth determination submodule, configured to determine the object detection frame in which the preset intrusion object exists as a target detection frame;
    a sixth determination submodule, configured to determine the center point of the bottom edge of the target detection frame as the position of the preset intrusion object;
    a seventh determination submodule, configured to determine, based on the relative positional relationship between the center point of the bottom edge of the target detection frame and the intrusion detection area, whether an intrusion event occurs.
  23. The apparatus according to any one of claims 13 to 22, further comprising an output module, configured to output an alarm identifier in response to the occurrence of the intrusion event.
  24. The apparatus according to any one of claims 13 to 23, further comprising:
    a recording module, configured to record, in response to the occurrence of the intrusion event, the intrusion event based on the category of the intrusion object and the intrusion detection area, to obtain an intrusion record;
    a sending module, configured to store the intrusion record or send it to an associated terminal.
  25. A computer device, comprising a memory and a processor, the memory storing a computer program runnable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 12.
  26. A storage medium storing executable instructions which, when executed, cause a processor to implement the method according to any one of claims 1 to 12.
  27. A computer program product comprising one or more instructions adapted to be loaded by a processor to perform the method according to any one of claims 1 to 12.
PCT/CN2021/087835 2020-12-31 2021-04-16 Invasion detection method and apparatus, device, storage medium, and program product WO2022141962A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011620177.3A CN112668496A (en) 2020-12-31 2020-12-31 Intrusion detection method, device, equipment and storage medium
CN202011620177.3 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022141962A1 (en) 2022-07-07

Family

ID=75412094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/087835 WO2022141962A1 (en) 2020-12-31 2021-04-16 Invasion detection method and apparatus, device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN112668496A (en)
WO (1) WO2022141962A1 (en)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887806B (en) * 2021-05-09 2023-04-07 电子科技大学 Long-tail cascade popularity prediction model, training method and prediction method
CN113255533B (en) * 2021-05-31 2022-06-21 中再云图技术有限公司 Method for identifying forbidden zone intrusion behavior, storage device and server
CN113344900B (en) * 2021-06-25 2023-04-18 北京市商汤科技开发有限公司 Airport runway intrusion detection method, airport runway intrusion detection device, storage medium and electronic device
CN113469021A (en) * 2021-06-29 2021-10-01 深圳市商汤科技有限公司 Video processing apparatus, electronic device, and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303163A (en) * 2015-09-22 2016-02-03 深圳市华尊科技股份有限公司 Method and detection device for target detection
CN106156785A (en) * 2015-04-07 2016-11-23 佳能株式会社 Method for checking object and body detection device
CN107784289A (en) * 2017-11-02 2018-03-09 深圳市共进电子股份有限公司 A kind of security-protecting and monitoring method, apparatus and system
CN111126317A (en) * 2019-12-26 2020-05-08 腾讯科技(深圳)有限公司 Image processing method, device, server and storage medium
CN111813997A (en) * 2020-09-08 2020-10-23 平安国际智慧城市科技股份有限公司 Intrusion analysis method, device, equipment and storage medium
CN112052787A (en) * 2020-09-03 2020-12-08 腾讯科技(深圳)有限公司 Target detection method and device based on artificial intelligence and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697424A (en) * 2018-12-19 2019-04-30 浙江大学 A kind of high-speed railway impurity intrusion detection device and method based on FPGA and deep learning
CN111160125B (en) * 2019-12-11 2023-06-30 北京交通大学 Railway foreign matter intrusion detection method based on railway monitoring


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909215A (en) * 2022-12-09 2023-04-04 厦门农芯数字科技有限公司 Edge intrusion early warning method and system based on target detection
CN115909215B (en) * 2022-12-09 2023-07-14 厦门农芯数字科技有限公司 Edge intrusion early warning method and system based on target detection
CN116030423A (en) * 2023-03-29 2023-04-28 浪潮通用软件有限公司 Regional boundary intrusion detection method, equipment and medium

Also Published As

Publication number Publication date
CN112668496A (en) 2021-04-16


Legal Events

Code 121: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 21912727; country of ref document: EP; kind code of ref document: A1).
Code NENP: non-entry into the national phase (ref country code: DE).
Code 32PN: public notification in the EP bulletin as the address of the addressee cannot be established (free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.11.2023)).