CN111813995A

CN111813995A - Pedestrian article extraction behavior detection method and system based on space-time relationship

Info

Publication number: CN111813995A
Application number: CN202010619517.4A
Authority: CN
Inventors: 孙涛; 许金金; 李宗南; 张浒; 王东; 苗应亮
Original assignee: Maxvision Technology Corp
Current assignee: Maxvision Technology Corp
Priority date: 2020-07-01
Filing date: 2020-07-01
Publication date: 2020-10-23

Abstract

The invention provides a pedestrian article extraction behavior detection method and system based on a space-time relationship, wherein the method comprises the following steps: the method comprises the steps of detecting an acquired monitoring video frame image in real time based on a trained pedestrian key point detection model, a face key point detection model and an article detection model to respectively obtain a pedestrian detection result, a face detection result and an article detection result; matching the pedestrian detection result with the face detection result based on the space-time relation of the object in the monitoring video frame image to obtain a related pedestrian and face matching result, and matching the pedestrian detection result with the object detection result to obtain a related pedestrian and object matching result; and detecting the behavior of extracting the article, and associating the pedestrian corresponding to the behavior of extracting the article according to the matching result of the pedestrian and the article. The pedestrian detection method can combine the space-time relationship to realize real-time accurate detection of pedestrian extracted articles.

Description

Pedestrian article extraction behavior detection method and system based on space-time relationship

Technical Field

The invention relates to the field of target detection, in particular to a pedestrian article extraction behavior detection method and system based on a space-time relationship.

Background

At present, in places such as airports, railway stations, bus stations, subway stations, ports, frontier inspection, customs, government buildings, embassy halls, convention centers and the like, article security inspection equipment is arranged to detect suspicious articles. When the suspicious goods relate to the illegal crime, the physical evidence that the related staff can collect is generally limited to the suspicious goods, and the evidence that can prove the relevance between the suspicious goods and the suspicious people is difficult to collect, so that the suspicious people cannot be effectively controlled by the evidence.

At present, in a real security inspection scene, due to the fact that a background is quite complex, pedestrians are mobile, article types are various, and a single type of target detection model is not enough to meet detection judgment of association between the pedestrians and the articles. However, in the prior art, no target detection method can effectively relate the action of the pedestrian and the action of extracting the article.

Therefore, a method capable of accurately detecting the behavior of the pedestrian extracting the article is needed to solve the above problems.

Disclosure of Invention

The invention provides a pedestrian article extraction behavior detection method and system based on a spatiotemporal relationship, which overcomes or at least partially solves the problems.

According to a first aspect provided by the invention, the invention provides a pedestrian article extraction behavior detection method based on a spatiotemporal relationship, which comprises the following steps:

the method comprises the steps of detecting an acquired monitoring video frame image in real time based on a trained pedestrian key point detection model, a face key point detection model and an article detection model to respectively obtain a pedestrian detection result, a face detection result and an article detection result;

matching the pedestrian detection result with the face detection result based on the space-time relation of the object in the monitoring video frame image to obtain a related pedestrian and face matching result, and matching the pedestrian detection result with the object detection result to obtain a related pedestrian and object matching result;

and detecting the behavior of extracting the article, and associating the pedestrian corresponding to the behavior of extracting the article according to the matching result of the pedestrian and the article.

Wherein the method further comprises:

carrying out deep learning training by utilizing the constructed pedestrian data set to obtain the pedestrian key point detection model;

carrying out deep learning training by using the constructed face data set to obtain the face key point detection model;

and carrying out deep learning training by utilizing the constructed object data set to obtain the object detection model, wherein the pedestrian data set, the human face data set and the object data set are video frame images captured from a historical monitoring video, and pedestrians, human faces and objects in the video frame images are marked and classified.

Wherein, based on pedestrian key point detection model, face key point detection model and the article detection model that train, carry out real-time detection to the surveillance video frame image who obtains, obtain pedestrian detection result, face detection result and article detection result respectively, include:

acquiring a human body positioning rectangular frame and human body skeleton information of the pedestrian on a frame image based on the trained pedestrian key point detection model;

acquiring a face positioning rectangular frame on a frame image and a face image after face alignment correction based on a trained face key point detection model;

and acquiring an article positioning rectangular frame on the frame image based on the trained article detection model.

Wherein, the matching the pedestrian detection result and the face detection result to obtain the associated pedestrian and face matching result comprises:

in the same frame of image, if the overlapping degree of the area of the face in the image and the head area in the human skeleton information of the pedestrian is larger than a preset threshold value, the pedestrian and the face are judged to be successfully matched in an associated mode.

In the same frame of image, the areas of rectangular frames of a face region and a head region in the pedestrian human skeleton information are respectively A1 and A2, the area of an overlapping region is A0, and the overlapping rate R is:

and the overlapping rate threshold value is T, and if R is larger than T, the pedestrian and the face are judged to be successfully associated and matched.

Wherein, the matching the pedestrian detection result and the article detection result to obtain the associated pedestrian and article matching result comprises:

in the same frame of image, if the object has an arm vertex in the pedestrian human skeleton information in the region of the image, judging that the association matching between the pedestrian and the object is successful.

Wherein the detecting the behavior of extracting the article comprises:

and comparing the object detection result in the current frame image with the object detection results in the previous frames of images, and if the coordinates of the center point of the object positioning rectangular frame rise in the images, judging that the behavior of extracting the object exists.

Setting the coordinate of the central point of the article positioning rectangle frame in the vertical direction at the current frame as Y_tY in the previous frame_t-1Coordinate variation difference Δ Y ═Y_t-Y_t-1(ii) a When DeltaY is larger than threshold value T_yWhen the counter is over, the counter C is accumulated; the counter threshold is T_cWhen C is greater than T_cAnd judging that the coordinates of the center point of the article positioning rectangular frame rise in the image.

Wherein the method further comprises:

and if the pedestrian corresponding to the behavior of the article is extracted and the associated pedestrian and face matching result exists, outputting the aligned corrected face image corresponding to the pedestrian.

According to a second aspect provided by the present invention, the present invention provides a pedestrian extraction article behavior detection system based on spatiotemporal relationship, comprising:

the data detection module is used for detecting the acquired monitoring video frame images in real time based on the trained pedestrian key point detection model, the face key point detection model and the object detection model to respectively obtain a pedestrian detection result, a face detection result and an object detection result;

the data matching module is used for matching the pedestrian detection result with the face detection result based on the space-time relation of the object in the monitoring video frame image to obtain a related pedestrian and face matching result, and matching the pedestrian detection result with the object detection result to obtain a related pedestrian and object matching result;

and the behavior detection module is used for detecting the behavior of extracting the article and associating the pedestrian corresponding to the behavior of extracting the article according to the matching result of the pedestrian and the article.

According to a third aspect provided by the present invention, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the program to implement the steps of the pedestrian extraction item behavior detection method based on spatiotemporal relationship as provided in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the pedestrian extraction item behavior detection method based on spatiotemporal relationship as provided in the above first aspect.

According to the pedestrian article extraction behavior detection method and system based on the space-time relationship, the behavior of pedestrian article extraction is identified through a human body key point detection model, a human face key point detection model and an article detection model based on deep learning and coordinate information on time and space is combined, and real-time and accurate detection of pedestrian article extraction is achieved.

Drawings

Fig. 1 is a schematic flow chart of a pedestrian article extraction behavior detection method based on a spatiotemporal relationship provided by an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a pedestrian article extraction behavior detection system based on spatiotemporal relationship according to an embodiment of the present invention.

Fig. 3 illustrates a schematic structural diagram of an electronic device.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Fig. 1 is a schematic flow chart of a pedestrian article extraction behavior detection method based on spatiotemporal relationship, as shown in fig. 1, including:

101. the method comprises the steps of detecting an acquired monitoring video frame image in real time based on a trained pedestrian key point detection model, a face key point detection model and an article detection model to respectively obtain a pedestrian detection result, a face detection result and an article detection result;

102. matching the pedestrian detection result with the face detection result based on the space-time relation of the object in the monitoring video frame image to obtain a related pedestrian and face matching result, and matching the pedestrian detection result with the object detection result to obtain a related pedestrian and object matching result;

103. and detecting the behavior of extracting the article, and associating the pedestrian corresponding to the behavior of extracting the article according to the matching result of the pedestrian and the article.

Specifically, in step 101, three detection models, namely a pedestrian key point detection model, a face key point detection model and an article detection model, are trained in advance in the embodiment of the present invention, and the pedestrian key point detection model is specifically a human body positioning rectangular frame and human body key point positions, such as a head, shoulders, arms, a trunk and legs, of a pedestrian in an image by labeling information. When the pedestrian key point detection model is used, firstly, a convolutional neural network operation is carried out on an input image to obtain a feature map, then the feature map is respectively input into a key point confidence coefficient prediction network and a key point affinity vector field prediction network to obtain the key point confidence coefficient and affinity vector of each key point, according to the key point confidence coefficient network, a position with the highest confidence coefficient is selected as a key point position of network prediction, finally, according to the key point affinity vector field network, the correlation of the two key points is calculated, a connection line mode with the highest affinity is output, and clustering is carried out on the key points.

The training process of the face key point detection model is realized by positioning a rectangular frame and the positions of face key points, such as eyes, a nose and a mouth, for the faces of pedestrians in the image by marking information. When the face key point detection model is used, firstly, two-stage cascade convolution neural network operation is carried out on an image, then a face region is obtained through calculation according to the first-stage convolution neural network, and finally the face region is used as the input of the second stage to obtain face key point information (position and confidence).

The training process of the article detection model is realized by marking information as a positioning rectangular frame of an article in an image, and the article detection model is mainly divided into three parts: a base layer, an extension layer, and a prediction layer. And calculating the input image through a basic layer to obtain two characteristic diagrams, calculating the input image through an extended layer to obtain four characteristic diagrams, and finally obtaining the position coordinate and the confidence coefficient of the article through the six characteristic diagrams through a prediction layer.

Further, in step 102, the embodiment of the present invention matches the pedestrian detection result and the face detection result in sequence by using the spatiotemporal relationship in the monitoring video frame image to obtain a related pedestrian and face matching result, and then matches the pedestrian face matching result and the object detection result to obtain a related pedestrian and object matching result.

Finally, in step 103, the behavior of extracting the article is automatically output, and the pedestrian extracting the article is determined according to the association of the article and the pedestrian before, so that the evidence is saved.

According to the pedestrian article extraction behavior detection method based on the space-time relationship, the behavior of extracting articles from pedestrians is identified through a human body key point detection model, a human face key point detection model and an article detection model based on deep learning and by combining coordinate information on time and space, and real-time and accurate detection of the pedestrian article extraction is achieved.

On the basis of the above embodiment, the method further includes:

On the basis of the above embodiment, the real-time detection is performed on the acquired surveillance video frame image based on the trained pedestrian key point detection model, the face key point detection model and the article detection model, so as to obtain a pedestrian detection result, a face detection result and an article detection result respectively, including:

The pedestrian detection result comprises a human body positioning rectangular frame and human body skeleton information of the pedestrian on the frame image, and comprises coordinate information of the head, the shoulders, the arms, the trunk and the legs; the detection result of the human face is a human face positioning rectangular frame and a human face picture after the human face is aligned and corrected, and the object detection result is an object positioning rectangular frame.

On the basis of the above embodiment, the matching the pedestrian detection result and the face detection result to obtain a related pedestrian and face matching result includes:

When the pedestrian and the face appear in the detection result at the same time, the embodiment of the invention judges whether the pedestrian is matched with the face according to the plane coordinate information of the frame image, specifically, whether the region of the face in the image is overlapped with the head region in the human skeleton information of the pedestrian, and when the overlapping degree reaches a certain threshold value, the detected pedestrian and the face are matched.

On the basis of the above embodiment, the matching the pedestrian detection result and the article detection result to obtain a related pedestrian and article matching result includes:

And when the pedestrian and the article appear in the detection result at the same time, judging whether the pedestrian is matched with the article according to the plane coordinate information of the frame image, namely whether the article has an arm vertex in the pedestrian human body skeleton information in the region of the image, and if so, pairing the detected pedestrian and article.

On the basis of the above embodiment, the detecting the behavior of extracting the article includes:

The embodiment of the invention identifies the behavior of extracting the article by comparing the article detection result of the current frame with the article detection results of the previous frames, thereby judging whether the position of the article in the frame image rises. Specifically, when an article appears in the detection result, coordinate information of the center point of each frame of article rectangular frame is recorded, and a plurality of frames are taken as 5 frames for illustration, and if the upper left corner of the image is taken as the origin of coordinates and the coordinate value of the center point of the article rectangular frame of the 5 frames in the front and back directions in the vertical direction is reduced, the article is judged to be ascending.

It can be understood that if the coordinates of the center point of the rectangular frame of the object are moving upwards in the frame image, the object is considered to be extracted by a pedestrian who is successfully paired, the object is associated with the corresponding pedestrian, and the result is output.

On the basis of the above embodiment, the method further includes:

It should be noted that, if the behavior of the pedestrian for extracting the object is successfully detected, the embodiment of the present invention may further output and extract a face image corresponding to the pedestrian of the object according to the result of matching between the pedestrian and the face, where the face image is a face image after face alignment correction output by the face key point detection model. When the article is a suspicious article, the method can be convenient for a law enforcement department to capture the suspicious person after face recognition in time.

Fig. 2 is a schematic structural diagram of a pedestrian article extraction behavior detection system based on spatiotemporal relationship according to an embodiment of the present invention, as shown in fig. 2, including: a data detection module 201, a data matching module 202, and a behavior detection module 203, wherein:

the data detection module 201 is configured to perform real-time detection on the acquired surveillance video frame image based on the trained pedestrian key point detection model, the trained face key point detection model and the trained article detection model, and obtain a pedestrian detection result, a face detection result and an article detection result respectively;

the data matching module 202 is configured to match the pedestrian detection result and the face detection result based on a spatiotemporal relationship of the object in the surveillance video frame image to obtain an associated pedestrian and face matching result, and match the pedestrian detection result and the object detection result to obtain an associated pedestrian and object matching result;

the behavior detection module 203 is configured to detect a behavior of extracting an article, and associate a pedestrian corresponding to the behavior of extracting the article according to the result of matching the pedestrian with the article.

For a specific way of detecting the behavior of the pedestrian by using the data detection module 201, the data matching module 202 and the behavior detection module 203, reference may be made to the embodiment shown in fig. 1, and details of the embodiment of the present invention are not repeated herein.

Fig. 3 illustrates a schematic structural diagram of an electronic device, and as shown in fig. 3, the server may include: a processor (processor)301, a communication Interface (communication Interface)302, a memory (memory)303 and a bus 304, wherein the processor 301, the communication Interface 302 and the memory 303 complete communication with each other through the bus 304. The communication interface 302 may be used for information transmission between the server and the smart television. Processor 301 may call logic instructions in memory 303 to perform the following method: the method comprises the steps of detecting an acquired monitoring video frame image in real time based on a trained pedestrian key point detection model, a face key point detection model and an article detection model to respectively obtain a pedestrian detection result, a face detection result and an article detection result; matching the pedestrian detection result with the face detection result based on the space-time relation of the object in the monitoring video frame image to obtain a related pedestrian and face matching result, and matching the pedestrian detection result with the object detection result to obtain a related pedestrian and object matching result; and detecting the behavior of extracting the article, and associating the pedestrian corresponding to the behavior of extracting the article according to the matching result of the pedestrian and the article.

The present embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, enable the computer to perform the methods provided by the above-described method embodiments, for example, including: the method comprises the steps of detecting an acquired monitoring video frame image in real time based on a trained pedestrian key point detection model, a face key point detection model and an article detection model to respectively obtain a pedestrian detection result, a face detection result and an article detection result; matching the pedestrian detection result with the face detection result based on the space-time relation of the object in the monitoring video frame image to obtain a related pedestrian and face matching result, and matching the pedestrian detection result with the object detection result to obtain a related pedestrian and object matching result; and detecting the behavior of extracting the article, and associating the pedestrian corresponding to the behavior of extracting the article according to the matching result of the pedestrian and the article.

The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the methods provided by the above method embodiments, for example, including: the method comprises the steps of detecting an acquired monitoring video frame image in real time based on a trained pedestrian key point detection model, a face key point detection model and an article detection model to respectively obtain a pedestrian detection result, a face detection result and an article detection result; matching the pedestrian detection result with the face detection result based on the space-time relation of the object in the monitoring video frame image to obtain a related pedestrian and face matching result, and matching the pedestrian detection result with the object detection result to obtain a related pedestrian and object matching result; and detecting the behavior of extracting the article, and associating the pedestrian corresponding to the behavior of extracting the article according to the matching result of the pedestrian and the article.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A pedestrian article extraction behavior detection method based on a spatiotemporal relationship is characterized by comprising the following steps:

2. The pedestrian extraction article behavior detection method based on the spatiotemporal relationship as claimed in claim 1, wherein the method further comprises:

3. The pedestrian extraction article behavior detection method based on the spatiotemporal relationship as claimed in claim 2, wherein the detecting the acquired surveillance video frame image in real time based on the trained pedestrian key point detection model, the face key point detection model and the article detection model to respectively obtain the pedestrian detection result, the face detection result and the article detection result comprises:

4. The pedestrian extraction article behavior detection method based on the spatiotemporal relationship as claimed in claim 3, wherein the matching of the pedestrian detection result and the face detection result to obtain the associated pedestrian and face matching result comprises:

5. The pedestrian extraction article behavior detection method based on the spatiotemporal relationship as claimed in claim 4, wherein the matching of the pedestrian detection result and the article detection result to obtain the associated pedestrian and article matching result comprises:

6. The pedestrian extracted article behavior detection method based on spatiotemporal relationship according to claim 5, wherein the detecting the behavior of the extracted article comprises:

7. The pedestrian extraction article behavior detection method based on the spatiotemporal relationship as claimed in claim 1, wherein the method further comprises:

8. A pedestrian extraction article behavior detection system based on spatiotemporal relationship is characterized by comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the pedestrian extraction article behavior detection method based on spatiotemporal relationship as claimed in any one of claims 1 to 7.

10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the pedestrian extraction article behavior detection method based on spatiotemporal relationship according to any one of claims 1 to 7.