WO2021093329A1 - Interactive behavior identification method and apparatus, computer device and storage medium - Google Patents

Interactive behavior identification method and apparatus, computer device and storage medium

Info

Publication number
WO2021093329A1
WO2021093329A1 (PCT/CN2020/097002, CN2020097002W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
pedestrian
preset
detected
key points
Application number
PCT/CN2020/097002
Other languages
French (fr)
Chinese (zh)
Inventor
余代伟
孙皓
董昱青
庄喜阳
李永翔
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Application filed by 苏宁易购集团股份有限公司 and 苏宁云计算有限公司
Priority to CA3160731A priority Critical patent/CA3160731A1/en
Publication of WO2021093329A1 publication Critical patent/WO2021093329A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • This application relates to the field of computer vision technology, and in particular to an interactive behavior recognition method, device, computer equipment, and storage medium.
  • traditional human-goods interaction behavior recognition methods generally rely on sound, light, electricity and other sensor devices to realize behavior recognition, which requires high hardware costs; their use scenarios are limited, and they cannot be applied at scale in complex environments such as supermarkets. Supermarket monitoring equipment generates a large amount of video data every day, and analyzing the surveillance video can yield much information about human-goods interaction, but this consumes enormous manpower and suffers from low efficiency.
  • An interactive behavior identification method which includes:
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined.
  • the preset item rack image is a preset item rack mask image
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined based on the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, including:
  • the pedestrian's hand area is obtained
  • the method further includes:
  • the method further includes:
  • the item rack area to which the pedestrian is facing is obtained.
  • obtaining the pedestrian's orientation information according to the key points of the pedestrian includes:
  • the key points of the shoulder include the key points of the left shoulder and the key points of the right shoulder;
  • the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector.
  • the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected;
  • when the orientation angle is greater than or equal to π and less than 1.5π, it is determined that the pedestrian is facing one side of the image to be detected;
  • when the orientation angle is greater than 1.5π and less than or equal to 2π, it is determined that the pedestrian is facing the other side of the image to be detected.
  • acquiring the image to be detected includes:
  • the method further includes:
  • the labeled image data is input into the neural network model for training to obtain a multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • a recognition device for human-goods interaction behavior comprising:
  • the acquisition module is used to acquire the image to be detected
  • the detection module is used to input the image to be detected into the preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected.
  • the key points are located inside the detection frame.
  • the multi-task model is used for pedestrian detection and human key point detection ;
  • the recognition module is used to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • a computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor, where the processor implements the following steps when executing the computer program:
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented:
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined.
  • with the above interactive behavior recognition method, device, computer equipment and storage medium, the image to be detected is acquired and input into a preset multi-task model to obtain the key points and detection frames of the pedestrians in the image to be detected. Because one multi-task model performs both pedestrian detection and human key point detection, the pedestrian detection frame and the human body key points are obtained synchronously, which improves image processing efficiency. The key points are all located inside the detection frame, so erroneous key points outside the detection frame can be eliminated, and the detection frame and key points are used jointly to improve the accuracy of key point labeling. According to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information between the pedestrian and the corresponding item rack is determined, which efficiently identifies the interaction behavior and improves recognition accuracy.
  • Figure 1 is an application environment diagram of an interactive behavior recognition method in an embodiment
  • Figure 2 is a schematic flowchart of an interactive behavior identification method in an embodiment
  • Figure 3 is a schematic flowchart of an interactive behavior judgment step in an embodiment
  • Figure 4 is a schematic flowchart of an interactive behavior recognition method in another embodiment
  • Figure 5 is a structural block diagram of an interactive behavior recognition device in an embodiment
  • Fig. 6 is an internal structure diagram of a computer device in an embodiment.
  • the interactive behavior identification method provided in this application can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through a network.
  • the terminal 102 can be, but is not limited to, various image acquisition devices.
  • specifically, the terminal 102 can be existing monitoring equipment in a shopping mall, supermarket or library, and the server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
  • an interactive behavior recognition method is provided.
  • the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • Step 202 Obtain an image to be detected.
  • the image to be detected is an image with pedestrians collected by an image acquisition device.
  • the above-mentioned image acquisition device may be monitoring equipment already installed and in use at a target place such as a shopping mall, supermarket or library, for example an existing camera at the target place, so there is no need to modify the target place and the deployment cost is low.
  • the surveillance video is acquired through the camera, and pictures with pedestrians are selected from the surveillance video as the image to be detected.
  • Step 204 Input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected.
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection.
  • the multi-task model obtains the detection frame of the pedestrian in the image to be detected through pedestrian detection and, at the same time, obtains the key points of the pedestrian through human key point detection, so the detection frame and key points are acquired synchronously and features are shared between the different tasks; this reduces the amount of computation and hardware resource occupation and shortens the processing time of a single frame, so images to be detected obtained from multiple cameras can be processed at the same time, realizing parallel processing of multiple camera streams.
  • the acquired image to be detected is input into a preset multi-task model.
  • the multi-task model performs pedestrian detection and human key point detection on the image to be detected.
  • in processing the image to be detected, the multi-task model can exclude key points located outside the detection frame, so that the output key points are all located inside the detection frame.
  • the multi-task model can output the key points and the detection frame of the pedestrian in the image to be detected.
  • for example, for an input image the model outputs N sets of key points and N detection frames, where N is the number of pedestrians in the image to be detected, each pedestrian has K key points (usually K = 17), and each detection frame is given by the coordinates of its upper-left and lower-right corners together with a confidence score.
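  • As a hedged illustration of this post-processing, the sketch below (Python with NumPy; the array layouts and the NaN-masking convention are assumptions for illustration, not taken from the patent) discards key points that fall outside their pedestrian's detection frame:

```python
# Minimal sketch, assuming `keypoints` has shape (N, K, 2) with (x, y) pairs
# and `boxes` has shape (N, 5) laid out as (x1, y1, x2, y2, score).
import numpy as np

def keep_keypoints_inside_boxes(keypoints: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    x1, y1, x2, y2 = boxes[:, 0:1], boxes[:, 1:2], boxes[:, 2:3], boxes[:, 3:4]
    inside = ((keypoints[:, :, 0] >= x1) & (keypoints[:, :, 0] <= x2) &
              (keypoints[:, :, 1] >= y1) & (keypoints[:, :, 1] <= y2))
    filtered = keypoints.astype(float)   # copy, so the input stays untouched
    filtered[~inside] = np.nan           # mark excluded key points
    return filtered
```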
  • Step 206 Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • the existing cameras, the layout of the target place, and the item racks are positioned and labeled in advance, and each camera is configured with a corresponding preset item rack image; since a given image to be detected is acquired through one of the cameras, all images to be detected acquired by the same camera correspond to that camera, and therefore also correspond to the preset item rack image configured for that camera.
  • the image to be detected is acquired and input into a preset multi-task model to obtain the key points and detection frames of the pedestrians in the image to be detected. Because one multi-task model performs both pedestrian detection and human key point detection, the pedestrian detection frame and the human body key points are obtained synchronously, which improves image processing efficiency; the key points are all located inside the detection frame, so erroneous key points outside the detection frame can be eliminated, and the detection frame and key points are used jointly to improve the accuracy of key point labeling. According to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information between the pedestrian and the corresponding item rack is determined, which efficiently identifies the interaction behavior and improves recognition accuracy; moreover, the method enables fully automated processing without manual intervention, greatly reducing labor costs.
  • the preset item rack image is a preset item rack mask image
  • the preset item rack mask image may be obtained by extracting a frame of image from the surveillance video and labeling the outer contour of the item rack in that frame with a polygon; determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes:
  • Step 302 selecting a wrist key point among pedestrian key points
  • the wrist key point data includes left wrist key point data and right wrist key point data.
  • Step 304 Obtain the hand area of the pedestrian according to the key points of the wrist and the preset radius threshold;
  • specifically, the left-hand area and the right-hand area are delimited by taking the left wrist key point and the right wrist key point respectively as the center of a circle whose radius is the preset radius threshold, so as to obtain an image of the left-hand area and an image of the right-hand area.
  • Step 306 Determine whether the intersecting area of the image of the hand area and the preset mask image of the article rack is greater than a preset area threshold
  • Step 308 if yes, determine that the pedestrian interacts with the corresponding item rack
  • step 310 if not, it is determined that there is no interaction between the pedestrian and the corresponding item rack.
  • the hand area includes a left-hand area and a right-hand area; specifically, when the intersecting area between the image of at least one of the left-hand area and the right-hand area and the preset item rack mask image is greater than the preset area threshold, it is determined that the pedestrian interacts with the corresponding item rack; otherwise, it is determined that the pedestrian does not interact with the corresponding item rack.
  • for example, the circle centered on the left wrist key point with radius R denotes the left-hand area, and the circle centered on the right wrist key point with radius R denotes the right-hand area.
  • in one example, the preset area threshold is 150 units of area.
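  • A minimal sketch of this intersection test is shown below, assuming the preset item rack mask is a binary NumPy array in the same pixel coordinate system as the frame; the radius value is an illustrative placeholder and 150 is the area threshold mentioned above:

```python
import numpy as np

def hand_interacts_with_rack(wrist_xy, rack_mask, radius=30, area_threshold=150):
    # Circular hand area centered on the wrist key point.
    h, w = rack_mask.shape
    ys, xs = np.ogrid[:h, :w]
    cx, cy = wrist_xy
    hand_disk = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    # Intersecting area between the hand area and the rack mask, in pixels.
    intersection = np.logical_and(hand_disk, rack_mask > 0).sum()
    return intersection > area_threshold

# The pedestrian is judged to interact if either hand triggers the test:
# interacting = (hand_interacts_with_rack(left_wrist, mask)
#                or hand_interacts_with_rack(right_wrist, mask))
```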
  • this interactive behavior recognition method judges the interactive behavior by directly estimating the intersection area between the hand and the item rack, which is simple to implement, highly scalable, fast, and good in real-time performance; the method is typically used to recognize human-goods interaction in shopping malls and supermarkets, where the item racks are store shelves, but it can also be used to recognize human-object interaction in other places such as libraries, where the item rack is a library bookshelf.
  • the method further includes:
  • preferably, the center point of the detection frame is selected as the positioning point, since it is convenient to obtain and indicates the position of the pedestrian more accurately.
  • the preset coordinate mapping relationship is the mapping between the coordinate system of the image to be detected and the world coordinate system; specifically, the position of the image acquisition device in the world coordinate system is calibrated in advance from its position information, so the coordinate position in the world coordinate system of any point in an image collected by that device can be obtained, from which the coordinate mapping relationship between the image coordinate system and the world coordinate system is derived.
  • the preset time period is the time from when pedestrians enter the target place to when they leave the target place.
  • the route map of the pedestrian within the preset time period is the route that the pedestrian passes from entering the target place to exiting the target place, that is, the pedestrian's moving line diagram.
  • this interactive behavior recognition method obtains the pedestrian's route map within a preset time period from the pedestrian's detection frame and the preset coordinate mapping relationship, which makes it convenient to record the pedestrian's movement trajectory in the target place; when the method is applied to a shopping mall or supermarket, the customer's movement route from entering to leaving can be observed intuitively, and staff can adjust the store layout based on these data to better match customers' shopping habits.
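  • As an illustration of the coordinate mapping, the sketch below assumes the pre-calibrated relationship between the image plane and the floor plane of the world coordinate system can be written as a 3×3 homography H per camera (the patent only requires some pre-calibrated mapping, so the homography form is an assumption):

```python
import numpy as np

def image_to_world(point_xy, H):
    # Map a pixel coordinate to the world (floor) plane via homography H.
    p = np.array([point_xy[0], point_xy[1], 1.0])
    q = H @ p
    return q[:2] / q[2]

def route_map(boxes_over_time, H):
    # boxes_over_time: one (x1, y1, x2, y2) detection frame per time point
    # for a single pedestrian; the frame center is the positioning point.
    route = []
    for x1, y1, x2, y2 in boxes_over_time:
        center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
        route.append(image_to_world(center, H))
    return np.array(route)   # second position coordinates over the period
```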
  • the method further includes:
  • shoulder key points are selected from the pedestrian's key points; the shoulder key points include the left shoulder key point and the right shoulder key point, and the shoulder vector is obtained by taking the difference between their coordinates.
  • the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; the radian value of the angle is summed with π to obtain the pedestrian's orientation angle.
  • when the orientation angle is greater than or equal to π and less than 1.5π, it is determined that the pedestrian is facing one side of the image to be detected; when the orientation angle is greater than 1.5π and less than or equal to 2π, it is determined that the pedestrian is facing the other side of the image to be detected.
  • specifically, according to the orientation information of the pedestrian in the image to be detected and the preset item rack image corresponding to the image to be detected, the item rack area toward which the pedestrian is oriented is obtained.
  • this interactive behavior recognition method uses the shoulder key point data to calculate the pedestrian's orientation, which yields a more robust orientation result, and thereby determines the shelf area the customer is paying attention to, providing a reference for product placement in shopping malls and supermarkets.
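  • The orientation computation can be sketched as follows (Python with NumPy; the returned side labels are placeholders for the "one side" and "other side" of the image named above):

```python
import numpy as np

def pedestrian_orientation(left_shoulder, right_shoulder):
    # Shoulder vector: difference of the left and right shoulder key point coordinates.
    v = np.asarray(left_shoulder, float) - np.asarray(right_shoulder, float)
    u = np.array([0.0, -1.0])                        # negative y-axis of the image
    cos_angle = np.dot(v, u) / np.linalg.norm(v)     # |u| = 1
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0)) # radians in [0, pi]
    orientation = angle + np.pi                      # orientation angle in [pi, 2*pi]
    if np.pi <= orientation < 1.5 * np.pi:
        return orientation, "one side"
    return orientation, "other side"
```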
  • acquiring the image to be detected includes:
  • the above-mentioned image acquisition equipment is generally a network camera.
  • an interactive behavior recognition method is provided.
  • the method directly utilizes existing monitoring equipment at the target place, such as cameras in a shopping mall or supermarket, without the need to modify the venue; deployment cost is low and the method is easy to roll out.
  • the method further includes:
  • specifically, to obtain sample images, surveillance videos of shopping malls and supermarkets are acquired, and a large number of images containing pedestrians are screened out of the surveillance videos as sample images.
  • the neural network model adopts the ResNet-101+FPN network model, a one-stage, bottom-up multi-task network model; compared with similar two-stage algorithms it saves processing time, and compared with top-down algorithms its processing time does not change with the number of people in the picture.
  • this interactive behavior recognition method processes the images to be detected by establishing and training a multi-task model; training and optimization of the model are completed in the background without affecting the operation of the model in places such as shopping malls, supermarkets or libraries, and the model has strong generalization ability and can be deployed easily and quickly; features are shared between the different tasks of the multi-task model, which reduces the amount of computation and hardware resource occupation, shortens single-frame processing time, and realizes parallel processing of multiple cameras.
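  • A hedged sketch of such a one-stage multi-task network is shown below (PyTorch/torchvision). The patent specifies only the ResNet-101+FPN backbone and the two feature-sharing tasks; the head designs, channel counts, and output encodings here are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.ops import FeaturePyramidNetwork

class MultiTaskNet(nn.Module):
    def __init__(self, num_keypoints: int = 17):
        super().__init__()
        backbone = resnet101(weights=None)
        # Tap the C2..C5 feature maps from the four ResNet stages.
        self.body = create_feature_extractor(
            backbone,
            {"layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"})
        self.fpn = FeaturePyramidNetwork([256, 512, 1024, 2048], 256)
        # Detection head: per-location box offsets (4) plus a confidence score (1).
        self.det_head = nn.Conv2d(256, 5, kernel_size=3, padding=1)
        # Key point head: one heatmap per key point (bottom-up, single stage).
        self.kpt_head = nn.Conv2d(256, num_keypoints, kernel_size=3, padding=1)

    def forward(self, x):
        feats = self.fpn(self.body(x))
        p2 = feats["c2"]   # highest-resolution pyramid level
        return self.det_head(p2), self.kpt_head(p2)

model = MultiTaskNet()
boxes_and_scores, keypoint_heatmaps = model(torch.randn(1, 3, 512, 512))
```

  • Because both heads read the same FPN features, the backbone computation is shared between the detection and key point tasks, which is the feature-sharing property the embodiment relies on for multi-camera parallel processing.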
  • the method includes the following steps:
  • Step 402 Obtain surveillance video of the target location
  • Step 404 Filter out an image with pedestrians from the surveillance video as an image to be detected
  • Step 406 Input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected, and the key points are all located inside the detection frame;
  • Step 408 Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected;
  • Step 410 Obtain a route map of the pedestrian in a preset time period according to the mapping relationship between the detection frame of the pedestrian and the preset coordinate;
  • Step 412 Obtain the direction information of the pedestrian according to the key points of the pedestrian.
  • an interactive behavior recognition device which includes an acquisition module 502, a detection module 504, and an identification module 506, wherein:
  • the obtaining module 502 is used to obtain the image to be detected
  • the detection module 504 is used to input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected.
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key points Detection
  • the recognition module 506 is configured to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • the preset item rack image is a preset item rack mask image
  • the aforementioned recognition module 506 includes:
  • the first key point selection unit is used to select the key point of the wrist among the key points of the pedestrian;
  • the hand area unit is used to obtain the pedestrian's hand area according to the key points of the wrist and the preset radius threshold;
  • the interaction determination unit is used to determine that the pedestrian interacts with the corresponding item rack when the intersecting area between the hand area image and the preset item rack mask image is greater than the preset area threshold, and to determine that there is no interaction between the pedestrian and the corresponding item rack when the intersecting area between the hand area image and the preset item rack mask image is less than or equal to the area threshold.
  • the device further includes:
  • the first position coordinate module is used to select any point in the detection frame of the pedestrian as the positioning point, and set the position coordinates of the positioning point in the image to be detected as the first position coordinates of the pedestrian;
  • the second position coordinate module is used to map the first position coordinates of the pedestrian into the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system;
  • the route map module is used to collect the second position coordinates of the pedestrian at each time point in the preset time period to obtain the pedestrian's route map in the preset time period.
  • the device further includes:
  • the orientation information module is used to obtain the orientation information of the pedestrian according to the key points of the pedestrian;
  • the orientation area module is used to obtain the item rack area where the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
  • the above-mentioned orientation information module includes:
  • the second key point selection unit is used to select the shoulder key points among the key points of pedestrians.
  • the shoulder key points include the left shoulder key point and the right shoulder key point;
  • the direction angle calculation unit is used to calculate the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector;
  • the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; the radian value of the angle is summed with π to obtain the pedestrian's orientation angle;
  • the orientation determination unit is used to determine that the pedestrian is facing one side of the image to be detected when the orientation angle is greater than or equal to π and less than 1.5π, and to determine that the pedestrian is facing the other side of the image to be detected when the orientation angle is greater than 1.5π and less than or equal to 2π.
  • the above-mentioned obtaining module 502 includes:
  • the video acquisition unit is used to acquire the surveillance video of the target location
  • the image acquisition unit is used to screen out the image with pedestrians from the surveillance video as the image to be detected.
  • the device further includes:
  • the sample acquisition module is used to acquire sample images
  • the sample data module is used to perform key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data
  • the model training module is used to input the labeled image data into the neural network model for training to obtain a multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • Each module in the above-mentioned interactive behavior recognition device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize an interactive behavior identification method.
  • FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory, a processor, and a computer program stored in the memory and running on the processor.
  • when the processor executes the computer program, the following steps are implemented: acquiring an image to be detected; inputting the image to be detected into the preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection; and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • the processor further implements the following steps when executing the computer program: the preset item rack image is the preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key point among the key points of the pedestrian; obtaining the hand area of the pedestrian according to the wrist key point and the preset radius threshold; when the intersecting area between the hand area image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersecting area between the hand area image and the preset item rack mask image is less than or equal to the area threshold, determining that there is no interaction between the pedestrian and the corresponding item rack.
  • the processor further implements the following steps when executing the computer program: selecting any point in the detection frame of the pedestrian as a positioning point, and setting the position coordinates of the positioning point in the image to be detected as the first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian into the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system; and collecting the second position coordinates of the pedestrian at each time point within the preset time period to obtain the route map of the pedestrian within the preset time period.
  • the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the key points of the pedestrian; obtaining the item rack area where the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
  • the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the key points of the pedestrian includes: selecting the shoulder key points among the key points of the pedestrian, the shoulder key points including the left shoulder key point and the right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector; calculating the angle between the shoulder vector and the preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the pedestrian's orientation angle; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
  • the processor further implements the following steps when executing the computer program: acquiring the image to be detected includes: acquiring a surveillance video of the target location; and filtering out images with pedestrians from the surveillance video as the image to be detected.
  • the processor further implements the following steps when executing the computer program: acquiring sample images; performing key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data; and inputting the annotated image data into the neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: acquiring the image to be detected; inputting the image to be detected into a preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection; and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • when the computer program is executed by the processor, the following steps are further implemented: the preset item rack image is the preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key point among the key points of the pedestrian; obtaining the hand area of the pedestrian according to the wrist key point and the preset radius threshold; when the intersecting area between the hand area image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersecting area between the hand area image and the preset item rack mask image is less than or equal to the area threshold, determining that there is no interaction between the pedestrian and the corresponding item rack.
  • when the computer program is executed by the processor, the following steps are further implemented: selecting any point in the detection frame of the pedestrian as a positioning point, and setting the position coordinates of the positioning point in the image to be detected as the first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian into the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system; and collecting the second position coordinates of the pedestrian at each time point within the preset time period to obtain the route map of the pedestrian within the preset time period.
  • when the computer program is executed by the processor, the following steps are further implemented: obtaining the pedestrian's orientation information according to the key points of the pedestrian; and obtaining the item rack area toward which the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
  • when the computer program is executed by the processor, the following steps are further implemented: obtaining the orientation information of the pedestrian according to the key points of the pedestrian includes: selecting the shoulder key points among the key points of the pedestrian, the shoulder key points including the left shoulder key point and the right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector; calculating the angle between the shoulder vector and the preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the orientation angle of the pedestrian; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
  • acquiring the image to be detected includes: acquiring a surveillance video of the target location; and screening an image with pedestrians from the surveillance video as the image to be detected.
  • when the computer program is executed by the processor, the following steps are further implemented: acquiring sample images; performing key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data; and inputting the annotated image data into the neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

The present application relates to an interactive behavior identification method and apparatus, a computer device and a storage medium. The method comprises: acquiring an image to be detected; inputting the image into a preset multi-task model to obtain key points and a detection frame of a passerby in the image, wherein the key points are located inside the detection frame, and the multi-task model is used for passerby detection and human body key point detection; and according to the key points of the passerby and a preset item rack image corresponding to the image, determining interactive behavior information of the passerby and a corresponding item rack. By using the present method, interactive behavior between passersby and items may be efficiently identified.

Description

Interactive behavior recognition method, device, computer equipment and storage medium
Technical field
This application relates to the field of computer vision technology, and in particular to an interactive behavior recognition method, device, computer equipment, and storage medium.
Background art
With the advent of the Internet era, the retail industry has entered a stage of rapid development. The future of retail is smart retail, that is, using technologies such as the Internet and big data to perceive users' consumption habits so as to provide consumers with diversified, personalized products and services; recognizing human-goods interaction behavior is a problem that needs to be solved in the smart retail field.
Traditional human-goods interaction behavior recognition methods generally rely on sound, light, electricity and other sensor devices to realize behavior recognition, which requires high hardware costs; their use scenarios are limited, and they cannot be applied at scale in complex environments such as supermarkets. Supermarket monitoring equipment generates a large amount of video data every day, and analyzing the surveillance video can yield much information about human-goods interaction, but this consumes enormous manpower and suffers from low efficiency.
Summary of the invention
Based on this, it is necessary to address the above technical problems and provide an interactive behavior recognition method, device, computer equipment, and storage medium that can efficiently recognize interaction between the human body and articles.
An interactive behavior recognition method, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
determining interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
In one of the embodiments, the preset item rack image is a preset item rack mask image, and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes:
selecting a wrist key point among the key points of the pedestrian;
obtaining the hand area of the pedestrian according to the wrist key point and a preset radius threshold;
when the intersecting area between the image of the hand area and the preset item rack mask image is greater than a preset area threshold, determining that the pedestrian interacts with the corresponding item rack;
when the intersecting area between the image of the hand area and the preset item rack mask image is less than or equal to the area threshold, determining that there is no interaction between the pedestrian and the corresponding item rack.
In one of the embodiments, the method further includes:
selecting any point in the detection frame of the pedestrian as a positioning point, and setting the position coordinates of the positioning point in the image to be detected as first position coordinates of the pedestrian;
mapping the first position coordinates of the pedestrian into a world coordinate system according to a preset coordinate mapping relationship to obtain second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system;
collecting the second position coordinates of the pedestrian at each time point within a preset time period to obtain a route map of the pedestrian within the preset time period.
In one of the embodiments, the method further includes:
obtaining orientation information of the pedestrian according to the key points of the pedestrian;
obtaining the item rack area toward which the pedestrian is oriented according to the orientation information of the pedestrian and the preset item rack image.
In one of the embodiments, obtaining the orientation information of the pedestrian according to the key points of the pedestrian includes:
selecting shoulder key points among the key points of the pedestrian, the shoulder key points including a left shoulder key point and a right shoulder key point;
calculating the difference between the coordinates of the left shoulder key point and the coordinates of the right shoulder key point to obtain a shoulder vector;
calculating the angle between the shoulder vector and a preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected;
summing the radian value of the angle with π to obtain the orientation angle of the pedestrian;
when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected;
when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
In one of the embodiments, acquiring the image to be detected includes:
acquiring a surveillance video of a target place;
screening out an image containing a pedestrian from the surveillance video as the image to be detected.
In one of the embodiments, the method further includes:
acquiring sample images;
performing key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data;
inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
A human-goods interaction behavior recognition device, the device comprising:
an acquisition module, used to acquire an image to be detected;
a detection module, used to input the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
a recognition module, used to determine interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
acquiring an image to be detected;
inputting the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
determining interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
A computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the following steps:
acquiring an image to be detected;
inputting the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
determining interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
With the above interactive behavior recognition method, device, computer equipment and storage medium, the image to be detected is acquired and input into a preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected. Through a multi-task model used for both pedestrian detection and human key point detection, the pedestrian detection frame and the human body key points are acquired synchronously, which improves image processing efficiency. The key points are all located inside the detection frame, so erroneous key points outside the detection frame can be eliminated, achieving the purpose of using the detection frame and key points jointly to improve key point labeling accuracy. According to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information between the pedestrian and the corresponding item rack is determined, which can efficiently identify the interaction behavior and improve recognition accuracy.
Description of the drawings
Figure 1 is an application environment diagram of an interactive behavior recognition method in an embodiment;
Figure 2 is a schematic flowchart of an interactive behavior recognition method in an embodiment;
Figure 3 is a schematic flowchart of an interactive behavior judgment step in an embodiment;
Figure 4 is a schematic flowchart of an interactive behavior recognition method in another embodiment;
Figure 5 is a structural block diagram of an interactive behavior recognition device in an embodiment;
Figure 6 is an internal structure diagram of a computer device in an embodiment.
Detailed description
In order to make the purpose, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The interactive behavior recognition method provided in this application can be applied to the application environment shown in Figure 1, in which the terminal 102 communicates with the server 104 through a network. The terminal 102 can be, but is not limited to, various image acquisition devices; specifically, the terminal 102 can be existing monitoring equipment in places such as shopping malls, supermarkets or libraries, and the server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Figure 2, an interactive behavior recognition method is provided. Taking the application of the method to the server in Figure 1 as an example, it includes the following steps:
Step 202: Acquire an image to be detected.
The image to be detected is an image containing a pedestrian collected by an image acquisition device. The image acquisition device may be monitoring equipment already installed and in use at a target place such as a shopping mall, supermarket or library, for example an existing camera at the target place, so there is no need to modify the target place and the deployment cost is low.
Specifically, a surveillance video is acquired through the camera, and pictures containing pedestrians are screened out of the surveillance video as images to be detected.
Step 204: Input the image to be detected into a preset multi-task model to obtain key points and a detection frame of the pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection.
The multi-task model obtains the detection frame of the pedestrian in the image to be detected through pedestrian detection and, at the same time, obtains the key points of the pedestrian through human key point detection, thereby acquiring the detection frame and key points synchronously; features are shared between the different tasks, which reduces the amount of computation and hardware resource occupation and shortens single-frame processing time, so images to be detected acquired from multiple cameras can be processed at the same time, realizing parallel processing of multiple camera streams.
Specifically, the acquired image to be detected is input into the preset multi-task model, which performs pedestrian detection and human key point detection on it. In processing the image to be detected, the multi-task model can exclude key points located outside the detection frame, so that all output key points are located inside the detection frame; finally, the multi-task model outputs the key points and detection frame of the pedestrian in the image to be detected.
For example, an image to be detected I of size H×W×3 is input into the multi-task model, and the model outputs the key points P = {p_1, p_2, ..., p_N} and the detection frames B = {b_1, b_2, ..., b_N}, where p_i = {(x_ij, y_ij), j = 1, ..., K} and b_i = (x_i1, y_i1, x_i2, y_i2, score). Here N is the number of pedestrians in the image to be detected and K is the number of key points per pedestrian, usually K = 17; (x_ij, y_ij) are the coordinates of the j-th key point of the i-th person on the image to be detected; (x_i1, y_i1) and (x_i2, y_i2) are the coordinates of the upper-left and lower-right corners of the detection frame of the i-th person on the image to be detected; and score is the confidence of the detection frame, that is, the degree to which it can be trusted.
Step 206: Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
The existing cameras, the layout of the target place, and the item racks are positioned and labeled in advance, and each camera is configured with a corresponding preset item rack image. Since a given image to be detected is acquired through one of the cameras, all images to be detected acquired by the same camera correspond to that camera, and therefore also correspond to the preset item rack image configured for that camera.
Specifically, one of the pedestrian's key points may be selected as a reference key point, and the interaction behavior between the pedestrian and the corresponding item rack is then judged from the relationship between the reference key point and the preset item rack image, for example their distance or intersecting area.
上述交互行为识别方法中,获取待检测图像,通过将待检测图像输入预设的多任务模型,得到待检测图像中行人的关键点和检测框,该方法通过用于进行行人检测和人体关键点检测的多任务模型,可以同步获取行人检测框和人体关键点,提高了图像处理效率;关键点均位于检测框内部, 可以排除检测框外部的错误关键点,从而达到综合利用检测框和关键点,提高关键点标注准确度的目的;根据行人的关键点和待检测图像对应的预设物品架图像,确定行人和对应物品架的交互行为信息,能够高效地识别交互行为,并提高识别准确率;而且本方法可以实现全流程自动化处理,不需要人工干预,极大地降低人工成本。In the above interactive behavior recognition method, the image to be detected is acquired, and the key points and detection frames of pedestrians in the image to be detected are obtained by inputting the image to be detected into a preset multi-task model. This method is used for pedestrian detection and key points of the human body. The multi-task detection model can obtain the pedestrian detection frame and the key points of the human body simultaneously, which improves the efficiency of image processing; the key points are located inside the detection frame, which can eliminate the wrong key points outside the detection frame, so as to achieve comprehensive utilization of the detection frame and key points , The purpose of improving the accuracy of the key point labeling; according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information of the pedestrian and the corresponding item rack can be determined, which can efficiently identify the interaction behavior and improve the recognition accuracy rate ; And this method can realize the full-process automatic processing without manual intervention, which greatly reduces labor costs.
In one embodiment, as shown in FIG. 3, the preset item rack image is a preset item rack mask image. The preset item rack mask image may be obtained by extracting one frame from a large amount of surveillance video and annotating the outer contour of the item rack in that frame with a polygon. Determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected includes:
Step 302: Select the wrist key points among the pedestrian's key points.
The wrist key point data includes left-wrist key point data and right-wrist key point data.
Step 304: Obtain the pedestrian's hand regions according to the wrist key points and a preset radius threshold.
Specifically, a left-hand region and a right-hand region are delimited by taking the left-wrist key point and the right-wrist key point respectively as circle centers and the preset radius threshold as the radius, thereby obtaining the image of the left-hand region and the image of the right-hand region.
Step 306: Determine whether the intersection area of the hand-region image and the preset item rack mask image is greater than a preset area threshold.
Step 308: If so, determine that the pedestrian interacts with the corresponding item rack.
Step 310: If not, determine that no interaction occurs between the pedestrian and the corresponding item rack.
In step 306 above, the hand regions include the left-hand region and the right-hand region. Specifically, when the intersection area of the image of at least one of the left-hand and right-hand regions with the preset item rack mask image is greater than the preset area threshold, it is determined that the pedestrian interacts with the corresponding item rack; otherwise, it is determined that no interaction occurs between the pedestrian and the corresponding item rack.
For example, let

$$H_R^{l} = \left\{ (x, y) \;\middle|\; (x - x_{lw})^2 + (y - y_{lw})^2 \le R^2 \right\}$$

denote the hand region with the left wrist $(x_{lw}, y_{lw})$ as center and R as radius, i.e., the left-hand region, and let

$$H_R^{r} = \left\{ (x, y) \;\middle|\; (x - x_{rw})^2 + (y - y_{rw})^2 \le R^2 \right\}$$

denote the hand region with the right wrist $(x_{rw}, y_{rw})$ as center and R as radius, i.e., the right-hand region. With a preset area threshold of 150 unit areas and $M_S$ denoting the preset item rack mask, when the intersection area $H_R \cap M_S > 150$, it is determined that the pedestrian interacts with the corresponding item rack, i.e., the pedestrian is shopping; when $H_R \cap M_S \le 150$, it is determined that no interaction occurs between the pedestrian and the corresponding item rack, i.e., the pedestrian is not shopping.
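A minimal sketch of this test follows, assuming NumPy and a binary rack mask; the function and parameter names are illustrative, and the radius value is a placeholder rather than a value taken from the patent:

```python
import numpy as np

def hand_rack_interaction(wrist_xy, rack_mask, radius=40, area_threshold=150):
    """Rasterize the hand region H_R as a disc around one wrist key point
    and count its overlap with the preset item rack mask M_S."""
    h, w = rack_mask.shape
    ys, xs = np.ogrid[:h, :w]
    # Hand region H_R: disc of radius R centered on the wrist key point
    hand_region = (xs - wrist_xy[0]) ** 2 + (ys - wrist_xy[1]) ** 2 <= radius ** 2
    # |H_R ∩ M_S|: number of pixels shared by the hand region and the rack mask
    intersection_area = np.count_nonzero(hand_region & (rack_mask > 0))
    return intersection_area > area_threshold

def pedestrian_interacts(left_wrist, right_wrist, rack_mask):
    """A pedestrian is judged to interact with the rack if either hand qualifies."""
    return (hand_rack_interaction(left_wrist, rack_mask)
            or hand_rack_interaction(right_wrist, rack_mask))
```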
In this embodiment, an interactive behavior recognition method is provided that judges interaction behavior by directly estimating the intersection area of the hand region and the item rack. It is simple to implement, highly scalable, fast to compute, and has good real-time performance. The method is typically used to recognize person-goods interaction in shopping malls and supermarkets, where the item racks are store shelves, but it can also be used to recognize person-object interaction in other places, such as libraries, where the item racks are library bookshelves.
In one embodiment, the method further includes:
Selecting any point in the pedestrian's detection box as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates.
Specifically, the center point of the detection box is selected as the anchor point; it is easy to select, and the center point indicates the pedestrian's position more accurately.
Mapping the pedestrian's first position coordinates into the world coordinate system according to a preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system.
Here, the preset coordinate mapping relationship is the coordinate mapping relationship between the coordinate system of the image to be detected and the world coordinate system. Specifically, the position of the image acquisition device in the world coordinate system is calibrated in advance; from the position information of the image acquisition device, the coordinate position in the world coordinate system of the images it acquires can be obtained, from which the coordinate mapping relationship between the coordinate system of the image to be detected and the world coordinate system is inferred.
Collecting the pedestrian's second position coordinates at each time point within a preset time period to obtain the pedestrian's route map within the preset time period.
The preset time period is the time from when the pedestrian enters the target site to when the pedestrian leaves it, and the pedestrian's route map within the preset time period is the route the pedestrian travels from entry to exit, i.e., the pedestrian's movement trace. Combined with the layout drawing of the target site, the movement traces of pedestrians entering the target site can be drawn on the layout drawing.
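The patent does not fix a particular mapping technique; one common realization, sketched below under that assumption, is a planar floor homography estimated from a few calibrated reference points (all coordinates here are hypothetical):

```python
import numpy as np
import cv2

# Hypothetical calibration: four reference points marked on the floor in the
# camera image and their known world-coordinate positions (e.g., in meters).
image_pts = np.float32([[100, 600], [1180, 620], [1050, 200], [230, 190]])
world_pts = np.float32([[0, 0], [8, 0], [8, 12], [0, 12]])
H, _ = cv2.findHomography(image_pts, world_pts)

def to_world(first_position_xy):
    """Map a pedestrian's first position coordinates (image) to second
    position coordinates (world) via the preset mapping relationship."""
    p = np.float32([[first_position_xy]])           # shape (1, 1, 2)
    return cv2.perspectiveTransform(p, H)[0, 0]     # (x_w, y_w)

# Route map: second position coordinates collected at each time point,
# here from example detection-box centers of one pedestrian (pixels).
tracked_centers = [(640, 520), (655, 505), (672, 488)]
route = [to_world(c) for c in tracked_centers]
```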
In this embodiment, an interactive behavior recognition method is provided that obtains the pedestrian's route map within a preset time period from the pedestrian's detection box and the preset coordinate mapping relationship, making it easy to record the pedestrian's movement trajectory within the target site during that period. When the method is applied to a shopping mall or supermarket, the customer's movement route within the store from entry to exit can be observed intuitively, and staff can adjust the store layout based on these data to better suit customers' shopping habits.
In one embodiment, the method further includes:
Obtaining the pedestrian's orientation information according to the pedestrian's key points.
Specifically, the shoulder key points are selected among the pedestrian's key points.
For example, the shoulder key points include the left-shoulder key point $p_{ls} = (x_{ls}, y_{ls})$ and the right-shoulder key point $p_{rs} = (x_{rs}, y_{rs})$, both expressed in the coordinate system of the image to be detected.
Taking the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point yields the shoulder vector:

$$\vec{s} = (x_{ls} - x_{rs},\ y_{ls} - y_{rs})$$

The angle between the shoulder vector and a preset unit vector is computed with the arccosine function, the preset unit vector $\vec{u} = (0, -1)$ being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected. Summing the radian value of this angle with π yields the pedestrian's orientation angle:

$$\theta = \arccos\!\left( \frac{\vec{s} \cdot \vec{u}}{\lVert \vec{s} \rVert} \right) + \pi$$

When the orientation angle is greater than or equal to π and less than 1.5π, it is determined that the pedestrian is facing one side of the image to be detected; when the orientation angle is greater than 1.5π and less than or equal to 2π, it is determined that the pedestrian is facing the other side of the image to be detected.
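A minimal sketch of this orientation computation, with illustrative function names, follows:

```python
import math

def orientation_angle(left_shoulder, right_shoulder):
    """Orientation angle from the two shoulder key points, as described above."""
    sx = left_shoulder[0] - right_shoulder[0]   # shoulder vector s
    sy = left_shoulder[1] - right_shoulder[1]
    norm = math.hypot(sx, sy)
    # Angle between s and the unit vector u = (0, -1), the negative image y-axis
    cos_angle = (sx * 0 + sy * -1) / norm
    return math.acos(cos_angle) + math.pi       # orientation angle in [π, 2π]

def facing_side(theta):
    if math.pi <= theta < 1.5 * math.pi:
        return "one side of the image"
    if 1.5 * math.pi < theta <= 2 * math.pi:
        return "other side of the image"
    return "boundary case"                      # exactly 1.5π is left undecided
```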
Obtaining the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image. Specifically, from the pedestrian's orientation within the image to be detected and the preset item rack image corresponding to that image, the item rack region the pedestrian is facing can be obtained.
In this embodiment, an interactive behavior recognition method is provided that computes the pedestrian's orientation from the shoulder key point data, which makes the orientation result more robust. The shelf region the customer is paying attention to can thereby be determined, providing a reference for product placement in stores and supermarkets.
In one embodiment, acquiring the image to be detected includes:
Acquiring surveillance video of the target site.
Specifically, position calibration is performed on the image acquisition devices already installed and in use in the store, a corresponding shelf mask image is configured for each device, and the surveillance video captured by the devices is obtained. The image acquisition devices are generally cameras.
Filtering out images containing pedestrians from the surveillance video as the images to be detected.
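As one possible, non-authoritative realization of this filtering step, the sketch below samples the video and keeps frames in which a person detector fires; OpenCV's stock HOG person detector stands in for whatever detector a deployment would actually use:

```python
import cv2

def frames_with_pedestrians(video_path, stride=10):
    """Yield sampled surveillance frames that contain at least one pedestrian."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            boxes, _ = hog.detectMultiScale(frame)
            if len(boxes) > 0:
                yield frame          # one image to be detected
        idx += 1
    cap.release()
```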
In this embodiment, an interactive behavior recognition method is provided that directly uses the monitoring equipment already present at the target site, such as the cameras of a shopping mall or supermarket. No modification of the site is required, deployment costs are low, and the method is easy to roll out.
In one embodiment, the method further includes:
Acquiring sample images. Specifically, surveillance video of shopping malls and supermarkets is obtained, and a large number of images containing pedestrians are filtered out of the video as sample images.
Annotating the pedestrians in the sample images with key points and detection boxes to obtain annotated image data. Specifically, the pedestrian detection boxes in the sample images are annotated, together with key point positions such as the pedestrians' eyes, noses, ears, shoulders, elbows, wrists, hips, knees, and ankles, finally yielding the annotated image data.
Inputting the annotated image data into a neural network model for training to obtain the multi-task model. Preferably, the neural network model adopts a ResNet-101+FPN network model, which is a one-stage bottom-up multi-task network model: compared with similar multi-stage algorithms it saves processing time, and compared with top-down algorithms its processing time does not vary with the number of people in the image.
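The patent names only the ResNet-101+FPN backbone and the one-stage bottom-up design; the head layout in the sketch below (center, box, and key-point heatmaps) is our assumption, and torchvision 0.13 or later is assumed for the backbone helper:

```python
import torch
import torch.nn as nn
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

class MultiTaskNet(nn.Module):
    """Minimal sketch of a one-stage bottom-up multi-task model on a
    ResNet-101+FPN backbone; the head design is illustrative only."""
    def __init__(self, num_keypoints=17):
        super().__init__()
        self.backbone = resnet_fpn_backbone(backbone_name="resnet101", weights=None)
        c = self.backbone.out_channels          # 256 FPN channels
        self.center_head = nn.Conv2d(c, 1, 1)   # person-center heatmap
        self.box_head = nn.Conv2d(c, 4, 1)      # box offsets per center
        self.kpt_head = nn.Conv2d(c, num_keypoints, 1)  # key-point heatmaps

    def forward(self, x):
        # Shared backbone features feed all heads, which is what cuts computation
        p = self.backbone(x)["0"]               # highest-resolution FPN level
        return self.center_head(p), self.box_head(p), self.kpt_head(p)

out = MultiTaskNet()(torch.randn(1, 3, 512, 512))
```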
In this embodiment, an interactive behavior recognition method is provided that processes the images to be detected by building and training a multi-task model. Training and optimization of the model are done in the background without affecting the operation of shopping malls, supermarkets, libraries, or other places. The model generalizes well and can be deployed conveniently and quickly. Features can be shared among the different tasks of the multi-task model, which reduces the amount of computation, lowers hardware resource usage, shortens the per-frame processing time, and enables parallel processing of multiple camera streams.
In one embodiment, as shown in FIG. 4, the method includes the following steps:
Step 402: Acquire surveillance video of the target site.
Step 404: Filter out images containing pedestrians from the surveillance video as images to be detected.
Step 406: Input the image to be detected into the preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, with all key points located inside the detection boxes.
Step 408: Determine the interaction behavior information between each pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
Step 410: Obtain the pedestrian's route map within the preset time period according to the pedestrian's detection box and the preset coordinate mapping relationship.
Step 412: Obtain the pedestrian's orientation information according to the pedestrian's key points.
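For illustration, a hedged end-to-end sketch of steps 402 to 412 follows; it reuses the illustrative helpers from the earlier sketches (frames_with_pedestrians, pedestrian_interacts, to_world, orientation_angle) and assumes a model callable that returns the P_kpt and P_box arrays described above, with COCO key-point indexing:

```python
def recognize_interactions(video_path, rack_mask, model):
    """Hedged pipeline sketch; names and key-point indices are assumptions
    (COCO order: 5/6 = left/right shoulder, 9/10 = left/right wrist)."""
    records = []
    for frame in frames_with_pedestrians(video_path):        # steps 402-404
        P_kpt, P_box = model(frame)                          # step 406
        for kpts, box in zip(P_kpt, P_box):
            shopping = pedestrian_interacts(kpts[9], kpts[10], rack_mask)  # step 408
            center = ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
            world_xy = to_world(center)                      # step 410
            theta = orientation_angle(kpts[5], kpts[6])      # step 412
            records.append({"pos": world_xy, "shopping": shopping, "theta": theta})
    return records
```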
It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, an interactive behavior recognition apparatus is provided, including an acquisition module 502, a detection module 504, and a recognition module 506, wherein:
the acquisition module 502 is configured to acquire the image to be detected;
the detection module 504 is configured to input the image to be detected into the preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, the key points all being located inside the detection boxes, and the multi-task model being used for pedestrian detection and human key point detection;
the recognition module 506 is configured to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
In one embodiment, the preset item rack image is a preset item rack mask image, and the recognition module 506 includes:
a first key point selection unit, configured to select the wrist key points among the pedestrian's key points;
a hand region unit, configured to obtain the pedestrian's hand regions according to the wrist key points and the preset radius threshold;
an interaction determination unit, configured to determine that the pedestrian interacts with the corresponding item rack when the intersection area of the hand-region image and the preset item rack mask image is greater than the preset area threshold, and to determine that no interaction occurs between the pedestrian and the corresponding item rack when the intersection area is less than or equal to the area threshold.
In one embodiment, the apparatus further includes:
a first position coordinate module, configured to select any point in the pedestrian's detection box as an anchor point and to set the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates;
a second position coordinate module, configured to map the pedestrian's first position coordinates into the world coordinate system according to the preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system;
a route map module, configured to collect the pedestrian's second position coordinates at each time point within the preset time period to obtain the pedestrian's route map within the preset time period.
In one embodiment, the apparatus further includes:
an orientation information module, configured to obtain the pedestrian's orientation information according to the pedestrian's key points;
an orientation region module, configured to obtain the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image.
In one embodiment, the orientation information module includes:
a second key point selection unit, configured to select the shoulder key points among the pedestrian's key points, the shoulder key points including the left-shoulder key point and the right-shoulder key point;
an orientation angle calculation unit, configured to take the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point to obtain the shoulder vector, to compute the angle between the shoulder vector and the preset unit vector with the arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected, and to sum the radian value of the angle with π to obtain the pedestrian's orientation angle;
an orientation determination unit, configured to determine that the pedestrian is facing one side of the image to be detected when the orientation angle is greater than or equal to π and less than 1.5π, and that the pedestrian is facing the other side of the image to be detected when the orientation angle is greater than 1.5π and less than or equal to 2π.
In one embodiment, the acquisition module 502 includes:
a video acquisition unit, configured to acquire surveillance video of the target site;
an image acquisition unit, configured to filter out images containing pedestrians from the surveillance video as the images to be detected.
In one embodiment, the apparatus further includes:
a sample acquisition module, configured to acquire sample images;
a sample data module, configured to annotate the pedestrians in the sample images with key points and detection boxes to obtain annotated image data;
a model training module, configured to input the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
For specific limitations on the interactive behavior recognition apparatus, refer to the limitations on the interactive behavior recognition method above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements an interactive behavior recognition method.
Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor. When executing the computer program, the processor implements the following steps: acquiring an image to be detected; inputting the image to be detected into a preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, the key points all being located inside the detection boxes, and the multi-task model being used for pedestrian detection and human key point detection; and determining the interaction behavior information between a pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
In one embodiment, the processor further implements the following steps when executing the computer program: the preset item rack image is a preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key points among the pedestrian's key points; obtaining the pedestrian's hand regions according to the wrist key points and the preset radius threshold; when the intersection area of the hand-region image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersection area is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
In one embodiment, the processor further implements the following steps when executing the computer program: selecting any point in the pedestrian's detection box as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates; mapping the pedestrian's first position coordinates into the world coordinate system according to the preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system; and collecting the pedestrian's second position coordinates at each time point within the preset time period to obtain the pedestrian's route map within the preset time period.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the pedestrian's key points; and obtaining the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the pedestrian's key points includes: selecting the shoulder key points among the pedestrian's key points, the shoulder key points including the left-shoulder key point and the right-shoulder key point; taking the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point to obtain the shoulder vector; computing the angle between the shoulder vector and the preset unit vector with the arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the pedestrian's orientation angle; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
In one embodiment, the processor further implements the following steps when executing the computer program: acquiring the image to be detected includes: acquiring surveillance video of the target site; and filtering out images containing pedestrians from the surveillance video as the images to be detected.
In one embodiment, the processor further implements the following steps when executing the computer program: acquiring sample images; annotating the pedestrians in the sample images with key points and detection boxes to obtain annotated image data; and inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
In one embodiment, a computer-readable storage medium is provided on which a computer program is stored. When executed by a processor, the computer program implements the following steps: acquiring an image to be detected; inputting the image to be detected into a preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, the key points all being located inside the detection boxes, and the multi-task model being used for pedestrian detection and human key point detection; and determining the interaction behavior information between a pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
In one embodiment, the computer program further implements the following steps when executed by the processor: the preset item rack image is a preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key points among the pedestrian's key points; obtaining the pedestrian's hand regions according to the wrist key points and the preset radius threshold; when the intersection area of the hand-region image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersection area is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
In one embodiment, the computer program further implements the following steps when executed by the processor: selecting any point in the pedestrian's detection box as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates; mapping the pedestrian's first position coordinates into the world coordinate system according to the preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system; and collecting the pedestrian's second position coordinates at each time point within the preset time period to obtain the pedestrian's route map within the preset time period.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining the pedestrian's orientation information according to the pedestrian's key points; and obtaining the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining the pedestrian's orientation information according to the pedestrian's key points includes: selecting the shoulder key points among the pedestrian's key points, the shoulder key points including the left-shoulder key point and the right-shoulder key point; taking the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point to obtain the shoulder vector; computing the angle between the shoulder vector and the preset unit vector with the arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the pedestrian's orientation angle; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
In one embodiment, the computer program further implements the following steps when executed by the processor: acquiring the image to be detected includes: acquiring surveillance video of the target site; and filtering out images containing pedestrians from the surveillance video as the images to be detected.
In one embodiment, the computer program further implements the following steps when executed by the processor: acquiring sample images; annotating the pedestrians in the sample images with key points and detection boxes to obtain annotated image data; and inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments only express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

  1. An interactive behavior recognition method, wherein the method comprises:
    acquiring an image to be detected;
    inputting the image to be detected into a preset multi-task model to obtain key points and detection boxes of pedestrians in the image to be detected, wherein the key points are all located inside the detection boxes, and the multi-task model is used for pedestrian detection and human key point detection;
    determining interaction behavior information between a pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
  2. The method according to claim 1, wherein the preset item rack image is a preset item rack mask image, and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected comprises:
    selecting wrist key points among the key points of the pedestrian;
    obtaining hand regions of the pedestrian according to the wrist key points and a preset radius threshold;
    when the intersection area of an image of a hand region and the preset item rack mask image is greater than a preset area threshold, determining that the pedestrian interacts with the corresponding item rack;
    when the intersection area of the image of the hand region and the preset item rack mask image is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
  3. The method according to claim 1, wherein the method further comprises:
    selecting any point in the detection box of the pedestrian as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian;
    mapping the first position coordinates of the pedestrian into a world coordinate system according to a preset coordinate mapping relationship to obtain second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system;
    collecting the second position coordinates of the pedestrian at each time point within a preset time period to obtain a route map of the pedestrian within the preset time period.
  4. The method according to claim 1, wherein the method further comprises:
    obtaining orientation information of the pedestrian according to the key points of the pedestrian;
    obtaining an item rack region the pedestrian is facing according to the orientation information of the pedestrian and the preset item rack image.
  5. The method according to claim 4, wherein obtaining the orientation information of the pedestrian according to the key points of the pedestrian comprises:
    selecting shoulder key points among the key points of the pedestrian, the shoulder key points comprising a left-shoulder key point and a right-shoulder key point;
    taking the difference between the coordinates of the left-shoulder key point and the coordinates of the right-shoulder key point to obtain a shoulder vector;
    computing the angle between the shoulder vector and a preset unit vector with an arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected;
    summing the radian value of the angle with π to obtain an orientation angle of the pedestrian;
    when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected;
    when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
  6. The method according to any one of claims 1 to 5, wherein acquiring the image to be detected comprises:
    acquiring surveillance video of a target site;
    filtering out an image containing pedestrians from the surveillance video as the image to be detected.
  7. The method according to any one of claims 1 to 5, wherein the method further comprises:
    acquiring sample images;
    annotating pedestrians in the sample images with key points and detection boxes to obtain annotated image data;
    inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
  8. An interactive behavior recognition apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire an image to be detected;
    a detection module, configured to input the image to be detected into a preset multi-task model to obtain key points and detection boxes of pedestrians in the image to be detected, wherein the key points are all located inside the detection boxes, and the multi-task model is used for pedestrian detection and human key point detection;
    a recognition module, configured to determine interaction behavior information between a pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
  9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
PCT/CN2020/097002 2019-11-12 2020-06-19 Interactive behavior identification method and apparatus, computer device and storage medium WO2021093329A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3160731A CA3160731A1 (en) 2019-11-12 2020-06-19 Interactive behavior recognizing method, device, computer equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911100457.9 2019-11-12
CN201911100457.9A CN110991261A (en) 2019-11-12 2019-11-12 Interactive behavior recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021093329A1 (en) 2021-05-20

Family

ID=70083879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097002 WO2021093329A1 (en) 2019-11-12 2020-06-19 Interactive behavior identification method and apparatus, computer device and storage medium

Country Status (3)

Country Link
CN (1) CN110991261A (en)
CA (1) CA3160731A1 (en)
WO (1) WO2021093329A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991261A (en) * 2019-11-12 2020-04-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium
CN113642361B (en) * 2020-05-11 2024-01-23 杭州萤石软件有限公司 Fall behavior detection method and equipment
CN112307871A (en) * 2020-05-29 2021-02-02 北京沃东天骏信息技术有限公司 Information acquisition method and device, attention detection method, device and system
CN111611970B (en) * 2020-06-01 2023-08-22 城云科技(中国)有限公司 Urban management monitoring video-based random garbage throwing behavior detection method
CN111798341A (en) * 2020-06-30 2020-10-20 深圳市幸福人居建筑科技有限公司 Green property management method, system computer equipment and storage medium thereof
CN111783724B (en) * 2020-07-14 2024-03-26 上海依图网络科技有限公司 Target object identification method and device
CN112084984A (en) * 2020-09-15 2020-12-15 山东鲁能软件技术有限公司 Escalator action detection method based on improved Mask RCNN
CN112016528B (en) * 2020-10-20 2021-07-20 成都睿沿科技有限公司 Behavior recognition method and device, electronic equipment and readable storage medium
CN113377192B (en) * 2021-05-20 2023-06-20 广州紫为云科技有限公司 Somatosensory game tracking method and device based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105245828A (en) * 2015-09-02 2016-01-13 北京旷视科技有限公司 Item analysis method and equipment
CN106709422A (en) * 2016-11-16 2017-05-24 南京亿猫信息技术有限公司 Supermarket shopping cart hand identification method and identification system thereof
CN109934075A (en) * 2017-12-19 2019-06-25 杭州海康威视数字技术股份有限公司 Accident detection method, apparatus, system and electronic equipment
US20190266405A1 (en) * 2016-10-26 2019-08-29 Htc Corporation Virtual reality interaction method, apparatus and system
CN110991261A (en) * 2019-11-12 2020-04-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993067B (en) * 2019-03-07 2022-01-28 北京旷视科技有限公司 Face key point extraction method and device, computer equipment and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758239A (en) * 2022-04-22 2022-07-15 安徽工业大学科技园有限公司 Method and system for monitoring articles flying away from predetermined travel route based on machine vision
CN116862980A (en) * 2023-06-12 2023-10-10 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge
CN116862980B (en) * 2023-06-12 2024-01-23 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge

Also Published As

Publication number Publication date
CA3160731A1 (en) 2021-05-20
CN110991261A (en) 2020-04-10


Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20887643; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 3160731; Country of ref document: CA)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20887643; Country of ref document: EP; Kind code of ref document: A1)