WO2021093329A1 - Interactive behavior identification method and apparatus, computer device and storage medium - Google Patents

Interactive behavior identification method and apparatus, computer device and storage medium

Info

Publication number
WO2021093329A1
WO2021093329A1 (PCT/CN2020/097002, CN2020097002W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
pedestrian
preset
detected
key points
Application number
PCT/CN2020/097002
Other languages
French (fr)
Chinese (zh)
Inventor
余代伟
孙皓
董昱青
庄喜阳
李永翔
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Application filed by 苏宁易购集团股份有限公司 and 苏宁云计算有限公司
Priority to CA3160731A priority Critical patent/CA3160731A1/en
Publication of WO2021093329A1 publication Critical patent/WO2021093329A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • This application relates to the field of computer vision technology, and in particular to an interactive behavior recognition method, device, computer equipment, and storage medium.
  • traditional human-goods interaction behavior recognition methods generally rely on sound, light, electricity and other sensor devices to realize behavior recognition, which requires high hardware costs; their use scenarios are limited, and they cannot be applied at scale in complex environments such as supermarkets. Supermarket monitoring equipment generates a large amount of video data every day, and analyzing the surveillance video can yield much information about human-goods interaction, but this consumes enormous manpower and suffers from low efficiency.
  • An interactive behavior identification method which includes:
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined.
  • the preset item rack image is a preset item rack mask image
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined based on the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, including:
  • the pedestrian's hand area is obtained
  • the method further includes:
  • the method further includes:
  • the item rack area to which the pedestrian is facing is obtained.
  • obtaining the pedestrian's orientation information according to the key points of the pedestrian includes:
  • the key points of the shoulder include the key points of the left shoulder and the key points of the right shoulder;
  • the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector.
  • the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected;
  • when the orientation angle is greater than or equal to π and less than 1.5π, it is determined that the pedestrian is facing one side of the image to be detected;
  • when the orientation angle is greater than 1.5π and less than or equal to 2π, it is determined that the pedestrian is facing the other side of the image to be detected.
  • acquiring the image to be detected includes:
  • the method further includes:
  • the labeled image data is input into the neural network model for training to obtain a multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • a recognition device for human-goods interaction behavior comprising:
  • the acquisition module is used to acquire the image to be detected
  • the detection module is used to input the image to be detected into the preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected.
  • the key points are located inside the detection frame.
  • the multi-task model is used for pedestrian detection and human key point detection ;
  • the recognition module is used to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • a computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor, where the processor implements the following steps when executing the computer program:
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented:
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection;
  • the interaction behavior information between the pedestrian and the corresponding item rack is determined.
  • with the above interactive behavior recognition method, device, computer equipment and storage medium, the image to be detected is acquired and input into a preset multi-task model to obtain the key points and detection frames of the pedestrians in the image to be detected. Because one multi-task model performs both pedestrian detection and human key point detection, the pedestrian detection frame and the human body key points are obtained synchronously, which improves image processing efficiency. The key points are all located inside the detection frame, so erroneous key points outside the detection frame can be eliminated, and the detection frame and key points are used jointly to improve the accuracy of key point labeling. According to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information between the pedestrian and the corresponding item rack is determined, which efficiently identifies the interaction behavior and improves recognition accuracy.
  • Figure 1 is an application environment diagram of an interactive behavior recognition method in an embodiment
  • Figure 2 is a schematic flowchart of an interactive behavior identification method in an embodiment
  • Figure 3 is a schematic flowchart of an interactive behavior judgment step in an embodiment
  • Figure 4 is a schematic flowchart of an interactive behavior recognition method in another embodiment
  • Figure 5 is a structural block diagram of an interactive behavior recognition device in an embodiment
  • Fig. 6 is an internal structure diagram of a computer device in an embodiment.
  • the interactive behavior identification method provided in this application can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through a network.
  • the terminal 102 can be, but is not limited to, various image acquisition devices.
  • specifically, the terminal 102 can be existing monitoring equipment in a shopping mall, supermarket or library, and the server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
  • an interactive behavior recognition method is provided.
  • the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • Step 202 Obtain an image to be detected.
  • the image to be detected is an image with pedestrians collected by an image acquisition device.
  • the above-mentioned image acquisition device may be monitoring equipment already installed and in use at a target place such as a shopping mall, supermarket or library, for example an existing camera at the target place, so there is no need to modify the target place and the deployment cost is low.
  • the surveillance video is acquired through the camera, and pictures with pedestrians are selected from the surveillance video as the image to be detected.
  • Step 204 Input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected.
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key point detection.
  • the multi-task model obtains the detection frame of the pedestrian in the image to be detected through pedestrian detection and, at the same time, obtains the key points of the pedestrian through human key point detection, so the detection frame and key points are acquired synchronously and features are shared between the different tasks; this reduces the amount of computation and hardware resource occupation and shortens the processing time of a single frame, so images to be detected obtained from multiple cameras can be processed at the same time, realizing parallel processing of multiple camera streams.
  • the acquired image to be detected is input into a preset multi-task model.
  • the multi-task model performs pedestrian detection and human key point detection on the image to be detected.
  • in processing the image to be detected, the multi-task model can exclude key points located outside the detection frame, so that the output key points are all located inside the detection frame.
  • the multi-task model can output the key points and the detection frame of the pedestrian in the image to be detected.
  • for example, for an input image the model outputs N sets of key points and N detection frames, where N is the number of pedestrians in the image to be detected, each pedestrian has K key points (usually K = 17), and each detection frame is given by the coordinates of its upper-left and lower-right corners together with a confidence score.
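  • As a hedged illustration of this post-processing, the sketch below (Python with NumPy; the array layouts and the NaN-masking convention are assumptions for illustration, not taken from the patent) discards key points that fall outside their pedestrian's detection frame:

```python
# Minimal sketch, assuming `keypoints` has shape (N, K, 2) with (x, y) pairs
# and `boxes` has shape (N, 5) laid out as (x1, y1, x2, y2, score).
import numpy as np

def keep_keypoints_inside_boxes(keypoints: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    x1, y1, x2, y2 = boxes[:, 0:1], boxes[:, 1:2], boxes[:, 2:3], boxes[:, 3:4]
    inside = ((keypoints[:, :, 0] >= x1) & (keypoints[:, :, 0] <= x2) &
              (keypoints[:, :, 1] >= y1) & (keypoints[:, :, 1] <= y2))
    filtered = keypoints.astype(float)   # copy, so the input stays untouched
    filtered[~inside] = np.nan           # mark excluded key points
    return filtered
```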
  • Step 206 Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • the existing cameras, the layout of the target place, and the item racks are positioned and labeled in advance, and each camera is configured with a corresponding preset item rack image; since a given image to be detected is acquired through one of the cameras, all images to be detected acquired by the same camera correspond to that camera, and therefore also correspond to the preset item rack image configured for that camera.
  • the image to be detected is acquired and input into a preset multi-task model to obtain the key points and detection frames of the pedestrians in the image to be detected. Because one multi-task model performs both pedestrian detection and human key point detection, the pedestrian detection frame and the human body key points are obtained synchronously, which improves image processing efficiency; the key points are all located inside the detection frame, so erroneous key points outside the detection frame can be eliminated, and the detection frame and key points are used jointly to improve the accuracy of key point labeling. According to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information between the pedestrian and the corresponding item rack is determined, which efficiently identifies the interaction behavior and improves recognition accuracy; moreover, the method enables fully automated processing without manual intervention, greatly reducing labor costs.
  • the preset item rack image is a preset item rack mask image
  • the preset item rack mask image may be obtained by extracting a frame of image from the surveillance video and labeling the outer contour of the item rack in that frame with a polygon; determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes:
  • Step 302 selecting a wrist key point among pedestrian key points
  • the wrist key point data includes left wrist key point data and right wrist key point data.
  • Step 304 Obtain the hand area of the pedestrian according to the key points of the wrist and the preset radius threshold;
  • specifically, the left-hand area and the right-hand area are delimited by taking the left wrist key point and the right wrist key point respectively as the center of a circle whose radius is the preset radius threshold, so as to obtain an image of the left-hand area and an image of the right-hand area.
  • Step 306 Determine whether the intersecting area of the image of the hand area and the preset mask image of the article rack is greater than a preset area threshold
  • Step 308 if yes, determine that the pedestrian interacts with the corresponding item rack
  • step 310 if not, it is determined that there is no interaction between the pedestrian and the corresponding item rack.
  • the hand area includes a left-hand area and a right-hand area; specifically, when the intersecting area between the image of at least one of the left-hand area and the right-hand area and the preset item rack mask image is greater than the preset area threshold, it is determined that the pedestrian interacts with the corresponding item rack; otherwise, it is determined that the pedestrian does not interact with the corresponding item rack.
  • for example, the circle centered on the left wrist key point with radius R denotes the left-hand area, and the circle centered on the right wrist key point with radius R denotes the right-hand area.
  • in one example, the preset area threshold is 150 units of area.
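  • A minimal sketch of this intersection test is shown below, assuming the preset item rack mask is a binary NumPy array in the same pixel coordinate system as the frame; the radius value is an illustrative placeholder and 150 is the area threshold mentioned above:

```python
import numpy as np

def hand_interacts_with_rack(wrist_xy, rack_mask, radius=30, area_threshold=150):
    # Circular hand area centered on the wrist key point.
    h, w = rack_mask.shape
    ys, xs = np.ogrid[:h, :w]
    cx, cy = wrist_xy
    hand_disk = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    # Intersecting area between the hand area and the rack mask, in pixels.
    intersection = np.logical_and(hand_disk, rack_mask > 0).sum()
    return intersection > area_threshold

# The pedestrian is judged to interact if either hand triggers the test:
# interacting = (hand_interacts_with_rack(left_wrist, mask)
#                or hand_interacts_with_rack(right_wrist, mask))
```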
  • this interactive behavior recognition method judges the interactive behavior by directly estimating the intersection area between the hand and the item rack, which is simple to implement, highly scalable, fast, and good in real-time performance; the method is typically used to recognize human-goods interaction in shopping malls and supermarkets, where the item racks are store shelves, but it can also be used to recognize human-object interaction in other places such as libraries, where the item rack is a library bookshelf.
  • the method further includes:
  • preferably, the center point of the detection frame is selected as the positioning point, since it is convenient to obtain and indicates the position of the pedestrian more accurately.
  • the preset coordinate mapping relationship is the mapping between the coordinate system of the image to be detected and the world coordinate system; specifically, the position of the image acquisition device in the world coordinate system is calibrated in advance from its position information, so the coordinate position in the world coordinate system of any point in an image collected by that device can be obtained, from which the coordinate mapping relationship between the image coordinate system and the world coordinate system is derived.
  • the preset time period is the time from when pedestrians enter the target place to when they leave the target place.
  • the route map of the pedestrian within the preset time period is the route that the pedestrian passes from entering the target place to exiting the target place, that is, the pedestrian's moving line diagram.
  • this interactive behavior recognition method obtains the pedestrian's route map within a preset time period from the pedestrian's detection frame and the preset coordinate mapping relationship, which makes it convenient to record the pedestrian's movement trajectory in the target place; when the method is applied to a shopping mall or supermarket, the customer's movement route from entering to leaving can be observed intuitively, and staff can adjust the store layout based on these data to better match customers' shopping habits.
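  • As an illustration of the coordinate mapping, the sketch below assumes the pre-calibrated relationship between the image plane and the floor plane of the world coordinate system can be written as a 3×3 homography H per camera (the patent only requires some pre-calibrated mapping, so the homography form is an assumption):

```python
import numpy as np

def image_to_world(point_xy, H):
    # Map a pixel coordinate to the world (floor) plane via homography H.
    p = np.array([point_xy[0], point_xy[1], 1.0])
    q = H @ p
    return q[:2] / q[2]

def route_map(boxes_over_time, H):
    # boxes_over_time: one (x1, y1, x2, y2) detection frame per time point
    # for a single pedestrian; the frame center is the positioning point.
    route = []
    for x1, y1, x2, y2 in boxes_over_time:
        center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
        route.append(image_to_world(center, H))
    return np.array(route)   # second position coordinates over the period
```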
  • the method further includes:
  • shoulder key points are selected from the pedestrian's key points; the shoulder key points include the left shoulder key point and the right shoulder key point, and the shoulder vector is obtained by taking the difference between their coordinates.
  • the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; the radian value of the angle is summed with π to obtain the pedestrian's orientation angle.
  • when the orientation angle is greater than or equal to π and less than 1.5π, it is determined that the pedestrian is facing one side of the image to be detected; when the orientation angle is greater than 1.5π and less than or equal to 2π, it is determined that the pedestrian is facing the other side of the image to be detected.
  • specifically, according to the orientation information of the pedestrian in the image to be detected and the preset item rack image corresponding to the image to be detected, the item rack area toward which the pedestrian is oriented is obtained.
  • this interactive behavior recognition method uses the shoulder key point data to calculate the pedestrian's orientation, which yields a more robust orientation result, and thereby determines the shelf area the customer is paying attention to, providing a reference for product placement in shopping malls and supermarkets.
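  • The orientation computation can be sketched as follows (Python with NumPy; the returned side labels are placeholders for the "one side" and "other side" of the image named above):

```python
import numpy as np

def pedestrian_orientation(left_shoulder, right_shoulder):
    # Shoulder vector: difference of the left and right shoulder key point coordinates.
    v = np.asarray(left_shoulder, float) - np.asarray(right_shoulder, float)
    u = np.array([0.0, -1.0])                        # negative y-axis of the image
    cos_angle = np.dot(v, u) / np.linalg.norm(v)     # |u| = 1
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0)) # radians in [0, pi]
    orientation = angle + np.pi                      # orientation angle in [pi, 2*pi]
    if np.pi <= orientation < 1.5 * np.pi:
        return orientation, "one side"
    return orientation, "other side"
```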
  • acquiring the image to be detected includes:
  • the above-mentioned image acquisition equipment is generally a network camera.
  • an interactive behavior recognition method is provided.
  • the method directly utilizes existing monitoring equipment at the target place, such as cameras in a shopping mall or supermarket, without the need to modify the venue; deployment cost is low and the method is easy to roll out.
  • the method further includes:
  • specifically, to obtain sample images, surveillance videos of shopping malls and supermarkets are acquired, and a large number of images containing pedestrians are screened out of the surveillance videos as sample images.
  • the neural network model adopts the ResNet-101+FPN network model, a one-stage, bottom-up multi-task network model; compared with similar two-stage algorithms it saves processing time, and compared with top-down algorithms its processing time does not change with the number of people in the picture.
  • this interactive behavior recognition method processes the images to be detected by establishing and training a multi-task model; training and optimization of the model are completed in the background without affecting the operation of the model in places such as shopping malls, supermarkets or libraries, and the model has strong generalization ability and can be deployed easily and quickly; features are shared between the different tasks of the multi-task model, which reduces the amount of computation and hardware resource occupation, shortens single-frame processing time, and realizes parallel processing of multiple cameras.
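  • A hedged sketch of such a one-stage multi-task network is shown below (PyTorch/torchvision). The patent specifies only the ResNet-101+FPN backbone and the two feature-sharing tasks; the head designs, channel counts, and output encodings here are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.ops import FeaturePyramidNetwork

class MultiTaskNet(nn.Module):
    def __init__(self, num_keypoints: int = 17):
        super().__init__()
        backbone = resnet101(weights=None)
        # Tap the C2..C5 feature maps from the four ResNet stages.
        self.body = create_feature_extractor(
            backbone,
            {"layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"})
        self.fpn = FeaturePyramidNetwork([256, 512, 1024, 2048], 256)
        # Detection head: per-location box offsets (4) plus a confidence score (1).
        self.det_head = nn.Conv2d(256, 5, kernel_size=3, padding=1)
        # Key point head: one heatmap per key point (bottom-up, single stage).
        self.kpt_head = nn.Conv2d(256, num_keypoints, kernel_size=3, padding=1)

    def forward(self, x):
        feats = self.fpn(self.body(x))
        p2 = feats["c2"]   # highest-resolution pyramid level
        return self.det_head(p2), self.kpt_head(p2)

model = MultiTaskNet()
boxes_and_scores, keypoint_heatmaps = model(torch.randn(1, 3, 512, 512))
```

  • Because both heads read the same FPN features, the backbone computation is shared between the detection and key point tasks, which is the feature-sharing property the embodiment relies on for multi-camera parallel processing.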
  • the method includes the following steps:
  • Step 402 Obtain surveillance video of the target location
  • Step 404 Filter out an image with pedestrians from the surveillance video as an image to be detected
  • Step 406 Input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected, and the key points are all located inside the detection frame;
  • Step 408 Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected;
  • Step 410 Obtain a route map of the pedestrian in a preset time period according to the mapping relationship between the detection frame of the pedestrian and the preset coordinate;
  • Step 412 Obtain the direction information of the pedestrian according to the key points of the pedestrian.
  • an interactive behavior recognition device which includes an acquisition module 502, a detection module 504, and an identification module 506, wherein:
  • the obtaining module 502 is used to obtain the image to be detected
  • the detection module 504 is used to input the image to be detected into a preset multi-task model to obtain key points and detection frames of pedestrians in the image to be detected.
  • the key points are located inside the detection frame, and the multi-task model is used for pedestrian detection and human key points Detection
  • the recognition module 506 is configured to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • the preset item rack image is a preset item rack mask image
  • the aforementioned recognition module 506 includes:
  • the first key point selection unit is used to select the key point of the wrist among the key points of the pedestrian;
  • the hand area unit is used to obtain the pedestrian's hand area according to the key points of the wrist and the preset radius threshold;
  • the interaction determination unit is used to determine that the pedestrian interacts with the corresponding item rack when the intersecting area between the hand area image and the preset item rack mask image is greater than the preset area threshold, and to determine that there is no interaction between the pedestrian and the corresponding item rack when the intersecting area between the hand area image and the preset item rack mask image is less than or equal to the area threshold.
  • the device further includes:
  • the first position coordinate module is used to select any point in the detection frame of the pedestrian as the positioning point, and set the position coordinates of the positioning point in the image to be detected as the first position coordinates of the pedestrian;
  • the second position coordinate module is used to map the first position coordinates of the pedestrian into the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system;
  • the route map module is used to collect the second position coordinates of the pedestrian at each time point in the preset time period to obtain the pedestrian's route map in the preset time period.
  • the device further includes:
  • the orientation information module is used to obtain the orientation information of the pedestrian according to the key points of the pedestrian;
  • the orientation area module is used to obtain the item rack area where the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
  • the above-mentioned orientation information module includes:
  • the second key point selection unit is used to select the shoulder key points among the key points of pedestrians.
  • the shoulder key points include the left shoulder key point and the right shoulder key point;
  • the direction angle calculation unit is used to calculate the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector;
  • the inverse cosine function is used to calculate the angle between the shoulder vector and the preset unit vector, where the preset unit vector is the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; the radian value of the angle is summed with π to obtain the pedestrian's orientation angle;
  • the orientation determination unit is used to determine that the pedestrian is facing one side of the image to be detected when the orientation angle is greater than or equal to π and less than 1.5π, and to determine that the pedestrian is facing the other side of the image to be detected when the orientation angle is greater than 1.5π and less than or equal to 2π.
  • the above-mentioned obtaining module 502 includes:
  • the video acquisition unit is used to acquire the surveillance video of the target location
  • the image acquisition unit is used to screen out the image with pedestrians from the surveillance video as the image to be detected.
  • the device further includes:
  • the sample acquisition module is used to acquire sample images
  • the sample data module is used to perform key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data
  • the model training module is used to input the labeled image data into the neural network model for training to obtain a multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • Each module in the above-mentioned interactive behavior recognition device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize an interactive behavior identification method.
  • FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory, a processor, and a computer program stored in the memory and running on the processor.
  • when the processor executes the computer program, the following steps are implemented: acquiring an image to be detected; inputting the image to be detected into the preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection; and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • the processor further implements the following steps when executing the computer program: the preset item rack image is the preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key point among the key points of the pedestrian; obtaining the hand area of the pedestrian according to the wrist key point and the preset radius threshold; when the intersecting area between the hand area image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersecting area between the hand area image and the preset item rack mask image is less than or equal to the area threshold, determining that there is no interaction between the pedestrian and the corresponding item rack.
  • the processor further implements the following steps when executing the computer program: selecting any point in the detection frame of the pedestrian as a positioning point, and setting the position coordinates of the positioning point in the image to be detected as the first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian into the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system; and collecting the second position coordinates of the pedestrian at each time point within the preset time period to obtain the route map of the pedestrian within the preset time period.
  • the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the key points of the pedestrian; obtaining the item rack area where the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
  • the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the key points of the pedestrian includes: selecting the shoulder key points among the key points of the pedestrian, the shoulder key points including the left shoulder key point and the right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector; calculating the angle between the shoulder vector and the preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the pedestrian's orientation angle; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
  • the processor further implements the following steps when executing the computer program: acquiring the image to be detected includes: acquiring a surveillance video of the target location; and filtering out images with pedestrians from the surveillance video as the image to be detected.
  • the processor further implements the following steps when executing the computer program: acquiring sample images; performing key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data; and inputting the annotated image data into the neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: acquiring the image to be detected; inputting the image to be detected into a preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection; and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
  • when the computer program is executed by the processor, the following steps are further implemented: the preset item rack image is the preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key point among the key points of the pedestrian; obtaining the hand area of the pedestrian according to the wrist key point and the preset radius threshold; when the intersecting area between the hand area image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersecting area between the hand area image and the preset item rack mask image is less than or equal to the area threshold, determining that there is no interaction between the pedestrian and the corresponding item rack.
  • when the computer program is executed by the processor, the following steps are further implemented: selecting any point in the detection frame of the pedestrian as a positioning point, and setting the position coordinates of the positioning point in the image to be detected as the first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian into the world coordinate system according to the preset coordinate mapping relationship to obtain the second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system; and collecting the second position coordinates of the pedestrian at each time point within the preset time period to obtain the route map of the pedestrian within the preset time period.
  • when the computer program is executed by the processor, the following steps are further implemented: obtaining the pedestrian's orientation information according to the key points of the pedestrian; and obtaining the item rack area toward which the pedestrian is oriented according to the pedestrian's orientation information and the preset item rack image.
  • when the computer program is executed by the processor, the following steps are further implemented: obtaining the orientation information of the pedestrian according to the key points of the pedestrian includes: selecting the shoulder key points among the key points of the pedestrian, the shoulder key points including the left shoulder key point and the right shoulder key point; calculating the difference between the coordinates of the left shoulder key point and the right shoulder key point to obtain the shoulder vector; calculating the angle between the shoulder vector and the preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the orientation angle of the pedestrian; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
  • acquiring the image to be detected includes: acquiring a surveillance video of the target location; and screening an image with pedestrians from the surveillance video as the image to be detected.
  • when the computer program is executed by the processor, the following steps are further implemented: acquiring sample images; performing key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data; and inputting the annotated image data into the neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

The present application relates to an interactive behavior identification method and apparatus, a computer device and a storage medium. The method comprises: acquiring an image to be detected; inputting the image into a preset multi-task model to obtain key points and a detection frame of a passerby in the image, wherein the key points are located inside the detection frame, and the multi-task model is used for passerby detection and human body key point detection; and according to the key points of the passerby and a preset item rack image corresponding to the image, determining interactive behavior information of the passerby and a corresponding item rack. By using the present method, interactive behavior between passersby and items may be efficiently identified.

Description

Interactive behavior recognition method, device, computer equipment and storage medium
Technical field
This application relates to the field of computer vision technology, and in particular to an interactive behavior recognition method, device, computer equipment, and storage medium.
Background art
With the advent of the Internet era, the retail industry has entered a stage of rapid development. The future of retail is smart retail, that is, using technologies such as the Internet and big data to perceive users' consumption habits so as to provide consumers with diversified, personalized products and services; recognizing human-goods interaction behavior is a problem that needs to be solved in the smart retail field.
Traditional human-goods interaction behavior recognition methods generally rely on sound, light, electricity and other sensor devices to realize behavior recognition, which requires high hardware costs; their use scenarios are limited, and they cannot be applied at scale in complex environments such as supermarkets. Supermarket monitoring equipment generates a large amount of video data every day, and analyzing the surveillance video can yield much information about human-goods interaction, but this consumes enormous manpower and suffers from low efficiency.
Summary of the invention
Based on this, it is necessary to address the above technical problems and provide an interactive behavior recognition method, device, computer equipment, and storage medium that can efficiently recognize interaction between the human body and articles.
An interactive behavior recognition method, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
determining interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
In one of the embodiments, the preset item rack image is a preset item rack mask image, and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected includes:
selecting a wrist key point among the key points of the pedestrian;
obtaining the hand area of the pedestrian according to the wrist key point and a preset radius threshold;
when the intersecting area between the image of the hand area and the preset item rack mask image is greater than a preset area threshold, determining that the pedestrian interacts with the corresponding item rack;
when the intersecting area between the image of the hand area and the preset item rack mask image is less than or equal to the area threshold, determining that there is no interaction between the pedestrian and the corresponding item rack.
In one of the embodiments, the method further includes:
selecting any point in the detection frame of the pedestrian as a positioning point, and setting the position coordinates of the positioning point in the image to be detected as first position coordinates of the pedestrian;
mapping the first position coordinates of the pedestrian into a world coordinate system according to a preset coordinate mapping relationship to obtain second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system;
collecting the second position coordinates of the pedestrian at each time point within a preset time period to obtain a route map of the pedestrian within the preset time period.
In one of the embodiments, the method further includes:
obtaining orientation information of the pedestrian according to the key points of the pedestrian;
obtaining the item rack area toward which the pedestrian is oriented according to the orientation information of the pedestrian and the preset item rack image.
In one of the embodiments, obtaining the orientation information of the pedestrian according to the key points of the pedestrian includes:
selecting shoulder key points among the key points of the pedestrian, the shoulder key points including a left shoulder key point and a right shoulder key point;
calculating the difference between the coordinates of the left shoulder key point and the coordinates of the right shoulder key point to obtain a shoulder vector;
calculating the angle between the shoulder vector and a preset unit vector using the inverse cosine function, the preset unit vector being the unit vector in the negative direction of the y-axis of the coordinate system of the image to be detected;
summing the radian value of the angle with π to obtain the orientation angle of the pedestrian;
when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected;
when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
In one of the embodiments, acquiring the image to be detected includes:
acquiring a surveillance video of a target place;
screening out an image containing a pedestrian from the surveillance video as the image to be detected.
In one of the embodiments, the method further includes:
acquiring sample images;
performing key point annotation and detection frame annotation on the pedestrians in the sample images to obtain annotated image data;
inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts the ResNet-101+FPN network model.
A human-goods interaction behavior recognition device, the device comprising:
an acquisition module, used to acquire an image to be detected;
a detection module, used to input the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
a recognition module, used to determine interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
acquiring an image to be detected;
inputting the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
determining interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
A computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the following steps:
acquiring an image to be detected;
inputting the image to be detected into a preset multi-task model to obtain key points and a detection frame of a pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection;
determining interaction behavior information between the pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
With the above interactive behavior recognition method, device, computer equipment and storage medium, the image to be detected is acquired and input into a preset multi-task model to obtain the key points and detection frame of the pedestrian in the image to be detected. Through a multi-task model used for both pedestrian detection and human key point detection, the pedestrian detection frame and the human body key points are acquired synchronously, which improves image processing efficiency. The key points are all located inside the detection frame, so erroneous key points outside the detection frame can be eliminated, achieving the purpose of using the detection frame and key points jointly to improve key point labeling accuracy. According to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information between the pedestrian and the corresponding item rack is determined, which can efficiently identify the interaction behavior and improve recognition accuracy.
Description of the drawings
Figure 1 is an application environment diagram of an interactive behavior recognition method in an embodiment;
Figure 2 is a schematic flowchart of an interactive behavior recognition method in an embodiment;
Figure 3 is a schematic flowchart of an interactive behavior judgment step in an embodiment;
Figure 4 is a schematic flowchart of an interactive behavior recognition method in another embodiment;
Figure 5 is a structural block diagram of an interactive behavior recognition device in an embodiment;
Figure 6 is an internal structure diagram of a computer device in an embodiment.
Detailed description
In order to make the purpose, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The interactive behavior recognition method provided in this application can be applied to the application environment shown in Figure 1, in which the terminal 102 communicates with the server 104 through a network. The terminal 102 can be, but is not limited to, various image acquisition devices; specifically, the terminal 102 can be existing monitoring equipment in places such as shopping malls, supermarkets or libraries, and the server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Figure 2, an interactive behavior recognition method is provided. Taking the application of the method to the server in Figure 1 as an example, it includes the following steps:
Step 202: Acquire an image to be detected.
The image to be detected is an image containing a pedestrian collected by an image acquisition device. The image acquisition device may be monitoring equipment already installed and in use at a target place such as a shopping mall, supermarket or library, for example an existing camera at the target place, so there is no need to modify the target place and the deployment cost is low.
Specifically, a surveillance video is acquired through the camera, and pictures containing pedestrians are screened out of the surveillance video as images to be detected.
Step 204: Input the image to be detected into a preset multi-task model to obtain key points and a detection frame of the pedestrian in the image to be detected, where the key points are all located inside the detection frame and the multi-task model is used for pedestrian detection and human key point detection.
The multi-task model obtains the detection frame of the pedestrian in the image to be detected through pedestrian detection and, at the same time, obtains the key points of the pedestrian through human key point detection, thereby acquiring the detection frame and key points synchronously; features are shared between the different tasks, which reduces the amount of computation and hardware resource occupation and shortens single-frame processing time, so images to be detected acquired from multiple cameras can be processed at the same time, realizing parallel processing of multiple camera streams.
Specifically, the acquired image to be detected is input into the preset multi-task model, which performs pedestrian detection and human key point detection on it. In processing the image to be detected, the multi-task model can exclude key points located outside the detection frame, so that all output key points are located inside the detection frame; finally, the multi-task model outputs the key points and detection frame of the pedestrian in the image to be detected.
For example, an image to be detected I of size H×W×3 is input into the multi-task model, and the model outputs the key points P = {p_1, p_2, ..., p_N} and the detection frames B = {b_1, b_2, ..., b_N}, where p_i = {(x_ij, y_ij), j = 1, ..., K} and b_i = (x_i1, y_i1, x_i2, y_i2, score). Here N is the number of pedestrians in the image to be detected and K is the number of key points per pedestrian, usually K = 17; (x_ij, y_ij) are the coordinates of the j-th key point of the i-th person on the image to be detected; (x_i1, y_i1) and (x_i2, y_i2) are the coordinates of the upper-left and lower-right corners of the detection frame of the i-th person on the image to be detected; and score is the confidence of the detection frame, that is, the degree to which it can be trusted.
Step 206: Determine the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected.
The existing cameras, the layout of the target place, and the item racks are positioned and labeled in advance, and each camera is configured with a corresponding preset item rack image. Since a given image to be detected is acquired through one of the cameras, all images to be detected acquired by the same camera correspond to that camera, and therefore also correspond to the preset item rack image configured for that camera.
Specifically, one of the pedestrian's key points may be selected as a reference key point, and the interaction behavior between the pedestrian and the corresponding item rack is then judged from the relationship between the reference key point and the preset item rack image, for example their distance or intersecting area.
上述交互行为识别方法中,获取待检测图像,通过将待检测图像输入预设的多任务模型,得到待检测图像中行人的关键点和检测框,该方法通过用于进行行人检测和人体关键点检测的多任务模型,可以同步获取行人检测框和人体关键点,提高了图像处理效率;关键点均位于检测框内部, 可以排除检测框外部的错误关键点,从而达到综合利用检测框和关键点,提高关键点标注准确度的目的;根据行人的关键点和待检测图像对应的预设物品架图像,确定行人和对应物品架的交互行为信息,能够高效地识别交互行为,并提高识别准确率;而且本方法可以实现全流程自动化处理,不需要人工干预,极大地降低人工成本。In the above interactive behavior recognition method, the image to be detected is acquired, and the key points and detection frames of pedestrians in the image to be detected are obtained by inputting the image to be detected into a preset multi-task model. This method is used for pedestrian detection and key points of the human body. The multi-task detection model can obtain the pedestrian detection frame and the key points of the human body simultaneously, which improves the efficiency of image processing; the key points are located inside the detection frame, which can eliminate the wrong key points outside the detection frame, so as to achieve comprehensive utilization of the detection frame and key points , The purpose of improving the accuracy of the key point labeling; according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected, the interaction behavior information of the pedestrian and the corresponding item rack can be determined, which can efficiently identify the interaction behavior and improve the recognition accuracy rate ; And this method can realize the full-process automatic processing without manual intervention, which greatly reduces labor costs.
In one embodiment, as shown in FIG. 3, the preset item rack image is a preset item rack mask image. The preset item rack mask image may be obtained by extracting one frame from a large amount of surveillance video and annotating the outer contour of the item rack in that frame with a polygon. Determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected includes:
Step 302: Select the wrist key points among the pedestrian's key points.
The wrist key point data includes left-wrist key point data and right-wrist key point data.
Step 304: Obtain the pedestrian's hand regions according to the wrist key points and a preset radius threshold.
Specifically, a left-hand region and a right-hand region are delimited by taking the left-wrist key point and the right-wrist key point respectively as circle centers and the preset radius threshold as the radius, thereby obtaining the image of the left-hand region and the image of the right-hand region.
Step 306: Determine whether the intersection area of the hand-region image and the preset item rack mask image is greater than a preset area threshold.
Step 308: If so, determine that the pedestrian interacts with the corresponding item rack.
Step 310: If not, determine that no interaction occurs between the pedestrian and the corresponding item rack.
In step 306 above, the hand regions include the left-hand region and the right-hand region. Specifically, when the intersection area of the image of at least one of the left-hand and right-hand regions with the preset item rack mask image is greater than the preset area threshold, it is determined that the pedestrian interacts with the corresponding item rack; otherwise, it is determined that no interaction occurs between the pedestrian and the corresponding item rack.
For example, let

$$H_R^{l} = \left\{ (x, y) \;\middle|\; (x - x_{lw})^2 + (y - y_{lw})^2 \le R^2 \right\}$$

denote the hand region with the left wrist $(x_{lw}, y_{lw})$ as center and R as radius, i.e., the left-hand region, and let

$$H_R^{r} = \left\{ (x, y) \;\middle|\; (x - x_{rw})^2 + (y - y_{rw})^2 \le R^2 \right\}$$

denote the hand region with the right wrist $(x_{rw}, y_{rw})$ as center and R as radius, i.e., the right-hand region. With a preset area threshold of 150 unit areas and $M_S$ denoting the preset item rack mask, when the intersection area $H_R \cap M_S > 150$, it is determined that the pedestrian interacts with the corresponding item rack, i.e., the pedestrian is shopping; when $H_R \cap M_S \le 150$, it is determined that no interaction occurs between the pedestrian and the corresponding item rack, i.e., the pedestrian is not shopping.
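A minimal sketch of this test follows, assuming NumPy and a binary rack mask; the function and parameter names are illustrative, and the radius value is a placeholder rather than a value taken from the patent:

```python
import numpy as np

def hand_rack_interaction(wrist_xy, rack_mask, radius=40, area_threshold=150):
    """Rasterize the hand region H_R as a disc around one wrist key point
    and count its overlap with the preset item rack mask M_S."""
    h, w = rack_mask.shape
    ys, xs = np.ogrid[:h, :w]
    # Hand region H_R: disc of radius R centered on the wrist key point
    hand_region = (xs - wrist_xy[0]) ** 2 + (ys - wrist_xy[1]) ** 2 <= radius ** 2
    # |H_R ∩ M_S|: number of pixels shared by the hand region and the rack mask
    intersection_area = np.count_nonzero(hand_region & (rack_mask > 0))
    return intersection_area > area_threshold

def pedestrian_interacts(left_wrist, right_wrist, rack_mask):
    """A pedestrian is judged to interact with the rack if either hand qualifies."""
    return (hand_rack_interaction(left_wrist, rack_mask)
            or hand_rack_interaction(right_wrist, rack_mask))
```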
In this embodiment, an interactive behavior recognition method is provided that judges interaction behavior by directly estimating the intersection area of the hand region and the item rack. It is simple to implement, highly scalable, fast to compute, and has good real-time performance. The method is typically used to recognize person-goods interaction in shopping malls and supermarkets, where the item racks are store shelves, but it can also be used to recognize person-object interaction in other places, such as libraries, where the item racks are library bookshelves.
In one embodiment, the method further includes:
Selecting any point in the pedestrian's detection box as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates.
Specifically, the center point of the detection box is selected as the anchor point; it is easy to select, and the center point indicates the pedestrian's position more accurately.
Mapping the pedestrian's first position coordinates into the world coordinate system according to a preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system.
Here, the preset coordinate mapping relationship is the coordinate mapping relationship between the coordinate system of the image to be detected and the world coordinate system. Specifically, the position of the image acquisition device in the world coordinate system is calibrated in advance; from the position information of the image acquisition device, the coordinate position in the world coordinate system of the images it acquires can be obtained, from which the coordinate mapping relationship between the coordinate system of the image to be detected and the world coordinate system is inferred.
Collecting the pedestrian's second position coordinates at each time point within a preset time period to obtain the pedestrian's route map within the preset time period.
The preset time period is the time from when the pedestrian enters the target site to when the pedestrian leaves it, and the pedestrian's route map within the preset time period is the route the pedestrian travels from entry to exit, i.e., the pedestrian's movement trace. Combined with the layout drawing of the target site, the movement traces of pedestrians entering the target site can be drawn on the layout drawing.
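The patent does not fix a particular mapping technique; one common realization, sketched below under that assumption, is a planar floor homography estimated from a few calibrated reference points (all coordinates here are hypothetical):

```python
import numpy as np
import cv2

# Hypothetical calibration: four reference points marked on the floor in the
# camera image and their known world-coordinate positions (e.g., in meters).
image_pts = np.float32([[100, 600], [1180, 620], [1050, 200], [230, 190]])
world_pts = np.float32([[0, 0], [8, 0], [8, 12], [0, 12]])
H, _ = cv2.findHomography(image_pts, world_pts)

def to_world(first_position_xy):
    """Map a pedestrian's first position coordinates (image) to second
    position coordinates (world) via the preset mapping relationship."""
    p = np.float32([[first_position_xy]])           # shape (1, 1, 2)
    return cv2.perspectiveTransform(p, H)[0, 0]     # (x_w, y_w)

# Route map: second position coordinates collected at each time point,
# here from example detection-box centers of one pedestrian (pixels).
tracked_centers = [(640, 520), (655, 505), (672, 488)]
route = [to_world(c) for c in tracked_centers]
```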
In this embodiment, an interactive behavior recognition method is provided that obtains the pedestrian's route map within a preset time period from the pedestrian's detection box and the preset coordinate mapping relationship, making it easy to record the pedestrian's movement trajectory within the target site during that period. When the method is applied to a shopping mall or supermarket, the customer's movement route within the store from entry to exit can be observed intuitively, and staff can adjust the store layout based on these data to better suit customers' shopping habits.
In one embodiment, the method further includes:
Obtaining the pedestrian's orientation information according to the pedestrian's key points.
Specifically, the shoulder key points are selected among the pedestrian's key points.
For example, the shoulder key points include the left-shoulder key point $p_{ls} = (x_{ls}, y_{ls})$ and the right-shoulder key point $p_{rs} = (x_{rs}, y_{rs})$, both expressed in the coordinate system of the image to be detected.
Taking the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point yields the shoulder vector:

$$\vec{s} = (x_{ls} - x_{rs},\ y_{ls} - y_{rs})$$

The angle between the shoulder vector and a preset unit vector is computed with the arccosine function, the preset unit vector $\vec{u} = (0, -1)$ being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected. Summing the radian value of this angle with π yields the pedestrian's orientation angle:

$$\theta = \arccos\!\left( \frac{\vec{s} \cdot \vec{u}}{\lVert \vec{s} \rVert} \right) + \pi$$

When the orientation angle is greater than or equal to π and less than 1.5π, it is determined that the pedestrian is facing one side of the image to be detected; when the orientation angle is greater than 1.5π and less than or equal to 2π, it is determined that the pedestrian is facing the other side of the image to be detected.
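A minimal sketch of this orientation computation, with illustrative function names, follows:

```python
import math

def orientation_angle(left_shoulder, right_shoulder):
    """Orientation angle from the two shoulder key points, as described above."""
    sx = left_shoulder[0] - right_shoulder[0]   # shoulder vector s
    sy = left_shoulder[1] - right_shoulder[1]
    norm = math.hypot(sx, sy)
    # Angle between s and the unit vector u = (0, -1), the negative image y-axis
    cos_angle = (sx * 0 + sy * -1) / norm
    return math.acos(cos_angle) + math.pi       # orientation angle in [π, 2π]

def facing_side(theta):
    if math.pi <= theta < 1.5 * math.pi:
        return "one side of the image"
    if 1.5 * math.pi < theta <= 2 * math.pi:
        return "other side of the image"
    return "boundary case"                      # exactly 1.5π is left undecided
```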
Obtaining the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image. Specifically, from the pedestrian's orientation within the image to be detected and the preset item rack image corresponding to that image, the item rack region the pedestrian is facing can be obtained.
In this embodiment, an interactive behavior recognition method is provided that computes the pedestrian's orientation from the shoulder key point data, which makes the orientation result more robust. The shelf region the customer is paying attention to can thereby be determined, providing a reference for product placement in stores and supermarkets.
In one embodiment, acquiring the image to be detected includes:
Acquiring surveillance video of the target site.
Specifically, position calibration is performed on the image acquisition devices already installed and in use in the store, a corresponding shelf mask image is configured for each device, and the surveillance video captured by the devices is obtained. The image acquisition devices are generally cameras.
Filtering out images containing pedestrians from the surveillance video as the images to be detected.
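As one possible, non-authoritative realization of this filtering step, the sketch below samples the video and keeps frames in which a person detector fires; OpenCV's stock HOG person detector stands in for whatever detector a deployment would actually use:

```python
import cv2

def frames_with_pedestrians(video_path, stride=10):
    """Yield sampled surveillance frames that contain at least one pedestrian."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            boxes, _ = hog.detectMultiScale(frame)
            if len(boxes) > 0:
                yield frame          # one image to be detected
        idx += 1
    cap.release()
```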
In this embodiment, an interactive behavior recognition method is provided that directly uses the monitoring equipment already present at the target site, such as the cameras of a shopping mall or supermarket. No modification of the site is required, deployment costs are low, and the method is easy to roll out.
In one embodiment, the method further includes:
Acquiring sample images. Specifically, surveillance video of shopping malls and supermarkets is obtained, and a large number of images containing pedestrians are filtered out of the video as sample images.
Annotating the pedestrians in the sample images with key points and detection boxes to obtain annotated image data. Specifically, the pedestrian detection boxes in the sample images are annotated, together with key point positions such as the pedestrians' eyes, noses, ears, shoulders, elbows, wrists, hips, knees, and ankles, finally yielding the annotated image data.
Inputting the annotated image data into a neural network model for training to obtain the multi-task model. Preferably, the neural network model adopts a ResNet-101+FPN network model, which is a one-stage bottom-up multi-task network model: compared with similar multi-stage algorithms it saves processing time, and compared with top-down algorithms its processing time does not vary with the number of people in the image.
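The patent names only the ResNet-101+FPN backbone and the one-stage bottom-up design; the head layout in the sketch below (center, box, and key-point heatmaps) is our assumption, and torchvision 0.13 or later is assumed for the backbone helper:

```python
import torch
import torch.nn as nn
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

class MultiTaskNet(nn.Module):
    """Minimal sketch of a one-stage bottom-up multi-task model on a
    ResNet-101+FPN backbone; the head design is illustrative only."""
    def __init__(self, num_keypoints=17):
        super().__init__()
        self.backbone = resnet_fpn_backbone(backbone_name="resnet101", weights=None)
        c = self.backbone.out_channels          # 256 FPN channels
        self.center_head = nn.Conv2d(c, 1, 1)   # person-center heatmap
        self.box_head = nn.Conv2d(c, 4, 1)      # box offsets per center
        self.kpt_head = nn.Conv2d(c, num_keypoints, 1)  # key-point heatmaps

    def forward(self, x):
        # Shared backbone features feed all heads, which is what cuts computation
        p = self.backbone(x)["0"]               # highest-resolution FPN level
        return self.center_head(p), self.box_head(p), self.kpt_head(p)

out = MultiTaskNet()(torch.randn(1, 3, 512, 512))
```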
In this embodiment, an interactive behavior recognition method is provided that processes the images to be detected by building and training a multi-task model. Training and optimization of the model are done in the background without affecting the operation of shopping malls, supermarkets, libraries, or other places. The model generalizes well and can be deployed conveniently and quickly. Features can be shared among the different tasks of the multi-task model, which reduces the amount of computation, lowers hardware resource usage, shortens the per-frame processing time, and enables parallel processing of multiple camera streams.
In one embodiment, as shown in FIG. 4, the method includes the following steps:
Step 402: Acquire surveillance video of the target site.
Step 404: Filter out images containing pedestrians from the surveillance video as images to be detected.
Step 406: Input the image to be detected into the preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, with all key points located inside the detection boxes.
Step 408: Determine the interaction behavior information between each pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
Step 410: Obtain the pedestrian's route map within the preset time period according to the pedestrian's detection box and the preset coordinate mapping relationship.
Step 412: Obtain the pedestrian's orientation information according to the pedestrian's key points.
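For illustration, a hedged end-to-end sketch of steps 402 to 412 follows; it reuses the illustrative helpers from the earlier sketches (frames_with_pedestrians, pedestrian_interacts, to_world, orientation_angle) and assumes a model callable that returns the P_kpt and P_box arrays described above, with COCO key-point indexing:

```python
def recognize_interactions(video_path, rack_mask, model):
    """Hedged pipeline sketch; names and key-point indices are assumptions
    (COCO order: 5/6 = left/right shoulder, 9/10 = left/right wrist)."""
    records = []
    for frame in frames_with_pedestrians(video_path):        # steps 402-404
        P_kpt, P_box = model(frame)                          # step 406
        for kpts, box in zip(P_kpt, P_box):
            shopping = pedestrian_interacts(kpts[9], kpts[10], rack_mask)  # step 408
            center = ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
            world_xy = to_world(center)                      # step 410
            theta = orientation_angle(kpts[5], kpts[6])      # step 412
            records.append({"pos": world_xy, "shopping": shopping, "theta": theta})
    return records
```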
It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, an interactive behavior recognition apparatus is provided, including an acquisition module 502, a detection module 504, and a recognition module 506, wherein:
the acquisition module 502 is configured to acquire the image to be detected;
the detection module 504 is configured to input the image to be detected into the preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, the key points all being located inside the detection boxes, and the multi-task model being used for pedestrian detection and human key point detection;
the recognition module 506 is configured to determine the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
In one embodiment, the preset item rack image is a preset item rack mask image, and the recognition module 506 includes:
a first key point selection unit, configured to select the wrist key points among the pedestrian's key points;
a hand region unit, configured to obtain the pedestrian's hand regions according to the wrist key points and the preset radius threshold;
an interaction determination unit, configured to determine that the pedestrian interacts with the corresponding item rack when the intersection area of the hand-region image and the preset item rack mask image is greater than the preset area threshold, and to determine that no interaction occurs between the pedestrian and the corresponding item rack when the intersection area is less than or equal to the area threshold.
In one embodiment, the apparatus further includes:
a first position coordinate module, configured to select any point in the pedestrian's detection box as an anchor point and to set the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates;
a second position coordinate module, configured to map the pedestrian's first position coordinates into the world coordinate system according to the preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system;
a route map module, configured to collect the pedestrian's second position coordinates at each time point within the preset time period to obtain the pedestrian's route map within the preset time period.
In one embodiment, the apparatus further includes:
an orientation information module, configured to obtain the pedestrian's orientation information according to the pedestrian's key points;
an orientation region module, configured to obtain the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image.
In one embodiment, the orientation information module includes:
a second key point selection unit, configured to select the shoulder key points among the pedestrian's key points, the shoulder key points including the left-shoulder key point and the right-shoulder key point;
an orientation angle calculation unit, configured to take the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point to obtain the shoulder vector, to compute the angle between the shoulder vector and the preset unit vector with the arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected, and to sum the radian value of the angle with π to obtain the pedestrian's orientation angle;
an orientation determination unit, configured to determine that the pedestrian is facing one side of the image to be detected when the orientation angle is greater than or equal to π and less than 1.5π, and that the pedestrian is facing the other side of the image to be detected when the orientation angle is greater than 1.5π and less than or equal to 2π.
In one embodiment, the acquisition module 502 includes:
a video acquisition unit, configured to acquire surveillance video of the target site;
an image acquisition unit, configured to filter out images containing pedestrians from the surveillance video as the images to be detected.
In one embodiment, the apparatus further includes:
a sample acquisition module, configured to acquire sample images;
a sample data module, configured to annotate the pedestrians in the sample images with key points and detection boxes to obtain annotated image data;
a model training module, configured to input the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
For specific limitations on the interactive behavior recognition apparatus, refer to the limitations on the interactive behavior recognition method above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements an interactive behavior recognition method.
Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor. When executing the computer program, the processor implements the following steps: acquiring an image to be detected; inputting the image to be detected into a preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, the key points all being located inside the detection boxes, and the multi-task model being used for pedestrian detection and human key point detection; and determining the interaction behavior information between a pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
In one embodiment, the processor further implements the following steps when executing the computer program: the preset item rack image is a preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key points among the pedestrian's key points; obtaining the pedestrian's hand regions according to the wrist key points and the preset radius threshold; when the intersection area of the hand-region image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersection area is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
In one embodiment, the processor further implements the following steps when executing the computer program: selecting any point in the pedestrian's detection box as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates; mapping the pedestrian's first position coordinates into the world coordinate system according to the preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system; and collecting the pedestrian's second position coordinates at each time point within the preset time period to obtain the pedestrian's route map within the preset time period.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the pedestrian's key points; and obtaining the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the pedestrian's orientation information according to the pedestrian's key points includes: selecting the shoulder key points among the pedestrian's key points, the shoulder key points including the left-shoulder key point and the right-shoulder key point; taking the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point to obtain the shoulder vector; computing the angle between the shoulder vector and the preset unit vector with the arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the pedestrian's orientation angle; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
In one embodiment, the processor further implements the following steps when executing the computer program: acquiring the image to be detected includes: acquiring surveillance video of the target site; and filtering out images containing pedestrians from the surveillance video as the images to be detected.
In one embodiment, the processor further implements the following steps when executing the computer program: acquiring sample images; annotating the pedestrians in the sample images with key points and detection boxes to obtain annotated image data; and inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
In one embodiment, a computer-readable storage medium is provided on which a computer program is stored. When executed by a processor, the computer program implements the following steps: acquiring an image to be detected; inputting the image to be detected into a preset multi-task model to obtain the key points and detection boxes of the pedestrians in the image to be detected, the key points all being located inside the detection boxes, and the multi-task model being used for pedestrian detection and human key point detection; and determining the interaction behavior information between a pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected.
In one embodiment, the computer program further implements the following steps when executed by the processor: the preset item rack image is a preset item rack mask image, and the step of determining the interaction behavior information between the pedestrian and the corresponding item rack according to the pedestrian's key points and the preset item rack image corresponding to the image to be detected includes: selecting the wrist key points among the pedestrian's key points; obtaining the pedestrian's hand regions according to the wrist key points and the preset radius threshold; when the intersection area of the hand-region image and the preset item rack mask image is greater than the preset area threshold, determining that the pedestrian interacts with the corresponding item rack; and when the intersection area is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
In one embodiment, the computer program further implements the following steps when executed by the processor: selecting any point in the pedestrian's detection box as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as the pedestrian's first position coordinates; mapping the pedestrian's first position coordinates into the world coordinate system according to the preset coordinate mapping relationship to obtain the pedestrian's second position coordinates, the second position coordinates being the pedestrian's position coordinates in the world coordinate system; and collecting the pedestrian's second position coordinates at each time point within the preset time period to obtain the pedestrian's route map within the preset time period.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining the pedestrian's orientation information according to the pedestrian's key points; and obtaining the item rack region the pedestrian is facing according to the pedestrian's orientation information and the preset item rack image.
In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining the pedestrian's orientation information according to the pedestrian's key points includes: selecting the shoulder key points among the pedestrian's key points, the shoulder key points including the left-shoulder key point and the right-shoulder key point; taking the difference between the coordinates of the left-shoulder key point and those of the right-shoulder key point to obtain the shoulder vector; computing the angle between the shoulder vector and the preset unit vector with the arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected; summing the radian value of the angle with π to obtain the pedestrian's orientation angle; when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected; and when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
In one embodiment, the computer program further implements the following steps when executed by the processor: acquiring the image to be detected includes: acquiring surveillance video of the target site; and filtering out images containing pedestrians from the surveillance video as the images to be detected.
In one embodiment, the computer program further implements the following steps when executed by the processor: acquiring sample images; annotating the pedestrians in the sample images with key points and detection boxes to obtain annotated image data; and inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments only express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

  1. An interactive behavior recognition method, wherein the method comprises:
    acquiring an image to be detected;
    inputting the image to be detected into a preset multi-task model to obtain key points and detection boxes of pedestrians in the image to be detected, wherein the key points are all located inside the detection boxes, and the multi-task model is used for pedestrian detection and human key point detection;
    determining interaction behavior information between a pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
  2. The method according to claim 1, wherein the preset item rack image is a preset item rack mask image, and determining the interaction behavior information between the pedestrian and the corresponding item rack according to the key points of the pedestrian and the preset item rack image corresponding to the image to be detected comprises:
    selecting wrist key points among the key points of the pedestrian;
    obtaining hand regions of the pedestrian according to the wrist key points and a preset radius threshold;
    when the intersection area of an image of a hand region and the preset item rack mask image is greater than a preset area threshold, determining that the pedestrian interacts with the corresponding item rack;
    when the intersection area of the image of the hand region and the preset item rack mask image is less than or equal to the area threshold, determining that no interaction occurs between the pedestrian and the corresponding item rack.
  3. The method according to claim 1, wherein the method further comprises:
    selecting any point in the detection box of the pedestrian as an anchor point, and setting the position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian;
    mapping the first position coordinates of the pedestrian into a world coordinate system according to a preset coordinate mapping relationship to obtain second position coordinates of the pedestrian, the second position coordinates being the position coordinates of the pedestrian in the world coordinate system;
    collecting the second position coordinates of the pedestrian at each time point within a preset time period to obtain a route map of the pedestrian within the preset time period.
  4. The method according to claim 1, wherein the method further comprises:
    obtaining orientation information of the pedestrian according to the key points of the pedestrian;
    obtaining an item rack region the pedestrian is facing according to the orientation information of the pedestrian and the preset item rack image.
  5. The method according to claim 4, wherein obtaining the orientation information of the pedestrian according to the key points of the pedestrian comprises:
    selecting shoulder key points among the key points of the pedestrian, the shoulder key points comprising a left-shoulder key point and a right-shoulder key point;
    taking the difference between the coordinates of the left-shoulder key point and the coordinates of the right-shoulder key point to obtain a shoulder vector;
    computing the angle between the shoulder vector and a preset unit vector with an arccosine function, the preset unit vector being the unit vector in the negative y-axis direction of the coordinate system of the image to be detected;
    summing the radian value of the angle with π to obtain an orientation angle of the pedestrian;
    when the orientation angle is greater than or equal to π and less than 1.5π, determining that the pedestrian is facing one side of the image to be detected;
    when the orientation angle is greater than 1.5π and less than or equal to 2π, determining that the pedestrian is facing the other side of the image to be detected.
  6. The method according to any one of claims 1 to 5, wherein acquiring the image to be detected comprises:
    acquiring surveillance video of a target site;
    filtering out an image containing pedestrians from the surveillance video as the image to be detected.
  7. The method according to any one of claims 1 to 5, wherein the method further comprises:
    acquiring sample images;
    annotating pedestrians in the sample images with key points and detection boxes to obtain annotated image data;
    inputting the annotated image data into a neural network model for training to obtain the multi-task model; preferably, the neural network model adopts a ResNet-101+FPN network model.
  8. An interactive behavior recognition apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire an image to be detected;
    a detection module, configured to input the image to be detected into a preset multi-task model to obtain key points and detection boxes of pedestrians in the image to be detected, wherein the key points are all located inside the detection boxes, and the multi-task model is used for pedestrian detection and human key point detection;
    a recognition module, configured to determine interaction behavior information between a pedestrian and a corresponding item rack according to the key points of the pedestrian and a preset item rack image corresponding to the image to be detected.
  9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
PCT/CN2020/097002 2019-11-12 2020-06-19 Interactive behavior identification method and apparatus, computer device and storage medium WO2021093329A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3160731A CA3160731A1 (en) 2019-11-12 2020-06-19 Interactive behavior recognizing method, device, computer equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911100457.9 2019-11-12
CN201911100457.9A CN110991261A (en) 2019-11-12 2019-11-12 Interactive behavior recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021093329A1 (en) 2021-05-20

Family

ID=70083879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097002 WO2021093329A1 (en) 2019-11-12 2020-06-19 Interactive behavior identification method and apparatus, computer device and storage medium

Country Status (3)

Country Link
CN (1) CN110991261A (en)
CA (1) CA3160731A1 (en)
WO (1) WO2021093329A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991261A (en) * 2019-11-12 2020-04-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium
CN113642361B (en) * 2020-05-11 2024-01-23 杭州萤石软件有限公司 Fall behavior detection method and equipment
CN112307871A (en) * 2020-05-29 2021-02-02 北京沃东天骏信息技术有限公司 Information acquisition method and device, attention detection method, device and system
CN111611970B (en) * 2020-06-01 2023-08-22 城云科技(中国)有限公司 Urban management monitoring video-based random garbage throwing behavior detection method
CN111798341A (en) * 2020-06-30 2020-10-20 深圳市幸福人居建筑科技有限公司 Green property management method, system computer equipment and storage medium thereof
CN111783724B (en) * 2020-07-14 2024-03-26 上海依图网络科技有限公司 Target object identification method and device
CN112084984A (en) * 2020-09-15 2020-12-15 山东鲁能软件技术有限公司 Escalator action detection method based on improved Mask RCNN
CN112016528B (en) * 2020-10-20 2021-07-20 成都睿沿科技有限公司 Behavior recognition method and device, electronic equipment and readable storage medium
CN113377192B (en) * 2021-05-20 2023-06-20 广州紫为云科技有限公司 Somatosensory game tracking method and device based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105245828A (en) * 2015-09-02 2016-01-13 北京旷视科技有限公司 Item analysis method and equipment
CN106709422A (en) * 2016-11-16 2017-05-24 南京亿猫信息技术有限公司 Supermarket shopping cart hand identification method and identification system thereof
CN109934075A (en) * 2017-12-19 2019-06-25 杭州海康威视数字技术股份有限公司 Accident detection method, apparatus, system and electronic equipment
US20190266405A1 (en) * 2016-10-26 2019-08-29 Htc Corporation Virtual reality interaction method, apparatus and system
CN110991261A (en) * 2019-11-12 2020-04-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993067B (en) * 2019-03-07 2022-01-28 北京旷视科技有限公司 Face key point extraction method and device, computer equipment and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758239A (en) * 2022-04-22 2022-07-15 安徽工业大学科技园有限公司 Method and system for monitoring articles flying away from predetermined travel route based on machine vision
CN116862980A (en) * 2023-06-12 2023-10-10 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge
CN116862980B (en) * 2023-06-12 2024-01-23 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge

Also Published As

Publication number Publication date
CA3160731A1 (en) 2021-05-20
CN110991261A (en) 2020-04-10


Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20887643; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 3160731; Country of ref document: CA)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20887643; Country of ref document: EP; Kind code of ref document: A1)