WO2021047232A1 - Interactive behavior recognition method and apparatus, computer device, and storage medium - Google Patents

Interactive behavior recognition method and apparatus, computer device, and storage medium Download PDF

Info

Publication number
WO2021047232A1
WO2021047232A1 (PCT/CN2020/096994, CN2020096994W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
human body
preset
information
recognition
Prior art date
Application number
PCT/CN2020/096994
Other languages
English (en)
French (fr)
Inventor
庄喜阳
余代伟
孙皓
杨现
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司, 苏宁云计算有限公司 filed Critical 苏宁易购集团股份有限公司
Priority to CA3154025A priority Critical patent/CA3154025A1/en
Publication of WO2021047232A1 publication Critical patent/WO2021047232A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Definitions

  • This application relates to an interactive behavior recognition method, device, computer equipment and storage medium.
  • Existing human-goods interaction recognition methods usually rely on template and rule matching. Defining the templates and formulating the rules requires a great deal of manual effort, such methods are often only suitable for recognizing common human postures, their recognition accuracy is poor, and their portability is weak: they can only be applied to human-goods interaction in specific scenarios.
  • An interactive behavior identification method includes:
  • the performing human posture detection on the image to be detected by using a preset detection model to obtain human posture information and hand position information includes:
  • the human body posture detection is performed on the human body image through a preset detection model to obtain the human body posture information and the hand position information.
  • the method further includes:
  • a second interactive behavior recognition result is obtained, and the second interactive behavior recognition result is a human-goods interaction behavior recognition result.
  • the acquiring the image to be detected includes:
  • the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
  • the method further includes:
  • the first training data set is input into the HRNet model for training to obtain the detection model.
  • the method further includes:
  • the second training data set is input into a convolutional neural network for training to obtain the preset classification and recognition model, and the convolutional neural network is a yolov3-tiny network or a vgg16 network.
  • the acquiring sample image data includes:
  • the sample image data with human-goods interaction behavior is filtered from the collected image data.
  • the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data .
  • An interactive behavior recognition device includes:
  • the first acquisition module is used to acquire the image to be detected
  • the first detection module is configured to perform human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, and the detection model is used to perform human posture detection;
  • the tracking module is used to track the human body posture according to the human body posture information to obtain human body motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain a hand region image ;
  • the second detection module is configured to perform item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
  • the first interactive behavior recognition module is configured to obtain the first interactive behavior recognition result according to the human body motion trajectory information and the article recognition result.
  • a computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor, and the processor implements the following steps when the processor executes the computer program:
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented:
  • The above interactive behavior recognition method, apparatus, computer device, and storage medium use a detection model and a classification and recognition model to perform interactive behavior recognition on the image to be detected. On the basis of the existing models, only a small amount of data needs to be collected before deployment in different stores, so portability is strong and deployment cost is low; moreover, the detection model can recognize interactive behaviors more flexibly and accurately, which improves recognition accuracy.
  • Figure 1 is an application environment diagram of an interactive behavior recognition method in an embodiment
  • Figure 2 is a schematic flowchart of an interactive behavior identification method in an embodiment
  • FIG. 3 is a schematic flowchart of an interactive behavior recognition method in another embodiment
  • FIG. 4 is a schematic flow chart of a training step of a detection model in an embodiment
  • FIG. 5 is a schematic flowchart of a training step of a classification and recognition model in an embodiment
  • Figure 6 is a structural block diagram of an interactive behavior recognition device in an embodiment
  • Fig. 7 is an internal structure diagram of a computer device in an embodiment.
  • the interactive behavior identification method provided in this application can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through a network.
  • the terminal 102 may be, but is not limited to, various image acquisition devices. More specifically, the terminal 102 may use one or more depth cameras whose shooting angle is perpendicular to the ground, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
  • an interactive behavior recognition method is provided.
  • the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • Step 202 Obtain an image to be detected
  • the image to be detected is an image of interaction behavior between a person and an object to be detected.
  • step 202 includes the following: the server acquires the image to be detected collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular or close to perpendicular to the ground, and the image to be detected is RGBD data.
  • In other words, the image to be detected is the RGBD data collected by the image acquisition device in an overhead-view scene.
  • the image acquisition device can use a depth camera set above the shelf.
  • the first shooting angle of view need not be exactly perpendicular to the ground; where the installation environment allows, any overhead angle close to vertical may be used, avoiding shooting blind spots as far as possible.
  • This technical solution uses an overhead-view depth camera to detect human-goods interaction. Compared with the traditional way of installing a camera at an angle to the ground, it effectively avoids the occlusion of people and shelves caused by an oblique viewing angle, as well as the increased difficulty of hand tracking; in practical applications, capturing images from the overhead view also makes it easier to recognize the cross-pickup behavior of different people.
  • Step 204 Perform human posture detection on the image to be detected using a preset detection model to obtain human posture information and hand position information, and the detection model is used for human posture detection;
  • the detection model is a human posture detection model, which can be used to detect key points of human bones.
  • the server inputs the human body image into the detection model; human posture detection is performed on the human body image in the detection model; and the human posture information and hand position information output by the detection model are obtained. The human posture detection may use a common skeleton-line detection method,
  • in which case the obtained human posture information is a human skeleton keypoint image, and the hand position information is the specific position of the hand in that skeleton keypoint image.
  • Step 206 tracking the human body posture according to the human body posture information to obtain the human body motion trajectory information; and according to the hand position information, performing target tracking on the hand position to obtain an image of the hand area;
  • a target tracking algorithm is used, for example the Camshift algorithm, which can adapt to changes in the size and shape of a moving target, to track the motion trajectories of the human body and the hand respectively, obtaining the human body motion trajectory information; during tracking, the hand position is expanded to obtain the hand region image.
  • Step 208 Perform item recognition on the image of the hand area through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
  • the classification recognition model is an item recognition model, and an item recognition model trained by deep learning can be used.
  • the hand region image is input into the classification and recognition model, which detects whether there is an item in the hand region.
  • When there is an item, the classification and recognition model recognizes it and outputs the item recognition result; on the other hand, the classification and recognition model can also judge the skin color in the hand region image and promptly issue an early warning when clothing or other items are deliberately used to cover the hands, so as to reduce loss of goods.
  • Step 210 Obtain a first interactive behavior recognition result according to the human body motion trajectory information and the item recognition result.
  • the first interaction behavior recognition result is the interaction behavior recognition result between people and objects.
  • the aforementioned human body motion trajectory information can be used to determine a person's actions, such as reaching, leaning over, bending, and squatting; then, according to whether the person's hand holds an item and, when it does,
  • the item recognition result obtained by recognizing that item, it can be determined that the human body is picking up or putting down the item, that is, the recognition result of the interaction between the person and the item is obtained by analysis.
  • A detection model and a classification and recognition model are used to perform interactive behavior recognition on the image to be detected. After model training and algorithm tuning, the interaction behavior between people and items can be recognized automatically, and the recognition result is more accurate; on the basis of the current detection model and classification and recognition model, only a small amount of data needs to be collected before deployment in different scenarios, so portability is strong and deployment cost is low.
  • the method includes the following steps:
  • Step 302 Obtain an image to be detected
  • Step 304 Perform preset processing on the image to be detected to obtain a human body image in the image to be detected;
  • step 304 is a process of extracting the human body image that needs to be used in the subsequent steps from the image to be detected, while shielding the unnecessary background image.
  • the foregoing preset processing may adopt background modeling, that is, perform background modeling based on Gaussian mixture on the image to be detected to obtain a background model;
  • according to the image to be detected and the background model, the human body image in the image to be detected is obtained.
  • Step 306 Perform human posture detection on the human body image by using a preset detection model to obtain human posture information and hand position information;
  • Step 308 tracking the human body posture according to the human body posture information to obtain the human body motion trajectory information, and performing target tracking on the hand position according to the hand position information to obtain an image of the hand area;
  • Step 310 Perform item recognition on the image of the hand area through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
  • Step 312 Obtain a first interactive behavior recognition result according to the human body motion track information and the item recognition result.
  • step 304 masks out unnecessary background images by preprocessing the image to be detected, and only retains the human body image to be used later, thereby reducing the amount of data to be processed in the next step and improving the data processing efficiency.
  • the method further includes:
  • the human body position information may refer to the position information of the human body in the three-dimensional world coordinate system.
  • the acquisition position information of the image to be detected in the three-dimensional world coordinate system is obtained; according to the position of the human body image within the image to be detected and the acquisition position information, a three-dimensional world coordinate transformation is performed to obtain the position information of the human body in the three-dimensional world coordinate system.
  • the second interactive behavior recognition result is obtained, and the second interactive behavior recognition result is the human-goods interaction behavior recognition result.
  • the shelf information includes shelf position information and item information in the shelf, and the shelf position information is the three-dimensional world coordinate position of the shelf.
  • the shelf information corresponding to the position of the human body is obtained according to the position information of the human body and the preset shelf information; one interaction between the human body and the shelf is confirmed by tracking the three-dimensional world coordinate positions of the human body and the shelf, and then, during tracking, by identifying whether there are goods associated with the shelf in the hand region, the occurrence of one effective human-goods interaction behavior is further confirmed.
  • An effective human-goods interaction behavior here may be a customer completing one pickup action from the shelf.
  • This technical solution converts the customer's position into the world coordinate system through the three-dimensional world coordinate transformation and associates it with the shelf, so it can identify whether the customer has performed an effective human-goods interaction behavior; on the other hand, on the basis of recognizing human-goods interaction and combined with the item recognition result, provided the shelf stock is known, monitoring the number of effective interactions between people and the shelf makes it possible to indirectly take stock of the shelf's current inventory.
  • When goods run low, the server can promptly remind the staff to restock, greatly reducing the labor cost of manual inventory checking.
  • the method further includes a detection model training step, which specifically includes the following steps:
  • Step 402 Obtain sample image data
  • the preset second shooting angle of view may be a top-down angle of view perpendicular to the ground or nearly perpendicular to the ground, and the sample image data is RGBD data.
  • Step 404 Perform key point annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data;
  • the sample image data needs to basically cover different human-goods interaction behaviors in the actual scene.
  • the sample data can also be augmented to increase the amount of sample image data and to raise the proportion of training samples with large postures during interaction, for example by increasing the proportion of postures such as leaning over, bending, and squatting in human-goods interaction, which improves the detection accuracy of the detection model.
  • a part of the first annotated image data can be used as a training data set, and the remaining part can be used as a verification data set.
  • Step 406 Perform image enhancement processing on the first annotated image data to obtain a first training data set; in a specific implementation process, perform image enhancement processing on the training data set in the first annotated image data to obtain the first training data set.
  • the image enhancement processing may include any one or more of the following image transformation methods, for example: image normalization, random cropping, image scaling, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, and adding hue interference blocks to the image.
  • Step 408 Input the first training data set into the HRNet model for training to obtain a detection model.
  • different network architectures of the HRNet model can be used to train the human posture detection model, and each model obtained by training with different network architectures is verified and evaluated through the verification data set, and the model with the best effect is selected and set as the detection model.
  • the method further includes a classification and recognition model training step, which specifically includes the following steps:
  • Step 502 Obtain sample image data
  • Step 504 Label the hand area in the sample image data and label the items located in the hand area to obtain the second label image data;
  • Step 506 Perform image enhancement processing on the second annotated image data to obtain a second training data set
  • the image enhancement processing may include any one or more of the following image transformation methods, for example: image normalization, random cropping, image scaling, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, and adding hue interference blocks to the image.
  • Step 508 Input the second training data set into the yolov3-tiny network or the vgg16 network for training to obtain a preset classification and recognition model.
  • In this technical solution, RGBD data is collected through a depth camera whose line of sight is vertical or close to vertical to the ground; RGBD data containing human-goods interaction behavior is then manually collected and organized as training samples, i.e., sample image data, and deep learning is used for training. The trained model is used to recognize different postures of the human body, so the detection model can recognize interactive behaviors more flexibly and accurately, and has strong portability.
  • An interactive behavior recognition apparatus is provided, including: a first acquisition module 602, a first detection module 604, a tracking module 606, a second detection module 608, and a first interactive behavior recognition module 610, wherein:
  • the first acquisition module 602 is used to acquire the image to be detected
  • the first detection module 604 is configured to perform human posture detection on the image to be detected using a preset detection model to obtain human posture information and hand position information, and the detection model is used to perform human posture detection;
  • the tracking module 606 is used to track the human posture according to the human posture information to obtain human motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain a hand region image;
  • the second detection module 608 is configured to perform item recognition on the image of the hand area through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
  • the first interactive behavior recognition module 610 is configured to obtain the first interactive behavior recognition result according to the human body motion track information and the item recognition result.
  • the first detection module 604 is also used to perform preset processing on the image to be detected to obtain the human body image in the image to be detected; to perform human body posture detection on the human body image through the preset detection model to obtain human body posture information And hand position information.
  • the device further includes:
  • the human body position module is used to obtain human body position information according to the image to be detected
  • the second interactive behavior recognition module is used to obtain the second interactive behavior recognition result according to the human body movement track information, the item recognition result, the human body position information and the preset shelf information, and the second interactive behavior recognition result is the human-goods interaction behavior recognition result.
  • the first acquisition module 602 is also used to acquire the image to be detected collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
  • the device further includes:
  • the second acquisition module is used to acquire sample image data
  • the first labeling module is used to label the key points and hand positions of the human body image in the sample image data to obtain the first labelled image data;
  • the first enhancement module is configured to perform image enhancement processing on the first annotated image data to obtain a first training data set
  • the first training module is used to input the first training data set into the HRNet model for training to obtain the detection model.
  • the device further includes:
  • the second labeling module is used to label the hand area in the sample image data and label the items located in the hand area to obtain the second label image data;
  • the second enhancement module is configured to perform image enhancement processing on the second annotated image data to obtain a second training data set
  • the second training module is used to input the second training data set into the yolov3-tiny network or the vgg16 network for training to obtain a preset classification and recognition model.
  • the second acquisition module is also used to acquire image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range, and to filter the collected image data to obtain sample image data with human-goods interaction behavior;
  • preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
  • Each module in the above-mentioned interactive behavior recognition device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize an interactive behavior identification method.
  • FIG. 7 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • a computer device including a memory, a processor, and a computer program stored in the memory and running on the processor.
  • the processor executes the computer program, the following steps are implemented: acquiring an image to be detected;
  • a preset detection model performs human posture detection on the image to be detected to obtain human posture information and hand position information, the detection model being used for human posture detection; according to the human posture information, the human posture is tracked to obtain human motion trajectory information, and according to the hand position information, the hand position is tracked as a target to obtain the hand region image; item recognition is performed on the hand region image through the preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and according to the human motion trajectory information and the item recognition result, a first interactive behavior recognition result is obtained.
  • the processor further implements the following steps when executing the computer program: performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, including: performing preset processing on the image to be detected , The human body image in the image to be detected is obtained; the human body posture detection is performed on the human body image through the preset detection model, and the human body posture information and the hand position information are obtained.
  • the processor further implements the following steps when executing the computer program: obtaining human body position information according to the image to be detected; obtaining a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information, and the preset shelf information, the second interactive behavior recognition result being the human-goods interaction behavior recognition result.
  • the processor further implements the following steps when executing the computer program: acquiring the image to be inspected includes: acquiring the image to be inspected collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first The shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
  • the processor further implements the following steps when executing the computer program: acquiring sample image data; performing key point annotation and hand position annotation on the human body image in the sample image data to obtain the first annotated image data;
  • the annotated image data is subjected to image enhancement processing to obtain the first training data set;
  • the first training data set is input into the HRNet model for training, and the detection model is obtained.
  • the processor further implements the following steps when executing the computer program: labeling the hand area in the sample image data and labeling the items located in the hand area to obtain the second labeling image data;
  • the second annotated image data is subjected to image enhancement processing to obtain a second training data set;
  • the second training data set is input into a convolutional neural network for training, and a preset classification recognition model is obtained.
  • the processor further implements the following steps when executing the computer program: acquiring sample image data includes: acquiring image data acquired by the image acquisition device at a preset second shooting angle within a preset time range; The image data is filtered to obtain sample image data with human-goods interaction behavior.
  • the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
  • a computer-readable storage medium is provided, and a computer program is stored thereon.
  • when the computer program is executed by a processor, the following steps are implemented: acquiring an image to be detected; performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; tracking the human posture according to the human posture information to obtain human motion trajectory information, and performing target tracking on the hand position according to the hand position information to obtain the hand region image; performing item recognition on the hand region image through the preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
  • the following steps are further implemented: performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, including: presetting the image to be detected Through processing, the human body image in the image to be detected is obtained; the human body posture detection is performed on the human body image through the preset detection model, and the human body posture information and the hand position information are obtained.
  • the following steps are also implemented: obtaining human body position information according to the image to be detected; obtaining a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information, and the preset shelf information, the second interactive behavior recognition result being the human-goods interaction behavior recognition result.
  • acquiring the image to be detected includes: acquiring the image to be detected collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first A shooting angle of view is the overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
  • the following steps are also implemented: obtaining sample image data; performing key point annotation and hand position annotation on the human body image in the sample image data to obtain the first annotated image data; Perform image enhancement processing on annotated image data to obtain a first training data set; input the first training data set into an HRNet model for training to obtain a detection model.
  • the following steps are further implemented: labeling the hand area in the sample image data and labeling the items located in the hand area to obtain the second labeling image data; Perform image enhancement processing on the second labeled image data to obtain a second training data set; input the second training data set into a convolutional neural network for training, and obtain a preset classification recognition model.
  • acquiring sample image data includes: acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range; and filtering the collected image data to obtain sample image data with human-goods interaction behavior.
  • the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an interactive behavior recognition method and apparatus, a computer device, and a storage medium. The method includes: acquiring an image to be detected; performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; tracking the human posture according to the human posture information to obtain human motion trajectory information; performing target tracking on the hand position according to the hand position information to obtain a hand region image; performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result. This method can improve the recognition accuracy of interactive behaviors and has better portability.

Description

Interactive behavior recognition method and apparatus, computer device, and storage medium

Technical Field

The present application relates to an interactive behavior recognition method and apparatus, a computer device, and a storage medium.

Background Art

With the development of technology, unmanned retail has increasingly been favored by major retailers. This technology uses sensors, image analysis, computer vision, and other intelligent recognition technologies to achieve unmanned checkout. Among them, using image recognition technology to perceive the relative position between a person and a shelf and the movement of goods on the shelf, and thereby recognize human-goods interaction behavior, is an important prerequisite for ensuring that customers' purchases are settled correctly.

However, existing human-goods interaction recognition methods usually rely on template and rule matching. Defining the templates and formulating the rules requires a great deal of manual labor, and such methods are often only suitable for recognizing common human postures, with poor recognition accuracy and weak portability: they can only be applied to human-goods interaction in specific scenarios.

Summary of the Invention

Based on this, in view of the above technical problems, it is necessary to provide an interactive behavior recognition method and apparatus, a computer device, and a storage medium with higher recognition accuracy and better portability.
An interactive behavior recognition method, the method comprising:

acquiring an image to be detected;

performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;

tracking the human posture according to the human posture information to obtain human motion trajectory information; performing target tracking on the hand position according to the hand position information to obtain a hand region image;

performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;

obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

In one embodiment, the performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information comprises:

performing preset processing on the image to be detected to obtain a human body image in the image to be detected;

performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.

In one embodiment, the method further comprises:

acquiring human body position information according to the image to be detected;

obtaining a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information, and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.

In one embodiment, the acquiring an image to be detected comprises:

acquiring the image to be detected collected by an image acquisition device at a preset first shooting angle of view;

preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.

In one embodiment, the method further comprises:

acquiring sample image data;

performing keypoint annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data;

performing image enhancement processing on the first annotated image data to obtain a first training data set;

inputting the first training data set into an HRNet model for training to obtain the detection model.

In one embodiment, the method further comprises:

annotating the hand region in the sample image data and annotating the item category of items located in the hand region to obtain second annotated image data;

performing image enhancement processing on the second annotated image data to obtain a second training data set;

inputting the second training data set into a convolutional neural network for training to obtain the preset classification and recognition model, the convolutional neural network being a yolov3-tiny network or a vgg16 network.

In one embodiment, the acquiring sample image data comprises:

acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range;

filtering the collected image data to obtain sample image data containing human-goods interaction behavior; preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
An interactive behavior recognition apparatus, the apparatus comprising:

a first acquisition module, configured to acquire an image to be detected;

a first detection module, configured to perform human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;

a tracking module, configured to track the human posture according to the human posture information to obtain human motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain a hand region image;

a second detection module, configured to perform item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;

a first interactive behavior recognition module, configured to obtain a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

acquiring an image to be detected;

performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;

tracking the human posture according to the human posture information to obtain human motion trajectory information; performing target tracking on the hand position according to the hand position information to obtain a hand region image;

performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;

obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

A computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the following steps:

acquiring an image to be detected;

performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;

tracking the human posture according to the human posture information to obtain human motion trajectory information; performing target tracking on the hand position according to the hand position information to obtain a hand region image;

performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;

obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

In the above interactive behavior recognition method and apparatus, computer device, and storage medium, a detection model and a classification and recognition model are used to perform interactive behavior recognition on the image to be detected. On the basis of the existing models, only a small amount of data needs to be collected before deployment in different stores, so portability is strong and deployment cost is low; moreover, the detection model can recognize interactive behaviors more flexibly and accurately, which improves recognition accuracy.
Brief Description of the Drawings

Figure 1 is an application environment diagram of an interactive behavior recognition method in one embodiment;

Figure 2 is a schematic flowchart of an interactive behavior recognition method in one embodiment;

Figure 3 is a schematic flowchart of an interactive behavior recognition method in another embodiment;

Figure 4 is a schematic flowchart of the detection model training step in one embodiment;

Figure 5 is a schematic flowchart of the classification and recognition model training step in one embodiment;

Figure 6 is a structural block diagram of an interactive behavior recognition apparatus in one embodiment;

Figure 7 is an internal structure diagram of a computer device in one embodiment.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
The interactive behavior recognition method provided in the present application can be applied to the application environment shown in Figure 1, in which the terminal 102 communicates with the server 104 through a network. The terminal 102 may be, but is not limited to, various image acquisition devices; more specifically, the terminal 102 may use one or more depth cameras whose shooting angle is perpendicular to the ground, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in Figure 2, an interactive behavior recognition method is provided. Taking the method being applied to the server in Figure 1 as an example, the method includes the following steps:

Step 202: acquire an image to be detected;

Here, the image to be detected is an image of the interaction behavior between a person and an object that is to be detected.

In one embodiment, step 202 includes the following: the server acquires the image to be detected collected by an image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular or close to perpendicular to the ground, and the image to be detected is RGBD data.

In other words, the image to be detected is RGBD data collected by the image acquisition device in an overhead-view scene. The image acquisition device may be a depth camera installed above the shelf. The first shooting angle of view need not be exactly perpendicular to the ground; where the installation environment allows, any overhead angle close to vertical may be used, avoiding shooting blind spots as far as possible.

This technical solution uses an overhead-view depth camera to detect human-goods interaction behavior. Compared with the traditional way of installing a camera at an angle to the ground, it effectively avoids the occlusion of people and shelves caused by an oblique viewing angle, as well as the increased difficulty of hand tracking; in practical applications, capturing images from the overhead view also makes it easier to recognize the cross-pickup behavior of different people.

Step 204: perform human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;

Here, the detection model is a human posture detection model, which can be used to detect human skeleton keypoints.

Specifically, the server inputs the human body image into the detection model; human posture detection is performed on the human body image in the detection model; and the human posture information and hand position information output by the detection model are obtained. The human posture detection may use a common skeleton-line detection method, in which case the obtained human posture information is a human skeleton keypoint image, and the hand position information is the specific position of the hand in that skeleton keypoint image.
Step 206: track the human posture according to the human posture information to obtain human motion trajectory information; and perform target tracking on the hand position according to the hand position information to obtain a hand region image;

Specifically, a target tracking algorithm is used, for example the Camshift algorithm, which can adapt to changes in the size and shape of a moving target, to track the motion trajectories of the human body and the hand respectively, obtaining the human motion trajectory information; during tracking, the hand position is expanded to obtain the hand region image.
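For illustration only, a minimal OpenCV sketch of such hand tracking with `cv2.CamShift` on a hue back-projection is given below; the window size, histogram bins, and expansion margin are assumed values rather than parameters taken from this disclosure:

```python
import cv2

def track_hand(frames, init_xy, win=60, margin=20):
    """Track a hand with CamShift from an initial wrist position.

    frames  - list of BGR frames from the overhead camera
    init_xy - (x, y) wrist position obtained from pose detection
    Returns a list of expanded hand-region crops, one per frame.
    """
    x, y = init_xy
    track_window = (max(x - win // 2, 0), max(y - win // 2, 0), win, win)
    c, r, w, h = track_window
    hsv0 = cv2.cvtColor(frames[0], cv2.COLOR_BGR2HSV)
    roi_hist = cv2.calcHist([hsv0[r:r + h, c:c + w]], [0], None, [16], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    crops = []
    for frame in frames:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # CamShift adapts the search window to the moving target's size/shape.
        _, track_window = cv2.CamShift(back_proj, track_window, term_crit)
        c, r, w, h = track_window
        x0, y0 = max(c - margin, 0), max(r - margin, 0)
        # Expanding the tracked window yields the hand region image.
        crops.append(frame[y0:r + h + margin, x0:c + w + margin])
    return crops
```

In practice the histogram would typically be re-seeded from the pose detector whenever the track drifts.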
Step 208: perform item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;

Here, the classification and recognition model is an item recognition model; an item recognition model trained by deep learning may be used.

Specifically, the hand region image is input into the classification and recognition model, which detects whether there is an item in the hand region; when there is an item, the classification and recognition model recognizes it and outputs the item recognition result. On the other hand, the classification and recognition model can also judge the skin color in the hand region image and promptly issue an early warning when clothing or other items are deliberately used to cover the hands, so as to reduce loss of goods.

Step 210: obtain a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

Here, the first interactive behavior recognition result is the recognition result of the interaction behavior between a person and an item.

Specifically, the above human motion trajectory information can be used to determine a person's actions, such as reaching, leaning over, bending, and squatting; then, according to whether the person's hand holds an item and, when it does, the item recognition result obtained by recognizing that item, it can be determined that the person is picking up or putting down the item; that is, the recognition result of the interaction between the person and the item is obtained by analysis.
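A toy sketch of how the trajectory cue and the item recognition result might be combined is shown below; the movement threshold and the event labels are illustrative assumptions, not part of the disclosure:

```python
def recognize_interaction(trajectory, item_before, item_after):
    """Combine body-motion cues with item recognition around a reach event.

    trajectory  - list of (x, y) body positions over the event window
    item_before - classifier output for the hand region before the reach
    item_after  - classifier output after the reach (None means empty hand)
    """
    moved = sum(
        abs(x2 - x1) + abs(y2 - y1)
        for (x1, y1), (x2, y2) in zip(trajectory, trajectory[1:])
    )
    reaching = moved > 30                 # assumed pixel threshold for a reach
    if not reaching:
        return {"action": "none"}
    if item_before is None and item_after is not None:
        return {"action": "pick_up", "item": item_after}
    if item_before is not None and item_after is None:
        return {"action": "put_down", "item": item_before}
    return {"action": "touch_only"}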
In the interactive behavior recognition method provided by this technical solution, a detection model and a classification and recognition model are used to perform interactive behavior recognition on the image to be detected. After model training and algorithm tuning, the interaction behavior between people and items can be recognized automatically, and the recognition result is more accurate; moreover, on the basis of the current detection model and classification and recognition model, only a small amount of data needs to be collected before deployment in different scenarios, so portability is strong and deployment cost is low.

In one embodiment, as shown in Figure 3, the method includes the following steps:

Step 302: acquire an image to be detected;

Step 304: perform preset processing on the image to be detected to obtain a human body image in the image to be detected;

Here, step 304 is the process of extracting, from the image to be detected, the human body image needed in subsequent steps while masking out the unneeded background image.

Specifically, the above preset processing may use background modeling, that is, Gaussian-mixture-based background modeling is performed on the image to be detected to obtain a background model;

according to the image to be detected and the background model, the human body image in the image to be detected is then obtained.
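A minimal sketch of such Gaussian-mixture background modeling with OpenCV's MOG2 subtractor is given below for illustration; the history length, variance threshold, and morphological kernel are assumed values:

```python
import cv2
import numpy as np

# Gaussian-mixture background model for the overhead camera stream.
bg_model = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                              detectShadows=False)

def extract_human(frame):
    """Return the frame with the static background masked out."""
    fg_mask = bg_model.apply(frame)                        # update + foreground
    fg_mask = cv2.medianBlur(fg_mask, 5)                   # remove speckle noise
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_CLOSE,
                               np.ones((7, 7), np.uint8))  # fill small holes
    return cv2.bitwise_and(frame, frame, mask=fg_mask)
```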
Step 306: perform human posture detection on the human body image through the preset detection model to obtain human posture information and hand position information;

Step 308: track the human posture according to the human posture information to obtain human motion trajectory information, and perform target tracking on the hand position according to the hand position information to obtain a hand region image;

Step 310: perform item recognition on the hand region image through the preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;

Step 312: obtain a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

In this embodiment, step 304 masks out the unneeded background image by preprocessing the image to be detected and retains only the human body image to be used later, thereby reducing the amount of data to be processed in the following steps and improving data processing efficiency.

In one embodiment, the method further includes:

acquiring human body position information according to the image to be detected;

Here, the human body position information may refer to the position information of the human body in the three-dimensional world coordinate system.

Specifically, the acquisition position information of the image to be detected in the three-dimensional world coordinate system is obtained; according to the position of the human body image within the image to be detected and the acquisition position information, a three-dimensional world coordinate transformation is performed to obtain the position information of the human body in the three-dimensional world coordinate system.
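One conventional way to carry out such a transformation, assuming the depth value at the pixel, the camera intrinsic matrix, and the camera-to-world extrinsics (the acquisition position information) are known, is the back-projection sketched below; the variable names are assumptions for illustration:

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R, t):
    """Back-project an image point with depth into world coordinates.

    (u, v) - pixel position of the human body in the RGBD image
    depth  - depth value at that pixel (same unit as t, e.g. metres)
    K      - 3x3 camera intrinsic matrix
    R, t   - camera-to-world rotation (3x3) and translation (3,), i.e. the
             acquisition position information of the overhead camera
    """
    pixel = np.array([u, v, 1.0])
    cam_xyz = depth * (np.linalg.inv(K) @ pixel)   # point in camera coordinates
    world_xyz = R @ cam_xyz + t                    # point in world coordinates
    return world_xyz
```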
A second interactive behavior recognition result is obtained according to the human motion trajectory information, the item recognition result, the human body position information, and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.

Here, the shelf information includes shelf position information and information about the items on the shelf, the shelf position information being the three-dimensional world coordinate position of the shelf.

Specifically, the shelf information corresponding to the position of the human body is obtained according to the human body position information and the preset shelf information; one interaction between the human body and the shelf is confirmed by tracking the three-dimensional world coordinate positions of the human body and the shelf, and then, during tracking, by recognizing whether the hand region contains goods associated with that shelf, the occurrence of one effective human-goods interaction behavior is further confirmed. An effective human-goods interaction behavior here may be a customer completing one pickup action from the shelf.

Through the three-dimensional world coordinate transformation, this technical solution converts the customer's position into the world coordinate system and associates it with the shelf, so it can identify whether the customer has performed an effective human-goods interaction behavior. On the other hand, on the basis of recognizing human-goods interaction behavior and combined with the item recognition result, provided the shelf stock is known, monitoring the number of effective interactions between people and the shelf makes it possible to take stock of the shelf's current inventory indirectly; when goods run out, the server can promptly remind the staff to restock, greatly reducing the labor cost of manual inventory checking.
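Purely as an illustration, the shelf association and the indirect inventory bookkeeping could look like the following sketch, where the shelf configuration, low-stock threshold, and reminder mechanism are all assumed:

```python
# Assumed shelf configuration: world-coordinate bounding box, item list, stock.
SHELVES = {
    "shelf_01": {"bounds": ((0.0, 0.0), (1.2, 0.6)),
                 "items": {"cola", "water"},
                 "stock": 24},
}

def confirm_pick(shelf_id, body_xy, item_label):
    """Confirm one effective human-goods interaction and update the stock."""
    shelf = SHELVES[shelf_id]
    (x0, y0), (x1, y1) = shelf["bounds"]
    near_shelf = x0 <= body_xy[0] <= x1 and y0 <= body_xy[1] <= y1
    if near_shelf and item_label in shelf["items"]:
        shelf["stock"] -= 1                 # indirect inventory bookkeeping
        if shelf["stock"] <= 2:             # assumed low-stock threshold
            print(f"restock reminder: {shelf_id}")
        return True                         # one effective pickup confirmed
    return False
```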
In one embodiment, as shown in Figure 4, the method further includes a detection model training step, which specifically includes the following steps:

Step 402: acquire sample image data;

Specifically, image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range is acquired, that is, a certain order of magnitude of interaction behavior image data is collected; sample image data containing human-goods interaction behavior is then obtained by filtering the collected image data. The preset second shooting angle of view may be an overhead angle of view perpendicular or close to perpendicular to the ground, and the sample image data is RGBD data.

Step 404: perform keypoint annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data;

Specifically, the sample image data needs to basically cover the different human-goods interaction behaviors in the actual scene. The sample data may also be augmented to increase the amount of sample image data and to raise the proportion of training samples with large postures during interaction, for example by increasing the proportion of postures such as leaning over, bending, and squatting, thereby improving the detection accuracy of the detection model. In a specific implementation, part of the first annotated image data may be used as a training data set and the remaining part as a validation data set.

Step 406: perform image enhancement processing on the first annotated image data to obtain a first training data set; in a specific implementation, image enhancement processing is performed on the training data set within the first annotated image data to obtain the first training data set.

Specifically, the image enhancement processing may include any one or more of the following image transformation methods, for example: image normalization, random cropping, image scaling, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, and adding hue interference blocks to the image.
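A minimal sketch of a few of these transformations implemented with OpenCV and Python's random module is given below; the probabilities and parameter ranges are illustrative assumptions:

```python
import random
import cv2

def augment(image):
    """Apply a random subset of the image enhancement operations listed above."""
    h, w = image.shape[:2]
    if random.random() < 0.5:                                # random crop
        dy, dx = random.randint(0, h // 10), random.randint(0, w // 10)
        image = cv2.resize(image[dy:h - dy, dx:w - dx], (w, h))
    if random.random() < 0.5:                                # horizontal flip
        image = cv2.flip(image, 1)
    if random.random() < 0.5:                                # contrast change
        image = cv2.convertScaleAbs(image, alpha=random.uniform(0.8, 1.2))
    if random.random() < 0.5:                                # hue change
        hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
        hsv[..., 0] = (hsv[..., 0].astype(int) + random.randint(-10, 10)) % 180
        image = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return image
```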
Step 408: input the first training data set into an HRNet model for training to obtain the detection model. Specifically, different network architectures of the HRNet model can be used to train the human posture detection model; each model obtained by training with a different network architecture is validated and evaluated on the validation data set, and the model with the best performance is selected and set as the detection model.

In one embodiment, as shown in Figure 5, the method further includes a classification and recognition model training step, which specifically includes the following steps:

Step 502: acquire sample image data;

Step 504: annotate the hand region in the sample image data and annotate the item category of items located in the hand region to obtain second annotated image data;

Step 506: perform image enhancement processing on the second annotated image data to obtain a second training data set;

Specifically, the image enhancement processing may include any one or more of the following image transformation methods, for example: image normalization, random cropping, image scaling, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, and adding hue interference blocks to the image.

Step 508: input the second training data set into a yolov3-tiny network or a vgg16 network for training to obtain the preset classification and recognition model.
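As one possible, non-limiting realization of the vgg16 option, the classification and recognition model could be fine-tuned with torchvision roughly as sketched below; the dataset folder layout, class count, and hyperparameters are assumptions, and the disclosure equally allows a yolov3-tiny detector instead:

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 51          # assumed: 50 item categories + "empty hand"

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumed folder layout: one sub-folder of hand-region crops per item category.
train_set = datasets.ImageFolder("second_training_set/", transform=tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.vgg16(weights="IMAGENET1K_V1")         # ImageNet-pretrained backbone
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)    # replace the final layer

opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```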
In this technical solution, RGBD data is collected through a depth camera whose line of sight is vertical or close to vertical to the ground; RGBD data containing human-goods interaction behavior is then manually organized and collected as training samples, i.e., sample image data. Deep learning is used for training, and the trained model is used to recognize the different postures of the human body, so the detection model can recognize interactive behaviors more flexibly and accurately and has strong portability.

It should be understood that, although the steps in the flowcharts of Figures 2-5 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in Figures 2-5 may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be executed at different moments; the execution order of these sub-steps or stages is also not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
An interactive behavior recognition apparatus is provided, as shown in Figure 6, including: a first acquisition module 602, a first detection module 604, a tracking module 606, a second detection module 608, and a first interactive behavior recognition module 610, wherein:

the first acquisition module 602 is configured to acquire an image to be detected;

the first detection module 604 is configured to perform human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;

the tracking module 606 is configured to track the human posture according to the human posture information to obtain human motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain a hand region image;

the second detection module 608 is configured to perform item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;

the first interactive behavior recognition module 610 is configured to obtain a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

In one embodiment, the first detection module 604 is further configured to perform preset processing on the image to be detected to obtain a human body image in the image to be detected, and to perform human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.

In one embodiment, the apparatus further includes:

a human body position module, configured to acquire human body position information according to the image to be detected;

a second interactive behavior recognition module, configured to obtain a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information, and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.

In one embodiment, the first acquisition module 602 is further configured to acquire the image to be detected collected by an image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
In one embodiment, the apparatus further includes:

a second acquisition module, configured to acquire sample image data;

a first annotation module, configured to perform keypoint annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data;

a first enhancement module, configured to perform image enhancement processing on the first annotated image data to obtain a first training data set;

a first training module, configured to input the first training data set into an HRNet model for training to obtain the detection model.

In one embodiment, the apparatus further includes:

a second annotation module, configured to annotate the hand region in the sample image data and annotate the item category of items located in the hand region to obtain second annotated image data;

a second enhancement module, configured to perform image enhancement processing on the second annotated image data to obtain a second training data set;

a second training module, configured to input the second training data set into a yolov3-tiny network or a vgg16 network for training to obtain the preset classification and recognition model.

In one embodiment, the second acquisition module is further configured to acquire image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range, and to filter the collected image data to obtain sample image data containing human-goods interaction behavior; preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.

For the specific limitations of the interactive behavior recognition apparatus, reference may be made to the limitations of the interactive behavior recognition method above, which will not be repeated here. Each module in the above interactive behavior recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in the form of hardware, or stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Figure 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements an interactive behavior recognition method.

Those skilled in the art can understand that the structure shown in Figure 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; the specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program: acquiring an image to be detected; performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; tracking the human posture according to the human posture information to obtain human motion trajectory information, and performing target tracking on the hand position according to the hand position information to obtain a hand region image; performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

In one embodiment, the processor further implements the following steps when executing the computer program: the performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information includes: performing preset processing on the image to be detected to obtain a human body image in the image to be detected; and performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.

In one embodiment, the processor further implements the following steps when executing the computer program: acquiring human body position information according to the image to be detected; and obtaining a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information, and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.

In one embodiment, the processor further implements the following steps when executing the computer program: the acquiring an image to be detected includes: acquiring the image to be detected collected by an image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.

In one embodiment, the processor further implements the following steps when executing the computer program: acquiring sample image data; performing keypoint annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data; performing image enhancement processing on the first annotated image data to obtain a first training data set; and inputting the first training data set into an HRNet model for training to obtain the detection model.

In one embodiment, the processor further implements the following steps when executing the computer program: annotating the hand region in the sample image data and annotating the item category of items located in the hand region to obtain second annotated image data; performing image enhancement processing on the second annotated image data to obtain a second training data set; and inputting the second training data set into a convolutional neural network for training to obtain the preset classification and recognition model.

In one embodiment, the processor further implements the following steps when executing the computer program: the acquiring sample image data includes: acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range; and filtering the collected image data to obtain sample image data containing human-goods interaction behavior; preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps: acquiring an image to be detected; performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; tracking the human posture according to the human posture information to obtain human motion trajectory information, and performing target tracking on the hand position according to the hand position information to obtain a hand region image; performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: the performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information includes: performing preset processing on the image to be detected to obtain a human body image in the image to be detected; and performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: acquiring human body position information according to the image to be detected; and obtaining a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information, and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: the acquiring an image to be detected includes: acquiring the image to be detected collected by an image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: acquiring sample image data; performing keypoint annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data; performing image enhancement processing on the first annotated image data to obtain a first training data set; and inputting the first training data set into an HRNet model for training to obtain the detection model.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: annotating the hand region in the sample image data and annotating the item category of items located in the hand region to obtain second annotated image data; performing image enhancement processing on the second annotated image data to obtain a second training data set; and inputting the second training data set into a convolutional neural network for training to obtain the preset classification and recognition model.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: the acquiring sample image data includes: acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range; and filtering the collected image data to obtain sample image data containing human-goods interaction behavior; preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered to be within the scope of this specification.

The above embodiments only express several implementation modes of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be pointed out that, for those of ordinary skill in the art, several modifications and improvements can be made without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

  1. An interactive behavior recognition method, characterized in that the method comprises:
    acquiring an image to be detected;
    performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;
    tracking the human posture according to the human posture information to obtain human motion trajectory information; performing target tracking on the hand position according to the hand position information to obtain a hand region image;
    performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;
    obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
  2. The method according to claim 1, characterized in that the performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information comprises:
    performing preset processing on the image to be detected to obtain a human body image in the image to be detected;
    performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.
  3. The method according to claim 2, characterized in that the method further comprises:
    acquiring human body position information according to the image to be detected;
    obtaining a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information, and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.
  4. The method according to claim 3, characterized in that the acquiring an image to be detected comprises:
    acquiring the image to be detected collected by an image acquisition device at a preset first shooting angle of view;
    preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    acquiring sample image data;
    performing keypoint annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data;
    performing image enhancement processing on the first annotated image data to obtain a first training data set;
    inputting the first training data set into an HRNet model for training to obtain the detection model.
  6. The method according to claim 5, characterized in that the method further comprises:
    annotating the hand region in the sample image data and annotating the item category of items located in the hand region to obtain second annotated image data;
    performing image enhancement processing on the second annotated image data to obtain a second training data set;
    inputting the second training data set into a convolutional neural network for training to obtain the preset classification and recognition model; preferably, the convolutional neural network is a yolov3-tiny network or a vgg16 network.
  7. The method according to claim 6, characterized in that the acquiring sample image data comprises:
    acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range;
    filtering the collected image data to obtain sample image data containing human-goods interaction behavior; preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
  8. An interactive behavior recognition apparatus, characterized in that the apparatus comprises:
    a first acquisition module, configured to acquire an image to be detected;
    a first detection module, configured to perform human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection;
    a tracking module, configured to track the human posture according to the human posture information to obtain human motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain a hand region image;
    a second detection module, configured to perform item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition;
    a first interactive behavior recognition module, configured to obtain a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/096994 2019-09-11 2020-06-19 交互行为识别方法、装置、计算机设备和存储介质 WO2021047232A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3154025A CA3154025A1 (en) 2019-09-11 2020-06-19 Interactive behavior recognizing method, device, computer equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910857295.7 2019-09-11
CN201910857295.7A CN110674712A (zh) 2019-09-11 2019-09-11 交互行为识别方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2021047232A1 true WO2021047232A1 (zh) 2021-03-18

Family

ID=69077877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/096994 WO2021047232A1 (zh) 2019-09-11 2020-06-19 交互行为识别方法、装置、计算机设备和存储介质

Country Status (3)

Country Link
CN (1) CN110674712A (zh)
CA (1) CA3154025A1 (zh)
WO (1) WO2021047232A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031464A (zh) * 2021-03-22 2021-06-25 北京市商汤科技开发有限公司 设备控制方法、装置、电子设备及存储介质
CN113448443A (zh) * 2021-07-12 2021-09-28 交互未来(北京)科技有限公司 一种基于硬件结合的大屏幕交互方法、装置和设备
CN113687715A (zh) * 2021-07-20 2021-11-23 温州大学 基于计算机视觉的人机交互系统及交互方法
CN113792700A (zh) * 2021-09-24 2021-12-14 成都新潮传媒集团有限公司 一种电瓶车入箱检测方法、装置、计算机设备及存储介质
CN114274184A (zh) * 2021-12-17 2022-04-05 重庆特斯联智慧科技股份有限公司 一种基于投影引导的物流机器人人机交互方法及系统

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674712A (zh) * 2019-09-11 2020-01-10 苏宁云计算有限公司 交互行为识别方法、装置、计算机设备和存储介质
CN111259817A (zh) * 2020-01-17 2020-06-09 维沃移动通信有限公司 物品清单建立方法及电子设备
CN111208148A (zh) * 2020-02-21 2020-05-29 凌云光技术集团有限责任公司 一种挖孔屏漏光缺陷检测系统
CN111339903B (zh) * 2020-02-21 2022-02-08 河北工业大学 一种多人人体姿态估计方法
CN111507231B (zh) * 2020-04-10 2023-06-23 盛景智能科技(嘉兴)有限公司 工序步骤正确性自动化检测方法和系统
CN111679737B (zh) * 2020-05-27 2022-06-21 维沃移动通信有限公司 手部分割方法和电子设备
CN111563480B (zh) * 2020-06-01 2024-01-12 北京嘀嘀无限科技发展有限公司 冲突行为检测方法、装置、计算机设备和存储介质
CN111797728B (zh) * 2020-06-19 2024-06-14 浙江大华技术股份有限公司 一种运动物体的检测方法、装置、计算设备及存储介质
CN111882601B (zh) * 2020-07-23 2023-08-25 杭州海康威视数字技术股份有限公司 定位方法、装置及设备
CN114093019A (zh) * 2020-07-29 2022-02-25 顺丰科技有限公司 抛扔动作检测模型训练方法、装置和计算机设备
CN114302050A (zh) * 2020-09-22 2022-04-08 阿里巴巴集团控股有限公司 图像处理方法及设备、非易失性存储介质
CN111931740B (zh) * 2020-09-29 2021-01-26 创新奇智(南京)科技有限公司 商品销量识别方法及装置、电子设备、存储介质
CN112132868B (zh) * 2020-10-14 2024-02-27 杭州海康威视系统技术有限公司 一种支付信息的确定方法、装置及设备
CN112418118A (zh) * 2020-11-27 2021-02-26 招商新智科技有限公司 一种无监督桥下行人入侵检测方法及装置
CN112560646A (zh) * 2020-12-09 2021-03-26 上海眼控科技股份有限公司 交易行为的检测方法、装置、设备及存储介质
CN112784760B (zh) 2021-01-25 2024-04-12 北京百度网讯科技有限公司 人体行为识别方法、装置、设备以及存储介质
CN112949689A (zh) * 2021-02-01 2021-06-11 Oppo广东移动通信有限公司 图像识别方法、装置、电子设备及存储介质
CN114241354A (zh) * 2021-11-19 2022-03-25 上海浦东发展银行股份有限公司 仓库人员行为识别方法、装置、计算机设备、存储介质
CN114327062A (zh) * 2021-12-28 2022-04-12 深圳Tcl新技术有限公司 人机交互方法、装置、电子设备、存储介质和程序产品
CN114429647A (zh) * 2022-01-21 2022-05-03 上海浦东发展银行股份有限公司 一种递进式人物交互识别方法及系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881100A (zh) * 2012-08-24 2013-01-16 济南纳维信息技术有限公司 基于视频分析的实体店面防盗监控方法
CN105245828A (zh) * 2015-09-02 2016-01-13 北京旷视科技有限公司 物品分析方法和设备
CN105518734A (zh) * 2013-09-06 2016-04-20 日本电气株式会社 顾客行为分析系统、顾客行为分析方法、非暂时性计算机可读介质和货架系统
US20170061204A1 (en) * 2014-05-12 2017-03-02 Fujitsu Limited Product information outputting method, control device, and computer-readable recording medium
CN107424273A (zh) * 2017-07-28 2017-12-01 杭州宇泛智能科技有限公司 一种无人超市的管理方法
CN109977896A (zh) * 2019-04-03 2019-07-05 上海海事大学 一种超市智能售货系统
CN110674712A (zh) * 2019-09-11 2020-01-10 苏宁云计算有限公司 交互行为识别方法、装置、计算机设备和存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881100A (zh) * 2012-08-24 2013-01-16 济南纳维信息技术有限公司 基于视频分析的实体店面防盗监控方法
CN105518734A (zh) * 2013-09-06 2016-04-20 日本电气株式会社 顾客行为分析系统、顾客行为分析方法、非暂时性计算机可读介质和货架系统
US20170061204A1 (en) * 2014-05-12 2017-03-02 Fujitsu Limited Product information outputting method, control device, and computer-readable recording medium
CN105245828A (zh) * 2015-09-02 2016-01-13 北京旷视科技有限公司 物品分析方法和设备
CN107424273A (zh) * 2017-07-28 2017-12-01 杭州宇泛智能科技有限公司 一种无人超市的管理方法
CN109977896A (zh) * 2019-04-03 2019-07-05 上海海事大学 一种超市智能售货系统
CN110674712A (zh) * 2019-09-11 2020-01-10 苏宁云计算有限公司 交互行为识别方法、装置、计算机设备和存储介质

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031464A (zh) * 2021-03-22 2021-06-25 北京市商汤科技开发有限公司 设备控制方法、装置、电子设备及存储介质
CN113448443A (zh) * 2021-07-12 2021-09-28 交互未来(北京)科技有限公司 一种基于硬件结合的大屏幕交互方法、装置和设备
CN113687715A (zh) * 2021-07-20 2021-11-23 温州大学 基于计算机视觉的人机交互系统及交互方法
CN113792700A (zh) * 2021-09-24 2021-12-14 成都新潮传媒集团有限公司 一种电瓶车入箱检测方法、装置、计算机设备及存储介质
CN113792700B (zh) * 2021-09-24 2024-02-27 成都新潮传媒集团有限公司 一种电瓶车入箱检测方法、装置、计算机设备及存储介质
CN114274184A (zh) * 2021-12-17 2022-04-05 重庆特斯联智慧科技股份有限公司 一种基于投影引导的物流机器人人机交互方法及系统
CN114274184B (zh) * 2021-12-17 2024-05-24 重庆特斯联智慧科技股份有限公司 一种基于投影引导的物流机器人人机交互方法及系统

Also Published As

Publication number Publication date
CN110674712A (zh) 2020-01-10
CA3154025A1 (en) 2021-03-18

Similar Documents

Publication Publication Date Title
WO2021047232A1 (zh) 交互行为识别方法、装置、计算机设备和存储介质
CN110502986B (zh) 识别图像中人物位置方法、装置、计算机设备和存储介质
WO2021043073A1 (zh) 基于图像识别的城市宠物活动轨迹监测方法及相关设备
US10089556B1 (en) Self-attention deep neural network for action recognition in surveillance videos
Shen et al. The first facial landmark tracking in-the-wild challenge: Benchmark and results
US8379920B2 (en) Real-time clothing recognition in surveillance videos
Patruno et al. People re-identification using skeleton standard posture and color descriptors from RGB-D data
CN105740780B (zh) 人脸活体检测的方法和装置
CN111325769B (zh) 一种目标对象检测方法及装置
CN110991261A (zh) 交互行为识别方法、装置、计算机设备和存储介质
CN111626123A (zh) 视频数据处理方法、装置、计算机设备及存储介质
CN110889355B (zh) 一种人脸识别校验方法、系统及存储介质
CN111178252A (zh) 多特征融合的身份识别方法
US11062126B1 (en) Human face detection method
US20190228209A1 (en) Lip movement capturing method and device, and storage medium
CN106682641A (zh) 基于fhog‑lbph特征的图像行人识别方法
WO2019033570A1 (zh) 嘴唇动作分析方法、装置及存储介质
CN110717449A (zh) 车辆年检人员的行为检测方法、装置和计算机设备
CN105893957A (zh) 基于视觉湖面船只检测识别与跟踪方法
CN113780145A (zh) 精子形态检测方法、装置、计算机设备和存储介质
CN112541394A (zh) 黑眼圈及鼻炎识别方法、系统及计算机介质
CN116912880A (zh) 一种基于鸟类关键点检测的鸟类识别质量评估方法及系统
CN115375991A (zh) 一种强/弱光照和雾环境自适应目标检测方法
CN111402185B (zh) 一种图像检测方法及装置
CN117274843B (zh) 基于轻量级边缘计算的无人机前端缺陷识别方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20863604

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3154025

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20863604

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20863604

Country of ref document: EP

Kind code of ref document: A1