WO2021047232A1 - Interactive behavior recognition method and apparatus, computer device and storage medium - Google Patents
Interactive behavior recognition method and apparatus, computer device and storage medium
- Publication number
- WO2021047232A1 (PCT/CN2020/096994)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- human body
- preset
- information
- recognition
- Prior art date
Classifications
- G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
- G06V40/107 — Static hand or arm
- G06F18/24 — Classification techniques
- G06T7/20 — Analysis of motion
- G06V20/00 — Scenes; scene-specific elements
- G06T2207/30196 — Human being; person
- G06T2207/30241 — Trajectory
Definitions
- This application relates to an interactive behavior recognition method, device, computer equipment and storage medium.
- Existing human-goods interaction behavior recognition methods usually rely on template and rule matching; defining the templates and formulating the rules require substantial manpower, are often suitable only for recognizing common human postures, yield poor recognition accuracy, and have weak portability, being applicable only to human-goods interaction in specific scenarios.
- An interactive behavior recognition method includes: acquiring an image to be detected; performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information; tracking the human posture according to the human posture information to obtain human motion trajectory information, and performing target tracking on the hand position according to the hand position information to obtain a hand region image; performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result; and obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
- the performing of human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information includes: performing preset processing on the image to be detected to obtain a human body image in the image to be detected; and performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.
- the method further includes: obtaining human body position information according to the image to be detected; and obtaining a second interactive behavior recognition result according to the human body motion trajectory information, the item recognition result, the human body position information and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.
- the acquiring of the image to be detected includes: acquiring the image to be detected collected by an image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
- the method further includes: acquiring sample image data; performing key point annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data; performing image enhancement processing on the first annotated image data to obtain a first training data set; and inputting the first training data set into an HRNet model for training to obtain the detection model.
- the method further includes: annotating the hand region in the sample image data and annotating the item category of items located in the hand region to obtain second annotated image data; performing image enhancement processing on the second annotated image data to obtain a second training data set; and inputting the second training data set into a convolutional neural network for training to obtain the preset classification and recognition model, the convolutional neural network being a yolov3-tiny network or a vgg16 network.
- the acquiring of sample image data includes: acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range; and filtering the collected image data to obtain sample image data with human-goods interaction behavior; preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
- An interactive behavior recognition device includes:
- the first acquisition module is used to acquire the image to be detected
- the first detection module is configured to perform human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, and the detection model is used to perform human posture detection;
- the tracking module is used to track the human body posture according to the human body posture information to obtain human body motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain a hand region image;
- the second detection module is configured to perform item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
- the first interactive behavior recognition module is configured to obtain the first interactive behavior recognition result according to the human body motion trajectory information and the article recognition result.
- a computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor, and the processor implements the following steps when executing the computer program:
- a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented:
- The above interactive behavior recognition method, device, computer equipment and storage medium use a detection model and a classification and recognition model to perform interactive behavior recognition on the image to be detected. Starting from the original models, only a small amount of data needs to be collected before deployment in different stores, so the solution has strong portability and low deployment cost; moreover, the detection model can identify interactive behaviors more flexibly and accurately, which improves recognition accuracy.
- Figure 1 is an application environment diagram of an interactive behavior recognition method in an embodiment
- Figure 2 is a schematic flowchart of an interactive behavior recognition method in an embodiment
- Figure 3 is a schematic flowchart of an interactive behavior recognition method in another embodiment
- Figure 4 is a schematic flowchart of a training step of a detection model in an embodiment
- Figure 5 is a schematic flowchart of a training step of a classification and recognition model in an embodiment
- Figure 6 is a structural block diagram of an interactive behavior recognition device in an embodiment
- Figure 7 is an internal structure diagram of a computer device in an embodiment.
- the interactive behavior identification method provided in this application can be applied to the application environment as shown in FIG. 1.
- the terminal 102 communicates with the server 104 through the network.
- the terminal 102 may be, but is not limited to, various image acquisition devices; more specifically, the terminal 102 may use one or more depth cameras with a shooting angle perpendicular to the ground. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
- an interactive behavior recognition method is provided.
- the method is applied to the server in FIG. 1 as an example for description, including the following steps:
- Step 202 Obtain an image to be detected
- the image to be detected is an image of interaction behavior between a person and an object to be detected.
- step 202 includes the following content: the server acquires the image to be detected collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead viewing angle perpendicular, or close to perpendicular, to the ground, and the image to be detected is RGBD data.
- the image to be detected is the RGBD data collected by the image acquisition device in the overhead viewing angle scene.
- the image acquisition device can use a depth camera set above the shelf.
- the first shooting angle of view may deviate from the vertical where the installation environment does not allow a strictly vertical view; in such cases the angle of view should be as close to vertical as possible, and shooting blind spots should be avoided.
- This technical solution uses a depth camera with a top-down viewing angle to detect the interaction between people and goods. Compared with the traditional installation of cameras at an oblique angle to the ground, it effectively avoids the occlusion between people and shelves that arises from a slanted viewing angle, as well as the increased difficulty of hand tracking; in practical applications, image acquisition from the overhead view can also better identify cross-pickup behavior by different people.
- Step 204 Perform human posture detection on the image to be detected using a preset detection model to obtain human posture information and hand position information, and the detection model is used for human posture detection;
- the detection model is a human posture detection model, which can be used to detect key points of human bones.
- the server inputs the human body image to the detection model; human posture detection is performed on the human body image in the detection model, and the human posture information and hand position information output by the detection model are obtained. Human posture detection may use a commonly used skeleton-line detection method; the obtained human posture information is the human skeleton key point image, and the hand position information is the specific position of the hand in the human skeleton key point image.
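- As a hedged illustration only (the patent fixes neither a model output format nor a keypoint layout), a typical keypoint-heatmap detector such as an HRNet variant can be read out as follows; `pose_model` and the wrist indices are assumptions:

```python
import torch

# Illustrative wrist/hand keypoint indices; the patent fixes no layout.
HAND_KEYPOINTS = (9, 10)

@torch.no_grad()
def detect_pose(pose_model, image_tensor):
    """image_tensor: (1, 3, H, W) normalized crop of the human body image.

    Assumes pose_model returns one heatmap per skeleton keypoint,
    shaped (1, K, h, w), as HRNet-style detectors typically do.
    """
    heatmaps = pose_model(image_tensor)[0]            # (K, h, w)
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.view(k, -1).argmax(dim=1)     # peak of each heatmap
    xs, ys = flat_idx % w, flat_idx // w
    keypoints = torch.stack((xs, ys), dim=1)          # (K, 2) as (x, y)
    hand_positions = keypoints[list(HAND_KEYPOINTS)]  # posture info + hands
    return keypoints, hand_positions
```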
- Step 206 Track the human body posture according to the human body posture information to obtain the human body motion trajectory information, and perform target tracking on the hand position according to the hand position information to obtain an image of the hand area;
- a target tracking algorithm is used; for example, the Camshift algorithm, whose search window adapts to the size and shape of the moving target, can track the motion trajectories of the human body and the hand respectively to obtain the human body motion trajectory information, and the hand position is expanded during tracking to obtain the hand region image.
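- A minimal sketch of CamShift-based hand tracking with OpenCV, assuming `frames` is a list of BGR images and `init_box` the hand box from the pose step; the hue-histogram back-projection used here is the standard CamShift recipe, not necessarily the patent's exact configuration:

```python
import cv2

def track_hand(frames, init_box):
    """Track a hand box across a list of BGR frames; returns one box per frame."""
    x, y, w, h = init_box
    hsv_roi = cv2.cvtColor(frames[0][y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    # Hue histogram of the initial hand region, used for back-projection.
    roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    track_window, boxes = (x, y, w, h), []
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        # CamShift adapts the search window to the moving target's size/shape.
        _rot_rect, track_window = cv2.CamShift(back_proj, track_window, term_crit)
        boxes.append(track_window)   # expand each box to crop the hand region
    return boxes
```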
- Step 208 Perform item recognition on the image of the hand area through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
- the classification recognition model is an item recognition model, and an item recognition model trained by deep learning can be used.
- the hand area image is input to the classification and recognition model, and the hand area image is detected in the classification and recognition model to determine whether there is an item in the hand area; if so, the classification and recognition model recognizes the item and outputs the item recognition result. On the other hand, the classification and recognition model can also judge the skin color of the hand region image and issue timely early warnings when clothing or other items are deliberately used to cover the hands, so as to reduce cargo damage.
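- The patent does not disclose its skin-color test; one common heuristic, shown here purely as an assumption, thresholds the fraction of skin-toned HSV pixels in the hand crop:

```python
import cv2
import numpy as np

SKIN_LO = np.array([0, 30, 60], np.uint8)      # illustrative HSV skin bounds
SKIN_HI = np.array([25, 180, 255], np.uint8)

def hand_seems_covered(hand_crop, min_skin_ratio=0.2):
    """Flag a hand region whose skin-toned pixel ratio is suspiciously low."""
    hsv = cv2.cvtColor(hand_crop, cv2.COLOR_BGR2HSV)
    skin_mask = cv2.inRange(hsv, SKIN_LO, SKIN_HI)
    skin_ratio = cv2.countNonZero(skin_mask) / skin_mask.size
    return skin_ratio < min_skin_ratio         # True -> issue an early warning
```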
- Step 210 Obtain a first interactive behavior recognition result according to the human body motion trajectory information and the item recognition result.
- the first interaction behavior recognition result is the interaction behavior recognition result between people and objects.
- the aforementioned human body motion trajectory information can be used to determine a person's behavior, such as stretching, leaning, bending, and squatting; then, according to whether the hand holds an object and, when it does, the item recognition result obtained from item recognition, it can be determined whether the human body is picking up or putting down the item, that is, the recognition result of the interaction between the person and the item is obtained, as in the sketch below.
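- A minimal rule sketch of this fusion step, under the assumption that the trajectory is reduced to per-frame hand-to-shelf distances and the item result to a per-frame held/not-held flag; the threshold is illustrative:

```python
REACH_THRESHOLD = 0.10   # metres; illustrative hand-to-shelf reach distance

def first_interaction_result(hand_shelf_distances, item_held_flags):
    """Fuse trajectory and item recognition into a pickup/put-down decision.

    hand_shelf_distances: per-frame distance of the hand to the shelf front.
    item_held_flags: per-frame bool, True when the classifier sees an item.
    """
    reached_in = min(hand_shelf_distances) < REACH_THRESHOLD
    held_before, held_after = item_held_flags[0], item_held_flags[-1]
    if reached_in and not held_before and held_after:
        return "pickup"       # empty hand went in, came back holding an item
    if reached_in and held_before and not held_after:
        return "putdown"      # item went in, hand came back empty
    return "no_interaction"
```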
- a detection model and a classification and recognition model are used to recognize the interactive behavior in the image to be detected. After model training and algorithm tuning, the interactive behavior between people and objects can be recognized automatically and accurately; and, based on the current detection model and classification and recognition model, only a small amount of data needs to be collected for deployment in different scenarios, giving strong portability and low deployment cost.
- the method includes the following steps:
- Step 302 Obtain an image to be detected
- Step 304 Perform preset processing on the image to be detected to obtain a human body image in the image to be detected;
- step 304 is a process of extracting, from the image to be detected, the human body image needed in subsequent steps, while masking out the unneeded background image.
- the foregoing preset processing may adopt background modeling, that is, Gaussian-mixture background modeling is performed on the image to be detected to obtain a background model, and the foreground against this background model yields the human body image in the image to be detected; a minimal sketch follows.
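- A minimal sketch of this preset processing using OpenCV's mixture-of-Gaussians subtractor (MOG2); the patent only states that Gaussian-mixture background modeling is used, so the parameters and morphology step here are assumptions:

```python
import cv2

# MOG2 maintains a per-pixel mixture of Gaussians as the background model.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def extract_human(frame):
    """Return the foreground (human body) of a BGR frame, background zeroed."""
    mask = subtractor.apply(frame)                  # 0=bg, 255=fg, 127=shadow
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadows
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove speckle
    return cv2.bitwise_and(frame, frame, mask=mask)
```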
- Step 306 Perform human posture detection on the human body image by using a preset detection model to obtain human posture information and hand position information;
- Step 308 Track the human body posture according to the human body posture information to obtain the human body motion trajectory information, and perform target tracking on the hand position according to the hand position information to obtain an image of the hand area;
- Step 310 Perform item recognition on the image of the hand area through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
- Step 312 Obtain a first interactive behavior recognition result according to the human body motion track information and the item recognition result.
- step 304 masks out unnecessary background images by preprocessing the image to be detected, and only retains the human body image to be used later, thereby reducing the amount of data to be processed in the next step and improving the data processing efficiency.
- the method further includes:
- the human body position information may refer to the position information of the human body in the three-dimensional world coordinate system.
- the acquisition position information of the image to be detected in the three-dimensional world coordinate system is acquired; then, according to the position of the human body image within the image to be detected and the acquisition position information, a three-dimensional world coordinate transformation is performed to obtain the position information of the human body in the three-dimensional world coordinate system, as sketched below.
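- A hedged sketch of the coordinate transformation, assuming a pinhole camera model with intrinsics K and a camera-to-world pose (R, t) taken from the acquisition position information; the patent does not specify a particular formulation:

```python
import numpy as np

def pixel_to_world(u, v, d, K, R, t):
    """Map pixel (u, v) with metric depth d into world coordinates.

    K: 3x3 camera intrinsics; R (3x3), t (3,): camera-to-world pose.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Back-project the RGBD pixel into the camera frame.
    p_cam = np.array([(u - cx) * d / fx, (v - cy) * d / fy, d])
    # Rigid transform into the three-dimensional world coordinate system.
    return R @ p_cam + t
```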
- the second interactive behavior recognition result is obtained, and the second interactive behavior recognition result is the human-goods interaction behavior recognition result.
- the shelf information includes shelf position information and item information in the shelf, and the shelf position information is the three-dimensional world coordinate position of the shelf.
- the shelf information corresponding to the position of the human body is obtained according to the human body position information and the preset shelf information; an interaction between the human body and the shelf is confirmed by tracking the three-dimensional world coordinate positions of the human body and the shelf, and during tracking, identifying whether there are goods associated with the shelf in the hand area further confirms the occurrence of an effective human-goods interaction behavior.
- an effective human-goods interaction behavior corresponds, for example, to a customer completing a pickup from the shelf.
- This technical solution converts the customer's position into the world coordinate system through the three-dimensional world coordinate transformation and associates it with the shelf, so it can identify whether the customer has an effective human-goods interaction behavior. On this basis, under the premise that the shelf stock is known, monitoring the number of effective interactions between people and the shelf indirectly realizes an inventory of the shelf's existing stock, and the server can promptly remind the clerk to attend to the goods, greatly reducing the cost of manual inventory. A sketch of the association follows.
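- An illustrative sketch of associating the human body position with preset shelf information and updating an indirect stock count; the shelf records, the distance threshold, and the `pickup`/`putdown` labels are assumptions carried over from the earlier sketch:

```python
import math

# Illustrative preset shelf information: world position and known stock.
SHELVES = {
    "shelf_A": {"position": (1.0, 0.5, 0.0), "stock": 30},
    "shelf_B": {"position": (3.0, 0.5, 0.0), "stock": 24},
}

def second_interaction_result(body_pos, first_result, max_dist=1.0):
    """Attribute a confirmed pickup/put-down to the nearest shelf, update stock."""
    shelf_id = min(SHELVES,
                   key=lambda s: math.dist(SHELVES[s]["position"], body_pos))
    if math.dist(SHELVES[shelf_id]["position"], body_pos) > max_dist:
        return None                      # person not close enough to any shelf
    if first_result == "pickup":
        SHELVES[shelf_id]["stock"] -= 1  # indirect inventory of known stock
    elif first_result == "putdown":
        SHELVES[shelf_id]["stock"] += 1
    return shelf_id, first_result        # the human-goods interaction record
```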
- the method further includes a detection model training step, which specifically includes the following steps:
- Step 402 Obtain sample image data
- the preset second shooting angle of view may be a top-down angle of view perpendicular to the ground or nearly perpendicular to the ground, and the sample image data is RGBD data.
- Step 404 Perform key point annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data;
- the sample image data needs to basically cover different human-goods interaction behaviors in the actual scene.
- the sample data can also be enhanced to increase the number of sample images and to raise the proportion of training samples with large postures during interaction behavior, such as increasing the proportion of side-profile, bending, squatting and other human-goods interaction postures, which improves the detection accuracy of the detection model.
- a part of the first annotated image data can be used as a training data set, and the remaining part can be used as a verification data set.
- Step 406 Perform image enhancement processing on the first annotated image data to obtain a first training data set; in a specific implementation process, perform image enhancement processing on the training data set in the first annotated image data to obtain the first training data set.
- the image enhancement processing may include any one or more of the following image transformation methods: image normalization, random image cropping, image scaling, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, and adding hue-interference blocks to the image; a possible library mapping is sketched below.
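- A possible mapping of the listed transformations onto torchvision, shown as an assumption since the patent names the operations but no library; note that for the detection model the keypoint annotations would have to be transformed together with the image:

```python
from torchvision import transforms

# One possible realization of the listed enhancement operations.
augment = transforms.Compose([
    transforms.RandomResizedCrop(256),                   # random crop + scaling
    transforms.RandomHorizontalFlip(),                   # image flipping
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),  # affine transform
    transforms.ColorJitter(contrast=0.3, saturation=0.3, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],     # image normalization
                         std=[0.229, 0.224, 0.225]),
])
```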
- Step 408 Input the first training data set into the HRNet model for training to obtain a detection model.
- different network architectures of the HRNet model can be used to train the human posture detection model; each model trained with a different architecture is verified and evaluated on the verification data set, and the best-performing model is selected and set as the detection model, as in the sketch below.
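- A compact sketch of this selection step; `build_fn`, `train_fn`, and `eval_fn` are placeholders for the user's own training utilities and are not defined by the patent:

```python
def select_detection_model(candidates, build_fn, train_fn, eval_fn,
                           train_set, val_set):
    """Train each candidate HRNet architecture; keep the best on validation."""
    best_model, best_score = None, float("-inf")
    for cfg in candidates:               # e.g. ["hrnet_w18", "hrnet_w32", ...]
        model = build_fn(cfg)
        train_fn(model, train_set)
        score = eval_fn(model, val_set)  # e.g. PCK over annotated keypoints
        if score > best_score:
            best_model, best_score = model, score
    return best_model                    # set as the preset detection model
```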
- the method further includes a classification and recognition model training step, which specifically includes the following steps:
- Step 502 Obtain sample image data
- Step 504 Label the hand area in the sample image data and label the items located in the hand area to obtain the second label image data;
- Step 506 Perform image enhancement processing on the second annotated image data to obtain a second training data set
- the image enhancement processing may include any one or more of the following image transformation methods: image normalization, random image cropping, image scaling, image flipping, image affine transformation, image contrast change, image hue change, image saturation change, and adding hue-interference blocks to the image.
- Step 508 Input the second training data set into the yolov3-tiny network or the vgg16 network for training to obtain a preset classification and recognition model.
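- A minimal training sketch using torchvision's VGG16, one of the two networks named above; the data loader, class count, and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

def train_classifier(loader, num_item_classes, epochs=10):
    """Fine-tune VGG16 on hand-region crops; loader yields (images, labels)."""
    model = models.vgg16(weights=None)
    model.classifier[6] = nn.Linear(4096, num_item_classes)  # new final layer
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model                         # the preset classification model
```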
- RGBD data is collected through a depth camera whose line of sight is perpendicular, or close to perpendicular, to the ground; RGBD data containing human-goods interaction behavior is then manually selected as training samples, that is, sample image data, and deep learning training is used so that the resulting model can recognize different postures of the human body. The detection model can thus recognize interactive behaviors more flexibly and accurately, and has strong portability.
- An interactive behavior recognition device is provided, including: a first acquisition module 602, a first detection module 604, a tracking module 606, a second detection module 608, and a first interactive behavior recognition module 610, wherein:
- the first acquisition module 602 is used to acquire the image to be detected
- the first detection module 604 is configured to perform human posture detection on the image to be detected using a preset detection model to obtain human posture information and hand position information, and the detection model is used to perform human posture detection;
- the tracking module 606 is used to track the human body posture according to the human body posture information to obtain the human body motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain an image of the hand area;
- the second detection module 608 is configured to perform item recognition on the image of the hand area through a preset classification and recognition model to obtain an item recognition result, and the classification and recognition model is used for item recognition;
- the first interactive behavior recognition module 610 is configured to obtain the first interactive behavior recognition result according to the human body motion track information and the item recognition result.
- the first detection module 604 is also used to perform preset processing on the image to be detected to obtain the human body image in the image to be detected, and to perform human body posture detection on the human body image through the preset detection model to obtain human body posture information and hand position information.
- the device further includes:
- the human body position module is used to obtain human body position information according to the image to be detected
- the second interactive behavior recognition module is used to obtain the second interactive behavior recognition result according to the human body movement track information, the item recognition result, the human body position information and the preset shelf information, and the second interactive behavior recognition result is the human-goods interaction behavior recognition result.
- the first acquisition module 602 is also used to acquire the to-be-detected image collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead viewing angle perpendicular to the ground, and the image to be detected is RGBD data.
- the device further includes:
- the second acquisition module is used to acquire sample image data
- the first labeling module is used to label the key points and hand positions of the human body image in the sample image data to obtain the first labelled image data;
- the first enhancement module is configured to perform image enhancement processing on the first annotated image data to obtain a first training data set
- the first training module is used to input the first training data set into the HRNet model for training to obtain the detection model.
- the device further includes:
- the second labeling module is used to label the hand area in the sample image data and label the items located in the hand area to obtain the second label image data;
- the second enhancement module is configured to perform image enhancement processing on the second annotated image data to obtain a second training data set
- the second training module is used to input the second training data set into the yolov3-tiny network or the vgg16 network for training to obtain a preset classification and recognition model.
- the second acquisition module is also used to acquire image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range, and to filter the collected image data to obtain sample image data with human-goods interaction behavior; the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
- Each module in the above-mentioned interactive behavior recognition device can be implemented in whole or in part by software, hardware, and a combination thereof.
- the above modules may be embedded in, or independent of, the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
- a computer device is provided.
- the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
- the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
- the processor of the computer device is used to provide calculation and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, a computer program, and a database.
- the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
- the database of the computer equipment is used to store data.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer program is executed by the processor to realize an interactive behavior identification method.
- FIG. 7 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
- the specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
- a computer device including a memory, a processor, and a computer program stored in the memory and running on the processor.
- when the processor executes the computer program, the following steps are implemented: acquiring an image to be detected; performing human posture detection on the image to be detected through the preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; tracking the human posture according to the human posture information to obtain human motion trajectory information, and performing target tracking on the hand position according to the hand position information to obtain the hand area image; performing item recognition on the hand area image through the preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and obtaining the first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
- the processor further implements the following steps when executing the computer program: performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information includes: performing preset processing on the image to be detected to obtain the human body image in the image to be detected; and performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.
- the processor further implements the following steps when executing the computer program: obtaining the human body position information according to the image to be detected; and obtaining the second interactive behavior recognition result according to the human body motion trajectory information, the item recognition result, the human body position information and the preset shelf information, the second interactive behavior recognition result being the human-goods interaction behavior recognition result.
- the processor further implements the following steps when executing the computer program: acquiring the image to be detected includes: acquiring the image to be detected collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
- the processor further implements the following steps when executing the computer program: acquiring sample image data; performing key point annotation and hand position annotation on the human body image in the sample image data to obtain the first annotated image data; performing image enhancement processing on the first annotated image data to obtain the first training data set; and inputting the first training data set into the HRNet model for training to obtain the detection model.
- the processor further implements the following steps when executing the computer program: labeling the hand area in the sample image data and labeling the items located in the hand area to obtain the second annotated image data; performing image enhancement processing on the second annotated image data to obtain a second training data set; and inputting the second training data set into a convolutional neural network for training to obtain a preset classification and recognition model.
- the processor further implements the following steps when executing the computer program: acquiring sample image data includes: acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range; and filtering the collected image data to obtain sample image data with human-goods interaction behavior; the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
- a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: acquiring an image to be detected; performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; tracking the human posture according to the human posture information to obtain human motion trajectory information, and performing target tracking on the hand position according to the hand position information to obtain the hand area image; performing item recognition on the hand area image through the preset classification and recognition model to obtain the item recognition result, the classification and recognition model being used for item recognition; and obtaining the first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
- when the computer program is executed by the processor, the following steps are further implemented: performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information includes: performing preset processing on the image to be detected to obtain the human body image in the image to be detected; and performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.
- when the computer program is executed by the processor, the following steps are also implemented: obtaining human body position information according to the image to be detected; and obtaining the second interactive behavior recognition result according to the human body motion trajectory information, the item recognition result, the human body position information, and the preset shelf information, the second interactive behavior recognition result being the human-goods interaction behavior recognition result.
- acquiring the image to be detected includes: acquiring the image to be detected collected by the image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is the overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
- when the computer program is executed by the processor, the following steps are also implemented: obtaining sample image data; performing key point annotation and hand position annotation on the human body image in the sample image data to obtain the first annotated image data; performing image enhancement processing on the first annotated image data to obtain a first training data set; and inputting the first training data set into an HRNet model for training to obtain a detection model.
- when the computer program is executed by the processor, the following steps are further implemented: labeling the hand area in the sample image data and labeling the items located in the hand area to obtain the second annotated image data; performing image enhancement processing on the second annotated image data to obtain a second training data set; and inputting the second training data set into a convolutional neural network for training to obtain a preset classification and recognition model.
- acquiring sample image data includes: acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range; and filtering the collected image data to obtain sample image data with human-goods interaction behavior.
- the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (10)
- An interactive behavior recognition method, characterized in that the method comprises: acquiring an image to be detected; performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; tracking the human posture according to the human posture information to obtain human motion trajectory information; performing target tracking on the hand position according to the hand position information to obtain a hand region image; performing item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and obtaining a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
- The method according to claim 1, characterized in that performing human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information comprises: performing preset processing on the image to be detected to obtain a human body image in the image to be detected; and performing human posture detection on the human body image through the preset detection model to obtain the human posture information and the hand position information.
- The method according to claim 2, characterized in that the method further comprises: obtaining human body position information according to the image to be detected; and obtaining a second interactive behavior recognition result according to the human motion trajectory information, the item recognition result, the human body position information and preset shelf information, the second interactive behavior recognition result being a human-goods interaction behavior recognition result.
- The method according to claim 3, characterized in that acquiring the image to be detected comprises: acquiring the image to be detected collected by an image acquisition device at a preset first shooting angle of view; preferably, the preset first shooting angle of view is an overhead angle of view perpendicular to the ground, and the image to be detected is RGBD data.
- The method according to any one of claims 1 to 4, characterized in that the method further comprises: acquiring sample image data; performing key point annotation and hand position annotation on the human body image in the sample image data to obtain first annotated image data; performing image enhancement processing on the first annotated image data to obtain a first training data set; and inputting the first training data set into an HRNet model for training to obtain the detection model.
- The method according to claim 5, characterized in that the method further comprises: annotating the hand region in the sample image data and annotating the item category of items located in the hand region to obtain second annotated image data; performing image enhancement processing on the second annotated image data to obtain a second training data set; and inputting the second training data set into a convolutional neural network for training to obtain the preset classification and recognition model; preferably, the convolutional neural network is a yolov3-tiny network or a vgg16 network.
- The method according to claim 6, characterized in that acquiring sample image data comprises: acquiring image data collected by the image acquisition device at a preset second shooting angle of view within a preset time range; and filtering the collected image data to obtain sample image data with human-goods interaction behavior; preferably, the preset second shooting angle of view is an overhead angle of view perpendicular to the ground, and the sample image data is RGBD data.
- An interactive behavior recognition device, characterized in that the device comprises: a first acquisition module, used to acquire an image to be detected; a first detection module, used to perform human posture detection on the image to be detected through a preset detection model to obtain human posture information and hand position information, the detection model being used for human posture detection; a tracking module, used to track the human posture according to the human posture information to obtain human motion trajectory information, and to perform target tracking on the hand position according to the hand position information to obtain a hand region image; a second detection module, used to perform item recognition on the hand region image through a preset classification and recognition model to obtain an item recognition result, the classification and recognition model being used for item recognition; and a first interactive behavior recognition module, used to obtain a first interactive behavior recognition result according to the human motion trajectory information and the item recognition result.
- A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
- A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3154025A CA3154025A1 (en) | 2019-09-11 | 2020-06-19 | Interactive behavior recognizing method, device, computer equipment and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910857295.7 | 2019-09-11 | ||
CN201910857295.7A CN110674712A (zh) | 2019-09-11 | 2019-09-11 | Interactive behavior recognition method and apparatus, computer device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021047232A1 (zh) | 2021-03-18 |
Family
ID=69077877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/096994 WO2021047232A1 (zh) | 2019-09-11 | 2020-06-19 | 交互行为识别方法、装置、计算机设备和存储介质 |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN110674712A (zh) |
CA (1) | CA3154025A1 (zh) |
WO (1) | WO2021047232A1 (zh) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674712A (zh) | 2019-09-11 | 2020-01-10 | 苏宁云计算有限公司 | Interactive behavior recognition method and apparatus, computer device and storage medium |
CN111259817A (zh) | 2020-01-17 | 2020-06-09 | 维沃移动通信有限公司 | Item list creation method and electronic device |
CN111208148A (zh) | 2020-02-21 | 2020-05-29 | 凌云光技术集团有限责任公司 | Light-leakage defect detection system for hole-punch display screens |
CN111339903B (zh) | 2020-02-21 | 2022-02-08 | 河北工业大学 | Multi-person human pose estimation method |
CN111507231B (zh) | 2020-04-10 | 2023-06-23 | 盛景智能科技(嘉兴)有限公司 | Automatic detection method and system for the correctness of process steps |
CN111679737B (zh) | 2020-05-27 | 2022-06-21 | 维沃移动通信有限公司 | Hand segmentation method and electronic device |
CN111563480B (zh) | 2020-06-01 | 2024-01-12 | 北京嘀嘀无限科技发展有限公司 | Conflict behavior detection method and apparatus, computer device and storage medium |
CN111797728B (zh) | 2020-06-19 | 2024-06-14 | 浙江大华技术股份有限公司 | Moving object detection method and apparatus, computing device and storage medium |
CN111882601B (zh) | 2020-07-23 | 2023-08-25 | 杭州海康威视数字技术股份有限公司 | Positioning method, apparatus and device |
CN114093019A (zh) | 2020-07-29 | 2022-02-25 | 顺丰科技有限公司 | Throwing action detection model training method and apparatus, and computer device |
CN114302050A (zh) | 2020-09-22 | 2022-04-08 | 阿里巴巴集团控股有限公司 | Image processing method and device, and non-volatile storage medium |
CN111931740B (zh) | 2020-09-29 | 2021-01-26 | 创新奇智(南京)科技有限公司 | Commodity sales volume recognition method and apparatus, electronic device and storage medium |
CN112132868B (zh) | 2020-10-14 | 2024-02-27 | 杭州海康威视系统技术有限公司 | Payment information determination method, apparatus and device |
CN112418118A (zh) | 2020-11-27 | 2021-02-26 | 招商新智科技有限公司 | Unsupervised under-bridge pedestrian intrusion detection method and apparatus |
CN112560646A (zh) | 2020-12-09 | 2021-03-26 | 上海眼控科技股份有限公司 | Transaction behavior detection method, apparatus, device and storage medium |
CN112784760B (zh) | 2021-01-25 | 2024-04-12 | 北京百度网讯科技有限公司 | Human behavior recognition method, apparatus, device and storage medium |
CN112949689A (zh) | 2021-02-01 | 2021-06-11 | Oppo广东移动通信有限公司 | Image recognition method and apparatus, electronic device and storage medium |
CN114241354A (zh) | 2021-11-19 | 2022-03-25 | 上海浦东发展银行股份有限公司 | Warehouse personnel behavior recognition method and apparatus, computer device and storage medium |
CN114327062A (zh) | 2021-12-28 | 2022-04-12 | 深圳Tcl新技术有限公司 | Human-computer interaction method and apparatus, electronic device, storage medium and program product |
CN114429647A (zh) | 2022-01-21 | 2022-05-03 | 上海浦东发展银行股份有限公司 | Progressive human-object interaction recognition method and system |
2019
- 2019-09-11 CN CN201910857295.7A patent/CN110674712A/zh active Pending
2020
- 2020-06-19 WO PCT/CN2020/096994 patent/WO2021047232A1/zh active Application Filing
- 2020-06-19 CA CA3154025A patent/CA3154025A1/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102881100A (zh) * | 2012-08-24 | 2013-01-16 | 济南纳维信息技术有限公司 | Anti-theft monitoring method for physical storefronts based on video analysis |
CN105518734A (zh) * | 2013-09-06 | 2016-04-20 | 日本电气株式会社 | Customer behavior analysis system, customer behavior analysis method, non-transitory computer-readable medium, and shelf system |
US20170061204A1 (en) * | 2014-05-12 | 2017-03-02 | Fujitsu Limited | Product information outputting method, control device, and computer-readable recording medium |
CN105245828A (zh) * | 2015-09-02 | 2016-01-13 | 北京旷视科技有限公司 | Article analysis method and device |
CN107424273A (zh) * | 2017-07-28 | 2017-12-01 | 杭州宇泛智能科技有限公司 | Management method for an unmanned supermarket |
CN109977896A (zh) * | 2019-04-03 | 2019-07-05 | 上海海事大学 | Intelligent supermarket vending system |
CN110674712A (zh) * | 2019-09-11 | 2020-01-10 | 苏宁云计算有限公司 | Interactive behavior recognition method and apparatus, computer device and storage medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113031464A (zh) * | 2021-03-22 | 2021-06-25 | 北京市商汤科技开发有限公司 | Device control method and apparatus, electronic device and storage medium |
CN113448443A (zh) * | 2021-07-12 | 2021-09-28 | 交互未来(北京)科技有限公司 | Hardware-based large-screen interaction method, apparatus and device |
CN113687715A (zh) * | 2021-07-20 | 2021-11-23 | 温州大学 | Computer-vision-based human-computer interaction system and interaction method |
CN113792700A (zh) * | 2021-09-24 | 2021-12-14 | 成都新潮传媒集团有限公司 | Electric bicycle box-entry detection method and apparatus, computer device and storage medium |
CN113792700B (zh) | 2021-09-24 | 2024-02-27 | 成都新潮传媒集团有限公司 | Electric bicycle box-entry detection method and apparatus, computer device and storage medium |
CN114274184A (zh) * | 2021-12-17 | 2022-04-05 | 重庆特斯联智慧科技股份有限公司 | Projection-guided human-machine interaction method and system for logistics robots |
CN114274184B (zh) | 2021-12-17 | 2024-05-24 | 重庆特斯联智慧科技股份有限公司 | Projection-guided human-machine interaction method and system for logistics robots |
Also Published As
Publication number | Publication date |
---|---|
CN110674712A (zh) | 2020-01-10 |
CA3154025A1 (en) | 2021-03-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021047232A1 (zh) | Interactive behavior recognition method and apparatus, computer device and storage medium | |
CN110502986B (zh) | Method and apparatus for recognizing person positions in images, computer device and storage medium | |
WO2021043073A1 (zh) | Image-recognition-based urban pet activity trajectory monitoring method and related device | |
US10089556B1 | Self-attention deep neural network for action recognition in surveillance videos | |
Shen et al. | The first facial landmark tracking in-the-wild challenge: Benchmark and results | |
US8379920B2 | Real-time clothing recognition in surveillance videos | |
Patruno et al. | People re-identification using skeleton standard posture and color descriptors from RGB-D data | |
CN105740780B (zh) | Face liveness detection method and apparatus | |
CN111325769B (zh) | Target object detection method and apparatus | |
CN110991261A (zh) | Interactive behavior recognition method and apparatus, computer device and storage medium | |
CN111626123A (zh) | Video data processing method and apparatus, computer device and storage medium | |
CN110889355B (zh) | Face recognition verification method, system and storage medium | |
CN111178252A (zh) | Identity recognition method based on multi-feature fusion | |
US11062126B1 | Human face detection method | |
US20190228209A1 | Lip movement capturing method and device, and storage medium | |
CN106682641A (zh) | Pedestrian recognition method for images based on FHOG-LBPH features | |
WO2019033570A1 (zh) | Lip movement analysis method, apparatus and storage medium | |
CN110717449A (zh) | Behavior detection method and apparatus for vehicle annual-inspection personnel, and computer device | |
CN105893957A (zh) | Vision-based lake-surface vessel detection, recognition and tracking method | |
CN113780145A (zh) | Sperm morphology detection method and apparatus, computer device and storage medium | |
CN112541394A (zh) | Dark circle and rhinitis recognition method, system and computer medium | |
CN116912880A (zh) | Bird recognition quality assessment method and system based on bird keypoint detection | |
CN115375991A (zh) | Adaptive target detection method for strong/weak illumination and foggy environments | |
CN111402185B (zh) | Image detection method and apparatus | |
CN117274843B (zh) | UAV front-end defect recognition method and system based on lightweight edge computing |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20863604; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 3154025; Country of ref document: CA
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20863604; Country of ref document: EP; Kind code of ref document: A1