CA3160731A1 - Interactive behavior recognizing method, device, computer equipment and storage medium - Google Patents

Interactive behavior recognizing method, device, computer equipment and storage medium

Info

Publication number
CA3160731A1
Authority
CA
Canada
Prior art keywords
image
pedestrian
preset
detected
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3160731A
Other languages
French (fr)
Inventor
Daiwei YU
Hao Sun
Yuqing Dong
Xiyang ZHUANG
Yongxiang Li
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd filed Critical 10353744 Canada Ltd
Publication of CA3160731A1 publication Critical patent/CA3160731A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The present application relates to an interactive behavior recognizing method and device, a computer equipment and a storage medium. The method comprises: acquiring an image to be detected; inputting the image into a preset multitask model to obtain key points and a detection box of a pedestrian in the image, wherein the key points are located inside the detection box, and the multitask model is used for pedestrian detection and human body key point detection; and, according to the key points of the pedestrian and a preset goods-rack image corresponding to the image, determining interactive behavior information of the pedestrian and a corresponding goods rack. By using the present method, interactive behavior between pedestrians and goods may be efficiently recognized.

Description

INTERACTIVE BEHAVIOR RECOGNIZING METHOD, DEVICE, COMPUTER
EQUIPMENT AND STORAGE MEDIUM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present application relates to the field of computer vision technology, and more particularly to an interactive behavior recognizing method, and corresponding device, computer equipment and storage medium.
Description of Related Art
[0002] With the maturing of the Internet, the retail industry has entered a stage of rapid development. The retail of the future will be smart retail, in which techniques such as the Internet and big data are used to perceive the consumption habits of users, so as to provide consumers with diversified and personalized products and services; the recognition of human-goods interactive behaviors is a problem that the field of smart retail must address.
[0003] The traditional method of recognizing human-goods interactive behaviors usually relies on acoustic, optical and electrical sensor equipment; this entails high hardware cost, restricts the scenarios of use, and makes large-scale application in such complicated environments as shopping malls and supermarkets impossible. Surveillance equipment in shopping malls and supermarkets produces great volumes of video data every day, and analysis of the surveillance videos can yield much information relevant to human-goods interactive behaviors, but doing so requires a huge input of manpower and suffers from low efficiency.

SUMMARY OF THE INVENTION
[0004] In view of the above technical problems, there is an urgent need to provide an interactive behavior recognizing method, and corresponding device, computer equipment and storage medium enabling highly effective recognition of interactive behaviors between human bodies and goods.
[0005] There is provided an interactive behavior recognizing method that comprises:
[0006] obtaining an image to be detected;
[0007] inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and
[0008] determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0009] In one of the embodiments, the preset goods-rack image is a preset goods-rack mask image, and the step of determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds includes:
[0010] selecting a wrist key point from the key points of the pedestrian;
[0011] acquiring a hand region of the pedestrian according to the wrist key point and a preset radius threshold;
[0012] determining that an interactive behavior occurs between the pedestrian and the corresponding goods rack when an intersection area between an image of the hand region and the preset goods-rack mask image is greater than a preset area threshold;
and
[0013] determining that no interactive behavior occurs between the pedestrian and the corresponding goods rack when the intersection area between the image of the hand region and the preset goods-rack mask image is smaller than or equal to the preset area threshold.
[0014] In one of the embodiments, the method further comprises:
[0015] selecting any point in the detection box of the pedestrian to serve as an anchor point, and setting position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian;
[0016] mapping the first position coordinates of the pedestrian to a world coordinate system according to a preset coordinate mapping relation, and acquiring second position coordinates of the pedestrian, wherein the second position coordinates are position coordinates of the pedestrian in the world coordinate system; and
[0017] collecting second position coordinates of various time points of the pedestrian within a preset time interval, and acquiring a route map of the pedestrian within the preset time interval.
[0018] In one of the embodiments, the method further comprises:
[0019] acquiring orientation information of the pedestrian according to the key points of the pedestrian; and
[0020] acquiring a goods-rack region towards which the pedestrian orients according to the orientation information of the pedestrian and the preset goods-rack image.
[0021] In one of the embodiments, the step of acquiring orientation information of the pedestrian according to the key points of the pedestrian includes:
[0022] selecting shoulder key points from the key points of the pedestrian, wherein the shoulder key points include a left shoulder key point and a right shoulder key point;
[0023] calculating a difference between coordinates of the left shoulder key point and coordinates of the right shoulder key point, and obtaining a shoulder vector;
[0024] employing an inverse cosine function to calculate an included angle formed by the shoulder vector and a preset unit vector, wherein the preset unit vector is a unit vector on a y-axis negative direction of a coordinate system of the image to be detected;
[0025] summating a radian value of the included angle with π, and acquiring an orientation angle of the pedestrian;
[0026] determining that the pedestrian orients towards one side of the image to be detected when the orientation angle is greater than or equal to π and smaller than 1.5π;
and
[0027] determining that the pedestrian orients towards another side of the image to be detected when the orientation angle is greater than 1.5π and smaller than or equal to 2π.
[0028] In one of the embodiments, the step of obtaining an image to be detected includes:
[0029] obtaining a surveillance video of a target site; and
[0030] screening an image with a pedestrian out of the surveillance video to serve as the image to be detected.
[0031] In one of the embodiments, the method further comprises:
[0032] obtaining a sample image;
[0033] marking key points and detection boxes of pedestrians in the sample image, and acquiring marked image data; and
[0034] inputting the marked image data in a neural network model for training, and acquiring the multitask model; preferably, the neural network model is embodied as a ResNet-101+FPN network model.
[0035] There is provided a human-goods interactive behavior recognizing device that comprises:
[0036] an obtaining module, for obtaining an image to be detected;
[0037] a detecting module, for inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points;
and
[0038] a recognizing module, for determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0039] There is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
[0040] obtaining an image to be detected;
[0041] inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and
[0042] determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0043] There is provided a computer-readable storage medium storing a computer program thereon, and the following steps are realized when the computer program is executed by a processor:
[0044] obtaining an image to be detected;
[0045] inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and
[0046] determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0047] In the aforementioned interactive behavior recognizing method, device, computer equipment and storage medium, an image to be detected is obtained and input in a preset multitask model to acquire key points and a detection box of a pedestrian in the image to be detected; through the multitask model used for the detection of pedestrians and for the detection of human body key points, this method makes it possible to synchronously obtain detection boxes and human body key points of pedestrians, so as to enhance image processing efficiency; the key points are all located inside the detection box, whereby it is made possible to exclude erroneous key points located outside the detection box, so as to achieve the objectives of making comprehensive use of the detection box and the key points, and enhancing marking precision of the key points; interactive behavior information of the pedestrian and the corresponding goods rack is determined according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds, whereby it is made possible to highly effectively recognize interactive behaviors, and to enhance recognition precision.
BRIEF DESCRIPTION OF THE DRAWINGS
[0048] Fig. 1 is a view illustrating the application environment for the interactive behavior recognizing method in an embodiment;
[0049] Fig. 2 is a flowchart schematically illustrating the interactive behavior recognizing method in an embodiment;
[0050] Fig. 3 is a flowchart schematically illustrating an interactive behavior judging step in an embodiment;
[0051] Fig. 4 is a flowchart schematically illustrating the interactive behavior recognizing method in another embodiment;
[0052] Fig. 5 is a block diagram illustrating the structure of the interactive behavior recognizing device in an embodiment; and
[0053] Fig. 6 is a view illustrating the internal structure of the computer equipment in an embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0054] To make more lucid and clear the objectives, technical solutions and advantages of the present application, the present application is described in greater detail below with reference to accompanying drawings and embodiments. As should be understood, the specific embodiments described here are merely meant to explain the present application, rather than to restrict the present application.
[0055] The interactive behavior recognizing method provided by the present application is applicable to the application environment as shown in Fig. 1, in which terminal 102 communicates with server 104 through network. Terminal 102 can be, but is not limited to be, any of various image collection devices, specifically, terminal 102 can be an existing surveillance equipment in such a site as a supermarket or a library, and server 104 can be embodied as an independent server or a server cluster consisting of a plurality of servers.
[0056] In one embodiment, as shown in Fig. 2, there is provided an interactive behavior recognizing method, and the method is explained with an example of its being applied to the server in Fig. 1, to comprise the following steps.
[0057] Step 202 - obtaining an image to be detected.
[0058] The image to be detected is an image containing pedestrians collected by an image collection device; the image collection device can be surveillance equipment, e.g., a camera already installed and in use in such a target site as a supermarket or a library, so it is not necessary to make any reconstruction on the target site, and the deployment cost is low.
[0059] Specifically, a surveillance video is obtained through a camera, and an image with pedestrians is screened out of the surveillance video to serve as the image to be detected.
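As a rough illustration of this screening step only, the following Python sketch samples frames from a surveillance video and keeps those that a person detector flags as containing a pedestrian; the function `has_pedestrian` is a hypothetical placeholder for any such detector and is not part of the disclosure.

```python
# Illustrative sketch: pull frames from a surveillance video and keep only
# those in which a pedestrian is found. `has_pedestrian` is a hypothetical
# placeholder for any person detector.
import cv2

def screen_frames(video_path, has_pedestrian, sample_every=25):
    """Yield frames that contain at least one pedestrian."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Sample sparsely so neighbouring, near-identical frames are skipped.
            if index % sample_every == 0 and has_pedestrian(frame):
                yield frame
            index += 1
    finally:
        cap.release()
```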
[0060] Step 204 - inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points.
[0061] The multitask model can obtain the detection box of the pedestrian in the image to be detected through detection of pedestrians and simultaneously obtain the key points of the pedestrian through detection of human body key points, so as to achieve synchronous obtainment of the detection box and the key points of the pedestrian; features are shared among different tasks, computational quantity is lowered, occupation of hardware resources is reduced, single-frame image processing time is shortened, images to be detected obtained from multichannel cameras can be processed at the same time, and parallel processing of multichannel cameras is realized.
[0062] Specifically, the obtained image to be detected is input in the preset multitask model; the multitask model performs detection of pedestrians and detection of human body key points on the image to be detected, and during processing it can exclude any key point located outside the detection box, so that the output key points are all located inside the detection box; finally, the multitask model outputs the key points and the detection box of the pedestrian in the image to be detected.
[0063] For instance, an image to be detected I is input in the multitask model, and the multitask model outputs key points P and detection boxes B of the pedestrians:
[0064] P = {P_1, P_2, ..., P_N}, P_i = {p_i^1, p_i^2, ..., p_i^K}, p_i^j = (x_i^j, y_i^j),
[0065] B = {B_1, B_2, ..., B_N}, B_i = (x_0^i, y_0^i, x_1^i, y_1^i, score),
[0066] where N indicates the number of pedestrians in the image to be detected, and K indicates the number of key points of each pedestrian, usually K = 17;
[0067] p_i^j = (x_i^j, y_i^j) indicates the coordinates of the jth key point of the ith person on the image to be detected; and
[0068] B_i = (x_0^i, y_0^i, x_1^i, y_1^i, score) indicates the coordinates of the upper left corner and the lower right corner of the detection box of the ith person on the image to be detected, and score indicates the confidence, namely the degree of credibility, of the detection box.
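To make the notation above concrete, the following minimal Python sketch pairs each pedestrian's key points with a detection box and discards any key point falling outside that box, matching the in-box constraint described earlier; the array layout and names are assumptions for illustration, not the model's actual interface.

```python
# Illustrative sketch: pair key points with detection boxes and keep only the
# key points that lie inside each box.
# Layout assumed: P is an (N, K, 2) array of (x, y); B is an (N, 5) array of
# (x0, y0, x1, y1, score); K is usually 17.
import numpy as np

def keypoints_inside_boxes(P, B):
    """Return, per pedestrian, only the key points located inside the detection box."""
    results = []
    for keypoints, (x0, y0, x1, y1, score) in zip(np.asarray(P), np.asarray(B)):
        inside = (
            (keypoints[:, 0] >= x0) & (keypoints[:, 0] <= x1)
            & (keypoints[:, 1] >= y0) & (keypoints[:, 1] <= y1)
        )
        results.append({"keypoints": keypoints[inside],
                        "box": (x0, y0, x1, y1),
                        "score": score})
    return results
```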
[0069] Step 206 - determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0070] Existing cameras, the target site layout, and the goods racks are located and marked in advance, and a corresponding preset goods-rack image is configured for each camera; since it is known through which camera an image to be detected is obtained, all images to be detected obtained via the same camera correspond to that camera, and hence also correspond to the preset goods-rack image configured for that camera.
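As a small illustration of this per-camera configuration, the mapping can be kept as a simple lookup table; the camera identifiers and file paths below are hypothetical and only sketch one way to hold the configuration.

```python
# Hypothetical per-camera configuration: each camera ID maps to the preset
# goods-rack mask image marked in advance for its field of view.
PRESET_RACK_MASKS = {
    "cam_entrance_01": "masks/cam_entrance_01_rack_mask.png",
    "cam_aisle_03": "masks/cam_aisle_03_rack_mask.png",
}

def rack_mask_for(camera_id):
    """Return the path of the preset goods-rack mask configured for a camera."""
    return PRESET_RACK_MASKS[camera_id]
```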
[0071] Specifically, a body-part key point can be selected from the key points of the pedestrian to serve as a reference key point, and whether an interactive behavior occurs between the pedestrian and the corresponding goods rack is then judged according to a relation between the reference key point and the preset goods-rack image, such as the distance or the intersection area between them.
[0072] In the aforementioned interactive behavior recognizing method, an image to be detected is obtained and input in a preset multitask model to acquire key points and a detection box of a pedestrian in the image to be detected; through the multitask model used for the detection of pedestrians and for the detection of human body key points, this method makes it possible to synchronously obtain detection boxes and human body key points of pedestrians, so as to enhance image processing efficiency; the key points are all located inside the detection box, whereby it is made possible to exclude erroneous key points located outside the detection box, so as to achieve the objectives of making comprehensive use of the detection box and the key points, and enhancing marking precision of the key points; interactive behavior information of the pedestrian and the corresponding goods rack is determined according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds, whereby it is made possible to highly effectively recognize interactive behaviors, and to enhance recognition precision; moreover, this method can achieve automatic processing throughout the process, human intervention is not required, and manpower cost is greatly lowered.
[0073] In one embodiment, as shown in Fig. 3, the preset goods-rack image is a preset goods-rack mask image; the preset goods-rack mask image can be an image obtained by extracting one frame from the surveillance video and thereafter polygonally marking the outer contour of a goods rack in that frame; the step of determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds includes the following:
[0074] Step 302 - selecting a wrist key point from the key points of the pedestrian.
[0075] Wrist key point data includes left wrist key point data and right wrist key point data.
[0076] Step 304 - acquiring a hand region of the pedestrian according to the wrist key point and a preset radius threshold.
[0077] Specifically, a left wrist key point and a right wrist key point are each taken as a center of a circle, and a preset radius threshold is taken as a radius to draw out a left hand region and a right hand region, so as to acquire an image of the left hand region and an image of the right hand region.
[0078] Step 306 - judging whether an intersection area between an image of the hand region and the preset goods-rack mask image is greater than a preset area threshold.
[0079] Step 308 - if yes, determining that an interactive behavior occurs between the pedestrian and the corresponding goods rack.
[0080] Step 310 - if not, determining that no interactive behavior occurs between the pedestrian and the corresponding goods rack.
[0081] In the above Step 306, the hand region includes a left hand region and a right hand region; specifically, when the intersection area between the image of at least one of the left hand region and the right hand region and the preset goods-rack mask image is greater than the preset area threshold, it is determined that an interactive behavior occurs between the pedestrian and the corresponding goods rack, otherwise it is determined that no interactive behavior occurs between the pedestrian and the corresponding goods rack.
[0082] For instance, HR_9 = {x : ||x − p_i^9|| ≤ R} indicates a hand region with the left wrist key point p_i^9 as the center of a circle and R as the radius, namely the left hand region; HR_10 = {x : ||x − p_i^10|| ≤ R} indicates a hand region with the right wrist key point p_i^10 as the center of a circle and R as the radius, namely the right hand region;
[0083] the preset area threshold is 150 units of area; when the area of the intersection HR ∩ M_s is greater than 150, where M_s denotes the preset goods-rack mask image, it is determined that an interactive behavior occurs between the pedestrian and the corresponding goods rack, i.e., the pedestrian is purchasing goods;
[0084] when the area of the intersection HR ∩ M_s is smaller than or equal to 150, it is determined that no interactive behavior occurs between the pedestrian and the corresponding goods rack, i.e., the pedestrian is not purchasing goods.
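By way of illustration only, the Python sketch below performs the intersection test described above, assuming the goods-rack mask is a binary image (non-zero inside the marked rack contour); the wrist indices (9 for the left wrist, 10 for the right) and the 150-unit threshold follow the example in the text, while the function name and array layout are assumptions.

```python
# Illustrative sketch of the hand-region / rack-mask intersection test.
import numpy as np
import cv2

def interacts_with_rack(keypoints, rack_mask, radius, area_threshold=150):
    """Return True if either hand disc overlaps the rack mask by more than the threshold."""
    h, w = rack_mask.shape
    for wrist_idx in (9, 10):                    # left wrist, right wrist
        x, y = keypoints[wrist_idx]
        # Draw the hand region as a filled disc of the preset radius around the wrist.
        hand_region = np.zeros((h, w), dtype=np.uint8)
        cv2.circle(hand_region, (int(x), int(y)), int(radius), 1, thickness=-1)
        # Count pixels shared by the hand disc and the goods-rack mask.
        intersection = int(np.count_nonzero(hand_region & (rack_mask > 0)))
        if intersection > area_threshold:
            return True
    return False
```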
[0085] In this embodiment, there is provided an interactive behavior recognizing method that judges interactive behavior by directly estimating the intersection area between the hand and the goods rack, so the method is simple and feasible, highly extensible, fast in computation, and good in real-time performance; this method is usually applied to the recognition of human-goods interactive behaviors in a supermarket, in which case the goods rack is a goods shelf of the supermarket, but the method is also applicable to the recognition of human-goods interactive behaviors in other sites, such as a library, in which case the goods rack is a book shelf of the library.
[0086] In one embodiment, the method further comprises:
[0087] Selecting any point in the detection box of the pedestrian to serve as an anchor point, and setting position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian;
[0088] Specifically, the center point of the detection box is selected to serve as the anchor point, because it is convenient to select and expresses the position of the pedestrian more precisely;
[0089] Mapping the first position coordinates of the pedestrian to a world coordinate system according to a preset coordinate mapping relation, and acquiring second position coordinates of the pedestrian, wherein the second position coordinates are position coordinates of the pedestrian in the world coordinate system;
[0090] The preset coordinate mapping relation here is a coordinate mapping relation between the coordinate system of the image to be detected and the world coordinate system; specifically, the position of the image collection device in the world coordinate system is demarcated in advance, the coordinate position, in the world coordinate system, of the image to be detected as collected by the image collection device can be obtained through the positional information of the image collection device, and the coordinate mapping relation between the coordinate system of the image to be detected and the world coordinate system is hence deduced;
[0091] Collecting second position coordinates of various time points of the pedestrian within a preset time interval, and acquiring a route map of the pedestrian within the preset time interval.
[0092] The preset time interval is the time taken for the pedestrian to walk into and out of the target site, and the route map of the pedestrian within the preset time interval is the route walked by the pedestrian from entry into the target site to exit from the target site, namely a moving-line map of the pedestrian; in conjunction with the layout map of the target site, the moving-line map of the pedestrian after entry into the target site can be drawn on the basis of that layout map.
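As one possible illustration of this mapping step, the sketch below projects the detection-box center from image coordinates to world coordinates with a planar homography calibrated in advance for the camera and accumulates the positions into a route; the homography H is an assumed 3x3 matrix, since the text only specifies a preset coordinate mapping relation without naming its form.

```python
# Illustrative sketch: map image positions to world positions and build a route.
import numpy as np

def image_to_world(point_xy, H):
    """Map an image point to world coordinates via a planar homography H (3x3)."""
    x, y = point_xy
    u, v, s = H @ np.array([x, y, 1.0])
    return (u / s, v / s)

def build_route(box_centres_over_time, H):
    """Collect the pedestrian's world positions over the preset time interval."""
    return [image_to_world(c, H) for c in box_centres_over_time]
```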
[0093] In this embodiment, there is provided an interactive behavior recognizing method that makes it possible to acquire a route map of the pedestrian within a preset time interval according to the detection box of the pedestrian and a preset coordinate mapping relation, so as to facilitate recording the action track of the pedestrian inside the target site within a preset time; when this method is applied to a supermarket, the action-line data of customers inside the supermarket from entry to exit can be observed directly and visually, and the operating personnel can readjust the layout of the supermarket in accordance with these data, so as to better adapt it to the purchasing habits of customers.
[0094] In one embodiment, the method further comprises the following:
[0095] acquiring orientation information of the pedestrian according to the key points of the pedestrian;

[0096] specifically, shoulder key points are selected from the key points of the pedestrian;
[0097] for instance, the shoulder key points include a left shoulder key point p_i^5 and a right shoulder key point p_i^6;
[0098] where p_i^5 = (x_i^5, y_i^5) and p_i^6 = (x_i^6, y_i^6);
[0099] a difference between the coordinates of the left shoulder key point and the coordinates of the right shoulder key point is calculated, and a shoulder vector is obtained:
[0100] p̄ = p_i^5 − p_i^6 = (x_i^5 − x_i^6, y_i^5 − y_i^6);
[0101] an inverse cosine function is employed to calculate the included angle formed by the shoulder vector and a preset unit vector ē, where the preset unit vector is the unit vector in the negative y-axis direction of the coordinate system of the image to be detected;
[0102] the radian value of the included angle is summated with π, and the orientation angle of the pedestrian is acquired:
[0103] θ = π + arccos( (p̄ · ē) / ||p̄|| );
[0104] it is determined that the pedestrian orients towards one side of the image to be detected when the orientation angle is greater than or equal to π and smaller than 1.5π; and it is determined that the pedestrian orients towards the other side of the image to be detected when the orientation angle is greater than 1.5π and smaller than or equal to 2π.
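The following Python sketch illustrates this orientation computation; the shoulder indices (5 for the left shoulder, 6 for the right) are an assumption following the common 17-key-point convention, and the side labels are placeholders for whichever sides the two angle ranges correspond to in a given camera view.

```python
# Illustrative sketch of the orientation-angle computation described above.
import numpy as np

def orientation_angle(keypoints):
    """Orientation angle theta in [pi, 2*pi] from the shoulder key points."""
    left = np.asarray(keypoints[5], dtype=float)   # left shoulder (assumed index)
    right = np.asarray(keypoints[6], dtype=float)  # right shoulder (assumed index)
    shoulder = left - right                         # shoulder vector
    e = np.array([0.0, -1.0])                       # negative y-axis unit vector
    cos_angle = np.dot(shoulder, e) / np.linalg.norm(shoulder)
    return np.pi + np.arccos(np.clip(cos_angle, -1.0, 1.0))

def facing_side(theta):
    """Classify the side of the image the pedestrian faces, per the rule in the text."""
    return "one side" if np.pi <= theta < 1.5 * np.pi else "other side"
```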
[0105] Acquiring a goods-rack region towards which the pedestrian orients according to the orientation information of the pedestrian and the preset goods-rack image.
Specifically, the goods-rack region towards which the pedestrian orients can be acquired according to the orientation of the pedestrian in the image to be detected and the preset goods-rack image to which the image to be detected corresponds.
[0106] In this embodiment, there is provided an interactive behavior recognizing method that makes use of shoulder key point data to calculate the orientation of the pedestrian; the orientation result has higher robustness, whereby the goods-rack region to which the customer pays attention can be judged, and reference can be provided for the disposition of goods in the supermarket.
[0107] In one embodiment, the step of obtaining an image to be detected includes the following:
[0108] obtaining a surveillance video of a target site;
[0109] specifically, the positions of image collection devices already installed and in use in the supermarket are demarcated, corresponding goods-rack mask images are configured for the various image collection devices, and surveillance videos captured by the image collection devices are obtained; the image collection devices are generally embodied as cameras.
[0110] Screening an image with a pedestrian out of the surveillance video to serve as the image to be detected.
[0111] In this embodiment, there is provided an interactive behavior recognizing method that directly utilizes the existing surveillance equipment of the target site, such as the cameras of a shopping mall or a supermarket; no reconstruction of the site is required, the deployment cost is low, and the method is easy to popularize.
[0112] In one embodiment, the method further comprises:
[0113] obtaining a sample image; specifically, a surveillance video of a supermarket is obtained, and great quantities of images with pedestrians are screened out of the surveillance video to serve as sample images;
[0114] marking key points and detection boxes of pedestrians in the sample image, and acquiring marked image data; specifically, the detection boxes of pedestrians in the sample images are marked, such key point positions as eyes, noses, ears, shoulders, elbows, wrists, hips, knees, and ankles of the pedestrians are marked, and marked image data is finally obtained;
[0115] inputting the marked image data in a neural network model for training, and acquiring the multitask model, wherein the neural network model is preferably embodied as a ResNet-101+FPN network model; this neural network model is a one-stage bottom-up multitask network model and saves processing time as compared with multistage algorithms of the same type; as compared with top-down algorithms, its processing time does not vary with variations in the number of people in the picture.
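For illustration, one marked sample could be stored in a COCO-like layout such as the sketch below; this layout, the file name and the coordinate values are assumptions, since the disclosure only states that detection boxes and the listed body key points are marked before training.

```python
# Hypothetical layout of one marked training sample (values are illustrative).
sample_annotation = {
    "image": "supermarket_cam03_000125.jpg",      # hypothetical file name
    "pedestrians": [
        {
            "box": [412.0, 168.0, 523.0, 460.0],  # x0, y0, x1, y1
            # 17 (x, y, visible) triples: nose, eyes, ears, shoulders,
            # elbows, wrists, hips, knees, ankles (only two shown here).
            "keypoints": [[468.0, 190.0, 1], [474.0, 184.0, 1]],
        }
    ],
}
```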
[0116] In this embodiment, there is provided an interactive behavior recognizing method, the method processes an image to be detected by creating and training a multitask model, both the training and optimization of the model are completed at the backstage, and the operation of such a site as a shopping mall, a supermarket or a library is not affected;
moreover, the model is strong in generalization capability, and can be conveniently and quickly deployed; features can be shared among the different tasks of the multitask model, computational quantity is lowered, occupation of hardware resources is reduced, single-frame image processing time is shortened, and parallel processing of multichannel cameras is achieved.
[0117] In one embodiment, as shown in Fig. 4, the method comprises the following steps:
[0118] Step 402 - obtaining a surveillance video of a target site;
[0119] Step 404 - screening an image with a pedestrian out of the surveillance video to serve as an image to be detected;
[0120] Step 406 - inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of the pedestrian in the image to be detected, wherein the key points are all located inside the detection box;
[0121] Step 408 - determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds;
[0122] Step 410 - acquiring a route map of the pedestrian within a preset time interval according to the detection box of the pedestrian and a preset coordinate mapping relation; and
[0123] Step 412 - acquiring orientation information of the pedestrian according to the key points of the pedestrian.
[0124] As should be understood, although the various steps in the flowcharts of Figs. 2 to 4 are sequentially displayed as indicated by the arrows, these steps are not necessarily executed in the sequences indicated by the arrows. Unless explicitly noted otherwise herein, the execution of these steps is not restricted to any particular sequence, as these steps can also be executed in other sequences than those indicated in the drawings. Moreover, at least some of the steps in the flowcharts of Figs. 2 to 4 may include plural sub-steps or phases; these sub-steps or phases are not necessarily completed at the same timing, but can be executed at different timings, and they are also not necessarily performed sequentially, but can be performed in turns or alternately with other steps or with at least some of the sub-steps or phases of other steps.
[0125] In one embodiment, as shown in Fig. 5, there is provided an interactive behavior recognizing device that comprises an obtaining module 502, a detecting module 504 and a recognizing module 506, of which
[0126] the obtaining module 502 is employed for obtaining an image to be detected;
[0127] the detecting module 504 is employed for inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and
[0128] the recognizing module 506 is employed for determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0129] In one embodiment, the preset goods-rack image is a preset goods-rack mask image, and the recognizing module 506 includes:
[0130] a first key-point selecting unit, for selecting a wrist key point from the key points of the pedestrian;
[0131] a hand region unit, for acquiring a hand region of the pedestrian according to the wrist key point and a preset radius threshold;
[0132] an interaction determining unit, for determining that an interactive behavior occurs between the pedestrian and the corresponding goods rack when an intersection area between an image of the hand region and the preset goods-rack mask image is greater than a preset area threshold; and determining that no interactive behavior occurs between the pedestrian and the corresponding goods rack when the intersection area between the image of the hand region and the preset goods-rack mask image is smaller than or equal to the preset area threshold.
[0133] In one embodiment, the device further comprises:
[0134] a first position coordinate module, for selecting any point in the detection box of the pedestrian to serve as an anchor point, and setting position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian;
[0135] a second position coordinate module, for mapping the first position coordinates of the pedestrian to a world coordinate system according to a preset coordinate mapping relation, and acquiring second position coordinates of the pedestrian, wherein the second position coordinates are position coordinates of the pedestrian in the world coordinate system; and
[0136] a route map module, for collecting second position coordinates of various time points of the pedestrian within a preset time interval, and acquiring a route map of the pedestrian within the preset time interval.
[0137] In one embodiment, the device further comprises:
[0138] an orientation information module, for acquiring orientation information of the pedestrian according to the key points of the pedestrian; and
[0139] an oriented region module, for acquiring a goods-rack region towards which the pedestrian orients according to the orientation information of the pedestrian and the preset goods-rack image.

[0140] In one embodiment, the orientation information module includes:
[0141] a second key-point selecting unit, for selecting shoulder key points from the key points of the pedestrian, wherein the shoulder key points include a left shoulder key point and a right shoulder key point;
[0142] an orientation angle calculating unit, for calculating a difference between coordinates of the left shoulder key point and coordinates of the right shoulder key point, and obtaining a shoulder vector; employing an inverse cosine function to calculate an included angle formed by the shoulder vector and a preset unit vector, wherein the preset unit vector is a unit vector on a y-axis negative direction of a coordinate system of the image to be detected; summating a radian value of the included angle with π, and acquiring an orientation angle of the pedestrian;
[0143] an orientation judging unit, for determining that the pedestrian orients towards one side of the image to be detected when the orientation angle is greater than or equal to π and smaller than 1.5π; and determining that the pedestrian orients towards another side of the image to be detected when the orientation angle is greater than 1.5π and smaller than or equal to 2π.
[0144] In one embodiment, the obtaining module 502 includes:
[0145] a video obtaining unit, for obtaining a surveillance video of a target site; and
[0146] an image obtaining unit, for screening an image with a pedestrian out of the surveillance video to serve as the image to be detected.
[0147] In one embodiment, the device further comprises:
[0148] a sample obtaining module, for obtaining a sample image;
[0149] a sample data module, for marking key points and detection boxes of pedestrians in the sample image, and acquiring marked image data; and
[0150] a model training module, for inputting the marked image data in a neural network model for training, and acquiring the multitask model; preferably, the neural network model is embodied as a ResNet-101+FPN network model.
[0151] Specific definitions relevant to the interactive behavior recognizing device may be found in the aforementioned definitions of the interactive behavior recognizing method, and are not repeated here. The various modules in the aforementioned interactive behavior recognizing device can be wholly or partly realized via software, hardware, or a combination of software and hardware. The various modules can be embedded in the form of hardware in, or be independent of, a processor in a computer equipment, and can also be stored in the form of software in a memory in a computer equipment, so as to facilitate the processor's invoking and performing of the operations corresponding to the aforementioned various modules.
[0152] In one embodiment, a computer equipment is provided; the computer equipment can be a server, and its internal structure can be as shown in Fig. 6. The computer equipment comprises a processor, a memory, a network interface, and a database connected to one another via a system bus. The processor of the computer equipment is employed to provide computing and controlling capabilities. The memory of the computer equipment includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores therein an operating system, a computer program and a database. The internal memory provides an environment for the running of the operating system and the computer program in the nonvolatile storage medium. The database of the computer equipment is employed to store data. The network interface of the computer equipment is employed to connect to an external terminal via a network for communication. The computer program realizes an interactive behavior recognizing method when it is executed by a processor.
[0153] As understandable to persons skilled in the art, the structure illustrated in Fig. 6 is merely a block diagram of partial structure relevant to the solution of the present application, and does not constitute any restriction on the computer equipment to which the solution of the present application is applied, as the specific computer equipment may comprise component parts that are more than or less than those illustrated in Fig. 6, or may combine certain component parts, or may have a different layout of component parts.
[0154] In one embodiment, there is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
obtaining an image to be detected; inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0155] In one embodiment, when the processor executes the computer program, the following steps are further realized: the preset goods-rack image is a preset goods-rack mask image, and the step of determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds includes:
selecting a wrist key point from the key points of the pedestrian; acquiring a hand region of the pedestrian according to the wrist key point and a preset radius threshold;
determining that an interactive behavior occurs between the pedestrian and the corresponding goods rack when an intersection area between an image of the hand region and the preset goods-rack mask image is greater than a preset area threshold; and determining that no interactive behavior occurs between the pedestrian and the corresponding goods rack when the intersection area between the image of the hand region and the preset goods-rack mask image is smaller than or equal to the preset area threshold.
[0156] In one embodiment, when the processor executes the computer program, the following steps are further realized: selecting any point in the detection box of the pedestrian to serve as an anchor point, and setting position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian to a world coordinate system according to a preset coordinate mapping relation, and acquiring second position coordinates of the pedestrian, wherein the second position coordinates are position coordinates of the pedestrian in the world coordinate system; and collecting second position coordinates of various time points of the pedestrian within a preset time interval, and acquiring a route map of the pedestrian within the preset time interval.
[0157] In one embodiment, when the processor executes the computer program, the following steps are further realized: acquiring orientation information of the pedestrian according to the key points of the pedestrian; and acquiring a goods-rack region towards which the pedestrian orients according to the orientation information of the pedestrian and the preset goods-rack image.
[0158] In one embodiment, when the processor executes the computer program, the following steps are further realized: the step of acquiring orientation information of the pedestrian according to the key points of the pedestrian includes: selecting shoulder key points from the key points of the pedestrian, wherein the shoulder key points include a left shoulder key point and a right shoulder key point; calculating a difference between coordinates of the left shoulder key point and coordinates of the right shoulder key point, and obtaining a shoulder vector; employing an inverse cosine function to calculate an included angle formed by the shoulder vector and a preset unit vector, wherein the preset unit vector is a unit vector on a y-axis negative direction of a coordinate system of the image to be detected; summating a radian value of the included angle with π, and acquiring an orientation angle of the pedestrian; determining that the pedestrian orients towards one side of the image to be detected when the orientation angle is greater than or equal to π and smaller than 1.5π; and determining that the pedestrian orients towards another side of the image to be detected when the orientation angle is greater than 1.5π and smaller than or equal to 2π.
[0159] In one embodiment, when the processor executes the computer program, the following steps are further realized: the step of obtaining an image to be detected includes: obtaining a surveillance video of a target site; and screening an image with a pedestrian out of the surveillance video to serve as the image to be detected.
[0160] In one embodiment, when the processor executes the computer program, the following steps are further realized: obtaining a sample image; marking key points and detection boxes of pedestrians in the sample image, and acquiring marked image data; and inputting the marked image data in a neural network model for training, and acquiring the multitask model; preferably, the neural network model is embodied as a ResNet-101+FPN
network model.
[0161] In one embodiment, there is provided a computer-readable storage medium storing thereon a computer program, and the following steps are realized when the computer program is executed by a processor: obtaining an image to be detected;
inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
[0162] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: the preset goods-rack image is a preset goods-rack mask image, and the step of determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds includes:
selecting a wrist key point from the key points of the pedestrian; acquiring a hand region of the pedestrian according to the wrist key point and a preset radius threshold;
determining that an interactive behavior occurs between the pedestrian and the corresponding goods rack when an intersection area between an image of the hand region and the preset goods-rack mask image is greater than a preset area threshold; and determining that no interactive behavior occurs between the pedestrian and the corresponding goods rack when the intersection area between the image of the hand region and the preset goods-rack mask image is smaller than or equal to the preset area threshold.
[0163] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: selecting any point in the detection box of the pedestrian to serve as an anchor point, and setting position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian; mapping the first position coordinates of the pedestrian to a world coordinate system according to a preset coordinate mapping relation, and acquiring second position coordinates of the pedestrian, wherein the second position coordinates are position coordinates of the pedestrian in the world coordinate system; and collecting second position coordinates of various time points of the pedestrian within a preset time interval, and acquiring a route map of the pedestrian within the preset time interval.
[0164] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: acquiring orientation information of the pedestrian according to the key points of the pedestrian; and acquiring a goods-rack region towards which the pedestrian orients according to the orientation information of the pedestrian and the preset goods-rack image.
[0165] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: the step of acquiring orientation information of the pedestrian according to the key points of the pedestrian includes: selecting shoulder key points from the key points of the pedestrian, wherein the shoulder key points include a left shoulder key point and a right shoulder key point; calculating a difference between coordinates of the left shoulder key point and coordinates of the right shoulder key point, and obtaining a shoulder vector; employing an inverse cosine function to calculate an included angle formed by the shoulder vector and a preset unit vector, wherein the preset unit vector is a unit vector on a y-axis negative direction of a coordinate system of the image to be detected; summating a radian value of the included angle with π, and acquiring an orientation angle of the pedestrian; determining that the pedestrian orients towards one side of the image to be detected when the orientation angle is greater than or equal to π and smaller than 1.5π; and determining that the pedestrian orients towards another side of the image to be detected when the orientation angle is greater than 1.5π and smaller than or equal to 2π.
[0166] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: the step of obtaining an image to be detected includes: obtaining a surveillance video of a target site; and screening an image with a pedestrian out of the surveillance video to serve as the image to be detected.
[0167] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: obtaining a sample image; marking key points and detection boxes of pedestrians in the sample image, and acquiring marked image data; and inputting the marked image data in a neural network model for training, and acquiring the multitask model; preferably, the neural network model is embodied as a ResNet-101+FPN
network model.
[0168] As comprehensible to persons ordinarily skilled in the art, the entire or partial flows in the methods according to the aforementioned embodiments can be completed via a computer program instructing relevant hardware; the computer program can be stored in a nonvolatile computer-readable storage medium, and the computer program can include the flows as embodied in the aforementioned various methods when executed. Any reference to the memory, storage, database or other media used in the various embodiments provided by the present application can include nonvolatile and/or volatile memories. The nonvolatile memory can include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable and programmable ROM (EEPROM) or a flash memory. The volatile memory can include a random access memory (RAM) or an external cache memory. To serve as explanation rather than restriction, the RAM is obtainable in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0169] The technical features of the aforementioned embodiments may be combined in any manner; for the sake of brevity, not all possible combinations of the technical features in the aforementioned embodiments have been described, but all such combinations should be considered to fall within the scope recorded in this Description as long as they are not mutually contradictory.
[0170] The foregoing embodiments merely express several modes of execution of the present application, and their descriptions are relatively specific and detailed, but they should not be understood as restricting the scope of the invention patent. As should be pointed out, persons with ordinary skill in the art may further make various modifications and improvements without departing from the conception of the present application, and all of these pertain to the protection scope of the present application.
Accordingly, the patent protection scope of the present application shall be based on the attached Claims.


Claims (10)

What is claimed is:
1. An interactive behavior recognizing method, characterized in that the method comprises:
obtaining an image to be detected;
inputting the image to be detected in a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
2. The method according to Claim 1, characterized in that the preset goods-rack image is a preset goods-rack mask image, and that the step of determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds includes:
selecting a wrist key point from the key points of the pedestrian;
acquiring a hand region of the pedestrian according to the wrist key point and a preset radius threshold;
determining that an interactive behavior occurs between the pedestrian and the corresponding goods rack when an intersection area between an image of the hand region and the preset goods-rack mask image is greater than a preset area threshold; and determining that no interactive behavior occurs between the pedestrian and the corresponding goods rack when the intersection area between the image of the hand region and the preset goods-rack mask image is smaller than or equal to the preset area threshold.
3. The method according to Claim 1, characterized in that the method further comprises:
selecting any point in the detection box of the pedestrian to serve as an anchor point, and setting position coordinates of the anchor point in the image to be detected as first position coordinates of the pedestrian;
mapping the first position coordinates of the pedestrian to a world coordinate system according to a preset coordinate mapping relation, and acquiring second position coordinates of the pedestrian, wherein the second position coordinates are position coordinates of the pedestrian in the world coordinate system; and collecting second position coordinates of various time points of the pedestrian within a preset time interval, and acquiring a route map of the pedestrian within the preset time interval.
4. The method according to Claim 1, characterized in that the method further comprises:
acquiring orientation information of the pedestrian according to the key points of the pedestrian;
and acquiring a goods-rack region towards which the pedestrian is oriented according to the orientation information of the pedestrian and the preset goods-rack image.
5. The method according to Claim 4, characterized in that the step of acquiring orientation information of the pedestrian according to the key points of the pedestrian includes:
selecting shoulder key points from the key points of the pedestrian, wherein the shoulder key points include a left shoulder key point and a right shoulder key point;
calculating a difference between coordinates of the left shoulder key point and coordinates of the right shoulder key point, and obtaining a shoulder vector;
employing an inverse cosine function to calculate an included angle formed by the shoulder vector and a preset unit vector, wherein the preset unit vector is a unit vector in the negative y-axis direction of the coordinate system of the image to be detected;
summing the radian value of the included angle with π, and acquiring an orientation angle of the pedestrian;
determining that the pedestrian is oriented towards one side of the image to be detected when the orientation angle is greater than or equal to π and smaller than 1.5π; and determining that the pedestrian is oriented towards the other side of the image to be detected when the orientation angle is greater than 1.5π and smaller than or equal to 2π.
6. The method according to any of Claims 1 to 5, characterized in that the step of obtaining an image to be detected includes:
obtaining a surveillance video of a target site; and screening an image with a pedestrian out of the surveillance video to serve as the image to be detected.
7. The method according to any of Claims 1 to 5, characterized in that the method further comprises:
obtaining a sample image;
marking key points and detection boxes of pedestrians in the sample image, and acquiring marked image data; and inputting the marked image data into a neural network model for training, and acquiring the multitask model, wherein the neural network model is preferably embodied as a ResNet-101+FPN network model.
8. An interactive behavior recognizing device, characterized in that the device comprises:
an obtaining module, for obtaining an image to be detected;
a detecting module, for inputting the image to be detected into a preset multitask model, and acquiring key points and a detection box of a pedestrian in the image to be detected, wherein the key points are all located inside the detection box, and the multitask model is used for detection of pedestrians and detection of human body key points; and a recognizing module, for determining interactive behavior information of the pedestrian and a corresponding goods rack according to the key points of the pedestrian and a preset goods-rack image to which the image to be detected corresponds.
9. Computer equipment, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, characterized in that steps of the method according to any of Claims 1 to 7 are realized when the processor executes the computer program.
10. A computer-readable storage medium, storing a computer program thereon, characterized in that steps of the method according to any of Claims 1 to 7 are realized when the computer program is executed by a processor.
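For illustration only, and not as part of the claims: the following is a minimal Python sketch of the hand/goods-rack intersection test described in Claim 2, assuming the preset goods-rack mask image is a binary array with the same resolution as the image to be detected; the radius and area thresholds shown are placeholder values, not values taken from the application.

import numpy as np

def hand_rack_interaction(wrist_xy, rack_mask, radius=30, area_threshold=200):
    # Build a circular "hand region" of the preset radius around the wrist key point.
    h, w = rack_mask.shape
    ys, xs = np.ogrid[:h, :w]
    hand_region = (xs - wrist_xy[0]) ** 2 + (ys - wrist_xy[1]) ** 2 <= radius ** 2
    # Intersection area (in pixels) between the hand region and the goods-rack mask image.
    intersection = np.logical_and(hand_region, rack_mask > 0).sum()
    # An interactive behavior is deemed to occur only when the intersection area
    # exceeds the preset area threshold.
    return intersection > area_threshold

For example, hand_rack_interaction((412, 305), mask) returns True when the 30-pixel hand circle around the wrist overlaps the rack mask by more than 200 pixels.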
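A similarly hedged sketch of the position-mapping and route-recording steps of Claim 3, assuming the preset coordinate mapping relation is a 3x3 homography H (an assumption for illustration; the claim does not fix the form of the mapping), with function and variable names that are purely illustrative:

import numpy as np

def image_to_world(anchor_xy, H):
    # Map the anchor point of the detection box from image coordinates (first position
    # coordinates) to world coordinates (second position coordinates).
    p = np.array([anchor_xy[0], anchor_xy[1], 1.0])
    q = H @ p
    return q[:2] / q[2]

def build_route_map(positions_by_time, H):
    # positions_by_time: iterable of (timestamp, anchor_xy) pairs collected within
    # the preset time interval; the result is the pedestrian's route in world coordinates.
    return [(t, tuple(image_to_world(xy, H))) for t, xy in positions_by_time]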
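A sketch of the shoulder-based orientation estimate of Claim 5; the "one side"/"other side" labels are placeholders, and the clipping of the cosine value is an added numerical safeguard not mentioned in the claim.

import numpy as np

def pedestrian_orientation(left_shoulder, right_shoulder):
    # Shoulder vector: difference between the left and right shoulder key-point coordinates.
    v = np.asarray(left_shoulder, dtype=float) - np.asarray(right_shoulder, dtype=float)
    v /= np.linalg.norm(v)
    # Preset unit vector in the negative y-axis direction of the image coordinate system.
    ref = np.array([0.0, -1.0])
    # Included angle via the inverse cosine, then offset by pi to obtain the orientation angle.
    included = np.arccos(np.clip(np.dot(v, ref), -1.0, 1.0))
    orientation = included + np.pi
    if np.pi <= orientation < 1.5 * np.pi:
        return "one side", orientation
    return "other side", orientation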
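Finally, a rough sketch of one training step for the multitask model of Claim 7, using a torchvision Keypoint R-CNN with a ResNet-50+FPN backbone as a stand-in for the ResNet-101+FPN network named in the claim; the target format (boxes, labels, key points) follows the torchvision detection API rather than the patented implementation.

import torch
import torchvision

# Stand-in multitask model: joint pedestrian detection and human-body key-point detection.
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(num_classes=2, num_keypoints=17)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(images, targets):
    # images: list of image tensors; targets: list of dicts with "boxes", "labels", "keypoints"
    # produced from the marked image data.
    model.train()
    loss_dict = model(images, targets)   # detection and key-point losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)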
CA3160731A 2019-11-12 2020-06-19 Interactive behavior recognizing method, device, computer equipment and storage medium Pending CA3160731A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201911100457.9 2019-11-12
CN201911100457.9A CN110991261A (en) 2019-11-12 2019-11-12 Interactive behavior recognition method and device, computer equipment and storage medium
PCT/CN2020/097002 WO2021093329A1 (en) 2019-11-12 2020-06-19 Interactive behavior identification method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
CA3160731A1 true CA3160731A1 (en) 2021-05-20

Family

ID=70083879

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3160731A Pending CA3160731A1 (en) 2019-11-12 2020-06-19 Interactive behavior recognizing method, device, computer equipment and storage medium

Country Status (3)

Country Link
CN (1) CN110991261A (en)
CA (1) CA3160731A1 (en)
WO (1) WO2021093329A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991261A (en) * 2019-11-12 2020-04-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium
CN113642361B (en) * 2020-05-11 2024-01-23 杭州萤石软件有限公司 Fall behavior detection method and equipment
CN112307871A (en) * 2020-05-29 2021-02-02 北京沃东天骏信息技术有限公司 Information acquisition method and device, attention detection method, device and system
CN111611970B (en) * 2020-06-01 2023-08-22 城云科技(中国)有限公司 Urban management monitoring video-based random garbage throwing behavior detection method
CN111798341A (en) * 2020-06-30 2020-10-20 深圳市幸福人居建筑科技有限公司 Green property management method, system computer equipment and storage medium thereof
CN111783724B (en) * 2020-07-14 2024-03-26 上海依图网络科技有限公司 Target object identification method and device
CN112084984A (en) * 2020-09-15 2020-12-15 山东鲁能软件技术有限公司 Escalator action detection method based on improved Mask RCNN
CN112016528B (en) * 2020-10-20 2021-07-20 成都睿沿科技有限公司 Behavior recognition method and device, electronic equipment and readable storage medium
CN113377192B (en) * 2021-05-20 2023-06-20 广州紫为云科技有限公司 Somatosensory game tracking method and device based on deep learning
CN114758239A (en) * 2022-04-22 2022-07-15 安徽工业大学科技园有限公司 Method and system for monitoring articles flying away from predetermined travel route based on machine vision
CN116862980B (en) * 2023-06-12 2024-01-23 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105245828A (en) * 2015-09-02 2016-01-13 北京旷视科技有限公司 Item analysis method and equipment
CN106709422A (en) * 2016-11-16 2017-05-24 南京亿猫信息技术有限公司 Supermarket shopping cart hand identification method and identification system thereof
US10853651B2 (en) * 2016-10-26 2020-12-01 Htc Corporation Virtual reality interaction method, apparatus and system
CN109934075A (en) * 2017-12-19 2019-06-25 杭州海康威视数字技术股份有限公司 Accident detection method, apparatus, system and electronic equipment
CN109993067B (en) * 2019-03-07 2022-01-28 北京旷视科技有限公司 Face key point extraction method and device, computer equipment and storage medium
CN110991261A (en) * 2019-11-12 2020-04-10 苏宁云计算有限公司 Interactive behavior recognition method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2021093329A1 (en) 2021-05-20
CN110991261A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CA3160731A1 (en) Interactive behavior recognizing method, device, computer equipment and storage medium
US9412180B2 (en) Information processing apparatus, information processing method, and program
CN110785774A (en) Method and system for closed loop sensing in autonomous vehicles
CN110869559A (en) Method and system for integrated global and distributed learning in autonomous vehicles
CN109727275B (en) Object detection method, device, system and computer readable storage medium
CN110799982A (en) Method and system for object-centric stereo vision in an autonomous vehicle
CN107944382B (en) Method for tracking target, device and electronic equipment
WO2015129210A1 (en) Information-processing device, data analysis method, and recording medium
CN110136091B (en) Image processing method and related product
JP6729793B2 (en) Information processing apparatus, control method, and program
KR102185859B1 (en) Apparatus and method for tracking object
JP7459916B2 (en) Object tracking method, object tracking device, and program
CN110930434A (en) Target object tracking method and device, storage medium and computer equipment
WO2019033567A1 (en) Method for capturing eyeball movement, device and storage medium
D'Orazio et al. A survey of automatic event detection in multi-camera third generation surveillance systems
CN111274934A (en) Implementation method and system for intelligently monitoring forklift operation track in warehousing management
KR100885418B1 (en) System and method for detecting and tracking people from overhead camera video
CN111429194B (en) User track determination system, method, device and server
CN114360182B (en) Intelligent alarm method, device, equipment and storage medium
CN111310595B (en) Method and device for generating information
WO2018210039A1 (en) Data processing method, data processing device, and storage medium
KR102289182B1 (en) System, apparatus and method for vision based parking management
KR20150137698A (en) Method and apparatus for movement trajectory tracking of moving object on animal farm
CN113962338A (en) Indoor monitoring method and system for RFID-assisted multi-camera detection and tracking
CN112699798A (en) Traffic police action recognition method and device with vehicle-road cooperation

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916
