CN115620402B - Human-cargo interaction behavior identification method, system and related device


Info

Publication number
CN115620402B
Authority
CN
China
Prior art keywords
target
target customer
hand
shelf
customer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211498078.1A
Other languages
Chinese (zh)
Other versions
CN115620402A (en)
Inventor
冯昊
冯雪涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shenxiang Intelligent Technology Co ltd
Original Assignee
Zhejiang Lianhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lianhe Technology Co ltd filed Critical Zhejiang Lianhe Technology Co ltd
Priority to CN202211498078.1A
Publication of CN115620402A
Application granted
Publication of CN115620402B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. tap gestures based on pressure sensed by a digitiser, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Abstract

One or more embodiments of the present specification disclose a human-cargo interaction behavior recognition method, system and related device. The method comprises: judging, according to image information, whether a touch behavior occurs on a target shelf, and selecting a recognition scheme according to the number of target customers exhibiting the touch behavior: if only one target customer exhibits a touch behavior, the weighing information of the target shelf can be used to identify the interaction behavior between the target customer and the target shelf; if a plurality of target customers exhibit touch behaviors, a holding detection model can be used to make predictions on images captured before and after each target customer's touch, and the interaction behavior of each target customer with the target shelf is identified from the prediction results. In this way, based on the touch behavior determined from the image information and combined with either the weighing information or the holding detection model, the interaction behavior between a target customer and the target shelf can be identified accurately, improving both recognition accuracy and recognition efficiency.

Description

Human-cargo interaction behavior identification method, system and related device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, a system, and a related device for identifying human-cargo interaction behavior.
Background
Digital stores that bring Internet applications, Internet-of-Things technology, artificial intelligence and automation into physical stores have come into operation. At present, theft is common in shopping places such as shopping malls and supermarkets. The existing solution is supervision and prevention through video monitoring, but because there are many customers and commodities, and vision schemes have inherent limitations such as occlusion of human bodies, commodity variety and background complexity, the interaction between a customer and a shelf cannot be identified accurately; for example, it is hard to tell whether a customer took a commodity or put it back, which increases the workload of investigating suspects.
Disclosure of Invention
One or more embodiments of the present disclosure are directed to a method, a system and a related device for identifying human-cargo interaction behavior, so as to accurately identify interaction behavior between a target customer and a target shelf.
To solve the above technical problem, one or more embodiments of the present specification are implemented as follows:
in a first aspect, a human-cargo interaction behavior identification method is provided, including:
receiving image information collected at the shooting location where a target shelf is located;
detecting, based on the image information, whether a touch behavior of a target customer touching the target shelf occurs;
if the touch behavior of one target customer is detected, querying the weighing information of the target shelf from the start time to the end time of the touch behavior, and identifying the interaction behavior between the target customer and the target shelf based on the weighing information;
if the touch behaviors of a plurality of target customers are detected, predicting, according to a holding detection model, the hand-held state of each of the target customers at the start time of the touch behavior and at the end time of the touch behavior, and identifying the interaction behavior of each target customer with the target shelf based on the prediction results;
the holding detection model is trained on historical hand images captured before and after historical customers interacted with shelves.
In a second aspect, a human-cargo interaction behavior recognition apparatus is provided, which includes:
the receiving module, used for receiving image information collected at the shooting location where the target shelf is located;
the detection module, used for detecting, based on the image information, whether a touch behavior of a target customer touching the target shelf occurs;
the identification module, used for, if the detection module detects the touch behavior of one target customer, querying the weighing information of the target shelf from the start time to the end time of the touch behavior, and identifying the interaction behavior between the target customer and the target shelf based on the weighing information;
the identification module, further used for, if the detection module detects the touch behaviors of a plurality of target customers, predicting, according to the holding detection model, the hand-held state of each target customer at the start time of the touch behavior and at the end time of the touch behavior, and identifying the interaction behavior of each target customer with the target shelf based on the prediction results;
the holding detection model is trained on historical hand images captured before and after historical customers interacted with shelves.
In a third aspect, a human-cargo interaction behavior recognition system is provided, including: at least one shelf, each shelf being fitted with a weighing device for weighing the shelf; at least one upper computer for receiving the weighing information sent by the weighing devices; at least one camera for collecting image information of the shelves from a top view or a side view; and a central control server connected to the at least one upper computer and the at least one camera, respectively, the central control server being configured to receive the weighing information uploaded by the at least one upper computer and the image information acquired by the at least one camera, and to execute the human-cargo interaction behavior identification method of the first aspect.
In a fourth aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the human-cargo interaction behavior recognition method described in the first aspect.
In a fifth aspect, a computer-readable storage medium is provided, which stores one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the human-cargo interaction behavior recognition method of the first aspect.
According to the technical solutions provided by one or more embodiments of the specification, a human-cargo interaction behavior recognition system is formed from low-cost shelves, weighing devices, upper computers, cameras and a central control server. The weighing information of a target shelf collected by a weighing device, and the image information containing target customers collected by a camera, are transmitted through the upper computer to the central control server for processing: whether a touch behavior occurs on the target shelf is first judged from the image information, and a recognition scheme is selected according to the number of target customers exhibiting the touch behavior. If only one target customer exhibits a touch behavior, the weighing information of the target shelf can be used to identify the interaction behavior between the target customer and the target shelf; if a plurality of target customers touch, the holding detection model can be used to make predictions on images captured before and after each target customer's touch, and the interaction behavior of each target customer with the target shelf is identified from the prediction results. In this way, based on the touch behavior determined from the image information and combined with either the weighing information or the holding detection model, the interaction behavior between a target customer and the target shelf can be identified accurately, improving both recognition accuracy and recognition efficiency.
Drawings
In order to more clearly illustrate one or more embodiments or technical solutions in the prior art, the drawings needed to be used in the description of one or more embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present specification, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of a human-cargo interaction behavior recognition system provided in an embodiment of the present specification.
Figs. 2a-2b are schematic views of installation positions of a camera and a shelf provided in an embodiment of the present disclosure.
Fig. 3 is a schematic step diagram of a human-cargo interaction behavior identification method provided in an embodiment of the present specification.
Fig. 4a is a schematic diagram of correction of a keypoint trajectory based on a hand trajectory according to an embodiment of the present description.
Fig. 4b is a schematic diagram of correcting a hand trajectory based on a keypoint trajectory according to an embodiment of the present disclosure.
Figs. 5a-5f are schematic diagrams of weight change curves of a target rack from the start time to the end time of a touch behavior according to an embodiment of the present disclosure.
Fig. 6 is a flowchart of a person-goods interaction behavior identification method provided in an embodiment of the present specification.
Fig. 7 is a schematic structural diagram of a human-cargo interaction behavior recognition apparatus provided in an embodiment of the present specification.
Fig. 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the drawings in one or more embodiments of the present specification, and it is obvious that the one or more embodiments described are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments that can be derived by a person skilled in the art from one or more of the embodiments described herein without making any inventive step shall fall within the scope of protection of this document.
Identifying whether a customer takes goods through a visual algorithm alone is not accurate, while the intelligent shelves deployed in unmanned supermarkets require a weighing pressure sensor under every item of goods, so their hardware and operating costs are high and they cannot be used in more interactive scenes, especially large shopping malls and supermarkets.
For this reason, in the embodiments of the present specification, low-cost shelves, weighing devices, upper computers, cameras and a central control server are used to form a human-cargo interaction behavior recognition system. The weighing information of a target shelf collected by a weighing device, and the image information containing a target customer collected by a camera, are transmitted through the upper computer to the central control server for processing. Specifically, whether a touch behavior occurs on the target shelf is first judged from the image information, and a recognition scheme is selected according to the number of target customers exhibiting the touch behavior: if only one target customer touches, the weighing information of the target shelf can be used to identify the interaction behavior between that customer and the target shelf; if a plurality of target customers touch, the holding detection model can be used to make predictions on images captured before and after each target customer's touch, and the interaction behavior of each target customer with the target shelf is identified from the prediction results. In this way, based on the touch behavior determined from the image information and combined with either the weighing information or the holding detection model, the interaction behavior between a target customer and the target shelf can be identified accurately, improving both recognition accuracy and recognition efficiency.
It should be understood that the human-cargo interaction behavior identification scheme described in this specification can be applied to ordinary supermarkets, unmanned supermarkets, shopping malls and other shopping places equipped with shelves. It can also be applied to public places that provide rental or free-use services, such as libraries, bookstores and service stations (umbrella rental, raincoat rental, free borrowing and the like). By using visual imaging technology, the digitization of offline retail such as shopping malls, supermarkets, stores and department stores is enabled: interaction behaviors between customers and shelves can be identified accurately in these places, and suspected customers can then be checked precisely, achieving the purpose of preventing theft and damage.
Referring to fig. 1, a schematic structural diagram of a human-cargo interaction behavior recognition system provided in an embodiment of the present specification is shown. The human-cargo interaction behavior recognition system may include: at least one shelf 102, each shelf 102 being fitted with a weighing device (constituted by pressure sensors 1042 and a signal processing circuit 1044) for weighing the shelf; an upper computer 106 for receiving the weighing information sent by the weighing devices; at least one camera 108 for acquiring image information of the shelves 102 from a top view or a side view; and a central control server 110 connected to the at least one upper computer 106 and the at least one camera 108, respectively. The central control server 110 is configured to receive the weighing information uploaded by the at least one upper computer 106 and the image information acquired by the at least one camera 108 (two are shown in the figure), and to execute the human-cargo interaction behavior identification method of this specification, described in detail below.
The shelves 102 may be placed on level ground, and each shelf 102 may be reinforced with reinforcing members so that it does not rock when a customer touches it or takes goods from it, and cannot be rocked by passing pedestrians. The weighing device mounted on a rack 102 may consist of pressure sensors 1042 and a signal processing circuit 1044; that is, each weighing device may be constituted by a plurality of pressure sensors 1042 and a signal processing circuit 1044 connecting them. Specifically, one pressure sensor 1042 is installed at each of the four corners of the bottom of each shelf 102; that is, four pressure sensors 1042 sit between the shelf 102 and the ground, and the pressure generated by the entire weight of the shelf 102 is applied to the four sensors. The four pressure sensors 1042 are connected to the signal processing circuit 1044, which converts the analog signals read from the sensors into digital signals, thereby determining the weighing information of the rack 102.
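As a concrete illustration, the sketch below (in Python) turns the four digitised corner readings into a single shelf weight; the linear calibration model, its constants and the function name are illustrative assumptions, not taken from the patent.

```python
def shelf_weight(adc_counts, scale=0.5, offset=0.0):
    """adc_counts: digitised readings from the four corner pressure
    sensors; scale and offset model a per-installation linear
    calibration (an assumed form)."""
    return scale * sum(adc_counts) + offset

# Example: four corner readings from the signal processing circuit.
print(shelf_weight([5120, 5080, 4990, 5010]))  # calibrated total weight
```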
The shelf 102 differs from the intelligent shelf of an unmanned supermarket: it is an ordinary shelf with pressure sensors 1042 only at its bottom. The pressure sensor 1042 may be a half-bridge pressure sensor or another type of pressure sensor; this specification does not limit the type, as long as it can obtain the pressure value of the rack 102 and transmit it to the signal processing circuit 1044 to yield the weighing information of the rack 102.
The upper computer 106 may be a low-cost computer (e.g., a single-chip microcomputer) that can be connected to the signal processing circuit 1044 (e.g., via a GPIO interface, a transistor-transistor logic (TTL) interface, or a serial port) and to the central control server 110 (e.g., via Wi-Fi, Bluetooth, or a wired network). One upper computer 106 can process the weighing information from a plurality of shelves 102 and transmit it to the central control server 110 as needed.
In practice, the upper computer 106 may provide an HTTP service: when the central control server 110 queries the weighing information of a shelf 102, it supplies the serial number of the shelf 102 being queried, and the upper computer 106 returns the weighing information of the shelf 102 corresponding to that serial number through the HTTP service. The scheme is not limited to HTTP; other communication modes such as telnet/ssh may also be used. Taking HTTP as an example, the workflow of the upper computer 106 is as follows: the upper computer 106 waits for a query request from the central control server 110; the received request specifies which shelf's weight is being queried; the upper computer 106 then reads the weighing information from the signal processing circuit 1044 connected to the pressure sensors 1042 and returns it to the central control server 110.
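The following is a minimal sketch of such a query service, assuming a hypothetical /weight?shelf=N endpoint, a JSON reply, and a read_weight() helper standing in for the hardware access; the patent does not specify the URL scheme or payload format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def read_weight(shelf_id: int) -> float:
    # Placeholder for reading the signal processing circuit of the
    # given shelf; replace with the real hardware access.
    return 12345.0

class WeighingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed request form: GET /weight?shelf=3
        query = parse_qs(urlparse(self.path).query)
        shelf_id = int(query.get("shelf", ["0"])[0])
        body = '{"shelf": %d, "weight": %.1f}' % (shelf_id, read_weight(shelf_id))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), WeighingHandler).serve_forever()
```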
The cameras 108 are connected to the central control server 110 through a network. A camera may be installed directly above the shelf 102: as shown in fig. 2a, the camera 108 is mounted vertically downward in a top-view installation, with the center of its lens perpendicular to the plane of the shelf 102. Alternatively, it may be installed at the side of the shelf 102: as shown in fig. 2b, the camera is mounted obliquely downward at the side of the middle of the shelf aisle. Two cameras 108 may be installed per shelf in this specification, and to acquire image information from all angles, additional cameras 108 may be installed at other positions. The camera 108 may be an ordinary camera, an infrared camera, or a camera with other capture and processing functions. It should be understood that the cameras 108 provided for each shelf 102 may be given corresponding numbers, so that the captured image data or image information can be identified and image information belonging to different shelves 102 can be distinguished.
The central control server 110 obtains the weighing information of the shelves 102 from the upper computers 106 and the image information from the cameras 108, and comprehensively evaluates whether a target customer touches a target shelf, whether goods are taken from the target shelf, and, when several target customers touch the target shelf, which customer takes the goods.
Referring to fig. 3, a schematic step diagram of a human-cargo interaction behavior identification method provided in an embodiment of the present specification is shown; the method may include the following steps:
step 302: and receiving image information collected based on the shooting place where the target shelf is located.
Specifically, image information acquired by a camera installed at the shooting location of the target shelf may be obtained periodically. The image information may consist of multiple frames of image data, and each frame may be a human body image containing one or more target customers.
Step 304: detecting, based on the image information, whether a touch behavior of a target customer touching the target shelf occurs; if no touch behavior is detected, no processing is done; otherwise, step 306 or step 308 below is performed.
Optionally, in step 304, when detecting based on the image information whether a touch behavior of a target customer touching the target shelf occurs, the key parts of the target customer contained in the image information may be tracked and located based on a hand detection algorithm, a hand tracking algorithm, a keypoint detection algorithm and a keypoint tracking algorithm; if the distance between a key part of the target customer and the target shelf is smaller than a first threshold, it is determined that the target customer touches the target shelf.
The keypoint detection algorithm and the keypoint tracking algorithm take a human body image as input, output the positions of key parts of the human body such as the hands, shoulders, feet and head (keypoint positions determined by the center points of the key parts), and connect the key parts belonging to the same person. However, these algorithms are easily affected by the surrounding environment (e.g., an occluded arm, or an arm whose appearance resembles the background), which makes the touch moment difficult to judge. The hand detection algorithm takes a human body image (video) as input and detects the positions of hands in the image; however, the image may contain other people's hands, which can lead to associating the wrong hand with a body. Therefore, in the embodiments of the present disclosure, the key parts of the target customer contained in the image information are tracked and located by combining the hand detection and hand tracking algorithms with the keypoint detection and keypoint tracking algorithms. If, in some image frame, the distance between a key part of the target customer and the target shelf is smaller than the first threshold, it is determined that the target customer touches the target shelf. The first threshold may be a value range determined from repeated touch tests, e.g., [0, 2) cm. Conversely, if the distance between the key part of the target customer and the target shelf is greater than or equal to the first threshold, it may be determined that the target customer does not touch the target shelf. It should be understood that this value range is merely an example; the specific value should be adjusted flexibly according to the touch conditions of different deployment sites.
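A minimal sketch of this touch test, assuming the hand position and shelf region have already been mapped into a common calibrated coordinate system (in cm); the point-to-box distance and the 2 cm default mirror the example range above.

```python
import math

FIRST_THRESHOLD_CM = 2.0  # mirrors the example range [0, 2) cm; tune per site

def distance_to_shelf(hand_xy, shelf_box):
    """Distance (in calibrated cm) from a hand point to an axis-aligned
    shelf region (x1, y1, x2, y2); zero if the point lies inside it."""
    x, y = hand_xy
    x1, y1, x2, y2 = shelf_box
    dx = max(x1 - x, 0.0, x - x2)
    dy = max(y1 - y, 0.0, y - y2)
    return math.hypot(dx, dy)

def is_touch(hand_xy, shelf_box):
    return distance_to_shelf(hand_xy, shelf_box) < FIRST_THRESHOLD_CM
```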
Further, when the key parts of the target customer contained in the image information are tracked and located based on the hand detection and hand tracking algorithms together with the keypoint detection and keypoint tracking algorithms, a hand trajectory and a keypoint trajectory can be determined separately, and the hand of the target customer is then tracked and located based on both. Specifically, the method comprises the following steps: inputting each image frame of the image information into a keypoint detection model to obtain a set of key parts for each target customer, the set being associated with the target customer's identification information and hands; linking the key parts belonging to the same target customer across the image frames into a trajectory, obtaining a keypoint trajectory for each target customer; inputting each image frame of the image information into a hand detection model to obtain a positioning box for each hand; tracking and locating based on the positioning boxes obtained from the image frames, obtaining a hand trajectory for each hand; and tracking and locating the hand of the target customer based on the keypoint trajectory and the hand trajectory. The identification information of the target customer may be face information.
The keypoint detection model may be obtained by taking historical human body images as training samples, labeling the positions of the key parts in each image (for example, 18 points such as left hand, right hand, left foot, right foot, left shoulder and right shoulder), and feeding them into a preset model for repeated training. The hand detection model may be obtained by taking historical human body images as training samples, labeling the position of each hand, and feeding them into a preset model for repeated training. In this way, each image frame can be input into the keypoint detection model and the hand detection model respectively to obtain the key part sets and the hand positioning boxes; accordingly, the video frames in the image information yield the keypoint trajectory and the hand trajectory of the corresponding target customer.
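An illustrative sketch of this per-frame pipeline is given below. The model interfaces (what the keypoint model and hand detection model return per frame) are assumptions, and the frame-to-frame association that links per-frame hand boxes into per-hand trajectories is a separate tracking step not shown here.

```python
def build_trajectories(frames, keypoint_model, hand_model):
    """Run both models on every frame and accumulate raw results."""
    keypoint_tracks = {}       # customer id -> per-frame keypoint sets
    hand_boxes_per_frame = []  # raw hand boxes, to be linked by a tracker
    for frame in frames:
        # Assumed interface: iterable of (customer_id, keypoints) pairs.
        for customer_id, keypoints in keypoint_model(frame):
            keypoint_tracks.setdefault(customer_id, []).append(keypoints)
        # Assumed interface: list of hand positioning boxes in this frame.
        hand_boxes_per_frame.append(hand_model(frame))
    return keypoint_tracks, hand_boxes_per_frame
```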
Further, when the hand of the target customer is tracked and located based on the keypoint trajectory and the hand trajectory: if, over N consecutive detected image frames, the distance between the hand keypoint in the target customer's keypoint trajectory and the center of the positioning box in a hand trajectory is not greater than a second threshold, the target customer's keypoint trajectory is bound to that hand trajectory, so that after keypoints other than the hand are lost from the keypoint trajectory, the keypoint trajectory before the loss and the hand trajectory during the loss can be spliced to keep tracking and locating the target customer's hand. Here N is a positive integer greater than or equal to 2, and the second threshold is half the average length of the long edges of the positioning boxes in the hand trajectory. Referring to fig. 4a, a diagram of correcting a keypoint trajectory based on a hand trajectory is shown. The upper rectangular boxes in fig. 4a are the positioning boxes of a hand, connected by arrows into a hand trajectory; each lower polyline in fig. 4a is an arm diagram connecting several arm keypoints, and these arm diagrams form a keypoint trajectory. By default the keypoint trajectory is associated with the corresponding target customer, but when a keypoint drifts or an arm is occluded and keypoints are lost, the hand trajectory can substitute for the keypoint trajectory during the lost period. In fig. 4a, assuming a keypoint trajectory and a hand trajectory composed of 5 frames, keypoints partially disappear in frames 2 to 4, so the hand trajectory can substitute in that period. The hand trajectory thus corrects the keypoint trajectory, so that the target customer's hand is tracked and located accurately.
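A sketch of this binding rule, under assumed data layouts (one hand keypoint and one positioning box per frame); the value of N is a tuning choice, as the text only requires N >= 2.

```python
import math

N = 3  # consecutive frames required for binding; text requires N >= 2

def second_threshold(hand_boxes):
    """Half the mean long-edge length of the boxes (x1, y1, x2, y2)."""
    long_edges = [max(x2 - x1, y2 - y1) for x1, y1, x2, y2 in hand_boxes]
    return 0.5 * sum(long_edges) / len(long_edges)

def should_bind(hand_keypoints, hand_boxes):
    """hand_keypoints[i]: (x, y) hand keypoint in frame i;
    hand_boxes[i]: hand positioning box in the same frame."""
    thr = second_threshold(hand_boxes)
    run = 0
    for (kx, ky), (x1, y1, x2, y2) in zip(hand_keypoints, hand_boxes):
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        run = run + 1 if math.hypot(kx - cx, ky - cy) <= thr else 0
        if run >= N:
            return True
    return False
```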
After splicing the keypoint trajectory before the loss and the hand trajectory during the loss to track and locate the hand of the target customer: if, over M consecutive image frames, the distance between the hand keypoint in the target customer's keypoint trajectory and the center of the positioning box in the hand trajectory is greater than the second threshold, the target customer's keypoint trajectory is restored to independent tracking, or it is determined that the keypoint trajectory should be bound to another hand trajectory; here M is a positive integer greater than or equal to 2, and M is greater than N. In fact, when the distance between the hand keypoint in the keypoint trajectory and the center of the positioning box in the hand trajectory is detected to exceed the second threshold, it indicates that the bound keypoint trajectory and hand trajectory may not belong to the same target customer; the keypoint trajectory can then be unbound and tracked on its own, or bound to another hand trajectory that satisfies the not-greater-than-second-threshold requirement.
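A matching sketch of the unbinding rule; the value of M is illustrative, subject only to M >= 2 and M > N.

```python
def should_unbind(distances, threshold, m=5):
    """distances: per-frame distance between the hand keypoint and the
    centre of the bound hand box; unbind after m consecutive violations."""
    run = 0
    for d in distances:
        run = run + 1 if d > threshold else 0
        if run >= m:
            return True
    return False
```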
When the hand of the target customer is tracked and located based on the keypoint trajectory and the hand trajectory, if a break in the hand trajectory is detected, the position of the hand keypoint is estimated based on the keypoint trajectory of the target customer associated with the hand of that hand trajectory, and the broken hand trajectory segments are spliced based on the estimate so as to keep tracking and locating the target customer's hand. Referring to fig. 4b, due to occlusion and the like, and especially when the hand becomes invisible after reaching into the shelf, the trajectory of the same hand may break; in fig. 4b the break occurs at frame 3. The keypoint detection algorithm can estimate the hand position from the arm orientation as a substitute for the hand detection result, so the keypoints in the keypoint trajectory can be used to estimate the hand position and connect the two broken hand trajectory segments. The keypoint trajectory thus corrects the hand trajectory, so that the target customer's hand is tracked and located accurately based on a correct and complete hand trajectory.
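A sketch of this splicing step. The elbow-to-wrist extrapolation is one plausible reading of "estimating the hand position from the arm orientation", and the extension factor is an assumed constant.

```python
def estimate_hand(elbow_xy, wrist_xy, extend=0.25):
    """Extend the elbow->wrist direction past the wrist to guess the
    hand position (extension factor is an illustrative assumption)."""
    ex, ey = elbow_xy
    wx, wy = wrist_xy
    return (wx + extend * (wx - ex), wy + extend * (wy - ey))

def splice_hand_track(hand_track, arm_keypoints):
    """hand_track: list of (x, y) or None per frame (None = detection lost);
    arm_keypoints: list of (elbow_xy, wrist_xy) per frame, from the bound
    keypoint trajectory."""
    spliced = []
    for hand_xy, (elbow, wrist) in zip(hand_track, arm_keypoints):
        spliced.append(hand_xy if hand_xy is not None
                       else estimate_hand(elbow, wrist))
    return spliced
```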
Step 306: if the touch behavior of one target customer is detected, querying the weighing information of the target shelf from the start time to the end time of the touch behavior, and identifying the interaction behavior between the target customer and the target shelf based on the weighing information.
Referring to figs. 5a to 5f, which respectively show curves of the weight value of the target rack from the start time to the end time of the touch behavior, the interaction behavior of the target customer with the target shelf can be identified from these curves. If the first weight value at the start time is greater than the second weight value at the end time, and the weight value falls within the period, it is determined that the target customer took goods from the target shelf, as in fig. 5a. If the first weight value at the start time equals the second weight value at the end time and the weight value does not change within the period, it is determined that the target customer only touched the target shelf, as in fig. 5b. If the first weight value at the start time is less than the second weight value at the end time and the weight value rises within the period, it is determined that the target customer put goods on the target shelf, as in fig. 5c. If the first weight value at the start time is greater than the second weight value at the end time, and the weight value first rises and then falls within the period, it is determined that the target customer exchanged an item for a heavier one from the target shelf, as in fig. 5d. If the first weight value at the start time equals the second weight value at the end time and the weight value first rises and then falls within the period, it is determined that the target customer exchanged an item for one of equal weight, as in fig. 5e. If the first weight value at the start time is less than the second weight value at the end time and the weight value first rises and then falls within the period, it is determined that the target customer exchanged an item for a lighter one, as in fig. 5f. Thus, after the touch behavior between the target customer and the target shelf is determined, the weight change curve in the weighing information of the target shelf identifies whether the customer specifically took, exchanged or put back goods, or only touched the shelf; after the specific interaction behavior is accurately identified, it is easy to determine from it whether the target customer is holding goods, improving the efficiency and speed of checking.
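The six cases reduce to the start value, the end value, and whether the curve rose before falling, as in the sketch below; the equality tolerance EPS and any noise filtering are deployment details the patent leaves open.

```python
EPS = 1.0  # grams of tolerance for "equal"; an assumed value

def classify(weights):
    """weights: sampled shelf weight from touch start to touch end."""
    start, end = weights[0], weights[-1]
    rose_then_fell = max(weights) > max(start, end) + EPS
    if rose_then_fell:
        if end < start - EPS:
            return "swapped for a heavier item"    # Fig. 5d
        if end > start + EPS:
            return "swapped for a lighter item"    # Fig. 5f
        return "swapped for an equal-weight item"  # Fig. 5e
    if end < start - EPS:
        return "took an item"                      # Fig. 5a
    if end > start + EPS:
        return "put an item back"                  # Fig. 5c
    return "only touched the shelf"                # Fig. 5b
```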
Step 308: if the touch behaviors of a plurality of target customers are detected, predicting, according to the holding detection model, the hand-held state of each of these target customers at the start time of the touch behavior and at the end time of the touch behavior, and identifying the interaction behavior of each target customer with the target shelf based on the prediction results. The holding detection model is trained on historical hand images captured before and after historical customers interacted with shelves.
When a plurality of target customers touch the target shelf, their hands may cross, goods may be taken simultaneously, or one item may be taken while another is put back, so the interaction behaviors cannot be identified accurately from the weighing information alone. For this reason, the hand-held states of the target customers at the start time and at the end time of the touch behavior can be predicted by the holding detection model. Specifically, a first image and a second image of each of the target customers can be acquired, and the first image and the second image are input into the holding detection model to obtain a prediction result for each target customer; the first image is the hand image of the target customer acquired at the start time of the touch behavior, and the second image is the hand image acquired at the end time of the touch behavior.
Further, after determining each target customer's hand-held state before and after the touch through the holding detection model, the following applies to any target customer: from the target customer's prediction results, a first prediction result predicted from the first image and a second prediction result predicted from the second image are determined. If the first prediction result is that the target customer does not hold goods and the second prediction result is that the target customer holds goods, it is determined that the target customer took goods from the target shelf. If the first prediction result is that the target customer holds goods and the second prediction result is that the target customer does not hold goods, it is determined that the target customer put goods on the target shelf. If both prediction results are that the target customer does not hold goods, it is determined that the target customer only touched the target shelf. If both prediction results are that the target customer holds goods, it is determined that the target customer exchanged goods on the target shelf.
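A sketch of this four-way rule applied per customer; holding_model stands in for the trained holding detection model and is assumed to return True when the hand image shows goods in hand.

```python
def classify_interaction(holding_model, first_image, second_image):
    before = holding_model(first_image)   # hand-held state at touch start
    after = holding_model(second_image)   # hand-held state at touch end
    if not before and after:
        return "took goods from the shelf"
    if before and not after:
        return "put goods on the shelf"
    if before and after:
        return "exchanged goods on the shelf"
    return "only touched the shelf"
```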
Referring to fig. 6, a schematic flow chart of human-cargo interaction behavior recognition provided in the embodiments of the present specification is shown.
Step 602: receiving image information collected at the shooting location where the target shelf is located.
Step 604: tracking and locating the key parts of the target customer contained in the image information based on a hand detection algorithm, a hand tracking algorithm, a keypoint detection algorithm and a keypoint tracking algorithm.
Step 606: if the distance between a key part of the target customer and the target shelf is smaller than a first threshold, determining that the target customer touches the target shelf.
Step 608: if the touch behavior of one target customer is detected, identifying the interaction behavior between the target customer and the target shelf based on the weighing information.
Step 610: if the touch behaviors of a plurality of target customers are detected, predicting, according to the holding detection model, the hand-held states of the target customers at the start time and at the end time of the touch behavior.
Step 612: identifying interaction behavior of each target customer with the target shelf based on the prediction results.
For the specific implementation of steps 602 to 612 and the technical effects achieved, refer to steps 302 to 308.
Through the above technical solution, low-cost shelves, weighing devices, upper computers, cameras and a central control server form a human-cargo interaction behavior recognition system. The weighing information of the target shelf collected by the weighing device, and the image information containing the target customers collected by the camera, are transmitted through the upper computer to the central control server for processing: whether a touch behavior occurs on the target shelf is first judged from the image information, and a recognition scheme is selected according to the number of target customers exhibiting the touch behavior. If only one target customer exhibits a touch behavior, the weighing information of the target shelf can be used to identify the interaction behavior between the target customer and the target shelf; if a plurality of target customers touch, the holding detection model can be used to make predictions on images captured before and after each target customer's touch, and the interaction behavior of each target customer with the target shelf is identified from the prediction results. In this way, based on the touch behavior determined from the image information and combined with either the weighing information or the holding detection model, the interaction behavior between a target customer and the target shelf can be identified accurately, improving both recognition accuracy and recognition efficiency.
Referring to fig. 7, a human-cargo interaction behavior recognition apparatus provided in an embodiment of the present disclosure may include:
a receiving module 702, configured to receive image information collected at the shooting location where a target shelf is located, the image information containing image data of at least one target customer;
a detection module 704, configured to detect, based on the image information, whether a touch behavior of a target customer touching the target shelf occurs;
an identification module 706, configured to, if the detection module 704 detects the touch behavior of one target customer, query the weighing information of the target shelf from the start time to the end time of the touch behavior, and identify the interaction behavior between the target customer and the target shelf based on the weighing information;
the identification module 706 being further configured to, if the detection module 704 detects the touch behaviors of a plurality of target customers, predict, according to the holding detection model, the hand-held state of each target customer at the start time of the touch behavior and at the end time of the touch behavior, and identify the interaction behavior of each target customer with the target shelf based on the prediction results;
where the holding detection model is trained on historical hand images captured before and after historical customers interacted with shelves.
Optionally, as an embodiment, when detecting based on the image information whether a touch behavior of a target customer touching the target shelf occurs, the detection module 704 is specifically configured to:
track and locate the key parts of the target customer contained in the image information based on a hand detection algorithm, a hand tracking algorithm, a keypoint detection algorithm and a keypoint tracking algorithm; and if the distance between a key part of the target customer and the target shelf is smaller than a first threshold, determine that the target customer touches the target shelf.
In a specific implementation of the embodiments of the present specification, when tracking and locating the key parts of the target customer contained in the image information based on the hand detection and hand tracking algorithms together with the keypoint detection and keypoint tracking algorithms, the detection module 704 is specifically configured to:
input each image frame of the image information into a keypoint detection model to obtain a set of key parts for each target customer, the set being associated with the target customer's identification information and hands; link the key parts belonging to the same target customer across the image frames into a trajectory, obtaining a keypoint trajectory for each target customer; input each image frame of the image information into a hand detection model to obtain a positioning box for each hand; track and locate based on the positioning boxes obtained from the image frames, obtaining a hand trajectory for each hand; and track and locate the hand of the target customer based on the keypoint trajectory and the hand trajectory.
In a further specific implementation of the embodiments of the present specification, when tracking and locating the hand of the target customer based on the keypoint trajectory and the hand trajectory, the detection module 704 is specifically configured to:
if, over N consecutive detected image frames, the distance between the hand keypoint in the target customer's keypoint trajectory and the center of the positioning box in a hand trajectory is not greater than a second threshold, determine that the target customer's keypoint trajectory is bound to that hand trajectory, so that after keypoints other than the hand are lost from the keypoint trajectory, the keypoint trajectory before the loss and the hand trajectory during the loss are spliced to track and locate the target customer's hand; where N is a positive integer greater than or equal to 2, and the second threshold is half the average length of the long edges of the positioning boxes in the hand trajectory.
In yet another specific implementation of the embodiments of the present specification, after splicing the keypoint trajectory before the loss and the hand trajectory during the loss to track and locate the hand of the target customer, the detection module 704 is further configured to:
if, over M consecutive image frames, the distance between the hand keypoint in the target customer's keypoint trajectory and the center of the positioning box in the hand trajectory is greater than the second threshold, restore the target customer's keypoint trajectory to independent tracking, or determine that the keypoint trajectory should be bound to another hand trajectory; where M is a positive integer greater than or equal to 2, and M is greater than N.
In yet another specific implementation of the embodiments of the present specification, when tracking and locating the hand of the target customer based on the keypoint trajectory and the hand trajectory, the detection module 704 is specifically configured to:
when a break in the hand trajectory is detected, estimate the position of the hand keypoint based on the keypoint trajectory of the target customer associated with the hand of that hand trajectory; and splice the broken hand trajectory segments based on the estimate to track and locate the target customer's hand.
In another specific implementation of the embodiments of the present specification, when identifying the interaction behavior of the target customer with the target shelf based on the weighing information, the identification module 706 is specifically configured to:
determine, from the weighing information, the weight values of the target shelf from the start time to the end time of the touch behavior; if the first weight value at the start time is greater than the second weight value at the end time and the weight value falls within the period, determine that the target customer took goods from the target shelf; if the first weight value at the start time equals the second weight value at the end time and the weight value does not change within the period, determine that the target customer only touched the target shelf; if the first weight value at the start time is less than the second weight value at the end time and the weight value rises within the period, determine that the target customer put goods on the target shelf; if the first weight value at the start time is greater than the second weight value at the end time and the weight value first rises and then falls within the period, determine that the target customer exchanged an item for a heavier one from the target shelf; if the first weight value at the start time equals the second weight value at the end time and the weight value first rises and then falls within the period, determine that the target customer exchanged an item for one of equal weight; if the first weight value at the start time is less than the second weight value at the end time and the weight value first rises and then falls within the period, determine that the target customer exchanged an item for a lighter one.
In yet another specific implementation of the embodiments of the present specification, when predicting, according to the holding detection model, the hand-held states of the target customers at the start time of the touch behavior and at the end time of the touch behavior, the identification module 706 is specifically configured to:
obtain a first image and a second image of each of the plurality of target customers; input the first image and the second image into the holding detection model to obtain a prediction result for each target customer; where the first image is the hand image of each target customer acquired at the start time of the touch behavior, and the second image is the hand image acquired at the end time of the touch behavior.
In yet another specific implementation of the embodiments of the present specification, when identifying the interaction behavior of each target customer with the target shelf based on the prediction results, the identification module 706 is specifically configured to:
for any target customer: determine, from the target customer's prediction results, a first prediction result predicted from the first image and a second prediction result predicted from the second image; if the first prediction result is that the target customer does not hold goods and the second prediction result is that the target customer holds goods, determine that the target customer took goods from the target shelf; if the first prediction result is that the target customer holds goods and the second prediction result is that the target customer does not hold goods, determine that the target customer put goods on the target shelf; if both prediction results are that the target customer does not hold goods, determine that the target customer only touched the target shelf; and if both prediction results are that the target customer holds goods, determine that the target customer exchanged goods on the target shelf.
Through the above technical solution, low-cost shelves, weighing devices, upper computers, cameras and a central control server form a human-cargo interaction behavior recognition system. The weighing information of the target shelf collected by the weighing device, and the image information containing the target customers collected by the camera, are transmitted through the upper computer to the central control server for processing: whether a touch behavior occurs on the target shelf is first judged from the image information, and a recognition scheme is selected according to the number of target customers exhibiting the touch behavior. If only one target customer touches, the weighing information of the target shelf can be used to identify the interaction behavior between the target customer and the target shelf; if a plurality of target customers touch, the holding detection model can be used to make predictions on images captured before and after each target customer's touch, and the interaction behavior of each target customer with the target shelf is identified from the prediction results. In this way, based on the touch behavior determined from the image information and combined with either the weighing information or the holding detection model, the interaction behavior between a target customer and the target shelf can be identified accurately, improving both recognition accuracy and recognition efficiency.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 8, at the hardware level the electronic device includes a processor, and optionally an internal bus, a network interface and a memory. The memory may include volatile memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface and the memory may be connected to each other via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in fig. 8, but this does not mean there is only one bus or one type of bus.
The memory is used for storing programs. Specifically, a program may include program code comprising computer operating instructions. The memory may include both volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the volatile memory and runs it, forming the human-cargo interaction behavior recognition apparatus at the logical level. The processor executes the program stored in the memory and is specifically used to perform the following operations:
receiving image information collected at the shooting location where a target shelf is located, the image information containing image data of at least one target customer; detecting, based on the image information, whether a touch behavior of a target customer touching the target shelf occurs; if the touch behavior of one target customer is detected, querying the weighing information of the target shelf from the start time to the end time of the touch behavior, and identifying the interaction behavior between the target customer and the target shelf based on the weighing information; if the touch behaviors of a plurality of target customers are detected, predicting, according to the holding detection model, the hand-held state of each of the target customers at the start time of the touch behavior and at the end time of the touch behavior, and identifying the interaction behavior of each target customer with the target shelf based on the prediction results; the holding detection model being trained on historical hand images captured before and after historical customers interacted with shelves.
The method performed by the apparatus according to the embodiments shown in fig. 3 or fig. 6 of the present specification may be implemented in, or performed by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in one or more embodiments of the present specification may thus be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with one or more embodiments of the present specification may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules within a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further execute the method in fig. 3 or fig. 6, and implement the functions of the corresponding apparatus in the embodiment shown in fig. 3 or fig. 6, which are not described herein again in this specification.
Of course, besides a software implementation, the electronic device of the embodiments of the present specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the foregoing processing flow is not limited to logic units, and may also be hardware or logic devices.
Embodiments of the present specification also provide a computer-readable storage medium storing one or more programs. The one or more programs comprise instructions which, when executed by a portable electronic device comprising a plurality of application programs, cause the portable electronic device to perform the method of the embodiments shown in fig. 3 or fig. 6, and in particular to perform the following:
receiving image information collected at the shooting location where a target shelf is located, wherein the image information comprises image data of at least one target customer; detecting, based on the image information, whether any target customer has a touching behavior of touching the target shelf; if the touching behavior of one target customer is detected, querying weighing information of the target shelf from the start time to the end time of the touching behavior, and identifying the interaction behavior between that target customer and the target shelf based on the weighing information; if the touching behaviors of a plurality of target customers are detected, predicting, according to a goods-holding detection model, the hand-held states of the plurality of target customers at the start time of the touching behavior and at the end time of the touching behavior, and identifying the interaction behavior of each target customer with the target shelf based on the prediction results; wherein the goods-holding detection model is trained on historical hand images captured before and after historical customers respectively interacted with a plurality of shelves.
The foregoing description is merely a preferred embodiment of the present specification and is not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present specification shall fall within its protection scope.
The system, apparatus, module or unit illustrated in one or more embodiments above may be implemented by a computer chip or an entity, or by an article of manufacture with a certain functionality. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (14)

1. A human-cargo interaction behavior identification method, comprising the following steps:
receiving image information collected at the shooting location where a target shelf is located;
detecting, based on the image information, whether any target customer has a touching behavior of touching the target shelf;
if the touching behavior of one target customer is detected, querying weighing information of the target shelf from the start time to the end time of the touching behavior, and identifying the interaction behavior between the target customer and the target shelf based on the weighing information;
if the touching behaviors of a plurality of target customers are detected, predicting, according to a goods-holding detection model, the hand-held states of the plurality of target customers at the start time of the touching behavior and at the end time of the touching behavior, and identifying the interaction behavior of each target customer with the target shelf based on the prediction results;
wherein the goods-holding detection model is trained on historical hand images captured before and after historical customers respectively interacted with a plurality of shelves.
2. The human-cargo interaction behavior recognition method according to claim 1, wherein detecting, based on the image information, whether any target customer has a touching behavior of touching the target shelf comprises:
tracking and positioning the key part of the target customer contained in the image information based on a hand detection algorithm, a hand tracking algorithm, a key point detection algorithm and a key point tracking algorithm;
and if the distance between the key part of the target customer and the target shelf is smaller than a first threshold value, determining that the target customer touches the target shelf.
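As a non-authoritative illustration of the distance test in claim 2, the following Python sketch treats the shelf as an axis-aligned rectangle in the image and registers a touch when the tracked hand point comes within the first threshold of it; the function name and the threshold value are assumptions for the example only.

import math

def hand_touches_shelf(hand_xy, shelf_rect, first_threshold=15.0):
    # hand_xy: (x, y) pixel position of the tracked hand key point.
    # shelf_rect: (x_min, y_min, x_max, y_max) of the shelf in the image.
    x, y = hand_xy
    x_min, y_min, x_max, y_max = shelf_rect
    dx = max(x_min - x, 0.0, x - x_max)  # horizontal distance to the rectangle
    dy = max(y_min - y, 0.0, y - y_max)  # vertical distance to the rectangle
    return math.hypot(dx, dy) < first_threshold

# A hand 10 px to the right of the shelf box counts as a touch:
print(hand_touches_shelf((210.0, 50.0), (0, 0, 200, 100)))  # True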
3. The human-cargo interaction behavior recognition method according to claim 2, wherein tracking and positioning the key part of the target customer contained in the image information based on the hand detection algorithm, the hand tracking algorithm, the key point detection algorithm, and the key point tracking algorithm comprises:
inputting each image frame of the image information into a key point detection model to obtain a key part set of each target customer, wherein the key part set is associated with the identification information of the target customer and with a hand; aggregating the key parts obtained from the image frames of the image information that belong to the same target customer into one track, to obtain the key point track of each target customer;
inputting each image frame of the image information into a hand detection model to obtain a positioning frame of each hand; tracking and positioning based on the positioning frames obtained from the image frames of the image information, to obtain the hand track of each hand;
and tracking and positioning the hand of the target customer based on the key point tracks and the hand tracks.
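Purely as an illustration of the bookkeeping in claim 3 (not the actual detection models), this Python sketch accumulates per-frame key-point sets into one track per customer and per-frame positioning frames into one track per hand; the detector and tracker callables are hypothetical placeholders.

from collections import defaultdict

def build_tracks(frames, detect_keypoints, detect_hands, assign_hand_id):
    # detect_keypoints(frame) -> {customer_id: {part_name: (x, y)}}
    # detect_hands(frame)     -> [(x_min, y_min, x_max, y_max), ...]
    # assign_hand_id(box)     -> hand_id (any box tracker, e.g. IoU matching)
    keypoint_tracks = defaultdict(list)  # customer_id -> key part set per frame
    hand_tracks = defaultdict(list)      # hand_id -> positioning frame per frame
    for frame in frames:
        for cid, parts in detect_keypoints(frame).items():
            keypoint_tracks[cid].append(parts)
        for box in detect_hands(frame):
            hand_tracks[assign_hand_id(box)].append(box)
    return keypoint_tracks, hand_tracks

# Toy usage with stub detectors:
kp, hd = build_tracks(
    frames=[0, 1],
    detect_keypoints=lambda f: {"c1": {"left_wrist": (10 + f, 20)}},
    detect_hands=lambda f: [(5 + f, 15, 25 + f, 35)],
    assign_hand_id=lambda box: "h1")
print(kp["c1"], hd["h1"])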
4. The human-cargo interaction behavior recognition method as claimed in claim 3, wherein the tracking and positioning of the hand of the target customer based on the key point track and the hand track comprises:
in N consecutive detected image frames, if the distance between the key point corresponding to a hand in the key point track of the target customer and the center of the positioning frame in a hand track is not greater than a second threshold, determining that the key point track of the target customer is bound to that hand track, so that after the key points other than the hand in the key point track of the target customer are lost, the key point track before the loss and the hand track after the loss are spliced to track and position the hand of the target customer; wherein N is a positive integer greater than or equal to 2, and the second threshold is half of the average length of the long edges of the positioning frames in the hand track.
5. The human-cargo interaction behavior recognition method as claimed in claim 4, wherein after the key point track before the loss and the hand track after the loss are spliced to track and locate the hand of the target customer, the method further comprises:
in M consecutive image frames, if the distance between the key point corresponding to the hand in the key point track of the target customer and the center of the positioning frame in the hand track is greater than the second threshold, resuming tracking of the key point track of the target customer, or determining that the key point track of the target customer is bound to another hand track, wherein M is a positive integer greater than or equal to 2, and M is greater than N.
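The binding rule of claims 4 and 5 can be sketched as below, again for illustration only: a wrist key point and a hand positioning frame are bound after staying within the second threshold for N consecutive frames, and the binding would be dropped after exceeding it for M > N consecutive frames; all names are hypothetical.

import math

def second_threshold(boxes):
    # Half of the average long edge of the positioning frames in the hand track.
    long_edges = [max(x2 - x1, y2 - y1) for (x1, y1, x2, y2) in boxes]
    return 0.5 * sum(long_edges) / len(long_edges)

def centers(boxes):
    return [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for (x1, y1, x2, y2) in boxes]

def consecutively_within(points_a, points_b, threshold, k):
    # True when the last k paired frames are all within the threshold;
    # with k = N this is the binding test, and the symmetric test using
    # "greater than the threshold" over M frames would trigger unbinding.
    pairs = list(zip(points_a, points_b))[-k:]
    return len(pairs) == k and all(
        math.hypot(ax - bx, ay - by) <= threshold
        for (ax, ay), (bx, by) in pairs)

wrist = [(100, 100), (102, 101), (103, 103), (104, 104)]  # key point track
boxes = [(90, 90, 130, 120)] * 4                          # hand track
thr = second_threshold(boxes)                             # 20.0 here
print(consecutively_within(wrist, centers(boxes), thr, k=3))  # True, so bind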
6. The human-cargo interaction behavior recognition method as claimed in claim 3, wherein the tracking and positioning of the hand of the target customer based on the key point track and the hand track comprises:
when a break in a hand track is detected, estimating the position of the key point corresponding to the hand based on the key point track of the target customer associated with the hand corresponding to that hand track;
and splicing the broken hand track based on the estimation result, to track and position the hand of the target customer.
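One way to picture the splicing of claim 6 (a hypothetical sketch, not the patented implementation): when a hand track breaks, the bound customer's key point track supplies an estimated hand position, and two fragments whose endpoints both agree with that estimate are joined.

import math

def splice_broken_track(fragment_a, fragment_b, keypoint_estimate, tol=25.0):
    # fragment_a / fragment_b: (x, y) hand centers before / after the break.
    # keypoint_estimate: hand position inferred from the key point track.
    ex, ey = keypoint_estimate
    ax, ay = fragment_a[-1]
    bx, by = fragment_b[0]
    if math.hypot(ax - ex, ay - ey) <= tol and math.hypot(bx - ex, by - ey) <= tol:
        return fragment_a + fragment_b  # both ends match the estimate: same hand
    return None  # the fragments likely belong to different hands

print(splice_broken_track([(10, 10), (12, 11)], [(14, 12), (16, 13)], (13, 11)))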
7. The human-cargo interaction behavior recognition method as claimed in any one of claims 1 to 6, wherein recognizing the interaction behavior of the target customer with the target shelf based on the weighing information comprises:
determining, from the weighing information, the weight values of the target shelf from the start time to the end time of the touching behavior;
determining that the target customer picked goods from the target shelf if the first weight value at the start time is greater than the second weight value at the end time and the weight value decreases between the start time and the end time;
determining that the target customer only touched the target shelf if the first weight value at the start time is equal to the second weight value at the end time and the weight value does not change between the start time and the end time;
determining that the target customer put goods on the target shelf if the first weight value at the start time is less than the second weight value at the end time and the weight value increases between the start time and the end time;
determining that the target customer exchanged goods, taking away an item of greater weight, if the first weight value at the start time is greater than the second weight value at the end time and the weight value both rises and falls between the start time and the end time;
determining that the target customer exchanged goods of equal weight if the first weight value at the start time is equal to the second weight value at the end time and the weight value both rises and falls between the start time and the end time;
determining that the target customer exchanged goods, taking away an item of lesser weight, if the first weight value at the start time is less than the second weight value at the end time and the weight value first rises and then falls between the start time and the end time.
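The six weighing cases of claim 7 amount to a small decision table over the weight samples. The Python sketch below is illustrative only: the function name and labels are invented, and the order-insensitive rise/fall test is a simplification of the claim's "first rises and then falls" condition.

def classify_by_weighing(weights):
    # weights: shelf weight samples from touch start to touch end.
    first, last = weights[0], weights[-1]
    rose = any(b > a for a, b in zip(weights, weights[1:]))
    fell = any(b < a for a, b in zip(weights, weights[1:]))
    if not rose and not fell:
        return "only touched the shelf"
    if rose and fell:  # goods were both put on and taken off: an exchange
        if first > last:
            return "exchanged goods, taking away a heavier item"
        if first < last:
            return "exchanged goods, taking away a lighter item"
        return "exchanged goods of equal weight"
    if fell and first > last:
        return "picked goods from the shelf"
    if rose and first < last:
        return "put goods on the shelf"
    return "indeterminate"

print(classify_by_weighing([10.0, 10.0, 10.0]))  # only touched the shelf
print(classify_by_weighing([10.0, 7.0, 7.0]))    # picked goods from the shelf
print(classify_by_weighing([10.0, 6.0, 8.0]))    # exchanged goods, taking away a heavier item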
8. The human-cargo interaction behavior recognition method according to any one of claims 1 to 6, wherein predicting, according to the goods-holding detection model, the hand-held states of the plurality of target customers at the start time of the touching behavior and at the end time of the touching behavior comprises:
acquiring a first image and a second image of each of the plurality of target customers;
inputting the first image and the second image respectively into the goods-holding detection model to obtain the prediction result of each target customer;
wherein the first image is the hand image of each target customer acquired at the start time of the touching behavior, and the second image is the hand image of each target customer acquired at the end time of the touching behavior.
9. The human-cargo interaction behavior recognition method according to claim 8, wherein recognizing the interaction behavior of each target customer with the target shelf based on the prediction result comprises:
for any target customer:
determining, from the prediction results of the target customer, a first prediction result predicted from the first image and a second prediction result predicted from the second image;
if the first prediction result is that the target customer does not hold goods and the second prediction result is that the target customer holds goods, determining that the target customer took goods from the target shelf;
if the first prediction result is that the target customer holds goods and the second prediction result is that the target customer does not hold goods, determining that the target customer put goods on the target shelf;
if the first prediction result is that the target customer does not hold goods and the second prediction result is that the target customer does not hold goods, determining that the target customer only touched the target shelf;
and if the first prediction result is that the target customer holds goods and the second prediction result is that the target customer holds goods, determining that the target customer changed goods on the target shelf.
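Claims 8 and 9 together describe a predict-then-map step. As a hedged illustration, the sketch below runs any goods-holding classifier on the hand images at the start and end of the touching behavior and maps the pair of Boolean predictions to an interaction; the toy stand-in model is obviously not the trained model of the disclosure.

def classify_by_holding(model, image_at_start, image_at_end):
    held_before = model(image_at_start)  # True means the hand holds goods
    held_after = model(image_at_end)
    table = {
        (False, True):  "took goods from the shelf",
        (True, False):  "put goods on the shelf",
        (False, False): "only touched the shelf",
        (True, True):   "changed goods on the shelf",
    }
    return table[(held_before, held_after)]

# Toy stand-in model: 'holding' when the crop's mean intensity is high.
toy_model = lambda crop: sum(crop) / len(crop) > 0.5
print(classify_by_holding(toy_model, [0.1, 0.2], [0.8, 0.9]))  # took goods from the shelf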
10. A human-cargo interaction behavior recognition apparatus, comprising:
the receiving module is used for receiving image information collected at the shooting location where the target shelf is located;
the detection module is used for detecting, based on the image information, whether any target customer has a touching behavior of touching the target shelf;
the identification module is used for querying weighing information of the target shelf from the start time to the end time of the touching behavior if the detection module detects the touching behavior of one target customer, and identifying the interaction behavior between the target customer and the target shelf based on the weighing information;
the identification module is further used for predicting, according to a goods-holding detection model, the hand-held states of a plurality of target customers at the start time of the touching behavior and at the end time of the touching behavior if the detection module detects the touching behaviors of the plurality of target customers, and identifying the interaction behavior of each target customer with the target shelf based on the prediction results;
wherein the goods-holding detection model is trained on historical hand images captured before and after historical customers respectively interacted with a plurality of shelves.
11. The human-cargo interaction behavior recognition apparatus according to claim 10, wherein the detection module, when detecting, based on the image information, whether any target customer has a touching behavior of touching the target shelf, is specifically configured to:
tracking and positioning the key part of the target customer contained in the image information based on a hand detection algorithm, a hand tracking algorithm, a key point detection algorithm and a key point tracking algorithm;
and if the distance between the key part of the target customer and the target shelf is smaller than a first threshold value, determining that the target customer touches the target shelf.
12. A human-cargo interaction behavior recognition system, comprising: at least one shelf, each shelf being provided with a weighing device for weighing that shelf; at least one upper computer for receiving the weighing information sent by the weighing devices; at least one camera for collecting image information of the shelves from a top view angle or a side view angle; and a central control server connected respectively to the at least one upper computer and the at least one camera, wherein the central control server is used for receiving the weighing information uploaded by the at least one upper computer and the image information collected by the at least one camera, and for executing the human-cargo interaction behavior identification method according to any one of claims 1 to 9.
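For orientation only, the topology of claim 12 might be wired up as in the following hypothetical configuration sketch; the class names, the host address, and the RTSP URL are illustrative assumptions, not part of the claim.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Shelf:
    shelf_id: str
    weighing_device: str  # e.g. a load-cell channel identifier

@dataclass
class UpperComputer:      # relays weighing information to the central server
    host: str
    shelves: List[Shelf] = field(default_factory=list)

@dataclass
class CentralControlServer:
    upper_computers: List[UpperComputer]
    camera_streams: List[str]  # top-view or side-view feeds

system = CentralControlServer(
    upper_computers=[UpperComputer("192.168.1.10", [Shelf("A1", "loadcell-0")])],
    camera_streams=["rtsp://camera-1/top-view"])
print(len(system.upper_computers), len(system.camera_streams))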
13. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the human-cargo interaction behavior recognition method of any of claims 1-9.
14. A computer-readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the human-cargo interaction behavior recognition method of any one of claims 1-9.
CN202211498078.1A 2022-11-28 2022-11-28 Human-cargo interaction behavior identification method, system and related device Active CN115620402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211498078.1A CN115620402B (en) 2022-11-28 2022-11-28 Human-cargo interaction behavior identification method, system and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211498078.1A CN115620402B (en) 2022-11-28 2022-11-28 Human-cargo interaction behavior identification method, system and related device

Publications (2)

Publication Number Publication Date
CN115620402A CN115620402A (en) 2023-01-17
CN115620402B true CN115620402B (en) 2023-03-31

Family

ID=84878189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211498078.1A Active CN115620402B (en) 2022-11-28 2022-11-28 Human-cargo interaction behavior identification method, system and related device

Country Status (1)

Country Link
CN (1) CN115620402B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529847A (en) * 2021-12-29 2022-05-24 西安理工大学 Goods shelf dynamic commodity identification and customer shopping matching method based on deep learning
CN115083016A (en) * 2022-06-09 2022-09-20 广州紫为云科技有限公司 Monocular camera-based small-target-oriented hand space interaction method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492451A (en) * 2018-03-12 2018-09-04 远瞳(上海)智能技术有限公司 Automatic vending method
JP7081310B2 (en) * 2018-06-01 2022-06-07 コニカミノルタ株式会社 Behavioral analytics device, behavioral analytics system, behavioral analytics method, program and recording medium
CN109064630B (en) * 2018-07-02 2024-03-05 高堆 Unmanned automatic weighing and price calculating counter system
US20220230216A1 (en) * 2018-07-16 2022-07-21 Accel Robotics Corporation Smart shelf that combines weight sensors and cameras to identify events
KR102001806B1 (en) * 2019-02-21 2019-07-18 이한승 Smart vending machine with product selection after opening door
CN113807915A (en) * 2021-08-31 2021-12-17 恩梯梯数据(中国)信息技术有限公司 Unmanned supermarket person and goods matching method and system based on deep learning construction
CN217338023U (en) * 2021-08-31 2022-09-02 恩梯梯数据(中国)信息技术有限公司 Unmanned supermarket people and goods matching device
CN113706227A (en) * 2021-11-01 2021-11-26 微晟(武汉)技术有限公司 Goods shelf commodity recommendation method and device
CN114360057A (en) * 2021-12-27 2022-04-15 广州图普网络科技有限公司 Data processing method and related device

Also Published As

Publication number Publication date
CN115620402A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
US11638490B2 (en) Method and device for identifying product purchased by user and intelligent shelf system
CN108985199B (en) Detection method and device for commodity taking and placing operation and storage medium
US9165279B2 (en) System and method for calibration and mapping of real-time location data
US9471813B2 (en) Portable encoded information reading terminal configured to adjust transmit power level
US20180247361A1 (en) Information processing apparatus, information processing method, wearable terminal, and program
CN110033293B (en) Method, device and system for acquiring user information
EP3510571A1 (en) Order information determination method and apparatus
JP7379677B2 (en) Electronic device for automatic user identification
US11948042B2 (en) System and method for configuring an ID reader using a mobile device
CN109353397B (en) Commodity management method, device and system, storage medium and shopping cart
JPWO2015147333A1 (en) Sales registration device, program and sales registration method
KR20140114832A (en) Method and apparatus for user recognition
CN111079478B (en) Unmanned goods shelf monitoring method and device, electronic equipment and system
JP2017174272A (en) Information processing device and program
US10037510B2 (en) System and method for calibration and mapping of real-time location data
CN111428743B (en) Commodity identification method, commodity processing device and electronic equipment
US20230120798A1 (en) Systems and methods for detecting a mis-scan of an item for purchase
CN115620402B (en) Human-cargo interaction behavior identification method, system and related device
CN113378601A (en) Method for preventing goods loss, self-service equipment and storage medium
CN110677448A (en) Associated information pushing method, device and system
CN112950329A (en) Commodity dynamic information generation method, device, equipment and computer readable medium
CN109993022B (en) Height detection method and method for establishing height detection equation
US9864890B1 (en) Systems and methods for contextualizing data obtained from barcode images
US20230101001A1 (en) Computer-readable recording medium for information processing program, information processing method, and information processing device
JP7318753B2 (en) Information processing program, information processing method, and information processing apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231219

Address after: Room 801-6, No. 528 Yan'an Road, Gongshu District, Hangzhou City, Zhejiang Province, 310000

Patentee after: Zhejiang Shenxiang Intelligent Technology Co.,Ltd.

Address before: Room 5034, building 3, 820 wenerxi Road, Xihu District, Hangzhou, Zhejiang 310000

Patentee before: ZHEJIANG LIANHE TECHNOLOGY Co.,Ltd.