CN117095462A - Behavior detection method, device and equipment - Google Patents

Behavior detection method, device and equipment

Info

Publication number
CN117095462A
CN117095462A (application number CN202311073680.5A)
Authority
CN
China
Prior art keywords
user
determining
target
alternative
image
Prior art date
Legal status
Pending
Application number
CN202311073680.5A
Other languages
Chinese (zh)
Inventor
焦继乐
李斌
冯雪涛
王炎
Current Assignee
Zhejiang Shenxiang Intelligent Technology Co ltd
Original Assignee
Zhejiang Shenxiang Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Shenxiang Intelligent Technology Co ltd filed Critical Zhejiang Shenxiang Intelligent Technology Co ltd
Priority to CN202311073680.5A
Publication of CN117095462A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The embodiment of the application provides a behavior detection method, a behavior detection device and behavior detection equipment, wherein the method comprises the following steps: the electronic equipment acquires a real-time video stream sent by the camera module and extracts characteristic information of users in the real-time video stream; determines an alternative user group whose characteristic information meets a first preset condition, and determines an alternative local image corresponding to the alternative user group; determines a target category corresponding to the alternative local image based on the alternative local image and a target recognition model; determines a first user and a second user in the real-time video stream according to the characteristic information; and, if the target category is the first category and the alternative user group comprises the first user and the second user, determines that the alternative local image corresponds to the target purchasing behavior. In this way, the electronic equipment can automatically detect purchasing behavior from the real-time video with higher efficiency; manual spot-check judgment is not needed, so labor costs can be reduced; and the accuracy of purchasing behavior detection can be improved, which in turn effectively reduces the flyer risk.

Description

Behavior detection method, device and equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a behavior detection method, apparatus, and device.
Background
In the offline purchasing scenario of a department store, the rent of a special cabinet (brand counter) is usually positively correlated with its sales. On this basis, flyer behavior, in which sales are recorded abnormally or not recorded at all, causes economic loss to the mall.
In the related art, in order to reduce the flyer risk, a mall (or a brand operator, etc.) usually arranges professionals to perform manual spot checks to curb flyer behavior. This approach has a high labor cost, and the efficiency and accuracy of manual spot checks are limited, so the flyer risk cannot be effectively reduced.
Disclosure of Invention
The application provides a behavior detection method, device and equipment, which are used for automatically identifying the purchasing behavior of a customer, improving the efficiency and accuracy of purchasing behavior detection and further effectively reducing the risk of a flyer.
In a first aspect, an embodiment of the present application provides a behavior detection method, including:
acquiring a real-time video stream sent by a camera module, and extracting characteristic information of a user in the real-time video stream;
determining an alternative user group of which the characteristic information meets a first preset condition, and determining an alternative local image corresponding to the alternative user group;
determining a target category corresponding to the alternative local image based on the alternative local image and a target recognition model;
Determining a first user and a second user in the real-time video stream according to the characteristic information;
and if the target category is a first category and the alternative user group comprises a first user and a second user, determining that the alternative partial image corresponds to the target purchasing behavior.
In a possible implementation manner, the determining the candidate user group that the feature information meets the first preset condition includes:
determining the position information of the user according to the characteristic information;
in the continuous N image frames, if the position information of two users meets the preset position condition, determining the two users as the alternative user group; wherein N is an integer greater than 1.
In a possible implementation manner, the determining the candidate user group that the feature information meets the first preset condition includes:
according to the characteristic information, determining coordinate information corresponding to the user, and determining orientation angle information corresponding to the user;
and in the continuous N image frames, if the coordinate information and the orientation angle information of the two users meet the preset position condition, determining the two users as the alternative user group.
In one possible embodiment, the method further comprises:
determining a target image area in the real-time video stream according to the real-time video stream;
detecting and determining the alternative user group according to a first frame rate aiming at the target image area;
detecting and determining the alternative user group according to a second frame rate for other image areas except the target image area; the first frame rate is greater than the second frame rate.
In one possible implementation, the target recognition model includes a target feature extraction model and a target classification model; the determining, based on the candidate local image and the target recognition model, a target category corresponding to the candidate local image includes:
determining and labeling a user human hand detection frame and a target handbag detection frame in the alternative partial image to obtain an image to be identified;
inputting the image to be identified into the target feature extraction model, determining the image features corresponding to the image to be identified, and simultaneously determining thermodynamic diagram features corresponding to the image to be identified; the thermodynamic diagram features take the human hand detection frame and the target handbag detection frame as center points;
Fusing the image features and the thermodynamic diagram features to obtain fusion features;
and inputting the fusion characteristics into the target classification model to obtain the target category corresponding to the alternative local image.
In one possible embodiment, the method further comprises:
obtaining a training sample; the training sample comprises a bag delivering behavior image marked with a hand detection frame and a bag lifting detection frame;
inputting the training sample into a preset recognition model, and determining a training category corresponding to the training sample;
and carrying out iterative training on the preset recognition model according to the training category and a preset loss function to obtain the target recognition model.
In a possible implementation manner, the determining the first user and the second user in the real-time video stream according to the feature information includes:
according to the characteristic information, carrying out association tracking on the same user in the real-time video stream, and determining the motion trail of the user;
under the condition that the motion trail meets a second preset condition, determining the user as a first user;
and determining the user as a second user under the condition that the motion trail does not meet the second preset condition.
In one possible embodiment, the second preset condition includes:
the starting time and/or the ending time of the motion trail falls within a preset period; and/or,
the duration of the motion trail is greater than a preset time threshold.
In a possible implementation manner, after the determining the target class corresponding to the candidate local image, the method further includes:
if the target class is a first class, determining an association relationship between a user and a target handbag according to an alternative user group corresponding to the first class;
and if the association relation between the user and the target handbag changes, determining that the target category identification corresponding to the alternative partial image is correct.
In a second aspect, an embodiment of the present application provides a behavior detection apparatus, including:
the extraction module is used for acquiring the real-time video stream sent by the camera module and extracting the characteristic information of the user in the real-time video stream;
the first determining module is used for determining an alternative user group of which the characteristic information meets a first preset condition and determining an alternative local image corresponding to the alternative user group;
the second determining module is used for determining a target category corresponding to the alternative local image based on the alternative local image and a target recognition model;
The third determining module is used for determining a first user and a second user in the real-time video stream according to the characteristic information;
and the fourth determining module is used for determining that the alternative partial image corresponds to the target purchasing behavior if the target category is the first category and the alternative user group comprises the first user and the second user.
In a possible implementation manner, the first determining module is specifically configured to:
determining the position information of the user according to the characteristic information;
in the continuous N image frames, if the position information of two users meets the preset position condition, determining the two users as the alternative user group; wherein N is an integer greater than 1.
In a possible implementation manner, the first determining module is specifically configured to:
according to the characteristic information, determining coordinate information corresponding to the user, and determining orientation angle information corresponding to the user;
and in the continuous N image frames, if the coordinate information and the orientation angle information of the two users meet the preset position condition, determining the two users as the alternative user group.
In one possible embodiment, the apparatus is further for:
Determining a target image area in the real-time video stream according to the real-time video stream;
detecting and determining the alternative user group according to a first frame rate aiming at the target image area;
detecting and determining the alternative user group according to a second frame rate for other image areas except the target image area; the first frame rate is greater than the second frame rate.
In one possible implementation, the target recognition model includes a target feature extraction model and a target classification model; the second determining module is specifically configured to:
determining and labeling a user human hand detection frame and a target handbag detection frame in the alternative partial image to obtain an image to be identified;
inputting the image to be identified into the target feature extraction model, determining the image features corresponding to the image to be identified, and simultaneously determining thermodynamic diagram features corresponding to the image to be identified; the thermodynamic diagram features take the human hand detection frame and the target handbag detection frame as center points;
fusing the image features and the thermodynamic diagram features to obtain fusion features;
and inputting the fusion characteristics into the target classification model to obtain the target category corresponding to the alternative local image.
In one possible embodiment, the apparatus is further for:
obtaining a training sample; the training sample comprises a bag delivering behavior image marked with a hand detection frame and a bag lifting detection frame;
inputting the training sample into a preset recognition model, and determining a training category corresponding to the training sample;
and carrying out iterative training on the preset recognition model according to the training category and a preset loss function to obtain the target recognition model.
In a possible implementation manner, the third determining module is specifically configured to:
according to the characteristic information, carrying out association tracking on the same user in the real-time video stream, and determining the motion trail of the user;
under the condition that the motion trail meets a second preset condition, determining the user as a first user;
and determining the user as a second user under the condition that the motion trail does not meet the second preset condition.
In one possible embodiment, the second preset condition includes:
the starting time and/or the ending time of the motion trail falls within a preset period; and/or,
the duration of the motion trail is greater than a preset time threshold.
In one possible embodiment, the apparatus is further for:
if the target class is a first class, determining an association relationship between a user and a target handbag according to an alternative user group corresponding to the first class;
and if the association relation between the user and the target handbag changes, determining that the target category identification corresponding to the alternative partial image is correct.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;
the memory stores computer-executable instructions;
the processor executing computer-executable instructions stored in the memory causes the processor to perform the behavior detection method of any one of the first or second aspects.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions for implementing the behavior detection method of any one of the first or second aspects when the computer-executable instructions are executed by a processor.
In a fifth aspect, an embodiment of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the behavior detection method of any one of the first or second aspects.
In the embodiment of the application, the electronic equipment acquires the real-time video stream sent by the camera module and extracts the characteristic information of the user in the real-time video stream; determining an alternative user group with characteristic information meeting a first preset condition, and determining an alternative local image corresponding to the alternative user group; determining a target category corresponding to the alternative local image based on the alternative local image and the target recognition model; determining a first user and a second user in the real-time video stream according to the characteristic information; and if the target category is the first category and the alternative user group comprises the first user and the second user, determining that the alternative partial image corresponds to the target purchasing behavior. In the embodiment of the application, the electronic equipment performs preliminary screening on the bag delivering behavior based on the characteristic information extracted from the real-time video to determine the alternative local image, then accurately identifies the alternative local image, and determines whether the target category corresponding to the alternative local image is the bag delivering behavior; while the electronic device may perform identification of the first user (clerk) and the second user (customer) based on the feature information. When the target category corresponding to the alternative partial image is the pocket-delivering behavior and the store personnel and the customer are included in the alternative user group, the electronic device determines that the purchasing behavior occurs. Therefore, the electronic equipment can automatically detect the purchasing behavior based on the real-time video, the efficiency is higher, spot check judgment is not needed manually, and the labor cost can be reduced; meanwhile, through identification of the bag delivering behaviors and identification of store staff and customers, accuracy of detection of the purchasing behaviors can be improved, subsequently, the flying behaviors can be accurately identified based on the purchasing behaviors, and further, the flying behaviors risk can be effectively reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a schematic diagram of an application scenario provided in an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a behavior detection method according to an exemplary embodiment of the present application;
FIG. 3 is a flowchart of another behavior detection method according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a user orientation relationship calculation provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of an algorithm for identifying the behavior of a delivery bag according to an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a technical link for behavior detection according to an exemplary embodiment of the present application;
fig. 7 is a schematic structural diagram of a behavior detection device according to an exemplary embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant country and region, and is provided with corresponding operation entries for the user to select authorization or rejection.
As consumer spending continues to rise, the number of offline malls keeps expanding. The main revenue source of a department store is the rent of its special cabinets (brand counters). In general, the rent and the sales of a special cabinet are positively correlated; once flyer behavior occurs, the special cabinet can covertly under-report part of its sales, bringing economic loss to the mall. Flyer behavior may refer to a shopping guide (also called a "clerk" or "sales person") at a special cabinet not entering a sales order into the sales system designated by the mall, or entering it with a third-party sales device not permitted by the mall, thereby providing false business data to the mall and causing losses to it. In addition, promotions such as points and rebates are often carried out, and sales personnel may re-enter sales orders during such promotions to obtain the price difference. Moreover, once there is a problem with product quality, the mall and the special cabinet may face after-sales disputes because the goods were never checked in.
On this basis, in order to reduce the flyer risk, malls often rely on staff making spot-check visits. This approach has a high labor cost and low efficiency, and because the checks are insufficient the detection accuracy is also limited, so the flyer risk cannot be effectively reduced.
In the embodiment of the application, flyer risk identification is premised on accurately identifying purchasing behavior. In the purchasing scene of a department store special cabinet in an offline mall, after a customer purchases a commodity, the clerk usually puts the commodity into the brand's own shopping bag (target handbag) and hands it to the customer. On this basis, the electronic device can identify the bag delivering behavior between the store clerk and the customer based on the real-time video stream acquired by the camera and a visual algorithm, so that the purchasing behavior can be accurately located; flyer behavior can then be effectively determined by comparing the purchasing behavior with the sales records, thereby reducing the flyer risk. Meanwhile, manual spot-check visits are no longer needed, so the recognition efficiency of flyer behavior can be improved and labor costs reduced.
Fig. 1 is a schematic diagram of an application scenario provided in an exemplary embodiment of the present application. As shown in fig. 1, the scenario includes a mall staff member 101 and an electronic device 102. The electronic device 102 may be a mobile phone, a computer, etc. As shown in fig. 1, in the related art, in order to reduce the flyer risk, spot-check confirmation of flyer behavior is typically performed manually by the mall staff member 101. This approach has low efficiency and high labor cost, and cannot accurately identify a large number of flyer orders, so the flyer risk cannot be effectively reduced.
In the embodiment of the present application, the electronic device 102 acquires a real-time video stream, detects the bag delivering behavior in the real-time video stream based on a visual algorithm, and determines the purchasing behavior between the store clerk and the customer, so as to determine the flyer behavior based on the purchasing behavior. Thus, the electronic device 102 can automatically and accurately identify the purchasing behavior and the flyer behavior without manual spot check confirmation, so that the determination efficiency of the flyer behavior can be improved, and the flyer risk can be reduced.
The technical scheme shown in the application is described in detail by specific examples. It should be noted that the following embodiments may exist alone or in combination with each other, and for the same or similar content, the description will not be repeated in different embodiments.
Fig. 2 is a flow chart of a behavior detection method according to an exemplary embodiment of the present application. Referring to fig. 2, the behavior detection method may include:
s201, acquiring a real-time video stream sent by the camera module, and extracting characteristic information of a user in the real-time video stream.
The execution body of the embodiment of the application can be electronic equipment or a behavior detection device arranged in the electronic equipment. The behavior detection means may be implemented by software or by a combination of software and hardware. For ease of understanding, hereinafter, an execution body will be described as an example of an electronic device.
In the embodiment of the application, the camera module may refer to a camera for video acquisition, and the camera module may include at least one camera, for example, may include shooting cameras in different positions, and the like. The real-time video stream may refer to a real-time video stream collected and uploaded by the camera module. The user may refer to a person included in the real-time video stream, and may specifically include a customer, a store clerk, and the like. The feature information may refer to human body feature information of the user, and specifically may include human body key point information, human body recognition feature (Reid), and the like.
In this step, the electronic device is communicatively connected with the camera module; the connection may be wired or wireless, which is not limited in the embodiment of the present application. The electronic device can acquire the real-time video stream sent by the camera module, perform detection and recognition on the real-time video stream, and extract the characteristic information of each user in it. For example, the electronic device may detect the human body frame of each user in each image frame of the real-time video stream based on a multi-target tracking algorithm, such as the real-time online tracking algorithm DeepSORT or the joint detection and embedding multi-target tracking network (Joint Detection and Embedding, JDE), and extract feature information from the human body frame.
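As a non-limiting illustration, the following Python sketch shows one way the per-frame extraction of step S201 might be organized. The Detector and Tracker objects, their method names, and the UserObservation structure are assumptions introduced for explanation; any DeepSORT- or JDE-style tracker exposing similar interfaces could play these roles.

```python
# Illustrative sketch only; detector/tracker interfaces are assumed.
from dataclasses import dataclass
from typing import List

@dataclass
class UserObservation:
    track_id: int        # identity maintained by the multi-target tracker
    bbox: tuple          # human body detection frame (x1, y1, x2, y2)
    keypoints: list      # human body key points (shoulders, feet, ...)
    reid_feature: list   # appearance (ReID) feature vector

def extract_user_features(frame, detector, tracker) -> List[UserObservation]:
    """Detect users in one image frame and associate them across frames."""
    detections = detector.detect(frame)         # body frames, key points, ReID features
    tracks = tracker.update(detections, frame)  # DeepSORT/JDE-style association
    return [UserObservation(t.track_id, t.bbox, t.keypoints, t.reid_feature)
            for t in tracks]
```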
S202, determining an alternative user group with the characteristic information meeting a first preset condition, and determining an alternative local image corresponding to the alternative user group.
In the embodiment of the present application, the first preset condition may refer to a preset screening condition of a bag delivering behavior, and specifically may refer to that in N consecutive image frames, position information of two users satisfies a preset position condition, and so on. An alternative user group may refer to two users who may have a pocket behavior. The alternative partial images may refer to a sequence of partial images corresponding to the alternative user group, where a pocket-delivering behavior may exist.
In this step, after the characteristic information of each user is extracted from the real-time video stream, the electronic device can perform a preliminary screening for the bag delivering behavior so as to reduce the amount of calculation and save computing resources. When a bag delivering behavior occurs, the store clerk is typically located close to the customer and typically delivers the bag face to face. Based on the feature information, the electronic device can determine the position information of each user, such as sole coordinates and orientation angles, then determine two users whose position information meets the preset position condition (sole coordinates are close, body orientations are face to face, etc.) as an alternative user group, and then crop and store the partial images corresponding to the alternative user group from the image frames of the real-time video stream to obtain the alternative partial images corresponding to the alternative user group, which serve as the candidate sequence for detecting and identifying the bag delivering behavior. In this way, by preliminarily screening possible bag delivering users in the real-time video stream, the electronic device can filter out users not involved in the bag delivering behavior, which reduces the amount of calculation for subsequent bag delivering behavior recognition, saves system computing resources, and also improves the accuracy of bag delivering behavior detection.
S203, determining a target category corresponding to the alternative local image based on the alternative local image and the target recognition model.
In the embodiment of the application, the target recognition model may refer to a pre-trained bag delivering behavior recognition model. The target recognition model specifically comprises a target feature extraction model, a target classification model and the like, and can realize image feature extraction and feature classification of the alternative local image. The target category may refer to a category corresponding to the candidate partial image, and the target category may include a first category and a second category, where the first category may include the bagging behavior in the candidate partial image, and the second category may not include the bagging behavior in the candidate partial image. Specifically, after determining the candidate local image corresponding to the candidate user group, the electronic device may input the candidate local image into the target recognition model to perform detection and recognition, and the target recognition model may output a target category corresponding to the candidate local image.
S204, determining a first user and a second user in the real-time video stream according to the characteristic information.
In the embodiment of the application, the first user may refer to a store clerk (sales person, shopping guide) in the real-time video stream, and the like. The second user may refer to a customer (consumer, pedestrian) in the real-time video stream, etc. In an actual offline scenario, there may be a pocket behavior between customers of the same party, and the pocket behavior between customers may not be clearly determined as a purchasing behavior. In order to improve the accuracy of the purchase behavior determination, the electronic device may determine, based on the feature information, identities of respective users in the real-time video stream, that is, determine whether the users in the real-time video stream are first users or second users. Specifically, the electronic device may determine the behavior trace of each user based on the multi-target tracking algorithm, and then determine the first user according to the time feature and the position feature of the behavior trace, and the other users not belonging to the first user are the second users.
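As a non-limiting illustration, a minimal Python sketch of this clerk/customer split is given below; the trajectory fields, the preset period, and the duration threshold are assumed values introduced for explanation only.

```python
# Illustrative sketch only; the preset period and duration threshold are assumptions.
def is_first_user(track_start, track_end, preset_period, min_duration_s=3600.0):
    """Treat a trajectory as belonging to a first user (store clerk) if its
    start or end time falls within the preset period, or if its duration
    exceeds the preset time threshold; otherwise the user is a second user."""
    period_start, period_end = preset_period
    in_period = (period_start <= track_start <= period_end or
                 period_start <= track_end <= period_end)
    long_enough = (track_end - track_start) >= min_duration_s
    return in_period or long_enough
```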
S205, if the target category is the first category and the alternative user group comprises the first user and the second user, determining that the alternative partial image corresponds to the target purchasing behavior.
In the embodiment of the application, the target purchasing behavior may refer to purchasing behavior corresponding to a purchasing event between a customer and a clerk. After carrying out the bag delivering action recognition and the user identification, if the target category corresponding to the alternative local image is the first category, namely the alternative local image comprises the bag delivering action; and the candidate user group corresponding to the candidate partial image comprises a first user and a second user, namely, two users in the candidate partial image are respectively a store clerk and a customer, and the electronic equipment can determine the target purchasing behavior corresponding to the candidate partial image. Therefore, the electronic equipment detects and identifies the real-time video stream based on the visual algorithm, so that the purchasing behavior in the real-time video stream can be accurately positioned, the accurate identification of the purchasing behavior is realized, the behavior of the flyer can be determined based on the purchasing behavior and sales data, and the flyer risk can be effectively reduced.
In the embodiment of the application, the electronic equipment acquires the real-time video stream sent by the camera module and extracts the characteristic information of the user in the real-time video stream; determining an alternative user group with characteristic information meeting a first preset condition, and determining an alternative local image corresponding to the alternative user group; determining a target category corresponding to the alternative local image based on the alternative local image and the target recognition model; determining a first user and a second user in the real-time video stream according to the characteristic information; and if the target category is the first category and the alternative user group comprises the first user and the second user, determining that the alternative partial image corresponds to the target purchasing behavior. In the embodiment of the application, the electronic equipment performs preliminary screening on the bag delivering behavior based on the characteristic information extracted from the real-time video to determine the alternative local image, then accurately identifies the alternative local image, and determines whether the target category corresponding to the alternative local image is the bag delivering behavior; while the electronic device may perform identification of the first user (clerk) and the second user (customer) based on the feature information. When the target category corresponding to the alternative partial image is the pocket-delivering behavior and the store personnel and the customer are included in the alternative user group, the electronic device determines that the purchasing behavior occurs. Therefore, the electronic equipment can automatically detect the purchasing behavior based on the real-time video, the efficiency is higher, spot check judgment is not needed manually, and the labor cost can be reduced; meanwhile, through identification of the bag delivering behaviors and identification of store staff and customers, accuracy of detection of the purchasing behaviors can be improved, subsequently, the flying behaviors can be accurately identified based on the purchasing behaviors, and further, the flying behaviors risk can be effectively reduced.
On the basis of the above embodiment, fig. 3 is a schematic flow chart of another behavior detection method according to an exemplary embodiment of the present application. Referring to fig. 3, the behavior detection method may include:
s301, acquiring a real-time video stream sent by the camera module, and extracting characteristic information of a user in the real-time video stream.
S302, determining the position information of the user according to the characteristic information; in the continuous N image frames, if the position information of the two users meets the preset position condition, determining the two users as an alternative user group; wherein N is an integer greater than 1.
In the embodiment of the application, the position information can refer to information such as sole coordinates of a user, human body orientation angles and the like. The preset position condition may refer to a preset position relationship condition, and specifically may refer to that the distance between the sole coordinates of two users is smaller than a preset distance threshold, the human body orientations of the two users are in face-to-face relationship, and the like. Specifically, after determining the characteristic information of the user in the real-time video stream, the electronic device may determine the position information of the user according to the characteristic information such as the key points of the human body, and then determine the user whose two position information satisfy the preset position condition as the candidate user group with the possible pocket delivering behavior.
In one possible embodiment, step S302 may be specifically implemented by the following steps (1) to (2):
(1) And determining coordinate information corresponding to the user according to the characteristic information, and determining orientation angle information corresponding to the user.
In the embodiment of the application, the coordinate information may refer to sole coordinates of a user, and in particular may refer to world coordinates or image coordinates of sole points of a human body of the user. Specifically, the electronic device may determine the image coordinates of the sole points of the user's human body based on the information such as the key points of the human body in the feature information. When calibration information (the corresponding relation between the image pixel distance and the actual scene distance) is preconfigured in the camera module, the electronic equipment can convert the image coordinates of the sole points of the user into world coordinates under a world coordinate system, and the world coordinates are used as coordinate information corresponding to the user; when the camera module is not configured with calibration information, the electronic equipment can directly take the image coordinates of the sole points of the user as the coordinate information corresponding to the user.
The orientation angle information may refer to the angle between the perpendicular of the straight line on which the two shoulders of the human body lie and the horizontal line (the X axis of the world coordinate system). The electronic device can obtain the orientation angle information of the user through a human body orientation multi-classification model. Of course, the electronic device may also determine the coordinate information and the orientation angle information of the user in other manners, which may be flexibly set based on actual requirements and is not limited by the embodiment of the present application.
(2) And in the continuous N image frames, if the coordinate information and the orientation angle information of the two users meet the preset position condition, determining the two users as the alternative user group.
In the embodiment of the application, after determining the coordinate information and the orientation angle information of the user, the electronic device can further determine whether the position information between the two users meets the preset position condition, specifically determine whether the distance between the coordinate information of the two users is smaller than the preset distance threshold value, and determine whether the orientation angle information of the two users meets the face-to-face relationship.
In this step, the electronic device may calculate the distance between the coordinate information of the two users, and if the distance is smaller than a preset distance threshold, determine that the coordinate information of the two users satisfies the preset positional relationship. Illustratively, assume that the sole-point coordinates of the two users' human body frames in the world coordinate system are (x₁, y₁) and (x₂, y₂), and that the preset distance threshold is T₁. The distance T between the two users' coordinate information may be calculated as T = √((x₁ - x₂)² + (y₁ - y₂)²). When the distance T is smaller than the preset distance threshold T₁, the electronic device may determine that the coordinate information of the two users satisfies the preset positional relationship.
Meanwhile, the electronic device can determine the orientation relation between the two users based on their orientation angle information, and if the orientation relation is smaller than a preset angle threshold, the electronic device can determine that the two users are in a face-to-face relation, namely that the orientation angle information of the two users meets the preset positional relation. Illustratively, assume that the human body orientation angles in the human body frames of the two users are θ₁ and θ₂ respectively, and that the preset angle threshold is T_θ. The orientation relation θ between the two users' orientation angle information may be calculated as θ = |θ₁ - (π - θ₂)|. When the orientation relation θ is smaller than the preset angle threshold T_θ, the electronic device can determine that the orientation angle information of the two users satisfies the preset positional relation.
Illustratively, fig. 4 is a schematic diagram of a user orientation relationship calculation according to an exemplary embodiment of the present application. As shown in fig. 4, the included angle between the perpendicular of the straight line connecting the left and right shoulders of user A and the horizontal line is θ₁, that is, the orientation angle information of user A is θ₁. Likewise, the included angle between the perpendicular of the straight line connecting the left and right shoulders of user B and the horizontal line is θ₂, that is, the orientation angle information of user B is θ₂. The electronic device may determine the orientation relation θ between user A and user B based on θ₁ and θ₂, and when θ is smaller than the preset angle threshold T_θ, the electronic device determines that the orientation angle information between user A and user B satisfies the preset positional relationship.
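As a non-limiting illustration, the distance and orientation checks above can be sketched in Python as follows; the threshold values are assumptions chosen for explanation, not values specified by the embodiment.

```python
import math

def meets_position_condition(p1, p2, theta1, theta2,
                             dist_thresh=1.5, angle_thresh=math.radians(30)):
    """Single-frame check of the preset position condition for two users.
    p1 and p2 are sole-point coordinates; theta1 and theta2 are body
    orientation angles in radians. In practice the condition must hold over
    N consecutive image frames before the pair becomes an alternative user group."""
    t = math.hypot(p1[0] - p2[0], p1[1] - p2[1])  # T = sqrt((x1-x2)^2 + (y1-y2)^2)
    theta = abs(theta1 - (math.pi - theta2))       # face-to-face orientation relation
    return t < dist_thresh and theta < angle_thresh
```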
In the embodiment of the application, the electronic equipment determines the position information of the user according to the characteristic information of the user, determines the alternative user group possibly having the bag delivering action based on whether the position information meets the preset position condition or not through the modes of distance calculation, angle calculation and the like, thus being capable of accurately judging the relative position among the users, improving the accuracy of determining the alternative user group of the bag delivering action, further improving the accuracy of identifying the subsequent purchasing action, reducing the calculated amount and saving the system resources through screening.
In one possible embodiment, the behavior detection method may further include the following steps (3) to (5):
(3) And determining a target image area in the real-time video stream according to the real-time video stream.
In the embodiment of the application, the target image area may refer to an area captured in the real-time video stream where the bag delivering behavior is more likely to occur, such as a cash desk area or a rest waiting area (sofa rest area). The electronic device may determine the target image area in the real-time video stream based on an image detection algorithm, a target recognition algorithm, or the like.
(4) An alternative user group is detected and determined at a first frame rate for the target image region.
(5) Detecting and determining an alternative user group according to a second frame rate for other image areas except the target image area; the first frame rate is greater than the second frame rate.
In the embodiment of the application, the frame rate may be the rate at which the electronic device performs image detection on the real-time video stream; the higher the frame rate, the more image frames the electronic device detects per unit time. Because the shooting range of the real-time video stream is large, detecting every frame places considerable computational pressure on the electronic device. In the purchasing scene of an offline mall department store, the bag delivering behavior between store staff and customers is most likely to occur in the cash register area and the rest waiting area. On this basis, the electronic device can detect and identify target image areas such as the cash register area and the rest waiting area at a higher first frame rate; for image areas other than the target image area, detection and identification may be performed at a second frame rate lower than the first frame rate. By detecting and identifying different image areas at different frame rates, the electronic device can improve the recall rate of the alternative user group while saving system resources and reducing its computational pressure.
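As a non-limiting illustration, one simple way to realize the two frame rates is to sample target and non-target regions with different strides, as in the Python sketch below; the stride values and the region structure are assumptions introduced for explanation.

```python
# Illustrative sketch only; stride values and region structure are assumptions.
def regions_to_detect(frame_index, regions, first_stride=1, second_stride=5):
    """Select which image areas are processed for this frame. Target areas
    (cash register area, rest waiting area) are checked every `first_stride`
    frames (higher frame rate), other areas every `second_stride` frames."""
    selected = []
    for region in regions:  # e.g. {"name": "cash_desk", "is_target": True, ...}
        stride = first_stride if region["is_target"] else second_stride
        if frame_index % stride == 0:
            selected.append(region)
    return selected
```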
S303, determining an alternative local image corresponding to the alternative user group.
In the embodiment of the application, in the continuous N image frames, when the position information of two users meets the preset position condition, the electronic equipment can determine that the two users are the alternative user groups. The electronic device may cut partial images corresponding to the candidate user groups in the N image frames to obtain candidate partial images, and then store the candidate partial images in an image sequence manner, and may subsequently identify the bag delivering behavior of the candidate partial images.
S304, determining and labeling a human hand detection frame of the user and a target handbag detection frame in the alternative partial image to obtain an image to be identified.
In the embodiment of the application, the human hand detection frame can be used for marking the hand area of the human body. The target handbag detection frame can be used for marking the target handbag, and the target handbag can be the brand handbag corresponding to the current special cabinet captured by the real-time video stream. The image to be identified can be an alternative partial image labeled by the image detection model.
In this step, after determining the candidate local image, the electronic device may input the candidate local image into the image detection model, to obtain an image sequence to be identified, labeled with the human hand detection frame and the target bag detection frame, and may subsequently perform bag delivering behavior identification based on the image sequence to be identified and the target identification model. Therefore, when the electronic equipment performs the bag delivering behavior recognition, the information of the hands of the human body and the target bag is used as the prior information, and the accuracy of the bag delivering behavior recognition based on the image sequence can be further improved.
In the process of determining and marking the target handbag detection frame, the electronic device can also examine the target handbag to determine whether it is the handbag of the current special cabinet. Specifically, the electronic device may extract the image feature of the target handbag in the alternative local image based on the image detection model, and then compare and match it against the preconfigured handbag feature of the current special cabinet. If the two features do not match, the target handbag in the alternative local image is not the handbag of the current special cabinet; in that case no target purchasing behavior based on the bag delivering behavior exists in the alternative local image, so the subsequent recognition and judgment process is unnecessary, which reduces the amount of calculation and saves system computing resources. If the target handbag in the alternative local image is the handbag of the current special cabinet, the electronic device continues with the subsequent recognition and judgment process.
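As a non-limiting illustration, the comparison between the detected handbag feature and the preconfigured feature of the current special cabinet's handbag could be implemented as a cosine-similarity gate, as in the Python sketch below; the use of cosine similarity and the threshold value are assumptions introduced for explanation.

```python
import numpy as np

def is_current_cabinet_bag(bag_feature, cabinet_bag_feature, sim_thresh=0.8):
    """Return True if the detected handbag's appearance feature matches the
    preconfigured feature of the current special cabinet's brand handbag."""
    a = np.asarray(bag_feature, dtype=np.float32)
    b = np.asarray(cabinet_bag_feature, dtype=np.float32)
    sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return sim >= sim_thresh
```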
S305, inputting the image to be identified into a target feature extraction model, determining the image features corresponding to the image to be identified, and simultaneously determining thermodynamic diagram features corresponding to the image to be identified; the thermodynamic diagram features take a human hand detection frame and a target handbag detection frame as center points.
In the embodiment of the application, the target recognition model comprises a target feature extraction model and a target classification model. The target feature extraction model may be created based on a dual-flow network (Two-Stream Convolutional Networks), where one network branch may be used to extract image features of an image to be identified, and the other network branch is used to extract thermodynamic diagram features of the image to be identified with a human hand detection frame and a target bag detection frame as center points. The target classification model may be created based on a convolutional neural network for feature classification. Of course, the object recognition model may also be formed by other models, or may be created based on other algorithms, and specifically may be flexibly set based on actual requirements.
S306, fusing the image features and thermodynamic diagram features to obtain fusion features; and inputting the fusion characteristics into a target classification model to obtain target categories corresponding to the alternative partial images.
In the embodiment of the application, the fusion feature may refer to a feature obtained by fusing the image feature of the image to be identified and the thermodynamic diagram feature. Specifically, the electronic device may extract image features of the image to be identified through the target feature extraction model, and extract thermodynamic diagram features of the image to be identified with the human hand detection frame and the target bag detection frame as centers; the electronic equipment can fuse the image features and thermodynamic diagram features to obtain fusion features; and then the electronic equipment can input the fusion characteristic into a target classification model to obtain a target class corresponding to the image to be identified, namely a target class corresponding to the alternative local image. In this way, the electronic equipment uses the human hand and the target handbag as priori information, and determines the target category corresponding to the alternative local image through feature extraction and classification recognition, so that the accuracy of the detection of the bag delivering behavior is improved.
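As a non-limiting illustration, the dual-branch structure described above can be sketched in PyTorch as follows; the layer sizes, the Gaussian heatmap construction, and all hyperparameters are assumptions introduced for explanation and do not reproduce the actual target recognition model.

```python
import torch
import torch.nn as nn

class DualStreamRecognizer(nn.Module):
    """Illustrative two-branch recognizer: one branch encodes the RGB crop,
    the other encodes a heatmap centered on the hand and handbag detection
    frames; the fused feature is classified into two classes."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heat_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(64 + 32, num_classes)

    def forward(self, image, heatmap):
        fused = torch.cat([self.rgb_branch(image), self.heat_branch(heatmap)], dim=1)
        return self.classifier(fused)  # logits for first/second category

def gaussian_heatmap(height, width, centers, sigma=10.0):
    """Single-channel heatmap with a Gaussian peak at each detection-frame
    center (hand frame center and target handbag frame center)."""
    ys, xs = torch.meshgrid(torch.arange(height).float(),
                            torch.arange(width).float(), indexing="ij")
    heat = torch.zeros(height, width)
    for cx, cy in centers:
        heat = torch.maximum(
            heat, torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2)))
    return heat.unsqueeze(0)  # shape (1, H, W)
```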
In one possible implementation, the object recognition model may be trained as follows:
(6) Obtaining a training sample; the training sample comprises a bag delivering behavior image marked with a hand detection frame and a bag detection frame.
In the embodiment of the application, the training sample can be a sample for model training, wherein the training sample can comprise a bag delivering behavior image pre-marked with a human hand detection frame and a bag delivering detection frame. The training sample may include a bag-delivering behavior image as a positive sample and may include other non-bag-delivering behavior images as a negative sample, and the specific configuration of the training sample is not limited in the embodiment of the present application.
(7) Inputting the training sample into a preset recognition model, and determining the training category corresponding to the training sample.
In the embodiment of the present application, the preset recognition model may refer to a recognition model to be trained, and specifically may include a preset feature extraction model and a preset classification model. The training class may refer to a class corresponding to the training sample, and specifically may include a first class that belongs to the bag delivering behavior and a second class that does not belong to the bag delivering behavior.
(8) And performing iterative training on the preset recognition model according to the training category and the preset loss function to obtain the target recognition model.
In the embodiment of the application, after determining the training category corresponding to the training sample, the electronic device can determine the loss value based on the training category and the preset loss function, and iteratively train the preset recognition model based on the loss value, so as to finally obtain the trained target recognition model. The specific form of the preset loss function and the specific mode of the iterative training can be flexibly set based on actual requirements, and the embodiment of the application is not limited to the specific form.
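As a non-limiting illustration, the iterative training described above can be sketched as a standard supervised loop; cross-entropy is used here as the preset loss function and Adam as the optimizer, both of which are assumptions, and the model is assumed to follow the two-input signature of the sketch above.

```python
import torch
import torch.nn as nn

def train_recognizer(model, loader, epochs=10, lr=1e-4):
    """Minimal illustrative training loop for the preset recognition model.
    Each batch carries a cropped image, its hand/handbag heatmap, and a label
    (first category = bag delivering behavior, second category = not)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # assumed preset loss function
    for _ in range(epochs):
        for image, heatmap, label in loader:
            logits = model(image, heatmap)
            loss = criterion(logits, label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```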
Illustratively, fig. 5 is a schematic diagram of an algorithm for identifying a behavior of a delivery bag according to an exemplary embodiment of the present application. As shown in fig. 5, the electronic device first performs detection and identification on the candidate local image sequence based on the image detection model, and marks a human hand detection frame and a target handbag detection frame in the candidate local image sequence to obtain an image sequence to be identified. Then the electronic equipment can input the image sequence to be identified into a double-flow network of the target feature extraction model, wherein one network branch extracts the image features of the image sequence to be identified; the other network branch takes the human hand detection frame and the target handbag detection frame as the center to generate thermodynamic diagrams, and the thermodynamic diagram features corresponding to the image sequences to be identified are extracted. The electronic equipment can then fuse the image features corresponding to the image to be identified with thermodynamic diagram features to obtain fusion features, and finally the fusion features can be input into the target classification model to obtain target categories corresponding to the image to be identified.
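For the heat-map branch described above, one plausible construction is a Gaussian peak placed at the center of each detection frame, with one channel per frame; the Gaussian form and the spread parameter are assumptions made for illustration.

```python
import numpy as np

def box_center_heatmap(box, height, width, sigma=10.0):
    """Returns an (H, W) heat map peaked at the center of a detection frame
    given as (x1, y1, x2, y2); sigma is an assumed spread in pixels."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

# Two channels: one for the human hand detection frame, one for the target
# handbag detection frame, stacked as the input of the heat-map branch.
# heatmap = np.stack([box_center_heatmap(hand_box, H, W),
#                     box_center_heatmap(bag_box, H, W)])
```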
In the embodiment of the application, when a handbag appears in a crowd, a nearby human body and the handbag are easily misidentified as a bag delivering behavior, yielding a false recognition result. The electronic equipment can filter out such false recognition results of the bag delivering behavior in a post-processing mode. In a possible implementation manner, after step S304, the behavior detection method may further include the following steps:
(9) And if the target class is the first class, determining the association relation between the user and the target handbag according to the alternative user group corresponding to the first class.
(10) And if the association relation between the user and the target handbag changes, determining that the target category identification corresponding to the alternative partial image is correct.
In the embodiment of the application, the association relationship may refer to a matching relationship between a user and the target handbag, for example, whether or not the target handbag is a belonging of the user. When the electronic equipment identifies that the target category corresponding to the alternative local image is the first category, namely the alternative local image comprises the bag delivering behavior, the electronic equipment can track and detect the alternative user group corresponding to the alternative local image, and determine the association relationship between the users and the target handbag. If the association relationship between a user and the target handbag changes, for example, the target handbag changes from a belonging of user A to a belonging of user B, it is determined that the target handbag has been transferred between user A and user B, and the recognition result that the target category corresponding to the alternative partial image is the first category is accurate. If the association relationship between the users and the target handbag does not change, it is determined that the target handbag has not been transferred between the two users, and the recognition result that the target category corresponding to the alternative partial image is the first category is erroneous, i.e., a false recognition. In this way, the electronic equipment further confirms whether the target handbag has been transferred between the two users by tracking and detecting the alternative user group involved in the bag delivering behavior, and can filter out falsely recognized bag delivering behaviors, so that the accuracy of bag delivering behavior recognition can be ensured.
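A minimal sketch of this post-processing filter is given below; it assumes that tracking yields, for each frame, the identifier of the user currently associated with the target handbag, which is an assumed intermediate representation rather than one disclosed by this embodiment.

```python
def bag_transfer_confirmed(owner_by_frame, user_a, user_b):
    """owner_by_frame: per-frame owner id of the target handbag, obtained by
    tracking the alternative user group after a first-category result.
    Returns True only if the association changed from one user to the other."""
    owners = [o for o in owner_by_frame if o in (user_a, user_b)]
    if len(owners) < 2:
        return False
    return owners[0] != owners[-1]  # ownership switched -> recognition is kept

# If bag_transfer_confirmed(...) is False, the first-category result is
# treated as a false recognition and filtered out.
```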
It should be emphasized that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
S307, carrying out association tracking on the same user in the real-time video stream according to the characteristic information, and determining the motion trail of the user.
In the embodiment of the application, the motion trail may be the complete motion trail of a user. Due to occlusion, posture changes, entering a camera blind spot, entering a fitting room and the like, when the electronic equipment tracks and detects a user based on a multi-target tracking algorithm, the obtained track is usually not a complete track, and track fragments of the same person need to be associated. In addition, when the camera module includes multiple cameras at different positions, the electronic equipment also needs to associate track segments of the same user under different cameras. The electronic equipment can perform association matching on the basis of feature similarity, using a plurality of track segments of the user and ReID features of the user's human body frames, to obtain the complete motion track of the same user. Of course, the electronic equipment may also acquire the motion trail of the user in other manners, which is not limited by the embodiment of the present application.
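The following sketch illustrates associating track fragments of the same user by ReID feature similarity; greedy merging with a cosine-similarity threshold is an assumed, simplified matching strategy.

```python
import numpy as np

def merge_tracklets(tracklets, sim_threshold=0.7):
    """tracklets: list of dicts, each with a mean ReID 'feature' vector.
    Greedily links fragments whose features are similar enough, yielding one
    group (a complete motion trail) per user; the threshold is an assumption."""
    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    groups = []
    for tracklet in tracklets:
        for group in groups:
            if cosine(group[0]["feature"], tracklet["feature"]) >= sim_threshold:
                group.append(tracklet)
                break
        else:
            groups.append([tracklet])
    return groups
```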
S308, determining the user as a first user under the condition that the motion trail meets a second preset condition; and determining the user as a second user under the condition that the motion trail does not meet the second preset condition.
In the embodiment of the present application, the second preset condition may refer to a preset judgment condition for the first user, that is, the clerk. The second preset condition may be set based on a time characteristic and a movement position characteristic, where the time characteristic may include a start time, an end time, a duration, etc., and the movement position characteristic may refer to the motion trail passing a specific position (e.g., a doorway), etc. After determining the motion trail of the user, the electronic device can further determine whether the motion trail meets the second preset condition; if the motion trail of the user meets the second preset condition, the electronic device can determine that the user is the first user; if the motion trail of the user does not meet the second preset condition, the electronic device can determine that the user is the second user.
In one possible embodiment, the second preset condition includes:
the starting time and/or the ending time of the motion trail are included in a preset period; and/or,
the duration of the motion profile is greater than a preset time threshold.
In the embodiment of the present application, the start time may refer to the time when the motion trail of the user starts to appear. The end time may refer to the time when the motion trail of the user ends. The preset period may refer to a non-business period of the store counter in the shopping mall, for example, after the counter's business hours end and before the business hours begin. The preset time threshold may be a preset duration threshold, and may specifically be 3 hours, 4 hours, or the like.
In this step, in one manner, when determining whether the motion trajectory of the user meets the second preset condition, the electronic device may determine whether the start time and/or the end time of the motion trajectory of the user falls within the preset period. When the start time of the user's motion trajectory is before the store counter opens for business and/or the end time of the user's motion trajectory is after the store counter closes, the electronic device may determine that the user is the first user, i.e., the clerk. In another manner, when determining whether the motion trajectory of the user satisfies the second preset condition, the electronic device may determine whether the duration of the motion trajectory is greater than the preset time threshold. When the duration of the user's motion trajectory is greater than the preset time threshold, the electronic device may determine that the user is a clerk.
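A sketch of this identity judgment is shown below; the business hours and the 3-hour duration threshold are example values taken from the description above, not fixed parameters of the method.

```python
from datetime import datetime, timedelta

def classify_user(track_start: datetime, track_end: datetime,
                  open_time: datetime, close_time: datetime,
                  duration_threshold: timedelta = timedelta(hours=3)) -> str:
    """Returns 'clerk' (first user) if the motion trail satisfies the second
    preset condition, otherwise 'customer' (second user)."""
    in_preset_period = track_start < open_time or track_end > close_time
    long_enough = (track_end - track_start) > duration_threshold
    return "clerk" if (in_preset_period or long_enough) else "customer"
```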
In the embodiment of the application, the electronic equipment determines the identity based on whether the motion trail of the user meets the second preset condition, so that whether the user is a clerk or a customer can be accurately judged, and the purchasing behavior is determined based on the identity of the user, so that the accuracy of determining the purchasing behavior can be improved.
S309, if the target category is the first category and the candidate user group comprises the first user and the second user, determining that the candidate partial image corresponds to the target purchasing behavior.
In the embodiment of the application, when the target category corresponding to the alternative local image is the first category, that is, the alternative local image includes the bag delivering behavior, and the two users in the alternative user group are respectively a customer and a clerk, the electronic device can determine that the target purchasing behavior corresponding to the alternative local image occurs between the customer and the clerk. Subsequent comparison and analysis can be carried out based on the target purchasing behavior and sales data, so that flying-order (off-the-books sales) behavior can be comprehensively and accurately determined, and the flying-order risk can be reduced.
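Combining the recognition result with the user identities, the final decision of step S309 can be sketched as a simple predicate; the role labels are the hypothetical outputs of the identity judgment sketched above.

```python
def is_target_purchase(target_category, group_roles, first_category=0):
    """group_roles: set of roles of the two users in the alternative user
    group, e.g. {'clerk', 'customer'}. A target purchasing behavior is
    reported only for a first-category image with a clerk-customer pair."""
    return target_category == first_category and group_roles == {"clerk", "customer"}
```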
On the basis of any one of the above embodiments, fig. 6 is a schematic technical link diagram of behavior detection according to an exemplary embodiment of the present application. As shown in fig. 6, the electronic device performs tracking detection on pedestrians, that is, on each user in the real-time video stream, and extracts feature information of each user. The electronic equipment determines the position information of the users based on the characteristic information, determines two users whose position information meets the preset position condition in N consecutive frames as an alternative user group, and crops out the alternative local image sequence corresponding to the alternative user group, so as to achieve a preliminary screening of users with the bag delivering behavior. The electronic equipment can then input the alternative local image sequence into the target recognition model, determine the target category corresponding to the alternative local image sequence, and determine whether the alternative local image sequence includes the bag delivering behavior.
Meanwhile, the electronic equipment carries out association matching on track segments of the same user in the real-time video stream to obtain the complete motion track of the same user, so that the merging of pedestrian tracks is realized. The electronic device may then determine the identity of the user based on whether the user's motion profile satisfies the second preset condition, determining the first user (clerk) and the second user (customer). When the alternative partial image sequence includes a bag delivering behavior, and the two users included in the alternative user group are a clerk and a customer, respectively, the electronic device may determine that the alternative partial image sequence corresponds to the target purchasing behavior, and a purchasing event occurs between the clerk and the customer.
In the embodiment of the application, the electronic equipment identifies purchasing behaviors by identifying the bag delivering behavior between the clerk and the customer; the feasibility is high, the recognition pipeline is short, the accuracy of purchasing behavior determination is higher, and the identification of the flying-order risk is more accurate. In addition, the preliminary screening of users with the bag delivering behavior is performed based on the sole coordinate information and the human body orientation angle, so that the accuracy of purchasing behavior recognition can be improved while the computing pressure of the system is reduced. Furthermore, when recognizing the bag delivering behavior, the electronic equipment takes the human hand and the target handbag as prior information, so that the accuracy of recognizing the purchasing behavior based on the real-time video stream can be further improved.
Fig. 7 is a schematic structural diagram of a behavior detection apparatus according to an exemplary embodiment of the present application, please refer to fig. 7, the behavior detection apparatus 70 includes:
the extracting module 71 is configured to obtain the real-time video stream sent by the camera module, and extract feature information of a user in the real-time video stream;
a first determining module 72, configured to determine an alternative user group whose feature information meets a first preset condition, and determine an alternative partial image corresponding to the alternative user group;
a second determining module 73, configured to determine, based on the candidate local image and the target recognition model, a target class corresponding to the candidate local image;
a third determining module 74, configured to determine a first user and a second user in the real-time video stream according to the feature information;
a fourth determining module 75, configured to determine that the candidate partial image corresponds to the target purchasing behavior if the target category is the first category and the candidate user group includes the first user and the second user.
In one possible implementation, the first determining module 72 is specifically configured to:
determining the position information of the user according to the characteristic information;
in the continuous N image frames, if the position information of the two users meets the preset position condition, determining the two users as an alternative user group; wherein N is an integer greater than 1.
In one possible implementation, the first determining module 72 is specifically configured to:
according to the characteristic information, determining coordinate information corresponding to the user, and determining orientation angle information corresponding to the user;
in the continuous N image frames, if the coordinate information and the orientation angle information of the two users meet the preset position condition, the two users are determined to be the candidate user groups.
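As an illustrative reading of this preset position condition, the sketch below marks two users as an alternative user group when, over N consecutive frames, their foot coordinates are close and their body orientation angles are roughly opposite; the distance and angle thresholds are assumptions for illustration.

```python
import numpy as np

def is_candidate_pair(coords_a, coords_b, angles_a, angles_b,
                      max_dist=1.5, n_frames=5):
    """coords_*: per-frame foot (sole) coordinates; angles_*: per-frame body
    orientation angles in degrees. Returns True if the preset position
    condition holds for N consecutive frames."""
    consecutive = 0
    for pa, pb, aa, ab in zip(coords_a, coords_b, angles_a, angles_b):
        close = np.linalg.norm(np.asarray(pa) - np.asarray(pb)) <= max_dist
        facing = abs(abs(aa - ab) - 180.0) <= 45.0  # roughly facing each other
        consecutive = consecutive + 1 if (close and facing) else 0
        if consecutive >= n_frames:
            return True
    return False
```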
In one possible embodiment, the apparatus 70 is further configured to:
determining a target image area in the real-time video stream according to the real-time video stream;
detecting and determining an alternative user group according to a first frame rate aiming at a target image area;
detecting and determining an alternative user group according to a second frame rate for other image areas except the target image area; the first frame rate is greater than the second frame rate.
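The different detection frame rates for the target image area and the remaining areas can be realized, for example, by sampling strides, as in the assumed sketch below.

```python
def should_detect(frame_index: int, in_target_area: bool,
                  first_rate_stride: int = 1, second_rate_stride: int = 5) -> bool:
    """Runs candidate-group detection on every frame inside the target image
    area and only on every k-th frame elsewhere; both strides are assumptions."""
    stride = first_rate_stride if in_target_area else second_rate_stride
    return frame_index % stride == 0
```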
In one possible implementation, the target recognition model includes a target feature extraction model and a target classification model; the second determining module 73 is specifically configured to:
determining and labeling a human hand detection frame of a user and a target handbag detection frame in the alternative partial image to obtain an image to be identified;
inputting the image to be identified into a target feature extraction model, determining the image features corresponding to the image to be identified, and simultaneously determining thermodynamic diagram features corresponding to the image to be identified; the thermodynamic diagram features take a human hand detection frame and a target handbag detection frame as center points;
fusing the image features and thermodynamic diagram features to obtain fusion features;
and inputting the fusion characteristics into a target classification model to obtain target categories corresponding to the alternative partial images.
In one possible embodiment, the apparatus 70 is further configured to:
obtaining a training sample; the training sample comprises a bag delivering behavior image marked with a human hand detection frame and a handbag detection frame;
inputting the training sample into a preset recognition model, and determining a training category corresponding to the training sample;
and carrying out iterative training on the preset recognition model according to the training category and the preset loss function to obtain the target recognition model.
In one possible implementation, the third determining module 74 is specifically configured to:
carrying out association tracking on the same user in a real-time video stream according to the characteristic information, and determining the motion trail of the user;
under the condition that the motion trail meets a second preset condition, determining the user as a first user;
and determining the user as a second user under the condition that the motion trail does not meet the second preset condition.
In one possible embodiment, the second preset condition includes:
the starting time and/or the ending time of the motion trail are included in a preset period; and/or,
the duration of the motion profile is greater than a preset time threshold.
In one possible embodiment, the apparatus 70 is further configured to:
if the target class is the first class, determining the association relation between the user and the target handbag according to the alternative user group corresponding to the first class;
if the association relation between the user and the target handbag changes, the target category identification corresponding to the alternative partial image is determined to be correct.
The behavior detection device 70 provided in the embodiment of the present application may execute the technical solution shown in the foregoing method embodiment, and its implementation principle and beneficial effects are similar, and will not be described herein again.
Fig. 8 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application, referring to fig. 8, the electronic device 80 may include a processor 81 and a memory 82. The processor 81, the memory 82, and the like are illustratively interconnected by a bus 83.
Memory 82 stores computer-executable instructions;
the processor 81 executes computer-executable instructions stored in the memory 82, causing the processor 81 to perform the behavior detection method as shown in the method embodiments described above.
Accordingly, an embodiment of the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the behavior detection method of the above-described method embodiment when the computer-executable instructions are executed by a processor.
Accordingly, embodiments of the present application may also provide a computer program product, including a computer program, which, when executed by a processor, may implement the behavior detection method shown in the foregoing method embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors, input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (13)

1. A behavior detection method, comprising:
acquiring a real-time video stream sent by a camera module, and extracting characteristic information of a user in the real-time video stream;
determining an alternative user group of which the characteristic information meets a first preset condition, and determining an alternative local image corresponding to the alternative user group;
determining a target category corresponding to the alternative local image based on the alternative local image and a target recognition model;
determining a first user and a second user in the real-time video stream according to the characteristic information;
and if the target category is a first category and the alternative user group comprises a first user and a second user, determining that the alternative partial image corresponds to the target purchasing behavior.
2. The method of claim 1, wherein the determining the set of alternative users for which the characteristic information satisfies a first preset condition comprises:
Determining the position information of the user according to the characteristic information;
in the continuous N image frames, if the position information of two users meets the preset position condition, determining the two users as the alternative user group; wherein N is an integer greater than 1.
3. The method of claim 2, wherein the determining the set of alternative users for which the characteristic information satisfies a first preset condition comprises:
according to the characteristic information, determining coordinate information corresponding to the user, and determining orientation angle information corresponding to the user;
and in the continuous N image frames, if the coordinate information and the orientation angle information of the two users meet the preset position condition, determining the two users as the alternative user group.
4. The method according to claim 1, wherein the method further comprises:
determining a target image area in the real-time video stream according to the real-time video stream;
detecting and determining the alternative user group according to a first frame rate aiming at the target image area;
detecting and determining the alternative user group according to a second frame rate for other image areas except the target image area; the first frame rate is greater than the second frame rate.
5. The method of claim 1, wherein the target recognition model comprises a target feature extraction model and a target classification model; the determining, based on the candidate local image and the target recognition model, a target category corresponding to the candidate local image includes:
determining and labeling a user human hand detection frame and a target handbag detection frame in the alternative partial image to obtain an image to be identified;
inputting the image to be identified into the target feature extraction model, determining the image features corresponding to the image to be identified, and simultaneously determining thermodynamic diagram features corresponding to the image to be identified; the thermodynamic diagram features take the human hand detection frame and the target handbag detection frame as center points;
fusing the image features and the thermodynamic diagram features to obtain fusion features;
and inputting the fusion characteristics into the target classification model to obtain the target category corresponding to the alternative local image.
6. The method according to claim 1, wherein the method further comprises:
obtaining a training sample; the training sample comprises a bag delivering behavior image marked with a human hand detection frame and a handbag detection frame;
Inputting the training sample into a preset recognition model, and determining a training category corresponding to the training sample;
and carrying out iterative training on the preset recognition model according to the training category and a preset loss function to obtain the target recognition model.
7. The method of claim 1, wherein determining the first user and the second user in the real-time video stream based on the characteristic information comprises:
according to the characteristic information, carrying out association tracking on the same user in the real-time video stream, and determining the motion trail of the user;
under the condition that the motion trail meets a second preset condition, determining the user as a first user;
and determining the user as a second user under the condition that the motion trail does not meet the second preset condition.
8. The method of claim 7, wherein the second preset condition comprises:
the starting time and/or the ending time of the motion trail are/is included in a preset period; and/or,
the duration of the motion trail is greater than a preset time threshold.
9. The method of claim 1, wherein after the determining the target class to which the candidate partial image corresponds, the method further comprises:
If the target class is a first class, determining an association relationship between a user and a target handbag according to an alternative user group corresponding to the first class;
and if the association relation between the user and the target handbag changes, determining that the target category identification corresponding to the alternative partial image is correct.
10. A behavior detection apparatus, characterized by comprising:
the extraction module is used for acquiring the real-time video stream sent by the camera module and extracting the characteristic information of the user in the real-time video stream;
the first determining module is used for determining an alternative user group of which the characteristic information meets a first preset condition and determining an alternative local image corresponding to the alternative user group;
the second determining module is used for determining a target category corresponding to the alternative local image based on the alternative local image and a target recognition model;
the third determining module is used for determining a first user and a second user in the real-time video stream according to the characteristic information;
and the fourth determining module is used for determining that the alternative partial image corresponds to the target purchasing behavior if the target category is the first category and the alternative user group comprises the first user and the second user.
11. An electronic device, comprising: a memory and a processor;
the memory stores computer-executable instructions;
the processor executing computer-executable instructions stored in the memory, causing the processor to perform the behavior detection method of any one of claims 1 to 9.
12. A computer readable storage medium having stored therein computer executable instructions for implementing the behavior detection method of any one of claims 1 to 9 when the computer executable instructions are executed by a processor.
13. A computer program product comprising a computer program which, when executed by a computer, implements the behavior detection method of any one of claims 1 to 9.