CN116029715A - Bill processing method, device, equipment and storage medium

Bill processing method, device, equipment and storage medium

Info

Publication number
CN116029715A
Authority
CN
China
Prior art keywords
user
motion
track
matching result
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111241937.4A
Other languages
Chinese (zh)
Inventor
欧霄
李健锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111241937.4A priority Critical patent/CN116029715A/en
Publication of CN116029715A publication Critical patent/CN116029715A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a bill processing method, device, equipment and storage medium, the bill processing method comprising: determining a motion trajectory of a user within a target space region; when it is detected, according to the motion trajectory of the user, that the user has entered a checkout area of the target space region, acquiring the goods consumed by the user in the target space region and acquiring the consumption activity information corresponding to the target space region; and determining a consumption bill of the user according to the goods consumed by the user and the consumption activity information. This effectively resolves the dilemma that an intelligent shelf can only settle the goods on that shelf independently and cannot be combined with the consumption activity information corresponding to the target space region, thereby reducing the user's consumption cost and improving the user's consumption experience.

Description

Bill processing method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of smart retail, in particular to a bill processing method, device, equipment and storage medium.
Background
With the continuous development of technology, smart retail is becoming increasingly common in daily life; devices such as facial-recognition payment terminals and unmanned self-service vending machines have emerged, allowing people to enjoy the convenience of self-service shopping anytime and anywhere.
However, current unmanned self-service vending machines cannot be combined with store promotional activities (e.g., full-reduction, spend-threshold discounts), which in turn increases the user's consumption cost.
Disclosure of Invention
The application provides a bill processing method, device, equipment and storage medium, so as to reduce consumption cost of a user.
In a first aspect, the present application provides a bill processing method, including:
determining a motion trail of a user in a target space area;
when it is detected, according to the motion trajectory of the user, that the user has entered a checkout area of the target space region, acquiring the goods consumed by the user in the target space region and the consumption activity information corresponding to the target space region;
and determining a consumption bill of the user according to the commodity consumed by the user and the consumption activity information.
In a second aspect, there is provided a bill handling apparatus comprising:
the track determining unit is used for determining the motion track of the user in the target space area;
the acquisition unit is used for acquiring commodities consumed by the user in the target space area and acquiring consumption activity information corresponding to the target space area when detecting that the user enters a checkout area of the target space area according to the motion trail of the user;
And the bill determining unit is used for determining the consumption bill of the user according to the commodity consumed by the user and the consumption activity information.
In some embodiments, the track determining unit is specifically configured to obtain a first image, acquired by the camera at a current moment, of the user in the target space region; detecting a target object of the first image to obtain N first detection frames and first characteristic information of the first image, wherein the target object comprises the user, and N is a positive integer; and determining the motion trail of the user in the target space area at the current moment according to the N first detection frames and the first characteristic information.
In some embodiments, the track determining unit is specifically configured to input the first image into a JDE model, and obtain the N first detection frames and the first feature information output by the JDE model.
Optionally, a convolution layer is used in the JDE model to replace the Focus module.
Optionally, at least one activation function in the JDE model is a rectified linear unit (ReLU) activation function.
Optionally, the JDE model is trained using a data set of training picture sequences containing human figures, and each training picture is annotated with person identifiers and bounding boxes.
In some embodiments, the track determining unit is specifically configured to obtain M first motion tracks existing in the target spatial area at the current moment, where M is a positive integer; according to the N first detection frames and the first characteristic information, matching the M first motion tracks with the N first detection frames to obtain P second motion tracks, wherein P is a positive integer; and determining the motion trail of the user in the target space area at the current moment from the P second motion trail.
In some embodiments, the track determining unit is specifically configured to match at least one track of the M first motion tracks with the N first detection frames according to the N first detection frames and the first feature information, so as to obtain the P second motion tracks.
In some embodiments, the track determining unit is specifically configured to perform cascade matching between Q1 first motion trajectories and the N first detection frames according to the N first detection frames and the first feature information, so as to obtain a first matching result, where the Q1 first motion trajectories are the first motion trajectories, among the M first motion trajectories, that have been matched in consecutive multiple frames, and Q1 is a positive integer; perform intersection-over-union (IOU) matching of the unmatched first motion trajectories and the unmatched first detection frames in the first matching result with Q2 first motion trajectories, respectively, to obtain a second matching result, where the Q2 first motion trajectories are the first motion trajectories other than the Q1 first motion trajectories among the M first motion trajectories; and obtain the P second motion trajectories according to at least one of the first motion trajectories matched in the first matching result, the first motion trajectories matched in the second matching result, and the first detection frames not matched in the second matching result.
In some embodiments, the track determining unit is specifically configured to obtain, according to the confidence of each first detection frame, K1 high-confidence detection frames and K2 low-confidence detection frames from the N first detection frames, where the sum of K1 and K2 is less than or equal to N; for each of the K1 high-confidence detection frames, match the high-confidence detection frame with the M first motion trajectories respectively to obtain a third matching result; for each of the K2 low-confidence detection frames, match the low-confidence detection frame with the first motion trajectories not matched in the third matching result respectively to obtain a fourth matching result; and obtain the P second motion trajectories according to at least one of the first motion trajectories matched in the third matching result, the first motion trajectories matched in the fourth matching result, and the high-confidence detection frames not matched in the third matching result.
In some embodiments, the track determining unit is specifically configured to predict a j-th detection frame of a j-th trajectory at the current moment; match the area of the i-th detection frame with the area of the j-th detection frame to obtain a first matching value between the i-th detection frame and the j-th trajectory; and determine a target matching result of the i-th detection frame and the j-th trajectory according to the first matching value between them. If the i-th detection frame is a high-confidence detection frame and the j-th trajectory is one of the M first motion trajectories, the target matching result is the third matching result; if the i-th detection frame is a low-confidence detection frame and the j-th trajectory is a first motion trajectory not matched in the third matching result, the target matching result is the fourth matching result.
In some embodiments, the track determining unit is specifically configured to determine, from the first feature information, a first feature value corresponding to the i-th detection frame; extract a second feature value corresponding to the j-th detection frame; determine the distance between the first feature value and the second feature value as a second matching value between the i-th detection frame and the j-th trajectory; and determine the target matching result of the i-th detection frame and the j-th trajectory according to the first matching value and the second matching value between them.
In some embodiments, the track determining unit is specifically configured to update the first motion track matched in the third matching result and the first motion track matched in the fourth matching result to obtain a second motion track; and creating a new second motion trail for the unmatched high-confidence detection frame in the third matching result.
In some embodiments, the track determining unit is specifically configured to determine, as the motion track of the user in the target space area at the current moment, a second motion track corresponding to the identifier of the user in the P second motion tracks.
In some embodiments, the bill determining unit is further configured to determine a time that the user stays at the shelf according to the movement track of the user; determining a consumption level of the user for goods on the goods shelf according to the residence time of the user at the goods shelf; and tracing suspicious bills according to the consumption grade of the user for the goods on the goods shelf and the movement track of the user.
Optionally, the longer the user stays at a shelf, the higher the user's consumption level for the goods on that shelf.
In a third aspect, an electronic device is provided that includes a processor and a memory. The memory is configured to store a computer program, and the processor is configured to invoke and execute the computer program stored in the memory, so as to perform the method in the first aspect or each implementation manner thereof.
In a fourth aspect, a chip is provided for implementing the method in any one of the first to second aspects or each implementation thereof. Specifically, the chip includes: a processor for calling and running a computer program from a memory, causing a device on which the chip is mounted to perform the method as in the first aspect or implementations thereof described above.
In a fifth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program causing a computer to perform the method of the first aspect or each implementation thereof.
In a sixth aspect, a computer program product is provided, comprising computer program instructions for causing a computer to perform the method of the first aspect or implementations thereof.
In a seventh aspect, there is provided a computer program which, when run on a computer, causes the computer to perform the method of the first aspect or implementations thereof described above.
In summary, the present application determines the motion trajectory of the user within the target space region; when it is detected, according to the motion trajectory of the user, that the user has entered a checkout area of the target space region, acquires the goods consumed by the user in the target space region and the consumption activity information corresponding to the target space region; and determines the consumption bill of the user according to the goods consumed by the user and the consumption activity information. This effectively resolves the dilemma that an intelligent shelf can only settle the goods on that shelf independently and cannot be combined with the consumption activity information corresponding to the target space region, thereby reducing the user's consumption cost and improving the user's consumption experience. In addition, the settlement moment after the user has taken goods from different shelves is intelligently identified through trajectory tracking, so that multiple intelligent shelves can be combined to build intelligent unmanned supermarkets supporting a greater variety of operation strategies.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application scenario of the present application;
FIG. 2 is a flow chart of a bill processing method according to an embodiment of the present disclosure;
FIG. 3 is a system block diagram according to an embodiment of the present application;
fig. 4 is a schematic view of an application scenario in an embodiment of the present application;
FIG. 5 is a flow chart of a billing method according to an embodiment of the present application;
FIG. 6 is a diagram illustrating a network structure of a JDE model according to an embodiment of the present application;
fig. 7 is a schematic flow chart of determining a motion trajectory according to an embodiment of the present application;
FIG. 8 is a schematic diagram of another process for determining a motion trajectory according to an embodiment of the present application;
FIG. 9 is a schematic block diagram of a billing apparatus provided in an embodiment of the present application;
fig. 10 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be understood that, in the embodiments of the present application, "B corresponding to A" means that B is associated with A. In one implementation, B may be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In the description of the present application, unless otherwise indicated, "a plurality" means two or more than two.
In addition, in order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second" and the like are used in the embodiments of the present application to distinguish between identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second" and the like do not limit quantity or order of execution, and that objects described as "first" and "second" are not necessarily different.
In order to facilitate understanding of the embodiments of the present application, the following brief description will be first given to related concepts related to the embodiments of the present application:
Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in every field of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
Embodiments of the present application relate to billing, for example, to self-service billing for an unmanned vending area.
With the continuous development of technology, smart retail is becoming increasingly common in daily life; devices such as facial-recognition payment terminals and unmanned self-service vending machines have emerged, allowing people to enjoy the convenience of self-service shopping anytime and anywhere.
Fig. 1 is a schematic diagram of an application scenario of the present application. As shown in fig. 1, the target space region may be the storefront of a convenience store or supermarket, or a partial region within such a storefront. A plurality of self-service shelves 110 (e.g., intelligent shelves) are placed in this target region, and the self-service shelves 110 are communicatively connected to the server 120. A client application (user side) is installed on the user terminal 130, which is communicatively connected to the server 120.
Goods are placed on the self-service goods shelf 110, and when the user 140 takes the goods on the self-service goods shelf 110, the self-service goods shelf 110 collects the goods information taken by the user 140 and sends the goods information taken by the user 140 to the server 120. The server 120 generates a bill and sends the bill to the user terminal 130, and deducts a corresponding amount from the user 140 account through the user terminal 130. The user 140 can view the bill of the goods consumed through the user side.
In some embodiments, the user terminal 130 may be a smart phone, tablet, notebook, smart wearable device (e.g., smart watch, smart helmet, smart glasses, etc.), or the like.
Server 120 may be one or more. When the server 120 is plural, there are at least two servers for providing different services and/or there are at least two servers for providing the same services, which are not limited in the embodiment of the present application. The server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), basic cloud computing services such as big data and artificial intelligence platforms. The server 120 may also become a node of the blockchain.
The user terminal 130 and the server 120 may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application.
The unmanned self-service shelf 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application.
In some embodiments, the self-service unmanned shelf 110 may be an unmanned self-service vending cabinet, in which a camera is mounted on a door frame of the cabinet, and when a user opens a cabinet door to take goods, the camera on the door frame may collect an image of the goods taken by the user and send the collected image of the goods to the server 120. The server 120 recognizes the goods image, further recognizes the goods taken by the user, and generates a bill for the goods.
In some embodiments, the self-service shelf 110 may be an unmanned self-service vending counter; optionally, a goods collection device, such as a camera or a sensor, is provided at any location on the counter, around the counter, or at the bottom of the goods. When the user takes goods, the goods collection device transmits the collected information to the server 120, and the server 120 generates a bill for the goods.
At present, in convenience stores and supermarkets, a single intelligent shelf intelligently identifies and bills the consumption of the goods on that shelf, and settlement is carried out once the user leaves the shelf; no joint promotional activities can be run with other goods in the store. Even if joint operation is attempted by merging the consumption orders of the same store within a period of time, on one hand every newly added intelligent shelf has to be included in the calculation, and on the other hand the time window for merging cannot be evaluated effectively: if the window is too short, settlement may occur before the user has left the store, and if it is too long, a second visit to the store by the same user may still be counted as a single purchase. The prior art therefore cannot properly solve the problem of intelligently operating multiple shelves in one store. That is, current unmanned self-service vending machines cannot be combined with promotional activities (e.g., full-reduction, spend-threshold discounts), which in turn increases the user's consumption cost.
In order to solve the above technical problems, the present application proposes a bill processing method that uses a person detection and trajectory tracking algorithm: when the user takes goods from each shelf, the purchased goods are recorded temporarily, and unified goods calculation and fee deduction are not performed until the user is detected to have entered the checkout area of the target space region. On one hand, the consumption records of the intelligent shelves can be cross-checked against the trajectory; on the other hand, store promotional activities such as full-reduction (spend-threshold) discounts can be applied to the combination of goods taken from multiple shelves, thereby reducing the user's consumption cost and improving the user's consumption experience.
The following describes the technical solutions of the embodiments of the present application in detail through some embodiments. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 2 is a flow chart of a bill processing method according to an embodiment of the present application. The execution body of the embodiment of the present application is a device having a bill processing function, for example, a bill processing device, which may be the server in fig. 1.
As shown in fig. 2, the method of the embodiment of the present application includes:
s201, determining a motion trail of a user in a target space area.
S202, acquiring commodities consumed by a user in a target space area and acquiring consumption activity information corresponding to the target space area when detecting that the user enters a checkout area of the target space area according to a motion track of the user.
S203, determining a consumption bill of the user according to the commodity consumed by the user and the consumption activity information.
According to the bill processing method provided by the embodiments of the present application, after the server detects that the user has entered the target space region, the server tracks the user and generates the user's motion trajectory in the target space region in real time. At the same time, the goods consumed by the user in the target space region are recorded, for example, the goods the user takes from the shelves. When it is detected, according to the motion trajectory of the user, that the user has entered the checkout area of the target space region, the recorded goods consumed by the user in the target space region are acquired, together with the consumption activity information corresponding to the target space region. The consumption bill of the user is then generated based on the goods consumed by the user in the target space region and the consumption activity information (e.g., a full-reduction, spend-threshold discount).
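By way of illustration only (this sketch is not part of the patent disclosure), the settlement step described above can be outlined as follows, assuming a simple full-reduction (spend-threshold) promotion; the data structures and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    price: float      # unit price of one piece of goods

@dataclass
class Promotion:
    threshold: float  # minimum spend needed to trigger the discount
    discount: float   # amount deducted once the threshold is reached

def settle_bill(items: list, promotions: list) -> float:
    """Sum the goods taken across all shelves, then apply every
    store-level promotion whose spend threshold is reached."""
    total = sum(item.price for item in items)
    for promo in promotions:
        if total >= promo.threshold:
            total -= promo.discount
    return max(total, 0.0)

# Usage: goods taken from two different intelligent shelves settle as one bill.
bill = settle_bill(
    [Item("milk", 12.0), Item("bread", 9.5), Item("coffee", 28.0)],
    [Promotion(threshold=30.0, discount=5.0)],
)
print(bill)  # 44.5 instead of 49.5
```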
In some embodiments, after the server generates the consumption bill of the user according to the above method, the consumption bill is sent to the user, for example, through the user terminal. After the user confirms that the consumption bill is correct, the server carries out corresponding deduction.
In some embodiments, the server generates the user's consumption bill, deducts the corresponding fee, and sends the consumption bill and the fee deduction information to the user.
The merchant can set the target space region according to actual needs.
In one example, the target space region may be an entire store region such as a supermarket or convenience store or mall.
In one example, the target space region may be a partial region of a store such as a supermarket, a convenience store, or a mall, for example, the target space region is a vegetable region in the supermarket.
The merchant can set the checkout area of the target space region according to actual needs.
In one example, the checkout area may be at the exit of the target spatial area.
In one example, the checkout area may be a particular area within the target space area, such as an area near the exit.
Fig. 3 is a system structure diagram according to an embodiment of the present application. As shown in fig. 3, the system includes: a user, intelligent shelves, a tracking device, an intelligent shelf background service, a trajectory tracking calculation service, a store order calculation service and a store operation activity service.
Alternatively, the intelligent shelf background service, the trajectory tracking calculation service, the store order calculation service and the store operation activity service may all be located on one server.
Alternatively, the trajectory tracking calculation service and the store order calculation service may be located on one server, while the intelligent shelf background service and the store operation activity service are located on one or two other servers.
As shown in fig. 3, the purchase information of the user is collected by each intelligent shelf and uniformly transmitted to the intelligent shelf background service for recording, while the trajectory information of the user, i.e. when the user enters the store, which shelves the user stays at, and when the user leaves the store, is collected by one or more tracking devices and transmitted to the trajectory tracking calculation service for recording. After the user leaves the store, the intelligent shelf background service and the trajectory tracking calculation service aggregate their information to the store order calculation service for unified consumption settlement; at the same time, the store order calculation service acquires consumption activity information from the store operation activity service and processes it to generate the user's consumption bill. Finally, the consumption bill is pushed to the user so that the user is notified.
The method for determining the motion trail of the user in the target space region in S201 is not limited in the embodiment of the present application.
In one example, the motion trajectory of the user within the target space region may also be determined by having the user carry a specific chip and positioning that chip. For example, when a user enters the target space region, the user carries a positioning card containing a positioning chip, and the positioning card is connected to the background server. The positioning card reports the position of the user within the target space region to the server in real time, and the server generates the user's motion trajectory in the target space region from the position information reported by the positioning card.
In this example, the tracking device in fig. 3 is the positioning card.
In one example, an image of a user in a target space area is acquired through a camera, and a server identifies the image acquired by the camera to generate a motion trail of the user in the target space area.
In some examples, the motion trail of the user in the target space area may also be generated according to other existing tracking methods, which is not limited in the embodiment of the present application.
In some embodiments, the generated motion trail also records the residence time of the user at the corresponding shelf, and based on this, the embodiments of the application further include the following steps:
step 1, determining the stay time of the user at the goods shelf according to the movement track of the user.
And 2, determining the consumption grade of the user for the goods on the goods shelf according to the residence time of the user at the goods shelf.
In some embodiments, the longer the user remains at the shelf, the higher the user's consumption level for the goods on that shelf; for example, the ratio of the user's dwell time at the shelf to some positive number may be taken as the user's consumption level for the goods on the shelf.
In some embodiments, the different time periods of stay correspond to different consumption levels, e.g., the time of stay corresponds to the consumption level as shown in table 1:
TABLE 1
Residence time interval    Consumption level
[a1, a2)                   A1
[a2, a3)                   A2
……                         ……
[a(n-1), an)               A(n-1)
As can be seen from Table 1, assuming that the time t2 for which the user stays at a certain shelf falls within the interval [a2, a3), the user's consumption level for the goods on that shelf can be determined to be A2.
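A minimal sketch of the table lookup described above, for illustration only; the interval boundaries and level names are placeholders rather than values from the disclosure.

```python
def consumption_level(dwell_seconds, boundaries, levels):
    """Map the dwell time at a shelf to a consumption level.

    boundaries = [a1, a2, ..., an] define the half-open intervals
    [a1, a2), [a2, a3), ...; levels = [A1, A2, ..., A(n-1)].
    """
    for lo, hi, level in zip(boundaries, boundaries[1:], levels):
        if lo <= dwell_seconds < hi:
            return level
    return None  # dwell time outside all configured intervals

# Usage: a dwell time of 45 s falling in [30, 60) maps to level "A2".
print(consumption_level(45, [0, 30, 60, 120], ["A1", "A2", "A3"]))
```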
And step 3, tracing suspicious bills according to the consumption level of the user for the goods on the goods shelf and the movement track of the user.
For example, when tracing bills, if the user's consumption level for the goods on a shelf is high but the user's motion trajectory never appears in the checkout area of the target space region, the bill can be determined to be a suspicious bill and traced.
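For illustration, a hedged sketch of such a tracing rule might look as follows; the zone names and the notion of a "high" level are assumptions, not part of the disclosure.

```python
def is_suspicious_bill(shelf_levels, visited_zones, high_levels=("A3",)):
    """Flag a bill for tracing when the user showed a high consumption
    level at some shelf but the trajectory never reached the checkout area.

    shelf_levels: mapping of shelf id -> consumption level for this user.
    visited_zones: ordered list of zone names along the user's trajectory.
    """
    spent_long_at_shelf = any(level in high_levels for level in shelf_levels.values())
    reached_checkout = "checkout" in visited_zones
    return spent_long_at_shelf and not reached_checkout

# Usage: high level at shelf_3 but the trajectory skipped the checkout area.
print(is_suspicious_bill({"shelf_3": "A3"}, ["entrance", "shelf_3", "exit"]))  # True
```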
The user's consumption level at different shelves is recorded and used as analysis evidence when auditing the consumption data of the intelligent shelves.
By way of example, the method of the embodiment of the application is applied to an unmanned vending scene, and at an application layer, a series of track monitoring and behavior recording rules are established for the unmanned vending scene.
Special areas are marked out in the unmanned supermarket. As shown in fig. 4, the special areas include a goods area and a checkout area, and an exit area may also be set. After a user enters the store and later arrives at the checkout area, settlement is considered to take place. The monitored target is the user, and the user's movement trajectory is obtained by computing the curve of the track with the algorithm.
In the present application, in addition to using the user's movement trajectory to determine the settlement moment when the user leaves the store, the user's consumption level is obtained by calculating how long the user stays at a given shelf: the longer the stay, the higher the consumption level for the goods on that shelf. This level can be used as an auxiliary analysis basis for tracing suspicious bills in the unmanned supermarket; for example, by querying the trajectories of different suspicion levels within a certain time period, the source of a mismatched account can be located quickly, providing basic capability support for building unmanned supermarkets. Test and experimental analysis shows that the trajectory tracking accuracy of the method can reach more than 95% in regions with good lighting conditions, and the accuracy of area-intrusion detection and consumption-level determination has also been verified.
According to the bill processing method provided by the embodiments of the present application, the motion trajectory of the user within the target space region is determined; when it is detected, according to the motion trajectory of the user, that the user has entered a checkout area of the target space region, the goods consumed by the user in the target space region and the consumption activity information corresponding to the target space region are acquired; and the consumption bill of the user is determined according to the goods consumed by the user and the consumption activity information. This effectively resolves the dilemma that an intelligent shelf can only settle the goods on that shelf independently and cannot be combined with the consumption activity information corresponding to the target space region, thereby reducing the user's consumption cost and improving the user's consumption experience. In addition, the settlement moment after the user has taken goods from different shelves is intelligently identified through trajectory tracking, so that multiple intelligent shelves can be combined to build intelligent unmanned supermarkets supporting a greater variety of operation strategies.
The method for determining the motion trajectory of the user in the target space region in S201 is described in detail below with reference to specific embodiments.
Fig. 5 is a flowchart of a bill processing method according to an embodiment of the present application. The execution body of the embodiment of the present application is the server in fig. 1.
As shown in fig. 5, the above S201 includes steps S501 to S503 as follows:
s501, acquiring a first image of a user in a target space area acquired by a camera at the current moment.
The camera of the embodiment of the application acquires the first image of the user in the target space area in real time.
S502, detecting targets of the first image to obtain N first detection frames and first characteristic information of the first image.
The method for detecting the target object in the S502 to obtain the N first detection frames and the first feature information of the first image is not limited.
In one example, the object in the first image is detected by the object detection model, so as to obtain an object detection result of the first image.
Optionally, the target object is the user, that is, the human figures in the first image are detected.
Optionally, the target object includes a user, that is, the target object to be detected may include other objects besides the user, which is not limited in this application.
In this example, the specific network structure of the target detection model is not limited; for example, the target detection model may be a graph neural network, a convolutional neural network, an adversarial network, an auto-encoding/decoding network, or the like.
The tracking scheme of the trajectory tracking calculation part can be split into a target detection model with human-figure detection parameters and a multi-target tracking algorithm. The target detection model detects the bounding boxes of human-shaped targets in the input visual image and extracts the appearance features of the targets; the targets are then matched to the target motion trajectories by the multi-target tracking algorithm. In the present application, a system is built on top of this basic algorithmic capability interface to monitor trajectories and shopping behavior.
The technical details of the application are developed into three layers: the target object detection model training, tracking algorithm logic layer and behavior monitoring application layer.
In one possible implementation, the target detection model is a JDE (Joint Detection and Embedding) model, which jointly learns the detector and the appearance embedding.
At this time, S502 includes: and inputting the first image into the JDE model to obtain N first detection frames and first characteristic information output by the JDE model.
In some embodiments, the network structure of the JDE model may be any existing structure, for example, the input end of the JDE model includes a Focus module for extracting feature information of different scales of the first image.
In some embodiments, in order to reduce the amount of computation, a convolution layer is used in the JDE model of the present application to replace the more computation-intensive Focus module. In addition, the non-maximum suppression of the detection results is moved into the subsequent tracking algorithm logic layer.
In some embodiments, at least one activation function in the JDE model is a rectified linear unit (ReLU) activation function. That is, this embodiment replaces the more computationally intensive SiLU activation function with the cheaper ReLU activation function.
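The patent does not tie the model to a particular framework; purely as an illustration, a PyTorch-style sketch of swapping SiLU activations for ReLU could look as follows (the helper name is hypothetical).

```python
import torch.nn as nn

def silu_to_relu(module: nn.Module) -> None:
    """Recursively swap every SiLU activation for the cheaper ReLU,
    mirroring the lightweight JDE variant described above."""
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            silu_to_relu(child)
```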
In some embodiments, the JDE model is trained on a data set of training picture sequences containing human figures, in which each training picture is annotated with person identifiers and bounding boxes, so that the trained JDE model can output the detected human-figure detection boxes.
Fig. 6 is a schematic diagram of a network structure of a JDE model according to an embodiment of the present application, and as shown in fig. 6, the JDE model is composed of a convolutional layer (Conv), a CBR module, a CSP1-1 module, a CSP1-3 module, a CSP2-1 module, a spatial pyramid pooling (Spatial Pyramid Pooling, SPP) layer, and an upsampling unit (UpSample).
Fig. 6 shows that the CBR module consists of a convolution layer, a batch normalization (Batch Normalization, BN) layer and an activation function (ReLU). The CSP1-1 module and the CSP1-3 module each consist of CBR modules, several residual units, a convolution layer, a BN layer and an activation function. The CSP2-1 module consists of several CBR modules, a convolution layer, a BN layer and an activation function.
As can be seen from fig. 6, the size of the first image is assumed to be 1088×608×3, where 1088×608 is the length-width size of the first image and 3 corresponds to the three RGB channels. The first image of size 1088×608×3 is input into the JDE model shown in fig. 6. The output of the JDE model includes three detection-frame branches and one feature_map branch. For example, in fig. 6, the detection frame information output by the first detection-frame branch (Output1) is 76×136×6, where 76×136 can be understood as the size of this detection output and 6 represents six parameters of different meanings: 4 parameters represent the position information of the detection frame, 1 parameter represents the class of the object in the detection frame, and 1 parameter represents the confidence that the object in the detection frame belongs to that class. As shown in fig. 6, the detection frame information output by the second detection-frame branch (Output2) is 38×68×6, where 38×68 is the size of the detection output and the meaning of 6 is the same as for the first branch. The third detection-frame branch (Output3) outputs 19×34×6, where 19×34 is the size of the detection output and the meaning of 6 is the same as above.
As shown in fig. 6, in addition to the above detection frames, the JDE model of the embodiments of the present application also outputs the first feature information of the first image; optionally, the first feature information may be a feature map, for example the feature_map branch shown in fig. 6. The size of the first feature information can be configured; the size shown in fig. 6, 76×136×512, is only an example, and the present application includes but is not limited to this example.
It should be noted that fig. 6 is only an example of the JDE model of the embodiments of the present application; the JDE model includes, but is not limited to, what is shown in fig. 6. For example, modules in fig. 6 may be deleted, added or replaced.
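As an illustration of the output layout described above (and only under the assumption that the six parameters are ordered as box coordinates, class, confidence), one detection branch could be decoded roughly as follows; the function is hypothetical and not part of the disclosure.

```python
import numpy as np

def decode_detections(head_output: np.ndarray, conf_threshold: float = 0.5):
    """Flatten one detection branch of shape (H, W, 6) into box lists.

    The last dimension is assumed to be [x, y, w, h, class, confidence],
    matching the 4 + 1 + 1 parameter layout described for the JDE output.
    """
    flat = head_output.reshape(-1, 6)
    keep = flat[:, 5] >= conf_threshold
    boxes, classes, scores = flat[keep, :4], flat[keep, 4], flat[keep, 5]
    return boxes, classes, scores

# Usage with the first branch (76 x 136 grid), here filled with random values:
dummy = np.random.rand(76, 136, 6).astype(np.float32)
boxes, classes, scores = decode_detections(dummy, conf_threshold=0.9)
```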
According to the above steps, after obtaining N first detection frames of the first image and the first feature information of the first image, the following step S503 is performed.
S503, determining the motion trail of the user in the target space area at the current moment according to the N first detection frames and the first characteristic information.
According to the N first detection frames and the first characteristic information, a multi-target tracking method is adopted to determine the motion trail of the user in the target space area at the current moment.
Specific implementations of S503 include, but are not limited to, the following:
In a first mode, a motion trail generation model is adopted to predict a motion trail of a user in a target space area at the current moment, specifically, N first detection frames and first characteristic information obtained by detection at the current moment are input into the motion trail generation model, and a motion trail of the user in the target space area at the current moment output by the motion trail generation model is obtained.
In a second mode, the motion trajectory of the user in the target space region at the current moment is determined from the N first detection frames and the first feature information by a trajectory matching algorithm. Specifically, the above S503 includes the following steps S503-A to S503-C:
S503-A, obtaining M first motion tracks existing in a target space region at the current moment, wherein M is a positive integer.
In some embodiments, the M first motion trajectories existing in the target space region at the current time include M first motion trajectories generated by updating at the previous time and stored unmatched trajectories.
S503-B, according to the N first detection frames and the first characteristic information, matching the M first motion tracks with the N first detection frames to obtain P second motion tracks, wherein P is a positive integer.
The implementation of S503-B includes, but is not limited to, the following:
In one aspect, the S503-B includes the following S503-B1: and matching at least one track in the M first motion tracks with the N first detection frames according to the N first detection frames and the first characteristic information to obtain P second motion tracks.
In one implementation of this mode, any one or more of the M first motion trajectories are matched one by one with the N first detection frames.
In another implementation of this mode, the above S503-B1 includes the following steps S503-B11 to S503-B13:
S503-B11, according to N first detection frames and first characteristic information, performing cascade matching on Q1 first motion tracks and N first detection frames respectively to obtain a first matching result, wherein the Q1 first motion tracks are first motion tracks matched by continuous multiframes in the M first motion tracks, and Q1 is a positive integer;
S503-B12, performing intersection-over-union (Intersection Over Union, IOU) matching of the unmatched first motion trajectories and the unmatched first detection frames in the first matching result with the Q2 first motion trajectories, respectively, to obtain a second matching result, where the Q2 first motion trajectories are the first motion trajectories other than the Q1 first motion trajectories among the M first motion trajectories;
S503-B13, obtaining P second motion tracks according to at least one of the first motion tracks matched in the first matching result, the first motion tracks matched in the second matching result and the first detection frames which are not matched in the second matching result.
Specifically, the M first motion trajectories are classified according to how often they have been matched. For example, the first motion trajectories that have been matched in consecutive multiple frames (for example, 3 frames) among the M first motion trajectories are taken as the Q1 first motion trajectories, and the remaining first motion trajectories among the M first motion trajectories are recorded as the Q2 first motion trajectories. According to the N first detection frames and the first feature information, each of the Q1 first motion trajectories is cascade-matched with the N first detection frames to obtain the first matching result. The first matching result includes three kinds of results: matched trajectories, unmatched trajectories and unmatched detection frames.
In some embodiments, in S503-B11, performing cascade matching between the Q1 first motion trajectories and the N first detection frames to obtain the first matching result includes the following steps: for a first motion trajectory 1 among the Q1 first motion trajectories, a second detection frame 1 of the first motion trajectory 1 at the current moment is predicted, and the feature information corresponding to the second detection frame 1 is extracted (for example, the second detection frame 1 is input into a neural network to obtain the feature information output by the neural network); for a first detection frame 1 among the N first detection frames, the feature information corresponding to the first detection frame 1 is determined from the first feature information; the feature information corresponding to the first detection frame 1 is then matched against the feature information corresponding to the second detection frame 1, for example by calculating the distance between the two features, to obtain a feature matching value 1.
Optionally, the first matching result between the first detection frame 1 and the first motion trajectory 1 is determined according to the feature matching value 1.
Alternatively, a motion matching value 1 between the first detection frame 1 and the second detection frame 1 is calculated, for example as the Mahalanobis distance between the two frames. A weighted sum of the feature matching value 1 and the motion matching value 1 is then computed, and the first matching result between the first detection frame 1 and the first motion trajectory 1 is determined according to this weighted sum.
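A hedged sketch of the weighted fusion just described; the cosine appearance distance, the externally supplied Mahalanobis-style motion distance and the weight lam are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def fused_match_cost(det_feature: np.ndarray, track_feature: np.ndarray,
                     motion_distance: float, lam: float = 0.5) -> float:
    """Weighted sum of an appearance term and a motion term.

    det_feature / track_feature: embedding vectors for the detection box
    (taken from the first feature information) and for the track's
    predicted box, respectively.
    motion_distance: e.g. a Mahalanobis distance between the detection box
    and the Kalman-predicted box, computed elsewhere.
    """
    cos_sim = float(np.dot(det_feature, track_feature) /
                    (np.linalg.norm(det_feature) * np.linalg.norm(track_feature) + 1e-12))
    appearance_distance = 1.0 - cos_sim
    return lam * appearance_distance + (1.0 - lam) * motion_distance
```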
In some embodiments, in S503-B12, performing IOU matching of the unmatched first motion trajectories and the unmatched first detection frames in the first matching result with the Q2 first motion trajectories to obtain the second matching result includes the following steps: for a first motion trajectory 2 among the Q2 first motion trajectories, a second detection frame 2 of the first motion trajectory 2 at the current moment is predicted; the IOU value between a first detection frame 2 and the second detection frame 2 is calculated, specifically the ratio of the intersection to the union of the first detection frame 2 and the second detection frame 2, i.e. their degree of overlap; and the second matching result between the first detection frame 2 and the first motion trajectory 2 is determined according to this degree of overlap.
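For reference, the IOU (degree of overlap) used in this matching step can be computed as follows for axis-aligned boxes given as (x1, y1, x2, y2); this helper is illustrative only.

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two half-overlapping unit squares give an IOU of 1/3:
print(iou((0, 0, 1, 1), (0.5, 0, 1.5, 1)))  # ~0.333
```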
The first matching result and the second matching result may each include 3 results, which are respectively that the track is matched, the track is not matched, and the detection frame is not matched.
And then, generating P second motion tracks of the target space region at the current moment according to the first matching result and the second matching result.
In one form, S503-B13 includes: updating the first motion trail matched in the first matching result and the first motion trail matched in the second matching result to obtain a second motion trail; and creating a new second motion trail for the first detection frame which is not matched in the second matching result.
Optionally, for the first motion track that is not matched in the first matching result and the second matching result, if the number of times that the first motion track is not matched is smaller than a preset value (for example, 30 frames), the first motion track is stored as one track of the P second motion tracks, and track matching at the next moment is performed.
In some embodiments, the embodiments of the present application use the multi-target tracking DeepSORT (Deep Simple Online And Realtime Tracking) algorithm to perform trajectory tracking and generate the P second motion trajectories.
The JDE model detects the bounding boxes of the human-shaped targets in the input first image and extracts the first feature information of the first image, and the detection frames and the first feature information are then fed into the DeepSORT algorithm. As shown in fig. 7, matching between the detection frames and the first motion trajectories is performed based on the DeepSORT algorithm. Specifically, cascade matching (Matching Cascade) is performed between the Q1 (confirmed) first motion trajectories determined among the M first motion trajectories and the N first detection frames to obtain a first matching result, which includes three kinds of results: unmatched trajectories (unmatched tracks), unmatched detection frames (unmatched detections) and matched trajectories (matched tracks). As shown in fig. 7, the unmatched first motion trajectories and the unmatched first detection frames in the first matching result are then IOU-matched with the Q2 first motion trajectories, respectively, to obtain a second matching result, which again includes unmatched trajectories, unmatched detection frames and matched trajectories. For a first motion trajectory that remains unmatched in the second matching result, if it is marked as unconfirmed or its number of unmatched frames exceeds the preset maximum (max_age), it is deleted; if its number of unmatched frames is smaller than max_age, it is retained and recorded as a second motion trajectory. For a detection frame that remains unmatched in the second matching result, if its confidence is high, a new trajectory is created for it and recorded as a second motion trajectory, but marked as 'unconfirmed' (because the target may be noise output by the detector). When matches are obtained in 3 consecutive frames, the new trajectory is confirmed as valid and marked as 'confirmed'; otherwise it is regarded as a noise trajectory and deleted. The matched first detection frames in the first and second matching results are added to their matched trajectories, which are updated with a Kalman filter to obtain the second motion trajectories.
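A much-simplified, illustrative sketch of the track bookkeeping just described (confirmation after 3 hits, deletion after max_age misses, new tentative tracks from leftover high-confidence boxes); the Track structure and thresholds are assumptions, and the Kalman-filter update is omitted.

```python
from dataclasses import dataclass

MAX_AGE = 30       # frames a track may stay unmatched before it is dropped
CONFIRM_HITS = 3   # consecutive matches needed before a new track is confirmed

@dataclass
class Track:
    box: tuple             # last known (x1, y1, x2, y2)
    state: str = "tentative"
    hits: int = 1
    misses: int = 0

def lifecycle_step(matched, unmatched_tracks, unmatched_high_conf_boxes):
    """Track bookkeeping after the two matching stages (cf. fig. 7).

    matched: list of (track, box) pairs from cascade + IOU matching.
    unmatched_tracks: tracks that found no detection this frame.
    unmatched_high_conf_boxes: leftover high-confidence detection boxes.
    """
    alive = []
    for track, box in matched:
        track.box, track.hits, track.misses = box, track.hits + 1, 0
        if track.state == "tentative" and track.hits >= CONFIRM_HITS:
            track.state = "confirmed"
        alive.append(track)
    for track in unmatched_tracks:
        track.misses += 1
        if track.state == "confirmed" and track.misses <= MAX_AGE:
            alive.append(track)   # kept and retried at the next frame
        # unconfirmed tracks that miss a frame are treated as noise and dropped
    alive += [Track(box) for box in unmatched_high_conf_boxes]  # new tentative tracks
    return alive
```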
One implementation of generating the second motion profile in S503-B is described above, and another implementation of generating the second motion profile in S503-B is described below.
In a second aspect, the step S503-B includes steps S503-B21 to S503-B23 as follows:
S503-B21, obtaining K1 high confidence detection frames and K2 low confidence detection frames from N first detection frames according to the confidence degrees of the first detection frames, wherein the sum of K1 and K2 is smaller than or equal to N.
The above-mentioned information about the first detection frame includes a probability that the target in the first detection frame is humanoid, and the probability is recorded as the confidence level of the first detection frame. The higher the probability that the object within the first detection frame is humanoid, the higher the confidence of the first detection frame.
Based on this, K1 high confidence detection frames and K2 low confidence detection frames may be selected from the N first detection frames according to the confidence level of the first detection frames, where the sum of K1 and K2 is less than or equal to N. Optionally, the K1 high confidence detection frames are detection frames with confidence degrees greater than a first threshold in the N first detection frames, and the K2 low confidence detection frames are detection frames with confidence degrees less than a second threshold in the N first detection frames. Optionally, the first threshold is equal to the second threshold, and optionally, the first threshold is greater than the second threshold.
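A minimal sketch of this confidence-based split, assuming detections are (box, confidence) pairs; the thresholds are placeholders, not values from the disclosure.

```python
def split_by_confidence(detections, high_thr=0.6, low_thr=0.1):
    """Split the N first detection frames into high- and low-confidence sets.

    detections: iterable of (box, confidence) pairs. Boxes below low_thr are
    discarded entirely, which is why K1 + K2 may be smaller than N.
    """
    high = [d for d in detections if d[1] >= high_thr]
    low = [d for d in detections if low_thr <= d[1] < high_thr]
    return high, low
```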
S503-B22, matching the high-confidence detection frames with M first motion tracks respectively aiming at each high-confidence detection frame in the K1 high-confidence detection frames to obtain a third matching result.
For example, each of the K1 high-confidence detection frames is matched with each of the M first motion trajectories to obtain the third matching result.
In one example, cascade matching is used to match the high-confidence detection frames with the first motion trajectories to obtain the third matching result.
In another example, IOU matching is used to match the high-confidence detection frames with the first motion trajectories to obtain the third matching result.
In another example, a combination of cascade matching and IOU matching is used to match the high-confidence detection frames with the first motion trajectories to obtain the third matching result.
Optionally, other matching methods may also be used to match the high-confidence detection frames with the first motion trajectories to obtain the third matching result, which is not limited in this step.
S503-B23, matching the low confidence detection frames with the first motion tracks which are not matched in the third matching result respectively aiming at each of the K2 low confidence detection frames, so as to obtain a fourth matching result.
For example, each high confidence detection frame in the K2 low confidence detection frames is matched with each first moving track in the first moving tracks which are not matched in the third matching result, so that a fourth matching result is obtained.
In one example, a cascade matching mode is adopted to match the low-confidence detection frame with the first motion track, so as to obtain a fourth matching result.

In another example, an IOU matching mode is adopted to match the low-confidence detection frame with the first motion track, so as to obtain a fourth matching result.

In another example, a combination of cascade matching and IOU matching is adopted to match the low-confidence detection frame with the first motion track, so as to obtain a fourth matching result.

Optionally, other matching methods may also be adopted to match the low confidence detection frame with the first motion track to obtain a fourth matching result, which is not limited in this step.
The tracking algorithm in this mode improves on the traditional approach of directly discarding low-score frames and performing matching and tracking only on frames above a confidence threshold. The N first detection frames output by the JDE model are classified according to confidence, the high-confidence detection frames are first matched with the M first motion tracks, and the low-confidence detection frames are then matched with the motion tracks that the high-confidence detection frames failed to match (i.e. the tracks that would otherwise tend to be interrupted at this frame). After this matching step, motion tracks that are still unmatched enter a retention stage, new tracks are created for the high-confidence detection frames that are still unmatched, and the low-confidence detection frames that are still unmatched are directly discarded. Experiments show that, compared with the previous practice of directly discarding low-confidence detection frames, the tracking method in this mode improves both the completeness and the continuity of the tracked trajectories; for example, the user can still be tracked accurately even when the user appears only partially in the first image. A code sketch of this two-stage association is given below.
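The sketch below assumes greedy one-to-one IOU matching as the per-stage matcher (the patent also allows cascade matching or a combination); all function names and the IOU threshold are illustrative assumptions.

```python
def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def greedy_match(track_boxes, det_boxes, iou_threshold=0.3):
    # Greedy one-to-one assignment by IOU; iou_threshold is an assumed value.
    matches, used = [], set()
    for ti, tb in enumerate(track_boxes):
        best_di, best_iou = -1, iou_threshold
        for di, db in enumerate(det_boxes):
            if di in used:
                continue
            score = iou(tb, db)
            if score > best_iou:
                best_di, best_iou = di, score
        if best_di >= 0:
            matches.append((ti, best_di))
            used.add(best_di)
    unmatched_tracks = [ti for ti in range(len(track_boxes)) if ti not in {m[0] for m in matches}]
    unmatched_dets = [di for di in range(len(det_boxes)) if di not in used]
    return matches, unmatched_tracks, unmatched_dets

def two_stage_associate(track_boxes, high_boxes, low_boxes):
    # Stage 1 (S503-B22): high-confidence boxes against all M first motion tracks.
    third, unmatched_tracks, unmatched_high = greedy_match(track_boxes, high_boxes)
    # Stage 2 (S503-B23): low-confidence boxes against the tracks left unmatched in
    # stage 1 (indices in `fourth` refer to the `leftover` list).
    leftover = [track_boxes[i] for i in unmatched_tracks]
    fourth, still_unmatched, unmatched_low = greedy_match(leftover, low_boxes)
    # Unmatched low-confidence boxes are discarded; unmatched high-confidence boxes
    # may start new tracks in S503-B24.
    return third, fourth, unmatched_high
```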
In some embodiments, the manner in which the high confidence detection frames are respectively matched with the M first motion trajectories in S503-B22 to obtain the third matching result and the manner in which the low confidence detection frames are respectively matched with the first motion trajectories not matched in the third matching result in S503-B23 to obtain the fourth matching result may be different.
In some embodiments, the manner of matching the high confidence detection frame with the M first motion trajectories in S503-B22 to obtain the third matching result and the manner of matching the low confidence detection frame with the first motion trajectories not matched in the third matching result in S503-B23 to obtain the fourth matching result may be the same.
Another matching method in S503-B22 and/or S503-B23 is described below, and for convenience of description, the matching process of the ith detection frame and the jth track is described below as an example. If the ith detection frame is a high-confidence detection frame and the jth track is one of the M first motion tracks, the following target matching result is a third matching result; if the ith detection frame is a low confidence detection frame and the jth track is a first motion track which is not matched in the third matching result, the following target matching result is a fourth matching result.
Illustratively, the matching process of the ith detection box and the jth trace includes the steps of:
Step A1, predicting the jth detection frame of the jth track at the current moment.

For example, the jth track is input into a neural network, and the detection frame at the current moment is predicted; for convenience of description, the predicted detection frame is recorded as the jth detection frame.

For another example, the detection frame of the jth track at the current moment is predicted according to the running direction of the jth track and is recorded as the jth detection frame.
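As a simple illustration of step A1, the sketch below predicts the jth detection frame under a constant-velocity assumption; the function name and the velocity representation are assumptions, and the full method may instead use a Kalman filter or a neural network as described above.

```python
def predict_next_box(last_box, velocity):
    # Constant-velocity prediction of the jth track's detection frame at the
    # current moment; both arguments are (x1, y1, x2, y2) tuples.
    return tuple(c + v for c, v in zip(last_box, velocity))

# Example: a track whose box has been moving 5 pixels to the right per frame.
predicted_jth_box = predict_next_box((100, 50, 160, 230), (5, 0, 5, 0))
```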
Step A2, matching the area of the ith detection frame with the area of the jth detection frame to obtain a first matching value between the ith detection frame and the jth track.
For example, the reciprocal of the ratio of the area of the i-th detection frame to the area of the j-th detection frame is determined as the first matching value between the i-th detection frame and the j-th track.
For another example, the reciprocal of the difference between the area of the ith detection frame and the area of the jth detection frame is determined as a first matching value between the ith detection frame and the jth track.
For another example, IOU matching is performed on the area of the ith detection frame and the area of the jth detection frame to obtain a first matching value between the ith detection frame and the jth track.
Alternatively, other methods may be used to match the area of the ith detection frame with the area of the jth detection frame to obtain the first matching value between the ith detection frame and the jth track, which is not limited in this step.
Step A3, determining a target matching result of the ith detection frame and the jth track according to the first matching value between the ith detection frame and the jth track.
The implementation manner of the step A3 includes, but is not limited to, the following:
In a first implementation manner, the target matching result of the ith detection frame and the jth track is determined according to the magnitude of the first matching value between the ith detection frame and the jth track. For example, if the first matching value between the ith detection frame and the jth track is greater than a threshold a, it is determined that the ith detection frame matches the jth track; if the first matching value is smaller than the threshold a, it is determined that the ith detection frame does not match the jth track; or, if the first matching value is smaller than a threshold b, it is determined that the ith detection frame is unmatched. Optionally, the threshold b is smaller than the threshold a.
In this mode, the tracking algorithm also handles, at the tracking level, the case in which an object is occluded and therefore cannot be detected by the detection algorithm: using the first feature information provided by the JDE model, the ability to re-match an object that has previously appeared is maintained within a life cycle. Based on this, the above step A3 further includes the following second implementation manner.
In a second implementation manner, the step A3 includes the following steps:
Step A31, determining a first feature value corresponding to the ith detection frame from the first feature information.
For example, the center of the ith detection frame is determined, and the feature value at the position of the center of the ith detection frame in the first feature information is determined as the first feature value corresponding to the ith detection frame; in this case, the first feature value includes one feature value.
For another example, the first feature information is a feature map, and the feature value corresponding to the i-th detection frame area in the first feature information is determined as a first feature value corresponding to the i-th detection frame, where the first feature value includes a plurality of feature values.
Step A32, extracting a second feature value corresponding to the jth detection frame.
For example, the second characteristic value corresponding to the j-th detection frame is extracted through the depth network.
Optionally, the second eigenvalue is an eigenvector of unit norm.
Optionally, the second feature value corresponding to the jth detection frame is a feature value of the detection frame matched with the last frame in the jth track.
Step A33, determining the distance between the first feature value and the second feature value as a second matching value between the ith detection frame and the jth track.
In one example, a cosine distance between the first eigenvalue and the second eigenvalue is determined as a second matching value between the i-th detection frame and the j-th track.
In one example, a mahalanobis distance between the first feature value and the second feature value is determined as a second matching value between the i-th detection box and the j-th track.
For example, a second matching value D_M(x) between the ith detection frame and the jth track is determined according to the following formula (2):

D_M(x) = sqrt( (x − μ_X)^T Σ^(−1) (x − μ_X) )    (2)

where x is the input parameter, for example the first feature value of the ith detection frame, μ_X is the expected (mean) value of x, for example the second feature value corresponding to the jth track, and Σ is the covariance matrix of the input x.

Optionally, when the above formula (2) is applied to determine the second matching values between the N detection frames and the different tracks, x = (x_1, x_2, x_3, x_4, …, x_n)^T and μ_X = (μ_X1, μ_X2, μ_X3, μ_X4, …, μ_Xn)^T.
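The two example distances can be computed as in the NumPy sketch below, which assumes the feature values are vectors of equal length and that the covariance matrix is invertible; the function names are illustrative.

```python
import numpy as np

def mahalanobis_distance(x, mu, cov):
    # Second matching value according to formula (2): x is the first feature value
    # of the ith detection frame, mu the second feature value of the jth track, and
    # cov the covariance matrix of the input features.
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def cosine_distance(x, mu):
    # Alternative second matching value: one minus the cosine similarity.
    x, mu = np.asarray(x, dtype=float), np.asarray(mu, dtype=float)
    return float(1.0 - x @ mu / (np.linalg.norm(x) * np.linalg.norm(mu) + 1e-12))
```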
Step A34, determining a target matching result of the ith detection frame and the jth track according to the first matching value and the second matching value between the ith detection frame and the jth track.
For example, a weighted sum of the first and second matching values between the ith detection frame and the jth track is used to determine a target matching result of the ith detection frame and the jth track.
For another example, the arithmetic sum of the first matching value and the second matching value between the ith detection frame and the jth track is used for determining the target matching result of the ith detection frame and the jth track.
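A hedged sketch of step A34 is shown below. The weight, the decision threshold and the sign convention (the feature distance is negated so that a larger combined score means a better match) are assumptions, since the patent only states that a weighted or arithmetic sum of the two matching values is used.

```python
def target_match(first_value, second_value, weight=0.7, threshold=0.3):
    # first_value: the area/IOU based matching value from step A2 (larger = more similar);
    # second_value: the feature distance from step A33 (smaller = more similar).
    combined = weight * first_value - (1.0 - weight) * second_value
    return combined > threshold
```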
According to the above method, the third matching result between the K1 high confidence detection frames and the M first motion tracks is determined, and the fourth matching result between the K2 low confidence detection frames and the first motion tracks that are not matched in the third matching result is determined, and then the following step S503-B24 is executed.
S503-B24, according to at least one of the first motion trail matched in the third matching result, the first motion trail matched in the fourth matching result and the high confidence detection frame not matched in the third matching result, obtaining P second motion trail.
For example, updating the first motion trail matched in the third matching result and the first motion trail matched in the fourth matching result to obtain a second motion trail; and creating a new second motion trail for the unmatched high-confidence detection frame in the third matching result.
In some embodiments, the process of determining the P second motion tracks in the second mode of S503-B is shown in fig. 8. The server obtains K1 high confidence detection frames and K2 low confidence detection frames from the N first detection frames according to the confidence of each first detection frame. The server loads the tracks to obtain the M first motion tracks existing at the current moment and, for each of the K1 high confidence detection frames, matches the high-confidence detection frame with the M first motion tracks respectively, so as to obtain a third matching result, where the third matching result includes three kinds of results: unmatched tracks (unmatched tracks), unmatched detection frames (unmatched detections) and matched tracks (matched tracks). As shown in fig. 8, for each of the K2 low confidence detection frames, the low-confidence detection frame is then respectively matched with the first motion tracks that are not matched in the third matching result, so as to obtain a fourth matching result, where the fourth matching result likewise includes unmatched tracks, unmatched detection frames and matched tracks. The low-confidence detection frames that are not matched in the fourth matching result are directly deleted. For a first motion track that is not matched in the fourth matching result, the track is deleted if it is marked as 'unconfirmed' or if its number of unmatched frames is larger than the preset maximum number (max_age); if the number of unmatched frames is smaller than the preset maximum number (max_age), the unmatched first motion track is retained and recorded as a second motion track.

For a high confidence detection frame that is not matched in the third matching result, a new track is generated for the high-confidence detection frame and recorded as a second motion track, but the new track is marked as 'unconfirmed' because such a target may be noise output by the detector. Only when the new track obtains a match in 3 consecutive frames is it confirmed as valid and marked as 'confirmed'; otherwise it is treated as noise and deleted. The first detection frames matched in the third matching result and the fourth matching result are added to the first motion tracks they match, and the tracks are updated with a Kalman filter to obtain second motion tracks.
According to the method, P second motion tracks of the target space region at the current moment can be obtained.
S503-C, determining the motion trail of the user in the target space area at the current moment from the P second motion trail.
For multi-target tracking, the tracking algorithm assigns the same person a unique identifier (TrackID) from the moment the person appears in the video picture until the person disappears; as long as the target is correctly detected, the tracking algorithm can associate it correctly, regardless of whether the target is occluded, severely deformed or too close to other targets (mutual interference). That is, each track includes an identifier (for example, a user identifier), so that the motion track of the user in the target space area at the current moment can be determined from the P second motion tracks according to the identifier. For example, the second motion track corresponding to the user identifier among the P second motion tracks is determined as the motion track of the user in the target space area at the current moment.
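For illustration, selecting the user's motion track from the P second motion tracks by identifier can be sketched as follows, assuming each track exposes a track_id attribute (as in the Track sketch above); the function name is an assumption.

```python
def trajectory_for_user(second_motion_tracks, user_track_id):
    # second_motion_tracks: the P second motion tracks at the current moment.
    for track in second_motion_tracks:
        if track.track_id == user_track_id:
            return track
    return None  # the user is not present in the current frame's tracks
```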
According to the bill processing method provided by the embodiments of the present application, the first image of the user in the target space area acquired by the camera at the current moment is obtained, object detection is performed on the first image to obtain the N first detection frames and the first feature information of the first image, and the motion track of the user in the target space area at the current moment is determined according to the N first detection frames and the first feature information. This achieves accurate determination of the motion track of the user in the target space area, so that when the consumption bill of the user is generated based on the accurately determined motion track, the accuracy of the consumption bill can be improved.
The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described in detail. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be considered as disclosed herein.
It should be further understood that, in the various method embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Method embodiments of the present application are described in detail above in connection with fig. 2-8, and apparatus embodiments of the present application are described in detail below.
Fig. 9 is a schematic block diagram of a bill handling apparatus provided in an embodiment of the present application.
As shown in fig. 9, the billing apparatus 10 may include:
a trajectory determination unit 11 for determining a motion trajectory of a user within a target space region;
an obtaining unit 12, configured to obtain, according to a motion trajectory of the user, when detecting that the user enters a checkout area of the target space area, a commodity consumed by the user in the target space area, and obtain consumption activity information corresponding to the target space area;
and the bill determining unit 13 is used for determining a consumption bill of the user according to the commodity consumed by the user and the consumption activity information.
In some embodiments, the track determining unit 11 is specifically configured to acquire a first image of the user in the target space area acquired by the camera at the current moment; detecting a target object of the first image to obtain N first detection frames and first characteristic information of the first image, wherein the target object comprises the user, and N is a positive integer; and determining the motion trail of the user in the target space area at the current moment according to the N first detection frames and the first characteristic information.
In some embodiments, the track determining unit 11 is specifically configured to input the first image into a JDE model, and obtain the N first detection frames and the first feature information output by the JDE model.
Optionally, a convolution layer is used in the JDE model to replace the Focus module.
Optionally, at least one activation function in the JDE model is a modified linear unit ReLU activation function.
Optionally, the JDE model is trained by using a training picture sequence data set with a humanoid form during training, and the training picture is marked with a humanoid form identifier and a bounding box.
In some embodiments, the track determining unit 11 is specifically configured to obtain M first motion tracks existing in the target spatial area at the current time, where M is a positive integer; according to the N first detection frames and the first characteristic information, matching the M first motion tracks with the N first detection frames to obtain P second motion tracks, wherein P is a positive integer; and determining the motion trail of the user in the target space area at the current moment from the P second motion trail.
In some embodiments, the track determining unit 11 is specifically configured to match at least one track of the M first motion tracks with the N first detection frames according to the N first detection frames and the first feature information, so as to obtain the P second motion tracks.
In some embodiments, the track determining unit 11 is specifically configured to perform cascade matching on Q1 first motion tracks and the N first detection frames according to the N first detection frames and the first feature information, so as to obtain a first matching result, where the Q1 first motion tracks are first motion tracks that are matched by consecutive multiple frames in the M first motion tracks, and Q1 is a positive integer; respectively perform intersection-over-union (IOU) matching between Q2 first motion tracks and the unmatched first motion track and the unmatched first detection frame in the first matching result, so as to obtain a second matching result, where the Q2 first motion tracks are first motion tracks other than the Q1 first motion tracks in the M first motion tracks; and obtain the P second motion tracks according to at least one of the first motion tracks matched in the first matching result, the first motion tracks matched in the second matching result and the first detection frames that are not matched in the second matching result.
In some embodiments, the track determining unit 11 is specifically configured to obtain, according to the confidence levels of the first detection frames, K1 high confidence detection frames and K2 low confidence detection frames from the N first detection frames, where a sum of the K1 and the K2 is smaller than or equal to the N; for each high-confidence detection frame in the K1 high-confidence detection frames, matching the high-confidence detection frame with the M first motion tracks respectively to obtain a third matching result; for each low confidence detection frame in the K2 low confidence detection frames, matching the low confidence detection frame with a first motion track which is not matched in the third matching result respectively to obtain a fourth matching result; and obtaining P second motion tracks according to the first motion track matched in the third matching result, the first motion track matched in the fourth matching result and at least one of the high confidence detection frames which are not matched in the third matching result.
In some embodiments, the track determining unit 11 is specifically configured to predict a j detection frame of a j track at a current time; matching the area of the ith detection frame with the area of the jth detection frame to obtain a first matching value between the ith detection frame and the jth track; determining a target matching result of the ith detection frame and the jth track according to a first matching value between the ith detection frame and the jth track; if the ith detection frame is the high-confidence detection frame and the jth track is one first track in the M first motion tracks, the target matching result is a third matching result; and if the ith detection frame is the low-confidence detection frame and the jth track is the first motion track which is not matched in the third matching result, the target matching result is a fourth matching result.
In some embodiments, the track determining unit 11 is specifically configured to determine, from the first feature information, a first feature value corresponding to the i-th detection frame; extracting a second characteristic value corresponding to the j-th detection frame; determining a distance between the first characteristic value and the second characteristic value as a second matching value between the ith detection frame and the jth track; and determining a target matching result of the ith detection frame and the jth track according to the first matching value and the second matching value between the ith detection frame and the jth track.
In some embodiments, the track determining unit 11 is specifically configured to update the first motion track matched in the third matching result and the first motion track matched in the fourth matching result to obtain a second motion track; and creating a new second motion trail for the unmatched high-confidence detection frame in the third matching result.
In some embodiments, the track determining unit 11 is specifically configured to determine, as the motion track of the user in the target space area at the current time, the second motion track corresponding to the identifier of the user in the P second motion tracks.
In some embodiments, the bill determining unit 13 is further configured to determine, according to the movement track of the user, a time for the user to stay at the shelf; determining a consumption level of the user for goods on the goods shelf according to the residence time of the user at the goods shelf; and tracing suspicious bills according to the consumption grade of the user for the goods on the goods shelf and the movement track of the user.
Optionally, the longer the user stays at the shelf, the higher the consumer level the user spends with respect to the goods on the shelf.
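A minimal sketch of this dwell-time-to-consumption-level mapping is given below; the thresholds and level names are purely illustrative assumptions, since the patent only states that a longer stay at the shelf corresponds to a higher consumption level for the goods on that shelf.

```python
def consumption_level(dwell_seconds):
    # Illustrative thresholds only; longer dwell time maps to a higher level.
    if dwell_seconds >= 120:
        return 'high'
    if dwell_seconds >= 30:
        return 'medium'
    return 'low'
```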
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus shown in fig. 9 may perform the embodiments of the method described above, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for implementing the corresponding method embodiments of the electronic device, which are not described herein for brevity.
The apparatus of the embodiments of the present application are described above in terms of functional modules in conjunction with the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software form, and the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory, and in combination with hardware, performs the steps in the above method embodiments.
Fig. 10 is a schematic block diagram of an electronic device provided in an embodiment of the present application, where the electronic device may be the server or the terminal shown in fig. 2. The electronic device is configured to perform the bill processing method embodiments described above.
As shown in fig. 10, the electronic device 30 may include:
a memory 31 and a processor 32, the memory 31 being arranged to store a computer program 33 and to transmit the program code 33 to the processor 32. In other words, the processor 32 may call and run the computer program 33 from the memory 31 to implement the methods in the embodiments of the present application.
For example, the processor 32 may be configured to perform the above-described method steps according to instructions in the computer program 33.
In some embodiments of the present application, the processor 32 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present application, the memory 31 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable EPROM (EEPROM), or a flash Memory. The volatile memory may be random access memory (Random Access Memory, RAM) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (Double Data Rate SDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and Direct memory bus RAM (DR RAM).
In some embodiments of the present application, the computer program 33 may be partitioned into one or more modules that are stored in the memory 31 and executed by the processor 32 to perform the bill processing method provided herein. The one or more modules may be a series of computer program instruction segments capable of performing specified functions, and the instruction segments describe the execution of the computer program 33 in the electronic device.
As shown in fig. 10, the electronic device 30 may further include:
a transceiver 34, the transceiver 34 being connectable to the processor 32 or the memory 31.
The processor 32 may control the transceiver 34 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. The transceiver 34 may include a transmitter and a receiver. The transceiver 34 may further include antennas, the number of which may be one or more.
It will be appreciated that the various components in the electronic device 30 are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
According to an aspect of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
According to another aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the electronic device to perform the method of the above-described method embodiments.
In other words, when implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces, in whole or in part, a flow or function consistent with embodiments of the present application. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

1. A method of billing comprising:
determining a motion trail of a user in a target space area;
acquiring commodities consumed by the user in the target space area and consumption activity information corresponding to the target space area when detecting that the user enters a checkout area of the target space area according to the motion trail of the user;
And determining a consumption bill of the user according to the commodity consumed by the user and the consumption activity information.
2. The method of claim 1, wherein determining the trajectory of the user within the target spatial region comprises:
acquiring a first image of the user in the target space area, which is acquired by a camera at the current moment;
detecting a target object of the first image to obtain N first detection frames and first characteristic information of the first image, wherein the target object comprises the user, and N is a positive integer;
and determining the motion trail of the user in the target space area at the current moment according to the N first detection frames and the first characteristic information.
3. The method according to claim 2, wherein the performing object detection on the first image to obtain the N first detection frames and first feature information of the first image includes:
and inputting the first image into a JDE model to obtain the N first detection frames and the first characteristic information output by the JDE model.
4. The method of claim 3, wherein a convolution layer is used in the JDE model to replace a Focus module.
5. The method of claim 3, wherein at least one activation function in the JDE model is a modified linear unit ReLU activation function.
6. The method of claim 3, wherein the JDE model is trained using a training picture sequence dataset having humanoid forms and wherein the training pictures are labeled with humanoid identifiers and bounding boxes.
7. The method according to any one of claims 2-6, wherein determining a motion trajectory of the user within the target spatial region at a current time according to the N first detection frames and the first feature information includes:
obtaining M first motion tracks existing in the target space region at the current moment, wherein M is a positive integer;
according to the N first detection frames and the first characteristic information, matching the M first motion tracks with the N first detection frames to obtain P second motion tracks, wherein P is a positive integer;
and determining the motion trail of the user in the target space area at the current moment from the P second motion trail.
8. The method of claim 7, wherein the matching the M first motion trajectories with the N first detection frames according to the N first detection frames and the first feature information, to obtain P second motion trajectories, includes:
And matching at least one track in the M first motion tracks with the N first detection frames according to the N first detection frames and the first characteristic information to obtain the P second motion tracks.
9. The method of claim 8, wherein the matching at least one track of the M first motion trajectories with the N first detection frames according to the N first detection frames and the first feature information, to obtain the P second motion trajectories, includes:
according to the N first detection frames and the first characteristic information, performing cascade matching on Q1 first motion tracks and the N first detection frames respectively to obtain a first matching result, wherein the Q1 first motion tracks are first motion tracks matched by continuous multiframes in the M first motion tracks, and the Q1 is a positive integer;
respectively carrying out intersection-over-union (IOU) matching between Q2 first motion tracks and the unmatched first motion track and the unmatched first detection frame in the first matching result, so as to obtain a second matching result, wherein the Q2 first motion tracks are first motion tracks other than the Q1 first motion tracks in the M first motion tracks;
And obtaining the P second motion tracks according to at least one of the first motion tracks matched in the first matching result, the first motion tracks matched in the second matching result and the first detection frames which are not matched in the second matching result.
10. The method of claim 7, wherein the matching the M first motion trajectories with the N first detection frames according to the N first detection frames and the first feature information, to obtain P second motion trajectories, includes:
obtaining K1 high-confidence detection frames and K2 low-confidence detection frames from the N first detection frames according to the confidence degrees of the first detection frames, wherein the sum of the K1 and the K2 is smaller than or equal to the N;
for each high-confidence detection frame in the K1 high-confidence detection frames, matching the high-confidence detection frame with the M first motion tracks respectively to obtain a third matching result;
for each low confidence detection frame in the K2 low confidence detection frames, matching the low confidence detection frame with a first motion track which is not matched in the third matching result respectively to obtain a fourth matching result;
And obtaining P second motion tracks according to the first motion track matched in the third matching result, the first motion track matched in the fourth matching result and at least one of the high confidence detection frames which are not matched in the third matching result.
11. The method according to claim 10, wherein the method further comprises:
predicting a j detection frame of a j track at the current moment;
matching the area of the ith detection frame with the area of the jth detection frame to obtain a first matching value between the ith detection frame and the jth track;
determining a target matching result of the ith detection frame and the jth track according to a first matching value between the ith detection frame and the jth track;
if the ith detection frame is the high-confidence detection frame and the jth track is one first track in the M first motion tracks, the target matching result is a third matching result; and if the ith detection frame is the low-confidence detection frame and the jth track is the first motion track which is not matched in the third matching result, the target matching result is a fourth matching result.
12. The method of claim 11, wherein determining a target match for the ith detection box to the jth track based on the first match value between the ith detection box and the jth track comprises:
determining a first characteristic value corresponding to the ith detection frame from the first characteristic information;
extracting a second characteristic value corresponding to the j-th detection frame;
determining a distance between the first characteristic value and the second characteristic value as a second matching value between the ith detection frame and the jth track;
and determining a target matching result of the ith detection frame and the jth track according to the first matching value and the second matching value between the ith detection frame and the jth track.
13. The method according to any one of claims 10-12, wherein the obtaining P second motion trajectories according to at least one of the first motion trajectories matched in the third matching result, the first motion trajectories matched in the fourth matching result, and the high confidence detection boxes not matched in the third matching result includes:
updating the first motion trail matched in the third matching result and the first motion trail matched in the fourth matching result to obtain a second motion trail;
And creating a new second motion trail for the unmatched high-confidence detection frame in the third matching result.
14. The method of claim 7, wherein determining the motion trajectory of the user within the target spatial region at the current time from the P second motion trajectories comprises:
and determining the second motion trail corresponding to the user identifier in the P second motion trail as the motion trail of the user in the target space region at the current moment.
15. The method according to any one of claims 1-6, further comprising:
determining the stay time of the user at the goods shelf according to the motion trail of the user;
determining a consumption level of the user for goods on the goods shelf according to the residence time of the user at the goods shelf;
and tracing suspicious bills according to the consumption grade of the user for the goods on the goods shelf and the movement track of the user.
16. The method of claim 15, wherein the longer the user remains at the shelf, the higher the user's consumption level for the goods on the shelf.
17. A bill handling device, comprising:
the track determining unit is used for determining the motion track of the user in the target space area;
the acquisition unit is used for acquiring commodities consumed by the user in the target space area and acquiring consumption activity information corresponding to the target space area when detecting that the user enters a checkout area of the target space area according to the motion trail of the user;
and the bill determining unit is used for determining the consumption bill of the user according to the commodity consumed by the user and the consumption activity information.
18. An electronic device comprising a processor and a memory;
the memory is used for storing a computer program;
the processor for executing the computer program to implement the method of any of the preceding claims 1 to 16.
19. A computer readable storage medium storing a computer program for causing a computer to perform the method of any one of the preceding claims 1 to 16.
CN202111241937.4A 2021-10-25 2021-10-25 Bill processing method, device, equipment and storage medium Pending CN116029715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111241937.4A CN116029715A (en) 2021-10-25 2021-10-25 Bill processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111241937.4A CN116029715A (en) 2021-10-25 2021-10-25 Bill processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116029715A true CN116029715A (en) 2023-04-28

Family

ID=86089997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111241937.4A Pending CN116029715A (en) 2021-10-25 2021-10-25 Bill processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116029715A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination