CN111723777A - Method and device for judging commodity taking and placing process, intelligent container and readable storage medium - Google Patents

Method and device for judging commodity taking and placing process, intelligent container and readable storage medium Download PDF

Info

Publication number
CN111723777A
CN111723777A CN202010646953.0A CN202010646953A
Authority
CN
China
Prior art keywords
commodity
frame
hand
difference value
placing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010646953.0A
Other languages
Chinese (zh)
Inventor
梁英男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weaving Point Intelligent Technology Co ltd
Original Assignee
Guangzhou Weaving Point Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weaving Point Intelligent Technology Co ltd filed Critical Guangzhou Weaving Point Intelligent Technology Co ltd
Priority to CN202010646953.0A priority Critical patent/CN111723777A/en
Publication of CN111723777A publication Critical patent/CN111723777A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The embodiment of the invention discloses a method and a device for judging a commodity taking and placing process, an intelligent container and a readable storage medium, wherein the method comprises the following steps: sequentially detecting each video frame in the video stream through a classification identification model; if a commodity image is detected in the video frame, marking the commodity image through a commodity marking frame; if a hand image is detected in the video frame, marking the hand image through a hand marking frame; after the hand mark frame and the commodity mark frame are detected, calculating the intersection ratio of the hand mark frame and the commodity mark frame; and if the intersection ratio is larger than a preset threshold value, determining whether the commodity in the video stream is taken out or put back according to the variation trend of the central coordinates of the hand mark frames in each video frame. The taking and placing process of the commodity is identified by detecting the change of hand direction during pick-and-place recognition, which effectively reduces missed detections and false detections and improves the accuracy of judging the commodity taking and placing process.

Description

Method and device for judging commodity taking and placing process, intelligent container and readable storage medium
Technical Field
The invention relates to the field of image recognition and commodity image detection, and in particular to a method and a device for judging a commodity taking and placing process, an intelligent container and a readable storage medium.
Background
With social development and technological progress, store rent, labor costs, the customer acquisition costs of e-commerce platforms and logistics distribution costs have become heavy burdens for both traditional commerce and e-commerce: the development costs of e-commerce (customer acquisition, logistics, loss of fresh goods and the like) are high, and traditional physical stores struggle to survive.
At present, using intelligent containers for unmanned selling is one of the main new retail modes. Settlement based on Radio Frequency Identification (RFID) is one of the settlement methods for intelligent containers, and works as follows: when a user takes a needed commodity from the intelligent container, the intelligent container determines the electronic tag of the taken commodity by detecting the RFID tags (called electronic tags) of the remaining commodities, and generates a settlement bill according to the determined electronic tag of the taken commodity for the consumer's settlement. The disadvantages of using RFID for settlement include at least the following: when the user takes away the needed commodity but puts the electronic tag of that commodity back into the intelligent container, the intelligent container detects that the user has not taken away any commodity, so the user obtains the commodity free of charge, the intelligent container is used maliciously, and the merchant suffers a great loss.
Disclosure of Invention
In view of the above problems, the present invention provides a method and an apparatus for determining a commodity picking and placing process, an intelligent container and a readable storage medium.
One embodiment of the present invention provides a method for determining a commodity picking and placing process, including:
sequentially detecting each video frame in the video stream through a classification identification model;
if a commodity image is detected in the video frame, marking the commodity image through a commodity marking frame;
if a hand image is detected in the video frame, marking the hand image through a hand marking frame;
after the hand mark frame and the commodity mark frame are detected, calculating the intersection ratio of the hand mark frame and the commodity mark frame;
and if the intersection ratio is larger than a preset threshold value, determining whether the commodity in the video stream is taken out or put back according to the variation trend of the central coordinates of the hand mark frames in each video frame.
In the above method for judging the commodity taking and placing process, the determining whether the commodity in the video stream is taken out or put back according to the variation trend of the central coordinates of the hand mark frames in each video frame comprises:
taking the picking and placing direction of the commodity in the video frames as a pick-and-place judgment axis, and calculating, for every two adjacent frames among k consecutive video frames, the difference of the judgment-axis coordinates of the hand mark frames, wherein the difference is equal to the judgment-axis coordinate value of the current frame minus the judgment-axis coordinate value of the previous frame;
when the judgment-axis coordinate value increases gradually in the take-out direction, among the k-1 differences, determining that the commodity is taken out if the number of positive differences is larger than the number of negative differences, and determining that the commodity is put back if the number of positive differences is smaller than the number of negative differences;
when the judgment-axis coordinate value increases gradually in the put-back direction, among the k-1 differences, determining that the commodity is taken out if the number of positive differences is smaller than the number of negative differences, and determining that the commodity is put back if the number of positive differences is larger than the number of negative differences.
The method for judging the commodity taking and placing process further comprises the following steps:
if a commodity image is detected in the video frame, identifying the commodity type of the commodity image;
and outputting the type of the commodity when the intersection ratio is larger than a preset threshold value.
The method for judging the commodity taking and placing process further comprises the following steps:
and pre-training the classification recognition model by using the training sample set until the classification error loss of the classification recognition model is less than an expected value.
According to the commodity picking and placing process judging method, the error loss comprises background confidence coefficient loss, foreground confidence coefficient loss, category loss and coordinate loss.
Another embodiment of the present invention provides a device for determining a commodity taking and placing process, including:
the classification identification module is used for sequentially detecting each video frame in the video stream through a classification identification model;
the commodity marking module is used for marking the commodity image through the commodity marking frame if the commodity image is detected in the video frame;
the hand marking module is used for marking the hand image through a hand marking frame if the hand image is detected in the video frame;
the intersection ratio calculation module is used for calculating the intersection ratio of the hand mark frame and the commodity mark frame after the hand mark frame and the commodity mark frame are detected;
and the pick-and-place confirming module is used for determining whether the commodities in the video stream are taken out or put back according to the change trend of the central coordinates of the hand mark frames in each video frame if the intersection ratio is larger than a preset threshold value.
In the above device for determining the commodity taking and placing process, the taking and placing confirmation module includes:
the coordinate difference calculation unit is used for calculating, when the side of the video frame close to the commodity placing position is taken as the horizontal axis of the coordinate system and the vertical axis of the coordinate system takes positive values, the difference of the vertical coordinates of the central coordinates of the hand mark frames in every two adjacent frames among k consecutive video frames, wherein the difference is equal to the vertical coordinate value of the current frame minus the vertical coordinate value of the previous frame;
and the coordinate difference comparison unit is used for determining, among the k-1 differences, that the commodity is taken out if the number of positive differences is larger than the number of negative differences, and that the commodity is put back if the number of positive differences is smaller than the number of negative differences.
The above device for determining the picking and placing processes of the commodity further comprises:
the type identification module is used for identifying the commodity type of the commodity image if the commodity image is detected in the video frame;
and the type output module is used for outputting the type of the commodity when the intersection ratio is greater than a preset threshold value.
Another embodiment of the present invention relates to an intelligent container, which includes a memory, a processor and a video stream acquisition device, wherein the memory is used to store a computer program, the processor runs the computer program to make the intelligent container execute the above method for determining the commodity taking and placing process, and the video stream acquisition device is used to acquire a video stream.
Another embodiment of the present invention relates to a readable storage medium, which stores a computer program, wherein the computer program, when run on a processor, executes the above method for determining the commodity taking and placing process.
The method comprises the steps of sequentially detecting each video frame in a video stream through a classification recognition model; if a commodity image is detected in the video frame, marking the commodity image through a commodity marking frame; if a hand image is detected in the video frame, marking the hand image through a hand marking frame; after the hand mark frame and the commodity mark frame are detected, calculating the intersection ratio of the hand mark frame and the commodity mark frame; and if the intersection ratio is larger than a preset threshold value, determining whether the commodity in the video stream is taken out or put back according to the variation trend of the central coordinates of the hand mark frames in each video frame. The taking and placing process of the commodity is identified by detecting the change of hand direction during pick-and-place recognition, which effectively reduces missed detections and false detections and improves the accuracy of judging the commodity taking and placing process.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
Fig. 1 is a schematic flow chart illustrating a method for determining a commodity picking and placing process according to an embodiment of the present invention;
FIG. 2 is a method diagram illustrating a cross-over ratio calculation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a coordinate system of a video frame provided by an embodiment of the invention;
fig. 4 is a schematic structural diagram illustrating a device for determining a commodity taking and placing process according to an embodiment of the present invention.
Description of the main element symbols:
10-commodity taking and placing process judging device; 100-classification recognition module; 200-commodity marking module; 300-hand marking module; 400-intersection ratio calculation module; 500-pick-and-place confirmation module; 600-category identification module; 700-category output module; 1-video frame.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
In this embodiment, referring to fig. 1, a method for determining a commodity taking and placing process is shown, where the method includes:
step S100: and sequentially detecting each video frame in the video stream through the classification recognition model.
The camera can be installed at a suitable position according to the specific shape of the intelligent container and the commodity placement rules inside it, so that the effective shooting range of the camera is maximized. Exemplarily, the camera can be placed at the upper right corner or the upper left corner of the intelligent container; when the cabinet door of the intelligent container is opened, the camera is triggered to start, video frames are extracted from the video stream acquired by the camera, and each video frame in the video stream is sequentially detected by the classification recognition model.
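As an illustrative sketch only, the frame-by-frame detection step could look as follows in Python with OpenCV; the video source, the classify callable and the generator interface are assumptions made for illustration and are not prescribed by this embodiment.

```python
import cv2

def detect_video_stream(video_source, classify):
    """Read the camera video stream frame by frame and pass each frame to
    the classification recognition model (sketch; the `classify` callable
    and the video source are assumptions, not specified by the embodiment)."""
    capture = cv2.VideoCapture(video_source)  # camera index or stream URL
    try:
        while True:
            ok, frame = capture.read()
            if not ok:                        # cabinet door closed / stream ended
                break
            detections = classify(frame)      # e.g. labelled hand / commodity mark frames
            yield frame, detections
    finally:
        capture.release()
```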
The classification recognition model can be trained in advance by utilizing the training sample set until the classification error loss of the classification recognition model is less than an expected value.
For the classification recognition model, a YOLOv3 model can be selected. The training samples are input into the classification recognition model, which performs classification recognition on the commodities in the training samples; the effectiveness of the classification recognition model is evaluated through the classification error loss, and if the classification error loss function converges so that the classification error loss is smaller than the preset expected value, the effectiveness of the classification recognition model meets the preset standard.
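The stopping criterion described above (training continues until the error loss falls below the expected value) could be realized roughly as in the following PyTorch-style sketch; the model, optimizer, data iterator and compute_loss function are assumptions made for illustration, not part of the disclosed method.

```python
def train_until_expected(model, optimizer, data_loader, compute_loss, expected_loss):
    """Train the classification recognition model until the average error
    loss over an epoch is smaller than the expected value (sketch)."""
    while True:
        epoch_loss, batches = 0.0, 0
        for images, targets in data_loader:
            predictions = model(images)
            loss = compute_loss(predictions, targets)  # coordinate + confidence + class losses
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += float(loss)
            batches += 1
        if epoch_loss / max(batches, 1) < expected_loss:
            return model
```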
It should be understood that different classification error loss functions have different evaluation effects, and the loss functions used for evaluating the classification recognition model in the present embodiment include coordinate loss, confidence loss, and category loss.
Furthermore, each training sample image is divided into S × S subunits, each subunit generates B candidate frames (anchor boxes), and each candidate frame is classified and identified by the classification recognition model to finally obtain a corresponding prediction frame (bounding box).
The coordinate loss can be obtained by the following formula:

$$loss_{coord} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \, (2 - w_i \times h_i) \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right]$$

wherein $loss_{coord}$ represents the coordinate loss; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th candidate frame of the i-th subunit is responsible for the object (commodity or hand), taking the value 1 if it is responsible and 0 otherwise. The criterion for responsibility is that, among the B candidate frames of the i-th subunit, the responsible candidate frame is the one whose intersection ratio (IOU) with the real target frame (ground truth box) of the object is the largest among the intersection ratios of all candidate frames with the real target frame. $x_i$, $y_i$, $w_i$, $h_i$ respectively represent the abscissa, ordinate, width and height of the prediction frame, and $\hat{x}_i$, $\hat{y}_i$, $\hat{w}_i$, $\hat{h}_i$ respectively represent the abscissa, ordinate, width and height of the real target frame. $2 - w_i \times h_i$ denotes a weight coefficient, $\lambda_{coord}$ represents the coordinate loss coefficient, and $w_i$, $h_i$ are the normalized width and height of the prior frame, respectively. The expression means that when the j-th candidate frame of the i-th subunit is responsible for a certain real target, the prediction frame generated by that candidate frame is compared with the real target frame and the coordinate error is calculated.
The confidence loss can be obtained by the following formula:

$$loss_{con} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i^j - \hat{C}_i^j \right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i^j - \hat{C}_i^j \right)^2$$

wherein $loss_{con}$ represents the confidence loss, which includes the foreground confidence loss of candidate frames responsible for an object and the background confidence loss of candidate frames containing no object; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th candidate frame of the i-th subunit is responsible for predicting the object (commodity or hand), taking the value 1 if it is responsible and 0 otherwise, and $\mathbb{1}_{ij}^{noobj}$ indicates that the j-th candidate frame of the i-th subunit is not responsible for the object; $\hat{C}_i^j$ represents the prediction confidence of the j-th candidate frame of the i-th subunit and $C_i^j$ the corresponding true confidence; $\lambda_{noobj}$ represents the background confidence loss coefficient.
The class loss is obtained by the following formula:

$$loss_{class} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2$$

wherein $loss_{class}$ represents the class loss, and $p_i(c)$ and $\hat{p}_i(c)$ respectively represent the prediction class probability value and the real class probability value; the formula expresses the error between the prediction class and the real class when the j-th candidate frame of the i-th subunit is responsible for a certain real object.
It should be appreciated that the overall loss is as follows, and the training process will be based on this loss function:

$$loss = loss_{coord} + loss_{con} + loss_{class}$$
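For illustration only, the three loss terms above can be assembled as in the following NumPy sketch; the array shapes, the responsibility masks obj_mask/noobj_mask and the coefficient values lambda_coord/lambda_noobj are assumptions made for the example rather than values fixed by this embodiment.

```python
import numpy as np

def yolo_style_loss(pred_box, true_box, pred_conf, true_conf, pred_cls, true_cls,
                    obj_mask, noobj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-of-squares loss = coordinate + confidence + class loss (sketch).

    pred_box/true_box: (S*S, B, 4) arrays of (x, y, w, h);
    pred_conf/true_conf: (S*S, B); pred_cls/true_cls: (S*S, B, C);
    obj_mask/noobj_mask: (S*S, B) 0/1 responsibility indicators.
    lambda_coord / lambda_noobj are illustrative coefficient values."""
    # coordinate loss, weighted by (2 - w*h) as in the formula above
    weight = 2.0 - true_box[..., 2] * true_box[..., 3]
    loss_coord = lambda_coord * np.sum(
        obj_mask * weight * np.sum((pred_box - true_box) ** 2, axis=-1))

    # confidence loss: foreground term plus weighted background term
    conf_err = (pred_conf - true_conf) ** 2
    loss_con = np.sum(obj_mask * conf_err) + lambda_noobj * np.sum(noobj_mask * conf_err)

    # class loss, only for candidate frames responsible for an object
    loss_class = np.sum(obj_mask[..., None] * (pred_cls - true_cls) ** 2)

    return loss_coord + loss_con + loss_class
```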
Step S200: and if the commodity image is detected in the video frame, marking the commodity image through a commodity marking frame.
And sequentially detecting each video frame in the video stream by using the trained classification recognition model, and marking the commodity image by using a commodity marking frame in a certain video frame if the commodity image is detected in the video frame.
Further, when a commodity image is detected in the video frame, the commodity type of the commodity image is identified.
Step S300: if a hand image is detected in the video frame, the hand image is marked by a hand marking frame.
And sequentially detecting each video frame in the video stream by using the trained classification recognition model, and marking the hand image in a certain video frame by using a hand marking frame if the hand image is detected in the video frame.
Step S400: and after the hand mark frame and the commodity mark frame are detected, calculating the intersection ratio of the hand mark frame and the commodity mark frame.
The intersection ratio of the hand mark frame and the commodity mark frame is calculated for a video frame that includes both the hand mark frame and the commodity mark frame. The intersection ratio is equal to the ratio of the overlapping area of the hand mark frame and the commodity mark frame to the total (union) area of the hand mark frame and the commodity mark frame; the corresponding ratio relationship is shown in fig. 2, where the overlapping area of the hand mark frame and the commodity mark frame is the shaded part of the numerator in fig. 2, and the total area of the hand mark frame and the commodity mark frame is the shaded part of the denominator in fig. 2.
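As an illustration, the intersection ratio of two axis-aligned mark frames can be computed as in the following sketch; the (x1, y1, x2, y2) corner representation of a mark frame is an assumption made for the example.

```python
def intersection_ratio(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2):
    overlapping area divided by the total (union) area."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0
```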
Step S500: and if the intersection ratio is larger than a preset threshold value, determining whether the commodity in the video stream is taken out or put back according to the variation trend of the central coordinates of the hand mark frames in each video frame.
If the intersection ratio is larger than the preset threshold, it indicates that the hand is in contact with the commodity, that is, the user is taking out or putting back the commodity. Whether the commodity in the video stream is taken out or put back can then be determined according to the variation trend of the center coordinates of the hand mark frames in each video frame.
Exemplarily, referring to fig. 3, when the side of the video frame 1 close to the commodity placement position is taken as the horizontal axis of the coordinate system and the vertical axis of the coordinate system takes positive values, the vertical axis serves as the pick-and-place judgment axis and its coordinate value increases gradually in the take-out direction. The difference of the vertical coordinates of the center coordinates of the hand mark frames in every two adjacent frames among k consecutive video frames is calculated, where k can be set according to the specific application and the difference is equal to the vertical coordinate value of the current frame minus the vertical coordinate value of the previous frame. When the judgment-axis coordinate value increases gradually in the take-out direction, among the k-1 differences, if the number of positive differences is larger than the number of negative differences, it is determined that the commodity is taken out; if the number of positive differences is smaller than the number of negative differences, it is determined that the commodity is put back.
Exemplarily, when the judgment-axis coordinate value increases gradually in the put-back direction, among the k-1 differences, if the number of positive differences is smaller than the number of negative differences, it is determined that the commodity is taken out; if the number of positive differences is larger than the number of negative differences, it is determined that the commodity is put back.
Taking the case where the judgment-axis coordinate value increases gradually in the take-out direction as an example, if k is 10, the center coordinates of the hand mark frames in 10 consecutive video frames are acquired and 9 differences are calculated, each by subtracting the center coordinate of the previous frame from that of the current frame; if the number of positive differences is larger than the number of negative differences, it is determined that the commodity is taken out, and if the number of positive differences is smaller than the number of negative differences, it is determined that the commodity is put back.
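As an illustrative sketch only, the counting rule above can be expressed as follows in Python; the function name, the list of hand-frame center coordinates along the judgment axis and the returned strings are assumptions made for illustration and are not prescribed by this embodiment.

```python
def judge_pick_or_place(axis_coords, take_direction_increasing=True):
    """Decide take-out vs put-back from the center coordinates of the hand
    mark frame along the pick-and-place judgment axis over k consecutive
    video frames (sketch of the counting rule described above)."""
    # k-1 frame-to-frame differences: current frame minus previous frame
    diffs = [cur - prev for prev, cur in zip(axis_coords, axis_coords[1:])]
    positives = sum(1 for d in diffs if d > 0)
    negatives = sum(1 for d in diffs if d < 0)
    moving_up = positives > negatives
    if take_direction_increasing:
        return "taken out" if moving_up else "put back"
    return "put back" if moving_up else "taken out"

# Example with k = 10 as in the text: 10 coordinates give 9 differences,
# 7 positive and 2 negative, so the commodity is judged to be taken out.
# judge_pick_or_place([120, 135, 150, 148, 170, 185, 190, 188, 205, 220])
```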
Further, when the intersection ratio is larger than a preset threshold value, the type of the commodity is output so as to determine the type of the commodity taken out or put back.
In the method for judging the commodity taking and placing process described above, the taking and placing process of the commodity is identified by detecting the change of hand direction during pick-and-place recognition, which effectively reduces missed detections and false detections and improves the accuracy of judging the commodity taking and placing process.
Example 2
In this embodiment, referring to fig. 4, the device 10 for determining the commodity taking and placing process includes a classification recognition module 100, a commodity marking module 200, a hand marking module 300, an intersection ratio calculation module 400, a pick-and-place confirmation module 500, a category identification module 600 and a category output module 700.
A classification recognition module 100, configured to sequentially detect each video frame in the video stream through a classification recognition model; a commodity marking module 200, configured to mark a commodity image through a commodity marking frame if the commodity image is detected in the video frame; a hand marking module 300, configured to mark a hand image with a hand marking frame if the hand image is detected in the video frame; the intersection ratio calculation module 400 is configured to calculate an intersection ratio of the hand mark frame and the commodity mark frame after the hand mark frame and the commodity mark frame are detected; a pick-and-place confirmation module 500, configured to determine whether a commodity in the video stream is to be picked up or placed back according to a variation trend of the center coordinates of the hand mark frame in each video frame if the intersection ratio is greater than a preset threshold; a category identification module 600, configured to identify a category of a commodity image if the commodity image is detected in a video frame; and the category output module 700 is configured to output the category of the commodity when the intersection ratio is greater than a preset threshold.
The above-mentioned taking and placing confirmation module 500 includes:
the coordinate difference value calculating unit is used for calculating the difference value of the vertical coordinates of the central coordinates of the hand mark frames in two adjacent frames in the continuous k video frames when one side of the video frames close to the commodity placing position is taken as the horizontal axis of the coordinate system, and the vertical axis of the coordinate system takes a positive value, wherein the difference value is equal to the difference value of the vertical coordinate value of the current frame minus the vertical coordinate value of the previous frame; and the coordinate difference value comparison unit is used for determining that the commodity is taken out if the difference value is positive number and is larger than the number of the difference value which is negative number in the k-1 difference values, and determining that the commodity is put back if the difference value is positive number and is smaller than the number of the difference value which is negative number.
The device 10 for determining the commodity taking and placing process in this embodiment uses the classification recognition module 100, the commodity marking module 200, the hand marking module 300, the intersection ratio calculation module 400, the pick-and-place confirmation module 500, the category identification module 600 and the category output module 700 in cooperation to execute the method for determining the commodity taking and placing process of the above embodiment; the implementation schemes and beneficial effects described in the above embodiment also apply to this embodiment and are not repeated here.
It should be understood that the above embodiments relate to an intelligent container, which includes a memory, a processor and a video stream acquiring device, the memory is used for storing a computer program, the processor runs the computer program to make the intelligent container execute the above commodity taking and placing process determining method, and the video stream acquiring device is used for acquiring a video stream.
It should be understood that the above embodiments relate to a readable storage medium storing a computer program which, when run on a processor, executes the above-mentioned article picking and placing process determining method.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A commodity taking and placing process judging method is characterized by comprising the following steps:
sequentially detecting each video frame in the video stream through a classification identification model;
if a commodity image is detected in the video frame, marking the commodity image through a commodity marking frame;
if a hand image is detected in the video frame, marking the hand image through a hand marking frame;
after the hand mark frame and the commodity mark frame are detected, calculating the intersection ratio of the hand mark frame and the commodity mark frame;
and if the intersection ratio is larger than a preset threshold value, determining whether the commodity in the video stream is taken out or put back according to the variation trend of the central coordinates of the hand mark frames in each video frame.
2. The method for determining the picking and placing process of the commodity according to claim 1, wherein the determining whether the commodity in the video stream is taken out or put back according to the variation trend of the central coordinates of the hand mark frames in each video frame comprises:
taking the picking and placing direction of the commodity in the video frames as a pick-and-place judgment axis, and calculating, for every two adjacent frames among k consecutive video frames, the difference of the judgment-axis coordinates of the hand mark frames, wherein the difference is equal to the judgment-axis coordinate value of the current frame minus the judgment-axis coordinate value of the previous frame;
when the judgment-axis coordinate value increases gradually in the take-out direction, among the k-1 differences, determining that the commodity is taken out if the number of positive differences is larger than the number of negative differences, and determining that the commodity is put back if the number of positive differences is smaller than the number of negative differences;
when the judgment-axis coordinate value increases gradually in the put-back direction, among the k-1 differences, determining that the commodity is taken out if the number of positive differences is smaller than the number of negative differences, and determining that the commodity is put back if the number of positive differences is larger than the number of negative differences.
3. The method for determining the picking and placing process of the commodity according to claim 1, further comprising:
if a commodity image is detected in the video frame, identifying the commodity type of the commodity image;
and outputting the type of the commodity when the intersection ratio is larger than a preset threshold value.
4. The method for determining the picking and placing process of the commodity according to claim 1, further comprising:
and pre-training the classification recognition model by utilizing a training sample set until the classification error loss of the classification recognition model is less than an expected value.
5. The method as claimed in claim 4, wherein the error loss includes a background confidence loss, a foreground confidence loss, a category loss and a coordinate loss.
6. A commodity taking and placing process judging device is characterized by comprising:
the classification identification module is used for sequentially detecting each video frame in the video stream through a classification identification model;
the commodity marking module is used for marking the commodity image through the commodity marking frame if the commodity image is detected in the video frame;
the hand marking module is used for marking the hand image through a hand marking frame if the hand image is detected in the video frame;
the intersection ratio calculation module is used for calculating the intersection ratio of the hand mark frame and the commodity mark frame after the hand mark frame and the commodity mark frame are detected;
and the pick-and-place confirming module is used for determining whether the commodities in the video stream are taken out or put back according to the change trend of the central coordinates of the hand mark frames in each video frame if the intersection ratio is larger than a preset threshold value.
7. The device for determining the picking and placing processes of the merchandise according to claim 6, wherein the picking and placing confirming module comprises:
the coordinate difference calculation unit is used for calculating, when the side of the video frame close to the commodity placing position is taken as the horizontal axis of the coordinate system and the vertical axis of the coordinate system takes positive values, the difference of the vertical coordinates of the central coordinates of the hand mark frames in every two adjacent frames among k consecutive video frames, wherein the difference is equal to the vertical coordinate value of the current frame minus the vertical coordinate value of the previous frame;
and the coordinate difference comparison unit is used for determining, among the k-1 differences, that the commodity is taken out if the number of positive differences is larger than the number of negative differences, and that the commodity is put back if the number of positive differences is smaller than the number of negative differences.
8. The device for discriminating the picking and placing processes of a commodity according to claim 6, further comprising:
the type identification module is used for identifying the commodity type of the commodity image if the commodity image is detected in the video frame;
and the type output module is used for outputting the type of the commodity when the intersection ratio is greater than a preset threshold value.
9. An intelligent container, comprising a memory for storing a computer program, a processor for executing the computer program to enable the intelligent container to execute the method for discriminating a commodity taking and placing process according to any one of claims 1 to 5, and a video stream acquisition device for acquiring a video stream.
10. A readable storage medium storing a computer program which, when executed on a processor, executes the method for determining a picking and placing process of an article according to any one of claims 1 to 5.
CN202010646953.0A 2020-07-07 2020-07-07 Method and device for judging commodity taking and placing process, intelligent container and readable storage medium Pending CN111723777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010646953.0A CN111723777A (en) 2020-07-07 2020-07-07 Method and device for judging commodity taking and placing process, intelligent container and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010646953.0A CN111723777A (en) 2020-07-07 2020-07-07 Method and device for judging commodity taking and placing process, intelligent container and readable storage medium

Publications (1)

Publication Number Publication Date
CN111723777A true CN111723777A (en) 2020-09-29

Family

ID=72573259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010646953.0A Pending CN111723777A (en) 2020-07-07 2020-07-07 Method and device for judging commodity taking and placing process, intelligent container and readable storage medium

Country Status (1)

Country Link
CN (1) CN111723777A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508109A (en) * 2020-12-10 2021-03-16 锐捷网络股份有限公司 Training method and device for image recognition model
CN112991379A (en) * 2021-04-09 2021-06-18 华南理工大学 Unmanned vending method and system based on dynamic vision
CN113052020A (en) * 2021-03-11 2021-06-29 浙江星星冷链集成股份有限公司 Commodity picking and placing identification device and commodity picking and placing identification method
CN113657317A (en) * 2021-08-23 2021-11-16 上海明略人工智能(集团)有限公司 Cargo position identification method and system, electronic equipment and storage medium
CN114359973A (en) * 2022-03-04 2022-04-15 广州市玄武无线科技股份有限公司 Commodity state identification method and equipment based on video and computer readable medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985199A (en) * 2018-07-02 2018-12-11 百度在线网络技术(北京)有限公司 Detection method, device and the storage medium of commodity loading or unloading operation
CN109816041A (en) * 2019-01-31 2019-05-28 南京旷云科技有限公司 Commodity detect camera, commodity detection method and device
CN109948490A (en) * 2019-03-11 2019-06-28 浙江工业大学 A kind of employee's specific behavior recording method identified again based on pedestrian
CN110276305A (en) * 2019-06-25 2019-09-24 广州众聚智能科技有限公司 A kind of dynamic commodity recognition methods
US20200005025A1 (en) * 2018-06-29 2020-01-02 Baidu Online Network Technology (Beijing) Co., Ltd Method, apparatus, device and system for processing commodity identification and storage medium
CN111079638A (en) * 2019-12-13 2020-04-28 河北爱尔工业互联网科技有限公司 Target detection model training method, device and medium based on convolutional neural network
CN111079699A (en) * 2019-12-30 2020-04-28 北京每日优鲜电子商务有限公司 Commodity identification method and device
CN111145430A (en) * 2019-12-27 2020-05-12 北京每日优鲜电子商务有限公司 Method and device for detecting commodity placing state and computer storage medium
CN111242094A (en) * 2020-02-25 2020-06-05 深圳前海达闼云端智能科技有限公司 Commodity identification method, intelligent container and intelligent container system
CN111325177A (en) * 2020-03-04 2020-06-23 南京红松信息技术有限公司 Target detection fractional recognition method based on weight customization

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200005025A1 (en) * 2018-06-29 2020-01-02 Baidu Online Network Technology (Beijing) Co., Ltd Method, apparatus, device and system for processing commodity identification and storage medium
CN108985199A (en) * 2018-07-02 2018-12-11 百度在线网络技术(北京)有限公司 Detection method, device and the storage medium of commodity loading or unloading operation
CN109816041A (en) * 2019-01-31 2019-05-28 南京旷云科技有限公司 Commodity detect camera, commodity detection method and device
CN109948490A (en) * 2019-03-11 2019-06-28 浙江工业大学 A kind of employee's specific behavior recording method identified again based on pedestrian
CN110276305A (en) * 2019-06-25 2019-09-24 广州众聚智能科技有限公司 A kind of dynamic commodity recognition methods
CN111079638A (en) * 2019-12-13 2020-04-28 河北爱尔工业互联网科技有限公司 Target detection model training method, device and medium based on convolutional neural network
CN111145430A (en) * 2019-12-27 2020-05-12 北京每日优鲜电子商务有限公司 Method and device for detecting commodity placing state and computer storage medium
CN111079699A (en) * 2019-12-30 2020-04-28 北京每日优鲜电子商务有限公司 Commodity identification method and device
CN111242094A (en) * 2020-02-25 2020-06-05 深圳前海达闼云端智能科技有限公司 Commodity identification method, intelligent container and intelligent container system
CN111325177A (en) * 2020-03-04 2020-06-23 南京红松信息技术有限公司 Target detection fractional recognition method based on weight customization

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508109A (en) * 2020-12-10 2021-03-16 锐捷网络股份有限公司 Training method and device for image recognition model
CN113052020A (en) * 2021-03-11 2021-06-29 浙江星星冷链集成股份有限公司 Commodity picking and placing identification device and commodity picking and placing identification method
CN112991379A (en) * 2021-04-09 2021-06-18 华南理工大学 Unmanned vending method and system based on dynamic vision
CN112991379B (en) * 2021-04-09 2024-02-09 华南理工大学 Unmanned vending method and system based on dynamic vision
CN113657317A (en) * 2021-08-23 2021-11-16 上海明略人工智能(集团)有限公司 Cargo position identification method and system, electronic equipment and storage medium
CN114359973A (en) * 2022-03-04 2022-04-15 广州市玄武无线科技股份有限公司 Commodity state identification method and equipment based on video and computer readable medium

Similar Documents

Publication Publication Date Title
CN108985359B (en) Commodity identification method, unmanned vending machine and computer-readable storage medium
CN111723777A (en) Method and device for judging commodity taking and placing process, intelligent container and readable storage medium
CN109003390B (en) Commodity identification method, unmanned vending machine and computer-readable storage medium
CN111415461B (en) Article identification method and system and electronic equipment
US11823429B2 (en) Method, system and device for difference automatic calibration in cross modal target detection
US9361702B2 (en) Image detection method and device
CN110276305B (en) Dynamic commodity identification method
TWI578272B (en) Shelf detection system and method
US20220391796A1 (en) System and Method for Mapping Risks in a Warehouse Environment
CN111512317A (en) Multi-target real-time tracking method and device and electronic equipment
CN117115571B (en) Fine-grained intelligent commodity identification method, device, equipment and medium
CN111178116A (en) Unmanned vending method, monitoring camera and system
CN110069980A (en) Product polymerization and device, electronic equipment based on image
EP3526728B1 (en) System and method for object recognition based estimation of planogram compliance
CN111553914A (en) Vision-based goods detection method and device, terminal and readable storage medium
CN114255377A (en) Differential commodity detection and classification method for intelligent container
CN111814562A (en) Vehicle identification method, vehicle identification model training method and related device
CN115298705A (en) License plate recognition method and device, electronic equipment and storage medium
CN108009558B (en) Target detection method and device based on multiple models
CN112232334B (en) Intelligent commodity selling identification and detection method
CN116011822A (en) Store cashing management method and device, computer equipment and storage medium
CN110443946A (en) Vending machine, the recognition methods of type of goods and device
CN111666927A (en) Commodity identification method and device, intelligent container and readable storage medium
Achakir et al. An automated AI-based solution for out-of-stock detection in retail environments
CN114435828A (en) Goods storage method and device, carrying equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination