CN110781752A

CN110781752A - Object identification method for dynamic visual intelligent cabinet under multi-class scene

Info

Publication number: CN110781752A
Application number: CN201910923841.2A
Authority: CN
Inventors: 高凤春; 俞梦涛; 吴彬
Original assignee: Shanghai Several Data Technology Co Ltd
Current assignee: Shanghai Several Data Technology Co Ltd
Priority date: 2019-09-27
Filing date: 2019-09-27
Publication date: 2020-02-11

Abstract

The invention provides an object identification method for a multi-class scene of a dynamic visual intelligent cabinet, comprising the following steps: S1: capturing a purchase image with a camera; S2: feeding the purchase image into a dynamic recognition model to generate a dynamic model detection result; S3: evaluating the dynamic model detection result and, if it indicates a handheld commodity, proceeding to the next step, otherwise ending the process; S4: cropping a plurality of handheld-commodity small pictures from the purchase image; S5: feeding the handheld-commodity small pictures into a static recognition model to generate a plurality of static model detection results; S6: performing confidence detection on the static model detection results, taking the result with the highest confidence and checking whether its confidence is greater than a threshold; if so, that result is taken as the final detection result, otherwise the method returns to step S1. The beneficial effect of the invention is that it combines the advantages of video identification and image classification, overcomes their respective shortcomings, and enables dynamic visual identification in scenes with many commodity classes.

Description

Object identification method for dynamic visual intelligent cabinet under multi-class scene

Technical Field

The invention relates to the field of visual identification, and in particular to an object identification method for a multi-class scene of a dynamic visual intelligent cabinet.

Background

With China's economic development and rising living standards, the Chinese retail industry has flourished, and reducing its labor cost has become an urgent problem.

Conventionally, vending machines based on visual recognition are used to reduce labor cost in retail: these intelligent cabinets use visual recognition to identify the items being purchased and thereby calculate the amount the consumer must pay. Visual identification generally takes one of two forms. The first is video identification, in which an object detection algorithm detects the commodities, together with information such as their positions, in every frame. However, when there are too many commodity classes the interference between them becomes severe, so the number of classes is strictly limited, usually to no more than 20; beyond that the training workload multiplies and the failure rate is extremely high, which makes this method unsuitable for multi-class scenes. The second form is image classification, in which commodities are identified by classifying each frame that contains a commodity. This method can support more than 1000 commodity classes, but it cannot handle a consumer taking several commodities at the same time, nor can it track the movement of commodities during the purchase.

Therefore, there is a need in the market for an object recognition method that combines the advantages of video recognition and image classification, overcomes their respective shortcomings, and realizes dynamic visual recognition across many categories.

Disclosure of Invention

To solve the above technical problems, the invention discloses an object identification method for a multi-class scene of a dynamic visual intelligent cabinet; the technical solution of the invention is implemented as follows:

An object identification method for a multi-class scene of a dynamic visual intelligent cabinet comprises the following steps: S1: capturing a purchase image with a camera; S2: feeding the purchase image into a dynamic recognition model to generate a dynamic model detection result; S3: evaluating the dynamic model detection result and, if it indicates a handheld commodity, proceeding to the next step, otherwise ending the process; S4: cropping a plurality of handheld-commodity small pictures from the purchase image; S5: feeding the handheld-commodity small pictures into a static recognition model to generate a plurality of static model detection results; S6: performing confidence detection on the static model detection results, taking the result with the highest confidence and checking whether its confidence is greater than a threshold; if so, that result is taken as the final detection result, otherwise the method returns to step S1.

Preferably, the dynamic model detection result comprises a classification of the image and the position of the handheld commodity; the image is classified as either a handheld article or a non-handheld article.

Preferably, the static model detection result includes the type and number of the commodity.

Preferably, the method further comprises S0: training the models; S0 includes S0-1: training the dynamic recognition model; and S0-2: training the static recognition model.

Preferably, S0-1 includes S0-1-1: preparing dynamic recognition model training images; and S0-1-2: feeding the dynamic recognition model training images into a dynamic recognition model training algorithm to generate the dynamic recognition model.

Preferably, the dynamic recognition model training algorithm is one of the SSD, MobileNet, or Inception algorithms.

Preferably, S0-2 includes S0-2-1: preparing static recognition model training images; and S0-2-2: feeding the static recognition model training images into a static recognition model training algorithm to generate the static recognition model.

Preferably, the static recognition model training algorithm is one of the SSD, MobileNet, or Inception algorithms.

The technical solution of the invention addresses the problem that the prior art lacks a method that is suitable for both static and dynamic identification and can perform dynamic visual identification with many categories. By implementing this technical solution, the advantages of video identification and image classification are combined, their respective shortcomings are overcome, and dynamic visual identification with many categories is achieved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only one embodiment of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a specific embodiment of the object identification method for a multi-class scene of a dynamic visual intelligent cabinet;

FIG. 2 is a flowchart of S0 in a specific embodiment of the object identification method for a multi-class scene of a dynamic visual intelligent cabinet.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In a specific embodiment, as shown in FIG. 1 and FIG. 2, an object identification method for a multi-class scene of a dynamic visual intelligent cabinet includes: S1: capturing a purchase image with a camera; S2: feeding the purchase image into a dynamic recognition model to generate a dynamic model detection result; S3: evaluating the dynamic model detection result, proceeding to the next step if it indicates a handheld commodity and ending the process otherwise; S4: cropping a plurality of handheld-commodity small pictures from the purchase image; S5: feeding the handheld-commodity small pictures into a static recognition model to generate a plurality of static model detection results; S6: performing confidence detection on the static model detection results, taking the result with the highest confidence and checking whether its confidence is greater than a threshold; if so, that result is taken as the final detection result, otherwise the method returns to step S1.
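The control flow of S1 to S6 can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the camera, the two models and the cropping routine are passed in as hypothetical callables, the box format `(x0, y0, x1, y1)` is assumed, and `max_attempts` is an added safeguard, since the text returns to S1 without a stated bound.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical box format (x0, y0, x1, y1); the patent does not specify one.
Box = Tuple[int, int, int, int]


@dataclass
class DynamicResult:
    is_handheld: bool   # image classification: handheld vs. non-handheld article
    boxes: List[Box]    # positions of the handheld commodities


@dataclass
class StaticResult:
    label: str          # commodity type
    confidence: float


def identify(
    capture: Callable[[], object],
    dynamic_model: Callable[[object], DynamicResult],
    crop: Callable[[object, Box], object],
    static_model: Callable[[object], List[StaticResult]],
    threshold: float,
    max_attempts: int = 3,  # assumption: the text loops back to S1 without a bound
) -> Optional[StaticResult]:
    for _ in range(max_attempts):
        image = capture()                                      # S1: purchase image
        dyn = dynamic_model(image)                             # S2: dynamic detection result
        if not dyn.is_handheld:                                # S3: no handheld commodity, end
            return None
        crops = [crop(image, box) for box in dyn.boxes]        # S4: small pictures
        results = [r for c in crops for r in static_model(c)]  # S5: static detection results
        if not results:
            continue                                           # nothing recognized, back to S1
        best = max(results, key=lambda r: r.confidence)        # S6: highest confidence
        if best.confidence > threshold:
            return best                                        # final detection result
    return None                                                # gave up after max_attempts
```

Injecting the models as callables keeps the sketch independent of any particular framework, so the same loop works whether the models are SSD, MobileNet or Inception networks.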

In this specific embodiment, the commodity held by the consumer is within the camera's field of view. The camera captures an image of the commodity in the consumer's hand to obtain a purchase image and passes it to the dynamic recognition model, which periodically produces a dynamic recognition model detection result. Based on this result, the method judges whether the image shows a commodity held by the consumer. If it does, S4 is executed; if it does not, no further step is taken and S2 is repeated in the next period, so that images not showing a consumer-held commodity are filtered out. In S4, the purchase image is cropped around each commodity to obtain a plurality of handheld-commodity small pictures, which are fed into the static recognition model; each small picture yields its own static recognition model detection result. The entries in the detection result for each small picture are sorted by confidence and the entry with the highest confidence is taken. If that confidence is greater than a preset threshold, the entry is output as the detection result; otherwise the user is informed that detection has failed and the method returns to S1 to detect again. The output comprises the ID of the small picture, the confidence, and an evaluation score. Through these steps, the advantages of video identification and image classification are combined, their respective shortcomings are overcome, and dynamic visual identification with many categories is achieved.
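The per-small-picture ranking described above can be sketched as follows. The sketch is an assumption-laden illustration: the class and function names are hypothetical, the "evaluation score" is omitted because the text does not define it, and it is assumed that a single small picture falling below the threshold makes the whole detection fail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    label: str
    confidence: float


@dataclass
class Output:
    crop_id: int        # ID of the small picture
    label: str
    confidence: float
    # The "evaluation score" mentioned in the text is not defined there, so it is omitted.


def select_per_crop(
    candidates_per_crop: Dict[int, List[Candidate]], threshold: float
) -> Optional[List[Output]]:
    outputs: List[Output] = []
    for crop_id, candidates in candidates_per_crop.items():
        # Sort this small picture's candidates by confidence, highest first.
        ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
        if not ranked or ranked[0].confidence <= threshold:
            # Assumption: one failed small picture fails the whole detection;
            # the caller then notifies the user and returns to S1.
            return None
        top = ranked[0]
        outputs.append(Output(crop_id, top.label, top.confidence))
    return outputs
```

Returning `None` rather than raising keeps the "detection failed, retry from S1" path an ordinary branch for the caller.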

In a preferred embodiment, as shown in FIG. 1, the dynamic model detection result includes the classification of the image and the position of the handheld commodity; the image is classified as either a handheld article or a non-handheld article.

In this preferred embodiment, the dynamic model detection result includes the classification of the image and the position of the handheld commodity. If the image is not classified as showing a handheld commodity, the next step is not performed; if it is, the next step is performed.
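A minimal sketch of how the classification and position in the dynamic model detection result could drive the cropping of small pictures in S4. The image is modeled as a plain list of pixel rows, and the half-open `(x0, y0, x1, y1)` box format and the clamping to the image border are assumptions, not details given in the text.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]           # rows of pixel values (simplified stand-in)
Box = Tuple[int, int, int, int]   # assumed (x0, y0, x1, y1), half-open


def crop_small_pictures(image: Image, is_handheld: bool, boxes: Sequence[Box]) -> List[Image]:
    if not is_handheld:
        return []                 # non-handheld image: the next step is not performed
    h = len(image)
    w = len(image[0]) if h else 0
    crops: List[Image] = []
    for x0, y0, x1, y1 in boxes:
        # Clamp each box to the image so partially visible commodities still crop.
        x0, x1 = max(0, x0), min(w, x1)
        y0, y1 = max(0, y0), min(h, y1)
        if x0 >= x1 or y0 >= y1:
            continue              # box lies entirely outside the image
        crops.append([row[x0:x1] for row in image[y0:y1]])
    return crops
```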

In a preferred embodiment, as shown in FIG. 1, the static model detection result includes the type and quantity of the commodity.

In this preferred embodiment, the static model detection result includes the type and quantity of the commodity; the corresponding price and quantity are determined from the type and quantity of the commodity and transmitted to the server.
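Turning type and quantity into a price can be sketched as below. The price table and the order layout are hypothetical, `Decimal` is used to avoid floating-point rounding in money amounts, and the transmission to the server is left out.

```python
from collections import Counter
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def build_order(
    detections: Iterable[Tuple[str, int]],   # (commodity type, quantity) pairs
    price_table: Dict[str, Decimal],          # hypothetical unit prices per type
) -> Dict[str, object]:
    quantities: Counter = Counter()
    for label, n in detections:
        quantities[label] += n                # merge repeated detections of one type
    lines = [
        {"type": t, "quantity": q, "unit_price": price_table[t], "subtotal": price_table[t] * q}
        for t, q in sorted(quantities.items())
    ]
    total = sum((line["subtotal"] for line in lines), Decimal("0"))
    return {"lines": lines, "total": total}
```

For example, with cola at 3.50 and chips at 5.00, the detections `[("cola", 1), ("chips", 2), ("cola", 1)]` give a total of 17.00.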

In a preferred embodiment, as shown in FIG. 1 and FIG. 2, the method further comprises S0: training the models; S0 includes S0-1: training the dynamic recognition model; and S0-2: training the static recognition model.

In this preferred embodiment, S0 uses a preprocessed data set to train two models: a dynamic recognition model that recognizes continuous dynamic images, and a static recognition model that recognizes the single-frame static images remaining after the dynamic recognition model has filtered the images by category.

In a preferred embodiment, as shown in FIG. 1 and FIG. 2, S0-1 includes S0-1-1: preparing dynamic recognition model training images; and S0-1-2: feeding the dynamic recognition model training images into a dynamic recognition model training algorithm to generate the dynamic recognition model; the dynamic recognition model training algorithm is one of the SSD, MobileNet, or Inception algorithms.

In this preferred embodiment, S0-1 generates a dynamic recognition model for continuous dynamic images, which is used to detect whether a consumer is shopping; it is typically trained with one of the SSD, MobileNet, or Inception algorithms. In S0-1-1, the dynamic recognition model training images are read and preprocessed with a preprocessing algorithm, which is one of smoothing, median filtering, edge detection, or a gradient operator. In S0-1-2, the preprocessed training images are fed into the dynamic recognition model training algorithm to generate a dynamic recognition model, whose performance is then evaluated. If the performance does not reach a preset threshold, the method returns to S0-1-2, adjusts the relevant parameters of the training algorithm, and retrains until the performance of the generated dynamic recognition model reaches the threshold.
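Of the preprocessing options listed, median filtering is the simplest to show without a library. The sketch below is a plain-Python 3×3 median filter on a grayscale image stored as a list of rows; replicating the border pixels is an assumed edge policy, and a production pipeline would more likely call an image library's median filter instead.

```python
from statistics import median
from typing import List


def median_filter_3x3(img: List[List[float]]) -> List[List[float]]:
    h = len(img)
    w = len(img[0]) if h else 0
    out: List[List[float]] = []
    for y in range(h):
        row: List[float] = []
        for x in range(w):
            # Gather the 3x3 neighborhood, clamping coordinates to replicate the border.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))  # odd-sized window, so an actual pixel value
        out.append(row)
    return out
```

A single "salt" pixel such as 255 surrounded by 10s is replaced by 10, which is why median filtering suits impulse noise in camera frames.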

In a preferred embodiment, as shown in FIG. 1 and FIG. 2, S0-2 includes S0-2-1: preparing static recognition model training images; and S0-2-2: feeding the static recognition model training images into a static recognition model training algorithm to generate the static recognition model; the static recognition model training algorithm is one of the SSD, MobileNet, or Inception algorithms.

In this preferred embodiment, S0-2 generates a static recognition model suited to static images, which is used to detect the quantity and types of the commodities held by the consumer; the model is trained for this specific scene with a general image classification algorithm, namely one of SSD, MobileNet, or Inception. The static recognition model training images are read and preprocessed with a preprocessing algorithm, which is one of smoothing, median filtering, edge detection, or a gradient operator; the preprocessed images are fed into the static recognition model training algorithm to obtain the static recognition model, whose performance is then evaluated. If the performance does not reach the threshold, the method returns to S0-2-2, adjusts the relevant parameters of the training algorithm, and trains again.
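The train, evaluate and retrain loop shared by S0-1 and S0-2 can be sketched generically. `train_fn`, `evaluate_fn` and `adjust_fn` are hypothetical hooks standing in for the chosen training algorithm, the performance evaluation and the parameter adjustment, none of which the text specifies; `max_rounds` is an added bound so the loop cannot run forever.

```python
from typing import Any, Callable, Dict, Tuple

Params = Dict[str, Any]


def train_until_threshold(
    train_fn: Callable[[Params], Any],            # hypothetical: trains a model (S0-x-2)
    evaluate_fn: Callable[[Any], float],          # hypothetical: returns a performance score
    adjust_fn: Callable[[Params, float], Params], # hypothetical: tweaks training parameters
    params: Params,
    threshold: float,
    max_rounds: int = 10,                         # assumption: not stated in the text
) -> Tuple[Any, float, int]:
    score = float("-inf")
    for round_idx in range(1, max_rounds + 1):
        model = train_fn(params)
        score = evaluate_fn(model)
        if score >= threshold:                    # performance reaches the preset threshold
            return model, score, round_idx
        params = adjust_fn(params, score)         # adjust parameters and retrain
    raise RuntimeError(
        f"performance {score:.3f} below threshold {threshold} after {max_rounds} rounds"
    )
```

Raising after `max_rounds` makes a non-converging configuration visible instead of silently looping.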

It should be understood that the above-described embodiments are merely exemplary of the present invention, and are not intended to limit the present invention, and that any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. An object identification method for a multi-class scene of a dynamic visual intelligent cabinet, characterized by comprising the following steps:

S1: capturing a purchase image with a camera;

S2: feeding the purchase image into a dynamic recognition model to generate a dynamic model detection result;

S3: evaluating the dynamic model detection result and, if it indicates a handheld commodity, proceeding to the next step, otherwise ending the process;

S4: cropping a plurality of handheld-commodity small pictures from the purchase image;

S5: feeding the handheld-commodity small pictures into a static recognition model to generate a plurality of static model detection results;

S6: performing confidence detection on the static model detection results, taking the result with the highest confidence and checking whether its confidence is greater than a threshold; if so, taking that result as the final detection result, otherwise returning to step S1.

2. The object identification method for a multi-class scene of a dynamic visual intelligent cabinet according to claim 1, wherein:

the dynamic model detection result comprises the classification of the image and the position of the handheld commodity; the image is classified as either a handheld article or a non-handheld article.

3. The object identification method for a multi-class scene of a dynamic visual intelligent cabinet according to claim 1, wherein the static model detection result comprises the type and quantity of the commodity.

4. The object identification method for a multi-class scene of a dynamic visual intelligent cabinet according to claim 1, further comprising S0: training the models;

S0 includes:

S0-1: training the dynamic recognition model;

S0-2: training the static recognition model.

5. The object identification method for a multi-class scene of a dynamic visual intelligent cabinet according to claim 4, wherein S0-1 comprises:

S0-1-1: preparing dynamic recognition model training images;

S0-1-2: feeding the dynamic recognition model training images into a dynamic recognition model training algorithm to generate the dynamic recognition model.

6. The method according to claim 5, wherein the dynamic recognition model training algorithm is one of the SSD algorithm, the MobileNet algorithm, and the Inception algorithm.

7. The object identification method for a multi-class scene of a dynamic visual intelligent cabinet according to claim 4, wherein S0-2 comprises:

S0-2-1: preparing static recognition model training images;

S0-2-2: feeding the static recognition model training images into a static recognition model training algorithm to generate the static recognition model.

8. The method according to claim 7, wherein the static recognition model training algorithm is one of the SSD algorithm, the MobileNet algorithm, and the Inception algorithm.