WO2020134102A1 - Article recognition method and device, vending system, and storage medium - Google Patents

Article recognition method and device, vending system, and storage medium Download PDF

Info

Publication number
WO2020134102A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
image
target object
classifier
network
Prior art date
Application number
PCT/CN2019/099811
Other languages
French (fr)
Chinese (zh)
Inventor
张屹峰
刘朋樟
刘巍
陈宇
周梦迪
Original Assignee
北京沃东天骏信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司
Publication of WO2020134102A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present disclosure relates to the field of image processing, and in particular, to an item identification method, device, vending system, and storage medium.
  • the visual recognition algorithm can be used to recognize object categories in an image. Before recognition, the visual recognition model needs to be trained with training data so that the model of the visual recognition algorithm achieves high accuracy.
  • the visual recognition algorithm can be widely used in various application scenarios.
  • an item recognition method including: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to identify items.
  • the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
  • the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  • the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
  • the item identification method further includes: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
  • the item recognition method further includes: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
  • the method further includes: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
  • the item identification method further includes: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested.
  • the item identification method further includes: in response to the door of the vending cabinet being opened, acquiring an image to be tested.
  • an item recognition device including: a training image input module configured to input a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module configured to acquire a first output of the target object classifier and a second output of the scene negative classifier; a total loss value calculation module configured to determine a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module configured to adjust the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, wherein the target object classification model is used to identify items.
  • an item identification device including: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, an item identification method including the following operations: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to identify items.
  • the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
  • the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  • the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
  • the operations further include: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
  • the operations further include: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
  • the operations further include: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
  • the operations further include: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested.
  • a vending system including: a camera device, located in a vending cabinet, configured to collect an image to be tested in response to a door of the vending cabinet being opened; a classification device configured to input the image to be tested into a trained target object classification model and to use the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested; and any one of the foregoing item recognition devices.
  • a computer-readable storage medium on which a computer program is stored, wherein when the program is executed by a processor, any of the foregoing item identification methods is implemented.
  • FIG. 1 is a schematic flowchart of an item identification method according to some embodiments of the present disclosure.
  • FIG. 2 shows an exemplary neural network model and the relationships between modules in the model according to some embodiments of the present disclosure.
  • FIG. 3 is a schematic structural diagram of an exemplary scene positive classifier according to some embodiments of the present disclosure.
  • FIG. 4 is a schematic flowchart of an item identification method according to some embodiments of the present disclosure.
  • FIG. 5 is a schematic flowchart of a training image generation method according to some embodiments of the present disclosure.
  • FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure.
  • FIG. 7 is a schematic flowchart of a training method of a generative adversarial network according to some other embodiments of the present disclosure.
  • FIG. 8 is a schematic flowchart of a vending method for a vending cabinet according to some embodiments of the present disclosure.
  • FIG. 9 is a schematic structural diagram of an article identification device according to some embodiments of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a vending system according to some embodiments of the present disclosure.
  • FIG. 11 is a schematic structural diagram of an article identification device according to other embodiments of the present disclosure.
  • FIG. 12 is a schematic structural diagram of an article identification device according to yet other embodiments of the present disclosure.
  • a technical problem to be solved by the embodiments of the present disclosure is: how to improve the accuracy of item identification.
  • an image often includes not only the target object to be recognized, but also environmental information such as background, lighting, and brightness.
  • a camera installed at an unmanned sales container collects monitoring images during the process of the user picking up items in the sales container, so as to identify which commodities the user has taken based on the monitoring images.
  • besides the commodities, the monitoring images also include backgrounds, such as the plants, lakes, and other backgrounds in monitoring images collected in parks, and the streets, buildings, and other backgrounds in monitoring images collected in commercial areas.
  • the commodities in the surveillance images collected in the office building are exposed to specific colors of light
  • the commodities in the surveillance images collected outdoors during the day are exposed to natural light
  • the commodities in the surveillance images collected outdoors in the evening are exposed to weaker light, and so on. These factors all increase the difficulty of recognition.
  • the inventor recognizes that it is necessary to train a model that can ignore the scene information and focus on the target object in the image, so that during the recognition process, the model can be adapted to a variety of scenes.
  • the following describes an embodiment of the article recognition method of the present disclosure with reference to FIG. 1.
  • FIG. 1 is a schematic flowchart of an item identification method according to some embodiments of the present disclosure. As shown in FIG. 1, the article identification method of this embodiment includes steps S102 to S108.
  • step S102 the training image is input into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier.
  • the training image may have a scene classification label.
  • the scene classification label indicates the scene from which the training image was collected or the environmental features in the image.
  • the training image may or may not include the target object.
  • the target object refers to an object to be recognized, such as a commodity to be recognized, a person to be recognized, and so on.
  • the training image has a target object classification label, which indicates which commodity, which person, etc. the target object is.
  • the target object classifier is used for scoring according to the input image data or image features, and the scoring result indicates which kind of object the identified target object is, so it is a positive classifier.
  • an ordinary scene classifier is based on a similar principle; its scoring result indicates which scene the image belongs to.
  • the embodiments of the present disclosure employ a scene negative classifier.
  • the scene negative classifier is likewise used to score based on the input image data or image features, but its scoring result is determined based on the negative of the scoring result of an ordinary scene classifier, i.e., it is positively correlated with the negative of the ordinary scene classifier's score.
  • the scene negative classifier is implemented based on a scene positive classifier, that is, an ordinary scene classifier.
  • one of the layers in the scene positive classifier has a negative weight coefficient; that is, the output value of the scene negative classifier is equal to the output value of the scene positive classifier multiplied by the negative weight coefficient.
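  • As an illustration, the following is a minimal PyTorch sketch of such a scene negative classifier, built as described above from a feature mapping layer with a negative weight coefficient, a shallow neural network, and a scene classification layer. The coefficient value and all layer sizes are assumptions chosen for the example, not values specified by this disclosure.

```python
import torch
import torch.nn as nn

NEG_WEIGHT = -1.0  # assumed negative weight coefficient of the feature mapping layer

class SceneNegativeClassifier(nn.Module):
    """A scene positive classifier whose feature mapping layer carries a negative coefficient."""

    def __init__(self, feat_dim: int = 256, hidden_dim: int = 128, num_scenes: int = 10):
        super().__init__()
        self.shallow_net = nn.Sequential(      # shallow neural network
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
        )
        self.scene_cls = nn.Linear(hidden_dim, num_scenes)  # scene classification layer

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # Feature mapping layer: multiply the incoming image features by the
        # negative weight coefficient before the shallow network sees them.
        mapped = NEG_WEIGHT * image_features
        return self.scene_cls(self.shallow_net(mapped))
```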
  • step S104 the first output of the target object classifier and the second output of the scene negative classifier are acquired.
  • step S106 the total loss value is determined based on the first loss value determined based on the first output and the second loss value determined based on the second output.
  • the loss value is determined based on the difference between an output and its label value. Therefore, the more accurately the target object classifier classifies, the smaller the first loss value; and the more accurately the scene positive classifier classifies, the larger the second loss value. In this way, the neural network's attention to scene information can be minimized.
  • step S108 the weights of the nodes in the neural network model are adjusted according to the total loss value, so as to obtain a trained target object classification model.
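  • A hedged sketch of one training iteration covering steps S102 to S108 is shown below; it assumes cross-entropy losses, a simple sum of the two loss values, and a feature extraction network called `backbone`, none of which are mandated by this disclosure.

```python
import torch.nn.functional as F

def train_step(backbone, target_classifier, scene_neg_classifier, optimizer,
               images, object_labels, scene_labels):
    features = backbone(images)                    # extract image features
    first_out = target_classifier(features)        # S104: first output
    second_out = scene_neg_classifier(features)    # S104: second output

    # S106: total loss value from the first and second loss values
    # (taking their sum is an assumption for this sketch).
    first_loss = F.cross_entropy(first_out, object_labels)
    second_loss = F.cross_entropy(second_out, scene_labels)
    total_loss = first_loss + second_loss

    # S108: adjust the weights of the nodes according to the total loss value.
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```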
  • the target object classification model is used for item identification, for example, item identification in a sales scene. For example, when a user purchases a product from an unmanned vending device, a camera device can capture an image of the user picking up or putting back the product, and the target object classification model can then identify which product the user picked up or put back.
  • in this way, the influence of the image's scene on the recognition of the target object can be weakened as much as possible during model training, so that the trained model maintains high recognition accuracy in various existing scenes and even in new scenes, which improves the generalization ability of the target object classification model and reduces the training cost.
  • the neural network model also includes a feature extraction network.
  • an exemplary neural network model and the relationships between the modules in the model according to some embodiments of the present disclosure are shown in FIG. 2.
  • step S202 image features extracted and output from the training image by the feature extraction network are acquired.
  • step S204 the image features are input to the target object classifier to obtain the first output.
  • step S206 the image features are input to the scene negative classifier to obtain the second output.
  • the target object classifier and the scene negative classifier can perform further classification processing based on the image features extracted in advance, which improves the computational efficiency of the network.
  • the scene negative classifier can be realized by inverting the gradient of the image features.
  • the scene negative classifier is implemented based on the scene positive classifier.
  • the structure of the scene negative classifier is basically the same as that of the scene positive classifier, including a feature mapping layer, a shallow neural network, and a scene classification layer connected in sequence, as shown in FIG. 3.
  • the feature mapping layer of the scene negative classifier is implemented by adding a negative weight coefficient to the feature mapping layer of the scene positive classifier.
  • the feature mapping layer together with its negative weight coefficient can also be regarded as a gradient reversal layer.
  • in this way, the information input into the shallow neural network is the result of gradient inversion of the image features or image data; the shallow neural network then extracts "scene features" from the gradient-inverted information, where these "scene features" are actually the scene features of the original training image multiplied by the negative weight coefficient; finally, the scene classification layer outputs its result according to these "scene features".
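  • The gradient inversion described above is often written as an explicit gradient reversal layer: identity in the forward pass, gradient multiplied by a negative coefficient in the backward pass. The following PyTorch sketch is one common construction, offered as an assumption about how such a layer could be implemented rather than as the implementation of this disclosure.

```python
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, coeff: float) -> torch.Tensor:
        ctx.coeff = coeff
        return x.view_as(x)  # identity in the forward direction

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Invert (and scale) the gradient flowing back into the image features;
        # `coeff` itself receives no gradient, hence the trailing None.
        return -ctx.coeff * grad_output, None

# Usage (illustrative): scene_input = GradientReversal.apply(image_features, 1.0)
```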
  • the target object classifier may have only one layer, so that the feature extraction network can perform most of the processing in the target object classification process.
  • since the scene negative classifier is only used to assist training, only the target object classifier needs to be used for prediction after model training is completed.
  • the embodiment of the target object classification method of the present disclosure is described below with reference to FIG. 4.
  • the target object classification method of this embodiment includes steps S402 to S404.
  • step S402 the image to be tested is input into a target object classification model that has completed training.
  • step S404 the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested.
  • the image to be tested may be input into the feature extraction network of the target object classification model, and the feature extraction network inputs the extracted image features into the target object classifier.
  • the scene negative classifier may not be used in the prediction stage.
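  • A minimal prediction-stage sketch is given below; it reuses the illustrative `backbone` and `target_classifier` names from the training sketch above and simply drops the scene negative classifier.

```python
import torch

@torch.no_grad()
def recognize_item(backbone, target_classifier, image_to_test: torch.Tensor):
    features = backbone(image_to_test)      # feature extraction network
    scores = target_classifier(features)    # output of the target object classifier
    return scores.argmax(dim=1)             # item recognition result
```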
  • in some embodiments, some real images can be collected, virtual images can then be generated based on the collected real images, and the real images and virtual images can be used together in the training process of the target object classification network. The following describes an embodiment of the training image generation method of the present disclosure with reference to FIG. 5.
  • the training image generation method of this embodiment includes steps S502 to S504.
  • step S502 the collected real images are input into the generation network of the generative adversarial network to obtain output virtual images.
  • the generation network is a neural network used to generate virtual images.
  • in step S504, the virtual images are determined as training images.
  • the training images may also include real images.
  • a virtual image can be generated based on the collected real image, which reduces the cost of image acquisition and manual annotation, and improves the training efficiency.
  • before generating virtual images, the generative adversarial network can be trained to obtain a trained generation network.
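  • As an illustration of steps S502 to S504, a trained generation network can be applied to collected real images as sketched below; `generator` is assumed to be the generation network of a generative adversarial network that has already been trained.

```python
import torch

def make_training_images(generator, real_images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        virtual_images = generator(real_images)   # S502: output virtual images
    # S504: the virtual images become training images; the real images may be
    # kept in the training set as well.
    return torch.cat([real_images, virtual_images], dim=0)
```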
  • the following describes an embodiment of the generative adversarial network training method of the present disclosure with reference to FIG. 6.
  • FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure. As shown in FIG. 6, the generative adversarial network training method of this embodiment includes steps S602 to S608.
  • step S602 the image from the source scene and the image from the target scene are input into the generation network of the generative adversarial network to obtain the virtual image of the target scene generated by the generation network based on the image from the source scene.
  • the image from the source scene is an image collected from the source scene
  • the image from the target scene is an image collected from the target scene.
  • the source scene may be, for example, a laboratory scene
  • the target scene may be, for example, an actual application scene such as a park, street, or shopping mall.
  • step S604 the virtual image of the target scene and the image from the target scene are input into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene.
  • step S606 the loss value of the generative adversarial network is calculated.
  • step S608 according to the loss value of the generative adversarial network, the weights of the nodes of the generative adversarial network are adjusted to obtain the trained generative adversarial network.
  • the generation network is used to generate, based on the image from the source scene, a virtual image of the target scene that is as similar as possible to the images from the target scene.
  • the goal of the discrimination network is to determine whether an input image is a real image or a virtual image, by judging whether the virtual image of the target scene and the image from the target scene are similar.
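  • The following is a hedged sketch of one adversarial training iteration covering steps S602 to S608; the binary cross-entropy losses, the separate optimizers, and all names are assumptions made for illustration, since the disclosure only specifies the inputs, the similarity judgment, the loss value, and the weight adjustment.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, source_images, target_images):
    # S602: generate target-scene virtual images from source-scene images.
    virtual_target = G(source_images)

    # S604: the discrimination network judges virtual versus real target-scene images.
    d_real = D(target_images)
    d_fake = D(virtual_target.detach())

    # S606/S608: discrimination network loss and weight adjustment.
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # S606/S608: generation network loss and weight adjustment.
    d_judge = D(virtual_target)
    g_loss = F.binary_cross_entropy_with_logits(d_judge, torch.ones_like(d_judge))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```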
  • in this way, virtual images belonging to different scenes can be generated based on real images, so that a large number of training images from multiple scenes can be provided for the training process of the target object classification model, which improves the training effectiveness of the target object classification model.
  • multiple graphics cards may be used to collaboratively complete the training process of the generative adversarial network to improve training efficiency.
  • the following describes an embodiment of the generative adversarial network training method of the present disclosure with reference to FIG. 7.
  • FIG. 7 is a schematic flowchart of a training method of a generative adversarial network according to some other embodiments of the present disclosure. As shown in FIG. 7, the generative adversarial network training method of this embodiment includes steps S702 to S708.
  • step S702 multiple graphics cards synchronize the weights of the nodes of the generative adversarial network.
  • step S704 multiple pairs of images are input into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes of the generative adversarial network, where each graphics card receives one or more pairs of images as input, and the two images in each pair come from different scenes.
  • step S706 the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card are acquired.
  • step S708 the gradient values calculated by each graphics card are aggregated into memory, so that the memory determines the average of the gradient values calculated by each graphics card and then calculates the updated weights of the nodes of the generative adversarial network.
  • after step S708, the process may return to step S702, so that the multiple graphics cards synchronize the latest weights and the above steps are executed iteratively.
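  • A sketch of this multi-graphics-card scheme is shown below. It mirrors the described flow literally (per-card gradients gathered and averaged in host memory, with the averaged update written back to every card); a production system would more likely use collective operations such as all-reduce. The `model.loss(...)` method and the learning rate are assumptions for this sketch.

```python
import torch

def multi_gpu_step(model_replicas, image_pair_batches, lr=1e-4):
    # S702 is assumed already done: every replica starts from identical weights.
    for model, (src_img, tgt_img) in zip(model_replicas, image_pair_batches):
        loss = model.loss(src_img, tgt_img)   # S704: per-card GAN loss (assumed API)
        model.zero_grad()
        loss.backward()                       # S704: per-card gradient values

    # S706/S708: gather per-card gradients, average them in host memory,
    # and write the updated weights back to every card.
    for params in zip(*(m.parameters() for m in model_replicas)):
        avg_grad = torch.stack([p.grad.cpu() for p in params]).mean(dim=0)
        with torch.no_grad():
            for p in params:
                p -= lr * avg_grad.to(p.device)
```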
  • in this way, multiple pairs of images can be input into multiple graphics cards, and communication between the graphics cards can be used to synchronize the weights of the nodes of the generative adversarial network, so that multiple graphics cards perform the training process at the same time and the training efficiency is improved.
  • Some embodiments of the present disclosure may be applied to sales scenarios of unmanned sales devices.
  • the camera installed in the unmanned vending cabinet can collect an image when the user takes goods; the merchandise taken by the user can then be identified through the target object classification method of the present disclosure.
  • the embodiment of the unmanned vending sales method of the present disclosure is described below with reference to FIG. 8.
  • FIG. 8 is a schematic flowchart of an unmanned vending container sales method according to some embodiments of the present disclosure. As shown in FIG. 8, the sales method of this embodiment includes steps S802 to S806.
  • step S802 in response to the door of the vending cabinet being opened, an image to be tested is collected.
  • the image to be tested includes a picture of the user taking the product.
  • the collected image to be tested can be sent to the server side for further processing through a network, or transmitted to a processing module built into the vending cabinet through a network, short-range wireless communication, or a data transmission line.
  • step S804 the image to be tested is input into the target object classification model that has completed the training.
  • target detection may first be performed on the image to determine the location of the target object in the image, and the image at the location of the target object may then be input into the target object classification model.
  • step S806 the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested, and the commodity identifier is determined according to the item recognition result.
  • for example, the SKU (Stock Keeping Unit), which indicates the name, price, specifications, and other information of the items taken by the user, may be determined according to the recognition result, so as to price the items taken by the user; an end-to-end sketch of this flow is given below.
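  • An end-to-end sketch of steps S802 to S806 follows, reusing the illustrative `recognize_item` helper from the prediction sketch above; the `camera` object and the `sku_table` mapping from class indices to SKU records are assumptions, not part of this disclosure.

```python
def on_door_opened(camera, backbone, target_classifier, sku_table):
    image_to_test = camera.capture()                 # S802: collect the image to be tested
    batch = image_to_test.unsqueeze(0)               # add a batch dimension
    item_id = recognize_item(backbone, target_classifier, batch)  # S804/S806
    sku = sku_table[int(item_id)]                    # map the recognition result to a SKU
    return sku  # name, price, specifications, etc., used for settlement
```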
  • in this way, the items taken by the user can be accurately identified from the images collected by the cameras of unmanned vending devices deployed in various scenarios, thereby improving the vending efficiency of the vending machine and the accuracy of commodity settlement.
  • an unmanned vending container can also be used to collect training images. Therefore, the images collected during use can be used for training, which further improves the training efficiency.
  • the article recognition device 900 of this embodiment includes: a training image input module 9100 configured to input a training image into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module 9200 configured to acquire the first output of the target object classifier and the second output of the scene negative classifier; a total loss value calculation module 9300 configured to determine the total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module 9400 configured to adjust the weights of the nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to identify items.
  • the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
  • the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  • the neural network model further includes a feature extraction network; the output acquisition module 9200 is further configured to acquire image features extracted from the training image and output by the feature extraction network, input the image features into the target object classifier to obtain the first output, and input the image features into the scene negative classifier to obtain the second output.
  • the item recognition device 900 further includes: a virtual image generation module 9500 configured to input collected real images into the generation network of the generative adversarial network to obtain output virtual images, and to determine the virtual images as training images.
  • the item recognition apparatus 900 further includes a generative adversarial network training module 9600 configured to: input an image from the source scene and an image from the target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; input the virtual image of the target scene and the image from the target scene into the discrimination network to obtain a judgment result of the scene similarity between the two; calculate the loss value of the generative adversarial network; and adjust the weights of the nodes of the generative adversarial network according to the loss value to obtain a trained generative adversarial network.
  • the generative adversarial network training module 9600 may also be configured to: cause multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; input multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquire the gradient values calculated by each graphics card; and aggregate the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
  • the vending system 100 of this embodiment includes: a camera device 1010, located in a vending cabinet, configured to collect an image to be tested in response to the door of the vending cabinet being opened; a classification device 1020 configured to input the image to be tested into the trained target object classification model and to use the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested; and an item recognition device 1030.
  • for a specific implementation of the item identification device 1030, reference may be made to the item identification device 900 in the embodiment of FIG. 9.
  • the classification device 1020 and the item identification device 1030 may be located on the same device or on different devices. At least one of the classification device 1020 and the item identification device 1030 may be located, for example, on the server side, or may be located in the vending device.
  • the article identification device 110 of this embodiment includes: a memory 1110 and a processor 1120 coupled to the memory 1110.
  • the processor 1120 is configured to execute the item identification method of any of the foregoing embodiments based on instructions stored in the memory 1110.
  • the memory 1110 may include, for example, a system memory, a fixed non-volatile storage medium, and so on.
  • the system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), and other programs.
  • the article identification device 120 of this embodiment includes a memory 1210 and a processor 1220, and may further include an input/output interface 1230, a network interface 1240, a storage interface 1250, and the like.
  • the interfaces 1230, 1240, 1250 and the memory 1210 and the processor 1220 may be connected via a bus 1260, for example.
  • the input/output interface 1230 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 1240 provides a connection interface for various networked devices.
  • the storage interface 1250 provides a connection interface for external storage devices such as SD cards and USB flash drives.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, any one of the foregoing item identification methods is implemented.
  • the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the field of image processing. Disclosed are an article recognition method and device, a vending system, and a storage medium. The article recognition method comprises: inputting training images into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined on the basis of the first output and a second loss value determined on the basis of the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so as to enable the target object classification model to recognize articles. In this way, the trained model can maintain high recognition accuracy in various existing scenes and even in new scenes, thereby improving the generalization ability of the target object classification model and reducing the training cost.

Description

Article recognition method and device, vending system, and storage medium
This application is based on CN application No. 201811630337.5, filed on December 29, 2018, and claims its priority; the disclosure of that CN application is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of image processing, and in particular to an article recognition method and device, a vending system, and a storage medium.
Background
Visual recognition algorithms can be used to recognize object categories in images. Before recognition, the visual recognition model needs to be trained with training data so that the model achieves high accuracy. Visual recognition algorithms can be widely applied in various application scenarios.
Summary of the invention
According to a first aspect of some embodiments of the present disclosure, an article recognition method is provided, including: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to recognize articles.
In some embodiments, the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
In some embodiments, the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
In some embodiments, the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
In some embodiments, the article recognition method further includes: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
In some embodiments, the article recognition method further includes: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
In some embodiments, the method further includes: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
In some embodiments, the article recognition method further includes: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the article recognition result for the target object in the image to be tested.
In some embodiments, the article recognition method further includes: collecting the image to be tested in response to the door of a vending cabinet being opened.
According to a second aspect of some embodiments of the present disclosure, an article recognition device is provided, including: a training image input module configured to input a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module configured to acquire a first output of the target object classifier and a second output of the scene negative classifier; a total loss value calculation module configured to determine a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module configured to adjust the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, wherein the target object classification model is used to recognize articles.
According to a third aspect of some embodiments of the present disclosure, an article recognition device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, an article recognition method including the following operations: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to recognize articles.
In some embodiments, the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
In some embodiments, the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
In some embodiments, the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
In some embodiments, the operations further include: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
In some embodiments, the operations further include: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
In some embodiments, the operations further include: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
In some embodiments, the operations further include: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the article recognition result for the target object in the image to be tested.
According to a fourth aspect of some embodiments of the present disclosure, a vending system is provided, including: a camera device, located in a vending cabinet, configured to collect an image to be tested in response to the door of the vending cabinet being opened; a classification device configured to input the image to be tested into a trained target object classification model and to use the output of the target object classifier of the target object classification model as the article recognition result for the target object in the image to be tested; and any one of the foregoing article recognition devices.
According to a fifth aspect of some embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein, when the program is executed by a processor, any one of the foregoing article recognition methods is implemented.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief description of the drawings
In order to explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an article recognition method according to some embodiments of the present disclosure.
FIG. 2 shows an exemplary neural network model and the relationships between modules in the model according to some embodiments of the present disclosure.
FIG. 3 is a schematic structural diagram of an exemplary scene positive classifier according to some embodiments of the present disclosure.
FIG. 4 is a schematic flowchart of an article recognition method according to some embodiments of the present disclosure.
FIG. 5 is a schematic flowchart of a training image generation method according to some embodiments of the present disclosure.
FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure.
FIG. 7 is a schematic flowchart of a generative adversarial network training method according to some other embodiments of the present disclosure.
FIG. 8 is a schematic flowchart of a vending method for a vending cabinet according to some embodiments of the present disclosure.
FIG. 9 is a schematic structural diagram of an article recognition device according to some embodiments of the present disclosure.
FIG. 10 is a schematic structural diagram of a vending system according to some embodiments of the present disclosure.
FIG. 11 is a schematic structural diagram of an article recognition device according to other embodiments of the present disclosure.
FIG. 12 is a schematic structural diagram of an article recognition device according to yet other embodiments of the present disclosure.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
After analyzing the related art, the inventors found that when a new object needs to be recognized, the model has to be retrained on that object, and when an object needs to be recognized in a new scene, the recognition accuracy in the new scene cannot be guaranteed even if the model has already been trained on that object. The accuracy of item recognition is therefore currently low. Accordingly, one technical problem to be solved by the embodiments of the present disclosure is how to improve the accuracy of item recognition.
Upon further analysis, the inventors found that an image often contains not only the target object to be recognized but also environmental information such as background, lighting, and brightness. For example, a camera installed at an unattended vending cabinet collects monitoring images while a user takes items from the cabinet, so that the commodities the user has taken can be identified from the monitoring images. Besides commodities, the monitoring images also contain backgrounds: images collected in a park contain plants, lake water, and the like, while images collected in a commercial district contain streets, buildings, and the like. Moreover, in monitoring images collected at different locations or at different times, even the same commodity appears differently from image to image. For example, commodities in monitoring images collected inside an office building are lit by lights of a particular color, commodities in images collected outdoors during the day are lit by natural light, commodities in images collected outdoors in the evening are lit by relatively weak light, and so on. All of these factors make recognition more difficult.
The inventors therefore recognized the need to train a model that can ignore scene information and focus on recognizing the target object in an image, so that during recognition the model can adapt to a wide variety of scenes. An embodiment of the item recognition method of the present disclosure is described below with reference to FIG. 1.
FIG. 1 is a schematic flowchart of an item recognition method according to some embodiments of the present disclosure. As shown in FIG. 1, the item recognition method of this embodiment includes steps S102 to S108.
In step S102, a training image is input into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier.
In some embodiments, the training image may have a scene classification label. The scene classification label indicates the scene from which the training image was collected, or the environmental features in the image. A training image may or may not contain a target object. A target object is an object to be recognized, such as a commodity or a person to be recognized. When a training image contains a target object, the training image has a target object classification label indicating, for example, which commodity or which person the target object is.
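As a minimal illustration only (the disclosure does not fix a data format, so the field names and values below are hypothetical), a labeled training sample might be laid out like this:

```python
# Hypothetical training-sample layout; field names are illustrative only.
sample_with_object = {
    "image": "park_cam_0001.jpg",  # frame collected in a park scene
    "scene_label": 2,              # e.g. 0 = lab, 1 = office, 2 = park
    "object_label": 17,            # class index of the commodity shown
}
sample_background_only = {
    "image": "park_cam_0002.jpg",
    "scene_label": 2,
    "object_label": None,          # no target object in this frame
}
```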
The target object classifier scores the input image data or image features, and its score indicates which kind of object the recognized target object is; it is therefore a positive classifier. An ordinary scene classifier works on a similar principle, its score indicating which scene the image belongs to. The embodiments of the present disclosure, however, use a scene negative classifier. The scene negative classifier also scores the input image data or image features, but its score is determined from the opposite of the score of an ordinary scene classifier and is positively correlated with that opposite value.
In some embodiments, the scene negative classifier is implemented on the basis of a scene positive classifier, that is, an ordinary scene classifier, one layer of which has a negative weight coefficient. That is, the output value of the scene negative classifier equals the output value of the scene positive classifier multiplied by a negative weight coefficient.
In step S104, the first output of the target object classifier and the second output of the scene negative classifier are acquired.
In step S106, a total loss value is determined from a first loss value determined based on the first output and a second loss value determined based on the second output.
In some embodiments, a loss value is determined from the gap between an output and a labeled value. Consequently, the more accurate the target object classifier, the smaller the first loss value; and the more accurate the scene positive classifier, the larger the second loss value. The neural network's attention to scene information can thus be minimized.
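As a sketch of how steps S104 to S106 might combine the two losses, assuming cross-entropy losses for both heads and a simple sum (the disclosure does not prescribe the exact loss form):

```python
import torch.nn.functional as F

def total_loss(object_logits, object_labels, scene_logits, scene_labels):
    # First loss: the more accurate the target object classifier,
    # the smaller this term becomes.
    first_loss = F.cross_entropy(object_logits, object_labels)
    # Second loss: computed on the scene branch. Because that branch
    # contains the negatively weighted (gradient-reversed) layer,
    # minimizing the total loss pushes the shared features to carry
    # less and less scene information.
    second_loss = F.cross_entropy(scene_logits, scene_labels)
    return first_loss + second_loss
```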
In step S108, the weights of the nodes in the neural network model are adjusted according to the total loss value to obtain a trained target object classification model.
The target object classification model is used for item recognition, for example item recognition in vending scenarios. For instance, when a user buys a commodity from an unattended vending device, a camera device can capture images of the user taking or replacing the commodity, and the target object classification model can then identify which commodity the user took or replaced.
With the method of the above embodiment, the influence of the scene of an image on the item recognition result for the target object is weakened as much as possible during model training, so that the trained model achieves high recognition accuracy in all kinds of existing scenes and even in new scenes. This improves the generalization ability of the target object classification model and reduces training costs.
In some embodiments, the neural network model further includes a feature extraction network. FIG. 2 shows an exemplary neural network model according to some embodiments of the present disclosure and the relationships among the modules of the model.
In step S202, the image features extracted from the training image and output by the feature extraction network are acquired.
In step S204, the image features are input into the target object classifier to obtain the first output.
In step S206, the image features are input into the scene negative classifier to obtain the second output.
The target object classifier and the scene negative classifier can thus perform further classification on image features extracted in advance, which improves the computational efficiency of the network.
In some embodiments, the scene negative classifier can be realized by reversing the gradient of the image features. For example, the scene negative classifier is implemented on the basis of the scene positive classifier. Its structure is essentially the same as that of the scene positive classifier and includes a feature mapping layer, a shallow neural network, and a scene classification layer, connected in sequence as shown in FIG. 3. The feature mapping layer of the scene negative classifier is implemented by adding a negative weight coefficient to the feature mapping layer of the scene positive classifier; the product of the feature mapping layer and the weight coefficient can also be regarded as a gradient reversal layer.
After processing by the feature mapping layer of the scene negative classifier, the information input into the shallow neural network is the result of applying gradient reversal to the image features or image data. The shallow neural network then extracts "scene features" from this gradient-reversed information; these "scene features" are in fact the scene features of the original training image multiplied by the negative weight coefficient. Finally, the scene classification layer produces an output based on the "scene features".
In some embodiments, the target object classifier may have only one layer, so that the feature extraction network performs most of the processing in the target object classification pipeline.
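A minimal PyTorch sketch of the structure described above, assuming the gradient reversal is realized with a custom autograd function; layer sizes and the backbone are placeholders, since the disclosure does not specify the actual network shapes:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies the gradient by a negative
    # coefficient in the backward pass, acting as the negatively weighted
    # feature mapping layer described above.
    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.coeff * grad_output, None

class SceneNegativeClassifier(nn.Module):
    def __init__(self, feat_dim, num_scenes, coeff=1.0):
        super().__init__()
        self.coeff = coeff
        self.shallow = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())  # shallow neural network
        self.scene_layer = nn.Linear(128, num_scenes)                      # scene classification layer

    def forward(self, features):
        reversed_feat = GradReverse.apply(features, self.coeff)            # feature mapping layer
        return self.scene_layer(self.shallow(reversed_feat))

class ItemRecognitionModel(nn.Module):
    def __init__(self, backbone, feat_dim, num_items, num_scenes):
        super().__init__()
        self.backbone = backbone                           # feature extraction network
        self.object_head = nn.Linear(feat_dim, num_items)  # single-layer target object classifier
        self.scene_branch = SceneNegativeClassifier(feat_dim, num_scenes)

    def forward(self, images):
        feat = self.backbone(images)
        return self.object_head(feat), self.scene_branch(feat)
```

During training, both outputs feed the total loss sketched earlier; at prediction time only the object head needs to be consulted.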
Since the scene negative classifier is only used to assist training, only the target object classifier needs to be used for prediction once model training is complete. An embodiment of the target object classification method of the present disclosure is described below with reference to FIG. 4.
FIG. 4 is a schematic flowchart of a target object classification method according to some embodiments of the present disclosure. As shown in FIG. 4, the target object classification method of this embodiment includes steps S402 to S404.
In step S402, an image to be tested is input into the trained target object classification model.
In step S404, the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested. For example, the image to be tested can be input into the feature extraction network of the target object classification model, and the feature extraction network inputs the extracted image features into the target object classifier. The scene negative classifier need not be used in the prediction stage.
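For example, prediction could reuse the model sketched above, exercising only the feature extraction network and the target object classifier (a sketch; `model` is assumed to be a trained `ItemRecognitionModel`):

```python
import torch

@torch.no_grad()
def recognize(model, image_batch):
    model.eval()
    object_logits, _ = model(image_batch)  # scene branch output is discarded
    return object_logits.argmax(dim=1)     # predicted item class indices
```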
By adopting a target object classification model with stronger generalization ability, more accurate item recognition results can be obtained when classifying target objects in images collected from a variety of scenes.
In the training stage, a large number of images must be collected and then manually annotated for the model to predict well. In some embodiments of the present disclosure, to further improve training efficiency and save labor costs, some real images can be collected, virtual images can then be generated from the collected real images, and the real and virtual images can be used together in the training process of the target object classification network. An embodiment of the training image generation method of the present disclosure is described below with reference to FIG. 5.
FIG. 5 is a schematic flowchart of a training image generation method according to some embodiments of the present disclosure. As shown in FIG. 5, the training image generation method of this embodiment includes steps S502 to S504.
In step S502, a collected real image is input into the generation network of a generative adversarial network to obtain an output virtual image. The generation network is a neural network used to generate virtual images.
In step S504, the virtual image is determined as a training image. Of course, the training images may also include real images.
With the method of the above embodiment, virtual images can be generated from collected real images, which reduces the cost of image collection and manual annotation and improves training efficiency.
In some embodiments, a trained generation network can be obtained by training the generative adversarial network. An embodiment of the generative adversarial network training method of the present disclosure is described below with reference to FIG. 6.
FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure. As shown in FIG. 6, the generative adversarial network training method of this embodiment includes steps S602 to S608.
In step S602, an image from a source scene and an image from a target scene are input into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the source scene image.
The image from the source scene is an image collected in the source scene, and the image from the target scene is an image collected in the target scene. The source scene may, for example, be a laboratory scene, and the target scene may be an actual application scene such as a park, a street, or a shopping mall.
In step S604, the virtual image of the target scene and the image from the target scene are input into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment of the degree of scene similarity between the virtual image of the target scene and the image from the target scene.
In step S606, the loss value of the generative adversarial network is calculated.
In step S608, the weights of the nodes of the generative adversarial network are adjusted according to the loss value of the generative adversarial network to obtain a trained generative adversarial network.
The generation network generates, from an image from the source scene, a virtual image of the target scene that is as similar as possible to images from the target scene. The goal of the discrimination network is to judge whether the virtual image of the target scene is a real or a virtual image by recognizing whether it is similar to the images from the target scene. By playing against each other, the two networks can be continuously optimized until the discrimination network can no longer tell whether an image produced by the generation network is real.
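One way a single training iteration of steps S602 to S608 might look, assuming non-saturating binary cross-entropy losses (the disclosure does not fix the loss form) and assuming `gen` and `disc` are the generation and discrimination networks with their own optimizers:

```python
import torch
import torch.nn.functional as F

def gan_training_step(gen, disc, opt_g, opt_d, source_img, target_img):
    # S602: generate a virtual image of the target scene from a source image.
    fake_target = gen(source_img)

    # S604/S606: the discrimination network scores real target images high
    # and generated ones low; its loss rewards telling them apart.
    d_real = disc(target_img)
    d_fake = disc(fake_target.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # S608: the generation network is updated to fool the discrimination
    # network, i.e. to make its virtual images resemble target-scene images.
    loss_g = F.binary_cross_entropy_with_logits(
        disc(fake_target), torch.ones_like(d_fake))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```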
With the method of the above embodiment, virtual images belonging to different scenes can be generated from real images, so that a large number of training images from multiple scenes can be provided for the training process of the target object classification model, thereby improving its training efficiency.
In some embodiments, multiple graphics cards can be used cooperatively to carry out the training process of the generative adversarial network, so as to improve training efficiency. An embodiment of the generative adversarial network training method of the present disclosure is described below with reference to FIG. 7.
FIG. 7 is a schematic flowchart of a generative adversarial network training method according to other embodiments of the present disclosure. As shown in FIG. 7, the generative adversarial network training method of this embodiment includes steps S702 to S708.
In step S702, multiple graphics cards are caused to synchronize the weights of the nodes of the generative adversarial network.
In step S704, multiple pairs of images are input into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network. Each graphics card receives one or more pairs of images as input, and the two images of each pair come from different scenes.
For example, four pairs of images can be input to a graphics card with 24 GB of video memory, and four such graphics cards can train simultaneously. In this case, the calculations for 16 pairs of images can proceed at the same time.
In step S706, the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card are acquired.
In step S708, the gradient values calculated by each graphics card are aggregated into memory, so that the average of the gradient values calculated by each graphics card is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
In some embodiments, after the weights are updated, the process can return to step S702, so that the multiple graphics cards synchronize the latest weights and the above steps are executed iteratively.
With the method of the above embodiment, multiple pairs of images can be input into multiple graphics cards, and communication between the graphics cards is used to synchronize the weights of the nodes of the generative adversarial network, so that the graphics cards can carry out the training process simultaneously. Training efficiency is thereby improved.
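A host-side sketch of steps S702 to S708, assuming `replicas` are copies of the generative adversarial network on different graphics cards, `master` is the host-memory copy whose parameters the optimizer updates, and `compute_gan_loss` is a hypothetical helper returning the network's loss for a batch of image pairs:

```python
import torch

def parallel_step(master, replicas, optimizer, batches_per_card):
    # S702: synchronize the node weights across all graphics cards.
    state = master.state_dict()
    for replica in replicas:
        replica.load_state_dict(state)

    # S704/S706: each card computes its loss and the gradient values of
    # the node weights from its own pairs of images.
    per_card_grads = []
    for replica, batch in zip(replicas, batches_per_card):
        replica.zero_grad()
        replica.compute_gan_loss(batch).backward()  # assumed helper
        per_card_grads.append(
            [p.grad.detach().cpu() for p in replica.parameters()])

    # S708: aggregate the gradients in host memory, average them, and
    # compute the updated node weights on the master copy.
    for i, param in enumerate(master.parameters()):
        param.grad = torch.stack([g[i] for g in per_card_grads]).mean(dim=0)
    optimizer.step()
```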
Some embodiments of the present disclosure can be applied to the vending scenario of an unattended vending device. When a user opens the door of an unattended vending cabinet to pick up goods, a camera installed on the cabinet can capture images of the user taking the goods. The commodity in the user's hand can then be identified with the target object classification method of the present disclosure. An embodiment of the unattended vending cabinet vending method of the present disclosure is described below with reference to FIG. 8.
FIG. 8 is a schematic flowchart of an unattended vending cabinet vending method according to some embodiments of the present disclosure. As shown in FIG. 8, the vending method of this embodiment includes steps S802 to S806.
In step S802, an image to be tested is collected in response to the door of the vending cabinet being opened. The image to be tested contains a view of the user taking a commodity. The collected image can be sent over a network to the server side for further processing, or transmitted to a processing module built into the vending cabinet via a network, short-range wireless communication, or a data transmission line.
In step S804, the image to be tested is input into the trained target object classification model. In some embodiments, target object detection can first be performed on the image to be tested to determine the location of the target object in the image, and the image at that location is then input into the target object classification model.
In step S806, the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested, and the identifier of the commodity is determined from the item recognition result.
The SKU (Stock Keeping Unit), name, price, specification, and other information of the items taken by the user can thus be determined, so that the items taken by the user can be settled and an automatic vending process realized.
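As hypothetical glue code for steps S802 to S806 (every helper here, namely `detect`, `classify`, and `price_table`, is an assumption for illustration and not part of the disclosure):

```python
def settle_purchase(frame, detect, classify, price_table):
    # S804: optionally locate candidate objects first, then classify
    # each cropped region with the target object classification model.
    crops = detect(frame)                     # returns image crops of objects
    skus = [classify(crop) for crop in crops]
    # S806: map the recognized SKUs to prices and settle the purchase.
    return skus, sum(price_table[sku] for sku in skus)
```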
With the method of the above embodiment, the items taken by users can be accurately identified in images collected by the cameras of unattended vending devices deployed in all kinds of scenes, improving the vending efficiency of the vending machine and the accuracy of commodity settlement.
In some embodiments, the unattended vending cabinet can also be used to collect training images, so that images collected during use can be employed for training, further improving training efficiency.
An embodiment of the item recognition device of the present disclosure is described below with reference to FIG. 9.
FIG. 9 is a schematic structural diagram of an item recognition device according to some embodiments of the present disclosure. As shown in FIG. 9, the item recognition device 900 of this embodiment includes: a training image input module 9100 configured to input a training image into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module 9200 configured to acquire the first output of the target object classifier and the second output of the scene negative classifier; a total loss value calculation module 9300 configured to determine a total loss value from a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module 9400 configured to adjust the weights of the nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model is used to recognize items.
In some embodiments, the scene negative classifier is implemented by adding a negative weight coefficient to one layer of the scene positive classifier.
In some embodiments, the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
In some embodiments, the neural network model further includes a feature extraction network, and the output acquisition module 9200 is further configured to acquire the image features extracted from the training image and output by the feature extraction network, input the image features into the target object classifier to obtain the first output, and input the image features into the scene negative classifier to obtain the second output.
In some embodiments, the item recognition device 900 further includes a virtual image generation module 9500 configured to input a collected real image into the generation network of a generative adversarial network to obtain an output virtual image, and to determine the virtual image as a training image.
In some embodiments, the item recognition device 900 further includes a generative adversarial network training module 9600 configured to: input an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; input the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment of the degree of scene similarity between the virtual image of the target scene and the image from the target scene; calculate the loss value of the generative adversarial network; and adjust the weights of the nodes of the generative adversarial network according to the loss value to obtain a trained generative adversarial network.
In some embodiments, the generative adversarial network training module 9600 may also be configured to: cause multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; input multiple pairs of images into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network, where each graphics card receives one or more pairs of images as input and the two images of each pair come from different scenes; acquire the gradient values calculated by each graphics card; and aggregate the gradient values calculated by each graphics card into memory, so that the average of the gradient values is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
An embodiment of the vending system of the present disclosure is described below with reference to FIG. 10.
FIG. 10 is a schematic structural diagram of a vending system according to some embodiments of the present disclosure. As shown in FIG. 10, the vending system 100 of this embodiment includes: a camera device 1010 located at a vending cabinet and configured to collect an image to be tested in response to the door of the vending cabinet being opened; a classification device 1020 configured to input the image to be tested into the trained target object classification model and to use the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested; and an item recognition device 1030. For a specific implementation of the item recognition device 1030, reference may be made to the item recognition device 900 of the embodiment of FIG. 9.
The classification device 1020 and the item recognition device 1030 may be located on the same device or on different devices. At least one of the classification device 1020 and the item recognition device 1030 may, for example, be located on the server side or in the vending device.
FIG. 11 is a schematic structural diagram of an item recognition device according to other embodiments of the present disclosure. As shown in FIG. 11, the item recognition device 110 of this embodiment includes a memory 1110 and a processor 1120 coupled to the memory 1110, the processor 1120 being configured to execute the item recognition method of any one of the foregoing embodiments based on instructions stored in the memory 1110.
The memory 1110 may include, for example, system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
FIG. 12 is a schematic structural diagram of an item recognition device according to still other embodiments of the present disclosure. As shown in FIG. 12, the item recognition device 120 of this embodiment includes a memory 1210 and a processor 1220, and may further include an input/output interface 1230, a network interface 1240, a storage interface 1250, and the like. These interfaces 1230, 1240, and 1250, the memory 1210, and the processor 1220 may be connected, for example, by a bus 1260. The input/output interface 1230 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 1240 provides a connection interface for various networked devices. The storage interface 1250 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements any one of the foregoing item recognition methods.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (20)

  1. An item recognition method, comprising:
    inputting a training image into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier;
    acquiring a first output of the target object classifier and a second output of the scene negative classifier;
    determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and
    adjusting weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model is used to recognize items.
  2. The item recognition method according to claim 1, wherein the scene negative classifier is implemented by adding a negative weight coefficient to one layer of a scene positive classifier.
  3. The item recognition method according to claim 2, wherein the scene positive classifier comprises a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  4. The item recognition method according to claim 1, wherein the neural network model further comprises a feature extraction network;
    the acquiring the first output of the target object classifier and the second output of the scene negative classifier comprises:
    acquiring image features extracted from the training image and output by the feature extraction network;
    inputting the image features into the target object classifier to obtain the first output; and
    inputting the image features into the scene negative classifier to obtain the second output.
  5. The item recognition method according to any one of claims 1 to 4, further comprising:
    inputting a collected real image into a generation network of a generative adversarial network to obtain an output virtual image; and
    determining the virtual image as the training image.
  6. The item recognition method according to claim 5, further comprising:
    inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene;
    inputting the virtual image of the target scene and the image from the target scene into a discrimination network of the generative adversarial network to obtain a judgment result of the discrimination network on the degree of scene similarity between the virtual image of the target scene and the image from the target scene;
    calculating a loss value of the generative adversarial network; and
    adjusting weights of nodes of the generative adversarial network according to the loss value of the generative adversarial network to obtain a trained generative adversarial network.
  7. The item recognition method according to claim 5, further comprising:
    causing multiple graphics cards to synchronize weights of nodes of the generative adversarial network;
    inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network, wherein each graphics card receives one or more pairs of images as input, and the two images of each pair come from different scenes;
    acquiring the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card; and
    aggregating the gradient values calculated by each graphics card into memory, so that the average of the gradient values calculated by each graphics card is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
  8. The item recognition method according to claim 1, further comprising:
    inputting an image to be tested into the trained target object classification model; and
    using an output of the target object classifier of the target object classification model as an item recognition result for a target object in the image to be tested.
  9. The item recognition method according to claim 8, further comprising:
    collecting the image to be tested in response to a door of a vending cabinet being opened.
  10. An item recognition device, comprising:
    a training image input module configured to input a training image into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier;
    an output acquisition module configured to acquire a first output of the target object classifier and a second output of the scene negative classifier;
    a total loss value calculation module configured to determine a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and
    a weight adjustment module configured to adjust weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, wherein the target object classification model is used to recognize items.
  11. An item recognition device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, an item recognition method comprising the following operations:
    inputting a training image into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier;
    acquiring a first output of the target object classifier and a second output of the scene negative classifier;
    determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and
    adjusting weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model is used to recognize items.
  12. The item recognition device according to claim 11, wherein the scene negative classifier is implemented by adding a negative weight coefficient to one layer of a scene positive classifier.
  13. The item recognition device according to claim 12, wherein the scene positive classifier comprises a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  14. The item recognition device according to claim 11, wherein the neural network model further comprises a feature extraction network;
    the acquiring the first output of the target object classifier and the second output of the scene negative classifier comprises:
    acquiring image features extracted from the training image and output by the feature extraction network;
    inputting the image features into the target object classifier to obtain the first output; and
    inputting the image features into the scene negative classifier to obtain the second output.
  15. The item recognition device according to any one of claims 11 to 14, wherein the operations further comprise:
    inputting a collected real image into a generation network of a generative adversarial network to obtain an output virtual image; and
    determining the virtual image as the training image.
  16. The item recognition device according to claim 15, wherein the operations further comprise:
    inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene;
    inputting the virtual image of the target scene and the image from the target scene into a discrimination network of the generative adversarial network to obtain a judgment result of the discrimination network on the degree of scene similarity between the virtual image of the target scene and the image from the target scene;
    calculating a loss value of the generative adversarial network; and
    adjusting weights of nodes of the generative adversarial network according to the loss value of the generative adversarial network to obtain a trained generative adversarial network.
  17. The item recognition device according to claim 15, wherein the operations further comprise:
    causing multiple graphics cards to synchronize weights of nodes of the generative adversarial network;
    inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network, wherein each graphics card receives one or more pairs of images as input, and the two images of each pair come from different scenes;
    acquiring the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card; and
    aggregating the gradient values calculated by each graphics card into memory, so that the average of the gradient values calculated by each graphics card is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
  18. The item recognition device according to claim 11, wherein the operations further comprise:
    inputting an image to be tested into the trained target object classification model; and
    using an output of the target object classifier of the target object classification model as an item recognition result for a target object in the image to be tested.
  19. A vending system, comprising:
    a camera device located at a vending cabinet and configured to collect an image to be tested in response to a door of the vending cabinet being opened;
    a classification device configured to input the image to be tested into a trained target object classification model and to use an output of a target object classifier of the target object classification model as an item recognition result for a target object in the image to be tested; and
    the item recognition device according to any one of claims 11 to 18.
  20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the item recognition method according to any one of claims 1 to 9.
PCT/CN2019/099811 2018-12-29 2019-08-08 Article recognition method and device, vending system, and storage medium WO2020134102A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811630337.5 2018-12-29
CN201811630337.5A CN109754009B (en) 2018-12-29 2018-12-29 Article identification method, article identification device, vending system and storage medium

Publications (1)

Publication Number Publication Date
WO2020134102A1 true WO2020134102A1 (en) 2020-07-02

Family

ID=66404347

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099811 WO2020134102A1 (en) 2018-12-29 2019-08-08 Article recognition method and device, vending system, and storage medium

Country Status (2)

Country Link
CN (1) CN109754009B (en)
WO (1) WO2020134102A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754009B (en) * 2018-12-29 2021-07-13 北京沃东天骏信息技术有限公司 Article identification method, article identification device, vending system and storage medium
CN110490225B (en) * 2019-07-09 2022-06-28 北京迈格威科技有限公司 Scene-based image classification method, device, system and storage medium
CN111144417B (en) * 2019-12-27 2023-08-01 创新奇智(重庆)科技有限公司 Intelligent container small target detection method and detection system based on teacher and student network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8856123B1 (en) * 2007-07-20 2014-10-07 Hewlett-Packard Development Company, L.P. Document classification
CN102509119B (en) * 2011-09-30 2014-03-05 北京航空航天大学 Method for processing image scene hierarchy and object occlusion based on classifier
US20170039484A1 (en) * 2015-08-07 2017-02-09 Hewlett-Packard Development Company, L.P. Generating negative classifier data based on positive classifier data
CN106295678B (en) * 2016-07-27 2020-03-06 北京旷视科技有限公司 Neural network training and constructing method and device and target detection method and device
CN108495110B (en) * 2018-01-19 2020-03-17 天津大学 Virtual viewpoint image generation method based on generation type countermeasure network
CN108710847B (en) * 2018-05-15 2020-11-27 北京旷视科技有限公司 Scene recognition method and device and electronic equipment
CN108810413B (en) * 2018-06-15 2020-12-01 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170290095A1 (en) * 2016-03-30 2017-10-05 The Markov Corporation Electronic oven with infrared evaluative control
CN107729822A (en) * 2017-09-27 2018-02-23 北京小米移动软件有限公司 Object identifying method and device
CN107833209A (en) * 2017-10-27 2018-03-23 浙江大华技术股份有限公司 A kind of x-ray image detection method, device, electronic equipment and storage medium
CN108921040A (en) * 2018-06-08 2018-11-30 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
CN109754009A (en) * 2018-12-29 2019-05-14 北京沃东天骏信息技术有限公司 Item identification method, device, vending system and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036455A (en) * 2020-08-19 2020-12-04 浙江大华技术股份有限公司 Image identification method, intelligent terminal and storage medium
CN112036455B (en) * 2020-08-19 2023-09-01 浙江大华技术股份有限公司 Image identification method, intelligent terminal and storage medium
CN113052246A (en) * 2021-03-30 2021-06-29 北京百度网讯科技有限公司 Method and related device for training classification model and image classification
CN113052246B (en) * 2021-03-30 2023-08-04 北京百度网讯科技有限公司 Method and related apparatus for training classification model and image classification
CN113743459A (en) * 2021-07-29 2021-12-03 深圳云天励飞技术股份有限公司 Target detection method and device, electronic equipment and storage medium
CN113743459B (en) * 2021-07-29 2024-04-02 深圳云天励飞技术股份有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN114049518A (en) * 2021-11-10 2022-02-15 北京百度网讯科技有限公司 Image classification method and device, electronic equipment and storage medium
CN114372940A (en) * 2021-12-15 2022-04-19 南京邮电大学 Real scene image synthesis method and system
CN116129201A (en) * 2023-04-18 2023-05-16 新立讯科技股份有限公司 Commodity biological feature extraction and verification method

Also Published As

Publication number Publication date
CN109754009B (en) 2021-07-13
CN109754009A (en) 2019-05-14

Similar Documents

Publication Publication Date Title
WO2020134102A1 (en) Article recognition method and device, vending system, and storage medium
CN107895160A (en) Human face detection and tracing device and method
US20170161591A1 (en) System and method for deep-learning based object tracking
CN104599287B (en) Method for tracing object and device, object identifying method and device
CN109670546B (en) Commodity matching and quantity regression recognition algorithm based on preset template
CN112132213A (en) Sample image processing method and device, electronic equipment and storage medium
CN108345912A (en) Commodity rapid settlement system based on RGBD information and deep learning
CN108229456A (en) Method for tracking target and device, electronic equipment, computer storage media
CN105574848A (en) A method and an apparatus for automatic segmentation of an object
CN110298297A (en) Flame identification method and device
CN111222870B (en) Settlement method, device and system
CN108229375B (en) Method and device for detecting face image
US10318844B2 (en) Detection and presentation of differences between 3D models
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN109840503A (en) A kind of method and device of determining information
CN113516146A (en) Data classification method, computer and readable storage medium
Fei et al. Flow-pose Net: An effective two-stream network for fall detection
CN114255377A (en) Differential commodity detection and classification method for intelligent container
CN111428743B (en) Commodity identification method, commodity processing device and electronic equipment
Sharma Object detection and recognition using Amazon Rekognition with Boto3
Gündüz et al. A new YOLO-based method for social distancing from real-time videos
Chen et al. Self-supervised multi-category counting networks for automatic check-out
CN113160414B (en) Automatic goods allowance recognition method, device, electronic equipment and computer readable medium
CN113743382B (en) Shelf display detection method, device and system
CN115131826A (en) Article detection and identification method, and network model training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19901449
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 19901449
Country of ref document: EP
Kind code of ref document: A1