WO2020134102A1 - Article recognition method and device, vending system, and storage medium - Google Patents

Article recognition method and device, vending system, and storage medium Download PDF

Info

Publication number
WO2020134102A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
image
target object
classifier
network
Prior art date
Application number
PCT/CN2019/099811
Other languages
French (fr)
Chinese (zh)
Inventor
张屹峰
刘朋樟
刘巍
陈宇
周梦迪
Original Assignee
北京沃东天骏信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司
Publication of WO2020134102A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present disclosure relates to the field of image processing, and in particular, to an item identification method, device, vending system, and storage medium.
  • the visual recognition algorithm can be used to recognize object categories in an image. Before recognition, the visual recognition model needs to be trained with training data so that the model of the visual recognition algorithm achieves high accuracy.
  • the visual recognition algorithm can be widely used in various application scenarios.
  • an item recognition method including: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to identify items.
  • the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
  • the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  • the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
  • the item identification method further includes: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
  • the item recognition method further includes: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
  • the method further includes: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
  • the item identification method further includes: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested.
  • the item identification method further includes: in response to the door of the vending cabinet being opened, acquiring an image to be tested.
  • an item recognition device including: a training image input module configured to input a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module configured to acquire a first output of the target object classifier and a second output of the scene negative classifier; a total loss value calculation module configured to determine a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module configured to adjust the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, wherein the target object classification model is used to identify items.
  • an item identification device including: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, an item identification method including the following operations: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to identify items.
  • the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
  • the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  • the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
  • the operations further include: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
  • the operations further include: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
  • the operations further include: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
  • the operations further include: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested.
  • a vending system including: a camera device, located in a vending cabinet, configured to collect an image to be tested in response to a door of the vending cabinet being opened; a classification device configured to input the image to be tested into a trained target object classification model and to use the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested; and any one of the foregoing item recognition devices.
  • a computer-readable storage medium on which a computer program is stored, wherein when the program is executed by a processor, any of the foregoing item identification methods is implemented.
  • FIG. 1 is a schematic flowchart of an item identification method according to some embodiments of the present disclosure.
  • FIG. 2 shows an exemplary neural network model and the relationships between modules in the model according to some embodiments of the present disclosure.
  • FIG. 3 is a schematic structural diagram of an exemplary scene positive classifier according to some embodiments of the present disclosure.
  • FIG. 4 is a schematic flowchart of an item identification method according to some embodiments of the present disclosure.
  • FIG. 5 is a schematic flowchart of a training image generation method according to some embodiments of the present disclosure.
  • FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure.
  • FIG. 7 is a schematic flowchart of a training method of a generative adversarial network according to some other embodiments of the present disclosure.
  • FIG. 8 is a schematic flowchart of a vending method for a vending cabinet according to some embodiments of the present disclosure.
  • FIG. 9 is a schematic structural diagram of an article identification device according to some embodiments of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a vending system according to some embodiments of the present disclosure.
  • FIG. 11 is a schematic structural diagram of an article identification device according to other embodiments of the present disclosure.
  • FIG. 12 is a schematic structural diagram of an article identification device according to yet other embodiments of the present disclosure.
  • a technical problem to be solved by the embodiments of the present disclosure is: how to improve the accuracy of item identification.
  • an image often includes not only the target object to be recognized, but also environmental information such as background, lighting, and brightness.
  • a camera installed at an unmanned sales container collects monitoring images during the process of the user picking up items in the sales container, so as to identify which commodities the user has taken based on the monitoring images.
  • besides the commodities, the monitoring images also include backgrounds, such as the plants, lakes, and other backgrounds in monitoring images collected in parks, and the streets, buildings, and other backgrounds in monitoring images collected in commercial areas.
  • the commodities in the surveillance images collected in the office building are exposed to specific colors of light
  • the commodities in the surveillance images collected outdoors during the day are exposed to natural light
  • the commodities in the surveillance images collected outdoors in the evening are exposed to weaker light, and so on. These factors all increase the difficulty of recognition.
  • the inventor recognizes that it is necessary to train a model that can ignore the scene information and focus on the target object in the image, so that during the recognition process, the model can be adapted to a variety of scenes.
  • the following describes an embodiment of the article recognition method of the present disclosure with reference to FIG. 1.
  • FIG. 1 is a schematic flowchart of an item identification method according to some embodiments of the present disclosure. As shown in FIG. 1, the article identification method of this embodiment includes steps S102 to S108.
  • step S102 the training image is input into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier.
  • the training image may have a scene classification label.
  • the scene classification label indicates the scene from which the training image was collected or the environmental features in the image.
  • the training image may or may not include the target object.
  • the target object refers to an object to be recognized, such as a commodity to be recognized, a person to be recognized, and so on.
  • the training image has a target object classification label, which indicates which commodity, which person, etc. the target object is.
  • the target object classifier is used for scoring according to the input image data or image features, and the scoring result indicates which kind of object the identified target object is, so it is a positive classifier.
  • an ordinary scene classifier is based on a similar principle; its scoring result indicates which scene the image belongs to.
  • the embodiments of the present disclosure employ a scene negative classifier.
  • the scene negative classifier is likewise used to score based on the input image data or image features, but its scoring result is determined based on the negative of the scoring result of an ordinary scene classifier, i.e., it is positively correlated with the negative of the ordinary scene classifier's score.
  • the scene negative classifier is implemented based on a scene positive classifier, that is, an ordinary scene classifier.
  • one of the layers in the scene positive classifier has a negative weight coefficient; that is, the output value of the scene negative classifier is equal to the output value of the scene positive classifier multiplied by the negative weight coefficient.
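  • As an illustration, the following is a minimal PyTorch sketch of such a scene negative classifier, built as described above from a feature mapping layer with a negative weight coefficient, a shallow neural network, and a scene classification layer. The coefficient value and all layer sizes are assumptions chosen for the example, not values specified by this disclosure.

```python
import torch
import torch.nn as nn

NEG_WEIGHT = -1.0  # assumed negative weight coefficient of the feature mapping layer

class SceneNegativeClassifier(nn.Module):
    """A scene positive classifier whose feature mapping layer carries a negative coefficient."""

    def __init__(self, feat_dim: int = 256, hidden_dim: int = 128, num_scenes: int = 10):
        super().__init__()
        self.shallow_net = nn.Sequential(      # shallow neural network
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
        )
        self.scene_cls = nn.Linear(hidden_dim, num_scenes)  # scene classification layer

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # Feature mapping layer: multiply the incoming image features by the
        # negative weight coefficient before the shallow network sees them.
        mapped = NEG_WEIGHT * image_features
        return self.scene_cls(self.shallow_net(mapped))
```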
  • step S104 the first output of the target object classifier and the second output of the scene negative classifier are acquired.
  • step S106 the total loss value is determined based on the first loss value determined based on the first output and the second loss value determined based on the second output.
  • the loss value is determined based on the difference between an output and its label value. Therefore, the more accurately the target object classifier classifies, the smaller the first loss value; and the more accurately the scene positive classifier classifies, the larger the second loss value. In this way, the neural network's attention to scene information can be minimized.
  • step S108 the weights of the nodes in the neural network model are adjusted according to the total loss value, so as to obtain a trained target object classification model.
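  • A hedged sketch of one training iteration covering steps S102 to S108 is shown below; it assumes cross-entropy losses, a simple sum of the two loss values, and a feature extraction network called `backbone`, none of which are mandated by this disclosure.

```python
import torch.nn.functional as F

def train_step(backbone, target_classifier, scene_neg_classifier, optimizer,
               images, object_labels, scene_labels):
    features = backbone(images)                    # extract image features
    first_out = target_classifier(features)        # S104: first output
    second_out = scene_neg_classifier(features)    # S104: second output

    # S106: total loss value from the first and second loss values
    # (taking their sum is an assumption for this sketch).
    first_loss = F.cross_entropy(first_out, object_labels)
    second_loss = F.cross_entropy(second_out, scene_labels)
    total_loss = first_loss + second_loss

    # S108: adjust the weights of the nodes according to the total loss value.
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```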
  • the target object classification model is used for item identification, for example, item identification in a sales scene. For example, when a user purchases a product from an unmanned vending device, a camera device can capture an image of the user picking up or putting back the product, and the target object classification model can then identify which product the user picked up or put back.
  • in this way, the influence of the image's scene on the recognition of the target object can be weakened as much as possible during model training, so that the trained model maintains high recognition accuracy in various existing scenes and even in new scenes, which improves the generalization ability of the target object classification model and reduces the training cost.
  • the neural network model also includes a feature extraction network.
  • an exemplary neural network model and the relationships between the modules in the model according to some embodiments of the present disclosure are shown in FIG. 2.
  • step S202 image features extracted and output from the training image by the feature extraction network are acquired.
  • step S204 the image features are input to the target object classifier to obtain the first output.
  • step S206 the image features are input to the scene negative classifier to obtain the second output.
  • the target object classifier and the scene negative classifier can perform further classification processing based on the image features extracted in advance, which improves the computational efficiency of the network.
  • the scene negative classifier can be realized by inverting the gradient of the image features.
  • the scene negative classifier is implemented based on the scene positive classifier.
  • the structure of the scene negative classifier is basically the same as that of the scene positive classifier, including a feature mapping layer, a shallow neural network, and a scene classification layer connected in sequence, as shown in FIG. 3.
  • the feature mapping layer of the scene negative classifier is implemented by adding a negative weight coefficient to the feature mapping layer of the scene positive classifier.
  • the feature mapping layer together with its negative weight coefficient can also be regarded as a gradient reversal layer.
  • in this way, the information input into the shallow neural network is the result of gradient inversion of the image features or image data; the shallow neural network then extracts "scene features" from the gradient-inverted information, where these "scene features" are actually the scene features of the original training image multiplied by the negative weight coefficient; finally, the scene classification layer outputs its result according to these "scene features".
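  • The gradient inversion described above is often written as an explicit gradient reversal layer: identity in the forward pass, gradient multiplied by a negative coefficient in the backward pass. The following PyTorch sketch is one common construction, offered as an assumption about how such a layer could be implemented rather than as the implementation of this disclosure.

```python
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, coeff: float) -> torch.Tensor:
        ctx.coeff = coeff
        return x.view_as(x)  # identity in the forward direction

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Invert (and scale) the gradient flowing back into the image features;
        # `coeff` itself receives no gradient, hence the trailing None.
        return -ctx.coeff * grad_output, None

# Usage (illustrative): scene_input = GradientReversal.apply(image_features, 1.0)
```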
  • the target object classifier may have only one layer, so that the feature extraction network can perform most of the processing in the target object classification process.
  • since the scene negative classifier is only used to assist training, only the target object classifier needs to be used for prediction after model training is completed.
  • the embodiment of the target object classification method of the present disclosure is described below with reference to FIG. 4.
  • the target object classification method of this embodiment includes steps S402 to S404.
  • step S402 the image to be tested is input into a target object classification model that has completed training.
  • step S404 the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested.
  • the image to be tested may be input into the feature extraction network of the target object classification model, and the feature extraction network inputs the extracted image features into the target object classifier.
  • the scene negative classifier may not be used in the prediction stage.
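  • A minimal prediction-stage sketch is given below; it reuses the illustrative `backbone` and `target_classifier` names from the training sketch above and simply drops the scene negative classifier.

```python
import torch

@torch.no_grad()
def recognize_item(backbone, target_classifier, image_to_test: torch.Tensor):
    features = backbone(image_to_test)      # feature extraction network
    scores = target_classifier(features)    # output of the target object classifier
    return scores.argmax(dim=1)             # item recognition result
```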
  • in some embodiments, some real images can be collected, virtual images can then be generated based on the collected real images, and the real images and virtual images can be used together in the training process of the target object classification network. The following describes an embodiment of the training image generation method of the present disclosure with reference to FIG. 5.
  • the training image generation method of this embodiment includes steps S502 to S504.
  • step S502 the collected real images are input into the generation network of the generative adversarial network to obtain output virtual images.
  • the generation network is a neural network used to generate virtual images.
  • in step S504, the virtual images are determined as training images.
  • the training images may also include real images.
  • a virtual image can be generated based on the collected real image, which reduces the cost of image acquisition and manual annotation, and improves the training efficiency.
  • before generating virtual images, the generative adversarial network can be trained to obtain a trained generation network.
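  • As an illustration of steps S502 to S504, a trained generation network can be applied to collected real images as sketched below; `generator` is assumed to be the generation network of a generative adversarial network that has already been trained.

```python
import torch

def make_training_images(generator, real_images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        virtual_images = generator(real_images)   # S502: output virtual images
    # S504: the virtual images become training images; the real images may be
    # kept in the training set as well.
    return torch.cat([real_images, virtual_images], dim=0)
```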
  • the following describes an embodiment of the generative adversarial network training method of the present disclosure with reference to FIG. 6.
  • FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure. As shown in FIG. 6, the generative adversarial network training method of this embodiment includes steps S602 to S608.
  • step S602 the image from the source scene and the image from the target scene are input into the generation network of the generative adversarial network to obtain the virtual image of the target scene generated by the generation network based on the image from the source scene.
  • the image from the source scene is an image collected from the source scene
  • the image from the target scene is an image collected from the target scene.
  • the source scene may be, for example, a laboratory scene
  • the target scene may be, for example, an actual application scene such as a park, street, or shopping mall.
  • step S604 the virtual image of the target scene and the image from the target scene are input into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene.
  • step S606 the loss value of the generative adversarial network is calculated.
  • step S608 according to the loss value of the generative adversarial network, the weights of the nodes of the generative adversarial network are adjusted to obtain the trained generative adversarial network.
  • the generation network is used to generate, based on the image from the source scene, a virtual image of the target scene that is as similar as possible to the images from the target scene.
  • the goal of the discrimination network is to determine whether an input image is a real image or a virtual image, by judging whether the virtual image of the target scene and the image from the target scene are similar.
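  • The following is a hedged sketch of one adversarial training iteration covering steps S602 to S608; the binary cross-entropy losses, the separate optimizers, and all names are assumptions made for illustration, since the disclosure only specifies the inputs, the similarity judgment, the loss value, and the weight adjustment.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, source_images, target_images):
    # S602: generate target-scene virtual images from source-scene images.
    virtual_target = G(source_images)

    # S604: the discrimination network judges virtual versus real target-scene images.
    d_real = D(target_images)
    d_fake = D(virtual_target.detach())

    # S606/S608: discrimination network loss and weight adjustment.
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # S606/S608: generation network loss and weight adjustment.
    d_judge = D(virtual_target)
    g_loss = F.binary_cross_entropy_with_logits(d_judge, torch.ones_like(d_judge))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```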
  • in this way, virtual images belonging to different scenes can be generated based on real images, so that a large number of training images from multiple scenes can be provided for the training process of the target object classification model, which improves the training effectiveness of the target object classification model.
  • multiple graphics cards may be used to collaboratively complete the training process of the generative adversarial network to improve training efficiency.
  • the following describes an embodiment of the generative adversarial network training method of the present disclosure with reference to FIG. 7.
  • FIG. 7 is a schematic flowchart of a training method of a generative adversarial network according to some other embodiments of the present disclosure. As shown in FIG. 7, the generative adversarial network training method of this embodiment includes steps S702 to S708.
  • step S702 multiple graphics cards synchronize the weights of the nodes of the generative adversarial network.
  • step S704 multiple pairs of images are input into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes of the generative adversarial network, where each graphics card receives one or more pairs of images as input, and the two images in each pair come from different scenes.
  • step S706 the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card are acquired.
  • step S708 the gradient values calculated by each graphics card are aggregated into memory, so that the memory determines the average of the gradient values calculated by each graphics card and then calculates the updated weights of the nodes of the generative adversarial network.
  • after step S708, the process may return to step S702, so that the multiple graphics cards synchronize the latest weights and the above steps are executed iteratively.
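  • A sketch of this multi-graphics-card scheme is shown below. It mirrors the described flow literally (per-card gradients gathered and averaged in host memory, with the averaged update written back to every card); a production system would more likely use collective operations such as all-reduce. The `model.loss(...)` method and the learning rate are assumptions for this sketch.

```python
import torch

def multi_gpu_step(model_replicas, image_pair_batches, lr=1e-4):
    # S702 is assumed already done: every replica starts from identical weights.
    for model, (src_img, tgt_img) in zip(model_replicas, image_pair_batches):
        loss = model.loss(src_img, tgt_img)   # S704: per-card GAN loss (assumed API)
        model.zero_grad()
        loss.backward()                       # S704: per-card gradient values

    # S706/S708: gather per-card gradients, average them in host memory,
    # and write the updated weights back to every card.
    for params in zip(*(m.parameters() for m in model_replicas)):
        avg_grad = torch.stack([p.grad.cpu() for p in params]).mean(dim=0)
        with torch.no_grad():
            for p in params:
                p -= lr * avg_grad.to(p.device)
```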
  • in this way, multiple pairs of images can be input into multiple graphics cards, and communication between the graphics cards can be used to synchronize the weights of the nodes of the generative adversarial network, so that multiple graphics cards perform the training process at the same time and the training efficiency is improved.
  • Some embodiments of the present disclosure may be applied to sales scenarios of unmanned sales devices.
  • the camera installed in the unmanned vending cabinet can collect an image when the user takes goods; the merchandise taken by the user can then be identified through the target object classification method of the present disclosure.
  • the embodiment of the unmanned vending sales method of the present disclosure is described below with reference to FIG. 8.
  • FIG. 8 is a schematic flowchart of an unmanned vending container sales method according to some embodiments of the present disclosure. As shown in FIG. 8, the sales method of this embodiment includes steps S802 to S806.
  • step S802 in response to the door of the vending cabinet being opened, an image to be tested is collected.
  • the image to be tested includes a picture of the user taking the product.
  • the collected image to be tested can be sent to the server side for further processing through a network, or transmitted to a processing module built into the vending cabinet through a network, short-range wireless communication, or a data transmission line.
  • step S804 the image to be tested is input into the target object classification model that has completed the training.
  • target detection may first be performed on the image to determine the location of the target object in the image, and the image at the location of the target object may then be input into the target object classification model.
  • step S806 the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested, and the commodity identifier is determined according to the item recognition result.
  • for example, the SKU (Stock Keeping Unit), which indicates the name, price, specifications, and other information of the items taken by the user, may be determined according to the recognition result, so as to price the items taken by the user; an end-to-end sketch of this flow is given below.
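  • An end-to-end sketch of steps S802 to S806 follows, reusing the illustrative `recognize_item` helper from the prediction sketch above; the `camera` object and the `sku_table` mapping from class indices to SKU records are assumptions, not part of this disclosure.

```python
def on_door_opened(camera, backbone, target_classifier, sku_table):
    image_to_test = camera.capture()                 # S802: collect the image to be tested
    batch = image_to_test.unsqueeze(0)               # add a batch dimension
    item_id = recognize_item(backbone, target_classifier, batch)  # S804/S806
    sku = sku_table[int(item_id)]                    # map the recognition result to a SKU
    return sku  # name, price, specifications, etc., used for settlement
```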
  • in this way, the items taken by the user can be accurately identified from the images collected by the cameras of unmanned vending devices deployed in various scenarios, thereby improving the vending efficiency of the vending machine and the accuracy of commodity settlement.
  • an unmanned vending container can also be used to collect training images. Therefore, the images collected during use can be used for training, which further improves the training efficiency.
  • the article recognition device 900 of this embodiment includes: a training image input module 9100 configured to input a training image into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module 9200 configured to acquire the first output of the target object classifier and the second output of the scene negative classifier; a total loss value calculation module 9300 configured to determine the total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module 9400 configured to adjust the weights of the nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to identify items.
  • the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
  • the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  • the neural network model further includes a feature extraction network; the output acquisition module 9200 is further configured to acquire image features extracted from the training image and output by the feature extraction network, input the image features into the target object classifier to obtain the first output, and input the image features into the scene negative classifier to obtain the second output.
  • the item recognition device 900 further includes: a virtual image generation module 9500 configured to input collected real images into the generation network of the generative adversarial network to obtain output virtual images, and to determine the virtual images as training images.
  • the item recognition apparatus 900 further includes a generative adversarial network training module 9600 configured to: input an image from the source scene and an image from the target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; input the virtual image of the target scene and the image from the target scene into the discrimination network to obtain a judgment result of the scene similarity between the two; calculate the loss value of the generative adversarial network; and adjust the weights of the nodes of the generative adversarial network according to the loss value to obtain a trained generative adversarial network.
  • the generative adversarial network training module 9600 may also be configured to: cause multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; input multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquire the gradient values calculated by each graphics card; and aggregate the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
  • the vending system 100 of this embodiment includes: a camera device 1010, located in a vending cabinet, configured to collect an image to be tested in response to the door of the vending cabinet being opened; a classification device 1020 configured to input the image to be tested into the trained target object classification model and to use the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested; and an item recognition device 1030.
  • for a specific implementation of the item identification device 1030, reference may be made to the item identification device 900 in the embodiment of FIG. 9.
  • the classification device 1020 and the item identification device 1030 may be located on the same device or on different devices. At least one of the classification device 1020 and the item identification device 1030 may be located, for example, on the server side, or may be located in the vending device.
  • the article identification device 110 of this embodiment includes: a memory 1110 and a processor 1120 coupled to the memory 1110.
  • the processor 1120 is configured to execute the item identification method of any of the foregoing embodiments based on instructions stored in the memory 1110.
  • the memory 1110 may include, for example, a system memory, a fixed non-volatile storage medium, and so on.
  • the system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), and other programs.
  • the article identification device 120 of this embodiment includes a memory 1210 and a processor 1220, and may further include an input/output interface 1230, a network interface 1240, a storage interface 1250, and the like.
  • the interfaces 1230, 1240, 1250 and the memory 1210 and the processor 1220 may be connected via a bus 1260, for example.
  • the input/output interface 1230 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 1240 provides a connection interface for various networked devices.
  • the storage interface 1250 provides a connection interface for external storage devices such as SD cards and USB flash drives.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, any one of the foregoing item identification methods is implemented.
  • the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the field of image processing. Disclosed are an article recognition method and device, a vending system, and a storage medium. The article recognition method comprises: inputting training images into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined on the basis of the first output and a second loss value determined on the basis of the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so as to enable the target object classification model to recognize articles. In this way, the trained model can maintain high recognition accuracy in various existing scenes and even in new scenes, thereby improving the generalization ability of the target object classification model and reducing the training cost.

Description

Article recognition method and device, vending system, and storage medium
This application is based on CN application No. 201811630337.5, filed on December 29, 2018, and claims its priority; the disclosure of that CN application is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of image processing, and in particular to an article recognition method and device, a vending system, and a storage medium.
Background
Visual recognition algorithms can be used to recognize object categories in images. Before recognition, the visual recognition model needs to be trained with training data so that the model achieves high accuracy. Visual recognition algorithms can be widely applied in various application scenarios.
Summary of the invention
According to a first aspect of some embodiments of the present disclosure, an article recognition method is provided, including: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to recognize articles.
In some embodiments, the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
In some embodiments, the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
In some embodiments, the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
In some embodiments, the article recognition method further includes: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
In some embodiments, the article recognition method further includes: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
In some embodiments, the method further includes: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
In some embodiments, the article recognition method further includes: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the article recognition result for the target object in the image to be tested.
In some embodiments, the article recognition method further includes: collecting the image to be tested in response to the door of a vending cabinet being opened.
According to a second aspect of some embodiments of the present disclosure, an article recognition device is provided, including: a training image input module configured to input a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module configured to acquire a first output of the target object classifier and a second output of the scene negative classifier; a total loss value calculation module configured to determine a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module configured to adjust the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, wherein the target object classification model is used to recognize articles.
According to a third aspect of some embodiments of the present disclosure, an article recognition device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, an article recognition method including the following operations: inputting a training image into a neural network model, wherein the neural network model includes a target object classifier and a scene negative classifier; acquiring a first output of the target object classifier and a second output of the scene negative classifier; determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and adjusting the weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model can be used to recognize articles.
In some embodiments, the scene negative classifier is implemented by adding a negative weight coefficient to one of the layers of a scene positive classifier.
In some embodiments, the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
In some embodiments, the neural network model further includes a feature extraction network; acquiring the first output of the target object classifier and the second output of the scene negative classifier includes: acquiring image features extracted from the training image and output by the feature extraction network; inputting the image features into the target object classifier to obtain the first output; and inputting the image features into the scene negative classifier to obtain the second output.
In some embodiments, the operations further include: inputting collected real images into the generation network of a generative adversarial network to obtain output virtual images; and determining the virtual images as training images.
In some embodiments, the operations further include: inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; inputting the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment result of the scene similarity between the virtual image of the target scene and the image from the target scene; calculating the loss value of the generative adversarial network; and adjusting the weights of the nodes of the generative adversarial network according to the loss value, so as to obtain a trained generative adversarial network.
In some embodiments, the operations further include: causing multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates the loss value of the generative adversarial network from its input images and then calculates the gradient values of the weights of the nodes, where each graphics card receives one or more pairs of images as input and the two images in each pair come from different scenes; acquiring the gradient values of the weights of the nodes calculated by each graphics card; and aggregating the gradient values calculated by each graphics card into memory, so that the memory determines the average of the gradient values and then calculates the updated weights of the nodes of the generative adversarial network.
In some embodiments, the operations further include: inputting an image to be tested into the trained target object classification model; and using the output of the target object classifier of the target object classification model as the article recognition result for the target object in the image to be tested.
According to a fourth aspect of some embodiments of the present disclosure, a vending system is provided, including: a camera device, located in a vending cabinet, configured to collect an image to be tested in response to the door of the vending cabinet being opened; a classification device configured to input the image to be tested into a trained target object classification model and to use the output of the target object classifier of the target object classification model as the article recognition result for the target object in the image to be tested; and any one of the foregoing article recognition devices.
According to a fifth aspect of some embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein, when the program is executed by a processor, any one of the foregoing article recognition methods is implemented.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief description of the drawings
In order to explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an article recognition method according to some embodiments of the present disclosure.
FIG. 2 shows an exemplary neural network model and the relationships between modules in the model according to some embodiments of the present disclosure.
FIG. 3 is a schematic structural diagram of an exemplary scene positive classifier according to some embodiments of the present disclosure.
FIG. 4 is a schematic flowchart of an article recognition method according to some embodiments of the present disclosure.
FIG. 5 is a schematic flowchart of a training image generation method according to some embodiments of the present disclosure.
FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure.
FIG. 7 is a schematic flowchart of a generative adversarial network training method according to some other embodiments of the present disclosure.
FIG. 8 is a schematic flowchart of a vending method for a vending cabinet according to some embodiments of the present disclosure.
FIG. 9 is a schematic structural diagram of an article recognition device according to some embodiments of the present disclosure.
FIG. 10 is a schematic structural diagram of a vending system according to some embodiments of the present disclosure.
FIG. 11 is a schematic structural diagram of an article recognition device according to other embodiments of the present disclosure.
FIG. 12 is a schematic structural diagram of an article recognition device according to yet other embodiments of the present disclosure.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
After analyzing the related art, the inventors found that when a new object needs to be recognized, the model has to be retrained on that object, and when an object needs to be recognized in a new scene, the recognition accuracy in the new scene cannot be guaranteed even if the model has already been trained on that object. The accuracy of item recognition is therefore currently low. Accordingly, one technical problem to be solved by the embodiments of the present disclosure is how to improve the accuracy of item recognition.
Upon further analysis, the inventors found that an image often contains not only the target object to be recognized but also environmental information such as background, lighting, and brightness. For example, a camera installed at an unattended vending cabinet collects monitoring images while a user takes items from the cabinet, so that the commodities the user has taken can be identified from the monitoring images. Besides commodities, the monitoring images also contain backgrounds: images collected in a park contain plants, lake water, and the like, while images collected in a commercial district contain streets, buildings, and the like. Moreover, in monitoring images collected at different locations or at different times, even the same commodity appears differently from image to image. For example, commodities in monitoring images collected inside an office building are lit by lights of a particular color, commodities in images collected outdoors during the day are lit by natural light, commodities in images collected outdoors in the evening are lit by relatively weak light, and so on. All of these factors make recognition more difficult.
The inventors therefore recognized the need to train a model that can ignore scene information and focus on recognizing the target object in an image, so that during recognition the model can adapt to a wide variety of scenes. An embodiment of the item recognition method of the present disclosure is described below with reference to FIG. 1.
FIG. 1 is a schematic flowchart of an item recognition method according to some embodiments of the present disclosure. As shown in FIG. 1, the item recognition method of this embodiment includes steps S102 to S108.
In step S102, a training image is input into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier.
In some embodiments, the training image may have a scene classification label. The scene classification label indicates the scene from which the training image was collected, or the environmental features in the image. A training image may or may not contain a target object. A target object is an object to be recognized, such as a commodity or a person to be recognized. When a training image contains a target object, the training image has a target object classification label indicating, for example, which commodity or which person the target object is.
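As a minimal illustration only (the disclosure does not fix a data format, so the field names and values below are hypothetical), a labeled training sample might be laid out like this:

```python
# Hypothetical training-sample layout; field names are illustrative only.
sample_with_object = {
    "image": "park_cam_0001.jpg",  # frame collected in a park scene
    "scene_label": 2,              # e.g. 0 = lab, 1 = office, 2 = park
    "object_label": 17,            # class index of the commodity shown
}
sample_background_only = {
    "image": "park_cam_0002.jpg",
    "scene_label": 2,
    "object_label": None,          # no target object in this frame
}
```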
The target object classifier scores the input image data or image features, and its score indicates which kind of object the recognized target object is; it is therefore a positive classifier. An ordinary scene classifier works on a similar principle, its score indicating which scene the image belongs to. The embodiments of the present disclosure, however, use a scene negative classifier. The scene negative classifier also scores the input image data or image features, but its score is determined from the opposite of the score of an ordinary scene classifier and is positively correlated with that opposite value.
In some embodiments, the scene negative classifier is implemented on the basis of a scene positive classifier, that is, an ordinary scene classifier, one layer of which has a negative weight coefficient. That is, the output value of the scene negative classifier equals the output value of the scene positive classifier multiplied by a negative weight coefficient.
In step S104, the first output of the target object classifier and the second output of the scene negative classifier are acquired.
In step S106, a total loss value is determined from a first loss value determined based on the first output and a second loss value determined based on the second output.
In some embodiments, a loss value is determined from the gap between an output and a labeled value. Consequently, the more accurate the target object classifier, the smaller the first loss value; and the more accurate the scene positive classifier, the larger the second loss value. The neural network's attention to scene information can thus be minimized.
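As a sketch of how steps S104 to S106 might combine the two losses, assuming cross-entropy losses for both heads and a simple sum (the disclosure does not prescribe the exact loss form):

```python
import torch.nn.functional as F

def total_loss(object_logits, object_labels, scene_logits, scene_labels):
    # First loss: the more accurate the target object classifier,
    # the smaller this term becomes.
    first_loss = F.cross_entropy(object_logits, object_labels)
    # Second loss: computed on the scene branch. Because that branch
    # contains the negatively weighted (gradient-reversed) layer,
    # minimizing the total loss pushes the shared features to carry
    # less and less scene information.
    second_loss = F.cross_entropy(scene_logits, scene_labels)
    return first_loss + second_loss
```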
In step S108, the weights of the nodes in the neural network model are adjusted according to the total loss value to obtain a trained target object classification model.
The target object classification model is used for item recognition, for example item recognition in vending scenarios. For instance, when a user buys a commodity from an unattended vending device, a camera device can capture images of the user taking or replacing the commodity, and the target object classification model can then identify which commodity the user took or replaced.
With the method of the above embodiment, the influence of the scene of an image on the item recognition result for the target object is weakened as much as possible during model training, so that the trained model achieves high recognition accuracy in all kinds of existing scenes and even in new scenes. This improves the generalization ability of the target object classification model and reduces training costs.
In some embodiments, the neural network model further includes a feature extraction network. FIG. 2 shows an exemplary neural network model according to some embodiments of the present disclosure and the relationships among the modules of the model.
In step S202, the image features extracted from the training image and output by the feature extraction network are acquired.
In step S204, the image features are input into the target object classifier to obtain the first output.
In step S206, the image features are input into the scene negative classifier to obtain the second output.
The target object classifier and the scene negative classifier can thus perform further classification on image features extracted in advance, which improves the computational efficiency of the network.
In some embodiments, the scene negative classifier can be realized by reversing the gradient of the image features. For example, the scene negative classifier is implemented on the basis of the scene positive classifier. Its structure is essentially the same as that of the scene positive classifier and includes a feature mapping layer, a shallow neural network, and a scene classification layer, connected in sequence as shown in FIG. 3. The feature mapping layer of the scene negative classifier is implemented by adding a negative weight coefficient to the feature mapping layer of the scene positive classifier; the product of the feature mapping layer and the weight coefficient can also be regarded as a gradient reversal layer.
After processing by the feature mapping layer of the scene negative classifier, the information input into the shallow neural network is the result of applying gradient reversal to the image features or image data. The shallow neural network then extracts "scene features" from this gradient-reversed information; these "scene features" are in fact the scene features of the original training image multiplied by the negative weight coefficient. Finally, the scene classification layer produces an output based on the "scene features".
In some embodiments, the target object classifier may have only one layer, so that the feature extraction network performs most of the processing in the target object classification pipeline.
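A minimal PyTorch sketch of the structure described above, assuming the gradient reversal is realized with a custom autograd function; layer sizes and the backbone are placeholders, since the disclosure does not specify the actual network shapes:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies the gradient by a negative
    # coefficient in the backward pass, acting as the negatively weighted
    # feature mapping layer described above.
    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.coeff * grad_output, None

class SceneNegativeClassifier(nn.Module):
    def __init__(self, feat_dim, num_scenes, coeff=1.0):
        super().__init__()
        self.coeff = coeff
        self.shallow = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())  # shallow neural network
        self.scene_layer = nn.Linear(128, num_scenes)                      # scene classification layer

    def forward(self, features):
        reversed_feat = GradReverse.apply(features, self.coeff)            # feature mapping layer
        return self.scene_layer(self.shallow(reversed_feat))

class ItemRecognitionModel(nn.Module):
    def __init__(self, backbone, feat_dim, num_items, num_scenes):
        super().__init__()
        self.backbone = backbone                           # feature extraction network
        self.object_head = nn.Linear(feat_dim, num_items)  # single-layer target object classifier
        self.scene_branch = SceneNegativeClassifier(feat_dim, num_scenes)

    def forward(self, images):
        feat = self.backbone(images)
        return self.object_head(feat), self.scene_branch(feat)
```

During training, both outputs feed the total loss sketched earlier; at prediction time only the object head needs to be consulted.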
Since the scene negative classifier is only used to assist training, only the target object classifier needs to be used for prediction once model training is complete. An embodiment of the target object classification method of the present disclosure is described below with reference to FIG. 4.
FIG. 4 is a schematic flowchart of a target object classification method according to some embodiments of the present disclosure. As shown in FIG. 4, the target object classification method of this embodiment includes steps S402 to S404.
In step S402, an image to be tested is input into the trained target object classification model.
In step S404, the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested. For example, the image to be tested can be input into the feature extraction network of the target object classification model, and the feature extraction network inputs the extracted image features into the target object classifier. The scene negative classifier need not be used in the prediction stage.
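For example, prediction could reuse the model sketched above, exercising only the feature extraction network and the target object classifier (a sketch; `model` is assumed to be a trained `ItemRecognitionModel`):

```python
import torch

@torch.no_grad()
def recognize(model, image_batch):
    model.eval()
    object_logits, _ = model(image_batch)  # scene branch output is discarded
    return object_logits.argmax(dim=1)     # predicted item class indices
```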
By adopting a target object classification model with stronger generalization ability, more accurate item recognition results can be obtained when classifying target objects in images collected from a variety of scenes.
In the training stage, a large number of images must be collected and then manually annotated for the model to predict well. In some embodiments of the present disclosure, to further improve training efficiency and save labor costs, some real images can be collected, virtual images can then be generated from the collected real images, and the real and virtual images can be used together in the training process of the target object classification network. An embodiment of the training image generation method of the present disclosure is described below with reference to FIG. 5.
FIG. 5 is a schematic flowchart of a training image generation method according to some embodiments of the present disclosure. As shown in FIG. 5, the training image generation method of this embodiment includes steps S502 to S504.
In step S502, a collected real image is input into the generation network of a generative adversarial network to obtain an output virtual image. The generation network is a neural network used to generate virtual images.
In step S504, the virtual image is determined as a training image. Of course, the training images may also include real images.
With the method of the above embodiment, virtual images can be generated from collected real images, which reduces the cost of image collection and manual annotation and improves training efficiency.
In some embodiments, a trained generation network can be obtained by training the generative adversarial network. An embodiment of the generative adversarial network training method of the present disclosure is described below with reference to FIG. 6.
FIG. 6 is a schematic flowchart of a generative adversarial network training method according to some embodiments of the present disclosure. As shown in FIG. 6, the generative adversarial network training method of this embodiment includes steps S602 to S608.
In step S602, an image from a source scene and an image from a target scene are input into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the source scene image.
The image from the source scene is an image collected in the source scene, and the image from the target scene is an image collected in the target scene. The source scene may, for example, be a laboratory scene, and the target scene may be an actual application scene such as a park, a street, or a shopping mall.
In step S604, the virtual image of the target scene and the image from the target scene are input into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment of the degree of scene similarity between the virtual image of the target scene and the image from the target scene.
In step S606, the loss value of the generative adversarial network is calculated.
In step S608, the weights of the nodes of the generative adversarial network are adjusted according to the loss value of the generative adversarial network to obtain a trained generative adversarial network.
The generation network generates, from an image from the source scene, a virtual image of the target scene that is as similar as possible to images from the target scene. The goal of the discrimination network is to judge whether the virtual image of the target scene is a real or a virtual image by recognizing whether it is similar to the images from the target scene. By playing against each other, the two networks can be continuously optimized until the discrimination network can no longer tell whether an image produced by the generation network is real.
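One way a single training iteration of steps S602 to S608 might look, assuming non-saturating binary cross-entropy losses (the disclosure does not fix the loss form) and assuming `gen` and `disc` are the generation and discrimination networks with their own optimizers:

```python
import torch
import torch.nn.functional as F

def gan_training_step(gen, disc, opt_g, opt_d, source_img, target_img):
    # S602: generate a virtual image of the target scene from a source image.
    fake_target = gen(source_img)

    # S604/S606: the discrimination network scores real target images high
    # and generated ones low; its loss rewards telling them apart.
    d_real = disc(target_img)
    d_fake = disc(fake_target.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # S608: the generation network is updated to fool the discrimination
    # network, i.e. to make its virtual images resemble target-scene images.
    loss_g = F.binary_cross_entropy_with_logits(
        disc(fake_target), torch.ones_like(d_fake))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```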
With the method of the above embodiment, virtual images belonging to different scenes can be generated from real images, so that a large number of training images from multiple scenes can be provided for the training process of the target object classification model, thereby improving its training efficiency.
In some embodiments, multiple graphics cards can be used cooperatively to carry out the training process of the generative adversarial network, so as to improve training efficiency. An embodiment of the generative adversarial network training method of the present disclosure is described below with reference to FIG. 7.
FIG. 7 is a schematic flowchart of a generative adversarial network training method according to other embodiments of the present disclosure. As shown in FIG. 7, the generative adversarial network training method of this embodiment includes steps S702 to S708.
In step S702, multiple graphics cards are caused to synchronize the weights of the nodes of the generative adversarial network.
In step S704, multiple pairs of images are input into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network. Each graphics card receives one or more pairs of images as input, and the two images of each pair come from different scenes.
For example, four pairs of images can be input to a graphics card with 24 GB of video memory, and four such graphics cards can train simultaneously. In this case, the calculations for 16 pairs of images can proceed at the same time.
In step S706, the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card are acquired.
In step S708, the gradient values calculated by each graphics card are aggregated into memory, so that the average of the gradient values calculated by each graphics card is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
In some embodiments, after the weights are updated, the process can return to step S702, so that the multiple graphics cards synchronize the latest weights and the above steps are executed iteratively.
With the method of the above embodiment, multiple pairs of images can be input into multiple graphics cards, and communication between the graphics cards is used to synchronize the weights of the nodes of the generative adversarial network, so that the graphics cards can carry out the training process simultaneously. Training efficiency is thereby improved.
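A host-side sketch of steps S702 to S708, assuming `replicas` are copies of the generative adversarial network on different graphics cards, `master` is the host-memory copy whose parameters the optimizer updates, and `compute_gan_loss` is a hypothetical helper returning the network's loss for a batch of image pairs:

```python
import torch

def parallel_step(master, replicas, optimizer, batches_per_card):
    # S702: synchronize the node weights across all graphics cards.
    state = master.state_dict()
    for replica in replicas:
        replica.load_state_dict(state)

    # S704/S706: each card computes its loss and the gradient values of
    # the node weights from its own pairs of images.
    per_card_grads = []
    for replica, batch in zip(replicas, batches_per_card):
        replica.zero_grad()
        replica.compute_gan_loss(batch).backward()  # assumed helper
        per_card_grads.append(
            [p.grad.detach().cpu() for p in replica.parameters()])

    # S708: aggregate the gradients in host memory, average them, and
    # compute the updated node weights on the master copy.
    for i, param in enumerate(master.parameters()):
        param.grad = torch.stack([g[i] for g in per_card_grads]).mean(dim=0)
    optimizer.step()
```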
Some embodiments of the present disclosure can be applied to the vending scenario of an unattended vending device. When a user opens the door of an unattended vending cabinet to pick up goods, a camera installed on the cabinet can capture images of the user taking the goods. The commodity in the user's hand can then be identified with the target object classification method of the present disclosure. An embodiment of the unattended vending cabinet vending method of the present disclosure is described below with reference to FIG. 8.
FIG. 8 is a schematic flowchart of an unattended vending cabinet vending method according to some embodiments of the present disclosure. As shown in FIG. 8, the vending method of this embodiment includes steps S802 to S806.
In step S802, an image to be tested is collected in response to the door of the vending cabinet being opened. The image to be tested contains a view of the user taking a commodity. The collected image can be sent over a network to the server side for further processing, or transmitted to a processing module built into the vending cabinet via a network, short-range wireless communication, or a data transmission line.
In step S804, the image to be tested is input into the trained target object classification model. In some embodiments, target object detection can first be performed on the image to be tested to determine the location of the target object in the image, and the image at that location is then input into the target object classification model.
In step S806, the output of the target object classifier of the target object classification model is used as the item recognition result for the target object in the image to be tested, and the identifier of the commodity is determined from the item recognition result.
The SKU (Stock Keeping Unit), name, price, specification, and other information of the items taken by the user can thus be determined, so that the items taken by the user can be settled and an automatic vending process realized.
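As hypothetical glue code for steps S802 to S806 (every helper here, namely `detect`, `classify`, and `price_table`, is an assumption for illustration and not part of the disclosure):

```python
def settle_purchase(frame, detect, classify, price_table):
    # S804: optionally locate candidate objects first, then classify
    # each cropped region with the target object classification model.
    crops = detect(frame)                     # returns image crops of objects
    skus = [classify(crop) for crop in crops]
    # S806: map the recognized SKUs to prices and settle the purchase.
    return skus, sum(price_table[sku] for sku in skus)
```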
With the method of the above embodiment, the items taken by users can be accurately identified in images collected by the cameras of unattended vending devices deployed in all kinds of scenes, improving the vending efficiency of the vending machine and the accuracy of commodity settlement.
In some embodiments, the unattended vending cabinet can also be used to collect training images, so that images collected during use can be employed for training, further improving training efficiency.
An embodiment of the item recognition device of the present disclosure is described below with reference to FIG. 9.
FIG. 9 is a schematic structural diagram of an item recognition device according to some embodiments of the present disclosure. As shown in FIG. 9, the item recognition device 900 of this embodiment includes: a training image input module 9100 configured to input a training image into a neural network model, where the neural network model includes a target object classifier and a scene negative classifier; an output acquisition module 9200 configured to acquire the first output of the target object classifier and the second output of the scene negative classifier; a total loss value calculation module 9300 configured to determine a total loss value from a first loss value determined based on the first output and a second loss value determined based on the second output; and a weight adjustment module 9400 configured to adjust the weights of the nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model is used to recognize items.
In some embodiments, the scene negative classifier is implemented by adding a negative weight coefficient to one layer of the scene positive classifier.
In some embodiments, the scene positive classifier includes a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
In some embodiments, the neural network model further includes a feature extraction network, and the output acquisition module 9200 is further configured to acquire the image features extracted from the training image and output by the feature extraction network, input the image features into the target object classifier to obtain the first output, and input the image features into the scene negative classifier to obtain the second output.
In some embodiments, the item recognition device 900 further includes a virtual image generation module 9500 configured to input a collected real image into the generation network of a generative adversarial network to obtain an output virtual image, and to determine the virtual image as a training image.
In some embodiments, the item recognition device 900 further includes a generative adversarial network training module 9600 configured to: input an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene; input the virtual image of the target scene and the image from the target scene into the discrimination network of the generative adversarial network to obtain the discrimination network's judgment of the degree of scene similarity between the virtual image of the target scene and the image from the target scene; calculate the loss value of the generative adversarial network; and adjust the weights of the nodes of the generative adversarial network according to the loss value to obtain a trained generative adversarial network.
In some embodiments, the generative adversarial network training module 9600 may also be configured to: cause multiple graphics cards to synchronize the weights of the nodes of the generative adversarial network; input multiple pairs of images into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network, where each graphics card receives one or more pairs of images as input and the two images of each pair come from different scenes; acquire the gradient values calculated by each graphics card; and aggregate the gradient values calculated by each graphics card into memory, so that the average of the gradient values is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
An embodiment of the vending system of the present disclosure is described below with reference to FIG. 10.
FIG. 10 is a schematic structural diagram of a vending system according to some embodiments of the present disclosure. As shown in FIG. 10, the vending system 100 of this embodiment includes: a camera device 1010 located at a vending cabinet and configured to collect an image to be tested in response to the door of the vending cabinet being opened; a classification device 1020 configured to input the image to be tested into the trained target object classification model and to use the output of the target object classifier of the target object classification model as the item recognition result for the target object in the image to be tested; and an item recognition device 1030. For a specific implementation of the item recognition device 1030, reference may be made to the item recognition device 900 of the embodiment of FIG. 9.
The classification device 1020 and the item recognition device 1030 may be located on the same device or on different devices. At least one of the classification device 1020 and the item recognition device 1030 may, for example, be located on the server side or in the vending device.
FIG. 11 is a schematic structural diagram of an item recognition device according to other embodiments of the present disclosure. As shown in FIG. 11, the item recognition device 110 of this embodiment includes a memory 1110 and a processor 1120 coupled to the memory 1110, the processor 1120 being configured to execute the item recognition method of any one of the foregoing embodiments based on instructions stored in the memory 1110.
The memory 1110 may include, for example, system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
FIG. 12 is a schematic structural diagram of an item recognition device according to still other embodiments of the present disclosure. As shown in FIG. 12, the item recognition device 120 of this embodiment includes a memory 1210 and a processor 1220, and may further include an input/output interface 1230, a network interface 1240, a storage interface 1250, and the like. These interfaces 1230, 1240, and 1250, the memory 1210, and the processor 1220 may be connected, for example, by a bus 1260. The input/output interface 1230 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 1240 provides a connection interface for various networked devices. The storage interface 1250 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements any one of the foregoing item recognition methods.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The above are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (20)

  1. An item recognition method, comprising:
    inputting a training image into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier;
    acquiring a first output of the target object classifier and a second output of the scene negative classifier;
    determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and
    adjusting weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model is used to recognize items.
  2. The item recognition method according to claim 1, wherein the scene negative classifier is implemented by adding a negative weight coefficient to one layer of a scene positive classifier.
  3. The item recognition method according to claim 2, wherein the scene positive classifier comprises a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  4. The item recognition method according to claim 1, wherein the neural network model further comprises a feature extraction network;
    the acquiring the first output of the target object classifier and the second output of the scene negative classifier comprises:
    acquiring image features extracted from the training image and output by the feature extraction network;
    inputting the image features into the target object classifier to obtain the first output; and
    inputting the image features into the scene negative classifier to obtain the second output.
  5. The item recognition method according to any one of claims 1 to 4, further comprising:
    inputting a collected real image into a generation network of a generative adversarial network to obtain an output virtual image; and
    determining the virtual image as the training image.
  6. The item recognition method according to claim 5, further comprising:
    inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene;
    inputting the virtual image of the target scene and the image from the target scene into a discrimination network of the generative adversarial network to obtain a judgment result of the discrimination network on the degree of scene similarity between the virtual image of the target scene and the image from the target scene;
    calculating a loss value of the generative adversarial network; and
    adjusting weights of nodes of the generative adversarial network according to the loss value of the generative adversarial network to obtain a trained generative adversarial network.
  7. The item recognition method according to claim 5, further comprising:
    causing multiple graphics cards to synchronize weights of nodes of the generative adversarial network;
    inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network, wherein each graphics card receives one or more pairs of images as input, and the two images of each pair come from different scenes;
    acquiring the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card; and
    aggregating the gradient values calculated by each graphics card into memory, so that the average of the gradient values calculated by each graphics card is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
  8. The item recognition method according to claim 1, further comprising:
    inputting an image to be tested into the trained target object classification model; and
    using an output of the target object classifier of the target object classification model as an item recognition result for a target object in the image to be tested.
  9. The item recognition method according to claim 8, further comprising:
    collecting the image to be tested in response to a door of a vending cabinet being opened.
  10. An item recognition device, comprising:
    a training image input module configured to input a training image into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier;
    an output acquisition module configured to acquire a first output of the target object classifier and a second output of the scene negative classifier;
    a total loss value calculation module configured to determine a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and
    a weight adjustment module configured to adjust weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, wherein the target object classification model is used to recognize items.
  11. An item recognition device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, an item recognition method comprising the following operations:
    inputting a training image into a neural network model, wherein the neural network model comprises a target object classifier and a scene negative classifier;
    acquiring a first output of the target object classifier and a second output of the scene negative classifier;
    determining a total loss value according to a first loss value determined based on the first output and a second loss value determined based on the second output; and
    adjusting weights of nodes in the neural network model according to the total loss value to obtain a trained target object classification model, so that the target object classification model is used to recognize items.
  12. The item recognition device according to claim 11, wherein the scene negative classifier is implemented by adding a negative weight coefficient to one layer of a scene positive classifier.
  13. The item recognition device according to claim 12, wherein the scene positive classifier comprises a feature mapping layer, a shallow neural network, and a scene classification layer; the feature mapping layer, the shallow neural network, and the scene classification layer are connected in sequence, and the feature mapping layer has a negative weight coefficient.
  14. The item recognition device according to claim 11, wherein the neural network model further comprises a feature extraction network;
    the acquiring the first output of the target object classifier and the second output of the scene negative classifier comprises:
    acquiring image features extracted from the training image and output by the feature extraction network;
    inputting the image features into the target object classifier to obtain the first output; and
    inputting the image features into the scene negative classifier to obtain the second output.
  15. The item recognition device according to any one of claims 11 to 14, wherein the operations further comprise:
    inputting a collected real image into a generation network of a generative adversarial network to obtain an output virtual image; and
    determining the virtual image as the training image.
  16. The item recognition device according to claim 15, wherein the operations further comprise:
    inputting an image from a source scene and an image from a target scene into the generation network of the generative adversarial network to obtain a virtual image of the target scene generated by the generation network based on the image from the source scene;
    inputting the virtual image of the target scene and the image from the target scene into a discrimination network of the generative adversarial network to obtain a judgment result of the discrimination network on the degree of scene similarity between the virtual image of the target scene and the image from the target scene;
    calculating a loss value of the generative adversarial network; and
    adjusting weights of nodes of the generative adversarial network according to the loss value of the generative adversarial network to obtain a trained generative adversarial network.
  17. The item recognition device according to claim 15, wherein the operations further comprise:
    causing multiple graphics cards to synchronize weights of nodes of the generative adversarial network;
    inputting multiple pairs of images into the multiple graphics cards, so that each graphics card calculates a loss value of the generative adversarial network from its input images and then calculates gradient values of the weights of the nodes of the generative adversarial network, wherein each graphics card receives one or more pairs of images as input, and the two images of each pair come from different scenes;
    acquiring the gradient values of the weights of the nodes of the generative adversarial network calculated by each graphics card; and
    aggregating the gradient values calculated by each graphics card into memory, so that the average of the gradient values calculated by each graphics card is determined in memory and the updated weights of the nodes of the generative adversarial network are then calculated.
  18. The item recognition device according to claim 11, wherein the operations further comprise:
    inputting an image to be tested into the trained target object classification model; and
    using an output of the target object classifier of the target object classification model as an item recognition result for a target object in the image to be tested.
  19. A vending system, comprising:
    a camera device located at a vending cabinet and configured to collect an image to be tested in response to a door of the vending cabinet being opened;
    a classification device configured to input the image to be tested into a trained target object classification model and to use an output of a target object classifier of the target object classification model as an item recognition result for a target object in the image to be tested; and
    the item recognition device according to any one of claims 11 to 18.
  20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the item recognition method according to any one of claims 1 to 9.
PCT/CN2019/099811 2018-12-29 2019-08-08 Article recognition method and device, vending system, and storage medium WO2020134102A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811630337.5 2018-12-29
CN201811630337.5A CN109754009B (en) 2018-12-29 2018-12-29 Article identification method, article identification device, vending system and storage medium

Publications (1)

Publication Number Publication Date
WO2020134102A1 true WO2020134102A1 (en) 2020-07-02

Family

ID=66404347

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099811 WO2020134102A1 (en) 2018-12-29 2019-08-08 Article recognition method and device, vending system, and storage medium

Country Status (2)

Country Link
CN (1) CN109754009B (en)
WO (1) WO2020134102A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754009B (en) * 2018-12-29 2021-07-13 北京沃东天骏信息技术有限公司 Article identification method, article identification device, vending system and storage medium
CN110490225B (en) * 2019-07-09 2022-06-28 北京迈格威科技有限公司 Scene-based image classification method, device, system and storage medium
CN111144417B (en) * 2019-12-27 2023-08-01 创新奇智(重庆)科技有限公司 Intelligent container small target detection method and detection system based on teacher and student network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8856123B1 (en) * 2007-07-20 2014-10-07 Hewlett-Packard Development Company, L.P. Document classification
CN102509119B (en) * 2011-09-30 2014-03-05 北京航空航天大学 Method for processing image scene hierarchy and object occlusion based on classifier
US20170039484A1 (en) * 2015-08-07 2017-02-09 Hewlett-Packard Development Company, L.P. Generating negative classifier data based on positive classifier data
CN106295678B (en) * 2016-07-27 2020-03-06 北京旷视科技有限公司 Neural network training and constructing method and device and target detection method and device
CN108495110B (en) * 2018-01-19 2020-03-17 天津大学 Virtual viewpoint image generation method based on generation type countermeasure network
CN108710847B (en) * 2018-05-15 2020-11-27 北京旷视科技有限公司 Scene recognition method and device and electronic equipment
CN108810413B (en) * 2018-06-15 2020-12-01 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170290095A1 (en) * 2016-03-30 2017-10-05 The Markov Corporation Electronic oven with infrared evaluative control
CN107729822A (en) * 2017-09-27 2018-02-23 北京小米移动软件有限公司 Object identifying method and device
CN107833209A (en) * 2017-10-27 2018-03-23 浙江大华技术股份有限公司 A kind of x-ray image detection method, device, electronic equipment and storage medium
CN108921040A (en) * 2018-06-08 2018-11-30 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
CN109754009A (en) * 2018-12-29 2019-05-14 北京沃东天骏信息技术有限公司 Item identification method, device, vending system and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036455A (en) * 2020-08-19 2020-12-04 浙江大华技术股份有限公司 Image identification method, intelligent terminal and storage medium
CN112036455B (en) * 2020-08-19 2023-09-01 浙江大华技术股份有限公司 Image identification method, intelligent terminal and storage medium
CN113052246A (en) * 2021-03-30 2021-06-29 北京百度网讯科技有限公司 Method and related device for training classification model and image classification
CN113052246B (en) * 2021-03-30 2023-08-04 北京百度网讯科技有限公司 Method and related apparatus for training classification model and image classification
CN113743459A (en) * 2021-07-29 2021-12-03 深圳云天励飞技术股份有限公司 Target detection method and device, electronic equipment and storage medium
CN113743459B (en) * 2021-07-29 2024-04-02 深圳云天励飞技术股份有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN114049518A (en) * 2021-11-10 2022-02-15 北京百度网讯科技有限公司 Image classification method and device, electronic equipment and storage medium
CN114372940A (en) * 2021-12-15 2022-04-19 南京邮电大学 Real scene image synthesis method and system
CN116129201A (en) * 2023-04-18 2023-05-16 新立讯科技股份有限公司 Commodity biological feature extraction and verification method

Also Published As

Publication number Publication date
CN109754009B (en) 2021-07-13
CN109754009A (en) 2019-05-14

Similar Documents

Publication Publication Date Title
WO2020134102A1 (en) Article recognition method and device, vending system, and storage medium
CN107895160A (en) Human face detection and tracing device and method
US20170161591A1 (en) System and method for deep-learning based object tracking
CN104599287B (en) Method for tracing object and device, object identifying method and device
CN109670546B (en) Commodity matching and quantity regression recognition algorithm based on preset template
CN112132213A (en) Sample image processing method and device, electronic equipment and storage medium
CN108345912A (en) Commodity rapid settlement system based on RGBD information and deep learning
CN108229456A (en) Method for tracking target and device, electronic equipment, computer storage media
CN105574848A (en) A method and an apparatus for automatic segmentation of an object
CN110298297A (en) Flame identification method and device
CN111222870B (en) Settlement method, device and system
CN108229375B (en) Method and device for detecting face image
US10318844B2 (en) Detection and presentation of differences between 3D models
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN109840503A (en) A kind of method and device of determining information
CN113516146A (en) Data classification method, computer and readable storage medium
Fei et al. Flow-pose Net: An effective two-stream network for fall detection
CN114255377A (en) Differential commodity detection and classification method for intelligent container
CN111428743B (en) Commodity identification method, commodity processing device and electronic equipment
Sharma Object detection and recognition using Amazon Rekognition with Boto3
Gündüz et al. A new YOLO-based method for social distancing from real-time videos
Chen et al. Self-supervised multi-category counting networks for automatic check-out
CN113160414B (en) Automatic goods allowance recognition method, device, electronic equipment and computer readable medium
CN113743382B (en) Shelf display detection method, device and system
CN115131826A (en) Article detection and identification method, and network model training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19901449
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 19901449
Country of ref document: EP
Kind code of ref document: A1