CN109977983B - Method and device for obtaining training image - Google Patents


Info

Publication number
CN109977983B
CN109977983B
Authority
CN
China
Prior art keywords
image
commodity
foreground
obtaining
training
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810425521.XA
Other languages
Chinese (zh)
Other versions
CN109977983A (en)
Inventor
刘思伟 (Liu Siwei)
冯新宇 (Feng Xinyu)
梁瀚君 (Liang Hanjun)
Current Assignee
Guangzhou Comma Smart Retail Co., Ltd.
Original Assignee
Guangzhou Comma Smart Retail Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangzhou Comma Smart Retail Co., Ltd.
Priority to CN201810425521.XA
Publication of CN109977983A
Application granted
Publication of CN109977983B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination


Abstract

The invention relates to the technical field of commodity identification and provides a method and a device for obtaining a training image. The method first selects a first image from a commodity image library and obtains a foreground image in it, the foreground image being the partial image of the first image that contains only the commodity; it then selects a background image from a background image library, the background image being an image of a scene in which a commodity can be placed; finally, it combines the foreground image with the background image to obtain a training image. Training images are thus generated by freely combining foreground images with background images, a simple and flexible approach that can conveniently and rapidly produce a large number of training images covering different scenes. The resulting high-quality training images can further be used to train a model for commodity identification, with a good training effect.

Description

Method and device for obtaining training image
Technical Field
The invention relates to the technical field of commodity identification, in particular to a method and a device for obtaining a training image.
Background
Currently, the retail industry identifies goods mainly by providing an identifier, such as a bar code, a two-dimensional code, or an RFID tag, on the surface of the package. At checkout, a corresponding reading device, such as a bar-code scanner, a two-dimensional-code scanner, or an RFID reader, reads the information contained in the identifier to identify the commodity, after which settlement is performed. Since the reading device is usually operated by a cashier, this identification method runs counter to the retail industry's trend toward unmanned operation.
In recent years, image recognition technology based on convolutional neural networks has developed rapidly, laying a good foundation for commodity recognition. However, a convolutional neural network requires a large number of training samples while its model is being built. For commodity identification, this means collecting images of each commodity in a large number of different scenes as training samples, a collection process that is time-consuming and labor-intensive; moreover, a large number of different scenes is hard to find in the first place.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for obtaining training images, which obtain a large number of training samples by combining foreground images with background images, so as to solve the above technical problem.
In order to achieve the above purpose, the invention provides the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for obtaining a training image, including:
selecting a first image from a commodity image library, and obtaining a foreground image in the first image, wherein the foreground image is a partial image of the first image containing only the commodity;
selecting a background image from a background image library, wherein the background image is an image of a scene in which a commodity can be placed;
and combining the foreground image and the background image to obtain a training image.
With this method, training images can be generated by freely combining foreground images with background images, without being limited to actually photographed images. The acquisition of training images is thus greatly simplified, a large number of training images in different scenes can easily be generated, and these training images can in turn be used to train a model for commodity identification with a good training effect.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before selecting the first image from the commodity image library and obtaining a foreground image in the first image, the method further includes:
obtaining a first image from an original image set of a commodity;
performing foreground and background segmentation on the first image to obtain a first segmentation result;
marking the position of the foreground and the background of the first image based on the first segmentation result to obtain a first marking result;
storing the first marking result and the first image into a commodity image library;
In this case, selecting a first image from the commodity image library and obtaining a foreground image in the first image includes:
selecting the first image from the commodity image library, and obtaining the foreground image in the first image based on the first marking result.
The foreground images can thus be segmented and marked in advance, so that when foreground and background are combined, the foreground image can be extracted directly from the marking result, allowing a large number of training images to be generated quickly.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, performing foreground and background segmentation on the first image to obtain a first segmentation result, includes:
and performing foreground and background segmentation on the first image by using a pre-trained first convolution neural network to obtain a first segmentation result. Background segmentation by using a convolutional neural network is a very popular image segmentation method at present, and a good segmentation effect is achieved.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, before obtaining the first image from the original image set of the commodity, the method further includes:
obtaining a second image, different from the first image, from the original image set of the commodity;
in response to a foreground and background segmentation operation of a first user, determining the first user's foreground and background segmentation result for the second image as a second segmentation result;
and training the first convolutional neural network by using the second segmentation result.
That is, some of the images in the commodity's original image set are segmented manually and used to train the first convolutional neural network; once the network is trained, it can perform foreground and background segmentation on the remaining images.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, before obtaining a second image different from the first image from the original image set, the method further includes:
acquiring images of the commodity captured by an image acquisition device at each of a plurality of preset shooting angles, obtaining a plurality of images;
and determining the plurality of images as the original image set.
The original image set of the commodity is the data source from which the commodity image library is subsequently generated. Shooting the commodity at a plurality of preset shooting angles with the image acquisition device yields images of the commodity at different shooting angles, which reflect the commodity's appearance comprehensively and improve the quality of the training images generated later.
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, after obtaining the images of the commodity captured by the image acquisition device at each of the plurality of preset shooting angles, and before determining the plurality of images as the original image set, the method further includes:
judging whether the number of repeated images in the plurality of images exceeds a preset threshold value;
if so, de-duplicating the plurality of images to obtain the remaining images after de-duplication;
modeling the commodity based on preset shape information of the commodity to obtain a three-dimensional model of the commodity;
synthesizing the three-dimensional model with a preset scene to obtain a composite image, wherein the shooting angle of the composite image differs from every one of the plurality of preset shooting angles;
and determining the remaining images and the composite image as the original image set.
For commodities with special shapes, such as axisymmetric commodities, the images shot at the plurality of preset shooting angles may contain a large number of repeated images, leaving too little effective information in the original image set. In that case, the commodity can be modeled in three dimensions from shape information preset before image acquisition begins, and composite images can be generated whose shooting angles differ from the preset ones. Supplementing the original image set with these composite images increases its effective information and improves the quality of the training images generated later.
With reference to the first aspect or any one of the first to the fifth possible implementation manners of the first aspect, in a sixth possible implementation manner of the first aspect, after the foreground image and the background image are combined to obtain the training image, the method further includes:
obtaining the position of a foreground image in a training image;
obtaining the commodity category of the commodity corresponding to the foreground image;
and training a second convolutional neural network by utilizing the training image, the position and the commodity category, wherein the second convolutional neural network is used for commodity detection and classification.
Detecting and classifying objects with a convolutional neural network is currently a very popular image recognition method with a good recognition effect, so the obtained training images can be used to train the second convolutional neural network to detect and classify commodities.
With reference to the sixth possible implementation manner of the first aspect, in a seventh possible implementation manner of the first aspect, the second convolutional neural network is a region-based convolutional neural network (R-CNN). R-CNN and the Fast R-CNN and Faster R-CNN networks derived from it are mainstream convolutional neural networks for image recognition and have achieved good recognition results in that field.
With reference to the sixth possible implementation manner of the first aspect, in an eighth possible implementation manner of the first aspect, before the obtaining the commodity category of the commodity corresponding to the foreground image, the method further includes:
and responding to the commodity classification operation of the second user, and determining the class specified by the second user as the commodity class of the commodity corresponding to the foreground image, namely the commodity class for training can adopt a mode manually specified by the user.
In a second aspect, an embodiment of the present invention provides an apparatus for obtaining a training image, including:
the foreground acquisition module is used for selecting a first image from the commodity image library and acquiring a foreground image in the first image, wherein the foreground image is a partial image of the first image containing only the commodity;
the background acquisition module is used for selecting a background image from a background image library, wherein the background image is an image of a scene in which a commodity can be placed;
and the foreground and background combination module is used for combining the foreground image and the background image to obtain a training image.
In a third aspect, an embodiment of the present invention provides a computer storage medium storing computer program instructions which, when read and executed by a processor of a computer, perform the method provided in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device comprising a processor and a computer storage medium, the computer storage medium storing computer program instructions which, when read and executed by the processor, perform the method provided in the first aspect or any possible implementation manner of the first aspect.
In order to make the above objects, technical solutions and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 shows a block diagram of an electronic device applicable to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method of obtaining a training image according to a first embodiment of the present invention;
fig. 3 shows a flowchart of steps S01 to S04 of the method of obtaining a training image according to the first embodiment of the present invention;
fig. 4 shows a flowchart of steps S13 to S15 of the method of obtaining a training image according to the first embodiment of the present invention;
fig. 5 is a functional block diagram of an apparatus for obtaining a training image according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Fig. 1 shows a block diagram of a terminal device 100 applicable to an embodiment of the present invention. As shown in fig. 1, the terminal device 100 includes a memory 102, a memory controller 104, one or more (only one shown) processors 106, a peripheral interface 108, an input-output unit 110, an audio unit 112, a display unit 114, and the like. These components communicate with each other via one or more communication buses/signal lines 116.
The memory 102 may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for obtaining training images in the embodiments of the present invention, and the processor 106 may implement the method and apparatus for obtaining training images provided in the embodiments of the present invention by running the software programs and modules stored in the memory 102.
The memory 102 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 106, and possibly other components, access the memory 102 under the control of the memory controller 104.
The processor 106 may be an integrated circuit chip with signal processing capabilities. It may be a general-purpose processor, including a Central Processing Unit (CPU), a Microcontroller Unit (MCU), a Network Processor (NP), or another conventional processor; it may also be a special-purpose processor, including a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. It may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention.
The peripheral interface 108 couples various input/output devices to the processor 106 and to the memory 102. In some embodiments, the peripheral interface 108, the processor 106, and the memory controller 104 may be implemented in a single chip; in other embodiments, they may each be implemented as a separate chip.
The input/output unit 110 provides the user with a means of entering data, enabling interaction between the user and the terminal device 100. The input/output unit 110 may be, but is not limited to, a mouse, a keyboard, and the like.
The audio unit 112 provides an audio interface to the user and may include one or more microphones, one or more speakers, and audio circuitry.
The display unit 114 provides a display interface between the terminal device 100 and the user. In particular, display unit 114 displays visual output to the user, the content of which may include text, graphics, video, and any combination thereof.
It is to be understood that the configuration shown in fig. 1 is merely illustrative, and that the terminal device 100 may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
In the embodiment of the present invention, the terminal device 100 may be a server, a personal computer, a mobile device, an intelligent wearable device, a network device, or the like. In some embodiments, the terminal device 100 is not limited to a physical device and may also be, for example, a virtual machine, a virtual server, or a cloud computing platform.
First embodiment
Fig. 2 is a flowchart illustrating a method for obtaining a training image according to a first embodiment of the present invention, which may be applied to a processor in the terminal device. Referring to fig. 2, the method for obtaining a training image according to the first embodiment includes:
step S10: the processor selects a first image from the library of commodity images and obtains a foreground image in the first image.
The commodity image library stores images of various commodities; each image contains a commodity together with the scene in which it is placed, and the foreground image is the partial image containing only the commodity. The first image may be selected from the commodity image library randomly or according to a preset rule, and one or more first images (and thus one or more corresponding foreground images) may be selected. As for obtaining the foreground image in the first image, foreground segmentation may be performed in real time, or all images in the commodity image library may be segmented in advance, before step S10 is performed, with the positions of foreground and background marked; the foreground image in the first image can then be obtained in step S10 directly from the marking result, which speeds up obtaining the foreground image.
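For illustration only, extracting a foreground image from a pre-marked library image might look like the following sketch, assuming the marking result is stored as a binary mask image of the same size (non-zero = foreground); the file layout and the mask convention are assumptions, not part of the patent.

    import numpy as np
    from PIL import Image

    def extract_foreground(image_path: str, mask_path: str) -> Image.Image:
        """Cut the commodity out of a library image using its stored
        foreground/background marking (non-zero = foreground)."""
        image = Image.open(image_path).convert("RGBA")
        mask = np.array(Image.open(mask_path).convert("L")) > 0

        rgba = np.array(image)
        rgba[..., 3] = np.where(mask, 255, 0)  # make the background transparent

        # Crop to the tight bounding box of the marked foreground
        # (the mask is assumed to contain at least one foreground pixel).
        ys, xs = np.nonzero(mask)
        box = (xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)
        return Image.fromarray(rgba).crop(box)

The RGBA cut-out produced this way can later be pasted onto a background image in step S12 through its alpha channel.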
The foreground and background labeling process of the first image performed before the execution of step S10 is specifically described below. Fig. 3 shows a flowchart of steps S01 to S04 of the method for obtaining a training image according to the first embodiment of the present invention. Referring to fig. 3, steps S01 to S04 specifically include:
step S01: the processor obtains a first image from the original image set of the article.
The original image set consists of images of the commodity that have not yet been foreground/background marked. In one implementation of the embodiment of the present invention, the plurality of images in the original image set are obtained by shooting the commodity with an image acquisition device at each of a plurality of preset shooting angles. The original image set of the commodity is the data source from which the commodity image library is subsequently generated; shooting the commodity at different preset shooting angles lets the original image set describe the commodity's appearance comprehensively, which in turn improves the quality of training images generated from the commodity image library. Note that 'quality' here refers to the richness of the effective information about the commodity's appearance contained in a training image, not to the fidelity of the image itself.
The image acquisition device can be implemented in various ways. In one implementation, it includes a camera, a turntable, and a drive device. The commodity to be captured is placed on the turntable, which the drive device can rotate through 360 degrees, and the camera is set on one side of the turntable with an adjustable position. Before image acquisition begins, the camera is fixed at a preset position; as the turntable rotates, the camera captures images of the commodity on it from different shooting angles. Once a 360-degree sweep is complete, the camera can be fixed at another preset position and the acquisition process repeated, continuing in this way until the number of images required for the original image set has been collected.
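A minimal sketch of this acquisition loop follows; the camera and turntable objects are hypothetical stand-ins for the actual hardware drivers, which the patent does not specify.

    def acquire_original_images(camera, turntable, camera_positions,
                                shots_per_revolution=36):
        """Collect commodity images at a plurality of preset shooting angles.

        `camera` and `turntable` are hypothetical driver objects: the camera
        is fixed at one preset position while the turntable steps through a
        full revolution, then the camera is moved and the sweep repeats.
        """
        images = []
        step = 360.0 / shots_per_revolution
        for position in camera_positions:
            camera.move_to(position)           # fix the camera at a preset position
            for i in range(shots_per_revolution):
                turntable.rotate_to(i * step)  # degrees from the start pose
                images.append(camera.capture())
        return images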
The commodity images collected at the plurality of preset shooting angles may include a large number of repeated images, leaving insufficient effective information in the original image set and hampering the later generation of training images. Repeated images here means identical or highly similar images. They generally arise from a special commodity shape: for a cylindrical commodity, for example, when images are acquired with the turntable-based device, the camera's 360-degree sweep at a single position yields images with very similar content, that is, a large number of repeated images.
To solve this problem, in one embodiment the shape of the commodity is marked manually before image acquisition begins; for example, the cylindrical shape and the size of a cylindrical commodity are recorded. This marking result is referred to as the preset shape information and is stored. After the image acquisition device has collected the commodity images, the processor detects and judges whether the number of repeated images among them exceeds a preset threshold. If not, the commodity does not have a special shape, and the acquired images are directly determined as the commodity's original image set. If so, the commodity has a special shape, such as a cylinder, and the processor first de-duplicates the repeated images to reduce information redundancy, obtaining the remaining images. The processor then models the commodity from the stored preset shape information, combined with the commodity's surface texture, to obtain a three-dimensional model, and synthesizes the three-dimensional model with a preset scene to obtain a composite image. The preset scene may be the scene in which the commodity sat while the image acquisition device captured it. The shooting angle of the composite image differs from every preset shooting angle of the image acquisition device, so the composite image contributes new effective information about the commodity beyond the remaining images. Note that the composite image is not an actually captured image, so its shooting angle is a virtual shooting angle corresponding to an actual one. Finally, the processor determines the composite image together with the remaining images as the original image set. In effect the composite image replaces the repeated images, enriching the effective information in the original image set and improving the quality of the training images generated later.
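How repeated images are detected is not specified in the patent; one common way to approximate "identical or highly similar" is a perceptual hash, sketched below with a simple average hash. This is an assumed implementation detail, not the patented method.

    import numpy as np
    from PIL import Image

    def average_hash(image: Image.Image, hash_size: int = 8) -> np.ndarray:
        """64-bit average hash: near-duplicate images yield nearly identical bits."""
        small = image.convert("L").resize((hash_size, hash_size))
        pixels = np.asarray(small, dtype=np.float32)
        return (pixels > pixels.mean()).flatten()

    def count_duplicates(images, max_hamming: int = 5) -> int:
        """Count images whose hash lies within `max_hamming` bits of an
        earlier image, i.e. repeated views of the commodity; the result
        can then be compared against the preset threshold described above."""
        seen, duplicates = [], 0
        for img in images:
            h = average_hash(img)
            if any(np.count_nonzero(h != prev) <= max_hamming for prev in seen):
                duplicates += 1
            else:
                seen.append(h)
        return duplicates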
Each image in the original image set, including the first image, can be foreground/background marked. Steps S02 to S04 below are explained using the foreground and background segmentation of the first image as an example; the other images in the original image set are processed in the same way.
Step S02: the processor performs foreground and background segmentation on the first image to obtain a first segmentation result.
Various existing methods can be used for the foreground and background segmentation of the first image. In one implementation of the embodiment of the present invention, the segmentation is performed with a pre-trained first convolutional neural network: the first image is fed to the network, which outputs the probability that each pixel is foreground or background, and each pixel is then judged foreground or background against a preset probability threshold; this result is called the first segmentation result. The first convolutional neural network can be trained on the commodity's original image set, with part of the images split off as a training set and the rest kept as a test set. The first image is an image in the test set; the training process is described here with a second image from the training set, which is obviously different from the first image. A first user first segments the second image into foreground and background manually, and the processor, responding to the first user's segmentation operation, obtains the foreground and background segmentation result of the second image, called the second segmentation result. The first user's segmentation operation can be made through the input/output unit. Using the second segmentation result, the processor takes image blocks in the foreground as positive samples and image blocks in the background as negative samples, and feeds these sample blocks into the first convolutional neural network to train its model parameters. The other images in the training set are handled in the same way as the second image; once all of them have been processed, training of the first convolutional neural network is complete, and the network can then perform foreground and background segmentation on the images in the test set. Segmenting foreground and background with a convolutional neural network is a mature existing method that achieves a good segmentation effect, so its implementation details are not elaborated here. Other foreground and background segmentation methods, such as a background subtraction algorithm, may also be used in step S02.
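As a sketch of the thresholding step just described (the network architecture is not specified in the patent; it is assumed here to output a per-pixel foreground probability map):

    import numpy as np

    def segment_foreground(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
        """Turn the first network's per-pixel foreground probabilities
        (an H x W array of values in [0, 1]) into a binary mask:
        1 = foreground, 0 = background."""
        return (prob_map >= threshold).astype(np.uint8)

The binary mask obtained this way is also exactly the marking of step S03 below, with foreground pixels marked 1 and background pixels marked 0.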
Step S03: the processor marks the position of the foreground and the background of the first image based on the first segmentation result to obtain a first marking result.
The way the foreground and background are marked is not limited: for example, pixels belonging to the foreground may be marked with a first value, such as 1, and pixels belonging to the background with a different second value, such as 0; alternatively, the boundary between foreground and background may be marked.
Step S04: the processor stores the first marking result and the first image to a commodity image library.
Obviously, the second image can likewise be foreground/background marked from the second segmentation result, and the second image and its marking result stored in the commodity image library as well. That is, all images in the commodity's original image set can be added to the commodity image library after foreground and background segmentation. In step S10, once the first image has been selected, its foreground image can be obtained quickly from the first marking result.
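As a purely illustrative sketch of one possible storage layout for the commodity image library (the patent does not prescribe one), each image could be saved next to its marking result and listed in a small index:

    import json
    from pathlib import Path
    from PIL import Image

    def store_to_library(library_dir: str, image: Image.Image,
                         mask: Image.Image, name: str) -> None:
        """Save an image and its foreground/background marking result into
        the commodity image library, keeping a JSON index of the pairs."""
        root = Path(library_dir)
        root.mkdir(parents=True, exist_ok=True)
        image.save(root / f"{name}.png")
        mask.save(root / f"{name}_mask.png")  # 255 = foreground, 0 = background
        index_path = root / "index.json"
        index = json.loads(index_path.read_text()) if index_path.exists() else {}
        index[name] = {"image": f"{name}.png", "mask": f"{name}_mask.png"}
        index_path.write_text(json.dumps(index, indent=2))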
Step S11: the processor selects a background image from a library of background images, the background image being an image of a scene available for placement of merchandise.
The background image library stores a number of background images. A background image contains no commodity; its content is an environment in which a commodity can be placed. Background images can be captured from the environments in which commodities are actually placed, or obtained directly from a third party. The background image may be selected from the background image library randomly or according to a preset rule, and one or more background images may be selected.
Step S12: the processor combines the foreground image and the background image to obtain a training image.
The foreground and background images can be combined freely, ensuring that each background image receives at least one foreground image; a single background image may of course receive several. The combination can simulate how commodities are actually placed, with multiple foreground images overlapped, rotated, and scaled within one background image, or it can follow a preset rule. An image generated by combining a foreground image with a background image is called a training image, and the combination scheme above can clearly produce, in a short time, a large number of training images covering different scenes and containing different commodities. These training images can then be used to train a model for commodity identification; because they are numerous and rich in effective information, a good training effect can be achieved. The specific form of the model is not limited; it may be, for example, a convolutional neural network model or a support vector machine model. In some embodiments the training images may also serve other purposes and need not be limited to training a model for commodity identification.
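A minimal compositing sketch under the description above: each foreground cut-out (an RGBA image with a transparent background, as extracted in step S10) is randomly scaled, rotated, and pasted onto the background through its alpha channel. The returned paste boxes also give the position labels used in step S13 below; the scaling and rotation ranges are illustrative choices, not values from the patent.

    import random
    from PIL import Image

    def compose_training_image(background: Image.Image, foregrounds):
        """Paste one or more RGBA foreground cut-outs onto a background,
        with random rotation and scaling to mimic real product placement.
        Returns the training image and one bounding box per foreground."""
        canvas = background.convert("RGBA").copy()
        boxes = []
        for fg in foregrounds:
            scale = random.uniform(0.5, 1.0)
            fg = fg.resize((max(1, int(fg.width * scale)),
                            max(1, int(fg.height * scale))))
            fg = fg.rotate(random.uniform(-30, 30), expand=True)
            x = random.randint(0, max(0, canvas.width - fg.width))
            y = random.randint(0, max(0, canvas.height - fg.height))
            canvas.paste(fg, (x, y), mask=fg)  # alpha channel keys the overlay
            boxes.append((x, y, x + fg.width, y + fg.height))
        return canvas.convert("RGB"), boxes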
The following steps, which may be performed after step S12, are described below using training images for training convolutional neural networks as an example. Fig. 4 shows a flowchart of steps S13 to S15 of the method for obtaining a training image according to the first embodiment of the present invention. Referring to fig. 4, steps S13 to S15 specifically include:
step S13: the processor obtains the position of the foreground image in the training image.
Since the shape of the foreground image is known, its position can be marked precisely in the training image once the foreground image has been combined with the background image to generate the training image.
Step S14: the processor obtains the commodity category of the commodity corresponding to the foreground image.
The commodity category corresponding to a foreground image can be specified manually in real time. For example, in one embodiment, before step S14 is executed, a second user manually specifies the commodity category corresponding to each foreground image, and the processor, responding to the second user's commodity classification operation, determines and stores the commodity category of each foreground image in the training image according to the second user's manual classification result. The second user's commodity classification operation can be made through the input/output unit, and can be performed as early as the commodity image acquisition stage.
Step S15: the processor trains a second convolutional neural network using the training images, the positions of the foreground images, and the commodity categories.
The second convolutional neural network performs commodity detection and commodity classification. Commodity detection finds, in an input image, proposal regions where commodities may be present, which can be marked on the image with boxes; commodity classification extracts features from the content of each proposal region, outputs the probability that the content belongs to each commodity category, and decides the category against a preset probability threshold. Detecting and classifying objects with a convolutional neural network is by now a fairly mature image recognition method with a good recognition effect, so the obtained training images can be used to train the second convolutional neural network to detect and classify commodities. Because the method of the embodiment of the present invention yields a large number of training images containing different commodity combinations in different scenes, the second convolutional neural network can be trained to a good effect, and commodity identification with the trained network is accurate and reliable. Note that the classification here may be coarse or fine according to actual requirements: the second convolutional neural network may identify that an item belongs to a general class of commodities, or identify the specific commodity directly, in which case classification amounts to identification.
Common convolutional neural networks for object detection and classification currently include R-CNN and its derivatives Fast R-CNN and Faster R-CNN, any of which can implement the second convolutional neural network. Taking Faster R-CNN as an example, it can be viewed as a combination of a Region Proposal Network (RPN) module and a Fast R-CNN module: the RPN detects proposal regions where commodities may be present, and the Fast R-CNN module extracts features from those regions and outputs the classification result. In Faster R-CNN, the RPN and the Fast R-CNN module can share convolutional layers, so feature extraction is shared and commodity detection and classification are accelerated. The training images and foreground-image positions can be used to train the RPN, and the training images, positions, and commodity categories can be used to train the Fast R-CNN module. Faster R-CNN is an existing image recognition technique, so its implementation details are not elaborated here.
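Purely as an illustration, the training loop for the second convolutional neural network could be built on torchvision's Faster R-CNN implementation as below. The patent prescribes no framework, so the library choice, hyperparameters, and dataset interface are all assumptions (torchvision >= 0.13 is assumed for the `weights` argument).

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn

    def train_second_network(dataset, num_classes: int, epochs: int = 10):
        """Train a Faster R-CNN detector on (image, target) pairs, where each
        target holds the foreground-image boxes and commodity-category labels
        produced in steps S13 and S14. `dataset` is assumed to yield
        (FloatTensor[C, H, W], {"boxes": FloatTensor[N, 4],
        "labels": Int64Tensor[N]}) tuples."""
        model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
        model.train()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
        loader = torch.utils.data.DataLoader(
            dataset, batch_size=2, shuffle=True,
            collate_fn=lambda batch: tuple(zip(*batch)))
        for _ in range(epochs):
            for images, targets in loader:
                losses = model(list(images), list(targets))  # RPN + head losses
                loss = sum(losses.values())
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model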
In summary, the method for obtaining training images provided in the first embodiment of the present invention can quickly generate a large number of training images by combining foreground images with background images, and these training images can further be used to train a model for commodity identification with a good training effect. In one embodiment, the foreground images come from multi-angle images of the commodity, captured by the image acquisition device or synthesized by the processor, and therefore carry rich effective information about the commodity's appearance, so the training images generated from them are of high quality and benefit subsequent model training. In one embodiment, the training images are used to train a convolutional neural network model, which performs well at object detection and classification but has the drawback of needing a large number of training images; the method of the first embodiment supplies exactly that, so the obtained training images can be used to train a convolutional neural network for commodity identification, replacing the conventional reading-device-based identification in the retail industry and promoting the development of unmanned retail.
Second embodiment
Fig. 5 is a functional block diagram of an apparatus 200 for obtaining training images according to a second embodiment of the present invention. Referring to fig. 5, the apparatus 200 includes a foreground acquisition module 210, a background acquisition module 220, and a foreground and background combination module 230. The foreground acquisition module 210 is configured to select a first image from a commodity image library and obtain a foreground image in the first image, where the foreground image is a partial image of the first image containing only the commodity; the background acquisition module 220 is configured to select a background image from a background image library, where the background image is an image of a scene in which the commodity can be placed; and the foreground and background combination module 230 is configured to combine the foreground image with the background image to obtain a training image.
The apparatus 200 for obtaining training images according to the second embodiment of the present invention has the same implementation principle and technical effect as the foregoing method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiments for the parts of the apparatus embodiments that are not mentioned.
Third embodiment
A third embodiment of the present invention provides a computer storage medium, in which computer program instructions are stored, and when the computer program instructions are read and executed by a processor of a computer, the method provided in the first embodiment of the present invention is executed. The computer storage medium may be implemented as, but is not limited to, the memory shown in fig. 1.
Fourth embodiment
A fourth embodiment of the present invention provides an electronic device, which includes a processor and a computer storage medium, where computer program instructions are stored in the computer storage medium, and when the computer program instructions are read and executed by the processor, the method provided in the first embodiment of the present invention is executed. The electronic device may be implemented as, but is not limited to, the terminal device shown in fig. 1.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned computer device includes: various devices having the capability of executing program codes, such as a personal computer, a server, a mobile device, an intelligent wearable device, a network device, and a virtual device, the storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic disk, magnetic tape, or optical disk.
The above description is only of preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Claims (8)

1. A method of obtaining a training image, comprising:
obtaining a first image from an original image set of a commodity;
performing foreground and background segmentation on the first image by using a pre-trained first convolutional neural network to obtain a first segmentation result;
marking the position of the foreground and the background of the first image based on the first segmentation result to obtain a first marking result;
storing the first marking result and the first image to a commodity image library;
selecting the first image from the commodity image library, and obtaining a foreground image in the first image based on the first marking result, wherein the foreground image is a partial image of the first image containing only the commodity;
selecting a background image from a background image library, wherein the background image is an image of a scene in which the commodity can be placed;
and combining the foreground image and the background image to obtain a training image.
2. The method of obtaining a training image of claim 1, wherein prior to obtaining the first image from the original set of images of the commodity, the method further comprises:
obtaining a second image different from the first image from the original image set of the commodity;
determining a segmentation result of the first user for performing foreground and background segmentation on the second image as a second segmentation result in response to a foreground and background segmentation operation of the first user;
training the first convolutional neural network using the second segmentation result.
3. The method of obtaining a training image of claim 2, wherein prior to obtaining a second image different from the first image from the set of original images, the method further comprises:
acquiring an image of the commodity acquired by an image acquisition device at each preset shooting angle of a plurality of preset shooting angles, and acquiring a plurality of images;
determining the plurality of images as the original set of images.
4. The method of obtaining a training image according to claim 3, wherein, after obtaining the images of the commodity captured by the image acquisition device at each of the plurality of preset shooting angles to obtain the plurality of images, and before determining the plurality of images as the original image set, the method further comprises:
judging whether the number of repeated images in the plurality of images exceeds a preset threshold value;
if so, carrying out image de-duplication on the plurality of images to obtain the remaining images after de-duplication;
modeling the commodity based on the preset shape information of the commodity to obtain a three-dimensional model of the commodity;
carrying out image synthesis on the three-dimensional model and a preset scene to obtain a composite image, wherein the shooting angle of the composite image is different from any one of the plurality of preset shooting angles;
and determining the remaining images and the composite image as the original image set.
5. The method of obtaining a training image according to any of claims 1-4, wherein, after the combining of the foreground image with the background image to obtain the training image, the method further comprises:
obtaining a position of the foreground image in the training image;
obtaining the commodity category of the commodity corresponding to the foreground image;
and training a second convolutional neural network by utilizing the training image, the position and the commodity category, wherein the second convolutional neural network is used for commodity detection and classification.
6. The method of obtaining a training image of claim 5, wherein the second convolutional neural network is a regional convolutional neural network.
7. The method for obtaining the training image according to claim 5, wherein before obtaining the commodity category of the commodity corresponding to the foreground image, the method further comprises:
and responding to the commodity classification operation of a second user, and determining the class specified by the second user as the commodity class of the commodity corresponding to the foreground image.
8. An apparatus for obtaining a training image, comprising:
the foreground acquisition module is used for selecting a first image from a commodity image library and obtaining a foreground image in the first image based on a first marking result, wherein the foreground image is a partial image of the first image containing only the commodity;
the background acquisition module is used for selecting a background image from a background image library, wherein the background image is an image of a scene in which the commodity can be placed;
the foreground and background combination module is used for combining the foreground image with the background image to obtain a training image;
the apparatus is further configured to: before the foreground acquisition module selects a first image from the commodity image library and obtains a foreground image in the first image based on the first marking result, obtaining the first image from an original image set of the commodity; performing foreground and background segmentation on the first image by using a pre-trained first convolutional neural network to obtain a first segmentation result; marking the position of the foreground and the background of the first image based on the first segmentation result to obtain a first marking result; and storing the first marking result and the first image into the commodity image library.
Application CN201810425521.XA, priority date 2018-05-07, filing date 2018-05-07: Method and device for obtaining training image. Status: Expired - Fee Related. Granted publication: CN109977983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810425521.XA CN109977983B (en) 2018-05-07 2018-05-07 Method and device for obtaining training image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810425521.XA CN109977983B (en) 2018-05-07 2018-05-07 Method and device for obtaining training image

Publications (2)

Publication Number Publication Date
CN109977983A CN109977983A (en) 2019-07-05
CN109977983B 2021-06-08

Family

ID=67075926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810425521.XA Expired - Fee Related CN109977983B (en) 2018-05-07 2018-05-07 Method and device for obtaining training image

Country Status (1)

Country Link
CN (1) CN109977983B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503146B (en) * 2019-08-21 2021-12-14 杭州比智科技有限公司 Data enhancement method and device, computing equipment and computer storage medium
CN110992297A (en) * 2019-11-11 2020-04-10 北京百度网讯科技有限公司 Multi-commodity image synthesis method and device, electronic equipment and storage medium
CN111062861A (en) * 2019-12-13 2020-04-24 广州市玄武无线科技股份有限公司 Method and device for generating display image samples
CN113256361A (en) * 2020-02-10 2021-08-13 阿里巴巴集团控股有限公司 Commodity publishing method, image processing method, device, equipment and storage medium
CN111626222A (en) * 2020-05-28 2020-09-04 深圳市商汤科技有限公司 Pet detection method, device, equipment and storage medium
CN112270339A (en) * 2020-06-30 2021-01-26 上海扩博智能技术有限公司 Image data generation method, system, device and storage medium for model training
CN112200227A (en) * 2020-09-28 2021-01-08 深圳市华付信息技术有限公司 Airplane detection method based on airplane 3d model
CN112163139A (en) * 2020-10-14 2021-01-01 深兰科技(上海)有限公司 Image data processing method and device
CN112836756B (en) * 2021-02-04 2024-02-27 上海明略人工智能(集团)有限公司 Image recognition model training method, system and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835043A (en) * 2014-02-12 2015-08-12 北京京东尚科信息技术有限公司 Method for presenting images of goods and device
CN105608459A (en) * 2014-10-29 2016-05-25 阿里巴巴集团控股有限公司 Commodity image segmentation method and commodity image segmentation device
CN107729908A (en) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 A kind of method for building up, the apparatus and system of machine learning classification model
CN107808373A (en) * 2017-11-15 2018-03-16 北京奇虎科技有限公司 Sample image synthetic method, device and computing device based on posture

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9582916B2 (en) * 2014-11-10 2017-02-28 Siemens Healthcare Gmbh Method and system for unsupervised cross-modal medical image synthesis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835043A (en) * 2014-02-12 2015-08-12 北京京东尚科信息技术有限公司 Method for presenting images of goods and device
CN105608459A (en) * 2014-10-29 2016-05-25 阿里巴巴集团控股有限公司 Commodity image segmentation method and commodity image segmentation device
CN107729908A (en) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 A kind of method for building up, the apparatus and system of machine learning classification model
CN107808373A (en) * 2017-11-15 2018-03-16 北京奇虎科技有限公司 Sample image synthetic method, device and computing device based on posture

Also Published As

Publication number Publication date
CN109977983A (en) 2019-07-05

Similar Documents

Publication Publication Date Title
CN109977983B (en) Method and device for obtaining training image
US10936911B2 (en) Logo detection
CN107403424B (en) Vehicle loss assessment method and device based on image and electronic equipment
CN109858371B (en) Face recognition method and device
CN109145766B (en) Model training method and device, recognition method, electronic device and storage medium
JP6779641B2 (en) Image classification device, image classification system and image classification method
CN103927387A (en) Image retrieval system, method and device
CN107545271B (en) Image recognition method, device and system
CN107918767B (en) Object detection method, device, electronic equipment and computer-readable medium
CN110222582B (en) Image processing method and camera
CN111182367A (en) Video generation method and device and computer system
CN112417970A (en) Target object identification method, device and electronic system
CN111814913A (en) Training method and device for image classification model, electronic equipment and storage medium
CN113516146A (en) Data classification method, computer and readable storage medium
CN112926601A (en) Image recognition method, device and equipment based on deep learning and storage medium
CN112149690A (en) Tracing method and tracing system based on biological image feature recognition
EP3635632B1 (en) Detecting font size in a digital image
CN112232203A (en) Pedestrian recognition method and device, electronic equipment and storage medium
CN110956157A (en) Deep learning remote sensing image target detection method and device based on candidate frame selection
CN108509879B (en) Method and device for realizing information processing
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
CN110019901A (en) Three-dimensional model search device, searching system, search method and computer readable storage medium
CN111967529A (en) Identification method, device, equipment and system
CN114360057A (en) Data processing method and related device
CN109325521B (en) Detection method and device for virtual character

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210608