CN108009560B

CN108009560B - Commodity image similarity category judgment method and device

Info

Publication number: CN108009560B
Application number: CN201610944563.5A
Authority: CN
Inventors: 李明强
Original assignee: Guangzhou Tupu Network Technology Co ltd
Current assignee: Guangzhou Tupu Network Technology Co ltd
Priority date: 2016-11-02
Filing date: 2016-11-02
Publication date: 2021-05-11
Anticipated expiration: 2036-11-02
Also published as: CN108009560A

Abstract

The invention relates to a method and a device for judging similar categories of commodity images. A target commodity image is preprocessed into an image that meets the input requirement of a network model. The preprocessed target commodity image is input into the network model to obtain the similarity probability of the target commodity image, and the similar category of the target commodity image is judged according to the similarity probability, wherein the network model is obtained by iterative training on a commodity image sample set comprising a plurality of groups of labeled commodity images. With this method and device, the features used throughout the judgment of similar categories are learned automatically by training the network model, which gives high accuracy and efficient operation.

Description

Commodity image similarity category judgment method and device

Technical Field

The present invention relates to the field of image processing, and in particular, to a method and an apparatus for determining a similar category of a product image.

Background

With the rapid development of computer vision technology, image retrieval has attracted wide attention. Post-processing of image retrieval results is a key problem in computer vision: judging the similarity of the retrieved images and classifying them makes the retrieval result clearer and makes effective information easier to extract and use. For example, when a user searches for a dress, the first 100 items in the returned image results may all be items of the same style, and items of similar style are not displayed until the 101st item. Such a search result gives the user a lot of redundant information and severely limits the range of information the user can obtain. However, if the same-style commodities and similar-style commodities in the retrieved images can be classified and the display arrangement of the images is set reasonably, the user can obtain more effective information.

In the traditional commodity image similarity judging and classifying method, meaningful features need to be manually selected for image search. After the features are extracted from each image, an index is built for nearest-neighbor search, and when multiple features are selected, weights need to be set for the different features so that the retrieved nearest-neighbor results can be classified as same style or similar style. The selection of features and the assignment of weights to them depend too heavily on the experience of the decision maker, so the selected features and weights may be effective for individual cases but are not necessarily globally optimal. In addition, common image features such as SIFT, GIST and color are effective for specific scenes but are not suitable for a wide variety of commodity images, and therefore the accuracy of the conventional commodity image similarity judging and classifying method is low.

Disclosure of Invention

Therefore, it is necessary to provide an automatic, accurate and fast method and apparatus for determining the similar categories of images of commodities in order to solve the above technical problems.

A method for judging similar categories of commodity images comprises the following steps:

acquiring target commodity images, wherein the target commodity images are two commodity images whose similar category is to be judged;

preprocessing a target commodity image into an image which meets the input requirement of a network model;

inputting the preprocessed target commodity image into a network model to obtain the similarity probability of the target commodity image, and judging the similar category of the target commodity image according to the similarity probability, wherein the network model is obtained by performing iterative training on a commodity image sample set comprising a plurality of groups of labeled commodity images.

In one embodiment, the network model comprises a connected convolutional neural network and a linear logistic regression model, and the step of iteratively training the commodity image sample set comprising a plurality of sets of labeled commodity images comprises:

acquiring a commodity image sample set, wherein the commodity image sample set comprises a plurality of groups of commodity images, and the number of each group of commodity images is two;

labeling the similar category of each group of commodity images, wherein the similar categories comprise different style and same style;

preprocessing each group of commodity images;

inputting each group of preprocessed commodity images into a convolutional neural network and outputting high-dimensional image features, inputting the high-dimensional image features into a logistic regression model and outputting the predicted similarity probability of the commodity images, calculating prediction errors according to the similar categories and the similarity probability labeled by each group of commodity images, and performing iterative training on the convolutional neural network and the linear logistic regression model by adopting a supervised back propagation method to obtain a deep learning network model.

In one embodiment, the step of labeling the similar categories for each group of commodity images includes:

the similar categories of the commodity images are represented by the number set {0, 1}, with different style and same style represented by 1 and 0 respectively;

if the group of commodity images are of different styles, labeling the similar category of the commodity images as 1;

if the group of commodity images are of the same style, labeling the similar category of the commodity images as 0.

In one embodiment, the step of preprocessing the merchandise image comprises:

scaling the commodity images in the commodity image sample set to a standard size;

carrying out zero-averaging on pixel data of each corresponding pixel point of the commodity image in the commodity image sample set;

and randomly selecting a sample sub-image with the size smaller than that of the commodity image and a horizontal mirror image of the sample sub-image in the commodity image as two groups of commodity images input into the convolutional neural network.

In one embodiment, the convolutional neural network comprises a plurality of convolutional layers, an activation function layer, a pooling layer and a full-connection layer, wherein the activation function adopted in the activation function layer is a hyperbolic tangent function, and the pooling layer adopts a maximum pooling mode;

the steps of inputting each group of the preprocessed commodity images into a convolution neural network and outputting high-dimensional image features, inputting the high-dimensional image features into a logistic regression model and outputting the predicted similarity probability of the commodity images comprise:

performing a convolution operation on the commodity image through the convolutional layer, performing a nonlinear transformation on the convolution result through the activation function layer, performing a pooling operation on the nonlinear transformation result through the pooling layer to accelerate training, performing a linear transformation on the pooling result through the fully connected layer to obtain high-dimensional image features, and computing on the high-dimensional image features through the logistic regression model to obtain the similarity probability of the commodity images.

In one embodiment, the step of preprocessing the target commodity image into an image meeting the network model input requirements comprises:

scaling the target commodity image to a standard size;

and selecting a plurality of target subimages and horizontal mirror images of the target subimages at different positions in the target commodity image as a plurality of groups of commodity images input into the network model, wherein the sizes of the target subimages meet the input requirements of the network model.

A commodity image similarity category determination device includes:

the target image acquisition module is used for acquiring target commodity images, wherein the target commodity images are two commodity images whose similar category is to be judged;

the target image preprocessing module is used for preprocessing the target commodity image into an image meeting the input requirement of a network model;

and the similar type judging module is used for inputting the preprocessed target commodity image into a network model to obtain the similarity probability of the target commodity image, and judging the similar type of the target commodity image according to the similarity probability, wherein the network model is obtained by performing iterative training on a commodity image sample set comprising a plurality of groups of labeled commodity images.

In one embodiment, the network model includes a convolutional neural network and a linear logistic regression model connected to each other, and the commodity image similarity class determination apparatus further includes a network training module, which includes:

the sample image acquisition module is used for acquiring a commodity image sample set, wherein the commodity image sample set comprises a plurality of groups of commodity images, and each group contains two commodity images;

the image labeling module is used for labeling the similar category of each group of commodity images, wherein the similar categories comprise different style and same style;

the image preprocessing module is used for preprocessing each group of commodity images;

and the training module is used for inputting each group of preprocessed commodity images into the convolutional neural network and outputting high-dimensional image characteristics, inputting the high-dimensional image characteristics into the logistic regression model and outputting the predicted similarity probability of the commodity images, calculating prediction errors according to the similar categories and the similarity probability marked by each group of commodity images, and performing iterative training on the convolutional neural network and the linear logistic regression model by adopting a supervised back propagation method to obtain a deep learning network model.

In one embodiment, the image annotation module comprises:

the label setting module is used for representing the similar categories of the sample commodity images by the number set {0, 1}, with different style and same style represented by 1 and 0 respectively;

the labeling module is used for labeling the similar category of the commodity images as 1 if the group of commodity images are of different styles, and as 0 if the group of commodity images are of the same style.

In one embodiment, the image pre-processing module comprises:

the scaling module is used for scaling the commodity images in the commodity image sample set to a standard size;

the zero equalization module is used for carrying out zero equalization on the pixel data of each corresponding pixel point of the commodity image in the commodity image sample set;

and the subimage selection module is used for randomly selecting the sample subimages with the size smaller than that of the commodity image and the horizontal mirror images of the sample subimages in the commodity image as two groups of commodity images input into the convolutional neural network.

the training module is further used for carrying out convolution operation on the commodity image through the convolution layer, carrying out nonlinear transformation on a convolution operation result through the activation function layer, carrying out pooling operation on a nonlinear transformation result through the pooling layer to accelerate the training speed, carrying out linear transformation on a pooling operation result through the full connection layer to obtain a high-dimensional image feature, and calculating the high-dimensional image feature through a logistic regression model to obtain the similarity probability of the commodity image.

In one embodiment, the target image preprocessing module comprises:

the target image zooming module is used for zooming the target commodity image to a standard size;

and the target sub-image selection module is used for selecting a plurality of target sub-images and horizontal mirror images of the target sub-images at different positions in the target commodity image as a plurality of groups of commodity images input into the network model, and the sizes of the target sub-images meet the input requirements of the network model.

According to the above method and device for judging similar categories of commodity images, the preprocessed target commodity image is input into a network model that has been iteratively trained on a large number of commodity images, so that the similarity probability is obtained and the similar category of the target commodity image can be judged. Throughout this process the network model learns suitable features automatically through training, so suitable features do not need to be selected in advance based on manual experience. Because the network model has been trained iteratively on a large number of sample commodity images, its output has high accuracy and it operates efficiently.

Drawings

FIG. 1 is a flowchart of a method for determining similar categories of images of merchandise according to an embodiment;

FIG. 2 is a flow diagram of a method for iterative training of a sample set of images of a commodity in accordance with one embodiment;

FIG. 3 is a flowchart of a method for iterative training of a sample set of images of a commodity according to another embodiment;

FIG. 4 is a schematic diagram of sample sub-image selection in one embodiment;

FIG. 5 is a schematic diagram of a convolutional neural network in one embodiment;

FIG. 6 is a flowchart of a method for determining similar categories of images of merchandise according to another embodiment;

FIG. 7 is a schematic structural diagram of a similar category determination device for a commodity image according to an embodiment;

FIG. 8 is a diagram illustrating the architecture of a network training module in one embodiment;

FIG. 9 is a diagram illustrating the structure of a network training module in another embodiment;

FIG. 10 is a schematic structural diagram of a similar category determination device for a commodity image in another embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In one embodiment, as shown in fig. 1, a method for determining a similar category of a product image is provided, which specifically includes the following steps:

Step 102: acquiring a target commodity image.

The target commodity images are two commodity images whose similar category is to be judged. If the commodities in the two commodity images are the same style, the similar category of the two commodity images is same style; if the commodities in the two commodity images are not completely the same, the similar category of the two commodity images is different style.

Step 104: preprocessing the target commodity image into an image which meets the input requirement of a network model.

Preprocessing the target commodity image includes standardizing the size of the commodity image, the data of each pixel point and the like, so that the image meets the input requirement of the network model.

Step 106: inputting the preprocessed target commodity image into a network model to obtain the similarity probability of the target commodity image, and judging the similarity category of the target commodity image according to the similarity probability.

The network model is obtained by iterative training on a commodity image sample set comprising a plurality of groups of labeled commodity images. The architecture of the network model is designed according to the requirements of judging the similar category of commodity images, and its network parameters are obtained by supervised back propagation training, iterated over a commodity image sample set containing a plurality of groups of commodity images. The output of the network model is the predicted similarity probability of the two commodity images. When the preprocessed target commodity image is input into the network model, the similarity probability is obtained through the computation of the network model, and the similar category of the target commodity image can then be judged according to the similarity probability.

In one embodiment, where the network model comprises a connected convolutional neural network and a linear logistic regression model, as shown in fig. 2, the step of iteratively training a sample set of merchandise images comprising a plurality of sets of labeled merchandise images comprises:

Step 202: acquiring a commodity image sample set.

The commodity image sample set comprises a plurality of groups of commodity images, and the number of each group of commodity images is two. The commodity image sample set is a sample set for iterative training of the network model, the commodity image sample set comprises a plurality of identical or similar commodity images, and two of the commodity images are randomly set as a group to be used as training input of the network model. The commodity images can be collected by using an electronic commerce website, and a large number of commodity images can be collected in a short time.

Step 204: labeling the similar category of each group of commodity images.

The similar categories include different style and same style. Dividing the similar categories into only same style and different style takes into account that different people perceive the degree of similarity between images differently, whereas people's judgment of "same" versus "different" is clear and consistent. Classifying the similar categories in this way therefore reduces the interference of subjective human factors.

In one embodiment, as shown in fig. 3, the step of labeling the similar categories for each group of commodity images includes:

Step 2042: the similar categories of the commodity images are represented by the number set {0, 1}, with different style and same style represented by 1 and 0 respectively. Alternatively, different style may be represented by 0 and same style by 1.

Step 2044: if the group of commodity images are of different styles, labeling the similar category of the commodity images as 1; if the group of commodity images are of the same style, labeling the similar category of the commodity images as 0.

Whether a group of commodity images is same style or different style is judged manually. When the group is judged to be the same style, the similar category corresponding to the group is labeled 0; when the group is judged to be different styles, the similar category corresponding to the group is labeled 1. Optionally, if the similar categories are defined so that different style is represented by 0 and same style by 1, labeling is performed according to that definition.
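The labeling convention of steps 2042 and 2044 can be sketched as follows. This is a minimal illustration only: the names `label_pair` and `judge`, the reading of the model output as the probability of "different style", and the 0.5 decision threshold are all assumptions that do not appear in the original text.

```python
# Label values from the number set {0, 1} described above.
DIFFERENT_STYLE = 1
SAME_STYLE = 0


def label_pair(is_same_style):
    """Return the label for one group (pair) of commodity images.

    `is_same_style` stands for the manual judgment described in the text.
    """
    return SAME_STYLE if is_same_style else DIFFERENT_STYLE


def judge(probability_different, threshold=0.5):
    """Map a predicted probability back to a similar category.

    Assumption: the probability is that of the label 1 (different style),
    and 0.5 is used as the cut-off; neither is stated in the source.
    """
    if probability_different >= threshold:
        return DIFFERENT_STYLE
    return SAME_STYLE


labels = [label_pair(True), label_pair(False)]
decisions = [judge(0.1), judge(0.9)]
```

If the alternative convention is used (0 for different style, 1 for same style), only the two constants need to be swapped.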

Step 206: preprocessing each group of commodity images.

Before each group of commodity images are input into the network model, each group of commodity images need to be preprocessed so as to enable the commodity images to meet the input requirements of the network model.

In one embodiment, as shown in fig. 3, the step of preprocessing the commodity image comprises:

Step 2062: scaling the commodity images in the commodity image sample set to a standard size.

All commodity images in the acquired commodity image sample set are uniformly scaled to a standard size so that all images can be processed consistently in subsequent steps. For example, all commodity images may be uniformly scaled to 256 × 256 pixels.
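As a rough illustration of scaling to a standard size, the sketch below resizes an image array to 256 × 256 with nearest-neighbor sampling. The source does not say which interpolation is used, so nearest-neighbor is an assumption chosen for brevity, and `resize_nearest` is a hypothetical helper name.

```python
import numpy as np


def resize_nearest(image, out_h=256, out_w=256):
    """Scale an (H, W, C) image array to (out_h, out_w, C).

    Nearest-neighbor sampling is an assumption; the text only requires
    that every image ends up at the same standard size.
    """
    in_h, in_w = image.shape[:2]
    # For each output row/column, pick the source row/column it maps to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols[None, :]]


# Synthetic stand-in for a commodity image of arbitrary size.
original = np.zeros((300, 400, 3), dtype=np.uint8)
scaled = resize_nearest(original)

# A tiny 2 x 2 image upscaled to 4 x 4 to make the mapping visible.
tiny = np.arange(12).reshape(2, 2, 3)
upscaled = resize_nearest(tiny, 4, 4)
```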

Step 2064: zero-averaging the pixel data of each corresponding pixel point of the commodity images in the commodity image sample set.

Commodity images are generally represented in the RGB (Red, Green, Blue) color space. After the image sizes are standardized, the average value of the R, G and B channel color data is computed at each corresponding pixel point across all commodity images. This per-pixel average is then subtracted from the three-channel color data of the corresponding pixel point in every commodity image, so that the overall average of the pixel data at each corresponding pixel point across the commodity image sample set is zero, which achieves zero-averaging. Zero-averaging prevents individual values that are far larger or smaller than the rest from biasing network model training, reduces noise interference, and improves the accuracy of the model training process.
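The zero-averaging described above amounts to subtracting a per-pixel, per-channel mean image computed over the whole sample set. A minimal sketch follows; the function name `zero_mean` and the tiny synthetic sample set are assumptions used only for illustration.

```python
import numpy as np


def zero_mean(images):
    """Subtract the per-pixel, per-channel mean over the sample set.

    images: array of shape (N, H, W, 3), all already scaled to the same
    standard size. Returns the centred images and the mean image, which
    can be kept for reuse on target images at prediction time.
    """
    images = np.asarray(images, dtype=np.float64)
    mean_image = images.mean(axis=0)  # shape (H, W, 3)
    return images - mean_image, mean_image


# Synthetic sample set: 4 images of 2 x 2 pixels standing in for real data.
rng = np.random.default_rng(0)
samples = rng.integers(0, 256, size=(4, 2, 2, 3)).astype(np.float64)
centred, mean_image = zero_mean(samples)
```

After this step the mean of `centred` over the sample axis is zero at every pixel and channel.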

Step 2066: randomly selecting, in the commodity image, a sample sub-image smaller than the commodity image, and taking the sample sub-image and its horizontal mirror image as two groups of commodity images input into the convolutional neural network.

A sample sub-image whose size meets the input requirement of the network model is randomly selected from the size-standardized commodity image. For example, as shown in fig. 4, a sample sub-image 227 pixels long and 227 pixels wide is randomly selected at any position in a group of commodity images (two commodity images) with a resolution of 256 × 256, the position of the sample sub-image being the same in both commodity images. The 3-channel color data of the two sample sub-images are stacked to form 6 channels of RGB color data, which serves as one group of training input for the network model. The 6-channel RGB color data of the horizontal mirror images of the selected sample sub-images serves as another group of training input. In this way, one group of commodity images yields two groups of input through random sample sub-image selection, which increases the number of images available for network model training and helps prevent overfitting to the original sample set. The size of the sample sub-image is determined by the size of the standardized commodity image, but it should not be too small, so that it still reflects the overall features of the image. In addition, a vertical mirror image or another transformed version of the sample sub-image may also be used as the other group of training input; the choice is not limited to the horizontal mirror image described in this embodiment.
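The sub-image selection in this step can be sketched as below: one random 227 × 227 window is taken at the same position in both images of a pair, the two crops are stacked into 6 channels, and the horizontal mirror gives a second input. The function name `make_training_inputs` and the channel-last array layout are assumptions.

```python
import numpy as np


def make_training_inputs(img_a, img_b, crop=227, rng=None):
    """Build two 6-channel training inputs from one pair of images.

    img_a, img_b: arrays of shape (H, W, 3) at the standard size.
    Returns (stacked, mirrored), each of shape (crop, crop, 6).
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img_a.shape
    # The same random window position is used in both images.
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    sub_a = img_a[top:top + crop, left:left + crop, :]
    sub_b = img_b[top:top + crop, left:left + crop, :]
    # Stack the two 3-channel crops into one 6-channel input.
    stacked = np.concatenate([sub_a, sub_b], axis=2)
    # Horizontal mirror: reverse the width axis.
    mirrored = stacked[:, ::-1, :]
    return stacked, mirrored


# Synthetic 256 x 256 pair standing in for two commodity images.
rng = np.random.default_rng(0)
img_a = rng.random((256, 256, 3))
img_b = rng.random((256, 256, 3))
x, x_flip = make_training_inputs(img_a, img_b, rng=rng)
```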

Step 208: inputting each group of preprocessed commodity images into a convolutional neural network and outputting high-dimensional image features, inputting the high-dimensional image features into a logistic regression model and outputting the predicted similarity probability of the commodity images, calculating prediction errors according to the similar categories and the similarity probability labeled by each group of commodity images, and performing iterative training on the convolutional neural network and the linear logistic regression model by adopting a supervised back propagation method to obtain a deep learning network model.

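In one embodiment, the convolutional neural network comprises a plurality of convolutional layers, activation function layers, pooling layers and fully connected layers, wherein the activation function adopted in the activation function layer is a hyperbolic tangent function, and the pooling layer adopts max pooling. The step of inputting each group of preprocessed commodity images into the convolutional neural network and outputting high-dimensional image features, and inputting the high-dimensional image features into the logistic regression model and outputting the predicted similarity probability of the commodity images, comprises: performing a convolution operation on the commodity image through the convolutional layer, performing a nonlinear transformation on the convolution result through the activation function layer, performing a pooling operation on the nonlinear transformation result through the pooling layer to accelerate training, performing a linear transformation on the pooling result through the fully connected layer to obtain high-dimensional image features, and computing on the high-dimensional image features through the logistic regression model to obtain the similarity probability of the commodity images.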

Specifically, in the present embodiment, as shown in fig. 5, a schematic diagram of a convolutional neural network is shown, and the description is given by taking 227 × 227 as an example of the resolution of a sample sub-image input to a network model.

The convolutional neural network and the logistic regression model constitute a network model with 11 layers, which are, in order: a convolutional layer, a pooling layer, a convolutional layer, a pooling layer, three convolutional layers, a pooling layer, two fully connected layers and a logistic regression model. In addition, activation function layers, not shown in the figure, are also present in the network model.

The input of the first convolutional layer is the preprocessed sample sub-images, that is, the 6-channel RGB color data of two images each 227 pixels long and wide. The representation of the picture changes each time it passes through one layer of nodes in the network. After a convolutional layer, the number of channels is determined by the convolution kernels of that layer and the length and width of the representation are reduced; after a pooling layer, the picture is reduced again. The pooling layers use max-pooling: in each channel, a small n × n region is selected and its maximum value is taken as the output of the pooling layer. Max-pooling effectively reduces the number of parameters and shrinks the representation of the picture, which speeds up training.
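Max-pooling as described here can be sketched with a plain loop over output positions. The channel-first layout and the function name `max_pool` are assumptions; the default window of 3 with stride 2 matches the pooling layers listed later in this embodiment.

```python
import numpy as np


def max_pool(x, size=3, stride=2):
    """Max-pool a (C, H, W) array with a size x size window.

    Each output value is the maximum of one window in one channel.
    """
    c, h, w = x.shape
    h_out = (h - size) // stride + 1
    w_out = (w - size) // stride + 1
    out = np.empty((c, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            window = x[:, i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[:, i, j] = window.max(axis=(1, 2))
    return out


# One channel of 5 x 5 values 0..24, pooled down to 2 x 2.
feature_map = np.arange(25, dtype=np.float64).reshape(1, 5, 5)
pooled = max_pool(feature_map)
print(pooled)  # [[[12. 14.] [22. 24.]]]
```

Applied to a 96-channel 55 × 55 representation, the same function returns a 96-channel 27 × 27 representation.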

In one embodiment, the network model contains other layers that are not depicted. For example, a relu layer (activation function layer) follows each convolutional layer and fully connected layer. The relu layer keeps only the part of the previous layer's output that is greater than 0, which effectively removes unimportant background interference from the picture and increases the accuracy of commodity picture classification. A normalization layer follows the pooling layer and normalizes coefficients across different channels to reduce the interference of noise points in the picture. A dropout layer follows the fully connected layer: the output of the fully connected layer is a multi-dimensional matrix, and its elements are randomly discarded with a probability of 50%. Adding the dropout layer can increase the complexity of the network model, reduce direct coupling between nodes, improve the classification of commodity images and prevent overfitting.
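The relu and dropout operations mentioned above can be sketched as follows. The sketch only zeroes elements, as the text describes; rescaling the kept elements, which many frameworks do during training, is not mentioned in the source and is left out. The function names are assumptions.

```python
import numpy as np


def relu(x):
    """Keep values greater than 0 and replace the rest with 0."""
    return np.maximum(x, 0.0)


def dropout(x, p=0.5, rng=None):
    """Randomly discard (zero) each element with probability p."""
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(x.shape) >= p  # True where the element survives
    return x * keep


values = np.array([-1.0, 0.0, 2.0])
activated = relu(values)

features = np.arange(1.0, 9.0)
dropped = dropout(features, p=0.5, rng=np.random.default_rng(0))
```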

The specific parameters of each layer of the network model are as follows:

The first layer is a convolutional layer with a kernel size of 11 × 11 pixels and a kernel stride of 4 pixels, followed by a relu layer. Each image yields an intermediate variable of 96 channels of size 55 × 55.

The second layer is a pooling layer with a kernel size of 3 × 3 pixels and a kernel stride of 2 pixels, followed by a normalization layer. Each image yields an intermediate variable of 96 channels of size 27 × 27.

The third layer is a convolutional layer with a kernel size of 5 × 5 pixels and a kernel stride of 1 pixel, followed by a relu layer. Each image yields an intermediate variable of 256 channels of size 27 × 27.

The fourth layer is a pooling layer with a kernel size of 3 × 3 pixels and a kernel stride of 2 pixels, followed by a normalization layer. Each image yields an intermediate variable of 256 channels of size 13 × 13.

The fifth layer is a convolutional layer with a kernel size of 3 × 3 pixels and a kernel stride of 1 pixel, followed by a relu layer. Each image yields an intermediate variable of 384 channels of size 13 × 13.

The sixth layer is a convolutional layer with a kernel size of 3 × 3 pixels and a kernel stride of 1 pixel, followed by a relu layer. Each image yields an intermediate variable of 384 channels of size 13 × 13.

The seventh layer is a convolutional layer with a kernel size of 3 × 3 pixels and a kernel stride of 1 pixel, followed by a relu layer. Each image yields an intermediate variable of 256 channels of size 13 × 13.

The eighth layer is a pooling layer with a kernel size of 3 × 3 pixels and a kernel stride of 2 pixels. Each image yields an intermediate variable of 256 channels of size 6 × 6.

The ninth layer is a fully connected layer whose output is a 4098-dimensional variable, followed by a relu layer and a dropout layer. Each image yields an intermediate variable of 4098 channels of size 1 × 1.

The tenth layer is a fully connected layer whose output is a 4098-dimensional variable, followed by a relu layer and a dropout layer. Each image yields an intermediate variable of 4098 channels of size 1 × 1.

The eleventh layer is a logistic regression model. The 4098-dimensional high-dimensional image features output by the convolutional neural network are input into the logistic regression model to calculate the predicted similarity probability.
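The spatial sizes listed for layers one to eight can be checked with the usual output-size formula, floor((input + 2 × padding − kernel) / stride) + 1. The source gives kernel sizes and strides but no padding, so the padding values below (2 for the 5 × 5 convolution, 1 for the 3 × 3 convolutions, 0 elsewhere) are assumptions chosen because they reproduce the stated sizes; the layer names are hypothetical labels.

```python
def out_size(in_size, kernel, stride, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad); the pad column is an assumption.
layers = [
    ("conv1", 11, 4, 0),
    ("pool2", 3, 2, 0),
    ("conv3", 5, 1, 2),
    ("pool4", 3, 2, 0),
    ("conv5", 3, 1, 1),
    ("conv6", 3, 1, 1),
    ("conv7", 3, 1, 1),
    ("pool8", 3, 2, 0),
]

size = 227  # side length of the input sample sub-image
sizes = []
for name, kernel, stride, pad in layers:
    size = out_size(size, kernel, stride, pad)
    sizes.append(size)
    print(name, size)
# sizes -> [55, 27, 27, 13, 13, 13, 13, 6]
```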

The high-dimensional image features output by the convolutional neural network represent the features of the image. From these features, a low-dimensional feature database can be built in place of the high-dimensional information of the images, and a nearest-neighbor search framework can be built on the low-dimensional feature database. The framework can be used to search for an image's neighbors in the low-dimensional feature space and thus obtain similar pictures, and it can also serve as a pre-screening step for same-style and different-style judgment. The training output of the convolutional neural network is therefore extensible and portable.
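A neighbor search over a feature database can be sketched with a brute-force distance computation. The source does not specify how the features are reduced in dimension, which distance is used, or how the index is built, so Euclidean distance, the exhaustive scan and the name `nearest_neighbours` are all assumptions.

```python
import numpy as np


def nearest_neighbours(database, query, k=3):
    """Return indices and distances of the k closest feature vectors.

    database: (N, D) array of stored features; query: (D,) feature.
    """
    distances = np.linalg.norm(database - query, axis=1)
    order = np.argsort(distances)[:k]
    return order.tolist(), distances[order].tolist()


# Toy 2-dimensional features standing in for stored image features.
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
idx, dist = nearest_neighbours(database, np.array([0.9, 0.1]), k=2)
print(idx)  # [1, 0]
```

The returned candidates could then be passed, pair by pair, to the same-style or different-style judgment as a pre-screening step.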

And finally, training the network model in a supervised back propagation mode by taking the collected preprocessed commodity images as training samples to obtain a deep learning network model which takes the data of two similar commodity images as input and takes the predicted similarity probability as output.

In one embodiment, a method of training a network model includes the steps of: the method comprises the steps of initially randomizing initial weight values in a network, training all layers in a network model together, calculating a loss function value by a primary forward transfer process, calculating the loss function value by subtracting similar class values marked correspondingly to an image from a predicted similarity probability, obtaining a difference value by taking a logarithm of the difference value, performing primary reverse transfer according to the loss function value, updating the weight values in the network in a gradient descending mode, alternately performing forward transfer and reverse transfer, and continuously updating the weight values in a neural network, wherein the learning rate is slowly reduced, so that the accuracy of the similar class of the commodity image is continuously improved.

The stochastic gradient descent method randomly selects a subset of the training samples at each step, performs gradient descent on that subset, and updates the weight values in the neural network. A better neural network parameter configuration can thus be reached more efficiently and quickly, and the commodity classification accuracy over the whole training set is higher.
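The alternation of forward pass, backward pass and weight update on random subsets can be sketched on a plain logistic regression model. This is a simplified illustration, not the network of the patent: the log loss, the decay schedule `lr0 / (1 + decay * epoch)`, the batch size and the synthetic data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y, eps=1e-12):
    """Mean logarithmic loss between predictions p and labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def train_sgd(x, y, epochs=50, batch_size=16, lr0=0.5, decay=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=x.shape[1])    # random initial weight values
    b = 0.0
    for epoch in range(epochs):
        lr = lr0 / (1.0 + decay * epoch)           # slowly decreasing learning rate
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]  # random subset of the samples
            p = sigmoid(x[idx] @ w + b)            # forward pass
            grad = p - y[idx]                      # gradient of the log loss w.r.t. the logit
            w -= lr * x[idx].T @ grad / len(idx)   # backward pass and weight update
            b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 4))                      # hypothetical feature vectors
y = (x[:, 0] + 0.5 * x[:, 1] > 0).astype(float)    # synthetic labels: 1 = different, 0 = same
loss_before = log_loss(sigmoid(x @ np.zeros(4)), y)
w, b = train_sgd(x, y)
loss_after = log_loss(sigmoid(x @ w + b), y)
accuracy = float(np.mean((sigmoid(x @ w + b) >= 0.5) == (y == 1.0)))
```

Each update uses only one mini-batch, which is what makes a step cheap compared with a gradient computed over the full sample set.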

In an embodiment, as shown in fig. 6, a method for judging the similar category of commodity images is provided, which specifically includes the following steps:

Step 602: acquire the target commodity images.

The target commodity images are two commodity images whose similar category is to be judged.

Step 603: scale the target commodity images to a standard size.

The acquired target commodity images are scaled to a standard size that matches the standard size to which the commodity images in the commodity image sample set were scaled when the network model was trained. For example, if the standard size used for network model training is 256 × 256 pixels, the target commodity images are scaled to 256 × 256 pixels.

In one embodiment, after the target commodity images are scaled to the standard size, the mean image color data obtained from all commodity images in the sample set during network model training is subtracted, pixel by pixel and for each of the three RGB channels, from the image color data of each target commodity image. This makes the data of the target commodity images better suited to the network model.
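The scaling and mean-subtraction steps can be sketched as follows. The nearest-neighbor resampling, the function names and the constant mean image are assumptions made for illustration; the patent does not specify the interpolation method, and a real mean image would be computed from the training sample set.

```python
import numpy as np

def resize_nearest(image, size=256):
    """Nearest-neighbor resize of an H x W x 3 array to size x size x 3."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows][:, cols]

def subtract_mean(image, mean_image):
    """Subtract the per-pixel, per-channel training mean from the image."""
    return image.astype(np.float32) - mean_image

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)   # hypothetical target image
mean_image = np.full((256, 256, 3), 120.0, dtype=np.float32)     # hypothetical training mean
scaled = resize_nearest(raw)
normalized = subtract_mean(scaled, mean_image)
```

The same mean image must be used at training time and at judgment time, otherwise the inputs seen by the network model are shifted relative to what it was trained on.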

Step 605: select a plurality of target sub-images at different positions in the target commodity images, together with the horizontal mirror images of those target sub-images, as a plurality of groups of commodity images to be input into the network model, wherein the size of the target sub-images meets the input requirement of the network model.

A plurality of target sub-images are randomly selected at different corresponding positions in the two size-standardized target commodity images, wherein the size of the target sub-images meets the input requirement of the network model. For example, in one embodiment, five groups of target sub-images of a smaller size, 227 × 227 pixels, are selected from five positions (left, right, upper, lower and middle) of the target commodity images, whose resolution is 256 × 256 pixels, as commodity images to be input into the network model. The horizontal mirror images of these five groups of target sub-images are used as five further groups, giving ten groups of input images. The 3-channel color data of the two target sub-images in a group is stacked into 6-channel RGB color data as one group of input to the network model. Selecting target sub-images in this way increases the number of inputs, and the prediction probabilities of the multiple groups can be averaged in the later step of calculating the similarity probability, which improves prediction accuracy.
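The construction of the ten 6-channel inputs can be sketched as below. The exact crop offsets are an assumption: "left", "right", "upper" and "lower" are read here as crops pushed to that edge and centered along the other axis, which the patent does not spell out. The function names are likewise hypothetical.

```python
import numpy as np

def five_crop_offsets(image_size=256, crop=227):
    """(top, left) offsets for left, right, upper, lower and middle crops (assumed layout)."""
    margin = image_size - crop
    mid = margin // 2
    return [(mid, 0), (mid, margin), (0, mid), (margin, mid), (mid, mid)]

def build_inputs(img_a, img_b, crop=227):
    """Return ten 6-channel inputs: five crop pairs plus their horizontal mirrors."""
    groups = []
    for top, left in five_crop_offsets(img_a.shape[0], crop):
        a = img_a[top:top + crop, left:left + crop]
        b = img_b[top:top + crop, left:left + crop]   # same position in the second image
        pair = np.concatenate([a, b], axis=2)         # 3 + 3 = 6 channels
        groups.append(pair)
        groups.append(pair[:, ::-1])                  # horizontal mirror of both crops
    return np.stack(groups)

rng = np.random.default_rng(0)
img_a = rng.normal(size=(256, 256, 3))   # hypothetical preprocessed target images
img_b = rng.normal(size=(256, 256, 3))
inputs = build_inputs(img_a, img_b)
```

Cropping both images at the same position keeps the two halves of each 6-channel input spatially aligned.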

It should be noted that the size of the target sub-images is determined by the size of the standardized commodity image, but the target sub-images should not be too small and should reflect the overall characteristics of the image. In addition, vertical mirror images of the target sub-images, or images in other forms, may also be selected as further groups of input to the network model; the selection is not limited to the horizontal mirror images described in this embodiment. Likewise, the selection positions and the number of target sub-images are not limited to the five groups (left, right, upper, lower and middle) described in this embodiment; they may also be selected randomly or in other ways, and the number may be set according to specific needs.

Step 606: input the preprocessed target commodity images into the network model to obtain the similarity probability of the target commodity images, and judge the similar category of the target commodity images according to the similarity probability.

The 6-channel RGB color data of the groups of target sub-images is input into the trained deep learning network model, which computes the similarity probability of each group. The average of these similarity probabilities is then compared with the preset similar category values to judge the similar category of the target commodity images. In one embodiment, the similar category value is 0 for the same style and 1 for a different style. In this embodiment the similarity probability represents a similarity distance: the smaller the value, the smaller the distance and the more similar the target commodity images (0 means the same style, that is, most similar or identical); the larger the value, the larger the distance and the greater the difference between the target commodity images (1 means a different style, that is, most dissimilar or completely different). For example, if the output similarity probability is 0.8, which indicates that the images are nearly completely different, the similar category of the target commodity images is judged to be a different style; if the output similarity probability is 0.2, which indicates that the images are nearly the same, the similar category is judged to be the same style.
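The averaging and comparison step can be sketched in a few lines. The patent only says that the average is compared with the preset similar category values; assigning the nearer of 0 and 1 through a 0.5 threshold is an assumption, as is the function name.

```python
def judge_similar_category(probabilities, threshold=0.5):
    """Average the per-group similarity probabilities and pick the nearer category value.

    Returns (mean probability, category value, description), where 0 is the
    same style and 1 is a different style. The 0.5 threshold is an assumption.
    """
    if not probabilities:
        raise ValueError("at least one similarity probability is required")
    mean_prob = sum(probabilities) / len(probabilities)
    label = 1 if mean_prob >= threshold else 0
    return mean_prob, label, "different style" if label == 1 else "same style"

# Hypothetical outputs of the network model for several groups of target sub-images.
far_apart = judge_similar_category([0.8, 0.7, 0.9, 0.85, 0.75])
close = judge_similar_category([0.2, 0.1, 0.3])
```

Averaging over the groups smooths out the effect of any single crop or mirror that happens to be unrepresentative.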

In the commodity image similar category judgment method described above, the preprocessed target commodity images are input into a network model that has been iteratively trained on a large number of commodity images, so that the similarity probability is obtained and the similar category of the target commodity images is judged. Throughout this process the network model learns suitable features automatically, so features need not be selected in advance on the basis of manual experience; and because the network model has been trained iteratively on a large number of sample commodity images, its output is highly accurate and fast to compute.

In the commodity image similar category judgment method, feature extraction and pattern classification are carried out simultaneously in the deep learning of the network model. The high-dimensional image features output by the convolutional neural network represent the content of the image; a low-dimensional feature database replacing the high-dimensional information of the image can be built from them, a neighbor search framework can be built on the low-dimensional feature database, and the low-dimensional feature database can also serve as a pre-screening step before the same-style/different-style judgment. Thus, the training output of the convolutional neural network has the advantages of being scalable and portable.

In one embodiment, as shown in fig. 7, an apparatus 700 for judging the similar category of commodity images is provided. The apparatus includes a target image acquisition module 702, a target image preprocessing module 704 and a similar category determining module 706, wherein:

The target image acquisition module 702 is used for acquiring target commodity images, where the target commodity images are two commodity images whose similar category is to be judged.

The target image preprocessing module 704 is used for preprocessing the target commodity images into images that meet the input requirement of the network model.

The similar category determining module 706 is used for inputting the preprocessed target commodity images into the network model to obtain the similarity probability of the target commodity images, and for judging the similar category of the target commodity images according to the similarity probability, where the network model is obtained by iterative training on a commodity image sample set including multiple groups of labeled commodity images.

In one embodiment, the apparatus 700 further includes a network training module 710. As shown in fig. 8, the network training module 710 includes a sample image acquisition module 7102, an image labeling module 7104, an image preprocessing module 7106 and a training module 7108, wherein:

The sample image acquisition module 7102 is used for acquiring a commodity image sample set, where the commodity image sample set includes multiple groups of commodity images and each group contains two commodity images.

The image labeling module 7104 is used for labeling the similar category of each group of commodity images, where the similar categories include different style and same style.

The image preprocessing module 7106 is used for preprocessing each group of commodity images in the commodity image sample set.

The training module 7108 is used for inputting each group of preprocessed commodity images into a convolutional neural network and outputting high-dimensional image features, inputting the high-dimensional image features into a logistic regression model and outputting the predicted similarity probability of the commodity images, calculating prediction errors according to the similar categories and the similarity probability labeled by each group of commodity images, and performing iterative training on the convolutional neural network and the linear logistic regression model by adopting a supervised back propagation method to obtain a deep learning network model.

In one embodiment, as shown in fig. 9, the image labeling module 7104 includes a label setting module 71042 and a labeling module 71044, wherein:

The label setting module 71042 is used for representing the similar categories of the sample commodity images by the number set {0, 1}, where a different style and the same style are represented by 1 and 0 respectively.

The labeling module 71044 is used for labeling the similar category of a group of commodity images as 1 if the group is of different styles, and as 0 if the group is of the same style.

In one embodiment, the image preprocessing module 7106 includes a scaling module 71062, a zero-averaging module 71064 and a sub-image selection module 71066, wherein:

The scaling module 71062 is used for scaling the commodity images in the commodity image sample set to a standard size.

The zero-averaging module 71064 is used for zero-averaging the pixel data of each corresponding pixel point of the commodity images in the commodity image sample set.

The sub-image selection module 71066 is used for randomly selecting, in each commodity image, a sample sub-image smaller than the commodity image, and using the sample sub-images and their horizontal mirror images as two groups of commodity images input into the convolutional neural network.

In one embodiment, the convolutional neural network comprises a plurality of convolutional layers, activation function layers, pooling layers and fully connected layers, where the activation function used in the activation function layers is the hyperbolic tangent function and the pooling layers use max pooling. The training module 7108 is also used for performing a convolution operation on the commodity images through the convolutional layer, applying a nonlinear transformation to the convolution result through the activation function layer, pooling the result of the nonlinear transformation through the pooling layer to speed up training, applying a linear transformation to the pooling result through the fully connected layer to obtain the high-dimensional image feature, and computing the similarity probability of the commodity images from the high-dimensional image feature through the logistic regression model.
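The order of operations described above (convolution, hyperbolic tangent, max pooling, fully connected layer, logistic regression) can be sketched as a single forward pass. This is a toy illustration under stated assumptions: one convolutional layer instead of several, a 12 × 12 × 6 input instead of a 227 × 227 × 6 one, randomly drawn weights, and hypothetical names and layer sizes throughout.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid cross-correlation. x: H x W x C, kernels: K x kh x kw x C."""
    k, kh, kw, _ = kernels.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow, k))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w, c = x.shape
    h, w = h - h % size, w - w % size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size, c)
    return blocks.max(axis=(1, 3))

def forward(pair, params):
    """Similarity probability for one 6-channel image pair."""
    feat = conv2d(pair, params["kernels"])                     # convolutional layer
    feat = np.tanh(feat)                                       # hyperbolic tangent activation
    feat = max_pool(feat)                                      # max pooling
    feature = params["fc_w"] @ feat.ravel() + params["fc_b"]   # fully connected layer
    logit = params["lr_w"] @ feature + params["lr_b"]          # logistic regression
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)
pair = rng.normal(size=(12, 12, 6))    # tiny stand-in for a 227 x 227 x 6 input
params = {
    "kernels": rng.normal(scale=0.1, size=(4, 3, 3, 6)),
    "fc_w": rng.normal(scale=0.1, size=(16, 5 * 5 * 4)),
    "fc_b": np.zeros(16),
    "lr_w": rng.normal(scale=0.1, size=16),
    "lr_b": 0.0,
}
prob = forward(pair, params)
pooled = max_pool(np.arange(16.0).reshape(4, 4, 1))
```

With untrained weights the output carries no meaning; the sketch only shows how the layers connect and that the final sigmoid keeps the result between 0 and 1.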

In one embodiment, as shown in fig. 10, the target image preprocessing module 704 includes a target image scaling module 7042 and a target sub-image selection module 7044, wherein:

The target image scaling module 7042 is used for scaling the target commodity images to a standard size.

The target sub-image selection module 7044 is used for selecting, at different positions in the target commodity images, a plurality of target sub-images and the horizontal mirror images of the target sub-images as a plurality of groups of commodity images input into the network model, where the size of the target sub-images meets the input requirement of the network model.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for judging the similar category of a commodity image is used for reprocessing an image retrieval result and comprises the following steps:

acquiring target commodity images, wherein the target commodity images are two commodity images of similar categories to be judged, and the target commodity images comprise commodity images to be searched and retrieved images corresponding to the commodity images to be searched;

preprocessing the target commodity image into an image which meets the input requirement of a network model;

inputting the preprocessed target commodity image into the network model to obtain the similarity probability of the target commodity image, judging whether the similarity category of the target commodity image is the same or different according to the similarity probability to obtain a classification result that the commodity image to be searched and the retrieved image are the same or different, wherein the network model is obtained by carrying out iterative training on a commodity image sample set comprising a plurality of groups of marked commodity images;

and reasonably setting the image display arrangement mode of the images according to the classification result.

2. The method for determining similar categories of commodity images according to claim 1, wherein the network model includes a connected convolutional neural network and a linear logistic regression model, and the step of iteratively training a commodity image sample set including a plurality of sets of labeled commodity images includes:

acquiring the commodity image sample set, wherein the commodity image sample set comprises a plurality of groups of commodity images, and the number of the commodity images in each group is two;

labeling similar categories of the commodity images in each group, wherein the similar categories comprise different style and same style;

preprocessing each group of commodity images;

inputting each group of the preprocessed commodity images into the convolutional neural network and outputting high-dimensional image features, inputting the high-dimensional image features into the linear logistic regression model and outputting the predicted similarity probability of the commodity images, calculating prediction errors according to the similar categories marked by each group of the commodity images and the similarity probability, and performing iterative training on the convolutional neural network and the linear logistic regression model by adopting a supervised back propagation method to obtain a deep learning network model.

3. The method for determining the similar category of a product image according to claim 2, wherein the step of labeling the similar category for each group of the product images includes:

the similar categories of the commodity images are represented by a number set {0, 1}, and the different style and the same style are represented by 1 and 0 respectively;

if a group of the commodity images are of different styles, marking the similar category of the commodity images as 1;

and if a group of the commodity images are of the same style, marking the similar category of the commodity images as 0.

4. The method for determining the similarity category of a product image according to claim 2, wherein the step of preprocessing the product image includes:

scaling the commodity image in the commodity image sample set to a standard size;

5. The method for determining the similar categories of the commodity images according to claim 2, wherein the convolutional neural network comprises a plurality of convolutional layers, activation function layers, pooling layers and fully connected layers, wherein the activation function adopted in the activation function layers is a hyperbolic tangent function, and the pooling layers adopt a maximum pooling mode;

the step of inputting each group of the preprocessed commodity images into a convolutional neural network and outputting high-dimensional image features, and the step of inputting the high-dimensional image features into a logistic regression model and outputting the predicted similarity probability of the commodity images comprises the following steps:

performing convolution operation on the commodity image through the convolution layer, performing nonlinear transformation on a convolution operation result through the activation function layer, performing pooling operation on a nonlinear transformation result through the pooling layer to accelerate training speed, performing linear transformation on a pooling operation result through the full-connection layer to obtain the high-dimensional image feature, and calculating the high-dimensional image feature through the logistic regression model to obtain the similarity probability of the commodity image.

6. The method for determining the similar category of a product image according to claim 1, wherein the step of preprocessing the target product image into an image that meets a network model input requirement includes:

scaling the target commodity image to a standard size;

and selecting a plurality of target sub-images and horizontal mirror images of the target sub-images at different positions in the target commodity image as a plurality of groups of commodity images input into the network model, wherein the sizes of the target sub-images meet the input requirements of the network model.

7. A product image similar type determination device for reprocessing an image search result, comprising: a target image acquisition module, used for acquiring target commodity images, wherein the target commodity images are two commodity images whose similar category is to be judged;

the target image preprocessing module is used for preprocessing the target commodity image into an image which meets the input requirement of a network model, wherein the target commodity image comprises a commodity image to be searched and a retrieved image corresponding to the commodity image to be searched;

the similar type judging module is used for inputting the preprocessed target commodity image into a network model to obtain the similarity probability of the target commodity image, judging the similar type of the target commodity image to be the same type or different types according to the similarity probability to obtain a classification result that the commodity image to be searched and the retrieved image are the same type or different types, and the network model is obtained by carrying out iterative training on a commodity image sample set comprising a plurality of groups of marked commodity images;

and the output display module is used for reasonably setting the image display arrangement mode of the images according to the classification result.

8. The apparatus according to claim 7, wherein the network model includes a connected convolutional neural network and a linear logistic regression model, and further comprising a network training module that includes:

a sample image acquisition module, used for acquiring a commodity image sample set, wherein the commodity image sample set comprises a plurality of groups of commodity images, and the number of the commodity images in each group is two;

the image labeling module is used for labeling similar categories of the commodity images in each group, wherein the similar categories comprise different style and same style;

and the training module is used for inputting each group of the preprocessed commodity images into the convolutional neural network and outputting high-dimensional image characteristics, inputting the high-dimensional image characteristics into the linear logistic regression model and outputting the predicted similarity probability of the commodity images, calculating prediction errors according to the similar categories marked by each group of the commodity images and the similarity probability, and performing iterative training on the convolutional neural network and the linear logistic regression model by adopting a supervised back propagation method to obtain a deep learning network model.

9. The apparatus for determining the similarity category of a product image according to claim 8, wherein the image labeling module includes:

the label setting module is used for representing the similar categories of the sample commodity images by using a number set {0, 1}, wherein the different style and the same style are represented by 1 and 0 respectively;

the labeling module is used for labeling the similar category of the commodity image as 1 if a group of the commodity images are different in style;

10. The apparatus according to claim 8, wherein the image preprocessing module includes:

a scaling module for scaling the commodity image in the commodity image sample set to a standard size;

the zero equalization module is used for performing zero equalization on pixel data of each corresponding pixel point of the commodity image in the commodity image sample set;

and the subimage selection module is used for randomly selecting the sample subimages with the size smaller than that of the commodity image in the commodity image and the horizontal mirror images of the sample subimages as two groups of commodity images input into the convolutional neural network.

11. The commodity image similar category determination device according to claim 8, wherein the convolutional neural network includes a plurality of convolutional layers, activation function layers, pooling layers, and fully connected layers, the activation function used in the activation function layers is a hyperbolic tangent function, and the pooling layers use a maximum pooling scheme;

the training module is further configured to perform convolution operation on the commodity image through the convolution layer, perform nonlinear transformation on a result of the convolution operation through the activation function layer, perform pooling operation on a result of the nonlinear transformation through the pooling layer to increase training speed, perform linear transformation on a result of the pooling operation through the full connection layer to obtain the high-dimensional image feature, and calculate the high-dimensional image feature through the logistic regression model to obtain the similarity probability of the commodity image.

12. The apparatus according to claim 7, wherein the target image preprocessing module includes: