CN111291765A - Method and device for determining similar pictures

Method and device for determining similar pictures

Info

Publication number
CN111291765A
Authority
CN
China
Prior art keywords
picture
feature
candidate
sample
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811495715.3A
Other languages
Chinese (zh)
Inventor
Zhang Chao (张超)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811495715.3A
Publication of CN111291765A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443: Local feature extraction by analysis of parts of the pattern, by matching or filtering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a method and a device for determining similar pictures. One embodiment of the method comprises: inputting a target picture into a feature extraction network in a trained image similarity evaluation model for image feature extraction, to obtain a feature vector of the target picture; and determining similar pictures of the target picture from a candidate picture set based on the feature vector of the target picture and a feature matrix of the candidate picture set acquired in advance. The feature matrix of the candidate picture set is constructed from the feature vector of each candidate picture in the candidate picture set, and the feature vector of each candidate picture is obtained by extracting features of the candidate picture with the feature extraction network in the trained image similarity evaluation model. The method and the device improve the accuracy and efficiency of searching for similar pictures.

Description

Method and device for determining similar pictures
Technical Field
Embodiments of the present application relate to the field of computer technology, in particular to the field of image processing, and more particularly to a method and a device for determining similar pictures.
Background
With the development of network technology, more and more internet data is generated. Based on crawler technology, a search engine can collect and organize massive amounts of internet data, and a user can retrieve data content through the search engine by entering key description information. The most commonly used key description information is text keywords, which are matched against the data during a search.
On some platforms, data content is described in diverse ways; for example, data content on e-commerce platforms is mostly described with pictures and text. Traditional text-keyword retrieval on such platforms returns large volumes of results that require manual secondary screening. In addition, in some scenarios a user wants to view content similar to known data content: data content on the network can be captured in advance by a background search engine, and similar content can be searched according to an index constructed from the text description information of the data content. However, this way of searching for content depends strongly on the data indexing scheme of the background search engine.
Disclosure of Invention
The embodiment of the application provides a method and a device for determining similar pictures.
In a first aspect, an embodiment of the present application provides a method for determining similar pictures, including: inputting the target picture into a feature extraction network in a trained image similarity evaluation model for image feature extraction to obtain a feature vector of the target picture; determining similar pictures of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of the candidate picture set acquired in advance; the feature matrix of the candidate picture set is constructed by the feature vector of each candidate picture in the candidate picture set, and the feature vector of the candidate picture is obtained by extracting the features of the candidate picture based on a feature extraction network in a trained image similarity evaluation model.
In some embodiments, the feature vector of the candidate picture is a normalized row feature vector, and the feature vector of the target picture is a normalized column feature vector; in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a row vector in the feature matrix; the determining of the similar picture of the target picture from the candidate picture set based on the feature vector of the target picture and the feature matrix of the candidate picture set acquired in advance includes: determining the maximum value of elements in a result column vector obtained by multiplying the feature matrix by the feature vector of the target picture, and determining that a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result column vector in the feature matrix is a similar picture of the target picture.
In some embodiments, the feature vector of the candidate picture is a normalized column feature vector, and the feature vector of the target picture is a normalized row feature vector; in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a column vector in the feature matrix; the determining of the similar picture of the target picture from the candidate picture set based on the feature vector of the target picture and the feature matrix of the candidate picture set acquired in advance includes: determining the maximum value of elements in a result row vector obtained by multiplying the feature vector of the target picture by the feature matrix, and determining a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result row vector in the feature matrix as a similar picture of the target picture.
In some embodiments, the image similarity evaluation model includes a similarity calculation network and two weight-sharing feature extraction networks; the method further comprises: training, based on a sample picture pair set constructed from the candidate pictures, to obtain the trained image similarity evaluation model.
In some embodiments, the sample picture pair set includes a sample picture pair formed by candidate pictures and similar attribute labeling information of the sample picture pair; the training of the sample picture pair set constructed based on the candidate pictures to obtain the trained image similarity evaluation model comprises the following steps: respectively inputting two sample pictures in the sample picture pair into two feature extraction networks in an image similarity evaluation model to be trained to obtain feature vectors of the two sample pictures in the sample picture pair; calculating the similarity of the two sample pictures in the sample picture pair by adopting a similarity calculation network in an image similarity evaluation model to be trained on the basis of the feature vectors of the two sample pictures in the sample picture pair; and iteratively adjusting the weight of the feature extraction network in the image similarity evaluation model to be trained by adopting a back propagation method based on a preset loss function so that the value of the loss function meets a preset convergence condition, wherein the value of the loss function is used for representing the difference between the similarity of the sample picture pair calculated by the image similarity evaluation model to be trained and the labeling information of the similarity attribute of the corresponding sample picture pair.
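The training procedure above can be sketched in miniature. In the following toy numpy example, the two weight-sharing feature extraction networks are stood in for by a single shared linear map W, the similarity calculation network is a sigmoid over the dot product of normalized embeddings, and back propagation is approximated by finite-difference gradient descent; all of these are illustrative simplifications, not the patent's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))  # weights shared by both branches

def embed(x, W):
    v = W @ x                        # shared feature extraction branch
    return v / np.linalg.norm(v)     # normalized feature vector

def similarity(x1, x2, W):
    # Similarity network: cosine similarity squashed into (0, 1).
    return 1.0 / (1.0 + np.exp(-(embed(x1, W) @ embed(x2, W))))

def loss(pairs, labels, W):
    # Preset loss: binary cross-entropy between the computed similarity
    # and the 0/1 similar-attribute labels of the sample picture pairs.
    p = np.array([similarity(a, b, W) for a, b in pairs])
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# Sample picture pair set: one similar pair (label 1) and one dissimilar
# pair (label 0), with 8-dim vectors standing in for pictures.
x = rng.normal(size=8)
pairs = [(x, x + 0.05 * rng.normal(size=8)), (x, rng.normal(size=8))]
labels = np.array([1.0, 0.0])

init_loss = loss(pairs, labels, W)
eps, lr = 1e-5, 0.2
for _ in range(150):                 # iterate toward the convergence condition
    base = loss(pairs, labels, W)
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy()
            Wp[i, j] += eps
            grad[i, j] = (loss(pairs, labels, Wp) - base) / eps
    W -= lr * grad                   # adjust the shared weights
final_loss = loss(pairs, labels, W)
```

Because the weights are shared, a single gradient step moves both "branches" at once, which is what makes the Siamese arrangement learn a common embedding space.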
In some embodiments, the above method further comprises: acquiring user feedback information for evaluating the similarity between the similar picture of the target picture and the target picture; and determining similar attribute labeling information between the target picture and the similar pictures of the target picture based on the user feedback information, and adding the target picture and the similar pictures of the target picture as sample picture pairs into the sample picture pair set.
In a second aspect, an embodiment of the present application provides an apparatus for determining similar pictures, including: the extraction unit is configured to input the target picture into a feature extraction network in the trained image similarity evaluation model to perform image feature extraction, so as to obtain a feature vector of the target picture; the determining unit is configured to determine similar pictures of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance; the feature matrix of the candidate picture set is constructed by the feature vector of each candidate picture in the candidate picture set, and the feature vector of the candidate picture is obtained by extracting the features of the candidate picture based on a feature extraction network in a trained image similarity evaluation model.
In some embodiments, the feature vector of the candidate picture is a normalized row feature vector, and the feature vector of the target picture is a normalized column feature vector; in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a row vector in the feature matrix; the determining unit is further configured to determine, based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance, a similar picture of the target picture from the candidate picture set as follows: determining the maximum value of elements in a result column vector obtained by multiplying the feature matrix by the feature vector of the target picture, and determining that a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result column vector in the feature matrix is a similar picture of the target picture.
In some embodiments, the feature vector of the candidate picture is a normalized column feature vector, and the feature vector of the target picture is a normalized row feature vector; in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a column vector in the feature matrix; the determining unit is further configured to determine, based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance, a similar picture of the target picture from the candidate picture set as follows: determining the maximum value of elements in a result row vector obtained by multiplying the feature vector of the target picture by the feature matrix, and determining a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result row vector in the feature matrix as a similar picture of the target picture.
In some embodiments, the image similarity evaluation model includes a similarity calculation network and two weight-sharing feature extraction networks; the apparatus further includes: a training unit configured to train, based on a sample picture pair set constructed from the candidate pictures, to obtain the trained image similarity evaluation model.
In some embodiments, the sample picture pair set includes sample picture pairs formed from candidate pictures and similar attribute labeling information of the sample picture pairs; the training unit is further configured to obtain the trained image similarity evaluation model as follows: respectively inputting the two sample pictures in a sample picture pair into the two feature extraction networks in the image similarity evaluation model to be trained, to obtain feature vectors of the two sample pictures in the pair; calculating the similarity of the two sample pictures in the pair with the similarity calculation network in the image similarity evaluation model to be trained, based on the feature vectors of the two sample pictures; and iteratively adjusting the weights of the feature extraction networks in the image similarity evaluation model to be trained by back propagation based on a preset loss function, so that the value of the loss function meets a preset convergence condition, wherein the value of the loss function represents the difference between the similarity of a sample picture pair calculated by the image similarity evaluation model to be trained and the similar attribute labeling information of the corresponding sample picture pair.
In some embodiments, the apparatus further comprises an updating unit configured to: acquiring user feedback information for evaluating the similarity between the similar picture of the target picture and the target picture; and determining similar attribute labeling information between the target picture and the similar pictures of the target picture based on the user feedback information, and adding the target picture and the similar pictures of the target picture as sample picture pairs into the sample picture pair set.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement the method for determining similar pictures as provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the program, when executed by a processor, implements the method for determining similar pictures provided in the first aspect.
According to the method and the device for determining the similar picture, the target picture is input into the feature extraction network in the trained image similarity evaluation model to extract the image features, and the feature vector of the target picture is obtained; determining similar pictures of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of the candidate picture set acquired in advance; the feature matrix of the candidate picture set is constructed by the feature vector of each candidate picture in the candidate picture set, and the feature vector of the candidate picture is obtained by feature extraction of the candidate picture based on a feature extraction network in a trained image similarity evaluation model, so that the similar picture is quickly and accurately positioned, and the searching efficiency and accuracy of the similar picture are improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for determining similar pictures according to the present application;
FIG. 3 is a flow diagram of another embodiment of a method for determining similar pictures according to the present application;
fig. 4 is a schematic structural diagram of an image similarity evaluation model according to the method for determining similar pictures shown in fig. 3;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for determining similar pictures according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which the method for determining similar pictures or the apparatus for determining similar pictures of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages or the like. Various applications having an information search function, such as an e-commerce application, a comment application, a social platform application, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting internet access, including but not limited to desktop computers, smart phones, tablet computers, smart watches, laptop computers, e-book readers, and the like.
The server 105 may be a server that provides various types of information query services, such as a vendor platform server or a search engine server. The server 105 may receive the query request sent by the terminal device 101, 102, 103, find out relevant information according to the query condition defined in the query request, and feed back the found information to the terminal device 101, 102, 103 through the network 104.
The terminal devices 101, 102, and 103 may include components for performing computation (e.g., processors such as GPUs), and may also process a similar-picture search initiated by the user 110 locally to obtain the search result.
The method for determining similar pictures provided by the embodiment of the application can be executed by the terminal device 101, 102, 103 or the server 105, and accordingly, the device for determining similar pictures can be disposed in the terminal device 101, 102, 103 or the server 105.
It should be understood that the number of terminal devices, networks, servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for determining similar pictures according to the present application is shown. The method for determining similar pictures comprises the following steps:
step 201, inputting a target picture into a feature extraction network in a trained image similarity evaluation model for image feature extraction, so as to obtain a feature vector of the target picture.
In the present embodiment, an execution subject (e.g., a server shown in fig. 1) of the method for determining a similar picture may acquire a target picture. The target picture may be a picture selected by the user, for example, one or more pictures on a web page selected by the user while browsing the web page; the target picture may also be a picture randomly extracted from a network picture library, for example, when it is desired to analyze the related information of the article provided by the e-commerce platform in a competitor's platform or a store, a picture may be extracted from the description picture of the article provided by the e-commerce platform as the target picture.
The target picture may be input into the trained image similarity evaluation model. The image similarity evaluation model may be a model for evaluating the similarity between two or more images, and may be trained in advance on a sample set. The image similarity evaluation model may include a feature extraction network that extracts features from each of the two or more input images, and the similarity between the images may then be calculated from the extracted features.
The feature extraction network may be a deep neural network, for example, a deep convolutional neural network. The parameters of the feature extraction network can be adjusted in an iterative mode when the image similarity evaluation model is trained, so that the feature extraction network can extract features with stronger descriptive power on the pictures, and the features extracted by the feature extraction network can effectively distinguish similar pictures from dissimilar pictures.
In this embodiment, after the target picture is input into the image similarity evaluation model, the feature extraction network in the image similarity evaluation model may extract features of the target picture to obtain a feature vector of the target picture.
Step 202, determining a similar picture of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of the candidate picture set acquired in advance.
The feature matrix of the candidate picture set is constructed by the feature vector of each candidate picture in the candidate picture set, and the feature vector of the candidate picture is obtained by performing feature extraction on the candidate picture based on a feature extraction network in a trained image similarity evaluation model.
The candidate pictures in the candidate picture set may be pictures on the network or pictures in a pre-specified picture library. In practice, when the target picture is a description picture of an item on an e-commerce platform, item description pictures on the same platform or other platforms may be collected to form the candidate picture set; for example, a crawler program may be used to capture item data on each platform. Optionally, the item data may be subjected to data cleaning to obtain information related to each candidate picture, which is structured and stored in a database. For example, after the item data is cleaned, the picture address, the identifier of the item described by the picture, the item's selling price, the update time of the item description information, the identifier of the platform where the picture is located, and the identifier of the picture (for example, the picture ID) may be stored in the database.
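A cleaned and structured record for one candidate picture might look like the following sketch; the field names and values are hypothetical, chosen only to mirror the fields listed above, and are not defined by the patent.

```python
# Hypothetical structured record for one crawled candidate picture; all
# names and values are illustrative placeholders.
candidate_record = {
    "picture_id": "pic_000123",               # identifier of the picture
    "picture_url": "https://example.com/item/123.jpg",  # picture address
    "item_id": "sku_456",                     # item described by the picture
    "item_price": 199.0,                      # selling price of the item
    "description_updated_at": "2018-12-07",   # update time of the description
    "platform_id": "platform_a",              # platform where the picture is
}
```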
In addition, feature extraction may be performed on the candidate pictures in the candidate picture set in advance to obtain the feature vector of each candidate picture. Specifically, each candidate picture may be input into the trained image similarity evaluation model, and the output of the feature extraction network used as the feature vector of that candidate picture. A feature matrix of the candidate picture set may then be constructed from the feature vectors of the candidate pictures; for example, the feature vectors may be combined into a feature matrix by rows or by columns, and each row number or column number of the feature matrix associated with the corresponding candidate picture.
In this embodiment, the similarity between the target picture and the candidate picture may be calculated by using the feature vector of the target picture and the feature matrix of the candidate picture set, and a similar picture of the target picture may be determined from the candidate picture set according to a similarity calculation result. The similarity calculation can be performed by using similarity measurement methods such as cosine similarity, Pearson correlation coefficient, Euclidean distance and the like.
Specifically, if the feature matrix of the candidate picture set is formed by combining the feature vectors of the candidate pictures by rows or by columns, similarity measures such as cosine similarity or the Pearson correlation coefficient can be calculated between the feature vector of the target picture and each row vector or column vector of the feature matrix, as the similarity between the target picture and the corresponding candidate picture. That is, the similarity between the feature vector of the target picture and the feature vector of the candidate picture represented by each row or column of the feature matrix is calculated in turn. The calculated similarities can then be ranked and the candidate pictures in the top preset positions of the ranking selected as similar pictures of the target picture; alternatively, candidate pictures whose similarity exceeds a preset similarity threshold can be selected as similar pictures of the target picture.
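The ranking and threshold selection just described can be sketched with cosine similarity; in this toy numpy example the feature vectors are random stand-ins for network-extracted features, and the top-k size and threshold are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
candidates = rng.normal(size=(6, 4))                 # 6 toy candidate feature vectors
target = candidates[2] + 0.001 * rng.normal(size=4)  # nearly identical to candidate 2

def cosine(u, v):
    # Cosine similarity between two (unnormalized) feature vectors.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Similarity between the target and each candidate, computed in turn.
sims = np.array([cosine(target, c) for c in candidates])

# Option 1: rank the similarities and keep the top-k candidates.
top_k = np.argsort(sims)[::-1][:3]
# Option 2: keep candidates whose similarity exceeds a preset threshold.
above_threshold = np.flatnonzero(sims > 0.95)
```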
In the method, the characteristics of the massive candidate pictures can be orderly represented by constructing the characteristic matrix of the candidate picture set. In some optional implementation manners, after extracting the feature vector of the candidate picture or the feature vector of the target picture by using the feature extraction network in the trained image similarity evaluation model, normalization processing may be performed on the extracted feature vector to accurately derive a similarity measure between the target picture and the candidate picture based on a unified similarity evaluation criterion when determining the similar picture of the target picture.
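For instance, with L2 normalization (one plausible choice; the patent does not fix a particular normalization scheme), the plain dot product of two normalized feature vectors directly equals their cosine similarity, which is what makes the later matrix-vector products a unified similarity criterion:

```python
import numpy as np

def l2_normalize(v):
    # Scale a feature vector to unit length; a zero vector is left as-is.
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

a = l2_normalize(np.array([3.0, 4.0]))
b = l2_normalize(np.array([6.0, 8.0]))   # same direction as a
c = l2_normalize(np.array([-4.0, 3.0]))  # orthogonal to a

# After normalization, u @ v is exactly the cosine similarity:
# a @ b is 1.0 (identical direction), a @ c is 0.0 (orthogonal).
```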
According to the method for determining similar pictures provided in this embodiment, the feature extraction network in the trained image similarity evaluation model extracts the features of the target picture and of the candidate pictures, and candidate pictures similar to the target picture are determined from the extracted feature vectors. Because the trained image similarity evaluation model can accurately estimate the similarity between images, the feature vectors extracted by its feature extraction network describe well the image features that serve as the basis for determining similar pictures, so the accuracy of searching for similar pictures is improved. Meanwhile, because there is no need to segment words in the pictures' description data or build an index in advance, the dependency on the data indexing scheme of a search engine is reduced, and the similar pictures of the target picture can be determined quickly and efficiently.
An exemplary application scenario of the foregoing embodiment is as follows. When a user browsing the picture of an item on an e-commerce platform wants to find information about similar items on the same platform, or about the same or similar items on other e-commerce platforms, the description picture of the currently browsed item can be used as the target picture and a similar-picture search request sent to a background server. The background server extracts the feature vector of the target picture with the feature extraction network in the trained image similarity evaluation model, acquires the feature matrix constructed from the feature vectors of the collected item pictures on the e-commerce platforms, and calculates the similarity between the target picture and the collected item pictures from the feature vector and the feature matrix, thereby determining similar pictures and finding items similar to the one currently browsed.
In some optional implementations of the foregoing embodiment, the feature vector of each candidate picture may be a normalized row feature vector, so that in the feature matrix of the candidate picture set the feature vector of each candidate picture is a row vector of the matrix, and the feature vector of the target picture may be a normalized column feature vector. For example, the feature matrix of the candidate picture set is A_n = (α_1, α_2, α_3, …, α_n)^T and the normalized column feature vector of the target picture is β, where n is the number of rows of the feature matrix, i.e., the total number of candidate pictures whose feature vectors are used to construct it, and α_i is the normalized row feature vector of the i-th candidate picture, i = 1, 2, 3, …, n. Because the feature vectors of the candidate pictures and the feature vector of the target picture are all extracted by the feature extraction network in the trained image similarity evaluation model, they have the same dimension; that is, each vector α_i has the same dimension as β.
In step 202, a similar picture of the target picture may also be determined as follows: determine the maximum element of the result column vector obtained by multiplying the feature matrix by the feature vector of the target picture, and determine the candidate picture characterized by the feature vector at the corresponding position of the feature matrix as a similar picture of the target picture. In particular, the feature matrix An may be multiplied by the feature vector β to obtain the result column vector Anβ = (α1·β, α2·β, α3·β, …, αn·β)^T. The maximum element of this column vector and the row at which it occurs are then determined: among α1·β, α2·β, α3·β, …, αn·β, the maximum max(αi·β) and its row number argmax_i(αi·β) are found, and the candidate picture whose feature vector occupies row argmax_i(αi·β) of the feature matrix is determined to be a similar picture of the target picture.
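The matrix-vector product described above scores all candidate pictures against the target in one operation. A minimal NumPy sketch of this step, using small randomly generated feature vectors in place of real network outputs (the dimensions, the seed, and the perturbed target are illustrative assumptions):

```python
import numpy as np

# Hypothetical normalized row feature vectors of n = 5 candidate pictures
# (each row has unit L2 norm), stacked into the feature matrix A_n.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 8))
A /= np.linalg.norm(A, axis=1, keepdims=True)

# Normalized column feature vector of the target picture; here it is a small
# perturbation of candidate 3, so that candidate is the expected match.
beta = A[3] + 0.01 * rng.normal(size=8)
beta /= np.linalg.norm(beta)

# One matrix-vector product yields the result column vector (alpha_i . beta).
scores = A @ beta
# Row number of the maximum element identifies the most similar candidate.
best = int(np.argmax(scores))
print(best)
```

Because every candidate is scored by the same multiplication, the search cost is a single pass over the feature matrix rather than n separate similarity evaluations.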
Similarly, in another optional implementation of this embodiment, the feature vector of each candidate picture may be a normalized column feature vector forming one column of the feature matrix of the candidate picture set, and the feature vector of the target picture may be a normalized row feature vector. For example, the feature matrix of the candidate picture set is Bm = (β1, β2, β3, …, βm) and the normalized row feature vector of the target picture is α, where m is the number of columns of the feature matrix, i.e., the total number of candidate pictures whose feature vectors are used to construct the matrix, and βj is the normalized column feature vector of the j-th candidate picture, j = 1, 2, 3, …, m.
In step 202, a similar picture of the target picture may also be determined as follows: determine the maximum element of the result row vector obtained by multiplying the feature vector of the target picture by the feature matrix, and determine the candidate picture characterized by the feature vector at the corresponding position of the feature matrix as a similar picture of the target picture. In particular, the feature vector α may be multiplied by the feature matrix Bm to obtain the result row vector α·Bm = (α·β1, α·β2, α·β3, …, α·βm). The maximum element of this row vector and the column at which it occurs are then determined: among α·β1, α·β2, α·β3, …, α·βm, the maximum max(α·βj) and its column number argmax_j(α·βj) are found, and the candidate picture whose feature vector occupies column argmax_j(α·βj) of the feature matrix is determined to be a similar picture of the target picture.
The maximum element is queried in the column vector or row vector obtained by multiplying the feature matrix of the candidate picture set with the feature vector of the target picture, and the position of that maximum is mapped back to the feature matrix to obtain the feature vector of a picture similar to the target picture. A single matrix multiplication thus suffices to determine the similar picture quickly, which further improves the efficiency of the search for similar pictures.
With continuing reference to fig. 3, a flowchart illustration of another embodiment of a method for determining similar pictures according to the present application is shown. As shown in fig. 3, a flow 300 of the method for determining similar pictures of the present embodiment includes the following steps:
Step 301, training based on a sample picture pair set constructed from candidate pictures to obtain a trained image similarity evaluation model.
In this embodiment, the image similarity evaluation model includes a similarity calculation network and two feature extraction networks that share weights. Each feature extraction network may be a neural network, for example a multilayer convolutional neural network; the two networks have the same structure, and their weights are shared. In the training process, whenever the parameters of the image similarity evaluation model to be trained are adjusted, the weights of the two feature extraction networks are adjusted synchronously.
A crawler program can be used to capture pictures from the network to form the candidate picture set and to construct the sample picture pair set for the image similarity evaluation model. Here, each sample picture pair in the set may be composed of two candidate pictures. The features of the sample picture pairs may be extracted with an operator such as SIFT (Scale-Invariant Feature Transform), and the similarity of each pair may be calculated from the extracted features and used as the pair's similarity annotation information.
Specifically, an image similarity evaluation model to be trained can first be constructed based on a twin (siamese) network. A sample picture pair is then input into the model to obtain a similarity evaluation result for the two candidate pictures in the pair; this result is compared with the pair's similarity annotation information, and the parameters of the model, including the shared weights of the two feature extraction networks, are adjusted iteratively according to the difference between the two. These operations, evaluating a sample picture pair, comparing the result with the annotation information, and adjusting the parameters, are repeated until the value of the loss function reaches a preset convergence condition, at which point the parameters are fixed and the trained image similarity evaluation model is obtained.
Referring to fig. 4, a schematic structural diagram of an image similarity evaluation model in the method for determining similar pictures shown in fig. 3 is shown. As shown in fig. 4, the image similarity evaluation model may include two feature extraction networks sharing a weight and a similarity calculation network. The two feature extraction networks can respectively perform feature extraction on the input images I1 and I2, and the extracted features are input into the similarity calculation network to obtain the similarity Sim (I1, I2) between the input images I1 and I2.
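The weight-sharing idea of fig. 4 can be illustrated with a deliberately minimal sketch: both branches reference the same parameter matrix, so any parameter update changes both branches at once. A real model would use a convolutional network for each branch; the linear map, dimensions, and dot-product similarity here are illustrative assumptions:

```python
import numpy as np

class SharedFeatureExtractor:
    """Toy linear 'network' standing in for the branches of fig. 4; both
    branches reference the same weight matrix W (weight sharing)."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(out_dim, in_dim))

    def extract(self, x):
        v = self.W @ x
        return v / np.linalg.norm(v)  # normalized feature vector

def similarity(model, i1, i2):
    """Similarity calculation network: here simply the dot product of the
    two embeddings produced by the shared extractor."""
    return float(model.extract(i1) @ model.extract(i2))

net = SharedFeatureExtractor(in_dim=4, out_dim=3)
x = np.array([0.2, -1.0, 0.5, 0.3])
# Identical inputs pass through the two weight-sharing branches and map to
# the same embedding, giving similarity 1.
sim_same = similarity(net, x, x)
```

Because a single `W` serves both inputs, adjusting it during training moves both branches synchronously, which is exactly the constraint described for the two feature extraction networks.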
In some optional implementations of the present embodiment, the sample picture pair set may include sample picture pairs formed from candidate pictures together with similar-attribute annotation information for each pair. Specifically, the candidate pictures in the candidate picture set may be combined pairwise to form the sample picture pair set, and a manually labeled tag characterizing whether the two pictures in a pair are similar serves as the pair's similar-attribute annotation information: for example, the tag is "1" when the two pictures in the pair are similar and "0" when they are not.
At this time, based on the sample picture pair set, a trained image similarity evaluation model can be obtained by training as follows: respectively inputting two sample pictures in the sample picture pair into two feature extraction networks in an image similarity evaluation model to be trained to obtain feature vectors of the two sample pictures in the sample picture pair; calculating the similarity of the two sample pictures in the sample picture pair by adopting a similarity calculation network in an image similarity evaluation model to be trained on the basis of the feature vectors of the two sample pictures in the sample picture pair; and iteratively adjusting the weight of the feature extraction network in the image similarity evaluation model to be trained by adopting a back propagation method based on a preset loss function so that the value of the loss function meets a preset convergence condition, wherein the value of the loss function is used for representing the difference between the similarity of the sample picture pair calculated by the image similarity evaluation model to be trained and the labeling information of the similarity attribute of the corresponding sample picture pair.
Specifically, in the training process, an image similarity evaluation model to be trained, comprising a similarity calculation network and two weight-sharing feature extraction networks, may be constructed; the two sample pictures of a pair are then input into the two feature extraction networks respectively for feature extraction, and the similarity calculation network computes the Euclidean distance between the two extracted feature vectors. The value of a preset loss function may then be calculated. The loss function may be constructed from the Euclidean distance between the two extracted feature vectors, for example from the similarity between the two sample pictures represented by that distance together with the pictures' similar-attribute annotation information (i.e., the label indicating whether the two sample pictures are similar).
Alternatively, the loss function L may be constructed according to equation (1):
L = (1/(2N)) Σ [ y·d^2 + (1 − y)·max(margin − d, 0)^2 ]        (1)
where d is the Euclidean distance between the two sample pictures of a pair as computed by the image similarity model to be trained, y is the label indicating whether the two sample pictures of the pair are similar (y = 1 represents that they are similar, y = 0 that they are not), margin is a set threshold, and N is the number of sample picture pairs over which the summation runs.
According to formula (1), the more consistent the Euclidean distance computed by the image similarity evaluation model to be trained is with the similar attribute indicated by the annotation information of the two sample pictures, the smaller the value of the loss function, and conversely, the larger its value. The value of the loss function therefore effectively characterizes the accuracy of the similarity evaluation result of the model being trained.
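This behaviour of the contrastive loss described above can be checked with a small NumPy implementation; the margin value used here is an assumption:

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    """Contrastive loss over a batch of N sample picture pairs.

    d      -- Euclidean distances between the pairs' feature vectors
    y      -- 1 if a pair is labeled similar, 0 if labeled dissimilar
    margin -- the set threshold from the text (value here is an assumption)
    """
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    # Similar pairs (y=1) are penalized by d^2; dissimilar pairs (y=0) are
    # penalized only while their distance is still inside the margin.
    per_pair = y * d**2 + (1.0 - y) * np.maximum(margin - d, 0.0)**2
    return per_pair.mean() / 2.0   # the (1/2N) * sum of the formula

# Distances that agree with the labels give zero loss:
# similar pair at distance 0, dissimilar pair already beyond the margin.
low = contrastive_loss([0.0, 2.0], [1, 0], margin=1.0)

# Distances that contradict the labels give a large loss:
# similar pair far apart, dissimilar pair at distance 0.
high = contrastive_loss([2.0, 0.0], [1, 0], margin=1.0)
```

The `low`/`high` comparison mirrors the statement above: agreement between distance and label drives the loss toward zero, and disagreement drives it up.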
In the training process, the value of the loss function can be back-propagated through the image similarity evaluation model to be trained using gradient descent, iteratively adjusting its parameters, which may include the weights of the feature extraction networks. After each adjustment, the model is used again to evaluate the similarity of the sample picture pairs, the evaluation results are compared with the similar-attribute annotation information through the loss function value, and the parameters are adjusted according to the comparison, until the error of the similarity evaluation result of the adjusted model converges or the number of iterative adjustments reaches a preset limit, yielding the trained image similarity evaluation model.
Because the trained image similarity evaluation model is obtained by training on a sample picture pair set built from the candidate pictures, a training set with a large number of samples can be constructed. Moreover, since the model is trained on the candidate pictures themselves, the feature extraction network of the trained model extracts the features of the candidate pictures more accurately, so that the extracted feature vectors better discriminate dissimilar pictures.
In addition, by acquiring similar-attribute annotation information for the sample picture pairs when the training samples are constructed, the image similarity evaluation model can use this annotation as the expected similarity evaluation result, so that the logic of picture feature extraction and similarity evaluation is learned quickly and the training speed is effectively improved.
In some optional implementation manners, after step 301, user feedback information for evaluating similarity between the target picture and the similar picture of the target picture may also be obtained; and determining similar attribute labeling information between the target picture and the similar pictures of the target picture based on the user feedback information, and adding the target picture and the similar pictures of the target picture as sample picture pairs into the sample picture pair set.
Specifically, the determined similar picture may be pushed to the user who issued the request for a similar picture of the target picture, and the user's feedback on whether the pushed picture is indeed similar to the target picture is collected. The target picture and the pushed similar picture are then taken as a sample picture pair, the user feedback serves as the pair's similar-attribute annotation information, and the pair is added to the sample picture pair set. In this way, the sample picture pair set can be updated continuously with user feedback, the image similarity evaluation model can be updated iteratively, and its quality improved.
Returning to fig. 3, in step 302, the target picture is input into the feature extraction network in the trained image similarity evaluation model for image feature extraction, obtaining the feature vector of the target picture.
The target picture may be input into the image similarity evaluation model obtained by training in step 301, and the feature vector of the target picture may be extracted by using the feature extraction network in the trained image similarity evaluation model.
In step 303, a similar picture of the target picture is determined from the candidate picture set based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance.
The feature matrix of the candidate picture set is constructed by the feature vector of each candidate picture in the candidate picture set, and the feature vector of the candidate picture is obtained by extracting the features of the candidate picture based on a feature extraction network in a trained image similarity evaluation model.
In the training process of the image similarity evaluation model, the feature vectors of the candidate pictures output by the feature extraction network in the last iteration can be collected to construct the feature matrix of the candidate picture set; alternatively, the candidate pictures can be input once more into the feature extraction network of the trained model to obtain their feature vectors, from which the feature matrix of the candidate picture set is then constructed.
Optionally, the feature vector of each candidate picture in the candidate picture set and the feature vector of the target picture may be normalized, so that when the similarity between the target picture and each candidate picture is calculated, the similarities to different candidate pictures are determined under a common, normalized evaluation criterion. For example, when the candidate picture with the highest similarity to the target picture is selected by Euclidean distance, normalized feature vectors describe the similarity of two pictures more accurately.
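One reason normalization works well here: for unit vectors, squared Euclidean distance and dot product are linked by the identity ||u − v||^2 = 2 − 2·(u·v), so ranking candidates by smallest distance is the same as ranking by largest dot product, which is exactly what the matrix multiplication of the earlier steps computes. A quick NumPy check (the example vectors are arbitrary):

```python
import numpy as np

def normalize(v):
    """Scale a feature vector to unit L2 norm."""
    return v / np.linalg.norm(v)

u = normalize(np.array([3.0, 1.0, 2.0]))
v = normalize(np.array([2.0, 2.0, 1.0]))

# For unit vectors: ||u - v||^2 = 2 - 2 * (u . v), so a smaller Euclidean
# distance corresponds exactly to a larger dot product.
d2 = float(np.sum((u - v) ** 2))
dot = float(u @ v)
```

This is why the feature matrix built from normalized row (or column) vectors can use a plain matrix product as the similarity score while remaining consistent with Euclidean-distance screening.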
Step 302 and step 303 of this embodiment are respectively consistent with step 201 and step 202 of the foregoing embodiment, and specific implementation manners of step 302 and step 303 may also refer to descriptions of implementation manners of step 201 and step 202 of the foregoing embodiment, which are not described herein again.
The method for determining similar pictures shown in fig. 3 adds a step of training, on the sample picture pair set, an image similarity evaluation model comprising a similarity calculation network and two weight-sharing feature extraction networks, which further improves the accuracy of the search for similar pictures.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for determining similar pictures, which corresponds to the method embodiments shown in fig. 2 and fig. 3, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for determining a similar picture of the present embodiment includes an extraction unit 501 and a determination unit 502. The extracting unit 501 may be configured to input the target picture into a feature extraction network in a trained image similarity evaluation model for image feature extraction, so as to obtain a feature vector of the target picture; the determining unit 502 may be configured to determine a similar picture of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance. The feature matrix of the candidate picture set is constructed by the feature vector of each candidate picture in the candidate picture set, and the feature vector of the candidate picture is obtained by extracting the features of the candidate picture based on a feature extraction network in a trained image similarity evaluation model.
In some embodiments, the feature vector of the candidate picture is a normalized row feature vector, and the feature vector of the target picture is a normalized column feature vector; in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a row vector in the feature matrix; the determining unit 502 may be further configured to determine, based on the feature vector of the target picture and the feature matrix of the pre-acquired candidate picture set, a similar picture of the target picture from the candidate picture set as follows: determining the maximum value of elements in a result column vector obtained by multiplying the feature matrix by the feature vector of the target picture, and determining that a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result column vector in the feature matrix is a similar picture of the target picture.
In some embodiments, the feature vector of the candidate picture is a normalized column feature vector, and the feature vector of the target picture is a normalized row feature vector; in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a column vector in the feature matrix; the determining unit 502 may be further configured to determine, based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance, a similar picture of the target picture from the candidate picture set as follows: determining the maximum value of elements in a result row vector obtained by multiplying the feature vector of the target picture by the feature matrix, and determining a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result row vector in the feature matrix as a similar picture of the target picture.
In some embodiments, the image similarity evaluation model includes a similarity calculation network and two feature extraction networks sharing a weight. The apparatus 500 may further comprise: and the training unit is configured to train the sample picture pair set constructed by the candidate pictures to obtain a trained image similarity evaluation model.
In some embodiments, the sample picture pair set includes a sample picture pair formed by candidate pictures and similar attribute labeling information of the sample picture pair; the training unit may be further configured to train a trained image similarity evaluation model as follows: respectively inputting two sample pictures in the sample picture pair into two feature extraction networks in an image similarity evaluation model to be trained to obtain feature vectors of the two sample pictures in the sample picture pair; calculating the similarity of the two sample pictures in the sample picture pair by adopting a similarity calculation network in an image similarity evaluation model to be trained on the basis of the feature vectors of the two sample pictures in the sample picture pair; and iteratively adjusting the weight of the feature extraction network in the image similarity evaluation model to be trained by adopting a back propagation method based on a preset loss function so that the value of the loss function meets a preset convergence condition, wherein the value of the loss function is used for representing the difference between the similarity of the sample picture pair calculated by the image similarity evaluation model to be trained and the labeling information of the similarity attribute of the corresponding sample picture pair.
In some embodiments, the apparatus 500 may further include an updating unit configured to: acquiring user feedback information for evaluating the similarity between the similar picture of the target picture and the target picture; and determining similar attribute labeling information between the target picture and the similar pictures of the target picture based on the user feedback information, and adding the target picture and the similar pictures of the target picture as sample picture pairs into the sample picture pair set.
It should be understood that the elements recited in apparatus 500 correspond to various steps in the methods described with reference to fig. 2 and 3. Thus, the operations and features described above for the method are equally applicable to the apparatus 500 and the units included therein, and are not described in detail here.
According to the device 500 for determining similar pictures in the above embodiment of the application, feature extraction is performed on the target picture and the candidate picture by using the feature extraction network in the trained image similarity evaluation model, and the candidate picture similar to the target picture is determined according to the extracted feature vector, so that the efficiency and accuracy of searching for similar pictures are improved.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an extraction unit and a determination unit. The names of these units do not form a limitation on the unit itself in some cases, for example, the extraction unit may also be described as a unit for inputting a target picture into a feature extraction network in a trained image similarity evaluation model to perform image feature extraction, so as to obtain a feature vector of the target picture.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs, and when the one or more programs are executed by the device, the device inputs the target picture into a feature extraction network in a trained image similarity evaluation model to perform image feature extraction, so as to obtain a feature vector of the target picture; determining similar pictures of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of the candidate picture set acquired in advance; the feature matrix of the candidate picture set is constructed by the feature vector of each candidate picture in the candidate picture set, and the feature vector of the candidate picture is obtained by extracting the features of the candidate picture based on a feature extraction network in a trained image similarity evaluation model.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method for determining similar pictures, comprising:
inputting a target picture into a feature extraction network in a trained image similarity evaluation model to perform image feature extraction, so as to obtain a feature vector of the target picture;
determining a similar picture of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance;
the feature matrix of the candidate picture set is constructed by feature vectors of all candidate pictures in the candidate picture set, and the feature vectors of the candidate pictures are obtained by performing feature extraction on the candidate pictures based on a feature extraction network in the trained image similarity evaluation model.
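The flow of claim 1 — precomputing a feature matrix from the candidate pictures offline, then matching a target picture's feature vector against it — can be sketched as follows. This is a minimal illustration only: the fixed random projection stands in for the trained feature extraction network, and all dimensions, names, and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the trained feature extraction network: a fixed random
# projection from flattened 16x16 "pictures" to 32-dim feature vectors.
W = rng.normal(size=(32, 256))

def extract_features(picture):
    """Map a picture (16x16 array) to an L2-normalized feature vector."""
    v = W @ picture.ravel()
    return v / np.linalg.norm(v)

# Offline step: build the feature matrix of the candidate picture set,
# one feature vector per candidate picture.
candidates = [rng.random((16, 16)) for _ in range(10)]
feature_matrix = np.stack([extract_features(p) for p in candidates])

# Online step: extract the target picture's feature vector and pick the
# candidate with the highest similarity score.
target = candidates[7] + 0.001 * rng.normal(size=(16, 16))
scores = feature_matrix @ extract_features(target)
most_similar = int(np.argmax(scores))
print(most_similar)  # the target is a slightly perturbed copy of candidate 7
```

In practice the feature extraction network would be a trained neural network, and the feature matrix would be computed once and reused for every query, so the per-query cost is a single matrix-vector product.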
2. The method of claim 1, wherein the feature vector of the candidate picture is a normalized row feature vector and the feature vector of the target picture is a normalized column feature vector;
in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a row vector in the feature matrix;
the determining a similar picture of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance comprises:
determining the maximum value of elements in a result column vector obtained by multiplying the feature matrix by the feature vector of the target picture, and determining a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result column vector in the feature matrix as a similar picture of the target picture.
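Because the feature vectors are L2-normalized, the matrix-vector product described in claim 2 computes the cosine similarity between the target picture and every candidate picture in one step. A minimal NumPy sketch with illustrative dimensions (5 candidates, 8-dim features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature matrix: 5 candidate pictures as normalized row feature vectors.
F = rng.normal(size=(5, 8))
F /= np.linalg.norm(F, axis=1, keepdims=True)

# Target picture as a normalized column feature vector, built here to be
# close to candidate 3 so the example has a known answer.
q = F[3] + 0.01 * rng.normal(size=8)
q /= np.linalg.norm(q)

# Result column vector: element i is the cosine similarity between the
# target and candidate i, since both vectors are unit length.
result = F @ q
best = int(np.argmax(result))  # position of the maximum element
print(best)                    # candidate 3 is the similar picture
```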
3. The method of claim 1, wherein the feature vector of the candidate picture is a normalized column feature vector, and the feature vector of the target picture is a normalized row feature vector;
in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a column vector in the feature matrix;
the determining a similar picture of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance comprises:
determining the maximum value of elements in a result row vector obtained by multiplying the feature vector of the target picture by the feature matrix, and determining a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result row vector in the feature matrix as a similar picture of the target picture.
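Claim 3 is the transpose of claim 2: storing the candidates as columns and multiplying by a row feature vector from the left yields the same similarity scores, since \(q^\top F^\top = (Fq)^\top\). A short check of that equivalence, under the same illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Claim 2 layout: candidates as normalized rows of F.
F = rng.normal(size=(5, 8))
F /= np.linalg.norm(F, axis=1, keepdims=True)
q = rng.normal(size=8)
q /= np.linalg.norm(q)

row_scores = F @ q    # claim 2: feature matrix times column feature vector
col_scores = q @ F.T  # claim 3: row feature vector times transposed matrix

# Both layouts produce identical similarity scores and the same argmax.
assert np.allclose(row_scores, col_scores)
print(int(np.argmax(col_scores)))
```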
4. The method according to any one of claims 1-3, wherein the image similarity evaluation model comprises a similarity calculation network and two weight-sharing feature extraction networks;
the method further comprises the following steps:
and training based on a sample picture pair set constructed from candidate pictures, to obtain the trained image similarity evaluation model.
5. The method according to claim 4, wherein the sample picture pair set comprises sample picture pairs formed by candidate pictures and similar attribute labeling information of the sample picture pairs;
the training of the sample picture pair set constructed based on the candidate pictures to obtain the trained image similarity evaluation model comprises the following steps:
respectively inputting two sample pictures in the sample picture pair into two feature extraction networks in an image similarity evaluation model to be trained to obtain feature vectors of the two sample pictures in the sample picture pair;
calculating the similarity of the two sample pictures in the sample picture pair by adopting a similarity calculation network in an image similarity evaluation model to be trained on the basis of the feature vectors of the two sample pictures in the sample picture pair;
and iteratively adjusting the weight of the feature extraction network in the image similarity evaluation model to be trained by adopting a back propagation method based on a preset loss function so that the value of the loss function meets a preset convergence condition, wherein the value of the loss function is used for representing the difference between the similarity of the sample picture pair calculated by the image similarity evaluation model to be trained and the labeling information of the similarity attribute of the corresponding sample picture pair.
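The training scheme of claims 4 and 5 — two weight-sharing feature extraction networks, a similarity calculation network, and iterative weight adjustment until the loss converges — can be sketched in miniature as follows. Everything here is a hypothetical toy: a shared linear map stands in for the feature extraction networks, cosine similarity mapped to [0, 1] stands in for the similarity calculation network, and finite differences stand in for the back propagation of a real deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy sample picture pairs: 4-dim "pictures" with similar attribute labels
# (1.0 = labeled similar, 0.0 = labeled dissimilar). All data is synthetic.
pairs, labels = [], []
for _ in range(20):
    a = rng.normal(size=4)
    if rng.random() < 0.5:
        b = a + 0.05 * rng.normal(size=4)  # near-duplicate -> similar
        labels.append(1.0)
    else:
        b = rng.normal(size=4)             # unrelated -> dissimilar
        labels.append(0.0)
    pairs.append((a, b))
labels = np.array(labels)

W = rng.normal(size=(3, 4)) * 0.5  # weights shared by both branches

def similarity(W, a, b):
    """Similarity network: cosine of the two embeddings, mapped to [0, 1]."""
    ea, eb = W @ a, W @ b  # the same W: the two branches share weights
    cos = ea @ eb / (np.linalg.norm(ea) * np.linalg.norm(eb))
    return 0.5 * (cos + 1.0)

def loss(W):
    """Squared difference between predicted similarity and labels."""
    preds = np.array([similarity(W, a, b) for a, b in pairs])
    return float(np.mean((preds - labels) ** 2))

initial = loss(W)
for _ in range(150):  # iteratively adjust the shared weights
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            dW = np.zeros_like(W)
            dW[i, j] = 1e-5
            grad[i, j] = (loss(W + dW) - loss(W - dW)) / 2e-5
    W -= 0.5 * grad
print(initial, loss(W))  # the loss decreases as the weights are adjusted
```

A production implementation would use a convolutional feature extractor and framework-provided automatic differentiation; the structure — shared weights, a pairwise similarity head, and a loss comparing predicted similarity to the labeling information — is what the claims describe.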
6. The method of claim 5, wherein the method further comprises:
acquiring user feedback information for evaluating the similarity between the similar picture of the target picture and the target picture;
and determining similar attribute labeling information between the target picture and the similar picture of the target picture based on the user feedback information, and adding the target picture and the similar picture of the target picture as a sample picture pair to the sample picture pair set.
7. An apparatus for determining similar pictures, comprising:
the extraction unit is configured to input a target picture into a feature extraction network in a trained image similarity evaluation model to perform image feature extraction, so as to obtain a feature vector of the target picture;
the determining unit is configured to determine similar pictures of the target picture from the candidate picture set based on the feature vector of the target picture and a feature matrix of a candidate picture set acquired in advance;
the feature matrix of the candidate picture set is constructed by feature vectors of all candidate pictures in the candidate picture set, and the feature vectors of the candidate pictures are obtained by performing feature extraction on the candidate pictures based on a feature extraction network in the trained image similarity evaluation model.
8. The apparatus of claim 7, wherein the feature vector of the candidate picture is a normalized row feature vector, and the feature vector of the target picture is a normalized column feature vector;
in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a row vector in the feature matrix;
the determining unit is further configured to determine a similar picture of the target picture from the candidate picture set according to the feature vector of the target picture and a feature matrix of a pre-acquired candidate picture set as follows:
determining the maximum value of elements in a result column vector obtained by multiplying the feature matrix by the feature vector of the target picture, and determining a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result column vector in the feature matrix as a similar picture of the target picture.
9. The apparatus of claim 7, wherein the feature vector of the candidate picture is a normalized column feature vector, and the feature vector of the target picture is a normalized row feature vector;
in the feature matrix of the candidate picture set, the feature vector of each candidate picture is a column vector in the feature matrix;
the determining unit is further configured to determine a similar picture of the target picture from the candidate picture set according to the feature vector of the target picture and a feature matrix of a pre-acquired candidate picture set as follows:
determining the maximum value of elements in a result row vector obtained by multiplying the feature vector of the target picture by the feature matrix, and determining a candidate picture represented by the feature vector corresponding to the position of the maximum value in the result row vector in the feature matrix as a similar picture of the target picture.
10. The apparatus according to any one of claims 7-9, wherein the image similarity evaluation model comprises a similarity calculation network and two weight-sharing feature extraction networks;
the device further comprises:
And a training unit configured to obtain the trained image similarity evaluation model by training on a sample picture pair set constructed from candidate pictures.
11. The apparatus according to claim 10, wherein the sample picture pair set includes a sample picture pair formed by candidate pictures and similar attribute labeling information of the sample picture pair;
the training unit is further configured to obtain the trained image similarity evaluation model by training as follows:
respectively inputting two sample pictures in the sample picture pair into two feature extraction networks in an image similarity evaluation model to be trained to obtain feature vectors of the two sample pictures in the sample picture pair;
calculating the similarity of the two sample pictures in the sample picture pair by adopting a similarity calculation network in an image similarity evaluation model to be trained on the basis of the feature vectors of the two sample pictures in the sample picture pair;
and iteratively adjusting the weight of the feature extraction network in the image similarity evaluation model to be trained by adopting a back propagation method based on a preset loss function so that the value of the loss function meets a preset convergence condition, wherein the value of the loss function is used for representing the difference between the similarity of the sample picture pair calculated by the image similarity evaluation model to be trained and the labeling information of the similarity attribute of the corresponding sample picture pair.
12. The apparatus of claim 11, wherein the apparatus further comprises an update unit configured to:
acquiring user feedback information for evaluating the similarity between the similar picture of the target picture and the target picture;
and determining similar attribute labeling information between the target picture and the similar picture of the target picture based on the user feedback information, and adding the target picture and the similar picture of the target picture as a sample picture pair to the sample picture pair set.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN201811495715.3A 2018-12-07 2018-12-07 Method and device for determining similar pictures Pending CN111291765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811495715.3A CN111291765A (en) 2018-12-07 2018-12-07 Method and device for determining similar pictures

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811495715.3A CN111291765A (en) 2018-12-07 2018-12-07 Method and device for determining similar pictures

Publications (1)

Publication Number Publication Date
CN111291765A true CN111291765A (en) 2020-06-16

Family

ID=71023067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811495715.3A Pending CN111291765A (en) 2018-12-07 2018-12-07 Method and device for determining similar pictures

Country Status (1)

Country Link
CN (1) CN111291765A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627050A (en) * 2020-07-27 2020-09-04 杭州雄迈集成电路技术股份有限公司 Training method and device for target tracking model
CN112132060A (en) * 2020-09-25 2020-12-25 广州市派客朴食信息科技有限责任公司 Method for intelligently identifying and settling food
CN112149740A (en) * 2020-09-25 2020-12-29 上海商汤智能科技有限公司 Target re-identification method and device, storage medium and equipment
CN113255838A (en) * 2021-06-29 2021-08-13 成都数之联科技有限公司 Image classification model training method, system and device, medium and classification method
CN113297411A (en) * 2021-07-26 2021-08-24 深圳市信润富联数字科技有限公司 Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium
CN113538455A (en) * 2021-06-15 2021-10-22 聚好看科技股份有限公司 Three-dimensional hairstyle matching method and electronic equipment
CN113537249A (en) * 2021-08-17 2021-10-22 浙江大华技术股份有限公司 Image determination method and device, storage medium and electronic device
CN113723156A (en) * 2020-11-23 2021-11-30 北京沃东天骏信息技术有限公司 Article information generation method, image conversion network training method and device
CN114913513A (en) * 2021-10-12 2022-08-16 北京九章云极科技有限公司 Method and device for calculating similarity of official seal images, electronic equipment and medium
TWI785431B (en) * 2020-12-07 2022-12-01 中華電信股份有限公司 Network public opinion analysis method and server
KR20230031745A (en) * 2021-08-27 2023-03-07 씨틱 디카스탈 컴퍼니 리미티드 Wheel hub image retrieval method and equipment
CN115985472A (en) * 2022-12-01 2023-04-18 珠海全一科技有限公司 Fundus image labeling method and system based on neural network
CN116401417A (en) * 2023-06-07 2023-07-07 深圳市中农网有限公司 Hierarchical storage method based on massive agricultural product data

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0535786A2 (en) * 1991-10-03 1993-04-07 AT&T Corp. Training system for neural networks
JP2013254367A (en) * 2012-06-07 2013-12-19 Nippon Telegr & Teleph Corp <Ntt> Image retrieval device, image retrieval method, and image retrieval program
WO2015078183A1 (en) * 2013-11-29 2015-06-04 华为技术有限公司 Image identity recognition method and related device, and identity recognition system
CN105469376A (en) * 2014-08-12 2016-04-06 腾讯科技(深圳)有限公司 Method and device for determining picture similarity
DE202014010836U1 (en) * 2014-04-30 2016-11-09 Google Inc. Identify entities to be examined using facade detection
US20170206431A1 (en) * 2016-01-20 2017-07-20 Microsoft Technology Licensing, Llc Object detection and classification in images
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
WO2018054283A1 (en) * 2016-09-23 2018-03-29 北京眼神科技有限公司 Face model training method and device, and face authentication method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0535786A2 (en) * 1991-10-03 1993-04-07 AT&T Corp. Training system for neural networks
JP2013254367A (en) * 2012-06-07 2013-12-19 Nippon Telegr & Teleph Corp <Ntt> Image retrieval device, image retrieval method, and image retrieval program
WO2015078183A1 (en) * 2013-11-29 2015-06-04 华为技术有限公司 Image identity recognition method and related device, and identity recognition system
DE202014010836U1 (en) * 2014-04-30 2016-11-09 Google Inc. Identify entities to be examined using facade detection
CN105469376A (en) * 2014-08-12 2016-04-06 腾讯科技(深圳)有限公司 Method and device for determining picture similarity
US20170206431A1 (en) * 2016-01-20 2017-07-20 Microsoft Technology Licensing, Llc Object detection and classification in images
WO2018054283A1 (en) * 2016-09-23 2018-03-29 北京眼神科技有限公司 Face model training method and device, and face authentication method and device
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张南: "基于深度学习的图像哈希检索", 中国优秀硕士学位论文全文数据库, no. 2017, 15 August 2017 (2017-08-15), pages 1 - 79 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627050B (en) * 2020-07-27 2020-12-01 杭州雄迈集成电路技术股份有限公司 Training method and device for target tracking model
CN111627050A (en) * 2020-07-27 2020-09-04 杭州雄迈集成电路技术股份有限公司 Training method and device for target tracking model
CN112132060A (en) * 2020-09-25 2020-12-25 广州市派客朴食信息科技有限责任公司 Method for intelligently identifying and settling food
CN112149740A (en) * 2020-09-25 2020-12-29 上海商汤智能科技有限公司 Target re-identification method and device, storage medium and equipment
CN113723156A (en) * 2020-11-23 2021-11-30 北京沃东天骏信息技术有限公司 Article information generation method, image conversion network training method and device
TWI785431B (en) * 2020-12-07 2022-12-01 中華電信股份有限公司 Network public opinion analysis method and server
CN113538455B (en) * 2021-06-15 2023-12-12 聚好看科技股份有限公司 Three-dimensional hairstyle matching method and electronic equipment
CN113538455A (en) * 2021-06-15 2021-10-22 聚好看科技股份有限公司 Three-dimensional hairstyle matching method and electronic equipment
CN113255838A (en) * 2021-06-29 2021-08-13 成都数之联科技有限公司 Image classification model training method, system and device, medium and classification method
CN113297411A (en) * 2021-07-26 2021-08-24 深圳市信润富联数字科技有限公司 Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium
CN113537249A (en) * 2021-08-17 2021-10-22 浙江大华技术股份有限公司 Image determination method and device, storage medium and electronic device
KR20230031745A (en) * 2021-08-27 2023-03-07 씨틱 디카스탈 컴퍼니 리미티드 Wheel hub image retrieval method and equipment
KR102700566B1 (en) 2021-08-27 2024-08-28 씨틱 디카스탈 컴퍼니 리미티드 Wheel hub image retrieval method and equipment
CN114913513A (en) * 2021-10-12 2022-08-16 北京九章云极科技有限公司 Method and device for calculating similarity of official seal images, electronic equipment and medium
CN115985472A (en) * 2022-12-01 2023-04-18 珠海全一科技有限公司 Fundus image labeling method and system based on neural network
CN115985472B (en) * 2022-12-01 2023-09-22 珠海全一科技有限公司 Fundus image labeling method and fundus image labeling system based on neural network
CN116401417A (en) * 2023-06-07 2023-07-07 深圳市中农网有限公司 Hierarchical storage method based on massive agricultural product data
CN116401417B (en) * 2023-06-07 2023-09-05 深圳市中农网有限公司 Hierarchical storage method based on massive agricultural product data

Similar Documents

Publication Publication Date Title
CN111291765A (en) Method and device for determining similar pictures
CN108280477B (en) Method and apparatus for clustering images
US9489401B1 (en) Methods and systems for object recognition
US9355330B2 (en) In-video product annotation with web information mining
CN107590255B (en) Information pushing method and device
US20200250538A1 (en) Training image and text embedding models
CN109145280A (en) The method and apparatus of information push
US20230205813A1 (en) Training Image and Text Embedding Models
US11403303B2 (en) Method and device for generating ranking model
CN110347940A (en) Method and apparatus for optimizing point of interest label
CN112612913A (en) Image searching method and system
JP2020024674A (en) Method and apparatus for pushing information
US20200125996A1 (en) Automated software selection using a vector-trained deep learning model
CN113657087B (en) Information matching method and device
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN110737824B (en) Content query method and device
CN110827101B (en) Shop recommending method and device
CN111325200A (en) Image annotation method, device, equipment and computer readable storage medium
CN110110257B (en) Data processing method and system, computer system and computer readable medium
CN111488479B (en) Hypergraph construction method and device, computer system and medium
KR20210084641A (en) Method and apparatus for transmitting information
CN111597430A (en) Data processing method and device, electronic equipment and storage medium
CN113255819B (en) Method and device for identifying information
CN113221572B (en) Information processing method, device, equipment and medium
CN113159877B (en) Data processing method, device, system and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination