CN108446307A - Binary vector generation method for multi-label images and image semantic similarity search method - Google Patents

Binary vector generation method for multi-label images and image semantic similarity search method Download PDF

Info

Publication number
CN108446307A
CN108446307A CN201810111604.1A
Authority
CN
China
Prior art keywords
picture
image
neural networks
convolutional neural networks model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810111604.1A
Other languages
Chinese (zh)
Inventor
吴大衍
叶明臻
李波
古晓艳
王伟平
孟丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN201810111604.1A
Publication of CN108446307A
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a binary vector generation method for multi-label images and an image semantic similarity search method. A convolutional neural network model is trained on a training data set until the loss value of the model stabilizes. The trained model is then applied to every picture in an image database, and the resulting binary vector of each picture is stored. For a query picture, its binary vector is computed with the trained model and compared for similarity against the stored binary vectors; according to the similarity results, the pictures most similar to the query picture are returned. The invention greatly improves the storage and retrieval efficiency of images, and target pictures can be ranked according to their multi-level semantic similarity to the query picture.

Description

Binary vector generation method for multi-label images and image semantic similarity search method
Technical field
The present invention mainly applies to the field of image retrieval, and relates to a binary vector generation method for multi-label images and a similarity search method for multi-label image semantic features.
Background technology
In recent years, with the rapid development of network technology, vast numbers of pictures are uploaded to the Internet every day. How to quickly and accurately retrieve pictures of interest from this massive collection according to different user needs has become a research hot spot and a difficult problem. For example, content-based image retrieval needs to find target pictures whose content is similar to that of the query image, where "similar" means visually or semantically similar. Hash algorithms for image semantic features map high-dimensional raw image features to low-dimensional binary features while retaining the semantic information of the image, and have therefore received considerable attention.
Mainstream hash algorithms for image semantic features currently use deep learning to extract image semantic features and learn hash functions at the same time, but they still have the following limitations: (1) most hash algorithms can only crudely separate similar from dissimilar pictures, and perform poorly on multi-label images that contain multi-level semantic similarity; (2) existing hash algorithms for multi-label images cannot effectively distinguish multi-label pictures with different degrees of similarity.
Summary of the invention
In view of the technical problems in the prior art, the purpose of the present invention is to provide a binary vector generation method for multi-label images and a large-scale multi-label image semantic similarity search method. The present invention is based on a convolutional neural network model; the model parameters are learned through a carefully designed loss function, so that raw image features are extracted and the hash function is learned at the same time. The binary image codes that are finally output have the following properties:
● they consist of 1 and -1 elements, which improves storage and retrieval efficiency;
● target pictures can be ranked according to their multi-level semantic similarity to the query picture;
● even when the image feature code is short, target pictures can still be effectively distinguished by semantic similarity; in particular, the accuracy of the top returned results is higher than that of mainstream algorithms.
The technical solution of the present invention is as follows:
A binary vector generation method for multi-label images, comprising the steps of:
1) A convolutional neural network model is trained on a training data set until the loss value of the model stabilizes. In each training pass the training data set contains N pairs of pictures; for the i-th pair of pictures I_{i,1}, I_{i,2}, let the number of labels of picture I_{i,1} be n_{i,1} and the number of labels jointly associated with pictures I_{i,1} and I_{i,2} be n_{i,2}; the loss function η used to compute the loss value is given by formula (5) below.
Here y_i = 0 when n_{i,1} = n_{i,2}, otherwise y_i = 1; Ones denotes the all-ones vector, ||·||_1 denotes the first norm of a vector, ||·||_2 denotes the Euclidean distance between vectors, |·| denotes taking the absolute value of each element of a vector, α is a parameter controlling the size of the quantization loss, w is the weight vector of the hash layer of the convolutional neural network model, f(I; w) is the k-bit binary vector of picture I output by the convolutional neural network model, and m is the Hamming distance threshold parameter.
2) The binary vector of a picture is computed using the trained convolutional neural network model.
A multi-label image semantic similarity search method, comprising the steps of:
1) A convolutional neural network model is trained on a training data set until the loss value of the model stabilizes. In each training pass the training data set contains N pairs of pictures; for the i-th pair of pictures I_{i,1}, I_{i,2}, let the number of labels of picture I_{i,1} be n_{i,1} and the number of labels jointly associated with pictures I_{i,1} and I_{i,2} be n_{i,2}; the loss function η used to compute the loss value is given by formula (5) below.
Here y_i = 0 when n_{i,1} = n_{i,2}, otherwise y_i = 1; Ones denotes the k-dimensional all-ones vector, ||·||_1 denotes the first norm of a vector, ||·||_2 denotes the Euclidean distance between vectors, |·| denotes taking the absolute value of each element of a vector, α is a parameter controlling the size of the quantization loss, w is the weight vector of the hash layer of the convolutional neural network model, f(I; w) is the k-bit binary vector of picture I output by the convolutional neural network model, and m is the Hamming distance threshold parameter.
2) Every picture in the image database is processed by the trained convolutional neural network model, and the resulting binary vector of each picture is stored.
3) The binary vector of the query picture is computed with the trained convolutional neural network model and compared for similarity against the binary vectors obtained in step 2); according to the similarity results, the pictures most similar to the query picture are returned.
Further, the convolutional neural network model is trained with mini-batch gradient descent, which minimizes the value of the loss function.
Further, the convolutional neural network model comprises, connected in sequence, a first convolutional layer, a max pooling layer, a second convolutional layer, a max pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a max pooling layer, a first fully connected layer, a second fully connected layer, and a hash layer.
Further, the hash function of the hash layer is h(x; w) = sign(f(x; w)), where f(x; w) = w^T f'(x) and f'(x) is the output vector of the second fully connected layer. The hash layer has k nodes, each node is assigned a weight, and these weights constitute the weight vector w.
Further, α=0.01.
Further, the similarity is determined from the Hamming distance between the binary vector of the query picture and the binary vectors obtained in step 2).
The invention mainly comprises the following:
1) A framework based on a convolutional neural network model that performs image semantic feature extraction and hash function learning at the same time. Using this framework, the present invention can map multi-label images to binary vectors while preserving the multi-level semantic similarity between images.
2) A carefully designed loss function based on pairs of labelled images. Based on this loss function, the parameters of every layer of the model can be learned.
Compared with the prior art, the positive effects of the present invention are as follows:
The present invention greatly improves the storage and retrieval efficiency of images, and target pictures can be ranked according to their multi-level semantic similarity to the query picture. Even when the image feature code is short, the present invention can still effectively distinguish target pictures by semantic similarity; in particular, the accuracy of the top returned results is higher than that of mainstream algorithms.
Description of the drawings
Fig. 1 is a structural diagram of the framework of the present invention.
Specific implementation mode
The present invention is further explained below with reference to the attached drawings and embodiments.
One: Hash function
Define the hash function h(x; w):
h(x; w) = sign(f(x; w))    (1)
f(x; w) = w^T f'(x), where w is the weight vector of the hash layer and f'(x) is the output vector of fully connected layer seven. The hash layer consists of one fully connected layer; the number of hash layer nodes equals the number of bits k of the binary vector that is finally generated, and the value of k is set in advance.
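As a concrete illustration, a minimal NumPy sketch of this hash layer follows. The patent does not spell out the shape of w, so the sketch assumes the k hash nodes together form a k×d weight matrix applied to the d-dimensional output of fully connected layer seven; this is an interpretation for illustration, not the patent's Caffe implementation.

```python
import numpy as np

def hash_code(f_prime, w):
    """Sketch of equation (1): h(x; w) = sign(f(x; w)) with f(x; w) = w^T f'(x).

    f_prime : (d,)   output vector of fully connected layer seven
    w       : (k, d) assumed weight matrix, one row per hash-layer node
    returns : (k,)   binary vector with entries in {+1, -1}
    """
    f = w @ f_prime                    # real-valued k-dimensional f(x; w)
    return np.where(f >= 0, 1, -1)     # sign(); ties at 0 are mapped to +1 here
```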
Two: Loss function
The parameters of every layer of the convolutional neural network model are learned by optimizing a loss function. In the training data set used to train the model, each image is associated with its own labels, which are obtained by manual annotation; during training, the degree of similarity between images can be judged from this label information. Suppose the i-th pair of pictures I_{i,1}, I_{i,2} is associated with the label sets p_1 and p_2 respectively. Let the number of labels of I_{i,1} be n_{i,1}, so n_{i,1} = |p_1|, and let the number of labels jointly associated with I_{i,1} and I_{i,2} be n_{i,2}, so n_{i,2} = |p_1 ∩ p_2|. When n_{i,1} = n_{i,2}, set the variable y_i = 0, otherwise y_i = 1. The loss function for I_{i,1}, I_{i,2} is defined as follows:
D_H(·,·) denotes the Hamming distance between two binary vectors, and m is a threshold parameter (m > 0; the choice of m is discussed in detail below).
The loss function consists of two parts, separated by a plus sign. When n_{i,1} = n_{i,2}, y_i = 0: the two pictures are considered highly similar, and any difference between their binary features is penalized in the loss function. When n_{i,1} ≠ n_{i,2}, y_i = 1: the two pictures are considered only roughly similar or dissimilar, and the loss function reflects this by requiring the Hamming distance between the binary feature vectors to vary with the degree of similarity of the two images. When N pairs of pictures form the training set, the overall loss function that the present invention minimizes is:
Three: Conversion of the loss function
The Hamming distance in equation (2) is a discrete quantity and is difficult to optimize directly, so the present invention transforms equation (2). Specifically, the Hamming distance in equation (2) is replaced by a Euclidean distance, and at the same time, in order to make the output f(I; w) of the convolutional neural network model approach 1 or -1, a quantization loss is introduced. The optimization of equation (2) can then be approximated by the optimization of the following formula:
This loss function consists of three parts, separated by plus signs. The first two parts play the same role as in equation (2); the role of the third part is to push every element of the picture feature vector towards 1 or -1. Here Ones denotes the all-ones vector whose length is k (the same as the length of the output binary vector), ||·||_2 denotes the Euclidean distance between vectors, ||·||_1 denotes the first norm of a vector, |·| denotes taking the absolute value of each element of a vector, and α (0 < α ≤ 1) is a parameter controlling the size of the quantization loss. Substituting equation (4) into equation (3) gives:
Here I_{i,1}, I_{i,2} are the two images in the i-th image pair, n_{i,1} is the number of labels of image I_{i,1}, n_{i,2} is the number of labels jointly associated with I_{i,1} and I_{i,2}, y_i = 0 when n_{i,1} = n_{i,2} and y_i = 1 otherwise, and m is the threshold parameter.
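The quantization term described above (the third part of the relaxed loss) can be written down directly from these definitions. The NumPy sketch below implements only that term, under a plausible reading of the definitions of Ones, the first norm and the element-wise absolute value; it is not the full loss of formula (5), which also contains the two pairwise terms.

```python
import numpy as np

def quantization_loss(f1, f2, alpha=0.01):
    """Third part of the relaxed loss: pushes each element of the real-valued
    network outputs f(I_{i,1}; w) and f(I_{i,2}; w) towards +1 or -1.

    f1, f2 : (k,) real-valued outputs of the hash layer for the two pictures
    alpha  : parameter controlling the size of the quantization loss
    """
    ones = np.ones_like(f1)                    # "Ones": all-ones vector of length k
    term = np.sum(np.abs(np.abs(f1) - ones))   # || |f(I_{i,1}; w)| - Ones ||_1
    term += np.sum(np.abs(np.abs(f2) - ones))  # || |f(I_{i,2}; w)| - Ones ||_1
    return alpha * term
```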
Four: Learning of the model parameters
The parameters of the convolutional neural network model comprise the convolution kernel parameters and the connection weights of the fully connected layers; only once these parameters are determined is the model finally determined, after which the model can extract binary semantic features of images and ultimately enable fast similarity query matching over massive image collections. The learning algorithm for the model parameters uses the idea of back-propagation; concretely, the neural network is trained with mini-batch gradient descent, which minimizes the value of the loss function, i.e. formula (5), and all subsequent picture feature extraction uses the model obtained after training. Formula (5) can be split by its plus signs into three parts (Term1, Term2 and the Regularizer), and the derivative of each part with respect to f_{i,j} is as follows:
where δ(x) = 1 when -1 ≤ x ≤ 0 or x ≥ 1, and δ(x) = 0 otherwise.
Here I_{i,j} is one image of the i-th image pair, with j taking the value 1 or 2; n_{i,1} is the number of labels of one image of the i-th pair, n_{i,2} is the number of labels jointly associated with I_{i,1} and I_{i,2}, y_i = 0 when n_{i,1} = n_{i,2} and y_i = 1 otherwise, and i ranges from 1 to N.
Five: Implementation details
The algorithm is implemented on the Caffe deep learning framework. As shown in Fig. 1, there is a max pooling layer (ReLU layer) after convolutional layer one, convolutional layer two and convolutional layer five. During model training, the mini-batch gradient descent parameters are set as follows:
Batch size=32, momentum=0.9, weight decay=0.004.
The present invention compared the experimental results for α = {0.1, 0.01, 0.001}; the results show that retrieval performance is best when α = 0.01.
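For orientation only, here is a hedged PyTorch-style sketch of this mini-batch gradient descent setup (the patent itself implements training in Caffe). The learning rate, the number of epochs and the pairwise_loss callable standing in for formula (5) are illustrative placeholders; only the batch size, momentum and weight decay come from the settings quoted above.

```python
import torch

def train(model, loader, pairwise_loss, epochs=50, lr=1e-3):
    # batch size 32, momentum 0.9 and weight decay 0.004 are the quoted settings;
    # lr and epochs are assumptions for illustration.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=0.004)
    for _ in range(epochs):
        for images, labels in loader:               # one minimum batch of 32 pictures
            outputs = model(images)                 # real-valued hash-layer outputs f(I; w)
            loss = pairwise_loss(outputs, labels)   # stands in for formula (5)
            opt.zero_grad()
            loss.backward()                         # back-propagation through all layers
            opt.step()
    return model
```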
The final output of the algorithm is a k-bit binary vector composed of 1 and -1 elements. When the labels of two pictures are entirely different, the Hamming distance between their binary vectors is required to be at least a given lower bound; when the labels partially overlap, the Hamming distance between the binary vectors is required to be at least a corresponding, label-dependent lower bound. The threshold parameter m is computed as follows (the initial value of m* is 2k):
Function: compute threshold parameter m
Input: m*, n1, k
Output: m
    m = 0
    while m < m* do
        m += 4·n1
    end while
    if m > 4k then
        m = 4k
    end if
    return m
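A direct Python transcription of this procedure might look as follows; the variable names mirror the pseudocode, and m_star is the m* value, initialised to 2k by the caller as stated above.

```python
def compute_threshold_m(m_star, n1, k):
    """Compute the Hamming distance threshold parameter m (transcribed from the pseudocode above).

    Assumes n1 >= 1, which holds because generated image pairs require Labels(I1) > 0.
    """
    m = 0
    while m < m_star:
        m += 4 * n1          # grow m in steps of 4*n1 until it reaches m*
    if m > 4 * k:
        m = 4 * k            # cap m at 4k as in the pseudocode
    return m

# Example use with the stated initial value m* = 2k:
# m = compute_threshold_m(2 * k, n1, k)
```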
To make full use of computing resources and storage space, the present invention generates labelled image pairs online from each mini-batch. To also obtain image pairs across different batches, the order of the pictures in the training set is shuffled each time a full pass over the data set is completed. The concrete procedure is as follows (Labels(I1) is the number of labels associated with picture I1):
Function: generate image pairs online
Input: a batch S of 32 training images
Output: a set T of 2-tuples
    T ← ∅
    for every 2-tuple t = (I1, I2) such that I1 ∈ S, I2 ∈ S do
        if Labels(I1) > 0 then
            T ← T ∪ {t}
        end if
    end for
    return T
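A self-contained Python sketch of this online pair generation step follows. Batch items are represented here as (image_id, label_set) tuples, which is an illustrative convention not fixed by the patent, and self-pairs are skipped.

```python
from itertools import permutations

def generate_pairs_online(batch):
    """batch: one minimum batch of 32 training images, each given as (image_id, label_set).
    Returns the set T of image pairs used by the pairwise loss for this batch."""
    T = set()
    for (id1, labels1), (id2, labels2) in permutations(batch, 2):
        if len(labels1) > 0:          # Labels(I1) > 0: the first image must carry at least one label
            T.add((id1, id2))
    return T
```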
Embodiment 1
Suppose there are now V manually annotated pictures, each of which has at least one label characterizing its semantic information; these pictures serve as the training set for the neural network model.
1. Build the network model according to the structure in Fig. 1; from front to back it consists of convolutional layer one, max pooling layer one, convolutional layer two, max pooling layer two, convolutional layer three, convolutional layer four, convolutional layer five, a max pooling layer, fully connected layer six, fully connected layer seven and the hash layer. The first convolutional layer performs a convolution operation on each channel of the image, each subsequent convolutional layer performs a convolution operation on the output of the previous layer, each max pooling layer takes the maximum value over regions of the output of the previous layer, the fully connected layers perform fully connected operations on the output of the previous layer, and the input of the loss function is the output of the final hash layer (a code sketch of this structure is given at the end of this embodiment).
2. Choose v pictures in order from the V pictures as the input of the network model, and compute the loss value of the model according to formula (5). Train the model by mini-batch gradient descent to obtain the value of every parameter in the model; the model parameters comprise the convolution kernel parameters and the connection weights of each node of the fully connected layers.
3. Randomly shuffle the picture order and return to step 2 until the loss value computed from formula (5) stabilizes, thereby determining the values of all parameters in the model.
4. Feed all pictures in the image database into the model, obtain the binary vector of each picture, and store the binary vectors.
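The sketch referenced in step 1 is given below in PyTorch form, purely for illustration: the patent implements the network in Caffe and does not state kernel sizes or channel counts, so the AlexNet-style dimensions assumed here (227×227 input, 96/256/384 channels, 4096-unit fully connected layers, k = 48) are assumptions, not values from the patent.

```python
import torch.nn as nn

class MultiLabelHashNet(nn.Module):
    """Illustrative layer ordering from Fig. 1: conv1, pool1, conv2, pool2,
    conv3, conv4, conv5, pool5, fc6, fc7, hash layer. Dimensions are assumed."""
    def __init__(self, k=48):                       # k: length of the output binary vector
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(3, 2),    # conv1 + pool1
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),  # conv2 + pool2
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),                     # conv3
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),                     # conv4
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2), # conv5 + pool5
        )
        self.fc6 = nn.Sequential(nn.Flatten(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU())
        self.fc7 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU())
        self.hash = nn.Linear(4096, k, bias=False)   # hash layer: one weight vector per node

    def forward(self, x):                            # x: (batch, 3, 227, 227) images
        f_prime = self.fc7(self.fc6(self.features(x)))
        return self.hash(f_prime)                    # real-valued f(x; w); apply sign() at inference
```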
Embodiment 2
Suppose there is a query picture, which need not have any label information, and the goal is to find the semantically most similar pictures in a massive image database:
1. The query picture is fed as input into the trained model, and the value of its binary vector is computed.
2. The Hamming distances to the binary vectors of all pictures in the database are then computed; the smaller the Hamming distance, the more similar the picture is to the query picture.
3. The pictures are sorted by Hamming distance, and the image result set is returned in ascending order of distance (a sketch of this ranking step follows below).
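Steps 1-3 can be condensed into a short NumPy sketch. For ±1 codes the Hamming distance can be computed from the inner product as (k - ⟨b1, b2⟩)/2; the database codes are assumed to be stacked into a single matrix, which is an implementation choice, not a requirement of the patent.

```python
import numpy as np

def search_similar(query_code, db_codes, top_n=10):
    """query_code : (k,) binary vector of the query picture, entries in {+1, -1}
    db_codes   : (num_pictures, k) matrix of stored binary vectors
    Returns the indices of the top_n most similar pictures and their Hamming distances."""
    k = query_code.shape[0]
    hamming = (k - db_codes @ query_code) / 2   # smaller distance = more similar
    order = np.argsort(hamming)                 # sort ascending by Hamming distance
    return order[:top_n], hamming[order[:top_n]]
```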
The above embodiments are merely illustrative of the technical solution of the present invention and are not limiting; persons of ordinary skill in the art may modify the technical solution of the present invention or replace it with equivalents without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall be defined by the claims.

Claims (7)

1. A binary vector generation method for multi-label images, comprising the steps of:
1) A convolutional neural network model is trained on a training data set until the loss value of the model stabilizes. In each training pass the training data set contains N pairs of pictures; for the i-th pair of pictures I_{i,1}, I_{i,2}, let the number of labels of picture I_{i,1} be n_{i,1} and the number of labels jointly associated with pictures I_{i,1} and I_{i,2} be n_{i,2}; the loss function η used to compute the loss value is the loss of formula (5) of the description,
where y_i = 0 when n_{i,1} = n_{i,2}, otherwise y_i = 1; Ones denotes the all-ones vector, ||·||_1 denotes the first norm of a vector, ||·||_2 denotes the Euclidean distance between vectors, |·| denotes taking the absolute value of each element of a vector, α is a parameter controlling the size of the quantization loss, w is the weight vector of the hash layer of the convolutional neural network model, f(I; w) is the k-bit binary vector of picture I output by the convolutional neural network model, and m is the Hamming distance threshold parameter;
2) The binary vector of a picture is computed using the trained convolutional neural network model.
2. A multi-label image semantic similarity search method, comprising the steps of:
1) A convolutional neural network model is trained on a training data set until the loss value of the model stabilizes. In each training pass the training data set contains N pairs of pictures; for the i-th pair of pictures I_{i,1}, I_{i,2}, let the number of labels of picture I_{i,1} be n_{i,1} and the number of labels jointly associated with pictures I_{i,1} and I_{i,2} be n_{i,2}; the loss function η used to compute the loss value is the loss of formula (5) of the description,
where y_i = 0 when n_{i,1} = n_{i,2}, otherwise y_i = 1; Ones denotes the k-dimensional all-ones vector, ||·||_1 denotes the first norm of a vector, ||·||_2 denotes the Euclidean distance between vectors, |·| denotes taking the absolute value of each element of a vector, α is a parameter controlling the size of the quantization loss, w is the weight vector of the hash layer of the convolutional neural network model, f(I; w) is the k-bit binary vector of picture I output by the convolutional neural network model, and m is the Hamming distance threshold parameter;
2) Every picture in the image database is processed by the trained convolutional neural network model, and the resulting binary vector of each picture is stored;
3) The binary vector of the query picture is computed with the trained convolutional neural network model and compared for similarity against the binary vectors obtained in step 2); according to the similarity results, the pictures most similar to the query picture are returned.
3. The method according to claim 1 or 2, characterized in that the convolutional neural network model is trained with mini-batch gradient descent, which minimizes the value of the loss function.
4. The method according to claim 1 or 2, characterized in that the convolutional neural network model comprises, connected in sequence, a first convolutional layer, a max pooling layer, a second convolutional layer, a max pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a max pooling layer, a first fully connected layer, a second fully connected layer, and a hash layer.
5. The method according to claim 4, characterized in that the hash function of the hash layer is h(x; w) = sign(f(x; w)), where f(x; w) = w^T f'(x) and f'(x) is the output vector of the second fully connected layer; the hash layer has k nodes, each node is assigned a weight, and these weights constitute the weight vector w.
6. The method according to claim 1 or 2, characterized in that α = 0.01.
7. The method according to claim 2, characterized in that the similarity is determined from the Hamming distance between the binary vector of the query picture and the binary vectors obtained in step 2).
CN201810111604.1A 2018-02-05 2018-02-05 Binary vector generation method for multi-label images and image semantic similarity search method Pending CN108446307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810111604.1A CN108446307A (en) 2018-02-05 2018-02-05 Binary vector generation method for multi-label images and image semantic similarity search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810111604.1A CN108446307A (en) 2018-02-05 2018-02-05 Binary vector generation method for multi-label images and image semantic similarity search method

Publications (1)

Publication Number Publication Date
CN108446307A true CN108446307A (en) 2018-08-24

Family

ID=63191589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810111604.1A Pending CN108446307A (en) 2018-02-05 2018-02-05 Binary vector generation method for multi-label images and image semantic similarity search method

Country Status (1)

Country Link
CN (1) CN108446307A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197721A (en) * 2019-05-06 2019-09-03 平安科技(深圳)有限公司 Tendon condition evaluation method, apparatus and storage medium based on deep learning
CN110796182A (en) * 2019-10-15 2020-02-14 西安网算数据科技有限公司 Bill classification method and system for small amount of samples
CN112685590A (en) * 2020-12-29 2021-04-20 电子科技大学 Image retrieval method based on convolutional neural network regularization processing
CN113254695A (en) * 2021-05-27 2021-08-13 支付宝(杭州)信息技术有限公司 Image retrieval method and device, and training method and device of image characterization network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897366A (en) * 2017-01-12 2017-06-27 华南理工大学 Image search method based on face convolutional neural networks and random kd trees forest
CN107092661A (en) * 2017-03-28 2017-08-25 桂林明辉信息科技有限公司 Image retrieval method based on deep convolutional neural networks
CN108399185A (en) * 2018-01-10 2018-08-14 中国科学院信息工程研究所 Binary vector generation method for multi-label images and image semantic similarity search method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897366A (en) * 2017-01-12 2017-06-27 华南理工大学 Image search method based on face convolutional neural networks and random kd trees forest
CN107092661A (en) * 2017-03-28 2017-08-25 桂林明辉信息科技有限公司 Image retrieval method based on deep convolutional neural networks
CN108399185A (en) * 2018-01-10 2018-08-14 中国科学院信息工程研究所 Binary vector generation method for multi-label images and image semantic similarity search method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197721A (en) * 2019-05-06 2019-09-03 平安科技(深圳)有限公司 Tendon condition evaluation method, apparatus and storage medium based on deep learning
CN110197721B (en) * 2019-05-06 2023-06-06 平安科技(深圳)有限公司 Tendon condition assessment method, device and storage medium based on deep learning
CN110796182A (en) * 2019-10-15 2020-02-14 西安网算数据科技有限公司 Bill classification method and system for small amount of samples
CN112685590A (en) * 2020-12-29 2021-04-20 电子科技大学 Image retrieval method based on convolutional neural network regularization processing
CN112685590B (en) * 2020-12-29 2022-10-14 电子科技大学 Image retrieval method based on convolutional neural network regularization processing
CN113254695A (en) * 2021-05-27 2021-08-13 支付宝(杭州)信息技术有限公司 Image retrieval method and device, and training method and device of image characterization network
CN113254695B (en) * 2021-05-27 2022-06-07 支付宝(杭州)信息技术有限公司 Image retrieval method and device, and training method and device of image characterization network

Similar Documents

Publication Publication Date Title
CN108399185A (en) Binary vector generation method for multi-label images and image semantic similarity search method
Zhang et al. Improved deep hashing with soft pairwise similarity for multi-label image retrieval
Lai et al. Instance-aware hashing for multi-label image retrieval
CN109299342B (en) Cross-modal retrieval method based on cycle generation type countermeasure network
CN110188227B (en) Hash image retrieval method based on deep learning and low-rank matrix optimization
CN111291836B (en) Method for generating student network model
Wang et al. Robust and flexible discrete hashing for cross-modal similarity search
Wang et al. Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN109241317B (en) Pedestrian Hash retrieval method based on measurement loss in deep learning network
CN108038122B (en) Trademark image retrieval method
CN110046656B (en) Multi-mode scene recognition method based on deep learning
CN110110122A (en) Image-text cross-modal retrieval based on a multi-layer semantic deep hashing algorithm
CN108446307A (en) A kind of the binary set generation method and image, semantic similarity search method of multi-tag image
CN109783666B (en) Image scene graph generation method based on iterative refinement
CN110826338B (en) Fine-grained semantic similarity recognition method for single-selection gate and inter-class measurement
CN107330355B (en) Deep pedestrian re-identification method based on positive sample balance constraint
CN105718960A (en) Image ordering model based on convolutional neural network and spatial pyramid matching
CN104156464B (en) Micro-video retrieval method and device based on a micro-video feature database
US11860932B2 (en) Scene graph embeddings using relative similarity supervision
Lee et al. Photo aesthetics analysis via DCNN feature encoding
CN110457514A (en) Multi-label image retrieval method based on deep hashing
Seng et al. Big feature data analytics: Split and combine linear discriminant analysis (SC-LDA) for integration towards decision making analytics
CN108595546B (en) Semi-supervision-based cross-media feature learning retrieval method
CN110598022B (en) Image retrieval system and method based on robust deep hash network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180824