CN108399211B - Large-scale image retrieval algorithm based on binary characteristics - Google Patents

Large-scale image retrieval algorithm based on binary characteristics

Info

Publication number
CN108399211B
CN108399211B (application CN201810106624.XA / CN201810106624A)
Authority
CN
China
Prior art keywords
real
similarity
loss function
difference
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810106624.XA
Other languages
Chinese (zh)
Other versions
CN108399211A (en)
Inventor
鲁继文 (Lu Jiwen)
周杰 (Zhou Jie)
陈志祥 (Chen Zhixiang)
袁鑫 (Yuan Xin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810106624.XA priority Critical patent/CN108399211B/en
Publication of CN108399211A publication Critical patent/CN108399211A/en
Application granted granted Critical
Publication of CN108399211B publication Critical patent/CN108399211B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G06F16/5838: Retrieval characterised by using metadata automatically derived from the content, using colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a large-scale image retrieval algorithm based on binary characteristics, which comprises the following steps: step S1: initializing neural network parameters, and initializing real-valued output features according to a training picture set; step S2: constructing a picture similarity matrix according to the training picture set, and constructing a Laplace matrix; step S3: constructing a loss function through a weighted similarity measurement; step S4: computing the derivative of the loss function with respect to the real-valued output features, fixing the difference to update the real-valued output features, and updating the network parameters at the same time; step S5: computing the derivative of the loss function with respect to the difference, and fixing the real-valued output features to update the difference; step S6: increasing the higher-order expansion weights, and continuing to update the real-valued output features and the network parameters according to steps S4 and S5 in combination with the loss function until the training is finished. The method effectively compensates for the problem caused by the imbalance between positive and negative training samples in the input data pairs, and effectively improves the retrieval precision.

Description

Large-scale image retrieval algorithm based on binary characteristics
Technical Field
The invention relates to the technical field of computer vision and multimedia, in particular to a large-scale image retrieval algorithm based on binary characteristics.
Background
Image retrieval is a data search technique for finding pictures: given retrieval information input by a user, such as keywords or a query picture, the system searches the database for pictures similar to the input and feeds them back to the user. The measure of similarity may be based on auxiliary information of the picture (e.g. keywords) or on content features of the picture such as texture, color and shape.
Content-based image retrieval is an application of computer vision in the field of image retrieval. Such algorithms aim to avoid retrieving images with textual information and instead rely on features of the picture itself, such as texture, color and shape. They require computing Euclidean distances between the query image and the database images in feature space. On large-scale datasets, both the storage overhead of real-valued features and the time overhead of computing Euclidean distances at retrieval time are unacceptable.
Hash-based image retrieval addresses this excessive time and storage overhead. Hash-based image retrieval algorithms store and retrieve images with binary features rather than real-valued features. Distances between binary features can be computed quickly with exclusive-or (XOR) operations, and because each bit of a binary feature needs only 1 bit of storage, the storage cost of the database picture features is reduced significantly. The binary feature is called a hash feature, the function mapping from the original space to the Hamming space is called a hash function, and the process of learning the hash feature is called hash learning.
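As a concrete (and purely illustrative, not patent-derived) comparison of the two regimes described above, the following sketch contrasts Euclidean search over real-valued features with Hamming search over packed binary codes; the XOR-plus-popcount step is emulated here with numpy's bit unpacking, whereas production systems would use hardware popcount instructions.

```python
# Illustrative toy comparison of real-valued versus binary (hash) retrieval.
# All sizes and data are assumed for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
n, d, bits = 10000, 512, 64            # database size, real feature dim, hash length

real_db = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Real-valued retrieval: Euclidean distances in feature space.
euclidean = np.linalg.norm(real_db - query, axis=1)

# Binary retrieval: pack {0,1} codes into bytes, then XOR and count differing bits.
codes_db = rng.integers(0, 2, size=(n, bits), dtype=np.uint8)
code_q = rng.integers(0, 2, size=bits, dtype=np.uint8)
packed_db = np.packbits(codes_db, axis=1)              # each code bit stored as 1 bit
packed_q = np.packbits(code_q)
hamming = np.unpackbits(packed_db ^ packed_q, axis=1).sum(axis=1)

print("storage per item (float features):", real_db.itemsize * d, "bytes")   # 2048
print("storage per item (64-bit hash):   ", packed_db.shape[1], "bytes")     # 8
print("closest item by Euclidean:", int(euclidean.argmin()),
      "| closest item by Hamming:", int(hamming.argmin()))
```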
The biggest difficulty in hash learning is that solving for the optimal hash features is an NP-hard optimization problem. This follows from the property that each value of a hash feature can only take 0/1 or ±1. Such an integer optimization problem cannot be solved to optimality by traditional numerical optimization methods, so the constraint must be relaxed. There are three main relaxation strategies: directly discarding the binary constraint, introducing a quantization-error term, and relaxing the step function into a sigmoid function. The first strategy simply ignores the constraint, so the learned hash function has a large quantization error. The second strategy introduces real-valued hidden-layer features and other auxiliary variables, decomposes the original integer optimization problem into several solvable subproblems, and seeks a locally optimal solution by step-by-step iterative optimization; sometimes the sub-problem for the hash features is still an NP-hard problem without a closed-form solution and must be driven to a local optimum by coordinate descent. The third strategy introduces a nonlinear function, which noticeably slows down the convergence of the training model. In all of the above methods, there is always a gap between the trained hash function Φ and the hash function Ψ = sgn(Φ) actually used, which reduces retrieval effectiveness on data outside the sample set.
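The gap between the relaxed function Φ and the binary function Ψ = sgn(Φ) is easy to see numerically. The sketch below is an assumed toy example (a random projection followed by tanh stands in for a trained hash network) and is not the patent's model; it only illustrates the quantization error that the second class of relaxation methods penalises.

```python
# Toy illustration of the mismatch between a relaxed hash function and sgn(.) of it.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 8))        # 5 assumed samples, 8-dimensional inputs
W = rng.standard_normal((8, 4))        # hypothetical projection onto 4 hash bits

phi = np.tanh(x @ W)                   # relaxed (sigmoid-like) hash outputs in (-1, 1)
psi = np.sign(phi)                     # binary hash function actually used: sgn(phi)

print("relaxed outputs:\n", np.round(phi, 2))
print("binary outputs:\n", psi)
print("quantization error ||phi - sgn(phi)||^2 =",
      round(float(np.sum((phi - psi) ** 2)), 3))
```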
Furthermore, when the training data are given as pairs, previous approaches define the supervision label of a similar picture pair as 1 and that of a dissimilar pair as 0 or -1. Because the number of similar pairs constructed from most training sets is always much smaller than the number of dissimilar pairs, the positive and negative training samples are imbalanced.
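The imbalance is easy to quantify. In the assumed setting below (1000 training images spread evenly over 10 classes, two images counted as similar when they share a label), only roughly one pair in ten is a positive sample, which is the situation the weighted similarity measurement of this invention is meant to compensate.

```python
# Counting similar versus dissimilar pairs for an assumed balanced 10-class training set.
import numpy as np

rng = np.random.default_rng(2)
labels = rng.integers(0, 10, size=1000)              # assumed class labels
same = labels[:, None] == labels[None, :]            # pairwise similarity labels

n_pairs = labels.size * (labels.size - 1) // 2       # all unordered pairs
n_similar = (int(same.sum()) - labels.size) // 2     # drop the diagonal, count pairs once

print("similar pairs:   ", n_similar)
print("dissimilar pairs:", n_pairs - n_similar)
print("fraction similar: %.3f" % (n_similar / n_pairs))   # about 0.1
```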
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the invention aims to provide a large-scale image retrieval algorithm based on binary characteristics, which can effectively improve retrieval precision.
In order to achieve the above object, an embodiment of the present invention provides a large-scale image retrieval algorithm based on binary features, including the following steps: step S1: initializing neural network parameters, and initializing real-valued output features according to a training picture set; step S2: constructing a picture similarity matrix according to the training picture set, and constructing a Laplace matrix; step S3: constructing a loss function through a weighted similarity measurement; step S4: computing the derivative of the loss function with respect to the real-valued output features, fixing the difference to update the real-valued output features, and updating the network parameters at the same time; step S5: computing the derivative of the loss function with respect to the difference, and fixing the real-valued output features to update the difference; step S6: increasing the higher-order expansion weights, and continuing to update the real-valued output features and the network parameters according to steps S4 and S5 in combination with the loss function until the training is finished.
According to the large-scale image retrieval algorithm based on binary characteristics of the embodiment of the invention, the original binary optimization problem in hash learning is converted into an optimization problem that is differentiable with respect to the hash function, and the binary constraint and the similarity-preserving objective in hash learning are decoupled, so that the converted problem can be solved through a simple alternating iteration framework; the weighted similarity measurement effectively compensates for the imbalance between positive and negative training samples in the input data pairs; the inconsistency between the hash function obtained by training and the hash function actually used is effectively resolved; and the retrieval precision is improved.
Further, in an embodiment of the present invention, the step S1 further includes: step S101: initializing neural network parameters; step S102: acquiring the real-valued output features of the pictures in the training picture set with the initialized neural network, taking them as the initial real-valued output features H, setting the discrete output features B = sgn(H), and taking the discrete output features B as the picture hash codes.
Further, in an embodiment of the present invention, the step S2 includes the following steps: step S201: calculating the similarity of any two pictures in the training picture set through a binary similarity function, recording the similarity of the i-th picture and the j-th picture as s_ij, and denoting the resulting picture similarity matrix as S; step S202: obtaining the Laplace matrix of the picture similarity matrix S, denoted L_sym.
Further, in an embodiment of the present invention, the step S202 specifically includes:
computing d_i = Σ_j s_ij, i = 1, ..., N, and setting D = diag(d_1, d_2, ..., d_N), L = D - S;
the Laplace matrix of the picture similarity matrix S is then L_sym = D^(-1/2) L D^(-1/2).
Further, in an embodiment of the present invention, the step S3 includes the following steps:
step S301: compensating the imbalance of positive and negative training samples with a weighted similarity measurement, calculating the weighted similarity of any two pictures in the training picture set from their similarity, recording the weighted similarity of the i-th picture and the j-th picture as s̃_ij, and denoting the resulting weighted similarity matrix as S̃;
step S302: for any pictures i and j in the training picture set, constructing a loss function according to the discrete output features b_i and b_j:
Figure BDA0001567934510000035
step S303: summing all sample pairs in the training picture set to construct a loss function:
Figure BDA0001567934510000036
the loss function matrix form is:
Figure BDA0001567934510000037
wherein,
Figure BDA0001567934510000038
D is a diagonal matrix with diagonal elements
Figure BDA0001567934510000039
step S304: defining the difference Δ = B - H; according to a Taylor series, the loss function is expanded at the real-valued output features H as follows:
Figure BDA00015679345100000310
wherein, writing the real-valued output features H and the difference Δ column by column, h_i is the i-th column vector of the real-valued output features H and δ_i is the i-th column vector of the difference Δ;
step S305: according to the expanded form of step S304, the loss function in step S303 is:
Figure BDA00015679345100000313
step S306: combining the step S303 and the step S305, constructing the loss function with respect to the real-valued output feature H and the difference Δ:
Figure BDA0001567934510000041
wherein (H + Δ) ∈ {-1, 1}^(n×l), and λ_1 and λ_2 are the higher-order expansion weights.
Further, in an embodiment of the present invention, the step S301 specifically includes:
for any pictures i and j in the training picture set, calculating the weighted similarity according to the similarity s_ij:
Figure BDA0001567934510000042
in order to ensure that, in the similarity measurement, similar pictures have a positive similarity and dissimilar pictures have a negative similarity, let 0 < β < 1; when β = 0.5, the result is the original similarity scaled by 0.5, which is equivalent to not using the weighted similarity measurement.
Further, in an embodiment of the present invention, the step S4 includes the following steps:
step S401: fixing the difference, constructing the loss function with respect to the real-valued output features:
Figure BDA0001567934510000043
step S402: fixing the difference, calculating the derivative of the loss function with respect to the real-valued output features:
Figure BDA0001567934510000044
step S403: updating the real-valued output features and the network parameters by a stochastic gradient descent method.
Further, in an embodiment of the present invention, the step S5 includes the following steps:
step S501: fixing the real-valued output features, constructing a loss function with respect to the difference:
Figure BDA0001567934510000045
satisfying (H + Δ) ∈ {-1, 1}^(n×l);
step S502: fixing the real-valued output features, calculating the derivative of the loss function with respect to the difference:
Figure BDA0001567934510000046
step S503: the difference amount Δ is updated.
Further, in an embodiment of the present invention, the step S503 specifically includes:
according to
Figure BDA0001567934510000047
And iteratively updating the difference quantity.
Further, in an embodiment of the present invention, the step S6 includes the following steps: step S601: increasing the higher-order expansion weights λ_1 and λ_2; step S602: according to steps S4 and S5, continuing to update the real-valued output features and the network parameters in combination with the loss function until the training is finished.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a large scale image retrieval algorithm based on binary features according to an embodiment of the present invention;
FIG. 2 is a flowchart of a large-scale image retrieval algorithm based on binary features according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
A proposed binary-feature-based large-scale image retrieval algorithm according to an embodiment of the present invention is described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of a large-scale image retrieval algorithm based on binary features according to an embodiment of the present invention.
As shown in fig. 1 and fig. 2, the binary-feature-based large-scale image retrieval algorithm includes the following steps:
step S1: initializing neural network parameters, and initializing real-value output characteristics according to the training picture set.
Further, in an embodiment of the present invention, the step S1 further includes: step S101: initializing neural network parameters; step S102: acquiring the real-valued output features of the pictures in the training picture set with the initialized neural network, taking them as the initial real-valued output features H, setting the discrete output features B = sgn(H), and taking the discrete output features B as the picture hash codes.
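A minimal sketch of step S1 under assumed shapes: a small randomly initialised two-layer network stands in for the patent's (unspecified) neural network, its forward pass gives the initial real-valued output features H, and B = sgn(H) gives the initial picture hash codes.

```python
# Step S1 sketch (assumed network and sizes): initialise parameters, compute H, set B = sgn(H).
import numpy as np

rng = np.random.default_rng(3)
n, d, l = 200, 128, 32                       # pictures, input feature dim, hash length

X = rng.standard_normal((n, d))              # placeholder inputs for the training pictures
W1 = rng.standard_normal((d, 64)) * 0.1      # randomly initialised network parameters
W2 = rng.standard_normal((64, l)) * 0.1

def forward(X, W1, W2):
    """Real-valued network output H for a batch of inputs."""
    return np.tanh(X @ W1) @ W2

H = forward(X, W1, W2)                       # initial real-valued output features
B = np.sign(H)                               # discrete output features B = sgn(H)
B[B == 0] = 1                                # keep the codes strictly in {-1, +1}
print(H.shape, B.shape, np.unique(B))
```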
Step S2: and constructing a picture similarity matrix according to the training picture set, and constructing a Laplace matrix.
Further, in an embodiment of the present invention, the step S2 includes the following steps: step S201: calculating the similarity of any two pictures in the training picture set through a binary similarity function, recording the similarity of the i-th picture and the j-th picture as s_ij, and denoting the resulting picture similarity matrix as S; step S202: obtaining the Laplace matrix of the picture similarity matrix S, denoted L_sym.
Further, in an embodiment of the present invention, step S202 specifically includes:
computing d_i = Σ_j s_ij, i = 1, ..., N, and setting D = diag(d_1, d_2, ..., d_N), L = D - S;
the Laplace matrix of the picture similarity matrix S is then L_sym = D^(-1/2) L D^(-1/2).
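The sketch below illustrates step S2 with an assumed binary similarity (two pictures are similar when they share a class label): it builds the picture similarity matrix S, the degree matrix D, L = D - S, and the symmetric normalised Laplacian taken here as L_sym = D^(-1/2) L D^(-1/2).

```python
# Step S2 sketch: similarity matrix and its (assumed symmetric normalised) Laplacian.
import numpy as np

rng = np.random.default_rng(4)
labels = rng.integers(0, 5, size=200)                     # assumed class labels
S = (labels[:, None] == labels[None, :]).astype(float)    # s_ij in {0, 1}

d = S.sum(axis=1)                                         # d_i = sum_j s_ij (always > 0 here)
D = np.diag(d)
L = D - S
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt                       # L_sym = D^(-1/2) L D^(-1/2)

print(L_sym.shape, "symmetric:", np.allclose(L_sym, L_sym.T))
```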
Step S3: the loss function is constructed by weighting the similarity measure.
Further, in an embodiment of the present invention, the step S3 includes the following steps:
step S301: compensating the imbalance of positive and negative training samples with a weighted similarity measurement, calculating the weighted similarity of any two pictures in the training picture set from their similarity, recording the weighted similarity of the i-th picture and the j-th picture as s̃_ij, and denoting the resulting weighted similarity matrix as S̃;
step S302: for any pictures i and j in the training picture set, constructing a loss function according to the discrete output features b_i and b_j:
Figure BDA0001567934510000065
step S303: summing all sample pairs in the training picture set to construct a loss function:
Figure BDA0001567934510000066
the loss function matrix form is:
Figure BDA0001567934510000067
wherein,
Figure BDA0001567934510000068
D is a diagonal matrix with diagonal elements
Figure BDA0001567934510000069
step S304: defining the difference Δ = B - H; according to a Taylor series, the loss function is expanded at the real-valued output features H as follows:
Figure BDA00015679345100000610
wherein, writing the real-valued output features H and the difference Δ column by column, h_i is the i-th column vector of the real-valued output features H and δ_i is the i-th column vector of the difference Δ;
step S305: according to the expanded form of step S304, the loss function in step S303 is:
Figure BDA00015679345100000613
step S306: combining step S303 and step S305, a loss function is constructed for the real-valued output feature H and the difference Δ:
Figure BDA0001567934510000071
wherein (H + Δ) ∈ {-1, 1}^(n×l), and λ_1 and λ_2 are the higher-order expansion weights.
Further, in an embodiment of the present invention, step S301 specifically includes:
for any pictures i and j in the training picture set, calculating the weighted similarity according to the similarity s_ij:
Figure BDA0001567934510000072
in order to ensure that, in the similarity measurement, similar pictures have a positive similarity and dissimilar pictures have a negative similarity, let 0 < β < 1; when β = 0.5, the result is the original similarity scaled by 0.5, which is equivalent to not using the weighted similarity measurement.
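Because the weighted-similarity and loss formulas of step S3 appear in the original only as images, the sketch below is an assumption rather than the patent's exact objective: the weighting gives similar pairs the value 1 - β and dissimilar pairs the value -β (one form consistent with the remark that β = 0.5 reduces to the original similarity scaled by 0.5), and a Laplacian-style similarity-preserving loss over the discrete codes stands in for the loss of steps S302 and S303.

```python
# Step S3 sketch under stated assumptions: weighted similarity and a stand-in loss.
import numpy as np

def weighted_similarity(S, beta=0.25):
    """Assumed weighting: s_ij in {0,1} becomes (1 - beta) for similar pairs, -beta otherwise."""
    assert 0.0 < beta < 1.0
    return (1.0 - beta) * S - beta * (1.0 - S)

def similarity_loss(B, S_w):
    """Assumed similarity-preserving loss: sum_ij s_w_ij * ||b_i - b_j||^2 / 4,
    written as tr(B^T L_w B) / 2 with L_w = D_w - S_w and D_w = diag(row sums of S_w)."""
    L_w = np.diag(S_w.sum(axis=1)) - S_w
    return float(np.trace(B.T @ L_w @ B)) / 2.0

# toy check with assumed labels and random codes
rng = np.random.default_rng(5)
labels = rng.integers(0, 5, size=100)
S = (labels[:, None] == labels[None, :]).astype(float)
B = np.sign(rng.standard_normal((100, 32)))
print(similarity_loss(B, weighted_similarity(S)))
```

With β below 0.5 the scarcer similar pairs receive the larger weight magnitude, which is one simple way of compensating the imbalance discussed in the background.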
Step S4: the derivative of the loss function with respect to the real-valued output features is computed, the difference is fixed to update the real-valued output features, and the network parameters are updated at the same time.
Further, in an embodiment of the present invention, the step S4 includes the following steps:
step S401: fixing the difference, constructing a loss function with respect to the real-valued output features:
Figure BDA0001567934510000073
step S402: fixing the difference, calculating the derivative of the loss function with respect to the real-valued output features:
Figure BDA0001567934510000074
step S403: updating the real-valued output features and the network parameters by a stochastic gradient descent method.
Step S5: the derivative of the loss function with respect to the difference is computed, and the real-valued output features are fixed to update the difference.
Further, in an embodiment of the present invention, the step S5 includes the following steps:
step S501: fixing the real-valued output features, constructing a loss function with respect to the difference:
Figure BDA0001567934510000075
satisfying (H + Δ) ∈ {-1, 1}^(n×l);
step S502: fixing the real-valued output features, calculating the derivative of the loss function with respect to the difference:
Figure BDA0001567934510000076
step S503: the difference amount Δ is updated.
Further, in an embodiment of the present invention, step S503 specifically includes:
according to
Figure BDA0001567934510000081
And iteratively updating the difference quantity.
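Steps S4 and S5 alternate between the continuous features and the difference. The sketch below reuses the assumed Laplacian-style loss of the step S3 sketch above (whose gradient with respect to H at fixed Δ is L_w(H + Δ)); the Δ update shown is simply the feasible choice that keeps B = H + Δ in {-1, 1}^(n×l), not necessarily the patent's exact iterative rule, and back-propagation into the network parameters is omitted.

```python
# Alternating updates for steps S4 and S5, under the assumptions stated above.
import numpy as np

def grad_wrt_H(H, Delta, L_w):
    """Gradient of the assumed loss tr((H+Delta)^T L_w (H+Delta)) / 2 with Delta fixed."""
    return L_w @ (H + Delta)

def update_H(H, Delta, L_w, lr=1e-3):
    """Step S4 (sketch): fix Delta, take one gradient-descent step on H; in the full
    method the same gradient would also be back-propagated into the network parameters."""
    return H - lr * grad_wrt_H(H, Delta, L_w)

def update_Delta(H):
    """Step S5 (sketch): fix H, choose Delta so that B = H + Delta is exactly binary."""
    B = np.sign(H)
    B[B == 0] = 1
    return B - H

# toy run with assumed labels, random initial features and a plain {0,1} similarity
rng = np.random.default_rng(6)
labels = rng.integers(0, 3, size=50)
S = (labels[:, None] == labels[None, :]).astype(float)
L_w = np.diag(S.sum(axis=1)) - S
H = rng.standard_normal((50, 16))
Delta = update_Delta(H)
for _ in range(10):
    H = update_H(H, Delta, L_w)
    Delta = update_Delta(H)
print(np.unique(H + Delta))          # the codes B = H + Delta stay in {-1, +1}
```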
Step S6: the higher-order expansion weights are increased, and the real-valued output features and the network parameters are continuously updated according to steps S4 and S5 in combination with the loss function until the training is finished.
Further, in an embodiment of the present invention, the step S6 includes the following steps: step S601: increasing the higher-order expansion weights λ_1 and λ_2; step S602: according to steps S4 and S5, the real-valued output features and the network parameters are continuously updated in combination with the loss function until the training is finished.
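For step S6 the text only states that the higher-order expansion weights λ_1 and λ_2 are increased between rounds; the geometric schedule below is therefore an assumption, and the inner loop stands for the alternating step S4 / step S5 updates sketched above.

```python
# Step S6 sketch: an assumed schedule for the higher-order expansion weights.
def weight_schedule(rounds=5, lam_init=0.01, factor=10.0):
    """Yield (round, lambda_1, lambda_2) with both weights grown geometrically."""
    lam1 = lam2 = lam_init
    for r in range(rounds):
        yield r, lam1, lam2
        lam1 *= factor
        lam2 *= factor

for r, lam1, lam2 in weight_schedule():
    # ... run the alternating step S4 / step S5 updates with the current weights ...
    print(f"round {r}: lambda_1 = lambda_2 = {lam1}")
```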
According to the large-scale image retrieval algorithm based on binary characteristics of the embodiment of the invention, the original binary optimization problem in hash learning is converted into an optimization problem that is differentiable with respect to the hash function, and the binary constraint and the similarity-preserving objective in hash learning are decoupled, so that the converted problem can be solved through a simple alternating iteration framework; the weighted similarity measurement effectively compensates for the imbalance between positive and negative training samples in the input data pairs; the inconsistency between the hash function obtained by training and the hash function actually used is effectively resolved; and the retrieval precision is improved.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be considered limiting of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (7)

1. A large-scale image retrieval method based on binary characteristics is characterized by comprising the following steps:
step S1: initializing neural network parameters, and initializing real-valued output features according to a training picture set;
step S2: constructing a picture similarity matrix according to the training picture set, and constructing a Laplace matrix;
step S3: constructing a loss function through a weighted similarity measurement;
step S4: computing the derivative of the loss function with respect to the real-valued output features, fixing the difference to update the real-valued output features, and updating network parameters at the same time;
step S5: computing the derivative of the loss function with respect to the difference, and fixing the real-valued output features to update the difference; and
step S6: increasing the higher-order expansion weights, and continuing to update the real-valued output features and the network parameters according to the step S4 and the step S5 in combination with the loss function until the training is finished;
wherein the step S1 further includes:
step S101: initializing neural network parameters;
step S102: acquiring the real-valued output features of the pictures in the training picture set with the initialized neural network, taking them as the initial real-valued output features H, setting the discrete output features B = sgn(H), and taking the discrete output features B as the picture hash codes;
the step S2 includes the steps of:
step S201: calculating the similarity of any two pictures in the training picture set through a binary similarity function, recording the similarity of the i-th picture and the j-th picture as s_ij, and denoting the resulting picture similarity matrix as S;
step S202: obtaining the Laplace matrix of the picture similarity matrix S, denoted L_sym;
the step S3 includes the steps of:
step S301: compensating the imbalance of positive and negative training samples with a weighted similarity measurement, calculating the weighted similarity of any two pictures in the training picture set from their similarity, recording the weighted similarity of the i-th picture and the j-th picture as s̃_ij, and denoting the resulting weighted similarity matrix as S̃;
step S302: for any pictures i and j in the training picture set, constructing a loss function according to the discrete output features b_i and b_j:
Figure FDA0002570865110000013
step S303: summing all sample pairs in the training picture set to construct a loss function:
Figure FDA0002570865110000014
the loss function matrix form is:
Figure FDA0002570865110000021
wherein,
Figure FDA0002570865110000022
D is a diagonal matrix with diagonal elements
Figure FDA0002570865110000023
step S304: defining the difference Δ = B - H; according to a Taylor series, the loss function is expanded at the real-valued output features H as follows:
Figure FDA0002570865110000024
wherein, writing the real-valued output features H and the difference Δ column by column, h_i is the i-th column vector of the real-valued output features H and δ_i is the i-th column vector of the difference Δ;
step S305: according to the expanded form of step S304, the loss function in step S303 is:
Figure FDA0002570865110000027
step S306: combining the step S303 and the step S305, constructing the loss function with respect to the real-valued output feature H and the difference Δ:
Figure FDA0002570865110000028
wherein (H + Δ) ∈ {-1, 1}^(n×l), and λ_1 and λ_2 are the higher-order expansion weights.
2. The method for retrieving a large-scale image based on binary features according to claim 1, wherein the step S202 specifically comprises:
computing d_i = Σ_j s_ij, i = 1, ..., N, and setting D = diag(d_1, d_2, ..., d_N), L = D - S;
the Laplace matrix of the picture similarity matrix S is then L_sym = D^(-1/2) L D^(-1/2).
3. The method for retrieving a large-scale image based on binary features according to claim 1, wherein the step S301 specifically comprises:
for any pictures i and j in the training picture set, calculating the weighted similarity according to the similarity s_ij:
Figure FDA00025708651100000211
in order to ensure that, in the similarity measurement, similar pictures have a positive similarity and dissimilar pictures have a negative similarity, let 0 < β < 1; when β = 0.5, the result is the original similarity scaled by 0.5, which is equivalent to not using the weighted similarity measurement.
4. The binary-feature-based large-scale image retrieval method according to claim 1 or 3, wherein the step S4 includes the steps of:
step S401: fixing the difference, constructing the loss function with respect to the real-valued output features:
Figure FDA00025708651100000212
step S402: fixing the difference, calculating the derivative of the loss function with respect to the real-valued output features:
Figure FDA0002570865110000031
step S403: updating the real-valued output features and the network parameters by a stochastic gradient descent method.
5. The binary-feature-based large-scale image retrieval method according to claim 4, wherein the step S5 comprises the steps of:
step S501: fixing the real-valued output features, constructing a loss function with respect to the difference:
Figure FDA0002570865110000032
satisfying (H + Δ) ∈ {-1, 1}^(n×l);
step S502: fixing the real-valued output features, calculating the derivative of the loss function with respect to the difference:
Figure FDA0002570865110000033
step S503: the difference amount Δ is updated.
6. The large-scale image retrieval method based on binary features as claimed in claim 5, wherein said step S503 specifically is:
according to
Figure FDA0002570865110000034
And iteratively updating the difference quantity.
7. The binary-feature-based large-scale image retrieval method according to claim 5, wherein the step S6 comprises the steps of:
step S601: increasing the higher-order expansion weights λ_1 and λ_2;
step S602: according to the steps S4 and S5, the real-valued output features and the network parameters are continuously updated in combination with the loss function until the training is finished.
CN201810106624.XA 2018-02-02 2018-02-02 Large-scale image retrieval algorithm based on binary characteristics Active CN108399211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810106624.XA CN108399211B (en) 2018-02-02 2018-02-02 Large-scale image retrieval algorithm based on binary characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810106624.XA CN108399211B (en) 2018-02-02 2018-02-02 Large-scale image retrieval algorithm based on binary characteristics

Publications (2)

Publication Number Publication Date
CN108399211A CN108399211A (en) 2018-08-14
CN108399211B true CN108399211B (en) 2020-11-24

Family

ID=63096218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810106624.XA Active CN108399211B (en) 2018-02-02 2018-02-02 Large-scale image retrieval algorithm based on binary characteristics

Country Status (1)

Country Link
CN (1) CN108399211B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918532B (en) * 2019-03-08 2023-08-18 苏州大学 Image retrieval method, device, equipment and computer readable storage medium
CN109977194B (en) * 2019-03-20 2021-08-10 华南理工大学 Text similarity calculation method, system, device and medium based on unsupervised learning
CN113157739B (en) * 2021-04-23 2024-01-09 平安科技(深圳)有限公司 Cross-modal retrieval method and device, electronic equipment and storage medium
CN113705589A (en) * 2021-10-29 2021-11-26 腾讯科技(深圳)有限公司 Data processing method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336801A (en) * 2013-06-20 2013-10-02 河海大学 Multi-feature locality sensitive hashing (LSH) indexing combination-based remote sensing image retrieval method
CN107085585A (en) * 2016-02-12 2017-08-22 奥多比公司 Accurate label dependency prediction for picture search

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069173B (en) * 2015-09-10 2019-04-19 天津中科智能识别产业技术研究院有限公司 The fast image retrieval method of Hash is kept based on the topology for having supervision
CN105469096B (en) * 2015-11-18 2018-09-25 南京大学 A kind of characteristic bag image search method based on Hash binary-coding
CN105512289B (en) * 2015-12-07 2018-08-14 郑州金惠计算机系统工程有限公司 Image search method based on deep learning and Hash
CN106021364B (en) * 2016-05-10 2017-12-12 百度在线网络技术(北京)有限公司 Foundation, image searching method and the device of picture searching dependency prediction model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336801A (en) * 2013-06-20 2013-10-02 河海大学 Multi-feature locality sensitive hashing (LSH) indexing combination-based remote sensing image retrieval method
CN107085585A (en) * 2016-02-12 2017-08-22 奥多比公司 Accurate label dependency prediction for picture search

Also Published As

Publication number Publication date
CN108399211A (en) 2018-08-14

Similar Documents

Publication Publication Date Title
CN108399211B (en) Large-scale image retrieval algorithm based on binary characteristics
CN110059198B (en) Discrete hash retrieval method of cross-modal data based on similarity maintenance
US11651037B2 (en) Efficient cross-modal retrieval via deep binary hashing and quantization
US8438011B2 (en) Suggesting spelling corrections for personal names
CN111160564B (en) Chinese knowledge graph representation learning method based on feature tensor
CN109284411B (en) Discretization image binary coding method based on supervised hypergraph
CN103473307B (en) Across media sparse hash indexing means
US8386490B2 (en) Adaptive multimedia semantic concept classifier
CN108399268B (en) Incremental heterogeneous graph clustering method based on game theory
KR101623860B1 (en) Method for calculating similarity between document elements
CN111274424B (en) Semantic enhanced hash method for zero sample image retrieval
CN103488713A (en) Cross-modal search method capable of directly measuring similarity of different modal data
CN112395438A (en) Hash code generation method and system for multi-label image
Cintia Ganesha Putri et al. Design of an unsupervised machine learning-based movie recommender system
CN111753190A (en) Meta learning-based unsupervised cross-modal Hash retrieval method
CN112800344B (en) Deep neural network-based movie recommendation method
Zhang et al. Modeling the Homophily Effect between Links and Communities for Overlapping Community Detection.
JP7131616B2 (en) Time-series data processor
CN116821519A (en) Intelligent recommendation method for system filtering and noise reduction based on graph structure
CN116383437A (en) Cross-modal material recommendation method based on convolutional neural network
CN116958613A (en) Depth multi-view clustering method and device, electronic equipment and readable storage medium
WO2023279685A1 (en) Method for mining core users and core items in large-scale commodity sales
CN110659375A (en) Hash model training method, similar object retrieval method and device
Wang et al. An overview of t-SNE optimization algorithms
Song et al. Hybrid Recommendation Based on Matrix Factorization and Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant