WO2020010691A1

WO2020010691A1 - Method and apparatus for extracting hash code from image, and image retrieval method and apparatus

Info

Publication number: WO2020010691A1
Application number: PCT/CN2018/105534
Authority: WO
Inventors: 王浩; 杜长营; 庞旭林; 张晨; 杨康
Original assignee: 北京奇虎科技有限公司
Priority date: 2018-07-12
Filing date: 2018-09-13
Publication date: 2020-01-16

Abstract

A method and apparatus for extracting a hash code from an image, and an image retrieval method and apparatus. The method comprises: constructing a hash code extraction model, the model comprising an encoder and a decoder (S11), wherein the encoder consists of a multi-layer deep neural network (DNN), extracting a hash code from image data, outputting said hash code to the decoder, the decoder consisting of a multi-layer DNN, and converting the input hash code into an image; regularizing, in the encoder, hidden layer encoding by measuring hidden layer encoding redundancy, so as to reduce encoding space information redundancy to obtain anti-redundancy hash code deep extraction model; training the anti-redundancy hash code deep extraction model to determine parameters in the model (S12); and extracting, using the encoder in the trained anti-redundancy hash code deep extraction model, a hash code from the image (S13). The method can reduce the encoding space information redundancy, effectively utilize all dimensions, extract an image hash code with high precision, and improve the accuracy in application fields related to image retrieval, etc.

Description

Method and device for extracting hash code from image and image retrieval method and device

Technical field

The present invention relates to the field of artificial intelligence technology, and in particular, to a method and an apparatus for extracting a hash code from an image, and an image retrieval method and apparatus, as well as an electronic device and a computer-readable storage medium.

Background technique

LTH (learning to hash) is an image compression method that is very effective in image retrieval applications. The framework extracts binary hash codes from images, calculates the similarity between the input image and the image hash codes in the image library, and performs retrieval. The LTH framework can greatly reduce storage space and improve retrieval efficiency.

The extraction of the hash code of an image in LTH is very critical and is generally implemented by an encoder. An autoencoder is an unsupervised neural network method consisting of an encoder and a decoder that can generate images based on random encoding. VAE (Variational Autoencoder, Variational Autoencoder) is a standard normal distribution constraint on random encoding to generate images. The most widely used hash code extraction method in the LTH framework, SGH (Stochastic Generative Hashing, random hash generation) is an application based on the VAE framework.

Variational pruning of the VAE framework will cause some hidden layer units to collapse before the model is trained, which will cause a significant congenital deficiency. For example, (1) there is a lot of coding space. Redundant dimensions (ie, redundant data without information); (2) the framework makes insufficient use of latent codes in the coding space; etc. These deficiencies are even more pronounced when the decoder structure is complex. This will lead to problems such as inability to accurately extract the image hash code, resulting in reduced image retrieval accuracy and other related applications.

Summary of the invention

In view of the above problems, the present invention is provided in order to provide a method and device for extracting a hash code from an image, and an image retrieval method and device that overcome the above problems or at least partially solve the above problems.

According to an aspect of the present invention, a method for extracting a hash code from an image is provided, and the method includes:

Construct a hash code extraction model, which includes an encoder and a decoder; where the encoder is composed of a multilayer deep neural network DNN, which extracts the hash code from the image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN, Convert the input hash code into an image;

Regularizing the hidden layer coding by measuring the redundancy of the hidden layer coding in the encoder, thereby reducing the redundancy of the coding space information, and obtaining a redundant extraction model of the anti-redundant hash code;

Training against redundant hash code depth extraction model to determine the parameters in the model;

Use the encoder in the trained anti-redundant hash code depth extraction model to extract the hash code from the image.

Optionally,

The constructing a hash code extraction model includes: constructing a variational autoencoder VAE model or a random hash to generate an SGH model, and an objective function thereof is shown in formula (1);

among them,

D _KL is the KL divergence; X is the input data, Z is the image hash code output by the decoder, and θ is a parameter used to represent the likelihood obtained by the decoder DNN model;

Regularizing hidden layer coding by measuring the redundancy of hidden layer coding in the encoder includes: adding a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);

Where A is the coefficient matrix and Z is the image hash code output by the decoder,

Is the Frobenius norm, δ> 0, δ is the regularization parameter, K is the dimension of Z

Optionally, before training the hash code extraction model, the method further includes:

Regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the network structure of the decoder and forcing the encoder to extract high-quality hash codes.

Optionally, regularizing the output of the last layer of the decoder to try to ensure that the output of the hidden layer of the DNN is close to the hash code. Specific examples include:

Add another constraint term to the objective function shown in (2) to obtain the objective function shown in formula (3);

Among them, H ^(M) is the output of the M layer of the decoder network; M is the number of layers of the encoder and decoder network, η> 0, η is the regularization parameter.

Optionally, training the anti-redundant hash code depth extraction model includes:

Using training data, θ, φ, and A are alternately trained in the objective function.

Optionally, using training data to alternately train θ, φ, and A in the objective function includes:

Obtain the original training data;

After the original training data is randomly shuffled, it is divided into multiple parts, each training data is taken in turn, and θ, φ, and A are trained alternately with each training data;

Repeat the above steps a preset number of times.

According to another aspect of the present invention, a method for extracting a hash code from an image is provided. The method includes:

Regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the decoder network structure, and forcing the encoder to extract high-quality hash codes to obtain anti-redundancy. Hash code depth extraction model;

Optionally,

among them,

Regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the decoder network structure, and forcing the encoder to extract high-quality hash codes to obtain anti-redundancy. Hash code depth extraction models include:

Add a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);

Where Z is the image hash code output by the decoder,

Is the Frobenius norm, η> 0, η is the regularization parameter, K is the dimension of Z, H ^(M) is the M layer output of the decoder network, and M is the number of layers of the encoder and decoder networks .

According to another aspect of the present invention, an image retrieval method is provided, and the method includes:

Using any of the methods described above, calculating a hash code for each image in the image library and calculating a hash code for the retrieved image;

Calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.

According to another aspect of the present invention, a device for extracting a hash code from an image is provided, and the device includes:

A model building unit is suitable for constructing a hash code extraction model. The model includes an encoder and a decoder. The encoder is composed of a multilayer deep neural network DNN, and extracts a hash code from image data and outputs the hash code to the decoder and decoder. It is composed of a multi-layer DNN, which converts the input hash code into an image; and is suitable for regularizing the hidden layer coding by measuring the redundancy of the hidden layer coding in the encoder, thereby reducing the redundancy of the coding space information and obtaining Anti-redundant hash code depth extraction model;

Model training unit, suitable for training against redundant hash code depth extraction model to determine parameters in the model;

The hash code extraction unit is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.

Optionally, the model construction unit is adapted to construct a variational autoencoder VAE model or a random hash to generate an SGH model, and an objective function thereof is shown in formula (1);

among them,

And, the model building unit is adapted to add a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);

Optionally, the model construction unit is further adapted to regularize the output of the last layer of the decoder, so as to ensure that the output of the hidden layer of the DNN is similar to the hash code, thereby simplifying the network structure of the decoder and forcing encoding. The extractor extracts high-quality hash codes.

Optionally, the model building unit is specifically adapted to add another constraint term to the objective function shown in (2) to obtain the objective function shown in formula (3);

Among them, H ^(M) is the output of the M layer of the decoder network, M is the number of layers of the encoder and decoder network, η> 0, and η is a regularization parameter.

Optionally, the model training unit is adapted to use training data to alternately train θ, φ, and A in an objective function.

Optionally, the model training unit is specifically adapted to:

Obtain the original training data;

Repeat the above steps a preset number of times.

A model building unit is suitable for constructing a hash code extraction model. The model includes an encoder and a decoder. The encoder is composed of a multilayer deep neural network DNN, and extracts a hash code from image data and outputs the hash code to the decoder and decoder. Consists of multiple layers of DNN to convert the input hash code into an image; and is suitable for regularizing the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the decoder Network structure. The backward encoder extracts high-quality hash codes and obtains an anti-redundant hash code depth extraction model.

among them,

Where Z is the image hash code output by the decoder,

According to another aspect of the present invention, an image retrieval device is provided, and the device includes:

The apparatus for extracting a hash code from an image according to any one of the foregoing, is suitable for calculating a hash code of each image in an image library and calculating a hash code of a retrieved image;

The hash code similarity calculation unit is adapted to calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.

According to still another aspect of the present invention, there is provided an electronic device, the electronic device includes: a processor, and a memory storing a computer program executable on the processor;

Wherein, the processor is configured to execute the method according to any one of the above when executing a computer program in the memory.

According to yet another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the method according to any one of the foregoing is implemented.

The technical solution of the present invention constructs a hash code extraction model, which includes an encoder and a decoder; wherein the encoder is composed of a multi-layer deep neural network DNN, extracts a hash code from image data, and outputs the hash code to the decoder and decoder. Consists of multi-layer DNN, which converts the input hash code into an image; in the encoder, the hidden layer coding is regularized by measuring the redundancy of the hidden layer coding, thereby reducing the redundancy of the coding space information and obtaining an anti-redundant hash Code depth extraction model; training against redundant hash code depth extraction model to determine the parameters in the model; using the trained encoder in the anti-redundant hash code depth extraction model to extract the hash code from the image. By constructing an anti-redundant hash code extraction model, this method can effectively reduce the redundancy of coding space information, effectively use all dimensions, extract image hash codes with high accuracy, and effectively improve the accuracy of related applications such as image retrieval.

The above description is only an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly, it can be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more comprehensible. In the following, specific embodiments of the invention are enumerated.

BRIEF DESCRIPTION OF THE DRAWINGS

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the detailed description of the preferred embodiments below. The drawings are only for the purpose of illustrating preferred embodiments and are not to be considered as limiting the invention. Moreover, the same reference numerals are used throughout the drawings to refer to the same parts. In the drawings:

FIG. 1 shows a flowchart of a method for extracting a hash code from an image according to an embodiment of the present invention;

Figure 2 is a comparison chart of the results of SGH and R-SGH image reconstruction errors;

FIG. 3 shows a flowchart of an image retrieval method according to an embodiment of the present invention;

4 shows a schematic diagram of an apparatus for extracting a hash code from an image according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention; FIG.

6 is a schematic structural diagram of an electronic device in an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.

detailed description

Explanation of professional terms appearing in the present invention:

1. Image retrieval: Image Retrieval refers to retrieving similar images from a huge image library based on the input image.

2. LTH: Learning To Hash is an image compression method that extracts a binary hash code from an image, calculates the similarity between the input image and the image hash code in the image library, and performs retrieval. In image retrieval applications, the LTH method can greatly reduce storage space and improve retrieval efficiency.

3. VAE: Variational Autoencoder, a variational autoencoder. An autoencoder is an unsupervised neural network method consisting of an encoder and a decoder that can generate images based on random encoding. Variational autoencoders impose standard normal distribution constraints on random encoding to generate images. The implicit representation generation model can be expressed by the following formula:

Where θ is a parameter obtained by the decoder DNN model and used to represent the likelihood; X is the input data; Z is the implicit data;

The inference model (that is, the DNN encoder) is expressed as follows:

The VAE objective function is as follows:

Where D _KL is the KL divergence, and the optimization goal is to adjust the maximum ELBO of θ and φ.

4. Redundancy means that there are many dimensions in the data that do not carry information (such as all 0), or that the data of different dimensions are linearly related.

5. SGH: Stochastic Generative Hashing, a randomly generated hash code method, is an application based on the VAE framework, uses linear Gaussian likelihood, and applies Bernoulli's prior to the implicit representation Z, as follows:

among them

Its inference model is expressed as follows:

among them

Scalar linear or deep non-linear transformation.

The objective function of SGH is the same as VAE.

6. R-SGH: The anti-redundant random hash generation method proposed by the present invention can ensure that there is no redundancy in the hash code extracted from the image.

7. DNN: Deep Neural Network, deep neural network.

8. KL divergence: Kullback-Leibler Divergence, used to characterize the closeness of two probability distributions.

9, Frobenius norm: Frobenius norm, the vector element's absolute value of the 2nd power and 1/2 power.

10. Stochastic gradient descent method, stochastic parallel gradient algorithm, referred to as SPGD algorithm. It is a model-free optimization algorithm, which is more suitable for the optimal control process with more control variables and more complex controlled systems, which cannot establish accurate mathematical models.

11. MNIST: A handwritten digital public data set from the National Institute of Standards and Technology (NIST). There are 60,000 samples in the training set and 10,000 samples in the test set.

12. CIFAR-10: It is a public data set, which contains 60,000 color images, including 10 categories such as airplanes, cars, cats, and birds, with 6000 pictures in each category.

13. Caltech-256: It is a public data set, containing 29780 images, a total of 256 categories.

14. mAP, mean average (average accuracy), is an evaluation index commonly used in retrieval fields.

15. Monte Carlo Method: Monte Carlo Method, also known as statistical simulation method, is a method proposed by the theory of probability and statistics due to the development of science and technology and the invention of electronic computers in the mid 1940s. Class is very important for numerical calculation methods. A method that uses random numbers (or more commonly pseudo-random numbers) to solve many computational problems.

Hereinafter, exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. On the contrary, these embodiments are provided to enable a thorough understanding of the present disclosure, and to fully convey the scope of the present disclosure to those skilled in the art.

FIG. 1 shows a flowchart of a method for extracting a hash code from an image according to an embodiment of the present invention. The method includes:

Step S11: Construct a hash code extraction model, which includes an encoder and a decoder; wherein the encoder is composed of a multilayer deep neural network DNN, and extracts a hash code from the image data and outputs it to the decoder, and the decoder is composed of multiple layers DNN composition, convert the input hash code into an image;

Hidden layer coding is regularized in the encoder by measuring the redundancy of the hidden layer coding, thereby reducing the redundancy of the coding space information, and obtaining the anti-redundant hash code depth extraction model;

Step S12: training against a redundant hash code depth extraction model to determine parameters in the model;

Step S13: Utilize the encoder in the trained anti-redundant hash code depth extraction model to extract a hash code from the image.

The method shown in Figure 1 specifically includes the following:

(1) Construct an anti-redundant hash code depth extraction model to extract high-quality image hash codes.

Constructing a hash code extraction model includes: constructing a variational autoencoder VAE model or a random hash to generate an SGH model, and its objective function is shown in equation (1);

among them,

D _KL is the KL divergence; X is the input data, and Z is the implicit representation data. In the field of image search, the implicit representation data is the image hash code. Θ is obtained by the decoder DNN model to represent the likelihood. parameter;

In the objective function shown in equation (1),

The model includes an encoder and a decoder, wherein the encoder extracts a hash code from the input image data based on the DNN model, and adds a hash code redundancy regularization constraint to the DNN output, that is, the encoder is encoded by measuring the hidden layer The redundancy is used to regularize the hidden layer coding to ensure the quality of the hash code result; the decoder uses the hash code to generate image data based on DNN. Wherein, regularizing the hidden layer coding by measuring the redundancy of the hidden layer coding in the encoder includes: adding a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);

Is the Frobenius norm, δ> 0, δ is the regularization parameter, K is the dimension of Z.

among them

Indicates that the value of A is adjusted such that

Minimum value

Representation adjustment

with

Make

The maximum value; Z is the image hash code output by the decoder, which is a multidimensional binary representation; A is the coefficient matrix; δ> 0, δ is the regularization parameter; K is the dimension of Z;

The decoder, which consists of M-layer DNNs, converts the input hash code into an image, and adds a regularization constraint to the DNN output, that is, the last output of the decoder is regularized before training the hash code extraction model Try to ensure that the output of the hidden layer of the DNN is similar to the hash code, thereby simplifying the decoder network structure, and forcing the encoder to extract high-quality hash codes, including:

among them

Indicates that the value of A is adjusted such that

Minimum value

Representation adjustment

with

Make

Maximum value

Z is the image hash code output by the decoder, which is a multidimensional binary representation; A is the coefficient matrix;

Is the Frobenius norm; H ^(M) is the M layer output of the decoder network; M is the number of layers of the encoder and decoder network; δ> 0, η> 0, regularization parameters; K is The dimension of Z

In order to reduce the hash code redundancy, the dimensions of the hash code can be made as linear and uncorrelated as possible, so the objective function is introduced

Items, the overall optimization goal is

If any dimension of Z can be expressed linearly by other dimensions, then a coefficient matrix A must be found, such that

Is 0. If a coefficient matrix A can be found so that the term is not 0, but it is small, it indicates that some dimensions of Z have a linear correlation, that is, a high degree of redundancy. Therefore, given the matrix A, we want to adjust Z such that

Maximum, even if the dimensions of Z are as linear and uncorrelated as possible. And because Z is determined by the parameter

The encoder network transform is obtained, so we turn to optimize

When the decoder structure is complicated, the redundancy of the extracted hash code will also be increased, so the decoder structure needs to be simplified as much as possible. Introduced in the objective function

The optimization goal of this sub-item is

which is

It should be as small as possible, which makes the output D ^(M) of the decoder DNN layer H ^(M) as close as possible to the input hash code Z of the decoder, so as to make the decoder network as simple as possible, and finally extract the encoder part from the image more accurately and effectively Hash code.

(2) Training against the depth extraction model of redundant hash codes, including: using training data to alternately train θ, φ, and A in the objective function; specifically:

1. Obtain raw training data and prepare training data

2. Randomly scramble the original training data and divide it into multiple parts. Take each piece of training data in turn. Use each piece of training data to train θ, φ, and A alternately. S copies, each with N samples, set s = 0;

3.Take the sth training sample data

4.Theta optimization

Assuming φ and A are known parameters, the θ optimization objective function is transformed into the form shown in the following (formula a), with (formula a) as the objective function:

Transform it using:

Among them, ξ _j and ε _j to u (0,1)

So you can get

Then for the above formula, use Monte Carlo method to estimate the expected gradient, and update the value of θ based on the gradient.

5, φ optimization

Assuming θ and A are known parameters, the φ optimization objective function is transformed into the form shown in the following (Equation b), and (Equation b) is the objective function:

It is not difficult to see from Equation 5 that the first term in the equation can be infinite, which will affect the optimization effect. To avoid this problem, transform it as follows:

Where R = Z ^T -Z ^T A, ∈ is the hyperparameter threshold.

The value of φ is updated using the stochastic gradient descent method.

6.A optimization

Assuming that θ and φ are known parameters, the optimization objective function of A is transformed into the form shown in (Expression c) below, and (Expression c) is used as the objective function, and the value of A is updated using the stochastic gradient descent method:

7.s = s + 1

8. Repeat 3-7 until all S samples are involved in training;

9. Repeat 2-8 times for T rounds.

The following describes the technical solutions with specific examples.

Example 1, MNIST data and image reconstruction

Network parameter settings: The number of encoder and decoder layers M is set to 1, δ and η are 0.01, the prior parameter ρ _j is 0.5, the threshold parameter ∈ is set to 0.05, and the dimensions of the encoder input data and decoder output data are 28 * 28 = 784, the data dimension of each hidden layer of the hash code and encoder and decoder is 64;

The training data set is the training set of MIST; the number of data samples used in each training step is set to 32; the image reconstruction error of the model is evaluated at different training rounds. The evaluation uses the MIST test set and is extracted by the encoder. The Greek code and decoder generate reconstructed data, calculate the input and reconstructed data errors, and the calculation method is as follows:

Where N is the number of evaluation samples, D is the dimension of each sample data, x is the input data, and y is the reconstructed data.

FIG. 2 is a comparison chart of the results of SGH and R-SGH image reconstruction errors. The curve with a larger decrease represents R.SGH. From FIG. 2, it can be seen that the R-SGH proposed by the present invention has a better reconstruction ability for images .

Example 2, CIFAR-10 image retrieval

Network parameter settings: The number of encoder and decoder layers M is set to 4, δ is 0.01, η is 0.01, the prior parameter ρ _j is 0.5, the threshold parameter ∈ is set to 0.05, and the dimensions of the encoder input data and decoder output data are 512, Hash code and encoder and decoder hidden layer data dimensions are 32, 64 and 128;

100 samples of each of the 10 types of data in the CIFAR-10 dataset were randomly selected. A total of 1,000 samples were used as search input during the test. The remaining data were training samples and image libraries. The number of data samples for each training step is set to 32, and the number of training rounds is 200.

The mAP index is used to evaluate the image retrieval capability. The mAP results of the three models with hash code dimensions of 32, 64, and 128 are shown in Table 1. Table 1 shows the results of the mAP (%) test of SGH and R-SGH on the CIFAR-10 dataset :

Table 1 mAP (%) test results of SGH and R-SGH in the CIFAR-10 dataset

方法method	哈希码32bitHash code 32bit	哈希码64bitHash code 64bit	哈希码128bit128bit hash code
SGHSGH	23.8623.86	30.5630.56	35.6135.61
R-SGHR-SGH	24.6624.66	33.6233.62	44.1244.12

It can be seen from Table 1 that the R-SGH proposed by the present invention has better image retrieval capabilities.

Example 3, Caltech-256 image retrieval

1000 samples of data were randomly selected from the Caltech-256 data set for retrieval input during testing, and the remaining data were training samples and image libraries. The number of data samples for each training step is set to 32, and the number of training rounds is 200.

The mAP index is used to evaluate the image retrieval ability. The mAP results of the three models with the hash code dimensions of 32, 64, and 128 are shown in Table 2. Table 2 shows the mAP (%) test results of SGH and R-SGH on the Caltech-256 dataset :

Table 2 mAP (%) test results of SGH and R-SGH on Caltech-256 dataset

方法method	哈希码32bitHash code 32bit	哈希码64bitHash code 64bit	哈希码128bit128bit hash code
SGHSGH	47.1247.12	71.0971.09	78.6178.61
R-SGHR-SGH	59.0259.02	74.1874.18	84.9684.96

It can be seen from Table 2 that the R-SGH proposed by the present invention has better image retrieval capabilities.

An embodiment of the present invention provides another method for extracting a hash code from an image. The method includes:

Among them, constructing a hash code extraction model includes: constructing a variational autoencoder VAE model or a random hash to generate an SGH model, and its objective function is shown in formula (1);

among them,

Regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the network structure of the decoder. Code depth extraction models include:

Add a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (4);

Where Z is the image hash code output by the decoder,

The optimization goal of this sub-item is

which is

FIG. 3 shows a flowchart of an image retrieval method according to an embodiment of the present invention. The method includes:

Step S31: Calculate a hash code of each image in the image library by using the above method, and calculate a hash code of the retrieved image;

Step S32: Calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.

FIG. 4 shows a schematic diagram of an apparatus for extracting a hash code from an image according to an embodiment of the present invention. The apparatus 40 includes:

The model construction unit 401 is adapted to construct a hash code extraction model, which includes an encoder and a decoder; wherein the encoder is composed of a multilayer deep neural network DNN, extracts a hash code from image data, and outputs the hash code to a decoder for decoding. The encoder is composed of a multi-layer DNN, which converts the input hash code into an image; and is suitable for regularizing the hidden layer encoding by measuring the redundancy of the hidden layer encoding in the encoder, thereby reducing the redundancy of the coding space information, Get anti-redundant hash code depth extraction model;

A model training unit 402, adapted to train against a redundant hash code depth extraction model to determine parameters in the model;

The hash code extraction unit 403 is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.

In an embodiment of the present invention, the model construction unit 402 is adapted to construct a variational autoencoder VAE model or a random hash to generate an SGH model, and its objective function is shown in formula (1);

among them,

Is the Frobenius norm, δ> 0, δ is the regularization parameter, and K is the dimension of Z.

In an embodiment of the present invention, the model construction unit 401 is further adapted to regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the network structure of the decoder. The backward-force encoder extracts high-quality hash codes.

In an embodiment of the present invention, the model construction unit 401 is specifically adapted to add a constraint term to the objective function shown in (2) to obtain the objective function shown in formula (3);

In one embodiment of the present invention, the model training unit 402 is adapted to use training data to alternately train θ, φ, and A in the objective function.

The model training unit 402 is specifically adapted to,

Obtain the original training data;

Repeat the above steps a preset number of times.

An embodiment of the present invention shows another schematic diagram of an apparatus for extracting a hash code from an image. The apparatus includes:

In an embodiment of the present invention, the model construction unit is adapted to construct a variational autoencoder VAE model or a random hash to generate an SGH model, and an objective function thereof is shown in formula (1);

among them,

Where Z is the image hash code output by the decoder,

FIG. 5 shows a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention. The apparatus 50 includes:

The device 501 for extracting a hash code from an image as described above is suitable for calculating a hash code of each image in an image library and calculating a hash code of a retrieved image;

The hash code similarity calculation unit 502 is adapted to calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.

The technical solution of the present invention constructs a hash code extraction model, which includes an encoder and a decoder; wherein the encoder is composed of a multi-layer deep neural network DNN, extracts a hash code from image data, and outputs the hash code to the decoder and decoder. Consists of multi-layer DNN, which converts the input hash code into an image; in the encoder, the hidden layer coding is regularized by measuring the redundancy of the hidden layer coding, thereby reducing the redundancy of the coding space information and obtaining an anti-redundant hash Code depth extraction model; training against redundant hash code depth extraction model to determine the parameters in the model; using the trained encoder in the anti-redundant hash code depth extraction model to extract the hash code from the image. In the encoder, the hidden layer coding is regularized by measuring the redundancy of the hidden layer coding, thereby effectively reducing the coding space information redundancy, effectively using all dimensions, and extracting a hash code that can accurately represent the image; decoding the model based on the hash code The output of the last layer of the decoder is regularized, simplifying the complexity of the decoder, and extracting the encoder to extract a more accurate and effective hash code. The two models are used to optimize the entire model, thereby effectively improving the problem of hash code information redundancy. , Extract image hash codes with high accuracy, and effectively improve the accuracy of related application areas such as image retrieval.

It should be noted:

The algorithms and displays provided here are not inherently related to any particular computer, virtual appliance, or other device. Various general-purpose devices can also be used with teaching based on this. The structure required to construct such a device is obvious from the above description. Furthermore, the invention is not directed to any particular programming language. It should be understood that various programming languages can be used to implement the content of the present invention described herein, and the above description of specific languages is to disclose the best embodiment of the present invention.

In the description provided here, numerous specific details are explained. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of the specification.

Similarly, it should be understood that, in order to streamline the present disclosure and help understand one or more of the various aspects of the invention, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, Figure, or description of it. However, this disclosed method should not be construed to reflect the intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single embodiment previously disclosed. Thus, the claims that follow a specific embodiment are hereby explicitly incorporated into this specific embodiment, where each claim itself serves as a separate embodiment of the invention.

Those skilled in the art can understand that the modules in the device in the embodiment can be adaptively changed and set in one or more devices different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Except for such features and / or processes or units, which are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and any methods so disclosed may be employed in any combination or All processes or units of the equipment are combined. Each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

In addition, those skilled in the art can understand that although some embodiments described herein include some features included in other embodiments but not other features, the combination of features of different embodiments is meant to be within the scope of the present invention Within and form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.

The various component embodiments of the present invention may be implemented by hardware, or by software modules running on one or more processors, or by a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some of the photographic input device, electronic device, and computer-readable storage medium for text content according to the embodiments of the present invention. Or some or all functions of all components. The invention may also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing part or all of the method described herein. Such a program that implements the present invention may be stored on a computer-readable medium or may have the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

For example, FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention. The electronic device 600 includes a processor 610 and a memory 620 storing a computer program executable on the processor 610. The processor 610 is configured to execute each step of the method in the present invention when the computer program in the memory 620 is executed. The memory 620 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM. The memory 620 has a storage space 630 that stores a computer program 631 for performing any of the method steps in the above method. The computer program 631 may be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such a computer program product is typically a computer-readable storage medium such as that described in FIG. 7.

FIG. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention. The computer-readable storage medium 700 stores a computer program 631 for performing the method steps according to the present invention, which can be read by the processor 610 of the electronic device 600. When the computer program 631 is run by the electronic device 600, the electronic device 600 is caused Each step in the method described above is performed. Specifically, the calculation program 631 stored in the computer-readable storage medium may execute the method shown in any one of the foregoing embodiments. The computer program 631 can be compressed in a suitable form.

It should be noted that the above-mentioned embodiments illustrate the invention rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claim listing several devices, several of these devices may be embodied by the same hardware item. The use of the words first, second, and third does not imply any order. These words can be interpreted as names.

Claims

A method for extracting a hash code from an image, wherein the method includes:

Construct a hash code extraction model, which includes an encoder and a decoder; where the encoder is composed of a multilayer deep neural network DNN, which extracts the hash code from the image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN, Convert the input hash code into an image;

Regularizing the hidden layer coding by measuring the redundancy of the hidden layer coding in the encoder, thereby reducing the redundancy of the coding space information, and obtaining a redundant extraction model of the anti-redundant hash code;

Training against redundant hash code depth extraction model to determine the parameters in the model;

Use the encoder in the trained anti-redundant hash code depth extraction model to extract the hash code from the image.
The method of claim 1, wherein:

The constructing a hash code extraction model includes: constructing a variational autoencoder VAE model or a random hash to generate an SGH model, and an objective function thereof is shown in formula (1);

among them,
D KL is the KL divergence; X is the input data, Z is the image hash code output by the decoder, and θ is a parameter used to represent the likelihood obtained by the decoder DNN model;

Regularizing hidden layer coding by measuring the redundancy of hidden layer coding in the encoder includes: adding a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);

Where A is the coefficient matrix and Z is the image hash code output by the decoder,
Is the Frobenius norm, δ> 0, δ is the regularization parameter, and K is the dimension of Z.
The method of claim 2, wherein before training the hash code extraction model, the method further comprises:

Regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the network structure of the decoder and forcing the encoder to extract high-quality hash codes.
The method according to claim 3, wherein regularizing the output of the last layer of the decoder to ensure as far as possible that the output of the hidden layer of the DNN is similar to the hash code specifically includes:

Add another constraint term to the objective function shown in (2) to obtain the objective function shown in formula (3);

Among them, H (M) is the output of the M layer of the decoder network; M is the number of layers of the encoder and decoder network, η> 0, η is the regularization parameter.
The method according to any one of claims 1-4, wherein training the anti-redundant hash code depth extraction model comprises:

Using training data, θ, φ, and A are alternately trained in the objective function.
The method according to claim 5, wherein the alternate training of θ, φ, and A in the objective function using the training data comprises:

Obtain the original training data;

After the original training data is randomly shuffled, it is divided into multiple parts, each training data is taken in turn, and θ, φ, and A are trained alternately with each training data;

Repeat the above steps a preset number of times.
A method for extracting a hash code from an image, wherein the method includes:

Construct a hash code extraction model, which includes an encoder and a decoder; where the encoder is composed of a multilayer deep neural network DNN, which extracts the hash code from the image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN, Convert the input hash code into an image;

Regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the decoder network structure, and forcing the encoder to extract high-quality hash codes to obtain anti-redundancy. Hash code depth extraction model;

Training against redundant hash code depth extraction model to determine the parameters in the model;

Use the encoder in the trained anti-redundant hash code depth extraction model to extract the hash code from the image.
The method according to claim 7, wherein:

The constructing a hash code extraction model includes: constructing a variational autoencoder VAE model or a random hash to generate an SGH model, and an objective function thereof is shown in formula (1);

among them,
D KL is the KL divergence; X is the input data, Z is the image hash code output by the decoder, and θ is a parameter used to represent the likelihood obtained by the decoder DNN model;

Regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the decoder network structure, and forcing the encoder to extract high-quality hash codes to obtain anti-redundancy. Hash code depth extraction models include:

Add a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (4);

Where Z is the image hash code output by the decoder,
Is the Frobenius norm, η> 0, η is the regularization parameter, K is the dimension of Z, H (M) is the M layer output of the decoder network, and M is the number of layers of the encoder and decoder networks .
An image retrieval method, wherein the method includes:

Using the method according to any one of claims 1 to 8, calculating a hash code of each image in the image library and calculating a hash code of the retrieved image;

Calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.
A device for extracting a hash code from an image, wherein the device includes:

A model building unit is suitable for constructing a hash code extraction model. The model includes an encoder and a decoder. The encoder is composed of a multilayer deep neural network DNN, and extracts a hash code from image data and outputs the hash code to the decoder and decoder. It is composed of a multi-layer DNN, which converts the input hash code into an image; and is suitable for regularizing the hidden layer coding by measuring the redundancy of the hidden layer coding in the encoder, thereby reducing the redundancy of the coding space information and obtaining Anti-redundant hash code depth extraction model;

Model training unit, suitable for training against redundant hash code depth extraction model to determine parameters in the model;

The hash code extraction unit is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.
The apparatus according to claim 10, wherein the model construction unit is adapted to construct a variational autoencoder VAE model or a random hash to generate an SGH model, and an objective function thereof is shown in formula (1);

among them,
D KL is the KL divergence; X is the input data, Z is the image hash code output by the decoder, and θ is a parameter used to represent the likelihood obtained by the decoder DNN model;

And, the model building unit is adapted to add a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);

Where A is the coefficient matrix and Z is the image hash code output by the decoder,
Is the Frobenius norm, δ> 0, δ is the regularization parameter, and K is the dimension of Z.
The apparatus according to claim 11, wherein the model construction unit is further adapted to regularize the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying decoding Network structure, the backward-forced encoder extracts high-quality hash codes.
The apparatus according to claim 12, wherein the model construction unit is specifically adapted to further add a constraint term to the objective function shown in (2) to obtain the objective function shown in formula (3);

Among them, H (M) is the output of the M layer of the decoder network, M is the number of layers of the encoder and decoder network, η> 0, and η is a regularization parameter.
The device according to any one of claims 10-13, wherein the model training unit is adapted to use the training data to alternately train θ, φ, and A in the objective function.
The apparatus according to claim 14, wherein the model training unit is specifically adapted to,

Obtain the original training data;

After the original training data is randomly shuffled, it is divided into multiple parts, each training data is taken in turn, and θ, φ, and A are trained alternately with each training data;

Repeat the above steps a preset number of times.
A device for extracting a hash code from an image, wherein the device includes:

A model building unit is suitable for constructing a hash code extraction model. The model includes an encoder and a decoder. The encoder is composed of a multilayer deep neural network DNN, and extracts a hash code from image data and outputs the hash code to the decoder and decoder. Consists of multiple layers of DNN to convert the input hash code into an image; and is suitable for regularizing the output of the last layer of the decoder to ensure that the output of the hidden layer of the DNN is close to the hash code, thereby simplifying the decoder Network structure. The backward encoder extracts high-quality hash codes and obtains an anti-redundant hash code depth extraction model.

Model training unit, suitable for training against redundant hash code depth extraction model to determine parameters in the model;

The hash code extraction unit is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.
The device according to claim 16, wherein the model construction unit is adapted to construct a variational autoencoder VAE model or a random hash to generate an SGH model, and an objective function thereof is shown in formula (1);

among them,
D KL is the KL divergence; X is the input data, Z is the image hash code output by the decoder, and θ is a parameter used to represent the likelihood obtained by the decoder DNN model;

And, the model building unit is adapted to add a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);

Where Z is the image hash code output by the decoder,
Is the Frobenius norm, η> 0, η is the regularization parameter, K is the dimension of Z, H (M) is the M layer output of the decoder network, and M is the number of layers of the encoder and decoder networks .
An image retrieval device, wherein the device includes:

The apparatus for extracting a hash code from an image according to any one of claims 10-17, adapted to calculate a hash code of each image in an image library and calculate a hash code of a retrieved image;

The hash code similarity calculation unit is adapted to calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.
An electronic device, characterized in that the electronic device includes a processor and a memory storing a computer program executable on the processor;

The processor is configured to execute the method according to any one of claims 1 to 9 when executing a computer program in the memory.
A computer-readable storage medium having stored thereon a computer program, characterized in that when the computer program is executed by a processor, the method according to any one of claims 1-9 is implemented.