WO2020010691A1 - Method and apparatus for extracting hash code from image, and image retrieval method and apparatus - Google Patents

Method and apparatus for extracting hash code from image, and image retrieval method and apparatus Download PDF

Info

Publication number
WO2020010691A1
WO2020010691A1 (PCT/CN2018/105534)
Authority
WO
WIPO (PCT)
Prior art keywords
hash code
image
decoder
model
encoder
Prior art date
Application number
PCT/CN2018/105534
Other languages
French (fr)
Chinese (zh)
Inventor
王浩
杜长营
庞旭林
张晨
杨康
Original Assignee
北京奇虎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201810765292.6A external-priority patent/CN109325140B/en
Priority claimed from CN201810766031.6A external-priority patent/CN109145132B/en
Application filed by 北京奇虎科技有限公司 filed Critical 北京奇虎科技有限公司
Publication of WO2020010691A1 publication Critical patent/WO2020010691A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding

Definitions

  • the present invention relates to the field of artificial intelligence technology, and in particular, to a method and an apparatus for extracting a hash code from an image, and an image retrieval method and apparatus, as well as an electronic device and a computer-readable storage medium.
  • LTH (learning to hash) is an image compression method that is very effective in image retrieval applications. The framework extracts binary hash codes from images, calculates the similarity between the hash code of the input image and the hash codes of the images in the image library, and performs retrieval on that basis. The LTH framework can greatly reduce storage space and improve retrieval efficiency.
  • the extraction of the hash code of an image in LTH is very critical and is generally implemented by an encoder.
  • An autoencoder is an unsupervised neural network method consisting of an encoder and a decoder that can generate images based on random encoding.
  • A VAE (Variational Autoencoder) imposes a standard normal distribution constraint on the random encoding and generates images from it.
  • SGH (Stochastic Generative Hashing), the most widely used hash code extraction method in the LTH framework, is an application built on the VAE framework.
  • Variational pruning in the VAE framework causes some hidden-layer units to collapse early in training, before they have been effectively learned, which gives the framework obvious inherent deficiencies, for example: (1) the coding space contains many redundant dimensions (i.e., redundant data carrying no information); and (2) the framework makes insufficient use of the latent codes in the coding space. These deficiencies are even more pronounced when the decoder structure is complex, and they lead to problems such as the inability to accurately extract image hash codes, which reduces the accuracy of image retrieval and other related applications.
  • In view of the above problems, the present invention is provided in order to offer a method and apparatus for extracting a hash code from an image, and an image retrieval method and apparatus, that overcome the above problems or at least partially solve them.
  • a method for extracting a hash code from an image includes:
  • constructing a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts the hash code from the image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image;
  • regularizing the hidden-layer coding in the encoder by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model;
  • constructing the hash code extraction model includes: constructing a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
  • where D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • regularizing the hidden-layer coding in the encoder by measuring its redundancy includes: adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
  • where A is a coefficient matrix, Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, δ > 0 is a regularization parameter, and K is the dimension of Z;
  • before training the hash code extraction model, the method further includes: regularizing the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes, by adding a further constraint term to the objective function shown in formula (2) to obtain the objective function shown in formula (3);
  • where H(M) is the output of the M-th layer of the decoder network, M is the number of layers of the encoder and decoder networks, and η > 0 is a regularization parameter;
  • training the anti-redundant hash code depth extraction model includes: using training data to alternately train θ, φ, and A in the objective function;
  • using training data to alternately train θ, φ, and A in the objective function includes: obtaining the original training data; randomly shuffling it and dividing it evenly into multiple parts; taking each part in turn and using it to train θ, φ, and A alternately once; and repeating these steps a preset number of times (a minimal training-loop sketch under stated assumptions follows this clause).
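  • As an illustration only, the following is a minimal sketch, in PyTorch-style Python, of how the regularized objective and the alternating training of θ (decoder), φ (encoder), and A might be organized. Formulas (1)-(3) are published only as images, so the specific loss terms (an SGH/VAE-style reconstruction and KL term, a self-expressiveness redundancy term of the form ||Z - ZA||_F^2, and a decoder hidden-layer closeness term of the form ||H(M) - Z||_F^2), the network shapes, and all names below are assumptions reconstructed from the surrounding text rather than the patent's own implementation; the patent also transforms the objective to avoid an unbounded term, which is not reproduced here.

```python
# Hypothetical sketch of the anti-redundant model and its alternating training
# (theta = decoder parameters, phi = encoder parameters, A = coefficient matrix).
import torch
import torch.nn as nn
import torch.nn.functional as F

D_IN, K, HID = 512, 64, 256        # input dim, hash-code dim, hidden width (illustrative)
DELTA, ETA = 0.01, 0.01            # regularization parameters delta and eta (values quoted in the examples)

class Encoder(nn.Module):          # phi: multilayer DNN, image vector -> relaxed K-dim code
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D_IN, HID), nn.ReLU(), nn.Linear(HID, K))
    def forward(self, x):
        return torch.sigmoid(self.net(x))          # in (0, 1); thresholded to bits at inference

class Decoder(nn.Module):          # theta: multilayer DNN, code -> reconstructed image vector
    def __init__(self):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(K, HID), nn.ReLU(), nn.Linear(HID, K))  # H(M), kept K-dim (assumption)
        self.out = nn.Linear(K, D_IN)
    def forward(self, z):
        h_m = self.hidden(z)
        return self.out(h_m), h_m

enc, dec = Encoder(), Decoder()
A = torch.zeros(K, K, requires_grad=True)          # coefficient matrix A (unconstrained here; the
                                                   # patent's formulas may impose e.g. a zero diagonal)
opt_model = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_A = torch.optim.Adam([A], lr=1e-3)

def losses(x):
    z = enc(x)                                     # hidden-layer coding (relaxed hash code), shape (B, K)
    x_rec, h_m = dec(z)
    recon = F.mse_loss(x_rec, x)                   # stands in for -E[log p_theta(X | Z)]
    p = z.clamp(1e-6, 1 - 1e-6)
    kl = (p * torch.log(p / 0.5) + (1 - p) * torch.log((1 - p) / 0.5)).sum(1).mean()  # KL to Bernoulli(0.5) prior
    redundancy = (z - z @ A).pow(2).mean()         # small when Z's dimensions are linearly predictable from each other
    closeness = (h_m - z).pow(2).mean()            # keeps the decoder hidden output close to the hash code
    model_loss = recon + kl - DELTA * redundancy + ETA * closeness   # theta and phi minimize this
    return model_loss, redundancy                  # A separately minimizes the redundancy residual

def train(data, epochs=3, batch=32):
    for _ in range(epochs):                        # "repeat the above steps a preset number of times"
        perm = torch.randperm(data.size(0))        # shuffle, then take the parts (mini-batches) in turn
        for i in range(0, data.size(0), batch):
            x = data[perm[i:i + batch]]
            model_loss, _ = losses(x)              # step 1: update theta and phi with A held fixed
            opt_model.zero_grad(); model_loss.backward(); opt_model.step()
            _, a_loss = losses(x)                  # step 2: update A with theta and phi held fixed
            opt_A.zero_grad(); a_loss.backward(); opt_A.step()

train(torch.randn(256, D_IN))                      # toy data, only to show the call pattern
```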
  • a method for extracting a hash code from an image includes:
  • constructing a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts the hash code from the image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image;
  • regularizing the output of the last layer of the decoder so that the DNN hidden-layer output stays close to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining an anti-redundant hash code depth extraction model;
  • constructing the hash code extraction model includes: constructing a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
  • where D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • obtaining the anti-redundant hash code depth extraction model includes: adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
  • where Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
  • an image retrieval method includes: using the above method to calculate a hash code for each image in an image library and a hash code for the retrieved image; and calculating the similarity between the hash code of the retrieved image and the hash code of each image in the library, and outputting the one or more images with the highest similarity.
  • a device for extracting a hash code from an image includes:
  • a model building unit is adapted to construct a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; the unit is further adapted to regularize the hidden-layer coding in the encoder by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model;
  • a model training unit is adapted to train the anti-redundant hash code depth extraction model to determine the parameters in the model;
  • the hash code extraction unit is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.
  • the model construction unit is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
  • where D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • the model building unit is adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
  • where A is a coefficient matrix, Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, δ > 0 is a regularization parameter, and K is the dimension of Z;
  • the model construction unit is further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays close to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes;
  • the model building unit is specifically adapted to add a further constraint term to the objective function shown in formula (2) to obtain the objective function shown in formula (3);
  • where H(M) is the output of the M-th layer of the decoder network, M is the number of layers of the encoder and decoder networks, and η > 0 is a regularization parameter;
  • the model training unit is adapted to use training data to alternately train θ, φ, and A in the objective function;
  • the model training unit is specifically adapted to: obtain the original training data; randomly shuffle it and divide it evenly into multiple parts; take each part in turn and use it to train θ, φ, and A alternately once; and repeat these steps a preset number of times.
  • a device for extracting a hash code from an image includes:
  • a model building unit is adapted to construct a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; the unit is further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays close to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining an anti-redundant hash code depth extraction model;
  • a model training unit is adapted to train the anti-redundant hash code depth extraction model to determine the parameters in the model;
  • the hash code extraction unit is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.
  • the model construction unit is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
  • where D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • the model building unit is adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
  • where Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
  • an image retrieval device includes:
  • the apparatus for extracting a hash code from an image is suitable for calculating a hash code of each image in an image library and calculating a hash code of a retrieved image;
  • the hash code similarity calculation unit is adapted to calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.
  • an electronic device is provided; the electronic device includes: a processor, and a memory storing a computer program executable on the processor;
  • the processor is configured to execute the method according to any one of the above when executing a computer program in the memory.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method according to any one of the foregoing is implemented.
  • the technical solution of the present invention constructs a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; the hidden-layer coding is regularized by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model; the anti-redundant hash code depth extraction model is trained to determine the parameters in the model; and the encoder in the trained anti-redundant hash code depth extraction model is used to extract the hash code from the image.
  • FIG. 1 shows a flowchart of a method for extracting a hash code from an image according to an embodiment of the present invention
  • Figure 2 is a comparison chart of the results of SGH and R-SGH image reconstruction errors
  • FIG. 3 shows a flowchart of an image retrieval method according to an embodiment of the present invention
  • FIG. 4 shows a schematic diagram of an apparatus for extracting a hash code from an image according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
  • Image retrieval refers to retrieving similar images from a huge image library based on the input image.
  • LTH Learning To Hash is an image compression method that extracts a binary hash code from an image, calculates the similarity between the input image and the image hash code in the image library, and performs retrieval. In image retrieval applications, the LTH method can greatly reduce storage space and improve retrieval efficiency.
  • VAE Variational Autoencoder, a variational autoencoder.
  • An autoencoder is an unsupervised neural network method consisting of an encoder and a decoder that can generate images based on random encoding. Variational autoencoders impose standard normal distribution constraints on random encoding to generate images.
  • the implicit-representation generative model can be expressed by a formula that is rendered as an image in the original, in which θ is a parameter, obtained by the decoder DNN model, that represents the likelihood, X is the input data, and Z is the implicit (latent) data;
  • the inference model (that is, the DNN encoder) is likewise expressed by a formula that is rendered as an image in the original;
  • the VAE objective function (also rendered as an image) involves the KL divergence D_KL; the optimization goal is to adjust θ and φ so as to maximize the ELBO (a generic form of this bound is given after this list of terms).
  • Redundancy means that there are many dimensions in the data that do not carry information (such as all 0), or that the data of different dimensions are linearly related.
  • SGH (Stochastic Generative Hashing) is a stochastic hash-code generation method built on the VAE framework; it uses a linear Gaussian likelihood and places a Bernoulli prior on the implicit representation Z (the corresponding expression is rendered as an image in the original).
  • the objective function of SGH is the same as VAE.
  • R-SGH The anti-redundant random hash generation method proposed by the present invention can ensure that there is no redundancy in the hash code extracted from the image.
  • DNN Deep Neural Network, deep neural network.
  • KL divergence Kullback-Leibler Divergence, used to characterize the closeness of two probability distributions.
  • Frobenius norm: the square root of the sum of the squares of the absolute values of a matrix's elements (i.e., the sum of squared elements raised to the power 1/2).
  • Stochastic gradient descent method: here, the stochastic parallel gradient descent (SPGD) algorithm, a model-free optimization algorithm suited to optimal control problems with many control variables and complex controlled systems for which accurate mathematical models cannot be established.
  • MNIST: a public handwritten-digit data set from the National Institute of Standards and Technology (NIST), with 60,000 training samples and 10,000 test samples.
  • CIFAR-10: a public data set of 60,000 color images in 10 categories (such as airplanes, cars, cats, and birds), with 6,000 images per category.
  • Caltech-256: a public data set of 29,780 images in 256 categories.
  • mAP (mean average precision): an evaluation metric commonly used in retrieval.
  • Monte Carlo method: also known as the statistical simulation method, a class of numerical methods, guided by probability theory and statistics and made practical in the mid-1940s by the development of science and technology and the invention of electronic computers, that uses random numbers (more commonly pseudo-random numbers) to solve a wide range of computational problems.
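  • For reference, the evidence lower bound referred to above can be written in its generic textbook form as follows; the patent's formula (1) is published only as an image, so this is offered as the standard form rather than a verbatim reproduction:

```latex
% Generic VAE/SGH evidence lower bound (the patent's formula (1) is an image and may differ in notation)
\mathcal{L}(\theta,\phi;X) =
  \mathbb{E}_{q_{\phi}(Z\mid X)}\left[\log p_{\theta}(X\mid Z)\right]
  - D_{\mathrm{KL}}\!\left(q_{\phi}(Z\mid X)\,\middle\|\,p(Z)\right),
\qquad \max_{\theta,\phi}\,\mathcal{L}(\theta,\phi;X)
```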
  • FIG. 1 shows a flowchart of a method for extracting a hash code from an image according to an embodiment of the present invention. The method includes:
  • Step S11: construct a hash code extraction model that includes an encoder and a decoder; the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from the image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image.
  • The encoder is composed of a multilayer deep neural network (DNN), extracts a hash code from the image data, and outputs it to the decoder.
  • The decoder is composed of a multilayer DNN and converts the input hash code into an image.
  • The hidden-layer coding in the encoder is regularized by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining the anti-redundant hash code depth extraction model.
  • Step S12: train the anti-redundant hash code depth extraction model to determine the parameters in the model.
  • Step S13: use the encoder in the trained anti-redundant hash code depth extraction model to extract a hash code from the image.
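  • A minimal sketch, reusing the hypothetical Encoder from the sketch above, of how step S13 could be carried out: the trained encoder maps an image vector to a relaxed code that is then binarized. The 0.5 threshold and the function name are illustrative assumptions, not taken from the patent.

```python
# Hypothetical continuation of the earlier sketch: step S13, extracting a binary
# hash code with the trained encoder. The 0.5 binarization threshold is an assumption.
import torch

@torch.no_grad()
def extract_hash_code(encoder, image_vec):
    """image_vec: 1-D float tensor of length D_IN; returns a K-element {0,1} tensor."""
    relaxed = encoder(image_vec.unsqueeze(0)).squeeze(0)   # relaxed code in (0, 1)
    return (relaxed > 0.5).to(torch.uint8)                 # binary hash code

# code = extract_hash_code(enc, torch.randn(512))          # using enc from the earlier sketch
```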
  • the technical solution of the present invention constructs a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; the hidden-layer coding is regularized by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model; the anti-redundant hash code depth extraction model is trained to determine the parameters in the model; and the encoder in the trained anti-redundant hash code depth extraction model is used to extract the hash code from the image.
  • the method shown in Figure 1 specifically includes the following:
  • constructing the hash code extraction model includes: constructing a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in equation (1);
  • where D_KL is the KL divergence, X is the input data, Z is the implicit representation data (i.e., the image hash code), and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • the model includes an encoder and a decoder; the encoder extracts a hash code from the input image data based on a DNN model and adds a hash-code redundancy regularization constraint to the DNN output (that is, the hidden-layer coding in the encoder is regularized by measuring its redundancy) to ensure the quality of the resulting hash code; the decoder uses the hash code to generate image data based on a DNN.
  • regularizing the hidden layer coding by measuring the redundancy of the hidden layer coding in the encoder includes: adding a constraint term to the objective function shown in (1) to obtain the objective function shown in formula (2);
  • where A is a coefficient matrix, Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, δ > 0 is a regularization parameter, and K is the dimension of Z;
  • Z, the image hash code output by the decoder, is a multidimensional binary representation; A is the coefficient matrix; δ is the regularization parameter; and K is the dimension of Z;
  • the decoder, which consists of an M-layer DNN, converts the input hash code into an image; a regularization constraint is added to the DNN output, that is, the last-layer output of the decoder is regularized before the hash code extraction model is trained, so as to keep the DNN hidden-layer output as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes; in this case:
  • Z, the image hash code output by the decoder, is a multidimensional binary representation; A is the coefficient matrix; ‖·‖_F is the Frobenius norm; H(M) is the output of the M-th layer of the decoder network; M is the number of layers of the encoder and decoder networks; and K is the dimension of Z;
  • to make the dimensions of the hash code as linearly uncorrelated as possible, a redundancy term is introduced into the objective function, and the overall optimization becomes a max-min problem over this term: if any dimension of Z can be expressed as a linear combination of the other dimensions, a coefficient matrix A can be found that drives the term to 0; if a coefficient matrix A can be found that makes the term small but not 0, some dimensions of Z are linearly correlated, that is, highly redundant; therefore, given the matrix A, Z should be adjusted so that the term is as large as possible, which makes the dimensions of Z as linearly uncorrelated as possible; and because Z is obtained from the encoder network transform parameterized by φ, the optimization is carried out over φ instead;
  • it is not difficult to see from Equation (5) that the first term in the equation can become infinite, which would harm the optimization; to avoid this problem, the equation is transformed (the transformed expression is rendered as an image in the original).
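  • Read together with the symbol definitions above, the redundancy measure being described appears to take a self-expressiveness form such as the following; because the relevant expressions are published only as images, this is an assumed reconstruction rather than the patent's exact formula:

```latex
% Assumed form of the redundancy measure over the hash code Z (K dimensions)
R(Z) = \min_{A} \frac{1}{K}\,\lVert Z - Z A \rVert_F^{2},
\qquad \text{maximizing } R(Z_{\phi}) \text{ over } \phi
\text{ pushes the dimensions of } Z \text{ toward linear independence}
```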
  • the parameter values are then updated using the stochastic gradient descent method.
  • The training data set is the MNIST training set; the number of data samples used in each training step is set to 32; and the image reconstruction error of the model is evaluated at different training rounds.
  • The evaluation uses the MNIST test set: the encoder extracts the hash code, the decoder generates the reconstructed data, and the error between the input data and the reconstructed data is calculated as follows:
  • where N is the number of evaluation samples, D is the dimension of each sample, x is the input data, and y is the reconstructed data.
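  • The error formula itself is published as an image; given the variables listed (N, D, x, y), a per-element mean squared error of the following form is one plausible reading, stated here only as an assumption:

```latex
% Assumed reconstruction-error definition (the patent's formula is an image)
\mathrm{err} = \frac{1}{N D}\sum_{i=1}^{N}\sum_{j=1}^{D}\left(x_{ij}-y_{ij}\right)^{2}
```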
  • FIG. 2 is a comparison chart of the results of SGH and R-SGH image reconstruction errors.
  • The curve with the larger decrease represents R-SGH. From FIG. 2 it can be seen that the R-SGH proposed by the present invention reconstructs images better.
  • Network parameter settings: the number of encoder and decoder layers M is set to 4, δ is 0.01, η is 0.01, the prior parameter ρj is 0.5, the threshold parameter ε is set to 0.05, the dimension of the encoder input data and of the decoder output data is 512, and the hash code and the encoder/decoder hidden-layer dimensions are 32, 64, and 128;
  • the mAP index is used to evaluate the image retrieval capability.
  • The mAP results of the two models at the three hash code dimensions (32, 64, and 128) are shown in Table 1.
  • Table 1 shows the results of the mAP (%) test of SGH and R-SGH on the CIFAR-10 dataset :
| Model | 32-bit hash code | 64-bit hash code | 128-bit hash code |
|-------|------------------|------------------|-------------------|
| SGH   | 23.86            | 30.56            | 35.61             |
| R-SGH | 24.66            | 33.62            | 44.12             |
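  • For context, below is a minimal sketch of how a mAP figure such as those in Table 1 could be computed from binary hash codes, ranking the database by Hamming distance to each query. This mirrors common learning-to-hash evaluation practice and is not the patent's own evaluation code; all names are illustrative.

```python
# Hypothetical mean-average-precision (mAP) evaluation over binary hash codes.
import numpy as np

def average_precision(relevant_sorted):
    """relevant_sorted: 1-D 0/1 array ordered by increasing Hamming distance to the query."""
    hits = np.cumsum(relevant_sorted)
    precision_at_k = hits / (np.arange(len(relevant_sorted)) + 1)
    total_relevant = relevant_sorted.sum()
    return float((precision_at_k * relevant_sorted).sum() / total_relevant) if total_relevant else 0.0

def mean_average_precision(query_codes, query_labels, db_codes, db_labels):
    aps = []
    for code, label in zip(query_codes, query_labels):
        dist = (db_codes != code).sum(axis=1)           # Hamming distance to every database code
        order = np.argsort(dist, kind="stable")
        aps.append(average_precision((db_labels[order] == label).astype(float)))
    return 100.0 * float(np.mean(aps))                  # percentage, as reported in Tables 1 and 2
```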
  • Network parameter settings: the number of encoder and decoder layers M is set to 4, δ is 0.01, η is 0.01, the prior parameter ρj is 0.5, the threshold parameter ε is set to 0.05, the dimension of the encoder input data and of the decoder output data is 512, and the hash code and the encoder/decoder hidden-layer dimensions are 32, 64, and 128;
  • 1000 samples of data were randomly selected from the Caltech-256 data set for retrieval input during testing, and the remaining data were training samples and image libraries.
  • the number of data samples for each training step is set to 32, and the number of training rounds is 200.
  • the mAP index is used to evaluate the image retrieval ability.
  • The mAP results of the two models at the three hash code dimensions (32, 64, and 128) are shown in Table 2.
  • Table 2 shows the mAP (%) test results of SGH and R-SGH on the Caltech-256 dataset :
| Model | 32-bit hash code | 64-bit hash code | 128-bit hash code |
|-------|------------------|------------------|-------------------|
| SGH   | 47.12            | 71.09            | 78.61             |
| R-SGH | 59.02            | 74.18            | 84.96             |
  • An embodiment of the present invention provides another method for extracting a hash code from an image.
  • the method includes:
  • constructing a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts the hash code from the image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image;
  • the hidden-layer coding in the encoder is regularized by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining the anti-redundant hash code depth extraction model;
  • constructing the hash code extraction model includes: constructing a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
  • where D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • obtaining the anti-redundant hash code depth extraction model includes: adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
  • where Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
  • FIG. 3 shows a flowchart of an image retrieval method according to an embodiment of the present invention. The method includes:
  • Step S31: Calculate a hash code of each image in the image library by using the above method, and calculate a hash code of the retrieved image;
  • Step S32: Calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.
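  • A minimal sketch of steps S31-S32 under the same assumptions as the earlier snippets: hash codes are computed for the library and the query, similarity is taken as the fraction of matching bits (one common choice; the patent does not fix the similarity measure in this passage), and the indices of the most similar images are returned.

```python
# Hypothetical sketch of retrieval steps S31-S32 over precomputed binary hash codes.
import numpy as np

def retrieve(query_code, library_codes, top_n=1):
    """query_code: (K,) 0/1 array; library_codes: (N, K) 0/1 array; returns top_n library indices."""
    similarity = (library_codes == query_code).mean(axis=1)   # fraction of matching bits
    return np.argsort(-similarity)[:top_n]                    # most similar images first

# top_indices = retrieve(query_code, library_codes, top_n=5)
```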
  • FIG. 4 shows a schematic diagram of an apparatus for extracting a hash code from an image according to an embodiment of the present invention.
  • the apparatus 40 includes:
  • the model construction unit 401 is adapted to construct a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; the unit is also adapted to regularize the hidden-layer coding in the encoder by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model;
  • the model training unit 402 is adapted to train the anti-redundant hash code depth extraction model to determine the parameters in the model;
  • the hash code extraction unit 403 is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.
  • the model construction unit 401 is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
  • where D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • the model building unit is adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
  • where A is a coefficient matrix, Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, δ > 0 is a regularization parameter, and K is the dimension of Z.
  • the model construction unit 401 is further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays close to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes.
  • the model construction unit 401 is specifically adapted to add a constraint term to the objective function shown in (2) to obtain the objective function shown in formula (3);
  • where H(M) is the output of the M-th layer of the decoder network, M is the number of layers of the encoder and decoder networks, and η > 0 is a regularization parameter;
  • the model training unit 402 is adapted to use training data to alternately train θ, φ, and A in the objective function;
  • the model training unit 402 is specifically adapted to: obtain the original training data; randomly shuffle it and divide it evenly into multiple parts; take each part in turn and use it to train θ, φ, and A alternately once; and repeat these steps a preset number of times.
  • An embodiment of the present invention shows another schematic diagram of an apparatus for extracting a hash code from an image.
  • the apparatus includes:
  • a model building unit is adapted to construct a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; the unit is further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays close to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining an anti-redundant hash code depth extraction model;
  • a model training unit is adapted to train the anti-redundant hash code depth extraction model to determine the parameters in the model;
  • the hash code extraction unit is adapted to extract a hash code from an image by using an encoder in a trained anti-redundant hash code depth extraction model.
  • the model construction unit is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
  • where D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
  • the model building unit is adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
  • where Z is the image hash code output by the decoder, ‖·‖_F is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
  • FIG. 5 shows a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention.
  • the apparatus 50 includes:
  • the device 501 for extracting a hash code from an image as described above is suitable for calculating a hash code of each image in an image library and calculating a hash code of a retrieved image;
  • the hash code similarity calculation unit 502 is adapted to calculate the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and output one or more images with the highest similarity.
  • the technical solution of the present invention constructs a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; the hidden-layer coding is regularized by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model; the anti-redundant hash code depth extraction model is trained to determine the parameters in the model; and the encoder in the trained anti-redundant hash code depth extraction model is used to extract the hash code from the image.
  • The hidden-layer coding is regularized by measuring its redundancy, which effectively reduces coding-space information redundancy, makes effective use of all dimensions, and extracts a hash code that can accurately represent the image; in the model that regularizes the decoder, the output of the last layer of the decoder is regularized, which simplifies the decoder and forces the encoder to extract a more accurate and effective hash code.
  • The two mechanisms are used together to optimize the entire model, effectively alleviating the problem of hash-code information redundancy, extracting image hash codes with high accuracy, and effectively improving the accuracy of related applications such as image retrieval.
  • modules in the device in the embodiment can be adaptively changed and set in one or more devices different from the embodiment.
  • The modules or units or components in the embodiments may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Unless such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or apparatus so disclosed, may be combined in any combination.
  • the various component embodiments of the present invention may be implemented by hardware, or by software modules running on one or more processors, or by a combination thereof.
  • In practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the apparatus, the electronic device, and the computer-readable storage medium according to the embodiments of the present invention.
  • the invention may also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing part or all of the method described herein.
  • Such a program that implements the present invention may be stored on a computer-readable medium or may have the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
  • the electronic device 600 includes a processor 610 and a memory 620 storing a computer program executable on the processor 610.
  • the processor 610 is configured to execute each step of the method in the present invention when the computer program in the memory 620 is executed.
  • the memory 620 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • the memory 620 has a storage space 630 that stores a computer program 631 for performing any of the method steps in the above method.
  • the computer program 631 may be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such a computer program product is typically a computer-readable storage medium such as that described in FIG. 7.
  • FIG. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
  • the computer-readable storage medium 700 stores a computer program 631 for performing the method steps according to the present invention, which can be read by the processor 610 of the electronic device 600.
  • When the computer program 631 is run by the electronic device 600, the electronic device 600 is caused to perform each step of the method described above.
  • The computer program 631 stored in the computer-readable storage medium may execute the method shown in any one of the foregoing embodiments.
  • the computer program 631 can be compressed in a suitable form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for extracting a hash code from an image, and an image retrieval method and apparatus. The method comprises: constructing a hash code extraction model comprising an encoder and a decoder (S11), wherein the encoder consists of a multi-layer deep neural network (DNN), extracts a hash code from image data and outputs the hash code to the decoder, and the decoder consists of a multi-layer DNN and converts the input hash code into an image; regularizing, in the encoder, the hidden-layer encoding by measuring the hidden-layer encoding redundancy, so as to reduce encoding-space information redundancy and obtain an anti-redundancy hash code deep extraction model; training the anti-redundancy hash code deep extraction model to determine parameters in the model (S12); and extracting, using the encoder in the trained anti-redundancy hash code deep extraction model, a hash code from the image (S13). The method can reduce encoding-space information redundancy, effectively utilize all dimensions, extract an image hash code with high precision, and improve accuracy in related application fields such as image retrieval.

Description

Method and apparatus for extracting a hash code from an image, and image retrieval method and apparatus
Technical field
The present invention relates to the field of artificial intelligence technology, and in particular to a method and apparatus for extracting a hash code from an image, an image retrieval method and apparatus, an electronic device, and a computer-readable storage medium.
Background
LTH (learning to hash) is an image compression method that is very effective in image retrieval applications. The framework extracts binary hash codes from images, calculates the similarity between the hash code of the input image and the hash codes of the images in an image library, and performs retrieval on that basis. The LTH framework can greatly reduce storage space and improve retrieval efficiency.
In LTH, the extraction of the hash code of an image is critical and is generally implemented with an encoder. An autoencoder is an unsupervised neural network method consisting of an encoder and a decoder that can generate images from random encodings. A VAE (Variational Autoencoder) imposes a standard normal distribution constraint on the random encoding and generates images from it. SGH (Stochastic Generative Hashing), the most widely used hash code extraction method in the LTH framework, is an application built on the VAE framework.
Variational pruning in the VAE framework causes some hidden-layer units to collapse early in training, before they have been effectively learned, which gives the framework obvious inherent deficiencies, for example: (1) the coding space contains many redundant dimensions (i.e., redundant data carrying no information); and (2) the framework makes insufficient use of the latent codes in the coding space. These deficiencies are even more pronounced when the decoder structure is complex, and they lead to problems such as the inability to accurately extract image hash codes, which reduces the accuracy of image retrieval and other related applications.
Summary of the invention
In view of the above problems, the present invention is provided in order to offer a method and apparatus for extracting a hash code from an image, and an image retrieval method and apparatus, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for extracting a hash code from an image is provided. The method includes:
constructing a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image;
regularizing the hidden-layer coding in the encoder by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model;
training the anti-redundant hash code depth extraction model to determine the parameters in the model;
using the encoder in the trained anti-redundant hash code depth extraction model to extract the hash code from the image.
Optionally,
constructing the hash code extraction model includes: constructing a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
[Formula (1) is rendered as an image in the original publication: PCTCN2018105534-appb-000001]
where (the expression is rendered as image PCTCN2018105534-appb-000002) D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
Regularizing the hidden-layer coding in the encoder by measuring its redundancy includes: adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
[Formula (2) is rendered as an image in the original publication: PCTCN2018105534-appb-000003]
where A is a coefficient matrix, Z is the image hash code output by the decoder, the norm shown in image PCTCN2018105534-appb-000004 is the Frobenius norm, δ > 0 is a regularization parameter, and K is the dimension of Z.
Optionally, before training the hash code extraction model, the method further includes:
regularizing the output of the last layer of the decoder so that the DNN hidden-layer output is kept as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes.
Optionally, regularizing the output of the last layer of the decoder so that the DNN hidden-layer output is kept as close as possible to the hash code specifically includes:
adding a further constraint term to the objective function shown in formula (2) to obtain the objective function shown in formula (3);
[Formula (3) is rendered as an image in the original publication: PCTCN2018105534-appb-000005]
where H(M) is the output of the M-th layer of the decoder network, M is the number of layers of the encoder and decoder networks, and η > 0 is a regularization parameter.
Optionally, training the anti-redundant hash code depth extraction model includes:
using training data to alternately train θ, φ, and A in the objective function.
Optionally, using training data to alternately train θ, φ, and A in the objective function includes:
obtaining the original training data;
randomly shuffling the original training data and dividing it evenly into multiple parts, taking each part in turn, and using each part to train θ, φ, and A alternately once;
repeating the above steps a preset number of times.
According to another aspect of the present invention, a method for extracting a hash code from an image is provided. The method includes:
constructing a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image;
regularizing the output of the last layer of the decoder so that the DNN hidden-layer output is kept as close as possible to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining an anti-redundant hash code depth extraction model;
training the anti-redundant hash code depth extraction model to determine the parameters in the model;
using the encoder in the trained anti-redundant hash code depth extraction model to extract the hash code from the image.
Optionally,
constructing the hash code extraction model includes: constructing a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
[Formula (1) is rendered as an image in the original publication: PCTCN2018105534-appb-000006]
where (the expression is rendered as image PCTCN2018105534-appb-000007) D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
Regularizing the output of the last layer of the decoder so that the DNN hidden-layer output is kept as close as possible to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining the anti-redundant hash code depth extraction model includes:
adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
[Formula (2) is rendered as an image in the original publication: PCTCN2018105534-appb-000008]
where Z is the image hash code output by the decoder, the norm shown in image PCTCN2018105534-appb-000009 is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
According to another aspect of the present invention, an image retrieval method is provided. The method includes:
using any one of the methods described above, calculating a hash code for each image in an image library and calculating a hash code for the retrieved image;
calculating the similarity between the hash code of the retrieved image and the hash code of each image in the image library, and outputting the one or more images with the highest similarity.
According to another aspect of the present invention, an apparatus for extracting a hash code from an image is provided. The apparatus includes:
a model building unit, adapted to construct a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; and adapted to regularize the hidden-layer coding in the encoder by measuring its redundancy, thereby reducing the redundancy of the coding-space information and obtaining an anti-redundant hash code depth extraction model;
a model training unit, adapted to train the anti-redundant hash code depth extraction model to determine the parameters in the model;
a hash code extraction unit, adapted to extract a hash code from an image by using the encoder in the trained anti-redundant hash code depth extraction model.
Optionally, the model construction unit is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
[Formula (1) is rendered as an image in the original publication: PCTCN2018105534-appb-000010]
where (the expression is rendered as image PCTCN2018105534-appb-000011) D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
Further, the model building unit is adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
[Formula (2) is rendered as an image in the original publication: PCTCN2018105534-appb-000012]
where A is a coefficient matrix, Z is the image hash code output by the decoder, the norm shown in image PCTCN2018105534-appb-000013 is the Frobenius norm, δ > 0 is a regularization parameter, and K is the dimension of Z.
Optionally, the model construction unit is further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output is kept as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes.
Optionally, the model construction unit is specifically adapted to add a further constraint term to the objective function shown in formula (2) to obtain the objective function shown in formula (3);
[Formula (3) is rendered as an image in the original publication: PCTCN2018105534-appb-000014]
where H(M) is the output of the M-th layer of the decoder network, M is the number of layers of the encoder and decoder networks, and η > 0 is a regularization parameter.
Optionally, the model training unit is adapted to use training data to alternately train θ, φ, and A in the objective function.
Optionally, the model training unit is specifically adapted to:
obtain the original training data;
randomly shuffle the original training data and divide it evenly into multiple parts, take each part in turn, and use each part to train θ, φ, and A alternately once;
repeat the above steps a preset number of times.
According to another aspect of the present invention, an apparatus for extracting a hash code from an image is provided. The apparatus includes:
a model building unit, adapted to construct a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multilayer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multilayer DNN and converts the input hash code into an image; and adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output is kept as close as possible to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining an anti-redundant hash code depth extraction model;
a model training unit, adapted to train the anti-redundant hash code depth extraction model to determine the parameters in the model;
a hash code extraction unit, adapted to extract a hash code from an image by using the encoder in the trained anti-redundant hash code depth extraction model.
Optionally, the model construction unit is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model, whose objective function is shown in formula (1);
[Formula (1) is rendered as an image in the original publication: PCTCN2018105534-appb-000015]
where (the expression is rendered as image PCTCN2018105534-appb-000016) D_KL is the KL divergence, X is the input data, Z is the image hash code output by the decoder, and θ is a parameter, obtained by the decoder DNN model, that represents the likelihood;
Further, the model building unit is adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2);
[Formula (2) is rendered as an image in the original publication: PCTCN2018105534-appb-000017]
where Z is the image hash code output by the decoder, the norm shown in image PCTCN2018105534-appb-000018 is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
According to yet another aspect of the present invention, an image retrieval apparatus is provided. The apparatus includes:
the apparatus for extracting a hash code from an image according to any of the foregoing, adapted to compute the hash code of each image in an image library and the hash code of a query image; and
a hash code similarity calculation unit, adapted to compute the similarity between the hash code of the query image and the hash code of each image in the image library and output the one or more images with the highest similarity.
According to yet another aspect of the present invention, an electronic device is provided. The electronic device includes a processor and a memory storing a computer program executable on the processor, where the processor is configured to perform any of the methods described above when executing the computer program in the memory.
According to yet another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, any of the methods described above is implemented.
The technical solution of the present invention builds a hash code extraction model that includes an encoder and a decoder. The encoder is composed of a multi-layer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder; the decoder is composed of a multi-layer DNN and converts the input hash code back into an image. In the encoder, the hidden-layer code is regularized by measuring its redundancy, which reduces the information redundancy of the coding space and yields an anti-redundancy hash code deep extraction model. The model is then trained to determine its parameters, and the encoder of the trained model is used to extract hash codes from images. By building an anti-redundancy hash code extraction model, the method effectively reduces the information redundancy of the coding space, makes effective use of all dimensions, extracts image hash codes with high accuracy, and effectively improves the accuracy of image retrieval and other related applications.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features, and advantages of the present invention more apparent, specific embodiments of the present invention are set forth below.
BRIEF DESCRIPTION OF THE DRAWINGS
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
FIG. 1 shows a flowchart of a method for extracting a hash code from an image according to an embodiment of the present invention;
FIG. 2 is a comparison of the image reconstruction errors of SGH and R-SGH;
FIG. 3 shows a flowchart of an image retrieval method according to an embodiment of the present invention;
FIG. 4 shows a schematic diagram of an apparatus for extracting a hash code from an image according to an embodiment of the present invention;
FIG. 5 shows a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention.
DETAILED DESCRIPTION
Explanation of technical terms used in the present invention:
1. Image retrieval: retrieving images similar to an input image from a large image library.
2. LTH: Learning To Hash, an image compression method that extracts binary hash codes from images, computes the similarity between the hash code of the input image and the hash codes of the images in the image library, and performs retrieval. In image retrieval applications, the LTH method can greatly reduce storage space and improve retrieval efficiency.
3. VAE: Variational Autoencoder. An autoencoder is an unsupervised neural network method composed of an encoder and a decoder that can generate images from random codes. A variational autoencoder imposes a standard normal distribution constraint on the random code and then generates images. The latent-representation generative model can be expressed as:

$$p_\theta(X,Z)=p_\theta(X|Z)\,p(Z)$$

where $\theta$ is the likelihood parameter obtained from the decoder DNN model, $X$ is the input data, and $Z$ is the latent representation. The inference model (i.e., the DNN encoder) is expressed as $q_\phi(Z|X)$, with encoder parameters $\phi$. The VAE objective function is:

$$\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)=\mathbb{E}_{q_\phi(Z|X)}\big[\log p_\theta(X|Z)\big]-D_{KL}\big(q_\phi(Z|X)\,\|\,p(Z)\big)$$

where $D_{KL}$ is the KL divergence, and the optimization goal is to adjust $\theta$ and $\phi$ to maximize the ELBO.
4. Redundancy: the data contain many dimensions that carry no information (for example, dimensions that are always 0), or the data in different dimensions are linearly correlated.
5. SGH: Stochastic Generative Hashing, a method for generating hash codes stochastically. It is an application based on the VAE framework that uses a linear Gaussian likelihood $p_\theta(X|Z)$ and places a Bernoulli prior on the latent representation $Z$:

$$p(Z)=\prod_{k=1}^{K}\rho_k^{z_k}\,(1-\rho_k)^{1-z_k},\qquad z_k\in\{0,1\}$$

Its inference model is expressed as:

$$q_\phi(Z|X)=\prod_{k=1}^{K}q_\phi(z_k|X),\qquad q_\phi(z_k=1|X)=\sigma\big(f_k(X)\big)$$

where $f_k(\cdot)$ is a scalar linear or deep nonlinear transformation and $\sigma$ denotes the sigmoid function. The objective function of SGH is the same as that of VAE.
6. R-SGH: the anti-redundancy stochastic generative hashing method proposed by the present invention, which ensures that the hash codes extracted from images contain no redundancy.
7. DNN: Deep Neural Network.
8. KL divergence: Kullback-Leibler divergence, used to characterize how close two probability distributions are.
9. Frobenius norm: the 1/2 power (square root) of the sum of the squared absolute values of the elements.
10. Stochastic gradient descent: the stochastic parallel gradient descent (SPGD) algorithm, a model-free optimization algorithm suited to optimal control processes with many control variables and complex controlled systems for which an accurate mathematical model cannot be established.
11. MNIST: a public handwritten-digit dataset from the National Institute of Standards and Technology (NIST), with 60,000 training samples and 10,000 test samples.
12. CIFAR-10: a public dataset containing 60,000 color images in 10 classes (airplanes, cars, cats, birds, etc.), with 6,000 images per class.
13. Caltech-256: a public dataset containing 29,780 images in 256 classes.
14. mAP: mean average precision, a retrieval-accuracy evaluation metric commonly used in the retrieval field.
15. Monte Carlo method: also known as the statistical simulation method, a very important class of numerical computation methods guided by probability and statistics theory, proposed in the mid-1940s with the development of science and technology and the invention of electronic computers. It refers to methods that use random numbers (or, more commonly, pseudo-random numbers) to solve many computational problems.
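As a toy illustration of the Monte Carlo idea (which is used later for gradient estimation) and not part of the embodiments: the sketch below estimates a simple expectation from pseudo-random samples; the choice of function and sample count are arbitrary.

```python
# Toy Monte Carlo estimate of E[f(U)] with U ~ Uniform(0, 1) and f(u) = u**2.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=100_000)
estimate = np.mean(samples ** 2)   # converges to the true value 1/3 as the sample count grows
print(estimate)
```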
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the accompanying drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.
FIG. 1 shows a flowchart of a method for extracting a hash code from an image according to an embodiment of the present invention. The method includes:

Step S11: building a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multi-layer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multi-layer DNN and converts the input hash code back into an image; and regularizing the hidden-layer code in the encoder by measuring its redundancy, thereby reducing the information redundancy of the coding space and obtaining an anti-redundancy hash code deep extraction model;

Step S12: training the anti-redundancy hash code deep extraction model to determine the parameters of the model;

Step S13: extracting a hash code from an image by using the encoder of the trained anti-redundancy hash code deep extraction model.
The technical solution of the present invention builds a hash code extraction model that includes an encoder and a decoder. The encoder is composed of a multi-layer DNN and extracts a hash code from image data and outputs it to the decoder; the decoder is composed of a multi-layer DNN and converts the input hash code back into an image. In the encoder, the hidden-layer code is regularized by measuring its redundancy, which reduces the information redundancy of the coding space and yields an anti-redundancy hash code deep extraction model. The model is then trained to determine its parameters, and the encoder of the trained model is used to extract hash codes from images. By building an anti-redundancy hash code extraction model, the method effectively reduces the information redundancy of the coding space, makes effective use of all dimensions, extracts image hash codes with high accuracy, and effectively improves the accuracy of image retrieval and other related applications.
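For orientation only, a minimal sketch of such an encoder/decoder pair is given below. PyTorch, the layer sizes, and the sigmoid-plus-thresholding binarization are assumptions made for illustration; the embodiments do not prescribe a particular framework or architecture.

```python
# Minimal sketch (assumed PyTorch) of the multi-layer DNN encoder/decoder pair described above.
# Layer sizes and the sigmoid/thresholding step are illustrative assumptions.
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    def __init__(self, in_dim=784, hidden=256, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, code_dim))

    def forward(self, x):
        probs = torch.sigmoid(self.net(x))    # per-bit probabilities of the hash code
        return probs, (probs > 0.5).float()   # relaxed code and binary hash code

class HashDecoder(nn.Module):
    def __init__(self, code_dim=64, hidden=256, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))

    def forward(self, z):
        return self.net(z)                    # reconstructed (flattened) image
```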
The method shown in FIG. 1 specifically includes the following.

(1) Building an anti-redundancy hash code deep extraction model in order to extract high-quality image hash codes.
Building the hash code extraction model includes building a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is shown in formula (1):

$$\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)=\mathbb{E}_{q_\phi(Z|X)}\big[\log p_\theta(X|Z)\big]-D_{KL}\big(q_\phi(Z|X)\,\|\,p(Z)\big) \tag{1}$$

where $q_\phi(Z|X)$ is the inference distribution produced by the encoder DNN with parameters $\phi$, $D_{KL}$ is the KL divergence, $X$ is the input data, and $Z$ is the latent representation; in the image search field, the latent representation is the image hash code. $\theta$ is the likelihood parameter obtained from the decoder DNN model. In the objective function shown in formula (1), the optimization goal is to adjust $\theta$ and $\phi$ so as to maximize $\mathcal{L}_{\mathrm{ELBO}}$.
The model includes an encoder and a decoder. The encoder extracts a hash code from the input image data based on a DNN model and adds a hash-code redundancy regularization constraint at the DNN output; that is, the hidden-layer code in the encoder is regularized by measuring its redundancy so as to guarantee the quality of the resulting hash code. The decoder generates image data from the hash code based on a DNN. Regularizing the hidden-layer code in the encoder by measuring its redundancy includes adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2):

$$\max_{\theta,\phi}\ \min_{A}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)+\frac{\delta}{K}\left\|Z^{\top}-Z^{\top}A\right\|_F^2 \tag{2}$$

Here $\min_A$ means that the value of $A$ is adjusted to minimize $\|Z^{\top}-Z^{\top}A\|_F^2$, and $\max_{\theta,\phi}$ means that $\theta$ and $\phi$ are adjusted to maximize the overall objective. $Z$ is the image hash code, a multidimensional binary representation produced by the encoder and fed to the decoder; $A$ is a coefficient matrix; $\|\cdot\|_F$ is the Frobenius norm; $\delta>0$ is a regularization parameter; and $K$ is the dimension of $Z$.
The decoder is composed of $M$ DNN layers and converts the input hash code into an image. A regularization constraint is added at the DNN output; that is, before the hash code extraction model is trained, the output of the last layer of the decoder is regularized so that the DNN hidden-layer output stays as close as possible to the hash code. This simplifies the decoder network structure and forces the encoder to extract high-quality hash codes. Specifically, a further constraint term is added to the objective function shown in formula (2) to obtain the objective function shown in formula (3):

$$\max_{\theta,\phi}\ \min_{A}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)+\frac{\delta}{K}\left\|Z^{\top}-Z^{\top}A\right\|_F^2-\frac{\eta}{K}\left\|H^{(M)}-Z\right\|_F^2 \tag{3}$$

Here $\min_A$ means that the value of $A$ is adjusted to minimize $\|Z^{\top}-Z^{\top}A\|_F^2$, and $\max_{\theta,\phi}$ means that $\theta$ and $\phi$ are adjusted to maximize the overall objective. $Z$ is the image hash code, a multidimensional binary representation; $A$ is a coefficient matrix; $\|\cdot\|_F$ is the Frobenius norm; $H^{(M)}$ is the output of the $M$-th layer of the decoder network; $M$ is the number of layers in the encoder and decoder networks; $\delta>0$ and $\eta>0$ are regularization parameters; and $K$ is the dimension of $Z$.
To reduce hash code redundancy, the dimensions of the hash code should be made as linearly uncorrelated as possible; the term $\|Z^{\top}-Z^{\top}A\|_F^2$ is therefore introduced into the objective function, and the overall optimization target is the $\max_{\theta,\phi}\min_{A}$ problem of formula (3). If any dimension of $Z$ can be expressed linearly by the other dimensions, then a coefficient matrix $A$ can always be found such that $\|Z^{\top}-Z^{\top}A\|_F^2$ is 0. If a coefficient matrix $A$ can be found that makes this term non-zero but very small, then some dimensions of $Z$ are linearly correlated, i.e., the redundancy is high. Therefore, for a given matrix $A$, we want to adjust $Z$ so that $\|Z^{\top}-Z^{\top}A\|_F^2$ is as large as possible, i.e., so that the dimensions of $Z$ are as linearly uncorrelated as possible. Since $Z$ is obtained by transforming the input with the encoder network with parameters $\phi$, we instead optimize $\phi$.
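As an illustration of this term, the sketch below computes the redundancy penalty for a batch of codes. Arranging Z as a K×N matrix (so that the columns of Z^T are the individual hash-code dimensions) and the δ/K scaling follow the reconstruction of formula (2) above and are assumptions.

```python
# Sketch (NumPy) of the redundancy penalty (delta / K) * ||Z^T - Z^T A||_F^2.
# Z: (K, N) relaxed hash codes (K = code dimension, N = batch size); A: (K, K) coefficient matrix.
import numpy as np

def redundancy_penalty(Z: np.ndarray, A: np.ndarray, delta: float) -> float:
    K = Z.shape[0]
    R = Z.T - Z.T @ A                       # residual of expressing each dimension by the others
    return (delta / K) * np.sum(R ** 2)     # squared Frobenius norm, scaled

# Example: if the two code dimensions are identical (fully redundant), a suitable A
# drives the penalty to zero, signalling high redundancy.
Z = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])             # K=2, N=3, second dimension duplicates the first
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])                  # each dimension copied from the other
print(redundancy_penalty(Z, A, delta=0.01))  # -> 0.0
```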
A complex decoder structure also increases the redundancy of the extracted hash codes, so the decoder structure should be kept as simple as possible. The term $\|H^{(M)}-Z\|_F^2$ is therefore introduced into the objective function; the optimization goal for this sub-term is to make $\|H^{(M)}-Z\|_F^2$ as small as possible. This pushes the output $H^{(M)}$ of the $M$-th DNN layer of the decoder to be as close as possible to the decoder's input hash code $Z$, which keeps the decoder network as simple as possible and ultimately forces the encoder part to extract more accurate and effective hash codes from the image.
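A corresponding sketch of the decoder-consistency penalty, and of one way the penalties could be assembled into the objective of formula (3), is given below; the η/K scaling and the sign convention follow the reconstruction above and are assumptions.

```python
# Sketch (NumPy) of the decoder-consistency penalty (eta / K) * ||H^(M) - Z||_F^2 and of
# one way the objective of formula (3) could be assembled; scaling and signs are assumed.
import numpy as np

def decoder_consistency_penalty(H_M: np.ndarray, Z: np.ndarray, eta: float) -> float:
    """H_M: (K, N) output of the decoder's M-th layer; Z: (K, N) hash codes."""
    K = Z.shape[0]
    return (eta / K) * np.sum((H_M - Z) ** 2)

def r_sgh_objective(elbo: float, redundancy_term: float,
                    H_M: np.ndarray, Z: np.ndarray, eta: float) -> float:
    # Formula (3): maximized over theta/phi, minimized over A; the redundancy term
    # (delta/K)*||Z^T - Z^T A||_F^2 is passed in precomputed (see the previous sketch).
    return elbo + redundancy_term - decoder_consistency_penalty(H_M, Z, eta)
```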
(2) Training the anti-redundancy hash code deep extraction model includes using the training data to train $\theta$, $\phi$, and $A$ in the objective function alternately (an illustrative code sketch of this procedure is given after step 9). Specifically:

1. Obtain the original training data and prepare the training data.

2. Randomly shuffle the original training data and divide it evenly into $S$ parts of $N$ samples each; each part is used in turn to train $\theta$, $\phi$, and $A$ alternately once. Set $s=0$.

3. Take the $s$-th part of the training samples.
4. $\theta$ optimization. Assume $\phi$ and $A$ are known parameters; the $\theta$-optimization objective is then converted into the form shown as formula (a), which retains the terms of formula (3) that depend on $\theta$ (in particular the expected reconstruction log-likelihood $\mathbb{E}_{q_\phi(Z|X)}[\log p_\theta(X|Z)]$), and formula (a) is taken as the objective function. The expectation is transformed using a reparameterization of the binary codes in terms of auxiliary random variables $\xi_j,\varepsilon_j\sim U(0,1)$, so that samples of $Z$ can be drawn as a deterministic function of $X$ and the auxiliary variables. The Monte Carlo method is then used to estimate the gradient of this expectation, and the value of $\theta$ is updated based on the gradient.
5. $\phi$ optimization. Assume $\theta$ and $A$ are known parameters; the $\phi$-optimization objective is then converted into the form shown as formula (b), the $\phi$-dependent part of formula (3), and formula (b) is taken as the objective function. It is not difficult to see that the first term of formula (b) can become infinitely large, which would harm the optimization. To avoid this, the term is transformed into a thresholded form involving $R=Z^{\top}-Z^{\top}A$ and a hyperparameter threshold $\epsilon$, so that it remains bounded. The value of $\phi$ is then updated using stochastic gradient descent.
6. $A$ optimization. Assume $\theta$ and $\phi$ are known parameters; the $A$-optimization objective is then converted into the form shown as formula (c), the $A$-dependent part of formula (3):

$$\min_{A}\ \left\|Z^{\top}-Z^{\top}A\right\|_F^2 \tag{c}$$

With formula (c) as the objective function, the value of $A$ is updated using stochastic gradient descent.
7. Set $s=s+1$.

8. Repeat steps 3-7 until all $S$ parts of the samples have participated in training.

9. Repeat steps 2-8 a total of $T$ times, i.e., perform $T$ rounds of training.
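Purely as an illustration of this alternating schedule, the sketch below implements a Monte Carlo θ step (step 4), a gradient step on the redundancy residual for A (step 6), and the outer loop of steps 2 to 9. The Bernoulli sampling, the squared-error reconstruction surrogate, and all hyperparameters are assumptions; phi_step is a hypothetical placeholder for the φ update of step 5; HashEncoder/HashDecoder refer to the earlier architecture sketch.

```python
# Illustrative alternating schedule for formula (3); not the embodiment's exact procedure.
import torch

def theta_step(encoder, decoder, opt_theta, x, num_samples=4):
    # Step 4: Monte Carlo estimate of the gradient of E_{q_phi(Z|X)}[log p_theta(X|Z)].
    with torch.no_grad():
        probs, _ = encoder(x)                  # q_phi(z_k = 1 | x), phi held fixed
    opt_theta.zero_grad()
    loss = 0.0
    for _ in range(num_samples):
        z = torch.bernoulli(probs)             # draw a binary hash code
        loss = loss + ((decoder(z) - x) ** 2).mean() / num_samples
    loss.backward()                            # gradient reaches only the decoder (theta)
    opt_theta.step()

def a_step(encoder, A, opt_A, x):
    # Step 6: gradient step on ||Z^T - Z^T A||_F^2 with respect to A.
    # A: (K, K) tensor with requires_grad=True, optimized by opt_A;
    # the rows of `probs` play the role of the rows of Z^T.
    opt_A.zero_grad()
    with torch.no_grad():
        probs, _ = encoder(x)
    residual = probs - probs @ A
    (residual ** 2).sum().backward()
    opt_A.step()

def train(data, encoder, decoder, A, opt_theta, opt_phi, opt_A, phi_step, S=100, T=200):
    for _ in range(T):                                            # step 9: T training rounds
        parts = torch.chunk(data[torch.randperm(len(data))], S)   # step 2: shuffle and split
        for x in parts:                                           # steps 3-8
            theta_step(encoder, decoder, opt_theta, x)            # step 4
            phi_step(encoder, decoder, A, opt_phi, x)             # step 5 (hypothetical placeholder)
            a_step(encoder, A, opt_A, x)                          # step 6
```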
Specific embodiments are set forth below to illustrate the technical solution.
Embodiment 1: MNIST data and image reconstruction.

Network parameter settings: the number of encoder and decoder layers $M$ is set to 1, $\delta$ and $\eta$ are 0.01, the prior parameter $\rho_j$ is 0.5, the threshold parameter $\epsilon$ is set to 0.05, the dimension of the encoder input data and decoder output data is 28×28=784, and the dimension of the hash code and of each hidden layer of the encoder and decoder is 64.

The training dataset is the MNIST training set, and the batch size used at each training step is 32. The image reconstruction error of the model was evaluated after different numbers of training rounds. The evaluation uses the MNIST test set: the encoder extracts the hash code, the decoder generates the reconstructed data, and the error between the input and the reconstructed data is computed as follows:

$$\mathrm{Err}=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{D}\sum_{d=1}^{D}\big(x_{i,d}-y_{i,d}\big)^2$$

where $N$ is the number of evaluation samples, $D$ is the data dimension of each sample, $x$ is the input data, and $y$ is the reconstructed data.
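A small sketch of this evaluation, under the assumption that the error is the mean per-dimension squared difference written above:

```python
# Sketch of the reconstruction-error evaluation; the mean per-dimension squared
# difference is an assumption consistent with the formula above.
import numpy as np

def reconstruction_error(x: np.ndarray, y: np.ndarray) -> float:
    """x, y: (N, D) arrays of input and reconstructed samples."""
    n, d = x.shape
    return float(np.sum((x - y) ** 2) / (n * d))
```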
FIG. 2 compares the image reconstruction errors of SGH and R-SGH; the curve with the larger decrease represents R-SGH. It can be seen from FIG. 2 that the R-SGH proposed by the present invention has a better ability to reconstruct images.
Embodiment 2: CIFAR-10 image retrieval.

Network parameter settings: the number of encoder and decoder layers $M$ is set to 4, $\delta$ is 0.01, $\eta$ is 0.01, the prior parameter $\rho_j$ is 0.5, the threshold parameter $\epsilon$ is set to 0.05, the dimension of the encoder input data and decoder output data is 512, and the dimension of the hash code and of each hidden layer of the encoder and decoder takes three values: 32, 64, and 128.

From each of the 10 classes of the CIFAR-10 dataset, 100 samples are randomly drawn, giving 1,000 samples in total that are used as query inputs at test time; the remaining data serve as training samples and as the image library. The batch size used at each training step is 32, and the number of training rounds is 200.

The mAP metric is used to evaluate image retrieval ability. The mAP results of the three models with hash code dimensions of 32, 64, and 128 are shown in Table 1, which gives the mAP (%) test results of SGH and R-SGH on the CIFAR-10 dataset:
Table 1  mAP (%) test results of SGH and R-SGH on the CIFAR-10 dataset

Method    32-bit hash code    64-bit hash code    128-bit hash code
SGH       23.86               30.56               35.61
R-SGH     24.66               33.62               44.12
It can be seen from Table 1 that the R-SGH proposed by the present invention has better image retrieval ability.
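For reference, a minimal sketch of how such a retrieval mAP could be computed from binary codes is given below; ranking by Hamming distance and treating images of the same class as relevant are assumptions about the evaluation protocol, which the embodiment does not spell out.

```python
# Minimal sketch of retrieval mAP over binary hash codes; Hamming ranking and
# same-label relevance are assumptions about the evaluation protocol.
import numpy as np

def mean_average_precision(query_codes, query_labels, db_codes, db_labels):
    """codes: (n, K) 0/1 arrays; labels: (n,) class ids. Returns mAP in [0, 1]."""
    aps = []
    for code, label in zip(query_codes, query_labels):
        dist = np.count_nonzero(db_codes != code, axis=1)        # Hamming distances
        order = np.argsort(dist, kind="stable")
        relevant = (db_labels[order] == label).astype(float)
        if relevant.sum() == 0:
            continue
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```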
Embodiment 3: Caltech-256 image retrieval.

Network parameter settings: the number of encoder and decoder layers $M$ is set to 4, $\delta$ is 0.01, $\eta$ is 0.01, the prior parameter $\rho_j$ is 0.5, the threshold parameter $\epsilon$ is set to 0.05, the dimension of the encoder input data and decoder output data is 512, and the dimension of the hash code and of each hidden layer of the encoder and decoder takes three values: 32, 64, and 128.

From the Caltech-256 dataset, 1,000 samples are randomly drawn and used as query inputs at test time; the remaining data serve as training samples and as the image library. The batch size used at each training step is 32, and the number of training rounds is 200.

The mAP metric is used to evaluate image retrieval ability. The mAP results of the three models with hash code dimensions of 32, 64, and 128 are shown in Table 2, which gives the mAP (%) test results of SGH and R-SGH on the Caltech-256 dataset:
Table 2  mAP (%) test results of SGH and R-SGH on the Caltech-256 dataset

Method    32-bit hash code    64-bit hash code    128-bit hash code
SGH       47.12               71.09               78.61
R-SGH     59.02               74.18               84.96
It can be seen from Table 2 that the R-SGH proposed by the present invention has better image retrieval ability.
An embodiment of the present invention provides another method for extracting a hash code from an image. The method includes:

building a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multi-layer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multi-layer DNN and converts the input hash code back into an image;

regularizing the hidden-layer code in the encoder by measuring its redundancy, thereby reducing the information redundancy of the coding space and obtaining an anti-redundancy hash code deep extraction model;

training the anti-redundancy hash code deep extraction model to determine the parameters of the model; and

extracting a hash code from an image by using the encoder of the trained anti-redundancy hash code deep extraction model.
Building the hash code extraction model includes building a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is shown in formula (1):

$$\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)=\mathbb{E}_{q_\phi(Z|X)}\big[\log p_\theta(X|Z)\big]-D_{KL}\big(q_\phi(Z|X)\,\|\,p(Z)\big) \tag{1}$$

where $q_\phi(Z|X)$ is the inference distribution produced by the encoder DNN with parameters $\phi$, $D_{KL}$ is the KL divergence, $X$ is the input data, $Z$ is the image hash code (the latent code passed to the decoder), and $\theta$ is the likelihood parameter obtained from the decoder DNN model.
Regularizing the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining the anti-redundancy hash code deep extraction model includes adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (4):

$$\max_{\theta,\phi}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)-\frac{\eta}{K}\left\|H^{(M)}-Z\right\|_F^2 \tag{4}$$

where $Z$ is the image hash code input to the decoder, $\|\cdot\|_F$ is the Frobenius norm, $\eta>0$ is a regularization parameter, $K$ is the dimension of $Z$, $H^{(M)}$ is the output of the $M$-th layer of the decoder network, and $M$ is the number of layers in the encoder and decoder networks.
A complex decoder structure also increases the redundancy of the extracted hash codes, so the decoder structure should be kept as simple as possible. The term $\|H^{(M)}-Z\|_F^2$ is therefore introduced into the objective function; the optimization goal for this sub-term is to make $\|H^{(M)}-Z\|_F^2$ as small as possible. This pushes the output $H^{(M)}$ of the $M$-th DNN layer of the decoder to be as close as possible to the decoder's input hash code $Z$, which keeps the decoder network as simple as possible and ultimately forces the encoder part to extract more accurate and effective hash codes from the image.
FIG. 3 shows a flowchart of an image retrieval method according to an embodiment of the present invention. The method includes:

Step S31: using the above method, computing the hash code of each image in the image library and computing the hash code of the query image;

Step S32: computing the similarity between the hash code of the query image and the hash code of each image in the image library, and outputting the one or more images with the highest similarity.
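As a sketch only, one common way to realize step S32 with binary codes is to rank the library by Hamming distance to the query code; the Hamming measure and the top-k interface below are assumptions, since the embodiment leaves the similarity measure open.

```python
# Sketch of step S32: rank library images by Hamming distance between binary hash codes.
# Using Hamming distance as the (dis)similarity is an assumption; the embodiment only
# requires "similarity between hash codes".
import numpy as np

def retrieve_top_k(query_code: np.ndarray, library_codes: np.ndarray, k: int = 5):
    """query_code: (K,) 0/1 array; library_codes: (n, K) 0/1 array; returns library indices."""
    hamming = np.count_nonzero(library_codes != query_code, axis=1)
    return np.argsort(hamming, kind="stable")[:k]   # smallest distance = highest similarity
```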
FIG. 4 shows a schematic diagram of an apparatus for extracting a hash code from an image according to an embodiment of the present invention. The apparatus 40 includes:

a model building unit 401, adapted to build a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multi-layer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multi-layer DNN and converts the input hash code back into an image; and further adapted to regularize the hidden-layer code in the encoder by measuring its redundancy, thereby reducing the information redundancy of the coding space and obtaining an anti-redundancy hash code deep extraction model;

a model training unit 402, adapted to train the anti-redundancy hash code deep extraction model and determine the parameters of the model; and

a hash code extraction unit 403, adapted to extract a hash code from an image by using the encoder of the trained anti-redundancy hash code deep extraction model.
In an embodiment of the present invention, the model building unit 401 is adapted to build a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is shown in formula (1):

$$\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)=\mathbb{E}_{q_\phi(Z|X)}\big[\log p_\theta(X|Z)\big]-D_{KL}\big(q_\phi(Z|X)\,\|\,p(Z)\big) \tag{1}$$

where $q_\phi(Z|X)$ is the inference distribution produced by the encoder DNN with parameters $\phi$, $D_{KL}$ is the KL divergence, $X$ is the input data, $Z$ is the image hash code (the latent code passed to the decoder), and $\theta$ is the likelihood parameter obtained from the decoder DNN model.
The model building unit is further adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2):

$$\max_{\theta,\phi}\ \min_{A}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)+\frac{\delta}{K}\left\|Z^{\top}-Z^{\top}A\right\|_F^2 \tag{2}$$

where $A$ is a coefficient matrix, $Z$ is the image hash code, $\|\cdot\|_F$ is the Frobenius norm, $\delta>0$ is a regularization parameter, and $K$ is the dimension of $Z$.
In an embodiment of the present invention, the model building unit 401 is further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes.
In an embodiment of the present invention, the model building unit 401 is specifically adapted to add a further constraint term to the objective function shown in formula (2) to obtain the objective function shown in formula (3):

$$\max_{\theta,\phi}\ \min_{A}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)+\frac{\delta}{K}\left\|Z^{\top}-Z^{\top}A\right\|_F^2-\frac{\eta}{K}\left\|H^{(M)}-Z\right\|_F^2 \tag{3}$$

where $H^{(M)}$ is the output of the $M$-th layer of the decoder network, $M$ is the number of layers in the encoder and decoder networks, and $\eta>0$ is a regularization parameter.
In an embodiment of the present invention, the model training unit 402 is adapted to use the training data to train $\theta$, $\phi$, and $A$ in the objective function alternately. The model training unit 402 is specifically adapted to:

obtain the original training data;

randomly shuffle the original training data, divide it evenly into multiple parts, take each part in turn, and use each part to train $\theta$, $\phi$, and $A$ alternately once; and

repeat the above steps a preset number of times.
An embodiment of the present invention shows a schematic diagram of another apparatus for extracting a hash code from an image. The apparatus includes:

a model building unit, adapted to build a hash code extraction model that includes an encoder and a decoder, where the encoder is composed of a multi-layer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multi-layer DNN and converts the input hash code back into an image; and further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure, forcing the encoder to extract high-quality hash codes, and obtaining an anti-redundancy hash code deep extraction model;

a model training unit, adapted to train the anti-redundancy hash code deep extraction model and determine the parameters of the model; and

a hash code extraction unit, adapted to extract a hash code from an image by using the encoder of the trained anti-redundancy hash code deep extraction model.
In an embodiment of the present invention, the model building unit is adapted to build a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is shown in formula (1):

$$\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)=\mathbb{E}_{q_\phi(Z|X)}\big[\log p_\theta(X|Z)\big]-D_{KL}\big(q_\phi(Z|X)\,\|\,p(Z)\big) \tag{1}$$

where $q_\phi(Z|X)$ is the inference distribution produced by the encoder DNN with parameters $\phi$, $D_{KL}$ is the KL divergence, $X$ is the input data, $Z$ is the image hash code (the latent code passed to the decoder), and $\theta$ is the likelihood parameter obtained from the decoder DNN model.
The model building unit is further adapted to add a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2):

$$\max_{\theta,\phi}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)-\frac{\eta}{K}\left\|H^{(M)}-Z\right\|_F^2 \tag{2}$$

where $Z$ is the image hash code input to the decoder, $\|\cdot\|_F$ is the Frobenius norm, $\eta>0$ is a regularization parameter, $K$ is the dimension of $Z$, $H^{(M)}$ is the output of the $M$-th layer of the decoder network, and $M$ is the number of layers in the encoder and decoder networks.
FIG. 5 shows a schematic diagram of an image retrieval apparatus according to an embodiment of the present invention. The apparatus 50 includes:

the above-described apparatus 501 for extracting a hash code from an image, adapted to compute the hash code of each image in the image library and the hash code of the query image; and

a hash code similarity calculation unit 502, adapted to compute the similarity between the hash code of the query image and the hash code of each image in the image library and output the one or more images with the highest similarity.
The technical solution of the present invention builds a hash code extraction model that includes an encoder and a decoder. The encoder is composed of a multi-layer DNN and extracts a hash code from image data and outputs it to the decoder; the decoder is composed of a multi-layer DNN and converts the input hash code back into an image. In the encoder, the hidden-layer code is regularized by measuring its redundancy, which reduces the information redundancy of the coding space and yields an anti-redundancy hash code deep extraction model. The model is then trained to determine its parameters, and the encoder of the trained model is used to extract hash codes from images. Regularizing the hidden-layer code in the encoder by measuring its redundancy effectively reduces the information redundancy of the coding space, makes effective use of all dimensions, and extracts hash codes that represent the image accurately. Regularizing the output of the last layer of the decoder with respect to the hash code simplifies the decoder and forces the encoder to extract more accurate and effective hash codes. Using the combination of these two regularization terms to optimize the whole model effectively alleviates the problem of hash code information redundancy, extracts image hash codes with high accuracy, and effectively improves the accuracy of image retrieval and other related applications.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual apparatus, or other device. Various general-purpose apparatuses may also be used with the teachings herein. The structure required to construct such an apparatus is apparent from the above description. Moreover, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the content of the present invention described herein, and the above description of a specific language is intended to disclose the best mode of the present invention.

In the description provided here, numerous specific details are set forth. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the present disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single previously disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.

Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

In addition, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus, electronic device, and computer-readable storage medium according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

For example, FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention. The electronic device 600 includes a processor 610 and a memory 620 storing a computer program executable on the processor 610. The processor 610 is configured to perform the steps of the method of the present invention when executing the computer program in the memory 620. The memory 620 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 620 has a storage space 630 that stores a computer program 631 for performing any of the method steps of the above methods. The computer program 631 may be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a computer-readable storage medium such as the one described with reference to FIG. 7.

FIG. 7 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention. The computer-readable storage medium 700 stores a computer program 631 for performing the method steps according to the present invention, which can be read by the processor 610 of the electronic device 600. When the computer program 631 is run by the electronic device 600, the electronic device 600 is caused to perform the steps of the method described above; specifically, the computer program 631 stored in the computer-readable storage medium may perform the method shown in any of the above embodiments. The computer program 631 may be compressed in an appropriate form.

It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (20)

1. A method for extracting a hash code from an image, wherein the method comprises:
    building a hash code extraction model that comprises an encoder and a decoder, wherein the encoder is composed of a multi-layer deep neural network (DNN) and extracts a hash code from image data and outputs it to the decoder, and the decoder is composed of a multi-layer DNN and converts the input hash code into an image;
    regularizing the hidden-layer code in the encoder by measuring its redundancy, thereby reducing the information redundancy of the coding space and obtaining an anti-redundancy hash code deep extraction model;
    training the anti-redundancy hash code deep extraction model to determine parameters of the model; and
    extracting a hash code from an image by using the encoder of the trained anti-redundancy hash code deep extraction model.
2. The method according to claim 1, wherein
    the building a hash code extraction model comprises: building a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is shown in formula (1):
    $$\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)=\mathbb{E}_{q_\phi(Z|X)}\big[\log p_\theta(X|Z)\big]-D_{KL}\big(q_\phi(Z|X)\,\|\,p(Z)\big) \tag{1}$$
    where $q_\phi(Z|X)$ is the inference distribution produced by the encoder DNN with parameters $\phi$, $D_{KL}$ is the KL divergence, $X$ is the input data, $Z$ is the image hash code (the latent code passed to the decoder), and $\theta$ is the likelihood parameter obtained from the decoder DNN model; and
    the regularizing the hidden-layer code in the encoder by measuring its redundancy comprises: adding a constraint term to the objective function shown in formula (1) to obtain the objective function shown in formula (2):
    $$\max_{\theta,\phi}\ \min_{A}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)+\frac{\delta}{K}\left\|Z^{\top}-Z^{\top}A\right\|_F^2 \tag{2}$$
    where $A$ is a coefficient matrix, $Z$ is the image hash code, $\|\cdot\|_F$ is the Frobenius norm, $\delta>0$ is a regularization parameter, and $K$ is the dimension of $Z$.
3. The method according to claim 2, wherein, before the hash code extraction model is trained, the method further comprises:
    regularizing the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes.
4. The method according to claim 3, wherein regularizing the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code specifically comprises:
    adding a further constraint term to the objective function shown in formula (2) to obtain the objective function shown in formula (3):
    $$\max_{\theta,\phi}\ \min_{A}\ \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)+\frac{\delta}{K}\left\|Z^{\top}-Z^{\top}A\right\|_F^2-\frac{\eta}{K}\left\|H^{(M)}-Z\right\|_F^2 \tag{3}$$
    where $H^{(M)}$ is the output of the $M$-th layer of the decoder network, $M$ is the number of layers in the encoder and decoder networks, and $\eta>0$ is a regularization parameter.
5. The method according to any one of claims 1 to 4, wherein training the anti-redundancy hash code deep extraction model comprises:
    using the training data to train θ, φ, and A in the objective function alternately.
6. The method of claim 5, wherein alternately training θ, φ, and A in the objective function using the training data comprises:
    obtaining original training data;
    randomly shuffling the original training data, dividing it evenly into multiple parts, taking each part in turn, and using each part to alternately train θ, φ, and A once; and
    repeating the above steps a preset number of times.
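The schedule of claim 6 resembles ordinary mini-batch training with an alternating update over three parameter groups. The sketch below assumes, per the usual VAE convention, that θ and φ are the decoder and encoder parameters respectively (φ is not defined in the visible claim text), that A is a trainable tensor with requires_grad=True, and that each group receives one gradient step per data part while the others are held fixed; none of these details are fixed by the claim.

```python
import numpy as np
import torch

def alternating_train(model, A, loss_fn, data, num_parts=10, repeats=3, lr=1e-3):
    """Alternately update theta (decoder), phi (encoder) and A, following the
    schedule of claim 6. One optimizer step per parameter group per data part
    is an assumption; the claim does not fix the inner update rule."""
    opt_theta = torch.optim.Adam(model.decoder.parameters(), lr=lr)  # theta: decoder params
    opt_phi = torch.optim.Adam(model.encoder.parameters(), lr=lr)    # phi: encoder params
    opt_A = torch.optim.Adam([A], lr=lr)                             # coefficient matrix A

    for _ in range(repeats):                            # repeat a preset number of times
        perm = np.random.permutation(len(data))         # randomly shuffle the original data
        for idx in np.array_split(perm, num_parts):     # divide evenly, take parts in turn
            batch = data[torch.as_tensor(idx)]
            for opt in (opt_theta, opt_phi, opt_A):     # alternate theta -> phi -> A
                opt.zero_grad()
                loss_fn(model, A, batch).backward()     # full objective on this part
                opt.step()
```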
7. A method for extracting a hash code from an image, the method comprising:
    constructing a hash code extraction model comprising an encoder and a decoder, wherein the encoder consists of a multi-layer deep neural network (DNN) that extracts a hash code from image data and outputs it to the decoder, and the decoder consists of a multi-layer DNN that converts the input hash code back into an image;
    regularizing the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes, to obtain an anti-redundancy deep hash code extraction model;
    training the anti-redundancy deep hash code extraction model to determine the parameters of the model; and
    extracting a hash code from an image by using the encoder of the trained anti-redundancy deep hash code extraction model.
8. The method of claim 7, wherein:
    constructing the hash code extraction model comprises constructing a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is given by formula (1):
    [formula (1), reproduced as image PCTCN2018105534-appb-100006]
    where [image PCTCN2018105534-appb-100007], D_KL is the KL divergence, X is the input data, Z is the image hash code output by the encoder and fed to the decoder, and θ is the likelihood parameter given by the decoder DNN model; and
    regularizing the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes, to obtain the anti-redundancy deep hash code extraction model, comprises adding a constraint term to the objective function of formula (1) to obtain the objective function of formula (4):
    [formula (4), reproduced as image PCTCN2018105534-appb-100008]
    where Z is the image hash code, ‖·‖_F is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H^(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
9. An image retrieval method, the method comprising:
    calculating a hash code for each image in an image library and a hash code for a query image by using the method of any one of claims 1-8; and
    calculating the similarity between the hash code of the query image and the hash code of each image in the image library, and outputting the one or more images with the highest similarity.
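Claim 9 ranks library images by the similarity of their binary hash codes to the query's code. The claim does not name a specific similarity measure; Hamming distance over 0/1 codes is a common choice and is assumed in the sketch below.

```python
import numpy as np

def retrieve(query_code: np.ndarray, library_codes: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Return indices of the top_k library images whose hash codes are closest
    to the query code. Hamming distance is an assumed similarity measure."""
    # query_code: shape (K,); library_codes: shape (N, K); both contain 0/1 values.
    hamming = np.count_nonzero(library_codes != query_code, axis=1)
    return np.argsort(hamming)[:top_k]
```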
10. An apparatus for extracting a hash code from an image, the apparatus comprising:
    a model construction unit adapted to construct a hash code extraction model comprising an encoder and a decoder, wherein the encoder consists of a multi-layer deep neural network (DNN) that extracts a hash code from image data and outputs it to the decoder, and the decoder consists of a multi-layer DNN that converts the input hash code back into an image; and further adapted to regularize the hidden-layer code in the encoder by measuring its redundancy, thereby reducing information redundancy in the coding space and obtaining an anti-redundancy deep hash code extraction model;
    a model training unit adapted to train the anti-redundancy deep hash code extraction model and determine the parameters of the model; and
    a hash code extraction unit adapted to extract a hash code from an image by using the encoder of the trained anti-redundancy deep hash code extraction model.
11. The apparatus of claim 10, wherein the model construction unit is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is given by formula (1):
    [formula (1), reproduced as image PCTCN2018105534-appb-100010]
    where [image PCTCN2018105534-appb-100011], D_KL is the KL divergence, X is the input data, Z is the image hash code output by the encoder and fed to the decoder, and θ is the likelihood parameter given by the decoder DNN model;
    and wherein the model construction unit is adapted to add a constraint term to the objective function of formula (1) to obtain the objective function of formula (2):
    [formula (2), reproduced as image PCTCN2018105534-appb-100012]
    where A is a coefficient matrix, Z is the image hash code, ‖·‖_F is the Frobenius norm, δ > 0 is a regularization parameter, and K is the dimension of Z.
12. The apparatus of claim 11, wherein the model construction unit is further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes.
13. The apparatus of claim 12, wherein the model construction unit is specifically adapted to add a further constraint term to the objective function of formula (2) to obtain the objective function of formula (3):
    [formula (3), reproduced as image PCTCN2018105534-appb-100014]
    where H^(M) is the output of the M-th layer of the decoder network, M is the number of layers of the encoder and decoder networks, and η > 0 is a regularization parameter.
14. The apparatus of any one of claims 10-13, wherein the model training unit is adapted to alternately train θ, φ, and A in the objective function using the training data.
15. The apparatus of claim 14, wherein the model training unit is specifically adapted to:
    obtain original training data;
    randomly shuffle the original training data, divide it evenly into multiple parts, take each part in turn, and use each part to alternately train θ, φ, and A once; and
    repeat the above steps a preset number of times.
16. An apparatus for extracting a hash code from an image, the apparatus comprising:
    a model construction unit adapted to construct a hash code extraction model comprising an encoder and a decoder, wherein the encoder consists of a multi-layer deep neural network (DNN) that extracts a hash code from image data and outputs it to the decoder, and the decoder consists of a multi-layer DNN that converts the input hash code back into an image; and further adapted to regularize the output of the last layer of the decoder so that the DNN hidden-layer output stays as close as possible to the hash code, thereby simplifying the decoder network structure and forcing the encoder to extract high-quality hash codes, to obtain an anti-redundancy deep hash code extraction model;
    a model training unit adapted to train the anti-redundancy deep hash code extraction model and determine the parameters of the model; and
    a hash code extraction unit adapted to extract a hash code from an image by using the encoder of the trained anti-redundancy deep hash code extraction model.
17. The apparatus of claim 16, wherein the model construction unit is adapted to construct a variational autoencoder (VAE) model or a stochastic generative hashing (SGH) model whose objective function is given by formula (1):
    [formula (1), reproduced as image PCTCN2018105534-appb-100015]
    where [image PCTCN2018105534-appb-100016], D_KL is the KL divergence, X is the input data, Z is the image hash code output by the encoder and fed to the decoder, and θ is the likelihood parameter given by the decoder DNN model;
    and wherein the model construction unit is adapted to add a constraint term to the objective function of formula (1) to obtain the objective function of formula (2):
    [formula (2), reproduced as image PCTCN2018105534-appb-100017]
    where Z is the image hash code, ‖·‖_F is the Frobenius norm, η > 0 is a regularization parameter, K is the dimension of Z, H^(M) is the output of the M-th layer of the decoder network, and M is the number of layers of the encoder and decoder networks.
18. An image retrieval apparatus, the apparatus comprising:
    the apparatus for extracting a hash code from an image of any one of claims 10-17, adapted to calculate a hash code for each image in an image library and a hash code for a query image; and
    a hash code similarity calculation unit adapted to calculate the similarity between the hash code of the query image and the hash code of each image in the image library and to output the one or more images with the highest similarity.
19. An electronic device, comprising a processor and a memory storing a computer program executable on the processor,
    wherein the processor is configured to perform the method of any one of claims 1-9 when executing the computer program in the memory.
20. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-9.
PCT/CN2018/105534 2018-07-12 2018-09-13 Method and apparatus for extracting hash code from image, and image retrieval method and apparatus WO2020010691A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810765292.6A CN109325140B (en) 2018-07-12 2018-07-12 Method and device for extracting hash code from image and image retrieval method and device
CN201810765292.6 2018-07-12
CN201810766031.6 2018-07-12
CN201810766031.6A CN109145132B (en) 2018-07-12 2018-07-12 Method and device for extracting hash code from image and image retrieval method and device

Publications (1)

Publication Number Publication Date
WO2020010691A1 true WO2020010691A1 (en) 2020-01-16

Family

ID=69143051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105534 WO2020010691A1 (en) 2018-07-12 2018-09-13 Method and apparatus for extracting hash code from image, and image retrieval method and apparatus

Country Status (1)

Country Link
WO (1) WO2020010691A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209851A (en) * 2019-06-10 2019-09-06 北京字节跳动网络技术有限公司 Model training method, device, electronic equipment and storage medium
CN115080781A (en) * 2022-05-24 2022-09-20 同济大学 Class imbalance image hierarchical retrieval method based on deep hash

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040025025A1 (en) * 1999-10-19 2004-02-05 Ramarathnam Venkatesan System and method for hashing digital images
CN105930440A (en) * 2016-04-19 2016-09-07 中山大学 Large-scale quick retrieval method of pedestrian image on the basis of cross-horizon information and quantization error encoding
US20180101742A1 (en) * 2016-10-07 2018-04-12 Noblis, Inc. Face recognition and image search system using sparse feature vectors, compact binary vectors, and sub-linear search
US20180107902A1 (en) * 2016-10-16 2018-04-19 Ebay Inc. Image analysis and prediction based visual search

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040025025A1 (en) * 1999-10-19 2004-02-05 Ramarathnam Venkatesan System and method for hashing digital images
CN105930440A (en) * 2016-04-19 2016-09-07 中山大学 Large-scale quick retrieval method of pedestrian image on the basis of cross-horizon information and quantization error encoding
US20180101742A1 (en) * 2016-10-07 2018-04-12 Noblis, Inc. Face recognition and image search system using sparse feature vectors, compact binary vectors, and sub-linear search
US20180107902A1 (en) * 2016-10-16 2018-04-19 Ebay Inc. Image analysis and prediction based visual search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, JIEYUAN: "Research on Image Retrieval Method Combining Hash Coding and Deep Learning", INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 April 2018 (2018-04-15), pages 29 - 56 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209851A (en) * 2019-06-10 2019-09-06 北京字节跳动网络技术有限公司 Model training method, device, electronic equipment and storage medium
CN110209851B (en) * 2019-06-10 2021-08-20 北京字节跳动网络技术有限公司 Model training method and device, electronic equipment and storage medium
CN115080781A (en) * 2022-05-24 2022-09-20 同济大学 Class imbalance image hierarchical retrieval method based on deep hash

Similar Documents

Publication Publication Date Title
CN108052512B (en) Image description generation method based on depth attention mechanism
TW202117577A (en) Machine learning system and method to generate structure for target property
CN113963165B (en) Small sample image classification method and system based on self-supervision learning
Steingrimsson et al. Deep learning for survival outcomes
Jia et al. Adaptive neighborhood propagation by joint L2, 1-norm regularized sparse coding for representation and classification
WO2020010691A1 (en) Method and apparatus for extracting hash code from image, and image retrieval method and apparatus
CN115482418A (en) Semi-supervised model training method, system and application based on pseudo negative label
CN115599984B (en) Retrieval method
CN114925767A (en) Scene generation method and device based on variational self-encoder
CN117349494A (en) Graph classification method, system, medium and equipment for space graph convolution neural network
CN113592008B (en) System, method, device and storage medium for classifying small sample images
CN108805280B (en) Image retrieval method and device
CN109325140B (en) Method and device for extracting hash code from image and image retrieval method and device
CN117671673B (en) Small sample cervical cell classification method based on self-adaptive tensor subspace
CN112836007B (en) Relational element learning method based on contextualized attention network
CN114003900A (en) Network intrusion detection method, device and system for secondary system of transformer substation
Leke et al. Proposition of a theoretical model for missing data imputation using deep learning and evolutionary algorithms
EP4421655A1 (en) Co-training method for models, and related apparatus
CN109145132B (en) Method and device for extracting hash code from image and image retrieval method and device
CN107944045B (en) Image search method and system based on t distribution Hash
CN111797732B (en) Video motion identification anti-attack method insensitive to sampling
Bogetoft et al. Statistical Analysis in dea
CN112836065A (en) Prediction method of graph convolution knowledge representation learning model ComSAGCN based on combination self-attention
Gkillas et al. Resource efficient federated learning for deep anomaly detection in industrial IoT applications
Wang et al. Meta-Probability weighting for improving reliability of DNNs to label noise

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18926006

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18926006

Country of ref document: EP

Kind code of ref document: A1