CN109299306B - Image retrieval method and device - Google Patents

Image retrieval method and device

Info

Publication number
CN109299306B
CN109299306B
Authority
CN
China
Prior art keywords
image
feature vector
neural network
network model
model
Prior art date
Legal status
Active
Application number
CN201811533583.9A
Other languages
Chinese (zh)
Other versions
CN109299306A (en)
Inventor
张勇
朱立松
Current Assignee
Cntv Wuxi Co ltd
Original Assignee
Cntv Wuxi Co ltd
Priority date
Filing date
Publication date
Application filed by Cntv Wuxi Co ltd
Priority to CN201811533583.9A
Publication of CN109299306A
Application granted
Publication of CN109299306B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image retrieval method and an image retrieval device. The image retrieval method comprises the following steps: S10, constructing a neural network model; S20, training the neural network model for the first time with a training set; S30, randomly selecting two similar images from the training set as input, training the neural network model a second time using an adjacent component minimization method and a back propagation algorithm, and dividing the feature vector of an image into a first-stage feature vector and a second-stage feature vector; S40, inputting the image to be retrieved into the neural network model to obtain the first-stage feature vector and the second-stage feature vector of the image to be retrieved; S50, finding images similar to the image to be retrieved in a database by calculating the distance between the first-stage feature vector of the image to be retrieved and those of the images in the database. This greatly accelerates matching during image search while improving search precision, and excellent performance is obtained without relying on manual design based on the experience of technical personnel.

Description

Image retrieval method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to an image retrieval method and device.
Background
Image data is typical unstructured data, and at present querying, retrieving and comparing the similarity of image data in a database still presents certain difficulties, for the following reasons: 1) image data has high dimensionality; the resolution of an ordinary high-definition image can reach about 2 million pixels, and that of an ultra-high-definition image can reach 8 million pixels; 2) the semantics contained in an image are difficult to obtain directly from the image data; for example, if an image contains a car, a person viewing the image observes this easily, whereas a computer does not, and the specific semantics that the image contains a car can only be recognized by algorithms such as artificial intelligence.
To make images easier to query, retrieve and compare, a common approach is to extract image features; for example, the Scale-Invariant Feature Transform (SIFT) algorithm or the SURF algorithm (an improvement of SIFT) is used to extract local feature points of the image. Both algorithms describe the distribution of pixel values in the local region around a feature point: each SIFT feature point corresponds to a 128-dimensional description vector, and each SURF feature point corresponds to a 64-dimensional description vector. However, the dimensionality of the feature vectors calculated by feature algorithms such as SURF or SIFT is still high and cannot meet the requirements of fast image retrieval.
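For reference, extracting such local feature points might look roughly as follows; this is a minimal sketch that assumes OpenCV's SIFT implementation is available ("query.jpg" is only a placeholder file name) and is not part of the method disclosed below:

```python
import cv2  # OpenCV, assumed to be built with SIFT support

img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each detected keypoint yields a 128-dimensional SIFT descriptor, so even a
# single image produces many high-dimensional vectors that must be matched.
print(descriptors.shape)  # (number_of_keypoints, 128)
```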
Disclosure of Invention
In view of the defects of the prior art, the invention provides an image retrieval method and device, which effectively solve the technical problem that rapid image retrieval cannot be achieved in the prior art.
To achieve this purpose, the invention is realized by the following technical solution:
an image retrieval method, comprising:
s10, constructing a neural network model, which comprises an input layer positioned at the data input side of the model, a characteristic output layer positioned in the middle of the model and a model output layer positioned at the output side of the model, wherein the neural network model is symmetrically arranged along the characteristic output layer positioned in the middle, the number of neurons from the input layer to the characteristic output layer is gradually reduced, and the number of neurons from the characteristic output layer to the model output layer is gradually increased;
s20, training the neural network model for the first time by using a training set, recovering and obtaining an image input by an input layer on a model output layer, and updating the weight connected with each layer of neural network and the feature vector of the image output from a feature output layer;
s30 randomly selecting two similar images from a training set as input, carrying out secondary training on the neural network model by using an adjacent component minimization method and a back propagation algorithm, and dividing the feature vector of the image into a first-level feature vector and a second-level feature vector, wherein the first-level feature vector consists of the same features of the two images, and the second-level feature vector consists of features different from the other image, so as to complete the training of the neural network model;
s40 reserving the part from the input layer to the feature output layer in the neural network model as the neural network model for image retrieval, and inputting the image to be retrieved to be inquired into the neural network model to obtain the first-stage feature vector and the second-stage feature vector of the image to be retrieved;
s50, finding out the image similar to the image to be searched in the database by calculating the distance between the first-stage characteristic vector of the image to be searched and the image in the database.
Further preferably, in step S20, the neural network model is trained for the first time according to the objective function O, and the weight connected to each layer of neural network and the feature vector of the image output from the feature output layer are updated;
O = min Σ_{i=1..Q} ||y_i - x_i||²
wherein y_i is the output vector of the model, x_i is the input vector corresponding to the image input to the model, and Q is the number of images in the training set.
Further preferably, in step S30, the objective function O1 is propagated backward from the model output layer to update the weights of each layer of the neural network; when the backward propagation reaches the feature output layer, the gradient information of the objective function O2 is superimposed and the propagation continues;
O1 = min Σ_{i=1..Q} ||y_i - x_i||²
wherein λ is a weight parameter, and λ ∈ (0, 1);
O2 = max λO_NCA
O_NCA = Σ_{i=1..Q} Σ_{j: c(x_j)=c(x_i)} p_ij
wherein x_i denotes the vector representing image i, x_j denotes the vector representing image j, c(x_i) denotes the class to which image i belongs, c(x_j) denotes the class to which image j belongs, and p_ij denotes the probability that image i takes image j as a neighbour,
p_ij = exp(-d(i, j)) / Σ_{q≠i} exp(-d(i, q))
the image q is an image in the database other than image i, d(i, j) denotes the Euclidean distance between the first-level feature vectors of image i and image j, d(i, j) = d(x_i, x_j) = ||f(x_i) - f(x_j)||², f(x_i) denotes the first-level feature vector of image i, and f(x_j) denotes the first-level feature vector of image j.
Further preferably, in step S50, the method includes:
s51, calculating the Euclidean distance between the first-level feature vector of the image to be retrieved and the first-level feature vector of the image in the database;
s52, comparing the calculation results to obtain an image similar to the image to be retrieved in the database;
s53 judges whether or not there are a plurality of images similar to the image to be retrieved in the database, and if so,
s54, calculating the Euclidean distance between the second-level feature vector of the image to be retrieved and the second-level feature vector of the similar image in the database;
s55, comparing the calculation results to obtain the image most similar to the image to be searched in the database.
Further preferably, the neural network model is a binary neural network, and in step S50, a hamming distance between the first-level feature vector and the image in the database is calculated.
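For the binary neural network variant, comparing two first-level feature vectors reduces to a Hamming distance between bit vectors; a minimal sketch, assuming the feature values are already binarized to 0/1:

```python
import numpy as np

def hamming_distance(a, b):
    """Number of positions at which two binary feature vectors differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # -> 2
```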
The present invention also provides an image retrieval apparatus comprising:
the network model building module is used for building a neural network model and comprises an input layer positioned at the data input side of the model, a characteristic output layer positioned in the middle of the model and a model output layer positioned at the output side of the model, wherein the neural network model is symmetrically arranged along the characteristic output layer positioned in the middle, the number of neurons from the input layer to the characteristic output layer is gradually reduced, and the number of neurons from the characteristic output layer to the model output layer is gradually increased;
the network model training module is used for carrying out first training on the neural network model constructed by the network model construction module by using a training set, recovering and obtaining an image input by an input layer on a model output layer, and updating the weight connected with each layer of neural network and the feature vector of the image output from a feature output layer; randomly selecting two similar images from a training set as input, carrying out second training on the neural network model by using an adjacent component minimization device and a back propagation algorithm, and dividing the feature vector of the image into a first-stage feature vector and a second-stage feature vector, wherein the first-stage feature vector consists of the same features of the two images, and the second-stage feature vector consists of features different from the other image, so as to finish the training of the neural network model;
the neural network model is used for outputting and obtaining a first-stage feature vector and a second-stage feature vector of the image to be retrieved according to the input image to be retrieved;
and the retrieval module is used for searching the image similar to the image to be retrieved in the database by calculating the distance between the first-stage characteristic vector of the image to be retrieved output by the neural network model and the image in the database.
Further preferably, in the network model training module, the neural network model is trained for the first time according to the target function O, and the weight connected to each layer of neural network and the feature vector of the image output from the feature output layer are updated;
O = min Σ_{i=1..Q} ||y_i - x_i||²
wherein y_i is the output vector of the model, x_i is the input vector corresponding to the image input to the model, and Q is the number of images in the training set.
Further preferably, in the process of the second training of the neural network model by the network model training module, the objective function O1 is propagated backward from the model output layer to update the weights of each layer of the neural network; when the backward propagation reaches the feature output layer, the gradient information of the objective function O2 is superimposed and the propagation continues;
O1 = min Σ_{i=1..Q} ||y_i - x_i||²
wherein λ is a weight parameter, and λ ∈ (0, 1);
O2 = max λO_NCA
O_NCA = Σ_{i=1..Q} Σ_{j: c(x_j)=c(x_i)} p_ij
wherein x_i denotes the vector representing image i, x_j denotes the vector representing image j, c(x_i) denotes the class to which image i belongs, c(x_j) denotes the class to which image j belongs, and p_ij denotes the probability that image i takes image j as a neighbour,
p_ij = exp(-d(i, j)) / Σ_{q≠i} exp(-d(i, q))
the image q is an image in the database other than image i, d(i, j) denotes the Euclidean distance between the first-level feature vectors of image i and image j, d(i, j) = d(x_i, x_j) = ||f(x_i) - f(x_j)||², f(x_i) denotes the first-level feature vector of image i, and f(x_j) denotes the first-level feature vector of image j.
Further preferably, the retrieval module includes:
the calculation unit is used for calculating a first Euclidean distance between a first-level feature vector of the image to be retrieved and a first-level feature vector of the image in the database; when a plurality of images similar to the image to be retrieved exist in the database, further calculating a second Euclidean distance between a second-level feature vector of the image to be retrieved and a second-level feature vector of the similar image in the database;
the comparison unit is used for comparing the first Euclidean distances calculated by the calculation unit to obtain an image similar to the image to be retrieved in the database, and for comparing the calculated second Euclidean distances to obtain the image in the database that is most similar to the image to be retrieved.
Further preferably, the neural network model is a binary neural network, and in the retrieval module, the hamming distance between the first-level feature vector and the image in the database is calculated.
In the image retrieval method and device of the invention, a symmetric neural network model is constructed, and through training the feature vector of an image is divided into a first-level feature vector and a second-level feature vector, where the first-level feature vector consists of the features two similar images share and the second-level feature vector consists of the features that distinguish an image from the other. Therefore, in retrieving an image, similar images can be found by calculating the distance between first-level feature vectors and, where necessary, between second-level feature vectors; the matching speed in the image search process is greatly accelerated, the search precision is improved, and very good performance is obtained without relying on manual design based on the experience of technicians.
Drawings
A more complete understanding of the present invention, and the attendant advantages and features thereof, will be more readily understood by reference to the following detailed description when considered in conjunction with the accompanying drawings wherein:
FIG. 1 is a schematic flow chart of an image retrieval method according to the present invention;
FIG. 2 is a schematic diagram of a neural network model constructed in the present invention;
FIG. 3 is a schematic diagram of an image retrieving device according to the present invention.
Reference numerals:
100-image retrieval device, 110-network model construction module, 120-network model training module, 130-neural network model and 140-retrieval module.
Detailed Description
In order to make the contents of the present invention more comprehensible, the present invention is further described below with reference to the accompanying drawings. The invention is of course not limited to this particular embodiment, and general alternatives known to those skilled in the art are also covered by the scope of the invention.
As shown in fig. 1, which is a schematic flow chart of the image retrieval method provided by the present invention, it can be seen that the image retrieval method includes:
s10 a neural network model is built, the neural network model comprises an input layer located at the data input side of the model, a characteristic output layer located in the middle of the model and a model output layer located at the output side of the model, the neural network model is symmetrically arranged along the characteristic output layer located in the middle, the number of neurons from the input layer to the characteristic output layer is gradually reduced, and the number of neurons from the characteristic output layer to the model output layer is gradually increased;
s20, training the neural network model for the first time by using a training set, recovering and obtaining the image input by the input layer on the model output layer, updating the weight connected with each layer of neural network and obtaining the feature vector of the image output from the feature output layer;
s30 randomly selecting two similar images from the training set as input, carrying out second training on the neural network model by using an adjacent component minimization method and a back propagation algorithm, and dividing the feature vector of the image into a first-level feature vector and a second-level feature vector, wherein the first-level feature vector consists of the same features of the two images, and the second-level feature vector consists of features different from the other image, so as to complete the training of the neural network model;
s40 reserving the part from the input layer to the feature output layer in the neural network model as the neural network model for image retrieval, and inputting the image to be retrieved to be inquired into the neural network model to obtain the first-stage feature vector and the second-stage feature vector of the image to be retrieved;
s50, finding out the image similar to the image to be searched in the database by calculating the distance between the first-stage characteristic vector of the image to be searched and the image in the database.
Fig. 2 shows the constructed neural network model, which includes an input layer, first hidden layers, a feature output layer, second hidden layers and a model output layer. The layers are arranged symmetrically about the feature output layer located in the middle: the input layer corresponds to the model output layer, and each of the first hidden layers corresponds to one of the second hidden layers, with corresponding layers having the same number of neuron nodes. For example, in one example the first hidden layers comprise 3 hidden layers, the numbers of neurons in the input layer and these 3 hidden layers are, in order, [3072, 500, 400, 300], the feature output layer has 100 neurons, and the numbers of neurons in the corresponding second hidden layers (also 3 hidden layers) and the model output layer are [300, 400, 500, 3072].
In constructing the neural network model, the image is first processed to reduce its resolution to a smaller size, e.g., 32×32. For a color image with three components, YUV (or RGB), encoded with 8-bit pixels, a 32×32 image then consists of 3072 integers in [0, 255]. This 3072-dimensional vector is divided by 255 to normalize it to real numbers in [0, 1] and is fed into the neural network model as the states of the input-layer neurons; the neuron states of the hidden layers are then calculated in turn according to formula (1):
y = σ(Wx + b)   (1)
where the vector x represents the states of the input-layer neurons, W represents the weight matrix from the input layer to the first of the first hidden layers (there may be several hidden layers), b represents the bias values of that hidden layer, y represents the states of the first hidden layer, and σ(·) denotes the neuron activation function (for example, a sigmoid). After the state values of the first hidden layer are obtained, the states of the second layer, the third layer and so on within the first hidden layers, then the feature output layer, the second hidden layers and the model output layer are calculated in turn in the same way. The feature output layer contains N neurons, and the N real numbers they output form the N-dimensional feature vector of the image. It should be noted that the second hidden layers and the model output layer are only used while training the neural network; after training is completed, they may be removed from the neural network model, and the input layer, the first hidden layers and the feature output layer are retained for retrieving images.
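As a concrete illustration of this layer-by-layer computation, here is a minimal NumPy sketch of the symmetric network, assuming the example layer sizes [3072, 500, 400, 300, 100, 300, 400, 500, 3072] and a sigmoid activation; random weights and a random stand-in image are used, so it only illustrates the structure, not the patented implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Example layer sizes from the text: input, first hidden layers,
# feature output layer (100 neurons), second hidden layers, model output layer.
sizes = [3072, 500, 400, 300, 100, 300, 400, 500, 3072]
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.01, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    """Propagate a normalized 3072-dim image vector through every layer.

    Returns the list of layer states; states[4] is the 100-neuron feature
    output layer and states[-1] is the model output layer."""
    states = [x]
    for W, b in zip(weights, biases):
        states.append(sigmoid(W @ states[-1] + b))
    return states

# Preprocessing as described: a 32x32 three-component image flattened to
# 3072 integers in [0, 255], then divided by 255.
image = rng.integers(0, 256, size=3072) / 255.0  # stand-in for a real image
feature_vector = forward(image)[4]               # the N = 100 feature values
```

After training, only the input layer, the first hidden layers and the feature output layer (states[0] to states[4] here) would be kept for retrieval, as described above.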
In the process of training the constructed neural network model, firstly, a training set is used for carrying out first training on the neural network model:
assume that the training set contains a total of Q images, each image corresponding to [0,1] of a particular dimension (e.g., 3072)]The real number vector between, noted as { xiTraining a neural network model according to an objective function O shown in a formula (2), updating weights connected with each layer of neural network, and outputting a feature vector of an obtained image from a feature output layer;
O = min Σ_{i=1..Q} ||y_i - x_i||²   (2)
where y_i is the output vector of the model (the output of the model output layer) and x_i is the input vector corresponding to the image fed into the model (the input of the input layer).
As can be seen from the objective function O, the aim of the first training is that the output of the model output layer be as close as possible to the input of the input layer, i.e. the image fed to the input layer is reconstructed at the model output layer. If the objective function O is driven to zero during the first training, the model output layer can reproduce the image input to the neural network model; in other words, the feature vector (N features) output by the feature output layer contains all the information of the image. Other objective functions may be chosen according to the actual situation as long as the purpose of the invention is achieved; for example, a cross-entropy function may also be used.
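A minimal sketch of this first-training objective, continuing the NumPy example above and assuming O is the summed squared reconstruction error between the model output y_i and the input x_i; a real implementation would minimize it by back-propagation rather than merely evaluate it:

```python
def reconstruction_objective(batch):
    """Sum over the batch of ||y_i - x_i||^2, with y_i the model output layer state."""
    total = 0.0
    for x in batch:
        y = forward(x)[-1]                    # model output layer state
        total += float(np.sum((y - x) ** 2))
    return total

Q = 8
batch = rng.integers(0, 256, size=(Q, 3072)) / 255.0  # stand-in training images
print(reconstruction_objective(batch))
```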
After the first training of the neural network model, the model is trained a second time using the adjacent component minimization method (based on neighbourhood component analysis, NCA) and the back propagation algorithm, so that similar images have first-level feature vectors that are as identical as possible. Assume that all Q images in the training set carry a class label, i.e. {x_i, c(x_i)}, where c(x_i) ∈ {1, 2, ..., C} denotes the class to which the image belongs, and images with the same label are considered similar. In addition, the first K of the N real numbers output by the feature output layer are defined as the first-level feature vector, denoted f(x_i) = f_k(x_i), k = 1, ..., K.
During this training, the back propagation algorithm is used: back propagation starts from the model output layer with the objective function O1 of formula (3), the weights of each layer of the neural network are updated, and the objective function O1 is minimized by adjusting the weights;
O1 = min Σ_{i=1..Q} ||y_i - x_i||²   (3)
When propagation reaches the feature output layer, the gradient information of the objective function O2 given by formulas (4) and (5) is superimposed and the propagation continues; in this process, λO_NCA (which involves only the first K neurons of the feature output layer) is maximized by adjusting the network weights;
O2 = max λO_NCA   (4)
O_NCA = Σ_{i=1..Q} Σ_{j: c(x_j)=c(x_i)} p_ij   (5)
where λ is a weight parameter that balances the objective functions O1 and O2, λ ∈ (0, 1); x_i is the vector representing image i, x_j is the vector representing image j, c(x_i) denotes the class to which image i belongs, c(x_j) denotes the class to which image j belongs, and p_ij, the probability that image i takes image j as a neighbour, is given by formula (6):
p_ij = exp(-d(i, j)) / Σ_{q≠i} exp(-d(i, q))   (6)
where image q is any image in the database other than image i, p_ii = 0, and d(i, j) denotes the Euclidean distance between the first-level feature vectors of images i and j, d(i, j) = d(x_i, x_j) = ||f(x_i) - f(x_j)||², with f(x_i) the first-level feature vector of image i and f(x_j) the first-level feature vector of image j.
The probability that image i belongs to class c is then given by formula (7):
p(c(x_i) = c) = Σ_{j: c(x_j)=c} p_ij   (7)
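Continuing the NumPy sketch above (reusing forward, rng, batch and Q from it), the neighbour probabilities p_ij of formula (6) and the quantity λO_NCA maximized during the second training can be illustrated as follows; K = 50, λ = 0.5 and the random class labels are placeholder assumptions:

```python
def first_level_features(batch, K=50):
    """First K values of the feature output layer for each image in the batch."""
    return np.stack([forward(x)[4][:K] for x in batch])

def nca_objective(batch, labels, K=50, lam=0.5):
    """lambda * O_NCA = lambda * sum_i sum_{j: c(x_j)=c(x_i)} p_ij, formulas (5) and (6)."""
    f = first_level_features(batch, K)
    d = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)  # d(i, j) = ||f(x_i)-f(x_j)||^2
    expd = np.exp(-d)
    np.fill_diagonal(expd, 0.0)                  # p_ii = 0
    p = expd / expd.sum(axis=1, keepdims=True)   # p_ij, formula (6)
    same = labels[:, None] == labels[None, :]    # pairs with c(x_j) = c(x_i)
    return lam * float(np.sum(p[same]))          # formula (5), scaled by lambda

labels = rng.integers(0, 3, size=len(batch))     # stand-in class labels c(x_i)
print(nca_objective(batch, labels))
```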
the training of the neural network model is completed in this way, and N real numbers output by the feature output layer after the second training is completed are divided into two stages, wherein the first-stage feature vector contains K real numbers, the K feature real numbers of two similar images are identical under an ideal condition, and the Euclidean distance between the first-stage feature vectors of the two images is calculated to compare in practical application; the second-level feature vector contains N-K real numbers, which are the features of the image itself (features different from other images in the similar set). Then, reserving a part from an input layer to a feature output layer in the neural network model as the neural network model for image retrieval, and when an image which is the same as or similar to the image to be retrieved needs to be queried in a database, inputting the image to be retrieved which needs to be queried into the neural network model to obtain a first-stage feature vector and a second-stage feature vector of the image to be retrieved; then, calculating the Euclidean distance between the first-stage feature vector of the image to be retrieved and the first-stage feature vector of the image in the database; and selecting the image with the minimum Euclidean distance as a similar image according to the calculation result, and completing the retrieval of the image. In practical application, the Euclidean distance between the first-level feature vector of the image to be retrieved and the first-level feature vector of the image in the database is calculated and compared to possibly obtain a large number of similar image sets, at the moment, the Euclidean distance between the second-level feature vector of the image to be retrieved and the second-level feature vector of the similar image in the database is further calculated, and the image which is most similar to the image to be retrieved in the database can be obtained through comparison. In another example, the neural network model is a binary neural network, and in step S50, the hamming distance between the first-level feature vector and the image in the database is calculated, thereby completing the data retrieval.
As shown in fig. 3, which is a schematic structural diagram of the image retrieval apparatus provided by the present invention, it can be seen that the image retrieval apparatus 100 includes: the neural network model comprises a network model building module 110, a network model training module 120, a neural network model 130 and a retrieval module 140, wherein the network model building module 110 is used for building the neural network model 130 and comprises an input layer positioned at the input side of model data, a characteristic output layer positioned in the middle of the model and a model output layer positioned at the output side of the model, the neural network model 130 is symmetrically arranged along the characteristic output layer positioned in the middle, the number of neurons from the input layer to the characteristic output layer is gradually reduced, and the number of neurons from the characteristic output layer to the model output layer is gradually increased; the network model training module 120 is configured to perform first training on the neural network model constructed by the network model construction module 110 by using a training set, recover and obtain an image input by the input layer on the model output layer, and update weights connected to each layer of the neural network and feature vectors of the image output from the feature output layer; randomly selecting two similar images from a training set as input, carrying out second training on the neural network model by using an adjacent component minimization method and a back propagation algorithm, and dividing the feature vector of the image into a first-stage feature vector and a second-stage feature vector, wherein the first-stage feature vector consists of the same features of the two images, and the second-stage feature vector consists of features different from the other image, so as to finish the training of the neural network model; the neural network model 130 is used for obtaining a first-level feature vector and a second-level feature vector of the image to be retrieved according to the input image to be retrieved; the retrieval module 140 is configured to search for an image similar to the image to be retrieved in the database by calculating a distance between the first-level feature vector of the image to be retrieved output by the neural network model 130 and the image in the database.
As shown in fig. 2, the neural network model 130 constructed by the network model construction module 110 includes an input layer, first hidden layers, a feature output layer, second hidden layers and a model output layer, arranged symmetrically about the feature output layer located in the middle: the input layer corresponds to the model output layer, and each of the first hidden layers corresponds to one of the second hidden layers, with corresponding layers having the same number of neuron nodes. For example, in one example the first hidden layers comprise 3 hidden layers, the numbers of neurons in the input layer and these 3 hidden layers are, in order, [3072, 500, 400, 300], the feature output layer has 100 neurons, and the numbers of neurons in the corresponding second hidden layers (also 3 hidden layers) and the model output layer are [300, 400, 500, 3072].
In constructing the neural network model 130, the network model building module 110 first processes the image to reduce its resolution to a smaller size, e.g., 32×32. For a color image with three components, YUV (or RGB), encoded with 8-bit pixels, a 32×32 image then consists of 3072 integers in [0, 255]. This 3072-dimensional vector is divided by 255 to normalize it to real numbers in [0, 1] and is fed into the neural network model 130 as the states of the input-layer neurons; the neuron states of the hidden layers are then calculated in turn according to formula (1):
y = σ(Wx + b)   (1)
where the vector x represents the states of the input-layer neurons, W represents the weight matrix from the input layer to the first of the first hidden layers (there may be several hidden layers), b represents the bias values of that hidden layer, y represents the states of the first hidden layer, and σ(·) denotes the neuron activation function (for example, a sigmoid). After the state values of the first hidden layer are obtained, the states of the second layer, the third layer and so on within the first hidden layers, then the feature output layer, the second hidden layers and the model output layer are calculated in turn in the same way. The feature output layer contains N neurons, and the N real numbers they output form the N-dimensional feature vector of the image. It should be noted that the second hidden layers and the model output layer are only used while training the neural network; after training is completed, they may be removed from the neural network model, and the input layer, the first hidden layers and the feature output layer are retained for retrieving images.
In the process of training the constructed neural network model 130 by using the network model training module 120, firstly, the neural network model is trained for the first time by using a training set:
assume that the training set contains a total of Q images, each image corresponding to [0,1] of a particular dimension (e.g., 3072)]The real number vector between, noted as { xiQ, and according to an objective function
Figure GDA0003155760870000112
Training the neural network model, updating the weight of each layer of neural network connection and obtaining the feature vector of the image output from the feature output layer, wherein yiIs the output vector of the model (corresponding to the output of the output layer of the model), xiIs the input vector corresponding to the image of the input model (corresponding to the input of the input layer).
As can be seen from the objective function O, the aim of the first training is that the output of the model output layer be as close as possible to the input of the input layer, i.e. the image fed to the input layer is reconstructed at the model output layer. If the objective function O is driven to zero during the first training, the model output layer can reproduce the image input to the neural network model 130; in other words, the feature vector (N features) output by the feature output layer contains all the information of the image. Other objective functions may be chosen according to the actual situation as long as the purpose of the invention is achieved; for example, a cross-entropy function may also be used.
After the first training of the neural network model 130, the model is trained a second time using the neighbor component minimization method and the back propagation algorithm, so that similar images have first-level feature vectors that are as identical as possible. Assume that all Q images in the training set carry a class label, i.e. {x_i, c(x_i)}, where c(x_i) ∈ {1, 2, ..., C} denotes the class to which the image belongs, and images with the same label are considered similar. In addition, the first K of the N real numbers output by the feature output layer are defined as the first-level feature vector, denoted f(x_i) = f_k(x_i), k = 1, ..., K.
During this training, the back propagation algorithm is used: back propagation starts from the model output layer with the objective function

O1 = min Σ_{i=1..Q} ||y_i - x_i||²

the weights of each layer of the neural network are updated, and the objective function O1 is minimized by adjusting the weights. When propagation reaches the feature output layer, the gradient information of O2 = max λO_NCA is superimposed and the propagation continues; in this process, λO_NCA (which involves only the first K neurons of the feature output layer) is maximized by adjusting the network weights. Specifically,

O_NCA = Σ_{i=1..Q} Σ_{j: c(x_j)=c(x_i)} p_ij
where λ is a weight parameter that balances the objective functions O1 and O2, λ ∈ (0, 1); x_i is the vector representing image i, x_j is the vector representing image j, c(x_i) denotes the class to which image i belongs, c(x_j) denotes the class to which image j belongs, and p_ij is the probability that image i takes image j as a neighbour,
p_ij = exp(-d(i, j)) / Σ_{q≠i} exp(-d(i, q))
where image q is any image in the database other than image i, p_ii = 0, and d(i, j) denotes the Euclidean distance between the first-level feature vectors of images i and j, d(i, j) = d(x_i, x_j) = ||f(x_i) - f(x_j)||², with f(x_i) the first-level feature vector of image i and f(x_j) the first-level feature vector of image j. In addition, the probability that image i belongs to class c is determined as

p(c(x_i) = c) = Σ_{j: c(x_j)=c} p_ij
Thus, the training of the neural network model 130 is completed. After the second training, the N real numbers output by the feature output layer are divided into two levels: the first-level feature vector contains K real numbers, and in the ideal case the K feature values of two similar images are identical, so in practice two images are compared by calculating the Euclidean distance between their first-level feature vectors; the second-level feature vector contains the remaining N-K real numbers, which are features of the image itself (features that distinguish it from the other images in the similar set). Then the part of the neural network model 130 from the input layer to the feature output layer is retained as the neural network model 130 used for image retrieval. When an image identical or similar to an image to be retrieved needs to be queried in the database through the retrieval module 140, the image to be retrieved is input into the neural network model 130 to obtain its first-level and second-level feature vectors; the calculation unit then calculates the Euclidean distance between the first-level feature vector of the image to be retrieved and the first-level feature vectors of the images in the database, and the comparison unit selects the image with the smallest Euclidean distance as the similar image, completing the retrieval. In practice, comparing the first-level feature vectors in this way may yield a large set of similar images; in that case the calculation unit further calculates the Euclidean distance between the second-level feature vector of the image to be retrieved and the second-level feature vectors of the similar images in the database, and the comparison unit obtains by comparison the image in the database that is most similar to the image to be retrieved. In another example, the neural network model 130 is a binary neural network, and in the retrieval module 140 the Hamming distance between the first-level feature vectors of the image to be retrieved and of the images in the database is calculated, completing the retrieval.
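Continuing the NumPy sketch above (reusing forward and rng), the two-stage retrieval just described can be illustrated as follows: candidates are ranked by the Euclidean distance between first-level feature vectors, and when several candidates are similarly close the second-level feature vectors break the tie. The in-memory database, K = 50 and the tie tolerance are illustrative assumptions; a binary-network variant would compare bit vectors with the Hamming distance instead:

```python
def split_features(x, K=50):
    """Return the (first-level, second-level) parts of the feature output layer."""
    v = forward(x)[4]
    return v[:K], v[K:]

def retrieve(query, database_images, K=50, tie_tol=1e-6):
    """Index of the database image most similar to the query image."""
    q1, q2 = split_features(query, K)
    feats = [split_features(x, K) for x in database_images]
    d1 = np.array([np.sum((q1 - f1) ** 2) for f1, _ in feats])   # first-level distances
    candidates = np.flatnonzero(d1 <= d1.min() + tie_tol)        # several similar images?
    if len(candidates) == 1:
        return int(candidates[0])
    # Break the tie with the second-level (image-specific) feature vectors.
    d2 = np.array([np.sum((q2 - feats[i][1]) ** 2) for i in candidates])
    return int(candidates[np.argmin(d2)])

database = rng.integers(0, 256, size=(20, 3072)) / 255.0  # stand-in image database
print(retrieve(database[3], database))
```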

Claims (6)

1. An image retrieval method, comprising:
s10, constructing a neural network model, which comprises an input layer positioned at the data input side of the model, a characteristic output layer positioned in the middle of the model and a model output layer positioned at the output side of the model, wherein the neural network model is symmetrically arranged along the characteristic output layer positioned in the middle, the number of neurons from the input layer to the characteristic output layer is gradually reduced, and the number of neurons from the characteristic output layer to the model output layer is gradually increased;
s20, training the neural network model for the first time by using a training set, recovering and obtaining an image input by an input layer on a model output layer, and updating the weight connected with each layer of neural network and the feature vector of the image output from a feature output layer;
s30 randomly selecting two similar images from a training set as input, carrying out secondary training on the neural network model by using an adjacent component minimization method and a back propagation algorithm, and dividing the feature vector of the image into a first-level feature vector and a second-level feature vector, wherein the first-level feature vector consists of the same features of the two images, and the second-level feature vector consists of features different from the other image, so as to complete the training of the neural network model;
s40 reserving the part from the input layer to the feature output layer in the neural network model as the neural network model for image retrieval, and inputting the image to be retrieved to be inquired into the neural network model to obtain the first-stage feature vector and the second-stage feature vector of the image to be retrieved;
s50, searching an image similar to the image to be retrieved in the database by calculating the distance between the first-stage feature vector of the image to be retrieved and the image in the database;
in step S20, training the neural network model for the first time according to the objective function O, updating weights of connections of each layer of neural network, and obtaining feature vectors of the image output from the feature output layer;
O = min Σ_{i=1..Q} ||y_i - x_i||²
wherein y_i is the output vector of the model, x_i is the input vector corresponding to the image input to the model, and Q is the number of images in the training set;
in step S30, the objective function O1 is propagated backward from the model output layer to update the weights of each layer of the neural network; when the backward propagation reaches the feature output layer, the gradient information of the objective function O2 is superimposed and the propagation continues;
O1 = min Σ_{i=1..Q} ||y_i - x_i||²
wherein λ is a weight parameter, and λ ∈ (0, 1);
O2 = max λO_NCA
O_NCA = Σ_{i=1..Q} Σ_{j: c(x_j)=c(x_i)} p_ij
wherein x_i denotes the vector representing image i, x_j denotes the vector representing image j, c(x_i) denotes the class to which image i belongs, c(x_j) denotes the class to which image j belongs, and p_ij denotes the probability that image i takes image j as a neighbour,
p_ij = exp(-d(i, j)) / Σ_{q≠i} exp(-d(i, q))
the image q is an image in the database other than image i, d(i, j) denotes the Euclidean distance between the first-level feature vectors of image i and image j, d(i, j) = d(x_i, x_j) = ||f(x_i) - f(x_j)||², f(x_i) denotes the first-level feature vector of image i, and f(x_j) denotes the first-level feature vector of image j.
2. The image retrieval method according to claim 1, wherein in step S50, comprising:
s51, calculating the Euclidean distance between the first-level feature vector of the image to be retrieved and the first-level feature vector of the image in the database;
s52, comparing the calculation results to obtain an image similar to the image to be retrieved in the database;
s53 judges whether or not there are a plurality of images similar to the image to be retrieved in the database, and if so,
s54, calculating the Euclidean distance between the second-level feature vector of the image to be retrieved and the second-level feature vector of the similar image in the database;
s55, comparing the calculation results to obtain the image most similar to the image to be searched in the database.
3. The image retrieval method of claim 1, wherein the neural network model is a binary neural network, and in step S50, a hamming distance between the first-level feature vector and the image in the database is calculated.
4. An image retrieval apparatus, comprising:
the network model building module is used for building a neural network model and comprises an input layer positioned at the data input side of the model, a characteristic output layer positioned in the middle of the model and a model output layer positioned at the output side of the model, wherein the neural network model is symmetrically arranged along the characteristic output layer positioned in the middle, the number of neurons from the input layer to the characteristic output layer is gradually reduced, and the number of neurons from the characteristic output layer to the model output layer is gradually increased;
the network model training module is used for carrying out first training on the neural network model constructed by the network model construction module by using a training set, recovering and obtaining an image input by an input layer on a model output layer, and updating the weight connected with each layer of neural network and the feature vector of the image output from a feature output layer; randomly selecting two similar images from a training set as input, carrying out second training on the neural network model by using an adjacent component minimization device and a back propagation algorithm, and dividing the feature vector of the image into a first-stage feature vector and a second-stage feature vector, wherein the first-stage feature vector consists of the same features of the two images, and the second-stage feature vector consists of features different from the other image, so as to finish the training of the neural network model;
the neural network model is used for outputting and obtaining a first-stage feature vector and a second-stage feature vector of the image to be retrieved according to the input image to be retrieved;
the retrieval module is used for searching images similar to the images to be retrieved in the database by calculating the distance between the first-stage characteristic vector of the images to be retrieved output by the neural network model and the images in the database;
in a network model training module, performing first training on the neural network model according to an objective function O, updating weights connected with each layer of neural network and outputting feature vectors of the obtained image from a feature output layer;
O = min Σ_{i=1..Q} ||y_i - x_i||²
wherein y_i is the output vector of the model, x_i is the input vector corresponding to the image input to the model, and Q is the number of images in the training set;
in the process of the second training of the neural network model by the network model training module, the objective function O1 is propagated backward from the model output layer to update the weights of each layer of the neural network; when the backward propagation reaches the feature output layer, the gradient information of the objective function O2 is superimposed and the propagation continues;
O1 = min Σ_{i=1..Q} ||y_i - x_i||²
wherein λ is a weight parameter, and λ ∈ (0, 1);
O2 = max λO_NCA
O_NCA = Σ_{i=1..Q} Σ_{j: c(x_j)=c(x_i)} p_ij
wherein x_i denotes the vector representing image i, x_j denotes the vector representing image j, c(x_i) denotes the class to which image i belongs, c(x_j) denotes the class to which image j belongs, and p_ij denotes the probability that image i takes image j as a neighbour,
p_ij = exp(-d(i, j)) / Σ_{q≠i} exp(-d(i, q))
the image q is an image in the database other than image i, d(i, j) denotes the Euclidean distance between the first-level feature vectors of image i and image j, d(i, j) = d(x_i, x_j) = ||f(x_i) - f(x_j)||², f(x_i) denotes the first-level feature vector of image i, and f(x_j) denotes the first-level feature vector of image j.
5. The image retrieval apparatus according to claim 4, wherein in the retrieval module, comprising:
the calculation unit is used for calculating a first Euclidean distance between a first-level feature vector of the image to be retrieved and a first-level feature vector of the image in the database; when a plurality of images similar to the image to be retrieved exist in the database, further calculating a second Euclidean distance between a second-level feature vector of the image to be retrieved and a second-level feature vector of the similar image in the database;
the comparison unit is used for comparing the first Euclidean distances calculated by the calculation unit to obtain an image similar to the image to be retrieved in the database, and for comparing the calculated second Euclidean distances to obtain the image in the database that is most similar to the image to be retrieved.
6. The image retrieval device of claim 4, wherein the neural network model is a binary neural network, and in the retrieval module, a Hamming distance between the first-level feature vector and the image in the database is calculated.
CN201811533583.9A 2018-12-14 2018-12-14 Image retrieval method and device Active CN109299306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811533583.9A CN109299306B (en) 2018-12-14 2018-12-14 Image retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811533583.9A CN109299306B (en) 2018-12-14 2018-12-14 Image retrieval method and device

Publications (2)

Publication Number Publication Date
CN109299306A CN109299306A (en) 2019-02-01
CN109299306B (en) 2021-09-07

Family

ID=65141727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811533583.9A Active CN109299306B (en) 2018-12-14 2018-12-14 Image retrieval method and device

Country Status (1)

Country Link
CN (1) CN109299306B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111640051B (en) * 2019-03-01 2023-07-25 浙江大学 Image processing method and device
CN113780304B (en) * 2021-08-09 2023-12-05 国网安徽省电力有限公司超高压分公司 Substation equipment image retrieval method and system based on neural network
CN113792339A (en) * 2021-09-09 2021-12-14 浙江数秦科技有限公司 Bidirectional privacy secret neural network model sharing method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186538A (en) * 2011-12-27 2013-07-03 阿里巴巴集团控股有限公司 Image classification method, image classification device, image retrieval method and image retrieval device
CN106529395A (en) * 2016-09-22 2017-03-22 文创智慧科技(武汉)有限公司 Signature image recognition method based on deep brief network and k-means clustering
CN107463932A (en) * 2017-07-13 2017-12-12 央视国际网络无锡有限公司 A kind of method that picture feature is extracted using binary system bottleneck neutral net
WO2018170671A1 (en) * 2017-03-20 2018-09-27 Intel Corporation Topic-guided model for image captioning system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186538A (en) * 2011-12-27 2013-07-03 阿里巴巴集团控股有限公司 Image classification method, image classification device, image retrieval method and image retrieval device
CN106529395A (en) * 2016-09-22 2017-03-22 文创智慧科技(武汉)有限公司 Signature image recognition method based on deep brief network and k-means clustering
WO2018170671A1 (en) * 2017-03-20 2018-09-27 Intel Corporation Topic-guided model for image captioning system
CN107463932A (en) * 2017-07-13 2017-12-12 央视国际网络无锡有限公司 A kind of method that picture feature is extracted using binary system bottleneck neutral net

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hakan Çevikalp et al. "Feature extraction with convolutional neural networks for aerial image retrieval." 2017 25th Signal Processing and Communications Applications Conference (SIU), 2017. *

Also Published As

Publication number Publication date
CN109299306A (en) 2019-02-01

Similar Documents

Publication Publication Date Title
CN109299306B (en) Image retrieval method and device
CN109740541B (en) Pedestrian re-identification system and method
CN110852168A (en) Pedestrian re-recognition model construction method and device based on neural framework search
CN107463932B (en) Method for extracting picture features by using binary bottleneck neural network
CN110738146A (en) target re-recognition neural network and construction method and application thereof
CN107315795B (en) The instance of video search method and system of joint particular persons and scene
CN111738048B (en) Pedestrian re-identification method
US9842279B2 (en) Data processing method for learning discriminator, and data processing apparatus therefor
US20220253977A1 (en) Method and device of super-resolution reconstruction, computer device and storage medium
CN108763295A (en) A kind of video approximate copy searching algorithm based on deep learning
CN109472282B (en) Depth image hashing method based on few training samples
JP2020173562A (en) Objection recognition system and objection recognition method
CN109255043B (en) Image retrieval method based on scene understanding
CN113011444B (en) Image identification method based on neural network frequency domain attention mechanism
JP5004743B2 (en) Data processing device
JP2004086737A (en) Method and device for similarity determination and program
KR100671099B1 (en) Method for comparing similarity of two images and method and apparatus for searching images using the same
CN111079585A (en) Image enhancement and pseudo-twin convolution neural network combined pedestrian re-identification method based on deep learning
JP6220737B2 (en) Subject area extraction apparatus, method, and program
CN116416649A (en) Video pedestrian re-identification method based on multi-scale resolution alignment
CN115641449A (en) Target tracking method for robot vision
CN109711454A (en) A kind of feature matching method based on convolutional neural networks
Amiri et al. Copy-move forgery detection using a bat algorithm with mutation
CN115457269A (en) Semantic segmentation method based on improved DenseNAS
Phookronghin et al. 2 Level simplified fuzzy ARTMAP for grape leaf disease system using color imagery and gray level co-occurrence matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant