CN114898104A - Hash method and device for image features and processing equipment - Google Patents
- Publication number: CN114898104A
- Application number: CN202210813030.9A
- Authority
- CN
- China
- Prior art keywords
- hash
- feature
- image
- network
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a hashing method and apparatus for image features, and a processing device, which strengthen the spatial correlation among the feature maps fed into the hash layer, so that the hash codes obtained by subsequent hash coding have markedly improved precision. The method comprises the following steps: the processing device acquires an image to be processed; the processing device inputs the image into a deep hash network comprising a feature extraction module, a long-term dependency module and a hash layer; during operation of the deep hash network, the feature extraction module performs feature extraction on the input image to obtain feature maps as image features; the long-term dependency module treats the feature maps output by the feature extraction module as a time sequence so as to detect the spatial correlation among them and obtain enhanced feature maps; the hash layer hash-codes the enhanced feature maps output by the long-term dependency module to obtain a hash code; and the processing device extracts the hash code output by the deep hash network.
Description
Technical Field
The present application relates to the field of image processing, and in particular to a hashing method and apparatus for image features, and a processing device.
Background
The hash method projects high-dimensional features into compact binary codes through a hash function, so that a database can store more data while retrieval efficiency is improved. Compared with traditional methods, a deep hash method can effectively learn a high-quality nonlinear hash function while extracting deep features, so that the learned hash function encodes the extracted deep features more effectively and the generated hash codes better represent the image features.
A deep hash method mainly comprises two steps: feature extraction and hash coding. Feature extraction extracts useful image features from the input image to form feature maps, and hash coding converts the extracted feature information into a hash code through the hash function.
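The two steps can be illustrated with a deliberately simple, non-learned stand-in for the hash function — a random-hyperplane projection rather than the deep network of this application. The sign step plays the role of hash coding, and Hamming distance then gives a cheap comparison between stored codes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for step 1 (feature extraction): pretend these 2048-dim
# vectors are deep features of three images; the first two are
# near-duplicates of each other, the third is unrelated.
base = rng.normal(size=2048)
features = np.stack([base,
                     base + 0.01 * rng.normal(size=2048),
                     rng.normal(size=2048)])

# Stand-in for step 2 (hash coding): a fixed random projection
# followed by sign() maps each feature vector to a K-bit binary code.
K = 64
projection = rng.normal(size=(2048, K))
codes = (features @ projection > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

# Similar images land on nearby codes; dissimilar images do not.
d_similar = hamming(codes[0], codes[1])
d_dissimilar = hamming(codes[0], codes[2])
print(d_similar, d_dissimilar)
```

A learned deep hash function replaces the random projection with a trained network, which is precisely what lets similar images share codes more reliably than this sketch.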
During research into the related technologies, the inventors found that existing deep hash networks suffer from limited feature-extraction precision, which causes low precision in the hash codes obtained by subsequent hash coding.
Disclosure of Invention
The application provides a hashing method and apparatus for image features, and a processing device, which strengthen the spatial correlation among the feature maps fed into the hash layer, so that the hash codes obtained by subsequent hash coding have markedly improved precision.
In a first aspect, the present application provides a hashing method for image features, including:
the processing equipment acquires an image to be processed;
the processing device inputs the image to be processed into a deep hash network, wherein the deep hash network comprises a feature extraction module, a long-term dependency module and a hash layer; during operation of the deep hash network, the feature extraction module performs feature extraction on the input image to obtain feature maps as image features; the long-term dependency module treats the feature maps output by the feature extraction module as a time sequence so as to detect the spatial correlation among them and obtain enhanced feature maps; and the hash layer hash-codes the enhanced feature maps output by the long-term dependency module to obtain a hash code;
the processing device extracts the hash code output by the deep hash network.
With reference to the first aspect of the present application, in a first possible implementation manner of the first aspect of the present application, the long-term dependency module specifically employs the network structure of a Gated Recurrent Unit (GRU).
With reference to the first aspect of the present application, in a second possible implementation manner of the first aspect of the present application, the hash layer specifically uses a network structure of three fully-connected layers to implement hash coding: the first two layers use a ReLU activation function, the last layer uses a hyperbolic tangent activation function, and the output of the last layer is the hash code.
With reference to the first aspect of the present application, in a third possible implementation manner of the first aspect of the present application, the feature extraction module specifically uses the network structure of a ResNet50 network: it is configured with some 50 two-dimensional convolution layer operations organized into an initial convolution process and four residual blocks; a batch normalization layer follows each part to accelerate training, rectifier units avoid vanishing gradients, a max-pooling layer performs downsampling, and the output of the last residual block is average-pooled.
With reference to the first aspect of the present application, in a fourth possible implementation manner of the first aspect of the present application, the method further includes:
the processing device performs network training on the deep hash network with sample images, adopting a central similarity loss function L_C, a pairwise similarity loss function L_P and a quantization loss function L_Q during training;
the central similarity loss function L_C quantifies the Hamming distance between each hash code and its corresponding hash center, so as to maintain center-similarity learning;
the pairwise similarity loss function L_P quantifies, using the correlations between labels, the Hamming distance between the hash codes of data that share only some of their labels in a multi-label dataset, so as to keep the hash codes of similarly-labelled data pairs as close as possible;
the quantization loss function L_Q drives each element of the generated code toward a binary value with maximum probability, so as to control the quality of the generated hash code.
With reference to the fourth possible implementation manner of the first aspect of the present application, in a fifth possible implementation manner of the first aspect of the present application, the central similarity loss function L_C is defined over the following quantities: N, the number of samples; K, the length of the binary hash code; the hash code of each sample; and the semantic hash center of that sample;
the pairwise similarity loss function L_P is defined over a hyper-parameter, the hash-code length, a max(·, 0) operation that keeps its value non-negative, and the label-vector encoding of each sample;
the quantization loss function L_Q is defined over the elements of each generated hash code.
With reference to the fourth possible implementation manner of the first aspect of the present application, in a sixth possible implementation manner of the first aspect of the present application, the central similarity loss function L_C, the pairwise similarity loss function L_P and the quantization loss function L_Q are specifically combined into a joint loss function for training, which is defined over the set of all parameters learned by the deep hash function and two hyper-parameters obtained by grid search.
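The joint objective can be sketched in standard form — under the assumption (drawn from the description of the grid-searched hyper-parameters above, not from a recovered formula) that the two hyper-parameters weight the pairwise and quantization terms:

```latex
\min_{\Theta} \; L \;=\; L_C \;+\; \lambda_P\, L_P \;+\; \lambda_Q\, L_Q
```

where $\Theta$ is the set of all parameters learned by the deep hash function and $\lambda_P$, $\lambda_Q$ are the hyper-parameters obtained by grid search.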
In a second aspect, the present application provides an image feature hashing apparatus, including:
the acquisition unit is used for acquiring an image to be processed;
the hashing unit is used for inputting the image to be processed into a deep hash network, wherein the deep hash network comprises a feature extraction module, a long-term dependency module and a hash layer; during operation of the deep hash network, the feature extraction module performs feature extraction on the input image to obtain feature maps as image features; the long-term dependency module treats the feature maps output by the feature extraction module as a time sequence so as to detect the spatial correlation among them and obtain enhanced feature maps; and the hash layer hash-codes the enhanced feature maps output by the long-term dependency module to obtain a hash code;
and the extraction unit is used for extracting the hash code output by the deep hash network.
With reference to the second aspect of the present application, in a first possible implementation manner of the second aspect of the present application, the long-term dependency module specifically employs the network structure of a GRU.
With reference to the second aspect of the present application, in a second possible implementation manner of the second aspect of the present application, the hash layer specifically uses a network structure of three fully-connected layers to implement hash coding: the first two layers use a ReLU activation function, the last layer uses a hyperbolic tangent activation function, and the output of the last layer is the hash code.
With reference to the second aspect of the present application, in a third possible implementation manner of the second aspect of the present application, the feature extraction module specifically uses the network structure of a ResNet50 network: it is configured with some 50 two-dimensional convolution layer operations organized into an initial convolution process and four residual blocks; a batch normalization layer follows each part to accelerate training, rectifier units avoid vanishing gradients, a max-pooling layer performs downsampling, and the output of the last residual block is average-pooled.
With reference to the second aspect of the present application, in a fourth possible implementation manner of the second aspect of the present application, the apparatus further includes a training unit, configured to:
performing network training on the deep hash network with sample images, adopting a central similarity loss function L_C, a pairwise similarity loss function L_P and a quantization loss function L_Q during training;
the central similarity loss function L_C quantifies the Hamming distance between each hash code and its corresponding hash center, so as to maintain center-similarity learning;
the pairwise similarity loss function L_P quantifies, using the correlations between labels, the Hamming distance between the hash codes of data that share only some of their labels in a multi-label dataset, so as to keep the hash codes of similarly-labelled data pairs as close as possible;
the quantization loss function L_Q drives each element of the generated code toward a binary value with maximum probability, so as to control the quality of the generated hash code.
With reference to the fourth possible implementation manner of the second aspect of the present application, in a fifth possible implementation manner of the second aspect of the present application, the central similarity loss function L_C is defined over the following quantities: N, the number of samples; K, the length of the binary hash code; the hash code of each sample; and the semantic hash center of that sample;
the pairwise similarity loss function L_P is defined over a hyper-parameter, the hash-code length, a max(·, 0) operation that keeps its value non-negative, and the label-vector encoding of each sample;
the quantization loss function L_Q is defined over the elements of each generated hash code.
With reference to the fourth possible implementation manner of the second aspect of the present application, in a sixth possible implementation manner of the second aspect of the present application, the central similarity loss function L_C, the pairwise similarity loss function L_P and the quantization loss function L_Q are specifically combined into a joint loss function for training, which is defined over the set of all parameters learned by the deep hash function and two hyper-parameters obtained by grid search.
In a third aspect, the present application provides a processing device, including a processor and a memory, where the memory stores a computer program, and the processor executes the method provided in the first aspect of the present application or any one of the possible implementation manners of the first aspect of the present application when calling the computer program in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method provided in the first aspect of the present application or any one of the possible implementations of the first aspect of the present application.
From the above, the present application has the following advantageous effects:
for the hash processing, a long-term dependency module is embedded between the feature extraction module that performs feature extraction and the hash layer that performs hash coding. This module treats the feature maps output by the feature extraction module as a time sequence so as to detect the spatial correlation among them; once the resulting enhanced feature maps are fed into the hash layer, they can be coded into hash codes of markedly improved precision. This avoids the prior-art situation in which hash coding ignores the spatial information between input features, thereby achieving higher-precision storage of image features.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a hashing method for image features according to the present application;
fig. 2 is a schematic view of an application scenario of the deep hash method of the present application;
fig. 3 is a schematic view of a scene in which a GRU is applied to a deep hash according to the present application;
fig. 4 is a schematic diagram of another application scenario of the deep hash method of the present application;
FIG. 5 is a schematic diagram of a hash apparatus for image features according to the present application;
FIG. 6 is a schematic diagram of a processing apparatus according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus. The naming or numbering of the steps appearing in the present application does not mean that the steps in the method flow have to be executed in the chronological/logical order indicated by the naming or numbering, and the named or numbered process steps may be executed in a modified order depending on the technical purpose to be achieved, as long as the same or similar technical effects are achieved.
The division of the modules presented in this application is a logical division, and in practical applications, there may be another division, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed, and in addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, and the indirect coupling or communication connection between the modules may be in an electrical or other similar form, which is not limited in this application. The modules or sub-modules described as separate components may or may not be physically separated, may or may not be physical modules, or may be distributed in a plurality of circuit modules, and some or all of the modules may be selected according to actual needs to achieve the purpose of the present disclosure.
Before describing the hashing method for image features provided in the present application, the background related to the present application will be described first.
The image feature hashing method, apparatus and computer-readable storage medium of the present application can be applied to a processing device, and are used to strengthen the spatial correlation among the feature maps fed into the hash layer, so that the hash codes obtained by subsequent hash coding have markedly improved precision.
In the image feature hashing method mentioned in the present application, the execution subject may be a hashing apparatus for image features, or one of various types of processing device integrating that apparatus, such as a server, a physical host, or User Equipment (UE). The hashing apparatus may be implemented in hardware or software; the UE may specifically be a terminal device such as a smartphone, tablet computer, notebook computer, desktop computer or Personal Digital Assistant (PDA); and the processing device may be deployed as a device cluster.
Next, the hash method of the image features provided in the present application is described.
First, referring to fig. 1, fig. 1 shows a schematic flow chart of the image feature hashing method according to the present application, and the image feature hashing method according to the present application may specifically include the following steps S101 to S103:
step S101, a processing device acquires an image to be processed;
It can be understood that the present application is specifically directed at image storage scenarios, in which the deep hash method is used to store image features at the data level, thereby completing the image storage work.
Correspondingly, the image involved in the present application, denoted the image to be processed, may be an image from any application scenario; it is adjusted according to the actual application and is not specifically limited herein.
As for acquiring the image to be processed, in a specific application the image may be captured in real time or retrieved from existing data.
Step S102: the processing device inputs the image to be processed into a deep hash network, wherein the deep hash network comprises a feature extraction module, a long-term dependency module and a hash layer; during operation of the deep hash network, the feature extraction module performs feature extraction on the input image to obtain feature maps as image features; the long-term dependency module treats the feature maps output by the feature extraction module as a time sequence so as to detect the spatial correlation among them and obtain enhanced feature maps; and the hash layer hash-codes the enhanced feature maps output by the long-term dependency module to obtain a hash code;
it can be understood that the core work of the present application is to provide an improvement on the existing deep hash method, specifically, to perform network optimization configuration on a deep hash network.
In a traditional deep hash network, the hash layer merely combines the feature maps produced by feature extraction. However, spatial correlation exists between deep-level feature maps, and existing schemes ignore this spatial correlation information during hash coding, which degrades the quality of the generated hash code.
To address this, a long-term dependency module is embedded between the feature extraction module that performs feature extraction and the hash layer that performs hash coding. During training, this module regards the input feature maps as a time sequence (treating the set of feature maps as a sequence with temporal ordering) in order to detect the spatial correlation between them; once trained, it likewise treats the feature maps output by the feature extraction module as a time sequence in practical application so as to detect their spatial correlation.
In this way, the long-term dependency module establishes long-term feature dependencies to learn the spatial correlation information among the feature maps, strengthening the spatial correlation among the feature maps fed into the subsequent hash layer. This effectively prevents the hash coding from ignoring the spatial correlation information among the input feature maps and promotes the quality of the hash codes.
Specifically, the deep hash network provided in the present application can be further understood in combination with the application scenario diagram of the deep hash method shown in fig. 2.
(1) As a practical implementation, the feature extraction module may adopt the network structure of a ResNet50 network in practical application. The module is configured with some 50 two-dimensional convolution layers (conv2d) organized into an initial convolution process and four residual blocks; each part is followed by a Batch Normalization (BN) layer to accelerate training and a rectifier unit (the ReLU function) to avoid vanishing gradients, a max-pooling layer performs downsampling, and the output of the last residual block is average-pooled.
As an example, ResNet50 performs residual learning across three-layer bottleneck units whose convolution kernels are of sizes 1x1, 3x3 and 1x1. With input images of 112x112 pixels, the last convolutional layer outputs 2048 feature maps of size 7x7, and a global average pooling layer reduces these to 2048 feature maps of size 1x1.
(2) As a practical implementation, the long-term dependency module of the present application may specifically adopt the network structure of a GRU in practical application.
Specifically, as an example, the GRU layer has an input dimension of 2048, and its single layer contains 2048 hidden units. In the deep hashing process, taking ResNet50 as an example, ResNet50 produces 2048 feature maps after a series of convolution and pooling operations. Here the fully-connected layer of the prior art ignores the spatial correlation information between the feature maps, so the extracted image features may be combined into a target object that does not actually exist, deviating greatly from the actual situation and causing coding errors.
A Recurrent Neural Network (RNN) can learn the temporal ordering of an input sequence — the output for the current element depends on historical outputs — but it cannot retain information for long and suffers from vanishing gradients. The Long Short-Term Memory (LSTM) network solves this with a gate structure, mainly comprising a forget gate, an input gate and an output gate, which allows information to persist. The GRU improves further on the LSTM, giving a simpler network structure that is easier to compute.
Therefore, the present application introduces, for the first time, a GRU layer between ResNet50 and the hash layer, and regards the feature maps obtained from ResNet50 as a sequence, so that the GRU can learn the temporal-style information in this sequence — that is, the spatial correlation information between the feature maps.
Referring to fig. 3, which shows a schematic diagram of the GRU applied to deep hashing, illustrating how the current input feature map is mapped to its corresponding output feature map. For the first input feature map there is no historical information, but because the hidden states of successive GRU steps are connected, the GRU retains some of the first map's information in its output gate; the second feature map can thereby establish a dependency on the first. For the third feature map, the GRU combines the retained information of the first two maps with the current input, so the third map establishes dependencies on both preceding maps. By analogy, for the current input feature map, the GRU draws on the information of all preceding feature maps, so each map establishes dependencies on every map that came before it.
Based on the above process, the GRU establishes long-term dependencies between the currently input feature map and all previously input feature maps. Establishing long-term dependencies lets the GRU learn the relations within an input sequence: for sequences with genuine temporal order, it learns temporal correlation; for the sequence of feature maps here, which are related only spatially, it learns the spatial correlation information between the feature maps. The GRU thus strengthens the spatial correlation of the features fed into the hash layer and prevents the hash coding from ignoring the spatial correlation information among the input feature maps.
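A hedged sketch of this module: a single-layer GRU with input size 2048 and 2048 hidden units, matching the dimensions quoted above. The exact ordering in which the patent feeds the feature maps is not fully recoverable from the text, so this sketch simply assumes a short sequence of 2048-dimensional feature vectors; the second half demonstrates the dependency chain of fig. 3 by showing that perturbing the first step changes the output at the last step:

```python
import torch
import torch.nn as nn

# Single-layer GRU, input size 2048, 2048 hidden units (as quoted above).
# The sequencing of the feature maps is an assumption of this sketch.
gru = nn.GRU(input_size=2048, hidden_size=2048, num_layers=1, batch_first=True)

batch, steps = 1, 4
sequence = torch.randn(batch, steps, 2048)  # feature maps as a "time" sequence
with torch.no_grad():
    enhanced, last_hidden = gru(sequence)   # enhanced feature maps

# Dependency check: perturb ONLY the first step and observe that the
# LAST step's output changes, i.e. later outputs depend on earlier maps.
sequence2 = sequence.clone()
sequence2[0, 0] += 1.0
with torch.no_grad():
    enhanced2, _ = gru(sequence2)
changed = not torch.allclose(enhanced[0, -1], enhanced2[0, -1])
print(enhanced.shape, changed)
```

The enhanced sequence (same shape as the input) is what the hash layer consumes.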
(3) As a practical implementation, the hash layer may adopt a network structure of three fully-connected layers to implement hash coding: the first two layers use the ReLU activation function, the last layer uses the hyperbolic tangent activation function (Tanh), and the output of the last layer is the hash code. The input dimensions may all be 2048, the output hash code length may be 16, 32 or 64, and the Tanh compresses each element of the hash code into [-1, 1].
In step S103, the processing device extracts the hash code output by the deep hash network.
After the feature extraction and the hash coding of the image to be processed are completed through the deep hash network, the hash code obtained by processing the image to be processed can be extracted.
At this time, after the hash code is obtained, the hash code may be subjected to related processing according to application requirements in a specific application scenario, for example, the hash code is stored in a corresponding database, and the like, which is not limited herein.
As can be seen from the embodiment shown in fig. 1, for the hash process, a long-time dependency module is embedded between a feature extraction module for performing the feature extraction process and a hash layer for performing the hash coding, and the module regards each feature map output by the feature extraction module as a time sequence to detect spatial correlation between each feature map, so that after the obtained enhanced feature map is input into the hash layer, a hash code with significantly improved accuracy can be encoded, and a situation that spatial information between input features is ignored by the hash coding in the prior art is avoided, thereby achieving an image feature storage effect with higher accuracy.
Further, it can be understood that the design of the deep hash algorithm in practical application may be considered to include two parts, namely, the design of the network structure and the design of the loss function, and in this case, in the training link of the deep hash algorithm, the present application also focuses on the problem of the existing training scheme of the deep hash algorithm to a certain extent.
The starting point of the loss function design is to preserve the similarity between images in the hash features. At present, deep hashing mostly learns the hash function through pairwise similarity, triplet similarity, or center similarity. Center similarity learning encourages the hash codes generated by similar images to approach a common hash center while different images converge to different hash centers, thereby improving hash learning efficiency and retrieval accuracy.
In center similarity learning, each label has its corresponding hash center, and the semantic hash center of a data point is determined by the hash centers corresponding to its labels. For a single-label data point, the semantic hash center coincides with the hash center of its label, and the generated hash code converges toward that semantic hash center, i.e. toward the hash center of the label. For a multi-label data point, the semantic hash center is the centroid of the hash centers corresponding to its labels, so it deviates from the hash center of any individual label. The hash code then lies farther from the centers of the individual labels, so data pairs whose labels are only partially similar cannot generate hash codes with a short Hamming distance.
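The centroid construction for a multi-label semantic hash center can be sketched as follows (binarizing the element-wise mean, with ties at 0.5 rounding up, is an assumption made here for illustration; a practical system would need an explicit tie-breaking rule):

```python
def semantic_hash_center(label_centers):
    # Element-wise mean of the hash centers of all labels the point carries,
    # binarized back to a 0/1 vector (a mean of exactly 0.5 rounds up here).
    k = len(label_centers[0])
    n = len(label_centers)
    return [1 if sum(c[j] for c in label_centers) / n >= 0.5 else 0
            for j in range(k)]

# A point carrying two labels whose hash centers only partially agree:
center_a = [1, 0, 1, 1, 0, 0]
center_b = [1, 1, 0, 1, 0, 1]
multi_label_center = semantic_hash_center([center_a, center_b])

# A single-label point's semantic center coincides with its label's center.
single_label_center = semantic_hash_center([center_a])
```

Note how `multi_label_center` differs from both `center_a` and `center_b`: this is the deviation that the application identifies as the weakness of pure center similarity learning.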
Thus, although center similarity learning can obtain a globally optimal hash code, for a multi-label data point the semantic hash center is determined jointly by the hash centers corresponding to a plurality of labels. This may prevent a data pair whose labels are only partially similar, and whose semantic hash centers are therefore inconsistent, from generating hash codes with a short Hamming distance. As a result, in the existing center similarity learning technology, for data with only partially similar labels in a multi-label data set, the link between the Hamming distance of the generated hash codes and their labels is ignored, resulting in hash codes of low quality.
In view of this problem, as another practical implementation manner of the present application, the present application may further include the following:
the processing device performs network training on the deep hash network through sample images. In the training process, referring to another application scenario diagram of the deep hash method shown in fig. 4, a central similarity loss function L C, a pairwise similarity loss function L P and a quantization loss function L Q may specifically be adopted for training. The three loss functions specifically configured in the present application are briefly described as follows:
(1) the center similarity loss function L C is used for quantizing the Hamming distance between a hash code and its corresponding hash center, so as to maintain center similarity learning;
(2) the pairwise similarity loss function L P is used for quantizing the association between the Hamming distance of the hash codes of data having only partially similar labels in a multi-label data set and the labels, so as to keep the hash codes of data pairs with similar labels as close as possible;
(3) the quantization loss function L Q is used for pushing each component of the hash code toward a binary value with maximum probability, so as to control the quality of the generated hash code.
It can be understood that, for the above loss function setting, because the pairwise similarity loss function can link pairwise Hamming distances with pairwise similarity labels, the present application utilizes this characteristic and introduces pairwise similarity to improve center similarity learning. The generated hash codes can thereby converge to their corresponding semantic hash centers while the Hamming distance between hash codes of data pairs having only partially similar labels in the multi-label data set is reduced, thus significantly guaranteeing the quality of the hash codes.
For ease of understanding, the above-mentioned central similarity loss function L C, pairwise similarity loss function L P and quantization loss function L Q may, as another practical implementation, be configured as follows:
(1) During training, the probability distribution predicted by the network is expected to approach the data distribution of the training set, and cross entropy can measure the distance between the two distributions. The hash center is a binary vector, and Binary Cross Entropy (BCE) can be used to measure the Hamming distance between the hash code and its center. The smaller the Hamming distance between a hash code h_i and its hash center c_i, the closer the hash code is to its corresponding center, so the center similarity loss function L C can be defined as follows:
L C = -(1/(N·K)) Σ_{i=1}^{N} Σ_{k=1}^{K} [ c_{i,k} log(h_{i,k}) + (1 − c_{i,k}) log(1 − h_{i,k}) ]

wherein N is the number of samples, K is the length of the binary hash code, h_i represents a hash code, and c_i represents the semantic hash center of the sample;
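A sketch of the BCE-based center similarity loss in plain Python. It assumes the relaxed hash code components have been mapped into (0, 1) (e.g. via (h + 1)/2 after the Tanh) — a presentational assumption, not necessarily the application's exact formulation:

```python
import math

def center_similarity_loss(codes, centers):
    # Binary cross-entropy between each relaxed hash code (components in (0,1))
    # and its binary semantic hash center, averaged over N samples and K bits.
    n = len(codes)
    k = len(codes[0])
    total = 0.0
    for h, c in zip(codes, centers):
        for hk, ck in zip(h, c):
            total += ck * math.log(hk) + (1 - ck) * math.log(1 - hk)
    return -total / (n * k)

# Two samples with 4-bit codes; centers are binary, codes are relaxed values.
centers = [[1, 0, 1, 1], [0, 1, 0, 0]]
codes = [[0.9, 0.1, 0.8, 0.95], [0.2, 0.7, 0.1, 0.05]]
loss = center_similarity_loss(codes, centers)
# The loss shrinks as each code moves toward its semantic hash center.
```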
(2) the pairwise similarity loss function L P associates the Hamming distance of the hash codes of data with only partially similar labels in the multi-label data set with their labels, for which the pairwise similarity loss function L P can be defined as follows:
wherein α is a hyper-parameter, K is the length of the hash code, a max(·, 0) truncation ensures that the value is not less than 0, and y_i is the label vector encoding of the sample;
pairwise similarity loss function P L Therefore, in the training process of the multi-label data set, the Hamming distance of the data with only partial similar labels to the generated hash code is reduced.
(3) The purpose of the quantization loss is to push each component of the hash code toward a binary value with maximum probability. The quantization loss function L Q can be defined as follows:

L Q = (1/(N·K)) Σ_{i=1}^{N} Σ_{k=1}^{K} | |h_{i,k}| − 1 |
wherein h_i is the continuous hash code vector output by the network. Since the quantization loss function L Q defined with the absolute value is a non-smooth function whose derivative is difficult to calculate, the present application adopts the smooth function log cosh(·) instead, i.e. uses log cosh(|h_{i,k}| − 1) in place of | |h_{i,k}| − 1 |, so that the quantization loss function L Q can be converted into:

L Q = (1/(N·K)) Σ_{i=1}^{N} Σ_{k=1}^{K} log cosh(|h_{i,k}| − 1)
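A sketch of the smoothed quantization penalty, assuming the log cosh(|h| − 1) surrogate (the application states only that a smooth function replaces the non-smooth form; log cosh is one common choice and is an assumption here):

```python
import math

def quantization_loss(codes):
    # Smooth surrogate pushing each code component toward +/-1:
    # log(cosh(|h| - 1)) is differentiable, unlike the absolute value,
    # and is near zero exactly when |h| is near 1.
    n = len(codes)
    k = len(codes[0])
    total = sum(math.log(math.cosh(abs(h) - 1.0))
                for code in codes for h in code)
    return total / (n * k)

near_binary = [[0.98, -0.97, 0.99], [-0.96, 0.99, -0.98]]
far_from_binary = [[0.2, -0.1, 0.3], [-0.4, 0.1, 0.0]]
# Codes near +/-1 incur almost no quantization loss; codes near 0 incur more.
```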
Furthermore, as yet another practical implementation, for the central similarity loss function L C, the pairwise similarity loss function L P and the quantization loss function L Q, the present application may also introduce a more flexible setting in practical application to enhance the corresponding network training effect.
In particular, the central similarity loss function L C, the pairwise similarity loss function L P and the quantization loss function L Q can be trained jointly through a combined loss function, which can be defined as follows:
min_Θ L = L C + λ_1 L P + λ_2 L Q

wherein Θ is the set of all parameters learned by the deep hash function, and λ_1 and λ_2 are hyper-parameters obtained by a grid search.
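The combined objective can be sketched as a weighted sum; keeping unit weight on the center term and weighting the other two by grid-searched hyper-parameters is an assumption about the exact weighting scheme:

```python
def combined_loss(l_c, l_p, l_q, lambda_p, lambda_q):
    # Weighted sum of the three objectives; lambda_p and lambda_q stand for
    # the hyper-parameters that would be tuned by grid search.
    return l_c + lambda_p * l_p + lambda_q * l_q

# Illustrative loss values and weights only:
total = combined_loss(0.35, 0.12, 0.05, lambda_p=0.5, lambda_q=0.1)
```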
The above is an introduction of the image feature hashing method provided by the present application, and in order to better implement the image feature hashing method provided by the present application, the present application further provides an image feature hashing device from the perspective of a functional module.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an image feature hashing device according to the present application, in the present application, the image feature hashing device 500 may specifically include the following structure:
an obtaining unit 501, configured to obtain an image to be processed;
the hash unit 502 is configured to input an image to be processed into a deep hash network, where the deep hash network includes a feature extraction module, a long-term dependency module, and a hash layer, and in a working process of the deep hash network, the feature extraction module performs feature extraction on the input image to be processed to obtain a feature map as an image feature; the long-time dependence module takes each feature map output by the feature extraction module as a time sequence so as to detect the spatial correlation among the feature maps and obtain an enhanced feature map; the Hash layer carries out Hash coding on the enhanced feature graph output by the long-time dependency module to obtain a Hash code;
an extracting unit 503, configured to extract the hash code output by the deep hash network.
In yet another exemplary implementation, the long-term dependent module specifically employs a network structure of GRUs.
In another exemplary implementation manner, the hash layer specifically uses a network structure of three fully-connected layers to implement hash coding, the first two layers use a ReLU activation function, the last layer uses a hyperbolic tangent activation function, and the output of the last layer is a hash code.
In another exemplary implementation, the feature extraction module specifically adopts the network structure of a ResNet50 network: the feature extraction module is configured with 50 two-dimensional convolutional layer operations, comprising one initial convolution stage and four residual blocks; each part is followed by a batch normalization layer to accelerate training, a rectified linear unit to avoid gradient disappearance, and a maximum pooling layer to implement down-sampling, and the output of the last residual block is average-pooled.
In yet another exemplary implementation, the apparatus further includes a training unit 504 configured to:
performing network training on the deep hash network through sample images, and adopting, in the training process, a central similarity loss function L C, a pairwise similarity loss function L P and a quantization loss function L Q for training;
the center similarity loss function L C is used for quantizing the Hamming distance between a hash code and its corresponding hash center, so as to maintain center similarity learning;
the pairwise similarity loss function L P is used for quantizing the association between the Hamming distance of the hash codes of data having only partially similar labels in a multi-label data set and the labels, so as to keep the hash codes of data pairs with similar labels as close as possible;
the quantization loss function L Q is used for pushing each component of the hash code toward a binary value with maximum probability, so as to control the quality of the generated hash code.
In yet another exemplary implementation, a central similarity loss function L C Is defined as follows:
wherein N is the number of samples, K is the length of the binary hash code, h_i represents a hash code, and c_i represents the semantic hash center of the sample;
pairwise similarity loss function L P Is defined as follows:
wherein α is a hyper-parameter, K is the length of the hash code, a max(·, 0) truncation ensures that the value is not less than 0, and y_i is the label vector encoding of the sample;
quantization loss function L Q Is defined as follows:
In yet another exemplary implementation, a central similarity loss function L C Pairwise similarity loss function L P And a quantization loss function L Q The three are specifically trained by a combined loss function, which is defined as follows:
wherein Θ is the set of all parameters learned by the deep hash function, and λ_1 and λ_2 are hyper-parameters obtained by a grid search.
The present application further provides a processing device from a hardware structure perspective, referring to fig. 6, fig. 6 shows a schematic structural diagram of the processing device of the present application, specifically, the processing device of the present application may include a processor 601, a memory 602, and an input/output device 603, where the processor 601 is configured to implement steps of the hash method of the image features in the corresponding embodiment of fig. 1 when executing a computer program stored in the memory 602; alternatively, the processor 601 is configured to implement the functions of the units in the embodiment corresponding to fig. 5 when executing the computer program stored in the memory 602, and the memory 602 is configured to store the computer program required by the processor 601 to execute the hash method of the image feature in the embodiment corresponding to fig. 1.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in the memory 602 and executed by the processor 601 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing certain functions, the instruction segments being used to describe the execution of a computer program in a computer device.
The processing device may include, but is not limited to, the processor 601, the memory 602, and the input/output device 603. It will be appreciated by those skilled in the art that the illustration is merely an example of a processing device and does not constitute a limitation of the processing device, which may comprise more or fewer components than those illustrated, combine some components, or use different components; for example, the processing device may further comprise a network access device and a bus via which the processor 601, the memory 602, the input/output device 603, etc. are connected.
The Processor 601 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor being the control center for the processing device and the various interfaces and lines connecting the various parts of the overall device.
The memory 602 may be used for storing computer programs and/or modules, and the processor 601 implements various functions of the computer apparatus by running the computer programs and/or modules stored in the memory 602 and calling the data stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the processing apparatus, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid state storage device.
The processor 601, when executing the computer program stored in the memory 602, may specifically implement the following functions:
acquiring an image to be processed;
inputting an image to be processed into a deep hash network, wherein the deep hash network comprises a feature extraction module, a long-time dependence module and a hash layer, and in the working process of the deep hash network, the feature extraction module performs feature extraction on the input image to be processed to obtain a feature map as an image feature; the long-time dependence module takes each feature map output by the feature extraction module as a time sequence so as to detect the spatial correlation among the feature maps and obtain an enhanced feature map; the Hash layer carries out Hash coding on the enhanced feature graph output by the long-time dependency module to obtain a Hash code;
and extracting the hash code output by the deep hash network.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the hash apparatus, the processing device and the corresponding units of the image features described above may refer to the description of the hash method of the image features in the embodiment corresponding to fig. 1, and are not described herein again in detail.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
For this reason, the present application provides a computer-readable storage medium, where a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps of the image feature hashing method in the embodiment corresponding to fig. 1 in the present application, and specific operations may refer to the description of the image feature hashing method in the embodiment corresponding to fig. 1, which is not described herein again.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps of the image feature hashing method in the embodiment corresponding to fig. 1, the beneficial effects that can be achieved by the image feature hashing method in the embodiment corresponding to fig. 1 can be achieved, which are described in detail in the foregoing description and are not repeated herein.
The foregoing describes in detail the hashing method, apparatus, processing device and computer-readable storage medium for image features provided in the present application, and specific examples are applied herein to explain the principles and embodiments of the present application, and the description of the foregoing embodiments is only used to help understand the method and its core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (10)
1. A method for hashing an image feature, the method comprising:
the processing equipment acquires an image to be processed;
the processing equipment inputs the image to be processed into a deep hash network, wherein the deep hash network comprises a feature extraction module, a long-time dependence module and a hash layer, and in the working process of the deep hash network, the feature extraction module performs feature extraction on the input image to be processed to obtain a feature map as an image feature; the long-time dependence module takes each feature map output by the feature extraction module as a time sequence so as to detect the spatial correlation among the feature maps and obtain an enhanced feature map; the hash layer carries out hash coding on the reinforced characteristic graph output by the long-time dependence module to obtain a hash code;
the processing device extracts the hash code output by the deep hash network.
2. The method according to claim 1, characterized in that the long-time dependence module specifically employs a network structure of gated recurrent units (GRU).
3. The method according to claim 1, wherein the hash layer specifically employs a network structure of three fully-connected layers for implementing hash coding, the first two layers employ a ReLU activation function, the last layer employs a hyperbolic tangent activation function, and an output of the last layer is the hash code.
4. The method of claim 1, wherein the feature extraction module specifically adopts a network structure of a ResNet50 network, the feature extraction module is configured with 50 two-dimensional convolutional layer operations comprising one initial convolution stage and four residual blocks, each part is followed by a batch normalization layer to accelerate training, a rectified linear unit to avoid gradient disappearance, and a maximum pooling layer to implement down-sampling, and the output of the last residual block is average-pooled.
5. The method of claim 1, further comprising:
the processing device performs network training on the deep hash network through sample images, and in the training process, a central similarity loss function L C, a pairwise similarity loss function L P and a quantization loss function L Q are adopted for training;
the central similarity loss function L C is used for quantizing the Hamming distance between a hash code and its corresponding hash center, so as to maintain center similarity learning;
the pairwise similarity loss function L P is used for quantizing the association between the Hamming distance of the hash codes of data having only partially similar labels in a multi-label data set and the labels, so as to keep the hash codes of data pairs with similar labels as close as possible;
the quantization loss function L Q is used for pushing each component of the hash code toward a binary value with maximum probability, so as to control the quality of the generated hash code.
6. The method of claim 5, wherein the central similarity loss function L C Is defined as follows:
wherein N is the number of samples, K is the length of the binary hash code, h_i represents a hash code, and c_i represents the semantic hash center of the sample;
the pairwise similarity loss function L P is defined as follows:
wherein α is a hyper-parameter, K is the length of the hash code, a max(·, 0) truncation ensures that the value is not less than 0, and y_i is the label vector encoding of the sample;
quantization loss function L Q Is defined as follows:
7. The method of claim 5, wherein the central similarity loss function L C The pairwise similarity loss function L P And said quantization loss function L Q The three are specifically trained by a combined loss function, which is defined as follows:
8. An apparatus for hashing a feature of an image, the apparatus comprising:
the acquisition unit is used for acquiring an image to be processed;
the system comprises a Hash unit, a processing unit and a processing unit, wherein the Hash unit is used for inputting the image to be processed into a deep Hash network, the deep Hash network comprises a feature extraction module, a long-time dependence module and a Hash layer, and in the working process of the deep Hash network, the feature extraction module carries out feature extraction on the input image to be processed to obtain a feature map as image features; the long-time dependence module takes each feature map output by the feature extraction module as a time sequence so as to detect the spatial correlation among the feature maps and obtain an enhanced feature map; the hash layer carries out hash coding on the reinforced characteristic graph output by the long-time dependence module to obtain a hash code;
and the extraction unit is used for extracting the hash code output by the deep hash network.
9. A processing device comprising a processor and a memory, a computer program being stored in the memory, the processor performing the method according to any of claims 1 to 7 when calling the computer program in the memory.
10. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210813030.9A CN114898104A (en) | 2022-07-12 | 2022-07-12 | Hash method and device for image features and processing equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210813030.9A CN114898104A (en) | 2022-07-12 | 2022-07-12 | Hash method and device for image features and processing equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114898104A true CN114898104A (en) | 2022-08-12 |
Family
ID=82729260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210813030.9A Pending CN114898104A (en) | 2022-07-12 | 2022-07-12 | Hash method and device for image features and processing equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114898104A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241310A (en) * | 2020-01-10 | 2020-06-05 | 济南浪潮高新科技投资发展有限公司 | Deep cross-modal Hash retrieval method, equipment and medium |
CN112989120A (en) * | 2021-05-13 | 2021-06-18 | 广东众聚人工智能科技有限公司 | Video clip query system and video clip query method |
- 2022-07-12 CN CN202210813030.9A patent/CN114898104A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241310A (en) * | 2020-01-10 | 2020-06-05 | 济南浪潮高新科技投资发展有限公司 | Deep cross-modal Hash retrieval method, equipment and medium |
CN112989120A (en) * | 2021-05-13 | 2021-06-18 | 广东众聚人工智能科技有限公司 | Video clip query system and video clip query method |
Non-Patent Citations (5)
Title |
---|
LI YUAN等: "Central Similarity Quantization for Efficient Image and Video Retrieval", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
QIBING QIN等: "Unsupervised Deep Multi-Similarity Hashing With Semantic Structure for Image Retrieval", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》 * |
ZHANGJIE CAO等: "HashNet: Deep Learning to Hash by Continuation", 《 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 * |
ZHENG ZHANG等: "Improved Deep Hashing with Soft Pairwise Similarity for Multi-label Image Retrieval", 《ARXIV:1803.02987V3 [CS.CV]》 * |
丁斌: "基于非线性哈希的图像与视频检索算法研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109885709B (en) | Image retrieval method and device based on self-coding dimensionality reduction and storage medium | |
Boureau et al. | A theoretical analysis of feature pooling in visual recognition | |
CN112437926B (en) | Fast robust friction ridge patch detail extraction using feedforward convolutional neural network | |
CN111177438B (en) | Image characteristic value searching method and device, electronic equipment and storage medium | |
US11714921B2 (en) | Image processing method with ash code on local feature vectors, image processing device and storage medium | |
US20200175259A1 (en) | Face recognition method and apparatus capable of face search using vector | |
CN113869282B (en) | Face recognition method, hyper-resolution model training method and related equipment | |
CN111988614A (en) | Hash coding optimization method and device and readable storage medium | |
JP2023520625A (en) | IMAGE FEATURE MATCHING METHOD AND RELATED DEVICE, DEVICE AND STORAGE MEDIUM | |
JP2015036939A (en) | Feature extraction program and information processing apparatus | |
CN111241550B (en) | Vulnerability detection method based on binary mapping and deep learning | |
CN114332500A (en) | Image processing model training method and device, computer equipment and storage medium | |
CN116978011A (en) | Image semantic communication method and system for intelligent target recognition | |
Tsai et al. | A single‐stage face detection and face recognition deep neural network based on feature pyramid and triplet loss | |
CN114299304A (en) | Image processing method and related equipment | |
CN115631330B (en) | Feature extraction method, model training method, image recognition method and application | |
CN114077685A (en) | Image retrieval method and device, computer equipment and storage medium | |
CN113743593B (en) | Neural network quantization method, system, storage medium and terminal | |
CN114898104A (en) | Hash method and device for image features and processing equipment | |
CN108536769B (en) | Image analysis method, search method and device, computer device and storage medium | |
CN112307243A (en) | Method and apparatus for retrieving image | |
CN115880556A (en) | Multi-mode data fusion processing method, device, equipment and storage medium | |
CN113963241B (en) | FPGA hardware architecture, data processing method thereof and storage medium | |
Liu et al. | Margin-based two-stage supervised hashing for image retrieval | |
Žižakić et al. | Learning local image descriptors with autoencoders |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20220812 |