CN111737507A - Single-mode image Hash retrieval method - Google Patents
- Publication number
- CN111737507A (application CN202010577850.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- hash
- mode
- retrieval
- hash retrieval
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention relates to a single-mode image hash retrieval method comprising four parts: image preprocessing, image feature extraction, attention image output, and hash retrieval model generation. The method extracts semantic information within the picture modality through an attention mechanism, improving the quality of the hash functions generated by the hash retrieval model; at the same time, a multi-level semantic supervision mode strengthens retrieval precision across multi-label data, so that the best-matching items appear at the front of the final retrieval results, greatly improving retrieval efficiency.
Description
Technical Field
The invention relates to the technical field of image retrieval, in particular to a single-mode image hash retrieval method.
Background
With continuing technological progress, internet technology has developed rapidly and image and video data have grown explosively. Conventional image retrieval comprises two modes: Text-Based Image Retrieval (TBIR) and Content-Based Image Retrieval (CBIR). TBIR describes the characteristics of an image, such as the author, period, genre, and size of a pictorial work, through textual descriptions; CBIR analyzes and retrieves the content semantics of an image, such as its color, texture, and layout. CBIR is currently the mainstream image retrieval approach.
Image hash retrieval aims to search an existing data set to find image data meeting a query's requirements. Because hash codes require little storage and can be compared quickly, hash retrieval is widely applied to retrieval tasks. Existing image hash retrieval techniques can be divided into deep-model and non-deep-model approaches. A typical deep method extracts image features with a deep network, converts samples into hash codes through a fully connected network trained with a cross-entropy loss on the extracted features, and stores the codes in a database.
In a real environment, a single image contains very rich information and often carries several classes of information, so treating an image as belonging to a single class is often not accurate enough; moreover, during hash learning, the redundant information in the image background is given the same standing as the information in regions worth focusing on. Most existing hash retrieval models attend only to the information in the salient regions of an image and cannot fully exploit all of the image's information.
Based on the problems, the invention provides a single-mode image hash retrieval method.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a simple and efficient single-mode image hash retrieval method.
The invention is realized by the following technical scheme:
a single-mode image hash retrieval method is characterized by comprising the following steps: the method comprises four parts, namely image preprocessing, image feature extraction, attention image output and hash retrieval model generation;
the method first defines a multi-level semantic similarity relation matrix to retain the rich semantic information in multi-label data; at the same time, an Attention mechanism is adopted to spontaneously locate the key attention regions in the image, and a mask of the same size as the image representation is generated through learning, thereby extracting the semantic information within the picture modality and helping the hash retrieval model obtain higher-quality hash functions.
The invention relates to a single-mode image Hash retrieval method, which comprises the following specific implementation steps:
firstly, acquiring the original pictures of the training set, and inputting the pictures into their corresponding residual networks respectively;
secondly, inputting the training samples into the hash retrieval model, and optimizing the parameters of the hash retrieval model by minimizing the loss function;
thirdly, fixing the model, obtaining the hash codes corresponding to all samples through the hash retrieval model, and storing the hash codes in a database for later use;
and fourthly, when performing a retrieval task with the hash retrieval model, simply inputting a sample of either modality into the model to generate the hash code corresponding to that modality, then searching the hash-code database of the other modality for the N hash codes nearest in Hamming distance (N is user-defined as needed), and returning the samples corresponding to those hash codes.
In the second step, the model parameters are optimized by an alternating iterative method: each parameter is optimized in turn while the remaining parameters are held fixed.
In the second step, the hash retrieval model is optimized, and the method comprises the following steps:
(1) generating a similarity matrix S with multilevel semantics;
(2) extracting features of the image modality to obtain the image modal features P_i, classifying the images, and outputting the attention images;
(3) performing dot multiplication of the obtained feature image and the attention image to obtain the feature representation F_i of the picture modality and the feature representation F_j of the text modality;
(4) performing iterative optimization of the hash retrieval model with the loss function to finally obtain the optimized hash retrieval model.
In the step (1), the similarity matrix S with multilevel semantics is expressed as:
wherein |C_i| and |C_j| denote the number of categories of sample i and sample j, respectively, and D(i, j) denotes the number of categories shared by the two samples; the similarity value S_ij formed by samples i and j satisfies S_ij ∈ [0, 1], which ensures that the generated matrix S has greater discriminability.
In the step (2), a ResNet101 network is adopted for extraction to obtain the image modal features P_i. At the same time, a ResNet101 network with the fully connected layer removed and an average pooling layer added outputs the sample class data and performs the classification task on the images; an Attention mechanism added at the last layer outputs the attention image, activating the regions deserving key attention.
In the step (3), the obtained feature image and the attention image are dot-multiplied, and the result is fed as input to a fully connected layer to obtain the feature representation F_i of the picture modality; the BOW (bag-of-words) representation of the text modality is fed into a fully connected layer to obtain the feature representation F_j of the text modality.
In the step (4), the loss function is expressed as:
wherein S_ij is the similarity value formed by samples i and j, σ is a hyperparameter balancing the penalty term and the data loss term, F_i^T is the transpose of the picture-modality feature representation, F_j is the text-modality feature representation, L_2 is the common quantization loss, and L_3 is the bit balance loss.
The invention has the following beneficial effects: the single-mode image hash retrieval method extracts semantic information within the picture modality through an attention mechanism, improving the quality of the hash functions generated by the hash retrieval model; at the same time, a multi-level semantic supervision mode strengthens retrieval precision across multi-label data, so that the best-matching items appear at the front of the final retrieval results, greatly improving retrieval efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic diagram of a single-mode image hash retrieval method according to the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The single-mode image hash retrieval method comprises four parts, namely image preprocessing, image feature extraction, attention image output and hash retrieval model generation;
the method first defines a multi-level semantic similarity relation matrix to retain the rich semantic information in multi-label data; at the same time, an Attention mechanism is adopted to spontaneously locate the key attention regions in the image, and a mask of the same size as the image representation is generated through learning, thereby extracting the semantic information within the picture modality and helping the hash retrieval model obtain higher-quality hash functions.
The Hash retrieval method for the single-mode image comprises the following specific implementation steps:
firstly, acquiring the original pictures of the training set, and inputting the pictures into their corresponding residual networks respectively;
secondly, inputting the training samples into the hash retrieval model, and optimizing the parameters of the hash retrieval model by minimizing the loss function;
thirdly, fixing the model, obtaining the hash codes corresponding to all samples through the hash retrieval model, and storing the hash codes in a database for later use;
and fourthly, when performing a retrieval task with the hash retrieval model, simply inputting a sample of either modality into the model to generate the hash code corresponding to that modality, then searching the hash-code database of the other modality for the N hash codes nearest in Hamming distance (N is user-defined as needed), and returning the samples corresponding to those hash codes.
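The retrieval step above — generate a query hash code, then rank database codes by Hamming distance — can be sketched as follows. This is a minimal illustration with made-up binary codes; the function and variable names are not from the patent.

```python
import numpy as np

def hamming_topn(query_code, db_codes, n):
    """Return indices of the n database codes nearest to the query in Hamming distance."""
    # Hamming distance = number of positions where the bits differ
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    # argsort gives a full ranking; for very large databases np.argpartition is cheaper
    return np.argsort(dists, kind="stable")[:n]

query = np.array([1, 0, 1, 1])
db = np.array([[1, 0, 1, 1],   # distance 0
               [0, 0, 1, 1],   # distance 1
               [0, 1, 0, 0]])  # distance 4
nearest = hamming_topn(query, db, 2)  # → indices [0, 1]
```

With {-1, +1} codes the same comparison works unchanged, since only inequality of entries is counted.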
In the second step, the model parameters are optimized by an alternating iterative method: each parameter is optimized in turn while the remaining parameters are held fixed.
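The alternating scheme — hold the other parameters fixed, solve for one, and cycle — can be illustrated on a toy two-variable objective. The objective below is invented purely for illustration and stands in for the hash model's parameter blocks.

```python
def coordinate_descent(n_iters=50):
    """Alternating minimization of f(x, y) = (x - 1)**2 + (y - 2)**2 + x*y.

    Each update solves one variable in closed form with the other held fixed,
    mirroring the fix-one-optimize-the-rest scheme described above.
    """
    x, y = 0.0, 0.0
    for _ in range(n_iters):
        x = 1.0 - y / 2.0   # argmin over x with y fixed (set df/dx = 0)
        y = 2.0 - x / 2.0   # argmin over y with x fixed (set df/dy = 0)
    return x, y

x_opt, y_opt = coordinate_descent()  # converges to the stationary point (0, 2)
```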
In the second step, the hash retrieval model is optimized, and the method comprises the following steps:
(1) generating a similarity matrix S with multilevel semantics;
(2) extracting features of the image modality to obtain the image modal features P_i, classifying the images, and outputting the attention images;
(3) performing dot multiplication of the obtained feature image and the attention image to obtain the feature representation F_i of the picture modality and the feature representation F_j of the text modality;
(4) performing iterative optimization of the hash retrieval model with the loss function to finally obtain the optimized hash retrieval model.
In the step (1), the similarity matrix S with multilevel semantics is expressed as:
wherein |C_i| and |C_j| denote the number of categories of sample i and sample j, respectively, and D(i, j) denotes the number of categories shared by the two samples; the similarity value S_ij formed by samples i and j satisfies S_ij ∈ [0, 1], which ensures that the generated matrix S has greater discriminability.
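One construction consistent with the stated constraints — S_ij built from the shared-class count D(i, j) and the per-sample counts |C_i|, |C_j|, with values in [0, 1] and more shared classes giving higher similarity — is a normalized label-overlap (Jaccard-style) score. The normalization below is an assumption for illustration, not necessarily the patent's exact formula:

```python
import numpy as np

def multilevel_similarity(labels):
    """Pairwise multi-level semantic similarity from multi-hot label vectors.

    labels: (num_samples, num_classes) binary matrix. The union-based
    normalization is assumed; any mapping into [0, 1] that grows with the
    shared-class count fits the description above.
    """
    D = labels @ labels.T                             # D(i, j): classes shared by i and j
    counts = labels.sum(axis=1)                       # |C_i| for each sample
    union = counts[:, None] + counts[None, :] - D     # classes held by i or j
    return np.where(union > 0, D / np.maximum(union, 1), 0.0)

labels = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
S = multilevel_similarity(labels)
# identical label sets give 1 on the diagonal; disjoint label sets give 0
```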
In the step (2), a ResNet101 network is adopted for extraction to obtain the image modal features P_i. At the same time, a ResNet101 network with the fully connected layer removed and an average pooling layer added outputs the sample class data and performs the classification task on the images; an Attention mechanism added at the last layer outputs the attention image, activating the regions deserving key attention.
In the step (3), the obtained feature image and the attention image are dot-multiplied, and the result is fed as input to a fully connected layer to obtain the feature representation F_i of the picture modality; the BOW (bag-of-words) representation of the text modality is fed into a fully connected layer to obtain the feature representation F_j of the text modality.
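The attention step — element-wise multiplication of the backbone feature map with a same-size learned mask, followed by a fully connected projection — can be sketched as below. The shapes, the sigmoid-squashed mask, and the tanh relaxation are illustrative assumptions; in the patent the mask and the projection weights are learned, not random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a (C, H, W) feature map from the backbone and an
# attention mask of the same spatial size with values in (0, 1)
features = rng.standard_normal((256, 7, 7))
mask = 1.0 / (1.0 + np.exp(-rng.standard_normal((1, 7, 7))))

attended = features * mask            # element-wise (dot) multiplication, broadcast over channels
flat = attended.reshape(-1)           # flatten before the fully connected layer

# Stand-in fully connected layer projecting to a k-bit continuous representation
k = 32
W = rng.standard_normal((k, flat.size)) * 0.01
F_i = np.tanh(W @ flat)               # continuous relaxation of the hash code
hash_code = np.sign(F_i)              # binarize: entries in {-1, +1}
```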
In the step (4), the loss function is expressed as:
wherein S_ij is the similarity value formed by samples i and j, σ is a hyperparameter balancing the penalty term and the data loss term, F_i^T is the transpose of the picture-modality feature representation, F_j is the text-modality feature representation, L_2 is the common quantization loss, and L_3 is the bit balance loss.
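The sketch below combines the three kinds of terms described above: a pairwise term driven by S_ij and the inner products F_i^T F_j, a quantization loss L_2, and a bit-balance loss L_3. The negative-log-likelihood form of the pairwise term and the term weights are assumptions, not the patent's exact equation.

```python
import numpy as np

def hash_loss(F, S, B, sigma=1.0, mu=1.0):
    """Three-term objective: pairwise similarity + quantization + bit balance.

    F: (m, k) continuous network outputs; S: (m, m) similarity matrix;
    B: (m, k) binary codes in {-1, +1}.
    """
    theta = 0.5 * (F @ F.T)                            # scaled inner products F_i^T F_j
    L1 = np.sum(np.log1p(np.exp(theta)) - S * theta)   # pairwise likelihood term (assumed form)
    L2 = np.sum((B - F) ** 2)                          # quantization loss: keep F near its codes
    L3 = np.sum(F.sum(axis=0) ** 2)                    # bit balance: each bit ~half +1, half -1
    return L1 + sigma * L2 + mu * L3

rng = np.random.default_rng(1)
m, k = 4, 8
F = np.tanh(rng.standard_normal((m, k)))
S = (rng.random((m, m)) > 0.5).astype(float)
B = np.sign(F)
loss = hash_loss(F, S, B)
```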
Compared with the prior art, the single-mode image hash retrieval method has the following characteristics:
firstly, the key points of attention are separated from redundant data and the key regions are highlighted, improving retrieval efficiency;
in conventional image retrieval, most existing hash retrieval models cannot fully exploit the image information: during hash learning, the background and redundant information of an image are given the same standing as the information in regions worth focusing on. The present method separates the background and redundant information from the regions worth focusing on, which can greatly improve retrieval efficiency.
Secondly, through the Attention mechanism, the hash retrieval model can attend to more discriminative regions, improving the quality of the hash functions generated by the model;
in recent years, the Attention mechanism has been widely applied in computer vision, achieving good results throughout. Applied to image recognition, an Attention mechanism can spontaneously find the places in an image deserving key attention: a mask of the same size as the image representation is generated through learning, and for an attended region the corresponding mask positions take higher attention values. Fusing the Attention mechanism into a hash retrieval method makes the method more interpretable and improves retrieval efficiency.
And thirdly, a multi-level semantic supervision mode strengthens retrieval precision across multi-label data, so that the best-matching items appear at the front of the final retrieval results.
The above-described embodiment is only one specific embodiment of the present invention; general changes and substitutions made by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.
Claims (8)
1. A single-mode image hash retrieval method, characterized by comprising the following steps: the method comprises four parts, namely image preprocessing, image feature extraction, attention image output and hash retrieval model generation;
the method first defines a multi-level semantic similarity relation matrix to retain the rich semantic information in multi-label data; at the same time, an Attention mechanism is adopted to spontaneously locate the key attention regions in the image, and a mask of the same size as the image representation is generated through learning, thereby extracting the semantic information within the picture modality and helping the hash retrieval model obtain higher-quality hash functions.
2. The single-modality image hash retrieval method according to claim 1, characterized by comprising the following specific implementation steps:
firstly, acquiring the original pictures of the training set, and inputting the pictures into their corresponding residual networks respectively;
secondly, inputting the training samples into the hash retrieval model, and optimizing the parameters of the hash retrieval model by minimizing the loss function;
thirdly, fixing the model, obtaining the hash codes corresponding to all samples through the hash retrieval model, and storing the hash codes in a database for later use;
and fourthly, when performing a retrieval task with the hash retrieval model, inputting a sample of either modality into the model to generate the hash code corresponding to that modality, then searching the hash-code database of the other modality for the n hash codes nearest in Hamming distance, and returning the samples corresponding to those hash codes.
3. The single-modality image hash retrieval method according to claim 2, characterized in that: in the second step, the model parameters are optimized by an alternating iterative method: each parameter is optimized in turn while the remaining parameters are held fixed.
4. The single-modality image hash retrieval method according to claim 3, characterized in that: in the second step, the hash retrieval model is optimized, and the method comprises the following steps:
(1) generating a similarity matrix S with multilevel semantics;
(2) extracting features of the image modality to obtain the image modal features P_i, classifying the images, and outputting the attention images;
(3) performing dot multiplication of the obtained feature image and the attention image to obtain the feature representation F_i of the picture modality and the feature representation F_j of the text modality;
(4) performing iterative optimization of the hash retrieval model with the loss function to finally obtain the optimized hash retrieval model.
5. The single-modality image hash retrieval method according to claim 4, characterized in that: in the step (1), the similarity matrix S with multilevel semantics is expressed as:
wherein |C_i| and |C_j| denote the number of categories of sample i and sample j, respectively, and D(i, j) denotes the number of categories shared by the two samples; the similarity value S_ij formed by samples i and j satisfies S_ij ∈ [0, 1], which ensures that the generated matrix S has greater discriminability.
6. The single-modality image hash retrieval method according to claim 4, characterized in that: in the step (2), a ResNet101 network is adopted for extraction to obtain the image modal features P_i; at the same time, a ResNet101 network with the fully connected layer removed and an average pooling layer added outputs the sample class data and performs the classification task on the images, and an Attention mechanism added at the last layer outputs the attention image, activating the regions deserving key attention.
7. The single-modality image hash retrieval method according to claim 6, characterized in that: in the step (3), the obtained feature image and the attention image are dot-multiplied, and the result is fed as input to a fully connected layer to obtain the feature representation F_i of the picture modality; the BOW representation of the text modality is input into a fully connected layer to obtain the feature representation F_j of the text modality.
8. The single-modality image hash retrieval method according to claim 4, 5, 6 or 7, wherein: in the step (4), the loss function is expressed as:
wherein S_ij is the similarity value formed by samples i and j, σ is a hyperparameter balancing the penalty term and the data loss term, F_i^T is the transpose of the picture-modality feature representation, F_j is the text-modality feature representation, L_2 is the common quantization loss, and L_3 is the bit balance loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010577850.3A CN111737507A (en) | 2020-06-23 | 2020-06-23 | Single-mode image Hash retrieval method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010577850.3A CN111737507A (en) | 2020-06-23 | 2020-06-23 | Single-mode image Hash retrieval method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111737507A true CN111737507A (en) | 2020-10-02 |
Family
ID=72650704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010577850.3A Pending CN111737507A (en) | 2020-06-23 | 2020-06-23 | Single-mode image Hash retrieval method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111737507A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858984A (en) * | 2020-07-13 | 2020-10-30 | 济南浪潮高新科技投资发展有限公司 | Image matching method based on attention mechanism Hash retrieval |
CN114003698A (en) * | 2021-12-27 | 2022-02-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765281A (en) * | 2019-11-04 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | Multi-semantic depth supervision cross-modal Hash retrieval method |
CN111125457A (en) * | 2019-12-13 | 2020-05-08 | 山东浪潮人工智能研究院有限公司 | Deep cross-modal Hash retrieval method and device |
- 2020-06-23: CN202010577850.3A filed; patent CN111737507A (status: pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765281A (en) * | 2019-11-04 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | Multi-semantic depth supervision cross-modal Hash retrieval method |
CN111125457A (en) * | 2019-12-13 | 2020-05-08 | 山东浪潮人工智能研究院有限公司 | Deep cross-modal Hash retrieval method and device |
Non-Patent Citations (1)
Title |
---|
Shuang Kai (ed.), "Computer Vision" (计算机视觉), Beijing University of Posts and Telecommunications Press, 31 January 2020, page 131 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858984A (en) * | 2020-07-13 | 2020-10-30 | 济南浪潮高新科技投资发展有限公司 | Image matching method based on attention mechanism Hash retrieval |
CN114003698A (en) * | 2021-12-27 | 2022-02-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
CN114003698B (en) * | 2021-12-27 | 2022-04-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113283551B (en) | Training method and training device of multi-mode pre-training model and electronic equipment | |
CN110599592B (en) | Three-dimensional indoor scene reconstruction method based on text | |
CN108509521B (en) | Image retrieval method for automatically generating text index | |
CN113761377B (en) | False information detection method and device based on attention mechanism multi-feature fusion, electronic equipment and storage medium | |
CN113672693B (en) | Label recommendation method of online question-answering platform based on knowledge graph and label association | |
CN112100413A (en) | Cross-modal Hash retrieval method | |
CN110598022B (en) | Image retrieval system and method based on robust deep hash network | |
CN111125457A (en) | Deep cross-modal Hash retrieval method and device | |
CN115238690A (en) | Military field composite named entity identification method based on BERT | |
CN114461890A (en) | Hierarchical multi-modal intellectual property search engine method and system | |
CN111858984A (en) | Image matching method based on attention mechanism Hash retrieval | |
Patel et al. | Dynamic lexicon generation for natural scene images | |
CN117010500A (en) | Visual knowledge reasoning question-answering method based on multi-source heterogeneous knowledge joint enhancement | |
US11494431B2 (en) | Generating accurate and natural captions for figures | |
Al-Barhamtoshy et al. | Arabic documents information retrieval for printed, handwritten, and calligraphy image | |
CN114780582A (en) | Natural answer generating system and method based on form question and answer | |
CN114328934A (en) | Attention mechanism-based multi-label text classification method and system | |
CN111737507A (en) | Single-mode image Hash retrieval method | |
CN115759119A (en) | Financial text emotion analysis method, system, medium and equipment | |
CN113486143A (en) | User portrait generation method based on multi-level text representation and model fusion | |
CN116680420B (en) | Low-resource cross-language text retrieval method and device based on knowledge representation enhancement | |
CN117688220A (en) | Multi-mode information retrieval method and system based on large language model | |
CN114169325B (en) | Webpage new word discovery and analysis method based on word vector representation | |
CN115359486A (en) | Method and system for determining custom information in document image | |
Idziak et al. | Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||