CN111737507A - Single-mode image Hash retrieval method - Google Patents

Single-mode image Hash retrieval method

Info

Publication number
CN111737507A
Authority
CN
China
Prior art keywords
image
hash
mode
retrieval
hash retrieval
Legal status
Pending
Application number
CN202010577850.3A
Other languages
Chinese (zh)
Inventor
凌泽乐
高岩
高明
金长新
Current Assignee
Inspur Group Co Ltd
Original Assignee
Inspur Group Co Ltd
Priority date
2020-06-23
Filing date
2020-06-23
Publication date
2020-10-02
Application filed by Inspur Group Co Ltd
Priority to CN202010577850.3A
Publication of CN111737507A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a single-modality image hash retrieval method comprising four parts: image preprocessing, image feature extraction, attention map output, and hash retrieval model generation. The method uses an attention mechanism to extract semantic information from the image modality, improving the quality of the hash functions generated by the hash retrieval model; meanwhile, multi-level semantic supervision improves retrieval precision on multi-label data and places the best-matching items at the top of the final retrieval results, thereby greatly improving retrieval efficiency.

Description

Single-mode image Hash retrieval method
Technical Field
The invention relates to the technical field of image retrieval, in particular to a single-mode image hash retrieval method.
Background
With technological progress and the rapid development of the internet, image and video data are growing explosively. Conventional image retrieval technologies follow two paradigms: Text-Based Image Retrieval (TBIR) and Content-Based Image Retrieval (CBIR). TBIR describes an image through textual annotations, such as the author, period, genre, and size of an artwork; CBIR analyzes and retrieves images according to the semantics of their content, such as color, texture, and layout. Content-based image retrieval is currently the mainstream approach.
Image hash retrieval aims to search an existing dataset for the images that satisfy a query. Because hash codes are compact to store and fast to compare, hash retrieval is widely used in retrieval tasks. Existing image hash retrieval techniques can be divided into deep-model and non-deep-model methods. The conventional approach generally uses a deep network to extract image features, then uses a fully connected network trained with a cross-entropy loss to convert each sample into a hash code, which is stored in a database.
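As a minimal illustration of why hash codes are cheap to store and fast to compare (this sketch is not part of the patent), a ±1 code can be packed into the bits of an integer, so a 64-bit code occupies 8 bytes, and the Hamming distance between two packed codes is the number of set bits in their XOR. The helper names `pack_code` and `hamming` are hypothetical.

```python
from typing import Sequence


def pack_code(bits: Sequence[int]) -> int:
    """Pack a {-1, +1} hash code into one integer: bit = 1 where the value is +1."""
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b > 0 else 0)
    return value


def hamming(a: int, b: int) -> int:
    """Hamming distance = number of differing bits = popcount(a XOR b)."""
    return bin(a ^ b).count("1")


code_x = [1, -1, 1, 1, -1, -1, 1, -1]
code_y = [1, 1, 1, -1, -1, -1, 1, 1]
packed_x, packed_y = pack_code(code_x), pack_code(code_y)
print(packed_x, packed_y, hamming(packed_x, packed_y))  # → 178 227 3
```

Comparing two codes thus costs one XOR and one bit count, independent of how rich the original image features were.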
In real-world scenes, a single image contains rich information, often spanning several classes, and conventional methods designed for a single class of information are often not accurate enough. Moreover, during hash learning, redundant background information is treated on an equal footing with the regions that deserve attention. Most existing hash retrieval models attend only to the salient regions of an image and cannot make full use of all the image information.
To address these problems, the invention provides a single-modality image hash retrieval method.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a simple and efficient single-modality image hash retrieval method.
The invention is realized by the following technical scheme:
A single-modality image hash retrieval method, characterized in that it comprises four parts: image preprocessing, image feature extraction, attention map output, and hash retrieval model generation;
first, a multi-level semantic similarity matrix is defined to preserve the rich semantic information in multi-label data; at the same time, an attention mechanism automatically locates the key regions of an image and learns a mask of the same size as the image representation, so as to extract the semantic information in the image modality and help the hash retrieval model learn higher-quality hash functions.
The specific implementation steps of the single-modality image hash retrieval method are as follows:
first, acquire the original images of the training set and input them into different residual networks respectively;
second, input the training samples into the hash retrieval model and optimize its parameters by minimizing the loss function;
third, fix the model, obtain the hash codes of all samples through the hash retrieval model, and store them in a database for later use;
and fourth, to perform a retrieval task with the hash retrieval model, simply input a sample of either modality into the model to generate its hash code, then search the hash code database of the other modality for the N hash codes (N user-defined) with the smallest Hamming distance, and return the corresponding samples.
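Steps three and four can be sketched as follows. The sketch assumes that the trained model emits real-valued vectors that are binarized with the sign function, a common choice the text does not spell out; `to_hash_codes`, `hamming_top_n`, and the toy data are hypothetical. For ±1 codes of length k, the Hamming distance equals (k - <q, b>) / 2, so ranking a whole database reduces to one matrix-vector product.

```python
import numpy as np


def to_hash_codes(features: np.ndarray) -> np.ndarray:
    """Binarize real-valued outputs into {-1, +1} codes (sign, with 0 mapped to +1)."""
    return np.where(features >= 0, 1, -1).astype(np.int8)


def hamming_top_n(query: np.ndarray, database: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n database codes nearest to `query` in Hamming distance."""
    k = query.shape[0]
    # For {-1, +1} codes of length k: Hamming distance = (k - <q, b>) / 2.
    dist = (k - database.astype(np.int32) @ query.astype(np.int32)) // 2
    return np.argsort(dist, kind="stable")[:n]


# Step three (hypothetical data): encode the other modality's samples once and store them.
db_codes = to_hash_codes(np.array([[0.9, -0.2, 0.4, -0.7],
                                   [-0.5, 0.3, -0.8, 0.1],
                                   [0.6, 0.2, 0.3, -0.4]]))
# Step four: encode the query and return the N nearest stored items.
query_code = to_hash_codes(np.array([0.7, -0.1, 0.2, 0.3]))
print(hamming_top_n(query_code, db_codes, n=2))  # → [0 2]
```

With N = 2, the query differs from item 0 in one bit and from item 2 in two bits, so those two items are returned in that order.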
In the second step, the model parameters are optimized iteratively by alternation: one group of parameters is held fixed while the remaining parameters are optimized.
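The text does not say which parameter groups are alternated. As a hedged illustration of the "fix one, optimize the other" scheme, the toy problem below alternates between binary codes B and a linear projection W for the objective ||B - XW||²: with W fixed the optimal B is sign(XW), and with B fixed the optimal W is a least-squares solution. This objective and `alternating_fit` are assumptions, not the patent's loss.

```python
import numpy as np


def alternating_fit(x: np.ndarray, n_bits: int, n_iters: int = 10, seed: int = 0):
    """Toy alternating optimization of ||B - X W||^2 over codes B in {-1, +1} and W.

    ASSUMPTION: illustrative objective only, not the patent's loss function.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[1], n_bits))
    history = []
    for _ in range(n_iters):
        # Fix W, optimize B: the elementwise sign of X W is the exact minimizer.
        b = np.where(x @ w >= 0, 1.0, -1.0)
        # Fix B, optimize W: an ordinary least-squares problem.
        w, *_ = np.linalg.lstsq(x, b, rcond=None)
        history.append(float(np.sum((b - x @ w) ** 2)))
    return b, w, history


x = np.random.default_rng(1).standard_normal((50, 16))
codes, proj, hist = alternating_fit(x, n_bits=8)
print(codes.shape, hist[0] >= hist[-1])
```

Because each half-step solves its subproblem exactly, the recorded objective never increases, which is the usual argument for the convergence of such block-coordinate schemes.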
In the second step, the hash retrieval model is optimized through the following sub-steps:
(1) generate a similarity matrix S with multi-level semantics;
(2) extract features of the image modality to obtain the image-modality features $P_i$, classify the images, and output attention maps;
(3) multiply the obtained feature map element-wise by the attention map to obtain the image-modality feature representation $F_i$, and obtain the text-modality feature representation $F_j$;
(4) iteratively optimize the hash retrieval model with the loss function to obtain the final optimized hash retrieval model.
In step (1), the multi-level semantic similarity matrix S is expressed as:
Figure BDA0002551903960000021
where $|C_i|$ and $|C_j|$ are the numbers of categories of sample i and sample j respectively, and $D(i,j)$ is the number of categories the two samples share. Each entry $S_{ij} \in [0,1]$ for samples i and j is thus a graded rather than binary similarity, which makes the generated S matrix more discriminative.
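The formula itself survives only as an image. To make the roles of $|C_i|$, $|C_j|$ and $D(i,j)$ concrete, the sketch below computes them from multi-hot label vectors and combines them with the Jaccard index $D(i,j)/(|C_i|+|C_j|-D(i,j))$, which also lies in $[0,1]$. The Jaccard normalization is an assumed stand-in, not a reconstruction of the patent's equation, and `multilevel_similarity` is a hypothetical name.

```python
import numpy as np


def multilevel_similarity(labels: np.ndarray) -> np.ndarray:
    """Graded similarity from multi-hot labels (rows = samples, columns = categories).

    ASSUMPTION: Jaccard index D / (|C_i| + |C_j| - D) as a stand-in for the
    patent's formula, which is not reproduced in the text.
    """
    y = labels.astype(np.float64)
    shared = y @ y.T                   # D(i, j): categories shared by samples i and j
    counts = y.sum(axis=1)             # |C_i|: number of categories of sample i
    union = counts[:, None] + counts[None, :] - shared
    # Samples without any label get similarity 0 instead of 0 / 0.
    return np.divide(shared, union, out=np.zeros_like(shared), where=union > 0)


labels = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 1, 1]])
sim = multilevel_similarity(labels)
print(sim)
```

For these labels, samples 0 and 1 share one of two distinct categories, giving $S_{01} = 0.5$, while samples 0 and 2 share none, giving $S_{02} = 0$.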
In step (2), a ResNet-101 network is used for feature extraction to obtain the image-modality features $P_i$. Meanwhile, a ResNet-101 network with its fully connected layer removed and an average pooling layer added outputs the sample class scores to classify the images; an attention mechanism added at the last layer outputs attention maps in which the regions deserving focus are activated.
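Removing the fully connected layer, adding average pooling, and reading attention off the last layer closely resembles class activation mapping (CAM). Assuming that interpretation, which the text does not confirm, the sketch below takes the last convolutional feature map (for example from a ResNet-101 backbone), computes class scores with global average pooling and a linear layer, and builds an attention map by weighting the channels with the top class's classifier weights. `class_activation_attention` and the toy inputs are hypothetical.

```python
import numpy as np


def class_activation_attention(feature_map: np.ndarray, class_weights: np.ndarray):
    """Class scores and a CAM-style attention map from a GAP + linear head.

    feature_map:   (C, H, W) output of the last convolutional stage.
    class_weights: (num_classes, C) weights of a hypothetical linear classifier.
    """
    pooled = feature_map.mean(axis=(1, 2))       # global average pooling -> (C,)
    scores = class_weights @ pooled              # class scores -> (num_classes,)
    top = int(np.argmax(scores))
    # Sum the channels weighted by the top class's classifier weights -> (H, W).
    cam = np.einsum("c,chw->hw", class_weights[top], feature_map)
    cam = cam - cam.min()
    peak = cam.max()
    attention = cam / peak if peak > 0 else np.zeros_like(cam)  # scaled to [0, 1]
    return scores, attention


rng = np.random.default_rng(0)
features = rng.random((4, 7, 7))
features[2, 3, 5] += 10.0            # strong activation in channel 2 at (3, 5)
weights = np.zeros((3, 4))
weights[1, 2] = 1.0                  # class 1 relies only on channel 2
scores, attn = class_activation_attention(features, weights)
print(int(np.argmax(scores)), np.unravel_index(np.argmax(attn), attn.shape))
```

The attention map has the spatial size of the feature map, and its peak lands where the channels that drive the predicted class respond most strongly.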
In step (3), the obtained feature map is multiplied element-wise (dot multiplication) by the attention map, and the result is fed into a fully connected layer to obtain the image-modality feature representation $F_i$; the bag-of-words (BOW) representation of the text modality is fed into a fully connected layer to obtain the text-modality feature representation $F_j$.
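Step (3) can be sketched in NumPy as follows. The (H, W) attention map is broadcast over the channels before the element-wise product, which matches the description of a mask of the same size as the image representation. The tanh activation, the layer sizes, the hash length k = 16, and the random weights are assumptions for illustration only.

```python
import numpy as np


def image_representation(feature_map: np.ndarray, attention: np.ndarray,
                         w_img: np.ndarray, b_img: np.ndarray) -> np.ndarray:
    # Element-wise product: the (H, W) attention mask is broadcast over all C channels.
    attended = feature_map * attention[None, :, :]
    # Flatten and apply a fully connected layer; tanh (assumed) keeps outputs in (-1, 1).
    return np.tanh(w_img @ attended.reshape(-1) + b_img)


def text_representation(bow: np.ndarray, w_txt: np.ndarray, b_txt: np.ndarray) -> np.ndarray:
    # Bag-of-words vector -> fully connected layer -> tanh (assumed activation).
    return np.tanh(w_txt @ bow + b_txt)


rng = np.random.default_rng(0)
c, h, w, vocab, k = 4, 7, 7, 50, 16          # hypothetical sizes; k = hash code length
feature_map = rng.standard_normal((c, h, w))
attention = rng.random((h, w))                # stand-in attention map in [0, 1)
bow = rng.integers(0, 2, size=vocab).astype(float)
w_img, b_img = 0.1 * rng.standard_normal((k, c * h * w)), np.zeros(k)
w_txt, b_txt = 0.1 * rng.standard_normal((k, vocab)), np.zeros(k)

f_i = image_representation(feature_map, attention, w_img, b_img)
f_j = text_representation(bow, w_txt, b_txt)
print(f_i.shape, f_j.shape)
```

Setting the attention map to zero removes all image content and leaves only the bias term, a quick check that the mask really gates the features.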
In step (4), the loss function is expressed as:
Figure BDA0002551903960000031
where $S_{ij}$ is the entry of the similarity matrix for samples i and j, $\sigma$ is a hyperparameter balancing the penalty terms and the data loss term, $F_i^{T}$ is the transpose of the image-modality feature representation, $F_j$ is the text-modality feature representation, $L_2$ is the commonly used quantization loss, and $L_3$ is the bit-balance penalty.
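Since the loss also appears only as an image, the following is a hedged stand-in modelled on the widely used DCMH-style objective: a negative log-likelihood on $\Theta_{ij} = \tfrac{1}{2}F_i^{T}F_j$ with the graded $S_{ij}$ as targets, plus a quantization loss $L_2$ and a bit-balance penalty $L_3$ weighted by $\sigma$. The shared codes $B = \mathrm{sign}(F + G)$, the exact weighting, and the name `hashing_loss` are assumptions, not the patent's formula.

```python
import numpy as np


def hashing_loss(f_img: np.ndarray, g_txt: np.ndarray, s: np.ndarray, sigma: float = 1.0) -> float:
    """ASSUMPTION: DCMH-style stand-in, not the patent's exact loss.

    f_img: (n, k) image representations F_i;  g_txt: (n, k) text representations F_j;
    s: (n, n) graded similarities S_ij in [0, 1];  sigma weights the penalty terms.
    """
    theta = 0.5 * f_img @ g_txt.T                                   # Theta_ij = F_i^T F_j / 2
    # Data term: negative log-likelihood of S under the logistic function of Theta,
    # using logaddexp(0, t) = log(1 + exp(t)) for numerical stability.
    data_loss = -np.sum(s * theta - np.logaddexp(0.0, theta))
    codes = np.where(f_img + g_txt >= 0, 1.0, -1.0)                 # shared binary codes B
    quantization = np.sum((codes - f_img) ** 2) + np.sum((codes - g_txt) ** 2)     # L_2
    bit_balance = np.sum(f_img.sum(axis=0) ** 2) + np.sum(g_txt.sum(axis=0) ** 2)  # L_3
    return float(data_loss + sigma * (quantization + bit_balance))


s = np.eye(2)                                 # two samples, each similar only to itself
f = np.array([[1.0, 1.0], [-1.0, -1.0]])
aligned = hashing_loss(f, f.copy(), s)
flipped = hashing_loss(f, -f, s)
print(aligned < flipped)  # → True
```

Flipping the text representations so that each pair disagrees with its own image representation raises both the data term and the quantization term.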
Beneficial effects of the invention: the single-modality image hash retrieval method uses an attention mechanism to extract semantic information from the image modality, improving the quality of the hash functions generated by the hash retrieval model; meanwhile, multi-level semantic supervision improves retrieval precision on multi-label data and places the best-matching items at the top of the final retrieval results, thereby greatly improving retrieval efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a single-mode image hash retrieval method according to the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The single-modality image hash retrieval method comprises four parts: image preprocessing, image feature extraction, attention map output, and hash retrieval model generation;
first, a multi-level semantic similarity matrix is defined to preserve the rich semantic information in multi-label data; at the same time, an attention mechanism automatically locates the key regions of an image and learns a mask of the same size as the image representation, so as to extract the semantic information in the image modality and help the hash retrieval model learn higher-quality hash functions.
The specific implementation steps of the single-modality image hash retrieval method are as follows:
first, acquire the original images of the training set and input them into different residual networks respectively;
second, input the training samples into the hash retrieval model and optimize its parameters by minimizing the loss function;
third, fix the model, obtain the hash codes of all samples through the hash retrieval model, and store them in a database for later use;
and fourth, to perform a retrieval task with the hash retrieval model, simply input a sample of either modality into the model to generate its hash code, then search the hash code database of the other modality for the N hash codes (N user-defined) with the smallest Hamming distance, and return the corresponding samples.
In the second step, the model parameters are optimized iteratively by alternation: one group of parameters is held fixed while the remaining parameters are optimized.
In the second step, the hash retrieval model is optimized through the following sub-steps:
(1) generate a similarity matrix S with multi-level semantics;
(2) extract features of the image modality to obtain the image-modality features $P_i$, classify the images, and output attention maps;
(3) multiply the obtained feature map element-wise by the attention map to obtain the image-modality feature representation $F_i$, and obtain the text-modality feature representation $F_j$;
(4) iteratively optimize the hash retrieval model with the loss function to obtain the final optimized hash retrieval model.
In step (1), the multi-level semantic similarity matrix S is expressed as:
Figure BDA0002551903960000041
where $|C_i|$ and $|C_j|$ are the numbers of categories of sample i and sample j respectively, and $D(i,j)$ is the number of categories the two samples share. Each entry $S_{ij} \in [0,1]$ for samples i and j is thus a graded rather than binary similarity, which makes the generated S matrix more discriminative.
In step (2), a ResNet-101 network is used for feature extraction to obtain the image-modality features $P_i$. Meanwhile, a ResNet-101 network with its fully connected layer removed and an average pooling layer added outputs the sample class scores to classify the images; an attention mechanism added at the last layer outputs attention maps in which the regions deserving focus are activated.
In step (3), the obtained feature map is multiplied element-wise (dot multiplication) by the attention map, and the result is fed into a fully connected layer to obtain the image-modality feature representation $F_i$; the bag-of-words (BOW) representation of the text modality is fed into a fully connected layer to obtain the text-modality feature representation $F_j$.
In step (4), the loss function is expressed as:
Figure BDA0002551903960000051
where $S_{ij}$ is the entry of the similarity matrix for samples i and j, $\sigma$ is a hyperparameter balancing the penalty terms and the data loss term, $F_i^{T}$ is the transpose of the image-modality feature representation, $F_j$ is the text-modality feature representation, $L_2$ is the commonly used quantization loss, and $L_3$ is the bit-balance penalty.
Compared with the prior art, the single-modality image hash retrieval method has the following features:
First, the key regions are separated from redundant data and highlighted, which improves retrieval efficiency;
In conventional image retrieval, most existing hash retrieval models cannot make full use of the image information: during hash learning, background and redundant information is treated on an equal footing with the regions that deserve focus. By separating the background and redundant information from those regions, the method can greatly improve retrieval efficiency.
Second, the attention mechanism lets the hash retrieval model focus on more discriminative regions, improving the quality of the hash functions the model generates;
In recent years, attention mechanisms have been widely applied in computer vision with good results. In image recognition, an attention mechanism automatically finds the parts of an image that deserve focus: it learns a mask of the same size as the image representation, in which positions corresponding to attention regions take higher values. Integrating the attention mechanism into the hash retrieval method makes the method more interpretable and improves retrieval efficiency.
Third, multi-level semantic supervision improves retrieval precision on multi-label data, so that the best-matching items appear at the top of the final retrieval results.
The embodiment described above is only one specific embodiment of the present invention; ordinary changes and substitutions made by those skilled in the art within the technical scope of the invention fall within its protection scope.

Claims (8)

1. A single-modality image hash retrieval method, characterized in that it comprises four parts: image preprocessing, image feature extraction, attention map output, and hash retrieval model generation;
first, a multi-level semantic similarity matrix is defined to preserve the rich semantic information in multi-label data; at the same time, an attention mechanism automatically locates the key regions of an image and learns a mask of the same size as the image representation, so as to extract the semantic information in the image modality and help the hash retrieval model learn higher-quality hash functions.
2. The single-modality image hash retrieval method according to claim 1, characterized by comprising the following specific implementation steps:
first, acquire the original images of the training set and input them into different residual networks respectively;
second, input the training samples into the hash retrieval model and optimize its parameters by minimizing the loss function;
third, fix the model, obtain the hash codes of all samples through the hash retrieval model, and store them in a database for later use;
and fourth, to perform a retrieval task with the hash retrieval model, input a sample of either modality into the model to generate its hash code, search the hash code database of the other modality for the n hash codes with the smallest Hamming distance, and return the corresponding samples.
3. The single-modality image hash retrieval method according to claim 2, characterized in that in the second step the model parameters are optimized iteratively by alternation: one group of parameters is held fixed while the remaining parameters are optimized.
4. The single-modality image hash retrieval method according to claim 3, characterized in that in the second step the hash retrieval model is optimized through the following sub-steps:
(1) generate a similarity matrix S with multi-level semantics;
(2) extract features of the image modality to obtain the image-modality features $P_i$, classify the images, and output attention maps;
(3) multiply the obtained feature map element-wise by the attention map to obtain the image-modality feature representation $F_i$, and obtain the text-modality feature representation $F_j$;
(4) iteratively optimize the hash retrieval model with the loss function to obtain the final optimized hash retrieval model.
5. The single-modality image hash retrieval method according to claim 4, characterized in that in step (1) the multi-level semantic similarity matrix S is expressed as:
Figure FDA0002551903950000011
where $|C_i|$ and $|C_j|$ are the numbers of categories of sample i and sample j respectively, and $D(i,j)$ is the number of categories the two samples share. Each entry $S_{ij} \in [0,1]$ for samples i and j is thus a graded rather than binary similarity, which makes the generated S matrix more discriminative.
6. The single-modality image hash retrieval method according to claim 4, characterized in that in step (2) a ResNet-101 network is used for feature extraction to obtain the image-modality features $P_i$; meanwhile, a ResNet-101 network with its fully connected layer removed and an average pooling layer added outputs the sample class scores to classify the images, and an attention mechanism added at the last layer outputs attention maps in which the regions deserving focus are activated.
7. The single-modality image hash retrieval method according to claim 6, characterized in that in step (3) the obtained feature map is multiplied element-wise by the attention map, and the result is fed into a fully connected layer to obtain the image-modality feature representation $F_i$; the BOW representation of the text modality is fed into a fully connected layer to obtain the text-modality feature representation $F_j$.
8. The single-modality image hash retrieval method according to claim 4, 5, 6 or 7, characterized in that in step (4) the loss function is expressed as:
Figure FDA0002551903950000021
where $S_{ij}$ is the entry of the similarity matrix for samples i and j, $\sigma$ is a hyperparameter balancing the penalty terms and the data loss term, $F_i^{T}$ is the transpose of the image-modality feature representation, $F_j$ is the text-modality feature representation, $L_2$ is the commonly used quantization loss, and $L_3$ is the bit-balance penalty.
CN202010577850.3A 2020-06-23 2020-06-23 Single-mode image Hash retrieval method Pending CN111737507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010577850.3A CN111737507A (en) 2020-06-23 2020-06-23 Single-mode image Hash retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010577850.3A CN111737507A (en) 2020-06-23 2020-06-23 Single-mode image Hash retrieval method

Publications (1)

Publication Number Publication Date
CN111737507A (en) 2020-10-02

Family

ID=72650704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010577850.3A Pending CN111737507A (en) 2020-06-23 2020-06-23 Single-mode image Hash retrieval method

Country Status (1)

Country Link
CN (1) CN111737507A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003698A (en) * 2021-12-27 2022-02-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium
CN114003698B (en) * 2021-12-27 2022-04-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113283551B (en) Training method and training device of multi-mode pre-training model and electronic equipment
CN108595708A (en) A kind of exception information file classification method of knowledge based collection of illustrative plates
CN110599592B (en) Three-dimensional indoor scene reconstruction method based on text
CN110083729B (en) Image searching method and system
CN108509521B (en) Image retrieval method for automatically generating text index
CN113672693B (en) Label recommendation method of online question-answering platform based on knowledge graph and label association
CN112100413A (en) Cross-modal Hash retrieval method
CN111324765A (en) Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation
CN110598022B (en) Image retrieval system and method based on robust deep hash network
CN111858984A (en) Image matching method based on attention mechanism Hash retrieval
CN115238690A (en) Military field composite named entity identification method based on BERT
Patel et al. Dynamic lexicon generation for natural scene images
CN111125457A (en) Deep cross-modal Hash retrieval method and device
CN114461890A (en) Hierarchical multi-modal intellectual property search engine method and system
CN114780582A (en) Natural answer generating system and method based on form question and answer
Al-Barhamtoshy et al. Arabic documents information retrieval for printed, handwritten, and calligraphy image
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN117010500A (en) Visual knowledge reasoning question-answering method based on multi-source heterogeneous knowledge joint enhancement
CN113761377B (en) False information detection method and device based on attention mechanism multi-feature fusion, electronic equipment and storage medium
CN114328934A (en) Attention mechanism-based multi-label text classification method and system
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
US11494431B2 (en) Generating accurate and natural captions for figures
CN111737507A (en) Single-mode image Hash retrieval method
CN116227486A (en) Emotion analysis method based on retrieval and contrast learning
CN115359486A (en) Method and system for determining custom information in document image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination