CN110851629A - Image retrieval method - Google Patents
- Publication number
- CN110851629A CN110851629A CN201910971299.8A CN201910971299A CN110851629A CN 110851629 A CN110851629 A CN 110851629A CN 201910971299 A CN201910971299 A CN 201910971299A CN 110851629 A CN110851629 A CN 110851629A
- Authority
- CN
- China
- Prior art keywords
- features
- feature
- information
- sample points
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Abstract
The invention discloses an image retrieval method, which comprises the steps of: selecting a metadata set to construct a database, then classifying the metadata set and dividing it into visual information and text description information; extracting text features from the text description information and category features of the visual features from the visual information, then constructing a source domain based on the visual features and an auxiliary domain based on the text features; abstracting the text features and visual features into sample points, first reducing the dimensionality of the high-dimensional sample points, then enhancing all sample points, and finally fusing the features of the enhanced sample points; and performing similarity matching with a cosine similarity method to retrieve similar images. The invention learns both the visual information of an image and its text information, thereby improving the accuracy of image retrieval.
Description
Technical Field
The invention relates to the field of retrieval, in particular to an image retrieval method.
Background
At present, the number of pictures stored on the network is growing explosively, and the number of users of different types of social networks and media is also continuously increasing. Under these conditions, the types of multimedia data uploaded by users have changed: the content users share is no longer plain visual information, but images accompanied by additional data of other types such as descriptive sentences, user-defined tags, shooting time, and scene location. Images in social networks therefore carry not only their own visual information but also text, time, and other modal information. In such a multimodal data environment, if image retrieval generates features from the visual information of an image alone, the effective clues provided by the large amount of other modal information are discarded, which directly degrades retrieval performance. How to construct representative image features for fast and effective image retrieval is therefore an urgent problem to be solved.
Conventional image retrieval methods, however, generally process only single-modal information of an image and omit other modal information such as text, time, and place. The resulting features therefore exhibit a large bias during retrieval, and it is difficult to meet users' needs for fast and effective retrieval in a multimodal data mashup environment.
Disclosure of Invention
The present invention is directed to an image retrieval method that solves the above problems of the prior art by learning both the visual information of an image and its text information, thereby improving the accuracy of image retrieval.
To achieve this object, the invention provides the following scheme: an image retrieval method comprising the following steps:
step one, selecting a lightweight multimodal image data set as a metadata set to construct a database, then classifying the metadata set and dividing it into two types of information: visual information and text description information;
step two, extracting text features from the text description information and category features of the visual features from the visual information of step one, constructing a source domain based on the visual features and an auxiliary domain based on the text features, and capturing the commonalities among the source domain, the auxiliary domain, and a target domain;
step three, abstracting the text features and visual features into sample points, first extracting the high-dimensional sample points of each feature, next reducing the dimensionality of the high-dimensional sample points, then enhancing all sample points, and finally fusing the features of the enhanced sample points;
and step four, performing similarity matching across the multiple modalities with a cosine similarity method, thereby retrieving similar images.
Preferably, the preprocessing in step one is as follows: the metadata set first extracts image features using fused convolutional layers, with fully connected layers added between the fused convolutional layers to reduce the loss of feature information.
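As a rough illustration of convolutional feature extraction with a fully connected layer inserted between convolutional layers: the patent does not give layer sizes or kernel shapes, so everything below is an assumed toy configuration in plain numpy, not the patent's architecture.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution of a single-channel image x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def fully_connected(x, w, b):
    """Fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))                 # toy single-channel image

f1 = conv2d(img, rng.standard_normal((3, 3)))       # first convolutional layer -> 14x14
# Fully connected layer inserted between the convolutional layers,
# intended to reduce the loss of feature information between stages
h = fully_connected(f1.reshape(-1),
                    rng.standard_normal((196, 64)), np.zeros(64))
f2 = conv2d(h.reshape(8, 8), rng.standard_normal((3, 3)))  # second conv -> 6x6
feature = f2.reshape(-1)                            # flattened image feature
```

The toy sizes (16x16 input, 3x3 kernels, 64-unit fully connected layer) are arbitrary; the point is only the conv → fully connected → conv arrangement the text describes.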
Preferably, the sample-point enhancement in step three proceeds as follows: each sample point undergoes one linear transformation that maps its feature expression onto a feature plane to form feature nodes, and the resulting feature nodes are passed through the nonlinear transformation of an activation function to generate enhancement nodes.
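The feature-node/enhancement-node construction above can be sketched in plain numpy. This is only an illustration of the two transformations the text names; the dimensions, random weights, and the choice of tanh as the activation function are assumptions, not the patent's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))     # 5 sample points with 20-dim features

# One linear transformation maps each sample point onto the feature plane,
# producing the feature nodes
W_f = rng.standard_normal((20, 10))
b_f = rng.standard_normal(10)
feature_nodes = X @ W_f + b_f

# The feature nodes pass through the nonlinear transformation of an
# activation function (tanh here) to generate the enhancement nodes
W_e = rng.standard_normal((10, 8))
b_e = rng.standard_normal(8)
enhancement_nodes = np.tanh(feature_nodes @ W_e + b_e)

# A simple fusion: concatenate feature nodes and enhancement nodes
fused = np.concatenate([feature_nodes, enhancement_nodes], axis=1)
```

The node arrangement mirrors the broad learning construction cited in the non-patent literature, but the concatenation step is an assumed fusion, not the patent's stated one.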
Preferably, weights are set according to the commonality among the cross-domain features, the feature vectors of the source domain, target domain, and auxiliary domain are computed from the Laplacian matrix, canonical correlation analysis (CCA) is used to efficiently reduce the dimensionality of the high-dimensional feature data and fuse it, and the fused feature vectors serve as the image features.
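The CCA-based reduction and fusion can be sketched with the classical whitening-plus-SVD formulation of canonical correlation analysis. The random data, the dimensions, the regularization term, and the concatenation at the end are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def cca(X, Y, k, reg=1e-6):
    """Project paired views X (n, dx) and Y (n, dy) onto their top-k
    canonical directions (whitening + SVD formulation of CCA)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # view-1 covariance
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])   # view-2 covariance
    Cxy = Xc.T @ Yc / n                              # cross-covariance
    # Whiten each view, then take the SVD of the whitened cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    A, B = Wx @ U[:, :k], Wy @ Vt.T[:, :k]           # canonical directions
    return Xc @ A, Yc @ B

rng = np.random.default_rng(0)
visual = rng.standard_normal((50, 32))   # hypothetical visual features
text = rng.standard_normal((50, 16))     # hypothetical text features

vp, tp = cca(visual, text, k=4)          # dimensionality reduction to k=4
fused = np.concatenate([vp, tp], axis=1) # fused feature vector per image
```

After whitening, the singular values are exactly the canonical correlations, and the paired projections are maximally correlated in order.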
The invention discloses the following technical effects. Unlike traditional image retrieval algorithms that extract features from single-modal information, the method integrates transfer learning into image feature construction: cross-modal transfer learning is realized using the text features and visual features of images, and the final image features are the result of adjusting the visual features according to the text features obtained by transfer learning. Because other modal information such as text, time, and place is not omitted, the generated features avoid the large retrieval bias of single-modal features and can meet users' needs for fast and effective retrieval in a multimodal data mashup environment. The method learns the visual information of an image and its text information simultaneously, thereby improving the accuracy and stability of image retrieval.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 shows the preprocessing process of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The Wiki database is used herein as the database. It comprises 2866 pictures in ten categories such as art, biology, and geography, together with each picture's corresponding text description from Wikipedia; the 2866 pictures and their text descriptions form the metadata set. Image features are first extracted using fused convolutional layers, with fully connected layers added between the fused convolutional layers to reduce the loss of feature information, thereby realizing the classified extraction of the metadata set, i.e., dividing it into two types of information: visual information and text description information.
Text features are then extracted from the text description information and category features of the visual features from the visual information; a source domain is constructed based on the visual features and an auxiliary domain based on the text features, and the commonalities among the source domain, the auxiliary domain, and a target domain are captured. Because the text features and visual features have different dimensionalities, they must be processed with a dimensionality-reduction algorithm before information fusion across modalities is possible. The high-dimensional features are first reduced in dimensionality, the reduced features are then enhanced, and the features of the enhanced sample points are finally fused.
The enhancement proceeds as follows: the features are abstracted into sample points; each sample point undergoes one linear transformation that maps its feature expression onto a feature plane to form feature nodes, and the resulting feature nodes are passed through the nonlinear transformation of an activation function to generate enhancement nodes.
Finally, similarity matching is performed with a cosine similarity method: the feature vector of the image to be retrieved is measured against a feature library, the feature indices with the highest similarity are returned, the corresponding pictures are then found in the image set and sorted in decreasing order of similarity, and the first k pictures are displayed as the retrieval result.
The similarity matching process is as follows: the fused feature vectors are extracted, and similarity is judged by comparing the cosine of the angle between feature vectors.
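The cosine-angle comparison and top-k return described above can be sketched in a few lines of numpy. The tiny 2-D "feature library" is a made-up example, not data from the patent.

```python
import numpy as np

def retrieve_top_k(query, library, k=3):
    """Rank library feature vectors by cosine similarity to the query
    and return the indices and scores of the k most similar images."""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q                      # cosine of the angle to each vector
    order = np.argsort(-sims)[:k]       # decreasing similarity, first k
    return order, sims[order]

# Toy feature library of four fused image features (2-D for illustration)
library = np.array([[1.0, 0.0],
                    [0.6, 0.8],
                    [0.0, 1.0],
                    [0.9, 0.1]])
idx, scores = retrieve_top_k(np.array([1.0, 0.1]), library, k=2)
```

The returned indices would then be used to look up the corresponding pictures in the image set and display the first k results.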
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The above-described embodiments merely illustrate preferred modes of the present invention and do not limit its scope. Various modifications and improvements made by those skilled in the art to the technical solution of the present invention without departing from its spirit shall fall within the protection scope defined by the claims.
Claims (4)
1. A method of image retrieval, comprising the steps of:
step one, selecting a lightweight multimodal image data set as a metadata set to construct a database, then classifying the metadata set and dividing it into two types of information: visual information and text description information;
step two, extracting text features from the text description information and category features of the visual features from the visual information of step one, then constructing a source domain based on the visual features and an auxiliary domain based on the text features, and capturing the commonalities among the source domain, the auxiliary domain, and a target domain;
step three, abstracting the text features and visual features into sample points, first extracting the high-dimensional sample points of each feature, next reducing the dimensionality of the high-dimensional sample points, then enhancing all sample points, and finally fusing the features of the enhanced sample points;
and step four, performing similarity matching across the multiple modalities with a cosine similarity method, thereby retrieving similar images.
2. The image retrieval method according to claim 1, wherein the preprocessing in step one is as follows: the metadata set first extracts image features using fused convolutional layers, with fully connected layers added between the fused convolutional layers to reduce the loss of feature information.
3. The image retrieval method according to claim 1, wherein the sample-point enhancement in step three proceeds as follows: each sample point undergoes one linear transformation that maps its feature expression onto a feature plane to form feature nodes, and the resulting feature nodes are passed through the nonlinear transformation of an activation function to generate enhancement nodes.
4. The image retrieval method according to claim 1, wherein step four specifically comprises: setting weights according to the commonality among the cross-domain features, computing the feature vectors of the source domain, target domain, and auxiliary domain from the Laplacian matrix, using canonical correlation analysis (CCA) to efficiently reduce the dimensionality of the high-dimensional feature data and fuse it, and taking the fused feature vectors as the image features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910971299.8A CN110851629A (en) | 2019-10-14 | 2019-10-14 | Image retrieval method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910971299.8A CN110851629A (en) | 2019-10-14 | 2019-10-14 | Image retrieval method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110851629A true CN110851629A (en) | 2020-02-28 |
Family
ID=69596312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910971299.8A Pending CN110851629A (en) | 2019-10-14 | 2019-10-14 | Image retrieval method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110851629A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111428063A (en) * | 2020-03-31 | 2020-07-17 | 杭州博雅鸿图视频技术有限公司 | Image feature association processing method and system based on geographic spatial position division |
WO2021180109A1 (en) * | 2020-03-10 | 2021-09-16 | 华为技术有限公司 | Electronic device and search method thereof, and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9177225B1 (en) * | 2014-07-03 | 2015-11-03 | Oim Squared Inc. | Interactive content generation |
CN108595636A (en) * | 2018-04-25 | 2018-09-28 | 复旦大学 | The image search method of cartographical sketching based on depth cross-module state correlation study |
CN108829847A (en) * | 2018-06-20 | 2018-11-16 | 山东大学 | Commodity search method and system based on multi-modal shopping preferences |
CN110298395A (en) * | 2019-06-18 | 2019-10-01 | 天津大学 | A kind of picture and text matching process based on three mode confrontation network |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9177225B1 (en) * | 2014-07-03 | 2015-11-03 | Oim Squared Inc. | Interactive content generation |
CN108595636A (en) * | 2018-04-25 | 2018-09-28 | 复旦大学 | The image search method of cartographical sketching based on depth cross-module state correlation study |
CN108829847A (en) * | 2018-06-20 | 2018-11-16 | 山东大学 | Commodity search method and system based on multi-modal shopping preferences |
CN110298395A (en) * | 2019-06-18 | 2019-10-01 | 天津大学 | A kind of picture and text matching process based on three mode confrontation network |
Non-Patent Citations (4)
Title |
---|
Li Xiaoyu et al.: "Image Retrieval Algorithm Based on Transfer Learning", Computer Science *
Wang Yiding et al.: "Digital Image Processing", 31 August 2015 *
Jia Chen et al.: "Multimodal Information Fusion Based on the Broad Learning Method", CAAI Transactions on Intelligent Systems *
Guo Baolong et al.: "Introduction to Digital Image Processing Systems Engineering", 31 July 2012 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021180109A1 (en) * | 2020-03-10 | 2021-09-16 | 华为技术有限公司 | Electronic device and search method thereof, and medium |
CN111428063A (en) * | 2020-03-31 | 2020-07-17 | 杭州博雅鸿图视频技术有限公司 | Image feature association processing method and system based on geographic spatial position division |
CN111428063B (en) * | 2020-03-31 | 2023-06-30 | 杭州博雅鸿图视频技术有限公司 | Image feature association processing method and system based on geographic space position division |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111159409B (en) | Text classification method, device, equipment and medium based on artificial intelligence | |
CN114332680A (en) | Image processing method, video searching method, image processing device, video searching device, computer equipment and storage medium | |
CN113627447A (en) | Label identification method, label identification device, computer equipment, storage medium and program product | |
CN116580257A (en) | Feature fusion model training and sample retrieval method and device and computer equipment | |
EP4310695A1 (en) | Data processing method and apparatus, computer device, and storage medium | |
CN115062134B (en) | Knowledge question-answering model training and knowledge question-answering method, device and computer equipment | |
CN115269913A (en) | Video retrieval method based on attention fragment prompt | |
CN114332679A (en) | Video processing method, device, equipment, storage medium and computer program product | |
JP7181999B2 (en) | SEARCH METHOD AND SEARCH DEVICE, STORAGE MEDIUM | |
CN112085120A (en) | Multimedia data processing method and device, electronic equipment and storage medium | |
CN110851629A (en) | Image retrieval method | |
Lu et al. | Web multimedia object classification using cross-domain correlation knowledge | |
CN114330704A (en) | Statement generation model updating method and device, computer equipment and storage medium | |
CN116578738B (en) | Graph-text retrieval method and device based on graph attention and generating countermeasure network | |
CN110580294B (en) | Entity fusion method, device, equipment and storage medium | |
JP2012194691A (en) | Re-learning method and program of discriminator, image recognition device | |
CN116977701A (en) | Video classification model training method, video classification method and device | |
CN114398973B (en) | Media content tag identification method, device, equipment and storage medium | |
CN116955707A (en) | Content tag determination method, device, equipment, medium and program product | |
CN115204301A (en) | Video text matching model training method and device and video text matching method and device | |
CN114443916A (en) | Supply and demand matching method and system for test data | |
CN113297485A (en) | Method for generating cross-modal representation vector and cross-modal recommendation method | |
CN112287159A (en) | Retrieval method, electronic device and computer readable medium | |
CN111581335A (en) | Text representation method and device | |
CN115168599B (en) | Multi-triplet extraction method, device, equipment, medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200228 |