CN110851629A - Image retrieval method - Google Patents

Image retrieval method

Info

Publication number
CN110851629A
Authority
CN
China
Legal status (assumed, not a legal conclusion): Pending
Application number
CN201910971299.8A
Other languages
Chinese (zh)
Inventor
赵喜玲
周瑞乾
牛炳麟
马原
张伯琨
黄蓉
徐莉
郑铭海
Current Assignee
Xinyang Agriculture and Forestry University
Original Assignee
Xinyang Agriculture and Forestry University
Priority date
Application filed by Xinyang Agriculture and Forestry University
Priority to CN201910971299.8A
Publication of CN110851629A

Classifications

    • G06F16/51 — Information retrieval of still image data: indexing; data structures therefor; storage structures
    • G06F16/583 — Information retrieval of still image data: retrieval characterised by using metadata automatically derived from the content
    • G06F18/22 — Pattern recognition: matching criteria, e.g. proximity measures

Abstract

The invention discloses an image retrieval method. A metadata set is selected to construct a database and is then classified into two types of information: visual information and text description information. Text features are extracted from the text description information and category features of the visual features are extracted from the visual information; a source domain is then constructed from the visual features and an auxiliary domain from the text features. The text and visual features are abstracted into points serving as sample points: the high-dimensional sample points are first reduced in dimensionality, all sample points are then enhanced, and the enhanced sample points are finally fused. Similarity matching is performed with a cosine similarity measure to retrieve similar images. The invention learns both the visual information of an image and its accompanying text information, thereby improving the accuracy of image retrieval.

Description

Image retrieval method
Technical Field
The invention relates to the field of retrieval, in particular to an image retrieval method.
Background
At present, the number of pictures stored on the network is growing explosively, and the number of users of social networks and media of different types is also increasing continuously. Under these conditions the multimedia data uploaded to the network by users has changed: shared content is no longer a single image's visual information, but an image accompanied by additional data of other types, such as descriptive sentences, user-defined labels, shooting time and scene location. Images in a social network therefore carry not only their visual information but also text, time and other modal information. In such a multimodal data environment, if only the visual information of an image is used to generate features for retrieval, the effective clues provided by the large amount of other modality information are discarded, which directly degrades retrieval performance. How to construct representative image features that enable fast and effective image retrieval is therefore an urgent problem to be solved.
Conventional image retrieval methods, however, generally process only the single-modality information of an image and ignore other modal information such as text, time and location, so the generated features are strongly biased at retrieval time and can hardly meet users' demand for fast and effective retrieval in a multimodal data mashup environment.
Disclosure of Invention
The present invention aims to provide an image retrieval method that solves the above problems of the prior art by learning both the visual information of an image and its text information, thereby improving the accuracy of image retrieval.
To achieve this purpose, the invention provides the following scheme. The invention provides an image retrieval method comprising the following steps:
step one, selecting a lightweight multimodal image data set as a metadata set to construct a database, then classifying the metadata set into two types of information: visual information and text description information;
step two, extracting text features from the text description information and category features of the visual features from the visual information of step one, then constructing a source domain based on the visual features and an auxiliary domain based on the text features, and capturing the commonalities among the source domain, the auxiliary domain and a target domain;
step three, abstracting the text features and visual features into points serving as sample points, first extracting the high-dimensional sample points of each feature, next reducing the dimensionality of the high-dimensional sample points, then enhancing all sample points, and finally fusing the features of the enhanced sample points;
step four, performing similarity matching across the modalities with a cosine similarity measure, thereby retrieving similar images.
Preferably, the preprocessing in step one is as follows: image features are extracted from the metadata set with fused convolutional layers, and fully connected layers are added between the fused convolutional layers to reduce the loss of feature information.
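As a rough illustration of this preprocessing idea, the sketch below extracts responses from two small convolutional maps, fuses them, and projects the fused map through a fully connected layer to obtain a fixed-length feature. The kernel shapes, the ReLU activation and the random fully connected weights are illustrative assumptions — the patent does not specify the network architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Naive single-channel 'valid'-mode 2-D cross-correlation."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def extract_fused_features(img, kernels, fc_dim=32):
    """Fuse the responses of several convolutional maps, then apply a
    fully connected projection so that feature information from all
    maps is mixed into one fixed-length vector."""
    maps = [np.maximum(conv2d_valid(img, k), 0.0) for k in kernels]  # ReLU
    flat = np.concatenate([m.ravel() for m in maps])                 # fusion
    W = rng.standard_normal((fc_dim, flat.size)) / np.sqrt(flat.size)
    return W @ flat  # fused, fixed-length image feature

img = rng.standard_normal((16, 16))            # toy grayscale image
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
feat = extract_fused_features(img, kernels)
```

In a real system the convolutional kernels and the fully connected weights would be learned rather than random; the sketch only shows the data flow.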
Preferably, the sample-point enhancement in step three is as follows: after one linear transformation, the sample points are mapped onto a feature plane to form feature nodes, and the resulting feature nodes are passed through the nonlinear transformation of an activation function to generate enhancement nodes.
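A minimal numpy sketch of this feature-node/enhancement-node construction, in the style of the Broad Learning System that the scheme resembles; the random weights, the node counts and the choice of tanh as the activation function are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def broad_nodes(X, n_feature=10, n_enhance=10):
    """Map sample points to feature nodes with one linear transform,
    then generate enhancement nodes by passing the feature nodes
    through a nonlinear activation; return both, concatenated."""
    d = X.shape[1]
    Wf = rng.standard_normal((d, n_feature))
    Z = X @ Wf                                  # feature nodes (linear map)
    We = rng.standard_normal((n_feature, n_enhance))
    H = np.tanh(Z @ We)                         # enhancement nodes (nonlinear)
    return np.hstack([Z, H])                    # enhanced representation

X = rng.standard_normal((5, 8))   # 5 sample points, 8-dimensional features
A = broad_nodes(X)                # shape (5, n_feature + n_enhance)
```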
Preferably, weights are set according to the commonality among the cross-domain features, the feature vectors of the source, target and auxiliary domains are computed from the Laplacian matrix, canonical correlation analysis (CCA) is used to efficiently reduce the dimensionality of the high-dimensional feature data and to fuse it, and the fused feature vectors serve as the image features.
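The CCA step can be sketched as follows: each view is whitened, the singular vectors of the whitened cross-covariance give the maximally correlated projection directions, and the projected views are concatenated as the fused image feature. The regularization constant and the concatenation-style fusion are assumptions made for this sketch:

```python
import numpy as np

def cca(X, Y, k=2, reg=1e-6):
    """Canonical correlation analysis: find k projections of the two
    views whose images are maximally correlated, and fuse the views
    by concatenating their projections."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):                      # symmetric inverse square root
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)   # whitened cross-covariance
    Wx, Wy = Kx @ U[:, :k], Ky @ Vt[:k].T
    return np.hstack([X @ Wx, Y @ Wy]), s[:k]  # fused features, correlations

rng = np.random.default_rng(2)
Z = rng.standard_normal((100, 2))                    # shared latent signal
X = np.hstack([Z, rng.standard_normal((100, 3))])    # "visual" view
Y = np.hstack([Z, rng.standard_normal((100, 4))])    # "text" view
fused, corr = cca(X, Y, k=2)
```

Because the two synthetic views share the two columns of `Z` exactly, the top canonical correlations come out close to 1.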
The invention discloses the following technical effects. Unlike traditional image retrieval algorithms that extract features from single-modality information, the method integrates transfer learning into the construction of image features: cross-modal transfer learning is realized with the text features and visual features of the images, and the final image features are the visual features adjusted according to the text features obtained by transfer learning. Because other modal information such as text, time and location is not omitted, the generated features are not biased at retrieval time, and the method can meet users' demand for fast and effective retrieval in a multimodal data mashup environment. The method learns the visual information of an image and its text information simultaneously, thereby improving both the accuracy and the stability of image retrieval.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 shows the preprocessing of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The Wiki database is used here as the database. It comprises 2866 pictures in ten categories such as art, biology and geography, together with the corresponding textual descriptions of the pictures from Wikipedia; the 2866 pictures and their textual descriptions form the metadata set. Image features are first extracted with fused convolutional layers, and fully connected layers are added between the fused convolutional layers to reduce the loss of feature information, so that the metadata set is classified into two types of information: visual information and text description information.
Text features are then extracted from the text description information and category features of the visual features from the visual information; a source domain is constructed based on the visual features and an auxiliary domain based on the text features, and the commonalities among the source domain, the auxiliary domain and a target domain are captured. Because the text features and the visual features have different dimensionalities, they must be processed with a dimensionality reduction algorithm before information can be fused across the modalities: the high-dimensional features are first reduced in dimensionality, the reduced features are then enhanced, and the enhanced sample points are finally fused.
The enhancement proceeds as follows: the features are abstracted into sample points; after one linear transformation, the sample points are mapped onto a feature plane to form feature nodes, and the resulting feature nodes are passed through the nonlinear transformation of an activation function to generate enhancement nodes.
Finally, similarity matching is performed with the cosine similarity method: the feature vector of the image to be retrieved is measured against a feature library, the indices of the most similar features are returned, the corresponding pictures are found in the image set and sorted in descending order of similarity, and the first k pictures are displayed as the retrieval result.
The similarity matching proceeds as follows: the fused feature vectors are extracted, and similarity is judged by comparing the cosine of the angle between feature vectors.
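A minimal sketch of this matching step: normalize the library and the query so that a dot product equals the cosine of the included angle, then return the top-k indices in descending similarity. The feature dimension and the synthetic library are assumptions for illustration:

```python
import numpy as np

def retrieve_top_k(query, library, k=3):
    """Rank library feature vectors by cosine similarity to the query
    and return the indices of the k most similar items, descending."""
    q = query / np.linalg.norm(query)
    L = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = L @ q                      # cosine of the angle, per vector
    order = np.argsort(-sims)[:k]     # descending similarity
    return order, sims[order]

rng = np.random.default_rng(3)
library = rng.standard_normal((50, 16))               # fused feature library
query = library[7] + 0.01 * rng.standard_normal(16)   # near library item 7
idx, scores = retrieve_top_k(query, library)
```

Since the query is a slightly perturbed copy of library item 7, that item is ranked first with cosine similarity close to 1.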
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention can be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are within the scope of the present invention defined by the claims.

Claims (4)

1. An image retrieval method, comprising the following steps:
step one, selecting a lightweight multimodal image data set as a metadata set to construct a database, then classifying the metadata set into two types of information: visual information and text description information;
step two, extracting text features from the text description information and category features of the visual features from the visual information of step one, then constructing a source domain based on the visual features and an auxiliary domain based on the text features, and capturing the commonalities among the source domain, the auxiliary domain and a target domain;
step three, abstracting the text features and visual features into points serving as sample points, first extracting the high-dimensional sample points of each feature, next reducing the dimensionality of the high-dimensional sample points, then enhancing all sample points, and finally fusing the features of the enhanced sample points;
step four, performing similarity matching across the modalities with a cosine similarity measure, thereby retrieving similar images.
2. The image retrieval method according to claim 1, wherein the preprocessing in step one is as follows: image features are extracted from the metadata set with fused convolutional layers, and fully connected layers are added between the fused convolutional layers to reduce the loss of feature information.
3. The image retrieval method according to claim 1, wherein the sample-point enhancement in step three is as follows: after one linear transformation, the sample points are mapped onto a feature plane to form feature nodes, and the resulting feature nodes are passed through the nonlinear transformation of an activation function to generate enhancement nodes.
4. The image retrieval method according to claim 1, wherein step four is specifically: setting weights according to the commonality among the cross-domain features, computing the feature vectors of the source, target and auxiliary domains from the Laplacian matrix, using canonical correlation analysis (CCA) to efficiently reduce the dimensionality of the high-dimensional feature data and to fuse it, and taking the fused feature vectors as the image features.
CN201910971299.8A — filed 2019-10-14 — Image retrieval method — Pending — published as CN110851629A

Priority Applications (1)

CN201910971299.8A (CN110851629A) — priority date 2019-10-14, filing date 2019-10-14 — Image retrieval method

Publications (1)

CN110851629A — publication date 2020-02-28

Family ID: 69596312


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9177225B1 (en) * 2014-07-03 2015-11-03 Oim Squared Inc. Interactive content generation
CN108595636A (en) * 2018-04-25 2018-09-28 复旦大学 The image search method of cartographical sketching based on depth cross-module state correlation study
CN108829847A (en) * 2018-06-20 2018-11-16 山东大学 Commodity search method and system based on multi-modal shopping preferences
CN110298395A (en) * 2019-06-18 2019-10-01 天津大学 A kind of picture and text matching process based on three mode confrontation network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
李晓雨 et al., "Image retrieval algorithm based on transfer learning", Computer Science (《计算机科学》) *
王一丁 et al., Digital Image Processing (《数字图像处理》), 31 August 2015 *
贾晨 et al., "Multimodal information fusion based on broad learning", CAAI Transactions on Intelligent Systems (《智能系统学报》) *
郭宝龙 et al., Introduction to Digital Image Processing Systems Engineering (《数字图像处理系统工程导论》), 31 July 2012 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021180109A1 (en) * 2020-03-10 2021-09-16 华为技术有限公司 Electronic device and search method thereof, and medium
CN111428063A (en) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Image feature association processing method and system based on geographic spatial position division
CN111428063B (en) * 2020-03-31 2023-06-30 杭州博雅鸿图视频技术有限公司 Image feature association processing method and system based on geographic space position division

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228