CN111737512A - Silk cultural relic image retrieval method based on depth feature region fusion - Google Patents

Silk cultural relic image retrieval method based on depth feature region fusion

Info

Publication number
CN111737512A
CN111737512A
Authority
CN
China
Prior art keywords
cultural relic
target
silk cultural
retrieval
silk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010498104.5A
Other languages
Chinese (zh)
Other versions
CN111737512B (en)
Inventor
赵鸣博
沙晟涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
National Dong Hwa University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN202010498104.5A priority Critical patent/CN111737512B/en
Publication of CN111737512A publication Critical patent/CN111737512A/en
Application granted granted Critical
Publication of CN111737512B publication Critical patent/CN111737512B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/54Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a silk cultural relic image retrieval method based on depth feature region fusion, comprising the following steps: classifying and learning silk cultural relic images by means of deep-learning global feature extraction; selecting the activation area corresponding to a given category of silk cultural relic image by means of neural network visualization, thereby localizing the retrieval target; fusing the features related to the target area through region feature fusion to serve as the local descriptor of the target; and selecting the silk cultural relic image whose features are closest in distance to the user's query picture as the retrieval result. Aimed at the characteristic that the retrieval target in a silk cultural relic image usually occupies only a small part of the image, the invention combines depth feature extraction with candidate retrieval areas to accurately localize the retrieval target and extract its fine-grained features, thereby improving silk cultural relic image retrieval performance and realizing small-target retrieval of silk cultural relic images.

Description

Silk cultural relic image retrieval method based on depth feature region fusion
Technical Field
The invention relates to a retrieval method of a silk cultural relic image, in particular to a retrieval method of a silk cultural relic image based on depth feature extraction and fine-grained region fusion, and belongs to the technical field of information.
Background
Silk cultural relic images, as a widely used medium, have witnessed the development and spread of information resources. A silk cultural relic retrieval method employing depth feature extraction can effectively manage the rapidly growing silk cultural relic image data sets and, through the network, present traditional silk cultural relics to a large number of users in digital form.
Current silk cultural relic retrieval methods that adopt depth feature extraction are mainly based on global features: the output of the fully connected layer of a deep feature network is used as the feature descriptor, preserving the overall semantic information of the image. Such global methods mostly target image-classification-style retrieval tasks, with feature extraction likewise based on the global fully connected layer output. However, because a convolutional neural network mainly encodes global spatial information, the resulting features lack invariance to geometric transformations such as scale, rotation and translation and to changes in the spatial layout of the image, which limits their robustness for highly variable image retrieval. Moreover, in silk images the retrieval target occupies only a small part of the whole image, so for the small-target retrieval problem, global features can neither effectively represent the small target nor accurately localize the small target area.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the existing silk cultural relic retrieval method cannot realize small target retrieval and positioning.
In order to solve the technical problem, the technical scheme of the invention is to provide a silk cultural relic image retrieval method based on depth feature region fusion, which is characterized by comprising the following steps:
step 1, classifying and learning silk cultural relic images by adopting a deep learning global feature extraction mode, and classifying all silk cultural relic images into different categories;
step 2, selecting an activation area corresponding to the silk cultural relic image of a certain category determined in the step 1 by adopting a neural network visualization mode, and further realizing retrieval target positioning, wherein the method comprises the following steps:
step 201, fusing the feature planes of the silk cultural relic images of a specific category determined in the step 1 by using the Grad-CAM method to obtain a Grad-CAM map;
step 202, performing global average pooling on the Grad-CAM map of each category, namely taking the mean value of each Grad-CAM map as its score, and reserving the Grad-CAM maps whose score exceeds a certain threshold, indicating that they contain targets of the current category;
step 203, positioning the specific position of the target of the corresponding category according to the reserved contour of the Grad-CAM diagram, and realizing target positioning;
and 3, fusing the features related to the target region in a region feature fusion mode to be used as the local descriptor of the target, wherein the method comprises the following steps:
step 301, locating the detected target to obtain the convolution result of the target in its localization area as an H × W × D tensor feature plane, wherein H, W and D respectively represent the height, width and number of channels of the feature plane;
step 302, adopting the Region Maximum Activation of Convolutions (R-MAC) strategy, regarding the H × W × D tensor feature plane as D descriptors of dimension H × W, and performing local average pooling or maximum pooling on the D descriptors to obtain one D-dimensional feature to represent the target;
and 4, obtaining a user request picture, obtaining the characteristics of the user request picture by adopting the methods in the steps 2 and 3, calculating Euclidean distances between the characteristics of the user request picture and the characteristics of each type of silk cultural relic images in a local characteristic space, and selecting the type of silk cultural relic image closest to the characteristics of the user request picture for retrieval.
Preferably, in step 1, in the classification learning, the target data is subjected to classification fine tuning on a pre-training model by using a transfer learning manner.
Preferably, in step 302, if one picture contains multiple objects, D-dimensional features of different objects are concatenated as output in a region feature fusion manner.
Aimed at the characteristic that the retrieval target in a silk cultural relic image usually occupies only a small part of the image, the invention combines depth feature extraction with candidate retrieval areas to accurately localize the retrieval target and extract its fine-grained features, thereby improving silk cultural relic image retrieval performance and realizing small-target retrieval of silk cultural relic images.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The invention is improved on the basis of the existing method for extracting the global features based on deep learning so as to realize the small target retrieval and positioning of the silk cultural relic image.
The invention provides a silk cultural relic image retrieval method based on depth feature region fusion, which comprises the following steps:
Step 1, classify and learn the silk cultural relic images by means of deep-learning global feature extraction, so that the features retain global classification information. During classification learning, the target data is fine-tuned for classification on a pre-trained model (such as VGGNet or ResNet) by means of transfer learning, so that the feature planes of the fine-tuned CNN contain classification information. This classification information provides only coarse-grained learning; the subsequent fine-grained learning still requires fine-tuning through the network.
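One way to picture the transfer-learning fine-tuning of step 1 is to freeze the pre-trained backbone and train only a new classification head on its extracted features. The sketch below is a deliberately simplified NumPy stand-in under that assumption: in practice one would fine-tune a VGGNet/ResNet inside a deep-learning framework, and the toy feature clusters here merely stand in for CNN outputs.

```python
import numpy as np

def finetune_classifier_head(features, labels, num_classes, lr=0.1, epochs=200):
    """Train a softmax classification head on frozen backbone features.

    features: (N, D) array of global features from a pre-trained CNN (simulated here).
    labels:   (N,) integer class labels.
    Returns a (D, num_classes) weight matrix for the new head.
    """
    n, d = features.shape
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=(d, num_classes))
    onehot = np.eye(num_classes)[labels]               # (N, C)
    for _ in range(epochs):
        logits = features @ w                          # (N, C)
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)      # softmax
        grad = features.T @ (probs - onehot) / n       # cross-entropy gradient
        w -= lr * grad                                 # gradient-descent step
    return w

# Toy usage: two well-separated "feature" clusters in place of real CNN features.
feats = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 4)) + [1, 0, 0, 0],
                   np.random.default_rng(2).normal(0, 0.1, (20, 4)) + [0, 1, 0, 0]])
labels = np.array([0] * 20 + [1] * 20)
w = finetune_classifier_head(feats, labels, num_classes=2)
pred = (feats @ w).argmax(axis=1)
print((pred == labels).mean())  # expect 1.0 on this separable toy data
```

The design point mirrors the text: classification learning here yields only coarse-grained category information, which the later Grad-CAM and region-fusion steps refine.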
And 2, selecting an activation area corresponding to the silk cultural relic image of a certain category determined in the step 1 by adopting a neural network visualization mode, and further realizing retrieval target positioning.
The step 2 comprises the following steps:
Step 201, fuse the feature planes of the silk cultural relic images of the specific category determined in step 1 by using the Grad-CAM (Gradient-weighted Class Activation Mapping) method to obtain a Grad-CAM map, so as to visualize the target area. The core idea is to perform weighted fusion over the feature planes of a certain convolutional layer to visualize the object information of a specific category.
Step 202, perform global average pooling (Global Average Pooling) on the Grad-CAM map of each category, that is, take the mean value of each Grad-CAM map as its score (a form of voting); Grad-CAM maps whose score exceeds a certain threshold are retained, indicating that they contain the target of the current category.
And step 203, positioning the specific position of the target of the corresponding category according to the reserved contour of the Grad-CAM diagram, and realizing the positioning of the retrieval target.
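Steps 201-203 can be sketched in NumPy as below. The feature maps and gradients are assumed given from a convolutional layer; the score threshold, the 0.5 binarization factor, and the bounding-box output are illustrative assumptions, since the patent leaves the threshold value and the contour-extraction details unspecified.

```python
import numpy as np

def grad_cam_map(feature_maps, gradients):
    """Step 201: weight each feature channel by the mean of its gradients
    (the Grad-CAM channel importance), sum the channels, and apply ReLU."""
    # feature_maps, gradients: (K, H, W) arrays from one convolutional layer.
    alphas = gradients.mean(axis=(1, 2))               # (K,) channel weights
    cam = np.tensordot(alphas, feature_maps, axes=1)   # weighted fusion -> (H, W)
    return np.maximum(cam, 0)                          # ReLU

def locate_target(cam, threshold=0.2):
    """Steps 202-203: keep the map only if its global mean score exceeds the
    threshold, then localize the activated region (a bounding box stands in
    for the contour here)."""
    if cam.mean() <= threshold:                        # global average pooling score
        return None                                    # no target of this category
    mask = cam > cam.max() * 0.5                       # binarize the activation
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()      # (x0, y0, x1, y1)

# Toy usage: one channel strongly activated inside a small region.
fmap = np.zeros((3, 8, 8)); fmap[0, 2:5, 3:6] = 1.0
grads = np.zeros((3, 8, 8)); grads[0] = 1.0            # only channel 0 matters
cam = grad_cam_map(fmap, grads)
print(locate_target(cam, threshold=0.05))              # -> (3, 2, 5, 4)
```

With a higher threshold the same map is rejected, which is exactly the per-category filtering of step 202.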
And 3, fusing the features related to the target region in a region feature fusion mode to be used as the local descriptor of the target.
The step 3 comprises the following steps:
Step 301, locate the detected target and obtain the convolution result of the target within its localization area as an H × W × D tensor feature plane, where H, W and D respectively represent the height, width and number of channels of the feature plane.
Step 302, in order to convert the tensor feature plane into a feature vector representing the target, the Region Maximum Activation of Convolutions (R-MAC) strategy is adopted: the H × W × D tensor feature plane is regarded as D descriptors of dimension H × W, and local average pooling or maximum pooling is performed on the D descriptors to obtain one D-dimensional feature representing the target.
Step 303, if one picture contains multiple targets, the D-dimensional features of the different targets can be concatenated as the output in a region feature fusion manner.
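Under the text's reading of the pooling step — the H × W × D tensor feature plane is D per-channel descriptors, each pooled over its H × W plane to yield one D-dimensional feature — steps 301-303 can be sketched as follows. Function names and the toy tensors are invented for illustration.

```python
import numpy as np

def region_descriptor(tensor, mode="max"):
    """Step 302: collapse an H x W x D tensor feature plane into a
    D-dimensional descriptor by pooling each of the D channels over
    its H x W plane (maximum or average pooling)."""
    if mode == "max":
        return tensor.max(axis=(0, 1))    # (D,) maximum pooling
    return tensor.mean(axis=(0, 1))       # (D,) average pooling

def fuse_targets(region_tensors, mode="max"):
    """Step 303: concatenate the D-dimensional descriptors of all
    targets located in one picture as the final output."""
    return np.concatenate([region_descriptor(t, mode) for t in region_tensors])

# Toy usage: two located targets, each with D = 4 channels.
t1 = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)   # 2 x 3 x 4 region
t2 = np.ones((3, 3, 4))                                    # 3 x 3 x 4 region
desc = fuse_targets([t1, t2])
print(desc.shape)   # (8,) -- two targets x 4 channels each
```

Note that regions of different spatial size (2 × 3 vs 3 × 3 here) still yield descriptors of the same dimension D, which is what makes the concatenated output well defined.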
And 4, obtaining a user request picture, obtaining the characteristics of the user request picture by adopting the methods in the steps 2 and 3, calculating Euclidean distances between the characteristics of the user request picture and the characteristics of each type of silk cultural relic images in a local characteristic space, and selecting the type of silk cultural relic image closest to the characteristics of the user request picture for retrieval.
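Step 4's nearest-neighbor selection over Euclidean distances in the local feature space can be sketched as below; the gallery, its descriptors, and the category names are hypothetical placeholders for the per-category silk cultural relic features.

```python
import numpy as np

def retrieve(query_desc, gallery):
    """Step 4: return the gallery entry whose descriptor has the smallest
    Euclidean distance to the query descriptor in the local feature space."""
    dists = {name: np.linalg.norm(query_desc - d) for name, d in gallery.items()}
    return min(dists, key=dists.get)   # closest category wins

# Hypothetical gallery of category descriptors (names are illustrative only).
gallery = {
    "brocade": np.array([1.0, 0.0, 0.0]),
    "embroidery": np.array([0.0, 1.0, 0.0]),
    "damask": np.array([0.0, 0.0, 1.0]),
}
print(retrieve(np.array([0.9, 0.1, 0.0]), gallery))  # -> brocade
```

In the full method the query descriptor would itself come from applying steps 2 and 3 to the user's request picture, so query and gallery features live in the same local feature space.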

Claims (3)

1. A silk cultural relic image retrieval method based on depth feature region fusion is characterized by comprising the following steps:
step 1, classifying and learning silk cultural relic images by adopting a deep learning global feature extraction mode, and classifying all silk cultural relic images into different categories;
step 2, selecting an activation area corresponding to the silk cultural relic image of a certain category determined in the step 1 by adopting a neural network visualization mode, and further realizing target positioning, wherein the method comprises the following steps:
step 201, fusing the feature planes of the silk cultural relic images of a specific category determined in the step 1 by using the Grad-CAM method to obtain a Grad-CAM map;
step 202, performing global average pooling on the Grad-CAM map of each category, namely taking the mean value of each Grad-CAM map as its score, and reserving the Grad-CAM maps whose score exceeds a certain threshold, indicating that they contain targets of the current category;
step 203, positioning the specific position of the target of the corresponding category according to the reserved contour of the Grad-CAM diagram, and realizing target positioning;
and 3, fusing the features related to the target region in a region feature fusion mode to be used as the local descriptor of the target, wherein the method comprises the following steps:
step 301, locating the detected target to obtain the convolution result of the target in its localization area as an H × W × D tensor feature plane, wherein H, W and D respectively represent the height, width and number of channels of the feature plane;
step 302, adopting the Region Maximum Activation of Convolutions (R-MAC) strategy, regarding the H × W × D tensor feature plane as D descriptors of dimension H × W, and performing local average pooling or maximum pooling on the D descriptors to obtain one D-dimensional feature to represent the target;
and 4, obtaining a user request picture, obtaining the characteristics of the user request picture by adopting the methods in the steps 2 and 3, calculating Euclidean distances between the characteristics of the user request picture and the characteristics of each type of silk cultural relic images in a local characteristic space, and selecting the type of silk cultural relic image closest to the characteristics of the user request picture for retrieval.
2. The silk cultural relic image retrieval method based on depth feature region fusion as claimed in claim 1, wherein in step 1, during the classification learning, the target data is subjected to classification fine-tuning on a pre-training model by means of transfer learning.
3. The silk cultural relic image retrieval method based on depth feature region fusion as claimed in claim 1, wherein in step 302, if a picture comprises a plurality of targets, the D-dimensional features of the different targets are concatenated as the output in a region feature fusion manner.
CN202010498104.5A 2020-06-04 2020-06-04 Silk cultural relic image retrieval method based on depth feature region fusion Active CN111737512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010498104.5A CN111737512B (en) 2020-06-04 2020-06-04 Silk cultural relic image retrieval method based on depth feature region fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010498104.5A CN111737512B (en) 2020-06-04 2020-06-04 Silk cultural relic image retrieval method based on depth feature region fusion

Publications (2)

Publication Number Publication Date
CN111737512A true CN111737512A (en) 2020-10-02
CN111737512B CN111737512B (en) 2021-11-12

Family

ID=72649012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010498104.5A Active CN111737512B (en) 2020-06-04 2020-06-04 Silk cultural relic image retrieval method based on depth feature region fusion

Country Status (1)

Country Link
CN (1) CN111737512B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837299A (en) * 2021-02-09 2021-05-25 浙江工业大学 Textile image fingerprint retrieval method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3166049A1 (en) * 2015-11-03 2017-05-10 Baidu USA LLC Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering
CN109272011A (en) * 2018-07-31 2019-01-25 东华大学 Multitask depth representing learning method towards image of clothing classification
CN110334746A (en) * 2019-06-12 2019-10-15 腾讯科技(深圳)有限公司 A kind of image detecting method and device
CN110688511A (en) * 2019-08-15 2020-01-14 深圳久凌软件技术有限公司 Fine-grained image retrieval method and device, computer equipment and storage medium
CN110825899A (en) * 2019-09-18 2020-02-21 武汉纺织大学 Clothing image retrieval method integrating color features and residual network depth features
WO2020052513A1 (en) * 2018-09-14 2020-03-19 Alibaba Group Holding Limited (阿里巴巴集团控股有限公司) Image identification and pedestrian re-identification method and apparatus, and electronic and storage device
US20200117951A1 (en) * 2018-10-15 2020-04-16 Ancestry.com Operations Inc. Image captioning with weakly-supervised attention penalty
CN111104539A (en) * 2019-12-20 2020-05-05 湖南千视通信息科技有限公司 Fine-grained vehicle image retrieval method, device and equipment
CN111104538A (en) * 2019-12-06 2020-05-05 深圳久凌软件技术有限公司 Fine-grained vehicle image retrieval method and device based on multi-scale constraint
CN111159456A (en) * 2019-12-30 2020-05-15 云南大学 Multi-scale clothing retrieval method and system based on deep learning and traditional features
CN111177376A (en) * 2019-12-17 2020-05-19 东华大学 Chinese text classification method based on BERT and CNN hierarchical connection
CN111177446A (en) * 2019-12-12 2020-05-19 苏州科技大学 Method for searching footprint image

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3166049A1 (en) * 2015-11-03 2017-05-10 Baidu USA LLC Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering
CN109272011A (en) * 2018-07-31 2019-01-25 东华大学 Multitask depth representing learning method towards image of clothing classification
WO2020052513A1 (en) * 2018-09-14 2020-03-19 Alibaba Group Holding Limited (阿里巴巴集团控股有限公司) Image identification and pedestrian re-identification method and apparatus, and electronic and storage device
US20200117951A1 (en) * 2018-10-15 2020-04-16 Ancestry.com Operations Inc. Image captioning with weakly-supervised attention penalty
CN110334746A (en) * 2019-06-12 2019-10-15 腾讯科技(深圳)有限公司 A kind of image detecting method and device
CN110688511A (en) * 2019-08-15 2020-01-14 深圳久凌软件技术有限公司 Fine-grained image retrieval method and device, computer equipment and storage medium
CN110825899A (en) * 2019-09-18 2020-02-21 武汉纺织大学 Clothing image retrieval method integrating color features and residual network depth features
CN111104538A (en) * 2019-12-06 2020-05-05 深圳久凌软件技术有限公司 Fine-grained vehicle image retrieval method and device based on multi-scale constraint
CN111177446A (en) * 2019-12-12 2020-05-19 苏州科技大学 Method for searching footprint image
CN111177376A (en) * 2019-12-17 2020-05-19 东华大学 Chinese text classification method based on BERT and CNN hierarchical connection
CN111104539A (en) * 2019-12-20 2020-05-05 湖南千视通信息科技有限公司 Fine-grained vehicle image retrieval method, device and equipment
CN111159456A (en) * 2019-12-30 2020-05-15 云南大学 Multi-scale clothing retrieval method and system based on deep learning and traditional features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI CHAO, et al.: "Clothes Keypoints Detection with Cascaded Pyramid Network", Journal of Donghua University *
SUN Jie, et al.: "Research progress of feature extraction and retrieval of fabric images based on convolutional neural network", Journal of Textile Research (纺织学报) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837299A (en) * 2021-02-09 2021-05-25 浙江工业大学 Textile image fingerprint retrieval method
CN112837299B (en) * 2021-02-09 2024-02-27 浙江工业大学 Textile image fingerprint retrieval method

Also Published As

Publication number Publication date
CN111737512B (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
Chen et al. Improved saliency detection in RGB-D images using two-phase depth estimation and selective deep fusion
Wang et al. A three-layered graph-based learning approach for remote sensing image retrieval
JP6440303B2 (en) Object recognition device, object recognition method, and program
CN110717526A (en) Unsupervised transfer learning method based on graph convolution network
US9025863B2 (en) Depth camera system with machine learning for recognition of patches within a structured light pattern
CN109086777B (en) Saliency map refining method based on global pixel characteristics
Zheng et al. Domain adaptation via a task-specific classifier framework for remote sensing cross-scene classification
CN110363071A (en) A kind of sea ice detection method cooperateing with Active Learning and transductive SVM
Yan et al. TrAdaBoost based on improved particle swarm optimization for cross-domain scene classification with limited samples
Chen et al. Integrated content and context analysis for mobile landmark recognition
Qian et al. On combining social media and spatial technology for POI cognition and image localization
Liao et al. Tag features for geo-aware image classification
CN113159043A (en) Feature point matching method and system based on semantic information
CN111737512B (en) Silk cultural relic image retrieval method based on depth feature region fusion
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism
Chen et al. Human motion target posture detection algorithm using semi-supervised learning in internet of things
CN112446431A (en) Feature point extraction and matching method, network, device and computer storage medium
Chen et al. Correlation filter tracking via distractor-aware learning and multi-anchor detection
JP2012022419A (en) Learning data creation device, learning data creation method, and program
Han et al. A novel loop closure detection method with the combination of points and lines based on information entropy
CN107578003A (en) A kind of remote sensing images transfer learning method based on GEOGRAPHICAL INDICATION image
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
Liao et al. Multi-scale saliency features fusion model for person re-identification
CN111144466B (en) Image sample self-adaptive depth measurement learning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant