CN114020948A - Sketch image retrieval method and system based on sorting clustering sequence identification selection - Google Patents

Sketch image retrieval method and system based on sorting clustering sequence identification selection

Info

Publication number
CN114020948A
CN114020948A (application number CN202111259946.6A)
Authority
CN
China
Prior art keywords
layer
image
sketch
image retrieval
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111259946.6A
Other languages
Chinese (zh)
Inventor
Chen Yaxiong (陈亚雄)
Tang Yibo (汤一博)
Li Xiaoyu (李小玉)
Zhao Dongjie (赵东婕)
Xiong Shengwu (熊盛武)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202111259946.6A priority Critical patent/CN114020948A/en
Publication of CN114020948A publication Critical patent/CN114020948A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a sketch image retrieval method based on sorting clustering sequence identification selection. The invention designs a triplet Transformer backbone with a sequence discriminative selection module that captures the important discriminative regions between a sketch and a natural image, and proposes an objective function consisting of a triplet term, a category-level semantic term, a ranking-clustering term and a discriminative learning term, which preserves the semantic similarity of the hash codes while they are learned, captures the similarity between the different modalities, and optimizes ranking information so as to cluster similar instances and guide discriminative-region learning. Finally, the hash codes are used to retrieve natural images for a query sketch. The problems of redundant information and neglected ranking information are alleviated, the retrieval precision is higher, and the performance is further improved.

Description

Sketch image retrieval method and system based on sorting clustering sequence identification selection
Technical Field
The invention belongs to the technical field of image retrieval, relates to a sketch image retrieval method and a sketch image retrieval system, and particularly relates to a sketch image retrieval method and a sketch image retrieval system based on sorting clustering sequence identification selection.
Background
Due to the explosive growth of touch-screen devices, sketches are used more and more frequently: a user can draw a sketch on a touch-screen device with a finger anytime and anywhere, so it is meaningful to mine relevant natural images with a sketch. Interest in sketch image retrieval is therefore increasing; its purpose is to match natural images using a hand-drawn sketch as the query modality.
Existing sketch image retrieval methods fall roughly into two categories: hand-crafted methods and deep learning methods. Hand-crafted sketch image retrieval methods do not reduce the cross-domain difference between sketches and natural images well, because hand-crafted features neither represent the edges of natural images effectively nor align with sketches that exhibit large variation and ambiguity. Deep learning sketch image retrieval methods were proposed to address this cross-domain difference, but they still face two challenges: (1) sketches and natural images may contain different objects with similar contour shapes, and some deep learning sketch retrieval methods cannot capture the important discriminative regions between a sketch and a natural image, which causes information redundancy and ultimately harms retrieval performance; (2) ranking information is closely related to the retrieval results, yet existing methods neglect it when learning hash codes for the sketch retrieval task, so their performance is not ideal.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a sketch image retrieval method and system based on sorting clustering sequence identification selection. The method and system make full use of discriminative regions and ranking information to perform hash code learning: discriminative regions are first selected for the query sketch, and ranking information is simultaneously used to aggregate samples of the same category, so that the category of a sample can be recognized in the other modality. Finally, the hash codes are used to retrieve natural images.
The method adopts the technical scheme that: a sketch image retrieval method based on sorting clustering sequence identification selection is characterized in that a sketch image retrieval network is firstly constructed, and then the sketch image retrieval network is utilized to carry out sketch image retrieval;
the construction of the sketch image retrieval network specifically comprises the following steps:
step 1: constructing a sketch image retrieval network;
the sketch image retrieval network comprises a Transformer partition module, a linear projection module and a Transformer encoding module;
the Transformer partition module is used for dividing the input image of size H x W into M 2-D patch images x_p, each of size P x P, where M = HW/P^2;
the linear projection module is used for mapping each patch image output by the partition module to D dimensions, and adding a learnable position embedding to the patch embeddings to preserve position information; the resulting embedding vector is denoted z_0, and the output at position zero is a D-dimensional class token x_class;
the Transformer encoding module is used for receiving z_0 and mining the relations between the patch images in the sequence; it comprises L Transformer layers and a hash layer, each Transformer layer containing a multi-head self-attention layer MSA and a Conv_1x1 block, the Conv_1x1 block consisting of two convolutional layers with 1 x 1 kernels and one fully-connected layer; the input of each Transformer layer is the output of the previous layer; the output of the L-th Transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used to construct the triplet term, the category-level semantic term and the ranking-clustering term in the objective function;
step 2: acquiring an existing sketch image data set, and dividing the data set into a training data set, a verification data set and a test data set;
and step 3: in the training dataset, N triplet elements are given
{(x_i^a, x_i^p, x_i^n)}_{i=1}^N and triplet labels {(y_i^a, y_i^p, y_i^n)}_{i=1}^N, where x_i^a, x_i^p and x_i^n respectively denote the anchor sketch, the positive-example image and the negative-example image of the i-th triplet; y_i^a denotes the class label of x_i^a, y_i^p denotes the class label of x_i^p, and y_i^n denotes the class label of x_i^n; N and I respectively denote the number of triplets and the number of samples in the dataset; a, p and n respectively denote the anchor image, the positive-example image and the negative-example image;
Step 4: training the sketch image retrieval network with the training set, computing the objective function of the network and updating its initial parameters; training runs for a preset number of rounds or until the loss no longer decreases, yielding the trained sketch image retrieval network.
The technical scheme adopted by the system of the invention is as follows: a sketch image retrieval system based on sorting clustering sequence identification selection, comprising the following modules:
the module 1 is used for constructing a sketch image retrieval network module;
the module 2 is used for searching the sketch images by utilizing the sketch image searching network;
the module 1 specifically comprises the following sub-modules:
the submodule 1 is used for constructing a sketch image retrieval network;
the sketch image retrieval network comprises a Transformer partition module, a linear projection module and a Transformer encoding module;
the Transformer partition module is used for dividing the input image of size H x W into M 2-D patch images x_p, each of size P x P, where M = HW/P^2;
the linear projection module is used for mapping each patch image output by the partition module to D dimensions, and adding a learnable position embedding to the patch embeddings to preserve position information; the resulting embedding vector is denoted z_0, and the output at position zero is a D-dimensional class token x_class;
the Transformer encoding module is used for receiving z_0 and mining the relations between the patch images in the sequence; it comprises L Transformer layers and a hash layer, each Transformer layer containing a multi-head self-attention layer MSA and a Conv_1x1 block, the Conv_1x1 block consisting of two convolutional layers with 1 x 1 kernels and one fully-connected layer; the input of each Transformer layer is the output of the previous layer; the output of the L-th Transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used to construct the triplet term, the category-level semantic term and the ranking-clustering term in the objective function;
the submodule 2 is used for acquiring the existing sketch image data set and dividing the data set into a training data set, a verification data set and a test data set;
submodule step 3 for assigning N triplet elements in the training dataset
{(x_i^a, x_i^p, x_i^n)}_{i=1}^N and triplet labels {(y_i^a, y_i^p, y_i^n)}_{i=1}^N, where x_i^a, x_i^p and x_i^n respectively denote the anchor sketch, the positive-example image and the negative-example image of the i-th triplet; y_i^a denotes the class label of x_i^a, y_i^p denotes the class label of x_i^p, and y_i^n denotes the class label of x_i^n; N and I respectively denote the number of triplets and the number of samples in the dataset; a, p and n respectively denote the anchor image, the positive-example image and the negative-example image;
the submodule 4 is used for training the sketch image retrieval network with the training set, computing the objective function of the network and updating its initial parameters; training runs for a preset number of rounds or until the loss no longer decreases, yielding the trained sketch image retrieval network.
Compared with the prior art, the invention has the following advantages:
1) designing a triple transform backbone of a sequence identification selection module, and capturing an important identification domain between a sketch and a natural image;
2) and providing an objective function consisting of three groups of items, semantic similar items, sorting clustering items and distinguishing learning items, keeping the semantic similarity of the hash codes in the process of learning the hash codes, capturing the similarity between different modes, and optimizing sorting information so as to cluster similar examples and know distinguishing domain learning. The problems of redundant information and neglected sequencing information are solved, the retrieval precision is higher, and the performance is further improved.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a network structure diagram according to an embodiment of the present invention.
Fig. 3 is a comparison of the inventive method and the DSIH-V method on the extended TU-Berlin dataset. (a) Top-20 results retrieved with 256-bit hash codes using DSIH-V. (b) Top-20 results retrieved with 256-bit hash codes using DSIH. Incorrectly retrieved images are marked with × below the image.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
The invention provides a sketch image retrieval method based on sorting clustering sequence identification selection, which makes full use of discriminative regions and ranking information to perform hash code learning: discriminative regions are first selected for the query sketch, and ranking information is simultaneously used to aggregate samples of the same category. Finally, the hash codes are used to retrieve natural images.
Referring to fig. 1, the sketch image retrieval method based on sorting clustering sequence identification selection provided by the invention comprises the steps of firstly constructing a sketch image retrieval network, and then utilizing the sketch image retrieval network to perform sketch image retrieval;
the method for constructing the sketch image retrieval network comprises the following specific steps of:
step 1: constructing a sketch image retrieval network;
referring to fig. 2, the sketch image retrieval network of the present embodiment includes a transform partitioning module, a linear projection module, and a transform encoding module;
the Transformer partition module divides the input image of size H x W into M 2-D patch images x_p, each of size P x P, where M = HW/P^2;
the linear projection module maps each patch image output by the partition module to D dimensions and adds a learnable position embedding to the patch embeddings to preserve position information; the resulting embedding vector is denoted z_0, and the output at position zero is a D-dimensional class token x_class.
The embedding vector is:

z_0 = [x_class; x_p^1 E; x_p^2 E; ...; x_p^M E] + E_pos    (1)

where x_p^1, x_p^2, ..., x_p^M respectively denote the 1st, 2nd, ..., M-th 2-D patch images; E denotes the patch-image embedding projection, and E_pos denotes the position embedding.
To better focus on the most significant regions, z_0 is fed into the Transformer encoding module, which mines the relations between the patch images in the sequence.
The Transformer encoding module of this embodiment comprises L Transformer layers and a hash layer, each Transformer layer containing a multi-head self-attention layer MSA and a Conv_1x1 block, the Conv_1x1 block consisting of two convolutional layers with 1 x 1 kernels and one fully-connected layer; the input of each Transformer layer is the output of the previous layer; the output of the L-th Transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used to construct the triplet term, the category-level semantic term and the ranking-clustering term in the objective function.
The Transformer encoding module computes:

z'_l = MSA(LN(z_{l-1})) + z_{l-1}    (2)
z_l = CONV(LN(z'_l)) + z'_l    (3)

where LN(·) denotes layer normalization, z_l denotes the embedded image representation, z'_l denotes the output of the multi-head self-attention layer, and CONV(·) denotes the convolution operation.
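Equations (2) and (3) can be sketched as follows. Since a 1 x 1 convolution over a token sequence acts as a per-token linear map, the Conv_1x1 block is approximated here by two linear maps with a ReLU between them; this is a simplification of the block described above, not the exact embodiment:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def msa(x, Wq, Wk, Wv, K):
    """Multi-head self-attention with K heads; also returns the per-head
    attention maps (used later for sequence discriminative selection)."""
    N, D = x.shape
    d = D // K
    outs, attns = [], []
    for h in range(K):
        q = x @ Wq[h]; k = x @ Wk[h]; v = x @ Wv[h]  # (N, d) each
        a = softmax(q @ k.T / np.sqrt(d))            # (N, N) attention map
        outs.append(a @ v)
        attns.append(a)
    return np.concatenate(outs, -1), np.stack(attns)  # (N, D), (K, N, N)

def transformer_layer(z_prev, Wq, Wk, Wv, W1, W2, K):
    # eq (2): z'_l = MSA(LN(z_{l-1})) + z_{l-1}
    attn_out, attn = msa(layer_norm(z_prev), Wq, Wk, Wv, K)
    z_mid = attn_out + z_prev
    # eq (3): z_l = CONV(LN(z'_l)) + z'_l, with 1x1 convs as per-token linears
    conv = np.maximum(layer_norm(z_mid) @ W1, 0) @ W2
    return conv + z_mid, attn

rng = np.random.default_rng(1)
N, D, K = 5, 16, 4
Wq, Wk, Wv = (rng.normal(size=(K, D, D // K)) * 0.1 for _ in range(3))
z, attn = transformer_layer(rng.normal(size=(N, D)), Wq, Wk, Wv,
                            rng.normal(size=(D, D)) * 0.1,
                            rng.normal(size=(D, D)) * 0.1, K)
print(z.shape, attn.shape)  # (5, 16) (4, 5, 5)
```

Each layer preserves the (M+1, D) token shape, so L such layers can be stacked as described.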
To fully exploit the attention information, sequence discriminative selection is used to select effective regions to form a new sequence. The input to the L-th Transformer layer is z_{L-1} = [z_{L-1}^0; z_{L-1}^1; ...; z_{L-1}^M], where z_{L-1}^1, ..., z_{L-1}^M respectively denote the M patch outputs of the (L-1)-th layer. The K-head self-attention weights of each layer except the L-th are w_l = [w_l^1, w_l^2, ..., w_l^K], where l ∈ {1, 2, ..., L-1}. For the self-attention of each layer, each patch image has K sets of weights, so the weights of the M patch images in each layer can be expressed as w_l^i, where i ∈ {1, 2, ..., K}. Multiplying the weights of the first L-1 layers gives the final weight:

w_f = ∏_{l=1}^{L-1} w_l    (4)

where w_f denotes the final weight, from which the discriminative regions can be selected.
The indices of the patch images carrying useful information are obtained from the selected regions, and these indices serve as position information to find the corresponding patch-image embeddings. The selected embeddings form a new sequence, which enters the L-th Transformer layer.
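The selection just described amounts to multiplying, layer by layer, the class-token-to-patch attention weights and keeping the indices of the highest-scoring patches. A minimal sketch follows; averaging over the K heads is a simplification made here for brevity (the patent keeps K groups of weights per layer):

```python
import numpy as np

def select_discriminative(attn_per_layer, top_m):
    """attn_per_layer: list of (K, M+1, M+1) self-attention maps for the
    first L-1 layers. Multiply the class-token-to-patch weights across the
    layers and keep the indices of the top_m patches."""
    w_f = None
    for a in attn_per_layer:
        # average heads, take class-token row, drop the class-token column
        w = a.mean(axis=0)[0, 1:]                 # (M,)
        w_f = w if w_f is None else w_f * w       # running product over layers
    return np.argsort(w_f)[::-1][:top_m]          # most discriminative patches

rng = np.random.default_rng(2)
K, M = 4, 6
attn = [rng.uniform(0.01, 1.0, size=(K, M + 1, M + 1)) for _ in range(3)]
idx = select_discriminative(attn, top_m=3)
print(len(idx))  # 3
```

The returned indices would then be used to gather the corresponding patch embeddings into the new, shorter sequence fed to the L-th layer.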
The L-th Transformer layer is followed by a hash layer. For any triplet element x_i^t (t ∈ {a, p, n}), the deep hash function is:

b_i^t = sign(φ(F(o_i^t; Θ_g)))    (5)

where sign(·) denotes the element-wise sign function; φ(·) denotes the tanh function; b_i^t denotes the k-bit hash code of sample x_i^t; o_i^t denotes the output of the L-th Transformer layer for sample x_i^t; F(·; Θ_g) denotes the deep hash function; and Θ_g denotes the weight parameter of the hash layer.
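The hash layer just described can be sketched as a single linear map followed by tanh and sign; the weight matrix Wg below stands in for Θ_g and is a toy value:

```python
import numpy as np

def hash_code(o_L, Wg, k):
    """Linear hash layer on the class-token output of the last Transformer
    layer, squashed by tanh, then binarised by sign."""
    h = np.tanh(o_L @ Wg)   # relaxed, hash-like code in (-1, 1)^k
    b = np.sign(h)          # binary code in {-1, +1}^k
    return h, b

rng = np.random.default_rng(3)
D, k = 16, 8
h, b = hash_code(rng.normal(size=(D,)), rng.normal(size=(D, k)), k)
print(b.shape, set(np.unique(b)) <= {-1.0, 1.0})  # (8,) True
```

Keeping both h (differentiable) and b (binary) mirrors the relaxation used later when optimizing the triplet term.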
Step 2: acquiring an existing sketch image data set, and dividing the data set into a training data set, a verification data set and a test data set; the validation set is used in the experimental process for verifying the model training effect, and only the model performance on the test set is written here.
An implementation example of the invention uses two datasets, the Sketchy dataset and the TU-Berlin dataset; each dataset is divided into a training set, a validation set and a test set at a ratio of 70:10:20.
And step 3: in the training dataset, N triplet elements are given
{(x_i^a, x_i^p, x_i^n)}_{i=1}^N and triplet labels {(y_i^a, y_i^p, y_i^n)}_{i=1}^N, where x_i^a, x_i^p and x_i^n respectively denote the anchor sketch, the positive-example image and the negative-example image of the i-th triplet; y_i^a denotes the class label of x_i^a, y_i^p denotes the class label of x_i^p, and y_i^n denotes the class label of x_i^n; N and I respectively denote the number of triplets and the number of samples in the dataset; a, p and n respectively denote the anchor image, the positive-example image and the negative-example image;
it is an object of the present invention to perform hash code learning to project instances to hash codes while preserving similarity between matching sketches and images. More specifically,
Figure BDA00033253045400000617
ratio of
Figure BDA00033253045400000618
And smaller, where H (·, ·) represents the Hamming distance,
Figure BDA00033253045400000619
and
Figure BDA00033253045400000620
respectively represent
Figure BDA00033253045400000621
And
Figure BDA00033253045400000622
the k-bit hash code of (a),
Figure BDA00033253045400000623
and
Figure BDA00033253045400000624
hash codes respectively representing an anchor image, a positive example image and a negative example image;
Step 4: training the sketch image retrieval network with the training set, computing the objective function of the network and updating its initial parameters; training runs for a preset number of rounds or until the loss no longer decreases, yielding the trained sketch image retrieval network.
In this embodiment, the learning rate is set to 0.0004, the loss function is optimized with the Adam optimizer, and the initial parameters are updated.
The invention proposes a new objective function consisting of a triplet term, a category-level semantic term, a ranking-clustering term and a discriminative learning term; it preserves semantic similarity while learning the hash codes, captures the similarity between different modalities, and optimizes ranking information to cluster similar instances and guide discriminative-region learning.
The present invention performs hash code learning that maps instances to hash codes while preserving the similarity of matching sketches and images. To capture the similarity between the different modalities, the triplet term can be defined as:

J_tri = Σ_{i=1}^N max(0, δ + H(b_i^a, b_i^p) − H(b_i^a, b_i^n))    (6)

where H(·,·) denotes the Hamming distance, δ denotes a boundary parameter, and max(·) denotes the maximum function.
However, the above triplet term is difficult to optimize during training, so the binary codes b_i^a, b_i^p and b_i^n are relaxed to hash-like codes h_i^a, h_i^p and h_i^n, and the 2-norm is used instead of the Hamming distance; the triplet term is redefined as follows:

J_tri = Σ_{i=1}^N max(0, δ + ‖h_i^a − h_i^p‖_2 − ‖h_i^a − h_i^n‖_2)    (7)

where ‖·‖_2 denotes the 2-norm of a vector.
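The relaxed triplet term just described, a hinge on the gap between the anchor-positive and anchor-negative 2-norm distances of the hash-like codes, can be sketched directly:

```python
import numpy as np

def triplet_term(h_a, h_p, h_n, delta):
    """Hinge loss with margin delta over a batch of relaxed hash codes,
    summed over the batch."""
    d_ap = np.linalg.norm(h_a - h_p, axis=1)   # anchor-positive distances
    d_an = np.linalg.norm(h_a - h_n, axis=1)   # anchor-negative distances
    return np.maximum(0.0, delta + d_ap - d_an).sum()

rng = np.random.default_rng(4)
anchor = rng.normal(size=(5, 8))
loss_easy = triplet_term(anchor, anchor, anchor + 10.0, delta=0.5)  # negatives far away
loss_hard = triplet_term(anchor, anchor + 10.0, anchor, delta=0.5)  # negatives identical
print(loss_easy, loss_hard > 0)  # 0.0 True
```

When negatives are already farther than the margin the term vanishes, which is why only violating triplets drive the gradients.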
Category-level semantic information helps strengthen the latent correlation between similar hash codes, so label information is used to provide category-level semantics for learning the hash function. The category-level semantic term is defined as:

J_sem = Σ_{i=1}^N [ L_ce(h_i^a, y_i^a) + L_ce(h_i^p, y_i^p) + L_ce(h_i^n, y_i^n) ]    (8)

where L_ce(·,·) denotes the cross-entropy function, and y_i^a, y_i^p and y_i^n respectively denote the label information of x_i^a, x_i^p and x_i^n.
Average Precision (AP) is a retrieval metric that judges whether relevant instances are at the top of the ranking list; the higher the AP value, the higher the degree of aggregation of the relevant instances. The AP of a query instance h_v can be approximated as:

AP(h_v) ≈ (1/|R_u|) Σ_{h_u ∈ R_u} [ (1 + Σ_{h_t ∈ R_u} σ(−d_ut/η)) / (1 + Σ_{h_t ∈ R_t} σ(−d_ut/η)) ]    (9)

where R_u denotes the positively related set, |R_u| denotes the number of positively related instances, R_t denotes the score set of all instances, η denotes a boundary parameter, σ(·) denotes the sigmoid function, and d_ut = [cos(h_v, h_u) − cos(h_v, h_t)], where cos(·,·) denotes cosine similarity, h_u ∈ R_u, h_t ∈ R_t, and h_v denotes the query instance.
To cluster similar natural images, the ranking-clustering term of the natural images can be expressed as:

J_rank^img = (1/V) Σ_{a=1}^V (1 − AP(h_a))    (10)

where AP(h_a) denotes the AP value of natural-image query instance h_a and V denotes the size of a batch of data. Likewise, the ranking-clustering term of the sketches can be expressed as:

J_rank^skt = (1/V) Σ_{s=1}^V (1 − AP(h_s))    (11)

where AP(h_s) denotes the AP value of a sketch query instance.
Thus, the final ranking-clustering term can be composed of equations 10 and 11:

J_rank = J_rank^img + J_rank^skt    (12)

where J_rank denotes the ranking-clustering term, which optimizes the ranking information to cluster similar instances.
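The differentiable AP approximation and the ranking-clustering term above can be sketched in the style of a smooth-AP surrogate, with ranks replaced by sums of sigmoids over the cosine-similarity differences d_ut. The exact functional form of the patent's equation is not reproduced in its text, so the specific sigmoid construction below is an assumption:

```python
import numpy as np

def sigmoid(x, eta):
    return 1.0 / (1.0 + np.exp(-x / eta))

def smooth_ap(h_q, pos, allc, eta):
    """Differentiable AP surrogate for query h_q: the rank of each positive
    is approximated by 1 + a sum of sigmoids of similarity differences."""
    cos = lambda a, b: (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    s_pos = np.array([cos(h_q, h) for h in pos])    # similarities to positives
    s_all = np.array([cos(h_q, h) for h in allc])   # similarities to all candidates
    ap = 0.0
    for su in s_pos:
        # subtracting 0.5 removes the self-comparison term sigmoid(0) = 0.5
        rank_pos = 1.0 + sigmoid(s_pos - su, eta).sum() - 0.5
        rank_all = 1.0 + sigmoid(s_all - su, eta).sum() - 0.5
        ap += rank_pos / rank_all
    return ap / len(s_pos)

def ranking_cluster_term(queries, pos_sets, all_sets, eta):
    """One minus the mean approximate AP over a batch of V queries."""
    aps = [smooth_ap(q, p, a, eta) for q, p, a in zip(queries, pos_sets, all_sets)]
    return 1.0 - np.mean(aps)

rng = np.random.default_rng(5)
q = np.ones(8)
pos = [q + rng.normal(scale=0.01, size=8) for _ in range(3)]
neg = [-q + rng.normal(scale=0.01, size=8) for _ in range(3)]
term = ranking_cluster_term([q], [pos], [pos + neg], eta=0.01)
print(0.0 <= term <= 1.0)  # True
```

When all positives rank above all negatives, the surrogate AP approaches 1 and the term approaches 0, which is the clustering behaviour the patent attributes to this loss.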
In order to improve discriminative-region learning, the similarity of the class tokens corresponding to different labels is minimized, and the similarity of the class tokens of samples with the same label is maximized. The discriminative learning term for a batch of sketch data can be expressed as:

J_dis^skt = Σ_{v_1=1}^V Σ_{v_2=1}^V [ 1(y_{v_1}^a = y_{v_2}^a)(1 − cos(s_{v_1}, s_{v_2})) + 1(y_{v_1}^a ≠ y_{v_2}^a) max(0, cos(s_{v_1}, s_{v_2}) − μ) ]    (13)

where cos(·,·) denotes cosine similarity, μ denotes a boundary parameter, 1(·) denotes the indicator function, and s_{v_1} and s_{v_2} respectively denote the class tokens of the v_1-th and v_2-th sketches at the L-th layer.
The discriminative learning term of the natural images can be expressed as:

J_dis^img = Σ_{v_1=1}^V Σ_{v_2=1}^V [ 1(y_{v_1} = y_{v_2})(1 − cos(c_{v_1}, c_{v_2})) + 1(y_{v_1} ≠ y_{v_2}) max(0, cos(c_{v_1}, c_{v_2}) − μ) ]    (14)

where c_{v_1} and c_{v_2} respectively denote the class tokens of the v_1-th and v_2-th images at the L-th layer, the image class tokens and labels being taken from both the anchor-point and positive-example images of the batch.
Thus, in conjunction with equations 13 and 14, the discriminative learning term can be defined as:

J_dis = J_dis^skt + J_dis^img    (15)

where J_dis denotes the discriminative learning term, which enables learning in the discriminative regions.
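A standard realization of the stated goal — pull same-label class tokens together, push different-label tokens below the margin μ — can be sketched as follows; the exact pairing the patent uses is not recoverable from its text, so this particular form is an assumption:

```python
import numpy as np

def discriminative_term(tokens, labels, mu):
    """Pairwise cosine loss over class tokens: same-label pairs are driven
    toward cosine similarity 1, different-label pairs below the margin mu."""
    cos = lambda a, b: (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    loss = 0.0
    n = len(tokens)
    for i in range(n):
        for j in range(i + 1, n):
            s = cos(tokens[i], tokens[j])
            if labels[i] == labels[j]:
                loss += 1.0 - s              # same class: drive cos -> 1
            else:
                loss += max(0.0, s - mu)     # different class: cap at margin
    return loss

toks = [np.array([1.0, 0.0]), np.array([1.0, 0.1]), np.array([-1.0, 0.0])]
labs = [0, 0, 1]
print(discriminative_term(toks, labs, mu=0.5) < 0.1)  # True
```

Here the two same-label tokens are nearly parallel and the different-label token points the opposite way, so the term is close to zero, illustrating the intended geometry.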
Considering the above four parts (the triplet term J_tri, the category-level semantic term J_sem, the ranking-clustering term J_rank and the discriminative learning term J_dis), the overall objective function can be defined as:

J = J_tri + α·J_sem + β·J_rank + γ·J_dis    (16)

where α, β and γ denote weight parameters and J denotes the overall objective function; the whole network is trained by combining the four loss terms proposed by the invention.
The network was trained on a GeForce GTX Titan X GPU, an Intel Core i7-5930K 3.50 GHz CPU and 64 GB RAM. Input instances are resized to 288 x 288; the loss function is optimized with the Adam optimizer at a learning rate of 0.0004, and the batch size is set to 64. To generate hash codes of 32, 64, 128, 256 and 512 bits, the hash code length k is set from 32 to 512. The initial weights of both the sketch branch and the image branch use weights pre-trained on the ImageNet dataset. For the triplet term, the boundary parameter δ is set to 0.5; the boundary parameter η in the ranking-clustering term is set to 0.01; and the boundary parameter μ in the discriminative learning term is set to 0.5. The hyper-parameters α, β and γ are set to 0.8, 0.1 and 1, respectively. The network is trained for 500 rounds or until the loss no longer decreases.
In this embodiment, the trained sketch image retrieval network is used to compute the top-n precision of the ranking list on the test dataset, yielding the mean average precision mAP and the precision of the top 200 results (precision@200); the higher these metric values, the better the performance of the method.
Referring to fig. 3, to verify the effectiveness of the different factors in the method of the invention, an ablation experiment is first performed: first, the method without the triplet term for learning the hash function (DSIH-T); second, the method without the Transformer for sketch image retrieval learning (DSIH-V); third, the method without the ranking-clustering term for hash code learning (DSIH-R); finally, the full method of the invention (DSIH). The retrieval performance of the method is then compared with advanced methods such as DBSH, GDH, DVML, DSH, TVAE and StyleMeUp.
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
Table 1 shows the mAP values of the present invention and of DSIH-T, DSIH-V and DSIH-R for different embedding dimensions on the extended Sketchy dataset. The comparison shows that the proposed method achieves the highest average precision for the top-200 retrieval results across the different hash bit lengths on the extended Sketchy dataset.
TABLE 2
(Table 2 is reproduced as an image in the original publication.)
Table 2 shows the mAP values of the present invention and other methods for different embedding dimensions on the extended TU-Berlin dataset. The comparison shows that the proposed method achieves the highest average precision for the top-200 retrieval results across the different hash bit lengths on the extended TU-Berlin dataset.
TABLE 3
(Table 3 is reproduced as an image in the original publication.)
Table 3 compares the present invention with other existing methods, showing that the method of the invention achieves higher retrieval precision.
In a specific implementation, the above process can be run automatically by means of computer software.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A sketch image retrieval method based on sorting clustering sequence identification selection, characterized in that: a sketch image retrieval network is first constructed, and then used to retrieve sketch images;
the construction of the sketch image retrieval network specifically comprises the following steps:
step 1: constructing a sketch image retrieval network;
the sketch image retrieval network comprises a Transformer partitioning module, a linear projection module and a Transformer coding module;
the Transformer partitioning module is used for dividing the input image into M 2D patch images x_p; the input image is of size H × W, each patch image is of size P × P, and the number of patches is M = HW/P^2;
the linear projection module is used for mapping each patch image output by the partitioning module to D dimensions, and a learnable position embedding is added to the patch image embeddings to preserve position information; the resulting embedding vector is denoted z_0, and the output at position zero is a D-dimensional class token x_class;
the Transformer coding module takes z_0 as input and mines the relations between the patch images in the sequence; the Transformer coding module comprises L Transformer layers and a hash layer, wherein each Transformer layer comprises a multi-head self-attention layer MSA and a Conv1×1 block, the Conv1×1 block consisting of two convolutional layers with 1 × 1 convolution kernels and one fully-connected layer; for each Transformer layer, its input is the output of the previous layer; the output of the L-th Transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used to construct the triplet term, the category-level semantic term and the ranking-clustering term in the objective function;
step 2: acquiring an existing sketch image data set, and dividing the data set into a training data set, a verification data set and a test data set;
and step 3: in the training dataset, N triplet elements {(x_i^a, x_i^p, x_i^n)}_{i=1}^N and triplet labels {(y_i^a, y_i^p, y_i^n)}_{i=1}^N are given, wherein x_i^a, x_i^p and x_i^n sequentially represent the anchor sketch, the positive example image and the negative example image of the i-th triplet; y_i^a represents the class label of x_i^a, y_i^p represents the class label of x_i^p, and y_i^n represents the class label of x_i^n; N and I respectively represent the number of triplet elements and the number of samples in the data set; a, p and n respectively denote the anchor image, the positive example image and the negative example image;
and step 4: training the sketch image retrieval network with the training set, computing the objective function of the sketch image retrieval network and updating the initial parameters of the network; training proceeds for a preset number of epochs or until the loss no longer decreases, yielding the trained sketch image retrieval network.
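As an illustrative sketch of the partitioning step in step 1, the division of an H × W image into M = HW/P^2 patches can be written in a few lines of numpy. The function and variable names here are illustrative stand-ins, not identifiers from the patent.

```python
import numpy as np

def partition_into_patches(img, P):
    """Split an H x W x C image into M = (H*W)/(P*P) flattened patches,
    mirroring the Transformer partitioning module of step 1."""
    H, W, C = img.shape
    assert H % P == 0 and W % P == 0, "H and W must be divisible by P"
    M = (H * W) // (P * P)
    # Rearrange into (H/P, P, W/P, P, C), then flatten each P x P x C patch.
    patches = (img.reshape(H // P, P, W // P, P, C)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(M, P * P * C))
    return patches

x_p = partition_into_patches(np.zeros((224, 224, 3)), P=16)
print(x_p.shape)  # (196, 768): M = 224*224/16^2 = 196 patches of dim 16*16*3
```

For a 224 × 224 RGB image with P = 16 this yields the familiar 196 patch tokens used by ViT-style models.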
2. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, the learnable position embedding is added to the patch image embeddings, and the embedding vector is:
z_0 = [x_class; x_p^1 E; x_p^2 E; …; x_p^M E] + E_pos    (1)
wherein x_p^1, x_p^2, …, x_p^M respectively represent the 1st, 2nd, …, M-th patch images; E denotes the patch image embedding projection, and E_pos denotes the position embedding.
3. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, the Transformer coding module is:
z′_l = MSA(LN(z_{l-1})) + z_{l-1}    (2)
z_l = CONV(LN(z′_l)) + z′_l    (3)
wherein LN(·) represents layer normalization; z_l represents the embedded image representation; z′_l represents the output of the multi-head self-attention layer; and CONV(·) represents the convolution operation.
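A minimal numpy sketch of equations (2)-(3), using single-head attention for brevity (the patent uses K heads) and implementing the Conv1×1 block as a shared per-token linear map, which is what a 1 × 1 convolution over a token sequence amounts to. The parameter shapes and scaling are illustrative assumptions.

```python
import numpy as np

def layer_norm(z, eps=1e-6):
    mu = z.mean(-1, keepdims=True)
    sd = z.std(-1, keepdims=True)
    return (z - mu) / (sd + eps)

def softmax(a):
    e = np.exp(a - a.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_layer(z, Wq, Wk, Wv, W1, W2):
    """One pre-norm layer: z'_l = MSA(LN(z_{l-1})) + z_{l-1},
    then z_l = CONV(LN(z'_l)) + z'_l."""
    x = layer_norm(z)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # self-attention weights
    z_prime = attn @ v + z                          # residual after MSA
    h = layer_norm(z_prime)
    z_next = np.maximum(h @ W1, 0) @ W2 + z_prime   # Conv1x1 block + residual
    return z_next, attn

D = 64
rng = np.random.default_rng(0)
z = rng.normal(size=(197, D))
Ws = [rng.normal(size=(D, D)) * 0.05 for _ in range(5)]
z_out, attn = transformer_layer(z, *Ws)
print(z_out.shape, attn.shape)  # (197, 64) (197, 197)
```

Each row of `attn` sums to one, which is what makes the layer-wise weight product of claim 4 meaningful.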
4. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, the input of the L-th layer of the Transformer coding module is
z_{L-1} = [z_{L-1}^1; z_{L-1}^2; …; z_{L-1}^M]
wherein z_{L-1}^1, …, z_{L-1}^M respectively represent the M outputs of the (L-1)-th layer; the K-head self-attention weights of each layer except the L-th layer are
w_l = [w_l^1, w_l^2, …, w_l^K]
wherein l ∈ {1, 2, …, L-1}; for the self-attention of each layer, each patch image has K groups of weights, so the weights of the M patch images in each layer are expressed as
w_l^i = [w_l^{i,1}, w_l^{i,2}, …, w_l^{i,M}]
wherein i ∈ {1, 2, …, K}; the weights of the first L-1 layers are multiplied to obtain the final weight:
w_f = ∏_{l=1}^{L-1} w_l    (4)
wherein w_f represents the final weight used to select the discriminative regions;
the indices of the patch images carrying useful information are obtained from the selected regions; these indices are used as position information to locate the corresponding patch image embeddings, and the selected embeddings form a new sequence that enters the L-th Transformer layer.
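The selection step of claim 4 can be sketched as follows: multiply the attention matrices of the first L-1 layers and rank patches by the class-token row of the product. Single-head matrices are used here for brevity (head averaging omitted), and `top_m` is an illustrative parameter, not one from the patent.

```python
import numpy as np

def select_discriminative_patches(attn_weights, top_m):
    """Accumulate w_f = prod_l w_l over the first L-1 layers and use the
    class-token row of w_f to rank the M patches; return the indices of
    the top_m most informative patches."""
    w_f = attn_weights[0]
    for w in attn_weights[1:]:
        w_f = w_f @ w                 # attention flow across layers
    class_row = w_f[0, 1:]            # class-token attention to the M patches
    idx = np.argsort(class_row)[::-1][:top_m]
    return idx                        # indices into the patch sequence

rng = np.random.default_rng(0)
def rand_attn(n): a = rng.random((n, n)); return a / a.sum(-1, keepdims=True)
layers = [rand_attn(197) for _ in range(11)]  # stand-ins for L-1 = 11 layers
idx = select_discriminative_patches(layers, top_m=12)
print(idx.shape)  # (12,)
```

The selected indices would then be used to gather the corresponding patch embeddings into the new sequence fed to the L-th layer.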
5. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, the hash layer computes, for any given triplet element (x_i^a, x_i^p, x_i^n), the deep hash function:
b_i^* = sign(φ(g(z_L^*; θ_g))),  * ∈ {a, p, n}    (5)
wherein sign(·) represents the element-wise sign function; φ(·) represents the tanh function; b_i^* represents the K-bit hash code of the sample x_i^*; z_L^* represents the output of the L-th Transformer layer for the sample x_i^*; g(·) represents the deep hash function; and θ_g represents the weight parameters of the hash layer.
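A minimal sketch of the hash layer above, assuming the deep hash function is a single linear map with weights θ_g (the actual layer may be deeper): tanh relaxes the output to a hash-like code, and sign binarizes it.

```python
import numpy as np

def deep_hash(z_L, theta_g):
    """b = sign(tanh(z_L @ theta_g)): map the L-th layer output to a
    K-bit binary code. theta_g is a random stand-in for learned weights."""
    hash_like = np.tanh(z_L @ theta_g)   # phi(.): relaxed, continuous code
    return np.sign(hash_like)            # sign(.): final binary hash code

rng = np.random.default_rng(0)
b = deep_hash(rng.normal(size=(1, 64)), rng.normal(size=(64, 32)))
print(b.shape)  # (1, 32): a 32-bit hash code
```

During training the relaxed `hash_like` codes would feed the triplet, semantic, and ranking-clustering terms, with `sign` applied only at retrieval time.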
6. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 2, n data sets are used, and each data set is divided into a training set, a verification set and a test set in the ratio 70:10:20, wherein n is a preset value.
7. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 4, the objective function consists of a triplet term, a category-level semantic similarity term, a ranking-clustering term and a discriminative learning term; during hash code learning, the semantic similarity of the hash codes is preserved, the similarity across different modalities is captured, instances with similar ranking information are clustered, and discriminative region learning is performed;
the triplet term is defined as follows:
L_tri = Σ_{i=1}^{N} max(0, δ + ‖ĥ_i^a − ĥ_i^p‖_2^2 − ‖ĥ_i^a − ĥ_i^n‖_2^2)    (6)
wherein ‖·‖_2 represents the ℓ2 vector norm, δ represents a boundary parameter, and max(·) represents the maximum function; ĥ_i^a, ĥ_i^p and ĥ_i^n are hash-like codes relaxed from the binary codes b_i^a, b_i^p and b_i^n, and respectively represent the hash codes of the anchor image, the positive example image and the negative example image;
the category-level semantic similarity term is defined as follows:
L_sem = Σ_{*∈{a,p,n}} ℓ_ce(ŷ_i^*, y_i^*)    (7)
wherein ℓ_ce(·,·) represents the cross-entropy function, and ŷ_i^a, ŷ_i^p and ŷ_i^n respectively represent the predicted label information of x_i^a, x_i^p and x_i^n;
the ranking-clustering term is defined as follows:
L_rc = L_rc^I + L_rc^S
wherein L_rc^I represents the ranking-clustering term of the natural images and L_rc^S represents the ranking-clustering term of the sketch images, both constructed from the average precision values of the queries in a batch; AP(h_a) represents the AP value of the natural image query instance h_a, V represents the size of a batch of data, and AP(h_v) represents the AP value of the sketch query instance; R_u represents the positively correlated set, |R_u| denotes the number of positively correlated instances, and R_t represents the score set of all instances; η represents a boundary parameter, and d_ut = [cos(h_v, h_u) − cos(h_v, h_t)], wherein cos(·,·) denotes cosine similarity, h_u ∈ R_u, h_t ∈ R_t, and h_v represents a query instance;
the discriminative learning term is defined as follows:
L_dis = L_dis^S + L_dis^I
wherein L_dis^S is the discriminative learning term of the sketch data and L_dis^I is the discriminative learning term of the natural image data, both defined as margin-based terms on cosine similarity; cos(·,·) represents cosine similarity and μ represents a boundary parameter; z_L^{S,v1} and z_L^{S,v2} respectively denote the classification tokens of the v1-th and v2-th sketches at the L-th layer; z_L^{I,v1} and z_L^{I,v2} respectively denote the classification tokens of the v1-th and v2-th images at the L-th layer; z_L^{a,v1}, z_L^{a,v2}, z_L^{p,v1} and z_L^{p,v2} respectively represent the anchor token of the v1-th image at the L-th layer, the anchor token of the v2-th image, the positive example token of the v1-th image and the positive example token of the v2-th image at the L-th layer;
the objective function is defined as:
L = L_tri + α·L_sem + β·L_rc + γ·L_dis    (8)
wherein α, β and γ represent weight parameters.
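The triplet term and the weighted combination of the four objective terms can be illustrated with a small numpy sketch. δ corresponds to the boundary parameter and α, β, γ to the weight parameters above; the specific assignment of weights to terms is an assumption consistent with the claim.

```python
import numpy as np

def triplet_term(h_a, h_p, h_n, delta=1.0):
    """Triplet hinge on hash-like codes:
    sum over the batch of max(0, delta + ||h_a - h_p||^2 - ||h_a - h_n||^2)."""
    d_pos = ((h_a - h_p) ** 2).sum(-1)   # anchor-positive squared distance
    d_neg = ((h_a - h_n) ** 2).sum(-1)   # anchor-negative squared distance
    return float(np.maximum(0.0, delta + d_pos - d_neg).sum())

def objective(L_tri, L_sem, L_rc, L_dis, alpha, beta, gamma):
    """Weighted combination of the four objective terms."""
    return L_tri + alpha * L_sem + beta * L_rc + gamma * L_dis

h_a = np.array([[1.0, -1.0, 1.0]])
h_p = np.array([[1.0, -1.0, 1.0]])   # identical to the anchor code
h_n = np.array([[-1.0, 1.0, -1.0]])  # opposite code, far from the anchor
print(triplet_term(h_a, h_p, h_n, delta=1.0))  # 0.0: margin already satisfied
```

Swapping the positive and negative codes makes the hinge active, since the "positive" is then twelve squared units farther than the "negative".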
8. The sketch image retrieval method based on sorting clustering sequence identification selection according to any one of claims 1-7, wherein: the top-n precisions of the ranking list are calculated on the test data set by using the trained sketch image retrieval network to obtain the mean average precision mAP and the top-n precision; the higher the precision values, the better the performance of the method.
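The evaluation metrics of claim 8 can be sketched as follows; `relevant` is an illustrative 0/1 relevance array in ranked order, not an identifier from the patent.

```python
import numpy as np

def precision_at_n(relevant, n):
    """precision@n: fraction of the top-n ranked results that are relevant."""
    return float(np.sum(relevant[:n])) / n

def average_precision(relevant):
    """AP for one query: mean of precision@k over the ranks k of the
    relevant hits. mAP is the mean of AP over all queries."""
    hits = np.flatnonzero(relevant) + 1           # 1-based ranks of relevant items
    if hits.size == 0:
        return 0.0
    precisions = np.arange(1, hits.size + 1) / hits
    return float(precisions.mean())

ranked = np.array([1, 0, 1, 1, 0])
print(precision_at_n(ranked, 3))   # 2 of the top 3 are relevant -> 0.666...
print(average_precision(ranked))   # mean of 1/1, 2/3, 3/4 = 0.8055...
```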
9. A sketch image retrieval system based on sorting clustering sequence identification selection, characterized by comprising the following modules:
the module 1 is used for constructing a sketch image retrieval network;
the module 2 is used for retrieving sketch images by using the sketch image retrieval network;
the module 1 specifically comprises the following sub-modules:
the submodule 1 is used for constructing a sketch image retrieval network;
the sketch image retrieval network comprises a Transformer partitioning module, a linear projection module and a Transformer coding module;
the Transformer partitioning module is used for dividing the input image into M 2D patch images x_p; the input image is of size H × W, each patch image is of size P × P, and the number of patches is M = HW/P^2;
the linear projection module is used for mapping each patch image output by the partitioning module to D dimensions, and a learnable position embedding is added to the patch image embeddings to preserve position information; the resulting embedding vector is denoted z_0, and the output at position zero is a D-dimensional class token x_class;
the Transformer coding module takes z_0 as input and mines the relations between the patch images in the sequence; the Transformer coding module comprises L Transformer layers and a hash layer, wherein each Transformer layer comprises a multi-head self-attention layer MSA and a Conv1×1 block, the Conv1×1 block consisting of two convolutional layers with 1 × 1 convolution kernels and one fully-connected layer; for each Transformer layer, its input is the output of the previous layer; the output of the L-th Transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used to construct the triplet term, the category-level semantic term and the ranking-clustering term in the objective function;
the submodule 2 is used for acquiring the existing sketch image data set and dividing the data set into a training data set, a verification data set and a test data set;
the submodule 3 is used for giving, in the training dataset, N triplet elements {(x_i^a, x_i^p, x_i^n)}_{i=1}^N and triplet labels {(y_i^a, y_i^p, y_i^n)}_{i=1}^N, wherein x_i^a, x_i^p and x_i^n sequentially represent the anchor sketch, the positive example image and the negative example image of the i-th triplet; y_i^a represents the class label of x_i^a, y_i^p represents the class label of x_i^p, and y_i^n represents the class label of x_i^n; N and I respectively represent the number of triplet elements and the number of samples in the data set; a, p and n respectively denote the anchor image, the positive example image and the negative example image;
the submodule 4 is used for training the sketch image retrieval network with the training set, computing the objective function of the sketch image retrieval network and updating the initial parameters of the network; training proceeds for a preset number of epochs or until the loss no longer decreases, yielding the trained sketch image retrieval network.
CN202111259946.6A 2021-10-28 2021-10-28 Sketch image retrieval method and system based on sorting clustering sequence identification selection Pending CN114020948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111259946.6A CN114020948A (en) 2021-10-28 2021-10-28 Sketch image retrieval method and system based on sorting clustering sequence identification selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111259946.6A CN114020948A (en) 2021-10-28 2021-10-28 Sketch image retrieval method and system based on sorting clustering sequence identification selection

Publications (1)

Publication Number Publication Date
CN114020948A true CN114020948A (en) 2022-02-08

Family

ID=80058252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111259946.6A Pending CN114020948A (en) 2021-10-28 2021-10-28 Sketch image retrieval method and system based on sorting clustering sequence identification selection

Country Status (1)

Country Link
CN (1) CN114020948A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596456A (en) * 2022-05-10 2022-06-07 四川大学 Image set classification method based on aggregated hash learning


Similar Documents

Publication Publication Date Title
CN111198959B (en) Two-stage image retrieval method based on convolutional neural network
CN107885764B (en) Rapid Hash vehicle retrieval method based on multitask deep learning
CN109241317B (en) Pedestrian Hash retrieval method based on measurement loss in deep learning network
Cakir et al. Adaptive hashing for fast similarity search
CN110942091B (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
KR102305568B1 (en) Finding k extreme values in constant processing time
CN108280187B (en) Hierarchical image retrieval method based on depth features of convolutional neural network
CN110532379B (en) Electronic information recommendation method based on LSTM (least Square TM) user comment sentiment analysis
CN109271486B (en) Similarity-preserving cross-modal Hash retrieval method
Liu et al. Towards optimal binary code learning via ordinal embedding
CN110688474B (en) Embedded representation obtaining and citation recommending method based on deep learning and link prediction
CN102968419B (en) Disambiguation method for interactive Internet entity name
CN104112005B (en) Distributed mass fingerprint identification method
CN105808709A (en) Quick retrieval method and device of face recognition
CN114241273A (en) Multi-modal image processing method and system based on Transformer network and hypersphere space learning
CN113377981B (en) Large-scale logistics commodity image retrieval method based on multitask deep hash learning
CN114357120A (en) Non-supervision type retrieval method, system and medium based on FAQ
CN111325264A (en) Multi-label data classification method based on entropy
CN113836341A (en) Remote sensing image retrieval method based on unsupervised converter balance hash
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
CN114020948A (en) Sketch image retrieval method and system based on sorting clustering sequence identification selection
CN113095229B (en) Self-adaptive pedestrian re-identification system and method for unsupervised domain
CN105117735A (en) Image detection method in big data environment
CN116108217B (en) Fee evasion vehicle similar picture retrieval method based on depth hash coding and multitask prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination