CN114020948A - Sketch image retrieval method and system based on sorting clustering sequence identification selection - Google Patents
- Publication number
- CN114020948A (application CN202111259946.6A)
- Authority
- CN
- China
- Prior art keywords
- layer
- image
- sketch
- image retrieval
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a sketch image retrieval method based on sorting clustering sequence identification selection, in which learned hash codes are finally used to retrieve sketch images. The invention designs a triplet transformer backbone with a sequence discrimination selection module, which captures the important discrimination regions between a sketch and a natural image. It also provides an objective function consisting of a triplet term, a semantic similarity term, a sorting clustering term and a discrimination learning term; this objective preserves the semantic similarity of the hash codes while they are learned, captures the similarity between different modalities, and optimizes the sorting information so that similar instances are clustered and discrimination-domain learning is guided. The problems of redundant information and neglected sorting information are thereby addressed, and retrieval precision is improved.
Description
Technical Field
The invention belongs to the technical field of image retrieval, relates to a sketch image retrieval method and a sketch image retrieval system, and particularly relates to a sketch image retrieval method and a sketch image retrieval system based on sorting clustering sequence identification selection.
Background
Owing to the explosive growth of touch-screen devices, sketches are used more and more frequently: a user can draw a sketch on a touch-screen device with a finger anytime and anywhere. Using a sketch to retrieve relevant natural images is therefore of practical value. Interest in sketch image retrieval is accordingly increasing; its purpose is to match natural images using a hand-drawn sketch as the query.
Existing sketch image retrieval methods fall roughly into two types: hand-crafted feature methods and deep learning methods. Hand-crafted methods do not reduce the cross-domain differences between sketches and natural images well, because hand-crafted features cannot effectively represent the edges of natural images or misaligned sketches with large variations and ambiguities. Deep learning sketch image retrieval methods were proposed to address this cross-domain difference. However, existing deep learning methods still face two challenges: (1) sketches and natural images may contain different objects with similar contour shapes. Some deep learning sketch retrieval methods cannot capture the important discrimination regions between a sketch and a natural image, which leads to information redundancy and ultimately degrades retrieval performance; (2) ranking information is closely related to retrieval results. When learning hash codes for the sketch retrieval task, conventional methods neglect sorting information, so their performance is not ideal.
Disclosure of Invention
To address the defects of the prior art, the invention provides a sketch image retrieval method and a sketch image retrieval system based on sorting clustering sequence identification selection. The method and system make full use of discrimination regions and sorting information to perform hash code learning: a query sketch is first drawn and the discrimination regions are selected, while the sorting information is used to aggregate samples of the same category, so that the category of a sample can be identified across modalities. Finally, the hash codes are used to retrieve sketch images.
The method adopts the technical scheme that: a sketch image retrieval method based on sorting clustering sequence identification selection is characterized in that a sketch image retrieval network is firstly constructed, and then the sketch image retrieval network is utilized to carry out sketch image retrieval;
the construction of the sketch image retrieval network specifically comprises the following steps:
step 1: constructing a sketch image retrieval network;
the sketch image retrieval network comprises a transformer partitioning module, a linear projection module and a transformer coding module;
the transformer partitioning module is used for dividing the input image into M two-dimensional patch images $x_p$; the size of each input image is H × W and the size of each patch image is P × P, so that $M = HW/P^2$;
the linear projection module is used for mapping the patch images output by the transformer partitioning module to D dimensions, and adding a learnable position embedding to the patch image embeddings to preserve position information; the resulting embedding vector is denoted $z_0$, and the output at position zero is a D-dimensional class token $x_{class}$;
the transformer coding module is used for receiving $z_0$ and mining the relations between the patch images in the sequence; the transformer coding module comprises L transformer layers and a hash layer, wherein each transformer layer comprises a multi-head self-attention layer MSA and a $\mathrm{Conv}_{1\times1}$ block, and the $\mathrm{Conv}_{1\times1}$ block consists of two convolutional layers with 1 × 1 convolution kernels and one fully-connected layer; the input of each transformer layer is the output of the previous layer; the output of the L-th transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used for constructing the triplet term, the category-level semantic term and the sorting clustering term in the objective function;
step 2: acquiring an existing sketch image data set, and dividing the data set into a training data set, a verification data set and a test data set;
step 3: in the training dataset, N triplet elements $\{(x_i^a, x_i^p, x_i^n)\}_{i=1}^{N}$ and triplet labels $\{(y_i^a, y_i^p, y_i^n)\}_{i=1}^{N}$ are given, wherein the three elements of $(x_i^a, x_i^p, x_i^n)$ represent, in order, the anchor sketch, the positive example image and the negative example image of the i-th datum; $y_i^a$ represents the class label of $x_i^a$, $y_i^p$ represents the class label of $x_i^p$, and $y_i^n$ represents the class label of $x_i^n$; N and I respectively represent the number of triplet elements and the number of samples in the data set; a, p and n respectively denote the anchor image, the positive example image and the negative example image;
step 4: training the sketch image retrieval network with the training set, calculating the objective function of the sketch image retrieval network and updating the initial parameters of the sketch image retrieval network; training continues until a preset number of rounds is reached or until the loss no longer decreases; the trained sketch image retrieval network is thereby obtained.
The technical scheme adopted by the system of the invention is as follows: a sketch image retrieval system for discriminating selections based on a sorted cluster sequence, comprising the following modules:
the module 1 is used for constructing a sketch image retrieval network module;
the module 2 is used for searching the sketch images by utilizing the sketch image searching network;
the module 1 specifically comprises the following sub-modules:
the submodule 1 is used for constructing a sketch image retrieval network;
the sketch image retrieval network comprises a transformer partitioning module, a linear projection module and a transformer coding module;
the transformer partitioning module is used for dividing the input image into M two-dimensional patch images $x_p$; the size of each input image is H × W and the size of each patch image is P × P, so that $M = HW/P^2$;
the linear projection module is used for mapping the patch images output by the transformer partitioning module to D dimensions, and adding a learnable position embedding to the patch image embeddings to preserve position information; the resulting embedding vector is denoted $z_0$, and the output at position zero is a D-dimensional class token $x_{class}$;
the transformer coding module is used for receiving $z_0$ and mining the relations between the patch images in the sequence; the transformer coding module comprises L transformer layers and a hash layer, wherein each transformer layer comprises a multi-head self-attention layer MSA and a $\mathrm{Conv}_{1\times1}$ block, and the $\mathrm{Conv}_{1\times1}$ block consists of two convolutional layers with 1 × 1 convolution kernels and one fully-connected layer; the input of each transformer layer is the output of the previous layer; the output of the L-th transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used for constructing the triplet term, the category-level semantic term and the sorting clustering term in the objective function;
the submodule 2 is used for acquiring the existing sketch image data set and dividing the data set into a training data set, a verification data set and a test data set;
the submodule 3 is used for giving, in the training dataset, N triplet elements $\{(x_i^a, x_i^p, x_i^n)\}_{i=1}^{N}$ and triplet labels $\{(y_i^a, y_i^p, y_i^n)\}_{i=1}^{N}$, wherein the three elements of $(x_i^a, x_i^p, x_i^n)$ represent, in order, the anchor sketch, the positive example image and the negative example image of the i-th datum; $y_i^a$ represents the class label of $x_i^a$, $y_i^p$ represents the class label of $x_i^p$, and $y_i^n$ represents the class label of $x_i^n$; N and I respectively represent the number of triplet elements and the number of samples in the data set; a, p and n respectively denote the anchor image, the positive example image and the negative example image;
the submodule 4 is used for training the sketch image retrieval network by utilizing the training set, calculating a target function of the sketch image retrieval network and updating initial parameters of the sketch image retrieval network; the network training reaches a preset turn or until the loss does not decrease any more; and obtaining the trained sketch image retrieval network.
Compared with the prior art, the invention has the following advantages:
1) a triplet transformer backbone with a sequence discrimination selection module is designed, which captures the important discrimination regions between a sketch and a natural image;
2) an objective function consisting of a triplet term, a semantic similarity term, a sorting clustering term and a discrimination learning term is provided; it preserves the semantic similarity of the hash codes while they are learned, captures the similarity between different modalities, and optimizes the sorting information so that similar instances are clustered and discrimination-domain learning is guided. The problems of redundant information and neglected sorting information are thereby addressed, and retrieval precision is improved.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a network structure diagram according to an embodiment of the present invention.
Fig. 3 compares the method of the invention with the DSIH-V method on the extended TU-Berlin dataset. (a) The top 20 images retrieved with 256-bit hash codes using DSIH-V. (b) The top 20 images retrieved with 256-bit hash codes using DSIH. Incorrectly retrieved images are marked with an × below the image.
Detailed Description
To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the embodiments described herein merely illustrate and explain the present invention and do not restrict it.
The invention provides a sketch image retrieval method based on sorting clustering sequence identification selection, which makes full use of discrimination regions and sorting information to perform hash code learning: a query sketch is first drawn and the discrimination regions are selected, while the sorting information is used to aggregate samples of the same category. Finally, the hash codes are used to retrieve sketch images.
Referring to fig. 1, the sketch image retrieval method based on sorting clustering sequence identification selection provided by the invention comprises the steps of firstly constructing a sketch image retrieval network, and then utilizing the sketch image retrieval network to perform sketch image retrieval;
the method for constructing the sketch image retrieval network comprises the following specific steps of:
step 1: constructing a sketch image retrieval network;
referring to fig. 2, the sketch image retrieval network of the present embodiment includes a transformer partitioning module, a linear projection module and a transformer coding module;
the transformer partitioning module is used for dividing the input image into M two-dimensional patch images $x_p$; the size of each input image is H × W and the size of each patch image is P × P, so that $M = HW/P^2$;
the linear projection module is used for mapping the patch images output by the transformer partitioning module to D dimensions, and adding a learnable position embedding to the patch image embeddings to preserve position information; the resulting embedding vector is denoted $z_0$, and the output at position zero is a D-dimensional class token $x_{class}$;
The embedding vector is:
$z_0 = [x_{class};\ x_p^1 E;\ x_p^2 E;\ \dots;\ x_p^M E] + E_{pos}$ (1)
where $x_p^1, x_p^2, \dots, x_p^M$ denote the 1st, 2nd, …, M-th two-dimensional patch images; E denotes the patch image embedding projection, and $E_{pos}$ denotes the position embedding.
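The partitioning and projection steps above can be sketched in plain Python. This is a minimal illustration, not the network of the embodiment: the function names, the row-major patch order, the representation of E as a list of D columns, and the patch size 16 used in the count below are all assumptions made for the example.

```python
def split_into_patches(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch blocks."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [row[left:left + patch] for row in image[top:top + patch]]
            # Flatten the block into one vector of length patch * patch.
            patches.append([v for row in block for v in row])
    return patches


def num_patches(height, width, patch):
    """M = HW / P^2 for an image that divides evenly into patches."""
    return (height // patch) * (width // patch)


def embed_sequence(patches, projection, class_token, position):
    """Build z0 = [x_class; x_p^1 E; ...; x_p^M E] + E_pos.

    projection is a list of D columns, each of length patch * patch
    (an assumed layout for E); position has M + 1 rows of length D.
    """
    tokens = [class_token] + [
        [sum(p * w for p, w in zip(patch, column)) for column in projection]
        for patch in patches
    ]
    return [[t + e for t, e in zip(token, pos)]
            for token, pos in zip(tokens, position)]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
projection = [[1, 0, 0, 0], [0, 0, 0, 1]]          # toy E with D = 2
position = [[0.5, 0.5] for _ in range(len(patches) + 1)]
z0 = embed_sequence(patches, projection, [0.0, 0.0], position)
```

For a 288 × 288 input and an assumed patch size of 16, `num_patches(288, 288, 16)` gives M = 324; the patent text does not state the patch size.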
To better focus on the most significant regions, $z_0$ is fed into the transformer coding module, which mines the relations between the patch images in the sequence.
The transformer coding module of this embodiment includes L transformer layers and a hash layer, where each transformer layer includes a multi-head self-attention layer MSA and a $\mathrm{Conv}_{1\times1}$ block, and the $\mathrm{Conv}_{1\times1}$ block consists of two convolutional layers with 1 × 1 convolution kernels and one fully-connected layer; the input of each transformer layer is the output of the previous layer; the output of the L-th transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used for constructing the triplet term, the category-level semantic term and the sorting clustering term in the objective function;
the transformer coding module is as follows:
$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1}) + z_{l-1})$ (2)
$z_l = \mathrm{CONV}(\mathrm{LN}(z'_l) + z'_l)$ (3)
where LN(·) denotes layer normalization, $z_l$ denotes the embedded image representation, $z'_l$ denotes the output of the multi-head self-attention layer, and CONV(·) denotes the convolution operation.
To fully exploit the attention information, sequence discrimination selection is used to select effective regions that form a new sequence. For the transformer, the input to the L-th layer is $z_{L-1} = [z_{L-1}^{1}, z_{L-1}^{2}, \dots, z_{L-1}^{M}]$, where $z_{L-1}^{1}, \dots, z_{L-1}^{M}$ denote the M outputs of the (L-1)-th layer. The K-head self-attention weights of each layer except the L-th are $w_l = [w_l^{1}, w_l^{2}, \dots, w_l^{K}]$, where $l \in \{1, 2, \dots, L-1\}$. For the self-attention of each layer, each patch image has K sets of weights, so the weights of the M patch images in each layer can be expressed as $w_l^{i} = [w_l^{i,1}, w_l^{i,2}, \dots, w_l^{i,M}]$, where $i \in \{1, 2, \dots, K\}$. Multiplying the weights of the first L-1 layers gives the final weight:
$w_f = \prod_{l=1}^{L-1} w_l$ (4)
where $w_f$ denotes the final weight, from which the discrimination regions can be selected.
The indices of the patch images that carry useful information are obtained from the selected regions, and these indices are used as position information to find the corresponding patch image embeddings. The selected embeddings form a new sequence, which enters the L-th transformer layer.
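The selection step can be illustrated with a small sketch. It assumes that the per-layer weights are multiplied element-wise per head and per patch, and that each head contributes the single patch with the largest final weight; the patent text does not spell out either detail, and the function names are hypothetical.

```python
def final_weights(layer_weights):
    """Multiply per-patch weights across layers 1..L-1.

    layer_weights[l][k][m] is the weight of patch m for head k in layer l.
    Element-wise multiplication is an assumption of this sketch.
    """
    num_heads = len(layer_weights[0])
    num_patches = len(layer_weights[0][0])
    final = [[1.0] * num_patches for _ in range(num_heads)]
    for layer in layer_weights:
        for k in range(num_heads):
            for m in range(num_patches):
                final[k][m] *= layer[k][m]
    return final


def select_patch_indices(final):
    """Pick, per head, the patch index with the largest final weight."""
    picked = {max(range(len(head)), key=head.__getitem__) for head in final}
    return sorted(picked)  # duplicates across heads collapse to one index


# Two layers, two heads, three patches (toy numbers).
layer1 = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
layer2 = [[0.1, 0.6, 0.3], [0.5, 0.1, 0.4]]
w_f = final_weights([layer1, layer2])
selected = select_patch_indices(w_f)
```

In this toy case head 0 selects patch 1 and head 1 selects patch 0, so the embeddings at indices 0 and 1 would form the new sequence passed to the last layer.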
The L-th transformer layer is followed by a hash layer. For any given triplet element $(x_i^a, x_i^p, x_i^n)$, the deep hash function is:
where sign(·) denotes the element-wise sign function; φ(·) denotes the tanh function; the result is the k-bit hash code of a sample; the input is the output of that sample at the L-th transformer layer, to which the deep hash function is applied; and $\theta_g$ denotes the weight parameters of the hash layer.
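A hash layer of this kind can be sketched as a linear map followed by tanh and an element-wise sign. The single weight matrix, the absence of a bias, and mapping sign(0) to +1 are assumptions of this sketch, and `hash_layer` is a hypothetical name.

```python
import math


def hash_layer(features, weights):
    """Return (relaxed code, binary code) for one feature vector.

    weights has k rows, one per hash bit; each row has len(features) entries.
    """
    relaxed = [
        math.tanh(sum(w * f for w, f in zip(row, features)))
        for row in weights
    ]
    # Element-wise sign; a zero activation is mapped to +1 by assumption.
    binary = [1 if value >= 0 else -1 for value in relaxed]
    return relaxed, binary


features = [1.0, -2.0]                       # toy last-layer output
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # k = 3 bits
relaxed, binary = hash_layer(features, weights)
```

The relaxed values stay in (-1, 1), which is what makes them usable in a differentiable loss, while the binary codes are what retrieval compares by Hamming distance.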
Step 2: acquiring an existing sketch image data set, and dividing the data set into a training data set, a validation data set and a test data set; the validation set is used during the experiments to check the effect of model training, and only the model performance on the test set is reported here.
An example implementation of the invention uses two data sets, the Sketchy dataset and the TU-Berlin dataset; each data set is divided into a training set, a validation set and a test set at a ratio of 70:10:20.
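The 70:10:20 split can be sketched as follows. Shuffling with a fixed seed and splitting without per-class stratification are assumptions; the text only gives the ratio.

```python
import random


def split_dataset(samples, ratios=(70, 10, 20), seed=0):
    """Shuffle samples and split them into train / validation / test parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test part
    return train, val, test


train, val, test = split_dataset(range(100))
```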
Step 3: in the training dataset, N triplet elements $\{(x_i^a, x_i^p, x_i^n)\}_{i=1}^{N}$ and triplet labels $\{(y_i^a, y_i^p, y_i^n)\}_{i=1}^{N}$ are given, wherein the three elements of $(x_i^a, x_i^p, x_i^n)$ represent, in order, the anchor sketch, the positive example image and the negative example image of the i-th datum; $y_i^a$ represents the class label of $x_i^a$, $y_i^p$ represents the class label of $x_i^p$, and $y_i^n$ represents the class label of $x_i^n$; N and I respectively represent the number of triplet elements and the number of samples in the data set; a, p and n respectively denote the anchor image, the positive example image and the negative example image;
The object of the present invention is to perform hash code learning that projects instances to hash codes while preserving the similarity between matching sketches and images. More specifically, $H(b_i^a, b_i^p)$ should be smaller than $H(b_i^a, b_i^n)$, where H(·,·) denotes the Hamming distance, and $b_i^a$, $b_i^p$ and $b_i^n$ denote the k-bit hash codes of $x_i^a$, $x_i^p$ and $x_i^n$, that is, the hash codes of the anchor image, the positive example image and the negative example image, respectively;
Step 4: training the sketch image retrieval network with the training set, calculating the objective function of the sketch image retrieval network and updating the initial parameters of the sketch image retrieval network; training continues until a preset number of rounds is reached or until the loss no longer decreases; the trained sketch image retrieval network is thereby obtained.
In this embodiment, the learning rate is set to 0.0004, the loss function is optimized with the Adam optimizer, and the initial parameters are updated.
The invention provides a new objective function consisting of a triplet term, a semantic similarity term, a sorting clustering term and a discrimination learning term; it preserves the semantic similarity of the hash codes while they are learned, captures the similarity between different modalities, optimizes the sorting information so that similar instances are clustered, and guides discrimination-domain learning.
The present invention performs hash code learning that maps instances to hash codes while preserving the similarity of matching sketches and images. To capture the similarity between different modalities, the triplet term can be defined as:
$\mathcal{L}_{tri} = \max\left(0,\ \delta + H(b_i^a, b_i^p) - H(b_i^a, b_i^n)\right)$ (6)
where H(·,·) denotes the Hamming distance, δ denotes a boundary parameter, and max(·) denotes the maximum function.
However, the above triplet term is difficult to optimize during training, so the binary codes $b_i^a$, $b_i^p$ and $b_i^n$ are relaxed to hash-like codes $h_i^a$, $h_i^p$ and $h_i^n$. Using the 2-norm instead of the Hamming distance, the triplet term is redefined as:
$\mathcal{L}_{tri} = \max\left(0,\ \delta + \|h_i^a - h_i^p\|_2 - \|h_i^a - h_i^n\|_2\right)$ (7)
where $\|\cdot\|_2$ denotes the 2-norm of a vector.
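The two variants of the triplet term can be sketched for a single triplet. The hinge form with margin δ, the unsquared 2-norm, and the function names are assumptions made for illustration; the text only states that the 2-norm replaces the Hamming distance.

```python
import math


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def l2(a, b):
    """Euclidean (2-norm) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_hamming(anchor, positive, negative, delta=0.5):
    """Hinge on binary codes: the positive should be closer than the negative."""
    return max(0.0, delta + hamming(anchor, positive) - hamming(anchor, negative))


def triplet_relaxed(anchor, positive, negative, delta=0.5):
    """Same hinge on relaxed (tanh) codes, with the 2-norm as distance."""
    return max(0.0, delta + l2(anchor, positive) - l2(anchor, negative))


b_a, b_p, b_n = [1, -1, 1, 1], [1, -1, 1, -1], [-1, 1, -1, 1]
h_a, h_p, h_n = [0.9, -0.8], [0.8, -0.9], [-0.7, 0.6]
loss_binary = triplet_hamming(b_a, b_p, b_n)
loss_relaxed = triplet_relaxed(h_a, h_p, h_n)
```

Both losses are zero here because the positive is already closer to the anchor than the negative by more than the margin; swapping the positive and the negative makes both strictly positive.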
Category-level semantic information helps to improve the latent correlation between similar hash codes, so the label information is used to provide category-level semantics for learning the hash function, and the category-level semantic term is defined as:
where the term uses the cross-entropy function together with the label information of the corresponding samples.
The average precision (AP) is a retrieval metric that measures whether relevant instances appear near the top of the ranking list; the higher the AP value, the more tightly the relevant instances are clustered at the top. The AP of a query instance $h_v$ can be approximated as:
where $R_u$ denotes the set of positive (relevant) instances, $|R_u|$ denotes the number of instances in that set, and $R_t$ denotes the set of all instances; η denotes a boundary parameter, and $d_{ut} = \cos(h_v, h_u) - \cos(h_v, h_t)$, where cos(·,·) denotes cosine similarity, $h_u \in R_u$, $h_t \in R_t$, and $h_v$ denotes the query instance. To cluster similar natural images, the sorting clustering term of the natural images can be expressed as:
where the AP value is computed for the natural image query instance $h_a$, and V denotes the number of samples in a batch; accordingly, the sorting clustering term of the sketches can be expressed as:
where the AP value is computed for a sketch query instance.
Thus, the final sorting clustering term is composed of equations (10) and (11):
where the resulting sorting clustering term optimizes the sorting information so that similar instances are clustered.
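The quantity that the sorting clustering term builds on can be illustrated by computing exact AP from the similarity differences $d_{ut}$. This sketch uses a hard indicator ($d_{ut} < 0$ means instance t outranks u); a differentiable approximation would replace that indicator with a smooth function, and since the text does not state how η enters, η is not modelled here. Function names are hypothetical and ties are ignored.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def d(h_v, h_u, h_t):
    """d_ut = cos(h_v, h_u) - cos(h_v, h_t)."""
    return cosine(h_v, h_u) - cosine(h_v, h_t)


def ap_from_differences(h_v, positives, everything):
    """Exact AP of query h_v; positives must be objects taken from everything."""
    total = 0.0
    for h_u in positives:
        # Rank of u among the positives and among all instances.
        rank_pos = 1 + sum(1 for h_t in positives
                           if h_t is not h_u and d(h_v, h_u, h_t) < 0)
        rank_all = 1 + sum(1 for h_t in everything
                           if h_t is not h_u and d(h_v, h_u, h_t) < 0)
        total += rank_pos / rank_all
    return total / len(positives)


gallery = [[1.0, 0.1], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]
ap = ap_from_differences(query, [gallery[0], gallery[1]], gallery)
```

Here the first positive is ranked first and the second positive is ranked third, so the AP is (1 + 2/3) / 2.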
To improve discrimination-domain learning, the similarity between class tokens of samples with different labels is minimized, and the similarity between class tokens of samples with the same label is maximized. The discrimination learning term for a batch of sketch data can be expressed as:
where cos(·,·) denotes cosine similarity and μ denotes a boundary parameter; the two token symbols denote the class tokens of the $v_1$-th and $v_2$-th sketches at the L-th layer.
The discrimination learning term of the natural images can be expressed as:
where the token symbols denote the class tokens of the $v_1$-th and $v_2$-th images at the L-th layer, and the label symbols denote, respectively, the anchor label of the $v_1$-th image, the anchor label of the $v_2$-th image, the positive example label of the $v_1$-th image and the positive example label of the $v_2$-th image at the L-th layer.
Thus, combining equations (13) and (14), the discrimination learning term can be defined as:
where the combined discrimination learning term enables learning in the discrimination domain.
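One common way to realise such a term is a contrastive loss on class tokens: same-label pairs are pulled towards cosine similarity 1, and different-label pairs are penalised only when their similarity exceeds the margin μ. The exact form used in the patent is not recoverable from this text, so the formula below, the averaging over all pairs, and the function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def discrimination_term(tokens, labels, mu=0.5):
    """Contrastive loss over all ordered pairs of class tokens in a batch."""
    n = len(tokens)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = cosine(tokens[i], tokens[j])
            if labels[i] == labels[j]:
                total += 1.0 - sim            # same label: maximise similarity
            else:
                total += max(0.0, sim - mu)   # different label: push below mu
    return total / (n * n)


separated = discrimination_term([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0, 0, 1])
collapsed = discrimination_term([[1.0, 0.0], [1.0, 0.0]], [0, 1])
```

The first batch is already well separated and yields zero loss; in the second, two samples with different labels share the same token, so each cross pair contributes 1 - μ.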
Combining the above four parts (the triplet term, the category-level semantic term, the sorting clustering term and the discrimination learning term), the overall objective function can be defined as:
where α, β and γ denote weight parameters; the whole network is trained with the overall objective function, which combines the four loss terms proposed by the invention.
The network was trained on a machine with a GeForce GTX Titan X GPU, an Intel Core i7-5930K 3.50 GHz CPU and 64 GB of RAM. Input instances are resized to 288 × 288; the loss function is optimized with the Adam optimizer at a learning rate of 0.0004, and the batch size is set to 64. To generate hash codes of 32, 64, 128, 256 and 512 bits, the hash code length k is varied from 32 to 512. The initial weights of both the sketch branch and the image branch are weights pre-trained on the ImageNet dataset. The boundary parameter δ in the triplet term is set to 0.5, the boundary parameter η in the sorting clustering term is set to 0.01, and the boundary parameter μ in the discrimination learning term is set to 0.5; the hyper-parameters α, β and γ are set to 0.8, 0.1 and 1, respectively. The network is trained for 500 rounds or until the loss no longer decreases.
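The settings listed above can be collected in one place, together with a stopping rule for "500 rounds or until the loss no longer decreases". The class and field names are hypothetical, and comparing only the last two loss values is an assumed reading of that stopping condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    input_size: int = 288                      # inputs resized to 288 x 288
    learning_rate: float = 0.0004              # Adam learning rate
    batch_size: int = 64
    hash_bits: tuple = (32, 64, 128, 256, 512)
    delta: float = 0.5                         # margin in the triplet term
    eta: float = 0.01                          # boundary parameter, sorting clustering term
    mu: float = 0.5                            # margin in the discrimination learning term
    alpha: float = 0.8
    beta: float = 0.1
    gamma: float = 1.0
    max_rounds: int = 500


def should_stop(loss_history, config):
    """Stop after max_rounds, or when the latest loss did not decrease."""
    if len(loss_history) >= config.max_rounds:
        return True
    return len(loss_history) >= 2 and loss_history[-1] >= loss_history[-2]


config = TrainConfig()
```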
In this embodiment, the trained sketch image retrieval network is evaluated on the test data set by computing the precision of the top n entries of the ranking list, yielding the mean average precision (mAP) and the precision of the top 200 results (precision@200); the higher these metrics, the better the performance of the method.
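Both metrics can be computed from ranked relevance lists, one list per query, where each entry says whether the retrieved item at that rank is relevant. This is a generic sketch of the standard definitions rather than the evaluation code of the embodiment; dividing precision@k by the number of items actually returned when fewer than k exist is an assumption.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of relevant items among the top k results."""
    top = ranked_relevance[:k]
    return sum(top) / len(top) if top else 0.0


def average_precision(ranked_relevance):
    """Mean of the precision values taken at each relevant rank."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(all_ranked):
    """mAP: average of the per-query AP values."""
    return sum(average_precision(r) for r in all_ranked) / len(all_ranked)


query_1 = [True, False, True, False]
query_2 = [False, True, True, True]
p_at_2 = precision_at_k(query_1, 2)
m_ap = mean_average_precision([query_1, query_2])
```

For precision@200 the same function is called with k = 200 on each query's ranking list and the results are averaged over the queries.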
Referring to fig. 3, to verify the effectiveness of the different factors in the method of the present invention, an ablation experiment is first performed with four variants: first, the method of the invention without the triplet term for learning the hash function (DSIH-T); second, the method of the invention performing sketch image retrieval learning without the transformer (DSIH-V); third, the method of the invention in which hash code learning is performed with sorting clustering (DSIH-R); and finally, the full method of the invention (HASE). The retrieval performance of the method of the present invention is then compared with that of advanced methods such as DBSH, GDH, DVML, DSH, TVAE and StyleMeUp.
TABLE 1
Table 1 shows the mAP values of the present invention and of DSIH-T, DSIH-V and DSIH-R for different embedding dimensions on the extended Sketchy dataset. The comparison shows that the method of the invention achieves the highest average precision over the top 200 retrieval results for every hash code length on the extended Sketchy dataset.
TABLE 2
Table 2 shows the mAP values of the present invention and of other methods for different embedding dimensions on the extended TU-Berlin dataset. The comparison shows that the method of the invention achieves the highest average precision over the top 200 retrieval results for every hash code length on the extended TU-Berlin dataset.
TABLE 3
Table 3 compares the present invention with other existing methods; the comparison shows that the method of the present invention has higher retrieval precision.
In a specific implementation, the above process can be run automatically by means of computer software.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (9)
1. A sketch image retrieval method based on sorting clustering sequence identification selection is characterized in that: firstly, constructing a sketch image retrieval network, and then utilizing the sketch image retrieval network to retrieve sketch images;
the construction of the sketch image retrieval network specifically comprises the following steps:
step 1: constructing a sketch image retrieval network;
the sketch image retrieval network comprises a transformer partitioning module, a linear projection module and a transformer coding module;
the transformer partitioning module is used for dividing the input image into M two-dimensional patch images $x_p$; the size of each input image is H × W and the size of each patch image is P × P, so that $M = HW/P^2$;
the linear projection module is used for mapping the patch images output by the transformer partitioning module to D dimensions, and adding a learnable position embedding to the patch image embeddings to preserve position information; the resulting embedding vector is denoted $z_0$, and the output at position zero is a D-dimensional class token $x_{class}$;
the transformer coding module is used for receiving $z_0$ and mining the relations between the patch images in the sequence; the transformer coding module comprises L transformer layers and a hash layer, wherein each transformer layer comprises a multi-head self-attention layer MSA and a $\mathrm{Conv}_{1\times1}$ block, and the $\mathrm{Conv}_{1\times1}$ block consists of two convolutional layers with 1 × 1 convolution kernels and one fully-connected layer; the input of each transformer layer is the output of the previous layer; the output of the L-th transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used for constructing the triplet term, the category-level semantic term and the sorting clustering term in the objective function;
step 2: acquiring an existing sketch image data set, and dividing the data set into a training data set, a verification data set and a test data set;
step 3: in the training dataset, N triplet elements $\{(x_i^a, x_i^p, x_i^n)\}_{i=1}^{N}$ and triplet labels $\{(y_i^a, y_i^p, y_i^n)\}_{i=1}^{N}$ are given, wherein the three elements of $(x_i^a, x_i^p, x_i^n)$ represent, in order, the anchor sketch, the positive example image and the negative example image of the i-th datum; $y_i^a$ represents the class label of $x_i^a$, $y_i^p$ represents the class label of $x_i^p$, and $y_i^n$ represents the class label of $x_i^n$; N and I respectively represent the number of triplet elements and the number of samples in the data set; a, p and n respectively denote the anchor image, the positive example image and the negative example image;
step 4: training the sketch image retrieval network with the training set, calculating the objective function of the sketch image retrieval network and updating the initial parameters of the sketch image retrieval network; training continues until a preset number of rounds is reached or until the loss no longer decreases; the trained sketch image retrieval network is thereby obtained.
2. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, a learnable position embedding is added to the patch image embeddings, and the embedding vector is:
$z_0 = [x_{class};\ x_p^1 E;\ x_p^2 E;\ \dots;\ x_p^M E] + E_{pos}$ (1)
3. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, the transformer encoding module is:
$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1}) + z_{l-1})$ (2)
$z_l = \mathrm{CONV}(\mathrm{LN}(z'_l) + z'_l)$ (3)
wherein LN(·) represents layer normalization; $z_l$ represents the embedded image representation; $z'_l$ represents the output of the multi-head self-attention layer; and CONV(·) represents a convolution operation.
4. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, the input of the L-th layer of the transformer encoding module consists of the M outputs of the (L−1)-th layer; each layer $l$ other than the L-th layer, where $l \in \{1, 2, \dots, L-1\}$, has K-head self-attention weights; for the self-attention of each layer, each small block image has K groups of nodes, so that the weights of the M small block images in each layer are expressed per head $i$, where $i \in \{1, 2, \dots, K\}$; the weights of the first L−1 layers are multiplied to obtain the final weight $w_f$, which represents the final weight of the selectable discrimination region;
the indexes of the small block images carrying useful information are obtained from the selected region; these indexes are used as position information to find the corresponding small block image embeddings, and the selected embeddings form a new sequence that enters the L-th transformer layer.
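The selection described in this claim can be sketched as follows. This is a simplified illustration under stated assumptions: the per-layer weights are multiplied elementwise, one index is kept per attention head, and the class token is left out. The function names and the nested-list layout are hypothetical and are not taken from the source.

```python
from typing import List


def select_patch_indices(layer_weights: List[List[List[float]]]) -> List[int]:
    """Pick one small block image (patch) index per attention head.

    layer_weights[l][i][m] is the weight that head i of layer l assigns to
    patch m, for the first L-1 layers. Multiplying the per-layer weights
    elementwise is an assumption made for this sketch.
    """
    num_heads = len(layer_weights[0])
    num_patches = len(layer_weights[0][0])
    # Start from all ones and multiply in the weights of each layer.
    final = [[1.0] * num_patches for _ in range(num_heads)]
    for weights in layer_weights:
        for i in range(num_heads):
            for m in range(num_patches):
                final[i][m] *= weights[i][m]
    # For each head, keep the index of the patch with the largest final weight.
    return [
        max(range(num_patches), key=lambda m: final[i][m])
        for i in range(num_heads)
    ]


def gather_selected(embeddings: List[List[float]], indices: List[int]) -> List[List[float]]:
    """Use the indices as position information to build the new sequence."""
    return [embeddings[m] for m in indices]


# Toy usage: 2 layers, K = 2 heads, M = 3 patches.
weights = [
    [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]],
    [[0.1, 0.6, 0.3], [0.2, 0.2, 0.6]],
]
patch_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
indices = select_patch_indices(weights)
new_sequence = gather_selected(patch_embeddings, indices)
```

In the toy data, head 0 favours patch 1 and head 1 favours patch 0, so the new sequence holds those two embeddings in head order.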
5. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 1, for any given triplet unit, the deep hash function of the hash layer is:
6. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 2, n data sets are used, and each data set is divided into a training set, a verification set and a test set in the ratio 70:10:20; wherein n is a preset value.
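The 70:10:20 division can be sketched as follows. The shuffling and the fixed seed are assumptions made for illustration; the claim states only the ratio. The name `split_70_10_20` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_10_20(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split into training, verification and test sets (70:10:20)."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # random order is an assumption
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(items) * 70 // 100
    n_val = len(items) * 10 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder, about 20%
    return train, val, test


# Toy usage with 100 sample identifiers.
train_set, val_set, test_set = split_70_10_20(range(100))
```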
7. The sketch image retrieval method based on sorting clustering sequence identification selection as claimed in claim 1, wherein: in step 4, the objective function consists of a triplet term, a semantic similarity term, a sorting clustering term and a discrimination learning term, so that the semantic similarity of the hash codes is preserved during hash code learning, the similarity between different modalities is captured, the ranking information of similar clustered instances is optimized, and discriminative domains are learned;
the triplet term is defined as follows:
wherein $\lVert\cdot\rVert_2$ represents the 2-norm of a vector, δ represents a boundary parameter, and max(·) represents the maximum function; the hash codes of the anchor image, the positive example image and the negative example image are hash-like codes relaxed from the corresponding binary codes;
the category-level semantic similarity term is defined as follows:
wherein the term is expressed with a cross-entropy function evaluated on the hash codes and their respective label information;
the sorting clustering term is defined as follows:
wherein the term comprises a sorting clustering term for natural images and a sorting clustering term for sketch images; it is expressed through the average precision (AP) value of each natural image query instance and of each sketch query instance, with V representing the size of a batch of data; $R_u$ represents the positive relevance set, $|R_u|$ represents the number of elements in the positive relevance set, and $R_t$ represents the set of scores of all instances; η represents a boundary parameter, and $d_{ut} = \cos(h_v, h_u) - \cos(h_v, h_t)$, where cos(·,·) denotes cosine similarity, $h_u \in R_u$, $h_t \in R_t$, and $h_v$ represents a query instance;
the discrimination learning term is defined as follows:
wherein the term comprises a discrimination learning term for the sketch data and a discrimination learning term for the natural image data; cos(·,·) represents cosine similarity, and μ represents a boundary parameter; the remaining symbols denote the L-th layer classification tokens of the $v_1$-th and $v_2$-th sketches, the L-th layer classification tokens of the $v_1$-th and $v_2$-th images, the L-th layer anchor tokens of the $v_1$-th and $v_2$-th images, and the L-th layer positive example tokens of the $v_1$-th and $v_2$-th images;
the objective function is defined as:
where α, β, and γ represent weight parameters.
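The triplet term of this claim can be illustrated with a common hinge form that is consistent with the quantities it names (a 2-norm, a boundary parameter δ and a maximum function). This is a sketch under assumptions: the exact formula is not shown in the text, so whether the norm is squared and how the term is aggregated over the N triplets are not specified here. The names `l2_distance` and `triplet_term` are hypothetical.

```python
import math
from typing import Sequence


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (2-norm) distance between two hash-like codes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_term(
    h_a: Sequence[float],
    h_p: Sequence[float],
    h_n: Sequence[float],
    delta: float = 1.0,
) -> float:
    """Hinge form: max(0, delta + ||h_a - h_p||_2 - ||h_a - h_n||_2)."""
    return max(0.0, delta + l2_distance(h_a, h_p) - l2_distance(h_a, h_n))


# Toy usage: a well-separated triplet and a violated triplet.
satisfied = triplet_term([1.0, 1.0], [1.0, 1.0], [-1.0, -1.0])
violated = triplet_term([1.0, 1.0], [-1.0, -1.0], [1.0, 1.0])
```

The term is zero when the negative example is at least δ farther from the anchor than the positive example, and grows linearly otherwise.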
8. The sketch image retrieval method based on sorting clustering sequence identification selection according to any one of claims 1-7, wherein: the trained sketch image retrieval network is used to compute the ranking list on the test data set, from which the mean average precision (mAP) and the top-n precision are obtained, wherein a higher value indicates better performance of the method.
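The two evaluation measures can be sketched as follows, using their common textbook definitions. Each ranking list is represented as a sequence of booleans marking whether the item at that rank is relevant to the query; this representation and the function names are assumptions for illustration, and average precision is normalized here by the number of relevant items present in the list.

```python
from typing import Sequence


def precision_at_n(relevant: Sequence[bool], n: int) -> float:
    """Fraction of relevant items among the top n entries (n must be > 0)."""
    return sum(relevant[:n]) / n


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for k, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Average of the per-query AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy usage: two queries with their ranked relevance flags.
rankings = [[True, False, True, False], [False, True]]
m_ap = mean_average_precision(rankings)
p_at_2 = precision_at_n(rankings[0], 2)
```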
9. A sketch image retrieval system based on sorting clustering sequence identification selection, characterized by comprising the following modules:
module 1, used for constructing the sketch image retrieval network;
module 2, used for retrieving sketch images by means of the sketch image retrieval network;
module 1 specifically comprises the following sub-modules:
sub-module 1, used for constructing the sketch image retrieval network;
the sketch image retrieval network comprises a transform segmentation module, a linear projection module and a transformer encoding module;
the transform segmentation module is used for dividing the input image into M 2D small block images $x_p$, wherein the size of each image is H × W and the size of each small block image is P × P;
the linear projection module is used for mapping the small block images output by the transform segmentation module to D dimensions and adding a learnable position embedding to the small block image embeddings to preserve position information; the embedding vector is denoted as $z_0$, and the output at position zero is a D-dimensional class token $x_{class}$;
the transformer encoding module receives $z_0$ as input and mines the relations between the small block images in the sequence; the transformer encoding module comprises L transformer layers and a hash layer, wherein each transformer layer comprises a multi-head self-attention layer (MSA) and a Conv1×1 block, and the Conv1×1 block consists of two convolutional layers with 1×1 convolution kernels and one fully connected layer; the input of each transformer layer is the output of the previous layer; the output of the L-th transformer layer is fed into the hash layer for deep hash function learning, and the output hash codes are used to construct the triplet term, the category-level semantic term and the sorting clustering term in the objective function;
sub-module 2, used for acquiring an existing sketch image data set and dividing the data set into a training data set, a verification data set and a test data set;
sub-module 3, used for giving, in the training data set, N triplet units and their triplet labels, wherein the three elements of the i-th triplet unit represent, in order, the anchor sketch, the positive example image and the negative example image of the i-th data item, and the three elements of the i-th triplet label are the class labels of that anchor sketch, positive example image and negative example image, respectively; N and I represent the number of triplet units and the number of samples in the data set, respectively; a, p and n denote the anchor image, the positive example image and the negative example image, respectively;
sub-module 4, used for training the sketch image retrieval network with the training set, calculating the objective function of the sketch image retrieval network and updating the initial parameters of the sketch image retrieval network; training continues until a preset number of epochs is reached or the loss no longer decreases; the trained sketch image retrieval network is thereby obtained.
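The division performed by the transform segmentation module can be sketched as follows. The sketch assumes non-overlapping small block images and a single-channel image stored as nested lists, neither of which is stated in the claims; under that assumption M = (H / P) · (W / P). The name `split_into_patches` is hypothetical.

```python
from typing import List

Image = List[List[float]]  # H rows by W columns, single channel (assumption)


def split_into_patches(image: Image, p: int) -> List[List[float]]:
    """Split an H x W image into flattened, non-overlapping P x P patches."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("H and W must be multiples of P in this sketch")
    patches: List[List[float]] = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            # Flatten one P x P block row by row.
            patch = [image[top + r][left + c] for r in range(p) for c in range(p)]
            patches.append(patch)
    return patches


# Toy usage: a 4 x 4 image with P = 2 yields M = 4 patches of length 4.
toy_image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
toy_patches = split_into_patches(toy_image, 2)
```

Each flattened patch would then be mapped to D dimensions by the linear projection module before the position embedding is added.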
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111259946.6A CN114020948A (en) | 2021-10-28 | 2021-10-28 | Sketch image retrieval method and system based on sorting clustering sequence identification selection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114020948A (en) | 2022-02-08 |
Family
ID=80058252
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202111259946.6A | Pending | CN114020948A (en) | 2021-10-28 | 2021-10-28 | Sketch image retrieval method and system based on sorting clustering sequence identification selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114020948A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114596456A (en) * | 2022-05-10 | 2022-06-07 | 四川大学 | Image set classification method based on aggregated hash learning |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111198959B (en) | Two-stage image retrieval method based on convolutional neural network | |
CN107885764B (en) | Rapid Hash vehicle retrieval method based on multitask deep learning | |
CN109241317B (en) | Pedestrian Hash retrieval method based on measurement loss in deep learning network | |
Cakir et al. | Adaptive hashing for fast similarity search | |
CN110942091B (en) | Semi-supervised few-sample image classification method for searching reliable abnormal data center | |
KR102305568B1 (en) | Finding k extreme values in constant processing time | |
CN108280187B (en) | Hierarchical image retrieval method based on depth features of convolutional neural network | |
CN110532379B (en) | Electronic information recommendation method based on LSTM user comment sentiment analysis | |
CN109271486B (en) | Similarity-preserving cross-modal Hash retrieval method | |
Liu et al. | Towards optimal binary code learning via ordinal embedding | |
CN110688474B (en) | Embedded representation obtaining and citation recommending method based on deep learning and link prediction | |
CN102968419B (en) | Disambiguation method for interactive Internet entity name | |
CN104112005B (en) | Distributed mass fingerprint identification method | |
CN105808709A (en) | Quick retrieval method and device of face recognition | |
CN114241273A (en) | Multi-modal image processing method and system based on Transformer network and hypersphere space learning | |
CN113377981B (en) | Large-scale logistics commodity image retrieval method based on multitask deep hash learning | |
CN114357120A (en) | Non-supervision type retrieval method, system and medium based on FAQ | |
CN111325264A (en) | Multi-label data classification method based on entropy | |
CN113836341A (en) | Remote sensing image retrieval method based on unsupervised converter balance hash | |
CN115457332A (en) | Image multi-label classification method based on graph convolution neural network and class activation mapping | |
CN114579794A (en) | Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion | |
CN114020948A (en) | Sketch image retrieval method and system based on sorting clustering sequence identification selection | |
CN113095229B (en) | Self-adaptive pedestrian re-identification system and method for unsupervised domain | |
CN105117735A (en) | Image detection method in big data environment | |
CN116108217B (en) | Fee evasion vehicle similar picture retrieval method based on depth hash coding and multitask prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||