CN114647754A - Hand-drawn image real-time retrieval method fusing image label information

Info

Publication number: CN114647754A
Application number: CN202210396360.2A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 戴大伟, 唐晓宇, 刘颖格, 夏书银, 王国胤
Current assignee: Chongqing University of Posts and Telecommunications
Filing date: 2022-04-15
Publication date: 2022-06-21
Family ID: 81996817
Legal status: Pending

Classifications

    • G06F 16/535 — Information retrieval of still image data; querying; filtering based on additional data, e.g. user or group profiles
    • G06F 16/583 — Information retrieval of still image data; retrieval characterised by using metadata automatically derived from the content
    • G06N 3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/047 — Neural networks; probabilistic or stochastic networks
    • G06N 3/08 — Neural networks; learning methods


Abstract

The invention belongs to the field of image retrieval, and particularly relates to a hand-drawn image real-time retrieval method fusing image label information, comprising: extracting feature maps of the hand-drawn sketch and feature vectors of the real object images with an improved neural network model; when the feature vector of the hand-drawn sketch is generated for retrieval, calculating the Euclidean distances D between the sketch branch and all images, and taking the average value dm of D as the label distance reference value; according to the input label and the pseudo labels Pc (the Softmax-processed probability values of the corresponding input label category stored in the database), weighting the distance dm per database sample category to obtain the label-weighted distance value Dl; finally sorting the images in the database according to the sum of D and Dl and returning the top-k retrieved images. When the method is used to retrieve an early-stage sketch, information such as the color and characteristics of the target image can be used in the query, greatly improving retrieval efficiency when stroke information is scarce.

Description

Hand-drawn image real-time retrieval method fusing image label information
Technical Field
The invention belongs to the field of dynamic sketch retrieval, and particularly relates to a hand-drawn image real-time retrieval method fusing image label information.
Background
With the development of touch screens in recent years, sketch-based image retrieval, which can flexibly query natural images with unconstrained hand-drawn sketches, has received wide attention. By retrieval granularity, sketch retrieval can be classified into coarse-grained, category-level sketch retrieval (CBIR) and fine-grained sketch retrieval (FG-SBIR). FG-SBIR matches images against the details of a hand-drawn sketch, aiming to retrieve a specific photo in the gallery. Much progress has been made in FG-SBIR research, but three problems in the sketching process prevent FG-SBIR from being widely used in practice: (1) users with insufficient drawing skill produce sketches that vary widely, so retrieval efficiency is low; (2) the time required to draw a complete sketch must be considered, so sketch retrieval time must be reduced by retrieving the target image with as few strokes as possible; (3) sketches are abstract: a target image is generally retrieved with simple lines, and a sample sketch contains only black-and-white strokes and therefore little information. Moreover, sketches are diverse: target images with small style differences (such as women's high-heeled shoes) have extremely similar contours, so their sketches are also extremely similar and cannot distinguish the target images; this is a further reason why early-stage retrieval efficiency is low. In the traditional method, when a user searches for a commodity, the target image can only be queried with lines; if the user wants, for example, a red chair, the line information contains no color information, so the desired content can be retrieved only once the late-stage sketch is complete.
In summary, the prior art faces the technical problem of how to improve retrieval efficiency when stroke information is scarce.
Disclosure of Invention
To solve this technical problem, the invention provides a hand-drawn image real-time retrieval method fusing image label information, which adds sketch information to a retrieval framework fused with the sketch style and enhances the early-stage efficiency of sketch retrieval. When the user retrieves with an early sketch, information such as the color and characteristics of the target image is used in the query at the same time, which can greatly improve retrieval efficiency when stroke information is scarce.
A hand-drawn image real-time retrieval method fusing image label information comprises the following steps:
inputting a hand-drawn image and the label information of the target image into an improved neural network model trained on a training set, and retrieving in real time to obtain the retrieval result;
the training of the improved neural network model comprises1、f2、f3、fexWherein, f1To pre-train the network, f2For the layer of attention, f3To lower the dimension layer, fexA label extraction layer;
the training process of the improved neural network model comprises the following steps:
S1: constructing a training set, wherein the training set comprises an image set consisting of a plurality of images with their corresponding complete sketches, and an extended label set corresponding to the images, the extended label set of an image consisting of all label information of that image;
S2: in each training step, selecting one image in the image set as the target image, and training the f1, f2, f3 branches of the neural network model with the hand-drawn sketch corresponding to that image; fixing the f1, f2 parameters after training; once training is complete, extracting the embedded vectors of all target images through f1, f2, f3;
S3: inputting the target images in the image set into the trained f1 to obtain feature maps of the target images, inputting the feature maps into fex to predict the labels of the images, and training fex with a cross-entropy loss function according to the label information in the extended label set; fixing the parameters after training;
S4: rendering the complete sketch of each image in the image set into a plurality of sketches according to the stroke order of drawing, the rendered sketches forming the sketch branch set of the image set, and extracting the embedded vectors of the sketch branches through f1, f2, f3;
S5: calculating the error between the embedded vector of each picture in the sketch branch and the embedded vector of the target image with a triplet loss function, back-propagating the error with the goal of approaching the target image and moving away from non-target images, and adjusting the parameters of f3 in the model;
S6: obtaining the sketch branch of the next target image, and repeating steps S4-S6 until the model reaches the upper limit of training iterations.
Furthermore, the complete sketch of an image is rendered into N pictures according to the stroke order of drawing; the N pictures form a sketch branch, each picture in the sketch branch containing the first through n-th strokes of the complete sketch (1 ≤ n ≤ N), so that every picture has a different number of strokes. Arranged in ascending order of the number of strokes contained in the pictures, a sketch branch is S = {s1, s2, …, sn, …, sN}, where sn represents the picture containing the first through n-th strokes.
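As an illustration of this branch-rendering step, the following is a minimal Python sketch; the stroke format (each stroke a polyline of (x, y) points), canvas size, and line width are assumptions for demonstration, not specified by the invention.

```python
from PIL import Image, ImageDraw

def render_sketch_branch(strokes, size=(256, 256), width=3):
    """Render the branch S = {s1, ..., sN}: picture n holds strokes 1..n."""
    branch = []
    canvas = Image.new("L", size, color=255)       # white canvas
    draw = ImageDraw.Draw(canvas)
    for stroke in strokes:                         # strokes in drawing order
        draw.line(stroke, fill=0, width=width)     # add the n-th stroke
        branch.append(canvas.copy())               # snapshot picture s_n
    return branch

# A toy two-stroke sketch: branch[0] has one stroke, branch[1] has both.
branch = render_sketch_branch([[(20, 200), (128, 40), (236, 200)],
                               [(60, 160), (196, 160)]])
```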
Further, the images in the training set are given labels L = {l1, l2, …, ln, …, lN} for training the label extraction layer fex; the cross-entropy Loss expression is as follows:

Loss = -(1/N)·Σ_{n=1}^{N} Σ_{c=1}^{K} lnc·log(pnc)

wherein K represents the total number of categories contained in the label; N represents the total number of samples; n represents the n-th sample; pnc represents the probability that sample n belongs to class c; lnc represents the correct probability label of class c for sample n;
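A hedged PyTorch sketch of this training step follows; the shape of f1's feature map and the form of fex (pooling plus a linear classifier) are assumptions, since the invention only specifies that fex consumes f1's feature map and is trained with cross entropy.

```python
import torch
import torch.nn as nn

K = 10                                            # total label categories
f_ex = nn.Sequential(                             # assumed form of f_ex
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, K))
criterion = nn.CrossEntropyLoss()                 # softmax + the Loss above
optimizer = torch.optim.Adam(f_ex.parameters(), lr=1e-4)

feature_map = torch.randn(8, 512, 7, 7)           # stand-in for f1's output
labels = torch.randint(0, K, (8,))                # correct class per sample
loss = criterion(f_ex(feature_map), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```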
further, calculating the error of the embedded vector of each picture in the sketch branch and the embedded vector of the target image by adopting a triple Loss function, wherein the expression of the triple Loss is as follows:
Loss=max(d(VSi,Vp)-d(VSi,Vn)+α,0)
wherein, VSiAn embedded vector representing the ith picture in the sketch branch; vpAn embedded vector representing the target image; vnAn embedded vector representing a random one of the images in the image set other than the target image; α is a constant; d is the Euclidean distance calculation.
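The triplet loss can be rendered directly in PyTorch; the embedding dimension and the margin value α below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def triplet_loss(v_si, v_p, v_n, alpha=0.2):
    """Loss = max(d(VSi, Vp) - d(VSi, Vn) + alpha, 0), d = Euclidean."""
    d_pos = F.pairwise_distance(v_si, v_p)   # distance to the target image
    d_neg = F.pairwise_distance(v_si, v_n)   # distance to a non-target image
    return torch.clamp(d_pos - d_neg + alpha, min=0).mean()

v_si, v_p, v_n = (torch.randn(4, 128) for _ in range(3))
print(triplet_loss(v_si, v_p, v_n))
```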
Further, inputting the hand-drawn sketch and the label information of the target image, retrieving in real time and obtaining the final retrieval result comprises the following steps:
Step one: the sketch entered by the user passes through the image distance network f1, f2, f3 to obtain the sketch embedded vector VSi of step i;
Step two: the Euclidean distance between VSi and the embedded vector Vp of each image in the database is calculated to obtain the distance vector D = {d1, d2, …, dn, …, dN};
Step three: the average value of the elements in the distance vector is calculated, the feature map output by f1 is input into fex to predict the label probabilities of the input sketch, and the label probabilities are processed with Softmax to obtain the pseudo label;
Step four: according to the relation between the pseudo label and the input label, the average value dm of the elements in the distance vector is weighted to obtain the label-weighted distance value Dl;
Step five: an attenuation coefficient is assigned to the label-weighted distance, and the images in the database are sorted according to the sum of D and Dl to obtain the retrieval result.
Further, a convolutional neural network is adopted to predict the label probabilities of images, and Softmax processing gives the set of probability vectors Pc = {p1c, p2c, …, pnc, …, pNc} of the N samples each belonging to class c; Pc is taken as the pseudo label, and the probability pnc that sample n belongs to class c is expressed as:

pnc = exp(Vnc) / Σ_{k=1}^{K} exp(Vnk)

wherein Vnc represents the probability vector that sample n belongs to class c; Vnk represents the probability vector of sample n over the total number of label categories; K represents the total number of categories contained in the label; N represents the total number of samples; n represents the n-th sample; pnc represents the probability that sample n belongs to class c.
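In code, the pseudo-label computation is a row-wise Softmax; the numpy sketch below assumes `logits` holds the per-class scores Vnc of the N database samples.

```python
import numpy as np

def pseudo_labels(logits):
    """pnc = exp(Vnc) / sum_k exp(Vnk), applied row-wise over K classes."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

P = pseudo_labels(np.random.randn(5, 3))   # 5 samples, K = 3 classes
print(P.sum(axis=1))                       # each row sums to 1
```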
Further, the average value dm of the elements in the distance vector is weighted according to the relation between the pseudo label and the input label to obtain the label-weighted distance value Dl. The maximum value Max(pn) of the pseudo label of sample n gives the label class to which the sample belongs; if Max(pn) > 0.8, sample n is marked as a credible sample, otherwise as an untrusted sample. If Max(pn) > 0.8 and the class is the same as the input label, sample n is a credible positive sample; if Max(pn) > 0.8 and the class differs from the input label, sample n is a credible negative sample; otherwise it is an untrusted sample and its distance is not weighted. The label-weighted distance value Dl is calculated as:

Dl(n) = ωp·pn·dm, if sample n is a credible positive sample; Dl(n) = ωn·pn·dm, if sample n is a credible negative sample; Dl(n) = 0, otherwise

wherein dm represents the average of the elements in the distance vector; dn represents the Euclidean distance between sample n and the sketch vector; N represents the total number of samples; ωp < 0 is the weight for credible positive sample labels; ωn > 0 is the weight for credible negative sample labels; pn is the pseudo-label probability value of sample n.
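A minimal numpy sketch of this weighting rule follows, assuming the sign convention above (ωp < 0 pulls credible positive samples closer in the final ranking, ωn > 0 pushes credible negative samples away); the concrete weight values are illustrative.

```python
import numpy as np

def label_weighted_distance(P, input_label, d_m, w_p=-0.5, w_n=0.5):
    """Dl[n] weights the reference distance d_m by the pseudo label of n."""
    D_l = np.zeros(len(P))
    for n, p in enumerate(P):            # p: class probabilities of sample n
        if p.max() <= 0.8:               # untrusted sample: not weighted
            continue
        if p.argmax() == input_label:    # credible positive sample
            D_l[n] = w_p * p.max() * d_m # w_p < 0: lowers the final distance
        else:                            # credible negative sample
            D_l[n] = w_n * p.max() * d_m # w_n > 0: raises the final distance
    return D_l
```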
Further, an attenuation coefficient is assigned to the label-weighted distance, and the images in the database are sorted according to the sum of D and Dl, with the expression:

Dfinal = D + ω·Dl

wherein D is the distance vector between the sketch branch and all images; Dl is the label-weighted distance; Dfinal is the distance by which the final sorting is performed; ω is the label-weighted distance weight, which gradually decreases as i increases, i.e., as the input sketch becomes more complete.
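Putting the retrieval-time pieces together, the following self-contained numpy sketch mirrors steps one to five over a toy database; the exponential decay schedule for ω (the invention only requires ω to decrease as step i grows), the gallery embeddings, and the stored pseudo labels are stand-in assumptions.

```python
import numpy as np

def query(v_sketch, V_db, P, input_label, step_i,
          w_p=-0.5, w_n=0.5, w0=1.0, decay=0.9, k=5):
    D = np.linalg.norm(V_db - v_sketch, axis=1)   # Euclidean distances dn
    d_m = D.mean()                                # label distance reference
    conf, cls = P.max(axis=1), P.argmax(axis=1)   # pseudo-label confidence
    D_l = np.where(conf > 0.8,                    # weight credible samples
                   np.where(cls == input_label, w_p, w_n) * conf * d_m,
                   0.0)
    omega = w0 * decay ** step_i                  # omega decays as i grows
    return np.argsort(D + omega * D_l)[:k]        # top-k image indices

rng = np.random.default_rng(0)
V_db = rng.normal(size=(100, 128))                # stand-in embeddings Vp
P = rng.dirichlet(np.ones(10), size=100)          # stand-in pseudo labels
print(query(rng.normal(size=128), V_db, P, input_label=3, step_i=2))
```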
The invention fuses image label information into sketch retrieval for early-stage queries: with few strokes in the early stage, images are retrieved according to the extended label set of the target image, and the sketch style is fused into the sketch retrieval framework, so the target image can be retrieved with the fewest sketch strokes, reducing the early retrieval time of the hand-drawn sketch and improving retrieval efficiency.
Drawings
FIG. 1 is a diagram of a baseline model of the present invention;
FIG. 2 is a diagram of a deep neural network search framework model according to the present invention;
FIG. 3 is a schematic diagram of the sketch branch rendering process and picture label classification.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A hand-drawn image real-time retrieval method fusing image label information, as shown in figs. 1-2, comprises:
acquiring the complete sketch of the target image and the label information of all target images, the label information of all target images forming the extended label set; then rendering the complete sketch into N pictures according to the stroke order of drawing, the N rendered pictures forming a sketch branch, each picture in the sketch branch containing the first through n-th strokes of the complete sketch (1 ≤ n ≤ N), so that every picture has a different number of strokes; in ascending order of stroke count, a sketch branch is S = {s1, s2, …, sn, …, sN}, where sn represents the picture containing the first through n-th strokes.
As shown in fig. 3, the QMUL-Shoe-V2 and QMUL-Chair-V2 data sets are selected from the image retrieval data sets for fine-grained sketch retrieval (FG-SBIR) as the training data sets of the present model; one image is selected from each of QMUL-Shoe-V2 and QMUL-Chair-V2 as a target image, the complete sketch and label information of the image are obtained to form the extended label set, and the sketch branch of the target image is obtained by rendering according to the stroke order of the picture.
Specifically, as shown in the label information of fig. 3(b), volunteers with different drawing backgrounds were recruited and asked to manually draw complete sketches from the target images. According to the picture information in the two data sets, corresponding label information is attached to each picture, the content of the label information being classified manually.
Specifically, as shown in the hand-drawn sketching process of fig. 3(a), a complete sketch is rendered into N pictures according to the completeness of the sketch; the N rendered pictures are the sketch branch, and each picture in the sketch branch contains the first through n-th strokes of the complete sketch. For example: the first picture in the sketch branch contains only the first stroke of the complete sketch, the second picture contains the first and second strokes, the third picture contains the first, second and third strokes, and so on.
A plurality of target images acquired from the QMUL-Shoe-V2 and QMUL-Chair-V2 data sets, together with the label information of the complete-sketch image sets corresponding to the target images, form the extended label set and constitute the training set. During model training, the label information is used to train the label branch of the images; during retrieval, the label predicted by the label branch for an image is compared with the input label to assist the search. The hand-drawn sketch and the label information of the target image are input into the trained improved neural network model, and the target image is retrieved in real time;
the training process of the improved neural network model comprises the following steps:
S1: constructing a training set, wherein the training set comprises an image set consisting of a plurality of images with their corresponding complete sketches, and an extended label set corresponding to the images, the extended label set of an image consisting of all label information of that image;
S2: in each training step, selecting one image in the image set as the target image, and training the f1, f2, f3 branches of the neural network model with the hand-drawn sketch corresponding to that image; fixing the f1, f2 parameters after training; once training is complete, extracting the embedded vectors of all target images through f1, f2, f3;
S3: inputting the target images in the image set into the trained f1 to obtain feature maps of the target images, inputting the feature maps into fex to predict the labels of the target images, and training fex with a cross-entropy loss function according to the label information in the extended label set; fixing the parameters after training;
S4: rendering the complete sketch of each image in the image set into a plurality of sketches according to the stroke order of drawing, the rendered sketches forming the sketch branch set of the image set, and extracting the embedded vectors of the sketch branches through f1, f2, f3;
S5: calculating the error between the embedded vector of each picture in the sketch branch and the embedded vector of the target image with a triplet loss function, back-propagating the error with the goal of approaching the target image and moving away from non-target images, and adjusting the parameters of f3 in the model;
S6: obtaining the sketch branch of the next target image, and repeating steps S4-S6 until the model reaches the upper limit of training iterations.
In step S3 of the training process of the improved neural network model, training fex with a cross-entropy loss function according to the label information in the extended label set comprises: setting labels L = {l1, l2, …, ln, …, lN} for the images in the training set to train the label extraction layer fex, the cross-entropy Loss expression being as follows:

Loss = -(1/N)·Σ_{n=1}^{N} Σ_{c=1}^{K} lnc·log(pnc)

wherein K represents the total number of categories contained in the label; N represents the total number of samples; pnc represents the probability that sample n belongs to class c; lnc represents the correct probability label of class c for sample n.
In step S5 of the training process of the improved neural network model, a triplet Loss function is adopted to calculate the error between the embedded vector of each picture in the sketch branch and the embedded vector of the target image; the expression of the triplet Loss is:

Loss = max(d(VSi, Vp) − d(VSi, Vn) + α, 0)

wherein VSi represents the embedded vector of the i-th picture in the sketch branch; Vp represents the embedded vector of the target image; Vn represents the embedded vector of a random image in the image set other than the target image; α is a constant; d is the Euclidean distance.
Preferably, each target image is passed through f1 and its feature map through fex to predict the label probabilities of the target image; the label probabilities are processed with Softmax to obtain a pseudo label, and the pseudo label is stored in the database;
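The offline side of this step can be sketched as below; f1 and f_ex are assumed to be torch modules as in the earlier sketches, and the resulting matrix P is what the retrieval stage reads from the database.

```python
import torch

@torch.no_grad()
def precompute_pseudo_labels(images, f1, f_ex):
    """Compute and return the Softmax pseudo labels of all gallery images."""
    logits = f_ex(f1(images))            # per-class scores Vnc from f1 + f_ex
    return torch.softmax(logits, dim=1)  # rows pn sum to 1; store in database
```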
further, label probability of a convolutional neural network prediction image is adopted, and a probability vector set P of N samples respectively belonging to the category c is obtained through Softmax processingc={p1c,p2c,…,pnc,…,pNcH, mixing PcAs pseudo label, the probability p that a sample n belongs to class cncExpressed as:
Figure BDA0003599166670000081
wherein, VncA probability vector representing that sample n belongs to class c; vnkLabel class representing sample nA probability vector of the total; k represents the total number of categories contained in the label; n represents the total number of samples; n represents the nth sample; p is a radical ofncRepresenting the probability that sample n belongs to class c;
further, the input sketch passes through an image distance network f1、f2、f3Obtaining a sketch embedding vector V of the step iSiCalculating VSiWith the embedded vector V of each image in the databasepThe Euclidean distance of (D), obtain distance vector D ═ D1,d2,…,dn,…,dN}; taking the average value D of DmSelecting a probability value pseudo label P corresponding to the input label category processed by Softmax stored in a database as a label distance reference value according to the input labelc={p1c,p2c,…,pnc,…,pNcH, for distance dmWeighting to obtain a label weighted distance value DlMax (p), the maximum value of the pseudo label of the sample nn) For the label class to which the sample belongs, if Max (p)n)>0.8, marking the sample n as a credible sample, otherwise, marking the sample n as an untrusted sample; if the pseudo tag Max (p)n)>0.8 and the same as the input label, wherein the sample n is a credible positive sample; if the pseudo label Max (p)n)>0.8 and different from the input label, the sample n is a credible negative sample; otherwise, the distance is an untrusted sample, and the distance is not weighted; meanwhile, an attenuation coefficient is given to the weighted distance, so that the influence of the label on the retrieval result is reduced along with the increase of the steps, and finally the label is obtained according to D and DlThe sum sequences the images in the database, compares the label information of the pseudo label in the database with the label information of the target image, and obtains a retrieval result; the expression is as follows:
Figure BDA0003599166670000082
Figure BDA0003599166670000083
Dfinal=D+ω·Dl
wherein, ω is a label weighted distance weight, and when i is increased, i.e. the input sketch is more complete, ω is gradually decreased; omegap<0,ωpRepresenting confidence negative sample label weighted weights, ωn>0,ωnRepresenting a trusted positive sample label weighted weight; dnRepresents the average of the elements in the distance vector; d is a distance vector between the sketch branch and all the images; dlWeighting the label distance; dfinalThe distance according to which the final sorting is based.
When no picture of the commodity exists and the commodity is difficult to describe in words, the user can hand-draw a sketch of the commodity on a touch-screen device from its mental image, and can simultaneously enter the characteristics of the commodity to be retrieved (color, height, shape, etc.) as part of the query. The commodity sketch is rendered into sketch branches and input into the trained neural network model; through the retrieval of the sketch branch and of the label branch, the model returns the k images most similar to the commodity sketch, improving retrieval efficiency when stroke information is scarce.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A hand-drawn image real-time retrieval method fusing image label information is characterized by comprising the following steps:
inputting a hand-drawn sketch and label information of a target image into the trained improved neural network model, and retrieving in real time to obtain a retrieval result;
the improved neural network model comprises1、f2、f3And fex,f1To pre-train the network, f2For the layer of attention, f3To lower the dimension layer, fexA label extraction layer;
the training process of the improved neural network model comprises the following steps:
S1: constructing a training set, wherein the training set comprises an image set consisting of a plurality of images with their corresponding complete sketches, and an extended label set corresponding to the images, the extended label set of an image consisting of all label information of that image;
S2: in each training step, selecting one image in the image set as the target image, and training the f1, f2, f3 branches of the neural network model with the hand-drawn sketch corresponding to that image; fixing the f1, f2 parameters after training; once training is complete, extracting the embedded vectors of all target images through f1, f2, f3;
S3: inputting the target images in the image set into the trained f1 to obtain feature maps of the target images, inputting the feature maps into fex to predict the labels of the target images, and training fex with a cross-entropy loss function according to the label information in the extended label set; fixing the parameters after training;
S4: rendering the complete sketch of each image in the image set into a plurality of sketches according to the stroke order of drawing, the rendered sketches forming the sketch branch set of the image set, and extracting the embedded vectors of the sketch branches through f1, f2, f3;
S5: calculating the error between the embedded vector of each picture in the sketch branch and the embedded vector of the target image with a triplet loss function, back-propagating the error with the goal of approaching the target image and moving away from non-target images, and adjusting the parameters of f3 in the model;
S6: obtaining the sketch branch of the next target image, and repeating steps S4-S6 until the model reaches the upper limit of training iterations.
2. The hand-drawn image real-time retrieval method fusing image label information according to claim 1, wherein labels L = {l1, l2, …, ln, …, lN} are set for the images in the training set to train the label extraction layer fex, the cross-entropy Loss expression being as follows:

Loss = -(1/N)·Σ_{n=1}^{N} Σ_{c=1}^{K} lnc·log(pnc)

wherein K represents the total number of categories contained in the label; N represents the total number of samples; n represents the n-th sample; pnc represents the probability that sample n belongs to class c; lnc represents the correct probability label of class c for sample n.
3. The hand-drawn image real-time retrieval method fusing image label information according to claim 1, wherein a triplet Loss function is adopted to calculate the error between the embedded vector of each picture in the sketch branch and the embedded vector of the target image, the expression of the triplet Loss being:

Loss = max(d(VSi, Vp) − d(VSi, Vn) + α, 0)

wherein VSi represents the embedded vector of the i-th picture in the sketch branch; Vp represents the embedded vector of the target image; Vn represents the embedded vector of a random image in the image set other than the target image; α is a constant; d is the Euclidean distance.
4. The hand-drawn image real-time retrieval method fusing image label information according to claim 1, wherein inputting the hand-drawn sketch and the label information of the target image, retrieving in real time and obtaining the final retrieval result comprises the following steps:
Step one: the sketch entered by the user passes through the image distance network f1, f2, f3 to obtain the sketch embedded vector VSi of step i;
Step two: the Euclidean distance between VSi and the embedded vector Vp of each image in the database is calculated to obtain the distance vector D = {d1, d2, …, dn, …, dN};
Step three: the average value of the elements in the distance vector is calculated, the feature map output by f1 is input into fex to predict the label probabilities of the input sketch, and the label probabilities are processed with Softmax to obtain the pseudo label;
Step four: according to the relation between the pseudo label and the input label, the average value dm of the elements in the distance vector is weighted to obtain the label-weighted distance value Dl;
Step five: an attenuation coefficient is assigned to the label-weighted distance, and the images in the database are sorted according to the sum of D and Dl to obtain the retrieval result.
5. The hand-drawn image real-time retrieval method fusing image label information according to claim 4, wherein a convolutional neural network is adopted to predict the label probabilities of images, and Softmax processing gives the set of probability vectors Pc = {p1c, p2c, …, pnc, …, pNc} of the N samples each belonging to class c; Pc is taken as the pseudo label, and the probability pnc that sample n belongs to class c is expressed as:

pnc = exp(Vnc) / Σ_{k=1}^{K} exp(Vnk)

wherein Vnc represents the probability vector that sample n belongs to class c; Vnk represents the probability vector of sample n over the total number of label categories; K represents the total number of categories contained in the label; N represents the total number of samples; n represents the n-th sample; pnc represents the probability that sample n belongs to class c.
6. The hand-drawn image real-time retrieval method fusing image label information according to claim 4, wherein the average value dm of the elements in the distance vector is weighted according to the relation between the pseudo label and the input label to obtain the label-weighted distance value Dl; the maximum value Max(pn) of the pseudo label of sample n gives the label class to which the sample belongs; if Max(pn) > 0.8, sample n is marked as a credible sample, otherwise as an untrusted sample; if Max(pn) > 0.8 and the class is the same as the input label, sample n is a credible positive sample; if Max(pn) > 0.8 and the class differs from the input label, sample n is a credible negative sample; otherwise it is an untrusted sample and its distance is not weighted; the label-weighted distance value Dl is calculated as:

Dl(n) = ωp·pn·dm, if sample n is a credible positive sample; Dl(n) = ωn·pn·dm, if sample n is a credible negative sample; Dl(n) = 0, otherwise

wherein dm represents the average of the elements in the distance vector; dn represents the Euclidean distance between sample n and the sketch vector; N represents the total number of samples; Dl represents the label-weighted distance value; ωp < 0 is the weight for credible positive sample labels; ωn > 0 is the weight for credible negative sample labels; pn is the pseudo-label probability value of sample n.
7. The hand-drawn image real-time retrieval method fusing image label information according to claim 4, wherein an attenuation coefficient is assigned to the label-weighted distance, and the images in the database are sorted according to the sum of D and Dl, with the expression:

Dfinal = D + ω·Dl

wherein D is the distance vector between the sketch branch and all images; Dl is the label-weighted distance; Dfinal is the distance by which the final sorting is performed; ω is the label-weighted distance weight, which gradually decreases as i increases, i.e., as the input sketch becomes more complete.

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Title
CN116310425A * | 2023-05-24 | 2023-06-23 | Fine-grained image retrieval method, system, equipment and storage medium
CN116310425B * | 2023-05-24 | 2023-09-26 | Fine-grained image retrieval method, system, equipment and storage medium

Similar Documents

Publication | Publication date | Title
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN111191732B (en) Target detection method based on full-automatic learning
CN110598029B (en) Fine-grained image classification method based on attention transfer mechanism
US20220415027A1 (en) Method for re-recognizing object image based on multi-feature information capture and correlation analysis
CN105701502B (en) Automatic image annotation method based on Monte Carlo data equalization
CN107683469A (en) A kind of product classification method and device based on deep learning
CN110633708A (en) Deep network significance detection method based on global model and local optimization
CN112348036A (en) Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN112668579A (en) Weak supervision semantic segmentation method based on self-adaptive affinity and class distribution
CN111753828A (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN112115291B (en) Three-dimensional indoor model retrieval method based on deep learning
Rad et al. Image annotation using multi-view non-negative matrix factorization with different number of basis vectors
CN111061904A (en) Local picture rapid detection method based on image content identification
CN111738113A (en) Road extraction method of high-resolution remote sensing image based on double-attention machine system and semantic constraint
CN110287952A (en) A kind of recognition methods and system for tieing up sonagram piece character
CN112347284A (en) Combined trademark image retrieval method
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN111340034A (en) Text detection and identification method and system for natural scene
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN115292532B (en) Remote sensing image domain adaptive retrieval method based on pseudo tag consistency learning
CN116610778A (en) Bidirectional image-text matching method based on cross-modal global and local attention mechanism
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism
CN112819837A (en) Semantic segmentation method based on multi-source heterogeneous remote sensing image
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination