CN112256904A - Image retrieval method based on visual description sentences - Google Patents
Image retrieval method based on visual description sentences
- Publication number
- CN112256904A (application CN202010998165.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- visual
- map
- statement
- reward
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses an image retrieval method based on visual description sentences, which comprises the following steps: based on a graph convolutional deep learning network, an information transfer scheme over the nodes and edges of a visual knowledge graph representation is constructed to aggregate and update the features of each semantic unit; the aggregated and updated semantic unit features in the graph are encoded by a multi-layer long short-term memory (LSTM) network combined with an attention mechanism to generate image description sentences; within a reinforcement learning framework, reward-penalty functions based on the image description sentences are designed from the CIDEr score and graph similarity, and are used for feedback regulation and optimization of the "image-graph" process, the "graph-sentence" process, and the overall "image-graph-sentence" process, so that finer-grained visual description sentences of the image are obtained for retrieval, and the target images corresponding to a query image are output. The invention improves the feasibility of text-based image retrieval on large-scale data sets.
Description
Technical Field
The invention relates to the field of image retrieval, in particular to an image retrieval method based on visual description sentences.
Background
In recent years, image retrieval has been a research focus in computer vision, with content-based image retrieval methods the most popular [1][2]. However, content-based methods have mainly relied on low-level visual features of images, such as color, shape, and texture [3][4][5]. Such features cannot capture the high-level semantic information of an image and do not match the way people usually judge image similarity, namely by semantics. As a result, most content-based image retrieval systems cannot fully reflect or match the query intent, and a semantic gap remains between low-level features and high-level understanding. To narrow this gap, matching the visual elements of an image in natural-language form has attracted researchers' attention [6][7], and the visual description sentences of an image play a key role in this approach. A visual description sentence expands the image representation from a small set of category labels or keywords into a detailed sentence; it contains richer high-level semantic features of the image and can improve retrieval precision through longer, more targeted queries. However, efficiently and accurately generating multiple visual description sentences for an image remains a major challenge: it requires not only deep understanding and modeling of the image content, but also processing of the natural description language (words, phrases, sentences, and so on) to match it to the corresponding image content. Recently, many researchers have applied visual knowledge-graph theory to generating visual description sentences for images [8][9]. However, most of this work stays at the feed-forward deep-modeling stage, i.e., a unidirectional process that first encodes "images" into "graphs" and then decodes "graphs" into "sentences".
Because image description is subjective, a visual knowledge graph can often be parsed into many different description sentences, so a feed-forward deep-modeling algorithm alone cannot meet the requirement of diverse visual parsing [10].
Disclosure of Invention
The invention provides an image retrieval method based on visual description sentences. Unlike prior methods that retrieve images using only low-level visual features, the invention performs retrieval with generated visual description sentences of the image. This effectively narrows the semantic gap in image retrieval, better supports fine-grained and complex queries, and automatically generates the corresponding image descriptions, avoiding the shortcomings of manual labeling and improving the feasibility of text-based image retrieval on large-scale data sets. The method is described in detail as follows:
an image retrieval method based on visual descriptive sentences, the method comprising the steps of:
based on a graph convolutional deep learning network, constructing an information transfer scheme over the nodes and edges of the visual knowledge graph representation, so as to aggregate and update the features of each semantic unit;
encoding the aggregated and updated semantic unit features in the graph with a multi-layer long short-term memory (LSTM) network combined with an attention mechanism, to generate image description sentences;
within a reinforcement learning framework, designing reward-penalty functions based on the image description sentences from the CIDEr score and graph similarity, for feedback regulation and optimization of the "image-graph" process, the "graph-sentence" process, and the overall "image-graph-sentence" process; obtaining finer-grained visual description sentences of the image for retrieval; and outputting the target images corresponding to the query image.
The reward-penalty functions are specifically:

r_k(ĉ) = ω_c · CIDEr(ĉ) + ω_s · s_k,  k = 1, 2, 3

wherein ω_c and ω_s are trainable fusion weights; ĉ denotes a predicted description sentence of the image and CIDEr(ĉ) its CIDEr score; s_1, s_2, and s_3 are three similarity scores characterizing graph similarity; r_1, r_2, and r_3 are the three reward-penalty functions, where r_1 optimizes the overall "image-graph-sentence" process, r_2 separately optimizes the "image-graph" process, and r_3 separately optimizes the "graph-sentence" process.
Further, the method further comprises:
taking the reward-penalty function as the reward mechanism and updating the network parameters with the policy gradient of reinforcement learning; the loss function L_RL(θ) and its gradient with respect to the parameters θ are computed as:

L_RL(θ) = −E_{ĉ∼p_θ}[r(ĉ)],  ∇_θ L_RL(θ) ≈ −r(ĉ) ∇_θ log p_θ(ĉ)

wherein E_{ĉ∼p_θ}[r(ĉ)] is the expected-reward objective, r(·) is the reward-penalty function defined above, and p_θ(·) is the sentence probability given by the model.
Wherein the method further comprises: detecting visual entities and the visual relations between them from the input image, and constructing the visual knowledge graph representation corresponding to the image.
The technical scheme provided by the invention has the beneficial effects that:
1. The invention retrieves with the visual description sentences of images, uses a visual knowledge graph to better represent image semantics, introduces a feedback regulation mechanism, and uses reinforcement learning to improve the accuracy and diversity of the description sentences, effectively narrowing the semantic gap in image retrieval and improving retrieval efficiency and precision;
2. The method automatically generates the corresponding image descriptions without manual annotation, avoiding the shortcomings of manual labeling and improving the feasibility of text-based image retrieval on large-scale data sets.
Drawings
FIG. 1 is a flow chart of the image retrieval method based on visual description sentences;
FIG. 2 is a schematic diagram of the visual knowledge graph of an image;
FIG. 3 is a diagram of the two-layer long short-term memory (LSTM) network encoding.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
To solve the problems described in the background, note that cognitive neuroscience has shown that the human visual system is formed by a large number of feed-forward and feedback connections, whereas content-based image retrieval usually represents images with low-level visual features, cannot fully reflect or match the query intent, and leaves a semantic gap in retrieval. The invention therefore introduces a feedback regulation mechanism into the graph-based description model and jointly explores feedback-style deep-modeling optimization of the "sentence-to-graph" process, the "graph-to-image" process, and the overall process. Performing complex image retrieval with the generated visual description sentences can effectively narrow the semantic gap, achieve accurate and diverse visual description sentence generation, and, applied to image retrieval, improve retrieval efficiency and accuracy while better supporting fine-grained and complex queries.
Example 1
An image retrieval method based on visual descriptive sentences, referring to fig. 1, the method comprises the following steps:
step 101: detecting visual entities and visual relations from the input image, and constructing the visual knowledge graph representation corresponding to the image;
step 102: based on a graph convolutional deep learning network, constructing an information transfer scheme over the nodes and edges of the visual knowledge graph representation, so as to aggregate and update the features of each semantic unit;
step 103: encoding the aggregated and updated semantic unit features in the graph with a multi-layer long short-term memory (LSTM) network combined with an attention mechanism, to generate image description sentences;
step 104: within a reinforcement learning framework, designing three reward-penalty functions based on the image description sentences from the CIDEr score and graph similarity, used respectively for feedback regulation and optimization of the "image-graph" process, the "graph-sentence" process, and the overall "image-graph-sentence" process, to obtain finer-grained visual description sentences of the image;
step 105: retrieving with the fine-grained visual description sentences to obtain the target images corresponding to the query image.
Example 2
The scheme of example 1 is further described below with reference to specific calculation formulas and examples, which are described in detail below:
201: the knowledge graph is represented as the tuple G = (N, E), where N and E are the sets of nodes and edges, respectively;
N contains three types of nodes: entity nodes o, attribute nodes a, and relation nodes r. To obtain the node representations, entities are first detected and classified with the Faster R-CNN (faster region-based convolutional neural network) object detector. Faster R-CNN consists of convolutional layers, an RPN (region proposal network) layer, a RoI Pooling (region-of-interest pooling) layer, and a classification-and-regression layer. Features of the input image are extracted by the convolutional layers and fed into the RPN, which is trained to generate candidate region boxes; the region boxes are combined with the convolutional features, and after RoI Pooling, box classification and position regression in the classification-and-regression layer yield the final object class and detection box.
The invention uses a pre-trained Faster R-CNN to select at least 10 and at most 100 entities per image, extracts entity features with RoI Pooling, computes the attribute classification of each visual entity with a fully-connected layer and a Softmax function, and computes the relation classification between visual entities with the MOTIFS model [11], obtaining the feature representations u_o, u_a, and u_r of the entity, attribute, and relation nodes. With the node information known, o_i denotes the i-th entity, r_ij the relation between entities o_i and o_j, and a_{i,l} the l-th attribute of entity o_i. The edges in E are defined as follows: if entity o_i has attribute a_{i,l}, a directed edge from o_i to a_{i,l} is created; if the relation triple <o_i - r_ij - o_j> exists, two directed edges, from o_i to r_ij and from r_ij to o_j, are created.
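The edge-construction rules above can be sketched as follows. The detector outputs (entity labels, attributes, relation triples) are illustrative stand-ins for Faster R-CNN / MOTIFS results, and `build_graph` is a hypothetical helper, not the patent's code:

```python
# Hypothetical sketch of step 201: assembling the visual knowledge graph
# G = (N, E) from detector outputs. The labels below are illustrative
# placeholders, not the patent's data.

def build_graph(entities, attributes, relations):
    """entities: list of entity labels o_i (node index = list index).
    attributes: dict i -> list of attribute labels a_{i,l}.
    relations: list of triples (i, rel_label, j) meaning <o_i - r_ij - o_j>.
    Returns the node list N and directed edge list E (as index pairs)."""
    nodes = [("entity", o) for o in entities]
    edges = []
    # attribute edge: o_i -> a_{i,l}
    for i, attrs in attributes.items():
        for a in attrs:
            nodes.append(("attribute", a))
            edges.append((i, len(nodes) - 1))
    # relation triple <o_i - r_ij - o_j>: edges o_i -> r_ij and r_ij -> o_j
    for i, rel, j in relations:
        nodes.append(("relation", rel))
        r_idx = len(nodes) - 1
        edges.append((i, r_idx))
        edges.append((r_idx, j))
    return nodes, edges

nodes, edges = build_graph(
    ["dog", "frisbee"],
    {0: ["brown"]},
    [(0, "catches", 1)],
)
```

With these toy detections, the graph contains four nodes (two entities, one attribute, one relation) and the three directed edges the text prescribes.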
202: the proposed multi-modal graph convolutional network uses four spatial graph convolution functions f_r, f_s, f_a, and f_o to compute the complex semantic associations among the nodes in the graph. All four functions are two-layer fc-ReLU structures with independent parameters; they context-encode the node features of the graph to produce a semantic-context-aware aggregated feature representation V.
The aggregated representation V contains three types of embeddings (embedded vectors): the relation embedding v^r_ij of relation node r_ij, and the attribute embedding v^a_i and entity embedding v^o_i of entity node o_i.
Specifically, for a known relation triple <o_i - r_ij - o_j> in the graph, the relation embedding v^r_ij is computed as:

v^r_ij = f_r(u_oi, u_rij, u_oj)  (1)

wherein u_oi and u_oj are the feature-vector representations of entity nodes o_i and o_j, respectively, and u_rij is the feature-vector representation of the relation node r_ij between o_i and o_j.
Given all attributes a_{i,1}, ..., a_{i,N_ai} of entity node o_i in the graph, where N_ai is the number of attributes of o_i, the attribute embedding v^a_i is computed as:

v^a_i = (1 / N_ai) Σ_{l=1..N_ai} f_a(u_oi, u_ail)  (2)

wherein u_ail is the feature-vector representation of the corresponding attribute node of o_i.
In the graph, since node o_i can appear in a relation tuple either as the head entity ("subject") or as the tail entity ("object"), different functions are used to integrate the knowledge of all relation tuples containing o_i; the entity embedding v^o_i is computed as:

v^o_i = (1 / N_ri) [ Σ_{o_j ∈ sbj(o_i)} f_s(u_oi, u_oj, u_rij) + Σ_{o_k ∈ obj(o_i)} f_o(u_ok, u_oi, u_rki) ]  (3)

wherein, when node o_j ∈ sbj(o_i), o_j is the object and o_i the subject; when node o_k ∈ obj(o_i), o_k is the subject and o_i the object; N_ri is the number of relation triples containing o_i; sbj(o_i) is the set of head-entity nodes and obj(o_i) the set of tail-entity nodes; f_s and f_o are the feature-convolution transfer functions of the head and tail entity, respectively; u_oi, u_oj, and u_ok are the feature-vector representations of entity nodes o_i, o_j, and o_k, and u_rij and u_rki are the feature-vector representations of the relation nodes r_ij (between o_i and o_j) and r_ki (between o_k and o_i).
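A minimal numpy sketch of this aggregation step, assuming the equation reconstructions above: `make_fc` realizes the two-layer fc-ReLU structure the text describes, each f_* is applied to its concatenated inputs, and all weights and node features are random placeholders rather than trained values (the entity embedding v^o_i is built analogously with f_s and f_o):

```python
import numpy as np

# Sketch of step 202: relation and attribute embeddings via two-layer
# fc-ReLU graph convolution functions. Weights/features are random stand-ins.

rng = np.random.default_rng(0)
D = 8  # feature dimension (assumed)

def make_fc(in_dim, out_dim):
    W1 = rng.normal(size=(in_dim, out_dim))
    W2 = rng.normal(size=(out_dim, out_dim))
    def f(x):
        return np.maximum(x @ W1, 0) @ W2  # fc -> ReLU -> fc
    return f

f_r = make_fc(3 * D, D)  # relation embedding over [u_oi; u_rij; u_oj]
f_a = make_fc(2 * D, D)  # attribute embedding over [u_oi; u_ail]

def relation_embedding(u_oi, u_rij, u_oj):
    # v^r_ij = f_r(u_oi, u_rij, u_oj), inputs concatenated
    return f_r(np.concatenate([u_oi, u_rij, u_oj]))

def attribute_embedding(u_oi, attr_feats):
    # v^a_i = (1/N_ai) * sum_l f_a(u_oi, u_ail)
    return np.mean([f_a(np.concatenate([u_oi, u_a])) for u_a in attr_feats],
                   axis=0)

u_o1, u_o2, u_r12 = (rng.normal(size=D) for _ in range(3))
v_r = relation_embedding(u_o1, u_r12, u_o2)
v_a = attribute_embedding(u_o1, [rng.normal(size=D)])
```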
203: the following notation denotes the operation of a long short-term memory (LSTM) network over a single time step:

h_t = LSTM(x_t, h_{t-1})  (4)

wherein x_t is the input vector of the LSTM and h_t its output vector.
Specifically, at each time step t, the maximum context information is collected as the input of the first-layer LSTM by concatenating the embedding of the input word ω_t, the average-pooled feature v̄, and the previous output of the second-layer LSTM:

x^1_t = [W_Σ Π_t ; v̄ ; h^2_{t-1}]  (5)

wherein W_Σ is the word-embedding matrix for a vocabulary of size Σ, Π_t is the one-hot encoding of the input word ω_t, and v̄ is the result of average-pooling the aggregated feature representation V output by the multi-modal graph convolutional network in the previous step.
Further, at each time step t, a normalized attention distribution over the aggregated features V is generated from the output h^1_t of the first-layer LSTM:

a_{t,m} = ω_a^T tanh(W_v v_m + W_h h^1_t),  α_t = softmax(a_t)  (6)

wherein v_m is one of the three types of embeddings in V; W_v, W_h, and ω_a are learned parameters; and α_t is the resulting normalized attention distribution.
Based on the attention distribution, an attention-weighted sum over all embeddings in V gives the new feature v̂_t = Σ_m α_{t,m} v_m; v̂_t and h^1_t are concatenated as the input of the second-layer LSTM:

x^2_t = [v̂_t ; h^1_t]  (7)

Further, using the notation y_{1:T} for the word sequence (y_1, ..., y_T), the conditional distribution over possible output words at each time step t is given from the output h^2_t of the second-layer LSTM:

p(y_t | y_{1:t-1}) = softmax(W_p h^2_t + b_p)  (8)

wherein W_p and b_p are learned weights and biases. Finally, the distribution of the complete output sequence is computed as the product of the conditional distributions:

p(y_{1:T}) = Π_{t=1..T} p(y_t | y_{1:t-1})  (9)

wherein T is the total number of time steps.
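The attention step described above (scoring each graph embedding v_m against the first-layer LSTM output, normalizing with softmax, and taking the weighted sum that feeds the second-layer LSTM) can be sketched in numpy; all weights and hidden states here are random stand-ins, not a trained model:

```python
import numpy as np

# Sketch of the attention computation in step 203.
rng = np.random.default_rng(1)
M, D, H = 5, 8, 8            # number of embeddings, feature dim, hidden dim
V = rng.normal(size=(M, D))  # aggregated graph embeddings (stand-in)
h1 = rng.normal(size=H)      # first-layer LSTM output at step t (stand-in)
W_v = rng.normal(size=(D, H))
W_h = rng.normal(size=(H, H))
w_a = rng.normal(size=H)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

a_t = np.tanh(V @ W_v + h1 @ W_h) @ w_a  # unnormalized score per embedding
alpha_t = softmax(a_t)                   # normalized attention distribution
v_hat = alpha_t @ V                      # attention-weighted sum over V
x2 = np.concatenate([v_hat, h1])         # input to the second-layer LSTM
```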
204: first, the CIDEr score CIDEr(c_i, S_i) of the model's predicted sentence is computed, where c_i and S_i = {s_i1, ..., s_im} are, respectively, the candidate description sentence and the reference description sentences of image I_i;
Let the number of times an n-gram w_k appears in reference sentence s_ij be h_k(s_ij), and in the candidate sentence c_i be h_k(c_i). The TF-IDF weight of each n-gram w_k is computed as:

g_k(s_ij) = [ h_k(s_ij) / Σ_{w_l ∈ Ω} h_l(s_ij) ] · log( |I| / Σ_{I_p ∈ I} min(1, Σ_q h_k(s_pq)) )  (10)

wherein Ω is the set of all n-grams, I is the set of all images in the data set, w_l is any n-gram in Ω, h_l(s_ij) is the number of times w_l appears in reference sentence s_ij, I_p is any image in the data set, and h_k(s_pq) is the number of times w_k appears in the reference sentence s_pq of I_p.
For n-grams of length n, the CIDEr_n score is computed as the average cosine similarity between the candidate sentence c_i and the reference sentences S_i:

CIDEr_n(c_i, S_i) = (1/m) Σ_j [ g^n(c_i) · g^n(s_ij) ] / ( ||g^n(c_i)|| · ||g^n(s_ij)|| )  (11)

wherein m is the number of sentences in S_i, g^n(c_i) is the vector of TF-IDF weights of all n-grams of length n in the candidate sentence c_i, and g^n(s_ij) is the corresponding vector for reference sentence s_ij.
The scores over n-grams of different lengths are combined by a weighted sum to give the total CIDEr score, where ω_n is a trade-off parameter:

CIDEr(c_i, S_i) = Σ_n ω_n CIDEr_n(c_i, S_i)  (12)
further, for matching image-sentence pairs, the features of two different modalities of the image and the sentence are mapped to the same space by using a knowledge graph for comparison. Knowledge-graph characterization G of known images1For model prediction description sentences and real description sentences of the images, through natural language processing, a sentence parser of Stanford is used for constructing corresponding knowledge graph representation G2And G3. Performing a map G1、G2And G3Comparing the nodes between every two, and calculating the node similarity to represent the similarity between the maps according to the following formula:
wherein the content of the first and second substances,for node comparison function, sigma is sigmoid activation function, s1、s2And s3Are each G1And G2、G1And G3、G2And G3Normalized similarity score of (a).
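Since the node comparison function f and its exact normalization are not spelled out in the text, the following sketch assumes a dot-product comparison passed through the sigmoid and averaged over all node pairs of the two graphs:

```python
import numpy as np

# Hedged sketch of the pairwise graph comparison: dot-product node
# comparison, sigmoid activation, mean over all |N_u| * |N_v| node pairs.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_similarity(N_u, N_v):
    """N_u, N_v: (num_nodes, dim) arrays of node feature vectors."""
    scores = sigmoid(N_u @ N_v.T)  # sigma(f(n_i, n_j)) for every node pair
    return float(scores.mean())    # normalized over all pairs, in (0, 1)

rng = np.random.default_rng(2)
G1 = rng.normal(size=(4, 8))  # e.g. image graph nodes (stand-in features)
G2 = rng.normal(size=(5, 8))  # e.g. predicted-sentence graph nodes
s = graph_similarity(G1, G2)
```

The resulting score is symmetric in its two arguments and bounded in (0, 1), consistent with "normalized similarity score" in the text.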
The similarity scores are fused with the CIDEr score as follows:

r_k(ĉ) = ω_c · CIDEr(ĉ) + ω_s · s_k,  k = 1, 2, 3  (14)

wherein ω_c and ω_s are trainable fusion weights; ĉ denotes a predicted description sentence of the image and CIDEr(ĉ) its CIDEr score; s_1, s_2, and s_3 are the similarity scores above; r_1, r_2, and r_3 are the three reward-penalty functions, where r_1 optimizes the overall "image-graph-sentence" process, r_2 separately optimizes the "image-graph" process, and r_3 separately optimizes the "graph-sentence" process.
During optimization, the reward-penalty function serves as the reward mechanism, and the network parameters are updated with the policy gradient of reinforcement learning; the loss function L_RL(θ) and its gradient with respect to the parameters θ are computed as:

L_RL(θ) = −E_{ĉ∼p_θ}[r(ĉ)],  ∇_θ L_RL(θ) ≈ −r(ĉ) ∇_θ log p_θ(ĉ)  (15)

wherein E_{ĉ∼p_θ}[r(ĉ)] is the expected-reward objective, r(·) is the reward-penalty function defined above, and p_θ(·) is the sentence probability given by the model.
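A sketch of the fused reward and the REINFORCE-style update described above. The numeric values, the weights omega_c/omega_s, and the baseline b (a common variance-reduction choice, not stated in the text) are all assumptions:

```python
import numpy as np

# Sketch of the reward fusion and policy-gradient update in step 204.

def reward(cider_score, s_k, omega_c=1.0, omega_s=0.5):
    # r_k(c_hat) = omega_c * CIDEr(c_hat) + omega_s * s_k
    return omega_c * cider_score + omega_s * s_k

def reinforce_grad(r, grad_log_p, baseline=0.0):
    # grad_theta L_RL ~= -(r - b) * grad_theta log p_theta(c_hat)
    # (baseline b is an assumed variance-reduction term)
    return -(r - baseline) * grad_log_p

r1 = reward(cider_score=0.8, s_k=0.6)  # overall "image-graph-sentence" reward
g = reinforce_grad(r1, grad_log_p=np.array([0.1, -0.2]), baseline=0.5)
```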
205: let I = {i_1, i_2, ..., i_m} denote the image data set for the current retrieval, and let T_q and T_l denote the visual description sentences of the query image q and of any candidate image i_l ∈ I. First, Word2vec is used to encode T_q and T_l into the corresponding vector representations V_q = {e_q1, e_q2, ..., e_qN} and V_l = {e_l1, e_l2, ..., e_lM}, where e_qn and e_lm are the word embeddings of T_q and T_l, respectively. Using these encoded representations, the semantic similarity between the images is measured by the distance between the vectors; the smaller the distance, the more similar the images. Image retrieval thus becomes a distance metric over the embedding-vector representations of the images' visual description sentences:

sim(q, i_l) = ( 1 / (N · M) ) Σ_{n=1..N} Σ_{m=1..M} d(e_qn, e_lm)  (16)

wherein N and M are the numbers of elements in the vector representations V_q and V_l, respectively, e_qn ∈ V_q, d(·,·) is the vector distance, and sim(q, i_l) scores the similarity between the query image and a candidate image (smaller values indicate greater similarity).
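The retrieval step above can be sketched as follows. The 2-D "word vectors" are toy stand-ins for Word2vec embeddings, and the average-pairwise-distance metric follows the description (smaller distance means more similar):

```python
import numpy as np

# Sketch of step 205: score a query against candidates by the average
# pairwise distance between the word vectors of their description
# sentences, then rank candidates closest-first.

def sentence_distance(V_q, V_l):
    # average pairwise Euclidean distance over the N x M word-vector pairs
    diffs = V_q[:, None, :] - V_l[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).mean())

def rank_candidates(V_q, candidates):
    dists = [(name, sentence_distance(V_q, V_l)) for name, V_l in candidates]
    return sorted(dists, key=lambda t: t[1])  # smallest distance first

V_q = np.array([[1.0, 0.0], [0.0, 1.0]])  # query sentence word vectors (toy)
cands = [("img_a", np.array([[1.0, 0.1], [0.0, 0.9]])),
         ("img_b", np.array([[5.0, 5.0]]))]
ranking = rank_candidates(V_q, cands)
```

Here `img_a`, whose description vectors nearly match the query's, ranks ahead of the distant `img_b`.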
After computing the similarity between the query image and all candidate images in the data set with the formula above, the candidate images are ranked by similarity score, and the required number of target images closest to the query image are retrieved.
References:
[1] Patel T, Gandhi S. A survey on context based similarity techniques for image retrieval[C]//2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA). IEEE, 2017: 219-223.
[2] Khawandi S, Abdallah F, Ismail A. A survey on Image Indexing and Retrieval based on Content Based Image[C]//2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon). IEEE, 2019: 222-225.
[3] Mahamuni C V, Wagh N B. Study of CBIR methods for retrieval of digital images based on colour and texture extraction[C]//2017 International Conference on Computer Communication and Informatics (ICCCI). IEEE, 2017: 1-7.
[4] Zhou W, Li H, Tian Q. Recent advance in content-based image retrieval: A literature survey[J]. arXiv preprint arXiv:1706.06064, 2017.
[5] Narayan R, Reddy S C, Narayan L, et al. The Study of Approaches of Content Based Image Retrieval[J]. 2019.
[6] Wei X, Qi Y, Liu J, et al. Image retrieval by dense caption reasoning[C]//2017 IEEE Visual Communications and Image Processing (VCIP). IEEE, 2017: 1-4.
[7] Hoxha G, Melgani F, Demir B. Retrieving Images with Generated Textual Descriptions[C]//IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2019: 5812-5815.
[8] Li X, Jiang S. Know more say less: Image captioning based on scene graphs[J]. IEEE Transactions on Multimedia, 2019, 21(8): 2117-2130.
[9] Yang X, Tang K, Zhang H, et al. Auto-encoding scene graphs for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 10685-10694.
[10] Cao C. Computational modeling and application research of feedback mechanisms in deep convolutional neural networks[D]. University of Science and Technology of China, 2018.
[11] Zellers R, Yatskar M, Thomson S, Choi Y. Neural motifs: Scene graph parsing with global context[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 5831-5840.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (4)
1. An image retrieval method based on visual descriptive sentences, characterized in that the method comprises the following steps:
based on a graph convolutional deep learning network, constructing an information transfer scheme over the nodes and edges of the visual knowledge graph representation, so as to aggregate and update the features of each semantic unit;
encoding the aggregated and updated semantic unit features in the graph with a multi-layer long short-term memory (LSTM) network combined with an attention mechanism, to generate image description sentences;
within a reinforcement learning framework, designing reward-penalty functions based on the image description sentences from the CIDEr score and graph similarity, for feedback regulation and optimization of the "image-graph" process, the "graph-sentence" process, and the overall "image-graph-sentence" process; obtaining finer-grained visual description sentences of the image for retrieval; and outputting the target images corresponding to the query image.
2. The image retrieval method based on visual description sentences of claim 1, wherein the reward-penalty functions are specifically:

r_k(ĉ) = ω_c · CIDEr(ĉ) + ω_s · s_k,  k = 1, 2, 3

wherein ω_c and ω_s are trainable fusion weights; ĉ denotes a predicted description sentence of the image and CIDEr(ĉ) its CIDEr score; s_1, s_2, and s_3 are three similarity scores characterizing graph similarity; r_1, r_2, and r_3 are the three reward-penalty functions, where r_1 optimizes the overall "image-graph-sentence" process, r_2 separately optimizes the "image-graph" process, and r_3 separately optimizes the "graph-sentence" process.
3. The image retrieval method based on visual description sentences of claim 1 or 2, wherein the method further comprises:
taking the reward-penalty function as the reward mechanism and updating the network parameters with the policy gradient of reinforcement learning; the loss function L_RL(θ) and its gradient with respect to the parameters θ are computed as:

L_RL(θ) = −E_{ĉ∼p_θ}[r(ĉ)],  ∇_θ L_RL(θ) ≈ −r(ĉ) ∇_θ log p_θ(ĉ)
4. The image retrieval method based on visual description sentences of claim 1 or 2, wherein the method further comprises: detecting visual entities and the visual relations between them from the input image, and constructing the visual knowledge graph representation corresponding to the image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010998165.8A CN112256904A (en) | 2020-09-21 | 2020-09-21 | Image retrieval method based on visual description sentences |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010998165.8A CN112256904A (en) | 2020-09-21 | 2020-09-21 | Image retrieval method based on visual description sentences |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112256904A true CN112256904A (en) | 2021-01-22 |
Family
ID=74231454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010998165.8A Pending CN112256904A (en) | 2020-09-21 | 2020-09-21 | Image retrieval method based on visual description sentences |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256904A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112989088A (en) * | 2021-02-04 | 2021-06-18 | 西安交通大学 | Visual relation example learning method based on reinforcement learning |
CN114020779A (en) * | 2021-10-22 | 2022-02-08 | 上海卓辰信息科技有限公司 | Self-adaptive optimization retrieval performance database and data query method |
CN114677580A (en) * | 2022-05-27 | 2022-06-28 | 中国科学技术大学 | Image description method based on self-adaptive enhanced self-attention network |
CN117648444A (en) * | 2024-01-30 | 2024-03-05 | 广东省华南技术转移中心有限公司 | Patent clustering method and system based on graph convolution attribute aggregation |
CN117648444B (en) * | 2024-01-30 | 2024-04-30 | 广东省华南技术转移中心有限公司 | Patent clustering method and system based on graph convolution attribute aggregation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171283A (en) * | 2017-12-31 | 2018-06-15 | 厦门大学 | A kind of picture material automatic describing method based on structuring semantic embedding |
CN110674850A (en) * | 2019-09-03 | 2020-01-10 | 武汉大学 | Image description generation method based on attention mechanism |
CN110991515A (en) * | 2019-11-28 | 2020-04-10 | 广西师范大学 | Image description method fusing visual context |
CN111259724A (en) * | 2018-11-30 | 2020-06-09 | 塔塔顾问服务有限公司 | Method and system for extracting relevant information from image and computer program product |
CN111612103A (en) * | 2020-06-23 | 2020-09-01 | 中国人民解放军国防科技大学 | Image description generation method, system and medium combined with abstract semantic representation |
-
2020
- 2020-09-21 CN CN202010998165.8A patent/CN112256904A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171283A (en) * | 2017-12-31 | 2018-06-15 | 厦门大学 | A kind of picture material automatic describing method based on structuring semantic embedding |
CN111259724A (en) * | 2018-11-30 | 2020-06-09 | 塔塔顾问服务有限公司 | Method and system for extracting relevant information from image and computer program product |
CN110674850A (en) * | 2019-09-03 | 2020-01-10 | 武汉大学 | Image description generation method based on attention mechanism |
CN110991515A (en) * | 2019-11-28 | 2020-04-10 | 广西师范大学 | Image description method fusing visual context |
CN111612103A (en) * | 2020-06-23 | 2020-09-01 | 中国人民解放军国防科技大学 | Image description generation method, system and medium combined with abstract semantic representation |
Non-Patent Citations (1)
Title |
---|
孙晓领: "《面向特定领域图像的语义知识抽取方法研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112989088A (en) * | 2021-02-04 | 2021-06-18 | 西安交通大学 | Visual relation example learning method based on reinforcement learning |
CN112989088B (en) * | 2021-02-04 | 2023-03-21 | 西安交通大学 | Visual relation example learning method based on reinforcement learning |
CN114020779A (en) * | 2021-10-22 | 2022-02-08 | 上海卓辰信息科技有限公司 | Self-adaptive optimization retrieval performance database and data query method |
CN114020779B (en) * | 2021-10-22 | 2022-07-22 | 上海卓辰信息科技有限公司 | Self-adaptive optimization retrieval performance database and data query method |
CN114677580A (en) * | 2022-05-27 | 2022-06-28 | 中国科学技术大学 | Image description method based on self-adaptive enhanced self-attention network |
CN114677580B (en) * | 2022-05-27 | 2022-09-30 | 中国科学技术大学 | Image description method based on self-adaptive enhanced self-attention network |
CN117648444A (en) * | 2024-01-30 | 2024-03-05 | 广东省华南技术转移中心有限公司 | Patent clustering method and system based on graph convolution attribute aggregation |
CN117648444B (en) * | 2024-01-30 | 2024-04-30 | 广东省华南技术转移中心有限公司 | Patent clustering method and system based on graph convolution attribute aggregation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Self-constraining and attention-based hashing network for bit-scalable cross-modal retrieval | |
CN110969020B (en) | CNN and attention mechanism-based Chinese named entity identification method, system and medium | |
CN110598005B (en) | Public safety event-oriented multi-source heterogeneous data knowledge graph construction method | |
CN108038122B (en) | Trademark image retrieval method | |
CN104899253B (en) | Towards the society image across modality images-label degree of correlation learning method | |
CN106845411B (en) | Video description generation method based on deep learning and probability map model | |
Yin et al. | Region search based on hybrid convolutional neural network in optical remote sensing images | |
CN112256904A (en) | Image retrieval method based on visual description sentences | |
CN111881677A (en) | Address matching algorithm based on deep learning model | |
CN111291556A (en) | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item | |
CN109271539A (en) | A kind of image automatic annotation method and device based on deep learning | |
CN114398491A (en) | Semantic segmentation image entity relation reasoning method based on knowledge graph | |
CN111324765A (en) | Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation | |
CN108170823B (en) | Hand-drawn interactive three-dimensional model retrieval method based on high-level semantic attribute understanding | |
Zhang et al. | Hierarchical scene parsing by weakly supervised learning with image descriptions | |
CN114461890A (en) | Hierarchical multi-modal intellectual property search engine method and system | |
CN114612767A (en) | Scene graph-based image understanding and expressing method, system and storage medium | |
CN113868448A (en) | Fine-grained scene level sketch-based image retrieval method and system | |
Barman et al. | A graph-based approach for making consensus-based decisions in image search and person re-identification | |
Astolfi et al. | Syntactic pattern recognition in computer vision: A systematic review | |
CN116610778A (en) | Bidirectional image-text matching method based on cross-modal global and local attention mechanism | |
Li et al. | Caption generation from road images for traffic scene modeling | |
CN112699685A (en) | Named entity recognition method based on label-guided word fusion | |
CN114969343B (en) | Weak supervision text classification method combined with relative position information | |
Tian et al. | Scene graph generation by multi-level semantic tasks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210122 |