CN108509521B - Image retrieval method for automatically generating text index - Google Patents

Image retrieval method for automatically generating text index Download PDF

Info

Publication number
CN108509521B
Authority
CN
China
Prior art keywords
word
image
images
text
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810198490.9A
Other languages
Chinese (zh)
Other versions
CN108509521A (en
Inventor
吴良超
苏锦钿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201810198490.9A priority Critical patent/CN108509521B/en
Publication of CN108509521A publication Critical patent/CN108509521A/en
Application granted granted Critical
Publication of CN108509521B publication Critical patent/CN108509521B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image retrieval method that automatically generates text indexes, comprising the following steps: (1) train an automatic labeling model: extract image features through the CNN part of the model, feed the features and the image's descriptors to the RNN part of the model, and back-propagate with a cross-entropy loss as the objective function; (2) generate text indexes for images: using the trained automatic labeling model and its dictionary, generate for each unlabeled image a sequence of descriptors and a confidence for each word, normalize the confidences, and use the words together with their confidences as the image's text index to build an image retrieval index; (3) when a query keyword is not in the dictionary, look it up in a synonym lexicon to find a word in the dictionary with a similar meaning; (4) find the corresponding images in the image retrieval index by the keywords or their synonyms and return them in order of confidence, from high to low.

Description

Image retrieval method for automatically generating text index
Technical Field
The invention belongs to the technical field of information retrieval, in particular to text-based image retrieval, and relates to an image retrieval method that automatically generates text indexes for images.
Background
With the explosive growth of image data on the Internet, screening the required data out of this mass of data has become an urgent problem, so image retrieval is attracting the attention of more and more researchers.
Mainstream image retrieval can be divided into two categories according to how image content is described: Content-Based Image Retrieval (CBIR) and Text-Based Image Retrieval (TBIR). Text-based image retrieval describes the content of an image, such as the objects and scenes it contains, through text labels, forming keywords that describe each image; at query time the user provides query keywords according to his or her interest, the retrieval system finds the pictures labeled with the corresponding keywords, and finally the retrieved results are returned to the user.
Text-based image retrieval is intuitive, its results are highly interpretable, and its precision is relatively high. Its drawbacks, however, are also significant. First, the labeling process requires manual intervention; as the number of images on the Internet grows rapidly, completing text labels for them obviously consumes a great deal of manpower and money. Second, manual labeling usually records only the objects appearing in an image, i.e. some nouns, ignoring information such as their number, actions, and states; nor does it distinguish between the words in the result, i.e. it cannot tell which word covers more of the image's information. Finally, this method supports only exact retrieval: the user's query keyword must appear in the labels for a result to be returned, yet the same meaning can usually be expressed by many different terms, and the label data cannot cover them all, so content in the database that meets the need may not be retrieved.
Disclosure of Invention
The invention aims to solve the problems of current text-based image retrieval, namely that manual labeling is inefficient, that the labeling result cannot cover all the content of an image, and that words not appearing in the labels cannot be retrieved, and provides an image retrieval method that automatically generates text indexes.
The purpose of the invention can be achieved by adopting the following technical scheme:
an image retrieval method for automatically generating text indexes comprises the following steps:
s1, learning the automatic labeling model M, and the process is as follows:
s101, acquiring a labeled training data set and an unlabeled image data set, wherein the training data set comprises training images and text descriptions corresponding to the training images, and the image data set only comprises images and does not have text descriptions corresponding to the images;
s102, segmenting all text descriptions of the training data set to construct a dictionary D;
s103, extracting the characteristics of each image in the training data set through CNN, wherein the characteristics are one-dimensional vectors;
s104, for a certain image i in the training data set, segment the corresponding text description into words w_i1, w_i2, …, w_iL (L words in total); at the same time, the feature f_i of image i extracted by the CNN serves as the initial input of the hidden unit of the RNN, and the words w_i1, w_i2, …, w_iL are input in turn at each step of the recurrent network's cycle. The output of each step passes through a softmax layer to give a probability value for every word in the dictionary; denoting the word input at step t as w_it and the output probability distribution as P_it, the probability that this step outputs w_it is P_it(w_it). By maximum likelihood estimation, it is desirable to maximize the probability of equation (1),

P_i = \prod_{t=1}^{L} P_{it}(w_{it})   (1)

s105, for all images in the training data set, the probability of equation (2) must be maximized; taking this expression as the objective function, back-propagation updates the model parameters to obtain the automatic labeling model M, which consists of the CNN and the RNN;

P = \prod_{i=1}^{N} \prod_{t=1}^{L_i} P_{it}(w_{it})   (2)

where N is the number of training images and L_i is the length of the description of image i.
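In practice, maximizing the product in equations (1) and (2) is done by minimizing its negative logarithm, the cross-entropy loss mentioned in the abstract. A minimal sketch, assuming the per-step probabilities P_it(w_it) of the ground-truth words are available (the function name `sequence_nll` is illustrative, not part of the claimed method):

```python
import math

def sequence_nll(step_probs):
    """Negative log-likelihood (cross-entropy) of one caption, given the
    probability the model assigned to the ground-truth word at each step.
    Minimizing this sum is equivalent to maximizing the product in (1)."""
    return -sum(math.log(p) for p in step_probs)

# Hypothetical per-step probabilities for a 3-word caption:
loss = sequence_nll([0.5, 0.25, 0.8])  # -ln(0.5 * 0.25 * 0.8) = ln(10)
```

Summing this loss over every caption in the training set corresponds to equation (2), and back-propagating it updates the parameters of both the CNN and the RNN.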
s2, generating text indexes for all images through the automatic annotation model M;
for any image i in the image data set, the image feature f_i is first extracted by the CNN part of the automatic labeling model M and used as the initial input of the RNN part of M; the words are then generated in turn, where generating the word w'_it depends on the already generated w'_i1, w'_i2, …, w'_i(t-1). At each step the word with the largest output probability value is selected as the generated word, and that probability value is taken as the confidence of the generated word for the image, denoted z;

word generation stops when the end word is generated in the above steps or the length of the generated sequence reaches a preset threshold; for any image i described above, a descriptor sequence w'_i1, w'_i2, …, w'_il can thus be generated, together with the confidences z_i1, z_i2, …, z_il of the descriptors in the image, which are normalized by formula (3)

z'_{it} = z_{it} \Big/ \sum_{j=1}^{l} z_{ij}   (3)

the w'_i1, w'_i2, …, w'_il and z'_i1, z'_i2, …, z'_il above together constitute the text index of image i;
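One plausible reading of formula (3) is to divide each raw confidence by the sum over the image's descriptors, so that the normalized confidences of each image sum to 1 (a sketch under that assumption; it also assumes the raw confidences are positive):

```python
def normalize_confidences(z):
    """Normalize raw descriptor confidences z_i1..z_il so they sum to 1,
    one plausible reading of formula (3); assumes positive inputs."""
    total = sum(z)
    return [zi / total for zi in z]

# e.g. raw confidences 2, 1, 1 become 0.5, 0.25, 0.25
```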
s3, construct an image retrieval index from the text index of each image: for any word w_u in the dictionary D described above, find all images i_1, i_2, …, i_o described by that word and the confidences z'_u1, z'_u2, …, z'_uo of the word in those images, and sort the images by confidence from high to low; in this way a candidate image set ordered by confidence is generated for any word in dictionary D;
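Step S3 is a plain inverted index from words to confidence-ranked image lists. A sketch under an assumed data layout (`{image_id: [(word, confidence), ...]}`; the names are illustrative):

```python
from collections import defaultdict

def build_retrieval_index(text_indexes):
    """Invert per-image text indexes into {word: [(image_id, conf), ...]},
    with each posting list sorted by confidence, high to low."""
    index = defaultdict(list)
    for image_id, pairs in text_indexes.items():
        for word, conf in pairs:
            index[word].append((image_id, conf))
    for word in index:
        index[word].sort(key=lambda p: p[1], reverse=True)
    return dict(index)
```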
s4, build a synonym-lookup lexicon: obtain text data that needs no labeling from a network text data set and train on it with the word2vec algorithm to construct a lexicon DB, in which every word w ∈ DB has a corresponding word vector, so that the semantic similarity of any two words in DB can be computed; when a query keyword w_u is not present in the dictionary D described above, find via the lexicon DB the word w_v that is closest in meaning to w_u and appears in dictionary D, and retrieve the relevant images through w_v;
s5, receive query keywords and perform image retrieval: according to steps S3 and S4, a candidate image set sorted by confidence can be generated for any word in the lexicon DB; when there are several query keywords, merge the candidate image sets generated for each keyword, and for any candidate image i occurring more than once, sum all its confidences under the different keywords as its final confidence, removing the redundant occurrences so that it appears only once; then sort the candidate images by the summed confidence from high to low and select the top candidates as the returned result.
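The multi-keyword merge of step S5 (summing the confidences of an image that appears under several keywords, then re-ranking) can be sketched as follows, with illustrative names:

```python
def merge_candidates(candidate_sets):
    """candidate_sets: one [(image_id, confidence), ...] list per keyword.
    Sum the confidences of images appearing under several keywords, keep
    each image once, and rank by the summed confidence, high to low."""
    totals = {}
    for candidates in candidate_sets:
        for image_id, conf in candidates:
            totals[image_id] = totals.get(image_id, 0.0) + conf
    return sorted(totals.items(), key=lambda p: p[1], reverse=True)

# image "b" appears under both keywords, so its confidences are summed
ranked = merge_candidates([[("a", 0.5), ("b", 0.25)], [("b", 0.5)]])
```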
Further, in step S2, when the automatic labeling model M generates descriptors for an image, it simultaneously produces a confidence for each descriptor, indicating how accurately the descriptor describes the image; ranking by this confidence allows images more relevant to the keywords to be retrieved accurately.
Further, in steps S4 and S5, when the query keyword does not appear in dictionary D, a word vector is constructed for each word by building the lexicon DB; for any two word vectors v_e, v_u, the similarity of the two words is computed by formula (4), the word appearing in dictionary D whose meaning is closest to the query keyword is found, and the corresponding images are then retrieved,

\mathrm{sim}(v_e, v_u) = \frac{v_e \cdot v_u}{|v_e| \times |v_u|}   (4)

where v_e · v_u is the inner product of the two vectors and |v_e| × |v_u| the product of their lengths; the larger the value, the closer the two meanings.
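Formula (4) is the standard cosine similarity. A sketch of the lookup in steps S4/S5, with illustrative function names and toy two-dimensional vectors (real word2vec vectors have hundreds of dimensions):

```python
import math

def cosine_similarity(ve, vu):
    """Formula (4): inner product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(ve, vu))
    lengths = math.sqrt(sum(a * a for a in ve)) * math.sqrt(sum(b * b for b in vu))
    return dot / lengths

def nearest_in_dictionary(query_vec, dictionary_vecs):
    """Among words that DO appear in dictionary D (given with their vectors),
    return the one most similar to the out-of-dictionary query keyword."""
    return max(dictionary_vecs,
               key=lambda w: cosine_similarity(query_vec, dictionary_vecs[w]))
```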
Further, the CNN adopts ResNet, and the RNN adopts LSTM.
Compared with the prior art, the invention has the following advantages and effects:
1. according to the method, the automatic labeling model is learned, a plurality of description words can be automatically generated for the image, and when massive image data are faced, manual intervention can be effectively reduced, and required manpower and financial resources are greatly reduced.
2. The description words generated for the image in the invention can contain quantifier for describing the number of the objects, adjective for describing the state of the objects, verb for describing the action of the objects and the like besides the nouns for representing the objects, thereby more comprehensively covering the content of the image, and more accurately retrieving the image compared with the traditional processing mode of marking out nouns only by manual marking.
3. The method can process the query keywords which do not appear in the text of the image training set, train the similar meaning word query word bank by an unsupervised method, find the words with the similar meaning to the query keywords, and avoid the problem of low matching success rate of an accurate matching mode.
Drawings
FIG. 1 is a diagram of an automatic tagging model of the present invention;
FIG. 2 is a schematic diagram of text indexing of images generated by an automatic annotation model;
FIG. 3 is a schematic diagram of a process for finding a synonym for a query keyword from a synonym library;
FIG. 4 is a flowchart of image retrieval by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
The image retrieval method that automatically generates text indexes is mainly applied to the retrieval of Internet images, for example by search engines such as Google, Bing, and Baidu. The following are the implementation steps of an application of the invention:
step S1, learning the automatic labeling model M, which comprises the following steps:
step S101, acquire a labeled training data set and an unlabeled image data set from network image data. First a specific language is chosen, here Chinese as an example; the labeled training data set acquired is the AI Challenger image Chinese-description data set, which contains 300,000 labeled images, each with 5 descriptive sentences. The unlabeled image data set consists of pictures provided by ImageNet, 14,197,122 images in total; more pictures can also be captured by crawlers and similar means.
Step S102, segment the 1,500,000 description sentences in the training data set with a Chinese word-segmentation tool, count the distinct words that appear in them, and construct the dictionary D; the dictionary built here has size 8233.
Step S103, extract features of the images in the training data set through a convolutional neural network (hereinafter CNN); the specific CNN used here is ResNet, and a feature vector of dimension 2048 is extracted for each image.
Step S104, for a certain image i in the training data set, segment the corresponding text description into words w_i1, w_i2, …, w_iL; the feature vector of image i extracted by the CNN serves as the initial input of the hidden unit of a recurrent neural network (hereinafter RNN), the specific RNN used here being LSTM, and the number of input steps is determined by L above.
Before input, marker words must also be added at the head and tail of the L words obtained by segmentation; the head marker word is denoted w_s and the tail marker word w_e, and these two tokens are the same for all samples in the training data set. Therefore the words w_s, w_i1, w_i2, …, w_iL, w_e are input in turn at each step of the RNN cycle. The result output at each step passes through the softmax layer to give a probability distribution, in this example a vector of length 8233 corresponding to the 8233 words in the dictionary, where the value in each dimension is the probability of that dimension's word. Denote the word input at step t as w_it and the output probability distribution as P_it, so the probability that this step outputs w_it is P_it(w_it). To make the words output for image i fit the words input to it as closely as possible, the probability of equation (1) must be maximized according to maximum likelihood estimation.
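The softmax layer described above turns the RNN's raw scores over the 8233-word dictionary into a probability distribution; a minimal, numerically stable pure-Python sketch (real implementations use a deep-learning framework):

```python
import math

def softmax(logits):
    """Map raw scores (one per dictionary word) to probabilities that sum
    to 1; subtracting the maximum first avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```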
Step S105: step S104 fits the output of one image to its input; to make the output words of every image in the whole training set fit the input words as closely as possible, the probability of equation (2) must be maximized. In training practice, a minus sign is usually placed before equation (2) so that it serves as the model's loss function, turning the target into loss minimization; the model parameters are then updated by back-propagation. During training, the data must be divided into a training set and a validation set, and convergence is judged by observing the model's performance on the validation set; once the model converges, the automatic labeling model M is obtained, whose specific structure is shown in FIG. 1.
P = \prod_{i=1}^{N} \prod_{t=1}^{L_i} P_{it}(w_{it})   (2)
Step S2, generating text indexes for all images through the automatic annotation model M described in step S1.
For any image i, firstly, the image feature f is extracted from the CNN part of the automatic labeling model MiImage feature fiAnd a head marker word w in step S104sTogether as an initial input to the RNN portion of the automatic annotation model M, the first word w 'describing the image is generated'i1Then the first word w'i1Generating a second descriptive term w 'as input to RNN'i2. Analogize in turn to generate the t < th > word w'itDependent on w 'generated ahead of it'i1,w′i2…w′i(t-1)
And selecting the word with the maximum RNN output part probability value when generating each word, and taking the probability value as the confidence coefficient of the word in the image, and marking the confidence coefficient as z. When the tail markup word w in step S104 is generatedeOr stopping continuously generating the words when the length of the generated words reaches a preset threshold value, and recording that the words generated at last are w'il. Then for any image i described above, a sequence can be generatedDescriptor w'i1,w′i2…w′ilAnd the confidence z of these words in the imagei1,zi2…zilNormalizing the confidence level by the formula (3) to obtain a normalized confidence level z'i1,z′i2…z′il
Figure BDA0001593769830000081
W 'produced above'i1,w′i2…w′ilAnd z'i1,z′i2…z′ilTogether forming a text index for the image, fig. 2 shows an example of generating a text index for an image by an automatic annotation model.
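The greedy generation loop of step S2 can be sketched as follows; `step_fn` is a hypothetical stand-in for one RNN step and must return a mapping from candidate words to probabilities:

```python
def greedy_decode(step_fn, start_token, end_token, max_len):
    """Greedy captioning: at each step pick the highest-probability word,
    record that probability as the word's confidence z, and stop at the
    tail marker word or at the length threshold."""
    words, confidences = [], []
    prev = start_token
    while len(words) < max_len:
        probs = step_fn(prev)
        word = max(probs, key=probs.get)  # word with the largest probability
        if word == end_token:
            break
        words.append(word)
        confidences.append(probs[word])
        prev = word
    return words, confidences
```

In the real model each step would also condition on the image feature f_i and the RNN's hidden state; the sketch keeps only the control flow of the decoding loop.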
Step S3, build the index over all images from the text indexes generated in step S2. Specifically, for any one of the 8233 dictionary words, denoted w_u, find all images i_1, i_2, …, i_o in which the word appears and the confidences z'_u1, z'_u2, …, z'_uo of the word in those images, and order the images by confidence from high to low. In this way a candidate set ordered by confidence can be generated for any of the 8233 words.
Step S4, build a synonym lexicon to handle query keywords that do not appear among the 8233 words above. Specifically, a large amount of text data needing no labeling is obtained from a network text data set; here the Chinese Wikipedia corpus is used. It comprises the title and body of each entry in the Chinese Wikipedia, 984,451 entries in total; after preprocessing steps such as punctuation removal, traditional-to-simplified conversion, and word segmentation, 984,451 texts are obtained. These 984,451 Wikipedia texts, plus the 150,000 texts of the training set, are trained with the word2vec algorithm; all the words in them form a lexicon DB of size 408,787, and a word vector can then be generated for every word in DB. For any two word vectors v_e, v_u, the degree of similarity can be computed by formula (4), where v_e · v_u is the inner product of the two vectors and |v_e| × |v_u| the product of their lengths; the larger the value, the closer the two meanings.

\mathrm{sim}(v_e, v_u) = \frac{v_e \cdot v_u}{|v_e| \times |v_u|}   (4)

When a query keyword w_u is not present in dictionary D, the lexicon DB is used to find the word w_v that is closest in meaning to w_u and appears in dictionary D, and the related images are retrieved via w_v; FIG. 3 shows an example of finding a synonym for a query keyword from the synonym lexicon.
Step S5, receive the query keywords and perform image retrieval. Through steps S3 and S4, a candidate image set ordered by confidence can be generated for any word in the lexicon DB. When there are several query keywords w_1, w_2, …, w_n, each keyword retrieves one candidate set <i_1, z_1>, <i_2, z_2>, …, <i_o, z_o>; for a candidate image i occurring more than once, all of its confidences z are summed as its final confidence, and the redundant occurrences of i are removed so that it appears only once. The candidate images are sorted by the summed confidence from high to low, and the top candidates are selected as the returned result.
FIG. 4 shows a general flow chart of image retrieval according to the present invention, which integrates the contents of the above steps, first obtaining a text data set, training to obtain a query thesaurus of near-sense words; acquiring an image data set, training an automatic labeling model and generating a dictionary, generating a text index for an image through the automatic labeling model, and constructing an image retrieval index through the text index; when the query keyword of the user is not in the dictionary of the image data set, the word bank is queried through the similar meaning word to find the word which is closest to the meaning of the query keyword in the dictionary, and the image retrieval is carried out by replacing the query keyword.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (4)

1. An image retrieval method for automatically generating a text index is characterized by comprising the following steps:
s1, learning the automatic labeling model M, and the process is as follows:
s101, acquiring a labeled training data set and an unlabeled image data set, wherein the training data set comprises training images and text descriptions corresponding to the training images, and the image data set only comprises images and does not have text descriptions corresponding to the images;
s102, segmenting all text descriptions of the training data set to construct a dictionary D;
s103, extracting the characteristics of each image in the training data set through CNN, wherein the characteristics are one-dimensional vectors;
s104, for a certain image i in the training data set, segment the corresponding text description into words w_i1, w_i2, …, w_iL (L words in total); at the same time, the feature f_i of image i extracted by the CNN serves as the initial input of the hidden unit of the RNN, and the words w_i1, w_i2, …, w_iL are input in turn at each step of the recurrent network's cycle. The output of each step passes through a softmax layer to give a probability value for every word in the dictionary; denoting the word input at step t as w_it and the output probability distribution as P_it, the probability that this step outputs w_it is P_it(w_it). By maximum likelihood estimation, it is desirable to maximize the probability of equation (1),

P_i = \prod_{t=1}^{L} P_{it}(w_{it})   (1)

s105, for all images in the training data set, the probability of equation (2) must be maximized; taking this expression as the objective function, back-propagation updates the model parameters to obtain the automatic labeling model M, which consists of the CNN and the RNN;

P = \prod_{i=1}^{N} \prod_{t=1}^{L_i} P_{it}(w_{it})   (2)

where N is the number of training images and L_i is the length of the description of image i;
s2, generating text indexes for all images through the automatic annotation model M;
for any image i in the image data set, the image feature f_i is first extracted by the CNN part of the automatic labeling model M and used as the initial input of the RNN part of M; the words are then generated in turn, where generating the word w'_it depends on the already generated w'_i1, w'_i2, …, w'_i(t-1). At each step the word with the largest output probability value is selected as the generated word, and that probability value is taken as the confidence of the generated word for the image, denoted z;

word generation stops when the end word is generated in the above steps or the length of the generated sequence reaches a preset threshold; for any image i described above, a descriptor sequence w'_i1, w'_i2, …, w'_il can thus be generated, together with the confidences z_i1, z_i2, …, z_il of the descriptors in the image, which are normalized by formula (3)

z'_{it} = z_{it} \Big/ \sum_{j=1}^{l} z_{ij}   (3)

the w'_i1, w'_i2, …, w'_il and z'_i1, z'_i2, …, z'_il above together constitute the text index of image i;
s3, construct an image retrieval index from the text index of each image: for any word w_u in the dictionary D described above, find all images i_1, i_2, …, i_o described by that word and the confidences z'_u1, z'_u2, …, z'_uo of the word in those images, and sort the images by confidence from high to low; in this way a candidate image set ordered by confidence is generated for any word in dictionary D;

s4, build a synonym-lookup lexicon: obtain text data that needs no labeling from a network text data set and train on it with the word2vec algorithm to construct a lexicon DB, in which every word w ∈ DB has a corresponding word vector, so that the semantic similarity of any two words in DB can be computed; when a query keyword w_u is not present in the dictionary D described above, find via the lexicon DB the word w_v that is closest in meaning to w_u and appears in dictionary D, and retrieve the relevant images through w_v;

s5, receive query keywords and perform image retrieval: according to steps S3 and S4, a candidate image set sorted by confidence can be generated for any word in the lexicon DB; when there are several query keywords, merge the candidate image sets generated for each keyword, and for any candidate image i occurring more than once, sum all its confidences under the different keywords as its final confidence, removing the redundant occurrences so that it appears only once; then sort the candidate images by the summed confidence from high to low and select the top candidates as the returned result.
2. The image retrieval method of claim 1, wherein in step S2, when the automatic annotation model M generates descriptors for an image, a confidence level is generated for each descriptor at the same time, which indicates the accuracy of the descriptor in describing the image; and by sequencing the confidence level, images with higher relevance to the keywords are accurately retrieved.
3. The image retrieval method of claim 1, wherein in steps S4 and S5, when the query keyword does not appear in dictionary D, a word vector is constructed for each word by building the lexicon DB; for any two word vectors v_e, v_u, the similarity of the two words is computed by formula (4), the word appearing in dictionary D whose meaning is closest to the query keyword is found, and the corresponding images are then retrieved,

\mathrm{sim}(v_e, v_u) = \frac{v_e \cdot v_u}{|v_e| \times |v_u|}   (4)

where v_e · v_u is the inner product of the two vectors and |v_e| × |v_u| the product of their lengths; the larger the value, the closer the two meanings.
4. The image retrieval method of claim 1, wherein the CNN is ResNet and the RNN is LSTM.
CN201810198490.9A 2018-03-12 2018-03-12 Image retrieval method for automatically generating text index Expired - Fee Related CN108509521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810198490.9A CN108509521B (en) 2018-03-12 2018-03-12 Image retrieval method for automatically generating text index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810198490.9A CN108509521B (en) 2018-03-12 2018-03-12 Image retrieval method for automatically generating text index

Publications (2)

Publication Number Publication Date
CN108509521A CN108509521A (en) 2018-09-07
CN108509521B true CN108509521B (en) 2020-02-18

Family

ID=63376458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810198490.9A Expired - Fee Related CN108509521B (en) 2018-03-12 2018-03-12 Image retrieval method for automatically generating text index

Country Status (1)

Country Link
CN (1) CN108509521B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635135A (en) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method, device, terminal and storage medium
CN110188775B (en) * 2019-05-28 2020-06-26 创意信息技术股份有限公司 Image content description automatic generation method based on joint neural network model
CN111243729B (en) * 2020-01-07 2022-03-08 同济大学 Automatic generation method of lung X-ray chest radiography examination report
CN112349150B (en) * 2020-11-19 2022-05-20 飞友科技有限公司 Video acquisition method and system for airport flight guarantee time node
CN112381038B (en) * 2020-11-26 2024-04-19 中国船舶工业系统工程研究院 Text recognition method, system and medium based on image
CN112148831B (en) * 2020-11-26 2021-03-19 广州华多网络科技有限公司 Image-text mixed retrieval method and device, storage medium and computer equipment
CN113204666B (en) * 2021-05-26 2022-04-05 杭州联汇科技股份有限公司 Method for searching matched pictures based on characters

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901249A (en) * 2009-05-26 2010-12-01 复旦大学 Text-based query expansion and sort method in image retrieval
CN101582080B (en) * 2009-06-22 2011-05-04 浙江大学 Web image clustering method based on image and text relevant mining
CN102360431A (en) * 2011-10-08 2012-02-22 大连海事大学 Method for automatically describing image
US8687886B2 (en) * 2011-12-29 2014-04-01 Konica Minolta Laboratory U.S.A., Inc. Method and apparatus for document image indexing and retrieval using multi-level document image structure and local features

Also Published As

Publication number Publication date
CN108509521A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN108509521B (en) Image retrieval method for automatically generating text index
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN113268995B (en) Chinese academy keyword extraction method, device and storage medium
CN107480200B (en) Word labeling method, device, server and storage medium based on word labels
CN111061939B (en) Scientific research academic news keyword matching recommendation method based on deep learning
CN110888991A (en) Sectional semantic annotation method in weak annotation environment
Li et al. Question answering over community-contributed web videos
US20190095525A1 (en) Extraction of expression for natural language processing
Gong et al. A semantic similarity language model to improve automatic image annotation
CN103136221B (en) A kind of method for generating requirement templet, demand know method for distinguishing and its device
CN114328800A (en) Text processing method and device, electronic equipment and computer readable storage medium
Perdana et al. Instance-based deep transfer learning on cross-domain image captioning
Zhang et al. Semantic image retrieval using region based inverted file
CN115687960B (en) Text clustering method for open source security information
Tian et al. Automatic image annotation with real-world community contributed data set
CN113516202A (en) Webpage accurate classification method for CBL feature extraction and denoising
Ramachandran et al. Document Clustering Using Keyword Extraction
Anusha et al. Multi-classification and automatic text summarization of Kannada news articles
Aref Mining publication papers via text mining Evaluation and Results
Tian et al. Textual ontology and visual features based search for a paleontology digital library
Wiesen et al. Overview of uni-modal and multi-modal representations for classification tasks
CN113742520B (en) Video query and search method of dense video description algorithm based on semi-supervised learning
Joga et al. Semantic text analysis using machine learning
Wu et al. A new passage ranking algorithm for video question answering
Alkhatib, Wael, Saba Sabrin, Svenja Neitzel, and Christoph Rensing (Communication Multimedia Lab, TU Darmstadt). Training-Less …

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200218
