CN111651660A - Method for cross-media retrieval of difficult samples - Google Patents

Method for cross-media retrieval of difficult samples

Info

Publication number
CN111651660A
Authority
CN
China
Prior art keywords: text, data set, sample data, similarity, fine-grained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010468272.XA
Other languages
Chinese (zh)
Other versions
CN111651660B (en)
Inventor
王春辉 (Wang Chunhui)
胡勇 (Hu Yong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Polar Intelligence Technology Co ltd
Original Assignee
Polar Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Polar Intelligence Technology Co ltd
Priority to CN202010468272.XA
Publication of CN111651660A
Application granted
Publication of CN111651660B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of natural language understanding and discloses a method for cross-media retrieval of difficult samples. The method comprises the following steps: calculating a fine-grained label representing the degree of correlation between the text in a text image pair and the text description of the image, and calculating the similarity of the text image pair based on the fine-grained label, thereby realizing cross-media retrieval of difficult samples. The method makes full use of the fact that text carries richer information than images, fully mines the difficult samples in the training data, assigns fine-grained labels to them according to their degree of difficulty, and calculates the similarity of each text image pair based on these labels, improving the accuracy of cross-media retrieval on difficult samples.

Description

Method for cross-media retrieval of difficult samples
Technical Field
The invention belongs to the technical field of natural language understanding, and particularly relates to a method for cross-media retrieval of difficult samples.
Background
With the rapid development of Internet technology and social media, data in various media forms has grown explosively, and Internet users' demand for information retrieval keeps increasing. Traditional information retrieval methods based on a single medium can no longer meet these needs: users want to query with media information of one modality and retrieve results of multiple other media types. To meet this demand, cross-media information retrieval techniques are receiving increasing attention.
In 2004, Hardoon et al. first applied canonical correlation analysis (CCA) to the cross-media information retrieval task. CCA is a linear mathematical model whose main objective is to learn a subspace that maximizes the pairwise correlation between two sets of heterogeneous data. Given an image/text pair as input, CCA measures the similarity between text and image by mapping image and text features into the maximally correlated subspace.
In recent years, with the rapid development of deep learning, more and more cross-media information retrieval models based on deep neural networks have been proposed. Such models generally use neural networks to extract features from cross-media data, and thanks to their nonlinear mappings, deep learning models represent all kinds of complex media data well. DCCA (Deep CCA), for example, is a nonlinear extension of CCA used to learn complex nonlinear transformations between two types of media data: it builds a network containing two sub-networks, one per media type, with a shared layer, and learns to maximize the correlation of the output layers. The original data set consists of positive pairs, i.e. text/image pairs representing the same semantic concept; to provide the negative examples required for model training, the common practice is to randomly combine images and texts of different semantic concepts into negative image/text pairs. This way of constructing the data set introduces an unavoidable problem for model training: among the randomly combined negative samples there is a large number of simple samples that the model can easily classify correctly, and these contribute little to training. At the same time, the data set always contains some positive and negative samples that are easily misclassified; such samples are called difficult samples. During training, the influence of the small number of easily misclassified difficult samples is often drowned out by the large number of simple samples, so the model cannot converge to a better result and falls into a local optimum.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a method for cross-media retrieval of difficult samples.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method of retrieving difficult samples across media, comprising the steps of:
step 1, calculating a fine-grained label representing the correlation size between texts in a text image pair and text descriptions of images;
step 1.1, randomly selecting texts and images belonging to the same semantic category from an original data set D of a text image pair to form a positive sample data set
Figure BDA0002513369810000021
Randomly selecting texts and images belonging to different semantic categories from the D to form a negative sample data set
Figure BDA0002513369810000022
Wherein the content of the first and second substances,
Figure BDA0002513369810000023
each text image pair in the step D has the same semantic category; n, J, K number of sample pairs D, P, E, K ═ J;
step 1.2, extracting from D and P
Figure BDA0002513369810000024
Corresponding text
Figure BDA0002513369810000025
Composing positive text pairs
Figure BDA0002513369810000026
Extraction of D from E
Figure BDA0002513369810000027
Corresponding text
Figure BDA0002513369810000028
Composing negative text pairs
Figure BDA0002513369810000029
Computing
Figure BDA00025133698100000210
And
Figure BDA00025133698100000211
degree of similarity of
Figure BDA00025133698100000212
Figure BDA00025133698100000213
And
Figure BDA00025133698100000214
degree of similarity of
Figure BDA00025133698100000215
Step 1.3, calculating a fine-grained label of any text image pair in the positive sample data set P and the negative sample data set E:
Figure BDA0002513369810000031
Figure BDA0002513369810000032
step 2, calculating the similarity of the text image pair based on the fine-grained label;
step 2.1, using graph volume model GCN (GraphConvolation)alNetwork) extracting text features v of input text TT
Step 2.2, extracting image features v of the input image I by using a convolutional Neural network model CCN (convolutional Neural networks)I
Step 2.3, based on vT、vIConstructing a positive sample data set
Figure BDA0002513369810000033
And negative sample data set
Figure BDA0002513369810000034
Q1、Q2The number of sample pairs of the positive sample data set and the negative sample data set respectively; respectively calculating the similarity of the text image pair in the positive sample data set and the negative sample data set
Figure BDA0002513369810000035
And correcting by using a fine-grained label:
Figure BDA0002513369810000036
Figure BDA0002513369810000037
in the formula (I), the compound is shown in the specification,
Figure BDA0002513369810000038
for the corrected similarity, β is the influence coefficient of the set fine-grained label on the similarity,
Figure BDA0002513369810000039
calculated according to the formula (1),
Figure BDA00025133698100000310
calculated according to the formula (2).
Compared with the prior art, the invention has the following beneficial effects:
according to the method, the fine-grained labels representing the correlation size between the text in the text image pair and the text description of the image are calculated, the similarity of the text image pair is calculated based on the fine-grained labels, and cross-media retrieval of difficult samples is achieved. The method fully utilizes the characteristic that text information contains richer information compared with image information, fully excavates difficult samples in training data, distributes fine-grained labels to the difficult samples according to the difficulty degree, calculates the similarity of the text image pair based on the fine-grained labels, and improves the accuracy of the cross-media retrieval difficult samples.
Drawings
Fig. 1 is a schematic diagram of the similarity distribution curves of text image pairs, where the horizontal axis represents similarity and the vertical axis represents the number of sample pairs.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The embodiment of the invention provides a method for cross-media retrieval of difficult samples, which comprises the following steps:
S101: calculate a fine-grained label representing the degree of correlation between the text in a text image pair and the text description of the image.
S1011: from the original data set D = {(T_n, I_n), n = 1, …, N} of text image pairs, in which each text image pair has the same semantic category, randomly select texts and images belonging to the same semantic category to form a positive sample data set P = {(T_j^P, I_j^P), j = 1, …, J}, and randomly select texts and images belonging to different semantic categories to form a negative sample data set E = {(T_k^E, I_k^E), k = 1, …, K}; N, J and K are the numbers of sample pairs in D, P and E respectively, and K = J.
S1012: for each pair (T_j^P, I_j^P) in P, extract from D the text T_j originally paired with the image I_j^P, forming the positive text pair (T_j^P, T_j); for each pair (T_k^E, I_k^E) in E, extract from D the text T_k originally paired with the image I_k^E, forming the negative text pair (T_k^E, T_k). Compute the similarity s_j^P of T_j^P and T_j and the similarity s_k^E of T_k^E and T_k.
S1013: calculate a fine-grained label for each text image pair in the positive sample data set P and the negative sample data set E according to formulas (1) and (2). [Formulas (1) and (2) are reproduced only as images in the original; they map the text-pair similarities s_j^P and s_k^E to fine-grained labels r_j^P and r_k^E whose maximum value is 1 and minimum value is 0.]
S102: calculate the similarity of the text image pairs based on the fine-grained labels.
S1021: use the graph convolution model GCN (Graph Convolutional Network) to extract the text feature v_T of an input text T.
S1022: use the convolutional neural network model CNN to extract the image feature v_I of an input image I.
S1023: based on v_T and v_I, construct a positive sample data set and a negative sample data set, with Q1 and Q2 the numbers of sample pairs in the positive and negative sample data sets respectively; compute the similarity s_q^P (s_q^E) of each text image pair in the positive (negative) sample data set, and correct it with the fine-grained label according to formulas (3) and (4). [Formulas (3) and (4) are reproduced only as images in the original; as explained below, formula (3) lowers each positive-pair similarity and formula (4) raises each negative-pair similarity, by amounts that grow with sample difficulty.] In the formulas, ŝ_q^P and ŝ_q^E denote the corrected similarities, β is the set influence coefficient of the fine-grained label on the similarity, and the fine-grained labels are those calculated by formulas (1) and (2).
The implementation of this embodiment is divided into two stages. The first stage calculates the fine-grained labels from text similarity and is realized by step S101; the second stage performs cross-modal information retrieval based on the fine-grained labels and is realized by step S102. The main goal of the first stage is to measure the correlation between the text in a text image pair and the original text description of the image. Text descriptions typically contain richer and more specific information than images, so this embodiment represents image semantics by the original text description of the image and judges the difficulty of a sample by computing the similarity between that original text and the text in the text image pair. For positive samples, the smaller the similarity, the more difficult the sample; for negative samples, the greater the similarity, the more difficult the sample.
Step S101 specifically includes S1011 to S1013.
Step S1011 constructs a positive sample data set P and a negative sample data set E based on the original data set D.
Step S1012 extracts the positive and negative text pairs based on D, P and E, and calculates the similarity of each positive and negative text pair; cosine similarity is used.
Step S1013 calculates, from the similarities of the positive and negative text pairs, a fine-grained label for each text image pair in the positive sample data set P and the negative sample data set E according to formulas (1) and (2). By formulas (1) and (2), the maximum value of a fine-grained label is 1 and the minimum value is 0.
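Putting steps S1011 to S1013 together, stage one can be sketched as follows. Since formulas (1) and (2) are reproduced only as images in the source, the min-max normalization below is an assumption that merely respects the stated 0-to-1 label range; the cosine similarity is stated in the text, and the random stand-in embeddings are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two text embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fine_grained_labels(sims: np.ndarray) -> np.ndarray:
    """Map text-pair similarities into [0, 1] (assumed min-max form)."""
    return (sims - sims.min()) / (sims.max() - sims.min() + 1e-12)

# Stand-in 128-d text embeddings for eight positive and eight negative text pairs.
positive_text_pairs = [(rng.normal(size=128), rng.normal(size=128)) for _ in range(8)]
negative_text_pairs = [(rng.normal(size=128), rng.normal(size=128)) for _ in range(8)]

# Step S1012: similarity of each positive/negative text pair (cosine similarity).
s_pos = np.array([cosine_similarity(t1, t2) for t1, t2 in positive_text_pairs])
s_neg = np.array([cosine_similarity(t1, t2) for t1, t2 in negative_text_pairs])

# Step S1013: fine-grained labels with max 1 and min 0 (formulas (1) and (2), assumed form).
r_pos = fine_grained_labels(s_pos)
r_neg = fine_grained_labels(s_neg)
```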
Step S102 specifically includes S1021 to S1023.
Step S1021 extracts the text feature of the input text T using the graph convolution model GCN. GCN extends the convolution operation to graph-structured data, which gives it a strong capability to learn local, stationary features of graphs, and it is widely applied to text classification tasks; recent research has demonstrated its powerful text semantic modeling and text classification capabilities. In this embodiment, the GCN comprises two convolution layers, each followed by a ReLU; the text features are then mapped to the latent shared semantic space through a fully connected layer.
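A minimal sketch of this text branch follows: two graph convolution layers, each followed by a ReLU, then a fully connected layer into the shared space. The propagation rule H' = ReLU(Â·H·W) is the standard GCN layer; the feature dimensions, the mean-pooling over nodes, and the identity adjacency in the usage example are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class TextGCN(nn.Module):
    """Two GCN layers with ReLU, then a fully connected layer to the shared space."""
    def __init__(self, in_dim: int = 300, hid_dim: int = 256, out_dim: int = 128):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)   # first graph convolution
        self.w2 = nn.Linear(hid_dim, hid_dim, bias=False)  # second graph convolution
        self.fc = nn.Linear(hid_dim, out_dim)              # map into the shared semantic space

    def forward(self, x: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim) word-node features of the text graph
        # a_hat: (num_nodes, num_nodes) normalized adjacency matrix
        h = torch.relu(a_hat @ self.w1(x))
        h = torch.relu(a_hat @ self.w2(h))
        return self.fc(h.mean(dim=0))  # pool the nodes into one text feature v_T

# Illustrative usage on a random five-node text graph.
v_T = TextGCN()(torch.randn(5, 300), torch.eye(5))
```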
Step S1022 extracts the image feature of the input image I using the convolutional neural network model CNN, a common model for extracting image features. A pre-trained VGG-19 may also be used: for a given 224×224 image, the 4096-dimensional output of the penultimate fully connected layer of VGG-19, the FC7 layer, is selected and then mapped to the latent shared semantic space through a fully connected layer.
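A minimal sketch of this image branch, assuming torchvision's pre-trained VGG-19 (whose classifier exposes FC7 as its second Linear layer): truncate the classifier after the FC7 ReLU to obtain the 4096-dimensional feature, then project it with a fully connected layer. The 128-dimensional shared space is an illustrative assumption.

```python
import torch
import torch.nn as nn
from torchvision import models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
fc7 = nn.Sequential(*list(vgg.classifier.children())[:5])  # Linear-ReLU-Dropout-Linear-ReLU: up to FC7
to_shared = nn.Linear(4096, 128)  # fully connected layer into the shared semantic space

def image_feature(img: torch.Tensor) -> torch.Tensor:
    """img: (1, 3, 224, 224) preprocessed image -> image feature v_I in the shared space."""
    with torch.no_grad():
        h = vgg.avgpool(vgg.features(img)).flatten(1)  # (1, 25088) convolutional feature
        h = fc7(h)                                     # (1, 4096) FC7 activation
    return to_shared(h)

v_I = image_feature(torch.randn(1, 3, 224, 224))
```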
Step S1023 constructs a positive sample data set and a negative sample data set from the text and image features extracted in the preceding steps, calculates the similarity of each text image pair in the two sets, and corrects it using the fine-grained labels.
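Formulas (3) and (4) are reproduced only as images in the patent, so the additive corrections below are assumptions; what is grounded in the text (see the discussion of Fig. 1) is that the correction lowers positive-pair similarities and raises negative-pair similarities, and does so more strongly for more difficult samples.

```python
import numpy as np

def correct_positive(s_pos: np.ndarray, r_pos: np.ndarray, beta: float) -> np.ndarray:
    # Harder positives have lower text-pair similarity, hence smaller labels r,
    # so they are lowered more (assumed form of formula (3)).
    return s_pos - beta * (1.0 - r_pos)

def correct_negative(s_neg: np.ndarray, r_neg: np.ndarray, beta: float) -> np.ndarray:
    # Harder negatives have higher text-pair similarity, hence larger labels r,
    # so they are raised more (assumed form of formula (4)).
    return s_neg + beta * r_neg

beta = 0.1  # influence coefficient of the fine-grained label on the similarity (set by hand)
s_pos_hat = correct_positive(np.array([0.9, 0.6]), np.array([0.8, 0.1]), beta)
s_neg_hat = correct_negative(np.array([0.2, 0.7]), np.array([0.1, 0.9]), beta)
```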
As an alternative embodiment, the loss function Loss for model learning is:

Loss = (σ+² + σ−²) + λ·max(0, m − (μ+ − μ−))   (5)

μ+ = (1/Q1) Σ_{q=1}^{Q1} ŝ_q^P   (6)

σ+² = (1/Q1) Σ_{q=1}^{Q1} (ŝ_q^P − μ+)²   (7)

μ− = (1/Q2) Σ_{q=1}^{Q2} ŝ_q^E   (8)

σ−² = (1/Q2) Σ_{q=1}^{Q2} (ŝ_q^E − μ−)²   (9)

where μ+ and σ+² are the mean and variance of the corrected positive-pair similarities ŝ_q^P, μ− and σ−² are the mean and variance of the corrected negative-pair similarities ŝ_q^E, λ is a set proportionality coefficient balancing the mean and variance terms, and m is a set upper limit on (μ+ − μ−). [Formulas (6)–(9) are reproduced only as images in the original and are reconstructed here as the standard mean and variance of the corrected similarities, as described by the surrounding text.]
In this embodiment, in order to reduce the rate at which the model misidentifies difficult samples and to make the neural network converge to a better result, the loss function is improved as in formulas (5) to (9), where the similarity is the value corrected by the fine-grained label. In Fig. 1, the left curve represents the similarity distribution of text image pairs of different semantic categories, the right curve represents the similarity distribution of text image pairs of the same semantic category, and the area of the shaded (overlapping) region reflects the false-alarm rate. By formula (5), minimizing the loss maximizes μ+ while minimizing μ−, σ−² and σ+². As Fig. 1 makes clear, the smaller μ−, σ−² and σ+² and the larger μ+, the smaller the area of the shaded region; minimizing the loss therefore minimizes the shaded area and reduces the false-alarm rate. By formula (4), after fine-grained label correction the similarities of negative sample pairs increase: simple negatives increase a little, difficult negatives increase a lot, so the penalty on difficult negatives grows during learning, which is equivalent to shifting the left curve in Fig. 1 to the right. Similarly, by formula (3), the similarities of positive sample pairs decrease: simple positives decrease a little, difficult positives decrease a lot, so the penalty on difficult positives grows, which is equivalent to shifting the right curve in Fig. 1 to the left. Shifting the left curve right and the right curve left enlarges the shaded area; since learning minimizes that area, attention to difficult samples is increased and the model converges to a better result.
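A minimal sketch of this improved loss under the reconstruction above: the variances of the corrected similarities on both sides, plus a hinge term that pushes the gap between the positive and negative means up to the margin m. The values of λ and m here are illustrative; the patent only says they are set coefficients.

```python
import torch

def hard_sample_loss(s_pos_hat: torch.Tensor,
                     s_neg_hat: torch.Tensor,
                     lam: float = 1.0,
                     m: float = 0.5) -> torch.Tensor:
    mu_pos = s_pos_hat.mean()                     # formula (6)
    var_pos = ((s_pos_hat - mu_pos) ** 2).mean()  # formula (7)
    mu_neg = s_neg_hat.mean()                     # formula (8)
    var_neg = ((s_neg_hat - mu_neg) ** 2).mean()  # formula (9)
    # Formula (5): shrink both similarity distributions and widen the gap
    # between their means, which shrinks the shaded overlap in Fig. 1.
    return (var_pos + var_neg) + lam * torch.clamp(m - (mu_pos - mu_neg), min=0)

loss = hard_sample_loss(torch.tensor([0.8, 0.7, 0.9]),
                        torch.tensor([0.2, 0.4, 0.1]))
```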
To verify the effectiveness of the invention, a set of experimental data is given below. The experiments use three data sets, English-Wiki, TVGraz and Chinese-Wiki, containing 2866, 2360 and 3103 text image pairs respectively. Cross-media retrieval is performed on the three data sets using the proposed method and the existing GIN model. The proposed method differs from GIN in that it adds the mining of difficult samples and the assignment of fine-grained labels to samples of different difficulty, and it incorporates the fine-grained labels into the calculation of the loss function, strengthening the influence of difficult samples on model learning. The experimental results are shown in Table 1.
TABLE 1 Experimental results
[Table 1 is reproduced only as an image in the original; as discussed below, it reports retrieval accuracy for the proposed method and the GIN baseline on English-Wiki, TVGraz and Chinese-Wiki.]
As can be seen from Table 1, the accuracy of the proposed method is significantly better than that of the other model, improving on GIN by about 4%, 3% and 10% on English-Wiki, TVGraz and Chinese-Wiki respectively. This indicates that the sample-difficulty information carried by the fine-grained labels helps improve the performance of existing models on the cross-media information retrieval task. It also demonstrates the effectiveness of the method in assigning fine-grained labels: introducing them makes model learning pay more attention to difficult samples, further improving retrieval performance.
The above description covers only a few embodiments of the present invention and should not be construed as limiting its scope; all equivalent changes, modifications, or equivalent scalings made in accordance with the spirit of the present invention shall be deemed to fall within the scope of the present invention.

Claims (2)

1. A method for cross-media retrieval of difficult samples, comprising the following steps:
Step 1: calculate a fine-grained label representing the degree of correlation between the text in a text image pair and the text description of the image.
Step 1.1: from the original data set D = {(T_n, I_n), n = 1, …, N} of text image pairs, in which each text image pair has the same semantic category, randomly select texts and images belonging to the same semantic category to form a positive sample data set P = {(T_j^P, I_j^P), j = 1, …, J}, and randomly select texts and images belonging to different semantic categories to form a negative sample data set E = {(T_k^E, I_k^E), k = 1, …, K}; N, J and K are the numbers of sample pairs in D, P and E respectively, and K = J.
Step 1.2: for each pair (T_j^P, I_j^P) in P, extract from D the text T_j originally paired with the image I_j^P, forming the positive text pair (T_j^P, T_j); for each pair (T_k^E, I_k^E) in E, extract from D the text T_k originally paired with the image I_k^E, forming the negative text pair (T_k^E, T_k). Compute the similarity s_j^P of T_j^P and T_j and the similarity s_k^E of T_k^E and T_k.
Step 1.3: calculate a fine-grained label for each text image pair in the positive sample data set P and the negative sample data set E according to formulas (1) and (2). [Formulas (1) and (2) are reproduced only as images in the original; they map the text-pair similarities s_j^P and s_k^E to fine-grained labels r_j^P and r_k^E whose maximum value is 1 and minimum value is 0.]
Step 2: calculate the similarity of the text image pairs based on the fine-grained labels.
Step 2.1: use the graph convolution model GCN (Graph Convolutional Network) to extract the text feature v_T of an input text T.
Step 2.2: use the convolutional neural network model CNN to extract the image feature v_I of an input image I.
Step 2.3: based on v_T and v_I, construct a positive sample data set and a negative sample data set, with Q1 and Q2 the numbers of sample pairs in the positive and negative sample data sets respectively; compute the similarity s_q^P (s_q^E) of each text image pair in the positive (negative) sample data set, and correct it with the fine-grained label according to formulas (3) and (4). [Formulas (3) and (4) are reproduced only as images in the original; formula (3) lowers each positive-pair similarity and formula (4) raises each negative-pair similarity, by amounts that grow with sample difficulty.] In the formulas, ŝ_q^P and ŝ_q^E denote the corrected similarities, β is the set influence coefficient of the fine-grained label on the similarity, and the fine-grained labels are those calculated by formulas (1) and (2).
2. The method for cross-media retrieval of difficult samples according to claim 1, wherein the loss function Loss for model learning is:

Loss = (σ+² + σ−²) + λ·max(0, m − (μ+ − μ−))   (5)

μ+ = (1/Q1) Σ_{q=1}^{Q1} ŝ_q^P   (6)

σ+² = (1/Q1) Σ_{q=1}^{Q1} (ŝ_q^P − μ+)²   (7)

μ− = (1/Q2) Σ_{q=1}^{Q2} ŝ_q^E   (8)

σ−² = (1/Q2) Σ_{q=1}^{Q2} (ŝ_q^E − μ−)²   (9)

where μ+ and σ+² are the mean and variance of the corrected positive-pair similarities ŝ_q^P, μ− and σ−² are the mean and variance of the corrected negative-pair similarities ŝ_q^E, λ is a set proportionality coefficient balancing the mean and variance terms, and m is a set upper limit on (μ+ − μ−). [Formulas (6)–(9) are reproduced only as images in the original and are reconstructed here as the standard mean and variance of the corrected similarities, as described by the surrounding text.]
CN202010468272.XA 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples Active CN111651660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010468272.XA CN111651660B (en) 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010468272.XA CN111651660B (en) 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples

Publications (2)

Publication Number Publication Date
CN111651660A (en) 2020-09-11
CN111651660B (en) 2023-05-02

Family

ID=72347038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010468272.XA Active CN111651660B (en) 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples

Country Status (1)

Country Link
CN (1) CN111651660B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120089552A1 (en) * 2008-12-22 2012-04-12 Shih-Fu Chang Rapid image annotation via brain state decoding and visual pattern mining
CN105701227A (en) * 2016-01-15 2016-06-22 北京大学 Cross-media similarity measure method and search method based on local association graph
CN106095893A (en) * 2016-06-06 2016-11-09 北京大学深圳研究生院 A kind of cross-media retrieval method
JP2018025949A (en) * 2016-08-09 2018-02-15 日本電信電話株式会社 Learning device, image search device, method, and program
US20190213447A1 (en) * 2017-02-08 2019-07-11 Nanjing University Of Aeronautics And Astronautics Sample selection method and apparatus and server
CN108319686A (en) * 2018-02-01 2018-07-24 北京大学深圳研究生院 Antagonism cross-media retrieval method based on limited text space
JP2019178949A (en) * 2018-03-30 2019-10-17 株式会社 Ngr Image generation method
CN108595636A (en) * 2018-04-25 2018-09-28 复旦大学 The image search method of cartographical sketching based on depth cross-module state correlation study
CN110110122A (en) * 2018-06-22 2019-08-09 北京交通大学 Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval
CN110457516A (en) * 2019-08-12 2019-11-15 桂林电子科技大学 A kind of cross-module state picture and text search method

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
GE SONG et al.: "Learning Multilevel Semantic Similarity for Large-Scale Multi-Label Image Retrieval"
YUXIN PENG et al.: "CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network", IEEE
ZHUO Yunkan; QI Jinwei; PENG Yuxin: "Cross-media deep fine-grained correlation learning method"
ZHUO Yunkan et al.: "Cross-media deep fine-grained correlation learning method", Journal of Software
ZHANG Chao; CHEN Ying: "Object detection based on hard sample mining with residual networks"
QI Jinwei et al.: "Hierarchical recurrent attention network model for cross-media retrieval", Journal of Image and Graphics
SHU Zhong: "Implementation of a deep-learning-based correction algorithm for image sample label assignment"
YI Yang; ZHOU Shaoguang; ZHAO Pengfei; HU Yiqun: "Remote sensing image classification method based on positive and unlabeled samples"

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688915A (en) * 2021-08-24 2021-11-23 北京玖安天下科技有限公司 Content security-oriented difficult sample mining method and device
CN113688915B (en) * 2021-08-24 2023-07-25 北京玖安天下科技有限公司 Difficult sample mining method and device for content security
CN115630178A (en) * 2022-11-14 2023-01-20 南京码极客科技有限公司 Cross-media retrieval method based on channel fine-grained semantic features

Also Published As

Publication number Publication date
CN111651660B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
US11436414B2 (en) Device and text representation method applied to sentence embedding
US11544474B2 (en) Generation of text from structured data
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN105279495B (en) A kind of video presentation method summarized based on deep learning and text
Xu et al. Remote sensing image scene classification based on generative adversarial networks
CN110765260A (en) Information recommendation method based on convolutional neural network and joint attention mechanism
CN105022754B (en) Object classification method and device based on social network
CN111444367B (en) Image title generation method based on global and local attention mechanism
US10685012B2 (en) Generating feature embeddings from a co-occurrence matrix
WO2020215683A1 (en) Semantic recognition method and apparatus based on convolutional neural network, and non-volatile readable storage medium and computer device
US20200342909A1 (en) Methods and systems of automatically generating video content from scripts/text
CN111599340A (en) Polyphone pronunciation prediction method and device and computer readable storage medium
CN111651660A (en) Method for cross-media retrieval of difficult samples
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
Lin et al. Ensemble making few-shot learning stronger
CN113076744A (en) Cultural relic knowledge relation extraction method based on convolutional neural network
CN116958868A (en) Method and device for determining similarity between text and video
CN111159370A (en) Short-session new problem generation method, storage medium and man-machine interaction device
CN116108181A (en) Client information processing method and device and electronic equipment
CN115934951A (en) Network hot topic user emotion prediction method
CN110489660A (en) A kind of user's economic situation portrait method of social media public data
CN116151258A (en) Text disambiguation method, electronic device and storage medium
Li et al. Short text sentiment analysis based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant