CN111651660B - Method for cross-media retrieval of difficult samples - Google Patents

Method for cross-media retrieval of difficult samples

Info

Publication number
CN111651660B
CN111651660B
Authority
CN
China
Prior art keywords
text
sample data
image
similarity
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010468272.XA
Other languages
Chinese (zh)
Other versions
CN111651660A (en)
Inventor
王春辉
胡勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Polar Intelligence Technology Co ltd
Original Assignee
Polar Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Polar Intelligence Technology Co ltd filed Critical Polar Intelligence Technology Co ltd
Priority to CN202010468272.XA priority Critical patent/CN111651660B/en
Publication of CN111651660A publication Critical patent/CN111651660A/en
Application granted granted Critical
Publication of CN111651660B publication Critical patent/CN111651660B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of natural language understanding and discloses a method for cross-media retrieval of difficult samples. The method comprises the following steps: calculating a fine-grained label that characterizes the correlation between the text in a text image pair and the text description of the image, and calculating the similarity of the text image pair based on the fine-grained label, thereby realizing cross-media retrieval of difficult samples. The method makes full use of the fact that text carries richer information than images: it mines the difficult samples in the training data, assigns them fine-grained labels according to their degree of difficulty, and calculates the similarity of text image pairs based on those labels, improving the accuracy of cross-media retrieval on difficult samples.

Description

Method for cross-media retrieval of difficult samples
Technical Field
The invention belongs to the technical field of natural language understanding, and particularly relates to a method for cross-media retrieval of difficult samples.
Background
With the rapid growth of internet technology and social media, data in various media forms has grown explosively, and internet users' demands on information retrieval keep increasing. Conventional single-media retrieval methods can no longer meet these demands: users want to query with media of one modality and retrieve results of several other media types. To meet this need, cross-media information retrieval technology is receiving increasing attention.
In 2004, Hardoon et al. first applied Canonical Correlation Analysis (CCA) to the cross-media information retrieval task. CCA is a linear mathematical model whose main purpose is to learn a subspace that maximizes the pairwise correlation of two sets of heterogeneous data. Given an image/text pair as input, CCA measures the similarity between the text and the image by mapping the image and text features into the maximally correlated subspace.
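As a hedged illustration of this prior-art baseline (not of the invention's method), the following sketch shows CCA-based cross-media retrieval with scikit-learn; the feature dimensions, the random toy data and the retrieval helper are assumptions for demonstration only.

```python
# Minimal CCA retrieval sketch (prior-art baseline); toy data and dimensions
# are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(500, 128))   # paired image features
txt_feats = rng.normal(size=(500, 64))    # paired text features

cca = CCA(n_components=16)
cca.fit(img_feats, txt_feats)             # learn the maximally correlated subspace

def retrieve_texts(query_img, gallery_txt, k=5):
    """Rank gallery texts for one image query by cosine similarity in the CCA subspace."""
    q, g = cca.transform(query_img[None, :], gallery_txt)
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return np.argsort(-(g @ q.T).ravel())[:k]

print(retrieve_texts(img_feats[0], txt_feats))
```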
In recent years, with the rapid development of deep learning, more and more cross-media information retrieval models based on deep neural networks have been proposed. Such models generally use neural networks to extract features from cross-media data, and because of their nonlinear mappings, deep learning models represent all kinds of complex media data well. DCCA (Deep CCA), for example, is a nonlinear extension of CCA for learning complex nonlinear transformations between two types of media data: it constructs a network with shared layers for the data of the different media types, comprising two sub-networks whose outputs are maximally correlated through learning. The original dataset consists of positive pairs, i.e., text/image pairs expressing the same semantic concept. To provide the negative examples required for model training, common practice is to randomly combine images and texts of different semantic concepts into negative image/text pairs. This way of constructing the dataset creates an unavoidable problem for model training: the randomly combined negative samples contain a large number of simple samples that the model detects accurately with ease, and such samples contribute little to training. At the same time, a dataset always contains some positive and negative samples that are prone to misclassification; such samples are called difficult samples. During training, the influence of the few easily misclassified difficult samples is drowned out by the large number of simple samples, so the model cannot converge to a better result and falls into a local optimum.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention proposes a method for cross-media retrieval of difficult samples.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a method of cross-media retrieval of difficult samples, comprising the steps of:
step 1, calculating a fine-grained label that characterizes the correlation between the text in a text image pair and the text description of the image;
step 1.1, randomly selecting texts and images belonging to the same semantic category from the original dataset D of text image pairs to form a positive sample dataset P = {(T_j^P, I_j^P)}, j = 1, ..., J, and randomly selecting texts and images belonging to different semantic categories from D to form a negative sample dataset E = {(T_k^E, I_k^E)}, k = 1, ..., K; wherein D = {(T_i^D, I_i^D)}, i = 1, ..., N, and the text and image of each pair in D share the same semantic category; N, J and K are the numbers of sample pairs in D, P and E respectively, and K = J;
step 1.2, extracting from D the text T_j^D corresponding to the image I_j^P to compose the positive text pair (T_j^P, T_j^D), and extracting from D the text T_k^D corresponding to the image I_k^E to compose the negative text pair (T_k^E, T_k^D); calculating the similarity s_j^+ of T_j^P and T_j^D, and the similarity s_k^- of T_k^E and T_k^D;
step 1.3, calculating the fine-grained label of every text image pair in the positive sample dataset P and the negative sample dataset E: the label l_j^+ of a positive pair is computed from s_j^+ by formula (1), and the label l_k^- of a negative pair is computed from s_k^- by formula (2); both formulas map a similarity to a label in [0, 1];
step 2, calculating the similarity of the text image pair based on the fine-grained labels;
step 2.1, extracting the text feature v_T of the input text T using the graph convolution model GCN (Graph Convolutional Network);
step 2.2, extracting the image feature v_I of the input image I using the convolutional neural network model CNN (Convolutional Neural Network);
step 2.3, based on v_T and v_I, constructing the positive sample dataset {(T_n^+, I_n^+)}, n = 1, ..., Q_1, and the negative sample dataset {(T_n^-, I_n^-)}, n = 1, ..., Q_2, where Q_1 and Q_2 are the numbers of sample pairs in the positive and negative sample datasets respectively; calculating the similarity of each text image pair in the two datasets and correcting it with the fine-grained labels according to formulas (3) and (4), where the corrected similarity is denoted ŝ, β is the set influence coefficient of the fine-grained label on the similarity, l^+ is calculated according to formula (1), and l^- according to formula (2); formula (3) lowers the similarity of a positive pair and formula (4) raises the similarity of a negative pair, the more so the more difficult the sample.
Compared with the prior art, the invention has the following beneficial effects:
according to the method, the device and the system, the fine granularity labels which characterize the correlation between the texts in the text image pairs and the text descriptions of the images are calculated, and the similarity of the text image pairs is calculated based on the fine granularity labels, so that the cross-media retrieval of difficult samples is realized. The method and the device fully utilize the characteristic that text information contains richer information compared with image information, fully mine difficult samples in training data, allocate fine granularity labels for the difficult samples according to the difficulty degree, calculate the similarity of text image pairs based on the fine granularity labels, and improve the accuracy of cross-media retrieval difficult samples.
Drawings
Fig. 1 is a schematic diagram of text image pair similarity distribution curves, with the horizontal axis representing similarity and the vertical axis representing the number of sample pairs.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The embodiment of the invention discloses a method for cross-media retrieval of difficult samples, which comprises the following steps:
S101, calculating a fine-grained label that characterizes the correlation between the text in a text image pair and the text description of the image.
S1011, randomly selecting texts and images belonging to the same semantic category from the original dataset D of text image pairs to form a positive sample dataset P = {(T_j^P, I_j^P)}, j = 1, ..., J, and randomly selecting texts and images belonging to different semantic categories from D to form a negative sample dataset E = {(T_k^E, I_k^E)}, k = 1, ..., K; wherein D = {(T_i^D, I_i^D)}, i = 1, ..., N, the text and image of each pair in D share the same semantic category, N, J and K are the numbers of sample pairs in D, P and E respectively, and K = J.
S1012, extracting from D the text T_j^D corresponding to the image I_j^P to compose the positive text pair (T_j^P, T_j^D), and extracting from D the text T_k^D corresponding to the image I_k^E to compose the negative text pair (T_k^E, T_k^D); calculating the similarity s_j^+ of T_j^P and T_j^D, and the similarity s_k^- of T_k^E and T_k^D.
S1013, calculating the fine-grained label of every text image pair in the positive sample dataset P and the negative sample dataset E: the label l_j^+ of a positive pair is computed from s_j^+ by formula (1), and the label l_k^- of a negative pair is computed from s_k^- by formula (2).
S102, calculating the similarity of the text image pair based on the fine-grained labels.
S1021, extracting the text feature v_T of the input text T using the graph convolution model GCN.
S1022, extracting the image feature v_I of the input image I using the convolutional neural network model CNN.
S1023, based on v_T and v_I, constructing the positive sample dataset {(T_n^+, I_n^+)}, n = 1, ..., Q_1, and the negative sample dataset {(T_n^-, I_n^-)}, n = 1, ..., Q_2, where Q_1 and Q_2 are the numbers of sample pairs in the positive and negative sample datasets respectively; calculating the similarity of each text image pair in the two datasets and correcting it with the fine-grained labels according to formulas (3) and (4), where the corrected similarity is denoted ŝ, β is the set influence coefficient of the fine-grained label on the similarity, l^+ is calculated according to formula (1), and l^- according to formula (2).
The implementation of this embodiment is divided into two stages. The first stage calculates the fine-grained labels from text similarity (step S101); the second stage performs cross-modal information retrieval based on those labels (step S102). The main objective of the first stage is to measure the correlation between the text in a text image pair and the original text description of the image. Text descriptions typically contain richer and more specific information than images, so this embodiment uses the original text description of an image to represent the image's semantics and judges the difficulty of a sample by calculating the similarity between that original text and the text in the text image pair. For positive samples, the smaller the similarity, the more difficult the sample; for negative samples, the greater the similarity, the more difficult the sample.
Step S101 specifically includes S1011 to S1013.
Step S1011 builds a positive sample data set P and a negative sample data set E based on the original data set D.
Step S1012 extracts the positive and negative text pairs based on D, P and E, and calculates the similarity of each positive text pair and each negative text pair; cosine similarity is used.
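A minimal sketch of this step is given below; the sentence-embedding function embed is an assumed placeholder, since the patent does not specify how the texts are vectorized for this comparison.

```python
# Sketch of step S1012: cosine similarity between the text of each sampled
# pair and the original description of that pair's image. `embed` is an
# assumed placeholder for any sentence-embedding function.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def text_pair_similarities(pair_texts, original_texts, embed):
    """pair_texts[i] is the text of the i-th sampled pair; original_texts[i]
    is the original description (looked up in D) of that pair's image."""
    return [cosine(embed(t), embed(o)) for t, o in zip(pair_texts, original_texts)]
```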
Step S1013 calculates the fine-grained label of every text image pair in the positive sample dataset P and the negative sample dataset E from these similarities, according to formulas (1) and (2). As formulas (1) and (2) show, the fine-grained label has a maximum value of 1 and a minimum value of 0.
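The patent gives formulas (1) and (2) only as figures, so the sketch below uses an assumed min-max normalization, chosen solely because it satisfies the stated properties: the label lies in [0, 1] and tracks the text-pair similarity.

```python
# Hedged sketch of step S1013; the min-max mapping is an ASSUMED stand-in for
# formulas (1) and (2), respecting their stated range of [0, 1].
import numpy as np

def fine_grained_labels(sims: np.ndarray) -> np.ndarray:
    """Map text-pair similarities to fine-grained labels in [0, 1]."""
    lo, hi = sims.min(), sims.max()
    return (sims - lo) / (hi - lo + 1e-12)

pos_labels = fine_grained_labels(np.asarray([0.9, 0.4, 0.7]))  # positive pairs
neg_labels = fine_grained_labels(np.asarray([0.1, 0.6, 0.3]))  # negative pairs
```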
Step S102 specifically includes S1021-S1023.
Step S1021 extracts the text features of the input text T using the graph convolution model GCN. The GCN extends the convolution operation to graph-structured data, giving it a strong ability to learn local and stationary features of graphs, and it is widely used in text classification tasks; recent research has demonstrated its powerful text semantic modeling and text classification capabilities. In this embodiment, the GCN comprises two convolutions, each followed by a ReLU; the text features are then mapped into the underlying shared semantic space through a fully connected layer.
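The sketch below illustrates one way such an encoder could look in PyTorch, assuming word-node features and a normalized adjacency matrix are already built; the dimensions and the mean pooling are assumptions, not the patent's specification.

```python
# Sketch of the step S1021 text encoder: two graph convolutions, each followed
# by ReLU, then a fully connected layer into the shared semantic space.
# Dimensions, mean pooling and the word-graph construction are assumptions.
import torch
import torch.nn as nn

class GCNTextEncoder(nn.Module):
    def __init__(self, in_dim=300, hid_dim=256, shared_dim=128):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, hid_dim, bias=False)
        self.fc = nn.Linear(hid_dim, shared_dim)

    def forward(self, x, a_hat):
        # x: (num_nodes, in_dim) word-node features
        # a_hat: (num_nodes, num_nodes) normalized adjacency with self-loops
        h = torch.relu(a_hat @ self.w1(x))  # first convolution + ReLU
        h = torch.relu(a_hat @ self.w2(h))  # second convolution + ReLU
        return self.fc(h.mean(dim=0))       # pool nodes, map to shared space
```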
Step S1022 extracts the image features of the input image I using the convolutional neural network model CNN, a common model for extracting image features. A pre-trained VGG-19 may also be used: for a given 224×224 image, the 4096-dimensional vector output by the penultimate layer of VGG-19, the FC7 layer, is selected and then mapped into the underlying shared semantic space through a fully connected layer.
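A corresponding image-encoder sketch is shown below; the torchvision weights identifier and the 128-dimensional shared space are assumptions.

```python
# Sketch of step S1022: take the 4096-dimensional FC7 output of a pre-trained
# VGG-19 and project it into the shared semantic space.
import torch
import torch.nn as nn
from torchvision import models

class VGGImageEncoder(nn.Module):
    def __init__(self, shared_dim=128):
        super().__init__()
        vgg = models.vgg19(weights="IMAGENET1K_V1")
        # keep the classifier up to FC7 (the penultimate fully connected layer)
        vgg.classifier = nn.Sequential(*list(vgg.classifier.children())[:5])
        self.backbone = vgg
        self.fc = nn.Linear(4096, shared_dim)

    def forward(self, img):            # img: (B, 3, 224, 224), normalized
        with torch.no_grad():          # backbone kept frozen in this sketch
            feat = self.backbone(img)  # (B, 4096) FC7 features
        return self.fc(feat)
```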
Step S1023 constructs a positive sample dataset and a negative sample dataset based on the text features and image features extracted in the previous steps, calculates the similarity of each text image pair in the two datasets, and corrects it using the fine-grained labels.
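The patent shows the correction formulas (3) and (4) only as figures; the forms below are therefore assumptions, chosen to match the behavior described later (positive similarities decrease, negative similarities increase, and difficult samples change more).

```python
# Hedged sketch of the step S1023 correction. The two expressions are ASSUMED
# stand-ins for formulas (3) and (4): with labels in [0, 1], a difficult
# positive pair (small label) is lowered more, and a difficult negative pair
# (large label) is raised more.
import numpy as np

def correct_similarity(sim, label, beta=0.1, positive=True):
    """sim: raw image-text similarity; label: fine-grained label in [0, 1];
    beta: influence coefficient of the label on the similarity."""
    if positive:
        return sim - beta * (1.0 - label)  # assumed form of formula (3)
    return sim + beta * label              # assumed form of formula (4)
```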
As an alternative embodiment, the model learning Loss function Loss is:
Loss = (σ₊² + σ₋²) + λ·max(0, m − (μ₊ − μ₋))  (5)
μ₊ = (1/Q₁) Σ_{n=1..Q₁} ŝ(T_n^+, I_n^+)  (6)
σ₊² = (1/Q₁) Σ_{n=1..Q₁} (ŝ(T_n^+, I_n^+) − μ₊)²  (7)
μ₋ = (1/Q₂) Σ_{n=1..Q₂} ŝ(T_n^-, I_n^-)  (8)
σ₋² = (1/Q₂) Σ_{n=1..Q₂} (ŝ(T_n^-, I_n^-) − μ₋)²  (9)
where μ₊ and σ₊² are the mean and variance of the corrected similarities of the positive sample pairs, μ₋ and σ₋² are the mean and variance of the corrected similarities of the negative sample pairs, λ is a set scaling factor that balances the variance terms against the mean terms, and m is a set margin for (μ₊ − μ₋).
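A direct sketch of this loss, using the mean/variance reading of formulas (6) to (9), might look as follows; the values of lam and m are illustrative.

```python
# Sketch of the loss in formulas (5)-(9): the variances of the corrected
# similarities plus a hinge on the gap between their means.
import torch

def hard_sample_loss(s_pos, s_neg, lam=1.0, m=0.5):
    """s_pos, s_neg: 1-D tensors of corrected similarities of the positive
    and negative text image pairs."""
    mu_pos, var_pos = s_pos.mean(), s_pos.var(unbiased=False)  # (6), (7)
    mu_neg, var_neg = s_neg.mean(), s_neg.var(unbiased=False)  # (8), (9)
    hinge = torch.clamp(m - (mu_pos - mu_neg), min=0.0)
    return (var_pos + var_neg) + lam * hinge                   # (5)
```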
In this embodiment, in order to reduce the model's misrecognition of difficult samples and make the neural network model converge to a better result, the loss function is improved as in formulas (5) to (9), the similarity it uses being the value corrected by the fine-grained labels. The left curve in Fig. 1 represents the similarity distribution of text image pairs of different semantic categories, the right curve the similarity distribution of text image pairs of the same semantic category, and the size of the shaded area reflects the false detection rate. According to formula (5), minimizing the loss function maximizes μ₊ and minimizes μ₋, σ₋² and σ₊². As Fig. 1 makes apparent, the smaller μ₋, σ₋² and σ₊² are and the larger μ₊ is, the smaller the shaded area; minimizing the loss function therefore minimizes the shaded area and reduces the false detection rate. According to formula (4), after fine-grained label correction the similarity of a negative sample pair increases: a simple negative sample increases little and a difficult negative sample increases more, so the penalty on difficult negative samples grows during learning, which corresponds to shifting the left curve in Fig. 1 to the right. Similarly, according to formula (3), the similarity of a positive sample pair decreases: a simple positive sample decreases little and a difficult positive sample decreases more, so the penalty on difficult positive samples grows, which corresponds to shifting the right curve in Fig. 1 to the left. Shifting the left curve right and the right curve left enlarges the shaded area, so minimizing it during learning pays more attention to difficult samples and drives the model to converge to a better result.
In order to verify the effectiveness of the invention, a set of experimental data is presented below. The experiment employed three datasets, English-Wiki, TVGraz and Chinese-Wiki, containing 2866, 2360 and 3103 text image pairs respectively. Cross-media retrieval was performed on the three datasets with the proposed method and the existing GIN model. The biggest difference between the invention and GIN is the added mining of difficult samples and the assignment of fine-grained labels to samples of different difficulty, with the fine-grained labels entering the computation of the loss function to strengthen the influence of difficult samples on model learning. The experimental results are shown in Table 1.
Table 1 experimental results
As can be seen from Table 1, the accuracy of the proposed method is significantly better than that of the other models, improving on GIN by about 4%, 3% and 10% on English-Wiki, TVGraz and Chinese-Wiki respectively. This indicates that the sample-difficulty information carried by the fine-grained labels helps improve the performance of existing models on cross-media information retrieval tasks. It also demonstrates the effectiveness of the method in assigning fine-grained labels: introducing them makes the model's learning pay more attention to difficult samples, further improving retrieval performance.
The foregoing description of the embodiments is not intended to limit the scope of the invention; any modification, equivalent substitution or improvement made within the spirit and principle of the invention shall fall within the scope of the invention.

Claims (2)

1. A method for cross-media retrieval of difficult samples, comprising the following steps:
step 1, calculating a fine-grained label that characterizes the correlation between the text in a text image pair and the text description of the image;
step 1.1, randomly selecting texts and images belonging to the same semantic category from the original dataset D of text image pairs to form a positive sample dataset P = {(T_j^P, I_j^P)}, j = 1, ..., J, and randomly selecting texts and images belonging to different semantic categories from D to form a negative sample dataset E = {(T_k^E, I_k^E)}, k = 1, ..., K; wherein D = {(T_i^D, I_i^D)}, i = 1, ..., N, the text and image of each pair in D share the same semantic category, N, J and K are the numbers of sample pairs in D, P and E respectively, and K = J; T_j^P is the text and I_j^P the image in the j-th positive sample pair, T_k^E is the text and I_k^E the image in the k-th negative sample pair, and T_i^D is the text and I_i^D the image in the i-th original pair;
step 1.2, extracting from D the text T_j^D corresponding to the image I_j^P to compose the positive text pair (T_j^P, T_j^D), and extracting from D the text T_k^D corresponding to the image I_k^E to compose the negative text pair (T_k^E, T_k^D); calculating the similarity s_j^+ of T_j^P and T_j^D, and the similarity s_k^- of T_k^E and T_k^D;
step 1.3, calculating the fine-grained label of every text image pair in the positive sample dataset P and the negative sample dataset E: the label l_j^+ of a positive pair is computed from s_j^+ by formula (1), and the label l_k^- of a negative pair is computed from s_k^- by formula (2);
step 2, calculating the similarity of the text image pair based on the fine-grained labels;
step 2.1, extracting the text feature v_T of the input text T using the graph convolution model GCN;
step 2.2, extracting the image feature v_I of the input image I using the convolutional neural network model CNN;
step 2.3, based on v_T and v_I, constructing the positive sample dataset {(T_n^+, I_n^+)}, n = 1, ..., Q_1, and the negative sample dataset {(T_n^-, I_n^-)}, n = 1, ..., Q_2, where Q_1 and Q_2 are the numbers of sample pairs in the positive and negative sample datasets respectively, T_n^+ is the text and I_n^+ the image in the n-th positive sample pair, and T_n^- is the text and I_n^- the image in the n-th negative sample pair; calculating the similarity of each text image pair in the two datasets and correcting it with the fine-grained labels according to formulas (3) and (4), where the corrected similarity is denoted ŝ, β is the set influence coefficient of the fine-grained label on the similarity, l^+ is calculated according to formula (1), and l^- according to formula (2).
2. The method of claim 1, wherein the model learning Loss function Loss of the cross-media information retrieval model based on the deep neural network is:
Loss = (σ₊² + σ₋²) + λ·max(0, m − (μ₊ − μ₋))  (5)
μ₊ = (1/Q₁) Σ_{n=1..Q₁} ŝ(T_n^+, I_n^+)  (6)
σ₊² = (1/Q₁) Σ_{n=1..Q₁} (ŝ(T_n^+, I_n^+) − μ₊)²  (7)
μ₋ = (1/Q₂) Σ_{n=1..Q₂} ŝ(T_n^-, I_n^-)  (8)
σ₋² = (1/Q₂) Σ_{n=1..Q₂} (ŝ(T_n^-, I_n^-) − μ₋)²  (9)
wherein μ₊ and σ₊² are the mean and variance of the corrected similarities of the positive sample pairs, μ₋ and σ₋² are the mean and variance of the corrected similarities of the negative sample pairs, λ is a set scaling factor that balances the variance terms against the mean terms, and m is a set margin for (μ₊ − μ₋).
CN202010468272.XA 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples Active CN111651660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010468272.XA CN111651660B (en) 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010468272.XA CN111651660B (en) 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples

Publications (2)

Publication Number Publication Date
CN111651660A CN111651660A (en) 2020-09-11
CN111651660B true CN111651660B (en) 2023-05-02

Family

ID=72347038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010468272.XA Active CN111651660B (en) 2020-05-28 2020-05-28 Method for cross-media retrieval of difficult samples

Country Status (1)

Country Link
CN (1) CN111651660B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688915B (en) * 2021-08-24 2023-07-25 北京玖安天下科技有限公司 Difficult sample mining method and device for content security
CN115630178A (en) * 2022-11-14 2023-01-20 南京码极客科技有限公司 Cross-media retrieval method based on channel fine-grained semantic features

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018025949A (en) * 2016-08-09 2018-02-15 日本電信電話株式会社 Learning device, image search device, method, and program
CN108319686A (en) * 2018-02-01 2018-07-24 北京大学深圳研究生院 Antagonism cross-media retrieval method based on limited text space
CN108595636A (en) * 2018-04-25 2018-09-28 复旦大学 The image search method of cartographical sketching based on depth cross-module state correlation study
CN110110122A (en) * 2018-06-22 2019-08-09 北京交通大学 Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8671069B2 (en) * 2008-12-22 2014-03-11 The Trustees Of Columbia University, In The City Of New York Rapid image annotation via brain state decoding and visual pattern mining
CN105701227B (en) * 2016-01-15 2019-02-01 北京大学 A kind of across media method for measuring similarity and search method based on local association figure
CN106095893B (en) * 2016-06-06 2018-11-20 北京大学深圳研究生院 A kind of cross-media retrieval method
CN108399414B (en) * 2017-02-08 2021-06-01 南京航空航天大学 Sample selection method and device applied to cross-modal data retrieval field
JP2019178949A (en) * 2018-03-30 2019-10-17 株式会社 Ngr Image generation method
CN110457516A (en) * 2019-08-12 2019-11-15 桂林电子科技大学 A kind of cross-module state picture and text search method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018025949A (en) * 2016-08-09 2018-02-15 日本電信電話株式会社 Learning device, image search device, method, and program
CN108319686A (en) * 2018-02-01 2018-07-24 北京大学深圳研究生院 Antagonism cross-media retrieval method based on limited text space
CN108595636A (en) * 2018-04-25 2018-09-28 复旦大学 The image search method of cartographical sketching based on depth cross-module state correlation study
CN110110122A (en) * 2018-06-22 2019-08-09 北京交通大学 Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Ge Song et al. Learning Multilevel Semantic Similarity for Large-Scale Multi-Label Image Retrieval. ACM, 2018, pp. 64-72. *
卓昀侃; 綦金玮; 彭宇新. Cross-media deep fine-grained correlation learning method. Journal of Software (软件学报), 2019, (04), pp. 884-895. *
张超; 陈莹. Object detection based on difficult-sample mining with residual networks. Laser & Optoelectronics Progress (激光与光电子学进展), 2018, (10), pp. 111-117. *
舒忠. Implementation of a deep-learning-based algorithm for correcting image sample label assignment. Digital Printing (数字印刷), 2019, (Z1), pp. 38-45, 73. *
裔阳; 周绍光; 赵鹏飞; 胡屹群. Remote sensing image classification method based on positive and unlabeled samples. Computer Engineering and Applications (计算机工程与应用), 2017, (04), pp. 160-166, 230. *

Also Published As

Publication number Publication date
CN111651660A (en) 2020-09-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant