CN110647904A - Cross-modal retrieval method and system based on unmarked data migration - Google Patents

Cross-modal retrieval method and system based on unmarked data migration

Info

Publication number
CN110647904A
Authority
CN
China
Prior art keywords
modal
cross
loss
data
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910707010.1A
Other languages
Chinese (zh)
Other versions
CN110647904B (en)
Inventor
朱福庆
王雪如
张卫博
戴娇
虎嵩林
韩冀中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN201910707010.1A priority Critical patent/CN110647904B/en
Publication of CN110647904A publication Critical patent/CN110647904A/en
Application granted granted Critical
Publication of CN110647904B publication Critical patent/CN110647904B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23211Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters

Abstract

The invention provides a cross-modal retrieval method and system based on unlabeled data migration. The invention effectively addresses the small scale of cross-modal datasets and better matches the practical situation in which user queries fall outside the predefined category range; at the same time, it better extracts the high-level semantic information of data in different modalities, overcomes the heterogeneity gap between modalities, increases inter-modal similarity, and improves cross-modal retrieval accuracy.

Description

Cross-modal retrieval method and system based on unmarked data migration
Technical Field
The invention relates to the technical field of cross-modal data retrieval, and in particular to a cross-modal retrieval method and system based on unlabeled data migration.
Background
Data of different modalities, such as images and text, are ubiquitous on the internet and show a trend of fusing with one another. The cross-modal retrieval task attempts to break the boundaries between modalities and retrieve information across them: given a query sample in one modality, it retrieves semantically similar samples in other modalities. It is widely applied in search engines and big-data management. Existing cross-modal retrieval methods map the feature representations of different modalities into a common space to learn a unified representation, and measure similarity by computing the distance between the corresponding unified representations. However, owing to the heterogeneity of different modalities, their data distributions and representations are inconsistent, semantic association is difficult to establish, and cross-modal similarity remains hard to measure.
Although the internet contains abundant image and text data, most of it is unlabeled and hard to exploit, even though it carries rich semantic information. On the one hand, data annotation is expensive; on the other hand, internet content is constantly updated, and every new trending event brings large amounts of images and texts of new categories, so data of all categories cannot be annotated. Making full use of unlabeled data is therefore a major challenge for traditional cross-modal retrieval methods.
For the above reasons, in real scenarios the query submitted by a user does not necessarily fall within the predefined category range, and the training set and test set sometimes do not share the same categories. Existing cross-modal retrieval methods generally address only the case where training data and test data share the same categories (non-extensible cross-modal retrieval). Constructing a better cross-modal common space, so that for any input sample, whether its category is known or unknown, the related multi-modal data can be retrieved, is of great significance in practical applications.
Disclosure of Invention
In order to solve the problems of data heterogeneity across modalities, the abundance of unlabeled data, insufficient training data, and the lack of extensibility, the invention provides a cross-modal retrieval method and system based on unlabeled data migration.
The technical scheme of the invention is as follows:
a cross-modal retrieval method based on unmarked data migration comprises the following steps:
inputting a sample to be retrieved into a trained cross-modal data retrieval model to obtain its feature representation;
calculating the Euclidean distances between each sample to be retrieved and all samples of the other modality and then sorting, wherein the samples of the other modality whose distance is smaller than a specified threshold are the retrieval results;
the training process of the cross-modal data retrieval model is as follows:
(1) setting pseudo labels for the unmarked images and the texts respectively by a clustering method;
(2) respectively migrating the knowledge contained in the pseudo-labeled unlabeled images and texts to the image and text parts of the cross-modal dataset, and learning separate representations of the images and texts of the cross-modal dataset;
(3) feeding the separate representations of the images and texts into the same network, and learning a common representation of the images and texts in the same semantic space.
Further, the method for determining the threshold is as follows: during training, the value of Loss_cross-modal is the distance between paired images and texts; according to this loss value, 10-20 initial thresholds are set, and the retrieval mAP value (mAP, mean average precision, measures how well the learned model performs over all queries; AP, average precision, measures how well it performs on a single query) is calculated under each threshold; the threshold that maximizes the mAP value is the retrieval threshold. Here Loss_cross-modal is the loss function of cross-modal knowledge:

$$Loss_{cross\text{-}modal}=\sum_{l\in\{6,7\}}\sum_{p=1}^{n_l}\left\|g\!\left(i_p^{(l)}\right)-g\!\left(t_p^{(l)}\right)\right\|_2^2$$

where l6 and l7 refer to the two fully connected layers connected to the images and texts of the cross-modal dataset, n_l is the number of input image-text pairs, (i_p^(l), t_p^(l)) denotes the p-th image-text pair at layer l, and g(·) maps the images and texts into feature vectors.
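As a rough illustration, the threshold search could be sketched as follows; the relevance criterion (shared category label), the candidate range around the observed loss value, and the helper names are assumptions, since the text specifies only that 10-20 candidate thresholds are derived from the Loss_cross-modal value and the one maximizing mAP is kept.

```python
import numpy as np

def average_precision(ranked_relevance: np.ndarray) -> float:
    """AP for one query; ranked_relevance is a 0/1 vector in ranked order."""
    hits = np.cumsum(ranked_relevance)
    if hits[-1] == 0:
        return 0.0
    precision_at_k = hits / (np.arange(len(ranked_relevance)) + 1)
    return float((precision_at_k * ranked_relevance).sum() / hits[-1])

def mean_average_precision(dist, query_labels, gallery_labels, threshold):
    """mAP over all queries, keeping only gallery items closer than `threshold`."""
    aps = []
    for q in range(dist.shape[0]):
        keep = dist[q] < threshold                    # candidate results
        order = np.argsort(dist[q][keep])             # rank by distance
        relevant = (gallery_labels[keep][order] == query_labels[q]).astype(int)
        if relevant.size:
            aps.append(average_precision(relevant))
    return float(np.mean(aps)) if aps else 0.0

def select_threshold(dist, query_labels, gallery_labels, base_loss, n_candidates=15):
    """Scan 10-20 candidates derived from the observed Loss_cross-modal value
    (`base_loss`) and keep the threshold with the highest mAP."""
    candidates = np.linspace(0.5 * base_loss, 2.0 * base_loss, n_candidates)
    scores = [mean_average_precision(dist, query_labels, gallery_labels, t)
              for t in candidates]
    return candidates[int(np.argmax(scores))]
```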
A cross-modal retrieval system based on unmarked data migration, comprising:
the system comprises an unlabeled-data clustering module, a data migration module, and a common space learning module; a migration dataset is constructed by the unlabeled-data clustering module and serves as the migration source domain of the data migration module; finally, the common space learning module learns a unified representation of the images and texts produced by the data migration module and establishes a similarity-measurement basis for cross-modal data, thereby realizing cross-modal retrieval.
Further, the unlabeled-data clustering module comprises an image clustering submodule and a text clustering submodule. The module extracts the features of all unlabeled images/texts and then performs unsupervised clustering to obtain a set of cluster centers; the image/text samples under the same cluster center are grouped into one class and assigned the same label, which completes the construction of the migration dataset.
Further, the data migration module comprises an image migration submodule and a text migration submodule, and migration occurs only within the same submodule. For each submodule, the migration source domain is the unlabeled data of the corresponding modality after clustering, and the target domain is the data of the corresponding modality in the cross-modal dataset. Transfer learning is achieved by minimizing the distribution loss between the source domain and the target domain. The cross-modal dataset is input in pairs belonging to the same category, so the representations finally generated should be similar; by minimizing the pairwise Euclidean distance between the two modalities, images and texts with the same semantic information are drawn as close as possible and those with different semantics pushed as far apart as possible, independent of modality.
Furthermore, the common space learning module feeds the separate representations of images and texts obtained by the data migration module into the same network to learn a unified representation of data from different modalities. The network comprises several shared fully connected layers; word embedding vectors of the cross-modal dataset categories are added to the network, which increases the semantic association among different modalities and further strengthens the semantic information.
The method has the beneficial effects that:
According to the method, a large number of unlabeled single-modal datasets are clustered and assigned pseudo labels, and the clustered unlabeled data are migrated to the cross-modal dataset, which effectively mitigates the small scale of cross-modal datasets and better matches the case where actual user queries fall outside the predefined category range. The method better extracts the high-level semantic information of data in different modalities, overcomes the heterogeneity gap between modalities, increases inter-modal similarity, and improves cross-modal retrieval accuracy. It achieves good results both on public datasets and in practical applications.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a data migration flow diagram;
FIG. 3 is a flow diagram of a feature extraction system.
Detailed Description
This section mainly introduces the modeling of the transfer-learning-based cross-modal retrieval network, unlabeled-data clustering, data migration, common-representation learning, and the testing process.
The method will be further described with reference to the accompanying drawings.
Modeling of a cross-modal retrieval network based on transfer learning:
Clustering of unlabeled data: given an unlabeled dataset S, an image clustering algorithm C_i clusters the unlabeled images S_i into k_i classes, and a text clustering algorithm C_t clusters the unlabeled texts S_t into k_t classes; within each class, all images and texts under the same cluster center are marked with the same pseudo label y_i. A transfer learning algorithm T then migrates the clustered unlabeled dataset S to the cross-modal dataset D, and joint training generates separate vector representations R_i, R_t for the images and texts of the cross-modal dataset. Finally, the separate representations R_i, R_t and the word embedding vector V of the categories are fed into the same fully connected network F, which generates a common representation R of the images and texts in the same space. Wherein:
Unlabeled dataset S = {S_i, S_t}: the source domain of transfer learning, where S_i is the unlabeled image dataset and S_t is the unlabeled text dataset.
Cross-modal dataset D = {D_i, D_t}: D_i and D_t are the images and texts of the cross-modal dataset; the images and texts are input in pairs and are correlated, and for each image/text pair the image and text come from the same article, or the text is a description of the image.
Word embedding vector V: all known categories of the cross-modal dataset are converted into 300-dimensional word vectors by the Word2vec model.
Text input: a text is a description of an image and may be an article, a paragraph, a sentence, a word, etc. Text vectors are extracted with Bert and have 768 dimensions.
Image input: in this network, the input of the image branch is the original 224 x 224 image.
Clustering algorithms C = {C_i, C_t}: C_i is the image clustering algorithm and C_t is the text clustering algorithm.
Numbers of clusters k_i, k_t: obtained through experience and repeated calculation.
Migration algorithm T: an algorithm that acquires knowledge from a source domain to promote a target task, where the source domain differs from the target domain or the source task differs from the target task.
Common representation vector R: the finally generated vector representation of the images and texts.
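As a point of reference, the text and category inputs above might be produced as in the following sketch; the specific checkpoints (bert-base-uncased, the GoogleNews Word2vec file) and the use of the CLS token as the sentence vector are assumptions, the text only fixing the dimensionalities (768 for Bert text vectors, 300 for Word2vec category vectors).

```python
import numpy as np
import torch
from transformers import BertModel, BertTokenizer
from gensim.models import KeyedVectors

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def text_vector(text: str) -> np.ndarray:
    """768-dimensional Bert text vector (CLS-token pooling, an assumption)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = bert(**inputs)
    return out.last_hidden_state[0, 0].numpy()        # shape (768,)

# 300-dimensional Word2vec category vectors; the pretrained file below is
# a hypothetical choice.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def category_vector(category: str) -> np.ndarray:
    return w2v[category]                              # shape (300,)
```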
A label-free data clustering module:
For unlabeled images, which contain rich semantic information, a pre-trained VGG network is first used to extract a feature vector for each image, and the images are then clustered with the KMeans method. The specific procedure is as follows: set the initial number of cluster centers (namely k_i) according to the number and distribution of the unlabeled images, and randomly select k_i images as the initial cluster centers. Traverse all images, assign each image to the nearest cluster center, update the mean of each cluster as the new cluster center, and iterate until no cluster changes any more or the maximum number of iterations is reached. All samples of the same cluster are grouped into one class and assigned the same label, which builds the source-domain dataset for image migration.
For the unlabeled text, firstly using Bert to extract the characteristics of each text, then adopting the same unsupervised clustering method as the images to classify similar texts into the same cluster and marking the same labels for constructing a source domain data set of text migration.
Method for determining a suitable number of cluster centers: set initial values of k from 5 to 15 according to the amount of unlabeled data, cluster for each value of k, and record the corresponding SSE (sum of squared errors: the sum of the distances between each sample point and its cluster center). As the number of clusters increases, the sample division becomes finer, the cohesion of each cluster gradually improves, and the SSE gradually decreases. When k is smaller than the optimal number of clusters, increasing k greatly increases the cohesion of each cluster, so the SSE drops sharply; once k reaches the optimal number, the cohesion gained by further increasing k falls off rapidly, so the decrease of the SSE slows abruptly and then flattens as k keeps growing. Plotting the relationship between k and the SSE, the point where the slope changes is the optimal value of k.
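The elbow search might look like the following sketch; the second-difference heuristic for locating the slope change is an assumption, the text only prescribing plotting k against SSE and reading off the elbow.

```python
import numpy as np
from sklearn.cluster import KMeans

features = np.random.rand(500, 768)       # dummy stand-in for extracted features

sse = {}
for k in range(5, 16):                    # initial k values of 5-15, as in the text
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    sse[k] = km.inertia_                  # SSE: squared distances to cluster centers

# Locate the "elbow": the k where the per-step SSE drop flattens the most
# (largest second difference), a simple stand-in for reading the plot.
ks = sorted(sse)
drops = [sse[ks[i]] - sse[ks[i + 1]] for i in range(len(ks) - 1)]
curvature = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
best_k = ks[int(np.argmax(curvature)) + 1]
```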
A data migration module:
The data migration module comprises two parts: single-modal knowledge migration and cross-modal knowledge sharing.
Single-modal migration refers to migrating the clustered unlabeled images to the image part of the cross-modal dataset and the clustered unlabeled texts to the text part. The module therefore comprises two single-modal migration submodules, one for images and one for texts.
Referring to fig. 2, for image migration, the migration source domain is the clustered unlabeled images and the target domain is the image part of the cross-modal data. The pictures of the source domain and the target domain are first fed into the network, pass through the first five convolutional layers of the AlexNet network, and then through three added fully connected layers fc6, fc7 and fc8, where the loss function of the source domain is the SoftMax loss. Knowledge migration for the image modality is achieved by minimizing the MMD (Maximum Mean Discrepancy, used to measure the difference between two different but related distributions) between the source domain and the target domain. Defining the distribution of the image target domain as X_i and the distribution of the source domain as Y_i, the migration loss of the image modality is:
$$Loss_{img}=\left\|\frac{1}{m}\sum_{a=1}^{m}f(y_a)-\frac{1}{n}\sum_{b=1}^{n}f(x_b)\right\|_{\mathcal{H}}^2$$

where the distance is measured by mapping the data into the reproducing kernel Hilbert space (RKHS) through f(·), y_a are samples of the source domain Y_i, x_b are samples of the target domain X_i, m is the number of samples of the source-domain data, and n is the number of samples of the target-domain data.
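A minimal sketch of an MMD penalty between source-domain and target-domain feature batches follows; the Gaussian (RBF) kernel and its bandwidth are assumptions, since the text names MMD but not a kernel.

```python
import torch

def rbf_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Gaussian kernel matrix between two batches of features."""
    squared_dist = torch.cdist(x, y) ** 2
    return torch.exp(-squared_dist / (2 * sigma ** 2))

def mmd_loss(source: torch.Tensor, target: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between source (m x d) and target (n x d) feature batches."""
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st
```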
The text migration process is similar to image migration: the migration source domain is the clustered unlabeled texts and the target domain is the text part of the cross-modal data. The text feature vectors of the source domain and the target domain are respectively extracted with the NLP model Bert released by GOOGLE and then passed through three fully connected layers fc6, fc7 and fc8, where the loss function of the source domain is the SoftMax loss and the migration loss is the MMD loss. Defining the distribution of the text target domain as X_t and the distribution of the source domain as Y_t, the migration loss of the text modality is:

$$Loss_{txt}=\left\|\frac{1}{m}\sum_{a=1}^{m}f(y_a)-\frac{1}{n}\sum_{b=1}^{n}f(x_b)\right\|_{\mathcal{H}}^2$$

where y_a are samples of the source domain Y_t and x_b are samples of the target domain X_t.
the goal of setting the cross-modal knowledge sharing layer is to fully utilize similar semantic information among different modalities, overcome the heterogeneity difference among the modalities, and no matter which modality the data comes from, as long as the data contains the same semantic information, the data should have similar feature vectors, contain different semantic information, and the distance of the feature vectors should be longer. The similarity of vectors is measured using Euclidean distances (fc6-img/fc6-txt and fc7-img/fc7txt), the Euclidean distance of their features should be as small as possible for each pair of similar images and text input. The loss function of cross-modal knowledge is:
$$Loss_{cross\text{-}modal}=\sum_{l\in\{6,7\}}\sum_{p=1}^{n_l}\left\|g\!\left(i_p^{(l)}\right)-g\!\left(t_p^{(l)}\right)\right\|_2^2$$

where l6 and l7 refer to the two fully connected layers connected to the images and texts of the cross-modal dataset, n_l is the number of input image-text pairs, (i_p^(l), t_p^(l)) denotes the p-th image-text pair at layer l, and g(·) maps the images and texts into feature vectors.
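A minimal sketch of this pairwise penalty; summing over fc6 and fc7 follows the text, while averaging over pairs (rather than summing) is an assumption.

```python
import torch

def cross_modal_loss(img_feats: dict, txt_feats: dict) -> torch.Tensor:
    """Mean squared Euclidean distance between paired image and text
    features, summed over the two shared layers; row p of each tensor
    holds the p-th image-text pair."""
    loss = torch.zeros(())
    for layer in ("fc6", "fc7"):
        diff = img_feats[layer] - txt_feats[layer]    # (n_pairs x d)
        loss = loss + (diff ** 2).sum(dim=1).mean()
    return loss
```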
After the two single-modal knowledge migration modules and the cross-modal knowledge sharing module, the model makes full use of unlabeled data, has stronger semantic discrimination ability, and generates a separate representation for each sample in the cross-modal dataset.
The final loss function of the migration module is:
$$Loss_{transfer}=Loss_{img}+Loss_{txt}+Loss_{cross\text{-}modal}$$
a common space learning module:
the cross-modal target domain internal semantic association also provides key semantic information for the construction of a cross-modal common space, and in order to further enhance the semantic correlation of image and text features, a common space learning module is further designed to enhance the correlation. The module is a simple and efficient structure comprising two fully connected layers and a common classification layer. Word embedding (word embedding) vectors of image features, text features and categories are introduced into the module, and since the parameters of fc8 and fc9 are shared by two modalities, the semantic relevance of different modalities can be guaranteed by using supervision information in a cross-modality target domain. Considering the labels of two paired modalities in the target domain, the correlation penalty is:
$$Loss_{common}=\sum_{p=1}^{n}\left[f_s\!\left(i_p,l_p\right)+f_s\!\left(t_p,l_p\right)\right]$$

where f_s is the SoftMax loss function, (i_p, t_p) is the p-th related image-text pair input, l_p is the category label of the pair, and n is the number of image-text pairs.
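A minimal sketch of this correlation loss with a shared classification layer; scoring both modalities with one cross-entropy criterion follows the description, and the summed reduction is an assumption.

```python
import torch
import torch.nn as nn

softmax_loss = nn.CrossEntropyLoss(reduction="sum")   # f_s, summed over pairs

def common_loss(img_logits: torch.Tensor, txt_logits: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
    """img_logits / txt_logits: (n_pairs x n_classes) outputs of the shared
    classification layer; labels: (n_pairs,) category indices l_p."""
    return softmax_loss(img_logits, labels) + softmax_loss(txt_logits, labels)
```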
The migration module and the common space learning module form one unified network structure; the two modules are trained together and promote each other. The overall network loss is therefore:

$$Loss=Loss_{transfer}+Loss_{common}$$
example (b):
the invention comprises a training system, a feature extraction system and a retrieval three parts: the three modules are combined to form the overall structure (figure 1) of the invention, and training data are transmitted into a training system for training and are stored to obtain a training model. The parameters of the feature extraction system (fig. 3) are the same as those of the training system, but structures such as data migration and word embedding of categories are not required, and the test set is transmitted to the feature extraction system to obtain vector representation of each sample of the test set. And during retrieval, calculating the distance between the sample to be retrieved and all samples in other modes, wherein the distance smaller than a specified threshold value is a retrieval result.
A training system:
as shown in fig. 1, the three modules (the unlabeled data clustering module, the data migration module, and the co-expression learning module) are combined to form a training system. The specific training steps are as follows:
1. Image source-domain preprocessing: for each image in the unlabeled image set, extract image features with a pre-trained VGG network, select k_i images as initial cluster centers, assign each image to the nearest cluster center, update the mean of each cluster as the new cluster center, and iterate until no cluster changes or the maximum number of iterations is reached. Group all samples of the same cluster into one class and assign them the same label l_i (l_i ranges from 0 to k_i - 1) to construct the migration dataset. Store the image path and the pseudo label in the same txt file, each line representing one image in the format "image path l_i".
2. Text source-domain preprocessing: for each text in the unlabeled text set, extract its features with Bert and set the number of clusters to k_t; then group similar texts into the same cluster with the same unsupervised clustering method as for images and mark them with the same label l_t (l_t ranges from 0 to k_t - 1). Store the text path and the pseudo label in the same txt file, each line representing one text in the format "text path l_t".
3. Cross-modal dataset preprocessing: the images and texts of the cross-modal dataset correspond one-to-one and are input in pairs. The images are stored in a txt document in the format "image path label", each line representing an image. The texts are first converted into vectors, and the vectors and category labels are stored in an lmdb file.
4. Fix the network learning rate, with a base learning rate of 0.01; iterate for 500 rounds and update the network parameters using a stochastic gradient descent algorithm.
5. Feed the image source domain, the text source domain, and the cross-modal dataset into the model and begin training. After the images and texts pass through the migration module and the common space learning module, the representation R of the images and texts in the common space is obtained.
The test system:
the invention features extraction process block diagram is shown in fig. 3, the system has fewer word embedding vectors and SoftMax loss functions for migration source domains and classes than the training system, and no pair-wise input is required across modal datasets. The feature extraction system firstly extracts feature representation of the image/text, wherein the input mode of the image/text is consistent with the training process, the image/text is sent into a CNN model after learning optimization in the training process, and the response of the last but one full connection layer is taken as the feature representation of the image/text. And after the characteristic representation of the image/text is obtained, cross-modal retrieval is carried out.
And (3) retrieval:
1. transmitting the images and texts of all the test sets into a feature extraction system to obtain feature representations of the images and the texts;
2. Realize "image-to-text search" and "text-to-image search": compute the Euclidean distances between each image and all texts and sort them; the several texts closest to the image are the retrieval results, and vice versa for text queries.
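A minimal sketch of this ranking step in the learned common space (dummy features stand in for the extractor's output):

```python
import numpy as np

img_feats = np.random.rand(100, 512)   # dummy common-space image representations
txt_feats = np.random.rand(200, 512)   # dummy common-space text representations

def retrieve(query: np.ndarray, gallery: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Indices of the top_k gallery samples closest to the query."""
    dist = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(dist)[:top_k]

nearest_texts = retrieve(img_feats[0], txt_feats)   # "image-to-text search"
nearest_images = retrieve(txt_feats[0], img_feats)  # "text-to-image search"
```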

Claims (10)

1. A cross-modal retrieval method based on unmarked data migration comprises the following steps:
inputting a sample to be retrieved into a trained cross-modal data retrieval model to obtain its feature representation;
calculating the Euclidean distances between each sample to be retrieved and all samples of the other modality and then sorting, wherein the samples of the other modality whose distance is smaller than a specified threshold are the retrieval results;
the training process of the cross-modal data retrieval model is as follows:
(1) setting pseudo labels for the unmarked images and the texts respectively by a clustering method;
(2) respectively migrating the knowledge contained in the pseudo-labeled unlabeled images and texts to the image and text parts of the cross-modal dataset to generate separate representations of the images and texts of the cross-modal dataset;
(3) feeding the separate representations of the images and texts into the same network, and learning a common representation of the images and texts in the same semantic space.
2. The cross-modal retrieval method based on unmarked data migration of claim 1, wherein the clustering is an unsupervised clustering method, including the KMeans method.
3. The cross-modal retrieval method based on unmarked data migration of claim 1, wherein the migration comprises single-modal knowledge migration and cross-modal knowledge sharing.
4. The cross-modal retrieval method based on unmarked data migration of claim 3, wherein the migration loss function Loss_transfer is:

$$Loss_{transfer}=Loss_{img}+Loss_{txt}+Loss_{cross\text{-}modal}$$

wherein Loss_img is the migration loss function of the image modality, Loss_txt is the migration loss function of the text modality, and Loss_cross-modal is the loss function of cross-modal knowledge.
5. The cross-modal retrieval method based on unmarked data migration according to claim 4, wherein the knowledge migration of the image modality is realized as follows: first, the pictures of the source domain and the target domain are fed into the network, pass through the first five convolutional layers of the AlexNet network, and then through three added fully connected layers, the loss function of the source domain being the SoftMax loss; the knowledge migration of the image modality is realized by minimizing the MMD loss between the source domain and the target domain;
the migration loss Loss_img of the image modality is:

$$Loss_{img}=\left\|\frac{1}{m}\sum_{a=1}^{m}f(y_a)-\frac{1}{n}\sum_{b=1}^{n}f(x_b)\right\|_{\mathcal{H}}^2$$

wherein the distance is measured by mapping the data into the reproducing kernel Hilbert space (RKHS) through f(·); X_i is the distribution of the image target domain with samples x_b, Y_i is the distribution of the source domain with samples y_a, k is the number of cluster centers, m is the number of samples of the source-domain data, and n is the number of samples of the target-domain data.
6. The cross-modal retrieval method based on unmarked data migration according to claim 4, wherein the knowledge migration of the text modality is realized as follows: the text feature vectors of the source domain and the target domain are respectively extracted with Bert and then passed through three fully connected layers, the loss function of the source domain being the SoftMax loss and the migration loss being the MMD loss;
the migration loss Loss_txt of the text modality is:

$$Loss_{txt}=\left\|\frac{1}{m}\sum_{a=1}^{m}f(y_a)-\frac{1}{n}\sum_{b=1}^{n}f(x_b)\right\|_{\mathcal{H}}^2$$

wherein the distance is measured by mapping the data into the reproducing kernel Hilbert space (RKHS) through f(·); X_t is the distribution of the text target domain with samples x_b, Y_t is the distribution of the source domain with samples y_a, k is the number of cluster centers, m is the number of samples of the source-domain data, and n is the number of samples of the target-domain data.
7. The cross-modal retrieval method based on unmarked data migration of claim 4, wherein the loss function Loss_cross-modal of cross-modal knowledge is:

$$Loss_{cross\text{-}modal}=\sum_{l\in\{6,7\}}\sum_{p=1}^{n_l}\left\|g\!\left(i_p^{(l)}\right)-g\!\left(t_p^{(l)}\right)\right\|_2^2$$

wherein l6 and l7 refer to the two fully connected layers connected to the images and texts of the cross-modal dataset, n_l is the number of input image-text pairs, (i_p^(l), t_p^(l)) denotes the p-th image-text pair, and g(·) maps the images and the texts into feature vectors.
8. The cross-modal retrieval method based on unmarked data migration of claim 1, wherein the loss function Loss_common of common space learning is:

$$Loss_{common}=\sum_{p=1}^{n}\left[f_s\!\left(i_p,l_p\right)+f_s\!\left(t_p,l_p\right)\right]$$

wherein f_s is the SoftMax loss function, (i_p, t_p) is the p-th related image-text pair input, l_p is the category label of the image-text pair, and n is the number of image-text pairs.
9. The cross-modal retrieval method based on unmarked data migration as claimed in claim 1, wherein the threshold is determined as follows: during training, the value of the loss function Loss_cross-modal of cross-modal knowledge is the distance between paired images and texts; according to this loss value, 10-20 initial thresholds are set, the retrieval mAP value is calculated under each threshold, and the threshold that maximizes the mAP value is the retrieval threshold.
10. A cross-modal retrieval system based on unmarked data migration, comprising:
the system comprises a label-free data clustering module, a data migration module and a common space learning module;
and finally, a common space learning module is used for learning and uniformly expressing images and texts obtained by the data migration module, and a similarity measurement basis of cross-modal data is established, so that cross-modal retrieval is realized.
CN201910707010.1A 2019-08-01 2019-08-01 Cross-modal retrieval method and system based on unmarked data migration Active CN110647904B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910707010.1A CN110647904B (en) 2019-08-01 2019-08-01 Cross-modal retrieval method and system based on unmarked data migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910707010.1A CN110647904B (en) 2019-08-01 2019-08-01 Cross-modal retrieval method and system based on unmarked data migration

Publications (2)

Publication Number Publication Date
CN110647904A true CN110647904A (en) 2020-01-03
CN110647904B CN110647904B (en) 2022-09-23

Family

ID=68989992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910707010.1A Active CN110647904B (en) 2019-08-01 2019-08-01 Cross-modal retrieval method and system based on unmarked data migration

Country Status (1)

Country Link
CN (1) CN110647904B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353076A (en) * 2020-02-21 2020-06-30 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111898663A (en) * 2020-07-20 2020-11-06 武汉大学 Cross-modal remote sensing image matching method based on transfer learning
CN112016523A (en) * 2020-09-25 2020-12-01 北京百度网讯科技有限公司 Cross-modal face recognition method, device, equipment and storage medium
CN112669331A (en) * 2020-12-25 2021-04-16 上海交通大学 Target data migration iterative learning method and target data migration iterative learning system
CN112732956A (en) * 2020-12-24 2021-04-30 江苏智水智能科技有限责任公司 Efficient query method based on perception multi-mode big data
CN113515657A (en) * 2021-07-06 2021-10-19 天津大学 Cross-modal multi-view target retrieval method and device
CN114120074A (en) * 2021-11-05 2022-03-01 北京百度网讯科技有限公司 Training method and training device of image recognition model based on semantic enhancement
CN116777896A (en) * 2023-07-07 2023-09-19 浙江大学 Negative migration inhibition method for cross-domain classification and identification of apparent defects
CN117636100A (en) * 2024-01-25 2024-03-01 北京航空航天大学杭州创新研究院 Pre-training task model adjustment processing method and device, electronic equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881019A (en) * 2012-10-08 2013-01-16 江南大学 Fuzzy clustering image segmenting method with transfer learning function
CN103020122A (en) * 2012-11-16 2013-04-03 哈尔滨工程大学 Transfer learning method based on semi-supervised clustering
CN107220337A (en) * 2017-05-25 2017-09-29 北京大学 A kind of cross-media retrieval method based on mixing migration network
CN107273517A (en) * 2017-06-21 2017-10-20 复旦大学 Picture and text cross-module state search method based on the embedded study of figure
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain
CN109784405A (en) * 2019-01-16 2019-05-21 山东建筑大学 Cross-module state search method and system based on pseudo label study and semantic consistency

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102881019A (en) * 2012-10-08 2013-01-16 江南大学 Fuzzy clustering image segmenting method with transfer learning function
CN103020122A (en) * 2012-11-16 2013-04-03 哈尔滨工程大学 Transfer learning method based on semi-supervised clustering
CN107220337A (en) * 2017-05-25 2017-09-29 北京大学 A kind of cross-media retrieval method based on mixing migration network
CN107273517A (en) * 2017-06-21 2017-10-20 复旦大学 Picture and text cross-module state search method based on the embedded study of figure
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain
CN109784405A (en) * 2019-01-16 2019-05-21 山东建筑大学 Cross-module state search method and system based on pseudo label study and semantic consistency

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIN HUANG ET AL.: "Cross-modal Common Representation Learning by Hybrid Transfer Network", ARXIV *
季鼎承 et al.: "Multi-source transfer learning method based on domain and instance balance", Acta Electronica Sinica (电子学报) *
李晓雨 et al.: "Image retrieval algorithm based on transfer learning", Computer Science (计算机科学) *
贾刚 et al.: "Application of hybrid transfer learning in medical image retrieval", Journal of Harbin Engineering University (哈尔滨工程大学学报) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353076B (en) * 2020-02-21 2023-10-10 华为云计算技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111353076A (en) * 2020-02-21 2020-06-30 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111898663B (en) * 2020-07-20 2022-05-13 武汉大学 Cross-modal remote sensing image matching method based on transfer learning
CN111898663A (en) * 2020-07-20 2020-11-06 武汉大学 Cross-modal remote sensing image matching method based on transfer learning
CN112016523A (en) * 2020-09-25 2020-12-01 北京百度网讯科技有限公司 Cross-modal face recognition method, device, equipment and storage medium
CN112016523B (en) * 2020-09-25 2023-08-29 北京百度网讯科技有限公司 Cross-modal face recognition method, device, equipment and storage medium
CN112732956A (en) * 2020-12-24 2021-04-30 江苏智水智能科技有限责任公司 Efficient query method based on perception multi-mode big data
CN112669331B (en) * 2020-12-25 2023-04-18 上海交通大学 Target data migration iterative learning method and target data migration iterative learning system
CN112669331A (en) * 2020-12-25 2021-04-16 上海交通大学 Target data migration iterative learning method and target data migration iterative learning system
CN113515657B (en) * 2021-07-06 2022-06-14 天津大学 Cross-modal multi-view target retrieval method and device
CN113515657A (en) * 2021-07-06 2021-10-19 天津大学 Cross-modal multi-view target retrieval method and device
CN114120074A (en) * 2021-11-05 2022-03-01 北京百度网讯科技有限公司 Training method and training device of image recognition model based on semantic enhancement
CN114120074B (en) * 2021-11-05 2023-12-12 北京百度网讯科技有限公司 Training method and training device for image recognition model based on semantic enhancement
CN116777896A (en) * 2023-07-07 2023-09-19 浙江大学 Negative migration inhibition method for cross-domain classification and identification of apparent defects
CN116777896B (en) * 2023-07-07 2024-03-19 浙江大学 Negative migration inhibition method for cross-domain classification and identification of apparent defects
CN117636100A (en) * 2024-01-25 2024-03-01 北京航空航天大学杭州创新研究院 Pre-training task model adjustment processing method and device, electronic equipment and medium
CN117636100B (en) * 2024-01-25 2024-04-30 北京航空航天大学杭州创新研究院 Pre-training task model adjustment processing method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN110647904B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN110647904B (en) Cross-modal retrieval method and system based on unmarked data migration
CN109918532B (en) Image retrieval method, device, equipment and computer readable storage medium
CN106202256B (en) Web image retrieval method based on semantic propagation and mixed multi-instance learning
CN112905822B (en) Deep supervision cross-modal counterwork learning method based on attention mechanism
CN111753101B (en) Knowledge graph representation learning method integrating entity description and type
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN107220337B (en) Cross-media retrieval method based on hybrid migration network
CN111753189A (en) Common characterization learning method for few-sample cross-modal Hash retrieval
CN108038492A (en) A kind of perceptual term vector and sensibility classification method based on deep learning
JP2013519138A (en) Join embedding for item association
CN112800292B (en) Cross-modal retrieval method based on modal specific and shared feature learning
CN113177132A (en) Image retrieval method based on depth cross-modal hash of joint semantic matrix
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN113537304A (en) Cross-modal semantic clustering method based on bidirectional CNN
CN113961666B (en) Keyword recognition method, apparatus, device, medium, and computer program product
CN115309930A (en) Cross-modal retrieval method and system based on semantic identification
CN113011172A (en) Text processing method and device, computer equipment and storage medium
CN114329051B (en) Data information identification method, device, apparatus, storage medium and program product
CN116610831A (en) Semanteme subdivision and modal alignment reasoning learning cross-modal retrieval method and retrieval system
CN113764034B (en) Method, device, equipment and medium for predicting potential BGC in genome sequence
Tian et al. Automatic image annotation with real-world community contributed data set
CN113742488B (en) Embedded knowledge graph completion method and device based on multitask learning
CN113779287A (en) Cross-domain multi-view target retrieval method and device based on multi-stage classifier network
Su et al. Deep supervised hashing with hard example pairs optimization for image retrieval
Mercy Rajaselvi Beaulah et al. Categorization of images using autoencoder hashing and training of intra bin classifiers for image classification and annotation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant