CN113157739A - Cross-modal retrieval method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113157739A
CN113157739A (application CN202110445359.XA; granted as CN113157739B)
Authority
CN
China
Prior art keywords
cross
modal
data
features
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110445359.XA
Other languages
Chinese (zh)
Other versions
CN113157739B (en)
Inventor
魏文琦 (Wei Wenqi)
王健宗 (Wang Jianzong)
张之勇 (Zhang Zhiyong)
程宁 (Cheng Ning)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110445359.XA
Publication of CN113157739A
Application granted
Publication of CN113157739B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to artificial intelligence technology and discloses a cross-modal retrieval method comprising the following steps: extracting features of original cross-modal data using a pre-constructed feature extraction network to obtain original cross-modal data features; training a pre-constructed similarity matrix with the original cross-modal data features to obtain a standard similarity matrix; generating a cross-modal matching model based on the standard similarity matrix; matching standard data using the cross-modal matching model to obtain matching features; distilling the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model; and retrieving data to be retrieved using the cross-modal retrieval model to obtain a retrieval result. The invention also relates to blockchain technology, and the retrieval result can be stored in a node of a blockchain. The invention further provides a cross-modal retrieval device, an electronic device and a computer-readable storage medium. The invention can solve the problem of low cross-modal retrieval precision.

Description

Cross-modal retrieval method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a cross-modal retrieval method and device, electronic equipment and a computer-readable storage medium.
Background
With the rapid development of science and technology, data retrieval is no longer limited to data of a single modality; demand for cross-modal retrieval, for example retrieving images by text, is growing. To retrieve text from images or images from text, a correspondence between the two modalities must be found so that one can be converted into the other by querying that relationship. Usually a parameter matrix is trained with a supervised method and used as this correspondence.
The existing supervised cross-modal training methods have the following defects: 1. supervised training requires a large amount of manual labeling, for example annotating the real image corresponding to each piece of text, so training efficiency is low; 2. the accuracy of the resulting cross-modal retrieval model is limited by its training set, and its retrieval ability generalizes poorly to unlabeled data.
Disclosure of Invention
The invention provides a cross-modal retrieval method and device, an electronic device, and a computer-readable storage medium, and mainly aims to solve the problem of low cross-modal retrieval precision.
In order to achieve the above object, the present invention provides a cross-modal retrieval method, including:
acquiring original cross-modal data, and extracting the characteristics of the original cross-modal data by using a pre-constructed characteristic extraction network to obtain the characteristics of the original cross-modal data;
training a pre-constructed similarity matrix by using the original cross-modal data characteristics to obtain a standard similarity matrix, and generating a cross-modal matching model based on the standard similarity matrix and the characteristic extraction network;
acquiring standard data, and matching the standard data by using the cross-modal matching model to obtain matching characteristics;
performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model;
and retrieving the data to be retrieved by utilizing the cross-modal retrieval model to obtain a retrieval result.
Optionally, the extracting, by using a pre-constructed feature extraction network, features of the original cross-modal data to obtain original cross-modal data features includes:
extracting picture features in the original cross-modal data by using an image feature extraction sub-network in the feature extraction network to obtain original image features, and extracting text features in the original cross-modal data by using a text feature extraction sub-network in the feature extraction network to obtain original text features;
and summarizing the original image characteristics and the original text characteristics to obtain the original cross-modal data characteristics.
Optionally, the training a pre-constructed similarity matrix by using the original cross-modal data features to obtain a standard similarity matrix, and generating a cross-modal matching model based on the standard similarity matrix and the feature extraction network, includes:
constructing an image-text embedding space, mapping the original image characteristics to the image-text embedding space to obtain image parameter vectors, and mapping the original text characteristics to the image-text embedding space to obtain text parameter vectors;
constructing an objective function based on the image parameter vector and the text parameter vector, wherein the objective function comprises a similarity matrix;
and when the target function is smaller than a preset target threshold value, obtaining a standard similarity matrix based on the similarity matrix.
Optionally, the constructing an objective function based on the image parameter vector and the text parameter vector includes:
constructing the following objective function based on the image parameter vector and the text parameter vector:
[Objective function J(θ_I, θ_T, S), supplied as an image in the original filing]
wherein θ_I is the image parameter vector, θ_T is the text parameter vector, S is the similarity matrix, f_i^I is the mapping of the i-th original image feature, and f_j^T is the mapping of the j-th original text feature.
Optionally, the matching the standard data by using the cross-modal matching model to obtain matching features includes:
extracting picture features in the standard data by using the image feature extraction sub-network to obtain standard image features, and extracting text features in the standard data by using the text feature extraction sub-network to obtain standard text features;
mapping the standard image features and the standard text features into the image-text embedding space;
and taking the standard image features and the standard text features which meet the standard similarity matrix as the matching features.
Optionally, the performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model includes:
constructing a cross-modal Hash mapping matrix based on the matching features, and supervising the cross-modal Hash mapping matrix with the standard similarity matrix until the cross-modal Hash mapping matrix converges;
calculating a loss value according to the converged cross-modal Hash mapping matrix and a pre-constructed loss function until the loss value is smaller than a preset loss threshold value, and obtaining the cross-modal retrieval model based on the cross-modal Hash mapping matrix.
Optionally, the constructing a cross-modal Hash mapping matrix based on the matching features, and supervising the cross-modal Hash mapping matrix with the standard similarity matrix until the cross-modal Hash mapping matrix converges, includes:
calculating the feature distribution of different features among the matching features by using a preset classification function;
and constructing the cross-modal Hash mapping matrix based on the feature distribution, and calculating a difference value between the standard similarity matrix and the cross-modal Hash mapping matrix until the difference value is smaller than a preset difference threshold, so as to obtain the converged cross-modal Hash mapping matrix.
In order to solve the above problem, the present invention further provides a cross-modal retrieval apparatus, including:
the characteristic extraction module is used for acquiring original cross-modal data and extracting the characteristics of the original cross-modal data by utilizing a pre-constructed characteristic extraction network to obtain the characteristics of the original cross-modal data;
the matrix training module is used for training a pre-constructed similarity matrix by utilizing the original cross-modal data characteristics to obtain a standard similarity matrix and generating a cross-modal matching model based on the standard similarity matrix and the characteristic extraction network;
the characteristic matching module is used for acquiring standard data and matching the standard data by using the cross-modal matching model to obtain matching characteristics;
the cross-modal distillation module is used for carrying out cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model;
and the data retrieval module is used for retrieving the data to be retrieved by utilizing the cross-modal retrieval model to obtain a retrieval result.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the cross-modal retrieval method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, where the at least one instruction is executed by a processor in an electronic device to implement the cross-modal retrieval method described above.
According to the invention, a pre-constructed similarity matrix is trained with the original cross-modal data features to obtain a standard similarity matrix, and a cross-modal matching model is generated based on the standard similarity matrix and the feature extraction network, so that image features and text features can be matched automatically and the speed of cross-modal data retrieval is improved. Cross-modal data distillation is then performed on the matching features according to the cross-modal matching model, with supervision by the similarity matrix, to obtain a cross-modal retrieval model; this improves the generalization capability of the cross-modal retrieval model and therefore its accuracy. Therefore, the cross-modal retrieval method and device, the electronic device, and the computer-readable storage medium of the present invention can solve the problem of low cross-modal retrieval precision.
Drawings
Fig. 1 is a schematic flowchart of a cross-modal retrieval method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart showing a detailed implementation of one of the steps in FIG. 1;
FIG. 3 is a schematic flow chart showing another step of FIG. 1;
FIG. 4 is a schematic flow chart showing another step of FIG. 1;
FIG. 5 is a schematic flow chart showing another step in FIG. 1;
FIG. 6 is a functional block diagram of a cross-modal search apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device for implementing the cross-modal retrieval method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a cross-modal retrieval method. The execution subject of the cross-modal retrieval method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the cross-modal retrieval method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a cross-modal retrieval method according to an embodiment of the present invention.
In this embodiment, the cross-modal retrieval method includes:
s1, acquiring original cross-modal data, and extracting the characteristics of the original cross-modal data by using a pre-constructed characteristic extraction network to obtain the characteristics of the original cross-modal data.
In the embodiment of the present invention, the original cross-modal data includes an original image and an original text, and the original text is a textual description of the original image. For example, the original cross-modal data may come from a publicly available data set such as NUS-WIDE.
Specifically, referring to fig. 2, the extracting, by using a pre-constructed feature extraction network, features of the original cross-modal data to obtain original cross-modal data features includes:
s10, extracting picture features in the original cross-modal data by using an image feature extraction sub-network in the feature extraction network to obtain original image features, and extracting text features in the original cross-modal data by using a text feature extraction sub-network in the feature extraction network to obtain original text features;
and S11, summarizing the original image features and the original text features to obtain the original cross-modal data features.
In the embodiment of the invention, the pre-constructed feature extraction network comprises an image feature extraction sub-network and a text feature extraction sub-network. The image feature extraction sub-network may be a VGG-F convolutional neural network comprising several convolutional layers, pooling layers and a fully connected layer: the convolutional layers extract image features, the pooling layers reduce their dimensionality, and the fully connected layer outputs the reduced image features. The text feature extraction sub-network may be a fully-connected neural network comprising an input layer, a hidden layer and an output layer: the input layer receives the original text, the hidden layer extracts features of the original text and predicts from them according to preset weights, and the output layer outputs the predicted text features. Because the original cross-modal data mixes data of two modalities (images and texts), extracting its features directly with this feature extraction network yields richer original cross-modal data features than extracting features from single-modality data.
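As a minimal sketch of this two-branch design, assuming PyTorch and illustrative layer sizes (the patent fixes neither the exact VGG-F configuration nor the widths of the fully-connected text network), the two sub-networks could look like the following:

import torch
import torch.nn as nn

class ImageFeatureNet(nn.Module):
    # Stand-in for the VGG-F style image sub-network: convolutional
    # layers extract image features, pooling layers reduce them, and a
    # fully connected layer outputs the reduced features.
    def __init__(self, feat_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Linear(128 * 4 * 4, feat_dim)

    def forward(self, x):              # x: (B, 3, H, W) image batch
        return self.fc(self.conv(x).flatten(1))

class TextFeatureNet(nn.Module):
    # Fully-connected text sub-network: an input layer for the original
    # text vector, a hidden layer that predicts with preset weights,
    # and an output layer that emits the predicted text features.
    def __init__(self, vocab_dim=1386, hidden_dim=1024, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, t):              # t: (B, vocab_dim) bag-of-words
        return self.net(t)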
In the embodiment of the invention, the characteristics of the original cross-modal data are extracted by utilizing the sub-network in the pre-constructed characteristic extraction network, so that richer data characteristics can be obtained.
S2, training a pre-constructed similarity matrix by using the original cross-modal data characteristics to obtain a standard similarity matrix, and generating a cross-modal matching model based on the standard similarity matrix and the characteristic extraction network.
In detail, referring to fig. 3, the training of the pre-constructed similarity matrix by using the original cross-modal data features to obtain a standard similarity matrix includes:
s20, constructing an image-text embedding space, mapping the original image features to the image-text embedding space to obtain image parameter vectors, and mapping the original text features to the image-text embedding space to obtain text parameter vectors;
s21, constructing an objective function based on the image parameter vector and the text parameter vector, wherein the objective function comprises a similarity matrix;
and S22, when the target function is smaller than a preset target threshold, obtaining a standard similarity matrix based on the similarity matrix.
The image-text embedding space is used for mapping the original image features and the original text features into parameter vectors of the same dimension.
In the embodiment of the invention, the following objective function is constructed based on the image parameter vector and the text parameter vector:
[Objective function J(θ_I, θ_T, S), supplied as an image in the original filing]
wherein θ_I is the image parameter vector, θ_T is the text parameter vector, S is the similarity matrix, f_i^I is the mapping of the i-th original image feature, and f_j^T is the mapping of the j-th original text feature.
In the embodiment of the invention, an objective function is constructed based on the image parameter vector and the text parameter vector, and an optimal similarity matrix (namely the standard similarity matrix) is obtained by fitting the objective function: while the objective function is greater than or equal to the preset target threshold, fitting continues; when the objective function falls below the preset target threshold, fitting stops and the standard similarity matrix is obtained. For example, in practical applications, the obtained standard similarity matrix may be S_ij = (4 - |f_i^I - f_j^I|^2 - |f_i^T - f_j^T|^2)/4, wherein |f_i^I - f_j^I|^2 is the squared Euclidean distance between the mappings of the i-th and j-th original image features, and |f_i^T - f_j^T|^2 is the squared Euclidean distance between the mappings of the i-th and j-th original text features.
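A minimal sketch of this similarity construction, assuming PyTorch; the L2 normalization is an added assumption that keeps each squared Euclidean distance in [0, 4] so that S_ij falls in [-1, 1]:

import torch
import torch.nn.functional as F

def standard_similarity(img_emb, txt_emb):
    # S_ij = (4 - |f_i^I - f_j^I|^2 - |f_i^T - f_j^T|^2) / 4, computed
    # over embeddings already mapped into the image-text embedding space.
    img_emb = F.normalize(img_emb, dim=1)        # rows on the unit sphere
    txt_emb = F.normalize(txt_emb, dim=1)
    d_img = torch.cdist(img_emb, img_emb) ** 2   # |f_i^I - f_j^I|^2
    d_txt = torch.cdist(txt_emb, txt_emb) ** 2   # |f_i^T - f_j^T|^2
    return (4.0 - d_img - d_txt) / 4.0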
In the embodiment of the invention, the image-text embedding space maps the original image features and the original text features into vectors of the same dimension, so that a large amount of manual labeling is unnecessary and efficient unsupervised learning can be carried out. Moreover, the cross-modal matching model generated from the standard similarity matrix and the feature extraction network can automatically associate image features with text features, improving the speed of cross-modal data retrieval.
And S3, acquiring standard data, and matching the standard data by using the cross-modal matching model to obtain matching characteristics.
In the embodiment of the present invention, the standard data may be a data set without manual annotation, and include a standard image and a standard text.
Specifically, referring to fig. 4, the matching the standard data by using the cross-modal matching model to obtain matching features includes:
s30, extracting the picture features in the standard data by using the image feature extraction sub-network to obtain standard image features, and extracting the text features in the standard data by using the text feature extraction sub-network to obtain standard text features;
s31, mapping the standard image features and the standard text features into the image-text embedding space;
and S32, taking the standard image features and the standard text features meeting the standard similarity matrix as the matching features.
In the embodiment of the invention, the text and the image with higher similarity can be automatically matched through the standard similarity matrix in the cross-modal matching model, so that the accuracy of model training is improved.
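A minimal sketch of this matching step, assuming PyTorch; the patent only requires that the kept pairs "meet" the standard similarity matrix, so the threshold rule and its value below are illustrative assumptions:

import torch
import torch.nn.functional as F

def match_pairs(std_img_emb, std_txt_emb, threshold=0.8):
    # Compute the standard similarity matrix over the mapped standard
    # data, then keep (image, text) pairs whose similarity exceeds the
    # threshold as the matching features.
    img_n = F.normalize(std_img_emb, dim=1)
    txt_n = F.normalize(std_txt_emb, dim=1)
    S = (4.0 - torch.cdist(img_n, img_n) ** 2
             - torch.cdist(txt_n, txt_n) ** 2) / 4.0
    i, j = torch.nonzero(S > threshold, as_tuple=True)
    return std_img_emb[i], std_txt_emb[j]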
And S4, performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model.
In detail, referring to fig. 5, the performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model includes:
s40, constructing a cross-modal Hash mapping matrix based on the matching characteristics, and monitoring the cross-modal Hash mapping matrix by using the standard similarity matrix until the cross-modal Hash mapping matrix is converged;
s41, calculating a loss value according to the converged cross-modal Hash mapping matrix and a pre-constructed loss function until the loss value is smaller than a preset loss threshold value, and obtaining the cross-modal retrieval model based on the cross-modal Hash mapping matrix.
The matching features are converted into data of the same dimension through the cross-modal matching model, and the standard similarity matrix is used to supervise the cross-modal Hash mapping matrix, which can improve the generalization capability of the model.
Specifically, the constructing a cross-modal Hash mapping matrix based on the matching features, and supervising the cross-modal Hash mapping matrix with the standard similarity matrix until the cross-modal Hash mapping matrix converges, includes:
calculating the feature distribution of different features among the matching features by using a preset classification function;
and constructing the cross-modal Hash mapping matrix based on the feature distribution, and calculating a difference value between the standard similarity matrix and the cross-modal Hash mapping matrix until the difference value is smaller than a preset difference threshold, so as to obtain the converged cross-modal Hash mapping matrix.
The preset classification function may be a softmax function with a temperature parameter. The temperature is a hyper-parameter, denoted T; the larger T is, the smoother the output feature distribution. The difference value may be a KL divergence, specifically the symmetric Kullback-Leibler divergence, a quantity that measures the difference between two probability distributions.
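A minimal sketch of these two ingredients, assuming PyTorch; T=4.0 is an illustrative temperature, not a value fixed by the patent:

import torch
import torch.nn.functional as F

def soft_distribution(logits, T=4.0):
    # Temperature-scaled softmax: the larger T is, the smoother the
    # output feature distribution.
    return F.softmax(logits / T, dim=-1)

def symmetric_kl(p, q, eps=1e-8):
    # Symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p),
    # quantifying the difference between two probability distributions.
    p, q = p.clamp_min(eps), q.clamp_min(eps)
    return (p * (p / q).log()).sum(-1) + (q * (q / p).log()).sum(-1)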
In the embodiment of the present invention, the pre-constructed loss function may be a triplet loss of the form
L = Σ_{k=1}^{N} max(|g_k - g_k^+|_2 - |g_k - g_k^-|_2 + α, 0)
wherein N is the number of matching features, |g_k - g_k^+|_2 is the Euclidean distance between the k-th feature g_k of the converged cross-modal Hash mapping matrix and a feature g_k^+ of the same category, |g_k - g_k^-|_2 is the Euclidean distance between g_k and a feature g_k^- of a different category, and α is a fixed parameter (the margin).
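A minimal sketch of this triplet loss, assuming PyTorch; the pairing of each feature with one same-category (positive) and one different-category (negative) feature, and the margin value 0.2, are illustrative assumptions:

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # anchor: rows g_k of the converged cross-modal Hash mapping matrix;
    # positive: same-category features g_k^+; negative: different-
    # category features g_k^-; alpha: the fixed margin parameter.
    d_pos = torch.linalg.norm(anchor - positive, dim=1)  # Euclidean
    d_neg = torch.linalg.norm(anchor - negative, dim=1)
    return F.relu(d_pos - d_neg + alpha).sum()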
In the embodiment of the invention, the cross-modal Hash mapping matrix is supervised by the standard similarity matrix to complete knowledge distillation, so that image features and text features are combined more closely and the retrieval accuracy of the model improves; in effect, a better model is obtained through a second round of training.
And S5, retrieving the data to be retrieved by utilizing the cross-modal retrieval model to obtain a retrieval result.
Specifically, retrieving the data to be retrieved by using the cross-modal retrieval model to obtain a retrieval result includes:
extracting data characteristics of the data to be retrieved by using the cross-modal retrieval model, and mapping the data characteristics to the image-text embedding space;
and matching modal data corresponding to the data characteristics based on the standard similarity matrix and the converged cross-modal Hash mapping matrix, and taking the modal data as the retrieval result.
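A minimal sketch of this retrieval step, assuming PyTorch; all names (the query feature, the learned mapping, and the gallery of pre-computed codes for the other modality) are illustrative assumptions:

import torch

def retrieve(query_feat, hash_map, gallery_codes, top_k=10):
    # query_feat: (1, d) feature of the data to be retrieved, already
    # mapped into the image-text embedding space; hash_map: (d, c)
    # converged cross-modal Hash mapping matrix; gallery_codes: (N, c)
    # codes of the candidate data in the other modality.
    q = query_feat @ hash_map                  # map the query to a code
    scores = q @ gallery_codes.t()             # similarity to candidates
    return scores.topk(top_k, dim=1).indices   # indices of the results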
In the embodiment of the invention, the cross-modal retrieval model is utilized to retrieve the data to be retrieved, so that the accuracy of the cross-modal data retrieval can be improved.
According to the invention, a pre-constructed similarity matrix is trained with the original cross-modal data features to obtain a standard similarity matrix, and a cross-modal matching model is generated based on the standard similarity matrix and the feature extraction network, so that image features and text features can be matched automatically and the speed of cross-modal data retrieval is improved. Cross-modal data distillation is then performed on the matching features according to the cross-modal matching model, with supervision by the similarity matrix, to obtain a cross-modal retrieval model; this improves the generalization capability of the cross-modal retrieval model and therefore its accuracy. Therefore, the embodiment of the invention can solve the problem of low cross-modal retrieval precision.
Fig. 6 is a functional block diagram of a cross-modal retrieval apparatus according to an embodiment of the present invention.
The cross-modal retrieval apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the cross-modal retrieval apparatus 100 may include a feature extraction module 101, a matrix training module 102, a feature matching module 103, a cross-modal distillation module 104, and a data retrieval module 105. A module of the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in the memory of the electronic device, can be executed by its processor, and perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the feature extraction module 101 is configured to obtain original cross-modal data, and extract features of the original cross-modal data by using a pre-constructed feature extraction network to obtain original cross-modal data features.
In the embodiment of the present invention, the original cross-modal data includes an original image and an original text, and the original text is a textual description of the original image. For example, the original cross-modal data may come from a publicly available data set such as NUS-WIDE.
Specifically, the feature extraction module 101 obtains the original cross-modal data features by:
extracting picture features in the original cross-modal data by using an image feature extraction sub-network in the feature extraction network to obtain original image features, and extracting text features in the original cross-modal data by using a text feature extraction sub-network in the feature extraction network to obtain original text features;
and summarizing the original image characteristics and the original text characteristics to obtain the original cross-modal data characteristics.
In the embodiment of the invention, the pre-constructed feature extraction network comprises an image feature extraction sub-network and a text feature extraction sub-network. The image feature extraction sub-network may be a VGG-F convolutional neural network comprising several convolutional layers, pooling layers and a fully connected layer: the convolutional layers extract image features, the pooling layers reduce their dimensionality, and the fully connected layer outputs the reduced image features. The text feature extraction sub-network may be a fully-connected neural network comprising an input layer, a hidden layer and an output layer: the input layer receives the original text, the hidden layer extracts features of the original text and predicts from them according to preset weights, and the output layer outputs the predicted text features. Because the original cross-modal data mixes data of two modalities (images and texts), extracting its features directly with this feature extraction network yields richer original cross-modal data features than extracting features from single-modality data.
In the embodiment of the invention, the characteristics of the original cross-modal data are extracted by utilizing the sub-network in the pre-constructed characteristic extraction network, so that richer data characteristics can be obtained.
The matrix training module 102 is configured to train a pre-constructed similarity matrix by using the original cross-modal data features to obtain a standard similarity matrix, and generate a cross-modal matching model based on the standard similarity matrix and the feature extraction network.
In detail, the matrix training module 102 obtains the standard similarity matrix by:
constructing an image-text embedding space, mapping the original image characteristics to the image-text embedding space to obtain image parameter vectors, and mapping the original text characteristics to the image-text embedding space to obtain text parameter vectors;
constructing an objective function based on the image parameter vector and the text parameter vector, wherein the objective function comprises a similarity matrix;
and when the target function is smaller than a preset target threshold value, obtaining a standard similarity matrix based on the similarity matrix.
The image-text embedding space is used for mapping the original image features and the original text features into parameter vectors of the same dimension.
In the embodiment of the invention, the following objective function is constructed based on the image parameter vector and the text parameter vector:
[Objective function J(θ_I, θ_T, S), supplied as an image in the original filing]
wherein θ_I is the image parameter vector, θ_T is the text parameter vector, S is the similarity matrix, f_i^I is the mapping of the i-th original image feature, and f_j^T is the mapping of the j-th original text feature.
In the embodiment of the invention, an objective function is constructed based on the image parameter vector and the text parameter vector, and an optimal similarity matrix (namely the standard similarity matrix) is obtained by fitting the objective function: while the objective function is greater than or equal to the preset target threshold, fitting continues; when the objective function falls below the preset target threshold, fitting stops and the standard similarity matrix is obtained. For example, in practical applications, the obtained standard similarity matrix may be S_ij = (4 - |f_i^I - f_j^I|^2 - |f_i^T - f_j^T|^2)/4, wherein |f_i^I - f_j^I|^2 is the squared Euclidean distance between the mappings of the i-th and j-th original image features, and |f_i^T - f_j^T|^2 is the squared Euclidean distance between the mappings of the i-th and j-th original text features.
In the embodiment of the invention, the image-text embedding space maps the original image features and the original text features into vectors of the same dimension, so that a large amount of manual labeling is unnecessary and efficient unsupervised learning can be carried out. Moreover, the cross-modal matching model generated from the standard similarity matrix and the feature extraction network can automatically associate image features with text features, improving the speed of cross-modal data retrieval.
The feature matching module 103 is configured to obtain standard data, and match the standard data by using the cross-modal matching model to obtain a matching feature.
In the embodiment of the present invention, the standard data may be a data set without manual annotation, and include a standard image and a standard text.
Specifically, the feature matching module 103 obtains matching features by:
extracting picture features in the standard data by using the image feature extraction sub-network to obtain standard image features, and extracting text features in the standard data by using the text feature extraction sub-network to obtain standard text features;
mapping the standard image features and the standard text features into the image-text embedding space;
and taking the standard image features and the standard text features which meet the standard similarity matrix as the matching features.
In the embodiment of the invention, the text and the image with higher similarity can be automatically matched through the standard similarity matrix in the cross-modal matching model, so that the accuracy of model training is improved.
The cross-modal distillation module 104 is configured to perform cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model.
In detail, the cross-modal distillation module 104 obtains a cross-modal retrieval model by:
constructing a cross-modal Hash mapping matrix based on the matching features, and supervising the cross-modal Hash mapping matrix with the standard similarity matrix until the cross-modal Hash mapping matrix converges;
calculating a loss value according to the converged cross-modal Hash mapping matrix and a pre-constructed loss function until the loss value is smaller than a preset loss threshold value, and obtaining the cross-modal retrieval model based on the cross-modal Hash mapping matrix.
The matching features are converted into data of the same dimension through the cross-modal matching model, and the standard similarity matrix is used to supervise the cross-modal Hash mapping matrix, which can improve the generalization capability of the model.
Specifically, the cross-modal distillation module 104 supervises the cross-modal hash mapping matrix until the cross-modal hash mapping matrix converges by:
calculating the feature distribution of different features among the matching features by using a preset classification function;
and constructing the cross-modal Hash mapping matrix based on the feature distribution, and calculating a difference value between the standard similarity matrix and the cross-modal Hash mapping matrix until the difference value is smaller than a preset difference threshold, so as to obtain the converged cross-modal Hash mapping matrix.
The preset classification function may be a softmax function with a temperature parameter. The temperature is a hyper-parameter, denoted T; the larger T is, the smoother the output feature distribution. The difference value may be a KL divergence, specifically the symmetric Kullback-Leibler divergence, a quantity that measures the difference between two probability distributions.
In the embodiment of the present invention, the pre-constructed loss function may be a triplet loss of the form
L = Σ_{k=1}^{N} max(|g_k - g_k^+|_2 - |g_k - g_k^-|_2 + α, 0)
wherein N is the number of matching features, |g_k - g_k^+|_2 is the Euclidean distance between the k-th feature g_k of the converged cross-modal Hash mapping matrix and a feature g_k^+ of the same category, |g_k - g_k^-|_2 is the Euclidean distance between g_k and a feature g_k^- of a different category, and α is a fixed parameter (the margin).
In the embodiment of the invention, the cross-modal Hash mapping matrix is supervised by the standard similarity matrix to complete knowledge distillation, so that image features and text features are combined more closely and the retrieval accuracy of the model improves; in effect, a better model is obtained through a second round of training.
The data retrieval module 105 is configured to retrieve the data to be retrieved by using the cross-modal retrieval model to obtain a retrieval result.
Specifically, the data retrieval module 105 obtains a retrieval result by:
extracting data characteristics of the data to be retrieved by using the cross-modal retrieval model, and mapping the data characteristics to the image-text embedding space;
and matching modal data corresponding to the data characteristics based on the standard similarity matrix and the converged cross-modal Hash mapping matrix, and taking the modal data as the retrieval result.
In the embodiment of the invention, the cross-modal retrieval model is utilized to retrieve the data to be retrieved, so that the accuracy of the cross-modal data retrieval can be improved.
Fig. 7 is a schematic structural diagram of an electronic device for implementing a cross-modal retrieval method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a cross-modal retrieval program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the cross-modal retrieval program 12, but also to temporarily store data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the various components of the electronic device using various interfaces and lines, and executes the functions and processes the data of the electronic device 1 by running or executing the programs or modules stored in the memory 11 (e.g., the cross-modal retrieval program) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable communication between the memory 11, the at least one processor 10, and the other components.
Fig. 7 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 7 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (e.g., a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (e.g. a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The cross-modal retrieval program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
acquiring original cross-modal data, and extracting the characteristics of the original cross-modal data by using a pre-constructed characteristic extraction network to obtain the characteristics of the original cross-modal data;
training a pre-constructed similarity matrix by using the original cross-modal data characteristics to obtain a standard similarity matrix, and generating a cross-modal matching model based on the standard similarity matrix and the characteristic extraction network;
acquiring standard data, and matching the standard data by using the cross-modal matching model to obtain matching characteristics;
performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model;
and retrieving the data to be retrieved by utilizing the cross-modal retrieval model to obtain a retrieval result.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 5, which is not repeated herein.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring original cross-modal data, and extracting the characteristics of the original cross-modal data by using a pre-constructed characteristic extraction network to obtain the characteristics of the original cross-modal data;
training a pre-constructed similarity matrix by using the original cross-modal data characteristics to obtain a standard similarity matrix, and generating a cross-modal matching model based on the standard similarity matrix and the characteristic extraction network;
acquiring standard data, and matching the standard data by using the cross-modal matching model to obtain matching characteristics;
performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model;
and retrieving the data to be retrieved by utilizing the cross-modal retrieval model to obtain a retrieval result.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each block contains the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A cross-modal retrieval method, the method comprising:
acquiring original cross-modal data, and extracting the characteristics of the original cross-modal data by using a pre-constructed characteristic extraction network to obtain the characteristics of the original cross-modal data;
training a pre-constructed similarity matrix by using the original cross-modal data characteristics to obtain a standard similarity matrix, and generating a cross-modal matching model based on the standard similarity matrix and the characteristic extraction network;
acquiring standard data, and matching the standard data by using the cross-modal matching model to obtain matching characteristics;
performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model;
and retrieving the data to be retrieved by utilizing the cross-modal retrieval model to obtain a retrieval result.
2. The cross-modal retrieval method of claim 1, wherein the extracting the features of the original cross-modal data using the pre-constructed feature extraction network to obtain the original cross-modal data features comprises:
extracting picture features in the original cross-modal data by using an image feature extraction sub-network in the feature extraction network to obtain original image features, and extracting text features in the original cross-modal data by using a text feature extraction sub-network in the feature extraction network to obtain original text features;
and summarizing the original image characteristics and the original text characteristics to obtain the original cross-modal data characteristics.
3. The cross-modal retrieval method of claim 1, wherein the training of the pre-constructed similarity matrix with the original cross-modal data features to obtain a standard similarity matrix, and the generation of the cross-modal matching model based on the standard similarity matrix and the feature extraction network, comprises:
constructing an image-text embedding space, mapping the original image characteristics to the image-text embedding space to obtain image parameter vectors, and mapping the original text characteristics to the image-text embedding space to obtain text parameter vectors;
constructing an objective function based on the image parameter vector and the text parameter vector, wherein the objective function comprises a similarity matrix;
and when the target function is smaller than a preset target threshold value, obtaining a standard similarity matrix based on the similarity matrix.
4. The cross-modal retrieval method of claim 3, wherein the constructing an objective function based on the image parameter vector and the text parameter vector comprises:
constructing the following objective function based on the image parameter vector and the text parameter vector:
[Objective function J(θ_I, θ_T, S), supplied as an image in the original filing]
wherein θ_I is the image parameter vector, θ_T is the text parameter vector, S is the similarity matrix, f_i^I is the mapping of the i-th original image feature, and f_j^T is the mapping of the j-th original text feature.
5. The cross-modal retrieval method of claim 3, wherein the matching the standard data using the cross-modal matching model to obtain matching features comprises:
extracting picture features in the standard data by using the image feature extraction sub-network to obtain standard image features, and extracting text features in the standard data by using the text feature extraction sub-network to obtain standard text features;
mapping the standard image features and the standard text features into the image-text embedding space;
and taking the standard image features and the standard text features which meet the standard similarity matrix as the matching features.
6. The cross-modal retrieval method of claim 5, wherein the performing cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model comprises:
constructing a cross-modal Hash mapping matrix based on the matching features, and supervising the cross-modal Hash mapping matrix with the standard similarity matrix until the cross-modal Hash mapping matrix converges;
calculating a loss value according to the converged cross-modal Hash mapping matrix and a pre-constructed loss function until the loss value is smaller than a preset loss threshold value, and obtaining the cross-modal retrieval model based on the cross-modal Hash mapping matrix.
7. The cross-modal retrieval method of claim 6, wherein the constructing a cross-modal Hash mapping matrix based on the matching features, and supervising the cross-modal Hash mapping matrix with the standard similarity matrix until the cross-modal Hash mapping matrix converges, comprises:
calculating the feature distribution of different features among the matching features by using a preset classification function;
and constructing the cross-modal Hash mapping matrix based on the feature distribution, and calculating a difference value between the standard similarity matrix and the cross-modal Hash mapping matrix until the difference value is smaller than a preset difference threshold, so as to obtain the converged cross-modal Hash mapping matrix.
8. A cross-modal retrieval apparatus, characterized in that the apparatus comprises:
a feature extraction module, configured to acquire original cross-modal data and extract features of the original cross-modal data using a pre-constructed feature extraction network to obtain original cross-modal data features;
a matrix training module, configured to train a pre-constructed similarity matrix with the original cross-modal data features to obtain a standard similarity matrix, and to generate a cross-modal matching model based on the standard similarity matrix and the feature extraction network;
a feature matching module, configured to acquire standard data and match the standard data using the cross-modal matching model to obtain matching features;
a cross-modal distillation module, configured to perform cross-modal data distillation on the matching features according to the cross-modal matching model to obtain a cross-modal retrieval model; and
a data retrieval module, configured to retrieve data to be retrieved using the cross-modal retrieval model to obtain a retrieval result.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the cross-modal retrieval method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the cross-modal retrieval method as claimed in any one of claims 1 to 7.
CN202110445359.XA 2021-04-23 2021-04-23 Cross-modal retrieval method and device, electronic equipment and storage medium Active CN113157739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110445359.XA CN113157739B (en) 2021-04-23 2021-04-23 Cross-modal retrieval method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113157739A (en) 2021-07-23
CN113157739B (en) 2024-01-09

Family

ID=76870669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110445359.XA Active CN113157739B (en) 2021-04-23 2021-04-23 Cross-modal retrieval method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113157739B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729513A (en) * 2017-10-25 2018-02-23 鲁东大学 Discrete supervised cross-modal hash retrieval method based on semantic alignment
CN108399211A (en) * 2018-02-02 2018-08-14 清华大学 Large-scale image retrieval algorithm based on binary features
CN110110122A (en) * 2018-06-22 2019-08-09 北京交通大学 Image-text cross-modal retrieval based on multi-layer semantic deep hashing algorithm
US20200097604A1 (en) * 2018-09-21 2020-03-26 Microsoft Technology Licensing, Llc Stacked cross-modal matching
CN109299216A (en) * 2018-10-29 2019-02-01 山东师范大学 Cross-modal hash retrieval method and system fusing supervision information
CN110059157A (en) * 2019-03-18 2019-07-26 华南师范大学 Image-text cross-modal retrieval method, system, device and storage medium
CN110188210A (en) * 2019-05-10 2019-08-30 山东师范大学 Modality-independent cross-modal data retrieval method and system based on graph regularization
CN111026894A (en) * 2019-12-12 2020-04-17 清华大学 Cross-modal image-text retrieval method based on confidence-adaptive matching network
CN111209415A (en) * 2020-01-10 2020-05-29 重庆邮电大学 Image-text cross-modal hash retrieval method based on massive training
CN111353076A (en) * 2020-02-21 2020-06-30 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method, and related device
CN112148916A (en) * 2020-09-28 2020-12-29 华中科技大学 Supervision-based cross-modal retrieval method, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIBIN ZHENG ET AL.: "Multi-Modal Coreference Resolution with the Correlation between Space Structures", arXiv:1804.08010v2 [cs.AI], pages 1-8 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627151A (en) * 2021-10-14 2021-11-09 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium
CN113627151B (en) * 2021-10-14 2022-02-22 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium
WO2024051350A1 (en) * 2022-09-07 2024-03-14 腾讯科技(深圳)有限公司 Image retrieval method and apparatus, and electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN112380870A (en) User intention analysis method and device, electronic equipment and computer storage medium
CN113157927A (en) Text classification method and device, electronic equipment and readable storage medium
CN113157739A (en) Cross-modal retrieval method and device, electronic equipment and storage medium
CN114708461A (en) Multi-modal learning model-based classification method, device, equipment and storage medium
CN113298159A (en) Target detection method and device, electronic equipment and storage medium
CN114491047A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN114398557A (en) Information recommendation method and device based on double portraits, electronic equipment and storage medium
CN114511038A (en) False news detection method and device, electronic equipment and readable storage medium
CN113658002B (en) Transaction result generation method and device based on decision tree, electronic equipment and medium
CN113344125A (en) Long text matching identification method and device, electronic equipment and storage medium
CN112269875A (en) Text classification method and device, electronic equipment and storage medium
CN111651625A (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN115982454A (en) User portrait based questionnaire pushing method, device, equipment and storage medium
CN112215336B (en) Data labeling method, device, equipment and storage medium based on user behaviors
CN114385815A (en) News screening method, device, equipment and storage medium based on business requirements
CN114595321A (en) Question marking method and device, electronic equipment and storage medium
CN114996386A (en) Business role identification method, device, equipment and storage medium
CN114219367A (en) User scoring method, device, equipment and storage medium
CN113656690A (en) Product recommendation method and device, electronic equipment and readable storage medium
CN112631589A (en) Application program home page layout configuration method and device, electronic equipment and storage medium
CN112580505A (en) Method and device for identifying opening and closing states of network points, electronic equipment and storage medium
CN113656703B (en) Intelligent recommendation method, device, equipment and storage medium based on new online courses
CN111814962B (en) Parameter acquisition method and device for identification model, electronic equipment and storage medium
CN115221274A (en) Text emotion classification method and device, electronic equipment and storage medium
CN115082736A (en) Garbage identification and classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant