CN111598712A - Training and searching method for data feature generator in social media cross-modal search - Google Patents
- Publication number
- CN111598712A (application number CN202010418678.7A)
- Authority
- CN
- China
- Prior art keywords
- information
- modal
- generator
- representation
- data information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a training method and a searching method for a data feature generator used in cross-modal search of social media. The training method comprises: obtaining a training sample set; based on the training sample set, using a generator trained by adversarial learning to obtain the representation features of each piece of data information; supervising the generator through a discriminator; optimizing the generator's parameters while the discriminator is fixed, and optimizing the discriminator's parameters while the generator is fixed; and iterating multiple times to obtain the final generator. The searching method comprises: inputting the data information to be searched into the generator to obtain its representation features; traversing the existing data information of the target modality and obtaining the representation features generated for it by the generator; and, based on similarity matching, obtaining the one or more items of existing target-modality data information most similar to the representation features of the data information to be searched. The method adapts to the semantic sparsity of data information in social media and realizes accurate search across cross-modal data information.
Description
Technical Field
The invention relates to the technical field of data search, in particular to a training and searching method for a data feature generator in social media cross-modal search.
Background
A prerequisite for searching cross-modal data content in a social network is mining search features from the social network data. Two strategies are mainly used: manual analysis and mining of search features, and search feature mining based on machine learning. Social media data volumes are huge, and the texts are short and irregular, so they suffer from semantic sparsity; likewise, images in social networks are often low-resolution and incompletely composed, causing a semantic sparsity problem similar to that of social network text. Given these characteristics, manual search feature analysis cannot cope with the huge data volume of a social network, and existing machine learning methods struggle to extract features from semantically sparse texts or images. Searching between data contents of different modalities is therefore difficult.
Disclosure of Invention
In view of this, embodiments of the present invention provide a training method and a searching method for a data feature generator in social media cross-modal search, to solve the prior-art problem that data information in social media cannot be searched across modalities.
The technical scheme of the invention is as follows:
in one aspect, the present invention provides a training method for a data representation feature generator in a social media cross-modal search, including:
obtaining a training sample set, the training sample set comprising: social media data information of multiple modalities, with the topic to which each piece of data information belongs and its modality as labels; the data information of the multiple modalities comprises text modality information and image modality information;
obtaining, with a generator, the representation features of each piece of data information based on the training sample set, the generator comprising a text modality generator and an image modality generator, which are used to obtain the original features of the data information in the corresponding modality, divide each original feature to obtain a plurality of corresponding local features, and, based on the local features, obtain through a self-attention mechanism the representation features of the data information of each modality within the same representation subspace;
supervising the generator adversarially by means of a discriminator, the loss functions employed by the discriminator comprising: a generation loss function obtained by the weighted sum of an intra-modal semantic loss function and an inter-modal similarity loss function, and a cross-modal discriminant loss function; wherein minimizing the calculated value of the intra-modal semantic loss function minimizes the distribution difference between the representation features and the corresponding topic labels, minimizing the calculated value of the inter-modal similarity loss function maximizes the correlation between the representation features of different-modality data information under the same topic, and minimizing the calculated value of the cross-modal discriminant loss function maximizes the distinction, with respect to modality, between the representation features of different-modality data information;
tuning and optimizing the generator by minimizing a difference between the calculated value of the generation loss function and the calculated value of the cross-modal discriminant loss function; adjusting parameters to optimize the discriminator by maximizing a difference between the calculated value of the generating loss function and the calculated value of the cross-modal discriminant loss function; and carrying out multiple iterations to obtain a final generator.
In some embodiments, obtaining raw features of data information of multiple modalities includes:
obtaining the TF-IDF features of the text modality information as its original features and the convolution features of the image modality information as its original features, and recording the original features of all data information as X = {x_t^1, x_t^2, …, x_t^M, x_v^1, x_v^2, …, x_v^N}, where x_t^m is the original feature of the m-th item of text modality information, x_v^n is the original feature of the n-th item of image modality information, 1 ≤ m ≤ M, 1 ≤ n ≤ N, and M and N are positive integers.
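As a rough illustration of the text-side original features, the toy function below computes dense TF-IDF vectors by hand. This is a minimal sketch: a real system would use a proper tokenizer and a fixed vocabulary, and the image-side original features would come from a pretrained convolutional network instead. The documents and words here are hypothetical.

```python
import math
from collections import Counter

def tfidf_features(docs):
    """Toy TF-IDF extractor: one dense vector per short social-media text."""
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    # document frequency of each vocabulary word
    df = {w: sum(1 for d in docs if w in set(d.split())) for w in vocab}
    feats = []
    for d in docs:
        counts = Counter(d.split())
        total = len(d.split())
        # tf * idf; idf = log(n / df) is 0 for words present in every document
        feats.append([(counts[w] / total) * math.log(n / df[w]) for w in vocab])
    return vocab, feats

docs = ["flood rescue news", "flood photo news", "concert news"]
vocab, X_t = tfidf_features(docs)
```

Note how the word shared by all three documents ("news") gets zero weight, which is exactly the sparsity-friendly behaviour TF-IDF is chosen for.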
In some embodiments, segmenting each original feature to obtain a corresponding plurality of local features, and acquiring the representation features of the data information of each modality in the same representation subspace through a self-attention mechanism based on the local features includes:
dividing the TF-IDF features of the text modality information and the convolution features of the image modality information into k blocks each, recorded as x_t^m = {b_t^{m,1}, b_t^{m,2}, …, b_t^{m,k}} and x_v^n = {b_v^{n,1}, b_v^{n,2}, …, b_v^{n,k}}, where b_t^{m,k} is the k-th text semantic feature block of the m-th item of text modality information and b_v^{n,k} is the k-th image semantic feature block of the n-th item of image modality information;
using functions f_t and g_t to map the segmented text semantic feature blocks into the representation subspace, f_t(b_t^{m,i}) = w_t^f b_t^{m,i} and g_t(b_t^{m,j}) = w_t^g b_t^{m,j}, where w_t^f and w_t^g are the parameter vectors of f_t and g_t;
the attention parameter between the i-th and the j-th text semantic feature block of the m-th item of text modality information is β_t^{m,i,j} = exp(f_t(b_t^{m,i})^T g_t(b_t^{m,j})) / Σ_{j=1}^{k} exp(f_t(b_t^{m,i})^T g_t(b_t^{m,j}));
the output feature of the i-th text semantic feature block of the m-th item of text modality information is o_t^{m,i} = Σ_{j=1}^{k} β_t^{m,i,j} g_t(b_t^{m,j});
the representation feature of the m-th item of text modality information is s_t^m = {o_t^{m,1}, o_t^{m,2}, …, o_t^{m,k}};
using functions f_v and g_v to map the segmented image semantic feature blocks into the representation subspace, f_v(b_v^{n,i}) = w_v^f b_v^{n,i} and g_v(b_v^{n,j}) = w_v^g b_v^{n,j}, where w_v^f and w_v^g are the parameter vectors of f_v and g_v;
the attention parameter between the i-th and the j-th image semantic feature block of the n-th item of image modality information is β_v^{n,i,j} = exp(f_v(b_v^{n,i})^T g_v(b_v^{n,j})) / Σ_{j=1}^{k} exp(f_v(b_v^{n,i})^T g_v(b_v^{n,j}));
the output feature of the i-th image semantic feature block of the n-th item of image modality information is o_v^{n,i} = Σ_{j=1}^{k} β_v^{n,i,j} g_v(b_v^{n,j});
the representation feature of the n-th item of image modality information is s_v^n = {o_v^{n,1}, o_v^{n,2}, …, o_v^{n,k}}.
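The per-block attention computation can be sketched in NumPy. This is a minimal sketch under two assumptions not spelled out in the text: that f and g are linear projections, and that the attention parameters are softmax-normalised dot products. The matrices, dimensions, and k below are illustrative.

```python
import numpy as np

def self_attention_blocks(blocks, w_f, w_g):
    """Self-attention over k local feature blocks of one data item.

    blocks: (k, d) array of local features b^1..b^k
    w_f, w_g: (d, d') projection matrices for the maps f and g
    returns: (k, d') output features o^1..o^k in the representation subspace
    """
    f = blocks @ w_f                 # f(b_i) for every block
    g = blocks @ w_g                 # g(b_j) for every block
    scores = f @ g.T                 # scores[i, j] = f(b_i) . g(b_j)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # attention parameters beta[i, j]
    return attn @ g                  # o_i = sum_j beta[i, j] * g(b_j)

rng = np.random.default_rng(0)
blocks = rng.normal(size=(4, 8))     # k = 4 local blocks of dimension 8
w_f = rng.normal(size=(8, 8))
w_g = rng.normal(size=(8, 8))
out = self_attention_blocks(blocks, w_f, w_g)
```

The same function serves both modalities, since after projection the blocks live in the shared representation subspace.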
In some embodiments, the intra-modal semantic loss function is:
L_label = −(1/M) Σ_{i=1}^{M} y_t^i · log f_l(s_t^i(x_t^i; θ_t)) − (1/N) Σ_{j=1}^{N} y_v^j · log f_l(s_v^j(x_v^j; θ_v)),
where y_t^i and y_v^j respectively denote the one-hot topic label vectors, under the same topic, of the i-th item of text modality information and the j-th item of image modality information in the training sample set; s_t^i(x_t^i; θ_t) is the representation feature of the i-th item of text modality information produced by the text modality generator with parameter set θ_t, and x_t^i is its original feature; s_v^j(x_v^j; θ_v) is the representation feature of the j-th item of image modality information produced by the image modality generator with parameter set θ_v, and x_v^j is its original feature; M and N are the numbers of items of text modality information and image modality information in the training sample set; and the function f_l(·) passes the representation feature through a fully connected neural network so that its dimension matches that of y_t^i and y_v^j for the multiplication.
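A toy version of this loss can be written directly. The cross-entropy form is an assumption (the text only states that the loss minimises the distribution difference between representations and topic labels); the single matrix `w_fc` stands in for the fully connected network f_l, and all shapes are illustrative.

```python
import numpy as np

def intra_modal_semantic_loss(reps, labels, w_fc):
    """Toy intra-modal semantic loss for one modality.

    reps:   (n, d) representation features from one modality's generator
    labels: (n, c) one-hot topic label vectors
    w_fc:   (d, c) fully connected layer matching dimensions to the labels
    """
    logits = reps @ w_fc
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)      # softmax over topics
    # cross-entropy between predicted topic distribution and one-hot label
    return float(-np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1)))

rng = np.random.default_rng(1)
reps = rng.normal(size=(5, 6))                 # 5 items, 6-dim representations
labels = np.eye(3)[rng.integers(0, 3, size=5)] # 3 hypothetical topics
loss = intra_modal_semantic_loss(reps, labels, rng.normal(size=(6, 3)))
```

The full L_label would sum this over both the text and the image modality, normalised by M and N respectively.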
In some embodiments, the inter-modal similarity loss function is:
L_similarity = Σ_{(i,j): y_t^i = y_v^j} ||s_t^i(x_t^i; θ_t) − s_v^j(x_v^j; θ_v)||_2,
where the sum runs over pairs of text and image modality information under the same topic; y_t^i and y_v^j respectively denote the one-hot topic label vectors of the i-th item of text modality information and the j-th item of image modality information in the training sample set; s_t^i(x_t^i; θ_t) is the representation feature of the i-th item of text modality information produced by the text modality generator with parameter set θ_t, and x_t^i is its original feature; s_v^j(x_v^j; θ_v) is the representation feature of the j-th item of image modality information produced by the image modality generator with parameter set θ_v, and x_v^j is its original feature; M is the number of items of text modality information and N the number of items of image modality information in the training sample set;
the generation loss function is L_generation = α·L_label + β·L_similarity, where α and β are the weight coefficients of the intra-modal semantic loss function and the inter-modal similarity loss function, respectively.
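The weighted combination can be sketched as follows. The ℓ2 pull-together form of the similarity term and the weight values α, β are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def inter_modal_similarity_loss(s_t, s_v):
    """Assumed l2 form: mean distance between paired same-topic text and
    image representations (smaller = more correlated)."""
    return float(np.mean(np.linalg.norm(s_t - s_v, axis=1)))

def generation_loss(l_label, l_similarity, alpha=1.0, beta=0.5):
    """L_generation = alpha * L_label + beta * L_similarity."""
    return alpha * l_label + beta * l_similarity

# two hypothetical same-topic text/image representation pairs
s_t = np.array([[1.0, 0.0], [0.0, 1.0]])
s_v = np.array([[1.0, 0.0], [0.0, 0.0]])
l_sim = inter_modal_similarity_loss(s_t, s_v)   # (0 + 1) / 2 = 0.5
total = generation_loss(0.7, l_sim)             # 1.0*0.7 + 0.5*0.5 = 0.95
```

In practice α and β would be tuned so that neither topic fidelity nor cross-modal alignment dominates.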
In some embodiments, the cross-modal discriminant loss function is:
L_discriminant = −(1/E) Σ_{e=1}^{E} [ c_e · log p(s_t^e(x_t^e; θ_t); θ_p) + (1 − c_e) · log(1 − p(s_v^e(x_v^e; θ_v); θ_p)) ],
where c_e is the one-hot modality label of the searched target data information; s_t^e(x_t^e; θ_t) is the representation feature of the e-th item of text modality information produced by the text modality generator with parameter set θ_t, and x_t^e is its original feature; s_v^e(x_v^e; θ_v) is the representation feature of the e-th item of image modality information produced by the image modality generator with parameter set θ_v, and x_v^e is its original feature; during training, text modality information and image modality information are input in pairs, and E is the number of data pairs; the function p(·; θ_p), with parameter set θ_p, maps the representation features of each item of text and image modality information into the same representation subspace and predicts its modality.
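A toy discriminator loss can illustrate this. The binary cross-entropy form and the single linear layer `w_d` (standing in for the projection with parameter set θ_p) are assumptions; text is labelled 1 and image 0 purely for illustration.

```python
import numpy as np

def discriminant_loss(s_t, s_v, w_d):
    """Toy cross-modal discriminant loss: a logistic discriminator tries
    to label text representations 1 and image representations 0."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    p_t = sigmoid(s_t @ w_d)     # predicted probability "modality = text"
    p_v = sigmoid(s_v @ w_d)
    eps = 1e-12                  # guard against log(0)
    return float(-np.mean(np.log(p_t + eps)) - np.mean(np.log(1.0 - p_v + eps)))

rng = np.random.default_rng(2)
s_t = rng.normal(size=(4, 6))    # 4 text representations (paired input)
s_v = rng.normal(size=(4, 6))    # 4 image representations
loss = discriminant_loss(s_t, s_v, rng.normal(size=6))
```

The discriminator minimises this loss; the generators, conversely, are rewarded when the discriminator can no longer tell the modalities apart.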
On the other hand, the invention also provides a social media cross-modal data information searching method, which comprises the following steps:
inputting data information to be searched into a generator to obtain representation characteristics of the data information to be searched;
wherein the generator is obtained through adversarial learning based on a training sample set; the training sample set comprises social media data information of multiple modalities, with the topic to which each piece of data information belongs and its modality as labels; the data information of the multiple modalities comprises text modality information and image modality information; the generator comprises a text modality generator and an image modality generator, which are used to obtain the original features of the data information in the corresponding modality, divide each original feature to obtain a plurality of corresponding local features, and, based on the local features, obtain through a self-attention mechanism the representation features of the data information of each modality within the same representation subspace; the generator is supervised adversarially by means of a discriminator, the loss functions employed by the discriminator comprising: a generation loss function obtained by the weighted sum of an intra-modal semantic loss function and an inter-modal similarity loss function, and a cross-modal discriminant loss function; wherein minimizing the calculated value of the intra-modal semantic loss function minimizes the distribution difference between the representation features and the corresponding topic labels, minimizing the calculated value of the inter-modal similarity loss function maximizes the correlation between the representation features of different-modality data information under the same topic, and minimizing the calculated value of the cross-modal discriminant loss function maximizes the distinction, with respect to modality, between the representation features of different-modality data information; the generator's parameters are tuned and optimized by minimizing the difference between the calculated value of the generation loss function and the calculated value of the cross-modal discriminant loss function; the discriminator's parameters are tuned and optimized by maximizing the difference between the calculated value of the generation loss function and the calculated value of the cross-modal discriminant loss function; and multiple iterations yield the final generator;
traversing the existing data information of the target mode, and acquiring the representation characteristics of the existing data information generated by the generator;
and acquiring the existing data information of one or more target modes which are most similar to the representation characteristics of the data information to be searched based on similarity matching.
In some embodiments, obtaining the existing data information of one or more target modalities closest to the representation features of the data information to be searched based on similarity matching includes:
based on the representation features of the data information to be searched and the representation features corresponding to the existing data information of the target modality, calculating the L2 norm of the cross-modal match as the similarity:
d(i, j) = ||s_t^i(x_t^i; θ̂_t) − s_v^j(x_v^j; θ̂_v)||_2,
where s_t^i(x_t^i; θ̂_t) is the representation feature of the i-th item of text modality information produced by the text modality generator with trained parameter set θ̂_t, and x_t^i is its original feature; s_v^j(x_v^j; θ̂_v) is the representation feature of the j-th item of image modality information produced by the image modality generator with trained parameter set θ̂_v, and x_v^j is its original feature; one of s_t^i and s_v^j is fixed as the representation feature of the data information to be searched in its modality, and the other ranges over the representation features of the existing data in the target modality;
and sorting the existing data information by similarity, and obtaining the one or more items of existing target-modality data information with the highest similarity to the data information to be searched.
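The ranking step reduces to a nearest-neighbour search over the shared representation subspace, which can be sketched directly; the 2-D gallery and query vectors below are hypothetical stand-ins for generator outputs.

```python
import numpy as np

def search(query_rep, target_reps, top_k=3):
    """Rank existing target-modality items by l2 distance to the query's
    representation feature; smaller distance means higher similarity."""
    dists = np.linalg.norm(target_reps - query_rep, axis=1)
    order = np.argsort(dists)            # ascending: most similar first
    return order[:top_k], dists[order[:top_k]]

# hypothetical 2-D representation features of three existing image items
gallery = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
idx, d = search(np.array([0.0, 0.05]), gallery, top_k=2)
```

Because both modalities are mapped into the same subspace, the same routine answers text-to-image and image-to-text queries.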
In another aspect, the present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above-mentioned method.
The method has the advantages that the generator maps the representation features of text modality information and image modality information separately under a self-attention mechanism, extracting the semantic features of cross-modal data content in social media within the same representation subspace. Based on generative adversarial learning, the discriminator's supervision improves how accurately the representation features produced by the generator map to the corresponding topics, both between data information of the same modality and between data information of different modalities, while the distributions of the representation features of different-modality data information under the same topic are differentiated. The method thus adapts to the semantic sparsity of data information in social media and improves the accuracy of search between cross-modal data information.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. For purposes of illustrating and describing some portions of the present invention, corresponding parts of the drawings may be exaggerated, i.e., may be larger, relative to other components in an exemplary apparatus actually manufactured according to the present invention. In the drawings:
FIG. 1 is a schematic flowchart illustrating a method for training a data feature generator in a cross-modal search of social media according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a logic structure of a training method for a data feature generator in a cross-modal search of social media according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram illustrating iterative optimization of a training method for a data feature generator in a cross-modal search of social media according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a social media cross-modality data information search method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted herein that the term "coupled," if not specifically stated, may refer herein to not only a direct connection, but also an indirect connection in which an intermediate is present.
It should be noted that "modality" in the present invention refers to the form of data information, which may include: text information, image information, audio information, or video information. "Topic" refers to the semantically directed content of data information, representing the matters discussed and followed by each medium in social interaction; for example, a particular news topic contains the pieces of text information, image information, audio information, or video information associated with it.
Because data information in social media is semantically sparse — text modality data is short and irregular, and image modality information has low resolution and incomplete composition — cross-modal search of data information in social media is difficult to realize. In the prior art, searches over cross-modal data information in social media either fail to achieve high precision under such semantic sparsity, or rely on search and analysis processes that are overly complex and hard to implement.
The invention provides a training and searching method for a data feature generator in cross-modal search of social media, which is used for extracting representation features of data information in the social media, realizing the search of the cross-modal data information in a similarity comparison mode, improving the precision of search matching, simplifying the search implementation process and improving the efficiency.
In one aspect, the present invention provides a training method for a data representation feature generator in cross-modal search of social media, in which the generator used to extract the representation features of data information in social media is produced by adversarial learning. As shown in Fig. 1, the training method comprises steps S101 to S104. It should be noted that the step numbering does not limit the order of execution: during training, steps S101 to S104 may in some cases be performed concurrently or in a different order:
step S101: obtaining a training sample set, the training sample set comprising: social media data information of multiple modalities, and topics to which the data information belongs and corresponding modalities are used as tags; wherein, the data information of the plurality of modalities comprises: text modality information and image modality information.
Step S102: based on the training sample set, a generator is adopted to obtain the representation features of each piece of data information. The generator comprises a text modality generator and an image modality generator, which are used to obtain the original features of the data information in the corresponding modality, divide each original feature to obtain a plurality of corresponding local features, and, based on the local features, obtain through a self-attention mechanism the representation features of the data information of each modality within the same representation subspace.
Step S103: the generator is supervised adversarially by means of a discriminator. The loss functions employed by the discriminator comprise: a generation loss function obtained by the weighted sum of an intra-modal semantic loss function and an inter-modal similarity loss function, and a cross-modal discriminant loss function. Minimizing the calculated value of the intra-modal semantic loss function minimizes the distribution difference between the representation features and the corresponding topic labels; minimizing the calculated value of the inter-modal similarity loss function maximizes the correlation between the representation features of different-modality data information under the same topic; and minimizing the calculated value of the cross-modal discriminant loss function maximizes the distinction, with respect to modality, between the representation features of different-modality data information.
Step S104: the generator is optimized by adjusting its parameters to minimize the difference between the calculated value of the generation loss function and the calculated value of the cross-modal discriminant loss function; the discriminator is optimized by adjusting its parameters to maximize that difference; multiple iterations yield the final generator.
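The alternation in step S104 can be illustrated structurally on a 1-D toy problem. This is only a sketch of the min-max alternation, not the patent's networks or losses: `g` plays the generator parameter trying to match a "real" statistic, `d` a damped linear critic opposing it, and the constants are arbitrary.

```python
def adversarial_train(steps=200, lr=0.05, decay=0.9):
    """Alternating optimisation on a toy objective J(g, d) = d * (2 - g):
    the critic d ascends on J (with damping), the generator g descends."""
    real_mean = 2.0
    g, d = 0.0, 0.1
    for _ in range(steps):
        # fix the generator, update the critic toward the current gap
        d = decay * d + lr * (real_mean - g)
        # fix the critic, move the generator along the critic's signal
        g = g + lr * d
    return g, d

g, d = adversarial_train()
```

Even in this toy, the generator parameter settles near the target (g ≈ 2) only because the two updates alternate; updating either player alone would not converge to the equilibrium.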
In step S101, data information in social media is taken as data items, a training sample set is established for adversarial learning, and the topic and modality corresponding to each piece of data information are marked as labels. To reflect the characteristics of social media cross-modal search, the data information comprises at least the two forms of text modality information and image modality information; in other embodiments, to meet higher retrieval requirements, the data information may also comprise audio modality information and/or video information. The topic labels and modality labels corresponding to the data information may be marked in one-hot encoded form; in other embodiments, they may also be marked with other forms of label encoding as the situation requires. In some embodiments, the numbers of items of text modality information and image modality information in the training sample set are equal by default.
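The one-hot labelling of step S101 is straightforward to sketch; the topic names and index conventions below are hypothetical, chosen only to show the encoding.

```python
import numpy as np

def one_hot(index, size):
    """One-hot encode a topic or modality index as the embodiment's label format."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

topics = ["flood", "election", "concert"]   # hypothetical topic set
modalities = ["text", "image"]
topic_label = one_hot(topics.index("election"), len(topics))
modal_label = one_hot(modalities.index("image"), len(modalities))
```

Each training item thus carries two such vectors: one consumed by the topic-oriented losses, the other by the modality discriminator.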
In step S102, as shown in fig. 2, an adversarial learning method is adopted: the generator collects the representation features of each data item for similarity comparison, thereby implementing cross-modal search. Specifically, because of the great difference in data form between different modalities, features extracted independently from different-modality data are inconsistent in form, content, meaning and standard; they do not share the same evaluation dimension, cannot be directly compared, and therefore cannot be used for mutual retrieval. To search across modalities, it is necessary to obtain features in the same evaluation dimension, i.e., features within the same representation subspace, for the data of every modality.
In the present embodiment, the generator first obtains the original features directly generated from each data item; the form and collection method of the original features are determined by the modality of the corresponding data. For example, text modality information may use TF-IDF (term frequency-inverse document frequency) features as its original features, and image modality information may use convolution features (VGGNet convolutional neural network features) as its original features. Following the principle of the self-attention mechanism, the original features of each piece of text or image modality information are divided into several local features. For a single piece of text or image modality information, the attention parameter of each local feature with respect to every other local feature is obtained; within a unified representation subspace, these attention parameters are multiplied with the projected local features and accumulated, so that the attention of each local feature is expressed and its corresponding output feature is obtained. The feature vector combining the output features of all local features serves as the final representation feature, which jointly reflects the semantics and the modality of the data item.
Specifically, the training sample set contains data information on several topics, and the data spans two modalities, text and image. Define the training sample set as C = {t_1, t_2, ..., t_m, v_1, v_2, ..., v_n}, where t_m denotes the m-th piece of text modality information and v_n the n-th piece of image modality information. At the same time, the topic and modality corresponding to each piece of text and image modality information are labeled as tags in the training sample set. Topic tags and modality tags can be labeled as one-hot encoded vectors: Q states are encoded with a Q-bit state register, each state has its own register bit, and only one bit is active at a time, representing that state. For example, with 5 topic categories encoded by a 5-bit register, data belonging to the first category carries the tag [1,0,0,0,0]. The modality can be encoded with a 2-bit register, so that [1,0] marks the text modality and [0,1] marks the image modality.
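As a minimal illustration of this register-style encoding (the helper name `one_hot` is ours, not from the patent):

```python
import numpy as np

def one_hot(index: int, num_classes: int) -> np.ndarray:
    """Return a one-hot vector with a single active register bit."""
    v = np.zeros(num_classes, dtype=np.float32)
    v[index] = 1.0
    return v

# 5 topic categories: an item of the first category gets [1, 0, 0, 0, 0]
topic_label = one_hot(0, 5)

# 2 modalities: [1, 0] marks text, [0, 1] marks image
text_modality = one_hot(0, 2)
image_modality = one_hot(1, 2)
```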
In some embodiments, obtaining raw features of data information of multiple modalities includes:
obtaining the TF-IDF features of the text modality information as its original features and the convolution features of the image modality information as its original features, and recording the original features of all data information as X = {x_t^1, x_t^2, ..., x_t^M, x_v^1, x_v^2, ..., x_v^N}, where x_t^m is the original feature of the m-th piece of text modality information and x_v^n is the original feature of the n-th piece of image modality information, with 1 ≤ m ≤ M, 1 ≤ n ≤ N, and M, N positive integers.
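For concreteness, a tiny TF-IDF computation over tokenized documents might look as follows; the exact TF-IDF variant (normalized term frequencies, unsmoothed idf) is an assumption, since the patent only names the feature type:

```python
import math
from collections import Counter

def tfidf_features(docs, vocab):
    """Compute one TF-IDF vector per document over a fixed vocabulary.
    tf = term count / document length; idf = log(N / df),
    where df is the number of documents containing the term."""
    n_docs = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    feats = []
    for d in docs:
        counts = Counter(d)
        vec = [
            (counts[w] / len(d)) * math.log(n_docs / df[w]) if df[w] else 0.0
            for w in vocab
        ]
        feats.append(vec)
    return feats
```

Terms that occur in every document (here "a") receive idf = 0 and thus carry no weight, which matches the sparse, discriminative character of social media text features.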
In some embodiments, as shown in fig. 2, each original feature is segmented to obtain a plurality of corresponding local features, and the representation features of the data of each modality in the same representation subspace are obtained through the self-attention mechanism based on these local features, comprising S201 to S208, where S202 to S204 generate the text modality representation features and S205 to S207 generate the image modality representation features:
S201: divide the TF-IDF features of the text modality information and the convolution features of the image modality information into k blocks each, recorded as: x_t^m = {b_t^{m,1}, b_t^{m,2}, ..., b_t^{m,k}} and x_v^n = {b_v^{n,1}, b_v^{n,2}, ..., b_v^{n,k}}, where b_t^{m,k} is the k-th text semantic feature block of the m-th piece of text modality information and b_v^{n,k} is the k-th image semantic feature block of the n-th piece of image modality information.
S202: use the functions f_t and g_t to convert the segmented text semantic features into features in the representation subspace: f_t(b_t^{m,i}) = w_f^t b_t^{m,i}, g_t(b_t^{m,j}) = w_g^t b_t^{m,j}, where w_f^t and w_g^t are the parameter vectors of f_t and g_t.
S203: calculate the attention parameter between the i-th and the j-th text semantic feature blocks of the m-th piece of text modality information as: α_{i,j}^{t,m} = exp(f_t(b_t^{m,i})^T g_t(b_t^{m,j})) / Σ_{j'=1}^{k} exp(f_t(b_t^{m,i})^T g_t(b_t^{m,j'})).
S204: calculate the output feature of the i-th text semantic feature block of the m-th piece of text modality information as: o_t^{m,i} = Σ_{j=1}^{k} α_{i,j}^{t,m} h_t(b_t^{m,j}), and output the representation feature of the m-th piece of text modality information as: s_t^m = {o_t^{m,1}, o_t^{m,2}, ..., o_t^{m,k}}.
S205: use the functions f_v and g_v to convert the segmented image semantic features into features in the representation subspace: f_v(b_v^{n,i}) = w_f^v b_v^{n,i}, g_v(b_v^{n,j}) = w_g^v b_v^{n,j}, where w_f^v and w_g^v are the parameter vectors of f_v and g_v.
S206: calculate the attention parameter between the i-th and the j-th image semantic feature blocks of the n-th piece of image modality information as: α_{i,j}^{v,n} = exp(f_v(b_v^{n,i})^T g_v(b_v^{n,j})) / Σ_{j'=1}^{k} exp(f_v(b_v^{n,i})^T g_v(b_v^{n,j'})).
S207: calculate the output feature of the i-th image semantic feature block of the n-th piece of image modality information as: o_v^{n,i} = Σ_{j=1}^{k} α_{i,j}^{v,n} h_v(b_v^{n,j}). The representation feature of the n-th piece of image modality information is: s_v^n = {o_v^{n,1}, o_v^{n,2}, ..., o_v^{n,k}}.
In this embodiment, the representation features expressing the semantics of the text and image modality information are extracted through the self-attention mechanism, and the evaluation dimensions are unified, so that the representation features of data information across modalities lie in the same representation subspace and cross-modal search becomes possible. The functions f_t, g_t and h_t convert each local feature of the original text modality features into the representation subspace, and the functions f_v, g_v and h_v convert each local feature of the original image modality features into the same representation subspace, thereby unifying the evaluation dimensions. The parameter vectors corresponding to f_t, g_t, h_t and f_v, g_v, h_v are all updated iteratively during adversarial learning, under the supervision of the discriminator, until they reach their optimal values.
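The S201 to S207 pipeline can be sketched with plain NumPy as follows; the matrix shapes, the softmax normalization, and treating f, g, h as linear projections are assumptions consistent with standard self-attention, not details fixed by the patent:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_attention_representation(blocks, Wf, Wg, Wh):
    """Map k local feature blocks into a shared representation subspace.
    blocks: (k, d) array of semantic feature blocks;
    Wf, Wg, Wh: (d', d) projection matrices standing in for the
    patent's functions f, g, h with their parameter vectors."""
    f = blocks @ Wf.T          # (k, d') projections used as "queries"
    g = blocks @ Wg.T          # (k, d') projections used as "keys"
    h = blocks @ Wh.T          # (k, d') projections used as "values"
    outputs = []
    for i in range(len(blocks)):
        # attention of block i over every block j, normalized with softmax
        attn = softmax(f[i] @ g.T)
        # accumulate the attention-weighted products to get output o_i
        outputs.append(attn @ h)
    return np.stack(outputs)   # representation S = {o_1, ..., o_k}

rng = np.random.default_rng(0)
k, d, dp = 4, 8, 6             # k blocks, raw dim d, subspace dim d'
blocks = rng.normal(size=(k, d))
Wf, Wg, Wh = (rng.normal(size=(dp, d)) for _ in range(3))
S = self_attention_representation(blocks, Wf, Wg, Wh)
```

The same routine serves both modalities; only the input blocks (TF-IDF segments vs. convolution-feature segments) and the learned projection matrices differ.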
In step S103, as shown in fig. 2, optimization of the generator is achieved by a system in which the discriminator and the generator form adversarial learning. In particular, through the supervision of the discriminator, the invention aims to adjust the generator so that the representation features generated from social media data information achieve three effects: 1. the distribution difference between the representation features and the corresponding topic labels is minimized, i.e., the representation feature of each data item accurately represents its topic; 2. the topic-related correlation between representation features of different-modality data under the same topic is maximized, i.e., such features converge in the semantic dimension; 3. the distinction of the representation features with respect to modality is strengthened, i.e., representation features of different-modality data under the same topic remain differentiated in the modality dimension. To make the generated representation features achieve these three effects, supervised learning is performed through the discriminator; specifically, a generation loss function obtained by the weighted sum of the intra-modal semantic loss function and the inter-modal similarity loss function is combined with a cross-modal discriminant loss function.
In some embodiments, an intra-modal semantic loss function is used to minimize the distribution difference between the representation features and the corresponding topic tags. It takes the cross-entropy form:
L_label = -(1/M) Σ_{i=1}^{M} y_t^i · log p(G_t(x_t^i; θ_t)) - (1/N) Σ_{j=1}^{N} y_v^j · log p(G_v(x_v^j; θ_v))
where y_t^i and y_v^j are respectively the one-hot topic label vectors of the i-th piece of text modality information and the j-th piece of image modality information in the training sample set under the same topic; G_t(x_t^i; θ_t) is the representation feature of the i-th piece of text modality information produced by the text modality generator with parameter set θ_t, x_t^i being its original feature; G_v(x_v^j; θ_v) is the representation feature of the j-th piece of image modality information produced by the image modality generator with parameter set θ_v, x_v^j being its original feature; M is the number of pieces of text modality information and N the number of pieces of image modality information in the training sample set; and the function p(·) predicts the topic probability distribution of each text or image representation feature, processing the generated representation feature through a fully connected neural network into a dimension that can be multiplied with y_t^i and/or y_v^j.
In some embodiments, an inter-modal similarity loss function is used to maximize the topic-related correlation of representation features between different-modality data information under the same topic. It takes the form of the mean distance between same-topic cross-modal pairs:
L_similarity = (1/(MN)) Σ_{i=1}^{M} Σ_{j=1}^{N} (y_t^i · y_v^j) ||G_t(x_t^i; θ_t) - G_v(x_v^j; θ_v)||_2
where y_t^i and y_v^j are respectively the one-hot topic label vectors of the i-th piece of text modality information and the j-th piece of image modality information in the training sample set under the same topic; G_t(x_t^i; θ_t) is the representation feature of the i-th piece of text modality information produced by the text modality generator with parameter set θ_t, x_t^i being its original feature; G_v(x_v^j; θ_v) is the representation feature of the j-th piece of image modality information produced by the image modality generator with parameter set θ_v, x_v^j being its original feature; M is the number of pieces of text modality information and N the number of pieces of image modality information in the training sample set.
In some embodiments, the intra-modal semantic loss function and the inter-modal similarity loss function are summed with weights to obtain the generation loss function: L_generation = α L_label + β L_similarity, where α and β are the weight coefficients corresponding to the intra-modal semantic loss function and the inter-modal similarity loss function, respectively.
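A hedged sketch of the weighted generation loss, using cross-entropy for the intra-modal term and a same-topic pair distance for the inter-modal term (both concrete forms are our assumptions; the patent only states the losses' goals and the weighted sum):

```python
import numpy as np

def intra_modal_semantic_loss(pred_probs, topic_labels):
    """Cross-entropy between predicted topic distributions and one-hot
    topic labels: small when each representation predicts its topic."""
    eps = 1e-9
    return -np.mean(np.sum(topic_labels * np.log(pred_probs + eps), axis=1))

def inter_modal_similarity_loss(text_reps, image_reps, same_topic):
    """Mean L2 distance between paired text/image representations,
    masked by the same-topic indicator: minimizing it pulls same-topic
    features together across modalities."""
    dists = np.linalg.norm(text_reps - image_reps, axis=1)
    return np.mean(dists * same_topic)

def generation_loss(l_label, l_similarity, alpha=1.0, beta=1.0):
    """L_generation = alpha * L_label + beta * L_similarity."""
    return alpha * l_label + beta * l_similarity
```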
In some embodiments, the cross-modal discriminant loss function is used to strengthen the distinction, with respect to modality, between representation features of different-modality data information under the same topic. It takes the form:
L_cross = -(1/E) Σ_{e=1}^{E} c_e · [log D(G_t(x_t^e; θ_t); θ_p) + log(1 - D(G_v(x_v^e; θ_v); θ_p))]
where c_e is the one-hot modality label of the searched target data information; G_t(x_t^e; θ_t) is the representation feature of the e-th piece of text modality information produced by the text modality generator with parameter set θ_t, x_t^e being its original feature; G_v(x_v^e; θ_v) is the representation feature of the e-th piece of image modality information produced by the image modality generator with parameter set θ_v, x_v^e being its original feature. During training, text modality information and image modality information are input in pairs, and E is the number of data pairs. The function D(·; θ_p), under the control of the parameter set θ_p, converts the representation features of each piece of text and image modality information into the same representation subspace.
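One plausible realization of the cross-modal discriminant term is an ordinary binary cross-entropy over a modality discriminator D; this concrete form is an assumption:

```python
import numpy as np

def cross_modal_discriminant_loss(d_text, d_image):
    """Binary cross-entropy of a hypothetical modality discriminator D:
    d_text = D(s_t) should approach 1 (text), d_image = D(s_v) should
    approach 0 (image). Small loss means the modalities stay distinguishable."""
    eps = 1e-9
    return -np.mean(np.log(d_text + eps) + np.log(1.0 - d_image + eps))
```

When the discriminator is confident and correct the loss is near zero; when it cannot tell text from image representations the loss grows, which is exactly what the generator exploits in the adversarial game.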
In this embodiment, the search process divides into two cases: searching images by text (the searched target data information is image modality information) and searching text by images (the searched target data information is text modality information). The modalities must therefore remain distinguishable, which is the role of the cross-modal discriminant loss function.
In step S104, as shown in fig. 3, the generator is optimized with the discriminator fixed, and then the discriminator is optimized with the generator fixed; multiple iterations yield a better generator and realize an effective, complete adversarial learning process.
Specifically, in this embodiment, the optimized parameter set θ_t of the text modality generator and the parameter set θ_v of the image modality generator are obtained by minimizing the difference between the generation loss function and the cross-modal discriminant loss function, namely:
(θ_t, θ_v) = arg min_{θ_t, θ_v} (L_generation - L_cross)
and the optimized parameter set θ_p of the discriminator is obtained by maximizing the difference between the generation loss function and the cross-modal discriminant loss function, namely:
θ_p = arg max_{θ_p} (L_generation - L_cross)
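The alternating min-max schedule of step S104 can be illustrated on a toy scalar value function; the quadratic gap below merely stands in for the "generation loss minus discriminant loss" difference, with one scalar parameter per player:

```python
# Toy alternating optimization on V(g, p) = (g - p)**2: the generator
# parameter g takes a gradient-descent step on V (minimization), then the
# discriminator parameter p takes a gradient-ascent step on V (maximization).
def train(g, p, steps=100, lr=0.1):
    for _ in range(steps):
        grad_g = 2.0 * (g - p)    # dV/dg: generator minimizes V
        g -= lr * grad_g
        grad_p = -2.0 * (g - p)   # dV/dp: discriminator maximizes V
        p += lr * grad_p
    return g, p
```

Even in this toy, the alternating updates contract the gap between the two players, mirroring how the generator and discriminator of the patent are tuned in turns over multiple iterations.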
On the other hand, the invention also provides a social media cross-modal data information searching method, as shown in fig. 4, including steps S301 to S303:
Step S301: input the data information to be searched into a generator to obtain the representation features of the data information to be searched.
The generator is obtained by adversarial learning based on a training sample set. The training sample set includes social media data information of multiple modalities, with the topic and modality of each data item serving as tags; the multiple modalities include text modality information and image modality information. The generator comprises a text modality generator and an image modality generator, which acquire the original features of the data information in their corresponding modalities, segment each original feature into a plurality of local features, and, based on those local features, obtain through the self-attention mechanism the representation features of the data of each modality in the same representation subspace. The generator is supervised adversarially by a discriminator whose loss functions comprise a generation loss function, obtained by the weighted sum of the intra-modal semantic loss function and the inter-modal similarity loss function, and a cross-modal discriminant loss function. Minimizing the calculated value of the intra-modal semantic loss function minimizes the distribution difference between the representation features and the corresponding topic labels; minimizing the calculated value of the inter-modal similarity loss function maximizes the correlation between representation features of different-modality data under the same topic; and minimizing the calculated value of the cross-modal discriminant loss function maximizes the modality distinction between representation features of different-modality data. The generator's parameters are adjusted and optimized by minimizing the difference between the calculated values of the generation loss function and the cross-modal discriminant loss function; the discriminator's parameters are adjusted and optimized by maximizing that difference; and multiple iterations yield the final generator.
Step S302: traverse the existing data information of the target modality and acquire the representation features of each piece of existing data information generated by the same generator.
Step S303: acquire, based on similarity matching, the one or more pieces of existing data information of the target modality most similar in representation features to the data information to be searched.
Based on the same inventive concept as steps S101 to S104, in step S301 of this embodiment, the generator produced by the above training method for the data representation feature generator in social media cross-modal search is used to collect the representation features of the data information to be searched. In step S302, the existing data is traversed to obtain the representation features of each existing data item generated by the generator of step S301. In step S303, the one or more closest pieces of existing data information of the target modality are obtained by similarity matching.
In some embodiments, step S303, i.e., acquiring the existing data information of one or more target modalities closest to the representation features of the data information to be searched based on similarity matching, includes S3031 to S3032:
S3031: based on the representation features of the data information to be searched and the representation features of the existing data information of the target modality, calculate the L2 norm of the cross-modal match as the similarity:
sim = ||G_t(x_t^i; θ_t) - G_v(x_v^j; θ_v)||_2    (14)
where G_t(x_t^i; θ_t) is the representation feature of the i-th piece of text modality information under the parameter set θ_t of the text modality generator, x_t^i being its original feature, and G_v(x_v^j; θ_v) is the representation feature of the j-th piece of image modality information under the parameter set θ_v of the image modality generator, x_v^j being its original feature. One of the two is fixed as the representation feature of the data information to be searched in its modality, and the other ranges over the representation features of the existing data in the target modality.
S3032: sort the existing data information by similarity and acquire the one or more pieces of existing data information of the target modality with the highest similarity to the data information to be searched.
In the present embodiment, when searching for image modality information based on text modality information, the representation feature of the text to be searched is held fixed; the image modality information among the existing data is traversed, the similarity is calculated with formula (14), and the image modality information is ranked by similarity to obtain the one or more pieces with the highest similarity. Similarly, when searching for text modality information based on image modality information, the representation feature of the image to be searched is held fixed; the text modality information among the existing data is traversed, the similarity is calculated with formula (14), and the text modality information is ranked to obtain the one or more pieces with the highest similarity. The smaller sim is, the higher the similarity.
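The S3031/S3032 ranking can be sketched as a nearest-neighbor search under the L2 norm (the function name and array shapes are our assumptions):

```python
import numpy as np

def search(query_rep, candidate_reps, top_k=3):
    """Rank candidate representations of the target modality by
    sim = ||query - candidate||_2; smaller sim means higher similarity."""
    sims = np.linalg.norm(candidate_reps - query_rep, axis=1)
    order = np.argsort(sims)              # ascending: most similar first
    return order[:top_k], sims[order[:top_k]]

# A text query representation against three image representations
query = np.array([0.0, 0.0])
candidates = np.array([[0.0, 0.1], [5.0, 5.0], [0.2, 0.0]])
idx, sims = search(query, candidates, top_k=2)
```

Because every representation lives in the same subspace, the same routine serves both search directions (text query over image candidates, or image query over text candidates).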
In another aspect, the present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above-mentioned method.
In summary, the training and search method for the data feature generator in social media cross-modal search according to the present invention realizes search across social media cross-modal data information by the adversarial learning method, with emphasis on cross-modal content search between text modality information and image modality information. The adversarially trained generator reconstructs the original features of different-modality data in social media based on the self-attention mechanism and maps them into a representation subspace in which they can be directly compared, realizing cross-modal search. Further, a joint loss function is established through the discriminator: the intra-modal semantic loss function and the inter-modal similarity loss function guide the generated representation features to follow the semantic distribution of the corresponding modality, while the cross-modal discriminant loss function realizes the discrimination of modalities. The method adapts to the semantic sparsity of data information in social media, completes accurate, efficient and stable cross-modal search, and greatly improves efficiency compared with the prior art.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A training method for a data representation feature generator in a social media cross-modal search is characterized by comprising the following steps:
obtaining a training sample set, the training sample set comprising: social media data information of multiple modalities, and topics to which the data information belongs and corresponding modalities are used as tags; wherein, the data information of the plurality of modalities comprises: text modality information and image modality information;
obtaining, with a generator, the representation features of each piece of data information based on the training sample set, the generator including: a text modality generator and an image modality generator, which are used to acquire the original features of the data information in the corresponding modalities, segment each original feature to obtain a plurality of corresponding local features, and obtain, based on the local features and through a self-attention mechanism, the representation features of the data information of each modality in the same representation subspace;
supervising the generator adversarially by means of a discriminator, the loss functions employed by the discriminator comprising: a generation loss function obtained by the weighted sum of an intra-modal semantic loss function and an inter-modal similarity loss function, and a cross-modal discriminant loss function; wherein the distribution difference between the representation features and the corresponding topic labels is minimized by minimizing the calculated value of the intra-modal semantic loss function, the correlation between the representation features of different-modality data information under the same topic is maximized by minimizing the calculated value of the inter-modal similarity loss function, and the distinction with respect to modality between the representation features of different-modality data information is maximized by minimizing the calculated value of the cross-modal discriminant loss function;
adjusting parameters to optimize the generator by minimizing the difference between the calculated value of the generation loss function and the calculated value of the cross-modal discriminant loss function; adjusting parameters to optimize the discriminator by maximizing the difference between the calculated value of the generation loss function and the calculated value of the cross-modal discriminant loss function; and performing multiple iterations to obtain the final generator.
2. The method for training a data representation feature generator in social media cross-modal search according to claim 1, wherein obtaining original features of the data information in corresponding modalities comprises:
obtaining the TF-IDF features of the text modality information as its original features and the convolution features of the image modality information as its original features, and recording the original features of all data information as X = {x_t^1, x_t^2, ..., x_t^M, x_v^1, x_v^2, ..., x_v^N}, where x_t^m is the original feature of the m-th piece of text modality information and x_v^n is the original feature of the n-th piece of image modality information, with 1 ≤ m ≤ M, 1 ≤ n ≤ N, and M, N positive integers.
3. The training method of the data representation feature generator in the social media cross-modal search according to claim 2, wherein the step of segmenting each original feature to obtain a plurality of corresponding local features, and obtaining the representation features of the data information of each modality in the same representation subspace through a self-attention mechanism based on the local features comprises the steps of:
dividing the TF-IDF features of the text modality information and the convolution features of the image modality information into k blocks each, recorded as: x_t^m = {b_t^{m,1}, b_t^{m,2}, ..., b_t^{m,k}} and x_v^n = {b_v^{n,1}, b_v^{n,2}, ..., b_v^{n,k}}, where b_t^{m,k} is the k-th text semantic feature block of the m-th piece of text modality information and b_v^{n,k} is the k-th image semantic feature block of the n-th piece of image modality information;
using the functions f_t and g_t to convert the segmented text semantic features into features in the representation subspace: f_t(b_t^{m,i}) = w_f^t b_t^{m,i}, g_t(b_t^{m,j}) = w_g^t b_t^{m,j}, where w_f^t and w_g^t are the parameter vectors of f_t and g_t;
the attention parameter between the i-th and the j-th text semantic feature blocks of the m-th piece of text modality information being: α_{i,j}^{t,m} = exp(f_t(b_t^{m,i})^T g_t(b_t^{m,j})) / Σ_{j'=1}^{k} exp(f_t(b_t^{m,i})^T g_t(b_t^{m,j'}));
the output feature of the i-th text semantic feature block of the m-th piece of text modality information being: o_t^{m,i} = Σ_{j=1}^{k} α_{i,j}^{t,m} h_t(b_t^{m,j});
the representation feature of the m-th piece of text modality information being: s_t^m = {o_t^{m,1}, o_t^{m,2}, ..., o_t^{m,k}};
using the functions f_v and g_v to convert the segmented image semantic features into features in the representation subspace: f_v(b_v^{n,i}) = w_f^v b_v^{n,i}, g_v(b_v^{n,j}) = w_g^v b_v^{n,j}, where w_f^v and w_g^v are the parameter vectors of f_v and g_v;
the attention parameter between the i-th and the j-th image semantic feature blocks of the n-th piece of image modality information being: α_{i,j}^{v,n} = exp(f_v(b_v^{n,i})^T g_v(b_v^{n,j})) / Σ_{j'=1}^{k} exp(f_v(b_v^{n,i})^T g_v(b_v^{n,j'}));
the output feature of the i-th image semantic feature block of the n-th piece of image modality information being: o_v^{n,i} = Σ_{j=1}^{k} α_{i,j}^{v,n} h_v(b_v^{n,j});
the representation feature of the n-th piece of image modality information being: s_v^n = {o_v^{n,1}, o_v^{n,2}, ..., o_v^{n,k}}.
4. The method for training a data representation feature generator in social media cross-modal search according to claim 1, wherein the intra-modal semantic loss function is:
L_label = -(1/M) Σ_{i=1}^{M} y_t^i · log p(G_t(x_t^i; θ_t)) - (1/N) Σ_{j=1}^{N} y_v^j · log p(G_v(x_v^j; θ_v))
wherein y_t^i and y_v^j are respectively the one-hot topic label vectors of the i-th piece of text modality information and the j-th piece of image modality information in the training sample set under the same topic; G_t(x_t^i; θ_t) is the representation feature corresponding to the i-th piece of text modality information under the parameter set θ_t of the text modality generator, x_t^i being the original feature of the i-th piece of text modality information; G_v(x_v^j; θ_v) is the representation feature corresponding to the j-th piece of image modality information under the parameter set θ_v of the image modality generator, x_v^j being the original feature of the j-th piece of image modality information; M is the number of pieces of text modality information and N the number of pieces of image modality information in the training sample set; and the function p(·) processes G_t(x_t^i; θ_t) and G_v(x_v^j; θ_v) through a fully connected neural network into a dimension that can be multiplied with y_t^i and/or y_v^j.
5. The method for training a data representation feature generator in a social media cross-modality search according to claim 4, wherein the inter-modality similarity loss function is:
wherein, yt iAnd yv jRespectively representing the ith character modal information and the jth image modal information one-hot form topic label vectors in the training sample set under the same topic The parameter set for the text modality generator is thetatThe representation characteristics corresponding to the ith character mode information,the original characteristics of the ith character modal information are obtained;a set of parameters for the image modality generator is combined to θvThe j-th image modality information corresponds to the representation characteristics,original features of jth image modality information; m is the number of the character mode information in the training sample set, and N is the number of the image mode information in the training sample set;
the generation loss function is: L_generation = α·L_label + β·L_similarity, where α and β are the weight coefficients of the intra-modal semantic loss function and the inter-modal similarity loss function, respectively.
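As with claim 4, the formula for the inter-modal similarity loss appears only as an image in the original patent. The sketch below uses a mean-squared-distance stand-in for L_similarity over same-topic text/image pairs (an assumption, not the patent's exact definition) and shows the weighted combination L_generation = α·L_label + β·L_similarity from claim 5.

```python
import numpy as np

rng = np.random.default_rng(1)
M, D = 4, 8  # same-topic text/image pairs, representation dimension

F_t = rng.normal(size=(M, D))  # text representation features
F_v = rng.normal(size=(M, D))  # image representation features

def inter_modal_similarity_loss(F_t, F_v):
    # stand-in: minimizing the mean squared distance between paired
    # representations increases cross-modal correlation under the same topic
    return float(np.mean(np.sum((F_t - F_v) ** 2, axis=1)))

def generation_loss(L_label, L_similarity, alpha, beta):
    # claim 5: L_generation = alpha * L_label + beta * L_similarity
    return alpha * L_label + beta * L_similarity

L_gen = generation_loss(L_label=1.0,
                        L_similarity=inter_modal_similarity_loss(F_t, F_v),
                        alpha=0.5, beta=0.5)
```

The weights α and β let the training procedure trade off semantic fidelity within a modality against alignment across modalities.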
6. The method for training a data representation feature generator in social media cross-modal search according to claim 5, wherein the cross-modal discriminant loss function is:
wherein c_e is the one-hot modality label of the searched target data information; G_t(x^t_e; θ_t) denotes the representation feature corresponding to the e-th text modality information, where θ_t is the parameter set of the text modality generator and x^t_e is the original feature of the e-th text modality information; G_v(x^v_e; θ_v) denotes the representation feature corresponding to the e-th image modality information, where θ_v is the parameter set of the image modality generator and x^v_e is the original feature of the e-th image modality information; during training, text modality information and image modality information are input in pairs, and E is the number of data pairs; the function D(·; θ_p), under the control of the parameter set θ_p, converts the representation features of each item of text modality information and image modality information into the same representation subspace.
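The cross-modal discriminant loss formula is likewise an image in the original. A minimal sketch of one plausible reading follows: a discriminator with parameters θ_p maps each representation into a shared space and predicts its modality, scored with binary cross-entropy against the modality labels c_e (the linear-plus-sigmoid discriminator and the binary cross-entropy form are assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)
E, D = 4, 8  # number of text/image input pairs, representation dimension

F_t = rng.normal(size=(E, D))  # text representation features
F_v = rng.normal(size=(E, D))  # image representation features

# assumed discriminator parameters theta_p: a linear map plus sigmoid that
# predicts the probability a representation comes from the text modality
W, b = rng.normal(size=D), 0.0

def discriminate(F):
    return 1.0 / (1.0 + np.exp(-(F @ W + b)))

def cross_modal_discriminant_loss(F_t, F_v):
    # binary cross-entropy against the modality labels c_e
    # (text -> 1, image -> 0), averaged over the E input pairs
    p_t = discriminate(F_t)
    p_v = discriminate(F_v)
    return float(-np.mean(np.log(p_t + 1e-12))
                 - np.mean(np.log(1.0 - p_v + 1e-12)))

d_loss = cross_modal_discriminant_loss(F_t, F_v)
```

Minimizing this loss sharpens the discriminator's ability to tell modalities apart, which is exactly the signal the generator learns to defeat.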
7. A social media cross-modal data information search method is characterized by comprising the following steps:
inputting data information to be searched into a generator to obtain representation characteristics of the data information to be searched;
wherein the generator is derived by adversarial learning based on a training sample set; the training sample set includes social media data information of multiple modalities, with the topic to which each item of data information belongs and its corresponding modality serving as labels; the data information of the multiple modalities includes text modality information and image modality information; the generator includes a text modality generator and an image modality generator, which acquire the original features of the data information in their corresponding modalities, divide each original feature to obtain a plurality of corresponding local features, and, based on the local features, obtain through a self-attention mechanism the representation features of the data information of each modality in the same representation subspace; the generator is adversarially supervised by a discriminator, whose loss functions include: a generation loss function obtained by weighted summation of an intra-modal semantic loss function and an inter-modal similarity loss function, and a cross-modal discriminant loss function; wherein the distribution difference between the representation features and the corresponding topic labels is minimized by minimizing the calculated value of the intra-modal semantic loss function, the correlation between the representation features of different-modality data information under the same topic is maximized by minimizing the calculated value of the inter-modal similarity loss function, and the distinction between the representation features of different-modality data information with respect to modality is maximized by minimizing the calculated value of the cross-modal discriminant loss function; the generator is tuned by minimizing the difference between the calculated value of the generation loss function and the calculated value of the cross-modal discriminant loss function, and the discriminator is tuned by maximizing that difference; the final generator is obtained after multiple iterations;
traversing the existing data information of the target modality, and acquiring the representation features generated for it by the generator;
and acquiring, based on similarity matching, the one or more items of existing target-modality data information closest to the representation features of the data information to be searched.
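The alternating optimization in claim 7 — the generator descending on the difference between the generation loss and the cross-modal discriminant loss while the discriminator ascends on it — can be illustrated with a toy scalar minimax game. The quadratic objective, starting points, and learning rate below are illustrative only, standing in for the patent's neural losses:

```python
# toy minimax: J(g, d) = L_gen(g) - L_disc(d) with quadratic stand-ins;
# the generator parameter g minimizes J, the discriminator parameter d maximizes J
g, d = 2.0, -1.5
lr = 0.1

def J(g, d):
    return (g - 1.0) ** 2 - (d - 1.0) ** 2

for _ in range(200):
    g -= lr * 2 * (g - 1.0)        # generator: gradient descent, dJ/dg = 2(g - 1)
    d += lr * (-2) * (d - 1.0)     # discriminator: gradient ascent, dJ/dd = -2(d - 1)

# both parameters converge toward their respective optima at 1.0
```

In the patent's setting, each scalar update corresponds to a full backpropagation step through the respective network, and the iterations stop when the final generator is obtained.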
8. The method according to claim 7, wherein the obtaining of the existing data information of one or more target modalities closest to the representation features of the data information to be searched based on similarity matching comprises:
based on the representation features of the data information to be searched and the representation features corresponding to the existing data information of the target modality, calculating an L2 norm of cross-modality matching as a similarity:
wherein G_t(x^t_i; θ_t) denotes the representation feature corresponding to the i-th text modality information, where θ_t is the parameter set of the text modality generator and x^t_i is the original feature of the i-th text modality information; G_v(x^v_j; θ_v) denotes the representation feature corresponding to the j-th image modality information, where θ_v is the parameter set of the image modality generator and x^v_j is the original feature of the j-th image modality information; one of G_t(x^t_i; θ_t) and G_v(x^v_j; θ_v) is fixed as the representation feature of the data information to be searched in its corresponding modality, and the other is the representation feature of each item of existing data in the target modality;
and ranking the existing data information by similarity, and acquiring the one or more items of existing target-modality data information with the highest similarity to the data information to be searched.
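The matching-and-ranking step of claim 8 — fixing the query representation, computing the L2 norm against every existing target-modality representation, and returning the closest items — might look like the following sketch (the array shapes and the choice of k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
D = 8  # dimension of the shared representation subspace

query = rng.normal(size=D)           # representation of the data to be searched
gallery = rng.normal(size=(10, D))   # representations of existing target-modality data

# cross-modal matching with the L2 norm: a smaller distance means higher similarity
dists = np.linalg.norm(gallery - query, axis=1)

# rank by distance and take the k most similar existing items
top_k = np.argsort(dists)[:3]
```

Because both modalities are embedded in the same representation subspace, the same code serves text-to-image and image-to-text search; only which side is fixed as the query changes.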
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010418678.7A CN111598712B (en) | 2020-05-18 | 2020-05-18 | Training and searching method for data feature generator in social media cross-modal search |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111598712A true CN111598712A (en) | 2020-08-28 |
CN111598712B CN111598712B (en) | 2023-04-18 |
Family
ID=72192242
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010418678.7A Active CN111598712B (en) | 2020-05-18 | 2020-05-18 | Training and searching method for data feature generator in social media cross-modal search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111598712B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215837A (en) * | 2020-10-26 | 2021-01-12 | 北京邮电大学 | Multi-attribute image semantic analysis method and device |
CN113420166A (en) * | 2021-03-26 | 2021-09-21 | 阿里巴巴新加坡控股有限公司 | Commodity mounting, retrieving, recommending and training processing method and device and electronic equipment |
CN114091662A (en) * | 2021-11-26 | 2022-02-25 | 广东伊莱特电器有限公司 | Text image generation method and device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299341A (en) * | 2018-10-29 | 2019-02-01 | 山东师范大学 | One kind confrontation cross-module state search method dictionary-based learning and system |
CN110059157A (en) * | 2019-03-18 | 2019-07-26 | 华南师范大学 | A kind of picture and text cross-module state search method, system, device and storage medium |
CN110222140A (en) * | 2019-04-22 | 2019-09-10 | 中国科学院信息工程研究所 | A kind of cross-module state search method based on confrontation study and asymmetric Hash |
US20190333199A1 (en) * | 2018-04-26 | 2019-10-31 | The Regents Of The University Of California | Systems and methods for deep learning microscopy |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215837A (en) * | 2020-10-26 | 2021-01-12 | 北京邮电大学 | Multi-attribute image semantic analysis method and device |
CN113420166A (en) * | 2021-03-26 | 2021-09-21 | 阿里巴巴新加坡控股有限公司 | Commodity mounting, retrieving, recommending and training processing method and device and electronic equipment |
CN114091662A (en) * | 2021-11-26 | 2022-02-25 | 广东伊莱特电器有限公司 | Text image generation method and device and electronic equipment |
CN114091662B (en) * | 2021-11-26 | 2024-05-14 | 广东伊莱特生活电器有限公司 | Text image generation method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111598712B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543084B (en) | Method for establishing detection model of hidden sensitive text facing network social media | |
Li et al. | Weakly supervised deep matrix factorization for social image understanding | |
CN112800776B (en) | Bidirectional GRU relation extraction data processing method, system, terminal and medium | |
CN111598712B (en) | Training and searching method for data feature generator in social media cross-modal search | |
CN104899253B (en) | Towards the society image across modality images-label degree of correlation learning method | |
JP4514082B2 (en) | Method and apparatus for building a text classifier and text classifier | |
CN112131350B (en) | Text label determining method, device, terminal and readable storage medium | |
US9183173B2 (en) | Learning element weighting for similarity measures | |
US7711673B1 (en) | Automatic charset detection using SIM algorithm with charset grouping | |
CN109831460B (en) | Web attack detection method based on collaborative training | |
CN111914156A (en) | Cross-modal retrieval method and system for self-adaptive label perception graph convolution network | |
CN113239214A (en) | Cross-modal retrieval method, system and equipment based on supervised contrast | |
CN113657425A (en) | Multi-label image classification method based on multi-scale and cross-modal attention mechanism | |
US8560466B2 (en) | Method and arrangement for automatic charset detection | |
CN111475603A (en) | Enterprise identifier identification method and device, computer equipment and storage medium | |
CN114510939A (en) | Entity relationship extraction method and device, electronic equipment and storage medium | |
CN112163114B (en) | Image retrieval method based on feature fusion | |
CN114528827A (en) | Text-oriented confrontation sample generation method, system, equipment and terminal | |
Lee et al. | Effective evolutionary multilabel feature selection under a budget constraint | |
CN116956289B (en) | Method for dynamically adjusting potential blacklist and blacklist | |
CN112445862A (en) | Internet of things equipment data set construction method and device, electronic equipment and storage medium | |
CN112487263A (en) | Information processing method, system, equipment and computer readable storage medium | |
CN116385946A (en) | Video-oriented target fragment positioning method, system, storage medium and equipment | |
US11907307B1 (en) | Method and system for event prediction via causal map generation and visualization | |
Mady et al. | Enhancing performance of biomedical named entity recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||