CN113343664A - Method and device for determining matching degree between image texts


Info

Publication number
CN113343664A
CN113343664A
Authority
CN
China
Prior art keywords
image
common sense
information
text
matched
Prior art date
Legal status
Granted
Application number
CN202110724610.6A
Other languages
Chinese (zh)
Other versions
CN113343664B (en)
Inventor
白亚龙 (Bai Yalong)
张炜 (Zhang Wei)
梅涛 (Mei Tao)
Current Assignee
Jingdong Shuke Haiyi Information Technology Co Ltd
Original Assignee
Jingdong Shuke Haiyi Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Shuke Haiyi Information Technology Co Ltd
Priority to CN202110724610.6A
Publication of CN113343664A
Application granted
Publication of CN113343664B
Legal status: Active
Anticipated expiration

Classifications

    All of the following fall under section G (Physics), class G06 (Computing; calculating or counting):
    • G06F 40/194: Handling natural language data; text processing; calculation of difference between files
    • G06F 18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06F 40/30: Handling natural language data; semantic analysis
    • G06N 3/044: Computing arrangements based on biological models; neural networks; architecture; recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for determining the matching degree between image texts. One embodiment of the method comprises: determining image feature information of an image to be matched and text feature information of a text to be matched; determining image common sense feature information and text common sense feature information, where the image common sense feature information represents common sense information related to target information in the image to be matched and the text common sense feature information represents common sense information related to the target information in the text to be matched; and determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information. By combining the feature information of the image and the text with the related common sense information, the method improves generalization ability.

Description

Method and device for determining matching degree between image texts
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and a device for determining the matching degree between image texts.
Background
Multimodal content understanding is an important topic in the multimedia and computer vision fields. Within it, cross-modal retrieval between images and texts, i.e. image-text matching, is a challenging research problem with important application value. With the rapid development of deep learning and the growing volume of multimedia data, image-text matching technology has made great progress. The idea behind current mainstream image-text matching methods can be summarized as follows: a deep neural network maps data of the two modalities, images and texts, into a common latent space, where a similarity measure is applied.
Disclosure of Invention
The embodiment of the application provides a method and a device for determining matching degree between image texts.
In a first aspect, an embodiment of the present application provides a method for determining the matching degree between image texts, including: determining image feature information of an image to be matched and text feature information of a text to be matched; determining image common sense feature information and text common sense feature information, wherein the image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched; and determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information.
In some embodiments, before determining the image common sense feature information and the text common sense feature information, the method further includes: generating logic type common sense feature information through a graph convolution network characterizing the logic type common sense information; and generating, on the basis of the logic type common sense feature information, common sense feature information comprising both the logic type common sense information and the statistical type common sense information through a hypergraph convolution network characterizing the statistical type common sense information. Accordingly, the determining of the image common sense feature information and the text common sense feature information includes: determining the image common sense feature information and the text common sense feature information according to this common sense feature information.
In some embodiments, the determining of the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information includes: combining the common sense feature information and the image common sense feature information to obtain combined image common sense feature information; combining the common sense feature information and the text common sense feature information to obtain combined text common sense feature information; and determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the combined image common sense feature information and the combined text common sense feature information.
In some embodiments, the determining of the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the combined image common sense feature information and the combined text common sense feature information includes: determining a first matching degree between the image feature information and the text feature information, and a second matching degree between the combined image common sense feature information and the combined text common sense feature information; and determining the matching degree between the image to be matched and the text to be matched according to the first matching degree and the second matching degree.
In some embodiments, the determining the image common sense feature information and the text common sense feature information according to the common sense feature information includes: determining target information in the image to be matched according to the image characteristic information; determining target information in the text to be matched according to the text characteristic information; and determining image common sense feature information corresponding to the target information in the image to be matched and text common sense feature information corresponding to the target information in the text to be matched from the common sense feature information.
In some embodiments, the generating the common sense feature information of logical type through the graph convolution network for characterizing the common sense information of logical type includes: determining a data set corresponding to target information in an image to be matched and target information in a text to be matched; and inputting the initialization vector information of each concept in the data set into the graph convolution network to generate logic type common sense characteristic information.
In some embodiments, in the hypergraph characterized by the hypergraph convolution network, the semantic relevance among the multiple concepts connected by a hyperedge is characterized by that hyperedge.
In some embodiments, the determining the image feature information of the image to be matched includes: determining first characteristic information of target information in an image to be matched through a target detection network; and determining the image characteristic information of the image to be matched through the first self-attention network based on the first characteristic information.
In some embodiments, determining text feature information of a text to be matched includes: determining second characteristic information of the text to be matched through a characteristic extraction network; and determining text characteristic information of the text to be matched through a second self-attention network based on the second characteristic information.
In some embodiments, the first self-attention network and the second self-attention network employ a multi-headed self-attention mechanism.
In a second aspect, an embodiment of the present application provides an apparatus for determining the matching degree between image texts, including a first determining unit, a second determining unit and a third determining unit. The first determining unit is configured to determine image feature information of an image to be matched and text feature information of a text to be matched; the second determining unit is configured to determine image common sense feature information and text common sense feature information, wherein the image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched; and the third determining unit is configured to determine the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information.
In some embodiments, the above apparatus further comprises: a generating unit configured to generate the logic type common sense feature information through a graph convolution network representing the logic type common sense information; on the basis of the logic type common sense feature information, generating common sense feature information comprising the logic type common sense information and the statistical type common sense information through a hypergraph convolution network representing the statistical type common sense information; and a second determination unit further configured to: and determining the image common sense feature information and the text common sense feature information according to the common sense feature information.
In some embodiments, the third determining unit is further configured to: combine the common sense feature information and the image common sense feature information to obtain combined image common sense feature information; combine the common sense feature information and the text common sense feature information to obtain combined text common sense feature information; and determine the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the combined image common sense feature information and the combined text common sense feature information.
In some embodiments, the third determining unit is further configured to: determine a first matching degree between the image feature information and the text feature information, and a second matching degree between the combined image common sense feature information and the combined text common sense feature information; and determine the matching degree between the image to be matched and the text to be matched according to the first matching degree and the second matching degree.
In some embodiments, the second determining unit is further configured to: determining target information in the image to be matched according to the image characteristic information; determining target information in the text to be matched according to the text characteristic information; and determining image common sense feature information corresponding to the target information in the image to be matched and text common sense feature information corresponding to the target information in the text to be matched from the common sense feature information.
In some embodiments, the generating unit is further configured to: determining a data set corresponding to target information in an image to be matched and target information in a text to be matched; and inputting the initialization vector information of each concept in the data set into the graph convolution network to generate logic type common sense characteristic information.
In some embodiments, in the hypergraph characterized by the hypergraph convolution network, the semantic relevance among the multiple concepts connected by a hyperedge is characterized by that hyperedge.
In some embodiments, the first determining unit is further configured to: determining first characteristic information of target information in an image to be matched through a target detection network; and determining the image characteristic information of the image to be matched through the first self-attention network based on the first characteristic information.
In some embodiments, the first determining unit is further configured to: determining second characteristic information of the text to be matched through a characteristic extraction network; and determining text characteristic information of the text to be matched through a second self-attention network based on the second characteristic information.
In some embodiments, the first self-attention network and the second self-attention network employ a multi-headed self-attention mechanism.
In a third aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement a method as described in any implementation of the first aspect.
According to the method and the device for determining the matching degree between image texts provided by embodiments of the present application, image feature information of an image to be matched and text feature information of a text to be matched are determined; image common sense feature information and text common sense feature information are determined, where the image common sense feature information represents common sense information related to target information in the image to be matched and the text common sense feature information represents common sense information related to the target information in the text to be matched; and the matching degree between the image to be matched and the text to be matched is determined according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information. A method for determining the matching degree between image texts that combines the feature information of the image and text with related common sense information is thus provided, which improves generalization ability.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for determining the matching degree between image texts according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for determining the matching degree between image texts according to the present embodiment;
FIG. 4 is a flow diagram of yet another embodiment of a method for determining the matching degree between image texts according to the present application;
FIG. 5 is a detailed schematic diagram according to an embodiment of the present application;
FIG. 6 is a block diagram of an embodiment of a device for determining the matching degree between image texts according to the present application;
FIG. 7 is a block diagram of a computer system suitable for implementing embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary architecture 100 to which the method and apparatus for determining a degree of matching between image texts of the present application can be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The communication connections between the terminal devices 101, 102, 103 form a topological network, and the network 104 serves to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 may be hardware devices or software that support network connection for data interaction and data processing. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices supporting network connection, information acquisition, interaction, display and processing, including but not limited to smartphones, tablet computers, e-book readers, laptop portable computers, desktop computers and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented, for example, as multiple pieces of software or software modules providing distributed services, or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example, a background processing server that acquires an image to be matched and a text to be matched, which are sent by the user through the terminal devices 101, 102, and 103, and determines whether the image to be matched and the text to be matched are matched. Optionally, the server may feed back the matching degree result to the terminal device. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the method for determining the matching degree between the image texts provided by the embodiment of the present application may be executed by a server, or may be executed by a terminal device, or may be executed by the server and the terminal device in cooperation with each other. Accordingly, the parts (for example, the units) included in the device for determining the matching degree between the image texts may be all provided in the server, all provided in the terminal device, or provided in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. When the electronic device on which the determination method of the degree of matching between image texts is executed does not need to perform data transmission with other electronic devices, the system architecture may include only the electronic device (e.g., a server or a terminal device) on which the determination method of the degree of matching between image texts is executed.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for determining a degree of match between image text is shown, comprising the steps of:
step 201, determining image feature information of an image to be matched and text feature information of a text to be matched.
In this embodiment, the executing body of the method for determining the matching degree between image texts (for example, the server in FIG. 1) may obtain the image to be matched and the text to be matched remotely or locally through a wired or wireless network connection, and determine the image feature information of the image to be matched and the text feature information of the text to be matched.
The image to be matched and the text to be matched are any image and text whose matching degree is to be determined. When the visual semantic information of the image to be matched is consistent with the textual semantic information of the text to be matched, the two can be considered to match.
As an example, the executing body may perform feature extraction on the image to be matched through an image feature extraction network corresponding to the image to be matched, so as to obtain the image feature information of the image to be matched, and perform feature extraction on the text to be matched through a text feature extraction network corresponding to the text to be matched, so as to obtain the text feature information of the text to be matched. The feature extraction network may be any network model with a feature extraction function, such as a convolutional neural network, a residual network or a recurrent neural network.
In some optional implementations of this embodiment, the executing body may extract the image feature information of the image to be matched as follows:
First, first feature information of the target information in the image to be matched is determined through a target detection network.
The target detection network is used to determine target frames for the target information in the image to be matched and the feature information of the target information in each target frame. The target information may be all the concept information involved in the image to be matched, including target objects (e.g., concepts such as people and objects), state information of the target objects (e.g., if a target object is a person, its state information may be the concept of sleeping), and correlation information between target objects (e.g., the correlation between the target objects 'person' and 'hat' may be the concept of a person wearing a hat).
Then, based on the first feature information, the image feature information of the image to be matched is determined through the first self-attention network.
Through the self-attention network, the information in the image to be matched that receives high attention during the image-text matching operation can be emphasized, thereby improving the accuracy of image-text matching.
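By way of illustration, a minimal PyTorch sketch of such an image branch follows, assuming region features from a pre-trained detector (e.g., Faster R-CNN); the dimensions, module names and the use of nn.MultiheadAttention are illustrative choices rather than the patent's prescribed implementation.

```python
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Sketch of the image branch: project detector region features, then
    apply multi-head self-attention (the 'first self-attention network')."""

    def __init__(self, region_dim=2048, embed_dim=512, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(region_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, region_feats):
        # region_feats: (batch, regions, region_dim), e.g. 36 detector boxes
        x = self.proj(region_feats)           # first feature information
        attended, _ = self.attn(x, x, x)      # emphasise regions that matter
        return attended                       # image feature information

regions = torch.randn(2, 36, 2048)            # assumed detector output
image_feats = ImageEncoder()(regions)         # (2, 36, 512)
```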
In some optional implementations of this embodiment, the executing body may extract the text feature information of the text to be matched as follows:
First, second feature information of the text to be matched is determined through a feature extraction network.
In this implementation, the executing body may input the text to be matched into the feature extraction network, which takes each word of the text as a basic input unit and models the text to obtain the second feature information of the text to be matched.
Then, based on the second feature information, the text feature information of the text to be matched is determined through a second self-attention network.
In this implementation, similarly to the feature extraction process for the image to be matched, the self-attention network emphasizes the information in the text to be matched that receives high attention during the image-text matching operation, thereby improving the accuracy of image-text matching.
In some optional implementations of the present embodiment, the first self-attention network and the second self-attention network employ a multi-head self-attention mechanism.
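A mirror-image sketch of the text branch follows, under the same caveats: the vocabulary size, the bidirectional GRU standing in for the feature extraction network, and the multi-head attention module are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Sketch of the text branch: embed each word (the basic input unit),
    model the sequence, then apply the 'second self-attention network'."""

    def __init__(self, vocab_size=30000, embed_dim=512, num_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim // 2,
                          batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, token_ids):
        x, _ = self.rnn(self.embed(token_ids))   # second feature information
        attended, _ = self.attn(x, x, x)         # multi-head self-attention
        return attended                          # text feature information

tokens = torch.randint(0, 30000, (2, 12))        # a batch of 12-word texts
text_feats = TextEncoder()(tokens)               # (2, 12, 512)
```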
Step 202, determining image common sense feature information and text common sense feature information.
In this embodiment, the execution subject may determine the image common sense feature information and the text common sense feature information. The image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched.
The common sense information may be any common sense information related to the target information in the image to be matched and the text to be matched. Taking a seat as an example, the corresponding common sense information includes the fact that a seat is a piece of furniture.
As an example, first, the execution subject initializes each concept related to the target information, and determines a vectorized representation of each concept; and then updating the vectorization representation of each concept through a graph convolution network representing the common sense information to obtain the common sense feature information of each concept after the common sense information is fused. And finally, determining concepts (namely target information related to the image to be matched) in the image characteristic information through a concept prediction model, and determining common sense characteristic information corresponding to the concepts in the image characteristic information from the common sense characteristic information of each concept. Wherein the associations between concepts may be characterized by a knowledge graph. The knowledge graph is formed by using concepts as nodes and using the relevance between the concepts as edges.
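By way of illustration, the sketch below runs one graph-convolution step over a toy knowledge graph (concepts as nodes, relevance as edges). The symmetric normalization H' = ReLU(A_hat H W) is a standard GCN formulation assumed here for concreteness; the patent does not prescribe a particular formula, and all sizes are placeholders.

```python
import torch

def gcn_layer(concept_vecs, adjacency, weight):
    """One graph-convolution step, H' = ReLU(A_hat H W), with A_hat the
    symmetrically normalised adjacency plus self-loops (assumed form)."""
    a_hat = adjacency + torch.eye(adjacency.size(0))
    d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
    return torch.relu(d_inv_sqrt @ a_hat @ d_inv_sqrt @ concept_vecs @ weight)

num_concepts, dim = 100, 300
init_vecs = torch.randn(num_concepts, dim)        # initialised concept vectors
edges = (torch.rand(num_concepts, num_concepts) > 0.95).float()
adjacency = ((edges + edges.t()) > 0).float()     # undirected knowledge graph
fused = gcn_layer(init_vecs, adjacency, torch.randn(dim, dim))
# each row of `fused` is a concept vector with common sense information mixed in
```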
In this implementation, after determining the common sense feature information of each concept, the executing entity may determine, from the common sense feature information, image common sense feature information representing common sense information related to the target information related to the image to be matched and text common sense feature information representing common sense information related to the target information related to the text to be matched.
In some optional implementations of this embodiment, before performing step 202, the executing main body may perform the following operations:
first, logical common sense feature information is generated by a graph convolution network representing logical common sense information.
Secondly, on the basis of the logic type common sense feature information, common sense feature information including both the logic type common sense information and the statistical type common sense information is generated through a hypergraph convolution network characterizing the statistical type common sense information. In this implementation, the common sense information includes logic type common sense information and statistical type common sense information. The logic type common sense information is common sense that can be determined directly from everyday knowledge, for example, that humans include men and women. The statistical type common sense information is obtained by statistically analysing the semantic correlations between concepts on the basis of the logic type common sense information, so as to determine further correlation information between concepts. For example, the executing body may count the probability that the concepts of "man" and "woman" appear together in various information, and use this probability value as the weight of the edge between the two concepts to obtain the corresponding statistical type common sense information.
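By way of illustration, the co-occurrence statistics described above could be collected as follows; the toy caption corpus and the use of plain co-occurrence frequency as the edge weight are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# statistical type common sense: the probability that two concepts appear in
# the same piece of information becomes the weight of the edge between them
# (toy corpus; real counts would come from a large caption/text collection)
captions = [
    {"man", "hat", "dog"},
    {"woman", "hat"},
    {"man", "woman", "seat"},
]
pair_counts = Counter()
for concepts in captions:
    pair_counts.update(combinations(sorted(concepts), 2))
weights = {pair: n / len(captions) for pair, n in pair_counts.items()}
print(weights[("hat", "man")])   # 1/3: 'man' and 'hat' co-occur in 1 of 3 items
```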
In order to increase the richness of the statistical common sense information, the execution subject may represent the statistical common sense information in the form of a hypergraph. In the hypergraph, each concept is a node, and a hyperedge can exist between the concept and other concepts, and the hyperedge is determined based on similarity measurement between the concepts characterized by the statistical common sense information.
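A sketch of one hypergraph-convolution step is given below, using the widely used HGNN-style normalization X' = ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta); since the patent does not fix a formula, this particular form, and the randomly built incidence matrix, are assumptions.

```python
import torch

def hypergraph_conv(x, inc, w, theta):
    """One hypergraph-convolution step (assumed HGNN-style normalisation).
    inc: (nodes, hyperedges) incidence matrix; w: hyperedge weights."""
    W = torch.diag(w)
    De_inv = torch.diag(1.0 / inc.sum(dim=0))       # hyperedge degrees
    Dv_inv_sqrt = torch.diag((inc @ w).pow(-0.5))   # weighted node degrees
    return torch.relu(Dv_inv_sqrt @ inc @ W @ De_inv @ inc.t()
                      @ Dv_inv_sqrt @ x @ theta)

num_concepts, dim, num_edges = 100, 300, 20
x = torch.randn(num_concepts, dim)    # e.g. the logic type features from the GCN
inc = torch.zeros(num_concepts, num_edges)
for e in range(num_edges):            # each hyperedge links several concepts
    inc[torch.randperm(num_concepts)[:5], e] = 1.0
inc[:, 0] = 1.0                       # keep every node on at least one hyperedge
out = hypergraph_conv(x, inc, torch.rand(num_edges) + 0.1, torch.randn(dim, dim))
```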
High-order semantic information represented by the statistical common sense information plays an important role in cross-modal semantic inference between images and texts.
In this implementation, the execution subject may determine the image common sense feature information and the text common sense feature information according to the common sense feature information.
Specifically, the execution subject may determine target information in the image to be matched according to the image feature information; determining target information in the text to be matched according to the text characteristic information; and determining image common sense feature information corresponding to the target information in the image to be matched and text common sense feature information corresponding to the target information in the text to be matched from the common sense feature information. The target information in the image feature information may be various concepts related to the image feature information, and the target information in the text feature information may be various concepts related to the text feature information.
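One possible realization of this selection step is sketched below; the linear concept predictor, the mean pooling and the top-k selection are illustrative assumptions, not the patent's prescribed concept prediction model.

```python
import torch
import torch.nn as nn

num_concepts, dim = 100, 512
commonsense = torch.randn(num_concepts, dim)  # fused common sense features
concept_clf = nn.Linear(dim, num_concepts)    # assumed concept prediction model

def select_commonsense(feats, top_k=5):
    """Score which concepts the features involve, then gather the matching
    rows of the common sense feature matrix."""
    scores = concept_clf(feats.mean(dim=0))   # pool regions/words, score concepts
    return commonsense[scores.topk(top_k).indices]

image_cs = select_commonsense(torch.randn(36, dim))  # image common sense features
text_cs = select_commonsense(torch.randn(12, dim))   # text common sense features
```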
In some optional implementations of this embodiment, the executing body may execute the first step by: firstly, determining a data set corresponding to target information in an image to be matched and target information in a text to be matched; then, initialization vector information of each concept in the data set is input into the graph convolution network, and logical common sense feature information is generated.
As an example, the executing entity may divide various concepts in the corpus to obtain data sets of various classifications, and determine a data set including a concept corresponding to the target information in the image to be matched and a concept corresponding to the target information in the text to be matched as a corresponding data set in the present implementation.
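A tiny illustration of assembling such a data set and its initialization vectors follows; the concept list and the random stand-in for pre-trained word embeddings (e.g., GloVe) are placeholders.

```python
import torch

# data set: the concepts touched by the image and the text to be matched
concepts = ["man", "woman", "hat", "seat", "furniture"]   # illustrative
glove = {c: torch.randn(300) for c in concepts}           # stand-in embeddings
init_vecs = torch.stack([glove[c] for c in concepts])     # (5, 300)
# `init_vecs` would be the initialization vector information fed to the
# graph convolution network (see the gcn_layer sketch above)
```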
It is to be understood that, in the present embodiment, the common sense feature information generated by the hypergraph convolution network on the basis of the logic type common sense feature information is the common sense feature information related to the concepts corresponding to the determined data set.
Step 203, determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information.
In this embodiment, the execution main body may determine the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information, and the text common sense feature information.
As an example, the executing body may fuse the image feature information of the image to be matched with the image common sense feature information to obtain a fused image feature into which the common sense information related to the target information in the image to be matched has been merged; fuse the text feature information of the text to be matched with the text common sense feature information to obtain a fused text feature into which the common sense information related to the target information in the text to be matched has been merged; and then determine the similarity between the fused image feature and the fused text feature, taking this similarity as the matching degree between the image to be matched and the text to be matched. The similarity between vectors can be determined by computing the distance (e.g., Euclidean distance or Manhattan distance) between them.
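As a concrete instance of this fuse-then-measure scheme, the sketch below mean-pools and concatenates each modality's two kinds of features and scores the pair by cosine similarity; these are assumed choices, and the paragraph equally allows a Euclidean or Manhattan distance in place of the cosine measure.

```python
import torch
import torch.nn.functional as F

def matching_degree(img_feats, img_cs, txt_feats, txt_cs):
    """Fuse each modality's features with its common sense features, then
    score the pair by cosine similarity of the fused vectors."""
    img = F.normalize(torch.cat([img_feats.mean(0), img_cs.mean(0)]), dim=0)
    txt = F.normalize(torch.cat([txt_feats.mean(0), txt_cs.mean(0)]), dim=0)
    return torch.dot(img, txt)        # higher value, better image-text match

score = matching_degree(torch.randn(36, 512), torch.randn(5, 512),
                        torch.randn(12, 512), torch.randn(5, 512))
```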
The matching degree determination process shown in steps 201 to 203 may be performed by a matching model. The matching model is obtained by training as follows: first, a training sample set is acquired, in which each training sample comprises a sample image, a sample text and a label indicating whether the sample image and the sample text match; then, training samples are selected from the training sample set, and the image feature information and image common sense feature information corresponding to the sample image in a selected training sample, and the text feature information and text common sense feature information corresponding to the sample text, are determined through the initial matching model; next, the sample matching degree between the sample image and the sample text in the selected training sample is determined according to the image feature information, the image common sense feature information, the text feature information and the text common sense feature information; and the initial matching model is updated based on the target loss between the sample matching degree and the label, until the trained matching model is obtained.
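The patent specifies only "a target loss between the sample matching degree and the label". A bidirectional hinge ranking loss is a common concrete choice for image-text matching and is assumed in the training-step sketch below, where `model` is taken to score every image-text pair in a batch so that matched pairs sit on the diagonal.

```python
import torch

def training_step(model, images, texts, optimizer, margin=0.2):
    """One training step with a bidirectional hinge ranking loss (an assumed
    instantiation of the target loss): matched pairs sit on the diagonal."""
    scores = model(images, texts)                # (B, B) matching degrees
    pos = scores.diag().unsqueeze(1)
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_txt = (margin + scores - pos).clamp(min=0).masked_fill(mask, 0)
    cost_img = (margin + scores - pos.t()).clamp(min=0).masked_fill(mask, 0)
    loss = cost_txt.sum() + cost_img.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```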
By combining, on top of the image's and the text's own feature information, the common sense information corresponding to the image to be matched and the text to be matched, this approach addresses a shortcoming of existing matching methods: they attend only to the image's and text's own information and ignore common sense, so the resulting model fits the common data in the training set well but generalizes poorly to rare samples. The generalization ability of the matching model is thereby improved.
In some optional implementations of this embodiment, the executing main body may execute the step 203 by:
firstly, combining the common sense feature information and the image common sense feature information to obtain the combined image common sense feature information.
Secondly, the common sense feature information and the text common sense feature information are combined to obtain the combined common sense feature information.
As an example, the executing entity may perform a hadamard product on the common sense feature information and the image common sense feature information, and perform a hadamard product on the common sense feature information and the text common sense feature information to obtain the combined image common sense feature information and the combined common sense feature information, respectively.
Thirdly, the matching degree between the image to be matched and the text to be matched is determined according to the image feature information, the text feature information, the combined image common sense feature information and the combined text common sense feature information.
In some optional implementations of this embodiment, the executing body may execute this third step as follows: firstly, determining a first matching degree between the image feature information and the text feature information, and a second matching degree between the combined image common sense feature information and the combined text common sense feature information; then, determining the matching degree between the image to be matched and the text to be matched according to the first matching degree and the second matching degree. For example, the first matching degree and the second matching degree may be averaged with weights to obtain the final matching degree between the image to be matched and the text to be matched.
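Putting the three sub-steps together, a minimal sketch follows; the pooled one-vector shapes and the fixed weight alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def combined_matching_degree(img_feats, txt_feats, cs, img_cs, txt_cs, alpha=0.5):
    """All inputs are pooled vectors of equal length. Combine by Hadamard
    product, then fuse the two matching degrees by a weighted average."""
    combined_img_cs = cs * img_cs                 # Hadamard product
    combined_txt_cs = cs * txt_cs
    first = F.cosine_similarity(img_feats, txt_feats, dim=0)
    second = F.cosine_similarity(combined_img_cs, combined_txt_cs, dim=0)
    return alpha * first + (1 - alpha) * second   # assumed weighting

d = 512
score = combined_matching_degree(torch.randn(d), torch.randn(d), torch.randn(d),
                                 torch.randn(d), torch.randn(d))
```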
With continued reference to FIG. 3, FIG. 3 is a schematic diagram 300 of an application scenario of the method for determining the matching degree between image texts according to the present embodiment. In the application scenario of FIG. 3, the server first obtains an image to be matched 301 and a text to be matched 302. The server then determines image feature information 303 of the image to be matched 301 and text feature information 304 of the text to be matched 302 through a feature extraction network. Next, the server determines image common sense feature information 305 representing common sense information related to the target information in the image to be matched 301, and text common sense feature information 306 representing common sense information related to the target information in the text to be matched 302. Finally, the server determines the matching degree between the image to be matched and the text to be matched according to the image feature information 303, the text feature information 304, the image common sense feature information 305 and the text common sense feature information 306.
In the method provided by this embodiment of the present application, image feature information of an image to be matched and text feature information of a text to be matched are determined; image common sense feature information and text common sense feature information are determined, where the image common sense feature information represents common sense information related to target information in the image to be matched and the text common sense feature information represents common sense information related to the target information in the text to be matched; and the matching degree between the image to be matched and the text to be matched is determined according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information. A method for determining the matching degree between image texts that combines the image and text feature information with related common sense information is thus provided, improving generalization ability.
With continuing reference to FIG. 4, a flow 400 of yet another embodiment of a method for determining the matching degree between image texts according to the present application is shown, comprising the steps of:
step 401, generating logic type common sense feature information through a graph convolution network representing logic type common sense information.
Step 402, generating, on the basis of the logic type common sense feature information, common sense feature information comprising the logic type common sense information and the statistical type common sense information through a hypergraph convolution network characterizing the statistical type common sense information.
Step 403, determining first feature information of the target information in the image to be matched through a target detection network.
Step 404, determining image feature information of the image to be matched through the first self-attention network based on the first feature information.
Step 405, determining second feature information of the text to be matched through a feature extraction network.
Step 406, determining text feature information of the text to be matched through a second self-attention network based on the second feature information.
Step 407, determining image common sense feature information and text common sense feature information according to the common sense feature information.
The image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched.
Step 408, combining the common sense feature information and the image common sense feature information to obtain combined image common sense feature information.
Step 409, combining the common sense feature information and the text common sense feature information to obtain combined text common sense feature information.
Step 410, determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the combined image common sense feature information and the combined text common sense feature information.
As shown in FIG. 5, a detailed schematic diagram of the method for determining the matching degree between image texts according to the present embodiment is given. First, the initialized vector of each concept in the corpus 501 passes through the graph convolution network 502 characterizing logic type common sense information and the hypergraph convolution network 503 characterizing statistical type common sense information, yielding for each concept common sense feature information in which the logic type and statistical type common sense information are fused. Then, the image to be matched 504 passes in turn through the target detection network 505 and the first self-attention network 506, yielding the image feature information of the image 504. Based on the image feature information and the common sense feature information of each concept, the first concept prediction model 507 obtains the image common sense feature information, and the common sense feature information and the image common sense feature information are combined to obtain the combined image common sense feature information. Meanwhile, the text to be matched 508 passes in turn through the feature extraction network 509 and the second self-attention network 510, yielding the text feature information of the text 508. Based on the text feature information and the common sense feature information of each concept, the second concept prediction model 511 obtains the text common sense feature information, and the common sense feature information and the text common sense feature information are combined to obtain the combined text common sense feature information. Finally, the matching degree between the image to be matched and the text to be matched is determined according to the image feature information, the text feature information, the combined image common sense feature information and the combined text common sense feature information.
As can be seen from this embodiment, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for determining the matching degree between image texts in this embodiment specifically describes an obtaining process of feature information, an obtaining process of common sense feature information, and a determining process of the matching degree between image texts, so that the generalization ability and accuracy of the determination of the matching degree are further improved.
With continuing reference to FIG. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for determining the matching degree between image texts; this apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.
As shown in FIG. 6, the apparatus for determining the matching degree between image texts includes: a first determining unit 601 configured to determine image feature information of an image to be matched and text feature information of a text to be matched; a second determining unit 602 configured to determine image common sense feature information and text common sense feature information, wherein the image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched; and a third determining unit 603 configured to determine the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information.
In some embodiments, the above apparatus further comprises: a generating unit (not shown in the figure) configured to generate the common sense feature information of logical type through a graph convolution network characterizing the common sense information of logical type; on the basis of the logic type common sense feature information, generating common sense feature information comprising the logic type common sense information and the statistical type common sense information through a hypergraph convolution network representing the statistical type common sense information; and a second determining unit 602, further configured to: and determining the image common sense feature information and the text common sense feature information according to the common sense feature information.
In some embodiments, the third determining unit 603 is further configured to: combine the common sense feature information and the image common sense feature information to obtain combined image common sense feature information; combine the common sense feature information and the text common sense feature information to obtain combined text common sense feature information; and determine the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the combined image common sense feature information and the combined text common sense feature information.
In some embodiments, the third determining unit 603 is further configured to: determine a first matching degree between the image feature information and the text feature information, and a second matching degree between the combined image common sense feature information and the combined text common sense feature information; and determine the matching degree between the image to be matched and the text to be matched according to the first matching degree and the second matching degree.
In some embodiments, the second determining unit 602 is further configured to: determining target information in the image to be matched according to the image characteristic information; determining target information in the text to be matched according to the text characteristic information; and determining image common sense feature information corresponding to the target information in the image to be matched and text common sense feature information corresponding to the target information in the text to be matched from the common sense feature information.
In some embodiments, the generating unit (not shown in the figures) is further configured to: determining a data set corresponding to target information in an image to be matched and target information in a text to be matched; and inputting the initialization vector information of each concept in the data set into the graph convolution network to generate logic type common sense characteristic information.
In some embodiments, in the hypergraph characterized by the hypergraph convolution network, the semantic relevance among the multiple concepts connected by a hyperedge is characterized by that hyperedge.
In some embodiments, the first determining unit 601 is further configured to: determining first characteristic information of target information in an image to be matched through a target detection network; and determining the image characteristic information of the image to be matched through the first self-attention network based on the first characteristic information.
In some embodiments, the first determining unit 601 is further configured to: determining second characteristic information of the text to be matched through a characteristic extraction network; and determining text characteristic information of the text to be matched through a second self-attention network based on the second characteristic information.
In some embodiments, the first self-attention network and the second self-attention network employ a multi-headed self-attention mechanism.
In this embodiment, the first determining unit of the apparatus for determining the matching degree between image texts is configured to determine image feature information of an image to be matched and text feature information of a text to be matched; the second determining unit is configured to determine image common sense feature information and text common sense feature information, where the image common sense feature information represents common sense information related to target information in the image to be matched and the text common sense feature information represents common sense information related to the target information in the text to be matched; and the third determining unit is configured to determine the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information. An apparatus for determining the matching degree between image texts that combines the image and text feature information with related common sense information is thus provided, improving generalization ability.
Referring now to FIG. 7, a block diagram of a computer system 700 suitable for implementing devices of embodiments of the present application (e.g., the devices 101, 102, 103 and 105 shown in FIG. 1) is shown. The device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 7, the computer system 700 includes a processor (e.g., a CPU, central processing unit) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The processor 701, the ROM 702 and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the processor 701, performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the client computer, partly on the client computer, as a stand-alone software package, partly on the client computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the client computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a first determining unit, a second determining unit, and a third determining unit. The names of these units do not, in some cases, limit the units themselves; for example, the third determining unit may also be described as "a unit that determines the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information, and the text common sense feature information".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: determine image feature information of an image to be matched and text feature information of a text to be matched; determine image common sense feature information and text common sense feature information, wherein the image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched; and determine the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information, and the text common sense feature information.
The above description is only a preferred embodiment of the present application and an illustration of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of the above features, and also covers other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention; for example, arrangements in which the above features are replaced with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

1. A method for determining a matching degree between image texts, comprising the following steps:
determining image feature information of an image to be matched and text feature information of a text to be matched;
determining image common sense feature information and text common sense feature information, wherein the image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched;
and determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information, and the text common sense feature information.
2. The method of claim 1, wherein, prior to the determining image common sense feature information and text common sense feature information, the method further comprises:
generating logic-type common sense feature information through a graph convolution network characterizing the logic-type common sense information;
generating, on the basis of the logic-type common sense feature information, common sense feature information comprising the logic-type common sense information and statistical-type common sense information through a hypergraph convolution network characterizing the statistical-type common sense information;
and
the determining image common sense feature information and text common sense feature information includes:
and determining the image common sense feature information and the text common sense feature information according to the common sense feature information.
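The claim does not fix the exact propagation rule of the graph convolution network; purely as an illustration, the following sketch uses the well-known rule of Kipf and Welling (2017) over a tiny invented concept graph, where nodes are common sense concepts and edges are logic-type relations between them.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)  # adjacency over 3 invented concepts
X = np.random.randn(3, 8)               # initialization vectors (dim 8 assumed)
W = np.random.randn(8, 8)               # learnable weight (random stand-in)

A_hat = A + np.eye(3)                    # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
# One GCN layer with ReLU: the logic-type common sense feature information.
logic_cs_features = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)
print(logic_cs_features.shape)           # (3, 8)
```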
3. The method according to claim 2, wherein the determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information and the text common sense feature information comprises:
combining the common sense feature information with the image common sense feature information to obtain combined image common sense feature information;
combining the common sense feature information with the text common sense feature information to obtain combined text common sense feature information;
and determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the combined image common sense feature information, and the combined text common sense feature information.
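The claim does not specify the combination operator; concatenation along the feature dimension, sketched below with invented dimensions, is one common and assumed choice (element-wise addition would be an equally plausible alternative).

```python
import torch

common_sense_info = torch.randn(36, 256)  # shared common sense features
image_cs_info = torch.randn(36, 256)      # image-side common sense features
text_cs_info = torch.randn(36, 256)       # text-side common sense features

# Combined features double the dimension under the concatenation assumption.
combined_image_cs = torch.cat([common_sense_info, image_cs_info], dim=-1)
combined_text_cs = torch.cat([common_sense_info, text_cs_info], dim=-1)
print(combined_image_cs.shape)            # torch.Size([36, 512])
```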
4. The method according to claim 3, wherein the determining the matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the combined image common sense feature information, and the combined text common sense feature information comprises:
determining a first matching degree between the image feature information and the text feature information, and a second matching degree between the combined image common sense feature information and the combined text common sense feature information;
and determining the matching degree between the image to be matched and the text to be matched according to the first matching degree and the second matching degree.
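One plausible way to combine the two matching degrees is a convex combination; the weight lam below is a hypothetical hyperparameter, not something specified by the claims.

```python
def overall_matching_degree(m1: float, m2: float, lam: float = 0.5) -> float:
    # lam trades off feature-level similarity (m1) against
    # common-sense-level similarity (m2); 0.5 is an assumed default.
    return lam * m1 + (1.0 - lam) * m2

print(overall_matching_degree(0.82, 0.64))  # 0.73
```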
5. The method of claim 2, wherein the determining the image common sense feature information and the text common sense feature information from the common sense feature information comprises:
determining target information in the image to be matched according to the image feature information;
determining target information in the text to be matched according to the text feature information;
and determining image common sense feature information corresponding to the target information in the image to be matched and text common sense feature information corresponding to the target information in the text to be matched from the common sense feature information.
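Purely as an illustration of this selection step, the sketch below assumes that the common sense feature information is indexed by concept name; all names and vectors are invented.

```python
# Hypothetical common sense feature table keyed by concept.
common_sense_features = {
    "dog": [0.12, 0.80, 0.33],
    "frisbee": [0.51, 0.07, 0.64],
    "grass": [0.22, 0.45, 0.18],
}

image_targets = ["dog", "frisbee"]  # targets found via image feature info
text_targets = ["dog", "grass"]     # targets found via text feature info

# Image/text common sense feature information for the detected targets.
image_cs = [common_sense_features[t] for t in image_targets]
text_cs = [common_sense_features[t] for t in text_targets]
```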
6. The method of claim 2, wherein the generating logic-type common sense feature information through the graph convolution network characterizing the logic-type common sense information comprises:
determining a data set corresponding to the target information in the image to be matched and the target information in the text to be matched;
and inputting initialization vector information of each concept in the data set into the graph convolution network to generate the logic-type common sense feature information.
7. The method of claim 6, wherein hyperedges in the hypergraph characterized by the hypergraph convolution network represent semantic correlations between the concepts connected by the hyperedges.
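A well-known instantiation of such a hypergraph convolution, assumed here only for illustration, is the rule of Feng et al., "Hypergraph Neural Networks" (2019); each hyperedge jointly connects several semantically correlated concepts, e.g. concepts that statistically co-occur.

```python
import numpy as np

# Incidence matrix: H[v, e] = 1 if concept v belongs to hyperedge e.
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 0]], dtype=float)  # 4 invented concepts, 2 hyperedges
X = np.random.randn(4, 8)            # concept features (dim 8 assumed)
Theta = np.random.randn(8, 8)        # learnable weight (random stand-in)

Dv = np.diag(H.sum(axis=1))          # vertex degrees
De = np.diag(H.sum(axis=0))          # hyperedge degrees
Dv_is = np.linalg.inv(np.sqrt(Dv))   # D_v^{-1/2}

# One hypergraph convolution layer (unit hyperedge weights assumed).
X_out = Dv_is @ H @ np.linalg.inv(De) @ H.T @ Dv_is @ X @ Theta
print(X_out.shape)                   # (4, 8)
```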
8. The method of claim 1, wherein the determining image feature information of the image to be matched comprises:
determining first feature information of target information in the image to be matched through a target detection network;
and determining, based on the first feature information, the image feature information of the image to be matched through a first self-attention network.
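For illustration only, the sketch below stands in for this image path with an off-the-shelf torchvision detector and a multi-headed self-attention layer; the detector choice, the random region features, and all dimensions are assumptions, and a real system would pool region features from the detector's feature maps.

```python
import torch
import torch.nn as nn
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Off-the-shelf detector as a stand-in target detection network;
# weights=None avoids any download, at the cost of meaningless detections.
detector = fasterrcnn_resnet50_fpn(weights=None)
detector.eval()

image = torch.rand(3, 480, 640)  # random stand-in for the image to be matched
with torch.no_grad():
    detections = detector([image])[0]  # dict with "boxes", "labels", "scores"

# Stand-in first feature information: one 256-d vector per detected region.
num_regions = max(len(detections["boxes"]), 1)
first_feature_info = torch.randn(1, num_regions, 256)

# First self-attention network over the region features.
self_attention = nn.MultiheadAttention(embed_dim=256, num_heads=4,
                                       batch_first=True)
image_feature_info, _ = self_attention(first_feature_info,
                                       first_feature_info,
                                       first_feature_info)
```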
9. The method of claim 1, wherein determining text feature information of the text to be matched comprises:
determining second feature information of the text to be matched through a feature extraction network;
and determining, based on the second feature information, the text feature information of the text to be matched through a second self-attention network.
10. An apparatus for determining a matching degree between image texts, comprising:
the image matching device comprises a first determining unit, a second determining unit and a matching unit, wherein the first determining unit is configured to determine image characteristic information of an image to be matched and text characteristic information of a text to be matched;
a second determining unit configured to determine image common sense feature information and text common sense feature information, wherein the image common sense feature information represents common sense information related to target information in the image to be matched, and the text common sense feature information represents common sense information related to the target information in the text to be matched;
a third determining unit configured to determine a matching degree between the image to be matched and the text to be matched according to the image feature information, the text feature information, the image common sense feature information, and the text common sense feature information.
11. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-9.
12. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110724610.6A CN113343664B (en) 2021-06-29 2021-06-29 Method and device for determining matching degree between image texts

Publications (2)

Publication Number Publication Date
CN113343664A (en) 2021-09-03
CN113343664B CN113343664B (en) 2023-08-08

Family

ID=77481334

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070065040A1 (en) * 2005-09-22 2007-03-22 Konica Minolta Systems Laboratory, Inc. Photo image matching method and apparatus
CN106446782A (en) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image identification method and device
CN108228686A (en) * 2017-06-15 2018-06-29 北京市商汤科技开发有限公司 It is used to implement the matched method, apparatus of picture and text and electronic equipment
CN108288067A (en) * 2017-09-12 2018-07-17 腾讯科技(深圳)有限公司 Training method, bidirectional research method and the relevant apparatus of image text Matching Model
CN109543690A (en) * 2018-11-27 2019-03-29 北京百度网讯科技有限公司 Method and apparatus for extracting information
WO2020122456A1 (en) * 2018-12-12 2020-06-18 주식회사 인공지능연구원 System and method for matching similarities between images and texts
CN111897950A (en) * 2020-07-29 2020-11-06 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN112148839A (en) * 2020-09-29 2020-12-29 北京小米松果电子有限公司 Image-text matching method and device and storage medium
CN112287738A (en) * 2020-04-20 2021-01-29 北京沃东天骏信息技术有限公司 Text matching method and device for graphic control, medium and electronic equipment
WO2021052358A1 (en) * 2019-09-16 2021-03-25 腾讯科技(深圳)有限公司 Image processing method and apparatus, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李志欣 (Li Zhixin) et al.: "Cross-media image-text retrieval fusing two-level similarity" (融合两级相似度的跨媒体图像文本检索), 《电子学报》 (Acta Electronica Sinica) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392389A (en) * 2022-09-01 2022-11-25 北京百度网讯科技有限公司 Cross-modal information matching and processing method and device, electronic equipment and storage medium
CN115392389B (en) * 2022-09-01 2023-08-29 北京百度网讯科技有限公司 Cross-modal information matching and processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113343664B (en) 2023-08-08

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 100176 601, 6th floor, building 2, No. 18, Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing
Applicant after: Jingdong Technology Information Technology Co.,Ltd.
Address before: 100176 601, 6th floor, building 2, No. 18, Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing
Applicant before: Jingdong Shuke Haiyi Information Technology Co.,Ltd.
GR01 Patent grant