CN115563573A - Information detection method based on modal dynamic feature fusion and cross-modal relationship extraction - Google Patents
- Publication number
- CN115563573A (application number CN202210974704.3A)
- Authority
- CN
- China
- Prior art keywords
- feature
- modal
- text
- user
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses an information detection method based on modal dynamic feature fusion and cross-modal relationship extraction, which comprises the following steps: a multi-modal feature extractor extracts text features r_t, image features r_v and user features r_u; a cross-modal relationship extractor updates the text features r_t, image features r_v and user features r_u according to the associations between the modalities to obtain enhanced text features u_t, enhanced image features u_v and enhanced user features u_u; a multi-modal feature fuser receives the text features r_t, image features r_v, user features r_u, enhanced text features u_t, enhanced image features u_v and enhanced user features u_u, and obtains a multi-modal fusion feature a_N through dynamic allocation by a dynamic routing mechanism; a classifier receives the multi-modal fusion feature a_N and outputs a prediction result. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction achieves higher-precision rumor detection by constructing cross-modal relationships and dynamic feature fusion.
Description
Technical Field
The invention relates to an information detection method based on modal dynamic feature fusion and cross-modal relationship extraction.
Background
Current rumor detection methods fall into machine learning methods and deep learning methods. Among the traditional machine learning methods, propagation trees and propagation tree kernels have been used to model microblogs for rumor detection, and binary rumor classification has been performed with supervised learning using n-gram and bag-of-words models. Among the deep learning methods, recurrent neural networks (RNNs) have been employed to capture changes in context information to distinguish rumors, and higher-performance rumor detection models have been built by pitting a text generator against a discriminator adversarially. Compared with machine learning models, existing deep learning models have superior feature extraction capability and therefore stronger performance. However, for rumors that mix diverse forms such as pictures and text, existing deep learning methods still require further exploration.
Disclosure of Invention
The invention provides an information detection method based on modal dynamic feature fusion and cross-modal relationship extraction that solves the above-mentioned technical problems, and specifically adopts the following technical scheme:
an information detection method based on modal dynamic feature fusion and cross-modal relationship extraction comprises the following steps:
the multi-modal feature extractor receives information to be detected containing text information, image information and user information, and extracts text features r_t, image features r_v and user features r_u from the text information, the image information and the user information respectively;
the cross-modal relationship extractor receives the text features r_t, image features r_v and user features r_u, establishes the associations among the modalities, and updates the text features r_t, image features r_v and user features r_u according to the associations among the modalities to obtain enhanced text features u_t, enhanced image features u_v and enhanced user features u_u;
the multi-modal feature fuser receives the text features r_t, image features r_v, user features r_u, enhanced text features u_t, enhanced image features u_v and enhanced user features u_u, dynamically allocates the weight coefficient of each modal feature through a dynamic routing mechanism, and obtains a multi-modal fusion feature a_N after multiple iterations;
the classifier receives the multi-modal fusion feature a_N and outputs a prediction result.
Further, the multi-modal feature extractor includes a text feature extractor, an image feature extractor, and a user feature extractor.
Further, the text feature extractor comprises a BERT model and a fully connected layer;
the specific method by which the text feature extractor extracts the text feature r_t from the text information comprises the following steps:
performing one-of-N encoding and padding on the text information, expanding the sentence length to 512 to obtain an encoding vector E, and inputting the encoding vector E into the BERT model to obtain an output matrix B = [b_[CLS], b_1, ..., b_|text|, ..., b_510]^T, wherein b_[CLS] represents all the semantic information in the text information;
inputting b_[CLS] into the fully connected layer of the text feature extractor and performing the following calculation to obtain the text feature r_t:
r_t = W_tf · b_[CLS]
wherein W_tf represents the weight matrix of the fully connected layer of the text feature extractor.
Further, the image feature extractor comprises a VGG19 network and a fully connected layer;
the specific method by which the image feature extractor extracts the image feature r_v from the image information comprises the following steps:
inputting the image information into the VGG19 network to obtain an image feature representation r_VGG;
inputting the image feature representation r_VGG into the fully connected layer of the image feature extractor and performing the following calculation to obtain the image feature r_v:
r_v = W_vf · r_VGG
wherein W_vf represents the weight matrix of the fully connected layer of the image feature extractor.
Further, the user feature extractor extracts the user features of the user information by combining manual features with a deep learning model.
Further, the user feature extractor comprises a fully connected layer;
the specific method by which the user feature extractor extracts the user feature r_u from the user information comprises the following steps:
encoding the user information through manual features to obtain a manual feature r_raw;
inputting the encoded manual feature r_raw into the fully connected layer of the user feature extractor and performing the following calculation to obtain the user feature r_u:
r_u = W_uf · r_raw
wherein W_uf represents the weight matrix of the fully connected layer of the user feature extractor.
Further, the cross-modal relationship extractor comprises three fully connected layers and a cross-modal function module;
the specific method by which the cross-modal relationship extractor updates the text feature r_t, image feature r_v and user feature r_u to obtain the enhanced text feature u_t, enhanced image feature u_v and enhanced user feature u_u comprises the following steps:
composing the text feature r_t, image feature r_v and user feature r_u into a multi-modal feature R = [r_t, r_v, r_u]^T;
inputting the multi-modal feature R into the three fully connected layers of the cross-modal relationship extractor and generating a key feature matrix K_R, a query feature matrix Q_R and a value feature matrix V_R through the following calculations:
K_R = R · W_K, Q_R = R · W_Q, V_R = R · W_V
wherein W_K, W_Q and W_V are the parameter matrices of the three fully connected layers respectively;
calculating the enhanced text feature u_t, enhanced image feature u_v and enhanced user feature u_u by the following formula:
U = [u_t, u_v, u_u]^T = softmax(Q_R · K_R^T / √d_m) · V_R
Further, the specific method by which the multi-modal feature fuser calculates the multi-modal fusion feature a_N comprises the following steps:
inputting the text feature r_t, image feature r_v, user feature r_u, enhanced text feature u_t, enhanced image feature u_v and enhanced user feature u_u into six fully connected layers respectively, and obtaining six feature vectors h_t, h_v, h_u, h'_t, h'_v and h'_u through the following calculations:
h_t = W_t · r_t, h_v = W_v · r_v, h_u = W_u · r_u, h'_t = W'_t · u_t, h'_v = W'_v · u_v, h'_u = W'_u · u_u
wherein W_t, W_v, W_u, W'_t, W'_v and W'_u are the parameter matrices of the six fully connected layers respectively;
the multi-modal feature fuser receives the six feature vectors h_t, h_v, h_u, h'_t, h'_v and h'_u, dynamically allocates the weight coefficient of each modal feature through a dynamic routing mechanism, and finally obtains the multi-modal fusion feature a_N after multiple iterations.
Further, the specific method by which the classifier receives the multi-modal fusion feature a_N and outputs the prediction result is as follows:
p̂ = sigmoid(W_p2 · LeakyReLU(W_p1 · a_N + b_p1) + b_p2)
wherein W_p1 and W_p2 are learnable parameter matrices, b_p1 and b_p2 are bias terms, and sigmoid and LeakyReLU are activation functions.
Further, the model is optimized by minimizing the cross entropy, and the loss function is defined as follows:
L(Θ) = −[y · log p̂ + (1 − y) · log(1 − p̂)]
wherein Θ represents all learnable parameters of the whole neural network, y represents the label of the information to be detected, a rumor being 1 and a non-rumor being 0, and p̂ is the prediction probability output by the classifier.
The beneficial effect of the invention is that the information detection method based on modal dynamic feature fusion and cross-modal relationship extraction achieves higher-precision rumor detection by constructing cross-modal relationships and dynamic feature fusion.
Drawings
FIG. 1 is a schematic diagram of a prediction model DFCM of the present invention;
FIG. 2 is a schematic diagram of an information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to the present invention;
FIG. 3 is a schematic diagram of a text feature extractor of the present invention;
FIG. 4 is a schematic diagram of an image feature extractor of the present invention;
FIG. 5 is a schematic diagram of a cross-modal relationship extractor of the present invention;
FIG. 6 is a schematic diagram of the multi-modal feature fuser of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and the embodiments.
FIG. 1 shows the prediction model DFCM disclosed in the present application, which comprises a multi-modal feature extractor, a cross-modal relationship extractor, a multi-modal feature fuser and a classifier. FIG. 2 shows the information detection method based on modal dynamic feature fusion and cross-modal relationship extraction, which is implemented on the basis of the prediction model DFCM. Specifically, the method comprises the following steps. S1: the multi-modal feature extractor receives information to be detected containing text information, image information and user information, and extracts text features r_t, image features r_v and user features r_u from them respectively. S2: the cross-modal relationship extractor receives the text features r_t, image features r_v and user features r_u, establishes the associations among the modalities, and updates these features according to the associations among the modalities to obtain enhanced text features u_t, enhanced image features u_v and enhanced user features u_u. S3: the multi-modal feature fuser receives the text features r_t, image features r_v, user features r_u, enhanced text features u_t, enhanced image features u_v and enhanced user features u_u, dynamically allocates the weight coefficient of each modal feature through a dynamic routing mechanism, and obtains a multi-modal fusion feature a_N after multiple iterations. S4: the classifier receives the multi-modal fusion feature a_N and outputs a prediction result. Through these steps, the method achieves higher-precision rumor detection by constructing cross-modal relationships and dynamic feature fusion.
The above steps are specifically described below.
For step S1: the multi-modal feature extractor receives information to be detected including text information, image information and user information and extracts text features r from the text information, the image information and the user information respectively t Image feature r v And user characteristics r u 。
The prediction model DFCM is mainly used for predicting whether a post d in a forum or a microblog is a rumor. Therefore, the information to be detected is a post d, and the post d specifically includes text information d.text, image information d.image and user information d.user. First, the multimodal feature extractor extracts feature information from the post d.
In particular, the multimodal feature extractor includes a text feature extractor, an image feature extractor, and a user feature extractor.
As shown in fig. 3, the text feature extractor includes a BERT model and a fully connected layer.
The text feature extractor extracts a text feature r from the text information t The specific method comprises the following steps:
text information d.text is one-of-N encoded and expanded, and the length of the sentence is expanded to 512. This length is the input length limit of the BERT model, resulting in the code vector E. E = [ E = [CLS] ,e 1 ,...,e |text| ,e [SEP] ,...,e 510 ]Where e represents a word, | text | represents the length of the input text, [ CLS [ ]]Is a sentence start identifier, [ SEP]Is a sentence end identifier, [ SEP]Followed by an extension portion.
Text feature r for effectively extracting text information d t The present application employs a pre-trained BERT model. The BERT model is a multi-layer bi-directional transform encoder. Inputting the encoding vector E into a BERT model to obtain an output matrix B = [ B = [) [CLS] ,b 1 ,...,b n ,b |text| ,...,b 510 ] T Wherein b is [CLS] Representing all semantic information in the text information.Wherein d is B Is the output dimension of the bi-directional transcoder model.
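As a toy illustration of the padding step described above (not the patented implementation), the following Python sketch pads a token sequence to BERT's 512-token limit with [CLS]/[SEP] markers; the example tokens and the `[PAD]` marker are placeholder assumptions, and a real pipeline would work with one-of-N (one-hot) token ids from a BERT vocabulary.

```python
# Toy sketch: build a 512-slot BERT-style input sequence.
# [CLS] starts the sentence, [SEP] ends it, the rest is padding.
def build_input(tokens, max_len=512, pad="[PAD]"):
    seq = ["[CLS]"] + tokens + ["[SEP]"]
    assert len(seq) <= max_len, "text longer than BERT's input limit"
    return seq + [pad] * (max_len - len(seq))

E = build_input(["the", "quick", "fox"])
print(len(E))   # 512
print(E[:5])    # ['[CLS]', 'the', 'quick', 'fox', '[SEP]']
```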
Inputting b_[CLS] into the fully connected layer of the text feature extractor and performing the following calculation yields the text feature r_t:
r_t = W_tf · b_[CLS]
wherein W_tf represents the weight matrix of the fully connected layer of the text feature extractor, W_tf ∈ R^(d_m×d_B), d_m being the hidden layer dimension, so that the feature r_t ∈ R^(d_m).
As shown in fig. 4, the image feature extractor contains a VGG19 network and a fully connected layer. To effectively extract the image feature r_v of the image information d.image, the present application employs a pre-trained VGG19 network to first extract the object information in the picture, and adds a fully connected layer (visual-fc) after the last layer of the VGG19 network. On one hand, this adjusts the size of the image feature so that the image feature dimension matches the text feature dimension, in preparation for multi-modal feature fusion; on the other hand, since VGG19 is not fine-tuned during training, the fully connected layer can further extract the features in the picture that are relevant to rumor detection.
The image feature extractor extracts image features r from the image information v The specific method comprises the following steps:
inputting image information d.image into VGG19 network to obtain image feature representation r VGG ,d v Is the output dimension of the VGG19 network.
Characterizing an image r VGG The image characteristic r is obtained by performing the following calculation on the full connection layer of the input image characteristic extractor v ,
r v =W vf ·r VGG
Wherein, W vf A weight matrix representing the fully connected layers of the image feature extractor.Wherein d is m Is the hidden layer dimension.
In the present application, the user feature extractor extracts the user features of the user information by combining manual features with a deep learning model. The manual features of the user information d.user are shown in Table 1.
The user feature extractor includes a fully connected layer.
The specific method by which the user feature extractor extracts the user feature r_u from the user information is as follows:
the user information d.user is encoded through manual features to obtain the manual feature r_raw ∈ R^(d_u), where d_u is the manual feature dimension;
inputting the encoded manual feature r_raw into the fully connected layer of the user feature extractor and performing the following calculation yields the user feature r_u:
r_u = W_uf · r_raw
wherein W_uf represents the weight matrix of the fully connected layer of the user feature extractor.
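The three projection equations of the extractor (r_t = W_tf · b_[CLS], r_v = W_vf · r_VGG, r_u = W_uf · r_raw) can be sketched in NumPy as follows. The dimensions d_B = 768, d_v = 4096, d_u = 10 and d_m = 32, and the random vectors standing in for the BERT, VGG19 and manual-feature outputs, are illustrative assumptions rather than values fixed by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
d_B, d_v, d_u, d_m = 768, 4096, 10, 32   # assumed dimensions

b_cls = rng.standard_normal(d_B)   # stands in for BERT's b_[CLS] output
r_vgg = rng.standard_normal(d_v)   # stands in for the VGG19 feature r_VGG
r_raw = rng.standard_normal(d_u)   # stands in for the encoded manual features

# Weight matrices of the three fully connected layers (random stand-ins
# for learned parameters W_tf, W_vf, W_uf).
W_tf = rng.standard_normal((d_m, d_B))
W_vf = rng.standard_normal((d_m, d_v))
W_uf = rng.standard_normal((d_m, d_u))

# All three modalities are projected into the shared hidden dimension d_m.
r_t, r_v, r_u = W_tf @ b_cls, W_vf @ r_vgg, W_uf @ r_raw
print(r_t.shape, r_v.shape, r_u.shape)   # (32,) (32,) (32,)
```

Projecting every modality to the same dimension d_m is what makes the later fusion steps possible, since attention and routing operate on same-sized vectors.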
For step S2: the cross-modal relationship extractor receives the text features r_t, image features r_v and user features r_u, establishes the associations among the modalities, and updates the text features r_t, image features r_v and user features r_u according to the associations among the modalities to obtain enhanced text features u_t, enhanced image features u_v and enhanced user features u_u.
As shown in fig. 5, the cross-modal relationship extractor of the present application comprises three fully connected layers and one cross-modal function module.
The specific method by which the cross-modal relationship extractor updates the text feature r_t, image feature r_v and user feature r_u to obtain the enhanced text feature u_t, enhanced image feature u_v and enhanced user feature u_u is as follows:
the text feature r_t, image feature r_v and user feature r_u compose the multi-modal feature R = [r_t, r_v, r_u]^T;
inputting the multi-modal feature R into the three fully connected layers of the cross-modal relationship extractor generates the key feature matrix K_R, query feature matrix Q_R and value feature matrix V_R through the following calculations:
K_R = R · W_K, Q_R = R · W_Q, V_R = R · W_V
wherein W_K, W_Q and W_V are the parameter matrices of the three fully connected layers respectively; the K_R matrix is used to match the features of the other modalities, the Q_R matrix is used to be matched against by the features of the other modalities, and the V_R matrix serves as the values to be weighted and summed.
Then, the cross-modal relationships among the modalities are established. The similarity matrix between different modalities is calculated by the scaled dot product of the key feature matrix K_R and the query feature matrix Q_R, and the calculation process is as follows:
U = softmax(Q_R · K_R^T / √d_m) · V_R
where d_m is the output dimension of W_K and √d_m plays a scaling role. For simplicity of derivation, omitting the softmax and scaling functions, the equation can be expanded as follows:
u_i = Σ_j (q_i · k_j) · v_j, for i, j ∈ {t, v, u}
Taking the text feature r_t as an example of the above formula: first, the similarity between the text feature r_t and the features of all modalities is calculated through its query vector q_t; then these similarities are used as weights for a weighted sum; finally the updated enhanced text feature u_t is obtained. The update process of the other modalities is similar; in this process the cross-modal relationships are established.
Finally, the enhanced text feature u is obtained t Enhancing image features u v And enhanced user features u u The result can be expressed as U = [ U ] t ,u v ,u u ] T Wherein
For step S3: the multimodal feature fuser receives text features r t Image feature r v User characteristics r u Enhanced text feature u t Enhancing image feature u v And enhanced user features u u And dynamically allocating the weight coefficient of each modal characteristic through a dynamic routing mechanism, and obtaining the multi-modal fusion characteristic a after multiple iterations N 。
As shown in FIG. 6, the specific method by which the multi-modal feature fuser calculates the multi-modal fusion feature a_N is as follows:
the text feature r_t, image feature r_v, user feature r_u, enhanced text feature u_t, enhanced image feature u_v and enhanced user feature u_u are input into six fully connected layers respectively, and the six feature vectors h_t, h_v, h_u, h'_t, h'_v and h'_u are obtained through the following calculations:
h_t = W_t · r_t, h_v = W_v · r_v, h_u = W_u · r_u, h'_t = W'_t · u_t, h'_v = W'_v · u_v, h'_u = W'_u · u_u
wherein W_t, W_v, W_u, W'_t, W'_v and W'_u are the parameter matrices of the six fully connected layers respectively; the multi-modal feature fuser receives the six feature vectors h_t, h_v, h_u, h'_t, h'_v and h'_u, dynamically allocates the weight coefficient of each modal feature through the dynamic routing mechanism, and finally obtains the multi-modal fusion feature a_N after multiple iterations.
The dynamic routing mechanism of the multi-modal feature fuser of the present application adopts the dynamic routing method proposed by Sabour et al.
Dynamic routing is a mechanism for routing the output vectors of capsules; a powerful dynamic routing mechanism can ensure that the output of a capsule is sent to the appropriate parent capsule in the layer above.
In a fully connected neural network, a neuron can be calculated by the following formula:
y_j = f(Σ_i W_ij · x_i)
where the parameter matrix W_ij is trained with the backpropagation algorithm through a global loss function. Iterative dynamic routing provides an alternative: it computes how a capsule is activated by using the attributes of local features, which allows the inputs to be combined into a parse tree more effectively and simply, with a lower risk of error. In dynamic routing, the output is routed to all possible parent nodes, but scaled down by coupling coefficients that sum to 1. In each iteration round, the input to each possible parent node is multiplied by a weight matrix to compute a "prediction vector". If this prediction vector has a large scalar product with the output of a possible parent, there is top-down feedback: the coupling coefficient of that node is increased and the coupling coefficients of the other nodes are decreased. This type of "routing by agreement" can be much more effective than the very primitive form of routing implemented by max pooling.
Specifically, the process of a dynamic routing iteration in the present application, over the six input feature vectors h_i, is as follows:
c = softmax(b); a_r = squash(Σ_i c_i · h_i); b_i ← b_i + h_i · a_r
where N is the number of iteration rounds, b_i is the routing logit of feature vector h_i, and the "squash" function shrinks short vectors to a length close to zero and long vectors to a length slightly less than 1 without changing the vector direction. In each iteration round of the dynamic routing, the "prediction vector" a_r is obtained by the weighted sum and the "squash" function. If an input feature vector h_i and the prediction vector a_r have a larger dot product (i.e., are more similar), the next iteration round increases the coupling coefficient c_i of that feature vector and decreases the coupling coefficients of the other feature vectors. Eventually, the output of the dynamic routing tends toward the most prominent modal features while the other modal features are also fused in. The result after N rounds of dynamic routing iteration is the feature fusion result a_N.
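One plausible reading of the routing iteration described above, sketched in NumPy with a single output capsule over the six modal feature vectors; N = 3 iterations, d_m = 32 and the random inputs are illustrative assumptions, not the patent's trained values.

```python
import numpy as np

def squash(v):
    # "squash" nonlinearity from Sabour et al.: preserves direction,
    # maps the vector length into [0, 1)
    n2 = v @ v
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d_m, N = 32, 3
H = rng.standard_normal((6, d_m))   # the six modal feature vectors h_i

b = np.zeros(6)                     # routing logits, start uniform
for _ in range(N):
    c = softmax(b)                  # coupling coefficients, sum to 1
    a = squash(c @ H)               # weighted sum, then squash
    b = b + H @ a                   # agreement raises a feature's weight

a_N = a                             # the multi-modal fusion feature
print(a_N.shape)   # (32,)
```

Features that agree with the running fusion vector accumulate larger logits, so after a few rounds the output is dominated by the most prominent modality while still mixing in the others.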
For step S4: the classifier receives a multi-modal fused feature a N And outputs the prediction result.
The fusion feature a_N is input into the classifier to obtain the prediction result for the post d. The classifier outputs the prediction probability p̂ that the post is a rumor:
p̂ = sigmoid(W_p2 · LeakyReLU(W_p1 · a_N + b_p1) + b_p2)
wherein W_p1 and W_p2 are learnable parameter matrices, W_p1 ∈ R^(d_p×d_m), d_p being the dimension of the classifier, b_p1 and b_p2 are bias terms, and sigmoid and LeakyReLU are activation functions.
Further, the model is optimized by minimizing the cross entropy, and the loss function is defined as:
L(Θ) = −[y · log p̂ + (1 − y) · log(1 − p̂)]
wherein Θ represents all learnable parameters of the whole neural network and y represents the label of the information to be detected, a rumor being 1 and a non-rumor being 0.
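The classifier head and the cross-entropy loss can be sketched together as follows; the dimensions d_m = 32 and d_p = 16, the random stand-in parameters, and the 0.1 weight scale (chosen only to keep the sigmoid away from saturation in this toy example) are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(3)
d_m, d_p = 32, 16
a_N = rng.standard_normal(d_m)   # fused feature from the dynamic routing

# Random stand-ins for the learned parameters W_p1, b_p1, W_p2, b_p2;
# scaled small so the toy sigmoid stays away from 0/1 saturation.
W_p1 = 0.1 * rng.standard_normal((d_p, d_m))
b_p1 = 0.1 * rng.standard_normal(d_p)
W_p2 = 0.1 * rng.standard_normal(d_p)
b_p2 = 0.1 * rng.standard_normal()

# p_hat = sigmoid(W_p2 · LeakyReLU(W_p1 · a_N + b_p1) + b_p2)
p_hat = sigmoid(W_p2 @ leaky_relu(W_p1 @ a_N + b_p1) + b_p2)

y = 1   # label: 1 = rumor, 0 = non-rumor
loss = -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(round(float(p_hat), 3), round(float(loss), 3))
```

In training, this loss would be minimized over all parameters Θ by backpropagation; here the forward pass alone illustrates the shape of the computation.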
The microblog data set was tested with different prediction models, and the results are shown in Table 2. The accuracy, precision, recall and F1 score of the proposed model DFCM are 86.42%, 84.26%, 90.61% and 87.32% respectively, which are superior to the other models.
Table 2 experimental results of different models on microblog data sets
The Twitter data set was tested with different prediction models, and the results are shown in Table 3. The accuracy, precision, recall and F1 score of the proposed model DFCM are 88.64%, 91.93%, 91.01% and 91.47% respectively, which are also superior to the other models.
Table 3 experimental results of different models on twitter data sets
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalents or equivalent changes fall within the protection scope of the present invention.
Claims (10)
1. An information detection method based on modal dynamic feature fusion and cross-modal relationship extraction is characterized by comprising the following steps:
the multi-modal feature extractor receives information to be detected comprising text information, image information and user information, and extracts text features r_t, image features r_v and user features r_u from the text information, the image information and the user information respectively;
the cross-modal relationship extractor receives the text features r_t, the image features r_v and the user features r_u, establishes the associations among the modalities, and updates the text features r_t, the image features r_v and the user features r_u according to the associations among the modalities to obtain enhanced text features u_t, enhanced image features u_v and enhanced user features u_u;
the multi-modal feature fuser receives the text features r_t, the image features r_v, the user features r_u, the enhanced text features u_t, the enhanced image features u_v and the enhanced user features u_u, dynamically allocates the weight coefficient of each modal feature through a dynamic routing mechanism, and obtains a multi-modal fusion feature a_N after multiple iterations;
the classifier receives the multi-modal fusion feature a_N and outputs a prediction result.
2. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 1, wherein
the multi-modal feature extractor includes a text feature extractor, an image feature extractor, and a user feature extractor.
3. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 2, wherein
the text feature extractor comprises a BERT model and a fully connected layer;
the specific method by which the text feature extractor extracts the text feature r_t from the text information comprises:
performing one-of-N encoding and padding on the text information, expanding the sentence length to 512 to obtain an encoding vector E, and inputting the encoding vector E into the BERT model to obtain an output matrix B = [b_[CLS], b_1, ..., b_|text|, ..., b_510]^T, wherein b_[CLS] represents all the semantic information in the text information;
inputting b_[CLS] into the fully connected layer of the text feature extractor and performing the following calculation to obtain the text feature r_t:
r_t = W_tf · b_[CLS]
wherein W_tf represents the weight matrix of the fully connected layer of the text feature extractor.
4. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 3, wherein
the image feature extractor comprises a VGG19 network and a fully connected layer;
the specific method by which the image feature extractor extracts the image feature r_v from the image information comprises:
inputting the image information into the VGG19 network to obtain an image feature representation r_VGG;
inputting the image feature representation r_VGG into the fully connected layer of the image feature extractor and performing the following calculation to obtain the image feature r_v:
r_v = W_vf · r_VGG
wherein W_vf represents the weight matrix of the fully connected layer of the image feature extractor.
5. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 4, wherein
the user feature extractor extracts the user features of the user information by combining manual features with a deep learning model.
6. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 5, wherein
the user feature extractor comprises a fully connected layer;
the specific method by which the user feature extractor extracts the user feature r_u from the user information comprises:
encoding the user information through manual features to obtain a manual feature r_raw;
inputting the encoded manual feature r_raw into the fully connected layer of the user feature extractor and performing the following calculation to obtain the user feature r_u:
r_u = W_uf · r_raw
wherein W_uf represents the weight matrix of the fully connected layer of the user feature extractor.
7. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 6, wherein the cross-modal relationship extractor comprises three fully connected layers and a cross filter function module, and updates the text feature r_t, the image feature r_v and the user feature r_u into the enhanced text feature u_t, the enhanced image feature u_v and the enhanced user feature u_u as follows:
stacking the text feature r_t, the image feature r_v and the user feature r_u into the multimodal feature matrix R = [r_t, r_v, r_u]^T;
feeding R into the three fully connected layers of the cross-modal relationship extractor to generate the key feature matrix K_R, the query feature matrix Q_R and the value feature matrix V_R, respectively:
K_R = R · W_K, Q_R = R · W_Q, V_R = R · W_V,
where W_K, W_Q and W_V are the parameter matrices of the three fully connected layers;
computing the enhanced text feature u_t, the enhanced image feature u_v and the enhanced user feature u_u from Q_R, K_R and V_R through the cross filter function module.
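The K/Q/V construction in claim 7 mirrors self-attention over the three modality rows of R. The enhancement formula itself appears only as an image in the original publication, so the sketch below assumes ordinary scaled dot-product attention; the patented cross filter function may well differ from this:

```python
import math

def matmul(A, B):
    """Naive matrix product A · B for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def cross_modal_update(R, W_Q, W_K, W_V):
    """Scaled dot-product self-attention over the 3 x d multimodal
    matrix R = [r_t, r_v, r_u]^T; returns [u_t, u_v, u_u]."""
    Q, K, V = matmul(R, W_Q), matmul(R, W_K), matmul(R, W_V)
    d = len(Q[0])
    scores = matmul(Q, [list(col) for col in zip(*K)])  # Q · K^T
    enhanced = []
    for row in scores:
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)                       # softmax with max-shift
        exps = [math.exp(s - m) for s in scaled]
        z = sum(exps)
        attn = [e / z for e in exps]
        # each enhanced feature is an attention-weighted mix of V rows
        enhanced.append([sum(a * v for a, v in zip(attn, col))
                         for col in zip(*V)])
    return enhanced
```

Each output row is a convex combination of the value rows, so every modality's enhanced feature mixes in information from the other two modalities.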
8. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 7, wherein the multi-modal feature fuser computes the multi-modal fused feature a_N as follows:
feeding the text feature r_t, the image feature r_v, the user feature r_u, the enhanced text feature u_t, the enhanced image feature u_v and the enhanced user feature u_u into six fully connected layers, respectively, to obtain six feature vectors,
where the six weight matrices are the parameter matrices of the respective fully connected layers.
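The six per-feature projections of claim 8 can be sketched as below. The step that combines the six projected vectors into a_N is rendered only as an image in the original publication, so the element-wise sum used here is an assumption standing in for the patent's dynamic fusion formula:

```python
def fc(W, x):
    """Bias-free fully connected layer: W · x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fuse(features, weights):
    """Project each of the six modality features through its own FC
    layer, then combine. The element-wise sum is an illustrative
    stand-in for the patent's (unreproduced) fusion formula."""
    projected = [fc(W, f) for W, f in zip(weights, features)]
    return [sum(vals) for vals in zip(*projected)]

# six toy features: r_t, r_v, r_u, u_t, u_v, u_u
feats = [[1.0, 0.0]] * 6
Ws = [[[1.0, 0.0], [0.0, 1.0]]] * 6   # identity projections
a_N = fuse(feats, Ws)                  # -> [6.0, 0.0]
```

Giving each modality its own projection matrix lets the fuser weight the raw and enhanced features differently before they are combined.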
9. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 8, wherein the classifier receives the multi-modal fused feature a_N and outputs the prediction result as
ŷ = sigmoid(W_p2 · LeakyReLU(W_p1 · a_N + b_p1) + b_p2),
where W_p1 and W_p2 are learnable parameter matrices, b_p1 and b_p2 are bias terms, and sigmoid and LeakyReLU are activation functions.
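A minimal sketch of the two-layer classifier implied by claim 9, using the named parameters W_p1, b_p1, W_p2, b_p2 with LeakyReLU and sigmoid (the exact layer arrangement is reconstructed, and the toy dimensions are invented):

```python
import math

def leaky_relu(x, alpha=0.01):
    """LeakyReLU with a small negative slope."""
    return x if x > 0.0 else alpha * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(a_N, W_p1, b_p1, W_p2, b_p2):
    """y_hat = sigmoid(W_p2 · LeakyReLU(W_p1 · a_N + b_p1) + b_p2)."""
    h = [leaky_relu(sum(w * a for w, a in zip(row, a_N)) + b)
         for row, b in zip(W_p1, b_p1)]
    z = sum(w * hi for w, hi in zip(W_p2, h)) + b_p2
    return sigmoid(z)   # probability that the input is a rumor
```

The sigmoid squashes the logit into (0, 1), so the output can be read directly as a rumor probability and thresholded at 0.5.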
10. The information detection method based on modal dynamic feature fusion and cross-modal relationship extraction according to claim 9, wherein the model is optimized by minimizing the cross-entropy loss, defined for a sample as
L(Θ) = −[y log ŷ + (1 − y) log(1 − ŷ)],
where Θ denotes all learnable parameters of the whole neural network and y is the label of the information to be detected (1 for rumor, 0 for non-rumor).
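For a single sample the binary cross-entropy named in claim 10 can be computed directly; the eps clamp below is an implementation convenience to avoid log(0), not part of the patent:

```python
import math

def cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one sample.
    y: ground-truth label, 1 for rumor, 0 for non-rumor.
    y_hat: predicted rumor probability in (0, 1)."""
    return -(y * math.log(y_hat + eps)
             + (1 - y) * math.log(1.0 - y_hat + eps))
```

A confident correct prediction (y = 1, ŷ → 1) drives the loss toward 0, while a confident wrong one makes it grow without bound; its gradients are backpropagated to all parameters Θ (cf. the G06N3/084 backpropagation classification above).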
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210974704.3A CN115563573A (en) | 2022-08-15 | 2022-08-15 | Information detection method based on modal dynamic feature fusion and cross-modal relationship extraction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115563573A true CN115563573A (en) | 2023-01-03 |
Family
ID=84739538
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210974704.3A Pending CN115563573A (en) | 2022-08-15 | 2022-08-15 | Information detection method based on modal dynamic feature fusion and cross-modal relationship extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563573A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117876941A (en) * | 2024-03-08 | 2024-04-12 | 杭州阿里云飞天信息技术有限公司 | Target multi-mode model system, construction method, video processing model training method and video processing method |
Similar Documents
Publication | Title |
---|---|
CN111291212B (en) | Zero sample sketch image retrieval method and system based on graph convolution neural network | |
CN111985245B (en) | Relationship extraction method and system based on attention cycle gating graph convolution network | |
CN110083705B (en) | Multi-hop attention depth model, method, storage medium and terminal for target emotion classification | |
CN110222140B (en) | Cross-modal retrieval method based on counterstudy and asymmetric hash | |
CN108875807B (en) | Image description method based on multiple attention and multiple scales | |
CN111897913B (en) | Semantic tree enhancement based cross-modal retrieval method for searching video from complex text | |
CN111309971B (en) | Multi-level coding-based text-to-video cross-modal retrieval method | |
WO2020107878A1 (en) | Method and apparatus for generating text summary, computer device and storage medium | |
CN111274398B (en) | Method and system for analyzing comment emotion of aspect-level user product | |
WO2023280065A1 (en) | Image reconstruction method and apparatus for cross-modal communication system | |
CN111061843A (en) | Knowledge graph guided false news detection method | |
CN111274375B (en) | Multi-turn dialogue method and system based on bidirectional GRU network | |
CN116204674B (en) | Image description method based on visual concept word association structural modeling | |
CN111159485A (en) | Tail entity linking method, device, server and storage medium | |
CN111444367A (en) | Image title generation method based on global and local attention mechanism | |
CN108256968A (en) | A kind of electric business platform commodity comment of experts generation method | |
CN113806609A (en) | Multi-modal emotion analysis method based on MIT and FSM | |
Mao et al. | Chinese sign language recognition with sequence to sequence learning | |
CN115796182A (en) | Multi-modal named entity recognition method based on entity-level cross-modal interaction | |
CN112487200A (en) | Improved deep recommendation method containing multi-side information and multi-task learning | |
CN114942998B (en) | Knowledge graph neighborhood structure sparse entity alignment method integrating multi-source data | |
CN115563573A (en) | Information detection method based on modal dynamic feature fusion and cross-modal relationship extraction | |
JP7181999B2 (en) | SEARCH METHOD AND SEARCH DEVICE, STORAGE MEDIUM | |
CN114004220A (en) | Text emotion reason identification method based on CPC-ANN | |
CN116933051A (en) | Multi-mode emotion recognition method and system for modal missing scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||