CN117892217A

CN117892217A - Causal inference-based method and system for judging multimodal title-content inconsistency in official-account tweets

Info

Publication number: CN117892217A
Application number: CN202311769196.6A
Authority: CN
Inventors: 余建兴; 王世祺; 张宇锋; 朱怀杰; 刘威; 印鉴
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2023-12-20
Filing date: 2023-12-20
Publication date: 2024-04-16

Abstract

The invention relates to the technical field of artificial intelligence and discloses a causal inference-based method and system for judging multimodal title-content inconsistency in official-account tweets. The method comprises: acquiring training tweets and extracting multimodal features; decoupling invariant factors and variable factors from the multimodal features; based on a contrastive learning strategy, decoupling the variable factors into causal factors, which reflect the decoy routines and writing styles of title-content-inconsistent tweets in different scenarios, and irrelevant factors, which contain spurious correlation bias; fusing the invariant factors and the causal factors; constructing a classifier; obtaining additional training tweets for data augmentation and training the classifier on the augmented training tweets; and identifying the title and body of a tweet with the trained classifier to judge whether the title and content are inconsistent. The invention addresses the problem that the prior art does not model spurious correlation bias and therefore cannot accurately uncover the causal information hidden in tweet features, and it has a low demand for scarce labeled data.

Description

Causal inference-based method and system for judging multimodal title-content inconsistency in official-account tweets

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a causal inference-based method and system for judging multimodal title-content inconsistency in official-account tweets.

Background

With the spread of the internet and advances in mobile communication technology, more and more users publish and share tweets on social platforms such as WeChat official accounts. Because a tweet's click count can be monetized through advertising, some authors single-mindedly pursue commercial gain: they dress up low-quality tweets with titles and cover images containing exaggerated, emotional, frightening, bizarre or distorted elements, luring readers into clicking irrelevant, poor-quality tweets such as advertisements and scams. In recent years, tweets whose titles do not match their content have proliferated. Readers who read only the title and not the full text are easily misled or even harmed, which increases public distrust of information in general and undermines the dissemination of knowledge in society. In highly sensitive fields such as law and medicine, low-quality tweets can cause direct losses: incorrect legal knowledge may lead users to break the law, and false medical popularization may mislead people into treating illnesses themselves and damaging their health. Overall, such low-quality tweets cause considerable confusion and waste of resources for readers, society and social platforms. WeChat official accounts publish an enormous number of tweets every day, far too many to review manually, so developing automatic machine discrimination of title-content-inconsistent tweets is of great significance for purifying the online environment, cleaning up information channels and restoring public trust in official-account platforms.

Title-content-inconsistent tweets come in many types that span multiple modalities and their combinations, for example: fabricated titles, which invent stories that contradict the facts; image-text mismatch, which splices an unrelated cover image onto the body text; and the sensational type, which uses words and images with popular elements such as curiosity-baiting, vulgarity and gore as the tweet's title and cover. However, conventional methods focus on a single class of features and cannot jointly analyze the combined characteristics of each modality of a tweet. These methods fall into two categories: discrimination based on social behavior and discrimination based on content quality. A decoy tweet accumulates a large number of reads and forwards in a short time, but its reading time is short because its content quality is poor. Propagation-based methods therefore discriminate by jointly analyzing social-behavior metadata such as comments, reads, forwards and bookmarks. Such methods must collect enough reader feedback before judging, but this feedback is often delayed, and a considerable share of readers leave no social traces such as comments, forwards or bookmarks at all. As a result, a title-content-inconsistent tweet can only be detected after it has spread widely and reached a large number of readers, which defeats the goal of blocking low-quality tweets in time. In contrast, content-quality-based methods analyze the tweet's language patterns from angles such as decoy vocabulary, grammar, subjectivity, punctuation and content consistency, and can catch title-content-inconsistent tweets during the platform's review stage.
Early work made decisions with manually designed rules, which depend heavily on expert knowledge and do not scale. With the development of deep learning, recent research has turned to capturing the correlations between a tweet's title and body with neural networks. However, the learned features are often confounded with spurious correlation bias and do not effectively characterize the intrinsic factors behind decoy behavior, which leaves the models insufficiently robust and unable to generalize to new variants.

The prior art includes a method and device for judging the consistency between a file's content and its title, the method comprising the following steps: A. searching at least one candidate website with the title of the target file to obtain candidate files of the same type as the target file; B. clustering the target file and the candidate files based on content similarity; C. determining the optimal cluster in the clustering result; D. if the target file does not belong to the optimal cluster, determining that its content is inconsistent with its title, and otherwise determining that they are consistent.

However, this prior art does not model spurious correlation bias and therefore cannot accurately uncover the causal information hidden in tweet features. How to devise a multimodal title-content inconsistency judging method and system that models spurious correlation bias and accurately uncovers the causal information hidden in tweet features is therefore a technical problem to be solved in this field.

Disclosure of Invention

To solve the problem that the prior art does not model spurious correlation bias and therefore cannot accurately uncover the causal information hidden in tweet features, the invention provides a causal inference-based method and system for judging multimodal title-content inconsistency in official-account tweets, characterized by a low demand for scarce labeled data.

In order to achieve the above purpose of the present invention, the following technical scheme is adopted:

the causal inference-based method for judging multimodal title-content inconsistency in official-account tweets comprises the following steps:

acquiring training tweets and extracting multimodal features;

dividing the training tweets into different scenarios to simulate the influence of spurious correlation bias, and decoupling from the multimodal features invariant factors, which remain discriminative across scenarios, and variable factors, which describe scenario-specific effects;

based on a contrastive learning strategy, decoupling the variable factors into causal factors, which reflect the decoy routines and writing styles of title-content-inconsistent tweets in different scenarios, and irrelevant factors, which contain spurious correlation bias;

fusing the invariant factors and the causal factors;

constructing a classifier from the fused factors; obtaining additional training tweets for data augmentation, and training the classifier on the augmented training tweets;

identifying the title and body of a tweet with the trained classifier, and judging whether the title and content are inconsistent.

Preferably, the multi-modal features include visual features, text features, cross-modal matching features, linguistic features, and statistical features.

Further, extracting the multimodal features of a tweet comprises the following steps:

taking a pre-trained Swin Transformer model as the backbone, extracting the tweet's visual features with a shifted-window self-attention mechanism;

performing word segmentation on the tweet's title and body, obtaining an embedding for each word, and feeding the embeddings into a pre-trained BERT model to obtain text features;

performing cross-modal matching between the extracted visual features and text features with a pre-trained CT Transformer model to obtain cross-modal matching features;

the language features build on the cross-modal matching features and comprise a body-thumbnail consistency feature, a body-title consistency feature, a thumbnail-title consistency feature and a title sentiment-polarity feature: the body-thumbnail consistency feature is obtained by extracting fused features of the tweet's body text and thumbnail with the pre-trained CLIP model; the body-title consistency feature is obtained by feeding the body and the title into a Siamese network; the thumbnail-title consistency feature is obtained by feeding the tweet's thumbnail into a pre-trained caption generator to produce a new title and computing the cosine similarity between the BERT features of the new title and the original title; and the title sentiment-polarity feature is obtained by feeding the tweet's title into a sentiment classifier;

the statistical features comprise lexical statistics, common-word statistics and author-profile statistics: the lexical statistics are obtained by recording the punctuation marks, emoticons, pronouns, affirmative words and vague words that appear in the tweet's title; the common-word statistics are obtained by recording the internet slang, sensitive words, names of public figures and place names that appear in the title; and the author-profile statistics are obtained by recording the account age, nickname, follower count, number of published tweets, number of low-quality tweets published in the past, time elapsed since the first tweet and time elapsed since the latest tweet;

concatenating the visual features, text features, cross-modal matching features, language features and statistical features to obtain the high-dimensional multimodal feature $x_i$.
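As a rough illustration of the concatenation step, the Python sketch below splices per-modality vectors into one flat $x_i$ and records where each group lands, which makes any later mask over $x_i$ easier to interpret. The group names and toy values are hypothetical placeholders; in the method each group comes from the extractors described above and is far higher-dimensional.

```python
from typing import Dict, List, Sequence, Tuple


def concat_features(
    groups: Dict[str, Sequence[float]],
) -> Tuple[List[float], Dict[str, Tuple[int, int]]]:
    """Concatenate feature groups in insertion order.

    Returns the flat vector and, for each group, its [start, end) slice.
    """
    x: List[float] = []
    spans: Dict[str, Tuple[int, int]] = {}
    for name, values in groups.items():
        start = len(x)
        x.extend(float(v) for v in values)
        spans[name] = (start, len(x))
    return x, spans


# Toy per-modality vectors (hypothetical values and sizes).
x_i, spans = concat_features({
    "visual": [0.12, 0.80],
    "text": [0.33, -0.41, 0.05],
    "cross_modal": [0.70],
    "linguistic": [0.91, 0.15],
    "statistical": [3.0, 1.0],
})
print(len(x_i), spans["text"])  # 10 (2, 5)
```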

Further, simulating the influence of spurious correlation bias and decoupling from the multimodal features the invariant factors that are discriminative across scenarios and the variable factors that describe scenario-specific effects comprises the following steps:

constructing an invariant mask $m$ through invariant masking, which selects the dimensions of the multimodal feature $x_i$ that are generally discriminative, to obtain the invariant feature $ic_i = m \odot x_i$, where $\odot$ is the element-wise product;

taking the complement of $m$ to extract the scenario-related variable feature $vc_i = (1-m) \odot x_i$; to simulate the influence of spurious correlation bias, randomly assigning the training tweets to the scenario training subsets and modeling the variable features through iterative scenario division and sample reassignment until convergence, yielding a scenario model;

based on the scenario model, constructing an invariant risk minimization loss function and learning the invariant mask through invariant feature learning; the resulting invariant feature is taken as the invariant factor and the variable feature as the variable factor.
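The mask split in the steps above can be sketched as follows, assuming a binary mask $m$ stored as a list of 0/1 values (the source only states that $m$ is applied by element-wise product). The same function works unchanged for a soft mask with entries in $[0, 1]$, which a gradient-based learner would typically optimize; the mask values here are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_mask(
    x: Sequence[float], m: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (ic, vc) with ic = m * x and vc = (1 - m) * x, element-wise."""
    if len(x) != len(m):
        raise ValueError("mask and feature vector must have the same length")
    ic = [mk * xv for mk, xv in zip(m, x)]        # invariant part
    vc = [(1 - mk) * xv for mk, xv in zip(m, x)]  # scenario-related part
    return ic, vc


x_i = [0.5, -1.0, 2.0, 0.25]
m = [1, 0, 1, 0]  # hypothetical learned mask
ic_i, vc_i = split_by_mask(x_i, m)
print(ic_i, vc_i)
```

Because the two masks are complements, `ic_i` and `vc_i` always add back up to `x_i`, so no feature dimension is lost by the split.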

Further, randomly assigning the training tweets to the scenario training subsets and modeling the variable features through iterative scenario division and sample reassignment until convergence to obtain the scenario model comprises the following steps:

dividing the training tweets into multiple scenario training subsets; for each scenario $s$, constructing a scenario prediction network with parameters $\theta_s$, implemented as a multilayer perceptron; and evaluating, from $vc_i$, the likelihood of each tweet under each scenario;

constructing one such scenario prediction network per scenario, which characterizes how tweets are shaped by different periods, decoy types and authors; then building a new training subset for each scenario $s$ by sample reassignment: the variable feature $vc_i$ is fed into every scenario prediction network, and the sample is assigned to the scenario training subset with the highest likelihood, yielding the scenario model,

where $\theta_s$ denotes the model parameters.
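The iterative division-and-reassignment loop can be sketched as below. As an explicit simplification, each scenario prediction network is replaced by a unit-variance Gaussian whose mean is refit on its current subset, so the likelihood reduces to a negative squared distance; the method itself uses a multilayer perceptron per scenario and alternates this loop with invariant-mask learning, which is omitted here. The function names and the empty-subset handling are assumptions of this sketch.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def log_likelihood(v: Vector, mean: Vector) -> float:
    """Unit-variance Gaussian log-likelihood, up to an additive constant."""
    return -sum((a - b) ** 2 for a, b in zip(v, mean))


def best_scenario(v: Vector, means: List[List[float]]) -> int:
    """Index of the scenario model under which v is most likely."""
    return max(range(len(means)), key=lambda k: log_likelihood(v, means[k]))


def assign_scenarios(
    vcs: List[Vector], n_scenarios: int, max_iter: int = 100, seed: int = 0
) -> List[int]:
    rng = random.Random(seed)
    # Random initial division of the training tweets into scenario subsets.
    labels = [rng.randrange(n_scenarios) for _ in vcs]
    for _ in range(max_iter):
        means: List[List[float]] = []
        for k in range(n_scenarios):
            members = [v for v, s in zip(vcs, labels) if s == k]
            if not members:
                # Empty subset: re-seed its model from a random sample.
                members = [rng.choice(vcs)]
            # "Fit" scenario model k: here just the mean of its members.
            means.append([sum(col) / len(members) for col in zip(*members)])
        # Reassign every sample to the scenario with the highest likelihood.
        new_labels = [best_scenario(v, means) for v in vcs]
        if new_labels == labels:
            break  # converged: no sample changed scenario
        labels = new_labels
    return labels


vcs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
print(assign_scenarios(vcs, n_scenarios=2))
```

The alternation mirrors hard-assignment EM: fitting each scenario model and reassigning samples both move toward a partition in which every tweet sits in the scenario that best explains its variable feature.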

Further, the invariant risk minimization loss function is specifically:

where $\alpha$ and $\beta$ are trade-off parameters, and the remaining terms are the scenario-variation constraint, the classification loss on each scenario training subset, and the model parameters.
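The exact formula is not reproduced in this text, so the following is only a hedged sketch of one common way to combine per-scenario classification losses with a scenario-variation constraint: the mean binary cross-entropy across scenarios plus $\alpha$ times the variance of those losses (a V-REx-style penalty). Using the variance as the constraint is an assumption, and the role of $\beta$ cannot be recovered from the text, so it is left out.

```python
import math
from typing import List, Sequence


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction p = P(inconsistent)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def scenario_risks(
    preds: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]
) -> List[float]:
    """Mean classification loss on each scenario training subset."""
    return [
        sum(bce(p, y) for p, y in zip(ps, ys)) / len(ps)
        for ps, ys in zip(preds, labels)
    ]


def invariance_objective(risks: Sequence[float], alpha: float) -> float:
    """Average risk plus alpha times the variance of risks across scenarios."""
    mean = sum(risks) / len(risks)
    variance = sum((r - mean) ** 2 for r in risks) / len(risks)
    return mean + alpha * variance


risks = scenario_risks([[0.9, 0.2], [0.6, 0.4]], [[1, 0], [1, 0]])
print(invariance_objective(risks, alpha=1.0))
```

The variance term is zero only when the classifier does equally well in every scenario, which is what pushes the mask toward dimensions whose predictive value does not depend on the scenario.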

Further, based on a contrastive learning strategy, decoupling the variable factors into causal factors, which reflect the decoy routines and writing styles of title-content-inconsistent tweets in different scenarios, and irrelevant factors, which contain spurious correlation bias, comprises the following steps:

a multilayer perceptron $\xi(\cdot)$ maps the $vc_i$ obtained through invariant feature learning into a latent embedding space to obtain an embedding code;

the causal mask is $\gamma = \mathrm{GumbelSoftmax}(\xi(vc_i), k_d)$;

$\gamma$ sets the $k_d$ dimensions that carry causal information to 1 and the remaining dimensions to 0, so the scenario-specific causal feature is $sc_i = \gamma \odot vc_i$;

the causal features are decoupled so that causal information is effectively separated from other irrelevant information, and a contrastive constraint is imposed on the decoupling process by means of causal intervention:

where the constraint uses a binary classification model of the causal factors with its own model parameters. After decoupling, the obtained causal features are taken as the causal factors, and the remaining irrelevant features $nf_i = (1-\gamma) \odot vc_i$ are discarded as irrelevant factors.
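A forward-pass sketch of the causal mask in the steps above: perturbing the scores $\xi(vc_i)$ with Gumbel noise and keeping the top $k_d$ entries yields a hard mask with exactly $k_d$ ones, matching the description of $\gamma$, after which $vc_i$ splits into $sc_i$ and $nf_i$. This is the Gumbel top-k trick, named here as a stand-in; the straight-through relaxation needed for gradients during training is not shown, and the scores are toy values rather than outputs of a trained $\xi$.

```python
import math
import random
from typing import List, Sequence, Tuple


def gumbel_topk_mask(scores: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Hard 0/1 mask with exactly k ones via Gumbel-perturbed top-k."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must lie in [0, len(scores)]")
    perturbed = []
    for s in scores:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        perturbed.append(s - math.log(-math.log(u)))     # add Gumbel(0, 1) noise
    top = sorted(range(len(scores)), key=lambda j: perturbed[j], reverse=True)[:k]
    mask = [0] * len(scores)
    for j in top:
        mask[j] = 1
    return mask


def split_causal(vc: Sequence[float], gamma: Sequence[int]) -> Tuple[List[float], List[float]]:
    """sc = gamma * vc (causal part), nf = (1 - gamma) * vc (discarded part)."""
    sc = [g * v for g, v in zip(gamma, vc)]
    nf = [(1 - g) * v for g, v in zip(gamma, vc)]
    return sc, nf


rng = random.Random(0)
scores = [2.5, -1.0, 1.8, 0.1, -0.7]  # toy stand-in for xi(vc_i)
vc_i = [0.4, 1.2, -0.3, 0.9, 0.05]
gamma = gumbel_topk_mask(scores, k=2, rng=rng)
sc_i, nf_i = split_causal(vc_i, gamma)
print(gamma, sc_i, nf_i)
```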

Further, obtaining additional training tweets and performing data augmentation comprises the following steps:

collecting unlabeled tweets from public platforms and designing heuristic rules that generate pseudo-labels from social metadata: tweets that are forwarded more than a set number of times per hour, whose viewing time is below a set number of seconds, and whose user likes fall below a set threshold are labeled as title-content-inconsistent samples, and the rest as normal samples, thereby realizing data augmentation.
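The heuristic labeling rule can be written as a small predicate. The source leaves the thresholds unspecified, so the threshold values below are hypothetical placeholders, and requiring all three conditions at once is one reading of the rule rather than a confirmed detail.

```python
from dataclasses import dataclass


@dataclass
class TweetStats:
    forwards_per_hour: float
    avg_view_seconds: float
    likes: int


def pseudo_label(
    stats: TweetStats,
    forward_threshold: float,
    view_seconds_threshold: float,
    like_threshold: int,
) -> int:
    """1 = title-content-inconsistent pseudo-label, 0 = normal."""
    is_decoy = (
        stats.forwards_per_hour > forward_threshold          # spreads fast
        and stats.avg_view_seconds < view_seconds_threshold  # but is read briefly
        and stats.likes < like_threshold                     # and gets few likes
    )
    return 1 if is_decoy else 0


# Hypothetical thresholds; the source does not give concrete values.
rule = dict(forward_threshold=100, view_seconds_threshold=10, like_threshold=20)
print(pseudo_label(TweetStats(450, 4.0, 3), **rule))    # 1
print(pseudo_label(TweetStats(450, 95.0, 80), **rule))  # 0
```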

Further, training the classifier on the augmented training tweets is specifically:

where the terms denote the classifier and its trainable model parameters; during training, scenarios are set at random and $T$ rounds of training are performed.
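To make the training step concrete, here is a minimal sketch assuming that fusion is plain concatenation of the invariant and causal factors and that the classifier is a logistic regression trained by SGD, with each of the $T$ rounds drawing one scenario subset at random. Both the fusion operator and the model family are assumptions of this sketch, not the design specified by the method.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (fused feature, label)


def fuse(ic: Sequence[float], sc: Sequence[float]) -> List[float]:
    """Fusion by concatenation (an assumption of this sketch)."""
    return list(ic) + list(sc)


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xv for wi, xv in zip(w, x)) + b


def train(
    scenarios: Sequence[Sequence[Sample]], rounds: int, lr: float = 0.5, seed: int = 0
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    w = [0.0] * len(scenarios[0][0][0])
    b = 0.0
    for _ in range(rounds):                   # T training rounds
        subset = list(rng.choice(scenarios))  # random scenario this round
        rng.shuffle(subset)
        for x, y in subset:
            grad = sigmoid(logit(w, b, x)) - y  # d(BCE)/d(logit)
            w = [wi - lr * grad * xv for wi, xv in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sigmoid(logit(w, b, x)) >= 0.5 else 0


scenario_a = [(fuse([1.0, 0.2], [1.0, 0.0]), 1), (fuse([-1.0, 0.1], [-1.0, 0.0]), 0)]
scenario_b = [(fuse([2.0, -0.3], [1.5, 0.5]), 1), (fuse([-2.0, 0.4], [-1.5, -0.5]), 0)]
w, b = train([scenario_a, scenario_b], rounds=50)
print(predict(w, b, fuse([1.5, 0.0], [1.2, 0.0])))
```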

The causal inference-based system for judging multimodal title-content inconsistency in official-account tweets comprises a multimodal feature extraction module, an invariant factor extraction module, a scenario-specific causal factor decoupling module, and a prediction and data augmentation module;

the multimodal feature extraction module acquires training tweets and extracts multimodal features;

the invariant factor extraction module divides the training tweets into different scenarios to simulate the influence of spurious correlation bias, and decouples from the multimodal features invariant factors that remain discriminative across scenarios and variable factors that describe scenario-specific effects;

the scenario-specific causal factor decoupling module, based on a contrastive learning strategy, decouples the variable factors into causal factors, which reflect the decoy routines and writing styles of title-content-inconsistent tweets in different scenarios, and irrelevant factors, which contain spurious correlation bias;

the prediction and data augmentation module fuses the invariant factors and the causal factors, constructs a classifier from the fused factors, obtains additional training tweets for data augmentation, trains the classifier on the augmented training tweets, and uses the trained classifier to identify the title and body of a tweet and judge whether they are inconsistent.

The beneficial effects of the invention are as follows:

the disclosed causal inference-based method for judging multimodal title-content inconsistency in official-account tweets extracts multimodal features, which provide a rich basis for the judgment. It decouples from the multimodal features invariant factors that reflect the author's intent and remain discriminative across scenarios, and, based on a contrastive learning strategy, decouples the variable factors into causal factors, which reflect the decoy routines and writing styles of title-content-inconsistent tweets in different scenarios, and irrelevant factors, which contain spurious correlation bias. It thus characterizes the causal information behind decoy behavior at a fine granularity and models spurious correlation bias explicitly, so that the causal information hidden in tweet features can be uncovered accurately and the model generalizes better. The method further constructs a classifier from the fused factors and has a low demand for scarce labeled data.

Drawings

FIG. 1 is a schematic flow chart of the causal inference-based method for judging multimodal title-content inconsistency in official-account tweets.

FIG. 2 is a schematic diagram of the causal structure of the title-content inconsistency discrimination task addressed by the causal inference-based method.

FIG. 3 is a schematic diagram of the causal inference-based system for judging multimodal title-content inconsistency in official-account tweets.

Detailed Description

The invention is described in detail below with reference to the drawings and the detailed description.

Example 1

As shown in FIG. 1, the causal inference-based method for judging multimodal title-content inconsistency in official-account tweets comprises the following steps:

acquiring training tweets and extracting multimodal features;

fusing the invariant factors and the causal factors;

Unlike conventional approaches, the method analyzes the intrinsic factors of decoy behavior from the new perspective of causal representation learning, thereby eliminating spurious correlation bias and ensuring the robustness and generalization of the model. To describe a tweet fully, the method first extracts a variety of features, such as text-modality features, visual-modality features, language features, cross-modal features and author-profile features. These features are entangled with one another, and redundant information and unobservable spurious correlation bias introduce noise into machine discrimination. The method therefore decouples from the multimodal features invariant factors that reflect the author's intent and remain discriminative across scenarios, and, based on a contrastive learning strategy, decouples the variable factors into causal factors, which reflect the decoy routines and writing styles of title-content-inconsistent tweets in different scenarios, and irrelevant factors, which contain spurious correlation bias. With these three latent factors decoupled, fusing the invariant factors with the causal factors that carry causal information yields a classifier that is both robust and generalizable and copes well with spurious correlation bias and with new variants of tweets. To improve training efficiency, the method uses data augmentation to expand the training samples, reducing the need for scarce labeled data.
In this embodiment, the method is applied to the WeChat official-account platform, where it can automatically identify title-content-inconsistent tweets and block their spread in time, which is valuable for improving user experience, strengthening platform credibility and purifying the online environment.

Example 2

In this embodiment, given a tweet $x_i$, the invention aims to build a classification model that judges, from the tweet's modal information such as the title text, cover image and body, whether its title and content are inconsistent ($y_i = 1$ or $0$). For a title-content-inconsistent tweet, each modal part may contain objectionable information such as exaggeration, vulgarity, shock content or gore, and the parts may be inconsistent with one another, for example an eye-catching title and thumbnail that the body does not match. Moreover, malicious authors keep writing new variants of such tweets to follow trends and evade detection. The goal of the method is therefore that the parameters obtained by minimizing the cross-entropy classification loss on the training set enable the model to perform well on a test set built from samples not seen during training.
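Written out with assumed notation, since the original symbols were lost in extraction ($f(\cdot;\theta)$ for the classifier's predicted probability that a tweet is title-content-inconsistent, $\theta$ for its parameters and $N$ for the number of training tweets), the cross-entropy objective reads:

```latex
\mathcal{L}_{\mathrm{CE}}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\log f(x_i;\theta) + (1-y_i)\log\big(1-f(x_i;\theta)\big)\Big],
\qquad \theta^{*} = \arg\min_{\theta}\ \mathcal{L}_{\mathrm{CE}}(\theta).
```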

Existing methods make the judgment by directly capturing the co-occurrence between the tweet feature $x_i$ and the title-content inconsistency label. This feature comes from an encoder suited to generic content-understanding tasks and does not characterize decoy behavior in detail. Under different periods, decoy types and authors, i.e. different scenarios $s_i$, decoy behavior exhibits specific writing styles and decoy-routine characteristics $c_i$. Existing methods do not distinguish the scenarios of decoy behavior, so $x_i$ is mixed with redundant information and spurious correlation bias and the model lacks robustness. To solve this problem, the method decomposes the multimodal feature into three latent factors: the invariant feature $ic_i$, the scenario-specific causal feature $sc_i$ and the irrelevant feature $nf_i$, where $sc_i$ and $nf_i$ capture, respectively, the causal information unique to the tweet's scenario and the bias noise. As shown in FIG. 2, the method uses causal inference to block the back-door paths associated with $ic_i$, $sc_i$ and $nf_i$, thereby eliminating spurious correlation bias. In the figure, double-headed arrows denote statistical correlations; single-headed arrows denote causal relationships; purple arrows are the key causal relationships affecting the discrimination result; orange arrows denote scenario effects; and scissors denote removed relationships.

In a specific embodiment, the multi-modal features include visual features, text features, cross-modal matching features, linguistic features, and statistical features.

In a specific embodiment, the method for extracting the multi-modal feature of the text comprises the following specific steps:

A tweet mainly contains two kinds of visual information, the cover image and the body illustrations. Taking a pre-trained Swin Transformer model as the backbone, the tweet's visual features are extracted with a shifted-window self-attention mechanism; Swin Transformer builds on the Transformer architecture and extracts hierarchical image information through shifted-window self-attention.

In this embodiment, because in practice a substantial share of tweets involve public figures and their images mostly contain people, face recognition and object detection are additionally performed with a DNN and a RetinaNet network, respectively.

Word segmentation is performed on the tweet's title and body, an embedding is obtained for each word, and the embeddings are fed into the pre-trained BERT model to obtain a set of 768-dimensional word features.

In this embodiment, since the cover image may also contain decoy text, OCR is additionally used to extract and encode such text.

In this embodiment, title-content-inconsistent tweets often have internally inconsistent content; for example, the cover image shows an object that the body never mentions. Whether the parts of a tweet are consistent is therefore one of the important bases for detecting title-content inconsistency. Because the parts belong to different modalities, their features are separated by a heterogeneity gap and cannot be modeled jointly in a direct way. A pre-trained CT Transformer model is therefore used to perform cross-modal matching between the extracted visual and text features, yielding cross-modal matching features;

In this embodiment, the cross-modal matching with the pre-trained CT Transformer model is specifically: the extracted visual features $V$ and text features $T$ are encoded, and a multi-head attention mechanism then outputs the text-aware visual features $F^{vt} = \mathrm{CT}(TW^t, VW^v)$ and the vision-aware text features $F^{tv} = \mathrm{CT}(VW^v, TW^t)$, where $W^t$ and $W^v$ are weight matrices. Because the cross-modal matching features rely on multi-head attention, the model can characterize decoy behavior from multiple perspectives.
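The sketch below shows single-head scaled dot-product cross-attention in plain Python, as a simplified stand-in for the multi-head CT Transformer. Two simplifications are assumptions: the projected context rows serve directly as both keys and values, and the query comes from the first argument of `cross_attend`; which argument of $\mathrm{CT}$ supplies the queries is not stated in the text. The weight matrices are toy identity matrices.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention of queries over context.

    Context rows are used as both keys and values (a simplification).
    """
    d = len(queries[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in context]
              for qr in queries]
    weights = [softmax(r) for r in scores]
    return matmul(weights, context)


T = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy token features
V = [[0.9, 0.1], [0.2, 0.8]]              # 2 toy image-region features
W_t = [[1.0, 0.0], [0.0, 1.0]]            # toy projection weights
W_v = [[1.0, 0.0], [0.0, 1.0]]
TW, VW = matmul(T, W_t), matmul(V, W_v)
F_vt = cross_attend(VW, TW)  # visual rows attending over text
F_tv = cross_attend(TW, VW)  # text rows attending over visual
print(len(F_vt), len(F_tv))  # 2 3
```

A multi-head version would run several such attentions with separate projections and concatenate the outputs, which is what lets each head focus on a different kind of cross-modal evidence.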

The language features build on the cross-modal matching features and comprise a body-thumbnail consistency feature, a body-title consistency feature, a thumbnail-title consistency feature and a title sentiment-polarity feature;

Using cover images with strong visual impact is one of the tactics malicious authors commonly use when writing title-content-inconsistent tweets, yet these cover images are often unrelated to the actual content of the body. The body-thumbnail consistency feature captures this cross-modal discrepancy by extracting the fused features of the body $b$ and the thumbnail $t$ with the pre-trained CLIP model, $[b^c, t^c] = \mathrm{CLIP}(b, t)$, where $b^c$ and $t^c$ are the $d_c$-dimensional text and thumbnail features output by CLIP;

Malicious authors often craft titles that open an information gap with what the reader knows, luring the reader to click the tweet link to satisfy their curiosity. Typically, however, the body of such tweets does not live up to the title. To measure whether the body and title differ in content, the body and title $h$ are fed into a Siamese network to obtain the body-title consistency feature $sim_{bh} = \mathrm{Siamese}(b, h)$;

Inconsistency between the title and the cover image can also prompt readers to click the tweet link. To judge whether the thumbnail and the title conflict in content, a caption generator is pre-trained on the MS COCO dataset; the tweet's thumbnail is fed into it to generate a new title, and the cosine similarity between the BERT features of the new title and the original title gives the thumbnail-title consistency feature;

Decoy titles typically rely on extreme emotional coloring to make readers resonate and thereby arouse their curiosity. The title sentiment-polarity feature is obtained by feeding the tweet's title into a sentiment classifier; it is a 2-dimensional feature representing the sentiment polarity (positive/neutral/negative) and its intensity, with each dimension normalized to the range $[0, 1]$.
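The thumbnail-title consistency score described above is a cosine similarity between two feature vectors. In the sketch below, short toy vectors stand in for the BERT features of the generated and original titles; how those features are pooled from BERT is not specified in the source, and returning 0.0 for an all-zero vector is a convention of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for an all-zero feature vector
    return dot / (norm_u * norm_v)


generated_title_feat = [0.2, 0.7, 0.1]  # toy stand-in for BERT(generated caption)
original_title_feat = [0.1, 0.9, 0.3]   # toy stand-in for BERT(original title)
print(round(cosine_similarity(generated_title_feat, original_title_feat), 3))
```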

The statistical features comprise lexical statistical features, common word statistical features and author portrayal statistical features;

the lexical statistics are obtained by recording the punctuation marks such as "!" and "?", emoticons, pronouns, affirmative words and vague words that appear in the tweet's title;

The common-word statistics are obtained by recording the internet slang, sensitive words, names of public figures and place names that appear in the title; a common-word statistical feature is also constructed to record how many times these common words occur.

An author's profile can reflect the quality of their tweets to some extent. If an author has frequently published title-content-inconsistent tweets in the past, they can be regarded as a malicious author, and tweets they publish later are also highly likely to be decoys. To characterize whether each author $u_j$ is malicious, the method records the account age, nickname, follower count, number of published tweets, number of low-quality tweets published in the past, time elapsed since the first tweet and time elapsed since the latest tweet;
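The statistical features above reduce to counting and date arithmetic, sketched below. The word lists, punctuation subset and field names are hypothetical; tokenization uses a simple regular expression that only suits space-delimited text (Chinese titles would need a word segmenter in its place), emoticon counting is omitted, and the nickname is left out because it would need its own text encoding. Durations are expressed in days as an illustrative choice.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Mapping

PUNCTUATION = ("!", "?")  # hypothetical subset of tracked marks


def title_statistics(title: str, word_lexicons: Mapping[str, Iterable[str]]) -> Dict[str, int]:
    """Lexical / common-word statistics: punctuation and lexicon hit counts."""
    tokens = Counter(re.findall(r"[\w']+", title.lower()))
    stats = {"punctuation": sum(title.count(p) for p in PUNCTUATION)}
    for name, words in word_lexicons.items():
        stats[name] = sum(tokens[w.lower()] for w in words)
    return stats


@dataclass
class AuthorHistory:
    account_created: datetime
    follower_count: int
    post_times: List[datetime]
    low_quality_posts: int


def days_between(earlier: datetime, later: datetime) -> float:
    return (later - earlier).total_seconds() / 86400.0


def author_profile(h: AuthorHistory, now: datetime) -> List[float]:
    """Numeric author-profile statistics (nickname omitted)."""
    first = min(h.post_times, default=now)
    last = max(h.post_times, default=now)
    return [
        days_between(h.account_created, now),  # account age
        float(h.follower_count),
        float(len(h.post_times)),              # tweets published
        float(h.low_quality_posts),            # past low-quality tweets
        days_between(first, now),              # time since first tweet
        days_between(last, now),               # time since latest tweet
    ]


lexicons = {  # hypothetical word lists
    "pronouns": ["you", "this", "it"],
    "affirmative": ["true", "definitely"],
    "sensitive": ["scandal"],
}
print(title_statistics("You won't believe this!! Is it true?", lexicons))
# {'punctuation': 3, 'pronouns': 3, 'affirmative': 1, 'sensitive': 0}

history = AuthorHistory(
    account_created=datetime(2023, 1, 1),
    follower_count=1200,
    post_times=[datetime(2023, 6, 1), datetime(2023, 12, 10)],
    low_quality_posts=1,
)
print(author_profile(history, now=datetime(2023, 12, 20)))
# [353.0, 1200.0, 2.0, 1.0, 202.0, 10.0]
```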

In this embodiment, the multimodal feature $x_i$ is high-dimensional and contains unobservable spurious correlation bias that can interfere with the model's judgment and reduce robustness and generalization. This bias arises because the model does not account, during training, for the influence of different periods, decoy types and author characteristics. The method therefore divides the tweets into different scenario subsets during training and, by searching for the causal information shared across scenarios, captures the discriminative invariant feature $ic_i$ in $x_i$, giving the model stable generalization ability, i.e. comparable discrimination accuracy on samples seen and unseen during training. After the training set is divided into multiple scenario training subsets, the invariant feature learning stage decouples from the multimodal features the invariant feature $ic_i$, which discriminates well in every scenario training subset, and the variable feature $vc_i$, which reflects scenario characteristics. The scenario modeling stage then uses $vc_i$ to predict a more suitable scenario training subset for each tweet, which in turn benefits the invariant feature learning stage. By alternately optimizing these decoupling and prediction processes, the invariant feature $ic_i$ acquires stable generalization ability.

In a specific embodiment, the influence of false association bias is simulated, and invariant factors that are discriminative across different scenarios, together with variable factors that describe scenario-specific effects, are decoupled from the multimodal features. The specific steps are as follows:

an invariant mask $m$ is constructed to select the dimensions of the multimodal feature $x_i$ that are generally discriminative, yielding the invariant feature $ic_i = m \odot x_i$, where $\odot$ is the element-wise product operator; $ic_i$ supports consistent predictions in every scenario;

a scenario generalizes the influence of different time periods, decoy types, and authors, and push texts created in a specific scenario share characteristics such as a particular writing style, topic, and decoy method; the quality of scenario modeling therefore plays a decisive role in optimizing the invariant mask $m$. Taking the complement of $m$, the scenario-related variable feature $vc_i = (1 - m) \odot x_i$ is extracted. To simulate the influence of false association bias, training push texts are randomly assigned to the scenario training subsets $\{\mathcal{D}_s\}$, and the variable features are modeled by iterative scenario division and sample reassignment until convergence, yielding the scenario model.
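A minimal NumPy sketch of the mask-based decoupling and the initial random scenario assignment. It assumes the invariant mask $m$ is a real-valued vector in $[0, 1]$ shared by all samples (for example, a sigmoid of learnable logits); the text does not specify how $m$ is parameterized, and the function names are hypothetical.

```python
import numpy as np


def decouple_invariant(x: np.ndarray, m: np.ndarray):
    """Split features x of shape (n, d) with a mask m of shape (d,) into ic and vc."""
    ic = m * x            # ic_i = m ⊙ x_i, broadcast over the n rows
    vc = (1.0 - m) * x    # vc_i = (1 - m) ⊙ x_i, the complementary part
    return ic, vc


def random_scenarios(n: int, num_scenarios: int, seed: int = 0) -> np.ndarray:
    """Initial random assignment of n training push texts to scenario subsets."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, num_scenarios, size=n)  # values in [0, num_scenarios)
```

Because the two masks are complementary, `ic + vc` always reconstructs `x`; with a hard 0/1 mask each dimension lands in exactly one of the two parts.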

In one embodiment, training push texts are randomly assigned to the scenario training subsets $\{\mathcal{D}_s\}$, and the variable features are modeled by iterative scenario division and sample reassignment until convergence to obtain the scenario model. The specific steps are as follows:

where $\theta_s$ denotes the model parameters.
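The iterative scenario division can be sketched as a hard-EM loop: fit one model per scenario on its current samples, score every $vc_i$ under each scenario, reassign each sample to its most likely scenario, and stop when the assignment no longer changes. In this sketch each scenario model is a single centroid scored by negative squared distance, a simplified stand-in for the per-scenario multilayer perceptron with parameters $\theta_s$; the softmax likelihood and the function names are assumptions for illustration.

```python
import numpy as np


def scenario_likelihood(vc, centroids):
    """Soft membership p(s | vc_i) as a softmax over negative squared distances.

    Stand-in (assumption) for the per-scenario MLP scores described in the text.
    """
    d2 = ((vc[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)  # (n, S)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def fit_scenarios(vc, num_scenarios, max_iter=100, seed=0):
    """Iterate scenario fitting and sample reassignment until assignments stop changing."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, num_scenarios, size=len(vc))  # random initial division
    for _ in range(max_iter):
        # Refit each scenario model on the samples currently assigned to it;
        # an empty scenario is re-seeded with a random sample.
        centroids = np.stack([
            vc[assign == s].mean(axis=0) if np.any(assign == s)
            else vc[rng.integers(len(vc))]
            for s in range(num_scenarios)
        ])
        probs = scenario_likelihood(vc, centroids)
        new_assign = probs.argmax(axis=1)  # redistribute samples to likeliest scenario
        if np.array_equal(new_assign, assign):
            break  # converged
        assign = new_assign
    return assign, probs
```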

In one embodiment, the invariant risk minimization loss function is specifically:

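where $\alpha$ and $\beta$ are trade-off parameters, and the remaining terms denote the scenario-change constraint, the classification loss on each scenario training subset $\mathcal{D}_s$, and the model parameters of the binary classifier built on the invariant mask.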
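Because the exact form of the loss is not reproduced here, the following sketch assumes the common IRM recipe: the sum of per-scenario classification losses, plus $\alpha$ times an IRMv1 gradient penalty playing the role of the scenario-change constraint, plus $\beta$ times an L1 sparsity term on the invariant mask $m$. The penalty form, the L1 term, and the function names are assumptions, not the patent's formula; `logits` stands for the invariant-mask classifier's outputs on $ic_i$.

```python
import numpy as np


def bce_with_logits(z, y):
    # Numerically stable mean binary cross-entropy on logits z, labels y in {0, 1}.
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def irm_objective(logits, labels, scenarios, mask, alpha, beta):
    """Per-scenario risks + alpha * IRMv1 penalty + beta * ||m||_1 (assumed form)."""
    risk, penalty = 0.0, 0.0
    for s in np.unique(scenarios):
        z, y = logits[scenarios == s], labels[scenarios == s]
        risk += bce_with_logits(z, y)
        # IRMv1: derivative of the scenario risk w.r.t. a dummy scale w on the
        # logits, taken at w = 1. For BCE this equals mean((sigmoid(z) - y) * z).
        grad_w = np.mean((sigmoid(z) - y) * z)
        penalty += grad_w ** 2
    return risk + alpha * penalty + beta * np.abs(mask).sum()
```

The penalty is zero only when the shared classifier is simultaneously optimal (with respect to rescaling) in every scenario, which is the property the invariant mask is meant to enforce.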

In this embodiment, the extracted invariant feature $ic_i$ has unbiased discriminative ability applicable to every scenario, whereas the variable feature mixes causal information reflecting scenario characteristics, redundant multimodal information, and unobservable false association bias. This causal information can further enrich the basis for discrimination and give the model the ability to generalize across scenarios.

In one embodiment, based on a contrastive learning strategy, the variable factors are decoupled into causal factors, which reflect the decoy methods and writing styles of title-text-inconsistent push texts under different scenarios, and irrelevant factors, which contain false association bias. The specific steps are as follows:

when designing the structure of $\zeta(\cdot)$, the present patent injects no scenario information; therefore, when the proposed model is deployed in a new scenario, $\zeta(\cdot)$ need not be reconstructed and its parameters need not be relearned;

the causal mask is $\gamma = \text{Gumbel-Softmax}(\zeta(vc_i), kd)$;

where $\zeta(\cdot)$ is a binary classification model for causal factors with its own model parameters; the resulting causal feature $sc_i = \gamma \odot vc_i$ is taken as the decoupled causal factor, and the remaining irrelevant feature $nf_i = (1 - \gamma) \odot vc_i$ is discarded as an irrelevant factor. For a title-text-inconsistent push text ($y_i = 1$), the irrelevant feature $nf_i$ masks the causal information that reflects the inconsistency, so replacing $sc_i$ with $nf_i$ for prediction changes the result, i.e., the push text is identified as normal. For a normal push text ($y_i = 0$), $sc_i$ captures no decoy information, so predicting with $nf_i$ instead does not change the result.
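The causal mask can be sketched with a standard Gumbel-Softmax relaxation. This NumPy version assumes $\zeta(\cdot)$ outputs, for each dimension of $vc_i$, two logits for "drop" and "keep", and it uses a temperature `tau` in the role of the second Gumbel-Softmax argument; both are assumptions, and $\zeta(\cdot)$ itself is not implemented. Under the contrastive constraint described above, a classifier fed $nf_i$ in place of $sc_i$ should output "normal" for every sample.

```python
import numpy as np


def gumbel_softmax_mask(logits, tau, rng):
    """Relaxed per-dimension keep/drop mask.

    logits: shape (n, d, 2), assumed scores from zeta(vc_i) for [drop, keep].
    Returns gamma with shape (n, d): the relaxed probability of keeping each dimension.
    """
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)         # numerical stability
    soft = np.exp(y) / np.exp(y).sum(axis=-1, keepdims=True)
    return soft[..., 1]


def split_causal(vc, gamma):
    sc = gamma * vc            # causal factor sc_i = γ ⊙ vc_i
    nf = (1.0 - gamma) * vc    # irrelevant factor nf_i = (1 - γ) ⊙ vc_i
    return sc, nf
```

Lower temperatures push `gamma` toward hard 0/1 values while keeping the mask differentiable with respect to the logits in an autodiff framework.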

In one embodiment, the classifier is trained on the augmented training push texts, specifically:
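As an illustration only, the sketch below assumes the invariant factor $ic_i$ and the causal factor $sc_i$ are fused by concatenation and classified with logistic regression trained by batch gradient descent. The text states only that the factors are fused and a classifier is trained, so the fusion operator, the model, the hyperparameters, and the function names are assumptions.

```python
import numpy as np


def fuse(ic, sc):
    # Fusion by concatenation (assumption); the text only says the factors are fused.
    return np.concatenate([ic, sc], axis=1)


def train_logistic(features, labels, lr=0.1, epochs=500):
    """Plain batch gradient descent on mean binary cross-entropy."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels                      # dBCE/dlogit for each sample
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    return w, b


def predict(features, w, b):
    return (features @ w + b > 0).astype(int)  # 1 = title-text inconsistent
```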

In this embodiment, training a deep learning model requires a large number of labeled samples, which are scarce, and existing datasets are clearly limited in scale and coverage, making practical deployment difficult and costly. To address this, in one embodiment, additional training push texts are obtained and augmented as follows:

a large number of unlabeled push texts are collected from the WeChat official account platform, and heuristic rules are designed to generate pseudo labels from social metadata: push texts that are forwarded more than 100,000 times per hour, have a viewing time of less than 10 seconds, and have zero user likes are labeled as title-text-inconsistent samples, and all remaining push texts are labeled as normal samples, thereby realizing data enhancement.
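The pseudo-labeling rule can be written as a small function. This sketch reads the three conditions as jointly required (logical AND), which is one interpretation of the rule; the function and argument names are hypothetical.

```python
def pseudo_label(shares_per_hour: float, view_seconds: float, likes: int) -> int:
    """Return 1 (title-text inconsistent) or 0 (normal) from social metadata.

    Hypothetical helper; assumes all three heuristic conditions must hold.
    """
    return int(shares_per_hour > 100_000 and view_seconds < 10 and likes == 0)


# Example: label a batch of (shares_per_hour, view_seconds, likes) records.
records = [(150_000, 5.0, 0), (150_000, 5.0, 3), (20_000, 4.0, 0)]
labels = [pseudo_label(*r) for r in records]
```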

In this embodiment, to measure the effectiveness and superiority of the proposed method, a performance comparison experiment is conducted against five mainstream methods (SVM-TS, BiLSTM, dEFEND, HPFN, and VLP), with accuracy, precision, recall, and F1 score as the evaluation metrics. First, 70,794 push texts are crawled from the WeChat official account platform and pseudo-labeled with the data enhancement technique described in unit 104, yielding 32,418 title-text-inconsistent push texts and 38,376 normal push texts. These pseudo-labeled samples serve as the training set for optimizing both the proposed model and the comparison models. To construct a test set, an additional 8,327 official account push texts are crawled and manually labeled, yielding 3,794 title-text-inconsistent push texts and 4,533 normal push texts. Experimental results show that the proposed method significantly outperforms the mainstream methods, and that the pseudo-labeled data generated by the designed data enhancement method benefits the training process and effectively improves model performance.
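A minimal sketch of the four evaluation metrics named above. It assumes the title-text-inconsistent class (label 1) is treated as positive; the text does not state which class precision, recall, and F1 are computed on, and the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (positive class assumed = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```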

In addition, in this embodiment, to verify the reliability of the proposed method, the distributions of the multimodal features, invariant features, scenario-specific causal features, and irrelevant features of a subset of test samples are analyzed. Visualizing the sample features after t-SNE dimensionality reduction yields the following conclusions: (a) the multimodal features perform poorly at judging title-text inconsistency; (b) the invariant features separate title-text-inconsistent samples from normal samples well, but cannot identify the scenario a push text belongs to; (c) the scenario-specific causal features can distinguish both the scenario a sample belongs to and title-text inconsistency; (d) the irrelevant features can distinguish the scenario a sample belongs to, but contain no causal information for identifying title-text inconsistency. Overall, the method accurately induces the scenario characteristics of push texts and outputs reliable discrimination results by combining the invariant features with the scenario-specific causal features.

In summary, compared with the prior art, the method provided by the invention has the following advantages:

1. Multiple types of features are extracted, including text-modal, visual-modal, linguistic, cross-modal, and author portrait features, providing a rich basis for judging title-text inconsistency;

2. The multimodal features are decoupled into multiple latent factors, including causal information that characterizes decoy behavior at a fine granularity. Scenarios are used to generalize the influence of different time periods, decoy types, and authors, and are modeled by alternately optimizing the decoupling and scenario prediction processes. Analyzing scenario-specific causal effects gives the model stronger generalization;

3. Within specific scenarios, a contrastive loss is introduced to constrain the causal inference process, accurately removing irrelevant factors mixed with redundant information and false association bias and thereby improving the robustness of the model; training samples are expanded with data enhancement, reducing the model's dependence on scarce labeled resources during training;

4. The proposed method can be successfully deployed in demanding real-world business scenarios characterized by false association bias, multimodal content, constantly changing variant push texts, and scarce labeled resources. It can significantly improve the WeChat official account user experience, enhance the platform's credibility, and purify the online environment.

Example 3

As shown in Fig. 3, the causal inference-based public number push text multimodal title-text inconsistency judging system comprises a multimodal feature extraction module, an invariant factor extraction module, a scenario-specific causal factor decoupling module, and a prediction and data enhancement module;

It should be understood that the above examples of the present invention are provided merely for clear illustration and do not limit the embodiments of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within the protection scope of the following claims.

Claims

1. A causal inference-based method for judging multimodal title-text inconsistency of public number push texts, characterized by comprising the following specific steps:

acquiring training push texts and extracting multimodal features;

fusing the invariant factors and the causal factors;

2. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 1, wherein the multimodal features include visual features, text features, cross-modal matching features, linguistic features, and statistical features.

3. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 2, wherein the multimodal features of the push text are extracted by the following specific steps:

the linguistic and cross-modal matching features include body-thumbnail consistency features, body-title consistency features, thumbnail-title consistency features, and title sentiment polarity features; the body-thumbnail consistency feature is obtained by extracting fused features of the push text's body text and thumbnail with the pre-trained model CLIP; the body-title consistency feature is obtained by inputting the body text and the title into a Siamese network; the thumbnail-title consistency feature is obtained by feeding the push text's thumbnail into a pre-trained text generator to generate a new title and computing the cosine similarity between the BERT features of the new title and the original title;

the title sentiment polarity feature is obtained by inputting the title of the push text into a sentiment classifier;

4. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 3, wherein the influence of false association bias is simulated, and invariant factors that are discriminative across different scenarios and variable factors that describe scenario-specific effects are decoupled from the multimodal features by the following specific steps:

an invariant mask $m$ is constructed to select the generally discriminative dimensions of the multimodal feature $x_i$, yielding the invariant feature $ic_i = m \odot x_i$, where $\odot$ is the element-wise product operator;

5. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 4, wherein training push texts are randomly assigned to the scenario training subsets $\{\mathcal{D}_s\}$ and the variable features are modeled by iterative scenario division and sample reassignment until convergence to obtain the scenario model, by the following specific steps:

the training set $\mathcal{D}$ is divided into multiple scenario training subsets $\{\mathcal{D}_s\}$; for each scenario $\mathcal{D}_s$, a network with parameters $\theta_s$, composed of a multilayer perceptron, is built; according to $vc_i$, the likelihood of the push text belonging to scenario $\mathcal{D}_s$ is evaluated:

where $\theta_s$ denotes the model parameters.

6. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 5, wherein the invariant risk minimization loss function is specifically:

where $\alpha$ and $\beta$ are trade-off parameters, and the remaining terms denote the binary classification model based on the invariant mask, the scenario-change constraint, the classification loss on each scenario training subset $\mathcal{D}_s$, and the model parameters of that binary classification model.

7. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 4, wherein, based on a contrastive learning strategy, the variable factors are decoupled into causal factors reflecting the decoy methods and writing styles of title-text-inconsistent push texts under different scenarios and irrelevant factors containing false association bias, by the following specific steps:

the causal mask is $\gamma = \text{Gumbel-Softmax}(\zeta(vc_i), kd)$;

8. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 7, wherein the specific steps are as follows:

9. The causal inference-based public number push text multimodal title-text inconsistency judging method of claim 8, wherein the classifier is trained on the augmented training push texts, specifically:

10. A causal inference-based system for judging multimodal title-text inconsistency of public number push texts, characterized in that the system comprises a multimodal feature extraction module, an invariant factor extraction module, a scenario-specific causal factor decoupling module, and a prediction and data enhancement module;