CN117851943A - Message anomaly detection method and device - Google Patents

Message anomaly detection method and device

Info

Publication number
CN117851943A
Authority
CN
China
Prior art keywords
text
message
feature
comment
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311635822.2A
Other languages
Chinese (zh)
Inventor
顾焱
范奇峰
张志磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311635822.2A
Publication of CN117851943A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a message anomaly detection method and device, applied to the technical field of big data. The method comprises: acquiring a message text and a message image of a message to be detected, together with a comment text and a comment image of the comments corresponding to the message to be detected; extracting text features from the message text and the comment text respectively to obtain a first text feature and a second text feature, and extracting image features from the message image and the comment image respectively to obtain a first image feature and a second image feature; and fusing the first text feature, the second text feature, the first image feature and the second image feature to obtain a fusion feature, detecting the fusion feature with a preset detection model to determine whether the message to be detected is abnormal, and if so, deleting the message to be detected from a message database. The method and device can improve the accuracy of messages pushed to users.

Description

Message anomaly detection method and device
Technical Field
The present application relates to the technical field of message anomaly detection, in particular to the technical field of big data, and more particularly to a message anomaly detection method and device.
Background
This section is intended to provide a background or context to the embodiments of the application recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
With the popularity of the internet and social media, a vast amount of information, including news and social media posts, spreads widely over networks. Although information is now convenient and fast to obtain, the information push services that users subscribe to generally push related messages directly to the user without detecting and removing abnormal messages. Where detection and removal are performed, the traditional message anomaly detection methods used are mainly based on manual rules or keyword matching, and these methods often suffer from low accuracy, poor adaptability, and vulnerability to attack. Moreover, relying only on the text features of a message may not fully capture the characteristics of false messages, so the accuracy of current message anomaly detection is low.
Disclosure of Invention
An object of the present application is to provide a message anomaly detection method that improves the efficiency and accuracy of message anomaly detection, deletes abnormal messages such as false messages in a timely manner, and improves the accuracy of messages pushed to users. Another object of the present application is to provide a message anomaly detection device. Yet another object of the present application is to provide a computer device. Yet another object of the present application is to provide a readable medium.
To achieve the above objects, one aspect of the present application discloses a message anomaly detection method, including:
acquiring a message text and a message image of a message to be detected, together with a comment text and a comment image of the comments corresponding to the message to be detected;
extracting text features from the message text and the comment text respectively to obtain a first text feature and a second text feature, and extracting image features from the message image and the comment image respectively to obtain a first image feature and a second image feature;
and fusing the first text feature, the second text feature, the first image feature and the second image feature to obtain a fusion feature, detecting the fusion feature with a preset detection model to determine whether the message to be detected is abnormal, and if so, deleting the message to be detected from a message database.
Optionally, after acquiring the message text and the message image of the message to be detected and the comment text and the comment image of the corresponding comments, the method further includes:
cleaning the text data of the message text and the comment text, and performing word segmentation on the cleaned text data to obtain a vocabulary sequence;
performing stemming or lemmatization on the words in the vocabulary sequence to obtain base-form words;
and filtering the base-form words.
Optionally, extracting text features from the message text and the comment text respectively to obtain a first text feature and a second text feature includes:
encoding the message text and the comment text into high-dimensional text features with a preset feature extraction model based on an attention mechanism algorithm, to obtain the first text feature and the second text feature.
Optionally, the comment text includes a plurality of pieces of comment data, and before the first text feature, the second text feature, the first image feature and the second image feature are fused to obtain the fusion feature, the method further includes:
performing emotion prediction on each piece of comment data in the comment text with a preset emotion analysis model to obtain the emotion category of each piece of comment data;
obtaining an emotion feature based on the emotion category of each piece of comment data and the number of pieces of comment data in the comment text;
and adding the emotion feature to the second text feature.
Optionally, obtaining the emotion feature based on the emotion category of each piece of comment data and the number of pieces of comment data in the comment text includes:
determining the weight corresponding to each piece of comment data;
and obtaining the emotion feature of the comment text based on the emotion categories and corresponding weights of all comment data.
Optionally, determining the weight corresponding to each piece of comment data includes:
determining, for each emotion category, the ratio of the number of pieces of comment data in that category to the number of all pieces of comment data;
and determining the weight of each emotion category according to its ratio, and using that weight as the weight of each piece of comment data in the category.
Optionally, obtaining the emotion feature of the comment text based on the emotion categories and corresponding weights of all comment data includes:
extracting features from each piece of comment data to obtain comment features;
averaging the comment features of all comment data in each emotion category and multiplying the average by the corresponding weight to obtain the classified emotion feature of each emotion category;
and superposing the classified emotion features of all emotion categories to obtain the emotion feature.
Optionally, fusing the first text feature, the second text feature, the first image feature and the second image feature to obtain the fusion feature includes:
splicing (concatenating) the first text feature, the second text feature, the first image feature and the second image feature to obtain the fusion feature.
The application also discloses a message anomaly detection device, including:
a data processing module, used for acquiring a message text and a message image of a message to be detected, together with a comment text and a comment image of the comments corresponding to the message to be detected;
a feature extraction module, used for extracting text features from the message text and the comment text respectively to obtain a first text feature and a second text feature, and extracting image features from the message image and the comment image respectively to obtain a first image feature and a second image feature;
and an abnormal message detection module, used for fusing the first text feature, the second text feature, the first image feature and the second image feature to obtain a fusion feature, detecting the fusion feature with a preset detection model to determine whether the message to be detected is abnormal, and if so, deleting the message to be detected from a message database.
The application also discloses a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the computer program.
The application also discloses a computer readable storage medium storing a computer program which when executed by a processor implements a method as described above.
The present application is based on the message text and message image of the message to be detected and on the comment text and comment image corresponding to it. Text and image features are extracted not only from the message to be detected but also from its comments, and the extracted text and image features are fused and then passed to a preset detection model for message anomaly detection. This replaces traditional manual detection or keyword matching and improves the efficiency of message anomaly detection. In addition, because the fusion feature combines the text and image features of both the message to be detected and its comments, the detection takes into account that comments reflect, to some extent, the authenticity of the message, and it also mines the image features of the message and its comments. Anomaly detection can therefore uncover problems that the text of the message alone would not reveal, which improves detection accuracy, allows abnormal messages such as false messages to be deleted in time, and improves the accuracy of messages pushed to users.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a schematic flowchart of an embodiment of the message anomaly detection method of the present application;
FIG. 2 is a flowchart of step S000 in an embodiment of the message anomaly detection method of the present application;
FIG. 3 is a flowchart of step S200 in an embodiment of the message anomaly detection method of the present application;
FIG. 4 is a flowchart of step S230 in an embodiment of the message anomaly detection method of the present application;
FIG. 5 is a flowchart of step S231 in an embodiment of the message anomaly detection method of the present application;
FIG. 6 is a flowchart of step S232 in an embodiment of the message anomaly detection method of the present application;
FIG. 7 is a schematic structural diagram of an embodiment of the message anomaly detection device of the present application;
FIG. 8 is a schematic structural diagram of a computer device suitable for implementing embodiments of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be further described in detail with reference to the accompanying drawings. The illustrative embodiments of the present application and their description are presented herein to illustrate the application and not to limit the application.
Before the technical solution of the present application is described in detail, it should be noted that the acquisition, storage, use, and processing of data in this technical solution comply with the relevant provisions of national laws and regulations.
It should be noted that the method and the device for detecting message abnormality disclosed in the present application may be used in the technical field of big data, and may also be used in any field other than the technical field of big data.
According to one aspect of the present application, the present embodiment discloses a message anomaly detection method. As shown in fig. 1, in this embodiment, the method includes:
S100: Acquire a message text and a message image of a message to be detected, together with a comment text and a comment image of the comments corresponding to the message to be detected.
S200: Extract text features from the message text and the comment text respectively to obtain a first text feature and a second text feature, and extract image features from the message image and the comment image respectively to obtain a first image feature and a second image feature.
S300: Fuse the first text feature, the second text feature, the first image feature and the second image feature to obtain a fusion feature, detect the fusion feature with a preset detection model to determine whether the message to be detected is abnormal, and if so, delete the message to be detected from a message database.
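Steps S100 to S300 can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the application's implementation: the `Message` class, the `detect_and_purge` function, and the encoder and detector callables are hypothetical names, fusion is assumed to be plain concatenation, and a dictionary stands in for the message database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class Message:
    """Hypothetical container for one message and its comments (S100)."""
    message_id: str
    text: str
    images: List[bytes] = field(default_factory=list)
    comment_texts: List[str] = field(default_factory=list)
    comment_images: List[bytes] = field(default_factory=list)


def detect_and_purge(
    msg: Message,
    database: Dict[str, Message],
    text_encoder: Callable[[Sequence[str]], Vector],
    image_encoder: Callable[[Sequence[bytes]], Vector],
    detector: Callable[[Vector], bool],
) -> bool:
    """Run S200 and S300 for one message; return True if it was deleted."""
    # S200: extract text and image features for the message and its comments.
    first_text = text_encoder([msg.text])
    second_text = text_encoder(msg.comment_texts)
    first_image = image_encoder(msg.images)
    second_image = image_encoder(msg.comment_images)

    # S300: fuse by concatenation (an assumption), then classify.
    fused = first_text + second_text + first_image + second_image
    is_abnormal = detector(fused)
    if is_abnormal:
        # Delete the abnormal message from the (stand-in) message database.
        database.pop(msg.message_id, None)
    return is_abnormal
```

Passing the encoders and the detector as callables keeps the sketch independent of any particular model; in practice they would wrap the preset feature extraction and detection models.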
In an alternative embodiment, as shown in FIG. 2, after acquiring the message text and the message image of the message to be detected and the comment text and the comment image of the corresponding comments, the method further includes S000:
S010: Clean the text data of the message text and the comment text, and perform word segmentation on the cleaned text data to obtain a vocabulary sequence.
S020: Perform stemming or lemmatization on the words in the vocabulary sequence to obtain base-form words.
S030: Filter the base-form words.
Specifically, before text features are extracted from the message to be detected and its comments, the text data of the message text and the comment text can be preprocessed to improve data quality and, in turn, the accuracy of subsequent message anomaly detection. During preprocessing, the text data is first cleaned to remove useless punctuation marks, special characters, HTML tags, and the like; optionally, the raw text can be cleaned with regular-expression matching. The cleaned text is then segmented into a vocabulary sequence of words, subwords, and other tokens. Word segmentation can be done with common tools or libraries such as NLTK, spaCy, or Tokenizer. Next, the words in the vocabulary sequence can be stemmed or lemmatized with a tool or library such as LemmInflect, reducing each word to its base form so that fewer variant forms remain. Finally, depending on the specific requirements, stop words can be filtered out, for example with a preset stop-word dictionary, to remove common words that are irrelevant to classification or emotion analysis.
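The preprocessing steps S010 to S030 can be sketched with the standard library alone. This is a toy illustration for English text: the stop-word set and the `naive_stem` suffix stripper are hypothetical stand-ins for a real stop-word dictionary and a real stemmer or lemmatizer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in"}

_TAG = re.compile(r"<[^>]+>")            # HTML tags
_NON_WORD = re.compile(r"[^a-z0-9\s]")   # punctuation and special characters


def clean(text: str) -> str:
    """Remove HTML tags, lowercase, and drop punctuation."""
    text = _TAG.sub(" ", text)
    return _NON_WORD.sub(" ", text.lower())


def tokenize(text: str) -> List[str]:
    """Whitespace word segmentation."""
    return text.split()


def naive_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    tokens = tokenize(clean(text))                     # S010
    stems = [naive_stem(t) for t in tokens]            # S020
    return [s for s in stems if s not in STOP_WORDS]   # S030
```

For example, `preprocess("<p>The banks are closing!</p>")` returns `["bank", "clos"]`; a real lemmatizer would produce proper base forms such as "close".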
In an alternative embodiment, extracting text features from the message text and the comment text respectively in S200 to obtain the first text feature and the second text feature includes:
S210: Encode the message text and the comment text into high-dimensional text features with a preset feature extraction model based on an attention mechanism algorithm, to obtain the first text feature and the second text feature.
Specifically, the message text and the comment text of the message to be detected are encoded into high-dimensional feature vectors by the attention mechanism algorithm of the preset feature extraction model. The attention mechanism raises the weight score of key components in the data so as to capture the key information in a sentence. The resulting high-dimensional first and second text features therefore contain rich semantic and contextual information, which improves the accuracy of subsequent message anomaly detection.
In a specific example, a BERT model may be used as the preset feature extraction model to extract the text features of the message to be detected and of the comment text. That is, the message text and the comment text are input into the BERT model to obtain the contextual representation and semantic information of the data, and the output of the last layer of the BERT model is taken as the text feature. The BERT model encodes the message to be detected into a high-dimensional feature vector through a multi-layer Transformer structure. Because the Transformer uses an attention-based algorithm, which raises the weight score of key components in the data to capture the key information in a sentence, the extracted text features contain rich semantic and contextual information. The attention algorithm may be computed as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Q is the input matrix for the data point or context currently of interest, K is the matrix of all positions that may be attended to (similar to an index, in one-to-one correspondence with the values), V is the matrix of actual information or content, and $d_k$ is the dimension of the Q, K, V input matrices.
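Scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, can be sketched in plain Python for a single head. This is a minimal illustration of the formula only; it omits the learned projections, multiple heads, and masking that a BERT model uses, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(k[0])
    scale = math.sqrt(d_k)
    output = []
    for q_row in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return output
```

With a query that is equally similar to every key, the weights are uniform and the output is the plain average of the value rows.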
In an alternative embodiment, as shown in FIG. 3, the comment text includes a plurality of pieces of comment data, and before the first text feature, the second text feature, the first image feature and the second image feature are fused to obtain the fusion feature, the method further includes:
S220: Perform emotion prediction on each piece of comment data in the comment text with a preset emotion analysis model to obtain the emotion category of each piece of comment data.
S230: Obtain an emotion feature based on the emotion category of each piece of comment data and the number of pieces of comment data in the comment text.
S240: Add the emotion feature to the second text feature.
It can be appreciated that comments on abnormal messages such as false messages often carry specific emotional tendencies and structures. Combining emotion analysis of the comments with text analysis of the message to be detected, and using emotion features to strengthen false-message detection, is therefore an effective approach. By training an emotion analysis model and applying it to score the emotion of user comments, emotion features associated with false messages can be extracted. Optionally, the emotion analysis model can be obtained by training a BERT-series model or an XLM-RoBERTa model, and is then used to analyze the comment text and obtain emotion categories. For example, the comment text may be mapped through a fully connected layer of the emotion analysis model to a corresponding emotion category, including but not limited to positive, negative, or neutral. The specific steps of training the emotion analysis model are conventional in the art and are not described here.
The trained emotion analysis model is used to score the emotion of the comment text to be classified and to generate emotion features. A corresponding emotion score or probability can be obtained from the emotion classification result of each comment. These emotion features provide a measure of the emotional tendency of the text and help detect false messages.
In an alternative embodiment, as shown in FIG. 4, obtaining the emotion feature in S230 based on the emotion category of each piece of comment data and the number of pieces of comment data in the comment text includes:
S231: Determine the weight corresponding to each piece of comment data.
S232: Obtain the emotion feature of the comment text based on the emotion categories and corresponding weights of all comment data.
Specifically, in some scenarios a message to be detected, for example a news item, usually corresponds to a plurality of user comments. Before feature fusion, these comments need to be integrated: the emotion distribution of the comment data is analyzed at the level of all comments as a whole, and that distribution is amplified by weighting, which improves the accuracy of message anomaly detection.
In an alternative embodiment, as shown in FIG. 5, determining the weight corresponding to each piece of comment data in S231 includes:
S2311: Determine, for each emotion category, the ratio of the number of pieces of comment data in that category to the number of all pieces of comment data.
S2312: Determine the weight of each emotion category according to its ratio, and use that weight as the weight of each piece of comment data in the category.
In a specific example, after the emotion category of each piece of comment data is obtained, the proportion of comments in each emotion category is counted, a weight is assigned to each category based on that proportion, and the fused emotion feature is obtained through a weighted calculation.
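Steps S2311 and S2312 can be sketched as follows. The text only says the weight is determined "according to" the ratio, so this sketch assumes the simplest choice, weight equal to the ratio; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def category_weights(labels: List[str]) -> Dict[str, float]:
    """S2311/S2312: weight of each emotion category = its share of all comments.

    Assumption: the weight is taken to be exactly the ratio.
    """
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {category: n / total for category, n in counts.items()}


def comment_weights(labels: List[str]) -> List[float]:
    """Each comment inherits the weight of its own emotion category."""
    weights = category_weights(labels)
    return [weights[label] for label in labels]
```

For four comments labeled negative, negative, positive, and neutral, the category weights are 0.5, 0.25, and 0.25 respectively.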
In an alternative embodiment, as shown in FIG. 6, obtaining the emotion feature of the comment text in S232 based on the emotion categories and corresponding weights of all comment data includes:
S2321: Extract features from each piece of comment data to obtain comment features.
S2322: Average the comment features of all comment data in each emotion category and multiply the average by the corresponding weight to obtain the classified emotion feature of each emotion category.
S2323: Superpose the classified emotion features of all emotion categories to obtain the emotion feature.
In a specific example, the emotion feature can be calculated with the following formula:
$$\text{emotion feature} = w_{\mathrm{pos}} \cdot \frac{1}{m}\sum_{i=1}^{m} f^{\mathrm{pos}}_{i} \;+\; w_{\mathrm{neu}} \cdot \frac{1}{n}\sum_{i=1}^{n} f^{\mathrm{neu}}_{i} \;+\; w_{\mathrm{neg}} \cdot \frac{1}{k}\sum_{i=1}^{k} f^{\mathrm{neg}}_{i}$$
where $w_{\mathrm{pos}}$, $w_{\mathrm{neu}}$ and $w_{\mathrm{neg}}$ are the positive, neutral and negative weights; the positive features $f^{\mathrm{pos}}_{i}$ are comment features extracted from comments whose emotion category is positive, and m is the number of positive features; the neutral features $f^{\mathrm{neu}}_{i}$ are comment features extracted from comments whose emotion category is neutral, and n is the number of neutral features; the negative features $f^{\mathrm{neg}}_{i}$ are comment features extracted from comments whose emotion category is negative, and k is the number of negative features.
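The weighted combination of per-category mean comment features (S2321 to S2323) can be sketched as below. The `emotion_feature` function and its optional `weights` argument are hypothetical; when no weights are supplied, the sketch assumes each category's weight equals its share of all comments.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def emotion_feature(
    features: Sequence[Vector],
    labels: Sequence[str],
    weights: Optional[Dict[str, float]] = None,
) -> Vector:
    """Sum over categories of weight * mean(comment features in category)."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if not features:
        return []

    # Group the comment features by emotion category.
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec)

    total = len(features)
    if weights is None:
        # Assumed default: weight = share of comments in the category.
        weights = {label: len(vecs) / total for label, vecs in groups.items()}

    dim = len(features[0])
    result = [0.0] * dim
    for label, vecs in groups.items():
        for j in range(dim):
            mean_j = sum(v[j] for v in vecs) / len(vecs)   # S2322: average
            result[j] += weights[label] * mean_j           # S2322/S2323
    return result
```

One design point worth noting: if the weight is exactly the category share, the result reduces algebraically to the plain mean of all comment features, so amplifying the dominant emotion would need a different weighting (for example a nonlinear function of the share), which can be passed in through `weights`.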
In an alternative embodiment, fusing the first text feature, the second text feature, the first image feature and the second image feature in S300 to obtain the fusion feature includes:
S310: Splice (concatenate) the first text feature, the second text feature, the first image feature and the second image feature to obtain the fusion feature.
In this alternative embodiment, feature fusion is achieved by concatenating the extracted text features and image features into a new feature sequence. In other embodiments, the fusion feature may be obtained from the text features and image features in other ways, which those skilled in the art may choose according to the actual situation; this application does not limit this.
It should be noted that, feature extraction may be performed on the message text through the BERT series model to obtain a first text feature, and feature extraction may be performed on each piece of comment data to obtain a comment feature. Of course, in practical application, feature extraction may be performed by defining features to be extracted of the message text and comment data, and a person skilled in the art may select a corresponding feature extraction manner according to practical situations, which is not limited in this application.
In this embodiment, when image features are extracted from the message image and the comment image to obtain the first image feature and the second image feature, all images may first be normalized to a format and size that can be processed, for example 512×512 pixels. A pre-trained ViT (Vision Transformer) model is then used to extract image features. If the message image or the comment image comprises a plurality of images, the image features extracted from the individual images are concatenated to obtain the first image feature and the second image feature respectively.
The fusion feature is then used as the input of the preset detection model to obtain the detection result. The preset detection model for detecting false messages can be obtained by training an ordinary neural network or an LSTM model, defined as a feedforward neural network with several hidden layers or a recurrent neural network with several LSTM layers. Through learning and optimization, the model can accurately classify the fusion feature and label the message to be detected as a real message or a false message. In deep learning, Softmax is typically used for classification, with the following formula:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
where K is the number of categories and $z_j$ is the output value of output node j. The category with the largest Softmax value is taken as the classification result.
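Softmax classification, in which each output value z_i is mapped to exp(z_i) divided by the sum of exp(z_j) over all K categories and the largest probability decides the class, can be sketched as follows. The `classify` helper and the class names in the usage note are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw output values into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float], class_names: List[str]) -> Tuple[str, float]:
    """Return the class with the highest softmax probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]
```

For instance, `classify([0.2, 2.5], ["real", "false"])` selects "false" because the second output value is larger.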
During training, the model is optimized with a suitable optimization algorithm (such as stochastic gradient descent) and loss function (such as cross-entropy loss). Because false messages in real scenarios may fall into multiple categories, the present application uses the cross-entropy loss function for optimization. The multi-class cross-entropy loss function is:
$$L = \frac{1}{N}\sum_{i=1}^{N} L_i = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log\left(p_{ic}\right)$$
where M is the number of categories, N is the number of samples, $L_i$ is the loss value of sample i, L is the loss value over the whole data set, $y_{ic}$ is an indicator function that equals 1 if the true class of sample i is c and 0 otherwise, and $p_{ic}$ is the predicted probability that sample i belongs to category c.
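The multi-class cross-entropy loss, the mean over N samples of the negative log of the probability assigned to each sample's true class, can be sketched as below. The function name and the `eps` clamp that guards against log(0) are assumptions added for the illustration.

```python
import math
from typing import List


def cross_entropy(probs: List[List[float]], labels: List[int], eps: float = 1e-12) -> float:
    """Mean multi-class cross-entropy.

    probs[i][c] is p_ic, the predicted probability that sample i is in class c.
    labels[i] is the index of the true class of sample i.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p_i, c in zip(probs, labels):
        # y_ic is 1 only for the true class, so the inner sum keeps one term.
        total += -math.log(max(p_i[c], eps))
    return total / len(probs)
```

A perfectly confident correct prediction contributes zero loss, while low probability on the true class is penalized heavily.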
The validation set is used for model tuning and for selecting the best model. Finally, the model is evaluated on an independent test data set, and metrics such as accuracy, recall, and F1 score are calculated to evaluate the performance of the message anomaly detection model. These metrics are based on a confusion matrix, which involves the following concepts:
True Positive (TP): the true class of the sample is positive, and the model also detects it as positive.
False Negative (FN): the true class of the sample is positive, but the model detects it as negative.
False Positive (FP): the true class of the sample is negative, but the model detects it as positive.
True Negative (TN): the true class of the sample is negative, and the model also detects it as negative.
Table 1 shows the confusion matrix for binary classification as an example.
TABLE 1
                     Detected positive    Detected negative
Actual positive      TP                   FN
Actual negative      FP                   TN
The accuracy (Accuracy) is:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$
The recall (Recall) is:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The precision (Precision) is:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
The F1 score is:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
These metrics are considered together to select the optimal model.
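Accuracy, recall, precision, and F1 can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the choice to return 0.0 when a denominator is zero are assumptions.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute standard binary classification metrics from TP, FN, FP, TN."""

    def ratio(num: float, den: float) -> float:
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fn + fp + tn)
    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }
```

Comparing these values across candidate models on the same test set is one way to carry out the model selection described above.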
With the technical solution detailed above, the present application makes full use of data preprocessing, text feature extraction with BERT-series models, image feature extraction, feature fusion, and training of an ordinary neural network or LSTM model to detect false messages, which helps improve the accuracy and robustness of message anomaly detection.
In summary, the application provides a message anomaly detection method based on the message to be detected and its corresponding comments, aiming to solve the problems of traditional manual detection, such as low efficiency, high cost, and susceptibility to subjective factors. By using automation, the method and device process and analyze large amounts of text and image data quickly and improve processing efficiency. Compared with traditional manual detection, automated processing not only saves time and labor but also keeps pace with the high speed at which information spreads. In addition, the method combines text and image feature detection with comment emotion analysis, and improves the accuracy of message anomaly detection by mining the features and emotion information in comments. Text and image feature detection can classify messages and provides a basis for subsequent detection work, while comment emotion analysis extracts emotional tendencies from text and helps judge the authenticity and intent of the data. Using the two techniques together detects false messages more comprehensively and reduces the likelihood of misjudgment.
Based on the same principle, the application also discloses a message anomaly detection device. As shown in fig. 7, in the present embodiment, the apparatus includes a data processing module 11, a feature extraction module 12, and an abnormal message detection module 13.
The data processing module 11 is configured to obtain a message text and a message image of a message to be detected, where the message to be detected corresponds to a comment text and a comment image of a comment.
The feature extraction module 12 is configured to extract text features of the message text and the comment text to obtain a first text feature and a second text feature, and extract image features of the message image and the comment image to obtain a first image feature and a second image feature, respectively.
The abnormal message detection module 13 is configured to perform feature fusion on the first text feature, the second text feature, the first image feature, and the second image feature to obtain a fusion feature, detect the fusion feature based on a preset detection model to determine whether the message to be detected is abnormal, and if so, delete the message to be detected from the message database.
The present application is based on the message text and message image of the message to be detected and on the comment text and comment image corresponding to it. Text and image features are extracted not only from the message itself but also from its comments; the extracted text and image features are fused and then passed to a preset detection model for message anomaly detection. This replaces traditional manual detection or keyword matching and improves the efficiency of message anomaly detection. In addition, because comments reflect the authenticity of a message to some extent, and because the image features of both the message and its comments are mined, fusing the text and image features of the message and its comments allows anomaly detection to find problems that the message text alone cannot reveal. This improves the accuracy of anomaly detection, allows abnormal messages, such as false ones, to be deleted in time, and improves the accuracy of the messages pushed to users.
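The flow of the three modules can be sketched roughly as follows, assuming that the four fused inputs are the two text features and the two image features. Everything here is hypothetical scaffolding: the `Message` and `AnomalyDetector` classes, the toy encoders, and the threshold rule standing in for the preset detection model are assumptions for illustration, not the models described in the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Message:
    message_id: str
    text: str
    image: bytes
    comment_text: str
    comment_image: bytes


@dataclass
class AnomalyDetector:
    text_encoder: Callable[[str], Vector]      # stand-in for the text feature model
    image_encoder: Callable[[bytes], Vector]   # stand-in for the image feature model
    detection_model: Callable[[Vector], bool]  # stand-in for the preset detection model

    def extract(self, msg: Message) -> List[Vector]:
        """Return first/second text features, then first/second image features."""
        return [
            self.text_encoder(msg.text),
            self.text_encoder(msg.comment_text),
            self.image_encoder(msg.image),
            self.image_encoder(msg.comment_image),
        ]

    def is_abnormal(self, msg: Message) -> bool:
        # Fuse by splicing the four feature vectors end to end.
        fused = [value for feature in self.extract(msg) for value in feature]
        return self.detection_model(fused)

    def screen(self, database: Dict[str, Message]) -> List[str]:
        """Delete abnormal messages from the database and return their ids."""
        removed = [mid for mid, msg in database.items() if self.is_abnormal(msg)]
        for mid in removed:
            del database[mid]
        return removed


# Toy components: text -> [length, exclamation marks], image -> [byte count].
detector = AnomalyDetector(
    text_encoder=lambda t: [float(len(t)), float(t.count("!"))],
    image_encoder=lambda b: [float(len(b))],
    # Fused layout: [msg_len, msg_excl, cmt_len, cmt_excl, img_len, cmt_img_len].
    detection_model=lambda fused: fused[1] + fused[3] >= 3,
)

db = {
    "m1": Message("m1", "Quarterly report published", b"\x00", "Thanks for sharing", b""),
    "m2": Message("m2", "Win money now!!", b"\x00\x01", "Scam!", b""),
}
removed = detector.screen(db)
print(removed, list(db))
```

Passing the encoders and the detection model in as callables keeps the sketch independent of any particular model library; trained models could be swapped in without changing the control flow.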
In an optional implementation, the data processing module 11 is further configured to, after acquiring the message text and message image of the message to be detected and the comment text and comment image of the corresponding comment, clean the text data of the message text and the comment text, and perform word segmentation on the cleaned text data to obtain a vocabulary sequence; perform stemming or lemmatization on the words in the vocabulary sequence to obtain base-form words; and filter the base-form words.
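The preprocessing steps (cleaning, word segmentation, reduction to base form, filtering) can be sketched as below. This is a minimal illustration that assumes whitespace-delimited English text; the stop-word list, the suffix rules standing in for a real stemmer or lemmatizer, and all function names are assumptions. Text in a language without word delimiters would need a dedicated word segmenter in place of `tokenize`.

```python
import re

# Hypothetical stop-word list; a real system would load a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

# Naive suffix rules standing in for a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text: str) -> str:
    """Strip HTML tags, URLs and punctuation, then normalise whitespace and case."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text: str) -> list:
    """Split cleaned text into a vocabulary sequence."""
    return text.split()


def stem(word: str) -> str:
    """Remove the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list:
    """Clean, segment, reduce to base form, then filter out stop words."""
    base_forms = [stem(word) for word in tokenize(clean(text))]
    return [word for word in base_forms if word not in STOP_WORDS]


tokens = preprocess("<p>Banks are reporting FAKE transfers!</p> See http://example.com")
print(tokens)  # ['bank', 'report', 'fake', 'transfer', 'see']
```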
In an alternative embodiment, the feature extraction module 12 is configured to encode the message text and the comment text into high-dimensional text features by using a preset feature extraction model through an attention mechanism algorithm to obtain a first text feature and a second text feature.
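To give a feel for what encoding through an attention mechanism involves, the sketch below applies single-head scaled dot-product self-attention to a short sequence of token vectors and mean-pools the result into one text feature. It is a toy illustration only and is not the preset feature extraction model: the identity query/key/value projections, the single head, the mean pooling and the function names are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (an assumption)."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Each output vector is a weighted sum of all token vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


def encode_text(tokens):
    """Mean-pool the attended token vectors into a single text feature."""
    attended = self_attention(tokens)
    return [sum(column) / len(attended) for column in zip(*attended)]


# Two hypothetical 2-dimensional token embeddings.
feature = encode_text([[1.0, 0.0], [0.0, 1.0]])
print(feature)
```

A trained encoder would add learned projections, multiple heads and stacked layers; the sketch keeps only the weighting step that lets each token draw on the rest of the sequence.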
In an optional embodiment, the comment text includes a plurality of pieces of comment data, and the feature extraction module 12 is configured to, before feature fusion is performed on the first text feature, the second text feature, the first image feature, and the second image feature to obtain the fusion feature, predict the emotion of each piece of comment data in the comment text using a preset emotion analysis model to obtain the emotion category of each piece of comment data; obtain an emotion feature based on the emotion category of each piece of comment data and the number of pieces of comment data in the comment text; and add the emotion feature to the second text feature.
In an alternative embodiment, the feature extraction module 12 is configured to determine a weight corresponding to each piece of comment data; and obtain the emotion feature of the comment text based on the emotion categories and the corresponding weights of all the comment data.
In an optional embodiment, the feature extraction module 12 is configured to determine the ratio of the number of pieces of comment data in each emotion category to the total number of pieces of comment data; and determine the weight of each emotion category according to its ratio, the weight of an emotion category being used as the weight of each piece of comment data in that category.
In an alternative embodiment, the feature extraction module 12 is configured to perform feature extraction on each piece of comment data to obtain comment features; average the comment features of all the comment data in each emotion category and multiply the average by the corresponding weight to obtain the classified emotion feature of that emotion category; and superpose the classified emotion features of all emotion categories to obtain the emotion feature.
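The weighting and superposition of comment features can be sketched as follows. The sketch assumes that the weight of an emotion category is simply its share of all comments, which is only one possible reading of "according to the ratio"; the function name, the category labels and the sample vectors are likewise hypothetical.

```python
from collections import defaultdict


def emotion_feature(comment_features, emotion_labels):
    """Combine per-comment features into one emotion feature.

    For each emotion category: average the features of its comments,
    multiply by the category weight, then sum over all categories.
    """
    if not comment_features or len(comment_features) != len(emotion_labels):
        raise ValueError("need one emotion label per comment feature")

    total = len(comment_features)
    dim = len(comment_features[0])

    by_category = defaultdict(list)
    for feature, label in zip(comment_features, emotion_labels):
        by_category[label].append(feature)

    fused = [0.0] * dim
    for label, features in by_category.items():
        weight = len(features) / total  # assumption: weight equals the ratio
        mean = [sum(column) / len(features) for column in zip(*features)]
        for i in range(dim):
            fused[i] += weight * mean[i]  # superpose the classified emotion feature
    return fused


# Three hypothetical comments: two negative, one positive.
features = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
labels = ["negative", "negative", "positive"]
result = emotion_feature(features, labels)
print(result)
```

With the weight set equal to the ratio, the result coincides with the plain average over all comments; a different mapping from ratio to weight would emphasise dominant or minority emotion categories differently.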
In an alternative embodiment, the abnormal message detection module 13 is configured to splice the first text feature, the second text feature, the first image feature, and the second image feature to obtain the fusion feature.
Since the device solves the problem on a principle similar to that of the above method, reference may be made to the implementation of the method for the implementation of the device, and the details are not repeated here.
The embodiment of the application also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the method when executing the computer program.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the above method.
It will be appreciated by those skilled in the art that the systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the computer apparatus includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement a method performed by a client as described above, or where the processor executes the program to implement a method performed by a server as described above.
Referring now to FIG. 8, a schematic diagram of a computer device 600 suitable for use in implementing embodiments of the present application is shown.
As shown in FIG. 8, the computer device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the computer device 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (11)

1. A message anomaly detection method, comprising:
acquiring a message text and a message image of a message to be detected, wherein the message to be detected corresponds to a comment text and a comment image of a comment;
extracting text features of the message text and the comment text respectively to obtain a first text feature and a second text feature, and extracting image features of the message image and the comment image respectively to obtain a first image feature and a second image feature;
and carrying out feature fusion on the first text feature, the second text feature, the first image feature and the second image feature to obtain fusion features, detecting the fusion features based on a preset detection model to determine whether the message to be detected is abnormal, and if so, deleting the message to be detected from a message database.
2. The message anomaly detection method of claim 1, further comprising, after acquiring the message text and the message image of the message to be detected and the comment text and the comment image of the comment corresponding to the message to be detected:
cleaning the text data of the message text and the comment text, and performing text word segmentation on the cleaned text data to obtain a vocabulary sequence;
performing stemming or lemmatization on the words in the vocabulary sequence to obtain base-form words;
and filtering the base-form words.
3. The message anomaly detection method of claim 1, wherein the extracting text features of the message text and the comment text, respectively, to obtain a first text feature and a second text feature comprises:
and encoding the message text and the comment text into high-dimension text features by adopting a preset feature extraction model through an attention mechanism algorithm to obtain a first text feature and a second text feature.
4. The message anomaly detection method of claim 1, wherein the comment text includes a plurality of pieces of comment data, the method further comprising, before performing feature fusion on the first text feature, the second text feature, the first image feature, and the second image feature to obtain the fusion feature:
carrying out emotion prediction on each piece of comment data in the comment text through a preset emotion analysis model to obtain emotion types of each piece of comment data;
obtaining emotion characteristics based on the emotion type of each piece of comment data and the number of comment data in the comment text;
and adding the emotion feature to the second text feature.
5. The message anomaly detection method of claim 4, wherein the deriving emotion characteristics based on the emotion classification of each comment data and the number of comment data in the comment text comprises:
determining the weight corresponding to each piece of comment data;
and obtaining the emotion characteristics of the comment text based on the emotion categories and the corresponding weights of all the comment data.
6. The message anomaly detection method of claim 5, wherein determining the weight for each piece of comment data comprises:
determining the ratio of the number of comment data corresponding to each emotion type in the number of all comment data;
and determining the weight of each emotion type according to its ratio, the weight of an emotion type being used as the weight of each piece of comment data in that emotion type.
7. The message anomaly detection method of claim 5, wherein the deriving the emotional characteristics of the comment text based on emotion categories and corresponding weights of all comment text comprises:
extracting features of each piece of comment data to obtain comment features;
averaging the comment features of all comment data in each emotion category, and multiplying the average by the corresponding weight to obtain the classified emotion feature of each emotion category;
and superposing the classified emotion characteristics of all emotion categories to obtain the emotion characteristics.
8. The method of claim 1, wherein the feature fusing the first text feature, the second text feature, the first image feature, and the second image feature to obtain a fused feature comprises:
and splicing the first text feature, the second text feature, the first image feature and the second image feature to obtain the fusion feature.
9. A message anomaly detection apparatus, comprising:
the data processing module is used for acquiring message text and message image of the message to be detected, wherein the message to be detected corresponds to comment text and comment image of comment;
the feature extraction module is used for respectively extracting text features of the message text and the comment text to obtain a first text feature and a second text feature, and respectively extracting image features of the message image and the comment image to obtain a first image feature and a second image feature;
the abnormal message detection module is used for carrying out feature fusion on the first text feature, the second text feature, the first image feature and the second image feature to obtain fusion features, detecting the fusion features based on a preset detection model to determine whether the message to be detected is abnormal, and if so, deleting the message to be detected from a message database.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 8 when executing the computer program.
11. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 8.
CN202311635822.2A 2023-11-30 2023-11-30 Message anomaly detection method and device Pending CN117851943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311635822.2A CN117851943A (en) 2023-11-30 2023-11-30 Message anomaly detection method and device

Publications (1)

Publication Number Publication Date
CN117851943A true CN117851943A (en) 2024-04-09

Family

ID=90533617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311635822.2A Pending CN117851943A (en) 2023-11-30 2023-11-30 Message anomaly detection method and device

Country Status (1)

Country Link
CN (1) CN117851943A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination