CN113076754A - False comment detection method and system based on knowledge integration - Google Patents

False comment detection method and system based on knowledge integration Download PDF

Info

Publication number
CN113076754A
Authority
CN
China
Prior art keywords
false
comment
embedding
detection
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110307754.1A
Other languages
Chinese (zh)
Inventor
王红
韩书
李威
庄鲁贺
张慧
王正军
杨雪
杨杰
滑美芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN202110307754.1A priority Critical patent/CN113076754A/en
Publication of CN113076754A publication Critical patent/CN113076754A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a false comment detection method and system based on knowledge integration, wherein the method comprises the following steps: obtaining comment data to be detected; performing false detection on the comment data to be detected by using a false comment detection model. The false comment detection model extracts text embedding features based on a knowledge embedding unit, extracts context embedding features based on a deep embedding unit, fuses the text embedding features and the context embedding features, and performs false detection using the fused embedding features. By integrating the text embedding features and the context features of the comment data, the accuracy of feature semantic expression is enhanced and the detection precision for false comments is improved.

Description

False comment detection method and system based on knowledge integration
Technical Field
The invention belongs to the technical field of big data analysis, and particularly relates to a false comment detection method and system based on knowledge integration.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Online consumption such as online shopping, hotel reservation and ticket booking has become a mainstream mode of consumption owing to its rich content, convenience and good user experience. Potential consumers usually read product reviews before deciding whether to buy, but because merchants seek to improve their reputations or to compete with one another, false reviews are common; these mislead consumers' decisions and also hinder effective supervision of the online market.
Machine-learning-based methods for evaluating and screening reviews already exist, but existing methods generally represent features only with bags of words or psycholinguistic markers and do not consider the context of the text, so it is difficult to capture the semantic information of the review content and describe the review correctly. Moreover, since reviews are sequences of different lengths with strong temporal dependence, deep models generally cannot balance the effects of long-term and short-term historical information, which makes them time-consuming and less accurate. Finally, deep learning models for false comment detection lack interpretability: people can only see the inputs and outputs of the model but can hardly understand how it works, which harms trust in the model and limits improvement of its performance.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a false comment detection method and system based on knowledge integration. By integrating text embedding features and context features of comment data, the accuracy of feature semantic expression is enhanced, and the detection precision of false comments is improved.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
a false comment detection method based on knowledge integration comprises the following steps:
obtaining comment data to be detected;
carrying out false detection on the comment data to be detected by adopting a false comment detection model;
the false comment detection model extracts text embedding features based on a knowledge embedding unit, extracts context embedding features based on a deep embedding unit, fuses the text embedding features and the context embedding features, and performs false detection using the fused embedding features.
Further, the knowledge embedding unit comprises a keyword feature unit used for extracting keyword features in the comment data to be detected.
Further, the knowledge embedding unit further comprises an emotion characteristic unit used for extracting emotion characteristics in the comment data to be detected.
Furthermore, the knowledge embedding unit further comprises an N-Gram embedding unit used for extracting text high-dimensional sparse embedding characteristics of the comment data to be detected.
Further, the deep embedding unit comprises a one-dimensional convolutional neural network connected to a long short-term memory network, and is used for extracting the context embedding features.
Further, the one-dimensional convolutional neural network includes a causal convolutional layer, a dilated convolutional layer, and a residual block.
Further, the method further comprises: interpreting the false comment detection model using a LIME model to obtain the features used for detecting false comments.
One or more embodiments of the present invention provide a false comment detection system based on knowledge integration, including:
the data acquisition module is configured to acquire the comment data to be detected;
the false comment detection module is used for performing false detection on the comment data to be detected by using a false comment detection model; the false comment detection model extracts text embedding features based on a knowledge embedding unit, extracts context embedding features based on a deep embedding unit, fuses the text embedding features and the context embedding features, and performs false detection using the fused embedding features.
One or more embodiments of the invention provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the knowledge integration-based false comment detection method when executing the program.
One or more embodiments of the present invention provide a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the knowledge integration-based false comment detection method.
The above one or more technical solutions have the following beneficial effects:
by integrating text embedding features and context features of the comment data, the accuracy of feature semantic expression is enhanced. Specifically, the knowledge semantics embedding of the comments to be detected is realized by fusing the keyword features and the emotion features in the comment data and/or the text high-dimensional sparse features of the comment data, so that the accuracy of the false comment detection is improved;
when detecting false comments, the one-dimensional convolutional neural network and the long short-term memory network are combined for deep embedding, so that the effects of the long-term and short-term historical information of the comment are balanced and the generalization capability of the false comment detection model is improved; moreover, fusing the one-dimensional convolutional network, the long short-term memory network and the residual connection layer together makes the detection model more stable;
by interpreting the detection model, key important words for detecting the false comment can be found, and the detection precision of the subsequent false comment is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a flow chart of a false comment detection method based on knowledge integration according to an embodiment of the present invention;
FIG. 2 is a diagram of a false comment detection model architecture in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a one-dimensional convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a diagram of a causal dilation convolution according to an embodiment of the present invention;
FIG. 5 is a diagram of a residual network architecture in an embodiment of the present invention;
FIG. 6 is a diagram of the LSTM structure in an embodiment of the present invention;
FIG. 7 is a graph of model accuracy in an embodiment of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
The embodiment discloses a false comment detection method based on knowledge integration, which comprises the following steps as shown in fig. 1:
step 1: obtaining comment data to be detected;
After the comment data to be detected are obtained, they are preprocessed; the preprocessing includes word segmentation, stop-word removal, punctuation removal and the like.
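For illustration only, a minimal preprocessing sketch is given below. It assumes English-language reviews and uses NLTK's tokenizer and stop-word list, which are not prescribed by this embodiment; any equivalent tokenization and stop-word resources could be substituted.

```python
# Illustrative preprocessing sketch (assumed tools: NLTK tokenizer and stop-word list).
import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))

def preprocess(review: str) -> list[str]:
    """Segment a review into words, dropping stop words and punctuation."""
    tokens = word_tokenize(review.lower())
    return [t for t in tokens if t not in STOP_WORDS and t not in string.punctuation]

print(preprocess("The room was great, but the service was very poor!"))
```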
Step 2: perform false detection on the comment data to be detected using a false comment detection model, wherein the false comment detection model extracts text embedding features based on a knowledge embedding unit, extracts context embedding features based on a deep embedding unit, fuses the text embedding features and the context embedding features, and performs false detection using the fused embedding features.
This embodiment provides an explainable, knowledge-integrated false comment detection model (EKI-SM). It integrates a group of word embedding features of false comments, so that the comment vector carries richer semantic information and semantically similar comments lie closer together. A one-dimensional convolutional network (1-D CNN) is introduced to extract high-dimensional features; because of the convolution operation, the 1-D CNN extracts features from local content, which makes the representation more effective. By learning a continuous sequence model from discrete observations with high-dimensional features, the efficiency of the model is improved and the problem of degraded convolution performance is alleviated. The model fuses the 1-D CNN, a long short-term memory network (LSTM) and residual connection layers to capture the local and global dependencies of the sequence and make the false comment detection model more reliable. Inspired by interpretable deep learning, the EKI-SM model is then explained to find the vocabulary that is important for detecting false comments.
FIG. 2 shows the overall architecture of the interpretable false comment detection model. Overall, the model includes four modules: a knowledge embedding module, a deep embedding module, a feature fusion module and a classification module. Specifically, semantic knowledge is first obtained from the comments and from their N-Gram representations. The comments are then input into the deep embedding module to learn a deep embedding of the comment sequence; this module provides a compact representation that effectively encodes context information. The outputs of these three parts are integrated for use by the classifier.
The knowledge embedding module comprises a keyword feature unit, an emotion feature unit and an N-Gram embedding unit, configured respectively to extract keyword features, emotion features and high-dimensional sparse text features from the comment data to be detected. The details are as follows:
(1) Keyword feature unit: TF-IDF (term frequency-inverse document frequency). When processing text, words must be converted into vectors that the model can handle, and TF-IDF is one solution to this problem. The importance of a word is proportional to the frequency with which it appears in the text (TF) and inversely proportional to the frequency with which it appears in the corpus (IDF).
TF-IDF=TF×IDF (1)
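As an illustrative sketch only, TF-IDF keyword features of this kind can be computed with scikit-learn; the 2000-feature cap mirrors the parameter setting reported later in this embodiment, and the sample reviews are invented.

```python
# TF-IDF keyword features, keeping the top 2000 terms (sketch, not the patented implementation).
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "the hotel room was clean and the staff were friendly",
    "great great great hotel absolutely beautiful amazing experience",
]
tfidf = TfidfVectorizer(max_features=2000)        # TF-IDF = TF * IDF, equation (1)
keyword_features = tfidf.fit_transform(reviews)   # sparse matrix: (n_reviews, <= 2000)
print(keyword_features.shape)
```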
(2) Emotion feature unit: the emotion degree is the ratio of emotion words to all words in the comment, as defined in equation (2).
emotion-degree = emotion words / all words (2)
Wherein emotion-degree is the emotion degree of the comment, emotion words is the number of emotion words in the comment, and all words is the number of words contained in the comment.
As described above, real reviewers usually make objective assessments based on their experience. False reviewers, however, tend to excessively recommend or excessively criticize certain products, and therefore often use more words with vivid emotions, such as "great", "bad", "beautiful" and "poor". Therefore, the HowNet emotion dictionary is used to count the number of positive/negative words. Most importantly, the emotion features are easy to obtain through a dictionary.
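The emotion degree of equation (2) can be sketched as a simple ratio, as below; the small positive/negative word lists are illustrative placeholders standing in for the HowNet emotion dictionary, which is not reproduced here.

```python
# Emotion feature: ratio of emotion words to all words, per equation (2).
# The word lists are illustrative placeholders for the HowNet emotion dictionary.
POSITIVE = {"great", "beautiful", "amazing", "clean"}
NEGATIVE = {"bad", "poor", "dirty", "terrible"}

def emotion_degree(tokens: list[str]) -> float:
    emotion_words = sum(1 for t in tokens if t in POSITIVE or t in NEGATIVE)
    return emotion_words / len(tokens) if tokens else 0.0

print(emotion_degree(["great", "great", "hotel", "beautiful", "room"]))  # 0.6
```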
(3) N-Gram embedding unit: the N-Gram is an algorithm based on a statistical language model. It performs a sliding-window operation of size N over the text content to form a sequence of byte segments of length N; each segment is called a gram. The occurrence frequencies of all grams are counted and filtered by a preset threshold to form a key-gram list, which is the feature vector of the text; each entry in the gram list is one dimension of the feature.
The model assumes that the appearance of the Nth word depends only on the preceding N-1 words and not on any other words, and that the probability of the whole sentence is the product of the probabilities of its words. These probabilities can be estimated by directly counting how often the N words occur together in the corpus. Commonly used N-Grams are the Uni-Gram, Bi-Gram and Tri-Gram.
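A sketch of the high-dimensional sparse N-Gram features follows, again using scikit-learn for illustration; the Uni-/Bi-/Tri-Gram range and 2000-feature cap follow the settings described for Experiment 1, while the input texts are assumed.

```python
# Uni-/Bi-/Tri-Gram counts filtered to the top 2000 grams (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "the staff were friendly and helpful",
    "absolutely amazing stay would recommend to everyone",
]
ngram = CountVectorizer(ngram_range=(1, 3), max_features=2000)
ngram_features = ngram.fit_transform(reviews)     # high-dimensional sparse matrix
print(len(ngram.get_feature_names_out()), "grams kept")
```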
The deep embedding module comprises an encoding unit, a one-dimensional convolution unit and a long short-term memory unit, configured respectively to perform local and global context embedding of the comment data to be detected.
The embodiment adopts a one-dimensional convolution (1D-CNN) unit for comment embedding. The network can retain short-term historical information and capture local semantic relationships. In addition, it has advantages such as flexibility in input length, independence of prediction results on future information, and maintenance of effectiveness as network depth increases.
The comment data to be detected are first encoded with GloVe by the encoding unit and then fed into the 1D-CNN for further feature extraction. The 1D-CNN module includes a causal convolutional layer, a dilated convolutional layer and a residual block (FIG. 3). A causal convolutional layer is a one-dimensional (1D) convolutional layer that is causal, meaning that future information does not leak into the past. The dilated convolutional layers use dilated causal convolution in each layer before the output layer.
In FIG. 4, the dilation factor of the first layer is 1 and the convolution kernel size is 3. The first hidden layer has a dilation factor of 2 and a convolution kernel size of 3, and the second hidden layer has a dilation factor of 4 with a convolution kernel size of 3. In addition, zero padding is applied so that the output length equals the input length.
Multiple hidden layers can be stacked to make the causal dilated convolutional network deeper, because the model is required to remember sequence history information at different levels.
To further formalize the principle of the causal dilated layers, assume the input sequence is X = (x_0, ..., x_n) ∈ R^n, the output sequence is Y = (y_0, ..., y_n), and the convolution kernel is F = (f_0, f_1, ..., f_{k-1}), where k is the kernel size. The dilated causal convolution over the sequence is expressed as follows:
F(s) = (X *_d F)(s) = Σ_{i=0}^{k-1} f_i · x_{s-d·i} (3)
where k is the convolution kernel size, s is a position in the sequence, x is the input feature, d is the dilation factor, and F(s) denotes the convolution output at position s of the sequence X.
As shown in equation (3), when dealing with large amounts of review data and long temporal dependencies, multiple dilated convolutional layers are constructed on top of the causal convolution and the dilation is further increased. That is, as the dilation rate increases, the receptive field widens, so the convolution output becomes related to long-term historical information.
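Equation (3) can be sketched directly in Python as below; the sequence and kernel values are arbitrary, positions before the start of the sequence are treated as zeros (causal padding), and the sketch is only meant to show how the dilation factor d widens the receptive field.

```python
# Dilated causal 1-D convolution per equation (3): F(s) = sum_i f_i * x_{s - d*i}.
import numpy as np

def dilated_causal_conv(x: np.ndarray, f: np.ndarray, d: int) -> np.ndarray:
    k = len(f)
    y = np.zeros_like(x, dtype=float)
    for s in range(len(x)):
        for i in range(k):
            j = s - d * i                 # only past (causal) positions contribute
            if j >= 0:
                y[s] += f[i] * x[j]
    return y

x = np.arange(8, dtype=float)             # toy input sequence
f = np.array([0.5, 0.3, 0.2])             # kernel of size k = 3
print(dilated_causal_conv(x, f, d=1))
print(dilated_causal_conv(x, f, d=2))     # larger d -> wider receptive field
```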
As described above, stacking causal convolutional layers can extract features at multiple levels, and it makes the convolutional neural network (CNN) deeper. However, deeper network structures often suffer from vanishing or exploding gradients, which make the model difficult to converge. One way to address this problem is to feed richer information into the network; therefore, a residual block is integrated into the causal dilated convolutional layers to improve the generalization capability of the CNN. FIG. 5 shows the network structure of the residual block.
Assume the input of the module is X and the output of the causal convolution module is H(X). The block contains two causal dilated convolutional layers with nonlinear activation functions. To improve generalization, the weights of the convolution kernels are normalized and a Dropout layer is placed in the residual block. The output of the entire residual block is expressed as follows.
Z=Activation(X+H(X)) (4)
Wherein H (X) represents the output of the causal convolution layer.
In general, the causal dilated convolutions and the residual connection layers together form the 1D-CNN module, which outputs a comment feature embedding matrix.
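A hedged Keras sketch of the residual block of FIG. 5 and equation (4) follows. The layer sizes and dropout rate are assumptions for illustration, weight normalization of the kernels is omitted, and the 1×1 convolution on the skip path is only there to match channel counts; this is not the exact configuration of the embodiment.

```python
# Residual block around two causal dilated convolutions: Z = Activation(X + H(X)), equation (4).
import tensorflow as tf

def residual_block(x, filters: int, kernel_size: int, dilation: int, dropout: float = 0.3):
    h = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal",
                               dilation_rate=dilation, activation="relu")(x)
    h = tf.keras.layers.Dropout(dropout)(h)
    h = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal",
                               dilation_rate=dilation, activation="relu")(h)
    h = tf.keras.layers.Dropout(dropout)(h)
    # Match channel counts before the skip connection if needed.
    if x.shape[-1] != filters:
        x = tf.keras.layers.Conv1D(filters, 1, padding="same")(x)
    return tf.keras.layers.Activation("relu")(tf.keras.layers.Add()([x, h]))
```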
After the 1D-CNN, a long short-term memory (LSTM) network learns the long-term features of the comments.
The LSTM model takes the feature vectors output by the 1D-CNN module as input and combines them with the hidden state of the previous step. It is controlled by three gates, which can read, reset and update the historical information, and a memory cell holds the historical information. FIG. 6 shows the architecture of the LSTM model. C_t is the cell state at each step. This information affects the cell as follows: it is integrated with the input connection and the recurrent connection and influences the state passed to the next time step. Conceptually, the carried data stream is a mechanism for modulating the next output and the next state.
where i_t, f_t, o_t are the input, forget and output gates respectively, c_t is the memory cell, z_t is the input vector, h_t is the hidden state, W denotes the weights of each layer's inputs and outputs, b is the bias, and σ is the sigmoid function with value range (0, 1). They are calculated using equations (5)-(9). The feature matrix Z = {z_1, z_2, z_3, ..., z_t} is input into the LSTM for iterative computation, which outputs the probability distribution of the next element.
i_t = σ(W_zi·z_t + W_hi·h_{t-1} + W_ci·c_{t-1} + b_i) (5)
f_t = σ(W_zf·z_t + W_hf·h_{t-1} + W_cf·c_{t-1} + b_f) (6)
c_t = f_t·c_{t-1} + i_t·tanh(W_zc·z_t + W_hc·h_{t-1} + b_c) (7)
o_t = σ(W_zo·z_t + W_ho·h_{t-1} + W_co·c_t + b_o) (8)
h_t = o_t·tanh(c_t) (9)
The loss function of the LSTM is a negative log-likelihood loss function shown in equation 10:
L = -Σ_{i=1}^{M} y_i·log(y_i') (10)
where y_i is the ground truth, y_i' is the predicted probability, and M is the number of classes.
Thus the deep embedding of the comments in EKI-SM is obtained.
The reason this embodiment combines the one-dimensional CNN and the LSTM is that their advantages are complementary, so EKI-SM inherits their strengths while avoiding their weaknesses. In particular, a one-dimensional CNN processes input segments independently, so it extracts features from local input sequences and is efficient, whereas the LSTM model behaves in the opposite way. This strategy combines the speed and lightness of the 1D-CNN with the order sensitivity of the LSTM, which is particularly useful when the review sequence is too long to be practically processed with an LSTM alone.
Since the 1D-CNN processes inputs independently, it is insensitive to the order of the sequence beyond the local range (the size of the convolution window). To identify the global features of the reviews, the 1D-CNN is therefore stacked with the LSTM, and the output of the 1D-CNN module is passed to the LSTM module. This strategy makes it possible to handle longer comment sequences.
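A minimal sketch of this stacking is shown below, reusing the residual block sketched above. It assumes GloVe-encoded inputs of shape (sequence length, 300); the sequence length and dilation schedule are assumptions, while the filter count, kernel size and LSTM sizes follow the parameter description given for Experiment 1.

```python
# Deep embedding sketch: GloVe-encoded reviews -> causal dilated 1D-CNN -> two LSTM layers.
import tensorflow as tf

MAX_LEN, EMB_DIM = 200, 300                      # assumed sequence length; 300-d GloVe vectors

inputs = tf.keras.Input(shape=(MAX_LEN, EMB_DIM))
x = residual_block(inputs, filters=64, kernel_size=3, dilation=1)   # from the sketch above
x = residual_block(x, filters=64, kernel_size=3, dilation=2)
x = tf.keras.layers.LSTM(30, return_sequences=True)(x)
deep_embedding = tf.keras.layers.LSTM(60)(x)     # global (long-term) review representation
deep_embedder = tf.keras.Model(inputs, deep_embedding, name="deep_embedding_module")
deep_embedder.summary()
```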
The feature fusion module is configured to fuse the features obtained by the knowledge embedding module, the N-Gram embedding unit and the deep embedding module into a fused feature vector.
The classification module is configured to classify the comment data to be detected according to the fused feature vector.
Since false comment detection is a binary classification task, sigmoid is adopted as the activation function of the classification module.
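The fusion and classification steps can be sketched as concatenation followed by a sigmoid output, as below, reusing the deep embedding model sketched above; the dimensionality of the knowledge feature vector is an assumption for illustration.

```python
# Feature fusion + binary classification sketch: concatenate knowledge and deep embeddings,
# then apply a sigmoid-activated dense layer (binary output: false vs. real comment).
import tensorflow as tf

KNOWLEDGE_DIM = 4001          # assumed: 2000 TF-IDF + 2000 N-Gram + 1 emotion-degree feature

knowledge_in = tf.keras.Input(shape=(KNOWLEDGE_DIM,), name="knowledge_features")
review_in = tf.keras.Input(shape=(200, 300), name="glove_encoded_review")
deep_emb = deep_embedder(review_in)                      # deep embedding module sketched above
fused = tf.keras.layers.Concatenate()([knowledge_in, deep_emb])
output = tf.keras.layers.Dense(1, activation="sigmoid")(fused)

model = tf.keras.Model([knowledge_in, review_in], output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```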
On the basis of constructing the EKI-SM model, the EKI-SM model is explained by the LIME method, and an important vocabulary for detecting false comments is obtained.
After multiple experiments, when a comment is false, the score the LIME model assigns to the negative class is close to 1; when a comment is true, the positive score of the LIME prediction is higher than the negative score. That is, a real review contains both positive and negative expressions, whereas a false comment contains more purely positive expressions. By using the LIME model to compute the importance of the textual features in a piece of comment data, the contribution of the main features to the prediction can be obtained, and from these contribution values the important words for identifying false comments can be found.
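A hedged sketch of explaining the trained classifier with LIME's text explainer is given below; `predict_proba` stands for whatever wrapper maps raw review strings to the model's class probabilities and is assumed here, as is the example review.

```python
# LIME explanation sketch: rank the words that drive the false/real prediction for one review.
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Assumed wrapper: preprocess texts, build features, return an (n, 2) probability array."""
    ...

explainer = LimeTextExplainer(class_names=["real", "false"])
explanation = explainer.explain_instance(
    "absolutely amazing hotel best stay ever highly recommend",
    predict_proba,
    num_features=10,
)
print(explanation.as_list())   # (word, contribution) pairs -- the important vocabulary
```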
Experiment 1
Experiment 1 adopts a hotel data set as a reference data set, and compares and analyzes the detection results of the method of the embodiment with those of other methods to verify the effectiveness of the method of the embodiment.
The hotel dataset consists of two subsets of 800 reviews of different polarities, including 400 real reviews and 400 false reviews. Specifically, each subset contains TripAdvisor reviews for 20 hotels, with 40 real reviews and 40 false reviews for each hotel. After the dataset is acquired, the data are preprocessed by word segmentation, stop-word removal, conversion of uppercase to lowercase, and removal of punctuation and non-alphabetic characters, which eliminates irrelevant information and reduces the size of the dataset.
The following model parameters were used in this example:
Keyword module: the top 2000 features were used.
N-Gram module: Uni-Gram, Bi-Gram and Tri-Gram were used, with the top 2000 features.
Deep embedding module: 300-dimensional word vectors trained with the GloVe model are used first, and deep embedding is then carried out through the EKI-SM model to obtain local and global context features. Specifically, 64 filters of size 3 x 3 are used to extract features in the CNN layer, and the two LSTM layers have 30 and 60 neurons respectively. Dropout is used to prevent overfitting, with its rate set to 0.3. In the deep embedding module, Adam is used for optimization; it uses momentum and an adaptive learning rate to accelerate convergence.
In evaluating the experimental results, five evaluation indices were considered: Acc (accuracy), P (precision), R (recall), F (F-score) and AUC. Precision is the ratio of truly correct results to all returned results after retrieval. Recall is the ratio of truly correct results among the retrieved results to all truly correct instances in the entire dataset (both retrieved and not retrieved). In addition to precision and recall, the F-score is used to combine these two metrics. Equations (11) to (14) show how the four indices are calculated.
Acc = (TP + TN) / (TP + TN + FP + FN) (11)
P = TP / (TP + FP) (12)
R = TP / (TP + FN) (13)
F-score = 2 × P × R / (P + R) (14)
where TP is the number of true positive samples, FP is the number of false positive samples, FN is the number of false negative samples, and TN is the number of true negative samples.
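For reference, the four indices of equations (11)-(14) plus AUC can be computed with scikit-learn as in the sketch below; the label and probability arrays are invented toy values, not experimental data.

```python
# Evaluation sketch: Acc, P, R, F-score and AUC per equations (11)-(14).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                 # 1 = false comment, 0 = real comment (toy labels)
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]  # toy predicted probabilities of the "false" class
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print("Acc:", accuracy_score(y_true, y_pred))
print("P:  ", precision_score(y_true, y_pred))
print("R:  ", recall_score(y_true, y_pred))
print("F:  ", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_prob))
```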
In order to prove the effectiveness of the method of the embodiment, the method of the embodiment and other methods are adopted to respectively detect the false comments, and the obtained experimental results are compared, wherein the false comment detection models adopted by the other methods are as follows:
(1) Support Vector Machine (SVM);
(2) Convolutional Neural Network (CNN), which takes CBOW word vector features as input;
(3) the DFFNN model, which takes word embedding features obtained from the N-Gram and word2vec models as input.
As can be seen from FIG. 7, the false comment detection performance of the method of this embodiment is superior to that of the other models. Besides accuracy, Table 1 shows that the F-score of the model used in this embodiment is the highest.
Table 1 experimental results of the model
Method Acc AUC F-score
SVM 80.75 0.807 0.808
DFFNN 88.19 0.951 0.882
CNN 84.88 0.911 0.850
EKI-SM 90.75 0.902 0.907
Experiment 2
In Experiment 2, based on the method proposed in this embodiment, different fused features were used to verify the effectiveness of the fused features proposed in this embodiment. The compared configurations are as follows:
(1) EKI-SM (TF-IDF): based on the low-dimensional vector only;
(2) EKI-SM (TF-IDF + emotion): fused features based on the low-dimensional vector and the emotion features;
(3) EKI-SM (TF-IDF + emotion + N-Gram): fused features based on the low-dimensional vector, the emotion features and the high-dimensional sparse features (the fused features proposed in this embodiment).
TABLE 2 results of experiments with fusion of different characteristics
Method Acc P R F-score
EKI-SM(TF-IDF) 86.75 86.86 86.75 86.74
EKI-SM(TF-IDF+emotion) 87.88 87.82 87.75 87.74
EKI-SM(TF-IDF+emotion+N-Gram) 90.75 90.75 90.75 90.74
Experimental results show that the high-dimensional sparse feature and the emotional feature are more beneficial to improving the performance of the model.
Experiment 3
Experiment 3 compares the false comment detection results when the deep embedding unit uses only a CNN model, only an LSTM model, and the combined CNN + LSTM model under the same input features; as can be seen from Table 3, the hybrid CNN + LSTM model (EKI-SM) of this embodiment performs best.
TABLE 3 results of the different models
Method Acc P R F-score
CNN 90.25 90.27 90.25 90.24
LSTM 87.75 87.82 87.75 87.74
EKI-SM 90.75 90.75 90.75 90.74
In the table, the CNN and LSTM models use the features extracted by TF-IDF and N-Gram as input for detecting false comments.
Example two
The embodiment aims to provide a false comment detection system based on knowledge integration. The system comprises:
the data acquisition module is configured to acquire the comment data to be detected;
the false comment detection module is used for performing false detection on the comment data to be detected by using a false comment detection model; the false comment detection model extracts text embedding features based on a knowledge embedding unit, extracts context embedding features based on a deep embedding unit, fuses the text embedding features and the context embedding features, and performs false detection using the fused embedding features.
EXAMPLE III
The embodiment aims at providing an electronic device.
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the false comment detection method based on knowledge integration according to an embodiment.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the false comment detection method based on knowledge integration according to an embodiment one.
The steps involved in the second to fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first or second embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
In one or more embodiments, by integrating text embedding features and context features of comment data, mining of various knowledge information is achieved, accuracy of feature semantic expression is enhanced, and the long-term and short-term historical information effects of comments are balanced by combining a one-dimensional convolutional neural network and a long-term and short-term neural network, so that generalization capability of a detection model of false comments is improved.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (10)

1. A false comment detection method based on knowledge integration is characterized by comprising the following steps:
obtaining comment data to be detected;
carrying out false detection on the comment data to be detected by adopting a false comment detection model;
the false comment detection model extracts text embedding features based on a knowledge embedding unit, extracts context embedding features based on a deep embedding unit, fuses the text embedding features and the context embedding features, and performs false detection using the fused embedding features.
2. The false comment detection method based on knowledge integration according to claim 1, wherein the knowledge embedding unit comprises a keyword feature unit for extracting keyword features in the comment data to be detected.
3. A false comment detection method based on knowledge integration according to claim 2, wherein the knowledge embedding unit further comprises an emotional feature unit for extracting emotional features in the comment data to be detected.
4. The knowledge integration-based false comment detection method as claimed in claim 2 or 3, wherein the knowledge embedding unit further comprises an N-Gram embedding unit for extracting text high-dimensional sparse embedding features of the comment data to be detected.
5. The knowledge-integration-based false comment detection method of claim 1, wherein the deep embedding unit comprises a connected one-dimensional convolutional neural network and a long-short term neural network for extracting context-embedded features.
6. The knowledge-integration-based false comment detection method of claim 5, wherein the one-dimensional convolutional neural network comprises a causal convolutional layer, an expansion convolutional layer, and a residual block.
7. A knowledge integration based false comment detection method as claimed in claim 1, wherein the method further comprises: and explaining the detection false comment model by adopting a LIME model to obtain the characteristics for detecting the false comment.
8. A false comment detection system based on knowledge integration, comprising:
the data acquisition module is configured to acquire the comment data to be detected;
the false comment detection module is used for performing false detection on the comment data to be detected by using a false comment detection model; the false comment detection model extracts text embedding features based on a knowledge embedding unit, extracts context embedding features based on a deep embedding unit, fuses the text embedding features and the context embedding features, and performs false detection using the fused embedding features.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the knowledge integration based false comment detection method of any one of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the false comment detection method based on knowledge integration according to any one of claims 1 to 7.
CN202110307754.1A 2021-03-23 2021-03-23 False comment detection method and system based on knowledge integration Pending CN113076754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110307754.1A CN113076754A (en) 2021-03-23 2021-03-23 False comment detection method and system based on knowledge integration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110307754.1A CN113076754A (en) 2021-03-23 2021-03-23 False comment detection method and system based on knowledge integration

Publications (1)

Publication Number Publication Date
CN113076754A true CN113076754A (en) 2021-07-06

Family

ID=76613343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110307754.1A Pending CN113076754A (en) 2021-03-23 2021-03-23 False comment detection method and system based on knowledge integration

Country Status (1)

Country Link
CN (1) CN113076754A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670542A (en) * 2018-12-11 2019-04-23 田刚 A kind of false comment detection method based on comment external information
CN111027686A (en) * 2019-12-26 2020-04-17 杭州鲁尔物联科技有限公司 Landslide displacement prediction method, device and equipment
CN111259140A (en) * 2020-01-13 2020-06-09 长沙理工大学 False comment detection method based on LSTM multi-entity feature fusion
US20200342314A1 (en) * 2019-04-26 2020-10-29 Harbin Institute Of Technology (shenzhen) Method and System for Detecting Fake News Based on Multi-Task Learning Model
CN112183056A (en) * 2020-08-19 2021-01-05 合肥工业大学 Context-dependent multi-classification emotion analysis method and system based on CNN-BilSTM framework

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670542A (en) * 2018-12-11 2019-04-23 田刚 A kind of false comment detection method based on comment external information
US20200342314A1 (en) * 2019-04-26 2020-10-29 Harbin Institute Of Technology (shenzhen) Method and System for Detecting Fake News Based on Multi-Task Learning Model
CN111027686A (en) * 2019-12-26 2020-04-17 杭州鲁尔物联科技有限公司 Landslide displacement prediction method, device and equipment
CN111259140A (en) * 2020-01-13 2020-06-09 长沙理工大学 False comment detection method based on LSTM multi-entity feature fusion
CN112183056A (en) * 2020-08-19 2021-01-05 合肥工业大学 Context-dependent multi-classification emotion analysis method and system based on CNN-BilSTM framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG HENG: "Research on False Review Detection Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
US20220180073A1 (en) Linguistically rich cross-lingual text event embeddings
Gao et al. Convolutional neural network based sentiment analysis using Adaboost combination
CN111914067B (en) Chinese text matching method and system
WO2017193685A1 (en) Method and device for data processing in social network
Mehmood et al. A precisely xtreme-multi channel hybrid approach for roman urdu sentiment analysis
Mitrović et al. nlpUP at SemEval-2019 task 6: A deep neural language model for offensive language detection
Grzegorczyk Vector representations of text data in deep learning
CN112818118A (en) Reverse translation-based Chinese humor classification model
CN112784532A (en) Multi-head attention memory network for short text sentiment classification
Srikanth et al. Sentiment analysis on COVID-19 twitter data streams using deep belief neural networks
Luo et al. Research on Text Sentiment Analysis Based on Neural Network and Ensemble Learning.
CN111507093A (en) Text attack method and device based on similar dictionary and storage medium
Jin et al. Multi-label sentiment analysis base on BERT with modified TF-IDF
Hamzah et al. The detection of sexual harassment and chat predators using artificial neural network
Anass et al. Deceptive opinion spam based on deep learning
Mansour et al. Text vectorization method based on concept mining using clustering techniques
Cao Learning meaning representations for text generation with deep generative models
Gao et al. Chinese causal event extraction using causality‐associated graph neural network
CN111723572A (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN113076754A (en) False comment detection method and system based on knowledge integration
Phat et al. Vietnamese text classification algorithm using long short term memory and Word2Vec
Nagelli et al. Optimal Trained Bi-Long Short Term Memory for Aspect Based Sentiment Analysis with Weighted Aspect Extraction
Koksal Artificial Intelligence-Based Categorization of Healthcare Text
Zhang Sentiment analysis and web development of movie reviews using naive bayes and LSTM

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210706

RJ01 Rejection of invention patent application after publication