CN118036602A

CN118036602A - False comment identification method and device

Info

Publication number: CN118036602A
Application number: CN202311022030.8A
Authority: CN
Inventors: 周策; 蓬蕾; 程博; 黄忠礼
Original assignee: Guangdong Piston Intelligence Technology Co ltd
Current assignee: Guangdong Piston Intelligence Technology Co ltd
Priority date: 2023-08-14
Filing date: 2023-08-14
Publication date: 2024-05-14

Abstract

The invention relates to the technical field of natural language processing and discloses a false comment identification method and device. The method obtains the comment data of a comment to be tested, retrieves the account behavior information of its publisher from a preset database, and detects whether the publisher account information or the publisher account behavior information is missing, so that a suitable false comment identification model can be selected. The model takes as input the result of judging whether the comment to be tested falls in a false comment high-incidence time period, the probability that the comment content is a false comment, market and product data for the vehicle type being commented on, and information about the publisher account and its comments. It outputs the probability that the comment to be tested is a false comment, and this probability is compared with a preset probability threshold to decide whether the comment is false. The invention can enhance the intelligence of model identification, improve the accuracy of identifying false comments and save labor cost.

Description

False comment identification method and device

Technical Field

The invention relates to the technical field of natural language processing, in particular to a false comment identification method and device.

Background

When buying a car, consumers increasingly rely on information from online platforms to guide their purchasing decisions. However, on car-related forums, trending discussion boards, word-of-mouth review pages, social groups and other places where consumers speak, false comments are frequently used to steer the discussion, distort consumers' impressions of products and even mislead their decisions. Car-related false comments take many forms, ranging from exaggerated praise of a product to malicious smearing of competitors. False comments distort fair market competition, damage the image of the whole industry, and erode consumers' trust in both the industry and the platform. Identifying false comments therefore helps protect consumer rights, promote industry development and maintain platform reputation: it helps consumers make sound car purchasing decisions, promotes fair market competition, and supports the healthy development of online platforms.

Existing methods for identifying false comments are designed for social media or online shopping sites and are not well suited to car-related comments. They either identify false accounts and treat all comments from those accounts as false comments, or identify false comments from behavioral data about three parties (commenters, merchants and comment content), such as posting volume, posting frequency and length, rating patterns and number of ratings. However, false account identification usually builds a relationship network from account interactions in order to extract account features. It depends heavily on commenter behavior and pays too little attention to the comment text, so the falseness of a comment is hard to assess when the publisher's account information is missing. In addition, existing identification methods all require false accounts or false comments to be labeled manually to build the modeling data set used for training. Because comments vary widely, a large amount of training data and considerable manpower are needed, and model quality depends on the quality of manual labeling, which reduces accuracy.

Disclosure of Invention

The invention provides a false comment identification method and device, which can enhance the intelligence of model identification, improve the accuracy of false comment identification and save labor cost.

In order to solve the technical problems, the invention provides a false comment identification method, which comprises the following steps:

obtaining comment data of a comment to be tested; the comment data comprise comment content, publisher account information, platform information, vehicle type information and comment posting time;

judging, according to the platform information, vehicle type information and comment posting time of the comment to be tested, whether the comment to be tested falls in a false comment high-incidence time period, to obtain a first judgment result;

judging the authenticity of the comment content of the comment to be tested by using a preset dedicated automobile comment model, to obtain a first probability that the comment content is a false comment;

selecting a false comment identification model according to the publisher account information of the comment to be tested; wherein the false comment identification model comprises a first recognition model and a second recognition model;

obtaining, by using the false comment identification model, the false comment probability of the comment to be tested according to the comment data, the first judgment result and the first probability of the comment to be tested;

and when the false comment probability is greater than a preset probability threshold, determining that the comment to be tested is a false comment.

The method obtains the comment data of the comment to be tested, uses them to determine whether the comment falls in a false comment high-incidence time period (the first judgment result), and uses the preset dedicated automobile comment model to judge the authenticity of the comment content (the first probability). A false comment identification model is then selected according to the publisher account information of the comment to be tested. The comment data, the first judgment result and the first probability are fed to the selected model, which outputs the probability that the comment is a false comment; this probability is compared with the preset probability threshold to decide whether the comment to be tested is a false comment. Because the comment is analyzed from several aspects and the correlations in the data are exploited, the accuracy of identifying false comments improves and labor cost is saved.

Further, judging whether the comment to be tested falls in a false comment high-incidence time period according to its platform information, vehicle type information and comment posting time, to obtain the first judgment result, specifically comprises:

determining, according to the platform information and vehicle type information of the comment to be tested, that the vehicle type corresponding to the comment is a first vehicle type and that the platform on which it was posted is a first platform;

acquiring all comment data of the first vehicle type on the first platform from a preset database;

extracting comment features of the first vehicle type on the first platform, and forming a first time sequence according to the posting time of each comment; the comment features comprise comment quantity, average comment word count and comment sentiment tendency;

analyzing the first time sequence to obtain a plurality of breakpoint positions of the first time sequence;

dividing the first time sequence at the breakpoint positions to obtain a plurality of time periods and the comment feature distribution of each time period;

comparing the comment feature distribution of each time period with a preset first feature distribution, and determining the false comment high-incidence time periods of the first vehicle type;

and judging, according to the comment posting time of the comment to be tested, whether the comment falls in a false comment high-incidence time period of the first vehicle type, to obtain the first judgment result.

To analyze whether the comment to be tested falls in a false comment high-incidence time period, the method retrieves the relevant comment data from the preset database according to the first vehicle type and first platform of the comment, extracts comment features and forms the first time sequence. Analyzing the comment features of the first time sequence yields the false comment high-incidence time periods of the first vehicle type on the first platform, and the first judgment result follows from the comment posting time of the comment to be tested. Because both the platform and the vehicle type are taken into account when the high-incidence time periods are determined, a more accurate judgment result can be obtained.
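The formation of the first time sequence can be sketched as follows. This is a minimal illustration rather than the patented implementation: the dictionary keys `time`, `text` and `sentiment`, the daily interval, and word counting by whitespace are all assumptions (Chinese text would need a tokenizer, and the sentiment score would come from a separate tool).

```python
from collections import defaultdict
from datetime import datetime


def build_first_time_sequence(comments):
    """Aggregate comments into one feature row per calendar day.

    Each comment is a dict with hypothetical keys:
      'time'      - ISO 8601 timestamp string
      'text'      - comment text
      'sentiment' - score in [-1, 1] produced by any sentiment tool
    """
    buckets = defaultdict(list)
    for comment in comments:
        day = datetime.fromisoformat(comment["time"]).date()
        buckets[day].append(comment)

    sequence = []
    for day in sorted(buckets):
        items = buckets[day]
        n = len(items)
        sequence.append({
            "day": day,
            "count": n,  # comment quantity
            "avg_words": sum(len(c["text"].split()) for c in items) / n,
            "avg_sentiment": sum(c["sentiment"] for c in items) / n,
        })
    return sequence
```

Each row of the returned list is one point of the time sequence; breakpoint analysis would then run on these rows.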

Further, the preset dedicated automobile comment model is obtained as follows:

selecting a large language model as a basic model;

obtaining comment data of all vehicle types on all platforms from a preset database;

analyzing all comment posting times in the preset database to obtain the false comment high-incidence time periods and normal time periods of each vehicle type on each platform;

analyzing all comment posting accounts in the preset database to obtain the false accounts, suspected false accounts and non-false accounts of each vehicle type on each platform;

selecting, from all comments in the false comment high-incidence time periods, the comments of all false accounts to form a false comment data set;

removing, from all comments in the normal time periods, the comments of all false accounts and suspected false accounts to form a real comment data set;

combining the false comment data set and the real comment data set to form a modeling data set;

dividing the modeling data set into a training set, a verification set and a test set according to a preset proportion;

training the basic model by using the modeling data set, and training the model parameters of the basic model by using a cross entropy loss function according to the results on the verification set;

after training is completed, fixing the model parameters of the basic model to form the dedicated automobile comment model.

By analyzing whether each comment in the preset database falls in a false comment high-incidence time period and whether it is false data, the method obtains a false comment data set and a real comment data set, which together form the modeling data set. A large language model is selected as the basic model and trained on the modeling data set, with the model parameters trained using a cross entropy loss function. Once training is complete, the resulting dedicated automobile comment model judges whether the text of the comment to be tested is a false comment and yields the first probability that the comment is false, which improves recognition accuracy.
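The construction of the modeling data set without manual labeling can be sketched as below. The field names, the label strings and the choice to skip comments from accounts with no label are assumptions made for illustration only.

```python
def build_modeling_dataset(comments, high_periods, account_labels):
    """Label comments from period and account information alone.

    comments       - dicts with hypothetical keys 'account', 'time', 'text'
    high_periods   - list of (start, end) pairs, inclusive, marking the
                     false comment high-incidence time periods
    account_labels - account -> 'false' | 'suspected' | 'non_false'
    Returns a list of (text, label) with label 1 = false, 0 = real.
    """
    def in_high_period(t):
        return any(start <= t <= end for start, end in high_periods)

    dataset = []
    for comment in comments:
        label = account_labels.get(comment["account"])  # None if unknown
        if in_high_period(comment["time"]):
            # False comment set: false accounts inside high-incidence periods.
            if label == "false":
                dataset.append((comment["text"], 1))
        elif label == "non_false":
            # Real comment set: normal periods, with false and suspected
            # false accounts removed.
            dataset.append((comment["text"], 0))
    return dataset
```

Comments that fit neither set (for example a non-false account posting inside a high-incidence period) are left out, which keeps ambiguous samples away from training.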

Further, analyzing all comment posting accounts in the preset database to obtain the false accounts, suspected false accounts and non-false accounts of each vehicle type on each platform specifically comprises:

looking up the behavior information of the comment posting account in the preset database;

obtaining a first account result according to the comment quantity in the behavior information of the comment posting account; wherein the first account result is one of false account, suspected false account and non-false account;

obtaining a second account result according to the comment content in the behavior information of the comment posting account; wherein the second account result is one of false account, suspected false account and non-false account;

and judging whether the comment posting account is a false account according to the first account result and the second account result of the comment posting account.

To judge whether a comment posting account is a false account, the method first retrieves the behavior information of the account from the preset database, then analyzes the possibility that the account is false from two aspects of that behavior information, comment quantity and comment content, and judges the account according to both analysis results. Evaluating with both comment quantity and text content reduces the misjudgment rate and improves the accuracy of identifying false accounts.

Further, obtaining the first account result according to the comment quantity in the behavior information of the comment posting account specifically comprises:

acquiring all comment data from the preset database;

dividing all comment times in the preset database into a plurality of time periods according to a preset time interval;

determining the comment quantity of each account in each time period in the preset database, and calculating the account-level comment quantity mean, median and standard deviation for each time period;

obtaining, according to the behavior information of the comment posting account, the comment quantity of the comment posting account in each of the plurality of time periods;

and determining the first account result of the comment posting account according to the comment quantity mean, median and standard deviation of each time period and the comment quantity of the comment posting account in each time period.

Here the possibility that the comment posting account is a false account is analyzed from the comment quantity in its behavior information. All comment data are acquired from the preset database, all comment times are divided into a plurality of time periods, the comment quantity mean, median and standard deviation of each time period are calculated, and the comment quantity of the comment posting account in each time period is obtained; together these give the possibility that the account is a false account. Analyzing comment quantity yields an evaluation result simply and quickly.
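The text does not state how the three statistics are turned into a first account result. The sketch below shows one possible rule, assumed purely for illustration: an account's count in each time period is compared with the larger of the mean and the median, in units of the standard deviation, and the thresholds `suspect_z` and `false_z` are hypothetical.

```python
from statistics import mean, median, pstdev


def period_stats(counts_by_account):
    """Account-level statistics for one time period.

    counts_by_account maps account -> comment count in that period.
    """
    values = list(counts_by_account.values())
    return {"mean": mean(values), "median": median(values), "std": pstdev(values)}


def first_account_result(account_counts, stats_per_period,
                         suspect_z=2.0, false_z=3.0):
    """Classify one account from its per-period comment counts.

    account_counts and stats_per_period are aligned lists, one entry
    per time period. The scoring rule and thresholds are assumptions.
    """
    worst = 0.0
    for count, stats in zip(account_counts, stats_per_period):
        if stats["std"] > 0:
            baseline = max(stats["mean"], stats["median"])
            worst = max(worst, (count - baseline) / stats["std"])
    if worst >= false_z:
        return "false"
    if worst >= suspect_z:
        return "suspected"
    return "non_false"
```

Taking the worst period rather than an average means a single burst of posting is enough to flag an account; whether that is appropriate depends on the platform.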

Further, obtaining the second account result according to the comment content in the behavior information of the comment posting account specifically comprises:

calculating the text similarity among all first comments in the behavior information of the comment posting account;

calculating the text similarity and topic relevance between each reply comment in the behavior information of the comment posting account and its corresponding original post;

and determining the second account result of the comment posting account according to the text similarity among the first comments in the behavior information of the comment posting account, and the text similarity and topic relevance between the reply comments and their corresponding original posts.

Here the possibility that the comment posting account is a false account is analyzed from the comment content in its behavior information. The text similarity among the account's first comments, and the text similarity and topic relevance between its reply comments and the corresponding original posts, are calculated, so the analysis covers both first comments and reply comments. Analyzing the text content of comments improves the accuracy of the evaluation.
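A content-based check of this kind might look like the sketch below. The similarity measure (Jaccard similarity over character bigrams), the use of the same measure as a stand-in for topic relevance, the decision rule and every threshold are assumptions; the text names none of them.

```python
from itertools import combinations


def bigrams(text):
    """Character bigrams with whitespace removed (works without a tokenizer)."""
    text = "".join(text.split())
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard(a, b):
    sa, sb = bigrams(a), bigrams(b)
    return len(sa & sb) / len(sa | sb)


def mean_pairwise_similarity(texts):
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def second_account_result(first_comments, reply_pairs,
                          dup_threshold=0.8, suspect_threshold=0.5,
                          min_relevance=0.05):
    """Classify one account from its comment texts.

    first_comments - list of the account's first-comment texts
    reply_pairs    - list of (reply_text, original_post_text)
    """
    duplication = mean_pairwise_similarity(first_comments)
    if reply_pairs:
        relevance = sum(jaccard(r, p) for r, p in reply_pairs) / len(reply_pairs)
    else:
        relevance = 1.0  # no replies, so nothing off-topic to penalize
    if duplication >= dup_threshold:
        return "false"      # near-identical, template-like comments
    if duplication >= suspect_threshold or relevance < min_relevance:
        return "suspected"  # repetitive, or replies unrelated to their posts
    return "non_false"
```

In practice an embedding-based similarity or a topic model could replace the bigram measure without changing the structure of the check.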

Further, selecting the false comment identification model according to the publisher account information of the comment to be tested specifically comprises:

detecting whether the publisher account information and the publisher account behavior information of the comment to be tested are complete;

when the publisher account information and the publisher account behavior information of the comment to be tested are both complete, selecting the first recognition model as the false comment identification model for the comment to be tested; wherein the first recognition model is trained on the modeling data set and the account data set, and the account data set comprises the comment posting accounts in all comment data in the preset database;

when the publisher account information or the publisher account behavior information of the comment to be tested is missing, selecting the second recognition model as the false comment identification model for the comment to be tested; wherein the second recognition model is trained on the modeling data set.

Because the first recognition model is trained on both the modeling data set and the account data set, selecting it when the publisher account information and publisher account behavior information of the comment to be tested are complete improves recognition accuracy. Because the second recognition model is trained on the modeling data set alone, it can be selected when the publisher account information or publisher account behavior information is missing, which makes model identification more intelligent.
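The selection rule reduces to a completeness check, sketched below. Treating a record as complete when it is non-empty and has no `None` or empty-string fields is an assumption, as are the function and parameter names.

```python
def select_recognition_model(publisher_info, behavior_info,
                             first_model, second_model):
    """Return first_model only when both records are complete.

    publisher_info and behavior_info are dicts (or None when absent).
    """
    def complete(record):
        return bool(record) and all(
            value is not None and value != "" for value in record.values()
        )

    if complete(publisher_info) and complete(behavior_info):
        return first_model   # trained on modeling data set + account data set
    return second_model      # trained on modeling data set only
```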

Further, obtaining the false comment probability of the comment to be tested by using the false comment identification model, according to the comment data, the first judgment result and the first probability of the comment to be tested, specifically comprises:

preprocessing the comment data, the first judgment result and the first probability of the comment to be tested according to the input requirements of the false comment identification model, to form model input data;

inputting the model input data to the false comment identification model;

extracting, in the false comment identification model, features from the model input data to form a model intermediate result;

and predicting from the model intermediate result whether the comment is a false comment, and generating and outputting the false comment probability of the comment to be tested.

In the false comment identification method provided by the invention, the comment data of the comment to be tested are obtained, the account behavior information of its publisher is retrieved from a preset database, and whether the publisher account information or the publisher account behavior information is missing is detected, so that a suitable false comment identification model is selected. The model takes as input the result of judging whether the comment to be tested falls in a false comment high-incidence time period, the probability that the comment content is a false comment, market and product data for the vehicle type being commented on, and information about the publisher account and its comments. It outputs the probability that the comment to be tested is a false comment, and this probability is compared with a preset probability threshold to decide whether the comment is false. The invention can enhance the intelligence of model identification, improve the accuracy of identifying false comments and save labor cost.

Correspondingly, the invention provides a false comment identification device, which comprises: the device comprises an information acquisition module, a judgment module, a probability acquisition module, a selection module, an identification module and a determination module;

The information acquisition module is used for acquiring comment data of comments to be tested; the comment data comprise comment content, publisher account information, platform information, vehicle type information and comment publishing time;

the judging module is used for judging whether the comment to be tested falls in a false comment high-incidence time period according to the platform information, vehicle type information and comment posting time of the comment to be tested, and obtaining a first judgment result;

the probability acquisition module is used for judging the authenticity of the comment content of the comment to be tested by using a preset dedicated automobile comment model, to obtain a first probability that the comment content is a false comment;

The selecting module is used for selecting a false comment identification model according to account information of a publisher of the comment to be detected; wherein the false comment recognition model comprises a first recognition model and a second recognition model;

The identification module is used for utilizing the false comment identification model to obtain false comment probability of the comment to be detected according to comment data of the comment to be detected, a first judgment result and a first probability;

and the determining module is used for determining that the comment to be detected is a false comment when the false comment probability is larger than a preset probability threshold value.

Further, the judging module includes: the device comprises a determining unit, a data acquisition unit, a sequence forming unit, a breakpoint acquisition unit, a segmentation unit, a comparison unit and a judgment unit;

The determining unit is used for determining that the vehicle type corresponding to the comment to be measured is a first vehicle type and the posting platform of the comment to be measured is a first platform according to the platform information and the vehicle type information of the comment to be measured;

The data acquisition unit is used for acquiring all comment data of the first vehicle type on the first platform from a preset database;

the sequence forming unit is used for extracting comment features of a first vehicle type on a first platform and forming a first time sequence according to the release time of each comment; the comment features comprise comment quantity, comment average word number and comment emotion tendencies;

the breakpoint acquisition unit is used for analyzing the first time sequence and acquiring a plurality of breakpoint positions of the first time sequence;

The dividing unit is used for dividing the first time sequence according to the breakpoint positions to obtain a plurality of time periods and comment feature distribution of each time period;

The comparison unit is used for comparing the comment feature distribution of each time period with a preset first feature distribution and determining a false comment high-incidence time period of the first vehicle type;

the judging unit is used for judging whether the comment to be tested falls in a false comment high-incidence time period of the first vehicle type according to the comment posting time of the comment to be tested, and obtaining a first judgment result.

The false comment identification device provided by the invention relies on the cooperation of its modules, and can enhance the intelligence of model identification, improve the accuracy of identifying false comments and save labor cost.

Drawings

FIG. 1 is a flow chart of an embodiment of a method for false comment identification provided by the present invention;

FIG. 2 is a schematic structural diagram of an embodiment of a false comment identification device provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.

The terms used in describing the present invention are to be understood as follows:

"Vehicle model": the name a vehicle manufacturer gives to vehicles of the same type, brand, body form, category and family.

"Lifecycle": the whole process in which a product built by a vehicle manufacturer on a given technology and design is launched, promoted and finally withdrawn from the market. It covers the design and development, production and manufacture, sales promotion and mid-cycle or minor facelift stages of the vehicle model, up to the point where the current generation of the product is replaced by a new generation.

"LLM": a large language model, that is, a powerful natural language processing model trained on a GPT-style architecture. It uses deep learning, in particular the Transformer model, to learn the latent structures and regularities of language through large-scale pre-training and fine-tuning. Secondary development with a small amount of corpus on that basis adapts the model to a specific task or field.

Example 1

Referring to fig. 1, a flow chart of an embodiment of a false comment identifying method provided by the present invention, where the method includes steps 101 to 105, and the steps are specifically as follows:

Step 101: obtain comment data of the comment to be tested; the comment data comprise comment content, publisher account information, platform information, vehicle type information and comment posting time.

Step 102: judge whether the comment to be tested falls in a false comment high-incidence time period according to the platform information, vehicle type information and comment posting time of the comment to be tested, and obtain a first judgment result.

Further, in the first embodiment of the present invention, according to platform information, vehicle type information and comment posting time of a comment to be detected, whether the comment to be detected is in a false comment high-sending time period is judged, and a first judgment result is obtained, which specifically includes:

In the first embodiment of the invention, to analyze whether the comment to be tested falls in a false comment high-incidence time period, the relevant comment data are retrieved from the preset database according to the first vehicle type and first platform of the comment, comment features are extracted and the first time sequence is formed. Analyzing the comment features of the first time sequence yields the false comment high-incidence time periods of the first vehicle type on the first platform, and the first judgment result follows from the comment posting time of the comment to be tested. Because both the platform and the vehicle type are taken into account when the high-incidence time periods are determined, a more accurate judgment result can be obtained.

As an example of the first embodiment of the present invention, after all comment data of the first vehicle type on the first platform have been obtained from the preset database, features of the posted comments such as quantity, average word count and sentiment tendency can be computed from the comment posting times at a preset time interval (for example, daily) to form the first time sequence of comment features. Because the life cycle and publicity of a product affect posting volume and comment activity, the changes they cause must be removed after the first time sequence is formed; the time points at which the structure of the first time sequence changes are then determined as breakpoints. For example, a vector autoregression model and a breakpoint regression model from time series analysis can be used together, so that the time points of abrupt structural change are found automatically while the changes in comment features caused by the automobile's life cycle, publicity and the like are removed, specifically:

Wherein the terms are, in order: the value of the first time sequence at time t; a breakpoint variable, whose value is 0 before time T and 1 afterwards; coefficients, which take different values before and after the breakpoint; a vector of covariates; and the coefficient of the covariate vector.

The model can be fitted by optimizing a cost function to obtain estimated values of parameters and breakpoint positions, wherein one of the cost functions can be written as follows:

where K is a kernel function used to control the weight of each point in time.
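The model formula and the cost function referred to above are not reproduced in this text. One form consistent with the stated definitions is sketched below; every symbol name, the error term \varepsilon_t and the kernel bandwidth h are assumptions rather than the notation of the original.

```latex
% Breakpoint regression with covariates (assumed notation):
%   y_t  : value of the first time sequence at time t
%   D_t  : breakpoint variable, 0 before time T and 1 afterwards
%   X_t  : vector of covariates (e.g. life cycle and publicity effects)
y_t = \beta_0 + \beta_1 D_t + \gamma^{\top} X_t + \varepsilon_t,
\qquad
D_t = \begin{cases} 0, & t < T \\ 1, & t \ge T \end{cases}

% Kernel-weighted least-squares cost, minimized over the parameters
% and the breakpoint position T:
(\hat\beta_0, \hat\beta_1, \hat\gamma, \hat T)
= \arg\min_{\beta_0, \beta_1, \gamma, T}
\sum_{t} K\!\left(\frac{t - T}{h}\right)
\left( y_t - \beta_0 - \beta_1 D_t - \gamma^{\top} X_t \right)^2
```

Under this form the level of the sequence is \beta_0 before the breakpoint and \beta_0 + \beta_1 after it, which matches the statement that the coefficients differ on the two sides of the breakpoint.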

After the breakpoint positions are determined with the model, the first time sequence can be divided into a plurality of time periods, each with its own comment feature distribution. The comment feature distribution of each time period is compared with the normal feature distribution. When the comparison result exceeds a threshold and the quantity, word count or share of extreme sentiment in the comment features is clearly higher than in normally distributed time periods, the time period is considered a false comment high-incidence time period. Comparison methods include, but are not limited to, KL divergence, Wasserstein distance and total variation distance.
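The comparison step can be illustrated with KL divergence over histograms of a single feature, for example daily comment quantity. The bin edges, the add-one smoothing, the threshold value and the use of a mean comparison to express "clearly higher" are all assumptions in this sketch.

```python
import math


def histogram(values, edges):
    """Share of values in each bin [edges[i], edges[i+1]).

    Add-one smoothing keeps every bin non-zero so the divergence is finite.
    """
    counts = [1.0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def is_high_incidence(period_values, normal_values, edges, threshold=0.5):
    """Flag a period whose feature distribution departs from normal upward."""
    p = histogram(period_values, edges)
    q = histogram(normal_values, edges)
    higher = (sum(period_values) / len(period_values)
              > sum(normal_values) / len(normal_values))
    return kl_divergence(p, q) > threshold and higher
```

The same structure applies to average word count or the share of extreme sentiment, and the divergence could be swapped for Wasserstein or total variation distance.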

Step 103: judge the authenticity of the comment content of the comment to be tested by using the preset dedicated automobile comment model, and obtain a first probability that the comment content is a false comment.

Further, in the first embodiment of the present invention, the preset model dedicated to car reviews is specifically:

selecting a large language model as a basic model;

In the first embodiment of the invention, a false comment data set and a real comment data set are obtained by analyzing whether each comment in the preset database falls in a false comment high-incidence time period and whether it is false data, and together they form the modeling data set. A large language model is selected as the basic model and trained on the modeling data set, with the model parameters trained using a cross entropy loss function. Once training is complete, the resulting dedicated automobile comment model judges whether the text of the comment to be tested is a false comment and yields the first probability that the comment is false, which improves recognition accuracy.

As an example of the first embodiment of the present invention, a basic model may be obtained by loading the initial parameters of a large language model such as ChatGPT. A false comment data set and a real comment data set are obtained and combined to form a modeling data set. The real and false comments in the modeling data set are input into the basic model, and the samples that the model misjudges are marked as model misjudgment samples, forming a model misjudgment data set. The model misjudgment data set is weighted and merged with the modeling data set, and the result is divided into a training set, a validation set and a test set in a 6:2:2 ratio. Note that the comment distribution should be kept similar across the training, validation and test sets (i.e., similar proportions of real comments, false comments and misjudged samples). With the cross-entropy loss function, the model parameters are adjusted through backpropagation and an optimization algorithm (such as stochastic gradient descent), and metrics such as accuracy, recall and F1 score are computed along with the change in the loss. The model is then tuned according to the validation results, for example by adjusting the learning rate or the model architecture, so that it better distinguishes real comments from false ones and becomes the dedicated car comment model used to judge the authenticity of car comments.
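The data preparation described above can be sketched as follows. This is an illustration under stated assumptions: the "weighting" of misjudged samples is interpreted as a per-sample loss weight, samples are plain dictionaries with `text` and `label` (1 = false comment), and `predict` stands in for the basic model. None of these names come from the source.

```python
import math
import random
from collections import defaultdict

def mark_misjudged(samples, predict, weight=2.0):
    """Tag each sample with a group and a loss weight; samples the basic
    model gets wrong form the 'misjudged' group (weight is an assumption)."""
    marked = []
    for s in samples:
        wrong = predict(s["text"]) != s["label"]
        group = "misjudged" if wrong else ("false" if s["label"] == 1 else "real")
        marked.append({**s, "group": group, "weight": weight if wrong else 1.0})
    return marked

def stratified_split(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split 6:2:2 per group so all three sets keep similar proportions."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for s in samples:
        by_group[s["group"]].append(s)
    train, val, test = [], [], []
    for group in sorted(by_group):
        items = by_group[group][:]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test

def cross_entropy(p_false, label):
    """Binary cross-entropy for one sample; p_false is the predicted
    probability that the comment is false."""
    p = min(max(p_false, 1e-12), 1 - 1e-12)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def weighted_cross_entropy(samples, probabilities):
    """Weighted mean loss, so misjudged samples count more during training."""
    total_weight = sum(s["weight"] for s in samples)
    return sum(s["weight"] * cross_entropy(p, s["label"])
               for s, p in zip(samples, probabilities)) / total_weight
```

Splitting within each group, rather than over the pooled data, is one simple way to keep the proportions of real, false and misjudged samples similar across the three sets.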

Further, in the first embodiment of the present invention, all comment posting accounts in a preset database are analyzed to obtain the false accounts, suspected false accounts and non-false accounts of each vehicle type on each platform, specifically:

In the first embodiment of the invention, to judge whether a comment posting account is a false account, the behavior information of the account is first obtained from the preset database. The likelihood that the account is a false account is then analyzed from two aspects of that behavior information, the comment quantity (giving the first account result) and the comment content (giving the second account result), and the two results are combined. When the first account result is a false account, or the second account result is a false account, or both results are suspected false accounts, the comment posting account is a false account. When exactly one of the first and second account results is a suspected false account and the other is a non-false account, the comment posting account is a suspected false account. When both results are non-false accounts, the comment posting account is a non-false account. Because the evaluation draws on both the comment quantity and the text content, it can reduce the misjudgment rate and improve the accuracy of identifying false accounts.
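The combination rule for the two account results can be written as a small decision function. The label constants and the function name are illustrative only.

```python
FALSE = "false"
SUSPECTED = "suspected"
NON_FALSE = "non_false"

def combine_account_results(first, second):
    """Combine the comment-quantity result (first) and the
    comment-content result (second) into one account label."""
    if first == FALSE or second == FALSE:
        return FALSE
    if first == SUSPECTED and second == SUSPECTED:
        return FALSE          # two suspicions together count as false
    if first == SUSPECTED or second == SUSPECTED:
        return SUSPECTED      # only one of the two results is suspected
    return NON_FALSE

print(combine_account_results(SUSPECTED, NON_FALSE))
```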

Further, in the first embodiment of the present invention, according to the number of comments in the behavior information of the comment posting account, a first account result is obtained, which specifically includes:

acquiring all comment data from a preset database;

In the first embodiment of the invention, the likelihood that the comment posting account is a false account is analyzed from the comment quantity in its behavior information. All comment data are acquired from the preset database, and the full comment time span is divided into several time periods. For each time period, the mean, median and standard deviation of the comment quantity are calculated, and the comment quantity of the comment posting account in that period is obtained; from these the likelihood that the account is a false account is derived. Analyzing from the viewpoint of comment quantity gives an evaluation result simply and quickly.

As an example of the first embodiment of the present invention, the comment quantity mean $\bar{n}_t$, the comment quantity median $\tilde{n}_t$ and the comment quantity standard deviation $\sigma_t$ of time period t can be obtained through calculation, together with the comment quantity $n_t$ of the comment posting account in time period t. When $\tilde{n}_t + 3\sigma_t < n_t < \bar{n}_t + 3\sigma_t$, the comment posting account is marked as a suspected false account in time period t; when $n_t > \bar{n}_t + 3\sigma_t$, the comment posting account is marked as a false account in time period t. After the labels of the comment posting account in all time periods are obtained, a proportion a% and a proportion b% are set, and the first account result of the comment posting account is determined to be a false account when the account is marked as a false account in a% of the time periods or marked as a suspected false account in b% of the time periods.
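The per-period labelling can be sketched as below. Two points are assumptions rather than statements of the source: the suspected band is taken to lie between median + 3σ and mean + 3σ, and "in a% of the time periods" is read as "in at least a% of them". The function names and the default values of `a` and `b` are hypothetical.

```python
from statistics import mean, median, pstdev

def label_period(account_count, all_counts):
    """Label one account in one time period from comment-count statistics.
    ASSUMPTION: suspected band = (median + 3*std, mean + 3*std]."""
    mu = mean(all_counts)
    med = median(all_counts)
    sigma = pstdev(all_counts)
    if account_count > mu + 3 * sigma:
        return "false"
    if account_count > med + 3 * sigma:
        return "suspected"
    return "non_false"

def first_result_is_false(period_labels, a=0.3, b=0.5):
    """True when the account is labelled false in at least a share `a` of
    periods, or suspected in at least a share `b` (both shares assumed)."""
    n = len(period_labels)
    false_share = period_labels.count("false") / n
    suspected_share = period_labels.count("suspected") / n
    return false_share >= a or suspected_share >= b

# Per-account comment counts in one period: mean 2, median 1, std 3
counts = [1, 1, 1, 1, 1, 1, 1, 1, 1, 11]
print(label_period(12, counts), label_period(11, counts), label_period(5, counts))
```

The passage only defines when the first account result is a false account, so the sketch returns a boolean and leaves the other outcomes open.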

Further, in the first embodiment of the present invention, according to the comment content in the behavior information of the comment publishing account, a second account result is obtained, which specifically includes:

In the first embodiment of the invention, the likelihood that the comment posting account is a false account is analyzed from the comment content in its behavior information. The text similarity of the account's first comments is calculated, as are the text similarity and topic relevance between its reply comments and the corresponding original posts. Analyzing first comments and reply comments separately in this way gives the likelihood that the account is a false account, and working from the text content of the comments can improve evaluation accuracy.

As an example of the first embodiment of the present invention, all first comments in the behavior information of the comment posting account are input to the LLM to obtain the text similarity among the account's first comments. All reply comments in the behavior information of the comment posting account, together with the content of the corresponding original posts, are input to the LLM to obtain the text similarity and the topic relevance between the reply comments and their original posts. The same quantities are analyzed for all accounts in the preset database, giving the median and standard deviation of the first-comment text similarity, the median and standard deviation of the text similarity between reply comments and their original posts, and the median and standard deviation of the topic relevance between reply comments and their original posts. The account's three values are each compared against the corresponding median and standard deviation, and if any one of the three conditions holds, the second account result of the comment posting account is determined to be a false account.

Step 104: Select a false comment identification model according to the publisher account information of the comment to be tested, wherein the false comment identification model includes a first identification model and a second identification model.

Further, in the first embodiment of the present invention, a false comment identification model is selected according to account information of a publisher of a comment to be measured, specifically:

In the first embodiment of the present invention, an account data set may be formed by collecting the comment posting accounts in all comment data in the preset database. Any classification model architecture (such as a neural network or logistic regression) can serve as the basic model of the false comment identification model. Training the basic model on the modeling data set and the account data set yields a first identification model, which predicts false comments using the publisher's account information; training the basic model on the modeling data set alone yields a second identification model, which predicts false comments without the publisher's account information. When both the publisher account information and the publisher account behavior information of the comment to be tested are complete, the first identification model is selected, which improves identification accuracy. When either is missing, the second identification model is selected. This gives better results than treating the account information as missing values and using the first identification model, because the second identification model partially compensates for the missing account information by giving more weight to the other information.
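The selection logic can be sketched as follows. The completeness check is an assumption (a record counts as complete when it is non-empty and has no `None` fields), and the models are passed in as opaque objects; the field names in the example are invented for illustration.

```python
def is_complete(info):
    """ASSUMPTION: a record is complete when it exists, is non-empty,
    and none of its fields is None."""
    return bool(info) and all(value is not None for value in info.values())

def select_model(account_info, behavior_info, first_model, second_model):
    """Use the account-aware first model only when both the publisher
    account information and its behavior information are complete."""
    if is_complete(account_info) and is_complete(behavior_info):
        return first_model
    return second_model

# Hypothetical records; the string values stand in for trained models.
account = {"account_id": "u123", "registered_days": 400}
behavior = {"comment_count": 35, "reply_count": None}   # one field missing
print(select_model(account, behavior, "first_model", "second_model"))
```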

Step 105: Obtain the false comment probability of the comment to be tested from the comment data of the comment to be tested, the first judgment result and the first probability by using the false comment identification model.

Further, in the first embodiment of the present invention, obtaining the false comment probability of the comment to be tested from the comment data of the comment to be tested, the first judgment result and the first probability by using the false comment identification model is specifically:

inputting the model input data to the false comment recognition model;

Step 106: When the false comment probability is greater than a preset probability threshold, determine that the comment to be tested is a false comment.

In summary, the first embodiment of the invention provides a false comment identification method. The method obtains the relevant comment data of the comment to be tested, obtains the account behavior information of its publisher from a preset database, and detects whether the publisher account information and the publisher account behavior information are missing, so that a suitable false comment identification model is selected. The inputs are the judgment of whether the comment falls in a false comment high-incidence time period, the probability that its comment content is a false comment, market and product data related to the commented vehicle type, and information about the publisher account and the comment. The false comment identification model produces the probability that the comment to be tested is a false comment, and this probability is compared with a preset probability threshold to decide whether the comment is a false comment. The invention can make model identification more intelligent, improve the accuracy of identifying false comments and save labor cost.

Example 2

Referring to Fig. 2, which is a schematic structural diagram of an embodiment of the false comment identification device provided by the present invention, the device includes an information acquisition module 201, a judging module 202, a probability acquisition module 203, a selecting module 204, an identification module 205 and a determining module 206;

the information acquisition module 201 is used for acquiring comment data of comments to be tested; the comment data comprise comment content, publisher account information, platform information, vehicle type information and comment publishing time;

The judging module 202 is configured to judge whether the comment to be tested is in a false comment high-incidence time period according to the platform information, vehicle type information and comment posting time of the comment to be tested, so as to obtain a first judgment result;

the probability acquisition module 203 is configured to judge the authenticity of the comment content of the comment to be tested by using a preset dedicated car comment model, so as to obtain a first probability that the comment content is a false comment;

The selecting module 204 is configured to select a false comment identification model according to account information of a publisher of the comment to be tested; wherein the false comment recognition model comprises a first recognition model and a second recognition model;

The recognition module 205 is configured to obtain a false comment probability of the comment to be measured according to comment data of the comment to be measured, a first judgment result and a first probability by using the false comment recognition model;

the determining module 206 is configured to determine that the comment to be tested is a false comment when the probability of the false comment is greater than a preset probability threshold.

Further, in the second embodiment of the present invention, the judging module 202 includes: a determining unit, a data acquisition unit, a sequence forming unit, a breakpoint acquisition unit, a segmentation unit, a comparison unit and a judgment unit;

The determining unit is used for determining that the vehicle type corresponding to the comment to be measured is a first vehicle type and the publishing platform of the comment to be measured is a first platform according to the platform information and the vehicle type information of the comment to be measured;

The segmentation unit is used for segmenting the first time sequence according to the breakpoint positions to obtain a plurality of time periods and comment feature distribution of each time period;

Further, in the second embodiment of the present invention, the preset model dedicated to car reviews is specifically:

selecting a large language model as a basic model;

Further, in the second embodiment of the present invention, all comment posting accounts in the preset database are analyzed to obtain the false accounts, suspected false accounts and non-false accounts of each vehicle type on each platform, specifically:

Further, in the second embodiment of the present invention, the first account result is obtained according to the number of comments in the behavior information of the comment publishing account, which specifically is:

acquiring all comment data from a preset database;

Further, in the second embodiment of the present invention, the second account result is obtained according to the comment content in the behavior information of the comment publishing account, which specifically is:

Further, in a second embodiment of the present invention, the selecting module includes: the device comprises a detection unit, a first selection unit and a second selection unit;

The detection unit is used for detecting whether the account information of the publisher and the account behavior information of the publisher of the comment to be detected are complete or not;

The first selecting unit is used for selecting the first identification model as a false comment identification model of the comment to be detected when the account information of the publisher and the account behavior information of the publisher of the comment to be detected are complete; the first recognition model is formed by training the modeling data set and the account data set; the account data set comprises comment posting accounts in all comment data in a preset database;

The second selecting unit is used for selecting the second identification model as a false comment identification model of the comment to be detected when the account information of the publisher of the comment to be detected or the account behavior information of the publisher is missing; wherein the second recognition model is trained from the modeling data set.

Further, in the second embodiment of the present invention, the identification module 205 includes: the device comprises a preprocessing unit, an input unit, a feature extraction unit and a prediction unit;

the preprocessing unit is used for preprocessing the comment data of the comment to be detected, the first judging result and the first probability according to the input requirement of the false comment recognition model to form model input data;

the input unit is used for inputting the model input data into the false comment recognition model;

the feature extraction unit is used for extracting features of the model input data in the false comment recognition model to form a model intermediate result;

The prediction unit is used for predicting whether the comment is a false comment or not by using the model intermediate result, and generating and outputting the false comment probability of the comment to be detected.
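The four units of the identification module can be pictured as a short pipeline. The sketch below is only an illustration: a logistic scorer with hand-picked weights stands in for the trained false comment identification model, only two of the inputs named in the text are used, and every class, field and parameter name is hypothetical.

```python
import math

class IdentificationModule:
    """Preprocess -> input -> feature extraction -> prediction.
    ASSUMPTION: a logistic scorer replaces the trained model."""

    def __init__(self, weights, bias):
        self.weights = weights   # feature name -> weight (hypothetical values)
        self.bias = bias

    def preprocess(self, comment_data, first_judgment, first_probability):
        """Preprocessing unit: map raw inputs to named numeric fields.
        comment_data is accepted but unused in this simplified sketch."""
        return {
            "in_high_incidence_period": 1.0 if first_judgment else 0.0,
            "content_false_probability": float(first_probability),
        }

    def extract_features(self, model_input):
        """Feature extraction unit: fixed-order vector (the intermediate result)."""
        return [model_input[name] for name in sorted(self.weights)]

    def predict(self, features):
        """Prediction unit: probability that the comment is false."""
        ordered = [self.weights[name] for name in sorted(self.weights)]
        z = self.bias + sum(w * x for w, x in zip(ordered, features))
        return 1.0 / (1.0 + math.exp(-z))

    def run(self, comment_data, first_judgment, first_probability):
        """Input unit: feed the preprocessed data through the model."""
        model_input = self.preprocess(comment_data, first_judgment, first_probability)
        return self.predict(self.extract_features(model_input))

module = IdentificationModule(
    weights={"in_high_incidence_period": 2.0, "content_false_probability": 3.0},
    bias=-2.5,
)
probability = module.run({"content": "Best car ever!"}, True, 0.9)
is_false_comment = probability > 0.5   # determining module: compare with threshold
print(is_false_comment)
```

The final comparison mirrors the determining module 206, with 0.5 used as a placeholder for the preset probability threshold.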

In summary, the second embodiment of the invention provides a false comment identification device. Through the combination of its modules, the device obtains the relevant comment data of the comment to be tested, obtains the account behavior information of its publisher from a preset database, analyzes the comment quantity and the comment content in the publisher account behavior information separately, identifies whether the comment to be tested falls in a false comment high-incidence time period, and detects whether the publisher account information and the publisher account behavior information are missing, so that a suitable false comment identification model is selected. The false comment identification model produces the probability that the comment to be tested is a false comment, and this probability is compared with a preset probability threshold to decide whether the comment is a false comment. The invention can make model identification more intelligent, improve the accuracy of identifying false comments and save labor cost.

The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.

Claims

1. A method of false comment identification, comprising:

2. The false comment identification method according to claim 1, wherein judging whether the comment to be tested is in a false comment high-incidence time period according to the platform information, vehicle type information and comment posting time of the comment to be tested, so as to obtain a first judgment result, is specifically:

3. The false comment identification method according to claim 1, wherein the preset model specific to the car comment specifically includes:

selecting a large language model as a basic model;

4. The false comment identification method according to claim 3, wherein the analyzing all comment posting accounts in the preset database obtains false accounts, suspected false accounts and non-false accounts of each vehicle type on each platform, specifically:

5. The false comment identification method according to claim 4, wherein obtaining a first account result according to the number of comments in the behavior information of the comment posting account is specifically:

acquiring all comment data from a preset database;

6. The false comment identification method according to claim 4, wherein obtaining a second account result according to the comment content in the behavior information of the comment posting account is specifically:

7. The false comment identification method according to claim 3, wherein the false comment identification model is selected according to account information of a publisher of the comment to be tested, specifically:

8. The false comment identification method according to claim 1, wherein the false comment identification model is used to obtain the false comment probability of the comment to be measured according to comment data of the comment to be measured, the first judgment result and the first probability, and specifically is that:

inputting the model input data to the false comment recognition model;

9. A false comment identifying apparatus, characterized by comprising: the device comprises an information acquisition module, a judgment module, a probability acquisition module, a selection module, an identification module and a determination module;

10. The false comment identification device according to claim 9, wherein the judgment module includes: a determining unit, a data acquisition unit, a sequence forming unit, a breakpoint acquisition unit, a segmentation unit, a comparison unit and a judgment unit;