CN107577759B

CN107577759B - Automatic recommendation method for user comments

Info

Publication number: CN107577759B
Application number: CN201710779830.2A
Authority: CN
Inventors: 李斌; 刘克礼; 万赛罗; 董露露
Original assignee: Anhui Open University
Current assignee: Anhui Open University
Priority date: 2017-09-01
Filing date: 2017-09-01
Publication date: 2021-07-30
Anticipated expiration: 2037-09-01
Also published as: CN107577759A

Abstract

The invention discloses an automatic recommendation method for user comments, comprising the following steps. Step 1: analyze the global usefulness of the comments according to comment content and reviewer behavior, and filter out a portion of the useless comments. Step 2: mine comment-related information from the comments that remain after the filtering in step 1. Step 3: capture user personalized information and mine the user's personalized intentions and preferences. Step 4: on the basis of steps 2 and 3, match the user's intentions and preferences against the corresponding comments and select the comments that meet the user's needs and preferences. Step 5: rank the comments recommended to the user. The method thereby recommends useful comments to the user according to the user's personalized information.

Description

Automatic recommendation method for user comments

Technical Field

The invention relates to the field of network comment screening recommendation, in particular to an automatic user comment recommendation method.

Background

A user comment is a distinctive form of language for expressing people's opinions and an important medium through which people understand the world and interact; for example, people often rely on comments to learn about objective things indirectly and to guide their own judgment and behavior. With the development of the internet, user comments have grown in number and variety and are transmitted and shared ever faster. Through subjective texts on the network, people can exchange and confront ideas efficiently and can quickly and conveniently come to understand the objective world through the knowledge of others. However, the open network lets anyone write and anyone read, and this has produced a large volume of low-quality comments. Because manual or automatic identification, organization and supervision are lacking, it is difficult for people to screen out low-quality, irrelevant and even harmful comments (such as inflammatory ones) while obtaining high-quality user comments that meet their own knowledge-acquisition intentions. At present, research at home and abroad on recommendation technology for user comments is not deep. In particular, automatic evaluation of comment value based on subjective features combined with query-intention features, and the identification, matching, organization and recommendation of related comments, remain unexplored.

Disclosure of Invention

In order to solve the technical problem, the invention provides an automatic recommendation method for user comments, so as to achieve the purpose of recommending the obtained useful comments to the user according to the personalized information of the user.

In order to achieve the purpose, the technical scheme of the invention is as follows:

A method for automatically recommending user comments comprises the following steps:

step 1: analyzing the global usefulness of the comments according to comment content and reviewer behavior, and filtering out a portion of the useless comments;

step 2: mining the comment-related information of the comments filtered in step 1;

step 3: capturing user personalized information, and mining the user's personalized intentions and preferences;

step 4: on the basis of step 2 and step 3, matching the user's intentions and preferences with the corresponding comments, and selecting the comments that meet the user's needs and preferences;

step 5: ranking the comments recommended to the user.

Preferably, in step 1 useless comments are identified through the interaction between reviewers and comments. Specifically: a model is built from the characteristics of the comments using a machine learning method and trained on a sufficiently large corpus to identify useless comments; the authors of those useless comments are then found; more useless comments are found through those authors; further authors are found through the newly found useless comments; and the process iterates until the useless comments have been filtered comprehensively and accurately.

Preferably, the comments filtered in step 2 are divided into comments on commodities and comments on events, and mining the comment-related information comprises extracting the commodity attributes in the comments and extracting the events in the comments together with their related attributes.

Preferably, the commodity attributes in the comments are extracted as follows: first, a comment is selected and the commodity category it targets is identified; second, several other commodities of the same category are found; third, descriptive texts about these commodities are mined; fourth, a multi-document keyword extraction method is used to extract from these texts the objects and attributes of the commodities that are commonly discussed, and the extracted objects and attributes form a keyword set; finally, returning to the comment, the words that appear both in the comment and in the keyword set are found by simple matching and taken as the objects and attributes discussed in the comment.

Preferably, the events in the comments and their related attributes are extracted as follows: first, the original comments undergo text preprocessing; second, trigger words are extracted from a training set as original trigger words, additional trigger words about expressing opinions and attitudes are collected manually, and the trigger words are expanded with a synonym dictionary; third, candidate events are extracted by combining the preprocessing result with the expanded trigger words, and their candidate categories are obtained to form training examples; fourth, the candidate events are described from different angles using lexical, contextual and dictionary features, and a classifier performs binary classification on them to obtain the category of each candidate event; finally, each sentence that has been judged to be an event and contains a trigger word is examined, all its entities, time points and the like are taken as candidate elements, and the class labels are given by the template corresponding to the event category, so that event element identification becomes a multi-class classification problem whose solution yields the event elements and their roles.

Preferably, in step 3 the user personalized information is captured by collecting it with heuristic information and rules. The personalized information includes the user's needs and preferences regarding commodities and their attributes and the user's needs and preferences regarding comments. Information that helps in collecting the personalized information includes: comments posted by the user, the user's evaluations of other people's comments, the history of purchased commodities, personal information provided by the user and other personal information, and the user's browsing behavior.

Preferably, the mining of user personalized intentions and preferences in step 3 comprises processing and organizing the personalized information, and adaptively tracking and learning the user personalized information. The personalized information is processed and organized by designing the user's needs and preferences as a data structure whose data comprise: a user ID as unique identifier, <interest point: interest degree> pairs, comments written by the user and comments the user considers helpful; the user personalized information is obtained from these data by machine learning and data mining. The adaptive tracking and learning is used to update the user's personalized information both periodically and in real time.

Preferably, matching the user's intentions and preferences with the corresponding comments in step 4 comprises matching according to comment content and matching according to the relationship network of reviewers and readers. Matching according to comment content uses a machine learning method to build a mapping model between the user's personalized data and recommended comments, so that the system finds the comments that match the user, and these are the comments recommended to the user. Matching according to the relationship network of reviewers and readers builds the network relationship by a machine learning method, so that the degree of matching between a user and a comment can be evaluated from the comment's reviewer.

Preferably, in step 5 the selected recommended comments are ranked with the RankSvm ranking algorithm, and the features involved in the ranking include: the comment quality score obtained in the global usefulness analysis, features used in matching the user's intentions with the corresponding comments, and the authority of the comments.

The invention has the following advantages:

(1) The invention detects low-quality comments in depth: it analyzes the global usefulness of the comments and identifies and removes useless comments by an evaluation method based on the interaction between reviewers and comments, thereby filtering useless comments effectively.

(2) The invention extracts the attributes of the useful comments and provides effective support for recommending the comments for the user.

(3) The invention mines the user's personalized information in depth, analyzes the user's needs and preferences, and accurately provides valuable comments to the user.

(4) The method matches the user intention preference with the corresponding comments, and then sorts the matched comments to obtain high-quality personalized comment recommendation.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

FIG. 1 is a flowchart of a method for automatically recommending user comments, which is disclosed in an embodiment of the present invention;

FIG. 2 is a flow chart of a method for evaluating interaction between a reviewer and a review as disclosed in embodiments of the present invention;

FIG. 3 is a diagram of <interest point: interest degree> mining;

FIG. 4 is a flowchart of object-centric event extraction in comments disclosed in an embodiment of the invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

The invention provides an automatic recommendation method for user comments. Its working principle is to filter out useless comments, extract the attributes of the useful comments and combine them with the user's personalized information, so that high-quality personalized comments are accurately recommended to the user.

The present invention will be described in further detail with reference to examples and specific embodiments.

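As shown in FIGS. 1 to 4, an automatic recommendation method for user comments includes:

step 1: analyzing the global usefulness of the comments according to comment content and reviewer behavior, and filtering out a portion of the useless comments;

step 2: mining the comment-related information of the comments filtered in step 1;

step 3: capturing user personalized information, and mining the user's personalized intentions and preferences;

step 4: on the basis of step 2 and step 3, matching the user's intentions and preferences with the corresponding comments, and selecting the comments that meet the user's needs and preferences;

step 5: ranking the comments recommended to the user.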

In step 1, useless comments are filtered by an evaluation method based on the interaction between reviewers and comments. The main idea is as follows: the authors of known useless comments are found, more useless comments are found through those authors, further authors are found through the newly found useless comments, and the iteration continues until the useless comments have been filtered comprehensively and accurately.

Specifically, we analyze the characteristics of the reviewers who produce useless comments. In line with their useless comments, such reviewers generally:

(A) tend to post the same text (or very similar text) in different places;

(B) write in a poor, careless style (corresponding to low-quality comments);

(C) as an individual (or a class of people), post a considerable number of comments that are extreme and run against the mainstream opinion trend (e.g., cheaters and makers of false comments).

A reviewer and the comments he or she has written are represented as r = <author, reviews>. The method for evaluating comment usefulness in conjunction with reviewers is shown in FIG. 2. First, each reviewer is associated with his or her comments to form a set S whose elements have the form <author, reviews>. In this set, each reviewer corresponds to one element, and the reviewer's comments are attached to that element. Some of the attached comments have already been identified as useless. The elements of S are then clustered. The clustering features come mainly from the comment set of each element and include the content, style and repetitiveness of the comments and whether they have been marked as useless, with the last feature carrying a greater weight. Clustering the elements of S amounts to clustering reviewers by their comments: reviewers with similar comments are grouped together, which yields several reviewer categories. Next, each reviewer category is judged as to whether it is a category of "useless comment producers", based mainly on the number of comments in the category that are marked as useless and on certain characteristics of those comments. For a category judged to be useless comment producers, the comments least likely to be useless are identified and excluded; for a category judged not to be useless comment producers, the comments most likely to be useless are identified. This forms a new set of comments judged to be useless, and these labels are added to the original corpus annotation. The method then returns to the start and repeats. The whole process is iterative and stops when the result becomes stable or a preset number of iterations is reached.
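The iteration described above can be illustrated with a deliberately simplified Python sketch. Several things here are assumptions for illustration and not part of the disclosure: the clustering step is omitted (each reviewer is treated as its own group), each comment carries a hypothetical content-based uselessness `score` in [0, 1], and the thresholds `producer_ratio`, `low_cut` and `high_cut` are invented parameters. Reviewers with many flagged comments are treated leniently as "producers" (only their least suspicious comments are spared), while other reviewers have only their most suspicious comments flagged.

```python
from collections import defaultdict

def filter_useless(reviews, seed_flags, producer_ratio=0.5,
                   low_cut=0.3, high_cut=0.8, max_iters=10):
    """reviews: list of (review_id, author, score), where score is a
    hypothetical uselessness score; seed_flags: ids already marked useless.
    Returns the set of review ids judged useless after iterating."""
    by_author = defaultdict(list)
    for rid, author, score in reviews:
        by_author[author].append((rid, score))

    flagged = set(seed_flags)
    for _ in range(max_iters):
        new_flagged = set(flagged)
        for author, items in by_author.items():
            ratio = sum(rid in flagged for rid, _score in items) / len(items)
            # Suspected producers get the lenient cut-off, others the strict one.
            cut = low_cut if ratio >= producer_ratio else high_cut
            new_flagged.update(rid for rid, score in items if score >= cut)
        if new_flagged == flagged:  # result is stable, stop iterating
            break
        flagged = new_flagged
    return flagged

reviews = [
    ("r1", "spammer", 0.90), ("r2", "spammer", 0.85),
    ("r3", "spammer", 0.40), ("r4", "spammer", 0.10),
    ("r5", "alice", 0.40), ("r6", "alice", 0.20),
]
print(sorted(filter_useless(reviews, seed_flags={"r1"})))
```

In this toy run, one seed label leads first to a second comment by the same author, then to that author being treated as a producer, which in turn exposes a third comment; the author's least suspicious comment and the other reviewer's comments are kept.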

Step 2 mines the comment-related information, namely extracting the commodity attributes in the comments and extracting the events in the comments together with their related attributes. The commodity attributes in a comment indicate the subject matter the comment discusses and are shallow information; they are identified as follows:

(A) for a certain comment, firstly identifying a commodity category for which the comment aims;

(B) finding a plurality of commodities in other same categories according to the commodity category;

(C) descriptive texts about these commodities are mined; such texts often appear on shopping websites. Let the set of mined texts be D = {d₁, d₂, …, d_n}.

(D) A multi-document keyword extraction technique is used to extract from these texts the objects and attributes of the commodities that are commonly discussed. Let the extracted keyword set be W = {w₁, w₂, …, w_m}.

(E) Returning to the comment, the words that appear both in the comment and in the keyword set are found by simple matching and taken as the objects and attributes discussed in the comment. Their weights are either Boolean or calculated by TF-IDF:

W(i) = TF_i * log(N / DF_i)

where W(i) is the weight of attribute i in the comment; TF_i is the number of times attribute i appears in the comment; DF_i is the document frequency of attribute i in the document set D; and N is the number of documents in D.
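As a minimal sketch of step (E), the weight W(i) = TF_i * log(N / DF_i) can be computed as below. The function name `attribute_weights`, the token-list input format and the choice of the natural logarithm are assumptions made for illustration; the text does not specify the logarithm base or how the comment is tokenized.

```python
import math

def attribute_weights(review_tokens, keyword_set, documents):
    """Weight every keyword that also occurs in the review.

    review_tokens: list of words in the comment
    keyword_set:   the keyword set W extracted from the document set D
    documents:     the document set D, each document given as a set of words
    """
    n_docs = len(documents)
    weights = {}
    for word in set(review_tokens) & set(keyword_set):
        tf = review_tokens.count(word)                    # TF_i in the comment
        df = sum(1 for doc in documents if word in doc)   # DF_i in D
        if df == 0:
            continue  # keyword absent from D: weight undefined, skip it
        weights[word] = tf * math.log(n_docs / df)
    return weights

documents = [{"battery", "screen"}, {"battery", "camera"},
             {"screen", "price"}, {"battery"}]
keywords = {"battery", "screen", "camera", "price"}
review = ["battery", "lasts", "long", "battery", "screen"]
print(attribute_weights(review, keywords, documents))
```

Here "battery" occurs twice in the comment and in three of the four documents, so its weight is 2 * log(4/3); "screen" occurs once and in two documents, giving log(2).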

The extraction of events and related attributes in comments can be decomposed into the following two tasks:

(A) event trigger word and event category identification

(B) Identification of event elements

Here, the event trigger word refers to a word that causes an event to occur, and is an important feature for determining the event category; the event element is a participant of an event, is a core part forming the event and constructs a frame of the whole event together with an event trigger word; event trigger words and event elements determine the category and subcategory of an event.

Comment text differs from general news event text in several ways, chiefly: 1) more emotional words appear in the text; 2) the reviewer's attitudes are expressed in the event; and 3) events that express opinions and emotions need special attention. Accordingly, the invention uses a machine learning algorithm to solve the two tasks, with the following specific steps. First, the original comments undergo text preprocessing. Second, trigger words are extracted from a training set as original trigger words, additional trigger words about expressing opinions and attitudes are collected manually, and the trigger words are expanded with a synonym dictionary. Third, candidate events are extracted by combining the preprocessing result with the expanded trigger words, and their candidate categories are obtained to form training examples. Fourth, the candidate events are described from different angles using lexical, contextual and dictionary features, and a classifier performs binary classification on them to obtain the category of each candidate event. Finally, each sentence that has been judged to be an event and contains a trigger word is examined, all its entities, time points and the like are taken as candidate elements, and the class labels are given by the template corresponding to the event category, so that event element identification becomes a multi-class classification problem whose solution yields the event elements and their roles.
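The trigger-word expansion and candidate-event extraction steps can be sketched as follows. This is only an illustration under stated assumptions: the synonym dictionary is a plain Python dict, preprocessing is reduced to lowercasing and whitespace splitting, and the function names are hypothetical. The subsequent classification of candidates and the identification of event elements are not implemented here.

```python
def expand_triggers(seed_triggers, synonym_dict):
    """Expand the original trigger words with their dictionary synonyms."""
    expanded = set(seed_triggers)
    for trigger in seed_triggers:
        expanded.update(synonym_dict.get(trigger, []))
    return expanded

def candidate_events(sentences, triggers):
    """Return (sentence_index, trigger) pairs for sentences containing a trigger."""
    candidates = []
    for index, sentence in enumerate(sentences):
        for token in sentence.lower().split():  # stand-in for real preprocessing
            if token in triggers:
                candidates.append((index, token))
    return candidates

seeds = {"bought", "think"}
synonyms = {"bought": ["purchased"], "think": ["believe", "feel"]}
sentences = [
    "I purchased this phone last week",
    "The screen is bright",
    "I feel the battery is weak",
]
triggers = expand_triggers(seeds, synonyms)
print(candidate_events(sentences, triggers))
```

Each returned pair would then be turned into a feature vector (lexical, contextual and dictionary features) and passed to the classifier described above.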

In step 3, the user personalized information is captured by collecting it with heuristic information and rules. The personalized information covers two aspects: the user's needs and preferences regarding commodities and their attributes, and the user's needs and preferences regarding comments. The two kinds of information serve personalized comment recommendation indirectly and directly, respectively, and because they have much in common they are collected together. From a study of human cognitive patterns and of the web pages on which comments appear, the following information was found to help in collecting personalized information:

Information that directly expresses needs and preferences:

(A) Comments posted by the user. Once low-quality comments have been filtered out, this type of text is the most accurate information directly reflecting the user's interests and preferences regarding comments.

(B) The user's evaluations of other people's comments. On some shopping sites a comment carries a statement such as "59 of 61 people found this comment useful", and at the bottom of the comment there are two buttons with which a user can rate its usefulness. In this example, 61 users rated the comment and 59 of them considered it to be what they needed, so the comment can be taken to directly reflect the needs and preferences of those 59 users.

Information that indirectly expresses needs and preferences:

(A) History of purchased commodities. The comments addressed by the invention are mainly comments about commodities, so the user's purchase history is also very important personalized information. With data mining techniques, the characteristics of a purchased commodity relative to similar commodities, and the characteristics it has in common with them, can be discovered; this information reflects the user's needs and preferences.

(B) Personal information provided by the user and other personal information. Such information includes gender, age, education level and place of residence (or the geographic location derived from the IP address used for access). It helps in understanding the background of the user's interests, and statistical information for specific domains can be used to cluster or classify users and so help judge their potential preferences and intentions.

(C) User browsing behavior. This information is obtained by monitoring the user's behavior on Web pages, for example by collecting the time the user stays on a page, the length of the document, the URLs the user visits and the history of URL paths, forming a log file, and deriving the user's characteristic data from an analysis of that log file. Research shows that a Web access log covering a certain period contains the user's stable interests. This approach is transparent to the user, but collecting the user data often takes a long time.

Mining of user personalized intentions and preferences is divided into processing and organizing the personalized information, and adaptively tracking and learning the user personalized information.

After the various kinds of personalized information above have been collected, the next task is to process the information and mine the user's needs and preferences.

Firstly, the invention designs a data structure expressing the user requirement preference, and stores the data structure into a database:

Here, the "user ID" uniquely identifies a user. In "<interest point: interest degree>", the interest point generally denotes a commodity or an attribute or feature of a commodity, and the interest degree denotes how interested the user is in, or how much the user needs, that interest point. The "comments written by the user" and the "comments the user considers helpful" are the information described above that directly expresses needs and preferences. In this data structure, the user's "<interest point: interest degree>" pairs must be learned and mined from the various kinds of personalized information described in the previous section by means of machine learning, data mining and the like. The comments written by the user and the comments the user considers helpful are regarded as rich in information: they express not only the user's interest in commodity attributes but also the user's preferences regarding comment style and manner of description. The original texts are therefore retained and used directly in later stages, so that as much information as possible is preserved.
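One possible in-memory shape for this record is sketched below as a Python dataclass. The four fields follow the items named in the text (user ID, <interest point: interest degree> pairs, comments written by the user, comments the user considers helpful); the class name, field names and types are assumptions for illustration, and the database layout used by the invention is not specified in this text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UserPreference:
    user_id: str                                                # unique identifier
    interests: Dict[str, float] = field(default_factory=dict)   # interest point -> interest degree
    written_reviews: List[str] = field(default_factory=list)    # original texts, kept verbatim
    helpful_reviews: List[str] = field(default_factory=list)    # comments the user rated helpful

profile = UserPreference(user_id="u001")
profile.interests["camera: battery life"] = 0.8
profile.written_reviews.append("Battery easily lasts a full day of shooting.")
print(profile)
```

Keeping the raw comment texts alongside the mined interest degrees mirrors the stated design choice of retaining the original texts so that style and description preferences are not lost.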

Generally speaking, users' interests change over time, and users' needs sometimes arise suddenly. In the invention, the user personalized data are therefore updated in two ways:

Periodic updating: the user personalized information is collected periodically (for example, every two days) and added to the original data set, and the user data are recalculated. In the calculation, the older a piece of original information is, the smaller its weight, and the interest degree of the corresponding interest point decreases accordingly. The authority of historical information changes over time:

where W(inf) is the authority of the current information inf; ΔT_inf is the interval from the time the information inf was collected to the current time; and α and β are parameters that are tuned during learning and testing.
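The formula for W(inf) is not reproduced in this text, so the following Python sketch does not implement it. It only illustrates the stated behavior (older information carries less weight) using an exponential decay, W = α · exp(−β · ΔT), which is purely an assumed functional form chosen for illustration; the function names and default parameter values are likewise hypothetical.

```python
import math

def authority(delta_t_days, alpha=1.0, beta=0.1):
    """Assumed decay form (not the patent's formula): alpha * exp(-beta * delta_t)."""
    return alpha * math.exp(-beta * delta_t_days)

def decayed_interest(observations, now, alpha=1.0, beta=0.1):
    """observations: list of (interest_point, day_collected) pairs.
    Sums the time-decayed weight of every observation per interest point."""
    scores = {}
    for point, day in observations:
        weight = authority(now - day, alpha, beta)
        scores[point] = scores.get(point, 0.0) + weight
    return scores

history = [("camera", 0), ("camera", 1), ("furniture", 9)]
print(decayed_interest(history, now=10))
```

With these assumed parameters, a single recent observation about furniture outweighs two older observations about cameras, which is the qualitative effect the periodic update is meant to produce.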

In addition, the invention uses rules to update the weights of certain interest points. For example, after a user purchases a computer, the user's interest in computers may decline while the user's interest in computer peripherals may increase over time.

Real-time updating: a user's needs can be sudden and can jump from one subject to another, especially when the user is searching for and purchasing goods. For example, the user may look up cameras on the first day but look up information about furniture on the second day. For such cases, the invention adopts the following solution:

(A) The collected personalized information is classified by subject, and each class yields an interest point (this has already been done in the preceding processing).

(B) The category of the user's current interest is identified; this category is one of the categories from the previous step.

(C) The interest degree of the interest point corresponding to the current category is raised accordingly, and the interest degrees of the interest points corresponding to the remaining categories are lowered.

Here, the change of the personalized data is only for the current needs and interests, so the above change of the personalized data does not need to be written into the database.
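A minimal sketch of this real-time adjustment is given below. The multiplicative factors `boost` and `damp` are assumed values for illustration; the text only says that the current category's interest degree is raised and the others lowered. The function returns a temporary copy and leaves the stored profile untouched, reflecting the statement that the change need not be written to the database.

```python
def realtime_adjust(interests, current_category, boost=1.5, damp=0.5):
    """Return a session-only copy of the interest degrees in which the
    current category is boosted and every other category is damped."""
    return {
        point: degree * (boost if point == current_category else damp)
        for point, degree in interests.items()
    }

stored = {"camera": 0.8, "furniture": 0.4}
session = realtime_adjust(stored, current_category="furniture")
print(session)  # temporary view used for the current session
print(stored)   # the stored profile is unchanged
```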

In step 4, the user's intentions and preferences are matched with the corresponding comments in two ways: matching according to comment content, and matching according to the relationship network of reviewers and readers. Matching according to comment content mainly uses a machine learning method to build a mapping model between the user's personalized data and recommended comments, so that the system finds the comments that match the user; these are the comments recommended to the user. The problem is described as follows:

v = MatchF(user_personal_data, review)

where user_personal_data denotes the personalized data of a given user; review denotes a comment to be matched; MatchF() is a function that evaluates the degree of matching between the user and the comment and must be obtained by training with a machine learning method; and v is the matching score, to which a threshold can be applied to select the comments to recommend. Finding the function MatchF, which represents the matching relationship model, is the key step. The problem is converted into a binary classification problem, expressed as follows:

(A) Classification samples: <user_personal_data, review>, i.e., a combination of a user and a comment.

(B) Classes: each sample belongs to the class "match" or "non-match"; the classes of the training samples are known.

(C) Features used in classification: the classification features combine features of three kinds:

Features of user_personal_data: interest points, and features of the related comments (comments written by the user or considered helpful by the user), such as the length of a comment, the attributes mentioned in it, and the events occurring in it together with their attributes.

Features of review: the length of the comment, the attributes mentioned in it, and the events occurring in it together with their attributes.

Relationship features between user_personal_data and review: the similarity between the related comments and review (text similarity, semantic similarity, etc.).
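The binary-classification formulation above can be sketched in plain Python as follows. The text does not name a specific classifier, so logistic regression trained by simple gradient steps is used here as an assumed stand-in. The feature set is a reduced, hypothetical one (interest/attribute overlap, token-level Jaccard similarity as a proxy for text similarity, and normalized comment length), and the dict layouts for `user` and `review` are invented for the example.

```python
import math

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def features(user, review):
    """user: {'interests': set, 'related_text': tokens};
    review: {'attributes': set, 'tokens': tokens}."""
    return [
        1.0,                                               # bias term
        jaccard(user["interests"], review["attributes"]),  # interest/attribute overlap
        jaccard(user["related_text"], review["tokens"]),   # text similarity proxy
        min(len(review["tokens"]) / 50.0, 1.0),            # normalized comment length
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_match_f(samples, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights; labels are 1 (match) or 0 (non-match)."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def match_f(w, user, review):
    """Matching score v in (0, 1); compare against a threshold to recommend."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(user, review))))

user = {"interests": {"battery", "screen"},
        "related_text": ["battery", "life", "is", "great"]}
good = {"attributes": {"battery", "screen"},
        "tokens": ["battery", "life", "is", "long"]}
bad = {"attributes": {"fabric"},
       "tokens": ["soft", "fabric", "nice", "colour"]}

w = train_match_f([features(user, good), features(user, bad)], [1, 0])
print(match_f(w, user, good), match_f(w, user, bad))
```

A real system would train on many labeled <user_personal_data, review> pairs and would add the event features and semantic similarity mentioned in the text; the two-sample training set here only demonstrates the mechanics.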

Matching according to the relationship network of reviewers and readers builds the network relationship by a machine learning method, so that the degree of matching between a user and a comment can be evaluated from the comment's reviewer.

In step 5, the comments recommended to the user are ranked: the selected recommended comments are ranked with the RankSvm ranking algorithm. The features involved in the ranking include:

(A) the comment quality score obtained in the global usefulness analysis;

(B) features used in matching the user's intentions with the corresponding comments;

(C) the authority of the comment, obtained mainly from the authority of the reviewer and the degree to which the comment's writing style is standard.
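The pairwise idea behind RankSvm (commonly written RankSVM) can be sketched in plain Python: every pair of training comments with different relevance becomes a difference vector, and a linear weight vector is learned so that the more relevant comment scores higher. The sketch below uses simple subgradient steps on the pairwise hinge loss rather than a full SVM solver, and the three-element feature vectors (quality score, matching score, authority) and all parameter values are assumptions for illustration.

```python
def train_rank_svm(items, epochs=200, lr=0.1, lam=0.01):
    """items: list of (feature_vector, relevance).
    Learns w such that w.x_i > w.x_j whenever relevance_i > relevance_j."""
    diffs = [
        [a - b for a, b in zip(xi, xj)]
        for xi, ri in items for xj, rj in items if ri > rj
    ]
    w = [0.0] * len(items[0][0])
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            w = [wi * (1 - lr * lam) for wi in w]        # L2 shrinkage
            if margin < 1:                                 # hinge loss is active
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w

def rank(w, candidates):
    """candidates: dict name -> feature_vector; returns names, best first."""
    def score(name):
        return sum(wi * xi for wi, xi in zip(w, candidates[name]))
    return sorted(candidates, key=score, reverse=True)

# Feature order (assumed): [quality score, matching score, authority]
training = [([0.9, 0.8, 0.7], 2), ([0.5, 0.6, 0.4], 1), ([0.2, 0.1, 0.3], 0)]
w = train_rank_svm(training)
candidates = {"a": [0.8, 0.9, 0.6], "b": [0.3, 0.2, 0.2], "c": [0.6, 0.5, 0.5]}
print(rank(w, candidates))
```

In practice an established learning-to-rank implementation would replace this hand-written trainer; the sketch is only meant to show how the three feature groups listed above feed a pairwise ranking model.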

The above description is only a preferred embodiment of the automatic recommendation method for user comment disclosed in the present invention, and it should be noted that, for those skilled in the art, many variations and modifications can be made without departing from the inventive concept of the present invention, and these variations and modifications are within the scope of the present invention.

Claims

1. An automatic recommendation method for user comments is characterized by comprising the following steps:

step 1, carrying out global usefulness analysis research on the comments according to the comment contents and the behavior of the reviewer, and filtering out a part of useless comments;

step 2, mining the comment-related information of the comments filtered in step 1;

step 3, capturing user personalized information, and mining the user personalized intention and preference; the method for capturing the user personalized information is to collect the personalized information of the user by heuristic information and rules;

step 4, on the basis of the step 2 and the step 3, matching the user intention preference with the corresponding comment, and selecting the comment which meets the user requirement preference;

step 5, sorting the comments recommended to the user;

the method for filtering the useless comments comprises the steps of identifying the useless comments through interaction of commentators and the comments, specifically, modeling by using a machine learning method according to characteristics of the comments, performing model training through enough linguistic data, judging and finding out the useless comments, finding out authors of the comments through the useless comments, finding out more useless comments through the authors, and continuing finding out the authors through the useless comments, so that continuous iteration is performed, and finally, the useless comments are filtered comprehensively and accurately.

2. The method for automatically recommending user comments, as claimed in claim 1, wherein the filtered comments in step 2 are divided into comments on commodities and comments on events, and the mining of related information of comments comprises the extraction of attributes of commodities in comments and the extraction of events and related attributes thereof in comments.

3. The method of claim 2, wherein the method of extracting the product attributes from the comments comprises the steps of firstly selecting a comment, identifying the product category to which the comment is directed, secondly finding several products of other same categories according to the product category, then mining descriptive texts for the products, then extracting the objects and attributes of the products which are usually discussed from the texts by adopting a multi-document keyword extraction method, forming a keyword set according to the extracted objects and attributes, and finally returning to the comment to simply match words which are simultaneously present in the comment and in the keyword set, wherein the words are the objects and the attributes of the objects of the comment.

4. The method according to claim 2, wherein extracting events and their related attributes from the comments comprises: preprocessing the original comments; taking the trigger words found in a training set as original trigger words; manually collecting some trigger words relating to expression and attitude; expanding the trigger words with a synonym dictionary; extracting candidate events according to the preprocessed text and the expanded trigger words, and obtaining candidate categories to form training instances; describing the candidate events from different angles using lexical features, contextual features and dictionary features; performing binary classification on the candidate events with a classifier to obtain the categories of the candidate events; and examining the text sentences judged to be events, i.e. those containing trigger words, taking all entities, attributes, time points and the like in those sentences as candidate elements, with the category labels represented by the template corresponding to the event category, so that event element identification is converted into a multi-class classification problem that yields the event elements and their corresponding roles.
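By way of non-limiting illustration, the trigger-word expansion and candidate-event extraction steps of claim 4 can be sketched as follows. The sketch covers only these two steps; the subsequent binary classification and element-role classification are not shown. It assumes exact word-form matching without lemmatization, and the synonym dictionary, seed triggers and sentences are hypothetical.

```python
import re


def expand_triggers(seed_triggers, synonyms):
    """Add every dictionary synonym of each seed trigger word."""
    expanded = set(seed_triggers)
    for trigger in seed_triggers:
        expanded.update(synonyms.get(trigger, ()))
    return expanded


def candidate_events(sentences, triggers):
    """Return (sentence, matched_triggers) for sentences containing a trigger word."""
    candidates = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        hits = [tok for tok in tokens if tok in triggers]
        if hits:
            candidates.append((sentence, hits))
    return candidates


seed_triggers = {"announce", "criticize"}
synonyms = {"announce": ["declare", "reveal"], "criticize": ["condemn"]}
triggers = expand_triggers(seed_triggers, synonyms)

sentences = [
    "The company will declare a recall on Monday.",
    "Users condemn the new policy.",
    "The weather was nice.",
]
candidates = candidate_events(sentences, triggers)
for sentence, hits in candidates:
    print(hits, "->", sentence)
```

Each candidate would then be described by lexical, contextual and dictionary features and passed to the classifier recited in the claim.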

5. The automatic recommendation method for user comments according to claim 1, wherein the personalized information comprises the user's needs and preferences regarding commodities and their attributes and the user's needs and preferences regarding comments, and the information useful for collecting the personalized information comprises comments posted by the user, comments by other people that the user has rated, the history of purchased commodities, personal information provided by the user and other personal information, and the user's browsing behavior.

6. The automatic recommendation method for user comments according to claim 1, wherein mining the user's personalized intention and preference in step 3 comprises processing and organizing the personalized information, and adaptively tracking and learning the user's personalized information; wherein processing and organizing the personalized information comprises representing the user's requirement preferences as a data structure containing a user ID serving as a unique identifier, <interest point: interest> pairs, the comments written by the user, and the comments the user considered helpful, the user's personalized information being obtained from these data through machine learning and data mining; and wherein the adaptive tracking and learning of the user's personalized information is used to update the user's personalized information both periodically and in real time.
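By way of non-limiting illustration, the data structure of claim 6 can be sketched as follows. The four fields follow the claim; the field names, the numeric interest values, and the use of an exponential moving average for the adaptive update are assumptions made only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserProfile:
    """User requirement preferences, organized as recited in claim 6."""
    user_id: str                                                 # unique identifier
    interests: Dict[str, float] = field(default_factory=dict)    # <interest point: interest>
    written_comments: List[str] = field(default_factory=list)    # comments written by the user
    helpful_comments: List[str] = field(default_factory=list)    # comments the user found helpful

    def update_interest(self, point: str, observed: float, rate: float = 0.3) -> None:
        """Blend a newly observed interest level into the stored one.

        Assumed update rule: exponential moving average with weight `rate`,
        so recent behavior gradually replaces older behavior.
        """
        old = self.interests.get(point, 0.0)
        self.interests[point] = (1 - rate) * old + rate * observed


profile = UserProfile(user_id="u1")
profile.helpful_comments.append("comment-42")  # hypothetical comment id
profile.update_interest("battery", 1.0)        # e.g. triggered by a real-time event
profile.update_interest("battery", 1.0)        # e.g. triggered by a periodic batch update
print(profile.interests)
```

The same `update_interest` call can serve both the periodic and the real-time update paths; only the trigger differs.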

7. The automatic recommendation method for user comments according to claim 1, wherein matching the user intention preference with the corresponding comments in step 4 comprises matching according to comment content and matching according to the relationship network between reviewers and readers; wherein matching according to comment content comprises using a machine learning method to establish a mapping model between the user's personalized data and recommended comments, so that the system finds a plurality of comments matching the user, these being the comments recommended to the user; and wherein matching according to the relationship network between reviewers and readers comprises using a machine learning method to establish the network of relationships, so that the degree of match between a user and a comment can be evaluated according to the reviewer of the comment.

8. The automatic recommendation method for user comments according to claim 1, wherein in step 5 the comments recommended to the user are ranked using the RankSvm ranking algorithm, and the features used in the ranking comprise the comment quality score obtained in the global usefulness analysis, the user intention, the features used in the corresponding comments, and the authority of the comment.
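By way of non-limiting illustration, the pairwise idea behind RankSvm in claim 8 can be sketched as follows: a linear scoring function is trained so that, for each pair of comments with a known preference, the preferred comment scores higher by a margin. This is a minimal sketch using plain subgradient descent on the pairwise hinge loss, not a production RankSvm implementation. The three features (quality score, intention match, reviewer authority), their values and the hyperparameters are hypothetical.

```python
def train_rank_svm(pairs, dim, epochs=200, lr=0.1, reg=0.01):
    """Learn weights w such that w . better > w . worse for each training pair.

    pairs: list of (better_features, worse_features), each a list of `dim` floats.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for i in range(dim):
                # Subgradient of reg/2 * |w|^2 + max(0, 1 - w . diff)
                grad = reg * w[i] - (diff[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w


def rank(comments, w):
    """Sort comment ids by descending score w . features."""
    def score(cid):
        return sum(wi * xi for wi, xi in zip(w, comments[cid]))
    return sorted(comments, key=score, reverse=True)


# Hypothetical features: [quality score, intention match, reviewer authority]
features = {
    "A": [0.9, 0.8, 0.7],
    "B": [0.6, 0.5, 0.4],
    "C": [0.2, 0.3, 0.1],
    "D": [0.7, 0.6, 0.5],  # not used in training
}
training_pairs = [
    (features["A"], features["B"]),
    (features["B"], features["C"]),
    (features["A"], features["C"]),
]
weights = train_rank_svm(training_pairs, dim=3)
ordering = rank(features, weights)
print(ordering)
```

Comment D does not appear in any training pair; it is ranked purely by the learned weights, which is how comments newly recommended to a user would be ordered.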