CN110825876B - Movie comment viewpoint emotion tendency analysis method - Google Patents

Publication number
CN110825876B
Authority
CN
China
Legal status
Active
Application number
CN201911082409.1A
Other languages
Chinese (zh)
Other versions
CN110825876A (en)
Inventor
许青青
谢赟
韩欣
Current Assignee
Shanghai Datatom Information Technology Co ltd
Original Assignee
Shanghai Datatom Information Technology Co ltd
Application filed by Shanghai Datatom Information Technology Co ltd
Priority to CN201911082409.1A
Publication of CN110825876A
Application granted
Publication of CN110825876B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention discloses a movie comment viewpoint emotion tendency analysis method, comprising the following steps: crawling the film description information and comment information of a plurality of films in each category from a film review website; preprocessing the collected film description information and comment information; formulating a plurality of comment viewpoint extraction rules, using them to obtain viewpoint words and emotion words from each sentence of the comment content, and storing all viewpoint words and emotion words in a comment label word bank and a viewpoint emotion word bank respectively; marking each comment sentence with a comment label category and an emotion tendency by keyword matching or manual marking; generating a comment viewpoint emotion analysis model consisting of a comment label classification model and a label emotion classification model; and using the comment viewpoint emotion analysis model to automatically generate comment label category marks and emotion tendency marks for target film comments. The method comprehensively and accurately reflects the emotions that users express toward a film.

Description

Movie comment viewpoint emotion tendency analysis method
Technical Field
The invention relates to the technical field of information extraction and data mining, and in particular to a movie comment viewpoint emotion tendency analysis method.
Background
In the era of internet big data, online comments have become a form of word-of-mouth and are the most direct way and channel for consumers to express their emotional attitudes. Analyzing consumer comments yields an all-round evaluation of a product, so that consumers can understand the product in multiple dimensions and make decisions more easily, while merchants can learn consumer and market preferences, improve service quality and increase customer stickiness. With the continuing innovation of internet media technology, the movie entertainment industry, including cinemas and home entertainment, is developing vigorously; movies have become a routine entertainment choice, and their popularity generates a large amount of comment information. Extracting subjective viewpoints from public comments and judging whether the public's tendency is positive or negative is an important problem of information extraction and mining in natural language processing. Movie comment information is also valuable for value transmission and for shaping the film and television environment, so developing and analyzing it benefits film and television research. It is therefore of great significance to analyze the emotion tendency of viewpoints in movie comments.
The methods commonly used to extract viewpoints from user comments are mainly unsupervised rule extraction and clustering algorithms. Rule-based methods extract viewpoints according to rules summarized manually from syntactic structures, but manually compiled rules cannot cover every way a comment viewpoint can be expressed, so the viewpoints such methods can extract are limited. Clustering-based methods are simple but have low accuracy and struggle to generate reasonable, accurate comment labels.
At present, dictionary matching and classification algorithms are the methods commonly used for comment emotion analysis. Emotion-dictionary-based methods depend entirely on the emotion dictionary and are limited by its size. Emotion classification algorithms are supervised: some training sets are built by combining comment text with user ratings, and others are labeled manually, which consumes a great deal of labor.
In addition, comments from different industries have their own points of focus, so the way emotion analysis is carried out differs slightly between them. Compared with online comments on e-commerce, restaurants, hotels and the like, movie comments contain more complex information about user experiences and feelings, so current emotion analysis and viewpoint extraction methods are not fully suited to movie comments. Moreover, many online review studies treat comment viewpoint extraction and emotion classification as two separate research modules. A user's comment on a product or object is often multidimensional, and the evaluations of different dimensions are not necessarily consistent, so it is clearly inappropriate to judge only whether the user's overall emotion is favorable (positive) or unfavorable (negative); performing emotion analysis on each main viewpoint dimension extracted from the comment has greater practical value. For example, for the comment "the acting in this movie is off the charts, but the storyline is not good", the results (actor, positive) and (plot, negative) obtained by dimension-level emotion analysis are more accurate.
Disclosure of Invention
The invention aims to provide a movie comment viewpoint emotion tendency analysis method that comprehensively and accurately reflects the emotions a user expresses toward a movie.
The technical scheme for realizing the purpose is as follows:
A movie comment viewpoint emotion tendency analysis method comprises the following steps:
step S1, crawling the film description information and comment information of a plurality of films in each category from a film review website;
step S2, preprocessing the collected film description information and comment information;
step S3, formulating a plurality of comment viewpoint extraction rules, using them to obtain viewpoint words and emotion words from each comment sentence of the comment content, and storing all viewpoint words and emotion words in a comment label word bank and a viewpoint emotion word bank respectively;
step S4, marking each comment sentence with a comment label category and an emotion tendency through keyword matching marking or manual marking;
step S5, generating a comment viewpoint emotion analysis model composed of a comment label classification model and a label emotion classification model;
step S6, using the comment viewpoint emotion analysis model to automatically generate comment label category marks and emotion tendency marks for target movie comments.
Preferably, in step S1, the movie categories include: romance, animation, action, science fiction, horror, comedy and suspense;
the film description information comprises the film name, director name, genre and overall score;
the comment information comprises: reviewer nickname, number of "useful" votes, comment time, comment content and rating.
Preferably, the data preprocessing comprises:
integrating all collected comment information into a comment corpus;
removing duplicate data from the comment corpus;
deleting records with missing comment content from the comment corpus;
converting all traditional Chinese characters in the comment corpus into simplified Chinese characters;
and acquiring the film names, director names and actor names from the collected description information of each film, storing them in a user-defined dictionary, and marking each type of name with a different tag.
Preferably, step S3 comprises:
constructing a plurality of comment viewpoint extraction rules according to the dependency syntax structure, the grammatical components of the words, and the way viewpoint words and emotion words are expressed in comment viewpoints;
performing sentence segmentation, word segmentation, part-of-speech tagging and dependency parsing on the comment content in the comment corpus to obtain each comment sentence, checking whether each comment sentence matches one of the comment viewpoint extraction rules, and, if it does, obtaining the viewpoint words and emotion words;
and storing all acquired viewpoint words and emotion words in the comment label word bank and the viewpoint emotion word bank respectively.
Preferably, the dependency syntax structures include: the subject-verb structure, the verb-object structure, the attribute-head structure, the adverbial-head structure, the verb-complement structure and the coordinate structure;
the grammatical components of the words include: a subject component, an object or object-like component, an attributive component and a noun component, where an object-like component refers to an indirect object or object-like structure;
the expression structure of viewpoint words and emotion words means: the subject component is a viewpoint word and the object or object-like component is an emotion word; or the attributive component is an emotion word and the noun component it modifies is a viewpoint word.
Preferably, step S4 comprises:
acquiring a label category dictionary and an emotion dictionary;
performing keyword matching marking on the comment sentences from which viewpoint words and emotion words were extracted in step S3: matching the acquired viewpoint words against the label category dictionary and the acquired emotion words against the emotion dictionary, and, if both matches succeed, marking the comment sentence with a label category mark and an emotion tendency mark; otherwise, marking the label category and emotion tendency manually;
for comment sentences from which no viewpoint words and emotion words were extracted in step S3, marking the label category and emotion tendency manually.
Preferably, acquiring the label category dictionary comprises:
marking the film names, director names and actor names from the user-defined dictionary that appear in the comment label word bank as 'film', 'director' and 'actor' respectively;
training a word vector model on the comment sentences to obtain a trained word vector model;
representing the words in the comment label word bank with the trained word vector model, and clustering them into k categories with the k-means clustering algorithm;
manually summarizing the public evaluation of movies into 8 dimensions, 'director, photography, scenario, actor, emotion, audio-visual, subject, impression', screening the words in each cluster and keeping the related words to form a preliminary label category dictionary;
using the trained word vector model to obtain words related to the label category words in the preliminary label category dictionary, adding them to expand the dictionary, and removing duplicate words to generate the final label category dictionary;
acquiring the emotion dictionary means: first collecting, sorting and merging open-source positive and negative emotion dictionaries; then counting the word frequencies in the viewpoint emotion word bank and keeping all words whose frequency exceeds a set threshold; and finally deleting, manually, words irrelevant to movie comment emotion to form the emotion dictionary.
Preferably, step S5 comprises:
training two preliminary comment label classification models and two preliminary label emotion classification models on the keyword-matching-marked data set and the manually marked data set respectively;
fusing the two preliminary comment label classification models by weighting to generate the final comment label classification model;
and fusing the two preliminary label emotion classification models by weighting to generate the final label emotion classification model.
Preferably, generating a preliminary comment label classification model or a preliminary label emotion classification model comprises:
balancing the keyword-matching-marked data set and the manually marked data set with an up-sampling strategy;
dividing each balanced data set into a training set and a test set according to a preset ratio;
segmenting the corpus in the training set into words, removing stop words, extracting text features with the TF-IDF algorithm, and computing the chi-square value of each feature to reduce the feature dimension;
and feeding the data into a random forest classification model for training, after which the model is saved and evaluated.
Preferably, step S6 comprises:
extracting viewpoint words and emotion words; if they can be obtained, performing keyword matching, comprising label category matching and emotion word matching, and, if matching succeeds, directly outputting the label category mark and emotion tendency mark; otherwise, calling the comment label classification model and/or the label emotion classification model to predict the label category and the label emotion, setting two thresholds T1 and T2, and outputting the label category mark and emotion tendency mark if the label category prediction probability P1 is greater than T1 and the label emotion prediction probability P2 is greater than T2.
The beneficial effects of the invention are as follows: the invention processes movie comments, whose content and emotion tendencies are complex, and analyzes the emotion tendency of movie comment data by combining multiple methods and strategies, so that the audience's emotion tendency toward specific aspects of a movie can be captured accurately.
Drawings
FIG. 1 is a flow chart of the movie comment viewpoint emotion tendency analysis method of the present invention;
FIG. 2 is a flow chart of keyword matching marking in the present invention;
FIG. 3 is a schematic diagram of comment label classification model fusion in the present invention;
FIG. 4 is a schematic diagram of label emotion classification model fusion in the present invention;
FIG. 5 is a schematic diagram of the classification model construction process in the present invention;
FIG. 6 is a flow chart of the automatic generation of comment emotion labels in the present invention.
Detailed Description
The invention will be further explained with reference to the drawings.
Referring to FIG. 1, the movie comment viewpoint emotion tendency analysis method of the present invention extracts comment viewpoints from movie comment data and performs label classification and emotion tendency analysis on them, i.e. obtains the label categories of the comments and their emotion tendencies; it also constructs a comment viewpoint emotion analysis model that analyzes and classifies new movie comment data and attaches category and emotion labels to it. The method comprises the following steps:
Step S1, data crawling: the film description information of a plurality of films in the romance, animation, action, science fiction, horror, comedy and suspense categories, together with the comment information of each film, is crawled from a film review website. The film description information includes the film name, director name, genre, overall score and similar information. The comment information of each film includes the reviewer's nickname, the number of "useful" votes, the comment time, the comment content, the rating and similar information.
Step S2, preprocessing the film description information and comment information, comprising:
data integration: integrating all collected comment information into a comment corpus;
duplicate removal: removing duplicate data from the comment corpus;
missing-value handling: deleting records whose comment content is missing from the comment corpus;
traditional Chinese handling: converting all traditional Chinese characters in the comment corpus into simplified Chinese;
and user-defined dictionary construction: acquiring the film names, director names and actor names from the collected film description information, storing them in a user-defined dictionary, and marking each type of name with a different tag.
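The preprocessing steps above can be sketched as follows, assuming each crawled comment is a dict with hypothetical `nickname` and `content` keys. The traditional-to-simplified conversion is passed in as a function (a converter such as OpenCC could be plugged in); treating the same reviewer posting the same text as a duplicate, and converting before deduplicating, are assumptions. The user dictionary is emitted in the `word freq tag` line format that jieba's `load_userdict` accepts; the frequency 1000 and the tag names are illustrative only.

```python
from typing import Callable, Dict, List, Optional

Comment = Dict[str, Optional[str]]


def preprocess(comments: List[Comment],
               to_simplified: Callable[[str], str] = lambda s: s) -> List[Comment]:
    """Integrate raw comments into one corpus, drop records with missing
    content, convert to simplified Chinese and remove duplicates."""
    seen = set()
    corpus: List[Comment] = []
    for c in comments:
        content = c.get("content")
        if content is None or not content.strip():
            continue  # missing-value handling: no comment content
        content = to_simplified(content)  # converted before dedup (assumption)
        key = (c.get("nickname"), content)  # assumed duplicate key
        if key in seen:
            continue  # duplicate removal
        seen.add(key)
        corpus.append({**c, "content": content})
    return corpus


def user_dict_lines(films: List[str], directors: List[str],
                    actors: List[str]) -> List[str]:
    """Emit 'word freq tag' lines, one hypothetical tag per name type."""
    lines = []
    for tag, names in (("nfilm", films), ("ndirector", directors), ("nactor", actors)):
        for name in sorted(set(names)):
            lines.append(f"{name} 1000 {tag}")
    return lines
```

Giving each name type its own tag means that, after segmentation, film, director and actor names can be told apart by their part-of-speech tag alone.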
Step S3, comment viewpoint extraction: a plurality of general comment viewpoint extraction rules are formulated from the dependency syntax structures and grammatical components of modern Chinese, combined with the way viewpoint words and emotion words are expressed in actual comment viewpoints. Sentence segmentation, word segmentation, part-of-speech tagging and dependency parsing are performed on the comment content in the comment corpus to obtain each comment sentence; each sentence is checked against the comment viewpoint extraction rules and, if it matches one, a (viewpoint word, emotion word) pair is obtained. Finally, all obtained viewpoint words and emotion words are stored in the comment label word bank and the viewpoint emotion word bank respectively.
According to dependency syntax structure, the comment viewpoint extraction rules fall into two types: a rule system centered on the subject-verb structure (SBV) and a rule system centered on the attribute-head structure (ATT). The syntactic relations involved in the extraction rules are shown in Table 1:
Relation type | Tag | Description | Example
Subject-verb | SBV | subject-verb | I send her a bunch of flowers (I <-- send)
Verb-object | VOB | verb-object | I send her a bunch of flowers (send --> flowers)
Attribute-head | ATT | attribute | red apple (red <-- apple)
Adverbial-head | ADV | adverbial | very beautiful (very <-- beautiful)
Verb-complement | CMP | complement | finished the homework (do --> finish)
Coordination | COO | coordinate | mountains and seas (mountain --> sea)
TABLE 1
Further, the SBV-centered rule system is divided into 4 categories, as shown in Table 2:
[Table 2: SBV-centered extraction rules (image not reproduced)]
TABLE 2
As Table 2 shows, the SBV-centered rules are mainly based on a direct or indirect relational connection between a noun subject and an object or object-like structure (hereinafter, an indirect object or object-like structure is called an object-like structure). The extracted subject component is the comment viewpoint word, and the extracted object-like component is the comment viewpoint emotion word.
These rules are not limited to the sentence structures listed in Table 2: they also consider whether the subject and the object-like component have coordinate structures, and whether the object-like component is modified by an adverb, because negation words affect the emotion. For example, from the movie comment "the movie and the plot are good", the rules extract two (viewpoint word, emotion word) pairs, (movie, good) and (plot, good); from "the subject matter is rich and novel", the pairs (subject, rich) and (subject, novel) are obtained; and from "the movie is not good-looking", (movie, not good-looking) is extracted.
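A minimal sketch of one SBV-centered rule, assuming each sentence has already been segmented, POS-tagged and dependency-parsed into LTP-style tokens (1-based head index, 0 for the root, relation labels as in Table 1). Because Table 2 is not reproduced, this covers only the simplest case described above: a noun subject attached by SBV to an adjective predicate, with coordination (COO) expanded on both sides and negation adverbs (ADV) prefixed to the emotion word. The `Token` type, the negation list and the English glosses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    word: str
    pos: str   # 'n' noun, 'a' adjective, 'd' adverb, 'c' conjunction, ...
    head: int  # 1-based index of the head word; 0 = ROOT
    rel: str   # dependency label: SBV, ADV, COO, HED, ...


NEGATIONS = {"not", "no", "never"}  # hypothetical negation word list


def dependents(tokens: List[Token], head: int, rel: str) -> List[int]:
    """1-based indices of tokens attached to `head` with relation `rel`."""
    return [i for i, t in enumerate(tokens, start=1) if t.head == head and t.rel == rel]


def with_negation(tokens: List[Token], idx: int) -> str:
    """Prefix the word at `idx` with any negation adverb (ADV) modifying it."""
    negs = [tokens[i - 1].word for i in dependents(tokens, idx, "ADV")
            if tokens[i - 1].word in NEGATIONS]
    return " ".join(negs + [tokens[idx - 1].word])


def extract_sbv_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """(viewpoint word, emotion word) pairs from a noun subject (SBV) and its
    adjective predicate, expanding coordinated subjects and predicates (COO)."""
    pairs = []
    for i, t in enumerate(tokens, start=1):
        if t.rel != "SBV" or not t.pos.startswith("n") or t.head == 0:
            continue
        subjects = [i] + dependents(tokens, i, "COO")
        predicates = [t.head] + dependents(tokens, t.head, "COO")
        for s in subjects:
            for p in predicates:
                if tokens[p - 1].pos == "a":  # adjective predicate = emotion word
                    pairs.append((tokens[s - 1].word, with_negation(tokens, p)))
    return pairs
```

For "the movie is not good-looking", parsed as movie(n, SBV to 3), not(d, ADV to 3), good-looking(a, root), the function returns [("movie", "not good-looking")]. In Chinese text the negation would be joined without a space.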
Further, the ATT-centered rule system is also divided into 4 categories; the specific rules are shown in Table 3.
[Table 3: ATT-centered extraction rules (image not reproduced)]
TABLE 3
Since an attributive modifies, qualifies or describes the quality and characteristics of a noun or pronoun, attributive structures are essential to the comment viewpoint extraction rules. As Table 3 shows, adjectives generally serve as the emotion words of a comment viewpoint, and the nouns they modify, or verbs used as nouns, serve as the viewpoint words. Similarly, these rules must also consider coordinate structures among the noun components and the adjectives, and adverbs that modify the adjectives. For example, in the example "stiff and embarrassing performance" from Table 3, "stiff" and "embarrassing" are coordinate, so two pairs, (performance, stiff) and (performance, embarrassing), can be extracted; from "not lively performance", (performance, not lively) can be extracted.
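A minimal sketch of the ATT-centered case, under the same assumptions as an LTP-style parse (1-based heads, 0 for the root). It covers only what the paragraph above describes, since Table 3 itself is not reproduced: an adjective attached by ATT to a noun (or nominalized verb), with coordinated adjectives and nouns expanded and negation adverbs prefixed. The `Token` type, the negation list and the English glosses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    word: str
    pos: str   # 'n' noun, 'v' verb, 'a' adjective, 'd' adverb, 'c' conjunction
    head: int  # 1-based index of the head word; 0 = ROOT
    rel: str   # dependency label: ATT, ADV, COO, HED, ...


NEGATIONS = {"not", "no", "never"}  # hypothetical negation word list


def dependents(tokens: List[Token], head: int, rel: str) -> List[int]:
    """1-based indices of tokens attached to `head` with relation `rel`."""
    return [i for i, t in enumerate(tokens, start=1) if t.head == head and t.rel == rel]


def with_negation(tokens: List[Token], idx: int) -> str:
    """Prefix the word at `idx` with any negation adverb (ADV) modifying it."""
    negs = [tokens[i - 1].word for i in dependents(tokens, idx, "ADV")
            if tokens[i - 1].word in NEGATIONS]
    return " ".join(negs + [tokens[idx - 1].word])


def extract_att_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """(viewpoint word, emotion word) pairs from adjective attributives."""
    pairs = []
    for i, t in enumerate(tokens, start=1):
        if t.rel != "ATT" or t.pos != "a" or t.head == 0:
            continue
        if not tokens[t.head - 1].pos.startswith(("n", "v")):
            continue  # head must be a noun or a verb used as a noun
        nouns = [t.head] + dependents(tokens, t.head, "COO")
        adjs = [i] + [j for j in dependents(tokens, i, "COO") if tokens[j - 1].pos == "a"]
        for n in nouns:
            for a in adjs:
                pairs.append((tokens[n - 1].word, with_negation(tokens, a)))
    return pairs
```

The coordinated adjective in "stiff and embarrassing performance" hangs off "stiff" via COO rather than off the noun, which is why the sketch expands COO dependents of the first adjective instead of collecting every ATT child of the noun.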
Step S4, comment label category marking and emotion tendency marking, which is divided into keyword matching marking and manual marking. Keyword matching marking requires a label category dictionary and an emotion dictionary to be acquired before keywords are matched; the main process is shown in FIG. 2. The label category dictionary is acquired first, as follows:
1) Film proper noun substitution. The film names, director names and actor names from the user-defined dictionary that appear in the comment label word bank are marked as 'film', 'director' and 'actor' respectively, which classifies part of the words in the comment label word bank. For example, if actor names such as "Zhang San" and "Li Si" appear in the comment label word bank, the machine cannot tell by itself that they are actors, but by matching them against the actor names in the user-defined dictionary they can be marked as 'actor'; director names and film names are marked in the same way.
2) Word vector model training. The comment content in the comment corpus is segmented into words, stop words are removed, and the result is saved to a text file with one comment sentence per line and words separated by spaces; a word2vec model is then trained on the processed comment content to obtain the word vector model.
3) Word clustering. The words in the comment label word bank are represented with the trained word vector model and clustered into k categories with the k-means clustering algorithm; k is determined by observing the clustering results over multiple trials.
4) Summarizing the evaluation dimensions and screening the category dictionary. The public evaluation of movies is manually summarized and screened into 8 dimensions: director, photography, scenario, actor, emotion, audio-visual, subject and impression. The words in each cluster are screened and the related words are kept to form the label category dictionary.
5) Expanding the label category dictionary. Words related to the label category words are obtained with the trained word vector model and added to the label category dictionary, and duplicate words are removed to produce the final label category dictionary. Related words are found by computing the similarity between words with the word vector model and treating two words as related when their similarity exceeds a set threshold; the candidate related words are then screened manually to ensure the accuracy of the label category dictionary.
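Steps 1) to 5) can be sketched as below, assuming word vectors are already available as a word-to-array mapping (for instance from a trained word2vec model; gensim's `Word2Vec` can be trained on a file in the step 2) format through `gensim.models.word2vec.LineSentence`). All function names, the cosine threshold of 0.7, and the choice of k-means on unit-normalized vectors are illustrative assumptions; scikit-learn's `KMeans` is a common alternative to the hand-written loop. The manual screening in steps 4) and 5) is not modeled.

```python
from typing import Dict, List, Set, Tuple

import numpy as np


def split_by_user_dict(words: List[str], user_dict: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Step 1): label known film/director/actor names; return the rest."""
    labelled = {w: user_dict[w] for w in words if w in user_dict}
    remaining = [w for w in words if w not in user_dict]
    return labelled, remaining


def training_lines(segmented: List[List[str]], stopwords: Set[str]) -> List[str]:
    """Step 2): one comment sentence per line, words separated by spaces."""
    lines = (" ".join(w for w in words if w.strip() and w not in stopwords) for words in segmented)
    return [line for line in lines if line]


def kmeans(vectors: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Step 3): Lloyd's k-means on word vectors; returns a cluster label per row."""
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # Euclidean now tracks cosine
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)  # assign each word to its nearest centroid
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])  # keep old centroid if a cluster empties
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def expand(seed: Dict[str, List[str]], vectors: Dict[str, np.ndarray],
           threshold: float = 0.7) -> Dict[str, List[str]]:
    """Step 5): add words whose cosine similarity to a seed word exceeds threshold."""
    vocab = list(vectors)
    M = np.array([vectors[w] for w in vocab], dtype=float)
    M /= np.linalg.norm(M, axis=1, keepdims=True)
    index = {w: i for i, w in enumerate(vocab)}
    result = {}
    for category, words in seed.items():
        out = list(dict.fromkeys(words))  # de-duplicate, keep order
        for w in words:
            if w not in index:
                continue
            sims = M @ M[index[w]]
            for j in np.argsort(-sims):
                if sims[j] <= threshold:
                    break
                if vocab[j] not in out:
                    out.append(vocab[j])  # candidate, still to be screened manually
        result[category] = out
    return result
```

Normalizing rows before clustering makes squared Euclidean distance a monotone function of cosine similarity, which matches how word2vec neighbors are usually compared.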
An example of the generated label category dictionary is shown in table 4:
[Table 4: example label category dictionary (image not reproduced)]
TABLE 4
Next, the emotion dictionary is acquired. First, open-source positive and negative emotion dictionaries are collected, mainly the HowNet dictionary and the open-source sentiment dictionary of National Taiwan University (NTUSD), and are sorted and merged; from the HowNet dictionary only the positive and negative evaluation words are taken. Then the word frequencies in the viewpoint emotion word bank are counted, all words whose frequency exceeds a set threshold are kept, and words irrelevant to movie comment emotion are deleted manually, forming an emotion dictionary with movie characteristics.
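One possible reading of this procedure is sketched below with hypothetical names: the merged open-source lexicons supply polarities, the frequency threshold selects which words from the viewpoint emotion word bank enter the movie-specific dictionary, the manually flagged irrelevant words are dropped, and frequent words with no known polarity are returned for manual labeling. The source does not spell out how the merged lexicons and the frequent words are combined, so this combination rule, and letting the first lexicon win on conflicting polarities, are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def build_emotion_dictionary(lexicons: Iterable[Dict[str, str]],
                             emotion_word_bank: Iterable[str],
                             min_freq: int,
                             irrelevant: Set[str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge open-source lexicons, keep frequent movie emotion words and drop
    manually flagged irrelevant ones."""
    merged: Dict[str, str] = {}
    for lexicon in lexicons:
        for word, polarity in lexicon.items():
            merged.setdefault(word, polarity)  # first lexicon wins on conflicts
    freq = Counter(emotion_word_bank)
    frequent = [w for w, c in freq.most_common() if c > min_freq and w not in irrelevant]
    emotion_dict = {w: merged[w] for w in frequent if w in merged}
    needs_manual = [w for w in frequent if w not in merged]  # polarity unknown
    return emotion_dict, needs_manual
```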
Finally, keyword matching is performed. For comment sentences from which viewpoint words and emotion words were extracted during comment viewpoint extraction, the viewpoint word is matched against the label category dictionary and the emotion word against the emotion dictionary; if both match successfully, the sentence is marked with (label category, emotion tendency). For example, for the comment "the plot is not strong", comment viewpoint extraction yields (plot, not strong), and label category and emotion tendency marking then yield (plot, negative).
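A minimal sketch of this matching, assuming the label category dictionary maps each of the 8 dimensions to a set of words and the emotion dictionary maps words to "positive" or "negative". The negation handling (flipping the polarity of a dictionary word that follows a negation prefix) is an assumption made to reproduce the (plot, not strong) to negative example; the source does not state how negated emotion words are matched. The prefixes are English placeholders for illustration.

```python
from typing import Dict, Optional, Set, Tuple

NEGATION_PREFIXES = ("not ", "no ")  # hypothetical; Chinese text would use e.g. "不"


def match_emotion(emotion: str, emotion_dict: Dict[str, str]) -> Optional[str]:
    """Look up an emotion phrase; a negated phrase flips its base polarity."""
    if emotion in emotion_dict:
        return emotion_dict[emotion]
    for prefix in NEGATION_PREFIXES:
        if emotion.startswith(prefix):
            base = emotion_dict.get(emotion[len(prefix):])
            if base is not None:
                return "negative" if base == "positive" else "positive"
    return None


def keyword_match(viewpoint: str, emotion: str,
                  label_dict: Dict[str, Set[str]],
                  emotion_dict: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """(label category, emotion tendency) if both lookups succeed, else None;
    None means the sentence goes to manual marking."""
    category = next((cat for cat, words in label_dict.items() if viewpoint in words), None)
    polarity = match_emotion(emotion, emotion_dict)
    if category is None or polarity is None:
        return None
    return category, polarity
```

Looking up the full phrase first lets the dictionary override the flip rule for idiomatic negations that do not simply invert polarity.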
Manual marking covers two cases: sentences from which comment viewpoint extraction yields no viewpoint words and emotion words, and sentences from which viewpoint words and emotion words are extracted but keyword matching marking fails. In both cases the label category and emotion tendency are marked manually.
Step S5, generating the comment viewpoint emotion analysis model, which is composed of a comment label classification model and a label emotion classification model; the two classification models differ only in their class labels, and their data processing and classification algorithm are otherwise identical. There are two kinds of classification data sets: the data set marked by keyword matching and the manually marked data set. Each is used for training, producing 2 comment label classification models and 2 label emotion classification models. To improve the accuracy of emotion analysis, the 2 comment label classification models are fused by weighting into a new comment label classification model, and the 2 label emotion classification models are fused by weighting into a new label emotion classification model; see FIG. 3 and FIG. 4. In this embodiment, the weights of the model generated from the keyword-marked data and of the model generated from the manually marked data are 0.4 and 0.6 respectively.
The probability in comment viewpoint emotion analysis is calculated as follows:
$$P_i = 0.4\,P_{1i} + 0.6\,P_{2i}$$
where $P_i$ is the probability that a comment in the comment corpus belongs to class $i$, and $P_{1i}$ and $P_{2i}$ are the probabilities given by the model generated from the keyword-marked data and by the model generated from the manually marked data respectively. For the comment label classification model, $i$ takes the values 0 to 7, representing the 8 categories director, photography, scenario, actor, emotion, audio-visual, subject and impression respectively. For the label emotion classification model, $i$ takes the values 0 and 1, where 1 represents positive emotion and 0 represents negative emotion.
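The weighted fusion formula can be sketched as below, assuming both preliminary models output class-probability vectors whose entries are in the same class order (for scikit-learn classifiers, the columns of `predict_proba` follow the `classes_` attribute, so the two models' `classes_` should be checked to agree). The class-name list simply restates the index mapping given above. Because the weights 0.4 and 0.6 sum to 1, the fused vector is still a probability distribution.

```python
import numpy as np

TAG_CLASSES = ["director", "photography", "scenario", "actor",
               "emotion", "audio-visual", "subject", "impression"]  # i = 0..7


def fuse(p_keyword, p_manual, w_keyword: float = 0.4, w_manual: float = 0.6):
    """P_i = w_keyword * P_1i + w_manual * P_2i; returns (fused probabilities, argmax index)."""
    p = w_keyword * np.asarray(p_keyword, dtype=float) + w_manual * np.asarray(p_manual, dtype=float)
    return p, int(np.argmax(p))
```

For the label emotion model, `fuse([0.1, 0.9], [0.6, 0.4])` gives fused probabilities of 0.4 and 0.6, so class 1 (positive) is chosen even though the manually trained model alone favored class 0.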
The construction process of the classification models, shown in FIG. 5, involves the following steps:
First, data balancing is performed. The classes in the classification data may be imbalanced, which strongly affects overall classification accuracy. The invention adopts an up-sampling (oversampling) strategy, i.e. replicating the samples of minority classes multiple times.
Second, the data set is divided. The shuffled data set is divided into a training set and a test set in the ratio 8:2.
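A minimal sketch of the balancing and 8:2 split, assuming samples are (text, label) tuples. Minority classes are topped up by random replication until they match the largest class; the function names and seed values are arbitrary choices for illustration.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Sample = Tuple[str, str]  # (comment text, label)


def oversample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Replicate random minority-class samples until every class is as large as the largest."""
    by_label = defaultdict(list)
    for s in samples:
        by_label[s[1]].append(s)
    target = max(len(g) for g in by_label.values())
    rng = random.Random(seed)
    balanced: List[Sample] = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def shuffle_split(samples: List[Sample], train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle, then cut into training and test sets (8:2 by default)."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_ratio)
    return data[:cut], data[cut:]
```

Because balancing happens before the split here, as in the description, copies of the same minority sample can land in both the training and test sets, which tends to inflate test scores; oversampling only the training portion after splitting avoids this.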
Then, features are extracted. The corpus of the training set is segmented into words, stop words are removed, text features are extracted with the TF-IDF (term frequency-inverse document frequency) algorithm, and the chi-square value (CHI2, χ²) of each feature is computed; with a threshold K (an integer), the K features with the highest chi-square values are kept, which reduces the feature dimension.
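The feature extraction step can be sketched in plain Python as below, assuming documents are already segmented with stop words removed. The idf form ln(n/df) + 1 and the document-frequency chi-square statistic (computed per class from a 2×2 term/class contingency table, keeping each term's maximum over classes) are common textbook choices assumed here, not details given in the source. scikit-learn's `TfidfVectorizer` with `SelectKBest(chi2, k=K)` is a ready-made alternative, although its `chi2` scores the feature values themselves rather than document counts.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights per document, with idf = ln(n / df) + 1 (assumed form)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({w: c / len(d) * idf[w] for w, c in tf.items()} if d else {})
    return vecs


def chi2_scores(docs: List[List[str]], labels: List[str]) -> Dict[str, float]:
    """Max over classes of the 2x2 chi-square between term presence and class."""
    n = len(docs)
    sets = [set(d) for d in docs]
    scores = {}
    for w in set().union(*sets):
        best = 0.0
        for cls in set(labels):
            a = sum(1 for s, y in zip(sets, labels) if w in s and y == cls)      # term, class
            b = sum(1 for s, y in zip(sets, labels) if w in s and y != cls)      # term, other
            c = sum(1 for s, y in zip(sets, labels) if w not in s and y == cls)  # no term, class
            d = n - a - b - c                                                    # neither
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[w] = best
    return scores


def select_top_k(vecs: List[Dict[str, float]], scores: Dict[str, float],
                 k: int) -> Tuple[List[Dict[str, float]], Set[str]]:
    """Keep only the k terms with the highest chi-square values."""
    keep = set(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    return [{w: v for w, v in vec.items() if w in keep} for vec in vecs], keep
```

Terms that occur equally often in every class, such as a generic "good" shared by plot and acting comments, score zero and are the first to be dropped.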
Finally, the data are fed into a random forest classification model for training, after which the model is saved and evaluated.
Step S6, automatic generation of comment emotion labels. Once the comment viewpoint emotion analysis model has been trained, new film comments can be marked automatically; the emotion prediction process is shown in FIG. 6. First, comment viewpoint extraction is performed to obtain (viewpoint word, emotion word) pairs; if they can be obtained, keyword matching, comprising label category matching and emotion word matching, is performed, and if both succeed the result is output directly. Otherwise, the comment label classification model and/or the label emotion classification model is called to predict the label category and the label emotion; two thresholds T1 and T2 are set, and (comment label category mark, emotion tendency mark) is output if the label category prediction probability P1 is greater than T1 and the label emotion prediction probability P2 is greater than T2.
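The decision flow of FIG. 6 can be sketched with the extraction, matching and classification steps passed in as functions; these are hypothetical stand-ins for the components described above, not the patent's own interfaces. The thresholds default to 0.5 purely for illustration, and falling back to the models only when no extracted pair matches is an assumption about how partially matched sentences are handled.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]


def label_comment(sentence: str,
                  extract: Callable[[str], List[Pair]],
                  keyword_match: Callable[[str, str], Optional[Pair]],
                  predict_tag: Callable[[str], Tuple[str, float]],
                  predict_emotion: Callable[[str], Tuple[str, float]],
                  t1: float = 0.5, t2: float = 0.5) -> List[Pair]:
    """Return (label category, emotion tendency) marks for one comment sentence."""
    # 1. Rule-based extraction followed by dictionary (keyword) matching.
    results = []
    for viewpoint, emotion in extract(sentence):
        matched = keyword_match(viewpoint, emotion)
        if matched is not None:
            results.append(matched)
    if results:
        return results
    # 2. Fallback: fused classifiers, accepted only above both thresholds.
    tag, p1 = predict_tag(sentence)
    emo, p2 = predict_emotion(sentence)
    if p1 > t1 and p2 > t2:
        return [(tag, emo)]
    return []  # low confidence: no mark is output
```

Keeping the high-precision dictionary path first and gating the classifier output with T1 and T2 trades coverage for precision; raising the thresholds leaves more comments unmarked but makes the emitted marks more reliable.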
The above embodiments are provided only to illustrate the present invention, not to limit it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention; therefore, all equivalent technical solutions also fall within the scope of the invention, which is defined by the claims.

Claims (7)

1. A movie comment viewpoint emotion tendency analysis method, characterized by comprising the following steps:
step S1, crawling film description information and comment information for a plurality of films of each category from a film review website;
step S2, preprocessing the collected film description information and comment information;
step S3, formulating a plurality of comment viewpoint extraction rules, using them to obtain viewpoint words and emotion words from each comment sentence in the comment content of the comment information, and then storing all viewpoint words in a comment label word library and all emotion words in a viewpoint emotion word library;
step S4, performing comment label category marking and emotion tendency marking on each comment sentence by keyword-matching marking or manual marking;
step S5, generating a comment viewpoint emotion analysis model consisting of a comment label classification model and a label emotion classification model;
step S6, for target movie comments, automatically generating comment label category marks and emotion tendency marks using the comment viewpoint emotion analysis model;
the data preprocessing comprises the following steps:
integrating all collected comment information into a comment corpus;
removing repeated data in the comment corpus;
deleting data with missing comment contents in the comment corpus;
converting all traditional Chinese characters in the comment corpus into simplified Chinese characters;
extracting film names, director names and actor names from the collected film description information, storing them in a user-defined dictionary, and marking them with different symbols;
the step S4 includes:
acquiring a label category dictionary and an emotion dictionary;
performing keyword-matching marking on the comment sentences from which viewpoint words and emotion words were extracted in step S3: matching the extracted viewpoint words against the label category dictionary and the extracted emotion words against the emotion dictionary, and, if both match successfully, assigning a label category mark and an emotion tendency mark to the comment sentence; otherwise, performing manual label category marking and emotion tendency marking;
for comment sentences from which no viewpoint words and emotion words were extracted in step S3, performing manual label category marking and emotion tendency marking;
the step S5 includes:
training two preliminary comment label classification models and two preliminary label emotion classification models, one of each on the keyword-matching-marked data set and one of each on the manually marked data set;
weighting and fusing the two preliminary comment label classification models to generate a final comment label classification model;
and weighting and fusing the two preliminary label emotion classification models to generate a final label emotion classification model.
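The patent does not state at which level the two preliminary models are fused or how the weights are chosen. One plausible reading is a weighted average of the two models' predicted class probabilities (soft voting), sketched below. The function `fuse`, the weights 0.4/0.6 and the probability values are hypothetical.

```python
import math
from typing import Dict, Tuple


def fuse(probs_a: Dict[str, float], probs_b: Dict[str, float],
         w_a: float = 0.5, w_b: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Weighted average of two models' class probabilities (soft voting).
    Returns the best label and the fused distribution."""
    if not math.isclose(w_a + w_b, 1.0):
        raise ValueError("weights must sum to 1")
    classes = set(probs_a) | set(probs_b)
    fused = {c: w_a * probs_a.get(c, 0.0) + w_b * probs_b.get(c, 0.0) for c in classes}
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical outputs of the keyword-matching-trained and manually-trained models.
keyword_model = {"plot": 0.7, "actor": 0.3}
manual_model = {"plot": 0.4, "actor": 0.6}
best, fused = fuse(keyword_model, manual_model, w_a=0.4, w_b=0.6)
print(best)  # plot, since 0.4*0.7 + 0.6*0.4 = 0.52 > 0.48
```

Keeping the weights summing to 1 means the fused values remain a probability distribution, so the thresholds T1 and T2 of step S6 can be applied to them directly.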
2. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein in step S1, the film categories include: romance, animation, action, science fiction, horror, comedy and suspense;
the film description information comprises a film name, a director name, a genre and a total score;
the comment information includes: the commenter's nickname, the number of 'useful' votes on the comment, the comment time, the comment content and the score.
3. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein step S3 includes:
constructing a plurality of comment viewpoint extraction rules based on the dependency syntax structures, the parts of speech of words, and the expression structures of viewpoint words and emotion words in comment viewpoints;
performing sentence segmentation, word segmentation, part-of-speech tagging and dependency parsing on the comment content in the comment corpus to obtain comment sentences, and checking whether each comment sentence matches a comment viewpoint extraction rule; if it matches, obtaining its viewpoint words and emotion words;
and respectively storing all the acquired viewpoint words and emotion words as a comment label word library and a viewpoint emotion word library.
4. The movie comment viewpoint emotion tendency analysis method according to claim 3, wherein the dependency syntax structures include: the subject-predicate structure, the verb-object structure, the attributive-head structure, the adverbial-head structure, the verb-complement structure and the coordinate structure;
the parts of speech of the words include: a subject component, an object or object-like component, an attributive component, and a noun component; an object-like component refers to an indirect object or an object-like structure;
the expression structures of viewpoint words and emotion words are: the subject component is a viewpoint word and the object or object-like component is an emotion word; or the attributive component is an emotion word and the noun component it modifies is a viewpoint word.
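The two expression structures of claim 4 can be sketched as rules over a dependency parse. The patent does not name a parser. This sketch assumes LTP-style relation labels (SBV subject-predicate, VOB verb-object, CMP verb-complement, ATT attributive, ADV adverbial, RAD right adjunct) and tags of the form "n" for noun, "a" for adjective and "v" for verb. The parses are written by hand, and the `Token` type and `extract_pairs` function are hypothetical. Treating an adjectival predicate as the emotion word is an interpretation of the "object-like component".

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    pos: str   # part-of-speech tag: "n" noun, "a" adjective, "v" verb, ...
    head: int  # 1-based index of the head token; 0 marks the root
    rel: str   # dependency relation: "SBV", "VOB", "CMP", "ATT", ...


def extract_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (viewpoint_word, emotion_word) pairs found by two rules."""
    pairs = []
    for tok in tokens:
        if tok.head == 0:
            continue
        head = tokens[tok.head - 1]
        if tok.rel == "SBV" and tok.pos.startswith("n"):
            # Rule 1 (subject-predicate): the noun subject is the viewpoint word.
            if head.pos == "a":
                # Adjectival predicate, e.g. "剧情 很 精彩" (the plot is very exciting).
                pairs.append((tok.word, head.word))
            else:
                # Verbal predicate: take an adjectival object or complement as the emotion word.
                for dep in tokens:
                    if dep.head == tok.head and dep.rel in ("VOB", "CMP") and dep.pos == "a":
                        pairs.append((tok.word, dep.word))
        elif tok.rel == "ATT" and tok.pos == "a" and head.pos.startswith("n"):
            # Rule 2 (attributive-head): the adjective modifier is the emotion word and
            # the noun it modifies is the viewpoint word, e.g. "精彩 的 剧情".
            pairs.append((head.word, tok.word))
    return pairs


# Hand-written parses (in practice these would come from a dependency parser).
s1 = [Token("剧情", "n", 3, "SBV"), Token("很", "d", 3, "ADV"), Token("精彩", "a", 0, "HED")]
s2 = [Token("精彩", "a", 3, "ATT"), Token("的", "u", 1, "RAD"), Token("剧情", "n", 0, "HED")]
s3 = [Token("演员", "n", 2, "SBV"), Token("演", "v", 0, "HED"),
      Token("得", "u", 2, "RAD"), Token("好", "a", 2, "CMP")]
for s in (s1, s2, s3):
    print(extract_pairs(s))  # [('剧情', '精彩')] twice, then [('演员', '好')]
```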
5. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein obtaining the label category dictionary includes:
marking the film names, director names and actor names from the user-defined dictionary that appear in the comment label word library as 'film', 'director' and 'actor' respectively;
training a word vector model on the comment sentences to obtain a trained word vector model;
representing the words in the comment label word library with the trained word vector model, and clustering them into k categories with the k-means clustering algorithm;
through manual induction and screening, dividing movie review opinions into 8 dimensions, namely 'director', 'cinematography', 'plot', 'actor', 'emotion', 'audio-visual', 'subject matter' and 'impression'; screening the words in each cluster and keeping the relevant ones to form a preliminary label category dictionary;
using the trained word vector model to obtain words related to the label category words in the preliminary label category dictionary, expanding the dictionary with them, and removing duplicate words to generate the final label category dictionary;
the emotion dictionary is obtained as follows: first, collecting open-source positive and negative emotion dictionaries and sorting and merging them; then counting word frequencies in the viewpoint emotion word library and keeping all words whose frequency exceeds a set threshold; finally, manually deleting words unrelated to movie comment emotion to form the emotion dictionary.
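The clustering and dictionary-expansion steps of claim 5 can be sketched without external libraries as below. The patent names neither the word vector model nor the value of k. The toy 2-D vectors stand in for trained embeddings, and the English words stand in for Chinese label words. The similarity threshold and the function names are hypothetical, and the manual screening that happens between clustering and expansion is not automated here. With real embeddings, gensim's `Word2Vec` and `KeyedVectors.most_similar` would be one existing option for the vector and related-word steps.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns a cluster index per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(seed_words, embeddings, threshold=0.9):
    """Add every vocabulary word whose cosine similarity to a seed word reaches the threshold."""
    expanded = set(seed_words)
    for w in seed_words:
        if w not in embeddings:
            continue
        for other, vec in embeddings.items():
            if cosine(embeddings[w], vec) >= threshold:
                expanded.add(other)
    return sorted(expanded)  # a set, so duplicate words are removed automatically


# Toy 2-D "word vectors" standing in for a trained word vector model.
emb = {"plot": [1.0, 0.1], "storyline": [0.9, 0.15], "actor": [0.1, 1.0], "cast": [0.15, 0.9]}
words = list(emb)
clusters = kmeans([emb[w] for w in words], k=2)
print(dict(zip(words, clusters)))  # plot/storyline share one cluster, actor/cast the other
print(expand(["plot"], emb))  # ['plot', 'storyline']
```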
6. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein generating a preliminary comment label classification model or a preliminary label emotion classification model includes:
balancing the keyword-matching-marked data set and the manually marked data set with an upsampling strategy;
dividing the balanced keyword-matching-marked data set and manually marked data set into a training set and a test set according to a preset ratio;
segmenting the training-set corpus into words, removing stop words, extracting text features with the TF-IDF algorithm, and calculating the chi-square value of each feature for feature dimension reduction;
and feeding the data into a random forest classification model for model training, saving and evaluation.
7. The movie comment viewpoint emotion tendency analysis method according to claim 1, wherein step S6 includes:
extracting viewpoint words and emotion words; if they can be obtained, performing keyword matching, including label category matching and emotion word matching, and if both match successfully, directly outputting the label category mark and emotion tendency mark; otherwise, calling the comment label classification model and/or the label emotion classification model to predict the label category and the label emotion, setting two thresholds T1 and T2, and outputting the label category mark and emotion tendency mark if the label category prediction probability P1 is greater than T1 and the label emotion prediction probability P2 is greater than T2.
CN201911082409.1A 2019-11-07 2019-11-07 Movie comment viewpoint emotion tendency analysis method Active CN110825876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911082409.1A CN110825876B (en) 2019-11-07 2019-11-07 Movie comment viewpoint emotion tendency analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911082409.1A CN110825876B (en) 2019-11-07 2019-11-07 Movie comment viewpoint emotion tendency analysis method

Publications (2)

Publication Number Publication Date
CN110825876A CN110825876A (en) 2020-02-21
CN110825876B true CN110825876B (en) 2022-07-15

Family

ID=69553492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911082409.1A Active CN110825876B (en) 2019-11-07 2019-11-07 Movie comment viewpoint emotion tendency analysis method

Country Status (1)

Country Link
CN (1) CN110825876B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111565322B (en) * 2020-05-14 2022-03-04 北京奇艺世纪科技有限公司 User emotional tendency information obtaining method and device and electronic equipment
CN111666767B (en) * 2020-06-10 2023-07-18 创新奇智(上海)科技有限公司 Data identification method and device, electronic equipment and storage medium
CN111966944B (en) * 2020-08-17 2024-04-09 中电科大数据研究院有限公司 Model construction method for multi-level user comment security audit
CN112214661B (en) * 2020-10-12 2022-04-08 西华大学 Emotional unstable user detection method for conventional video comments
CN112215003A (en) * 2020-11-09 2021-01-12 深圳市洪堡智慧餐饮科技有限公司 Comment label extraction method based on albert pre-training model and kmean algorithm
CN112651211A (en) * 2020-12-11 2021-04-13 北京大米科技有限公司 Label information determination method, device, server and storage medium
CN112527963A (en) * 2020-12-17 2021-03-19 深圳市欢太科技有限公司 Multi-label emotion classification method and device based on dictionary, equipment and storage medium
CN112612873B (en) * 2020-12-25 2023-07-07 上海德拓信息技术股份有限公司 Centralized event mining method based on NLP technology
CN113127640B (en) * 2021-03-12 2022-11-29 嘉兴职业技术学院 Malicious spam comment attack identification method based on natural language processing
CN113065052A (en) * 2021-04-07 2021-07-02 顶象科技有限公司 Method and device for analyzing authenticity of video comment, electronic equipment and storage medium
CN113312478B (en) * 2021-04-25 2022-07-19 国家计算机网络与信息安全管理中心 Viewpoint mining method and device based on reading understanding
CN113536080B (en) * 2021-07-20 2023-06-20 湖南快乐阳光互动娱乐传媒有限公司 Data uploading method and device and electronic equipment
CN113515663A (en) * 2021-08-03 2021-10-19 广州酷狗计算机科技有限公司 Comment information display method and device, electronic equipment and storage medium
CN113961725A (en) * 2021-10-25 2022-01-21 北京明略软件系统有限公司 Automatic label labeling method, system, equipment and storage medium
CN115392199B (en) * 2022-08-22 2023-08-04 再惠(上海)网络科技有限公司 Evaluation analysis and report generation method, device, electronic equipment and storage medium
CN116644754B (en) * 2023-05-31 2024-04-16 金智东博(北京)教育科技股份有限公司 Internet financial product comment viewpoint extraction method based on big data

Citations (2)

Publication number Priority date Publication date Assignee Title
CN104462487A (en) * 2014-12-19 2015-03-25 南开大学 Individualized online news comment mood forecast method capable of fusing multiple information sources
CN106407236A (en) * 2015-08-03 2017-02-15 北京众荟信息技术有限公司 An emotion tendency detection method for comment data

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US9275361B2 (en) * 2013-01-11 2016-03-01 Tagnetics, Inc. Out of stock sensor
CN103279460B (en) * 2013-05-24 2017-02-08 北京尚友通达信息技术有限公司 Method for analyzing and processing online shopping comments
CN105117428B (en) * 2015-08-04 2018-12-04 电子科技大学 A kind of web comment sentiment analysis method based on word alignment model
CN105354183A (en) * 2015-10-19 2016-02-24 Tcl集团股份有限公司 Analytic method, apparatus and system for internet comments of household electrical appliance products
CN106096664B (en) * 2016-06-23 2019-09-20 广州云数信息科技有限公司 A kind of sentiment analysis method based on social network data
CN106156004B (en) * 2016-07-04 2019-03-26 中国传媒大学 The sentiment analysis system and method for film comment information based on term vector
CN106649519B (en) * 2016-10-17 2020-11-27 北京邮电大学 Product characteristic mining and evaluating method
CN108108433A (en) * 2017-12-19 2018-06-01 杭州电子科技大学 A kind of rule-based and the data network integration sentiment analysis method
CN108108468A (en) * 2017-12-29 2018-06-01 华中科技大学鄂州工业技术研究院 A kind of short text sentiment analysis method and apparatus based on concept and text emotion
CN108460010A (en) * 2018-01-17 2018-08-28 南京邮电大学 A kind of comprehensive grade model implementation method based on sentiment analysis
CN109684647B (en) * 2019-02-19 2020-07-24 东北林业大学 Movie comment sentiment analysis method and device

Also Published As

Publication number Publication date
CN110825876A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN110825876B (en) Movie comment viewpoint emotion tendency analysis method
Eirinaki et al. Feature-based opinion mining and ranking
Basiri et al. Sentence-level sentiment analysis in Persian
Singh et al. Sentiment analysis of textual reviews; Evaluating machine learning, unsupervised and SentiWordNet approaches
CN106407420B (en) Multimedia resource recommendation method and system
Lima et al. Automatic sentiment analysis of Twitter messages
US20120029908A1 (en) Information processing device, related sentence providing method, and program
Ahlgren Research on sentiment analysis: the first decade
CN108491512A (en) The method of abstracting and device of headline
CN108460150A (en) The processing method and processing device of headline
Tiwari et al. Ensemble approach for twitter sentiment analysis
Leopairote et al. Software quality in use characteristic mining from customer reviews
CN108399265A (en) Real-time hot news providing method based on search and device
CN108470026A (en) The sentence trunk method for extracting content and device of headline
CN108363700A (en) The method for evaluating quality and device of headline
Nugraha et al. Typographic-based data augmentation to improve a question retrieval in short dialogue system
Grivolla et al. A hybrid recommender combining user, item and interaction data
Yao et al. Online deception detection refueled by real world data collection
Urriza et al. Aspect-based sentiment analysis of user created game reviews
Sindhu et al. Opinionated text classification for hindi tweets using deep learning
Li et al. Confidence estimation and reputation analysis in aspect extraction
Clarizia et al. Sentiment analysis in social networks: A methodology based on the latent dirichlet allocation approach
Dadoun et al. Sentiment Classification Techniques Applied to Swedish Tweets Investigating the Effects of translation on Sentiments from Swedish into English
CN107729509A (en) The chapter similarity decision method represented based on recessive higher-dimension distributed nature
Koorathota et al. Editing like humans: a contextual, multimodal framework for automated video editing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant