CN115587178A

CN115587178A - Automobile comment analysis method

Info

Publication number: CN115587178A
Application number: CN202211096916.2A
Authority: CN
Inventors: 皋勋; 韩骅; 许多
Original assignee: Shanghai Wangshang E Commerce Co ltd
Current assignee: Shanghai Wangshang E Commerce Co ltd
Priority date: 2022-09-08
Filing date: 2022-09-08
Publication date: 2023-01-10

Abstract

The application provides an analysis method for automobile comments, which comprises: obtaining automobile comments from a network forum by using a Python web crawler; preprocessing the obtained automobile comments; constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool, thereby removing objective or neutral comments; constructing a theme recognition model and recognizing the themes of the subjective comment text set; and constructing an emotion analysis classifier to obtain the emotional tendency under each theme. By removing the large volume of useless comment data, the method reduces the noise introduced into the analysis and improves both the efficiency of subsequent analysis and the accuracy of emotion classification.

Description

Automobile comment analysis method

Technical Field

The application relates to an analysis method, in particular to an analysis method of automobile comments.

Background

With the rapid development of science and technology, the internet has become a constant part of people's lives, and the number of consumer comments on the internet is growing exponentially. Online comments on automobile products are a valuable resource: they contain users' evaluations of many aspects of a product, such as what they find most satisfactory, what they find least satisfactory, space, power, and cost performance. These evaluations reflect the real experience of consumers and the emotional tendency of buyers toward the product, and are therefore a valuable resource for automobile enterprises. By analyzing automobile comments, automobile manufacturers can find problems and learn user requirements in a timely manner, which helps enterprises improve product design.

Currently, emotion analysis methods fall mainly into three categories: methods based on an emotion dictionary, methods based on traditional machine learning, and methods based on deep learning. Conventional emotion analysis methods classify emotion directly after preprocessing steps such as cleaning and labeling the automobile comments. Objective or neutral comments are thereby carried into the analysis unnecessarily, which causes two problems for the subsequent emotion analysis: 1. a large volume of useless data is introduced, which reduces analysis efficiency; 2. compared with subjective comments, these objective or neutral comments introduce noise, which reduces the accuracy of the subsequent emotion classification.

Therefore, how to improve the efficiency and accuracy of automobile comment analysis has become a technical problem to be solved.

Disclosure of Invention

The present application is made in view of the above problems, so as to provide an analysis method of automobile reviews, which is used for improving the analysis efficiency and accuracy of automobile reviews.

The application provides an analysis method of automobile comments, which comprises the following steps:

S1, acquiring automobile comments in a web forum by using a Python web crawler;

S2, preprocessing the acquired automobile comments;

S3, constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool, and removing objective comments or neutral comments;

S4, constructing a theme identification model to identify the theme of the subjective comment text set;

S5, constructing an emotion analysis classifier, and acquiring the emotional tendency of each theme;

wherein the step S3 specifically includes:

step S31, translating each automobile comment, and calculating the subjective score of each automobile comment by using a subjective analysis tool;

step S32, removing objective comments or neutral comments based on the subjective score and the subjective threshold value to obtain a first comment text set;

step S33, calculating the emotion score of each automobile comment in the first comment text set by using an emotion analysis tool;

and step S34, removing the objective comments or the neutral comments again based on the emotion scores and the emotion threshold values to obtain a subjective comment text set.

Further, step S2 further specifically includes:

step S21: removing repeated data;

step S22: data cleaning;

step S23: splitting a sentence;

step S24: stop words are removed.

Further, the subjective analysis tool is the TextBlob tool in Python.

Further, when the subjective score is larger than or equal to the subjective threshold, the automobile comment is considered to be subjective, the automobile comment is reserved, and when the subjective score is smaller than the subjective threshold, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.

Further, the emotion analysis tool is a SnowNLP tool in Python.

Further, when the emotion score is greater than or equal to a first emotion threshold value or the emotion score is less than or equal to a second emotion threshold value, the automobile comment is considered to be positive or negative, namely subjective, the automobile comment is reserved, when the emotion score is less than the first emotion threshold value and the emotion score is greater than the second emotion threshold value, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.

Further, step S4 further specifically includes:

S41, constructing a theme recognition model by adopting a Word2Vec method;

S42, constructing a part-of-speech recognition module and extracting nouns in the subjective comment text set;

and S43, identifying the theme of the subjective comment text set by using the theme identification model.

Further, the emotion analysis classifier is constructed by using an LSTM method.

The beneficial effect of this application is:

The application provides an analysis method for automobile comments in which the automobile comments are analyzed by using the TextBlob tool and the SnowNLP tool together to construct a subjective comment text set. The large volume of useless comment data is removed, the introduction of noise is reduced, and the efficiency of subsequent analysis and the accuracy of emotion classification are improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following descriptions are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart of a method for analyzing automobile reviews provided by the present application;

FIG. 2 is a flow chart of the pre-processing provided herein;

FIG. 3 is a flow chart for constructing a subjective comment text set provided by the present application;

FIG. 4 is a flow chart of identifying the themes of the subjective comment text set provided by the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the embodiments of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality of" typically includes at least two, but does not exclude the case of at least one.

The present application is further described with reference to the following figures and specific examples.

FIG. 1 is a flowchart of an analysis method for automobile comments according to an embodiment of the present application. As shown in FIG. 1, the analysis method includes:

Step S1, acquiring the automobile comments in the network forum by using a Python web crawler.

The internet forum provides functions of user evaluation, consultation, question answering and the like, and the user can evaluate automobile price, product quality, service quality and the like on the internet forum.
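As a non-limiting illustration of step S1, the following sketch extracts comment text from a forum page using only the Python standard library. The application does not specify the forum, its URL, or its page structure, so the CSS class name `comment`, the `User-Agent` header, and the helper names `CommentExtractor`, `extract_comments`, and `fetch_comments` are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Tags that never have a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class CommentExtractor(HTMLParser):
    """Collects the text of every element whose class list contains css_class."""

    def __init__(self, css_class="comment"):  # class name is hypothetical
        super().__init__()
        self.css_class = css_class
        self.comments = []
        self._depth = 0    # nesting depth inside a matching element
        self._buffer = []  # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            if text:
                self.comments.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_comments(html, css_class="comment"):
    """Return the comment texts found in an HTML string."""
    parser = CommentExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.comments


def fetch_comments(url, css_class="comment", timeout=10):
    """Download one forum page (URL supplied by the caller) and extract its comments."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_comments(html, css_class)
```

In practice the selector would have to be adapted to the actual forum markup, and a real crawler would also need pagination handling, rate limiting, and compliance with the forum's terms of use.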

Step S2, preprocessing the acquired automobile comments.

Data preprocessing is necessary, because the quality of the data has a great influence on the subsequent emotion analysis. The data is generally text acquired from the network forum. Such text is unstructured and contains many kinds of noise that affect the results, such as punctuation, emoticons, and stop words, which must be processed.

Further, in another embodiment of the present application, as shown in fig. 2, step S2 further specifically includes:

Step S21: removing duplicate data.

Because the comments in the network forum are acquired by a web crawler, duplicate information is inevitably captured, so the comment set contains duplicates. Duplicates increase the training time of the model and seriously reduce the reliability of the parameters learned during training, which in turn lowers every index of the final classification. It is therefore necessary to filter out the duplicate information.
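A minimal sketch of step S21 is given below. It assumes that two comments count as duplicates when their text is identical after surrounding whitespace is stripped; the application does not define the duplicate criterion, and the function name `remove_duplicates` is hypothetical.

```python
def remove_duplicates(comments):
    """Drop empty and repeated comments while keeping the original order."""
    seen = set()
    unique = []
    for comment in comments:
        key = comment.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Keeping the first occurrence and preserving order makes the result reproducible across crawler runs; near-duplicate detection would require a different criterion.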

Step S22: data cleaning.

More and more reviewers express their opinions using repeated punctuation or emoticons, which are irrelevant to the analysis and must be cleaned. Data cleaning deletes interference such as non-Chinese characters, spaces, digits, punctuation marks, symbols, abbreviations, and unrecognizable words.
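One simple way to realize the cleaning of step S22 is a single regular expression that keeps only Chinese characters. The sketch below assumes that the CJK Unified Ideographs range U+4E00 to U+9FFF is an adequate definition of "Chinese character" for this purpose; the application does not state the exact character set, and the name `clean_comment` is hypothetical.

```python
import re

# Matches every run of characters outside the basic CJK Unified Ideographs block,
# which covers spaces, digits, Latin letters, punctuation, and emoticons.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def clean_comment(text):
    """Delete everything except Chinese characters from a comment."""
    return NON_CHINESE.sub("", text)
```

Because punctuation is deleted as well, any splitting that relies on punctuation would have to happen before this step.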

Step S23: splitting sentences.

Each sentence is split at adversative (turning) words, including "but", "however", "instead", and so on.
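The splitting of step S23 can be sketched with a regular expression built from a list of turning words. The Chinese word list below is an assumption chosen for illustration, since the application names only the English equivalents "but", "however", and "instead"; the names `TURNING_WORDS` and `split_on_turning_words` are hypothetical.

```python
import re

# Illustrative list of Chinese turning words; not taken from the application.
TURNING_WORDS = ("但是", "然而", "不过", "可是", "反而", "但")

# Longer words come first so that "但是" is matched before its prefix "但".
_TURNING_PATTERN = re.compile(
    "|".join(re.escape(w) for w in sorted(TURNING_WORDS, key=len, reverse=True))
)


def split_on_turning_words(sentence):
    """Split a sentence into clauses at each turning word, dropping empty pieces."""
    return [piece for piece in _TURNING_PATTERN.split(sentence) if piece]
```

Splitting at turning words separates clauses that often carry opposite emotions, such as praise for appearance followed by a complaint about fuel consumption. A plain substring match can also split inside longer words that happen to contain a turning word, so a production version would likely split on segmented tokens instead.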

Step S24: stop words are removed.

Unimportant information is removed by using a common stop word list.
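Step S24 can be sketched as a filter over segmented words. The application does not name a word segmentation tool or a specific stop word list, so the use of the third-party `jieba` package for segmentation and the sample stop words in the usage note are assumptions; `segment` and `remove_stop_words` are hypothetical names.

```python
def segment(text):
    """Segment Chinese text into words (assumes the third-party jieba package)."""
    import jieba  # imported lazily so the filter below works without it

    return jieba.lcut(text)


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that do not appear in the stop word collection."""
    stop_words = set(stop_words)
    return [token for token in tokens if token not in stop_words]
```

For example, `remove_stop_words(["这", "车", "的", "空间", "很", "大", "了"], {"这", "的", "了"})` keeps `["车", "空间", "很", "大"]`. Degree adverbs such as "很" are left in this sample list on purpose, since they may carry emotional weight.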

Step S3, constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool.

Objective or neutral comments are removed from the preprocessed comment texts to construct a text set containing only subjective comments.

Further, in another embodiment of the present application, as shown in fig. 3, the step S3 includes:

Step S31, translating each automobile comment, and calculating the subjective score of each automobile comment by using a subjective analysis tool.

Specifically, the subjective analysis tool is preferably a TextBlob tool in Python, and each automobile comment is subjectively analyzed by using the TextBlob tool to calculate a corresponding subjective score.

Step S32, removing objective comments or neutral comments based on the subjective score and the subjective threshold value to obtain a first comment text set.

Specifically, when the subjective score is greater than or equal to the subjective threshold, the automobile comment is considered to be subjective, the automobile comment is retained, and when the subjective score is smaller than the subjective threshold, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.

In this embodiment of the present application, when the TextBlob tool performs subjective analysis, the subjective score may be divided into two intervals: [0,0.2) is an objective comment, and [0.2,1] is a subjective comment.

At present, the only subjective analysis tool used is the TextBlob tool in Python. However, TextBlob is designed for English text and cannot process Chinese text, so each Chinese automobile comment is first translated into English and then subjectively analyzed to obtain the first comment text set.
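Steps S31 and S32 can be sketched as follows, using the threshold 0.2 from this embodiment. The application does not say how the comments are translated, so `translate` is a caller-supplied function and purely an assumption; the names `textblob_subjectivity` and `filter_subjective` are hypothetical. The TextBlob call itself, `TextBlob(text).sentiment.subjectivity`, returns a value between 0 (objective) and 1 (subjective).

```python
SUBJECTIVE_THRESHOLD = 0.2  # subjective threshold used in this embodiment


def textblob_subjectivity(english_text):
    """Subjective score in [0, 1] from the third-party TextBlob package."""
    from textblob import TextBlob  # imported lazily; requires `pip install textblob`

    return TextBlob(english_text).sentiment.subjectivity


def filter_subjective(comments, translate, score=textblob_subjectivity,
                      threshold=SUBJECTIVE_THRESHOLD):
    """Return the first comment text set: comments whose translated text
    scores at or above the subjective threshold.

    `translate` maps a Chinese comment to English text; which translation
    service is used is left open here.
    """
    first_set = []
    for comment in comments:
        if score(translate(comment)) >= threshold:
            first_set.append(comment)  # keep the original Chinese text
    return first_set
```

The original Chinese text, not the translation, is kept in the first comment text set, because the following emotion analysis operates on Chinese.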

Step S33, calculating the emotion score of each automobile comment in the first comment text set by using an emotion analysis tool.

Specifically, the emotion analysis tool is preferably the SnowNLP tool in Python, and emotion analysis is performed on each automobile comment by using the SnowNLP tool to calculate a corresponding emotion score.

Step S34, removing the objective comments or the neutral comments again based on the emotion scores and the emotion threshold values to obtain the subjective comment text set.

Specifically, when the emotion score is greater than or equal to a first emotion threshold value or the emotion score is less than or equal to a second emotion threshold value, the automobile comment is considered to be positive or negative, namely subjective, and the automobile comment is retained; when the emotion score is less than the first emotion threshold value and greater than the second emotion threshold value, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.

In this embodiment of the present application, during emotion analysis, the SnowNLP tool may divide the emotion degree into three intervals: [0,0.4] is a negative comment, (0.4,0.6) is a neutral comment, and [0.6,1] is a positive comment.
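The second filtering pass can be sketched with the interval boundaries 0.4 and 0.6 given above. The SnowNLP call `SnowNLP(text).sentiments` returns a value between 0 and 1, where values near 1 indicate positive emotion; the function names `snownlp_sentiment`, `classify_emotion`, and `filter_emotional` are hypothetical.

```python
NEGATIVE_MAX = 0.4  # second emotion threshold: scores at or below are negative
POSITIVE_MIN = 0.6  # first emotion threshold: scores at or above are positive


def snownlp_sentiment(chinese_text):
    """Emotion score in [0, 1] from the third-party SnowNLP package."""
    from snownlp import SnowNLP  # imported lazily; requires `pip install snownlp`

    return SnowNLP(chinese_text).sentiments


def classify_emotion(score):
    """Map an emotion score to one of the three intervals of the embodiment."""
    if score <= NEGATIVE_MAX:
        return "negative"
    if score >= POSITIVE_MIN:
        return "positive"
    return "neutral"


def filter_emotional(comments, score=snownlp_sentiment):
    """Return the subjective comment text set: comments that are not neutral."""
    return [c for c in comments if classify_emotion(score(c)) != "neutral"]
```

Passing the scorer as a parameter keeps the threshold logic independent of SnowNLP, so the same filter can be exercised with any function that returns a score in [0, 1].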

In the application, the subjective analysis tool requires the Chinese automobile comment text to be translated, and the translated English text may not accurately express the original comment, so the subjectivity judgment may be wrong. For this reason, objective or neutral comments are removed a second time by using the emotion analysis tool.

In the application, the automobile comments are analyzed by using the TextBlob tool and the SnowNLP tool together to construct a subjective comment text set. The large volume of useless comment data is removed, the introduction of noise is reduced, and the analysis efficiency and the accuracy of subsequent emotion classification are improved.

Step S4, constructing a theme recognition model to recognize the theme of the subjective comment text set.

Further, in another embodiment of the present application, as shown in fig. 4, the step S4 includes:

Step S41, constructing a theme recognition model by adopting a Word2Vec method.

Specifically, the Word2Vec method can reduce vector dimension, reduce computational complexity, fully capture context semantic information, and is very suitable for text comments, so that the Word2Vec method is adopted to construct the topic identification model.
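The application does not describe how the Word2Vec vectors are turned into theme labels. One possible approach, shown here purely as an assumption, is to train word vectors on the segmented comments and assign each word to the theme word whose vector is closest by cosine similarity. The training helper assumes the third-party gensim package (version 4.x, where the dimension parameter is named `vector_size`); the names `train_word_vectors`, `cosine`, and `nearest_theme` are hypothetical.

```python
import math


def train_word_vectors(tokenized_comments, vector_size=100):
    """Train Word2Vec on lists of tokens and return a word -> vector dict.

    Assumes gensim 4.x; imported lazily so the functions below work without it.
    """
    from gensim.models import Word2Vec

    model = Word2Vec(sentences=tokenized_comments, vector_size=vector_size,
                     window=5, min_count=1, sg=1)
    return {word: model.wv[word].tolist() for word in model.wv.index_to_key}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_theme(word, vectors, theme_words):
    """Return the theme word most similar to `word`, or None if `word` is unknown."""
    if word not in vectors:
        return None
    best_theme, best_score = None, float("-inf")
    for theme in theme_words:
        if theme in vectors:
            score = cosine(vectors[word], vectors[theme])
            if score > best_score:
                best_theme, best_score = theme, score
    return best_theme
```

With this design, a word such as "加速" would be mapped to the theme "动力" whenever their vectors point in similar directions. A minimum similarity cut-off could be added so that unrelated words are not forced into a theme.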

Step S42, constructing a part-of-speech recognition module and extracting nouns in the subjective comment text set.

Specifically, automobile themes can be divided into power, control, appearance, space, fuel consumption, interior, cost performance, and the like. Since these themes are all nouns, the method constructs a part-of-speech recognition module, processes the subjective comment text set, and extracts its nouns, which further reduces the computation of the theme recognition model.
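A part-of-speech recognition module of this kind could be sketched as below. The application does not name a tagging tool, so the use of the third-party `jieba.posseg` tagger is an assumption, as are the names `tag_words` and `extract_nouns`. In jieba's tag set, noun tags begin with the letter `n`.

```python
def tag_words(text):
    """Return (word, part-of-speech tag) pairs (assumes the jieba package)."""
    import jieba.posseg as pseg  # imported lazily; requires `pip install jieba`

    return [(pair.word, pair.flag) for pair in pseg.cut(text)]


def extract_nouns(tagged_words):
    """Keep the words whose tag marks them as a noun (tag starts with 'n')."""
    return [word for word, flag in tagged_words if flag.startswith("n")]
```

Note that tags beginning with `n` also cover proper nouns such as person and place names, so a stricter filter may be preferable if only common nouns are wanted as theme candidates.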

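Step S43, identifying the theme of the subjective comment text set by using the theme recognition model.

Step S5, constructing an emotion analysis classifier and acquiring the emotional tendency under each theme.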

In the application, an emotion analysis classifier is constructed by using an LSTM method, and the emotional tendency of each theme in a subjective comment text set is acquired and displayed.
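The application does not specify the framework, architecture details, or training procedure of the LSTM classifier. The sketch below is therefore only one possible shape of step S5: a small LSTM classifier defined with PyTorch (an assumption), followed by a framework-independent helper that aggregates predicted labels into an emotional tendency per theme. All names and default sizes are hypothetical.

```python
from collections import Counter, defaultdict


def build_lstm_classifier(vocab_size, embed_dim=128, hidden_dim=64, num_classes=2):
    """Build an untrained LSTM text classifier (assumes the PyTorch package)."""
    import torch.nn as nn  # imported lazily so the helper below works without it

    class LSTMClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, num_classes)

        def forward(self, token_ids):
            # token_ids: integer tensor of shape (batch, sequence_length)
            embedded = self.embed(token_ids)
            _, (last_hidden, _) = self.lstm(embedded)
            # last_hidden[-1] is the final hidden state of the last layer
            return self.out(last_hidden[-1])

    return LSTMClassifier()


def tendency_by_theme(predictions):
    """Turn (theme, label) pairs into per-theme label proportions."""
    counts = defaultdict(Counter)
    for theme, label in predictions:
        counts[theme][label] += 1
    return {
        theme: {label: n / sum(c.values()) for label, n in c.items()}
        for theme, c in counts.items()
    }
```

The classifier outputs one score per class for each comment; after training, the predicted label of each comment is paired with its theme and passed to `tendency_by_theme`, which yields the share of positive and negative comments under every theme for display.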

The foregoing description shows and describes several preferred embodiments of the present application. As mentioned above, however, it is to be understood that the application is not limited to the forms disclosed herein, is not to be construed as excluding other embodiments, and is capable of use in various other combinations, modifications, and environments and of changes within the scope of the inventive concept described herein, commensurate with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the application shall fall within the protection of the appended claims.

Claims

1. An analysis method of automobile comments, characterized by comprising:

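S1, acquiring the automobile comments in the network forum by using a Python web crawler;

S2, preprocessing the acquired automobile comments;

S3, constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool, and removing objective comments or neutral comments;

S4, constructing a theme recognition model to recognize the theme of the subjective comment text set;

S5, constructing an emotion analysis classifier, and acquiring the emotional tendency of each theme;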

wherein the step S3 specifically includes:

step S31, translating each automobile comment, and calculating the subjective score of each automobile comment by using the subjective analysis tool;

step S32, removing the objective comments or the neutral comments based on the subjective score and the subjective threshold value to obtain a first comment text set;

step S33, calculating the emotion score of each automobile comment in the first comment text set by using the emotion analysis tool;

and step S34, removing the objective comments or the neutral comments again based on the emotion scores and the emotion threshold value to obtain the subjective comment text set.

2. The analysis method according to claim 1, wherein the step S2 further specifically comprises:

step S21: removing repeated data;

step S22: data cleaning;

step S23: splitting a sentence;

step S24: stop words are removed.

3. The analysis method according to claim 1, wherein the subjective analysis tool is the TextBlob tool in Python.

4. The analysis method according to claim 3, wherein when the subjective score is greater than or equal to the subjective threshold, the automobile comment is considered to be subjective, the automobile comment is retained, and when the subjective score is less than the subjective threshold, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.

5. The analysis method according to claim 4, wherein the emotion analysis tool is a SnowNLP tool in Python.

6. The analysis method according to claim 5, wherein when the emotion score is equal to or greater than the first emotion threshold or the emotion score is equal to or less than the second emotion threshold, the automobile comment is considered to be positive or negative, i.e., subjective, the automobile comment is retained, when the emotion score is less than the first emotion threshold and the emotion score is greater than the second emotion threshold, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.

7. The analysis method according to claim 1, wherein the step S4 further specifically comprises:

S41, constructing the theme recognition model by adopting a Word2Vec method;

S42, constructing a part-of-speech recognition module and extracting the nouns in the subjective comment text set;

and S43, identifying the theme of the subjective comment text set by using the theme recognition model.

8. The analysis method according to claim 7, wherein the emotion analysis classifier is constructed using an LSTM method.