CN115587178A - Automobile comment analysis method - Google Patents


Info

Publication number
CN115587178A
Authority
CN
China
Prior art keywords
subjective
automobile
emotion
comment
comments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211096916.2A
Other languages
Chinese (zh)
Inventor
皋勋
韩骅
许多
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Wangshang E Commerce Co ltd
Original Assignee
Shanghai Wangshang E Commerce Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Wangshang E Commerce Co ltd filed Critical Shanghai Wangshang E Commerce Co ltd
Priority to CN202211096916.2A priority Critical patent/CN115587178A/en
Publication of CN115587178A publication Critical patent/CN115587178A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/268Morphological analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The application provides a method for analyzing automobile comments. The method comprises: obtaining automobile comments from a network forum by using a Python web crawler; preprocessing the obtained comments; constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool, thereby removing objective or neutral comments; constructing a theme recognition model and recognizing the themes of the subjective comment text set; and constructing an emotion analysis classifier to obtain the emotional tendency under each theme. By removing the large volume of comment data that is of no use for the analysis, the method reduces the noise introduced and improves the efficiency of subsequent analysis and the accuracy of emotion classification.

Description

Automobile comment analysis method
Technical Field
The application relates to an analysis method, in particular to an analysis method of automobile comments.
Background
With the rapid development of science and technology, the internet has become part of everyday life, and the number of consumer comments posted online is growing exponentially. Online comments on automobile products are a valuable resource: they contain users' evaluations of many aspects of a product, such as what they find most satisfactory, what they find least satisfactory, space, power, and cost performance. These evaluations reflect consumers' genuine experience and carry buyers' emotional tendency toward the product, which makes them valuable to automobile companies. By analyzing automobile comments, manufacturers can discover problems and learn user requirements in a timely manner, which in turn drives improvements in product design.
Currently, emotion analysis methods fall mainly into three categories: methods based on an emotion dictionary, methods based on traditional machine learning, and methods based on deep learning. Conventional methods classify emotion directly after preprocessing the automobile comments, for example by cleaning and marking. Objective or neutral comments are thereby carried into the subsequent emotion analysis unnecessarily, which causes two problems: 1. a large volume of useless data is analyzed, reducing analysis efficiency; 2. compared with subjective comments, these objective or neutral comments introduce noise, reducing the accuracy of subsequent emotion classification.
Therefore, how to improve the efficiency and accuracy of automobile comment analysis is a technical problem that currently needs to be solved.
Disclosure of Invention
The present application is made in view of the above problems, so as to provide an analysis method of automobile reviews, which is used for improving the analysis efficiency and accuracy of automobile reviews.
The application provides an analysis method of automobile comments, which comprises the following steps:
step S1, acquiring automobile comments in a network forum by using a Python web crawler;
step S2, preprocessing the acquired automobile comments;
step S3, constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool, and removing objective comments or neutral comments;
step S4, constructing a theme recognition model to recognize the themes of the subjective comment text set;
step S5, constructing an emotion analysis classifier, and acquiring the emotional tendency under each theme;
wherein the step S3 specifically includes:
step S31, translating each automobile comment, and calculating the subjective score of each automobile comment by using the subjective analysis tool;
step S32, removing objective comments or neutral comments based on the subjective score and a subjective threshold to obtain a first comment text set;
step S33, calculating the emotion score of each automobile comment in the first comment text set by using the emotion analysis tool;
and step S34, removing objective comments or neutral comments again based on the emotion score and emotion thresholds to obtain the subjective comment text set.
Further, step S2 further specifically includes:
step S21: removing repeated data;
step S22: data cleaning;
step S23: splitting a sentence;
step S24: stop words are removed.
Further, the subjective analysis tool is the TextBlob tool in Python.
Further, when the subjective score is greater than or equal to the subjective threshold, the automobile comment is considered subjective and is retained; when the subjective score is less than the subjective threshold, the automobile comment is considered objective or neutral and is deleted.
Further, the emotion analysis tool is a SnowNLP tool in Python.
Further, when the emotion score is greater than or equal to a first emotion threshold, or less than or equal to a second emotion threshold, the automobile comment is considered positive or negative, that is, subjective, and is retained; when the emotion score is less than the first emotion threshold and greater than the second emotion threshold, the automobile comment is considered objective or neutral and is deleted.
Further, step S4 further specifically includes:
step S41, constructing a theme recognition model by adopting a Word2Vec method;
step S42, constructing a part-of-speech recognition module and extracting the nouns in the subjective comment text set;
and step S43, identifying the themes of the subjective comment text set by using the theme recognition model.
Further, the emotion analysis classifier is constructed by using an LSTM method.
The beneficial effect of this application is:
The application provides a method for analyzing automobile comments in which the TextBlob tool and the SnowNLP tool are used together to analyze the comments and construct a subjective comment text set. The large volume of comment data that is of no use for the analysis is removed, which reduces the noise introduced and improves the efficiency of subsequent analysis and the accuracy of emotion classification.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following descriptions are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a method for analyzing automobile reviews provided by the present application;
FIG. 2 is a flow chart of the pre-processing provided herein;
FIG. 3 is a flow chart for constructing a subjective comment text set provided by the present application;
FIG. 4 is a flow chart of identifying the themes of the subjective comment text set provided by the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the embodiments of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality of" typically means at least two, but does not exclude the case of at least one.
The application provides a method for analyzing automobile comments in which the TextBlob tool and the SnowNLP tool are used together to analyze the comments and construct a subjective comment text set. The large volume of comment data that is of no use for the analysis is removed, which reduces the noise introduced and improves the efficiency of subsequent analysis and the accuracy of emotion classification.
The present application is further described with reference to the following figures and specific examples.
FIG. 1 is a flow chart of a method for analyzing automobile comments according to an embodiment of the present application. As shown in FIG. 1, the method for analyzing automobile comments includes:
Step S1: acquiring the automobile comments in a network forum by using a Python web crawler.
The network forum provides functions such as user evaluation, consultation, and question answering, and users can evaluate automobile price, product quality, service quality, and so on in the forum.
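The application does not describe the crawler itself. As a minimal sketch of the extraction half of step S1, the following uses only the Python standard library to pull comment text out of an already downloaded page. The page layout (one `<div class="comment">` per comment) and the names `CommentExtractor` and `extract_comments` are assumptions made for illustration; a real forum would need its own selectors, and the download step is omitted here.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collects the text inside every <div class="comment"> element.

    The class name "comment" is a hypothetical page layout, not something
    specified by the application.
    """

    def __init__(self):
        super().__init__()
        self.comments = []
        self._depth = 0    # nesting depth of <div> tags inside a comment
        self._buffer = []  # text fragments of the comment being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            self._depth += 1  # nested <div> inside a comment
        elif "comment" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.comments.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_comments(html):
    """Return the list of comment strings found in one HTML page."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return parser.comments


sample_page = """
<html><body>
  <div class="comment">空间很大，但是油耗偏高。</div>
  <div class="ad">广告</div>
  <div class="comment">动力<b>非常</b>强劲！</div>
</body></html>
"""
comments = extract_comments(sample_page)
```

Inline markup inside a comment (the `<b>` tag above) is flattened into plain text, and elements with other classes are ignored.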
Step S2: preprocessing the acquired automobile comments.
Data preprocessing is necessary because data quality strongly affects the subsequent emotion analysis. The data is generally text acquired from the network forum. Such text is unstructured and contains a great deal of noise that affects the results, such as punctuation, emoticons, and stop words, all of which must be processed.
Further, in another embodiment of the present application, as shown in fig. 2, step S2 further specifically includes:
Step S21: removing duplicate data.
Because the comments in the network forum are collected by a web crawler, duplicate information is inevitably captured, so the comment set contains repeated entries. Duplicates increase model training time and also seriously reduce the reliability of the parameters learned during training, which lowers every metric of the final classification. It is therefore necessary to filter out duplicate information.
Step S22: data cleaning.
More and more reviewers express their opinions using repeated punctuation or emoticons, which are irrelevant to the analysis and must be cleaned out. Data cleaning deletes interference such as non-Chinese characters, spaces, digits, punctuation marks, symbols, abbreviations, and unrecognizable words.
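Steps S21 and S22 can be sketched as follows. This is an illustrative sketch rather than the applicant's implementation: it treats only exact string matches as duplicates, and it approximates "non-Chinese characters" as anything outside the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The function names are hypothetical.

```python
import re

# Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
# Everything else (spaces, digits, Latin letters, punctuation, emoji) is dropped.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def remove_duplicates(comments):
    """Step S21: drop exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(comments))


def clean(comment):
    """Step S22: delete every run of non-Chinese characters."""
    return NON_CHINESE.sub("", comment)


raw = ["动力很强！！！", "动力很强！！！", "空间 OK 123，很满意😀"]
cleaned = [clean(c) for c in remove_duplicates(raw)]
```

Deduplicating before cleaning keeps the example faithful to the order of the steps; running it again after cleaning would also catch comments that differ only in punctuation.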
Step S23: splitting sentences.
Each sentence is split at adversative (turning) words such as "but", "however", and "instead".
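A minimal sketch of step S23, assuming the comments are in Chinese and that the adversative words correspond to "但是", "然而", and "反而". The application only gives the English glosses, so this word list and the function name are assumptions.

```python
import re

# Hypothetical list of Chinese adversative words ("but", "however", "instead").
ADVERSATIVES = ["但是", "然而", "反而"]
_SPLIT = re.compile("|".join(map(re.escape, ADVERSATIVES)))


def split_on_adversatives(comment):
    """Split a comment into clauses at each adversative word, dropping empties."""
    return [part for part in _SPLIT.split(comment) if part]


clauses = split_on_adversatives("空间很大但是油耗偏高")
```

Splitting at the turn lets each clause carry a single polarity, so a comment that praises space and criticizes fuel consumption is scored as two separate units.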
Step S24: stop words are removed.
Unimportant information is removed by using a common stop word list.
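Step S24 can be sketched as a simple filter. The sketch assumes the comment has already been segmented into words (Chinese text has no spaces, so a word segmenter would be needed first), and the five-entry stop word set is a toy stand-in for the common stop word list the application refers to.

```python
# Toy stop word list; a real pipeline would load a full published list.
STOP_WORDS = frozenset({"的", "了", "是", "也", "就"})


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Step S24: keep only the tokens that are not in the stop word list."""
    return [token for token in tokens if token not in stop_words]


tokens = ["这", "车", "的", "空间", "是", "很", "大", "了"]
kept = remove_stop_words(tokens)
```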
Step S3: constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool.
Objective or neutral comments are removed from the preprocessed comment texts to construct a text set that contains only subjective comments.
Further, in another embodiment of the present application, as shown in fig. 3, the step S3 includes:
Step S31: translating each automobile comment, and calculating the subjective score of each automobile comment by using a subjective analysis tool.
Specifically, the subjective analysis tool is preferably the TextBlob tool in Python; TextBlob performs subjective analysis on each automobile comment to calculate the corresponding subjective score.
Step S32: removing objective comments or neutral comments based on the subjective score and the subjective threshold to obtain a first comment text set.
Specifically, when the subjective score is greater than or equal to the subjective threshold, the automobile comment is considered subjective and is retained; when the subjective score is less than the subjective threshold, the automobile comment is considered objective or neutral and is deleted.
In this embodiment of the present application, when performing subjective analysis, the TextBlob tool may divide the subjective score into two intervals: [0, 0.2) is an objective comment, and [0.2, 1] is a subjective comment.
At present, the only subjective analysis tool used is TextBlob in Python. TextBlob targets English text and cannot process Chinese text, so each Chinese automobile comment is first translated and then analyzed for subjectivity to obtain the first comment text set.
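Steps S31 and S32 can be sketched as below, under several assumptions. The application does not name a translation service, so `translate` is a caller-supplied function. The sketch keeps the original Chinese text in the first comment text set rather than its translation, which is one plausible reading since the next tool, SnowNLP, works on Chinese. `TextBlob(text).sentiment.subjectivity` is TextBlob's subjectivity score in the range 0.0 to 1.0; it is imported lazily because TextBlob is a third-party package, and the demo replaces both the translator and the scorer with toy lookup tables.

```python
def textblob_subjectivity(english_text):
    """Subjectivity score from TextBlob (third-party; pip install textblob)."""
    from textblob import TextBlob  # lazy import so the sketch loads without it
    return TextBlob(english_text).sentiment.subjectivity


def filter_subjective(comments, translate, score=textblob_subjectivity,
                      threshold=0.2):
    """Steps S31-S32: keep comments whose translated text scores >= threshold.

    `translate` is a hypothetical callable mapping Chinese text to English.
    """
    first_set = []
    for comment in comments:
        if score(translate(comment)) >= threshold:
            first_set.append(comment)  # keep the original Chinese comment
    return first_set


# Toy stand-ins for the translator and scorer, for demonstration only.
toy_translations = {
    "我觉得外观太漂亮了": "I think the exterior is gorgeous",
    "轴距为2700毫米": "The wheelbase is 2700 mm",
}
toy_scores = {
    "I think the exterior is gorgeous": 0.9,
    "The wheelbase is 2700 mm": 0.0,
}
first_comment_set = filter_subjective(
    list(toy_translations), toy_translations.get, score=toy_scores.get
)
```

With the 0.2 threshold from this embodiment, the opinionated comment is retained and the purely factual one is removed.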
Step S33: calculating the emotion score of each automobile comment in the first comment text set by using an emotion analysis tool.
Specifically, the emotion analysis tool is preferably the SnowNLP tool in Python; SnowNLP performs emotion analysis on each automobile comment to calculate the corresponding emotion score.
Step S34: removing objective comments or neutral comments again based on the emotion score and the emotion thresholds to obtain the subjective comment text set.
Specifically, when the emotion score is greater than or equal to a first emotion threshold, or less than or equal to a second emotion threshold, the automobile comment is considered positive or negative, that is, subjective, and is retained; when the emotion score is less than the first emotion threshold and greater than the second emotion threshold, the automobile comment is considered objective or neutral and is deleted.
In this embodiment of the present application, during emotion analysis, the SnowNLP tool may divide the emotion degree into three intervals: [0,0.4] is a negative comment, (0.4,0.6) is a neutral comment, and [0.6,1] is a positive comment.
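The two-threshold rule of steps S33 and S34 can be sketched as follows, using the interval boundaries of this embodiment (0.6 and 0.4) as defaults. `SnowNLP(text).sentiments` is SnowNLP's score between 0 and 1, where higher values are more positive; it is imported lazily because SnowNLP is a third-party package. The function names are hypothetical, and the demo uses a toy score table instead of the real tool.

```python
def snownlp_sentiment(chinese_text):
    """Emotion score from SnowNLP (third-party; pip install snownlp)."""
    from snownlp import SnowNLP  # lazy import so the sketch loads without it
    return SnowNLP(chinese_text).sentiments


def filter_by_emotion(comments, score=snownlp_sentiment,
                      positive_threshold=0.6, negative_threshold=0.4):
    """Steps S33-S34: keep clearly positive or clearly negative comments."""
    subjective_set = []
    for comment in comments:
        s = score(comment)
        if s >= positive_threshold or s <= negative_threshold:
            subjective_set.append(comment)
    return subjective_set


# Toy scores for demonstration only; boundary values 0.6 and 0.4 are kept.
toy_emotion = {"很满意": 0.95, "一般": 0.5, "很失望": 0.1,
               "还不错": 0.6, "有点差": 0.4}
subjective_comment_set = filter_by_emotion(list(toy_emotion),
                                           score=toy_emotion.get)
```

Only the comment scored inside the open interval (0.4, 0.6) is dropped as neutral.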
Because the subjective analysis tool requires the Chinese comment text to be translated, the resulting English text may not express the original accurately, which can lead to errors in judging whether a comment is subjective. For this reason, the emotion analysis tool is applied to remove objective or neutral comments a second time.
In the application, the TextBlob tool and the SnowNLP tool are used together to analyze the automobile comments and construct the subjective comment text set. The large volume of comment data that is of no use for the analysis is removed, which reduces the noise introduced and improves the efficiency of subsequent analysis and the accuracy of emotion classification.
Step S4: constructing a theme recognition model to recognize the themes of the subjective comment text set.
Further, in another embodiment of the present application, as shown in fig. 4, the step S4 includes:
Step S41: constructing a theme recognition model by adopting the Word2Vec method.
Specifically, the Word2Vec method reduces vector dimensionality and computational complexity while capturing contextual semantic information, which makes it well suited to comment text; the Word2Vec method is therefore used to construct the theme recognition model.
Step S42: constructing a part-of-speech recognition module and extracting the nouns in the subjective comment text set.
Specifically, automobile themes can be divided into power, handling, appearance, space, fuel consumption, interior, cost performance, and so on, all of which are nouns. The method therefore constructs a part-of-speech recognition module that processes the subjective comment text set and extracts its nouns, further reducing the computation required by the theme recognition model.
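The noun extraction of step S42 can be sketched as a filter over part-of-speech tagged tokens. The application does not name a tagger, so the sketch assumes the input is already a list of (word, flag) pairs whose noun tags begin with "n", as in the tag set used by the jieba segmenter. The function name is hypothetical.

```python
def extract_nouns(tagged_tokens):
    """Step S42: keep the words whose part-of-speech flag marks a noun.

    Assumes (word, flag) pairs with noun flags starting with "n". With the
    third-party jieba package, such pairs could be produced by
    [(p.word, p.flag) for p in jieba.posseg.cut(text)].
    """
    return [word for word, flag in tagged_tokens if flag.startswith("n")]


tagged = [("动力", "n"), ("很", "d"), ("强劲", "a"), ("油耗", "n"), ("高", "a")]
nouns = extract_nouns(tagged)
```

Passing only nouns to the theme recognition model shrinks its input, which is the reduction in computation described above.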
Step S43: identifying the themes of the subjective comment text set by using the theme recognition model.
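The application does not say how the Word2Vec-based model maps extracted nouns to themes. One plausible reading, sketched here purely as an assumption, is to assign a comment the theme whose word vector is closest (by cosine similarity) to any of the comment's nouns. The three-dimensional vectors below are invented toy values; in practice they would come from a Word2Vec model trained on the comment corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_theme(nouns, vectors, themes):
    """Return the theme most similar to any noun, or None if nothing matches."""
    best_theme, best_score = None, -1.0
    for noun in nouns:
        if noun not in vectors:
            continue  # out-of-vocabulary noun
        for theme in themes:
            if theme not in vectors:
                continue
            similarity = cosine(vectors[noun], vectors[theme])
            if similarity > best_score:
                best_theme, best_score = theme, similarity
    return best_theme


# Toy embeddings for illustration only.
toy_vectors = {
    "动力": [1.0, 0.0, 0.0],
    "空间": [0.0, 1.0, 0.0],
    "油耗": [0.0, 0.0, 1.0],
    "发动机": [0.9, 0.1, 0.1],
    "后排": [0.1, 0.95, 0.0],
}
themes = ["动力", "空间", "油耗"]
engine_theme = identify_theme(["发动机"], toy_vectors, themes)
rear_seat_theme = identify_theme(["后排"], toy_vectors, themes)
```

Here "发动机" (engine) lands on the power theme and "后排" (rear seats) on the space theme because their toy vectors point in nearly the same direction as the theme words.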
Step S5: constructing an emotion analysis classifier and acquiring the emotional tendency under each theme.
In the application, an emotion analysis classifier is constructed by using an LSTM method, and the emotional tendency of each theme in a subjective comment text set is acquired and displayed.
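The application gives no architecture details for the LSTM classifier, so the sketch below covers only the final aggregation: given comments already labeled with a theme, it counts the classifier's output per theme. The `classify` parameter stands in for the trained LSTM classifier, and the keyword rule used in the demo is a toy placeholder, not an LSTM.

```python
from collections import Counter, defaultdict


def tendency_by_theme(records, classify):
    """Count classifier labels per theme.

    `records` is an iterable of (theme, comment_text) pairs; `classify` is a
    hypothetical callable returning a label such as "positive" or "negative".
    """
    counts = defaultdict(Counter)
    for theme, text in records:
        counts[theme][classify(text)] += 1
    return {theme: dict(counter) for theme, counter in counts.items()}


def toy_classify(text):
    """Placeholder rule standing in for the trained LSTM classifier."""
    return "negative" if "弱" in text else "positive"


records = [
    ("动力", "动力很强劲"),
    ("动力", "动力太弱"),
    ("空间", "空间很大"),
    ("动力", "加速很强劲"),
]
tendency = tendency_by_theme(records, toy_classify)
```

The resulting per-theme counts are one simple way to display the emotional tendency under each theme, for example as the share of positive comments about power.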
The foregoing description shows and describes several preferred embodiments of the present application. As stated above, however, it is to be understood that the application is not limited to the forms disclosed herein; these should not be construed as excluding other embodiments, and the application may be used in various other combinations, modifications, and environments and may be changed within the scope of the inventive concept described herein, in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the application shall fall within the protection of the appended claims.

Claims (8)

1. An analysis method of automobile comments, characterized by comprising:
step S1, acquiring the automobile comments in the network forum by using a Python web crawler;
step S2, preprocessing the acquired automobile comments;
step S3, constructing a subjective comment text set by using a subjective analysis tool and an emotion analysis tool, and removing objective comments or neutral comments;
step S4, constructing a theme identification model to identify the theme of the subjective comment text set;
step S5, constructing an emotion analysis classifier, and acquiring the emotional tendency of each theme;
wherein the step S3 specifically includes:
step S31, translating each automobile comment, and calculating the subjective score of each automobile comment by using the subjective analysis tool;
step S32, removing the objective comments or the neutral comments based on the subjective score and the subjective threshold value to obtain a first comment text set;
step S33, calculating the emotion score of each automobile comment in the first comment text set by using the emotion analysis tool;
and step S34, removing the objective comments or the neutral comments again based on the emotion scores and the emotion threshold value to obtain the subjective comment text set.
2. The analysis method according to claim 1, wherein the step S2 further specifically comprises:
step S21: removing repeated data;
step S22: data cleaning;
step S23: splitting a sentence;
step S24: stop words are removed.
3. The analysis method according to claim 1, wherein the subjective analysis tool is the TextBlob tool in Python.
4. The analysis method according to claim 3, wherein when the subjective score is greater than or equal to the subjective threshold, the automobile comment is considered to be subjective, the automobile comment is retained, and when the subjective score is less than the subjective threshold, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.
5. The analysis method according to claim 4, wherein the emotion analysis tool is a SnowNLP tool in Python.
6. The analysis method according to claim 5, wherein when the emotion score is equal to or greater than the first emotion threshold or the emotion score is equal to or less than the second emotion threshold, the automobile comment is considered to be positive or negative, i.e., subjective, the automobile comment is retained, when the emotion score is less than the first emotion threshold and the emotion score is greater than the second emotion threshold, the automobile comment is considered to be objective or neutral, and the automobile comment is deleted.
7. The analysis method according to claim 1, wherein the step S4 further specifically comprises:
step S41, constructing the theme recognition model by adopting a Word2Vec method;
step S42, constructing a part-of-speech recognition module and extracting the nouns in the subjective comment text set;
and step S43, identifying the theme of the subjective comment text set by using the theme identification model.
8. The analysis method according to claim 7, wherein the emotion analysis classifier is constructed using an LSTM method.
CN202211096916.2A 2022-09-08 2022-09-08 Automobile comment analysis method Pending CN115587178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211096916.2A CN115587178A (en) 2022-09-08 2022-09-08 Automobile comment analysis method

Publications (1)

Publication Number Publication Date
CN115587178A true CN115587178A (en) 2023-01-10

Family

ID=84772229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211096916.2A Pending CN115587178A (en) 2022-09-08 2022-09-08 Automobile comment analysis method

Country Status (1)

Country Link
CN (1) CN115587178A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078167A1 (en) * 2009-09-28 2011-03-31 Neelakantan Sundaresan System and method for topic extraction and opinion mining
CN108446813A (en) * 2017-12-19 2018-08-24 清华大学 A kind of method of electric business service quality overall merit
CN110866398A (en) * 2020-01-07 2020-03-06 腾讯科技(深圳)有限公司 Comment text processing method and device, storage medium and computer equipment
CN110929026A (en) * 2018-09-19 2020-03-27 阿里巴巴集团控股有限公司 Abnormal text recognition method and device, computing equipment and medium
CN113408269A (en) * 2021-07-20 2021-09-17 北京百度网讯科技有限公司 Text emotion analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination