CN110704710A - Chinese E-commerce emotion classification method based on deep learning - Google Patents

Chinese E-commerce emotion classification method based on deep learning

Info

Publication number
CN110704710A
Authority
CN
China
Prior art keywords
chinese
corpus
deep learning
commerce
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910839186.2A
Other languages
Chinese (zh)
Inventor
黄继风
姚志安
陈海光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Normal University
University of Shanghai for Science and Technology
Original Assignee
Shanghai Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-05
Filing date
2019-09-05
Publication date
2020-01-17
Application filed by Shanghai Normal University
Priority to CN201910839186.2A
Publication of CN110704710A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a Chinese e-commerce emotion classification method based on deep learning, which comprises the following steps. Step A: obtaining a corpus and a product review corpus, merging and preprocessing the two corpora, and training Chinese word vectors on the processed data. Step B: crawling a plurality of different Chinese e-commerce reviews using crawler technology, and converting the Chinese e-commerce review data into text vectors, serving as feature data, using the pre-trained word vectors. Step C: training a traditional machine learning model and a deep learning model on the text vectors, predicting the emotion of Chinese e-commerce reviews using the trained models, and evaluating and classifying the performance of the models according to the prediction results. Step D: after changing the hyper-parameters of the deep learning network, comparing and judging the performance of the different prediction results output, and determining whether to stop training. Compared with the prior art, the method has the advantages of high accuracy, good performance and good robustness.

Description

Chinese E-commerce emotion classification method based on deep learning
Technical Field
The invention relates to the technical field of emotion classification, in particular to a Chinese e-commerce emotion classification method based on deep learning.
Background
Product review information is very important to merchants, who want timely knowledge of how consumers and the public evaluate their products and services and what they prefer, so that decision makers can respond and decide quickly on the basis of this feedback. The prior art generally uses only a Wikipedia corpus for word-vector training, and trains on the raw product review data with traditional machine learning and simple deep learning methods. Because only the Wikipedia corpus is used for word-vector training, robustness is poor, and neither the traditional machine learning methods nor the existing deep learning models classify product reviews satisfactorily.
The robustness problem is common in the field of machine learning and shows up as follows: a model performs well on the reviews of one product but poorly on the reviews of several others. Accuracy and the F1 value are important metrics for evaluating model performance. Earlier methods based on emotion dictionaries generally treat a text as a set of words without considering the relations between the words, and they consume a large amount of manpower and material resources. Traditional machine learning methods do not achieve satisfactory accuracy. How to improve the robustness and accuracy of the model therefore becomes a problem that needs in-depth study. There are currently two promising approaches to the robustness and accuracy problems: (1) improving the size and quality of the text corpus; (2) optimizing the classifier model, for example by using current mainstream deep learning methods to improve model performance. Many studies and experiments show that both approaches improve the accuracy of the classification results.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a Chinese E-commerce emotion classification method based on deep learning.
The purpose of the invention can be realized by the following technical scheme:
A Chinese e-commerce emotion classification method based on deep learning comprises the following steps:
step A: obtaining a corpus and a product review corpus, merging and preprocessing the two corpora, and training Chinese word vectors on the processed data;
step B: crawling a plurality of different Chinese e-commerce reviews using crawler technology, and converting the Chinese e-commerce review data into text vectors, serving as feature data, using the pre-trained word vectors;
step C: training a traditional machine learning model or a deep learning model on the text vectors, predicting the emotion of Chinese e-commerce reviews using the trained model, and evaluating and classifying the performance of the model according to the prediction results;
step D: after changing the hyper-parameters of the deep learning network, comparing and judging the performance of the different prediction results output, and determining whether to stop training.
Further, the step A comprises the following sub-steps:
step A1: decompressing the original Wikipedia corpus, merging the decompressed Wikipedia corpus with a Jingdong (JD.com) review corpus into a mixed corpus, and performing traditional-to-simplified Chinese conversion, word segmentation, and removal of English and stop words on the mixed corpus to obtain a mixed-corpus word bank;
step A2: training a relevant processing model on the mixed-corpus word bank to convert it into a Chinese text word-vector library.
Further, the relevant processing model in step A2 is a Skip-Gram model.
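As an illustration of the preprocessing operations named in the method (duplicate removal for crawled reviews, removal of English, word segmentation, stop-word removal), the following minimal Python sketch may be helpful. It is a sketch under stated assumptions, not the inventors' implementation: the function names, the tiny stop-word set and the character-level placeholder segmenter are all hypothetical, and the traditional-to-simplified conversion is omitted because it needs a conversion table or library that the source does not name.

```python
import re

# Hypothetical stop-word set; a real system would load a full Chinese list.
STOP_WORDS = {"的", "了", "是", "也"}


def remove_english(text):
    """Delete runs of ASCII letters (the 'remove English' operation)."""
    return re.sub(r"[A-Za-z]+", "", text)


def char_segment(text):
    """Placeholder segmenter: one token per character (assumption only)."""
    return list(text)


def preprocess(reviews, segment=char_segment, stop_words=STOP_WORDS):
    """Deduplicate, strip English, segment, and drop stop words.

    `segment` is any function str -> list[str]; the source does not name
    a segmenter, so the caller supplies one.
    """
    seen = set()
    result = []
    for review in reviews:
        if review in seen:  # duplicate removal
            continue
        seen.add(review)
        tokens = segment(remove_english(review))
        tokens = [t for t in tokens if t.strip() and t not in stop_words]
        if tokens:  # skip reviews that become empty
            result.append(tokens)
    return result
```

In practice a word-level Chinese segmenter would replace `char_segment`; the rest of the pipeline would not need to change.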
Further, the step B comprises the following sub-steps:
step B1: crawling a plurality of different Chinese e-commerce reviews using crawler technology, and then performing deduplication, traditional-to-simplified Chinese conversion, word segmentation, and removal of English words and stop words on the reviews to obtain preprocessed review data;
step B2: selecting a plurality of reviews from the preprocessed review data as a data set, and dividing the data set into a training set, a validation set and a test set.
Further, the plurality of reviews in step B2 consists of 5000 positive reviews and 5000 negative reviews.
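The division of step B2 can be sketched as follows, assuming the 8:1:1 ratio given in the detailed description and assuming that each class is split separately so that every subset stays balanced. The function name `split_balanced`, the fixed random seed and the label encoding (1 for positive, 0 for negative) are hypothetical choices made only for this illustration.

```python
import random


def split_balanced(positive, negative, ratios=(8, 1, 1), seed=0):
    """Split each class by the same ratio into train/val/test subsets."""
    rng = random.Random(seed)
    total = sum(ratios)
    subsets = {"train": [], "val": [], "test": []}
    for label, items in ((1, positive), (0, negative)):
        items = list(items)
        rng.shuffle(items)  # shuffle within the class before cutting
        n_train = len(items) * ratios[0] // total
        n_val = len(items) * ratios[1] // total
        subsets["train"] += [(x, label) for x in items[:n_train]]
        subsets["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        subsets["test"] += [(x, label) for x in items[n_train + n_val:]]
    return subsets
```

With 5000 positive and 5000 negative reviews this yields 4000 + 4000 training samples and 500 + 500 samples in each of the validation and test sets.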
Further, the machine learning models in step C include DT (decision tree), Bayes (Bayesian classifier), KNN (k-nearest neighbors), LR (logistic regression) and SVM (support vector machine), and the deep learning models in step C include CNN (convolutional neural network), CNN-LSTM (convolutional neural network with long short-term memory network) and CNN-LSTM-ATT (convolutional neural network with long short-term memory network and attention mechanism).
Further, the step C comprises the following sub-steps:
step C1: training a traditional machine learning model and a deep learning model on the text vectors, wherein one batch of text vectors at a time is input into the model as feature data during training;
step C2: predicting with the trained model, and evaluating and classifying according to the accuracy, precision, recall and F1 value of the prediction results.
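The four metrics of step C2 can be computed from the confusion counts of a binary classifier. The sketch below uses the standard definitions; the function name `binary_metrics` and the label encoding (1 for a positive review, 0 for a negative one) are assumptions for illustration, since the source does not specify them.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with true labels `[1, 1, 1, 0]` and predictions `[1, 1, 0, 0]` the counts are TP = 2, FN = 1, TN = 1, FP = 0, giving accuracy 0.75, precision 1.0, recall 2/3 and F1 0.8.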
Further, the step D comprises the following sub-steps:
step D1: after changing the hyper-parameters of the deep learning network, comparing and judging the performance of the different prediction results output, wherein the hyper-parameters comprise the batch size, the number of epochs, the learning rate and the dropout rate;
step D2: using the portion of the text vectors that forms the validation set to determine whether to stop training, so as to prevent the model from overfitting.
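Step D2 does not state the exact stopping criterion. One common choice, shown here purely as an assumption, is patience-based early stopping: training stops once the validation loss has failed to improve for a fixed number of consecutive epochs. The function name `early_stopping_epoch` and the `patience` parameter are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops when the validation loss has not improved on its best
    value for `patience` consecutive epochs (an assumed criterion).
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0  # improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None
```

In a real training loop the same counter would be updated after each epoch's validation pass, and the weights from the best epoch would typically be restored.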
Compared with the prior art, the invention has the following advantages:
(1) The invention combines the Wikipedia corpus with the product review corpus and classifies Chinese e-commerce review data with a deep learning method (CNN-LSTM-ATT), which effectively improves performance on small-sample product review data and improves the classification accuracy for Chinese e-commerce reviews.
(2) By using both the Wikipedia corpus and the product review corpus, the invention addresses the poor robustness of the prior art.
(3) The invention adopts a deep learning method that combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) and adds an attention mechanism, thereby improving the accuracy of the classifier.
Drawings
FIG. 1 is a block diagram of the step structure of the present invention;
FIG. 2 is a block diagram of the Chinese e-commerce review emotion classification process based on the CNN-LSTM-ATT model;
FIG. 3 is a block diagram of the neural network model (CNN-LSTM-ATT) structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
The steps of the present invention (as shown in FIG. 1) are further described below in conjunction with the drawings.
The invention provides Chinese e-commerce review emotion classification based on deep learning, which comprises the following steps:
A. First, the original Wikipedia corpus and a product review corpus are obtained; text preprocessing is then carried out; finally, a word-vector model is trained on the preprocessed text corpus to generate a word-vector library (as shown in FIG. 2).
Further, in the present invention:
A1. The compressed archive of the original Wikipedia corpus is parsed to extract its text, which is then merged with the text of the Jingdong (JD.com) product review corpus.
A2. Finally, word vectors are trained on the preprocessed text using a Skip-Gram model, with a window size of 5 and a word-vector dimension of 300.
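To make the role of the window size concrete, the sketch below generates the (center word, context word) training pairs that a Skip-Gram model learns from. It illustrates only the pair generation with a fixed window; it does not train the 300-dimensional vectors, and it is not the inventors' code. The function name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(tokens, window=5):
    """Yield (center, context) pairs for all context words within `window`.

    Each word is paired with every other word at most `window` positions
    to its left or right in the same token sequence.
    """
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                yield center, tokens[j]
```

A Skip-Gram model is then trained to predict the context word from the center word for each pair, and the learned input weights serve as the word vectors.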
B. Reviews of five different products are crawled using crawler technology; text preprocessing is then carried out; finally, the data set is converted into text vectors, serving as feature data, using the trained word vectors (as shown in FIG. 2).
Further, in the present invention:
B1. Processing such as deduplication and stop-word removal is performed on the crawled reviews of the five products. Then, for use with the generated word-vector library, 10000 reviews (5000 positive and 5000 negative) are taken from the preprocessed review data of each of the five products as a data set, and the data set is divided into a training set, a validation set and a test set in the ratio 8:1:1; that is, for each product the training set contains 4000 positive and 4000 negative samples, and the validation set and the test set each contain 500 positive and 500 negative samples.
B2. Each product review is converted into a text matrix of the same size n × m, which serves as the input to the neural network, where n is the standard number of words and m is the word-vector dimension; n is determined from L, the list of lengths of the individual reviews in the data set, where mean(L) is the mean of L and std(L) is the standard deviation of L.
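The conversion of a review into a fixed-size n × m matrix can be sketched as below. The expression that defines n is not reproduced in this text, so the rule `n = round(mean(L) + k * std(L))` with a caller-chosen `k` is purely a hypothetical stand-in, as are the function names, the use of the population standard deviation, and the choice to map unknown words and padding positions to zero vectors.

```python
from statistics import mean, pstdev


def standard_length(lengths, k=2.0):
    """Hypothetical rule for n: round(mean(L) + k * std(L)), at least 1.

    Neither this formula nor the value of k is given in the source.
    """
    return max(1, round(mean(lengths) + k * pstdev(lengths)))


def to_matrix(tokens, vectors, n, m):
    """Turn a token list into n rows of m floats.

    Reviews longer than n are truncated; shorter ones are zero-padded.
    Tokens missing from `vectors` are mapped to a zero vector (assumption).
    """
    zero = [0.0] * m
    rows = [list(vectors.get(t, zero)) for t in tokens[:n]]
    rows += [list(zero) for _ in range(n - len(rows))]
    return rows
```

Every review then has the same shape, which is what allows reviews to be stacked into batches for the convolutional layer.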
C. The deep learning model is trained, the trained model is used for prediction, and the performance of the model is evaluated according to the prediction results.
Further, in the present invention:
C1. The deep learning network (as shown in FIG. 3) is based on a combination of CNN and LSTM, and is constructed from a data input layer, a convolutional layer, a Batch Normalization layer, an LSTM layer, a fully connected layer and an Attention Model layer. The data input layer receives the feature data. The convolutional layer extracts features from the feature data.
The Batch Normalization layer normalizes the feature data.
The LSTM layer extracts the temporal relations in the feature data. The Attention Model layer focuses on finding the information in the input data that is relevant to the current output, and the fully connected layer applies a linear transformation to the feature data.
C2. The trained model is used for prediction, and the accuracy, precision, recall and F1 value of the prediction results are calculated.
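The source does not specify the form of the Attention Model layer in C1. As one possible reading, the sketch below applies dot-product attention over the per-time-step outputs of the LSTM layer: each time step is scored against a query vector, the scores are normalized with a softmax, and the weighted sum is passed on. The query vector (which would be learned in a real network) and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and sum the states.

    hidden_states: list of equal-length vectors, one per time step.
    Returns (weights, context), where context is the weighted sum.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context
```

Time steps whose hidden state aligns with the query receive larger weights, which is how the layer emphasizes the words most relevant to the emotion label.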
D. The hyper-parameters of the deep learning network are adjusted, the performance of the prediction results is compared, and it is determined whether to stop training.
Further, in the present invention:
D1. The main hyper-parameters of the neural network include the batch size, the number of epochs, the learning rate, the dropout rate and the like; these hyper-parameters need to be adjusted during training to improve the performance of the model.
D2. During training, the validation set is used to determine whether to stop training, thereby preventing the model from overfitting.
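The comparison in D1 can be organized as a simple grid search over the named hyper-parameters. In the sketch below the candidate values in `GRID` are hypothetical (the source names the hyper-parameters but gives no values), and `evaluate` stands for any caller-supplied function that trains a model with a configuration and returns its validation accuracy.

```python
from itertools import product

# Hypothetical candidate values; the source does not list any.
GRID = {
    "batch_size": [32, 64],
    "epochs": [10, 20],
    "learning_rate": [1e-3, 1e-4],
    "dropout": [0.3, 0.5],
}


def configurations(grid):
    """Yield every combination of hyper-parameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def best_configuration(grid, evaluate):
    """Return the configuration with the highest score from `evaluate`."""
    return max(configurations(grid), key=evaluate)
```

The score should come from the validation set rather than the test set, so that the test set remains an unbiased estimate of final performance.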
On the five e-commerce review data sets, the accuracies of the method provided by the invention are [0.938, 0.901, 0.892, 0.928, 0.924], compared with [0.91, 0.877, 0.848, 0.886, 0.895] for the prior-art method; the method provided by the invention is more accurate on all five data sets, by 2.4 to 4.4 percentage points, and shows good robustness.
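The size of the improvement can be checked directly from the two accuracy lists reported above. The short calculation below uses only those reported figures; the variable names are illustrative.

```python
# Accuracies reported in the text for the five e-commerce review data sets.
proposed = [0.938, 0.901, 0.892, 0.928, 0.924]
prior = [0.91, 0.877, 0.848, 0.886, 0.895]

# Per-data-set gain, rounded to three decimals to hide float noise.
gains = [round(p - q, 3) for p, q in zip(proposed, prior)]

mean_proposed = sum(proposed) / len(proposed)
mean_prior = sum(prior) / len(prior)
mean_gain = mean_proposed - mean_prior

print(gains)
print(round(mean_proposed, 4), round(mean_prior, 4), round(mean_gain, 4))
```

The per-data-set gains are 0.028, 0.024, 0.044, 0.042 and 0.029, and the mean accuracy rises from about 0.883 to about 0.917, a gain of roughly 3.3 percentage points.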
In conclusion, in the Chinese e-commerce review emotion analysis based on deep learning provided by the invention, the crawled product reviews are converted into feature data, the feature data are input into the CNN-LSTM-ATT neural network model, and the performance of the model is evaluated. By combining the Wikipedia corpus with the product review corpus and using the deep learning CNN-LSTM-ATT model, the invention improves the prediction accuracy of the deep learning network and makes the results more robust.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A Chinese E-commerce emotion classification method based on deep learning is characterized by comprising the following steps:
step A: obtaining a corpus and a product review corpus, merging and preprocessing the two corpora, and training Chinese word vectors on the processed data;
step B: crawling a plurality of different Chinese e-commerce reviews using crawler technology, and converting the Chinese e-commerce review data into text vectors, serving as feature data, using the pre-trained word vectors;
step C: training a traditional machine learning model or a deep learning model on the text vectors, predicting the emotion of Chinese e-commerce reviews using the trained model, and evaluating and classifying the performance of the model according to the prediction results;
step D: after changing the hyper-parameters of the deep learning network, comparing and judging the performance of the different prediction results output, and determining whether to stop training.
2. The Chinese e-commerce emotion classification method based on deep learning of claim 1, wherein the step A comprises the following substeps:
step A1: decompressing the original Wikipedia corpus, merging the decompressed Wikipedia corpus with a Jingdong (JD.com) review corpus into a mixed corpus, and performing traditional-to-simplified Chinese conversion, word segmentation, and removal of English and stop words on the mixed corpus to obtain a mixed-corpus word bank;
step A2: training a relevant processing model on the mixed-corpus word bank to convert it into a Chinese text word-vector library.
3. The Chinese e-commerce emotion classification method based on deep learning of claim 2, wherein the relevant processing model in step A2 is a Skip-Gram model.
4. The Chinese e-commerce emotion classification method based on deep learning of claim 1, wherein the step B comprises the following substeps:
step B1: crawling a plurality of different Chinese e-commerce reviews using crawler technology, and then performing deduplication, traditional-to-simplified Chinese conversion, word segmentation, and removal of English words and stop words on the reviews to obtain preprocessed review data;
step B2: selecting a plurality of reviews from the preprocessed review data as a data set, and dividing the data set into a training set, a validation set and a test set.
5. The Chinese e-commerce emotion classification method based on deep learning of claim 4, wherein the plurality of reviews in step B2 consists of 5000 positive reviews and 5000 negative reviews.
6. The method as claimed in claim 1, wherein the machine learning models in step C include DT, Bayes, KNN, LR and SVM, and the deep learning models in step C include CNN, CNN-LSTM and CNN-LSTM-ATT.
7. The Chinese e-commerce emotion classification method based on deep learning of claim 1, wherein the step C comprises the following substeps:
step C1: training a traditional machine learning model and a deep learning model on the text vectors, wherein one batch of text vectors at a time is input into the model as feature data during training;
step C2: predicting with the trained model, and evaluating and classifying according to the accuracy, precision, recall and F1 value of the prediction results.
8. The Chinese e-commerce emotion classification method based on deep learning of claim 1, wherein the step D comprises the following substeps:
step D1: after changing the hyper-parameters of the deep learning network, comparing and judging the performance of the different prediction results output, wherein the hyper-parameters comprise the batch size, the number of epochs, the learning rate and the dropout rate;
step D2: using the portion of the text vectors that forms the validation set to determine whether to stop training, so as to prevent the model from overfitting.
CN201910839186.2A 2019-09-05 2019-09-05 Chinese E-commerce emotion classification method based on deep learning Pending CN110704710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910839186.2A CN110704710A (en) 2019-09-05 2019-09-05 Chinese E-commerce emotion classification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910839186.2A CN110704710A (en) 2019-09-05 2019-09-05 Chinese E-commerce emotion classification method based on deep learning

Publications (1)

Publication Number Publication Date
CN110704710A true CN110704710A (en) 2020-01-17

Family

ID=69194418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910839186.2A Pending CN110704710A (en) 2019-09-05 2019-09-05 Chinese E-commerce emotion classification method based on deep learning

Country Status (1)

Country Link
CN (1) CN110704710A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695017A (en) * 2020-06-15 2020-09-22 山东浪潮云服务信息科技有限公司 Method and system for analyzing emotional tendency of user based on product comment
CN112668507A (en) * 2020-12-31 2021-04-16 南京信息工程大学 Sea clutter prediction method and system based on hybrid neural network and attention mechanism
CN113393276A (en) * 2021-06-25 2021-09-14 食亨(上海)科技服务有限公司 Comment data classification method and device and computer readable medium
CN117852507A (en) * 2024-03-07 2024-04-09 南京信息工程大学 Restaurant return guest prediction model, method, system and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357837A (en) * 2017-06-22 2017-11-17 华南师范大学 The electric business excavated based on order-preserving submatrix and Frequent episodes comments on sensibility classification method
CN108038240A (en) * 2017-12-26 2018-05-15 武汉大学 Based on content, the social networks rumour detection method of user's multiplicity
CN108427670A (en) * 2018-04-08 2018-08-21 重庆邮电大学 A kind of sentiment analysis method based on context word vector sum deep learning
CN108984775A (en) * 2018-07-24 2018-12-11 南京新贝金服科技有限公司 A kind of public sentiment monitoring method and system based on comment on commodity

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
姚志安: "基于深度神经网络的电商评论情感极性分析研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
李文江 等: "基于深度学习的商品评论情感分类研究", 《知识管理论坛》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695017A (en) * 2020-06-15 2020-09-22 山东浪潮云服务信息科技有限公司 Method and system for analyzing emotional tendency of user based on product comment
CN112668507A (en) * 2020-12-31 2021-04-16 南京信息工程大学 Sea clutter prediction method and system based on hybrid neural network and attention mechanism
CN113393276A (en) * 2021-06-25 2021-09-14 食亨(上海)科技服务有限公司 Comment data classification method and device and computer readable medium
CN113393276B (en) * 2021-06-25 2023-06-16 食亨(上海)科技服务有限公司 Comment data classification method, comment data classification device and computer-readable medium
CN117852507A (en) * 2024-03-07 2024-04-09 南京信息工程大学 Restaurant return guest prediction model, method, system and equipment
CN117852507B (en) * 2024-03-07 2024-05-17 南京信息工程大学 Restaurant return guest prediction model, method, system and equipment

Similar Documents

Publication Publication Date Title
CN110609897B (en) Multi-category Chinese text classification method integrating global and local features
CN110704710A (en) Chinese E-commerce emotion classification method based on deep learning
CN108363810B (en) Text classification method and device
CN107391772B (en) Text classification method based on naive Bayes
US20210027016A1 (en) Method for detecting deceptive e-commerce reviews based on sentiment-topic joint probability
CN107688576B (en) Construction and tendency classification method of CNN-SVM model
Pandey et al. An analysis of machine learning techniques (J48 & AdaBoost)-for classification
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN106919997B (en) LDA-based user consumption prediction method for electronic commerce
WO2020151634A1 (en) Patent evaluation method and system
CN115410199A (en) Image content retrieval method, device, equipment and storage medium
Hussain et al. Design and analysis of news category predictor
CN111130942A (en) Application flow identification method based on message size analysis
Jayakody et al. Sentiment analysis on product reviews on twitter using Machine Learning Approaches
Kumar et al. Emotion analysis of news and social media text for stock price prediction using svm-lstm-gru composite model
Chen et al. Learning a general clause-to-clause relationships for enhancing emotion-cause pair extraction
CN113010705A (en) Label prediction method, device, equipment and storage medium
Sana et al. Data transformation based optimized customer churn prediction model for the telecommunication industry
Cahya et al. Deep feature weighting based on genetic algorithm and Naïve Bayes for Twitter sentiment analysis
Mehedi et al. Automatic bangla article content categorization using a hybrid deep learning model
CN115526174A (en) Deep learning model fusion method for finance and economics text emotional tendency classification
Kara et al. A SHAP-based active learning approach for creating high-quality training data
Papakostas et al. Evolutionary feature subset selection for pattern recognition applications
Guo et al. An active learning method based on mistake sampling for large scale imbalanced classification
CN113641824A (en) Text classification system and method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117
